source: Deliverables/D3.1/Report/report.tex @ 381

Last change on this file since 381 was 381, checked in by campbell, 10 years ago

Some d3.1 work.

File size: 29.2 KB
Line 
1\documentclass[11pt,epsf,a4wide]{article}
2\usepackage[utf8x]{inputenc}
3\usepackage{listings}
4\usepackage{../../style/cerco}
5\newcommand{\ocaml}{OCaml}
6\newcommand{\clight}{Clight}
7\newcommand{\matita}{matita}
8\newcommand{\sdcc}{\texttt{sdcc}}
9
10\newcommand{\textSigma}{\ensuremath{\Sigma}}
11
12% LaTeX Companion, p 74
13\newcommand{\todo}[1]{\marginpar{\raggedright - #1}}
14
15\lstdefinelanguage{coq}
16  {keywords={Definition,Lemma,Theorem,Remark,Qed,Save,Inductive,Record},
17   morekeywords={[2]if,then,else},
18  }
19
20\lstdefinelanguage{matita}
21  {keywords={ndefinition,nlemma,ntheorem,nremark,ninductive,nrecord,nqed,nlet,rec,match,with,Type},
22   morekeywords={[2]nwhd,nnormalize,nelim,ncases,ndestruct},
23   mathescape=true,
24  }
25
26\lstset{language=matita,basicstyle=\small\tt,columns=flexible,breaklines=false,
27        keywordstyle=\color{red}\bfseries,
28        keywordstyle=[2]\color{blue},
29        commentstyle=\color{green},
30        stringstyle=\color{blue},
31        showspaces=false,showstringspaces=false}
32
33\lstset{extendedchars=false}
34\lstset{inputencoding=utf8x}
35\DeclareUnicodeCharacter{8797}{:=}
36\DeclareUnicodeCharacter{10746}{++}
37\DeclareUnicodeCharacter{9001}{\ensuremath{\langle}}
38\DeclareUnicodeCharacter{9002}{\ensuremath{\rangle}}
39
40
41\title{
42INFORMATION AND COMMUNICATION TECHNOLOGIES\\
43(ICT)\\
44PROGRAMME\\
45\vspace*{1cm}Project FP7-ICT-2009-C-243881 \cerco{}}
46
47\date{ }
48\author{}
49
50\begin{document}
51\thispagestyle{empty}
52
53\vspace*{-1cm}
54\begin{center}
55\includegraphics[width=0.6\textwidth]{../../style/cerco_logo.png}
56\end{center}
57
58\begin{minipage}{\textwidth}
59\maketitle
60\end{minipage}
61
62
63\vspace*{0.5cm}
64\begin{center}
65\begin{LARGE}
66\bf
67Report n. D3.1\\
68Executable Formal Semantics of C\\
69\end{LARGE} 
70\end{center}
71
72\vspace*{2cm}
73\begin{center}
74\begin{large}
75Version 1.0
76\end{large}
77\end{center}
78
79\vspace*{0.5cm}
80\begin{center}
81\begin{large}
82Main Authors:\\
83Brian~Campbell, Randy~Pollack
84\end{large}
85\end{center}
86
87\vspace*{\fill}
88\noindent
89Project Acronym: \cerco{}\\
90Project full title: Certified Complexity\\
91Proposal/Contract no.: FP7-ICT-2009-C-243881 \cerco{}\\
92
93\clearpage \pagestyle{myheadings} \markright{\cerco{}, FP7-ICT-2009-C-243881}
94
95\tableofcontents
96
97\section{Introduction}
98
99We present an executable formal semantics of the C programming language which
100will serve as the specification of the input language for the \cerco{}
101verified compiler.  Our semantics is based on Leroy et.~al.'s C semantics for
102the CompCert project~\cite{1111042,compcert-mm08}, which divides the treatment
103of C into two pieces.  The first is an \ocaml{} stage which parses and
104elaborates C into an abstract syntax tree for the simpler \clight{} language,
105based on the CIL C parser.  The second part is a small step semantics for
106\clight{} formalised in the proof tool, which we have ported from Coq to the
107\matita{} theorem prover.
108This semantics is given in the form of inductive definitions, and so we have
109added a third part giving an equivalent functional presentation in \matita{}.
110
111The \cerco{} compiler needs to deal with the constrained memory model of
112the target microcontroller (in our case, the 8051).  Thus each part of the
113semantics has been extended to allow explicit handling of the
114microcontroller's memory spaces.  \emph{Cost labels} have also been added to
115the \clight{} semantics to support the labelling approach to cost annotations
116presented in~\cite{d2.1}.
117
118The following section discusses the C language extensions for memory spaces.
119Then the port of the two stages of the CompCert \clight{} semantics is
120described in Section~\ref{sec:port}, followed by the new executable semantics
121in Section~\ref{sec:exec}.  Finally we discuss how the semantics can be tested
122in Section~\ref{sec:valid}.
123
124\section{Language extensions for the 8051 memory model}
125\label{sec:ext}
126
127The choice of an extended 8051 target for the \cerco{} compiler imposes an
128irregular memory model with tight resource constraints.  The different memory
129spaces and access modes are summarised in Figure~\ref{fig:memorymodel} ---
130essentially the evolution of the 8051 family has fragmented memory into
131four regions:  one half of the `internal' memory is fully accessible but also
132contains the register banks, the second half cannot be accessed by direct addressing
133because it is shadowed by the `Special Function Registers' (SFRs) for I/O;
134`external memory' provides the bulk of memory in a separate address space; and
135the code is in its own read-only space.
136\begin{figure}
137\setlength{\unitlength}{1pt}
138\begin{picture}(410,250)(-50,200)
139%\put(-50,200){\framebox(410,250){}}
140\put(12,410){\makebox(80,0)[b]{Internal (256B)}}
141\put(13,242){\line(0,1){165}}
142\put(93,242){\line(0,1){165}}
143\put(13,407){\line(1,0){80}}
144\put(12,400){\makebox(0,0)[r]{0h}}  \put(14,400){\makebox(0,0)[l]{Register bank 0}}
145\put(13,393){\line(1,0){80}}
146\put(12,386){\makebox(0,0)[r]{8h}}  \put(14,386){\makebox(0,0)[l]{Register bank 1}}
147\put(13,379){\line(1,0){80}}
148\put(12,372){\makebox(0,0)[r]{10h}}  \put(14,372){\makebox(0,0)[l]{Register bank 2}}
149\put(13,365){\line(1,0){80}}
150\put(12,358){\makebox(0,0)[r]{18h}} \put(14,358){\makebox(0,0)[l]{Register bank 3}}
151\put(13,351){\line(1,0){80}}
152\put(12,344){\makebox(0,0)[r]{20h}} \put(14,344){\makebox(0,0)[l]{Bit addressable}}
153\put(13,323){\line(1,0){80}}
154\put(12,316){\makebox(0,0)[r]{30h}}
155  \put(14,309){\makebox(0,0)[l]{\quad \vdots}}
156\put(13,291){\line(1,0){80}}
157\put(12,284){\makebox(0,0)[r]{80h}}
158  \put(14,263){\makebox(0,0)[l]{\quad \vdots}}
159\put(12,249){\makebox(0,0)[r]{ffh}}
160\put(13,242){\line(1,0){80}}
161
162\qbezier(-2,407)(-6,407)(-6,393)
163\qbezier(-6,393)(-6,324)(-10,324)
164\put(-12,324){\makebox(0,0)[r]{indirect}}
165\qbezier(-6,256)(-6,324)(-10,324)
166\qbezier(-2,242)(-6,242)(-6,256)
167
168\qbezier(94,407)(98,407)(98,393)
169\qbezier(98,393)(98,349)(102,349)
170\put(104,349){\makebox(0,0)[l]{direct/stack}}
171\qbezier(98,305)(98,349)(102,349)
172\qbezier(94,291)(98,291)(98,305)
173
174\put(102,242){\framebox(20,49){sfr}}
175% bit access to sfrs?
176
177\qbezier(124,291)(128,291)(128,277)
178\qbezier(128,277)(128,266)(132,266)
179\put(134,266){\makebox(0,0)[l]{direct}}
180\qbezier(128,257)(128,266)(132,266)
181\qbezier(124,242)(128,242)(128,256)
182
183\put(164,410){\makebox(80,0)[b]{External (64kB)}}
184\put(164,220){\line(0,1){187}}
185\put(164,407){\line(1,0){80}}
186\put(244,220){\line(0,1){187}}
187\put(164,242){\line(1,0){80}}
188\put(163,400){\makebox(0,0)[r]{0h}}
189\put(164,324){\makebox(80,0){paged access}}
190  \put(164,310){\makebox(80,0){direct/indirect}}
191\put(163,235){\makebox(0,0)[r]{80h}}
192  \put(164,228){\makebox(80,0){\vdots}}
193  \put(164,210){\makebox(80,0){direct/indirect}}
194
195\put(264,410){\makebox(80,0)[b]{Code (64kB)}}
196\put(264,220){\line(0,1){187}}
197\put(264,407){\line(1,0){80}}
198\put(344,220){\line(0,1){187}}
199\put(263,400){\makebox(0,0)[r]{0h}}
200  \put(264,228){\makebox(80,0){\vdots}}
201  \put(264,324){\makebox(80,0){direct}}
202  \put(264,310){\makebox(80,0){PC relative}}
203\end{picture}
204\caption{The extended 8051 memory model}
205\label{fig:memorymodel}
206\end{figure}
207
208To make efficient use of the limited amount of memory, compilers for 8051
209microcontrollers provide extra keywords to allocate global variables to
210particular memory spaces, and to limit pointers to address a particular space.
211The freely available \sdcc{} compiler provides the following extensions for
212the 8051 memory spaces:
213\begin{table}[ht]
214\begin{tabular}{rcl}
215Attribute & Pointer size (bytes) & Memory space \\\hline
216\lstinline'__data' & 1 & Internal, first half (0h -- 7fh) \\
217\lstinline'__idata' & 1 & Internal, indirect only (80h -- ffh) \\
218\lstinline'__pdata' & 1 & External, page access (usually 0h -- 7fh) \\
219\lstinline'__xdata' & 2 & External, any (0h -- ffffh) \\
220\lstinline'__code' & 2 & Code, any (0h -- ffffh) \\
221\emph{none}& 3 & Any / Generic pointer
222\end{tabular}
223\end{table}
224The generic pointers are a tagged union of the other kinds of pointers.
225
226We intend the \cerco{} compiler to support extensions that are broadly
227compatible with \sdcc{} to enable the compilation of programs with
228either tool.  In particular, this would allow the comparison of the
229behaviour of test cases compiled with each compiler.  Thus the C
230syntax and semantics have been extended with the memory space
231attributes listed above.  The syntax follows \sdcc{} and in the
232semantics we track the memory space that each block was allocated in
233and only permit access via the appropriate kinds of pointers.  The
234details on these changes are given in the following sections.
235
236The \sdcc{} compiler also supports special variable types for accessing the
237SFRs, which provide the standard I/O mechanism for the 8051 family.  (Note
238that pointers to these types are not permitted because only direct addressing of
239the SFRs is allowed.)  We intend to use CompCert-style `external functions'
240instead of special types.  These are functions which are declared, but no C
241implementation of them is provided.  Instead they are provided by the runtime or
242compiled directly to the corresponding machine code.  This has the advantage
243that no changes from the CompCert semantics are required, and a compatibility
244library can be provided for \sdcc{} if necessary.  The 8051 and the \sdcc{}
245compiler also provide bit-level access to a small region of internal memory.
246We do not intend to expose this feature to C programs in the \cerco{}
247compiler, and so no extension is provided for it.
248
249Finally, we adopt the \sdcc{} extension to allocate a variable at a
250particular address to provide a way to deal with memory mapped I/O in
251the external memory space.
252\todo{Are we really going to do this?}
253
254\section{Port of CompCert \clight{} semantics to \matita{}}
255\label{sec:port}
256
257\todo{Could do with some text here}
258\subsection{Parsing and elaboration}
259The first stage taken from the CompCert semantics is the parsing and
260elaboration of C programs into the simpler \clight{} language.  This is based
261upon the CIL library for parsing, analysing and transforming C programs by
262Necula et.~al.~\cite{cil02}.  The elaboration performed provides explicit type
263information throughout the program, including extra casts for promotion, and
264performs simplifications such as breaking up expressions with side effects
265into effect-free expressions and statements to perform the effects.  The
266transformed \clight{} programs are much more manageable and lack the ambiguities
267of C, but also remain easily understood by C programmers.
268
269The parser has been extended with the 8051 memory spaces attributes given
270above.  The resulting abstract syntax tree records them on global variable
271declarations and pointer types.  However, we also need to deal with them
272during the elaboration process to produce all of the required type information.
273For example, when the address-of operator \lstinline'&' is used it must decide
274which kind of pointer should be used.  Thus the extended elaboration process
275keeps track of the memory space (if any) that the value of each
276expression resides in.  Where the memory space is not known, a generic pointer
277will be used instead.  Moreover, we also include the pointer kind when
278determining whether a cast must be inserted so that conversions between
279pointer representations can be performed.
280
281
282Thus the elaboration turns the following C code
283\begin{quote}
284\begin{lstlisting}[language=C]
285int g(int *x) { return 5; }
286
287int f(__data int *x, int *y) {
288  return x==y ? g(x) : *x;
289}
290
291__data int i = 1;
292
293int main(void) {
294  return f(&i, &i);
295}
296\end{lstlisting}
297\end{quote}
298into the Clight program below:
299\begin{quote}
300\begin{lstlisting}[language=C]
301int g(int *x) { return 5; }
302
303int f(__data int * x, int * y)
304{
305  int t;
306  if (x == (__data int * )y) {
307    t = g((int * )x);
308  } else {
309    t = *x;
310  }
311  return t;
312}
313
314int main(void)
315{
316  int t;
317  t = f(&i, (int * )(&i));
318  return t;
319}
320\end{lstlisting}
321\end{quote}
322The expressions in \lstinline'f' and \lstinline'main' had to be broken up due
323to side-effects, and casts have been added to change between generic pointers
324and pointers specific to the \lstinline'__data' section of memory.  The
325underlying data structure also has types attached to every expression, but
326these are impractical to show in source form.
327
328Note that the translation from C to \clight{} is not proven correct
329--- instead it effectively forms a semi-formal part of the whole C
330semantics.  We can have some confidence in the code, however, because
331it has received testing in the \cerco{} prototype, and it is very
332close to the version used in CompCert.  We can also perform testing of
333the semantics without involving the rest of the compiler because we
334have an executable semantics.  Moreover, the cautious programmer could
335choose to inspect the generated \clight{} code, or even work entirely
336in the \clight{} language.
337
338\subsection{Small-step inductive semantics}
339\label{sec:inductive}
340
341The semantics for \clight{} itself has been ported from the Coq development
342used in CompCert to \matita{} for use in \cerco{}.  Details about the original
343big-step formalisation of \clight{} can be found in Leroy and
344Blazy~\cite{compcert-mm08} (including a discussion of the translation from C
345in \S 4.1), although we started from a later version with a
346small-step semantics and hence support for \lstinline'goto' statements.
347Several parts of the semantics were shared with other parts of the CompCert
348development, notably:
349\begin{itemize}
350\item the representation of primitive values (integers, pointers and undefined
351values, but not structures or unions) and operations on them,
352\item traces of I/O events,
353\item a memory model that keeps conceptually distinct sections of memory
354strictly separate (assigning `undefined behaviour' to a buffer overflow, for
355instance),
356\item results about composing execution steps of arbitrary small-step
357semantics,
358\item data structures for local and global environments, and
359\item common error handling constructs, in particular an error monad.
360\end{itemize}
361We anticipate a similar arrangement for the \cerco{} verified
362compiler, although this means that there may be further changes to the
363common parts of the semantics later in the project to harmonise the
364stages of the compiler.  In particular, some of data structures for
365environments are just preliminary definitions for developing the
366semantics.
367
368The main body of the small-step semantics is a number of inductive definitions
369giving details of the defined behaviour for casts, expressions and statements.
370Expressions are side-effect free in \clight{} and only produce a value as
371output.  In our case we also need a trace of any cost labels that are
372`evaluated' so that we will be able to give fine-grained costs for
373the execution of compiled conditional expressions.
374
375As an example of one of the expression rules, consider an expression which
376evaluates a variable, \lstinline'Expr (Evar id) ty'.  A variable is an example
377of a class of expressions called \emph{lvalues}, which are roughly those
378expressions which can be assigned to.  Thus we use a general rule for lvalues,
379\label{page:evalvar}
380\begin{quote}
381\begin{lstlisting}
382ninductive eval_expr (ge:genv) (e:env) (m:mem) : expr $\rightarrow$ val $\rightarrow$ trace $\rightarrow$ Prop ≝
383...
384  | eval_Elvalue: $\forall$a,ty,psp,loc,ofs,v,tr.
385      eval_lvalue ge e m (Expr a ty) psp loc ofs tr $\rightarrow$
386      load_value_of_type ty m psp loc ofs = Some ? v $\rightarrow$
387      eval_expr ge e m (Expr a ty) v tr
388\end{lstlisting}
389\end{quote}
390where the auxiliary relation \lstinline'eval_lvalue' yields the location of
391the value, \lstinline'psp,loc,ofs', consisting of memory space, memory block,
392and offset into the block, respectively.  The expression can thus evaluate to
393the value \lstinline'v' if \lstinline'v' can be loaded from that location.
394One corresponding part of the \lstinline'eval_lvalue' definition is
395\begin{quote}
396\begin{lstlisting}
397...
398with eval_lvalue (*(ge:genv) (e:env) (m:mem)*) :
399        expr $\rightarrow$ memory_space $\rightarrow$ block $\rightarrow$ int $\rightarrow$ trace $\rightarrow$ Prop ≝
400  | eval_Evar_local:   $\forall$id,l,ty.
401      get ??? id e = Some ? l $\rightarrow$
402      eval_lvalue ge e m (Expr (Evar id) ty) Any l zero E0
403...
404\end{lstlisting}
405\end{quote}
406simply looks up the variable in the local environment.  The offset is
407zero because all variables are given their own memory block to prevent
408the use of stray pointers.  A similar rule handles global variables,
409with an extra check to ensure that no local variable has the same
410name.  Note that the two relations are defined using mutual recursion
411because \lstinline'eval_lvalue' uses \lstinline'eval_expr' for the
412evaluation of the pointer expression in the dereferencing rule.
413
414Casts also have an auxiliary relation to specify the allowed changes, and
415operations on values (including the changes in representation performed by
416casting) are given as functions.
417
418The only new expression in our semantics is the cost label which wraps
419around another expression.  It does not change the result, but merely
420prefixes the trace with the given label to identify the branches taken
421in conditional expressions so that accurate cost information can be
422attached to the program:
423\begin{quote}
424\begin{lstlisting}
425  | eval_Ecost: $\forall$a,ty,v,l,tr.
426      eval_expr ge e m a v tr $\rightarrow$
427      eval_expr ge e m (Expr (Ecost l a) ty) v (tr++Echarge l)
428\end{lstlisting}
429\end{quote}
430\todo{make the semantics of the cost labels clearer}
431
432As the expressions are side-effect free, all of the changes to the state are
433performed by statements.  The state itself is represented by records of the
434form
435\begin{quote}
436\begin{lstlisting}
437ninductive state: Type :=
438  | State:
439      $\forall$f: function.
440      $\forall$s: statement.
441      $\forall$k: cont.
442      $\forall$e: env.
443      $\forall$m: mem.  state
444  | Callstate:
445      $\forall$fd: fundef.
446      $\forall$args: list val.
447      $\forall$k: cont.
448      $\forall$m: mem. state
449  | Returnstate:
450      $\forall$res: val.
451      $\forall$k: cont.
452      $\forall$m: mem. state.
453\end{lstlisting}
454\end{quote}
455During normal execution the state contains the currently executing
456function's definition (used to find \lstinline'goto' labels and also
457to check whether the function is expected to return a value), the
458statement to be executed next, a continuation value to be executed
459afterwards (where successor statements and details of function calls
460and loops are stored), the local environment mapping variables to
461memory locations\footnote{In the semantics all variables are
462  allocated, although the compiler may subsequently allocate them to
463  registers where possible.} and the current memory state.  \todo{need
464  to note that all variables 'appear' to be memory allocated, even if
465  they're subsequently optimised away.}  The function call and return
466states appear to store less information because the details of the
467caller are contained in the continuation.
468
469An example of the statement execution rules is the assignment rule
470(corresponding to the C syntax \lstinline'a1 = a2'),
471\label{page:Sassign}
472\begin{quote}
473\begin{lstlisting}
474ninductive step (ge:genv) : state $\rightarrow$ trace $\rightarrow$ state $\rightarrow$ Prop ≝
475  | step_assign:   $\forall$f,a1,a2,k,e,m,psp,loc,ofs,v2,m',tr1,tr2.
476      eval_lvalue ge e m a1 psp loc ofs tr1 $\rightarrow$
477      eval_expr ge e m a2 v2 tr2 $\rightarrow$
478      store_value_of_type (typeof a1) m psp loc ofs v2 = Some ? m' $\rightarrow$
479      step ge (State f (Sassign a1 a2) k e m)
480           (tr1++tr2) (State f Sskip k e m')
481...
482\end{lstlisting}
483\end{quote}
484which can be read as:
485\begin{itemize}
486\item if \lstinline'a1' can evaluate to the location \lstinline'psp,loc,ofs',
487\item \lstinline'a2' can evaluate to a value \lstinline'v2', and
488\item storing \lstinline'v2' at location \lstinline'psp,loc,ofs' succeeds,
489yielding the new memory state \lstinline-m'-, then
490\item the program can step from the state about to execute
491\lstinline'Sassign a1 a2' to a state with the updated memory \lstinline-m'-
492about to execute the no-op \lstinline'Sskip'.
493\end{itemize}
494This rule would be followed by one of the rules to which replaces the
495\lstinline'Sskip' statement with the `real' next statement constructed
496from the continuation \lstinline'k'.  Note that the only true
497side-effect here is the change in memory --- the local environment is
498initialised once and for all on function entry, and the only events
499appearing in the trace are cost labels used purely for accounting.  At
500present this imposes an ordering due to the cost labels.  Should this
501prove too restrictive we may change it to produce a set of labels
502encountered.
503
504The \clight{} language provides input and output effects through `external'
505functions and the step rule
506\begin{quote}
507\begin{lstlisting}
508  | step_external_function: $\forall$id,targs,tres,vargs,k,m,vres,t.
509      event_match (external_function id targs tres) vargs t vres $\rightarrow$
510      step ge (Callstate (External id targs tres) vargs k m)
511            t (Returnstate vres k m)
512\end{lstlisting}
513\end{quote}
514which allows the function to be invoked with and return any values subject to
515the enforcement of the typing rules in \lstinline'event_match', which also
516provides the trace.
517
518Cost label statements add the given label to the trace analogously to the cost
519label expressions above.
520
521\section{Executable semantics}
522\label{sec:exec}
523
524We have added an equivalent functional version of the \clight{} semantics that
525can be used to animate programs.  The definitions roughly follow the inductive
526semantics, but are necessarily rearranged around pattern matching of the
527relevant parts of the state rather than presenting each case separately.
528
529\subsection{Expressions}
530
531The code corresponding to the variable lookup definitions on
532page~\pageref{page:evalvar} is
533\begin{quote}
534\begin{lstlisting}
535nlet rec exec_expr (ge:genv) (en:env) (m:mem) (e:expr) on e :
536               res ($\Sigma$r:val$\times$trace. eval_expr ge en m e (\fst r) (\snd r)) ≝
537match e with
538[ Expr e' ty $\Rightarrow$
539  match e' with
540  [ ...
541  | Evar _ $\Rightarrow$ Some ? (
542      do $\langle$l,tr$\rangle$ $\leftarrow$ exec_lvalue' ge en m e' ty;
543      do v $\leftarrow$ opt_to_res ? (load_value_of_type' ty m l);
544      OK ? $\langle$v,tr$\rangle$)
545...
546  ]
547]
548and exec_lvalue' (ge:genv) (en:env) (m:mem) (e':expr_descr) (ty:type) on e' :
549    res ($\Sigma$r:memory_space $\times$ block $\times$ int $\times$ trace. eval_lvalue ge en m (Expr e' ty) (\fst (\fst (\fst r))) (\snd (\fst (\fst r))) (\snd (\fst r)) (\snd r)) ≝
550  match e' with
551  [ Evar id $\Rightarrow$ 
552      match (get … id en) with
553      [ None $\Rightarrow$ Some ? (do $\langle$sp,l$\rangle$ $\leftarrow$ opt_to_res ? (find_symbol ? ? ge id); OK ? $\langle$$\langle$$\langle$sp,l$\rangle$,zero$\rangle$,E0$\rangle$) (* global *)
554      | Some loc $\Rightarrow$ Some ? (OK ? $\langle$$\langle$$\langle$Any,loc$\rangle$,zero$\rangle$,E0$\rangle$) (* local *)
555      ]
556...
557\end{lstlisting}
558\end{quote}
559where the result is placed in an error monad (the \lstinline'res' type
560constructor) so that \emph{undefined behaviour} such as dereferencing an
561invalid pointer can be rejected.  We use \lstinline'do' notation similar to
562Haskell and CompCert, where
563\begin{quote}
564\begin{lstlisting}
565do x $\leftarrow$ e; e'
566\end{lstlisting}
567\end{quote}
568means evaluate \lstinline'e' and if it yields a value then bind that to
569\lstinline'x' and evaluate \lstinline-e'-, and otherwise propogate the error.
570
571\subsection{Statements}
572
573Evaluating a step of a statement is complicated by the presence of the
574`external' functions for I/O, which can return arbitrary values.  These are
575handled by a resumption monad, which on encountering some I/O returns a
576suspension. When the suspension is applied to a value the evaluation of the
577semantics is resumed.  Resumption monads are a standard tool for providing
578denotational semantics for input~\cite{Moggi199155} and interleaved
579concurrency~\cite[Chapter 12]{schmidt86}.
580The definition also incorporates errors, and uses a coercion to automatically
581transform values from the plain error monad.
582\todo{should perhaps give more details of the resumption monad?}
583
584The execution of assignments is straightforward,
585\begin{quote}
586\begin{lstlisting}
587nlet rec exec_step (ge:genv) (st:state) on st : (IO eventval io_out ($\Sigma$r:trace $\times$ state. step ge st (\fst r) (\snd r))) ≝
588match st with
589[ State f s k e m $\Rightarrow$
590  match s with
591  [ Sassign a1 a2 $\Rightarrow$ Some ? (
592    ! $\langle$l,tr1$\rangle$ $\leftarrow$ exec_lvalue ge e m a1;:
593    ! $\langle$v2,tr2$\rangle$ $\leftarrow$ exec_expr ge e m a2;:
594    ! m' $\leftarrow$ store_value_of_type' (typeof a1) m l v2;:
595    ret ? $\langle$tr1++tr2, State f Sskip k e m'$\rangle$)
596...
597\end{lstlisting}
598\end{quote}
599where \lstinline'!' is used in place of \lstinline'do' due to the change in
600monad.  The content is essentially the same as the inductive rule given on
601page~\pageref{page:Sassign}.
602
603The handling of external calls uses the
604\begin{quote}
605\begin{lstlisting}
606do_io : ident $\rightarrow$ list eventval $\rightarrow$ IO eventval io_out eventval
607\end{lstlisting}
608\end{quote}
609function to suspend execution:
610\begin{quote}
611\begin{lstlisting}
612...
613  ]
614| Callstate f0 vargs k m $\Rightarrow$
615  match f0 with
616  [ ...
617  | External f argtys retty $\Rightarrow$ Some ? (
618      ! evargs $\leftarrow$ check_eventval_list vargs (typlist_of_typelist argtys);:
619      ! evres $\leftarrow$ do_io f evargs;:
620      ! vres $\leftarrow$ check_eventval evres (proj_sig_res (signature_of_type argtys rett
621      ret ? $\langle$(Eextcall f evargs evres), Returnstate vres k m$\rangle$)
622...
623\end{lstlisting}
624\end{quote}
625\todo{say something more about the rest?}
626
627Together with functions to provide the initial state for a program and to
628detect a final state we can write a function to run the program up to a given
629number of steps.  Similarly, a corecursive function can return the entire
630execution as a stream of trace and state pairs.
631
632\section{Validation}
633\label{sec:valid}
634
635We have used two methods to validate our executable semantics: we have
636proven them equivalent to the inductive semantics of
637Section~\ref{sec:inductive}, and we have animated small examples of
638key areas.
639
640\subsection{Equivalence to inductive semantics}
641
642To show that the executable semantics are sound with respect to the
643inductive semantics we need to prove that any value produced by each
644function satisfies the corresponding relation, modulo errors and
645resumption.  To deal with these monads we lift the properties
646required.  In particular, for the resumption monad we ignore error
647values, require the property when a value is produced, and quantify
648over any interaction with the outside world:
649\begin{quote}
650\begin{lstlisting}
651nlet rec P_io O I (A:Type) (P:A $\rightarrow$ Prop) (v:IO O I A) on v : Prop :=
652match v return $\lambda$_.Prop with
653[ Wrong $\Rightarrow$ True
654| Value z $\Rightarrow$ P z
655| Interact out k $\Rightarrow$ $\forall$v'.P_io O I A P (k v')
656].
657\end{lstlisting}
658\end{quote}
659We can use this lifting with the relations from the inductive
660semantics to state soundness properties:
661\begin{quote}
662\begin{lstlisting}
663ntheorem exec_step_sound: $\forall$ge,st.
664  P_io ??? ($\lambda$r. step ge st (\fst r) (\snd r)) (exec_step ge st).
665\end{lstlisting}
666\end{quote}
667The proofs of these theorems use case analysis over the state, a few
668lemmas to break up the expressions in the monad and the other
669soundness results to form the corresponding derivation in the
670inductive semantics.
671
672We experimented with a different way of specifying soundness using
673dependent types:
674\begin{quote}
675\begin{lstlisting}
676nlet rec exec_step (ge:genv) (st:state) on st : (IO eventval io_out ($\Sigma$r:trace $\times$ state. step ge st (\fst r) (\snd r))) ≝
677\end{lstlisting}
678\end{quote}
679Note the $\Sigma$ type for the result of the function, which shows that
680successful executions are sound with respect to the inductive semantics.
681Matita automatically generates proof obligations for each case due to a
682coercion between the types
683\begin{center}
684\lstinline'option (res T)' \quad and \quad \lstinline'res ($\Sigma$x:T. P x)'
685\end{center}
686(where a branch marked \lstinline'None' would generate a proof obligation to
687show that it is impossible, although the semantics do not use this feature).
688This is intended to mimic Sozeau's \textsc{Russell} language and elaboration
689into Coq~\cite{sozeau06}.  The coercion also triggers an automatic mechanism
690in \matita{} to add equalities for each pattern matched.  Each case of the
691soundness proof consists of extracting an equality for each computation in the
692monad,
693% "in the monad" is a bit vague here
694making any further case distinctions necessary and applying the corresponding
695rule from the inductive semantics.
696
697However, the soundness proofs then pervade the executable semantics,
698making the correctness proofs more difficult.  We decided to keep the
699soundness results separate, partly because of the increased difficulty
700of using the resulting terms in proofs, and partly because they are of
701little consequence once equivalence has been shown.
702
703The completeness results requiring a dual lifting which requires the
704term to reduce to a particular value, allowing for resumptions with
705existential quantification:
706\begin{quote}
707\begin{lstlisting}
708nlet rec yieldsIObare (A:Type) (a:IO io_out io_in A) (v':A) on a : Prop :=
709match a with
710[ Value v $\Rightarrow$ v' = v
711| Interact _ k $\Rightarrow$ $\exists$r.yieldsIObare A (k r) v'
712| _ $\Rightarrow$ False
713].
714\end{lstlisting}
715\end{quote}
716\begin{quote}
717We then show the completeness theorems, such as
718\begin{lstlisting}
719ntheorem step_complete: $\forall$ge,s,tr,s'.
720  step ge s tr s' $\rightarrow$ yieldsIObare ? (exec_step ge s) $\langle$tr,s'$\rangle$.
721\end{lstlisting}
722\end{quote}
723by case analysis on the inductive derivation and a mixture of
724reduction and rewriting.
725\todo{Whole execution equivalence}
726
727\subsection{Animation of simple C programs}
728
729\begin{verbatim}
730MORE:
731  library stuff: choice of integers, maps
732  check for any axioms
733\end{verbatim}
734
735\bibliographystyle{plain}
736\bibliography{report}
737
738\end{document}
Note: See TracBrowser for help on using the repository browser.