source: Deliverables/D3.1/Report/report.tex

Last change on this file was 620, checked in by mulligan, 9 years ago

More changes to presentation. Modified some of the C examples to test for Ayache's reported bug.

File size: 38.8 KB
Line 
1\documentclass[11pt,epsf,a4wide]{article}
2\usepackage[utf8x]{inputenc}
3\usepackage{listings}
4\usepackage{../../style/cerco}
5\newcommand{\ocaml}{OCaml}
6\newcommand{\clight}{Clight}
7\newcommand{\matita}{Matita}
8\newcommand{\sdcc}{\texttt{sdcc}}
9
10\newcommand{\textSigma}{\ensuremath{\Sigma}}
11
12% LaTeX Companion, p 74
13\newcommand{\todo}[1]{\marginpar{\raggedright - #1}}
14
15\lstdefinelanguage{coq}
16  {keywords={Definition,Lemma,Theorem,Remark,Qed,Save,Inductive,Record},
17   morekeywords={[2]if,then,else},
18  }
19
20\lstdefinelanguage{matita}
21  {keywords={ndefinition,nlemma,ntheorem,nremark,ninductive,nrecord,nqed,nlet,rec,match,with,Type,and,on},
22   morekeywords={[2]nwhd,nnormalize,nelim,ncases,ndestruct},
23   mathescape=true,
24  }
25
26\lstset{language=matita,basicstyle=\small\tt,columns=flexible,breaklines=false,
27        keywordstyle=\color{red}\bfseries,
28        keywordstyle=[2]\color{blue},
29        commentstyle=\color{green},
30        stringstyle=\color{blue},
31        showspaces=false,showstringspaces=false}
32
33\lstset{extendedchars=false}
34\lstset{inputencoding=utf8x}
35\DeclareUnicodeCharacter{8797}{:=}
36\DeclareUnicodeCharacter{10746}{++}
37\DeclareUnicodeCharacter{9001}{\ensuremath{\langle}}
38\DeclareUnicodeCharacter{9002}{\ensuremath{\rangle}}
39
40
41\title{
42INFORMATION AND COMMUNICATION TECHNOLOGIES\\
43(ICT)\\
44PROGRAMME\\
45\vspace*{1cm}Project FP7-ICT-2009-C-243881 \cerco{}}
46
47\date{ }
48\author{}
49
50\begin{document}
51\thispagestyle{empty}
52
53\vspace*{-1cm}
54\begin{center}
55\includegraphics[width=0.6\textwidth]{../../style/cerco_logo.png}
56\end{center}
57
58\begin{minipage}{\textwidth}
59\maketitle
60\end{minipage}
61
62
63\vspace*{0.5cm}
64\begin{center}
65\begin{LARGE}
66\bf
67Report n. D3.1\\
68Executable Formal Semantics of C\\
69\end{LARGE} 
70\end{center}
71
72\vspace*{2cm}
73\begin{center}
74\begin{large}
75Version 1.0
76\end{large}
77\end{center}
78
79\vspace*{0.5cm}
80\begin{center}
81\begin{large}
82Main Authors:\\
83Brian~Campbell, Randy~Pollack
84\end{large}
85\end{center}
86
87\vspace*{\fill}
88\noindent
89Project Acronym: \cerco{}\\
90Project full title: Certified Complexity\\
91Proposal/Contract no.: FP7-ICT-2009-C-243881 \cerco{}\\
92
93\clearpage \pagestyle{myheadings} \markright{\cerco{}, FP7-ICT-2009-C-243881}
94
95\newpage
96
97\vspace*{7cm}
98\paragraph{Abstract}
99We present an execution semantics of the C programming language for
100use in the \cerco{} project.  It is based on the small-step inductive
101semantics used by the CompCert verified compiler.  We discuss the
102extensions required for our target architecture, porting the semantics
103to our choice of tool, \matita{}, providing an equivalent executable
104semantics and the validation of the semantics.
105
106\newpage 
107
108\tableofcontents
109
110\section{Introduction}
111
112We present an executable formal semantics of the C programming language which
113will serve as the specification of the input language for the \cerco{}
114verified compiler.  Our semantics is based on Leroy et.~al.'s C semantics for
115the CompCert project~\cite{1111042,compcert-mm08}, which divides the treatment
116of C into two pieces.  The first is an \ocaml{} stage which parses and
117elaborates C into an abstract syntax tree for the simpler \clight{} language,
118based on the CIL C parser.  The second part is a small step semantics for
119\clight{} formalised in the proof tool, which we have ported from Coq to the
120\matita{} theorem prover.
121This semantics is given in the form of inductive definitions, and so we have
122added a third part giving an equivalent functional presentation in \matita{}.
123
124The \cerco{} compiler needs to deal with the constrained memory model of
125the target microcontroller (in our case, the 8051).  Thus each part of the
126semantics has been extended to allow explicit handling of the
127microcontroller's memory spaces.  \emph{Cost labels} have also been added to
128the \clight{} semantics to support the labelling approach to cost annotations
129presented in a previous deliverable~\cite{d2.1}.
130
131The following section discusses the C language extensions for memory spaces.
132Then the port of the two stages of the CompCert \clight{} semantics is
133described in Section~\ref{sec:port}, followed by the new executable semantics
134in Section~\ref{sec:exec}.  Finally we discuss how the semantics has
135been validated
136in Section~\ref{sec:valid}.
137
138\section{Language extensions for the 8051 memory model}
139\label{sec:ext}
140
141The choice of an extended 8051 target for the \cerco{} compiler imposes an
142irregular memory model with tight resource constraints.  The different memory
143spaces and access modes are summarised in Figure~\ref{fig:memorymodel} ---
144essentially the evolution of the 8051 family has fragmented memory into
145four regions:  one half of the `internal' memory is fully accessible but also
146contains the register banks, the second half cannot be accessed by direct addressing
147because it is shadowed by the `Special Function Registers' (SFRs) for I/O;
148`external memory' provides the bulk of memory in a separate address space; and
149the code is in its own read-only space.
150\begin{figure}
151\setlength{\unitlength}{1pt}
152\begin{picture}(410,250)(-50,200)
153%\put(-50,200){\framebox(410,250){}}
154\put(12,410){\makebox(80,0)[b]{Internal (256B)}}
155\put(13,242){\line(0,1){165}}
156\put(93,242){\line(0,1){165}}
157\put(13,407){\line(1,0){80}}
158\put(12,400){\makebox(0,0)[r]{0h}}  \put(14,400){\makebox(0,0)[l]{Register bank 0}}
159\put(13,393){\line(1,0){80}}
160\put(12,386){\makebox(0,0)[r]{8h}}  \put(14,386){\makebox(0,0)[l]{Register bank 1}}
161\put(13,379){\line(1,0){80}}
162\put(12,372){\makebox(0,0)[r]{10h}}  \put(14,372){\makebox(0,0)[l]{Register bank 2}}
163\put(13,365){\line(1,0){80}}
164\put(12,358){\makebox(0,0)[r]{18h}} \put(14,358){\makebox(0,0)[l]{Register bank 3}}
165\put(13,351){\line(1,0){80}}
166\put(12,344){\makebox(0,0)[r]{20h}} \put(14,344){\makebox(0,0)[l]{Bit addressable}}
167\put(13,323){\line(1,0){80}}
168\put(12,316){\makebox(0,0)[r]{30h}}
169  \put(14,309){\makebox(0,0)[l]{\quad \vdots}}
170\put(13,291){\line(1,0){80}}
171\put(12,284){\makebox(0,0)[r]{80h}}
172  \put(14,263){\makebox(0,0)[l]{\quad \vdots}}
173\put(12,249){\makebox(0,0)[r]{ffh}}
174\put(13,242){\line(1,0){80}}
175
176\qbezier(-2,407)(-6,407)(-6,393)
177\qbezier(-6,393)(-6,324)(-10,324)
178\put(-12,324){\makebox(0,0)[r]{Indirect/Stack}}
179\qbezier(-6,256)(-6,324)(-10,324)
180\qbezier(-2,242)(-6,242)(-6,256)
181
182\qbezier(94,407)(98,407)(98,393)
183\qbezier(98,393)(98,349)(102,349)
184\put(104,349){\makebox(0,0)[l]{Direct}}
185\qbezier(98,305)(98,349)(102,349)
186\qbezier(94,291)(98,291)(98,305)
187
188\put(102,242){\framebox(20,49){SFR}}
189% bit access to sfrs?
190
191\qbezier(124,291)(128,291)(128,277)
192\qbezier(128,277)(128,266)(132,266)
193\put(134,266){\makebox(0,0)[l]{Direct}}
194\qbezier(128,257)(128,266)(132,266)
195\qbezier(124,242)(128,242)(128,256)
196
197\put(164,410){\makebox(80,0)[b]{External (64kB)}}
198\put(164,220){\line(0,1){187}}
199\put(164,407){\line(1,0){80}}
200\put(244,220){\line(0,1){187}}
201\put(164,242){\line(1,0){80}}
202\put(163,400){\makebox(0,0)[r]{0h}}
203\put(164,324){\makebox(80,0){Paged access}}
204  \put(164,310){\makebox(80,0){direct/indirect}}
205\put(163,235){\makebox(0,0)[r]{80h}}
206  \put(164,228){\makebox(80,0){\vdots}}
207  \put(164,210){\makebox(80,0){Direct/indirect}}
208
209\put(264,410){\makebox(80,0)[b]{Code (64kB)}}
210\put(264,220){\line(0,1){187}}
211\put(264,407){\line(1,0){80}}
212\put(344,220){\line(0,1){187}}
213\put(263,400){\makebox(0,0)[r]{0h}}
214  \put(264,228){\makebox(80,0){\vdots}}
215  \put(264,324){\makebox(80,0){Dsirect}}
216  \put(264,310){\makebox(80,0){PC relative}}
217\end{picture}
218\caption{The extended 8051 memory model}
219\label{fig:memorymodel}
220\end{figure}
221
222To make efficient use of the limited amount of memory, compilers for 8051
223microcontrollers provide extra keywords to allocate global variables to
224particular memory spaces, and to limit pointers to address a particular space.
225The freely available \sdcc{} compiler provides the following extensions for
226the 8051 memory spaces:
227\begin{table}[ht]
228\begin{tabular}{rcl}
229Attribute & Pointer size (bytes) & Memory space \\\hline
230\lstinline'__data' & 1 & Internal, first half (0h -- 7fh) \\
231\lstinline'__idata' & 1 & Internal, indirect only (80h -- ffh) \\
232\lstinline'__pdata' & 1 & External, page access (usually 0h -- 7fh) \\
233\lstinline'__xdata' & 2 & External, any (0h -- ffffh) \\
234\lstinline'__code' & 2 & Code, any (0h -- ffffh) \\
235\emph{none}& 3 & Any / Generic pointer
236\end{tabular}
237\end{table}\\
238The generic pointers are a tagged union of the other kinds of pointers.
239
240We intend the \cerco{} compiler to support extensions that are broadly
241compatible with \sdcc{} to enable the compilation of programs with
242either tool.  In particular, this would allow the comparison of the
243behaviour of test cases compiled with each compiler.  Thus the C
244syntax and semantics have been extended with the memory space
245attributes listed above.  The syntax follows \sdcc{} and in the
246semantics we track the memory space that each block was allocated from
247and only permit access via the appropriate kinds of pointers.  The
248details on these changes are given in the following sections.
249
250The \sdcc{} compiler also supports special variable types for accessing the
251SFRs, which provide the standard I/O mechanism for the 8051 family.  (Note
252that pointers to these types are not permitted because only direct addressing of
253the SFRs is allowed.)  We intend to use CompCert-style `external functions'
254instead of special types.  These are functions which are declared, but no C
255implementation of them is provided.  Instead they are provided by the runtime or
256compiled directly to the corresponding machine code.  This has the advantage
257that no changes from the CompCert semantics are required, and a compatibility
258library can be provided for \sdcc{} if necessary.  The 8051 and the \sdcc{}
259compiler also provide bit-level access to a small region of internal memory.
260We do not intend to expose this feature to C programs in the \cerco{}
261compiler, and so no extension is provided for it.
262
263Finally, we have the option of using CompCert's translation of
264\lstinline'volatile' variable accesses to `external' function calls.  Should we
265need more flexible I/O than SFRs provide, then we could adopt the \sdcc{}
266extension to allocate a variable at a particular address to provide a way to
267deal with memory mapped I/O in the external memory space.  The translation to
268function calls would mean that the semantics presented here would be
269unaffected.
270
271\section{Port of CompCert \clight{} semantics to \matita{}}
272\label{sec:port}
273
274\subsection{Parsing and elaboration}
275The first stage taken from the CompCert semantics is the parsing and
276elaboration of C programs into the simpler \clight{} language.  This is based
277upon the CIL library for parsing, analysing and transforming C programs by
278Necula et.~al.~\cite{cil02}.  The elaboration provides explicit type
279information throughout the program, including extra casts for
280promotion.  It also
281performs simplifications such as breaking up expressions with side effects
282into effect-free expressions along with statements to perform the effects.  The
283transformed \clight{} programs are much more manageable and lack the ambiguities
284of C, but also remain easily understood by C programmers.
285
286The parser has been extended with the 8051 memory spaces attributes given
287above.  The resulting abstract syntax tree records them on global variable
288declarations and pointer types.  However, we also need to deal with them
289during the elaboration process to produce all of the required type information.
290For example, when the address-of operator \lstinline'&' is used it must decide
291which kind of pointer should be used.  Thus the extended elaboration process
292keeps track of the memory space (if any) that the value of each
293expression resides in.  Where the memory space is not known, a generic pointer
294will be used instead.  Moreover, we also include the pointer kind when
295determining whether a cast must be inserted so that conversions between
296pointer representations can be performed.
297
298
299Thus the elaboration turns the following C code
300\begin{quote}
301\begin{lstlisting}[language=C]
302int g(int *x) { return 5; }
303
304int f(__data int *x, int *y) {
305  return x==y ? g(x) : *x;
306}
307
308__data int i = 1;
309
310int main(void) {
311  return f(&i, &i);
312}
313\end{lstlisting}
314\end{quote}
315into the Clight program below:
316\begin{quote}
317\begin{lstlisting}[language=C]
318int g(int *x) { return 5; }
319
320int f(__data int * x, int * y)
321{
322  int t;
323  if (x == (__data int * )y) {
324    t = g((int * )x);
325  } else {
326    t = *x;
327  }
328  return t;
329}
330
331int main(void)
332{
333  int t;
334  t = f(&i, (int * )(&i));
335  return t;
336}
337\end{lstlisting}
338\end{quote}
339The expression in \lstinline'f' had to be broken up due
340to the call to \lstinline'g', and casts have been added to change between generic pointers
341and pointers specific to the \lstinline'__data' section of memory.  The
342underlying data structure also has types attached to every expression, but
343these are impractical to show in source form.
344
345Note that the translation from C to \clight{} is not proven correct
346--- instead it effectively forms a semi-formal part of the whole C
347semantics.  We can have some confidence in the code, however, because
348it has received testing in the \cerco{} prototype, and it is very
349close to the version used in CompCert.  We can also perform testing of
350the semantics without involving the rest of the compiler because we
351have an executable semantics.  Moreover, the cautious programmer could
352choose to inspect the generated \clight{} code, or even work entirely
353in the \clight{} language.
354
355\subsection{Small-step inductive semantics}
356\label{sec:inductive}
357
358The semantics for \clight{} itself has been ported from the Coq development
359used in CompCert to \matita{} for use in \cerco{}.  Details about the original
360big-step formalisation of \clight{} can be found in Leroy and
361Blazy~\cite{compcert-mm08} (including a discussion of the translation from C
362in \S 4.1), although we started from a later version with a
363small-step semantics and hence support for \lstinline'goto' statements.
364Several parts of the semantics were shared with other parts of the CompCert
365development, notably:
366\begin{itemize}
367\item the representation of primitive values (integers, pointers and undefined
368values, but not structures or unions) and operations on them,
369\item traces of I/O events,
370\item a memory model that keeps conceptually distinct sections of memory
371strictly separate (assigning `undefined behaviour' to a buffer overflow, for
372instance),
373\item results about composing execution steps of arbitrary small-step
374semantics,
375\item data structures for local and global environments, and
376\item common error handling constructs, in particular an error monad.
377\end{itemize}
378We anticipate a similar arrangement for the \cerco{} verified
379compiler, although this means that there may be further changes to the
380common parts of the semantics later in the project to harmonise the
381stages of the compiler.  In particular, some of data structures for
382environments are just preliminary definitions for developing the
383semantics.
384
385The main body of the small-step semantics is a number of inductive definitions
386giving details of the defined behaviour for casts, expressions and statements.
387Expressions are side-effect free in \clight{} and only produce a value as
388output.  In our case we also need a trace of any cost labels that are
389`evaluated' so that we will be able to give fine-grained costs for
390the execution of compiled conditional expressions.
391
392As an example of one of the expression rules, consider an expression which
393evaluates a variable, \lstinline'Expr (Evar id) ty'.  A variable is an example
394of a class of expressions called \emph{lvalues}, which are roughly those
395expressions which can be assigned to.  Thus we use a general rule for lvalues,
396\label{page:evalvar}
397\begin{quote}
398\begin{lstlisting}
399ninductive eval_expr (ge:genv) (e:env) (m:mem) : expr $\rightarrow$ val $\rightarrow$ trace $\rightarrow$ Prop :=
400...
401  | eval_Elvalue: $\forall$a,ty,psp,loc,ofs,v,tr.
402      eval_lvalue ge e m (Expr a ty) psp loc ofs tr $\rightarrow$
403      load_value_of_type ty m psp loc ofs = Some ? v $\rightarrow$
404      eval_expr ge e m (Expr a ty) v tr
405\end{lstlisting}
406\end{quote}
407where the auxiliary relation \lstinline'eval_lvalue' yields the location of
408the value, \lstinline'psp,loc,ofs', consisting of memory space, memory block,
409and offset into the block, respectively.  The expression can thus evaluate to
410the value \lstinline'v' if \lstinline'v' can be loaded from that location.
411One corresponding part of the \lstinline'eval_lvalue' definition is
412\begin{quote}
413\begin{lstlisting}
414...
415with eval_lvalue (*(ge:genv) (e:env) (m:mem)*) :
416        expr $\rightarrow$ memory_space $\rightarrow$ block $\rightarrow$ int $\rightarrow$ trace $\rightarrow$ Prop ≝
417  | eval_Evar_local:   $\forall$id,l,ty.
418      get ??? id e = Some ? l $\rightarrow$
419      eval_lvalue ge e m (Expr (Evar id) ty) Any l zero E0
420...
421\end{lstlisting}
422\end{quote}
423simply looks up the variable in the local environment.  The offset is
424zero because all variables are given their own memory block to prevent
425the use of stray pointers.  A similar rule handles global variables,
426with an extra check to ensure that no local variable has the same
427name.  Note that the two relations are defined using mutual recursion
428because \lstinline'eval_lvalue' uses \lstinline'eval_expr' for the
429evaluation of the pointer expression in the dereferencing rule.
430
431Casts also have an auxiliary relation to specify the allowed changes, and
432operations on values (including the changes in representation performed by
433casting) are given as functions.
434
435The only new expression in our semantics is the cost label which wraps
436around another expression.  It does not change the result, but merely
437augments the trace with the given label to identify the branches taken
438in conditional expressions so that accurate cost information can be
439attached to the program:
440\begin{quote}
441\begin{lstlisting}
442  | eval_Ecost: $\forall$a,ty,v,l,tr.
443      eval_expr ge e m a v tr $\rightarrow$
444      eval_expr ge e m (Expr (Ecost l a) ty) v (tr++Echarge l)
445\end{lstlisting}
446\end{quote}
447
448As the expressions are side-effect free, all of the changes to the state are
449performed by statements.  The state itself is represented by records of the
450form
451\begin{quote}
452\begin{lstlisting}
453ninductive state: Type :=
454  | State:
455      $\forall$f: function.
456      $\forall$s: statement.
457      $\forall$k: cont.
458      $\forall$e: env.
459      $\forall$m: mem.  state
460  | Callstate:
461      $\forall$fd: fundef.
462      $\forall$args: list val.
463      $\forall$k: cont.
464      $\forall$m: mem. state
465  | Returnstate:
466      $\forall$res: val.
467      $\forall$k: cont.
468      $\forall$m: mem. state.
469\end{lstlisting}
470\end{quote}
471During normal execution the state contains the currently executing
472function's definition (used to find \lstinline'goto' labels and also
473to check whether the function is expected to return a value), the
474statement to be executed next, a continuation value to be executed
475afterwards (where successor statements and details of function calls
476and loops are stored), the local environment mapping variables to
477memory locations\footnote{In the semantics all variables are
478  allocated, although the compiler may subsequently allocate them to
479  registers where possible.} and the current memory state.  The
480function call and return states appear to store less information
481because the details of the caller are contained in the continuation.
482
483An example of the statement execution rules is the assignment rule
484(corresponding to the C syntax \lstinline'a1 = a2'),
485\label{page:Sassign}
486\begin{quote}
487\begin{lstlisting}
488ninductive step (ge:genv) : state $\rightarrow$ trace $\rightarrow$ state $\rightarrow$ Prop ≝
489  | step_assign:   $\forall$f,a1,a2,k,e,m,psp,loc,ofs,v2,m',tr1,tr2.
490      eval_lvalue ge e m a1 psp loc ofs tr1 $\rightarrow$
491      eval_expr ge e m a2 v2 tr2 $\rightarrow$
492      store_value_of_type (typeof a1) m psp loc ofs v2 = Some ? m' $\rightarrow$
493      step ge (State f (Sassign a1 a2) k e m)
494           (tr1++tr2) (State f Sskip k e m')
495...
496\end{lstlisting}
497\end{quote}
498which can be read as:
499\begin{itemize}
500\item if \lstinline'a1' can evaluate to the location \lstinline'psp,loc,ofs',
501\item \lstinline'a2' can evaluate to a value \lstinline'v2', and
502\item storing \lstinline'v2' at location \lstinline'psp,loc,ofs' succeeds,
503yielding the new memory state \lstinline-m'-, then
504\item the program can step from the state about to execute
505\lstinline'Sassign a1 a2' to a state with the updated memory \lstinline-m'-
506about to execute the no-op \lstinline'Sskip'.
507\end{itemize}
508This rule would be followed by one of the rules to which replaces the
509\lstinline'Sskip' statement with the `real' next statement constructed
510from the continuation \lstinline'k'.  Note that the only true
511side-effect here is the change in memory --- the local environment is
512initialised once and for all on function entry, and the only events
513appearing in the trace are cost labels used purely for accounting.  At
514present this imposes an ordering due to the cost labels.  Should this
515prove too restrictive we may change it to produce a set of labels
516encountered.
517
518The \clight{} language provides input and output effects through `external'
519functions and the step rule
520\begin{quote}
521\begin{lstlisting}
522  | step_external_function: $\forall$id,targs,tres,vargs,k,m,vres,t.
523      event_match (external_function id targs tres) vargs t vres $\rightarrow$
524      step ge (Callstate (External id targs tres) vargs k m)
525            t (Returnstate vres k m)
526\end{lstlisting}
527\end{quote}
528which allows the function to be invoked with and return any values subject to
529the enforcement of the typing rules in \lstinline'event_match', which also
530provides the trace.
531
532Cost label statements prefix the trace with the given label, similar to the cost
533label expressions above.
534
535\section{Executable semantics}
536\label{sec:exec}
537
538We have added an equivalent functional version of the \clight{} semantics that
539can be used to animate programs.  The definitions roughly follow the inductive
540semantics, but are necessarily rearranged around pattern matching of the
541relevant parts of the state rather than presenting each case separately.
542
543\subsection{Expressions}
544
545The code corresponding to the variable lookup definitions on
546page~\pageref{page:evalvar} is
547\begin{quote}
548\begin{lstlisting}
549nlet rec exec_expr (ge:genv) (en:env) (m:mem) (e:expr) on e : res (val$\times$trace) ≝
550match e with
551[ Expr e' ty $\Rightarrow$
552  match e' with
553  [ ...
554  | Evar _ $\Rightarrow$
555      do $\langle$l,tr$\rangle$ $\leftarrow$ exec_lvalue' ge en m e' ty;
556      do v $\leftarrow$ opt_to_res ? (load_value_of_type' ty m l);
557      OK ? $\langle$v,tr$\rangle$
558...
559  ]
560]
561and exec_lvalue' (ge:genv) (en:env) (m:mem) (e':expr_descr) (ty:type) on e'
562 : res (memory_space $\times$ block $\times$ int $\times$ trace) ≝
563  match e' with
564  [ Evar id $\Rightarrow$ 
565      match (get … id en) with
566      [ None $\Rightarrow$ do $\langle$sp,l$\rangle$ $\leftarrow$ opt_to_res ? (find_symbol ? ? ge id);
567                 OK ? $\langle$$\langle$$\langle$sp,l$\rangle$,zero$\rangle$,E0$\rangle$ (* global *)
568      | Some loc $\Rightarrow$ OK ? $\langle$$\langle$$\langle$Any,loc$\rangle$,zero$\rangle$,E0$\rangle$ (* local *)
569      ]
570...
571\end{lstlisting}
572\end{quote}
573where the result is placed in an error monad (the \lstinline'res' type
574constructor) so that \emph{undefined behaviour} such as dereferencing an
575invalid pointer can be rejected.  We use \lstinline'do' notation similar to
576Haskell and CompCert, where
577\begin{quote}
578\begin{lstlisting}
579do x $\leftarrow$ e; e'
580\end{lstlisting}
581\end{quote}
582means evaluate \lstinline'e' and if it yields a value then bind that to
583\lstinline'x' and evaluate \lstinline-e'-, and otherwise propogate the error.
584
585\subsection{Statements}
586
587Evaluating a step of a statement is complicated by the presence of the
588`external' functions for I/O, which can return arbitrary values.  These are
589handled by a resumption monad, which on encountering some I/O returns a
590suspension. When the suspension is applied to a value the evaluation of the
591semantics is resumed.  Resumption monads are a standard tool for providing
592denotational semantics for input~\cite{Moggi199155} and interleaved
593concurrency~\cite[Chapter 12]{schmidt86}.
594The definition also incorporates errors, and uses a coercion to automatically
595transform values from the plain error monad.
596
597The definition of the monad is:
598\begin{quote}
599\begin{lstlisting}
600ninductive IO (output:Type) (input:output $\rightarrow$ Type) (T:Type) : Type :=
601| Interact : $\forall$o:output. (input o $\rightarrow$ IO output input T) $\rightarrow$ IO output input T
602| Value : T $\rightarrow$ IO output input T
603| Wrong : IO output input T.
604
605nlet rec bindIO (O:Type) (I:O $\rightarrow$ Type) (T,T':Type)
606                  (v:IO O I T) (f:T $\rightarrow$ IO O I T') on v : IO O I T' :=
607match v with
608[ Interact out k $\Rightarrow$ (Interact ??? out ($\lambda$res. bindIO O I T T' (k res) f))
609| Value v' $\Rightarrow$ (f v')
610| Wrong $\Rightarrow$ Wrong O I T'
611].
612\end{lstlisting}
613\end{quote}
614Note that the type of the input value is dependent on the output value.
615This enables us to ensure that the input is always well-typed.  An
616alternative approach is a check in the semantics, but this causes
617programs to fail in a way that has no counterpart in the inductive
618semantics.
619
620The execution of assignments is straightforward,
621\begin{quote}
622\begin{lstlisting}
623nlet rec exec_step (ge:genv) (st:state) on st : (IO io_out io_in (trace $\times$ state)) ≝
624match st with
625[ State f s k e m $\Rightarrow$
626  match s with
627  [ Sassign a1 a2 $\Rightarrow$
628    ! $\langle$l,tr1$\rangle$ $\leftarrow$ exec_lvalue ge e m a1;
629    ! $\langle$v2,tr2$\rangle$ $\leftarrow$ exec_expr ge e m a2;
630    ! m' $\leftarrow$ store_value_of_type' (typeof a1) m l v2;
631    ret ? $\langle$tr1++tr2, State f Sskip k e m'$\rangle$
632...
633\end{lstlisting}
634\end{quote}
635where \lstinline'!' is used in place of \lstinline'do' due to the change in
636monad.  The content is essentially the same as the inductive rule given on
637page~\pageref{page:Sassign}.
638
639Most other rules are similar translations of the inductive semantics.
640The handling of external calls uses the
641\begin{quote}
642\begin{lstlisting}
643do_io : ident $\rightarrow$ list eventval $\rightarrow$ IO eventval io_out eventval
644\end{lstlisting}
645\end{quote}
646function to suspend execution:
647\begin{quote}
648\begin{lstlisting}
649...
650| Callstate f0 vargs k m $\Rightarrow$
651  match f0 with
652  [ ...
653  | External f argtys retty $\Rightarrow$ 
654      ! evargs $\leftarrow$ err_to_io_sig … (check_eventval_list vargs (typlist_of_typelist argtys));
655      ! evres $\leftarrow$ do_io f evargs (proj_sig_res (signature_of_type argtys retty));
656      ret ? $\langle$Eextcall f evargs (mk_eventval ? evres), Returnstate (mk_val ? evres) k m$\rangle$
657  ]
658...
659\end{lstlisting}
660\end{quote}
661The rest of the code after \lstinline'do_io' is included in the
662suspension returned.
663
664Together with functions to provide the initial state for a program and to
665detect a final state we can write a function to run the program up to a given
666number of steps.  Similarly, a corecursive function can return the entire
667execution as a stream of trace and state pairs.
668
669\section{Validation}
670\label{sec:valid}
671
672We have used two methods to validate our executable semantics: we have
673proven them equivalent to the inductive semantics of
674Section~\ref{sec:inductive}, and we have animated small examples of
675key areas.
676
677\subsection{Equivalence to inductive semantics}
678
679To show that the executable semantics are sound with respect to the
680inductive semantics we need to prove that any value produced by each
681function satisfies the corresponding relation, modulo errors and
682resumption.  To deal with these monads we lift the properties
683required.  In particular, for the resumption monad we ignore error
684values, require the property when a value is produced, and quantify
685over any interaction with the outside world:
686\begin{quote}
687\begin{lstlisting}
688nlet rec P_io O I (A:Type) (P:A $\rightarrow$ Prop) (v:IO O I A) on v : Prop :=
689match v return $\lambda$_.Prop with
690[ Wrong $\Rightarrow$ True
691| Value z $\Rightarrow$ P z
692| Interact out k $\Rightarrow$ $\forall$v'.P_io O I A P (k v')
693].
694\end{lstlisting}
695\end{quote}
696We can use this lifting with the relations from the inductive
697semantics to state soundness properties:
698\begin{quote}
699\begin{lstlisting}
700ntheorem exec_step_sound: $\forall$ge,st.
701  P_io ??? ($\lambda$r. step ge st (\fst r) (\snd r)) (exec_step ge st).
702\end{lstlisting}
703\end{quote}
704The proofs of these theorems use case analysis over the state, a few
705lemmas to break up the expressions in the monad and the other
706soundness results to form the corresponding derivation in the
707inductive semantics.
708
709We experimented with a different way of specifying soundness using
710dependent types:
711\begin{quote}
712\begin{lstlisting}
713nlet rec exec_step (ge:genv) (st:state) on st
714 : (IO eventval io_out ($\Sigma$r:trace $\times$ state. step ge st (\fst r) (\snd r))) ≝
715\end{lstlisting}
716\end{quote}
717Note the $\Sigma$ type for the result of the function, which shows that
718successful executions are sound with respect to the inductive semantics.
719Matita automatically generates proof obligations for each case due to a
720coercion between the types
721\begin{center}
722\lstinline'option (res T)' \quad and \quad \lstinline'res ($\Sigma$x:T. P x)'
723\end{center}
724(where a branch marked \lstinline'None' would generate a proof obligation to
725show that it is impossible, although the semantics do not use this feature).
726This is intended to mimic Sozeau's \textsc{Russell} language and elaboration
727into Coq~\cite{sozeau06}.  The coercion also triggers an automatic mechanism
728in \matita{} to add equalities for each pattern matched.  The proofs
729are essentially the same as before.
730
731However, the soundness proofs then pervade the executable semantics,
732making rewriting in the correctness proofs more difficult.  We decided
733to keep the soundness results separate, partly because of the
734increased difficulty of using the resulting terms in proofs, and
735partly because they are of little consequence once equivalence has
736been shown.
737
738The completeness results requiring a dual lifting which requires the
739term to reduce to a particular value, allowing for resumptions with
740existential quantification:
741\begin{quote}
742\begin{lstlisting}
743nlet rec yieldsIO (A:Type) (a:IO io_out io_in A) (v':A) on a : Prop :=
744match a with
745[ Value v $\Rightarrow$ v' = v
746| Interact _ k $\Rightarrow$ $\exists$r.yieldsIO A (k r) v'
747| _ $\Rightarrow$ False
748].
749\end{lstlisting}
750\end{quote}
751We then show the completeness theorems, such as
752\begin{quote}
753\begin{lstlisting}
754ntheorem step_complete: $\forall$ge,s,tr,s'.
755  step ge s tr s' $\rightarrow$ yieldsIO ? (exec_step ge s) $\langle$tr,s'$\rangle$.
756\end{lstlisting}
757\end{quote}
758by case analysis on the inductive derivation and a mixture of
759reduction and rewriting.  Thus we know that executing a step in these
760semantics is equivalent to a step in the inductive semantics.
761
762Showing the equivalence of whole program execution is a little
763trickier.  Our executable semantics produces a coinductive
764\lstinline'execution' which is really a tree of executions, branching
765at each I/O resumption on the input value:
766\[
767\begin{array}{rcl@{\;}l}
768&  & \mathsf{e\_step}\ t_i\ s_i \rightarrow &\cdots\ \mathsf{e\_stop}\ t_j\ i\ m \\
769& \nearrow & \quad\quad \vdots \\
770\mathsf{e\_step}\ \mathsf{E0}\ s_0 \rightarrow \mathsf{e\_step}\ t_1\ s_1 \rightarrow \cdots \mathsf{e\_interact}\ o_1\ k_1 & \rightarrow & \mathsf{e\_step}\ t_i'\ s_i' \rightarrow &\cdots\ \mathsf{e\_step}\ t_j\ s_j \rightarrow \cdots \\
771& \searrow & \quad\quad\vdots \\
772&& \mathsf{e\_step}\ t_i''\ s_i'' \rightarrow &\cdots\ \mathsf{e\_wrong}
773\end{array}
774\]
775Each \textsf{e\_step} comes with the trace (often the empty trace,
776\textsf{E0}) and current state.  Each branch corresponds to calling
777the continuation $k_1$ with a different input value.  We use the
778\lstinline'single_exec_of' predicate to identify single executions from
779these trees, essentially fixing a stream of input values.
780
781However, the inductive semantics divides program behaviours into four
782categories which have \emph{individual} (co)inductive descriptions:
783\begin{itemize}
784\item successfully terminating executions;
785\item programs which eventually diverge (with an empty trace);
786\item programs which keep interacting in some way (with an infinite
787  trace)\footnote{In our setting this includes passing through cost
788    labels as well as I/O.}; and
789\item programs which \emph{go wrong}.
790\end{itemize}
791
792We cannot constructively decide which of these categories an execution
793can fit into because the properties they describe are undecidable.
794Hence we follow CompCert's approach for showing that one of the
795behaviours always exists using classical logic.  Thus we characterise
796the executions, then show the existence of the inductive semantics'
797behaviour that matches.  We limit the scope of classical reasoning by
798taking the relevant axioms as hypotheses:
799\begin{quote}
800\begin{lstlisting}
801ntheorem exec_inf_equivalence:
802  $\forall$classic:($\forall$P:Prop.P $\vee$ $\neg$P).
803  $\forall$constructive_indefinite_description:($\forall$A:Type. $\forall$P:A$\rightarrow$Prop. ($\exists$x. P x) $\rightarrow$ $\Sigma$x:A. P x).
804  $\forall$p,e. single_exec_of (exec_inf p) e $\rightarrow$
805   $\exists$b.execution_matches_behavior e b $\wedge$ exec_program p b.
806\end{lstlisting}
807\end{quote}
808
809\subsection{Animation of simple C programs}
810
811We are currently working with a development version of \matita{} which
812(temporarily) does not support extraction to OCaml code.  Hence to
813animate a program we first parse it with CIL and produce a \matita{}
814term in text format representing the program, then interpret it within
815\matita{}.
816
817This process is rather laborious, so we have concentrated on testing
818small programs which exercised areas of the semantics which depart
819from CompCert.  In particular, we tested several aspects of the
820handling of memory spaces and the casting of pointers.  Together with
821the results in the previous sections we gain considerable confidence
822that the semantics describe the behaviour of programs properly.
823Nevertheless, we intend to experiment with larger C programs once
824extraction is available.
825
826To give a concrete example, the following C program reads an integer
827using an `external' function and returns its factorial as the
828program's exit value:
829\begin{lstlisting}[language=C]
830int get_input(void);
831
832int main(void) {
833  int i = get_input();
834  int r = 1;
835  int j;
836
837  for (j = 2; j<=i; j++)
838    r = r * j;
839
840  return r;
841}
842\end{lstlisting}
843The Clight code is essentially the same, and the \matita{} term is:
844\begin{lstlisting}
845ndefinition myprog := mk_program fundef type
846 [mk_pair ?? (succ_pos_of_nat 132 (* get_input *))
847             (External (succ_pos_of_nat 132) Tnil (Tint I32 Signed));
848  mk_pair ?? (succ_pos_of_nat 133 (* main *)) (Internal (
849    mk_function (Tint I32 Signed  ) [] [mk_pair ?? (succ_pos_of_nat 134) (Tint I32 Signed);
850                                        mk_pair ?? (succ_pos_of_nat 135) (Tint I32 Signed);
851                                        mk_pair ?? (succ_pos_of_nat 136) (Tint I32 Signed)]
852      (Ssequence
853      (Scall (Some ? (Expr (Evar (succ_pos_of_nat 134)) (Tint I32 Signed)))
854        (Expr (Evar (succ_pos_of_nat 132))
855          (Tfunction Tnil (Tint I32 Signed)))
856        [])
857      (Ssequence
858      (Sassign (Expr (Evar (succ_pos_of_nat 135)) (Tint I32 Signed))
859        (Expr (Econst_int (repr 1)) (Tint I32 Signed)))
860      (Ssequence
861      (Sfor (Sassign (Expr (Evar (succ_pos_of_nat 136)) (Tint I32 Signed))
862              (Expr (Econst_int (repr 2)) (Tint I32 Signed)))
863        (Expr (Ebinop Ole
864          (Expr (Evar (succ_pos_of_nat 136)) (Tint I32 Signed))
865          (Expr (Evar (succ_pos_of_nat 134)) (Tint I32 Signed)))
866          (Tint I32 Signed  ))
867        (Sassign (Expr (Evar (succ_pos_of_nat 136)) (Tint I32 Signed))
868          (Expr (Ebinop Oadd
869            (Expr (Evar (succ_pos_of_nat 136)) (Tint I32 Signed))
870            (Expr (Econst_int (repr 1)) (Tint I32 Signed)))
871            (Tint I32 Signed)))
872        (Sassign (Expr (Evar (succ_pos_of_nat 135)) (Tint I32 Signed))
873          (Expr (Ebinop Omul
874            (Expr (Evar (succ_pos_of_nat 135)) (Tint I32 Signed))
875            (Expr (Evar (succ_pos_of_nat 136)) (Tint I32 Signed)))
876            (Tint I32 Signed)))
877      )
878      (Sreturn (Some ? (Expr (Evar (succ_pos_of_nat 135))
879                         (Tint I32 Signed)))))))
880  ))]
881  (succ_pos_of_nat 133)
882  [].
883\end{lstlisting}
884We can use the definitions in the \texttt{Animation.ma} file to reduce
885the term for a given input (5, in this case; executing a maximum of 40 steps):
886\begin{lstlisting}
887nremark exec: result ? (exec_up_to myprog 40 [EVint (repr 5)]).
888nnormalize;  (* you can examine the result here *)
889@; nqed.
890\end{lstlisting}
891The result state records the interaction (the \lstinline'EVextcall')
892and the return result (120 in least-significant-bit first binary):
893\begin{lstlisting}
894result (res (list event$\times$state))
895  (OK (list event$\times$state)
896    $\langle$[(EVextcall (p1 (p0 (p1 (p0 (p0 (p0 (p0 one))))))) []
897          (EVint (mk_int (pos (p1 (p0 one))) (inrg_mod (pos (p1 (p0 one)))))))],
898     Returnstate (Vint (mk_int (pos (p0 (p0 (p0 (p1 (p1 (p1 one)))))))
899                         (inrg_mod (pos (p0 (p0 (p0 (p1 (p1 (p1 one))))))))))
900       Kstop MEM $\rangle$)
901\end{lstlisting}
902
903\appendix
904
905\section{Description of the Code}
906
907The files ported from CompCert were based on version 1.6, with some
908minor details taken from 1.7.1 (in particular, the parser and the
909method of building infinite traces for the equivalence proof).
910
911The majority of the semantics is given as \matita{} source files.  The
912exception is the changes to the CIL based parser, which is presented
913as a patch to a preliminary version of the \cerco{} prototype
914compiler.  This patch provides both the extensions to the parser and a
915pretty printer to produce a usable \matita{} term representing the
916program.
917
918\begin{quote}
919\begin{tabular}{p{9em}l}
920acc-0.1.spaces.patch & Changes to early prototype compiler for parsing
921\end{tabular}
922\end{quote}
923
924\subsubsection*{Ancilliary definitions}
925
926Files corresponding to CompCert.
927
928\begin{quote}
929\begin{tabular}{p{9em}l}
930Coqlib.ma & Minor definitions ported from CompCert \\
931Errors.ma & The error monad \\
932Floats.ma & Axiomatised floating point numbers \\
933Globalenvs.ma & Global environments \\
934Integers.ma & Integers modulo powers of two \\
935Maps.ma & Finite maps (used in particular for local environments) \\
936Smallstep.ma & Generic definitions and lemmas for small step semantics \\
937\end{tabular}
938\end{quote}
939
940\noindent
941Files specific to this development.
942
943\begin{quote}
944\begin{tabular}{p{9em}l}
945binary/positive.ma & Binary positive numbers \\
946binary/Z.ma & Binary integers \\
947extralib.ma & Extensions to \matita{}'s library \\
948\end{tabular}
949\end{quote}
950
951\subsubsection*{Inductive semantics ported from CompCert}
952
953\begin{quote}
954\begin{tabular}{p{9em}l}
955AST.ma &   Minor syntax definitions intended for several compiler stages \\
956Values.ma &   Definitions for values manipulated by Clight programs \\
957Mem.ma &   Definition of the memory model \\
958Events.ma &   I/O events \\
959CostLabel.ma &   Definition of cost labels \\
960Csyntax.ma &   Clight syntax trees \\
961Csem.ma &   Clight inductive semantics \\
962\end{tabular}
963\end{quote}
964
965\subsubsection*{Executable semantics}
966
967\begin{quote}
968\begin{tabular}{p{9em}l}
969IOMonad.ma &  Definitions of I/O resumption monad \\
970Cexec.ma &  Definition of the executable semantics \\
971CexecSound.ma &  Soundness of individual steps \\
972CexecComplete.ma &  Completeness of individual steps \\
973CexecEquiv.ma &  Equivalence of whole program executions \\
974Animation.ma &  Definitions to help test the semantics
975\end{tabular}
976\end{quote}
977
978\bibliographystyle{plain}
979\bibliography{report}
980
981\end{document}
Note: See TracBrowser for help on using the repository browser.