source: Deliverables/D3.1/Report/report.tex @ 197

Last change on this file since 197 was 197, checked in by campbell, 10 years ago

Add some first draft text for 3.1.

File size: 18.7 KB
Line 
1\documentclass[11pt,epsf,a4wide]{article}
2\usepackage[utf8x]{inputenc}
3\usepackage{listings}
4\usepackage{../style/cerco}
5\newcommand{\ocaml}{OCaml}
6\newcommand{\clight}{Clight}
7\newcommand{\matita}{matita}
8
9\newcommand{\textSigma}{\ensuremath{\Sigma}}
10
11% LaTeX Companion, p 74
12\newcommand{\todo}[1]{\marginpar{\raggedright - #1}}
13
14\lstdefinelanguage{coq}
15  {keywords={Definition,Lemma,Theorem,Remark,Qed,Save,Inductive,Record},
16   morekeywords={[2]if,then,else},
17  }
18
19\lstdefinelanguage{matita}
20  {keywords={ndefinition,nlemma,ntheorem,nremark,ninductive,nrecord,nqed,nlet,rec,match,with,Type},
21   morekeywords={[2]nwhd,nnormalize,nelim,ncases,ndestruct},
22   mathescape=true,
23  }
24
25\lstset{language=matita,basicstyle=\small\tt,columns=flexible,breaklines=false,
26        keywordstyle=\color{red}\bfseries,
27        keywordstyle=[2]\color{blue},
28        commentstyle=\color{green},
29        stringstyle=\color{blue},
30        showspaces=false,showstringspaces=false}
31
32\lstset{extendedchars=false}
33\lstset{inputencoding=utf8x}
34\DeclareUnicodeCharacter{8797}{:=}
35\DeclareUnicodeCharacter{10746}{++}
36\DeclareUnicodeCharacter{9001}{\ensuremath{\langle}}
37\DeclareUnicodeCharacter{9002}{\ensuremath{\rangle}}
38
39
40\title{
41INFORMATION AND COMMUNICATION TECHNOLOGIES\\
42(ICT)\\
43PROGRAMME\\
44\vspace*{1cm}Project FP7-ICT-2009-C-243881 \cerco{}}
45
46\date{ }
47\author{}
48
49\begin{document}
50\thispagestyle{empty}
51
52\vspace*{-1cm}
53\begin{center}
54\includegraphics[width=0.6\textwidth]{../style/cerco_logo.png}
55\end{center}
56
57\begin{minipage}{\textwidth}
58\maketitle
59\end{minipage}
60
61
62\vspace*{0.5cm}
63\begin{center}
64\begin{LARGE}
65\bf
66Report n. D3.1\\
67Executable Formal Semantics of C\\
68\end{LARGE} 
69\end{center}
70
71\vspace*{2cm}
72\begin{center}
73\begin{large}
74Version 1.0
75\end{large}
76\end{center}
77
78\vspace*{0.5cm}
79\begin{center}
80\begin{large}
81Main Authors:\\
82 go here
83\end{large}
84\end{center}
85
86\vspace*{\fill}
87\noindent
88Project Acronym: \cerco{}\\
89Project full title: Certified Complexity\\
90Proposal/Contract no.: FP7-ICT-2009-C-243881 \cerco{}\\
91
92\clearpage \pagestyle{myheadings} \markright{\cerco{}, FP7-ICT-2009-C-243881}
93
94\tableofcontents
95
96\section{Introduction}
97
98We present an executable formal semantics of the C programming language which
99will serve as the specification of the input language for the \cerco{}
100verified compiler.  Our semantics is based on Leroy et.~al.'s C semantics for
101the CompCert project~\cite{1111042,compcert-mm08}, which divides the treatment
102of C into two pieces.  The first is an \ocaml{} stage which parses and
103elaborates C into an abstract syntax tree for the simpler \clight{} language,
104based on the CIL C parser.  The second part is a small step semantics for
105\clight{} formalised by in the proof tool, which we have ported to \matita{}.
106This semantics is given in the form of inductive definitions, and so we have
107added a third part giving an equivalent functional presentation in \matita{}.
108
109The \cerco{} compiler also needs to deal with the constrained memory model of
110the target microcontroller (in our case, the 8051).  Thus each part of the
111semantics has been extended to allow explicit handling of the
112microcontroller's memory spaces.
113
114The port of the two stages of the CompCert \clight{} semantics is described in
115Section~\ref{sec:port}.  The new executable semantics is then discussed in
116Section~\ref{sec:exec}, followed by the extensions for memory spaces in
117Section~\ref{sec:memory}.  Finally we discuss how the semantics can be tested
118in Section~\ref{sec:valid}.
119
120\section{Port of CompCert \clight{} semantics to \matita{}}
121\label{sec:port}
122
123\begin{verbatim}
124Small-step inductive semantics.
125Use CIL-based parser from CompCert 1.7.1 (same as unverified prototype).
126COST LABELS
127\end{verbatim}
128
129The first stage from the CompCert semantics is the parsing and elaboration of
130C programs into the simpler \clight{} language.  This is based upon the CIL
131library for parsing, analysing and transforming C programs by Necula
132et.~al.~\cite{cil02}.  The elaboration performed provides explicit type
133information throughout the program, including extra casts for promotion, and
134performs simplifications such as breaking up expressions with side effects
135into statements which perform the effects and effect-free expressions.  The
136resulting \clight{} programs are much more manageable and lack the ambiguities
137of C, but remain easily understood by C programmers.
138
139Note that the translation from C to \clight{} is not proven correct ---
140instead it effectively forms a semi-formal part of the whole C semantics.  We
141can have some confidence in the code, however, because it has received testing
142in the \cerco{} prototype, and it is very close to the version used in
143CompCert.  As we also have an executable semantics we can perform testing of
144it without involving the rest of the compiler.  Moreover, the careful
145programmer could choose to inspect the \clight{} code, or even work entirely
146in the \clight{} language.
147\todo{cite compcert discussion on CIL}
148\todo{some details on what CIL does and example before/after}
149
150The semantics for \clight{} itself has been ported from the Coq development
151used in CompCert to \matita{} for use in CerCo.  Details about the original
152big-step formalisation of \clight{} can be found in Leroy and
153Blazy~\cite{compcert-mm08}, although we started from a later version with a
154small-step semantics and hence support for \lstinline'goto' statements.
155Several parts of the semantics were shared with other parts of the CompCert
156development, notably:
157\begin{itemize}
158\item representation of primitive values (integers, pointers and undefined
159values, but not structures or unions) and operations on them,
160\item traces of I/O events,
161\item a memory model that keeps conceptually distinct sections of memory
162separate (assigning `undefined behaviour' to a buffer overflow, for instance),
163\item resulting about composing execution steps of arbitrary small-step
164semantics,
165\item data structures for local and global environments, and
166\item common error handling constructs, in particular an error monad.
167\end{itemize}
168We anticipate a similar arrangement for the \cerco{} verified compiler,
169although this means that there may be further changes to these parts of the
170semantics later in the project to harmonise the stages of the compiler.
171
172The main body of the small-step semantics is a number of inductive definitions
173giving details of the defined behaviour for casts, expressions and statements.
174Expressions are side-effect free in \clight{} and only produce a value as
175output.  In our case we also need a trace of any cost labels that are
176`evaluated' so that we will be able to give fine-grained costs for
177the execution of compiled conditional expressions.
178
179As an example of one of the expression rules, consider an expression which
180evaluates a variable, \lstinline'Expr (Evar id) ty'.  A variable is an example
181of a class of expressions called \emph{lvalues}, which are roughly those
182expressions which can be assigned to.  Thus we use a general rule for lvalues,
183\label{page:evalvar}
184\begin{quote}
185\begin{lstlisting}
186ninductive eval_expr (ge:genv) (e:env) (m:mem) : expr → val → trace → Prop ≝
187...
188  | eval_Elvalue: ∀a,ty,psp,loc,ofs,v,tr.
189      eval_lvalue ge e m (Expr a ty) psp loc ofs tr →
190      load_value_of_type ty m psp loc ofs = Some ? v →
191      eval_expr ge e m (Expr a ty) v tr
192\end{lstlisting}
193\end{quote}
194where the auxiliary relation \lstinline'eval_lvalue' yields the location of
195the value, \lstinline'psp,loc,ofs', consisting of memory space, memory block,
196and offset into the block, respectively.  The expression can thus evaluate to
197the value \lstinline'v' if \lstinline'v' can be loaded from that location.
198One corresponding part of the \lstinline'eval_lvalue' definition,
199\begin{quote}
200\begin{lstlisting}
201...
202with eval_lvalue (*(ge:genv) (e:env) (m:mem)*) : expr → memory_space → block → int → trace → Prop ≝
203  | eval_Evar_local:   ∀id,l,ty.
204      get ??? id e = Some ? l →
205      eval_lvalue ge e m (Expr (Evar id) ty) Any l zero E0
206...
207\end{lstlisting}
208\end{quote}
209simply looks up the variable in the local environment.  A similar rule handles
210global variables, with an extra check to ensure that no local variable has the
211same name.  Note that the two relations are defined using mutual recursion
212because \lstinline'eval_lvalue' uses \lstinline'eval_expr' for the evaluation
213of the pointer expression in the dereferencing rule.
214
215Casts also have an auxiliary relation to specify the allowed changes, and
216operations on values (including the changes in representation performed by
217casting) are given as functions.
218
219The only new expression in our semantics is the cost label which wraps around
220another expression.  It does not change the result, but merely adds the given
221label to the trace so as to identify the branches taken in conditional
222expressions so that accurate cost information can be attached to the program.
223\todo{make the semantics of the cost labels clearer}
224
225As the expressions are side-effect free, all of the changes to the state are
226performed by statements.  The state itself is represented by records of the
227form
228\begin{quote}
229\begin{lstlisting}
230ninductive state: Type :=
231  | State:
232      ∀f: function.
233      ∀s: statement.
234      ∀k: cont.
235      ∀e: env.
236      ∀m: mem.  state
237  | Callstate:
238      ∀fd: fundef.
239      ∀args: list val.
240      ∀k: cont.
241      ∀m: mem. state
242  | Returnstate:
243      ∀res: val.
244      ∀k: cont.
245      ∀m: mem. state.
246\end{lstlisting}
247\end{quote}
248During normal execution the state contains the currently executing function's
249definition (used to find \lstinline'goto' labels and also to check whether the
250function is expected to return a value), the statement to be executed next, a
251continuation value to be executed afterwards (where successor statements and
252details of function calls and loops are stored), the local environment mapping
253variables to memory locations and the current memory state.
254\todo{need to note that all variables 'appear' to be memory allocated, even
255       if they're subsequently optimised away.}
256The function call and return states appear to store less information because
257the details of the caller are contained in the continuation.
258
259An example of the statement execution rules is the assignment rule
260(corresponding to the C syntax \lstinline'a1 = a2'),
261\label{page:Sassign}
262\begin{quote}
263\begin{lstlisting}
264ninductive step (ge:genv) : state → trace → state → Prop ≝
265  | step_assign:   ∀f,a1,a2,k,e,m,psp,loc,ofs,v2,m',tr1,tr2.
266      eval_lvalue ge e m a1 psp loc ofs tr1 →
267      eval_expr ge e m a2 v2 tr2 →
268      store_value_of_type (typeof a1) m psp loc ofs v2 = Some ? m' →
269      step ge (State f (Sassign a1 a2) k e m)
270           (tr1⧺tr2) (State f Sskip k e m')
271...
272\end{lstlisting}
273\end{quote}
274which can be read as, if
275\begin{itemize}
276\item \lstinline'a1' can evaluate to the location \lstinline'psp,loc,ofs',
277\item \lstinline'a2' evaluates to a value \lstinline'v2', and
278\item storing \lstinline'v2' at location \lstinline'psp,loc,ofs' succeeds,
279yielding the new memory state \lstinline-m'-, then
280\item the program can step from the state about to execute
281\lstinline'Sassign a1 a2' to a state with the updated memory \lstinline-m'-
282about to execute the no-op \lstinline'Sskip'.
283\end{itemize}
284This rule would be followed by one of the rules to which replaces the
285\lstinline'Sskip' statement with the `real' next statement constructed from
286the continuation \lstinline'k'.  Note that the only true side-effect here is
287the change in memory --- the local environment is initialised once and for all
288on function entry, and the only events appearing in the trace are cost labels
289used purely for accounting.
290\todo{ordering imposed by cost labels?}
291
292The \clight{} language provides input and output effects through `external'
293functions and the step rule
294\begin{quote}
295\begin{lstlisting}
296  | step_external_function: ∀id,targs,tres,vargs,k,m,vres,t.
297      event_match (external_function id targs tres) vargs t vres →
298      step ge (Callstate (External id targs tres) vargs k m)
299            t (Returnstate vres k m)
300\end{lstlisting}
301\end{quote}
302which allows the function to be invoked with and return any values subject to
303the enforcement of the typing rules in \lstinline'event_match', which also
304provides the trace.
305
306Cost label statements add the given label to the trace analogously to the cost
307label expressions above.
308
309\section{Executable semantics}
310\label{sec:exec}
311
312\todo{bit of a cheat to claim that it's equivalent without a completeness
313proof.}
314We have added an equivalent functional version of the \clight{} semantics that
315can be used to animate programs.  The definitions roughly follow the inductive
316semantics, but are necessarily rearranged around pattern matching of the
317relevant parts of the state rather than presenting each case separately.
318
319The code corresponding to the variable lookup definitions on
320page~\pageref{page:evalvar} is
321\begin{quote}
322\begin{lstlisting}
323nlet rec exec_expr (ge:genv) (en:env) (m:mem) (e:expr) on e : res (Σr:val×trace. eval_expr ge en m e (\fst r) (\snd r)) ≝
324match e with
325[ Expr e' ty ⇒
326  match e' with
327  [ ...
328  | Evar _ ⇒ Some ? (
329      do 〈l,tr〉 ← exec_lvalue' ge en m e' ty;
330      do v ← opt_to_res ? (load_value_of_type' ty m l);
331      OK ? 〈v,tr〉)
332...
333  ]
334]
335and exec_lvalue' (ge:genv) (en:env) (m:mem) (e':expr_descr) (ty:type) on e' : res (Σr:memory_space × block × int × trace. eval_lvalue ge en m (Expr e' ty) (\fst (\fst (\fst r))) (\snd (\fst (\fst r))) (\snd (\fst r)) (\snd r)) ≝
336  match e' with
337  [ Evar id ⇒
338      match (get … id en) with
339      [ None ⇒ Some ? (do 〈sp,l〉 ← opt_to_res ? (find_symbol ? ? ge id); OK ? 〈〈〈sp,l〉,zero〉,E0〉) (* global *)
340      | Some loc ⇒ Some ? (OK ? 〈〈〈Any,loc〉,zero〉,E0〉) (* local *)
341      ]
342...
343\end{lstlisting}
344\end{quote}
345where the result is placed in an error monad (the \lstinline'res' type
346constructor) so that \emph{undefined behaviour} such as dereferencing an
347invalid pointer can be rejected.  We use \lstinline'do' notation similar to
348Haskell and CompCert, where
349\begin{quote}
350\begin{lstlisting}
351do x ← e; e'
352\end{lstlisting}
353\end{quote}
354means evaluate \lstinline'e' and if it yields a value then bind that to
355\lstinline'x' and evaluate \lstinline-e'-, and otherwise propogate the error.
356
357Note the $\Sigma$ type for the result of the function, which shows that
358successful executions are sound with respect to the inductive semantics.
359Matita automatically generates proof obligations for each case due to a
360coercion between the types
361\begin{center}
362\lstinline'option res T' \quad and \quad \lstinline'res (Σx:T. P x)'
363\end{center}
364(where a branch marked \lstinline'None' would generate a proof obligation to
365show that it is impossible, although the semantics do not use this feature).
366This is intended to mimic Sozeau's \textsc{Russell} language and elaboration
367into Coq~\cite{sozeau06}.  The coercion also triggers an automatic mechanism
368in \matita{} to add equalities for each pattern matched.  Each case of the
369soundness proof consists of extracting an equality for each computation in the
370monad,
371% "in the monad" is a bit vague here
372making any further case distinctions necessary and applying the corresponding
373rule from the inductive semantics.
374
375Evaluating a step of a statement is complicated by the presence of the
376`external' functions for I/O, which can return arbitrary values.  These are
377handled by a resumption monad, which on encountering some I/O returns a
378suspension that when applied to a value resumes the evaluation of the
379semantics.  Resumption monads are a standard tool for providing denotational
380semantics for input~\cite{Moggi199155} and interleaved concurrency~\cite{??}.
381The definition also incorporates errors, and uses a coercion to automatically
382transform values from the plain error monad.
383\todo{should perhaps give more details of the resumption monad?}
384
385The execution of assignments is straightforward,
386\begin{quote}
387\begin{lstlisting}
388nlet rec exec_step (ge:genv) (st:state) on st : (IO eventval io_out (Σr:trace × state. step ge st (\fst r) (\snd r))) ≝
389match st with
390[ State f s k e m ⇒
391  match s with
392  [ Sassign a1 a2 ⇒ Some ? (
393    ! 〈l,tr1〉 ← exec_lvalue ge e m a1;:
394    ! 〈v2,tr2〉 ← exec_expr ge e m a2;:
395    ! m' ← store_value_of_type' (typeof a1) m l v2;:
396    ret ? 〈tr1⧺tr2, State f Sskip k e m'〉)
397...
398\end{lstlisting}
399\end{quote}
400where \lstinline'!' is used in place of \lstinline'do' due to the change in
401monad.  The content is essentially the same as the inductive rule given on
402page~\pageref{page:Sassign}.
403
404The handling of external calls uses the
405\begin{quote}
406\begin{lstlisting}
407do_io : ident → list eventval → IO eventval io_out eventval
408\end{lstlisting}
409\end{quote}
410function to suspend execution:
411\begin{quote}
412\begin{lstlisting}
413...
414  ]
415| Callstate f0 vargs k m ⇒
416  match f0 with
417  [ ...
418  | External f argtys retty ⇒ Some ? (
419      ! evargs ← check_eventval_list vargs (typlist_of_typelist argtys);:
420      ! evres ← do_io f evargs;:
421      ! vres ← check_eventval evres (proj_sig_res (signature_of_type argtys rett
422      ret ? 〈(Eextcall f evargs evres), Returnstate vres k m〉)
423...
424\end{lstlisting}
425\end{quote}
426\todo{say something more about the rest?}
427
428\todo{``the valid''?}
429Together with functions to provide a valid initial state for a program and to
430detect a final state we can write a function to run the program up to a given
431number of steps.  Similarly, a corecursive function can return the entire
432execution as a stream of trace and state pairs.
433
434\todo{completeness?}
435
436\begin{verbatim}
437Functional version of the semantics.
438Soundness given in dependent type of execution functions, using Russell-style
439separation of terms and proofs.  Pattern matching mimicks inductive definitions from small-step semantics.  Error monad to deal with undefined behaviour;
440IO monad with handling of I/O by resumption (which incorporates errors too).
441\end{verbatim}
442
443\section{8051 memory spaces}
444\label{sec:memory}
445
446\begin{verbatim}
447Share quite a bit with the 8051 doc.
448
449Outline of 8051 memory.
450
451Semantic changes: Values: Blocks in the memory model carry space they are in
452(or \texttt{Any} if unspecified; pointers carry space they can point at (or
453\texttt{Any} for a generic pointer which can refer to any block).  Types:
454pointers carry space corresponding to the allowed pointer values (arrays similar
455because they degrade into pointers).  (Not
456redundant: pointers can be put into a sufficiently large integer then recast to
457their original type.  Cast must be undefined if spaces don't match.)  Operations: casts, comparisons, addition and subtraction.
458
459Syntactic changes: Clight, pointer and array types, and global decls have spaces added.  CIL similarly changed, but had to tracking space for expr in the elaboration to discover spaces for types.
460\end{verbatim}
461
462\section{Validation}
463\label{sec:valid}
464
465\begin{verbatim}
466Animation of simple C programs.
467\end{verbatim}
468
469\begin{verbatim}
470MORE:
471  library stuff: choice of integers, maps
472  check for any axioms
473\end{verbatim}
474
475\bibliographystyle{plain}
476\bibliography{report}
477
478\end{document}
Note: See TracBrowser for help on using the repository browser.