1 | \documentclass[11pt,epsf,a4wide]{article} |
---|
2 | \usepackage[utf8x]{inputenc} |
---|
3 | \usepackage{listings} |
---|
4 | \usepackage{../../style/cerco} |
---|
5 | \newcommand{\ocaml}{OCaml} |
---|
6 | \newcommand{\clight}{Clight} |
---|
7 | \newcommand{\matita}{matita} |
---|
8 | \newcommand{\sdcc}{\texttt{sdcc}} |
---|
9 | |
---|
10 | \newcommand{\textSigma}{\ensuremath{\Sigma}} |
---|
11 | |
---|
12 | % LaTeX Companion, p 74 |
---|
13 | \newcommand{\todo}[1]{\marginpar{\raggedright - #1}} |
---|
14 | |
---|
15 | \lstdefinelanguage{coq} |
---|
16 | {keywords={Definition,Lemma,Theorem,Remark,Qed,Save,Inductive,Record}, |
---|
17 | morekeywords={[2]if,then,else}, |
---|
18 | } |
---|
19 | |
---|
20 | \lstdefinelanguage{matita} |
---|
21 | {keywords={ndefinition,nlemma,ntheorem,nremark,ninductive,nrecord,nqed,nlet,rec,match,with,Type}, |
---|
22 | morekeywords={[2]nwhd,nnormalize,nelim,ncases,ndestruct}, |
---|
23 | mathescape=true, |
---|
24 | } |
---|
25 | |
---|
26 | \lstset{language=matita,basicstyle=\small\tt,columns=flexible,breaklines=false, |
---|
27 | keywordstyle=\color{red}\bfseries, |
---|
28 | keywordstyle=[2]\color{blue}, |
---|
29 | commentstyle=\color{green}, |
---|
30 | stringstyle=\color{blue}, |
---|
31 | showspaces=false,showstringspaces=false} |
---|
32 | |
---|
33 | \lstset{extendedchars=false} |
---|
34 | \lstset{inputencoding=utf8x} |
---|
35 | \DeclareUnicodeCharacter{8797}{:=} |
---|
36 | \DeclareUnicodeCharacter{10746}{++} |
---|
37 | \DeclareUnicodeCharacter{9001}{\ensuremath{\langle}} |
---|
38 | \DeclareUnicodeCharacter{9002}{\ensuremath{\rangle}} |
---|
39 | |
---|
40 | |
---|
41 | \title{ |
---|
42 | INFORMATION AND COMMUNICATION TECHNOLOGIES\\ |
---|
43 | (ICT)\\ |
---|
44 | PROGRAMME\\ |
---|
45 | \vspace*{1cm}Project FP7-ICT-2009-C-243881 \cerco{}} |
---|
46 | |
---|
47 | \date{ } |
---|
48 | \author{} |
---|
49 | |
---|
50 | \begin{document} |
---|
51 | \thispagestyle{empty} |
---|
52 | |
---|
53 | \vspace*{-1cm} |
---|
54 | \begin{center} |
---|
55 | \includegraphics[width=0.6\textwidth]{../../style/cerco_logo.png} |
---|
56 | \end{center} |
---|
57 | |
---|
58 | \begin{minipage}{\textwidth} |
---|
59 | \maketitle |
---|
60 | \end{minipage} |
---|
61 | |
---|
62 | |
---|
63 | \vspace*{0.5cm} |
---|
64 | \begin{center} |
---|
65 | \begin{LARGE} |
---|
66 | \bf |
---|
67 | Report n. D3.1\\ |
---|
68 | Executable Formal Semantics of C\\ |
---|
69 | \end{LARGE} |
---|
70 | \end{center} |
---|
71 | |
---|
72 | \vspace*{2cm} |
---|
73 | \begin{center} |
---|
74 | \begin{large} |
---|
75 | Version 1.0 |
---|
76 | \end{large} |
---|
77 | \end{center} |
---|
78 | |
---|
79 | \vspace*{0.5cm} |
---|
80 | \begin{center} |
---|
81 | \begin{large} |
---|
82 | Main Authors:\\ |
---|
83 | Brian~Campbell, Randy~Pollack |
---|
84 | \end{large} |
---|
85 | \end{center} |
---|
86 | |
---|
87 | \vspace*{\fill} |
---|
88 | \noindent |
---|
89 | Project Acronym: \cerco{}\\ |
---|
90 | Project full title: Certified Complexity\\ |
---|
91 | Proposal/Contract no.: FP7-ICT-2009-C-243881 \cerco{}\\ |
---|
92 | |
---|
93 | \clearpage \pagestyle{myheadings} \markright{\cerco{}, FP7-ICT-2009-C-243881} |
---|
94 | |
---|
95 | \tableofcontents |
---|
96 | |
---|
97 | \section{Introduction} |
---|
98 | |
---|
99 | We present an executable formal semantics of the C programming language which |
---|
100 | will serve as the specification of the input language for the \cerco{} |
---|
101 | verified compiler. Our semantics is based on Leroy et.~al.'s C semantics for |
---|
102 | the CompCert project~\cite{1111042,compcert-mm08}, which divides the treatment |
---|
103 | of C into two pieces. The first is an \ocaml{} stage which parses and |
---|
104 | elaborates C into an abstract syntax tree for the simpler \clight{} language, |
---|
105 | based on the CIL C parser. The second part is a small step semantics for |
---|
106 | \clight{} formalised in the proof tool, which we have ported from Coq to the |
---|
107 | \matita{} theorem prover. |
---|
108 | This semantics is given in the form of inductive definitions, and so we have |
---|
109 | added a third part giving an equivalent functional presentation in \matita{}. |
---|
110 | |
---|
111 | The \cerco{} compiler needs to deal with the constrained memory model of |
---|
112 | the target microcontroller (in our case, the 8051). Thus each part of the |
---|
113 | semantics has been extended to allow explicit handling of the |
---|
114 | microcontroller's memory spaces. \emph{Cost labels} have also been added to |
---|
115 | the \clight{} semantics to support the labelling approach to cost annotations |
---|
116 | presented in~\cite{d2.1}. |
---|
117 | |
---|
118 | The following section discusses the C language extensions for memory spaces. |
---|
119 | Then the port of the two stages of the CompCert \clight{} semantics is |
---|
120 | described in Section~\ref{sec:port}, followed by the new executable semantics |
---|
121 | in Section~\ref{sec:exec}. Finally we discuss how the semantics can be tested |
---|
122 | in Section~\ref{sec:valid}. |
---|
123 | |
---|
124 | \section{Language extensions for the 8051 memory model} |
---|
125 | \label{sec:ext} |
---|
126 | |
---|
127 | The choice of an extended 8051 target for the \cerco{} compiler imposes an |
---|
128 | irregular memory model with tight resource constraints. The different memory |
---|
129 | spaces and access modes are summarised in Figure~\ref{fig:memorymodel} --- |
---|
130 | essentially the development of the 8051 family has fragmented memory into |
---|
131 | four regions: one half of the `internal' memory is fully accessible but also |
---|
132 | contains the register banks, the second half cannot be directly addressed |
---|
133 | because it is shadowed by the `Special Function Registers' (SFRs) for I/O; |
---|
134 | `external memory' provides the bulk of memory in a separate address space; and |
---|
135 | the code is in its own read-only space. |
---|
136 | |
---|
137 | To make efficient use of the limited amount of memory, compilers for 8051 |
---|
138 | microcontrollers provide extra keywords to allocate global variables to |
---|
139 | particular memory spaces, and to limit pointers to address a particular space. |
---|
140 | The freely available \sdcc{} compiler provides the following extensions for |
---|
141 | the 8051 memory spaces: |
---|
142 | \begin{table}[ht] |
---|
143 | \begin{tabular}{rcl} |
---|
144 | Attribute & Pointer size & Memory space \\\hline |
---|
145 | \lstinline'__data' & 1 & Internal, first half (0h -- 7fh) \\ |
---|
146 | \lstinline'__idata' & 1 & Internal, indirect only (80h -- ffh) \\ |
---|
147 | \lstinline'__pdata' & 1 & External, page access (usually 0h -- 7fh) \\ |
---|
148 | \lstinline'__xdata' & 2 & External, any (0h -- ffffh) \\ |
---|
149 | \lstinline'__code' & 2 & Code, any (0h -- ffffh) \\ |
---|
150 | \emph{none}& 3 & Any / Generic pointer |
---|
151 | \end{tabular} |
---|
152 | \end{table} |
---|
153 | The generic pointers are a tagged union of the other kinds of pointers. |
---|
154 | |
---|
155 | We intend the \cerco{} compiler to support extensions that are broadly |
---|
156 | compatible with \sdcc{} to enable the compilation of programs with either. In |
---|
157 | particular, this would allow the comparison of the behaviour of test cases |
---|
158 | compiled with both compilers. Thus the C syntax and semantics have been |
---|
159 | extended with the memory space attributes listed above. The syntax follows |
---|
160 | \sdcc{} and in the semantics we track the memory space that each block was |
---|
161 | allocated in and only permit access via the appropriate kinds of pointers. |
---|
162 | The details on these changes are given in the following sections. |
---|
163 | |
---|
164 | The \sdcc{} compiler also supports special variable types for accessing the |
---|
165 | SFRs, which provide the standard I/O mechanism for the 8051 family. (Note |
---|
166 | that pointers to these types are not allowed because only direct addressing of |
---|
167 | the SFRs is allowed.) We intend to use CompCert-style `external functions' |
---|
168 | instead of special types. These are functions which are declared, but no C |
---|
169 | implementation is provided. Instead they are provided by the runtime or |
---|
170 | compiled directly to the corresponding machine code. This has the advantage |
---|
171 | that no changes from the CompCert semantics are required, and a compatibility |
---|
172 | library can be provided for \sdcc{} if necessary. The 8051 and the \sdcc{} |
---|
173 | compiler also provide bit-level access to a small region of internal memory. |
---|
174 | We do not intend to expose this feature to C programs in the \cerco{} |
---|
175 | compiler, and so no extension is provided for it. |
---|
176 | |
---|
177 | \section{Port of CompCert \clight{} semantics to \matita{}} |
---|
178 | \label{sec:port} |
---|
179 | |
---|
180 | The first stage taken from the CompCert semantics is the parsing and |
---|
181 | elaboration of C programs into the simpler \clight{} language. This is based |
---|
182 | upon the CIL library for parsing, analysing and transforming C programs by |
---|
183 | Necula et.~al.~\cite{cil02}. The elaboration performed provides explicit type |
---|
184 | information throughout the program, including extra casts for promotion, and |
---|
185 | performs simplifications such as breaking up expressions with side effects |
---|
186 | into statements that perform the effects using effect-free expressions. The |
---|
187 | transformed \clight{} programs are much more manageable and lack the ambiguities |
---|
188 | of C, but also remain easily understood by C programmers. |
---|
189 | |
---|
190 | The parser has been extended with the 8051 memory spaces attributes given |
---|
191 | above. The resulting abstract syntax tree records them on global variable |
---|
192 | declarations and pointer types. However, we also need to deal with them |
---|
193 | during the elaboration process to produce all of the required information. |
---|
194 | For example, when the address-of operator \lstinline'&' is used it must decide |
---|
195 | which kind of pointer should be used. Thus the extended elaboration process |
---|
196 | keeps track of the memory space (if any) that the value of each |
---|
197 | expression resides in. Where the memory space is not known, a generic pointer |
---|
198 | will be used instead. Moreover, we also include the pointer kind when |
---|
199 | determining whether a cast must be inserted so that conversions between |
---|
200 | pointer representations can be performed. |
---|
201 | |
---|
202 | |
---|
203 | Thus the elaboration turns the C code |
---|
204 | \begin{quote} |
---|
205 | \begin{lstlisting}[language=C] |
---|
206 | int g(int *x) { return 5; } |
---|
207 | |
---|
208 | int f(__data int *x, int *y) { |
---|
209 | return x==y ? g(x) : *x; |
---|
210 | } |
---|
211 | |
---|
212 | __data int i = 1; |
---|
213 | |
---|
214 | int main(void) { |
---|
215 | return f(&i, &i); |
---|
216 | } |
---|
217 | \end{lstlisting} |
---|
218 | \end{quote} |
---|
219 | into the following Clight program: |
---|
220 | \begin{quote} |
---|
221 | \begin{lstlisting}[language=C] |
---|
222 | int g(int *x) { return 5; } |
---|
223 | |
---|
224 | int f(__data int * x, int * y) |
---|
225 | { |
---|
226 | int t; |
---|
227 | if (x == (__data int * )y) { |
---|
228 | t = g((int * )x); |
---|
229 | } else { |
---|
230 | t = *x; |
---|
231 | } |
---|
232 | return t; |
---|
233 | } |
---|
234 | |
---|
235 | int main(void) |
---|
236 | { |
---|
237 | int t; |
---|
238 | t = f(&i, (int * )(&i)); |
---|
239 | return t; |
---|
240 | } |
---|
241 | \end{lstlisting} |
---|
242 | \end{quote} |
---|
243 | The expressions in \lstinline'f' and \lstinline'main' had to be broken up due |
---|
244 | to side-effects, and casts have been added to change between generic pointers |
---|
245 | and pointers specific to the \lstinline'__data' section of memory. The |
---|
246 | underlying data structure also has types attached to every expression, but |
---|
247 | these are inconvenient to show in source form. |
---|
248 | |
---|
249 | Note that the translation from C to \clight{} is not proven correct --- |
---|
250 | instead it effectively forms a semi-formal part of the whole C semantics. We |
---|
251 | can have some confidence in the code, however, because it has received testing |
---|
252 | in the \cerco{} prototype, and it is very close to the version used in |
---|
253 | CompCert. We can also perform testing of the semantics without involving the |
---|
254 | rest of the compiler because we have an executable semantics. Moreover, the |
---|
255 | cautious programmer could choose to inspect the \clight{} code, or even work |
---|
256 | entirely in the \clight{} language. |
---|
257 | |
---|
258 | The semantics for \clight{} itself has been ported from the Coq development |
---|
259 | used in CompCert to \matita{} for use in \cerco{}. Details about the original |
---|
260 | big-step formalisation of \clight{} can be found in Leroy and |
---|
261 | Blazy~\cite{compcert-mm08} (including a discussion of the translation from C |
---|
262 | in \S 4.1), although we started from a later version with a |
---|
263 | small-step semantics and hence support for \lstinline'goto' statements. |
---|
264 | Several parts of the semantics were shared with other parts of the CompCert |
---|
265 | development, notably: |
---|
266 | \begin{itemize} |
---|
267 | \item representation of primitive values (integers, pointers and undefined |
---|
268 | values, but not structures or unions) and operations on them, |
---|
269 | \item traces of I/O events, |
---|
270 | \item a memory model that keeps conceptually distinct sections of memory |
---|
271 | strictly separate (assigning `undefined behaviour' to a buffer overflow, for |
---|
272 | instance), |
---|
273 | \item results about composing execution steps of arbitrary small-step |
---|
274 | semantics, |
---|
275 | \item data structures for local and global environments, and |
---|
276 | \item common error handling constructs, in particular an error monad. |
---|
277 | \end{itemize} |
---|
278 | We anticipate a similar arrangement for the \cerco{} verified compiler, |
---|
279 | although this means that there may be further changes to the common parts of the |
---|
280 | semantics later in the project to harmonise the stages of the compiler. |
---|
281 | |
---|
282 | The main body of the small-step semantics is a number of inductive definitions |
---|
283 | giving details of the defined behaviour for casts, expressions and statements. |
---|
284 | Expressions are side-effect free in \clight{} and only produce a value as |
---|
285 | output. In our case we also need a trace of any cost labels that are |
---|
286 | `evaluated' so that we will be able to give fine-grained costs for |
---|
287 | the execution of compiled conditional expressions. |
---|
288 | |
---|
289 | As an example of one of the expression rules, consider an expression which |
---|
290 | evaluates a variable, \lstinline'Expr (Evar id) ty'. A variable is an example |
---|
291 | of a class of expressions called \emph{lvalues}, which are roughly those |
---|
292 | expressions which can be assigned to. Thus we use a general rule for lvalues, |
---|
293 | \label{page:evalvar} |
---|
294 | \begin{quote} |
---|
295 | \begin{lstlisting} |
---|
296 | ninductive eval_expr (ge:genv) (e:env) (m:mem) : expr $\rightarrow$ val $\rightarrow$ trace $\rightarrow$ Prop ≝ |
---|
297 | ... |
---|
298 | | eval_Elvalue: $\forall$a,ty,psp,loc,ofs,v,tr. |
---|
299 | eval_lvalue ge e m (Expr a ty) psp loc ofs tr $\rightarrow$ |
---|
300 | load_value_of_type ty m psp loc ofs = Some ? v $\rightarrow$ |
---|
301 | eval_expr ge e m (Expr a ty) v tr |
---|
302 | \end{lstlisting} |
---|
303 | \end{quote} |
---|
304 | where the auxiliary relation \lstinline'eval_lvalue' yields the location of |
---|
305 | the value, \lstinline'psp,loc,ofs', consisting of memory space, memory block, |
---|
306 | and offset into the block, respectively. The expression can thus evaluate to |
---|
307 | the value \lstinline'v' if \lstinline'v' can be loaded from that location. |
---|
308 | One corresponding part of the \lstinline'eval_lvalue' definition is |
---|
309 | \begin{quote} |
---|
310 | \begin{lstlisting} |
---|
311 | ... |
---|
312 | with eval_lvalue (*(ge:genv) (e:env) (m:mem)*) : |
---|
313 | expr $\rightarrow$ memory_space $\rightarrow$ block $\rightarrow$ int $\rightarrow$ trace $\rightarrow$ Prop ≝ |
---|
314 | | eval_Evar_local: $\forall$id,l,ty. |
---|
315 | get ??? id e = Some ? l $\rightarrow$ |
---|
316 | eval_lvalue ge e m (Expr (Evar id) ty) Any l zero E0 |
---|
317 | ... |
---|
318 | \end{lstlisting} |
---|
319 | \end{quote} |
---|
320 | simply looks up the variable in the local environment. A similar rule handles |
---|
321 | global variables, with an extra check to ensure that no local variable has the |
---|
322 | same name. Note that the two relations are defined using mutual recursion |
---|
323 | because \lstinline'eval_lvalue' uses \lstinline'eval_expr' for the evaluation |
---|
324 | of the pointer expression in the dereferencing rule. |
---|
325 | |
---|
326 | Casts also have an auxiliary relation to specify the allowed changes, and |
---|
327 | operations on values (including the changes in representation performed by |
---|
328 | casting) are given as functions. |
---|
329 | |
---|
330 | The only new expression in our semantics is the cost label which wraps around |
---|
331 | another expression. It does not change the result, but merely adds the given |
---|
332 | label to the trace to identify the branches taken in conditional |
---|
333 | expressions so that accurate cost information can be attached to the program: |
---|
334 | \begin{quote} |
---|
335 | \begin{lstlisting} |
---|
336 | | eval_Ecost: $\forall$a,ty,v,l,tr. |
---|
337 | eval_expr ge e m a v tr $\rightarrow$ |
---|
338 | eval_expr ge e m (Expr (Ecost l a) ty) v (tr++Echarge l) |
---|
339 | \end{lstlisting} |
---|
340 | \end{quote} |
---|
341 | \todo{make the semantics of the cost labels clearer} |
---|
342 | |
---|
343 | As the expressions are side-effect free, all of the changes to the state are |
---|
344 | performed by statements. The state itself is represented by records of the |
---|
345 | form |
---|
346 | \begin{quote} |
---|
347 | \begin{lstlisting} |
---|
348 | ninductive state: Type := |
---|
349 | | State: |
---|
350 | $\forall$f: function. |
---|
351 | $\forall$s: statement. |
---|
352 | $\forall$k: cont. |
---|
353 | $\forall$e: env. |
---|
354 | $\forall$m: mem. state |
---|
355 | | Callstate: |
---|
356 | $\forall$fd: fundef. |
---|
357 | $\forall$args: list val. |
---|
358 | $\forall$k: cont. |
---|
359 | $\forall$m: mem. state |
---|
360 | | Returnstate: |
---|
361 | $\forall$res: val. |
---|
362 | $\forall$k: cont. |
---|
363 | $\forall$m: mem. state. |
---|
364 | \end{lstlisting} |
---|
365 | \end{quote} |
---|
366 | During normal execution the state contains the currently executing function's |
---|
367 | definition (used to find \lstinline'goto' labels and also to check whether the |
---|
368 | function is expected to return a value), the statement to be executed next, a |
---|
369 | continuation value to be executed afterwards (where successor statements and |
---|
370 | details of function calls and loops are stored), the local environment mapping |
---|
371 | variables to memory locations and the current memory state. |
---|
372 | \todo{need to note that all variables 'appear' to be memory allocated, even |
---|
373 | if they're subsequently optimised away.} |
---|
374 | The function call and return states appear to store less information because |
---|
375 | the details of the caller are contained in the continuation. |
---|
376 | |
---|
377 | An example of the statement execution rules is the assignment rule |
---|
378 | (corresponding to the C syntax \lstinline'a1 = a2'), |
---|
379 | \label{page:Sassign} |
---|
380 | \begin{quote} |
---|
381 | \begin{lstlisting} |
---|
382 | ninductive step (ge:genv) : state $\rightarrow$ trace $\rightarrow$ state $\rightarrow$ Prop ≝ |
---|
383 | | step_assign: $\forall$f,a1,a2,k,e,m,psp,loc,ofs,v2,m',tr1,tr2. |
---|
384 | eval_lvalue ge e m a1 psp loc ofs tr1 $\rightarrow$ |
---|
385 | eval_expr ge e m a2 v2 tr2 $\rightarrow$ |
---|
386 | store_value_of_type (typeof a1) m psp loc ofs v2 = Some ? m' $\rightarrow$ |
---|
387 | step ge (State f (Sassign a1 a2) k e m) |
---|
388 | (tr1++tr2) (State f Sskip k e m') |
---|
389 | ... |
---|
390 | \end{lstlisting} |
---|
391 | \end{quote} |
---|
392 | which can be read as: |
---|
393 | \begin{itemize} |
---|
394 | \item if \lstinline'a1' can evaluate to the location \lstinline'psp,loc,ofs', |
---|
395 | \item \lstinline'a2' evaluates to a value \lstinline'v2', and |
---|
396 | \item storing \lstinline'v2' at location \lstinline'psp,loc,ofs' succeeds, |
---|
397 | yielding the new memory state \lstinline-m'-, then |
---|
398 | \item the program can step from the state about to execute |
---|
399 | \lstinline'Sassign a1 a2' to a state with the updated memory \lstinline-m'- |
---|
400 | about to execute the no-op \lstinline'Sskip'. |
---|
401 | \end{itemize} |
---|
402 | This rule would be followed by one of the rules to which replaces the |
---|
403 | \lstinline'Sskip' statement with the `real' next statement constructed from |
---|
404 | the continuation \lstinline'k'. Note that the only true side-effect here is |
---|
405 | the change in memory --- the local environment is initialised once and for all |
---|
406 | on function entry, and the only events appearing in the trace are cost labels |
---|
407 | used purely for accounting. |
---|
408 | \todo{ordering imposed by cost labels?} |
---|
409 | |
---|
410 | The \clight{} language provides input and output effects through `external' |
---|
411 | functions and the step rule |
---|
412 | \begin{quote} |
---|
413 | \begin{lstlisting} |
---|
414 | | step_external_function: $\forall$id,targs,tres,vargs,k,m,vres,t. |
---|
415 | event_match (external_function id targs tres) vargs t vres $\rightarrow$ |
---|
416 | step ge (Callstate (External id targs tres) vargs k m) |
---|
417 | t (Returnstate vres k m) |
---|
418 | \end{lstlisting} |
---|
419 | \end{quote} |
---|
420 | which allows the function to be invoked with and return any values subject to |
---|
421 | the enforcement of the typing rules in \lstinline'event_match', which also |
---|
422 | provides the trace. |
---|
423 | |
---|
424 | Cost label statements add the given label to the trace analogously to the cost |
---|
425 | label expressions above. |
---|
426 | |
---|
427 | \section{Executable semantics} |
---|
428 | \label{sec:exec} |
---|
429 | |
---|
430 | \todo{bit of a cheat to claim that it's equivalent without a completeness |
---|
431 | proof.} |
---|
432 | We have added an equivalent functional version of the \clight{} semantics that |
---|
433 | can be used to animate programs. The definitions roughly follow the inductive |
---|
434 | semantics, but are necessarily rearranged around pattern matching of the |
---|
435 | relevant parts of the state rather than presenting each case separately. |
---|
436 | |
---|
437 | The code corresponding to the variable lookup definitions on |
---|
438 | page~\pageref{page:evalvar} is |
---|
439 | \begin{quote} |
---|
440 | \begin{lstlisting} |
---|
441 | nlet rec exec_expr (ge:genv) (en:env) (m:mem) (e:expr) on e : |
---|
442 | res ($\Sigma$r:val$\times$trace. eval_expr ge en m e (\fst r) (\snd r)) ≝ |
---|
443 | match e with |
---|
444 | [ Expr e' ty $\Rightarrow$ |
---|
445 | match e' with |
---|
446 | [ ... |
---|
447 | | Evar _ $\Rightarrow$ Some ? ( |
---|
448 | do $\langle$l,tr$\rangle$ $\leftarrow$ exec_lvalue' ge en m e' ty; |
---|
449 | do v $\leftarrow$ opt_to_res ? (load_value_of_type' ty m l); |
---|
450 | OK ? $\langle$v,tr$\rangle$) |
---|
451 | ... |
---|
452 | ] |
---|
453 | ] |
---|
454 | and exec_lvalue' (ge:genv) (en:env) (m:mem) (e':expr_descr) (ty:type) on e' : |
---|
455 | res ($\Sigma$r:memory_space $\times$ block $\times$ int $\times$ trace. eval_lvalue ge en m (Expr e' ty) (\fst (\fst (\fst r))) (\snd (\fst (\fst r))) (\snd (\fst r)) (\snd r)) ≝ |
---|
456 | match e' with |
---|
457 | [ Evar id $\Rightarrow$ |
---|
458 | match (get … id en) with |
---|
459 | [ None $\Rightarrow$ Some ? (do $\langle$sp,l$\rangle$ $\leftarrow$ opt_to_res ? (find_symbol ? ? ge id); OK ? $\langle$$\langle$$\langle$sp,l$\rangle$,zero$\rangle$,E0$\rangle$) (* global *) |
---|
460 | | Some loc $\Rightarrow$ Some ? (OK ? $\langle$$\langle$$\langle$Any,loc$\rangle$,zero$\rangle$,E0$\rangle$) (* local *) |
---|
461 | ] |
---|
462 | ... |
---|
463 | \end{lstlisting} |
---|
464 | \end{quote} |
---|
465 | where the result is placed in an error monad (the \lstinline'res' type |
---|
466 | constructor) so that \emph{undefined behaviour} such as dereferencing an |
---|
467 | invalid pointer can be rejected. We use \lstinline'do' notation similar to |
---|
468 | Haskell and CompCert, where |
---|
469 | \begin{quote} |
---|
470 | \begin{lstlisting} |
---|
471 | do x $\leftarrow$ e; e' |
---|
472 | \end{lstlisting} |
---|
473 | \end{quote} |
---|
474 | means evaluate \lstinline'e' and if it yields a value then bind that to |
---|
475 | \lstinline'x' and evaluate \lstinline-e'-, and otherwise propogate the error. |
---|
476 | |
---|
477 | Note the $\Sigma$ type for the result of the function, which shows that |
---|
478 | successful executions are sound with respect to the inductive semantics. |
---|
479 | Matita automatically generates proof obligations for each case due to a |
---|
480 | coercion between the types |
---|
481 | \begin{center} |
---|
482 | \lstinline'option (res T)' \quad and \quad \lstinline'res ($\Sigma$x:T. P x)' |
---|
483 | \end{center} |
---|
484 | (where a branch marked \lstinline'None' would generate a proof obligation to |
---|
485 | show that it is impossible, although the semantics do not use this feature). |
---|
486 | This is intended to mimic Sozeau's \textsc{Russell} language and elaboration |
---|
487 | into Coq~\cite{sozeau06}. The coercion also triggers an automatic mechanism |
---|
488 | in \matita{} to add equalities for each pattern matched. Each case of the |
---|
489 | soundness proof consists of extracting an equality for each computation in the |
---|
490 | monad, |
---|
491 | % "in the monad" is a bit vague here |
---|
492 | making any further case distinctions necessary and applying the corresponding |
---|
493 | rule from the inductive semantics. |
---|
494 | |
---|
495 | Evaluating a step of a statement is complicated by the presence of the |
---|
496 | `external' functions for I/O, which can return arbitrary values. These are |
---|
497 | handled by a resumption monad, which on encountering some I/O returns a |
---|
498 | suspension. When the suspension is applied to a value the evaluation of the |
---|
499 | semantics is resumed. Resumption monads are a standard tool for providing |
---|
500 | denotational semantics for input~\cite{Moggi199155} and interleaved |
---|
501 | concurrency~\cite{??}. |
---|
502 | The definition also incorporates errors, and uses a coercion to automatically |
---|
503 | transform values from the plain error monad. |
---|
504 | \todo{should perhaps give more details of the resumption monad?} |
---|
505 | |
---|
506 | The execution of assignments is straightforward, |
---|
507 | \begin{quote} |
---|
508 | \begin{lstlisting} |
---|
509 | nlet rec exec_step (ge:genv) (st:state) on st : (IO eventval io_out ($\Sigma$r:trace $\times$ state. step ge st (\fst r) (\snd r))) ≝ |
---|
510 | match st with |
---|
511 | [ State f s k e m $\Rightarrow$ |
---|
512 | match s with |
---|
513 | [ Sassign a1 a2 $\Rightarrow$ Some ? ( |
---|
514 | ! $\langle$l,tr1$\rangle$ $\leftarrow$ exec_lvalue ge e m a1;: |
---|
515 | ! $\langle$v2,tr2$\rangle$ $\leftarrow$ exec_expr ge e m a2;: |
---|
516 | ! m' $\leftarrow$ store_value_of_type' (typeof a1) m l v2;: |
---|
517 | ret ? $\langle$tr1++tr2, State f Sskip k e m'$\rangle$) |
---|
518 | ... |
---|
519 | \end{lstlisting} |
---|
520 | \end{quote} |
---|
521 | where \lstinline'!' is used in place of \lstinline'do' due to the change in |
---|
522 | monad. The content is essentially the same as the inductive rule given on |
---|
523 | page~\pageref{page:Sassign}. |
---|
524 | |
---|
525 | The handling of external calls uses the |
---|
526 | \begin{quote} |
---|
527 | \begin{lstlisting} |
---|
528 | do_io : ident $\rightarrow$ list eventval $\rightarrow$ IO eventval io_out eventval |
---|
529 | \end{lstlisting} |
---|
530 | \end{quote} |
---|
531 | function to suspend execution: |
---|
532 | \begin{quote} |
---|
533 | \begin{lstlisting} |
---|
534 | ... |
---|
535 | ] |
---|
536 | | Callstate f0 vargs k m $\Rightarrow$ |
---|
537 | match f0 with |
---|
538 | [ ... |
---|
539 | | External f argtys retty $\Rightarrow$ Some ? ( |
---|
540 | ! evargs $\leftarrow$ check_eventval_list vargs (typlist_of_typelist argtys);: |
---|
541 | ! evres $\leftarrow$ do_io f evargs;: |
---|
542 | ! vres $\leftarrow$ check_eventval evres (proj_sig_res (signature_of_type argtys rett |
---|
543 | ret ? $\langle$(Eextcall f evargs evres), Returnstate vres k m$\rangle$) |
---|
544 | ... |
---|
545 | \end{lstlisting} |
---|
546 | \end{quote} |
---|
547 | \todo{say something more about the rest?} |
---|
548 | |
---|
549 | Together with functions to provide the initial state for a program and to |
---|
550 | detect a final state we can write a function to run the program up to a given |
---|
551 | number of steps. Similarly, a corecursive function can return the entire |
---|
552 | execution as a stream of trace and state pairs. |
---|
553 | |
---|
554 | \todo{completeness?} |
---|
555 | |
---|
556 | \section{Validation} |
---|
557 | \label{sec:valid} |
---|
558 | |
---|
559 | \begin{verbatim} |
---|
560 | Animation of simple C programs. |
---|
561 | \end{verbatim} |
---|
562 | |
---|
563 | \begin{verbatim} |
---|
564 | MORE: |
---|
565 | library stuff: choice of integers, maps |
---|
566 | check for any axioms |
---|
567 | \end{verbatim} |
---|
568 | |
---|
569 | \bibliographystyle{plain} |
---|
570 | \bibliography{report} |
---|
571 | |
---|
572 | \end{document} |
---|