1 | \documentclass[11pt,epsf,a4wide]{article} |
---|
2 | \usepackage[utf8x]{inputenc} |
---|
3 | \usepackage{listings} |
---|
4 | \usepackage{../../style/cerco} |
---|
5 | \newcommand{\ocaml}{OCaml} |
---|
6 | \newcommand{\clight}{Clight} |
---|
7 | \newcommand{\matita}{matita} |
---|
8 | \newcommand{\sdcc}{\texttt{sdcc}} |
---|
9 | |
---|
10 | \newcommand{\textSigma}{\ensuremath{\Sigma}} |
---|
11 | |
---|
12 | % LaTeX Companion, p 74 |
---|
13 | \newcommand{\todo}[1]{\marginpar{\raggedright - #1}} |
---|
14 | |
---|
15 | \lstdefinelanguage{coq} |
---|
16 | {keywords={Definition,Lemma,Theorem,Remark,Qed,Save,Inductive,Record}, |
---|
17 | morekeywords={[2]if,then,else}, |
---|
18 | } |
---|
19 | |
---|
20 | \lstdefinelanguage{matita} |
---|
21 | {keywords={ndefinition,nlemma,ntheorem,nremark,ninductive,nrecord,nqed,nlet,rec,match,with,Type}, |
---|
22 | morekeywords={[2]nwhd,nnormalize,nelim,ncases,ndestruct}, |
---|
23 | mathescape=true, |
---|
24 | } |
---|
25 | |
---|
26 | \lstset{language=matita,basicstyle=\small\tt,columns=flexible,breaklines=false, |
---|
27 | keywordstyle=\color{red}\bfseries, |
---|
28 | keywordstyle=[2]\color{blue}, |
---|
29 | commentstyle=\color{green}, |
---|
30 | stringstyle=\color{blue}, |
---|
31 | showspaces=false,showstringspaces=false} |
---|
32 | |
---|
33 | \lstset{extendedchars=false} |
---|
34 | \lstset{inputencoding=utf8x} |
---|
35 | \DeclareUnicodeCharacter{8797}{:=} |
---|
36 | \DeclareUnicodeCharacter{10746}{++} |
---|
37 | \DeclareUnicodeCharacter{9001}{\ensuremath{\langle}} |
---|
38 | \DeclareUnicodeCharacter{9002}{\ensuremath{\rangle}} |
---|
39 | |
---|
40 | |
---|
41 | \title{ |
---|
42 | INFORMATION AND COMMUNICATION TECHNOLOGIES\\ |
---|
43 | (ICT)\\ |
---|
44 | PROGRAMME\\ |
---|
45 | \vspace*{1cm}Project FP7-ICT-2009-C-243881 \cerco{}} |
---|
46 | |
---|
47 | \date{ } |
---|
48 | \author{} |
---|
49 | |
---|
50 | \begin{document} |
---|
51 | \thispagestyle{empty} |
---|
52 | |
---|
53 | \vspace*{-1cm} |
---|
54 | \begin{center} |
---|
55 | \includegraphics[width=0.6\textwidth]{../../style/cerco_logo.png} |
---|
56 | \end{center} |
---|
57 | |
---|
58 | \begin{minipage}{\textwidth} |
---|
59 | \maketitle |
---|
60 | \end{minipage} |
---|
61 | |
---|
62 | |
---|
63 | \vspace*{0.5cm} |
---|
64 | \begin{center} |
---|
65 | \begin{LARGE} |
---|
66 | \bf |
---|
67 | Report n. D3.1\\ |
---|
68 | Executable Formal Semantics of C\\ |
---|
69 | \end{LARGE} |
---|
70 | \end{center} |
---|
71 | |
---|
72 | \vspace*{2cm} |
---|
73 | \begin{center} |
---|
74 | \begin{large} |
---|
75 | Version 1.0 |
---|
76 | \end{large} |
---|
77 | \end{center} |
---|
78 | |
---|
79 | \vspace*{0.5cm} |
---|
80 | \begin{center} |
---|
81 | \begin{large} |
---|
82 | Main Authors:\\ |
---|
83 | Brian~Campbell, Randy~Pollack |
---|
84 | \end{large} |
---|
85 | \end{center} |
---|
86 | |
---|
87 | \vspace*{\fill} |
---|
88 | \noindent |
---|
89 | Project Acronym: \cerco{}\\ |
---|
90 | Project full title: Certified Complexity\\ |
---|
91 | Proposal/Contract no.: FP7-ICT-2009-C-243881 \cerco{}\\ |
---|
92 | |
---|
93 | \clearpage \pagestyle{myheadings} \markright{\cerco{}, FP7-ICT-2009-C-243881} |
---|
94 | |
---|
95 | \tableofcontents |
---|
96 | |
---|
97 | \section{Introduction} |
---|
98 | |
---|
99 | We present an executable formal semantics of the C programming language which |
---|
100 | will serve as the specification of the input language for the \cerco{} |
---|
101 | verified compiler. Our semantics is based on Leroy et.~al.'s C semantics for |
---|
102 | the CompCert project~\cite{1111042,compcert-mm08}, which divides the treatment |
---|
103 | of C into two pieces. The first is an \ocaml{} stage which parses and |
---|
104 | elaborates C into an abstract syntax tree for the simpler \clight{} language, |
---|
105 | based on the CIL C parser. The second part is a small step semantics for |
---|
106 | \clight{} formalised in the proof tool, which we have ported from Coq to the |
---|
107 | \matita{} theorem prover. |
---|
108 | This semantics is given in the form of inductive definitions, and so we have |
---|
109 | added a third part giving an equivalent functional presentation in \matita{}. |
---|
110 | |
---|
111 | The \cerco{} compiler needs to deal with the constrained memory model of |
---|
112 | the target microcontroller (in our case, the 8051). Thus each part of the |
---|
113 | semantics has been extended to allow explicit handling of the |
---|
114 | microcontroller's memory spaces. \emph{Cost labels} have also been added to |
---|
115 | the \clight{} semantics to support the labelling approach to cost annotations |
---|
116 | presented in~\cite{d2.1}. |
---|
117 | |
---|
118 | The following section discusses the C language extensions for memory spaces. |
---|
119 | Then the port of the two stages of the CompCert \clight{} semantics is |
---|
120 | described in Section~\ref{sec:port}, followed by the new executable semantics |
---|
121 | in Section~\ref{sec:exec}. Finally we discuss how the semantics can be tested |
---|
122 | in Section~\ref{sec:valid}. |
---|
123 | |
---|
124 | \section{Language extensions for the 8051 memory model} |
---|
125 | \label{sec:ext} |
---|
126 | |
---|
127 | The choice of an extended 8051 target for the \cerco{} compiler imposes an |
---|
128 | irregular memory model with tight resource constraints. The different memory |
---|
129 | spaces and access modes are summarised in Figure~\ref{fig:memorymodel} --- |
---|
130 | essentially the evolution of the 8051 family has fragmented memory into |
---|
131 | four regions: one half of the `internal' memory is fully accessible but also |
---|
132 | contains the register banks, the second half cannot be accessed by direct addressing |
---|
133 | because it is shadowed by the `Special Function Registers' (SFRs) for I/O; |
---|
134 | `external memory' provides the bulk of memory in a separate address space; and |
---|
135 | the code is in its own read-only space. |
---|
136 | \begin{figure} |
---|
137 | \setlength{\unitlength}{1pt} |
---|
138 | \begin{picture}(410,250)(-50,200) |
---|
139 | %\put(-50,200){\framebox(410,250){}} |
---|
140 | \put(12,410){\makebox(80,0)[b]{Internal (256B)}} |
---|
141 | \put(13,242){\line(0,1){165}} |
---|
142 | \put(93,242){\line(0,1){165}} |
---|
143 | \put(13,407){\line(1,0){80}} |
---|
144 | \put(12,400){\makebox(0,0)[r]{0h}} \put(14,400){\makebox(0,0)[l]{Register bank 0}} |
---|
145 | \put(13,393){\line(1,0){80}} |
---|
146 | \put(12,386){\makebox(0,0)[r]{8h}} \put(14,386){\makebox(0,0)[l]{Register bank 1}} |
---|
147 | \put(13,379){\line(1,0){80}} |
---|
148 | \put(12,372){\makebox(0,0)[r]{10h}} \put(14,372){\makebox(0,0)[l]{Register bank 2}} |
---|
149 | \put(13,365){\line(1,0){80}} |
---|
150 | \put(12,358){\makebox(0,0)[r]{18h}} \put(14,358){\makebox(0,0)[l]{Register bank 3}} |
---|
151 | \put(13,351){\line(1,0){80}} |
---|
152 | \put(12,344){\makebox(0,0)[r]{20h}} \put(14,344){\makebox(0,0)[l]{Bit addressable}} |
---|
153 | \put(13,323){\line(1,0){80}} |
---|
154 | \put(12,316){\makebox(0,0)[r]{30h}} |
---|
155 | \put(14,309){\makebox(0,0)[l]{\quad \vdots}} |
---|
156 | \put(13,291){\line(1,0){80}} |
---|
157 | \put(12,284){\makebox(0,0)[r]{80h}} |
---|
158 | \put(14,263){\makebox(0,0)[l]{\quad \vdots}} |
---|
159 | \put(12,249){\makebox(0,0)[r]{ffh}} |
---|
160 | \put(13,242){\line(1,0){80}} |
---|
161 | |
---|
162 | \qbezier(-2,407)(-6,407)(-6,393) |
---|
163 | \qbezier(-6,393)(-6,324)(-10,324) |
---|
164 | \put(-12,324){\makebox(0,0)[r]{indirect}} |
---|
165 | \qbezier(-6,256)(-6,324)(-10,324) |
---|
166 | \qbezier(-2,242)(-6,242)(-6,256) |
---|
167 | |
---|
168 | \qbezier(94,407)(98,407)(98,393) |
---|
169 | \qbezier(98,393)(98,349)(102,349) |
---|
170 | \put(104,349){\makebox(0,0)[l]{direct/stack}} |
---|
171 | \qbezier(98,305)(98,349)(102,349) |
---|
172 | \qbezier(94,291)(98,291)(98,305) |
---|
173 | |
---|
174 | \put(102,242){\framebox(20,49){sfr}} |
---|
175 | % bit access to sfrs? |
---|
176 | |
---|
177 | \qbezier(124,291)(128,291)(128,277) |
---|
178 | \qbezier(128,277)(128,266)(132,266) |
---|
179 | \put(134,266){\makebox(0,0)[l]{direct}} |
---|
180 | \qbezier(128,257)(128,266)(132,266) |
---|
181 | \qbezier(124,242)(128,242)(128,256) |
---|
182 | |
---|
183 | \put(164,410){\makebox(80,0)[b]{External (64kB)}} |
---|
184 | \put(164,220){\line(0,1){187}} |
---|
185 | \put(164,407){\line(1,0){80}} |
---|
186 | \put(244,220){\line(0,1){187}} |
---|
187 | \put(164,242){\line(1,0){80}} |
---|
188 | \put(163,400){\makebox(0,0)[r]{0h}} |
---|
189 | \put(164,324){\makebox(80,0){paged access}} |
---|
190 | \put(164,310){\makebox(80,0){direct/indirect}} |
---|
191 | \put(163,235){\makebox(0,0)[r]{80h}} |
---|
192 | \put(164,228){\makebox(80,0){\vdots}} |
---|
193 | \put(164,210){\makebox(80,0){direct/indirect}} |
---|
194 | |
---|
195 | \put(264,410){\makebox(80,0)[b]{Code (64kB)}} |
---|
196 | \put(264,220){\line(0,1){187}} |
---|
197 | \put(264,407){\line(1,0){80}} |
---|
198 | \put(344,220){\line(0,1){187}} |
---|
199 | \put(263,400){\makebox(0,0)[r]{0h}} |
---|
200 | \put(264,228){\makebox(80,0){\vdots}} |
---|
201 | \put(264,324){\makebox(80,0){direct}} |
---|
202 | \put(264,310){\makebox(80,0){PC relative}} |
---|
203 | \end{picture} |
---|
204 | \caption{The extended 8051 memory model} |
---|
205 | \label{fig:memorymodel} |
---|
206 | \end{figure} |
---|
207 | |
---|
208 | To make efficient use of the limited amount of memory, compilers for 8051 |
---|
209 | microcontrollers provide extra keywords to allocate global variables to |
---|
210 | particular memory spaces, and to limit pointers to address a particular space. |
---|
211 | The freely available \sdcc{} compiler provides the following extensions for |
---|
212 | the 8051 memory spaces: |
---|
213 | \begin{table}[ht] |
---|
214 | \begin{tabular}{rcl} |
---|
215 | Attribute & Pointer size (bytes) & Memory space \\\hline |
---|
216 | \lstinline'__data' & 1 & Internal, first half (0h -- 7fh) \\ |
---|
217 | \lstinline'__idata' & 1 & Internal, indirect only (80h -- ffh) \\ |
---|
218 | \lstinline'__pdata' & 1 & External, page access (usually 0h -- 7fh) \\ |
---|
219 | \lstinline'__xdata' & 2 & External, any (0h -- ffffh) \\ |
---|
220 | \lstinline'__code' & 2 & Code, any (0h -- ffffh) \\ |
---|
221 | \emph{none}& 3 & Any / Generic pointer |
---|
222 | \end{tabular} |
---|
223 | \end{table} |
---|
224 | The generic pointers are a tagged union of the other kinds of pointers. |
---|
225 | |
---|
226 | We intend the \cerco{} compiler to support extensions that are broadly |
---|
227 | compatible with \sdcc{} to enable the compilation of programs with |
---|
228 | either tool. In particular, this would allow the comparison of the |
---|
229 | behaviour of test cases compiled with each compiler. Thus the C |
---|
230 | syntax and semantics have been extended with the memory space |
---|
231 | attributes listed above. The syntax follows \sdcc{} and in the |
---|
232 | semantics we track the memory space that each block was allocated in |
---|
233 | and only permit access via the appropriate kinds of pointers. The |
---|
234 | details on these changes are given in the following sections. |
---|
235 | |
---|
236 | The \sdcc{} compiler also supports special variable types for accessing the |
---|
237 | SFRs, which provide the standard I/O mechanism for the 8051 family. (Note |
---|
238 | that pointers to these types are not permitted because only direct addressing of |
---|
239 | the SFRs is allowed.) We intend to use CompCert-style `external functions' |
---|
240 | instead of special types. These are functions which are declared, but no C |
---|
241 | implementation of them is provided. Instead they are provided by the runtime or |
---|
242 | compiled directly to the corresponding machine code. This has the advantage |
---|
243 | that no changes from the CompCert semantics are required, and a compatibility |
---|
244 | library can be provided for \sdcc{} if necessary. The 8051 and the \sdcc{} |
---|
245 | compiler also provide bit-level access to a small region of internal memory. |
---|
246 | We do not intend to expose this feature to C programs in the \cerco{} |
---|
247 | compiler, and so no extension is provided for it. |
---|
248 | |
---|
249 | Finally, we adopt the \sdcc{} extension to allocate a variable at a |
---|
250 | particular address to provide a way to deal with memory mapped I/O in |
---|
251 | the external memory space. |
---|
252 | \todo{Are we really going to do this?} |
---|
253 | |
---|
254 | \section{Port of CompCert \clight{} semantics to \matita{}} |
---|
255 | \label{sec:port} |
---|
256 | |
---|
257 | \todo{Could do with some text here} |
---|
258 | \subsection{Parsing and elaboration} |
---|
259 | The first stage taken from the CompCert semantics is the parsing and |
---|
260 | elaboration of C programs into the simpler \clight{} language. This is based |
---|
261 | upon the CIL library for parsing, analysing and transforming C programs by |
---|
262 | Necula et.~al.~\cite{cil02}. The elaboration performed provides explicit type |
---|
263 | information throughout the program, including extra casts for promotion, and |
---|
264 | performs simplifications such as breaking up expressions with side effects |
---|
265 | into effect-free expressions and statements to perform the effects. The |
---|
266 | transformed \clight{} programs are much more manageable and lack the ambiguities |
---|
267 | of C, but also remain easily understood by C programmers. |
---|
268 | |
---|
269 | The parser has been extended with the 8051 memory spaces attributes given |
---|
270 | above. The resulting abstract syntax tree records them on global variable |
---|
271 | declarations and pointer types. However, we also need to deal with them |
---|
272 | during the elaboration process to produce all of the required type information. |
---|
273 | For example, when the address-of operator \lstinline'&' is used it must decide |
---|
274 | which kind of pointer should be used. Thus the extended elaboration process |
---|
275 | keeps track of the memory space (if any) that the value of each |
---|
276 | expression resides in. Where the memory space is not known, a generic pointer |
---|
277 | will be used instead. Moreover, we also include the pointer kind when |
---|
278 | determining whether a cast must be inserted so that conversions between |
---|
279 | pointer representations can be performed. |
---|
280 | |
---|
281 | |
---|
282 | Thus the elaboration turns the following C code |
---|
283 | \begin{quote} |
---|
284 | \begin{lstlisting}[language=C] |
---|
285 | int g(int *x) { return 5; } |
---|
286 | |
---|
287 | int f(__data int *x, int *y) { |
---|
288 | return x==y ? g(x) : *x; |
---|
289 | } |
---|
290 | |
---|
291 | __data int i = 1; |
---|
292 | |
---|
293 | int main(void) { |
---|
294 | return f(&i, &i); |
---|
295 | } |
---|
296 | \end{lstlisting} |
---|
297 | \end{quote} |
---|
298 | into the Clight program below: |
---|
299 | \begin{quote} |
---|
300 | \begin{lstlisting}[language=C] |
---|
301 | int g(int *x) { return 5; } |
---|
302 | |
---|
303 | int f(__data int * x, int * y) |
---|
304 | { |
---|
305 | int t; |
---|
306 | if (x == (__data int * )y) { |
---|
307 | t = g((int * )x); |
---|
308 | } else { |
---|
309 | t = *x; |
---|
310 | } |
---|
311 | return t; |
---|
312 | } |
---|
313 | |
---|
314 | int main(void) |
---|
315 | { |
---|
316 | int t; |
---|
317 | t = f(&i, (int * )(&i)); |
---|
318 | return t; |
---|
319 | } |
---|
320 | \end{lstlisting} |
---|
321 | \end{quote} |
---|
322 | The expressions in \lstinline'f' and \lstinline'main' had to be broken up due |
---|
323 | to side-effects, and casts have been added to change between generic pointers |
---|
324 | and pointers specific to the \lstinline'__data' section of memory. The |
---|
325 | underlying data structure also has types attached to every expression, but |
---|
326 | these are impractical to show in source form. |
---|
327 | |
---|
328 | Note that the translation from C to \clight{} is not proven correct |
---|
329 | --- instead it effectively forms a semi-formal part of the whole C |
---|
330 | semantics. We can have some confidence in the code, however, because |
---|
331 | it has received testing in the \cerco{} prototype, and it is very |
---|
332 | close to the version used in CompCert. We can also perform testing of |
---|
333 | the semantics without involving the rest of the compiler because we |
---|
334 | have an executable semantics. Moreover, the cautious programmer could |
---|
335 | choose to inspect the generated \clight{} code, or even work entirely |
---|
336 | in the \clight{} language. |
---|
337 | |
---|
338 | \subsection{Small-step inductive semantics} |
---|
339 | \label{sec:inductive} |
---|
340 | |
---|
341 | The semantics for \clight{} itself has been ported from the Coq development |
---|
342 | used in CompCert to \matita{} for use in \cerco{}. Details about the original |
---|
343 | big-step formalisation of \clight{} can be found in Leroy and |
---|
344 | Blazy~\cite{compcert-mm08} (including a discussion of the translation from C |
---|
345 | in \S 4.1), although we started from a later version with a |
---|
346 | small-step semantics and hence support for \lstinline'goto' statements. |
---|
347 | Several parts of the semantics were shared with other parts of the CompCert |
---|
348 | development, notably: |
---|
349 | \begin{itemize} |
---|
350 | \item the representation of primitive values (integers, pointers and undefined |
---|
351 | values, but not structures or unions) and operations on them, |
---|
352 | \item traces of I/O events, |
---|
353 | \item a memory model that keeps conceptually distinct sections of memory |
---|
354 | strictly separate (assigning `undefined behaviour' to a buffer overflow, for |
---|
355 | instance), |
---|
356 | \item results about composing execution steps of arbitrary small-step |
---|
357 | semantics, |
---|
358 | \item data structures for local and global environments, and |
---|
359 | \item common error handling constructs, in particular an error monad. |
---|
360 | \end{itemize} |
---|
361 | We anticipate a similar arrangement for the \cerco{} verified |
---|
362 | compiler, although this means that there may be further changes to the |
---|
363 | common parts of the semantics later in the project to harmonise the |
---|
364 | stages of the compiler. In particular, some of data structures for |
---|
365 | environments are just preliminary definitions for developing the |
---|
366 | semantics. |
---|
367 | |
---|
368 | The main body of the small-step semantics is a number of inductive definitions |
---|
369 | giving details of the defined behaviour for casts, expressions and statements. |
---|
370 | Expressions are side-effect free in \clight{} and only produce a value as |
---|
371 | output. In our case we also need a trace of any cost labels that are |
---|
372 | `evaluated' so that we will be able to give fine-grained costs for |
---|
373 | the execution of compiled conditional expressions. |
---|
374 | |
---|
375 | As an example of one of the expression rules, consider an expression which |
---|
376 | evaluates a variable, \lstinline'Expr (Evar id) ty'. A variable is an example |
---|
377 | of a class of expressions called \emph{lvalues}, which are roughly those |
---|
378 | expressions which can be assigned to. Thus we use a general rule for lvalues, |
---|
379 | \label{page:evalvar} |
---|
380 | \begin{quote} |
---|
381 | \begin{lstlisting} |
---|
382 | ninductive eval_expr (ge:genv) (e:env) (m:mem) : expr $\rightarrow$ val $\rightarrow$ trace $\rightarrow$ Prop ≝ |
---|
383 | ... |
---|
384 | | eval_Elvalue: $\forall$a,ty,psp,loc,ofs,v,tr. |
---|
385 | eval_lvalue ge e m (Expr a ty) psp loc ofs tr $\rightarrow$ |
---|
386 | load_value_of_type ty m psp loc ofs = Some ? v $\rightarrow$ |
---|
387 | eval_expr ge e m (Expr a ty) v tr |
---|
388 | \end{lstlisting} |
---|
389 | \end{quote} |
---|
390 | where the auxiliary relation \lstinline'eval_lvalue' yields the location of |
---|
391 | the value, \lstinline'psp,loc,ofs', consisting of memory space, memory block, |
---|
392 | and offset into the block, respectively. The expression can thus evaluate to |
---|
393 | the value \lstinline'v' if \lstinline'v' can be loaded from that location. |
---|
394 | One corresponding part of the \lstinline'eval_lvalue' definition is |
---|
395 | \begin{quote} |
---|
396 | \begin{lstlisting} |
---|
397 | ... |
---|
398 | with eval_lvalue (*(ge:genv) (e:env) (m:mem)*) : |
---|
399 | expr $\rightarrow$ memory_space $\rightarrow$ block $\rightarrow$ int $\rightarrow$ trace $\rightarrow$ Prop ≝ |
---|
400 | | eval_Evar_local: $\forall$id,l,ty. |
---|
401 | get ??? id e = Some ? l $\rightarrow$ |
---|
402 | eval_lvalue ge e m (Expr (Evar id) ty) Any l zero E0 |
---|
403 | ... |
---|
404 | \end{lstlisting} |
---|
405 | \end{quote} |
---|
406 | simply looks up the variable in the local environment. The offset is |
---|
407 | zero because all variables are given their own memory block to prevent |
---|
408 | the use of stray pointers. A similar rule handles global variables, |
---|
409 | with an extra check to ensure that no local variable has the same |
---|
410 | name. Note that the two relations are defined using mutual recursion |
---|
411 | because \lstinline'eval_lvalue' uses \lstinline'eval_expr' for the |
---|
412 | evaluation of the pointer expression in the dereferencing rule. |
---|
413 | |
---|
414 | Casts also have an auxiliary relation to specify the allowed changes, and |
---|
415 | operations on values (including the changes in representation performed by |
---|
416 | casting) are given as functions. |
---|
417 | |
---|
418 | The only new expression in our semantics is the cost label which wraps |
---|
419 | around another expression. It does not change the result, but merely |
---|
420 | prefixes the trace with the given label to identify the branches taken |
---|
421 | in conditional expressions so that accurate cost information can be |
---|
422 | attached to the program: |
---|
423 | \begin{quote} |
---|
424 | \begin{lstlisting} |
---|
425 | | eval_Ecost: $\forall$a,ty,v,l,tr. |
---|
426 | eval_expr ge e m a v tr $\rightarrow$ |
---|
427 | eval_expr ge e m (Expr (Ecost l a) ty) v (tr++Echarge l) |
---|
428 | \end{lstlisting} |
---|
429 | \end{quote} |
---|
430 | \todo{make the semantics of the cost labels clearer} |
---|
431 | |
---|
432 | As the expressions are side-effect free, all of the changes to the state are |
---|
433 | performed by statements. The state itself is represented by records of the |
---|
434 | form |
---|
435 | \begin{quote} |
---|
436 | \begin{lstlisting} |
---|
437 | ninductive state: Type := |
---|
438 | | State: |
---|
439 | $\forall$f: function. |
---|
440 | $\forall$s: statement. |
---|
441 | $\forall$k: cont. |
---|
442 | $\forall$e: env. |
---|
443 | $\forall$m: mem. state |
---|
444 | | Callstate: |
---|
445 | $\forall$fd: fundef. |
---|
446 | $\forall$args: list val. |
---|
447 | $\forall$k: cont. |
---|
448 | $\forall$m: mem. state |
---|
449 | | Returnstate: |
---|
450 | $\forall$res: val. |
---|
451 | $\forall$k: cont. |
---|
452 | $\forall$m: mem. state. |
---|
453 | \end{lstlisting} |
---|
454 | \end{quote} |
---|
455 | During normal execution the state contains the currently executing |
---|
456 | function's definition (used to find \lstinline'goto' labels and also |
---|
457 | to check whether the function is expected to return a value), the |
---|
458 | statement to be executed next, a continuation value to be executed |
---|
459 | afterwards (where successor statements and details of function calls |
---|
460 | and loops are stored), the local environment mapping variables to |
---|
461 | memory locations\footnote{In the semantics all variables are |
---|
462 | allocated, although the compiler may subsequently allocate them to |
---|
463 | registers where possible.} and the current memory state. \todo{need |
---|
464 | to note that all variables 'appear' to be memory allocated, even if |
---|
465 | they're subsequently optimised away.} The function call and return |
---|
466 | states appear to store less information because the details of the |
---|
467 | caller are contained in the continuation. |
---|
468 | |
---|
469 | An example of the statement execution rules is the assignment rule |
---|
470 | (corresponding to the C syntax \lstinline'a1 = a2'), |
---|
471 | \label{page:Sassign} |
---|
472 | \begin{quote} |
---|
473 | \begin{lstlisting} |
---|
474 | ninductive step (ge:genv) : state $\rightarrow$ trace $\rightarrow$ state $\rightarrow$ Prop ≝ |
---|
475 | | step_assign: $\forall$f,a1,a2,k,e,m,psp,loc,ofs,v2,m',tr1,tr2. |
---|
476 | eval_lvalue ge e m a1 psp loc ofs tr1 $\rightarrow$ |
---|
477 | eval_expr ge e m a2 v2 tr2 $\rightarrow$ |
---|
478 | store_value_of_type (typeof a1) m psp loc ofs v2 = Some ? m' $\rightarrow$ |
---|
479 | step ge (State f (Sassign a1 a2) k e m) |
---|
480 | (tr1++tr2) (State f Sskip k e m') |
---|
481 | ... |
---|
482 | \end{lstlisting} |
---|
483 | \end{quote} |
---|
484 | which can be read as: |
---|
485 | \begin{itemize} |
---|
486 | \item if \lstinline'a1' can evaluate to the location \lstinline'psp,loc,ofs', |
---|
487 | \item \lstinline'a2' can evaluate to a value \lstinline'v2', and |
---|
488 | \item storing \lstinline'v2' at location \lstinline'psp,loc,ofs' succeeds, |
---|
489 | yielding the new memory state \lstinline-m'-, then |
---|
490 | \item the program can step from the state about to execute |
---|
491 | \lstinline'Sassign a1 a2' to a state with the updated memory \lstinline-m'- |
---|
492 | about to execute the no-op \lstinline'Sskip'. |
---|
493 | \end{itemize} |
---|
494 | This rule would be followed by one of the rules to which replaces the |
---|
495 | \lstinline'Sskip' statement with the `real' next statement constructed |
---|
496 | from the continuation \lstinline'k'. Note that the only true |
---|
497 | side-effect here is the change in memory --- the local environment is |
---|
498 | initialised once and for all on function entry, and the only events |
---|
499 | appearing in the trace are cost labels used purely for accounting. At |
---|
500 | present this imposes an ordering due to the cost labels. Should this |
---|
501 | prove too restrictive we may change it to produce a set of labels |
---|
502 | encountered. |
---|
503 | |
---|
504 | The \clight{} language provides input and output effects through `external' |
---|
505 | functions and the step rule |
---|
506 | \begin{quote} |
---|
507 | \begin{lstlisting} |
---|
508 | | step_external_function: $\forall$id,targs,tres,vargs,k,m,vres,t. |
---|
509 | event_match (external_function id targs tres) vargs t vres $\rightarrow$ |
---|
510 | step ge (Callstate (External id targs tres) vargs k m) |
---|
511 | t (Returnstate vres k m) |
---|
512 | \end{lstlisting} |
---|
513 | \end{quote} |
---|
514 | which allows the function to be invoked with and return any values subject to |
---|
515 | the enforcement of the typing rules in \lstinline'event_match', which also |
---|
516 | provides the trace. |
---|
517 | |
---|
518 | Cost label statements add the given label to the trace analogously to the cost |
---|
519 | label expressions above. |
---|
520 | |
---|
521 | \section{Executable semantics} |
---|
522 | \label{sec:exec} |
---|
523 | |
---|
524 | We have added an equivalent functional version of the \clight{} semantics that |
---|
525 | can be used to animate programs. The definitions roughly follow the inductive |
---|
526 | semantics, but are necessarily rearranged around pattern matching of the |
---|
527 | relevant parts of the state rather than presenting each case separately. |
---|
528 | |
---|
529 | \subsection{Expressions} |
---|
530 | |
---|
531 | The code corresponding to the variable lookup definitions on |
---|
532 | page~\pageref{page:evalvar} is |
---|
533 | \begin{quote} |
---|
534 | \begin{lstlisting} |
---|
535 | nlet rec exec_expr (ge:genv) (en:env) (m:mem) (e:expr) on e : |
---|
536 | res ($\Sigma$r:val$\times$trace. eval_expr ge en m e (\fst r) (\snd r)) ≝ |
---|
537 | match e with |
---|
538 | [ Expr e' ty $\Rightarrow$ |
---|
539 | match e' with |
---|
540 | [ ... |
---|
541 | | Evar _ $\Rightarrow$ Some ? ( |
---|
542 | do $\langle$l,tr$\rangle$ $\leftarrow$ exec_lvalue' ge en m e' ty; |
---|
543 | do v $\leftarrow$ opt_to_res ? (load_value_of_type' ty m l); |
---|
544 | OK ? $\langle$v,tr$\rangle$) |
---|
545 | ... |
---|
546 | ] |
---|
547 | ] |
---|
548 | and exec_lvalue' (ge:genv) (en:env) (m:mem) (e':expr_descr) (ty:type) on e' : |
---|
549 | res ($\Sigma$r:memory_space $\times$ block $\times$ int $\times$ trace. eval_lvalue ge en m (Expr e' ty) (\fst (\fst (\fst r))) (\snd (\fst (\fst r))) (\snd (\fst r)) (\snd r)) ≝ |
---|
550 | match e' with |
---|
551 | [ Evar id $\Rightarrow$ |
---|
552 | match (get … id en) with |
---|
553 | [ None $\Rightarrow$ Some ? (do $\langle$sp,l$\rangle$ $\leftarrow$ opt_to_res ? (find_symbol ? ? ge id); OK ? $\langle$$\langle$$\langle$sp,l$\rangle$,zero$\rangle$,E0$\rangle$) (* global *) |
---|
554 | | Some loc $\Rightarrow$ Some ? (OK ? $\langle$$\langle$$\langle$Any,loc$\rangle$,zero$\rangle$,E0$\rangle$) (* local *) |
---|
555 | ] |
---|
556 | ... |
---|
557 | \end{lstlisting} |
---|
558 | \end{quote} |
---|
559 | where the result is placed in an error monad (the \lstinline'res' type |
---|
560 | constructor) so that \emph{undefined behaviour} such as dereferencing an |
---|
561 | invalid pointer can be rejected. We use \lstinline'do' notation similar to |
---|
562 | Haskell and CompCert, where |
---|
563 | \begin{quote} |
---|
564 | \begin{lstlisting} |
---|
565 | do x $\leftarrow$ e; e' |
---|
566 | \end{lstlisting} |
---|
567 | \end{quote} |
---|
568 | means evaluate \lstinline'e' and if it yields a value then bind that to |
---|
569 | \lstinline'x' and evaluate \lstinline-e'-, and otherwise propogate the error. |
---|
570 | |
---|
571 | \subsection{Statements} |
---|
572 | |
---|
573 | Evaluating a step of a statement is complicated by the presence of the |
---|
574 | `external' functions for I/O, which can return arbitrary values. These are |
---|
575 | handled by a resumption monad, which on encountering some I/O returns a |
---|
576 | suspension. When the suspension is applied to a value the evaluation of the |
---|
577 | semantics is resumed. Resumption monads are a standard tool for providing |
---|
578 | denotational semantics for input~\cite{Moggi199155} and interleaved |
---|
579 | concurrency~\cite[Chapter 12]{schmidt86}. |
---|
580 | The definition also incorporates errors, and uses a coercion to automatically |
---|
581 | transform values from the plain error monad. |
---|
582 | \todo{should perhaps give more details of the resumption monad?} |
---|
583 | |
---|
584 | The execution of assignments is straightforward, |
---|
585 | \begin{quote} |
---|
586 | \begin{lstlisting} |
---|
587 | nlet rec exec_step (ge:genv) (st:state) on st : (IO eventval io_out ($\Sigma$r:trace $\times$ state. step ge st (\fst r) (\snd r))) ≝ |
---|
588 | match st with |
---|
589 | [ State f s k e m $\Rightarrow$ |
---|
590 | match s with |
---|
591 | [ Sassign a1 a2 $\Rightarrow$ Some ? ( |
---|
592 | ! $\langle$l,tr1$\rangle$ $\leftarrow$ exec_lvalue ge e m a1;: |
---|
593 | ! $\langle$v2,tr2$\rangle$ $\leftarrow$ exec_expr ge e m a2;: |
---|
594 | ! m' $\leftarrow$ store_value_of_type' (typeof a1) m l v2;: |
---|
595 | ret ? $\langle$tr1++tr2, State f Sskip k e m'$\rangle$) |
---|
596 | ... |
---|
597 | \end{lstlisting} |
---|
598 | \end{quote} |
---|
599 | where \lstinline'!' is used in place of \lstinline'do' due to the change in |
---|
600 | monad. The content is essentially the same as the inductive rule given on |
---|
601 | page~\pageref{page:Sassign}. |
---|
602 | |
---|
603 | The handling of external calls uses the |
---|
604 | \begin{quote} |
---|
605 | \begin{lstlisting} |
---|
606 | do_io : ident $\rightarrow$ list eventval $\rightarrow$ IO eventval io_out eventval |
---|
607 | \end{lstlisting} |
---|
608 | \end{quote} |
---|
609 | function to suspend execution: |
---|
610 | \begin{quote} |
---|
611 | \begin{lstlisting} |
---|
612 | ... |
---|
613 | ] |
---|
614 | | Callstate f0 vargs k m $\Rightarrow$ |
---|
615 | match f0 with |
---|
616 | [ ... |
---|
617 | | External f argtys retty $\Rightarrow$ Some ? ( |
---|
618 | ! evargs $\leftarrow$ check_eventval_list vargs (typlist_of_typelist argtys);: |
---|
619 | ! evres $\leftarrow$ do_io f evargs;: |
---|
620 | ! vres $\leftarrow$ check_eventval evres (proj_sig_res (signature_of_type argtys rett |
---|
621 | ret ? $\langle$(Eextcall f evargs evres), Returnstate vres k m$\rangle$) |
---|
622 | ... |
---|
623 | \end{lstlisting} |
---|
624 | \end{quote} |
---|
625 | \todo{say something more about the rest?} |
---|
626 | |
---|
627 | Together with functions to provide the initial state for a program and to |
---|
628 | detect a final state we can write a function to run the program up to a given |
---|
629 | number of steps. Similarly, a corecursive function can return the entire |
---|
630 | execution as a stream of trace and state pairs. |
---|
631 | |
---|
632 | \section{Validation} |
---|
633 | \label{sec:valid} |
---|
634 | |
---|
635 | We have used two methods to validate our executable semantics: we have |
---|
636 | proven them equivalent to the inductive semantics of |
---|
637 | Section~\ref{sec:inductive}, and we have animated small examples of |
---|
638 | key areas. |
---|
639 | |
---|
640 | \subsection{Equivalence to inductive semantics} |
---|
641 | |
---|
642 | To show that the executable semantics are sound with respect to the |
---|
643 | inductive semantics we need to prove that any value produced by each |
---|
644 | function satisfies the corresponding relation, modulo errors and |
---|
645 | resumption. To deal with these monads we lift the properties |
---|
646 | required. In particular, for the resumption monad we ignore error |
---|
647 | values, require the property when a value is produced, and quantify |
---|
648 | over any interaction with the outside world: |
---|
649 | \begin{quote} |
---|
650 | \begin{lstlisting} |
---|
651 | nlet rec P_io O I (A:Type) (P:A $\rightarrow$ Prop) (v:IO O I A) on v : Prop := |
---|
652 | match v return $\lambda$_.Prop with |
---|
653 | [ Wrong $\Rightarrow$ True |
---|
654 | | Value z $\Rightarrow$ P z |
---|
655 | | Interact out k $\Rightarrow$ $\forall$v'.P_io O I A P (k v') |
---|
656 | ]. |
---|
657 | \end{lstlisting} |
---|
658 | \end{quote} |
---|
659 | We can use this lifting with the relations from the inductive |
---|
660 | semantics to state soundness properties: |
---|
661 | \begin{quote} |
---|
662 | \begin{lstlisting} |
---|
663 | ntheorem exec_step_sound: $\forall$ge,st. |
---|
664 | P_io ??? ($\lambda$r. step ge st (\fst r) (\snd r)) (exec_step ge st). |
---|
665 | \end{lstlisting} |
---|
666 | \end{quote} |
---|
667 | The proofs of these theorems use case analysis over the state, a few |
---|
668 | lemmas to break up the expressions in the monad and the other |
---|
669 | soundness results to form the corresponding derivation in the |
---|
670 | inductive semantics. |
---|
671 | |
---|
672 | We experimented with a different way of specifying soundness using |
---|
673 | dependent types: |
---|
674 | \begin{quote} |
---|
675 | \begin{lstlisting} |
---|
676 | nlet rec exec_step (ge:genv) (st:state) on st : (IO eventval io_out ($\Sigma$r:trace $\times$ state. step ge st (\fst r) (\snd r))) ≝ |
---|
677 | \end{lstlisting} |
---|
678 | \end{quote} |
---|
679 | Note the $\Sigma$ type for the result of the function, which shows that |
---|
680 | successful executions are sound with respect to the inductive semantics. |
---|
681 | Matita automatically generates proof obligations for each case due to a |
---|
682 | coercion between the types |
---|
683 | \begin{center} |
---|
684 | \lstinline'option (res T)' \quad and \quad \lstinline'res ($\Sigma$x:T. P x)' |
---|
685 | \end{center} |
---|
686 | (where a branch marked \lstinline'None' would generate a proof obligation to |
---|
687 | show that it is impossible, although the semantics do not use this feature). |
---|
688 | This is intended to mimic Sozeau's \textsc{Russell} language and elaboration |
---|
689 | into Coq~\cite{sozeau06}. The coercion also triggers an automatic mechanism |
---|
690 | in \matita{} to add equalities for each pattern matched. Each case of the |
---|
691 | soundness proof consists of extracting an equality for each computation in the |
---|
692 | monad, |
---|
693 | % "in the monad" is a bit vague here |
---|
694 | making any further case distinctions necessary and applying the corresponding |
---|
695 | rule from the inductive semantics. |
---|
696 | |
---|
697 | However, the soundness proofs then pervade the executable semantics, |
---|
698 | making the correctness proofs more difficult. We decided to keep the |
---|
699 | soundness results separate, partly because of the increased difficulty |
---|
700 | of using the resulting terms in proofs, and partly because they are of |
---|
701 | little consequence once equivalence has been shown. |
---|
702 | |
---|
703 | The completeness results requiring a dual lifting which requires the |
---|
704 | term to reduce to a particular value, allowing for resumptions with |
---|
705 | existential quantification: |
---|
706 | \begin{quote} |
---|
707 | \begin{lstlisting} |
---|
708 | nlet rec yieldsIObare (A:Type) (a:IO io_out io_in A) (v':A) on a : Prop := |
---|
709 | match a with |
---|
710 | [ Value v $\Rightarrow$ v' = v |
---|
711 | | Interact _ k $\Rightarrow$ $\exists$r.yieldsIObare A (k r) v' |
---|
712 | | _ $\Rightarrow$ False |
---|
713 | ]. |
---|
714 | \end{lstlisting} |
---|
715 | \end{quote} |
---|
716 | \begin{quote} |
---|
717 | We then show the completeness theorems, such as |
---|
718 | \begin{lstlisting} |
---|
719 | ntheorem step_complete: $\forall$ge,s,tr,s'. |
---|
720 | step ge s tr s' $\rightarrow$ yieldsIObare ? (exec_step ge s) $\langle$tr,s'$\rangle$. |
---|
721 | \end{lstlisting} |
---|
722 | \end{quote} |
---|
723 | by case analysis on the inductive derivation and a mixture of |
---|
724 | reduction and rewriting. |
---|
725 | \todo{Whole execution equivalence} |
---|
726 | |
---|
727 | \subsection{Animation of simple C programs} |
---|
728 | |
---|
729 | \begin{verbatim} |
---|
730 | MORE: |
---|
731 | library stuff: choice of integers, maps |
---|
732 | check for any axioms |
---|
733 | \end{verbatim} |
---|
734 | |
---|
735 | \bibliographystyle{plain} |
---|
736 | \bibliography{report} |
---|
737 | |
---|
738 | \end{document} |
---|