1 | \documentclass[11pt,epsf,a4wide]{article} |
---|
2 | \usepackage[mathletters]{ucs} |
---|
3 | \usepackage[utf8x]{inputenc} |
---|
4 | \usepackage{listings} |
---|
5 | \usepackage{../../style/cerco} |
---|
6 | \newcommand{\ocaml}{OCaml} |
---|
7 | \newcommand{\clight}{Clight} |
---|
8 | \newcommand{\matita}{Matita} |
---|
9 | \newcommand{\sdcc}{\texttt{sdcc}} |
---|
10 | |
---|
11 | \newcommand{\textSigma}{\ensuremath{\Sigma}} |
---|
12 | |
---|
13 | % LaTeX Companion, p 74 |
---|
14 | \newcommand{\todo}[1]{\marginpar{\raggedright - #1}} |
---|
15 | |
---|
16 | \lstdefinelanguage{coq} |
---|
17 | {keywords={Definition,Lemma,Theorem,Remark,Qed,Save,Inductive,Record}, |
---|
18 | morekeywords={[2]if,then,else}, |
---|
19 | } |
---|
20 | |
---|
21 | \lstdefinelanguage{matita} |
---|
22 | {keywords={definition,lemma,theorem,remark,inductive,record,qed,let,rec,match,with,Type,and,on}, |
---|
23 | morekeywords={[2]whd,normalize,elim,cases,destruct}, |
---|
24 | mathescape=true, |
---|
25 | morecomment=[n]{(*}{*)}, |
---|
26 | } |
---|
27 | |
---|
28 | \lstset{language=matita,basicstyle=\small\tt,columns=flexible,breaklines=false, |
---|
29 | keywordstyle=\color{red}\bfseries, |
---|
30 | keywordstyle=[2]\color{blue}, |
---|
31 | commentstyle=\color{green}, |
---|
32 | stringstyle=\color{blue}, |
---|
33 | showspaces=false,showstringspaces=false} |
---|
34 | |
---|
35 | \lstset{extendedchars=false} |
---|
36 | \lstset{inputencoding=utf8x} |
---|
37 | \DeclareUnicodeCharacter{8797}{:=} |
---|
38 | \DeclareUnicodeCharacter{10746}{++} |
---|
39 | \DeclareUnicodeCharacter{9001}{\ensuremath{\langle}} |
---|
40 | \DeclareUnicodeCharacter{9002}{\ensuremath{\rangle}} |
---|
41 | |
---|
42 | |
---|
43 | \title{ |
---|
44 | INFORMATION AND COMMUNICATION TECHNOLOGIES\\ |
---|
45 | (ICT)\\ |
---|
46 | PROGRAMME\\ |
---|
47 | \vspace*{1cm}Project FP7-ICT-2009-C-243881 \cerco{}} |
---|
48 | |
---|
49 | \date{ } |
---|
50 | \author{} |
---|
51 | |
---|
52 | \begin{document} |
---|
53 | \thispagestyle{empty} |
---|
54 | |
---|
55 | \vspace*{-1cm} |
---|
56 | \begin{center} |
---|
57 | \includegraphics[width=0.6\textwidth]{../../style/cerco_logo.png} |
---|
58 | \end{center} |
---|
59 | |
---|
60 | \begin{minipage}{\textwidth} |
---|
61 | \maketitle |
---|
62 | \end{minipage} |
---|
63 | |
---|
64 | |
---|
65 | \vspace*{0.5cm} |
---|
66 | \begin{center} |
---|
67 | \begin{LARGE} |
---|
68 | \bf |
---|
69 | Report n. D3.3\\ |
---|
70 | Executable Formal Semantics of front-end intermediate languages\\ |
---|
71 | \end{LARGE} |
---|
72 | \end{center} |
---|
73 | |
---|
74 | \vspace*{2cm} |
---|
75 | \begin{center} |
---|
76 | \begin{large} |
---|
77 | Version 1.0 |
---|
78 | \end{large} |
---|
79 | \end{center} |
---|
80 | |
---|
81 | \vspace*{0.5cm} |
---|
82 | \begin{center} |
---|
83 | \begin{large} |
---|
84 | Main Authors:\\ |
---|
85 | Brian~Campbell |
---|
86 | \end{large} |
---|
87 | \end{center} |
---|
88 | |
---|
89 | \vspace*{\fill} |
---|
90 | \noindent |
---|
91 | Project Acronym: \cerco{}\\ |
---|
92 | Project full title: Certified Complexity\\ |
---|
93 | Proposal/Contract no.: FP7-ICT-2009-C-243881 \cerco{}\\ |
---|
94 | |
---|
95 | \clearpage \pagestyle{myheadings} \markright{\cerco{}, FP7-ICT-2009-C-243881} |
---|
96 | |
---|
97 | \newpage |
---|
98 | |
---|
99 | \vspace*{7cm} |
---|
100 | \paragraph{Abstract} |
---|
101 | We report on the formalization of the front-end intermediate languages for the |
---|
102 | \cerco{} project's compiler using executable semantics in the \matita{} proof |
---|
103 | assistant. This includes common definitions for fresh identifier handling, |
---|
104 | $n$-bit arithmetic and operations and testing of the semantics. We also |
---|
105 | consider embedding invariants into the semantics for showing correctness |
---|
106 | properties. |
---|
107 | |
---|
108 | This work was Task~3.3 of the \cerco{} project and the languages that were |
---|
109 | formalized were first described in Deliverable~2.1~\cite{d2.1}. They are used |
---|
110 | by the formalized front-end in Task~3.2 to provide the syntax for the |
---|
111 | transformations, and to test them by animating programs in the executable |
---|
112 | semantics. It will also be a crucial part of the specifications for the |
---|
113 | correctness results in Task~3.4. |
---|
114 | |
---|
115 | \newpage |
---|
116 | |
---|
117 | \tableofcontents |
---|
118 | |
---|
119 | % CHECK: clear up any -ize vs -ise |
---|
120 | % CHECK: clear up any "front end" vs "front-end" |
---|
121 | % CHECK: clear up any mentions of languages that aren't textsf'd. |
---|
122 | % CHECK: fix unicode in listings |
---|
123 | |
---|
124 | \section{Introduction} |
---|
125 | |
---|
126 | This work formalizes the front-end intermediate languages from the \cerco{} |
---|
127 | prototype compiler, described in previous deliverables~\cite{d2.1,d2.2}. The |
---|
128 | front-end of the compiler is summarized in Figure~\ref{fig:summary} including |
---|
129 | the intermediate languages and the compiler passes described in the |
---|
130 | accompanying Deliverable 3.2. We have also refined parts of the formal |
---|
131 | development that are used for several of the languages in the compiler. |
---|
132 | |
---|
133 | The input language to the formalized front-end is the \textsf{Clight} |
---|
134 | language. The executable semantics for this language were presented in a |
---|
135 | previous deliverable~\cite{d3.1}. Here we will report on some |
---|
136 | minor changes to its semantics made to better align it with the whole |
---|
137 | development. |
---|
138 | |
---|
139 | The formalization of each language takes the form of definitions for abstract |
---|
140 | syntax and functions providing a small-step executable semantics. This is |
---|
141 | done in the Calculus of Inductive Constructions (CIC), as implemented in the |
---|
142 | \matita{} proof assistant. These definitions will be essential for the |
---|
143 | correctness proofs of the compiler in Task 3.4. |
---|
144 | |
---|
145 | Finally, we will report on work to add several invariants to the |
---|
146 | languages. This activity overlaps with Task 3.4 on the correctness of the |
---|
147 | compiler front-end. However, the use of dependent types mean that this work |
---|
148 | is tied closely to the definition of the languages and the transformations of |
---|
149 | the front-end in Task 3.2. By considering it now we can experiment with and |
---|
150 | judge its impact on the formal semantics, and how feasible retrofitting such |
---|
151 | invariants is. |
---|
152 | |
---|
153 | \begin{figure} |
---|
154 | \begin{center} |
---|
155 | \begin{minipage}{.8\linewidth} |
---|
156 | \begin{tabbing} |
---|
157 | \quad \= $\downarrow$ \quad \= \kill |
---|
158 | \textsf{C} (unformalized)\\ |
---|
159 | \> $\downarrow$ \> CIL parser (unformalized \ocaml)\\ |
---|
160 | \textsf{Clight}\\ |
---|
161 | \> $\downarrow$ \> cast removal\\ |
---|
162 | \> $\downarrow$ \> add runtime functions\\ |
---|
163 | \> $\downarrow$ \> labelling\\ |
---|
164 | \> $\downarrow$ \> stack variable allocation and control structure |
---|
165 | simplification\\ |
---|
166 | \textsf{Cminor}\\ |
---|
167 | \> $\downarrow$ \> generate global variable initialization code\\ |
---|
168 | \> $\downarrow$ \> transform to RTL graph\\ |
---|
169 | \textsf{RTLabs}\\ |
---|
170 | \> $\downarrow$ \> start of target specific back-end\\ |
---|
171 | \>\quad \vdots |
---|
172 | \end{tabbing} |
---|
173 | \end{minipage} |
---|
174 | \end{center} |
---|
175 | \caption{Front-end languages and transformations} |
---|
176 | \label{fig:summary} |
---|
177 | \end{figure} |
---|
178 | |
---|
179 | \subsection{Revisions to the prototype compiler} |
---|
180 | |
---|
181 | Ongoing work to maintain and improve the prototype compiler has resulted in |
---|
182 | several changes, mostly minor. The most important change is that the |
---|
183 | transformations to replace 16 and 32 bit types have been moved from the |
---|
184 | \textsf{Clight} language to the target-specific stage between \textsf{RTLabs} |
---|
185 | and \textsf{RTL} to help generate better code, and the addition of a |
---|
186 | \textsf{Clight} cast removal transformation to reduce the number of 16 and 32 |
---|
187 | bit operations. |
---|
188 | |
---|
189 | The formalized semantics have tracked these changes, and in this report we |
---|
190 | describe them as they currently stand. |
---|
191 | |
---|
192 | \section{Definitions common to several languages} |
---|
193 | |
---|
194 | The semantics for many of the languages in the compiler share some core parts: |
---|
195 | the memory model, environments, runtime values, definitions of operations, |
---|
196 | a monad for encapsulating failure and I/O (using resumptions), and a common |
---|
197 | abstraction for small-step executable semantics. The executable |
---|
198 | memory model was ported from CompCert as part of the work on \textsf{Clight} |
---|
199 | and was reused for the front-end languages\footnote{However, it is likely that |
---|
200 | we will revise the memory model to make it better suited for describing |
---|
201 | all of our compiler, not just the front-end.}. The failure and I/O monad was introduced in the |
---|
202 | previous deliverable on \textsf{Clight}~\cite[\S4.2]{d3.1}. In all of the |
---|
203 | languages except \textsf{Clight} we have a basic form of type, \lstinline'typ' |
---|
204 | identifying integers and pointers along with their sizes. The other parts |
---|
205 | are discussed below, with the only change to the runtime values being the |
---|
206 | representation of integers. |
---|
207 | |
---|
208 | \subsection{Identifiers} |
---|
209 | |
---|
210 | Each language features one or more kinds of names to represent variables, such |
---|
211 | as registers, \texttt{goto} labels or RTL graph labels. We also need to |
---|
212 | describe various maps whose domain is a set of identifiers when defining the |
---|
213 | semantics and compilation. |
---|
214 | |
---|
215 | Previous work on the executable semantics of the target machine |
---|
216 | code included bit vectors and bit vector tries to define various integers in |
---|
217 | the semantics, and give a low level view of memory~\cite{d4.1}. To keep the |
---|
218 | size of the development down we have reused these data structures for |
---|
219 | identifiers and maps respectively. |
---|
220 | |
---|
221 | One difficulty with using fixed size bit vectors for identifiers is that |
---|
222 | fresh name generation can fail if generate too many. While we use an error |
---|
223 | monad to deal with failures, we wish to minimize its use in the compiler. |
---|
224 | Thus we add a flag to detect overflows, and check it after the phase of the |
---|
225 | compiler is complete to report exhaustion. The rest of the phase can then be |
---|
226 | written as if name generation always succeeds. In practice this will never |
---|
227 | occur on normal programs because more identifiers of each sort are available |
---|
228 | than bytes of code memory on the target. |
---|
229 | |
---|
230 | Given the wide variety of identifiers used in the compiler we also |
---|
231 | wish to separate the different classes of identifier. Thus we encapsulate the |
---|
232 | bit vector representing the identifier in a datatype that also carries a tag |
---|
233 | identifying which class of identifier we are using: |
---|
234 | \begin{lstlisting} |
---|
235 | inductive identifier (tag:String) : Type[0] ≝ |
---|
236 | an_identifier : Word $\rightarrow$ identifier tag. |
---|
237 | \end{lstlisting} |
---|
238 | The tries are also tagged in the same manner. These tags have also proved |
---|
239 | useful during testing by making the resulting terms more readable. |
---|
240 | |
---|
241 | \subsection{Machine integers and arithmetic} |
---|
242 | |
---|
243 | The bit vectors in~\cite{d4.1} also came equipped with some basic arithmetic |
---|
244 | for the target semantics. The front-end required these operations to be |
---|
245 | generalized and extended. In particular, we required operations such as zero |
---|
246 | and sign extension and translation between bit vectors and full integers. It |
---|
247 | also became apparent that while the original definitions worked reasonably on |
---|
248 | 8-bit vectors, they did not scale up to 32-bit integers. The definitions were |
---|
249 | then reworked to make them efficient enough to animate programs in the |
---|
250 | front-end semantics. |
---|
251 | |
---|
252 | \subsection{Front-end operations} |
---|
253 | |
---|
254 | The two front-end intermediate languages, \textsf{Cminor} and \textsf{RTLabs}, |
---|
255 | share the same set of operations on values. They differ from |
---|
256 | \textsf{Clight}'s operations by incorporating casts and by having a separate |
---|
257 | operation for each type of data operated upon. For example, subtraction of |
---|
258 | pointers is treated as a different operation from subtraction of integers. |
---|
259 | |
---|
260 | A common semantics is given for these operations in the form of simple |
---|
261 | CIC functions on the operation and runtime values. |
---|
262 | |
---|
263 | \subsection{Presentation of small-step executable semantics} |
---|
264 | \label{sec:smallstepexec} |
---|
265 | |
---|
266 | Each language's semantics is described by an instantiation of two records for |
---|
267 | defining transition systems. We already use these to animate the semantics of |
---|
268 | the languages for testing, but they will also be used to state the simulation |
---|
269 | properties we will prove in Tasks~3.4 and~4.4. |
---|
270 | |
---|
271 | First we describe suitable transition systems, |
---|
272 | \begin{lstlisting} |
---|
273 | record trans_system (outty:Type[0]) (inty:outty $\rightarrow$ Type[0]) : Type[2] := |
---|
274 | { global : Type[1] |
---|
275 | ; state : global $\rightarrow$ Type[0] |
---|
276 | ; is_final : $\forall$g. state g $\rightarrow$ option int |
---|
277 | ; step : $\forall$g. state g $\rightarrow$ IO outty inty (trace$\times$(state g)) |
---|
278 | }. |
---|
279 | \end{lstlisting} |
---|
280 | where we have some type of \lstinline'global' data that remains fixed |
---|
281 | throughout evaluation (typically used for the global environment), a type for |
---|
282 | states, a function to detect a final successful state and a step function |
---|
283 | which can also return an error or request for I/O. The type of states may |
---|
284 | depend on the fixed data so that we will be able to add invariants asserting |
---|
285 | that (for example) all the global variables referenced by the program state |
---|
286 | exist. |
---|
287 | |
---|
288 | We also define a coinductive description of executions and a cofixpoint to |
---|
289 | produce them. The second record extends the transition system with |
---|
290 | a type of programs and functions to initialise the transition system: |
---|
291 | \begin{lstlisting} |
---|
292 | record fullexec (outty:Type[0]) (inty:outty $\rightarrow$ Type[0]) : Type[2] := |
---|
293 | { program : Type[0] |
---|
294 | ; es1 :> trans_system outty inty |
---|
295 | ; make_global : program $\rightarrow$ global ?? es1 |
---|
296 | ; make_initial_state : $\forall$p:program. res (state ?? es1 (make_global p)) |
---|
297 | }. |
---|
298 | \end{lstlisting} |
---|
299 | Finally another function is given which uses them to produce a full execution |
---|
300 | starting from the program alone. |
---|
301 | |
---|
302 | \section{\textsf{Clight} modifications} |
---|
303 | |
---|
304 | The \textsf{Clight} input language remained largely the same as in the |
---|
305 | previous deliverable~\cite{d3.1}. The principal changes were to use the |
---|
306 | identifiers and arithmetic described above in place of the arbitrarily large |
---|
307 | integers used before. For the identifiers, this relieved us of the burden of |
---|
308 | adding an efficient datatype for maps by reusing the bit vector tries instead. |
---|
309 | |
---|
310 | The arithmetic replaced a dependent pair of an arbitrary integer and a proof |
---|
311 | that it was in range of 32 bit integers by the exact bit vector for each |
---|
312 | size of integer. This direct approach is closer to the implementation and |
---|
313 | more obviously correct --- no extra precision can be left in by accident. |
---|
314 | |
---|
315 | \section{\textsf{Cminor}} |
---|
316 | |
---|
317 | The \textsf{Cminor} language does not store local variables in memory, and has |
---|
318 | simpler control structures than \textsf{Clight}. It is similar in nature to |
---|
319 | the \textsf{Cminor} language in CompCert, although the semantics have been |
---|
320 | based on the \cerco{} prototype rather than ported from CompCert. The syntax |
---|
321 | is similar to the prototype, except that the types attached to expressions are |
---|
322 | restricted so that some corner cases are ruled out in the \textsf{Cminor} to |
---|
323 | \textsf{RTLabs} stage (see the accompanying Deliverable 3.2 for details): |
---|
324 | \begin{lstlisting} |
---|
325 | inductive expr : typ $\rightarrow$ Type[0] ≝ |
---|
326 | | Id : $\forall$t. ident $\rightarrow$ expr t |
---|
327 | | Cst : $\forall$t. constant $\rightarrow$ expr t |
---|
328 | | Op1 : $\forall$t,t'. unary_operation $\rightarrow$ expr t $\rightarrow$ expr t' |
---|
329 | | Op2 : $\forall$t1,t2,t'. binary_operation $\rightarrow$ expr t1 $\rightarrow$ expr t2 $\rightarrow$ expr t' |
---|
330 | | Mem : $\forall$t,r. memory_chunk $\rightarrow$ expr (ASTptr r) $\rightarrow$ expr t |
---|
331 | | Cond : $\forall$sz,sg,t. expr (ASTint sz sg) $\rightarrow$ expr t $\rightarrow$ expr t $\rightarrow$ expr t |
---|
332 | | Ecost : $\forall$t. costlabel $\rightarrow$ expr t $\rightarrow$ expr t. |
---|
333 | \end{lstlisting} |
---|
334 | For example, note that conditional expressions only switch on integer |
---|
335 | expressions. |
---|
336 | In principle we could extend this to statically ensure that only well-typed |
---|
337 | \textsf{Cminor} expressions are considered, and we will consider this as part |
---|
338 | of the work on correctness in Task 3.4. |
---|
339 | |
---|
340 | We also provide a variant of the syntax where the only initialization data is |
---|
341 | the size of each global variable, for use after the initialization code has |
---|
342 | been generated. |
---|
343 | |
---|
344 | The definition of the semantics is routine: a functional definition of a |
---|
345 | single small-step of the machine is given, reusing the memory model, |
---|
346 | environments, arithmetic and operations mentioned above. |
---|
347 | |
---|
348 | \section{\textsf{RTLabs}} |
---|
349 | |
---|
350 | The \textsf{RTLabs} language provides a target independent Register Transfer |
---|
351 | Language, where programs are represented as control flow graphs. We use the |
---|
352 | identifiers described above for the graph labels and the maps for the graph |
---|
353 | itself. The tagging mechanism ensures that labels cannot be mixed up with |
---|
354 | other identifiers in the program (in particular, we cannot accidentally reuse |
---|
355 | a \texttt{goto} label from Cminor where a graph label should appear). |
---|
356 | |
---|
357 | Otherwise, the syntax and semantics of \textsf{RTLabs} mirrors that of the |
---|
358 | prototype compiler. Some of the syntax is shown below, including the type of |
---|
359 | the control flow graphs. The same common elements are used as for \textsf{Cminor}, |
---|
360 | including the front-end operations mentioned above. |
---|
361 | |
---|
362 | \begin{lstlisting}[language=matita] |
---|
363 | inductive statement : Type[0] ≝ |
---|
364 | | St_skip : label $\rightarrow$ statement |
---|
365 | | St_cost : costlabel $\rightarrow$ label $\rightarrow$ statement |
---|
366 | | St_const : register $\rightarrow$ constant $\rightarrow$ label $\rightarrow$ statement |
---|
367 | | St_op1 : unary_operation $\rightarrow$ register $\rightarrow$ register $\rightarrow$ label $\rightarrow$ statement |
---|
368 | ... |
---|
369 | | St_return : statement |
---|
370 | . |
---|
371 | |
---|
372 | definition graph : Type[0] $\rightarrow$ Type[0] ≝ identifier_map LabelTag. |
---|
373 | |
---|
374 | record internal_function : Type[0] ≝ |
---|
375 | { f_labgen : universe LabelTag |
---|
376 | ; f_reggen : universe RegisterTag |
---|
377 | ; f_result : option (register $\times$ typ) |
---|
378 | ; f_params : list (register $\times$ typ) |
---|
379 | ; f_locals : list (register $\times$ typ) |
---|
380 | ; f_stacksize : nat |
---|
381 | ; f_graph : graph statement |
---|
382 | }. |
---|
383 | \end{lstlisting} |
---|
384 | |
---|
385 | \section{Testing} |
---|
386 | |
---|
387 | To provide some assurance that the semantics were properly implemented, and to |
---|
388 | support the testing described in the accompanying Deliverable 3.2, we have |
---|
389 | adapted the pretty printers in the prototype compiler to produce \matita{} |
---|
390 | terms for the syntax of each language described above. |
---|
391 | |
---|
392 | A few common definitions were added for animating the small-step semantics |
---|
393 | of any of the front-end languages in \matita{}, given a bound on |
---|
394 | the number of steps to execute and any input required. These use the |
---|
395 | definitions described in Section~\ref{sec:smallstepexec} to perform the actual |
---|
396 | execution. We then used a small selection of test cases to ensure basic |
---|
397 | functionality. However, this is still a time consuming process, so more |
---|
398 | testing will carried out once the extraction of CIC terms to \ocaml{} programs |
---|
399 | is implemented in \matita. |
---|
400 | |
---|
401 | \section{Embedding invariants} |
---|
402 | |
---|
403 | Each phase of the prototype compiler can fail in a number of places if the |
---|
404 | input language permits programs that are badly structured in some sense: a |
---|
405 | missing label in a \texttt{goto} statement or CFG, an undefined variable name, |
---|
406 | a \texttt{break} statement outside of a loop or \texttt{switch}, and so on. |
---|
407 | We wish to restrict our intermediate languages using dependent types to remove |
---|
408 | as many `junk' programs as possible to rule out such failures. We also hope |
---|
409 | that such restrictions will help in other correctness proofs. |
---|
410 | |
---|
411 | This goal lies in the overlap between several tasks in the project: it |
---|
412 | involves manipulating the syntax and semantics of the intermediate languages |
---|
413 | (the present work), the encoding of the front-end compiler phases in \matita{} |
---|
414 | (Task 3.2) and the correctness of the front-end (Task 3.4). Thus this work is |
---|
415 | rather experimental; it is being carried out on branches in our source code |
---|
416 | repository and the final form will be decided and merged in during Task 3.4. |
---|
417 | |
---|
418 | So far we have tried adding two forms of invariant --- one using dependent |
---|
419 | types to index statements in \textsf{Cminor} by their block depth, and the |
---|
420 | other asserts that variables and labels are present in the appropriate |
---|
421 | environments by adding a separate invariant to each function. Note that these |
---|
422 | do not yet cover all of the properties that a program in these languages is |
---|
423 | expected to enjoy; for example, there are currently no checks that references |
---|
424 | to globals are well-defined. |
---|
425 | |
---|
426 | \subsection{\textsf{Cminor} block depth} |
---|
427 | |
---|
428 | The \textsf{Cminor} language has relatively simple control structures. |
---|
429 | Statements are provided for infinite loops, non-looping blocks and |
---|
430 | exiting an arbitrary number of blocks (for \texttt{break}, failing loop guards |
---|
431 | and the switch statement\footnote{We are considering replacing the |
---|
432 | \textsf{Cminor} switch statement with one that uses \texttt{goto}-like labels |
---|
433 | in both the prototype and the formalized compilers, but for now we stick with |
---|
434 | this CompCert-style arrangement.}). |
---|
435 | |
---|
436 | However, this means that there are badly-formed \textsf{Cminor} programs such |
---|
437 | as |
---|
438 | \begin{lstlisting}[language={}] |
---|
439 | int main() { |
---|
440 | block { |
---|
441 | loop { |
---|
442 | exit 5 |
---|
443 | } |
---|
444 | } |
---|
445 | } |
---|
446 | \end{lstlisting} |
---|
447 | where we attempt to exit more blocks than exist. To rule these out (including |
---|
448 | demonstrating that the previous phase of the compiler does not generate them) |
---|
449 | we can index the statements of the language by the depth of the enclosing |
---|
450 | blocks. |
---|
451 | |
---|
452 | The adaption of the syntax adds the depth to every statement, and uses bounded |
---|
453 | integers in the exit and switch statements: |
---|
454 | \begin{lstlisting}[language=matita] |
---|
455 | inductive stmt : $\forall$blockdepth:nat. Type[0] ≝ |
---|
456 | | St_skip : $\forall$n. stmt n |
---|
457 | ... |
---|
458 | | St_loop : $\forall$n. stmt n $\rightarrow$ stmt n |
---|
459 | | St_block : $\forall$n. stmt (S n) $\rightarrow$ stmt n |
---|
460 | | St_exit : $\forall$n. Fin n $\rightarrow$ stmt n |
---|
461 | (* expr to switch on, table of <switch value, #blocks to exit>, default *) |
---|
462 | | St_switch : expr $\rightarrow$ $\forall$n. list (int $\times$ (Fin n)) $\rightarrow$ Fin n $\rightarrow$ stmt n |
---|
463 | ... |
---|
464 | \end{lstlisting} |
---|
465 | where \texttt{stmt n} is a statement enclosed in \texttt{n} blocks, |
---|
466 | and \texttt{Fin n} is a standard construction for a natural number which is at |
---|
467 | most \texttt{n}. |
---|
468 | |
---|
469 | In the semantics the number of blocks is also added to the continuations and |
---|
470 | state, and the function to find the continuation from an exit statement can be |
---|
471 | made failure-free. |
---|
472 | We note in passing that adding this parameter detected a small mistake in the |
---|
473 | semantics concerning continuations and tail calls, although the mistake itself |
---|
474 | was benign. |
---|
475 | |
---|
476 | \subsection{Identifier invariants} |
---|
477 | |
---|
478 | To show that the variables and labels occurring in the body of a function |
---|
479 | are present in the relevant structures we add an additional invariant to the |
---|
480 | function records. |
---|
481 | |
---|
482 | For \textsf{Cminor} we use a higher-order predicate which recursively applies |
---|
483 | a predicate to all substatements: |
---|
484 | \begin{lstlisting}[language=matita] |
---|
485 | let rec stmt_P (P:stmt $\rightarrow$ Prop) (s:stmt) on s : Prop ≝ |
---|
486 | match s with |
---|
487 | [ St_seq s1 s2 $\Rightarrow$ P s $\wedge$ stmt_P P s1 $\wedge$ stmt_P P s2 |
---|
488 | | St_ifthenelse _ _ _ s1 s2 $\Rightarrow$ P s $\wedge$ stmt_P P s1 $\wedge$ stmt_P P s2 |
---|
489 | | St_loop s' $\Rightarrow$ P s $\wedge$ stmt_P P s' |
---|
490 | | St_block s' $\Rightarrow$ P s $\wedge$ stmt_P P s' |
---|
491 | | St_label _ s' $\Rightarrow$ P s $\wedge$ stmt_P P s' |
---|
492 | | St_cost _ s' $\Rightarrow$ P s $\wedge$ stmt_P P s' |
---|
493 | | _ $\Rightarrow$ P s |
---|
494 | ]. |
---|
495 | \end{lstlisting} |
---|
496 | Dependent pattern matching on statements thus allows an accompanying |
---|
497 | \lstinline'stmt_P' fact to be unfold to the predicate on the current statement |
---|
498 | and the predicate applied to all substatements. |
---|
499 | |
---|
500 | We require two properties to hold in \textsf{Cminor} functions: |
---|
501 | \begin{enumerate} |
---|
502 | \item All variables in the body are present in the list of parameters or the |
---|
503 | list of variables for the function (this also uses a similar recursive |
---|
504 | predicate on expressions). |
---|
505 | \item All labels in \texttt{goto} statements appear in a label statement. |
---|
506 | \end{enumerate} |
---|
507 | The function definition thus becomes: |
---|
508 | \begin{lstlisting}[language=matita] |
---|
509 | record internal_function : Type[0] ≝ |
---|
510 | { f_return : option typ |
---|
511 | ; f_params : list (ident $\times$ typ) |
---|
512 | ; f_vars : list (ident $\times$ typ) |
---|
513 | ; f_stacksize : nat |
---|
514 | ; f_body : stmt |
---|
515 | ; f_inv : stmt_P ($\lambda$s.stmt_vars ($\lambda$i.Exists ? ($\lambda$x.\fst x = i) (f_params @ f_vars)) s $\wedge$ |
---|
516 | stmt_labels ($\lambda$l.Exists ? ($\lambda$l'.l' = l) (labels_of f_body)) s) f_body |
---|
517 | }. |
---|
518 | \end{lstlisting} |
---|
519 | where \lstinline'stmt_vars' and \lstinline'stmt_labels' constrain the |
---|
520 | variables and labels that appear directly in a statement (but not |
---|
521 | substatements) to appear in the given list, and |
---|
522 | \lstinline'labels_of' returns a list of all the labels defined in a |
---|
523 | statement. |
---|
524 | |
---|
525 | The \textsf{Clight} semantics can be amended to use these invariants, although |
---|
526 | the main benefit is for the compiler stages (see the accompanying Deliverable |
---|
527 | 3.2 for details). The semantics require the invariants to be added to the |
---|
528 | state and continuations. It was convenient to split the continuations between |
---|
529 | the local continuation representing the rest of the code to be executed within |
---|
530 | the function, and the stack of function calls because it becomes easier to |
---|
531 | state the property on the local continuation alone. |
---|
532 | The invariant for variables is |
---|
533 | slightly different --- we require that every variable appear in the local |
---|
534 | environment. We use \lstinline'f_inv' from the function to establish this |
---|
535 | invariant when the environment is set up on function entry. |
---|
536 | |
---|
537 | It is unclear whether changing the semantics is really worthwhile. It |
---|
538 | witnesses that the invariants are those we wanted, but makes no difference to |
---|
539 | the actual execution of the program, especially as the execution can still |
---|
540 | fail due to genuine runtime errors. Moreover, it is unclear what effect the |
---|
541 | presence of proof terms and more dependent pattern matching in the semantics |
---|
542 | will have on the complexity of future correctness proofs. We plan to examine |
---|
543 | this issue during Task 3.4. |
---|
544 | |
---|
545 | We use a similar method to specify the invariant that the \textsf{RTLabs} |
---|
546 | graph is closed --- that is, any successor labels in a statement in the graph |
---|
547 | are present in the graph. The definition is simpler in \textsf{RTLabs} |
---|
548 | because the flat representation of the graph does not require recursive |
---|
549 | definitions like \lstinline'stmt_P' above. |
---|
550 | |
---|
551 | \section{Conclusion} |
---|
552 | |
---|
553 | We have developed executable semantics for each of the front-end languages of |
---|
554 | the \cerco{} compiler. These will form the basis of the correctness |
---|
555 | statements for each stage of the compiler in Task 3.4. We have also shown |
---|
556 | that useful invariants can be added as dependent types, and intend to use |
---|
557 | these in subsequent work. |
---|
558 | |
---|
559 | \bibliographystyle{plain} |
---|
560 | \bibliography{report} |
---|
561 | |
---|
562 | \end{document} |
---|