1\documentclass[11pt,a4paper]{article}
2\usepackage{../../style/cerco}
3\usepackage{a4wide}
4\usepackage{amsfonts}
5\usepackage{amsmath}
6\usepackage{amssymb}
7\usepackage[english]{babel}
8\usepackage{color}
9\usepackage{diagrams}
10\usepackage{graphicx}
11\usepackage{listings}
12\usepackage{microtype}
13\usepackage{skull}
14\usepackage{stmaryrd}
15\usepackage{array}
16\newcolumntype{b}{@{}>{{}}}
17\newcolumntype{B}{@{}>{{}}c<{{}}@{}}
18\newcolumntype{h}[1]{@{\hspace{#1}}}
19\newcolumntype{L}{>{$}l<{$}}
20\newcolumntype{C}{>{$}c<{$}}
21\newcolumntype{R}{>{$}r<{$}}
22\newcolumntype{S}{>{$(}r<{)$}}
23\newcolumntype{n}{@{}}
24\usepackage{wasysym}
25
26\hypersetup{bookmarksopenlevel=2}
27
28\title{
29INFORMATION AND COMMUNICATION TECHNOLOGIES\\
30(ICT)\\
31PROGRAMME\\
32\vspace*{1cm}Project FP7-ICT-2009-C-243881 {\cerco}}
33
34\date{ }
35\author{}
36
37\begin{document}
38\thispagestyle{empty}
39
40\vspace*{-1cm}
41\begin{center}
42\includegraphics[width=0.6\textwidth]{../../style/cerco_logo.png}
43\end{center}
44
45\begin{minipage}{\textwidth}
46\maketitle
47\end{minipage}
48
49
50\vspace*{0.5cm}
51\begin{center}
52\begin{LARGE}
53\bf
54Proof outline for the correctness of the CerCo compiler
55\end{LARGE} 
56\end{center}
57
58\vspace*{2cm}
59\begin{center}
60\begin{large}
61Version 1.0
62\end{large}
63\end{center}
64
65\vspace*{0.5cm}
66\begin{center}
67\begin{large}
68Main Authors:\\
69B. Campbell, D. Mulligan, P. Tranquilli, C. Sacerdoti Coen
70\end{large}
71\end{center}
72
73\vspace*{\fill}
74\noindent
75Project Acronym: {\cerco}\\
76Project full title: Certified Complexity\\
77Proposal/Contract no.: FP7-ICT-2009-C-243881 {\cerco}\\
78
79\clearpage \pagestyle{myheadings} \markright{{\cerco}, FP7-ICT-2009-C-243881}
80
81\tableofcontents
82
83\section{Introduction}
84\label{sect.introduction}
85
In the last project review of the CerCo project, the reviewers
recommended that we quickly outline a paper-and-pencil correctness proof
for each of the stages of the CerCo compiler, in order to allow an
estimation of the complexity and time required to complete the formalization
of the proof. This has been possible starting from month 18, when we
completed the formalization in Matita of the data structures and code of
the compiler.
93
In this document we provide a very high-level, pen-and-paper
sketch of what we view as the best path to completing the correctness proof
for the compiler. In particular, for every translation between two intermediate languages, in both the front- and back-ends, we identify the key translation steps and some invariants that we view as being important for the correctness proof.  We sketch the overall correctness results, and also briefly describe the parts of the proof that had already
been completed at the end of the First Period.
98
In the last section we present an estimate of the effort required
for the certification in Matita of the compiler, and we draw some conclusions.
101
102\section{Front-end: Clight to RTLabs}
103
104The front-end of the CerCo compiler consists of several stages:
105
106\begin{center}
107\begin{minipage}{.8\linewidth}
108\begin{tabbing}
109\quad \= $\downarrow$ \quad \= \kill
110\textsf{Clight}\\
111\> $\downarrow$ \> cast removal\\
112\> $\downarrow$ \> add runtime functions\footnote{Following the last project
113meeting we intend to move this transformation to the back-end}\\
114\> $\downarrow$ \> cost labelling\\
115\> $\downarrow$ \> loop optimizations\footnote{\label{lab:opt2}To be ported from the untrusted compiler and certified only in case of early completion of the certification of the other passes.} (an endo-transformation)\\
116\> $\downarrow$ \> partial redundancy elimination$^{\mbox{\scriptsize \ref{lab:opt2}}}$ (an endo-transformation)\\
117\> $\downarrow$ \> stack variable allocation and control structure
118 simplification\\
119\textsf{Cminor}\\
120\> $\downarrow$ \> generate global variable initialisation code\\
121\> $\downarrow$ \> transform to RTL graph\\
122\textsf{RTLabs}\\
123\> $\downarrow$ \> \\
124\>\,\vdots
125\end{tabbing}
126\end{minipage}
127\end{center}
128
Here, by `endo-transformation', we mean a mapping from a language back to itself:
for instance, the loop optimization step maps the Clight language to itself.
131
132%Our overall statements of correctness with respect to costs will
133%require a correctly labelled program
134There are three layers in most of the proofs proposed:
135\begin{enumerate}
136\item invariants closely tied to the syntax and transformations using
137  dependent types (such as the presence of variable names in environments),
138\item a forward simulation proof relating each small-step of the
139  source to zero or more steps of the target, and
140\item proofs about syntactic properties of the cost labelling.
141\end{enumerate}
The first will support both functional correctness and allow us to
show the totality of some of the compiler stages (that is, that those
stages of the compiler cannot fail).  The second provides the main
functional correctness result, including the preservation of cost
labels in the traces, and the last will be crucial for applying the
correctness results about the costings from the back-end, by showing
that cost labels appear in enough places for all of the execution
costs to be assigned to them.
150
151We will also prove that a suitably labelled RTLabs trace can be turned
152into a \emph{structured trace} which splits the execution trace into
153cost-label to cost-label chunks with nested function calls.  This
154structure was identified during work on the correctness of the
155back-end cost analysis as retaining important information about the
156structure of the execution that is difficult to reconstruct later in
157the compiler.
158
159\subsection{Clight cast removal}
160
161This transformation removes some casts inserted by the parser to make
162arithmetic promotion explicit but which are superfluous (such as
163\lstinline[language=C]'c = (short)((int)a + (int)b);' where
164\lstinline'a' and \lstinline'b' are \lstinline[language=C]'short').
165This is necessary for producing good code for our target architecture.
166
167It only affects Clight expressions, recursively detecting casts that
can be safely eliminated.  The semantics provides a big-step
definition for expressions, so we should be able to show a lock-step
170forward simulation between otherwise identical states using a lemma
171showing that cast elimination does not change the evaluation of
172expressions.  This lemma will follow from a structural induction on
173the source expression.  We have already proved a few of the underlying
174arithmetic results necessary to validate the approach.
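
As an illustration only, the following OCaml sketch shows the recursive cast-detection idea on a toy expression type; the constructors, the \texttt{width} function and the simplification condition are ours for exposition and are much coarser than the conditions actually checked in the Matita development.

\begin{lstlisting}
(* Toy expression type; widths are in bits. *)
type ty = Tint of int
type expr =
  | Const of ty * int
  | Var   of ty * string
  | Cast  of ty * expr
  | Add   of ty * expr * expr

let width (Tint n) = n

(* (short)((int)a + (int)b)  ~>  a + b, performed at the narrower type.
   The real pass checks more carefully when the inner casts can be
   stripped without changing the result of the expression. *)
let rec remove_casts e =
  match e with
  | Cast (t, Add (t', e1, e2)) when width t <= width t' ->
      Add (t, strip t (remove_casts e1), strip t (remove_casts e2))
  | Cast (t, e')    -> Cast (t, remove_casts e')
  | Add (t, e1, e2) -> Add (t, remove_casts e1, remove_casts e2)
  | e               -> e
and strip t e =
  match e with
  | Cast (t', e') when width t' >= width t -> strip t e'
  | e -> e
\end{lstlisting}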
175
176\subsection{Clight cost labelling}
177
178This adds cost labels before and after selected statements and
179expressions, and the execution traces ought to be equivalent modulo
180the new cost labels.  Hence it requires a simple forward simulation
with a limited amount of stuttering wherever a new cost label is
182introduced.  A bound can be given for the amount of stuttering allowed
183based on the statement or continuation to be evaluated next.
184
185We also intend to show three syntactic properties about the cost
186labelling:
187\begin{enumerate}
188\item every function starts with a cost label,
189\item every branching instruction is followed by a cost label (note that
190  exiting a loop is treated as a branch), and
191\item the head of every loop (and any \lstinline'goto' destination) is
192  a cost label.
193\end{enumerate}
194These can be shown by structural induction on the source term.
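
As an illustration, the second property could be phrased as the following check over a toy statement syntax (an OCaml sketch with invented constructors, much simpler than the real Clight syntax; in particular it omits the requirement that exiting a loop is also treated as a branch):

\begin{lstlisting}
type stmt =
  | Skip
  | CostLabel of string * stmt          (* emit cost label, then continue *)
  | Seq       of stmt * stmt
  | If        of string * stmt * stmt   (* condition named by a string    *)
  | While     of string * stmt

let starts_with_label = function CostLabel _ -> true | _ -> false

(* Every branching construct must be followed by cost labels on both of
   its continuations (the loop body plays the role of the branch taken). *)
let rec branches_labelled = function
  | Skip             -> true
  | CostLabel (_, s) -> branches_labelled s
  | Seq (s1, s2)     -> branches_labelled s1 && branches_labelled s2
  | If (_, s1, s2)   ->
      starts_with_label s1 && starts_with_label s2
      && branches_labelled s1 && branches_labelled s2
  | While (_, s)     -> starts_with_label s && branches_labelled s
\end{lstlisting}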
195
196\subsection{Clight to Cminor translation}
197
198This translation is the first to introduce some invariants, with the
199proofs closely tied to the implementation by dependent typing.  These
200are largely complete and show that the generated code enjoys:
201\begin{itemize}
202\item some minimal type safety shown by explicit checks on the
203  Cminor types during the transformation (a little more work remains
204  to be done here, but follows the same form);
205\item that variables named in the parameter and local variable
206  environments are distinct from one another, again by an explicit
207  check;
208\item that variables used in the generated code are present in the
209  resulting environment (either by checking their presence in the
210  source environment, or from a list of freshly generated temporary variables);
211  and
212\item that all \lstinline[language=C]'goto' labels are present (by
213  checking them against a list of source labels and proving that all
214  source labels are preserved).
215\end{itemize}
216
217The simulation will be similar to the relevant stages of CompCert
218(Clight to Csharpminor and Csharpminor to Cminor --- in the event that
219the direct proof is unwieldy we could introduce an intermediate
220language corresponding to Csharpminor).  During early experimentation
221with porting CompCert definitions to the Matita proof assistant we
222found little difficulty reproving the results for the memory model, so
223we plan to port the memory injection properties and use them to relate
each Clight in-memory variable with either the value of a Cminor local
variable or a stack slot, depending on how that variable was classified.
226
227This should be sufficient to show the equivalence of (big-step)
228expression evaluation.  The simulation can then be shown by relating
corresponding blocks of statements and continuations with their Cminor
counterparts and proving that a few steps reach the next matching
state.
232
233The syntactic properties required for cost labels remain similar and a
234structural induction on the function bodies should be sufficient to
235show that they are preserved.
236
237\subsection{Cminor global initialisation code}
238
239This short phase replaces the global variable initialisation data with
240code that executes when the program starts.  Each piece of
241initialisation data in the source is matched by a new statement
242storing that data.  As each global variable is allocated a distinct
243memory block, the program state after the initialisation statements
244will be the same as the original program's state at the start of
245execution, and will proceed in the same manner afterwards.
246
247% Actually, the above is wrong...
248% ... this ought to be in a fresh main function with a fresh cost label
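
A rough OCaml sketch of the idea follows; the \texttt{init\_datum} and statement constructors are invented for illustration, and in the actual pass the generated stores are emitted as Cminor statements executed before the program proper starts.

\begin{lstlisting}
type init_datum = Init_int8 of int | Init_int16 of int | Init_space of int
type stmt       = St_store of string * int * init_datum  (* var, offset, datum *)

let size_of = function
  | Init_int8 _  -> 1
  | Init_int16 _ -> 2
  | Init_space n -> n

(* For a global [x] initialised by [data], emit one store per datum at
   increasing offsets inside the block allocated for [x]; padding
   ([Init_space]) produces no store. *)
let init_stmts (x : string) (data : init_datum list) : stmt list =
  let _, stmts =
    List.fold_left
      (fun (off, acc) d ->
        let acc = match d with
          | Init_space _ -> acc
          | d            -> St_store (x, off, d) :: acc
        in
        (off + size_of d, acc))
      (0, []) data
  in
  List.rev stmts
\end{lstlisting}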
249
250\subsection{Cminor to RTLabs translation}
251
252In this part of the compiler we transform the program's functions into
253control flow graphs.  It is closely related to CompCert's Cminorsel to
254RTL transformation, albeit with target-independent operations.
255
256We already enforce several invariants with dependent types: some type
257safety, mostly shown using the type information from Cminor; and
258that the graph is closed (by showing that each successor was recently
added, or corresponds to a \lstinline[language=C]'goto' label, all of which
are added before the end).  Note that this relies on a
261monotonicity property; CompCert maintains a similar property in a
262similar way while building RTL graphs.  We will also add a result
263showing that all of the pseudo-register names are distinct for use by
264later stages using the same method as Cminor.
265
266The simulation will relate Cminor states to RTLabs states which are about to
267execute the code corresponding to the Cminor statement or continuation.
268Each Cminor statement becomes zero or more RTLabs statements, with a
269decreasing measure based on the statement and continuations similar to
270CompCert's.  We may also follow CompCert in using a relational
271specification of this stage so as to abstract away from the functional
272(and highly dependently typed) definition.
273
274The first two labelling properties remain as before; we will show that
275cost labels are preserved, so the function entry point will be a cost
label, and successors to any statement that are cost labels still
map to cost labels, preserving the condition on branches.  We replace
278the property for loops with the notion that we will always reach a
279cost label or the end of the function after following a bounded number of
280successors.  This can be easily seen in Cminor using the requirement
281for cost labels at the head of loops and after gotos.  It remains to
282show that this is preserved by the translation to RTLabs.  % how?
283
284\subsection{RTLabs structured trace generation}
285
286This proof-only step incorporates the function call structure and cost
287labelling properties into the execution trace.  As the function calls
288are nested within the trace, we need to distinguish between
289terminating and non-terminating function calls.  Thus we use the
290excluded middle (specialised to a function termination property) to do
291this.
292
293Structured traces for terminating functions are built by following the
294flat trace, breaking it into chunks between cost labels and
295recursively processing function calls.  The main difficulties here are
296the non-structurally recursive nature of the function (instead we use
297the size of the termination proof as a measure) and using the RTLabs
298cost labelling properties to show that the constraints of the
299structured traces are observed.  We also show that the lower stack
300frames are preserved during function calls in order to prove that
301after returning from a function call we resume execution of the
302correct code.  This part of the work has already been constructed, but
303still requires a simple proof to show that flattening the structured
304trace recreates the original flat trace.
305
306The non-terminating case follows the trace like the terminating
307version to build up chunks of trace from cost-label to cost-label
308(which, by the finite distance to a cost label property shown before,
309can be represented by an inductive type).  These chunks are chained
310together in a coinductive data structure that can represent
311non-terminating traces.  The excluded middle is used to decide whether
312function calls terminate, in which case the function described above
313constructs an inductive terminating structured trace which is nested
314in the caller's trace.  Otherwise, another coinductive constructor is
315used to embed the non-terminating trace of the callee, generated by
316corecursion.  This part of the trace transformation is currently under
317construction, and will also need a flattening result to show that it
318is correct.
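
To give a feel for the shape of these objects, a heavily simplified OCaml sketch of the two kinds of structured traces follows. The real definitions are dependently typed (co)inductive families in Matita, indexed over states and enforcing the cost-labelling constraints; all names below are illustrative.

\begin{lstlisting}
type state                      (* abstract: an RTLabs state *)

(* Terminating executions: an inductive nest of cost-label-to-cost-label
   chunks, with terminating function calls nested inside their chunk. *)
type trace_label_return =
  | Tlr_base of trace_label_label
  | Tlr_step of trace_label_label * trace_label_return
and trace_label_label =
  | Tll of string * trace_any_label          (* starting cost label, body *)
and trace_any_label =
  | Tal_base of state                                        (* last step of the chunk  *)
  | Tal_step of state * trace_any_label                      (* ordinary step           *)
  | Tal_call of state * trace_label_return * trace_any_label (* nested terminating call *)

(* Diverging executions: chunks chained coinductively (laziness stands in
   for coinduction here); a diverging callee is embedded as a whole. *)
type trace_diverges =
  | Td_step     of trace_label_label * trace_diverges Lazy.t
  | Td_call_div of state * trace_diverges Lazy.t
\end{lstlisting}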
319
320
321\section{Backend: RTLabs to machine code}
322\label{sect.backend.rtlabs.machine.code}
323
324The compiler backend consists of the following intermediate languages, and stages of translation:
325
326\begin{center}
327\begin{minipage}{.8\linewidth}
328\begin{tabbing}
329\quad \=\,\vdots\= \\
330\> $\downarrow$ \>\\
331\> $\downarrow$ \quad \= \kill
332\textsf{RTLabs}\\
333\> $\downarrow$ \> copy propagation\footnote{\label{lab:opt}To be ported from the untrusted compiler and certified only in case of early completion of the certification of the other passes.} (an endo-transformation) \\
334\> $\downarrow$ \> instruction selection\\
335\> $\downarrow$ \> change of memory models in compiler\\
336\textsf{RTL}\\
337\> $\downarrow$ \> constant propagation$^{\mbox{\scriptsize \ref{lab:opt}}}$ (an endo-transformation) \\
338\> $\downarrow$ \> calling convention made explicit \\
339\> $\downarrow$ \> layout of activation records \\
340\textsf{ERTL}\\
341\> $\downarrow$ \> register allocation and spilling\\
342\> $\downarrow$ \> dead code elimination\\
343\textsf{LTL}\\
344\> $\downarrow$ \> function linearisation\\
345\> $\downarrow$ \> branch compression (an endo-transformation) \\
346\textsf{LIN}\\
347\> $\downarrow$ \> relabeling\\
348\textsf{ASM}\\
349\> $\downarrow$ \> pseudoinstruction expansion\\
350\textsf{MCS-51 machine code}\\
351\end{tabbing}
352\end{minipage}
353\end{center}
354
355\subsection{Graph translations}
356RTLabs and most intermediate languages in the back-end have a graph
357representation:
358the code of each function is represented by a graph of instructions.
The graph maps each label (the name of a node) to the instruction
stored at that label (the node itself).
Instructions reference zero or more additional labels that are the immediate
successors of the instruction: zero for return from functions; more than one
for conditional jumps and calls; one in all other cases. The references
from one instruction to its immediate successors are the arcs of the graph.
365
States of graph languages always include a program counter that holds a
representation of a reference to the current instruction.
368
369A translation between two consecutive graph languages maps each instruction
370stored at location $l$ in the first graph and with immediate successors
371$\{l_1,\ldots,l_n\}$ to a subgraph of the output graph that has a single
372entry point at location $l$ and exit arcs to $\{l_1,\ldots,l_n\}$. Moreover,
373the labels of all non entry nodes in the subgraph are distinct from all the
374labels in the source graph.
375
In order to simplify the translations and the related proofs of forward
simulation, after the release of D4.2 and D4.3, we have provided:
378\begin{itemize}
379 \item a new data type (called \texttt{blist}) that represents a
380   sequence of instructions to be added to the output graph.
381   The ``b'' in the name stands for binder, since a \texttt{blist} is
382   either empty, an extension of a \texttt{blist} with an instruction
383   at the front, or the generation of a fresh quantity followed by a
384   \texttt{blist}. The latter feature is used, for instance, to generate
385   fresh register names. The instructions in the list are unlabelled and
386   all of them but the last one are also sequential, like in a linear program.
387 \item a new iterator (called \texttt{b\_graph\_translate}) of type
388\begin{displaymath}
389\mathtt{b\_graph\_translate}: (\mathtt{label} \rightarrow \mathtt{blist})
390\rightarrow \mathtt{graph} \rightarrow \mathtt{graph}
391\end{displaymath}
   The iterator transforms the input graph into the output graph by replacing
   each node with the subgraph that corresponds to the linear \texttt{blist}
   obtained by applying the input function to the node label.
395\end{itemize}
396
397Using the iterator above, the code can be written in such a way that
398the programmer does not see any distinction between writing a transformation
399on linear or graph languages.
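
The following OCaml sketch shows one possible encoding of \texttt{blist}, with the binder represented as a function, together with the intended type of the iterator; the concrete types of labels, registers and instructions are placeholders.

\begin{lstlisting}
type label = string
type reg   = int
type instr                           (* abstract: an unlabelled instruction       *)
type graph                           (* abstract: label-indexed instruction graph *)

type blist =
  | Bnil                             (* empty sequence                      *)
  | Bcons  of instr * blist          (* an instruction at the front         *)
  | Bfresh of (reg -> blist)         (* bind a fresh register, then go on   *)

(* The iterator described above would then have the type
     b_graph_translate : (label -> blist) -> graph -> graph
   replacing the node at every label with the subgraph generated from the
   blist returned for that label, while preserving entry and exit labels. *)
\end{lstlisting}

Representing the binder as a function keeps the generation of fresh names abstract, which is what allows the same transformation code to be read over either linear or graph languages.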
400
401In order to prove simulations for translations obtained using the iterator,
402we will prove the following theorem:
403
404\begin{align*}
405\mathtt{theorem} &\ \mathtt{b\_graph\_translate\_ok}: \\
406& \forall  f.\forall G_{i}.\mathtt{let}\ G_{\sigma} := \mathtt{b\_graph\_translate}\ f\ G_{i}\ \mathtt{in} \\
407&       \forall l \in G_{i}.\mathtt{subgraph}\ (f\ l)\ l\ (next \ l \ G_i)\ G_{\sigma}
408\end{align*}
409
Here \texttt{subgraph} is a computational predicate that, given a \texttt{blist}
$[i_1, \ldots, i_n]$, an entry label $l$, an exit label $l'$ and a graph $G$,
expands to the fact that fetching from $G$ at address $l$ retrieves a node
$i_1$ with a successor $l_1$ that, when fetched, yields a node $i_2$ with a
successor $l_2$, and so on. The successor of $i_n$ is $l'$.
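
Concretely, on a \texttt{blist} of length $n$ (and eliding the fresh-register binders) the unfolding of \texttt{subgraph} can be pictured as follows:

\begin{displaymath}
\mathtt{subgraph}\ [i_1, \ldots, i_n]\ l\ l'\ G \;\triangleq\;
\exists l_1, \ldots, l_{n-1}.\;
G(l) = \langle i_1, l_1 \rangle \,\wedge\,
G(l_1) = \langle i_2, l_2 \rangle \,\wedge\, \cdots \,\wedge\,
G(l_{n-1}) = \langle i_n, l' \rangle
\end{displaymath}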
415
416Proving a forward simulation diagram of the following kind using the theorem
417above is now as simple as doing the same using standard small step operational
418semantics over linear languages.
419
420\begin{align*}
421\mathtt{lemma} &\ \mathtt{execute\_1\_step\_ok}: \\
422&       \forall s.  \mathtt{let}\ s' := s\ \sigma\ \mathtt{in} \\
423&       \mathtt{let}\ l := pc\ s\ \mathtt{in} \\
&       s \stackrel{1}{\rightarrow} s^{*} \Rightarrow \exists n. s' \stackrel{n}{\rightarrow} s'^{*} \wedge s'^{*} = s^{*}\ \sigma
425\end{align*}
426
Because the graph translation preserves the entry and exit labels of
428translated statements, the state translation function $\sigma$ will simply
429preserve the value of the program counter. The program code, which is
430part of the state, is translated using the iterator.
431
432The proof is then roughly the following. Let $l$ be the program counter of the
433input state $s$. We proceed by cases on the current instruction of $s$.
434Let $[i_1, \ldots, i_n]$ be the \texttt{blist} associated to $l$ and $s$
435by the translation function. The witness required for the existential
436statement is simply $n$. By applying the theorem above we know that the
437next $n$ instructions that will be fetched from $s\ \sigma$ will be
438$[i_1, \ldots, i_n]$ and it is now sufficient to prove that they simulate
439the original instruction.
440
441\subsection{The RTLabs to RTL translation}
442\label{subsect.rtlabs.rtl.translation}
443
444The RTLabs to RTL translation pass marks the frontier between the two memory models used in the CerCo project.
445As a result, we require some method of translating between the values that the two memory models permit.
446Suppose we have such a translation, $\sigma$.
447Then the translation between values of the two memory models may be pictured with:
448
449\begin{displaymath}
450\mathtt{Value} ::= \bot \mid \mathtt{int(size)} \mid \mathtt{float} \mid \mathtt{null} \mid \mathtt{ptr} \quad\stackrel{\sigma}{\longrightarrow}\quad \mathtt{BEValue} ::= \bot \mid \mathtt{byte} \mid \mathtt{null}_i \mid \mathtt{ptr}_i
451\end{displaymath}
452
In the front-end, we have both integer and float values, where integer values are `sized', along with null values and pointers. Some front-end values are
representable in a byte, but others require more bits.
455
In the back-end model all values are meant to be represented in a single byte.
Values can therefore be undefined, be one-byte-long integers, or be indexed
fragments of a pointer, null or not. Float values are no longer present, as floating point arithmetic is not supported by the CerCo compiler.
459
460The $\sigma$ map implements a one-to-many relation: a single front-end value
is mapped to a sequence of back-end values when its size is more than one byte.
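
A small OCaml sketch of the two families of values and of the value component of $\sigma$ follows. The constructor names, the two-byte pointer size (as on the MCS-51) and the little-endian chunk order are our own choices for illustration; float values are omitted since they disappear in the back-end.

\begin{lstlisting}
type block                                (* abstract memory block *)
let ptr_size = 2                          (* two-byte pointers, as on the MCS-51 *)

type fe_value =                           (* front-end values *)
  | FUndef
  | FInt  of int * int                    (* size in bytes, value *)
  | FNull
  | FPtr  of block * int                  (* block, offset *)

type be_value =                           (* back-end values *)
  | BUndef
  | BByte of int                          (* a one-byte integer      *)
  | BNull of int                          (* i-th chunk of null      *)
  | BPtr  of block * int * int            (* i-th chunk of a pointer *)

(* A front-end value becomes a sequence of byte-sized back-end values. *)
let sigma_value : fe_value -> be_value list = function
  | FUndef         -> [ BUndef ]
  | FNull          -> List.init ptr_size (fun i -> BNull i)
  | FPtr (b, off)  -> List.init ptr_size (fun i -> BPtr (b, off, i))
  | FInt (size, n) -> List.init size (fun i -> BByte ((n lsr (8 * i)) land 0xff))
\end{lstlisting}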
462
We further require a map, $\sigma$, which maps the front-end \texttt{Memory} to the back-end's notion of \texttt{BEMemory}. Both kinds of memory can be
thought of as instances of a generic \texttt{Mem} data type parameterized over
the kind of values stored in memory.
466
467\begin{displaymath}
468\mathtt{Mem}\ \alpha = \mathtt{Block} \rightarrow (\mathbb{Z} \rightarrow \alpha)
469\end{displaymath}
470
471Here, \texttt{Block} consists of a \texttt{Region} paired with an identifier.
472
473\begin{displaymath}
474\mathtt{Block} ::= \mathtt{Region} \times \mathtt{ID}
475\end{displaymath}
476
477We now have what we need for defining what is meant by the `memory' in the backend memory model.
478Namely, we instantiate the previously defined \texttt{Mem} type with the type of back-end memory values.
479
480\begin{displaymath}
481\mathtt{BEMem} = \mathtt{Mem}~\mathtt{BEValue}
482\end{displaymath}
483
484Memory addresses consist of a pair of back-end memory values:
485
486\begin{displaymath}
\mathtt{Address} = \mathtt{BEValue} \times  \mathtt{BEValue}
488\end{displaymath}
489
The back- and front-end memory models differ in how they represent sized integer values in memory.
491In particular, the front-end stores integer values as a header, with size information, followed by a string of `continuation' blocks, marking out the full representation of the value in memory.
492In contrast, the layout of sized integer values in the back-end memory model consists of a series of byte-sized `chunks':
493
494\begin{center}
495\begin{picture}(0, 25)
496\put(-125,0){\framebox(25,25)[c]{\texttt{v,4}}}
497\put(-100,0){\framebox(25,25)[c]{\texttt{cont}}}
498\put(-75,0){\framebox(25,25)[c]{\texttt{cont}}}
499\put(-50,0){\framebox(25,25)[c]{\texttt{cont}}}
500\put(-15,10){\vector(1, 0){30}}
501\put(25,0){\framebox(25,25)[c]{\texttt{\texttt{v$_1$}}}}
502\put(50,0){\framebox(25,25)[c]{\texttt{\texttt{v$_2$}}}}
503\put(75,0){\framebox(25,25)[c]{\texttt{\texttt{v$_3$}}}}
504\put(100,0){\framebox(25,25)[c]{\texttt{\texttt{v$_4$}}}}
505\end{picture}
506\end{center}
507
508Chunks for pointers are pairs made of the original pointer and the index of the chunk.
509Therefore, when assembling the chunks together, we can always recognize if all chunks refer to the same value or if the operation is meaningless.
510
511The differing memory representations of values in the two memory models imply the need for a series of lemmas on the actions of \texttt{load} and \texttt{store} to ensure correctness.
512The first lemma required has the following statement:
513\begin{displaymath}
514\mathtt{load}\ s\ a\ M = \mathtt{Some}\ v \rightarrow \forall i \leq s.\ \mathtt{load}\ s\ (a + i)\ \sigma(M) = \mathtt{Some}\ v_i
515\end{displaymath}
That is, if we successfully read a value of size $s$ from the front-end memory at address $a$, then we should be able to read each of its chunks from the back-end memory, at the addresses $a + i$ for $i \leq s$.
517
518Next, we must show that \texttt{store} properly commutes with the $\sigma$-map between memory spaces:
519\begin{displaymath}
520\sigma(\mathtt{store}\ a\ v\ M) = \mathtt{store}\ \sigma(v)\ \sigma(a)\ \sigma(M)
521\end{displaymath}
522That is, if we store a value \texttt{v} in the front-end memory \texttt{M} at address \texttt{a} and transform the resulting memory with $\sigma$, then this is equivalent to storing a transformed value $\mathtt{\sigma(v)}$ at address $\mathtt{\sigma(a)}$ into the back-end memory $\mathtt{\sigma(M)}$.
523
524Finally, the commutation properties between \texttt{load} and \texttt{store} are weakened in the $\sigma$-image of the memory.
525Writing \texttt{load}$^*$ for the multiple consecutive iterations of \texttt{load} used to fetch all chunks of a value, we must prove that, when $a \neq a'$:
526\begin{displaymath}
\mathtt{load}^*\ \sigma(a)\ (\mathtt{store}\ \sigma(a')\ \sigma(v)\ \sigma(M)) = \mathtt{load}^*\ \sigma(a)\ \sigma(M)
528\end{displaymath}
529That is, suppose we store a transformed value $\mathtt{\sigma(v)}$ into a back-end memory $\mathtt{\sigma(M)}$ at address $\mathtt{\sigma(a')}$, using \texttt{store}, and then load from the address $\sigma(a)$. Even if $a$ and $a'$ are
530distinct by hypothesis, there is a priori no guarantee that the consecutive
531bytes for the value stored at $\sigma(a)$ are disjoint from those for the
532values stored at $\sigma(a')$. The fact that this holds is a non-trivial
533property of $\sigma$ to be proved.
534
535RTLabs states come in three flavours:
536\begin{displaymath}
537\begin{array}{rll}
538\mathtt{State} & ::=  & (\mathtt{State} : \mathtt{Frame}^* \times \mathtt{Frame} \\
539               & \mid & \mathtt{Call} : \mathtt{Frame}^* \times \mathtt{Args} \times \mathtt{Return} \times \mathtt{Fun} \\
540               & \mid & \mathtt{Return} : \mathtt{Frame}^* \times \mathtt{Value} \times \mathtt{Return}) \times \mathtt{Mem}
541\end{array}
542\end{displaymath}
\texttt{State} is the default state that RTLabs programs are almost always in.
544The \texttt{Call} state is only entered when a call instruction is being executed, and then we immediately return to being in \texttt{State}.
545Similarly, \texttt{Return} is only entered when a return instruction is being executed, before returning immediately to \texttt{State}.
546All RTLabs states are accompanied by a memory, \texttt{Mem}, with \texttt{Call} and \texttt{Return} keeping track of arguments, return addresses and the results of functions.
547\texttt{State} keeps track of a list of stack frames.
548
RTL states differ from their RTLabs counterparts in that they include a program counter \texttt{PC}, a stack pointer \texttt{SP}, an internal stack pointer \texttt{ISP}, a carry flag \texttt{CARRY} and a set of registers \texttt{REGS}:
550\begin{displaymath}
551\mathtt{State} ::= \mathtt{Frame}^* \times \mathtt{PC} \times \mathtt{SP} \times \mathtt{ISP} \times \mathtt{CARRY} \times \mathtt{REGS}
552\end{displaymath}
553The internal stack pointer \texttt{ISP}, and its relationship with the stack pointer \texttt{SP}, needs some comment.
554Due to the design of the MCS-51, and its minuscule stack, it was decided that the compiler would implement an emulated stack in external memory.
As a result, we have two stack pointers in our state: \texttt{ISP}, which points to the real, hardware stack, and \texttt{SP}, which is the stack pointer of the emulated stack in memory.
The emulated stack is used for pushing and popping stack frames when calling or returning from function calls; however, this is done using the hardware stack, indexed by \texttt{ISP}, as an intermediary.
Instructions like \texttt{LCALL} and \texttt{ACALL} are hardwired by the processor's design to push the return address onto the hardware stack. Therefore, after a call has been made, and before it returns, the compiler emits code to move the return address back and forth between the two stacks. Parameters, return values
and local variables are only present in the external stack.
559As a result, for most of the execution of the processor, the hardware stack is empty, or contains a single item ready to be moved into external memory.
560
561Once more, we require a relation $\sigma$ between RTLabs states and RTL states.
Because $\sigma$ is one-to-many (morally, a multi-function),
in the following we use functional notation for $\sigma$, writing $\star$
in the output of $\sigma$ to mean that any value is accepted.
565\begin{displaymath}
566\mathtt{State} \stackrel{\sigma}{\longrightarrow} \mathtt{State}
567\end{displaymath}
568
569Translating an RTLabs state to an RTL state proceeds by cases on the particular type of state we are trying to translate, either a \texttt{State}, \texttt{Call} or a \texttt{Return}.
570For \texttt{State} we perform a further case analysis of the top stack frame, which decomposes into a tuple holding the current program counter value, the current stack pointer and the value of the registers:
571\begin{displaymath}
572\sigma(\mathtt{State} (\mathtt{Frame}^* \times \mathtt{\langle PC, REGS, SP \rangle})) \longrightarrow ((\sigma(\mathtt{Frame}^*), \sigma(\mathtt{PC}), \sigma(\mathtt{SP}), \star, \star, \sigma(\mathtt{REGS})), \sigma(\mathtt{Mem}))
573\end{displaymath}
574Translation then proceeds by translating the remaining stack frames, as well as the contents of the top stack frame. Any value for the internal stack pointer
575and the carry bit is admitted.
576
577Translating \texttt{Call} and \texttt{Return} states is more involved, as a commutation between a single step of execution and the translation process must hold:
578\begin{displaymath}
579\sigma(\mathtt{Return}(-)) \longrightarrow \sigma \circ \text{return one step}
580\end{displaymath}
581
582\begin{displaymath}
583\sigma(\mathtt{Call}(-)) \longrightarrow \sigma \circ \text{call one step}
584\end{displaymath}
585
586Here \emph{return one step} and \emph{call one step} refer to a pair of commuting diagrams relating the one-step execution of a call and return state and translation of both.
587We provide the one step commuting diagrams in Figure~\ref{fig.commuting.diagrams}. The fact that one execution step in the source language is not performed
588in the target language is not problematic for preservation of divergence
589because it is easy to show that every step from a \texttt{Call} or
\texttt{Return} state is always preceded/followed by one step that is always
simulated.
592
593\begin{figure}
594\begin{displaymath}
595\begin{diagram}
596s & \rTo^{\text{one step of execution}} & s'   \\
597  & \rdTo                             & \dTo \\
598  &                                   & \llbracket s'' \rrbracket
599\end{diagram}
600\end{displaymath}
601
602\begin{displaymath}
603\begin{diagram}
604s & \rTo^{\text{one step of execution}} & s'   \\
605  & \rdTo                             & \dTo \\
606  &                                   & \llbracket s'' \rrbracket
607\end{diagram}
608\end{displaymath}
609\caption{The one-step commuting diagrams for \texttt{Call} and \texttt{Return} state translations}
610\label{fig.commuting.diagrams}
611\end{figure}
612
The forward simulation proofs for all steps that do not involve function calls are lengthy, but routine.
614They consist of simulating a front-end operation on front-end pseudo-registers and the front-end memory with sequences of back-end operations on the back-end pseudo-registers and back-end memory.
615The properties of $\sigma$ presented before that relate values and memories will need to be heavily exploited.
616
The simulation of the invocation of functions and of returns from functions is less obvious.
We sketch here what happens in the source code and in its translation.
619
620\begin{displaymath}
621\begin{array}{rcl}
622\mathtt{Call(id,\ args,\ dst,\ pc),\ State(Frame^*, Frame)} & \longrightarrow & \mathtt{Call(M(args), dst)}, \\
623                                                           &                 & \mathtt{PUSH(Frame[PC := after\_return])}
624\end{array}
625\end{displaymath}
626Suppose we are given a \texttt{State} with a list of stack frames, with the top frame being \texttt{Frame}.
627Suppose also that the program counter in \texttt{Frame} points to a \texttt{Call} instruction, complete with arguments and destination address.
Then this is executed by entering a \texttt{Call} state in which the arguments are loaded from memory and the address of the instruction immediately following the \texttt{Call} instruction is filled in; the current stack frame is pushed on top of the stack with this return address substituted for the program counter.
629
630Now, what happens next depends on whether we are executing an internal or an external function.
631In the case where the call is to an external function, we have:
632\begin{displaymath}
633\begin{array}{rcl}
634\mathtt{Call(M(args), dst)},                       & \stackrel{\mathtt{ret\_val = f(M(args))}}{\longrightarrow} & \mathtt{Return(ret\_val,\ dst,\ PUSH(...))} \\
635\mathtt{PUSH(current\_frame[PC := after\_return])} &                                                            & 
636\end{array}
637\end{displaymath}
638That is, the call to the external function enters a return state after first computing the return value by executing the external function on the arguments.
639Then the return state restores the program counter by popping the stack, and execution proceeds in a new \texttt{State}:
640\begin{displaymath}
641\begin{array}{rcl}
642\mathtt{Return(ret\_val,\ dst,\ PUSH(...))} & \longrightarrow & \mathtt{pc = POP\_STACK(regs[dst := M(ret\_val)],\ pc)} \\
                                            &                 & \mathtt{State(regs[dst := M(ret\_val)],\ pc)}
644\end{array}
645\end{displaymath}
646
647Suppose we are executing an internal function, however:
648\begin{displaymath}
649\begin{array}{rcl}
650\mathtt{Call(M(args), dst)}                        & \longrightarrow & \mathtt{SP = alloc,\ regs = \emptyset[- := params]} \\
651\mathtt{PUSH(current\_frame[PC := after\_return])} &                 & \mathtt{State(regs,\ sp,\ pc_\emptyset,\ dst)}
652\end{array}
653\end{displaymath}
654Here, execution of the \texttt{Call} state first pushes the current frame with the program counter set to the address following the function call.
More stack space is allocated, the register map is initialized to the empty map, assigning an undefined value to all registers, then the values of the parameters are inserted into the argument registers, and a new \texttt{State} follows.
After this, the allocated stack space is freed and a \texttt{Return} state is entered:
657\begin{displaymath}
658\begin{array}{rcl}
659\mathtt{sp = alloc,\ regs = \emptyset[- := PARAMS]} & \longrightarrow & \mathtt{free(sp)} \\
660\mathtt{State(regs,\ sp,\ pc_\emptyset,\ dst)}     &                 & \mathtt{Return(M(ret\_val), dst, Frames)}
661\end{array}
662\end{displaymath}
663Then the return state restores the program counter by popping the stack, and execution proceeds in a new \texttt{State}, like the case for external functions:
664\begin{displaymath}
665\begin{array}{rcl}
666\mathtt{free(sp)}                         & \longrightarrow & \mathtt{pc = POP\_STACK(regs[dst := M(ret\_val)],\ pc)} \\
\mathtt{Return(M(ret\_val), dst, frames)} &                 & \mathtt{State(regs[dst := M(ret\_val)],\ pc)}
668\end{array}
669\end{displaymath}
670
671Translation from RTLabs to RTL states proceeds as follows.
672Return states are translated as is:
673\begin{displaymath}
674\mathtt{Return} \longrightarrow \mathtt{Return}
675\end{displaymath}
676
677\texttt{Call} states are translated to \texttt{Call\_ID} states:
678\begin{displaymath}
679\mathtt{Call(id,\ args,\ dst,\ pc)} \longrightarrow \mathtt{Call\_ID(id,\ \sigma'(args),\ \sigma(dst),\ pc)}
680\end{displaymath}
681Here, $\sigma$ and $\sigma'$ are two maps to be defined between pseudo-registers and lists of pseudo-registers, of the type:
682
683\begin{displaymath}
684\sigma: \mathtt{register} \rightarrow \mathtt{list\ register}
685\end{displaymath}
686
687and:
688
689\begin{displaymath}
690\sigma': \mathtt{list\ register} \rightarrow \mathtt{list\ register}
691\end{displaymath}
692
693where $\sigma'$ is implemented as:
694
695\begin{displaymath}
696\sigma' = \mathtt{flatten} \circ \sigma
697\end{displaymath}
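
In OCaml-like notation (with \texttt{sigma} the per-register map just introduced and \texttt{register} the front-end pseudo-register type, both assumed) this is simply:

\begin{lstlisting}
(* Map each front-end pseudo-register to its list of back-end
   pseudo-registers and concatenate the results. *)
let sigma' (rs : register list) : register list =
  List.concat (List.map sigma rs)
\end{lstlisting}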
698
699In the case of RTL, execution proceeds as follows.
700Suppose we are executing a \texttt{CALL\_ID} instruction.
701Then a case split occurs depending on whether we are executing an internal or an external function, as in the RTLabs case:
702\begin{displaymath}
703\begin{diagram}
704& & \llbracket \mathtt{CALL\_ID}(\mathtt{id}, \mathtt{args}, \mathtt{dst}, \mathtt{pc})\rrbracket & & \\
705& \ldTo^{\text{external}} & & \rdTo^{\text{internal}} & \\
706\skull & & & & \mathtt{regs} = [\mathtt{params}/-] \\
707& & & & \mathtt{sp} = \mathtt{ALLOC} \\
708& & & & \mathtt{PUSH}(\mathtt{carry}, \mathtt{regs}, \mathtt{dst}, \mathtt{return\_addr}), \mathtt{pc}_{0}, \mathtt{regs}, \mathtt{sp} \\
709\end{diagram}
710\end{displaymath}
711Here, however, we differ from RTLabs when we attempt to execute an external function, in that we use a daemon (i.e. an axiom that can close any goal) to artificially close the case, as we have not yet implemented external functions in the backend.
712The reason for this lack of implementation is as follows.
Though we have implemented an optimising assembler as the target of the compiler's backend, we have not yet implemented a linker for that assembler, so external functions cannot yet be called.
Whilst external functions are carried through the entirety of the compiler's frontend, we choose not to do the same for the backend, instead eliminating them in RTL.
However, it is plausible that we could have carried external functions forward, in order to eliminate them at a later stage (i.e. when translating from LIN to assembly).
716
717In the case of an internal function being executed, we proceed as follows.
718The register map is initialized to the empty map, where all registers are assigned the undefined value, and then the registers corresponding to the function parameters are assigned the value of the parameters.
719Further, the stack pointer is reallocated to make room for an extra stack frame, then a frame is pushed onto the stack with the correct address to jump back to in place of the program counter.
720
721Note, in particular, that this final act of pushing a frame on the stack leaves us in an identical state to the RTLabs case, where the instruction
722\begin{displaymath}
723\mathtt{PUSH(current\_frame[PC := after\_return])}
724\end{displaymath}
725
726was executed.
727
728The execution of \texttt{Return} in RTL is similarly straightforward, with the return address, stack pointer, and so on, being computed by popping off the top of the stack, and the return value computed by the function being retrieved from memory:
729\begin{align*}
730\mathtt{return\_addr} & := \mathtt{top}(\mathtt{stack}) \\
731v*                    & := M(\mathtt{rv\_regs}) \\
732\mathtt{dst}, \mathtt{sp}, \mathtt{carry}, \mathtt{regs} & := \mathtt{pop} \\
733\mathtt{regs}[v* / \mathtt{dst}] \\
734\end{align*}
735
736\begin{displaymath}
737\begin{diagram}
738s    & \rTo^1 & s' & \rTo^1 & s'' \\
739\dTo &        &    & \rdTo  & \dTo \\
740\llbracket s \rrbracket & \rTo(1,3)^1 & & & \llbracket s'' \rrbracket \\ 
741\mathtt{CALL} \\
742\end{diagram}
743\end{displaymath}
744
745\begin{displaymath}
746\begin{diagram}
747s    & \rTo^1 & s' & \rTo^1 & s'' \\
748\dTo &        &    & \rdTo  & \dTo \\
749\  & \rTo(1,3) & & & \ \\
750\mathtt{RETURN} \\
751\end{diagram}
752\end{displaymath}
753
754\begin{align*}
755\mathrm{RTL\ status} & \ \ \mathrm{ERTL\ status} \\
756\mathtt{sp} & = \mathtt{spl} / \mathtt{sph} \\
757\mathtt{graph} &  \mathtt{graph} + \mathtt{prologue}(s) + \mathtt{epilogue}(s) \\
758& \mathrm{where}\ s = \mathrm{callee\ saved} + \nu \mathrm{RA} \\
759\end{align*}
760
761\begin{displaymath}
762\begin{diagram}
763\mathtt{CALL} & \rTo^1 & \mathtt{inside\ function} \\
764\dTo & & \dTo \\
765\underbrace{\ldots}_{\llbracket \mathtt{CALL} \rrbracket} & \rTo &
766\underbrace{\ldots}_{\mathtt{prologue}} \\
767\end{diagram}
768\end{displaymath}
769
770\begin{displaymath}
771\begin{diagram}
772\mathtt{RETURN} & \rTo^1 & \mathtt{.} \\
773\dTo & & \dTo \\
774\underbrace{\ldots}_{\mathtt{epilogue}} & \rTo &
775\underbrace{\ldots} \\
776\end{diagram}
777\end{displaymath}
778
779\begin{align*}
780\mathtt{prologue}(s) = & \mathtt{create\_new\_frame}; \\
781                       & \mathtt{pop\ ra}; \\
782                       & \mathtt{save\ callee\_saved}; \\
783                                                                                         & \mathtt{get\_params} \\
784                                                                                         & \ \ \mathtt{reg\_params}: \mathtt{move} \\
785                                                                                         & \ \ \mathtt{stack\_params}: \mathtt{push}/\mathtt{pop}/\mathtt{move} \\
786\end{align*}
787
788\begin{align*}
789\mathtt{epilogue}(s) = & \mathtt{save\ return\ to\ tmp\ real\ regs}; \\
790                                                                                         & \mathtt{restore\_registers}; \\
791                       & \mathtt{push\ ra}; \\
792                       & \mathtt{delete\_frame}; \\
793                       & \mathtt{save return} \\
794\end{align*}
795
796\begin{displaymath}
797\mathtt{CALL}\ id \mapsto \mathtt{set\_params};\ \mathtt{CALL}\ id;\ \mathtt{fetch\_result}
798\end{displaymath}
799
800\subsection{The ERTL to LTL translation}
801\label{subsect.ertl.ltl.translation}
802\newcommand{\declsf}[1]{\expandafter\newcommand\expandafter{\csname #1\endcsname}{\mathop{\mathsf{#1}}\nolimits}}
803\declsf{Livebefore}
804\declsf{Liveafter}
805\declsf{Defined}
806\declsf{Used}
807\declsf{Eliminable}
808\declsf{StatementSem}
For the liveness analysis, we aim at a map associating to each
$\ell \in \mathtt{label}$ the set of registers that are live at $\ell$.
811We define the following operators on ERTL statements.
812$$
813\begin{array}{lL>{(ex. $}L<{)$}}
814\Defined(s) & registers defined at $s$ & r_1\leftarrow r_2+r_3 \mapsto \{r_1,C\}, \mathtt{CALL}~id\mapsto \text{caller-save}
815\\
816\Used(s) & registers used at $s$ & r_1\leftarrow r_2+r_3 \mapsto \{r_2,r_3\}, \mathtt{CALL}~id\mapsto \text{parameters}
817\end{array}
818$$
Given $LA:\mathtt{label}\to\mathtt{lattice}$ (where $\mathtt{lattice}$
is the type of sets of registers\footnote{More precisely, it is the lattice
of pairs of sets of pseudo-registers and sets of hardware registers,
with pointwise operations.}), we also have the following
predicates:
824$$
825\begin{array}{lL}
826\Eliminable_{LA}(\ell) & iff $s(\ell)$ has side-effects only on $r\notin LA(\ell)$
827\\&
(ex.\ $\ell : r_1\leftarrow r_2+r_3 \mapsto \{r_1,C\}\cap LA(\ell)=\emptyset$,
  $\mathtt{CALL}~id\mapsto \text{never}$)
830\\
831\Livebefore_{LA}(\ell) &$:=
832  \begin{cases}
833    LA(\ell) &\text{if $\Eliminable_{LA}(\ell)$,}\\
    (LA(\ell)\setminus \Defined(s(\ell)))\cup \Used(s(\ell)) &\text{otherwise}.
835  \end{cases}$
836\end{array}
837$$
838In particular, $\Livebefore$ has type $(\mathtt{label}\to\mathtt{lattice})\to
839\mathtt{label}\to\mathtt{lattice}$.
840
841The equation on which we build the fixpoint is then
842$$\Liveafter(\ell) \doteq \bigcup_{\ell' >_1 \ell} \Livebefore_{\Liveafter}(\ell')$$
843where $\ell' >_1 \ell$ denotes that $\ell'$ is an immediate successor of $\ell$
844in the graph. We do not require the fixpoint to be the least one, so the hypothesis
845on $\Liveafter$ that we require is
846$$\Liveafter(\ell) \supseteq \bigcup_{\ell' >_1 \ell} \Livebefore(\ell')$$
(for brevity we drop the subscript from $\Livebefore$).
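
The following OCaml sketch shows one naive way of computing such a post-fixpoint by round-robin iteration; the helpers \texttt{defined}, \texttt{used}, \texttt{eliminable}, \texttt{successors} and \texttt{labels} are assumed, and string-valued labels and register names are used purely for illustration.

\begin{lstlisting}
module RegSet = Set.Make (String)
module LabMap = Map.Make (String)

(* Livebefore, as defined above, computed from the current candidate
   for Liveafter (the map [la]). *)
let livebefore la l =
  let after = LabMap.find l la in
  if eliminable after l then after
  else RegSet.union (RegSet.diff after (defined l)) (used l)

(* Iterate until liveafter(l) contains the union of livebefore(l') over
   all successors l' of l; any such post-fixpoint is acceptable. *)
let rec liveafter_fix la =
  let updated = ref false in
  let la' =
    List.fold_left
      (fun la l ->
        let wanted =
          List.fold_left
            (fun acc l' -> RegSet.union acc (livebefore la l'))
            RegSet.empty (successors l) in
        let after = LabMap.find l la in
        if RegSet.subset wanted after then la
        else begin updated := true; LabMap.add l (RegSet.union after wanted) la end)
      la labels in
  if !updated then liveafter_fix la' else la'
\end{lstlisting}

Starting the iteration from the everywhere-empty map would give the least fixpoint, but, as noted above, any post-fixpoint is sufficient for the proof.
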
848\subsection{The LTL to LIN translation}
849\label{subsect.ltl.lin.translation}
As detailed elsewhere in the reports, due to the parameterized representation of
the back-end languages, the pass described here is actually much more generic
than the translation from LTL to LIN. It consists of a linearisation pass
853that maps any graph-based back-end language to its corresponding linear form,
854preserving its semantics. In the rest of the section, however, we will keep
855the names LTL and LIN for the two partial instantiations of the parameterized
856language.
857
858We require a map, $\sigma$, from LTL statuses, where program counters are represented as labels in a graph data structure, to LIN statuses, where program counters are natural numbers:
859\begin{displaymath}
860\mathtt{pc : label} \stackrel{\sigma}{\longrightarrow} \mathbb{N}
861\end{displaymath}
862
863The LTL to LIN translation pass also linearises the graph data structure into a list of instructions.
864Pseudocode for the linearisation process is as follows:
865
866\begin{lstlisting}
let rec linearise graph visited required generated todo :=
  match todo with
  | l::todo ->
    if l $\in$ visited then
      -- `l' has already been emitted, so jump back to it
      let generated := generated $\cup\ \{$ Goto l $\}$ in
      let required := required $\cup\ \{$ l $\}$ in
        linearise graph visited required generated todo
    else
      -- Get the instruction at label `l' in the graph
      let lookup := graph(l) in
      let generated := generated $\cup\ \{$ lookup $\}$ in
      -- Mark `l' as visited so that it is never emitted twice
      let visited := visited $\cup\ \{$ l $\}$ in
      -- Find the successor of the instruction at label `l' in the graph
      let successor := succ(l, graph) in
      let todo := successor::todo in
        linearise graph visited required generated todo
  | []      -> (required, generated)
883\end{lstlisting}
884
885It is easy to see that this linearisation process eventually terminates.
886In particular, the size of the visited label set is monotonically increasing, and is bounded above by the size of the graph that we are linearising.
887
In the initial call to \texttt{linearise}, the \texttt{visited}, \texttt{required} and \texttt{generated} sets are empty, and \texttt{todo} is initialized with the singleton list consisting of the entry point of the graph.
889We envisage needing to prove the following invariants on the linearisation function above:
890
891\begin{enumerate}
892\item
893$\mathtt{visited} \approx \mathtt{generated}$, where $\approx$ is \emph{multiset} equality, as \texttt{generated} is a set of instructions where instructions may mention labels multiple times, and \texttt{visited} is a set of labels,
894\item
895$\forall \mathtt{l} \in \mathtt{generated}.\ \mathtt{succ(l,\ graph)} \subseteq \mathtt{required} \cup \mathtt{todo}$,
896\item
897$\mathtt{required} \subseteq \mathtt{visited}$,
898\item
899$\mathtt{visited} \cap \mathtt{todo} = \emptyset$.
900\end{enumerate}
901
902The invariants collectively imply the following properties, crucial to correctness, about the linearisation process:
903
904\begin{enumerate}
905\item
906Every graph node is visited at most once,
907\item
908Every instruction that is generated is generated due to some graph node being visited,
909\item
910The successor instruction of every instruction that has been visited already will eventually be visited too.
911\end{enumerate}
912
Note that, because the LTL to LIN transformation is the first point in the back-end at which the code of
a function is linearised, we must introduce a notion of `well formed function code' suitable for linearised forms.
915In particular, we see the notion of well formedness (yet to be formally defined) resting on the following conditions:
916
917\begin{enumerate}
918\item
919For every jump to a label in a linearised function code, the target label exists at some point in the function code,
920\item
921Each label is unique, appearing only once in the function code,
922\item
923The final instruction of a function code must be a return or an unconditional
924jump.
925\end{enumerate}
926
927We assume that these properties will be easy consequences of the invariants on the linearisation function defined above.
928
929The final condition above is potentially a little opaque, so we explain further.
930The only instructions that can reasonably appear in final position at the end of a function code are returns or backward jumps, as any other instruction would cause execution to `fall out' of the end of the program (for example, when a function invoked with \texttt{CALL} returns, it returns to the next instruction past the \texttt{CALL} that invoked it).
931
932\subsection{The LIN to ASM and ASM to MCS-51 machine code translations}
933\label{subsect.lin.asm.translation}
934
935The LIN to ASM translation step is trivial, being almost the identity function.
936The only non-trivial feature of the LIN to ASM translation is that all labels are `named apart' so that there is no chance of freshly generated labels from different namespaces clashing with labels from another namespace.
937
938The ASM to MCS-51 machine code translation step, and the required statements of correctness, are found in an unpublished manuscript attached to this document.
939This is the most complex translation because of the huge number of cases
940to be addressed and because of the complexity of the two semantics.
941Moreover, in the assembly code we have conditional and unconditional jumps
942to arbitrary locations in the code, which are not supported by the MCS-51
instruction set. The latter has several kinds of jumps, characterized by
different instruction sizes and execution times, but limited in range. For
instance, conditional jumps to destinations that are more than
$2^7$ bytes away from the jump instruction location are not supported at
all and need to be emulated with a code transformation. The problem, which
is known in the literature as branch displacement and which also applies
to modern architectures, is known to be computationally hard. As far as we
know, we will provide the first formally verified proof of correctness for
an assembler that implements branch displacement. We are also providing
the first verified proof of correctness of a mildly optimizing branch
displacement algorithm that, at the moment, is almost finished, but not
described in the companion paper. This proof by itself took about 6 man
months.
956
957\section{Correctness of cost prediction}
958Roughly speaking,
959the proof of correctness of cost prediction shows that the cost of executing
960a labelled object code program is the same as the sum over all labels in the
961program execution trace of the cost statically associated to the label and
962computed on the object code itself.
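
Ignoring function calls for a moment, the naive statement sketched above can be written as follows, where $k(\ell)$ denotes the cost statically computed for the block of object code starting at label $\ell$ (the notation is ours and is only meant to fix intuitions):

\begin{displaymath}
\mathtt{cost}(\mathit{trace}) \;=\; \sum_{\ell\, \in\, \mathtt{labels}(\mathit{trace})} k(\ell)
\end{displaymath}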
963
In the presence of object-level function calls, the previous statement is, however,
incorrect. The reason is twofold. First of all, a function call may diverge.
To the last label that comes before the call, however, we also associate
967the cost of the instructions that follow the call. Therefore, in the
968sum over all labels, when we meet a label we pre-pay for the instructions
969after function calls, assuming all calls to be terminating. This choice is
driven by considerations on the source code. Functions can also be called
inside expressions, and it would be too disruptive to put labels inside
972expressions to capture the cost of instructions that follow a call. Moreover,
973adding a label after each call would produce a much higher number of proof
974obligations in the certification of source programs using Frama-C. The
proof obligations, moreover, would be guarded by the termination of all the functions
involved, which also generates lots of additional complex proof obligations
that have little to do with execution costs. With our approach, instead, we
978put less burden on the user, at the price of proving a weaker statement:
the estimated and actual costs will be the same if and only if the high-level
program converges. For prefixes of diverging programs we can provide
981a similar result where the equality is replaced by an inequality (loss of
982precision).
983
984Assuming totality of functions is however not sufficient yet at the object
985level. Even if a function returns, there is no guarantee that it will transfer
986control back to the calling point. For instance, the function could have
987manipulated the return address from its stack frame. Moreover, an object level
988program can forge any address and transfer control to it, with no guarantee
989on the execution behaviour and labelling properties of the called program.
990
991To solve the problem, we introduced the notion of \emph{structured trace}
which comes in two flavours: structured traces for total programs (an inductive
993type) and structured traces for diverging programs (a co-inductive type based
994on the previous one). Roughly speaking, a structured trace represents the
995execution of a well behaved program that is subject to several constraints
996like
997\begin{enumerate}
998 \item All function calls return control just after the calling point
 \item The execution of every function body starts with a label and ends with
   a RET (even for functions reached by invoking a function pointer)
1001 \item All instructions are covered by a label (required by correctness of
1002   the labelling approach)
1003 \item The target of all conditional jumps must be labelled (a sufficient
1004   but not necessary condition for precision of the labelling approach)
1005 \item \label{prop5} Two structured traces with the same structure yield the same
1006   cost traces.
1007\end{enumerate}
1008
1009Correctness of cost predictions is proved only for structured execution traces,
1010i.e. well behaved programs. The forward simulation proof for all back-end
1011passes will actually be a proof of preservation of the structure of
the structured traces which, because of property \ref{prop5}, will imply
correctness of the cost prediction for the back-end. The Clight to RTLabs
part will also include a proof that associates to each converging execution its
1015converging structured trace and to each diverging execution its diverging
1016structured trace.
1017
There are also two other issues that invalidate the naive statement of
correctness of cost prediction given above. The algorithm that statically
1020computes the cost of blocks is correct only when the object code is \emph{well
1021formed} and the program counter is \emph{reachable}.
1022A well formed object code is such that
1023the program counter will never overflow after the execution step of
1024the processor. An overflow that occurs during fetching but is overwritten
1025during execution is, however, correct and necessary to accept correct
1026programs that are as large as the program memory. Temporary overflows add
1027complications to the proof. A reachable address is an address that can be
1028obtained by fetching (not executing!) a finite number of times from the
1029beginning of the code memory without ever overflowing. The complication is that
1030the static prediction traverses the code memory assuming that the memory will
1031be read sequentially from the beginning and that all jumps jump only to
1032reachable addresses. When this property is violated, the way the code memory
1033is interpreted is uncorrect and the cost computed is totally meaningless.
1034The reachability relation is closed by fetching for well formed programs.
1035The property that calls to function pointers only target reachable and
1036well labelled locations, however, is not statically predictable and it is
1037enforced in the structured trace.
1038
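The following toy OCaml fragment conveys how the set of reachable addresses
can be computed by sequential fetching; the instruction-length oracle and all
names are invented for illustration, and the oracle is assumed to return at
least $1$ so that the loop terminates.
\begin{lstlisting}
(* Toy sketch: collect the addresses reachable by fetching sequentially
   from the beginning of code memory, stopping as soon as the program
   counter leaves the code memory (i.e. would overflow).               *)
let code_memory_size = 0x10000      (* the 64 kB code space of the MCS-51 *)

(* [instr_length pc] stands for the size in bytes of the instruction
   stored at address [pc]; it is an abstract parameter here.           *)
let reachable_addresses (instr_length : int -> int) : int list =
  let rec fetch pc acc =
    if pc >= code_memory_size then List.rev acc
    else fetch (pc + instr_length pc) (pc :: acc)
  in
  fetch 0 []
\end{lstlisting}
The static cost computation traverses the code memory in exactly this fashion.
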
The proof of correctness of cost predictions has been quite complex. Setting
up the right invariants (structured traces, well formed programs, reachability)
and completing the proof has required more than 3 man months, while the initially
estimated effort was much lower. In the paper-and-pencil proof for IMP, the
corresponding proof was obvious and only took two lines.

The proof itself is quite involved. The key lemma
states that the sum of the execution
costs over a structured trace, where the costs are summed in execution order,
is equal to the sum of the execution costs in the order of pre-payment.
The two orders are quite different and the proof is by mutual recursion over
the definition of the converging structured traces, which is a family of three
mutually inductive types. The fact that this property only holds for converging
function calls is hidden in the definition of the structured traces.
Then we need to show that the order of pre-payment
corresponds to the order induced by the cost traces extracted from the
structured trace. Finally, we need to show that the statically computed cost
for one block corresponds to the cost dynamically computed in pre-payment
order.

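In purely illustrative notation, the chain of results above combines into
the following equality for every converging structured trace $\tau$:
\[
\sum_{s \,\in\, \mathit{exec}(\tau)} \mathit{cost}(s)
  \;=\;
\sum_{\ell \,\in\, \mathit{labels}(\tau)} k(\ell)
\]
where the left-hand side sums the cost of each execution step of $\tau$ in
execution order, and the right-hand side charges the statically computed cost
$k(\ell)$ of each block when its label $\ell$ is emitted, i.e.\ in pre-payment
order.
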
\section{Overall results}

Functional correctness of the compiled code can be shown by composing
the simulations to show that the target behaviour matches the
behaviour of the source program, provided the source program does not `go
wrong'.  More precisely, we show that there is a forward simulation
between the source trace and a (flattened structured) trace of the
output, and conclude equivalence because the target's semantics is
given by an executable function, and is hence
deterministic.

Combining this with the correctness of the assignment of costs to cost
labels at the ASM level for a structured trace, we can show that the
cost of executing any compiled function (including the main function)
is equal to the sum of all the values of the cost labels encountered in
the \emph{source code's} trace of the function.

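In the same illustrative notation used earlier, the combined statement for a
converging run of a compiled function has the shape
\[
\mathit{behaviour}(\mathit{target}) = \mathit{behaviour}(\mathit{source})
\qquad\text{and}\qquad
\Delta\mathit{clock}(\mathit{target})
  \;=\; \sum_{\ell \,\in\, \mathit{trace}(\mathit{source})} k(\ell)
\]
i.e.\ the object code exhibits the behaviour of the source program and the
clock cycles it consumes are exactly the sum of the costs of the labels
occurring in the source-level trace.
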
\section{Estimated effort}
Based on the rough analysis performed so far, we can estimate the total
effort for the certification of the compiler. We obtain this estimate by
combining, for each pass: 1) the number of lines of code to be certified;
2) the ratio of effort (in man months) to number of lines of code from
the CompCert project~\cite{compcert} for the CompCert pass that is closest to
ours; 3) an estimate of the complexity of the pass according to the
analysis above. The result is shown in Table~\ref{table}.

\begin{table}[h]
\begin{tabular}{lrrrr}
Pass origin & Code lines & CompCert ratio & Estimated effort & Estimated effort \\
            &            &                & (based on CompCert) & (final) \\
\hline
Common &  4864 & 4.25 \permil & 20.67 & 17.0 \\
Cminor &  1057 & 5.23 \permil &  5.53 &  6.0 \\
Clight &  1856 & 5.23 \permil &  9.71 & 10.0 \\
RTLabs &  1252 & 1.17 \permil &  1.48 &  5.0 \\
RTL    &   469 & 4.17 \permil &  1.95 &  2.0 \\
ERTL   &   789 & 3.01 \permil &  2.38 &  2.5 \\
LTL    &    92 & 5.94 \permil &  0.55 &  0.5 \\
LIN    &   354 & 6.54 \permil &  2.31 &  1.0 \\
ASM    &   984 & 4.80 \permil &  4.72 & 10.0 \\
\hline
Total common    &  4864 & 4.25 \permil & 20.67 & 17.0 \\
Total front-end &  2913 & 5.23 \permil & 15.24 & 16.0 \\
Total back-end  &  6853 & 4.17 \permil & 13.39 & 21.0 \\
\hline
Total           & 14630 & 3.75 \permil & 49.30 & 54.0 \\
\end{tabular}
\caption{\label{table}Estimated effort (in man months)}
\end{table}

We now provide some additional information on the methodology used in the
computation. The passes in the CerCo and CompCert front-ends closely match each
other. However, there is no clear correspondence between the two back-ends.
For instance, we enforce the calling convention immediately after instruction
selection, whereas in CompCert this is performed in a later phase. Similarly, we
linearize the code at the very end, whereas CompCert performs linearization
as soon as possible. Therefore, the first part of the exercise consisted
in shuffling and partitioning the CompCert code in order to assign to each
CerCo pass the CompCert code that performs the same transformation.

After this preliminary step, using the data given in~\cite{compcert} (which
refer to an early version of CompCert), we computed the ratio between
man months and lines of code in CompCert for each CerCo pass. This is shown
in the \emph{CompCert ratio} column of Table~\ref{table}. For those CerCo passes
that have no correspondence in CompCert (like the optimizing assembler) or for
which we have insufficient data, we have used the average of the ratios computed
above.

The \emph{Code lines} column of the table shows the number of lines of code for
each pass in CerCo. The \emph{Estimated effort (based on CompCert)} column is
obtained by multiplying the number of code lines by the CompCert ratio. It
provides an estimate of the effort required (in man months) under the assumption
that the proofs for CerCo and CompCert have the same complexity.

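As a sanity check, the CompCert-based column of the table is reproduced, up
to rounding of the published ratios, by the following small OCaml fragment;
the data is copied verbatim from the table and the per-mil ratios are read as
man months per line of code.
\begin{lstlisting}
(* Recompute the "Estimated effort (based on CompCert)" column:
   effort = code lines * CompCert ratio / 1000 (man months).        *)
let passes =
  [ "Common", 4864, 4.25; "Cminor", 1057, 5.23; "Clight", 1856, 5.23;
    "RTLabs", 1252, 1.17; "RTL",     469, 4.17; "ERTL",    789, 3.01;
    "LTL",      92, 5.94; "LIN",     354, 6.54; "ASM",     984, 4.80 ]

let () =
  List.iter
    (fun (name, lines, ratio) ->
       Printf.printf "%-6s %6.2f man months\n"
         name (float_of_int lines *. ratio /. 1000.))
    passes
\end{lstlisting}
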
The two proof styles, however, are deliberately completely different. Where
CompCert uses non-executable semantics, describing the various semantics with
inductive types, we have preferred executable semantics. Therefore, CompCert
proofs by induction and inversion become proofs by functional inversion,
performed using the Russell methodology (now called Program in Coq, but whose
behaviour differs from Matita's). Moreover, CompCert code is written using
only types that belong to the Hindley-Milner fragment, whereas we have
heavily exploited dependent types all over the code. The dependent type
discipline offers many advantages from the point of view of clarity of the
invariants involved and early detection of errors, and it naturally combines
well with the Russell approach, which is based on dependent types. However, it
is also well known to introduce technical problems all over the code, like
the need to explicitly prove type equalities to be able to manipulate
expressions in certain ways. In many situations, the difficulties encountered
with manipulating dependent types are better addressed by improving the Matita
system, according to our formalization-driven approach to system development.
For this reason, and assuming a pessimistic view of our performance, the last
column of the table presents the final estimate of the effort required, which
also takes into account the proof complexity suggested by the informal proofs
sketched in the previous sections.

\subsection{Contingency plan}
On the basis of the proof strategy sketched in this document and the
estimated effort, we can refine our contingency plan. Should we complete the
certification of the basic compiler ahead of schedule, we will have the choice
of either proving loop optimizations and/or partial redundancy elimination
correct (both tasks that seem difficult to achieve in a short time) or
considering the MCS-51-specific extensions introduced during the first period
and under-used in the formalized prototype. Yet another possibility would be
to better study retargeting of the code and the commutation property between
different compiler passes. The latter study is easily enabled by our
approach, in which all back-end languages are instances of the same parameterized
language.

In the case of a substantial delay in the certification of some
components, we will first address the passes that are more likely to have
undetected bugs, and we will follow a top-down approach, axiomatizing
the properties of the data structures used in the compiler to focus more
on the algorithms. The rationale is that data structures are easier than
algorithms to test using well-known methodologies.
The effort table clearly shows that common definitions
and data structures account for a quarter of the code size and require slightly
less than a third of the total effort. At least half of this effort really goes
into simple data structures (vectors, bounded and unbounded integers, tries
and maps) whose certification is not interesting and whose code could be
taken, without re-proving it, from the library of any other theorem prover.

\section{Conclusions}
The overall exercise, whose details have been only minimally reported here,
has been very useful. It has allowed us to spot at an early stage some critical
points of the proof that required major changes to the proof plan. It has also
shown that the last passes of the compilation (e.g.\ assembly) and cost
prediction on the object code are much more involved than the higher-level
passes.

The final effort estimate certainly comes with a low degree of
confidence. It is however sufficient to conclude that the effort required
is in line with the manpower that was scheduled for the second half of the
second period and for the third period. Compared to the number of man months
declared in Annex I of the contract, we will need more man months. However,
both at UNIBO and UEDIN there have been major differences in hiring with
respect to the Annex. Therefore both sites now have a higher number of man
months available, with the trade-off of a lower level of maturity of the
people employed.

The reviewers suggested that we use this estimate to compare two possible
scenarios: a) proceed as planned, porting all the CompCert proofs to Matita,
or b) port D3.1 and D4.1 to Coq and re-use the CompCert proofs.
We remark here again that the back-ends of the two compilers, from the
memory model onwards, are substantially different: we are not re-proving
correctness of the same piece of code. Moreover, the proof techniques are
different for the front-end too. Switching to the CompCert formalization would
imply abandoning the untrusted compiler, abandoning the experiment with
a different proof technique, abandoning the quest for an open-source
proof, and abandoning the co-development of the formalization and the
Matita proof assistant. In the Commitment Letter~\cite{letter} delivered
to the Officer in May we clarified our perspective on the project
goals and objectives. We do not restate here the point of view presented
in the letter, which can be condensed into ``we value diversity''.

Clearly, had the exercise suggested that concluding the formalization, or
getting close to that, was infeasible in terms of effort, we would have
abandoned our path and embraced the reviewers' suggestion. However, the
analysis we performed in autumn was reassuring, and further progress made
during the winter does not yet show any major delay with respect to the
proof schedule. We therefore plan to continue the certification according
to the more detailed proof plan that emerged from the exercise reported in
this document.

\begin{thebibliography}{2}
\bibitem{compcert} X. Leroy, ``A Formally Verified Compiler Back-end'',
Journal of Automated Reasoning 43(4):363--446, 2009.

\bibitem{letter} The CerCo team, ``Commitment to the Consideration of Reviewer's Recommendation'', 16/05/2011.
\end{thebibliography}


\end{document}