1\documentclass[a4paper, 10pt]{article}
2
3\usepackage{a4wide}
4\usepackage{amsfonts}
5\usepackage{amsmath}
6\usepackage{amssymb}
7\usepackage[english]{babel}
8\usepackage{color}
9\usepackage{diagrams}
10\usepackage{graphicx}
11\usepackage[colorlinks]{hyperref}
12\usepackage[utf8x]{inputenc}
13\usepackage{listings}
14\usepackage{microtype}
15\usepackage{skull}
16\usepackage{stmaryrd}
17\usepackage{array}
18\newcolumntype{b}{@{}>{{}}}
19\newcolumntype{B}{@{}>{{}}c<{{}}@{}}
20\newcolumntype{h}[1]{@{\hspace{#1}}}
21\newcolumntype{L}{>{$}l<{$}}
22\newcolumntype{C}{>{$}c<{$}}
23\newcolumntype{R}{>{$}r<{$}}
24\newcolumntype{S}{>{$(}r<{)$}}
25\newcolumntype{n}{@{}}
26\usepackage{wasysym}
27
28
29\lstdefinelanguage{matita-ocaml} {
30  mathescape=true
31}
32\lstset{
33  language=matita-ocaml,basicstyle=\tt,columns=flexible,breaklines=false,
34  showspaces=false, showstringspaces=false, extendedchars=false,
35  inputencoding=utf8x, tabsize=2
36}
37
38\title{Proof outline for the correctness of the CerCo compiler}
39\date{\today}
40\author{The CerCo team}
41
42\begin{document}
43
44\maketitle
45
46\section{Introduction}
47\label{sect.introduction}
48
In the last project review of the CerCo project, the project reviewers
recommended that we quickly outline a paper-and-pencil correctness proof
for each of the stages of the CerCo compiler, in order to allow for an
estimation of the complexity and time required to complete the formalization
of the proof. This has been possible since month 18, when we completed the
formalization in Matita of the data structures and code of the compiler.
56
57In this document we provide a very high-level, pen-and-paper
58sketch of what we view as the best path to completing the correctness proof
for the compiler. In particular, for every translation between two intermediate languages, in both the front- and back-ends, we identify the key translation steps and the invariants that we view as being important for the correctness proof.  We sketch the overall correctness results, and also briefly describe the parts of the proof that have already
60been completed at the end of the First Period.
61
In the last section we present an estimate of the effort required
for the certification of the compiler in Matita and we draw our conclusions.
64
65\section{Front-end: Clight to RTLabs}
66
67The front-end of the CerCo compiler consists of several stages:
68
69\begin{center}
70\begin{minipage}{.8\linewidth}
71\begin{tabbing}
72\quad \= $\downarrow$ \quad \= \kill
73\textsf{Clight}\\
74\> $\downarrow$ \> cast removal\\
75\> $\downarrow$ \> add runtime functions\footnote{Following the last project
76meeting we intend to move this transformation to the back-end}\\
77\> $\downarrow$ \> cost labelling\\
78\> $\downarrow$ \> loop optimizations\footnote{\label{lab:opt2}To be ported from the untrusted compiler and certified only in case of early completion of the certification of the other passes.} (an endo-transformation)\\
79\> $\downarrow$ \> partial redundancy elimination$^{\mbox{\scriptsize \ref{lab:opt2}}}$ (an endo-transformation)\\
80\> $\downarrow$ \> stack variable allocation and control structure
81 simplification\\
82\textsf{Cminor}\\
83\> $\downarrow$ \> generate global variable initialisation code\\
84\> $\downarrow$ \> transform to RTL graph\\
85\textsf{RTLabs}\\
86\> $\downarrow$ \> \\
87\>\,\vdots
88\end{tabbing}
89\end{minipage}
90\end{center}
91
Here, by an `endo-transformation' we mean a transformation from a language back into itself:
for instance, the loop optimization step maps Clight to Clight.
94
95%Our overall statements of correctness with respect to costs will
96%require a correctly labelled program
97There are three layers in most of the proofs proposed:
98\begin{enumerate}
99\item invariants closely tied to the syntax and transformations using
100  dependent types (such as the presence of variable names in environments),
101\item a forward simulation proof relating each small-step of the
102  source to zero or more steps of the target, and
103\item proofs about syntactic properties of the cost labelling.
104\end{enumerate}
The first layer will support functional correctness and also allow us to
show the totality of some of the compiler stages (that is, that those
stages of the compiler cannot fail).  The second provides the main
functional correctness result, including the preservation of cost
labels in the traces, and the last will be crucial for applying
correctness results about the costings from the back-end, by showing
that cost labels appear in enough places for all of the execution
costs to be assigned to them.
113
114We will also prove that a suitably labelled RTLabs trace can be turned
115into a \emph{structured trace} which splits the execution trace into
116cost-label to cost-label chunks with nested function calls.  This
117structure was identified during work on the correctness of the
118back-end cost analysis as retaining important information about the
119structure of the execution that is difficult to reconstruct later in
120the compiler.
121
122\subsection{Clight cast removal}
123
124This transformation removes some casts inserted by the parser to make
125arithmetic promotion explicit but which are superfluous (such as
126\lstinline[language=C]'c = (short)((int)a + (int)b);' where
127\lstinline'a' and \lstinline'b' are \lstinline[language=C]'short').
128This is necessary for producing good code for our target architecture.
129
130It only affects Clight expressions, recursively detecting casts that
can be safely eliminated.  The semantics provides a big-step
definition for expression evaluation, so we should be able to show a lock-step
133forward simulation between otherwise identical states using a lemma
134showing that cast elimination does not change the evaluation of
135expressions.  This lemma will follow from a structural induction on
136the source expression.  We have already proved a few of the underlying
137arithmetic results necessary to validate the approach.
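
As an illustration of the idea (not the formalised definition), a minimal
cast-removal function on a toy expression type might look as follows; the
types and names here are hypothetical and only meant to convey the shape of
the recursion:

\begin{lstlisting}
(* toy types: only shorts, ints, casts and additions *)
type ty   = Short | Int
type expr =
  | Var  of string * ty
  | Cast of ty * expr
  | Add  of expr * expr

(* simplify e under the knowledge that its result is cast to target:
   promotions to a larger type below a cast back to Short are superfluous *)
let rec simplify (target : ty) (e : expr) : expr =
  match target, e with
  | Short, Cast (Int, e') -> simplify Short e'
  | Short, Add (e1, e2)   -> Add (simplify Short e1, simplify Short e2)
  | _, _                  -> e

(* (short)((int)a + (int)b) simplifies to a + b, computed directly on shorts *)
let example =
  simplify Short (Add (Cast (Int, Var ("a", Short)),
                       Cast (Int, Var ("b", Short))))
\end{lstlisting}

The accompanying lemma would state that evaluating the simplified expression
and truncating the result of evaluating the original one to a short yield the
same value, which is exactly the kind of arithmetic fact mentioned above.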
138
139\subsection{Clight cost labelling}
140
141This adds cost labels before and after selected statements and
142expressions, and the execution traces ought to be equivalent modulo
143the new cost labels.  Hence it requires a simple forward simulation
with a limited amount of stuttering wherever a new cost label is
145introduced.  A bound can be given for the amount of stuttering allowed
146based on the statement or continuation to be evaluated next.
147
148We also intend to show three syntactic properties about the cost
149labelling:
150\begin{enumerate}
151\item every function starts with a cost label,
152\item every branching instruction is followed by a cost label (note that
153  exiting a loop is treated as a branch), and
154\item the head of every loop (and any \lstinline'goto' destination) is
155  a cost label.
156\end{enumerate}
157These can be shown by structural induction on the source term.
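
To fix ideas, the second property could be phrased, on a hypothetical toy
statement syntax rather than the formalised Clight one, as the following
decidable check (loops are simplified here; in the real syntax exiting a loop
also counts as a branch):

\begin{lstlisting}
type stmt =
  | Cost  of string * stmt        (* cost label in front of a statement *)
  | Seq   of stmt * stmt
  | If    of string * stmt * stmt (* condition abstracted to a string *)
  | While of string * stmt
  | Skip

let is_labelled = function Cost _ -> true | _ -> false

(* both branches of a conditional, and the body of a loop,
   must start with a cost label *)
let rec branches_labelled = function
  | Cost (_, s)    -> branches_labelled s
  | Seq (s1, s2)   -> branches_labelled s1 && branches_labelled s2
  | If (_, s1, s2) -> is_labelled s1 && is_labelled s2
                      && branches_labelled s1 && branches_labelled s2
  | While (_, s)   -> is_labelled s && branches_labelled s
  | Skip           -> true
\end{lstlisting}

The structural induction mentioned above then shows that the labelling
transformation always produces statements satisfying such checks.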
158
159\subsection{Clight to Cminor translation}
160
161This translation is the first to introduce some invariants, with the
162proofs closely tied to the implementation by dependent typing.  These
163are largely complete and show that the generated code enjoys:
164\begin{itemize}
165\item some minimal type safety shown by explicit checks on the
166  Cminor types during the transformation (a little more work remains
167  to be done here, but follows the same form);
168\item that variables named in the parameter and local variable
169  environments are distinct from one another, again by an explicit
170  check;
171\item that variables used in the generated code are present in the
172  resulting environment (either by checking their presence in the
173  source environment, or from a list of freshly generated temporary variables);
174  and
175\item that all \lstinline[language=C]'goto' labels are present (by
176  checking them against a list of source labels and proving that all
177  source labels are preserved).
178\end{itemize}
179
180The simulation will be similar to the relevant stages of CompCert
181(Clight to Csharpminor and Csharpminor to Cminor --- in the event that
182the direct proof is unwieldy we could introduce an intermediate
183language corresponding to Csharpminor).  During early experimentation
with porting CompCert definitions to the Matita proof assistant we
found little difficulty in reproving the results for the memory model, so
we plan to port the memory injection properties and use them to relate
each Clight in-memory variable with either the value of the corresponding
local variable or a stack slot, depending on how the variable was classified.
189
190This should be sufficient to show the equivalence of (big-step)
191expression evaluation.  The simulation can then be shown by relating
192corresponding blocks of statement and continuations with their Cminor
193counterparts and proving that a few steps reaches the next matching
194state.
195
196The syntactic properties required for cost labels remain similar and a
197structural induction on the function bodies should be sufficient to
198show that they are preserved.
199
200\subsection{Cminor global initialisation code}
201
202This short phase replaces the global variable initialisation data with
203code that executes when the program starts.  Each piece of
204initialisation data in the source is matched by a new statement
205storing that data.  As each global variable is allocated a distinct
206memory block, the program state after the initialisation statements
207will be the same as the original program's state at the start of
208execution, and will proceed in the same manner afterwards.
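
A minimal sketch of the transformation, on hypothetical simplified types
(the real pass works on Cminor programs and their initialisation data), is:

\begin{lstlisting}
(* one piece of initialisation data: variable, offset and value to store *)
type init_datum = { var : string; offset : int; value : int }

(* Store (x, off, v) stands for the Cminor statement x[off] := v *)
type stmt = Store of string * int * int

(* each datum becomes one store statement, to be executed at start-up *)
let init_code (data : init_datum list) : stmt list =
  List.map (fun d -> Store (d.var, d.offset, d.value)) data
\end{lstlisting}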
209
210% Actually, the above is wrong...
211% ... this ought to be in a fresh main function with a fresh cost label
212
213\subsection{Cminor to RTLabs translation}
214
215In this part of the compiler we transform the program's functions into
216control flow graphs.  It is closely related to CompCert's Cminorsel to
217RTL transformation, albeit with target-independent operations.
218
219We already enforce several invariants with dependent types: some type
220safety, mostly shown using the type information from Cminor; and
221that the graph is closed (by showing that each successor was recently
added, or corresponds to a \lstinline[language=C]'goto' label, all of which
are added before the end).  Note that this relies on a
224monotonicity property; CompCert maintains a similar property in a
225similar way while building RTL graphs.  We will also add a result
226showing that all of the pseudo-register names are distinct for use by
227later stages using the same method as Cminor.
228
229The simulation will relate Cminor states to RTLabs states which are about to
230execute the code corresponding to the Cminor statement or continuation.
231Each Cminor statement becomes zero or more RTLabs statements, with a
232decreasing measure based on the statement and continuations similar to
233CompCert's.  We may also follow CompCert in using a relational
234specification of this stage so as to abstract away from the functional
235(and highly dependently typed) definition.
236
237The first two labelling properties remain as before; we will show that
238cost labels are preserved, so the function entry point will be a cost
label, and successors to any statement that are cost labels still
map to cost labels, preserving the condition on branches.  We replace
241the property for loops with the notion that we will always reach a
242cost label or the end of the function after following a bounded number of
243successors.  This can be easily seen in Cminor using the requirement
244for cost labels at the head of loops and after gotos.  It remains to
245show that this is preserved by the translation to RTLabs.  % how?
246
247\subsection{RTLabs structured trace generation}
248
249This proof-only step incorporates the function call structure and cost
250labelling properties into the execution trace.  As the function calls
251are nested within the trace, we need to distinguish between
252terminating and non-terminating function calls.  Thus we use the
253excluded middle (specialised to a function termination property) to do
254this.
255
256Structured traces for terminating functions are built by following the
257flat trace, breaking it into chunks between cost labels and
258recursively processing function calls.  The main difficulties here are
259the non-structurally recursive nature of the function (instead we use
260the size of the termination proof as a measure) and using the RTLabs
261cost labelling properties to show that the constraints of the
262structured traces are observed.  We also show that the lower stack
263frames are preserved during function calls in order to prove that
264after returning from a function call we resume execution of the
265correct code.  This part of the work has already been constructed, but
266still requires a simple proof to show that flattening the structured
267trace recreates the original flat trace.
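
The shape of the inductive structured traces can be conveyed by the following
toy definition, which is a drastic simplification of the formalised one (the
real definition is a family of mutual inductive types over RTLabs states and
carries the cost labelling constraints as side conditions):

\begin{lstlisting}
(* a chunk starts at a cost label, performs plain steps and terminating
   nested calls, and ends either at the next cost label or at a return *)
type step  = Plain of string | Call of string * trace
and  trace = { entry_label : string; steps : step list; next : next }
and  next  = Label of trace | Return
\end{lstlisting}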
268
269The non-terminating case follows the trace like the terminating
270version to build up chunks of trace from cost-label to cost-label
271(which, by the finite distance to a cost label property shown before,
272can be represented by an inductive type).  These chunks are chained
273together in a coinductive data structure that can represent
274non-terminating traces.  The excluded middle is used to decide whether
275function calls terminate, in which case the function described above
276constructs an inductive terminating structured trace which is nested
277in the caller's trace.  Otherwise, another coinductive constructor is
278used to embed the non-terminating trace of the callee, generated by
279corecursion.  This part of the trace transformation is currently under
280construction, and will also need a flattening result to show that it
281is correct.
282
283
284\section{Backend: RTLabs to machine code}
285\label{sect.backend.rtlabs.machine.code}
286
287The compiler backend consists of the following intermediate languages, and stages of translation:
288
289\begin{center}
290\begin{minipage}{.8\linewidth}
291\begin{tabbing}
292\quad \=\,\vdots\= \\
293\> $\downarrow$ \>\\
294\> $\downarrow$ \quad \= \kill
295\textsf{RTLabs}\\
296\> $\downarrow$ \> copy propagation\footnote{\label{lab:opt}To be ported from the untrusted compiler and certified only in case of early completion of the certification of the other passes.} (an endo-transformation) \\
297\> $\downarrow$ \> instruction selection\\
298\> $\downarrow$ \> change of memory models in compiler\\
299\textsf{RTL}\\
300\> $\downarrow$ \> constant propagation$^{\mbox{\scriptsize \ref{lab:opt}}}$ (an endo-transformation) \\
301\> $\downarrow$ \> calling convention made explicit \\
302\> $\downarrow$ \> layout of activation records \\
303\textsf{ERTL}\\
304\> $\downarrow$ \> register allocation and spilling\\
305\> $\downarrow$ \> dead code elimination\\
306\textsf{LTL}\\
307\> $\downarrow$ \> function linearisation\\
308\> $\downarrow$ \> branch compression (an endo-transformation) \\
309\textsf{LIN}\\
310\> $\downarrow$ \> relabeling\\
311\textsf{ASM}\\
312\> $\downarrow$ \> pseudoinstruction expansion\\
313\textsf{MCS-51 machine code}\\
314\end{tabbing}
315\end{minipage}
316\end{center}
317
318\subsection{Graph translations}
319
320\subsection{The RTLabs to RTL translation}
321\label{subsect.rtlabs.rtl.translation}
322
323The RTLabs to RTL translation pass marks the frontier between the two memory models used in the CerCo project.
324As a result, we require some method of translating between the values that the two memory models permit.
325Suppose we have such a translation, $\sigma$.
326Then the translation between values of the two memory models may be pictured with:
327
328\begin{displaymath}
329\mathtt{Value} ::= \bot \mid \mathtt{int(size)} \mid \mathtt{float} \mid \mathtt{null} \mid \mathtt{ptr} \quad\stackrel{\sigma}{\longrightarrow}\quad \mathtt{BEValue} ::= \bot \mid \mathtt{byte} \mid \mathtt{null}_i \mid \mathtt{ptr}_i
330\end{displaymath}
331
In the front-end we have both integer and float values, where integer values are `sized', along with null values and pointers. Some front-end values are
representable in a byte, but others require more bits.
334
In the back-end model all values are meant to be represented in a single byte.
Values can therefore be undefined, be one-byte integers, or be indexed
fragments of a pointer, null or not. Float values are no longer present, as floating point arithmetic is not supported by the CerCo compiler.
338
339The $\sigma$ map implements a one-to-many relation: a single front-end value
is mapped to a sequence of back-end values when its size is more than one byte.
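
A sketch of such a value translation on simplified types (hypothetical names,
and assuming a little-endian byte split; the formalised version works on the
actual \texttt{Value} and \texttt{BEValue} types) is:

\begin{lstlisting}
type fe_value =                       (* front-end values *)
  | Vundef
  | Vint  of int * int                (* Vint (size, n): n fits in size bytes *)
  | Vnull
  | Vptr  of string                   (* pointers abstracted to a name *)

type be_value =                       (* back-end, byte-sized values *)
  | BVundef
  | BVbyte of int
  | BVnull of int                     (* i-th chunk of a null pointer *)
  | BVptr  of string * int            (* i-th chunk of a pointer *)

(* one front-end value becomes a list of byte-sized back-end values *)
let sigma_value (ptr_size : int) : fe_value -> be_value list = function
  | Vundef         -> [ BVundef ]
  | Vint (size, n) -> List.init size (fun i -> BVbyte ((n lsr (8 * i)) land 0xff))
  | Vnull          -> List.init ptr_size (fun i -> BVnull i)
  | Vptr p         -> List.init ptr_size (fun i -> BVptr (p, i))
\end{lstlisting}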
341
We further require a map, $\sigma$, between the front-end \texttt{Memory} and the back-end's notion of \texttt{BEMemory}. Both kinds of memory can be
thought of as instances of a generic \texttt{Mem} data type parameterized over
the kind of values stored in memory.
345
346\begin{displaymath}
347\mathtt{Mem}\ \alpha = \mathtt{Block} \rightarrow (\mathbb{Z} \rightarrow \alpha)
348\end{displaymath}
349
350Here, \texttt{Block} consists of a \texttt{Region} paired with an identifier.
351
352\begin{displaymath}
353\mathtt{Block} ::= \mathtt{Region} \times \mathtt{ID}
354\end{displaymath}
355
We now have what we need to define what is meant by `memory' in the back-end memory model.
357Namely, we instantiate the previously defined \texttt{Mem} type with the type of back-end memory values.
358
359\begin{displaymath}
360\mathtt{BEMem} = \mathtt{Mem}~\mathtt{BEValue}
361\end{displaymath}
362
363Memory addresses consist of a pair of back-end memory values:
364
365\begin{displaymath}
366\mathtt{Address} = \mathtt{BEValue} \times  \mathtt{BEValue} \\
367\end{displaymath}
368
The back- and front-end memory models differ in how they represent sized integer values in memory.
370In particular, the front-end stores integer values as a header, with size information, followed by a string of `continuation' blocks, marking out the full representation of the value in memory.
371In contrast, the layout of sized integer values in the back-end memory model consists of a series of byte-sized `chunks':
372
373\begin{center}
374\begin{picture}(0, 25)
375\put(-125,0){\framebox(25,25)[c]{\texttt{v,4}}}
376\put(-100,0){\framebox(25,25)[c]{\texttt{cont}}}
377\put(-75,0){\framebox(25,25)[c]{\texttt{cont}}}
378\put(-50,0){\framebox(25,25)[c]{\texttt{cont}}}
379\put(-15,10){\vector(1, 0){30}}
380\put(25,0){\framebox(25,25)[c]{\texttt{\texttt{v$_1$}}}}
381\put(50,0){\framebox(25,25)[c]{\texttt{\texttt{v$_2$}}}}
382\put(75,0){\framebox(25,25)[c]{\texttt{\texttt{v$_3$}}}}
383\put(100,0){\framebox(25,25)[c]{\texttt{\texttt{v$_4$}}}}
384\end{picture}
385\end{center}
386
387Chunks for pointers are pairs made of the original pointer and the index of the chunk.
388Therefore, when assembling the chunks together, we can always recognize if all chunks refer to the same value or if the operation is meaningless.
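
Reusing the toy types of the previous sketch, the reassembly check for
pointers might look as follows (again, only an illustration of the idea):

\begin{lstlisting}
(* gluing chunks back into a pointer is meaningful only when every chunk
   mentions the same pointer and the indices are exactly 0, 1, ..., n-1 *)
let reassemble_ptr (chunks : be_value list) : fe_value option =
  match chunks with
  | BVptr (p, 0) :: _ ->
      let consistent =
        List.mapi (fun i c -> c = BVptr (p, i)) chunks
        |> List.for_all (fun ok -> ok)
      in
      if consistent then Some (Vptr p) else None
  | _ -> None
\end{lstlisting}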
389
390The differing memory representations of values in the two memory models imply the need for a series of lemmas on the actions of \texttt{load} and \texttt{store} to ensure correctness.
391The first lemma required has the following statement:
392\begin{displaymath}
393\mathtt{load}\ s\ a\ M = \mathtt{Some}\ v \rightarrow \forall i \leq s.\ \mathtt{load}\ s\ (a + i)\ \sigma(M) = \mathtt{Some}\ v_i
394\end{displaymath}
That is, if we successfully read a value of size $s$ at address $a$ in the front-end memory, then we should be able to read all of its chunks from the back-end memory at the appropriate addresses (from address $a$ up to and including address $a + i$, for every $i \leq s$).
396
397Next, we must show that \texttt{store} properly commutes with the $\sigma$-map between memory spaces:
398\begin{displaymath}
\sigma(\mathtt{store}\ a\ v\ M) = \mathtt{store}\ \sigma(a)\ \sigma(v)\ \sigma(M)
400\end{displaymath}
401That is, if we store a value \texttt{v} in the front-end memory \texttt{M} at address \texttt{a} and transform the resulting memory with $\sigma$, then this is equivalent to storing a transformed value $\mathtt{\sigma(v)}$ at address $\mathtt{\sigma(a)}$ into the back-end memory $\mathtt{\sigma(M)}$.
402
403Finally, the commutation properties between \texttt{load} and \texttt{store} are weakened in the $\sigma$-image of the memory.
404Writing \texttt{load}$^*$ for the multiple consecutive iterations of \texttt{load} used to fetch all chunks of a value, we must prove that, when $a \neq a'$:
405\begin{displaymath}
\mathtt{load}^*\ \sigma(a)\ (\mathtt{store}\ \sigma(a')\ \sigma(v)\ \sigma(M)) = \mathtt{load}^*\ \sigma(a)\ \sigma(M)
407\end{displaymath}
408That is, suppose we store a transformed value $\mathtt{\sigma(v)}$ into a back-end memory $\mathtt{\sigma(M)}$ at address $\mathtt{\sigma(a')}$, using \texttt{store}, and then load from the address $\sigma(a)$. Even if $a$ and $a'$ are
409distinct by hypothesis, there is a priori no guarantee that the consecutive
410bytes for the value stored at $\sigma(a)$ are disjoint from those for the
411values stored at $\sigma(a')$. The fact that this holds is a non-trivial
412property of $\sigma$ to be proved.
413
414RTLabs states come in three flavours:
415\begin{displaymath}
416\begin{array}{rll}
417\mathtt{State} & ::=  & (\mathtt{State} : \mathtt{Frame}^* \times \mathtt{Frame} \\
418               & \mid & \mathtt{Call} : \mathtt{Frame}^* \times \mathtt{Args} \times \mathtt{Return} \times \mathtt{Fun} \\
419               & \mid & \mathtt{Return} : \mathtt{Frame}^* \times \mathtt{Value} \times \mathtt{Return}) \times \mathtt{Mem}
420\end{array}
421\end{displaymath}
\texttt{State} is the default state, and the one in which RTLabs programs spend almost all of their time.
423The \texttt{Call} state is only entered when a call instruction is being executed, and then we immediately return to being in \texttt{State}.
424Similarly, \texttt{Return} is only entered when a return instruction is being executed, before returning immediately to \texttt{State}.
425All RTLabs states are accompanied by a memory, \texttt{Mem}, with \texttt{Call} and \texttt{Return} keeping track of arguments, return addresses and the results of functions.
426\texttt{State} keeps track of a list of stack frames.
427
RTL states differ from their RTLabs counterparts in including a program counter \texttt{PC}, a stack pointer \texttt{SP}, an internal stack pointer \texttt{ISP}, a carry flag \texttt{CARRY} and a set of registers \texttt{REGS}:
429\begin{displaymath}
430\mathtt{State} ::= \mathtt{Frame}^* \times \mathtt{PC} \times \mathtt{SP} \times \mathtt{ISP} \times \mathtt{CARRY} \times \mathtt{REGS}
431\end{displaymath}
432The internal stack pointer \texttt{ISP}, and its relationship with the stack pointer \texttt{SP}, needs some comment.
433Due to the design of the MCS-51, and its minuscule stack, it was decided that the compiler would implement an emulated stack in external memory.
434As a result, we have two stack pointers in our state: \texttt{ISP}, which is the real, hardware stack, and \texttt{SP}, which is the stack pointer of the emulated stack in memory.
The emulated stack is used for pushing and popping stack frames when calling or returning from function calls; however, this is done using the hardware stack, indexed by \texttt{ISP}, as an intermediary.
Instructions like \texttt{LCALL} and \texttt{ACALL} are hardwired by the processor's design to push the return address on to the hardware stack. Therefore, after a call has been made, and before a call returns, the compiler emits code to move the return address back and forth between the two stacks. Parameters, return values
and local variables are only present in the external stack.
438As a result, for most of the execution of the processor, the hardware stack is empty, or contains a single item ready to be moved into external memory.
439
440Once more, we require a relation $\sigma$ between RTLabs states and RTL states.
Because $\sigma$ is one-to-many (morally, a multi-valued function),
in the following we nevertheless use functional notation for $\sigma$, writing $\star$
in the output of $\sigma$ to mean that any value is accepted.
444\begin{displaymath}
445\mathtt{State} \stackrel{\sigma}{\longrightarrow} \mathtt{State}
446\end{displaymath}
447
448Translating an RTLabs state to an RTL state proceeds by cases on the particular type of state we are trying to translate, either a \texttt{State}, \texttt{Call} or a \texttt{Return}.
449For \texttt{State} we perform a further case analysis of the top stack frame, which decomposes into a tuple holding the current program counter value, the current stack pointer and the value of the registers:
450\begin{displaymath}
451\sigma(\mathtt{State} (\mathtt{Frame}^* \times \mathtt{\langle PC, REGS, SP \rangle})) \longrightarrow ((\sigma(\mathtt{Frame}^*), \sigma(\mathtt{PC}), \sigma(\mathtt{SP}), \star, \star, \sigma(\mathtt{REGS})), \sigma(\mathtt{Mem}))
452\end{displaymath}
453Translation then proceeds by translating the remaining stack frames, as well as the contents of the top stack frame. Any value for the internal stack pointer
454and the carry bit is admitted.
455
456Translating \texttt{Call} and \texttt{Return} states is more involved, as a commutation between a single step of execution and the translation process must hold:
457\begin{displaymath}
458\sigma(\mathtt{Return}(-)) \longrightarrow \sigma \circ \text{return one step}
459\end{displaymath}
460
461\begin{displaymath}
462\sigma(\mathtt{Call}(-)) \longrightarrow \sigma \circ \text{call one step}
463\end{displaymath}
464
Here \emph{return one step} and \emph{call one step} refer to the pair of commuting diagrams relating the one-step execution of \texttt{Call} and \texttt{Return} states with the translation of both.
466We provide the one step commuting diagrams in Figure~\ref{fig.commuting.diagrams}. The fact that one execution step in the source language is not performed
467in the target language is not problematic for preservation of divergence
468because it is easy to show that every step from a \texttt{Call} or
\texttt{Return} state is always preceded or followed by one step that is always
470simulated.
471
472\begin{figure}
473\begin{displaymath}
474\begin{diagram}
475s & \rTo^{\text{one step of execution}} & s'   \\
476  & \rdTo                             & \dTo \\
477  &                                   & \llbracket s'' \rrbracket
478\end{diagram}
479\end{displaymath}
480
481\begin{displaymath}
482\begin{diagram}
483s & \rTo^{\text{one step of execution}} & s'   \\
484  & \rdTo                             & \dTo \\
485  &                                   & \llbracket s'' \rrbracket
486\end{diagram}
487\end{displaymath}
488\caption{The one-step commuting diagrams for \texttt{Call} and \texttt{Return} state translations}
489\label{fig.commuting.diagrams}
490\end{figure}
491
The forward simulation proofs for all steps that do not involve function
calls are lengthy, but unsurprising. They consist in simulating a
front-end operation on front-end pseudo-registers and the front-end memory
with sequences of back-end operations on the back-end pseudo-registers
and back-end memory. The properties of $\sigma$ presented before that relate
values and memories need to be heavily exploited.
498
499The simulation of invocation of functions and returns from functions is
500less obvious. We sketch here what happens on the source code and on its
501translation.
502
503\begin{displaymath}
504\begin{array}{rcl}
505\mathtt{Call(id,\ args,\ dst,\ pc),\ State(Frame^*, Frame)} & \longrightarrow & \mathtt{Call(M(args), dst)}, \\
506                                                           &                 & \mathtt{PUSH(Frame[PC := after\_return])}
507\end{array}
508\end{displaymath}
509Suppose we are given a \texttt{State} with a list of stack frames, with the top frame being \texttt{Frame}.
510Suppose also that the program counter in \texttt{Frame} points to a \texttt{Call} instruction, complete with arguments and destination address.
Then this is executed by entering a \texttt{Call} state where the arguments are looked up in memory and the destination address is filled in, with the current stack frame being pushed on top of the stack with the return address substituted for the program counter.
512
513Now, what happens next depends on whether we are executing an internal or an external function.
514In the case where the call is to an external function, we have:
515\begin{displaymath}
516\begin{array}{rcl}
517\mathtt{Call(M(args), dst)},                       & \stackrel{\mathtt{ret\_val = f(M(args))}}{\longrightarrow} & \mathtt{Return(ret\_val,\ dst,\ PUSH(...))} \\
518\mathtt{PUSH(current\_frame[PC := after\_return])} &                                                            & 
519\end{array}
520\end{displaymath}
521That is, the call to the external function enters a return state after first computing the return value by executing the external function on the arguments.
522Then the return state restores the program counter by popping the stack, and execution proceeds in a new \texttt{State}:
523\begin{displaymath}
524\begin{array}{rcl}
525\mathtt{Return(ret\_val,\ dst,\ PUSH(...))} & \longrightarrow & \mathtt{pc = POP\_STACK(regs[dst := M(ret\_val)],\ pc)} \\
                                            &                 & \mathtt{State(regs[dst := M(ret\_val)],\ pc)}
527\end{array}
528\end{displaymath}
529
Suppose instead that the call is to an internal function:
531\begin{displaymath}
532\begin{array}{rcl}
533\mathtt{Call(M(args), dst)}                        & \longrightarrow & \mathtt{SP = alloc}, regs = \emptyset[- := params] \\
534\mathtt{PUSH(current\_frame[PC := after\_return])} &                 & \mathtt{State}(regs,\ sp,\ pc_\emptyset,\ dst)
535\end{array}
536\end{displaymath}
537Here, execution of the \texttt{Call} state first pushes the current frame with the program counter set to the address following the function call.
Fresh stack space is allocated and the stack pointer updated; the register map is initialised to the empty map, assigning an undefined value to all registers, before the values of the parameters are inserted into the argument registers, and a new \texttt{State} follows.
When the function body has been executed, the allocated stack space is freed and a \texttt{Return} state is entered:
540\begin{displaymath}
541\begin{array}{rcl}
542\mathtt{sp = alloc}, regs = \emptyset[- := PARAMS] & \longrightarrow & \mathtt{free(sp)} \\
543\mathtt{State}(regs,\ sp,\ pc_\emptyset,\ dst)     &                 & \mathtt{Return(M(ret\_val), dst, Frames)}
544\end{array}
545\end{displaymath}
546Then the return state restores the program counter by popping the stack, and execution proceeds in a new \texttt{State}, like the case for external functions:
547\begin{displaymath}
548\begin{array}{rcl}
549\mathtt{free(sp)}                         & \longrightarrow & \mathtt{pc = POP\_STACK(regs[dst := M(ret\_val)],\ pc)} \\
\mathtt{Return(M(ret\_val), dst, frames)} &                 & \mathtt{State(regs[dst := M(ret\_val)],\ pc)}
551\end{array}
552\end{displaymath}
553
554\subsection{The RTL to ERTL translation}
555\label{subsect.rtl.ertl.translation}
556
557\begin{displaymath}
558\begin{diagram}
559& & \llbracket \mathtt{CALL\_ID}(\mathtt{id}, \mathtt{args}, \mathtt{dst}, \mathtt{pc})\rrbracket & & \\
560& \ldTo^{\text{external}} & & \rdTo^{\text{internal}} & \\
561\skull & & & & \mathtt{regs} = [\mathtt{params}/-] \\
562& & & & \mathtt{sp} = \mathtt{ALLOC} \\
563& & & & \mathtt{PUSH}(\mathtt{carry}, \mathtt{regs}, \mathtt{dst}, \mathtt{return\_addr}), \mathtt{pc}_{0}, \mathtt{regs}, \mathtt{sp} \\
564\end{diagram}
565\end{displaymath}
566
567\begin{align*}
568\llbracket \mathtt{RETURN} \rrbracket \\
569\mathtt{return\_addr} & := \mathtt{top}(\mathtt{stack}) \\
570v*                    & := m(\mathtt{rv\_regs}) \\
571\mathtt{dst}, \mathtt{sp}, \mathtt{carry}, \mathtt{regs} & := \mathtt{pop} \\
572\mathtt{regs}[v* / \mathtt{dst}] \\
573\end{align*}
574
575\begin{displaymath}
576\begin{diagram}
577s    & \rTo^1 & s' & \rTo^1 & s'' \\
578\dTo &        &    & \rdTo  & \dTo \\
579\llbracket s \rrbracket & \rTo(1,3)^1 & & & \llbracket s'' \rrbracket \\ 
580\mathtt{CALL} \\
581\end{diagram}
582\end{displaymath}
583
584\begin{displaymath}
585\begin{diagram}
586s    & \rTo^1 & s' & \rTo^1 & s'' \\
587\dTo &        &    & \rdTo  & \dTo \\
588\  & \rTo(1,3) & & & \ \\
589\mathtt{RETURN} \\
590\end{diagram}
591\end{displaymath}
592
593\begin{displaymath}
594\mathtt{b\_graph\_translate}: (\mathtt{label} \rightarrow \mathtt{blist'})
595\rightarrow \mathtt{graph} \rightarrow \mathtt{graph}
596\end{displaymath}
597
598\begin{align*}
599\mathtt{theorem} &\ \mathtt{b\_graph\_translate\_ok}: \\
600& \forall  f.\forall G_{i}.\mathtt{let}\ G_{\sigma} := \mathtt{b\_graph\_translate}\ f\ G_{i}\ \mathtt{in} \\
601&       \forall l \in G_{i}.\mathtt{subgraph}\ (f\ l)\ l\ (\mathtt{next}\ l\ G_{i})\ G_{\sigma}
602\end{align*}
603
604\begin{align*}
605\mathtt{lemma} &\ \mathtt{execute\_1\_step\_ok}: \\
606&       \forall s.  \mathtt{let}\ s' := s\ \sigma\ \mathtt{in} \\
607&       \mathtt{let}\ l := pc\ s\ \mathtt{in} \\
608&       s \stackrel{1}{\rightarrow} s^{*} \Rightarrow \exists n. s' \stackrel{n}{\rightarrow} s'^{*} \wedge s'^{*} = s'\ \sigma
609\end{align*}
610
611\begin{align*}
612\mathrm{RTL\ status} & \ \ \mathrm{ERTL\ status} \\
613\mathtt{sp} & = \mathtt{spl} / \mathtt{sph} \\
614\mathtt{graph} &  \mathtt{graph} + \mathtt{prologue}(s) + \mathtt{epilogue}(s) \\
615& \mathrm{where}\ s = \mathrm{callee\ saved} + \nu \mathrm{RA} \\
616\end{align*}
617
618\begin{displaymath}
619\begin{diagram}
620\mathtt{CALL} & \rTo^1 & \mathtt{inside\ function} \\
621\dTo & & \dTo \\
622\underbrace{\ldots}_{\llbracket \mathtt{CALL} \rrbracket} & \rTo &
623\underbrace{\ldots}_{\mathtt{prologue}} \\
624\end{diagram}
625\end{displaymath}
626
627\begin{displaymath}
628\begin{diagram}
629\mathtt{RETURN} & \rTo^1 & \mathtt{.} \\
630\dTo & & \dTo \\
631\underbrace{\ldots}_{\mathtt{epilogue}} & \rTo &
632\underbrace{\ldots} \\
633\end{diagram}
634\end{displaymath}
635
636\begin{align*}
637\mathtt{prologue}(s) = & \mathtt{create\_new\_frame}; \\
638                       & \mathtt{pop\ ra}; \\
639                       & \mathtt{save\ callee\_saved}; \\
640                                                                                         & \mathtt{get\_params} \\
641                                                                                         & \ \ \mathtt{reg\_params}: \mathtt{move} \\
642                                                                                         & \ \ \mathtt{stack\_params}: \mathtt{push}/\mathtt{pop}/\mathtt{move} \\
643\end{align*}
644
645\begin{align*}
646\mathtt{epilogue}(s) = & \mathtt{save\ return\ to\ tmp\ real\ regs}; \\
647                                                                                         & \mathtt{restore\_registers}; \\
648                       & \mathtt{push\ ra}; \\
649                       & \mathtt{delete\_frame}; \\
650                       & \mathtt{save return} \\
651\end{align*}
652
653\begin{displaymath}
654\mathtt{CALL}\ id \mapsto \mathtt{set\_params};\ \mathtt{CALL}\ id;\ \mathtt{fetch\_result}
655\end{displaymath}
656
657\subsection{The ERTL to LTL translation}
658\label{subsect.ertl.ltl.translation}
659\newcommand{\declsf}[1]{\expandafter\newcommand\expandafter{\csname #1\endcsname}{\mathop{\mathsf{#1}}\nolimits}}
660\declsf{Livebefore}
661\declsf{Liveafter}
662\declsf{Defined}
663\declsf{Used}
664\declsf{Eliminable}
665\declsf{StatementSem}
666For the liveness analysis, we aim at a map
$\ell \in \mathtt{label} \mapsto $ the set of registers that are live at $\ell$.
668We define the following operators on ERTL statements.
669$$
670\begin{array}{lL>{(ex. $}L<{)$}}
671\Defined(s) & registers defined at $s$ & r_1\leftarrow r_2+r_3 \mapsto \{r_1,C\}, \mathtt{CALL}~id\mapsto \text{caller-save}
672\\
673\Used(s) & registers used at $s$ & r_1\leftarrow r_2+r_3 \mapsto \{r_2,r_3\}, \mathtt{CALL}~id\mapsto \text{parameters}
674\end{array}
675$$
676Given $LA:\mathtt{label}\to\mathtt{lattice}$ (where $\mathtt{lattice}$
is the type of sets of registers\footnote{More precisely, it is the lattice
of pairs of sets of pseudo-registers and sets of hardware registers,
with pointwise operations.}), we also have the following
predicates:
681$$
682\begin{array}{lL}
683\Eliminable_{LA}(\ell) & iff $s(\ell)$ has side-effects only on $r\notin LA(\ell)$
684\\&
(ex.\ $\ell : r_1\leftarrow r_2+r_3 \mapsto \{r_1,C\}\cap LA(\ell)=\emptyset$,
  $\mathtt{CALL}~id\mapsto \text{never}$)
687\\
688\Livebefore_{LA}(\ell) &$:=
689  \begin{cases}
690    LA(\ell) &\text{if $\Eliminable_{LA}(\ell)$,}\\
    (LA(\ell)\setminus \Defined(s(\ell)))\cup \Used(s(\ell)) &\text{otherwise}.
692  \end{cases}$
693\end{array}
694$$
695In particular, $\Livebefore$ has type $(\mathtt{label}\to\mathtt{lattice})\to
696\mathtt{label}\to\mathtt{lattice}$.
697
698The equation on which we build the fixpoint is then
699$$\Liveafter(\ell) \doteq \bigcup_{\ell' >_1 \ell} \Livebefore_{\Liveafter}(\ell')$$
700where $\ell' >_1 \ell$ denotes that $\ell'$ is an immediate successor of $\ell$
701in the graph. We do not require the fixpoint to be the least one, so the hypothesis
702on $\Liveafter$ that we require is
703$$\Liveafter(\ell) \supseteq \bigcup_{\ell' >_1 \ell} \Livebefore(\ell')$$
704(for shortness we drop the subscript from $\Livebefore$).
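
For illustration, a Kleene-style iteration computing such a (post-)fixpoint
over a hypothetical array-based graph representation, and ignoring the
$\Eliminable$ case for simplicity, could be sketched as follows:

\begin{lstlisting}
module RegSet = Set.Make (String)

(* one node of the control flow graph: its successors and the registers
   defined and used by its statement *)
type node = { succs : int list; defined : RegSet.t; used : RegSet.t }

(* Livebefore(l') as a function of Liveafter(l') *)
let live_before after n = RegSet.union (RegSet.diff after n.defined) n.used

(* iterate Liveafter(l) := union over the successors l' of Livebefore(l')
   until the register sets stop growing *)
let liveness (graph : node array) : RegSet.t array =
  let live_after = Array.make (Array.length graph) RegSet.empty in
  let changed = ref true in
  while !changed do
    changed := false;
    Array.iteri (fun l n ->
      let after =
        List.fold_left
          (fun acc l' -> RegSet.union acc (live_before live_after.(l') graph.(l')))
          RegSet.empty n.succs
      in
      if not (RegSet.equal after live_after.(l)) then begin
        live_after.(l) <- after;
        changed := true
      end) graph
  done;
  live_after
\end{lstlisting}

Any over-approximation of the least fixpoint satisfies the hypothesis above,
so the formalised development is free to use a less naive iteration strategy.
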
705\subsection{The LTL to LIN translation}
706\label{subsect.ltl.lin.translation}
As detailed elsewhere in the reports, due to the parameterized representation of
the back-end languages, the pass described here is actually much more generic
than the translation from LTL to LIN. It consists of a linearization pass
710that maps any graph-based back-end language to its corresponding linear form,
711preserving its semantics. In the rest of the section, however, we will keep
712the names LTL and LIN for the two partial instantiations of the parameterized
713language.
714
715We require a map, $\sigma$, from LTL statuses, where program counters are represented as labels in a graph data structure, to LIN statuses, where program counters are natural numbers:
716\begin{displaymath}
717\mathtt{pc : label} \stackrel{\sigma}{\longrightarrow} \mathbb{N}
718\end{displaymath}
719
720The LTL to LIN translation pass also linearises the graph data structure into a list of instructions.
721Pseudocode for the linearisation process is as follows:
722
\begin{lstlisting}
let rec linearise graph visited required generated todo :=
  match todo with
  | l::todo ->
    if l $\in$ visited then
      let generated := generated $\cup\ \{$ Goto l $\}$ in
      let required := required $\cup\ \{$ l $\}$ in
        linearise graph visited required generated todo
    else
      -- Mark the label `l' as visited
      let visited := visited $\cup\ \{$ l $\}$ in
      -- Get the instruction at label `l' in the graph
      let lookup := graph(l) in
      let generated := generated $\cup\ \{$ lookup $\}$ in
      -- Find the successor of the instruction at label `l' in the graph
      let successor := succ(l, graph) in
      let todo := successor::todo in
        linearise graph visited required generated todo
  | []      -> (required, generated)
\end{lstlisting}
741
742It is easy to see that this linearisation process eventually terminates.
In particular, the set of visited labels grows strictly whenever a previously unseen label is inspected and is bounded above by the (finite) set of labels in the graph, while the calls that do not grow it strictly shrink the \texttt{todo} list.
744
745The initial call to \texttt{linearise} sees the \texttt{visited}, \texttt{required} and \texttt{generated} sets set to the empty set, and \texttt{todo} initialized with the singleton list consisting of the entry point of the graph.
746We envisage needing to prove the following invariants on the linearisation function above:
747
748\begin{enumerate}
749\item
750$\mathtt{visited} \approx \mathtt{generated}$, where $\approx$ is \emph{multiset} equality, as \texttt{generated} is a set of instructions where instructions may mention labels multiple times, and \texttt{visited} is a set of labels,
751\item
752$\forall \mathtt{l} \in \mathtt{generated}.\ \mathtt{succ(l,\ graph)} \subseteq \mathtt{required} \cup \mathtt{todo}$,
753\item
754$\mathtt{required} \subseteq \mathtt{visited}$,
755\item
756$\mathtt{visited} \cap \mathtt{todo} = \emptyset$.
757\end{enumerate}
758
759The invariants collectively imply the following properties, crucial to correctness, about the linearisation process:
760
761\begin{enumerate}
762\item
763Every graph node is visited at most once,
764\item
765Every instruction that is generated is generated due to some graph node being visited,
766\item
767The successor instruction of every instruction that has been visited already will eventually be visited too.
768\end{enumerate}
769
Note that, because the LTL to LIN transformation is the first point at which the code of
a function is linearised in the back-end, we must identify a notion of `well formed function code' suitable for linearised forms.
772In particular, we see the notion of well formedness (yet to be formally defined) resting on the following conditions:
773
774\begin{enumerate}
775\item
776For every jump to a label in a linearised function code, the target label exists at some point in the function code,
777\item
778Each label is unique, appearing only once in the function code,
779\item
780The final instruction of a function code must be a return or an unconditional
781jump.
782\end{enumerate}
783
784We assume that these properties will be easy consequences of the invariants on the linearisation function defined above.
785
786The final condition above is potentially a little opaque, so we explain further.
The only instructions that can reasonably appear in the final position of a function's code are returns or backward jumps, as any other instruction would cause execution to `fall off' the end of the code (for example, when a function invoked with \texttt{CALL} returns, execution resumes at the instruction immediately following the \texttt{CALL} that invoked it).
788
789\subsection{The LIN to ASM and ASM to MCS-51 machine code translations}
790\label{subsect.lin.asm.translation}
791
792The LIN to ASM translation step is trivial, being almost the identity function.
793The only non-trivial feature of the LIN to ASM translation is that all labels are `named apart' so that there is no chance of freshly generated labels from different namespaces clashing with labels from another namespace.
794
795The ASM to MCS-51 machine code translation step, and the required statements of correctness, are found in an unpublished manuscript attached to this document.
796This is the most complex translation because of the huge number of cases
797to be addressed and because of the complexity of the two semantics.
798Moreover, in the assembly code we have conditional and unconditional jumps
799to arbitrary locations in the code, which are not supported by the MCS-51
instruction set. The latter has several kinds of jumps, characterized by
different instruction sizes and execution times, but limited in range. For
instance, conditional jumps whose destination is more than
$2^7$ bytes away from the jump instruction are not supported at
all and need to be emulated with a code transformation. The problem, which
is known in the literature as the branch displacement problem and which also applies
to modern architectures, is known to be hard, and many of its variants are NP-complete. As far as we
807know, we will provide the first formally verified proof of correctness for
808an assembler that implements branch displacement. We are also providing
809the first verified proof of correctness of a mildly optimizing branch
810displacement algorithm that, at the moment, is almost finished, but not
described in the companion paper. This proof by itself took about 6 man
months.
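
To give the flavour of the transformation (and not of the verified algorithm),
the expansion of a conditional pseudo-jump might look as follows, where the
names and the representation are hypothetical; the real difficulty is that
expanding a jump changes the size of the code, and hence the displacements of
all the other jumps, so the process must be iterated:

\begin{lstlisting}
type label = string
type instr =
  | SJMP of label    (* short unconditional jump, target within -128..+127 bytes *)
  | LJMP of label    (* long unconditional jump, target anywhere in the 64K space *)
  | JZ   of label    (* conditional jump, same short range as SJMP *)
  | JNZ  of label
  | Lbl  of label

let fits_short delta = -128 <= delta && delta <= 127

(* expand `jump to target if the accumulator is zero', where delta is the
   distance in bytes to the target under the current tentative layout and
   fresh is a label occurring nowhere else *)
let expand_jz ~delta ~(target : label) ~(fresh : label) : instr list =
  if fits_short delta then [ JZ target ]
  else [ JNZ fresh;        (* invert the condition to skip the long jump *)
         LJMP target;
         Lbl fresh ]
\end{lstlisting}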
813
814\section{Correctness of cost prediction}
815Roughly speaking,
816the proof of correctness of cost prediction shows that the cost of executing
817a labelled object code program is the same as the sum over all labels in the
818program execution trace of the cost statically associated to the label and
819computed on the object code itself.
820
In the presence of object-level function calls, however, the previous statement is
incorrect. The reason is twofold. First of all, a function call may diverge.
However, to the last label that comes before the call we also associate
824the cost of the instructions that follow the call. Therefore, in the
825sum over all labels, when we meet a label we pre-pay for the instructions
826after function calls, assuming all calls to be terminating. This choice is
827driven by considerations on the source code. Functions can be called also
828inside expressions and it would be too disruptive to put labels inside
829expressions to capture the cost of instructions that follow a call. Moreover,
830adding a label after each call would produce a much higher number of proof
831obligations in the certification of source programs using Frama-C. The
832proof obligations, moreover, would be guarded by termination of all functions
833involved, that also generates lots of additional complex proof obligations
834that have little to do with execution costs. With our approach, instead, we
835put less burden on the user, at the price of proving a weaker statement:
the estimated and actual costs will be the same if and only if the high-level
program converges. For prefixes of diverging programs we can provide
838a similar result where the equality is replaced by an inequality (loss of
839precision).
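
To make the pre-payment policy concrete, consider a hypothetical block labelled
$L_1$ of the form $i_1;\ \mathtt{call}\ f;\ i_2$, where the body of $f$ is the
single block $L_2: i_3;\ \mathtt{ret}$. The static cost associated to $L_1$ is
$k(L_1) = c(i_1) + c(\mathtt{call}) + c(i_2)$: the cost of $i_2$ is pre-paid
when $L_1$ is emitted, before $f$ runs. If $f$ returns, the label trace of the
execution is $L_1\,L_2$ and $k(L_1) + k(L_2)$ coincides with the dynamic cost;
if $f$ diverges, the pre-paid cost of $i_2$ is never actually spent, so the
statically computed sum can only over-approximate the cost of the executed
prefix.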
840
841Assuming totality of functions is however not sufficient yet at the object
842level. Even if a function returns, there is no guarantee that it will transfer
843control back to the calling point. For instance, the function could have
844manipulated the return address from its stack frame. Moreover, an object level
845program can forge any address and transfer control to it, with no guarantee
846on the execution behaviour and labelling properties of the called program.
847
848To solve the problem, we introduced the notion of \emph{structured trace}
that comes in two flavours: structured traces for total programs (an inductive
850type) and structured traces for diverging programs (a co-inductive type based
851on the previous one). Roughly speaking, a structured trace represents the
852execution of a well behaved program that is subject to several constraints
853like
854\begin{enumerate}
855 \item All function calls return control just after the calling point
 \item The execution of every function body starts with a label and ends with
857   a RET (even the ones reached by invoking a function pointer)
858 \item All instructions are covered by a label (required by correctness of
859   the labelling approach)
860 \item The target of all conditional jumps must be labelled (a sufficient
861   but not necessary condition for precision of the labelling approach)
862 \item \label{prop5} Two structured traces with the same structure yield the same
863   cost traces.
864\end{enumerate}
865
866Correctness of cost predictions is proved only for structured execution traces,
867i.e. well behaved programs. The forward simulation proof for all back-end
868passes will actually be a proof of preservation of the structure of
869the structured traces that, because of property \ref{prop5}, will imply
correctness of the cost prediction for the back-end. The Clight to RTLabs part
will also include a proof that associates to each converging execution its
872converging structured trace and to each diverging execution its diverging
873structured trace.
874
There are also two other issues that invalidate the naive statement of
correctness of cost prediction given above. The algorithm that statically
877computes the cost of blocks is correct only when the object code is \emph{well
878formed} and the program counter is \emph{reachable}.
879A well formed object code is such that
880the program counter will never overflow after the execution step of
881the processor. An overflow that occurs during fetching but is overwritten
882during execution is, however, correct and necessary to accept correct
883programs that are as large as the program memory. Temporary overflows add
884complications to the proof. A reachable address is an address that can be
885obtained by fetching (not executing!) a finite number of times from the
886beginning of the code memory without ever overflowing. The complication is that
887the static prediction traverses the code memory assuming that the memory will
888be read sequentially from the beginning and that all jumps jump only to
889reachable addresses. When this property is violated, the way the code memory
is interpreted is incorrect and the computed cost is totally meaningless.
891The reachability relation is closed by fetching for well formed programs.
892The property that calls to function pointers only target reachable and
893well labelled locations, however, is not statically predictable and it is
894enforced in the structured trace.
895
896The proof of correctness of cost predictions has been quite complex. Setting
up the right invariants (structured traces, well formed programs, reachability)
and completing the proof has required more than 3 man months, while the initially
899estimated effort was much lower. In the paper-and-pencil proof for IMP, the
900corresponding proof was obvious and only took two lines.
901
902The proof itself is quite involved. We
903basically need to show as an important lemma that the sum of the execution
904costs over a structured trace, where the costs are summed in execution order,
905is equivalent to the sum of the execution costs in the order of pre-payment.
906The two orders are quite different and the proof is by mutual recursion over
907the definition of the converging structured traces, which is a family of three
908mutual inductive types. The fact that this property only holds for converging
function calls is hidden in the definition of the structured traces.
910Then we need to show that the order of pre-payment
911corresponds to the order induced by the cost traces extracted from the
912structured trace. Finally, we need to show that the statically computed cost
for one block corresponds to the cost dynamically computed in pre-payment
914order.
915
916\section{Overall results}
917
918Functional correctness of the compiled code can be shown by composing
919the simulations to show that the target behaviour matches the
920behaviour of the source program, if the source program does not `go
921wrong'.  More precisely, we show that there is a forward simulation
922between the source trace and a (flattened structured) trace of the
923output, and conclude equivalence because the target's semantics are
924in the form of an executable function, and hence
925deterministic.
926
927Combining this with the correctness of the assignment of costs to cost
928labels at the ASM level for a structured trace, we can show that the
929cost of executing any compiled function (including the main function)
930is equal to the sum of all the values for cost labels encountered in
931the \emph{source code's} trace of the function.
932
933\section{Estimated effort}
Based on the rough analysis performed so far we can estimate the total
effort for the certification of the compiler. We obtain this estimate by
combining, for each pass: 1) the number of lines of code to be certified;
2) the ratio of man months of effort to number of lines of code from
the CompCert project~\cite{compcert} for the CompCert pass that is closest to
ours; 3) an estimate of the complexity of the pass according to the
analysis above.
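
For instance, for the code shared by all passes (the `Common' row in the table)
this computation gives
\begin{displaymath}
4864\ \text{lines} \times 0.00425 \approx 20.7\ \text{man months,}
\end{displaymath}
which the last column then revises to 17 man months on the basis of the
expected complexity of the corresponding proofs.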
941
\begin{table}[ht]
\centering
\begin{tabular}{lrlrr}
Pass origin & Code lines & CompCert ratio & Estimated effort & Estimated effort \\
            &            &                & (based on CompCert) & (final) \\
945\hline
946Common &  4864 & 4.25 \permil & 20.67 & 17.0 \\
947Cminor &  1057 & 5.23 \permil & 5.53  &  6.0 \\
948Clight &  1856 & 5.23 \permil & 9.71  & 10.0 \\ 
949RTLabs &  1252 & 1.17 \permil & 1.48  &  5.0 \\
950RTL    &   469 & 4.17 \permil & 1.95  &  2.0 \\
951ERTL   &   789 & 3.01 \permil & 2.38  & 2.5 \\
952LTL    &    92 & 5.94 \permil & 0.55  & 0.5 \\
953LIN    &   354 & 6.54 \permil & 2.31  &   1.0 \\
954ASM    &   984 & 4.80 \permil & 4.72  &  10.0 \\
955\hline
956Total common    &  4864 & 4.25 \permil & 20.67 & 17.0 \\
957Total front-end &  2913 & 5.23 \permil & 15.24 & 16.0 \\
958Total back-end  &  6853 & 4.17 \permil & 13.39 & 21.0 \\
959\hline
960Total           & 14630 & 3.75 \permil & 49.30 & 54.0 \\
\end{tabular}
\caption{Estimated effort (in man months) for the certification of each compiler pass.}
\label{wildguess}
\end{table}
962
We now provide some additional information on the methodology used in the
computation. The passes in the CerCo and CompCert front-ends closely match each
other. However, there is no clear correspondence between the two back-ends.
For instance, we enforce the calling convention immediately after instruction
selection, whereas in CompCert this is performed in a later phase. Similarly, we
linearize the code at the very end, whereas CompCert performs linearization
as soon as possible. Therefore, the first part of the exercise has consisted
970in shuffling and partitioning the CompCert code in order to assign to each
971CerCo pass the CompCert code that performs the same transformation.
972
973After this preliminary step, using the data given in~\cite{compcert} (which
refer to an early version of CompCert) we computed the ratio between
man months and lines of code in CompCert for each CerCo pass. This is shown
976in the third column of Table~\ref{wildguess}. For those CerCo passes that
977have no correspondence in CompCert (like the optimizing assembler) or where
978we have insufficient data, we have used the average of the ratios computed
979above.
980
The second column of the table shows the number of lines of code for each
pass in CerCo. The fourth column is obtained by multiplying the second by the
CompCert ratio. It provides an estimate of the effort required (in man months)
if the complexity of the proofs for CerCo and CompCert were the same.
985
986The two proof styles, however, are on purpose completely different. Where
CompCert uses non-executable semantics, describing the various semantics with
inductive types, we have preferred executable semantics. Therefore, CompCert
proofs by induction and inversion become proofs by functional inversion,
performed using the Russell methodology (now called Program in Coq, but whose
behaviour differs from Matita's). Moreover, CompCert code is written using
992only types that belong to the Hindley-Milner fragment, whereas we have
993heavily exploited dependent types all over the code. The dependent type
994discipline offers many advantages from the point of view of clarity of the
invariants involved and early detection of errors, and it naturally combines
well with the Russell approach, which is based on dependent types. However, it
997is also well known to introduce technical problems all over the code, like
998the need to explicitly prove type equalities to be able to manipulate
999expressions in certain ways. In many situations, the difficulties encountered
1000with manipulating dependent types are better addressed by improving the Matita
system, according to the formalization-driven system development approach. For this
reason, and assuming a pessimistic point of view on our performance, the
fifth column presents the final estimate of the effort required, which also
takes into account the complexity of the proofs suggested by the informal proof
sketches in the previous sections.
1006
1007\section{Conclusions}
The overall exercise, whose details have been only minimally reported here,
has been very useful. It has allowed us to spot at an early stage some critical
points of the proof that have required major changes to the proof plan. It has also
shown that the last passes of the compilation (e.g. assembly) and cost
prediction on the object code are much more involved than the higher-level
passes.
1014
The final effort estimate certainly carries a low degree of
confidence. It is however sufficient to conclude that the effort required
is in line with the man power that was scheduled for the second half of the
second period and for the third period. Compared to the number of man months
declared in Annex I of the contract, we will need more man months. However,
both at UNIBO and UEDIN there have been major differences in hiring with
respect to the Annex. Therefore both sites now have a higher number of man
months available, with the trade-off of a lower level of maturity of the
people employed.
1024
1025The reviewers suggested that we use this estimation to compare two possible
1026scenarios: a) proceed as planned, porting all the CompCert proofs to Matita
1027or b) port D3.1 and D4.1 to Coq and re-use the CompCert proofs.
We remark here again that the back-ends of the two compilers, from the
memory model onwards, are significantly different: we are not re-proving correctness
1030of the same piece of code. Moreover, the proof techniques are different for
1031the front-end too. Switching to the CompCert formalization would imply
abandoning the untrusted compiler, abandoning the experiment with
a different proof technique, abandoning the quest for an open source
proof, and abandoning the co-development of the formalization and the
1035Matita proof assistant. In the Commitment Letter~\cite{letter} delivered
1036to the Officer in May we clarified our personal perspective on the project
1037goals and objectives. We do not re-describe here the point of view presented
1038in the letter that we can condense in ``we value diversity''.
1039
Clearly, if the exercise had suggested that concluding the formalization, or
getting close to that, was infeasible in terms of effort, we would have
abandoned our path and embraced the reviewers' suggestion. However, we
have been reassured by the analysis we did in autumn, and the further progress made
during the winter does not yet show any major delay with respect to the
proof schedule. We are thus planning to continue the certification according
to the more detailed proof plan that came out of the exercise reported in
this manuscript.
1048
1049\end{document}