\documentclass[11pt,a4paper]{article}

\usepackage{a4wide}
\usepackage{amsfonts}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{array}
\usepackage[english]{babel}
\usepackage{../../style/cerco}
\usepackage{color}
\usepackage{diagrams}
\usepackage{graphicx}
\usepackage[utf8x]{inputenc}
\usepackage{listings}
\usepackage{microtype}
\usepackage{pdfpages}
\usepackage{skull}
\usepackage{stmaryrd}
\usepackage{wasysym}

\newcolumntype{b}{\at{}>{{}}}
\newcolumntype{B}{\at{}>{{}}c<{{}}\at{}}
\newcolumntype{h}[1]{\at{\hspace{#1}}}
\newcolumntype{L}{>{$}l<{$}}
\newcolumntype{C}{>{$}c<{$}}
\newcolumntype{R}{>{$}r<{$}}
\newcolumntype{S}{>{$(}r<{)$}}
\newcolumntype{n}{\at{}}

\lstdefinelanguage{matita-ocaml}
  {keywords={definition,coercion,lemma,theorem,remark,inductive,record,qed,let,in,rec,match,return,with,Type,try,on,to},
   morekeywords={[2]whd,normalize,elim,cases,destruct},
   morekeywords={[3]type,of,val,assert,let,function},
   mathescape=true,
  }
\lstset{language=matita-ocaml,basicstyle=\tt,columns=flexible,breaklines=false,
        keywordstyle=\bfseries, %\color{red}\bfseries,
        keywordstyle=[2]\bfseries, %\color{blue},
        keywordstyle=[3]\bfseries, %\color{blue}\bfseries,
%        commentstyle=\color{green},
%        stringstyle=\color{blue},
        showspaces=false,showstringspaces=false}
\DeclareUnicodeCharacter{8797}{:=}
\DeclareUnicodeCharacter{10746}{++}
\DeclareUnicodeCharacter{9001}{\ensuremath{\langle}}
\DeclareUnicodeCharacter{9002}{\ensuremath{\rangle}}

\lstset{
  extendedchars=false,
  inputencoding=utf8x,
  tabsize=1
}

\hypersetup{bookmarksopenlevel=2}

\title{
INFORMATION AND COMMUNICATION TECHNOLOGIES\\
(ICT)\\
PROGRAMME\\
\vspace*{1cm}Project FP7-ICT-2009-C-243881 {\cerco}}

\date{ }
\author{}

\begin{document}
\thispagestyle{empty}

\vspace*{-1cm}
\begin{center}
\includegraphics[width=0.6\textwidth]{../../style/cerco_logo.png}
\end{center}

\begin{minipage}{\textwidth}
\maketitle
\end{minipage}


\vspace*{0.5cm}
\begin{center}
\begin{LARGE}
\bf
Proof outline for the correctness of\\the CerCo compiler
\end{LARGE}
\end{center}

\vspace*{2cm}
\begin{center}
\begin{large}
Version 1.0
\end{large}
\end{center}

\vspace*{0.5cm}
\begin{center}
\begin{large}
Main Authors:\\
J. Boender, B. Campbell, D. Mulligan, P. Tranquilli, C. Sacerdoti Coen
\end{large}
\end{center}

\vspace*{\fill}
\noindent
Project Acronym: {\cerco}\\
Project full title: Certified Complexity\\
Proposal/Contract no.: FP7-ICT-2009-C-243881 {\cerco}\\

\clearpage \pagestyle{myheadings} \markright{{\cerco}, FP7-ICT-2009-C-243881}

\tableofcontents

\section{Introduction}
\label{sect.introduction}

In the last project review of the CerCo project, the project reviewers recommended that we briefly outline a pencil-and-paper correctness proof for each of the stages of the CerCo compiler in order to facilitate an estimation of the complexity and time required to complete the formalisation of the proof.
This has been possible since month eighteen, when we completed the formalisation in Matita of the data structures and code of the compiler.

In this document we provide a very high-level, pencil-and-paper sketch of what we view as the best path to completing the correctness proof for the compiler.
In particular, for every translation between two intermediate languages, in both the front- and back-ends, we identify the key translation steps, and identify some invariants that we view as being important for the correctness proof.
We sketch the overall correctness results, and also briefly describe the parts of the proof that had already been completed at the end of the First Period.
We also attach the draft of a paper that describes in more detail the
certification of the optimising assembler.

Finally, in the last sections before the attached paper, we present an estimate of the effort required for the certification in Matita of the compiler and draw conclusions.

\section{Front-end: Clight to RTLabs}

The front-end of the CerCo compiler consists of several stages:

\begin{center}
\begin{minipage}{.8\linewidth}
\begin{tabbing}
\quad \= $\downarrow$ \quad \= \kill
\textsf{Clight}\\
\> $\downarrow$ \> cast removal (an endo-transformation)\\
\> $\downarrow$ \> add runtime functions\footnote{Following the last project
meeting we intend to move this transformation to the back-end}\\
\> $\downarrow$ \> cost labelling\\
\> $\downarrow$ \> loop optimisations\footnote{\label{lab:opt2}To be ported from the untrusted compiler and certified only in case of early completion of the certification of the other passes.} (an endo-transformation)\\
\> $\downarrow$ \> partial redundancy elimination$^{\mbox{\scriptsize \ref{lab:opt2}}}$ (an endo-transformation)\\
\> $\downarrow$ \> stack variable allocation and control structure
 simplification\\
\textsf{Cminor}\\
\> $\downarrow$ \> generate global variable initialisation code\\
\> $\downarrow$ \> transform to RTL graph\\
\textsf{RTLabs}\\
\> $\downarrow$ \> \\
\>\,\vdots
\end{tabbing}
\end{minipage}
\end{center}

Here, by `endo-transformation', we mean a mapping from a language back to itself:
the loop optimisation step, for example, maps the Clight language to itself.

%Our overall statements of correctness with respect to costs will
%require a correctly labelled program
There are three layers in most of the proofs proposed:
\begin{enumerate}
\item invariants closely tied to the syntax and transformations using
  dependent types (such as the presence of variable names in environments),
\item a forward simulation proof relating each small-step of the
  source to zero or more steps of the target, and
\item proofs about syntactic properties of the cost labelling.
\end{enumerate}
The first will support both functional correctness and allow us to
show the totality of some of the compiler stages (that is, that those
stages of the compiler cannot fail).  The second provides the main
functional correctness result, including the preservation of cost
labels in the traces, and the last will be crucial for applying
correctness results about the costings from the back-end, by showing
that cost labels appear in enough places for us to assign all of the
execution costs to them.

We will also prove that a suitably labelled RTLabs trace can be turned
into a \emph{structured trace} which splits the execution trace into
cost-label to cost-label chunks with nested function calls.  This
structure was identified during work on the correctness of the
back-end cost analysis as retaining important information about the
structure of the execution that is difficult to reconstruct later in
the compiler.

\subsection{Clight cast removal}

This transformation removes some casts inserted by the parser to make
arithmetic promotion explicit but which are superfluous (such as
\lstinline[language=C]'c = (short)((int)a + (int)b);' where
\lstinline'a' and \lstinline'b' are \lstinline[language=C]'short').
This is necessary for producing good code for our target architecture.

It only affects Clight expressions, recursively detecting casts that
can be safely eliminated.  The semantics provides a big-step
definition for expressions, so we should be able to show a lock-step
forward simulation between otherwise identical states using a lemma
showing that cast elimination does not change the evaluation of
expressions.  This lemma will follow from a structural induction on
the source expression.  We have already proved a few of the underlying
arithmetic results necessary to validate the approach.
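
As an illustration of the intended recursion, here is a small OCaml sketch over a toy expression type; the type, the constructors and the \texttt{simplify} function are our own illustrative assumptions, as the real Clight AST and transformation are much richer.

\begin{lstlisting}
(* Toy expression type standing in for the Clight AST. *)
type ty = Short | Int
type expr =
  | Var  of string * ty
  | Cast of ty * expr
  | Add  of expr * expr * ty

let ty_of = function Var (_, t) | Cast (t, _) | Add (_, _, t) -> t

(* Remove promotions such as (short)((int)a + (int)b) when the
   operands already have the target type. *)
let rec simplify = function
  | Cast (t, Add (Cast (Int, e1), Cast (Int, e2), Int))
    when ty_of e1 = t && ty_of e2 = t ->
      Add (simplify e1, simplify e2, t)
  | Cast (t, e) -> Cast (t, simplify e)
  | Add (e1, e2, t) -> Add (simplify e1, simplify e2, t)
  | Var _ as v -> v
\end{lstlisting}

The structural induction mentioned above follows exactly this recursion: each case either preserves the expression or replaces a cast whose elimination is justified by an arithmetic lemma.
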

\subsection{Clight cost labelling}

This adds cost labels before and after selected statements and
expressions, and the execution traces ought to be equivalent modulo
the new cost labels.  Hence it requires a simple forward simulation
with a limited amount of stuttering wherever a new cost label is
introduced.  A bound can be given for the amount of stuttering allowed
based on the statement or continuation to be evaluated next.

We also intend to show three syntactic properties about the cost
labelling:
\begin{enumerate}
\item every function starts with a cost label,
\item every branching instruction is followed by a cost label (note that
  exiting a loop is treated as a branch), and
\item the head of every loop (and any \lstinline'goto' destination) is
  a cost label.
\end{enumerate}
These can be shown by structural induction on the source term.

\subsection{Clight to Cminor translation}

This translation is the first to introduce some invariants, with the
proofs closely tied to the implementation by dependent typing.  These
are largely complete and show that the generated code enjoys:
\begin{itemize}
\item some minimal type safety, shown by explicit checks on the
  Cminor types during the transformation (a little more work remains
  to be done here, but follows the same form);
\item that variables named in the parameter and local variable
  environments are distinct from one another, again by an explicit
  check;
\item that variables used in the generated code are present in the
  resulting environment (either by checking their presence in the
  source environment, or from a list of freshly generated temporary variables);
  and
\item that all \lstinline[language=C]'goto' labels are present (by
  checking them against a list of source labels and proving that all
  source labels are preserved).
\end{itemize}

The simulation will be similar to the relevant stages of CompCert
(Clight to Csharpminor and Csharpminor to Cminor --- in the event that
the direct proof is unwieldy we could introduce an intermediate
language corresponding to Csharpminor).  During early experimentation
with porting CompCert definitions to the Matita proof assistant we
found little difficulty in reproving the results for the memory model, so
we plan to port the memory injection properties and use them to relate
each Clight in-memory variable with either the value of a local variable or a
stack slot, depending on how it was classified.

This should be sufficient to show the equivalence of (big-step)
expression evaluation.  The simulation can then be shown by relating
corresponding blocks of statements and continuations with their Cminor
counterparts and proving that a few steps reach the next matching
state.

The syntactic properties required for cost labels remain similar and a
structural induction on the function bodies should be sufficient to
show that they are preserved.

\subsection{Cminor global initialisation code}

This short phase replaces the global variable initialisation data with
code that executes when the program starts.  Each piece of
initialisation data in the source is matched by a new statement
storing that data.  As each global variable is allocated a distinct
memory block, the program state after the initialisation statements
will be the same as the original program's state at the start of
execution, and will proceed in the same manner afterwards.

% Actually, the above is wrong...
% ... this ought to be in a fresh main function with a fresh cost label

\subsection{Cminor to RTLabs translation}

In this part of the compiler we transform the program's functions into
control flow graphs.  It is closely related to CompCert's Cminorsel to
RTL transformation, albeit with target-independent operations.

We already enforce several invariants with dependent types: some type
safety, mostly shown using the type information from Cminor; and
that the graph is closed (by showing that each successor was recently
added, or corresponds to a \lstinline[language=C]'goto' label, all of which
are added before the end).  Note that this relies on a
monotonicity property; CompCert maintains a similar property in a
similar way while building RTL graphs.  We will also add a result
showing that all of the pseudo-register names are distinct for use by
later stages, using the same method as for Cminor.

The simulation will relate Cminor states to RTLabs states which are about to
execute the code corresponding to the Cminor statement or continuation.
Each Cminor statement becomes zero or more RTLabs statements, with a
decreasing measure based on the statement and continuations similar to
CompCert's.  We may also follow CompCert in using a relational
specification of this stage so as to abstract away from the functional
(and highly dependently typed) definition.

The first two labelling properties remain as before; we will show that
cost labels are preserved, so the function entry point will be a cost
label, and successors to any statement that are cost labels still map
to cost labels, preserving the condition on branches.  We replace
the property for loops with the notion that we will always reach a
cost label or the end of the function after following a bounded number of
successors.  This can be easily seen in Cminor using the requirement
for cost labels at the head of loops and after gotos.  It remains to
show that this is preserved by the translation to RTLabs.  % how?

\subsection{RTLabs structured trace generation}

This proof-only step incorporates the function call structure and cost
labelling properties into the execution trace.  As the function calls
are nested within the trace, we need to distinguish between
terminating and non-terminating function calls.  Thus we use the
excluded middle (specialised to a function termination property) to do
this.

Structured traces for terminating functions are built by following the
flat trace, breaking it into chunks between cost labels and
recursively processing function calls.  The main difficulties here are
the non-structurally recursive nature of the function (instead we use
the size of the termination proof as a measure) and using the RTLabs
cost labelling properties to show that the constraints of the
structured traces are observed.  We also show that the lower stack
frames are preserved during function calls in order to prove that
after returning from a function call we resume execution of the
correct code.  This part of the work has already been constructed, but
still requires a simple proof to show that flattening the structured
trace recreates the original flat trace.

The non-terminating case follows the trace like the terminating
version to build up chunks of trace from cost-label to cost-label
(which, by the finite distance to a cost label property shown before,
can be represented by an inductive type).  These chunks are chained
together in a coinductive data structure that can represent
non-terminating traces.  The excluded middle is used to decide whether
function calls terminate, in which case the function described above
constructs an inductive terminating structured trace which is nested
in the caller's trace.  Otherwise, another coinductive constructor is
used to embed the non-terminating trace of the callee, generated by
corecursion.  This part of the trace transformation is currently under
construction, and will also need a flattening result to show that it
is correct.
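
To fix intuitions, the following OCaml sketch conveys the intended shape of such traces; the names and the use of \texttt{Lazy.t} (OCaml's stand-in for the coinductive type used in Matita) are our own simplifying assumptions.

\begin{lstlisting}
(* One small step of the RTLabs semantics, left abstract here. *)
type step

(* A terminating chunk of execution from one cost label to the next;
   a terminating function call nests the callee's chunks inside the
   caller's trace. *)
type chunk =
  | Steps  of step list                  (* a plain run of steps, no calls *)
  | CallIn of step * chunk list * chunk  (* call step, callee trace, rest *)

(* A possibly non-terminating execution: a lazily built, potentially
   infinite sequence of chunks; a second constructor embeds the trace
   of a non-terminating callee. *)
type whole =
  | Cons    of chunk * whole Lazy.t
  | Diverge of step * whole Lazy.t
\end{lstlisting}
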


\section{Backend: RTLabs to machine code}
\label{sect.backend.rtlabs.machine.code}

The compiler backend consists of the following intermediate languages, and stages of translation:

\begin{center}
\begin{minipage}{.8\linewidth}
\begin{tabbing}
\quad \=\,\vdots\= \\
\> $\downarrow$ \>\\
\> $\downarrow$ \quad \= \kill
\textsf{RTLabs}\\
\> $\downarrow$ \> copy propagation\footnote{\label{lab:opt}To be ported from the untrusted compiler and certified only in the case of an early completion of the certification of the other passes.} (an endo-transformation) \\
\> $\downarrow$ \> instruction selection\\
\> $\downarrow$ \> change of memory models in compiler\\
\textsf{RTL}\\
\> $\downarrow$ \> constant propagation$^{\mbox{\scriptsize \ref{lab:opt}}}$ (an endo-transformation) \\
\> $\downarrow$ \> calling convention made explicit \\
\> $\downarrow$ \> layout of activation records \\
\textsf{ERTL}\\
\> $\downarrow$ \> register allocation and spilling\\
\> $\downarrow$ \> dead code elimination\\
\textsf{LTL}\\
\> $\downarrow$ \> function linearisation\\
\> $\downarrow$ \> branch compression (an endo-transformation) \\
\textsf{LIN}\\
\> $\downarrow$ \> relabelling\\
\textsf{ASM}\\
\> $\downarrow$ \> pseudoinstruction expansion\\
\textsf{MCS-51 machine code}\\
\end{tabbing}
\end{minipage}
\end{center}

\subsection{Graph translations}
RTLabs and most intermediate languages in the back-end have a graph representation: the code of each function is represented by a graph of instructions.
The graph maps each label in a set of labels (the names of the nodes) to the instruction stored at that label (the nodes of the graph).
Instructions reference zero or more additional labels that are the immediate successors of the instruction: zero for return from functions, more than one for conditional jumps and calls, one in all other cases.
The references from one instruction to its immediate successors are the arcs of the graph.
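
In an OCaml-style sketch (our own naming, not the Matita definitions), the representation is roughly:

\begin{lstlisting}
(* Labels name the nodes of the graph. *)
type label = string
module LabelMap = Map.Make (String)

(* A function body: a finite map from labels to the instructions
   stored there.  Each instruction carries the labels of its immediate
   successors, which constitute the arcs of the graph. *)
type 'instr graph = 'instr LabelMap.t
\end{lstlisting}
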

The statuses of graph languages always contain a program counter that holds a representation of a reference to the current instruction.

A translation between two consecutive graph languages maps each instruction stored at location $l$ in the first graph and with immediate successors $\{l_1,\ldots,l_n\}$ to a subgraph of the output graph that has a single entry point at location $l$ and exit arcs to $\{l_1,\ldots,l_n\}$.
Moreover, the labels of all non-entry nodes in the subgraph are distinct from all the labels in the source graph.

In order to simplify the translations and the corresponding proofs of forward simulation, after the release of D4.2 and D4.3 we have provided:
\begin{itemize}
\item
A new data type (called \texttt{blist}) that represents a sequence of instructions to be added to the output graph (see the sketch after this list).
The ``b'' in the name stands for binder, since a \texttt{blist} is either empty, an extension of a \texttt{blist} with an instruction at the front, or the generation of a fresh quantity followed by a \texttt{blist}.
The latter feature is used, for instance, to generate fresh register names.
The instructions in the list are unlabelled and all of them but the last one are also sequential, as in a linear program.
\item
A new iterator (called \texttt{b\_graph\_translate}) of type
\begin{displaymath}
\mathtt{b\_graph\_translate}: (\mathtt{label} \rightarrow \mathtt{blist})
\rightarrow \mathtt{graph} \rightarrow \mathtt{graph}
\end{displaymath}
The iterator transforms the input graph into the output graph by replacing each node with the subgraph that corresponds to the linear \texttt{blist} obtained by applying the function in input to the node label.
\end{itemize}
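
A possible OCaml rendering of \texttt{blist}, with the binder encoded as a function, is sketched below; the encoding is our own assumption and the Matita definition may differ.

\begin{lstlisting}
(* Fresh quantities generated by the binder, e.g. new register names. *)
type register = string

(* A blist is empty, an instruction followed by a blist, or a binder
   generating a fresh quantity used by the rest of the list. *)
type 'instr blist =
  | Bnil
  | Bcons of 'instr * 'instr blist
  | Bnew  of (register -> 'instr blist)
\end{lstlisting}

Encoding the binder as a function keeps freshness abstract: the iterator, not the client code, chooses the concrete fresh names when it expands the \texttt{blist} into a subgraph.
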

Using the iterator above, the code can be written in such a way that the programmer does not see any distinction between writing a transformation on linear or graph languages.

In order to prove simulations for translations obtained using the iterator, we will prove the following theorem:

\begin{align*}
\mathtt{theorem} &\ \mathtt{b\_graph\_translate\_ok}: \\
& \forall  f.\forall G_{i}.\mathtt{let}\ G_{\sigma} := \mathtt{b\_graph\_translate}\ f\ G_{i}\ \mathtt{in} \\
&       \forall l \in G_{i}.\mathtt{subgraph}\ (f\ l)\ l\ (next \ l \ G_i)\ G_{\sigma}
\end{align*}

Here \texttt{subgraph} is a computational predicate that, given a \texttt{blist} $[i_1, \ldots, i_n]$, an entry label $l$, an exit label $l'$ and a graph $G$, expands to the fact that fetching from $G$ at address $l$ one retrieves a node $i_1$ with a successor $l_1$ that, when fetched, yields a node $i_2$ with a successor $l_2$, and so on. The successor of $i_n$ is $l'$.
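
A minimal OCaml sketch of such a predicate, parameterised over a fetch function (our own naming), is:

\begin{lstlisting}
(* subgraph fetch instrs entry exit holds when instrs are laid out in
   the graph one after the other starting from entry, ending with an
   arc to exit; fetch returns the instruction stored at a label
   together with its unique successor. *)
let rec subgraph fetch instrs entry exit =
  match instrs with
  | [] -> entry = exit
  | i :: rest ->
      (match fetch entry with
       | Some (i', next) -> i = i' && subgraph fetch rest next exit
       | None -> false)
\end{lstlisting}
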

Proving a forward simulation diagram of the following kind using the aforementioned theorem is now as straightforward as doing the same using standard small-step operational semantics over linear languages.

\begin{align*}
\mathtt{lemma} &\ \mathtt{execute\_1\_step\_ok}: \\
&       \forall s.  \mathtt{let}\ s' := s\ \sigma\ \mathtt{in} \\
&       \mathtt{let}\ l := pc\ s\ \mathtt{in} \\
&       s \stackrel{1}{\rightarrow} s^{*} \Rightarrow \exists n. s' \stackrel{n}{\rightarrow} s'^{*} \wedge s'^{*} = s^{*}\ \sigma
\end{align*}

Because graph translations preserve the entry and exit labels of translated statements, the state translation function $\sigma$ will simply preserve the value of the program counter.
The program code, which is part of the state, is translated using the iterator.

The proof is then roughly as follows.
Let $l$ be the program counter of the input state $s$.
We proceed by cases on the current instruction of $s$.
Let $[i_1, \ldots, i_n]$ be the \texttt{blist} associated to $l$ and $s$ by the translation function.
The witness required for the existential statement is simply $n$.
By applying the theorem above we know that the next $n$ instructions that will be fetched from $s\ \sigma$ will be $[i_1, \ldots, i_n]$, and it is now sufficient to prove that they simulate the original instruction.

\subsection{The RTLabs to RTL translation}
\label{subsect.rtlabs.rtl.translation}

\subsubsection*{Translation of values and memory}

The RTLabs to RTL translation pass marks the frontier between the two memory models used in the CerCo project.
As a result, we require some method of translating between the values that the two memory models permit.
Suppose we have such a translation, $\sigma$.
Then the translation between values of the two memory models may be pictured with:

\begin{displaymath}
\mathtt{Value} ::= \bot \mid \mathtt{int(size)} \mid \mathtt{float} \mid \mathtt{null} \mid \mathtt{ptr} \quad\stackrel{\sigma}{\longrightarrow}\quad \mathtt{BEValue} ::= \bot \mid \mathtt{byte} \mid \mathtt{null}_i \mid \mathtt{ptr}_i
\end{displaymath}

In the front-end, we have both integer and float values, where integer values are `sized', along with null values and pointers.
Some front-end values are representable in a byte, but others require more bits.

In the back-end model all values are meant to be represented in a single byte.
Values can therefore be undefined, be one-byte integers, or be indexed fragments of a pointer, null or not.
Float values are no longer present, as floating point arithmetic is not supported by the CerCo compiler.

The $\sigma$ map implements a one-to-many relation: a single front-end value is mapped to a sequence of back-end values when its size is more than one byte.

We further require a map, $\sigma$, which maps the front-end \texttt{Memory} to the back-end's notion of \texttt{BEMemory}.
Both kinds of memory can be thought of as instances of a generic \texttt{Mem} data type, parameterized over the kind of values stored in memory:

\begin{displaymath}
\mathtt{Mem}\ \alpha = \mathtt{Block} \rightarrow (\mathbb{Z} \rightarrow \alpha)
\end{displaymath}
Here, \texttt{Block} consists of a \texttt{Region} paired with an identifier.

\begin{displaymath}
\mathtt{Block} ::= \mathtt{Region} \times \mathtt{ID}
\end{displaymath}
We now have what we need to define what is meant by `memory' in the back-end memory model.
Namely, we instantiate the previously defined \texttt{Mem} type with the type of back-end memory values.

\begin{displaymath}
\mathtt{BEMem} = \mathtt{Mem}~\mathtt{BEValue}
\end{displaymath}
Memory addresses consist of a pair of back-end memory values:

\begin{displaymath}
\mathtt{Address} = \mathtt{BEValue} \times  \mathtt{BEValue}
\end{displaymath}
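
Putting these definitions together, a rough OCaml rendering of the two value types and of the generic memory is the following; all names, including the region names, are our own assumptions.

\begin{lstlisting}
type region = Code | XData | Data      (* hypothetical region names *)
type block  = region * int             (* Block ::= Region x ID *)

type value =                           (* front-end values *)
  | Vundef
  | Vint   of int * int                (* size and value *)
  | Vfloat of float
  | Vnull
  | Vptr   of block * int

type be_value =                        (* back-end values: one byte each *)
  | BVundef
  | BVbyte of int
  | BVnull of int                      (* i-th chunk of a null pointer *)
  | BVptr  of block * int * int        (* pointer, offset, chunk index *)

type 'a mem  = block -> int -> 'a      (* Mem alpha = Block -> (Z -> alpha) *)
type be_mem  = be_value mem            (* BEMem = Mem BEValue *)
type address = be_value * be_value     (* Address = BEValue x BEValue *)
\end{lstlisting}
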
The back- and front-end memory models differ in how they represent sized integer values in memory.
In particular, the front-end stores integer values as a header, with size information, followed by a string of `continuation' blocks, marking out the full representation of the value in memory.
In contrast, the layout of sized integer values in the back-end memory model consists of a series of byte-sized `chunks':

\begin{center}
\begin{picture}(0, 25)
\put(-125,0){\framebox(25,25)[c]{\texttt{v,4}}}
\put(-100,0){\framebox(25,25)[c]{\texttt{cont}}}
\put(-75,0){\framebox(25,25)[c]{\texttt{cont}}}
\put(-50,0){\framebox(25,25)[c]{\texttt{cont}}}
\put(-15,10){\vector(1, 0){30}}
\put(25,0){\framebox(25,25)[c]{\texttt{v$_1$}}}
\put(50,0){\framebox(25,25)[c]{\texttt{v$_2$}}}
\put(75,0){\framebox(25,25)[c]{\texttt{v$_3$}}}
\put(100,0){\framebox(25,25)[c]{\texttt{v$_4$}}}
\end{picture}
\end{center}

Chunks for pointers are pairs made of the original pointer and the index of the chunk.
Therefore, when assembling the chunks together, we can always recognise whether all chunks refer to the same value or whether the operation is meaningless.

The differing memory representations of values in the two memory models imply the need for a series of lemmas on the actions of \texttt{load} and \texttt{store} to ensure correctness.
The first lemma required has the following statement:
\begin{displaymath}
\mathtt{load\ s\ a\ M} = \mathtt{Some\ v} \rightarrow \forall i \leq s.\ \mathtt{load\ s\ (a + i)\ \sigma(M)} = \mathtt{Some\ v_i}
\end{displaymath}
That is, if we are successful in reading a value of size $s$ from memory at address $a$ in front-end memory, then we should successfully be able to read all of its chunks from the back-end memory at the appropriate addresses (from address $a$ up to and including address $a + i$, where $i \leq s$).

Next, we must show that \texttt{store} properly commutes with the $\sigma$-map between memory spaces:
\begin{displaymath}
\sigma(\mathtt{store\ a\ v\ M}) = \mathtt{store\ \sigma(v)\ \sigma(a)\ \sigma(M)}
\end{displaymath}
That is, if we store a value \texttt{v} in the front-end memory \texttt{M} at address \texttt{a} and transform the resulting memory with $\sigma$, then this is equivalent to storing a transformed value $\mathtt{\sigma(v)}$ at address $\mathtt{\sigma(a)}$ in the back-end memory $\mathtt{\sigma(M)}$.

Finally, the commutation properties between \texttt{load} and \texttt{store} are weakened in the $\sigma$-image of the memory.
Writing \texttt{load}$^*$ for the multiple consecutive iterations of \texttt{load} used to fetch all chunks of a value, we must prove that, when $a \neq a'$:
\begin{displaymath}
\mathtt{load^*\ \sigma(a)\ (\mathtt{store}\ \sigma(a')\ \sigma(v)\ \sigma(M))} = \mathtt{load^*\ \sigma(a)\ \sigma(M)}
\end{displaymath}
That is, suppose we store a transformed value $\mathtt{\sigma(v)}$ into a back-end memory $\mathtt{\sigma(M)}$ at address $\mathtt{\sigma(a')}$, using \texttt{store}, and then load from the address $\sigma(a)$. Even though $a$ and $a'$ are distinct by hypothesis, there is \emph{a priori} no guarantee that the consecutive bytes for the value stored at $\sigma(a)$ are disjoint from those for the value stored at $\sigma(a')$.
The fact that this holds is a non-trivial property of $\sigma$ that must be explicitly proved.

\subsubsection*{Translation of RTLabs states}

RTLabs states come in three flavours:
\begin{displaymath}
\begin{array}{rll}
\mathtt{StateRTLabs} & ::=  & (\mathtt{State} : \mathtt{Frame}^* \times \mathtt{Frame} \\
               & \mid & \mathtt{Call} : \mathtt{Frame}^* \times \mathtt{Args} \times \mathtt{Return} \times \mathtt{Fun} \\
               & \mid & \mathtt{Return} : \mathtt{Frame}^* \times \mathtt{Value} \times \mathtt{Return}) \times \mathtt{Mem}
\end{array}
\end{displaymath}
\texttt{State} is the default state, in which RTLabs programs spend almost all of their execution.
The \texttt{Call} state is only entered when a call instruction is being executed, and then we immediately return to being in \texttt{State}.
Similarly, \texttt{Return} is only entered when a return instruction is being executed, before returning immediately to \texttt{State}.
All RTLabs states are accompanied by a memory, \texttt{Mem}, with \texttt{Call} and \texttt{Return} keeping track of arguments, return addresses and the results of functions.
\texttt{State} keeps track of a list of stack frames.

RTL states differ from their RTLabs counterparts in including a program counter \texttt{PC}, a stack pointer \texttt{SP}, an internal stack pointer \texttt{ISP}, a carry flag \texttt{CARRY} and a set of registers \texttt{REGS}:

\begin{displaymath}
\mathtt{StateRTL} ::= (\mathtt{Frame}^* \times \mathtt{PC} \times \mathtt{SP} \times \mathtt{ISP} \times \mathtt{CARRY} \times \mathtt{REGS}) \times \mathtt{Mem}
\end{displaymath}
The internal stack pointer \texttt{ISP}, and its relationship with the stack pointer \texttt{SP}, needs some comment.
Due to the design of the MCS-51, and its minuscule stack, it was decided that the compiler would implement an emulated stack in external memory.
As a result, we have two stack pointers in our state: \texttt{ISP}, which is the real, hardware stack pointer, and \texttt{SP}, which is the stack pointer of the emulated stack in memory.
The emulated stack is used for pushing and popping stack frames when calling or returning from function calls; however, this is done using the hardware stack, indexed by \texttt{ISP}, as an intermediary.
Instructions like \texttt{LCALL} and \texttt{ACALL} are hardwired by the processor's design to push the return address on to the hardware stack.
Therefore, after a call has been made, and before a call returns, the compiler emits code to move the return address back and forth between the two stacks.
Parameters, return values and local variables are only present in the external stack.
As a result, for most of the execution of the processor, the hardware stack is empty, or contains a single item ready to be moved into external memory.
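
For concreteness, the two state types can be sketched in OCaml as follows; component types are left abstract and all names are our own.

\begin{lstlisting}
(* Components of states, left abstract in this sketch. *)
type frame and args and ret and value and fn and mem
type pc and sp and regs

(* The three flavours of RTLabs states, each carrying a memory. *)
type state_rtlabs =
  | State  of frame list * frame * mem
  | Call   of frame list * args * ret * fn * mem
  | Return of frame list * value * ret * mem

(* RTL states: a program counter, the two stack pointers, the carry
   flag and the register set, besides the frames and the memory. *)
type state_rtl = {
  frames : frame list;
  pc     : pc;
  sp     : sp;
  isp    : sp;
  carry  : bool;
  regs   : regs;
  mem    : mem;
}
\end{lstlisting}
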

Once more, we require a relation $\sigma$ between RTLabs states and RTL states.
Because $\sigma$ is one-to-many and, morally, a multivalued function, in the sequel we use functional notation for $\sigma$, with $\star$ as a distinguished marker in the range of $\sigma$ meaning that any value is accepted.
\begin{displaymath}
\mathtt{StateRTLabs} \stackrel{\sigma}{\longrightarrow} \mathtt{StateRTL}
\end{displaymath}

Translating an RTLabs state to an RTL state proceeds by cases on the particular type of state we are trying to translate, either a \texttt{State}, a \texttt{Call} or a \texttt{Return}.
For \texttt{State} we perform a further case analysis of the top stack frame, which decomposes into a tuple holding the current program counter value, the current stack pointer and the value of the registers:
\begin{displaymath}
\sigma(\mathtt{State} (\mathtt{Frame}^* \times \mathtt{\langle PC, REGS, SP, \ldots \rangle})) \longrightarrow ((\sigma(\mathtt{Frame}^*), \sigma(\mathtt{PC}), \sigma(\mathtt{SP}), \star, \star, \sigma(\mathtt{REGS})), \sigma(\mathtt{Mem}))
\end{displaymath}
Translation then proceeds by translating the remaining stack frames, as well as the contents of the top stack frame.
An arbitrary value for the internal stack pointer and the carry bit is admitted.

Translating \texttt{Call} and \texttt{Return} states is more involved, as a commutation between a single step of execution and the translation process must hold:
\begin{displaymath}
\sigma(\mathtt{Return}(-)) \longrightarrow \sigma \circ \text{return one step}
\end{displaymath}

\begin{displaymath}
\sigma(\mathtt{Call}(-)) \longrightarrow \sigma \circ \text{call one step}
\end{displaymath}

Here \emph{return one step} and \emph{call one step} refer to a pair of commuting diagrams relating the one-step execution of a call or return state and the translation of both.
We provide the one-step commuting diagrams in Figure~\ref{fig.commuting.diagrams}.
The fact that one execution step in the source language is not performed in the target language is not problematic for the preservation of divergence, because it is easy to show that every step from a \texttt{Call} or \texttt{Return} state is always preceded or followed by one step that is always simulated.

\begin{figure}
\begin{displaymath}
\begin{diagram}
s & \rTo^1 & s'   \\
  & \rdTo                             & \dTo \\
  &                                   & \llbracket s'' \rrbracket
\end{diagram}
\end{displaymath}

\caption{The one-step commuting diagrams for \texttt{Call} and \texttt{Return} state translations.}
\label{fig.commuting.diagrams}
\end{figure}

\subsubsection*{The forward simulation proof}

The forward simulation proofs for all steps that do not involve function calls are lengthy, but routine.
They consist of simulating a front-end operation on front-end pseudo-registers and the front-end memory with sequences of back-end operations on the back-end pseudo-registers and back-end memory.
The properties of $\sigma$ presented before, relating values and memories, will need to be heavily exploited.

The simulation of invocation of functions and returns from functions is less obvious.
We sketch here what happens on the source code and on its translation.

\subparagraph{Function call/return in RTLabs}

\begin{displaymath}
\begin{array}{rcl}
\mathtt{CALL(id,\ args,\ dst,\ pc) \in State(Frame^*, Frame)} & \longrightarrow & \mathtt{PUSH(Frame[pc := after\_return])}, \\
                                                           &                 &  \mathtt{Call(M(args), dst)}
\end{array}
\end{displaymath}
Suppose we are given a \texttt{State} with a list of stack frames, with the top frame being \texttt{Frame}.
Suppose also that the program counter in \texttt{Frame} points to a \texttt{CALL} instruction, complete with arguments and destination address.
Then this is executed by entering into a \texttt{Call} state where the arguments are loaded from memory, the address pointing to the instruction immediately following the \texttt{CALL} instruction is filled in, and the current stack frame is pushed on top of the stack with the return address substituted for the program counter.

Now, what happens next depends on whether we are executing an internal or an external function.
In the case where the call is to an external function, we have:
\begin{displaymath}
\begin{array}{rcl}
\mathtt{PUSH(Frame[pc := after\_return])}& \stackrel{\mathtt{ret\_val = f(M(args))}}{\longrightarrow} & \mathtt{Return(ret\_val,\ dst,\ PUSH(...))} \\
\mathtt{Call(M(args), dst)}                 &                                                            &
\end{array}
\end{displaymath}
That is, the call to the external function enters a return state after first computing the return value by executing the external function on the arguments.
Then the return state restores the program counter by popping the stack, and execution proceeds in a new \texttt{State}:
\begin{displaymath}
\begin{array}{rcl}
\mathtt{Return(ret\_val,\ dst,\ PUSH(...))} & \longrightarrow & \mathtt{pc = POP\_STACK(regs[dst := M(ret\_val)],\ pc)} \\
                                            &                 & \mathtt{State(regs[dst := M(ret\_val)],\ pc)}
\end{array}
\end{displaymath}

Suppose, however, that we are executing an internal function:
\begin{displaymath}
\begin{array}{rcl}
\mathtt{Call(M(args), dst)}                        & \longrightarrow & \mathtt{sp = alloc,\ regs = \emptyset[- := params]} \\
\mathtt{PUSH(Frame[pc := after\_return])} &                 & \mathtt{State(regs,\ sp,\ pc_\emptyset,\ dst)}
\end{array}
\end{displaymath}
A new stack frame is allocated and its address is stored in the stack pointer.
The register map is initialised first to the empty map, assigning an undefined value to all registers, before the values of the parameters are inserted into the argument registers, and a new \texttt{State} follows.


Eventually, a \texttt{RET} instruction is encountered; the return value is fetched from the register map, the stack frame is deallocated, and a \texttt{Return} state is entered:
\begin{displaymath}
\begin{array}{rcl}
\mathtt{RET(id,\ args,\ dst,\ pc) \in State(Frame^*, Frame)} & \longrightarrow & \mathtt{free(sp)} \\
   &                 & \mathtt{Return(M(ret\_val), dst, Frame^*)}
\end{array}
\end{displaymath}

Then the \texttt{Return} state restores the program counter by popping the stack, and execution proceeds in a new \texttt{State}, as in the case of external functions:
\begin{displaymath}
\begin{array}{rcl}
\mathtt{free(sp)}                         & \longrightarrow & \mathtt{pc = POP\_STACK(regs[dst := M(ret\_val)],\ pc)} \\
\mathtt{Return(M(ret\_val), dst, Frame^*)} &                 & \mathtt{State(regs[dst := M(ret\_val)],\ pc)}
\end{array}
\end{displaymath}

\subparagraph{The RTLabs to RTL translation for function calls}

Return instructions are translated to return instructions:
\begin{displaymath}
\mathtt{RET} \longrightarrow \mathtt{RET}
\end{displaymath}

\texttt{CALL} instructions are translated to \texttt{CALL\_ID} instructions:
\begin{displaymath}
\mathtt{CALL(id,\ args,\ dst,\ pc)} \longrightarrow \mathtt{CALL\_ID(id,\ \Sigma'(args),\ \Sigma(dst),\ pc)}
\end{displaymath}
Here $\Sigma$ is the map, computed by the compiler, that translates pseudo-registers holding front-end values to lists of pseudo-registers holding the chunks of those front-end values.
The specification for $\Sigma$ is that for every state $s$ and every pseudo-register $r$,
\begin{displaymath}
\sigma(s(r)) = (\sigma(s))(\Sigma(r))
\end{displaymath}

\subparagraph{Function call/return in RTL}

In the case of RTL, execution proceeds as follows.
Suppose we are executing a \texttt{CALL\_ID} instruction.
Then a case split occurs depending on whether we are executing an internal or an external function, as in the RTLabs case:
\begin{displaymath}
\hspace{-3.5cm}
\begin{diagram}
& & \llbracket \mathtt{CALL\_ID}(\mathtt{id}, \mathtt{args}, \mathtt{dst}, \mathtt{pc})\rrbracket & & \\
& \ldTo^{\text{external}} & & \rdTo^{\text{internal}} & \\
\skull & & & &
\begin{array}{l}
\mathtt{sp = alloc,\ regs = \emptyset[- := params]} \\
\mathtt{PUSH}(\mathtt{carry}, \mathtt{regs}, \mathtt{dst}, \mathtt{return\_addr}), \mathtt{pc}_{0}, \mathtt{regs}, \mathtt{sp}
\end{array}
\end{diagram}
\end{displaymath}
Here, however, we differ from RTLabs when we attempt to execute an external function, in that we use a daemon (i.e.\ an axiom that can close any goal) to artificially close the case, as we have not yet implemented external functions in the back-end.
The reason for this lack of implementation is as follows.
Though we have implemented an optimising assembler as the target of the compiler's back-end, we have not yet implemented a linker for that assembler, so external functions cannot yet be called.
Whilst external functions are carried forth throughout the entirety of the compiler's front-end, we choose not to do the same for the back-end, instead eliminating them in RTL.
However, it is plausible that we could have carried external functions forth, in order to eliminate them at a later stage (i.e.\ when translating from LIN to assembly).

In the case of an internal function being executed, we proceed as follows.
The register map is initialised to the empty map, where all registers are assigned the undefined value, and then the registers corresponding to the function parameters are assigned the values of the parameters.
Further, the stack pointer is reallocated to make room for an extra stack frame, and a frame is pushed onto the stack with the correct address to jump back to in place of the program counter.

Note, in particular, that this final act of pushing a frame on the stack leaves us in a state identical to the RTLabs case, where the instruction
\begin{displaymath}
\mathtt{PUSH(Frame[pc := after\_return])}
\end{displaymath}
was executed.
To summarise, up to the different number of transitions required to do the job, the RTL code for internal function calls closely simulates the RTLabs code.

The execution of \texttt{Return} in RTL is similarly straightforward, with the return address, stack pointer, and so on, being computed by popping off the top of the stack, and the return value computed by the function being retrieved from memory:
\begin{align*}
\mathtt{return\_addr} & := \mathtt{top}(\mathtt{stack}) \\
v^*                    & := M(\mathtt{rv\_regs}) \\
\mathtt{dst}, \mathtt{sp}, \mathtt{carry}, \mathtt{regs} & := \mathtt{pop} \\
\mathtt{regs}[v^* / \mathtt{dst}]
\end{align*}

To summarise, the forward simulation diagrams for function call/return
have the following form, where the triangle is the one given in
Figure~\ref{fig.commuting.diagrams}, the next instruction to be
executed in state $s$ is either a function call or return, and the intermediate
state $s'$ is either a \texttt{Call} or a \texttt{Return} state.
\begin{displaymath}
\begin{diagram}
s    & \rTo^1 & s' & \rTo^1 & s'' \\
\dTo &        &    & \rdTo  & \dTo \\
\llbracket s \rrbracket & \rTo(1,3)^1 & & & \llbracket s'' \rrbracket \\
\end{diagram}
\end{displaymath}
Two steps of execution are simulated by a single step.

\subsection{The RTL to ERTL translation}
\label{subsect.rtl.ertl.translation}

We map RTL statuses to ERTL statuses as follows:
\begin{align*}
\mathtt{sp} & = \mathtt{RegisterSPH} / \mathtt{RegisterSPL} \\
\mathtt{graph} & = \mathtt{graph} + \mathtt{prologue}(s) + \mathtt{epilogue}(s) \\
& \mathrm{where}\ s = \mathrm{callee\ saved} + \nu \mathrm{RA}
\end{align*}
The 16-bit RTL stack pointer \texttt{SP} is mapped to a pair of 8-bit hardware registers \texttt{RegisterSPH} and \texttt{RegisterSPL}.
The internal function graphs of RTL are augmented with a prologue and an epilogue, both indexed by a set of registers $s$ consisting of a fresh pair of registers \texttt{RA} and the set of registers that must be saved by the callee of a function.

The prologue and epilogue that are added to the function graph do the following:
\begin{align*}
\mathtt{prologue}(s) = & \ \mathtt{create\_new\_frame}; \\
                       & \ \mathtt{pop\ ra}; \\
                       & \ \mathtt{save\ callee\_saved}; \\
                       & \ \mathtt{get\_params} \\
                       & \quad \mathtt{reg\_params}: \mathtt{move} \\
                       & \quad \mathtt{stack\_params}: \mathtt{push}/\mathtt{pop}/\mathtt{move}
\end{align*}
That is, the prologue first creates a new stack frame, pops the return address from the stack, saves all the callee-saved registers (i.e.\ the set $s$), and fetches the parameters that are passed via registers or the stack, moving them into the correct registers.
In other words, the prologue of a function correctly sets up the calling convention used in the compiler when calling a function.
On the other hand, the epilogue undoes the action of the prologue:
\begin{align*}
\mathtt{epilogue}(s) = & \ \mathtt{save\ return\ to\ tmp\ real\ regs}; \\
                       & \ \mathtt{restore\_registers}; \\
                       & \ \mathtt{push\ ra}; \\
                       & \ \mathtt{delete\_frame}; \\
                       & \ \mathtt{save\ return}
\end{align*}
That is, the epilogue first saves the return value to temporary hardware registers, restores all the saved registers, pushes the return address on to the stack, deletes the stack frame that the prologue created, and finally moves the return value to its conventional location.

The \texttt{CALL} instruction is translated as follows:
\begin{displaymath}
\mathtt{CALL}\ id \mapsto \mathtt{set\_params};\ \mathtt{CALL}\ id;\ \mathtt{fetch\_result}
\end{displaymath}
Here, \texttt{set\_params} and \texttt{fetch\_result} are functions that implement what the caller of the function needs to do when calling a function, as opposed to the prologue and epilogue, which implement what the callee must do.
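
Schematically, the caller side is just the concatenation of the three blocks; in an OCaml-style sketch (toy instruction type, names assumed):

\begin{lstlisting}
type instr = Call of string | Instr of string   (* toy instruction type *)

(* Caller side of CALL id: marshal the arguments, perform the call,
   then fetch the result, as in the translation above. *)
let translate_call set_params fetch_result id =
  set_params @ [Call id] @ fetch_result
\end{lstlisting}
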

The RTL to ERTL translation and the execution functions must satisfy the following properties for \texttt{CALL} and \texttt{RETURN} instructions, respectively:
\begin{displaymath}
\begin{diagram}
\mathtt{CALL} & \rTo^1 & \mathtt{inside\ function} \\
\dTo & & \dTo \\
\underbrace{\ldots}_{\llbracket \mathtt{CALL} \rrbracket} & \rTo &
\underbrace{\ldots}_{\mathtt{prologue}} \\
\end{diagram}
\end{displaymath}
That is, if we start at an RTL \texttt{CALL} instruction and translate it to an ERTL \texttt{CALL} instruction, then executing the RTL \texttt{CALL} instruction for one step and translating should land us in the prologue of the translated function.
A similar property should also hold for \texttt{RETURN}, substituting the epilogue for the prologue of the function being translated:
\begin{displaymath}
\begin{diagram}
\mathtt{RETURN} & \rTo^1 & \mathtt{.} \\
\dTo & & \dTo \\
\underbrace{\ldots}_{\mathtt{epilogue}} & \rTo &
\underbrace{\ldots} \\
\end{diagram}
\end{displaymath}
\subsection{The ERTL to LTL translation}
\label{subsect.ertl.ltl.translation}
During the ERTL to LTL translation, pseudo-registers are stored in hardware registers or spilled onto the stack frame.
The decision is based on a liveness analysis performed on the ERTL code to determine which pairs of pseudo-registers are live at the same time at a given location.
A colouring algorithm is then used to choose where to store the pseudo-registers, permitting pseudo-registers that are deemed never to be live at the same time to share the same location.

We will not certify any colouring algorithm or control flow analysis.
Instead, we axiomatically assume the existence of `oracles' that implement the colouring and liveness analyses.
In a later phase we plan to validate the solutions by writing and certifying the code of a validator.

We first describe the liveness analysis and the colouring analysis, and then the ERTL to LTL translation.

Throughout this section, we denote pseudo-registers with the type $\mathtt{register}$ and hardware registers with $\mathtt{hdwregister}$.
\subsubsection{Liveness analysis}
\newcommand{\declsf}[1]{\expandafter\newcommand\expandafter{\csname #1\endcsname}{\mathop{\mathsf{#1}}\nolimits}}
\declsf{Livebefore}
\declsf{Liveafter}
\declsf{Defined}
\declsf{Used}
\declsf{Eliminable}
\declsf{StatementSem}

For the liveness analysis, we aim to construct a map sending each
$\ell \in \mathtt{label}$ to the set of registers live at $\ell$.
We define the following operators on ERTL statements:

\begin{displaymath}
\begin{array}{lL>{(ex. $}L<{)$}}
\Defined(\ell) & registers defined at $\ell$ & \ell:r_1\leftarrow r_2+r_3 \mapsto \{r_1,C\}, \ell:\mathtt{CALL}~id\mapsto \text{caller-save}
\\
\Used(\ell) & registers used at $\ell$ & \ell:r_1\leftarrow r_2+r_3 \mapsto \{r_2,r_3\}, \ell:\mathtt{CALL}~id\mapsto \text{parameters}
\end{array}
\end{displaymath}
Given $LA:\mathtt{label}\to\mathtt{lattice}$ (where $\mathtt{lattice}$ is the type of sets of registers\footnote{More precisely, it is the lattice $\mathtt{set}(\mathtt{register}) \times \mathtt{set}(\mathtt{hdwregister})$, with pointwise operations.}), we also have the following predicates:
\begin{displaymath}
\begin{array}{lL}
\Eliminable_{LA}(\ell) & iff executing $\ell$ has side-effects only on $r\notin LA(\ell)$
\\&
(ex.\ $\ell : r_1\leftarrow r_2+r_3 \mapsto (\{r_1,C\}\cap LA(\ell)=\emptyset),
  \mathtt{CALL}~id\mapsto \text{never}$)
\\
\Livebefore_{LA}(\ell) &$:=
  \begin{cases}
    LA(\ell) &\text{if $\Eliminable_{LA}(\ell)$,}\\
    (LA(\ell)\setminus \Defined(\ell))\cup \Used(\ell) &\text{otherwise}.
  \end{cases}$
\end{array}
\end{displaymath}
In particular, $\Livebefore$ has type $(\mathtt{label}\to\mathtt{lattice})\to \mathtt{label}\to\mathtt{lattice}$.

The equation upon which we build the fixpoint is then
\begin{displaymath}
\Liveafter(\ell) \doteq \bigcup_{\ell <_1 \ell'} \Livebefore_{\Liveafter}(\ell')
\end{displaymath}
where $\ell <_1 \ell'$ denotes that $\ell'$ is an immediate successor of $\ell$ in the graph.
We do not require the fixpoint to be the least one, so the hypothesis on $\Liveafter$ that we require is
\begin{equation}
\label{eq:livefixpoint}
\Liveafter(\ell) \supseteq \bigcup_{\ell <_1 \ell'} \Livebefore(\ell')
\end{equation}
(for brevity we drop the subscript from $\Livebefore$).
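
An OCaml sketch of $\Livebefore$ and of one round of the fixpoint computation follows; the names and the representation of register sets are our own assumptions.

\begin{lstlisting}
module RegSet = Set.Make (String)

(* Livebefore, parameterised by the current approximation la. *)
let live_before eliminable defined used la l =
  if eliminable la l then la l
  else RegSet.union (RegSet.diff (la l) (defined l)) (used l)

(* One round of the Liveafter equation: the union of live_before over
   the immediate successors of l.  Iterating until the approximation
   is stable yields a fixpoint; any superset also satisfies the
   hypothesis required in the text. *)
let live_after successors live_before la l =
  List.fold_left
    (fun acc l' -> RegSet.union acc (live_before la l'))
    RegSet.empty (successors l)
\end{lstlisting}
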

\subsubsection{Colouring}

\declsf{Colour}
\newcommand{\at}{\mathrel{@}}

The aim of the liveness analysis is to pin down the properties we need of the colouring function, which is a map (computed separately for each internal function)
\begin{displaymath}
\Colour:\mathtt{register}\to\mathtt{hdwregister}+\mathtt{nat}
\end{displaymath}
which identifies pseudo-registers with hardware ones when it is able to, and otherwise spills them to the stack.
We will just state the property we require of such a map.
First, we extend the definition to all types of registers by:
\begin{displaymath}
\begin{aligned}
   \Colour^+:\mathtt{hdwregister}+\mathtt{register} &\to \mathtt{hdwregister}+\mathtt{nat}\\
   r & \mapsto
\begin{cases}
  \Colour(r) &\text{if $r\in\mathtt{register}$,}\\
  r &\text{if $r\in\mathtt{hdwregister}$.}
\end{cases}
\end{aligned}
\end{displaymath}
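
In OCaml terms, with the two sum types rendered explicitly (a sketch under our own naming):

\begin{lstlisting}
type register = string                       (* pseudo-registers *)
type hdwregister = string                    (* hardware registers *)

type anyreg   = Hdw of hdwregister | Pseudo of register
type location = Reg of hdwregister | Stack of int

(* Extending a colouring to hardware registers, which map to themselves. *)
let colour_plus (colour : register -> location) : anyreg -> location =
  function
  | Pseudo r -> colour r
  | Hdw h    -> Reg h
\end{lstlisting}
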
The other piece of information we compute for each function is a \emph{similarity} relation, which is an equivalence relation on all kinds of registers which depends on the point of the program.
We write
\begin{displaymath}
x\sim y \at \ell
\end{displaymath}
to state that registers $x$ and $y$ are similar at $\ell$.
The formal property required of this relation will be given later, but intuitively it means that those two registers \emph{must} have the same value after $\ell$.
The analysis that produces this information can be coarse: in our case, we just set two different registers to be similar at $\ell$ if at $\ell$ itself there is a move instruction between the two.

The property required of the colouring is the following:
\begin{equation}
\label{eq:colourprop}
\forall \ell.\forall x,y. x,y\in \Liveafter(\ell)\Rightarrow
  \Colour^+(x)=\Colour^+(y) \Rightarrow x\sim y \at\ell.
\end{equation}

\subsubsection{The translation}

For example:
\begin{displaymath}
\ell : r_1\leftarrow r_2+r_3 \mapsto \begin{cases}
                                 \varepsilon & \text{if $\Eliminable(\ell)$},\\
                                 \Colour(r_1) \leftarrow \Colour(r_2) + \Colour(r_3) & \text{otherwise}.
                                \end{cases}
\end{displaymath}
where $\varepsilon$ is the empty block of instructions (i.e.\ a \texttt{GOTO}), and $\Colour(r_1) \leftarrow \Colour(r_2) + \Colour(r_3)$ is a notation for a block of instructions that takes into account:
\begin{itemize}
\item
load and store operations on the stack if any colouring is in fact a spilling;
\item
using the accumulator to store intermediate values.
\end{itemize}
The overall effect is that if $T$ is an LTL state with $\ell(T)=\ell$ then we will have $T\to^+T'$ where $T'(\Colour(r_1))=T(\Colour(r_2))+T(\Colour(r_3))$, while $T'(y)=T(y)$ for any $y$ \emph{in the image of $\Colour$} different from $\Colour(r_1)$.
Some hardware registers that are used for book-keeping and which are explicitly excluded from colouring may have different values.

We skip the details of correctly dealing with the stack and its size.
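
A sketch in OCaml of the block generated for this case, with a hypothetical accumulator-based instruction set standing in for the actual LTL instructions:

\begin{lstlisting}
type location = Reg of string | Stack of int
type instr =
  | LoadA  of location      (* accumulator := value at location *)
  | AddA   of location      (* accumulator := accumulator + value *)
  | StoreA of location      (* location := accumulator *)
  | Goto   of string

(* Translate l : r1 <- r2 + r3.  colour decides register or stack
   slot, so the loads and the store are spill-aware by construction. *)
let translate_binop eliminable colour l next (r1, r2, r3) =
  if eliminable l then [Goto next]    (* an eliminable statement: epsilon *)
  else [LoadA (colour r2); AddA (colour r3); StoreA (colour r1)]
\end{lstlisting}
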

\subsubsection{The relation between ERTL and LTL states}

Given a state $S$ in ERTL, we abuse notation by using $S$ as the underlying map
\begin{displaymath}
S : \mathtt{hdwregister}+\mathtt{register} \to \mathtt{Value}
\end{displaymath}
We write $\ell(S)$ for the program counter in $S$.
At this point we can state the property required of similarity:
\begin{equation}
\label{eq:similprop}
\forall S,S'.S\to S' \Rightarrow \forall x,y.x\sim y \at \ell(S) \Rightarrow S'(x) = S'(y).
\end{equation}

Next, we relate ERTL states with LTL ones.
For a state $T$ in LTL we again abuse notation, using $T$ as a map
\begin{displaymath}
T: \mathtt{hdwregister}+\mathtt{nat} \to \mathtt{Value}
\end{displaymath}
which maps hardware registers and \emph{local stack offsets} to values (in particular, $T$ as a map depends on the saved frames for computing the correct absolute stack values).

The relation existing between the states at the two sides of this translation step, which depends on liveness and colouring, is then defined as
\begin{displaymath}
S\mathrel\sigma T \iff \ldots \wedge \forall x. x\in \Livebefore(\ell(S))\Rightarrow T(\Colour^+(x)) = S(x)
\end{displaymath}
The ellipsis stands for other straightforward preservation properties, among which are the property $\ell(T) = \ell(S)$ and, inductively, the preservation of frames.

\subsubsection{Proof of preservation}

We will prove the following proposition:
\begin{displaymath}
\forall S, T. S \mathrel\sigma T \Rightarrow S \to S' \Rightarrow \exists T'.T\to^+ T' \wedge S'\mathrel\sigma T'
\end{displaymath}
(with appropriate cost-labelled trace preservation, which we omit).
We will call $S\mathrel \sigma T$ the inductive hypothesis, as it will be such in the complete proof by induction on the trace of the program.
As usual, this step is done by cases on the statement at $\ell(S)$ and how it is translated.
We carry out the case of a binary operation on registers in some detail.

Suppose that $\ell(S):r_1 \leftarrow r_2+r_3$, so that
\begin{displaymath}
S'(x)=\begin{cases}S(r_2)+S(r_3) &\text{if $x=r_1$,}\\S(x) &\text{otherwise.}\end{cases}
\end{displaymath}

\paragraph*{Case $\Eliminable(\ell(S))$.}

By definition we have $r_1\notin \Liveafter(\ell(S))$, and the translation of the operation yields a \texttt{GOTO}.
We take $T'$ to be the immediate successor of $T$.

Now in order to prove $S'\mathrel\sigma T'$, take any
\begin{displaymath}
x\in\Livebefore(\ell(S'))\subseteq \Liveafter(\ell(S)) = \Livebefore(\ell(S))
\end{displaymath}
where we have used property~\eqref{eq:livefixpoint} and the definition of $\Livebefore$ when $\Eliminable(\ell(S))$.
We get the following chain of equalities:

\begin{displaymath}
T'(\Colour^+(x))\stackrel 1=T(\Colour^+(x))\stackrel 2=S(x) \stackrel 3= S'(x)
\end{displaymath}
where
\begin{enumerate}
\item
follows as $T'$ has the same store as $T$,
\item
follows from the inductive hypothesis as $x\in\Livebefore(\ell(S))$,
\item
follows as $x\neq r_1$, since $r_1\notin \Liveafter(\ell(S))\ni x$.
\end{enumerate}

\paragraph*{Case $\neg\Eliminable(\ell(S))$.}

We then have $r_1\in\Liveafter(\ell(S))$, and

\begin{displaymath}
\Livebefore(\ell(S))=(\Liveafter(\ell(S))\setminus\{r_1\})\cup\{r_2,r_3\}
\end{displaymath}
Moreover, the statement is translated to $\Colour(r_1)\leftarrow\Colour(r_2)+\Colour(r_3)$, and we take $T'$ with $T\to^+T'$ such that $T'(\Colour(r_1))=T(\Colour(r_2))+T(\Colour(r_3))$ and $T'(\Colour^+(x))=T(\Colour^+(x))$ for all $x$ with $\Colour^+(x)\neq\Colour(r_1)$.

Take any $x\in\Livebefore(\ell(S'))\subseteq \Liveafter(\ell(S))$ (by property~\eqref{eq:livefixpoint}).

If $\Colour^+(x)=\Colour(r_1)$, we have by property~\eqref{eq:colourprop} that $x\sim r_1\at \ell(S)$ (as both $r_1,x\in\Liveafter(\ell(S))$), so that
\begin{displaymath}
T'(\Colour^+(x))=T(\Colour(r_2))+T(\Colour(r_3))\stackrel 1=S(r_2)+S(r_3)=S'(r_1)\stackrel 2=S'(x)
\end{displaymath}
where
\begin{enumerate}
\item
follows from two uses of the inductive hypothesis, as $r_2,r_3\in\Livebefore(\ell(S))$,
\item
follows from property~\eqref{eq:similprop}\footnote{Notice that in our particular implementation, for this case of a binary operation $x\sim r_1\at\ell(S)$ implies $x=r_1$.
However, nothing prevents us from employing more fine-grained heuristics for similarity.}.
\end{enumerate}

If $\Colour^+(x)\neq\Colour(r_1)$ (so in particular $x\neq r_1$), then $x\in\Livebefore(\ell(S))$, so by the inductive hypothesis we have
\begin{displaymath}
T'(\Colour^+(x))=T(\Colour^+(x))=S(x)=S'(x)
\end{displaymath}
1006
\subsection{The LTL to LIN translation}
\label{subsect.ltl.lin.translation}
As detailed elsewhere in the reports, due to the parameterised representation of the back-end languages, the pass described here is actually much more generic than the translation from LTL to LIN.
It consists of a linearisation pass that maps any graph-based back-end language to its corresponding linear form, preserving its semantics.
In the rest of the section, however, we will keep the names LTL and LIN for the two partial instantiations of the parameterised language, for convenience.

We require a map, $\sigma$, from LTL statuses, where program counters are represented as labels in a graph data structure, to LIN statuses, where program counters are natural numbers:
\begin{displaymath}
\mathtt{pc : label} \stackrel{\sigma}{\longrightarrow} \mathbb{N}
\end{displaymath}

The LTL to LIN translation pass also linearises the graph data structure into a list of instructions.
Pseudocode for the linearisation process is as follows:

\begin{lstlisting}
let rec linearise graph visited required generated todo :=
  match todo with
  | l::todo ->
    if l $\in$ visited then
      -- do not do anything if the node is already visited
      linearise graph visited required generated todo
    else
      -- Mark the node as visited
      let visited := visited $\cup\ \{$ l $\}$ in
      -- Get the instruction at label `l' in the graph
      let lookup := graph(l) in
      let generated := generated $\cup\ \{$ lookup $\}$ in
      -- Find the explicit successors of the instruction at label `l' in the graph
      let xsuccs := explsuccs(l, graph) in
      let required := xsuccs $\cup$ required in
      -- NB: the implicit successor succ can be none (e.g. for returns)
      let succ := succ(l, graph) in
      let todo := (if succ is defined then [succ] else []) @ xsuccs @ todo in
      -- Update generated and required in case the implicit successor is visited
      let (generated, required) :=
          if succ is defined and succ $\in$ visited then
            (generated $\cup\ \{$ GOTO succ $\}$, $\{$ succ $\}\ \cup$ required)
          else
            (generated, required) in
      linearise graph visited required generated todo
  | []      -> (required, generated)
\end{lstlisting}
Explicit successors are labels that need to remain in the code (e.g.\ the target of a
\texttt{GOTO} that has survived branch compression, or the branching label of a
conditional), while the possible implicit one must be replaced by a \texttt{GOTO}
if and only if it is a back edge of the depth-first visit.
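
To illustrate how the map $\sigma$ from labels to natural-number program counters can be recovered from the output of this process, the following is a minimal OCaml sketch, not taken from the formalisation: it assumes that \texttt{generated} has been flattened into a list in emission order, with each instruction paired with the (optional) label of the graph node it came from.
\begin{lstlisting}
(* Hypothetical sketch: each generated instruction is paired with the
   optional label of its originating node; $\sigma$ maps every such
   label to the position of its instruction in the linearised list. *)
let sigma (generated : ('label option * 'instr) list) : ('label * int) list =
  let rec aux pc = function
    | [] -> []
    | (Some l, _) :: rest -> (l, pc) :: aux (pc + 1) rest
    | (None, _) :: rest -> aux (pc + 1) rest
  in
  aux 0 generated
\end{lstlisting}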

It is easy to see that this linearisation process eventually terminates.
In particular, the size of the visited label set is non-decreasing and bounded above by the size of the graph that we are linearising. Moreover, when it remains the same, the size of the todo list decreases. So the lexicographic measure $(n - |\mathtt{visited}|, |\mathtt{todo}|)$ (where $n$ is the number of nodes in the graph) strictly decreases upon each
call\footnote{The use of a lexicographic measure can be avoided by taking the first element in $\mathtt{todo}$ not in $\mathtt{visited}$ and discarding all preceding nodes before making the recursive call.}.

The initial call to \texttt{linearise} has the \texttt{visited}, \texttt{required} and \texttt{generated} sets set to the empty set, and \texttt{todo} initialised with the singleton list consisting of the entry point of the graph.
We envisage needing to prove the following invariants on the linearisation function above:

\begin{enumerate}
\item
$\mathtt{visited} \approx \mathtt{generated}$, where $\approx$ is \emph{multiset} equality, as \texttt{generated} is a set of instructions in which instructions may mention labels multiple times, and \texttt{visited} is a set of labels,
\item
$\forall \mathtt{l} \in \mathtt{generated}.\ \mathtt{succs(l,\ graph)} \subseteq \mathtt{generated} \cup \mathtt{todo}$,
\item
$\mathtt{required} \subseteq \mathtt{visited}\cup\mathtt{todo}$,
\item
for every statement $\mathtt{s}$ of $\mathtt{generated}$, $\mathtt{explsuccs}(\mathtt{s})\subseteq\mathtt{required}$.
\end{enumerate}

The invariants collectively imply the following properties, crucial to correctness, about the linearisation process:

\begin{enumerate}
\item
Every graph node is visited at most once,
\item
Every instruction that is generated is generated due to some graph node being visited,
\item
The successor instruction of every instruction that has been visited already will eventually be visited too.
\end{enumerate}

Note that, because the LTL to LIN transformation is the first time the code of a function is linearised in the back-end, we must discover a notion of `well-formed function code' suitable for linearised forms.
In particular, we see the notion of well-formedness (yet to be formally defined) resting on the following conditions:

\begin{enumerate}
\item
For every jump to a label in a linearised function code, the target label exists at some point in the function code,
\item
Each label is unique, appearing only once in the function code,
\item
The final instruction of a function code must be a return or an unconditional jump.
\end{enumerate}

We assume that these properties will be easy consequences of the invariants on the linearisation function defined above.

The final condition above is potentially a little opaque, so we explain further.
The only instructions that can reasonably appear in the final position of a function code are returns or backward jumps, as any other instruction would cause execution to `fall out' of the end of the program (for example, when a function invoked with \texttt{CALL} returns, it returns to the next instruction past the \texttt{CALL} that invoked it).
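
As a concrete reading of the three conditions, the following is a minimal OCaml sketch of such a well-formedness check; it is not taken from the formalisation, and the record type and its field names are hypothetical: we simply assume that each instruction knows its optional label, its jump targets, and whether it is a return or an unconditional jump.
\begin{lstlisting}
(* Hypothetical representation of a linearised instruction. *)
type 'l lin_instr =
  { label    : 'l option   (* the label carried by this instruction *)
  ; targets  : 'l list     (* labels this instruction may jump to *)
  ; is_final : bool }      (* return or unconditional jump? *)

let well_formed (code : 'l lin_instr list) : bool =
  let labels  = List.filter_map (fun i -> i.label) code in
  let targets = List.concat_map (fun i -> i.targets) code in
  (* 1. every jump target exists somewhere in the code *)
  List.for_all (fun t -> List.mem t labels) targets
  (* 2. each label appears only once *)
  && List.length (List.sort_uniq compare labels) = List.length labels
  (* 3. the final instruction is a return or an unconditional jump *)
  && (match List.rev code with i :: _ -> i.is_final | [] -> false)
\end{lstlisting}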

\subsection{The LIN to ASM and ASM to MCS-51 machine code translations}
\label{subsect.lin.asm.translation}

The LIN to ASM translation step is trivial, being almost the identity function.
The only non-trivial feature of the LIN to ASM translation is that all labels are `named apart' so that there is no chance of freshly generated labels from different namespaces clashing with labels from another namespace.

The ASM to MCS-51 machine code translation step, and the required statements of correctness, are found in an unpublished manuscript attached to this document.
This is the most complex translation because of the huge number of cases to be addressed and because of the complexity of the two semantics.
Moreover, in the assembly code we have conditional and unconditional jumps to arbitrary locations in the code, which are not supported by the MCS-51 instruction set.
The latter has several kinds of jumps, characterised by different instruction sizes and execution times, but limited in range.
For instance, conditional jumps whose destination is more than $2^7$ bytes away from the jump instruction location are not supported at all and need to be emulated with a code transformation.
This problem, known in the literature as branch displacement and common to virtually all microcontroller architectures, is computationally hard: depending on the exact characteristics of the target architecture, finding an optimal solution is often NP-complete.
As far as we know, we will provide the first formally verified proof of correctness for an assembler that implements branch displacement.
We are also working on the first verified proof of correctness of a mildly optimising branch displacement algorithm; at the moment this proof is almost finished, but it is not described in the companion paper.
This proof, in isolation, took around 6 man months.
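
To give a flavour of the algorithmic core of the problem, here is a minimal OCaml sketch of the standard fixed-point approach to branch displacement, under illustrative assumptions that are not those of our formalisation: every jump starts with the short encoding, and any jump whose displacement turns out to be out of range is widened until the assignment of encodings stabilises (widening is monotone, so the loop terminates). The instruction sizes and the signed 8-bit range are invented for the example and do not reflect the MCS-51.
\begin{lstlisting}
(* Illustrative instruction set: a jump to the instruction at a given
   index, or any other (one-byte) instruction. *)
type instr = Jump of int | Plain

(* Returns, for each instruction, whether it needs the long encoding.
   Sizes (1, 2, 3 bytes) and the range [-128, 127] are examples only. *)
let branch_displacement (prog : instr array) : bool array =
  let n = Array.length prog in
  let long = Array.make n false in
  let size i = match prog.(i) with
    | Plain -> 1
    | Jump _ -> if long.(i) then 3 else 2 in
  let changed = ref true in
  while !changed do
    changed := false;
    (* compute instruction addresses under the current encodings *)
    let addr = Array.make (n + 1) 0 in
    for i = 0 to n - 1 do addr.(i + 1) <- addr.(i) + size i done;
    for i = 0 to n - 1 do
      (match prog.(i) with
       | Jump t when not long.(i) ->
           (* displacement is relative to the next instruction *)
           let disp = addr.(t) - addr.(i + 1) in
           if disp < -128 || disp > 127 then begin
             long.(i) <- true;
             changed := true
           end
       | _ -> ())
    done
  done;
  long
\end{lstlisting}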

\section{Correctness of cost prediction}

Roughly speaking, the proof of correctness of cost prediction shows that the cost of executing a labelled object-code program is the same as the summation over all labels in the program execution trace of the cost statically associated to the label and computed on the object code itself.
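In symbols, and with notation invented here rather than taken from the formal development, the na\"ive statement would read
\begin{displaymath}
\mathtt{cost}(\mathtt{exec}(p)) \;=\; \sum_{L\in\mathtt{labels}(\mathtt{trace}(p))} k(L)
\end{displaymath}
where $k(L)$ is the cost statically associated to label $L$ on the object code, and occurrences of labels in the trace are counted with their multiplicity.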

In the presence of object-level function calls, the previous statement is, however, incorrect.
The reason is twofold.
First of all, a function call may diverge.
However, to the labels that appear just before a call, we also associate the cost of the instructions that follow the call.
Therefore, in the summation over all labels, when we meet a label we pre-pay for the instructions that appear after function calls, assuming all calls to be terminating.

This choice is driven by several considerations on the source code.
Namely, functions can be called inside expressions, and it would be too disruptive to put labels inside expressions to capture the cost of instructions that follow a call.
Moreover, adding a label after each call would produce a much higher number of proof obligations in the certification of source programs using Frama-C.
Further, those proof obligations would be guarded by a requirement to demonstrate the termination of all functions involved, which generates many additional complex proof obligations that have little to do with execution costs.

With our approach, instead, we put less burden on the user, at the price of proving a weaker statement: the estimated and actual costs will be the same if and only if the high-level program is converging.
For prefixes of diverging programs we can provide a similar result where the equality is replaced by an inequality (loss of precision).

Assuming totality of functions is, however, not a sufficiently strong condition at the object level.
Even if a function returns, there is no guarantee that it will transfer control back to the calling point.
For instance, the function could have manipulated the return address in its stack frame.
Moreover, an object-level program can forge any address and transfer control to it, with no guarantees about the execution behaviour and labelling properties of the called program.

To solve the problem, we introduced the notion of a \emph{structured trace}, which comes in two flavours: structured traces for total programs (an inductive type) and structured traces for diverging programs (a co-inductive type based on the previous one).
Roughly speaking, a structured trace represents the execution of a well-behaved program that is subject to several constraints, such as:
\begin{enumerate}
\item
All function calls return control just after the calling point,
\item
The execution of every function body starts with a label and ends with a \texttt{RET} instruction (even for bodies reached by invoking a function pointer),
\item
All instructions are covered by a label (required by the correctness of the labelling approach),
\item
The target of all conditional jumps must be labelled (a sufficient but not necessary condition for precision of the labelling approach),
\item
\label{prop5}
Two structured traces with the same structure yield the same cost traces.
\end{enumerate}
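
The following OCaml sketch conveys the rough shape of such a definition for converging traces; it is a deliberate simplification with invented constructor names (the actual Matita development uses a family of three mutually inductive types carrying many more invariants), so it should be read only as an illustration of how the first constraint is baked into the type: the trace of a call embeds the whole structured trace of the callee's body and then continues from the state just after the calling point.
\begin{lstlisting}
(* Illustrative only: a caricature of structured traces for total
   programs.  States are left abstract. *)
type 'st trace_label_return =
  (* execution of a function body: starts at a label, ends in RET *)
  | Body of 'st trace_any_label

and 'st trace_any_label =
  | Step   of 'st * 'st trace_any_label
      (* one plain instruction, then the rest of the trace *)
  | Call   of 'st * 'st trace_label_return * 'st trace_any_label
      (* a call: the callee's whole structured trace, then the
         continuation from just after the calling point *)
  | Return of 'st
      (* the closing RET of the enclosing body *)
\end{lstlisting}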

Correctness of cost predictions is only proved for structured execution traces, i.e.\ well-behaved programs.
The forward simulation proof for all back-end passes will actually be a proof of preservation of the structure of the structured traces that, because of property~\ref{prop5}, will imply correctness of the cost prediction for the back-end.
The Clight to RTLabs correctness proof will also include a proof that associates to each converging execution its converging structured trace and to each diverging execution its diverging structured trace.

There are also two more issues that invalidate the na\"ive statement of correctness for cost prediction given above.
The algorithm that statically computes the costs of blocks is correct only when the object code is \emph{well-formed} and the program counter is \emph{reachable}.
A well-formed object code is one in which the program counter never overflows after any execution step of the processor.
An overflow that occurs during fetching but is overwritten during execution is, however, correct and necessary to accept correct programs that are as large as the processor's code memory.
Temporary overflows add complications to the proof.
A reachable address is an address that can be obtained by fetching (\emph{not executing!}) a finite number of times from the beginning of the code memory without ever overflowing.
The complication is that the static prediction traverses the code memory assuming that the memory will be read sequentially from the beginning and that all jumps jump only to reachable addresses.
When this property is violated, the way the code memory is interpreted is incorrect and the cost computed is meaningless.
The reachability relation is closed under fetching for well-formed programs.
The property that function pointers only target reachable and well-labelled locations, however, is not statically predictable and is therefore enforced in the structured trace.
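
To make the sequential traversal concrete, here is a much-simplified OCaml sketch of the static cost computation, with an invented representation and no overflow or reachability checking: the code memory is modelled as an array of already-fetched instructions, and each cost label is assigned the sum of the costs of the instructions from the label up to the next one (the pre-payment for function calls described above is ignored here).
\begin{lstlisting}
(* One fetched instruction: whether it carries a cost label, and its
   statically known execution cost in cycles. *)
type fetched = { labelled : bool; cost : int }

(* Traverse the code memory sequentially from address 0, attributing
   to each labelled address the cost of its block. *)
let static_costs (code : fetched array) : (int * int) list =
  let n = Array.length code in
  (* cost of the block following a label, up to the next label *)
  let rec block pc acc =
    if pc >= n || code.(pc).labelled then acc
    else block (pc + 1) (acc + code.(pc).cost)
  in
  let rec scan pc =
    if pc >= n then []
    else if code.(pc).labelled then
      (pc, code.(pc).cost + block (pc + 1) 0) :: scan (pc + 1)
    else scan (pc + 1)
  in
  scan 0
\end{lstlisting}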

The proof of correctness for cost predictions, and even discovering the correct statement, has been quite complex.
Setting up good invariants (i.e.\ structured traces, well-formed programs, reachability) and completing the proof has required more than 3 man months, while the initially estimated effort was much lower.
In the paper-and-pencil proof for IMP, the corresponding proof was `obvious' and only took two lines.

The proof itself is quite involved.
We must show, as an important lemma, that the sum of the execution costs over a structured trace, where the costs are summed in execution order, is equivalent to the sum of the execution costs in the order of pre-payment.
The two orders are quite different and the proof is by mutual recursion over the definition of the converging structured traces, which is a family of three mutually inductive types.
The fact that this property only holds for converging function calls is hidden in the definition of the structured traces.
Then we need to show that the order of pre-payment corresponds to the order induced by the cost traces extracted from the structured trace.
Finally, we need to show that the statically computed cost for one block corresponds to the cost dynamically computed in pre-payment order.

\section{Overall results}

Functional correctness of the compiled code can be shown by composing the simulations to show that the target behaviour matches the behaviour of the source program, if the source program does not `go wrong'.
More precisely, we show that there is a forward simulation between the source trace and a (flattened structured) trace of the output, and conclude equivalence because the target's semantics are in the form of an executable function, and hence deterministic.

Combining this with the correctness of the assignment of costs to cost labels at the ASM level for a structured trace, we can show that the cost of executing any compiled function (including the top-level, main function of a C program) is equal to the sum of all the values for cost labels encountered in the \emph{source code's} trace of the function.

\section{Estimated effort}

Based on the rough analysis performed so far, we can estimate the total effort required for the certification of the whole compiler.
We obtain this estimation by combining, for each pass:
\begin{enumerate}
\item
The number of lines of code to be certified,
\item
The ratio of the number of lines of proof to the number of lines of code from the CompCert project~\cite{compcert} for the CompCert pass that is closest in spirit to our own,
\item
An estimation of the complexity of the pass according to the analysis above.
\end{enumerate}
The result is shown in Table~\ref{table}.

\begin{table}
\begin{center}
\begin{tabular}{lrlrr}
Pass origin & Code lines & CompCert ratio & Estimated effort & Estimated effort \\
            &            &                & (based on CompCert) & (adjusted) \\
\hline
Common &  4864 & 4.25 \permil & 20.67 & 17.0 \\
Clight &  1057 & 5.23 \permil & 5.53  &  6.0 \\
Cminor &  1856 & 5.23 \permil & 9.71  & 10.0 \\
RTLabs &  1252 & 1.17 \permil & 1.48  &  5.0 \\
RTL    &   469 & 4.17 \permil & 1.95  &  2.0 \\
ERTL   &   789 & 3.01 \permil & 2.38  &  2.5 \\
LTL    &    92 & 5.94 \permil & 0.55  &  0.5 \\
LIN    &   354 & 6.54 \permil & 2.31  &  1.0 \\
ASM    &   984 & 4.80 \permil & 4.72  & 10.0 \\
\hline
Total common    &  4864 & 4.25 \permil & 20.67 & 17.0 \\
Total front-end &  2913 & 5.23 \permil & 15.24 & 16.0 \\
Total back-end  &  6853 & 4.17 \permil & 13.39 & 21.0 \\
\hline
Total           & 14630 & 3.75 \permil & 49.30 & 54.0 \\
\end{tabular}
\end{center}
\caption{\label{table} Estimated effort (in man months)}
\end{table}

We now provide some additional information on the methodology used in the computation.
The passes in the CerCo and CompCert front-ends closely match each other.
However, there is no clear correspondence between the two back-ends.
For instance, we enforce the calling convention immediately after instruction selection, whereas in CompCert this is performed in a later phase.
Further, we linearise the code at the very end, whereas CompCert performs linearisation as soon as possible.
Therefore, the first part of the estimation exercise has consisted of shuffling and partitioning the CompCert code in order to assign to each CerCo pass the CompCert code that performs a similar transformation.

After this preliminary step, using the data given in~\cite{compcert} (which refers to an early version of CompCert) we computed the ratio between man months and lines of code in CompCert for each CerCo pass.
This is shown in the third column of Table~\ref{table}.
For those CerCo passes that have no correspondence in CompCert (like the optimising assembler) or where we have insufficient data, we have used the average of the ratios computed above.

The second column of the table shows the number of lines of code for each pass in CerCo.
The fourth column is obtained by multiplying the second by the CompCert ratio.
It provides an estimate of the effort required (in man months) if the complexity of the proofs for CerCo and CompCert were the same.
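For instance, for the Clight pass the computation is
\begin{displaymath}
1057~\text{lines} \times 5.23\text{\permil} \approx 5.53~\text{man months,}
\end{displaymath}
which the complexity adjustment discussed below then raises to $6.0$ in the final column.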

The two proof styles, however, are purposefully completely different.
Where CompCert uses non-executable semantics, describing the various semantics of languages with inductive types, we have preferred executable semantics.
Therefore, CompCert proofs by induction and inversion become proofs by functional inversion, performed using the Russell methodology (now called Program in Coq, but whose behaviour differs from that of Matita).
Moreover, CompCert code is written using only types that belong to the Hindley-Milner fragment, whereas we have heavily exploited dependent types throughout the codebase.
The dependent type discipline offers many advantages, especially from the point of view of clarity of the invariants involved and early detection of errors.
It also naturally combines well with the Russell approach, which is based on dependent types.
However, it is also known to introduce several technical problems, like the need to explicitly prove type equalities to be able to manipulate expressions in certain ways.
In many situations, the difficulties encountered with manipulating dependent types are better addressed by improving the Matita system, following our formalisation-driven approach to system development.
For this reason, and assuming a pessimistic estimation of our performance, the fifth column presents the final estimation of the effort required, which also takes into account the complexity of the proof suggested by the informal proofs sketched in the previous section.

\subsection{Contingency plan}

On the basis of the proof strategy sketched in this document and the estimated effort, we can refine our contingency plan.
Should we complete the certification of the basic compiler ahead of schedule, we will have the choice of either proving loop optimisations and/or partial redundancy elimination correct (both tasks seem difficult to achieve in a short time) or considering the MCS-51 specific extensions introduced during the first period and under-used in the formalised prototype.
Yet another possibility would be to better study retargeting of the code and the commutation property between different compiler passes.
The latter study is easily enabled by our approach where all back-end languages are instances of the same parameterised language.

In the case of a considerable delay in the certification of some components, we will first address the passes that are more likely to have undetected bugs, and we will follow a top-down approach, axiomatising the properties of the data structures used in the compiler to focus more on the algorithms.
The rationale is that data structures are easier than algorithms to test using well-known methodologies.
The effort table clearly shows that common definitions and data structures are one quarter of the size of the current codebase and require slightly less than one third of the total effort.
At least half of this effort really goes into simple data structures (vectors, bounded and unbounded integers, tries and maps) whose certification is not interesting and whose code could be taken, without re-proving it, from the library of any other theorem prover.

\section{Conclusions}

The overall exercise, whose details have only been sketched here, has been very useful.
It has brought to light some errors in our proof that have required major changes in the proof plan.
It has also shown that the last passes of the compilation (e.g.\ assembly) and cost prediction on the object code are much more involved than the more high-level passes.

The final estimation of the effort required to complete the proof suffers from a low degree of confidence in the numbers, due to the difficulty of relating our work, and compiler design, to that of CompCert.
It is, however, sufficient to conclude that the effort required is in line with the man power that was scheduled for the second half of the second period and for the third period.
Compared to the number of man months declared in Annex I of the contract, we will need more man months.
However, at both UNIBO and UEDIN there have been major differences in hiring with respect to the Annex.
Therefore both sites now have more manpower available, though with the associated trade-off of a lower level of maturity for the people employed.

Our reviewers suggested that we use this estimation to compare two possible scenarios: a) proceed as planned, porting all the CompCert proofs to Matita, or b) port D3.1 and D4.1 to Coq and re-use the CompCert proofs.
We remark here again that the back-ends of the two compilers, from the memory model on, are quite different: we are not re-proving correctness of the same piece of code.
Moreover, the proof techniques are different for the front-end too.
Switching to the CompCert formalisation would imply abandoning the untrusted compiler, the experiment with a different proof technique, the quest for an open-source proof, and the co-development of the formalisation and the Matita proof assistant.
In the Commitment Letter~\cite{letter}, delivered to the Officer in May, we clarified our personal perspectives on the project goals and objectives.
We do not re-describe here the point of view presented in the letter, other than the condensed soundbite that ``we value diversity''.

Clearly, if the exercise had suggested that concluding the formalisation, or getting close to that, was infeasible in terms of effort, we would have abandoned our path and embraced the reviewers' suggestion.
However, we have been reassured by the analysis that we carried out in the autumn, and further progress completed during the winter does not yet show any major delay with respect to the proof schedule.
We are thus planning to continue the certification according to the more detailed proof plan that emerged from the exercise reported in this manuscript.

\begin{thebibliography}{2}
\bibitem{compcert} X. Leroy, ``A Formally Verified Compiler Back-end'',
Journal of Automated Reasoning 43(4):363--446, 2009.

\bibitem{letter} The CerCo team, ``Commitment to the Consideration of Reviewer's Recommendation'', 16/05/2011.
\end{thebibliography}

\newpage

\includepdf[pages={-}]{cpp-2011.pdf}


\end{document}