Index: /Deliverables/D1.2/CompilerProofOutline/outline.tex
===================================================================
 /Deliverables/D1.2/CompilerProofOutline/outline.tex (revision 1789)
+++ /Deliverables/D1.2/CompilerProofOutline/outline.tex (revision 1790)
@@ 78,5 +78,5 @@
\begin{LARGE}
\bf
Proof outline for the correctness of the CerCo compiler
+Proof outline for the correctness of\\the CerCo compiler
\end{LARGE}
\end{center}
@@ 93,5 +93,5 @@
\begin{large}
Main Authors:\\
B. Campbell, D. Mulligan, P. Tranquilli, C. Sacerdoti Coen
+J. Boender, B. Campbell, D. Mulligan, P. Tranquilli, C. Sacerdoti Coen
\end{large}
\end{center}
@@ 110,19 +110,12 @@
\label{sect.introduction}
In the last project review of the CerCo project, the project reviewers
recommended us to quickly outline a paperandpencil correctness proof
for each of the stages of the CerCo compiler in order to allow for an
estimation of the complexity and time required to complete the formalization
of the proof. This has been possible starting from month 18 when we have
completed the formalization in Matita of the datastructures and code of
the compiler.

In this document we provide a very highlevel, penandpaper
sketch of what we view as the best path to completing the correctness proof
for the compiler. In particular, for every translation between two intermediate languages, in both the front and backends, we identify the key translation steps, and identify some invariants that we view as being important for the correctness proof. We sketch the overall correctness results, and also briefly describe the parts of the proof that have already
been completed at the end of the First Period.

In the last section we finally present an estimation of the effort required
for the certification in Matita of the compiler and we draw conclusions.
+In the last project review of the CerCo project, the project reviewers recommended that we briefly outline a pencilandpaper correctness proof for each of the stages of the CerCo compiler in order to facilitate an estimation of the complexity and time required to complete the formalisation of the proof.
+This has been possible starting from month eighteen, as we have now completed the formalisation in Matita of the data structures and code of the compiler.
+
+In this document we provide a very highlevel, pencilandpaper sketch of what we view as the best path to completing the correctness proof for the compiler.
+In particular, for every translation between two intermediate languages, in both the front and backends, we identify the key translation steps, and identify some invariants that we view as being important for the correctness proof.
+We sketch the overall correctness results, and also briefly describe the parts of the proof that have already been completed at the end of the First Period.
+
+Finally, in the last section we present an estimation of the effort required for the certification in Matita of the compiler and draw conclusions.
\section{Frontend: Clight to RTLabs}
@@ 139,5 +132,5 @@
meeting we intend to move this transformation to the backend}\\
\> $\downarrow$ \> cost labelling\\
\> $\downarrow$ \> loop optimizations\footnote{\label{lab:opt2}To be ported from the untrusted compiler and certified only in case of early completion of the certification of the other passes.} (an endotransformation)\\
+\> $\downarrow$ \> loop optimisations\footnote{\label{lab:opt2}To be ported from the untrusted compiler and certified only in case of early completion of the certification of the other passes.} (an endotransformation)\\
\> $\downarrow$ \> partial redundancy elimination$^{\mbox{\scriptsize \ref{lab:opt2}}}$ (an endotransformation)\\
\> $\downarrow$ \> stack variable allocation and control structure
@@ 154,5 +147,5 @@
Here, by `endotransformation', we mean a mapping from language back to itself:
the loop optimization step maps the Clight language to itself.
+the loop optimisation step maps the Clight language to itself.
%Our overall statements of correctness with respect to costs will
@@ 357,5 +350,5 @@
\> $\downarrow$ \quad \= \kill
\textsf{RTLabs}\\
\> $\downarrow$ \> copy propagation\footnote{\label{lab:opt}To be ported from the untrusted compiler and certified only in case of early completion of the certification of the other passes.} (an endotransformation) \\
+\> $\downarrow$ \> copy propagation\footnote{\label{lab:opt}To be ported from the untrusted compiler and certified only in the case of an early completion of the certification of the other passes.} (an endotransformation) \\
\> $\downarrow$ \> instruction selection\\
\> $\downarrow$ \> change of memory models in compiler\\
@@ 380,51 +373,33 @@
\subsection{Graph translations}
RTLabs and most intermediate languages in the backend have a graph
representation:
the code of each function is represented by a graph of instructions.
The graph maps a set of labels (the names of the nodes) to the instruction
stored at that label (the nodes of the graph).
Instructions reference zero or more additional labels that are the immediate
successors of the instruction: zero for return from functions; more than one
for conditional jumps and calls; one in all other cases. The references
from one instruction to its immediates are the arcs of the graph.

Status of graph languages always have a program counter that holds a
representation of a reference to the current instruction.

A translation between two consecutive graph languages maps each instruction
stored at location $l$ in the first graph and with immediate successors
$\{l_1,\ldots,l_n\}$ to a subgraph of the output graph that has a single
entry point at location $l$ and exit arcs to $\{l_1,\ldots,l_n\}$. Moreover,
the labels of all non entry nodes in the subgraph are distinct from all the
labels in the source graph.

In order to simplify the translations and the relative proofs of forward
simulation, after the release of D4.2 and D4.3, we have provided:
+RTLabs and most intermediate languages in the backend have a graph representation: the code of each function is represented by a graph of instructions.
+The graph maps a set of labels (the names of the nodes) to the instruction stored at that label (the nodes of the graph).
+Instructions reference zero or more additional labels that are the immediate successors of the instruction: zero for return from functions, more than one for conditional jumps and calls, one in all other cases.
+The references from one instruction to its immediate successors are the arcs of the graph.
+
+The statuses of graph languages always contain a program counter that holds a representation of a reference to the current instruction.
+
+A translation between two consecutive graph languages maps each instruction stored at location $l$ in the first graph and with immediate successors $\{l_1,\ldots,l_n\}$ to a subgraph of the output graph that has a single entry point at location $l$ and exit arcs to $\{l_1,\ldots,l_n\}$.
+Moreover, the labels of all nonentry nodes in the subgraph are distinct from all the labels in the source graph.
+
+In order to simplify the translations and the relative proofs of forward simulation, after the release of D4.2 and D4.3, we have provided:
\begin{itemize}
 \item a new data type (called \texttt{blist}) that represents a
 sequence of instructions to be added to the output graph.
 The ``b'' in the name stands for binder, since a \texttt{blist} is
 either empty, an extension of a \texttt{blist} with an instruction
 at the front, or the generation of a fresh quantity followed by a
 \texttt{blist}. The latter feature is used, for instance, to generate
 fresh register names. The instructions in the list are unlabelled and
 all of them but the last one are also sequential, like in a linear program.
 \item a new iterator (called \texttt{b\_graph\_translate}) of type
+\item
+A new data type (called \texttt{blist}) that represents a sequence of instructions to be added to the output graph.
+The ``b'' in the name stands for binder, since a \texttt{blist} is either empty, an extension of a \texttt{blist} with an instruction at the front, or the generation of a fresh quantity followed by a \texttt{blist}.
+The latter feature is used, for instance, to generate fresh register names.
+The instructions in the list are unlabelled and all of them but the last one are also sequential, like in a linear program.
+\item
+A new iterator (called \texttt{b\_graph\_translate}) of type
\begin{displaymath}
\mathtt{b\_graph\_translate}: (\mathtt{label} \rightarrow \mathtt{blist})
\rightarrow \mathtt{graph} \rightarrow \mathtt{graph}
\end{displaymath}
 The iterator transform the input graph in the output graph by replacing
 each node with the graph that corresponds to the linear \texttt{blist}
 obtained by applying the function in input to the node label.
+The iterator transform the input graph in the output graph by replacing each node with the graph that corresponds to the linear \texttt{blist} obtained by applying the function in input to the node label.
\end{itemize}
Using the iterator above, the code can be written in such a way that
the programmer does not see any distinction between writing a transformation
on linear or graph languages.

In order to prove simulations for translations obtained using the iterator,
we will prove the following theorem:
+Using the iterator above, the code can be written in such a way that the programmer does not see any distinction between writing a transformatio on linear or graph languages.
+
+In order to prove simulations for translations obtained using the iterator, we will prove the following theorem:
\begin{align*}
@@ 434,13 +409,7 @@
\end{align*}
Here \texttt{subgraph} is a computational predicate that given a \texttt{blist}
$[i_1, \ldots, i_n]$, an entry label $l$, an exit label $l'$ and a graph $G$
expands to the fact that fetching from $G$ at address $l$ one retrieves a node
$i_1$ with a successor $l_1$ that, when fetched, yields a node $i_2$ with a
successor $l_2$ such that \ldots. The successor of $i_n$ is $l'$.

Proving a forward simulation diagram of the following kind using the theorem
above is now as simple as doing the same using standard small step operational
semantics over linear languages.
+Here \texttt{subgraph} is a computational predicate that given a \texttt{blist} $[i_1, \ldots, i_n]$, an entry label $l$, an exit label $l'$ and a graph $G$ expands to the fact that fetching from $G$ at address $l$ one retrieves a node $i_1$ with a successor $l_1$ that, when fetched, yields a node $i_2$ with a successor $l_2$ such that \ldots. The successor of $i_n$ is $l'$.
+
+Proving a forward simulation diagram of the following kind using the aforementioned theorem is now as straightforward as doing the same using standard smallstep operational semantics over linear languages.
\begin{align*}
@@ 451,17 +420,13 @@
\end{align*}
Because of the fact that graph translation preserves entry and exit labels of
translated statements, the state translation function $\sigma$ will simply
preserve the value of the program counter. The program code, which is
part of the state, is translated using the iterator.

The proof is then roughly the following. Let $l$ be the program counter of the
input state $s$. We proceed by cases on the current instruction of $s$.
Let $[i_1, \ldots, i_n]$ be the \texttt{blist} associated to $l$ and $s$
by the translation function. The witness required for the existential
statement is simply $n$. By applying the theorem above we know that the
next $n$ instructions that will be fetched from $s\ \sigma$ will be
$[i_1, \ldots, i_n]$ and it is now sufficient to prove that they simulate
the original instruction.
+Because graph translations preserve entry and exit labels of translated statements, the state translation function $\sigma$ will simply preserve the value of the program counter.
+The program code, which is part of the state, is translated using the iterator.
+
+The proof is then roughly as follows.
+Let $l$ be the program counter of the input state $s$.
+We proceed by cases on the current instruction of $s$.
+Let $[i_1, \ldots, i_n]$ be the \texttt{blist} associated to $l$ and $s$ by the translation function.
+The witness required for the existential statement is simply $n$.
+By applying the theorem above we know that the next $n$ instructions that will be fetched from $s\ \sigma$ will be $[i_1, \ldots, i_n]$ and it is now sufficient to prove that they simulate the original instruction.
\subsection{The RTLabs to RTL translation}
@@ 479,22 +444,19 @@
\end{displaymath}
In the frontend, we have both integer and float values, where integer values are `sized', along with null values and pointers. Some frontenv values are
representables in a byte, but some others require more bits.
+In the frontend, we have both integer and float values, where integer values are `sized', along with null values and pointers.
+Some frontend values are representables in a byte, but some others require more bits.
In the backend model all values are meant to be represented in a single byte.
Values can thefore be undefined, be one byte long integers or be indexed
fragments of a pointer, null or not. Floats values are no longer present, as floating point arithmetic is not supported by the CerCo compiler.

The $\sigma$ map implements a onetomany relation: a single frontend value
is mapped to a sequence of backend values when its size is more then one byte.

We further require a map, $\sigma$, which maps the frontend \texttt{Memory} and the backend's notion of \texttt{BEMemory}. Both kinds of memory can be
thought as an instance of a generic \texttt{Mem} data type parameterized over
the kind of values stored in memory.
+Values can thefore be undefined, be one byte long integers or be indexed fragments of a pointer, null or not.
+Floats values are no longer present, as floating point arithmetic is not supported by the CerCo compiler.
+
+The $\sigma$ map implements a onetomany relation: a single frontend value is mapped to a sequence of backend values when its size is more then one byte.
+
+We further require a map, $\sigma$, which maps the frontend \texttt{Memory} and the backend's notion of \texttt{BEMemory}.
+Both kinds of memory can be thought as an instance of a generic \texttt{Mem} data type parameterized over the kind of values stored in memory:
\begin{displaymath}
\mathtt{Mem}\ \alpha = \mathtt{Block} \rightarrow (\mathbb{Z} \rightarrow \alpha)
\end{displaymath}

Here, \texttt{Block} consists of a \texttt{Region} paired with an identifier.
@@ 502,5 +464,4 @@
\mathtt{Block} ::= \mathtt{Region} \times \mathtt{ID}
\end{displaymath}

We now have what we need for defining what is meant by the `memory' in the backend memory model.
Namely, we instantiate the previously defined \texttt{Mem} type with the type of backend memory values.
@@ 509,5 +470,4 @@
\mathtt{BEMem} = \mathtt{Mem}~\mathtt{BEValue}
\end{displaymath}

Memory addresses consist of a pair of backend memory values:
@@ 515,5 +475,4 @@
\mathtt{Address} = \mathtt{BEValue} \times \mathtt{BEValue} \\
\end{displaymath}

The back and frontend memory models differ in how they represent sized integeer values in memory.
In particular, the frontend stores integer values as a header, with size information, followed by a string of `continuation' blocks, marking out the full representation of the value in memory.
@@ 555,11 +514,9 @@
\texttt{load}^* \sigma(a)\ (\mathtt{store}\ \sigma(a')\ \sigma(v)\ \sigma(M)) = \mathtt{load}^*\ \sigma(s)\ \sigma(a)\ \sigma(M)
\end{displaymath}
That is, suppose we store a transformed value $\mathtt{\sigma(v)}$ into a backend memory $\mathtt{\sigma(M)}$ at address $\mathtt{\sigma(a')}$, using \texttt{store}, and then load from the address $\sigma(a)$. Even if $a$ and $a'$ are
distinct by hypothesis, there is a priori no guarantee that the consecutive
bytes for the value stored at $\sigma(a)$ are disjoint from those for the
values stored at $\sigma(a')$. The fact that this holds is a nontrivial
property of $\sigma$ to be proved.
+That is, suppose we store a transformed value $\mathtt{\sigma(v)}$ into a backend memory $\mathtt{\sigma(M)}$ at address $\mathtt{\sigma(a')}$, using \texttt{store}, and then load from the address $\sigma(a)$. Even if $a$ and $a'$ are distinct by hypothesis, there is \emph{a priori} no guarantee that the consecutive bytes for the value stored at $\sigma(a)$ are disjoint from those for the values stored at $\sigma(a')$.
+The fact that this holds is a nontrivial property of $\sigma$ that must be explicitly proved.
\subsubsection*{Translation of RTLabs states}
+
RTLabs states come in three flavours:
\begin{displaymath}
@@ 570,5 +527,5 @@
\end{array}
\end{displaymath}
\texttt{State} is the default state in which RTLabs programs are almost always in.
+\texttt{State} is the default state in which RTLabs programs are almost always in for the duration of their execution.
The \texttt{Call} state is only entered when a call instruction is being executed, and then we immediately return to being in \texttt{State}.
Similarly, \texttt{Return} is only entered when a return instruction is being executed, before returning immediately to \texttt{State}.
@@ 577,4 +534,5 @@
RTL states differ from their RTLabs counterparts, in including a program counter \texttt{PC}, stackpointer \texttt{SP}, internal stack pointer \texttt{ISP}, a carry flag \texttt{CARRY} and a set of registers \texttt{REGS}:
+
\begin{displaymath}
\mathtt{State} ::= \mathtt{Frame}^* \times \mathtt{PC} \times \mathtt{SP} \times \mathtt{ISP} \times \mathtt{CARRY} \times \mathtt{REGS}
@@ 584,12 +542,11 @@
As a result, we have two stack pointers in our state: \texttt{ISP}, which is the real, hardware stack, and \texttt{SP}, which is the stack pointer of the emulated stack in memory.
The emulated stack is used for pushing and popping stack frames when calling or returning from function calls, however this is done using the hardware stack, indexed by \texttt{ISP} as an intermediary.
Instructions like \texttt{LCALL} and \texttt{ACALL} are hardwired by the processor's design to push the return address on to the hardware stack. Therefore after a call has been made, and before a call returns, the compiler emits code to move the return address back and forth the two stacks. Parameters, return values
and local variables are only present in the external stack.
+Instructions like \texttt{LCALL} and \texttt{ACALL} are hardwired by the processor's design to push the return address on to the hardware stack.
+Therefore after a call has been made, and before a call returns, the compiler emits code to move the return address back and forth the two stacks.
+Parameters, return values and local variables are only present in the external stack.
As a result, for most of the execution of the processor, the hardware stack is empty, or contains a single item ready to be moved into external memory.
Once more, we require a relation $\sigma$ between RTLabs states and RTL states.
Because $\sigma$ is onetomany and, morally, a multifunction,
we use in the following the functional notation for $\sigma$, using $\star$
in the output of $\sigma$ to mean that any value is accepted.
+Because $\sigma$ is onetomany and, morally, a multivalued function, we use in the sequel the functional notation for $\sigma$, using $\star$ as a distinct marker in the range of $\sigma$ to mean that any value is accepted.
\begin{displaymath}
\mathtt{State} \stackrel{\sigma}{\longrightarrow} \mathtt{State}
@@ 601,6 +558,6 @@
\sigma(\mathtt{State} (\mathtt{Frame}^* \times \mathtt{\langle PC, REGS, SP \rangle})) \longrightarrow ((\sigma(\mathtt{Frame}^*), \sigma(\mathtt{PC}), \sigma(\mathtt{SP}), \star, \star, \sigma(\mathtt{REGS})), \sigma(\mathtt{Mem}))
\end{displaymath}
Translation then proceeds by translating the remaining stack frames, as well as the contents of the top stack frame. Any value for the internal stack pointer
and the carry bit is admitted.
+Translation then proceeds by translating the remaining stack frames, as well as the contents of the top stack frame.
+An arbitrary value for the internal stack pointer and the carry bit is admitted.
Translating \texttt{Call} and \texttt{Return} states is more involved, as a commutation between a single step of execution and the translation process must hold:
@@ 614,9 +571,6 @@
Here \emph{return one step} and \emph{call one step} refer to a pair of commuting diagrams relating the onestep execution of a call and return state and translation of both.
We provide the one step commuting diagrams in Figure~\ref{fig.commuting.diagrams}. The fact that one execution step in the source language is not performed
in the target language is not problematic for preservation of divergence
because it is easy to show that every step from a \texttt{Call} or
\texttt{Return} state is always preceeded/followed by one step that is always
simulated.
+We provide the one step commuting diagrams in Figure~\ref{fig.commuting.diagrams}.
+The fact that one execution step in the source language is not performed in the target language is not problematic for preservation of divergence because it is easy to show that every step from a \texttt{Call} or \texttt{Return} state is always preceeded or followed by one step that is always simulated.
\begin{figure}
@@ 641,5 +595,6 @@
\subsubsection*{The forward simulation proof}
The forward simulation proof for all steps that do not involve function calls are lengthy, but routine.
+
+The forward simulation proofs for all steps that do not involve function calls are lengthy, but routine.
They consist of simulating a frontend operation on frontend pseudoregisters and the frontend memory with sequences of backend operations on the backend pseudoregisters and backend memory.
The properties of $\sigma$ presented before that relate values and memories will need to be heavily exploited.
@@ 688,16 +643,13 @@
Eventually, a Return instruction is faced,
the return value is fetched from the registers map,
the stack frame is deallocated
and a Return state is entered:
+Eventually, a \texttt{RET} instruction is faced, the return value is fetched from the registers map, the stack frame is deallocated and a \texttt{Return} state is entered:
\begin{displaymath}
\begin{array}{rcl}
\mathtt{Return(id,\ args,\ dst,\ pc) \in State(Frame^*, Frame)} & \longrightarrow & \mathtt{free(sp)} \\
+\mathtt{RET(id,\ args,\ dst,\ pc) \in State(Frame^*, Frame)} & \longrightarrow & \mathtt{free(sp)} \\
& & \mathtt{Return(M(ret\_val), dst, Frames)}
\end{array}
\end{displaymath}
Then the return state restores the program counter by popping the stack, and execution proceeds in a new \texttt{State}, like the case for external functions:
+Then the \texttt{Return} state restores the program counter by popping the stack, and execution proceeds in a new \texttt{State}, like the case for external functions:
\begin{displaymath}
\begin{array}{rcl}
@@ 708,18 +660,19 @@
\subparagraph{The RTLabs to RTL translation for function calls}
+
Return instructions are translated to return instructions:
\begin{displaymath}
\mathtt{Return} \longrightarrow \mathtt{Return}
\end{displaymath}

\texttt{Call} instructions are translated to \texttt{Call\_ID} instructions:
\begin{displaymath}
\mathtt{Call(id,\ args,\ dst,\ pc)} \longrightarrow \mathtt{CALL\_ID(id,\ \Sigma'(args),\ \Sigma(dst),\ pc)}
\end{displaymath}
Here $\Sigma$ is the map, computed by the compiler,
that translate pseudoregisters holding frontend values to list of
pseudoregisters holding the chunks for the frontend values.
+\mathtt{RET} \longrightarrow \mathtt{RET}
+\end{displaymath}
+
+\texttt{CALL} instructions are translated to \texttt{CALL\_ID} instructions:
+\begin{displaymath}
+\mathtt{CALL(id,\ args,\ dst,\ pc)} \longrightarrow \mathtt{CALL\_ID(id,\ \Sigma'(args),\ \Sigma(dst),\ pc)}
+\end{displaymath}
+Here $\Sigma$ is the map, computed by the compiler, that translate pseudoregisters holding frontend values to list of pseudoregisters holding the chunks for the frontend values.
The specification for $\Sigma$ is that for every state $s$,
$$\sigma(s(r)) = (\sigma(s))(\Sigma(r))$$
+\begin{displaymath}
+\sigma(s(r)) = (\sigma(s))(\Sigma(r))
+\end{displaymath}
\subparagraph{Function call/return in RTL}
@@ 754,7 +707,6 @@
\mathtt{PUSH(current\_frame[pc := after\_return])}
\end{displaymath}
was executed. To summarize, up to the different numer of transitions required
to do the job, the RTL code for internal function calls closely simulates
the RTLabs code.
+was executed.
+To summarize, up to the different numer of transitions required to do the job, the RTL code for internal function calls closely simulates the RTLabs code.
The execution of \texttt{Return} in RTL is similarly straightforward, with the return address, stack pointer, and so on, being computed by popping off the top of the stack, and the return value computed by the function being retrieved from memory:
@@ 768,5 +720,4 @@
To summarize, the forward simulation diagrams for function call/return
XXX

Translation and execution must satisfy a pair of commutation properties for the \texttt{Return} and \texttt{Call} cases.
@@ 842,20 +793,15 @@
\subsection{The ERTL to LTL translation}
\label{subsect.ertl.ltl.translation}
During the ERTL to LTL translation pseudoregisters are stored in hardware
registers or spilled on to the stack frame. The decision is based on liveness
analysis of the ERTL code to determine what pair of pseudoregisters are live
at the same time at a given location. A coloring algorithm is then used to
choose where to store the pseudoregisters, allowing pseudoregisters that
are never live at the same time to share the same location.

We will not certify any coloring algorithm or control flow analysis.
Instead, we axiomatically assume the existence of solutions to the
coloring and liveness problems. In a later phase we plan to validate the
solutions by writing and certifying the code of a validator.

We describe the liveness analysis and colouring analysis first and then
the ERTL to LTL translation.

Throughout this section, we denote pseudoregisters with the type $\mathtt{register}$ and hardware ones with $\mathtt{hdwregister}$.
+During the ERTL to LTL translation pseudoregisters are stored in hardware registers or spilled onto the stack frame.
+The decision is based on a liveness analysis performed on the ERTL code to determine what pair of pseudoregisters are live at the same time for a given location.
+A colouring algorithm is then used to choose where to store the pseudoregisters, permitting pseudoregisters that are deemed never to be live at the same time to share the same location.
+
+We will not certify any colouring algorithm or control flow analysis.
+Instead, we axiomatically assume the existence of `oracles' that implement the colouring and liveness analyses.
+In a later phase we plan to validate the solutions by writing and certifying the code of a validator.
+
+We describe the liveness analysis and colouring analysis first and then the ERTL to LTL translation after.
+
+Throughout this section, we denote pseudoregisters with the type $\mathtt{register}$ and hardware registers with $\mathtt{hdwregister}$.
\subsubsection{Liveness analysis}
\newcommand{\declsf}[1]{\expandafter\newcommand\expandafter{\csname #1\endcsname}{\mathop{\mathsf{#1}}\nolimits}}
@@ 867,8 +813,9 @@
\declsf{StatementSem}
For the liveness analysis, we aim at a map
+For the liveness analysis, we aim to construct a map
$\ell \in \mathtt{label} \mapsto $ live registers at $\ell$.
We define the following operators on ERTL statements.
$$
+We define the following operators on ERTL statements:
+
+\begin{displaymath}
\begin{array}{lL>{(ex. $}L<{)$}}
\Defined(\ell) & registers defined at $\ell$ & \ell:r_1\leftarrow r_2+r_3 \mapsto \{r_1,C\}, \ell:\mathtt{CALL}~id\mapsto \text{callersave}
@@ 876,12 +823,7 @@
\Used(\ell) & registers used at $\ell$ & \ell:r_1\leftarrow r_2+r_3 \mapsto \{r_2,r_3\}, \ell:\mathtt{CALL}~id\mapsto \text{parameters}
\end{array}
$$
Given $LA:\mathtt{label}\to\mathtt{lattice}$ (where $\mathtt{lattice}$
is the type of sets of registers\footnote{More precisely, it is the lattice
$\mathtt{set}(\mathtt{register})\times
\mathtt{set}(\mathtt{hdwregister})$,
with pointwise operations.}), we also have have the following
predicates:
$$
+\end{displaymath}
+Given $LA:\mathtt{label}\to\mathtt{lattice}$ (where $\mathtt{lattice}$ is the type of sets of registers\footnote{More precisely, it is the lattice $\mathtt{set}(\mathtt{register}) \times \mathtt{set}(\mathtt{hdwregister})$, with pointwise operations.}), we also have have the following predicates:
+\begin{displaymath}
\begin{array}{lL}
\Eliminable_{LA}(\ell) & iff executing $\ell$ has sideeffects only on $r\notin LA(\ell)$
@@ 896,30 +838,33 @@
\end{cases}$
\end{array}
$$
In particular, $\Livebefore$ has type $(\mathtt{label}\to\mathtt{lattice})\to
\mathtt{label}\to\mathtt{lattice}$.

The equation on which we build the fixpoint is then
$$\Liveafter(\ell) \doteq \bigcup_{\ell <_1 \ell'} \Livebefore_{\Liveafter}(\ell')$$
where $\ell <_1 \ell'$ denotes that $\ell'$ is an immediate successor of $\ell$
in the graph. We do not require the fixpoint to be the least one, so the hypothesis
on $\Liveafter$ that we require is
+\end{displaymath}
+In particular, $\Livebefore$ has type $(\mathtt{label}\to\mathtt{lattice})\to \mathtt{label}\to\mathtt{lattice}$.
+
+The equation upon which we build the fixpoint is then
+\begin{displaymath}
+\Liveafter(\ell) \doteq \bigcup_{\ell <_1 \ell'} \Livebefore_{\Liveafter}(\ell')
+\end{displaymath}
+where $\ell <_1 \ell'$ denotes that $\ell'$ is an immediate successor of $\ell$ in the graph.
+We do not require the fixpoint to be the least one, so the hypothesis on $\Liveafter$ that we require is
\begin{equation}
\label{eq:livefixpoint}
\Liveafter(\ell) \supseteq \bigcup_{\ell <_1 \ell'} \Livebefore(\ell')
\end{equation}
(for shortness we drop the subscript from $\Livebefore$).
+(for brevity we drop the subscript from $\Livebefore$).
\subsubsection{Colouring}
+
\declsf{Colour}
\newcommand{\at}{\mathrel{@}}
The aim of the liveness analysis is to define what properties we need
of the colouring function, which is a map (computed separately for each
internal function)
$$\Colour:\mathtt{register}\to\mathtt{hdwregister}+\mathtt{nat}$$
which identifies pseudoregisters with hardware ones if it is able to, otherwise
it spills them to the stack. We will just state what property we need from such
a map. First, we extend the definition to all types of registers by:
$$\begin{aligned}
+
+The aim of liveness analysis is to define what properties we need of the colouring function, which is a map (computed separately for each internal function)
+\begin{displaymath}
+\Colour:\mathtt{register}\to\mathtt{hdwregister}+\mathtt{nat}
+\end{displaymath}
+which identifies pseudoregisters with hardware ones if it is able to, otherwise it spills them to the stack.
+We will just state what property we require from such a map.
+First, we extend the definition to all types of registers by:
+\begin{displaymath}
+\begin{aligned}
\Colour^+:\mathtt{hdwregister}+\mathtt{register} &\to \mathtt{hdwregister}+\mathtt{nat}\\
r & \mapsto
@@ 928,14 +873,14 @@
r &\text{if $r\in\mathtt{hdwregister}$,}.
\end{cases}
 \end{aligned}$$
The other piece of information we compute for each function is a \emph{similarity}
relation, which is an equivalence relation on all kinds of registers which depends
on the point of the program. We write
$$x\sim y \at \ell$$
to state that registers $x$ and $y$ are similar at $\ell$. The formal definition
of this relation's property will be given next, but intuitively it means that those two registers \emph{must}
have the same value after $\ell$. The analysis that produces this information can be
coarse: in our case, we just set two different registers to be similar at $\ell$
if at $\ell$ itself there is a move instruction between the two.
+\end{aligned}
+\end{displaymath}
+The other piece of information we compute for each function is a \emph{similarity} relation, which is an equivalence relation on all kinds of registers which depends on the point of the program.
+We write
+\begin{displaymath}
+x\sim y \at \ell
+\end{displaymath}
+to state that registers $x$ and $y$ are similar at $\ell$.
+The formal definition of this relation's property will be given next, but intuitively it means that those two registers \emph{must} have the same value after $\ell$.
+The analysis that produces this information can be coarse: in our case, we just set two different registers to be similar at $\ell$ if at $\ell$ itself there is a move instruction between the two.
The property required of colouring is the following:
@@ 947,28 +892,32 @@
\subsubsection{The translation}
+
For example:
$$\ell : r_1\leftarrow r_2+r_3 \mapsto \begin{cases}
+\begin{displaymath}
+\ell : r_1\leftarrow r_2+r_3 \mapsto \begin{cases}
\varepsilon & \text{if $\Eliminable(\ell)$},\\
\Colour(r_1) \leftarrow \Colour(r_2) + \Colour(r_3) & \text{otherwise}.
 \end{cases}$$
where $\varepsilon$ is the empty block of instructions (i.e.\ a \texttt{GOTO}),
and $\Colour(r_1) \leftarrow \Colour(r_2) + \Colour(r_3)$ is a notation for a
block of instructions that take into account:
+ \end{cases}
+\end{displaymath}
+where $\varepsilon$ is the empty block of instructions (i.e.\ a \texttt{GOTO}), and $\Colour(r_1) \leftarrow \Colour(r_2) + \Colour(r_3)$ is a notation for a block of instructions that take into account:
\begin{itemize}
 \item load and store ops on the stack if any colouring is in fact a spilling;
 \item using the accumulator to store intermediate values.
+\item
+Load and store ops on the stack if any colouring is in fact a spilling;
+\item
+Using the accumulator to store intermediate values.
\end{itemize}
The overall effect is that if $T$ is an LTL state with $\ell(T)=\ell$
then we will have $T\to^+T'$ where $T'(\Colour(r_1))=T(\Colour(r_2))+T(\Colour(r_2))$,
while $T'(y)=T(y)$ for any $y$ \emph{in the image of $\Colour$} different from
$\Colour(r_1)$. Some hardware registers that are used for bookkeeping and which
are explicitly excluded from colouring may have different values.
+The overall effect is that if $T$ is an LTL state with $\ell(T)=\ell$ then we will have $T\to^+T'$ where $T'(\Colour(r_1))=T(\Colour(r_2))+T(\Colour(r_2))$, while $T'(y)=T(y)$ for any $y$ \emph{in the image of $\Colour$} different from $\Colour(r_1)$.
+Some hardware registers that are used for bookkeeping and which are explicitly excluded from colouring may have different values.
We skip the details of correctly dealing with the stack and its size.
+
\subsubsection{The relation between ERTL and LTL states}
Given a state $S$ in ERTL, we abuse the notation by using $S$ as the underlying map
$$S : \mathtt{hdwregister}+\mathtt{register} \to \mathtt{Value}.$$
The program counter in $S$ is written as $\ell(S)$. At this point we can state
the property asked from similarity:
+
+Given a state $S$ in ERTL, we abuse notation by using $S$ as the underlying map
+\begin{displaymath}
+S : \mathtt{hdwregister}+\mathtt{register} \to \mathtt{Value}
+\end{displaymath}
+We write $\ell(S)$ for the program counter in $S$.
+At this point we can state the property asked from similarity:
\begin{equation}
\label{eq:similprop}
@@ 976,78 +925,92 @@
\end{equation}
Next, we relate ERTL states with LTL ones. For a state $T$ in LTL we again
abuse the notation using $T$ as a map
$$T: \mathtt{hdwregister}+\mathtt{nat} \to \mathtt{Value}$$
which maps hardware registers and \emph{local stack offsets} to values (in particular,
$T$ as a map depends on the saved frames for computing the correct absolute
stack values).

The relation existing between the states at the two sides of this translation step,
which depends on liveness and colouring, is
then defined as
$$S\mathrel\sigma T \iff \ldots \wedge \forall x. x\in \Livebefore(\ell(S))\Rightarrow T(\Colour^+(x)) = S(x).$$
The ellipsis stands for other straightforward preservation, among which the properties
$\ell(T) = \ell(S)$ and, inductively, the preservation of frames.
+Next, we relate ERTL states with LTL ones.
+For a state $T$ in LTL we again abuse notation using $T$ as a map
+\begin{displaymath}
+T: \mathtt{hdwregister}+\mathtt{nat} \to \mathtt{Value}
+\end{displaymath}
+which maps hardware registers and \emph{local stack offsets} to values (in particular, $T$ as a map depends on the saved frames for computing the correct absolute stack values).
+
+The relation existing between the states at the two sides of this translation step, which depends on liveness and colouring, is then defined as
+\begin{displaymath}
+S\mathrel\sigma T \iff \ldots \wedge \forall x. x\in \Livebefore(\ell(S))\Rightarrow T(\Colour^+(x)) = S(x)
+\end{displaymath}
+The ellipsis stands for other straightforward preservation, among which the properties $\ell(T) = \ell(S)$ and, inductively, the preservation of frames.
\subsubsection{Proof of preservation}
+
We will prove the following proposition:
$$\forall S, T. S \mathrel\sigma T \Rightarrow S \to S' \Rightarrow \exists T'.T\to^+ T' \wedge S'\mathrel\sigma T'$$
(with appropriate costlabelled trace preservation which we omit). We will call $S\mathrel \sigma T$
the inductive hypothsis, as it will be such in the complete proof by induction on the trace of the program.
As usual, this step is done by cases
on the statement at $\ell(S)$ and how it is translated. We carry out in some detail a single case, the one of
a binary operation on registers.
+\begin{displaymath}
+\forall S, T. S \mathrel\sigma T \Rightarrow S \to S' \Rightarrow \exists T'.T\to^+ T' \wedge S'\mathrel\sigma T'
+\end{displaymath}
+(with appropriate costlabelled trace preservation which we omit).
+We will call $S\mathrel \sigma T$ the inductive hypothsis, as it will be such in the complete proof by induction on the trace of the program.
+As usual, this step is done by cases on the statement at $\ell(S)$ and how it is translated.
+We carry out the case of a binary operation on registers in some detail.
Suppose that $\ell(S):r_1 \leftarrow r_2+r_3$, so that
$$S'(x)=\begin{cases}S(r_1)+S(r_2) &\text{if $x=r_1$,}\\S(x) &\text{otherwise.}\end{cases}$$
+\begin{displaymath}
+S'(x)=\begin{cases}S(r_1)+S(r_2) &\text{if $x=r_1$,}\\S(x) &\text{otherwise.}\end{cases}
+\end{displaymath}
+
\paragraph*{Case $\Eliminable(\ell(S))$.}
By definition we have $r_1\notin \Liveafter(\ell(S))$, and the translation
of the operation yields a \texttt{GOTO}. We take $T'$ the immediate successor
of $T$.
+
+By definition we have $r_1\notin \Liveafter(\ell(S))$, and the translation of the operation yields a \texttt{GOTO}.
+We take $T'$ the immediate successor of $T$.
Now in order to prove $S'\mathrel\sigma T'$, take any
$$x\in\Livebefore(\ell(S'))\subseteq \Liveafter(\ell(S)) = \Livebefore(\ell(S))$$
where we have used property~\eqref{eq:livefixpoint} and the definition
of $\Livebefore$ when $\Eliminable(\ell(S))$. We get the following chain of equalities:
$$T'(\Colour^+(x))\stackrel 1=T(\Colour^+(x))\stackrel 2=S(x) \stackrel 3= S'(x)$$
+\begin{displaymath}
+x\in\Livebefore(\ell(S'))\subseteq \Liveafter(\ell(S)) = \Livebefore(\ell(S))
+\end{displaymath}
+where we have used property~\eqref{eq:livefixpoint} and the definition of $\Livebefore$ when $\Eliminable(\ell(S))$.
+We get the following chain of equalities:
+
+\begin{displaymath}
+T'(\Colour^+(x))\stackrel 1=T(\Colour^+(x))\stackrel 2=S(x) \stackrel 3= S'(x)
+\end{displaymath}
where
\begin{enumerate}
 \item is because $T'$ has the same store as $T$,
 \item is by inductive hypothesis as $x\in\Livebefore(\ell(S))$,
 \item is because $x\neq r_1$, as $r_1\notin \Liveafter(\ell(S))\ni x$.
+\item
+follows as $T'$ has the same store as $T$,
+\item
+follows from the inductive hypothesis as $x\in\Livebefore(\ell(S))$,
+\item
+follows as $x\neq r_1$, as $r_1\notin \Liveafter(\ell(S))\ni x$.
\end{enumerate}
+
\paragraph*{Case $\neg\Eliminable(\ell(S))$.}
+
We then have $r_1\in\Liveafter(\ell(S))$, and
$$\Livebefore(\ell(S))=(\Liveafter(\ell(S))\setminus\{r_1\})\cup\{r_2,r_3\}.$$
Moreover the statement is translated to $\Colour(r_1)\leftarrow\Colour(r_2)+\Colour(r_3)$,
and we take the $T'\leftarrow^+T$ such that $T'(\Colour(r_1))=T(\Colour(r_2))+T(\Colour(r_3))$ and
$T'(\Colour^+(x))=T(\Colour^+(x))$ for all $x$ with $\Colour^+(x)\neq\Colour(r_1)$.
+
+\begin{displaymath}
+\Livebefore(\ell(S))=(\Liveafter(\ell(S))\setminus\{r_1\})\cup\{r_2,r_3\}
+\end{displaymath}
+Moreover the statement is translated to $\Colour(r_1)\leftarrow\Colour(r_2)+\Colour(r_3)$, and we take the $T'\leftarrow^+T$ such that $T'(\Colour(r_1))=T(\Colour(r_2))+T(\Colour(r_3))$ and $T'(\Colour^+(x))=T(\Colour^+(x))$ for all $x$ with $\Colour^+(x)\neq\Colour(r_1)$.
Take any $x\in\Livebefore(\ell(S'))\subseteq \Liveafter(\ell(S))$ (by property~\eqref{eq:livefixpoint}).
If $\Colour^+(x)=\Colour(r_1)$, we have by property~\eqref{eq:colourprop}
that $x\sim r_1\at \ell(S)$ (as both $r_1,x\in\Liveafter(\ell(S))$, so that
$$T'(\Colour^+(x))=T(\Colour(r_2))+T(\Colour(r_3))\stackrel 1=S(r_2)+S(r_3)=S'(r_1)\stackrel 2=S(x)$$
+If $\Colour^+(x)=\Colour(r_1)$, we have by property~\eqref{eq:colourprop} that $x\sim r_1\at \ell(S)$ (as both $r_1,x\in\Liveafter(\ell(S))$, so that
+\begin{displaymath}
+T'(\Colour^+(x))=T(\Colour(r_2))+T(\Colour(r_3))\stackrel 1=S(r_2)+S(r_3)=S'(r_1)\stackrel 2=S(x)
+\end{displaymath}
where
\begin{enumerate}
 \item is by two uses of inductive hypothesis, as $r_2,r_3\in\Livebefore(\ell(S))$,
 \item is by property~\eqref{eq:similprop}\footnote{Notice that in our particular implementation
for this case of binary op $x\sim r_1\at\ell(S)$ implies $x=r_1$. But nothing prevents us from
employing more fine euristics for similarity.}.
+\item
+follows from two uses of the inductive hypothesis, as $r_2,r_3\in\Livebefore(\ell(S))$,
+\item
+follows from property~\eqref{eq:similprop}\footnote{Notice that in our particular implementation for this case of binary op $x\sim r_1\at\ell(S)$ implies $x=r_1$.
+However, nothing prevents us from employing more finegrained heuristics for similarity.}.
\end{enumerate}
If $\Colour^+(x)\neq\Colour(r_1)$ (so in particular $x\neq r_1$), then $x\in\Livebefore(\ell(S))$,
so by inductive hypothesis we have
$$T'(\Colour^+(x))=T(\Colour^+(x))=S(x)=S'(x).$$
+If $\Colour^+(x)\neq\Colour(r_1)$ (so in particular $x\neq r_1$), then $x\in\Livebefore(\ell(S))$, so by inductive hypothesis we have
+\begin{displaymath}
+T'(\Colour^+(x))=T(\Colour^+(x))=S(x)=S'(x)
+\end{displaymath}
\subsection{The LTL to LIN translation}
\label{subsect.ltl.lin.translation}
Ad detailed elsewhere in the reports, due to the parameterized representation of
the backend languages, the pass described here is actually much more generic
than the translation from LTL to LIN. It consists in a linearization pass
that maps any graphbased backend language to its corresponding linear form,
preserving its semantics. In the rest of the section, however, we will keep
the names LTL and LIN for the two partial instantiations of the parameterized
language.
+As detailed elsewhere in the reports, due to the parameterised representation of the backend languages, the pass described here is actually much more generic than the translation from LTL to LIN.
+It consists in a linearisation pass that maps any graphbased backend language to its corresponding linear form, preserving its semantics.
+In the rest of the section, however, we will keep the names LTL and LIN for the two partial instantiations of the parameterized language for convenience.
We require a map, $\sigma$, from LTL statuses, where program counters are represented as labels in a graph data structure, to LIN statuses, where program counters are natural numbers:
@@ 1106,7 +1069,6 @@
\end{enumerate}
Note, because the LTL to LIN transformation is the first time the code of
a function is linearised in the backend, we must discover a notion of `well formed function code' suitable for linearised forms.
In particular, we see the notion of well formedness (yet to be formally defined) resting on the following conditions:
+Note, because the LTL to LIN transformation is the first time the code of a function is linearised in the backend, we must discover a notion of `wellformed function code' suitable for linearised forms.
+In particular, we see the notion of wellformedness (yet to be formally defined) resting on the following conditions:
\begin{enumerate}
@@ 1116,6 +1078,5 @@
Each label is unique, appearing only once in the function code,
\item
The final instruction of a function code must be a return or an unconditional
jump.
+The final instruction of a function code must be a return or an unconditional jump.
\end{enumerate}
@@ 1132,149 +1093,98 @@
The ASM to MCS51 machine code translation step, and the required statements of correctness, are found in an unpublished manuscript attached to this document.
This is the most complex translation because of the huge number of cases
to be addressed and because of the complexity of the two semantics.
Moreover, in the assembly code we have conditional and unconditional jumps
to arbitrary locations in the code, which are not supported by the MCS51
instruction set. The latter has several kind of jumps characterized by a
different instruction size and execution time, but limited in range. For
instance, conditional jumps to locations whose destination is more than
$2^7$ bytes away from the jump instruction location are not supported at
all and need to be emulated with a code transformation. The problem, which
is known in the litterature as branch displacement and that applies also
to modern architectures, is known to be hard and is often NP. As far as we
know, we will provide the first formally verified proof of correctness for
an assembler that implements branch displacement. We are also providing
the first verified proof of correctness of a mildly optimizing branch
displacement algorithm that, at the moment, is almost finished, but not
described in the companion paper. This proof by itself took about 6 men
months.
+This is the most complex translation because of the huge number of cases to be addressed and because of the complexity of the two semantics.
+Moreover, in the assembly code we have conditional and unconditional jumps to arbitrary locations in the code, which are not supported by the MCS51 instruction set.
+The latter has several kind of jumps characterized by a different instruction size and execution time, but limited in range.
+For instance, conditional jumps to locations whose destination is more than $2^7$ bytes away from the jump instruction location are not supported at all and need to be emulated with a code transformation.
+This problem, which is known in the literature as branch displacement and is a universal problem for all architectures of microcontroller, is known to be computationally hard, often lying inside NP, depending on the exact characteristics of the target architecture.
+As far as we know, we will provide the first formally verified proof of correctness for an assembler that implements branch displacement.
+We are also providing the first verified proof of correctness of a mildly optimising branch displacement algorithm that, at the moment, is almost finished, but not described in the companion paper.
+This proof, in isolation, took around 6 man months.
\section{Correctness of cost prediction}
Roughly speaking,
the proof of correctness of cost prediction shows that the cost of executing
a labelled object code program is the same as the sum over all labels in the
program execution trace of the cost statically associated to the label and
computed on the object code itself.

In presence of object level function calls, the previous statement is, however,
incorrect. The reason is twofold. First of all, a function call may diverge.
To the last labels that comes before the call, however, we also associate
the cost of the instructions that follow the call. Therefore, in the
sum over all labels, when we meet a label we prepay for the instructions
after function calls, assuming all calls to be terminating. This choice is
driven by considerations on the source code. Functions can be called also
inside expressions and it would be too disruptive to put labels inside
expressions to capture the cost of instructions that follow a call. Moreover,
adding a label after each call would produce a much higher number of proof
obligations in the certification of source programs using FramaC. The
proof obligations, moreover, would be guarded by termination of all functions
involved, that also generates lots of additional complex proof obligations
that have little to do with execution costs. With our approach, instead, we
put less burden on the user, at the price of proving a weaker statement:
the estimated and actual costs will be the same if and only if the high level
program is converging. For prefixes of diverging programs we can provide
a similar result where the equality is replaced by an inequality (loss of
precision).

Assuming totality of functions is however not sufficient yet at the object
level. Even if a function returns, there is no guarantee that it will transfer
control back to the calling point. For instance, the function could have
manipulated the return address from its stack frame. Moreover, an object level
program can forge any address and transfer control to it, with no guarantee
on the execution behaviour and labelling properties of the called program.

To solve the problem, we introduced the notion of \emph{structured trace}
that come in two flavours: structured traces for total programs (an inductive
type) and structured traces for diverging programs (a coinductive type based
on the previous one). Roughly speaking, a structured trace represents the
execution of a well behaved program that is subject to several constraints
like
+
+Roughly speaking, the proof of correctness of cost prediction shows that the cost of executing a labelled objectcode program is the same as the summation over all labels in the program execution trace of the cost statically associated to the label and computed on the object code itself.
+
+In the presence of objectlevel function calls, the previous statement is, however, incorrect.
+The reason is twofold.
+First of all, a function call may diverge.
+However, to the labels that appears just before a call, we also associate the cost of the instructions that follow the call.
+Therefore, in the summation over all labels, when we meet a label we prepay for the instructions that appear after function calls, assuming all calls to be terminating.
+
+This choice is driven by several considerations on the source code.
+Namely, functions can be called inside expressions, and it would be too disruptive to put labels inside expressions to capture the cost of instructions that follow a call.
+Moreover, adding a label after each call would produce a much higher number of proof obligations in the certification of source programs using FramaC.
+The proof obligations, further, would be guarded by a requirement to demonstrate the termination of all functions involved, that also generates lots of additional complex proof obligations that have little to do with execution costs.
+
+With our approach, instead, we put less burden on the user, at the price of proving a weaker statement: the estimated and actual costs will be the same if and only if the highlevel program is converging.
+For prefixes of diverging programs we can provide a similar result where the equality is replaced by an inequality (loss of precision).
+
+Assuming totality of functions is however not a sufficiently strong condition at the objectlevel.
+Even if a function returns, there is no guarantee that it will transfer control back to the calling point.
+For instance, the function could have manipulated the return address from its stack frame.
+Moreover, an objectlevel program can forge any address and transfer control to it, with no guarantees about the execution behaviour and labelling properties of the called program.
+
+To solve the problem, we introduced the notion of a \emph{structured trace} that come in two flavours: structured traces for total programs (an inductive type) and structured traces for diverging programs (a coinductive type based on the previous one).
+Roughly speaking, a structured trace represents the execution of a well behaved program that is subject to several constraints, such as:
\begin{enumerate}
 \item All function calls return control just after the calling point
 \item The execution of all function bodies start with a label and end with
 a RET (even the ones reached by invoking a function pointer)
 \item All instructions are covered by a label (required by correctness of
 the labelling approach)
 \item The target of all conditional jumps must be labelled (a sufficient
 but not necessary condition for precision of the labelling approach)
 \item \label{prop5} Two structured traces with the same structure yield the same
 cost traces.
+\item
+All function calls return control just after the calling point,
+\item
+The execution of all function bodies start with a label and end with a \texttt{RET} instruction (even those reached by invoking a function pointer),
+\item
+All instructions are covered by a label (required by the correctness of the labelling approach),
+\item
+The target of all conditional jumps must be labelled (a sufficient but not necessary condition for precision of the labelling approach)
+\item
+\label{prop5}
+Two structured traces with the same structure yield the same cost traces.
\end{enumerate}
Correctness of cost predictions is proved only for structured execution traces,
i.e. well behaved programs. The forward simulation proof for all backend
passes will actually be a proof of preservation of the structure of
the structured traces that, because of property \ref{prop5}, will imply
correctness of the cost prediction for the backend. The Clight to RTLabs
will also include a proof that associates to each converging execution its
converging structured trace and to each diverging execution its diverging
structured trace.

There are also other two issues that invalidate the naive statement of
correctness of cost prediciton given above. The algorithm that statically
computes the cost of blocks is correct only when the object code is \emph{well
formed} and the program counter is \emph{reachable}.
A well formed object code is such that
the program counter will never overflow after the execution step of
the processor. An overflow that occurs during fetching but is overwritten
during execution is, however, correct and necessary to accept correct
programs that are as large as the program memory. Temporary overflows add
complications to the proof. A reachable address is an address that can be
obtained by fetching (not executing!) a finite number of times from the
beginning of the code memory without ever overflowing. The complication is that
the static prediction traverses the code memory assuming that the memory will
be read sequentially from the beginning and that all jumps jump only to
reachable addresses. When this property is violated, the way the code memory
is interpreted is uncorrect and the cost computed is totally meaningless.
The reachability relation is closed by fetching for well formed programs.
The property that calls to function pointers only target reachable and
well labelled locations, however, is not statically predictable and it is
enforced in the structured trace.

The proof of correctness of cost predictions has been quite complex. Setting
up the good invariants (structured traces, well formed programs, reachability)
and completing the proof has required more than 3 men months while the initally
estimated effort was much lower. In the paperandpencil proof for IMP, the
corresponding proof was obvious and only took two lines.

The proof itself is quite involved. We
basically need to show as an important lemma that the sum of the execution
costs over a structured trace, where the costs are summed in execution order,
is equivalent to the sum of the execution costs in the order of prepayment.
The two orders are quite different and the proof is by mutual recursion over
the definition of the converging structured traces, which is a family of three
mutual inductive types. The fact that this property only holds for converging
function calls in hidden in the definition of the structured traces.
Then we need to show that the order of prepayment
corresponds to the order induced by the cost traces extracted from the
structured trace. Finally, we need to show that the statically computed cost
for one block corresponds to the cost dinamically computed in prepayment
order.
+Correctness of cost predictions is only proved for structured execution traces, i.e. well behaved programs.
+The forward simulation proof for all backend passes will actually be a proof of preservation of the structure of the structured traces that, because of property \ref{prop5}, will imply correctness of the cost prediction for the backend.
+The Clight to RTLabs correctness proof will also include a proof that associates to each converging execution its converging structured trace and to each diverging execution its diverging structured trace.
+
+There are also two more issues that invalidate the na\"ive statement of correctness for cost prediciton given above.
+The algorithm that statically computes the costs of blocks is correct only when the object code is \emph{wellformed} and the program counter is \emph{reachable}.
+A wellformed objectcode is such that the program counter will never overflow after any execution step of the processor.
+An overflow that occurs during fetching but is overwritten during execution is, however, correct and necessary to accept correct programs that are as large as the processor's code memory.
+Temporary overflows add complications to the proof.
+A reachable address is an address that can be obtained by fetching (\emph{not executing!}) a finite number of times from the beginning of the code memory without ever overflowing.
+The complication is that the static prediction traverses the code memory assuming that the memory will be read sequentially from the beginning and that all jumps jump only to reachable addresses.
+When this property is violated, the way the code memory is interpreted is incorrect and the cost computed is meaningless.
+The reachability relation is closed by fetching for wellformed programs.
+The property that states that function pointers only target reachable and welllabelled locations, however, is not statically predictable and it is therefore enforced in the structured trace.
+
+The proof of correctness for cost predictions, and even discovering the correct statement, has been quite complex.
+Setting up good invariants (i.e. structured traces, well formed programs, reachability) and completing the proof has required more than 3 man months, while the initally estimated effort was much lower.
+In the paperandpencil proof for IMP, the corresponding proof was `obvious' and only took two lines.
+
+The proof itself is quite involved.
+We must show, as an important lemma, that the sum of the execution costs over a structured trace, where the costs are summed in execution order, is equivalent to the sum of the execution costs in the order of prepayment.
+The two orders are quite different and the proof is by mutual recursion over the definition of the converging structured traces, which is a family of three mutual inductive types.
+The fact that this property only holds for converging function calls is hidden in the definition of the structured traces.
+Then we need to show that the order of prepayment corresponds to the order induced by the cost traces extracted from the structured trace.
+Finally, we need to show that the statically computed cost for one block corresponds to the cost dynamically computed in prepayment order.
\section{Overall results}
Functional correctness of the compiled code can be shown by composing
the simulations to show that the target behaviour matches the
behaviour of the source program, if the source program does not `go
wrong'. More precisely, we show that there is a forward simulation
between the source trace and a (flattened structured) trace of the
output, and conclude equivalence because the target's semantics are
in the form of an executable function, and hence
deterministic.

Combining this with the correctness of the assignment of costs to cost
labels at the ASM level for a structured trace, we can show that the
cost of executing any compiled function (including the main function)
is equal to the sum of all the values for cost labels encountered in
the \emph{source code's} trace of the function.
+Functional correctness of the compiled code can be shown by composing the simulations to show that the target behaviour matches the behaviour of the source program, if the source program does not `go wrong'.
+More precisely, we show that there is a forward simulation between the source trace and a (flattened structured) trace of the output, and conclude equivalence because the target's semantics are in the form of an executable function, and hence deterministic.
+
+Combining this with the correctness of the assignment of costs to cost labels at the ASM level for a structured trace, we can show that the cost of executing any compiled function (including the toplevel, main function of a C program) is equal to the sum of all the values for cost labels encountered in the \emph{source code's} trace of the function.
\section{Estimated effort}
Based on the rough analysis performed so far we can estimate the total
effort for the certification of the compiler. We obtain this estimation by
combining, for each pass: 1) the number of lines of code to be certified;
2) the ratio of number of lines of proof to number of lines of code from
the CompCert project~\cite{compcert} for the CompCert pass that is closest to
ours; 3) an estimation of the complexity of the pass according to the
analysis above. The result is shown in Table~\ref{table}.
+
+Based on a rough analysis performed so far we can estimate the total effort required for the certification of the whole compiler.
+We obtain this estimation by combining, for each pass:
+\begin{enumerate}
+\item
+The number of lines of code to be certified,
+\item
+The ratio of number of lines of proof to number of lines of code from the CompCert project~\cite{compcert} for the CompCert pass that is closest in spirit to our own,
+\item
+An estimation of the complexity of the pass according to the analysis above.
+\end{enumerate}
+The result is shown in Table~\ref{table}.
\begin{table}{h}
@@ 1302,115 +1212,63 @@
\end{table}
We provide now some additional informations on the methodology used in the
computation. The passes in Cerco and CompCert frontend closely match each
other. However, there is no clear correspondence between the two backends.
For instance, we enforce the calling convention immediately after instruction
selection, whereas in CompCert this is performed in a later phase. Or we
linearize the code at the very end, whereas CompCert performs linearization
as soon as possible. Therefore, the first part of the exercise has consisted
in shuffling and partitioning the CompCert code in order to assign to each
CerCo pass the CompCert code that performs the same transformation.

After this preliminary step, using the data given in~\cite{compcert} (which
are relative to an early version of CompCert) we computed the ratio between
men months and lines of code in CompCert for each CerCo pass. This is shown
in the third column of Table~\ref{table}. For those CerCo passes that
have no correspondence in CompCert (like the optimizing assembler) or where
we have insufficient data, we have used the average of the ratios computed
above.

The first column of the table shows the number of lines of code for each
pass in CerCo. The third column is obtained multiplying the first with the
CompCert ratio. It provides an estimate of the effort required (in men months)
if the complexity of the proofs for CerCo and Compcert would be the same.

The two proof styles, however, are on purpose completely different. Where
CompCert uses non executable semantics, describing the various semantics with
inductive types, we have preferred executable semantics. Therefore, CompCert
proofs by induction and inversion become proof by functional inversion,
performed using the Russel methodology (now called Program in Coq, but whose
behaviour differs from Matita's one). Moreover, CompCert code is written using
only types that belong to the HindleyMilner fragment, whereas we have
heavily exploited dependent types all over the code. The dependent type
discipline offers many advantages from the point of view of clarity of the
invariants involved and early detection of errors and it naturally combines
well with the Russel approach which is based on dependent types. However, it
is also well known to introduce technical problems all over the code, like
the need to explicitly prove type equalities to be able to manipulate
expressions in certain ways. In many situations, the difficulties encountered
with manipulating dependent types are better addressed by improving the Matita
system, according to the formalization driven system development. For this
reason, and assuming a pessimistic point of view on our performance, the
fourth columns presents the final estimation of the effort required, that also
takes in account the complexity of the proof suggested by the informal proofs
sketched in the previous section.
+We provide now some additional informations on the methodology used in the computation.
+The passes in Cerco and CompCert frontend closely match each other.
+However, there is no clear correspondence between the two backends.
+For instance, we enforce the calling convention immediately after instruction selection, whereas in CompCert this is performed in a later phase.
+Further, we linearise the code at the very end, whereas CompCert performs linearisation as soon as possible.
+Therefore, the first part of the estimation exercise has consisted of shuffling and partitioning the CompCert code in order to assign to each CerCo pass the CompCert code that performs a similar transformation.
+
+After this preliminary step, using the data given in~\cite{compcert} (which refers to an early version of CompCert) we computed the ratio between man months and lines of code in CompCert for each CerCo pass.
+This is shown in the third column of Table~\ref{table}.
+For those CerCo passes that have no correspondence in CompCert (like the optimising assembler) or where we have insufficient data, we have used the average of the ratios computed above.
+
+The first column of the table shows the number of lines of code for each pass in CerCo.
+The third column is obtained multiplying the first with the CompCert ratio.
+It provides an estimate of the effort required (in man months) if the complexity of the proofs for CerCo and CompCert would be the same.
+
+The two proof styles, however, are purposefully completely different.
+Where CompCert uses nonexecutable semantics, describing the various semantics of languages with inductive types, we have preferred executable semantics.
+Therefore, CompCert proofs by induction and inversion become proofs by functional inversion, performed using the Russell methodology (now called Program in Coq, but whose behaviour differs from Matita's one).
+Moreover, CompCert code is written using only types that belong to the HindleyMilner fragment, whereas we have heavily exploited dependent types throughout the codebase.
+The dependent type discipline offers many advantages, especially from the point of view of clarity of the invariants involved and early detection of errors.
+It also naturally combines well with the Russell approach which is based on dependent types.
+However, it is also known to introduce several technical problems, like the need to explicitly prove type equalities to be able to manipulate expressions in certain ways.
+In many situations, the difficulties encountered with manipulating dependent types are better addressed by improving the Matita system, according to the formalisation driven system development.
+For this reason, and assuming a pessimistic estimation of our performance, the fourth columns presents the final estimation of the effort required, that also takes into account the complexity of the proof suggested by the informal proofs sketched in the previous section.
\subsection{Contingency plan}
On the basis of the proof strategy sketched in this document and the
estimated effort, we can refine our contingency plan. In case we will end
the certification of the basic compiler in advance we will have the choice
of either proving loop optimizations and/or partial redundancy elimination
correct (both tasks that seem difficult to achieve in a short time) or
considering the MCS51 specific extensions introduced during the first period
and underused in the formalized prototype. Yet another possibility would be
to better study retargeting of the code and the commutation property between
different compiler passes. The latter study is easily enabled by our
approach where all backend languages are instances of the same parameterized
language.

In the case of a consistent delay in the certification of some
components, we will address first the passes that are more likely to have
undetected bugs and we will follow a topdown approach, axiomatizing
the properties of the data structured used in the compiler to focus more
on the algorithms. The rational is that data structures are easier then
algorithms to test using well known methodologies.
The effort table clearly shows that commond definitions
and data structures are 1/4th of the size of the code and require slightly
less than 1/3rd of the total effort. At least half of this effort really goes
into simple data structures (vectors, bounded and unbounded integers, tries
and maps) whose certification is not interesting and whose code could be
taken without reproving it from the library of any other theorem prover.
+
+On the basis of the proof strategy sketched in this document and the estimated effort, we can refine our contingency plan.
+In case we will end the certification of the basic compiler in advance we will have the choice of either proving loop optimisations and/or partial redundancy elimination correct (both tasks seem difficult to achieve in a short time) or considering the MCS51 specific extensions introduced during the first period and underused in the formalised prototype.
+Yet another possibility would be to better study retargeting of the code and the commutation property between different compiler passes.
+The latter study is easily enabled by our approach where all backend languages are instances of the same parameterized language.
+
+In the case of a consistent delay in the certification of some components, we will first address the passes that are more likely to have undetected bugs and we will follow a topdown approach, axiomatizing the properties of the data structures used in the compiler to focus more on the algorithms.
+The rationale is that data structures are easier than algorithms to test using wellknown methodologies.
+The effort table clearly shows that common definitions and data structures are one quarter of the size of the current codebase code and require slightly less than one third of the total effort.
+At least half of this effort really goes into simple data structures (vectors, bounded and unbounded integers, tries and maps) whose certification is not interesting and whose code could be taken without reproving it from the library of any other theorem prover.
\section{Conclusions}
The overall exercise, whose details have been only minimally reported here,
has been very useful. It has allowed to spot in an early moment some criticities
of the proof that have required major changes in the proof plan. It has also
shown that the last passes of the compilation (e.g. assembly) and cost
prediction on the object code are much more involved than more high level
passes.

The final estimation for the effort is surely affected by a low degree of
confidence. It is however sufficient to conclude that the effort required
is in line with the man power that was scheduled for the second half of the
second period and for the third period. Compared to the number of men months
declared in Annex I of the contract, we will need more men months. However,
both at UNIBO and UEDIN there have been major differences in hiring with
respect to the Annex. Therefore both sites have now an higher number of men
months available, with the tradeoff of a lower level of maturity of the
people employed.

The reviewers suggested that we use this estimation to compare two possible
scenarios: a) proceed as planned, porting all the CompCert proofs to Matita
or b) port D3.1 and D4.1 to Coq and reuse the CompCert proofs.
We remark here again that the backend of the two compilers, from the
memory model on, are sensibly different: we are not reproving correctness
of the same piece of code. Moreover, the proof techniques are different for
the frontend too. Switching to the CompCert formalization would imply
the abandon of the untrusted compiler, the abandon of the experiment with
a different proof technique, the abandon of the quest for an open source
proof, and the abandon of the codevelopment of the formalization and the
Matita proof assistant. In the Commitment Letter~\cite{letter} delivered
to the Officer in May we clarified our personal perspective on the project
goals and objectives. We do not redescribe here the point of view presented
in the letter that we can condense in ``we value diversity''.

Clearly, if the execise would have suggested the infeasability in terms of
effort of concluding the formalization or getting close to that, we would have
abandoned our path and embraced the reviewer's suggestion. However, we
have been comforted in the analysis we did in autumn and further progress done
during the winter does not show yet any major delay with respect to the
proof schedule. We are thus planning to continue the certification according
to the more detailed proof plan that came out from the exercise reported in
this manuscript.
+
+The overall exercise, whose details have only been sketched here, has been very useful.
+It has brought to light some errors in our proof that have required major changes in the proof plan.
+It has also shown that the last passes of the compilation (e.g. assembly) and cost prediction on the objectcode are much more involved than more highlevel passes.
+
+The final estimation for the effort required to complete the proof suffers from a low degree of confidence engendered in the numbers, due to the difficulty in relating our work, and compiler design, with that of CompCert.
+It is however sufficient to conclude that the effort required is in line with the man power that was scheduled for the second half of the second period and for the third period.
+Compared to the number of man months declared in Annex I of the contract, we will need more man months.
+However, at both UNIBO and UEDIN there have been major differences in hiring with respect to the Annex.
+Therefore both sites now have more manpower available, though with the associated tradeoff of a lower level of maturity for the people employed.
+
+Our reviewers suggested that we use this estimation to compare two possible scenarios: a) proceed as planned, porting all the CompCert proofs to Matita or b) port D3.1 and D4.1 to Coq and reuse the CompCert proofs.
+We remark here again that the backend of the two compilers, from the memory model on, are quite different: we are not reproving correctness of the same piece of code.
+Moreover, the proof techniques are different for the frontend too.
+Switching to the CompCert formalisation would imply the abandoning of the untrusted compiler, the abandoning of the experiment with a different proof technique, the abandoning of the quest for an opensource proof, and the abandoning of the codevelopment of the formalisation and the Matita proof assistant.
+In the Commitment Letter~\cite{letter}, delivered to the Officer in May, we clarified our personal perspectives on the project goals and objectives.
+We do not redescribe here the point of view presented in the letter, other than the condensed soundbite that ``we value diversity''.
+
+Clearly, if the execise would have suggested the infeasability in terms of effort of concluding the formalisation, or getting close to that, we would have abandoned our path and embraced the reviewer's suggestion.
+However, we have been comforted in the analysis that we did in Autumn and further progress completed during the winter does not yet show any major delay with respect to the proof schedule.
+We are thus planning to continue the certification according to the more detailed proof plan that came out from the exercise reported in this manuscript.
\begin{thebibliography}{2}