Timestamp:
Feb 28, 2012, 4:35:26 PM
Author:
mulligan
Message:

edited the report so that all the spelling is consistent (british english, naturally), fixed italianisms, typos, etc. etc.

File:
1 edited

Legend:

Unmodified
Added
Removed
  • Deliverables/D1.2/CompilerProofOutline/outline.tex

    r1788 r1790  
    7878\begin{LARGE}
    7979\bf
    80 Proof outline for the correctness of the CerCo compiler
     80Proof outline for the correctness of\\the CerCo compiler
    8181\end{LARGE}
    8282\end{center}
     
    9393\begin{large}
    9494Main Authors:\\
    95 B. Campbell, D. Mulligan, P. Tranquilli, C. Sacerdoti Coen
     95J. Boender, B. Campbell, D. Mulligan, P. Tranquilli, C. Sacerdoti Coen
    9696\end{large}
    9797\end{center}
     
    110110\label{sect.introduction}
    111111
    112 In the last project review of the CerCo project, the project reviewers
    113 recommended us to quickly outline a paper-and-pencil correctness proof
    114 for each of the stages of the CerCo compiler in order to allow for an
    115 estimation of the complexity and time required to complete the formalization
    116 of the proof. This has been possible starting from month 18 when we have
    117 completed the formalization in Matita of the datastructures and code of
    118 the compiler.
    119 
    120 In this document we provide a very high-level, pen-and-paper
    121 sketch of what we view as the best path to completing the correctness proof
    122 for the compiler. In particular, for every translation between two intermediate languages, in both the front- and back-ends, we identify the key translation steps, and identify some invariants that we view as being important for the correctness proof.  We sketch the overall correctness results, and also briefly describe the parts of the proof that have already
    123 been completed at the end of the First Period.
    124 
    125 In the last section we finally present an estimation of the effort required
    126 for the certification in Matita of the compiler and we draw conclusions.
     112In the last project review of the CerCo project, the project reviewers recommended that we briefly outline a pencil-and-paper correctness proof for each of the stages of the CerCo compiler in order to facilitate an estimation of the complexity and time required to complete the formalisation of the proof.
     113This has been possible starting from month eighteen, when we completed the formalisation in Matita of the data structures and code of the compiler.
     114
     115In this document we provide a very high-level, pencil-and-paper sketch of what we view as the best path to completing the correctness proof for the compiler.
     116In particular, for every translation between two intermediate languages, in both the front- and back-ends, we identify the key translation steps and the invariants that we view as being important for the correctness proof.
     117We sketch the overall correctness results, and also briefly describe the parts of the proof that have already been completed at the end of the First Period.
     118
     119Finally, in the last section we present an estimation of the effort required for the certification in Matita of the compiler and draw conclusions.
    127120
    128121\section{Front-end: Clight to RTLabs}
     
    139132meeting we intend to move this transformation to the back-end}\\
    140133\> $\downarrow$ \> cost labelling\\
    141 \> $\downarrow$ \> loop optimizations\footnote{\label{lab:opt2}To be ported from the untrusted compiler and certified only in case of early completion of the certification of the other passes.} (an endo-transformation)\\
     134\> $\downarrow$ \> loop optimisations\footnote{\label{lab:opt2}To be ported from the untrusted compiler and certified only in case of early completion of the certification of the other passes.} (an endo-transformation)\\
    142135\> $\downarrow$ \> partial redundancy elimination$^{\mbox{\scriptsize \ref{lab:opt2}}}$ (an endo-transformation)\\
    143136\> $\downarrow$ \> stack variable allocation and control structure
     
    154147
    155148Here, by `endo-transformation', we mean a mapping from a language back to itself:
    156 the loop optimization step maps the Clight language to itself.
     149the loop optimisation step maps the Clight language to itself.
    157150
    158151%Our overall statements of correctness with respect to costs will
     
    357350\> $\downarrow$ \quad \= \kill
    358351\textsf{RTLabs}\\
    359 \> $\downarrow$ \> copy propagation\footnote{\label{lab:opt}To be ported from the untrusted compiler and certified only in case of early completion of the certification of the other passes.} (an endo-transformation) \\
     352\> $\downarrow$ \> copy propagation\footnote{\label{lab:opt}To be ported from the untrusted compiler and certified only in the case of an early completion of the certification of the other passes.} (an endo-transformation) \\
    360353\> $\downarrow$ \> instruction selection\\
    361354\> $\downarrow$ \> change of memory models in compiler\\
     
    380373
    381374\subsection{Graph translations}
    382 RTLabs and most intermediate languages in the back-end have a graph
    383 representation:
    384 the code of each function is represented by a graph of instructions.
    385 The graph maps a set of labels (the names of the nodes) to the instruction
    386 stored at that label (the nodes of the graph).
    387 Instructions reference zero or more additional labels that are the immediate
    388 successors of the instruction: zero for return from functions; more than one
    389 for conditional jumps and calls; one in all other cases. The references
    390 from one instruction to its immediates are the arcs of the graph.
    391 
    392 Status of graph languages always have a program counter that holds a
    393 representation of a reference to the current instruction.
    394 
    395 A translation between two consecutive graph languages maps each instruction
    396 stored at location $l$ in the first graph and with immediate successors
    397 $\{l_1,\ldots,l_n\}$ to a subgraph of the output graph that has a single
    398 entry point at location $l$ and exit arcs to $\{l_1,\ldots,l_n\}$. Moreover,
    399 the labels of all non entry nodes in the subgraph are distinct from all the
    400 labels in the source graph.
    401 
    402 In order to simplify the translations and the relative proofs of forward
    403 simulation, after the release of D4.2 and D4.3, we have provided:
     375RTLabs and most intermediate languages in the back-end have a graph representation: the code of each function is represented by a graph of instructions.
     376The graph maps a set of labels (the names of the nodes) to the instruction stored at that label (the nodes of the graph).
     377Instructions reference zero or more additional labels that are the immediate successors of the instruction: zero for return from functions, more than one for conditional jumps and calls, one in all other cases.
     378The references from one instruction to its immediate successors are the arcs of the graph.
     379
     380The statuses of graph languages always contain a program counter that holds a representation of a reference to the current instruction.
     381
     382A translation between two consecutive graph languages maps each instruction stored at location $l$ in the first graph and with immediate successors $\{l_1,\ldots,l_n\}$ to a subgraph of the output graph that has a single entry point at location $l$ and exit arcs to $\{l_1,\ldots,l_n\}$.
     383Moreover, the labels of all non-entry nodes in the subgraph are distinct from all the labels in the source graph.
     384
     385In order to simplify the translations and the corresponding proofs of forward simulation, after the release of D4.2 and D4.3, we have provided:
    404386\begin{itemize}
    405  \item a new data type (called \texttt{blist}) that represents a
    406    sequence of instructions to be added to the output graph.
    407    The ``b'' in the name stands for binder, since a \texttt{blist} is
    408    either empty, an extension of a \texttt{blist} with an instruction
    409    at the front, or the generation of a fresh quantity followed by a
    410    \texttt{blist}. The latter feature is used, for instance, to generate
    411    fresh register names. The instructions in the list are unlabelled and
    412    all of them but the last one are also sequential, like in a linear program.
    413  \item a new iterator (called \texttt{b\_graph\_translate}) of type
     387\item
     388A new data type (called \texttt{blist}) that represents a sequence of instructions to be added to the output graph.
     389The ``b'' in the name stands for binder, since a \texttt{blist} is either empty, an extension of a \texttt{blist} with an instruction at the front, or the generation of a fresh quantity followed by a \texttt{blist}.
     390The latter feature is used, for instance, to generate fresh register names.
     391The instructions in the list are unlabelled and all but the last are also sequential, as in a linear program.
     392\item
     393A new iterator (called \texttt{b\_graph\_translate}) of type
    414394\begin{displaymath}
    415395\mathtt{b\_graph\_translate}: (\mathtt{label} \rightarrow \mathtt{blist})
    416396\rightarrow \mathtt{graph} \rightarrow \mathtt{graph}
    417397\end{displaymath}
    418    The iterator transform the input graph in the output graph by replacing
    419    each node with the graph that corresponds to the linear \texttt{blist}
    420    obtained by applying the function in input to the node label.
     398The iterator transforms the input graph into the output graph by replacing each node with the subgraph that corresponds to the linear \texttt{blist} obtained by applying the input function to the node's label (see the sketch after this list).
    421399\end{itemize}
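
To make these two ingredients concrete, the following OCaml sketch shows one possible shape for \texttt{blist} and the iterator. All names and representations here are hypothetical simplifications of the Matita definitions; in particular, each node is given a single successor, and \texttt{fresh} stands in for a proper name supply whose generated labels are guaranteed distinct from those of the source graph.
\begin{verbatim}
(* A sketch of blist and of the graph-translation iterator; all names
   and representations are hypothetical simplifications. *)
type label = string
type register = string

type 'i blist =
  | Nil                              (* the empty sequence *)
  | Cons of 'i * 'i blist            (* an instruction at the front *)
  | Fresh of (register -> 'i blist)  (* bind a freshly generated name *)

module LM = Map.Make (String)
type 'i graph = 'i LM.t              (* maps labels to instructions *)

let counter = ref 0
let fresh p = incr counter; Printf.sprintf "%s%d" p !counter

(* Unfold [bl] into [g]: the subgraph starts at [entry] and its last
   instruction exits to [exit].  [seq i next] attaches the successor
   label [next] to the unlabelled instruction [i]. *)
let rec add_blist seq entry exit bl g =
  match bl with
  | Nil -> g   (* a real implementation would link entry to exit *)
  | Cons (i, Nil) -> LM.add entry (seq i exit) g
  | Cons (i, rest) ->
      let next = fresh "l" in
      add_blist seq next exit rest (LM.add entry (seq i next) g)
  | Fresh f -> add_blist seq entry exit (f (fresh "r")) g

(* The iterator: every node at label [l], with successor [succ l],
   is replaced by the expansion of the blist [trans l]. *)
let b_graph_translate trans succ seq g =
  LM.fold (fun l _ out -> add_blist seq l (succ l) (trans l) out)
    g LM.empty
\end{verbatim}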
    422400
    423 Using the iterator above, the code can be written in such a way that
    424 the programmer does not see any distinction between writing a transformation
    425 on linear or graph languages.
    426 
    427 In order to prove simulations for translations obtained using the iterator,
    428 we will prove the following theorem:
     401Using the iterator above, the code can be written in such a way that the programmer does not see any distinction between writing a transformation on linear or graph languages.
     402
     403In order to prove simulations for translations obtained using the iterator, we will prove the following theorem:
    429404
    430405\begin{align*}
     
    434409\end{align*}
    435410
    436 Here \texttt{subgraph} is a computational predicate that given a \texttt{blist}
    437 $[i_1, \ldots, i_n]$, an entry label $l$, an exit label $l'$ and a graph $G$
    438 expands to the fact that fetching from $G$ at address $l$ one retrieves a node
    439 $i_1$ with a successor $l_1$ that, when fetched, yields a node $i_2$ with a
    440 successor $l_2$ such that \ldots. The successor of $i_n$ is $l'$.
    441 
    442 Proving a forward simulation diagram of the following kind using the theorem
    443 above is now as simple as doing the same using standard small step operational
    444 semantics over linear languages.
     411Here \texttt{subgraph} is a computational predicate that, given a \texttt{blist} $[i_1, \ldots, i_n]$, an entry label $l$, an exit label $l'$ and a graph $G$, expands to the fact that fetching from $G$ at address $l$ retrieves a node $i_1$ with successor $l_1$ which, when fetched, yields a node $i_2$ with successor $l_2$, and so on; the successor of $i_n$ is $l'$.
     412
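For illustration, \texttt{subgraph} admits a direct computational reading. The following OCaml sketch checks exactly the chain of fetches described above; the \texttt{fetch} function, assumed here to return an instruction together with its unique successor, is hypothetical (the real predicate handles multiple successors).
\begin{verbatim}
(* Sketch: check that the instructions [is] are laid out
   consecutively in graph [g] from entry label [l] to exit label
   [l'].  [fetch g l] returns the instruction at [l] and its
   successor label. *)
let rec subgraph fetch g is l l' =
  match is with
  | [] -> l = l'
  | i :: rest ->
      let (i', next) = fetch g l in
      i = i' && subgraph fetch g rest next l'
\end{verbatim}
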
     413Proving a forward simulation diagram of the following kind using the aforementioned theorem is now as straightforward as doing the same using standard small-step operational semantics over linear languages.
    445414
    446415\begin{align*}
     
    451420\end{align*}
    452421
    453 Because of the fact that graph translation preserves entry and exit labels of
    454 translated statements, the state translation function $\sigma$ will simply
    455 preserve the value of the program counter. The program code, which is
    456 part of the state, is translated using the iterator.
    457 
    458 The proof is then roughly the following. Let $l$ be the program counter of the
    459 input state $s$. We proceed by cases on the current instruction of $s$.
    460 Let $[i_1, \ldots, i_n]$ be the \texttt{blist} associated to $l$ and $s$
    461 by the translation function. The witness required for the existential
    462 statement is simply $n$. By applying the theorem above we know that the
    463 next $n$ instructions that will be fetched from $s\ \sigma$ will be
    464 $[i_1, \ldots, i_n]$ and it is now sufficient to prove that they simulate
    465 the original instruction.
     422Because graph translations preserve entry and exit labels of translated statements, the state translation function $\sigma$ will simply preserve the value of the program counter.
     423The program code, which is part of the state, is translated using the iterator.
     424
     425The proof is then roughly as follows.
     426Let $l$ be the program counter of the input state $s$.
     427We proceed by cases on the current instruction of $s$.
     428Let $[i_1, \ldots, i_n]$ be the \texttt{blist} associated to $l$ and $s$ by the translation function.
     429The witness required for the existential statement is simply $n$.
     430By applying the theorem above we know that the next $n$ instructions that will be fetched from $\sigma(s)$ will be $[i_1, \ldots, i_n]$ and it is now sufficient to prove that they simulate the original instruction.
    466431
    467432\subsection{The RTLabs to RTL translation}
     
    479444\end{displaymath}
    480445
    481 In the front-end, we have both integer and float values, where integer values are `sized', along with null values and pointers. Some frontenv values are
    482 representables in a byte, but some others require more bits.
     446In the front-end, we have both integer and float values, where integer values are `sized', along with null values and pointers.
     447Some front-end values are representable in a single byte, whilst others require more bits.
    483448
    484449In the back-end model all values are meant to be represented in a single byte.
    485 Values can thefore be undefined, be one byte long integers or be indexed
    486 fragments of a pointer, null or not. Floats values are no longer present, as floating point arithmetic is not supported by the CerCo compiler.
    487 
    488 The $\sigma$ map implements a one-to-many relation: a single front-end value
    489 is mapped to a sequence of back-end values when its size is more then one byte.
    490 
    491 We further require a map, $\sigma$, which maps the front-end \texttt{Memory} and the back-end's notion of \texttt{BEMemory}. Both kinds of memory can be
    492 thought as an instance of a generic \texttt{Mem} data type parameterized over
    493 the kind of values stored in memory.
     450Values can therefore be undefined, one-byte integers, or indexed fragments of a pointer, null or not.
     451Float values are no longer present, as floating-point arithmetic is not supported by the CerCo compiler.
     452
     453The $\sigma$ map implements a one-to-many relation: a single front-end value is mapped to a sequence of back-end values when its size is more than one byte.
     454
     455We further require a map, $\sigma$, which relates the front-end's notion of \texttt{Memory} to the back-end's notion of \texttt{BEMemory}.
     456Both kinds of memory can be thought of as instances of a generic \texttt{Mem} data type, parameterised over the kind of values stored in memory:
    494457
    495458\begin{displaymath}
    496459\mathtt{Mem}\ \alpha = \mathtt{Block} \rightarrow (\mathbb{Z} \rightarrow \alpha)
    497460\end{displaymath}
    498 
    499461Here, \texttt{Block} consists of a \texttt{Region} paired with an identifier.
    500462
     
    502464\mathtt{Block} ::= \mathtt{Region} \times \mathtt{ID}
    503465\end{displaymath}
    504 
    505466We now have what we need to define what is meant by `memory' in the back-end memory model.
    506467Namely, we instantiate the previously defined \texttt{Mem} type with the type of back-end memory values.
     
    509470\mathtt{BEMem} = \mathtt{Mem}~\mathtt{BEValue}
    510471\end{displaymath}
    511 
    512472Memory addresses consist of a pair of back-end memory values:
    513473
     
    515475\mathtt{Address} = \mathtt{BEValue} \times  \mathtt{BEValue} \\
    516476\end{displaymath}
    517 
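Collecting the above, a minimal OCaml rendering of these types might read as follows. The region names and value constructors are assumptions made for illustration, and $\mathbb{Z}$ is approximated by \texttt{int}.
\begin{verbatim}
(* Sketch of the generic memory model and its back-end instance. *)
type region = Code | Data | XData        (* assumed region names *)
type block  = region * int               (* Block ::= Region x ID *)
type 'v mem = block -> int -> 'v         (* Mem alpha *)

type bevalue =
  | BVundef                              (* an undefined byte *)
  | BVbyte of int                        (* a one-byte integer *)
  | BVptr  of block * int                (* the n-th byte of a pointer *)
  | BVnull of int                        (* the n-th byte of null *)

type bemem   = bevalue mem               (* BEMem = Mem BEValue *)
type address = bevalue * bevalue         (* a pair of back-end values *)
\end{verbatim}
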
    518477The back- and front-end memory models differ in how they represent sized integer values in memory.
    519478In particular, the front-end stores integer values as a header, with size information, followed by a string of `continuation' blocks, marking out the full representation of the value in memory.
     
    555514\mathtt{load}^*\ \sigma(s)\ \sigma(a)\ (\mathtt{store}\ \sigma(a')\ \sigma(v)\ \sigma(M)) = \mathtt{load}^*\ \sigma(s)\ \sigma(a)\ \sigma(M)
    556515\end{displaymath}
    557 That is, suppose we store a transformed value $\mathtt{\sigma(v)}$ into a back-end memory $\mathtt{\sigma(M)}$ at address $\mathtt{\sigma(a')}$, using \texttt{store}, and then load from the address $\sigma(a)$. Even if $a$ and $a'$ are
    558 distinct by hypothesis, there is a priori no guarantee that the consecutive
    559 bytes for the value stored at $\sigma(a)$ are disjoint from those for the
    560 values stored at $\sigma(a')$. The fact that this holds is a non-trivial
    561 property of $\sigma$ to be proved.
     516That is, suppose we store a transformed value $\mathtt{\sigma(v)}$ into a back-end memory $\mathtt{\sigma(M)}$ at address $\mathtt{\sigma(a')}$, using \texttt{store}, and then load from the address $\sigma(a)$. Even if $a$ and $a'$ are distinct by hypothesis, there is \emph{a priori} no guarantee that the consecutive bytes for the value stored at $\sigma(a)$ are disjoint from those for the values stored at $\sigma(a')$.
     517The fact that this holds is a non-trivial property of $\sigma$ that must be explicitly proved.
    562518
    563519\subsubsection*{Translation of RTLabs states}
     520
    564521RTLabs states come in three flavours:
    565522\begin{displaymath}
     
    570527\end{array}
    571528\end{displaymath}
    572 \texttt{State} is the default state in which RTLabs programs are almost always in.
     529\texttt{State} is the default state, in which RTLabs programs spend most of their execution.
    573530The \texttt{Call} state is only entered when a call instruction is being executed, and then we immediately return to being in \texttt{State}.
    574531Similarly, \texttt{Return} is only entered when a return instruction is being executed, before returning immediately to \texttt{State}.
     
    577534
    578535RTL states differ from their RTLabs counterparts in that they include a program counter \texttt{PC}, a stack pointer \texttt{SP}, an internal stack pointer \texttt{ISP}, a carry flag \texttt{CARRY} and a set of registers \texttt{REGS}:
     536
    579537\begin{displaymath}
    580538\mathtt{State} ::= \mathtt{Frame}^* \times \mathtt{PC} \times \mathtt{SP} \times \mathtt{ISP} \times \mathtt{CARRY} \times \mathtt{REGS}
     
    584542As a result, we have two stack pointers in our state: \texttt{ISP}, which is the real, hardware stack, and \texttt{SP}, which is the stack pointer of the emulated stack in memory.
    585543The emulated stack is used for pushing and popping stack frames when calling or returning from function calls; however, this is done using the hardware stack, indexed by \texttt{ISP}, as an intermediary.
    586 Instructions like \texttt{LCALL} and \texttt{ACALL} are hardwired by the processor's design to push the return address on to the hardware stack. Therefore after a call has been made, and before a call returns, the compiler emits code to move the return address back and forth the two stacks. Parameters, return values
    587 and local variables are only present in the external stack.
     544Instructions like \texttt{LCALL} and \texttt{ACALL} are hardwired by the processor's design to push the return address on to the hardware stack.
     545Therefore after a call has been made, and before a call returns, the compiler emits code to move the return address back and forth between the two stacks.
     546Parameters, return values and local variables are only present in the external stack.
    588547As a result, for most of the execution of the processor, the hardware stack is empty, or contains a single item ready to be moved into external memory.
    589548
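As an illustration, the two notions of state can be rendered in OCaml as follows; all component types are simplified placeholders standing in for the corresponding Matita definitions.
\begin{verbatim}
(* Sketch of RTLabs and RTL states; component types are placeholders. *)
type ident = string and label = string and reg = string
type value = int                        (* front-end values, simplified *)
type frame = (reg * value) list         (* a register environment *)
type mem   = unit                       (* memory, elided *)

type rtlabs_state =
  | State  of frame list * frame * mem  (* the default running state *)
  | Call   of ident * value list * reg * label * frame list * mem
  | Return of value * reg * frame list * mem

type bevalue = int                      (* one-byte back-end values *)
type address = bevalue * bevalue

type rtl_state = {
  frames : frame list;
  pc     : address;  (* program counter *)
  sp     : address;  (* emulated stack pointer, in external memory *)
  isp    : address;  (* hardware (internal) stack pointer *)
  carry  : bevalue;  (* carry flag *)
  regs   : (reg * bevalue) list;
}
\end{verbatim}
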
    590549Once more, we require a relation $\sigma$ between RTLabs states and RTL states.
    591 Because $\sigma$ is one-to-many and, morally, a multi-function,
    592 we use in the following the functional notation for $\sigma$, using $\star$
    593 in the output of $\sigma$ to mean that any value is accepted.
     550Because $\sigma$ is one-to-many and, morally, a multivalued function, in the sequel we use functional notation for $\sigma$, with $\star$ as a distinct marker in the range of $\sigma$ meaning that any value is accepted.
    594551\begin{displaymath}
    595552\mathtt{State} \stackrel{\sigma}{\longrightarrow} \mathtt{State}
     
    601558\sigma(\mathtt{State} (\mathtt{Frame}^* \times \mathtt{\langle PC, REGS, SP \rangle})) \longrightarrow ((\sigma(\mathtt{Frame}^*), \sigma(\mathtt{PC}), \sigma(\mathtt{SP}), \star, \star, \sigma(\mathtt{REGS})), \sigma(\mathtt{Mem}))
    602559\end{displaymath}
    603 Translation then proceeds by translating the remaining stack frames, as well as the contents of the top stack frame. Any value for the internal stack pointer
    604 and the carry bit is admitted.
     560Translation then proceeds by translating the remaining stack frames, as well as the contents of the top stack frame.
     561An arbitrary value for the internal stack pointer and the carry bit is admitted.
    605562
    606563Translating \texttt{Call} and \texttt{Return} states is more involved, as a commutation between a single step of execution and the translation process must hold:
     
    614571
    615572Here \emph{return one step} and \emph{call one step} refer to a pair of commuting diagrams relating the one-step execution of a call and return state and the translation of both.
    616 We provide the one step commuting diagrams in Figure~\ref{fig.commuting.diagrams}. The fact that one execution step in the source language is not performed
    617 in the target language is not problematic for preservation of divergence
    618 because it is easy to show that every step from a \texttt{Call} or
    619 \texttt{Return} state is always preceeded/followed by one step that is always
    620 simulated.
     573We provide the one step commuting diagrams in Figure~\ref{fig.commuting.diagrams}.
     574The fact that one execution step in the source language is not performed in the target language is not problematic for preservation of divergence because it is easy to show that every step from a \texttt{Call} or \texttt{Return} state is always preceded or followed by one step that is always simulated.
    621575
    622576\begin{figure}
     
    641595
    642596\subsubsection*{The forward simulation proof}
    643 The forward simulation proof for all steps that do not involve function calls are lengthy, but routine.
     597
     598The forward simulation proofs for all steps that do not involve function calls are lengthy, but routine.
    644599They consist of simulating a front-end operation on front-end pseudo-registers and the front-end memory with sequences of back-end operations on the back-end pseudo-registers and back-end memory.
    645600The properties of $\sigma$ presented before that relate values and memories will need to be heavily exploited.
     
    688643
    689644
    690 Eventually, a Return instruction is faced,
    691 the return value is fetched from the registers map,
    692 the stack frame is deallocated
    693 and a Return state is entered:
     645Eventually, a \texttt{RET} instruction is encountered, the return value is fetched from the register map, the stack frame is deallocated and a \texttt{Return} state is entered:
    694646\begin{displaymath}
    695647\begin{array}{rcl}
    696 \mathtt{Return(id,\ args,\ dst,\ pc) \in State(Frame^*, Frame)} & \longrightarrow & \mathtt{free(sp)} \\
     648\mathtt{RET(id,\ args,\ dst,\ pc) \in State(Frame^*, Frame)} & \longrightarrow & \mathtt{free(sp)} \\
    697649   &                 & \mathtt{Return(M(ret\_val), dst, Frames)}
    698650\end{array}
    699651\end{displaymath}
    700652
    701 Then the return state restores the program counter by popping the stack, and execution proceeds in a new \texttt{State}, like the case for external functions:
     653Then the \texttt{Return} state restores the program counter by popping the stack, and execution proceeds in a new \texttt{State}, like the case for external functions:
    702654\begin{displaymath}
    703655\begin{array}{rcl}
     
    708660
    709661\subparagraph{The RTLabs to RTL translation for function calls}
     662
    710663\texttt{RET} instructions are translated to \texttt{RET} instructions:
    711664\begin{displaymath}
    712 \mathtt{Return} \longrightarrow \mathtt{Return}
    713 \end{displaymath}
    714 
    715 \texttt{Call} instructions are translated to \texttt{Call\_ID} instructions:
    716 \begin{displaymath}
    717 \mathtt{Call(id,\ args,\ dst,\ pc)} \longrightarrow \mathtt{CALL\_ID(id,\ \Sigma'(args),\ \Sigma(dst),\ pc)}
    718 \end{displaymath}
    719 Here $\Sigma$ is the map, computed by the compiler,
    720 that translate pseudo-registers holding front-end values to list of
    721 pseudo-registers holding the chunks for the front-end values.
     665\mathtt{RET} \longrightarrow \mathtt{RET}
     666\end{displaymath}
     667
     668\texttt{CALL} instructions are translated to \texttt{CALL\_ID} instructions:
     669\begin{displaymath}
     670\mathtt{CALL(id,\ args,\ dst,\ pc)} \longrightarrow \mathtt{CALL\_ID(id,\ \Sigma'(args),\ \Sigma(dst),\ pc)}
     671\end{displaymath}
     672Here $\Sigma$ is the map, computed by the compiler, that translates pseudo-registers holding front-end values to lists of pseudo-registers holding the chunks of those values.
    722673The specification for $\Sigma$ is that for every state $s$,
    723 $$\sigma(s(r)) = (\sigma(s))(\Sigma(r))$$
     674\begin{displaymath}
     675\sigma(s(r)) = (\sigma(s))(\Sigma(r))
     676\end{displaymath}
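
As a hypothetical illustration of this specification, a pseudo-register holding a 16-bit front-end value could be mapped by $\Sigma$ to two back-end pseudo-registers holding its low and high bytes. In the OCaml sketch below, all names (\texttt{sigma\_value}, \texttt{sigma\_reg}, the \texttt{\_lo}/\texttt{\_hi} suffixes) are illustrative assumptions, not the actual development.
\begin{verbatim}
(* Hypothetical illustration of the specification of Sigma for
   16-bit values: sigma splits a value into byte-sized chunks, and
   Sigma maps a front-end register to the registers holding them. *)
let sigma_value v = [ v land 0xff; (v lsr 8) land 0xff ]
let sigma_reg   r = [ r ^ "_lo"; r ^ "_hi" ]

(* sigma(s(r)) = (sigma(s))(Sigma(r)): reading the chunk registers
   in the translated state yields the chunks of the source value. *)
let spec_holds (s : string -> int) (s' : string -> int) r =
  sigma_value (s r) = List.map s' (sigma_reg r)
\end{verbatim}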
    724677
    725678\subparagraph{Function call/return in RTL}
     
    754707\mathtt{PUSH(current\_frame[pc := after\_return])}
    755708\end{displaymath}
    756 was executed. To summarize, up to the different numer of transitions required
    757 to do the job, the RTL code for internal function calls closely simulates
    758 the RTLabs code.
     709was executed.
     710To summarise, up to the different number of transitions required to do the job, the RTL code for internal function calls closely simulates the RTLabs code.
    759711
    760712The execution of \texttt{Return} in RTL is similarly straightforward, with the return address, stack pointer, and so on, being computed by popping off the top of the stack, and the return value computed by the function being retrieved from memory:
     
    768720To summarise, the forward simulation diagrams for function call/return
    769721XXX
    770 
    771722
    772723Translation and execution must satisfy a pair of commutation properties for the \texttt{Return} and \texttt{Call} cases.
     
    842793\subsection{The ERTL to LTL translation}
    843794\label{subsect.ertl.ltl.translation}
    844 During the ERTL to LTL translation pseudo-registers are stored in hardware
    845 registers or spilled on to the stack frame. The decision is based on liveness
    846 analysis of the ERTL code to determine what pair of pseudoregisters are live
    847 at the same time at a given location. A coloring algorithm is then used to
    848 choose where to store the pseudo-registers, allowing pseudo-registers that
    849 are never live at the same time to share the same location.
    850 
    851 We will not certify any coloring algorithm or control flow analysis.
    852 Instead, we axiomatically assume the existence of solutions to the
    853 coloring and liveness problems. In a later phase we plan to validate the
    854 solutions by writing and certifying the code of a validator.
    855 
    856 We describe the liveness analysis and colouring analysis first and then
    857 the ERTL to LTL translation.
    858 
    859 Throughout this section, we denote pseudoregisters with the type $\mathtt{register}$ and hardware ones with $\mathtt{hdwregister}$.
     795During the ERTL to LTL translation pseudo-registers are stored in hardware registers or spilled onto the stack frame.
     796The decision is based on a liveness analysis performed on the ERTL code to determine which pairs of pseudo-registers are live at the same time at a given location.
     797A colouring algorithm is then used to choose where to store the pseudo-registers, permitting pseudo-registers that are deemed never to be live at the same time to share the same location.
     798
     799We will not certify any colouring algorithm or control flow analysis.
     800Instead, we axiomatically assume the existence of `oracles' that implement the colouring and liveness analyses.
     801In a later phase we plan to validate the solutions by writing and certifying the code of a validator.
     802
     803We describe the liveness and colouring analyses first, and then the ERTL to LTL translation.
     804
     805Throughout this section, we denote pseudo-registers with the type $\mathtt{register}$ and hardware registers with $\mathtt{hdwregister}$.
    860806\subsubsection{Liveness analysis}
    861807\newcommand{\declsf}[1]{\expandafter\newcommand\expandafter{\csname #1\endcsname}{\mathop{\mathsf{#1}}\nolimits}}
     
    867813\declsf{StatementSem}
    868814
    869 For the liveness analysis, we aim at a map
     815For the liveness analysis, we aim to construct a map sending each
    870816$\ell \in \mathtt{label}$ to the set of registers live at $\ell$.
    871 We define the following operators on ERTL statements.
    872 $$
     817We define the following operators on ERTL statements:
     818
     819\begin{displaymath}
    873820\begin{array}{lL>{(ex. $}L<{)$}}
    874821\Defined(\ell) & registers defined at $\ell$ & \ell:r_1\leftarrow r_2+r_3 \mapsto \{r_1,C\}, \ell:\mathtt{CALL}~id\mapsto \text{caller-save}
     
    876823\Used(\ell) & registers used at $\ell$ & \ell:r_1\leftarrow r_2+r_3 \mapsto \{r_2,r_3\}, \ell:\mathtt{CALL}~id\mapsto \text{parameters}
    877824\end{array}
    878 $$
    879 Given $LA:\mathtt{label}\to\mathtt{lattice}$ (where $\mathtt{lattice}$
    880 is the type of sets of registers\footnote{More precisely, it is the lattice
    881 $\mathtt{set}(\mathtt{register})\times
    882 \mathtt{set}(\mathtt{hdwregister})$,
    883 with pointwise operations.}), we also have have the following
    884 predicates:
    885 $$
     825\end{displaymath}
     826Given $LA:\mathtt{label}\to\mathtt{lattice}$ (where $\mathtt{lattice}$ is the type of sets of registers\footnote{More precisely, it is the lattice $\mathtt{set}(\mathtt{register}) \times \mathtt{set}(\mathtt{hdwregister})$, with pointwise operations.}), we also have the following predicates:
     827\begin{displaymath}
    886828\begin{array}{lL}
    887829\Eliminable_{LA}(\ell) & iff executing $\ell$ has side-effects only on $r\notin LA(\ell)$
     
    896838  \end{cases}$
    897839\end{array}
    898 $$
    899 In particular, $\Livebefore$ has type $(\mathtt{label}\to\mathtt{lattice})\to
    900 \mathtt{label}\to\mathtt{lattice}$.
    901 
    902 The equation on which we build the fixpoint is then
    903 $$\Liveafter(\ell) \doteq \bigcup_{\ell <_1 \ell'} \Livebefore_{\Liveafter}(\ell')$$
    904 where $\ell <_1 \ell'$ denotes that $\ell'$ is an immediate successor of $\ell$
    905 in the graph. We do not require the fixpoint to be the least one, so the hypothesis
    906 on $\Liveafter$ that we require is
     840\end{displaymath}
     841In particular, $\Livebefore$ has type $(\mathtt{label}\to\mathtt{lattice})\to \mathtt{label}\to\mathtt{lattice}$.
     842
     843The equation upon which we build the fixpoint is then
     844\begin{displaymath}
     845\Liveafter(\ell) \doteq \bigcup_{\ell <_1 \ell'} \Livebefore_{\Liveafter}(\ell')
     846\end{displaymath}
     847where $\ell <_1 \ell'$ denotes that $\ell'$ is an immediate successor of $\ell$ in the graph.
     848We do not require the fixpoint to be the least one, so the hypothesis on $\Liveafter$ that we require is
    907849\begin{equation}
    908850\label{eq:livefixpoint}
    909851\Liveafter(\ell) \supseteq \bigcup_{\ell <_1 \ell'} \Livebefore(\ell')
    910852\end{equation}
    911 (for shortness we drop the subscript from $\Livebefore$).
     853(for brevity we drop the subscript from $\Livebefore$).
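
A minimal sketch of how such a fixpoint may be computed by Kleene iteration follows (in OCaml). For simplicity, registers are plain strings, the lattice is a single set rather than the pair of sets used in the real development, and eliminability is taken as a fixed predicate rather than one depending on $LA$.
\begin{verbatim}
(* Sketch: iterate the liveness equation until stable.  [succ l]
   lists the immediate successors of [l]; [defined], [used] and
   [eliminable] are the operators from the text. *)
module RS = Set.Make (String)

let live_before la defined used eliminable l =
  if eliminable l then la l
  else RS.union (used l) (RS.diff (la l) (defined l))

let liveness labels succ defined used eliminable =
  let la = Hashtbl.create 17 in
  let get l = try Hashtbl.find la l with Not_found -> RS.empty in
  let changed = ref true in
  while !changed do
    changed := false;
    List.iter (fun l ->
        let nv =
          List.fold_left (fun acc l' ->
              RS.union acc
                (live_before get defined used eliminable l'))
            (get l) (succ l) in
        if not (RS.equal nv (get l)) then
          (Hashtbl.replace la l nv; changed := true))
      labels
  done;
  get
\end{verbatim}
Since the sets only ever grow, the loop terminates over a finite register set, and the result satisfies hypothesis~\eqref{eq:livefixpoint} by construction.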
    912854
    913855\subsubsection{Colouring}
     856
    914857\declsf{Colour}
    915858\newcommand{\at}{\mathrel{@}}
    916 The aim of the liveness analysis is to define what properties we need
    917 of the colouring function, which is a map (computed separately for each
    918 internal function)
    919 $$\Colour:\mathtt{register}\to\mathtt{hdwregister}+\mathtt{nat}$$
    920 which identifies pseudoregisters with hardware ones if it is able to, otherwise
    921 it spills them to the stack. We will just state what property we need from such
    922 a map. First, we extend the definition to all types of registers by:
    923 $$\begin{aligned}
     859
     860The aim of the liveness analysis is to define the properties we require of the colouring function, which is a map (computed separately for each internal function)
     861\begin{displaymath}
     862\Colour:\mathtt{register}\to\mathtt{hdwregister}+\mathtt{nat}
     863\end{displaymath}
     864which identifies pseudoregisters with hardware ones if it is able to, otherwise it spills them to the stack.
     865We will just state what property we require from such a map.
     866First, we extend the definition to all types of registers by:
     867\begin{displaymath}
     868\begin{aligned}
    924869   \Colour^+:\mathtt{hdwregister}+\mathtt{register} &\to \mathtt{hdwregister}+\mathtt{nat}\\
    925870   r & \mapsto
     
    928873  r &\text{if $r\in\mathtt{hdwregister}$.}
    929874\end{cases}
    930   \end{aligned}$$
    931 The other piece of information we compute for each function is a \emph{similarity}
    932 relation, which is an equivalence relation on all kinds of registers which depends
    933 on the point of the program. We write
    934 $$x\sim y \at \ell$$
    935 to state that registers $x$ and $y$ are similar at $\ell$. The formal definition
    936 of this relation's property will be given next, but intuitively it means that those two registers \emph{must}
    937 have the same value after $\ell$. The analysis that produces this information can be
    938 coarse: in our case, we just set two different registers to be similar at $\ell$
    939 if at $\ell$ itself there is a move instruction between the two.
     875\end{aligned}
     876\end{displaymath}
     877The other piece of information we compute for each function is a \emph{similarity} relation: an equivalence relation on all kinds of registers that depends on the point of the program.
     878We write
     879\begin{displaymath}
     880x\sim y \at \ell
     881\end{displaymath}
     882to state that registers $x$ and $y$ are similar at $\ell$.
     883The formal definition of this relation's property will be given next, but intuitively it means that those two registers \emph{must} have the same value after $\ell$.
     884The analysis that produces this information can be coarse: in our case, we just set two different registers to be similar at $\ell$ if at $\ell$ itself there is a move instruction between the two.
    940885
    941886The property required of colouring is the following:
     
    947892
    948893\subsubsection{The translation}
     894
    949895The translation proceeds by cases on the statement to be translated. For example:
    950 $$\ell : r_1\leftarrow r_2+r_3 \mapsto \begin{cases}
     896\begin{displaymath}
     897\ell : r_1\leftarrow r_2+r_3 \mapsto \begin{cases}
    951898                                 \varepsilon & \text{if $\Eliminable(\ell)$},\\
    952899                                 \Colour(r_1) \leftarrow \Colour(r_2) + \Colour(r_3) & \text{otherwise}.
    953                                 \end{cases}$$
    954 where $\varepsilon$ is the empty block of instructions (i.e.\ a \texttt{GOTO}),
    955 and $\Colour(r_1) \leftarrow \Colour(r_2) + \Colour(r_3)$ is a notation for a
    956 block of instructions that take into account:
     900                                \end{cases}
     901\end{displaymath}
     902where $\varepsilon$ is the empty block of instructions (i.e.\ a \texttt{GOTO}), and $\Colour(r_1) \leftarrow \Colour(r_2) + \Colour(r_3)$ is a notation for a block of instructions that takes into account:
    957903\begin{itemize}
    958  \item load and store ops on the stack if any colouring is in fact a spilling;
    959  \item using the accumulator to store intermediate values.
     904\item
     905Load and store operations on the stack, if any colouring is in fact a spilling;
     906\item
     907Using the accumulator to store intermediate values.
    960908\end{itemize}
    961 The overall effect is that if $T$ is an LTL state with $\ell(T)=\ell$
    962 then we will have $T\to^+T'$ where $T'(\Colour(r_1))=T(\Colour(r_2))+T(\Colour(r_2))$,
    963 while $T'(y)=T(y)$ for any $y$ \emph{in the image of $\Colour$} different from
    964 $\Colour(r_1)$. Some hardware registers that are used for book-keeping and which
    965 are explicitly excluded from colouring may have different values.
     909The overall effect is that if $T$ is an LTL state with $\ell(T)=\ell$ then we will have $T\to^+T'$ where $T'(\Colour(r_1))=T(\Colour(r_2))+T(\Colour(r_3))$, while $T'(y)=T(y)$ for any $y$ \emph{in the image of $\Colour$} different from $\Colour(r_1)$.
     910Some hardware registers that are used for book-keeping and which are explicitly excluded from colouring may have different values.
    966911
    967912We skip the details of correctly dealing with the stack and its size.
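
A sketch of this single translation case in OCaml follows; the instruction constructors and the use of the accumulator \texttt{A} are illustrative assumptions, not the actual LTL syntax.
\begin{verbatim}
(* Sketch: translating l : r1 <- r2 + r3.  [colour r] yields either
   a hardware register or a stack slot; constructors are
   hypothetical. *)
type loc = Hdw of string | Stack of int

type ltl_instr =
  | Goto                 (* the empty block: fall through *)
  | LoadAcc  of loc      (* A := loc (a stack load if spilled) *)
  | AddAcc   of loc      (* A := A + loc *)
  | StoreAcc of loc      (* loc := A (a stack store if spilled) *)

let translate_add eliminable colour l r1 r2 r3 =
  if eliminable l then [ Goto ]
  else [ LoadAcc  (colour r2);    (* fetch the first operand *)
         AddAcc   (colour r3);    (* add the second *)
         StoreAcc (colour r1) ]   (* write back the result *)
\end{verbatim}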
     913
    968914\subsubsection{The relation between ERTL and LTL states}
    969 Given a state $S$ in ERTL, we abuse the notation by using $S$ as the underlying map
    970 $$S : \mathtt{hdwregister}+\mathtt{register} \to \mathtt{Value}.$$
    971 The program counter in $S$ is written as $\ell(S)$. At this point we can state
    972 the property asked from similarity:
     915
     916Given a state $S$ in ERTL, we abuse notation by using $S$ as the underlying map
     917\begin{displaymath}
     918S : \mathtt{hdwregister}+\mathtt{register} \to \mathtt{Value}
     919\end{displaymath}
     920We write $\ell(S)$ for the program counter in $S$.
     921At this point we can state the property required of similarity:
    973922\begin{equation}
    974923\label{eq:similprop}
     
    976925\end{equation}
    977926
    978 Next, we relate ERTL states with LTL ones. For a state $T$ in LTL we again
    979 abuse the notation using $T$ as a map
    980 $$T: \mathtt{hdwregister}+\mathtt{nat} \to \mathtt{Value}$$
    981 which maps hardware registers and \emph{local stack offsets} to values (in particular,
    982 $T$ as a map depends on the saved frames for computing the correct absolute
    983 stack values).
    984 
    985 The relation existing between the states at the two sides of this translation step,
    986 which depends on liveness and colouring, is
    987 then defined as
    988 $$S\mathrel\sigma T \iff \ldots \wedge \forall x. x\in \Livebefore(\ell(S))\Rightarrow T(\Colour^+(x)) = S(x).$$
    989 The ellipsis stands for other straightforward preservation, among which the properties
    990 $\ell(T) = \ell(S)$ and, inductively, the preservation of frames.
     927Next, we relate ERTL states with LTL ones.
     928For a state $T$ in LTL we again abuse notation by using $T$ as a map
     929\begin{displaymath}
     930T: \mathtt{hdwregister}+\mathtt{nat} \to \mathtt{Value}
     931\end{displaymath}
     932which maps hardware registers and \emph{local stack offsets} to values (in particular, $T$ as a map depends on the saved frames for computing the correct absolute stack values).
     933
     934The relation existing between the states at the two sides of this translation step, which depends on liveness and colouring, is then defined as
     935\begin{displaymath}
     936S\mathrel\sigma T \iff \ldots \wedge \forall x. x\in \Livebefore(\ell(S))\Rightarrow T(\Colour^+(x)) = S(x)
     937\end{displaymath}
     938The ellipsis stands for other straightforward preservation properties, among which are $\ell(T) = \ell(S)$ and, inductively, the preservation of frames.
    991939
    992940\subsubsection{Proof of preservation}
     941
    993942We will prove the following proposition:
    994 $$\forall S, T. S \mathrel\sigma T \Rightarrow S \to S' \Rightarrow \exists T'.T\to^+ T' \wedge S'\mathrel\sigma T'$$
    995 (with appropriate cost-labelled trace preservation which we omit). We will call $S\mathrel \sigma T$
    996 the inductive hypothsis, as it will be such in the complete proof by induction on the trace of the program.
    997 As usual, this step is done by cases
    998 on the statement at $\ell(S)$ and how it is translated. We carry out in some detail a single case, the one of
    999 a binary operation on registers.
     943\begin{displaymath}
     944\forall S, T. S \mathrel\sigma T \Rightarrow S \to S' \Rightarrow \exists T'.T\to^+ T' \wedge S'\mathrel\sigma T'
     945\end{displaymath}
     946(with appropriate cost-labelled trace preservation which we omit).
     947We will call $S\mathrel \sigma T$ the inductive hypothesis, as it will be such in the complete proof by induction on the trace of the program.
     948As usual, this step is done by cases on the statement at $\ell(S)$ and how it is translated.
     949We carry out the case of a binary operation on registers in some detail.
    1000950
    1001951Suppose that $\ell(S):r_1 \leftarrow r_2+r_3$, so that
    1002 $$S'(x)=\begin{cases}S(r_1)+S(r_2) &\text{if $x=r_1$,}\\S(x) &\text{otherwise.}\end{cases}$$
     952\begin{displaymath}
     953S'(x)=\begin{cases}S(r_2)+S(r_3) &\text{if $x=r_1$,}\\S(x) &\text{otherwise.}\end{cases}
     954\end{displaymath}
     955
    1003956\paragraph*{Case $\Eliminable(\ell(S))$.}
    1004 By definition we have $r_1\notin \Liveafter(\ell(S))$, and the translation
    1005 of the operation yields a \texttt{GOTO}. We take $T'$ the immediate successor
    1006 of $T$.
     957
     958By definition we have $r_1\notin \Liveafter(\ell(S))$, and the translation of the operation yields a \texttt{GOTO}.
     959We take $T'$ to be the immediate successor of $T$.
    1007960
    1008961Now in order to prove $S'\mathrel\sigma T'$, take any
    1009 $$x\in\Livebefore(\ell(S'))\subseteq \Liveafter(\ell(S)) = \Livebefore(\ell(S))$$
    1010 where we have used property~\eqref{eq:livefixpoint} and the definition
    1011 of $\Livebefore$ when $\Eliminable(\ell(S))$. We get the following chain of equalities:
    1012 $$T'(\Colour^+(x))\stackrel 1=T(\Colour^+(x))\stackrel 2=S(x) \stackrel 3= S'(x)$$
     962\begin{displaymath}
     963x\in\Livebefore(\ell(S'))\subseteq \Liveafter(\ell(S)) = \Livebefore(\ell(S))
     964\end{displaymath}
     965where we have used property~\eqref{eq:livefixpoint} and the definition of $\Livebefore$ when $\Eliminable(\ell(S))$.
     966We get the following chain of equalities:
     967
     968\begin{displaymath}
     969T'(\Colour^+(x))\stackrel 1=T(\Colour^+(x))\stackrel 2=S(x) \stackrel 3= S'(x)
     970\end{displaymath}
    1013971where
    1014972\begin{enumerate}
    1015  \item is because $T'$ has the same store as $T$,
    1016  \item is by inductive hypothesis as $x\in\Livebefore(\ell(S))$,
    1017  \item is because $x\neq r_1$, as $r_1\notin \Liveafter(\ell(S))\ni x$.
     973\item
     974follows as $T'$ has the same store as $T$,
     975\item
     976follows from the inductive hypothesis as $x\in\Livebefore(\ell(S))$,
     977\item
     978follows as $x\neq r_1$, as $r_1\notin \Liveafter(\ell(S))\ni x$.
    1018979\end{enumerate}
     980
    1019981\paragraph*{Case $\neg\Eliminable(\ell(S))$.}
     982
    1020983We then have $r_1\in\Liveafter(\ell(S))$, and
    1021 $$\Livebefore(\ell(S))=(\Liveafter(\ell(S))\setminus\{r_1\})\cup\{r_2,r_3\}.$$
    1022 Moreover the statement is translated to $\Colour(r_1)\leftarrow\Colour(r_2)+\Colour(r_3)$,
    1023 and we take the $T'\leftarrow^+T$ such that $T'(\Colour(r_1))=T(\Colour(r_2))+T(\Colour(r_3))$ and
    1024 $T'(\Colour^+(x))=T(\Colour^+(x))$ for all $x$ with $\Colour^+(x)\neq\Colour(r_1)$.
     984
     985\begin{displaymath}
     986\Livebefore(\ell(S))=(\Liveafter(\ell(S))\setminus\{r_1\})\cup\{r_2,r_3\}
     987\end{displaymath}
     988Moreover the statement is translated to $\Colour(r_1)\leftarrow\Colour(r_2)+\Colour(r_3)$, and we take the $T'\leftarrow^+T$ such that $T'(\Colour(r_1))=T(\Colour(r_2))+T(\Colour(r_3))$ and $T'(\Colour^+(x))=T(\Colour^+(x))$ for all $x$ with $\Colour^+(x)\neq\Colour(r_1)$.
    1025989
    1026990Take any $x\in\Livebefore(\ell(S'))\subseteq \Liveafter(\ell(S))$ (by property~\eqref{eq:livefixpoint}).
    1027991
    1028 If $\Colour^+(x)=\Colour(r_1)$, we have by property~\eqref{eq:colourprop}
    1029 that $x\sim r_1\at \ell(S)$ (as both $r_1,x\in\Liveafter(\ell(S))$, so that
    1030 $$T'(\Colour^+(x))=T(\Colour(r_2))+T(\Colour(r_3))\stackrel 1=S(r_2)+S(r_3)=S'(r_1)\stackrel 2=S(x)$$
     992If $\Colour^+(x)=\Colour(r_1)$, we have by property~\eqref{eq:colourprop} that $x\sim r_1\at \ell(S)$ (as both $r_1,x\in\Liveafter(\ell(S))$), so that
     993\begin{displaymath}
     994T'(\Colour^+(x))=T(\Colour(r_2))+T(\Colour(r_3))\stackrel 1=S(r_2)+S(r_3)=S'(r_1)\stackrel 2=S(x)
     995\end{displaymath}
    1031996where
    1032997\begin{enumerate}
    1033  \item is by two uses of inductive hypothesis, as $r_2,r_3\in\Livebefore(\ell(S))$,
    1034  \item is by property~\eqref{eq:similprop}\footnote{Notice that in our particular implementation
    1035 for this case of binary op $x\sim r_1\at\ell(S)$ implies $x=r_1$. But nothing prevents us from
    1036 employing more fine euristics for similarity.}.
     998\item
     999follows from two uses of the inductive hypothesis, as $r_2,r_3\in\Livebefore(\ell(S))$,
     1000\item
     1001follows from property~\eqref{eq:similprop}\footnote{Notice that in our particular implementation, for this case of a binary operation, $x\sim r_1\at\ell(S)$ implies $x=r_1$.
     1002However, nothing prevents us from employing more fine-grained heuristics for similarity.}.
    10371003\end{enumerate}
    10381004
    1039 If $\Colour^+(x)\neq\Colour(r_1)$ (so in particular $x\neq r_1$), then $x\in\Livebefore(\ell(S))$,
    1040 so by inductive hypothesis we have
    1041 $$T'(\Colour^+(x))=T(\Colour^+(x))=S(x)=S'(x).$$
     1005If $\Colour^+(x)\neq\Colour(r_1)$ (so in particular $x\neq r_1$), then $x\in\Livebefore(\ell(S))$, so by inductive hypothesis we have
     1006\begin{displaymath}
     1007T'(\Colour^+(x))=T(\Colour^+(x))=S(x)=S'(x)
     1008\end{displaymath}
    10421009
    10431010\subsection{The LTL to LIN translation}
    10441011\label{subsect.ltl.lin.translation}
    1045 Ad detailed elsewhere in the reports, due to the parameterized representation of
    1046 the back-end languages, the pass described here is actually much more generic
    1047 than the translation from LTL to LIN. It consists in a linearization pass
    1048 that maps any graph-based back-end language to its corresponding linear form,
    1049 preserving its semantics. In the rest of the section, however, we will keep
    1050 the names LTL and LIN for the two partial instantiations of the parameterized
    1051 language.
     1012As detailed elsewhere in the reports, due to the parameterised representation of the back-end languages, the pass described here is actually much more generic than the translation from LTL to LIN.
     1013It consists of a linearisation pass that maps any graph-based back-end language to its corresponding linear form, preserving its semantics.
     1014In the rest of the section, however, we will keep the names LTL and LIN for the two partial instantiations of the parameterised language, for convenience.
    10521015
    10531016We require a map, $\sigma$, from LTL statuses, where program counters are represented as labels in a graph data structure, to LIN statuses, where program counters are natural numbers:
     
    11061069\end{enumerate}
    11071070
    1108 Note, because the LTL to LIN transformation is the first time the code of
    1109 a function is linearised in the back-end, we must discover a notion of `well formed function code' suitable for linearised forms.
    1110 In particular, we see the notion of well formedness (yet to be formally defined) resting on the following conditions:
     1071Note that, because the LTL to LIN transformation is the first time the code of a function is linearised in the back-end, we must identify a notion of `well-formed function code' suitable for linearised forms.
     1072In particular, we see the notion of well-formedness (yet to be formally defined) resting on the following conditions:
    11111073
    11121074\begin{enumerate}
     
    11161078Each label is unique, appearing only once in the function code,
    11171079\item
    1118 The final instruction of a function code must be a return or an unconditional
    1119 jump.
     1080The final instruction of a function code must be a return or an unconditional jump.
    11201081\end{enumerate}
    11211082
     
    11321093
    11331094The ASM to MCS-51 machine code translation step, and the required statements of correctness, are found in an unpublished manuscript attached to this document.
    1134 This is the most complex translation because of the huge number of cases
    1135 to be addressed and because of the complexity of the two semantics.
    1136 Moreover, in the assembly code we have conditional and unconditional jumps
    1137 to arbitrary locations in the code, which are not supported by the MCS-51
    1138 instruction set. The latter has several kind of jumps characterized by a
    1139 different instruction size and execution time, but limited in range. For
    1140 instance, conditional jumps to locations whose destination is more than
    1141 $2^7$ bytes away from the jump instruction location are not supported at
    1142 all and need to be emulated with a code transformation. The problem, which
    1143 is known in the litterature as branch displacement and that applies also
    1144 to modern architectures, is known to be hard and is often NP. As far as we
    1145 know, we will provide the first formally verified proof of correctness for
    1146 an assembler that implements branch displacement. We are also providing
    1147 the first verified proof of correctness of a mildly optimizing branch
    1148 displacement algorithm that, at the moment, is almost finished, but not
    1149 described in the companion paper. This proof by itself took about 6 men
    1150 months.
     1095This is the most complex translation because of the huge number of cases to be addressed and because of the complexity of the two semantics.
     1096Moreover, in the assembly code we have conditional and unconditional jumps to arbitrary locations in the code, which are not supported by the MCS-51 instruction set.
     1097The latter has several kinds of jumps, characterised by different instruction sizes and execution times, but limited in range.
     1098For instance, conditional jumps to locations whose destination is more than $2^7$ bytes away from the jump instruction location are not supported at all and need to be emulated with a code transformation.
     1099This problem, which is known in the literature as branch displacement and which also applies to modern architectures, is known to be computationally hard, often NP-hard, depending on the exact characteristics of the target architecture.
     1100As far as we know, we will provide the first formally verified proof of correctness for an assembler that implements branch displacement.
     1101We are also providing the first verified proof of correctness of a mildly optimising branch displacement algorithm that, at the moment, is almost finished, but not described in the companion paper.
     1102This proof, in isolation, took around six man-months.
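
A minimal sketch of the fixed-point idea behind branch displacement follows (in OCaml). For simplicity every instruction is taken to be a jump, and the sizes and ranges are illustrative rather than those of the MCS-51.
\begin{verbatim}
(* Sketch: start with the shortest encoding for every jump and
   widen any jump whose displacement does not fit, until stable. *)
type width = Short | Long

let size = function Short -> 2 | Long -> 4
let fits w d = match w with
  | Short -> d >= -128 && d < 128   (* e.g. an 8-bit offset *)
  | Long  -> true                   (* reaches any address *)

(* [targets.(i)] is the index of the instruction that jump [i]
   points to. *)
let branch_displacement (targets : int array) =
  let n = Array.length targets in
  let widths = Array.make n Short in
  let addresses () =
    let a = Array.make (n + 1) 0 in
    for i = 0 to n - 1 do a.(i + 1) <- a.(i) + size widths.(i) done;
    a in
  let changed = ref true in
  while !changed do
    changed := false;
    let a = addresses () in
    for i = 0 to n - 1 do
      let d = a.(targets.(i)) - a.(i + 1) in   (* displacement *)
      if widths.(i) = Short && not (fits Short d) then
        (widths.(i) <- Long; changed := true)
    done
  done;
  widths
\end{verbatim}
Since widths only ever grow, the iteration terminates; the subtlety addressed by the verified algorithm is that widening one jump can push others out of range, and that a mildly optimising solution should avoid widening more than necessary.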
    11511103
    11521104\section{Correctness of cost prediction}
    1153 Roughly speaking,
    1154 the proof of correctness of cost prediction shows that the cost of executing
    1155 a labelled object code program is the same as the sum over all labels in the
    1156 program execution trace of the cost statically associated to the label and
    1157 computed on the object code itself.
    1158 
    1159 In presence of object level function calls, the previous statement is, however,
    1160 incorrect. The reason is twofold. First of all, a function call may diverge.
    1161 To the last labels that comes before the call, however, we also associate
    1162 the cost of the instructions that follow the call. Therefore, in the
    1163 sum over all labels, when we meet a label we pre-pay for the instructions
    1164 after function calls, assuming all calls to be terminating. This choice is
    1165 driven by considerations on the source code. Functions can be called also
    1166 inside expressions and it would be too disruptive to put labels inside
    1167 expressions to capture the cost of instructions that follow a call. Moreover,
    1168 adding a label after each call would produce a much higher number of proof
    1169 obligations in the certification of source programs using Frama-C. The
    1170 proof obligations, moreover, would be guarded by termination of all functions
    1171 involved, that also generates lots of additional complex proof obligations
    1172 that have little to do with execution costs. With our approach, instead, we
    1173 put less burden on the user, at the price of proving a weaker statement:
    1174 the estimated and actual costs will be the same if and only if the high level
    1175 program is converging. For prefixes of diverging programs we can provide
    1176 a similar result where the equality is replaced by an inequality (loss of
    1177 precision).
    1178 
    1179 Assuming totality of functions is however not sufficient yet at the object
    1180 level. Even if a function returns, there is no guarantee that it will transfer
    1181 control back to the calling point. For instance, the function could have
    1182 manipulated the return address from its stack frame. Moreover, an object level
    1183 program can forge any address and transfer control to it, with no guarantee
    1184 on the execution behaviour and labelling properties of the called program.
    1185 
    1186 To solve the problem, we introduced the notion of \emph{structured trace}
    1187 that come in two flavours: structured traces for total programs (an inductive
    1188 type) and structured traces for diverging programs (a co-inductive type based
    1189 on the previous one). Roughly speaking, a structured trace represents the
    1190 execution of a well behaved program that is subject to several constraints
    1191 like
     1105
     1106Roughly speaking, the proof of correctness of cost prediction shows that the cost of executing a labelled object-code program is the same as the summation over all labels in the program execution trace of the cost statically associated to the label and computed on the object code itself.
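
In symbols, and only schematically: if executing the program emits the labels $\ell_1, \ldots, \ell_n$ and $k(\ell)$ denotes the cost statically computed for label $\ell$ on the object code, then the claim is
\begin{displaymath}
\mathrm{cost}(\mathit{execution}) = \sum_{i=1}^{n} k(\ell_i).
\end{displaymath}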
     1107
     1108In the presence of object-level function calls, the previous statement is, however, incorrect.
     1109The reason is twofold.
     1110First of all, a function call may diverge.
     1111However, to the label that appears just before a call we also associate the cost of the instructions that follow the call.
     1112Therefore, in the summation over all labels, when we meet a label we pre-pay for the instructions that appear after function calls, assuming all calls to be terminating.
     1113
      1114This choice is driven by several considerations concerning the source code.
     1115Namely, functions can be called inside expressions, and it would be too disruptive to put labels inside expressions to capture the cost of instructions that follow a call.
     1116Moreover, adding a label after each call would produce a much higher number of proof obligations in the certification of source programs using Frama-C.
      1117The proof obligations, further, would be guarded by a requirement to demonstrate the termination of all functions involved, which in turn generates many additional, complex proof obligations that have little to do with execution costs.
     1118
      1119With our approach, instead, we put less burden on the user, at the price of proving a weaker statement: the estimated and actual costs will be the same if and only if the high-level program converges.
     1120For prefixes of diverging programs we can provide a similar result where the equality is replaced by an inequality (loss of precision).
     1121
      1122Assuming the totality of functions is, however, not a sufficiently strong condition at the object level.
     1123Even if a function returns, there is no guarantee that it will transfer control back to the calling point.
     1124For instance, the function could have manipulated the return address from its stack frame.
     1125Moreover, an object-level program can forge any address and transfer control to it, with no guarantees about the execution behaviour and labelling properties of the called program.
     1126
      1127To solve the problem, we introduced the notion of a \emph{structured trace}, which comes in two flavours: structured traces for total programs (an inductive type) and structured traces for diverging programs (a co-inductive type based on the previous one).
      1128Roughly speaking, a structured trace represents the execution of a well-behaved program that is subject to several constraints (a datatype sketch follows the list), such as:
    11921129\begin{enumerate}
    1193  \item All function calls return control just after the calling point
    1194  \item The execution of all function bodies start with a label and end with
    1195    a RET (even the ones reached by invoking a function pointer)
    1196  \item All instructions are covered by a label (required by correctness of
    1197    the labelling approach)
    1198  \item The target of all conditional jumps must be labelled (a sufficient
    1199    but not necessary condition for precision of the labelling approach)
    1200  \item \label{prop5} Two structured traces with the same structure yield the same
    1201    cost traces.
     1130\item
     1131All function calls return control just after the calling point,
     1132\item
      1133The execution of all function bodies starts with a label and ends with a \texttt{RET} instruction (even those reached by invoking a function pointer),
     1134\item
     1135All instructions are covered by a label (required by the correctness of the labelling approach),
     1136\item
      1137The target of each conditional jump must be labelled (a sufficient, but not necessary, condition for the precision of the labelling approach),
     1138\item
     1139\label{prop5}
     1140Two structured traces with the same structure yield the same cost traces.
    12021141\end{enumerate}
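To fix intuitions, the following OCaml sketch models the two flavours of structured trace in a drastically simplified form. The constructor names, and the use of OCaml's \texttt{Lazy.t} to stand in for co-induction, are illustrative choices of ours: the actual Matita development is a family of three mutually inductive types, indexed over program statuses, together with a co-inductive type built on top of them.
\begin{verbatim}
type label = string

(* The total flavour is inductive: every execution it describes is
   finite.  Calls carry the callee's trace followed by the trace that
   resumes just after the call site, capturing constraint 1 above. *)
type total_trace =
  | Ret                                    (* a body ends with RET         *)
  | Step of total_trace                    (* one unobservable instruction *)
  | Emit of label * total_trace            (* a cost label is emitted      *)
  | Call of total_trace * total_trace      (* terminating call, then rest  *)

(* The diverging flavour is co-inductive (here: lazy), built on top of
   the total one: finitely many terminating calls may be interleaved
   with an infinite stream of steps. *)
type diverging_trace = Div of diverging_step Lazy.t
and diverging_step =
  | DStep of diverging_trace               (* one more step, forever       *)
  | DEmit of label * diverging_trace       (* a label, then divergence     *)
  | DCall of total_trace * diverging_trace (* a call that returns          *)
  | DCallDiv of diverging_trace            (* a call that itself diverges  *)

(* The cost trace extracted from a total structured trace: the labels
   in emission order.  Constraint 5 above states that this sequence
   depends only on the structure of the trace. *)
let rec cost_trace : total_trace -> label list = function
  | Ret -> []
  | Step t -> cost_trace t
  | Emit (l, t) -> l :: cost_trace t
  | Call (body, rest) -> cost_trace body @ cost_trace rest
\end{verbatim}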
    12031142
    1204 Correctness of cost predictions is proved only for structured execution traces,
    1205 i.e. well behaved programs. The forward simulation proof for all back-end
    1206 passes will actually be a proof of preservation of the structure of
    1207 the structured traces that, because of property \ref{prop5}, will imply
    1208 correctness of the cost prediction for the back-end. The Clight to RTLabs
    1209 will also include a proof that associates to each converging execution its
    1210 converging structured trace and to each diverging execution its diverging
    1211 structured trace.
    1212 
    1213 There are also other two issues that invalidate the naive statement of
    1214 correctness of cost prediciton given above. The algorithm that statically
    1215 computes the cost of blocks is correct only when the object code is \emph{well
    1216 formed} and the program counter is \emph{reachable}.
    1217 A well formed object code is such that
    1218 the program counter will never overflow after the execution step of
    1219 the processor. An overflow that occurs during fetching but is overwritten
    1220 during execution is, however, correct and necessary to accept correct
    1221 programs that are as large as the program memory. Temporary overflows add
    1222 complications to the proof. A reachable address is an address that can be
    1223 obtained by fetching (not executing!) a finite number of times from the
    1224 beginning of the code memory without ever overflowing. The complication is that
    1225 the static prediction traverses the code memory assuming that the memory will
    1226 be read sequentially from the beginning and that all jumps jump only to
    1227 reachable addresses. When this property is violated, the way the code memory
    1228 is interpreted is uncorrect and the cost computed is totally meaningless.
    1229 The reachability relation is closed by fetching for well formed programs.
    1230 The property that calls to function pointers only target reachable and
    1231 well labelled locations, however, is not statically predictable and it is
    1232 enforced in the structured trace.
    1233 
    1234 The proof of correctness of cost predictions has been quite complex. Setting
    1235 up the good invariants (structured traces, well formed programs, reachability)
    1236 and completing the proof has required more than 3 men months while the initally
    1237 estimated effort was much lower. In the paper-and-pencil proof for IMP, the
    1238 corresponding proof was obvious and only took two lines.
    1239 
    1240 The proof itself is quite involved. We
    1241 basically need to show as an important lemma that the sum of the execution
    1242 costs over a structured trace, where the costs are summed in execution order,
    1243 is equivalent to the sum of the execution costs in the order of pre-payment.
    1244 The two orders are quite different and the proof is by mutual recursion over
    1245 the definition of the converging structured traces, which is a family of three
    1246 mutual inductive types. The fact that this property only holds for converging
    1247 function calls in hidden in the definition of the structured traces.
    1248 Then we need to show that the order of pre-payment
    1249 corresponds to the order induced by the cost traces extracted from the
    1250 structured trace. Finally, we need to show that the statically computed cost
    1251 for one block corresponds to the cost dinamically computed in pre-payment
    1252 order.
      1143Correctness of cost predictions is only proved for structured execution traces, i.e. well-behaved programs.
      1144The forward simulation proof for all back-end passes will actually be a proof of preservation of the structure of the structured traces; because of property \ref{prop5}, this will imply correctness of the cost prediction for the back-end.
     1145The Clight to RTLabs correctness proof will also include a proof that associates to each converging execution its converging structured trace and to each diverging execution its diverging structured trace.
     1146
      1147There are also two more issues that invalidate the na\"ive statement of correctness for cost prediction given above.
     1148The algorithm that statically computes the costs of blocks is correct only when the object code is \emph{well-formed} and the program counter is \emph{reachable}.
      1149A well-formed object-code program is one in which the program counter will never overflow after any execution step of the processor.
     1150An overflow that occurs during fetching but is overwritten during execution is, however, correct and necessary to accept correct programs that are as large as the processor's code memory.
     1151Temporary overflows add complications to the proof.
     1152A reachable address is an address that can be obtained by fetching (\emph{not executing!}) a finite number of times from the beginning of the code memory without ever overflowing.
      1153The complication is that the static prediction traverses the code memory assuming that the memory will be read sequentially from the beginning and that all jumps target only reachable addresses.
      1154When this property is violated, the way the code memory is interpreted is incorrect and the cost computed is meaningless.
      1155For well-formed programs, the reachability relation is closed under fetching.
      1156The property that function pointers only target reachable and well-labelled locations, however, is not statically predictable, and is therefore enforced in the structured trace.
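As an illustration, reachability-by-fetching can be phrased as in the following OCaml sketch. The representation of code memory by the size of the instruction decoded at each address, and the $2^{16}$ bound taken from the MCS-51 code memory, are simplifying assumptions of ours, not the Matita definitions.
\begin{verbatim}
let code_size = 0x10000   (* size of the MCS-51 code memory *)

(* [reachable sizes] marks the addresses obtained by fetching a finite
   number of times from address 0 without ever overflowing past the
   end of code memory; [sizes.(pc)] is the size in bytes of the
   instruction that a fetch would decode at address [pc]. *)
let reachable (sizes : int array) : bool array =
  let seen = Array.make code_size false in
  let rec fetch pc =
    if pc < code_size && not seen.(pc) then begin
      seen.(pc) <- true;
      (* in a well-formed program this next fetch never overflows *)
      fetch (pc + sizes.(pc))
    end
  in
  fetch 0;
  seen
\end{verbatim}
The static cost computation is only meaningful on the addresses marked by such a traversal: a jump to an unmarked address invalidates the interpretation of the code memory, and hence the computed costs.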
     1157
     1158The proof of correctness for cost predictions, and even discovering the correct statement, has been quite complex.
      1159Setting up good invariants (i.e. structured traces, well-formed programs and reachability) and completing the proof has required more than three man-months, while the initially estimated effort was much lower.
     1160In the paper-and-pencil proof for IMP, the corresponding proof was `obvious' and only took two lines.
     1161
     1162The proof itself is quite involved.
     1163We must show, as an important lemma, that the sum of the execution costs over a structured trace, where the costs are summed in execution order, is equivalent to the sum of the execution costs in the order of pre-payment.
      1164The two orders are quite different and the proof is by mutual recursion over the definition of the converging structured traces, which is a family of three mutually inductive types.
     1165The fact that this property only holds for converging function calls is hidden in the definition of the structured traces.
     1166Then we need to show that the order of pre-payment corresponds to the order induced by the cost traces extracted from the structured trace.
     1167Finally, we need to show that the statically computed cost for one block corresponds to the cost dynamically computed in pre-payment order.
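Schematically, and again in our own notation, the key lemma states that for every converging structured trace $T$
\[
  \sum_{i \,\in\, \mathrm{instrs}(T)} \mathrm{cost}(i) \;=\; \sum_{\ell \,\in\, \mathrm{labels}(T)} \kappa(\ell),
\]
where the left-hand sum ranges over the instructions of $T$ in execution order, and the right-hand sum pre-pays, at each label emission, the statically computed cost $\kappa(\ell)$ of the whole block that follows the label.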
    12531168
    12541169\section{Overall results}
    12551170
    1256 Functional correctness of the compiled code can be shown by composing
    1257 the simulations to show that the target behaviour matches the
    1258 behaviour of the source program, if the source program does not `go
    1259 wrong'.  More precisely, we show that there is a forward simulation
    1260 between the source trace and a (flattened structured) trace of the
    1261 output, and conclude equivalence because the target's semantics are
    1262 in the form of an executable function, and hence
    1263 deterministic.
    1264 
    1265 Combining this with the correctness of the assignment of costs to cost
    1266 labels at the ASM level for a structured trace, we can show that the
    1267 cost of executing any compiled function (including the main function)
    1268 is equal to the sum of all the values for cost labels encountered in
    1269 the \emph{source code's} trace of the function.
     1171Functional correctness of the compiled code can be shown by composing the simulations to show that the target behaviour matches the behaviour of the source program, if the source program does not `go wrong'.
     1172More precisely, we show that there is a forward simulation between the source trace and a (flattened structured) trace of the output, and conclude equivalence because the target's semantics are in the form of an executable function, and hence deterministic.
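Schematically (the notation is ours, with $\mathcal{C}$ the source Clight program and $\mathcal{A}$ the assembled output):
\[
  \mathcal{C} \;\lesssim\; \mathcal{I}_1 \;\lesssim\; \cdots \;\lesssim\; \mathcal{I}_n \;\lesssim\; \mathcal{A}
  \quad\Longrightarrow\quad
  \mathcal{C} \;\lesssim\; \mathcal{A},
\]
where $\lesssim$ denotes a forward simulation between adjacent intermediate languages; determinism of the target's executable semantics then promotes the composed simulation to an equivalence of traces.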
     1173
      1174Combining this with the correctness of the assignment of costs to cost labels at the ASM level for a structured trace, we can show that the cost of executing any compiled function (including the top-level \texttt{main} function of a C program) is equal to the sum of all the values for cost labels encountered in the \emph{source code's} trace of the function.
    12701175
    12711176\section{Estimated effort}
    1272 Based on the rough analysis performed so far we can estimate the total
    1273 effort for the certification of the compiler. We obtain this estimation by
    1274 combining, for each pass: 1) the number of lines of code to be certified;
    1275 2) the ratio of number of lines of proof to number of lines of code from
    1276 the CompCert project~\cite{compcert} for the CompCert pass that is closest to
    1277 ours; 3) an estimation of the complexity of the pass according to the
    1278 analysis above. The result is shown in Table~\ref{table}.
     1177
      1178Based on the rough analysis performed so far, we can estimate the total effort required for the certification of the whole compiler.
     1179We obtain this estimation by combining, for each pass:
     1180\begin{enumerate}
     1181\item
     1182The number of lines of code to be certified,
     1183\item
      1184The ratio of the number of lines of proof to the number of lines of code, taken from the CompCert project~\cite{compcert} for the CompCert pass that is closest in spirit to our own,
     1185\item
     1186An estimation of the complexity of the pass according to the analysis above.
     1187\end{enumerate}
     1188The result is shown in Table~\ref{table}.
    12791189
    12801190\begin{table}{h}
     
    13021212\end{table}
    13031213
    1304 We provide now some additional informations on the methodology used in the
    1305 computation. The passes in Cerco and CompCert front-end closely match each
    1306 other. However, there is no clear correspondence between the two back-ends.
    1307 For instance, we enforce the calling convention immediately after instruction
    1308 selection, whereas in CompCert this is performed in a later phase. Or we
    1309 linearize the code at the very end, whereas CompCert performs linearization
    1310 as soon as possible. Therefore, the first part of the exercise has consisted
    1311 in shuffling and partitioning the CompCert code in order to assign to each
    1312 CerCo pass the CompCert code that performs the same transformation.
    1313 
    1314 After this preliminary step, using the data given in~\cite{compcert} (which
    1315 are relative to an early version of CompCert) we computed the ratio between
    1316 men months and lines of code in CompCert for each CerCo pass. This is shown
    1317 in the third column of Table~\ref{table}. For those CerCo passes that
    1318 have no correspondence in CompCert (like the optimizing assembler) or where
    1319 we have insufficient data, we have used the average of the ratios computed
    1320 above.
    1321 
    1322 The first column of the table shows the number of lines of code for each
    1323 pass in CerCo. The third column is obtained multiplying the first with the
    1324 CompCert ratio. It provides an estimate of the effort required (in men months)
    1325 if the complexity of the proofs for CerCo and Compcert would be the same.
    1326 
    1327 The two proof styles, however, are on purpose completely different. Where
    1328 CompCert uses non executable semantics, describing the various semantics with
    1329 inductive types, we have preferred executable semantics. Therefore, CompCert
    1330 proofs by induction and inversion become proof by functional inversion,
    1331 performed using the Russel methodology (now called Program in Coq, but whose
    1332 behaviour differs from Matita's one). Moreover, CompCert code is written using
    1333 only types that belong to the Hindley-Milner fragment, whereas we have
    1334 heavily exploited dependent types all over the code. The dependent type
    1335 discipline offers many advantages from the point of view of clarity of the
    1336 invariants involved and early detection of errors and it naturally combines
    1337 well with the Russel approach which is based on dependent types. However, it
    1338 is also well known to introduce technical problems all over the code, like
    1339 the need to explicitly prove type equalities to be able to manipulate
    1340 expressions in certain ways. In many situations, the difficulties encountered
    1341 with manipulating dependent types are better addressed by improving the Matita
    1342 system, according to the formalization driven system development. For this
    1343 reason, and assuming a pessimistic point of view on our performance, the
    1344 fourth columns presents the final estimation of the effort required, that also
    1345 takes in account the complexity of the proof suggested by the informal proofs
    1346 sketched in the previous section.
      1214We now provide some additional information on the methodology used in the computation.
      1215The passes in the CerCo and CompCert front-ends closely match each other.
     1216However, there is no clear correspondence between the two back-ends.
     1217For instance, we enforce the calling convention immediately after instruction selection, whereas in CompCert this is performed in a later phase.
     1218Further, we linearise the code at the very end, whereas CompCert performs linearisation as soon as possible.
     1219Therefore, the first part of the estimation exercise has consisted of shuffling and partitioning the CompCert code in order to assign to each CerCo pass the CompCert code that performs a similar transformation.
     1220
      1221After this preliminary step, using the data given in~\cite{compcert} (which refers to an early version of CompCert), we computed the ratio between man-months and lines of code in CompCert for each CerCo pass.
      1222This is shown in the second column of Table~\ref{table}.
     1223For those CerCo passes that have no correspondence in CompCert (like the optimising assembler) or where we have insufficient data, we have used the average of the ratios computed above.
     1224
     1225The first column of the table shows the number of lines of code for each pass in CerCo.
      1226The third column is obtained by multiplying the first by the CompCert ratio.
      1227It provides an estimate of the effort required (in man-months) if the complexity of the proofs for CerCo and CompCert were the same.
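In other words, for each compiler pass $p$ (again in our own notation):
\[
  \mathrm{effort}(p) \;=\; \mathrm{LOC}_{\mathrm{CerCo}}(p) \times \frac{\mathrm{proof\ lines}(p')}{\mathrm{code\ lines}(p')},
\]
where $p'$ is the CompCert pass matched with $p$ in the preliminary step described above.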
     1228
      1229The two proof styles, however, are by design completely different.
     1230Where CompCert uses non-executable semantics, describing the various semantics of languages with inductive types, we have preferred executable semantics.
      1231Therefore, CompCert proofs by induction and inversion become proofs by functional inversion, performed using the Russell methodology (now called Program in Coq, but whose behaviour differs from that of Matita).
     1232Moreover, CompCert code is written using only types that belong to the Hindley-Milner fragment, whereas we have heavily exploited dependent types throughout the codebase.
     1233The dependent type discipline offers many advantages, especially from the point of view of clarity of the invariants involved and early detection of errors.
      1234It also naturally combines well with the Russell approach, which is based on dependent types.
      1235However, it is also known to introduce several technical problems, such as the need to explicitly prove type equalities in order to manipulate expressions in certain ways.
      1236In many situations, the difficulties encountered when manipulating dependent types are better addressed by improving the Matita system, in accordance with formalisation-driven system development.
      1237For this reason, and assuming a pessimistic estimation of our performance, the fourth column presents the final estimation of the effort required, which also takes into account the complexity of the proof suggested by the informal proofs sketched in the previous section.
    13471238
    13481239\subsection{Contingency plan}
    1349 On the basis of the proof strategy sketched in this document and the
    1350 estimated effort, we can refine our contingency plan. In case we will end
    1351 the certification of the basic compiler in advance we will have the choice
    1352 of either proving loop optimizations and/or partial redundancy elimination
    1353 correct (both tasks that seem difficult to achieve in a short time) or
    1354 considering the MCS-51 specific extensions introduced during the first period
    1355 and under-used in the formalized prototype. Yet another possibility would be
    1356 to better study retargeting of the code and the commutation property between
    1357 different compiler passes. The latter study is easily enabled by our
    1358 approach where all back-end languages are instances of the same parameterized
    1359 language.
    1360 
    1361 In the case of a consistent delay in the certification of some
    1362 components, we will address first the passes that are more likely to have
    1363 undetected bugs and we will follow a top-down approach, axiomatizing
    1364 the properties of the data structured used in the compiler to focus more
    1365 on the algorithms. The rational is that data structures are easier then
    1366 algorithms to test using well known methodologies.
    1367 The effort table clearly shows that commond definitions
    1368 and data structures are 1/4th of the size of the code and require slightly
    1369 less than 1/3rd of the total effort. At least half of this effort really goes
    1370 into simple data structures (vectors, bounded and unbounded integers, tries
    1371 and maps) whose certification is not interesting and whose code could be
    1372 taken without re-proving it from the library of any other theorem prover.
     1240
     1241On the basis of the proof strategy sketched in this document and the estimated effort, we can refine our contingency plan.
      1242Should we complete the certification of the basic compiler ahead of schedule, we will have the choice of either proving loop optimisations and/or partial redundancy elimination correct (both tasks seem difficult to achieve in a short time) or considering the MCS-51-specific extensions introduced during the first period and under-used in the formalised prototype.
     1243Yet another possibility would be to better study retargeting of the code and the commutation property between different compiler passes.
      1244The latter study is easily enabled by our approach, where all back-end languages are instances of the same parameterised language.
     1245
      1246In the case of a considerable delay in the certification of some components, we will first address the passes that are more likely to have undetected bugs, and we will follow a top-down approach, axiomatising the properties of the data structures used in the compiler to focus more on the algorithms.
     1247The rationale is that data structures are easier than algorithms to test using well-known methodologies.
      1248The effort table clearly shows that common definitions and data structures are one quarter of the size of the current codebase and require slightly less than one third of the total effort.
      1249At least half of this effort really goes into simple data structures (vectors, bounded and unbounded integers, tries and maps) whose certification is not interesting, and whose code could be taken, without re-proving it, from the library of any other theorem prover.
    13731250
    13741251\section{Conclusions}
    1375 The overall exercise, whose details have been only minimally reported here,
    1376 has been very useful. It has allowed to spot in an early moment some criticities
    1377 of the proof that have required major changes in the proof plan. It has also
    1378 shown that the last passes of the compilation (e.g. assembly) and cost
    1379 prediction on the object code are much more involved than more high level
    1380 passes.
    1381 
    1382 The final estimation for the effort is surely affected by a low degree of
    1383 confidence. It is however sufficient to conclude that the effort required
    1384 is in line with the man power that was scheduled for the second half of the
    1385 second period and for the third period. Compared to the number of men months
    1386 declared in Annex I of the contract, we will need more men months. However,
    1387 both at UNIBO and UEDIN there have been major differences in hiring with
    1388 respect to the Annex. Therefore both sites have now an higher number of men
    1389 months available, with the trade-off of a lower level of maturity of the
    1390 people employed.
    1391 
    1392 The reviewers suggested that we use this estimation to compare two possible
    1393 scenarios: a) proceed as planned, porting all the CompCert proofs to Matita
    1394 or b) port D3.1 and D4.1 to Coq and re-use the CompCert proofs.
    1395 We remark here again that the back-end of the two compilers, from the
    1396 memory model on, are sensibly different: we are not re-proving correctness
    1397 of the same piece of code. Moreover, the proof techniques are different for
    1398 the front-end too. Switching to the CompCert formalization would imply
    1399 the abandon of the untrusted compiler, the abandon of the experiment with
    1400 a different proof technique, the abandon of the quest for an open source
    1401 proof, and the abandon of the co-development of the formalization and the
    1402 Matita proof assistant. In the Commitment Letter~\cite{letter} delivered
    1403 to the Officer in May we clarified our personal perspective on the project
    1404 goals and objectives. We do not re-describe here the point of view presented
    1405 in the letter that we can condense in ``we value diversity''.
    1406 
    1407 Clearly, if the execise would have suggested the infeasability in terms of
    1408 effort of concluding the formalization or getting close to that, we would have
    1409 abandoned our path and embraced the reviewer's suggestion. However, we
    1410 have been comforted in the analysis we did in autumn and further progress done
    1411 during the winter does not show yet any major delay with respect to the
    1412 proof schedule. We are thus planning to continue the certification according
    1413 to the more detailed proof plan that came out from the exercise reported in
    1414 this manuscript.
     1252
     1253The overall exercise, whose details have only been sketched here, has been very useful.
     1254It has brought to light some errors in our proof that have required major changes in the proof plan.
      1255It has also shown that the last passes of the compilation (e.g. assembly) and cost prediction on the object code are much more involved than the higher-level passes.
     1256
      1257The final estimation of the effort required to complete the proof carries a low degree of confidence, due to the difficulty of relating our work, and our compiler's design, to that of CompCert.
      1258It is, however, sufficient to conclude that the effort required is in line with the manpower that was scheduled for the second half of the second period and for the third period.
      1259Compared to the number of man-months declared in Annex I of the contract, we will need more man-months.
     1260However, at both UNIBO and UEDIN there have been major differences in hiring with respect to the Annex.
     1261Therefore both sites now have more manpower available, though with the associated trade-off of a lower level of maturity for the people employed.
     1262
      1263Our reviewers suggested that we use this estimation to compare two possible scenarios: a) proceed as planned, porting all the CompCert proofs to Matita, or b) port D3.1 and D4.1 to Coq and re-use the CompCert proofs.
      1264We remark here again that the back-ends of the two compilers, from the memory model onwards, are quite different: we are not re-proving correctness of the same piece of code.
     1265Moreover, the proof techniques are different for the front-end too.
      1266Switching to the CompCert formalisation would imply abandoning the untrusted compiler, the experiment with a different proof technique, the quest for an open-source proof, and the co-development of the formalisation and the Matita proof assistant.
     1267In the Commitment Letter~\cite{letter}, delivered to the Officer in May, we clarified our personal perspectives on the project goals and objectives.
     1268We do not re-describe here the point of view presented in the letter, other than the condensed soundbite that ``we value diversity''.
     1269
      1270Clearly, had the exercise suggested that concluding the formalisation, or getting close to doing so, was infeasible in terms of effort, we would have abandoned our path and embraced the reviewers' suggestion.
      1271However, the analysis we performed in autumn has been reassuring, and further progress made during the winter does not yet show any major delay with respect to the proof schedule.
      1272We are thus planning to continue the certification according to the more detailed proof plan that emerged from the exercise reported in this manuscript.
    14151273
    14161274\begin{thebibliography}{2}