# Changeset 3657 for Papers

Ignore:
Timestamp:
Mar 16, 2017, 12:39:35 PM
Message:

more cannibalisation

Location:
Papers/jar-cerco-2017
Files:
3 edited

\section{Compiler proof}
\label{sect.compiler.proof}

\section{Introduction}

The \cerco{} compiler produces a version of the source code containing annotations describing the timing behaviour of the object code, as well as the object code itself. It compiles C code, targeting microcontrollers implementing the Intel 8051 architecture.  There are two versions: first, an initial prototype was implemented in \ocaml{}~\cite{d2.2}, then a version was formalised in the \matita{} proof assistant~\cite{d3.2,d4.2} and extracted to \ocaml{} code to produce an executable compiler.  In this document we present results from Deliverable 3.4, the formalised proofs in \matita{} about the front-end of the latter version of the compiler (culminating in the \lstinline'front_end_correct' lemma), and describe how that fits into the verification of the whole compiler.

A key part of this work was to layer the \emph{intensional} correctness results that show that the costs produced are correct on top of the proofs about the compiled code's \emph{extensional} behaviour (that is, the functional correctness of the compiler).  Unfortunately, the ambitious goal of completely verifying the entire compiler was not feasible within the time available, but thanks to this separation of extensional and intensional proofs we are able to axiomatise some extensional simulation results which are similar to those in other compiler verification projects and concentrate on the novel intensional proofs.  We were also able to add stack space costs to obtain a stronger result.  The proofs were made more tractable by introducing compile-time checks for the `sound and precise' cost labelling properties rather than proving that they are preserved throughout.
The overall statement of correctness says that the annotated program has the same behaviour as the input, and that for any suitably well-structured part of the execution (which we call \emph{measurable}), the object code will execute the same behaviour taking precisely the time given by the cost annotations in the annotated source program.

In the next section we recall the structure of the compiler and make the overall statement more precise.  Following that, in Section~\ref{sec:fegoals} we describe the statements we need to prove about the intermediate \textsf{RTLabs} programs for the back-end proofs. Section~\ref{sec:inputtolabelling} covers the compiler passes which produce the annotated source program and Section~\ref{sec:measurablelifting} the rest of the transformations in the front-end.  Then the compile-time checks for good cost labelling are detailed in Section~\ref{sec:costchecks} and the proofs that the structured traces required by the back-end exist are discussed in Section~\ref{sec:structuredtrace}.

\section{The compiler and its correctness statement}

The uncertified prototype \ocaml{} \cerco{} compiler was originally described in Deliverables 2.1 and 2.2.  Its design was replicated in the formal \matita{} code, which was presented in Deliverables 3.2 and 4.2, for the front-end and back-end respectively.

\begin{figure}
\begin{center}
\includegraphics[width=0.5\linewidth]{compiler-plain.pdf}
\end{center}
\caption{Languages in the \cerco{} compiler}
\label{fig:compilerlangs}
\end{figure}

The compiler uses a number of intermediate languages, as outlined in the middle two lines of Figure~\ref{fig:compilerlangs}.  The upper line represents the front-end of the compiler, and the lower the back-end, finishing with Intel 8051 binary code.  Not all of the front-end compiler passes introduce a new language, and Figure~\ref{fig:summary} presents a list of every pass involved.
\begin{figure}
\begin{center}
\begin{minipage}{.8\linewidth}
\begin{tabbing}
\quad \= $\downarrow$ \quad \= \kill
\textsf{C} (unformalised)\\
\> $\downarrow$ \> CIL parser (unformalised \ocaml)\\
\textsf{Clight}\\
%\> $\downarrow$ \> add runtime functions\\
\> $\downarrow$ \> \lstinline[language=C]'switch' removal\\
\> $\downarrow$ \> labelling\\
\> $\downarrow$ \> cast removal\\
\> $\downarrow$ \> stack variable allocation and control structure simplification\\
\textsf{Cminor}\\
%\> $\downarrow$ \> generate global variable initialization code\\
\> $\downarrow$ \> transform to RTL graph\\
\textsf{RTLabs}\\
\> $\downarrow$ \> check cost labelled properties of RTL graph\\
\> $\downarrow$ \> start of target specific back-end\\
\>\quad \vdots
\end{tabbing}
\end{minipage}
\end{center}
\caption{Front-end languages and compiler passes}
\label{fig:summary}
\end{figure}

\label{page:switchintro}
The annotated source code is produced by the cost labelling phase. Note that there is a pass to replace C \lstinline[language=C]'switch' statements before labelling --- we need to remove them because the simple form of labelling used in the formalised compiler is not quite capable of capturing their execution time costs, largely due to C's `fall-through' behaviour where execution from one branch continues in the next unless there is an explicit \lstinline[language=C]'break'.

The cast removal phase which follows cost labelling simplifies expressions to prevent unnecessary arithmetic promotion, which is specified by the C standard but costly for an 8-bit target.  The transformations to \textsf{Cminor} and subsequently \textsf{RTLabs} bear considerable resemblance to some passes of the CompCert compiler~\cite{Blazy-Leroy-Clight-09,Leroy-backend}, although we use a simpler \textsf{Cminor} where all loops use \lstinline[language=C]'goto' statements, and the \textsf{RTLabs} language retains a target-independent flavour.  The back-end takes \textsf{RTLabs} code as input.
The whole compilation function returns the following information on success: \begin{lstlisting}[language=Grafite] record compiler_output : Type[0] := { c_labelled_object_code: labelled_object_code ; c_stack_cost: stack_cost_model ; c_max_stack: nat ; c_init_costlabel: costlabel ; c_labelled_clight: clight_program ; c_clight_cost_map: clight_cost_map }. \end{lstlisting} It consists of annotated 8051 object code, a mapping from function identifiers to the function's stack space usage, the space available for the stack after global variable allocation, a cost label covering the execution time for the initialisation of global variables and the call to the \lstinline[language=C]'main' function, the annotated source code, and finally a mapping from cost labels to actual execution time costs. An \ocaml{} pretty printer is used to provide a concrete version of the output code and annotated source code.  In the case of the annotated source code, it also inserts the actual costs alongside the cost labels, and optionally adds a global cost variable and instrumentation to support further reasoning in external tools such as Frama-C. \subsection{Revisions to the prototype compiler} Our focus on intensional properties prompted us to consider whether we could incorporate stack space into the costs presented to the user. We only allocate one fixed-size frame per function, so modelling this was relatively simple.  It is the only form of dynamic memory allocation provided by the compiler, so we were able to strengthen the statement of the goal to guarantee successful execution whenever the stack space obeys the \lstinline'c_max_stack' bound calculated by subtracting the global variable requirements from the total memory available. The cost labelling checks at the end of Figure~\ref{fig:summary} have been introduced to reduce the proof burden, and are described in Section~\ref{sec:costchecks}. 
The use of dependent types to capture simple intermediate language invariants makes every front-end pass a total function, except \textsf{Clight} to \textsf{Cminor} and the cost checks.  Hence various well-formedness and type safety checks are performed only once between \textsf{Clight} and \textsf{Cminor}, and the invariants rule out any difficulties in the later stages.  With the benefit of hindsight we would have included an initial checking phase to produce a `well-formed' variant of \textsf{Clight}, conjecturing that this would simplify various parts of the proofs for the \textsf{Clight} stages which deal with potentially ill-formed code.

Following D2.2, we previously generated code for global variable initialisation in \textsf{Cminor}, for which we reserved a cost label to represent the execution time for initialisation.  However, the back-end must also add an initial call to the main function, whose cost must also be accounted for, so we decided to move the initialisation code to the back-end and merge the costs.

\subsection{Main correctness statement}

Informally, our main intensional result links the time difference in a source code execution to the time difference in the object code, expressing the time for the source by summing the values for the cost labels in the trace, and the time for the target by a clock built into the 8051 executable semantics. The availability of precise timing information for 8051 implementations and the design of the compiler allow it to give exact time costs in terms of processor cycles, not just upper bounds. However, these exact results are only available if the subtrace we measure starts and ends at suitable points.  In particular, pure computation with no observable effects may be reordered and moved past cost labels, so we cannot measure time between arbitrary statements in the program.
There is also a constraint on the subtraces that we measure due to the requirements of the correctness proof for the object code timing analysis.  To be sure that the timings are assigned to the correct cost label, we need to know that each return from a function call goes to the correct return address.  It is difficult to observe this property locally in the object code because it relies on much earlier stages in the compiler.  To convey this information to the timing analysis extra structure is imposed on the subtraces, which is described in Section~\ref{sec:fegoals}.

% Regarding the footnote, would there even be much point?
% TODO: this might be quite easy to add ('just' subtract the
% measurable subtrace from the second label to the end).  Could also
% measure other traces in this manner.
These restrictions are reflected in the subtraces that we give timing guarantees on; they must start at a cost label and end at the return of the enclosing function of the cost label\footnote{We expect that this would generalise to more general subtraces by subtracting costs for unwanted measurable suffixes of a measurable subtrace.}.  A typical example of such a subtrace is the execution of an entire function from the cost label at the start of the function until it returns.  We call any such subtrace \emph{measurable} if it (and the prefix of the trace from the start to the subtrace) can also be executed within the available stack space.

Now we can give the main intensional statement for the compiler. Given a \emph{measurable} subtrace for a labelled \textsf{Clight} program, there is a subtrace of the 8051 object code program where the time differences match.  Moreover, \emph{observable} parts of the trace also match --- these are the appearance of cost labels and function calls and returns. More formally, the definition of this statement in \matita{} is
\begin{lstlisting}[language=Grafite]
definition simulates :=
  $\lambda$p: compiler_output.
let initial_status := initialise_status $...$ (cm (c_labelled_object_code $...$ p)) in
$\forall$m1,m2.
  measurable Clight_pcs (c_labelled_clight $...$ p) m1 m2
    (stack_sizes (c_stack_cost $...$ p)) (c_max_stack $...$ p) $\rightarrow$
$\forall$c1,c2.
  clock_after Clight_pcs (c_labelled_clight $...$ p) m1 (c_clight_cost_map $...$ p) = OK $...$ c1 $\rightarrow$
  clock_after Clight_pcs (c_labelled_clight $...$ p) (m1+m2) (c_clight_cost_map $...$ p) = OK $...$ c2 $\rightarrow$
$\exists$n1,n2.
  observables Clight_pcs (c_labelled_clight $...$ p) m1 m2 =
    observables (OC_preclassified_system (c_labelled_object_code $...$ p))
      (c_labelled_object_code $...$ p) n1 n2 $\wedge$
  clock ?? (execute (n1+n2) ? initial_status) =
    clock ?? (execute n1 ? initial_status) + (c2-c1).
\end{lstlisting}
where the \lstinline'measurable', \lstinline'clock_after' and \lstinline'observables' definitions are generic definitions for multiple languages; in this case the \lstinline'Clight_pcs' record applies them to \textsf{Clight} programs.

There is a second part to the statement, which says that the initial processing of the input program to produce the cost labelled version does not affect the semantics of the program:
% Yes, I'm paraphrasing the result a tiny bit to remove the observe non-function
\begin{lstlisting}[language=Grafite]
$\forall$input_program,output.
  compile input_program = return output $\rightarrow$
  not_wrong $...$ (exec_inf $...$ clight_fullexec input_program) $\rightarrow$
  sim_with_labels
    (exec_inf $...$ clight_fullexec input_program)
    (exec_inf $...$ clight_fullexec (c_labelled_clight $...$ output))
\end{lstlisting}
That is, any successful compilation produces a labelled program that has identical behaviour to the original, so long as there is no `undefined behaviour'.

Note that this statement provides full functional correctness, including preservation of (non-)termination.
The intensional result above does not do this directly --- it does not guarantee the same result or same termination.  There are two mitigating factors, however: first, to prove the intensional property you need local simulation results --- these can be pieced together to form full behavioural equivalence; only time constraints have prevented us from doing so.  Second, if we wish to confirm a result, termination, or non-termination we could add an observable witness, such as a function that is only called if the correct result is given.  The intensional result guarantees that the observable witness is preserved, so the program must behave correctly.

These two results are combined in the \lstinline'correct' theorem in the file \lstinline'correctness.ma'.

\section{Correctness statement for the front-end}
\label{sec:fegoals}

The essential parts of the intensional proof were outlined during work on a toy compiler in Task 2.1~\cite{d2.1,springerlink:10.1007/978-3-642-32469-7_3}.  These are
\begin{enumerate}
\item functional correctness, in particular preserving the trace of cost labels,
\item the \emph{soundness} and \emph{precision} of the cost labelling on the object code, and
\item the timing analysis on the object code produces a correct mapping from cost labels to time.
\end{enumerate}
However, that toy development did not include function calls.  For the full \cerco{} compiler we also need to maintain the invariant that functions return to the correct program location in the caller, as we mentioned in the previous section.  During work on the back-end timing analysis (described in more detail in the companion deliverable, D4.4) the notion of a \emph{structured trace} was developed to enforce this return property, and also most of the cost labelling properties too.
\begin{figure}
\begin{center}
\includegraphics[width=0.5\linewidth]{compiler.pdf}
\end{center}
\caption{The compiler and proof outline}
\label{fig:compiler}
\end{figure}

Jointly, we generalised the structured traces to apply to any of the intermediate languages which have some notion of a program counter.  This means that they are introduced part way through the compiler, see Figure~\ref{fig:compiler}.  Proving that a structured trace can be constructed at \textsf{RTLabs} has several virtues:
\begin{itemize}
\item This is the first language where every operation has its own unique, easily addressable, statement.
\item Function calls and returns are still handled implicitly in the language and so the structural properties are ensured by the semantics.
\item Many of the back-end languages from \textsf{RTL} onwards share a common core set of definitions, and using structured traces throughout increases this uniformity.
\end{itemize}

\begin{figure}
\begin{center}
\includegraphics[width=0.6\linewidth]{strtraces.pdf}
\end{center}
\caption{Nesting of functions in structured traces}
\label{fig:strtrace}
\end{figure}

A structured trace is a mutually inductive data type which contains the steps from a normal program trace, but arranged into a nested structure which groups entire function calls together and aggregates individual steps between cost labels (or between the final cost label and the return from the function), see Figure~\ref{fig:strtrace}.  This captures the idea that the cost labels only represent costs \emph{within} a function --- calls to other functions are accounted for in the nested trace for their execution, and we can locally regard function calls as a single step.

These structured traces form the core part of the intermediate results that we must prove so that the back-end can complete the main intensional result stated above.
In full, we provide the back-end with
\begin{enumerate}
\item A normal trace of the \textbf{prefix} of the program's execution before reaching the measurable subtrace.  (This needs to be preserved so that we know that the stack space consumed is correct, and to set up the simulation results.)
\item The \textbf{structured trace} corresponding to the measurable subtrace.
\item An additional property about the structured trace that no `program counter' is \textbf{repeated} between cost labels.  Together with the structure in the trace, this takes over from showing that cost labelling is sound and precise.
\item A proof that the \textbf{observables} have been preserved.
\item A proof that the \textbf{stack limit} is still observed by the prefix and the structured trace.  (This is largely a consequence of the preservation of observables.)
\end{enumerate}
The \lstinline'front_end_correct' lemma in the \lstinline'correctness.ma' file provides a record containing these.

Following the outline in Figure~\ref{fig:compiler}, we will first deal with the transformations in \textsf{Clight} that produce the source program with cost labels, then show that measurable traces can be lifted to \textsf{RTLabs}, and finally show that we can construct the properties listed above ready for the back-end proofs.

\section{Input code to cost labelled program}
\label{sec:inputtolabelling}

As explained on page~\pageref{page:switchintro}, the costs of complex C \lstinline[language=C]'switch' statements cannot be represented with the simple labelling used in the formalised compiler.  Our first pass replaces these statements with simpler C code, allowing our second pass to perform the cost labelling.  We show that the behaviour of programs is unchanged by these passes using forward simulations\footnote{All of our languages are deterministic, which can be seen directly from their executable definitions.
Thus we know that forward simulations are sufficient because the target cannot have any other behaviour.}.

\subsection{Switch removal}

We compile away \lstinline[language=C]'switch' statements into more basic \textsf{Clight} code. Note that this transformation does not necessarily degrade the efficiency of the generated code. For instance, compilers such as GCC introduce balanced trees of ``if-then-else'' constructs for small switches. However, our implementation strategy is much simpler. Let us consider the following input statement.

\begin{lstlisting}[language=C]
switch(e) {
case v1:
  stmt1;
case v2:
  stmt2;
default:
  stmt_default;
}
\end{lstlisting}

Note that \textsf{stmt1}, \textsf{stmt2}, \ldots \textsf{stmt\_default} may contain \lstinline[language=C]'break' statements, which have the effect of exiting the switch statement. In the absence of break, the execution falls through each case sequentially. In our implementation, we produce an equivalent sequence of ``if-then'' blocks chained by gotos (each \lstinline[language=C]'break' becomes \lstinline[language=C]'goto exit_label'; the fall-through from the last case into the default branch needs no goto at all):

\begin{lstlisting}[language=C]
fresh = e;
if(fresh == v1) { $\llbracket$stmt1$\rrbracket$; goto lbl_case2; };
if(fresh == v2) { lbl_case2: $\llbracket$stmt2$\rrbracket$; };
$\llbracket$stmt_default$\rrbracket$;
exit_label: ;
\end{lstlisting}

The proof had to tackle the following points:
\begin{itemize}
\item the source and target memories are not the same (due to the fresh variable),
\item the flow of control is changed in a non-local way (e.g. \textbf{goto} instead of \textbf{break}).
\end{itemize}
In order to tackle the first point, we implemented a version of memory extensions similar to those of CompCert.

For the simulation we decided to prove a sufficient amount to give us confidence in the definitions and approach, but to curtail the proof because this pass does not contribute to the intensional correctness result.
We tackled several simple cases that do not interact with the switch removal per se, to show that the definitions were usable, and part of the switch case to check that the approach is reasonable. This comprises propagating the memory extension through each statement (except switch), as well as various invariants that are needed for the switch case (in particular, freshness hypotheses). The details of the evaluation process for the source switch statement and its target counterpart can be found in the file \lstinline'switchRemoval.ma', along with more details on the transformation itself.

Proving the correctness of the second point would require reasoning about the semantics of \lstinline[language=C]'goto' statements. In the \textsf{Clight} semantics, this is implemented as a function-wide lookup of the target label. The invariant we would need is the fact that a global label lookup on a freshly created goto is equivalent to a local lookup. This would in turn require the propagation of some freshness hypotheses on labels. As discussed, we decided to omit this part of the correctness proof.

\subsection{Cost labelling}

The simulation for the cost labelling pass is the simplest in the front-end.  The main argument is that any step of the source program is simulated by the same step of the labelled one, plus any extra steps for the added cost labels.  The extra instructions do not change the memory or local environments, although we have to keep track of the extra instructions that appear in continuations, for example during the execution of a \lstinline[language=C]'while' loop.

We do not attempt to capture any cost properties of the labelling\footnote{We describe how the cost properties are established in Section~\ref{sec:costchecks}.} in the simulation proof, which allows the proof to be oblivious to the choice of cost labels.
Hence we do not have to reason about the threading of name generation through the labelling function, greatly reducing the amount of effort required. %TODO: both give one-step-sim-by-many forward sim results; switch %removal tricky, uses aux var to keep result of expr, not central to %intensional correctness so curtailed proof effort once reasonable %level of confidence in code gained; labelling much simpler; don't care %what the labels are at this stage, just need to know when to go %through extra steps.  Rolled up into a single result with a cofixpoint %to obtain coinductive statement of equivalence (show). \section{Finding corresponding measurable subtraces} \label{sec:measurablelifting} There follow the three main passes of the front-end: \begin{enumerate} \item simplification of casts in \textsf{Clight} code \item \textsf{Clight} to \textsf{Cminor} translation, performing stack variable allocation and simplifying control structures \item transformation to \textsf{RTLabs} control flow graph \end{enumerate} We have taken a common approach to each pass: first we build (or axiomatise) forward simulation results that are similar to normal compiler proofs, but which are slightly more fine-grained so that we can see that the call structure and relative placement of cost labels is preserved. Then we instantiate a general result which shows that we can find a \emph{measurable} subtrace in the target of the pass that corresponds to the measurable subtrace in the source.  By repeated application of this result we can find a measurable subtrace of the execution of the \textsf{RTLabs} code, suitable for the construction of a structured trace (see Section~\ref{sec:structuredtrace}).  This is essentially an extra layer on top of the simulation proofs that provides us with the additional information required for our intensional correctness proof. 
\subsection{Generic measurable subtrace lifting proof}

Our generic proof is parametrised on a record containing small-step semantics for the source and target language, a classification of states (the same form of classification is used when defining structured traces), a simulation relation which respects the classification and cost labelling and four simulation results.  The simulations are split by the starting state's classification and whether it is a cost label, which will allow us to observe that the call structure is preserved.  They are:
\begin{enumerate}
\item a step from a `normal' state (which is not classified as a call or return) which is not a cost label is simulated by zero or more `normal' steps;
\item a step from a `call' state followed by a cost label step is simulated by a step from a `call' state, a corresponding label step, then zero or more `normal' steps;
\item a step from a `call' state not followed by a cost label similarly (note that this case cannot occur in a well-labelled program, but we do not have enough information locally to exploit this); and
\item a cost label step is simulated by a cost label step.
\end{enumerate}
Finally, we need to know that a successfully translated program will have an initial state in the simulation relation with the original program's initial state.

The back-end has similar requirements for lifting simulations to structured traces.  Fortunately, our treatment of calls and returns can be slightly simpler because we have special call and return states that correspond to function entry and return that are separate from the actual instructions.  This was originally inherited from our port of CompCert's \textsf{Clight} semantics, but proves useful here because we only need to consider adding extra steps \emph{after} a call or return state, because the instruction step deals with extra steps that occur before.
The back-end makes all of the call and return machinery explicit, and thus needs more complex statements about extra steps before and after each call and return. \begin{figure} \begin{center} \includegraphics[width=0.5\linewidth]{meassim.pdf} \end{center} \caption{Tiling of simulation for a measurable subtrace} \label{fig:tiling} \end{figure} To find the measurable subtrace in the target program's execution we walk along the original program's execution trace applying the appropriate simulation result by induction on the number of steps. While the number of steps taken varies, the overall structure is preserved, as illustrated in Figure~\ref{fig:tiling}.  By preserving the structure we also maintain the same intensional observables.  One delicate point is that the cost label following a call must remain directly afterwards\footnote{The prototype compiler allowed some straight-line code to appear before the cost label until a later stage of the compiler, but we must move the requirement forward to fit with the structured traces.} % Damn it, I should have just moved the cost label forwards in RTLabs, % like the prototype does in RTL to ERTL; the result would have been % simpler.  Or was there some reason not to do that? (both in the program code and in the execution trace), even if we introduce extra steps, for example to store parameters in memory in \textsf{Cminor}.  Thus we have a version of the call simulation that deals with both the call and the cost label in one result. In addition to the subtrace we are interested in measuring, we must prove that the earlier part of the trace is also preserved in order to use the simulation from the initial state.  This proof also guarantees that we do not run out of stack space before the subtrace we are interested in.  The lemmas for this prefix and the measurable subtrace are similar, following the pattern above.  However, the measurable subtrace also requires us to rebuild the termination proof.  
This is defined recursively:
\label{prog:terminationproof}
\begin{lstlisting}[language=Grafite]
let rec will_return_aux C (depth:nat)
                        (trace:list (cs_state $...$ C $\times$ trace)) on trace : bool :=
match trace with
[ nil $\Rightarrow$ false
| cons h tl $\Rightarrow$
    let $\langle$s,tr$\rangle$ := h in
    match cs_classify C s with
    [ cl_call $\Rightarrow$ will_return_aux C (S depth) tl
    | cl_return $\Rightarrow$
        match depth with
        [ O $\Rightarrow$ match tl with [ nil $\Rightarrow$ true | _ $\Rightarrow$ false ]
        | S d $\Rightarrow$ will_return_aux C d tl
        ]
    | _ $\Rightarrow$ will_return_aux C depth tl
    ]
].
\end{lstlisting}
The \lstinline'depth' is the number of return states we need to see before we have returned to the original function (initially zero) and \lstinline'trace' the measurable subtrace obtained from running the semantics for the correct number of steps.  This definition unfolds tail recursively for each step, and once the corresponding simulation result has been applied a new one for the target can be asserted by unfolding and applying the induction hypothesis on the shorter trace.

Combining the lemmas about the prefix and the measurable subtrace requires a little care because the states joining the two might not be related in the simulation.  In particular, if the measurable subtrace starts from the cost label at the beginning of the function there may be some extra instructions in the target code to execute to complete function entry before the states are back in the relation.  Hence we carefully phrased the lemmas to allow for such extra steps.

Together, these give us an overall result for any simulation fitting the requirements above (contained in the \lstinline'meas_sim' record):
\begin{lstlisting}[language=Grafite]
theorem measured_subtrace_preserved :
  $\forall$MS:meas_sim.
  $\forall$p1,p2,m,n,stack_cost,max.
  ms_compiled MS p1 p2 $\rightarrow$
  measurable (ms_C1 MS) p1 m n stack_cost max $\rightarrow$
  $\exists$m',n'.
measurable (ms_C2 MS) p2 m' n' stack_cost max $\wedge$ observables (ms_C1 MS) p1 m n = observables (ms_C2 MS) p2 m' n'. \end{lstlisting} The stack space requirement that is embedded in \lstinline'measurable' is a consequence of the preservation of observables, because it is determined by the functions called and returned from, which are observable. \subsection{Simulation results for each pass} We now consider the simulation results for the passes, each of which is used to instantiate the \lstinline[language=Grafite]'measured_subtrace_preserved' theorem to construct the measurable subtrace for the next language. \subsubsection{Cast simplification} The parser used in \cerco{} introduces a lot of explicit type casts. If left as they are, these constructs can greatly hamper the quality of the generated code -- especially as the architecture we consider is an $8$-bit one. In \textsf{Clight}, casts are expressions. Hence, most of the work of this transformation proceeds on expressions. The transformation proceeds by recursively trying to coerce an expression to a particular integer type, which is in practice smaller than the original one. This functionality is implemented by two mutually recursive functions whose signature is the following. \begin{lstlisting}[language=Grafite] let rec simplify_expr (e:expr) (target_sz:intsize) (target_sg:signedness) : $\Sigma$result:bool$\times$expr. $\forall$ge,en,m. simplify_inv ge en m e (\snd result) target_sz target_sg (\fst result) := $\ldots$ and simplify_inside (e:expr) : $\Sigma$result:expr. conservation e result := $\ldots$ \end{lstlisting} The \textsf{simplify\_inside} acts as a wrapper for \textsf{simplify\_expr}. Whenever \textsf{simplify\_inside} encounters a \textsf{Ecast} expression, it tries to coerce the sub-expression to the desired type using \textsf{simplify\_expr}, which tries to perform the actual coercion. 
In return, \textsf{simplify\_expr} calls back \textsf{simplify\_inside} in some particular positions, where we decided to be conservative in order to simplify the proofs. However, the current design allows us to move incrementally to a more aggressive version, by replacing recursive calls to \textsf{simplify\_inside} by calls to \textsf{simplify\_expr} \emph{and} proving the corresponding invariants -- where possible.

The \textsf{simplify\_inv} invariant encodes either the conservation of the semantics during the transformation corresponding to the failure of the coercion (\textsf{Inv\_eq} constructor), or the successful downcast of the considered expression to the target type (\textsf{Inv\_coerce\_ok}).
\begin{lstlisting}[language=Grafite]
inductive simplify_inv (ge : genv) (en : env) (m : mem)
  (e1 : expr) (e2 : expr) (target_sz : intsize) (target_sg : signedness) : bool $\rightarrow$ Prop :=
| Inv_eq : $\forall$result_flag. $\ldots$
    simplify_inv ge en m e1 e2 target_sz target_sg result_flag
| Inv_coerce_ok : $\forall$src_sz,src_sg.
    typeof e1 = Tint src_sz src_sg $\rightarrow$
    typeof e2 = Tint target_sz target_sg $\rightarrow$
    smaller_integer_val src_sz target_sz src_sg
      (exec_expr ge en m e1) (exec_expr ge en m e2) $\rightarrow$
    simplify_inv ge en m e1 e2 target_sz target_sg true.
\end{lstlisting}
The \textsf{conservation} invariant for \textsf{simplify\_inside} simply states the conservation of the semantics, as in the \textsf{Inv\_eq} constructor of the previous invariant.
\begin{lstlisting}[language=Grafite]
definition conservation := $\lambda$e,result. $\forall$ge,en,m.
  res_sim ? (exec_expr ge en m e) (exec_expr ge en m result)
  $\wedge$ res_sim ? (exec_lvalue ge en m e) (exec_lvalue ge en m result)
  $\wedge$ typeof e = typeof result.
\end{lstlisting}
This invariant is then easily lifted to statement evaluations.
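To give a flavour of the transformation, the following \ocaml{} sketch works over a toy expression type with hypothetical constructors (it is not the \matita{} development): a conservative simplifier either succeeds in pushing a downcast into an expression, mirroring \textsf{Inv\_coerce\_ok}, or fails and leaves it untouched, mirroring \textsf{Inv\_eq}.

```ocaml
(* Toy expression language: integer constants carry a bit-width. *)
type expr =
  | Const of int * int          (* Const (width, value) *)
  | Add of expr * expr
  | Cast of int * expr          (* Cast (target_width, e) *)

(* Try to coerce [e] to [width] bits.  [None] plays the role of the
   Inv_eq case (coercion failed, expression left alone), [Some e'] of
   the Inv_coerce_ok case (successful downcast). *)
let rec simplify_expr width e =
  match e with
  | Const (_, v) when v < 1 lsl width -> Some (Const (width, v))
  | Add (a, b) ->
      (match simplify_expr width a, simplify_expr width b with
       | Some a', Some b' -> Some (Add (a', b'))
       | _, _ -> None)
  | Cast (_, e') -> simplify_expr width e'
  | _ -> None

(* Wrapper in the style of simplify_inside: rewrite casts where the
   coercion succeeds, and conservatively recurse elsewhere. *)
let rec simplify_inside e =
  match e with
  | Cast (w, e') ->
      (match simplify_expr w e' with
       | Some e'' -> e''
       | None -> Cast (w, simplify_inside e'))
  | Add (a, b) -> Add (simplify_inside a, simplify_inside b)
  | Const _ -> e
```

On an $8$-bit target, narrowing a $32$-bit addition of small constants to an $8$-bit one in this way is what lets the back-end avoid expensive multi-byte arithmetic; soundness rests on the modular-arithmetic argument captured by \textsf{smaller\_integer\_val} in the real development.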
The main problem encountered with this particular pass was dealing with inconsistently typed programs, a canonical case being an integer constant of one size given a type of a different size. This prompted the need to introduce numerous type checks, making both the implementation and the proof more complex, even though more comprehensive checks are made in the next stage.

%\todo{Make this a particular case of the more general statement on baking more invariants in the Clight language}

\subsubsection{Clight to Cminor}

This pass is the last one operating on the \textsf{Clight} language. Its input is a full \textsf{Clight} program, and its output is a \textsf{Cminor} program. Note that we do not use an equivalent of CompCert's \textsf{C\#minor} language: we translate directly to a variant of \textsf{Cminor}. This has the advantage of requiring neither the special loop constructs nor the explicit block structure. Another salient point of our approach is that a significant number of the properties needed for the simulation proof were directly encoded in dependently typed translation functions.  In particular, freshness conditions and well-typedness conditions are included. The main effects of the transformation from \textsf{Clight} to \textsf{Cminor} are listed below.

\begin{itemize}
\item Variables are classified as being either globals, stack-allocated locals or potentially register-allocated locals. The value of register-allocated local variables is moved out of the modelled memory and stored in a dedicated environment.
\item In \textsf{Clight}, each local variable has a dedicated memory block, whereas stack-allocated locals are bundled together on a function-by-function basis.
\item Loops are converted to jumps.
\end{itemize}

The first two points require memory injections which are more flexible than those needed in the switch removal case.
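As an illustration of the first point, variable classification might be sketched in \ocaml{} as follows; the record fields and the size threshold here are hypothetical simplifications for exposition, not the criteria of the formalised pass.

```ocaml
(* Hypothetical classification of local and global variables, in the
   spirit of the Clight to Cminor pass: globals stay in global memory,
   address-taken or large locals go into the function's stack frame,
   and the remaining locals leave the modelled memory for a dedicated
   environment of register candidates. *)
type var_kind =
  | Global                       (* lives in global memory *)
  | StackLocal of int            (* offset in the function's frame *)
  | RegisterLocal                (* stored in a dedicated environment *)

type var_info = {
  name : string;
  is_global : bool;
  addr_taken : bool;             (* its address is taken somewhere *)
  size : int;                    (* size in bytes *)
}

(* Stack-allocated locals are bundled into a single frame, each being
   assigned an offset; single-byte locals whose address is never taken
   become register candidates (an illustrative threshold). *)
let classify (vars : var_info list) : (string * var_kind) list =
  let offset = ref 0 in
  List.map
    (fun v ->
       if v.is_global then (v.name, Global)
       else if v.addr_taken || v.size > 1 then begin
         let o = !offset in
         offset := !offset + v.size;
         (v.name, StackLocal o)
       end else (v.name, RegisterLocal))
    vars
```

The bundling of stack locals into one frame with per-variable offsets is exactly what forces the use of a memory injection in the correctness proof: one \textsf{Clight} block per variable must be mapped into a region of the single \textsf{Cminor} frame.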
In the remainder of this section, we briefly discuss our implementation of memory injections, and then the simulation proof.

\paragraph{Memory injections.}

Our memory injections are modelled after the work of Blazy \& Leroy. However, the corresponding paper is based on the first version of the CompCert memory model~\cite{2008-Leroy-Blazy-memory-model}, whereas we use a much more concrete model, allowing byte-level manipulations (as in the later version of CompCert's memory model). We proved roughly 80\% of the required lemmas. Notably, some of the difficulties encountered were due to overly relaxed conditions on pointer validity (fixed during development). Some more side conditions had to be added to take care of possible overflows when converting from \textbf{Z} block bounds to $16$-bit pointer offsets (in practice, such overflows only occur in edge cases that are easily ruled out -- but this fact is not visible in memory injections). Concretely, some of the lemmas on the preservation of simulation of loads after writes were axiomatised, due to a lack of time.

\paragraph{Simulation proof.}

We proved the simulation result for expressions and a representative selection of statements.  In particular we tackled \lstinline[language=C]'while' statements to ensure that we correctly translate loops, because our approach differs from CompCert by converting directly to \textsf{Cminor} \lstinline[language=C]'goto's rather than maintaining a notion of loop in \textsf{Cminor}.  We also have a partial proof for function entry, covering the setup of the memory injection, but not function exit.  Exits, and the remaining statements, have been axiomatised.

Careful management of the proof state was required because proof terms are embedded in \textsf{Cminor} code to show that invariants are respected.  These proof terms appear in the proof state when inverting the translation functions, and they can be large and awkward.
While generalising them away is usually sufficient, it can be difficult when they appear under a binder.

%The correctness proof for this transformation was not completed. We proved the
%simulation result for expressions and for some subset of the critical statement cases.
%Notably lacking are the function entry and exit, where the memory injection is
%properly set up. As would be expected, a significant amount of work has to be performed
%to show the conservation of all invariants at each simulation step.

%\todo{list cases, explain while loop, explain labeling problem}

\subsubsection{Cminor to RTLabs}

The translation from \textsf{Cminor} to \textsf{RTLabs} is a fairly routine control flow graph (CFG) construction.  As such, we chose to axiomatise the associated extensional simulation results.  However, we did prove several properties of the generated programs:
\begin{itemize}
\item All statements are type correct with respect to the declared pseudo-register type environment.
\item The CFG is closed, and has a distinguished entry node and a unique exit node.
\end{itemize}

These properties rely on similar properties about type safety and the presence of \lstinline[language=C]'goto'-labels for \textsf{Cminor} programs which are checked at the preceding stage.  As a result, this transformation is total and any compilation failures must occur when the corresponding \textsf{Clight} source is available and a better error message can be generated.

The proof obligations for these properties include many instances of graph inclusion.  We automated these proofs using a small amount of reflection, making the obligations much easier to handle.  One drawback to enforcing invariants throughout is that temporarily breaking them can be awkward.  For example, \lstinline'return' statements were originally used as placeholders for \lstinline[language=C]'goto' destinations that had not yet been translated.
However, this made establishing the single exit node property rather difficult, and a different placeholder was chosen instead.  In other circumstances it is possible to prove a more complex invariant, then simplify it at the end of the transformation.

\section{Checking cost labelling properties}
\label{sec:costchecks}

Ideally, we would provide proofs that the cost labelling pass always produces programs that are soundly and precisely labelled and that each subsequent pass preserves these properties.  This would match our use of dependent types to eliminate impossible sources of errors during compilation, in particular retaining intermediate language type information.

However, given the limited amount of time available we realised that implementing a compile-time check for a sound and precise labelling of the \textsf{RTLabs} intermediate code would reduce the proof burden considerably.  This is similar in spirit to the use of translation validation in certified compilation, which makes an analogous trade-off between the potential for compile-time failure and the volume of proof required.

The check cannot be pushed into a later stage of the compiler because much of the information is embedded into the structured traces. However, if an alternative method were used to show that function returns in the compiled code are sufficiently well-behaved, then we could consider pushing the cost property checks into the timing analysis itself.  We leave this as a possible area for future work.

\subsection{Implementation and correctness}
\label{sec:costchecksimpl}

For a cost labelling to be sound and precise we need a cost label at the start of each function, after each branch and at least one in every loop.  The first two parts are trivial to check by examining the code.
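Over a toy CFG, those first two checks amount to a simple scan; the following \ocaml{} sketch uses hypothetical node constructors and is not the formalised checker.

```ocaml
(* Toy CFG: each node is a statement carrying the indices of its
   successors; a function is an entry index plus an array of nodes. *)
type node =
  | Cost of string * int            (* cost label, one successor *)
  | Instr of int                    (* plain instruction, one successor *)
  | Branch of int * int             (* conditional, two successors *)
  | Return

let is_cost g i =
  match g.(i) with Cost _ -> true | _ -> false

(* Check that the entry node is a cost label and that both successors
   of every branch are cost labels; the loop condition needs the
   separate bounded-successor search. *)
let check_start_and_branches (entry : int) (g : node array) : bool =
  is_cost g entry
  && Array.for_all
       (function
         | Branch (t, f) -> is_cost g t && is_cost g f
         | _ -> true)
       g
```

Note how the branch condition is what later makes the loop check cheaper: once every branch successor is known to be labelled, the search for unlabelled cycles never needs to fork.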
In \textsf{RTLabs} the last part is specified by saying that there is a bound on the number of successive instruction nodes in the CFG that you can follow before you encounter a cost label, and checking this is more difficult.

The implementation progresses through the set of nodes in the graph, following successors until a cost label is found or a label-free cycle is discovered (in which case the property does not hold and we return an error).  This is made easier by the prior knowledge that every successor of a branch instruction is a cost label, so we do not need to search each branch.  When a label is found, we remove the chain of program counters from the set and continue from another node in the set until it is empty, at which point we know that there is a bound for every node in the graph.

Directly reasoning about the function that implements this procedure would be rather awkward, so an inductive specification of a single step of its behaviour was written and proved to match the implementation.  This was then used to prove the implementation sound and complete.

While we have not attempted to prove that the cost labelling properties are established and preserved earlier in the compiler, we expect that the effort for the \textsf{Cminor} to \textsf{RTLabs} stage alone would be similar to the work outlined above, because it involves the change from requiring a cost label at particular positions to requiring cost labels to break loops in the CFG.  As there are another three passes to consider (including the labelling itself), we believe that using the check above is much simpler overall.

% TODO? Found some Clight to Cminor bugs quite quickly

\section{Existence of a structured trace}
\label{sec:structuredtrace}

The \emph{structured trace} idea introduced in Section~\ref{sec:fegoals} enriches the execution trace of a program by nesting function calls in a mixed-step style and embedding the cost labelling properties of the program.
See Figure~\ref{fig:strtrace} on page~\pageref{fig:strtrace} for an illustration of a structured trace. It was originally designed to support the proof of correctness for the timing analysis of the object code in the back-end, then generalised to provide a common structure to use from the end of the front-end to the object code.

To make the definition generic we abstract over the semantics of the language,
\begin{lstlisting}[language=Grafite]
record abstract_status : Type[1] :=
  { as_status :> Type[0]
  ; as_execute : relation as_status
  ; as_pc : DeqSet
  ; as_pc_of : as_status $\rightarrow$ as_pc
  ; as_classify : as_status $\rightarrow$ status_class
  ; as_label_of_pc : as_pc $\rightarrow$ option costlabel
  ; as_after_return : ($\Sigma$s:as_status. as_classify s = cl_call) $\rightarrow$ as_status $\rightarrow$ Prop
  ; as_result : as_status $\rightarrow$ option int
  ; as_call_ident : ($\Sigma$s:as_status. as_classify s = cl_call) $\rightarrow$ ident
  ; as_tailcall_ident : ($\Sigma$s:as_status. as_classify s = cl_tailcall) $\rightarrow$ ident
  }.
\end{lstlisting}
which requires a type of states, an execution relation\footnote{All of our semantics are executable, but using a relation was simpler in the abstraction.}, some notion of abstract program counter with decidable equality, the classification of states, and functions to extract the observable intensional information (cost labels and the identity of functions that are called).  The \lstinline'as_after_return' property links the state before a function call with the state after return, providing the evidence that execution returns to the correct place.  The precise form varies between stages; in \textsf{RTLabs} it insists that the CFG, the pointer to the CFG node to execute next, and some call stack information are preserved.

The structured traces are defined using three mutually inductive types.
The core data structure is \lstinline'trace_any_label', which captures some straight-line execution until the next cost label or the return from the enclosing function.  Any function calls are embedded as a single step, with its own trace nested inside and the before and after states linked by \lstinline'as_after_return'; and states classified as a `jump' (in particular branches) must be followed by a cost label.

The second type, \lstinline'trace_label_label', is a \lstinline'trace_any_label' where the initial state is cost labelled. Thus a trace in this type identifies a series of steps whose cost is entirely accounted for by the label at the start.

Finally, \lstinline'trace_label_return' is a sequence of \lstinline'trace_label_label' values which end in the return from the function.  These correspond to a measurable subtrace, and in particular include executions of an entire function call (and so are used for the nested calls in \lstinline'trace_any_label').

\subsection{Construction}

The construction of the structured trace replaces syntactic cost labelling properties, which place requirements on where labels appear in the program, with semantic properties that constrain the execution traces of the program.  The construction begins by defining versions of the sound and precise labelling properties on states and global environments (for the code that appears in each of them) rather than whole programs, and showing that these are preserved by steps of the \textsf{RTLabs} semantics.

Then we show that each cost labelling property required by the definition of structured traces is locally satisfied.  These proofs are broken up by the classification of states.  Similarly, we prove a step-by-step stack preservation result, which states that the \textsf{RTLabs} semantics never changes the lower parts of the stack.
The core part of the construction of a structured trace is to use the proof of termination from the measurable trace (defined on page~\pageref{prog:terminationproof}) to `fold up' the execution into the nested form.  The results outlined above fill in the proof obligations for the cost labelling properties and the stack preservation result shows that calls return to the correct location.

The structured trace alone is not sufficient to capture the property that the program is soundly labelled.  While the structured trace guarantees termination, it still permits a loop to be executed a finite number of times without encountering a cost label.  We eliminate this by proving that no `program counter' repeats within any \lstinline'trace_any_label' section by showing that it is incompatible with the property that there is a bound on the number of successor instructions you can follow in the CFG before you encounter a cost label (from Section~\ref{sec:costchecksimpl}).

\subsubsection{Complete execution structured traces}

The development of the construction above started relatively early, before the measurable subtrace preservation proofs.  To be confident that the traces were well-formed at that time, we also developed a complete execution form that embeds the traces above.  This includes non-terminating program executions, where an infinite number of the terminating structured traces are embedded.  This construction confirmed that our definition of structured traces was consistent, although we later found that we did not require the whole execution version for the compiler correctness results.

To construct these we need to know whether each function call will eventually terminate, requiring the use of the excluded middle.  This classical reasoning is local to the construction of whole program traces and is not necessary for our main results.
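Stepping back, the nesting structure of the three mutually inductive types described at the start of this section can be pictured with the following \ocaml{} sketch, which keeps only the shape of the \matita{} definitions: the states, classifications and dependent well-formedness evidence are all omitted, and the constructor names are illustrative.

```ocaml
(* A step is either an ordinary instruction or a call, which carries
   the whole-call trace of the callee nested inside it. *)
type step =
  | Step
  | Call of trace_label_return

(* trace_any_label: straight-line execution that stops just before
   the next cost label, or at the return from the enclosing function. *)
and trace_any_label =
  | Ends_at_label of step list
  | Ends_at_return of step list

(* trace_label_label: a trace_any_label whose first state carries a
   cost label, so that label accounts for the cost of every step. *)
and trace_label_label = {
  label : string;
  body : trace_any_label;
}

(* trace_label_return: labelled sections in sequence, the last of
   which ends in the function's return; this is the shape that
   corresponds to a measurable subtrace. *)
and trace_label_return = trace_label_label list
```

The mutual recursion between \texttt{step} and \texttt{trace\_label\_return} is what gives the mixed-step flavour: a call is one step from the caller's point of view, yet contains the callee's whole structured trace.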
\section{Conclusion}

In combination with the work on the \cerco{} back-end and by concentrating on the novel intensional parts of the proof, we have shown that it is possible to construct certifying compilers that correctly report execution time and stack space costs.  The layering of intensional correctness proofs on top of normal simulation results provides a useful separation of concerns, and could permit the reuse of existing results.

\section{Files}

The following table gives a high-level overview of the \matita{} source files in Deliverable 3.4:

\bigskip

\begin{tabular}{rp{.7\linewidth}}
\lstinline'compiler.ma' & Top-level compiler definitions, in particular \lstinline'front_end', and the whole compiler definition \lstinline'compile'. \\
\lstinline'correctness.ma' & Correctness results: \lstinline'front_end_correct' and \lstinline'correct', respectively. \\
\lstinline'Clight/*' & \textsf{Clight}: proofs for switch removal, cost labelling, cast simplification and conversion to \textsf{Cminor}. \\
\lstinline'Cminor/*' & \textsf{Cminor}: axioms of conversion to \textsf{RTLabs}. \\
\lstinline'RTLabs/*' & \textsf{RTLabs}: definitions and proofs for compile-time cost labelling checks, construction of structured traces. \\
\lstinline'common/Measurable.ma' & Definitions for measurable subtraces. \\
\lstinline'common/FEMeasurable.ma' & Generic measurable subtrace lifting proof. \\
\lstinline'common/*' & Other common definitions relevant to many parts of the compiler and proof. \\
\lstinline'utilities/*' & General purpose definitions used throughout, including extensions to the standard \matita{} library.
\end{tabular}

\subsection{A brief overview of the backend compilation chain}

Our abstraction takes the following form:
\begin{lstlisting}[language=Grafite]
inductive joint_instruction (p: params__) (globals: list ident): Type[0] :=
| COMMENT: String $\rightarrow$ joint_instruction p globals
| INT: generic_reg p $\rightarrow$ Byte $\rightarrow$ joint_instruction p globals
...
| OP1: Op1 $\rightarrow$ acc_a_reg p $\rightarrow$ acc_a_reg p $\rightarrow$ joint_instruction p globals
...
| extension: extend_statements p $\rightarrow$ joint_instruction p globals.

As \texttt{p} varies between intermediate languages, we can provide language specific extensions to the syntax of the joint language. For example, ERTL's extended syntax consists of the following extra statements:
\begin{lstlisting}[language=Grafite]
inductive ertl_statement_extension: Type[0] :=
| ertl_st_ext_new_frame: ertl_statement_extension
\end{lstlisting}
These are further packaged into an ERTL specific instance of \texttt{params\_\_} as follows:
\begin{lstlisting}[language=Grafite]
definition ertl_params__: params__ :=
  mk_params__ register register ... ertl_statement_extension.

Our joint internal function record looks like so:
\begin{lstlisting}[language=Grafite]
record joint_internal_function (globals: list ident) (p:params globals) : Type[0] := {

The `joint' syntax makes this especially clear. For instance, in the definition:
\begin{lstlisting}[language=Grafite]
inductive joint_instruction (p:params__) (globals: list ident): Type[0] :=
...

For example, in the \texttt{RTLabs} to \texttt{RTL} transformation pass, many functions only `make sense' when lists of registers passed to them as arguments conform to some specific length.
For instance, the \texttt{translate\_negint} function, which translates a negative integer constant:
\begin{lstlisting}[language=Grafite]
definition translate_negint :=
  $\lambda$globals: list ident.

Practically, we would always like to ensure that the entry and exit labels are present in the statement graph. We ensure that this is so with a dependent sum type in the \texttt{joint\_internal\_function} record, which all graph based languages specialise to obtain their own internal function representation:
\begin{lstlisting}[language=Grafite]
record joint_internal_function (globals: list ident) (p: params globals): Type[0] := {

We begin the abstraction process with the \texttt{params\_\_} record. This holds the types of the representations of the different register varieties in the intermediate languages:
\begin{lstlisting}[language=Grafite]
record params__: Type[1] := {

As mentioned in the report for Deliverable D4.2, the record \texttt{params\_\_} is enough to be able to specify the instructions of the joint languages:
\begin{lstlisting}[language=Grafite]
inductive joint_instruction (p: params__) (globals: list ident): Type[0] :=
| COMMENT: String $\rightarrow$ joint_instruction p globals

Naturally, as some intermediate languages are graph based, and others linearised, the act of passing control on to the `successor' instruction can either be the act of following a graph edge in a control flow graph, or incrementing an index into a list. We make a distinction between instructions that pass control onto their immediate successors, and those that jump elsewhere in the program, through the use of \texttt{succ}, denoting the immediate successor of the current instruction, in the \texttt{params\_} record described below.
\begin{lstlisting}[language=Grafite]
record params_: Type[1] := {

The type \texttt{succ} corresponds to labels, in the case of control flow graph based languages, or is instantiated to the unit type for the linearised language, LIN. Using \texttt{params\_} we can define statements of the joint language:
\begin{lstlisting}[language=Grafite]
inductive joint_statement (p:params_) (globals: list ident): Type[0] :=
| sequential: joint_instruction p globals $\rightarrow$ succ p $\rightarrow$ joint_statement p globals

For the semantics, we need further parameterised types. In particular, we parameterise the result and parameter type of an internal function call in \texttt{params0}:
\begin{lstlisting}[language=Grafite]
record params0: Type[1] := {

We further extend \texttt{params0} with a type for local variables in internal function calls:
\begin{lstlisting}[language=Grafite]
record params1 : Type[1] := {

Further, we hypothesise a generic method for looking up the next instruction in the graph, called \texttt{lookup}. Note that \texttt{lookup} may fail, so it returns an \texttt{option} type:
\begin{lstlisting}[language=Grafite]
record params (globals: list ident): Type[1] := {

In particular, we have a description of the result, parameters and the local variables of a function. Note also that we have lifted the hypothesised \texttt{lookup} function from \texttt{params} into a dependent sigma type, which pairs a label (the entry and exit points of the control flow graph or list) with a proof that the label is in the graph structure:
\begin{lstlisting}[language=Grafite]
record joint_internal_function (globals: list ident) (p:params globals) : Type[0] := {

The reason is that some intermediate languages share a host of parameters, and only differ on some others.
For instance, in instantiating the ERTL language, certain parameters are shared with RTL, whilst others are ERTL specific:
\begin{lstlisting}[language=Grafite]
...
definition ertl_params__: params__ := ...
definition ertl_params1: params1 := rtl_ertl_params1 ertl_params0.
definition ertl_params: $\forall$globals. params globals := rtl_ertl_params ertl_params0.
...
definition ertl_statement := joint_statement ertl_params_.
\end{lstlisting}
Here, \texttt{rtl\_ertl\_params1} holds the parameters common to the ERTL and RTL languages:
\begin{lstlisting}[language=Grafite]
definition rtl_ertl_params1 := $\lambda$pars0. mk_params1 pars0 (list register).
\end{lstlisting}
The record \texttt{more\_sem\_params} bundles together functions that store and retrieve values in various forms of register:
\begin{lstlisting}[language=Grafite]
record more_sem_params (p:params_): Type[1] := {

We extend \texttt{more\_sem\_params} with yet more parameters via \texttt{more\_sem\_params2}:
\begin{lstlisting}[language=Grafite]
record more_sem_params1 (globals: list ident) (p: params globals) : Type[1] := {

We bundle \texttt{params} and \texttt{sem\_params} together into a single record. This will be used in the function \texttt{eval\_statement} which executes a single statement of the joint language:
\begin{lstlisting}[language=Grafite]
record sem_params2 (globals: list ident): Type[1] := {

\noindent The \texttt{state} record holds the current state of the interpreter:
\begin{lstlisting}[language=Grafite]
record state (p: sem_params): Type[0] := {

We use the function \texttt{eval\_statement} to evaluate a single joint statement:
\begin{lstlisting}[language=Grafite]
definition eval_statement: $\forall$globals: list ident.$\forall$p:sem_params2 globals.
We call this function \texttt{return} and say that it must have type $\alpha \rightarrow M \alpha$, where $M$ is the name of the monad. In our example, the `lifting' function for the \texttt{option} monad can be implemented as:
\begin{lstlisting}[language=Grafite]
let return x = Some x
\end{lstlisting}
We can see that \texttt{bind} `unpacks' a monadic value, applies a function after unpacking, and `repacks' the new value in the monad. In our example, the sequencing function for the \texttt{option} monad can be implemented as:
\begin{lstlisting}[language=Grafite]
let bind o f = match o with

Here, the monad's sequencing operation ensures that cost label emissions and function calls are maintained in the correct order. We have already seen how the \texttt{eval\_statement} function of the joint language is monadic, with type:
\begin{lstlisting}[language=Grafite]
definition eval_statement: $\forall$globals: list ident.$\forall$p:sem_params2 globals.

If we examine the body of \texttt{eval\_statement}, we may also see how the monad sequences effects. For instance, in the case for the \texttt{LOAD} statement, we have the following:
\begin{lstlisting}[language=Grafite]
definition eval_statement: $\forall$globals: list ident. $\forall$p:sem_params2 globals.
...
match s with
| LOAD dst addrl addrh $\Rightarrow$
  ! vaddrh $\leftarrow$ dph_retrieve ... st addrh;
  ! vaddrl $\leftarrow$ dpl_retrieve ... st addrl;

Here, we employ a certain degree of syntactic sugaring. The syntax
\begin{lstlisting}[language=Grafite]
...
! vaddrh $\leftarrow$ dph_retrieve ... st addrh;

is sugaring for the \texttt{IO} monad's binding operation. We can expand this sugaring to the following much more verbose code:
\begin{lstlisting}[language=Grafite]
...
bind (dph_retrieve ... st addrh) ($\lambda$vaddrh.
  bind (dpl_retrieve ... st addrl)
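Putting these pieces together in plain \ocaml{}, the desugared sequencing pattern runs as follows; here a hypothetical \texttt{retrieve} over an association list stands in for the \texttt{dpl}/\texttt{dph} retrieval primitives, and the \texttt{option} monad replaces the full \texttt{IO} monad of the development.

```ocaml
(* The option monad: return injects a value, bind sequences a
   possibly-failing computation with its continuation. *)
let return x = Some x

let bind o f =
  match o with
  | None -> None
  | Some x -> f x

(* Stand-in for the retrieval primitives: look a register up in an
   association-list "state", failing with None when it is absent. *)
let retrieve st r = List.assoc_opt r st

(* The desugared form of the LOAD case's two bindings: retrieve the
   high and low address parts in order, stopping at the first failure. *)
let load_addr st addrl addrh =
  bind (retrieve st addrh) (fun vaddrh ->
    bind (retrieve st addrl) (fun vaddrl ->
      return (vaddrh, vaddrl)))
```

The nesting makes the ordering explicit: the second \texttt{retrieve} only runs if the first succeeds, which is exactly the property the text relies on to keep cost label emissions and function calls in the correct order.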