# Changeset 3173

Timestamp: Apr 23, 2013, 6:30:54 PM
Message: A little reworking of 3.4.

\vspace*{7cm}

\paragraph{Abstract}
We report on the correctness proofs for the front-end of the \cerco{} cost lifting compiler.  First, we identify the core result we wish to prove, which says that we correctly predict the precise execution time for particular parts of the execution called \emph{measurable} subtraces.  Then we consider the three distinct parts of the task: showing that the \emph{annotated source code} output by the compiler has equivalent behaviour to the original input (up to the annotations); showing that a measurable subtrace of the annotated source code corresponds to an equivalent measurable subtrace in the code produced by the front-end, including costs; and finally showing that the enriched \emph{structured} execution traces required for cost correctness in the back-end can be constructed from the properties of the code produced by the front-end.  We also show that the proofs that we get consistent cost measurements throughout the intermediate languages of the compiler can be layered on top of normal forward simulation results, if we split them into local call-structure preserving results.  This deliverable shows correctness results about the formalised compiler described in D3.2, using the source language semantics from D3.1 and intermediate language semantics from D3.3.
It builds on earlier work on a toy compiler built to test the labelling approach in D2.1.  Together with the companion deliverable about the correctness of the back-end, D4.4, we obtain results about the whole formalised compiler.
% TODO: mention the deliverable about the extracted compiler et al?
translation validation on sound, precise labelling properties.}

The \cerco{} compiler compiles C code, targeting microcontrollers implementing the Intel 8051 architecture.  It produces both the object code and source code containing annotations describing the timing behaviour of the object code.  There are two versions: first, an initial prototype implemented in \ocaml{}~\cite{d2.2}, and a version formalised in the \matita{} proof assistant~\cite{d3.2,d4.2} and then extracted to \ocaml{} code to produce an executable compiler.
In this document we present results formalised in \matita{} about the front-end of the latter version of the compiler, and how that fits into the verification of the whole compiler.
\todo{maybe mention stack space here?  other additions?  refer to ``layering''?}

A key part of this work was to separate the proofs about the compiled code's extensional behaviour (that is, the functional correctness of the compiler) from the intensional correctness that the costs given are correct.  Unfortunately, the ambitious goal of completely verifying the entire compiler was not feasible within the time available, but thanks to this separation of extensional and intensional proofs we are able to axiomatize simulation results similar to those in other compiler verification projects and concentrate on the novel intensional proofs.  The proofs were also made more tractable by introducing compile-time checks for the `sound and precise' cost labelling properties rather than proving that they are preserved throughout.
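To give an intuition for these compile-time checks, the following Python sketch is a purely illustrative model (the actual check is formalised in \matita{}; the CFG encoding and all names here are invented).  It tests the informal conditions on a cost labelling: a cost label at the start of the function, cost labels after each branch, and no label-free cycle, so that every loop contains at least one label.

```python
# Purely illustrative model of the compile-time labelling check; the
# CFG encoding and names are invented, not from the CerCo sources.
# A node maps to (is_cost_label, list_of_successors).

def labelling_is_sound(cfg, entry):
    """Check the 'sound' labelling conditions on one function's CFG:
    the entry node is a cost label, every branch target is a cost
    label, and there is no label-free cycle (so each loop contains
    at least one cost label)."""
    if not cfg[entry][0]:
        return False  # function must start with a cost label
    for _, (_, succs) in cfg.items():
        if len(succs) > 1 and not all(cfg[s][0] for s in succs):
            return False  # each branch must be followed by cost labels

    def bounded(node, seen):
        # Paths from `node` must reach a cost label (or a node with no
        # successors, i.e. a return) without repeating a node.
        if cfg[node][0]:
            return True   # a cost label cuts the path here
        if node in seen:
            return False  # label-free cycle: some loop has no label
        return all(bounded(s, seen | {node}) for s in cfg[node][1])

    return all(bounded(n, set()) for n in cfg)

# A well-labelled function: labelled entry, labelled branch targets,
# and a loop that always passes through the label "loop_lbl".
good = {
    "entry":    (True,  ["branch"]),
    "branch":   (False, ["then_lbl", "else_lbl"]),
    "then_lbl": (True,  ["loop_lbl"]),
    "else_lbl": (True,  ["loop_lbl"]),
    "loop_lbl": (True,  ["body"]),
    "body":     (False, ["loop_lbl", "ret"]),
    "ret":      (True,  []),
}
# A label-free loop between "a" and "b": the check must reject it.
bad = {"entry": (True, ["a"]), "a": (False, ["b"]), "b": (False, ["a"])}
```

This only mirrors the informal conditions; the formalised check must additionally produce evidence usable by the later correctness proofs.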
The overall statement of correctness says that the annotated program has the same behaviour as the original input.  The compiler also produces a mapping from function identifiers to the function's stack space usage\footnote{The compiled code's only stack usage is to allocate a fixed-size frame on each function entry and discard it on exit.  No other dynamic allocation is provided by the compiler.}, the space available for the stack after global variable allocation, a cost label covering the execution time for the initialisation of global variables and the call to the \lstinline[language=C]'main' function, the annotated source code, and finally a mapping from cost labels to actual execution time costs.  An \ocaml{} pretty printer is used to provide a concrete version of the output.

The availability of precise timing information for 8051 implementations and the design of the compiler allow it to give exact time costs in terms of processor cycles, not just upper bounds.  However, these exact results are only available if the subtrace we measure starts and ends at suitable points.  In particular, pure computation with no observable effects may be reordered and moved past cost labels, so we cannot measure time between arbitrary statements in the program.  There is also a constraint on the subtraces that we measure which depends on much earlier stages in the compiler.
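To make the list of outputs concrete, here is a hypothetical Python rendering of them as a record, together with the way the cost mapping yields an exact cycle count by summing the costs of the labels a subtrace crosses.  All field names and numbers are invented for illustration; the real structures are \matita{} definitions.

```python
# Hypothetical rendering of the compiler's outputs as a record; field
# names and the toy numbers below are invented for illustration only.
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class CompilerOutput:
    object_code: bytes
    stack_usage: Dict[str, int]   # function identifier -> fixed frame size
    stack_available: int          # stack left after global variable allocation
    init_cost_label: str          # covers initialisation and the call to main
    annotated_source: str
    cycle_costs: Dict[str, int]   # cost label -> processor cycles

def predicted_cycles(out: CompilerOutput, labels_crossed: List[str]) -> int:
    """Exact time of a measurable subtrace: the sum of the costs of the
    cost labels it crosses, in processor cycles (not an upper bound)."""
    return sum(out.cycle_costs[l] for l in labels_crossed)

out = CompilerOutput(
    object_code=b"",
    stack_usage={"main": 4, "f": 6},
    stack_available=100,
    init_cost_label="_cost0",
    annotated_source="/* annotated C source */",
    cycle_costs={"_cost0": 12, "_cost1": 33, "_cost2": 5},
)
total = predicted_cycles(out, ["_cost1", "_cost2", "_cost1"])  # 33 + 5 + 33
```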
To convey this information to the timing analysis extra structure is imposed on the subtraces, which is described in Section~\ref{sec:fegoals}.
% Regarding the footnote, would there even be much point?
% TODO: this might be quite easy to add ('just' subtract the
% measurable subtrace from the second label to the end).  Could also
% measure other traces in this manner.

These restrictions are reflected in the subtraces that we give timing guarantees on; they must start at a cost label and end at the return of the function in which that cost label appears.  A typical example is the execution of an entire function from the cost label at the start of the function until it returns.  We call any such subtrace \emph{measurable} if it (and the prefix of the trace from the start to the subtrace) can also be executed within the available stack space.

Now we can give the main intensional statement for the compiler.
\begin{itemize}
\item This is the first language where every operation has its own unique, easily addressable, statement.
\item Function calls and returns are still handled implicitly in the language and so the structural properties are ensured by the semantics.
\item Many of the back-end languages from \textsf{RTL} share a common core set of definitions, and using structured traces throughout
\end{itemize}

\begin{figure}
% (figure content not present in this excerpt)
\label{fig:strtrace}
\end{figure}

A structured trace is a mutually inductive data type which contains the steps from a normal program trace, but arranged into a nested structure which groups entire function calls together and aggregates individual steps between cost labels (or between the final cost label and the return from the function), see Figure~\ref{fig:strtrace}.
This captures the idea that the cost labels only represent costs \emph{within} a function --- calls to other functions are accounted for in the nested trace for their execution, and we can locally regard function calls as a single step.
\begin{itemize}
\item A normal trace of the \textbf{prefix} of the program's execution before reaching the measurable subtrace.  (This needs to be preserved so that we know that the stack space consumed is correct, and to set up the simulation results.)
\item The \textbf{structured trace} corresponding to the measurable subtrace.
\end{itemize}

Then we instantiate a general result which shows that we can find a \emph{measurable} subtrace in the target of the pass that corresponds to the measurable subtrace in the source.  By repeated application of this result we can find a measurable subtrace of the execution of the \textsf{RTLabs} code, suitable for the construction of a structured trace.  The proofs for each subtrace are similar, following the pattern above.  However, the measurable subtrace also requires us to rebuild the termination proof.  This is defined recursively:
\label{prog:terminationproof}
\begin{lstlisting}[language=matita]
let rec will_return_aux C (depth:nat)
\end{lstlisting}

For a cost labelling to be sound and precise we need a cost label at the start of each function, after each branch and at least one in every loop.  The first two parts are trivial to check by examining the code.  In \textsf{RTLabs} the last part is specified by saying that, for every node in the function's graph, there is a bound on the length of execution paths from it that do not pass a cost label.  This is made easier by the prior knowledge that any branch is followed by cost labels, so we do not need to search each branch.
When a label is found, we remove the chain from the set and continue from another node in the set until it is empty, at which point we know that there is a bound for every node in the graph.  Directly reasoning about the function that implements this would be difficult.

The abstract definitions of state include program counters with decidable equality, the classification of states and functions to extract the observable intensional information (cost labels and the identity of functions that are called).  The \lstinline'as_after_return' property links the state before a function call with the state after return, providing the evidence that execution returns to the correct place.  The precise form varies between stages; in \textsf{RTLabs} it insists that the CFG, the pointer to the CFG node to execute next, and some call stack information are preserved.

The structured traces are defined using three mutually inductive types.  The first, \lstinline'trace_any_label', captures some straight-line execution until the next cost label or function return.  Each function call is embedded as a single step, with its own trace nested inside and the before and after states linked by \lstinline'as_after_return', and states classified as a `jump' (in particular branches) must be followed by a cost label.  The second type, \lstinline'trace_label_label', is a \lstinline'trace_any_label' which starts at a cost label.  Finally, \lstinline'trace_label_return' is a sequence of \lstinline'trace_label_label' values which end in the return from the function.  These correspond to a measurable subtrace, and in particular include executions of an entire function call (and so are used for the nested calls in \lstinline'trace_any_label').
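As a rough illustration of how the three types nest, here is a shape-only Python model.  The real definitions are mutually inductive \matita{} types carrying proof obligations; this sketch keeps only the nesting structure, and the class and function names are invented.

```python
# Shape-only model of the three structured-trace types; names are
# invented and all proof content of the real definitions is omitted.
from dataclasses import dataclass
from typing import List, Union

@dataclass
class Step:
    pc: str                         # one ordinary execution step

@dataclass
class Call:
    body: "TraceLabelReturn"        # a call embedded as a single step,
                                    # the callee's whole trace nested inside

@dataclass
class TraceAnyLabel:                # straight-line execution until the next
    steps: List[Union[Step, Call]]  # cost label or function return

@dataclass
class TraceLabelLabel:              # a TraceAnyLabel starting at a cost label
    label: str
    tail: TraceAnyLabel

@dataclass
class TraceLabelReturn:             # TraceLabelLabels ending in the return
    parts: List[TraceLabelLabel]

def trace_cost(tlr: TraceLabelReturn, cycle_costs) -> int:
    """Cost of a structured trace: sum the labels it passes through,
    including those inside nested function calls."""
    total = 0
    for tll in tlr.parts:
        total += cycle_costs[tll.label]
        for s in tll.tail.steps:
            if isinstance(s, Call):
                total += trace_cost(s.body, cycle_costs)
    return total

# A call to f, appearing as a single step inside the trace for main.
callee = TraceLabelReturn([TraceLabelLabel("_f", TraceAnyLabel([Step("i1")]))])
main_tr = TraceLabelReturn([TraceLabelLabel(
    "_m", TraceAnyLabel([Step("a"), Call(callee), Step("b")]))])
```

The recursion in `trace_cost` mirrors the idea from the text that costs are accounted \emph{within} a function, with calls handled by the nested trace.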
\subsection{Construction}

The core part of the construction of a structured trace is to use the proof of termination from the measurable trace (defined on page~\pageref{prog:terminationproof}) to `fold up' the execution into the nested form.

% TODO: design, basic structure from termination proof, how cost
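The `fold up' step can be pictured on a flat list of events.  Like \lstinline'will_return_aux' in the termination proof, the sketch below tracks the call structure while scanning, descending one level at each call and returning at the matching return; the event encoding and function name are invented for illustration.

```python
# Illustrative sketch of `folding up' a flat trace into nested calls;
# the event encoding is invented, not from the CerCo sources.

def fold_calls(events, i=0):
    """Group a flat list of ('call',), ('ret',) and ('step', x) events
    into a nested list, consuming events until the matching return.
    Returns (nested_trace, index_of_next_unconsumed_event)."""
    out = []
    while i < len(events):
        ev = events[i]
        if ev[0] == "call":
            body, i = fold_calls(events, i + 1)  # descend one call level
            out.append(("call", body))           # the call is one step here
        elif ev[0] == "ret":
            return out, i + 1                    # matching return found
        else:
            out.append(ev)                       # ordinary step
            i += 1
    return out, i
```

In the formalisation the analogous recursion is justified by the termination proof, which guarantees that each call is eventually matched by its return within the measurable trace.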