# Changeset 3231

Timestamp:
Apr 30, 2013, 4:53:05 PM
Message:

Final revisions to 3.4.

Location:
Deliverables/D3.4/Report
Files:
2 edited

of the compiler can be layered on top of normal forward simulation results, if we split those results into local call-structure preserving simulations. This split allowed us to concentrate on the \textbf{intensional} proofs by axiomatising some of the extensional simulation results that are very similar to existing compiler correctness results, such as CompCert. This report is about the correctness results that feed into the verification of the whole compiler. A key part of this work was to layer the \emph{intensional} correctness results that show that the costs produced are correct on top of the proofs about the compiled code's \emph{extensional} behaviour (that is, the functional correctness of the compiler).  Unfortunately, the ambitious goal of completely verifying the entire compiler was not feasible within the time available, but thanks to this separation of extensional and intensional proofs we are able to axiomatise some extensional simulation results which are similar to those in other compiler verification projects and concentrate on the novel intensional proofs.  We then describe the statements we need to prove about the intermediate \textsf{RTLabs} programs for the back-end proofs. Section~\ref{sec:inputtolabelling} covers the compiler passes which produce the annotated source program and Section~\ref{sec:measurablelifting} the rest of the transformations in the front-end.
Then the compile-time checks for good cost labelling are detailed in Section~\ref{sec:costchecks} and the proofs that the structured traces required by the back-end exist are discussed in Section~\ref{sec:structuredtrace}.

\section{The compiler and its correctness statement}

The uncertified prototype \ocaml{} \cerco{} compiler was originally described in Deliverables 2.1 and 2.2.  Its design was replicated in the formal \matita{} code, which was presented in Deliverables 3.2 and 4.2 and is shown in the middle two lines of Figure~\ref{fig:compilerlangs}.  The upper line represents the front-end of the compiler, and the lower the back-end, finishing with Intel 8051 binary code.  Not all of the front-end compiler passes introduce a new language, and Figure~\ref{fig:summary} presents a list of every pass involved.

\begin{tabbing}
\quad \= $\downarrow$ \quad \= \kill
\textsf{C} (unformalised)\\
\> $\downarrow$ \> CIL parser (unformalised \ocaml)\\
\textsf{Clight}\\
%\> $\downarrow$ \> add runtime functions\\

\label{page:switchintro}
The annotated source code is produced by the cost labelling phase. Note that there is a pass to replace C \lstinline[language=C]'switch' statements before labelling --- we need to remove them because

the annotated source code, it also inserts the actual costs alongside the cost labels, and optionally adds a global cost variable and instrumentation to support further reasoning in external tools such as Frama-C.
\subsection{Revisions to the prototype compiler}

The use of dependent types to capture simple intermediate language invariants makes every front-end pass a total function, except \textsf{Clight} to \textsf{Cminor} and the cost checks.  Hence various well-formedness and type safety checks are performed only once between \textsf{Clight} and \textsf{Cminor}, and the invariants rule out any difficulties in the later stages.  With the benefit of hindsight we would have included an initial checking phase to produce a `well-formed' variant of \textsf{Clight}, conjecturing that this would simplify various parts of the proofs for the \textsf{Clight} stages which deal with potentially ill-formed code.

Following D2.2, we previously generated code for global variable initialisation in \textsf{Cminor}, for which we reserved a cost label to represent the execution time for initialisation.  However, we have since moved the generation of the initialisation code to the back-end and merge the costs.

\subsection{Main correctness statement}

Informally, our main intensional result links the time difference in a source

theorem in the file \lstinline'correctness.ma'.

\section{Correctness statement for the front-end}
\label{sec:fegoals}

\end{itemize}
In order to tackle the first point, we implemented a version of memory extensions similar to those of CompCert.
For the simulation we decided to prove a sufficient amount to give us

The invariant we would need is the fact that a global label lookup on a freshly created goto is equivalent to a local lookup. This would in turn require the propagation of some freshness hypotheses on labels. As discussed, we decided to omit this part of the correctness proof.

during the execution of a \lstinline[language=C]'while' loop. We do not attempt to capture any cost properties of the labelling\footnote{We describe how the cost properties are established in Section~\ref{sec:costchecks}.} in the simulation proof, which allows the proof to be oblivious to the choice of cost labels.  Hence we do not have to reason about the threading of name generation through the labelling function, greatly reducing the

introduce extra steps, for example to store parameters in memory in \textsf{Cminor}.  Thus we have a version of the call simulation that deals with both the call and the cost label in one result. In addition to the subtrace we are interested in measuring, we must prove that the earlier part of the trace is also preserved in order to use the simulation from the initial state.  This proof also guarantees that we do not run out of stack space before the subtrace we are interested in.
The lemmas for this prefix and the measurable

\begin{lstlisting}[language=matita]
let rec will_return_aux C (depth:nat)
                        (trace:list (cs_state $...$ C $\times$ trace))
  on trace : bool :=
match trace with
[ nil $\Rightarrow$ false

Combining the lemmas about the prefix and the measurable subtrace requires a little care because the states joining the two might not be related in the simulation.  In particular, if the measurable subtrace starts from the cost label at the beginning of the function there may be some extra instructions in the target code to execute to complete function entry before the states are back in the relation.  Hence we carefully phrased the lemmas to allow for such extra steps.

Together, these then give us an overall result for any simulation fitting the

The \textsf{simplify\_inv} invariant encodes either the conservation of the semantics during the transformation corresponding to the failure of the coercion (\textsf{Inv\_eq} constructor), or the successful downcast of the considered expression to the target type (\textsf{Inv\_coerce\_ok}).

Our memory injections are modelled after the work of Blazy \& Leroy. However, the corresponding paper is based on the first version of the CompCert memory model~\cite{2008-Leroy-Blazy-memory-model}, whereas we use a much more concrete model, allowing byte-level manipulations (as in the later version of CompCert's memory model). We proved roughly 80 \% of the required lemmas.
Notably, some of the difficulties encountered were due to overly relaxed conditions on pointer validity (fixed during development). Some more side conditions had to be added to take care of possible overflows when converting from \textbf{Z} block bounds to $16$ bit pointer offsets (in practice, such overflows only occur in edge cases that are easily ruled out -- but this fact is not visible in memory injections).

Concretely, some of the lemmas on the preservation of simulation concern how we translate loops, because our approach differs from CompCert by converting directly to \textsf{Cminor} \lstinline[language=C]'goto's rather than maintaining a notion of loop in \textsf{Cminor}.  We also have a partial proof for function entry, covering the setup of the memory injection, but not function exit.  Exits, and the remaining statements, have been axiomatised.

Careful management of the proof state was required because proof terms are embedded in \textsf{Cminor} code to show that invariants are respected.  These proof terms appear in the proof state when inverting the translation functions, and they can be large and awkward.  While generalising them away is usually sufficient, it can be difficult when they appear under a binder.

%The correctness proof for this transformation was not completed. We proved the

The translation from \textsf{Cminor} to \textsf{RTLabs} is a fairly routine control flow graph (CFG) construction.  As such, we chose to
axiomatise the associated extensional simulation results.  However, we did prove several properties of the generated programs:
\begin{itemize}

These properties rely on similar properties about type safety and the presence of \lstinline[language=C]'goto'-labels for \textsf{Cminor} programs which are checked at the preceding stage.  As a result, this transformation is total and any compilation failures must occur when

graph inclusion.  We automated these proofs using a small amount of reflection, making the obligations much easier to handle.  One drawback to enforcing invariants throughout is that temporarily breaking them can be awkward.  For example, \lstinline'return' statements were originally used as placeholders for

\subsection{Implementation and correctness}
\label{sec:costchecksimpl}

For a cost labelling to be sound and precise we need a cost label at

The implementation progresses through the set of nodes in the graph, following successors until a cost label is found or a label-free cycle is discovered (in which case the property does not hold and we return an error).  This is made easier by the prior knowledge that every successor of a branch instruction is a cost label, so we do not need to search each branch.
When a label is found, we remove the chain of program counters from the set and continue from another node in the set until it is empty, at which point we know that there is a bound for every node in the graph. Directly reasoning about the function that implements this procedure would be rather awkward, so an inductive specification of a single step of its behaviour was written and proved to match the implementation.  This

\label{sec:structuredtrace}

The \emph{structured trace} idea introduced in Section~\ref{sec:fegoals} enriches the execution trace of a program by nesting function calls in a mixed-step style and embedding the cost labelling properties of the program.  See Figure~\ref{fig:strtrace} on page~\pageref{fig:strtrace} for an illustration of a structured trace. It was originally designed to support the proof of correctness for the timing analysis of the object code in the back-end, then generalised to provide a common structure to use from the end of the front-end to the object code.

To make the definition generic we abstract over the semantics of the

}.
\end{lstlisting}
which requires a type of states, an execution relation\footnote{All of our semantics are executable, but using a relation was simpler in the abstraction.}, some notion of abstract program counter with decidable equality, the classification of states, and functions to extract the observable intensional information (cost labels and the identity of functions that are called).

The construction of the structured trace replaces syntactic cost labelling properties, which place requirements on where labels appear in the program, with semantic properties that constrain the execution traces of the program.  The construction begins by defining versions of the sound and precise labelling properties on states and global

\textsf{RTLabs} semantics. Then we show that each cost labelling property required by the definition of structured traces is locally satisfied.  These proofs are broken up by the classification of states.  Similarly, we prove a step-by-step stack preservation result, which states that the \textsf{RTLabs} semantics never changes the lower parts of the stack.

The core part of the construction of a structured trace is to use the

with the property that there is a bound on the number of successor instructions you can follow in the CFG before you encounter a cost label (from Section~\ref{sec:costchecksimpl}).

\subsubsection{Complete execution structured traces}