% LaTeX Companion, p 74
\newcommand{\todo}[1]{\marginpar{\raggedright - #1}}
  {keywords={Definition,Lemma,Theorem,Remark,Qed,Save,Inductive,Record},
   morekeywords={[2]if,then,else},
  }
  {keywords={definition,lemma,theorem,remark,inductive,record,qed,let,rec,match,with,Type,and,on},
   morekeywords={[2]whd,normalize,elim,cases,destruct},
   mathescape=true,
   morecomment=[n]{(*}{*)},
  }
        keywordstyle=\color{red}\bfseries,
        keywordstyle=[2]\color{blue},
        commentstyle=\color{green},
        stringstyle=\color{blue},
        showspaces=false,showstringspaces=false}
\vspace*{1cm}Project FP7-ICT-2009-C-243881 \cerco{}}
\date{ }
Report n. D3.4\\
Front-end Correctness Proofs\\
Version 1.0
Brian~Campbell, Ilias~Garnier, James~McKinna, Ian~Stark
Project Acronym: \cerco{}\\
Project full title: Certified Complexity\\
Proposal/Contract no.: FP7-ICT-2009-C-243881 \cerco{}\\
\clearpage \pagestyle{myheadings} \markright{\cerco{}, FP7-ICT-2009-C-243881}
We report on the correctness proofs for the front-end of the \cerco{}
cost lifting compiler.  First, we identify the core result we wish to
prove, which says that we correctly predict the precise execution
time for particular parts of the execution called \emph{measurable}
subtraces.  Then we consider the three distinct parts of the task:
showing that the \emph{annotated source code} output by the compiler
has equivalent behaviour to the original input (up to the
annotations); showing that a measurable subtrace of the
annotated source code corresponds to an equivalent measurable subtrace
in the code produced by the front-end, including costs; and finally
showing that the enriched \emph{structured} execution traces required
for cost correctness in the back-end can be constructed from the
properties of the code produced by the front-end.
A key part of our work is that the intensional correctness results which show
that we get consistent cost measurements throughout the intermediate languages
of the compiler can be layered on top of normal forward simulation results,
if we split those results into local call-structure preserving simulations.
This deliverable shows correctness results about the formalised
compiler described in D3.2, using the source language semantics from
D3.1 and intermediate language semantics from D3.3.  It builds on
earlier work on the correctness of a toy compiler built to test the
labelling approach in D2.1.  Together with the companion deliverable
about the correctness of the back-end, D4.4, we obtain results about
the whole formalised compiler.
% TODO: mention the deliverable about the extracted compiler et al?
% CHECK: clear up any -ize vs -ise
% CHECK: clear up any "front end" vs "front-end"
% CHECK: clear up any mentions of languages that aren't textsf'd.
% CHECK: fix unicode in listings
The \cerco{} compiler produces a version of the source code containing
annotations describing the timing behaviour of the object code, as well
as the object code itself.  It compiles C code, targeting
microcontrollers implementing the Intel 8051 architecture.  There are
two versions: first, an initial prototype was implemented in
\ocaml{}~\cite{d2.2}, then a version was formalised in the \matita{} proof
assistant~\cite{d3.2,d4.2} and extracted to \ocaml{} code to
produce an executable compiler.  In this document we present results
formalised in \matita{} about the front-end of the latter version of
the compiler, and how that fits into the verification of the whole
compiler.
A key part of this work was to layer the intensional correctness
results that show that the costs produced are correct on top of the
proofs about the compiled code's extensional behaviour (that is, the
functional correctness of the compiler).  Unfortunately, the ambitious
goal of completely verifying the entire compiler was not feasible
within the time available, but thanks to this separation of
extensional and intensional proofs we are able to axiomatise some
simulation results which are similar to those in other compiler verification
projects and concentrate on the novel intensional proofs.  We were
also able to add stack space costs to obtain a stronger result.  The
proofs were made more tractable by introducing compile-time checks for
the `sound and precise' cost labelling properties rather than proving
that they are preserved throughout.
The overall statement of correctness says that the annotated program has the
same behaviour as the input, and that for any suitably well-structured part of
the execution (which we call \emph{measurable}), the object code will execute
the same behaviour taking precisely the time given by the cost annotations in
the annotated source program.
In the next section we recall the structure of the compiler and make the overall
statement more precise.  Following that, in Section~\ref{sec:fegoals} we
describe the statements we need to prove about the intermediate \textsf{RTLabs}
programs for the back-end proofs.
Section~\ref{sec:inputtolabelling} covers the passes which produce the
annotated source program and Section~\ref{sec:measurablelifting} the rest
of the transformations in the front-end.  Then the compile time checks
for good cost labelling are detailed in Section~\ref{sec:costchecks}
and the proofs that the structured traces required by the back-end
exist are discussed in Section~\ref{sec:structuredtrace}.
\section{The compiler and main goals}
The unformalised \ocaml{} \cerco{} compiler was originally described
in Deliverables 2.1 and 2.2.  Its design was replicated in the formal
\matita{} code, which was presented in Deliverables 3.2 and 4.2, for
the front-end and back-end respectively.
\caption{Languages in the \cerco{} compiler}
The compiler uses a number of intermediate languages, as outlined in the
middle two lines of Figure~\ref{fig:compilerlangs}.  The upper line
represents the front-end of the compiler, and the lower the back-end,
finishing with 8051 binary code.  Not all of the front-end passes
introduce a new language, and Figure~\ref{fig:summary} presents a
list of every pass involved.
\quad \= $\downarrow$ \quad \= \kill
\textsf{C} (unformalised)\\
\> $\downarrow$ \> CIL parser (unformalised \ocaml)\\
%\> $\downarrow$ \> add runtime functions\\
\> $\downarrow$ \> \lstinline[language=C]'switch' removal\\
\> $\downarrow$ \> labelling\\
\> $\downarrow$ \> cast removal\\
\> $\downarrow$ \> stack variable allocation and control structure
 simplification\\
%\> $\downarrow$ \> generate global variable initialization code\\
\> $\downarrow$ \> transform to RTL graph\\
\> $\downarrow$ \> check cost labelled properties of RTL graph\\
\> $\downarrow$ \> start of target specific back-end\\
\>\quad \vdots
\caption{Front-end languages and compiler passes}
The annotated source code is taken after the cost labelling phase.
Note that there is a pass to replace C \lstinline[language=C]'switch'
statements before labelling --- we need to remove them because the
simple form of labelling used in the formalised compiler is not quite
capable of capturing their execution time costs, largely due to C's
`fall-through' behaviour where execution from one branch continues in
the next unless there is an explicit \lstinline[language=C]'break'.
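To recall the behaviour concretely, here is a small hand-written C example (purely illustrative, not code from the project):

```c
#include <assert.h>

/* Purely illustrative: without 'break', execution falls through from
   one case into the body of the next. */
static int visited_cases(int e) {
    int visited = 0;
    switch (e) {
    case 1: visited |= 1;         /* no break: continues into case 2 */
    case 2: visited |= 2; break;  /* break: exits the switch */
    default: visited |= 4;
    }
    return visited;
}
```

For input \lstinline[language=C]'1' both the first and second case bodies run, which is exactly the timing subtlety the simple labelling cannot capture directly.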
The cast removal phase which follows cost labelling simplifies
expressions to prevent unnecessary arithmetic promotion, which is
specified by the C standard but costly for an 8-bit target.  The
transformation to \textsf{Cminor} and subsequently \textsf{RTLabs}
bears considerable resemblance to some passes of the CompCert
compiler~\cite{Blazy-Leroy-Clight-09,Leroy-backend}, although we use a simpler \textsf{Cminor} where
all loops use \lstinline[language=C]'goto' statements, and the
\textsf{RTLabs} language retains a target-independent flavour.  The
back-end takes \textsf{RTLabs} code as input.
The whole compilation function returns the following information on success:
  record compiler_output : Type[0] :=
   { c_labelled_object_code: labelled_object_code
   ; c_stack_cost: stack_cost_model
   ; c_max_stack: nat
   ; c_init_costlabel: costlabel
   ; c_labelled_clight: clight_program
   ; c_clight_cost_map: clight_cost_map
   }.
It consists of annotated 8051 object code, a mapping from function
identifiers to the function's stack space usage, the space available for the
stack after global variable allocation, a cost label covering the
execution time for the initialisation of global variables and the call
to the \lstinline[language=C]'main' function, the annotated source
code, and finally a mapping from cost labels to actual execution times.
An \ocaml{} pretty printer is used to provide a concrete version of
the output code and annotated source code.  In the case of the
annotated source code, it also inserts the actual costs alongside the
cost labels, and optionally adds a global cost variable and
instrumentation to support further reasoning.
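As a hedged sketch of what such instrumentation might look like (the cost variable name and the cycle counts below are illustrative assumptions, not actual compiler output or real 8051 timings):

```c
#include <assert.h>

/* Illustrative instrumentation: a global cost counter incremented at
   each cost label.  The cycle counts are invented for illustration. */
static unsigned long __cost = 0;

static int max3(int a, int b, int c) {
    __cost += 23;                        /* cost label at function entry */
    int m = a;
    if (b > m) { __cost += 11; m = b; }  /* cost label on the taken branch */
    if (c > m) { __cost += 11; m = c; }
    return m;
}
```

A verification tool can then reason about \lstinline[language=C]'__cost' as an ordinary program variable.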
\subsection{Revisions to the prototype compiler}
Our focus on intensional properties prompted us to consider whether we
could incorporate stack space into the costs presented to the user.
We only allocate one fixed-size frame per function, so modelling this
was relatively simple.  It is the only form of dynamic memory
allocation provided by the compiler, so we were able to strengthen the
statement of the goal to guarantee successful execution whenever the
stack space obeys the \lstinline'c_max_stack' bound calculated by
subtracting the global variable requirements from the total memory
available.
The cost labelling checks at the end of Figure~\ref{fig:summary} have been
introduced to reduce the proof burden, and are described in
Section~\ref{sec:costchecks}.
The use of dependent types to capture simple intermediate language
invariants makes every front-end pass except \textsf{Clight} to
\textsf{Cminor} and the cost checks total functions.  Hence various
well-formedness and type safety checks are performed once between
\textsf{Clight} and \textsf{Cminor}, and the invariants rule out any
difficulties in the later stages.
With the benefit of hindsight we would have included an initial
checking phase to produce a `well-formed' variant of \textsf{Clight},
conjecturing that this would simplify various parts of the proofs for
the \textsf{Clight} stages which deal with potentially ill-formed
code.
Following D2.2, we previously generated code for global variable
initialisation in \textsf{Cminor}, for which we reserved a cost label
to represent the execution time for initialisation.  However, the
back-end must also add an initial call to the main function, whose
cost must also be accounted for, so we decided to move the
initialisation code to the back-end and merge the costs.
\subsection{Main goals}
Informally, our main intensional result links the time difference in a source
code execution to the time difference in the object code, expressing the time
for the source by summing the values for the cost labels in the trace, and the
time for the target by a clock built into the 8051 executable semantics.
The availability of precise timing information for 8051
implementations and the design of the compiler allow it to give exact
time costs in terms of processor cycles, not just upper bounds.
However, these exact results are only available if the subtrace we
measure starts and ends at suitable points.  In particular, pure
computation with no observable effects may be reordered and moved past
cost labels, so we cannot measure time between arbitrary statements in
the program.
There is also a constraint on the subtraces that we
measure due to the requirements of the correctness proof for the
object code timing analysis.  To be sure that the timings are assigned
to the correct cost label, we need to know that each return from a
function call goes to the correct return address.  It is difficult
to observe this property locally in the object code because it relies
on much earlier stages in the compiler.  To convey this information to
the timing analysis, extra structure is imposed on the subtraces, which
is described in Section~\ref{sec:fegoals}.
% Regarding the footnote, would there even be much point?
% TODO: this might be quite easy to add ('just' subtract the
% measurable subtrace from the second label to the end).  Could also
% measure other traces in this manner.
These restrictions are reflected in the subtraces that we give timing
guarantees on; they must start at a cost label and end at the return
of the enclosing function of the cost label\footnote{We expect that
  this would generalise to other subtraces by subtracting costs
  for unwanted measurable suffixes of a measurable subtrace.}.  A
typical example of such a subtrace is the execution of an entire
function from the cost label at the start of the function until it
returns.  We call any such subtrace \emph{measurable} if it (and
the prefix of the trace from the start to the subtrace) can also be
executed within the available stack space.
Now we can give the main intensional statement for the compiler.
Given a \emph{measurable} subtrace for a labelled \textsf{Clight}
program, there is a subtrace of the 8051 object code program where the
time differences match.  Moreover, \emph{observable} parts of the
trace also match --- these are the appearance of cost labels and
function calls and returns.

More formally, the definition of this statement in \matita{} is
definition simulates :=
  $\lambda$p: compiler_output.
  let initial_status := initialise_status $...$ (cm (c_labelled_object_code $...$ p)) in
  $\forall$m1,m2.
   measurable Clight_pcs (c_labelled_clight $...$ p) m1 m2
       (stack_sizes (c_stack_cost $...$ p)) (c_max_stack $...$ p) $\rightarrow$
  $\forall$c1,c2.
   clock_after Clight_pcs (c_labelled_clight $...$ p) m1 (c_clight_cost_map $...$ p) = OK $...$ c1 $\rightarrow$
   clock_after Clight_pcs (c_labelled_clight $...$ p) (m1+m2) (c_clight_cost_map $...$ p) = OK $...$ c2 $\rightarrow$
  $\exists$n1,n2.
   observables Clight_pcs (c_labelled_clight $...$ p) m1 m2 =
     observables (OC_preclassified_system (c_labelled_object_code $...$ p))
          (c_labelled_object_code $...$ p) n1 n2
  $\wedge$
   clock ?? (execute (n1+n2) ? initial_status) =
     clock ?? (execute n1 ? initial_status) + (c2-c1).
where the \lstinline'measurable', \lstinline'clock_after' and
\lstinline'observables' definitions are generic definitions for multiple
languages; in this case the \lstinline'Clight_pcs' record applies them
to \textsf{Clight} programs.
There is a second part to the statement, which says that the initial
processing of the input program to produce the cost labelled version
does not affect the semantics of the program:
% Yes, I'm paraphrasing the result a tiny bit to remove the observe non-function
  $\forall$input_program,output.
  compile input_program = return output $\rightarrow$
  not_wrong $...$ (exec_inf $...$ clight_fullexec input_program) $\rightarrow$
  sim_with_labels
   (exec_inf $...$ clight_fullexec input_program)
   (exec_inf $...$ clight_fullexec (c_labelled_clight $...$ output))
That is, any successful compilation produces a labelled program that
has identical behaviour to the original, so long as there is no
`undefined behaviour'.
Note that this statement provides full functional correctness, including
preservation of (non-)termination.  The intensional result above does
not do this directly --- it does not guarantee the same result or same
termination.  There are two mitigating factors, however: first, to
prove the intensional property you need local simulation results --- these
can be pieced together to form full behavioural equivalence, and only time
constraints have prevented us from doing so.  Second, if we wish to
confirm a result, termination, or non-termination we could add an
observable witness, such as a function that is only called if the
correct result is given.  The intensional result guarantees that the
observable witness is preserved, so the program must behave correctly.
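The witness idea can be sketched in C (all names here are hypothetical, chosen only to illustrate the technique; the point is that the call is an observable event which any correct compilation must preserve):

```c
#include <assert.h>

static int witness_called = 0;

/* Hypothetical observable witness: calling it is an observable event
   (like any function call), so the compiler must preserve it. */
static void result_is_correct(void) { witness_called = 1; }

static int factorial(int n) {
    int r = 1;
    for (int i = 2; i <= n; i++) r *= i;
    return r;
}

/* The witness fires only when the computed result is the right one,
   so preserving observables forces the result to be preserved too. */
static int run_with_witness(void) {
    if (factorial(5) == 120)
        result_is_correct();
    return witness_called;
}
```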
\section{Goals for the front-end}
The essential parts of the intensional proof were outlined during work
on a toy compiler in Task
2.1~\cite{d2.1,springerlink:10.1007/978-3-642-32469-7_3}.  These are:
\item functional correctness, in particular preserving the trace of
  cost labels,
\item the \emph{soundness} and \emph{precision} of the cost labelling
  on the object code, and
\item a timing analysis on the object code that produces a correct
  mapping from cost labels to time.
However, that toy development did not include function calls.  For the
full \cerco{} compiler we also need to maintain the invariant that
functions return to the correct program location in the caller, as we
mentioned in the previous section.  During work on the back-end timing
analysis (described in more detail in the companion deliverable, D4.4)
the notion of a \emph{structured trace} was developed to enforce this
return property, and also most of the cost labelling properties.
\caption{The compiler and proof outline}
Jointly, we generalised the structured traces to apply to any of the
intermediate languages which have some notion of a program counter.  This means
that they are introduced part way through the compiler, see
Figure~\ref{fig:compiler}.  Proving that a structured trace can be
constructed at \textsf{RTLabs} has several virtues:
\item This is the first language where every operation has its own
  unique, easily addressable, statement.
\item Function calls and returns are still handled implicitly in the
  language and so the structural properties are ensured by the
  semantics.
\item Many of the back-end languages from \textsf{RTL} onwards share a common
  core set of definitions, and using structured traces throughout
  increases this uniformity.
\caption{Nesting of functions in structured traces}
A structured trace is a mutually inductive data type which
contains the steps from a normal program trace, but arranged into a
nested structure which groups entire function calls together and
aggregates individual steps between cost labels (or between the final
cost label and the return from the function), see
Figure~\ref{fig:strtrace}.  This captures the idea that the cost labels
only represent costs \emph{within} a function --- calls to other
functions are accounted for in the nested trace for their execution, and we
can locally regard function calls as a single step.
These structured traces form the core part of the intermediate results
that we must prove so that the back-end can complete the main
intensional result stated above.  In full, we provide the back-end
with:
\item A normal trace of the \textbf{prefix} of the program's execution
  before reaching the measurable subtrace.  (This needs to be
  preserved so that we know that the stack space consumed is correct,
  and to set up the simulation results.)
\item The \textbf{structured trace} corresponding to the measurable
  subtrace.
\item An additional property about the structured trace that no
  `program counter' is \textbf{repeated} between cost labels.  Together with
  the structure in the trace, this takes over from showing that
  cost labelling is sound and precise.
\item A proof that the \textbf{observables} have been preserved.
\item A proof that the \textbf{stack limit} is still observed by the prefix and
  the structured trace.  (This is largely a consequence of the
  preservation of observables.)
Following the outline in Figure~\ref{fig:compiler}, we will first deal
with the transformations in \textsf{Clight} that produce the source
program with cost labels, then show that measurable traces can be
lifted to \textsf{RTLabs}, and finally show that we can construct the
properties listed above ready for the back-end proofs.

\section{Input code to cost labelled program}
As explained on page~\pageref{page:switchintro}, the costs of complex
C \lstinline[language=C]'switch' statements cannot be represented with
the simple labelling used in the formalised compiler.  Our first pass
replaces these statements with simpler C code, allowing our second
pass to perform the cost labelling.  We show that the behaviour of
programs is unchanged by these passes using forward
simulations\footnote{All of our languages are deterministic, which can
be seen directly from their executable definitions.  Thus we know that
forward simulations are sufficient because the target cannot have any
other behaviour.}.
\subsection{Switch removal}
We compile away \lstinline[language=C]'switch' statements into more
basic \textsf{Clight} code.
Note that this transformation does not necessarily degrade the
efficiency of the generated code.  For instance, compilers such as GCC
introduce balanced trees of ``if-then-else'' constructs for small
switches.  However, our implementation strategy is much simpler.  Let
us consider the following input statement.
   switch(e) {
   case v1:
     stmt1;
   case v2:
     stmt2;
   default:
     stmt_default;
   }
Note that \textsf{stmt1}, \textsf{stmt2}, \ldots \textsf{stmt\_default}
may contain \lstinline[language=C]'break' statements, which have the
effect of exiting the switch statement.  In the absence of break, the
execution falls through each case sequentially.  In our implementation,
we produce an equivalent sequence of ``if-then'' blocks chained by gotos:
   fresh = e;
   if(fresh == v1) {
     $\llbracket$stmt1$\rrbracket$;
     goto lbl_case2;
   };
   if(fresh == v2) {
     lbl_case2:
     $\llbracket$stmt2$\rrbracket$;
     goto lbl_default;
   };
   lbl_default:
   $\llbracket$stmt_default$\rrbracket$;
   exit_label:
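The scheme can be checked concretely: the following hand-written C sketch instantiates it for two cases and a default (with invented values \lstinline[language=C]'1' and \lstinline[language=C]'2'), and agrees with the original switch, including the fall-through cases.  This is only an illustration; the formal pass operates on \textsf{Clight} terms.

```c
#include <assert.h>

/* Original switch, with deliberate fall-through between cases. */
static int with_switch(int e) {
    int r = 0;
    switch (e) {
    case 1: r += 10;   /* falls through */
    case 2: r += 20;   /* falls through */
    default: r += 1;
    }
    return r;
}

/* The corresponding if/goto chain produced by the scheme above:
   falling off one case jumps into the body of the next. */
static int without_switch(int e) {
    int r = 0;
    int fresh = e;
    if (fresh == 1) {
        r += 10;
        goto lbl_case2;
    }
    if (fresh == 2) {
lbl_case2:
        r += 20;
        goto lbl_default;
    }
lbl_default:
    r += 1;
    return r;
}
```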
The proof had to tackle the following points:
\item the source and target memories are not the same (due to the fresh variable),
\item the flow of control is changed in a non-local way (e.g. \textbf{goto}
  instead of \textbf{break}).
In order to tackle the first point, we implemented a version of memory
extensions similar to CompCert's.
For the simulation we decided to prove a sufficient amount to give us
confidence in the definitions and approach, but curtail the proof
because this pass does not contribute to the intensional correctness
result.  We tackled several simple cases that do not interact with
the switch removal per se, to show that the definitions were usable,
and part of the switch case to check that the approach is
reasonable.  This comprises propagating the memory extension through
each statement (except switch), as well as various invariants that are
needed for the switch case (in particular, freshness hypotheses).  The
details of the evaluation process for the source switch statement and
its target counterpart can be found in the file
\textbf{}, along with more details on the transformation.
Proving the correctness of the second point would require reasoning about the
semantics of \lstinline[language=C]'goto' statements.  In the \textsf{Clight}
semantics, this is implemented as a function-wide lookup of the target label.
The invariant we would need is the fact that a global label lookup on a freshly
created goto is equivalent to a local lookup.  This would in turn require the
propagation of some freshness hypotheses on labels.  For reasons expressed above,
we decided to omit this part of the correctness proof.
\subsection{Cost labelling}
The simulation for the cost labelling pass is the simplest in the
front-end.  The main argument is that any step of the source program
is simulated by the same step of the labelled one, plus any extra
steps for the added cost labels.  The extra instructions do not change
the memory or local environments, although we have to keep track of
the extra instructions that appear in continuations, for example
during the execution of a \lstinline[language=C]'while' loop.
We do not attempt to capture any cost properties of the labelling in
the simulation proof\footnote{We describe how the cost properties are
established in Section~\ref{sec:costchecks}.}, allowing the proof to be oblivious to the choice
of cost labels.  Hence we do not have to reason about the threading of
name generation through the labelling function, greatly reducing the
amount of effort required.
%TODO: both give one-step-sim-by-many forward sim results; switch
%removal tricky, uses aux var to keep result of expr, not central to
%intensional correctness so curtailed proof effort once reasonable
%level of confidence in code gained; labelling much simpler; don't care
%what the labels are at this stage, just need to know when to go
%through extra steps.  Rolled up into a single result with a cofixpoint
%to obtain coinductive statement of equivalence (show).
\section{Finding corresponding measurable subtraces}
There follow the three main passes of the front-end:
\item simplification of casts in \textsf{Clight} code
\item \textsf{Clight} to \textsf{Cminor} translation, performing stack
  variable allocation and simplifying control structures
\item transformation to \textsf{RTLabs} control flow graph
We have taken a common approach to
each pass: first we build (or axiomatise) forward simulation results
that are similar to normal compiler proofs, but which are slightly more
fine-grained so that we can see that the call structure and relative
placement of cost labels is preserved.
Then we instantiate a general result which shows that we can find a
\emph{measurable} subtrace in the target of the pass that corresponds
to the measurable subtrace in the source.  By repeated application of
this result we can find a measurable subtrace of the execution of the
\textsf{RTLabs} code, suitable for the construction of a structured
trace (see Section~\ref{sec:structuredtrace}).  This is essentially an
extra layer on top of the simulation proofs that provides us with the
additional information required for our intensional correctness proof.
\subsection{Generic measurable subtrace lifting proof}
Our generic proof is parametrised on a record containing small-step
semantics for the source and target language, a classification of
states (the same form of classification is used when defining
structured traces), a simulation relation which respects the
classification and cost labelling, and
four simulation results.  The simulations are split by the starting state's
classification and whether it is a cost label, which will allow us to
observe that the call structure is preserved.  They are:
\item a step from a `normal' state (which is not classified as a call
  or return) which is not a cost label is simulated by zero or more
  `normal' steps;
\item a step from a `call' state followed by a cost label step is
  simulated by a step from a `call' state, a corresponding label step,
  then zero or more `normal' steps;
\item a step from a `call' state not followed by a cost label is treated
  similarly (note that this case cannot occur in a well-labelled
  program, but we do not have enough information locally to exploit
  this); and
\item a cost label step is simulated by a cost label step.
Finally, we need to know that a successfully translated program will
have an initial state in the simulation relation with the original
program's initial state.
\caption{Tiling of simulation for a measurable subtrace}
To find the measurable subtrace in the target program's execution we
walk along the original program's execution trace applying the
appropriate simulation result by induction on the number of steps.
While the number of steps taken varies, the overall structure is
preserved, as illustrated in Figure~\ref{fig:tiling}.  By preserving
the structure we also maintain the same intensional observables.  One
delicate point is that the cost label following a call must remain
directly afterwards\footnote{The prototype compiler allowed some
  straight-line code to appear before the cost label until a later
  stage of the compiler, but we must move the requirement forward to
  fit with the structured traces.}
% Damn it, I should have just moved the cost label forwards in RTLabs,
% like the prototype does in RTL to ERTL; the result would have been
% simpler.  Or was there some reason not to do that?
(both in the program code and in the execution trace), even if we
introduce extra steps, for example to store parameters in memory in
\textsf{Cminor}.  Thus we have a version of the call simulation
that deals with both in one result.
In addition to the subtrace we are interested in measuring we must
also prove that the earlier part of the trace is preserved in
order to use the simulation from the initial state.  It also
guarantees that we do not run out of stack space before the subtrace
we are interested in.  The lemmas for this prefix and the measurable
subtrace are similar, following the pattern above.  However, the
measurable subtrace also requires us to rebuild the termination
proof.  This is defined recursively:
  let rec will_return_aux C (depth:nat)
                               (trace:list (cs_state $...$ C × trace)) on trace : bool :=
  match trace with
  [ nil $\Rightarrow$ false
  | cons h tl $\Rightarrow$
    let $\langle$s,tr$\rangle$ := h in
    match cs_classify C s with
    [ cl_call $\Rightarrow$ will_return_aux C (S depth) tl
    | cl_return $\Rightarrow$
        match depth with
        [ O $\Rightarrow$ match tl with [ nil $\Rightarrow$ true | _ $\Rightarrow$ false ]
        | S d $\Rightarrow$ will_return_aux C d tl
        ]
    | _ $\Rightarrow$ will_return_aux C depth tl
    ]
  ].
The \lstinline'depth' is the number of return states we need to see
before we have returned to the original function (initially zero) and
\lstinline'trace' is the measurable subtrace obtained by running the
semantics for the correct number of steps.  This definition unfolds
tail recursively for each step, and once the corresponding simulation
result has been applied, a new one for the target can be asserted by
unfolding and applying the induction hypothesis to the shorter trace.
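The depth-counting behaviour can be sketched outside of Matita.  The
following Python fragment is an illustration of ours, not part of the
development: the string tags stand in for the state classes returned by
\lstinline'cs_classify', and the traces are plain lists.

```python
# Illustration only: tags stand in for cs_classify's state classes.
def will_return(trace, depth=0):
    """True iff the trace ends exactly at the return from the initial
    function, counting nested calls with a depth counter."""
    if not trace:
        return False                            # ran out of steps early
    step, rest = trace[0], trace[1:]
    if step == "call":
        return will_return(rest, depth + 1)     # enter a nested call
    if step == "return":
        if depth == 0:
            return rest == []                   # must be the final step
        return will_return(rest, depth - 1)     # leave a nested call
    return will_return(rest, depth)             # ordinary execution step
```

Each recursive call consumes one step of the trace, matching the way the
Matita definition unfolds tail recursively as the simulation proceeds.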

This then gives us an overall result for any simulation fitting the
requirements above (contained in the \lstinline'meas_sim' record):
\begin{lstlisting}[language=matita]
theorem measured_subtrace_preserved :
  $\forall$MS:meas_sim.
  $\forall$p1,p2,m,n,stack_cost,max.
  ms_compiled MS p1 p2 $\rightarrow$
  measurable (ms_C1 MS) p1 m n stack_cost max $\rightarrow$
  $\exists$m',n'.
    measurable (ms_C2 MS) p2 m' n' stack_cost max $\wedge$
    observables (ms_C1 MS) p1 m n = observables (ms_C2 MS) p2 m' n'.
\end{lstlisting}
The stack space requirement that is embedded in \lstinline'measurable'
is a consequence of the preservation of observables, because it is
determined by the functions called and returned from, which are observable.

% TODO: how to deal with untidy edges wrt sim rel; anything to
% say about obs?
% TODO: cost labels are statements/exprs in these languages; split
% call/return gives simpler

\subsection{Simulation results for each pass}

We now consider the simulation results for the passes, each of which
is used to instantiate the
\lstinline[language=matita]'measured_subtrace_preserved' theorem to
construct the measurable subtrace for the next language.

\subsubsection{Cast simplification}

The parser used in \cerco{} introduces a lot of explicit type casts.
If left as they are, these constructs can greatly hamper the quality
of the generated code -- especially as the architecture we consider is
an $8$-bit one.  In \textsf{Clight}, casts are expressions, so most of
the work of this transformation proceeds on expressions.  The
transformation recursively tries to coerce an expression to a
particular integer type, which is in practice smaller than the
original one.  This functionality is implemented by two mutually
recursive functions whose signatures are the following.

\begin{lstlisting}[language=matita]
let rec simplify_expr (e:expr) (target_sz:intsize) (target_sg:signedness)
  : $\Sigma$result:bool$\times$expr.
    $\forall$ge,en,m. simplify_inv ge en m e (\snd result) target_sz target_sg (\fst result) := $\ldots$

and simplify_inside (e:expr) : $\Sigma$result:expr. conservation e result := $\ldots$
\end{lstlisting}

\textsf{simplify\_inside} acts as a wrapper for
\textsf{simplify\_expr}.  Whenever \textsf{simplify\_inside} encounters
an \textsf{Ecast} expression, it tries to coerce the sub-expression
to the desired type using \textsf{simplify\_expr}, which attempts to
perform the actual coercion.  In return, \textsf{simplify\_expr} calls
back \textsf{simplify\_inside} in certain positions, where we decided
to be conservative in order to simplify the proofs.  However, the
current design allows us to incrementally move to a more aggressive
version, by replacing recursive calls to \textsf{simplify\_inside} with
calls to \textsf{simplify\_expr} \emph{and} proving the corresponding
invariants -- where possible.
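The essence of the coercion can be seen in a small Python model of our
own (the expression language and function names below are hypothetical,
not the compiler's): a truncating cast distributes over wraparound
addition, so the arithmetic can safely be performed at the narrower
width.

```python
# Toy expressions: ("const", n), ("add", e1, e2), ("cast", width, e).
# All arithmetic wraps modulo 2^width, as on the 8-bit target.
def eval_expr(e, width=32):
    mask = (1 << width) - 1
    if e[0] == "const":
        return e[1] & mask
    if e[0] == "cast":
        return eval_expr(e[2]) & ((1 << e[1]) - 1)
    if e[0] == "add":
        return (eval_expr(e[1]) + eval_expr(e[2])) & mask

def push_cast(e):
    """Push a truncating cast into an addition: (w)(x+y) = (w)((w)x + (w)y),
    which is sound because 2^w divides the wider modulus."""
    if e[0] == "cast" and e[2][0] == "add":
        w = e[1]
        return ("cast", w, ("add", push_cast(("cast", w, e[2][1])),
                                   push_cast(("cast", w, e[2][2]))))
    return e

e = ("cast", 8, ("add", ("const", 300), ("const", 20)))
```

Here both the original and the simplified expression evaluate to
$320 \bmod 256 = 64$, mirroring the semantic preservation that
\textsf{simplify\_inv} captures for the real transformation.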

The \textsf{simplify\_inv} invariant encodes either the conservation
of the semantics when the transformation fails (the \textsf{Inv\_eq}
constructor), or the successful downcast of the considered expression
to the target type (\textsf{Inv\_coerce\_ok}).
\begin{lstlisting}[language=matita]
inductive simplify_inv
  (ge : genv) (en : env) (m : mem)
  (e1 : expr) (e2 : expr) (target_sz : intsize) (target_sg : signedness) : bool $\rightarrow$ Prop :=
| Inv_eq : $\forall$result_flag. $\ldots$
     simplify_inv ge en m e1 e2 target_sz target_sg result_flag
| Inv_coerce_ok : $\forall$src_sz,src_sg.
     typeof e1 = Tint src_sz src_sg $\rightarrow$
     typeof e2 = Tint target_sz target_sg $\rightarrow$
     smaller_integer_val src_sz target_sz src_sg (exec_expr ge en m e1) (exec_expr ge en m e2) $\rightarrow$
     simplify_inv ge en m e1 e2 target_sz target_sg true.
\end{lstlisting}

The \textsf{conservation} invariant for \textsf{simplify\_inside}
simply states the conservation of the semantics, as in the
\textsf{Inv\_eq} constructor of the previous definition.
\begin{lstlisting}[language=matita]
definition conservation := $\lambda$e,result. $\forall$ge,en,m.
          res_sim ? (exec_expr ge en m e) (exec_expr ge en m result)
       $\wedge$ res_sim ? (exec_lvalue ge en m e) (exec_lvalue ge en m result)
       $\wedge$ typeof e = typeof result.
\end{lstlisting}

This invariant is then easily lifted to statement evaluations.
The main problem encountered with this particular pass was dealing with
inconsistently typed programs, a canonical case being a particular
integer constant of a certain size typed with another size.  This
prompted the need to introduce numerous type checks, making
both the implementation and the proof more complex, even though more
comprehensive checks are made in the next stage.
%\todo{Make this a particular case of the more general statement on baking more invariants in the Clight language}

\subsubsection{Clight to Cminor}

This pass is the last one operating on the \textsf{Clight} language.
Its input is a full \textsf{Clight} program, and its output is a
\textsf{Cminor} program.  Note that we do not use an equivalent of
CompCert's \textsf{C\#minor} language: we translate directly to a
variant of \textsf{Cminor}.  This has the advantage of requiring
neither the special loop constructs nor the explicit block structure.
Another salient point of our approach is that a significant number of
the properties needed for the simulation proof were directly encoded
in dependently typed translation functions.  In particular, freshness
conditions and well-typedness conditions are included.  The main
effects of the transformation from \textsf{Clight} to
\textsf{Cminor} are listed below.

\begin{itemize}
\item Variables are classified as being either globals, stack-allocated
  locals or potentially register-allocated locals.  The value of
  register-allocated local variables is moved out of the modelled
  memory and stored in a dedicated environment.
\item In \textsf{Clight}, each local variable has a dedicated memory
  block, whereas stack-allocated locals are bundled together on a
  function-by-function basis.
\item Loops are converted to jumps.
\end{itemize}
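As an illustration of the last point, a structured loop can be lowered
to a conditional branch and a backward jump.  The Python sketch below is
ours, not the actual translation functions (which also thread the
freshness and typing invariants through dependent types); the statement
encoding is hypothetical.

```python
# Toy lowering of ("while" cond: body) to label/branch/goto form.
import itertools
_fresh = itertools.count()

def convert_while(cond, body):
    """Flatten a while loop: test the condition at a top label, branch
    out when it fails, and jump back after the body."""
    top, out = f"L{next(_fresh)}", f"L{next(_fresh)}"
    return ([("label", top),
             ("branch_if_not", cond, out)]   # exit when cond fails
            + list(body)
            + [("goto", top),                # back edge for next iteration
               ("label", out)])

stmts = convert_while(("lt", "i", "n"), [("assign", "i", ("add", "i", 1))])
```

The backward \lstinline[language=C]'goto' replaces \textsf{Clight}'s
structured loop, which is exactly where our translation diverges from
CompCert's \textsf{C\#minor} route.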

The first two points require memory injections which are more flexible
than those needed in the switch removal case.  In the remainder of this
section, we briefly discuss our implementation of memory injections,
and then the simulation proof.

\paragraph{Memory injections.}

Our memory injections are modelled after the work of Blazy \& Leroy.
However, the corresponding paper is based on the first version of the
CompCert memory model, whereas we use a much more concrete model,
allowing byte-level manipulations (as in the later version of
CompCert's memory model).  We proved roughly 80\% of the required
lemmas.  Some of the difficulties encountered were notably due to
overly relaxed conditions on pointer validity (fixed during
development).  Further side conditions had to be added to take care of
possible overflows when converting from \textbf{Z} block bounds to
$16$-bit pointer offsets (in practice, these overflows only occur in
edge cases that are easily ruled out -- but this fact is not visible
in the memory injections).  Concretely, some of the lemmas on the
preservation of the simulation of loads after writes were axiomatised,
due to a lack of time.

\paragraph{Simulation proof.}

We proved the simulation result for expressions and a representative
selection of statements.  In particular we tackled
\lstinline[language=C]'while' statements to ensure that we correctly
translate loops, because our approach differs from CompCert by
converting directly to \textsf{Cminor} \lstinline[language=C]'goto's
rather than maintaining a notion of loops.  We also have a partial
proof for function entry, covering the setup of the memory injection,
but not function exit.  Exits, and the remaining statements, have been
axiomatised.

Careful management of the proof state was required due to the proof
terms that are embedded in \textsf{Cminor} code to show that some
invariants are respected.  These proof terms can be large and awkward,
and while generalising them away is usually sufficient, it can be
difficult when they appear under a binder.

%The correctness proof for this transformation was not completed. We proved the
%simulation result for expressions and for some subset of the critical statement cases.
%Notably lacking are the function entry and exit, where the memory injection is
%properly set up. As would be expected, a significant amount of work has to be performed
%to show the conservation of all invariants at each simulation step.

%\todo{list cases, explain while loop, explain labeling problem}

\subsubsection{Cminor to RTLabs}

The translation from \textsf{Cminor} to \textsf{RTLabs} is a fairly
routine control flow graph (CFG) construction.  As such, we chose to
axiomatise its simulation results.  However, we did prove several
properties of the generated programs:
\begin{itemize}
\item All statements are type correct with respect to the declared
  pseudo-register type environment.
\item The CFG is closed, and has a distinguished entry node and a
  unique exit node.
\end{itemize}

These properties rely on similar properties about type safety and the
presence of \lstinline[language=C]'goto'-labels for \textsf{Cminor}
programs, which are checked at the preceding stage.  As a result, this
transformation is total, and any compilation failures must occur
earlier, while the corresponding \textsf{Clight} source is still
available and a better error message can be generated.

The proof obligations for these properties include many instances of
graph inclusion.  We automated these proofs using a small amount of
reflection, making the obligations much easier to handle.  One
drawback to enforcing invariants thoroughly is that temporarily
breaking them can be awkward.  For example, \lstinline'return'
statements were originally used as placeholders for
\lstinline[language=C]'goto' destinations that had not yet been
translated.  However, this made establishing the single exit node
property rather difficult, and a different placeholder was chosen
instead.  In other circumstances it is possible to prove a more
complex invariant, then simplify it at the end of the transformation.

\section{Checking cost labelling properties}

Ideally, we would provide proofs that the cost labelling pass always
produces programs that are soundly and precisely labelled and that
each subsequent pass preserves these properties.  This would match our
use of dependent types to eliminate impossible sources of errors
during compilation, in particular retaining intermediate language type
information.

However, given the limited amount of time available we realised that
implementing a compile-time check for a sound and precise labelling of
the \textsf{RTLabs} intermediate code would reduce the proof burden
considerably.  This is similar in spirit to the use of translation
validation in certified compilation, which makes a similar trade-off
between the potential for compile-time failure and the volume of proof
required.

The check cannot be pushed into a later stage of the compiler because
much of the information is embedded into the structured traces.
However, if an alternative method were used to show that function
returns in the compiled code are sufficiently well-behaved, then we
could consider pushing the cost property checks into the timing
analysis itself.  We leave this as a possible area for future work.

\subsection{Implementation and correctness}

For a cost labelling to be sound and precise we need a cost label at
the start of each function, after each branch and at least one in
every loop.  The first two parts are trivial to check by examining the
code.  In \textsf{RTLabs} the last part is specified by saying
that there is a bound on the number of successive instruction nodes in
the CFG that you can follow before you encounter a cost label, and
checking this is more difficult.

The implementation progresses through the set of nodes in the graph,
following successors until a cost label is found or a label-free cycle
is discovered (in which case the property does not hold and we stop).
This is made easier by the prior knowledge that any branch is followed
by cost labels, so we do not need to search each branch.  When a label
is found, we remove the chain from the set and continue from another
node in the set until it is empty, at which point we know that there
is a bound for every node in the graph.
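The algorithm can be sketched as follows.  This Python rendering is a
hypothetical illustration of ours: the real check operates on the
\textsf{RTLabs} CFG representation and its result is used in proofs,
whereas here nodes, successors and labels are plain values.

```python
def check_bounded_labelling(successor, is_label, nodes):
    """True iff every node reaches a cost label after finitely many
    successor steps.  Branches are assumed to lead straight to labels,
    so only a single successor needs following from each node."""
    done = set()                      # nodes already known to reach a label
    for start in nodes:
        chain = set()
        node = start
        while not is_label(node) and node not in done:
            if node in chain:
                return False          # label-free cycle: labelling unsound
            chain.add(node)
            node = successor(node)
        done |= chain                 # every node on this chain is bounded
    return True
```

Chains that end at an already-checked node are accepted immediately,
which is what makes a single pass over the node set sufficient.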

Directly reasoning about the function that implements this would be
rather awkward, so an inductive specification of a single step of its
behaviour was written and proved to match the implementation.  This
was then used to prove the implementation sound and complete.

While we have not attempted to prove that the cost labelling properties
are established and preserved earlier in the compiler, we expect that
the effort for the \textsf{Cminor} to \textsf{RTLabs} stage alone
would be similar to the work outlined above, because it involves the
change from requiring a cost label at particular positions to
requiring cost labels to break loops in the CFG.  As there are another
three passes to consider (including the labelling itself), we believe
that using the check above is much simpler overall.

% TODO? Found some Clight to Cminor bugs quite quickly

\section{Existence of a structured trace}

The \emph{structured traces} introduced in Section~\ref{sec:fegoals}
enrich the execution trace of a program by nesting function calls in a
mixed-step style and embedding the cost labelling properties of the
program.  The notion was originally designed to support the proof of
correctness for the timing analysis of the object code in the
back-end, then generalised to provide a common structure to use from
the end of the front-end to the object code.  See
Figure~\ref{fig:strtrace} on page~\pageref{fig:strtrace} for an
illustration of a structured trace.

To make the definition generic we abstract over the semantics of the
language:
\begin{lstlisting}[language=matita]
record abstract_status : Type[1] :=
  { as_status :> Type[0]
  ; as_execute : relation as_status
  ; as_pc : DeqSet
  ; as_pc_of : as_status $\rightarrow$ as_pc
  ; as_classify : as_status $\rightarrow$ status_class
  ; as_label_of_pc : as_pc $\rightarrow$ option costlabel
  ; as_after_return : ($\Sigma$s:as_status. as_classify s = cl_call) $\rightarrow$ as_status $\rightarrow$ Prop
  ; as_result: as_status $\rightarrow$ option int
  ; as_call_ident : ($\Sigma$s:as_status.as_classify s = cl_call) $\rightarrow$ ident
  ; as_tailcall_ident : ($\Sigma$s:as_status.as_classify s = cl_tailcall) $\rightarrow$ ident
  }.
\end{lstlisting}
This gives a type of states, an execution relation\footnote{All of
  our semantics are executable, but using a relation was simpler in
  the abstraction.}, some notion of program counters with decidable
equality, the classification of states, and functions to extract the
observable intensional information (cost labels and the identities of
functions that are called).  The \lstinline'as_after_return' property
links the state before a function call with the state after return,
providing the evidence that execution returns to the correct place.
The precise form varies between stages; in \textsf{RTLabs} it insists
that the CFG, the pointer to the CFG node to execute next, and some
call stack information are preserved.

The structured traces are defined using three mutually inductive
types.  The core data structure is \lstinline'trace_any_label', which
captures some straight-line execution until the next cost label or the
return from the enclosing function.  Each function call is embedded as
a single step, with its own trace nested inside and the before and
after states linked by \lstinline'as_after_return'; and states
classified as a `jump' (in particular branches) must be followed by a
cost label.

The second type, \lstinline'trace_label_label', is a
\lstinline'trace_any_label' where the initial state is cost labelled.
Thus a trace in this type identifies a series of steps whose cost is
entirely accounted for by the label at the start.

Finally, \lstinline'trace_label_return' is a sequence of
\lstinline'trace_label_label' values which ends in the return from the
function.  These correspond to a measurable subtrace, and in
particular include executions of an entire function call (and so are
used for the nested calls in \lstinline'trace_any_label').
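To make the nesting concrete, the following Python classes give a rough
rendering of our own of the three types; the real Matita definitions
additionally carry the cost labelling conditions and the
\lstinline'as_after_return' evidence as proof obligations, which plain
data types cannot express.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Step:
    state: str                                        # stand-in for a semantic state
    nested_call: Optional["TraceLabelReturn"] = None  # whole call as one step

@dataclass
class TraceAnyLabel:
    """Straight-line execution up to the next cost label or the return
    from the enclosing function (cf. trace_any_label)."""
    steps: List[Step]

@dataclass
class TraceLabelLabel:
    """A trace_any_label whose first state is cost labelled, so the
    label accounts for the cost of every step (cf. trace_label_label)."""
    label: str
    body: TraceAnyLabel

@dataclass
class TraceLabelReturn:
    """A sequence of trace_label_label pieces ending in the function's
    return; the shape of a measurable subtrace (cf. trace_label_return)."""
    pieces: List[TraceLabelLabel]

# A call to a function covered by label L2, nested as a single step
# inside a trace covered by label L1:
inner = TraceLabelReturn([TraceLabelLabel("L2", TraceAnyLabel([Step("ret")]))])
outer = TraceLabelReturn([TraceLabelLabel("L1",
            TraceAnyLabel([Step("call", nested_call=inner), Step("ret")]))])
```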

The construction of the structured trace replaces the syntactic cost
labelling properties, which place requirements on where labels appear
in the program, with semantic properties that constrain the execution
traces of the program.  The construction begins by defining versions
of the sound and precise labelling properties on states and global
environments (for the code that appears in each of them) rather than
whole programs, and showing that these are preserved by steps of the
\textsf{RTLabs} semantics.

Then we show that each cost labelling property required by the
definition of structured traces is locally satisfied.  These proofs
are broken up by the classification of states.  Similarly, we prove a
step-by-step stack preservation result, which states that the
\textsf{RTLabs} semantics never changes the lower parts of the stack.

The core part of the construction of a structured trace is to use the
proof of termination from the measurable trace (defined on
page~\pageref{prog:terminationproof}) to `fold up' the execution into
the nested form.  The results outlined above fill in the proof
obligations for the cost labelling properties, and the stack
preservation result shows that calls return to the correct location.

The structured trace alone is not sufficient to capture the property
that the program is soundly labelled.  While the structured trace
guarantees termination, it still permits a loop to be executed a
finite number of times without encountering a cost label.  We
eliminate this possibility by proving that no `program counter' repeats
within any \lstinline'trace_any_label' section, showing that a repeat
would be incompatible with the property that there is a bound on the
number of successor instructions you can follow in the CFG before you
encounter a cost label.

\subsubsection{Complete execution structured traces}

The development of the construction above started relatively early,
before the measurable subtrace preservation proofs.  To be confident
that the traces were well-formed at that time, we also developed a
complete execution form that embeds the traces above.  This includes
non-terminating program executions, where an infinite number of the
terminating structured traces are embedded.  This construction
confirmed that our definition of structured traces was consistent,
although we later found that we did not require the whole execution
version for the compiler correctness results.

To construct these we need to know whether each function call will
eventually terminate, requiring the use of the excluded middle.  This
classical reasoning is local to the construction of whole program
traces and is not necessary for our main results.

\section{Conclusion}

In combination with the work on the \cerco{} back-end and by
concentrating on the novel intensional parts of the proof, we have
shown that it is possible to construct certifying compilers that
correctly report execution time and stack space costs.  The layering
of intensional correctness proofs on top of normal simulation results
provides a useful separation of concerns, and could permit the reuse
of existing results.

% TODO: appendix on code layout?