source: Deliverables/D3.4/Report/report.tex @ 3226

Last change on this file since 3226 was 3226, checked in by campbell, 8 years ago

More 3.4 revisions; mostly administrative.

File size: 52.8 KB
14% LaTeX Companion, p 74
15\newcommand{\todo}[1]{\marginpar{\raggedright - #1}}
18  {keywords={Definition,Lemma,Theorem,Remark,Qed,Save,Inductive,Record},
19   morekeywords={[2]if,then,else},
20  }
23  {keywords={definition,lemma,theorem,remark,inductive,record,qed,let,rec,match,with,Type,and,on},
24   morekeywords={[2]whd,normalize,elim,cases,destruct},
25   mathescape=true,
26   morecomment=[n]{(*}{*)},
27  }
30        keywordstyle=\color{red}\bfseries,
31        keywordstyle=[2]\color{blue},
32        commentstyle=\color{green},
33        stringstyle=\color{blue},
34        showspaces=false,showstringspaces=false}
48\vspace*{1cm}Project FP7-ICT-2009-C-243881 \cerco{}}
50\date{ }
70Report n. D3.4\\
71Front-end Correctness Proofs\\
78Version 1.0
86Brian~Campbell, Ilias~Garnier, James~McKinna, Ian~Stark
92Project Acronym: \cerco{}\\
93Project full title: Certified Complexity\\
94Proposal/Contract no.: FP7-ICT-2009-C-243881 \cerco{}\\
96\clearpage \pagestyle{myheadings} \markright{\cerco{}, FP7-ICT-2009-C-243881}
102We report on the correctness proofs for the front-end of the \cerco{}
103cost lifting compiler.  First, we identify the core result we wish to
104prove, which says that the we correctly predict the precise execution
105time for particular parts of the execution called \emph{measurable}
106subtraces.  Then we consider the three distinct parts of the task:
107showing that the \emph{annotated source code} output by the compiler
108has equivalent behaviour to the original input (up to the
109annotations); showing that a measurable subtrace of the
110annotated source code corresponds to an equivalent measurable subtrace
111in the code produced by the front-end, including costs; and finally
112showing that the enriched \emph{structured} execution traces required
113for cost correctness in the back-end can be constructed from the
114properties of the code produced by the front-end.
116A key part of our work is that the intensional correctness results which show
117that we get consistent cost measurements throughout the intermediate languages
118of the compiler can be layered on top of normal forward simulation results,
119if we split those results into local call-structure preserving simulations.
120This split allowed us to concentrate on the intensional proofs by
121axiomatising some of the simulation results that are very similar to
122existing compiler correctness results.
124This report is about the correctness results that are deliverable
125D3.4, which are about the formalised compiler described in D3.2, using
126the source language semantics from D3.1 and intermediate language
127semantics from D3.3.  It builds on earlier work on the correctness of
128a toy compiler built to test the labelling approach in D2.1. Together
129with the companion deliverable about the correctness of the back-end,
130D4.4, we obtain results about the whole formalised compiler.
136% CHECK: clear up any -ize vs -ise
137% CHECK: clear up any "front end" vs "front-end"
138% CHECK: clear up any mentions of languages that aren't textsf'd.
139% CHECK: fix unicode in listings
143The \cerco{} compiler produces a version of the source code containing
144annotations describing the timing behaviour of the object code, as
145well as the object code itself. It compiles C code, targeting
146microcontrollers implementing the Intel 8051 architecture.  There are
147two versions: first, an initial prototype was implemented in
148\ocaml{}~\cite{d2.2}, then a version was formalised in the \matita{}
149proof assistant~\cite{d3.2,d4.2} and extracted to \ocaml{} code to
150produce an executable compiler.  In this document we present results
151from Deliverable 3.4, the formalised proofs in \matita{} about the
152front-end of the latter version of the compiler (culminating in the
153\lstinline'front_end_correct' lemma), and describe how that fits
154into the verification of the whole compiler.
156A key part of this work was to layer the intensional correctness
157results that show that the costs produced are correct on top of the
158proofs about the compiled code's extensional behaviour (that is, the
159functional correctness of the compiler).  Unfortunately, the ambitious
160goal of completely verifying the entire compiler was not feasible
161within the time available, but thanks to this separation of
162extensional and intensional proofs we are able to axiomatize some
163simulation results which are similar to those in other compiler verification
164projects and concentrate on the novel intensional proofs.  We were
165also able to add stack space costs to obtain a stronger result.  The
166proofs were made more tractable by introducing compile-time checks for
167the `sound and precise' cost labelling properties rather than proving
168that they are preserved throughout.
170The overall statement of correctness says that the annotated program has the
171same behaviour as the input, and that for any suitably well-structured part of
172the execution (which we call \emph{measurable}), the object code will execute
173the same behaviour taking precisely the time given by the cost annotations in
174the annotated source program.
176In the next section we recall the structure of the compiler and make the overall
177statement more precise.  Following that, in Section~\ref{sec:fegoals} we
178describe the statements we need to prove about the intermediate \textsf{RTLabs}
179programs for the back-end proofs.
180Section~\ref{sec:inputtolabelling} covers the passes which produce the
181annotated source program and Section~\ref{sec:measurablelifting} the rest
182of the transformations in the front-end.  Then the compile time checks
183for good cost labelling are detailed in Section~\ref{sec:costchecks}
184and the proofs that the structured traces required by the back-end
185exist are discussed in Section~\ref{sec:structuredtrace}.
187\section{The compiler and main goals}
189The unformalised \ocaml{} \cerco{} compiler was originally described
190in Deliverables 2.1 and 2.2.  Its design was replicated in the formal
191\matita{} code, which was presented in Deliverables 3.2 and 4.2, for
192the front-end and back-end respectively.
198\caption{Languages in the \cerco{} compiler}
202The compiler uses a number of intermediate languages, as outlined the
203middle two lines of Figure~\ref{fig:compilerlangs}.  The upper line
204represents the front-end of the compiler, and the lower the back-end,
205finishing with 8051 binary code.  Not all of the front-end passes
206introduces a new language, and Figure~\ref{fig:summary} presents a
207list of every pass involved.
213\quad \= $\downarrow$ \quad \= \kill
214\textsf{C} (unformalized)\\
215\> $\downarrow$ \> CIL parser (unformalized \ocaml)\\
217%\> $\downarrow$ \> add runtime functions\\
218\> $\downarrow$ \> \lstinline[language=C]'switch' removal\\
219\> $\downarrow$ \> labelling\\
220\> $\downarrow$ \> cast removal\\
221\> $\downarrow$ \> stack variable allocation and control structure
222 simplification\\
224%\> $\downarrow$ \> generate global variable initialization code\\
225\> $\downarrow$ \> transform to RTL graph\\
227\> $\downarrow$ \> check cost labelled properties of RTL graph\\
228\> $\downarrow$ \> start of target specific back-end\\
229\>\quad \vdots
233\caption{Front-end languages and compiler passes}
238The annotated source code is taken after the cost labelling phase.
239Note that there is a pass to replace C \lstinline[language=C]'switch'
240statements before labelling --- we need to remove them because the
241simple form of labelling used in the formalised compiler is not quite
242capable of capturing their execution time costs, largely due to C's
243`fall-through' behaviour where execution from one branch continues in
244the next unless there is an explicit \lstinline[language=C]'break'.
246The cast removal phase which follows cost labelling simplifies
247expressions to prevent unnecessary arithmetic promotion, which is
248specified by the C standard but costly for an 8-bit target.  The
249transformation to \textsf{Cminor} and subsequently \textsf{RTLabs}
250bear considerable resemblance to some passes of the CompCert
251compiler~\cite{Blazy-Leroy-Clight-09,Leroy-backend}, although we use a simpler \textsf{Cminor} where
252all loops use \lstinline[language=C]'goto' statements, and the
253\textsf{RTLabs} language retains a target-independent flavour.  The
254back-end takes \textsf{RTLabs} code as input.
256The whole compilation function returns the following information on success:
258  record compiler_output : Type[0] :=
259   { c_labelled_object_code: labelled_object_code
260   ; c_stack_cost: stack_cost_model
261   ; c_max_stack: nat
262   ; c_init_costlabel: costlabel
263   ; c_labelled_clight: clight_program
264   ; c_clight_cost_map: clight_cost_map
265   }.
267It consists of annotated 8051 object code, a mapping from function
268identifiers to the function's stack space usage, the space available for the
269stack after global variable allocation, a cost label covering the
270execution time for the initialisation of global variables and the call
271to the \lstinline[language=C]'main' function, the annotated source
272code, and finally a mapping from cost labels to actual execution time
275An \ocaml{} pretty printer is used to provide a concrete version of
276the output code and annotated source code.  In the case of the
277annotated source code, it also inserts the actual costs alongside the
278cost labels, and optionally adds a global cost variable and
279instrumentation to support further reasoning.
281\subsection{Revisions to the prototype compiler}
283Our focus on intensional properties prompted us to consider whether we
284could incorporate stack space into the costs presented to the user.
285We only allocate one fixed-size frame per function, so modelling this
286was relatively simple.  It is the only form of dynamic memory
287allocation provided by the compiler, so we were able to strengthen the
288statement of the goal to guarantee successful execution whenever the
289stack space obeys the \lstinline'c_max_stack' bound calculated by
290subtracting the global variable requirements from the total memory
293The cost labelling checks at the end of Figure~\ref{fig:summary} have been
294introduced to reduce the proof burden, and are described in
297The use of dependent types to capture simple intermediate language
298invariants makes every front-end pass except \textsf{Clight} to
299\textsf{Cminor} and the cost checks total functions.  Hence various
300well-formedness and type safety checks are performed once between
301\textsf{Clight} and \textsf{Cminor}, and the invariants rule out any
302difficulties in the later stages.
303With the benefit of hindsight we would have included an initial
304checking phase to produce a `well-formed' variant of \textsf{Clight},
305conjecturing that this would simplify various parts of the proofs for
306the \textsf{Clight} stages which deal with potentially ill-formed
309Following D2.2, we previous generated code for global variable
310initialisation in \textsf{Cminor}, for which we reserved a cost label
311to represent the execution time for initialisation.  However, the
312back-end must also add an initial call to the main function, whose
313cost must also be accounted for, so we decided to move the
314initialisation code to the back-end and merge the costs.
316\subsection{Main goals}
318Informally, our main intensional result links the time difference in a source
319code execution to the time difference in the object code, expressing the time
320for the source by summing the values for the cost labels in the trace, and the
321time for the target by a clock built in to the 8051 executable semantics.
323The availability of precise timing information for 8501
324implementations and the design of the compiler allow it to give exact
325time costs in terms of processor cycles, not just upper bounds.
326However, these exact results are only available if the subtrace we
327measure starts and ends at suitable points.  In particular, pure
328computation with no observable effects may be reordered and moved past
329cost labels, so we cannot measure time between arbitrary statements in
330the program.
332There is also a constraint on the subtraces that we
333measure due to the requirements of the correctness proof for the
334object code timing analysis.  To be sure that the timings are assigned
335to the correct cost label, we need to know that each return from a
336function call must go to the correct return address.  It is difficult
337to observe this property locally in the object code because it relies
338on much earlier stages in the compiler.  To convey this information to
339the timing analysis extra structure is imposed on the subtraces, which
340is described in Section~\ref{sec:fegoals}.
342% Regarding the footnote, would there even be much point?
343% TODO: this might be quite easy to add ('just' subtract the
344% measurable subtrace from the second label to the end).  Could also
345% measure other traces in this manner.
346These restrictions are reflected in the subtraces that we give timing
347guarantees on; they must start at a cost label and end at the return
348of the enclosing function of the cost label\footnote{We expect that
349  this would generalise to more general subtraces by subtracting costs
350  for unwanted measurable suffixes of a measurable subtrace.}.  A
351typical example of such a subtrace is the execution of an entire
352function from the cost label at the start of the function until it
353returns.  We call such any such subtrace \emph{measurable} if it (and
354the prefix of the trace from the start to the subtrace) can also be
355executed within the available stack space.
357Now we can give the main intensional statement for the compiler.
358Given a \emph{measurable} subtrace for a labelled \textsf{Clight}
359program, there is a subtrace of the 8051 object code program where the
360time differences match.  Moreover, \emph{observable} parts of the
361trace also match --- these are the appearance of cost labels and
362function calls and returns.
366More formally, the definition of this statement in \matita{} is
368definition simulates :=
369  $\lambda$p: compiler_output.
370  let initial_status := initialise_status $...$ (cm (c_labelled_object_code $...$ p)) in
371  $\forall$m1,m2.
372   measurable Clight_pcs (c_labelled_clight $...$ p) m1 m2
373       (stack_sizes (c_stack_cost $...$ p)) (c_max_stack $...$ p) $\rightarrow$
374  $\forall$c1,c2.
375   clock_after Clight_pcs (c_labelled_clight $...$ p) m1 (c_clight_cost_map $...$ p) = OK $...$ c1 $\rightarrow$
376   clock_after Clight_pcs (c_labelled_clight $...$ p) (m1+m2) (c_clight_cost_map $...$ p) = OK $...$ c2 $\rightarrow$
377  $\exists$n1,n2.
378   observables Clight_pcs (c_labelled_clight $...$ p) m1 m2 =
379     observables (OC_preclassified_system (c_labelled_object_code $...$ p))
380          (c_labelled_object_code $...$ p) n1 n2
381  $\wedge$
382   clock ?? (execute (n1+n2) ? initial_status) =
383     clock ?? (execute n1 ? initial_status) + (c2-c1).
385where the \lstinline'measurable', \lstinline'clock_after' and
386\lstinline'observables' definitions are generic definitions for multiple
387languages; in this case the \lstinline'Clight_pcs' record applies them
388to \textsf{Clight} programs.
390There is a second part to the statement, which says that the initial
391processing of the input program to produce the cost labelled version
392does not affect the semantics of the program:
393% Yes, I'm paraphrasing the result a tiny bit to remove the observe non-function
395  $\forall$input_program,output.
396  compile input_program = return output $\rightarrow$
397  not_wrong $...$ (exec_inf $...$ clight_fullexec input_program) $\rightarrow$
398  sim_with_labels
399   (exec_inf $...$ clight_fullexec input_program)
400   (exec_inf $...$ clight_fullexec (c_labelled_clight $...$ output))
402That is, any successful compilation produces a labelled program that
403has identical behaviour to the original, so long as there is no
404`undefined behaviour'.
406Note that this statement provides full functional correctness, including
407preservation of (non-)termination.  The intensional result above does
408not do this directly --- it does not guarantee the same result or same
409termination.  There are two mitigating factors, however: first, to
410prove the intensional property you need local simulation results --- these
411can be pieced together to form full behavioural equivalence, only time
412constraints have prevented us from doing so.  Second, if we wish to
413confirm a result, termination, or non-termination we could add an
414observable witness, such as a function that is only called if the
415correct result is given.  The intensional result guarantees that the
416observable witness is preserved, so the program must behave correctly.
418These two results are combined in the the \lstinline'correct'
419theorem in the file \lstinline''.
421\section{Goals for the front-end}
424The essential parts of the intensional proof were outlined during work
425on a toy compiler in Task
4262.1~\cite{d2.1,springerlink:10.1007/978-3-642-32469-7_3}.  These are
428\item functional correctness, in particular preserving the trace of
429  cost labels,
430\item the \emph{soundness} and \emph{precision} of the cost labelling
431  on the object code, and
432\item the timing analysis on the object code produces a correct
433  mapping from cost labels to time.
436However, that toy development did not include function calls.  For the
437full \cerco{} compiler we also need to maintain the invariant that
438functions return to the correct program location in the caller, as we
439mentioned in the previous section.  During work on the back-end timing
440analysis (describe in more detail in the companion deliverable, D4.4)
441the notion of a \emph{structured trace} was developed to enforce this
442return property, and also most of the cost labelling properties too.
448\caption{The compiler and proof outline}
452Jointly, we generalised the structured traces to apply to any of the
453intermediate languages which have some idea of program counter.  This means
454that they are introduced part way through the compiler, see
455Figure~\ref{fig:compiler}.  Proving that a structured trace can be
456constructed at \textsf{RTLabs} has several virtues:
458\item This is the first language where every operation has its own
459  unique, easily addressable, statement.
460\item Function calls and returns are still handled implicitly in the
461  language and so the structural properties are ensured by the
462  semantics.
463\item Many of the back-end languages from \textsf{RTL} onwards share a common
464  core set of definitions, and using structured traces throughout
465  increases this uniformity.
472\caption{Nesting of functions in structured traces}
475A structured trace is a mutually inductive data type which
476contains the steps from a normal program trace, but arranged into a
477nested structure which groups entire function calls together and
478aggregates individual steps between cost labels (or between the final
479cost label and the return from the function), see
480Figure~\ref{fig:strtrace}.  This captures the idea that the cost labels
481only represent costs \emph{within} a function --- calls to other
482functions are accounted for in the nested trace for their execution, and we
483can locally regard function calls as a single step.
485These structured traces form the core part of the intermediate results
486that we must prove so that the back-end can complete the main
487intensional result stated above.  In full, we provide the back-end
490\item A normal trace of the \textbf{prefix} of the program's execution
491  before reaching the measurable subtrace.  (This needs to be
492  preserved so that we know that the stack space consumed is correct,
493  and to set up the simulation results.)
494\item The \textbf{structured trace} corresponding to the measurable
495  subtrace.
496\item An additional property about the structured trace that no
497  `program counter' is \textbf{repeated} between cost labels.  Together with
498  the structure in the trace, this takes over from showing that
499  cost labelling is sound and precise.
500\item A proof that the \textbf{observables} have been preserved.
501\item A proof that the \textbf{stack limit} is still observed by the prefix and
502  the structure trace.  (This is largely a consequence of the
503  preservation of observables.)
505The \lstinline'front_end_correct' lemma in the
506\lstinline'' file provides a record containing these.
508Following the outline in Figure~\ref{fig:compiler}, we will first deal
509with the transformations in \textsf{Clight} that produce the source
510program with cost labels, then show that measurable traces can be
511lifted to \textsf{RTLabs}, and finally show that we can construct the
512properties listed above ready for the back-end proofs.
514\section{Input code to cost labelled program}
517As explained on page~\pageref{page:switchintro}, the costs of complex
518C \lstinline[language=C]'switch' statements cannot be represented with
519the simple labelling used in the formalised compiler.  Our first pass
520replaces these statements with simpler C code, allowing our second
521pass to perform the cost labelling.  We show that the behaviour of
522programs is unchanged by these passes using forward
523simulations\footnote{All of our languages are deterministic, which can
524be seen directly from their executable definitions.  Thus we know that
525forward simulations are sufficient because the target cannot have any
526other behaviour.}.
528\subsection{Switch removal}
530We compile away \lstinline[language=C]'switch' statements into more
531basic \textsf{Clight} code.
532Note that this transformation does not necessarily deteriorate the
533efficiency of the generated code. For instance, compilers such as GCC
534introduce balanced trees of ``if-then-else'' constructs for small
535switches. However, our implementation strategy is much simpler. Let
536us consider the following input statement.
539   switch(e) {
540   case v1:
541     stmt1;
542   case v2:
543     stmt2;
544   default:
545     stmt_default;
546   }
549Note that \textsf{stmt1}, \textsf{stmt2}, \ldots \textsf{stmt\_default}
550may contain \lstinline[language=C]'break' statements, which have the
551effect of exiting the switch statement. In the absence of break, the
552execution falls through each case sequentially. In our implementation,
553we produce an equivalent sequence of ``if-then'' chained by gotos:
555   fresh = e;
556   if(fresh == v1) {
557     $\llbracket$stmt1$\rrbracket$;
558     goto lbl_case2;
559   };
560   if(fresh == v2) {
561     lbl_case2:
562     $\llbracket$stmt2$\rrbracket$;
563     goto lbl_case2;
564   };
565   $\llbracket$stmt_default$\rrbracket$;
566   exit_label:
569The proof had to tackle the following points:
571\item the source and target memories are not the same (due to the fresh variable),
572\item the flow of control is changed in a non-local way (e.g. \textbf{goto} 
573  instead of \textbf{break}).
575In order to tackle the first point, we implemented a version of memory
576extensions similar to CompCert's.
578For the simulation we decided to prove a sufficient amount to give us
579confidence in the definitions and approach, but curtail the proof
580because this pass does not contribute to the intensional correctness
581result.  We tackled several simple cases, that do not interact with
582the switch removal per se, to show that the definitions were usable,
583and part of the switch case to check that the approach is
584reasonable. This comprises propagating the memory extension through
585each statement (except switch), as well as various invariants that are
586needed for the switch case (in particular, freshness hypotheses). The
587details of the evaluation process for the source switch statement and
588its target counterpart can be found in the file
589\lstinline'', along more details on the transformation
592Proving the correctness of the second point would require reasoning on the
593semantics of \lstinline[language=C]'goto' statements. In the \textsf{Clight} 
594semantics, this is implemented as a function-wide lookup of the target label.
595The invariant we would need is the fact that a global label lookup on a freshly
596created goto is equivalent to a local lookup. This would in turn require the
597propagation of some freshness hypotheses on labels. For reasons expressed above,
598we decided to omit this part of the correctness proof.
600\subsection{Cost labelling}
602The simulation for the cost labelling pass is the simplest in the
603front-end.  The main argument is that any step of the source program
604is simulated by the same step of the labelled one, plus any extra
605steps for the added cost labels.  The extra instructions do not change
606the memory or local environments, although we have to keep track of
607the extra instructions that appear in continuations, for example
608during the execution of a \lstinline[language=C]'while' loop.
610We do not attempt to capture any cost properties of the labelling in
611the simulation proof\footnote{We describe how the cost properties are
612established in Section~\ref{sec:costchecks}.}, allowing the proof to be oblivious to the choice
613of cost labels.  Hence we do not have to reason about the threading of
614name generation through the labelling function, greatly reducing the
615amount of effort required.
617%TODO: both give one-step-sim-by-many forward sim results; switch
618%removal tricky, uses aux var to keep result of expr, not central to
619%intensional correctness so curtailed proof effort once reasonable
620%level of confidence in code gained; labelling much simpler; don't care
621%what the labels are at this stage, just need to know when to go
622%through extra steps.  Rolled up into a single result with a cofixpoint
623%to obtain coinductive statement of equivalence (show).
625\section{Finding corresponding measurable subtraces}
628There follow the three main passes of the front-end:
630\item simplification of casts in \textsf{Clight} code
631\item \textsf{Clight} to \textsf{Cminor} translation, performing stack
632  variable allocation and simplifying control structures
633\item transformation to \textsf{RTLabs} control flow graph
635We have taken a common approach to
636each pass: first we build (or axiomatise) forward simulation results
637that are similar to normal compiler proofs, but which are slightly more
638fine-grained so that we can see that the call structure and relative
639placement of cost labels is preserved.
641Then we instantiate a general result which shows that we can find a
642\emph{measurable} subtrace in the target of the pass that corresponds
643to the measurable subtrace in the source.  By repeated application of
644this result we can find a measurable subtrace of the execution of the
645\textsf{RTLabs} code, suitable for the construction of a structured
646trace (see Section~\ref{sec:structuredtrace}).  This is essentially an
647extra layer on top of the simulation proofs that provides us with the
648additional information required for our intensional correctness proof.
650\subsection{Generic measurable subtrace lifting proof}
652Our generic proof is parametrised on a record containing small-step
653semantics for the source and target language, a classification of
654states (the same form of classification is used when defining
655structured traces), a simulation relation which respects the
656classification and cost labelling and
657four simulation results.  The simulations are split by the starting state's
658classification and whether it is a cost label, which will allow us to
659observe that the call structure is preserved.  They are:
661\item a step from a `normal' state (which is not classified as a call
662  or return) which is not a cost label is simulated by zero or more
663  `normal' steps;
664\item a step from a `call' state followed by a cost label step is
665  simulated by a step from a `call' state, a corresponding label step,
666  then zero or more `normal' steps;
667\item a step from a `call' state not followed by a cost label
668  similarly (note that this case cannot occur in a well-labelled
669  program, but we do not have enough information locally to exploit
670  this); and
671\item a cost label step is simulated by a cost label step.
673Finally, we need to know that a successfully translated program will
674have an initial state in the simulation relation with the original
675program's initial state.
681\caption{Tiling of simulation for a measurable subtrace}
685To find the measurable subtrace in the target program's execution we
686walk along the original program's execution trace applying the
687appropriate simulation result by induction on the number of steps.
688While the number of steps taken varies, the overall structure is
689preserved, as illustrated in Figure~\ref{fig:tiling}.  By preserving
690the structure we also maintain the same intensional observables.  One
691delicate point is that the cost label following a call must remain
692directly afterwards\footnote{The prototype compiler allowed some
693  straight-line code to appear before the cost label until a later
694  stage of the compiler, but we must move the requirement forward to
695  fit with the structured traces.}
696% Damn it, I should have just moved the cost label forwards in RTLabs,
697% like the prototype does in RTL to ERTL; the result would have been
698% simpler.  Or was there some reason not to do that?
699(both in the program code and in the execution trace), even if we
700introduce extra steps, for example to store parameters in memory in
701\textsf{Cminor}.  Thus we have a version of the call simulation
702that deals with both in one result.
704In addition to the subtrace we are interested in measuring we must
705also prove that the earlier part of the trace is also preserved in
706order to use the simulation from the initial state.  It also
707guarantees that we do not run out of stack space before the subtrace
708we are interested in.  The lemmas for this prefix and the measurable
709subtrace are similar, following the pattern above.  However, the
710measurable subtrace also requires us to rebuild the termination
711proof.  This is defined recursively:
714  let rec will_return_aux C (depth:nat)
715                               (trace:list (cs_state $...$ C × trace)) on trace : bool :=
716  match trace with
717  [ nil $\Rightarrow$ false
718  | cons h tl $\Rightarrow$
719    let $\langle$s,tr$\rangle$ := h in
720    match cs_classify C s with
721    [ cl_call $\Rightarrow$ will_return_aux C (S depth) tl
722    | cl_return $\Rightarrow$
723        match depth with
724        [ O $\Rightarrow$ match tl with [ nil $\Rightarrow$ true | _ $\Rightarrow$ false ]
725        | S d $\Rightarrow$ will_return_aux C d tl
726        ]
727    | _ $\Rightarrow$ will_return_aux C depth tl
728    ]
729  ].
731The \lstinline'depth' is the number of return states we need to see
732before we have returned to the original function (initially zero) and
733\lstinline'trace' the measurable subtrace obtained from the running
734the semantics for the correct number of steps.  This definition
735unfolds tail recursively for each step, and once the corresponding
736simulation result has been applied a new one for the target can be
737asserted by unfolding and applying the induction hypothesis on the
738shorter trace.
This then gives us an overall result for any simulation fitting the
requirements above (contained in the \lstinline'meas_sim' record):
theorem measured_subtrace_preserved :
  $\forall$MS:meas_sim.
  $\forall$p1,p2,m,n,stack_cost,max.
  ms_compiled MS p1 p2 $\rightarrow$
  measurable (ms_C1 MS) p1 m n stack_cost max $\rightarrow$
  $\exists$m',n'.
    measurable (ms_C2 MS) p2 m' n' stack_cost max $\wedge$
    observables (ms_C1 MS) p1 m n = observables (ms_C2 MS) p2 m' n'.
The stack space requirement that is embedded in \lstinline'measurable'
is a consequence of the preservation of observables, because it is
determined by the functions called and returned from, which are observable.
\subsection{Simulation results for each pass}
We now consider the simulation results for the passes, each of which
is used to instantiate the
\lstinline[language=matita]'measured_subtrace_preserved' theorem to
construct the measurable subtrace for the next language.
\subsubsection{Cast simplification}
The parser used in \cerco{} introduces a lot of explicit type casts.
If left as they are, these constructs can greatly hamper the
quality of the generated code -- especially as the architecture
we consider is an $8$-bit one. In \textsf{Clight}, casts are
expressions, hence most of the work of this transformation
is performed on expressions. The transformation proceeds by recursively
trying to coerce an expression to a particular integer type, which
is in practice smaller than the original one. This functionality
is implemented by two mutually recursive functions whose signatures
are the following.
let rec simplify_expr (e:expr) (target_sz:intsize) (target_sg:signedness)
  : $\Sigma$result:bool$\times$expr.
    $\forall$ge,en,m. simplify_inv ge en m e (\snd result) target_sz target_sg (\fst result) := $\ldots$

and simplify_inside (e:expr) : $\Sigma$result:expr. conservation e result := $\ldots$
The \textsf{simplify\_inside} function acts as a wrapper for
\textsf{simplify\_expr}. Whenever \textsf{simplify\_inside} encounters
an \textsf{Ecast} expression, it tries to coerce the sub-expression
to the desired type using \textsf{simplify\_expr}, which attempts to
perform the actual coercion. In return, \textsf{simplify\_expr} calls
back \textsf{simplify\_inside} in some particular positions, where we
decided to be conservative in order to simplify the proofs. However,
the current design allows us to move incrementally to a more aggressive
version, by replacing recursive calls to \textsf{simplify\_inside} with
calls to \textsf{simplify\_expr} \emph{and} proving the corresponding
invariants -- where possible.
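The idea can be conveyed with a toy model (hypothetical Python, far simpler than the Matita development: only signed constants and casts are modelled, and no invariants are proved).  \textsf{simplify\_expr} reports with a flag whether an expression could be coerced to a smaller size, and \textsf{simplify\_inside} rewrites the casts it finds.

```python
# Toy model: an expression is ('const', size_bits, value) or
# ('cast', size_bits, sub_expression).  Only signed types are modelled.

def fits(value, size_bits):
    """Does the value fit in a signed integer of the given width?"""
    return -(2 ** (size_bits - 1)) <= value <= 2 ** (size_bits - 1) - 1

def simplify_expr(expr, target_size):
    """Try to coerce expr to target_size; return (success, expr').
    On success expr' has the target size and the same value; on
    failure expr is returned unchanged."""
    if expr[0] == 'const':
        _, _, value = expr
        if fits(value, target_size):
            return True, ('const', target_size, value)
        return False, expr
    if expr[0] == 'cast':
        _, _, inner = expr
        ok, inner2 = simplify_expr(inner, target_size)
        if ok:
            return True, inner2      # the explicit cast becomes redundant
        return False, expr
    return False, expr               # conservative on other expressions

def simplify_inside(expr):
    """Wrapper: at each cast, attempt the coercion of the sub-expression."""
    if expr[0] == 'cast':
        _, size, inner = expr
        ok, inner2 = simplify_expr(inner, size)
        return inner2 if ok else ('cast', size, simplify_inside(inner))
    return expr
```

For instance, a 32-bit constant 42 cast to 8 bits is rewritten to an 8-bit constant, while the same cast of 300 is kept, since 300 does not fit in a signed byte.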
The \textsf{simplify\_inv} invariant encodes either the preservation
of the semantics when the transformation fails
(the \textsf{Inv\_eq} constructor), or the successful
downcast of the considered expression to the target type
inductive simplify_inv
  (ge : genv) (en : env) (m : mem)
  (e1 : expr) (e2 : expr) (target_sz : intsize) (target_sg : signedness) : bool $\rightarrow$ Prop :=
| Inv_eq : $\forall$result_flag. $\ldots$
     simplify_inv ge en m e1 e2 target_sz target_sg result_flag
| Inv_coerce_ok : $\forall$src_sz,src_sg.
     typeof e1 = Tint src_sz src_sg $\rightarrow$
     typeof e2 = Tint target_sz target_sg $\rightarrow$
     smaller_integer_val src_sz target_sz src_sg (exec_expr ge en m e1) (exec_expr ge en m e2) $\rightarrow$
     simplify_inv ge en m e1 e2 target_sz target_sg true.
The \textsf{conservation} invariant for \textsf{simplify\_inside} simply states the conservation
of the semantics, as in the \textsf{Inv\_eq} constructor of the previous
definition conservation := $\lambda$e,result. $\forall$ge,en,m.
          res_sim ? (exec_expr ge en m e) (exec_expr ge en m result)
       $\wedge$ res_sim ? (exec_lvalue ge en m e) (exec_lvalue ge en m result)
       $\wedge$ typeof e = typeof result.
This invariant is then easily lifted to statement evaluations.
The main problem encountered with this particular pass was dealing with
inconsistently typed programs, a canonical case being an
integer constant of a certain size typed with another size. This
forced us to introduce numerous type checks, making
both the implementation and the proof more complex, even though more
comprehensive checks are made in the next stage.
%\todo{Make this a particular case of the more general statement on baking more invariants in the Clight language}
\subsubsection{Clight to Cminor}
This pass is the last one operating on the \textsf{Clight} language.
Its input is a full \textsf{Clight} program, and its output is a
\textsf{Cminor} program. Note that we do not use an equivalent of
CompCert's \textsf{C\#minor} language: we translate directly to a
variant of \textsf{Cminor}. This has the advantage of requiring
neither the special loop constructs nor the explicit block
structure. Another salient point of our approach is that a significant
number of the properties needed for the simulation proof were directly
encoded in dependently typed translation functions.  In particular,
freshness conditions and well-typedness conditions are included. The
main effects of the transformation from \textsf{Clight} to
\textsf{Cminor} are listed below.
\item Variables are classified as being either globals, stack-allocated
  locals or potentially register-allocated locals. The values of register-allocated
  local variables are moved out of the modelled memory and stored in a
  dedicated environment.
\item In \textsf{Clight}, each local variable has a dedicated memory block, whereas
  stack-allocated locals are bundled together on a function-by-function basis.
\item Loops are converted to jumps.
The first two points require memory injections which are more flexible than those
needed in the switch removal case. In the remainder of this section, we briefly
discuss our implementation of memory injections, and then the simulation proof.
\paragraph{Memory injections.}
Our memory injections are modelled after the work of Blazy \& Leroy.
However, the corresponding paper is based on the first version of the
CompCert memory model, whereas we use a much more concrete model, allowing byte-level
manipulations (as in the later version of CompCert's memory model). We proved
roughly 80\% of the required lemmas. Some notable difficulties were
due to some overly relaxed conditions on pointer validity (fixed during development).
Some more side conditions had to be added to take care of possible overflows when converting
from \textbf{Z} block bounds to $16$-bit pointer offsets (in practice, these overflows
only occur in edge cases that are easily ruled out -- but this fact is not visible
in the memory injections). In the end, some of the lemmas on the preservation of the
simulation of loads after writes were axiomatised, due to a lack of time.
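The flavour of the offset side conditions can be conveyed with a toy model (hypothetical Python; the real development works over the concrete byte-level memory model and proves simulation of loads and stores): an injection maps source blocks to a target block plus a shift, and translating an address fails if the block is unmapped or the shifted offset no longer fits in a 16-bit pointer offset.

```python
# Toy model of a memory injection: blocks are integers, an address is a
# (block, offset) pair, and the injection maps source blocks to a target
# block plus a shift.  The 16-bit bound reflects the overflow side
# conditions mentioned above.

PTR_OFFSET_MAX = 2 ** 16 - 1

def inject_addr(injection, addr):
    """Translate a source address through the injection, or return None
    if the block is unmapped or the shifted offset overflows."""
    block, offset = addr
    if block not in injection:
        return None
    target_block, shift = injection[block]
    new_offset = offset + shift
    if not (0 <= new_offset <= PTR_OFFSET_MAX):
        return None   # overflow: the edge case the side conditions rule out
    return (target_block, new_offset)
```

Bundling several \textsf{Clight} locals into one \textsf{Cminor} stack block then amounts to mapping their blocks to the same target block with distinct shifts.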
\paragraph{Simulation proof.}
We proved the simulation result for expressions and a representative
selection of statements.  In particular we tackled
\lstinline[language=C]'while' statements to ensure that we correctly
translate loops, because our approach differs from CompCert's by
converting directly to \textsf{Cminor} \lstinline[language=C]'goto's
rather than maintaining a notion of loops.  We also have a partial
proof for function entry, covering the setup of the memory injection,
but not function exit.  Exits, and the remaining statements, have been
Careful management of the proof state was required due to the proof
terms that are embedded in \textsf{Cminor} code to show that some
invariants are respected.  These proof terms can be large and awkward,
and while generalising them away is usually sufficient, it can be
difficult when they appear under a binder.
%The correctness proof for this transformation was not completed. We proved the
%simulation result for expressions and for some subset of the critical statement cases.
%Notably lacking are the function entry and exit, where the memory injection is
%properly set up. As would be expected, a significant amount of work has to be performed
%to show the conservation of all invariants at each simulation step.
%\todo{list cases, explain while loop, explain labeling problem}
\subsubsection{Cminor to RTLabs}
The translation from \textsf{Cminor} to \textsf{RTLabs} is a fairly
routine control flow graph (CFG) construction.  As such, we chose to
axiomatise its simulation results.  However, we did prove several
properties of the generated programs:
\item All statements are type correct with respect to the declared
  pseudo-register type environment.
\item The CFG is closed, and has a distinguished entry node and a
  unique exit node.
These properties rely on similar properties about type safety and the
presence of \lstinline[language=C]'goto'-labels for \textsf{Cminor} programs,
which are checked at the preceding stage.  As a result, this
transformation is total, and any compilation failure must occur earlier,
while the corresponding \textsf{Clight} source is still available and a better
error message can be generated.
The proof obligations for these properties include many instances of
graph inclusion.  We automated these proofs using a small amount of
reflection, making the obligations much easier to handle.  One
drawback to enforcing invariants thoroughly is that temporarily
breaking them can be awkward.  For example, \lstinline'return'
statements were originally used as placeholders for
\lstinline[language=C]'goto' destinations that had not yet been
translated.  However, this made establishing the single exit node
property rather difficult, and a different placeholder was chosen
instead.  In other circumstances it is possible to prove a more
complex invariant, then simplify it at the end of the transformation.
\section{Checking cost labelling properties}
Ideally, we would provide proofs that the cost labelling pass always
produces programs that are soundly and precisely labelled and that
each subsequent pass preserves these properties.  This would match our
use of dependent types to eliminate impossible sources of errors
during compilation, in particular retaining intermediate language type
However, given the limited amount of time available we realised that
implementing a compile-time check for a sound and precise labelling of
the \textsf{RTLabs} intermediate code would reduce the proof burden
considerably.  This is similar in spirit to the use of translation
validation in certified compilation, which makes a similar trade-off
between the potential for compile-time failure and the volume of proof
The check cannot be pushed into a later stage of the compiler because
much of the information is embedded into the structured traces.
However, if an alternative method were used to show that function
returns in the compiled code are sufficiently well-behaved, then we
could consider pushing the cost property checks into the timing
analysis itself.  We leave this as a possible area for future work.
\subsection{Implementation and correctness}
For a cost labelling to be sound and precise we need a cost label at
the start of each function, after each branch and at least one in
every loop.  The first two parts are trivial to check by examining the
code.  In \textsf{RTLabs} the last part is specified by saying
that there is a bound on the number of successive instruction nodes in
the CFG that you can follow before you encounter a cost label, and
checking this is more difficult.
The implementation progresses through the set of nodes in the graph,
following successors until a cost label is found or a label-free cycle
is discovered (in which case the property does not hold and we stop).
This is made easier by the prior knowledge that any branch is followed
by cost labels, so we do not need to search each branch.  When a label
is found, we remove the chain from the set and continue from another
node in the set until it is empty, at which point we know that there
is a bound for every node in the graph.
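The search can be sketched as follows (hypothetical Python over a toy CFG encoding; the actual check is implemented in Matita on the \textsf{RTLabs} graph).  Relying on the previously checked fact that every branch is followed by cost labels, only one successor needs to be followed from each node.

```python
# Sketch of the bounded-cost-label check: cfg maps each node to
# (is_cost_label, successors).  A node with no successors is the exit.

def check_bounded(cfg):
    """Return True iff from every node a cost label (or the exit) is
    reached after finitely many steps, i.e. there is no label-free cycle."""
    cleared = set()            # nodes known to reach a label
    for start in cfg:
        chain = []
        seen = set()
        node = start
        while True:
            if node in cleared:
                break          # joins a chain already known to be fine
            is_label, succs = cfg[node]
            if is_label or not succs:
                break          # reached a cost label or the exit node
            if node in seen:
                return False   # label-free cycle discovered
            seen.add(node)
            chain.append(node)
            node = succs[0]    # branch targets are labelled, one suffices
        cleared.update(chain)  # remove the whole chain from consideration
    return True
```

A graph whose only cycle passes through a labelled node is accepted; a two-node label-free loop is rejected.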
Directly reasoning about the function that implements this would be
rather awkward, so an inductive specification of a single step of its
behaviour was written and proved to match the implementation.  This
was then used to prove the implementation sound and complete.
While we have not attempted to prove that the cost labelling properties
are established and preserved earlier in the compiler, we expect that
the effort for the \textsf{Cminor} to \textsf{RTLabs} stage alone
would be similar to the work outlined above, because it involves the
change from requiring a cost label at particular positions to
requiring cost labels to break loops in the CFG.  As there are another
three passes to consider (including the labelling itself), we believe
that using the check above is much simpler overall.
% TODO? Found some Clight to Cminor bugs quite quickly
\section{Existence of a structured trace}
The \emph{structured traces} introduced in Section~\ref{sec:fegoals} enrich the execution trace of a program by
nesting function calls in a mixed-step style and embedding the cost labelling
properties of the program.  The notion was originally designed to support the
proof of correctness for the timing analysis of the object code in the
back-end, and was then generalised to provide a common structure to use from
the end of the front-end to the object code.  See
Figure~\ref{fig:strtrace} on page~\pageref{fig:strtrace} for an
illustration of a structured trace.
To make the definition generic we abstract over the semantics of the
record abstract_status : Type[1] :=
  { as_status :> Type[0]
  ; as_execute : relation as_status
  ; as_pc : DeqSet
  ; as_pc_of : as_status $\rightarrow$ as_pc
  ; as_classify : as_status $\rightarrow$ status_class
  ; as_label_of_pc : as_pc $\rightarrow$ option costlabel
  ; as_after_return : ($\Sigma$s:as_status. as_classify s = cl_call) $\rightarrow$ as_status $\rightarrow$ Prop
  ; as_result: as_status $\rightarrow$ option int
  ; as_call_ident : ($\Sigma$s:as_status.as_classify s = cl_call) $\rightarrow$ ident
  ; as_tailcall_ident : ($\Sigma$s:as_status.as_classify s = cl_tailcall) $\rightarrow$ ident
  }.
which gives a type of states, an execution relation\footnote{All of
  our semantics are executable, but using a relation was simpler in
  the abstraction.}, some notion of
program counters with decidable equality, the classification of states,
and functions to extract the observable intensional information (cost
labels and the identity of functions that are called).  The
\lstinline'as_after_return' property links the state before a function
call with the state after return, providing the evidence that
execution returns to the correct place.  The precise form varies
between stages; in \textsf{RTLabs} it insists that the CFG, the pointer to
the CFG node to execute next, and some call stack information are
The structured traces are defined using three mutually inductive
types.  The core data structure is \lstinline'trace_any_label', which
captures some straight-line execution until the next cost label or the
return from the enclosing function.  Any function call is embedded as
a single step, with its own trace nested inside and the before and
after states linked by \lstinline'as_after_return'; and states
classified as a `jump' (in particular branches) must be followed by a
cost label.
The second type, \lstinline'trace_label_label', is a
\lstinline'trace_any_label' where the initial state is cost labelled.
Thus a trace in this type identifies a series of steps whose cost is
entirely accounted for by the label at the start.
Finally, \lstinline'trace_label_return' is a sequence of
\lstinline'trace_label_label' values which end in the return from the
function.  These correspond to a measurable subtrace, and in
particular include executions of an entire function call (and so are
used for the nested calls in \lstinline'trace_any_label').
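The shape of these three types can be modelled executably (a hypothetical Python simplification that keeps only the nesting structure, dropping the states, the semantics of cost labels and all the dependent invariants):

```python
# Simplified shape of the three mutually defined trace types:
#  - trace_any_label: a list of steps, where a step is either an
#    ordinary instruction or a ('call', nested) pair carrying the
#    called function's own trace_label_return;
#  - trace_label_label: a pair (cost_label, steps);
#  - trace_label_return: a list of trace_label_label sections, ending
#    at the function's return.

def total_steps(tlr):
    """Count all steps in a trace_label_return, descending into the
    traces nested at call steps."""
    count = 0
    for label, steps in tlr:
        for step in steps:
            if isinstance(step, tuple) and step[0] == 'call':
                count += 1 + total_steps(step[1])  # call step + nested trace
            else:
                count += 1
    return count
```

For example, a function whose single labelled section performs a step, calls a one-step function, and performs another step contributes four steps in total.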
The construction of the structured trace replaces syntactic cost
labelling properties, which place requirements on where labels appear
in the program, with semantic properties that constrain the execution
traces of the program.  The construction begins by defining versions
of the sound and precise labelling properties on states and global
environments (for the code that appears in each of them) rather than
whole programs, and showing that these are preserved by steps of the
\textsf{RTLabs} semantics.
Then we show that each cost labelling property the structured traces
definition requires is locally satisfied.  These are broken up by the
classification of states.  Similarly, we prove a step-by-step stack
preservation result, which states that the \textsf{RTLabs} semantics
never changes the lower parts of the stack.
The core part of the construction of a structured trace is to use the
proof of termination from the measurable trace (defined on
page~\pageref{prog:terminationproof}) to `fold up' the execution into
the nested form.  The results outlined above fill in the proof
obligations for the cost labelling properties and the stack
preservation result shows that calls return to the correct location.
The structured trace alone is not sufficient to capture the property
that the program is soundly labelled.  While the structured trace
guarantees termination, it still permits a loop to be executed a
finite number of times without encountering a cost label.  We
eliminate this by proving that no `program counter' repeats within any
\lstinline'trace_any_label' section, because a repeat would be incompatible
with the property that there is a bound on the number of successor
instructions you can follow in the CFG before you encounter a cost
\subsubsection{Complete execution structured traces}
The development of the construction above started relatively early,
before the measurable subtrace preservation proofs.  To be confident
that the traces were well-formed at that time, we also developed a
complete execution form that embeds the traces above.  This includes
non-terminating program executions, where an infinite number of the terminating
structured traces are embedded.  This construction confirmed that our
definition of structured traces was consistent, although we later
found that we did not require the whole execution version for the
compiler correctness results.
To construct these we need to know whether each function call will
eventually terminate, requiring the use of the excluded middle.  This
classical reasoning is local to the construction of whole program
traces and is not necessary for our main results.
In combination with the work on the \cerco{} back-end and by
concentrating on the novel intensional parts of the proof, we have
shown that it is possible to construct certifying compilers that
correctly report execution time and stack space costs.  The layering
of intensional correctness proofs on top of normal simulation results
provides a useful separation of concerns, and could permit the reuse
of existing results.
The following table gives a high-level overview of the \matita{}
source files in Deliverable 3.4:
\lstinline'' & Top-level compiler definitions, in particular
\lstinline'front_end', and the whole compiler definition
\lstinline'compile'. \\
\lstinline'' & Correctness results: \lstinline'front_end_correct'
and \lstinline'correct', respectively. \\
\lstinline'Clight/*' & \textsf{Clight}: proofs for switch
removal, cost labelling, cast simplification and conversion to
\textsf{Cminor}. \\
\lstinline'Cminor/*' & \textsf{Cminor}: axioms of conversion to
\textsf{RTLabs}. \\
\lstinline'RTLabs/*' & \textsf{RTLabs}: definitions and proofs for
compile-time cost labelling checks, construction of structured traces.
\lstinline'common/' & Definitions for measurable
subtraces. \\
\lstinline'common/' & Generic measurable subtrace
lifting proof. \\
\lstinline'common/*' & Other common definitions relevant to many parts
of the compiler and proof. \\
\lstinline'utilities/*' & General purpose definitions used throughout,
including extensions to the standard \matita{} library.