source: Deliverables/D3.4/Report/report.tex @ 3231

Last change on this file since 3231 was 3231, checked in by campbell, 8 years ago

Final revisions to 3.4.

File size: 54.9 KB
14% LaTeX Companion, p 74
15\newcommand{\todo}[1]{\marginpar{\raggedright - #1}}
18  {keywords={Definition,Lemma,Theorem,Remark,Qed,Save,Inductive,Record},
19   morekeywords={[2]if,then,else},
20  }
23  {keywords={definition,lemma,theorem,remark,inductive,record,qed,let,rec,match,with,Type,and,on},
24   morekeywords={[2]whd,normalize,elim,cases,destruct},
25   mathescape=true,
26   morecomment=[n]{(*}{*)},
27  }
30        keywordstyle=\color{red}\bfseries,
31        keywordstyle=[2]\color{blue},
32        commentstyle=\color{green},
33        stringstyle=\color{blue},
34        showspaces=false,showstringspaces=false}
48\vspace*{1cm}Project FP7-ICT-2009-C-243881 \cerco{}}
50\date{ }
70Report n. D3.4\\
71Front-end Correctness Proofs\\
78Version 1.0
86Brian~Campbell, Ilias~Garnier, James~McKinna, Ian~Stark
92Project Acronym: \cerco{}\\
93Project full title: Certified Complexity\\
94Proposal/Contract no.: FP7-ICT-2009-C-243881 \cerco{}\\
96\clearpage \pagestyle{myheadings} \markright{\cerco{}, FP7-ICT-2009-C-243881}
100\section*{Executive Summary}
101\addcontentsline{toc}{section}{Executive Summary}
104\cerco{} Work Package 3, \emph{Verified Compiler - front end} aims
105for formalise and verify the front-end of the \cerco{} cost lifting
106compiler.  This document accompanies the final deliverable,
107\textbf{D3.4: Front-end Correctness Proofs}.  The deliverable consists
108of the formal correctness proofs for the front-end, written in the
109\matita{} proof assistant and this document provides report on the
110work carried out for it.
116We report on the correctness proofs for the front-end of the \cerco{}
117cost lifting compiler.  First, we identify the core result we wish to
118prove, which says that the we correctly predict the precise execution
119time for particular parts of the execution called \emph{measurable}
120subtraces.  Then we consider the three distinct parts of the task:
121showing that the \emph{annotated source code} output by the compiler
122has equivalent behaviour to the original input (up to the
123annotations); showing that a measurable subtrace of the
124annotated source code corresponds to an equivalent measurable subtrace
125in the code produced by the front-end, including costs; and finally
126showing that the enriched \emph{structured} execution traces required
127for cost correctness in the back-end can be constructed from the
128properties of the code produced by the front-end.
130A key part of our work is that the intensional correctness results which show
131that we get consistent cost measurements throughout the intermediate languages
132of the compiler can be layered on top of normal forward simulation results,
133if we split those results into local call-structure preserving simulations.
134This split allowed us to concentrate on the \textbf{intensional} proofs by
135axiomatising some of the extensional simulation results that are very similar to
136existing compiler correctness results, such as CompCert.
138This report is about the correctness results that are deliverable
139D3.4, which are about the formalised compiler described in D3.2, using
140the source language semantics from D3.1 and intermediate language
141semantics from D3.3.  It builds on earlier work on the correctness of
142a toy compiler built to test the labelling approach in D2.1. Together
143with the companion deliverable about the correctness of the back-end,
144D4.4, we obtain results about the whole formalised compiler.
150% CHECK: clear up any -ize vs -ise
151% CHECK: clear up any "front end" vs "front-end"
152% CHECK: clear up any mentions of languages that aren't textsf'd.
153% CHECK: fix unicode in listings
157The \cerco{} compiler produces a version of the source code containing
158annotations describing the timing behaviour of the object code, as
159well as the object code itself. It compiles C code, targeting
160microcontrollers implementing the Intel 8051 architecture.  There are
161two versions: first, an initial prototype was implemented in
162\ocaml{}~\cite{d2.2}, then a version was formalised in the \matita{}
163proof assistant~\cite{d3.2,d4.2} and extracted to \ocaml{} code to
164produce an executable compiler.  In this document we present results
165from Deliverable 3.4, the formalised proofs in \matita{} about the
166front-end of the latter version of the compiler (culminating in the
167\lstinline'front_end_correct' lemma), and describe how that fits
168into the verification of the whole compiler.
170A key part of this work was to layer the \emph{intensional} correctness
171results that show that the costs produced are correct on top of the
172proofs about the compiled code's \emph{extensional} behaviour (that is, the
173functional correctness of the compiler).  Unfortunately, the ambitious
174goal of completely verifying the entire compiler was not feasible
175within the time available, but thanks to this separation of
176extensional and intensional proofs we are able to axiomatise some extensional
177simulation results which are similar to those in other compiler verification
178projects and concentrate on the novel intensional proofs.  We were
179also able to add stack space costs to obtain a stronger result.  The
180proofs were made more tractable by introducing compile-time checks for
181the `sound and precise' cost labelling properties rather than proving
182that they are preserved throughout.
184The overall statement of correctness says that the annotated program has the
185same behaviour as the input, and that for any suitably well-structured part of
186the execution (which we call \emph{measurable}), the object code will execute
187the same behaviour taking precisely the time given by the cost annotations in
188the annotated source program.
190In the next section we recall the structure of the compiler and make the overall
191statement more precise.  Following that, in Section~\ref{sec:fegoals} we
192describe the statements we need to prove about the intermediate \textsf{RTLabs}
193programs for the back-end proofs.
194Section~\ref{sec:inputtolabelling} covers the compiler passes which produce the
195annotated source program and Section~\ref{sec:measurablelifting} the rest
196of the transformations in the front-end.  Then the compile-time checks
197for good cost labelling are detailed in Section~\ref{sec:costchecks}
198and the proofs that the structured traces required by the back-end
199exist are discussed in Section~\ref{sec:structuredtrace}.
201\section{The compiler and its correctness statement}
203The uncertified prototype \ocaml{} \cerco{} compiler was originally described
204in Deliverables 2.1 and 2.2.  Its design was replicated in the formal
205\matita{} code, which was presented in Deliverables 3.2 and 4.2, for
206the front-end and back-end respectively.
212\caption{Languages in the \cerco{} compiler}
216The compiler uses a number of intermediate languages, as outlined the
217middle two lines of Figure~\ref{fig:compilerlangs}.  The upper line
218represents the front-end of the compiler, and the lower the back-end,
219finishing with Intel 8051 binary code.  Not all of the front-end compiler passes
220introduce a new language, and Figure~\ref{fig:summary} presents a
221list of every pass involved.
227\quad \= $\downarrow$ \quad \= \kill
228\textsf{C} (unformalised)\\
229\> $\downarrow$ \> CIL parser (unformalised \ocaml)\\
231%\> $\downarrow$ \> add runtime functions\\
232\> $\downarrow$ \> \lstinline[language=C]'switch' removal\\
233\> $\downarrow$ \> labelling\\
234\> $\downarrow$ \> cast removal\\
235\> $\downarrow$ \> stack variable allocation and control structure
236 simplification\\
238%\> $\downarrow$ \> generate global variable initialization code\\
239\> $\downarrow$ \> transform to RTL graph\\
241\> $\downarrow$ \> check cost labelled properties of RTL graph\\
242\> $\downarrow$ \> start of target specific back-end\\
243\>\quad \vdots
247\caption{Front-end languages and compiler passes}
252The annotated source code is produced by the cost labelling phase.
253Note that there is a pass to replace C \lstinline[language=C]'switch'
254statements before labelling --- we need to remove them because the
255simple form of labelling used in the formalised compiler is not quite
256capable of capturing their execution time costs, largely due to C's
257`fall-through' behaviour where execution from one branch continues in
258the next unless there is an explicit \lstinline[language=C]'break'.
260The cast removal phase which follows cost labelling simplifies
261expressions to prevent unnecessary arithmetic promotion, which is
262specified by the C standard but costly for an 8-bit target.  The
263transformation to \textsf{Cminor} and subsequently \textsf{RTLabs}
264bear considerable resemblance to some passes of the CompCert
265compiler~\cite{Blazy-Leroy-Clight-09,Leroy-backend}, although we use a simpler \textsf{Cminor} where
266all loops use \lstinline[language=C]'goto' statements, and the
267\textsf{RTLabs} language retains a target-independent flavour.  The
268back-end takes \textsf{RTLabs} code as input.
270The whole compilation function returns the following information on success:
272  record compiler_output : Type[0] :=
273   { c_labelled_object_code: labelled_object_code
274   ; c_stack_cost: stack_cost_model
275   ; c_max_stack: nat
276   ; c_init_costlabel: costlabel
277   ; c_labelled_clight: clight_program
278   ; c_clight_cost_map: clight_cost_map
279   }.
281It consists of annotated 8051 object code, a mapping from function
282identifiers to the function's stack space usage, the space available for the
283stack after global variable allocation, a cost label covering the
284execution time for the initialisation of global variables and the call
285to the \lstinline[language=C]'main' function, the annotated source
286code, and finally a mapping from cost labels to actual execution time
289An \ocaml{} pretty printer is used to provide a concrete version of
290the output code and annotated source code.  In the case of the
291annotated source code, it also inserts the actual costs alongside the
292cost labels, and optionally adds a global cost variable and
293instrumentation to support further reasoning in external tools such as
296\subsection{Revisions to the prototype compiler}
298Our focus on intensional properties prompted us to consider whether we
299could incorporate stack space into the costs presented to the user.
300We only allocate one fixed-size frame per function, so modelling this
301was relatively simple.  It is the only form of dynamic memory
302allocation provided by the compiler, so we were able to strengthen the
303statement of the goal to guarantee successful execution whenever the
304stack space obeys the \lstinline'c_max_stack' bound calculated by
305subtracting the global variable requirements from the total memory
308The cost labelling checks at the end of Figure~\ref{fig:summary} have been
309introduced to reduce the proof burden, and are described in
312The use of dependent types to capture simple intermediate language
313invariants makes every front-end pass a total function, except
314\textsf{Clight} to \textsf{Cminor} and the cost checks.  Hence various
315well-formedness and type safety checks are performed only once between
316\textsf{Clight} and \textsf{Cminor}, and the invariants rule out any
317difficulties in the later stages.  With the benefit of hindsight we
318would have included an initial checking phase to produce a
319`well-formed' variant of \textsf{Clight}, conjecturing that this would
320simplify various parts of the proofs for the \textsf{Clight} stages
321which deal with potentially ill-formed code.
323Following D2.2, we previously generated code for global variable
324initialisation in \textsf{Cminor}, for which we reserved a cost label
325to represent the execution time for initialisation.  However, the
326back-end must also add an initial call to the main function, whose
327cost must also be accounted for, so we decided to move the
328initialisation code to the back-end and merge the costs.
330\subsection{Main correctness statement}
332Informally, our main intensional result links the time difference in a source
333code execution to the time difference in the object code, expressing the time
334for the source by summing the values for the cost labels in the trace, and the
335time for the target by a clock built in to the 8051 executable semantics.
337The availability of precise timing information for 8501
338implementations and the design of the compiler allow it to give exact
339time costs in terms of processor cycles, not just upper bounds.
340However, these exact results are only available if the subtrace we
341measure starts and ends at suitable points.  In particular, pure
342computation with no observable effects may be reordered and moved past
343cost labels, so we cannot measure time between arbitrary statements in
344the program.
346There is also a constraint on the subtraces that we
347measure due to the requirements of the correctness proof for the
348object code timing analysis.  To be sure that the timings are assigned
349to the correct cost label, we need to know that each return from a
350function call must go to the correct return address.  It is difficult
351to observe this property locally in the object code because it relies
352on much earlier stages in the compiler.  To convey this information to
353the timing analysis extra structure is imposed on the subtraces, which
354is described in Section~\ref{sec:fegoals}.
356% Regarding the footnote, would there even be much point?
357% TODO: this might be quite easy to add ('just' subtract the
358% measurable subtrace from the second label to the end).  Could also
359% measure other traces in this manner.
360These restrictions are reflected in the subtraces that we give timing
361guarantees on; they must start at a cost label and end at the return
362of the enclosing function of the cost label\footnote{We expect that
363  this would generalise to more general subtraces by subtracting costs
364  for unwanted measurable suffixes of a measurable subtrace.}.  A
365typical example of such a subtrace is the execution of an entire
366function from the cost label at the start of the function until it
367returns.  We call such any such subtrace \emph{measurable} if it (and
368the prefix of the trace from the start to the subtrace) can also be
369executed within the available stack space.
371Now we can give the main intensional statement for the compiler.
372Given a \emph{measurable} subtrace for a labelled \textsf{Clight}
373program, there is a subtrace of the 8051 object code program where the
374time differences match.  Moreover, \emph{observable} parts of the
375trace also match --- these are the appearance of cost labels and
376function calls and returns.
380More formally, the definition of this statement in \matita{} is
382definition simulates :=
383  $\lambda$p: compiler_output.
384  let initial_status := initialise_status $...$ (cm (c_labelled_object_code $...$ p)) in
385  $\forall$m1,m2.
386   measurable Clight_pcs (c_labelled_clight $...$ p) m1 m2
387       (stack_sizes (c_stack_cost $...$ p)) (c_max_stack $...$ p) $\rightarrow$
388  $\forall$c1,c2.
389   clock_after Clight_pcs (c_labelled_clight $...$ p) m1 (c_clight_cost_map $...$ p) = OK $...$ c1 $\rightarrow$
390   clock_after Clight_pcs (c_labelled_clight $...$ p) (m1+m2) (c_clight_cost_map $...$ p) = OK $...$ c2 $\rightarrow$
391  $\exists$n1,n2.
392   observables Clight_pcs (c_labelled_clight $...$ p) m1 m2 =
393     observables (OC_preclassified_system (c_labelled_object_code $...$ p))
394          (c_labelled_object_code $...$ p) n1 n2
395  $\wedge$
396   clock ?? (execute (n1+n2) ? initial_status) =
397     clock ?? (execute n1 ? initial_status) + (c2-c1).
399where the \lstinline'measurable', \lstinline'clock_after' and
400\lstinline'observables' definitions are generic definitions for multiple
401languages; in this case the \lstinline'Clight_pcs' record applies them
402to \textsf{Clight} programs.
404There is a second part to the statement, which says that the initial
405processing of the input program to produce the cost labelled version
406does not affect the semantics of the program:
407% Yes, I'm paraphrasing the result a tiny bit to remove the observe non-function
409  $\forall$input_program,output.
410  compile input_program = return output $\rightarrow$
411  not_wrong $...$ (exec_inf $...$ clight_fullexec input_program) $\rightarrow$
412  sim_with_labels
413   (exec_inf $...$ clight_fullexec input_program)
414   (exec_inf $...$ clight_fullexec (c_labelled_clight $...$ output))
416That is, any successful compilation produces a labelled program that
417has identical behaviour to the original, so long as there is no
418`undefined behaviour'.
420Note that this statement provides full functional correctness, including
421preservation of (non-)termination.  The intensional result above does
422not do this directly --- it does not guarantee the same result or same
423termination.  There are two mitigating factors, however: first, to
424prove the intensional property you need local simulation results --- these
425can be pieced together to form full behavioural equivalence, only time
426constraints have prevented us from doing so.  Second, if we wish to
427confirm a result, termination, or non-termination we could add an
428observable witness, such as a function that is only called if the
429correct result is given.  The intensional result guarantees that the
430observable witness is preserved, so the program must behave correctly.
432These two results are combined in the the \lstinline'correct'
433theorem in the file \lstinline''.
435\section{Correctness statement for the front-end}
438The essential parts of the intensional proof were outlined during work
439on a toy compiler in Task
4402.1~\cite{d2.1,springerlink:10.1007/978-3-642-32469-7_3}.  These are
442\item functional correctness, in particular preserving the trace of
443  cost labels,
444\item the \emph{soundness} and \emph{precision} of the cost labelling
445  on the object code, and
446\item the timing analysis on the object code produces a correct
447  mapping from cost labels to time.
450However, that toy development did not include function calls.  For the
451full \cerco{} compiler we also need to maintain the invariant that
452functions return to the correct program location in the caller, as we
453mentioned in the previous section.  During work on the back-end timing
454analysis (describe in more detail in the companion deliverable, D4.4)
455the notion of a \emph{structured trace} was developed to enforce this
456return property, and also most of the cost labelling properties too.
462\caption{The compiler and proof outline}
466Jointly, we generalised the structured traces to apply to any of the
467intermediate languages which have some idea of program counter.  This means
468that they are introduced part way through the compiler, see
469Figure~\ref{fig:compiler}.  Proving that a structured trace can be
470constructed at \textsf{RTLabs} has several virtues:
472\item This is the first language where every operation has its own
473  unique, easily addressable, statement.
474\item Function calls and returns are still handled implicitly in the
475  language and so the structural properties are ensured by the
476  semantics.
477\item Many of the back-end languages from \textsf{RTL} onwards share a common
478  core set of definitions, and using structured traces throughout
479  increases this uniformity.
486\caption{Nesting of functions in structured traces}
489A structured trace is a mutually inductive data type which
490contains the steps from a normal program trace, but arranged into a
491nested structure which groups entire function calls together and
492aggregates individual steps between cost labels (or between the final
493cost label and the return from the function), see
494Figure~\ref{fig:strtrace}.  This captures the idea that the cost labels
495only represent costs \emph{within} a function --- calls to other
496functions are accounted for in the nested trace for their execution, and we
497can locally regard function calls as a single step.
499These structured traces form the core part of the intermediate results
500that we must prove so that the back-end can complete the main
501intensional result stated above.  In full, we provide the back-end
504\item A normal trace of the \textbf{prefix} of the program's execution
505  before reaching the measurable subtrace.  (This needs to be
506  preserved so that we know that the stack space consumed is correct,
507  and to set up the simulation results.)
508\item The \textbf{structured trace} corresponding to the measurable
509  subtrace.
510\item An additional property about the structured trace that no
511  `program counter' is \textbf{repeated} between cost labels.  Together with
512  the structure in the trace, this takes over from showing that
513  cost labelling is sound and precise.
514\item A proof that the \textbf{observables} have been preserved.
515\item A proof that the \textbf{stack limit} is still observed by the prefix and
516  the structure trace.  (This is largely a consequence of the
517  preservation of observables.)
519The \lstinline'front_end_correct' lemma in the
520\lstinline'' file provides a record containing these.
522Following the outline in Figure~\ref{fig:compiler}, we will first deal
523with the transformations in \textsf{Clight} that produce the source
524program with cost labels, then show that measurable traces can be
525lifted to \textsf{RTLabs}, and finally show that we can construct the
526properties listed above ready for the back-end proofs.
528\section{Input code to cost labelled program}
531As explained on page~\pageref{page:switchintro}, the costs of complex
532C \lstinline[language=C]'switch' statements cannot be represented with
533the simple labelling used in the formalised compiler.  Our first pass
534replaces these statements with simpler C code, allowing our second
535pass to perform the cost labelling.  We show that the behaviour of
536programs is unchanged by these passes using forward
537simulations\footnote{All of our languages are deterministic, which can
538be seen directly from their executable definitions.  Thus we know that
539forward simulations are sufficient because the target cannot have any
540other behaviour.}.
542\subsection{Switch removal}
544We compile away \lstinline[language=C]'switch' statements into more
545basic \textsf{Clight} code.
546Note that this transformation does not necessarily deteriorate the
547efficiency of the generated code. For instance, compilers such as GCC
548introduce balanced trees of ``if-then-else'' constructs for small
549switches. However, our implementation strategy is much simpler. Let
550us consider the following input statement.
553   switch(e) {
554   case v1:
555     stmt1;
556   case v2:
557     stmt2;
558   default:
559     stmt_default;
560   }
563Note that \textsf{stmt1}, \textsf{stmt2}, \ldots \textsf{stmt\_default}
564may contain \lstinline[language=C]'break' statements, which have the
565effect of exiting the switch statement. In the absence of break, the
566execution falls through each case sequentially. In our implementation,
567we produce an equivalent sequence of ``if-then'' chained by gotos:
569   fresh = e;
570   if(fresh == v1) {
571     $\llbracket$stmt1$\rrbracket$;
572     goto lbl_case2;
573   };
574   if(fresh == v2) {
575     lbl_case2:
576     $\llbracket$stmt2$\rrbracket$;
577     goto lbl_case2;
578   };
579   $\llbracket$stmt_default$\rrbracket$;
580   exit_label:
583The proof had to tackle the following points:
585\item the source and target memories are not the same (due to the fresh variable),
586\item the flow of control is changed in a non-local way (e.g. \textbf{goto} 
587  instead of \textbf{break}).
589In order to tackle the first point, we implemented a version of memory
590extensions similar to those of CompCert.
592For the simulation we decided to prove a sufficient amount to give us
593confidence in the definitions and approach, but curtail the proof
594because this pass does not contribute to the intensional correctness
595result.  We tackled several simple cases, that do not interact with
596the switch removal per se, to show that the definitions were usable,
597and part of the switch case to check that the approach is
598reasonable. This comprises propagating the memory extension through
599each statement (except switch), as well as various invariants that are
600needed for the switch case (in particular, freshness hypotheses). The
601details of the evaluation process for the source switch statement and
602its target counterpart can be found in the file
603\lstinline'', along more details on the transformation
606Proving the correctness of the second point would require reasoning on the
607semantics of \lstinline[language=C]'goto' statements. In the \textsf{Clight} 
608semantics, this is implemented as a function-wide lookup of the target label.
609The invariant we would need is the fact that a global label lookup on a freshly
610created goto is equivalent to a local lookup. This would in turn require the
611propagation of some freshness hypotheses on labels. As discussed,
612we decided to omit this part of the correctness proof.
614\subsection{Cost labelling}
616The simulation for the cost labelling pass is the simplest in the
617front-end.  The main argument is that any step of the source program
618is simulated by the same step of the labelled one, plus any extra
619steps for the added cost labels.  The extra instructions do not change
620the memory or local environments, although we have to keep track of
621the extra instructions that appear in continuations, for example
622during the execution of a \lstinline[language=C]'while' loop.
624We do not attempt to capture any cost properties of the labelling\footnote{We describe how the cost properties are
625established in Section~\ref{sec:costchecks}.} in
626the simulation proof, which allows the proof to be oblivious to the choice
627of cost labels.  Hence we do not have to reason about the threading of
628name generation through the labelling function, greatly reducing the
629amount of effort required.
631%TODO: both give one-step-sim-by-many forward sim results; switch
632%removal tricky, uses aux var to keep result of expr, not central to
633%intensional correctness so curtailed proof effort once reasonable
634%level of confidence in code gained; labelling much simpler; don't care
635%what the labels are at this stage, just need to know when to go
636%through extra steps.  Rolled up into a single result with a cofixpoint
637%to obtain coinductive statement of equivalence (show).
639\section{Finding corresponding measurable subtraces}
642There follow the three main passes of the front-end:
644\item simplification of casts in \textsf{Clight} code
645\item \textsf{Clight} to \textsf{Cminor} translation, performing stack
646  variable allocation and simplifying control structures
647\item transformation to \textsf{RTLabs} control flow graph
649We have taken a common approach to
650each pass: first we build (or axiomatise) forward simulation results
651that are similar to normal compiler proofs, but which are slightly more
652fine-grained so that we can see that the call structure and relative
653placement of cost labels is preserved.
655Then we instantiate a general result which shows that we can find a
656\emph{measurable} subtrace in the target of the pass that corresponds
657to the measurable subtrace in the source.  By repeated application of
658this result we can find a measurable subtrace of the execution of the
659\textsf{RTLabs} code, suitable for the construction of a structured
660trace (see Section~\ref{sec:structuredtrace}).  This is essentially an
661extra layer on top of the simulation proofs that provides us with the
662additional information required for our intensional correctness proof.
664\subsection{Generic measurable subtrace lifting proof}
666Our generic proof is parametrised on a record containing small-step
667semantics for the source and target language, a classification of
668states (the same form of classification is used when defining
669structured traces), a simulation relation which respects the
670classification and cost labelling and
671four simulation results.  The simulations are split by the starting state's
672classification and whether it is a cost label, which will allow us to
673observe that the call structure is preserved.  They are:
675\item a step from a `normal' state (which is not classified as a call
676  or return) which is not a cost label is simulated by zero or more
677  `normal' steps;
678\item a step from a `call' state followed by a cost label step is
679  simulated by a step from a `call' state, a corresponding label step,
680  then zero or more `normal' steps;
681\item a step from a `call' state not followed by a cost label
682  similarly (note that this case cannot occur in a well-labelled
683  program, but we do not have enough information locally to exploit
684  this); and
685\item a cost label step is simulated by a cost label step.
687Finally, we need to know that a successfully translated program will
688have an initial state in the simulation relation with the original
689program's initial state.
691The back-end has similar requirements for lifting simulations to
692structured traces.  Fortunately, our treatment of calls and returns
693can be slightly simpler because we have special call and return states
694that correspond to function entry and return that are separate from
695the actual instructions.  This was originally inherited from our port
696of CompCert's \textsf{Clight} semantics, but proves useful here
697because we only need to consider adding extra steps \emph{after} a
698call or return state, because the instruction step deals with extra
699steps that occur before.  The back-end makes all of the call and
700return machinery explicit, and thus needs more complex statements
701about extra steps before and after each call and return.
707\caption{Tiling of simulation for a measurable subtrace}
711To find the measurable subtrace in the target program's execution we
712walk along the original program's execution trace applying the
713appropriate simulation result by induction on the number of steps.
714While the number of steps taken varies, the overall structure is
715preserved, as illustrated in Figure~\ref{fig:tiling}.  By preserving
716the structure we also maintain the same intensional observables.  One
717delicate point is that the cost label following a call must remain
718directly afterwards\footnote{The prototype compiler allowed some
719  straight-line code to appear before the cost label until a later
720  stage of the compiler, but we must move the requirement forward to
721  fit with the structured traces.}
722% Damn it, I should have just moved the cost label forwards in RTLabs,
723% like the prototype does in RTL to ERTL; the result would have been
724% simpler.  Or was there some reason not to do that?
725(both in the program code and in the execution trace), even if we
726introduce extra steps, for example to store parameters in memory in
727\textsf{Cminor}.  Thus we have a version of the call simulation
728that deals with both the call and the cost label in one result.
730In addition to the subtrace we are interested in measuring, we must
731prove that the earlier part of the trace is also preserved in
732order to use the simulation from the initial state.  This proof also
733guarantees that we do not run out of stack space before the subtrace
734we are interested in.  The lemmas for this prefix and the measurable
735subtrace are similar, following the pattern above.  However, the
736measurable subtrace also requires us to rebuild the termination
737proof.  This is defined recursively:
740  let rec will_return_aux C (depth:nat)
741                               (trace:list (cs_state $...$ C $\times$ trace)) on trace : bool :=
742  match trace with
743  [ nil $\Rightarrow$ false
744  | cons h tl $\Rightarrow$
745    let $\langle$s,tr$\rangle$ := h in
746    match cs_classify C s with
747    [ cl_call $\Rightarrow$ will_return_aux C (S depth) tl
748    | cl_return $\Rightarrow$
749        match depth with
750        [ O $\Rightarrow$ match tl with [ nil $\Rightarrow$ true | _ $\Rightarrow$ false ]
751        | S d $\Rightarrow$ will_return_aux C d tl
752        ]
753    | _ $\Rightarrow$ will_return_aux C depth tl
754    ]
755  ].
757The \lstinline'depth' is the number of return states we need to see
758before we have returned to the original function (initially zero) and
759\lstinline'trace' the measurable subtrace obtained from the running
760the semantics for the correct number of steps.  This definition
761unfolds tail recursively for each step, and once the corresponding
762simulation result has been applied a new one for the target can be
763asserted by unfolding and applying the induction hypothesis on the
764shorter trace.
766Combining the lemmas about the prefix and the measurable subtrace
767requires a little care because the states joining the two might not be
768related in the simulation.  In particular, if the measurable subtrace
769starts from the cost label at the beginning of the function there may
770be some extra instructions in the target code to execute to complete
771function entry before the states are back in the relation.  Hence we
772carefully phrased the lemmas to allow for such extra steps.
774Together, these then gives us an overall result for any simulation fitting the
775requirements above (contained in the \lstinline'meas_sim' record):
777theorem measured_subtrace_preserved :
778  $\forall$MS:meas_sim.
779  $\forall$p1,p2,m,n,stack_cost,max.
780  ms_compiled MS p1 p2 $\rightarrow$
781  measurable (ms_C1 MS) p1 m n stack_cost max $\rightarrow$
782  $\exists$m',n'.
783    measurable (ms_C2 MS) p2 m' n' stack_cost max $\wedge$
784    observables (ms_C1 MS) p1 m n = observables (ms_C2 MS) p2 m' n'.
786The stack space requirement that is embedded in \lstinline'measurable'
787is a consequence of the preservation of observables, because it is
788determined by the functions called and returned from, which are observable.
790\subsection{Simulation results for each pass}
792We now consider the simulation results for the passes, each of which
793is used to instantiate the
794\lstinline[language=matita]'measured_subtrace_preserved' theorem to
795construct the measurable subtrace for the next language.
797\subsubsection{Cast simplification}
799The parser used in \cerco{} introduces a lot of explicit type casts.
800If left as they are, these constructs can greatly hamper the
801quality of the generated code -- especially as the architecture
802we consider is an $8$-bit one. In \textsf{Clight}, casts are
803expressions. Hence, most of the work of this transformation
804proceeds on expressions. The tranformation proceeds by recursively
805trying to coerce an expression to a particular integer type, which
806is in practice smaller than the original one. This functionality
807is implemented by two mutually recursive functions whose signature
808is the following.
811let rec simplify_expr (e:expr) (target_sz:intsize) (target_sg:signedness)
812  : $\Sigma$result:bool$\times$expr.
813    $\forall$ge,en,m. simplify_inv ge en m e (\snd result) target_sz target_sg (\fst result) := $\ldots$
815and simplify_inside (e:expr) : $\Sigma$result:expr. conservation e result := $\ldots$
818The \textsf{simplify\_inside} acts as a wrapper for
819\textsf{simplify\_expr}. Whenever \textsf{simplify\_inside} encounters
820a \textsf{Ecast} expression, it tries to coerce the sub-expression
821to the desired type using \textsf{simplify\_expr}, which tries to
822perform the actual coercion. In return, \textsf{simplify\_expr} calls
823back \textsf{simplify\_inside} in some particular positions, where we
824decided to be conservative in order to simplify the proofs. However,
825the current design allows to incrementally revert to a more aggressive
826version, by replacing recursive calls to \textsf{simplify\_inside} by
827calls to \textsf{simplify\_expr} \emph{and} proving the corresponding
828invariants -- where possible.
830The \textsf{simplify\_inv} invariant encodes either the conservation
831of the semantics during the transformation corresponding to the failure
832of the coercion (\textsf{Inv\_eq} constructor), or the successful
833downcast of the considered expression to the target type
837inductive simplify_inv
838  (ge : genv) (en : env) (m : mem)
839  (e1 : expr) (e2 : expr) (target_sz : intsize) (target_sg : signedness) : bool $\rightarrow$ Prop :=
840| Inv_eq : $\forall$result_flag. $\ldots$
841     simplify_inv ge en m e1 e2 target_sz target_sg result_flag
842| Inv_coerce_ok : $\forall$src_sz,src_sg.
843     typeof e1 = Tint src_sz src_sg $\rightarrow$
844     typeof e2 = Tint target_sz target_sg $\rightarrow$     
845     smaller_integer_val src_sz target_sz src_sg (exec_expr ge en m e1) (exec_expr ge en m e2) $\rightarrow$
846     simplify_inv ge en m e1 e2 target_sz target_sg true.
849The \textsf{conservation} invariant for \textsf{simplify\_inside} simply states the conservation
850of the semantics, as in the \textsf{Inv\_eq} constructor of the previous
854definition conservation := $\lambda$e,result. $\forall$ge,en,m.
855          res_sim ? (exec_expr ge en m e) (exec_expr ge en m result)
856       $\wedge$ res_sim ? (exec_lvalue ge en m e) (exec_lvalue ge en m result)
857       $\wedge$ typeof e = typeof result.
860This invariant is then easily lifted to statement evaluations.
861The main problem encountered with this particular pass was dealing with
862inconsistently typed programs, a canonical case being a particular
863integer constant of a certain size typed with another size. This
864prompted the need to introduce numerous type checks, making
865both the implementation and the proof more complex, even though more
866comprehensive checks are made in the next stage.
867%\todo{Make this a particular case of the more general statement on baking more invariants in the Clight language}
869\subsubsection{Clight to Cminor}
871This pass is the last one operating on the \textsf{Clight} language.
872Its input is a full \textsf{Clight} program, and its output is a
873\textsf{Cminor} program. Note that we do not use an equivalent of
874CompCert's \textsf{C\#minor} language: we translate directly to a
875variant of \textsf{Cminor}. This presents the advantage of not
876requiring the special loop constructs, nor the explicit block
877structure. Another salient point of our approach is that a significant
878number of the properties needed for the simulation proof were directly
879encoded in dependently typed translation functions.  In particular,
880freshness conditions and well-typedness conditions are included. The
881main effects of the transformation from \textsf{Clight} to
882\textsf{Cminor} are listed below.
885\item Variables are classified as being either globals, stack-allocated
886  locals or potentially register-allocated locals. The value of register-allocated
887  local variables is moved out of the modelled memory and stored in a
888  dedicated environment.
889\item In \textsf{Clight}, each local variable has a dedicated memory block, whereas
890  stack-allocated locals are bundled together on a function-by-function basis.
891\item Loops are converted to jumps.
894The first two points require memory injections which are more flexible that those
895needed in the switch removal case. In the remainder of this section, we briefly
896discuss our implementation of memory injections, and then the simulation proof.
898\paragraph{Memory injections.}
900Our memory injections are modelled after the work of Blazy \& Leroy.
901However, the corresponding paper is based on the first version of the
902CompCert memory model~\cite{2008-Leroy-Blazy-memory-model}, whereas we use a much more concrete model, allowing byte-level
903manipulations (as in the later version of CompCert's memory model). We proved
904roughly 80 \% of the required lemmas. Notably, some of the difficulties encountered were
905due to overly relaxed conditions on pointer validity (fixed during development).
906Some more side conditions had to be added to take care of possible overflows when converting
907from \textbf{Z} block bounds to $16$ bit pointer offsets (in practice, such overflows
908only occur in edge cases that are easily ruled out -- but this fact is not visible
909in memory injections). Concretely, some of the lemmas on the preservation of simulation of
910loads after writes were axiomatised, due to a lack of time.
912\paragraph{Simulation proof.}
914We proved the simulation result for expressions and a representative
915selection of statements.  In particular we tackled
916\lstinline[language=C]'while' statements to ensure that we correctly
917translate loops because our approach differs from CompCert by
918converting directly to \textsf{Cminor} \lstinline[language=C]'goto's
919rather than maintaining a notion of loop in \textsf{Cminor}.  We also have a partial
920proof for function entry, covering the setup of the memory injection,
921but not function exit.  Exits, and the remaining statements, have been
924Careful management of the proof state was required because proof terms
925are embedded in \textsf{Cminor} code to show that invariants are
926respected.  These proof terms appear in the proof state when inverting
927the translation functions, and they can be large and awkward.  While
928generalising them away is usually sufficient, it can be difficult when
929they appear under a binder.
931%The correctness proof for this transformation was not completed. We proved the
932%simulation result for expressions and for some subset of the critical statement cases.
933%Notably lacking are the function entry and exit, where the memory injection is
934%properly set up. As would be expected, a significant amount of work has to be performed
935%to show the conservation of all invariants at each simulation step.
937%\todo{list cases, explain while loop, explain labeling problem}
939\subsubsection{Cminor to RTLabs}
941The translation from \textsf{Cminor} to \textsf{RTLabs} is a fairly
942routine control flow graph (CFG) construction.  As such, we chose to
943axiomatise the associated extensional simulation results.  However, we did prove several
944properties of the generated programs:
946\item All statements are type correct with respect to the declared
947  pseudo-register type environment.
948\item The CFG is closed, and has a distinguished entry node and a
949  unique exit node.
952These properties rely on similar properties about type safety and the
953presence of \lstinline[language=C]'goto'-labels for \textsf{Cminor} programs
954which are checked at the preceding stage.  As a result, this
955transformation is total and any compilation failures must occur when
956the corresponding \textsf{Clight} source is available and a better
957error message can be generated.
959The proof obligations for these properties include many instances of
960graph inclusion.  We automated these proofs using a small amount of
961reflection, making the obligations much easier to handle.  One
962drawback to enforcing invariants throughout is that temporarily
963breaking them can be awkward.  For example, \lstinline'return'
964statements were originally used as placeholders for
965\lstinline[language=C]'goto' destinations that had not yet been
966translated.  However, this made establishing the single exit node
967property rather difficult, and a different placeholder was chosen
968instead.  In other circumstances it is possible to prove a more
969complex invariant then simplify it at the end of the transformation.
971\section{Checking cost labelling properties}
974Ideally, we would provide proofs that the cost labelling pass always
975produces programs that are soundly and precisely labelled and that
976each subsequent pass preserves these properties.  This would match our
977use of dependent types to eliminate impossible sources of errors
978during compilation, in particular retaining intermediate language type
981However, given the limited amount of time available we realised that
982implementing a compile-time check for a sound and precise labelling of
983the \textsf{RTLabs} intermediate code would reduce the proof burden
984considerably.  This is similar in spirit to the use of translation
985validation in certified compilation, which makes a similar trade-off
986between the potential for compile-time failure and the volume of proof
989The check cannot be pushed into a later stage of the compiler because
990much of the information is embedded into the structured traces.
991However, if an alternative method was used to show that function
992returns in the compiled code are sufficiently well-behaved, then we
993could consider pushing the cost property checks into the timing
994analysis itself.  We leave this as a possible area for future work.
996\subsection{Implementation and correctness}
999For a cost labelling to be sound and precise we need a cost label at
1000the start of each function, after each branch and at least one in
1001every loop.  The first two parts are trivial to check by examining the
1002code.  In \textsf{RTLabs} the last part is specified by saying
1003that there is a bound on the number of successive instruction nodes in
1004the CFG that you can follow before you encounter a cost label, and
1005checking this is more difficult.
1007The implementation progresses through the set of nodes in the graph,
1008following successors until a cost label is found or a label-free cycle
1009is discovered (in which case the property does not hold and we return
1010an error).  This is made easier by the prior knowledge that every
1011successor of a branch instruction is a cost label, so we do not need
1012to search each branch.  When a label is found, we remove the chain of
1013program counters from the set and continue from another node in the
1014set until it is empty, at which point we know that there is a bound
1015for every node in the graph.
1017Directly reasoning about the function that implements this procedure would be
1018rather awkward, so an inductive specification of a single step of its
1019behaviour was written and proved to match the implementation.  This
1020was then used to prove the implementation sound and complete.
1022While we have not attempted to prove that the cost labelled properties
1023are established and preserved earlier in the compiler, we expect that
1024the effort for the \textsf{Cminor} to \textsf{RTLabs} stage alone
1025would be similar to the work outlined above, because it involves the
1026change from requiring a cost label at particular positions to
1027requiring cost labels to break loops in the CFG.  As there are another
1028three passes to consider (including the labelling itself), we believe
1029that using the check above is much simpler overall.
1031% TODO? Found some Clight to Cminor bugs quite quickly
1033\section{Existence of a structured trace}
1036The \emph{structured trace} idea introduced in
1037Section~\ref{sec:fegoals} enriches the execution trace of a program by
1038nesting function calls in a mixed-step style and embedding the cost
1039labelling properties of the program.  See Figure~\ref{fig:strtrace} on
1040page~\pageref{fig:strtrace} for an illustration of a structured trace.
1041It was originally designed to support the proof of correctness for the
1042timing analysis of the object code in the back-end, then generalised
1043to provide a common structure to use from the end of the front-end to
1044the object code.
1046To make the definition generic we abstract over the semantics of the
1049record abstract_status : Type[1] :=
1050  { as_status :> Type[0]
1051  ; as_execute : relation as_status
1052  ; as_pc : DeqSet
1053  ; as_pc_of : as_status $\rightarrow$ as_pc
1054  ; as_classify : as_status $\rightarrow$ status_class
1055  ; as_label_of_pc : as_pc $\rightarrow$ option costlabel
1056  ; as_after_return : ($\Sigma$s:as_status. as_classify s =  cl_call) $\rightarrow$ as_status $\rightarrow$ Prop
1057  ; as_result: as_status $\rightarrow$ option int
1058  ; as_call_ident : ($\Sigma$s:as_status.as_classify s = cl_call) $\rightarrow$ ident
1059  ; as_tailcall_ident : ($\Sigma$s:as_status.as_classify s = cl_tailcall) $\rightarrow$ ident
1060  }.
1062which requires a type of states, an execution relation\footnote{All of
1063  our semantics are executable, but using a relation was simpler in
1064  the abstraction.}, some notion of abstract
1065program counter with decidable equality, the classification of states,
1066and functions to extract the observable intensional information (cost
1067labels and the identity of functions that are called).  The
1068\lstinline'as_after_return' property links the state before a function
1069call with the state after return, providing the evidence that
1070execution returns to the correct place.  The precise form varies
1071between stages; in \textsf{RTLabs} it insists the CFG, the pointer to
1072the CFG node to execute next, and some call stack information is
1075The structured traces are defined using three mutually inductive
1076types.  The core data structure is \lstinline'trace_any_label', which
1077captures some straight-line execution until the next cost label or the
1078return from the enclosing function.  Any function calls are embedded as
1079a single step, with its own trace nested inside and the before and
1080after states linked by \lstinline'as_after_return'; and states
1081classified as a `jump' (in particular branches) must be followed by a
1082cost label.
1084The second type, \lstinline'trace_label_label', is a
1085\lstinline'trace_any_label' where the initial state is cost labelled.
1086Thus a trace in this type identifies a series of steps whose cost is
1087entirely accounted for by the label at the start.
1089Finally, \lstinline'trace_label_return' is a sequence of
1090\lstinline'trace_label_label' values which end in the return from the
1091function.  These correspond to a measurable subtrace, and in
1092particular include executions of an entire function call (and so are
1093used for the nested calls in \lstinline'trace_any_label').
1097The construction of the structured trace replaces syntactic cost
1098labelling properties, which place requirements on where labels appear
1099in the program, with semantic properties that constrain the execution
1100traces of the program.  The construction begins by defining versions
1101of the sound and precise labelling properties on states and global
1102environments (for the code that appears in each of them) rather than
1103whole programs, and showing that these are preserved by steps of the
1104\textsf{RTLabs} semantics.
1106Then we show that each cost labelling property required by the
1107definition of structured traces is locally satisfied.  These proofs are
1108broken up by the classification of states.  Similarly, we prove a
1109step-by-step stack preservation result, which states that the
1110\textsf{RTLabs} semantics never changes the lower parts of the stack.
1112The core part of the construction of a structured trace is to use the
1113proof of termination from the measurable trace (defined on
1114page~\pageref{prog:terminationproof}) to `fold up' the execution into
1115the nested form.  The results outlined above fill in the proof
1116obligations for the cost labelling properties and the stack
1117preservation result shows that calls return to the correct location.
1119The structured trace alone is not sufficient to capture the property
1120that the program is soundly labelled.  While the structured trace
1121guarantees termination, it still permits a loop to be executed a
1122finite number of times without encountering a cost label.  We
1123eliminate this by proving that no `program counter' repeats within any
1124\lstinline'trace_any_label' section by showing that it is incompatible
1125with the property that there is a bound on the number of successor
1126instructions you can follow in the CFG before you encounter a cost
1127label (from Section~\ref{sec:costchecksimpl}).
1129\subsubsection{Complete execution structured traces}
1131The development of the construction above started relatively early,
1132before the measurable subtrace preservation proofs.  To be confident
1133that the traces were well-formed at that time, we also developed a
1134complete execution form that embeds the traces above.  This includes
1135non-terminating program executions, where an infinite number of the terminating
1136structured traces are embedded.  This construction confirmed that our
1137definition of structured traces was consistent, although we later
1138found that we did not require the whole execution version for the
1139compiler correctness results.
1141To construct these we need to know whether each function call will
1142eventually terminate, requiring the use of the excluded middle.  This
1143classical reasoning is local to the construction of whole program
1144traces and is not necessary for our main results.
1148In combination with the work on the CerCo back-end and by
1149concentrating on the novel intensional parts of the proof, we have
1150shown that it is possible to construct certifying compilers that
1151correctly report execution time and stack space costs.  The layering
1152of intensional correctness proofs on top of normal simulation results
1153provides a useful separation of concerns, and could permit the reuse
1154of existing results.
1160The following table gives a high-level overview of the \matita{}
1161source files in Deliverable 3.4:
1166\lstinline'' & Top-level compiler definitions, in particular
1167\lstinline'front_end', and the whole compiler definition
1168\lstinline'compile'. \\
1169\lstinline'' & Correctness results: \lstinline'front_end_correct'
1170and \lstinline'correct', respectively. \\
1171\lstinline'Clight/*' & \textsf{Clight}: proofs for switch
1172removal, cost labelling, cast simplification and conversion to
1173\textsf{Cminor}. \\
1174\lstinline'Cminor/*' & \textsf{Cminor}: axioms of conversion to
1175\textsf{RTLabs}. \\
1176\lstinline'RTLabs/*' & \textsf{RTLabs}: definitions and proofs for
1177compile-time cost labelling checks, construction of structured traces.
1179\lstinline'common/' & Definitions for measurable
1180subtraces. \\
1181\lstinline'common/' & Generic measurable subtrace
1182lifting proof. \\
1183\lstinline'common/*' & Other common definitions relevant to many parts
1184of the compiler and proof. \\
1185\lstinline'utilities/*' & General purpose definitions used throughout,
1186including extensions to the standard \matita{} library.
Note: See TracBrowser for help on using the repository browser.