% LaTeX Companion, p 74
\newcommand{\todo}[1]{\marginpar{\raggedright - #1}}
  {keywords={Definition,Lemma,Theorem,Remark,Qed,Save,Inductive,Record},
   morekeywords={[2]if,then,else},
  }
  {keywords={definition,lemma,theorem,remark,inductive,record,qed,let,rec,match,with,Type,and,on},
   morekeywords={[2]whd,normalize,elim,cases,destruct},
   mathescape=true,
   morecomment=[n]{(*}{*)},
  }
        keywordstyle=\color{red}\bfseries,
        keywordstyle=[2]\color{blue},
        commentstyle=\color{green},
        stringstyle=\color{blue},
        showspaces=false,showstringspaces=false}
\vspace*{1cm}Project FP7-ICT-2009-C-243881 \cerco{}}
\date{ }
Report n. D3.4\\
Front-end Correctness Proofs\\
Version 1.0
Brian~Campbell, Ilias~Garnier, James~McKinna, Ian~Stark
Project Acronym: \cerco{}\\
Project full title: Certified Complexity\\
Proposal/Contract no.: FP7-ICT-2009-C-243881 \cerco{}\\
\clearpage \pagestyle{myheadings} \markright{\cerco{}, FP7-ICT-2009-C-243881}
We report on the correctness proofs for the front-end of the \cerco{}
cost lifting compiler.  First, we identify the core result we wish to
prove, which says that we correctly predict the precise execution
time for particular parts of the execution called \emph{measurable}
subtraces.  Then we consider the three distinct parts of the task:
showing that the \emph{annotated source code} output by the compiler
has equivalent behaviour to the original input (up to the
annotations); showing that a measurable subtrace of the
annotated source code corresponds to an equivalent measurable subtrace
in the code produced by the front-end, including costs; and finally
showing that the enriched \emph{structured} execution traces required
for cost correctness in the back-end can be constructed from the
properties of the code produced by the front-end.
A key part of our work is that the intensional correctness results that show
that we get consistent cost measurements throughout the intermediate languages
of the compiler can be layered on top of normal forward simulation results,
if we split those results into local call-structure preserving simulations.
This deliverable shows correctness results about the formalised
compiler described in D3.2, using the source language semantics from
D3.1 and intermediate language semantics from D3.3.  It builds on
earlier work on a toy compiler built to test the labelling approach in
D2.1. Together with the companion deliverable about the correctness of
the back-end, D4.4, we obtain results about the whole formalised
compiler.
% TODO: mention the deliverable about the extracted compiler et al?
% CHECK: clear up any -ize vs -ise
% CHECK: clear up any "front end" vs "front-end"
% CHECK: clear up any mentions of languages that aren't textsf'd.
% CHECK: fix unicode in listings
The \cerco{} compiler compiles C code, targeting microcontrollers
implementing the Intel 8051 architecture.  It produces both the object
code and source code containing annotations describing the timing
behaviour of the object code.  There are two versions: first, an
initial prototype implemented in \ocaml{}~\cite{d2.2}, and a version
formalised in the \matita{} proof assistant~\cite{d3.2,d4.2} and then
extracted to \ocaml{} code to produce an executable compiler.  In this
document we present results formalised in \matita{} about the
front-end of the latter version of the compiler, and how that fits
into the verification of the whole compiler.
A key part of this work was to layer the intensional correctness
results that show that the costs given are correct on top of the
proofs about the compiled code's extensional behaviour (that is, the
functional correctness of the compiler).  Unfortunately, the ambitious
goal of completely verifying the entire compiler was not feasible
within the time available, but thanks to this separation of
extensional and intensional proofs we are able to axiomatise
simulation results similar to those in other compiler verification
projects and concentrate on the novel intensional proofs.  We were
also able to add stack space costs to obtain a stronger result.  The
proofs were made more tractable by introducing compile-time checks for
the `sound and precise' cost labelling properties rather than proving
that they are preserved throughout.
The overall statement of correctness says that the annotated program has the
same behaviour as the input, and that for any suitably well-structured part of
the execution (which we call \emph{measurable}), the object code will exhibit
the same behaviour, taking precisely the time given by the cost annotations in
the annotated source program.
In the next section we recall the structure of the compiler and make the overall
statement more precise.  Following that, in Section~\ref{sec:fegoals} we
describe the statements about the intermediate \textsf{RTLabs}
programs that we need to prove for the back-end proofs.
Section~\ref{sec:inputtolabelling} covers the passes which produce the
annotated source program and Section~\ref{sec:measurablelifting} the rest
of the transformations in the front-end.  Then the compile-time checks
for good cost labelling are detailed in Section~\ref{sec:costchecks}
and the proofs that the structured traces required by the back-end
exist are discussed in Section~\ref{sec:structuredtrace}.
\section{The compiler and main goals}
The unformalised \ocaml{} \cerco{} compiler was originally described
in Deliverables 2.1 and 2.2.  Its design was replicated in the formal
\matita{} code, which was presented in Deliverables 3.2 and 4.2, for
the front-end and back-end respectively.
\caption{Languages in the \cerco{} compiler}
The compiler uses a number of intermediate languages, as outlined in the
middle two lines of Figure~\ref{fig:compilerlangs}.  The upper line
represents the front-end of the compiler, and the lower the back-end,
finishing with 8051 binary code.  Not all of the front-end passes
introduce a new language, and Figure~\ref{fig:summary} presents a
list of every pass involved.
\quad \= $\downarrow$ \quad \= \kill
\textsf{C} (unformalised)\\
\> $\downarrow$ \> CIL parser (unformalised \ocaml)\\
%\> $\downarrow$ \> add runtime functions\\
\> $\downarrow$ \> \lstinline[language=C]'switch' removal\\
\> $\downarrow$ \> labelling\\
\> $\downarrow$ \> cast removal\\
\> $\downarrow$ \> stack variable allocation and control structure
 simplification\\
%\> $\downarrow$ \> generate global variable initialization code\\
\> $\downarrow$ \> transform to \textsf{RTLabs} graph\\
\> $\downarrow$ \> check cost labelled properties of \textsf{RTLabs} graph\\
\> $\downarrow$ \> start of target specific back-end\\
\>\quad \vdots
\caption{Front-end languages and compiler passes}
The annotated source code is taken after the cost labelling phase.
Note that there is a pass to replace C \lstinline[language=C]'switch'
statements before labelling --- we need to remove them because the
simple form of labelling used in the formalised compiler is not quite
capable of capturing their execution time costs, largely due to the
fall-through behaviour.
The cast removal phase which follows cost labelling simplifies
expressions to prevent unnecessary arithmetic promotion, which is
specified by the C standard but costly for an 8-bit target.  The
transformations to \textsf{Cminor} and subsequently \textsf{RTLabs}
bear considerable resemblance to some passes of the CompCert
compiler~\cite{Blazy-Leroy-Clight-09,Leroy-backend}, although we use a simpler \textsf{Cminor} where
all loops use \lstinline[language=C]'goto' statements, and the
\textsf{RTLabs} language retains a target-independent flavour.  The
back-end takes \textsf{RTLabs} code as input.
The whole compilation function returns the following information on success:
record compiler_output : Type[0] :=
 { c_labelled_object_code: labelled_object_code
 ; c_stack_cost: stack_cost_model
 ; c_max_stack: nat
 ; c_init_costlabel: costlabel
 ; c_labelled_clight: clight_program
 ; c_clight_cost_map: clight_cost_map
 }.
It consists of annotated 8051 object code, a mapping from function
identifiers to the function's stack space usage\footnote{The compiled
  code's only stack usage is to allocate a fixed-size frame on each
  function entry and discard it on exit.  No other dynamic allocation
  is provided by the compiler.}, the space available for the
stack after global variable allocation, a cost label covering the
execution time for the initialisation of global variables and the call
to the \lstinline[language=C]'main' function, the annotated source
code, and finally a mapping from cost labels to actual execution
times.
An \ocaml{} pretty printer is used to provide a concrete version of the output
code.  In the case of the annotated source code, it also inserts the actual
costs alongside the cost labels, and optionally adds a global cost variable
and instrumentation to support further reasoning.
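As an illustration of this instrumentation step, the following OCaml sketch (the statement type and all names are ours, purely illustrative, not the actual \cerco{} implementation) replaces each cost label by an increment of the global cost variable, using the cost map produced by the compiler.

```ocaml
(* Toy statement type: Cost attaches a cost label to a statement,
   Incr stands for the inserted "__cost += k" instrumentation. *)
type stmt =
  | Skip
  | Incr of int
  | Cost of string * stmt
  | Seq of stmt * stmt

(* Replace each cost label by an increment of the global cost
   variable, looking its value up in the compiler's cost map. *)
let rec instrument cost_map = function
  | Skip -> Skip
  | Incr n -> Incr n
  | Cost (lbl, s) -> Seq (Incr (List.assoc lbl cost_map), instrument cost_map s)
  | Seq (s1, s2) -> Seq (instrument cost_map s1, instrument cost_map s2)
```

For example, a statement guarded by a label whose cost is 7 becomes that statement preceded by an increment of 7.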
\subsection{Revisions to the prototype compiler}
Our focus on intensional properties prompted us to consider whether we
could incorporate stack space into the costs presented to the user.
We only allocate one fixed-size frame per function, so modelling this
was relatively simple.  It is the only form of dynamic memory
allocation provided by the compiler, so we were able to strengthen the
statement of the goal to guarantee successful execution whenever the
stack space obeys a bound calculated by subtracting the global
variable requirements from the total memory available.
The cost checks at the end of Figure~\ref{fig:summary} have been
introduced to reduce the proof burden, and are described in
Section~\ref{sec:costchecks}.
The use of dependent types to capture simple intermediate language
invariants makes every front-end pass, except the \textsf{Clight} to
\textsf{Cminor} stage and the cost checks, a total function.  Hence various
well-formedness and type checks are dealt with once in that phase.
With the benefit of hindsight we would have included an initial
checking phase to produce a `well-formed' variant of \textsf{Clight},
conjecturing that this would simplify various parts of the proofs for
the \textsf{Clight} stages which deal with potentially ill-formed
code.
\todo{move of initialisation code?}
\subsection{Main goals}
% TODO: need an example for this
Informally, our main intensional result links the time difference in a source
code execution to the time difference in the object code, expressing the time
for the source by summing the values for the cost labels in the trace, and the
time for the target by a clock built in to the 8051 executable semantics.
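For instance, the source-side time of a subtrace is simply the sum of the cost-map entries for the labels the subtrace emits; a minimal OCaml sketch (our own names, not taken from the formalisation) of that summation:

```ocaml
(* The time attributed to a source subtrace is the sum of the values
   that the cost map assigns to the cost labels observed in it; the
   main statement equates this sum with a difference of target clocks. *)
let source_time cost_map labels =
  List.fold_left (fun t lbl -> t + List.assoc lbl cost_map) 0 labels
```

So a subtrace emitting a label costing 12 cycles once and a label costing 5 cycles twice is predicted to take 22 cycles on the target.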
The availability of precise timing information for 8051
implementations and the design of the compiler allow it to give exact
time costs in terms of processor cycles, not just upper bounds.
However, these exact results are only available if the subtrace we
measure starts and ends at suitable points.  In particular, pure
computation with no observable effects may be reordered and moved past
cost labels, so we cannot measure time between arbitrary statements in
the program.
There is also a constraint on the subtraces that we
measure due to the requirements of the correctness proof for the
object code timing analysis.  To be sure that the timings are assigned
to the correct cost label, we need to know that each return from a
function call goes to the correct return address.  It is difficult
to observe this property locally in the object code because it relies
on much earlier stages in the compiler.  To convey this information to
the timing analysis, extra structure is imposed on the subtraces, which
is described in Section~\ref{sec:fegoals}.
% Regarding the footnote, would there even be much point?
% TODO: this might be quite easy to add ('just' subtract the
% measurable subtrace from the second label to the end).  Could also
% measure other traces in this manner.
These restrictions are reflected in the subtraces that we give timing
guarantees on; they must start at a cost label and end at the return
of the enclosing function of the cost label\footnote{We expect that
  this would generalise to subtraces between cost labels in the same
  function, but could not justify the extra complexity that would be
  required to show this.}.  A typical example of such a subtrace is
the execution of an entire function from the cost label at the start
of the function until it returns.  We call any such subtrace
\emph{measurable} if it (and the prefix of the trace from the start to
the subtrace) can also be executed within the available stack space.
Now we can give the main intensional statement for the compiler.
Given a \emph{measurable} subtrace for a labelled \textsf{Clight}
program, there is a subtrace of the 8051 object code program where the
time differences match.  Moreover, \emph{observable} parts of the
trace also match --- these are the appearance of cost labels and
function calls and returns.
More formally, the definition of this statement in \matita{} is
definition simulates :=
  $\lambda$p: compiler_output.
  let initial_status := initialise_status $\dots$ (cm (c_labelled_object_code $\dots$ p)) in
  $\forall$m1,m2.
   measurable Clight_pcs (c_labelled_clight $\dots$ p) m1 m2
    (lookup_stack_cost (c_stack_cost $\dots$ p)) (c_max_stack $\dots$ p) $\rightarrow$
  $\forall$c1,c2.
   clock_after Clight_pcs (c_labelled_clight $\dots$ p) m1 (c_clight_cost_map $\dots$ p) = OK $\dots$ c1 $\rightarrow$
   clock_after Clight_pcs (c_labelled_clight $\dots$ p) (m1+m2) (c_clight_cost_map $\dots$ p) = OK $\dots$ c2 $\rightarrow$
  $\exists$n1,n2.
   observables Clight_pcs (c_labelled_clight $\dots$ p) m1 m2 =
   observables (OC_preclassified_system (c_labelled_object_code $\dots$ p))
    (c_labelled_object_code $\dots$ p) n1 n2
  $\wedge$
   c2 - c1 = clock $\dots$ (execute n2 ? initial_status) - clock $\dots$ (execute n1 ? initial_status).
where the \lstinline'measurable', \lstinline'clock_after' and
\lstinline'observables' definitions can be applied to multiple
languages; in this case the \lstinline'Clight_pcs' record applies them
to \textsf{Clight} programs.
There is a second part to the statement, which says that the initial
processing of the input program to produce the cost labelled version
does not affect the semantics of the program:
% Yes, I'm paraphrasing the result a tiny bit to remove the observe non-function
  $\forall$input_program,output.
  compile input_program = return output $\rightarrow$
  not_wrong $\dots$ (exec_inf $\dots$ clight_fullexec input_program) $\rightarrow$
  sim_with_labels
   (exec_inf $\dots$ clight_fullexec input_program)
   (exec_inf $\dots$ clight_fullexec (c_labelled_clight $\dots$ output))
That is, any successful compilation produces a labelled program that
has identical behaviour to the original, so long as there is no
`undefined behaviour'.
Note that this second statement provides full functional correctness,
including preservation of (non-)termination.  The intensional result above does
not do this directly --- it does not guarantee the same result or same
termination.  There are two mitigating factors, however: first, to
prove the intensional property you need local simulation results that
can be pieced together to form full behavioural equivalence; only time
constraints have prevented us from doing so.  Second, if we wish to
confirm a result, termination, or non-termination we could add an
observable witness, such as a function that is only called if the
correct result is given.  The intensional result guarantees that the
observable witness is preserved, so the program must behave correctly.
\section{Intermediate goals for the front-end}
The essential parts of the intensional proof were outlined during work
on a toy
compiler~\cite{d2.1,springerlink:10.1007/978-3-642-32469-7_3}.  These
are:
\begin{itemize}
\item functional correctness, in particular preserving the trace of
  cost labels,
\item the \emph{soundness} and \emph{precision} of the cost labelling
  on the object code, and
\item a timing analysis on the object code that produces a correct
  mapping from cost labels to time.
\end{itemize}
However, that toy development did not include function calls.  For the
full \cerco{} compiler we also need to maintain the invariant that
functions return to the correct program location in the caller, as we
mentioned in the previous section.  During work on the back-end timing
analysis (described in more detail in the companion deliverable, D4.4)
the notion of a \emph{structured trace} was developed to enforce this
return property, and also most of the cost labelling properties.
\caption{The compiler and proof outline}
Jointly, we generalised the structured traces to apply to any of the
intermediate languages with some notion of program counter.  This means
that they are introduced part way through the compiler; see
Figure~\ref{fig:compiler}.  Proving that a structured trace can be
constructed at \textsf{RTLabs} has several virtues:
\begin{itemize}
\item This is the first language where every operation has its own
  unique, easily addressable, statement.
\item Function calls and returns are still handled implicitly in the
  language and so the structural properties are ensured by the
  semantics.
\item Many of the back-end languages from \textsf{RTL} share a common
  core set of definitions, and using structured traces throughout
  increases this uniformity.
\end{itemize}
\caption{Nesting of functions in structured traces}
A structured trace is a mutually inductive data type which
contains the steps from a normal program trace, but arranged into a
nested structure which groups entire function calls together and
aggregates individual steps between cost labels (or between the final
cost label and the return from the function); see
Figure~\ref{fig:strtrace}.  This captures the idea that the cost labels
only represent costs \emph{within} a function --- calls to other
functions are accounted for in the nested trace for their execution, and we
can locally regard function calls as a single step.
These structured traces form the core part of the intermediate results
that we must prove so that the back-end can complete the main
intensional result stated above.  In full, we provide the back-end
with:
\begin{itemize}
\item A normal trace of the \textbf{prefix} of the program's execution
  before reaching the measurable subtrace.  (This needs to be
  preserved so that we know that the stack space consumed is correct,
  and to set up the simulation results.)
\item The \textbf{structured trace} corresponding to the measurable
  subtrace.
\item An additional property about the structured trace that no
  `program counter' is \textbf{repeated} between cost labels.  Together with
  the structure in the trace, this takes over from showing that
  cost labelling is sound and precise.
\item A proof that the \textbf{observables} have been preserved.
\item A proof that the \textbf{stack limit} is still observed by the prefix and
  the structured trace.  (This is largely a consequence of the
  preservation of observables.)
\end{itemize}
Following the outline in Figure~\ref{fig:compiler}, we will first deal
with the transformations in \textsf{Clight} that produce the source
program with cost labels, then show that measurable traces can be
lifted to \textsf{RTLabs}, and finally that we can construct the
properties listed above ready for the back-end proofs.
\section{Input code to cost labelled program}
As explained on page~\pageref{page:switchintro}, the costs of complex
C \lstinline[language=C]'switch' statements cannot be represented with
the simple labelling.  Our first pass replaces these statements with
simpler C code, allowing our second pass to perform the cost
labelling.  We show that the behaviour of programs is unchanged by
these passes.
\subsection{Switch removal}
The intermediate languages of the front-end have no jump tables.
Hence, we have to compile the \lstinline[language=C]'switch'
constructs away before going from \textsf{Clight} to \textsf{Cminor}.
Note that this transformation does not necessarily degrade the
efficiency of the generated code. For instance, compilers such as GCC
introduce balanced trees of ``if-then-else'' constructs for small
switches. However, our implementation strategy is much simpler. Let
us consider the following input statement.
   switch(e) {
   case v1:
     stmt1;
   case v2:
     stmt2;
   default:
     stmt_default;
   }
Note that \textsf{stmt1}, \textsf{stmt2}, \ldots, \textsf{stmt\_default}
may contain \lstinline[language=C]'break' statements, which have the
effect of exiting the switch statement. In the absence of
\lstinline[language=C]'break', the execution falls through each case
sequentially. In our current implementation, we produce an equivalent
sequence of ``if-then'' blocks chained by gotos:
   fresh = e;
   if(fresh == v1) {
     $\llbracket$stmt1$\rrbracket$;
     goto lbl_case2;
   };
   if(fresh == v2) {
     lbl_case2:
     $\llbracket$stmt2$\rrbracket$;
     goto lbl_default;
   };
   lbl_default:
   $\llbracket$stmt_default$\rrbracket$;
   exit_label:
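The rewriting can be sketched as a small OCaml function over a toy statement type (ours, purely illustrative; the formalised transformation works on full \textsf{Clight} terms). Each case body is entered either by its own test succeeding or by the goto from the previous case, which preserves the fall-through behaviour.

```ocaml
(* Toy fragment of the target language of switch removal. *)
type stmt =
  | If of string * stmt list   (* if (fresh == v) { ... } *)
  | Goto of string
  | Label of string
  | Exec of string             (* stands for a translated case body *)

(* Turn a list of (value, body) cases plus a default body into the
   chain of "if"s shown above; translated break statements would
   become gotos to exit_label. *)
let remove_switch cases default =
  let rec go i = function
    | [] -> [Label "lbl_default"; Exec default; Label "exit_label"]
    | (v, body) :: rest ->
        let next = match rest with
          | [] -> "lbl_default"
          | _ -> Printf.sprintf "lbl_case%d" (i + 2) in
        let entry =
          if i = 0 then [] else [Label (Printf.sprintf "lbl_case%d" (i + 1))] in
        If (v, entry @ [Exec body; Goto next]) :: go (i + 1) rest
  in
  go 0 cases
```

Applied to the two-case example above, this produces exactly the chain of two ifs followed by the default code.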
The proof had to tackle the following points:
\begin{itemize}
\item the source and target memories are not the same (cf.\ the fresh
  variable),
\item the flow of control is changed in a non-local way (e.g.\ \textbf{goto}
  instead of \textbf{break}).
\end{itemize}
In order to tackle the first point, we implemented a version of memory
extensions similar to CompCert's.  We carried out the simulation proof
for all the ``easy'' cases that do not interact with the switch removal
per se, plus part of the switch case. This comprises propagating the
memory extension through each statement (except switch), as well as
various invariants that are needed for the switch case (in particular,
freshness hypotheses). The details of the evaluation process for the
source switch statement and its target counterpart can be found in the
file \textbf{}, along with more details on the transformation itself.
Proving the correctness of the second point would require reasoning on the
semantics of \lstinline[language=C]'goto' statements. In the \textsf{Clight}
semantics, this is implemented as a function-wide lookup of the target label.
The invariant we would need is that a global label lookup on a freshly
created goto is equivalent to a local lookup. This would in turn require the
propagation of some freshness hypotheses on labels. For the reasons expressed
above, we decided to omit this part of the correctness proof.
\subsection{Cost labelling}
The simulation for the cost labelling pass is the simplest in the
front-end.  The main argument is that any step of the source program
is simulated by the same step of the labelled one, plus any extra
steps for the added cost labels.  The extra instructions do not change
the memory or local environments, although we have to keep track of
the extra instructions in continuations.
We do not attempt to capture any cost properties of the labelling in
the simulation proof, allowing the proof to be oblivious to the choice
of cost labels.  Hence we do not have to reason about the threading of
name generation through the labelling function, greatly reducing the
amount of effort required.  We describe how the cost properties are
established in Section~\ref{sec:costchecks}.
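To make the shape of this pass concrete, here is a hedged OCaml sketch of a labelling function over a toy statement type (the statement type and names are ours; the real pass works on \textsf{Clight} and labels many more constructs), threading the name-generation counter mentioned above:

```ocaml
(* Toy statement type; Cost attaches a cost label to a statement. *)
type stmt =
  | Skip
  | Seq of stmt * stmt
  | Ifthenelse of stmt * stmt
  | Cost of string * stmt

(* Fresh cost label names, threading a counter. *)
let fresh n = (Printf.sprintf "_cost%d" n, n + 1)

let rec label n = function
  | Skip -> (Skip, n)
  | Seq (s1, s2) ->
      let s1', n = label n s1 in
      let s2', n = label n s2 in
      (Seq (s1', s2'), n)
  | Ifthenelse (s1, s2) ->
      (* each branch gets its own label so its cost is accounted for *)
      let l1, n = fresh n in
      let s1', n = label n s1 in
      let l2, n = fresh n in
      let s2', n = label n s2 in
      (Ifthenelse (Cost (l1, s1'), Cost (l2, s2')), n)
  | Cost (l, s) -> let s', n = label n s in (Cost (l, s'), n)

(* Label a whole function body, starting with a label at entry. *)
let label_function body =
  let l0, n = fresh 0 in
  let body', _ = label n body in
  Cost (l0, body')
```

As the simulation proof is oblivious to the particular names generated, only the placement of the labels matters here.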
%TODO: both give one-step-sim-by-many forward sim results; switch
%removal tricky, uses aux var to keep result of expr, not central to
%intensional correctness so curtailed proof effort once reasonable
%level of confidence in code gained; labelling much simpler; don't care
%what the labels are at this stage, just need to know when to go
%through extra steps.  Rolled up into a single result with a cofixpoint
%to obtain coinductive statement of equivalence (show).
\section{Finding corresponding measurable subtraces}
There follow the three main passes of the front-end:
\begin{itemize}
\item simplification of casts in \textsf{Clight} code,
\item \textsf{Clight} to \textsf{Cminor} translation, performing stack
  variable allocation and simplifying control structures, and
\item transformation to \textsf{RTLabs} control flow graph.
\end{itemize}
\todo{I keep mentioning forward sim results - I probably ought to say
  something about determinacy} We have taken a common approach to
each pass: first we build (or axiomatise) forward simulation results
that are similar to normal compiler proofs, but slightly more
fine-grained so that we can see that the call structure and relative
placement of cost labels is preserved.
Then we instantiate a general result which shows that we can find a
\emph{measurable} subtrace in the target of the pass that corresponds
to the measurable subtrace in the source.  By repeated application of
this result we can find a measurable subtrace of the execution of the
\textsf{RTLabs} code, suitable for the construction of a structured
trace (see Section~\ref{sec:structuredtrace}).  This is essentially an
extra layer on top of the simulation proofs that provides us with the
extra information required for our intensional correctness proof.
\subsection{Generic measurable subtrace lifting proof}
Our generic proof is parametrised on a record containing small-step
semantics for the source and target language, a classification of
states (the same form of classification is used when defining
structured traces), a simulation relation which loosely respects the
classification and cost labelling \todo{this isn't very helpful} and
four simulation results:
\begin{itemize}
\item a step from a `normal' state (which is not classified as a call
  or return) which is not a cost label is simulated by zero or more
  `normal' steps;
\item a step from a `call' state followed by a cost label step is
  simulated by a step from a `call' state, a corresponding label step,
  then zero or more `normal' steps;
\item a step from a `call' state not followed by a cost label is
  simulated similarly (note that this case cannot occur in a
  well-labelled program, but we do not have enough information locally
  to exploit this); and
\item a cost label step is simulated by a cost label step.
\end{itemize}
Finally, we need to know that a successfully translated program will
have an initial state in the simulation relation with the original
program's initial state.
\caption{Tiling of simulation for a measurable subtrace}
To find the measurable subtrace in the target program's execution we
walk along the original program's execution trace, applying the
appropriate simulation result by induction on the number of steps.
While the number of steps taken varies, the overall structure is
preserved, as illustrated in Figure~\ref{fig:tiling}.  By preserving
the structure we also maintain the same intensional observables.  One
delicate point is that the cost label following a call must remain
directly afterwards\footnote{The prototype compiler allowed some
  straight-line code to appear before the cost label until a later
  stage of the compiler, but we must move the requirement forward to
  fit with the structured traces.}
% Damn it, I should have just moved the cost label forwards in RTLabs,
% like the prototype does in RTL to ERTL; the result would have been
% simpler.  Or was there some reason not to do that?
(both in the program code and in the execution trace), even if we
introduce extra steps, for example to store parameters in memory in
\textsf{Cminor}.  Thus we have a version of the call simulation
that deals with both in one result.
In addition to the subtrace we are interested in measuring, we must
also prove that the earlier part of the trace is preserved in
order to use the simulation from the initial state.  This also
guarantees that we do not run out of stack space before the subtrace
we are interested in.  The lemmas for this prefix and the measurable
subtrace are similar, following the pattern above.  However, the
measurable subtrace also requires us to rebuild the termination
proof.  This is defined recursively:
let rec will_return_aux C (depth:nat)
                             (trace:list (cs_state $\dots$ C $\times$ trace)) on trace : bool :=
match trace with
[ nil $\Rightarrow$ false
| cons h tl $\Rightarrow$
  let $\langle$s,tr$\rangle$ := h in
  match cs_classify C s with
  [ cl_call $\Rightarrow$ will_return_aux C (S depth) tl
  | cl_return $\Rightarrow$
      match depth with
      [ O $\Rightarrow$ match tl with [ nil $\Rightarrow$ true | _ $\Rightarrow$ false ]
      | S d $\Rightarrow$ will_return_aux C d tl
      ]
  | _ $\Rightarrow$ will_return_aux C depth tl
  ]
The \lstinline'depth' is the number of return states we need to see
before we have returned to the original function (initially zero) and
\lstinline'trace' is the measurable subtrace obtained from running
the semantics for the correct number of steps.  This definition
unfolds tail recursively for each step, and once the corresponding
simulation result has been applied a new one for the target can be
asserted by unfolding and applying the induction hypothesis on the
shorter trace.
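An OCaml transcription of this definition (over a pre-classified trace, for illustration only) makes its behaviour easy to check:

```ocaml
(* Classification of a trace state, mirroring cs_classify. *)
type classification = Cl_call | Cl_return | Cl_other

(* will_return_aux holds exactly when the trace ends with the return
   matching the function enclosing its start: calls bump the depth,
   and the return at depth zero must be the last step. *)
let rec will_return_aux depth = function
  | [] -> false
  | Cl_call :: tl -> will_return_aux (depth + 1) tl
  | Cl_return :: tl ->
      if depth = 0 then tl = [] else will_return_aux (depth - 1) tl
  | Cl_other :: tl -> will_return_aux depth tl
```

For example, a subtrace containing a nested call and return followed by the enclosing function's return satisfies the predicate, while a trace that stops inside the nested call does not.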
This then gives us an overall result for any simulation fitting the
requirements above (contained in the \lstinline'meas_sim' record):
theorem measured_subtrace_preserved :
  $\forall$MS:meas_sim.
  $\forall$p1,p2,m,n,stack_cost,max.
  ms_compiled MS p1 p2 $\rightarrow$
  measurable (ms_C1 MS) p1 m n stack_cost max $\rightarrow$
  $\exists$m',n'.
    measurable (ms_C2 MS) p2 m' n' stack_cost max $\wedge$
    observables (ms_C1 MS) p1 m n = observables (ms_C2 MS) p2 m' n'.
The stack space requirement that is embedded in \lstinline'measurable'
is a consequence of the preservation of observables.
% TODO: how to deal with untidy edges wrt the sim rel; anything to
% say about obs?
% TODO: say something about termination measures; cost labels are
% statements/exprs in these languages; split call/return gives simpler
\subsection{Simulation results for each pass}
\paragraph{Cast simplification.}
The parser used in \cerco{} introduces a lot of explicit type casts.
If left as they are, these constructs can greatly hamper the
quality of the generated code --- especially as the architecture
we consider is an 8-bit one. In \textsf{Clight}, casts are
expressions, hence most of the work of this transformation
proceeds on expressions. The transformation proceeds by recursively
trying to coerce an expression to a particular integer type, which
is in practice smaller than the original one. This functionality
is implemented by two mutually recursive functions whose signatures
are the following.
let rec simplify_expr (e:expr) (target_sz:intsize) (target_sg:signedness)
  : Σresult:bool×expr.
    ∀ge,en,m. simplify_inv ge en m e (\snd result) target_sz target_sg (\fst result) ≝ $\ldots$

and simplify_inside (e:expr) : Σresult:expr. conservation e result ≝ $\ldots$

The \textsf{simplify\_inside} function acts as a wrapper for
\textsf{simplify\_expr}. Whenever \textsf{simplify\_inside} encounters
an \textsf{Ecast} expression, it tries to coerce the sub-expression
to the desired type using \textsf{simplify\_expr}, which attempts to
perform the actual coercion. In return, \textsf{simplify\_expr} calls
back \textsf{simplify\_inside} in certain positions, where we
decided to be conservative in order to simplify the proofs. However,
the current design allows us to move incrementally to a more aggressive
version, by replacing recursive calls to \textsf{simplify\_inside} with
calls to \textsf{simplify\_expr} \emph{and} proving the corresponding
invariants -- where possible.
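The interplay of the two functions can be pictured with a toy sketch. The following Python fragment is a hypothetical illustration with an invented AST (unsigned constants, casts and additions tagged with a bit width); it is not the Matita development, but it mimics the wrapper dispatching casts to the coercion function, which reports success or falls back conservatively:

```python
# Hypothetical toy AST: ('const', size, n), ('cast', size, e),
# ('add', size, a, b), where size is a bit width.  Illustration only.

def fits(n, size):
    return 0 <= n < 2 ** size

def simplify_expr(e, target):
    """Try to coerce e to bit width `target`; return (success, new_expr)."""
    tag = e[0]
    if tag == 'const':
        _, _, n = e
        if fits(n, target):
            return True, ('const', target, n)
    elif tag == 'cast':
        _, size, sub = e
        if size >= target:
            # the outer cast truncates to at least `target` bits, so
            # pushing the coercion inwards is sound for unsigned values
            return simplify_expr(sub, target)
    elif tag == 'add':
        _, size, a, b = e
        ok_a, a2 = simplify_expr(a, target)
        ok_b, b2 = simplify_expr(b, target)
        if ok_a and ok_b:
            return True, ('add', target, a2, b2)
    # failure: be conservative and fall back to the wrapper
    return False, simplify_inside(e)

def simplify_inside(e):
    """Walk an expression; at each cast, try to coerce the subexpression."""
    tag = e[0]
    if tag == 'cast':
        _, size, sub = e
        ok, sub2 = simplify_expr(sub, size)
        return sub2 if ok else ('cast', size, sub2)
    if tag == 'add':
        _, size, a, b = e
        return ('add', size, simplify_inside(a), simplify_inside(b))
    return e
```

On this toy language, a cast of a 32-bit addition of small constants down to 8 bits succeeds and the cast disappears, while a constant too large for the target leaves the cast in place -- the same success/failure dichotomy that the \textsf{simplify\_inv} invariant below tracks.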

The \textsf{simplify\_inv} invariant encodes either the conservation
of the semantics during the transformation, corresponding to the failure
of the transformation (\textsf{Inv\_eq} constructor), or the successful
downcast of the considered expression to the target type
(\textsf{Inv\_coerce\_ok} constructor):
\begin{lstlisting}
inductive simplify_inv
  (ge : genv) (en : env) (m : mem)
  (e1 : expr) (e2 : expr) (target_sz : intsize) (target_sg : signedness) : bool → Prop ≝
| Inv_eq : ∀result_flag. $\ldots$
     simplify_inv ge en m e1 e2 target_sz target_sg result_flag
| Inv_coerce_ok : ∀src_sz,src_sg.
     (typeof e1) = (Tint src_sz src_sg) → (typeof e2) = (Tint target_sz target_sg) →
     (smaller_integer_val src_sz target_sz src_sg (exec_expr ge en m e1) (exec_expr ge en m e2)) →
     simplify_inv ge en m e1 e2 target_sz target_sg true.
\end{lstlisting}

The \textsf{conservation} invariant simply states the conservation
of the semantics, as in the \textsf{Inv\_eq} constructor of the previous
definition:
\begin{lstlisting}
definition conservation ≝ λe,result. ∀ge,en,m.
          res_sim ? (exec_expr ge en m e) (exec_expr ge en m result)
         ∧ res_sim ? (exec_lvalue ge en m e) (exec_lvalue ge en m result)
         ∧ typeof e = typeof result.
\end{lstlisting}

This invariant is then easily lifted to statement evaluations.
The main problem encountered with this particular pass was dealing with
inconsistently typed programs, a canonical case being a particular
integer constant of a certain size typed with another size. This
prompted the need to introduce numerous type checks, complicating
both the implementation and the proof.

\paragraph{Clight to Cminor.}

This pass is the last one operating on the \textsf{Clight} intermediate language.
Its input is a full \textsf{Clight} program, and its output is a \textsf{Cminor}
program. Note that we do not use an equivalent of the C\#minor language: we
translate directly to \textsf{Cminor}. This has the advantage of requiring
neither the special loop constructs nor the explicit block structure. Another salient
point of our approach is that a significant part of the properties needed for
the simulation proof were directly encoded in dependently typed translation
functions -- more precisely, freshness conditions and well-typedness
conditions. The main effects of the transformation from \textsf{Clight} to
\textsf{Cminor} are listed below.
\begin{itemize}
\item Variables are classified as being either globals, stack-allocated
  locals or potentially register-allocated locals. The values of register-allocated
  local variables are moved out of the modelled memory and stored in a
  dedicated environment.
\item In \textsf{Clight}, each local variable has a dedicated memory block, whereas
  in \textsf{Cminor} stack-allocated locals are bundled together on a function-by-function basis.
\item Loops are converted to jumps.
\end{itemize}
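The last point can be illustrated by a minimal sketch, assuming an invented flat instruction list (the forms \textsf{label}, \textsf{br\_if\_not} and \textsf{goto} are made up for illustration and are not the actual \textsf{Cminor} constructors): a \textsf{while} loop becomes a labelled entry test and a backwards jump.

```python
# Hypothetical sketch: compile `while (cond) body` to labels and jumps.
# `fresh_label` supplies fresh label names (e.g. from a counter).

def compile_while(cond, body, fresh_label):
    entry, exit_ = fresh_label(), fresh_label()
    return ([('label', entry),
             ('br_if_not', cond, exit_)]   # leave the loop when cond fails
            + body
            + [('goto', entry),            # jump back to re-test cond
               ('label', exit_)])
```

Freshness of the generated labels is exactly the kind of side condition that our dependently typed translation functions carry as a proof obligation.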

The first two points require memory injections which are more flexible than those
needed in the switch removal case. In the remainder of this section, we briefly
discuss our implementation of memory injections, and then the simulation proof.

\subparagraph{Memory injections.}

Our memory injections are modelled after the work of Blazy \& Leroy.
However, the corresponding paper is based on the first version of the
CompCert memory model, whereas we use a much more concrete model, allowing byte-level
manipulations (as in the later version of CompCert's memory model). We proved
roughly 80\% of the required lemmas. Notable difficulties were
due to overly relaxed conditions on pointer validity (a problem fixed during development).
Further conditions had to be added to take care of possible overflows when converting
from \textbf{Z} block bounds to 16-bit pointer offsets (in practice, these overflows
only occur in edge cases that are easily ruled out -- but this fact is not visible
in the memory injections themselves). Concretely, some of the lemmas on the preservation of
the simulation of loads after writes were axiomatised, for lack of time.
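The offset overflow concern can be made concrete with a small sketch; here an injection is assumed to be a partial map from source blocks to (target block, offset shift) pairs, a representation chosen for illustration only and not taken from the development.

```python
OFFSET_BITS = 16  # pointer offsets on the target architecture are 16 bits

def inject_addr(inj, block, ofs):
    """Translate address (block, ofs) through the injection `inj`,
    failing if the block is unmapped or the shifted offset would
    overflow a 16-bit pointer offset."""
    if block not in inj:
        return None                       # block not mapped by the injection
    tblock, delta = inj[block]
    new_ofs = ofs + delta
    if not 0 <= new_ofs < 2 ** OFFSET_BITS:
        return None                       # the overflow edge case noted above
    return (tblock, new_ofs)
```

The second guard is precisely the extra side condition that working with \textbf{Z} block bounds does not impose by itself.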

\subparagraph{Simulation proof.}

The correctness proof for this transformation was not completed. We proved the
simulation result for expressions and for a subset of the critical statement cases.
Notably lacking are the function entry and exit cases, where the memory injection is
properly set up. As can be guessed, a significant amount of work has to be performed
to show the preservation of all the invariants at each simulation step.

\section{Checking cost labelling properties}

Ideally, we would provide proofs that the cost labelling pass always
produces programs that are soundly and precisely labelled and that
each subsequent pass preserves these properties.  This would match our
use of dependent types to eliminate impossible sources of errors
during compilation, in particular retaining intermediate language type
information.

However, given the limited amount of time available we realised that
implementing a compile-time check for a sound and precise labelling of
the \textsf{RTLabs} intermediate code would reduce the proof burden
considerably.  This is similar in spirit to the use of translation
validation in certified compilation, which makes a similar trade-off
between the potential for compile-time failure and the volume of proof
required.

The check cannot be pushed into a later stage of the compiler because
much of the information is embedded into the structured traces.
However, if an alternative method were used to show that function
returns in the compiled code are sufficiently well-behaved, then we
could consider pushing the cost property checks into the timing
analysis itself.  We leave this as a possible area for future work.

\subsection{Implementation and correctness}

For a cost labelling to be sound and precise we need a cost label at
the start of each function, after each branch and at least one in
every loop.  The first two parts are trivial to check by examining the
code.  In \textsf{RTLabs} the last part is specified by saying
that there is a bound on the number of successive instruction nodes in
the CFG that you can follow before you encounter a cost label, and
checking this is more difficult.

The implementation works through the set of nodes in the graph,
following successors until a cost label is found or a label-free cycle
is discovered (in which case the property does not hold and we stop).
This is made easier by the prior knowledge that any branch is followed
by cost labels, so we do not need to search each branch.  When a label
is found, we remove the chain from the set and continue from another
node in the set until it is empty, at which point we know that there
is a bound for every node in the graph.
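The search can be sketched as follows, assuming a toy CFG encoding in which each node maps to a pair of a has-cost-label flag and its successor list; this is an illustration of the algorithm just described, not the verified Matita checker.

```python
def labels_bound_loops(cfg):
    """Return True iff from every node a cost label is reached within a
    bounded number of steps, i.e. there is no label-free cycle.
    cfg: node -> (has_cost_label, list of successor nodes)."""
    pending = set(cfg)                     # nodes not yet known to be bounded
    while pending:
        chain, node = [], next(iter(pending))
        while True:
            if (node not in pending            # already checked via a chain
                    or cfg[node][0]            # reached a cost label
                    or len(cfg[node][1]) != 1):  # branch or exit: successors
                                               # carry labels (checked by the
                                               # trivial syntactic test)
                pending.difference_update(chain + [node])
                break
            if node in chain:
                return False                   # label-free cycle discovered
            chain.append(node)
            node = cfg[node][1][0]
    return True
```

Each outer iteration removes at least one node from the pending set, so the check terminates, mirroring the bound argument in the inductive specification.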

Directly reasoning about the function that implements this would be
rather awkward, so an inductive specification of a single step of its
behaviour was written and proved to match the implementation.  This
was then used to prove the implementation sound and complete.

While we have not attempted to prove that the cost labelling properties
are established and preserved earlier in the compiler, we expect that
the effort for the \textsf{Cminor} to \textsf{RTLabs} pass would be similar
to the work outlined above, because it involves the change from
requiring a cost label at particular positions to requiring cost
labels to break loops in the CFG.  As there are another three passes
to consider (including the labelling itself), we believe that using
the check above is much simpler overall.

\section{Existence of a structured trace}

\emph{Structured traces} enrich the execution trace of a program by
nesting function calls in a mixed-step style and embedding the cost
properties of the program.  The structure was originally designed to support the
proof of correctness for the timing analysis of the object code in the
back-end, and was then generalised to provide a common structure to use from
the end of the front-end to the object code.  See
Figure~\ref{fig:strtrace} on page~\pageref{fig:strtrace} for an
illustration of a structured trace.

To make the definition generic we abstract over the semantics of the
language:
\begin{lstlisting}
record abstract_status : Type[1] :=
  { as_status :> Type[0]
  ; as_execute : relation as_status
  ; as_pc : DeqSet
  ; as_pc_of : as_status $\rightarrow$ as_pc
  ; as_classify : as_status $\rightarrow$ status_class
  ; as_label_of_pc : as_pc $\rightarrow$ option costlabel
  ; as_after_return : ($\Sigma$s:as_status. as_classify s = cl_call) $\rightarrow$ as_status $\rightarrow$ Prop
  ; as_result: as_status $\rightarrow$ option int
  ; as_call_ident : ($\Sigma$s:as_status.as_classify s = cl_call) $\rightarrow$ ident
  ; as_tailcall_ident : ($\Sigma$s:as_status.as_classify s = cl_tailcall) $\rightarrow$ ident
  }.
\end{lstlisting}

which gives a type of states, an execution relation, some notion of
program counters with decidable equality, the classification of states,
and functions to extract the observable intensional information (cost
labels and the identity of functions that are called).  The
\lstinline'as_after_return' property links the state before a function
call with the state after return, providing the evidence that
execution returns to the correct place.  The precise form varies
between stages; in \textsf{RTLabs} it insists that the CFG, the pointer to
the CFG node to execute next, and some call stack information are
preserved.

The structured traces are defined using three mutually inductive
types.  The core data structure is \lstinline'trace_any_label', which
captures some straight-line execution until the next cost label or
function return.  Each function call is embedded as a single step,
with its own trace nested inside and the before and after states
linked by \lstinline'as_after_return'; states classified as a
`jump' (in particular branches) must be followed by a cost label.

The second type, \lstinline'trace_label_label', is a
\lstinline'trace_any_label' where the initial state is cost labelled.
Thus a trace of this type identifies a series of steps whose cost is
entirely accounted for by the label at the start.

Finally, \lstinline'trace_label_return' is a sequence of
\lstinline'trace_label_label' values which ends in the return from the
function.  These correspond to measurable subtraces, and in
particular include executions of an entire function call (and so are
used for the nested calls in \lstinline'trace_any_label').
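The mixed-step nesting can be illustrated by a small sketch that folds a flat call/return event list into a tree, so that a call and everything up to its matching return become a single node; the event encoding is invented for illustration and unrelated to the Matita types.

```python
def fold_calls(events):
    """Nest a flat event list into a tree: everything between a 'call'
    and its matching 'ret' becomes the subtrace of that call."""
    def go(i):
        out = []
        while i < len(events):
            ev = events[i]
            if ev == 'call':
                sub, i = go(i + 1)          # recurse into the callee's events
                out.append(('call', sub))   # the whole call is one step
            elif ev == 'ret':
                return out, i + 1           # close the current call
            else:
                out.append(ev)              # ordinary step or cost label
                i += 1
        return out, i
    return go(0)[0]
```

For example, a cost label, a call containing one label, and a final label fold into a three-element trace whose middle element carries the nested subtrace.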

The construction of the structured trace replaces syntactic cost
labelling properties, which place requirements on where labels appear
in the program, with semantic properties that constrain the execution
traces of the program.  The construction begins by defining versions
of the sound and precise labelling properties on the program text that
appears in states rather than programs, and showing that these are
preserved by steps of the \textsf{RTLabs} semantics.

Then we show that each cost labelling property the structured traces
definition requires is satisfied.  These are broken up by the
classification of states.  Similarly, we prove a step-by-step stack
preservation result, which states that the \textsf{RTLabs} semantics
never changes the lower parts of the stack.

The core part of the construction of a structured trace is to use the
proof of termination from the measurable trace (defined on
page~\pageref{prog:terminationproof}) to `fold up' the execution into
the nested form.  The results outlined above fill in the proof
obligations for the cost labelling properties, and the stack
preservation result shows that calls return to the correct location.

The structured trace alone is not sufficient to capture the property
that the program is soundly labelled.  While the structured trace
guarantees termination, it still permits a loop to be executed a
finite number of times without encountering a cost label.  We
eliminate this by proving that no `program counter' repeats within any
\lstinline'trace_any_label' section, by showing that a repeat is incompatible
with the property that there is a bound on the number of successor
instructions you can follow in the CFG before you encounter a cost
label.

\subsubsection{Whole program structured traces}

The development of the construction above started relatively early,
before the measurable subtrace preservation proofs.  To be confident
that the traces were well-formed, we also developed a whole program
form that embeds the traces above.  This includes non-terminating
programs, where an infinite number of the terminating structured
traces are embedded.  This construction confirmed that our definition
of structured traces was consistent, although we later found that we
did not require it for the compiler correctness results.

To construct these we need to know whether each function call will
eventually terminate, requiring the use of the excluded middle.  This
classical reasoning is local to the construction of whole program
traces and is not necessary for our main results.

\section{Conclusion}

In combination with the work on the \cerco{} back-end and by
concentrating on the novel intensional parts of the proof, we have
shown that it is possible to construct certifying compilers that
correctly report execution time and stack space costs.