source: Deliverables/D3.4/Report/report.tex @ 3214

Last change on this file since 3214 was 3214, checked in by campbell, 7 years ago

Some 3.4 revisions.

File size: 47.0 KB
14% LaTeX Companion, p 74
15\newcommand{\todo}[1]{\marginpar{\raggedright - #1}}
18  {keywords={Definition,Lemma,Theorem,Remark,Qed,Save,Inductive,Record},
19   morekeywords={[2]if,then,else},
20  }
23  {keywords={definition,lemma,theorem,remark,inductive,record,qed,let,rec,match,with,Type,and,on},
24   morekeywords={[2]whd,normalize,elim,cases,destruct},
25   mathescape=true,
26   morecomment=[n]{(*}{*)},
27  }
30        keywordstyle=\color{red}\bfseries,
31        keywordstyle=[2]\color{blue},
32        commentstyle=\color{green},
33        stringstyle=\color{blue},
34        showspaces=false,showstringspaces=false}
48\vspace*{1cm}Project FP7-ICT-2009-C-243881 \cerco{}}
50\date{ }
70Report n. D3.4\\
71Front-end Correctness Proofs\\
78Version 1.0
86Brian~Campbell, Ilias~Garnier, James~McKinna, Ian~Stark
92Project Acronym: \cerco{}\\
93Project full title: Certified Complexity\\
94Proposal/Contract no.: FP7-ICT-2009-C-243881 \cerco{}\\
96\clearpage \pagestyle{myheadings} \markright{\cerco{}, FP7-ICT-2009-C-243881}
102We report on the correctness proofs for the front-end of the \cerco{}
103cost lifting compiler.  First, we identify the core result we wish to
104prove, which says that the we correctly predict the precise execution
105time for particular parts of the execution called \emph{measurable}
106subtraces.  Then we consider the three distinct parts of the task:
107showing that the \emph{annotated source code} output by the compiler
108has equivalent behaviour to the original input (up to the
109annotations); showing that a measurable subtrace of the
110annotated source code corresponds to an equivalent measurable subtrace
111in the code produced by the front-end, including costs; and finally
112showing that the enriched \emph{structured} execution traces required
113for cost correctness in the back-end can be constructed from the
114properties of the code produced by the front-end.
116A key part of our work is that the intensional correctness results which show
117that we get consistent cost measurements throughout the intermediate languages
118of the compiler can be layered on top of normal forward simulation results,
119if we split those results into local call-structure preserving simulations.
121This deliverable shows correctness results about the formalised
122compiler described in D3.2, using the source language semantics from
123D3.1 and intermediate language semantics from D3.3.  It builds on
124earlier work on the correctness of a toy compiler built to test the
125labelling approach in D2.1. Together with the companion deliverable
126about the correctness of the back-end, D4.4, we obtain results about
127the whole formalised compiler.
129% TODO: mention the deliverable about the extracted compiler et al?
135% CHECK: clear up any -ize vs -ise
136% CHECK: clear up any "front end" vs "front-end"
137% CHECK: clear up any mentions of languages that aren't textsf'd.
138% CHECK: fix unicode in listings
142The \cerco{} compiler produces a version of the source code containing
143annotations describing the timing behaviour of the object code, as well
144as the object code itself. It compiles C code, targeting
145microcontrollers implementing the Intel 8051 architecture.  There are
146two versions: first, an initial prototype was implemented in
147\ocaml{}~\cite{d2.2}, then a version was formalised in the \matita{} proof
148assistant~\cite{d3.2,d4.2} and extracted to \ocaml{} code to
149produce an executable compiler.  In this document we present results
150formalised in \matita{} about the front-end of the latter version of
151the compiler, and how that fits into the verification of the whole
154A key part of this work was to layer the intensional correctness
155results that show that the costs produced are correct on top of the
156proofs about the compiled code's extensional behaviour (that is, the
157functional correctness of the compiler).  Unfortunately, the ambitious
158goal of completely verifying the entire compiler was not feasible
159within the time available, but thanks to this separation of
160extensional and intensional proofs we are able to axiomatize some
161simulation results which are similar to those in other compiler verification
162projects and concentrate on the novel intensional proofs.  We were
163also able to add stack space costs to obtain a stronger result.  The
164proofs were made more tractable by introducing compile-time checks for
165the `sound and precise' cost labelling properties rather than proving
166that they are preserved throughout.
168The overall statement of correctness says that the annotated program has the
169same behaviour as the input, and that for any suitably well-structured part of
170the execution (which we call \emph{measurable}), the object code will execute
171the same behaviour taking precisely the time given by the cost annotations in
172the annotated source program.
174In the next section we recall the structure of the compiler and make the overall
175statement more precise.  Following that, in Section~\ref{sec:fegoals} we
176describe the statements we need to prove about the intermediate \textsf{RTLabs}
177programs for the back-end proofs.
178Section~\ref{sec:inputtolabelling} covers the passes which produce the
179annotated source program and Section~\ref{sec:measurablelifting} the rest
180of the transformations in the front-end.  Then the compile time checks
181for good cost labelling are detailed in Section~\ref{sec:costchecks}
182and the proofs that the structured traces required by the back-end
183exist are discussed in Section~\ref{sec:structuredtrace}.
185\section{The compiler and main goals}
187The unformalised \ocaml{} \cerco{} compiler was originally described
188in Deliverables 2.1 and 2.2.  Its design was replicated in the formal
189\matita{} code, which was presented in Deliverables 3.2 and 4.2, for
190the front-end and back-end respectively.
196\caption{Languages in the \cerco{} compiler}
200The compiler uses a number of intermediate languages, as outlined the
201middle two lines of Figure~\ref{fig:compilerlangs}.  The upper line
202represents the front-end of the compiler, and the lower the back-end,
203finishing with 8051 binary code.  Not all of the front-end passes
204introduces a new language, and Figure~\ref{fig:summary} presents a
205list of every pass involved.
211\quad \= $\downarrow$ \quad \= \kill
212\textsf{C} (unformalized)\\
213\> $\downarrow$ \> CIL parser (unformalized \ocaml)\\
215%\> $\downarrow$ \> add runtime functions\\
216\> $\downarrow$ \> \lstinline[language=C]'switch' removal\\
217\> $\downarrow$ \> labelling\\
218\> $\downarrow$ \> cast removal\\
219\> $\downarrow$ \> stack variable allocation and control structure
220 simplification\\
222%\> $\downarrow$ \> generate global variable initialization code\\
223\> $\downarrow$ \> transform to RTL graph\\
225\> $\downarrow$ \> check cost labelled properties of RTL graph\\
226\> $\downarrow$ \> start of target specific back-end\\
227\>\quad \vdots
231\caption{Front-end languages and compiler passes}
236The annotated source code is taken after the cost labelling phase.
237Note that there is a pass to replace C \lstinline[language=C]'switch'
238statements before labelling --- we need to remove them because the
239simple form of labelling used in the formalised compiler is not quite
240capable of capturing their execution time costs, largely due to C's
241`fall-through' behaviour where execution from one branch continues in
242the next unless there is an explicit \lstinline[language=C]'break'.
244The cast removal phase which follows cost labelling simplifies
245expressions to prevent unnecessary arithmetic promotion, which is
246specified by the C standard but costly for an 8-bit target.  The
247transformation to \textsf{Cminor} and subsequently \textsf{RTLabs}
248bear considerable resemblance to some passes of the CompCert
249compiler~\cite{Blazy-Leroy-Clight-09,Leroy-backend}, although we use a simpler \textsf{Cminor} where
250all loops use \lstinline[language=C]'goto' statements, and the
251\textsf{RTLabs} language retains a target-independent flavour.  The
252back-end takes \textsf{RTLabs} code as input.
254The whole compilation function returns the following information on success:
256  record compiler_output : Type[0] :=
257   { c_labelled_object_code: labelled_object_code
258   ; c_stack_cost: stack_cost_model
259   ; c_max_stack: nat
260   ; c_init_costlabel: costlabel
261   ; c_labelled_clight: clight_program
262   ; c_clight_cost_map: clight_cost_map
263   }.
265It consists of annotated 8051 object code, a mapping from function
266identifiers to the function's stack space usage, the space available for the
267stack after global variable allocation, a cost label covering the
268execution time for the initialisation of global variables and the call
269to the \lstinline[language=C]'main' function, the annotated source
270code, and finally a mapping from cost labels to actual execution time
273An \ocaml{} pretty printer is used to provide a concrete version of
274the output code and annotated source code.  In the case of the
275annotated source code, it also inserts the actual costs alongside the
276cost labels, and optionally adds a global cost variable and
277instrumentation to support further reasoning.
279\subsection{Revisions to the prototype compiler}
281Our focus on intensional properties prompted us to consider whether we
282could incorporate stack space into the costs presented to the user.
283We only allocate one fixed-size frame per function, so modelling this
284was relatively simple.  It is the only form of dynamic memory
285allocation provided by the compiler, so we were able to strengthen the
286statement of the goal to guarantee successful execution whenever the
287stack space obeys the \lstinline'c_max_stack' bound calculated by
288subtracting the global variable requirements from the total memory
291The cost labelling checks at the end of Figure~\ref{fig:summary} have been
292introduced to reduce the proof burden, and are described in
295The use of dependent types to capture simple intermediate language
296invariants makes every front-end pass except \textsf{Clight} to
297\textsf{Cminor} and the cost checks total functions.  Hence various
298well-formedness and type safety checks are performed once between
299\textsf{Clight} and \textsf{Cminor}, and the invariants rule out any
300difficulties in the later stages.
301With the benefit of hindsight we would have included an initial
302checking phase to produce a `well-formed' variant of \textsf{Clight},
303conjecturing that this would simplify various parts of the proofs for
304the \textsf{Clight} stages which deal with potentially ill-formed
307\todo{move of initialisation code?}
309\subsection{Main goals}
311TODO: need an example for this
313Informally, our main intensional result links the time difference in a source
314code execution to the time difference in the object code, expressing the time
315for the source by summing the values for the cost labels in the trace, and the
316time for the target by a clock built in to the 8051 executable semantics.
318The availability of precise timing information for 8501
319implementations and the design of the compiler allow it to give exact
320time costs in terms of processor cycles, not just upper bounds.
321However, these exact results are only available if the subtrace we
322measure starts and ends at suitable points.  In particular, pure
323computation with no observable effects may be reordered and moved past
324cost labels, so we cannot measure time between arbitrary statements in
325the program.
327There is also a constraint on the subtraces that we
328measure due to the requirements of the correctness proof for the
329object code timing analysis.  To be sure that the timings are assigned
330to the correct cost label, we need to know that each return from a
331function call must go to the correct return address.  It is difficult
332to observe this property locally in the object code because it relies
333on much earlier stages in the compiler.  To convey this information to
334the timing analysis extra structure is imposed on the subtraces, which
335is described in Section~\ref{sec:fegoals}.
337% Regarding the footnote, would there even be much point?
338% TODO: this might be quite easy to add ('just' subtract the
339% measurable subtrace from the second label to the end).  Could also
340% measure other traces in this manner.
341These restrictions are reflected in the subtraces that we give timing
342guarantees on; they must start at a cost label and end at the return
343of the enclosing function of the cost label\footnote{We expect that
344  this would generalise to subtraces between cost labels in the same
345  function, but could not justify the extra complexity that would be
346  required to show this.}.  A typical example of such a subtrace is
347the execution of an entire function from the cost label at the start
348of the function until it returns.  We call such any such subtrace
349\emph{measurable} if it (and the prefix of the trace from the start to
350the subtrace) can also be executed within the available stack space.
352Now we can give the main intensional statement for the compiler.
353Given a \emph{measurable} subtrace for a labelled \textsf{Clight}
354program, there is a subtrace of the 8051 object code program where the
355time differences match.  Moreover, \emph{observable} parts of the
356trace also match --- these are the appearance of cost labels and
357function calls and returns.
359More formally, the definition of this statement in \matita{} is
361definition simulates :=
362  $\lambda$p: compiler_output.
363  let initial_status := initialise_status $\dots$ (cm (c_labelled_object_code $\dots$ p)) in
364  $\forall$m1,m2.
365   measurable Clight_pcs (c_labelled_clight $\dots$ p) m1 m2
366    (lookup_stack_cost (c_stack_cost $\dots$ p)) (c_max_stack $\dots$ p) $\rightarrow$
367  $\forall$c1,c2.
368   clock_after Clight_pcs (c_labelled_clight $\dots$ p) m1 (c_clight_cost_map $\dots$ p) = OK $\dots$ c1 $\rightarrow$
369   clock_after Clight_pcs (c_labelled_clight $\dots$ p) (m1+m2) (c_clight_cost_map $\dots$ p) = OK $\dots$ c2 $\rightarrow$
370  $\exists$n1,n2.
371   observables Clight_pcs (c_labelled_clight $\dots$ p) m1 m2 =
372   observables (OC_preclassified_system (c_labelled_object_code $\dots$ p))
373    (c_labelled_object_code $\dots$ p) n1 n2
374  $\wedge$
375   c2 - c1 = clock $\dots$ (execute n2 ? initial_status) - clock $\dots$ (execute n1 ? initial_status).
377where the \lstinline'measurable', \lstinline'clock_after' and
378\lstinline'observables' definitions can be applied to multiple
379languages; in this case the \lstinline'Clight_pcs' record applies them
380to \textsf{Clight} programs.
382There is a second part to the statement, which says that the initial
383processing of the input program to produce the cost labelled version
384does not affect the semantics of the program:
385% Yes, I'm paraphrasing the result a tiny bit to remove the observe non-function
387  $\forall$input_program,output.
388  compile input_program = return output $\rightarrow$
389  not_wrong … (exec_inf … clight_fullexec input_program) $\rightarrow$
390  sim_with_labels
391   (exec_inf … clight_fullexec input_program)
392   (exec_inf … clight_fullexec (c_labelled_clight … output))
394That is, any successful compilation produces a labelled program that
395has identical behaviour to the original, so long as there is no
396`undefined behaviour'.
398Note that this provides full functional correctness, including
399preservation of (non-)termination.  The intensional result above does
400not do this directly --- it does not guarantee the same result or same
401termination.  There are two mitigating factors, however: first, to
402prove the intensional property you need local simulation results that
403can be pieced together to form full behavioural equivalence, only time
404constraints have prevented us from doing so.  Second, if we wish to
405confirm a result, termination, or non-termination we could add an
406observable witness, such as a function that is only called if the
407correct result is given.  The intensional result guarantees that the
408observable witness is preserved, so the program must behave correctly.
410\section{Intermediate goals for the front-end}
413The essential parts of the intensional proof were outlined during work
414on a toy
415compiler~\cite{d2.1,springerlink:10.1007/978-3-642-32469-7_3}.  These
418\item functional correctness, in particular preserving the trace of
419  cost labels,
420\item the \emph{soundness} and \emph{precision} of the cost labelling
421  on the object code, and
422\item the timing analysis on the object code produces a correct
423  mapping from cost labels to time.
426However, that toy development did not include function calls.  For the
427full \cerco{} compiler we also need to maintain the invariant that
428functions return to the correct program location in the caller, as we
429mentioned in the previous section.  During work on the back-end timing
430analysis (describe in more detail in the companion deliverable, D4.4)
431the notion of a \emph{structured trace} was developed to enforce this
432return property, and also most of the cost labelling properties too.
438\caption{The compiler and proof outline}
442Jointly, we generalised the structured traces to apply to any of the
443intermediate languages with some idea of program counter.  This means
444that they are introduced part way through the compiler, see
445Figure~\ref{fig:compiler}.  Proving that a structured trace can be
446constructed at \textsf{RTLabs} has several virtues:
448\item This is the first language where every operation has its own
449  unique, easily addressable, statement.
450\item Function calls and returns are still handled implicitly in the
451  language and so the structural properties are ensured by the
452  semantics.
453\item Many of the back-end languages from \textsf{RTL} share a common
454  core set of definitions, and using structured traces throughout
455  increases this uniformity.
462\caption{Nesting of functions in structured traces}
465A structured trace is a mutually inductive data type which
466contains the steps from a normal program trace, but arranged into a
467nested structure which groups entire function calls together and
468aggregates individual steps between cost labels (or between the final
469cost label and the return from the function), see
470Figure~\ref{fig:strtrace}.  This captures the idea that the cost labels
471only represent costs \emph{within} a function --- calls to other
472functions are accounted for in the nested trace for their execution, and we
473can locally regard function calls as a single step.
475These structured traces form the core part of the intermediate results
476that we must prove so that the back-end can complete the main
477intensional result stated above.  In full, we provide the back-end
480\item A normal trace of the \textbf{prefix} of the program's execution
481  before reaching the measurable subtrace.  (This needs to be
482  preserved so that we know that the stack space consumed is correct,
483  and to set up the simulation results.)
484\item The \textbf{structured trace} corresponding to the measurable
485  subtrace.
486\item An additional property about the structured trace that no
487  `program counter' is \textbf{repeated} between cost labels.  Together with
488  the structure in the trace, this takes over from showing that
489  cost labelling is sound and precise.
490\item A proof that the \textbf{observables} have been preserved.
491\item A proof that the \textbf{stack limit} is still observed by the prefix and
492  the structure trace.  (This is largely a consequence of the
493  preservation of observables.)
496Following the outline in Figure~\ref{fig:compiler}, we will first deal
497with the transformations in \textsf{Clight} that produce the source
498program with cost labels, then show that measurable traces can be
499lifted to \textsf{RTLabs}, and finally that we can construct the
500properties listed above ready for the back-end proofs.
502\section{Input code to cost labelled program}
505As explained on page~\pageref{page:switchintro}, the costs of complex
506C \lstinline[language=C]'switch' statements cannot be represented with
507the simple labelling.  Our first pass replaces these statements with
508simpler C code, allowing our second pass to perform the cost
509labelling.  We show that the behaviour of programs is unchanged by
510these passes.
512\subsection{Switch removal}
514The intermediate languages of the front-end have no jump tables.
515Hence, we have to compile the \lstinline[language=C]'switch'
516constructs away before going from \textsf{Clight} to \textsf{Cminor}.
517Note that this transformation does not necessarily deteriorate the
518efficiency of the generated code. For instance, compilers such as GCC
519introduce balanced trees of ``if-then-else'' constructs for small
520switches. However, our implementation strategy is much simpler. Let
521us consider the following input statement.
524   switch(e) {
525   case v1:
526     stmt1;
527   case v2:
528     stmt2;
529   default:
530     stmt_default;
531   }
534Note that \textsf{stmt1}, \textsf{stmt2}, \ldots \textsf{stmt\_default}
535may contain \lstinline[language=C]'break' statements, which have the
536effect of exiting the switch statement. In the absence of break, the
537execution falls through each case sequentially. In our current implementation,
538we produce an equivalent sequence of ``if-then'' chained by gotos:
540   fresh = e;
541   if(fresh == v1) {
542     $\llbracket$stmt1$\rrbracket$;
543     goto lbl_case2;
544   };
545   if(fresh == v2) {
546     lbl_case2:
547     $\llbracket$stmt2;$\rrbracket$
548     goto lbl_case2;
549   };
550   $\llbracket$stmt_default$\rrbracket$;
551   exit_label:
554The proof had to tackle the following points:
556\item the source and target memories are not the same (cf. fresh variable),
557\item the flow of control is changed in a non-local way (e.g. \textbf{goto} 
558  instead of \textbf{break}).
561In order to tackle the first point, we implemented a version of memory
562extensions similar to Compcert's. What has been done is the simulation proof
563for all ``easy'' cases, that do not interact with the switch removal per se,
564plus a bit of the switch case. This comprises propagating the memory extension
565through  each statement (except switch), as well as various invariants that
566are needed for the switch case (in particular, freshness hypotheses). The
567details of the evaluation process for the source switch statement and its
568target counterpart can be found in the file \textbf{},
569along more details on the transformation itself.
571Proving the correctness of the second point would require reasoning on the
572semantics of \lstinline[language=C]'goto' statements. In the \textsf{Clight} 
573semantics, this is implemented as a function-wide lookup of the target label.
574The invariant we would need is the fact that a global label lookup on a freshly
575created goto is equivalent to a local lookup. This would in turn require the
576propagation of some freshness hypotheses on labels. For reasons expressed above,
577we decided to omit this part of the correctness proof.
579\subsection{Cost labelling}
581The simulation for the cost labelling pass is the simplest in the
582front-end.  The main argument is that any step of the source program
583is simulated by the same step of the labelled one, plus any extra
584steps for the added cost labels.  The extra instructions do not change
585the memory or local environments, although we have to keep track of
586the extra instructions in continuations.
588We do not attempt to capture any cost properties of the labelling in
589the simulation proof, allowing the proof to be oblivious to the choice
590of cost labels.  Hence we do not have to reason about the threading of
591name generation through the labelling function, greatly reducing the
592amount of effort required.  We describe how the cost properties are
593established in Section~\ref{sec:costchecks}.
595%TODO: both give one-step-sim-by-many forward sim results; switch
596%removal tricky, uses aux var to keep result of expr, not central to
597%intensional correctness so curtailed proof effort once reasonable
598%level of confidence in code gained; labelling much simpler; don't care
599%what the labels are at this stage, just need to know when to go
600%through extra steps.  Rolled up into a single result with a cofixpoint
601%to obtain coinductive statement of equivalence (show).
603\section{Finding corresponding measurable subtraces}
606There follow the three main passes of the front-end:
608\item simplification of casts in \textsf{Clight} code
609\item \textsf{Clight} to \textsf{Cminor} translation, performing stack
610  variable allocation and simplifying control structures
611\item transformation to \textsf{RTLabs} control flow graph
613\todo{I keep mentioning forward sim results - I probably ought to say
614  something about determinancy} We have taken a common approach to
615each pass: first we build (or axiomatise) forward simulation results
616that are similar to normal compiler proofs, but slightly more
617fine-grained so that we can see that the call structure and relative
618placement of cost labels is preserved.
620Then we instantiate a general result which shows that we can find a
621\emph{measurable} subtrace in the target of the pass that corresponds
622to the measurable subtrace in the source.  By repeated application of
623this result we can find a measurable subtrace of the execution of the
624\textsf{RTLabs} code, suitable for the construction of a structured
625trace (see Section~\ref{sec:structuredtrace}.  This is essentially an
626extra layer on top of the simulation proofs that provides us with the
627extra information required for our intensional correctness proof.
629\subsection{Generic measurable subtrace lifting proof}
631Our generic proof is parametrised on a record containing small-step
632semantics for the source and target language, a classification of
633states (the same form of classification is used when defining
634structured traces), a simulation relation which loosely respects the
635classification and cost labelling \todo{this isn't very helpful} and
636four simulation results:
638\item a step from a `normal' state (which is not classified as a call
639  or return) which is not a cost label is simulated by zero or more
640  `normal' steps;
641\item a step from a `call' state followed by a cost label step is
642  simulated by a step from a `call' state, a corresponding label step,
643  then zero or more `normal' steps;
644\item a step from a `call' state not followed by a cost label
645  similarly (note that this case cannot occur in a well-labelled
646  program, but we do not have enough information locally to exploit
647  this); and
648\item a cost label step is simulated by a cost label step.
650Finally, we need to know that a successfully translated program will
651have an initial state in the simulation relation with the original
652program's initial state.
658\caption{Tiling of simulation for a measurable subtrace}
662To find the measurable subtrace in the target program's execution we
663walk along the original program's execution trace applying the
664appropriate simulation result by induction on the number of steps.
665While the number of steps taken varies, the overall structure is
666preserved, as illustrated in Figure~\ref{fig:tiling}.  By preserving
667the structure we also maintain the same intensional observables.  One
668delicate point is that the cost label following a call must remain
669directly afterwards\footnote{The prototype compiler allowed some
670  straight-line code to appear before the cost label until a later
671  stage of the compiler, but we must move the requirement forward to
672  fit with the structured traces.}
673% Damn it, I should have just moved the cost label forwards in RTLabs,
674% like the prototype does in RTL to ERTL; the result would have been
675% simpler.  Or was there some reason not to do that?
676(both in the program code and in the execution trace), even if we
677introduce extra steps, for example to store parameters in memory in
678\textsf{Cminor}.  Thus we have a version of the call simulation
679that deals with both in one result.
681In addition to the subtrace we are interested in measuring we must
682also prove that the earlier part of the trace is also preserved in
683order to use the simulation from the initial state.  It also
684guarantees that we do not run out of stack space before the subtrace
685we are interested in.  The lemmas for this prefix and the measurable
686subtrace are similar, following the pattern above.  However, the
687measurable subtrace also requires us to rebuild the termination
688proof.  This is defined recursively:
691let rec will_return_aux C (depth:nat)
692                             (trace:list (cs_state … C × trace)) on trace : bool :=
693match trace with
694[ nil $\Rightarrow$ false
695| cons h tl $\Rightarrow$
696  let $\langle$s,tr$\rangle$ := h in
697  match cs_classify C s with
698  [ cl_call $\Rightarrow$ will_return_aux C (S depth) tl
699  | cl_return $\Rightarrow$
700      match depth with
701      [ O $\Rightarrow$ match tl with [ nil $\Rightarrow$ true | _ $\Rightarrow$ false ]
702      | S d $\Rightarrow$ will_return_aux C d tl
703      ]
704  | _ $\Rightarrow$ will_return_aux C depth tl
705  ]
708The \lstinline'depth' is the number of return states we need to see
709before we have returned to the original function (initially zero) and
710\lstinline'trace' the measurable subtrace obtained from the running
711the semantics for the correct number of steps.  This definition
712unfolds tail recursively for each step, and once the corresponding
713simulation result has been applied a new one for the target can be
714asserted by unfolding and applying the induction hypothesis on the
715shorter trace.
717This then gives us an overall result for any simulation fitting the
718requirements above (contained in the \lstinline'meas_sim' record):
720theorem measured_subtrace_preserved :
721  $\forall$MS:meas_sim.
722  $\forall$p1,p2,m,n,stack_cost,max.
723  ms_compiled MS p1 p2 $\rightarrow$
724  measurable (ms_C1 MS) p1 m n stack_cost max $\rightarrow$
725  $\exists$m',n'.
726    measurable (ms_C2 MS) p2 m' n' stack_cost max $\wedge$
727    observables (ms_C1 MS) p1 m n = observables (ms_C2 MS) p2 m' n'.
729The stack space requirement that is embedded in \lstinline'measurable'
730is a consequence of the preservation of observables.
732TODO: how to deal with untidy edges wrt to sim rel; anything to
733say about obs?
735TODO: say something about termination measures; cost labels are
736statements/exprs in these languages; split call/return gives simpler
739\subsection{Simulation results for each pass}
741\paragraph{Cast simplification.}
743The parser used in Cerco introduces a lot of explicit type casts.
744If left as they are, these constructs can greatly hamper the
745quality of the generated code -- especially as the architecture
746we consider is an $8$ bit one. In \textsf{Clight}, casts are
747expressions. Hence, most of the work of this transformation
748proceeds on expressions. The tranformation proceeds by recursively
749trying to coerce an expression to a particular integer type, which
750is in practice smaller than the original one. This functionality
751is implemented by two mutually recursive functions whose signature
752is the following.
755let rec simplify_expr (e:expr) (target_sz:intsize) (target_sg:signedness)
756  : Σresult:bool×expr.
757    ∀ge,en,m. simplify_inv ge en m e (\snd result) target_sz target_sg (\fst result) ≝ $\ldots$
759and simplify_inside (e:expr) : Σresult:expr. conservation e result ≝ $\ldots$
762The \textsf{simplify\_inside} acts as a wrapper for
763\textsf{simplify\_expr}. Whenever \textsf{simplify\_inside} encounters
764a \textsf{Ecast} expression, it tries to coerce the sub-expression
765to the desired type using \textsf{simplify\_expr}, which tries to
766perform the actual coercion. In return, \textsf{simplify\_expr} calls
767back \textsf{simplify\_inside} in some particular positions, where we
768decided to be conservative in order to simplify the proofs. However,
769the current design allows to incrementally revert to a more aggressive
770version, by replacing recursive calls to \textsf{simplify\_inside} by
771calls to \textsf{simplify\_expr} \emph{and} proving the corresponding
772invariants -- where possible.
774The \textsf{simplify\_inv} invariant encodes either the conservation
775of the semantics during the transformation corresponding to the failure
776of the transformation (\textsf{Inv\_eq} constructor), or the successful
777downcast of the considered expression to the target type
781inductive simplify_inv
782  (ge : genv) (en : env) (m : mem)
783  (e1 : expr) (e2 : expr) (target_sz : intsize) (target_sg : signedness) : bool → Prop ≝
784| Inv_eq : ∀result_flag. $\ldots$
785     simplify_inv ge en m e1 e2 target_sz target_sg result_flag
786| Inv_coerce_ok : ∀src_sz,src_sg.
787     (typeof e1) = (Tint src_sz src_sg) → (typeof e2) = (Tint target_sz target_sg) →     
788     (smaller_integer_val src_sz target_sz src_sg (exec_expr ge en m e1) (exec_expr ge en m e2)) →
789     simplify_inv ge en m e1 e2 target_sz target_sg true.
792The \textsf{conservation} invariant simply states the conservation
793of the semantics, as in the \textsf{Inv\_eq} constructor of the previous
797definition conservation ≝ λe,result. ∀ge,en,m.
798          res_sim ? (exec_expr ge en m e) (exec_expr ge en m result)
799         ∧ res_sim ? (exec_lvalue ge en m e) (exec_lvalue ge en m result)
800         ∧ typeof e = typeof result.
803This invariant is then easily lifted to statement evaluations.
804The main problem encountered with this particular pass was dealing with
805inconsistently typed programs, a canonical case being a particular
806integer constant of a certain size typed with another size. This
807prompted the need to introduce numerous type checks, complexifying
808both the implementation and the proof.
809\todo{Make this a particular case of the more general statement on baking more invariants in the Clight language}
811\paragraph{Clight to Cminor.}
813This pass is the last one operating on the Clight intermediate language.
814Its input is a full \textsf{Clight} program, and its output is a \textsf{Cminor} 
815program. Note that we do not use an equivalent of the C\#minor language: we
816translate directly to Cminor. This presents the advantage of not requiring the
817special loop constructs, nor the explicit block structure. Another salient
818point of our approach is that a significant part of the properties needed for
819the simulation proof were directly encoded in dependently typed translation
820functions. This concerns more precisely freshness conditions and well-typedness
821conditions. The main effects of the transformation from \textsf{Clight} to
822\textsf{Cminor} are listed below.
825\item Variables are classified as being either globals, stack-allocated
826  locals or potentially register-allocated locals. The value of register-allocated
827  local variables is moved out of the modelled memory and stored in a
828  dedicated environment.
829\item In Clight, each local variable has a dedicated memory block, whereas
830  stack-allocated locals are bundled together on a function-by-function basis.
831\item Loops are converted to jumps.
834The first two points require memory injections which are more flexible that those
835needed in the switch removal case. In the remainder of this section, we briefly
836discuss our implementation of memory injections, and then the simulation proof.
838\subparagraph{Memory injections.}
840Our memory injections are modelled after the work of Blazy \& Leroy.
841However, the corresponding paper is based on the first version of the
842Compcert memory model, whereas we use a much more concrete model, allowing byte-level
843manipulations (as in the later version of Compcert's memory model). We proved
844roughly 80 \% of the required lemmas. Some difficulties encountered were notably
845due to some too relaxed conditions on pointer validity (problem fixed during development).
846Some more conditions had to be added to take care of possible overflows when converting
847from \textbf{Z} block bounds to $16$ bit pointer offsets (in pratice, these overflows
848only occur in edge cases that are easily ruled out -- but this fact is not visible
849in memory injections). Concretely, some of the lemmas on the preservation of simulation of
850loads after writes were axiomatised, for lack of time.
852\subparagraph{Simulation proof.}
854The correction proof for this transformation was not terminated. We proved the
855simulation result for expressions and for some subset of the critical statement cases.
856Notably lacking are the function entry and exit, where the memory injection is
857properly set up. As can be guessed, a significant amount of work has to be performed
858to show the conservation of all invariants at each simulation step.
860\todo{list cases, explain while loop, explain labeling problem}
862\section{Checking cost labelling properties}
865Ideally, we would provide proofs that the cost labelling pass always
866produced programs that are soundly and precisely labelled and that
867each subsequent pass preserves these properties.  This would match our
868use of dependent types to eliminate impossible sources of errors
869during compilation, in particular retaining intermediate language type
872However, given the limited amount of time available we realised that
873implementing a compile-time check for a sound and precise labelling of
874the \textsf{RTLabs} intermediate code would reduce the proof burden
875considerably.  This is similar in spirit to the use of translation
876validation in certified compilation, which makes a similar trade-off
877between the potential for compile-time failure and the volume of proof
880The check cannot be pushed into a later stage of the compiler because
881much of the information is embedded into the structured traces.
882However, if an alternative method was used to show that function
883returns in the compiled code are sufficiently well-behaved, then we
884could consider pushing the cost property checks into the timing
885analysis itself.  We leave this as a possible area for future work.
887\subsection{Implementation and correctness}
889For a cost labelling to be sound and precise we need a cost label at
890the start of each function, after each branch and at least one in
891every loop.  The first two parts are trivial to check by examining the
892code.  In \textsf{RTLabs} the last part is specified by saying
893that there is a bound on the number of successive instruction nodes in
894the CFG that you can follow before you encounter a cost label, and
895checking this is more difficult.
897The implementation works through the set of nodes in the graph,
898following successors until a cost label is found or a label-free cycle
899is discovered (in which case the property does not hold and we stop).
900This is made easier by the prior knowledge that any branch is followed
901by cost labels, so we do not need to search each branch.  When a label
902is found, we remove the chain from the set and continue from another
903node in the set until it is empty, at which point we know that there
904is a bound for every node in the graph.
906Directly reasoning about the function that implements this would be
907rather awkward, so an inductive specification of a single step of its
908behaviour was written and proved to match the implementation.  This
909was then used to prove the implementation sound and complete.
911While we have not attempted to proof that the cost labelled properties
912are established and preserved earlier in the compiler, we expect that
913the effort for the \textsf{Cminor} to \textsf{RTLabs} would be similar
914to the work outlined above, because it involves the change from
915requiring a cost label at particular positions to requiring cost
916labels to break loops in the CFG.  As there are another three passes
917to consider (including the labelling itself), we believe that using
918the check above is much simpler overall.
920% TODO? Found some Clight to Cminor bugs quite quickly
922\section{Existence of a structured trace}
925\emph{Structured traces} enrich the execution trace of a program by
926nesting function calls in a mixed-step style and embedding the cost
927properties of the program.  It was originally designed to support the
928proof of correctness for the timing analysis of the object code in the
929back-end, then generalised to provide a common structure to use from
930the end of the front-end to the object code.  See
931Figure~\ref{fig:strtrace} on page~\pageref{fig:strtrace} for an
932illustration of a structured trace.
934To make the definition generic we abstract over the semantics of the
937record abstract_status : Type[1] :=
938  { as_status :> Type[0]
939  ; as_execute : relation as_status
940  ; as_pc : DeqSet
941  ; as_pc_of : as_status $\rightarrow$ as_pc
942  ; as_classify : as_status $\rightarrow$ status_class
943  ; as_label_of_pc : as_pc $\rightarrow$ option costlabel
944  ; as_after_return : ($\Sigma$s:as_status. as_classify s =  cl_call) $\rightarrow$ as_status $\rightarrow$ Prop
945  ; as_result: as_status $\rightarrow$ option int
946  ; as_call_ident : ($\Sigma$s:as_status.as_classify s = cl_call) $\rightarrow$ ident
947  ; as_tailcall_ident : ($\Sigma$s:as_status.as_classify s = cl_tailcall) $\rightarrow$ ident
948  }.
950which gives a type of states, an execution relation, some notion of
951program counters with decidable equality, the classification of states
952and functions to extract the observable intensional information (cost
953labels and the identity of functions that are called).  The
954\lstinline'as_after_return' property links the state before a function
955call with the state after return, providing the evidence that
956execution returns to the correct place.  The precise form varies
957between stages; in \textsf{RTLabs} it insists the CFG, the pointer to
958the CFG node to execute next, and some call stack information is
961The structured traces are defined using three mutually inductive
962types.  The core data structure is \lstinline'trace_any_label', which
963captures some straight-line execution until the next cost label or
964function return.  Each function call is embedded as a single step,
965with its own trace nested inside and the before and after states
966linked by \lstinline'as_after_return', and states classified as a
967`jump' (in particular branches) must be followed by a cost label.
969The second type, \lstinline'trace_label_label', is a
970\lstinline'trace_any_label' where the initial state is cost labelled.
971Thus a trace in this type identifies a series of steps whose cost is
972entirely accounted for by the label at the start.
974Finally, \lstinline'trace_label_return' is a sequence of
975\lstinline'trace_label_label' values which end in the return from the
976function.  These correspond to a measurable subtrace, and in
977particular include executions of an entire function call (and so are
978used for the nested calls in \lstinline'trace_any_label').
982The construction of the structured trace replaces syntactic cost
983labelling properties which place requirements on where labels appear
984in the program, with semantics properties that constrain the execution
985traces of the program.  The construction begins by defining versions
986of the sound and precise labelling properties on the program text that
987appears in states rather than programs, and showing that these are
988preserved by steps of the \textsf{RTLabs} semantics.
990Then we show that each cost labelling property the structured traces
991definition requires is satisfied.  These are broken up by the
992classification of states.  Similarly, we prove a step-by-step stack
993preservation result, which states that the \textsf{RTLabs} semantics
994never changes the lower parts of the stack.
996The core part of the construction of a structured trace is to use the
997proof of termination from the measurable trace (defined on
998page~\pageref{prog:terminationproof}) to `fold up' the execution into
999the nested form.  The results outlined above fill in the proof
1000obligations for the cost labelling properties and the stack
1001preservation result shows that calls return to the correct location.
1003The structured trace alone is not sufficient to capture the property
1004that the program is soundly labelled.  While the structured trace
1005guarantees termination, it still permits a loop to be executed a
1006finite number of times without encountering a cost label.  We
1007eliminate this by proving that no `program counter' repeats within any
1008\lstinline'trace_any_label' section by showing that it is incompatible
1009with the property that there is a bound on the number of successor
1010instructions you can follow in the CFG before you encounter a cost
1013\subsubsection{Whole program structured traces}
1015The development of the construction above started relatively early,
1016before the measurable subtraces preservation proofs.  To be confident
1017that the traces were well-formed, we also developed a whole program
1018form that embeds the traces above.  This includes non-terminating
1019programs, where an infinite number of the terminating structured
1020traces are embedded.  This construction confirmed that our definition
1021of structured traces was consistent, although we later found that we
1022did not require them for the compiler correctness results.
1024To construct these we need to know whether each function call will
1025eventually terminate, requiring the use of the excluded middle.  This
1026classical reasoning is local to the construction of whole program
1027traces and is not necessary for our main results.
1031In combination with the work on the CerCo back-end and by
1032concentrating on the novel intensional parts of the proof, we have
1033shown that it is possible to construct certifying compilers that
1034correctly report execution time and stack space costs.
1036TODO: appendix on code layout?
Note: See TracBrowser for help on using the repository browser.