source: Deliverables/D3.4/Report/report.tex @ 3191

Last change on this file since 3191 was 3191, checked in by garnier, 8 years ago

Some more info on cast removal

File size: 43.6 KB
14% LaTeX Companion, p 74
15\newcommand{\todo}[1]{\marginpar{\raggedright - #1}}
18  {keywords={Definition,Lemma,Theorem,Remark,Qed,Save,Inductive,Record},
19   morekeywords={[2]if,then,else},
20  }
23  {keywords={definition,lemma,theorem,remark,inductive,record,qed,let,rec,match,with,Type,and,on},
24   morekeywords={[2]whd,normalize,elim,cases,destruct},
25   mathescape=true,
26   morecomment=[n]{(*}{*)},
27  }
30        keywordstyle=\color{red}\bfseries,
31        keywordstyle=[2]\color{blue},
32        commentstyle=\color{green},
33        stringstyle=\color{blue},
34        showspaces=false,showstringspaces=false}
48\vspace*{1cm}Project FP7-ICT-2009-C-243881 \cerco{}}
50\date{ }
70Report n. D3.4\\
71Front-end Correctness Proofs\\
78Version 1.0
86Brian~Campbell, Ilias~Garnier, James~McKinna, Ian~Stark
92Project Acronym: \cerco{}\\
93Project full title: Certified Complexity\\
94Proposal/Contract no.: FP7-ICT-2009-C-243881 \cerco{}\\
96\clearpage \pagestyle{myheadings} \markright{\cerco{}, FP7-ICT-2009-C-243881}
102We report on the correctness proofs for the front-end of the \cerco{}
103cost lifting compiler.  First, we identify the core result we wish to
104prove, which says that the we correctly predict the precise execution
105time for particular parts of the execution called \emph{measurable}
106subtraces.  Then we consider the three distinct parts of the task:
107showing that the \emph{annotated source code} output by the compiler
108has equivalent behaviour to the original input (up to the
109annotations); showing that a measurable subtrace of the
110annotated source code corresponds to an equivalent measurable subtrace
111in the code produced by the front-end, including costs; and finally
112showing that the enriched \emph{structured} execution traces required
113for cost correctness in the back-end can be constructed from the
114properties of the code produced by the front-end.
116A key part of our work is that the intensional correctness results that show
117that we get consistent cost measurements throughout the intermediate languages
118of the compiler can be layered on top of normal forward simulation results,
119if we split those results into local call-structure preserving simulations.
121This deliverable shows correctness results about the formalised
122compiler described in D3.2, using the source language semantics from
123D3.1 and intermediate language semantics from D3.3.  It builds on
124earlier work on a toy compiler built to test the labelling approach in
125D2.1. Together with the companion deliverable about the correctness of
126the back-end, D4.4, we obtain results about the whole formalised
129% TODO: mention the deliverable about the extracted compiler et al?
135% CHECK: clear up any -ize vs -ise
136% CHECK: clear up any "front end" vs "front-end"
137% CHECK: clear up any mentions of languages that aren't textsf'd.
138% CHECK: fix unicode in listings
142\todo{add stack space for more precise statement.  Also do some
143translation validation on sound, precise labelling properties.}
145The \cerco{} compiler compiles C code, targeting microcontrollers
146implementing the Intel 8051 architecture.  It produces both the object
147code and source code containing annotations describing the timing
148behaviour of the object code.  There are two versions: first, an
149initial prototype implemented in \ocaml{}~\cite{d2.2}, and a version
150formalised in the \matita{} proof assistant~\cite{d3.2,d4.2} and then
151extracted to \ocaml{} code to produce an executable compiler.  In this
152document we present results formalised in \matita{} about the
153front-end of the latter version of the compiler, and how that fits
154into the verification of the whole compiler.
156\todo{maybe mention stack space here?  other additions?  refer to
157  "layering"?}  A key part of this work was to separate the proofs
158about the compiled code's extensional behaviour (that is, the
159functional correctness of the compiler) from the intensional
160correctness that the costs given are correct.  Unfortunately, the
161ambitious goal of completely verifying the entire compiler was not
162feasible within the time available, but thanks to this separation of
163extensional and intensional proofs we are able to axiomatize
164simulation results similar to those in other compiler verification
165projects and concentrate on the novel intensional proofs.  The proofs
166were also made more tractable by introducing compile-time checks for
167the `sound and precise' cost labelling properties rather than proving
168that they are preserved throughout.
170The overall statement of correctness says that the annotated program has the
171same behaviour as the input, and that for any suitably well-structured part of
172the execution (which we call \emph{measurable}), the object code will execute
173the same behaviour taking precisely the time given by the cost annotations in
174the annotated source program.
176In the next section we recall the structure of the compiler and make the overall
177statement more precise.  Following that, in Section~\ref{sec:fegoals} we
178describe the statements we need to prove about the intermediate \textsf{RTLabs}
179programs sufficient for the back-end proofs.  \todo{rest of document structure}
181\section{The compiler and main goals}
183The unformalised \ocaml{} \cerco{} compiler was originally described
184in Deliverables 2.1 and 2.2.  Its design was replicated in the formal
185\matita{} code, which was presented in Deliverables 3.2 and 4.2, for
186the front-end and back-end respectively.
192\caption{Languages in the \cerco{} compiler}
196The compiler uses a number of intermediate languages, as outlined the
197middle two lines of Figure~\ref{fig:compilerlangs}.  The upper line
198represents the front-end of the compiler, and the lower the back-end,
199finishing with 8051 binary code.  Not all of the front-end passes
200introduces a new language, and Figure~\ref{fig:summary} presents a
201list of every pass involved.
207\quad \= $\downarrow$ \quad \= \kill
208\textsf{C} (unformalized)\\
209\> $\downarrow$ \> CIL parser (unformalized \ocaml)\\
211%\> $\downarrow$ \> add runtime functions\\
212\> $\downarrow$ \> \lstinline[language=C]'switch' removal\\
213\> $\downarrow$ \> labelling\\
214\> $\downarrow$ \> cast removal\\
215\> $\downarrow$ \> stack variable allocation and control structure
216 simplification\\
218%\> $\downarrow$ \> generate global variable initialization code\\
219\> $\downarrow$ \> transform to RTL graph\\
221\> $\downarrow$ \> check cost labelled properties of RTL graph\\
222\> $\downarrow$ \> start of target specific back-end\\
223\>\quad \vdots
227\caption{Front-end languages and compiler passes}
232The annotated source code is taken after the cost labelling phase.
233Note that there is a pass to replace C \lstinline[language=C]'switch'
234statements before labelling --- we need to remove them because the
235simple form of labelling used in the formalised compiler is not quite
236capable of capturing their execution time costs, largely due to the
237fall-through behaviour.
239The cast removal phase which follows cost labelling simplifies
240expressions to prevent unnecessary arithmetic promotion which is
241specified by the C standard but costly for an 8-bit target.  The
242transformation to \textsf{Cminor} and subsequently \textsf{RTLabs}
243bear considerable resemblance to some passes of the CompCert
244compiler\todo{cite}, although we use a simpler \textsf{Cminor} where
245all loops use \lstinline[language=C]'goto' statements, and the
246\textsf{RTLabs} language retains a target-independent flavour.  The
247back-end takes \textsf{RTLabs} code as input.
249The whole compilation function returns the following information on success:
251record compiler_output : Type[0] :=
252 { c_labelled_object_code: labelled_object_code
253 ; c_stack_cost: stack_cost_model
254 ; c_max_stack: nat
255 ; c_init_costlabel: costlabel
256 ; c_labelled_clight: clight_program
257 ; c_clight_cost_map: clight_cost_map
258 }.
260It consists of annotated 8051 object code, a mapping from function
261identifiers to the function's stack space usage\footnote{The compiled
262  code's only stack usage is to allocate a fixed-size frame on each
263  function entry and discard it on exit.  No other dynamic allocation
264  is provided by the compiler.}, the space available for the
265stack after global variable allocation, a cost label covering the
266execution time for the initialisation of global variables and the call
267to the \lstinline[language=C]'main' function, the annotated source
268code, and finally a mapping from cost labels to actual execution time
271An \ocaml{} pretty printer is used to provide a concrete version of the output
272code.  In the case of the annotated source code, it also inserts the actual
273costs alongside the cost labels, and optionally adds a global cost variable
274and instrumentation to support further reasoning.  \todo{Cross-ref case study
277\subsection{Revisions to the prototype compiler}
279Our focus on intensional properties prompted us to consider whether we
280could incorporate stack space into the costs presented to the user.
281We only allocate one fixed-size frame per function, so modelling this
282was relatively simple.  It is the only form of dynamic memory
283allocation provided by the compiler, so we were able to strengthen the
284statement of the goal to guarantee successful execution whenever the
285stack space obeys a bound calculated by subtracting the global
286variable requirements from the total memory available.
288The cost checks at the end of Figure~\ref{fig:summary} have been
289introduced to reduce the proof burden, and are described in
292The use of dependent types to capture simple intermediate language
293invariants makes every front-end pass except \textsf{Clight} to
294\textsf{Cminor} and the cost checks total functions.  Hence various
295well-formedness and type checks are dealt with once in that phase.
296With the benefit of hindsight we would have included an initial
297checking phase to produce a `well-formed' variant of \textsf{Clight},
298conjecturing that this would simplify various parts of the proofs for
299the \textsf{Clight} stages which deal with potentially ill-formed
302\todo{move of initialisation code?}
304\subsection{Main goals}
306TODO: need an example for this
308Informally, our main intensional result links the time difference in a source
309code execution to the time difference in the object code, expressing the time
310for the source by summing the values for the cost labels in the trace, and the
311time for the target by a clock built in to the 8051 executable semantics.
313The availability of precise timing information for 8501
314implementations and the design of the compiler allow it to give exact
315time costs in terms of processor cycles, not just upper bounds.
316However, these exact results are only available if the subtrace we
317measure starts and ends at suitable points.  In particular, pure
318computation with no observable effects may be reordered and moved past
319cost labels, so we cannot measure time between arbitrary statements in
320the program.
322There is also a constraint on the subtraces that we
323measure due to the requirements of the correctness proof for the
324object code timing analysis.  To be sure that the timings are assigned
325to the correct cost label, we need to know that each return from a
326function call must go to the correct return address.  It is difficult
327to observe this property locally in the object code because it relies
328on much earlier stages in the compiler.  To convey this information to
329the timing analysis extra structure is imposed on the subtraces, which
330is described in Section~\ref{sec:fegoals}.
332% Regarding the footnote, would there even be much point?
333% TODO: this might be quite easy to add ('just' subtract the
334% measurable subtrace from the second label to the end).  Could also
335% measure other traces in this manner.
336These restrictions are reflected in the subtraces that we give timing
337guarantees on; they must start at a cost label and end at the return
338of the enclosing function of the cost label\footnote{We expect that
339  this would generalise to subtraces between cost labels in the same
340  function, but could not justify the extra complexity that would be
341  required to show this.}.  A typical example of such a subtrace is
342the execution of an entire function from the cost label at the start
343of the function until it returns.  We call such any such subtrace
344\emph{measurable} if it (and the prefix of the trace from the start to
345the subtrace) can also be executed within the available stack space.
347Now we can give the main intensional statement for the compiler.
348Given a \emph{measurable} subtrace for a labelled \textsf{Clight}
349program, there is a subtrace of the 8051 object code program where the
350time differences match.  Moreover, \emph{observable} parts of the
351trace also match --- these are the appearance of cost labels and
352function calls and returns.
354More formally, the definition of this statement in \matita{} is
356definition simulates :=
357  $\lambda$p: compiler_output.
358  let initial_status := initialise_status $\dots$ (cm (c_labelled_object_code $\dots$ p)) in
359  $\forall$m1,m2.
360   measurable Clight_pcs (c_labelled_clight $\dots$ p) m1 m2
361    (lookup_stack_cost (c_stack_cost $\dots$ p)) (c_max_stack $\dots$ p) $\rightarrow$
362  $\forall$c1,c2.
363   clock_after Clight_pcs (c_labelled_clight $\dots$ p) m1 (c_clight_cost_map $\dots$ p) = OK $\dots$ c1 $\rightarrow$
364   clock_after Clight_pcs (c_labelled_clight $\dots$ p) (m1+m2) (c_clight_cost_map $\dots$ p) = OK $\dots$ c2 $\rightarrow$
365  $\exists$n1,n2.
366   observables Clight_pcs (c_labelled_clight $\dots$ p) m1 m2 =
367   observables (OC_preclassified_system (c_labelled_object_code $\dots$ p))
368    (c_labelled_object_code $\dots$ p) n1 n2
369  $\wedge$
370   c2 - c1 = clock $\dots$ (execute n2 ? initial_status) - clock $\dots$ (execute n1 ? initial_status).
372where the \lstinline'measurable', \lstinline'clock_after' and
373\lstinline'observables' definitions can be applied to multiple
374languages; in this case the \lstinline'Clight_pcs' record applies them
375to \textsf{Clight} programs.
377There is a second part to the statement, which says that the initial
378processing of the input program to produce the cost labelled version
379does not affect the semantics of the program:
380% Yes, I'm paraphrasing the result a tiny bit to remove the observe non-function
382  $\forall$input_program,output.
383  compile input_program = return output $\rightarrow$
384  not_wrong … (exec_inf … clight_fullexec input_program) $\rightarrow$
385  sim_with_labels
386   (exec_inf … clight_fullexec input_program)
387   (exec_inf … clight_fullexec (c_labelled_clight … output))
389That is, any successful compilation produces a labelled program that
390has identical behaviour to the original, so long as there is no
391`undefined behaviour'.
393Note that this provides full functional correctness, including
394preservation of (non-)termination.  The intensional result above does
395not do this directly --- it does not guarantee the same result or same
396termination.  There are two mitigating factors, however: first, to
397prove the intensional property you need local simulation results that
398can be pieced together to form full behavioural equivalence, only time
399constraints have prevented us from doing so.  Second, if we wish to
400confirm a result, termination, or non-termination we could add an
401observable witness, such as a function that is only called if the
402correct result is given.  The intensional result guarantees that the
403observable witness is preserved, so the program must behave correctly.
405\section{Intermediate goals for the front-end}
408The essential parts of the intensional proof were outlined during work
409on a toy
410compiler~\cite{d2.1,springerlink:10.1007/978-3-642-32469-7_3}.  These
413\item functional correctness, in particular preserving the trace of
414  cost labels,
415\item the \emph{soundness} and \emph{precision} of the cost labelling
416  on the object code, and
417\item the timing analysis on the object code produces a correct
418  mapping from cost labels to time.
421However, that toy development did not include function calls.  For the
422full \cerco{} compiler we also need to maintain the invariant that
423functions return to the correct program location in the caller, as we
424mentioned in the previous section.  During work on the back-end timing
425analysis (describe in more detail in the companion deliverable, D4.4)
426the notion of a \emph{structured trace} was developed to enforce this
427return property, and also most of the cost labelling properties too.
433\caption{The compiler and proof outline}
437Jointly, we generalised the structured traces to apply to any of the
438intermediate languages with some idea of program counter.  This means
439that they are introduced part way through the compiler, see
440Figure~\ref{fig:compiler}.  Proving that a structured trace can be
441constructed at \textsf{RTLabs} has several virtues:
443\item This is the first language where every operation has its own
444  unique, easily addressable, statement.
445\item Function calls and returns are still handled implicitly in the
446  language and so the structural properties are ensured by the
447  semantics.
448\item Many of the back-end languages from \textsf{RTL} share a common
449  core set of definitions, and using structured traces throughout
450  increases this uniformity.
457\caption{Nesting of functions in structured traces}
460A structured trace is a mutually inductive data type which
461contains the steps from a normal program trace, but arranged into a
462nested structure which groups entire function calls together and
463aggregates individual steps between cost labels (or between the final
464cost label and the return from the function), see
465Figure~\ref{fig:strtrace}.  This captures the idea that the cost labels
466only represent costs \emph{within} a function --- calls to other
467functions are accounted for in the nested trace for their execution, and we
468can locally regard function calls as a single step.
470These structured traces form the core part of the intermediate results
471that we must prove so that the back-end can complete the main
472intensional result stated above.  In full, we provide the back-end
475\item A normal trace of the \textbf{prefix} of the program's execution
476  before reaching the measurable subtrace.  (This needs to be
477  preserved so that we know that the stack space consumed is correct,
478  and to set up the simulation results.)
479\item The \textbf{structured trace} corresponding to the measurable
480  subtrace.
481\item An additional property about the structured trace that no
482  `program counter' is \textbf{repeated} between cost labels.  Together with
483  the structure in the trace, this takes over from showing that
484  cost labelling is sound and precise.
485\item A proof that the \textbf{observables} have been preserved.
486\item A proof that the \textbf{stack limit} is still observed by the prefix and
487  the structure trace.  (This is largely a consequence of the
488  preservation of observables.)
491Following the outline in Figure~\ref{fig:compiler}, we will first deal
492with the transformations in \textsf{Clight} that produce the source
493program with cost labels, then show that measurable traces can be
494lifted to \textsf{RTLabs}, and finally that we can construct the
495properties listed above ready for the back-end proofs.
497\section{Input code to cost labelled program}
499As explained on page~\pageref{page:switchintro}, the costs of complex
500C \lstinline[language=C]'switch' statements cannot be represented with
501the simple labelling.  Our first pass replaces these statements with
502simpler C code, allowing our second pass to perform the cost
503labelling.  We show that the behaviour of programs is unchanged by
504these passes.
506TODO: both give one-step-sim-by-many forward sim results; switch
507removal tricky, uses aux var to keep result of expr, not central to
508intensional correctness so curtailed proof effort once reasonable
509level of confidence in code gained; labelling much simpler; don't care
510what the labels are at this stage, just need to know when to go
511through extra steps.  Rolled up into a single result with a cofixpoint
512to obtain coinductive statement of equivalence (show).
514\section{Finding corresponding measurable subtraces}
516There follow the three main passes of the front-end:
518\item simplification of casts in \textsf{Clight} code
519\item \textsf{Clight} to \textsf{Cminor} translation, performing stack
520  variable allocation and simplifying control structures
521\item transformation to \textsf{RTLabs} control flow graph
523\todo{I keep mentioning forward sim results - I probably ought to say
524  something about determinancy} We have taken a common approach to
525each pass: first we build (or axiomatise) forward simulation results
526that are similar to normal compiler proofs, but slightly more
527fine-grained so that we can see that the call structure and relative
528placement of cost labels is preserved.
530Then we instantiate a general result which shows that we can find a
531\emph{measurable} subtrace in the target of the pass that corresponds
532to the measurable subtrace in the source.  By repeated application of
533this result we can find a measurable subtrace of the execution of the
534\textsf{RTLabs} code, suitable for the construction of a structured
535trace (see Section~\ref{sec:structuredtrace}.  This is essentially an
536extra layer on top of the simulation proofs that provides us with the
537extra information required for our intensional correctness proof.
539\subsection{Generic measurable subtrace lifting proof}
541Our generic proof is parametrised on a record containing small-step
542semantics for the source and target language, a classification of
543states (the same form of classification is used when defining
544structured traces), a simulation relation which loosely respects the
545classification and cost labelling \todo{this isn't very helpful} and
546four simulation results:
548\item a step from a `normal' state (which is not classified as a call
549  or return) which is not a cost label is simulated by zero or more
550  `normal' steps;
551\item a step from a `call' state followed by a cost label step is
552  simulated by a step from a `call' state, a corresponding label step,
553  then zero or more `normal' steps;
554\item a step from a `call' state not followed by a cost label
555  similarly (note that this case cannot occur in a well-labelled
556  program, but we do not have enough information locally to exploit
557  this); and
558\item a cost label step is simulated by a cost label step.
560Finally, we need to know that a successfully translated program will
561have an initial state in the simulation relation with the original
562program's initial state.
568\caption{Tiling of simulation for a measurable subtrace}
572To find the measurable subtrace in the target program's execution we
573walk along the original program's execution trace applying the
574appropriate simulation result by induction on the number of steps.
575While the number of steps taken varies, the overall structure is
576preserved, as illustrated in Figure~\ref{fig:tiling}.  By preserving
577the structure we also maintain the same intensional observables.  One
578delicate point is that the cost label following a call must remain
579directly afterwards\footnote{The prototype compiler allowed some
580  straight-line code to appear before the cost label until a later
581  stage of the compiler, but we must move the requirement forward to
582  fit with the structured traces.}
583% Damn it, I should have just moved the cost label forwards in RTLabs,
584% like the prototype does in RTL to ERTL; the result would have been
585% simpler.  Or was there some reason not to do that?
586(both in the program code and in the execution trace), even if we
587introduce extra steps, for example to store parameters in memory in
588\textsf{Cminor}.  Thus we have a version of the call simulation
589that deals with both in one result.
591In addition to the subtrace we are interested in measuring we must
592also prove that the earlier part of the trace is also preserved in
593order to use the simulation from the initial state.  It also
594guarantees that we do not run out of stack space before the subtrace
595we are interested in.  The lemmas for this prefix and the measurable
596subtrace are similar, following the pattern above.  However, the
597measurable subtrace also requires us to rebuild the termination
598proof.  This is defined recursively:
601let rec will_return_aux C (depth:nat)
602                             (trace:list (cs_state … C × trace)) on trace : bool :=
603match trace with
604[ nil $\Rightarrow$ false
605| cons h tl $\Rightarrow$
606  let $\langle$s,tr$\rangle$ := h in
607  match cs_classify C s with
608  [ cl_call $\Rightarrow$ will_return_aux C (S depth) tl
609  | cl_return $\Rightarrow$
610      match depth with
611      [ O $\Rightarrow$ match tl with [ nil $\Rightarrow$ true | _ $\Rightarrow$ false ]
612      | S d $\Rightarrow$ will_return_aux C d tl
613      ]
614  | _ $\Rightarrow$ will_return_aux C depth tl
615  ]
618The \lstinline'depth' is the number of return states we need to see
619before we have returned to the original function (initially zero) and
620\lstinline'trace' the measurable subtrace obtained from the running
621the semantics for the correct number of steps.  This definition
622unfolds tail recursively for each step, and once the corresponding
623simulation result has been applied a new one for the target can be
624asserted by unfolding and applying the induction hypothesis on the
625shorter trace.
627This then gives us an overall result for any simulation fitting the
628requirements above (contained in the \lstinline'meas_sim' record):
630theorem measured_subtrace_preserved :
631  $\forall$MS:meas_sim.
632  $\forall$p1,p2,m,n,stack_cost,max.
633  ms_compiled MS p1 p2 $\rightarrow$
634  measurable (ms_C1 MS) p1 m n stack_cost max $\rightarrow$
635  $\exists$m',n'.
636    measurable (ms_C2 MS) p2 m' n' stack_cost max $\wedge$
637    observables (ms_C1 MS) p1 m n = observables (ms_C2 MS) p2 m' n'.
639The stack space requirement that is embedded in \lstinline'measurable'
640is a consequence of the preservation of observables.
642TODO: how to deal with untidy edges wrt to sim rel; anything to
643say about obs?
645TODO: say something about termination measures; cost labels are
646statements/exprs in these languages; split call/return gives simpler
649\subsection{Simulation results for each pass}
651\todo{don't use loop structures from CompCert, go straight to jumps}
653\paragraph{Cast simplification.}
655The parser used in Cerco introduces a lot of explicit type casts.
656If left as they are, these constructs can greatly hamper the
657quality of the generated code -- especially as the architecture
658we consider is an $8$ bit one. In \textsf{Clight}, casts are
659expressions. Hence, most of the work of this transformation
660proceeds on expressions. The tranformation proceeds by recursively
661trying to coerce an expression to a particular integer type, which
662is in practice smaller than the original one. This functionality
663is implemented by two mutually recursive functions whose signature
664is the following.
667let rec simplify_expr (e:expr) (target_sz:intsize) (target_sg:signedness)
668  : Σresult:bool×expr.
669    ∀ge,en,m. simplify_inv ge en m e (\snd result) target_sz target_sg (\fst result) ≝ $\ldots$
671and simplify_inside (e:expr) : Σresult:expr. conservation e result ≝ $\ldots$
674The \textsf{simplify\_inside} acts as a wrapper for
675\textsf{simplify\_expr}. Whenever \textsf{simplify\_inside} encounters
676a \textsf{Ecast} expression, it tries to coerce the sub-expression
677to the desired type using \textsf{simplify\_expr}, which tries to
678perform the actual coercion. In return, \textsf{simplify\_expr} calls
679back \textsf{simplify\_inside} in some particular positions, where we
680decided to be conservative in order to simplify the proofs. However,
681the current design allows to incrementally revert to a more aggressive
682version, by replacing recursive calls to \textsf{simplify\_inside} by
683calls to \textsf{simplify\_expr} \emph{and} proving the corresponding
684invariants -- where possible.
686The \textsf{simplify\_inv} invariant encodes either the conservation
687of the semantics during the transformation corresponding to the failure
688of the transformation (\textsf{Inv\_eq} constructor), or the successful
689downcast of the considered expression to the target type
693inductive simplify_inv
694  (ge : genv) (en : env) (m : mem)
695  (e1 : expr) (e2 : expr) (target_sz : intsize) (target_sg : signedness) : bool → Prop ≝
696| Inv_eq : ∀result_flag. $\ldots$
697     simplify_inv ge en m e1 e2 target_sz target_sg result_flag
698| Inv_coerce_ok : ∀src_sz,src_sg.
699     (typeof e1) = (Tint src_sz src_sg) → (typeof e2) = (Tint target_sz target_sg) →     
700     (smaller_integer_val src_sz target_sz src_sg (exec_expr ge en m e1) (exec_expr ge en m e2)) →
701     simplify_inv ge en m e1 e2 target_sz target_sg true.
704The \textsf{conservation} invariant simply states the conservation
705of the semantics, as in the \textsf{Inv\_eq} constructor of the previous
709definition conservation ≝ λe,result. ∀ge,en,m.
710          res_sim ? (exec_expr ge en m e) (exec_expr ge en m result)
711         ∧ res_sim ? (exec_lvalue ge en m e) (exec_lvalue ge en m result)
712         ∧ typeof e = typeof result.
715This invariant is then easily lifted to statement evaluations.
716The main problem encountered with this particular pass was dealing with
717inconsistently typed programs, a canonical case being a particular
718integer constant of a certain size typed with another size. This
719prompted the need to introduce numerous type checks, complexifying
720both the implementation and the proof.
721\todo{Make this a particular case of the more general statement on baking more invariants in the Clight language}
723\paragraph{Switch removal.}
725The intermediate languages of the front-end have no jump tables.
726Hence, we have to compile the \lstinline[language=C]'switch'
727constructs away before going from \textsf{Clight} to \textsf{Cminor}.
728Note that this transformation does not necessarily deteriorate the
729efficiency of the generated code. For instance, compilers such as GCC
730introduce balanced trees of ``if-then-else'' constructs for small
731switches. However, our implementation strategy is much simpler. Let
732us consider the following input statement.
735   switch(e) {
736   case v1:
737     stmt1;
738   case v2:
739     stmt2;
740   default:
741     stmt_default;
742   }
745Note that \textsf{stmt1}, \textsf{stmt2}, \ldots \textsf{stmt\_default}
746may contain \lstinline[language=C]'break' statements, which have the
747effect of exiting the switch statement. In the absence of break, the
748execution falls through each case sequentially. In our current implementation,
749we produce an equivalent sequence of ``if-then'' chained by gotos:
751   fresh = e;
752   if(fresh == v1) {
753     $\llbracket$stmt1$\rrbracket$;
754     goto lbl_case2;
755   };
756   if(fresh == v2) {
757     lbl_case2:
758     $\llbracket$stmt2;$\rrbracket$
759     goto lbl_case2;
760   };
761   $\llbracket$stmt_default$\rrbracket$;
762   exit_label:
765The proof had to tackle the following points:
767\item the source and target memories are not the same (cf. fresh variable),
768\item the flow of control is changed in a non-local way (e.g. \textbf{goto} 
769  instead of \textbf{break}).
772In order to tackle the first point, we implemented a version of memory
773extensions similar to Compcert's. What has been done is the simulation proof
774for all ``easy'' cases, that do not interact with the switch removal per se,
775plus a bit of the switch case. This comprises propagating the memory extension
776through  each statement (except switch), as well as various invariants that
777are needed for the switch case (in particular, freshness hypotheses). The
778details of the evaluation process for the source switch statement and its
779target counterpart can be found in the file \textbf{},
780along more details on the transformation itself.
782Proving the correctness of the second point would require reasoning on the
783semantics of \lstinline[language=C]'goto' statements. In the \textsf{Clight} 
784semantics, this is implemented as a function-wide lookup of the target label.
785The invariant we would need is the fact that a global label lookup on a freshly
786created goto is equivalent to a local lookup. This would in turn require the
787propagation of some freshness hypotheses on labels. For reasons expressed above,
788we decided to omit this part of the correctness proof.
790\paragraph{Clight to Cminor.}
792This pass is the last one operating on the Clight intermediate language.
793Its input is a full \textsf{Clight} program, and its output is a \textsf{Cminor} 
794program. Note that we do not use an equivalent of the C\#minor language: we
795translate directly to Cminor. This presents the advantage of not requiring the
796special loop constructs, nor the explicit block structure. Another salient
797point of our approach is that a significant part of the properties needed for
798the simulation proof were directly encoded in dependently typed translation
799functions. This concerns more precisely freshness conditions and well-typedness
800conditions. The main effects of the transformation from \textsf{Clight} to
801\textsf{Cminor} are listed below.
804\item Variables are classified as being either globals, stack-allocated
805  locals or potentially register-allocated locals. The value of register-allocated
806  local variables is moved out of the modelled memory and stored in a
807  dedicated environment.
808\item In Clight, each local variable has a dedicated memory block, whereas
809  stack-allocated locals are bundled together on a function-by-function basis.
810\item Loops are converted to jumps.
813The first two points require memory injections which are more flexible that those
814needed in the switch removal case. In the remainder of this section, we briefly
815discuss our implementation of memory injections, and then the simulation proof.
817\subparagraph{Memory injections.}
819Our memory injections are modelled after the work of Blazy \& Leroy.
820However, the corresponding paper is based on the first version of the
821Compcert memory model, whereas we use a much more concrete model, allowing byte-level
822manipulations (as in the later version of Compcert's memory model). We proved
823roughly 80 \% of the required lemmas. Some difficulties encountered were notably
824due to some too relaxed conditions on pointer validity (problem fixed during development).
825Some more conditions had to be added to take care of possible overflows when converting
826from \textbf{Z} block bounds to $16$ bit pointer offsets (in pratice, these overflows
827only occur in edge cases that are easily ruled out -- but this fact is not visible
828in memory injections). Concretely, some of the lemmas on the preservation of simulation of
829loads after writes were axiomatised, for lack of time.
831\subparagraph{Simulation proof.}
833The correction proof for this transformation was not terminated. We proved the
834simulation result for expressions and for some subset of the critical statement cases.
835Notably lacking are the function entry and exit, where the memory injection is
836properly set up. As can be guessed, a significant amount of work has to be performed
837to show the conservation of all invariants at each simulation step.
839\todo{list cases, explain while loop, explain labeling problem}
841\section{Checking cost labelling properties}
844Ideally, we would provide proofs that the cost labelling pass always
845produced programs that are soundly and precisely labelled and that
846each subsequent pass preserves these properties.  This would match our
847use of dependent types to eliminate impossible sources of errors
848during compilation, in particular retaining intermediate language type
851However, given the limited amount of time available we realised that
852implementing a compile-time check for a sound and precise labelling of
853the \textsf{RTLabs} intermediate code would reduce the proof burden
854considerably.  This is similar in spirit to the use of translation
855validation in certified compilation\todo{Cite some suitable work
856  here}, which makes a similar trade-off between the potential for
857compile-time failure and the volume of proof required.
859The check cannot be pushed into a later stage of the compiler because
860much of the information is embedded into the structured traces.
861However, if an alternative method was used to show that function
862returns in the compiled code are sufficiently well-behaved, then we
863could consider pushing the cost property checks into the timing
864analysis itself.  We leave this as a possible area for future work.
866\subsection{Implementation and correctness}
868For a cost labelling to be sound and precise we need a cost label at
869the start of each function, after each branch and at least one in
870every loop.  The first two parts are trivial to check by examining the
871code.  In \textsf{RTLabs} the last part is specified by saying
872that there is a bound on the number of successive instruction nodes in
873the CFG that you can follow before you encounter a cost label, and
874checking this is more difficult.
876The implementation works through the set of nodes in the graph,
877following successors until a cost label is found or a label-free cycle
878is discovered (in which case the property does not hold and we stop).
879This is made easier by the prior knowledge that any branch is followed
880by cost labels, so we do not need to search each branch.  When a label
881is found, we remove the chain from the set and continue from another
882node in the set until it is empty, at which point we know that there
883is a bound for every node in the graph.
885Directly reasoning about the function that implements this would be
886rather awkward, so an inductive specification of a single step of its
887behaviour was written and proved to match the implementation.  This
888was then used to prove the implementation sound and complete.
890While we have not attempted to proof that the cost labelled properties
891are established and preserved earlier in the compiler, we expect that
892the effort for the \textsf{Cminor} to \textsf{RTLabs} would be similar
893to the work outlined above, because it involves the change from
894requiring a cost label at particular positions to requiring cost
895labels to break loops in the CFG.  As there are another three passes
896to consider (including the labelling itself), we believe that using
897the check above is much simpler overall.
899% TODO? Found some Clight to Cminor bugs quite quickly
901\section{Existence of a structured trace}
904\emph{Structured traces} enrich the execution trace of a program by
905nesting function calls in a mixed-step style\todo{Can I find a
906  justification for mixed-step} and embedding the cost properties of
907the program.  It was originally designed to support the proof of
908correctness for the timing analysis of the object code in the
909back-end, then generalised to provide a common structure to use from
910the end of the front-end to the object code.  See
911Figure~\ref{fig:strtrace} on page~\pageref{fig:strtrace}
912for an illustration of a structured trace.
914To make the definition generic we abstract over the semantics of the
917record abstract_status : Type[1] :=
918  { as_status :> Type[0]
919  ; as_execute : relation as_status
920  ; as_pc : DeqSet
921  ; as_pc_of : as_status $\rightarrow$ as_pc
922  ; as_classify : as_status $\rightarrow$ status_class
923  ; as_label_of_pc : as_pc $\rightarrow$ option costlabel
924  ; as_after_return : ($\Sigma$s:as_status. as_classify s =  cl_call) $\rightarrow$ as_status $\rightarrow$ Prop
925  ; as_result: as_status $\rightarrow$ option int
926  ; as_call_ident : ($\Sigma$s:as_status.as_classify s = cl_call) $\rightarrow$ ident
927  ; as_tailcall_ident : ($\Sigma$s:as_status.as_classify s = cl_tailcall) $\rightarrow$ ident
928  }.
930which gives a type of states, an execution relation, some notion of
931program counters with decidable equality, the classification of states
932and functions to extract the observable intensional information (cost
933labels and the identity of functions that are called).  The
934\lstinline'as_after_return' property links the state before a function
935call with the state after return, providing the evidence that
936execution returns to the correct place.  The precise form varies
937between stages; in \textsf{RTLabs} it insists the CFG, the pointer to
938the CFG node to execute next, and some call stack information is
941The structured traces are defined using three mutually inductive
942types.  The core data structure is \lstinline'trace_any_label', which
943captures some straight-line execution until the next cost label or
944function return.  Each function call is embedded as a single step,
945with its own trace nested inside and the before and after states
946linked by \lstinline'as_after_return', and states classified as a
947`jump' (in particular branches) must be followed by a cost label.
949The second type, \lstinline'trace_label_label', is a
950\lstinline'trace_any_label' where the initial state is cost labelled.
951Thus a trace in this type identifies a series of steps whose cost is
952entirely accounted for by the label at the start.
954Finally, \lstinline'trace_label_return' is a sequence of
955\lstinline'trace_label_label' values which end in the return from the
956function.  These correspond to a measurable subtrace, and in
957particular include executions of an entire function call (and so are
958used for the nested calls in \lstinline'trace_any_label').
962The core part of the construction of a fixed trace is to use the proof
963of termination from the measurable trace (defined on
964page~\pageref{prog:terminationproof}) to `fold up' the execution into
965the nested form.
967TODO: design, basic structure from termination proof, how cost
968labelling props are baked in; unrepeating PCs, remainder of sound
969labellings; coinductive version for whole programs, reason/relevance,
970use of em (maybe a general comment about uses of classical reasoning
971in development)
977TODO: appendix on code layout?
Note: See TracBrowser for help on using the repository browser.