source: Deliverables/D3.4/Report/report.tex @ 3249

Last change on this file since 3249 was 3249, checked in by Ian Stark, 7 years ago

UEDIN D3.4, D6.4-6.5

File size: 55.2 KB
14% LaTeX Companion, p 74
15\newcommand{\todo}[1]{\marginpar{\raggedright - #1}}
18  {keywords={Definition,Lemma,Theorem,Remark,Qed,Save,Inductive,Record},
19   morekeywords={[2]if,then,else},
20  }
23  {keywords={definition,lemma,theorem,remark,inductive,record,qed,let,rec,match,with,Type,and,on},
24   morekeywords={[2]whd,normalize,elim,cases,destruct},
25   mathescape=true,
26   morecomment=[n]{(*}{*)},
27  }
30        keywordstyle=\color{red}\bfseries,
31        keywordstyle=[2]\color{blue},
32        commentstyle=\color{green},
33        stringstyle=\color{blue},
34        showspaces=false,showstringspaces=false}
48\vspace*{1cm}Project FP7-ICT-2009-C-243881 \cerco{}}
50\date{ }
70Report n. D3.4\\
71Front-end Correctness Proofs\\
78Version 1.0
86Brian~Campbell, Ilias~Garnier, James~McKinna, Ian~Stark
92Project Acronym: \cerco{}\\
93Project full title: Certified Complexity\\
94Proposal/Contract no.: FP7-ICT-2009-C-243881 \cerco{}\\
96\clearpage \pagestyle{myheadings} \markright{\cerco{}, FP7-ICT-2009-C-243881}
100\section*{Executive Summary}
101\addcontentsline{toc}{section}{Executive Summary}
104This document reports on \textbf{D3.4:~Front-end Correctness Proofs}, the
105final deliverable of \cerco{} work package~3 \emph{Verified Compiler - front
106  end}.  That work package was devoted to the formalisation and verification of
107the front-end of the \cerco{} cost lifting compiler, and Deliverable~3.4 is a
108machine-checked formal correctness proof of the compiler front-end, written in
109the \matita{} proof assistant.
111The proof itself is contained in a series of Matita files, listed in
112Appendix~A and culminating in the lemma \lstinline'front_end_correct' and the
113theorem \lstinline'correct' in the file \lstinline''.  This
114document accompanies the deliverable and briefly reports on the work carried
115out in the development of that proof.
121We report on the correctness proofs for the front-end of the \cerco{}
122cost lifting compiler.  First, we identify the core result we wish to
123prove, which says that the we correctly predict the precise execution
124time for particular parts of the execution called \emph{measurable}
125subtraces.  Then we consider the three distinct parts of the task:
126showing that the \emph{annotated source code} output by the compiler
127has equivalent behaviour to the original input (up to the
128annotations); showing that a measurable subtrace of the
129annotated source code corresponds to an equivalent measurable subtrace
130in the code produced by the front-end, including costs; and finally
131showing that the enriched \emph{structured} execution traces required
132for cost correctness in the back-end can be constructed from the
133properties of the code produced by the front-end.
135A key part of our work is that the intensional correctness results which show
136that we get consistent cost measurements throughout the intermediate languages
137of the compiler can be layered on top of normal forward simulation results,
138if we split those results into local call-structure preserving simulations.
139This split allowed us to concentrate on the \textbf{intensional} proofs by
140axiomatising some of the extensional simulation results that are very similar to
141existing compiler correctness results, such as CompCert.
143This report is about the correctness results that are deliverable
144D3.4, which are about the formalised compiler described in D3.2, using
145the source language semantics from D3.1 and intermediate language
146semantics from D3.3.  It builds on earlier work on the correctness of
147a toy compiler built to test the labelling approach in D2.1. Together
148with the companion deliverable about the correctness of the back-end,
149D4.4, we obtain results about the whole formalised compiler.
155% CHECK: clear up any -ize vs -ise
156% CHECK: clear up any "front end" vs "front-end"
157% CHECK: clear up any mentions of languages that aren't textsf'd.
158% CHECK: fix unicode in listings
162The \cerco{} compiler produces a version of the source code containing
163annotations describing the timing behaviour of the object code, as
164well as the object code itself. It compiles C code, targeting
165microcontrollers implementing the Intel 8051 architecture.  There are
166two versions: first, an initial prototype was implemented in
167\ocaml{}~\cite{d2.2}, then a version was formalised in the \matita{}
168proof assistant~\cite{d3.2,d4.2} and extracted to \ocaml{} code to
169produce an executable compiler.  In this document we present results
170from Deliverable 3.4, the formalised proofs in \matita{} about the
171front-end of the latter version of the compiler (culminating in the
172\lstinline'front_end_correct' lemma), and describe how that fits
173into the verification of the whole compiler.
175A key part of this work was to layer the \emph{intensional} correctness
176results that show that the costs produced are correct on top of the
177proofs about the compiled code's \emph{extensional} behaviour (that is, the
178functional correctness of the compiler).  Unfortunately, the ambitious
179goal of completely verifying the entire compiler was not feasible
180within the time available, but thanks to this separation of
181extensional and intensional proofs we are able to axiomatise some extensional
182simulation results which are similar to those in other compiler verification
183projects and concentrate on the novel intensional proofs.  We were
184also able to add stack space costs to obtain a stronger result.  The
185proofs were made more tractable by introducing compile-time checks for
186the `sound and precise' cost labelling properties rather than proving
187that they are preserved throughout.
189The overall statement of correctness says that the annotated program has the
190same behaviour as the input, and that for any suitably well-structured part of
191the execution (which we call \emph{measurable}), the object code will execute
192the same behaviour taking precisely the time given by the cost annotations in
193the annotated source program.
195In the next section we recall the structure of the compiler and make the overall
196statement more precise.  Following that, in Section~\ref{sec:fegoals} we
197describe the statements we need to prove about the intermediate \textsf{RTLabs}
198programs for the back-end proofs.
199Section~\ref{sec:inputtolabelling} covers the compiler passes which produce the
200annotated source program and Section~\ref{sec:measurablelifting} the rest
201of the transformations in the front-end.  Then the compile-time checks
202for good cost labelling are detailed in Section~\ref{sec:costchecks}
203and the proofs that the structured traces required by the back-end
204exist are discussed in Section~\ref{sec:structuredtrace}.
206\section{The compiler and its correctness statement}
208The uncertified prototype \ocaml{} \cerco{} compiler was originally described
209in Deliverables 2.1 and 2.2.  Its design was replicated in the formal
210\matita{} code, which was presented in Deliverables 3.2 and 4.2, for
211the front-end and back-end respectively.
217\caption{Languages in the \cerco{} compiler}
221The compiler uses a number of intermediate languages, as outlined the
222middle two lines of Figure~\ref{fig:compilerlangs}.  The upper line
223represents the front-end of the compiler, and the lower the back-end,
224finishing with Intel 8051 binary code.  Not all of the front-end compiler passes
225introduce a new language, and Figure~\ref{fig:summary} presents a
226list of every pass involved.
232\quad \= $\downarrow$ \quad \= \kill
233\textsf{C} (unformalised)\\
234\> $\downarrow$ \> CIL parser (unformalised \ocaml)\\
236%\> $\downarrow$ \> add runtime functions\\
237\> $\downarrow$ \> \lstinline[language=C]'switch' removal\\
238\> $\downarrow$ \> labelling\\
239\> $\downarrow$ \> cast removal\\
240\> $\downarrow$ \> stack variable allocation and control structure
241 simplification\\
243%\> $\downarrow$ \> generate global variable initialization code\\
244\> $\downarrow$ \> transform to RTL graph\\
246\> $\downarrow$ \> check cost labelled properties of RTL graph\\
247\> $\downarrow$ \> start of target specific back-end\\
248\>\quad \vdots
252\caption{Front-end languages and compiler passes}
257The annotated source code is produced by the cost labelling phase.
258Note that there is a pass to replace C \lstinline[language=C]'switch'
259statements before labelling --- we need to remove them because the
260simple form of labelling used in the formalised compiler is not quite
261capable of capturing their execution time costs, largely due to C's
262`fall-through' behaviour where execution from one branch continues in
263the next unless there is an explicit \lstinline[language=C]'break'.
265The cast removal phase which follows cost labelling simplifies
266expressions to prevent unnecessary arithmetic promotion, which is
267specified by the C standard but costly for an 8-bit target.  The
268transformation to \textsf{Cminor} and subsequently \textsf{RTLabs}
269bear considerable resemblance to some passes of the CompCert
270compiler~\cite{Blazy-Leroy-Clight-09,Leroy-backend}, although we use a simpler \textsf{Cminor} where
271all loops use \lstinline[language=C]'goto' statements, and the
272\textsf{RTLabs} language retains a target-independent flavour.  The
273back-end takes \textsf{RTLabs} code as input.
275The whole compilation function returns the following information on success:
277  record compiler_output : Type[0] :=
278   { c_labelled_object_code: labelled_object_code
279   ; c_stack_cost: stack_cost_model
280   ; c_max_stack: nat
281   ; c_init_costlabel: costlabel
282   ; c_labelled_clight: clight_program
283   ; c_clight_cost_map: clight_cost_map
284   }.
286It consists of annotated 8051 object code, a mapping from function
287identifiers to the function's stack space usage, the space available for the
288stack after global variable allocation, a cost label covering the
289execution time for the initialisation of global variables and the call
290to the \lstinline[language=C]'main' function, the annotated source
291code, and finally a mapping from cost labels to actual execution time
294An \ocaml{} pretty printer is used to provide a concrete version of
295the output code and annotated source code.  In the case of the
296annotated source code, it also inserts the actual costs alongside the
297cost labels, and optionally adds a global cost variable and
298instrumentation to support further reasoning in external tools such as
301\subsection{Revisions to the prototype compiler}
303Our focus on intensional properties prompted us to consider whether we
304could incorporate stack space into the costs presented to the user.
305We only allocate one fixed-size frame per function, so modelling this
306was relatively simple.  It is the only form of dynamic memory
307allocation provided by the compiler, so we were able to strengthen the
308statement of the goal to guarantee successful execution whenever the
309stack space obeys the \lstinline'c_max_stack' bound calculated by
310subtracting the global variable requirements from the total memory
313The cost labelling checks at the end of Figure~\ref{fig:summary} have been
314introduced to reduce the proof burden, and are described in
317The use of dependent types to capture simple intermediate language
318invariants makes every front-end pass a total function, except
319\textsf{Clight} to \textsf{Cminor} and the cost checks.  Hence various
320well-formedness and type safety checks are performed only once between
321\textsf{Clight} and \textsf{Cminor}, and the invariants rule out any
322difficulties in the later stages.  With the benefit of hindsight we
323would have included an initial checking phase to produce a
324`well-formed' variant of \textsf{Clight}, conjecturing that this would
325simplify various parts of the proofs for the \textsf{Clight} stages
326which deal with potentially ill-formed code.
328Following D2.2, we previously generated code for global variable
329initialisation in \textsf{Cminor}, for which we reserved a cost label
330to represent the execution time for initialisation.  However, the
331back-end must also add an initial call to the main function, whose
332cost must also be accounted for, so we decided to move the
333initialisation code to the back-end and merge the costs.
335\subsection{Main correctness statement}
337Informally, our main intensional result links the time difference in a source
338code execution to the time difference in the object code, expressing the time
339for the source by summing the values for the cost labels in the trace, and the
340time for the target by a clock built in to the 8051 executable semantics.
342The availability of precise timing information for 8501
343implementations and the design of the compiler allow it to give exact
344time costs in terms of processor cycles, not just upper bounds.
345However, these exact results are only available if the subtrace we
346measure starts and ends at suitable points.  In particular, pure
347computation with no observable effects may be reordered and moved past
348cost labels, so we cannot measure time between arbitrary statements in
349the program.
351There is also a constraint on the subtraces that we
352measure due to the requirements of the correctness proof for the
353object code timing analysis.  To be sure that the timings are assigned
354to the correct cost label, we need to know that each return from a
355function call must go to the correct return address.  It is difficult
356to observe this property locally in the object code because it relies
357on much earlier stages in the compiler.  To convey this information to
358the timing analysis extra structure is imposed on the subtraces, which
359is described in Section~\ref{sec:fegoals}.
361% Regarding the footnote, would there even be much point?
362% TODO: this might be quite easy to add ('just' subtract the
363% measurable subtrace from the second label to the end).  Could also
364% measure other traces in this manner.
365These restrictions are reflected in the subtraces that we give timing
366guarantees on; they must start at a cost label and end at the return
367of the enclosing function of the cost label\footnote{We expect that
368  this would generalise to more general subtraces by subtracting costs
369  for unwanted measurable suffixes of a measurable subtrace.}.  A
370typical example of such a subtrace is the execution of an entire
371function from the cost label at the start of the function until it
372returns.  We call such any such subtrace \emph{measurable} if it (and
373the prefix of the trace from the start to the subtrace) can also be
374executed within the available stack space.
376Now we can give the main intensional statement for the compiler.
377Given a \emph{measurable} subtrace for a labelled \textsf{Clight}
378program, there is a subtrace of the 8051 object code program where the
379time differences match.  Moreover, \emph{observable} parts of the
380trace also match --- these are the appearance of cost labels and
381function calls and returns.
385More formally, the definition of this statement in \matita{} is
387definition simulates :=
388  $\lambda$p: compiler_output.
389  let initial_status := initialise_status $...$ (cm (c_labelled_object_code $...$ p)) in
390  $\forall$m1,m2.
391   measurable Clight_pcs (c_labelled_clight $...$ p) m1 m2
392       (stack_sizes (c_stack_cost $...$ p)) (c_max_stack $...$ p) $\rightarrow$
393  $\forall$c1,c2.
394   clock_after Clight_pcs (c_labelled_clight $...$ p) m1 (c_clight_cost_map $...$ p) = OK $...$ c1 $\rightarrow$
395   clock_after Clight_pcs (c_labelled_clight $...$ p) (m1+m2) (c_clight_cost_map $...$ p) = OK $...$ c2 $\rightarrow$
396  $\exists$n1,n2.
397   observables Clight_pcs (c_labelled_clight $...$ p) m1 m2 =
398     observables (OC_preclassified_system (c_labelled_object_code $...$ p))
399          (c_labelled_object_code $...$ p) n1 n2
400  $\wedge$
401   clock ?? (execute (n1+n2) ? initial_status) =
402     clock ?? (execute n1 ? initial_status) + (c2-c1).
404where the \lstinline'measurable', \lstinline'clock_after' and
405\lstinline'observables' definitions are generic definitions for multiple
406languages; in this case the \lstinline'Clight_pcs' record applies them
407to \textsf{Clight} programs.
409There is a second part to the statement, which says that the initial
410processing of the input program to produce the cost labelled version
411does not affect the semantics of the program:
412% Yes, I'm paraphrasing the result a tiny bit to remove the observe non-function
414  $\forall$input_program,output.
415  compile input_program = return output $\rightarrow$
416  not_wrong $...$ (exec_inf $...$ clight_fullexec input_program) $\rightarrow$
417  sim_with_labels
418   (exec_inf $...$ clight_fullexec input_program)
419   (exec_inf $...$ clight_fullexec (c_labelled_clight $...$ output))
421That is, any successful compilation produces a labelled program that
422has identical behaviour to the original, so long as there is no
423`undefined behaviour'.
425Note that this statement provides full functional correctness, including
426preservation of (non-)termination.  The intensional result above does
427not do this directly --- it does not guarantee the same result or same
428termination.  There are two mitigating factors, however: first, to
429prove the intensional property you need local simulation results --- these
430can be pieced together to form full behavioural equivalence, only time
431constraints have prevented us from doing so.  Second, if we wish to
432confirm a result, termination, or non-termination we could add an
433observable witness, such as a function that is only called if the
434correct result is given.  The intensional result guarantees that the
435observable witness is preserved, so the program must behave correctly.
437These two results are combined in the the \lstinline'correct'
438theorem in the file \lstinline''.
440\section{Correctness statement for the front-end}
443The essential parts of the intensional proof were outlined during work
444on a toy compiler in Task
4452.1~\cite{d2.1,springerlink:10.1007/978-3-642-32469-7_3}.  These are
447\item functional correctness, in particular preserving the trace of
448  cost labels,
449\item the \emph{soundness} and \emph{precision} of the cost labelling
450  on the object code, and
451\item the timing analysis on the object code produces a correct
452  mapping from cost labels to time.
455However, that toy development did not include function calls.  For the
456full \cerco{} compiler we also need to maintain the invariant that
457functions return to the correct program location in the caller, as we
458mentioned in the previous section.  During work on the back-end timing
459analysis (describe in more detail in the companion deliverable, D4.4)
460the notion of a \emph{structured trace} was developed to enforce this
461return property, and also most of the cost labelling properties too.
467\caption{The compiler and proof outline}
471Jointly, we generalised the structured traces to apply to any of the
472intermediate languages which have some idea of program counter.  This means
473that they are introduced part way through the compiler, see
474Figure~\ref{fig:compiler}.  Proving that a structured trace can be
475constructed at \textsf{RTLabs} has several virtues:
477\item This is the first language where every operation has its own
478  unique, easily addressable, statement.
479\item Function calls and returns are still handled implicitly in the
480  language and so the structural properties are ensured by the
481  semantics.
482\item Many of the back-end languages from \textsf{RTL} onwards share a common
483  core set of definitions, and using structured traces throughout
484  increases this uniformity.
491\caption{Nesting of functions in structured traces}
494A structured trace is a mutually inductive data type which
495contains the steps from a normal program trace, but arranged into a
496nested structure which groups entire function calls together and
497aggregates individual steps between cost labels (or between the final
498cost label and the return from the function), see
499Figure~\ref{fig:strtrace}.  This captures the idea that the cost labels
500only represent costs \emph{within} a function --- calls to other
501functions are accounted for in the nested trace for their execution, and we
502can locally regard function calls as a single step.
504These structured traces form the core part of the intermediate results
505that we must prove so that the back-end can complete the main
506intensional result stated above.  In full, we provide the back-end
509\item A normal trace of the \textbf{prefix} of the program's execution
510  before reaching the measurable subtrace.  (This needs to be
511  preserved so that we know that the stack space consumed is correct,
512  and to set up the simulation results.)
513\item The \textbf{structured trace} corresponding to the measurable
514  subtrace.
515\item An additional property about the structured trace that no
516  `program counter' is \textbf{repeated} between cost labels.  Together with
517  the structure in the trace, this takes over from showing that
518  cost labelling is sound and precise.
519\item A proof that the \textbf{observables} have been preserved.
520\item A proof that the \textbf{stack limit} is still observed by the prefix and
521  the structure trace.  (This is largely a consequence of the
522  preservation of observables.)
524The \lstinline'front_end_correct' lemma in the
525\lstinline'' file provides a record containing these.
527Following the outline in Figure~\ref{fig:compiler}, we will first deal
528with the transformations in \textsf{Clight} that produce the source
529program with cost labels, then show that measurable traces can be
530lifted to \textsf{RTLabs}, and finally show that we can construct the
531properties listed above ready for the back-end proofs.
533\section{Input code to cost labelled program}
536As explained on page~\pageref{page:switchintro}, the costs of complex
537C \lstinline[language=C]'switch' statements cannot be represented with
538the simple labelling used in the formalised compiler.  Our first pass
539replaces these statements with simpler C code, allowing our second
540pass to perform the cost labelling.  We show that the behaviour of
541programs is unchanged by these passes using forward
542simulations\footnote{All of our languages are deterministic, which can
543be seen directly from their executable definitions.  Thus we know that
544forward simulations are sufficient because the target cannot have any
545other behaviour.}.
547\subsection{Switch removal}
549We compile away \lstinline[language=C]'switch' statements into more
550basic \textsf{Clight} code.
551Note that this transformation does not necessarily deteriorate the
552efficiency of the generated code. For instance, compilers such as GCC
553introduce balanced trees of ``if-then-else'' constructs for small
554switches. However, our implementation strategy is much simpler. Let
555us consider the following input statement.
558   switch(e) {
559   case v1:
560     stmt1;
561   case v2:
562     stmt2;
563   default:
564     stmt_default;
565   }
568Note that \textsf{stmt1}, \textsf{stmt2}, \ldots \textsf{stmt\_default}
569may contain \lstinline[language=C]'break' statements, which have the
570effect of exiting the switch statement. In the absence of break, the
571execution falls through each case sequentially. In our implementation,
572we produce an equivalent sequence of ``if-then'' chained by gotos:
574   fresh = e;
575   if(fresh == v1) {
576     $\llbracket$stmt1$\rrbracket$;
577     goto lbl_case2;
578   };
579   if(fresh == v2) {
580     lbl_case2:
581     $\llbracket$stmt2$\rrbracket$;
582     goto lbl_case2;
583   };
584   $\llbracket$stmt_default$\rrbracket$;
585   exit_label:
588The proof had to tackle the following points:
590\item the source and target memories are not the same (due to the fresh variable),
591\item the flow of control is changed in a non-local way (e.g. \textbf{goto} 
592  instead of \textbf{break}).
594In order to tackle the first point, we implemented a version of memory
595extensions similar to those of CompCert.
597For the simulation we decided to prove a sufficient amount to give us
598confidence in the definitions and approach, but to curtail the proof
599because this pass does not contribute to the intensional correctness
600result.  We tackled several simple cases, that do not interact with
601the switch removal per se, to show that the definitions were usable,
602and part of the switch case to check that the approach is
603reasonable. This comprises propagating the memory extension through
604each statement (except switch), as well as various invariants that are
605needed for the switch case (in particular, freshness hypotheses). The
606details of the evaluation process for the source switch statement and
607its target counterpart can be found in the file
608\lstinline'', along more details on the transformation
611Proving the correctness of the second point would require reasoning on the
612semantics of \lstinline[language=C]'goto' statements. In the \textsf{Clight} 
613semantics, this is implemented as a function-wide lookup of the target label.
614The invariant we would need is the fact that a global label lookup on a freshly
615created goto is equivalent to a local lookup. This would in turn require the
616propagation of some freshness hypotheses on labels. As discussed,
617we decided to omit this part of the correctness proof.
619\subsection{Cost labelling}
621The simulation for the cost labelling pass is the simplest in the
622front-end.  The main argument is that any step of the source program
623is simulated by the same step of the labelled one, plus any extra
624steps for the added cost labels.  The extra instructions do not change
625the memory or local environments, although we have to keep track of
626the extra instructions that appear in continuations, for example
627during the execution of a \lstinline[language=C]'while' loop.
629We do not attempt to capture any cost properties of the labelling\footnote{We describe how the cost properties are
630established in Section~\ref{sec:costchecks}.} in
631the simulation proof, which allows the proof to be oblivious to the choice
632of cost labels.  Hence we do not have to reason about the threading of
633name generation through the labelling function, greatly reducing the
634amount of effort required.
636%TODO: both give one-step-sim-by-many forward sim results; switch
637%removal tricky, uses aux var to keep result of expr, not central to
638%intensional correctness so curtailed proof effort once reasonable
639%level of confidence in code gained; labelling much simpler; don't care
640%what the labels are at this stage, just need to know when to go
641%through extra steps.  Rolled up into a single result with a cofixpoint
642%to obtain coinductive statement of equivalence (show).
644\section{Finding corresponding measurable subtraces}
647There follow the three main passes of the front-end:
649\item simplification of casts in \textsf{Clight} code
650\item \textsf{Clight} to \textsf{Cminor} translation, performing stack
651  variable allocation and simplifying control structures
652\item transformation to \textsf{RTLabs} control flow graph
654We have taken a common approach to
655each pass: first we build (or axiomatise) forward simulation results
656that are similar to normal compiler proofs, but which are slightly more
657fine-grained so that we can see that the call structure and relative
658placement of cost labels is preserved.
660Then we instantiate a general result which shows that we can find a
661\emph{measurable} subtrace in the target of the pass that corresponds
662to the measurable subtrace in the source.  By repeated application of
663this result we can find a measurable subtrace of the execution of the
664\textsf{RTLabs} code, suitable for the construction of a structured
665trace (see Section~\ref{sec:structuredtrace}).  This is essentially an
666extra layer on top of the simulation proofs that provides us with the
667additional information required for our intensional correctness proof.
669\subsection{Generic measurable subtrace lifting proof}
671Our generic proof is parametrised on a record containing small-step
672semantics for the source and target language, a classification of
673states (the same form of classification is used when defining
674structured traces), a simulation relation which respects the
675classification and cost labelling and
676four simulation results.  The simulations are split by the starting state's
677classification and whether it is a cost label, which will allow us to
678observe that the call structure is preserved.  They are:
680\item a step from a `normal' state (which is not classified as a call
681  or return) which is not a cost label is simulated by zero or more
682  `normal' steps;
683\item a step from a `call' state followed by a cost label step is
684  simulated by a step from a `call' state, a corresponding label step,
685  then zero or more `normal' steps;
686\item a step from a `call' state not followed by a cost label
687  similarly (note that this case cannot occur in a well-labelled
688  program, but we do not have enough information locally to exploit
689  this); and
690\item a cost label step is simulated by a cost label step.
692Finally, we need to know that a successfully translated program will
693have an initial state in the simulation relation with the original
694program's initial state.
696The back-end has similar requirements for lifting simulations to
697structured traces.  Fortunately, our treatment of calls and returns
698can be slightly simpler because we have special call and return states
699that correspond to function entry and return that are separate from
700the actual instructions.  This was originally inherited from our port
701of CompCert's \textsf{Clight} semantics, but proves useful here
702because we only need to consider adding extra steps \emph{after} a
703call or return state, because the instruction step deals with extra
704steps that occur before.  The back-end makes all of the call and
705return machinery explicit, and thus needs more complex statements
706about extra steps before and after each call and return.
712\caption{Tiling of simulation for a measurable subtrace}
716To find the measurable subtrace in the target program's execution we
717walk along the original program's execution trace applying the
718appropriate simulation result by induction on the number of steps.
719While the number of steps taken varies, the overall structure is
720preserved, as illustrated in Figure~\ref{fig:tiling}.  By preserving
721the structure we also maintain the same intensional observables.  One
722delicate point is that the cost label following a call must remain
723directly afterwards\footnote{The prototype compiler allowed some
724  straight-line code to appear before the cost label until a later
725  stage of the compiler, but we must move the requirement forward to
726  fit with the structured traces.}
727% Damn it, I should have just moved the cost label forwards in RTLabs,
728% like the prototype does in RTL to ERTL; the result would have been
729% simpler.  Or was there some reason not to do that?
730(both in the program code and in the execution trace), even if we
731introduce extra steps, for example to store parameters in memory in
732\textsf{Cminor}.  Thus we have a version of the call simulation
733that deals with both the call and the cost label in one result.
735In addition to the subtrace we are interested in measuring, we must
736prove that the earlier part of the trace is also preserved in
737order to use the simulation from the initial state.  This proof also
738guarantees that we do not run out of stack space before the subtrace
739we are interested in.  The lemmas for this prefix and the measurable
740subtrace are similar, following the pattern above.  However, the
741measurable subtrace also requires us to rebuild the termination
742proof.  This is defined recursively:
745  let rec will_return_aux C (depth:nat)
746                               (trace:list (cs_state $...$ C $\times$ trace)) on trace : bool :=
747  match trace with
748  [ nil $\Rightarrow$ false
749  | cons h tl $\Rightarrow$
750    let $\langle$s,tr$\rangle$ := h in
751    match cs_classify C s with
752    [ cl_call $\Rightarrow$ will_return_aux C (S depth) tl
753    | cl_return $\Rightarrow$
754        match depth with
755        [ O $\Rightarrow$ match tl with [ nil $\Rightarrow$ true | _ $\Rightarrow$ false ]
756        | S d $\Rightarrow$ will_return_aux C d tl
757        ]
758    | _ $\Rightarrow$ will_return_aux C depth tl
759    ]
760  ].
762The \lstinline'depth' is the number of return states we need to see
763before we have returned to the original function (initially zero) and
764\lstinline'trace' the measurable subtrace obtained from the running
765the semantics for the correct number of steps.  This definition
766unfolds tail recursively for each step, and once the corresponding
767simulation result has been applied a new one for the target can be
768asserted by unfolding and applying the induction hypothesis on the
769shorter trace.
771Combining the lemmas about the prefix and the measurable subtrace
772requires a little care because the states joining the two might not be
773related in the simulation.  In particular, if the measurable subtrace
774starts from the cost label at the beginning of the function there may
775be some extra instructions in the target code to execute to complete
776function entry before the states are back in the relation.  Hence we
777carefully phrased the lemmas to allow for such extra steps.
779Together, these then gives us an overall result for any simulation fitting the
780requirements above (contained in the \lstinline'meas_sim' record):
782theorem measured_subtrace_preserved :
783  $\forall$MS:meas_sim.
784  $\forall$p1,p2,m,n,stack_cost,max.
785  ms_compiled MS p1 p2 $\rightarrow$
786  measurable (ms_C1 MS) p1 m n stack_cost max $\rightarrow$
787  $\exists$m',n'.
788    measurable (ms_C2 MS) p2 m' n' stack_cost max $\wedge$
789    observables (ms_C1 MS) p1 m n = observables (ms_C2 MS) p2 m' n'.
791The stack space requirement that is embedded in \lstinline'measurable'
792is a consequence of the preservation of observables, because it is
793determined by the functions called and returned from, which are observable.
795\subsection{Simulation results for each pass}
797We now consider the simulation results for the passes, each of which
798is used to instantiate the
799\lstinline[language=matita]'measured_subtrace_preserved' theorem to
800construct the measurable subtrace for the next language.
802\subsubsection{Cast simplification}
804The parser used in \cerco{} introduces a lot of explicit type casts.
805If left as they are, these constructs can greatly hamper the
806quality of the generated code -- especially as the architecture
807we consider is an $8$-bit one. In \textsf{Clight}, casts are
808expressions. Hence, most of the work of this transformation
809proceeds on expressions. The transformation proceeds by recursively
810trying to coerce an expression to a particular integer type, which
811is in practice smaller than the original one. This functionality
812is implemented by two mutually recursive functions whose signature
813is the following.
816let rec simplify_expr (e:expr) (target_sz:intsize) (target_sg:signedness)
817  : $\Sigma$result:bool$\times$expr.
818    $\forall$ge,en,m. simplify_inv ge en m e (\snd result) target_sz target_sg (\fst result) := $\ldots$
820and simplify_inside (e:expr) : $\Sigma$result:expr. conservation e result := $\ldots$
823The \textsf{simplify\_inside} acts as a wrapper for
824\textsf{simplify\_expr}. Whenever \textsf{simplify\_inside} encounters
825a \textsf{Ecast} expression, it tries to coerce the sub-expression
826to the desired type using \textsf{simplify\_expr}, which tries to
827perform the actual coercion. In return, \textsf{simplify\_expr} calls
828back \textsf{simplify\_inside} in some particular positions, where we
829decided to be conservative in order to simplify the proofs. However,
830the current design allows to incrementally revert to a more aggressive
831version, by replacing recursive calls to \textsf{simplify\_inside} by
832calls to \textsf{simplify\_expr} \emph{and} proving the corresponding
833invariants -- where possible.
835The \textsf{simplify\_inv} invariant encodes either the conservation
836of the semantics during the transformation corresponding to the failure
837of the coercion (\textsf{Inv\_eq} constructor), or the successful
838downcast of the considered expression to the target type
842inductive simplify_inv
843  (ge : genv) (en : env) (m : mem)
844  (e1 : expr) (e2 : expr) (target_sz : intsize) (target_sg : signedness) : bool $\rightarrow$ Prop :=
845| Inv_eq : $\forall$result_flag. $\ldots$
846     simplify_inv ge en m e1 e2 target_sz target_sg result_flag
847| Inv_coerce_ok : $\forall$src_sz,src_sg.
848     typeof e1 = Tint src_sz src_sg $\rightarrow$
849     typeof e2 = Tint target_sz target_sg $\rightarrow$     
850     smaller_integer_val src_sz target_sz src_sg (exec_expr ge en m e1) (exec_expr ge en m e2) $\rightarrow$
851     simplify_inv ge en m e1 e2 target_sz target_sg true.
854The \textsf{conservation} invariant for \textsf{simplify\_inside} simply states the conservation
855of the semantics, as in the \textsf{Inv\_eq} constructor of the previous
859definition conservation := $\lambda$e,result. $\forall$ge,en,m.
860          res_sim ? (exec_expr ge en m e) (exec_expr ge en m result)
861       $\wedge$ res_sim ? (exec_lvalue ge en m e) (exec_lvalue ge en m result)
862       $\wedge$ typeof e = typeof result.
865This invariant is then easily lifted to statement evaluations.
866The main problem encountered with this particular pass was dealing with
867inconsistently typed programs, a canonical case being a particular
868integer constant of a certain size typed with another size. This
869prompted the need to introduce numerous type checks, making
870both the implementation and the proof more complex, even though more
871comprehensive checks are made in the next stage.
872%\todo{Make this a particular case of the more general statement on baking more invariants in the Clight language}
874\subsubsection{Clight to Cminor}
876This pass is the last one operating on the \textsf{Clight} language.
877Its input is a full \textsf{Clight} program, and its output is a
878\textsf{Cminor} program. Note that we do not use an equivalent of
879CompCert's \textsf{C\#minor} language: we translate directly to a
880variant of \textsf{Cminor}. This presents the advantage of not
881requiring the special loop constructs, nor the explicit block
882structure. Another salient point of our approach is that a significant
883number of the properties needed for the simulation proof were directly
884encoded in dependently typed translation functions.  In particular,
885freshness conditions and well-typedness conditions are included. The
886main effects of the transformation from \textsf{Clight} to
887\textsf{Cminor} are listed below.
890\item Variables are classified as being either globals, stack-allocated
891  locals or potentially register-allocated locals. The value of register-allocated
892  local variables is moved out of the modelled memory and stored in a
893  dedicated environment.
894\item In \textsf{Clight}, each local variable has a dedicated memory block, whereas
895  stack-allocated locals are bundled together on a function-by-function basis.
896\item Loops are converted to jumps.
899The first two points require memory injections which are more flexible that those
900needed in the switch removal case. In the remainder of this section, we briefly
901discuss our implementation of memory injections, and then the simulation proof.
903\paragraph{Memory injections.}
905Our memory injections are modelled after the work of Blazy \& Leroy.
906However, the corresponding paper is based on the first version of the
907CompCert memory model~\cite{2008-Leroy-Blazy-memory-model}, whereas we use a much more concrete model, allowing byte-level
908manipulations (as in the later version of CompCert's memory model). We proved
909roughly 80 \% of the required lemmas. Notably, some of the difficulties encountered were
910due to overly relaxed conditions on pointer validity (fixed during development).
911Some more side conditions had to be added to take care of possible overflows when converting
912from \textbf{Z} block bounds to $16$ bit pointer offsets (in practice, such overflows
913only occur in edge cases that are easily ruled out -- but this fact is not visible
914in memory injections). Concretely, some of the lemmas on the preservation of simulation of
915loads after writes were axiomatised, due to a lack of time.
917\paragraph{Simulation proof.}
919We proved the simulation result for expressions and a representative
920selection of statements.  In particular we tackled
921\lstinline[language=C]'while' statements to ensure that we correctly
922translate loops because our approach differs from CompCert by
923converting directly to \textsf{Cminor} \lstinline[language=C]'goto's
924rather than maintaining a notion of loop in \textsf{Cminor}.  We also have a partial
925proof for function entry, covering the setup of the memory injection,
926but not function exit.  Exits, and the remaining statements, have been
929Careful management of the proof state was required because proof terms
930are embedded in \textsf{Cminor} code to show that invariants are
931respected.  These proof terms appear in the proof state when inverting
932the translation functions, and they can be large and awkward.  While
933generalising them away is usually sufficient, it can be difficult when
934they appear under a binder.
936%The correctness proof for this transformation was not completed. We proved the
937%simulation result for expressions and for some subset of the critical statement cases.
938%Notably lacking are the function entry and exit, where the memory injection is
939%properly set up. As would be expected, a significant amount of work has to be performed
940%to show the conservation of all invariants at each simulation step.
942%\todo{list cases, explain while loop, explain labeling problem}
944\subsubsection{Cminor to RTLabs}
946The translation from \textsf{Cminor} to \textsf{RTLabs} is a fairly
947routine control flow graph (CFG) construction.  As such, we chose to
948axiomatise the associated extensional simulation results.  However, we did prove several
949properties of the generated programs:
951\item All statements are type correct with respect to the declared
952  pseudo-register type environment.
953\item The CFG is closed, and has a distinguished entry node and a
954  unique exit node.
957These properties rely on similar properties about type safety and the
958presence of \lstinline[language=C]'goto'-labels for \textsf{Cminor} programs
959which are checked at the preceding stage.  As a result, this
960transformation is total and any compilation failures must occur when
961the corresponding \textsf{Clight} source is available and a better
962error message can be generated.
964The proof obligations for these properties include many instances of
965graph inclusion.  We automated these proofs using a small amount of
966reflection, making the obligations much easier to handle.  One
967drawback to enforcing invariants throughout is that temporarily
968breaking them can be awkward.  For example, \lstinline'return'
969statements were originally used as placeholders for
970\lstinline[language=C]'goto' destinations that had not yet been
971translated.  However, this made establishing the single exit node
972property rather difficult, and a different placeholder was chosen
973instead.  In other circumstances it is possible to prove a more
974complex invariant then simplify it at the end of the transformation.
976\section{Checking cost labelling properties}
979Ideally, we would provide proofs that the cost labelling pass always
980produces programs that are soundly and precisely labelled and that
981each subsequent pass preserves these properties.  This would match our
982use of dependent types to eliminate impossible sources of errors
983during compilation, in particular retaining intermediate language type
986However, given the limited amount of time available we realised that
987implementing a compile-time check for a sound and precise labelling of
988the \textsf{RTLabs} intermediate code would reduce the proof burden
989considerably.  This is similar in spirit to the use of translation
990validation in certified compilation, which makes a similar trade-off
991between the potential for compile-time failure and the volume of proof
994The check cannot be pushed into a later stage of the compiler because
995much of the information is embedded into the structured traces.
996However, if an alternative method was used to show that function
997returns in the compiled code are sufficiently well-behaved, then we
998could consider pushing the cost property checks into the timing
999analysis itself.  We leave this as a possible area for future work.
1001\subsection{Implementation and correctness}
1004For a cost labelling to be sound and precise we need a cost label at
1005the start of each function, after each branch and at least one in
1006every loop.  The first two parts are trivial to check by examining the
1007code.  In \textsf{RTLabs} the last part is specified by saying
1008that there is a bound on the number of successive instruction nodes in
1009the CFG that you can follow before you encounter a cost label, and
1010checking this is more difficult.
1012The implementation progresses through the set of nodes in the graph,
1013following successors until a cost label is found or a label-free cycle
1014is discovered (in which case the property does not hold and we return
1015an error).  This is made easier by the prior knowledge that every
1016successor of a branch instruction is a cost label, so we do not need
1017to search each branch.  When a label is found, we remove the chain of
1018program counters from the set and continue from another node in the
1019set until it is empty, at which point we know that there is a bound
1020for every node in the graph.
1022Directly reasoning about the function that implements this procedure would be
1023rather awkward, so an inductive specification of a single step of its
1024behaviour was written and proved to match the implementation.  This
1025was then used to prove the implementation sound and complete.
1027While we have not attempted to prove that the cost labelled properties
1028are established and preserved earlier in the compiler, we expect that
1029the effort for the \textsf{Cminor} to \textsf{RTLabs} stage alone
1030would be similar to the work outlined above, because it involves the
1031change from requiring a cost label at particular positions to
1032requiring cost labels to break loops in the CFG.  As there are another
1033three passes to consider (including the labelling itself), we believe
1034that using the check above is much simpler overall.
1036% TODO? Found some Clight to Cminor bugs quite quickly
1038\section{Existence of a structured trace}
1041The \emph{structured trace} idea introduced in
1042Section~\ref{sec:fegoals} enriches the execution trace of a program by
1043nesting function calls in a mixed-step style and embedding the cost
1044labelling properties of the program.  See Figure~\ref{fig:strtrace} on
1045page~\pageref{fig:strtrace} for an illustration of a structured trace.
1046It was originally designed to support the proof of correctness for the
1047timing analysis of the object code in the back-end, then generalised
1048to provide a common structure to use from the end of the front-end to
1049the object code.
1051To make the definition generic we abstract over the semantics of the
1054record abstract_status : Type[1] :=
1055  { as_status :> Type[0]
1056  ; as_execute : relation as_status
1057  ; as_pc : DeqSet
1058  ; as_pc_of : as_status $\rightarrow$ as_pc
1059  ; as_classify : as_status $\rightarrow$ status_class
1060  ; as_label_of_pc : as_pc $\rightarrow$ option costlabel
1061  ; as_after_return : ($\Sigma$s:as_status. as_classify s =  cl_call) $\rightarrow$ as_status $\rightarrow$ Prop
1062  ; as_result: as_status $\rightarrow$ option int
1063  ; as_call_ident : ($\Sigma$s:as_status.as_classify s = cl_call) $\rightarrow$ ident
1064  ; as_tailcall_ident : ($\Sigma$s:as_status.as_classify s = cl_tailcall) $\rightarrow$ ident
1065  }.
1067which requires a type of states, an execution relation\footnote{All of
1068  our semantics are executable, but using a relation was simpler in
1069  the abstraction.}, some notion of abstract
1070program counter with decidable equality, the classification of states,
1071and functions to extract the observable intensional information (cost
1072labels and the identity of functions that are called).  The
1073\lstinline'as_after_return' property links the state before a function
1074call with the state after return, providing the evidence that
1075execution returns to the correct place.  The precise form varies
1076between stages; in \textsf{RTLabs} it insists the CFG, the pointer to
1077the CFG node to execute next, and some call stack information is
1080The structured traces are defined using three mutually inductive
1081types.  The core data structure is \lstinline'trace_any_label', which
1082captures some straight-line execution until the next cost label or the
1083return from the enclosing function.  Any function calls are embedded as
1084a single step, with its own trace nested inside and the before and
1085after states linked by \lstinline'as_after_return'; and states
1086classified as a `jump' (in particular branches) must be followed by a
1087cost label.
1089The second type, \lstinline'trace_label_label', is a
1090\lstinline'trace_any_label' where the initial state is cost labelled.
1091Thus a trace in this type identifies a series of steps whose cost is
1092entirely accounted for by the label at the start.
1094Finally, \lstinline'trace_label_return' is a sequence of
1095\lstinline'trace_label_label' values which end in the return from the
1096function.  These correspond to a measurable subtrace, and in
1097particular include executions of an entire function call (and so are
1098used for the nested calls in \lstinline'trace_any_label').
1102The construction of the structured trace replaces syntactic cost
1103labelling properties, which place requirements on where labels appear
1104in the program, with semantic properties that constrain the execution
1105traces of the program.  The construction begins by defining versions
1106of the sound and precise labelling properties on states and global
1107environments (for the code that appears in each of them) rather than
1108whole programs, and showing that these are preserved by steps of the
1109\textsf{RTLabs} semantics.
1111Then we show that each cost labelling property required by the
1112definition of structured traces is locally satisfied.  These proofs are
1113broken up by the classification of states.  Similarly, we prove a
1114step-by-step stack preservation result, which states that the
1115\textsf{RTLabs} semantics never changes the lower parts of the stack.
1117The core part of the construction of a structured trace is to use the
1118proof of termination from the measurable trace (defined on
1119page~\pageref{prog:terminationproof}) to `fold up' the execution into
1120the nested form.  The results outlined above fill in the proof
1121obligations for the cost labelling properties and the stack
1122preservation result shows that calls return to the correct location.
1124The structured trace alone is not sufficient to capture the property
1125that the program is soundly labelled.  While the structured trace
1126guarantees termination, it still permits a loop to be executed a
1127finite number of times without encountering a cost label.  We
1128eliminate this by proving that no `program counter' repeats within any
1129\lstinline'trace_any_label' section by showing that it is incompatible
1130with the property that there is a bound on the number of successor
1131instructions you can follow in the CFG before you encounter a cost
1132label (from Section~\ref{sec:costchecksimpl}).
1134\subsubsection{Complete execution structured traces}
1136The development of the construction above started relatively early,
1137before the measurable subtrace preservation proofs.  To be confident
1138that the traces were well-formed at that time, we also developed a
1139complete execution form that embeds the traces above.  This includes
1140non-terminating program executions, where an infinite number of the terminating
1141structured traces are embedded.  This construction confirmed that our
1142definition of structured traces was consistent, although we later
1143found that we did not require the whole execution version for the
1144compiler correctness results.
1146To construct these we need to know whether each function call will
1147eventually terminate, requiring the use of the excluded middle.  This
1148classical reasoning is local to the construction of whole program
1149traces and is not necessary for our main results.
1153In combination with the work on the CerCo back-end and by
1154concentrating on the novel intensional parts of the proof, we have
1155shown that it is possible to construct certifying compilers that
1156correctly report execution time and stack space costs.  The layering
1157of intensional correctness proofs on top of normal simulation results
1158provides a useful separation of concerns, and could permit the reuse
1159of existing results.
1165The following table gives a high-level overview of the \matita{}
1166source files in Deliverable 3.4:
1171\lstinline'' & Top-level compiler definitions, in particular
1172\lstinline'front_end', and the whole compiler definition
1173\lstinline'compile'. \\
1174\lstinline'' & Correctness results: \lstinline'front_end_correct'
1175and \lstinline'correct', respectively. \\
1176\lstinline'Clight/*' & \textsf{Clight}: proofs for switch
1177removal, cost labelling, cast simplification and conversion to
1178\textsf{Cminor}. \\
1179\lstinline'Cminor/*' & \textsf{Cminor}: axioms of conversion to
1180\textsf{RTLabs}. \\
1181\lstinline'RTLabs/*' & \textsf{RTLabs}: definitions and proofs for
1182compile-time cost labelling checks, construction of structured traces.
1184\lstinline'common/' & Definitions for measurable
1185subtraces. \\
1186\lstinline'common/' & Generic measurable subtrace
1187lifting proof. \\
1188\lstinline'common/*' & Other common definitions relevant to many parts
1189of the compiler and proof. \\
1190\lstinline'utilities/*' & General purpose definitions used throughout,
1191including extensions to the standard \matita{} library.
Note: See TracBrowser for help on using the repository browser.