source: Deliverables/D3.4/Report/report.tex @ 3173

Last change on this file since 3173 was 3173, checked in by campbell, 7 years ago

A little reworking of 3.4.

File size: 39.3 KB
Line 
1\documentclass[11pt,epsf,a4wide]{article}
2\usepackage[mathletters]{ucs}
3\usepackage[utf8x]{inputenc}
4\usepackage{stmaryrd}
5\usepackage{listings}
6\usepackage{../../style/cerco}
7\newcommand{\ocaml}{OCaml}
8\newcommand{\clight}{Clight}
9\newcommand{\matita}{Matita}
10\newcommand{\sdcc}{\texttt{sdcc}}
11
12\newcommand{\textSigma}{\ensuremath{\Sigma}}
13
14% LaTeX Companion, p 74
15\newcommand{\todo}[1]{\marginpar{\raggedright - #1}}
16
17\lstdefinelanguage{coq}
18  {keywords={Definition,Lemma,Theorem,Remark,Qed,Save,Inductive,Record},
19   morekeywords={[2]if,then,else},
20  }
21
22\lstdefinelanguage{matita}
23  {keywords={definition,lemma,theorem,remark,inductive,record,qed,let,rec,match,with,Type,and,on},
24   morekeywords={[2]whd,normalize,elim,cases,destruct},
25   mathescape=true,
26   morecomment=[n]{(*}{*)},
27  }
28
29\lstset{language=matita,basicstyle=\small\tt,columns=flexible,breaklines=false,
30        keywordstyle=\color{red}\bfseries,
31        keywordstyle=[2]\color{blue},
32        commentstyle=\color{green},
33        stringstyle=\color{blue},
34        showspaces=false,showstringspaces=false}
35
36\lstset{extendedchars=false}
37\lstset{inputencoding=utf8x}
38\DeclareUnicodeCharacter{8797}{:=}
39\DeclareUnicodeCharacter{10746}{++}
40\DeclareUnicodeCharacter{9001}{\ensuremath{\langle}}
41\DeclareUnicodeCharacter{9002}{\ensuremath{\rangle}}
42
43
44\title{
45INFORMATION AND COMMUNICATION TECHNOLOGIES\\
46(ICT)\\
47PROGRAMME\\
48\vspace*{1cm}Project FP7-ICT-2009-C-243881 \cerco{}}
49
50\date{ }
51\author{}
52
53\begin{document}
54\thispagestyle{empty}
55
56\vspace*{-1cm}
57\begin{center}
58\includegraphics[width=0.6\textwidth]{../../style/cerco_logo.png}
59\end{center}
60
61\begin{minipage}{\textwidth}
62\maketitle
63\end{minipage}
64
65
66\vspace*{0.5cm}
67\begin{center}
68\begin{LARGE}
69\bf
70Report n. D3.4\\
71Front-end Correctness Proofs\\
72\end{LARGE} 
73\end{center}
74
75\vspace*{2cm}
76\begin{center}
77\begin{large}
78Version 1.0
79\end{large}
80\end{center}
81
82\vspace*{0.5cm}
83\begin{center}
84\begin{large}
85Authors:\\
86Brian~Campbell, Ilias~Garnier, James~McKinna, Ian~Stark
87\end{large}
88\end{center}
89
90\vspace*{\fill}
91\noindent
92Project Acronym: \cerco{}\\
93Project full title: Certified Complexity\\
94Proposal/Contract no.: FP7-ICT-2009-C-243881 \cerco{}\\
95
96\clearpage \pagestyle{myheadings} \markright{\cerco{}, FP7-ICT-2009-C-243881}
97
98\newpage
99
100\vspace*{7cm}
101\paragraph{Abstract}
102We report on the correctness proofs for the front-end of the \cerco{}
103cost lifting compiler.  First, we identify the core result we wish to
104prove, which says that the we correctly predict the precise execution
105time for particular parts of the execution called \emph{measurable}
106subtraces.  Then we consider the three distinct parts of the task:
107showing that the \emph{annotated source code} output by the compiler
108has equivalent behaviour to the original input (up to the
109annotations); showing that a measurable subtrace of the
110annotated source code corresponds to an equivalent measurable subtrace
111in the code produced by the front-end, including costs; and finally
112showing that the enriched \emph{structured} execution traces required
113for cost correctness in the back-end can be constructed from the
114properties of the code produced by the front-end.
115
116A key part of our work is that the intensional correctness results that show
117that we get consistent cost measurements throughout the intermediate languages
118of the compiler can be layered on top of normal forward simulation results,
119if we split those results into local call-structure preserving simulations.
120
121This deliverable shows correctness results about the formalised
122compiler described in D3.2, using the source language semantics from
123D3.1 and intermediate language semantics from D3.3.  It builds on
124earlier work on a toy compiler built to test the labelling approach in
125D2.1. Together with the companion deliverable about the correctness of
126the back-end, D4.4, we obtain results about the whole formalised
127compiler.
128
129% TODO: mention the deliverable about the extracted compiler et al?
130
131\newpage 
132
133\tableofcontents
134
135% CHECK: clear up any -ize vs -ise
136% CHECK: clear up any "front end" vs "front-end"
137% CHECK: clear up any mentions of languages that aren't textsf'd.
138% CHECK: fix unicode in listings
139
140\section{Introduction}
141
142\todo{add stack space for more precise statement.  Also do some
143translation validation on sound, precise labelling properties.}
144
145The \cerco{} compiler compiles C code, targeting microcontrollers
146implementing the Intel 8051 architecture.  It produces both the object
147code and source code containing annotations describing the timing
148behaviour of the object code.  There are two versions: first, an
149initial prototype implemented in \ocaml{}~\cite{d2.2}, and a version
150formalised in the \matita{} proof assistant~\cite{d3.2,d4.2} and then
151extracted to \ocaml{} code to produce an executable compiler.  In this
152document we present results formalised in \matita{} about the
153front-end of the latter version of the compiler, and how that fits
154into the verification of the whole compiler.
155
156\todo{maybe mention stack space here?  other additions?  refer to
157  "layering"?}  A key part of this work was to separate the proofs
158about the compiled code's extensional behaviour (that is, the
159functional correctness of the compiler) from the intensional
160correctness that the costs given are correct.  Unfortunately, the
161ambitious goal of completely verifying the entire compiler was not
162feasible within the time available, but thanks to this separation of
163extensional and intensional proofs we are able to axiomatize
164simulation results similar to those in other compiler verification
165projects and concentrate on the novel intensional proofs.  The proofs
166were also made more tractable by introducing compile-time checks for
167the `sound and precise' cost labelling properties rather than proving
168that they are preserved throughout.
169
170The overall statement of correctness says that the annotated program has the
171same behaviour as the input, and that for any suitably well-structured part of
172the execution (which we call \emph{measurable}), the object code will execute
173the same behaviour taking precisely the time given by the cost annotations in
174the annotated source program.
175
176In the next section we recall the structure of the compiler and make the overall
177statement more precise.  Following that, in Section~\ref{sec:fegoals} we
178describe the statements we need to prove about the intermediate \textsf{RTLabs}
179programs sufficient for the back-end proofs.  \todo{rest of document structure}
180
181\section{The compiler and main goals}
182
183TODO: outline compiler, maybe figure from talk, maybe something like the figure
184below.
185
186TODO: might want a version of this figure
187\begin{figure}
188\begin{center}
189\begin{minipage}{.8\linewidth}
190\begin{tabbing}
191\quad \= $\downarrow$ \quad \= \kill
192\textsf{C} (unformalized)\\
193\> $\downarrow$ \> CIL parser (unformalized \ocaml)\\
194\textsf{Clight}\\
195%\> $\downarrow$ \> add runtime functions\\
196\> $\downarrow$ \> \lstinline[language=C]'switch' removal\\
197\> $\downarrow$ \> labelling\\
198\> $\downarrow$ \> cast removal\\
199\> $\downarrow$ \> stack variable allocation and control structure
200 simplification\\
201\textsf{Cminor}\\
202%\> $\downarrow$ \> generate global variable initialization code\\
203\> $\downarrow$ \> transform to RTL graph\\
204\textsf{RTLabs}\\
205\> $\downarrow$ \> check cost labelled properties of RTL graph\\
206\> $\downarrow$ \> start of target specific back-end\\
207\>\quad \vdots
208\end{tabbing}
209\end{minipage}
210\end{center}
211\caption{Front-end languages and compiler passes}
212\label{fig:summary}
213\end{figure}
214
215The compiler function returns the following record on success:
216\begin{lstlisting}[language=matita]
217record compiler_output : Type[0] :=
218 { c_labelled_object_code: labelled_object_code
219 ; c_stack_cost: stack_cost_model
220 ; c_max_stack: nat
221 ; c_init_costlabel: costlabel
222 ; c_labelled_clight: clight_program
223 ; c_clight_cost_map: clight_cost_map
224 }.
225\end{lstlisting}
226It consists of annotated 8051 object code, a mapping from function
227identifiers to the function's stack space usage\footnote{The compiled
228  code's only stack usage is to allocate a fixed-size frame on each
229  function entry and discard it on exit.  No other dynamic allocation
230  is provided by the compiler.}, the space available for the
231stack after global variable allocation, a cost label covering the
232execution time for the initialisation of global variables and the call
233to the \lstinline[language=C]'main' function, the annotated source
234code, and finally a mapping from cost labels to actual execution time
235costs.
236
237An \ocaml{} pretty printer is used to provide a concrete version of the output
238code.  In the case of the annotated source code, it also inserts the actual
239costs alongside the cost labels, and optionally adds a global cost variable
240and instrumentation to support further reasoning.  \todo{Cross-ref case study
241deliverables}
242
243\subsection{Revisions to the prototype compiler}
244
245TODO: could be a good idea to have this again; stack space,
246initialisation, cost checks, had we dropped cminor loops in previous
247writing?, check mailing list in case I've forgotten something
248
249TODO: continued use of dep types to reduce partiality
250
251\subsection{Main goals}
252
253TODO: need an example for this
254
255Informally, our main intensional result links the time difference in a source
256code execution to the time difference in the object code, expressing the time
257for the source by summing the values for the cost labels in the trace, and the
258time for the target by a clock built in to the 8051 executable semantics.
259
260The availability of precise timing information for 8501
261implementations and the design of the compiler allow it to give exact
262time costs in terms of processor cycles, not just upper bounds.
263However, these exact results are only available if the subtrace we
264measure starts and ends at suitable points.  In particular, pure
265computation with no observable effects may be reordered and moved past
266cost labels, so we cannot measure time between arbitrary statements in
267the program.
268
269There is also a constraint on the subtraces that we
270measure due to the requirements of the correctness proof for the
271object code timing analysis.  To be sure that the timings are assigned
272to the correct cost label, we need to know that each return from a
273function call must go to the correct return address.  It is difficult
274to observe this property locally in the object code because it relies
275on much earlier stages in the compiler.  To convey this information to
276the timing analysis extra structure is imposed on the subtraces, which
277is described in Section~\ref{sec:fegoals}.
278
279% Regarding the footnote, would there even be much point?
280% TODO: this might be quite easy to add ('just' subtract the
281% measurable subtrace from the second label to the end).  Could also
282% measure other traces in this manner.
283These restrictions are reflected in the subtraces that we give timing
284guarantees on; they must start at a cost label and end at the return
285of the enclosing function of the cost label\footnote{We expect that
286  this would generalise to subtraces between cost labels in the same
287  function, but could not justify the extra complexity that would be
288  required to show this.}.  A typical example of such a subtrace is
289the execution of an entire function from the cost label at the start
290of the function until it returns.  We call such any such subtrace
291\emph{measurable} if it (and the prefix of the trace from the start to
292the subtrace) can also be executed within the available stack space.
293
294Now we can give the main intensional statement for the compiler.
295Given a \emph{measurable} subtrace for a labelled \textsf{Clight}
296program, there is a subtrace of the 8051 object code program where the
297time differences match.  Moreover, \emph{observable} parts of the
298trace also match --- these are the appearance of cost labels and
299function calls and returns.
300
301More formally, the definition of this statement in \matita{} is
302\begin{lstlisting}[language=matita]
303definition simulates :=
304  $\lambda$p: compiler_output.
305  let initial_status := initialise_status $\dots$ (cm (c_labelled_object_code $\dots$ p)) in
306  $\forall$m1,m2.
307   measurable Clight_pcs (c_labelled_clight $\dots$ p) m1 m2
308    (lookup_stack_cost (c_stack_cost $\dots$ p)) (c_max_stack $\dots$ p) $\rightarrow$
309  $\forall$c1,c2.
310   clock_after Clight_pcs (c_labelled_clight $\dots$ p) m1 (c_clight_cost_map $\dots$ p) = OK $\dots$ c1 $\rightarrow$
311   clock_after Clight_pcs (c_labelled_clight $\dots$ p) (m1+m2) (c_clight_cost_map $\dots$ p) = OK $\dots$ c2 $\rightarrow$
312  $\exists$n1,n2.
313   observables Clight_pcs (c_labelled_clight $\dots$ p) m1 m2 =
314   observables (OC_preclassified_system (c_labelled_object_code $\dots$ p))
315    (c_labelled_object_code $\dots$ p) n1 n2
316  $\wedge$
317   c2 - c1 = clock $\dots$ (execute n2 ? initial_status) - clock $\dots$ (execute n1 ? initial_status).
318\end{lstlisting}
319where the \lstinline'measurable', \lstinline'clock_after' and
320\lstinline'observables' definitions can be applied to multiple
321languages; in this case the \lstinline'Clight_pcs' record applies them
322to \textsf{Clight} programs.
323
324There is a second part to the statement, which says that the initial
325processing of the input program to produce the cost labelled version
326does not affect the semantics of the program:
327% Yes, I'm paraphrasing the result a tiny bit to remove the observe non-function
328\begin{lstlisting}[language=matita]
329  $\forall$input_program,output.
330  compile input_program = return output $\rightarrow$
331  not_wrong … (exec_inf … clight_fullexec input_program) $\rightarrow$
332  sim_with_labels
333   (exec_inf … clight_fullexec input_program)
334   (exec_inf … clight_fullexec (c_labelled_clight … output))
335\end{lstlisting}
336That is, any successful compilation produces a labelled program that
337has identical behaviour to the original, so long as there is no
338`undefined behaviour'.
339
340Note that this provides full functional correctness, including
341preservation of (non-)termination.  The intensional result above does
342not do this directly --- it does not guarantee the same result or same
343termination.  There are two mitigating factors, however: first, to
344prove the intensional property you need local simulation results that
345can be pieced together to form full behavioural equivalence, only time
346constraints have prevented us from doing so.  Second, if we wish to
347confirm a result, termination, or non-termination we could add an
348observable witness, such as a function that is only called if the
349correct result is given.  The intensional result guarantees that the
350observable witness is preserved, so the program must behave correctly.
351
352\section{Intermediate goals for the front-end}
353\label{sec:fegoals}
354
355The essential parts of the intensional proof were outlined during work
356on a toy
357compiler~\cite{d2.1,springerlink:10.1007/978-3-642-32469-7_3}.  These
358are
359\begin{enumerate}
360\item functional correctness, in particular preserving the trace of
361  cost labels,
362\item the \emph{soundness} and \emph{precision} of the cost labelling
363  on the object code, and
364\item the timing analysis on the object code produces a correct
365  mapping from cost labels to time.
366\end{enumerate}
367
368However, that toy development did not include function calls.  For the
369full \cerco{} compiler we also need to maintain the invariant that
370functions return to the correct program location in the caller, as we
371mentioned in the previous section.  During work on the back-end timing
372analysis (describe in more detail in the companion deliverable, D4.4)
373the notion of a \emph{structured trace} was developed to enforce this
374return property, and also most of the cost labelling properties too.
375
376\begin{figure}
377\begin{center}
378\includegraphics[width=0.5\linewidth]{compiler.pdf}
379\end{center}
380\caption{The compiler and proof outline}
381\label{fig:compiler}
382\end{figure}
383
384Jointly, we generalised the structured traces to apply to any of the
385intermediate languages with some idea of program counter.  This means
386that they are introduced part way through the compiler, see
387Figure~\ref{fig:compiler}.  Proving that a structured trace can be
388constructed at \textsf{RTLabs} has several virtues:
389\begin{itemize}
390\item This is the first language where every operation has its own
391  unique, easily addressable, statement.
392\item Function calls and returns are still handled implicitly in the
393  language and so the structural properties are ensured by the
394  semantics.
395\item Many of the back-end languages from \textsf{RTL} share a common
396  core set of definitions, and using structured traces throughout
397  increases this uniformity.
398\end{itemize}
399
400\begin{figure}
401\begin{center}
402\includegraphics[width=0.6\linewidth]{strtraces.pdf}
403\end{center}
404\caption{Nesting of functions in structured traces}
405\label{fig:strtrace}
406\end{figure}
407A structured trace is a mutually inductive data type which
408contains the steps from a normal program trace, but arranged into a
409nested structure which groups entire function calls together and
410aggregates individual steps between cost labels (or between the final
411cost label and the return from the function), see
412Figure~\ref{fig:strtrace}.  This captures the idea that the cost labels
413only represent costs \emph{within} a function --- calls to other
414functions are accounted for in the nested trace for their execution, and we
415can locally regard function calls as a single step.
416
417These structured traces form the core part of the intermediate results
418that we must prove so that the back-end can complete the main
419intensional result stated above.  In full, we provide the back-end
420with
421\begin{enumerate}
422\item A normal trace of the \textbf{prefix} of the program's execution
423  before reaching the measurable subtrace.  (This needs to be
424  preserved so that we know that the stack space consumed is correct,
425  and to set up the simulation results.)
426\item The \textbf{structured trace} corresponding to the measurable
427  subtrace.
428\item An additional property about the structured trace that no
429  `program counter' is \textbf{repeated} between cost labels.  Together with
430  the structure in the trace, this takes over from showing that
431  cost labelling is sound and precise.
432\item A proof that the \textbf{observables} have been preserved.
433\item A proof that the \textbf{stack limit} is still observed by the prefix and
434  the structure trace.  (This is largely a consequence of the
435  preservation of observables.)
436\end{enumerate}
437
438Following the outline in Figure~\ref{fig:compiler}, we will first deal
439with the transformations in \textsf{Clight} that produce the source
440program with cost labels, then show that measurable traces can be
441lifted to \textsf{RTLabs}, and finally that we can construct the
442properties listed above ready for the back-end proofs.
443
444\section{Input code to cost labelled program}
445
446The simple form of labelling used in the formalised compiler is not
447quite capable of capturing costs arising from complex C
448\lstinline[language=C]'switch' statements, largely due to the
449fall-through behaviour.  Our first pass replaces these statements with
450simpler C code, allowing our second pass to perform the cost
451labelling.  We show that the behaviour of programs is unchanged by
452these passes.
453
454TODO: both give one-step-sim-by-many forward sim results; switch
455removal tricky, uses aux var to keep result of expr, not central to
456intensional correctness so curtailed proof effort once reasonable
457level of confidence in code gained; labelling much simpler; don't care
458what the labels are at this stage, just need to know when to go
459through extra steps.  Rolled up into a single result with a cofixpoint
460to obtain coinductive statement of equivalence (show).
461
462\section{Finding corresponding measurable subtraces}
463
464There follow the three main passes of the front-end:
465\begin{enumerate}
466\item simplification of casts in \textsf{Clight} code
467\item \textsf{Clight} to \textsf{Cminor} translation, performing stack
468  variable allocation and simplifying control structures
469\item transformation to \textsf{RTLabs} control flow graph
470\end{enumerate}
471\todo{I keep mentioning forward sim results - I probably ought to say
472  something about determinancy} We have taken a common approach to
473each pass: first we build (or axiomatise) forward simulation results
474that are similar to normal compiler proofs, but slightly more
475fine-grained so that we can see that the call structure and relative
476placement of cost labels is preserved.
477
478Then we instantiate a general result which shows that we can find a
479\emph{measurable} subtrace in the target of the pass that corresponds
480to the measurable subtrace in the source.  By repeated application of
481this result we can find a measurable subtrace of the execution of the
482\textsf{RTLabs} code, suitable for the construction of a structured
483trace (see Section~\ref{sec:structuredtrace}.  This is essentially an
484extra layer on top of the simulation proofs that provides us with the
485extra information required for our intensional correctness proof.
486
487\subsection{Generic measurable subtrace lifting proof}
488
489Our generic proof is parametrised on a record containing small-step
490semantics for the source and target language, a classification of
491states (the same form of classification is used when defining
492structured traces), a simulation relation which loosely respects the
493classification and cost labelling \todo{this isn't very helpful} and
494four simulation results:
495\begin{enumerate}
496\item a step from a `normal' state (which is not classified as a call
497  or return) which is not a cost label is simulated by zero or more
498  `normal' steps;
499\item a step from a `call' state followed by a cost label step is
500  simulated by a step from a `call' state, a corresponding label step,
501  then zero or more `normal' steps;
502\item a step from a `call' state not followed by a cost label
503  similarly (note that this case cannot occur in a well-labelled
504  program, but we do not have enough information locally to exploit
505  this); and
506\item a cost label step is simulated by a cost label step.
507\end{enumerate}
508Finally, we need to know that a successfully translated program will
509have an initial state in the simulation relation with the original
510program's initial state.
511
512\begin{figure}
513\begin{center}
514\includegraphics[width=0.5\linewidth]{meassim.pdf}
515\end{center}
516\caption{Tiling of simulation for a measurable subtrace}
517\label{fig:tiling}
518\end{figure}
519
520To find the measurable subtrace in the target program's execution we
521walk along the original program's execution trace applying the
522appropriate simulation result by induction on the number of steps.
523While the number of steps taken varies, the overall structure is
524preserved, as illustrated in Figure~\ref{fig:tiling}.  By preserving
525the structure we also maintain the same intensional observables.  One
526delicate point is that the cost label following a call must remain
527directly afterwards\footnote{The prototype compiler allowed some
528  straight-line code to appear before the cost label until a later
529  stage of the compiler, but we must move the requirement forward to
530  fit with the structured traces.}
531% Damn it, I should have just moved the cost label forwards in RTLabs,
532% like the prototype does in RTL to ERTL; the result would have been
533% simpler.  Or was there some reason not to do that?
534(both in the program code and in the execution trace), even if we
535introduce extra steps, for example to store parameters in memory in
536\textsf{Cminor}.  Thus we have a version of the call simulation
537that deals with both in one result.
538
539In addition to the subtrace we are interested in measuring we must
540also prove that the earlier part of the trace is also preserved in
541order to use the simulation from the initial state.  It also
542guarantees that we do not run out of stack space before the subtrace
543we are interested in.  The lemmas for this prefix and the measurable
544subtrace are similar, following the pattern above.  However, the
545measurable subtrace also requires us to rebuild the termination
546proof.  This is defined recursively:
547\label{prog:terminationproof}
548\begin{lstlisting}[language=matita]
549let rec will_return_aux C (depth:nat)
550                             (trace:list (cs_state … C × trace)) on trace : bool :=
551match trace with
552[ nil $\Rightarrow$ false
553| cons h tl $\Rightarrow$
554  let $\langle$s,tr$\rangle$ := h in
555  match cs_classify C s with
556  [ cl_call $\Rightarrow$ will_return_aux C (S depth) tl
557  | cl_return $\Rightarrow$
558      match depth with
559      [ O $\Rightarrow$ match tl with [ nil $\Rightarrow$ true | _ $\Rightarrow$ false ]
560      | S d $\Rightarrow$ will_return_aux C d tl
561      ]
562  | _ $\Rightarrow$ will_return_aux C depth tl
563  ]
564].
565\end{lstlisting}
566The \lstinline'depth' is the number of return states we need to see
567before we have returned to the original function (initially zero) and
568\lstinline'trace' the measurable subtrace obtained from the running
569the semantics for the correct number of steps.  This definition
570unfolds tail recursively for each step, and once the corresponding
571simulation result has been applied a new one for the target can be
572asserted by unfolding and applying the induction hypothesis on the
573shorter trace.
574
575This then gives us an overall result for any simulation fitting the
576requirements above (contained in the \lstinline'meas_sim' record):
577\begin{lstlisting}[language=matita]
578theorem measured_subtrace_preserved :
579  $\forall$MS:meas_sim.
580  $\forall$p1,p2,m,n,stack_cost,max.
581  ms_compiled MS p1 p2 $\rightarrow$
582  measurable (ms_C1 MS) p1 m n stack_cost max $\rightarrow$
583  $\exists$m',n'.
584    measurable (ms_C2 MS) p2 m' n' stack_cost max $\wedge$
585    observables (ms_C1 MS) p1 m n = observables (ms_C2 MS) p2 m' n'.
586\end{lstlisting}
587The stack space requirement that is embedded in \lstinline'measurable'
588is a consequence of the preservation of observables.
589
590TODO: how to deal with untidy edges wrt to sim rel; anything to
591say about obs?
592
593TODO: say something about termination measures; cost labels are
594statements/exprs in these languages; split call/return gives simpler
595simulations
596
597\subsection{Simulation results for each pass}
598
599\todo{don't use loop structures from CompCert, go straight to jumps}
600
601\paragraph{Cast simplification}
602
603The parser used in Cerco introduces a lot of explicit type casts. If left
604as they are, these casts can greatly hamper the quality of the generated
605code -- especially as the architecture we consider is an $8$ bit one.
606In \textsf{Clight}, casts are expressions. Hence, most of the work of
607this transformation proceeds on expressions, and the one-step simulation
608result one those is easily lifted to statement evaluations.
609
610\todo{introduce the explicit type signature of the functions below ?}
611
612The tranformation proceeds by recursively trying to coerce an expression to
613a particular type, which is in practice a integer type smaller than the
614original one. This functionality is implemented by two mutually recursive
615functions: \textbf{simplify\_expr} and \textbf{simplify\_inside}. The latter
616acts as a wrapper for the first one. Whenever \textbf{simplify\_inside} 
617encounters a \textbf{Ecast} expression, it tries to coerce the sub-expression
618to the desired type using \textbf{simplify\_expr}, which tries to perform the
619actual coercion. In return, \textbf{simplify\_expr} calls back
620\textbf{simplify\_inside} in some particular positions, where we decided to
621be conservative in order to simplify the proofs. However, the current design
622allows to incrementally revert to a more aggressive version, by replacing
623recursive calls to \textbf{simplify\_inside} by calls to \textbf{simplify\_expr} 
624\emph{and} proving the corresponding invariants -- where possible.
625
626The main problem encountered with this particular pass was dealing with inconsistently
627typed programs, a canonical case being a particular integer constant of a certain
628size typed with another size. This prompted the need to introduce numerous
629type checks, complexifying both the implementation and the proof.
630
631\paragraph{Switch removal}
632
633Our intermediate languages have no jump tables. Hence, we have to compile the
634\lstinline[language=C]'switch' constructs away before going from \textsf{Clight} to
635\textsf{Cminor}. Note that this is not necessarily a bad decision performance-wise
636for small switches. For instance, compilers such as GCC introduce balanced trees
637of ``if-then-else'' constructs. However, our implementation strategy is much simpler.
638Let us consider the following input statement.
639
640\begin{lstlisting}[language=C]
641   switch(e) {
642   case v1:
643     stmt1;
644   case v2:
645     stmt2;
646   default:
647     stmt_default;
648   }
649\end{lstlisting}
650
651Note that stmt1, stmt2, \ldots stmt\_default may contain "break" statements,
652which have the effect of exiting the switch statement. In the absence of break,
653the execution falls through each case sequentially. In our current implementation,
654we produce an equivalent sequence of ``if-then'' chained by gotos:
655\begin{lstlisting}[language=C]
656   fresh = e;
657   if(fresh == v1) {
658     $\llbracket$stmt1$\rrbracket$;
659     goto lbl_case2;
660   };
661   if(fresh == v2) {
662     lbl_case2:
663     $\llbracket$stmt2;$\rrbracket$
664     goto lbl_case2;
665   };
666   $\llbracket$stmt_default$\rrbracket$;
667   exit_label:
668\end{lstlisting}
669
670The proof had to tackle the following points:
671\begin{itemize}
672\item the source and target memories are not the same (cf. fresh variable),
673\item the flow of control is changed in a non-local way (e.g. \textbf{goto} 
674  instead of \textbf{break}).
675\end{itemize}
676
677In order to tackle the first point, we implemented a version of memory
678extensions similar to Compcert's. What has been done is the simulation proof
679for all ``easy'' cases, that do not interact with the switch removal per se,
680plus a bit of the switch case. This comprises propagating the memory extension
681through  each statement (except switch), as well as various invariants that
682are needed for the switch case (in particular, freshness hypotheses). The
683details of the evaluation process for the source switch statement and its
684target counterpart can be found in the file \textbf{switchRemoval.ma},
685along more details on the transformation itself.
686
687Proving the correctness of the second point would require reasoning on the
688semantics of \lstinline[language=C]'goto' statements. In the \textsf{Clight} 
689semantics, this is implemented as a function-wide lookup of the target label.
690The invariant we would need is the fact that a global label lookup on a freshly
691created goto is equivalent to a local lookup. This would in turn require the
692propagation of some freshness hypotheses on labels. For reasons expressed above,
693we decided to omit this part of the correctness proof.
694
695\paragraph{Clight to Cminor}
696
697This pass is the last one operating on the Clight intermediate language.
698Its input is a full \textsf{Clight} program, and its output is a \textsf{Cminor} 
699program. Note that we do not use an equivalent of the C\#minor language: we
700translate directly to Cminor. This presents the advantage of not requiring the
701special loop constructs, nor the explicit block structure. Another salient
702point of our approach is that a significant part of the properties needed for
703the simulation proof were directly encoded in dependently typed translation
704functions. This concerns more precisely freshness conditions and well-typedness
705conditions. The main effects of the transformation from \textsf{Clight} to
706\textsf{Cminor} are listed below.
707
708\begin{itemize}
709\item Variables are classified as being either globals, stack-allocated
710  locals or potentially register-allocated locals. The value of register-allocated
711  local variables is moved out of the modelled memory and stored in a
712  dedicated environment.
713\item In Clight, each local variable has a dedicated memory block, whereas
714  stack-allocated locals are bundled together on a function-by-function basis.
715\item Loops are converted to jumps.
716\end{itemize}
717
718The first two points require memory injections which are more flexible that those
719needed in the switch removal case. In the remainder of this section, we briefly
720discuss our implementation of memory injections, and then the simulation proof.
721
722\subparagraph{Memory injections}
723
724Our memory injections are modelled after the work of Blazy \& Leroy.
725However, the corresponding paper is based on the first version of the
726Compcert memory model, whereas we use a much more concrete model, allowing byte-level
727manipulations (as in the later version of Compcert's memory model). We proved
728roughly 80 \% of the required lemmas. Some difficulties encountered were notably
729due to some too relaxed conditions on pointer validity (problem fixed during development).
730Some more conditions had to be added to take care of possible overflows when converting
731from \textbf{Z} block bounds to $16$ bit pointer offsets (in pratice, these overflows
732only occur in edge cases that are easily ruled out -- but this fact is not visible
733in memory injections). Concretely, some of the lemmas on the preservation of simulation of
734loads after writes were axiomatised, for lack of time.
735
736\subparagraph{Simulation proof}
737
738The correction proof for this transformation was not terminated. We proved the
739simulation result for expressions and for some subset of the critical statement cases.
740Notably lacking are the function entry and exit, where the memory injection is
741properly set up. As can be guessed, a significant amount of work has to be performed
742to show the conservation of all invariants at each simulation step.
743
744\todo{list cases, explain while loop, explain labeling problem}
745
746\section{Checking cost labelling properties}
747
748Ideally, we would provide proofs that the cost labelling pass always
749produced programs that are soundly and precisely labelled and that
750each subsequent pass preserves these properties.  This would match our
751use of dependent types to eliminate impossible sources of errors
752during compilation, in particular retaining intermediate language type
753information.
754
755However, given the limited amount of time available we realised that
756implementing a compile-time check for a sound and precise labelling of
757the \textsf{RTLabs} intermediate code would reduce the proof burden
758considerably.  This is similar in spirit to the use of translation
759validation in certified compilation\todo{Cite some suitable work
760  here}, which makes a similar trade-off between the potential for
761compile-time failure and the volume of proof required.
762
763The check cannot be pushed into a later stage of the compiler because
764much of the information is embedded into the structured traces.
765However, if an alternative method was used to show that function
766returns in the compiled code are sufficiently well-behaved, then we
767could consider pushing the cost property checks into the timing
768analysis itself.  We leave this as a possible area for future work.
769
770\subsection{Implementation and correctness}
771
772For a cost labelling to be sound and precise we need a cost label at
773the start of each function, after each branch and at least one in
774every loop.  The first two parts are trivial to check by examining the
775code.  In \textsf{RTLabs} the last part is specified by saying
776that there is a bound on the number of successive instruction nodes in
777the CFG that you can follow before you encounter a cost label, and
778checking this is more difficult.
779
780The implementation works through the set of nodes in the graph,
781following successors until a cost label is found or a label-free cycle
782is discovered (in which case the property does not hold and we stop).
783This is made easier by the prior knowledge that any branch is followed
784by cost labels, so we do not need to search each branch.  When a label
785is found, we remove the chain from the set and continue from another
786node in the set until it is empty, at which point we know that there
787is a bound for every node in the graph.
788
789Directly reasoning about the function that implements this would be
790rather awkward, so an inductive specification of a single step of its
791behaviour was written and proved to match the implementation.  This
792was then used to prove the implementation sound and complete.
793
794While we have not attempted to proof that the cost labelled properties
795are established and preserved earlier in the compiler, we expect that
796the effort for the \textsf{Cminor} to \textsf{RTLabs} would be similar
797to the work outlined above, because it involves the change from
798requiring a cost label at particular positions to requiring cost
799labels to break loops in the CFG.  As there are another three passes
800to consider (including the labelling itself), we believe that using
801the check above is much simpler overall.
802
803% TODO? Found some Clight to Cminor bugs quite quickly
804
805\section{Existence of a structured trace}
806\label{sec:structuredtrace}
807
808\emph{Structured traces} enrich the execution trace of a program by
809nesting function calls in a mixed-step style\todo{Can I find a
810  justification for mixed-step} and embedding the cost properties of
811the program.  It was originally designed to support the proof of
812correctness for the timing analysis of the object code in the
813back-end, then generalised to provide a common structure to use from
814the end of the front-end to the object code.  See
815Figure~\ref{fig:strtrace} on page~\pageref{fig:strtrace}
816for an illustration of a structured trace.
817
818To make the definition generic we abstract over the semantics of the
819language,
820\begin{lstlisting}[language=matita]
821record abstract_status : Type[1] :=
822  { as_status :> Type[0]
823  ; as_execute : relation as_status
824  ; as_pc : DeqSet
825  ; as_pc_of : as_status $\rightarrow$ as_pc
826  ; as_classify : as_status $\rightarrow$ status_class
827  ; as_label_of_pc : as_pc $\rightarrow$ option costlabel
828  ; as_after_return : ($\Sigma$s:as_status. as_classify s =  cl_call) $\rightarrow$ as_status $\rightarrow$ Prop
829  ; as_result: as_status $\rightarrow$ option int
830  ; as_call_ident : ($\Sigma$s:as_status.as_classify s = cl_call) $\rightarrow$ ident
831  ; as_tailcall_ident : ($\Sigma$s:as_status.as_classify s = cl_tailcall) $\rightarrow$ ident
832  }.
833\end{lstlisting}
834which gives a type of states, an execution relation, some notion of
835program counters with decidable equality, the classification of states
836and functions to extract the observable intensional information (cost
837labels and the identity of functions that are called).  The
838\lstinline'as_after_return' property links the state before a function
839call with the state after return, providing the evidence that
840execution returns to the correct place.  The precise form varies
841between stages; in \textsf{RTLabs} it insists the CFG, the pointer to
842the CFG node to execute next, and some call stack information is
843preserved.
844
845The structured traces are defined using three mutually inductive
846types.  The core data structure is \lstinline'trace_any_label', which
847captures some straight-line execution until the next cost label or
848function return.  Each function call is embedded as a single step,
849with its own trace nested inside and the before and after states
850linked by \lstinline'as_after_return', and states classified as a
851`jump' (in particular branches) must be followed by a cost label.
852
853The second type, \lstinline'trace_label_label', is a
854\lstinline'trace_any_label' where the initial state is cost labelled.
855Thus a trace in this type identifies a series of steps whose cost is
856entirely accounted for by the label at the start.
857
858Finally, \lstinline'trace_label_return' is a sequence of
859\lstinline'trace_label_label' values which end in the return from the
860function.  These correspond to a measurable subtrace, and in
861particular include executions of an entire function call (and so are
862used for the nested calls in \lstinline'trace_any_label').
863
864\subsection{Construction}
865
866The core part of the construction of a fixed trace is to use the proof
867of termination from the measurable trace (defined on
868page~\pageref{prog:terminationproof}) to `fold up' the execution into
869the nested form.
870
871TODO: design, basic structure from termination proof, how cost
872labelling props are baked in; unrepeating PCs, remainder of sound
873labellings; coinductive version for whole programs, reason/relevance,
874use of em (maybe a general comment about uses of classical reasoning
875in development)
876
877\section{Conclusion}
878
879TODO
880
881TODO: appendix on code layout?
882
883\bibliographystyle{plain}
884\bibliography{report}
885
886\end{document}
Note: See TracBrowser for help on using the repository browser.