source: Deliverables/D3.4/Report/report.tex @ 3211

Last change on this file since 3211 was 3211, checked in by campbell, 4 years ago

Put switch removal in correct place; describe cost labelling sim.

File size: 46.4 KB
Line 
1\documentclass[11pt,epsf,a4wide]{article}
2\usepackage[mathletters]{ucs}
3\usepackage[utf8x]{inputenc}
4\usepackage{stmaryrd}
5\usepackage{listings}
6\usepackage{../../style/cerco}
7\newcommand{\ocaml}{OCaml}
8\newcommand{\clight}{Clight}
9\newcommand{\matita}{Matita}
10\newcommand{\sdcc}{\texttt{sdcc}}
11
12\newcommand{\textSigma}{\ensuremath{\Sigma}}
13
14% LaTeX Companion, p 74
15\newcommand{\todo}[1]{\marginpar{\raggedright - #1}}
16
17\lstdefinelanguage{coq}
18  {keywords={Definition,Lemma,Theorem,Remark,Qed,Save,Inductive,Record},
19   morekeywords={[2]if,then,else},
20  }
21
22\lstdefinelanguage{matita}
23  {keywords={definition,lemma,theorem,remark,inductive,record,qed,let,rec,match,with,Type,and,on},
24   morekeywords={[2]whd,normalize,elim,cases,destruct},
25   mathescape=true,
26   morecomment=[n]{(*}{*)},
27  }
28
29\lstset{language=matita,basicstyle=\small\tt,columns=flexible,breaklines=false,
30        keywordstyle=\color{red}\bfseries,
31        keywordstyle=[2]\color{blue},
32        commentstyle=\color{green},
33        stringstyle=\color{blue},
34        showspaces=false,showstringspaces=false}
35
36\lstset{extendedchars=false}
37\lstset{inputencoding=utf8x}
38\DeclareUnicodeCharacter{8797}{:=}
39\DeclareUnicodeCharacter{10746}{++}
40\DeclareUnicodeCharacter{9001}{\ensuremath{\langle}}
41\DeclareUnicodeCharacter{9002}{\ensuremath{\rangle}}
42
43
44\title{
45INFORMATION AND COMMUNICATION TECHNOLOGIES\\
46(ICT)\\
47PROGRAMME\\
48\vspace*{1cm}Project FP7-ICT-2009-C-243881 \cerco{}}
49
50\date{ }
51\author{}
52
53\begin{document}
54\thispagestyle{empty}
55
56\vspace*{-1cm}
57\begin{center}
58\includegraphics[width=0.6\textwidth]{../../style/cerco_logo.png}
59\end{center}
60
61\begin{minipage}{\textwidth}
62\maketitle
63\end{minipage}
64
65
66\vspace*{0.5cm}
67\begin{center}
68\begin{LARGE}
69\bf
70Report n. D3.4\\
71Front-end Correctness Proofs\\
72\end{LARGE} 
73\end{center}
74
75\vspace*{2cm}
76\begin{center}
77\begin{large}
78Version 1.0
79\end{large}
80\end{center}
81
82\vspace*{0.5cm}
83\begin{center}
84\begin{large}
85Authors:\\
86Brian~Campbell, Ilias~Garnier, James~McKinna, Ian~Stark
87\end{large}
88\end{center}
89
90\vspace*{\fill}
91\noindent
92Project Acronym: \cerco{}\\
93Project full title: Certified Complexity\\
94Proposal/Contract no.: FP7-ICT-2009-C-243881 \cerco{}\\
95
96\clearpage \pagestyle{myheadings} \markright{\cerco{}, FP7-ICT-2009-C-243881}
97
98\newpage
99
100\vspace*{7cm}
101\paragraph{Abstract}
102We report on the correctness proofs for the front-end of the \cerco{}
103cost lifting compiler.  First, we identify the core result we wish to
104prove, which says that the we correctly predict the precise execution
105time for particular parts of the execution called \emph{measurable}
106subtraces.  Then we consider the three distinct parts of the task:
107showing that the \emph{annotated source code} output by the compiler
108has equivalent behaviour to the original input (up to the
109annotations); showing that a measurable subtrace of the
110annotated source code corresponds to an equivalent measurable subtrace
111in the code produced by the front-end, including costs; and finally
112showing that the enriched \emph{structured} execution traces required
113for cost correctness in the back-end can be constructed from the
114properties of the code produced by the front-end.
115
116A key part of our work is that the intensional correctness results that show
117that we get consistent cost measurements throughout the intermediate languages
118of the compiler can be layered on top of normal forward simulation results,
119if we split those results into local call-structure preserving simulations.
120
121This deliverable shows correctness results about the formalised
122compiler described in D3.2, using the source language semantics from
123D3.1 and intermediate language semantics from D3.3.  It builds on
124earlier work on a toy compiler built to test the labelling approach in
125D2.1. Together with the companion deliverable about the correctness of
126the back-end, D4.4, we obtain results about the whole formalised
127compiler.
128
129% TODO: mention the deliverable about the extracted compiler et al?
130
131\newpage 
132
133\tableofcontents
134
135% CHECK: clear up any -ize vs -ise
136% CHECK: clear up any "front end" vs "front-end"
137% CHECK: clear up any mentions of languages that aren't textsf'd.
138% CHECK: fix unicode in listings
139
140\section{Introduction}
141
142\todo{add stack space for more precise statement.  Also do some
143translation validation on sound, precise labelling properties.}
144
145The \cerco{} compiler compiles C code, targeting microcontrollers
146implementing the Intel 8051 architecture.  It produces both the object
147code and source code containing annotations describing the timing
148behaviour of the object code.  There are two versions: first, an
149initial prototype implemented in \ocaml{}~\cite{d2.2}, and a version
150formalised in the \matita{} proof assistant~\cite{d3.2,d4.2} and then
151extracted to \ocaml{} code to produce an executable compiler.  In this
152document we present results formalised in \matita{} about the
153front-end of the latter version of the compiler, and how that fits
154into the verification of the whole compiler.
155
156\todo{maybe mention stack space here?  other additions?  refer to
157  "layering"?}  A key part of this work was to separate the proofs
158about the compiled code's extensional behaviour (that is, the
159functional correctness of the compiler) from the intensional
160correctness that the costs given are correct.  Unfortunately, the
161ambitious goal of completely verifying the entire compiler was not
162feasible within the time available, but thanks to this separation of
163extensional and intensional proofs we are able to axiomatize
164simulation results similar to those in other compiler verification
165projects and concentrate on the novel intensional proofs.  The proofs
166were also made more tractable by introducing compile-time checks for
167the `sound and precise' cost labelling properties rather than proving
168that they are preserved throughout.
169
170The overall statement of correctness says that the annotated program has the
171same behaviour as the input, and that for any suitably well-structured part of
172the execution (which we call \emph{measurable}), the object code will execute
173the same behaviour taking precisely the time given by the cost annotations in
174the annotated source program.
175
176In the next section we recall the structure of the compiler and make the overall
177statement more precise.  Following that, in Section~\ref{sec:fegoals} we
178describe the statements we need to prove about the intermediate \textsf{RTLabs}
179programs sufficient for the back-end proofs.  \todo{rest of document structure}
180
181\section{The compiler and main goals}
182
183The unformalised \ocaml{} \cerco{} compiler was originally described
184in Deliverables 2.1 and 2.2.  Its design was replicated in the formal
185\matita{} code, which was presented in Deliverables 3.2 and 4.2, for
186the front-end and back-end respectively.
187
188\begin{figure}
189\begin{center}
190\includegraphics{compiler-plain.pdf}
191\end{center}
192\caption{Languages in the \cerco{} compiler}
193\label{fig:compilerlangs}
194\end{figure}
195
196The compiler uses a number of intermediate languages, as outlined the
197middle two lines of Figure~\ref{fig:compilerlangs}.  The upper line
198represents the front-end of the compiler, and the lower the back-end,
199finishing with 8051 binary code.  Not all of the front-end passes
200introduces a new language, and Figure~\ref{fig:summary} presents a
201list of every pass involved.
202
203\begin{figure}
204\begin{center}
205\begin{minipage}{.8\linewidth}
206\begin{tabbing}
207\quad \= $\downarrow$ \quad \= \kill
208\textsf{C} (unformalized)\\
209\> $\downarrow$ \> CIL parser (unformalized \ocaml)\\
210\textsf{Clight}\\
211%\> $\downarrow$ \> add runtime functions\\
212\> $\downarrow$ \> \lstinline[language=C]'switch' removal\\
213\> $\downarrow$ \> labelling\\
214\> $\downarrow$ \> cast removal\\
215\> $\downarrow$ \> stack variable allocation and control structure
216 simplification\\
217\textsf{Cminor}\\
218%\> $\downarrow$ \> generate global variable initialization code\\
219\> $\downarrow$ \> transform to RTL graph\\
220\textsf{RTLabs}\\
221\> $\downarrow$ \> check cost labelled properties of RTL graph\\
222\> $\downarrow$ \> start of target specific back-end\\
223\>\quad \vdots
224\end{tabbing}
225\end{minipage}
226\end{center}
227\caption{Front-end languages and compiler passes}
228\label{fig:summary}
229\end{figure}
230
231\label{page:switchintro}
232The annotated source code is taken after the cost labelling phase.
233Note that there is a pass to replace C \lstinline[language=C]'switch'
234statements before labelling --- we need to remove them because the
235simple form of labelling used in the formalised compiler is not quite
236capable of capturing their execution time costs, largely due to the
237fall-through behaviour.
238
239The cast removal phase which follows cost labelling simplifies
240expressions to prevent unnecessary arithmetic promotion which is
241specified by the C standard but costly for an 8-bit target.  The
242transformation to \textsf{Cminor} and subsequently \textsf{RTLabs}
243bear considerable resemblance to some passes of the CompCert
244compiler\todo{cite}, although we use a simpler \textsf{Cminor} where
245all loops use \lstinline[language=C]'goto' statements, and the
246\textsf{RTLabs} language retains a target-independent flavour.  The
247back-end takes \textsf{RTLabs} code as input.
248
249The whole compilation function returns the following information on success:
250\begin{lstlisting}[language=matita]
251record compiler_output : Type[0] :=
252 { c_labelled_object_code: labelled_object_code
253 ; c_stack_cost: stack_cost_model
254 ; c_max_stack: nat
255 ; c_init_costlabel: costlabel
256 ; c_labelled_clight: clight_program
257 ; c_clight_cost_map: clight_cost_map
258 }.
259\end{lstlisting}
260It consists of annotated 8051 object code, a mapping from function
261identifiers to the function's stack space usage\footnote{The compiled
262  code's only stack usage is to allocate a fixed-size frame on each
263  function entry and discard it on exit.  No other dynamic allocation
264  is provided by the compiler.}, the space available for the
265stack after global variable allocation, a cost label covering the
266execution time for the initialisation of global variables and the call
267to the \lstinline[language=C]'main' function, the annotated source
268code, and finally a mapping from cost labels to actual execution time
269costs.
270
271An \ocaml{} pretty printer is used to provide a concrete version of the output
272code.  In the case of the annotated source code, it also inserts the actual
273costs alongside the cost labels, and optionally adds a global cost variable
274and instrumentation to support further reasoning.  \todo{Cross-ref case study
275deliverables}
276
277\subsection{Revisions to the prototype compiler}
278
279Our focus on intensional properties prompted us to consider whether we
280could incorporate stack space into the costs presented to the user.
281We only allocate one fixed-size frame per function, so modelling this
282was relatively simple.  It is the only form of dynamic memory
283allocation provided by the compiler, so we were able to strengthen the
284statement of the goal to guarantee successful execution whenever the
285stack space obeys a bound calculated by subtracting the global
286variable requirements from the total memory available.
287
288The cost checks at the end of Figure~\ref{fig:summary} have been
289introduced to reduce the proof burden, and are described in
290Section~\ref{sec:costchecks}.
291
292The use of dependent types to capture simple intermediate language
293invariants makes every front-end pass except \textsf{Clight} to
294\textsf{Cminor} and the cost checks total functions.  Hence various
295well-formedness and type checks are dealt with once in that phase.
296With the benefit of hindsight we would have included an initial
297checking phase to produce a `well-formed' variant of \textsf{Clight},
298conjecturing that this would simplify various parts of the proofs for
299the \textsf{Clight} stages which deal with potentially ill-formed
300code.
301
302\todo{move of initialisation code?}
303
304\subsection{Main goals}
305
306TODO: need an example for this
307
308Informally, our main intensional result links the time difference in a source
309code execution to the time difference in the object code, expressing the time
310for the source by summing the values for the cost labels in the trace, and the
311time for the target by a clock built in to the 8051 executable semantics.
312
313The availability of precise timing information for 8501
314implementations and the design of the compiler allow it to give exact
315time costs in terms of processor cycles, not just upper bounds.
316However, these exact results are only available if the subtrace we
317measure starts and ends at suitable points.  In particular, pure
318computation with no observable effects may be reordered and moved past
319cost labels, so we cannot measure time between arbitrary statements in
320the program.
321
322There is also a constraint on the subtraces that we
323measure due to the requirements of the correctness proof for the
324object code timing analysis.  To be sure that the timings are assigned
325to the correct cost label, we need to know that each return from a
326function call must go to the correct return address.  It is difficult
327to observe this property locally in the object code because it relies
328on much earlier stages in the compiler.  To convey this information to
329the timing analysis extra structure is imposed on the subtraces, which
330is described in Section~\ref{sec:fegoals}.
331
332% Regarding the footnote, would there even be much point?
333% TODO: this might be quite easy to add ('just' subtract the
334% measurable subtrace from the second label to the end).  Could also
335% measure other traces in this manner.
336These restrictions are reflected in the subtraces that we give timing
337guarantees on; they must start at a cost label and end at the return
338of the enclosing function of the cost label\footnote{We expect that
339  this would generalise to subtraces between cost labels in the same
340  function, but could not justify the extra complexity that would be
341  required to show this.}.  A typical example of such a subtrace is
342the execution of an entire function from the cost label at the start
343of the function until it returns.  We call such any such subtrace
344\emph{measurable} if it (and the prefix of the trace from the start to
345the subtrace) can also be executed within the available stack space.
346
347Now we can give the main intensional statement for the compiler.
348Given a \emph{measurable} subtrace for a labelled \textsf{Clight}
349program, there is a subtrace of the 8051 object code program where the
350time differences match.  Moreover, \emph{observable} parts of the
351trace also match --- these are the appearance of cost labels and
352function calls and returns.
353
354More formally, the definition of this statement in \matita{} is
355\begin{lstlisting}[language=matita]
356definition simulates :=
357  $\lambda$p: compiler_output.
358  let initial_status := initialise_status $\dots$ (cm (c_labelled_object_code $\dots$ p)) in
359  $\forall$m1,m2.
360   measurable Clight_pcs (c_labelled_clight $\dots$ p) m1 m2
361    (lookup_stack_cost (c_stack_cost $\dots$ p)) (c_max_stack $\dots$ p) $\rightarrow$
362  $\forall$c1,c2.
363   clock_after Clight_pcs (c_labelled_clight $\dots$ p) m1 (c_clight_cost_map $\dots$ p) = OK $\dots$ c1 $\rightarrow$
364   clock_after Clight_pcs (c_labelled_clight $\dots$ p) (m1+m2) (c_clight_cost_map $\dots$ p) = OK $\dots$ c2 $\rightarrow$
365  $\exists$n1,n2.
366   observables Clight_pcs (c_labelled_clight $\dots$ p) m1 m2 =
367   observables (OC_preclassified_system (c_labelled_object_code $\dots$ p))
368    (c_labelled_object_code $\dots$ p) n1 n2
369  $\wedge$
370   c2 - c1 = clock $\dots$ (execute n2 ? initial_status) - clock $\dots$ (execute n1 ? initial_status).
371\end{lstlisting}
372where the \lstinline'measurable', \lstinline'clock_after' and
373\lstinline'observables' definitions can be applied to multiple
374languages; in this case the \lstinline'Clight_pcs' record applies them
375to \textsf{Clight} programs.
376
377There is a second part to the statement, which says that the initial
378processing of the input program to produce the cost labelled version
379does not affect the semantics of the program:
380% Yes, I'm paraphrasing the result a tiny bit to remove the observe non-function
381\begin{lstlisting}[language=matita]
382  $\forall$input_program,output.
383  compile input_program = return output $\rightarrow$
384  not_wrong … (exec_inf … clight_fullexec input_program) $\rightarrow$
385  sim_with_labels
386   (exec_inf … clight_fullexec input_program)
387   (exec_inf … clight_fullexec (c_labelled_clight … output))
388\end{lstlisting}
389That is, any successful compilation produces a labelled program that
390has identical behaviour to the original, so long as there is no
391`undefined behaviour'.
392
393Note that this provides full functional correctness, including
394preservation of (non-)termination.  The intensional result above does
395not do this directly --- it does not guarantee the same result or same
396termination.  There are two mitigating factors, however: first, to
397prove the intensional property you need local simulation results that
398can be pieced together to form full behavioural equivalence, only time
399constraints have prevented us from doing so.  Second, if we wish to
400confirm a result, termination, or non-termination we could add an
401observable witness, such as a function that is only called if the
402correct result is given.  The intensional result guarantees that the
403observable witness is preserved, so the program must behave correctly.
404
405\section{Intermediate goals for the front-end}
406\label{sec:fegoals}
407
408The essential parts of the intensional proof were outlined during work
409on a toy
410compiler~\cite{d2.1,springerlink:10.1007/978-3-642-32469-7_3}.  These
411are
412\begin{enumerate}
413\item functional correctness, in particular preserving the trace of
414  cost labels,
415\item the \emph{soundness} and \emph{precision} of the cost labelling
416  on the object code, and
417\item the timing analysis on the object code produces a correct
418  mapping from cost labels to time.
419\end{enumerate}
420
421However, that toy development did not include function calls.  For the
422full \cerco{} compiler we also need to maintain the invariant that
423functions return to the correct program location in the caller, as we
424mentioned in the previous section.  During work on the back-end timing
425analysis (describe in more detail in the companion deliverable, D4.4)
426the notion of a \emph{structured trace} was developed to enforce this
427return property, and also most of the cost labelling properties too.
428
429\begin{figure}
430\begin{center}
431\includegraphics[width=0.5\linewidth]{compiler.pdf}
432\end{center}
433\caption{The compiler and proof outline}
434\label{fig:compiler}
435\end{figure}
436
437Jointly, we generalised the structured traces to apply to any of the
438intermediate languages with some idea of program counter.  This means
439that they are introduced part way through the compiler, see
440Figure~\ref{fig:compiler}.  Proving that a structured trace can be
441constructed at \textsf{RTLabs} has several virtues:
442\begin{itemize}
443\item This is the first language where every operation has its own
444  unique, easily addressable, statement.
445\item Function calls and returns are still handled implicitly in the
446  language and so the structural properties are ensured by the
447  semantics.
448\item Many of the back-end languages from \textsf{RTL} share a common
449  core set of definitions, and using structured traces throughout
450  increases this uniformity.
451\end{itemize}
452
453\begin{figure}
454\begin{center}
455\includegraphics[width=0.6\linewidth]{strtraces.pdf}
456\end{center}
457\caption{Nesting of functions in structured traces}
458\label{fig:strtrace}
459\end{figure}
460A structured trace is a mutually inductive data type which
461contains the steps from a normal program trace, but arranged into a
462nested structure which groups entire function calls together and
463aggregates individual steps between cost labels (or between the final
464cost label and the return from the function), see
465Figure~\ref{fig:strtrace}.  This captures the idea that the cost labels
466only represent costs \emph{within} a function --- calls to other
467functions are accounted for in the nested trace for their execution, and we
468can locally regard function calls as a single step.
469
470These structured traces form the core part of the intermediate results
471that we must prove so that the back-end can complete the main
472intensional result stated above.  In full, we provide the back-end
473with
474\begin{enumerate}
475\item A normal trace of the \textbf{prefix} of the program's execution
476  before reaching the measurable subtrace.  (This needs to be
477  preserved so that we know that the stack space consumed is correct,
478  and to set up the simulation results.)
479\item The \textbf{structured trace} corresponding to the measurable
480  subtrace.
481\item An additional property about the structured trace that no
482  `program counter' is \textbf{repeated} between cost labels.  Together with
483  the structure in the trace, this takes over from showing that
484  cost labelling is sound and precise.
485\item A proof that the \textbf{observables} have been preserved.
486\item A proof that the \textbf{stack limit} is still observed by the prefix and
487  the structure trace.  (This is largely a consequence of the
488  preservation of observables.)
489\end{enumerate}
490
491Following the outline in Figure~\ref{fig:compiler}, we will first deal
492with the transformations in \textsf{Clight} that produce the source
493program with cost labels, then show that measurable traces can be
494lifted to \textsf{RTLabs}, and finally that we can construct the
495properties listed above ready for the back-end proofs.
496
497\section{Input code to cost labelled program}
498
499As explained on page~\pageref{page:switchintro}, the costs of complex
500C \lstinline[language=C]'switch' statements cannot be represented with
501the simple labelling.  Our first pass replaces these statements with
502simpler C code, allowing our second pass to perform the cost
503labelling.  We show that the behaviour of programs is unchanged by
504these passes.
505
506\subsection{Switch removal}
507
508The intermediate languages of the front-end have no jump tables.
509Hence, we have to compile the \lstinline[language=C]'switch'
510constructs away before going from \textsf{Clight} to \textsf{Cminor}.
511Note that this transformation does not necessarily deteriorate the
512efficiency of the generated code. For instance, compilers such as GCC
513introduce balanced trees of ``if-then-else'' constructs for small
514switches. However, our implementation strategy is much simpler. Let
515us consider the following input statement.
516
517\begin{lstlisting}[language=C]
518   switch(e) {
519   case v1:
520     stmt1;
521   case v2:
522     stmt2;
523   default:
524     stmt_default;
525   }
526\end{lstlisting}
527
528Note that \textsf{stmt1}, \textsf{stmt2}, \ldots \textsf{stmt\_default}
529may contain \lstinline[language=C]'break' statements, which have the
530effect of exiting the switch statement. In the absence of break, the
531execution falls through each case sequentially. In our current implementation,
532we produce an equivalent sequence of ``if-then'' chained by gotos:
533\begin{lstlisting}[language=C]
534   fresh = e;
535   if(fresh == v1) {
536     $\llbracket$stmt1$\rrbracket$;
537     goto lbl_case2;
538   };
539   if(fresh == v2) {
540     lbl_case2:
541     $\llbracket$stmt2;$\rrbracket$
542     goto lbl_case2;
543   };
544   $\llbracket$stmt_default$\rrbracket$;
545   exit_label:
546\end{lstlisting}
547
548The proof had to tackle the following points:
549\begin{itemize}
550\item the source and target memories are not the same (cf. fresh variable),
551\item the flow of control is changed in a non-local way (e.g. \textbf{goto} 
552  instead of \textbf{break}).
553\end{itemize}
554
555In order to tackle the first point, we implemented a version of memory
556extensions similar to Compcert's. What has been done is the simulation proof
557for all ``easy'' cases, that do not interact with the switch removal per se,
558plus a bit of the switch case. This comprises propagating the memory extension
559through  each statement (except switch), as well as various invariants that
560are needed for the switch case (in particular, freshness hypotheses). The
561details of the evaluation process for the source switch statement and its
562target counterpart can be found in the file \textbf{switchRemoval.ma},
563along more details on the transformation itself.
564
565Proving the correctness of the second point would require reasoning on the
566semantics of \lstinline[language=C]'goto' statements. In the \textsf{Clight} 
567semantics, this is implemented as a function-wide lookup of the target label.
568The invariant we would need is the fact that a global label lookup on a freshly
569created goto is equivalent to a local lookup. This would in turn require the
570propagation of some freshness hypotheses on labels. For reasons expressed above,
571we decided to omit this part of the correctness proof.
572
573\subsection{Cost labelling}
574
575The simulation for the cost labelling pass is the simplest in the
576front-end.  The main argument is that any step of the source program
577is simulated by the same step of the labelled one, plus any extra
578steps for the added cost labels.  The extra instructions do not change
579the memory or local environments, although we have to keep track of
580the extra instructions in continuations.
581
582We do not attempt to capture any cost properties of the labelling in
583the simulation proof, allowing the proof to be oblivious to the choice
584of cost labels.  Hence we do not have to reason about the threading of
585name generation through the labelling function, greatly reducing the
586amount of effort required.  We describe how the cost properties are
587established in Section~\ref{sec:costchecks}.
588
589%TODO: both give one-step-sim-by-many forward sim results; switch
590%removal tricky, uses aux var to keep result of expr, not central to
591%intensional correctness so curtailed proof effort once reasonable
592%level of confidence in code gained; labelling much simpler; don't care
593%what the labels are at this stage, just need to know when to go
594%through extra steps.  Rolled up into a single result with a cofixpoint
595%to obtain coinductive statement of equivalence (show).
596
597\section{Finding corresponding measurable subtraces}
598
599There follow the three main passes of the front-end:
600\begin{enumerate}
601\item simplification of casts in \textsf{Clight} code
602\item \textsf{Clight} to \textsf{Cminor} translation, performing stack
603  variable allocation and simplifying control structures
604\item transformation to \textsf{RTLabs} control flow graph
605\end{enumerate}
606\todo{I keep mentioning forward sim results - I probably ought to say
607  something about determinancy} We have taken a common approach to
608each pass: first we build (or axiomatise) forward simulation results
609that are similar to normal compiler proofs, but slightly more
610fine-grained so that we can see that the call structure and relative
611placement of cost labels is preserved.
612
613Then we instantiate a general result which shows that we can find a
614\emph{measurable} subtrace in the target of the pass that corresponds
615to the measurable subtrace in the source.  By repeated application of
616this result we can find a measurable subtrace of the execution of the
617\textsf{RTLabs} code, suitable for the construction of a structured
618trace (see Section~\ref{sec:structuredtrace}.  This is essentially an
619extra layer on top of the simulation proofs that provides us with the
620extra information required for our intensional correctness proof.
621
622\subsection{Generic measurable subtrace lifting proof}
623
624Our generic proof is parametrised on a record containing small-step
625semantics for the source and target language, a classification of
626states (the same form of classification is used when defining
627structured traces), a simulation relation which loosely respects the
628classification and cost labelling \todo{this isn't very helpful} and
629four simulation results:
630\begin{enumerate}
631\item a step from a `normal' state (which is not classified as a call
632  or return) which is not a cost label is simulated by zero or more
633  `normal' steps;
634\item a step from a `call' state followed by a cost label step is
635  simulated by a step from a `call' state, a corresponding label step,
636  then zero or more `normal' steps;
637\item a step from a `call' state not followed by a cost label
638  similarly (note that this case cannot occur in a well-labelled
639  program, but we do not have enough information locally to exploit
640  this); and
641\item a cost label step is simulated by a cost label step.
642\end{enumerate}
643Finally, we need to know that a successfully translated program will
644have an initial state in the simulation relation with the original
645program's initial state.
646
647\begin{figure}
648\begin{center}
649\includegraphics[width=0.5\linewidth]{meassim.pdf}
650\end{center}
651\caption{Tiling of simulation for a measurable subtrace}
652\label{fig:tiling}
653\end{figure}
654
655To find the measurable subtrace in the target program's execution we
656walk along the original program's execution trace applying the
657appropriate simulation result by induction on the number of steps.
658While the number of steps taken varies, the overall structure is
659preserved, as illustrated in Figure~\ref{fig:tiling}.  By preserving
660the structure we also maintain the same intensional observables.  One
661delicate point is that the cost label following a call must remain
662directly afterwards\footnote{The prototype compiler allowed some
663  straight-line code to appear before the cost label until a later
664  stage of the compiler, but we must move the requirement forward to
665  fit with the structured traces.}
666% Damn it, I should have just moved the cost label forwards in RTLabs,
667% like the prototype does in RTL to ERTL; the result would have been
668% simpler.  Or was there some reason not to do that?
669(both in the program code and in the execution trace), even if we
670introduce extra steps, for example to store parameters in memory in
671\textsf{Cminor}.  Thus we have a version of the call simulation
672that deals with both in one result.
673
674In addition to the subtrace we are interested in measuring we must
675also prove that the earlier part of the trace is also preserved in
676order to use the simulation from the initial state.  It also
677guarantees that we do not run out of stack space before the subtrace
678we are interested in.  The lemmas for this prefix and the measurable
679subtrace are similar, following the pattern above.  However, the
680measurable subtrace also requires us to rebuild the termination
681proof.  This is defined recursively:
682\label{prog:terminationproof}
683\begin{lstlisting}[language=matita]
684let rec will_return_aux C (depth:nat)
685                             (trace:list (cs_state … C × trace)) on trace : bool :=
686match trace with
687[ nil $\Rightarrow$ false
688| cons h tl $\Rightarrow$
689  let $\langle$s,tr$\rangle$ := h in
690  match cs_classify C s with
691  [ cl_call $\Rightarrow$ will_return_aux C (S depth) tl
692  | cl_return $\Rightarrow$
693      match depth with
694      [ O $\Rightarrow$ match tl with [ nil $\Rightarrow$ true | _ $\Rightarrow$ false ]
695      | S d $\Rightarrow$ will_return_aux C d tl
696      ]
697  | _ $\Rightarrow$ will_return_aux C depth tl
698  ]
699].
700\end{lstlisting}
701The \lstinline'depth' is the number of return states we need to see
702before we have returned to the original function (initially zero) and
703\lstinline'trace' the measurable subtrace obtained from the running
704the semantics for the correct number of steps.  This definition
705unfolds tail recursively for each step, and once the corresponding
706simulation result has been applied a new one for the target can be
707asserted by unfolding and applying the induction hypothesis on the
708shorter trace.
709
710This then gives us an overall result for any simulation fitting the
711requirements above (contained in the \lstinline'meas_sim' record):
712\begin{lstlisting}[language=matita]
713theorem measured_subtrace_preserved :
714  $\forall$MS:meas_sim.
715  $\forall$p1,p2,m,n,stack_cost,max.
716  ms_compiled MS p1 p2 $\rightarrow$
717  measurable (ms_C1 MS) p1 m n stack_cost max $\rightarrow$
718  $\exists$m',n'.
719    measurable (ms_C2 MS) p2 m' n' stack_cost max $\wedge$
720    observables (ms_C1 MS) p1 m n = observables (ms_C2 MS) p2 m' n'.
721\end{lstlisting}
722The stack space requirement that is embedded in \lstinline'measurable'
723is a consequence of the preservation of observables.
724
725TODO: how to deal with untidy edges wrt to sim rel; anything to
726say about obs?
727
728TODO: say something about termination measures; cost labels are
729statements/exprs in these languages; split call/return gives simpler
730simulations
731
732\subsection{Simulation results for each pass}
733
734\paragraph{Cast simplification.}
735
736The parser used in Cerco introduces a lot of explicit type casts.
737If left as they are, these constructs can greatly hamper the
738quality of the generated code -- especially as the architecture
739we consider is an $8$ bit one. In \textsf{Clight}, casts are
740expressions. Hence, most of the work of this transformation
741proceeds on expressions. The tranformation proceeds by recursively
742trying to coerce an expression to a particular integer type, which
743is in practice smaller than the original one. This functionality
744is implemented by two mutually recursive functions whose signature
745is the following.
746
747\begin{lstlisting}[language=matita]
748let rec simplify_expr (e:expr) (target_sz:intsize) (target_sg:signedness)
749  : Σresult:bool×expr.
750    ∀ge,en,m. simplify_inv ge en m e (\snd result) target_sz target_sg (\fst result) ≝ $\ldots$
751
752and simplify_inside (e:expr) : Σresult:expr. conservation e result ≝ $\ldots$
753\end{lstlisting}
754
755The \textsf{simplify\_inside} acts as a wrapper for
756\textsf{simplify\_expr}. Whenever \textsf{simplify\_inside} encounters
757a \textsf{Ecast} expression, it tries to coerce the sub-expression
758to the desired type using \textsf{simplify\_expr}, which tries to
759perform the actual coercion. In return, \textsf{simplify\_expr} calls
760back \textsf{simplify\_inside} in some particular positions, where we
761decided to be conservative in order to simplify the proofs. However,
762the current design allows to incrementally revert to a more aggressive
763version, by replacing recursive calls to \textsf{simplify\_inside} by
764calls to \textsf{simplify\_expr} \emph{and} proving the corresponding
765invariants -- where possible.
766
767The \textsf{simplify\_inv} invariant encodes either the conservation
768of the semantics during the transformation corresponding to the failure
769of the transformation (\textsf{Inv\_eq} constructor), or the successful
770downcast of the considered expression to the target type
771(\textsf{Inv\_coerce\_ok}).
772
773\begin{lstlisting}[language=matita]
774inductive simplify_inv
775  (ge : genv) (en : env) (m : mem)
776  (e1 : expr) (e2 : expr) (target_sz : intsize) (target_sg : signedness) : bool → Prop ≝
777| Inv_eq : ∀result_flag. $\ldots$
778     simplify_inv ge en m e1 e2 target_sz target_sg result_flag
779| Inv_coerce_ok : ∀src_sz,src_sg.
780     (typeof e1) = (Tint src_sz src_sg) → (typeof e2) = (Tint target_sz target_sg) →     
781     (smaller_integer_val src_sz target_sz src_sg (exec_expr ge en m e1) (exec_expr ge en m e2)) →
782     simplify_inv ge en m e1 e2 target_sz target_sg true.
783\end{lstlisting}
784
785The \textsf{conservation} invariant simply states the conservation
786of the semantics, as in the \textsf{Inv\_eq} constructor of the previous
787invariant.
788
789\begin{lstlisting}[language=matita]
790definition conservation ≝ λe,result. ∀ge,en,m.
791          res_sim ? (exec_expr ge en m e) (exec_expr ge en m result)
792         ∧ res_sim ? (exec_lvalue ge en m e) (exec_lvalue ge en m result)
793         ∧ typeof e = typeof result.
794\end{lstlisting}
795
796This invariant is then easily lifted to statement evaluations.
797The main problem encountered with this particular pass was dealing with
798inconsistently typed programs, a canonical case being a particular
799integer constant of a certain size typed with another size. This
800prompted the need to introduce numerous type checks, complexifying
801both the implementation and the proof.
802\todo{Make this a particular case of the more general statement on baking more invariants in the Clight language}
803
804\paragraph{Clight to Cminor.}
805
806This pass is the last one operating on the Clight intermediate language.
807Its input is a full \textsf{Clight} program, and its output is a \textsf{Cminor} 
808program. Note that we do not use an equivalent of the C\#minor language: we
809translate directly to Cminor. This presents the advantage of not requiring the
810special loop constructs, nor the explicit block structure. Another salient
811point of our approach is that a significant part of the properties needed for
812the simulation proof were directly encoded in dependently typed translation
813functions. This concerns more precisely freshness conditions and well-typedness
814conditions. The main effects of the transformation from \textsf{Clight} to
815\textsf{Cminor} are listed below.
816
817\begin{itemize}
818\item Variables are classified as being either globals, stack-allocated
819  locals or potentially register-allocated locals. The value of register-allocated
820  local variables is moved out of the modelled memory and stored in a
821  dedicated environment.
822\item In Clight, each local variable has a dedicated memory block, whereas
823  stack-allocated locals are bundled together on a function-by-function basis.
824\item Loops are converted to jumps.
825\end{itemize}
826
827The first two points require memory injections which are more flexible that those
828needed in the switch removal case. In the remainder of this section, we briefly
829discuss our implementation of memory injections, and then the simulation proof.
830
831\subparagraph{Memory injections.}
832
833Our memory injections are modelled after the work of Blazy \& Leroy.
834However, the corresponding paper is based on the first version of the
835Compcert memory model, whereas we use a much more concrete model, allowing byte-level
836manipulations (as in the later version of Compcert's memory model). We proved
837roughly 80 \% of the required lemmas. Some difficulties encountered were notably
838due to some too relaxed conditions on pointer validity (problem fixed during development).
839Some more conditions had to be added to take care of possible overflows when converting
840from \textbf{Z} block bounds to $16$ bit pointer offsets (in pratice, these overflows
841only occur in edge cases that are easily ruled out -- but this fact is not visible
842in memory injections). Concretely, some of the lemmas on the preservation of simulation of
843loads after writes were axiomatised, for lack of time.
844
845\subparagraph{Simulation proof.}
846
847The correction proof for this transformation was not terminated. We proved the
848simulation result for expressions and for some subset of the critical statement cases.
849Notably lacking are the function entry and exit, where the memory injection is
850properly set up. As can be guessed, a significant amount of work has to be performed
851to show the conservation of all invariants at each simulation step.
852
853\todo{list cases, explain while loop, explain labeling problem}
854
855\section{Checking cost labelling properties}
856\label{sec:costchecks}
857
858Ideally, we would provide proofs that the cost labelling pass always
859produced programs that are soundly and precisely labelled and that
860each subsequent pass preserves these properties.  This would match our
861use of dependent types to eliminate impossible sources of errors
862during compilation, in particular retaining intermediate language type
863information.
864
865However, given the limited amount of time available we realised that
866implementing a compile-time check for a sound and precise labelling of
867the \textsf{RTLabs} intermediate code would reduce the proof burden
868considerably.  This is similar in spirit to the use of translation
869validation in certified compilation\todo{Cite some suitable work
870  here}, which makes a similar trade-off between the potential for
871compile-time failure and the volume of proof required.
872
873The check cannot be pushed into a later stage of the compiler because
874much of the information is embedded into the structured traces.
875However, if an alternative method was used to show that function
876returns in the compiled code are sufficiently well-behaved, then we
877could consider pushing the cost property checks into the timing
878analysis itself.  We leave this as a possible area for future work.
879
880\subsection{Implementation and correctness}
881
882For a cost labelling to be sound and precise we need a cost label at
883the start of each function, after each branch and at least one in
884every loop.  The first two parts are trivial to check by examining the
885code.  In \textsf{RTLabs} the last part is specified by saying
886that there is a bound on the number of successive instruction nodes in
887the CFG that you can follow before you encounter a cost label, and
888checking this is more difficult.
889
890The implementation works through the set of nodes in the graph,
891following successors until a cost label is found or a label-free cycle
892is discovered (in which case the property does not hold and we stop).
893This is made easier by the prior knowledge that any branch is followed
894by cost labels, so we do not need to search each branch.  When a label
895is found, we remove the chain from the set and continue from another
896node in the set until it is empty, at which point we know that there
897is a bound for every node in the graph.
898
899Directly reasoning about the function that implements this would be
900rather awkward, so an inductive specification of a single step of its
901behaviour was written and proved to match the implementation.  This
902was then used to prove the implementation sound and complete.
903
904While we have not attempted to proof that the cost labelled properties
905are established and preserved earlier in the compiler, we expect that
906the effort for the \textsf{Cminor} to \textsf{RTLabs} would be similar
907to the work outlined above, because it involves the change from
908requiring a cost label at particular positions to requiring cost
909labels to break loops in the CFG.  As there are another three passes
910to consider (including the labelling itself), we believe that using
911the check above is much simpler overall.
912
913% TODO? Found some Clight to Cminor bugs quite quickly
914
915\section{Existence of a structured trace}
916\label{sec:structuredtrace}
917
918\emph{Structured traces} enrich the execution trace of a program by
919nesting function calls in a mixed-step style\todo{Can I find a
920  justification for mixed-step} and embedding the cost properties of
921the program.  It was originally designed to support the proof of
922correctness for the timing analysis of the object code in the
923back-end, then generalised to provide a common structure to use from
924the end of the front-end to the object code.  See
925Figure~\ref{fig:strtrace} on page~\pageref{fig:strtrace}
926for an illustration of a structured trace.
927
928To make the definition generic we abstract over the semantics of the
929language,
930\begin{lstlisting}[language=matita]
931record abstract_status : Type[1] :=
932  { as_status :> Type[0]
933  ; as_execute : relation as_status
934  ; as_pc : DeqSet
935  ; as_pc_of : as_status $\rightarrow$ as_pc
936  ; as_classify : as_status $\rightarrow$ status_class
937  ; as_label_of_pc : as_pc $\rightarrow$ option costlabel
938  ; as_after_return : ($\Sigma$s:as_status. as_classify s =  cl_call) $\rightarrow$ as_status $\rightarrow$ Prop
939  ; as_result: as_status $\rightarrow$ option int
940  ; as_call_ident : ($\Sigma$s:as_status.as_classify s = cl_call) $\rightarrow$ ident
941  ; as_tailcall_ident : ($\Sigma$s:as_status.as_classify s = cl_tailcall) $\rightarrow$ ident
942  }.
943\end{lstlisting}
944which gives a type of states, an execution relation, some notion of
945program counters with decidable equality, the classification of states
946and functions to extract the observable intensional information (cost
947labels and the identity of functions that are called).  The
948\lstinline'as_after_return' property links the state before a function
949call with the state after return, providing the evidence that
950execution returns to the correct place.  The precise form varies
951between stages; in \textsf{RTLabs} it insists the CFG, the pointer to
952the CFG node to execute next, and some call stack information is
953preserved.
954
955The structured traces are defined using three mutually inductive
956types.  The core data structure is \lstinline'trace_any_label', which
957captures some straight-line execution until the next cost label or
958function return.  Each function call is embedded as a single step,
959with its own trace nested inside and the before and after states
960linked by \lstinline'as_after_return', and states classified as a
961`jump' (in particular branches) must be followed by a cost label.
962
963The second type, \lstinline'trace_label_label', is a
964\lstinline'trace_any_label' where the initial state is cost labelled.
965Thus a trace in this type identifies a series of steps whose cost is
966entirely accounted for by the label at the start.
967
968Finally, \lstinline'trace_label_return' is a sequence of
969\lstinline'trace_label_label' values which end in the return from the
970function.  These correspond to a measurable subtrace, and in
971particular include executions of an entire function call (and so are
972used for the nested calls in \lstinline'trace_any_label').
973
974\subsection{Construction}
975
976The construction of the structured trace replaces syntactic cost
977labelling properties which place requirements on where labels appear
978in the program, with semantics properties that constrain the execution
979traces of the program.  The construction begins by defining versions
980of the sound and precise labelling properties on the program text that
981appears in states rather than programs, and showing that these are
982preserved by steps of the \textsf{RTLabs} semantics.
983
984Then we show that each cost labelling property the structured traces
985definition requires is satisfied.  These are broken up by the
986classification of states.  Similarly, we prove a step-by-step stack
987preservation result, which states that the \textsf{RTLabs} semantics
988never changes the lower parts of the stack.
989
990The core part of the construction of a structured trace is to use the
991proof of termination from the measurable trace (defined on
992page~\pageref{prog:terminationproof}) to `fold up' the execution into
993the nested form.  The results outlined above fill in the proof
994obligations for the cost labelling properties and the stack
995preservation result shows that calls return to the correct location.
996
997The structured trace alone is not sufficient to capture the property
998that the program is soundly labelled.  While the structured trace
999guarantees termination, it still permits a loop to be executed a
1000finite number of times without encountering a cost label.  We
1001eliminate this by proving that no `program counter' repeats within any
1002\lstinline'trace_any_label' section by showing that it is incompatible
1003with the property that there is a bound on the number of successor
1004instructions you can follow in the CFG before you encounter a cost
1005label.
1006
1007\subsubsection{Whole program structured traces}
1008
1009The development of the construction above started relatively early,
1010before the measurable subtraces preservation proofs.  To be confident
1011that the traces were well-formed, we also developed a whole program
1012form that embeds the traces above.  This includes non-terminating
1013programs, where an infinite number of the terminating structured
1014traces are embedded.  This construction confirmed that our definition
1015of structured traces was consistent, although we later found that we
1016did not require them for the compiler correctness results.
1017
1018To construct these we need to know whether each function call will
1019eventually terminate, requiring the use of the excluded middle.  This
1020classical reasoning is local to the construction of whole program
1021traces and is not necessary for our main results.
1022
1023\section{Conclusion}
1024
1025TODO
1026
1027TODO: appendix on code layout?
1028
1029\bibliographystyle{plain}
1030\bibliography{report}
1031
1032\end{document}
Note: See TracBrowser for help on using the repository browser.