1\documentclass[11pt,epsf,a4wide]{article}
2\usepackage[mathletters]{ucs}
3\usepackage[utf8x]{inputenc}
4\usepackage{stmaryrd}
5\usepackage{listings}
6\usepackage{../../style/cerco}
7\newcommand{\ocaml}{OCaml}
8\newcommand{\clight}{Clight}
9\newcommand{\matita}{Matita}
10\newcommand{\sdcc}{\texttt{sdcc}}
11
12\newcommand{\textSigma}{\ensuremath{\Sigma}}
13
14% LaTeX Companion, p 74
15\newcommand{\todo}[1]{\marginpar{\raggedright - #1}}
16
17\lstdefinelanguage{coq}
18  {keywords={Definition,Lemma,Theorem,Remark,Qed,Save,Inductive,Record},
19   morekeywords={[2]if,then,else},
20  }
21
22\lstdefinelanguage{matita}
23  {keywords={definition,lemma,theorem,remark,inductive,record,qed,let,rec,match,with,Type,and,on},
24   morekeywords={[2]whd,normalize,elim,cases,destruct},
25   mathescape=true,
26   morecomment=[n]{(*}{*)},
27  }
28
29\lstset{language=matita,basicstyle=\small\tt,columns=flexible,breaklines=false,
30        keywordstyle=\color{red}\bfseries,
31        keywordstyle=[2]\color{blue},
32        commentstyle=\color{green},
33        stringstyle=\color{blue},
34        showspaces=false,showstringspaces=false}
35
36\lstset{extendedchars=false}
37\lstset{inputencoding=utf8x}
38\DeclareUnicodeCharacter{8797}{:=}
39\DeclareUnicodeCharacter{10746}{++}
40\DeclareUnicodeCharacter{9001}{\ensuremath{\langle}}
41\DeclareUnicodeCharacter{9002}{\ensuremath{\rangle}}
42
43
44\title{
45INFORMATION AND COMMUNICATION TECHNOLOGIES\\
46(ICT)\\
47PROGRAMME\\
48\vspace*{1cm}Project FP7-ICT-2009-C-243881 \cerco{}}
49
50\date{ }
51\author{}
52
53\begin{document}
54\thispagestyle{empty}
55
56\vspace*{-1cm}
57\begin{center}
58\includegraphics[width=0.6\textwidth]{../../style/cerco_logo.png}
59\end{center}
60
61\begin{minipage}{\textwidth}
62\maketitle
63\end{minipage}
64
65
66\vspace*{0.5cm}
67\begin{center}
68\begin{LARGE}
69\bf
70Report n. D3.4\\
71Front-end Correctness Proofs\\
72\end{LARGE} 
73\end{center}
74
75\vspace*{2cm}
76\begin{center}
77\begin{large}
78Version 1.0
79\end{large}
80\end{center}
81
82\vspace*{0.5cm}
83\begin{center}
84\begin{large}
85Authors:\\
86Brian~Campbell, Ilias~Garnier, James~McKinna, Ian~Stark
87\end{large}
88\end{center}
89
90\vspace*{\fill}
91\noindent
92Project Acronym: \cerco{}\\
93Project full title: Certified Complexity\\
94Proposal/Contract no.: FP7-ICT-2009-C-243881 \cerco{}\\
95
96\clearpage \pagestyle{myheadings} \markright{\cerco{}, FP7-ICT-2009-C-243881}
97
98\newpage
99
100\vspace*{7cm}
101\paragraph{Abstract}
102We report on the correctness proofs for the front-end of the \cerco{}
103cost lifting compiler.  First, we identify the core result we wish to
prove, which says that we correctly predict the precise execution
105time for particular parts of the execution called \emph{measurable}
106subtraces.  Then we consider the three distinct parts of the task:
107showing that the \emph{annotated source code} output by the compiler
108has equivalent behaviour to the original input (up to the
109annotations); showing that a measurable subtrace of the
110annotated source code corresponds to an equivalent measurable subtrace
111in the code produced by the front-end, including costs; and finally
112showing that the enriched \emph{structured} execution traces required
113for cost correctness in the back-end can be constructed from the
114properties of the code produced by the front-end.
115
116A key part of our work is that the intensional correctness results which show
117that we get consistent cost measurements throughout the intermediate languages
118of the compiler can be layered on top of normal forward simulation results,
119if we split those results into local call-structure preserving simulations.
120
121This deliverable shows correctness results about the formalised
122compiler described in D3.2, using the source language semantics from
123D3.1 and intermediate language semantics from D3.3.  It builds on
124earlier work on the correctness of a toy compiler built to test the
125labelling approach in D2.1. Together with the companion deliverable
126about the correctness of the back-end, D4.4, we obtain results about
127the whole formalised compiler.
128
129% TODO: mention the deliverable about the extracted compiler et al?
130
131\newpage 
132
133\tableofcontents
134
135% CHECK: clear up any -ize vs -ise
136% CHECK: clear up any "front end" vs "front-end"
137% CHECK: clear up any mentions of languages that aren't textsf'd.
138% CHECK: fix unicode in listings
139
140\section{Introduction}
141
142The \cerco{} compiler produces a version of the source code containing
143annotations describing the timing behaviour of the object code, as well
144as the object code itself. It compiles C code, targeting
145microcontrollers implementing the Intel 8051 architecture.  There are
146two versions: first, an initial prototype was implemented in
147\ocaml{}~\cite{d2.2}, then a version was formalised in the \matita{} proof
148assistant~\cite{d3.2,d4.2} and extracted to \ocaml{} code to
149produce an executable compiler.  In this document we present results
150formalised in \matita{} about the front-end of the latter version of
151the compiler, and how that fits into the verification of the whole
152compiler.
153
154A key part of this work was to layer the intensional correctness
155results that show that the costs produced are correct on top of the
156proofs about the compiled code's extensional behaviour (that is, the
157functional correctness of the compiler).  Unfortunately, the ambitious
158goal of completely verifying the entire compiler was not feasible
159within the time available, but thanks to this separation of
extensional and intensional proofs we are able to axiomatise some
161simulation results which are similar to those in other compiler verification
162projects and concentrate on the novel intensional proofs.  We were
163also able to add stack space costs to obtain a stronger result.  The
164proofs were made more tractable by introducing compile-time checks for
165the `sound and precise' cost labelling properties rather than proving
166that they are preserved throughout.
167
168The overall statement of correctness says that the annotated program has the
169same behaviour as the input, and that for any suitably well-structured part of
170the execution (which we call \emph{measurable}), the object code will execute
171the same behaviour taking precisely the time given by the cost annotations in
172the annotated source program.
173
174In the next section we recall the structure of the compiler and make the overall
175statement more precise.  Following that, in Section~\ref{sec:fegoals} we
176describe the statements we need to prove about the intermediate \textsf{RTLabs}
177programs for the back-end proofs.
178Section~\ref{sec:inputtolabelling} covers the passes which produce the
179annotated source program and Section~\ref{sec:measurablelifting} the rest
of the transformations in the front-end.  Then the compile-time checks
181for good cost labelling are detailed in Section~\ref{sec:costchecks}
182and the proofs that the structured traces required by the back-end
183exist are discussed in Section~\ref{sec:structuredtrace}.
184
185\section{The compiler and main goals}
186
187The unformalised \ocaml{} \cerco{} compiler was originally described
188in Deliverables 2.1 and 2.2.  Its design was replicated in the formal
189\matita{} code, which was presented in Deliverables 3.2 and 4.2, for
190the front-end and back-end respectively.
191
192\begin{figure}
193\begin{center}
194\includegraphics[width=0.5\linewidth]{compiler-plain.pdf}
195\end{center}
196\caption{Languages in the \cerco{} compiler}
197\label{fig:compilerlangs}
198\end{figure}
199
The compiler uses a number of intermediate languages, as outlined in the
middle two lines of Figure~\ref{fig:compilerlangs}.  The upper line
represents the front-end of the compiler, and the lower the back-end,
finishing with 8051 binary code.  Not all of the front-end passes
introduce a new language, and Figure~\ref{fig:summary} presents a
list of every pass involved.
206
207\begin{figure}
208\begin{center}
209\begin{minipage}{.8\linewidth}
210\begin{tabbing}
211\quad \= $\downarrow$ \quad \= \kill
212\textsf{C} (unformalized)\\
213\> $\downarrow$ \> CIL parser (unformalized \ocaml)\\
214\textsf{Clight}\\
215%\> $\downarrow$ \> add runtime functions\\
216\> $\downarrow$ \> \lstinline[language=C]'switch' removal\\
217\> $\downarrow$ \> labelling\\
218\> $\downarrow$ \> cast removal\\
219\> $\downarrow$ \> stack variable allocation and control structure
220 simplification\\
221\textsf{Cminor}\\
222%\> $\downarrow$ \> generate global variable initialization code\\
223\> $\downarrow$ \> transform to RTL graph\\
224\textsf{RTLabs}\\
225\> $\downarrow$ \> check cost labelled properties of RTL graph\\
226\> $\downarrow$ \> start of target specific back-end\\
227\>\quad \vdots
228\end{tabbing}
229\end{minipage}
230\end{center}
231\caption{Front-end languages and compiler passes}
232\label{fig:summary}
233\end{figure}
234
235\label{page:switchintro}
236The annotated source code is taken after the cost labelling phase.
237Note that there is a pass to replace C \lstinline[language=C]'switch'
238statements before labelling --- we need to remove them because the
239simple form of labelling used in the formalised compiler is not quite
240capable of capturing their execution time costs, largely due to C's
241`fall-through' behaviour where execution from one branch continues in
242the next unless there is an explicit \lstinline[language=C]'break'.
243
244The cast removal phase which follows cost labelling simplifies
245expressions to prevent unnecessary arithmetic promotion, which is
246specified by the C standard but costly for an 8-bit target.  The
247transformation to \textsf{Cminor} and subsequently \textsf{RTLabs}
248bear considerable resemblance to some passes of the CompCert
249compiler~\cite{Blazy-Leroy-Clight-09,Leroy-backend}, although we use a simpler \textsf{Cminor} where
250all loops use \lstinline[language=C]'goto' statements, and the
251\textsf{RTLabs} language retains a target-independent flavour.  The
252back-end takes \textsf{RTLabs} code as input.
253
254The whole compilation function returns the following information on success:
255\begin{lstlisting}[language=matita]
256  record compiler_output : Type[0] :=
257   { c_labelled_object_code: labelled_object_code
258   ; c_stack_cost: stack_cost_model
259   ; c_max_stack: nat
260   ; c_init_costlabel: costlabel
261   ; c_labelled_clight: clight_program
262   ; c_clight_cost_map: clight_cost_map
263   }.
264\end{lstlisting}
265It consists of annotated 8051 object code, a mapping from function
266identifiers to the function's stack space usage, the space available for the
267stack after global variable allocation, a cost label covering the
268execution time for the initialisation of global variables and the call
269to the \lstinline[language=C]'main' function, the annotated source
270code, and finally a mapping from cost labels to actual execution time
271costs.
272
273An \ocaml{} pretty printer is used to provide a concrete version of
274the output code and annotated source code.  In the case of the
275annotated source code, it also inserts the actual costs alongside the
276cost labels, and optionally adds a global cost variable and
277instrumentation to support further reasoning.
278
279\subsection{Revisions to the prototype compiler}
280
281Our focus on intensional properties prompted us to consider whether we
282could incorporate stack space into the costs presented to the user.
283We only allocate one fixed-size frame per function, so modelling this
284was relatively simple.  It is the only form of dynamic memory
285allocation provided by the compiler, so we were able to strengthen the
286statement of the goal to guarantee successful execution whenever the
287stack space obeys the \lstinline'c_max_stack' bound calculated by
288subtracting the global variable requirements from the total memory
289available.
290
291The cost labelling checks at the end of Figure~\ref{fig:summary} have been
292introduced to reduce the proof burden, and are described in
293Section~\ref{sec:costchecks}.
294
The use of dependent types to capture simple intermediate language
invariants makes every front-end pass except \textsf{Clight} to
\textsf{Cminor} and the cost checks a total function.  Hence various
well-formedness and type safety checks are performed once between
\textsf{Clight} and \textsf{Cminor}, and the invariants rule out any
difficulties in the later stages.
301With the benefit of hindsight we would have included an initial
302checking phase to produce a `well-formed' variant of \textsf{Clight},
303conjecturing that this would simplify various parts of the proofs for
304the \textsf{Clight} stages which deal with potentially ill-formed
305code.
306
307\todo{move of initialisation code?}
308
309\subsection{Main goals}
310
311Informally, our main intensional result links the time difference in a source
312code execution to the time difference in the object code, expressing the time
313for the source by summing the values for the cost labels in the trace, and the
314time for the target by a clock built in to the 8051 executable semantics.
315
The availability of precise timing information for 8051
317implementations and the design of the compiler allow it to give exact
318time costs in terms of processor cycles, not just upper bounds.
319However, these exact results are only available if the subtrace we
320measure starts and ends at suitable points.  In particular, pure
321computation with no observable effects may be reordered and moved past
322cost labels, so we cannot measure time between arbitrary statements in
323the program.
324
325There is also a constraint on the subtraces that we
326measure due to the requirements of the correctness proof for the
327object code timing analysis.  To be sure that the timings are assigned
328to the correct cost label, we need to know that each return from a
329function call must go to the correct return address.  It is difficult
330to observe this property locally in the object code because it relies
on much earlier stages in the compiler.  To convey this information to
the timing analysis, extra structure is imposed on the subtraces, as
described in Section~\ref{sec:fegoals}.
334
335% Regarding the footnote, would there even be much point?
336% TODO: this might be quite easy to add ('just' subtract the
337% measurable subtrace from the second label to the end).  Could also
338% measure other traces in this manner.
339These restrictions are reflected in the subtraces that we give timing
340guarantees on; they must start at a cost label and end at the return
341of the enclosing function of the cost label\footnote{We expect that
342  this would generalise to more general subtraces by subtracting costs
343  for unwanted measurable suffixes of a measurable subtrace.}.  A
344typical example of such a subtrace is the execution of an entire
345function from the cost label at the start of the function until it
returns.  We call any such subtrace \emph{measurable} if it (and
347the prefix of the trace from the start to the subtrace) can also be
348executed within the available stack space.
349
350Now we can give the main intensional statement for the compiler.
351Given a \emph{measurable} subtrace for a labelled \textsf{Clight}
352program, there is a subtrace of the 8051 object code program where the
353time differences match.  Moreover, \emph{observable} parts of the
354trace also match --- these are the appearance of cost labels and
355function calls and returns.
356
357
358
359More formally, the definition of this statement in \matita{} is
360\begin{lstlisting}[language=matita]
361definition simulates :=
362  $\lambda$p: compiler_output.
363  let initial_status := initialise_status $...$ (cm (c_labelled_object_code $...$ p)) in
364  $\forall$m1,m2.
365   measurable Clight_pcs (c_labelled_clight $...$ p) m1 m2
366       (stack_sizes (c_stack_cost $...$ p)) (c_max_stack $...$ p) $\rightarrow$
367  $\forall$c1,c2.
368   clock_after Clight_pcs (c_labelled_clight $...$ p) m1 (c_clight_cost_map $...$ p) = OK $...$ c1 $\rightarrow$
369   clock_after Clight_pcs (c_labelled_clight $...$ p) (m1+m2) (c_clight_cost_map $...$ p) = OK $...$ c2 $\rightarrow$
370  $\exists$n1,n2.
371   observables Clight_pcs (c_labelled_clight $...$ p) m1 m2 =
372     observables (OC_preclassified_system (c_labelled_object_code $...$ p))
373          (c_labelled_object_code $...$ p) n1 n2
374  $\wedge$
375   clock ?? (execute (n1+n2) ? initial_status) =
376     clock ?? (execute n1 ? initial_status) + (c2-c1).
377\end{lstlisting}
378where the \lstinline'measurable', \lstinline'clock_after' and
379\lstinline'observables' definitions are generic definitions for multiple
380languages; in this case the \lstinline'Clight_pcs' record applies them
381to \textsf{Clight} programs.
382
383There is a second part to the statement, which says that the initial
384processing of the input program to produce the cost labelled version
385does not affect the semantics of the program:
386% Yes, I'm paraphrasing the result a tiny bit to remove the observe non-function
387\begin{lstlisting}[language=matita]
388  $\forall$input_program,output.
389  compile input_program = return output $\rightarrow$
390  not_wrong $...$ (exec_inf $...$ clight_fullexec input_program) $\rightarrow$
391  sim_with_labels
392   (exec_inf $...$ clight_fullexec input_program)
393   (exec_inf $...$ clight_fullexec (c_labelled_clight $...$ output))
394\end{lstlisting}
395That is, any successful compilation produces a labelled program that
396has identical behaviour to the original, so long as there is no
397`undefined behaviour'.
398
399Note that this statement provides full functional correctness, including
400preservation of (non-)termination.  The intensional result above does
401not do this directly --- it does not guarantee the same result or same
402termination.  There are two mitigating factors, however: first, to
prove the intensional property you need local simulation results --- these
can be pieced together to form full behavioural equivalence, and only time
constraints have prevented us from doing so.  Second, if we wish to
406confirm a result, termination, or non-termination we could add an
407observable witness, such as a function that is only called if the
408correct result is given.  The intensional result guarantees that the
409observable witness is preserved, so the program must behave correctly.
410
411\section{Intermediate goals for the front-end}
412\label{sec:fegoals}
413
414The essential parts of the intensional proof were outlined during work
415on a toy compiler in Task
4162.1~\cite{d2.1,springerlink:10.1007/978-3-642-32469-7_3}.  These are
417\begin{enumerate}
418\item functional correctness, in particular preserving the trace of
419  cost labels,
420\item the \emph{soundness} and \emph{precision} of the cost labelling
421  on the object code, and
422\item the timing analysis on the object code produces a correct
423  mapping from cost labels to time.
424\end{enumerate}
425
426However, that toy development did not include function calls.  For the
427full \cerco{} compiler we also need to maintain the invariant that
428functions return to the correct program location in the caller, as we
mentioned in the previous section.  During work on the back-end timing
analysis (described in more detail in the companion deliverable, D4.4)
the notion of a \emph{structured trace} was developed to enforce this
return property, along with most of the cost labelling properties.
433
434\begin{figure}
435\begin{center}
436\includegraphics[width=0.5\linewidth]{compiler.pdf}
437\end{center}
438\caption{The compiler and proof outline}
439\label{fig:compiler}
440\end{figure}
441
Jointly, we generalised the structured traces to apply to any of the
intermediate languages which have some notion of a program counter.  This means
that they are introduced part way through the compiler; see
Figure~\ref{fig:compiler}.  Proving that a structured trace can be
446constructed at \textsf{RTLabs} has several virtues:
447\begin{itemize}
448\item This is the first language where every operation has its own
449  unique, easily addressable, statement.
450\item Function calls and returns are still handled implicitly in the
451  language and so the structural properties are ensured by the
452  semantics.
453\item Many of the back-end languages from \textsf{RTL} onwards share a common
454  core set of definitions, and using structured traces throughout
455  increases this uniformity.
456\end{itemize}
457
458\begin{figure}
459\begin{center}
460\includegraphics[width=0.6\linewidth]{strtraces.pdf}
461\end{center}
462\caption{Nesting of functions in structured traces}
463\label{fig:strtrace}
464\end{figure}
465A structured trace is a mutually inductive data type which
466contains the steps from a normal program trace, but arranged into a
467nested structure which groups entire function calls together and
468aggregates individual steps between cost labels (or between the final
469cost label and the return from the function), see
470Figure~\ref{fig:strtrace}.  This captures the idea that the cost labels
471only represent costs \emph{within} a function --- calls to other
472functions are accounted for in the nested trace for their execution, and we
473can locally regard function calls as a single step.
474
475These structured traces form the core part of the intermediate results
476that we must prove so that the back-end can complete the main
477intensional result stated above.  In full, we provide the back-end
478with
479\begin{enumerate}
480\item A normal trace of the \textbf{prefix} of the program's execution
481  before reaching the measurable subtrace.  (This needs to be
482  preserved so that we know that the stack space consumed is correct,
483  and to set up the simulation results.)
484\item The \textbf{structured trace} corresponding to the measurable
485  subtrace.
486\item An additional property about the structured trace that no
487  `program counter' is \textbf{repeated} between cost labels.  Together with
488  the structure in the trace, this takes over from showing that
489  cost labelling is sound and precise.
490\item A proof that the \textbf{observables} have been preserved.
491\item A proof that the \textbf{stack limit} is still observed by the prefix and
  the structured trace.  (This is largely a consequence of the
493  preservation of observables.)
494\end{enumerate}
495
496Following the outline in Figure~\ref{fig:compiler}, we will first deal
497with the transformations in \textsf{Clight} that produce the source
498program with cost labels, then show that measurable traces can be
499lifted to \textsf{RTLabs}, and finally show that we can construct the
500properties listed above ready for the back-end proofs.
501
502\section{Input code to cost labelled program}
503\label{sec:inputtolabelling}
504
505As explained on page~\pageref{page:switchintro}, the costs of complex
506C \lstinline[language=C]'switch' statements cannot be represented with
507the simple labelling used in the formalised compiler.  Our first pass
508replaces these statements with simpler C code, allowing our second
509pass to perform the cost labelling.  We show that the behaviour of
510programs is unchanged by these passes using forward
511simulations\footnote{All of our languages are deterministic, which can
512be seen directly from their executable definitions.  Thus we know that
513forward simulations are sufficient because the target cannot have any
514other behaviour.}.
515
516\subsection{Switch removal}
517
518We compile away \lstinline[language=C]'switch' statements into more
519basic \textsf{Clight} code.
Note that this transformation does not necessarily degrade the
efficiency of the generated code. For instance, compilers such as GCC
522introduce balanced trees of ``if-then-else'' constructs for small
523switches. However, our implementation strategy is much simpler. Let
524us consider the following input statement.
525
526\begin{lstlisting}[language=C]
527   switch(e) {
528   case v1:
529     stmt1;
530   case v2:
531     stmt2;
532   default:
533     stmt_default;
534   }
535\end{lstlisting}
536
537Note that \textsf{stmt1}, \textsf{stmt2}, \ldots \textsf{stmt\_default}
538may contain \lstinline[language=C]'break' statements, which have the
539effect of exiting the switch statement. In the absence of break, the
540execution falls through each case sequentially. In our implementation,
541we produce an equivalent sequence of ``if-then'' chained by gotos:
542\begin{lstlisting}[language=C]
543   fresh = e;
544   if(fresh == v1) {
545     $\llbracket$stmt1$\rrbracket$;
546     goto lbl_case2;
547   };
   if(fresh == v2) {
     lbl_case2:
     $\llbracket$stmt2$\rrbracket$;
     goto lbl_default;
   };
   lbl_default:
   $\llbracket$stmt_default$\rrbracket$;
   exit_label:
555\end{lstlisting}
556
557The proof had to tackle the following points:
558\begin{itemize}
559\item the source and target memories are not the same (due to the fresh variable),
560\item the flow of control is changed in a non-local way (e.g. \textbf{goto} 
561  instead of \textbf{break}).
562\end{itemize}
563
In order to tackle the first point, we implemented a version of memory
extensions similar to CompCert's. We completed the simulation proof
for all ``easy'' cases that do not interact with the switch removal per se,
plus part of the switch case itself. This comprises propagating the memory extension
through each statement (except switch), as well as various invariants that
are needed for the switch case (in particular, freshness hypotheses). The
details of the evaluation process for the source switch statement and its
target counterpart can be found in the file \textbf{switchRemoval.ma},
along with more details on the transformation itself.
573
Proving the correctness of the second point would require reasoning about the
semantics of \lstinline[language=C]'goto' statements. In the \textsf{Clight}
576semantics, this is implemented as a function-wide lookup of the target label.
577The invariant we would need is the fact that a global label lookup on a freshly
578created goto is equivalent to a local lookup. This would in turn require the
579propagation of some freshness hypotheses on labels. For reasons expressed above,
580we decided to omit this part of the correctness proof.
581
582\subsection{Cost labelling}
583
584The simulation for the cost labelling pass is the simplest in the
585front-end.  The main argument is that any step of the source program
586is simulated by the same step of the labelled one, plus any extra
587steps for the added cost labels.  The extra instructions do not change
588the memory or local environments, although we have to keep track of
589the extra instructions that appear in continuations, for example
590during the execution of a \lstinline[language=C]'while' loop.
591
592We do not attempt to capture any cost properties of the labelling in
593the simulation proof\footnote{We describe how the cost properties are
594established in Section~\ref{sec:costchecks}.}, allowing the proof to be oblivious to the choice
595of cost labels.  Hence we do not have to reason about the threading of
596name generation through the labelling function, greatly reducing the
597amount of effort required.
598
599%TODO: both give one-step-sim-by-many forward sim results; switch
600%removal tricky, uses aux var to keep result of expr, not central to
601%intensional correctness so curtailed proof effort once reasonable
602%level of confidence in code gained; labelling much simpler; don't care
603%what the labels are at this stage, just need to know when to go
604%through extra steps.  Rolled up into a single result with a cofixpoint
605%to obtain coinductive statement of equivalence (show).
606
607\section{Finding corresponding measurable subtraces}
608\label{sec:measurablelifting}
609
610There follow the three main passes of the front-end:
611\begin{enumerate}
612\item simplification of casts in \textsf{Clight} code
613\item \textsf{Clight} to \textsf{Cminor} translation, performing stack
614  variable allocation and simplifying control structures
615\item transformation to \textsf{RTLabs} control flow graph
616\end{enumerate}
617We have taken a common approach to
618each pass: first we build (or axiomatise) forward simulation results
619that are similar to normal compiler proofs, but which are slightly more
fine-grained so that we can see that the call structure and relative
placement of cost labels are preserved.
622
623Then we instantiate a general result which shows that we can find a
624\emph{measurable} subtrace in the target of the pass that corresponds
625to the measurable subtrace in the source.  By repeated application of
626this result we can find a measurable subtrace of the execution of the
627\textsf{RTLabs} code, suitable for the construction of a structured
628trace (see Section~\ref{sec:structuredtrace}).  This is essentially an
629extra layer on top of the simulation proofs that provides us with the
630additional information required for our intensional correctness proof.
631
632\subsection{Generic measurable subtrace lifting proof}
633
634Our generic proof is parametrised on a record containing small-step
635semantics for the source and target language, a classification of
636states (the same form of classification is used when defining
637structured traces), a simulation relation which respects the
classification and cost labelling, and
four simulation results.  The simulations are split by the starting state's
640classification and whether it is a cost label, which will allow us to
641observe that the call structure is preserved.  They are:
642\begin{enumerate}
643\item a step from a `normal' state (which is not classified as a call
644  or return) which is not a cost label is simulated by zero or more
645  `normal' steps;
646\item a step from a `call' state followed by a cost label step is
647  simulated by a step from a `call' state, a corresponding label step,
648  then zero or more `normal' steps;
649\item a step from a `call' state not followed by a cost label
650  similarly (note that this case cannot occur in a well-labelled
651  program, but we do not have enough information locally to exploit
652  this); and
653\item a cost label step is simulated by a cost label step.
654\end{enumerate}
655Finally, we need to know that a successfully translated program will
656have an initial state in the simulation relation with the original
657program's initial state.
658
659\begin{figure}
660\begin{center}
661\includegraphics[width=0.5\linewidth]{meassim.pdf}
662\end{center}
663\caption{Tiling of simulation for a measurable subtrace}
664\label{fig:tiling}
665\end{figure}
666
667To find the measurable subtrace in the target program's execution we
668walk along the original program's execution trace applying the
669appropriate simulation result by induction on the number of steps.
670While the number of steps taken varies, the overall structure is
671preserved, as illustrated in Figure~\ref{fig:tiling}.  By preserving
672the structure we also maintain the same intensional observables.  One
673delicate point is that the cost label following a call must remain
674directly afterwards\footnote{The prototype compiler allowed some
675  straight-line code to appear before the cost label until a later
676  stage of the compiler, but we must move the requirement forward to
677  fit with the structured traces.}
678% Damn it, I should have just moved the cost label forwards in RTLabs,
679% like the prototype does in RTL to ERTL; the result would have been
680% simpler.  Or was there some reason not to do that?
681(both in the program code and in the execution trace), even if we
682introduce extra steps, for example to store parameters in memory in
683\textsf{Cminor}.  Thus we have a version of the call simulation
684that deals with both in one result.
685
In addition to the subtrace we are interested in measuring, we must
also prove that the earlier part of the trace is preserved in
order to use the simulation from the initial state.  This also
guarantees that we do not run out of stack space before the subtrace
690we are interested in.  The lemmas for this prefix and the measurable
691subtrace are similar, following the pattern above.  However, the
692measurable subtrace also requires us to rebuild the termination
693proof.  This is defined recursively:
694\label{prog:terminationproof}
695\begin{lstlisting}[language=matita]
696  let rec will_return_aux C (depth:nat)
697                               (trace:list (cs_state $...$ C × trace)) on trace : bool :=
698  match trace with
699  [ nil $\Rightarrow$ false
700  | cons h tl $\Rightarrow$
701    let $\langle$s,tr$\rangle$ := h in
702    match cs_classify C s with
703    [ cl_call $\Rightarrow$ will_return_aux C (S depth) tl
704    | cl_return $\Rightarrow$
705        match depth with
706        [ O $\Rightarrow$ match tl with [ nil $\Rightarrow$ true | _ $\Rightarrow$ false ]
707        | S d $\Rightarrow$ will_return_aux C d tl
708        ]
709    | _ $\Rightarrow$ will_return_aux C depth tl
710    ]
711  ].
712\end{lstlisting}
713The \lstinline'depth' is the number of return states we need to see
714before we have returned to the original function (initially zero) and
\lstinline'trace' the measurable subtrace obtained from running
the semantics for the correct number of steps.  This definition
717unfolds tail recursively for each step, and once the corresponding
718simulation result has been applied a new one for the target can be
719asserted by unfolding and applying the induction hypothesis on the
720shorter trace.
721
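To illustrate how the depth counter tracks nested calls, the following is a
small \ocaml{} transcription of the definition above over a toy
classification of trace entries; it is purely illustrative and not part of
the formal development.

\begin{lstlisting}[language={[Objective]Caml}]
(* Toy version of will_return_aux: each entry of the trace is summarised by
   its classification; Other covers all non-call, non-return steps. *)
type cls = Call | Return | Other

let rec will_return (depth : int) (trace : cls list) : bool =
  match trace with
  | [] -> false
  | Call :: tl -> will_return (depth + 1) tl
  | Return :: tl ->
      if depth = 0 then tl = []   (* the return of the original function must end the trace *)
      else will_return (depth - 1) tl
  | Other :: tl -> will_return depth tl

let () =
  (* a nested call and its return, then the return of the enclosing function: accepted *)
  assert (will_return 0 [Other; Call; Other; Return; Other; Return]);
  (* the trace stops before the enclosing function returns: rejected *)
  assert (not (will_return 0 [Other; Call; Other; Return; Other]))
\end{lstlisting}
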
722This then gives us an overall result for any simulation fitting the
723requirements above (contained in the \lstinline'meas_sim' record):
724\begin{lstlisting}[language=matita]
725theorem measured_subtrace_preserved :
726  $\forall$MS:meas_sim.
727  $\forall$p1,p2,m,n,stack_cost,max.
728  ms_compiled MS p1 p2 $\rightarrow$
729  measurable (ms_C1 MS) p1 m n stack_cost max $\rightarrow$
730  $\exists$m',n'.
731    measurable (ms_C2 MS) p2 m' n' stack_cost max $\wedge$
732    observables (ms_C1 MS) p1 m n = observables (ms_C2 MS) p2 m' n'.
733\end{lstlisting}
734The stack space requirement that is embedded in \lstinline'measurable'
735is a consequence of the preservation of observables, because it is
736determined by the functions called and returned from, which are observable.
737
% TODO: how to deal with untidy edges wrt sim rel; anything to
% say about obs?

% TODO: cost labels are
% statements/exprs in these languages; split call/return gives simpler
% simulations
744
745\subsection{Simulation results for each pass}
746
747We now consider the simulation results for the passes, each of which
748is used to instantiate the
749\lstinline[language=matita]'measured_subtrace_preserved' theorem to
750construct the measurable subtrace for the next language.
751
752\subsubsection{Cast simplification}
753
754The parser used in \cerco{} introduces a lot of explicit type casts.
755If left as they are, these constructs can greatly hamper the
756quality of the generated code -- especially as the architecture
757we consider is an $8$-bit one. In \textsf{Clight}, casts are
expressions. Hence, most of the work of this transformation
is performed on expressions. The transformation recursively tries
to coerce an expression to a particular integer type, which
is in practice smaller than the original one. This functionality
is implemented by two mutually recursive functions whose signatures
are given below.
764
765\begin{lstlisting}[language=matita]
766let rec simplify_expr (e:expr) (target_sz:intsize) (target_sg:signedness)
767  : $\Sigma$result:bool$\times$expr.
768    $\forall$ge,en,m. simplify_inv ge en m e (\snd result) target_sz target_sg (\fst result) := $\ldots$
769
770and simplify_inside (e:expr) : $\Sigma$result:expr. conservation e result := $\ldots$
771\end{lstlisting}
772
The \textsf{simplify\_inside} function acts as a wrapper for
\textsf{simplify\_expr}. Whenever \textsf{simplify\_inside} encounters
an \textsf{Ecast} expression, it tries to coerce the sub-expression
to the desired type using \textsf{simplify\_expr}, which tries to
perform the actual coercion. In return, \textsf{simplify\_expr} calls
back \textsf{simplify\_inside} in certain positions, where we
decided to be conservative in order to simplify the proofs. However,
the current design allows us to move incrementally to a more aggressive
version, by replacing recursive calls to \textsf{simplify\_inside} with
calls to \textsf{simplify\_expr} \emph{and} proving the corresponding
invariants -- where possible.
784
The \textsf{simplify\_inv} invariant encodes either the conservation
of the semantics when the transformation fails
(the \textsf{Inv\_eq} constructor), or the successful
downcast of the considered expression to the target type
(\textsf{Inv\_coerce\_ok}).
790
791\begin{lstlisting}[language=matita]
792inductive simplify_inv
793  (ge : genv) (en : env) (m : mem)
794  (e1 : expr) (e2 : expr) (target_sz : intsize) (target_sg : signedness) : bool $\rightarrow$ Prop :=
795| Inv_eq : $\forall$result_flag. $\ldots$
796     simplify_inv ge en m e1 e2 target_sz target_sg result_flag
797| Inv_coerce_ok : $\forall$src_sz,src_sg.
798     typeof e1 = Tint src_sz src_sg $\rightarrow$
799     typeof e2 = Tint target_sz target_sg $\rightarrow$     
800     smaller_integer_val src_sz target_sz src_sg (exec_expr ge en m e1) (exec_expr ge en m e2) $\rightarrow$
801     simplify_inv ge en m e1 e2 target_sz target_sg true.
802\end{lstlisting}
803
804The \textsf{conservation} invariant for \textsf{simplify\_inside} simply states the conservation
805of the semantics, as in the \textsf{Inv\_eq} constructor of the previous
806invariant.
807
808\begin{lstlisting}[language=matita]
809definition conservation := $\lambda$e,result. $\forall$ge,en,m.
810          res_sim ? (exec_expr ge en m e) (exec_expr ge en m result)
811       $\wedge$ res_sim ? (exec_lvalue ge en m e) (exec_lvalue ge en m result)
812       $\wedge$ typeof e = typeof result.
813\end{lstlisting}
814
815This invariant is then easily lifted to statement evaluations.
816The main problem encountered with this particular pass was dealing with
817inconsistently typed programs, a canonical case being a particular
818integer constant of a certain size typed with another size. This
819prompted the need to introduce numerous type checks, making
820both the implementation and the proof more complex, even though more
821comprehensive checks are made in the next stage.
822\todo{Make this a particular case of the more general statement on baking more invariants in the Clight language}
823
\subsubsection{Clight to Cminor}
825
826This pass is the last one operating on the \textsf{Clight} language.
827Its input is a full \textsf{Clight} program, and its output is a
\textsf{Cminor} program. Note that we do not use an equivalent of
CompCert's \textsf{C\#minor} language: we translate directly to a
variant of \textsf{Cminor}. This has the advantage of requiring
neither the special loop constructs nor the explicit block
structure. Another salient point of our approach is that a significant
833number of the properties needed for the simulation proof were directly
834encoded in dependently typed translation functions.  In particular,
835freshness conditions and well-typedness conditions are included. The
836main effects of the transformation from \textsf{Clight} to
837\textsf{Cminor} are listed below.
838
839\begin{itemize}
840\item Variables are classified as being either globals, stack-allocated
841  locals or potentially register-allocated locals. The value of register-allocated
842  local variables is moved out of the modelled memory and stored in a
843  dedicated environment.
\item In \textsf{Clight}, each local variable has a dedicated memory block, whereas
  in \textsf{Cminor} stack-allocated locals are bundled together on a function-by-function basis.
846\item Loops are converted to jumps.
847\end{itemize}
848
The first two points require memory injections which are more flexible than those
850needed in the switch removal case. In the remainder of this section, we briefly
851discuss our implementation of memory injections, and then the simulation proof.
852
853\subparagraph{Memory injections.}
854
Our memory injections are modelled after the work of Blazy \& Leroy.
However, the corresponding paper is based on the first version of the
CompCert memory model, whereas we use a much more concrete model, allowing byte-level
manipulations (as in the later version of CompCert's memory model). We proved
roughly 80\% of the required lemmas. Some of the difficulties encountered were notably
due to overly relaxed conditions on pointer validity (fixed during development).
Some more side conditions had to be added to take care of possible overflows when converting
from \textbf{Z} block bounds to $16$-bit pointer offsets (in practice, these overflows
only occur in edge cases that are easily ruled out -- but this fact is not visible
in the memory injections). Concretely, some of the lemmas on the preservation of simulation of
loads after writes were axiomatised, due to a lack of time.
866
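To fix intuitions, the following \ocaml{} fragment sketches the usual
CompCert-style shape of a memory injection and of value relatedness under
one; the names and the restriction to a few value forms are ours, and the
\matita{} development is considerably richer.

\begin{lstlisting}[language={[Objective]Caml}]
(* An injection says where (if anywhere) each source block is embedded in
   the target memory, and at which offset within that block. *)
type block = int
type meminj = block -> (block * int) option

type value = Vundef | Vint of int | Vptr of block * int

(* Values are related when they are equal, or when a source pointer is
   shifted by the offset its block is injected at; Vundef is refined by
   any target value. *)
let val_inject (f : meminj) (v1 : value) (v2 : value) : bool =
  match v1, v2 with
  | Vundef, _ -> true
  | Vint n1, Vint n2 -> n1 = n2
  | Vptr (b1, o1), Vptr (b2, o2) ->
      (match f b1 with
       | Some (b1', delta) -> b1' = b2 && o2 = o1 + delta
       | None -> false)
  | _, _ -> false
\end{lstlisting}
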
867\subparagraph{Simulation proof.}
868
The correctness proof for this transformation was not completed. We proved the
simulation result for expressions and for a subset of the critical statement cases.
Notably lacking are the cases of function entry and exit, where the memory injection is
872properly set up. As would be expected, a significant amount of work has to be performed
873to show the conservation of all invariants at each simulation step.
874
875\todo{list cases, explain while loop, explain labeling problem}
876
877\section{Checking cost labelling properties}
878\label{sec:costchecks}
879
880Ideally, we would provide proofs that the cost labelling pass always
881produces programs that are soundly and precisely labelled and that
882each subsequent pass preserves these properties.  This would match our
883use of dependent types to eliminate impossible sources of errors
884during compilation, in particular retaining intermediate language type
885information.
886
887However, given the limited amount of time available we realised that
888implementing a compile-time check for a sound and precise labelling of
889the \textsf{RTLabs} intermediate code would reduce the proof burden
considerably.  This is similar in spirit to the use of translation
validation in certified compilation, which makes a comparable trade-off
between the potential for compile-time failure and the volume of proof
893required.
894
895The check cannot be pushed into a later stage of the compiler because
896much of the information is embedded into the structured traces.
However, if an alternative method were used to show that function
898returns in the compiled code are sufficiently well-behaved, then we
899could consider pushing the cost property checks into the timing
900analysis itself.  We leave this as a possible area for future work.
901
902\subsection{Implementation and correctness}
903
904For a cost labelling to be sound and precise we need a cost label at
905the start of each function, after each branch and at least one in
906every loop.  The first two parts are trivial to check by examining the
907code.  In \textsf{RTLabs} the last part is specified by saying
908that there is a bound on the number of successive instruction nodes in
909the CFG that you can follow before you encounter a cost label, and
910checking this is more difficult.
911
912The implementation progresses through the set of nodes in the graph,
913following successors until a cost label is found or a label-free cycle
914is discovered (in which case the property does not hold and we stop).
915This is made easier by the prior knowledge that any branch is followed
916by cost labels, so we do not need to search each branch.  When a label
917is found, we remove the chain from the set and continue from another
918node in the set until it is empty, at which point we know that there
919is a bound for every node in the graph.
920
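The following \ocaml{} sketch conveys the idea on a hypothetical, simplified
CFG representation (single-successor instructions, with branches assumed to
be followed directly by cost labels); the names are ours and the formalised
\matita{} implementation differs in its details.

\begin{lstlisting}[language={[Objective]Caml}]
module IntSet = Set.Make (Int)

type node =
  | Cost_label of int     (* a cost label statement and its successor *)
  | Instr of int          (* an ordinary single-successor instruction *)
  | Branch of int list    (* all successors are cost labels, by an earlier check *)
  | Return

(* true if, from every node, following successors reaches a cost label, a
   branch or a return within a bounded number of steps, i.e. the graph
   contains no label-free cycle *)
let check_bounded (cfg : (int * node) list) : bool =
  let tbl = Hashtbl.create 17 in
  List.iter (fun (k, n) -> Hashtbl.add tbl k n) cfg;
  (* nodes not yet known to reach a cost label within a bound *)
  let todo = ref (List.fold_left (fun s (k, _) -> IntSet.add k s) IntSet.empty cfg) in
  (* follow successors from n, accumulating the label-free chain seen so far *)
  let rec follow chain n =
    if not (IntSet.mem n !todo) then Some chain        (* already verified *)
    else if IntSet.mem n chain then None               (* label-free cycle found *)
    else
      match Hashtbl.find tbl n with
      | Cost_label _ | Branch _ | Return -> Some (IntSet.add n chain)
      | Instr n' -> follow (IntSet.add n chain) n'
  in
  let ok = ref true in
  while !ok && not (IntSet.is_empty !todo) do
    match follow IntSet.empty (IntSet.choose !todo) with
    | Some chain -> todo := IntSet.diff !todo chain
    | None -> ok := false
  done;
  !ok
\end{lstlisting}
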
921Directly reasoning about the function that implements this would be
922rather awkward, so an inductive specification of a single step of its
923behaviour was written and proved to match the implementation.  This
924was then used to prove the implementation sound and complete.
925
While we have not attempted to prove that the cost labelling properties
927are established and preserved earlier in the compiler, we expect that
928the effort for the \textsf{Cminor} to \textsf{RTLabs} stage alone
929would be similar to the work outlined above, because it involves the
930change from requiring a cost label at particular positions to
931requiring cost labels to break loops in the CFG.  As there are another
932three passes to consider (including the labelling itself), we believe
933that using the check above is much simpler overall.
934
935% TODO? Found some Clight to Cminor bugs quite quickly
936
937\section{Existence of a structured trace}
938\label{sec:structuredtrace}
939
940The \emph{structured traces} introduced in Section~\ref{sec:fegoals} enrich the execution trace of a program by
941nesting function calls in a mixed-step style and embedding the cost labelling
properties of the program.  They were originally designed to support the
943proof of correctness for the timing analysis of the object code in the
944back-end, then generalised to provide a common structure to use from
945the end of the front-end to the object code.  See
946Figure~\ref{fig:strtrace} on page~\pageref{fig:strtrace} for an
947illustration of a structured trace.
948
949To make the definition generic we abstract over the semantics of the
950language,
951\begin{lstlisting}[language=matita]
952record abstract_status : Type[1] :=
953  { as_status :> Type[0]
954  ; as_execute : relation as_status
955  ; as_pc : DeqSet
956  ; as_pc_of : as_status $\rightarrow$ as_pc
957  ; as_classify : as_status $\rightarrow$ status_class
958  ; as_label_of_pc : as_pc $\rightarrow$ option costlabel
959  ; as_after_return : ($\Sigma$s:as_status. as_classify s =  cl_call) $\rightarrow$ as_status $\rightarrow$ Prop
960  ; as_result: as_status $\rightarrow$ option int
961  ; as_call_ident : ($\Sigma$s:as_status.as_classify s = cl_call) $\rightarrow$ ident
962  ; as_tailcall_ident : ($\Sigma$s:as_status.as_classify s = cl_tailcall) $\rightarrow$ ident
963  }.
964\end{lstlisting}
965which gives a type of states, an execution relation\footnote{All of
966  our semantics are executable, but using a relation was simpler in
967  the abstraction.}, some notion of
968program counters with decidable equality, the classification of states,
969and functions to extract the observable intensional information (cost
970labels and the identity of functions that are called).  The
971\lstinline'as_after_return' property links the state before a function
972call with the state after return, providing the evidence that
973execution returns to the correct place.  The precise form varies
between stages; in \textsf{RTLabs} it insists that the CFG, the pointer to
the CFG node to execute next, and some call stack information are
preserved.
977
978The structured traces are defined using three mutually inductive
979types.  The core data structure is \lstinline'trace_any_label', which
980captures some straight-line execution until the next cost label or the
return from the enclosing function.  Any function call is embedded as
a single step, with its own trace nested inside and the before and
after states linked by \lstinline'as_after_return'; and states
984classified as a `jump' (in particular branches) must be followed by a
985cost label.
986
987The second type, \lstinline'trace_label_label', is a
988\lstinline'trace_any_label' where the initial state is cost labelled.
989Thus a trace in this type identifies a series of steps whose cost is
990entirely accounted for by the label at the start.
991
Finally, \lstinline'trace_label_return' is a sequence of
\lstinline'trace_label_label' values which ends in the return from the
994function.  These correspond to a measurable subtrace, and in
995particular include executions of an entire function call (and so are
996used for the nested calls in \lstinline'trace_any_label').
997
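For readers unfamiliar with the \matita{} development, the following greatly
simplified \ocaml{} sketch shows the shape of the three mutually defined
types; the real definitions are dependently typed over an
\lstinline'abstract_status' record and carry the proof obligations mentioned
above, all of which are omitted here.

\begin{lstlisting}[language={[Objective]Caml}]
type state = int   (* stands in for an abstract machine state *)

(* straight-line execution up to the next cost label or the return from
   the enclosing function; a call appears as a single step carrying the
   whole nested trace of the callee *)
type trace_any_label =
  | Tal_step       of state * trace_any_label
  | Tal_call       of state * trace_label_return * trace_any_label
  | Tal_end_label  of state                  (* the next state is cost labelled *)
  | Tal_end_return of state                  (* the enclosing function returns *)

(* a trace_any_label whose initial state is cost labelled, so the label
   accounts for the cost of every step it contains *)
and trace_label_label =
  | Tll of state * trace_any_label

(* a sequence of label-to-label sections ending in the return of the function *)
and trace_label_return =
  | Tlr_single of trace_label_label
  | Tlr_step   of trace_label_label * trace_label_return
\end{lstlisting}
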
998\subsection{Construction}
999
1000The construction of the structured trace replaces syntactic cost
1001labelling properties, which place requirements on where labels appear
in the program, with semantic properties that constrain the execution
1003traces of the program.  The construction begins by defining versions
1004of the sound and precise labelling properties on states and global
1005environments (for the code that appears in each of them) rather than
1006whole programs, and showing that these are preserved by steps of the
1007\textsf{RTLabs} semantics.
1008
1009Then we show that each cost labelling property the structured traces
1010definition requires is locally satisfied.  These are broken up by the
1011classification of states.  Similarly, we prove a step-by-step stack
1012preservation result, which states that the \textsf{RTLabs} semantics
1013never changes the lower parts of the stack.
1014
1015The core part of the construction of a structured trace is to use the
1016proof of termination from the measurable trace (defined on
1017page~\pageref{prog:terminationproof}) to `fold up' the execution into
1018the nested form.  The results outlined above fill in the proof
1019obligations for the cost labelling properties and the stack
1020preservation result shows that calls return to the correct location.
1021
1022The structured trace alone is not sufficient to capture the property
1023that the program is soundly labelled.  While the structured trace
1024guarantees termination, it still permits a loop to be executed a
1025finite number of times without encountering a cost label.  We
eliminate this by proving that no `program counter' repeats within any
\lstinline'trace_any_label' section, since a repetition would be incompatible
with the property that there is a bound on the number of successor
instructions you can follow in the CFG before you encounter a cost
label.
1031
1032\subsubsection{Complete execution structured traces}
1033
1034The development of the construction above started relatively early,
1035before the measurable subtrace preservation proofs.  To be confident
1036that the traces were well-formed at that time, we also developed a
1037complete execution form that embeds the traces above.  This includes
1038non-terminating program executions, where an infinite number of the terminating
1039structured traces are embedded.  This construction confirmed that our
1040definition of structured traces was consistent, although we later
1041found that we did not require the whole execution version for the
1042compiler correctness results.
1043
1044To construct these we need to know whether each function call will
1045eventually terminate, requiring the use of the excluded middle.  This
1046classical reasoning is local to the construction of whole program
1047traces and is not necessary for our main results.
1048
1049\section{Conclusion}
1050
In combination with the work on the \cerco{} back-end and by
1052concentrating on the novel intensional parts of the proof, we have
1053shown that it is possible to construct certifying compilers that
1054correctly report execution time and stack space costs.  The layering
1055of intensional correctness proofs on top of normal simulation results
1056provides a useful separation of concerns, and could permit the reuse
1057of existing results.
1058
1059
1060
% TODO: appendix on code layout?
1062
1063\bibliographystyle{plain}
1064\bibliography{report}
1065
1066\end{document}