source: Deliverables/D3.4/Report/report.tex @ 3128

Last change on this file since 3128 was 3128, checked in by campbell, 7 years ago

Start of D3.4.
(Sorry it's taking longer than anticipated; I blame a migraine, possibly
induced by a Javascript Quadrocopter attack.)

File size: 16.1 KB
Line 
1\documentclass[11pt,epsf,a4wide]{article}
2\usepackage[mathletters]{ucs}
3\usepackage[utf8x]{inputenc}
4\usepackage{listings}
5\usepackage{../../style/cerco}
6\newcommand{\ocaml}{OCaml}
7\newcommand{\clight}{Clight}
8\newcommand{\matita}{Matita}
9\newcommand{\sdcc}{\texttt{sdcc}}
10
11\newcommand{\textSigma}{\ensuremath{\Sigma}}
12
13% LaTeX Companion, p 74
14\newcommand{\todo}[1]{\marginpar{\raggedright - #1}}
15
16\lstdefinelanguage{coq}
17  {keywords={Definition,Lemma,Theorem,Remark,Qed,Save,Inductive,Record},
18   morekeywords={[2]if,then,else},
19  }
20
21\lstdefinelanguage{matita}
22  {keywords={definition,lemma,theorem,remark,inductive,record,qed,let,rec,match,with,Type,and,on},
23   morekeywords={[2]whd,normalize,elim,cases,destruct},
24   mathescape=true,
25   morecomment=[n]{(*}{*)},
26  }
27
28\lstset{language=matita,basicstyle=\small\tt,columns=flexible,breaklines=false,
29        keywordstyle=\color{red}\bfseries,
30        keywordstyle=[2]\color{blue},
31        commentstyle=\color{green},
32        stringstyle=\color{blue},
33        showspaces=false,showstringspaces=false}
34
35\lstset{extendedchars=false}
36\lstset{inputencoding=utf8x}
37\DeclareUnicodeCharacter{8797}{:=}
38\DeclareUnicodeCharacter{10746}{++}
39\DeclareUnicodeCharacter{9001}{\ensuremath{\langle}}
40\DeclareUnicodeCharacter{9002}{\ensuremath{\rangle}}
41
42
43\title{
44INFORMATION AND COMMUNICATION TECHNOLOGIES\\
45(ICT)\\
46PROGRAMME\\
47\vspace*{1cm}Project FP7-ICT-2009-C-243881 \cerco{}}
48
49\date{ }
50\author{}
51
52\begin{document}
53\thispagestyle{empty}
54
55\vspace*{-1cm}
56\begin{center}
57\includegraphics[width=0.6\textwidth]{../../style/cerco_logo.png}
58\end{center}
59
60\begin{minipage}{\textwidth}
61\maketitle
62\end{minipage}
63
64
65\vspace*{0.5cm}
66\begin{center}
67\begin{LARGE}
68\bf
69Report n. D3.4\\
70Front-end Correctness Proofs\\
71\end{LARGE} 
72\end{center}
73
74\vspace*{2cm}
75\begin{center}
76\begin{large}
77Version 1.0
78\end{large}
79\end{center}
80
81\vspace*{0.5cm}
82\begin{center}
83\begin{large}
84Authors:\\
85Brian~Campbell, Ilias~Garnier, James~McKinna, Ian~Stark
86\end{large}
87\end{center}
88
89\vspace*{\fill}
90\noindent
91Project Acronym: \cerco{}\\
92Project full title: Certified Complexity\\
93Proposal/Contract no.: FP7-ICT-2009-C-243881 \cerco{}\\
94
95\clearpage \pagestyle{myheadings} \markright{\cerco{}, FP7-ICT-2009-C-243881}
96
97\newpage
98
99\vspace*{7cm}
100\paragraph{Abstract}
101We report on the correctness proofs for the front-end of the \cerco{} cost
102lifting compiler, considering three distinct parts of the task: showing that
103the \emph{annotated source code} output by the compiler has equivalent
104behaviour to the original input (up to the annotations); showing that a
105\emph{measurable} subtrace of the annotated source code corresponds to an
106equivalent measurable subtrace in the code produced by the front-end, including
107costs; and finally showing that the enriched \emph{structured} execution traces
108required for cost correctness in the back-end can be constructed from the
109properties of the code produced by the front-end.
110
111A key part of our work is that the intensional correctness results that show
112that we get consistent cost measurements throughout the intermediate languages
113of the compiler can be layered on top of normal forward simulation results,
114if we split them into local call-structure preserving results.
115
116This deliverable shows correctness results about the formalised compiler
117described in D3.2, using the source language semantics from D3.1 and
118intermediate language semantics from D3.3.  Together with the companion
119deliverable about the correctness of the back-end, D4.4, we obtain results
120about the whole formalised compiler.
121
122% TODO: mention the deliverable about the extracted compiler et al?
123
124\newpage 
125
126\tableofcontents
127
128% CHECK: clear up any -ize vs -ise
129% CHECK: clear up any "front end" vs "front-end"
130% CHECK: clear up any mentions of languages that aren't textsf'd.
131% CHECK: fix unicode in listings
132
133\section{Introduction}
134
135TODO: briefly, cerco compiler's purpose, prototype, formalisation, whole
136compiler.  Full functional correctness too ambitious; fortunately get nice
137layering of intensional correctness on top of mildly refined normal simulations.
138Hence, concentrate on new intensional correctness results; add stack space for
139more precise statement.  Also do some
140translation validation on sound, precise labelling properties.  Fluffly version
141of correctness statement.
142
143The \cerco{} compiler compiles C code targeting microcontrollers implementing
144the Intel 8051 architecture, which produces both the object code and source
145code containing annotations describing the timing behavior of the object code.
146There are two versions: first, an initial prototype implemented in
147\ocaml{}~\cite{d2.2}, and a version formalised in the \matita{} proof
148assistant~\cite{d3.2,d4.2} and then extracted to \ocaml{} code to produce an
149executable compiler.  In this document we present results formalised in
150\matita{} about the front-end of that version of the compiler, and how that fits
151into the verification of the whole compiler.
152
153% TODO: maybe mention stack space here?  other additions?  refer to "layering"?
154A key part of this work was to separate the proofs about the compiled code's
155extensional behaviour (that is, the functional correctness of the compiler)
156from the intensional correctness that the costs given are correct.
157Unfortunately, the ambitious goal of completely verifying the compiler was not
158feasible within the time available, but thanks to this separation of extensional
159and intensional proofs we are able to axiomatize simulation results similar to
160those in other compiler verification projects and concentrate on the novel
161intensional proofs.  The proofs were also made more tractable by introducing
162compile-time checks for the `sound and precise' cost labelling properties
163rather than proving that they are preserved throughout.
164
165The overall statement of correctness says that the annotated program has the
166same behaviour as the input, and that for any suitably well-structured part of
167the execution (which we call \emph{measurable}), the object code will execute
168the same behaviour taking precisely the time given by the cost annotations in
169the annotated source program.
170
171In the next section we recall the structure of the compiler and make the overall
172statement more precise.  Following that, in Section~\ref{sec:fegoals} we
173describe the statements we need to prove about the intermediate \textsf{RTLabs}
174programs sufficient for the back-end proofs.  TODO: rest of document structure
175
176\section{The compiler and main goals}
177
178TODO: outline compiler, maybe figure from talk, maybe something like the figure
179below.
180
181TODO: might want a version of this figure
182\begin{figure}
183\begin{center}
184\begin{minipage}{.8\linewidth}
185\begin{tabbing}
186\quad \= $\downarrow$ \quad \= \kill
187\textsf{C} (unformalized)\\
188\> $\downarrow$ \> CIL parser (unformalized \ocaml)\\
189\textsf{Clight}\\
190\> $\downarrow$ \> cast removal\\
191\> $\downarrow$ \> add runtime functions\\
192\> $\downarrow$ \> labelling\\
193\> $\downarrow$ \> stack variable allocation and control structure
194 simplification\\
195\textsf{Cminor}\\
196\> $\downarrow$ \> generate global variable initialization code\\
197\> $\downarrow$ \> transform to RTL graph\\
198\textsf{RTLabs}\\
199\> $\downarrow$ \> start of target specific back-end\\
200\>\quad \vdots
201\end{tabbing}
202\end{minipage}
203\end{center}
204\caption{Front-end languages and transformations}
205\label{fig:summary}
206\end{figure}
207
208The compiler function returns the following record on success:
209\begin{lstlisting}[language=matita]
210record compiler_output : Type[0] :=
211 { c_labelled_object_code: labelled_object_code
212 ; c_stack_cost: stack_cost_model
213 ; c_max_stack: nat
214 ; c_init_costlabel: costlabel
215 ; c_labelled_clight: clight_program
216 ; c_clight_cost_map: clight_cost_map
217 }.
218\end{lstlisting}
219It consists of annotated 8051 object code, a mapping from function
220identifiers to the function's stack space usage\footnote{The compiled
221  code's only stack usage is to allocate a fixed-size frame on each
222  function entry and discard it on exit.}, a cost label covering the
223initialisation of global variables and calling the
224\lstinline[language=C]'main' function, the annotated source code, and
225finally a mapping from cost labels to actual execution time costs.
226
227An \ocaml{} pretty printer is used to provide a concrete version of the output
228code.  In the case of the annotated source code, it also inserts the actual
229costs alongside the cost labels, and optionally adds a global cost variable
230and instrumentation to support further reasoning.  \todo{Cross-ref case study
231deliverables}
232
233\subsection{Revisions to the prototype compiler}
234
235TODO: could be a good idea to have this again
236
237\subsection{Main goals}
238
239TODO: need an example for this
240
241Informally, our main intensional result links the time difference in a source
242code execution to the time difference in the object code, expressing the time
243for the source by summing the values for the cost labels in the trace, and the
244time for the target by a clock built in to the 8051 executable semantics.
245
246The availability of precise timing information for 8501
247implementations and the design of the compiler allow it to give exact
248time costs in terms of processor cycles.  However, these exact results
249are only available if the subtrace we measure starts and ends at
250suitable points.  In particular, pure computation with no observable
251effects may be reordered and moved past cost labels, so we cannot
252measure time between arbitrary statements in the program.
253
254There is also a constraint on the subtraces that we
255measure due to the requirements of the correctness proof for the
256object code timing analysis.  To be sure that the timings are assigned
257to the correct cost label, we need to know that each return from a
258function call must go to the correct return address.  It is difficult
259to observe this property locally in the object code because it relies
260on much earlier stages in the compiler.  To convey this information to
261the timing analysis extra structure is imposed on the subtraces, which
262we will give more details on in Section~\ref{sec:fegoals}.
263
264% Regarding the footnote, would there even be much point?
265These restrictions are reflected in the subtraces that we give timing
266guarantees on; they must start at a cost label and end at the return
267of the enclosing function of the cost label\footnote{We expect that
268  this would generalise to subtraces between cost labels in the same
269  function, but could not justify the extra complexity that would be
270  required to show this.}.  A typical example of such a subtrace is
271the execution of an entire function from the cost label at the start
272of the function until it returns.  We call such any such subtrace
273\emph{measurable} if it (and the prefix of the trace before it) can
274also be executed within the available stack space.
275
276Now we can give the main intensional statement for the compiler.
277Given a \emph{measurable} subtrace for a labelled \textsf{Clight}
278program, there is a subtrace of the 8051 object code program where the
279time differences match.  Moreover, \emph{observable} parts of the
280trace also match --- these are the appearance of cost labels and
281function calls and returns.
282
283More formally, the definition of this statement in \matita{} is
284\begin{lstlisting}[language=matita]
285definition simulates :=
286  $\lambda$p: compiler_output.
287  let initial_status := initialise_status $\dots$ (cm (c_labelled_object_code $\dots$ p)) in
288  $\forall$m1,m2.
289   measurable Clight_pcs (c_labelled_clight $\dots$ p) m1 m2
290    (lookup_stack_cost (c_stack_cost $\dots$ p)) (c_max_stack $\dots$ p) $\rightarrow$
291  $\forall$c1,c2.
292   clock_after Clight_pcs (c_labelled_clight $\dots$ p) m1 (c_clight_cost_map $\dots$ p) = OK $\dots$ c1 $\rightarrow$
293   clock_after Clight_pcs (c_labelled_clight $\dots$ p) (m1+m2) (c_clight_cost_map $\dots$ p) = OK $\dots$ c2 $\rightarrow$
294  $\exists$n1,n2.
295   observables Clight_pcs (c_labelled_clight $\dots$ p) m1 m2 =
296   observables (OC_preclassified_system (c_labelled_object_code $\dots$ p))
297    (c_labelled_object_code $\dots$ p) n1 n2
298  $\wedge$
299   c2 - c1 = clock $\dots$ (execute n2 ? initial_status) - clock $\dots$ (execute n1 ? initial_status).
300\end{lstlisting}
301where the \lstinline'measurable', \lstinline'clock_after' and
302\lstinline'observables' definitions can be applied to multiple
303languages; in this case the \lstinline'Clight_pcs' record applies them
304to \textsf{Clight} programs.
305
306TODO: rest of correctness; discuss full functional correctness
307
308\section{Intermediate goals for the front-end}
309\label{sec:fegoals}
310
311The essential parts of the intensional proof were outlined during work
312on a toy
313compiler~\cite{d2.1,springerlink:10.1007/978-3-642-32469-7_3}.  These
314are
315\begin{enumerate}
316\item functional correctness, in particular preserving the trace of
317  cost labels,
318\item the \emph{soundness} and \emph{precision} of the cost labelling
319  on the object code, and
320\item the timing analysis on the object code produces a correct
321  mapping from cost labels to time.
322\end{enumerate}
323
324However, that toy development did not include function calls.  For the
325full \cerco{} compiler we also need to maintain the invariant that
326functions return to the correct program location in the caller, as we
327mentioned in the previous section.  During work on the back-end timing
328analysis (describe in more detail in the companion deliverable, D4.4)
329the notion of a \emph{structured trace} was developed to enforce this
330return property, and also most of the cost labelling properties too.
331
332\begin{figure}
333\begin{center}
334\includegraphics[width=0.5\linewidth]{compiler.pdf}
335\end{center}
336\caption{The compiler and proof outline}
337\label{fig:compiler}
338\end{figure}
339
340Jointly, we generalised the structured traces to apply to any of the
341intermediate languages with some idea of program counter.  This means
342that they are introduced part way through the compiler, see
343Figure~\ref{fig:compiler}.  Proving that a structured trace can be
344constructed at \textsf{RTLabs} has several virtues:
345\begin{itemize}
346\item This is the first language where every operation has its own
347  unique, easily addressable, statement.
348\item Function calls and returns are still handled in the language and
349  so the structural properties are ensured by the semantics.
350\item Many of the back-end languages from \textsf{RTL} share a common
351  core set of definitions, and using structured traces throughout
352  increases this uniformity.
353\end{itemize}
354
355\todo{Nice version of the nested trace diagram from slides}
356A structured trace is a mutually inductive data type which principally
357contains the steps from a normal program trace, but arranged into a
358nested structure which groups entire function calls together and
359aggregates individual steps between cost labels (or between the final
360cost label and the return from the function).  This capture the idea
361that the cost labels only represent costs \emph{within} a function ---
362calls to other functions are accounted for in the trace for their
363execution, and we can locally regard function calls as a single step.
364
365These structured traces form the core part of the intermediate results
366that we must prove so that the back-end can complete the main
367intensional result stated above.  In full, we provide the back-end
368with
369\begin{enumerate}
370\item A normal trace of the \textbf{prefix} of the program's execution
371  before reaching the measurable subtrace.  (This needs to be
372  preserved so that we know that the stack space consumed is correct.)
373\item The \textbf{structured trace} corresponding to the measurable
374  subtrace.
375\item An additional property about the structured trace that no
376  `program counter' is repeated between cost labels.  Together with
377  the structure in the trace, this takes over from showing that
378  cost labelling is sound and precise.
379\item A proof that the observables have been preserved.
380\item A proof that the stack limit is still observed by the prefix and
381  the structure trace.  (This is largely a consequence of the
382  preservation of observables.)
383\end{enumerate}
384
385Following the outline in Figure~\ref{fig:compiler}, we will first deal
386with the transformations in \textsf{Clight} that produce the source
387program with cost labels, then show that measurable traces can be
388lifted to \textsf{RTLabs}, and finally that we can construct the
389properties listed above ready for the back-end proofs.
390
391\section{Input code to cost labelled program}
392
393\section{Finding corresponding measurable subtraces}
394
395\section{Existence of a structured trace}
396\label{sec:structuretrace}
397
398\section{Conclusion}
399
400TODO
401
402\bibliographystyle{plain}
403\bibliography{report}
404
405\end{document}
Note: See TracBrowser for help on using the repository browser.