% source: Deliverables/D1.2/CompilerProofOutline/outline.tex @ 1744
% Last change on this file since 1744 was 1744, checked in by boender, 8 years ago:
%   New version of effort table
\documentclass[a4paper, 10pt]{article}

\usepackage{a4wide}
\usepackage{amsfonts}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage[english]{babel}
\usepackage{color}
\usepackage{diagrams}
\usepackage{graphicx}
\usepackage[colorlinks]{hyperref}
\usepackage[utf8x]{inputenc}
\usepackage{listings}
\usepackage{microtype}
\usepackage{skull}
\usepackage{stmaryrd}
\usepackage{wasysym}

\lstdefinelanguage{matita-ocaml} {
  mathescape=true
}
\lstset{
  language=matita-ocaml,basicstyle=\tt,columns=flexible,breaklines=false,
  showspaces=false, showstringspaces=false, extendedchars=false,
  inputencoding=utf8x, tabsize=2
}

\title{Proof outline for the correctness of the CerCo compiler}
\date{\today}
\author{The CerCo team}

\begin{document}

\maketitle

\section{Introduction}
\label{sect.introduction}

At the last project review of the CerCo project, the reviewers suggested that we write down some of the statements of the correctness theorems that we intend to prove about the complexity-preserving compiler.
This document provides a high-level, pen-and-paper sketch of what we view as the best path to completing the correctness proof for the compiler.
In particular, for every translation between two intermediate languages, in both the front- and back-ends, we identify the key translation steps and the invariants that we view as important for the correctness proof.

\section{Front-end: Clight to RTLabs}

The front-end of the CerCo compiler consists of several stages:

\begin{center}
\begin{minipage}{.8\linewidth}
\begin{tabbing}
\quad \= $\downarrow$ \quad \= \kill
\textsf{Clight}\\
\> $\downarrow$ \> cast removal\\
\> $\downarrow$ \> add runtime functions\footnote{Following the last project meeting we intend to move this transformation to the back-end.}\\
\> $\downarrow$ \> cost labelling\\
\> $\downarrow$ \> loop optimizations\footnote{\label{lab:opt2}To be ported from the untrusted compiler and certified only in case of early completion of the certification of the other passes.} (an endo-transformation)\\
\> $\downarrow$ \> partial redundancy elimination$^{\mbox{\scriptsize \ref{lab:opt2}}}$ (an endo-transformation)\\
\> $\downarrow$ \> stack variable allocation and control structure simplification\\
\textsf{Cminor}\\
\> $\downarrow$ \> generate global variable initialisation code\\
\> $\downarrow$ \> transform to RTL graph\\
\textsf{RTLabs}\\
\> $\downarrow$ \> \\
\>\,\vdots
\end{tabbing}
\end{minipage}
\end{center}

Here, by `endo-transformation', we mean a transformation from a language back to itself: for example, the loop optimization step maps Clight programs to Clight programs.

%Our overall statements of correctness with respect to costs will
%require a correctly labelled program
There are three layers in most of the proofs proposed:
\begin{enumerate}
\item invariants closely tied to the syntax and transformations using
  dependent types,
\item a forward simulation proof, and
\item syntactic proofs about the cost labelling.
\end{enumerate}
The first will support functional correctness and also allow us to show
the totality of some of the compiler stages (that is, that these stages
of the compiler cannot fail).  The second provides the main functional
correctness result, and the last will be crucial for applying
correctness results about the costings from the back-end.

We will also prove that a suitably labelled RTLabs trace can be turned
into a \emph{structured trace}, which splits the execution trace into
cost-label-to-cost-label chunks with nested function calls.  This
structure was identified during work on the correctness of the
back-end cost analysis as retaining important information about the
structure of the execution that is difficult to reconstruct later in
the compiler.

\subsection{Clight cast removal}

This pass removes some of the casts inserted by the parser to make
arithmetic promotion explicit, when they are superfluous (such as
\lstinline[language=C]'c = (short)((int)a + (int)b);').

The transformation only affects Clight expressions, recursively
detecting casts that can be safely eliminated.  The semantics provides
a big-step definition for expressions, so we should be able to show a
lock-step forward simulation using a lemma showing that cast
elimination does not change the evaluation of expressions.  This lemma
will follow from a structural induction on the source expression.  We
have already proved a few of the underlying arithmetic results
necessary to validate the approach.

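The shape of the transformation can be illustrated with a small executable sketch.  The expression representation below is ours, chosen for illustration, not the CerCo one; moreover, the real pass must justify each elimination with arithmetic lemmas, whereas the sketch performs the rewrite purely syntactically.

```python
# Toy expression AST as nested tuples (illustrative, not the CerCo syntax):
#   ("var", name), ("cast", bits, e), ("add", e1, e2)

def simplify(e):
    """Remove widening casts cancelled by an outer narrowing cast:
    (short)((int)a + (int)b) rewrites to (short)(a + b)."""
    if e[0] == "cast":
        bits, inner = e[1], e[2]
        if (inner[0] == "add"
                and inner[1][0] == "cast" and inner[1][1] >= bits
                and inner[2][0] == "cast" and inner[2][1] >= bits):
            # Both operands are widened and the result is narrowed back:
            # drop the inner casts and keep the outer truncation.
            return ("cast", bits,
                    ("add", simplify(inner[1][2]), simplify(inner[2][2])))
        return ("cast", bits, simplify(inner))
    if e[0] == "add":
        return ("add", simplify(e[1]), simplify(e[2]))
    return e
```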
\subsection{Clight cost labelling}

This pass adds cost labels before and after selected statements and
expressions; the execution traces ought to be equivalent modulo cost
labels.  Hence it requires a simple forward simulation with a limited
amount of stuttering wherever a new cost label is introduced.  A bound
on the amount of stuttering allowed can be given based on the
statement or continuation to be evaluated next.

We also intend to show three syntactic properties about the cost
labelling:
\begin{itemize}
\item every function starts with a cost label,
\item every branching instruction is followed by a cost label (note that
  exiting a loop is treated as a branch), and
\item the head of every loop (and any \lstinline'goto' destination) is
  a cost label.
\end{itemize}
These can be shown by structural induction on the source term.

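The second property, for instance, amounts to a simple recursive check over the syntax.  The following sketch uses a toy statement type of our own devising (not the CerCo Clight syntax) to show the shape of such a check:

```python
# Toy statement type (illustrative):
#   ("skip",), ("label", k, s), ("if", c, s1, s2), ("seq", s1, s2), ("while", c, s)

def is_labelled(s):
    return s[0] == "label"

def branches_labelled(s):
    """Every branch target (both arms of an `if', the head of a loop)
    must start with a cost label."""
    if s[0] == "skip":
        return True
    if s[0] == "label":
        return branches_labelled(s[2])
    if s[0] == "if":
        return (is_labelled(s[2]) and is_labelled(s[3])
                and branches_labelled(s[2]) and branches_labelled(s[3]))
    if s[0] == "seq":
        return branches_labelled(s[1]) and branches_labelled(s[2])
    if s[0] == "while":
        return is_labelled(s[2]) and branches_labelled(s[2])
    raise ValueError("unknown statement: %r" % (s,))
```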
\subsection{Clight to Cminor translation}

This translation is the first to introduce some invariants, with the
proofs closely tied to the implementation by dependent typing.  These
are largely complete and show that the generated code enjoys:
\begin{itemize}
\item some minimal type safety, shown by explicit checks on the
  Cminor types during the transformation (a little more work remains
  to be done here, but it follows the same form);
\item that variables named in the parameter and local variable
  environments are distinct from one another, again by an explicit
  check;
\item that variables used in the generated code are present in the
  resulting environment (either by checking their presence in the
  source environment, or because they come from a list of freshly
  generated temporary variables); and
\item that all \lstinline[language=C]'goto' labels are present (by
  checking them against a list of source labels and proving that all
  source labels are preserved).
\end{itemize}

The simulation will be similar to the relevant stages of CompCert
(Clight to Csharpminor and Csharpminor to Cminor --- in the event that
the direct proof is unwieldy we could introduce a corresponding
intermediate language).  During early experimentation with porting
CompCert definitions to the Matita proof assistant we found little
difficulty reproving the results for the memory model, so we plan to
port the memory injection properties and use them to relate Clight
in-memory variables with either a local variable valuation or a stack
slot, depending on how each variable was classified.

This should be sufficient to show the equivalence of (big-step)
expression evaluation.  The simulation can then be shown by relating
corresponding blocks of statements and continuations with their Cminor
counterparts and proving that a few steps reach the next matching
state.

The syntactic properties required for cost labels remain similar, and a
structural induction on the function bodies should be sufficient to
show that they are preserved.

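The distinctness invariant in the second item is, extensionally, a very simple check; we sketch it below for illustration (the actual check lives inside the dependently typed translation and carries its proof with it):

```python
# Sketch of the explicit distinctness check on variable environments:
# parameter and local variable names must be pairwise distinct.
# (Illustrative only; names and representation are ours.)

def env_ok(params, locals_):
    names = params + locals_
    # Pairwise distinct iff no name is duplicated in the combined list.
    return len(names) == len(set(names))
```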
\subsection{Cminor global initialisation code}

This short phase replaces the global variable initialisation data with
code that executes when the program starts.  Each piece of
initialisation data in the source is matched by a new statement
storing that data.  As each global variable is allocated a distinct
memory block, the program state after the initialisation statements
will be the same as the original program's state at the start of
execution, and will proceed in the same manner afterwards.

% Actually, the above is wrong...
% ... this ought to be in a fresh main function with a fresh cost label

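The generation itself is a straightforward map over the initialisation data, as the following sketch shows; the representation is illustrative, and, per the remark above, the generated statements ought to be placed in a fresh main function prefixed with a fresh cost label.

```python
# Sketch of replacing global initialisation data by initialisation code:
# each datum becomes a store into the variable's (distinct) memory block.
# (Illustrative representation, not the CerCo Cminor syntax.)

def init_code(data):
    """data: list of (variable, offset, value) initialisation triples."""
    return [("store", var, off, val) for (var, off, val) in data]
```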
\subsection{Cminor to RTLabs translation}

In this part of the compiler we transform the program's functions into
control flow graphs.  It is closely related to CompCert's Cminorsel to
RTL transformation, albeit with target-independent operations.

We already enforce several invariants with dependent types: some type
safety, enforced mostly using the type information from Cminor; and
that the graph is closed (by showing that each successor was recently
added, or corresponds to a \lstinline[language=C]'goto' label, all of
which are added before the end).  Note that this relies on a
monotonicity property; CompCert maintains a similar property in a
similar way while building RTL graphs.  We will also add a result
showing that all of the pseudo-register names are distinct, for use by
later stages, using the same method as for Cminor.

The simulation will relate Cminor states to RTLabs states about to
execute the corresponding code (with a corresponding continuation).
Each Cminor statement becomes zero or more RTLabs statements, with a
decreasing measure based on the statement and continuations, similar to
CompCert's.  We may also follow CompCert in using a relational
specification of this stage so as to abstract away from the functional
(and highly dependently typed) definition.

The first two labelling properties remain as before; we will show that
cost labels are preserved, so the function entry point will be a cost
label, and successors that are cost labels still map to cost labels,
preserving the condition on branches.  We replace the property for
loops with the notion that following a bounded number of successors
always reaches a cost label or the end of the function.  This can
easily be seen in Cminor using the requirement for cost labels at the
head of loops and after gotos.  It remains to show that this is
preserved by the translation to RTLabs.  % how?

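The bounded-successors property can be stated as a simple bounded search on the graph.  The sketch below uses a toy control-flow graph representation of our own; the point is only the shape of the property, not the CerCo definitions:

```python
# Toy CFG: graph maps a program point to ("label",), ("exit",) or
# ("step", next).  (Illustrative representation.)

def reaches_label(graph, bound, i):
    """From node i, do we reach a cost label or the end of the function
    within `bound` successor steps?"""
    while bound > 0:
        node = graph[i]
        if node[0] in ("label", "exit"):
            return True
        i = node[1]
        bound -= 1
    return False
```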
\subsection{RTLabs structured trace generation}

This proof-only step incorporates the function call structure and cost
labelling properties into the execution trace.  As the function calls
are nested within the trace, we need to distinguish between
terminating and non-terminating function calls.

We use the law of the excluded middle (specialised to a function
termination property) to make this distinction.  Terminating function
calls are handled by following the trace, breaking it into chunks
between cost labels and recursively processing function calls.  The
main difficulties here are the non-structurally recursive nature of
the function (instead we use the size of the termination proof as a
measure) and using the properties of the RTLabs cost labelling to
create the correct structure.  We also show that the lower stack
frames are preserved during function calls, in order to prove that
after returning from a function call we execute the correct code.
This part of the work has already been constructed, but still requires
a simple proof to show that flattening the structured trace recreates
the original flat trace.

The non-terminating case uses the excluded middle to decide whether to
use the function described above to construct an inductive terminating
structured trace, or a coinductive version for non-termination.  It
follows the trace like the terminating version to build up chunks of
trace from cost label to cost label (which can be represented by an
inductive type, thanks to the bounded distance to a cost label shown
before), unless it encounters a non-terminating function, in which
case it corecursively processes that.  This part of the trace
transformation is currently under construction, and will also need a
flattening result to show that it is correct.

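For the terminating case, the flattening result mentioned above can be pictured on a toy model of structured traces; the representation below is an illustration of ours, not the CerCo definition:

```python
# A (terminating) structured trace is a list of items, each either
#   ("chunk", label, steps)  -- a cost-label-to-cost-label chunk, or
#   ("call", subtrace)       -- a nested, terminating function call.
# Flattening should recreate the original flat trace of steps.

def flatten(trace):
    flat = []
    for item in trace:
        if item[0] == "chunk":
            flat.append(("label", item[1]))
            flat.extend(("instr", s) for s in item[2])
        else:  # ("call", subtrace)
            flat.extend(flatten(item[1]))
    return flat
```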
\section{Back-end: RTLabs to machine code}
\label{sect.backend.rtlabs.machine.code}

The back-end of the compiler consists of the following intermediate languages and stages of translation:

\begin{center}
\begin{minipage}{.8\linewidth}
\begin{tabbing}
\quad \=\,\vdots\= \\
\> $\downarrow$ \>\\
\> $\downarrow$ \quad \= \kill
\textsf{RTLabs}\\
\> $\downarrow$ \> copy propagation\footnote{\label{lab:opt}To be ported from the untrusted compiler and certified only in case of early completion of the certification of the other passes.} (an endo-transformation) \\
\> $\downarrow$ \> instruction selection\\
\textsf{RTL}\\
\> $\downarrow$ \> constant propagation$^{\mbox{\scriptsize \ref{lab:opt}}}$ (an endo-transformation) \\
\> $\downarrow$ \> calling convention made explicit \\
\> $\downarrow$ \> layout of activation records \\
\textsf{ERTL}\\
\> $\downarrow$ \> register allocation and spilling\\
\> $\downarrow$ \> dead code elimination\\
\textsf{LTL}\\
\> $\downarrow$ \> function linearisation\\
\> $\downarrow$ \> branch compression (an endo-transformation) \\
\textsf{LIN}\\
\> $\downarrow$ \> relabeling\\
\textsf{ASM}\\
\> $\downarrow$ \> pseudoinstruction expansion\\
\textsf{MCS-51 machine code}\\
\end{tabbing}
\end{minipage}
\end{center}

\subsection{The RTLabs to RTL translation}
\label{subsect.rtlabs.rtl.translation}

% dpm: type system and casting load (???)
We require a map, $\sigma$, from \texttt{Values} of the front-end memory model to lists of \texttt{BEValues} of the back-end memory model:

\begin{displaymath}
\mathtt{Value} ::= \bot \mid \mathtt{int(size)} \mid \mathtt{float} \mid \mathtt{null} \mid \mathtt{ptr} \quad\stackrel{\sigma}{\longrightarrow}\quad \mathtt{BEValue} ::= \bot \mid \mathtt{byte} \mid \mathtt{int}_i \mid \mathtt{ptr}_i
\end{displaymath}

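On machine integers, for instance, $\sigma$ splits a front-end value of a given size into its constituent back-end bytes.  The sketch below is an illustration under assumptions of ours (little-endian byte order, sizes in bytes), not the CerCo definition:

```python
# Sketch of the value map sigma on integers: a front-end integer value
# of `size` bytes maps to a list of back-end byte values, least
# significant byte first (an assumption made for illustration).

def sigma_int(size, n):
    return [(n >> (8 * i)) & 0xFF for i in range(size)]
```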
We further require a map, also written $\sigma$, between the front-end \texttt{Memory} and the back-end's notion of \texttt{BEMemory}, where a memory is a map from blocks to contents:

\begin{displaymath}
\mathtt{Mem}\ \alpha = \mathtt{Block} \rightarrow (\mathbb{Z} \rightarrow \alpha)
\end{displaymath}

where

\begin{displaymath}
\mathtt{Block} ::= \mathtt{Region} \cup \mathtt{ID}
\end{displaymath}

\begin{displaymath}
\mathtt{BEMem} = \mathtt{Mem}\ \mathtt{BEValue}
\end{displaymath}

\begin{displaymath}
\mathtt{Address} = \mathtt{BEValue} \times \mathtt{BEValue}
\end{displaymath}

\begin{displaymath}
\mathtt{Mem} = \mathtt{Block} \rightarrow (\mathbb{Z} \rightarrow \mathtt{Cont} \mid \mathtt{Value} \times \mathtt{Size} \mid \mathtt{null})
\end{displaymath}

For example, a front-end cell holding a value \texttt{v} of size 4, followed by continuation cells, is mapped to four back-end cells:

\begin{center}
\begin{picture}(2, 2)
\put(-150,-20){\framebox(25,25)[c]{\texttt{v}}}
\put(-125,-20){\framebox(25,25)[c]{\texttt{4}}}
\put(-100,-20){\framebox(25,25)[c]{\texttt{cont}}}
\put(-75,-20){\framebox(25,25)[c]{\texttt{cont}}}
\put(-50,-20){\framebox(25,25)[c]{\texttt{cont}}}
\put(-15,-10){\vector(1, 0){30}}
\put(25,-20){\framebox(25,25)[c]{\texttt{v1}}}
\put(50,-20){\framebox(25,25)[c]{\texttt{v2}}}
\put(75,-20){\framebox(25,25)[c]{\texttt{v3}}}
\put(100,-20){\framebox(25,25)[c]{\texttt{v4}}}
\end{picture}
\end{center}

Loads and stores must behave coherently with respect to $\sigma$:

\begin{displaymath}
\mathtt{load}\ s\ a\ M = \mathtt{Some}\ v \rightarrow \forall i \leq s.\ \mathtt{load}\ s\ (a + i)\ \sigma(M) = \mathtt{Some}\ v_i
\end{displaymath}

\begin{displaymath}
\sigma(\mathtt{store}\ v\ M) = \mathtt{store}\ \sigma(v)\ \sigma(M)
\end{displaymath}

\begin{displaymath}
\mathtt{load}^*\ (\mathtt{store}\ \sigma(v)\ \sigma(M))\ \sigma(a)\ \sigma(M) = \mathtt{load}^*\ \sigma(s)\ \sigma(a)\ \sigma(M)
\end{displaymath}

\begin{displaymath}
\begin{array}{rll}
\mathtt{State} & ::=  & (\mathtt{State} : \mathtt{Frame}^* \times \mathtt{Frame} \\
               & \mid & \mathtt{Call} : \mathtt{Frame}^* \times \mathtt{Args} \times \mathtt{Return} \times \mathtt{Fun} \\
               & \mid & \mathtt{Return} : \mathtt{Frame}^* \times \mathtt{Value} \times \mathtt{Return}) \times \mathtt{Mem}
\end{array}
\end{displaymath}

\begin{displaymath}
\mathtt{State} ::= \mathtt{Frame}^* \times \mathtt{PC} \times \mathtt{SP} \times \mathtt{ISP} \times \mathtt{CARRY} \times \mathtt{REGS}
\end{displaymath}

\begin{displaymath}
\mathtt{State} \stackrel{\sigma}{\longrightarrow} \mathtt{State}
\end{displaymath}

\begin{displaymath}
\sigma(\mathtt{State} (\mathtt{Frame}^* \times \mathtt{Frame})) \longrightarrow ((\sigma(\mathtt{Frame}^*), \sigma(\mathtt{PC}), \sigma(\mathtt{SP}), 0, 0, \sigma(\mathtt{REGS})), \sigma(\mathtt{Mem}))
\end{displaymath}

\begin{displaymath}
\sigma(\mathtt{Return}(-)) \longrightarrow \sigma \circ \text{return one step}
\end{displaymath}

\begin{displaymath}
\sigma(\mathtt{Call}(-)) \longrightarrow \sigma \circ \text{call one step}
\end{displaymath}

Return one step commuting diagram:

\begin{displaymath}
\begin{diagram}
s & \rTo^{\text{one step of execution}} & s'   \\
  & \rdTo                             & \dTo \\
  &                                   & \llbracket s'' \rrbracket
\end{diagram}
\end{displaymath}

Call one step commuting diagram:

\begin{displaymath}
\begin{diagram}
s & \rTo^{\text{one step of execution}} & s'   \\
  & \rdTo                             & \dTo \\
  &                                   & \llbracket s'' \rrbracket
\end{diagram}
\end{displaymath}

\begin{displaymath}
\begin{array}{rcl}
\mathtt{Call(id,\ args,\ dst,\ pc),\ State(FRAME, FRAMES)} & \longrightarrow & \mathtt{Call(M(args), dst)}, \\
                                                           &                 & \mathtt{PUSH(current\_frame[PC := after\_return])}
\end{array}
\end{displaymath}

In the case where the call is to an external function, we have:

\begin{displaymath}
\begin{array}{rcl}
\mathtt{Call(M(args), dst)},                       & \stackrel{\mathtt{ret\_val = f(M(args))}}{\longrightarrow} & \mathtt{Return(ret\_val,\ dst,\ PUSH(...))} \\
\mathtt{PUSH(current\_frame[PC := after\_return])} &                                                            &
\end{array}
\end{displaymath}

then:

\begin{displaymath}
\begin{array}{rcl}
\mathtt{Return(ret\_val,\ dst,\ PUSH(...))} & \longrightarrow & \mathtt{pc = POP\_STACK(regs[dst := M(ret\_val)],\ pc)}
\end{array}
\end{displaymath}

In the case where the call is to an internal function, we have:

\begin{displaymath}
\begin{array}{rcl}
\mathtt{CALL}(\mathtt{id}, \mathtt{args}, \mathtt{dst}, \mathtt{pc}) & \longrightarrow & \mathtt{CALL\_ID}(\mathtt{id}, \sigma'(\mathtt{args}), \sigma(\mathtt{dst}), \mathtt{pc}) \\
\mathtt{RETURN}                                                      & \longrightarrow & \mathtt{RETURN} \\
\end{array}
\end{displaymath}

\begin{displaymath}
\begin{array}{rcl}
\mathtt{Call(M(args), dst)}                        & \longrightarrow & \mathtt{sp = alloc}, regs = \emptyset[- := PARAMS] \\
\mathtt{PUSH(current\_frame[PC := after\_return])} &                 & \mathtt{State}(regs,\ sp,\ pc_\emptyset,\ dst)
\end{array}
\end{displaymath}

then:

\begin{displaymath}
\begin{array}{rcl}
\mathtt{sp = alloc}, regs = \emptyset[- := PARAMS] & \longrightarrow & \mathtt{free(sp)} \\
\mathtt{State}(regs,\ sp,\ pc_\emptyset,\ dst)     &                 & \mathtt{Return(M(ret\_val), dst, frames)}
\end{array}
\end{displaymath}

and finally:

\begin{displaymath}
\begin{array}{rcl}
\mathtt{free(sp)}                         & \longrightarrow & \mathtt{pc = POP\_STACK(regs[dst := M(ret\_val)],\ pc)} \\
\mathtt{Return(M(ret\_val), dst, frames)} &                 &
\end{array}
\end{displaymath}

Here $\sigma$ maps a front-end pseudo-register to the list of back-end registers that represent it, and $\sigma'$ extends this to lists of pseudo-registers:

\begin{displaymath}
\begin{array}{rcl}
\sigma & : & \mathtt{register} \rightarrow \mathtt{list\ register} \\
\sigma' & : & \mathtt{list\ register} \rightarrow \mathtt{list\ register}
\end{array}
\end{displaymath}

\subsection{The RTL to ERTL translation}
\label{subsect.rtl.ertl.translation}

\begin{displaymath}
\begin{diagram}
& & \llbracket \mathtt{CALL\_ID}(\mathtt{id}, \mathtt{args}, \mathtt{dst}, \mathtt{pc})\rrbracket & & \\
& \ldTo^{\text{external}} & & \rdTo^{\text{internal}} & \\
\skull & & & & \mathtt{regs} = [\mathtt{params}/-] \\
& & & & \mathtt{sp} = \mathtt{ALLOC} \\
& & & & \mathtt{PUSH}(\mathtt{carry}, \mathtt{regs}, \mathtt{dst}, \mathtt{return\_addr}), \mathtt{pc}_{0}, \mathtt{regs}, \mathtt{sp} \\
\end{diagram}
\end{displaymath}

\begin{align*}
\llbracket \mathtt{RETURN} \rrbracket \\
\mathtt{return\_addr} & := \mathtt{top}(\mathtt{stack}) \\
v^*                   & := m(\mathtt{rv\_regs}) \\
\mathtt{dst}, \mathtt{sp}, \mathtt{carry}, \mathtt{regs} & := \mathtt{pop} \\
\mathtt{regs}[v^* / \mathtt{dst}] \\
\end{align*}

\begin{displaymath}
\begin{diagram}
s    & \rTo^1 & s' & \rTo^1 & s'' \\
\dTo &        &    & \rdTo  & \dTo \\
\llbracket s \rrbracket & \rTo(1,3)^1 & & & \llbracket s'' \rrbracket \\
\mathtt{CALL} \\
\end{diagram}
\end{displaymath}

\begin{displaymath}
\begin{diagram}
s    & \rTo^1 & s' & \rTo^1 & s'' \\
\dTo &        &    & \rdTo  & \dTo \\
\  & \rTo(1,3) & & & \ \\
\mathtt{RETURN} \\
\end{diagram}
\end{displaymath}

\begin{displaymath}
\mathtt{b\_graph\_translate}: (\mathtt{label} \rightarrow \mathtt{blist'})
\rightarrow \mathtt{graph} \rightarrow \mathtt{graph}
\end{displaymath}

\begin{align*}
\mathtt{theorem} &\ \mathtt{b\_graph\_translate\_ok}: \\
& \forall f.\forall G_{i}.\mathtt{let}\ G_{\sigma} := \mathtt{b\_graph\_translate}\ f\ G_{i}\ \mathtt{in} \\
&       \forall l \in G_{i}.\mathtt{subgraph}\ (f\ l)\ l\ (\mathtt{next}\ l\ G_{i})\ G_{\sigma}
\end{align*}

\begin{align*}
\mathtt{lemma} &\ \mathtt{execute\_1\_step\_ok}: \\
&       \forall s.\ \mathtt{let}\ s' := s\ \sigma\ \mathtt{in} \\
&       \mathtt{let}\ l := pc\ s\ \mathtt{in} \\
&       s \stackrel{1}{\rightarrow} s^{*} \Rightarrow \exists n. s' \stackrel{n}{\rightarrow} s'^{*} \wedge s'^{*} = s^{*}\ \sigma
\end{align*}

\begin{align*}
\mathrm{RTL\ status} & \ \ \mathrm{ERTL\ status} \\
\mathtt{sp} & = \mathtt{spl} / \mathtt{sph} \\
\mathtt{graph} & \mapsto \mathtt{graph} + \mathtt{prologue}(s) + \mathtt{epilogue}(s) \\
& \mathrm{where}\ s = \mathrm{callee\ saved} + \nu \mathrm{RA} \\
\end{align*}

\begin{displaymath}
\begin{diagram}
\mathtt{CALL} & \rTo^1 & \mathtt{inside\ function} \\
\dTo & & \dTo \\
\underbrace{\ldots}_{\llbracket \mathtt{CALL} \rrbracket} & \rTo &
\underbrace{\ldots}_{\mathtt{prologue}} \\
\end{diagram}
\end{displaymath}

\begin{displaymath}
\begin{diagram}
\mathtt{RETURN} & \rTo^1 & \mathtt{.} \\
\dTo & & \dTo \\
\underbrace{\ldots}_{\mathtt{epilogue}} & \rTo &
\underbrace{\ldots} \\
\end{diagram}
\end{displaymath}

\begin{align*}
\mathtt{prologue}(s) = & \ \mathtt{create\_new\_frame}; \\
                       & \ \mathtt{pop\ ra}; \\
                       & \ \mathtt{save\ callee\_saved}; \\
                       & \ \mathtt{get\_params} \\
                       & \ \ \ \mathtt{reg\_params}: \mathtt{move} \\
                       & \ \ \ \mathtt{stack\_params}: \mathtt{push}/\mathtt{pop}/\mathtt{move} \\
\end{align*}

\begin{align*}
\mathtt{epilogue}(s) = & \ \mathtt{save\ return\ to\ tmp\ real\ regs}; \\
                       & \ \mathtt{restore\_registers}; \\
                       & \ \mathtt{push\ ra}; \\
                       & \ \mathtt{delete\_frame}; \\
                       & \ \mathtt{save\ return} \\
\end{align*}

\begin{displaymath}
\mathtt{CALL}\ id \mapsto \mathtt{set\_params};\ \mathtt{CALL}\ id;\ \mathtt{fetch\_result}
\end{displaymath}

\subsection{The ERTL to LTL translation}
\label{subsect.ertl.ltl.translation}

\subsection{The LTL to LIN translation}
\label{subsect.ltl.lin.translation}

We require a map, $\sigma$, from LTL statuses, where program counters are represented as labels in a graph data structure, to LIN statuses, where program counters are natural numbers:
\begin{displaymath}
\mathtt{pc : label} \stackrel{\sigma}{\longrightarrow} \mathbb{N}
\end{displaymath}

The LTL to LIN translation pass also linearises the graph data structure into a list of instructions.
Pseudocode for the linearisation process is as follows:

\begin{lstlisting}
let rec linearise graph visited required generated todo :=
  match todo with
  | l::todo ->
    if l $\in$ visited then
      let generated := generated $\cup\ \{$ Goto l $\}$ in
      let required := required $\cup\ \{$ l $\}$ in
        linearise graph visited required generated todo
    else
      -- Get the instruction at label `l' in the graph
      let lookup := graph(l) in
      let generated := generated $\cup\ \{$ lookup $\}$ in
      -- Mark `l' as visited
      let visited := visited $\cup\ \{$ l $\}$ in
      -- Find the successor of the instruction at label `l' in the graph
      let successor := succ(l, graph) in
      let todo := successor::todo in
        linearise graph visited required generated todo
  | []      -> (required, generated)
\end{lstlisting}

It is easy to see that this linearisation process terminates.
In particular, each recursive call either consumes an element of \texttt{todo} or adds a new label to the \texttt{visited} set, and the size of \texttt{visited} is bounded above by the (finite) size of the graph that we are linearising.

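The pseudocode above can be rendered as a small executable sketch; the graph representation and the use of a concrete list for \texttt{generated} (to keep emission order) are choices of ours for illustration:

```python
# Runnable rendering of the linearisation pseudocode on a toy graph:
# graph maps each label to (instruction, successor-or-None).

def linearise(graph, entry):
    visited, required, generated = set(), set(), []
    todo = [entry]
    while todo:
        l = todo.pop()
        if l in visited:
            # Already emitted: generate an explicit Goto and record
            # that the label l is required in the output.
            generated.append(("goto", l))
            required.add(l)
        else:
            instr, successor = graph[l]
            generated.append((l, instr))
            visited.add(l)
            if successor is not None:
                todo.append(successor)
    return required, generated
```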
The initial call to \texttt{linearise} has the \texttt{visited}, \texttt{required} and \texttt{generated} sets initialised to the empty set, and \texttt{todo} initialised to the singleton list consisting of the entry point of the graph.
We envisage needing to prove the following invariants of the linearisation function above:

\begin{enumerate}
\item
$\mathtt{visited} \approx \mathtt{generated}$, where $\approx$ is \emph{multiset} equality, as \texttt{generated} is a multiset of instructions in which the same label may be mentioned multiple times, whereas \texttt{visited} is a set of labels,
\item
$\forall \mathtt{l} \in \mathtt{generated}.\ \mathtt{succ(l,\ graph)} \subseteq \mathtt{required} \cup \mathtt{todo}$,
\item
$\mathtt{required} \subseteq \mathtt{visited}$,
\item
$\mathtt{visited} \cap \mathtt{todo} = \emptyset$.
\end{enumerate}

The invariants collectively imply the following properties, crucial to correctness, of the linearisation process:

\begin{enumerate}
\item
Every graph node is visited at most once,
\item
Every instruction that is generated is generated due to some graph node being visited,
\item
The successor instruction of every instruction that has been visited already will eventually be visited too.
\end{enumerate}

Note that because the LTL to LIN transformation is the first time the program is linearised, we must discover a notion of `well-formed program' suitable for linearised forms.
In particular, we see the notion of well-formedness (yet to be formally defined) resting on the following conditions:

\begin{enumerate}
\item
For every jump to a label in a linearised program, the target label exists at some point in the program,
\item
Each label is unique, appearing only once in the program,
\item
The final instruction of a program must be a return.
\end{enumerate}

We assume that these properties will be easy consequences of the invariants on the linearisation function defined above.

The final condition above is potentially a little opaque, so we explain further.
The only instructions that can reasonably appear in final position at the end of a program are returns or backward jumps, as any other instruction would cause execution to `fall out' of the end of the program (for example, when a function invoked with \texttt{CALL} returns, it returns to the next instruction past the \texttt{CALL} that invoked it).
However, in LIN, though each function's graph has been linearised, the entire program is not yet fully linearised into a list of instructions, but is rather a list of `functions', each consisting of a linearised body along with other data.
Each well-formed function must end with a \texttt{RET} instruction, and therefore the only correct instruction that can terminate a LIN program is \texttt{RET}.

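The three well-formedness conditions amount to a simple decidable check over the instruction list; the following sketch uses an illustrative instruction representation of ours, not the CerCo LIN syntax:

```python
# Toy LIN instructions: ("lbl", l), ("jmp", l), ("ret",), ("op", x).

def well_formed(prog):
    """Jump targets exist, labels are unique, and the final
    instruction is a return."""
    labels = [i[1] for i in prog if i[0] == "lbl"]
    targets = [i[1] for i in prog if i[0] == "jmp"]
    return (all(t in labels for t in targets)
            and len(labels) == len(set(labels))
            and bool(prog) and prog[-1] == ("ret",))
```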
\subsection{The LIN to ASM and ASM to MCS-51 machine code translations}
\label{subsect.lin.asm.translation}

The LIN to ASM translation step is trivial, being almost the identity function.
The only non-trivial feature of the LIN to ASM translation is that all labels are `named apart', so that there is no chance of freshly generated labels from one namespace clashing with labels from another namespace.

The ASM to MCS-51 machine code translation step, and the required statements of correctness, are found in an unpublished manuscript attached to this document.

\section{Estimated effort}
Based on the rough analysis performed so far, we can estimate the total
effort for the certification of the compiler. We obtain this estimate by
combining, for each pass: 1) the number of lines of code to be certified;
2) the ratio of number of lines of proof to number of lines of code from
the CompCert project~\cite{compcert} for the CompCert pass that is closest to
ours; 3) an estimate of the complexity of the pass according to the
analysis above.

\begin{tabular}{lrll}
Pass origin & Code lines & CompCert ratio & Estimated effort \\
\hline
Cminor & 2913 & 5.23 \permil & 15.24 \\
RTLabs & 1252 & 1.17 \permil & 1.48 \\
RTL    &  469 & 4.17 \permil & 1.95 \\
ERTL   &  789 & 3.01 \permil & 2.38 \\
LTL    &   92 & 5.94 \permil & 0.55 \\
LIN    &  354 & 6.54 \permil & 2.31 \\
ASM    &  984 & 4.80 \permil & 4.72 \\
\hline
Total  & 6853 & 4.17 \permil & 28.63 \\
\end{tabular}

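The arithmetic of the table can be checked directly: each effort figure is the number of code lines multiplied by the per-mil ratio.  The sketch below recomputes two rows; agreement is only up to rounding, since the displayed ratios are themselves rounded.

```python
# Effort estimate = code lines x CompCert ratio (given per mil).

def effort(lines, ratio_permil):
    return lines * ratio_permil / 1000.0
```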
\end{document}