\documentclass[a4paper, 10pt]{article}

\usepackage{a4wide}
\usepackage{amsfonts}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage[english]{babel}
\usepackage{color}
\usepackage{diagrams}
\usepackage{graphicx}
\usepackage[colorlinks]{hyperref}
\usepackage[utf8x]{inputenc}
\usepackage{listings}
\usepackage{microtype}
\usepackage{skull}
\usepackage{stmaryrd}

\lstdefinelanguage{matita-ocaml} {
  mathescape=true
}
\lstset{
  language=matita-ocaml,basicstyle=\tt,columns=flexible,breaklines=false,
  showspaces=false, showstringspaces=false, extendedchars=false,
  inputencoding=utf8x, tabsize=2
}

\title{Proof outline for the correctness of the CerCo compiler}
\date{\today}
\author{The CerCo team}

\begin{document}

\maketitle

\section{Introduction}
\label{sect.introduction}

At the last project review of the CerCo project, the reviewers suggested that we write down the statements of the correctness theorems that we intend to prove about the complexity-preserving compiler.
This document provides a high-level, pen-and-paper sketch of what we view as the best path to completing the correctness proof for the compiler.
In particular, for every translation between two intermediate languages, in both the front- and back-ends, we identify the key translation steps and the invariants that we view as being important for the correctness proof.

\section{Front-end: Clight to RTLabs}

The front-end of the CerCo compiler consists of several stages:

\begin{center}
\begin{minipage}{.8\linewidth}
\begin{tabbing}
\quad \= $\downarrow$ \quad \= \kill
\textsf{Clight}\\
\> $\downarrow$ \> cast removal\\
\> $\downarrow$ \> add runtime functions\footnote{Following the last project
meeting we intend to move this transformation to the back-end}\\
\> $\downarrow$ \> cost labelling\\
\> $\downarrow$ \> loop optimizations\footnote{To be ported from the untrusted compiler and certified only in case of early completion of the certification of the other passes.} (an endo-transformation)\\
\> $\downarrow$ \> stack variable allocation and control structure
 simplification\\
\textsf{Cminor}\\
\> $\downarrow$ \> generate global variable initialisation code\\
\> $\downarrow$ \> transform to RTL graph\\
\textsf{RTLabs}\\
\> $\downarrow$ \> start of target specific back-end\\
\>\quad \vdots
\end{tabbing}
\end{minipage}
\end{center}

Here, by `endo-transformation', we mean a mapping from a language back to itself:
the loop optimization step maps the Clight language to itself.

%Our overall statements of correctness with respect to costs will
%require a correctly labelled program
There are three layers in most of the proofs proposed:
\begin{enumerate}
\item invariants closely tied to the syntax and transformations using
  dependent types,
\item a forward simulation proof, and
\item syntactic proofs about the cost labelling.
\end{enumerate}
The first will support both functional correctness and allow us to show
the totality of some of the compiler stages (that is, that these stages of
the compiler cannot fail).  The second provides the main functional
correctness result, and the last will be crucial for applying
correctness results about the costings from the back-end.

We will also prove that a suitably labelled RTLabs trace can be turned
into a \emph{structured trace} which splits the execution trace into
cost-label-to-cost-label chunks with nested function calls.  This
structure was identified during work on the correctness of the
back-end cost analysis as retaining important information about the
structure of the execution that is difficult to reconstruct later in
the compiler.

\subsection{Clight cast removal}

This pass removes superfluous casts inserted by the parser to make
arithmetic promotion explicit (such as
\lstinline[language=C]'c = (short)((int)a + (int)b);').

The transformation only affects Clight expressions, recursively
detecting casts that can be safely eliminated.  The semantics provides
a big-step definition for expressions, so we should be able to show a
lock-step forward simulation using a lemma showing that cast
elimination does not change the evaluation of expressions.  This lemma
will follow from a structural induction on the source expression.  We
have already proved a few of the underlying arithmetic results
necessary to validate the approach.
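
The key lemma might be stated as follows, where $\mathtt{simplify}$ is our (hypothetical) name for the cast-elimination function and $\llbracket - \rrbracket_{\gamma,M}$ denotes big-step expression evaluation in environment $\gamma$ and memory $M$:
\begin{displaymath}
\forall \gamma, M, e.\ \llbracket \mathtt{simplify}\ e \rrbracket_{\gamma,M} = \llbracket e \rrbracket_{\gamma,M}
\end{displaymath}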

\subsection{Clight cost labelling}

This adds cost labels before and after selected statements and
expressions, and the execution traces ought to be equivalent modulo
cost labels.  Hence it requires a simple forward simulation with a
limited amount of stuttering wherever a new cost label is introduced.
A bound on the amount of stuttering allowed can be given
based on the statement or continuation to be evaluated next.
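
A sketch of the intended simulation statement, where $\mathcal{L}$ is a hypothetical name for the labelling transformation on states and $\mathcal{B}$ for the stuttering bound just mentioned:
\begin{displaymath}
s_1 \rightarrow s_2 \quad\Rightarrow\quad \exists n \leq \mathcal{B}(s_2).\ \mathcal{L}(s_1) \rightarrow^{1+n} \mathcal{L}(s_2)
\end{displaymath}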

We also intend to show three syntactic properties about the cost
labelling:
\begin{itemize}
\item every function starts with a cost label,
\item every branching instruction is followed by a label (note that
  exiting a loop is treated as a branch), and
\item the head of every loop (and any \lstinline'goto' destination) is
  a cost label.
\end{itemize}
These can be shown by structural induction on the source term.

\subsection{Clight to Cminor translation}

This translation is the first to introduce some invariants, with the
proofs closely tied to the implementation by dependent typing.  These
are largely complete and show that the generated code enjoys:
\begin{itemize}
\item some minimal type safety, shown by explicit checks on the
  Cminor types during the transformation (a little more work remains
  to be done here, but it follows the same form);
\item that variables named in the parameter and local variable
  environments are distinct from one another, again by an explicit
  check;
\item that variables used in the generated code are present in the
  resulting environment (either by checking their presence in the
  source environment, or because they come from a list of fresh
  temporary variables); and
\item that all \lstinline[language=C]'goto' labels are present (by
  checking them against a list of source labels and proving that all
  source labels are preserved).
\end{itemize}

The simulation will be similar to the relevant stages of CompCert
(Clight to Csharpminor and Csharpminor to Cminor --- in the event that
the direct proof is unwieldy we could introduce a corresponding
intermediate language).  During early experimentation with porting
CompCert definitions to the Matita proof assistant we found little
difficulty reproving the results for the memory model, so we plan to
port the memory injection properties and use them to relate Clight
in-memory variables with either a local variable valuation or a stack
slot, depending on how each variable was classified.

This should be sufficient to show the equivalence of (big-step)
expression evaluation.  The simulation can then be shown by relating
corresponding blocks of statements and continuations with their Cminor
counterparts and proving that a few steps reach the next matching
state.

The syntactic properties required for cost labels remain similar, and a
structural induction on the function bodies should be sufficient to
show that they are preserved.
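
The resulting forward simulation would take the usual shape, with $\sim$ the (yet to be defined) relation between Clight and Cminor states induced by the memory injection and variable classification described above:
\begin{displaymath}
s_1 \sim t_1 \;\wedge\; s_1 \rightarrow s_2 \quad\Rightarrow\quad \exists t_2.\ t_1 \rightarrow^{+} t_2 \;\wedge\; s_2 \sim t_2
\end{displaymath}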

\subsection{Cminor global initialisation code}

This short phase replaces the global variable initialisation data with
code that executes when the program starts.  Each piece of
initialisation data in the source is matched by a new statement
storing that data.  As each global variable is allocated a distinct
memory block, the program state after the initialisation statements
will be the same as the original program's state at the start of
execution, and will proceed in the same manner afterwards.

% Actually, the above is wrong...
% ... this ought to be in a fresh main function with a fresh cost label

\subsection{Cminor to RTLabs translation}

In this part of the compiler we transform the program's functions into
control flow graphs.  It is closely related to CompCert's Cminorsel to
RTL transformation, albeit with target-independent operations.

We already enforce several invariants with dependent types: some type
safety, enforced mostly using the type information from Cminor; and
that the graph is closed (by showing that each successor was recently
added, or corresponds to a \lstinline[language=C]'goto' label, all of
which are added before the end).  Note that this relies on a
monotonicity property; CompCert maintains a similar property in a
similar way while building RTL graphs.  We will also add a result
showing that all of the pseudo-register names are distinct, for use by
later stages, using the same method as for Cminor.

The simulation will relate Cminor states to RTLabs states about to
execute the corresponding code (with a corresponding continuation).
Each Cminor statement becomes zero or more RTLabs statements, with a
decreasing measure based on the statement and continuations, similar to
CompCert's.  We may also follow CompCert in using a relational
specification of this stage so as to abstract away from the functional
(and highly dependently typed) definition.

The first two labelling properties remain as before; we will show that
cost labels are preserved, so the function entry point will be a cost
label, and successors that are cost labels still map to cost
labels, preserving the condition on branches.  We replace the property
for loops with the notion that we will always reach a cost label or
the end of the function by following a bounded number of successors.
This can be easily seen in Cminor using the requirement for cost
labels at the head of loops and after gotos.  It remains to show that
this is preserved by the translation to RTLabs.  % how?
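
One way to phrase this bounded-successors property is as an inductively defined predicate on the graph; the sketch below uses our own (hypothetical) names, not those of the formal development:

\begin{lstlisting}
(* a label is within n successor steps of a cost label or the
   end of the function *)
inductive bounded (g : graph) : label $\rightarrow$ nat $\rightarrow$ Prop :=
| b_cost : $\forall$l,n. is_cost_label g l $\rightarrow$ bounded g l n
| b_exit : $\forall$l,n. is_return g l $\rightarrow$ bounded g l n
| b_step : $\forall$l,n. ($\forall$l' $\in$ succ g l. bounded g l' n) $\rightarrow$
           bounded g l (S n)
\end{lstlisting}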

\subsection{RTLabs structured trace generation}

This proof-only step incorporates the function call structure and cost
labelling properties into the execution trace.  As the function calls
are nested within the trace, we need to distinguish between
terminating and non-terminating function calls; we use the excluded
middle (specialised to a function termination property) to do this.
Terminating functions are built by following the trace, breaking it
into chunks between cost labels and recursively processing function
calls.  The main difficulties here are the non-structurally recursive
nature of the function (instead we use the size of the termination
proof as a measure) and using the properties on RTLabs cost labelling
to create the correct structure.  We also show that the lower stack
frames are preserved during function calls in order to prove that
after returning from a function call we execute the correct code.
This part of the work has already been constructed, but still requires
a simple proof to show that flattening the structured trace recreates
the original flat trace.
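
This flattening result might take the following form, where $\mathtt{flatten}$ and $\mathtt{structure}$ are our names for the two directions of the construction:
\begin{displaymath}
\mathtt{flatten}\ (\mathtt{structure}\ tr) = tr
\end{displaymath}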

The non-terminating case uses the excluded middle to decide whether to
use the function described above to construct an inductive terminating
structured trace, or a coinductive version for non-termination.  It
follows the trace like the terminating version to build up chunks of
trace from cost-label to cost-label (which, by the finite distance to
a cost label property shown before, can be represented by an inductive
type), unless it encounters a non-terminating function, in which case
it corecursively processes that.  This part of the trace
transformation is currently under construction, and will also need a
flattening result to show that it is correct.

\section{Backend: RTLabs to machine code}
\label{sect.backend.rtlabs.machine.code}

The compiler backend consists of the following intermediate languages and stages of translation:

\begin{center}
\begin{minipage}{.8\linewidth}
\begin{tabbing}
\quad \= $\downarrow$ \quad \= \kill
\textsf{RTLabs}\\
\> $\downarrow$ \> copy propagation\footnote{\label{lab:opt}To be ported from the untrusted compiler and certified only in case of early completion of the certification of the other passes.} (an endo-transformation) \\
\> $\downarrow$ \> instruction selection\\
\textsf{RTL}\\
\> $\downarrow$ \> constant propagation$^{\mbox{\scriptsize \ref{lab:opt}}}$ (an endo-transformation) \\
\> $\downarrow$ \> calling convention made explicit \\
\> $\downarrow$ \> stack frame generation \\
\textsf{ERTL}\\
\> $\downarrow$ \> register allocation and spilling\\
\> $\downarrow$ \> dead code elimination\\
\textsf{LTL}\\
\> $\downarrow$ \> function linearisation\\
\> $\downarrow$ \> branch compression (an endo-transformation) \\
\textsf{LIN}\\
\> $\downarrow$ \> relabeling\\
\textsf{ASM}\\
\> $\downarrow$ \> pseudoinstruction expansion\\
\textsf{Machine code}\\
\end{tabbing}
\end{minipage}
\end{center}


\subsection{The RTLabs to RTL translation}
\label{subsect.rtlabs.rtl.translation}

% dpm: type system and casting load (???)
We require a map, $\sigma$, from \texttt{Values} of the front-end memory model to lists of \texttt{BEValues} of the back-end memory model:

\begin{displaymath}
\mathtt{Value} ::= \bot \mid \mathtt{int(size)} \mid \mathtt{float} \mid \mathtt{null} \mid \mathtt{ptr} \quad\stackrel{\sigma}{\longrightarrow}\quad \mathtt{BEValue} ::= \bot \mid \mathtt{byte} \mid \mathtt{int}_i \mid \mathtt{ptr}_i
\end{displaymath}
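
For instance (an illustrative assumption about sizes, not a commitment of the formal development), a 32-bit front-end integer would map to a list of four byte-sized back-end values, and a front-end pointer to the list of its address parts:
\begin{displaymath}
\sigma(\mathtt{int(32)}) = [\mathtt{byte};\mathtt{byte};\mathtt{byte};\mathtt{byte}] \qquad \sigma(\mathtt{ptr}) = [\mathtt{ptr}_0; \mathtt{ptr}_1]
\end{displaymath}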

We further require a map, $\sigma$, which maps the front-end \texttt{Memory} to the back-end's notion of \texttt{BEMemory}.

\begin{displaymath}
\mathtt{Mem}\ \alpha = \mathtt{Block} \rightarrow (\mathbb{Z} \rightarrow \alpha)
\end{displaymath}

where

\begin{displaymath}
\mathtt{Block} ::= \mathtt{Region} \cup \mathtt{ID}
\end{displaymath}

\begin{displaymath}
\mathtt{BEMem} = \mathtt{Mem}\ \mathtt{BEValue}
\end{displaymath}

\begin{displaymath}
\mathtt{Address} = \mathtt{BEValue} \times \mathtt{BEValue}
\end{displaymath}

\begin{displaymath}
\mathtt{Mem} = \mathtt{Block} \rightarrow (\mathbb{Z} \rightarrow \mathtt{Cont} \mid \mathtt{Value} \times \mathtt{Size} \mid \mathtt{null})
\end{displaymath}

\begin{center}
\begin{picture}(2, 2)
\put(0,-20){\framebox(25,25)[c]{\texttt{v}}}
\put(26,-20){\framebox(25,25)[c]{\texttt{4}}}
\put(51,-20){\framebox(25,25)[c]{\texttt{cont}}}
\put(76,-20){\framebox(25,25)[c]{\texttt{cont}}}
\put(101,-20){\framebox(25,25)[c]{\texttt{cont}}}
\end{picture}
\end{center}

\begin{displaymath}
\mathtt{load}\ s\ a\ M = \mathtt{Some}\ v \rightarrow \forall i \leq s.\ \mathtt{load}\ s\ (a + i)\ \sigma(M) = \mathtt{Some}\ v_i
\end{displaymath}

\begin{displaymath}
\sigma(\mathtt{store}\ v\ M) = \mathtt{store}\ \sigma(v)\ \sigma(M)
\end{displaymath}

\begin{displaymath}
\begin{array}{rll}
\mathtt{State} & ::=  & (\mathtt{State} : \mathtt{Frame}^* \times \mathtt{Frame} \\
               & \mid & \mathtt{Call} : \mathtt{Frame}^* \times \mathtt{Args} \times \mathtt{Return} \times \mathtt{Fun} \\
               & \mid & \mathtt{Return} : \mathtt{Frame}^* \times \mathtt{Value} \times \mathtt{Return}) \times \mathtt{Mem}
\end{array}
\end{displaymath}

\begin{displaymath}
\mathtt{State} ::= \mathtt{Frame}^* \times \mathtt{PC} \times \mathtt{SP} \times \mathtt{ISP} \times \mathtt{CARRY} \times \mathtt{REGS}
\end{displaymath}

\begin{displaymath}
\mathtt{State} \stackrel{\sigma}{\longrightarrow} \mathtt{State}
\end{displaymath}

\begin{displaymath}
\sigma(\mathtt{State} (\mathtt{Frame}^* \times \mathtt{Frame})) \longrightarrow ((\sigma(\mathtt{Frame}^*), \sigma(\mathtt{PC}), \sigma(\mathtt{SP}), 0, 0, \sigma(\mathtt{REGS})), \sigma(\mathtt{Mem}))
\end{displaymath}

\begin{displaymath}
\sigma(\mathtt{Return}(-)) \longrightarrow \sigma \circ \text{return one step}
\end{displaymath}

\begin{displaymath}
\sigma(\mathtt{Call}(-)) \longrightarrow \sigma \circ \text{call one step}
\end{displaymath}

Return one step commuting diagram:

\begin{displaymath}
\begin{diagram}
s & \rTo^{\text{one step of execution}} & s'   \\
  & \rdTo                             & \dTo \\
  &                                   & \llbracket s'' \rrbracket
\end{diagram}
\end{displaymath}

Call one step commuting diagram:

\begin{displaymath}
\begin{diagram}
s & \rTo^{\text{one step of execution}} & s'   \\
  & \rdTo                             & \dTo \\
  &                                   & \llbracket s'' \rrbracket
\end{diagram}
\end{displaymath}

\begin{displaymath}
\begin{diagram}
\mathtt{CALL}(\mathtt{id},\ \mathtt{args},\ \mathtt{dst},\ \mathtt{pc}),\ \mathtt{State}(\mathtt{Frame},\ \mathtt{Frames}) & \rTo & \mathtt{Call}(FINISH ME)
\end{diagram}
\end{displaymath}

\begin{displaymath}
\begin{array}{rcl}
\mathtt{CALL}(\mathtt{id}, \mathtt{args}, \mathtt{dst}, \mathtt{pc}) & \longrightarrow & \mathtt{CALL\_ID}(\mathtt{id}, \sigma'(\mathtt{args}), \sigma(\mathtt{dst}), \mathtt{pc}) \\
\mathtt{RETURN}                                                      & \longrightarrow & \mathtt{RETURN} \\
\end{array}
\end{displaymath}

\begin{displaymath}
\begin{array}{rcl}
\sigma & : & \mathtt{register} \rightarrow \mathtt{list\ register} \\
\sigma' & : & \mathtt{list\ register} \rightarrow \mathtt{list\ register}
\end{array}
\end{displaymath}

\subsection{The RTL to ERTL translation}
\label{subsect.rtl.ertl.translation}

\begin{displaymath}
\begin{diagram}
& & \llbracket \mathtt{CALL\_ID}(\mathtt{id}, \mathtt{args}, \mathtt{dst}, \mathtt{pc})\rrbracket & & \\
& \ldTo^{\text{external}} & & \rdTo^{\text{internal}} & \\
\skull & & & & \mathtt{regs} = [\mathtt{params}/-] \\
& & & & \mathtt{sp} = \mathtt{ALLOC} \\
& & & & \mathtt{PUSH}(\mathtt{carry}, \mathtt{regs}, \mathtt{dst}, \mathtt{return\_addr}), \mathtt{pc}_{0}, \mathtt{regs}, \mathtt{sp} \\
\end{diagram}
\end{displaymath}

\begin{align*}
\llbracket \mathtt{RETURN} \rrbracket \\
\mathtt{return\_addr} & := \mathtt{top}(\mathtt{stack}) \\
v*                    & := m(\mathtt{rv\_regs}) \\
\mathtt{dst}, \mathtt{sp}, \mathtt{carry}, \mathtt{regs} & := \mathtt{pop} \\
\mathtt{regs}[v* / \mathtt{dst}] \\
\end{align*}

\begin{displaymath}
\begin{diagram}
s    & \rTo^1 & s' & \rTo^1 & s'' \\
\dTo &        &    & \rdTo  & \dTo \\
\llbracket s \rrbracket & \rTo(1,3)^1 & & & \llbracket s'' \rrbracket \\
\mathtt{CALL} \\
\end{diagram}
\end{displaymath}

\begin{displaymath}
\begin{diagram}
s    & \rTo^1 & s' & \rTo^1 & s'' \\
\dTo &        &    & \rdTo  & \dTo \\
\  & \rTo(1,3) & & & \ \\
\mathtt{RETURN} \\
\end{diagram}
\end{displaymath}

\begin{displaymath}
\mathtt{b\_graph\_translate}: (\mathtt{label} \rightarrow \mathtt{blist'})
\rightarrow \mathtt{graph} \rightarrow \mathtt{graph}
\end{displaymath}

\begin{align*}
\mathtt{theorem} &\ \mathtt{b\_graph\_translate\_ok}: \\
& \forall  f.\forall G_{i}.\mathtt{let}\ G_{\sigma} := \mathtt{b\_graph\_translate}\ f\ G_{i}\ \mathtt{in} \\
&       \forall l \in G_{i}.\mathtt{subgraph}\ (f\ l)\ l\ (\mathtt{next}\ l\ G_{i})\ G_{\sigma}
\end{align*}

\begin{align*}
\mathtt{lemma} &\ \mathtt{execute\_1\_step\_ok}: \\
&       \forall s.  \mathtt{let}\ s' := s\ \sigma\ \mathtt{in} \\
&       \mathtt{let}\ l := pc\ s\ \mathtt{in} \\
&       s \stackrel{1}{\rightarrow} s^{*} \Rightarrow \exists n. s' \stackrel{n}{\rightarrow} s'^{*} \wedge s'^{*} = s^{*}\ \sigma
\end{align*}

\begin{align*}
\mathrm{RTL\ status} & \ \ \mathrm{ERTL\ status} \\
\mathtt{sp} & = \mathtt{spl} / \mathtt{sph} \\
\mathtt{graph} & \mapsto \mathtt{graph} + \mathtt{prologue}(s) + \mathtt{epilogue}(s) \\
& \mathrm{where}\ s = \mathrm{callee\ saved} + \nu \mathrm{RA} \\
\end{align*}

\begin{displaymath}
\begin{diagram}
\mathtt{CALL} & \rTo^1 & \mathtt{inside\ function} \\
\dTo & & \dTo \\
\underbrace{\ldots}_{\llbracket \mathtt{CALL} \rrbracket} & \rTo &
\underbrace{\ldots}_{\mathtt{prologue}} \\
\end{diagram}
\end{displaymath}

\begin{displaymath}
\begin{diagram}
\mathtt{RETURN} & \rTo^1 & \mathtt{.} \\
\dTo & & \dTo \\
\underbrace{\ldots}_{\mathtt{epilogue}} & \rTo &
\underbrace{\ldots} \\
\end{diagram}
\end{displaymath}

\begin{align*}
\mathtt{prologue}(s) = & \mathtt{create\_new\_frame}; \\
                       & \mathtt{pop\ ra}; \\
                       & \mathtt{save\ callee\_saved}; \\
                       & \mathtt{get\_params} \\
                       & \ \ \mathtt{reg\_params}: \mathtt{move} \\
                       & \ \ \mathtt{stack\_params}: \mathtt{push}/\mathtt{pop}/\mathtt{move} \\
\end{align*}

\begin{align*}
\mathtt{epilogue}(s) = & \mathtt{save\ return\ to\ tmp\ real\ regs}; \\
                       & \mathtt{restore\_registers}; \\
                       & \mathtt{push\ ra}; \\
                       & \mathtt{delete\_frame}; \\
                       & \mathtt{save\ return} \\
\end{align*}

\begin{displaymath}
\mathtt{CALL}\ id \mapsto \mathtt{set\_params};\ \mathtt{CALL}\ id;\ \mathtt{fetch\_result}
\end{displaymath}

\subsection{The ERTL to LTL translation}
\label{subsect.ertl.ltl.translation}

\subsection{The LTL to LIN translation}
\label{subsect.ltl.lin.translation}

We require a map, $\sigma$, from LTL statuses, where program counters are represented as labels in a graph data structure, to LIN statuses, where program counters are natural numbers:
\begin{displaymath}
\mathtt{pc : label} \stackrel{\sigma}{\longrightarrow} \mathbb{N}
\end{displaymath}

The LTL to LIN translation pass also linearises the graph data structure into a list of instructions.
Pseudocode for the linearisation process is as follows:

\begin{lstlisting}
let rec linearise graph visited required generated todo :=
  match todo with
  | l::todo ->
    if l $\in$ visited then
      let generated := generated $\cup\ \{$ Goto l $\}$ in
      let required := required $\cup\ \{$ l $\}$ in
        linearise graph visited required generated todo
    else
      -- Get the instruction at label `l' in the graph
      let lookup := graph(l) in
      let generated := generated $\cup\ \{$ lookup $\}$ in
      -- Mark `l' as visited
      let visited := visited $\cup\ \{$ l $\}$ in
      -- Find the successor of the instruction at label `l' in the graph
      let successor := succ(l, graph) in
      let todo := successor::todo in
        linearise graph visited required generated todo
  | []      -> (required, generated)
\end{lstlisting}

It is easy to see that this linearisation process eventually terminates.
In particular, the size of the visited label set is monotonically increasing, and is bounded above by the size of the graph that we are linearising.
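
Concretely, termination could be justified by the lexicographically ordered measure
\begin{displaymath}
\mu(\mathtt{visited}, \mathtt{todo}) = (\,|\mathtt{labels}(\mathtt{graph})| - |\mathtt{visited}|,\ |\mathtt{todo}|\,)
\end{displaymath}
which decreases in the first component whenever an unvisited label is processed, and in the second component whenever a visited label is popped from \texttt{todo}.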

The initial call to \texttt{linearise} has the \texttt{visited}, \texttt{required} and \texttt{generated} sets initialised to the empty set, and \texttt{todo} initialised with the singleton list consisting of the entry point of the graph.
We envisage needing to prove the following invariants on the linearisation function above:

\begin{enumerate}
\item
$\mathtt{visited} \approx \mathtt{generated}$, where $\approx$ is \emph{multiset} equality, as \texttt{generated} is a set of instructions where instructions may mention labels multiple times, and \texttt{visited} is a set of labels,
\item
$\forall \mathtt{l} \in \mathtt{generated}.\ \mathtt{succ(l,\ graph)} \subseteq \mathtt{required} \cup \mathtt{todo}$,
\item
$\mathtt{required} \subseteq \mathtt{visited}$,
\item
$\mathtt{visited} \cap \mathtt{todo} = \emptyset$.
\end{enumerate}

The invariants collectively imply the following properties, crucial to correctness, about the linearisation process:

\begin{enumerate}
\item
Every graph node is visited at most once,
\item
Every instruction that is generated is generated due to some graph node being visited,
\item
The successor instruction of every instruction that has been visited already will eventually be visited too.
\end{enumerate}

Note that, because the LTL to LIN transformation is the first time the program is linearised, we must discover a notion of `well formed program' suitable for linearised forms.
In particular, we see the notion of well formedness (yet to be formally defined) resting on the following conditions:

\begin{enumerate}
\item
For every jump to a label in a linearised program, the target label exists at some point in the program,
\item
Each label is unique, appearing only once in the program,
\item
The final instruction of a program must be a return.
\end{enumerate}

We assume that these properties will be easy consequences of the invariants on the linearisation function defined above.

The final condition above is potentially a little opaque, so we explain further.
First, the only instructions that can reasonably appear in final position in a program are returns or backward jumps, as any other instruction would cause execution to `fall out' of the end of the program (for example, when a function invoked with \texttt{CALL} returns, it returns to the next instruction past the \texttt{CALL} that invoked it).
However, in LIN, though each function's graph has been linearised, the entire program is not yet fully linearised into a list of instructions, but rather is a list of `functions', each consisting of a linearised body along with other data.
Each well-formed function must end with a \texttt{RET}, and therefore the only correct instruction that can terminate a LIN program is a \texttt{RET} instruction.

\subsection{The LIN to ASM and ASM to MCS-51 machine code translations}
\label{subsect.lin.asm.translation}

The LIN to ASM translation step is trivial, being almost the identity function.
The only non-trivial feature of the LIN to ASM translation is that all labels are `named apart' so that freshly generated labels from different namespaces cannot clash with one another.

The ASM to MCS-51 machine code translation step, and the required statements of correctness, are found in an unpublished manuscript attached to this document.

\end{document}