% source: Deliverables/D1.2/CompilerProofOutline/outline.tex @ 1739
% Last change in changeset 1739, checked in by sacerdot ("Footnote for optimizations.")
\documentclass[a4paper, 10pt]{article}

\usepackage{a4wide}
\usepackage{amsfonts}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage[english]{babel}
\usepackage{color}
\usepackage{diagrams}
\usepackage{graphicx}
\usepackage[colorlinks]{hyperref}
\usepackage[utf8x]{inputenc}
\usepackage{listings}
\usepackage{microtype}
\usepackage{skull}
\usepackage{stmaryrd}

\lstdefinelanguage{matita-ocaml} {
  mathescape=true
}
\lstset{
  language=matita-ocaml,basicstyle=\tt,columns=flexible,breaklines=false,
  showspaces=false, showstringspaces=false, extendedchars=false,
  inputencoding=utf8x, tabsize=2
}

\title{Proof outline for the correctness of the CerCo compiler}
\date{\today}
\author{The CerCo team}

\begin{document}

\maketitle

\section{Introduction}
\label{sect.introduction}

At the last project review of the CerCo project, the reviewers suggested that we write down the statements of the correctness theorems that we intend to prove about the complexity-preserving compiler.
This document provides a high-level, pen-and-paper sketch of what we view as the best path to completing the correctness proof for the compiler.
In particular, for every translation between two intermediate languages, in both the front- and back-ends, we identify the key translation steps and the invariants that we view as being important for the correctness proof.

\section{Front-end: Clight to RTLabs}

The front-end of the CerCo compiler consists of several stages:

\begin{center}
\begin{minipage}{.8\linewidth}
\begin{tabbing}
\quad \= $\downarrow$ \quad \= \kill
\textsf{Clight}\\
\> $\downarrow$ \> cast removal\\
\> $\downarrow$ \> add runtime functions\footnote{Following the last project
meeting we intend to move this transformation to the back-end.}\\
\> $\downarrow$ \> cost labelling\\
\> $\downarrow$ \> loop optimizations\footnote{To be ported from the untrusted compiler and certified only in case of early completion of the certification of the other passes.} (an endo-transformation)\\
\> $\downarrow$ \> stack variable allocation and control structure
 simplification\\
\textsf{Cminor}\\
\> $\downarrow$ \> generate global variable initialisation code\\
\> $\downarrow$ \> transform to RTL graph\\
\textsf{RTLabs}\\
\> $\downarrow$ \> start of target-specific back-end\\
\>\quad \vdots
\end{tabbing}
\end{minipage}
\end{center}

%Our overall statements of correctness with respect to costs will
%require a correctly labelled program

Most of the proofs proposed here have three layers:
\begin{enumerate}
\item invariants closely tied to the syntax and transformations using
  dependent types,
\item a forward simulation proof, and
\item syntactic proofs about the cost labelling.
\end{enumerate}
The first will support functional correctness and allow us to show
the totality of some of the compiler stages (that is, that these stages of
the compiler cannot fail).  The second provides the main functional
correctness result, and the last will be crucial for applying
correctness results about the costings from the back-end.

We will also prove that a suitably labelled RTLabs trace can be turned
into a \emph{structured trace} which splits the execution trace into
cost-label-to-cost-label chunks with nested function calls.  This
structure was identified during work on the correctness of the
back-end cost analysis as retaining important information about the
structure of the execution that is difficult to reconstruct later in
the compiler.
\subsection{Clight cast removal}

This pass removes some casts inserted by the parser to make arithmetic
promotion explicit where they are superfluous (such as
\lstinline[language=C]'c = (short)((int)a + (int)b);').

The transformation only affects Clight expressions, recursively
detecting casts that can be safely eliminated.  The semantics provides
a big-step definition for expressions, so we should be able to show a
lock-step forward simulation using a lemma showing that cast
elimination does not change the evaluation of expressions.  This lemma
will follow from a structural induction on the source expression.  We
have already proved a few of the underlying arithmetic results
necessary to validate the approach.
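
The underlying lemma might be stated roughly as follows, where $\mathtt{simplify}$ and $\mathtt{eval\_expr}$ are our illustrative names for the cast-elimination function and big-step expression evaluation (over an environment $E$ and memory $M$), not necessarily those of the formal development:

\begin{displaymath}
\forall E, M, e, v.\ \mathtt{eval\_expr}\ E\ M\ e = \mathtt{Some}\ v \Rightarrow \mathtt{eval\_expr}\ E\ M\ (\mathtt{simplify}\ e) = \mathtt{Some}\ v
\end{displaymath}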

\subsection{Clight cost labelling}

This pass adds cost labels before and after selected statements and
expressions; the execution traces ought to be equivalent modulo
cost labels.  Hence it requires a simple forward simulation with a
limited amount of stuttering wherever a new cost label is introduced.
A bound on the amount of stuttering allowed can be given based on the
statement or continuation to be evaluated next.

We also intend to show three syntactic properties about the cost
labelling:
\begin{itemize}
\item every function starts with a cost label,
\item every branching instruction is followed by a cost label (note that
  exiting a loop is treated as a branch), and
\item the head of every loop (and any \lstinline'goto' destination) is
  a cost label.
\end{itemize}
These can be shown by structural induction on the source term.
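
As an illustration, writing a hypothetical \lstinline[language=C]'__cost' statement for an emitted cost label (the concrete syntax here is ours, for exposition only), a labelled conditional would satisfy the properties above as follows:

\begin{lstlisting}[language=C]
void f(int x) {
  __cost(k1);      /* every function starts with a cost label */
  if (x > 0) {
    __cost(k2);    /* each branch is followed by a cost label */
    x = x - 1;
  } else {
    __cost(k3);
    x = x + 1;
  }
  __cost(k4);      /* leaving the conditional is also a branch */
}
\end{lstlisting}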

\subsection{Clight to Cminor translation}

This translation is the first to introduce some invariants, with the
proofs closely tied to the implementation by dependent typing.  These
are largely complete and show that the generated code enjoys:
\begin{itemize}
\item some minimal type safety, shown by explicit checks on the
  Cminor types during the transformation (a little more work remains
  to be done here, but it follows the same form);
\item that variables named in the parameter and local variable
  environments are distinct from one another, again by an explicit
  check;
\item that variables used in the generated code are present in the
  resulting environment (either by checking their presence in the
  source environment, or because they come from a list of fresh
  temporary variables); and
\item that all \lstinline[language=C]'goto' labels are present (by
  checking them against a list of source labels and proving that all
  source labels are preserved).
\end{itemize}

The simulation will be similar to the relevant stages of CompCert
(Clight to Csharpminor and Csharpminor to Cminor --- in the event that
the direct proof is unwieldy we could introduce a corresponding
intermediate language).  During early experimentation with porting
CompCert definitions to the Matita proof assistant we found little
difficulty reproving the results for the memory model, so we plan to
port the memory injection properties and use them to relate Clight
in-memory variables with either a local variable valuation or a stack
slot, depending on how each variable was classified.

This should be sufficient to show the equivalence of (big-step)
expression evaluation.  The simulation can then be shown by relating
corresponding blocks of statements and continuations with their Cminor
counterparts and proving that a few steps reach the next matching
state.

The syntactic properties required for cost labels remain similar, and a
structural induction on the function bodies should be sufficient to
show that they are preserved.

\subsection{Cminor global initialisation code}

This short phase replaces the global variable initialisation data with
code that executes when the program starts.  Each piece of
initialisation data in the source is matched by a new statement
storing that data.  As each global variable is allocated a distinct
memory block, the program state after the initialisation statements
will be the same as the original program's state at the start of
execution, and will proceed in the same manner afterwards.
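
Schematically (in our notation rather than the formal syntax), each initialisation datum $d_i$ attached to a global $x$ becomes a store at the corresponding offset:

\begin{displaymath}
\mathtt{global}\ x = [\,d_0;\ d_1;\ \ldots;\ d_n\,] \quad\longmapsto\quad x[0] := d_0;\ x[1] := d_1;\ \ldots;\ x[n] := d_n
\end{displaymath}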

% Actually, the above is wrong...
% ... this ought to be in a fresh main function with a fresh cost label

\subsection{Cminor to RTLabs translation}

In this part of the compiler we transform the program's functions into
control flow graphs.  It is closely related to CompCert's Cminorsel to
RTL transformation, albeit with target-independent operations.

We already enforce several invariants with dependent types: some type
safety, enforced mostly using the type information from Cminor; and
that the graph is closed (by showing that each successor was recently
added, or corresponds to a \lstinline[language=C]'goto' label, all of
which are added before the end).  Note that this relies on a
monotonicity property; CompCert maintains a similar property in a
similar way while building RTL graphs.  We will also add a result
showing that all of the pseudo-register names are distinct, for use by
later stages, using the same method as for Cminor.

The simulation will relate Cminor states to RTLabs states about to
execute the corresponding code (with a corresponding continuation).
Each Cminor statement becomes zero or more RTLabs statements, with a
decreasing measure based on the statement and continuations similar to
CompCert's.  We may also follow CompCert in using a relational
specification of this stage so as to abstract away from the functional
(and highly dependently typed) definition.

The first two labelling properties remain as before; we will show that
cost labels are preserved, so the function entry point will be a cost
label, and successors that are cost labels still map to cost
labels, preserving the condition on branches.  We replace the property
for loops with the notion that we will always reach a cost label or
the end of the function by following a bounded number of successors.
This can be easily seen in Cminor using the requirement for cost
labels at the head of loops and after gotos.  It remains to show that
this is preserved by the translation to RTLabs.  % how?
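
Informally (in our formulation, not necessarily that of the development), the replacement property for a function graph $G$ reads:

\begin{displaymath}
\exists n.\ \forall l \in G.\ \text{every path from $l$ reaches a cost label or the end of the function within $n$ steps}
\end{displaymath}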

\subsection{RTLabs structured trace generation}

This proof-only step incorporates the function call structure and cost
labelling properties into the execution trace.  As the function calls
are nested within the trace, we need to distinguish between
terminating and non-terminating function calls.

We use the excluded middle (specialised to a function termination
property) to make this distinction.  Terminating calls are handled by
following the trace, breaking it into chunks between cost labels and
recursively processing function calls.  The main difficulties here are
the non-structurally recursive nature of the function (instead we use
the size of the termination proof as a measure) and using the
properties on RTLabs cost labelling to create the correct structure.
We also show that the lower stack frames are preserved during function
calls in order to prove that after returning from a function call we
execute the correct code.  This part of the work has already been
constructed, but still requires a simple proof to show that flattening
the structured trace recreates the original flat trace.

The non-terminating case uses the excluded middle to decide whether to
use the function described above to construct an inductive terminating
structured trace, or a coinductive version for non-termination.  It
follows the trace like the terminating version to build up chunks of
trace from cost label to cost label (which, by the finite distance to
a cost label property shown before, can be represented by an inductive
type), unless it encounters a non-terminating function, in which case
it corecursively processes that call.  This part of the trace
transformation is currently under construction, and will also need a
flattening result to show that it is correct.
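
The shape of a structured trace can be sketched informally as a grammar, where $\ell$ ranges over cost labels, $s$ over ordinary execution steps, and a call carries the (inductive or, in the non-terminating case, coinductive) trace of the callee; this is our rendering for exposition, not the types of the development:

\begin{displaymath}
T \;::=\; \ell \cdot (s \mid \mathtt{call}(T))^{*} \cdot T \;\mid\; \ell \cdot (s \mid \mathtt{call}(T))^{*} \cdot \mathtt{RETURN}
\end{displaymath}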

\section{Back-end: RTLabs to machine code}
\label{sect.backend.rtlabs.machine.code}

The compiler back-end consists of the following intermediate languages and stages of translation:

\begin{center}
\begin{minipage}{.8\linewidth}
\begin{tabbing}
\quad \= $\downarrow$ \quad \= \kill
\textsf{RTLabs}\\
\> $\downarrow$ \> copy propagation\footnote{\label{lab:opt}To be ported from the untrusted compiler and certified only in case of early completion of the certification of the other passes.} (an endo-transformation) \\
\> $\downarrow$ \> instruction selection\\
\textsf{RTL}\\
\> $\downarrow$ \> constant propagation$^{\mbox{\scriptsize \ref{lab:opt}}}$ (an endo-transformation) \\
\> $\downarrow$ \> calling convention made explicit \\
\> $\downarrow$ \> stack frame generation \\
\textsf{ERTL}\\
\> $\downarrow$ \> register allocation and spilling\\
\> $\downarrow$ \> dead code elimination\\
\textsf{LTL}\\
\> $\downarrow$ \> function linearisation\\
\> $\downarrow$ \> branch compression (an endo-transformation) \\
\textsf{LIN}\\
\> $\downarrow$ \> relabelling\\
\textsf{ASM}\\
\> $\downarrow$ \> pseudoinstruction expansion\\
\textsf{Machine code}\\
\end{tabbing}
\end{minipage}
\end{center}

Here, by `endo-transformation', we mean a mapping from a language back to itself.
For example, the constant propagation step maps the RTL language to the RTL language.

\subsection{The RTLabs to RTL translation}
\label{subsect.rtlabs.rtl.translation}

% dpm: type system and casting load (???)
We require a map, $\sigma$, from \texttt{Value}s of the front-end memory model to lists of \texttt{BEValue}s of the back-end memory model:

\begin{displaymath}
\mathtt{Value} ::= \bot \mid \mathtt{int(size)} \mid \mathtt{float} \mid \mathtt{null} \mid \mathtt{ptr} \quad\stackrel{\sigma}{\longrightarrow}\quad \mathtt{BEValue} ::= \bot \mid \mathtt{byte} \mid \mathtt{int}_i \mid \mathtt{ptr}_i
\end{displaymath}
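
For instance, assuming a 32-bit front-end integer is represented by four back-end bytes (the widths and byte order here are illustrative only):

\begin{displaymath}
\sigma(\mathtt{int(32)}\ v) = [\,b_0;\ b_1;\ b_2;\ b_3\,] \qquad \text{where } v = b_0 + b_1 \cdot 2^{8} + b_2 \cdot 2^{16} + b_3 \cdot 2^{24}
\end{displaymath}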

We further require a map, also written $\sigma$, relating the front-end \texttt{Memory} to the back-end's notion of \texttt{BEMemory}.

\begin{displaymath}
\mathtt{Mem}\ \alpha = \mathtt{Block} \rightarrow (\mathbb{Z} \rightarrow \alpha)
\end{displaymath}

where

\begin{displaymath}
\mathtt{Block} ::= \mathtt{Region} \cup \mathtt{ID}
\end{displaymath}

\begin{displaymath}
\mathtt{BEMem} = \mathtt{Mem}\ \mathtt{BEValue}
\end{displaymath}

\begin{displaymath}
\mathtt{Address} = \mathtt{BEValue} \times \mathtt{BEValue}
\end{displaymath}

\begin{displaymath}
\mathtt{Mem} = \mathtt{Block} \rightarrow (\mathbb{Z} \rightarrow \mathtt{Cont} \mid \mathtt{Value} \times \mathtt{Size} \mid \mathtt{null})
\end{displaymath}

\begin{center}
\begin{picture}(130,30)
\put(0,-20){\framebox(25,25)[c]{\texttt{v}}}
\put(26,-20){\framebox(25,25)[c]{\texttt{4}}}
\put(51,-20){\framebox(25,25)[c]{\texttt{cont}}}
\put(76,-20){\framebox(25,25)[c]{\texttt{cont}}}
\put(101,-20){\framebox(25,25)[c]{\texttt{cont}}}
\end{picture}
\end{center}

\begin{displaymath}
\mathtt{load}\ s\ a\ M = \mathtt{Some}\ v \rightarrow \forall i \leq s.\ \mathtt{load}\ s\ (a + i)\ \sigma(M) = \mathtt{Some}\ v_i
\end{displaymath}

\begin{displaymath}
\sigma(\mathtt{store}\ v\ M) = \mathtt{store}\ \sigma(v)\ \sigma(M)
\end{displaymath}

\begin{displaymath}
\begin{array}{rll}
\mathtt{State} & ::=  & (\mathtt{State} : \mathtt{Frame}^* \times \mathtt{Frame} \\
               & \mid & \mathtt{Call} : \mathtt{Frame}^* \times \mathtt{Args} \times \mathtt{Return} \times \mathtt{Fun} \\
               & \mid & \mathtt{Return} : \mathtt{Frame}^* \times \mathtt{Value} \times \mathtt{Return}) \times \mathtt{Mem}
\end{array}
\end{displaymath}

\begin{displaymath}
\mathtt{State} ::= \mathtt{Frame}^* \times \mathtt{PC} \times \mathtt{SP} \times \mathtt{ISP} \times \mathtt{CARRY} \times \mathtt{REGS}
\end{displaymath}

\begin{displaymath}
\mathtt{State} \stackrel{\sigma}{\longrightarrow} \mathtt{State}
\end{displaymath}

\begin{displaymath}
\sigma(\mathtt{State} (\mathtt{Frame}^* \times \mathtt{Frame})) \longrightarrow ((\sigma(\mathtt{Frame}^*), \sigma(\mathtt{PC}), \sigma(\mathtt{SP}), 0, 0, \sigma(\mathtt{REGS})), \sigma(\mathtt{Mem}))
\end{displaymath}

\begin{displaymath}
\sigma(\mathtt{Return}(-)) \longrightarrow \sigma \circ \text{return one step}
\end{displaymath}

\begin{displaymath}
\sigma(\mathtt{Call}(-)) \longrightarrow \sigma \circ \text{call one step}
\end{displaymath}

Return one step commuting diagram:

\begin{displaymath}
\begin{diagram}
s & \rTo^{\text{one step of execution}} & s'   \\
  & \rdTo                             & \dTo \\
  &                                   & \llbracket s'' \rrbracket
\end{diagram}
\end{displaymath}

Call one step commuting diagram:

\begin{displaymath}
\begin{diagram}
s & \rTo^{\text{one step of execution}} & s'   \\
  & \rdTo                             & \dTo \\
  &                                   & \llbracket s'' \rrbracket
\end{diagram}
\end{displaymath}

\begin{displaymath}
\begin{diagram}
\mathtt{CALL}(\mathtt{id},\ \mathtt{args},\ \mathtt{dst},\ \mathtt{pc}),\ \mathtt{State}(\mathtt{Frame},\ \mathtt{Frames}) & \rTo & \mathtt{Call}(FINISH ME)
\end{diagram}
\end{displaymath}

\begin{displaymath}
\begin{array}{rcl}
\mathtt{CALL}(\mathtt{id}, \mathtt{args}, \mathtt{dst}, \mathtt{pc}) & \longrightarrow & \mathtt{CALL\_ID}(\mathtt{id}, \sigma'(\mathtt{args}), \sigma(\mathtt{dst}), \mathtt{pc}) \\
\mathtt{RETURN}                                                      & \longrightarrow & \mathtt{RETURN} \\
\end{array}
\end{displaymath}

\begin{displaymath}
\begin{array}{rcl}
\sigma & : & \mathtt{register} \rightarrow \mathtt{list\ register} \\
\sigma' & : & \mathtt{list\ register} \rightarrow \mathtt{list\ register}
\end{array}
\end{displaymath}
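
For instance, assuming front-end pseudo-registers are 32 bits wide and back-end pseudo-registers hold one byte each (the widths here are illustrative), a single register would map to a list of four, and $\sigma'$ would concatenate the images of a list of registers:

\begin{displaymath}
\sigma(r) = [\,r_0;\ r_1;\ r_2;\ r_3\,] \qquad \sigma'(l) = \mathtt{concat}\ (\mathtt{map}\ \sigma\ l)
\end{displaymath}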

\subsection{The RTL to ERTL translation}
\label{subsect.rtl.ertl.translation}

\begin{displaymath}
\begin{diagram}
& & \llbracket \mathtt{CALL\_ID}(\mathtt{id}, \mathtt{args}, \mathtt{dst}, \mathtt{pc})\rrbracket & & \\
& \ldTo^{\text{external}} & & \rdTo^{\text{internal}} & \\
\skull & & & & \mathtt{regs} = [\mathtt{params}/-] \\
& & & & \mathtt{sp} = \mathtt{ALLOC} \\
& & & & \mathtt{PUSH}(\mathtt{carry}, \mathtt{regs}, \mathtt{dst}, \mathtt{return\_addr}), \mathtt{pc}_{0}, \mathtt{regs}, \mathtt{sp} \\
\end{diagram}
\end{displaymath}

\begin{align*}
\llbracket \mathtt{RETURN} \rrbracket \\
\mathtt{return\_addr} & := \mathtt{top}(\mathtt{stack}) \\
v^*                   & := m(\mathtt{rv\_regs}) \\
\mathtt{dst}, \mathtt{sp}, \mathtt{carry}, \mathtt{regs} & := \mathtt{pop} \\
\mathtt{regs}[v^* / \mathtt{dst}] \\
\end{align*}

\begin{displaymath}
\begin{diagram}
s    & \rTo^1 & s' & \rTo^1 & s'' \\
\dTo &        &    & \rdTo  & \dTo \\
\llbracket s \rrbracket & \rTo(1,3)^1 & & & \llbracket s'' \rrbracket \\
\mathtt{CALL} \\
\end{diagram}
\end{displaymath}

\begin{displaymath}
\begin{diagram}
s    & \rTo^1 & s' & \rTo^1 & s'' \\
\dTo &        &    & \rdTo  & \dTo \\
\  & \rTo(1,3) & & & \ \\
\mathtt{RETURN} \\
\end{diagram}
\end{displaymath}

\begin{displaymath}
\mathtt{b\_graph\_translate}: (\mathtt{label} \rightarrow \mathtt{blist'})
\rightarrow \mathtt{graph} \rightarrow \mathtt{graph}
\end{displaymath}

\begin{align*}
\mathtt{theorem} &\ \mathtt{b\_graph\_translate\_ok}: \\
& \forall  f.\forall G_{i}.\mathtt{let}\ G_{\sigma} := \mathtt{b\_graph\_translate}\ f\ G_{i}\ \mathtt{in} \\
&       \forall l \in G_{i}.\mathtt{subgraph}\ (f\ l)\ l\ (\mathtt{next}\ l\ G_{i})\ G_{\sigma}
\end{align*}

\begin{align*}
\mathtt{lemma} &\ \mathtt{execute\_1\_step\_ok}: \\
&       \forall s.  \mathtt{let}\ s' := s\ \sigma\ \mathtt{in} \\
&       \mathtt{let}\ l := pc\ s\ \mathtt{in} \\
&       s \stackrel{1}{\rightarrow} s^{*} \Rightarrow \exists n. s' \stackrel{n}{\rightarrow} s'^{*} \wedge s'^{*} = s^{*}\ \sigma
\end{align*}

\begin{align*}
\mathrm{RTL\ status} & \ \ \mathrm{ERTL\ status} \\
\mathtt{sp} & = \mathtt{spl} / \mathtt{sph} \\
\mathtt{graph} & \mapsto \mathtt{graph} + \mathtt{prologue}(s) + \mathtt{epilogue}(s) \\
& \mathrm{where}\ s = \mathrm{callee\ saved} + \nu \mathrm{RA} \\
\end{align*}

\begin{displaymath}
\begin{diagram}
\mathtt{CALL} & \rTo^1 & \mathtt{inside\ function} \\
\dTo & & \dTo \\
\underbrace{\ldots}_{\llbracket \mathtt{CALL} \rrbracket} & \rTo &
\underbrace{\ldots}_{\mathtt{prologue}} \\
\end{diagram}
\end{displaymath}

\begin{displaymath}
\begin{diagram}
\mathtt{RETURN} & \rTo^1 & \mathtt{.} \\
\dTo & & \dTo \\
\underbrace{\ldots}_{\mathtt{epilogue}} & \rTo &
\underbrace{\ldots} \\
\end{diagram}
\end{displaymath}

\begin{align*}
\mathtt{prologue}(s) = & \mathtt{create\_new\_frame}; \\
                       & \mathtt{pop\ ra}; \\
                       & \mathtt{save\ callee\_saved}; \\
                       & \mathtt{get\_params} \\
                       & \ \ \mathtt{reg\_params}: \mathtt{move} \\
                       & \ \ \mathtt{stack\_params}: \mathtt{push}/\mathtt{pop}/\mathtt{move} \\
\end{align*}

\begin{align*}
\mathtt{epilogue}(s) = & \mathtt{save\ return\ to\ tmp\ real\ regs}; \\
                       & \mathtt{restore\_registers}; \\
                       & \mathtt{push\ ra}; \\
                       & \mathtt{delete\_frame}; \\
                       & \mathtt{save\ return} \\
\end{align*}

\begin{displaymath}
\mathtt{CALL}\ id \mapsto \mathtt{set\_params};\ \mathtt{CALL}\ id;\ \mathtt{fetch\_result}
\end{displaymath}

\subsection{The ERTL to LTL translation}
\label{subsect.ertl.ltl.translation}

\subsection{The LTL to LIN translation}
\label{subsect.ltl.lin.translation}

We require a map, $\sigma$, from LTL statuses, where program counters are represented as labels in a graph data structure, to LIN statuses, where program counters are natural numbers:
\begin{displaymath}
\mathtt{pc : label} \stackrel{\sigma}{\longrightarrow} \mathbb{N}
\end{displaymath}

The LTL to LIN translation pass also linearises the graph data structure into a list of instructions.
Pseudocode for the linearisation process is as follows:

\begin{lstlisting}
let rec linearise graph visited required generated todo :=
  match todo with
  | l::todo ->
    if l $\in$ visited then
      let generated := generated $\cup\ \{$ Goto l $\}$ in
      let required := required $\cup\ \{$ l $\}$ in
        linearise graph visited required generated todo
    else
      -- Get the instruction at label `l' in the graph
      let lookup := graph(l) in
      let generated := generated $\cup\ \{$ lookup $\}$ in
      -- Mark `l' as visited so it is not processed twice
      let visited := visited $\cup\ \{$ l $\}$ in
      -- Find the successor of the instruction at label `l' in the graph
      let successor := succ(l, graph) in
      let todo := successor::todo in
        linearise graph visited required generated todo
  | []      -> (required, generated)
\end{lstlisting}

It is easy to see that this linearisation process eventually terminates.
In particular, the size of the visited label set is monotonically increasing and is bounded above by the size of the graph that we are linearising, and whenever an already-visited label is taken from \texttt{todo} the \texttt{todo} list shrinks.
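
Concretely, one candidate termination measure, decreasing lexicographically at each recursive call (our formulation, assuming the \texttt{visited} set is extended whenever a new label is processed), is:

\begin{displaymath}
\mu(\mathtt{visited}, \mathtt{todo}) = (\,|\mathtt{graph}| - |\mathtt{visited}|,\ |\mathtt{todo}|\,)
\end{displaymath}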

The initial call to \texttt{linearise} sees the \texttt{visited}, \texttt{required} and \texttt{generated} sets set to the empty set, and \texttt{todo} initialised with the singleton list consisting of the entry point of the graph.
We envisage needing to prove the following invariants on the linearisation function above:

\begin{enumerate}
\item
$\mathtt{visited} \approx \mathtt{generated}$, where $\approx$ is \emph{multiset} equality, as \texttt{generated} is a set of instructions where instructions may mention labels multiple times, and \texttt{visited} is a set of labels,
\item
$\forall \mathtt{l} \in \mathtt{generated}.\ \mathtt{succ(l,\ graph)} \subseteq \mathtt{required} \cup \mathtt{todo}$,
\item
$\mathtt{required} \subseteq \mathtt{visited}$,
\item
$\mathtt{visited} \cap \mathtt{todo} = \emptyset$.
\end{enumerate}

The invariants collectively imply the following properties, crucial to correctness, about the linearisation process:

\begin{enumerate}
\item
Every graph node is visited at most once,
\item
Every instruction that is generated is generated due to some graph node being visited,
\item
The successor instruction of every instruction that has been visited already will eventually be visited too.
\end{enumerate}

Note that, because the LTL to LIN transformation is the first time the program is linearised, we must discover a notion of `well-formed program' suitable for linearised forms.
In particular, we see the notion of well-formedness (yet to be formally defined) resting on the following conditions:

\begin{enumerate}
\item
For every jump to a label in a linearised program, the target label exists at some point in the program,
\item
Each label is unique, appearing only once in the program,
\item
The final instruction of a program must be a return.
\end{enumerate}

We assume that these properties will be easy consequences of the invariants on the linearisation function defined above.

The final condition above is potentially a little opaque, so we explain further.
First, the only instructions that can reasonably appear in final position in a program are returns or backward jumps, as any other instruction would cause execution to `fall out' of the end of the program (for example, when a function invoked with \texttt{CALL} returns, it returns to the next instruction past the \texttt{CALL} that invoked it).
However, in LIN, though each function's graph has been linearised, the entire program is not yet fully linearised into a list of instructions, but rather is a list of `functions', each consisting of a linearised body along with other data.
Each well-formed function must end with a \texttt{RET} instruction, and therefore \texttt{RET} is the only instruction that can correctly terminate a LIN program.

\subsection{The LIN to ASM and ASM to MCS-51 machine code translations}
\label{subsect.lin.asm.translation}

The LIN to ASM translation step is trivial, being almost the identity function.
The only non-trivial feature of the LIN to ASM translation is that all labels are `named apart', so that freshly generated labels from one namespace cannot clash with labels from another namespace.

The ASM to MCS-51 machine code translation step, and the required statements of correctness, are found in an unpublished manuscript attached to this document.

\end{document}