source: Deliverables/D1.2/CompilerProofOutline/outline.tex @ 1728

Last change on this file since 1728 was 1728, checked in by mulligan, 8 years ago

avoiding conflicts

File size: 13.7 KB
Line 
1\documentclass[a4paper, 10pt]{article}
2
3\usepackage{a4wide}
4\usepackage{amsfonts}
5\usepackage{amsmath}
6\usepackage{amssymb}
7\usepackage[english]{babel}
8\usepackage{color}
9\usepackage{diagrams}
10\usepackage{graphicx}
11\usepackage[colorlinks]{hyperref}
12\usepackage[utf8x]{inputenc}
13\usepackage{listings}
14\usepackage{microtype}
15\usepackage{skull}
16\usepackage{stmaryrd}
17
18\lstdefinelanguage{matita-ocaml} {
19  mathescape=true
20}
21\lstset{
22  language=matita-ocaml,basicstyle=\tt,columns=flexible,breaklines=false,
23  showspaces=false, showstringspaces=false, extendedchars=false,
24  inputencoding=utf8x, tabsize=2
25}
26
27\title{Proof outline for the correctness of the CerCo compiler}
28\date{\today}
29\author{The CerCo team}
30
31\begin{document}
32
33\maketitle
34
35\section{Introduction}
36\label{sect.introduction}
37
38In the last project review of the CerCo project, the project reviewers expressed the opinion that it would be a good idea to attempt to write down some of the statements of the correctness theorems that we intend to prove about the complexity preserving compiler.
39This document provides a very high-level, pen-and-paper sketch of what we view as the best path to completing the correctness proof for the compiler.
40In particular, for every translation between two intermediate languages, in both the front- and back-ends, we identify the key translation steps, and identify some invariants that we view as being important for the correctness proof.
41
42\section{The RTLabs to RTL translation}
43\label{sect.rtlabs.rtl.translation}
44
45% dpm: type system and casting load (???)
46We require a map, $\sigma$, between \texttt{Values} of the front-end memory model to lists of \texttt{BEValues} of the back-end memory model:
47
48\begin{displaymath}
49\mathtt{Value} ::= \bot \mid \mathtt{int(size)} \mid \mathtt{float} \mid \mathtt{null} \mid \mathtt{ptr} \quad\stackrel{\sigma}{\longrightarrow}\quad \mathtt{BEValue} ::= \bot \mid \mathtt{byte} \mid \mathtt{int}_i \mid \mathtt{ptr}_i
50\end{displaymath}
51
52We further require a map, $\sigma$, which maps the front-end \texttt{Memory} and the back-end's notion of \texttt{BEMemory}.
53
54\begin{displaymath}
55\mathtt{Mem}\ \alpha = \mathtt{Block} \rightarrow (\mathbb{Z} \rightarrow \alpha)
56\end{displaymath}
57
58where
59
60\begin{displaymath}
61\mathtt{Block} ::= \mathtt{Region} \cup \mathtt{ID}
62\end{displaymath}
63
64\begin{displaymath}
65\mathtt{BEMem} = \mathtt{Mem} \mathtt{Value}
66\end{displaymath}
67
68\begin{displaymath}
69\mathtt{Address} = \mathtt{BEValue} \times  \mathtt{BEValue} \\
70\end{displaymath}
71
72\begin{displaymath}
73\mathtt{Mem} = \mathtt{Block} \rightarrow (\mathbb{Z} \rightarrow \mathtt{Cont} \mid \mathtt{Value} \times \mathtt{Size} \mid \mathtt{null})
74\end{displaymath}
75
76\begin{center}
77\begin{picture}(2, 2)
78% picture of sigma mapping memory to memory
79TODO
80\end{picture}
81\end{center}
82
83\begin{displaymath}
84\mathtt{load}\ s\ a\ M = \mathtt{Some}\ v \rightarrow \forall i \leq s.\ \mathtt{load}\ s\ (a + i)\ \sigma(M) = \mathtt{Some}\ v_i
85\end{displaymath}
86
87\begin{displaymath}
88\sigma(\mathtt{store}\ v\ M) = \mathtt{store}\ \sigma(v)\ \sigma(M)
89\end{displaymath}
90
91\begin{displaymath}
92\begin{array}{rll}
93\mathtt{State} & ::=  & (\mathtt{State} : \mathtt{Frame}^* \times \mathtt{Frame} \\
94               & \mid & \mathtt{Call} : \mathtt{Frame}^* \times \mathtt{Args} \times \mathtt{Return} \times \mathtt{Fun} \\
95               & \mid & \mathtt{Return} : \mathtt{Frame}^* \times \mathtt{Value} \times \mathtt{Return}) \times \mathtt{Mem}
96\end{array}
97\end{displaymath}
98
99\begin{displaymath}
100\mathtt{State} ::= \mathtt{Frame}^* \times \mathtt{PC} \times \mathtt{SP} \times \mathtt{ISP} \times \mathtt{CARRY} \times \mathtt{REGS}
101\end{displaymath}
102
103\begin{displaymath}
104\mathtt{State} \stackrel{\sigma}{\longrightarrow} \mathtt{State}
105\end{displaymath}
106
107\begin{displaymath}
108\sigma(\mathtt{State} (\mathtt{Frame}^* \times \mathtt{Frame})) \longrightarrow ((\sigma(\mathtt{Frame}^*), \sigma(\mathtt{PC}), \sigma(\mathtt{SP}), 0, 0, \sigma(\mathtt{REGS})), \sigma(\mathtt{Mem}))
109\end{displaymath}
110
111\begin{displaymath}
112\sigma(\mathtt{Return}(-)) \longrightarrow \sigma \circ \text{return one step}
113\end{displaymath}
114
115\begin{displaymath}
116\sigma(\mathtt{Call}(-)) \longrightarrow \sigma \circ \text{call one step}
117\end{displaymath}
118
119Return one step commuting diagram:
120
121\begin{displaymath}
122\begin{diagram}
123s & \rTo^{\text{one step of execution}} & s'   \\
124  & \rdTo                             & \dTo \\
125  &                                   & \llbracket s'' \rrbracket
126\end{diagram}
127\end{displaymath}
128
129Call one step commuting diagram:
130
131\begin{displaymath}
132\begin{diagram}
133s & \rTo^{\text{one step of execution}} & s'   \\
134  & \rdTo                             & \dTo \\
135  &                                   & \llbracket s'' \rrbracket
136\end{diagram}
137\end{displaymath}
138
139\begin{displaymath}
140\begin{diagram}
141\mathtt{CALL}(\mathtt{id},\ \mathtt{args},\ \mathtt{dst},\ \mathtt{pc}),\ \mathtt{State}(\mathtt{Frame},\ \mathtt{Frames}) & \rTo & \mathtt{Call}(FINISH ME)
142\end{diagram}
143\end{displaymath}
144
145\begin{displaymath}
146\begin{array}{rcl}
147\mathtt{CALL}(\mathtt{id}, \mathtt{args}, \mathtt{dst}, \mathtt{pc}) & \longrightarrow & \mathtt{CALL\_ID}(\mathtt{id}, \sigma'(\mathtt{args}), \sigma(\mathtt{dst}), \mathtt{pc}) \\
148\mathtt{RETURN}                                                      & \longrightarrow & \mathtt{RETURN} \\
149\end{array} 
150\end{displaymath}
151
152\begin{displaymath}
153\begin{array}{rcl}
154\sigma & : & \mathtt{register} \rightarrow \mathtt{list\ register} \\
155\sigma' & : & \mathtt{list\ register} \rightarrow \mathtt{list\ register}
156\end{array} 
157\end{displaymath}
158
159\section{The RTL to ERTL translation}
160\label{sect.rtl.ertl.translation}
161
162\begin{displaymath}
163\begin{diagram}
164& & \llbracket \mathtt{CALL\_ID}(\mathtt{id}, \mathtt{args}, \mathtt{dst}, \mathtt{pc})\rrbracket & & \\
165& \ldTo^{\text{external}} & & \rdTo^{\text{internal}} & \\
166\skull & & & & \mathtt{regs} = [\mathtt{params}/-] \\
167& & & & \mathtt{sp} = \mathtt{ALLOC} \\
168& & & & \mathtt{PUSH}(\mathtt{carry}, \mathtt{regs}, \mathtt{dst}, \mathtt{return\_addr}), \mathtt{pc}_{0}, \mathtt{regs}, \mathtt{sp} \\
169\end{diagram}
170\end{displaymath}
171
172\begin{align*}
173\llbracket \mathtt{RETURN} \rrbracket \\
174\mathtt{return\_addr} & := \mathtt{top}(\mathtt{stack}) \\
175v*                    & := m(\mathtt{rv\_regs}) \\
176\mathtt{dst}, \mathtt{sp}, \mathtt{carry}, \mathtt{regs} & := \mathtt{pop} \\
177\mathtt{regs}[v* / \mathtt{dst}] \\
178\end{align*}
179
180\begin{displaymath}
181\begin{diagram}
182s    & \rTo^1 & s' & \rTo^1 & s'' \\
183\dTo &        &    & \rdTo  & \dTo \\
184\llbracket s \rrbracket & \rTo(1,3)^1 & & & \llbracket s'' \rrbracket \\ 
185\mathtt{CALL} \\
186\end{diagram}
187\end{displaymath}
188
189\begin{displaymath}
190\begin{diagram}
191s    & \rTo^1 & s' & \rTo^1 & s'' \\
192\dTo &        &    & \rdTo  & \dTo \\
193\  & \rTo(1,3) & & & \ \\
194\mathtt{RETURN} \\
195\end{diagram}
196\end{displaymath}
197
198\begin{displaymath}
199\mathtt{b\_graph\_translate}: (\mathtt{label} \rightarrow \mathtt{blist'})
200\rightarrow \mathtt{graph} \rightarrow \mathtt{graph}
201\end{displaymath}
202
203\begin{align*}
204\mathtt{theorem} &\ \mathtt{b\_graph\_translate\_ok}: \\
205& \forall  f.\forall G_{i}.\mathtt{let}\ G_{\sigma} := \mathtt{b\_graph\_translate}\ f\ G_{i}\ \mathtt{in} \\
206&       \forall l \in G_{i}.\mathtt{subgraph}\ (f\ l)\ l\ (\mathtt{next}\ l\ G_{i})\ G_{\sigma}
207\end{align*}
208
209\begin{align*}
210\mathtt{lemma} &\ \mathtt{execute\_1\_step\_ok}: \\
211&       \forall s.  \mathtt{let}\ s' := s\ \sigma\ \mathtt{in} \\
212&       \mathtt{let}\ l := pc\ s\ \mathtt{in} \\
213&       s \stackrel{1}{\rightarrow} s^{*} \Rightarrow \exists n. s' \stackrel{n}{\rightarrow} s'^{*} \wedge s'^{*} = s'\ \sigma
214\end{align*}
215
216\begin{align*}
217\mathrm{RTL\ status} & \ \ \mathrm{ERTL\ status} \\
218\mathtt{sp} & = \mathtt{spl} / \mathtt{sph} \\
219\mathtt{graph} & \mapsto \mathtt{graph} + \mathtt{prologue}(s) + \mathtt{epilogue}(s) \\
220& \mathrm{where}\ s = \mathrm{callee\ saved} + \nu \mathrm{RA} \\
221\end{align*}
222
223\begin{displaymath}
224\begin{diagram}
225\mathtt{CALL} & \rTo^1 & \mathtt{inside\ function} \\
226\dTo & & \dTo \\
227\underbrace{\ldots}_{\llbracket \mathtt{CALL} \rrbracket} & \rTo &
228\underbrace{\ldots}_{\mathtt{prologue}} \\
229\end{diagram}
230\end{displaymath}
231
232\begin{displaymath}
233\begin{diagram}
234\mathtt{RETURN} & \rTo^1 & \mathtt{.} \\
235\dTo & & \dTo \\
236\underbrace{\ldots}_{\mathtt{epilogue}} & \rTo &
237\underbrace{\ldots} \\
238\end{diagram}
239\end{displaymath}
240
241\begin{align*}
242\mathtt{prologue}(s) = & \mathtt{create\_new\_frame}; \\
243                       & \mathtt{pop\ ra}; \\
244                       & \mathtt{save\ callee\_saved}; \\
245                                                                                         & \mathtt{get\_params} \\
246                                                                                         & \ \ \mathtt{reg\_params}: \mathtt{move} \\
247                                                                                         & \ \ \mathtt{stack\_params}: \mathtt{push}/\mathtt{pop}/\mathtt{move} \\
248\end{align*}
249
250\begin{align*}
251\mathtt{epilogue}(s) = & \mathtt{save\ return\ to\ tmp\ real\ regs}; \\
252                                                                                         & \mathtt{restore\_registers}; \\
253                       & \mathtt{push\ ra}; \\
254                       & \mathtt{delete\_frame}; \\
255                       & \mathtt{save return} \\
256\end{align*}
257
258\begin{displaymath}
259\mathtt{CALL} id \mapsto \mathtt{set\_params}; \mathtt{CALL} id; \mathtt{fetch\_result}
260\end{displaymath}
261
262\section{The ERTL to LTL translation}
263\label{sect.ertl.ltl.translation}
264
265\section{The LTL to LIN translation}
266\label{sect.ltl.lin.translation}
267
268We require a map, $\sigma$, from LTL statuses, where program counters are represented as labels in a graph data structure, to LIN statuses, where program counters are natural numbers:
269\begin{displaymath}
270\mathtt{pc : label} \stackrel{\sigma}{\longrightarrow} \mathbb{N}
271\end{displaymath}
272
273The LTL to LIN translation pass also linearises the graph data structure into a list of instructions.
274Pseudocode for the linearisation process is as follows:
275
276\begin{lstlisting}
277let rec linearise graph visited required generated todo :=
278  match todo with
279  | l::todo ->
280    if l $\in$ visited then
281      let generated := generated $\cup\ \{$ Goto l $\}$ in
282      let required := required $\cup$ l in
283        linearise graph visited required generated todo
284    else
285      -- Get the instruction at label `l' in the graph
286      let lookup := graph(l) in
287      let generated := generated $\cup\ \{$ lookup $\}$ in
288      -- Find the successor of the instruction at label `l' in the graph
289      let successor := succ(l, graph) in
290      let todo := successor::todo in
291        linearise graph visited required generated todo
292  | []      -> (required, generated)
293\end{lstlisting}
294
295It is easy to see that this linearisation process eventually terminates.
296In particular, the size of the visited label set is monotonically increasing, and is bounded above by the size of the graph that we are linearising.
297
298The initial call to \texttt{linearise} sees the \texttt{visited}, \texttt{required} and \texttt{generated} sets set to the empty set, and \texttt{todo} initialized with the singleton list consisting of the entry point of the graph.
299We envisage needing to prove the following invariants on the linearisation function above:
300
301\begin{enumerate}
302\item
303$\mathtt{visited} \approx \mathtt{generated}$, where $\approx$ is \emph{multiset} equality, as \texttt{generated} is a set of instructions where instructions may mention labels multiple times, and \texttt{visited} is a set of labels,
304\item
305$\forall \mathtt{l} \in \mathtt{generated}.\ \mathtt{succ(l,\ graph)} \subseteq \mathtt{required} \cup \mathtt{todo}$,
306\item
307$\mathtt{required} \subseteq \mathtt{visited}$,
308\item
309$\mathtt{visited} \cap \mathtt{todo} = \emptyset$.
310\end{enumerate}
311
312The invariants collectively imply the following properties, crucial to correctness, about the linearisation process:
313
314\begin{enumerate}
315\item
316Every graph node is visited at most once,
317\item
318Every instruction that is generated is generated due to some graph node being visited,
319\item
320The successor instruction of every instruction that has been visited already will eventually be visited too.
321\end{enumerate}
322
323Note, because the LTL to LIN transformation is the first time the program is linearised, we must discover a notion of `well formed program' suitable for linearised forms.
324In particular, we see the notion of well formedness (yet to be formally defined) resting on the following conditions:
325
326\begin{enumerate}
327\item
328For every jump to a label in a linearised program, the target label exists at some point in the program,
329\item
330Each label is unique, appearing only once in the program,
331\item
332The final instruction of a program must be a return.
333\end{enumerate}
334
335We assume that these properties will be easy consequences of the invariants on the linearisation function defined above.
336
337The final condition above is potentially a little opaque, so we explain further.
338First, the only instructions that can reasonably appear in final position at the end of a program are returns or backward jumps, as any other instruction would cause execution to `fall out' of the end of the program (for example, when a function invoked with \texttt{CALL} returns, it returns to the next instruction past the \texttt{CALL} that invoked it).
339However, in LIN, though each function's graph has been linearised, the entire program is not yet fully linearised into a list of instructions, but rather, a list of `functions', each consisting of a linearised body along with other data.
340Each well-formed function must end with a call to \texttt{RET}, and therefore the only correct instruction that can terminate a LIN program is a \texttt{RET} instruction.
341
342\section{The LIN to ASM and ASM to MCS-51 machine code translations}
343\label{sect.lin.asm.translation}
344
345The LIN to ASM translation step is trivial, being almost the identity function.
346The only non-trivial feature of the LIN to ASM translation is that all labels are `named apart' so that there is no chance of freshly generated labels from different namespaces clashing with labels from another namespace.
347
348The ASM to MCS-51 machine code translation step, and the required statements of correctness, are found in an unpublished manuscript attached to this document.
349
350\end{document}
Note: See TracBrowser for help on using the repository browser.