\documentclass[a4paper, 10pt]{article}

\usepackage{a4wide}
\usepackage{amsfonts}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage[english]{babel}
\usepackage{color}
\usepackage{diagrams}
\usepackage{graphicx}
\usepackage[colorlinks]{hyperref}
\usepackage[utf8x]{inputenc}
\usepackage{listings}
\usepackage{microtype}
\usepackage{skull}
\usepackage{stmaryrd}
\usepackage{array}
\newcolumntype{b}{@{}>{{}}}
\newcolumntype{B}{@{}>{{}}c<{{}}@{}}
\newcolumntype{h}[1]{@{\hspace{#1}}}
\newcolumntype{L}{>{$}l<{$}}
\newcolumntype{C}{>{$}c<{$}}
\newcolumntype{R}{>{$}r<{$}}
\newcolumntype{S}{>{$(}r<{)$}}
\newcolumntype{n}{@{}}
\usepackage{wasysym}


\lstdefinelanguage{matita-ocaml}{
  mathescape=true
}
\lstset{
  language=matita-ocaml, basicstyle=\tt, columns=flexible, breaklines=false,
  showspaces=false, showstringspaces=false, extendedchars=false,
  inputencoding=utf8x, tabsize=2
}

\title{Proof outline for the correctness of the CerCo compiler}
\date{\today}
\author{The CerCo team}

\begin{document}

\maketitle

\section{Introduction}
\label{sect.introduction}

In the last project review of the CerCo project, the project reviewers
recommended that we quickly outline a paper-and-pencil correctness proof
for each of the stages of the CerCo compiler, in order to allow an
estimation of the complexity and time required to complete the
formalization of the proof. This has been possible since month 18, when
we completed the formalization in Matita of the data structures and code
of the compiler.

In this document we provide a very high-level, pen-and-paper
sketch of what we view as the best path to completing the correctness proof
for the compiler. In particular, for every translation between two
intermediate languages, in both the front- and back-ends, we identify the
key translation steps, and identify some invariants that we view as being
important for the correctness proof. We sketch the overall correctness
results, and also briefly describe the parts of the proof that had already
been completed by the end of the First Period.

In the final section we present an estimate of the effort required
for the certification in Matita of the compiler, and we draw conclusions.

\section{Front-end: Clight to RTLabs}

The front-end of the CerCo compiler consists of several stages:

\begin{center}
\begin{minipage}{.8\linewidth}
\begin{tabbing}
\quad \= $\downarrow$ \quad \= \kill
\textsf{Clight}\\
\> $\downarrow$ \> cast removal\\
\> $\downarrow$ \> add runtime functions\footnote{Following the last project
meeting we intend to move this transformation to the back-end.}\\
\> $\downarrow$ \> cost labelling\\
\> $\downarrow$ \> loop optimizations\footnote{\label{lab:opt2}To be ported from the untrusted compiler and certified only in case of early completion of the certification of the other passes.} (an endo-transformation)\\
\> $\downarrow$ \> partial redundancy elimination$^{\mbox{\scriptsize \ref{lab:opt2}}}$ (an endo-transformation)\\
\> $\downarrow$ \> stack variable allocation and control structure
simplification\\
\textsf{Cminor}\\
\> $\downarrow$ \> generate global variable initialisation code\\
\> $\downarrow$ \> transform to RTL graph\\
\textsf{RTLabs}\\
\> $\downarrow$ \> \\
\>\,\vdots
\end{tabbing}
\end{minipage}
\end{center}

Here, by `endo-transformation', we mean a mapping from a language back to
itself: the loop optimization step, for instance, maps the Clight language
to itself.

%Our overall statements of correctness with respect to costs will
%require a correctly labelled program
There are three layers in most of the proofs proposed:
\begin{enumerate}
\item invariants closely tied to the syntax and transformations, using
  dependent types (such as the presence of variable names in environments),
\item a forward simulation proof relating each small step of the
  source to zero or more steps of the target, and
\item proofs about syntactic properties of the cost labelling.
\end{enumerate}
The first will support both functional correctness and allow us to
show the totality of some of the compiler stages (that is, that those
stages of the compiler cannot fail). The second provides the main
functional correctness result, including the preservation of cost
labels in the traces, and the last will be crucial for applying
correctness results about the costings from the back-end, by showing
that cost labels appear in enough places for us to assign all of the
execution costs to them.

We will also prove that a suitably labelled RTLabs trace can be turned
into a \emph{structured trace}, which splits the execution trace into
cost-label to cost-label chunks with nested function calls. This
structure was identified during work on the correctness of the
back-end cost analysis as retaining important information about the
structure of the execution that is difficult to reconstruct later in
the compiler.

\subsection{Clight cast removal}

This transformation removes some casts inserted by the parser to make
arithmetic promotion explicit, but which are superfluous (such as
\lstinline[language=C]'c = (short)((int)a + (int)b);' where
\lstinline'a' and \lstinline'b' are \lstinline[language=C]'short').
This is necessary for producing good code for our target architecture.

It only affects Clight expressions, recursively detecting casts that
can be safely eliminated. The semantics provides a big-step
definition for expressions, so we should be able to show a lock-step
forward simulation between otherwise identical states, using a lemma
showing that cast elimination does not change the evaluation of
expressions. This lemma will follow from a structural induction on
the source expression. We have already proved a few of the underlying
arithmetic results necessary to validate the approach.
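Schematically, the key lemma can be stated as follows (writing
$\mathtt{simplify}$ for the cast-removal function on expressions and
$\llbracket e \rrbracket_{env,M}$ for big-step evaluation of $e$ in
environment $env$ and memory $M$; the names are illustrative):

\begin{displaymath}
\forall e,\ env,\ M.\quad \llbracket \mathtt{simplify}(e) \rrbracket_{env,M} = \llbracket e \rrbracket_{env,M}
\end{displaymath}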

\subsection{Clight cost labelling}

This transformation adds cost labels before and after selected statements
and expressions, and the execution traces ought to be equivalent modulo
the new cost labels. Hence it requires a simple forward simulation
with a limited amount of stuttering wherever a new cost label is
introduced. A bound can be given for the amount of stuttering allowed,
based on the statement or continuation to be evaluated next.

We also intend to show three syntactic properties about the cost
labelling:
\begin{enumerate}
\item every function starts with a cost label,
\item every branching instruction is followed by a cost label (note that
  exiting a loop is treated as a branch), and
\item the head of every loop (and any \lstinline'goto' destination) is
  a cost label.
\end{enumerate}
These can be shown by structural induction on the source term.
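For illustration, the second property means that a labelled conditional
carries a cost label at the start of each branch, along the lines of the
following sketch (\lstinline'_cost1' and \lstinline'_cost2' are
hypothetical cost label names; the concrete syntax of labels in the
labelled language may differ):

\begin{lstlisting}[language=C]
if (x > 0) {
  _cost1: /* follows the branch: true case */
  y = x;
} else {
  _cost2: /* follows the branch: false case */
  y = -x;
}
\end{lstlisting}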

\subsection{Clight to Cminor translation}

This translation is the first to introduce some invariants, with the
proofs closely tied to the implementation by dependent typing. These
are largely complete and show that the generated code enjoys:
\begin{itemize}
\item some minimal type safety, shown by explicit checks on the
  Cminor types during the transformation (a little more work remains
  to be done here, but it follows the same form);
\item that variables named in the parameter and local variable
  environments are distinct from one another, again by an explicit
  check;
\item that variables used in the generated code are present in the
  resulting environment (either by checking their presence in the
  source environment, or from a list of freshly generated temporary
  variables); and
\item that all \lstinline[language=C]'goto' labels are present (by
  checking them against a list of source labels and proving that all
  source labels are preserved).
\end{itemize}

The simulation will be similar to the relevant stages of CompCert
(Clight to Csharpminor and Csharpminor to Cminor --- in the event that
the direct proof is unwieldy we could introduce an intermediate
language corresponding to Csharpminor). During early experimentation
with porting CompCert definitions to the Matita proof assistant we
found little difficulty reproving the results for the memory model, so
we plan to port the memory injection properties and use them to relate
Clight in-memory variables with either the value of the local variable or a
stack slot, depending on how each variable was classified.

This should be sufficient to show the equivalence of (big-step)
expression evaluation. The simulation can then be shown by relating
corresponding blocks of statements and continuations with their Cminor
counterparts and proving that a few steps reach the next matching
state.

The syntactic properties required for cost labels remain similar and a
structural induction on the function bodies should be sufficient to
show that they are preserved.

\subsection{Cminor global initialisation code}

This short phase replaces the global variable initialisation data with
code that executes when the program starts. Each piece of
initialisation data in the source is matched by a new statement
storing that data. As each global variable is allocated a distinct
memory block, the program state after the initialisation statements
will be the same as the original program's state at the start of
execution, and execution will proceed in the same manner afterwards.

% Actually, the above is wrong...
% ... this ought to be in a fresh main function with a fresh cost label

\subsection{Cminor to RTLabs translation}

In this part of the compiler we transform the program's functions into
control flow graphs. It is closely related to CompCert's Cminorsel to
RTL transformation, albeit with target-independent operations.

We already enforce several invariants with dependent types: some type
safety, mostly shown using the type information from Cminor; and
that the graph is closed (by showing that each successor was recently
added, or corresponds to a \lstinline[language=C]'goto' label, all of
which are added before the end). Note that this relies on a
monotonicity property; CompCert maintains a similar property in a
similar way while building RTL graphs. We will also add a result
showing that all of the pseudo-register names are distinct, for use by
later stages, using the same method as for Cminor.

The simulation will relate Cminor states to RTLabs states which are about to
execute the code corresponding to the Cminor statement or continuation.
Each Cminor statement becomes zero or more RTLabs statements, with a
decreasing measure based on the statement and continuations, similar to
CompCert's. We may also follow CompCert in using a relational
specification of this stage so as to abstract away from the functional
(and highly dependently typed) definition.

The first two labelling properties remain as before; we will show that
cost labels are preserved, so the function entry point will be a cost
label, and any statement's successors that are cost labels still map
to cost labels, preserving the condition on branches. We replace
the property for loops with the notion that we will always reach a
cost label or the end of the function after following a bounded number of
successors. This can easily be seen in Cminor using the requirement
for cost labels at the head of loops and after gotos. It remains to
show that this is preserved by the translation to RTLabs. % how?
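The replacement property can be sketched as follows (with
$\mathtt{succ}^{n}$ denoting $n$ steps of the successor relation on RTLabs
program points, and a bound $N$ depending on the function; all names here
are illustrative):

\begin{displaymath}
\forall l.\ \exists n \leq N.\ \mathtt{cost\_label}\ (\mathtt{succ}^{n}\ l)\ \vee\ \mathtt{final}\ (\mathtt{succ}^{n}\ l)
\end{displaymath}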

\subsection{RTLabs structured trace generation}

This proof-only step incorporates the function call structure and cost
labelling properties into the execution trace. As the function calls
are nested within the trace, we need to distinguish between
terminating and non-terminating function calls; to do so, we use the
law of the excluded middle, specialised to a function termination
property.

Structured traces for terminating functions are built by following the
flat trace, breaking it into chunks between cost labels and
recursively processing function calls. The main difficulties here are
the non-structurally recursive nature of the function (instead we use
the size of the termination proof as a measure) and using the RTLabs
cost labelling properties to show that the constraints of the
structured traces are observed. We also show that the lower stack
frames are preserved during function calls, in order to prove that
after returning from a function call we resume execution of the
correct code. This part of the work has already been constructed, but
it still requires a simple proof to show that flattening the structured
trace recreates the original flat trace.
268 | |
---|
269 | The non-terminating case follows the trace like the terminating |
---|
270 | version to build up chunks of trace from cost-label to cost-label |
---|
271 | (which, by the finite distance to a cost label property shown before, |
---|
272 | can be represented by an inductive type). These chunks are chained |
---|
273 | together in a coinductive data structure that can represent |
---|
274 | non-terminating traces. The excluded middle is used to decide whether |
---|
275 | function calls terminate, in which case the function described above |
---|
276 | constructs an inductive terminating structured trace which is nested |
---|
277 | in the caller's trace. Otherwise, another coinductive constructor is |
---|
278 | used to embed the non-terminating trace of the callee, generated by |
---|
279 | corecursion. This part of the trace transformation is currently under |
---|
280 | construction, and will also need a flattening result to show that it |
---|
281 | is correct. |
---|
282 | |
---|
283 | |
---|
\section{Backend: RTLabs to machine code}
\label{sect.backend.rtlabs.machine.code}

The compiler backend consists of the following intermediate languages, and stages of translation:

\begin{center}
\begin{minipage}{.8\linewidth}
\begin{tabbing}
\quad \=\,\vdots\= \\
\> $\downarrow$ \>\\
\> $\downarrow$ \quad \= \kill
\textsf{RTLabs}\\
\> $\downarrow$ \> copy propagation\footnote{\label{lab:opt}To be ported from the untrusted compiler and certified only in case of early completion of the certification of the other passes.} (an endo-transformation) \\
\> $\downarrow$ \> instruction selection\\
\> $\downarrow$ \> change of memory models in compiler\\
\textsf{RTL}\\
\> $\downarrow$ \> constant propagation$^{\mbox{\scriptsize \ref{lab:opt}}}$ (an endo-transformation) \\
\> $\downarrow$ \> calling convention made explicit \\
\> $\downarrow$ \> layout of activation records \\
\textsf{ERTL}\\
\> $\downarrow$ \> register allocation and spilling\\
\> $\downarrow$ \> dead code elimination\\
\textsf{LTL}\\
\> $\downarrow$ \> function linearisation\\
\> $\downarrow$ \> branch compression (an endo-transformation) \\
\textsf{LIN}\\
\> $\downarrow$ \> relabeling\\
\textsf{ASM}\\
\> $\downarrow$ \> pseudoinstruction expansion\\
\textsf{MCS-51 machine code}\\
\end{tabbing}
\end{minipage}
\end{center}

\subsection{The RTLabs to RTL translation}
\label{subsect.rtlabs.rtl.translation}

The RTLabs to RTL translation pass marks the frontier between the two memory models used in the CerCo project.
As a result, we require some method of translating between the values that the two memory models permit.
Suppose we have such a translation, $\sigma$.
Then the translation between values of the two memory models may be pictured as follows:

\begin{displaymath}
\mathtt{Value} ::= \bot \mid \mathtt{int(size)} \mid \mathtt{float} \mid \mathtt{null} \mid \mathtt{ptr} \quad\stackrel{\sigma}{\longrightarrow}\quad \mathtt{BEValue} ::= \bot \mid \mathtt{byte} \mid \mathtt{null}_i \mid \mathtt{ptr}_i
\end{displaymath}

In the front-end, we have both integer and float values, where integer
values are `sized', along with null values and pointers. Some front-end
values are representable in a byte, but others require more bits.

In the back-end model all values are meant to be representable in a single
byte. Values can therefore be undefined, be one-byte-long integers, or be
indexed fragments of a pointer, null or not. Float values are no longer
present, as floating point arithmetic is not supported by the CerCo
compiler.

The $\sigma$ map implements a one-to-many relation: a single front-end value
is mapped to a sequence of back-end values when its size is more than one
byte.
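For instance, a four-byte integer value $v$ would be mapped to the sequence
of its four byte-sized fragments, mirroring the chunked layout pictured
later in this section (the notation here is illustrative):

\begin{displaymath}
\sigma(\mathtt{int(32)}\ v) = [v_1;\ v_2;\ v_3;\ v_4] \qquad \text{with each}\ v_i : \mathtt{byte}
\end{displaymath}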

We further require a map, also written $\sigma$, relating the front-end
\texttt{Memory} to the back-end's notion of \texttt{BEMemory}. Both kinds
of memory can be thought of as instances of a generic \texttt{Mem} data
type, parameterised over the kind of values stored in memory:

\begin{displaymath}
\mathtt{Mem}\ \alpha = \mathtt{Block} \rightarrow (\mathbb{Z} \rightarrow \alpha)
\end{displaymath}
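For example, the front-end memory can be seen, up to the extra layout
information it carries, as the instance of \texttt{Mem} at front-end
values (the name \texttt{FEMem} is illustrative):

\begin{displaymath}
\mathtt{FEMem} = \mathtt{Mem}\ \mathtt{Value}
\end{displaymath}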

Here, \texttt{Block} consists of a \texttt{Region} paired with an identifier.

\begin{displaymath}
\mathtt{Block} ::= \mathtt{Region} \times \mathtt{ID}
\end{displaymath}

We now have what we need to define what is meant by `memory' in the
back-end memory model. Namely, we instantiate the previously defined
\texttt{Mem} type with the type of back-end memory values:

\begin{displaymath}
\mathtt{BEMem} = \mathtt{Mem}\ \mathtt{BEValue}
\end{displaymath}

Memory addresses consist of a pair of back-end memory values:

\begin{displaymath}
\mathtt{Address} = \mathtt{BEValue} \times \mathtt{BEValue}
\end{displaymath}

The back-end and front-end memory models differ in how they represent sized
integer values in memory.
In particular, the front-end stores integer values as a header, with size information, followed by a string of `continuation' blocks, marking out the full representation of the value in memory.
In contrast, the layout of sized integer values in the back-end memory model consists of a series of byte-sized `chunks':

\begin{center}
\begin{picture}(0, 25)
\put(-125,0){\framebox(25,25)[c]{\texttt{v,4}}}
\put(-100,0){\framebox(25,25)[c]{\texttt{cont}}}
\put(-75,0){\framebox(25,25)[c]{\texttt{cont}}}
\put(-50,0){\framebox(25,25)[c]{\texttt{cont}}}
\put(-15,10){\vector(1, 0){30}}
\put(25,0){\framebox(25,25)[c]{\texttt{v$_1$}}}
\put(50,0){\framebox(25,25)[c]{\texttt{v$_2$}}}
\put(75,0){\framebox(25,25)[c]{\texttt{v$_3$}}}
\put(100,0){\framebox(25,25)[c]{\texttt{v$_4$}}}
\end{picture}
\end{center}

Chunks for pointers are pairs made of the original pointer and the index of
the chunk. Therefore, when assembling the chunks together, we can always
recognise whether all chunks refer to the same value or whether the
operation is meaningless.

The differing memory representations of values in the two memory models
imply the need for a series of lemmas on the actions of \texttt{load} and
\texttt{store} to ensure correctness. The first lemma required has the
following statement:
\begin{displaymath}
\mathtt{load}\ s\ a\ M = \mathtt{Some}\ v \rightarrow \forall i \leq s.\ \mathtt{load}\ (a + i)\ \sigma(M) = \mathtt{Some}\ v_i
\end{displaymath}
That is, if we successfully read a value $v$ of size $s$ from the front-end
memory $M$ at address $a$, then we should successfully read the $i$-th byte
$v_i$ of the translated value from the back-end memory $\sigma(M)$ at each
address $a + i$, for every $i \leq s$.

Next, we must show that \texttt{store} properly commutes with the
$\sigma$-map between memory spaces:
\begin{displaymath}
\sigma(\mathtt{store}\ a\ v\ M) = \mathtt{store}\ \sigma(a)\ \sigma(v)\ \sigma(M)
\end{displaymath}
That is, if we store a value $v$ in the front-end memory $M$ at address $a$
and transform the resulting memory with $\sigma$, then this is equivalent
to storing the transformed value $\sigma(v)$ at address $\sigma(a)$ in the
back-end memory $\sigma(M)$.

Finally, we must prove that \texttt{load}, \texttt{store} and $\sigma$ all
properly commute.
Writing \texttt{load}$^*$ for multiple consecutive iterations of
\texttt{load}, we must prove:
\begin{displaymath}
\mathtt{load}^*\ \sigma(s)\ \sigma(a)\ (\mathtt{store}\ \sigma(a')\ \sigma(v)\ \sigma(M)) = \mathtt{load}^*\ \sigma(s)\ \sigma(a)\ \sigma(M)
\end{displaymath}
That is, suppose we store a transformed value $\sigma(v)$ into a back-end
memory $\sigma(M)$ at address $\sigma(a')$, using \texttt{store}, and then
load the correct number of bytes (for the size of $\sigma(v)$) from address
$\sigma(a)$.
Then this should be equivalent to loading the correct number of bytes from
address $\sigma(a)$ in the unaltered memory $\sigma(M)$, \emph{providing}
that the memory regions occupied by the two sequences of bytes at the two
addresses do not overlap.
This will entail further proof obligations, demonstrating that the
$\sigma$-map between memory spaces respects memory regions.

The RTLabs state is either a standard state, a call state or a return
state, paired with a memory:

\begin{displaymath}
\begin{array}{rll}
\mathtt{State} & ::= & (\mathtt{State} : \mathtt{Frame}^* \times \mathtt{Frame} \\
               & \mid & \mathtt{Call} : \mathtt{Frame}^* \times \mathtt{Args} \times \mathtt{Return} \times \mathtt{Fun} \\
               & \mid & \mathtt{Return} : \mathtt{Frame}^* \times \mathtt{Value} \times \mathtt{Return}) \times \mathtt{Mem}
\end{array}
\end{displaymath}

The corresponding back-end state carries a program counter, stack
pointers, a carry flag and registers:

\begin{displaymath}
\mathtt{State} ::= \mathtt{Frame}^* \times \mathtt{PC} \times \mathtt{SP} \times \mathtt{ISP} \times \mathtt{CARRY} \times \mathtt{REGS}
\end{displaymath}

The map $\sigma$ is extended to relate the states of the two languages:

\begin{displaymath}
\mathtt{State} \stackrel{\sigma}{\longrightarrow} \mathtt{State}
\end{displaymath}

\begin{displaymath}
\sigma(\mathtt{State} (\mathtt{Frame}^* \times \mathtt{Frame})) \longrightarrow ((\sigma(\mathtt{Frame}^*), \sigma(\mathtt{PC}), \sigma(\mathtt{SP}), 0, 0, \sigma(\mathtt{REGS})), \sigma(\mathtt{Mem}))
\end{displaymath}

\begin{displaymath}
\sigma(\mathtt{Return}(-)) \longrightarrow \sigma \circ \text{return one step}
\end{displaymath}

\begin{displaymath}
\sigma(\mathtt{Call}(-)) \longrightarrow \sigma \circ \text{call one step}
\end{displaymath}

Return one step commuting diagram:

\begin{displaymath}
\begin{diagram}
s & \rTo^{\text{one step of execution}} & s' \\
  & \rdTo                               & \dTo \\
  &                                     & \llbracket s'' \rrbracket
\end{diagram}
\end{displaymath}

Call one step commuting diagram:

\begin{displaymath}
\begin{diagram}
s & \rTo^{\text{one step of execution}} & s' \\
  & \rdTo                               & \dTo \\
  &                                     & \llbracket s'' \rrbracket
\end{diagram}
\end{displaymath}

\begin{displaymath}
\begin{array}{rcl}
\mathtt{Call(id,\ args,\ dst,\ pc),\ State(FRAME, FRAMES)} & \longrightarrow & \mathtt{Call(M(args), dst)}, \\
& & \mathtt{PUSH(current\_frame[PC := after\_return])}
\end{array}
\end{displaymath}

In the case where the call is to an external function, we have:

\begin{displaymath}
\begin{array}{rcl}
\mathtt{Call(M(args), dst)}, & \stackrel{\mathtt{ret\_val = f(M(args))}}{\longrightarrow} & \mathtt{Return(ret\_val,\ dst,\ PUSH(...))} \\
\mathtt{PUSH(current\_frame[PC := after\_return])} & &
\end{array}
\end{displaymath}

then:

\begin{displaymath}
\begin{array}{rcl}
\mathtt{Return(ret\_val,\ dst,\ PUSH(...))} & \longrightarrow & \mathtt{pc = POP\_STACK(regs[dst := M(ret\_val)],\ pc)}
\end{array}
\end{displaymath}

In the case where the call is to an internal function, we have:

\begin{displaymath}
\begin{array}{rcl}
\mathtt{CALL}(\mathtt{id}, \mathtt{args}, \mathtt{dst}, \mathtt{pc}) & \longrightarrow & \mathtt{CALL\_ID}(\mathtt{id}, \sigma'(\mathtt{args}), \sigma(\mathtt{dst}), \mathtt{pc}) \\
\mathtt{RETURN} & \longrightarrow & \mathtt{RETURN} \\
\end{array}
\end{displaymath}

\begin{displaymath}
\begin{array}{rcl}
\mathtt{Call(M(args), dst)} & \longrightarrow & \mathtt{sp = alloc}, regs = \emptyset[- := PARAMS] \\
\mathtt{PUSH(current\_frame[PC := after\_return])} & & \mathtt{State}(regs,\ sp,\ pc_\emptyset,\ dst)
\end{array}
\end{displaymath}

then:

\begin{displaymath}
\begin{array}{rcl}
\mathtt{sp = alloc}, regs = \emptyset[- := PARAMS] & \longrightarrow & \mathtt{free(sp)} \\
\mathtt{State}(regs,\ sp,\ pc_\emptyset,\ dst) & & \mathtt{Return(M(ret\_val), dst, frames)}
\end{array}
\end{displaymath}

and finally:

\begin{displaymath}
\begin{array}{rcl}
\mathtt{free(sp)} & \longrightarrow & \mathtt{pc = POP\_STACK(regs[dst := M(ret\_val)],\ pc)} \\
\mathtt{Return(M(ret\_val), dst, frames)} & &
\end{array}
\end{displaymath}

Here, $\sigma$ maps a front-end register to the list of back-end registers
representing it, and $\sigma'$ extends this map to lists of registers:

\begin{displaymath}
\begin{array}{rcl}
\sigma  & : & \mathtt{register} \rightarrow \mathtt{list\ register} \\
\sigma' & : & \mathtt{list\ register} \rightarrow \mathtt{list\ register}
\end{array}
\end{displaymath}

\subsection{The RTL to ERTL translation}
\label{subsect.rtl.ertl.translation}

\begin{displaymath}
\begin{diagram}
& & \llbracket \mathtt{CALL\_ID}(\mathtt{id}, \mathtt{args}, \mathtt{dst}, \mathtt{pc})\rrbracket & & \\
& \ldTo^{\text{external}} & & \rdTo^{\text{internal}} & \\
\skull & & & & \mathtt{regs} = [\mathtt{params}/-] \\
& & & & \mathtt{sp} = \mathtt{ALLOC} \\
& & & & \mathtt{PUSH}(\mathtt{carry}, \mathtt{regs}, \mathtt{dst}, \mathtt{return\_addr}), \mathtt{pc}_{0}, \mathtt{regs}, \mathtt{sp} \\
\end{diagram}
\end{displaymath}

\begin{align*}
\llbracket \mathtt{RETURN} \rrbracket \\
\mathtt{return\_addr} & := \mathtt{top}(\mathtt{stack}) \\
v^* & := m(\mathtt{rv\_regs}) \\
\mathtt{dst}, \mathtt{sp}, \mathtt{carry}, \mathtt{regs} & := \mathtt{pop} \\
\mathtt{regs}[v^* / \mathtt{dst}] \\
\end{align*}

\begin{displaymath}
\begin{diagram}
s & \rTo^1 & s' & \rTo^1 & s'' \\
\dTo & & & \rdTo & \dTo \\
\llbracket s \rrbracket & \rTo(1,3)^1 & & & \llbracket s'' \rrbracket \\
\mathtt{CALL} \\
\end{diagram}
\end{displaymath}

\begin{displaymath}
\begin{diagram}
s & \rTo^1 & s' & \rTo^1 & s'' \\
\dTo & & & \rdTo & \dTo \\
\ & \rTo(1,3) & & & \ \\
\mathtt{RETURN} \\
\end{diagram}
\end{displaymath}
561 | |
---|
562 | \begin{displaymath} |
---|
563 | \mathtt{b\_graph\_translate}: (\mathtt{label} \rightarrow \mathtt{blist'}) |
---|
564 | \rightarrow \mathtt{graph} \rightarrow \mathtt{graph} |
---|
565 | \end{displaymath} |
---|
566 | |
---|
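To make the shape of $\mathtt{b\_graph\_translate}$ concrete, the following is a hypothetical Python sketch (invented for illustration; not the Matita code): the statement at each label $l$ is replaced by the block $f\ l$, spliced between $l$ and its successor using fresh internal labels, which is exactly the $\mathtt{subgraph}$ property stated in the theorem that follows.

```python
# Hypothetical sketch of b_graph_translate. A graph maps a label to
# (statement, next_label-or-None); f maps a label to a non-empty block
# of target statements that replaces the statement at that label.
def b_graph_translate(f, graph):
    out, counter = {}, [0]

    def new_label():
        counter[0] += 1
        return ("fresh", counter[0])

    for l, (stmt, nxt) in graph.items():
        block = f(l)
        cur = l
        # chain all but the last statement through fresh labels
        for s in block[:-1]:
            nl = new_label()
            out[cur] = (s, nl)
            cur = nl
        # the last statement of the block falls through to the old successor
        out[cur] = (block[-1], nxt)
    return out

# Toy source graph; each statement is expanded to a PUSH followed by itself.
g = {"a": ("ADD", "b"), "b": ("RET", None)}
tg = b_graph_translate(lambda l: ["PUSH", g[l][0]], g)
```

Each original label still starts its translated block, and the block's exit points at the original successor, so the translated graph contains the original graph's structure as required.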
\begin{align*}
\mathtt{theorem} &\ \mathtt{b\_graph\_translate\_ok}: \\
& \forall f.\forall G_{i}.\mathtt{let}\ G_{\sigma} := \mathtt{b\_graph\_translate}\ f\ G_{i}\ \mathtt{in} \\
& \forall l \in G_{i}.\mathtt{subgraph}\ (f\ l)\ l\ (\mathtt{next}\ l\ G_{i})\ G_{\sigma}
\end{align*}

\begin{align*}
\mathtt{lemma} &\ \mathtt{execute\_1\_step\_ok}: \\
& \forall s. \mathtt{let}\ s' := s\ \sigma\ \mathtt{in} \\
& \mathtt{let}\ l := pc\ s\ \mathtt{in} \\
& s \stackrel{1}{\rightarrow} s^{*} \Rightarrow \exists n. s' \stackrel{n}{\rightarrow} s'^{*} \wedge s'^{*} = s^{*}\ \sigma
\end{align*}

\begin{align*}
\mathrm{RTL\ status} & \ \ \mathrm{ERTL\ status} \\
\mathtt{sp} & = \mathtt{spl} / \mathtt{sph} \\
\mathtt{graph} & \mathtt{graph} + \mathtt{prologue}(s) + \mathtt{epilogue}(s) \\
& \mathrm{where}\ s = \mathrm{callee\ saved} + \nu \mathrm{RA}
\end{align*}

\begin{displaymath}
\begin{diagram}
\mathtt{CALL} & \rTo^1 & \mathtt{inside\ function} \\
\dTo & & \dTo \\
\underbrace{\ldots}_{\llbracket \mathtt{CALL} \rrbracket} & \rTo &
\underbrace{\ldots}_{\mathtt{prologue}} \\
\end{diagram}
\end{displaymath}

\begin{displaymath}
\begin{diagram}
\mathtt{RETURN} & \rTo^1 & \mathtt{.} \\
\dTo & & \dTo \\
\underbrace{\ldots}_{\mathtt{epilogue}} & \rTo &
\underbrace{\ldots} \\
\end{diagram}
\end{displaymath}

\begin{align*}
\mathtt{prologue}(s) = & \mathtt{create\_new\_frame}; \\
& \mathtt{pop\ ra}; \\
& \mathtt{save\ callee\_saved}; \\
& \mathtt{get\_params} \\
& \ \ \mathtt{reg\_params}: \mathtt{move} \\
& \ \ \mathtt{stack\_params}: \mathtt{push}/\mathtt{pop}/\mathtt{move}
\end{align*}

\begin{align*}
\mathtt{epilogue}(s) = & \mathtt{save\ return\ to\ tmp\ real\ regs}; \\
& \mathtt{restore\_registers}; \\
& \mathtt{push\ ra}; \\
& \mathtt{delete\_frame}; \\
& \mathtt{save\ return}
\end{align*}

\begin{displaymath}
\mathtt{CALL}\ id \mapsto \mathtt{set\_params};\ \mathtt{CALL}\ id;\ \mathtt{fetch\_result}
\end{displaymath}

\subsection{The ERTL to LTL translation}
\label{subsect.ertl.ltl.translation}
\newcommand{\declsf}[1]{\expandafter\newcommand\expandafter{\csname #1\endcsname}{\mathop{\mathsf{#1}}\nolimits}}
\declsf{Livebefore}
\declsf{Liveafter}
\declsf{Defined}
\declsf{Used}
\declsf{Eliminable}
\declsf{StatementSem}
For the liveness analysis, we aim at a map assigning to each
$\ell \in \mathtt{label}$ the set of registers live at $\ell$.
We define the following operators on ERTL statements.
$$
\begin{array}{lL>{(ex. $}L<{)$}}
\Defined(s) & registers defined at $s$ & r_1\leftarrow r_2+r_3 \mapsto \{r_1,C\}, \mathtt{CALL}~id\mapsto \text{caller-save}
\\
\Used(s) & registers used at $s$ & r_1\leftarrow r_2+r_3 \mapsto \{r_2,r_3\}, \mathtt{CALL}~id\mapsto \text{parameters}
\end{array}
$$
Given $LA:\mathtt{label}\to\mathtt{lattice}$ (where $\mathtt{lattice}$
is the type of sets of registers\footnote{More precisely, it is the lattice
of pairs of sets of pseudo-registers and sets of hardware registers,
with pointwise operations.}), we also have the following
predicates:
$$
\begin{array}{lL}
\Eliminable_{LA}(\ell) & iff $s(\ell)$ has side-effects only on $r\notin LA(\ell)$
\\&
(ex.\ $\ell : r_1\leftarrow r_2+r_3 \mapsto \{r_1,C\}\cap LA(\ell)=\emptyset$,
$\mathtt{CALL}~id\mapsto \text{never}$)
\\
\Livebefore_{LA}(\ell) &$:=
\begin{cases}
LA(\ell) &\text{if $\Eliminable_{LA}(\ell)$,}\\
(LA(\ell)\setminus \Defined(s(\ell)))\cup \Used(s(\ell)) &\text{otherwise}.
\end{cases}$
\end{array}
$$
In particular, $\Livebefore$ has type $(\mathtt{label}\to\mathtt{lattice})\to
\mathtt{label}\to\mathtt{lattice}$.

The equation on which we build the fixpoint is then
$$\Liveafter(\ell) \doteq \bigcup_{\ell' >_1 \ell} \Livebefore_{\Liveafter}(\ell')$$
where $\ell' >_1 \ell$ denotes that $\ell'$ is an immediate successor of $\ell$
in the graph. We do not require the fixpoint to be the least one, so the hypothesis
on $\Liveafter$ that we require is
$$\Liveafter(\ell) \supseteq \bigcup_{\ell' >_1 \ell} \Livebefore(\ell')$$
(for brevity we drop the subscript from $\Livebefore$).

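As an illustration of the fixpoint computation, here is a hypothetical Python sketch (invented for this document; it ignores eliminability and the split between pseudo- and hardware registers) that iterates the equation above to a fixpoint on a three-node straight-line graph:

```python
# Hypothetical liveness-fixpoint sketch, not part of the formalisation.
# Each label maps to the (defined, used) register sets of its statement;
# succ gives the immediate successors l' >_1 l in the graph.
defined = {1: {"r1"}, 2: {"r4"}, 3: set()}
used    = {1: {"r2", "r3"}, 2: {"r1"}, 3: {"r4"}}
succ    = {1: [2], 2: [3], 3: []}

def live_before(live_after, l):
    # (LA(l) \ Defined(s(l))) | Used(s(l)); eliminability is ignored here
    return (live_after[l] - defined[l]) | used[l]

def live_after_fixpoint(labels):
    live_after = {l: set() for l in labels}
    changed = True
    while changed:
        changed = False
        for l in labels:
            new = set()
            for l2 in succ[l]:
                new |= live_before(live_after, l2)
            if new != live_after[l]:
                live_after[l] = new
                changed = True
    return live_after

la = live_after_fixpoint([1, 2, 3])
```

Naive iteration converges because each set grows monotonically inside a finite lattice; the actual analysis computes any post-fixpoint, which suffices for the hypothesis stated above.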
\subsection{The LTL to LIN translation}
\label{subsect.ltl.lin.translation}

We require a map, $\sigma$, from LTL statuses, where program counters are represented as labels in a graph data structure, to LIN statuses, where program counters are natural numbers:
\begin{displaymath}
\mathtt{pc : label} \stackrel{\sigma}{\longrightarrow} \mathbb{N}
\end{displaymath}

The LTL to LIN translation pass also linearises the graph data structure into a list of instructions.
Pseudocode for the linearisation process is as follows:

\begin{lstlisting}
let rec linearise graph visited required generated todo :=
  match todo with
  | l::todo ->
    if l $\in$ visited then
      let generated := generated $\cup\ \{$ Goto l $\}$ in
      let required := required $\cup\ \{$ l $\}$ in
        linearise graph visited required generated todo
    else
      -- Mark the label `l' as visited
      let visited := visited $\cup\ \{$ l $\}$ in
      -- Get the instruction at label `l' in the graph
      let lookup := graph(l) in
      let generated := generated $\cup\ \{$ lookup $\}$ in
      -- Find the successor of the instruction at label `l' in the graph
      let successor := succ(l, graph) in
      let todo := successor::todo in
        linearise graph visited required generated todo
  | [] -> (required, generated)
\end{lstlisting}

It is easy to see that this linearisation process eventually terminates.
In particular, the set of visited labels is monotonically increasing, and is bounded above by the set of labels of the graph that we are linearising.
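The pseudocode above can be rendered as a runnable sketch (hypothetical Python, invented for illustration; it uses an explicit worklist loop in place of recursion and pairs each generated instruction with its label for clarity):

```python
# Hypothetical Python rendering of the linearisation pseudocode. Labels are
# ints; the graph maps a label to (instruction, successor-label or None).
def linearise(graph, entry):
    visited, required, generated, todo = set(), set(), [], [entry]
    while todo:
        l = todo.pop()
        if l in visited:
            generated.append(("GOTO", l))   # close the cycle with a jump
            required.add(l)
        else:
            visited.add(l)
            instr, successor = graph[l]
            generated.append((instr, l))
            if successor is not None:
                todo.append(successor)
    return required, generated

# A three-node graph whose last node jumps back to the entry point.
graph = {0: ("ADD", 1), 1: ("MUL", 2), 2: ("BRANCH", 0)}
required, generated = linearise(graph, 0)
```

On this cyclic graph each node is emitted exactly once, and the back edge becomes an explicit \texttt{GOTO} whose target label is recorded as required.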

The initial call to \texttt{linearise} takes the \texttt{visited}, \texttt{required} and \texttt{generated} sets initialised to the empty set, and \texttt{todo} initialised to the singleton list consisting of the entry point of the graph.
We envisage needing to prove the following invariants of the linearisation function above:

\begin{enumerate}
\item
$\mathtt{visited} \approx \mathtt{generated}$, where $\approx$ is \emph{multiset} equality, as \texttt{generated} is a multiset of instructions in which the same label may be mentioned multiple times, whereas \texttt{visited} is a set of labels,
\item
$\forall \mathtt{l} \in \mathtt{generated}.\ \mathtt{succ(l,\ graph)} \subseteq \mathtt{required} \cup \mathtt{todo}$,
\item
$\mathtt{required} \subseteq \mathtt{visited}$,
\item
$\mathtt{visited} \cap \mathtt{todo} = \emptyset$.
\end{enumerate}


The invariants collectively imply the following properties, crucial to correctness, of the linearisation process:

\begin{enumerate}
\item
Every graph node is visited at most once,
\item
Every instruction that is generated is generated due to some graph node being visited,
\item
The successor instruction of every instruction that has been visited already will eventually be visited too.
\end{enumerate}


Note that, because the LTL to LIN transformation is the first point at which the program is linearised, we must identify a notion of `well formed program' suitable for linearised forms.
In particular, we see the notion of well formedness (yet to be formally defined) resting on the following conditions:

\begin{enumerate}
\item
For every jump to a label in a linearised program, the target label exists at some point in the program,
\item
Each label is unique, appearing only once in the program,
\item
The final instruction of a program must be a return.
\end{enumerate}

We assume that these properties will be easy consequences of the invariants on the linearisation function defined above.

The final condition above is potentially a little opaque, so we explain further.
First, the only instructions that can reasonably appear in the final position of a program are returns and backward jumps, as any other instruction would cause execution to `fall off' the end of the program (for example, when a function invoked with \texttt{CALL} returns, execution resumes at the instruction immediately after that \texttt{CALL}).
However, in LIN, though each function's graph has been linearised, the entire program is not yet fully linearised into a single list of instructions, but is rather a list of `functions', each consisting of a linearised body along with other data.
Each well-formed function must end with a \texttt{RET} instruction, and therefore the only correct instruction that can terminate a LIN program is a \texttt{RET} instruction.


\subsection{The LIN to ASM and ASM to MCS-51 machine code translations}
\label{subsect.lin.asm.translation}

The LIN to ASM translation step is trivial, being almost the identity function.
The only non-trivial feature of the LIN to ASM translation is that all labels are `named apart' so that there is no chance of freshly generated labels from different namespaces clashing with labels from another namespace.

The ASM to MCS-51 machine code translation step, and the required statements of correctness, are found in an unpublished manuscript attached to this document.
This is the most complex translation because of the huge number of cases
to be addressed and because of the complexity of the two semantics.
Moreover, in the assembly code we have conditional and unconditional jumps
to arbitrary locations in the code, which are not directly supported by the MCS-51
instruction set. The latter has several kinds of jump, characterised by
different instruction sizes and execution times, but limited in range. For
instance, conditional jumps to locations more than
$2^7$ bytes away from the jump instruction are not supported at
all and need to be emulated with a code transformation. The problem, which
is known in the literature as branch displacement and which also applies
to modern architectures, is known to be hard, and is NP-complete in several
formulations. As far as we
know, we will provide the first formally verified proof of correctness for
an assembler that implements branch displacement. We are also providing
the first verified proof of correctness of a mildly optimising branch
displacement algorithm; at the moment this proof is almost finished, but it is not
described in the companion paper. This proof by itself took about six
man-months.

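A minimal sketch of the standard fixpoint approach to branch displacement may help (hypothetical Python; the instruction sizes and the short-jump range below are invented, not those of the MCS-51): every jump starts out short and is only ever widened, so the process is monotone and terminates.

```python
# Hypothetical branch-displacement sketch, not the verified algorithm.
SHORT, LONG = 1, 3          # assumed jump-instruction sizes in bytes
SHORT_RANGE = 4             # assumed maximum |offset| a short jump can reach

def displace(jumps, n):
    """jumps: slot index -> target slot index, in a program of n slots
    where every non-jump slot occupies one byte."""
    size = {j: SHORT for j in jumps}
    while True:
        # compute the address of each slot under the current size assignment
        addr, a = [], 0
        for i in range(n):
            addr.append(a)
            a += size.get(i, 1)
        # widen every short jump whose offset no longer fits
        widened = False
        for j, tgt in jumps.items():
            if size[j] == SHORT and abs(addr[tgt] - addr[j]) > SHORT_RANGE:
                size[j], widened = LONG, True
        if not widened:
            return size

# Two mutually interfering jumps: widening the first pushes the second's
# target out of short range, so both end up long after a second iteration.
sizes = displace({0: 5, 4: 0}, 6)
```

The subtlety the verified proof must handle is visible even here: widening one jump moves the addresses of everything after it, possibly invalidating other jumps, which is why a fixpoint (rather than a single pass) is needed.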

\section{Correctness of cost prediction}
Roughly speaking,
the proof of correctness of cost prediction shows that the cost of executing
a labelled object code program is the same as the sum, over all labels in the
program execution trace, of the cost statically associated to each label and
computed on the object code itself.


In the presence of object level function calls, the previous statement is, however,
incorrect. The reason is twofold. First of all, a function call may diverge.
To the last label that comes before the call, however, we also associate
the cost of the instructions that follow the call. Therefore, in the
sum over all labels, when we meet a label we pre-pay for the instructions
after function calls, assuming all calls to be terminating. This choice is
driven by considerations on the source code. Functions can also be called
inside expressions, and it would be too disruptive to put labels inside
expressions to capture the cost of instructions that follow a call. Moreover,
adding a label after each call would produce a much higher number of proof
obligations in the certification of source programs using Frama-C. The
proof obligations, moreover, would be guarded by the termination of all functions
involved, which also generates many additional complex proof obligations
that have little to do with execution costs. With our approach, instead, we
put less burden on the user, at the price of proving a weaker statement:
the estimated and actual costs will be the same if and only if the high level
program converges. For prefixes of diverging programs we can provide
a similar result where the equality is replaced by an inequality (loss of
precision).

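The intended statement can be illustrated on a toy example (hypothetical Python; the labels and costs below are invented), comparing the cost accumulated while executing a trace with the sum of the statically computed costs of the labels met along it:

```python
# Hypothetical toy model of cost prediction. The static cost of each
# label's block is computed on the object code; the dynamic cost is what
# execution actually spends in that block.
static_cost = {"L1": 4, "L2": 7, "L3": 2}

# Execution trace as (label, dynamic cost of the block) pairs; for a
# well-behaved (converging, well-labelled) program the two agree per block.
trace = [("L1", 4), ("L2", 7), ("L2", 7), ("L3", 2)]

dynamic_total = sum(cost for _, cost in trace)
predicted_total = sum(static_cost[label] for label, _ in trace)
```

The whole difficulty of the real proof lies in showing that, for structured traces, the per-block agreement assumed in this toy model actually holds despite function calls and pre-payment.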

Assuming totality of functions is, however, not yet sufficient at the object
level. Even if a function returns, there is no guarantee that it will transfer
control back to the calling point. For instance, the function could have
manipulated the return address in its stack frame. Moreover, an object level
program can forge any address and transfer control to it, with no guarantee
about the execution behaviour and labelling properties of the called program.


To solve the problem, we introduced the notion of \emph{structured trace},
which comes in two flavours: structured traces for total programs (an inductive
type) and structured traces for diverging programs (a co-inductive type based
on the previous one). Roughly speaking, a structured trace represents the
execution of a well behaved program that is subject to several constraints,
such as:
\begin{enumerate}
\item All function calls return control just after the calling point
\item The execution of each function body starts with a label and ends with
a RET (even for bodies reached by invoking a function pointer)
\item All instructions are covered by a label (required by correctness of
the labelling approach)
\item The targets of all conditional jumps must be labelled (a sufficient
but not necessary condition for precision of the labelling approach)
\item \label{prop5} Two structured traces with the same structure yield the same
cost traces.
\end{enumerate}


Correctness of cost predictions is proved only for structured execution traces,
i.e. for well behaved programs. The forward simulation proof for each back-end
pass will actually be a proof of preservation of the structure of
structured traces which, because of property~\ref{prop5}, will imply
correctness of the cost prediction for the back-end. The Clight to RTLabs
pass will also include a proof that associates to each converging execution its
converging structured trace and to each diverging execution its diverging
structured trace.


There are also two other issues that invalidate the naive statement of
correctness of cost prediction given above. The algorithm that statically
computes the cost of blocks is correct only when the object code is \emph{well
formed} and the program counter is \emph{reachable}.
A well formed object code is one in which
the program counter never overflows after an execution step of
the processor. An overflow that occurs during fetching but is overwritten
during execution is, however, correct and necessary to accept correct
programs that are as large as the program memory. Temporary overflows add
complications to the proof. A reachable address is an address that can be
obtained by fetching (not executing!) a finite number of times from the
beginning of the code memory without ever overflowing. The complication is that
the static prediction traverses the code memory assuming that the memory will
be read sequentially from the beginning and that all jumps jump only to
reachable addresses. When this property is violated, the way the code memory
is interpreted is incorrect and the computed cost is totally meaningless.
The reachability relation is closed under fetching for well formed programs.
The property that calls through function pointers only target reachable and
well labelled locations, however, is not statically predictable, and it is
enforced in the structured trace.


The proof of correctness of cost predictions has been quite complex. Setting
up the right invariants (structured traces, well formed programs, reachability)
and completing the proof has required more than three man-months, while the initially
estimated effort was much lower. In the paper-and-pencil proof for IMP, the
corresponding proof was obvious and only took two lines.


The proof itself is quite involved. We
basically need to show, as an important lemma, that the sum of the execution
costs over a structured trace, where the costs are summed in execution order,
is equal to the sum of the execution costs in the order of pre-payment.
The two orders are quite different and the proof is by mutual recursion over
the definition of the converging structured traces, which is a family of three
mutually inductive types. The fact that this property only holds for converging
function calls is hidden in the definition of the structured traces.
Then we need to show that the order of pre-payment
corresponds to the order induced by the cost traces extracted from the
structured trace. Finally, we need to show that the statically computed cost
for one block corresponds to the cost dynamically computed in pre-payment
order.


\section{Overall results}

Functional correctness of the compiled code can be shown by composing
the simulations to show that the target behaviour matches the
behaviour of the source program, if the source program does not `go
wrong'. More precisely, we show that there is a forward simulation
between the source trace and a (flattened structured) trace of the
output, and conclude equivalence because the target's semantics are
in the form of an executable function, and hence
deterministic.

Combining this with the correctness of the assignment of costs to cost
labels at the ASM level for a structured trace, we can show that the
cost of executing any compiled function (including the main function)
is equal to the sum of all the values for cost labels encountered in
the \emph{source code's} trace of the function.


\section{Estimated effort}
Based on the rough analysis performed so far we can estimate the total
effort for the certification of the compiler. We obtain this estimate by
combining, for each pass: 1) the number of lines of code to be certified;
2) the ratio of lines of proof to lines of code from
the CompCert project~\cite{compcert} for the CompCert pass that is closest to
ours; 3) an estimate of the complexity of the pass according to the
analysis above.

\begin{table}
\centering
\begin{tabular}{lrlrr}
Pass origin & Code lines & CompCert ratio & Estimated effort & Estimated effort \\
& & & (based on CompCert) & \\
\hline
Common & 4864 & 4.25 \permil & 20.67 & 17.0 \\
Cminor & 1057 & 5.23 \permil & 5.53 & 6.0 \\
Clight & 1856 & 5.23 \permil & 9.71 & 10.0 \\
RTLabs & 1252 & 1.17 \permil & 1.48 & 5.0 \\
RTL & 469 & 4.17 \permil & 1.95 & 2.0 \\
ERTL & 789 & 3.01 \permil & 2.38 & 2.5 \\
LTL & 92 & 5.94 \permil & 0.55 & 0.5 \\
LIN & 354 & 6.54 \permil & 2.31 & 1.0 \\
ASM & 984 & 4.80 \permil & 4.72 & 10.0 \\
\hline
Total common & 4864 & 4.25 \permil & 20.67 & 17.0 \\
Total front-end & 2913 & 5.23 \permil & 15.24 & 16.0 \\
Total back-end & 6853 & 4.17 \permil & 13.39 & 21.0 \\
\hline
Total & 14630 & 3.75 \permil & 49.30 & 54.0 \\
\end{tabular}
\caption{Estimated effort (in man-months) for the certification of each compiler pass.}
\label{wildguess}
\end{table}
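The CompCert-based estimates in the table are simply lines multiplied by the per-mille ratio, which can be checked mechanically (a hypothetical snippet, reproducing the table's figures):

```python
# Hypothetical check of the table's arithmetic: estimate = lines * ratio,
# where ratios are per mille (man-months per thousand lines of code).
rows = {
    "Common": (4864, 4.25, 20.67),
    "Cminor": (1057, 5.23, 5.53),
    "Clight": (1856, 5.23, 9.71),
    "RTLabs": (1252, 1.17, 1.48),
    "RTL":    (469,  4.17, 1.95),
    "ERTL":   (789,  3.01, 2.38),
    "LTL":    (92,   5.94, 0.55),
    "LIN":    (354,  6.54, 2.31),
    "ASM":    (984,  4.80, 4.72),
}

def estimate(lines, ratio_permil):
    return lines * ratio_permil / 1000.0

# deviations between the recomputed and the tabulated estimates
deviations = {p: abs(estimate(l, r) - e) for p, (l, r, e) in rows.items()}
```

All recomputed values agree with the tabulated ones up to rounding of the published ratios.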

We now provide some additional information on the methodology used in the
computation. The passes in the CerCo and CompCert front-ends closely match each
other. However, there is no clear correspondence between the two back-ends.
For instance, we enforce the calling convention immediately after instruction
selection, whereas in CompCert this is performed in a later phase; and we
linearise the code at the very end, whereas CompCert performs linearisation
as soon as possible. Therefore, the first part of the exercise consisted
of shuffling and partitioning the CompCert code in order to assign to each
CerCo pass the CompCert code that performs the same transformation.


After this preliminary step, using the data given in~\cite{compcert} (which
refer to an early version of CompCert), we computed the ratio between
man-months and lines of code in CompCert for each CerCo pass. This is shown
in the third column of Table~\ref{wildguess}. For those CerCo passes that
have no correspondence in CompCert (like the optimising assembler), or where
we have insufficient data, we have used the average of the ratios computed
above.


The second column of the table shows the number of lines of code for each
pass in CerCo. The fourth column is obtained by multiplying the second by the
CompCert ratio. It provides an estimate of the effort required (in man-months)
if the complexity of the proofs for CerCo and CompCert were the same.

947 | |
---|
948 | The two proof styles, however, are on purpose completely different. Where |
---|
949 | CompCert uses non executable semantics, describing the various semantics with |
---|
950 | inductive types, we have preferred executable semantics. Therefore, CompCert |
---|
951 | proofs by induction and inversion become proof by functional inversion, |
---|
952 | performed using the Russel methodology (now called Program in Coq, but whose |
---|
953 | behaviour differs from Matita's one). Moreover, CompCert code is written using |
---|
954 | only types that belong to the Hindley-Milner fragment, whereas we have |
---|
955 | heavily exploited dependent types all over the code. The dependent type |
---|
956 | discipline offers many advantages from the point of view of clarity of the |
---|
957 | invariants involved and early detection of errors and it naturally combines |
---|
958 | well with the Russel approach which is based on dependent types. However, it |
---|
959 | is also well known to introduce technical problems all over the code, like |
---|
960 | the need to explicitly prove type equalities to be able to manipulate |
---|
961 | expressions in certain ways. In many situations, the difficulties encountered |
---|
962 | with manipulating dependent types are better addressed by improving the Matita |
---|
963 | system, according to the formalization driven system development. For this |
---|
964 | reason, and assuming a pessimistic point of view on our performance, the |
---|
965 | fourth columns presents the final estimation of the effort required, that also |
---|
966 | takes in account the complexity of the proof suggested by the informal proofs |
---|
967 | sketched in the previous section. |
---|

\end{document}
---|