1 | \documentclass[nopanel]{beamer} |
---|
2 | \usepackage{graphicx,color} |
---|
3 | \usepackage{amsfonts} |
---|
4 | \usepackage{listings} |
---|
5 | |
---|
6 | \lstset{basicstyle=\footnotesize\tt,columns=flexible,breaklines=false, |
---|
7 | keywordstyle=\color{red}\bfseries, |
---|
8 | keywordstyle=[2]\color{blue}, |
---|
9 | commentstyle=\color{green}, |
---|
10 | stringstyle=\color{blue}, |
---|
11 | showspaces=false,showstringspaces=false} |
---|
12 | |
---|
13 | |
---|
14 | \setbeamertemplate{footline}[page number] |
---|
15 | \setbeamertemplate{sidebar right}{} |
---|
16 | |
---|
17 | \setbeamercolor{alerted text}{fg=red!70!orange} |
---|
18 | |
---|
19 | \newcommand{\red}[1]{\textcolor{red}{#1}} |
---|
20 | \newcommand{\blue}[1]{\textcolor{blue}{#1}} |
---|
21 | \newcommand{\green}[1]{\textcolor{green}{#1}} |
---|
22 | \newcommand{\grey}[1]{\textcolor{grey}{#1}} |
---|
23 | |
---|
24 | \addtolength{\parskip}{0.3em} |
---|
25 | |
---|
26 | \begin{document} |
---|
27 | |
---|
28 | \title{WP3: Verified Compiler --- Front End\\[1\jot] |
---|
29 | D3.4: Front-End Correctness Proofs} |
---|
30 | \author{Brian~Campbell, Ilias~Garnier, James~McKinna, \underline{Ian~Stark}} |
---|
31 | \institute{LFCS, University of Edinburgh\\[1ex] |
---|
32 | \includegraphics[width=.3\linewidth]{cerco_logo.png}\\ |
---|
33 | \footnotesize Project FP7-ICT-2009-C-243881 |
---|
34 | } |
---|
35 | \date{CerCo review meeting\\16th May 2013} |
---|
36 | \maketitle |
---|
37 | |
---|
38 | |
---|
39 | \begin{frame}{CerCo Outcomes} |
---|
40 | |
---|
41 | The final results of the CerCo project include the following: |
---|
42 | |
---|
43 | \medskip % |
---|
44 | \begin{itemize} |
---|
45 | \item A compiler from C to executable 8051 binaries, formalised in Matita as |
---|
46 | \blue{\lstinline|compiler.ma|} \dots |
---|
47 | \item \dots which adds labels and cost information to C source code |
---|
48 | \item \dots with exact values for time and space used by the binary. |
---|
49 | |
---|
50 | \medskip % |
---|
51 | \item A machine-checked proof of this in Matita |
---|
52 | \blue{\lstinline|correctness.ma|}: |
---|
53 | |
---|
54 | \smallskip % |
---|
55 | \begin{itemize} |
---|
56 | \item Addressing \red{intensional} properties --- clock cycles, stack bytes |
---|
57 | |
---|
58 | \smallskip % |
---|
59 | \item As well as \red{extensional} properties --- functional correctness. |
---|
60 | \end{itemize} |
---|
61 | \item An executable cost-annotating compiler \blue{\lstinline|cerco|} |
---|
62 | extracted from the formalisation. |
---|
63 | \end{itemize} |
---|
64 | |
---|
65 | \medskip % |
---|
66 | This talk treats the front-end compiler and proof, describing the technical |
---|
67 | challenges and contributions from the work in Period~3. |
---|
68 | |
---|
69 | \end{frame} |
---|
70 | |
---|
71 | \begin{frame}{Compiler} |
---|
72 | |
---|
73 | The CerCo compiler, given C source code, will generate: |
---|
74 | \begin{itemize} |
---|
75 | \item Labelled Clight source code; |
---|
76 | \item Labelled 8051 object code; |
---|
77 | \item Cost maps: for each label a count of time (clock cycles) and space |
---|
78 | (stack bytes) usage. |
---|
79 | \end{itemize} |
---|
80 | |
---|
81 | \medskip % |
---|
82 | The compiler operates in multiple passes, with each intermediate language |
---|
83 | having an executable semantics in Matita: % |
---|
84 | \vspace*{-\medskipamount} |
---|
85 | \begin{center} |
---|
86 | \includegraphics[width=0.7\linewidth]{compiler-plain.pdf} |
---|
87 | \end{center} |
---|
88 | \vspace*{-\medskipamount} |
---|
89 | |
---|
90 | \end{frame} |
---|
91 | |
---|
92 | \begin{frame}{Correctness} |
---|
93 | |
---|
94 | Suppose we have C source with labelled Clight, labelled 8051 binary, and |
---|
95 | cost maps all generated from the CerCo compiler. |
---|
96 | |
---|
97 | \medskip % |
---|
98 | Then the Matita theorem for correctness is that: |
---|
99 | \begin{itemize} |
---|
100 | \item Executing the original and labelled Clight source gives the same |
---|
101 | behaviour; |
---|
102 | \item Executing the 8051 binary gives the same observables: labels and |
---|
103 | call/return; |
---|
104 | \item Applying the cost maps to the trace of C labels correctly |
---|
105 | describes the time and space usage of the 8051 execution. |
---|
106 | \end{itemize} |
---|
107 | |
---|
108 | In fact we give an exact result on all \red{measurable subtraces}: from any |
---|
109 | label to the end of its enclosing function. |
---|
110 | |
---|
111 | \medskip % |
---|
112 | The machine-checked proof is the concatenation of correctness results for |
---|
113 | the front end, the back end, and the assembler. |
---|
114 | |
---|
115 | \end{frame} |
---|
116 | |
---|
117 | \begin{frame}{Front-End Progress} |
---|
118 | |
---|
119 | \vspace*{-\bigskipamount} |
---|
120 | \begin{center} |
---|
121 | \includegraphics[width=0.7\linewidth]{compiler-plain.pdf} |
---|
122 | \end{center} |
---|
123 | \vspace*{-\medskipamount} |
---|
124 | |
---|
125 | At the start of Period~3 the front end included in Matita: |
---|
126 | \begin{itemize} |
---|
127 | \item Executable semantics for source and intermediate languages; |
---|
128 | \item Labelling and compilation of source through these; |
---|
129 | \item Innovation of \red{structured traces}, originally to support timing |
---|
130 | calculation in the assembler. |
---|
131 | \end{itemize} |
---|
132 | |
---|
133 | At the completion of Period~3, the front end now includes proofs of |
---|
134 | correctness for all these stages, extension to stack usage, and considerable |
---|
135 | refinement of the existing formal development. |
---|
136 | |
---|
137 | \end{frame} |
---|
138 | |
---|
139 | \begin{frame}{Front-End Contribution} |
---|
140 | |
---|
141 | Notable features and original elements of the front-end include: |
---|
142 | \begin{itemize} |
---|
143 | \item Proof layering: |
---|
144 | \begin{itemize} |
---|
145 | \item A \blue{functional correctness proof}; |
---|
146 | \item A \blue{cost proof}, separate but depending on functional |
---|
147 | correctness. |
---|
148 | \end{itemize} |
---|
149 | \item \blue{Internal checks} as a proof shortcut, like translation |
---|
150 | validation. |
---|
151 | \item \blue{Structured traces} for holding detailed data on resource |
---|
152 | behaviour, not just for assembler cost calculation. |
---|
153 | \item Introduction of \blue{measurable subtraces} as a unit of cost |
---|
154 | proof. |
---|
155 | \end{itemize} |
---|
156 | These features are significant in particular because: |
---|
157 | \begin{itemize} |
---|
158 | \item Compilation is not 1-1 transliteration: code is rearranged and |
---|
159 | modified, but label sequencing is preserved; |
---|
160 | \item The source language has implicit control flow constraints not present |
---|
161 | in the target (call/return, branches, loops). |
---|
162 | \end{itemize} |
---|
163 | |
---|
164 | \end{frame} |
---|
165 | |
---|
166 | \begin{frame}{Axiomatization} |
---|
167 | |
---|
168 | The translation in \blue{\lstinline|compile.ma|} is complete --- essential |
---|
169 | for extracting a compiler. |
---|
170 | |
---|
171 | \medskip % |
---|
172 | The simulation results in \blue{\lstinline|correctness.ma|} are fully |
---|
173 | specified, and with substantially complete proofs. Some parts, however, |
---|
174 | have been axiomatized and their correctness assumed. |
---|
175 | |
---|
176 | \medskip % |
---|
177 | In the front-end there are two sources of this axiomatisation: |
---|
178 | \begin{itemize} |
---|
179 | \item \red{Functional correctness.} Other projects, notably |
---|
180 | \blue{CompCert}, provide assurances that such proofs can be completed; so |
---|
181 | certain parts of this proof have been assumed. |
---|
182 | \item Simulation steps with \red{many cases}, some very similar; here we |
---|
183 | have identified representative and more challenging cases for detailed |
---|
184 | proof. |
---|
185 | \end{itemize} |
---|
186 | |
---|
187 | \end{frame} |
---|
188 | |
---|
189 | \begin{frame}{Structure of the Proof} |
---|
190 | |
---|
191 | \begin{center} |
---|
192 | \includegraphics[width=0.7\linewidth]{compiler.pdf} |
---|
193 | \end{center} |
---|
194 | |
---|
195 | The transition from front-end to back-end marks the change: |
---|
196 | \begin{itemize} |
---|
197 | \item From a language with explicit call/return structure and high-level |
---|
198 | structured control flow; |
---|
199 | \item To a language with unconstrained jumps for control flow. |
---|
200 | \end{itemize} |
---|
201 | |
---|
202 | Correspondingly, the proof hands over |
---|
203 | \begin{itemize} |
---|
204 | \item From measurable subtraces with labelling that must be checked |
---|
205 | \blue{sound} and \blue{precise}; |
---|
206 | \item To \blue{structured traces} which embed these invariants. |
---|
207 | \end{itemize} |
---|
208 | |
---|
209 | \end{frame} |
---|
210 | |
---|
211 | \begin{frame}{Structured Traces} |
---|
212 | |
---|
213 | Embed call structure and label invariants into execution traces. |
---|
214 | |
---|
215 | \begin{center} |
---|
216 | \includegraphics{strtraces.pdf} |
---|
217 | \end{center} |
---|
218 | \vspace*{-\bigskipamount} |
---|
219 | \begin{itemize} |
---|
220 | \item Enforces \blue{invariants} on positions of cost labels. |
---|
221 | \item \blue{Groups} execution steps according to preceding cost label. |
---|
222 | \item Forbids \blue{repeating} addresses in grouped steps. |
---|
223 | \end{itemize} |
---|
224 | |
---|
225 | Constructed by folding up \red{measurable subtraces}, using their termination |
---|
226 | and the fact that labelling is \blue{sound} and \blue{precise}. |
---|
227 | |
---|
228 | \end{frame} |
---|
229 | |
---|
230 | \begin{frame}{Step-by-Step: Getting to Labelled Source} |
---|
231 | |
---|
232 | \blue{C $\to$ Clight} |
---|
233 | |
---|
234 | \smallskip % |
---|
235 | To translate from C to Clight we reuse the CompCert parser. |
---|
236 | |
---|
237 | \bigskip % |
---|
238 | \blue{Switch Removal} |
---|
239 | |
---|
240 | \smallskip % |
---|
241 | Control flow for \red{\lstinline'switch'} is too subtle for simple |
---|
242 | labelling; we replace with an \red{\lstinline'if .. then .. else'} tree. |
---|
243 | Proof requires tracking memory extension for an extra test variable. |
---|
244 | |
---|
245 | \bigskip % |
---|
246 | \blue{Cost Labels} |
---|
247 | |
---|
248 | \smallskip % |
---|
249 | Labels added at function start, branch targets, \dots |
---|
250 | |
---|
251 | \smallskip % |
---|
252 | Simulation of execution is exact, just skipping over labels. |
---|
253 | |
---|
254 | \end{frame} |
---|
255 | |
---|
256 | \begin{frame}{Front-End Correctness} |
---|
257 | |
---|
258 | Suppose we have a \blue{measurable} subtrace of Clight execution, including |
---|
259 | the \blue{prefix} of that trace from initial state. |
---|
260 | |
---|
261 | \medskip % |
---|
262 | Then for handover to the back-end we have in RTLabs: |
---|
263 | \begin{itemize} |
---|
264 | \item A corresponding \red{prefix}; |
---|
265 | \item A \red{structured trace} corresponding to the measurable subtrace; |
---|
266 | \item The required invariants, including \red{no repeating addresses}; |
---|
267 | \item A proof that the same \red{stack limit} is observed. |
---|
268 | \end{itemize} |
---|
269 | |
---|
270 | \medskip % |
---|
271 | In addition, to link this with the source program: |
---|
272 | \begin{itemize} |
---|
273 | \item The observables for the RTLabs \blue{prefix} and \blue{structured |
---|
274 | trace} are the same as for their counterparts in Clight. |
---|
275 | \end{itemize} |
---|
276 | |
---|
277 | \end{frame} |
---|
278 | |
---|
279 | \begin{frame}{Lifting Measurable Traces} |
---|
280 | |
---|
281 | For each pass find a target \red{measurable subtrace} equivalent to any |
---|
282 | source \red{measurable subtrace}. |
---|
283 | |
---|
284 | \begin{center} |
---|
285 | \includegraphics{meassim.pdf} |
---|
286 | \end{center} |
---|
287 | |
---|
288 | Split normal simulation proof: |
---|
289 | \begin{enumerate} |
---|
290 | \item each \blue{call} becomes a \blue{call} step plus zero or more |
---|
291 | \blue{normal} steps; following cost labels should stay next to the call |
---|
292 | step |
---|
293 | \item each \blue{return} becomes a \blue{return} plus zero or |
---|
294 | more \blue{normal} steps |
---|
295 | \item \blue{cost} label steps are preserved |
---|
296 | \item other \blue{normal} steps become zero or more \blue{normal} steps |
---|
297 | \end{enumerate} |
---|
298 | |
---|
299 | This preserves the defining property for \red{measurable subtraces}. |
---|
300 | |
---|
301 | \end{frame} |
---|
302 | |
---|
303 | \begin{frame}{Front-End Simulation Results} |
---|
304 | \blue{Cast simplification} |
---|
305 | \begin{itemize} |
---|
306 | \item Simplify expressions for 8-bit target. |
---|
307 | \item Only expressions change --- statements are in lock-step. |
---|
308 | \end{itemize} |
---|
309 | |
---|
310 | \blue{Clight to Cminor} |
---|
311 | \begin{itemize} |
---|
312 | \item Stack allocation and control flow transformation. |
---|
313 | \item Similar to \blue{CompCert}, but using \red{\lstinline'goto'} instead |
---|
314 | of \red{\lstinline'loop'} |
---|
315 | \item Large proof terms for invariants embedded in Cminor programs using |
---|
316 | dependent types. |
---|
317 | \end{itemize} |
---|
318 | |
---|
319 | \blue{Cminor to RTLabs} |
---|
320 | \begin{itemize} |
---|
321 | \item Construction of control-flow graph. |
---|
322 | \item Embedded invariants mean that translation function is \red{total}. |
---|
323 | \end{itemize} |
---|
324 | \end{frame} |
---|
325 | |
---|
326 | \begin{frame}{Checking Cost Labelling} |
---|
327 | |
---|
328 | Building structured traces in \blue{RTLabs} depends on having cost labelling |
---|
329 | on the linear trace that is \red{sound} and \red{precise}. |
---|
330 | |
---|
331 | \medskip % |
---|
332 | Instead of carrying this as an invariant all the way through each previous |
---|
333 | pass, we take a \blue{shortcut} and check on the spot. |
---|
334 | |
---|
335 | \medskip % |
---|
336 | This is similar to \blue{translation validation}. |
---|
337 | |
---|
338 | \medskip % |
---|
339 | The specific requirements for labelling \blue{RTLabs} code are: |
---|
340 | \begin{description}[\red{soundness}] |
---|
341 | \item[\red{soundness}] At every point, a finite bound on the number of |
---|
342 | instructions until the next label; |
---|
343 | \item[\red{soundness}] A label at the start of every function; |
---|
344 | \item[\red{precision}] A label after every branch. |
---|
345 | \end{description} |
---|
346 | |
---|
347 | \medskip % |
---|
348 | The bound is the hard part; the rest is a simple syntactic check. |
---|
349 | \end{frame} |
---|
350 | |
---|
351 | \begin{frame}[fragile]{Checking Bound to Next Label} |
---|
352 | |
---|
353 | \begin{center} |
---|
354 | \begin{overprint}[0.5\linewidth] |
---|
355 | \onslide<1>\includegraphics[width=0.8\linewidth]{loop.pdf} |
---|
356 | \onslide<2>\includegraphics[width=0.8\linewidth]{loop1.pdf} |
---|
357 | \onslide<3>\includegraphics[width=0.8\linewidth]{loop2.pdf} |
---|
358 | \onslide<4>\includegraphics[width=0.8\linewidth]{loop3.pdf} |
---|
359 | \onslide<5>\includegraphics[width=0.8\linewidth]{loop4.pdf} |
---|
360 | \onslide<6->\includegraphics[width=0.8\linewidth]{loopx.pdf} |
---|
361 | \end{overprint} |
---|
362 | \end{center} |
---|
363 | \begin{itemize} |
---|
364 | \item Pick an arbitrary node in the CFG; |
---|
365 | \item<2-> Follow successor instructions; |
---|
366 | \item<4-> If we find a \alert{cost label} all the traced nodes have a |
---|
367 | bound\dots |
---|
368 | \item<5-> \dots so remove them and pick a new node, continue; |
---|
369 | \item<6-> But if we find an \alert{unlabelled loop} then the labelling is |
---|
370 | unsound, report an error. |
---|
371 | \end{itemize} |
---|
372 | |
---|
373 | \onslide<7-> |
---|
374 | When complete, we have a proof that every loop contains a label. |
---|
375 | |
---|
376 | \end{frame} |
---|
377 | |
---|
378 | \begin{frame}{Checking Cost Labelling} |
---|
379 | |
---|
380 | This check of soundness and precision in RTLabs: |
---|
381 | \begin{itemize} |
---|
382 | \item Will always succeed when invariants hold (and fail when not); |
---|
383 | \item Extracts to additional checking code in the compiler; |
---|
384 | \item[\checkmark] Significantly reduces the proof effort. |
---|
385 | \end{itemize} |
---|
386 | |
---|
387 | \medskip % |
---|
388 | The last point is our key observation: a conventional proof would require |
---|
389 | showing that: |
---|
390 | \begin{quote} |
---|
391 | \red{If} every loop in \blue{Cminor} code is headed by a cost label; |
---|
392 | \\[1\jot] |
---|
393 | \red{Then} at every point in every \blue{RTLabs} execution there is a |
---|
394 | finite bound on the number of instructions until the next cost label. |
---|
395 | \end{quote} |
---|
396 | We expect just proving existence of this bound to be considerably harder |
---|
397 | than implementing the whole checking mechanism. |
---|
398 | |
---|
399 | \end{frame} |
---|
400 | |
---|
401 | \begin{frame}{Putting the Proofs Together} |
---|
402 | |
---|
403 | \begin{center} |
---|
404 | \includegraphics[width=0.7\linewidth]{compiler.pdf} |
---|
405 | \end{center} |
---|
406 | |
---|
407 | The front-end proof demonstrates that for any Clight execution: |
---|
408 | \begin{itemize} |
---|
409 | \item We have a corresponding \blue{structured trace} with \blue{invariants} |
---|
410 | \dots |
---|
411 | \item \dots and with the same \blue{observables} of labels and call/return. |
---|
412 | \end{itemize} |
---|
413 | |
---|
414 | The back-end proof takes this structured trace and shows that |
---|
415 | \begin{itemize} |
---|
416 | \item There is a corresponding assembler trace \dots |
---|
417 | \item \dots with \blue{sound} and \blue{precise} labelling and the same |
---|
418 | \blue{observables}. |
---|
419 | \end{itemize} |
---|
420 | |
---|
421 | Combining these gives a proof that time and space costs computed on the |
---|
422 | source are exactly those observed on executing the binary. |
---|
423 | |
---|
424 | \end{frame} |
---|
425 | |
---|
426 | \begin{frame}{Summary} |
---|
427 | |
---|
428 | CerCo WP3 has produced the front-end of a C-to-8051 compiler that annotates |
---|
429 | C source with runtime cycle counts and stack space. |
---|
430 | |
---|
431 | \medskip % |
---|
432 | This is formalised in the Matita proof assistant, with a proof of |
---|
433 | correctness, and can be extracted as an executable compiler. |
---|
434 | |
---|
435 | \smallskip % |
---|
436 | \begin{itemize} |
---|
437 | \item Modular \blue{layering} of intensional proofs over extensional |
---|
438 | results; |
---|
439 | \item Extensional results richer than normal and carefully chosen, but do |
---|
440 | not require large changes to proof techniques. |
---|
441 | \item \blue{Compile-time checks} on intermediate code reduce proof effort. |
---|
442 | \item \blue{Structured traces} as a repository for invariant information. |
---|
443 | \item \blue{Measurable subtraces} as a unit of cost proof. |
---|
444 | \end{itemize} |
---|
445 | |
---|
446 | \smallskip % |
---|
447 | Going beyond this, in future projects (REMS, \dots) we plan to: |
---|
448 | \begin{itemize} |
---|
449 | \item Generalize labelling schemes for more sophisticated targets; |
---|
450 | \item Apply \blue{decompilation} to jump the gap from source to binary. |
---|
451 | \end{itemize} |
---|
452 | |
---|
453 | \end{frame} |
---|
454 | \end{document} |
---|
455 | |
---|
456 | % LocalWords: LFCS Matita CerCo intensional WCET toolchain CompCert Clight ASM |
---|
457 | % LocalWords: reexecutes subtrace RTLabs subtraces Garnier McKinna cerco |
---|
458 | % LocalWords: Axiomatization axiomatized Cminor goto |
---|