source: Deliverables/Dissemination/final-review/wp6-dissemination.ltx @ 3291

Last change on this file since 3291 was 3291, checked in by mckinna, 8 years ago

template beamer file, and notes for the talk
more later today!

File size: 13.7 KB
Line 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
25
26\comment{
27\begin{frame}{CerCo Outcomes}
28
29  The final results of the CerCo project include the following:
30
31  \medskip %
32  \begin{itemize}
33  \item A compiler from C to executable 8051 binaries, formalised in Matita as
34    \blue{\lstinline|compiler.ma|} \dots
35  \item \dots which adds labels and cost information to C source code
36  \item \dots with exact values for time and space used by the binary.
37
38    \medskip %
39  \item A machine-checked proof of this in Matita
40    \blue{\lstinline|correctness.ma|}:
41
42    \smallskip %
43    \begin{itemize}
44    \item Addressing \red{intensional} properties --- clock cycles, stack bytes
45
46      \smallskip %
47    \item As well as \red{extensional} properties --- functional correctness.
48    \end{itemize}
49  \item An executable cost-annotating compiler \blue{\lstinline|cerco|}
50    extracted from the formalisation.
51  \end{itemize}
52
53  \medskip %
54  This talk treats the front-end compiler and proof, describing the technical
55  challenges and contributions from the work in Period~3.
56 
57\end{frame}
58
59\begin{frame}{Compiler}
60
61  The CerCo compiler, given C source code, will generate:
62  \begin{itemize}
63  \item Labelled Clight source code;
64  \item Labelled 8051 object code;
65  \item Cost maps: for each label a count of time (clock cycles) and space
66    (stack bytes) usage.
67  \end{itemize}
68
69  \medskip %
70  The compiler operates in multiple passes, with each intermediate language
71  having an executable semantics in Matita: %
72  \vspace*{-\medskipamount}
73  \begin{center}
74    \includegraphics[width=0.7\linewidth]{compiler-plain.pdf.eps}
75  \end{center}
76  \vspace*{-\medskipamount}
77
78\end{frame}
79
80\begin{frame}{Correctness}
81 
82  Suppose we have C source with labelled Clight, labelled 8051 binary, and
83  cost maps all generated from the CerCo compiler.
84
85  \medskip %
86  Then the Matita theorem for correctness is that:
87  \begin{itemize}
88  \item Executing the original and labelled Clight source gives the same
89    behaviour;
90  \item Executing the 8051 binary gives the same observables: labels and
91    call/return;
92  \item Applying the cost maps to the trace of C labels correctly
93    describes the time and space usage of the 8051 execution.
94  \end{itemize}
95
96  In fact we give an exact result on all \red{measurable subtraces}: from any
97  label to the end of its enclosing function.
98
99  \medskip %
100  The machine-checked proof is the concatenation of correctness results for
101  the front end, the back end, and the assembler.
102
103\end{frame}
104
105\begin{frame}{Front-End Progress}
106 
107  \vspace*{-\bigskipamount}
108  \begin{center}
109    \includegraphics[width=0.7\linewidth]{compiler-plain.pdf.eps}
110  \end{center}
111  \vspace*{-\medskipamount}
112
113  At the start of Period~3 the front end included in Matita:
114  \begin{itemize}
115  \item Executable semantics for source and intermediate languages;
116  \item Labelling and compilation of source through these;
117  \item Innovation of \red{structured traces}, originally to support timing
118    computation in the assembler.
119  \end{itemize}
120
121  At the completion of Period~3, the front end now includes proofs of
122  correctness for all these stages, extension to stack usage, and considerable
123  refinement of the existing formal development.
124 
125\end{frame}
126
127\begin{frame}{Front-End Contribution}
128
129  Notable features and original elements of the front-end include:
130  \begin{itemize}
131  \item Proof layering:
132    \begin{itemize}
133    \item A \blue{functional correctness proof};
134    \item A \blue{cost proof}, separate but depending on functional
135      correctness.
136    \end{itemize}
137  \item \blue{Internal checks} as a proof shortcut, like translation
138    validation.
139  \item \blue{Structured traces} for holding detailed data on resource
140    behaviour, not just for assembler cost calculation.
141  \item Introduction of \blue{measurable subtraces} as a unit of cost
142    proof.
143  \end{itemize}
144  These features are significant in particular because:
145  \begin{itemize}
146  \item Compilation is not 1-1 transliteration: code is rearranged and
147    modified, but label sequencing is preserved;
148  \item The source language has implicit control flow constraints not present
149    in the target (call/return, branches, loops).
150  \end{itemize}
151
152\end{frame}
153
154\begin{frame}{Axiomatization}
155
156  The translation in \blue{\lstinline|compile.ma|} is complete --- essential
157  for extracting a compiler.
158
159  \medskip %
160  The simulation results in \blue{\lstinline|correctness.ma|} are fully
161  specified, and with substantially complete proofs.  Some parts, however,
162  have been axiomatized and their correctness assumed.
163
164  \medskip %
165  In the front-end there are two sources of this axiomatisation:
166  \begin{itemize}
167  \item \red{Functional correctness.}  Other projects, notably
168    \blue{CompCert}, provide assurances that such proofs can be completed; so
169    certain parts of this proof have been assumed.
170  \item Simulation steps with \red{many cases}, some very similar; here we
171    have identified representative and more challenging cases for detailed
172    proof.
173  \end{itemize}
174 
175\end{frame}
176
177\begin{frame}{Structure of the Proof}
178
179  \begin{center}
180    \includegraphics[width=0.7\linewidth]{compiler.pdf.eps}
181  \end{center}
182
183  The transition from front-end to back-end marks the change:
184  \begin{itemize}
185  \item From a language with explicit call/return structure and high-level
186    structured control flow;
187  \item To a language where all control flow is unconstrained jumps.
188  \end{itemize}
189
190  Correspondingly, the proof hands over
191  \begin{itemize}
192  \item From measurable subtraces with labelling that must be checked
193    \blue{sound} and \blue{precise};
194  \item To \blue{structured traces} which embed these invariants.
195  \end{itemize}
196
197\end{frame}
198
199\begin{frame}{Structured Traces}
200
201  Embed call structure and label invariants into execution traces.
202
203  \begin{center}
204    \includegraphics{strtraces.pdf.eps}
205  \end{center}
206  \vspace*{-\bigskipamount}
207  \begin{itemize}
208  \item Enforces \blue{invariants} on positions of cost labels.
209  \item \blue{Groups} execution steps according to preceding cost label.
210  \item Forbids \blue{repeating} addresses in grouped steps.
211  \end{itemize}
212
213  Constructed by folding up \red{measurable subtraces}, using their termination
214  and the fact that labelling is \blue{sound} and \blue{precise}.
215
216\end{frame}
217
218\begin{frame}{Step-by-Step: Getting to Labelled Source}
219
220  \blue{C $\to$ Clight}
221
222  \smallskip %
223  To translate from C to Clight we reuse the CompCert parser.
224
225  \bigskip %
226  \blue{Switch Removal}
227
228  \smallskip %
229  Control flow for \red{\lstinline'switch'} is too subtle for simple
230  labelling; we replace with an \red{\lstinline'if .. then .. else'} tree.
231  Proof requires tracking memory extension for an extra test variable.
232
233  \bigskip %
234  \blue{Cost Labels}
235
236  \smallskip %
237  Labels added at function start, branch targets, \dots
238
239  \smallskip %
240  Simulation of execution is exact, just skipping over labels.
241
242\end{frame}
243
244\begin{frame}{Front-End Correctness}
245
246  Suppose we have a \blue{measurable} subtrace of Clight execution, including
247  the \blue{prefix} of that trace from initial state.
248
249  \medskip %
250  Then for handover to the back-end we have in RTLabs:
251  \begin{itemize}
252  \item A corresponding \red{prefix};
253  \item A \red{structured trace} corresponding to the measurable subtrace;
254  \item The required invariants, including \red{no repeating addresses};
255  \item A proof that the same \red{stack limit} is observed.
256  \end{itemize}
257
258  \medskip %
259  In addition, to link this with the source program:
260  \begin{itemize}
261  \item The observables for the RTLabs \blue{prefix} and \blue{structured
262      trace} are the same as for their counterparts in Clight.
263  \end{itemize}
264
265\end{frame}
266
267\begin{frame}{Lifting Measurable Traces}
268
269  For each pass find a target \red{measurable subtrace} equivalent to any
270  source \red{measurable subtrace}.
271
272  \begin{center}
273    \includegraphics{meassim.pdf.eps}
274  \end{center}
275
276  Split normal simulation proof:
277  \begin{enumerate}
278  \item each \blue{call} becomes a \blue{call} step plus zero or more
279    \blue{normal} steps; following cost labels should stay next to the call
280    step
281  \item each \blue{return} becomes a \blue{return} plus zero or
282    more \blue{normal} steps
283  \item \blue{cost} label steps are preserved
284  \item other \blue{normal} steps become zero or more \blue{normal} steps
285  \end{enumerate}
286
287  This preserves the defining property for \red{measurable subtraces}.
288
289\end{frame}
290
291\begin{frame}{Front-End Simulation Results}
292  \blue{Cast simplification}
293  \begin{itemize}
294  \item Simplify expressions for 8-bit target.
295  \item Only expressions change --- statements are in lock-step.
296  \end{itemize}
297
298  \blue{Clight to Cminor}
299  \begin{itemize}
300  \item Stack allocation and control flow transformation.
301  \item Similar to \blue{CompCert}, but using \red{\lstinline'goto'} instead
302    of \red{\lstinline'loop'}
303  \item Large proof terms for invariants embedded in Cminor programs using
304    dependent types.
305  \end{itemize}
306
307  \blue{Cminor to RTLabs}
308  \begin{itemize}
309  \item Construction of control-flow graph.
310  \item Embedded invariants mean that translation function is \red{total}.
311  \end{itemize}
312\end{frame}
313
314\begin{frame}{Checking Cost Labelling}
315
316  Building structured traces in \blue{RTLabs} depends on having cost labelling
317  on the linear trace that is \red{sound} and \red{precise}.
318
319  \medskip %
320  Instead of carrying this as an invariant all the way through each previous
321  pass, we take a \blue{shortcut} and check on the spot.
322
323  \medskip %
324  This is similar to \blue{translation validation}.
325
326  \medskip %
327   The specific requirements for labelling \blue{RTLabs} code are:
328   \begin{description}[\red{soundness}]
329   \item[\red{soundness}] At every point, a finite bound on the number of
330     instructions until the next label;
331   \item[\red{soundness}] A label at the start of every function;
332   \item[\red{precision}] A label after every branch.
333   \end{description}
334 
335   \medskip %
336   The bound is the hard part; the rest is a simple syntactic check.
337\end{frame}
338
339\begin{frame}[fragile]{Checking Bound to Next Label}
340
341    \begin{center}
342  \begin{overprint}[0.5\linewidth]
343      \onslide<1>\includegraphics[width=0.8\linewidth]{loop.pdf.eps}
344      \onslide<2>\includegraphics[width=0.8\linewidth]{loop1.pdf.eps}
345      \onslide<3>\includegraphics[width=0.8\linewidth]{loop2.pdf.eps}
346      \onslide<4>\includegraphics[width=0.8\linewidth]{loop3.pdf.eps}
347      \onslide<5>\includegraphics[width=0.8\linewidth]{loop4.pdf.eps}
348      \onslide<6->\includegraphics[width=0.8\linewidth]{loopx.pdf.eps}
349  \end{overprint}
350    \end{center}
351  \begin{itemize}
352  \item Pick an arbitrary node in the CFG;
353  \item<2-> Follow successor instructions;
354  \item<4-> If we find a \alert{cost label} all the traced nodes have a
355    bound\dots
356  \item<5-> \dots so remove them and pick a new node, continue;
357  \item<6-> But if we find an \alert{unlabelled loop} then the labelling is
358    unsound, report an error.
359  \end{itemize}
360
361  \onslide<7->
362  When complete, we have a proof that every loop contains a label.
363
364\end{frame}
365
366\begin{frame}{Checking Cost Labelling}
367
368  This check of soundness and precision in RTLabs:
369  \begin{itemize}
370  \item Is partial --- will only succeed when invariants hold;
371  \item Extracts to additional checking code in the compiler;
372  \item[\checkmark] Significantly reduces the proof effort.
373  \end{itemize}
374
375  \medskip %
376  The last point is our key observation: a conventional proof would require
377  showing that:
378  \begin{quote}
379    \red{If} every loop in \blue{Cminor} code is headed by a cost label;
380    \\[1\jot]
381    \red{Then} at every point in every \blue{RTLabs} execution there is a
382    finite bound on the number of instructions until the next cost label.
383  \end{quote}
384  We expect that proving existence in general of this bound alone to be
385  considerably harder than the whole check.
386
387\end{frame}
388
389\begin{frame}{Putting the Proofs Together}
390
391  \begin{center}
392    \includegraphics[width=0.7\linewidth]{compiler.pdf.eps}
393  \end{center}
394
395  The front-end proof demonstrates that for any Clight execution:
396  \begin{itemize}
397  \item We have a corresponding \blue{structured trace} with \blue{invariants}
398    \dots
399  \item \dots and with the same \blue{observables} of labels and call/return.
400  \end{itemize}
401
402  The back-end proof takes this structured trace and shows that
403  \begin{itemize}
404  \item There is a corresponding assembler trace \dots
405  \item \dots with \blue{sound} and \blue{precise} labelling and the same
406    \blue{observables}.
407  \end{itemize}
408
409  Combining these gives a proof that time and space costs computed on the
410  source are exactly those observed on executing the binary.
411
412\end{frame}
413
414\begin{frame}{Summary}
415
416  CerCo WP3 has produced the front-end of a C-to-8051 compiler that annotates
417  C source with runtime cycle counts and stack space.
418
419  \medskip %
420  This is formalised in the Matita proof assistant, with a proof of
421  correctness, and can be extracted as an executable compiler.
422
423  \smallskip %
424  \begin{itemize}
425  \item Modular \blue{layering} of intensional proofs over extensional
426    results;
427  \item Extensional results richer than normal and carefully chosen, but do
428    not require large changes to proof techniques.
429  \item \blue{Compile-time checks} on intermediate code reduce proof effort.
430  \item \blue{Structured traces} as a repository for invariant information.
431  \item \blue{Measurable subtraces} as a unit of cost proof.
432  \end{itemize}
433
434  \smallskip %
435  Going beyond this, in future projects (REMS, \dots) we plan to:
436  \begin{itemize}
437  \item Generalize  labelling schemes for more sophisticated targets;
438  \item Apply \blue{decompilation} to jump the gap from source to binary.
439  \end{itemize}
440
441\end{frame}
442}
443
444% LocalWords:  LFCS Matita CerCo intensional WCET toolchain CompCert Clight ASM
445% LocalWords:  reexecutes subtrace RTLabs subtraces Garnier McKinna cerco
446% LocalWords:  Axiomatization axiomatized Cminor goto
Note: See TracBrowser for help on using the repository browser.