source: Deliverables/D1.1/Presentations/Paris presentation March 2012/WP4-dominic.tex @ 1815

Last change on this file since 1815 was 1815, checked in by mulligan, 8 years ago

Added more to talk

File size: 18.0 KB
Line 
1\documentclass[serif]{beamer}
2
3\usepackage[english]{babel}
4\usepackage{listings}
5\usepackage{microtype}
6
7\author{Dominic Mulligan}
8\title{CerCo Work Package 4}
9\date{CerCo project review meeting\\March 2012}
10
11\lstdefinelanguage{matita-ocaml}
12  {keywords={definition,coercion,lemma,theorem,remark,inductive,record,qed,let,in,rec,match,return,with,Type,try},
13   morekeywords={[2]whd,normalize,elim,cases,destruct},
14   morekeywords={[3]type,of},
15   mathescape=true,
16  }
17
18\lstset{language=matita-ocaml,basicstyle=\scriptsize\tt,columns=flexible,breaklines=false,
19        keywordstyle=\color{red}\bfseries,
20        keywordstyle=[2]\color{blue},
21        keywordstyle=[3]\color{blue}\bfseries,
22        commentstyle=\color{green},
23        stringstyle=\color{blue},
24        showspaces=false,showstringspaces=false}
25
26\begin{document}
27
28\begin{frame}
29\maketitle
30\end{frame}
31
32\begin{frame}
33\frametitle{Summary I}
34Relevant tasks: T4.2 and T4.3 (from the CerCo Contract):
35\begin{quotation}
36\textbf{Task 4.2}
37Functional encoding in the Calculus of Inductive Construction (indicative effort: UNIBO: 8; UDP: 2; UEDIN: 0)
38\end{quotation}
39
40\begin{quotation}
41\textbf{Task 4.3}
42Formal semantics of intermediate languages (indicative effort: UNIBO: 4; UDP: 0; UEDIN: 0)
43\end{quotation}
44\end{frame}
45
46\begin{frame}
47\frametitle{Summary II}
48Task 4.2 in detail (from the CerCo Contract):
49\begin{quotation}
50\textbf{CIC encoding: Back-end}:
51Functional Specification in the internal language of the Proof Assistant (the Calculus of Inductive Construction) of the back end of the compiler.
52This unit is meant to be composable with the front-end of deliverable D3.2, to obtain a full working compiler for Milestone M2.
53A first validation of the design principles and implementation choices for the Untrusted Cost-annotating OCaml Compiler D2.2 is achieved and reported in the deliverable, possibly triggering updates of the Untrusted Cost-annotating OCaml Compiler sources.
54\end{quotation}
55\end{frame}
56
57\begin{frame}
58\frametitle{Summary III}
59Task 4.3 in detail (from the CerCo contract):
60\begin{quotation}
61\textbf{Executable Formal Semantics of back-end intermediate languages}:
62This prototype is the formal counterpart of deliverable D2.1 for the back end side of the compiler and validates it.
63\end{quotation}
64\end{frame}
65
66\begin{frame}
67\frametitle{Backend intermediate languages I}
68\begin{itemize}
69\item
70O'Caml prototype has five backend intermediate languages: RTLabs, RTL, ERTL, LTL, LIN
71\item
72RTLabs is the `frontier' between backend and frontend, last abstract language
73\item
74RTLabs, RTL, ERTL and LTL are graph based languages: functions represented as graphs of statements, with entry and exit points
75\item
76LIN is a linearised form of LTL, and is the exit point of the compiler's backend
77\end{itemize}
78\end{frame}
79
80\begin{frame}
81\frametitle{Backend intermediate languages II}
82\begin{small}
83\begin{tabbing}
84\quad \=\,\quad \= \\
85\textsf{RTLabs}\\
86\> $\downarrow$ \> copy propagation (an endo-transformation, yet to be ported) \\
87\> $\downarrow$ \> instruction selection\\
88\> $\downarrow$ \> change of memory models in compiler\\
89\textsf{RTL}\\
90\> $\downarrow$ \> constant propagation (an endo-transformation, yet to be ported) \\
91\> $\downarrow$ \> calling convention made explicit \\
92\> $\downarrow$ \> layout of activation records \\
93\textsf{ERTL}\\
94\> $\downarrow$ \> register allocation and spilling\\
95\> $\downarrow$ \> dead code elimination\\
96\textsf{LTL}\\
97\> $\downarrow$ \> function linearisation\\
98\> $\downarrow$ \> branch compression (an endo-transformation) \\
99\textsf{LIN}\\
100\> $\downarrow$ \> relabeling\\
101\end{tabbing}
102\end{small}
103\end{frame}
104
105\begin{frame}
106\frametitle{\texttt{Joint}: a new approach I}
107\begin{itemize}
108\item
109Looking at O'Caml code, languages after RTLabs are similar
110\item
111Similar instructions, similar semantics, and so on
112\item
113Differences are slight
114\item
115Pseudoregisters used instead of hardware registers
116\item
117A few new unique instructions for each language
118\item
119In semantics: some notion of `successor' of an instruction, etc.
120\end{itemize}
121\end{frame}
122
123\begin{frame}
124\frametitle{\texttt{Joint}: a new approach II}
125\begin{itemize}
126\item
127Idea: all of these languages are just instances of a single language
128\item
129This language \texttt{Joint} is parameterised by a type of registers to be used in instructions, and so forth
130\item
131Each language after RTLabs is now just defined as the \texttt{Joint} language instantiated with some concrete types
132\item
133Similarly for semantics: common definitions that take e.g. type representing program counters as parameters
134\end{itemize}
135\end{frame}
136
137\begin{frame}[fragile]
138\frametitle{\texttt{Joint}: a new approach III}
139\texttt{Joint} instructions allow us to embed language-specific instructions:
140\begin{lstlisting}
141inductive joint_instruction (p: params__) (globals: list ident): Type[0] :=
142 | COMMENT: String $\rightarrow$ joint_instruction p globals
143 | COST_LABEL: costlabel $\rightarrow$ joint_instruction p globals
144 ...
145 | COND: acc_a_reg p → label $\rightarrow$ joint_instruction p globals
146 | extension: extend_statements p $\rightarrow$ joint_instruction p globals.
147\end{lstlisting}
148
149\begin{lstlisting}
150inductive ertl_statement_extension: Type[0] :=
151 | ertl_st_ext_new_frame: ertl_statement_extension
152 | ertl_st_ext_del_frame: ertl_statement_extension
153 | ertl_st_ext_frame_size: register $\rightarrow$ ertl_statement_extension.
154\end{lstlisting}
155\end{frame}
156
157\begin{frame}[fragile]
158\frametitle{\texttt{Joint}: a new approach IV}
159For example, the definition of the \texttt{ERTL} language is now:
160\begin{lstlisting}
161inductive move_registers: Type[0] :=
162 | pseudo: register $\rightarrow$ move_registers
163 | hardware: Register $\rightarrow$ move_registers.
164
165definition ertl_params__: params__ :=
166 mk_params__ register register register register
167  (move_registers $\times$ move_registers)
168   register nat unit ertl_statement_extension.
169definition ertl_params_: params_ :=
170 graph_params_ ertl_params__.
171definition ertl_params0: params0 :=
172 mk_params0 ertl_params__ (list register) nat.
173definition ertl_params1: params1 :=
174 rtl_ertl_params1 ertl_params0.
175definition ertl_params: $\forall$globals. params globals ≝
176 rtl_ertl_params ertl_params0.
177\end{lstlisting}
178\end{frame}
179
180\begin{frame}
181\frametitle{\texttt{Joint}: advantages}
182\begin{itemize}
183\item
184Why use \texttt{Joint}, as opposed to defining all languages individually?
185\item
186Reduces repeated code (fewer bugs, or places to change)
187\item
188Unify some proofs, making correctness proof easier
189\item
190Easier to add new intermediate languages as needed
191\item
192Easier to see relationship between consecutive languages at a glance
193\end{itemize}
194\end{frame}
195
196\begin{frame}[fragile]
197\frametitle{Instruction embedding in \texttt{Joint}}
198\begin{lstlisting}
199inductive joint_instruction (p:params__) (globals: list ident): Type[0] :=
200 ...
201 | POP: acc_a_reg p $\rightarrow$ joint_instruction p globals
202 | PUSH: acc_a_reg p $\rightarrow$ joint_instruction p globals
203 ...
204\end{lstlisting}
205\begin{itemize}
206\item
207RTLabs (not a \texttt{Joint} language) to RTL is where instruction selection takes place
208\item
209Instructions from the MCS-51 instruction set embedded in \texttt{Joint} syntax
210\end{itemize}
211\end{frame}
212
213\begin{frame}
214\frametitle{Semantics of \texttt{Joint} I}
215\begin{itemize}
216\item
217As mentioned, use of \texttt{Joint} also unifies semantics of these languages
218\end{itemize}
219\end{frame}
220
221\begin{frame}
222\frametitle{Semantics of \texttt{Joint} II}
223\end{frame}
224
225\begin{frame}
226\frametitle{A new intermediate language}
227\begin{itemize}
228\item
229Matita backend includes a new intermediate language: RTLntc
230\item
231Sits between RTL and ERTL
232\item
233RTLntc is the RTL language where all tailcalls have been eliminated
234\item
235This language is `implicit' in the O'Caml compiler
236\item
237There, the RTL to ERTL transformation eliminates tailcalls as part of translation
238\item
239But including an extra, explicit intermediate language is `almost free' using the \texttt{Joint} language approach
240\end{itemize}
241\end{frame}
242
243\begin{frame}
244\frametitle{The LTL to LIN transform I}
245\begin{itemize}
246\item
247The order of transformations in O'Caml prototype is fixed
248\item
249Linearisation takes place at a fixed place, in the translation between LTL and LIN
250\item
251The Matita compiler is different: linearisation is a generic process
252\item
253Any graph-based language can now be linearised
254\end{itemize}
255\end{frame}
256
257\begin{frame}
258\frametitle{The LTL to LIN transform II}
259\begin{itemize}
260\item
261The CompCert backend linearises much sooner than CerCo's
262\item
263We can now experiment with linearising much earlier
264\item
265Many transformations and optimisations can work fine on a linearised form
266\end{itemize}
267\end{frame}
268
269\begin{frame}
270\frametitle{`Free time'}
271\begin{itemize}
272\item
273We had six months of `free time', time not allotted in the Contract to working on any explicit deliverable
274\item
275We invested this time working on:
276\begin{itemize}
277\item
278The proof of correctness for the assembler
279\item
280A notion of `structured traces', used throughout the compiler formalisation, as a means of eventually proving that the compiler correctly preserves costs
281\item
282Structured traces were defined in collaboration with the team at UEDIN
283\end{itemize}
284\item
285I now explain in more detail how this time was used
286\end{itemize}
287\end{frame}
288
289\begin{frame}
290\frametitle{Assembler}
291\begin{itemize}
292\item
293After LIN, compiler spits out assembly language for MCS-51
294\item
295Assembler has pseudoinstructions similar to many commercial assembly languages
296\item
297For instance, instead of computed jumps (e.g. \texttt{SJMP} to a specific address), compiler can simply spit out a generic jump instruction to a label
298\item
299Simplifies the compiler, at the expense of introducing more proof obligations
300\end{itemize}
301\end{frame}
302
303\begin{frame}
304\frametitle{A problem: jump expansion}
305\begin{itemize}
306\item
307`Jump expansion' is our name for the following (standard) problem
308\item
309Given a pseudojump to a label $l$, how best can this be expanded into an assembly instruction \texttt{SJMP}, \texttt{AJMP} or \texttt{LJMP} to a concrete address?
310\item
311Problem also applies to conditional jumps
312\item
313Problem especially relevant for MCS-51 as it has a small code memory, therefore aggressive expansion of jumps into smallest possible concrete jump instruction needed
314\item
315But a known hard problem (NP-complete depending on architecture), and easy to imagine knotty configurations where size of jumps are interdependent
316\end{itemize}
317\end{frame}
318
319\begin{frame}
320\frametitle{Jump expansion I}
321\begin{itemize}
322\item
323We employed the following tactic: split the decision over how any particular pseudoinstruction is expanded from pseudoinstruction expansion
324\item
325Call the decision maker a `policy'
326\item
327We started the proof of correctness for the assembler based on the premise that a correct policy exists
328\item
329Further, we know that the assembler only fails to assemble a program if a good policy does not exist (a side-effect of using dependent types)
330\item
331A bad policy is a function that expands a given pseudojump into a concrete jump instruction that is `too small' for the distance to be jumped
332\end{itemize}
333\end{frame}
334
335\begin{frame}
336\frametitle{Jump expansion II}
337\begin{itemize}
338\item
339Boender at UNIBO has been working on a verified implementation of a good jump expansion policy for the MCS-51
340\item
341The strategy initially translates all pseudojumps as \texttt{SJMP} and then increases their size if necessary
342\item
343Termination of the procedure is proved, as well as a safety property, stating that jumps are not expanded into jumps that are too long
344\item
345Optimality is not proved
346\item
347Boender's work is the first formal treatment of the `jump expansion problem'
348\end{itemize}
349\end{frame}
350
351\begin{frame}
352\frametitle{Assembler correctness proof}
353\begin{itemize}
354\item
355Assuming the existence of a good jump expansion property, we completed about 50\% of the correctness proof for the assembler
356\item
357Boender's work constitutes a large missing piece (modulo a few missing lemmas), but holes still remain in the main proof
358\item
359We abandoned this work to start on other tasks
360\item
361However, we intend to return to the proof, and publishing an account of the work (possibly) as a journal paper
362\end{itemize}
363\end{frame}
364
365\begin{frame}
366\frametitle{Structured traces I}
367\begin{itemize}
368\item
369We introduced a notion of `structured traces'
370\item
371These are intended to statically capture the execution trace of a program
372\item
373Come in two variants: inductive and coinductive
374\item
375Inductive is meant to capture program execution traces that eventually halt
376\item
377Coinductive is meant to capture program execution traces that diverge
378\end{itemize}
379\end{frame}
380
381\begin{frame}
382\frametitle{Structured traces II}
383\begin{itemize}
384\item
385Here, I focus on the inductive variety
386\item
387Used the most (for now) in the backend
388\item
389In particular, used in the proof that static and dynamic cost computations coincide
390\item
391This will be explained later
392\end{itemize}
393\end{frame}
394
395\begin{frame}
396\frametitle{Structured traces III}
397\begin{itemize}
398\item
399Central insight is that program execution is always in the body of some function (from \texttt{main} onwards)
400\item
401A well formed program must have labels appearing at certain spots
402\item
403Similarly, the final instruction executed when executing a function must be a \texttt{RET} instruction
404\item
405These invariants, and others, are crystalised in the specific syntactic form of a structured trace
406\item
407Some examples...
408\end{itemize}
409\end{frame}
410
411\begin{frame}[fragile]
412\frametitle{Structured trace examples I}
413Traces end with a return being executed, or not:
414\begin{lstlisting}
415inductive trace_ends_with_ret: Type[0] :=
416 | ends_with_ret: trace_ends_with_ret
417 | doesnt_end_with_ret: trace_ends_with_ret.
418\end{lstlisting}
419A trace starting with a label and ending with a return:
420\begin{lstlisting}
421inductive trace_label_return (S: abstract_status): S $\rightarrow$ S $\rightarrow$ Type[0] :=
422 | tlr_base:
423  $\forall$status_before: S.
424  $\forall$status_after: S.
425   trace_label_label S ends_with_ret status_before status_after $\rightarrow$
426    trace_label_return S status_before status_after
427  ...
428\end{lstlisting}
429\end{frame}
430
431\begin{frame}[fragile]
432\frametitle{Structured trace examples II}
433A trace starting and ending with a cost label:
434\begin{lstlisting}
435...
436with trace_label_label: trace_ends_with_ret $\rightarrow$ S $\rightarrow$ S $\rightarrow$ Type[0] :=
437 | tll_base:
438  $\forall$ends_flag: trace_ends_with_ret.
439  $\forall$start_status: S.
440  $\forall$end_status: S.
441   trace_any_label S ends_flag start_status end_status $\rightarrow$
442   as_costed S start_status $\rightarrow$
443    trace_label_label S ends_flag start_status end_status
444...
445\end{lstlisting}
446\end{frame}
447
448\begin{frame}[fragile]
449\frametitle{Structured trace examples III}
450A call followed by a label on return:
451\begin{lstlisting}
452...
453with trace_any_label: trace_ends_with_ret $\rightarrow$ S $\rightarrow$ S $\rightarrow$ Type[0] :=
454...
455 | tal_base_call:
456  $\forall$status_pre_fun_call: S.   
457  $\forall$status_start_fun_call: S.
458  $\forall$status_final: S.
459   as_execute S status_pre_fun_call status_start_fun_call $\rightarrow$
460   $\forall$H:as_classifier S status_pre_fun_call cl_call.
461    as_after_return S (mk_Sig ?? status_pre_fun_call H) status_final $\rightarrow$
462    trace_label_return S status_start_fun_call status_final $\rightarrow$
463    as_costed S status_final $\rightarrow$
464     trace_any_label S doesnt_end_with_ret status_pre_fun_call status_final
465...
466\end{lstlisting}
467\end{frame}
468
469\begin{frame}[fragile]
470\frametitle{Structured trace examples IV}
471A call followed by a non-empty trace:
472\begin{lstlisting}
473...
474| tal_step_call:
475 $\forall$end_flag: trace_ends_with_ret.
476 $\forall$status_pre_fun_call: S.
477 $\forall$status_start_fun_call: S.
478 $\forall$status_after_fun_call: S.
479 $\forall$status_final: S.
480  as_execute S status_pre_fun_call status_start_fun_call $\rightarrow$
481  $\forall$H:as_classifier S status_pre_fun_call cl_call.
482   as_after_return S (mk_Sig ?? status_pre_fun_call H) status_after_fun_call $\rightarrow$
483   trace_label_return S status_start_fun_call status_after_fun_call $\rightarrow$
484   ¬ as_costed S status_after_fun_call $\rightarrow$
485   trace_any_label S end_flag status_after_fun_call status_final $\rightarrow$
486    trace_any_label S end_flag status_pre_fun_call status_final
487...
488\end{lstlisting}
489\end{frame}
490
491\begin{frame}
492\frametitle{Static and dynamic costs I}
493\begin{itemize}
494\item
495Given a structured trace, we can compute its associated cost
496\item
497This is the \emph{static} cost of a program execution
498\item
499Similarly, given a program counter and a code memory (corresponding to the trace), we can compute a \emph{dynamic cost} of a simple block
500\item
501Do this by repeatedly fetching, obtaining the next instruction, and a new program counter
502\item
503This requires some predicates defining what a `good program' and what a `good program counter' are
504\end{itemize}
505\end{frame}
506
507\begin{frame}
508\frametitle{Static and dynamic costs II}
509\begin{itemize}
510\item
511We aim to prove that the dynamic and static cost calculations coincide
512\item
513This would imply that the static cost computation is correct
514\item
515This proof is surprisingly tricky to complete (about 3 man months of work so far)
516\item
517Is now about 75\% complete
518\end{itemize}
519\end{frame}
520
521\begin{frame}
522\frametitle{Changes ported to O'Caml prototype}
523XXX: todo
524\end{frame}
525
526\begin{frame}
527\frametitle{Improvements in Matita}
528\begin{itemize}
529\item
530Part of the motivation for using Matita was for CerCo to act a `stress test'
531\item
532The proofs talked about in this talk have done this
533\item
534Many improvements to Matita have been made since the last project meeting
535\item
536These include major speed-ups of e.g. printing large goals, bug fixes, the porting of CerCo code to standard library, and more options for passing to tactics
537\end{itemize}
538\end{frame}
539
540\begin{frame}
541\frametitle{Summary}
542We have:
543\begin{itemize}
544\item
545Translated the O'Caml prototype's backend intermediate languages into Matita
546\item
547Implemented the translations between languages, and given the intermediate languages a semantics
548\item
549Refactored many of the backend intermediate languages into a common, parametric `joint' language, that is later specialised
550\item
551Spotted opportunities for commuting backend translation passes
552\item
553Used six months of `free time' to define structured traces and start the proof of correctness for the assembler
554\end{itemize}
555\end{frame}
556
557\end{document}
Note: See TracBrowser for help on using the repository browser.