source: Deliverables/D1.2/Presentations/WP4-dominic.tex @ 1840

Last change on this file since 1840 was 1840, checked in by mulligan, 8 years ago

Changes to my presentation based on feedback from practice session yesterday. More changes to come.

File size: 17.8 KB
Line 
1\documentclass[serif]{beamer}
2
3\usepackage{amssymb}
4\usepackage[english]{babel}
5\usepackage{listings}
6\usepackage{microtype}
7
8\usetheme{Frankfurt}
9\logo{\includegraphics[height=1.0cm]{fetopen.png}}
10
11\author{Dominic Mulligan}
12\title{CerCo Work Package 4}
13\date{CerCo project review meeting\\March 2012}
14
15\lstdefinelanguage{matita-ocaml}
16  {keywords={definition,coercion,lemma,theorem,remark,inductive,record,qed,let,in,rec,match,return,with,Type,try},
17   morekeywords={[2]whd,normalize,elim,cases,destruct},
18   morekeywords={[3]type,of},
19   mathescape=true,
20  }
21
22\lstset{language=matita-ocaml,basicstyle=\scriptsize\tt,columns=flexible,breaklines=false,
23        keywordstyle=\color{red}\bfseries,
24        keywordstyle=[2]\color{blue},
25        keywordstyle=[3]\color{blue}\bfseries,
26        commentstyle=\color{green},
27        stringstyle=\color{blue},
28        showspaces=false,showstringspaces=false}
29
30\begin{document}
31
32\begin{frame}
33\maketitle
34\end{frame}
35
36\begin{frame}
37\frametitle{Summary}
38Relevant tasks: T4.2 and T4.3 (from the CerCo Contract):
39\begin{quotation}
40\textbf{Task 4.2}
41Functional encoding in the Calculus of Inductive Construction (indicative effort: UNIBO: 8; UDP: 2; UEDIN: 0)
42\end{quotation}
43
44\begin{quotation}
45\textbf{Task 4.3}
46Formal semantics of intermediate languages (indicative effort: UNIBO: 4; UDP: 0; UEDIN: 0)
47\end{quotation}
48\end{frame}
49
50\begin{frame}
51\frametitle{Contents}
52\tableofcontents
53\end{frame}
54
55\section{Rationalisation of backend languages}
56
57\begin{frame}
58\frametitle{Backend intermediate languages I}
59\begin{itemize}
60\item
61OCaml prototype has five backend intermediate languages: RTLabs, RTL, ERTL, LTL, LIN
62\item
63RTLabs is the `frontier' between backend and frontend, last abstract language
64\item
65RTLabs, RTL, ERTL and LTL are graph based languages: functions represented as graphs of statements, with entry and exit points
66\item
67LIN is a linearised form of LTL, and is the exit point of the compiler's backend
68\item
69In contrast to frontend, backend is very different to CompCert's
70\end{itemize}
71\end{frame}
72
73\begin{frame}
74\frametitle{Backend intermediate languages II}
75\begin{small}
76\begin{tabbing}
77\quad \=\,\quad \=\ \\
78\textsf{RTLabs}\\
79\> $\downarrow$ \> copy propagation ($\times$) \\
80\> $\downarrow$ \> instruction selection (\checkmark) \\
81\> $\downarrow$ \> change of memory models in compiler (\checkmark) \\
82\textsf{RTL}\\
83\> $\downarrow$ \> constant propagation ($\times$) \\
84\> $\downarrow$ \> calling convention made explicit (\checkmark) \\
85\> $\downarrow$ \> layout of activation records (\checkmark) \\
86\textsf{ERTL}\\
87\> $\downarrow$ \> register allocation and spilling (\checkmark) \\
88\> $\downarrow$ \> dead code elimination (\checkmark) \\
89\textsf{LTL}\\
90\> $\downarrow$ \> function linearisation (\checkmark) \\
91\> $\downarrow$ \> branch compression ($\times$) \\
92\textsf{LIN}\\
93\> $\downarrow$ \> relabeling (\checkmark) \\
94\textsf{ASM}
95\end{tabbing}
96\end{small}
97\end{frame}
98
99\begin{frame}
100\frametitle{\texttt{Joint}: a new approach I}
101\begin{itemize}
102\item
103Consecutive languages in backend must be similar
104\item
105Transformations between languages translate away some small specific set of features
106\item
107But looking at OCaml code, not clear precisely what differences between languages are, as code is repeated
108\item
109Not clear if translation passes can commute, for instance
110\item
111CerCo passes are in a different order to CompCert (calling convention and register allocation done in different places)
112\item
113Instruction selection done early: changing subset of instructions used would require instructions to be duplicated everywhere in backend
114\end{itemize}
115\end{frame}
116
117\begin{frame}
118\frametitle{\texttt{Joint}: a new approach II}
119\begin{itemize}
120\item
121Idea: all of these languages are just instances of a single language
122\item
123This language \texttt{Joint} is parameterised by a type of registers to be used in instructions, and so forth
124\item
125Each language after RTLabs is now just defined as the \texttt{Joint} language instantiated with some concrete types
126\item
127Similarly for semantics: common definitions that take e.g. type representing program counters as parameters
128\end{itemize}
129\end{frame}
130
131\begin{frame}[fragile]
132\frametitle{\texttt{Joint}: a new approach III}
133\texttt{Joint} instructions allow us to embed language-specific instructions:
134\begin{lstlisting}
135inductive joint_instruction (p: params__) (globals: list ident): Type[0] :=
136 | COMMENT: String $\rightarrow$ joint_instruction p globals
137 | COST_LABEL: costlabel $\rightarrow$ joint_instruction p globals
138 ...
139 | COND: acc_a_reg p $\rightarrow$ label $\rightarrow$ joint_instruction p globals
140 | extension: extend_statements p $\rightarrow$ joint_instruction p globals.
141\end{lstlisting}
142
143\begin{lstlisting}
144inductive ertl_statement_extension: Type[0] :=
145 | ertl_st_ext_new_frame: ertl_statement_extension
146 | ertl_st_ext_del_frame: ertl_statement_extension
147 | ertl_st_ext_frame_size: register $\rightarrow$ ertl_statement_extension.
148\end{lstlisting}
149\end{frame}
150
151\begin{frame}
152\frametitle{\texttt{Joint}: a new approach IV}
153\begin{itemize}
154\item
155Languages that provide extensions need to provide translations and semantics for those extensions
156\item
157Everything else can be handled at the \texttt{Joint}-level
158\item
159This modularises the handling of these languages
160\end{itemize}
161\end{frame}
162
163\begin{frame}
164\frametitle{\texttt{Joint}: advantages I}
165\begin{itemize}
166\item
167We can recover the concrete OCaml languages by instantiating parameterized types
168\item
169Why use \texttt{Joint}?
170\item
171Reduces repeated code (fewer bugs, or places to change)
172\item
173Unify some proofs, making correctness proof easier
174\end{itemize}
175\end{frame}
176
177\begin{frame}
178\frametitle{\texttt{Joint}: advantages II}
179\begin{itemize}
180\item
181Easier to add new intermediate languages as needed
182\item
183Easier to see relationship between consecutive languages at a glance
184\item
185MCS-51 instruction set embedded in \texttt{Joint} syntax
186\item
187Simplifies instruction selection
188\item
189We can investigate which translation passes commute much more easily
190\end{itemize}
191\end{frame}
192
193\begin{frame}
194\frametitle{Semantics of \texttt{Joint} I}
195\begin{itemize}
196\item
197As mentioned, use of \texttt{Joint} also unifies semantics of these languages
198\item
199We use several sets of records, which represent the state that a program is in
200\item
201These records are parametric in representations for e.g. frames
202\end{itemize}
203\end{frame}
204
205\begin{frame}
206\frametitle{A new intermediate language}
207\begin{itemize}
208\item
209Matita backend includes a new intermediate language: RTLntc
210\item
211Sits between RTL and ERTL
212\item
213RTLntc is the RTL language where all tailcalls have been eliminated
214\item
215This language is `implicit' in the OCaml compiler
216\item
217There, the RTL to ERTL transformation eliminates tailcalls as part of translation
218\item
219But including an extra, explicit intermediate language is `almost free' using the \texttt{Joint} language approach
220\end{itemize}
221\end{frame}
222
223\begin{frame}
224\frametitle{The LTL to LIN transform I}
225\begin{itemize}
226\item
227\texttt{Joint} clearly separates fetching from program execution
228\item
229We can vary how one works whilst fixing the other
230\item
231Linearisation is moving from fetching from a graph-based language to fetching from a list-based program representation
232\item
233The order of transformations in OCaml prototype is fixed
234\item
235Linearisation takes place at a fixed place, in the translation between LTL and LIN
236\item
237The Matita compiler is different: linearisation is a generic process
238\item
239Any graph-based language can now be linearised
240\end{itemize}
241\end{frame}
242
243\begin{frame}
244\frametitle{The LTL to LIN transform II}
245\begin{itemize}
246\item
247CompCert backend linearises much sooner than CerCo's
248\item
249Can now experiment with linearising much earlier
250\item
251Many transformations and optimisations can work fine on a linearised form
252\item
253Only place in the (current) backend that requires a graph-based language is in the ERTL pass, where we do a dataflow analysis
254\end{itemize}
255\end{frame}
256
257\section{Assembler correctness proof and structured traces}
258
259\begin{frame}
260\frametitle{Time not reported}
261\begin{itemize}
262\item
263We had six months of time which is not reported on in any deliverable
264\item
265We invested this time working on:
266\begin{itemize}
267\item
268The global proof sketch
269\item
270The setup of `proof infrastructure', common definitions, lemmas, invariants etc. required for main body of proof
271\item
272The proof of correctness for the assembler
273\item
274A notion of `structured traces', used throughout the compiler formalisation, as a means of eventually proving that the compiler correctly preserves costs
275\item
276Structured traces were defined in collaboration with the team at UEDIN
277\end{itemize}
278\end{itemize}
279\end{frame}
280
281\begin{frame}
282\frametitle{Assembler}
283\begin{itemize}
284\item
285After LIN, compiler spits out assembly language for MCS-51
286\item
287Assembler has pseudoinstructions similar to many commercial assembly languages
288\item
289For instance, instead of computed jumps (e.g. \texttt{SJMP} to a specific address), compiler can simply spit out a generic jump instruction to a label
290\item
291Simplifies the compiler, at the expense of introducing more proof obligations
292\item
293Now need a formalized assembler (a step further than CompCert)
294\end{itemize}
295\end{frame}
296
297\begin{frame}
298\frametitle{A problem: jump expansion}
299\begin{itemize}
300\item
301`Jump expansion' is our name for the standard `branch displacement' problem
302\item
303Given a pseudojump to a label $l$, how best can this be expanded into an assembly instruction \texttt{SJMP}, \texttt{AJMP} or \texttt{LJMP} to a concrete address?
304\item
305Problem also applies to conditional jumps
306\item
307Problem especially relevant for MCS-51 as it has a small code memory, therefore aggressive expansion of jumps into smallest possible concrete jump instruction needed
308\item
309But a known hard problem (NP-complete depending on architecture), and easy to imagine knotty configurations where size of jumps are interdependent
310\end{itemize}
311\end{frame}
312
313\begin{frame}
314\frametitle{Jump expansion I}
315\begin{itemize}
316\item
317We employed the following tactic: split the decision over how any particular pseudoinstruction is expanded from pseudoinstruction expansion
318\item
319Call the decision maker a `policy'
320\item
321We started the proof of correctness for the assembler based on the premise that a correct policy exists
322\item
323Further, we know that the assembler only fails to assemble a program if a good policy does not exist (a side-effect of using dependent types)
324\item
325A bad policy is a function that expands a given pseudojump into a concrete jump instruction that is `too small' for the distance to be jumped, or makes the program consume too much memory
326\end{itemize}
327\end{frame}
328
329\begin{frame}
330\frametitle{Jump expansion II}
331\begin{itemize}
332\item
333Jaap Boender at UNIBO has been working on a verified implementation of a good jump expansion policy for the MCS-51
334\item
335The strategy initially translates all pseudojumps as \texttt{SJMP} and then increases their size if necessary
336\item
337Termination of the procedure is proved, as well as a safety property, stating that jumps are not expanded into jumps that are too long
338\item
339His strategy is not optimal (though the computed solution is optimal for the strategy employed)
340\item
341Jaap's work is the first formal treatment of the `jump expansion problem'
342\end{itemize}
343\end{frame}
344
345\begin{frame}
346\frametitle{Assembler correctness proof}
347\begin{itemize}
348\item
349Assuming the existence of a good jump expansion property, we completed about 75\% of the correctness proof for the assembler
350\item
351Jaap's work has just been completed (modulo a few missing lemmas)
352\item
353Postponed the remainder of main assembler proof to start work on other tasks (and for Jaap to finish)
354\item
355We intend to return to proof, and publish an account of the work (possibly) as a journal paper
356\end{itemize}
357\end{frame}
358
359\begin{frame}[fragile]
360\frametitle{Who pays? I}
361\begin{columns}
362\begin{column}[b]{0.5\linewidth}
363\centering
364\begin{lstlisting}
365int main(int argc, char** argv) {
366  cost_label1:
367    ...
368    some_function();
369  cost_label2:
370    ...
371}
372\end{lstlisting}
373\end{column}
374\begin{column}[b]{0.5\linewidth}
375\centering
376\begin{lstlisting}
377  main:
378    ...
379  cost_label1:
380    ...
381    LCALL some_function
382  cost_label2:
383    ...
384\end{lstlisting}
385\end{column}
386\end{columns}
387\begin{itemize}
388\item
389Question: where do we put cost labels to capture execution costs?
390\item
391Proof obligations complicated by panoply of labels
392\item
393Doesn't work well with \texttt{g(h() + 2 + f())}
394\item
395Is \texttt{cost\_label2} ever reached? \texttt{some\_function()} must terminate!
396\item
397\texttt{some\_function()} may not return correctly
398\end{itemize}
399\end{frame}
400
401\begin{frame}
402\frametitle{Who pays? II}
403\begin{itemize}
404\item
405Solution: omit \texttt{cost\_label2} and just keep \texttt{cost\_label1}
406\item
407We pay for everything `up front' when entering a function
408\item
409No need to prove \texttt{some\_function()} terminates
410\item
411But now execution of functions in CerCo takes a particular form
412\item
413Functions begin with a label, call other functions that begin with a label, eventually return, but \emph{return} to the correct place
414\item
415`Recursive structure'
416\end{itemize}
417\end{frame}
418
419\begin{frame}
420\frametitle{Recursive structure of `good' execution}
421\begin{center}
422\includegraphics[scale=0.33]{recursive_structure.png}
423\end{center}
424\end{frame}
425
426\begin{frame}
427\frametitle{Structured traces I}
428\begin{itemize}
429\item
430We introduced a notion of `structured traces'
431\item
432These are intended to statically capture the (good) execution traces of a program
433\item
434To borrow a slogan: they are the `computational content of a well-formed program's execution'
435\item
436Come in two variants: inductive and coinductive
437\item
438Inductive captures program execution traces that eventually halt, coinductive ones that diverge
439\end{itemize}
440\end{frame}
441
442\begin{frame}
443\frametitle{Structured traces II}
444\begin{itemize}
445\item
446I focus on the inductive variety, as used the most (for now) in the backend
447\item
448In particular, used in the proof that static and dynamic cost computations coincide
449\item
450Traces preserved by backend compilation, initially created at RTL
451\item
452This will be explained later
453\end{itemize}
454\end{frame}
455
456\begin{frame}
457\frametitle{Structured traces III}
458\begin{itemize}
459\item
460Central insight is that program execution is always in the body of some function (from \texttt{main} onwards)
461\item
462A well formed program must have labels appearing at certain spots
463\item
464Similarly, the final instruction executed when executing a function must be a \texttt{RET}
465\item
466Execution must then continue in body of calling function, at correct place
467\item
468These invariants, and others, are crystalised in the specific syntactic form of a structured trace
469\end{itemize}
470\end{frame}
471
472\begin{frame}
473\frametitle{Static and dynamic costs I}
474\begin{itemize}
475\item
476Given a structured trace, we can compute its associated cost
477\item
478This is the \emph{static} cost of a program execution
479\item
480Similarly, given a program counter and a code memory (corresponding to the trace), we can compute a \emph{dynamic cost} of a simple block
481\item
482Do this by repeatedly fetching, obtaining the next instruction, and a new program counter
483\item
484This requires some predicates defining what a `good program' and what a `good program counter' are
485\item
486Want program counters on instruction boundaries
487\end{itemize}
488\end{frame}
489
490\begin{frame}
491\frametitle{Static and dynamic costs II}
492\begin{itemize}
493\item
494We aim to prove that the dynamic and static cost calculations coincide
495\item
496This would imply that the static cost computation is correct
497\item
498This proof is surprisingly tricky to complete (about 3 man months of work so far)
499\item
500Is now about 95\% complete
501\end{itemize}
502\end{frame}
503
504\section{Changes to tools and prototypes, looking forward}
505
506\begin{frame}
507\frametitle{Changes ported to OCaml prototype}
508\begin{itemize}
509\item
510Bug fixes spotted in the formalisation so far have been merged back into the OCaml compiler
511\item
512Larger changes like the \texttt{Joint} machinery have so far not
513\item
514It is unclear whether they will be
515\item
516Just a generalisation of what is already there
517\item
518Supposed to make formalisation easier
519\item
520Further, we want to ensure that the untrusted compiler is as correct as possible, for experiments in e.g. Frama-C
521\item
522Porting a large change to the untrusted compiler would jeopardise these experiments
523\end{itemize}
524\end{frame}
525
526\begin{frame}
527\frametitle{Improvements in Matita}
528\begin{itemize}
529\item
530Part of the motivation for using Matita was for CerCo to act a `stress test'
531\item
532The proofs talked about in this talk have done this
533\item
534Many improvements to Matita have been made since the last project meeting
535\item
536These include major speed-ups of e.g. printing large goals, bug fixes, the porting of CerCo code to standard library, and more options for passing to tactics
537\end{itemize}
538\end{frame}
539
540\begin{frame}
541\frametitle{The next period}
542UNIBO has following pool of remaining manpower (postdocs):
543\begin{center}
544\begin{tabular}{ll}
545Person & Man months remaining \\
546\hline
547Boender & 10 months \\
548Mulligan & 6 months \\
549Tranquilli & 10 months \\
550\end{tabular}
551\end{center}
552\begin{itemize}
553\item
554Boender finishing assembler correctness proof
555\item
556Mulligan proofs of correctness for 1 intermediate language
557\item
558Tranquilli proofs of correctness for 2 intermediate languages
559\item
560Sacerdoti Coen `floating'
561\item
562Believe we have enough manpower to complete backend (required 21 man months)
563\end{itemize}
564\end{frame}
565
566\begin{frame}
567\frametitle{Summary}
568We have:
569\begin{itemize}
570\item
571Translated the OCaml prototype's backend intermediate languages into Matita
572\item
573Implemented the translations between languages, and given the intermediate languages a semantics
574\item
575Refactored many of the backend intermediate languages into a common, parametric `joint' language, that is later specialised
576\item
577Spotted opportunities for possibly commuting backend translation passes
578\item
579Used six months to define structured traces and start the proof of correctness for the assembler
580\item
581Distinguished our proof from CompCert's by heavy use of dependent types throughout whole compiler
582\end{itemize}
583\end{frame}
584
585\end{document}
Note: See TracBrowser for help on using the repository browser.