source: Deliverables/D1.2/Presentations/WP4-dominic.tex @ 1840

Last change on this file since 1840 was 1840, checked in by mulligan, 8 years ago

Changes to my presentation based on feedback from practice session yesterday. More changes to come.

File size: 17.8 KB
RevLine 
[1813]1\documentclass[serif]{beamer}
2
[1840]3\usepackage{amssymb}
[1813]4\usepackage[english]{babel}
5\usepackage{listings}
6\usepackage{microtype}
7
[1822]8\usetheme{Frankfurt}
9\logo{\includegraphics[height=1.0cm]{fetopen.png}}
10
[1813]11\author{Dominic Mulligan}
12\title{CerCo Work Package 4}
13\date{CerCo project review meeting\\March 2012}
14
15\lstdefinelanguage{matita-ocaml}
16  {keywords={definition,coercion,lemma,theorem,remark,inductive,record,qed,let,in,rec,match,return,with,Type,try},
17   morekeywords={[2]whd,normalize,elim,cases,destruct},
18   morekeywords={[3]type,of},
19   mathescape=true,
20  }
21
22\lstset{language=matita-ocaml,basicstyle=\scriptsize\tt,columns=flexible,breaklines=false,
23        keywordstyle=\color{red}\bfseries,
24        keywordstyle=[2]\color{blue},
25        keywordstyle=[3]\color{blue}\bfseries,
26        commentstyle=\color{green},
27        stringstyle=\color{blue},
28        showspaces=false,showstringspaces=false}
29
30\begin{document}
31
32\begin{frame}
33\maketitle
34\end{frame}
35
36\begin{frame}
[1840]37\frametitle{Summary}
[1813]38Relevant tasks: T4.2 and T4.3 (from the CerCo Contract):
39\begin{quotation}
40\textbf{Task 4.2}
41Functional encoding in the Calculus of Inductive Construction (indicative effort: UNIBO: 8; UDP: 2; UEDIN: 0)
42\end{quotation}
43
44\begin{quotation}
45\textbf{Task 4.3}
46Formal semantics of intermediate languages (indicative effort: UNIBO: 4; UDP: 0; UEDIN: 0)
47\end{quotation}
48\end{frame}
49
50\begin{frame}
[1840]51\frametitle{Contents}
52\tableofcontents
[1813]53\end{frame}
54
[1840]55\section{Rationalisation of backend languages}
[1813]56
57\begin{frame}
58\frametitle{Backend intermediate languages I}
59\begin{itemize}
60\item
[1840]61OCaml prototype has five backend intermediate languages: RTLabs, RTL, ERTL, LTL, LIN
[1813]62\item
63RTLabs is the `frontier' between backend and frontend, last abstract language
64\item
65RTLabs, RTL, ERTL and LTL are graph based languages: functions represented as graphs of statements, with entry and exit points
66\item
67LIN is a linearised form of LTL, and is the exit point of the compiler's backend
[1824]68\item
69In contrast to frontend, backend is very different to CompCert's
[1813]70\end{itemize}
71\end{frame}
72
73\begin{frame}
74\frametitle{Backend intermediate languages II}
75\begin{small}
76\begin{tabbing}
[1840]77\quad \=\,\quad \=\ \\
[1813]78\textsf{RTLabs}\\
[1840]79\> $\downarrow$ \> copy propagation ($\times$) \\
80\> $\downarrow$ \> instruction selection (\checkmark) \\
81\> $\downarrow$ \> change of memory models in compiler (\checkmark) \\
[1813]82\textsf{RTL}\\
[1840]83\> $\downarrow$ \> constant propagation ($\times$) \\
84\> $\downarrow$ \> calling convention made explicit (\checkmark) \\
85\> $\downarrow$ \> layout of activation records (\checkmark) \\
[1813]86\textsf{ERTL}\\
[1840]87\> $\downarrow$ \> register allocation and spilling (\checkmark) \\
88\> $\downarrow$ \> dead code elimination (\checkmark) \\
[1813]89\textsf{LTL}\\
[1840]90\> $\downarrow$ \> function linearisation (\checkmark) \\
91\> $\downarrow$ \> branch compression ($\times$) \\
[1813]92\textsf{LIN}\\
[1840]93\> $\downarrow$ \> relabeling (\checkmark) \\
94\textsf{ASM}
[1813]95\end{tabbing}
96\end{small}
97\end{frame}
98
99\begin{frame}
[1815]100\frametitle{\texttt{Joint}: a new approach I}
101\begin{itemize}
102\item
[1824]103Consecutive languages in backend must be similar
[1815]104\item
[1824]105Transformations between languages translate away some small specific set of features
[1815]106\item
[1840]107But looking at OCaml code, not clear precisely what differences between languages are, as code is repeated
[1815]108\item
[1824]109Not clear if translation passes can commute, for instance
[1815]110\item
[1824]111CerCo passes are in a different order to CompCert (calling convention and register allocation done in different places)
[1815]112\item
[1824]113Instruction selection done early: changing subset of instructions used would require instructions to be duplicated everywhere in backend
[1815]114\end{itemize}
[1813]115\end{frame}
116
117\begin{frame}
[1815]118\frametitle{\texttt{Joint}: a new approach II}
119\begin{itemize}
120\item
121Idea: all of these languages are just instances of a single language
122\item
123This language \texttt{Joint} is parameterised by a type of registers to be used in instructions, and so forth
124\item
125Each language after RTLabs is now just defined as the \texttt{Joint} language instantiated with some concrete types
126\item
127Similarly for semantics: common definitions that take e.g. type representing program counters as parameters
128\end{itemize}
[1813]129\end{frame}
130
[1815]131\begin{frame}[fragile]
132\frametitle{\texttt{Joint}: a new approach III}
133\texttt{Joint} instructions allow us to embed language-specific instructions:
134\begin{lstlisting}
135inductive joint_instruction (p: params__) (globals: list ident): Type[0] :=
136 | COMMENT: String $\rightarrow$ joint_instruction p globals
137 | COST_LABEL: costlabel $\rightarrow$ joint_instruction p globals
138 ...
[1816]139 | COND: acc_a_reg p $\rightarrow$ label $\rightarrow$ joint_instruction p globals
[1815]140 | extension: extend_statements p $\rightarrow$ joint_instruction p globals.
141\end{lstlisting}
142
143\begin{lstlisting}
144inductive ertl_statement_extension: Type[0] :=
145 | ertl_st_ext_new_frame: ertl_statement_extension
146 | ertl_st_ext_del_frame: ertl_statement_extension
147 | ertl_st_ext_frame_size: register $\rightarrow$ ertl_statement_extension.
148\end{lstlisting}
149\end{frame}
150
[1840]151\begin{frame}
[1815]152\frametitle{\texttt{Joint}: a new approach IV}
[1824]153\begin{itemize}
154\item
[1840]155Languages that provide extensions need to provide translations and semantics for those extensions
[1824]156\item
[1840]157Everything else can be handled at the \texttt{Joint}-level
158\item
159This modularises the handling of these languages
[1824]160\end{itemize}
161\end{frame}
162
[1813]163\begin{frame}
[1840]164\frametitle{\texttt{Joint}: advantages I}
[1815]165\begin{itemize}
166\item
[1840]167We can recover the concrete OCaml languages by instantiating parameterized types
[1815]168\item
[1840]169Why use \texttt{Joint}?
170\item
[1815]171Reduces repeated code (fewer bugs, or places to change)
172\item
173Unify some proofs, making correctness proof easier
[1840]174\end{itemize}
175\end{frame}
176
177\begin{frame}
178\frametitle{\texttt{Joint}: advantages II}
179\begin{itemize}
[1815]180\item
181Easier to add new intermediate languages as needed
182\item
183Easier to see relationship between consecutive languages at a glance
184\item
[1840]185MCS-51 instruction set embedded in \texttt{Joint} syntax
[1815]186\item
[1840]187Simplifies instruction selection
188\item
[1824]189We can investigate which translation passes commute much more easily
[1815]190\end{itemize}
191\end{frame}
192
[1813]193\begin{frame}
194\frametitle{Semantics of \texttt{Joint} I}
[1815]195\begin{itemize}
196\item
197As mentioned, use of \texttt{Joint} also unifies semantics of these languages
[1816]198\item
199We use several sets of records, which represent the state that a program is in
200\item
201These records are parametric in representations for e.g. frames
[1815]202\end{itemize}
[1813]203\end{frame}
204
205\begin{frame}
206\frametitle{A new intermediate language}
207\begin{itemize}
208\item
209Matita backend includes a new intermediate language: RTLntc
210\item
211Sits between RTL and ERTL
212\item
213RTLntc is the RTL language where all tailcalls have been eliminated
214\item
[1840]215This language is `implicit' in the OCaml compiler
[1813]216\item
217There, the RTL to ERTL transformation eliminates tailcalls as part of translation
218\item
219But including an extra, explicit intermediate language is `almost free' using the \texttt{Joint} language approach
220\end{itemize}
221\end{frame}
222
223\begin{frame}
224\frametitle{The LTL to LIN transform I}
225\begin{itemize}
226\item
[1824]227\texttt{Joint} clearly separates fetching from program execution
228\item
229We can vary how one works whilst fixing the other
230\item
231Linearisation is moving from fetching from a graph-based language to fetching from a list-based program representation
232\item
[1840]233The order of transformations in OCaml prototype is fixed
[1813]234\item
235Linearisation takes place at a fixed place, in the translation between LTL and LIN
236\item
237The Matita compiler is different: linearisation is a generic process
238\item
239Any graph-based language can now be linearised
240\end{itemize}
241\end{frame}
242
243\begin{frame}
244\frametitle{The LTL to LIN transform II}
245\begin{itemize}
246\item
[1840]247CompCert backend linearises much sooner than CerCo's
[1813]248\item
[1840]249Can now experiment with linearising much earlier
[1813]250\item
251Many transformations and optimisations can work fine on a linearised form
[1824]252\item
[1840]253Only place in the (current) backend that requires a graph-based language is in the ERTL pass, where we do a dataflow analysis
[1813]254\end{itemize}
255\end{frame}
256
[1840]257\section{Assembler correctness proof and structured traces}
258
[1813]259\begin{frame}
[1840]260\frametitle{Time not reported}
[1813]261\begin{itemize}
262\item
[1824]263We had six months of time which is not reported on in any deliverable
[1813]264\item
265We invested this time working on:
266\begin{itemize}
267\item
[1824]268The global proof sketch
269\item
270The setup of `proof infrastructure', common definitions, lemmas, invariants etc. required for main body of proof
271\item
[1813]272The proof of correctness for the assembler
273\item
274A notion of `structured traces', used throughout the compiler formalisation, as a means of eventually proving that the compiler correctly preserves costs
275\item
276Structured traces were defined in collaboration with the team at UEDIN
277\end{itemize}
278\end{itemize}
279\end{frame}
280
281\begin{frame}
282\frametitle{Assembler}
283\begin{itemize}
284\item
285After LIN, compiler spits out assembly language for MCS-51
286\item
287Assembler has pseudoinstructions similar to many commercial assembly languages
288\item
289For instance, instead of computed jumps (e.g. \texttt{SJMP} to a specific address), compiler can simply spit out a generic jump instruction to a label
290\item
291Simplifies the compiler, at the expense of introducing more proof obligations
[1840]292\item
293Now need a formalized assembler (a step further than CompCert)
[1813]294\end{itemize}
295\end{frame}
296
297\begin{frame}
298\frametitle{A problem: jump expansion}
299\begin{itemize}
300\item
[1824]301`Jump expansion' is our name for the standard `branch displacement' problem
[1813]302\item
303Given a pseudojump to a label $l$, how best can this be expanded into an assembly instruction \texttt{SJMP}, \texttt{AJMP} or \texttt{LJMP} to a concrete address?
304\item
305Problem also applies to conditional jumps
306\item
307Problem especially relevant for MCS-51 as it has a small code memory, therefore aggressive expansion of jumps into smallest possible concrete jump instruction needed
308\item
309But a known hard problem (NP-complete depending on architecture), and easy to imagine knotty configurations where size of jumps are interdependent
310\end{itemize}
311\end{frame}
312
313\begin{frame}
314\frametitle{Jump expansion I}
315\begin{itemize}
316\item
317We employed the following tactic: split the decision over how any particular pseudoinstruction is expanded from pseudoinstruction expansion
318\item
319Call the decision maker a `policy'
320\item
321We started the proof of correctness for the assembler based on the premise that a correct policy exists
322\item
323Further, we know that the assembler only fails to assemble a program if a good policy does not exist (a side-effect of using dependent types)
324\item
[1824]325A bad policy is a function that expands a given pseudojump into a concrete jump instruction that is `too small' for the distance to be jumped, or makes the program consume too much memory
[1813]326\end{itemize}
327\end{frame}
328
329\begin{frame}
330\frametitle{Jump expansion II}
331\begin{itemize}
332\item
[1840]333Jaap Boender at UNIBO has been working on a verified implementation of a good jump expansion policy for the MCS-51
[1813]334\item
335The strategy initially translates all pseudojumps as \texttt{SJMP} and then increases their size if necessary
336\item
337Termination of the procedure is proved, as well as a safety property, stating that jumps are not expanded into jumps that are too long
338\item
[1824]339His strategy is not optimal (though the computed solution is optimal for the strategy employed)
[1813]340\item
[1840]341Jaap's work is the first formal treatment of the `jump expansion problem'
[1813]342\end{itemize}
343\end{frame}
344
345\begin{frame}
346\frametitle{Assembler correctness proof}
347\begin{itemize}
348\item
[1824]349Assuming the existence of a good jump expansion property, we completed about 75\% of the correctness proof for the assembler
[1813]350\item
[1840]351Jaap's work has just been completed (modulo a few missing lemmas)
[1813]352\item
[1840]353Postponed the remainder of main assembler proof to start work on other tasks (and for Jaap to finish)
[1813]354\item
[1840]355We intend to return to proof, and publish an account of the work (possibly) as a journal paper
[1813]356\end{itemize}
357\end{frame}
358
[1829]359\begin{frame}[fragile]
360\frametitle{Who pays? I}
[1840]361\begin{columns}
362\begin{column}[b]{0.5\linewidth}
363\centering
[1829]364\begin{lstlisting}
365int main(int argc, char** argv) {
[1840]366  cost_label1:
[1829]367    ...
368    some_function();
[1840]369  cost_label2:
[1829]370    ...
371}
372\end{lstlisting}
[1840]373\end{column}
374\begin{column}[b]{0.5\linewidth}
375\centering
376\begin{lstlisting}
377  main:
378    ...
379  cost_label1:
380    ...
381    LCALL some_function
382  cost_label2:
383    ...
384\end{lstlisting}
385\end{column}
386\end{columns}
[1829]387\begin{itemize}
388\item
389Question: where do we put cost labels to capture execution costs?
390\item
391Proof obligations complicated by panoply of labels
392\item
393Doesn't work well with \texttt{g(h() + 2 + f())}
394\item
[1840]395Is \texttt{cost\_label2} ever reached? \texttt{some\_function()} must terminate!
[1829]396\item
397\texttt{some\_function()} may not return correctly
398\end{itemize}
399\end{frame}
400
[1813]401\begin{frame}
[1829]402\frametitle{Who pays? II}
403\begin{itemize}
404\item
[1840]405Solution: omit \texttt{cost\_label2} and just keep \texttt{cost\_label1}
[1829]406\item
407We pay for everything `up front' when entering a function
408\item
409No need to prove \texttt{some\_function()} terminates
410\item
411But now execution of functions in CerCo takes a particular form
412\item
413Functions begin with a label, call other functions that begin with a label, eventually return, but \emph{return} to the correct place
414\item
415`Recursive structure'
416\end{itemize}
417\end{frame}
418
419\begin{frame}
[1840]420\frametitle{Recursive structure of `good' execution}
421\begin{center}
422\includegraphics[scale=0.33]{recursive_structure.png}
423\end{center}
424\end{frame}
425
426\begin{frame}
[1813]427\frametitle{Structured traces I}
428\begin{itemize}
429\item
430We introduced a notion of `structured traces'
431\item
[1829]432These are intended to statically capture the (good) execution traces of a program
[1813]433\item
[1840]434To borrow a slogan: they are the `computational content of a well-formed program's execution'
[1829]435\item
[1813]436Come in two variants: inductive and coinductive
437\item
[1840]438Inductive captures program execution traces that eventually halt, coinductive ones that diverge
[1813]439\end{itemize}
440\end{frame}
441
442\begin{frame}
443\frametitle{Structured traces II}
444\begin{itemize}
445\item
[1840]446I focus on the inductive variety, as used the most (for now) in the backend
[1813]447\item
448In particular, used in the proof that static and dynamic cost computations coincide
449\item
[1840]450Traces preserved by backend compilation, initially created at RTL
451\item
[1813]452This will be explained later
453\end{itemize}
454\end{frame}
455
456\begin{frame}
457\frametitle{Structured traces III}
458\begin{itemize}
459\item
460Central insight is that program execution is always in the body of some function (from \texttt{main} onwards)
461\item
462A well formed program must have labels appearing at certain spots
463\item
[1829]464Similarly, the final instruction executed when executing a function must be a \texttt{RET}
[1813]465\item
[1829]466Execution must then continue in body of calling function, at correct place
467\item
[1813]468These invariants, and others, are crystalised in the specific syntactic form of a structured trace
469\end{itemize}
470\end{frame}
471
472\begin{frame}
473\frametitle{Static and dynamic costs I}
474\begin{itemize}
475\item
476Given a structured trace, we can compute its associated cost
477\item
478This is the \emph{static} cost of a program execution
479\item
480Similarly, given a program counter and a code memory (corresponding to the trace), we can compute a \emph{dynamic cost} of a simple block
481\item
482Do this by repeatedly fetching, obtaining the next instruction, and a new program counter
483\item
484This requires some predicates defining what a `good program' and what a `good program counter' are
[1840]485\item
486Want program counters on instruction boundaries
[1813]487\end{itemize}
488\end{frame}
489
490\begin{frame}
491\frametitle{Static and dynamic costs II}
492\begin{itemize}
493\item
494We aim to prove that the dynamic and static cost calculations coincide
495\item
496This would imply that the static cost computation is correct
497\item
498This proof is surprisingly tricky to complete (about 3 man months of work so far)
499\item
[1824]500Is now about 95\% complete
[1813]501\end{itemize}
502\end{frame}
503
[1840]504\section{Changes to tools and prototypes, looking forward}
505
[1813]506\begin{frame}
[1840]507\frametitle{Changes ported to OCaml prototype}
[1821]508\begin{itemize}
509\item
[1840]510Bug fixes spotted in the formalisation so far have been merged back into the OCaml compiler
[1821]511\item
512Larger changes like the \texttt{Joint} machinery have so far not
513\item
514It is unclear whether they will be
515\item
[1824]516Just a generalisation of what is already there
[1821]517\item
[1824]518Supposed to make formalisation easier
519\item
520Further, we want to ensure that the untrusted compiler is as correct as possible, for experiments in e.g. Frama-C
521\item
522Porting a large change to the untrusted compiler would jeopardise these experiments
[1821]523\end{itemize}
[1813]524\end{frame}
525
526\begin{frame}
527\frametitle{Improvements in Matita}
528\begin{itemize}
529\item
530Part of the motivation for using Matita was for CerCo to act a `stress test'
531\item
532The proofs talked about in this talk have done this
533\item
534Many improvements to Matita have been made since the last project meeting
535\item
536These include major speed-ups of e.g. printing large goals, bug fixes, the porting of CerCo code to standard library, and more options for passing to tactics
537\end{itemize}
538\end{frame}
539
540\begin{frame}
[1824]541\frametitle{The next period}
[1829]542UNIBO has following pool of remaining manpower (postdocs):
543\begin{center}
544\begin{tabular}{ll}
545Person & Man months remaining \\
546\hline
[1840]547Boender & 10 months \\
[1829]548Mulligan & 6 months \\
549Tranquilli & 10 months \\
550\end{tabular}
551\end{center}
552\begin{itemize}
553\item
554Boender finishing assembler correctness proof
555\item
556Mulligan proofs of correctness for 1 intermediate language
557\item
558Tranquilli proofs of correctness for 2 intermediate languages
559\item
560Sacerdoti Coen `floating'
561\item
[1840]562Believe we have enough manpower to complete backend (required 21 man months)
[1829]563\end{itemize}
[1824]564\end{frame}
565
566\begin{frame}
[1813]567\frametitle{Summary}
568We have:
569\begin{itemize}
570\item
[1840]571Translated the OCaml prototype's backend intermediate languages into Matita
[1813]572\item
573Implemented the translations between languages, and given the intermediate languages a semantics
574\item
575Refactored many of the backend intermediate languages into a common, parametric `joint' language, that is later specialised
576\item
[1824]577Spotted opportunities for possibly commuting backend translation passes
[1813]578\item
[1824]579Used six months to define structured traces and start the proof of correctness for the assembler
580\item
581Distinguished our proof from CompCert's by heavy use of dependent types throughout whole compiler
[1813]582\end{itemize}
583\end{frame}
584
585\end{document}
Note: See TracBrowser for help on using the repository browser.