# Changeset 1867

Ignore:
Timestamp:
Mar 15, 2012, 6:42:36 PM (8 years ago)
Message:

Large changes following comments by IS, JMc and CSC

Location:
Deliverables/D1.2/Presentations
Files:
3 edited

### Legend:

Unmodified
 r1862 \usetheme{Frankfurt} \setbeamertemplate{navigation symbols}{} \logo{\includegraphics[height=1.0cm]{fetopen.png}} \author{Dominic Mulligan \\ Postdoc, University of Bologna} \title{CerCo Work Package 4} \author{Dominic Mulligan \\ University of Bologna} \title{CerCo Work Package 4 \\ Verified compiler --- Back-end} \date{CerCo project review meeting\\March 2012} \begin{frame} \frametitle{Overview of progress} Status at end of first period: \begin{center} {\large Executable semantics of MCS-51 in OCaml and Matita} \end{center} ~\\~\\Goals for the end of second period: \begin{center} {\large Executable semantics of back-end intermediate languages\\~\\ Encoding of compiler back-end in CIC} \end{center} \frametitle{Achievements in period 2} \begin{description}[T4.2] \item[T4.2] \alert{Completed}: Matita encoding of compiler back-end \item[T4.3] \alert{Completed}: Executable semantics for intermediate languages \item[T4.4] \alert{In progress}: Correctness proofs \end{description} \bigskip Deliverables D4.2 and D4.3 submitted. \end{frame} \end{frame} \section{Rationalisation of backend languages} \begin{frame} \frametitle{Backend intermediate languages} \section{Unification of back-end languages} \begin{frame} \frametitle{Back-end compilation stages} \vspace{-1em} \begin{small} \begin{frame} \frametitle{\texttt{Joint}: a new approach I} \begin{itemize} \item Consecutive languages in backend must be similar \item Transformations between languages translate away some small specific set of features \item But looking at OCaml code, not clear precisely what differences between languages are, as code is repeated \item Not clear if translation passes can commute, for instance \item CerCo passes are in a different order to CompCert (calling convention and register allocation done in different places) \item Instruction selection done early: changing subset of instructions used would require instructions to be duplicated everywhere in backend \end{itemize} \end{frame} \begin{frame} \frametitle{\texttt{Joint}: a new approach II} \begin{itemize} \item Idea: all of these languages are just instances of a single language \item This language \texttt{Joint} is parameterised by a type of registers to be used in instructions, and so forth \item Each language after RTLabs is now just defined as the \texttt{Joint} language instantiated with some concrete types \item Similarly for semantics: common definitions that take e.g. type representing program counters as parameters \end{itemize} \end{frame} \begin{frame}[fragile] \frametitle{\texttt{Joint}: a new approach III} \texttt{Joint} instructions allow us to embed language-specific instructions: \begin{lstlisting} inductive joint_instruction (p: params__) (globals: list ident): Type[0] := | COMMENT: String $\rightarrow$ joint_instruction p globals | COST_LABEL: costlabel $\rightarrow$ joint_instruction p globals ... | COND: acc_a_reg p $\rightarrow$ label $\rightarrow$ joint_instruction p globals | extension: extend_statements p $\rightarrow$ joint_instruction p globals. \end{lstlisting} \begin{lstlisting} inductive ertl_statement_extension: Type[0] := | ertl_st_ext_new_frame: ertl_statement_extension | ertl_st_ext_del_frame: ertl_statement_extension | ertl_st_ext_frame_size: register $\rightarrow$ ertl_statement_extension. \end{lstlisting} \end{frame} \begin{frame} \frametitle{\texttt{Joint}: a new approach IV} \begin{itemize} \item Languages that provide extensions need to provide translations and semantics for those extensions \item Everything else can be handled at the \texttt{Joint}-level \item This modularises the handling of these languages \end{itemize} \end{frame} \begin{frame} \frametitle{\texttt{Joint}: advantages I} \begin{itemize} \item We can recover the concrete OCaml languages by instantiating parameterized types \item Why use \texttt{Joint}? \item Reduces repeated code (fewer bugs, or places to change) \item Unify some proofs, making correctness proof easier \item Generic optimizations (e.g. constant propagation) \end{itemize} \end{frame} \begin{frame} \frametitle{\texttt{Joint}: advantages II} \begin{itemize} \item Easier to add new intermediate languages as needed \item Easier to see relationship between consecutive languages at a glance \item MCS-51 instruction set embedded in \texttt{Joint} syntax \item Simplifies instruction selection \item We can investigate which translation passes commute much more easily \end{itemize} \end{frame} \begin{frame} \frametitle{Semantics of \texttt{Joint} I} \begin{itemize} \item As mentioned, use of \texttt{Joint} also unifies semantics of these languages \item We use several sets of records, which represent the state that a program is in \item These records are parametric in representations for e.g. frames \end{itemize} \end{frame} \begin{frame} \frametitle{A new intermediate language} \begin{itemize} \item Matita backend includes a new intermediate language: RTLntc \item Sits between RTL and ERTL \item RTLntc is the RTL language where all tailcalls have been eliminated \item This language is implicit' in the OCaml compiler \item There, the RTL to ERTL transformation eliminates tailcalls as part of translation \item But including an extra, explicit intermediate language is almost free' using the \texttt{Joint} language approach \end{itemize} \end{frame} \begin{frame} \frametitle{The LTL to LIN transform I} \begin{itemize} \item \texttt{Joint} clearly separates fetching from program execution \item We can vary how one works whilst fixing the other \item Linearisation is moving from fetching from a graph-based language to fetching from a list-based program representation \item The order of transformations in OCaml prototype is fixed \item Linearisation takes place at a fixed place, in the translation between LTL and LIN \item The Matita compiler is different: linearisation is a generic process \item Any graph-based language can now be linearised \end{itemize} \end{frame} \begin{frame} \frametitle{The LTL to LIN transform II} \begin{itemize} \item CompCert backend linearises much sooner than CerCo's \item Can now experiment with linearising much earlier \item Many transformations and optimisations can work fine on a linearised form \item Only place in the (current) backend that requires a graph-based language is in the ERTL pass, where we do a dataflow analysis \end{itemize} \end{frame} \section{Optimizing assembler correctness proof} \begin{frame} \frametitle{Time not reported} \begin{itemize} \item We had six months of time which is not reported on in any deliverable \item We invested this time working on: \begin{itemize} \item The global proof sketch \item The setup of proof infrastructure', common definitions, lemmas, invariants etc. required for main body of proof \item The proof of correctness for the assembler \item A notion of structured traces', used throughout the compiler formalisation, as a means of eventually proving that the compiler correctly preserves costs \item Structured traces were defined in collaboration with the team at UEDIN \end{itemize} \end{itemize} \end{frame} \begin{frame} \frametitle{Assembler} \begin{itemize} \item After LIN, compiler spits out assembly language for MCS-51 \item Assembler has pseudoinstructions similar to many commercial assembly languages \item For instance, instead of computed jumps (e.g. \texttt{SJMP} to a specific address), compiler can simply spit out a generic jump instruction to a label \item Simplifies the compiler, at the expense of introducing more proof obligations \item Now need a formalized assembler (a step further than CompCert) \end{itemize} \end{frame} \begin{frame} \frametitle{A problem: jump expansion} \begin{itemize} \item Jump expansion' is our name for the standard branch displacement' problem \item Given a pseudojump to a label $l$, how best can this be expanded into an assembly instruction \texttt{SJMP}, \texttt{AJMP} or \texttt{LJMP} to a concrete address? \item Problem also applies to conditional jumps \item Problem especially relevant for MCS-51 as it has a small code memory, therefore aggressive expansion of jumps into smallest possible concrete jump instruction needed \item But a known hard problem (NP-complete depending on architecture), and easy to imagine knotty configurations where size of jumps are interdependent \end{itemize} \end{frame} \begin{frame} \frametitle{Jump expansion I} \begin{itemize} \item We employed the following tactic: split the decision over how any particular pseudoinstruction is expanded from pseudoinstruction expansion \item Call the decision maker a policy' \item We started the proof of correctness for the assembler based on the premise that a correct policy exists \item Further, we know that the assembler only fails to assemble a program if a good policy does not exist (a side-effect of using dependent types) \item A bad policy is a function that expands a given pseudojump into a concrete jump instruction that is too small' for the distance to be jumped, or makes the program consume too much memory \end{itemize} \end{frame} \begin{frame} \frametitle{Jump expansion II} \begin{itemize} \item Jaap Boender at UNIBO has been working on a verified implementation of a good jump expansion policy for the MCS-51 \item The strategy initially translates all pseudojumps as \texttt{SJMP} and then increases their size if necessary \item Termination of the procedure is proved, as well as a safety property, stating that jumps are not expanded into jumps that are too long \item His strategy is not optimal (though the computed solution is optimal for the strategy employed) \item Jaap's work is the first formal treatment of the jump expansion problem' \end{itemize} \end{frame} \begin{frame} \frametitle{Assembler correctness proof} \begin{itemize} \item Assuming the existence of a good jump expansion property, we completed about 75\% of the correctness proof for the assembler \item Jaap's work has just been completed (modulo a few missing lemmas) \item Postponed the remainder of main assembler proof to start work on other tasks (and for Jaap to finish) \item We intend to return to proof, and publish an account of the work (possibly) as a journal paper \frametitle{\texttt{Joint}: unifying intermediate languages} In OCaml prototype each intermediate language is defined separately, but: \medskip \begin{itemize} \item Back-end languages are all closely related \item Compiler stages make small changes \item Some stages may be repeated or exchanged \end{itemize} \bigskip \texttt{Joint} is a parameterized language which streamlines the back-end \begin{itemize} \item Each back-end language is an instance of \texttt{Joint} \item Definitions, functions and proofs are shared \item Each \texttt{Joint} instance can add custom instructions \end{itemize} \end{frame} \begin{frame} \frametitle{\texttt{Joint}: benefits} \begin{itemize} \item Reduces repeated code \item Optimizations generalize to multiple languages \item Unifies proofs, making correctness proof easier \item Allows a more flexible approach to compiler translation order \end{itemize} \begin{itemize} \item New intermediate language \structure{RTLntc} \alert{removes tailcalls} \item With \texttt{Joint}, incorporating RTLntc is cheap \end{itemize} \begin{itemize} \item The prototype, and CompCert, fix where linearisation occurs \item The Matita compiler is different: any graph-based language can be linearised \end{itemize} \end{frame} \section{Correctness proof for optimizing assembler} \begin{frame} \frametitle{Assembler correctness} \begin{itemize} \item We compute costs directly on object code \item So, unlike CompCert, we formalise the assembler and machine code (75\% complete) \end{itemize} \begin{itemize} \item Forces us to address a hard problem: branch displacement \item Solution: isolate \alert{expansion policy} from generic correctness proof \item Separately (Boender): well-defined and verified expansion policies (99\% complete) \item Jaap's work is the first machine formalised treatment of branch displacement \end{itemize} \end{frame} \begin{frame} \frametitle{Structured traces I} \frametitle{Structured traces} \begin{itemize} \item We introduced a notion of structured traces' \item These are intended to statically capture the (good) execution traces of a program \item To borrow a slogan: they are the computational content of a well-formed program's execution' \item Come in two variants: inductive and coinductive \item Inductive captures program execution traces that eventually halt, coinductive ones that diverge \end{itemize} \end{frame} \begin{frame} \frametitle{Structured traces II} \begin{itemize} \item I focus on the inductive variety, as used the most (for now) in the backend \item In particular, used in the proof that static and dynamic cost computations coincide \item Traces preserved by backend compilation, initially created at RTLabs \item This will be explained later \end{itemize} \end{frame} \begin{frame} \frametitle{Structured traces III} \begin{itemize} \item Central insight is that program execution is always in the body of some function (from \texttt{main} onwards) \item A well formed program must have labels appearing at certain spots \item Similarly, the final instruction executed when executing a function must be a \texttt{RET} \item Execution must then continue in body of calling function, at correct place \item These invariants, and others, are crystalised in the specific syntactic form of a structured trace \end{itemize} \end{frame} \begin{frame} \frametitle{Recursive structure of good' execution I} These are needed to statically capture the (good) execution traces of a program \item They are the computational content of the proof of well-labelledness' \item Translation passes \alert{must} preserve structure of traces, not just their flattening \end{itemize} \end{frame} \begin{frame} \frametitle{Recursive structure of good' execution} \begin{itemize} \item \begin{frame} \frametitle{Static and dynamic costs} \frametitle{Static and dynamic costs I} \begin{center} \includegraphics[scale=0.33]{recursive_structure.png} \begin{tabular}[b]{ll} & \texttt{emit(l1)} \\ & \texttt{emit(L1)} \\ & \texttt{MOV r1 0}\\ & \texttt{ADD r1 r2}\\ & \texttt{CALL f} \\ & \alert{\texttt{CALL f}} \\ & \texttt{ADD r2 r2}\\ & \texttt{MOV r2 0}\\ \end{tabular} \end{center} \makebox[0pt][l]{k($l_1$) = k(\texttt{MOV}) + k (\texttt{ADD}) + \ldots + k(\texttt{RET})}\\ static-cost(trace) = k($l_1$) + \ldots + k($l_4$)\\ \makebox[0pt][l]{k($L_1$) = k(\texttt{MOV}) + k (\texttt{ADD}) + \ldots + k(\texttt{RET})}\\ static-cost(trace) = k($L_1$) + \ldots + k($L_4$)\\ dynamic-cost(trace) = \texttt{clock}(Final$_1$) - \texttt{clock}(Start$_1$)\\ \alert{Theorem: static-cost(trace) = dynamic-cost(trace)} \end{frame} \section{Changes to tools and prototypes, looking forward} \section{Feedback to Matita and the OCaml prototype} \begin{frame} \begin{itemize} \item Bug fixes spotted in the formalisation so far have been merged back into the OCaml compiler Bug fixes spotted in the formalisation so far have been merged back into the OCaml prototype compiler \item Larger changes like the \texttt{Joint} machinery have so far not \item It is unclear whether they will be \item Just a generalisation of what is already there \item Supposed to make formalisation easier Unclear whether they will be: their real value is in streamlining formalisation \item Further, we want to ensure that the untrusted compiler is as correct as possible, for experiments in e.g. Frama-C Permanents floating' \item Believe we have enough manpower to complete backend (required 21 man months) Believe we have enough manpower to complete back-end (required 21 man months) \end{itemize} \end{frame} \begin{itemize} \item Translated the OCaml prototype's backend intermediate languages into Matita Translated the OCaml prototype's back-end intermediate languages into Matita \item Implemented the translations between languages, and given the intermediate languages a semantics \item Refactored many of the backend intermediate languages into a common, parametric joint' language, that is later specialised \item Spotted opportunities for possibly commuting backend translation passes Refactored many of the back-end intermediate languages into a common, parametric `joint' language, that is later specialised \item Spotted opportunities for possibly commuting back-end translation passes \item Used six months to define structured traces and start the proof of correctness for the assembler