Mar 15, 2012, 6:42:36 PM (8 years ago)

Large changes following comments by IS, JMc and CSC

1 edited


  • Deliverables/D1.2/Presentations/WP4-dominic.tex

    r1862 r1867  
     9\setbeamertemplate{navigation symbols}{}
    11 \author{Dominic Mulligan \\ Postdoc, University of Bologna}
    12 \title{CerCo Work Package 4}
     12\author{Dominic Mulligan \\ University of Bologna}
     13\title{CerCo Work Package 4 \\ Verified compiler --- Back-end}
    1314\date{CerCo project review meeting\\March 2012}
    45 \frametitle{Overview of progress}
    46 Status at end of first period:
    47 \begin{center}
    48 {\large Executable semantics of MCS-51 in OCaml and Matita}
    49 \end{center}
    51 ~\\~\\Goals for the end of second period:
    52 \begin{center}
    53 {\large Executable semantics of back-end intermediate languages\\~\\
    54 Encoding of compiler back-end in CIC}
    55 \end{center}
     46\frametitle{Achievements in period 2}
     49\item[T4.2] \alert{Completed}: Matita encoding of compiler back-end
     50\item[T4.3] \alert{Completed}: Executable semantics for intermediate languages
     51\item[T4.4] \alert{In progress}: Correctness proofs
     55Deliverables D4.2 and D4.3 submitted.
    63 \section{Rationalisation of backend languages}
    65 \begin{frame}
    66 \frametitle{Backend intermediate languages}
     63\section{Unification of back-end languages}
     66\frametitle{Back-end compilation stages}
    93 \frametitle{\texttt{Joint}: a new approach I}
    94 \begin{itemize}
    95 \item
    96 Consecutive languages in backend must be similar
    97 \item
    98 Transformations between languages translate away some small specific set of features
    99 \item
    100 But looking at OCaml code, not clear precisely what differences between languages are, as code is repeated
    101 \item
    102 Not clear if translation passes can commute, for instance
    103 \item
    104 CerCo passes are in a different order to CompCert (calling convention and register allocation done in different places)
    105 \item
    106 Instruction selection done early: changing subset of instructions used would require instructions to be duplicated everywhere in backend
    107 \end{itemize}
    108 \end{frame}
    110 \begin{frame}
    111 \frametitle{\texttt{Joint}: a new approach II}
    112 \begin{itemize}
    113 \item
    114 Idea: all of these languages are just instances of a single language
    115 \item
    116 This language \texttt{Joint} is parameterised by a type of registers to be used in instructions, and so forth
    117 \item
    118 Each language after RTLabs is now just defined as the \texttt{Joint} language instantiated with some concrete types
    119 \item
    120 Similarly for semantics: common definitions that take e.g. type representing program counters as parameters
    121 \end{itemize}
    122 \end{frame}
    124 \begin{frame}[fragile]
    125 \frametitle{\texttt{Joint}: a new approach III}
    126 \texttt{Joint} instructions allow us to embed language-specific instructions:
    127 \begin{lstlisting}
    128 inductive joint_instruction (p: params__) (globals: list ident): Type[0] :=
    129  | COMMENT: String $\rightarrow$ joint_instruction p globals
    130  | COST_LABEL: costlabel $\rightarrow$ joint_instruction p globals
    131  ...
    132  | COND: acc_a_reg p $\rightarrow$ label $\rightarrow$ joint_instruction p globals
    133  | extension: extend_statements p $\rightarrow$ joint_instruction p globals.
    134 \end{lstlisting}
    136 \begin{lstlisting}
    137 inductive ertl_statement_extension: Type[0] :=
    138  | ertl_st_ext_new_frame: ertl_statement_extension
    139  | ertl_st_ext_del_frame: ertl_statement_extension
    140  | ertl_st_ext_frame_size: register $\rightarrow$ ertl_statement_extension.
    141 \end{lstlisting}
    142 \end{frame}
    144 \begin{frame}
    145 \frametitle{\texttt{Joint}: a new approach IV}
    146 \begin{itemize}
    147 \item
    148 Languages that provide extensions need to provide translations and semantics for those extensions
    149 \item
    150 Everything else can be handled at the \texttt{Joint}-level
    151 \item
    152 This modularises the handling of these languages
    153 \end{itemize}
    154 \end{frame}
    156 \begin{frame}
    157 \frametitle{\texttt{Joint}: advantages I}
    158 \begin{itemize}
    159 \item
    160 We can recover the concrete OCaml languages by instantiating parameterized types
    161 \item
    162 Why use \texttt{Joint}?
    163 \item
    164 Reduces repeated code (fewer bugs, or places to change)
    165 \item
    166 Unify some proofs, making correctness proof easier
    167 \item
    168 Generic optimizations (e.g. constant propagation)
    169 \end{itemize}
    170 \end{frame}
    172 \begin{frame}
    173 \frametitle{\texttt{Joint}: advantages II}
    174 \begin{itemize}
    175 \item
    176 Easier to add new intermediate languages as needed
    177 \item
    178 Easier to see relationship between consecutive languages at a glance
    179 \item
    180 MCS-51 instruction set embedded in \texttt{Joint} syntax
    181 \item
    182 Simplifies instruction selection
    183 \item
    184 We can investigate which translation passes commute much more easily
    185 \end{itemize}
    186 \end{frame}
    188 \begin{frame}
    189 \frametitle{Semantics of \texttt{Joint} I}
    190 \begin{itemize}
    191 \item
    192 As mentioned, use of \texttt{Joint} also unifies semantics of these languages
    193 \item
    194 We use several sets of records, which represent the state that a program is in
    195 \item
    196 These records are parametric in representations for e.g. frames
    197 \end{itemize}
    198 \end{frame}
    200 \begin{frame}
    201 \frametitle{A new intermediate language}
    202 \begin{itemize}
    203 \item
    204 Matita backend includes a new intermediate language: RTLntc
    205 \item
    206 Sits between RTL and ERTL
    207 \item
    208 RTLntc is the RTL language where all tailcalls have been eliminated
    209 \item
    210 This language is `implicit' in the OCaml compiler
    211 \item
    212 There, the RTL to ERTL transformation eliminates tailcalls as part of translation
    213 \item
    214 But including an extra, explicit intermediate language is `almost free' using the \texttt{Joint} language approach
    215 \end{itemize}
    216 \end{frame}
    218 \begin{frame}
    219 \frametitle{The LTL to LIN transform I}
    220 \begin{itemize}
    221 \item
    222 \texttt{Joint} clearly separates fetching from program execution
    223 \item
    224 We can vary how one works whilst fixing the other
    225 \item
    226 Linearisation is moving from fetching from a graph-based language to fetching from a list-based program representation
    227 \item
    228 The order of transformations in OCaml prototype is fixed
    229 \item
    230 Linearisation takes place at a fixed place, in the translation between LTL and LIN
    231 \item
    232 The Matita compiler is different: linearisation is a generic process
    233 \item
    234 Any graph-based language can now be linearised
    235 \end{itemize}
    236 \end{frame}
    238 \begin{frame}
    239 \frametitle{The LTL to LIN transform II}
    240 \begin{itemize}
    241 \item
    242 CompCert backend linearises much sooner than CerCo's
    243 \item
    244 Can now experiment with linearising much earlier
    245 \item
    246 Many transformations and optimisations can work fine on a linearised form
    247 \item
    248 Only place in the (current) backend that requires a graph-based language is in the ERTL pass, where we do a dataflow analysis
    249 \end{itemize}
    250 \end{frame}
    252 \section{Optimizing assembler correctness proof}
    254 \begin{frame}
    255 \frametitle{Time not reported}
    256 \begin{itemize}
    257 \item
    258 We had six months of time which is not reported on in any deliverable
    259 \item
    260 We invested this time working on:
    261 \begin{itemize}
    262 \item
    263 The global proof sketch
    264 \item
    265 The setup of `proof infrastructure', common definitions, lemmas, invariants etc. required for main body of proof
    266 \item
    267 The proof of correctness for the assembler
    268 \item
    269 A notion of `structured traces', used throughout the compiler formalisation, as a means of eventually proving that the compiler correctly preserves costs
    270 \item
    271 Structured traces were defined in collaboration with the team at UEDIN
    272 \end{itemize}
    273 \end{itemize}
    274 \end{frame}
    276 \begin{frame}
    277 \frametitle{Assembler}
    278 \begin{itemize}
    279 \item
    280 After LIN, compiler spits out assembly language for MCS-51
    281 \item
    282 Assembler has pseudoinstructions similar to many commercial assembly languages
    283 \item
    284 For instance, instead of computed jumps (e.g. \texttt{SJMP} to a specific address), compiler can simply spit out a generic jump instruction to a label
    285 \item
    286 Simplifies the compiler, at the expense of introducing more proof obligations
    287 \item
    288 Now need a formalized assembler (a step further than CompCert)
    289 \end{itemize}
    290 \end{frame}
    292 \begin{frame}
    293 \frametitle{A problem: jump expansion}
    294 \begin{itemize}
    295 \item
    296 `Jump expansion' is our name for the standard `branch displacement' problem
    297 \item
    298 Given a pseudojump to a label $l$, how best can this be expanded into an assembly instruction \texttt{SJMP}, \texttt{AJMP} or \texttt{LJMP} to a concrete address?
    299 \item
    300 Problem also applies to conditional jumps
    301 \item
    302 Problem especially relevant for MCS-51 as it has a small code memory, therefore aggressive expansion of jumps into smallest possible concrete jump instruction needed
    303 \item
    304 But a known hard problem (NP-complete depending on architecture), and easy to imagine knotty configurations where size of jumps are interdependent
    305 \end{itemize}
    306 \end{frame}
    308 \begin{frame}
    309 \frametitle{Jump expansion I}
    310 \begin{itemize}
    311 \item
    312 We employed the following tactic: split the decision over how any particular pseudoinstruction is expanded from pseudoinstruction expansion
    313 \item
    314 Call the decision maker a `policy'
    315 \item
    316 We started the proof of correctness for the assembler based on the premise that a correct policy exists
    317 \item
    318 Further, we know that the assembler only fails to assemble a program if a good policy does not exist (a side-effect of using dependent types)
    319 \item
    320 A bad policy is a function that expands a given pseudojump into a concrete jump instruction that is `too small' for the distance to be jumped, or makes the program consume too much memory
    321 \end{itemize}
    322 \end{frame}
    324 \begin{frame}
    325 \frametitle{Jump expansion II}
    326 \begin{itemize}
    327 \item
    328 Jaap Boender at UNIBO has been working on a verified implementation of a good jump expansion policy for the MCS-51
    329 \item
    330 The strategy initially translates all pseudojumps as \texttt{SJMP} and then increases their size if necessary
    331 \item
    332 Termination of the procedure is proved, as well as a safety property, stating that jumps are not expanded into jumps that are too long
    333 \item
    334 His strategy is not optimal (though the computed solution is optimal for the strategy employed)
    335 \item
    336 Jaap's work is the first formal treatment of the `jump expansion problem'
    337 \end{itemize}
    338 \end{frame}
    340 \begin{frame}
    341 \frametitle{Assembler correctness proof}
    342 \begin{itemize}
    343 \item
    344 Assuming the existence of a good jump expansion property, we completed about 75\% of the correctness proof for the assembler
    345 \item
    346 Jaap's work has just been completed (modulo a few missing lemmas)
    347 \item
    348 Postponed the remainder of main assembler proof to start work on other tasks (and for Jaap to finish)
    349 \item
    350 We intend to return to proof, and publish an account of the work (possibly) as a journal paper
     93\frametitle{\texttt{Joint}: unifying intermediate languages}
     94In OCaml prototype each intermediate language is defined separately, but:
     98Back-end languages are all closely related
     100Compiler stages make small changes
     102Some stages may be repeated or exchanged
     105\texttt{Joint} is a parameterized language which streamlines the back-end
     108Each back-end language is an instance of \texttt{Joint}
     110Definitions, functions and proofs are shared
     112Each \texttt{Joint} instance can add custom instructions
     117\frametitle{\texttt{Joint}: benefits}
     120Reduces repeated code
     122Optimizations generalize to multiple languages
     124Unifies proofs, making correctness proof easier
     126Allows a more flexible approach to compiler translation order
     130New intermediate language \structure{RTLntc} \alert{removes tailcalls}
     132With \texttt{Joint}, incorporating RTLntc is cheap
     136The prototype, and CompCert, fix where linearisation occurs
     138The Matita compiler is different: any graph-based language can be linearised
     142\section{Correctness proof for optimizing assembler}
     145\frametitle{Assembler correctness}
     148We compute costs directly on object code
     150So, unlike CompCert, we formalise the assembler and machine code (75\% complete)
     154Forces us to address a hard problem: branch displacement
     156Solution: isolate \alert{expansion policy} from generic correctness proof
     158Separately (Boender): well-defined and verified expansion policies (99\% complete)
     160Jaap's work is the first machine formalised treatment of branch displacement
    421 \frametitle{Structured traces I}
     231\frametitle{Structured traces}
    424234We introduced a notion of `structured traces'
    426 These are intended to statically capture the (good) execution traces of a program
    427 \item
    428 To borrow a slogan: they are the `computational content of a well-formed program's execution'
    429 \item
    430 Come in two variants: inductive and coinductive
    431 \item
    432 Inductive captures program execution traces that eventually halt, coinductive ones that diverge
    433 \end{itemize}
    434 \end{frame}
    436 \begin{frame}
    437 \frametitle{Structured traces II}
    438 \begin{itemize}
    439 \item
    440 I focus on the inductive variety, as used the most (for now) in the backend
    441 \item
    442 In particular, used in the proof that static and dynamic cost computations coincide
    443 \item
    444 Traces preserved by backend compilation, initially created at RTLabs
    445 \item
    446 This will be explained later
    447 \end{itemize}
    448 \end{frame}
    450 \begin{frame}
    451 \frametitle{Structured traces III}
    452 \begin{itemize}
    453 \item
    454 Central insight is that program execution is always in the body of some function (from \texttt{main} onwards)
    455 \item
    456 A well formed program must have labels appearing at certain spots
    457 \item
    458 Similarly, the final instruction executed when executing a function must be a \texttt{RET}
    459 \item
    460 Execution must then continue in body of calling function, at correct place
    461 \item
    462 These invariants, and others, are crystalised in the specific syntactic form of a structured trace
    463 \end{itemize}
    464 \end{frame}
    466 \begin{frame}
    467 \frametitle{Recursive structure of `good' execution I}
     236These are needed to statically capture the (good) execution traces of a program
     238They are the `computational content of the proof of well-labelledness'
     240Translation passes \alert{must} preserve structure of traces, not just their flattening
     245\frametitle{Recursive structure of `good' execution}
    480 \frametitle{Static and dynamic costs}
     258\frametitle{Static and dynamic costs I}
    484     & \texttt{emit(l1)} \\
     262    & \texttt{emit(L1)} \\
    485263    & \texttt{MOV r1 0}\\
    486264    & \texttt{ADD r1 r2}\\
    487     & \texttt{CALL f} \\
     265    & \alert{\texttt{CALL f}} \\
    488266    & \texttt{ADD r2 r2}\\
    489267    & \texttt{MOV r2 0}\\
    493 \makebox[0pt][l]{k($l_1$) = k(\texttt{MOV}) + k (\texttt{ADD}) + \ldots + k(\texttt{RET})}\\
    494 static-cost(trace) = k($l_1$) + \ldots + k($l_4$)\\
     271\makebox[0pt][l]{k($L_1$) = k(\texttt{MOV}) + k (\texttt{ADD}) + \ldots + k(\texttt{RET})}\\
     272static-cost(trace) = k($L_1$) + \ldots + k($L_4$)\\
    495273dynamic-cost(trace) = \texttt{clock}(Final$_1$) - \texttt{clock}(Start$_1$)\\
    496274\alert{Theorem: static-cost(trace) = dynamic-cost(trace)}
    517 \section{Changes to tools and prototypes, looking forward}
     295\section{Feedback to Matita and the OCaml prototype}
    523 Bug fixes spotted in the formalisation so far have been merged back into the OCaml compiler
     301Bug fixes spotted in the formalisation so far have been merged back into the OCaml prototype compiler
    525303Larger changes like the \texttt{Joint} machinery have so far not
    527 It is unclear whether they will be
    528 \item
    529 Just a generalisation of what is already there
    530 \item
    531 Supposed to make formalisation easier
     305Unclear whether they will be: their real value is in streamlining formalisation
    533307Further, we want to ensure that the untrusted compiler is as correct as possible, for experiments in e.g. Frama-C
    573347Permanents `floating'
    575 Believe we have enough manpower to complete backend (required 21 man months)
     349Believe we have enough manpower to complete back-end (required 21 man months)
    584 Translated the OCaml prototype's backend intermediate languages into Matita
     358Translated the OCaml prototype's back-end intermediate languages into Matita
    586360Implemented the translations between languages, and given the intermediate languages a semantics
    588 Refactored many of the backend intermediate languages into a common, parametric `joint' language, that is later specialised
    589 \item
    590 Spotted opportunities for possibly commuting backend translation passes
     362Refactored many of the back-end intermediate languages into a common, parametric `joint' language, that is later specialised
     364Spotted opportunities for possibly commuting back-end translation passes
    592366Used six months to define structured traces and start the proof of correctness for the assembler
Note: See TracChangeset for help on using the changeset viewer.