Changeset 1867 for Deliverables/D1.2/Presentations
- Timestamp:
- Mar 15, 2012, 6:42:36 PM (9 years ago)
- Location:
- Deliverables/D1.2/Presentations
- Files:
-
- 3 edited
Legend:
- Unmodified
- Added
- Removed
-
Deliverables/D1.2/Presentations/WP4-dominic.tex
r1862 r1867 7 7 8 8 \usetheme{Frankfurt} 9 \setbeamertemplate{navigation symbols}{} 9 10 \logo{\includegraphics[height=1.0cm]{fetopen.png}} 10 11 11 \author{Dominic Mulligan \\ Postdoc,University of Bologna}12 \title{CerCo Work Package 4 }12 \author{Dominic Mulligan \\ University of Bologna} 13 \title{CerCo Work Package 4 \\ Verified compiler --- Back-end} 13 14 \date{CerCo project review meeting\\March 2012} 14 15 … … 43 44 44 45 \begin{frame} 45 \frametitle{Overview of progress} 46 Status at end of first period: 47 \begin{center} 48 {\large Executable semantics of MCS-51 in OCaml and Matita} 49 \end{center} 50 51 ~\\~\\Goals for the end of second period: 52 \begin{center} 53 {\large Executable semantics of back-end intermediate languages\\~\\ 54 Encoding of compiler back-end in CIC} 55 \end{center} 46 \frametitle{Achievements in period 2} 47 48 \begin{description}[T4.2] 49 \item[T4.2] \alert{Completed}: Matita encoding of compiler back-end 50 \item[T4.3] \alert{Completed}: Executable semantics for intermediate languages 51 \item[T4.4] \alert{In progress}: Correctness proofs 52 \end{description} 53 54 \bigskip 55 Deliverables D4.2 and D4.3 submitted. 56 56 \end{frame} 57 57 … … 61 61 \end{frame} 62 62 63 \section{ Rationalisation of backend languages}64 65 \begin{frame} 66 \frametitle{Back end intermediate languages}63 \section{Unification of back-end languages} 64 65 \begin{frame} 66 \frametitle{Back-end compilation stages} 67 67 \vspace{-1em} 68 68 \begin{small} … … 91 91 92 92 \begin{frame} 93 \frametitle{\texttt{Joint}: a new approach I} 94 \begin{itemize} 95 \item 96 Consecutive languages in backend must be similar 97 \item 98 Transformations between languages translate away some small specific set of features 99 \item 100 But looking at OCaml code, not clear precisely what differences between languages are, as code is repeated 101 \item 102 Not clear if translation passes can commute, for instance 103 \item 104 CerCo passes are in a different order to CompCert (calling convention and register allocation done in different places) 105 \item 106 Instruction selection done early: changing subset of instructions used would require instructions to be duplicated everywhere in backend 107 \end{itemize} 108 \end{frame} 109 110 \begin{frame} 111 \frametitle{\texttt{Joint}: a new approach II} 112 \begin{itemize} 113 \item 114 Idea: all of these languages are just instances of a single language 115 \item 116 This language \texttt{Joint} is parameterised by a type of registers to be used in instructions, and so forth 117 \item 118 Each language after RTLabs is now just defined as the \texttt{Joint} language instantiated with some concrete types 119 \item 120 Similarly for semantics: common definitions that take e.g. type representing program counters as parameters 121 \end{itemize} 122 \end{frame} 123 124 \begin{frame}[fragile] 125 \frametitle{\texttt{Joint}: a new approach III} 126 \texttt{Joint} instructions allow us to embed language-specific instructions: 127 \begin{lstlisting} 128 inductive joint_instruction (p: params__) (globals: list ident): Type[0] := 129 | COMMENT: String $\rightarrow$ joint_instruction p globals 130 | COST_LABEL: costlabel $\rightarrow$ joint_instruction p globals 131 ... 132 | COND: acc_a_reg p $\rightarrow$ label $\rightarrow$ joint_instruction p globals 133 | extension: extend_statements p $\rightarrow$ joint_instruction p globals. 134 \end{lstlisting} 135 136 \begin{lstlisting} 137 inductive ertl_statement_extension: Type[0] := 138 | ertl_st_ext_new_frame: ertl_statement_extension 139 | ertl_st_ext_del_frame: ertl_statement_extension 140 | ertl_st_ext_frame_size: register $\rightarrow$ ertl_statement_extension. 141 \end{lstlisting} 142 \end{frame} 143 144 \begin{frame} 145 \frametitle{\texttt{Joint}: a new approach IV} 146 \begin{itemize} 147 \item 148 Languages that provide extensions need to provide translations and semantics for those extensions 149 \item 150 Everything else can be handled at the \texttt{Joint}-level 151 \item 152 This modularises the handling of these languages 153 \end{itemize} 154 \end{frame} 155 156 \begin{frame} 157 \frametitle{\texttt{Joint}: advantages I} 158 \begin{itemize} 159 \item 160 We can recover the concrete OCaml languages by instantiating parameterized types 161 \item 162 Why use \texttt{Joint}? 163 \item 164 Reduces repeated code (fewer bugs, or places to change) 165 \item 166 Unify some proofs, making correctness proof easier 167 \item 168 Generic optimizations (e.g. constant propagation) 169 \end{itemize} 170 \end{frame} 171 172 \begin{frame} 173 \frametitle{\texttt{Joint}: advantages II} 174 \begin{itemize} 175 \item 176 Easier to add new intermediate languages as needed 177 \item 178 Easier to see relationship between consecutive languages at a glance 179 \item 180 MCS-51 instruction set embedded in \texttt{Joint} syntax 181 \item 182 Simplifies instruction selection 183 \item 184 We can investigate which translation passes commute much more easily 185 \end{itemize} 186 \end{frame} 187 188 \begin{frame} 189 \frametitle{Semantics of \texttt{Joint} I} 190 \begin{itemize} 191 \item 192 As mentioned, use of \texttt{Joint} also unifies semantics of these languages 193 \item 194 We use several sets of records, which represent the state that a program is in 195 \item 196 These records are parametric in representations for e.g. frames 197 \end{itemize} 198 \end{frame} 199 200 \begin{frame} 201 \frametitle{A new intermediate language} 202 \begin{itemize} 203 \item 204 Matita backend includes a new intermediate language: RTLntc 205 \item 206 Sits between RTL and ERTL 207 \item 208 RTLntc is the RTL language where all tailcalls have been eliminated 209 \item 210 This language is `implicit' in the OCaml compiler 211 \item 212 There, the RTL to ERTL transformation eliminates tailcalls as part of translation 213 \item 214 But including an extra, explicit intermediate language is `almost free' using the \texttt{Joint} language approach 215 \end{itemize} 216 \end{frame} 217 218 \begin{frame} 219 \frametitle{The LTL to LIN transform I} 220 \begin{itemize} 221 \item 222 \texttt{Joint} clearly separates fetching from program execution 223 \item 224 We can vary how one works whilst fixing the other 225 \item 226 Linearisation is moving from fetching from a graph-based language to fetching from a list-based program representation 227 \item 228 The order of transformations in OCaml prototype is fixed 229 \item 230 Linearisation takes place at a fixed place, in the translation between LTL and LIN 231 \item 232 The Matita compiler is different: linearisation is a generic process 233 \item 234 Any graph-based language can now be linearised 235 \end{itemize} 236 \end{frame} 237 238 \begin{frame} 239 \frametitle{The LTL to LIN transform II} 240 \begin{itemize} 241 \item 242 CompCert backend linearises much sooner than CerCo's 243 \item 244 Can now experiment with linearising much earlier 245 \item 246 Many transformations and optimisations can work fine on a linearised form 247 \item 248 Only place in the (current) backend that requires a graph-based language is in the ERTL pass, where we do a dataflow analysis 249 \end{itemize} 250 \end{frame} 251 252 \section{Optimizing assembler correctness proof} 253 254 \begin{frame} 255 \frametitle{Time not reported} 256 \begin{itemize} 257 \item 258 We had six months of time which is not reported on in any deliverable 259 \item 260 We invested this time working on: 261 \begin{itemize} 262 \item 263 The global proof sketch 264 \item 265 The setup of `proof infrastructure', common definitions, lemmas, invariants etc. required for main body of proof 266 \item 267 The proof of correctness for the assembler 268 \item 269 A notion of `structured traces', used throughout the compiler formalisation, as a means of eventually proving that the compiler correctly preserves costs 270 \item 271 Structured traces were defined in collaboration with the team at UEDIN 272 \end{itemize} 273 \end{itemize} 274 \end{frame} 275 276 \begin{frame} 277 \frametitle{Assembler} 278 \begin{itemize} 279 \item 280 After LIN, compiler spits out assembly language for MCS-51 281 \item 282 Assembler has pseudoinstructions similar to many commercial assembly languages 283 \item 284 For instance, instead of computed jumps (e.g. \texttt{SJMP} to a specific address), compiler can simply spit out a generic jump instruction to a label 285 \item 286 Simplifies the compiler, at the expense of introducing more proof obligations 287 \item 288 Now need a formalized assembler (a step further than CompCert) 289 \end{itemize} 290 \end{frame} 291 292 \begin{frame} 293 \frametitle{A problem: jump expansion} 294 \begin{itemize} 295 \item 296 `Jump expansion' is our name for the standard `branch displacement' problem 297 \item 298 Given a pseudojump to a label $l$, how best can this be expanded into an assembly instruction \texttt{SJMP}, \texttt{AJMP} or \texttt{LJMP} to a concrete address? 299 \item 300 Problem also applies to conditional jumps 301 \item 302 Problem especially relevant for MCS-51 as it has a small code memory, therefore aggressive expansion of jumps into smallest possible concrete jump instruction needed 303 \item 304 But a known hard problem (NP-complete depending on architecture), and easy to imagine knotty configurations where size of jumps are interdependent 305 \end{itemize} 306 \end{frame} 307 308 \begin{frame} 309 \frametitle{Jump expansion I} 310 \begin{itemize} 311 \item 312 We employed the following tactic: split the decision over how any particular pseudoinstruction is expanded from pseudoinstruction expansion 313 \item 314 Call the decision maker a `policy' 315 \item 316 We started the proof of correctness for the assembler based on the premise that a correct policy exists 317 \item 318 Further, we know that the assembler only fails to assemble a program if a good policy does not exist (a side-effect of using dependent types) 319 \item 320 A bad policy is a function that expands a given pseudojump into a concrete jump instruction that is `too small' for the distance to be jumped, or makes the program consume too much memory 321 \end{itemize} 322 \end{frame} 323 324 \begin{frame} 325 \frametitle{Jump expansion II} 326 \begin{itemize} 327 \item 328 Jaap Boender at UNIBO has been working on a verified implementation of a good jump expansion policy for the MCS-51 329 \item 330 The strategy initially translates all pseudojumps as \texttt{SJMP} and then increases their size if necessary 331 \item 332 Termination of the procedure is proved, as well as a safety property, stating that jumps are not expanded into jumps that are too long 333 \item 334 His strategy is not optimal (though the computed solution is optimal for the strategy employed) 335 \item 336 Jaap's work is the first formal treatment of the `jump expansion problem' 337 \end{itemize} 338 \end{frame} 339 340 \begin{frame} 341 \frametitle{Assembler correctness proof} 342 \begin{itemize} 343 \item 344 Assuming the existence of a good jump expansion property, we completed about 75\% of the correctness proof for the assembler 345 \item 346 Jaap's work has just been completed (modulo a few missing lemmas) 347 \item 348 Postponed the remainder of main assembler proof to start work on other tasks (and for Jaap to finish) 349 \item 350 We intend to return to proof, and publish an account of the work (possibly) as a journal paper 93 \frametitle{\texttt{Joint}: unifying intermediate languages} 94 In OCaml prototype each intermediate language is defined separately, but: 95 \medskip 96 \begin{itemize} 97 \item 98 Back-end languages are all closely related 99 \item 100 Compiler stages make small changes 101 \item 102 Some stages may be repeated or exchanged 103 \end{itemize} 104 \bigskip 105 \texttt{Joint} is a parameterized language which streamlines the back-end 106 \begin{itemize} 107 \item 108 Each back-end language is an instance of \texttt{Joint} 109 \item 110 Definitions, functions and proofs are shared 111 \item 112 Each \texttt{Joint} instance can add custom instructions 113 \end{itemize} 114 \end{frame} 115 116 \begin{frame} 117 \frametitle{\texttt{Joint}: benefits} 118 \begin{itemize} 119 \item 120 Reduces repeated code 121 \item 122 Optimizations generalize to multiple languages 123 \item 124 Unifies proofs, making correctness proof easier 125 \item 126 Allows a more flexible approach to compiler translation order 127 \end{itemize} 128 \begin{itemize} 129 \item 130 New intermediate language \structure{RTLntc} \alert{removes tailcalls} 131 \item 132 With \texttt{Joint}, incorporating RTLntc is cheap 133 \end{itemize} 134 \begin{itemize} 135 \item 136 The prototype, and CompCert, fix where linearisation occurs 137 \item 138 The Matita compiler is different: any graph-based language can be linearised 139 \end{itemize} 140 \end{frame} 141 142 \section{Correctness proof for optimizing assembler} 143 144 \begin{frame} 145 \frametitle{Assembler correctness} 146 \begin{itemize} 147 \item 148 We compute costs directly on object code 149 \item 150 So, unlike CompCert, we formalise the assembler and machine code (75\% complete) 151 \end{itemize} 152 \begin{itemize} 153 \item 154 Forces us to address a hard problem: branch displacement 155 \item 156 Solution: isolate \alert{expansion policy} from generic correctness proof 157 \item 158 Separately (Boender): well-defined and verified expansion policies (99\% complete) 159 \item 160 Jaap's work is the first machine formalised treatment of branch displacement 351 161 \end{itemize} 352 162 \end{frame} … … 419 229 420 230 \begin{frame} 421 \frametitle{Structured traces I}231 \frametitle{Structured traces} 422 232 \begin{itemize} 423 233 \item 424 234 We introduced a notion of `structured traces' 425 235 \item 426 These are intended to statically capture the (good) execution traces of a program 427 \item 428 To borrow a slogan: they are the `computational content of a well-formed program's execution' 429 \item 430 Come in two variants: inductive and coinductive 431 \item 432 Inductive captures program execution traces that eventually halt, coinductive ones that diverge 433 \end{itemize} 434 \end{frame} 435 436 \begin{frame} 437 \frametitle{Structured traces II} 438 \begin{itemize} 439 \item 440 I focus on the inductive variety, as used the most (for now) in the backend 441 \item 442 In particular, used in the proof that static and dynamic cost computations coincide 443 \item 444 Traces preserved by backend compilation, initially created at RTLabs 445 \item 446 This will be explained later 447 \end{itemize} 448 \end{frame} 449 450 \begin{frame} 451 \frametitle{Structured traces III} 452 \begin{itemize} 453 \item 454 Central insight is that program execution is always in the body of some function (from \texttt{main} onwards) 455 \item 456 A well formed program must have labels appearing at certain spots 457 \item 458 Similarly, the final instruction executed when executing a function must be a \texttt{RET} 459 \item 460 Execution must then continue in body of calling function, at correct place 461 \item 462 These invariants, and others, are crystalised in the specific syntactic form of a structured trace 463 \end{itemize} 464 \end{frame} 465 466 \begin{frame} 467 \frametitle{Recursive structure of `good' execution I} 236 These are needed to statically capture the (good) execution traces of a program 237 \item 238 They are the `computational content of the proof of well-labelledness' 239 \item 240 Translation passes \alert{must} preserve structure of traces, not just their flattening 241 \end{itemize} 242 \end{frame} 243 244 \begin{frame} 245 \frametitle{Recursive structure of `good' execution} 468 246 \begin{itemize} 469 247 \item … … 478 256 479 257 \begin{frame} 480 \frametitle{Static and dynamic costs }258 \frametitle{Static and dynamic costs I} 481 259 \begin{center} 482 260 \includegraphics[scale=0.33]{recursive_structure.png} 483 261 \begin{tabular}[b]{ll} 484 & \texttt{emit( l1)} \\262 & \texttt{emit(L1)} \\ 485 263 & \texttt{MOV r1 0}\\ 486 264 & \texttt{ADD r1 r2}\\ 487 & \ texttt{CALL f} \\265 & \alert{\texttt{CALL f}} \\ 488 266 & \texttt{ADD r2 r2}\\ 489 267 & \texttt{MOV r2 0}\\ … … 491 269 \end{tabular} 492 270 \end{center} 493 \makebox[0pt][l]{k($ l_1$) = k(\texttt{MOV}) + k (\texttt{ADD}) + \ldots + k(\texttt{RET})}\\494 static-cost(trace) = k($ l_1$) + \ldots + k($l_4$)\\271 \makebox[0pt][l]{k($L_1$) = k(\texttt{MOV}) + k (\texttt{ADD}) + \ldots + k(\texttt{RET})}\\ 272 static-cost(trace) = k($L_1$) + \ldots + k($L_4$)\\ 495 273 dynamic-cost(trace) = \texttt{clock}(Final$_1$) - \texttt{clock}(Start$_1$)\\ 496 274 \alert{Theorem: static-cost(trace) = dynamic-cost(trace)} … … 515 293 \end{frame} 516 294 517 \section{ Changes to tools and prototypes, looking forward}295 \section{Feedback to Matita and the OCaml prototype} 518 296 519 297 \begin{frame} … … 521 299 \begin{itemize} 522 300 \item 523 Bug fixes spotted in the formalisation so far have been merged back into the OCaml compiler301 Bug fixes spotted in the formalisation so far have been merged back into the OCaml prototype compiler 524 302 \item 525 303 Larger changes like the \texttt{Joint} machinery have so far not 526 304 \item 527 It is unclear whether they will be 528 \item 529 Just a generalisation of what is already there 530 \item 531 Supposed to make formalisation easier 305 Unclear whether they will be: their real value is in streamlining formalisation 532 306 \item 533 307 Further, we want to ensure that the untrusted compiler is as correct as possible, for experiments in e.g. Frama-C … … 573 347 Permanents `floating' 574 348 \item 575 Believe we have enough manpower to complete back end (required 21 man months)349 Believe we have enough manpower to complete back-end (required 21 man months) 576 350 \end{itemize} 577 351 \end{frame} … … 582 356 \begin{itemize} 583 357 \item 584 Translated the OCaml prototype's back end intermediate languages into Matita358 Translated the OCaml prototype's back-end intermediate languages into Matita 585 359 \item 586 360 Implemented the translations between languages, and given the intermediate languages a semantics 587 361 \item 588 Refactored many of the back end intermediate languages into a common, parametric `joint' language, that is later specialised589 \item 590 Spotted opportunities for possibly commuting back end translation passes362 Refactored many of the back-end intermediate languages into a common, parametric `joint' language, that is later specialised 363 \item 364 Spotted opportunities for possibly commuting back-end translation passes 591 365 \item 592 366 Used six months to define structured traces and start the proof of correctness for the assembler -
Deliverables/D1.2/Presentations/recursive_structure.svg
r1853 r1867 81 81 inkscape:pageopacity="0.0" 82 82 inkscape:pageshadow="2" 83 inkscape:zoom="0. 6"83 inkscape:zoom="0.84852814" 84 84 inkscape:cx="130" 85 inkscape:cy=" 300"85 inkscape:cy="233.61087" 86 86 inkscape:current-layer="layer1" 87 87 inkscape:document-units="px" … … 296 296 x="210.20999" 297 297 y="292.24133" 298 style="font-size:36px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;fill:#000000;font-family:LMMono8;-inkscape-font-specification:LMMono8">l2</tspan></text> 298 style="font-size:36px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;fill:#000000;font-family:LMMono8;-inkscape-font-specification:LMMono8">L<tspan 299 style="font-size:65.00091553%;baseline-shift:sub" 300 id="tspan3051">2</tspan></tspan></text> 299 301 <text 300 302 xml:space="preserve" … … 308 310 x="216.54333" 309 311 y="127.57468" 310 style="font-size:36px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;fill:#000000;font-family:LMMono8;-inkscape-font-specification:LMMono8">l3</tspan></text> 312 style="font-size:36px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;fill:#000000;font-family:LMMono8;-inkscape-font-specification:LMMono8">L<tspan 313 style="font-size:65.00091553%;baseline-shift:sub" 314 id="tspan3053">3</tspan></tspan></text> 311 315 <text 312 316 xml:space="preserve" … … 320 324 x="110.87666" 321 325 y="450.57468" 322 style="font-size:36px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;fill:#000000;font-family:LMMono8;-inkscape-font-specification:LMMono8">l1</tspan></text> 326 style="font-size:36px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;fill:#000000;font-family:LMMono8;-inkscape-font-specification:LMMono8">L<tspan 327 style="font-size:65.00091553%;baseline-shift:sub" 328 id="tspan3049">1</tspan></tspan></text> 323 329 <text 324 330 xml:space="preserve" … … 332 338 x="385.80167" 333 339 y="63.313339" 334 style="font-size:36px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;fill:#000000;font-family:LMMono12;-inkscape-font-specification:LMMono12">l4</tspan></text> 340 style="font-size:36px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;fill:#000000;font-family:LMMono12;-inkscape-font-specification:LMMono12">L<tspan 341 style="font-size:65.00091553%;baseline-shift:sub" 342 id="tspan3055">4</tspan></tspan></text> 335 343 <text 336 344 xml:space="preserve"
Note: See TracChangeset
for help on using the changeset viewer.