Changeset 1432 for Deliverables/D4.2-4.3
- Timestamp:
- Oct 20, 2011, 4:21:29 PM (9 years ago)
- File:
-
- 1 edited
Legend:
- Unmodified
- Added
- Removed
-
Deliverables/D4.2-4.3/reports/D4-2.tex
r1428 r1432 162 162 \paragraph{RTLntl (Register Transfer Language --- No Tailcalls)} 163 163 This language is a pseudoregister, graph based language where all tailcalls are eliminated. 164 RTLntl is not present in the O'Caml compiler .164 RTLntl is not present in the O'Caml compiler, the RTL language is reused for this purpose. 165 165 166 166 \paragraph{ERTL (Explicit Register Transfer Language)} … … 197 197 The O'Caml compiler is written in the following manner. 198 198 Each intermediate language has its own dedicated syntax, notions of internal function, and so on. 199 Here, we make a distinction between `internal functions'---other functions that are explicitly written by the programmer, and `external functions', which belong to external library and require explictly linking. 200 Internal functions are represented as a record, consisting of a sequential structure, of some description, of statements, entry and exit points to this structure, and other book keeping devices. 199 Here, we make a distinction between `internal functions'---other functions that are explicitly written by the programmer---and `external functions', which belong to external library and require explictly linking. 200 In particular, IO can be seen as a special case of the `external function' mechanism. 201 Internal functions are represented as a record consisting of a sequential structure of statements, entry and exit points to this structure, and other book keeping devices. 202 This sequential structure can either be a control flow graph or a linearised list of statements, depending on the language. 201 203 Translations between intermediate language map syntaxes to syntaxes, and internal function representations to internal function representations explicitly. 202 204 … … 225 227 In the type above, this parameterisation is realised with the \texttt{params\_\_} record. 226 228 As a result of this parameterisation, we have also added a degree of `type safety' to the intermediate languages' syntaxes. 227 In particular, we note that the \texttt{OP1} constructor expects quite a specific type, in that the two register arguments must both be the accumulator A. 229 In particular, we note that the \texttt{OP1} constructor expects quite a specific type, in that the two register arguments must both be what passes for the accumulator A in that language. 230 In some languages, for example LIN, this is the hardware accumulator, whilst in others this is any pseudoregister. 228 231 Contrast this with the \texttt{INT} constructor, which expects a \texttt{generic\_reg}, corresponding to an `arbitrary' register type. 229 232 … … 246 249 \paragraph{Shared code, reduced proofs} 247 250 Many features of individual backend intermediate languages are shared with other intermediate languages. 248 For instance, RTLabs, RTL, ERTL and LTL are all graph based languages, where functions are represented as a graph of statements that form their bodies.251 For instance, RTLabs, RTL, ERTL and LTL are all graph based languages, where functions are represented as a control flow graph of statements that form their bodies. 249 252 Functions for adding statements to a graph, searching the graph, and so on, are remarkably similar across all languages, but are duplicated in the O'Caml code. 250 253 … … 267 270 \end{lstlisting} 268 271 In particular, everything that can vary between differing intermediate languages has been parameterised. 269 Here, we see the number ofparameters, the listing of local variables, and the internal code representation has been parameterised.272 Here, we see the location where to fetch parameters, the listing of local variables, and the internal code representation has been parameterised. 270 273 Other particulars are also parameterised, though here omitted. 271 274 272 Hopefully this abstraction process will reduce the number of proofs that need to be written, dealing with internal functions. 273 We only need to prove once that fetching a statement's successor is `correct', and we inherit this property for free for every intermediate language. 275 Hopefully this abstraction process will reduce the number of proofs that need to be written, dealing with internal functions of different languages characterised by parameters. 274 276 275 277 \paragraph{Dependency on instruction selection} … … 288 290 \end{lstlisting} 289 291 The capitalised constructors---\texttt{INT}, \texttt{MOVE}, and so on---are all machine specific instructions. 290 Retargetting the compiler to another microprocessor would entail replacing these constructors with constructors that correspond to the instructions of the new target.292 Retargetting the compiler to another microprocessor, improving instruction selection, or simply enlarging the subset of the machine language that the compiler can use, would entail replacing these constructors with constructors that correspond to the instructions of the new target. 291 293 We feel that this makes which instructions are target dependent, and which are not (i.e. those language specific instructions that fall inside the \texttt{extension} constructor) much more explicit. 294 In the long term, we would really like to try to directly embed the target language in the syntax, in order to reuse the target language's semantics. 292 295 293 296 \paragraph{Independent development and testing} 294 297 We have essentially modularised the intermediate languages in the compiler backend. 295 As with any form of modularisation, we reap benefits in the ability to independently test and develop each intermediate language separately .298 As with any form of modularisation, we reap benefits in the ability to independently test and develop each intermediate language separately, with the benefit of fixing bugs just once. 296 299 297 300 \paragraph{Future reuse for other compiler projects} 298 301 Another advantage of our modularisation scheme is the ability to quickly use and reuse intermediate languages for other compiler projects. 299 For instance, in creating a cost-preserving compiler for a functional language, we may choose to target the RTL languagedirectly.300 Naturally, the register requirements for a functional language may differ from those of an imperative language, a reconfiguration which our parameterisation makes easy.302 For instance, in creating a cost-preserving compiler for a functional language, we may choose to target a linearised version of RTL directly. 303 Adding such an intermediate language would involve the addition of just a few lines of code. 301 304 302 305 \paragraph{Easy addition of new compiler passes} … … 306 309 As generic code for the `joint' language exists, for example to add statements to control flow graphs, this code can be reused for the new intermediate language. 307 310 308 \paragraph{Possible language commutations}311 \paragraph{Possible commutations of translation passes} 309 312 The backend translation passes of the CerCo compiler differ quite a bit from the CompCert compiler. 310 313 In the CompCert compiler, linearisation occurs much earlier in the compilation chain, and passes such as register colouring and allocation are carried out on a linearised form of program. 311 314 Contrast this with our own approach, where the code is represented as a graph for much longer. 315 Similarly, in CompCert the calling conventions are enforced after register allocation, whereas we do register allocation before enforcing the calling convention. 312 316 313 317 However, by abstracting the representation of intermediate functions, we are now much more free to reorder translation passes as we see fit. … … 323 327 \label{subsect.use.of.dependent.types} 324 328 325 We see t hreepotential ways in which a compiler can fail to compile a program:329 We see two potential ways in which a compiler can fail to compile a program: 326 330 \begin{enumerate} 327 331 \item 328 332 The program is malformed, and there is no hope of making sense of the program. 329 333 \item 330 A heuristic or algorithm in the compiler is implemented incorrectly, in which case an otherwise correct source program fails to be compiler to correct assembly code. 331 \item 332 An invariant in the compiler is invalidated. 334 The compiler is buggy, or an invariant in the compiler is invalidated, for example, if the program is too large to fit into the processor's code memory. 333 335 \end{enumerate} 334 The first source of failure we are unable to do anything about.335 The latter two sources of failure should be interpreted as a compiler bug, and, as part of a verified compiler project, we'd like to rule out all such bugs.336 In CerCo, we aim to use dependent types to help us enforce invariants and prove our heuristics and algorithms correct.337 338 336 First, we encode informal invariants, or uses of \texttt{assert false} in the O'Caml code, with dependent types, converting partial functions into total functions. 339 337 There are numerous examples of this throughout the backend. … … 354 352 This was an assertion in the O'Caml code. 355 353 356 Secondly, we make use of dependent types to make the Matita code easier to read, and eventually the proofs of correctness for the compiler easier to write.354 Secondly, we make use of dependent types to make the Matita code correct by construction, and eventually the proofs of correctness for the compiler easier to write. 357 355 For instance, many intermediate languages in the backend of the compiler, from RTLabs to LTL, are graph based languages. 358 356 Here, function definitions consist of a graph (i.e. a map from labels to statements) and a pair of labels denoting the entry and exit points of this graph. … … 375 373 Namely, CompCert makes little use of dependent types to encode invariants. 376 374 In contrast, we wish to make as much use of dependent types as possible, both to experiment with different ways of encoding compilers in a proof assistant, but also as a way of `stress testing' Matita's support for dependent types. 375 In particular, we would like to make use of Matita's support for `Russell'-style reasoning. 377 376 378 377 %-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-% … … 398 397 399 398 It should be noted that these axiomatised components fall into the following pattern: whilst their implementation is complex, and their proof of correctness is difficult, we are able to quickly and easily verify that any answer that they provide is correct. 400 As a result, we do not see this axiomatisation process as being too onerous.401 399 \item 402 400 \textbf{A few non-computational proof obligations.} … … 423 421 \label{sect.associated.changes.to.ocaml.compiler} 424 422 425 At the moment, no changes we have made in the Matita backend have made their way back into the O'Caml compiler. 426 We do not see the heavy process of modularisation and abstraction as making its way back into the O'Caml codebase, as this is a significant rewrite of the backend code. 427 However, several bugfixes, and the identification of `hidden invariants' in the O'Caml code will be incorporated back into the prototype. 423 At the moment, only bugfixes, but no architectural changes we have made in the Matita backend have made their way back into the O'Caml compiler. 424 We do not see the heavy process of modularisation and abstraction as making its way back into the O'Caml codebase, as this is a significant rewrite of the backend code that is supposed to yield the same code after instantiating parameters, anyway. 428 425 429 426 %-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-% … … 469 466 470 467 Translation specific files (files relating to language semantics have been omitted) are presented in Table~\ref{table.syntax}. 468 \begin{landscape} 471 469 \begin{table} 472 470 \begin{threeparttable} … … 474 472 Title & Description & O'Caml & Ratio \\ 475 473 \hline 474 \texttt{joint/Joint.ma} & Abstracted syntax for backend languages & N/A & N/A \\ 476 475 \texttt{RTLabs/syntax.ma} & The syntax of RTLabs & \texttt{RTLabs/RTLabs.mli} & 0.65 \\ 477 \texttt{joint/Joint.ma} & Abstracted syntax for backend languages & N/A & N/A \\478 476 \texttt{RTL/RTL.ma} & The syntax of RTL & \texttt{RTL/RTL.mli} & 1.85\tnote{b} \\ 479 477 \texttt{ERTL/ERTL.ma} & The syntax of ERTL & \texttt{ERTL/ERTL.mli} & 1.04\tnote{b} \\ 478 \texttt{LTL/LTL.ma} & The specialisation of the above file to the syntax of LTL & \texttt{LTL/LTL.mli} & 1.86\tnote{a} \\ 480 479 \texttt{LIN/joint\_LTL\_LIN.ma} & The syntax of the abstracted combined LTL and LIN language & N/A & N/A \\ 481 \texttt{LTL/LTL.ma} & The specialisation of the above file to the syntax of LTL & \texttt{LTL/LTL.mli} & 1.86\tnote{a} \\482 480 \texttt{LIN/LIN.ma} & The specialisation of the above file to the syntax of LIN & \texttt{LIN/LIN.mli} & 2.27\tnote{a} \\ 483 481 \end{tabular} 484 482 \begin{tablenotes} 485 483 \item[a] Includes \texttt{joint/Joint\_LTL\_LIN.ma} and \texttt{joint/Joint.ma}. 486 \item[b] Includes \texttt{joint/Joint.ma}. 484 \item[b] Includes \texttt{joint/Joint.ma}.\\ 485 Total lines of Matita code for the above files: 347. \\ 486 Total lines of O'Caml code for the above files: 616. \\ 487 Ration of total lines: 0.56. 487 488 \end{tablenotes} 488 489 \end{threeparttable} … … 490 491 \label{table.syntax} 491 492 \end{table} 493 \end{landscape} 492 494 Here, the O'Caml column denotes the O'Caml source file in the prototype compiler's implementation that corresponds to the Matita script in question. 493 495 The ratios are the linecounts of the Matita file divided by the line counts of the corresponding O'Caml file. 494 496 These are computed with \texttt{wc -l}, a standard Unix tool. 497 498 Individual file's ratios are an over approximation, due to the fact that it's hard to relate an individual O'Caml file to the abstracted Matita code that has been spread across multiple files. 499 The ratio between total Matita code lines and total O'Caml code lines is more reflective of the compressed and abstracted state of the Matita translation. 500 495 501 Translations and utilities are presented in Table~\ref{table.translation.utilities}. 496 502 \begin{landscape} … … 500 506 Title & Description & O'Caml & Ratio \\ 501 507 \hline 508 \texttt{joint/TranslateUtils.ma} & Generic translation utilities & N/A & N/A \\ 502 509 \texttt{RTLabs/RTLabsToRTL.ma} & Translation from RTLabs to RTL & \texttt{RTLabs/RTLabsToRTL.ml} & 1.61 \\ 503 \texttt{joint/TranslateUtils.ma} & Generic translation utilities & N/A & N/A \\ 504 \texttt{joint/Joint\_LTL\_LIN.ma} & Generic code for LTL and LIN languages & N/A & N/A \\ 505 \texttt{RTL/RTLToERTL.ma} & Translation from RTL to ERTL & \texttt{RTL/RTLToERTL.ml} & 1.38\tnote{a}\tnote{b}\tnote{c} \\ 506 \texttt{RTL/RTLtailcall.ma} & Elimination of tailcalls & \texttt{RTL/RTLtailcall.ml} & 2.01 \\ 507 \texttt{ERTL/ERTLToLTL.ma} & Translation from ERTL to LTL & \texttt{ERTL/ERTLToLTL.ml} & 1.39\tnote{a}\tnote{b}\tnote{c}\tnote{d} \\ 508 \texttt{ERTL/Interference.ma} & Axiomatised graph colouring component & \texttt{common/interference.ml} & 0.23\tnote{b} \\ 509 \texttt{ERTL/liveness.ma} & Liveness analysis & \texttt{ERTL/liveness.ml} & 1.46\tnote{b} \\ 510 \texttt{LTL/LTLToLIN.ma} & Translation from LTL to LIN & \texttt{LTL/LTLToLIN.ml} & 1.10 \tnote{a}\tnote{b}\tnote{c}\tnote{e}\tnote{f} \\ 511 \texttt{LIN/LINToASM.ma} & Translation from LIN to assembly & \texttt{LIN/LINToASM.ml} & 4.06\tnote{a}\tnote{b}\tnote{c}\tnote{f} 510 \texttt{RTL/RTLToERTL.ma} & Translation from RTL to ERTL & \texttt{RTL/RTLToERTL.ml} & 1.38\tnote{a} \\ 511 \texttt{RTL/RTLTailcall.ma} & Elimination of tailcalls & \texttt{RTL/RTLtailcall.ml} & 2.01 \\ 512 \texttt{ERTL/ERTLToLTL.ma} & Translation from ERTL to LTL & \texttt{ERTL/ERTLToLTL.ml} & 1.39\tnote{b} \\ 513 \texttt{ERTL/Interference.ma} & Axiomatised graph colouring component & \texttt{common/interference.ml} & 0.23\tnote{c} \\ 514 \texttt{ERTL/liveness.ma} & Liveness analysis & \texttt{ERTL/liveness.ml} & 1.46\tnote{c} \\ 515 \texttt{LTL/LTLToLIN.ma} & Translation from LTL to LIN & \texttt{LTL/LTLToLIN.ml} & 1.10 \tnote{d} \\ 516 \texttt{LIN/joint\_LTL\_LIN.ma} & Generic code for LTL and LIN languages & N/A & N/A \\ 517 \texttt{LIN/LINToASM.ma} & Translation from LIN to assembly & \texttt{LIN/LINToASM.ml} & 4.06\tnote{e} 512 518 \end{tabular} 513 519 \begin{tablenotes} 514 \item[a] Includes \texttt{joint/TranslateUtils.ma}. 515 \item[b] Includes \texttt{joint/Joint.ma}. 516 \item[c] Includes \texttt{joint/TranslateUtil.ma}. 517 \item[d] Includes \texttt{ERTL/ERTLToLTLI.ml}. 518 \item[e] Includes \texttt{LTL/LTLToLINI.ml}. 519 \item[f] Includes \texttt{joint/joint\_LTL\_LIN.ma}. 520 \item[a] Includes \texttt{joint/TranslateUtils.ma}, \texttt{joint/Joint.ma} and \texttt{joint/TranslateUtil.ma}. 521 \item[b] Includes \texttt{joint/TranslateUtils.ma}, \texttt{joint/Joint.ma}, \texttt{joint/TranslateUtil.ma} and \texttt{ERTL/ERTLToLTLI.ml}. 522 \item[c] Includes \texttt{joint/Joint.ma}. 523 \item[d] Includes \texttt{joint/TranslateUtils.ma}, \texttt{joint/Joint.ma}, \texttt{joint/TranslateUtil.ma}, \texttt{LTL/LTLToLINI.ml} and \texttt{joint/joint\_LTL\_LIN.ma}. 524 \item[e] Includes \texttt{joint/TranslateUtils.ma}, \texttt{joint/Joint.ma}, \texttt{joint/TranslateUtil.ma} and \texttt{joint/joint\_LTL\_LIN.ma}.\\ 525 Total lines of Matita code for the above files: 3032. \\ 526 Total lines of O'Caml code for the above files: 3332. \\ 527 Ration of total lines: 0.91. 520 528 \end{tablenotes} 521 529 \end{threeparttable}
Note: See TracChangeset
for help on using the changeset viewer.