Changeset 1432 for Deliverables


Ignore:
Timestamp:
Oct 20, 2011, 4:21:29 PM (9 years ago)
Author:
mulligan
Message:

finished d4.2 report

File:
1 edited

Legend:

Unmodified
Added
Removed
  • Deliverables/D4.2-4.3/reports/D4-2.tex

    r1428 r1432  
    162162\paragraph{RTLntl (Register Transfer Language --- No Tailcalls)}
    163163This language is a pseudoregister, graph based language where all tailcalls are eliminated.
    164 RTLntl is not present in the O'Caml compiler.
     164RTLntl is not present in the O'Caml compiler, the RTL language is reused for this purpose.
    165165
    166166\paragraph{ERTL (Explicit Register Transfer Language)}
     
    197197The O'Caml compiler is written in the following manner.
    198198Each intermediate language has its own dedicated syntax, notions of internal function, and so on.
    199 Here, we make a distinction between `internal functions'---other functions that are explicitly written by the programmer, and `external functions', which belong to external library and require explictly linking.
    200 Internal functions are represented as a record, consisting of a sequential structure, of some description, of statements, entry and exit points to this structure, and other book keeping devices.
     199Here, we make a distinction between `internal functions'---other functions that are explicitly written by the programmer---and `external functions', which belong to external library and require explictly linking.
     200In particular, IO can be seen as a special case of the `external function' mechanism.
     201Internal functions are represented as a record consisting of a sequential structure of statements, entry and exit points to this structure, and other book keeping devices.
     202This sequential structure can either be a control flow graph or a linearised list of statements, depending on the language.
    201203Translations between intermediate language map syntaxes to syntaxes, and internal function representations to internal function representations explicitly.
    202204
     
    225227In the type above, this parameterisation is realised with the \texttt{params\_\_} record.
    226228As a result of this parameterisation, we have also added a degree of `type safety' to the intermediate languages' syntaxes.
    227 In particular, we note that the \texttt{OP1} constructor expects quite a specific type, in that the two register arguments must both be the accumulator A.
     229In particular, we note that the \texttt{OP1} constructor expects quite a specific type, in that the two register arguments must both be what passes for the accumulator A in that language.
     230In some languages, for example LIN, this is the hardware accumulator, whilst in others this is any pseudoregister.
    228231Contrast this with the \texttt{INT} constructor, which expects a \texttt{generic\_reg}, corresponding to an `arbitrary' register type.
    229232
     
    246249\paragraph{Shared code, reduced proofs}
    247250Many features of individual backend intermediate languages are shared with other intermediate languages.
    248 For instance, RTLabs, RTL, ERTL and LTL are all graph based languages, where functions are represented as a graph of statements that form their bodies.
     251For instance, RTLabs, RTL, ERTL and LTL are all graph based languages, where functions are represented as a control flow graph of statements that form their bodies.
    249252Functions for adding statements to a graph, searching the graph, and so on, are remarkably similar across all languages, but are duplicated in the O'Caml code.
    250253
     
    267270\end{lstlisting}
    268271In particular, everything that can vary between differing intermediate languages has been parameterised.
    269 Here, we see the number of parameters, the listing of local variables, and the internal code representation has been parameterised.
     272Here, we see the location where to fetch parameters, the listing of local variables, and the internal code representation has been parameterised.
    270273Other particulars are also parameterised, though here omitted.
    271274
    272 Hopefully this abstraction process will reduce the number of proofs that need to be written, dealing with internal functions.
    273 We only need to prove once that fetching a statement's successor is `correct', and we inherit this property for free for every intermediate language.
     275Hopefully this abstraction process will reduce the number of proofs that need to be written, dealing with internal functions of different languages characterised by parameters.
    274276
    275277\paragraph{Dependency on instruction selection}
     
    288290\end{lstlisting}
    289291The capitalised constructors---\texttt{INT}, \texttt{MOVE}, and so on---are all machine specific instructions.
    290 Retargetting the compiler to another microprocessor would entail replacing these constructors with constructors that correspond to the instructions of the new target.
     292Retargetting the compiler to another microprocessor, improving instruction selection, or simply enlarging the subset of the machine language that the compiler can use, would entail replacing these constructors with constructors that correspond to the instructions of the new target.
    291293We feel that this makes which instructions are target dependent, and which are not (i.e. those language specific instructions that fall inside the \texttt{extension} constructor) much more explicit.
     294In the long term, we would really like to try to directly embed the target language in the syntax, in order to reuse the target language's semantics.
    292295
    293296\paragraph{Independent development and testing}
    294297We have essentially modularised the intermediate languages in the compiler backend.
    295 As with any form of modularisation, we reap benefits in the ability to independently test and develop each intermediate language separately.
     298As with any form of modularisation, we reap benefits in the ability to independently test and develop each intermediate language separately, with the benefit of fixing bugs just once.
    296299
    297300\paragraph{Future reuse for other compiler projects}
    298301Another advantage of our modularisation scheme is the ability to quickly use and reuse intermediate languages for other compiler projects.
    299 For instance, in creating a cost-preserving compiler for a functional language, we may choose to target the RTL language directly.
    300 Naturally, the register requirements for a functional language may differ from those of an imperative language, a reconfiguration which our parameterisation makes easy.
     302For instance, in creating a cost-preserving compiler for a functional language, we may choose to target a linearised version of RTL directly.
     303Adding such an intermediate language would involve the addition of just a few lines of code.
    301304
    302305\paragraph{Easy addition of new compiler passes}
     
    306309As generic code for the `joint' language exists, for example to add statements to control flow graphs, this code can be reused for the new intermediate language.
    307310
    308 \paragraph{Possible language commutations}
     311\paragraph{Possible commutations of translation passes}
    309312The backend translation passes of the CerCo compiler differ quite a bit from the CompCert compiler.
    310313In the CompCert compiler, linearisation occurs much earlier in the compilation chain, and passes such as register colouring and allocation are carried out on a linearised form of program.
    311314Contrast this with our own approach, where the code is represented as a graph for much longer.
     315Similarly, in CompCert the calling conventions are enforced after register allocation, whereas we do register allocation before enforcing the calling convention.
    312316
    313317However, by abstracting the representation of intermediate functions, we are now much more free to reorder translation passes as we see fit.
     
    323327\label{subsect.use.of.dependent.types}
    324328
    325 We see three potential ways in which a compiler can fail to compile a program:
     329We see two potential ways in which a compiler can fail to compile a program:
    326330\begin{enumerate}
    327331\item
    328332The program is malformed, and there is no hope of making sense of the program.
    329333\item
    330 A heuristic or algorithm in the compiler is implemented incorrectly, in which case an otherwise correct source program fails to be compiler to correct assembly code.
    331 \item
    332 An invariant in the compiler is invalidated.
     334The compiler is buggy, or an invariant in the compiler is invalidated, for example, if the program is too large to fit into the processor's code memory.
    333335\end{enumerate}
    334 The first source of failure we are unable to do anything about.
    335 The latter two sources of failure should be interpreted as a compiler bug, and, as part of a verified compiler project, we'd like to rule out all such bugs.
    336 In CerCo, we aim to use dependent types to help us enforce invariants and prove our heuristics and algorithms correct.
    337 
    338336First, we encode informal invariants, or uses of \texttt{assert false} in the O'Caml code, with dependent types, converting partial functions into total functions.
    339337There are numerous examples of this throughout the backend.
     
    354352This was an assertion in the O'Caml code.
    355353
    356 Secondly, we make use of dependent types to make the Matita code easier to read, and eventually the proofs of correctness for the compiler easier to write.
     354Secondly, we make use of dependent types to make the Matita code correct by construction, and eventually the proofs of correctness for the compiler easier to write.
    357355For instance, many intermediate languages in the backend of the compiler, from RTLabs to LTL, are graph based languages.
    358356Here, function definitions consist of a graph (i.e. a map from labels to statements) and a pair of labels denoting the entry and exit points of this graph.
     
    375373Namely, CompCert makes little use of dependent types to encode invariants.
    376374In contrast, we wish to make as much use of dependent types as possible, both to experiment with different ways of encoding compilers in a proof assistant, but also as a way of `stress testing' Matita's support for dependent types.
     375In particular, we would like to make use of Matita's support for `Russell'-style reasoning.
    377376
    378377%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
     
    398397
    399398It should be noted that these axiomatised components fall into the following pattern: whilst their implementation is complex, and their proof of correctness is difficult, we are able to quickly and easily verify that any answer that they provide is correct.
    400 As a result, we do not see this axiomatisation process as being too onerous.
    401399\item
    402400\textbf{A few non-computational proof obligations.}
     
    423421\label{sect.associated.changes.to.ocaml.compiler}
    424422
    425 At the moment, no changes we have made in the Matita backend have made their way back into the O'Caml compiler.
    426 We do not see the heavy process of modularisation and abstraction as making its way back into the O'Caml codebase, as this is a significant rewrite of the backend code.
    427 However, several bugfixes, and the identification of `hidden invariants' in the O'Caml code will be incorporated back into the prototype.
     423At the moment, only bugfixes, but no architectural changes we have made in the Matita backend have made their way back into the O'Caml compiler.
     424We do not see the heavy process of modularisation and abstraction as making its way back into the O'Caml codebase, as this is a significant rewrite of the backend code that is supposed to yield the same code after instantiating parameters, anyway.
    428425
    429426%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
     
    469466
    470467Translation specific files (files relating to language semantics have been omitted) are presented in Table~\ref{table.syntax}.
     468\begin{landscape}
    471469\begin{table}
    472470\begin{threeparttable}
     
    474472Title & Description & O'Caml & Ratio \\
    475473\hline
     474\texttt{joint/Joint.ma} & Abstracted syntax for backend languages & N/A & N/A \\
    476475\texttt{RTLabs/syntax.ma} & The syntax of RTLabs & \texttt{RTLabs/RTLabs.mli} & 0.65 \\
    477 \texttt{joint/Joint.ma} & Abstracted syntax for backend languages & N/A & N/A \\
    478476\texttt{RTL/RTL.ma} & The syntax of RTL & \texttt{RTL/RTL.mli} & 1.85\tnote{b} \\
    479477\texttt{ERTL/ERTL.ma} & The syntax of ERTL & \texttt{ERTL/ERTL.mli} & 1.04\tnote{b} \\
     478\texttt{LTL/LTL.ma} & The specialisation of the above file to the syntax of LTL & \texttt{LTL/LTL.mli} & 1.86\tnote{a} \\
    480479\texttt{LIN/joint\_LTL\_LIN.ma} & The syntax of the abstracted combined LTL and LIN language & N/A & N/A \\
    481 \texttt{LTL/LTL.ma} & The specialisation of the above file to the syntax of LTL & \texttt{LTL/LTL.mli} & 1.86\tnote{a} \\
    482480\texttt{LIN/LIN.ma} & The specialisation of the above file to the syntax of LIN & \texttt{LIN/LIN.mli} & 2.27\tnote{a} \\
    483481\end{tabular}
    484482\begin{tablenotes}
    485483  \item[a] Includes \texttt{joint/Joint\_LTL\_LIN.ma} and \texttt{joint/Joint.ma}.
    486   \item[b] Includes \texttt{joint/Joint.ma}.
     484  \item[b] Includes \texttt{joint/Joint.ma}.\\
     485  Total lines of Matita code for the above files: 347. \\
     486  Total lines of O'Caml code for the above files: 616. \\
     487  Ration of total lines: 0.56.
    487488\end{tablenotes}
    488489\end{threeparttable}
     
    490491\label{table.syntax}
    491492\end{table}
     493\end{landscape}
    492494Here, the O'Caml column denotes the O'Caml source file in the prototype compiler's implementation that corresponds to the Matita script in question.
    493495The ratios are the linecounts of the Matita file divided by the line counts of the corresponding O'Caml file.
    494496These are computed with \texttt{wc -l}, a standard Unix tool.
     497
     498Individual file's ratios are an over approximation, due to the fact that it's hard to relate an individual O'Caml file to the abstracted Matita code that has been spread across multiple files.
     499The ratio between total Matita code lines and total O'Caml code lines is more reflective of the compressed and abstracted state of the Matita translation.
     500
    495501Translations and utilities are presented in Table~\ref{table.translation.utilities}.
    496502\begin{landscape}
     
    500506Title & Description & O'Caml & Ratio \\
    501507\hline
     508\texttt{joint/TranslateUtils.ma} & Generic translation utilities & N/A & N/A \\
    502509\texttt{RTLabs/RTLabsToRTL.ma} & Translation from RTLabs to RTL & \texttt{RTLabs/RTLabsToRTL.ml} & 1.61 \\
    503 \texttt{joint/TranslateUtils.ma} & Generic translation utilities & N/A & N/A \\
    504 \texttt{joint/Joint\_LTL\_LIN.ma} & Generic code for LTL and LIN languages & N/A & N/A \\
    505 \texttt{RTL/RTLToERTL.ma} & Translation from RTL to ERTL & \texttt{RTL/RTLToERTL.ml} & 1.38\tnote{a}\tnote{b}\tnote{c} \\
    506 \texttt{RTL/RTLtailcall.ma} & Elimination of tailcalls & \texttt{RTL/RTLtailcall.ml} & 2.01 \\
    507 \texttt{ERTL/ERTLToLTL.ma} & Translation from ERTL to LTL & \texttt{ERTL/ERTLToLTL.ml} & 1.39\tnote{a}\tnote{b}\tnote{c}\tnote{d} \\
    508 \texttt{ERTL/Interference.ma} & Axiomatised graph colouring component & \texttt{common/interference.ml} & 0.23\tnote{b} \\
    509 \texttt{ERTL/liveness.ma} & Liveness analysis & \texttt{ERTL/liveness.ml} & 1.46\tnote{b} \\
    510 \texttt{LTL/LTLToLIN.ma} & Translation from LTL to LIN & \texttt{LTL/LTLToLIN.ml} & 1.10 \tnote{a}\tnote{b}\tnote{c}\tnote{e}\tnote{f} \\
    511 \texttt{LIN/LINToASM.ma} & Translation from LIN to assembly & \texttt{LIN/LINToASM.ml} & 4.06\tnote{a}\tnote{b}\tnote{c}\tnote{f}
     510\texttt{RTL/RTLToERTL.ma} & Translation from RTL to ERTL & \texttt{RTL/RTLToERTL.ml} & 1.38\tnote{a} \\
     511\texttt{RTL/RTLTailcall.ma} & Elimination of tailcalls & \texttt{RTL/RTLtailcall.ml} & 2.01 \\
     512\texttt{ERTL/ERTLToLTL.ma} & Translation from ERTL to LTL & \texttt{ERTL/ERTLToLTL.ml} & 1.39\tnote{b} \\
     513\texttt{ERTL/Interference.ma} & Axiomatised graph colouring component & \texttt{common/interference.ml} & 0.23\tnote{c} \\
     514\texttt{ERTL/liveness.ma} & Liveness analysis & \texttt{ERTL/liveness.ml} & 1.46\tnote{c} \\
     515\texttt{LTL/LTLToLIN.ma} & Translation from LTL to LIN & \texttt{LTL/LTLToLIN.ml} & 1.10 \tnote{d} \\
     516\texttt{LIN/joint\_LTL\_LIN.ma} & Generic code for LTL and LIN languages & N/A & N/A \\
     517\texttt{LIN/LINToASM.ma} & Translation from LIN to assembly & \texttt{LIN/LINToASM.ml} & 4.06\tnote{e}
    512518\end{tabular}
    513519\begin{tablenotes}
    514   \item[a] Includes \texttt{joint/TranslateUtils.ma}.
    515   \item[b] Includes \texttt{joint/Joint.ma}.
    516   \item[c] Includes \texttt{joint/TranslateUtil.ma}.
    517   \item[d] Includes \texttt{ERTL/ERTLToLTLI.ml}.
    518   \item[e] Includes \texttt{LTL/LTLToLINI.ml}.
    519   \item[f] Includes \texttt{joint/joint\_LTL\_LIN.ma}.
     520  \item[a] Includes \texttt{joint/TranslateUtils.ma}, \texttt{joint/Joint.ma} and \texttt{joint/TranslateUtil.ma}.
     521  \item[b] Includes \texttt{joint/TranslateUtils.ma}, \texttt{joint/Joint.ma}, \texttt{joint/TranslateUtil.ma} and \texttt{ERTL/ERTLToLTLI.ml}.
     522  \item[c] Includes \texttt{joint/Joint.ma}.
     523  \item[d] Includes \texttt{joint/TranslateUtils.ma}, \texttt{joint/Joint.ma}, \texttt{joint/TranslateUtil.ma}, \texttt{LTL/LTLToLINI.ml} and \texttt{joint/joint\_LTL\_LIN.ma}.
     524  \item[e] Includes \texttt{joint/TranslateUtils.ma}, \texttt{joint/Joint.ma}, \texttt{joint/TranslateUtil.ma} and \texttt{joint/joint\_LTL\_LIN.ma}.\\
     525  Total lines of Matita code for the above files: 3032. \\
     526  Total lines of O'Caml code for the above files: 3332. \\
     527  Ration of total lines: 0.91.
    520528\end{tablenotes}
    521529\end{threeparttable}
Note: See TracChangeset for help on using the changeset viewer.