# Changeset 1432

Ignore:
Timestamp:
Oct 20, 2011, 4:21:29 PM (8 years ago)
Message:

finished d4.2 report

File:
1 edited

### Legend:

Unmodified
 r1428 \paragraph{RTLntl (Register Transfer Language --- No Tailcalls)} This language is a pseudoregister, graph based language where all tailcalls are eliminated. RTLntl is not present in the O'Caml compiler. RTLntl is not present in the O'Caml compiler, the RTL language is reused for this purpose. \paragraph{ERTL (Explicit Register Transfer Language)} The O'Caml compiler is written in the following manner. Each intermediate language has its own dedicated syntax, notions of internal function, and so on. Here, we make a distinction between internal functions'---other functions that are explicitly written by the programmer, and external functions', which belong to external library and require explictly linking. Internal functions are represented as a record, consisting of a sequential structure, of some description, of statements, entry and exit points to this structure, and other book keeping devices. Here, we make a distinction between internal functions'---other functions that are explicitly written by the programmer---and external functions', which belong to external library and require explictly linking. In particular, IO can be seen as a special case of the external function' mechanism. Internal functions are represented as a record consisting of a sequential structure of statements, entry and exit points to this structure, and other book keeping devices. This sequential structure can either be a control flow graph or a linearised list of statements, depending on the language. Translations between intermediate language map syntaxes to syntaxes, and internal function representations to internal function representations explicitly. In the type above, this parameterisation is realised with the \texttt{params\_\_} record. As a result of this parameterisation, we have also added a degree of type safety' to the intermediate languages' syntaxes. In particular, we note that the \texttt{OP1} constructor expects quite a specific type, in that the two register arguments must both be the accumulator A. In particular, we note that the \texttt{OP1} constructor expects quite a specific type, in that the two register arguments must both be what passes for the accumulator A in that language. In some languages, for example LIN, this is the hardware accumulator, whilst in others this is any pseudoregister. Contrast this with the \texttt{INT} constructor, which expects a \texttt{generic\_reg}, corresponding to an arbitrary' register type. \paragraph{Shared code, reduced proofs} Many features of individual backend intermediate languages are shared with other intermediate languages. For instance, RTLabs, RTL, ERTL and LTL are all graph based languages, where functions are represented as a graph of statements that form their bodies. For instance, RTLabs, RTL, ERTL and LTL are all graph based languages, where functions are represented as a control flow graph of statements that form their bodies. Functions for adding statements to a graph, searching the graph, and so on, are remarkably similar across all languages, but are duplicated in the O'Caml code. \end{lstlisting} In particular, everything that can vary between differing intermediate languages has been parameterised. Here, we see the number of parameters, the listing of local variables, and the internal code representation has been parameterised. Here, we see the location where to fetch parameters, the listing of local variables, and the internal code representation has been parameterised. Other particulars are also parameterised, though here omitted. Hopefully this abstraction process will reduce the number of proofs that need to be written, dealing with internal functions. We only need to prove once that fetching a statement's successor is correct', and we inherit this property for free for every intermediate language. Hopefully this abstraction process will reduce the number of proofs that need to be written, dealing with internal functions of different languages characterised by parameters. \paragraph{Dependency on instruction selection} \end{lstlisting} The capitalised constructors---\texttt{INT}, \texttt{MOVE}, and so on---are all machine specific instructions. Retargetting the compiler to another microprocessor would entail replacing these constructors with constructors that correspond to the instructions of the new target. Retargetting the compiler to another microprocessor, improving instruction selection, or simply enlarging the subset of the machine language that the compiler can use, would entail replacing these constructors with constructors that correspond to the instructions of the new target. We feel that this makes which instructions are target dependent, and which are not (i.e. those language specific instructions that fall inside the \texttt{extension} constructor) much more explicit. In the long term, we would really like to try to directly embed the target language in the syntax, in order to reuse the target language's semantics. \paragraph{Independent development and testing} We have essentially modularised the intermediate languages in the compiler backend. As with any form of modularisation, we reap benefits in the ability to independently test and develop each intermediate language separately. As with any form of modularisation, we reap benefits in the ability to independently test and develop each intermediate language separately, with the benefit of fixing bugs just once. \paragraph{Future reuse for other compiler projects} Another advantage of our modularisation scheme is the ability to quickly use and reuse intermediate languages for other compiler projects. For instance, in creating a cost-preserving compiler for a functional language, we may choose to target the RTL language directly. Naturally, the register requirements for a functional language may differ from those of an imperative language, a reconfiguration which our parameterisation makes easy. For instance, in creating a cost-preserving compiler for a functional language, we may choose to target a linearised version of RTL directly. Adding such an intermediate language would involve the addition of just a few lines of code. \paragraph{Easy addition of new compiler passes} As generic code for the joint' language exists, for example to add statements to control flow graphs, this code can be reused for the new intermediate language. \paragraph{Possible language commutations} \paragraph{Possible commutations of translation passes} The backend translation passes of the CerCo compiler differ quite a bit from the CompCert compiler. In the CompCert compiler, linearisation occurs much earlier in the compilation chain, and passes such as register colouring and allocation are carried out on a linearised form of program. Contrast this with our own approach, where the code is represented as a graph for much longer. Similarly, in CompCert the calling conventions are enforced after register allocation, whereas we do register allocation before enforcing the calling convention. However, by abstracting the representation of intermediate functions, we are now much more free to reorder translation passes as we see fit. \label{subsect.use.of.dependent.types} We see three potential ways in which a compiler can fail to compile a program: We see two potential ways in which a compiler can fail to compile a program: \begin{enumerate} \item The program is malformed, and there is no hope of making sense of the program. \item A heuristic or algorithm in the compiler is implemented incorrectly, in which case an otherwise correct source program fails to be compiler to correct assembly code. \item An invariant in the compiler is invalidated. The compiler is buggy, or an invariant in the compiler is invalidated, for example, if the program is too large to fit into the processor's code memory. \end{enumerate} The first source of failure we are unable to do anything about. The latter two sources of failure should be interpreted as a compiler bug, and, as part of a verified compiler project, we'd like to rule out all such bugs. In CerCo, we aim to use dependent types to help us enforce invariants and prove our heuristics and algorithms correct. First, we encode informal invariants, or uses of \texttt{assert false} in the O'Caml code, with dependent types, converting partial functions into total functions. There are numerous examples of this throughout the backend. This was an assertion in the O'Caml code. Secondly, we make use of dependent types to make the Matita code easier to read, and eventually the proofs of correctness for the compiler easier to write. Secondly, we make use of dependent types to make the Matita code correct by construction, and eventually the proofs of correctness for the compiler easier to write. For instance, many intermediate languages in the backend of the compiler, from RTLabs to LTL, are graph based languages. Here, function definitions consist of a graph (i.e. a map from labels to statements) and a pair of labels denoting the entry and exit points of this graph. Namely, CompCert makes little use of dependent types to encode invariants. In contrast, we wish to make as much use of dependent types as possible, both to experiment with different ways of encoding compilers in a proof assistant, but also as a way of stress testing' Matita's support for dependent types. In particular, we would like to make use of Matita's support for Russell'-style reasoning. %-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-% It should be noted that these axiomatised components fall into the following pattern: whilst their implementation is complex, and their proof of correctness is difficult, we are able to quickly and easily verify that any answer that they provide is correct. As a result, we do not see this axiomatisation process as being too onerous. \item \textbf{A few non-computational proof obligations.} \label{sect.associated.changes.to.ocaml.compiler} At the moment, no changes we have made in the Matita backend have made their way back into the O'Caml compiler. We do not see the heavy process of modularisation and abstraction as making its way back into the O'Caml codebase, as this is a significant rewrite of the backend code. However, several bugfixes, and the identification of hidden invariants' in the O'Caml code will be incorporated back into the prototype. At the moment, only bugfixes, but no architectural changes we have made in the Matita backend have made their way back into the O'Caml compiler. We do not see the heavy process of modularisation and abstraction as making its way back into the O'Caml codebase, as this is a significant rewrite of the backend code that is supposed to yield the same code after instantiating parameters, anyway. %-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-% Translation specific files (files relating to language semantics have been omitted) are presented in Table~\ref{table.syntax}. \begin{landscape} \begin{table} \begin{threeparttable} Title & Description & O'Caml & Ratio \\ \hline \texttt{joint/Joint.ma} & Abstracted syntax for backend languages & N/A & N/A \\ \texttt{RTLabs/syntax.ma} & The syntax of RTLabs & \texttt{RTLabs/RTLabs.mli} & 0.65 \\ \texttt{joint/Joint.ma} & Abstracted syntax for backend languages & N/A & N/A \\ \texttt{RTL/RTL.ma} & The syntax of RTL & \texttt{RTL/RTL.mli} & 1.85\tnote{b} \\ \texttt{ERTL/ERTL.ma} & The syntax of ERTL & \texttt{ERTL/ERTL.mli} & 1.04\tnote{b} \\ \texttt{LTL/LTL.ma} & The specialisation of the above file to the syntax of LTL & \texttt{LTL/LTL.mli} & 1.86\tnote{a} \\ \texttt{LIN/joint\_LTL\_LIN.ma} & The syntax of the abstracted combined LTL and LIN language & N/A & N/A \\ \texttt{LTL/LTL.ma} & The specialisation of the above file to the syntax of LTL & \texttt{LTL/LTL.mli} & 1.86\tnote{a} \\ \texttt{LIN/LIN.ma} & The specialisation of the above file to the syntax of LIN & \texttt{LIN/LIN.mli} & 2.27\tnote{a} \\ \end{tabular} \begin{tablenotes} \item[a] Includes \texttt{joint/Joint\_LTL\_LIN.ma} and \texttt{joint/Joint.ma}. \item[b] Includes \texttt{joint/Joint.ma}. \item[b] Includes \texttt{joint/Joint.ma}.\\ Total lines of Matita code for the above files: 347. \\ Total lines of O'Caml code for the above files: 616. \\ Ration of total lines: 0.56. \end{tablenotes} \end{threeparttable} \label{table.syntax} \end{table} \end{landscape} Here, the O'Caml column denotes the O'Caml source file in the prototype compiler's implementation that corresponds to the Matita script in question. The ratios are the linecounts of the Matita file divided by the line counts of the corresponding O'Caml file. These are computed with \texttt{wc -l}, a standard Unix tool. Individual file's ratios are an over approximation, due to the fact that it's hard to relate an individual O'Caml file to the abstracted Matita code that has been spread across multiple files. The ratio between total Matita code lines and total O'Caml code lines is more reflective of the compressed and abstracted state of the Matita translation. Translations and utilities are presented in Table~\ref{table.translation.utilities}. \begin{landscape} Title & Description & O'Caml & Ratio \\ \hline \texttt{joint/TranslateUtils.ma} & Generic translation utilities & N/A & N/A \\ \texttt{RTLabs/RTLabsToRTL.ma} & Translation from RTLabs to RTL & \texttt{RTLabs/RTLabsToRTL.ml} & 1.61 \\ \texttt{joint/TranslateUtils.ma} & Generic translation utilities & N/A & N/A \\ \texttt{joint/Joint\_LTL\_LIN.ma} & Generic code for LTL and LIN languages & N/A & N/A \\ \texttt{RTL/RTLToERTL.ma} & Translation from RTL to ERTL & \texttt{RTL/RTLToERTL.ml} & 1.38\tnote{a}\tnote{b}\tnote{c} \\ \texttt{RTL/RTLtailcall.ma} & Elimination of tailcalls & \texttt{RTL/RTLtailcall.ml} & 2.01 \\ \texttt{ERTL/ERTLToLTL.ma} & Translation from ERTL to LTL & \texttt{ERTL/ERTLToLTL.ml} & 1.39\tnote{a}\tnote{b}\tnote{c}\tnote{d} \\ \texttt{ERTL/Interference.ma} & Axiomatised graph colouring component & \texttt{common/interference.ml} & 0.23\tnote{b} \\ \texttt{ERTL/liveness.ma} & Liveness analysis & \texttt{ERTL/liveness.ml} & 1.46\tnote{b} \\ \texttt{LTL/LTLToLIN.ma} & Translation from LTL to LIN & \texttt{LTL/LTLToLIN.ml} & 1.10 \tnote{a}\tnote{b}\tnote{c}\tnote{e}\tnote{f} \\ \texttt{LIN/LINToASM.ma} & Translation from LIN to assembly & \texttt{LIN/LINToASM.ml} & 4.06\tnote{a}\tnote{b}\tnote{c}\tnote{f} \texttt{RTL/RTLToERTL.ma} & Translation from RTL to ERTL & \texttt{RTL/RTLToERTL.ml} & 1.38\tnote{a} \\ \texttt{RTL/RTLTailcall.ma} & Elimination of tailcalls & \texttt{RTL/RTLtailcall.ml} & 2.01 \\ \texttt{ERTL/ERTLToLTL.ma} & Translation from ERTL to LTL & \texttt{ERTL/ERTLToLTL.ml} & 1.39\tnote{b} \\ \texttt{ERTL/Interference.ma} & Axiomatised graph colouring component & \texttt{common/interference.ml} & 0.23\tnote{c} \\ \texttt{ERTL/liveness.ma} & Liveness analysis & \texttt{ERTL/liveness.ml} & 1.46\tnote{c} \\ \texttt{LTL/LTLToLIN.ma} & Translation from LTL to LIN & \texttt{LTL/LTLToLIN.ml} & 1.10 \tnote{d} \\ \texttt{LIN/joint\_LTL\_LIN.ma} & Generic code for LTL and LIN languages & N/A & N/A \\ \texttt{LIN/LINToASM.ma} & Translation from LIN to assembly & \texttt{LIN/LINToASM.ml} & 4.06\tnote{e} \end{tabular} \begin{tablenotes} \item[a] Includes \texttt{joint/TranslateUtils.ma}. \item[b] Includes \texttt{joint/Joint.ma}. \item[c] Includes \texttt{joint/TranslateUtil.ma}. \item[d] Includes \texttt{ERTL/ERTLToLTLI.ml}. \item[e] Includes \texttt{LTL/LTLToLINI.ml}. \item[f] Includes \texttt{joint/joint\_LTL\_LIN.ma}. \item[a] Includes \texttt{joint/TranslateUtils.ma}, \texttt{joint/Joint.ma} and \texttt{joint/TranslateUtil.ma}. \item[b] Includes \texttt{joint/TranslateUtils.ma}, \texttt{joint/Joint.ma}, \texttt{joint/TranslateUtil.ma} and \texttt{ERTL/ERTLToLTLI.ml}. \item[c] Includes \texttt{joint/Joint.ma}. \item[d] Includes \texttt{joint/TranslateUtils.ma}, \texttt{joint/Joint.ma}, \texttt{joint/TranslateUtil.ma}, \texttt{LTL/LTLToLINI.ml} and \texttt{joint/joint\_LTL\_LIN.ma}. \item[e] Includes \texttt{joint/TranslateUtils.ma}, \texttt{joint/Joint.ma}, \texttt{joint/TranslateUtil.ma} and \texttt{joint/joint\_LTL\_LIN.ma}.\\ Total lines of Matita code for the above files: 3032. \\ Total lines of O'Caml code for the above files: 3332. \\ Ration of total lines: 0.91. \end{tablenotes} \end{threeparttable}