Ignore:
Timestamp:
Oct 14, 2011, 3:51:09 PM (8 years ago)
Author:
mulligan
Message:

changes to file based on claudio's suggestions

File:
1 edited

Legend:

Unmodified
Added
Removed
  • Deliverables/D4.2-4.3/reports/D4-2.tex

    r1366 r1373  
    142142\label{subsect.brief.overview.backend.compilation.chain}
    143143
    144 The compiler's backend consists of four distinct intermediate languages: RTL, ERTL, LTL and LIN.
    145 A fifth language, RTLabs, serves as the exit point of the backend and the entry point of the frontend.
    146 RTL, ERTL and LTL are `graph based' languages, whereas LIN is a linearised language, the final language before translation to assembly.
     144The Matita compiler's backend consists of five distinct intermediate languages: RTL, RTLntl, ERTL, LTL and LIN.
     145A fifth language, RTLabs, serves as the entry point of the backend and the exit point of the frontend.
     146RTL, RTLntl, ERTL and LTL are `graph based' languages, whereas LIN is a linearised language, the final language before translation to assembly.
    147147
    148148We now briefly discuss the properties of the intermediate languages, and discuss the various transformations that take place during the translation process:
     
    150150\paragraph{RTLabs ((Abstract) Register Transfer Language)}
    151151As mentioned, this is the final language of the compiler's frontend and the entry point for the backend.
    152 This language uses pseudoregisters, not hardware registers.
    153 During the translation pass from RTLabs to RTL instruction selection is carried out.
     152This language uses pseudoregisters, not hardware registers.\footnote{There are an unbounded number of pseudoregisters.  Pseudoregisters are converted to hardware registers of stack positions during register allocation.}
     153Functions still use stackframes, where arguments are passed on the stack and results are stored in addresses.
     154During the pass to RTL, these are eliminated, and instruction selection is carried out.
    154155
    155156\paragraph{RTL (Register Transfer Language)}
    156157This language uses pseudoregisters, not hardware registers.
    157 Tailcall elimination is carried out during the translation from RTL to ERTL.
     158Tailcall elimination is carried out during the translation from RTL to RTLntl.
     159
     160\paragraph{RTLntl (Register Transfer Language --- No Tailcalls)}
     161This language is a pseudoregister, graph based language where all tailcalls are eliminated.
     162RTLntl is not present in the O'Caml compiler.
    158163
    159164\paragraph{ERTL (Extended Register Transfer Language)}
     
    161166The ERTL to LTL pass performs the following transformations: liveness analysis, register colouring and register/stack slot allocation.
    162167
    163 \paragraph{LTL (Linearised Transfer Language)}
    164 The name is somewhat of a misnomer, as the language is \emph{not} linearised, and is in fact still graph based, but uses hardware registers instead of pseudoregisters.
     168\paragraph{LTL (Linearisable Transfer Language)}
     169Another graph based language, but uses hardware registers instead of pseudoregisters.
    165170Tunnelling (branch compression) should be implemented here.
    166171
     
    189194The O'Caml compiler is written in the following manner.
    190195Each intermediate language has its own dedicated syntax, notions of internal function, and so on.
    191 Translations map syntaxes to syntaxes, and internal functions to internal functions explicitly.
     196Here, we make a distinction between `internal functions'---other functions that are explicitly written by the programmer, and `external functions', which belong to external library and require explictly linking.
     197Internal functions are represented as a record, consisting of a sequential structure, of some description, of statements, entry and exit points to this structure, and other book keeping devices.
     198Translations between intermediate language map syntaxes to syntaxes, and internal function representations to internal function representations explicitly.
    192199
    193200This is a perfectly valid way to write a compiler, where everything is made explicit, but writing a \emph{verified} compiler poses new challenges.
    194201In particular, we must look ahead to see how our choice of encodings will affect the size and complexity of the forthcoming proofs of correctness.
    195202We now discuss some abstractions, introduced in the Matita code, which we hope will make our proofs shorter, amongst other benefits.
    196 
    197 \paragraph{Shared code, reduced proofs}
    198 Many features of individual backend intermediate languages are shared with other intermediate languages.
    199 For instance, RTLabs, RTL, ERTL and LTL are all graph based languages, where functions are represented as a graph of statements that form their bodies.
    200 Functions for adding statements to a graph, searching the graph, and so on, are remarkably similar across all languages, but are duplicated in the O'Caml code.
    201 
    202 As a result, we chose to abstract the representation of internal functions for the RTL, ERTL, LTL and LIN intermediate languages into a `joint' representation.
    203 This representation is parameterised by a record that dictates the layout of the function body for each intermediate language.
    204 For instance, in RTL, the layout is graph like, whereas in LIN, the layout is a linearised list of statements.
    205 Further, a generalised way of accessing the successor statement to the one currently under consideration is needed, and so forth.
    206 
    207 Our joint internal function record looks like so:
    208 \begin{lstlisting}
    209 record joint_internal_function (globals: list ident) (p:params globals) : Type[0] ≝
    210 {
    211   ...
    212   joint_if_params   : paramsT p;
    213   joint_if_locals   : localsT p;
    214   ...
    215   joint_if_code     : codeT … p;
    216   ...
    217 }.
    218 \end{lstlisting}
    219 In particular, everything that can vary between differing intermediate languages has been parameterised.
    220 Here, we see the number of parameters, the listing of local variables, and the internal code representation has been parameterised.
    221 Other particulars are also parameterised, though here omitted.
    222 
    223 Hopefully this abstraction process will reduce the number of proofs that need to be written, dealing with internal functions.
    224 We only need to prove once that fetching a statement's successor is `correct', and we inherit this property for free for every intermediate language.
    225203
    226204\paragraph{Changes between languages made explicit}
     
    232210inductive joint_instruction (p: params__) (globals: list ident): Type[0] :=
    233211  | COMMENT: String $\rightarrow$ joint_instruction p globals
    234   | COST_LABEL: costlabel $\rightarrow$ joint_instruction p globals
     212  ...
    235213  | INT: generic_reg p $\rightarrow$ Byte $\rightarrow$ joint_instruction p globals
     214  ...
     215  | OP1: Op1 → acc_a_reg p → acc_a_reg p → joint_instruction p globals
    236216  ...
    237217  | extension: extend_statements p $\rightarrow$ joint_instruction p globals.
     
    240220However, these instructions expect different register types (either a pseudoregister or a hardware register) as arguments.
    241221We must therefore parameterise the joint syntax with a record of parameters that will be specialised to each intermediate language.
    242 In the type above, this parameterisation is realised wit the \texttt{params\_\_} record.
     222In the type above, this parameterisation is realised with the \texttt{params\_\_} record.
     223As a result of this parameterisation, we have also added a degree of `type safety' to the intermediate languages' syntaxes.
     224In particular, we note that the \texttt{OP1} constructor expects quite a specific type, in that the two register arguments must both be the accumulator A.
     225Contrast this with the \texttt{INT} constructor, which expects a \texttt{generic\_reg}, corresponding to an `arbitrary' register type.
    243226
    244227Further, we note that some intermediate languages have language specific instructions (i.e. the instructions that change between languages).
     
    258241\end{lstlisting}
    259242
     243\paragraph{Shared code, reduced proofs}
     244Many features of individual backend intermediate languages are shared with other intermediate languages.
     245For instance, RTLabs, RTL, ERTL and LTL are all graph based languages, where functions are represented as a graph of statements that form their bodies.
     246Functions for adding statements to a graph, searching the graph, and so on, are remarkably similar across all languages, but are duplicated in the O'Caml code.
     247
     248As a result, we chose to abstract the representation of internal functions for the RTL, ERTL, LTL and LIN intermediate languages into a `joint' representation.
     249This representation is parameterised by a record that dictates the layout of the function body for each intermediate language.
     250For instance, in RTL, the layout is graph like, whereas in LIN, the layout is a linearised list of statements.
     251Further, a generalised way of accessing the successor statement to the one currently under consideration is needed, and so forth.
     252
     253Our joint internal function record looks like so:
     254\begin{lstlisting}
     255record joint_internal_function (globals: list ident) (p:params globals) : Type[0] ≝
     256{
     257  ...
     258  joint_if_params   : paramsT p;
     259  joint_if_locals   : localsT p;
     260  ...
     261  joint_if_code     : codeT … p;
     262  ...
     263}.
     264\end{lstlisting}
     265In particular, everything that can vary between differing intermediate languages has been parameterised.
     266Here, we see the number of parameters, the listing of local variables, and the internal code representation has been parameterised.
     267Other particulars are also parameterised, though here omitted.
     268
     269Hopefully this abstraction process will reduce the number of proofs that need to be written, dealing with internal functions.
     270We only need to prove once that fetching a statement's successor is `correct', and we inherit this property for free for every intermediate language.
     271
     272\paragraph{Dependency on instruction selection}
     273We note that the backend languages are all essentially `post instruction selection languages'.
     274The `joint' syntax makes this especially clear.
     275For instance, in the definition:
     276\begin{lstlisting}
     277inductive joint_instruction (p:params__) (globals: list ident): Type[0] ≝
     278  ...
     279  | INT: generic_reg p → Byte → joint_instruction p globals
     280  | MOVE: pair_reg p → joint_instruction p globals
     281  ...
     282  | PUSH: acc_a_reg p → joint_instruction p globals
     283  ...
     284  | extension: extend_statements p → joint_instruction p globals.
     285\end{lstlisting}
     286The capitalised constructors---\texttt{INT}, \texttt{MOVE}, and so on---are all machine specific instructions.
     287Retargetting the compiler to another microprocessor would entail replacing these constructors with constructors that correspond to the instructions of the new target.
     288We feel that this makes which instructions are target dependent, and which are not (i.e. those language specific instructions that fall inside the \texttt{extension} constructor) much more explicit.
     289
     290\paragraph{Independent development and testing}
     291We have essentially modularised the intermediate languages in the compiler backend.
     292As with any form of modularisation, we reap benefits in the ability to independently test and develop each intermediate language separately.
     293
     294\paragraph{Future reuse for other compiler projects}
     295Another advantage of our modularisation scheme is the ability to quickly use and reuse intermediate languages for other compiler projects.
     296For instance, in creating a cost-preserving compiler for a functional language, we may choose to target the RTL language directly.
     297Naturally, the register requirements for a functional language may differ from those of an imperative language, a reconfiguration which our parameterisation makes easy.
     298
     299\paragraph{Easy addition of new compiler passes}
     300Under our modularisation and abstraction scheme, new compiler passes can easily be injected into the backend.
     301We have a concrete example of this in the RTLntl language, an intermediate language that was not present in the original O'Caml code.
     302To specify a new intermediate language we must simply specify, through the use of the statement extension mechanism, what differs in the new intermediate language from the `joint' language, and configure a new notion of internal function record, by specialising parameters, to the new language.
     303As generic code for the `joint' language exists, for example to add statements to control flow graphs, this code can be reused for the new intermediate language.
     304
    260305\paragraph{Possible language commutations}
    261306The backend translation passes of the CerCo compiler differ quite a bit from the CompCert compiler.
     
    275320\label{subsect.use.of.dependent.types}
    276321
    277 We use dependent types in the backend for three reasons.
     322We see three potential ways in which a compiler can fail to compile a program:
     323\begin{enumerate}
     324\item
     325The program is malformed, and there is no hope of making sense of the program.
     326\item
     327A heuristic or algorithm in the compiler is implemented incorrectly, in which case an otherwise correct source program fails to be compiler to correct assembly code.
     328\item
     329An invariant in the compiler is invalidated.
     330\end{enumerate}
     331The first source of failure we are unable to do anything about.
     332The latter two sources of failure should be interpreted as a compiler bug, and, as part of a verified compiler project, we'd like to rule out all such bugs.
     333In CerCo, we aim to use dependent types to help us enforce invariants and prove our heuristics and algorithms correct.
    278334
    279335First, we encode informal invariants, or uses of \texttt{assert false} in the O'Caml code, with dependent types, converting partial functions into total functions.
     
    329385In particular, the compiler does not support the floating point datatype, nor accompanying functions over that datatype.
    330386At the moment, frontend languages within the compiler possess constructors corresponding to floating point code.
    331 These are removed during instruction selection (in the RTLabs to RTL transformation) using a daemon.
     387These are removed during instruction selection (in the RTLabs to RTL transformation) using a daemon.\footnote{A Girardism.  An axiom of type \texttt{False}, from which we can prove anything.}
    332388However, at some point, we would like the front end of the compiler to recognise programs that use floating point code and reject them as being invalid.
    333 \item
    334 \textbf{Functions and operations on datatypes that are implemented in the C runtime.}
    335 The compiler emits only a subset of the instructions available in the MCS-51's instruction architecture.
    336 In particular, integer modulus at the C source level is transformed into a call to a runtime function implementing the modulus operation during the translation to C-light.
    337 
    338 However, all datatypes corresponding to valid C operations over integers and floats mention integer modulus in the backend, and the translation of these operations is discharged using a daemon.
    339 We have plans to dispense with this `precooking' processs at the C-light level and move the translation of these operations into the RTLabs to RTL pass, where they can be translated properly.
    340389\item
    341390\textbf{Axiomatised components that will be implemented using external oracles.}
     
    344393Instead, these components are axiomatised, along with the properties that they need to satisfy in order for the rest of the compilation chain to be correct.
    345394These axiomatised components are found in the ERTL to LTL pass.
     395
     396It should be noted that these axiomatised components fall into the following pattern: whilst their implementation is complex, and their proof of correctness is difficult, we are able to quickly and easily verify that any answer that they provide is correct.
     397As a result, we do not see this axiomatisation process as being too onerous.
    346398\item
    347399\textbf{A few non-computational proof obligations.}
    348400A few difficult-to-close, but non-computational (i.e. they do not prevent us from executing the compiler inside Matita), proof obligations have been closed using daemons in the backend.
    349401These proof obligations originate with our use of dependent types for expressing invariants in the compiler.
     402However, here, it should be mentioned that many open proof obligations are simply impossible to close until we start to obtain stronger invariants from the proof of correctness for the compiler proper.
     403In particular, in the RTLabs to RTL pass, several proof obligations relating to lists of registers stored in a `local environment' appear to fall into this pattern.
     404\item
    350405\textbf{Branch compression (tunnelling).}
    351406This was a feature of the O'Caml compiler.
    352407It is not yet currently implemented in the Matita compiler.
    353408This feature is only an optimisation, and will not affect the correctness of the compiler.
     409\item
    354410\textbf{`Real' tailcalls}
    355411For the time being, tailcalls in the backend are translated to `vanilla' function calls during the ERTL to LTL pass.
     
    361417% SECTION.                                                                    %
    362418%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
     419\section{Associated changes to O'Caml compiler}
     420\label{sect.associated.changes.to.ocaml.compiler}
     421
     422At the moment, no changes we have made in the Matita backend have made their way back into the O'Caml compiler.
     423We do not see the heavy process of modularisation and abstraction as making its way back into the O'Caml codebase, as this is a significant rewrite of the backend code.
     424However, several bugfixes, and the identification of `hidden invariants' in the O'Caml code will be incorporated back into the prototype.
     425
     426%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
     427% SECTION.                                                                    %
     428%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
    363429\section{Future work}
    364430\label{sect.future.work}
     
    376442\item
    377443We plan to close all existing proof obligations that are closed using daemons, arising from our use of dependent types in the backend.
    378 This should be routine.
     444However, many may not be closable until we have completed Deliverable D4.4, the certification of the whole compiler, as we may not have invariants strong enough at the present time.
    379445\item
    380446We plan to port the O'Caml compiler's implementation of tailcalls when this is completed, and eventually port the branch compression code currently in the O'Caml compiler to the Matita implementation.
     
    382448\item
    383449We plan to validate the backend translations, removing any obvious bugs, by executing the translation inside Matita on small C programs.
     450This is not critical, as the certification process will find all bugs anyway.
    384451\end{itemize}
    385452
     
    398465\label{subsect.listing.files}
    399466
    400 Translation specific files (files relating to language semantics have been omitted):
    401 \begin{center}
    402 \begin{tabular*}{0.9\textwidth}{p{5cm}p{8cm}}
    403 Title & Description \\
    404 \hline
    405 \texttt{RTLabs/syntax.ma} & The syntax of RTLabs \\
    406 \texttt{RTLabs/RTLabsToRTL.ma} & The translation from RTLabs to RTL \\
    407 \texttt{joint/Joint.ma} & Abstracted syntax for backend languages \\
    408 \texttt{joint/TranslateUtils.ma} & Generic translation utilities \\
    409 \texttt{RTL/RTL.ma} & The syntax of RTL \\
    410 \texttt{RTL/RTLToERTL.ma} & The translation from RTL to ERTL \\
    411 \texttt{RTL/RTLtailcall.ma} & Elimination of tailcalls \\
    412 \texttt{ERTL/ERTL.ma} & The syntax of ERTL \\
    413 \texttt{ERTL/ERTLToLTL.ma} & The translation from ERTL to LTL \\
    414 \texttt{ERTL/Interference.ma} & Axiomatised graph colouring component \\
    415 \texttt{ERTL/liveness.ma} & Liveness analysis \\
    416 \texttt{LTL/LTL.ma} & The syntax of LTL \\
    417 \texttt{LTL/LTLToLIN.ma} & The translation from LTL to LIN \\
    418 \texttt{LIN/LIN.ma} & The syntax of LIN \\
    419 \texttt{LIN/LINToASM.ma} & The translation from LIN to assembly language
    420 \end{tabular*}
    421 \end{center}
     467Translation specific files (files relating to language semantics have been omitted).
     468Syntax:
     469\begin{center}
     470\begin{tabular*}{\textwidth}{p{3.5cm}p{5.5cm}p{3.5cm}p{1cm}}
     471Title & Description & O'Caml & Ratio \\
     472\hline
     473\texttt{RTLabs/syntax.ma} & The syntax of RTLabs & \texttt{RTLabs/RTLabs.mli} & 0.65 \\
     474\texttt{joint/Joint.ma} & Joint syntax for backend languages & N/A & N/A \\
     475\texttt{RTL/RTL.ma} & The syntax of RTL & \texttt{RTL/RTL.mli} & 0.41 \\
     476\texttt{ERTL/ERTL.ma} & The syntax of ERTL & \texttt{ERTL/ERTL.mli} & 0.13 \\
     477\texttt{LTL/LTL.ma} & The syntax of LTL & \texttt{LTL/LTL.mli} & 0.13 \\
     478\texttt{LIN/LIN.ma} & The syntax of LIN & \texttt{LIN/LIN.mli} & 0.36
     479\end{tabular*}
     480\end{center}
     481Here, the O'Caml column denotes the O'Caml source file in the prototype compiler's implementation that corresponds to the Matita script in question.
     482The ratios are the linecounts of the Matita file divided by the line counts of the corresponding O'Caml file.
     483These are computed with \texttt{wc -l}, a standard Unix tool.
     484
     485\noindent
     486Translations and utilities:
     487\begin{center}
     488\begin{tabular*}{\textwidth}{p{4.5cm}p{4.5cm}p{4.5cm}p{1cm}}
     489Title & Description & O'Caml & Ratio \\
     490\hline
     491\texttt{RTLabs/RTLabsToRTL.ma} & The translation from RTLabs to RTL & \texttt{RTLabs/RTLabsToRTL.ml} & 1.61 \\
     492\texttt{joint/TranslateUtils.ma} & Generic translation utilities & N/A & N/A \\
     493\texttt{RTL/RTLToERTL.ma} & The translation from RTL to ERTL & \texttt{RTL/RTLToERTL.ml} & 0.88 \\
     494\texttt{RTL/RTLtailcall.ma} & Elimination of tailcalls & \texttt{RTL/RTLtailcall.ml} & 2.08 \\
     495\texttt{ERTL/ERTLToLTL.ma} & The translation from ERTL to LTL & \texttt{ERTL/ERTLToRTL.ml} & 3.46 \\
     496\texttt{ERTL/Interference.ma} & Axiomatised graph colouring component & \texttt{common/interference.ml} & 0.03\footnote{The majority of this file is axiomatised.} \\
     497\texttt{ERTL/liveness.ma} & Liveness analysis & \texttt{ERTL/liveness.ml} & 0.92 \\
     498\texttt{LTL/LTLToLIN.ma} & The translation from LTL to LIN & \texttt{LTL/LTLToLIN.ml} & 0.75 \\
     499\texttt{LIN/LINToASM.ma} & The translation from LIN to assembly language & \texttt{LIN/LINToASM.ml} 2.45 &
     500\end{tabular*}
     501\end{center}
     502Given that Matita code is much more verbose than O'Caml code, with explicit typing and inline proofs, we have achieved respectable line count ratios in the translation.
    422503
    423504%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
     
    427508\label{subsect.listing.important.functions.and.axioms}
    428509
    429 We list some important functions in the backend compilation:
     510We list some important functions and axioms in the backend compilation:
    430511
    431512\paragraph{From RTL/RTLabsToRTL.ma}
Note: See TracChangeset for help on using the changeset viewer.