Changeset 1362


Ignore:
Timestamp:
Oct 12, 2011, 5:54:04 PM (8 years ago)
Author:
mulligan
Message:

d4-2 report almost complete

File:
1 edited

Legend:

Unmodified
Added
Removed
  • Deliverables/D4.2-4.3/reports/D4-2.tex

    r1361 r1362  
    144144The compiler's backend consists of four distinct intermediate languages: RTL, ERTL, LTL and LIN.
    145145A fifth language, RTLabs, serves as the exit point of the backend and the entry point of the frontend.
    146 RTL, ERTL and LTL are `graph based' languages, whereas LIN is a linearised language
     146RTL, ERTL and LTL are `graph based' languages, whereas LIN is a linearised language, the final language before translation to assembly.
     147
     148We now briefly discuss the properties of the intermediate languages, and discuss the various transformations that take place during the translation process:
     149
     150\paragraph{RTLabs ((Abstract) Register Transfer Language)}
     151As mentioned, this is the final language of the compiler's frontend and the entry point for the backend.
     152This language uses pseudoregisters, not hardware registers.
     153During the translation pass from RTLabs to RTL instruction selection is carried out.
    147154
    148155\paragraph{RTL (Register Transfer Language)}
     156This language uses pseudoregisters, not hardware registers.
     157Tailcall elimination is carried out during the translation from RTL to ERTL.
    149158
    150159\paragraph{ERTL (Extended Register Transfer Language)}
     160In this language most instructions still operate on pseudoregisters, apart from instructions that move data to, and from, the accumulator.
     161The ERTL to LTL pass performs the following transformations: liveness analysis, register colouring and register/stack slot allocation.
    151162
    152163\paragraph{LTL (Linearised Transfer Language)}
     164The name is somewhat of a misnomer, as the language is \emph{not} linearised, and is in fact still graph based, but uses hardware registers instead of pseudoregisters.
     165Tunnelling (branch compression) should be implemented here.
    153166
    154167\paragraph{LIN (Linearised)}
     168This is a linearised form of the LTL language; function graphs have been linearised into lists of statements.
     169All registers have been translated into hardware registers or stack addresses.
     170This is the final stage of compilation before translating directly into assembly language.
    155171
    156172%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
     
    184200Functions for adding statements to a graph, searching the graph, and so on, are remarkably similar across all languages, but are duplicated in the O'Caml code.
    185201
     202As a result, we chose to abstract the representation of internal functions for the RTL, ERTL, LTL and LIN intermediate languages into a `joint' representation.
     203This representation is parameterised by a record that dictates the layout of the function body for each intermediate language.
     204For instance, in RTL, the layout is graph like, whereas in LIN, the layout is a linearised list of statements.
     205Further, a generalised way of accessing the successor statement to the one currently under consideration is needed, and so forth.
     206
     207Our joint internal function record looks like so:
     208\begin{lstlisting}
     209record joint_internal_function (globals: list ident) (p:params globals) : Type[0] ≝
     210{
     211  ...
     212  joint_if_params   : paramsT p;
     213  joint_if_locals   : localsT p;
     214  ...
     215  joint_if_code     : codeT … p;
     216  ...
     217}.
     218\end{lstlisting}
     219In particular, everything that can vary between differing intermediate languages has been parameterised.
     220Here, we see the number of parameters, the listing of local variables, and the internal code representation has been parameterised.
     221Other particulars are also parameterised, though here omitted.
     222
     223Hopefully this abstraction process will reduce the number of proofs that need to be written, dealing with internal functions.
     224We only need to prove once that fetching a statement's successor is `correct', and we inherit this property for free for every intermediate language.
     225
    186226\paragraph{Changes between languages made explicit}
     227Due to the bureaucracy inherent in explicating each intermediate language's syntax in the O'Caml compiler, it can often be hard to see exactly what changes between each successive intermediate language.
     228By abstracting the syntax of the RTL, ERTL, LTL and LIN intermediate languages, we make these changes much clearer.
     229
     230Our abstraction takes the following form:
     231\begin{lstlisting}
     232inductive joint_instruction (p: params__) (globals: list ident): Type[0] :=
     233  | COMMENT: String $\rightarrow$ joint_instruction p globals
     234  | COST_LABEL: costlabel $\rightarrow$ joint_instruction p globals
     235  | INT: generic_reg p $\rightarrow$ Byte $\rightarrow$ joint_instruction p globals
     236  ...
     237  | extension: extend_statements p $\rightarrow$ joint_instruction p globals.
     238\end{lstlisting}
     239We first note that for the majority of intermediate languages, many instructions are shared.
     240However, these instructions expect different register types (either a pseudoregister or a hardware register) as arguments.
     241We must therefore parameterise the joint syntax with a record of parameters that will be specialised to each intermediate language.
     242In the type above, this parameterisation is realised wit the \texttt{params\_\_} record.
     243
     244Further, we note that some intermediate languages have language specific instructions (i.e. the instructions that change between languages).
     245We therefore add a new constructor to the syntax, \texttt{extension}, which expects a value of type \texttt{extend\_statements p}.
     246As \texttt{p} varies between intermediate languages, we can provide language specific extensions to the syntax of the joint language.
     247For example, ERTL's extended syntax consists of the following extra statements:
     248\begin{lstlisting}
     249inductive ertl_statement_extension: Type[0] :=
     250  | ertl_st_ext_new_frame: ertl_statement_extension
     251  | ertl_st_ext_del_frame: ertl_statement_extension
     252  | ertl_st_ext_frame_size: register $\rightarrow$ ertl_statement_extension.
     253\end{lstlisting}
     254These are further packaged into an ERTL specific instance of \texttt{params\_\_} as follows:
     255\begin{lstlisting}
     256definition ertl_params__: params__ :=
     257  mk_params__ register register ... ertl_statement_extension.
     258\end{lstlisting}
    187259
    188260\paragraph{Possible language commutations}
    189 
    190 \paragraph{Instruction selection}
     261The backend translation passes of the CerCo compiler differ quite a bit from the CompCert compiler.
     262In the CompCert compiler, linearisation occurs much earlier in the compilation chain, and passes such as register colouring and allocation are carried out on a linearised form of program.
     263Contrast this with our own approach, where the code is represented as a graph for much longer.
     264
     265However, by abstracting the representation of intermediate functions, we are now much more free to reorder translation passes as we see fit.
     266The linearisation process, for instance, now no longer cares about the specific representation of code in the source and target languages.
     267It just relies on a common interface.
     268We are therefore, in theory, free to pick where we wish to linearise our representation.
     269This adds an unusual flexibility into the compilation process, and allows us to freely experiment with different orderings of translation passes.
    191270
    192271%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
     
    269348A few difficult-to-close, but non-computational (i.e. they do not prevent us from executing the compiler inside Matita), proof obligations have been closed using daemons in the backend.
    270349These proof obligations originate with our use of dependent types for expressing invariants in the compiler.
     350\textbf{Branch compression (tunnelling).}
     351This was a feature of the O'Caml compiler.
     352It is not yet currently implemented in the Matita compiler.
     353This feature is only an optimisation, and will not affect the correctness of the compiler.
    271354\textbf{`Real' tailcalls}
    272355For the time being, tailcalls in the backend are translated to `vanilla' function calls during the ERTL to LTL pass.
     
    295378This should be routine.
    296379\item
    297 We plan to port the O'Caml compiler's implementation of tailcalls when this is completed.
     380We plan to port the O'Caml compiler's implementation of tailcalls when this is completed, and eventually port the branch compression code currently in the O'Caml compiler to the Matita implementation.
    298381This should not cause any major problems.
    299382\item
     
    338421\end{center}
    339422
     423%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
     424% SECTION.                                                                    %
     425%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
     426\subsection{Listing of important functions}
     427\label{subsect.listing.important.functions}
     428
    340429\end{document}
Note: See TracChangeset for help on using the changeset viewer.