Changeset 3663


Ignore:
Timestamp:
Mar 16, 2017, 3:59:16 PM (7 months ago)
Author:
mulligan
Message:

more work on compiler architecture section

File:
1 edited

Legend:

Unmodified
Added
Removed
  • Papers/jar-cerco-2017/architecture.tex

    r3660 r3663  
    1414Where appropriate, we make reference to the source documentation of these two compilers, to explain our intermediate languages.
    1515
     16Note: though many of the compiler intermediate languages (especially in the front end) are shared with CompCert, the architecture of the CerCo compiler is different in quite a substantial way from that of CompCert.
     17In particular, the CompCert compiler's backend `linearises' its code much earlier, compared to our own.
     18Indeed, as explained below, the CerCo compiler works with a control-flow graph based representation of the program almost until producing machine code, only linearising in the translation from LTL to LIN intermediate languages.
     19This is merely a design variation in our two compilers, and is not motivated by any real technical issue pertinent to the proof of correctness, or due to the lifting of cost models.
     20
    1621\subsection{Compiler front-end}
    1722
    1823The compiler front-end lowers a program written in C to a low-level language of control-flow graphs, ready for the entry-point of our compiler backend.
     24In total, there are three intermediate languages in the front-end.
     25
    1926The first translation step of the compiler lowers the full C language to a high-level intermediate language, Clight---this language will be explained further below.
    20 This translation phase is unverified, using the CIL C parsing utility~\cite{cil02} to parse a C file and produce a Clight AST.
    21 We therefore do not expand on this here, and only note its existence.
    22 
    23 \paragraph{Clight}
    24 Clight is a large subset of the C language, developed by the CompCert team for use in the CompCert verified C compiler, that we adopt as the source language of our compiler.
     27This translation phase is unverified, using the CIL C parsing utility~\cite{cil02} to handle the intricacies of parsing C.
     28We produce a Clight AST directly from a C source file, using this tool.
     29As this translation step is unverified we do not expand any further on it here, and only note its existence.
     30
     31\paragraph{Clight} is a large subset of the C language, developed by the CompCert team for use in the CompCert verified C compiler, that we adopt as the source language of our compiler.
    2532We refer to~\cite{compcert} for a formal definition of the Clight language.
    2633
     
    3239In order to limit the implementation effort of verifying the CerCo compiler, our current compiler for Clight does {\em not} cover the operators relating to the floating point type {\tt float}.
    3340
    34 \paragraph{Cminor}
    35 
    36 Cminor is a simple, low-level imperative language, comparable to a stripped-down, typeless variant of C.
     41\paragraph{Cminor} is a simple, low-level imperative language, comparable to a stripped-down, typeless variant of C.
    3742The Cminor intermediate language again is taken directly from the CompCert verified C compiler, and we refer to the CompCert project for its formal definition.
    3843Like Clight, the Cminor language is structured into expressions, statements, functions, and whole programs.
     
    4449This information is produced by a static analysis that determines the variables whose address may be `taken'.
    4550When translating from Clight to Cminor, space is reserved for local arrays and structures.
    46 In a second step, the proper compilation of Clight programs into Cminor programs performed: it consists mainly in translating Clight control structures to the more primitive control-flow constructs available in Cminor.
    47 
    48 \paragraph{RTLabs}
    49 
    50 RTLAbs is the last architecture independent language in the compilation process.
    51 It is a rather straightforward {\em abstraction} of the {\em architecture-dependent} RTL intermediate language available in the CompCert project and it is intended to factorise some work common to the various target assembly languages (e.g. optimisations) and thus to make retargeting of the compiler a simpler matter.
    52 
     51In a second step, the proper compilation of Clight programs into Cminor programs is performed: consisting mainly of the translation of Clight control structures to the more primitive control-flow constructs available in Cminor.
     52
     53\paragraph{RTLabs} is the last architecture independent language in the compilation process.
     54It is a rather straightforward abstraction of the architecture-dependent RTL intermediate language used in the CompCert project.
     55
     56RTLabs is intended to ease retargeting of the compiler---should that be desired---by factorising some work common to the various target assembly languages, for example optimisations.
    5357In RTLAbs the structure of Cminor expressions is lost.
    5458This may have a negative impact on the instruction selection steps that follow in the compiler backend.
     
    6973\subsection{Compiler back-end}
    7074
    71 \paragraph{RTL}
    72 
    73 \paragraph{ERTL}
    74 
    75 \paragraph{LTL}
    76 
    77 \paragraph{LIN}
    78 
    79 \subsection{Target hardware}
    80 
    81 
    82 \subsubsection{RTL}
    83 
    84 As in RTLAbs, the structure of RTL programs is based on CFGs. RTL is
    85 the first architecture-dependant intermediate language of our compiler
    86 which, in its current version, targets the MCS-51 assembly language.
    87 
    88 \paragraph{Syntax.}
    89 RTL is very close to RTLAbs. It is based on CFGs and explicits the MCS-51
    90 instructions corresponding to the RTLAbs instructions. Type information
    91 disappears: everything is represented using 32 bits integers. Moreover, each
    92 global of the program is associated to an offset. The syntax of the language can
    93 be found in table \ref{RTL:syntax}. The grammar rules $\textit{unop}$, $\textit{binop}$,
    94 $\textit{uncon}$, and $\textit{bincon}$, respectively, represent the sets of unary
    95 operations, binary operations, unary conditions and binary conditions of the
    96 MCS-51 language.
     75\paragraph{RTL} is the first architecture-dependant intermediate language of our compiler, and the entry point of the compiler back end.
     76RTL programs are structured as CFGs, like RTLabs.
     77
     78The syntax of RTL is very close to RTLAbs, but now MCS-51 instructions are made explicit.
     79Type information completely disappears: everything is represented using 32-bit words.
     80Moreover, each global variable of a program is associated to a fixed offset, as we begin to fix the eventual memory layout of the program in the processor's memory.
     81The syntax of the RTL language is provided in Table~\ref{RTL:syntax}.
     82Within the grammar $\textit{unop}$, $\textit{binop}$, $\textit{uncon}$, and $\textit{bincon}$, represent the sets of unary operations, binary operations, unary conditions and binary conditions of the MCS-51 machine code language, lifted into RTL, respectively.
    9783
    9884\begin{table}
     
    164150\end{table}
    165151
    166 \paragraph{Translation of RTLAbs to RTL.}
    167 This translation is mostly straightforward. A RTLAbs instruction is often
    168 directly translated to a corresponding MCS-51 instruction. There are a few
    169 exceptions: some RTLAbs instructions are expanded in two or more MCS-51
    170 instructions. When the translation of a RTLAbs instruction requires more than
    171 a few simple MCS-51 instruction, it is translated into a call to a function
    172 defined in the preamble of the compilation result.
    173 
    174 \subsubsection{ERTL}
    175 
    176 As in RTL, the structure of ERTL programs is based on CFGs. ERTL
    177 explicits the calling conventions of the MCS-51 assembly language.
    178 
    179 \paragraph{Syntax.}
    180 The syntax of the language is given in table \ref{ERTL:syntax}. The main
    181 difference between RTL and ERTL is the use of hardware registers.
    182 Parameters are passed in specific hardware registers; if there are too many
    183 parameters, the remaining are stored in the stack. Other conventionally specific
    184 hardware registers are used: a register that holds the result of a function, a
    185 register that holds the base address of the globals, a register that holds the
    186 address of the top of the stack, and some registers that need to be saved when
    187 entering a function and whose values are restored when leaving a
    188 function. Following these conventions, function calls do not list their
    189 parameters anymore; they only mention their number. Two new instructions appear
    190 to allocate and deallocate on the stack some space needed by a function to
    191 execute. Along with these two instructions come two instructions to fetch or
    192 assign a value in the parameter sections of the stack; these instructions cannot
    193 yet be translated using regular load and store instructions because we do not
    194 know the final size of the stack area of each function. At last, the return
    195 instruction has a boolean argument that tells whether the result of the function
    196 may later be used or not (this is exploited for optimizations).
     152Translating from RTLabs into RTL is mostly straightforward.
     153An RTLAbs instruction is often directly translated to a corresponding MCS-51 instruction.
     154However, there are a few exceptions: some RTLAbs instructions are expanded into multiple MCS-51 instructions.
     155When the translation of an RTLAbs instruction requires more than a few simple MCS-51 instructions, we generate a function implementing the RTLabs instruction, place it in the program's preamble before the entry point, and generate a call to the relevant function, instead of directly generating the instruction sequence.
     156
     157\paragraph{ERTL} programs are again structured as CFGs.
     158In ERTL, calling conventions of the MCS-51 machine language are made explicit.
     159We provide the syntax of the language in Table~\ref{ERTL:syntax}.
     160
     161The main difference between RTL and ERTL is the elimination of pseudo registers, and the use of hardware registers to pass arguments to functions.
     162Parameters are now passed in fixed hardware registers, tied directly to our target hardware.
     163If there are too many parameters, the remaining arguments to be passed are spilled, and are stored in the stack.
     164
     165Further, we now enforce a convention that certain other now-explicit hardware registers are used for certain tasks.
     166We fix registers holding the result of a function, holding the base address of the program's global variables, holding the address of the top of the stack.
     167We also make explicit that some registers must be saved and restored when entering or returning from a function, to avoid clobbering.
     168
     169ERTL function calls do not list their parameters anymore---they merely assert the number of arguments that are being passed.
     170Two new instructions are provided to allocate and deallocate on the stack some space needed by a function to execute.
     171Along with these two instructions come two instructions to fetch or assign a value in the parameter sections of the stack; these instructions cannot yet be translated using regular load and store instructions.
     172This is because we do not know the final size of the stack area of each function at this point in the compilation chain.
     173
     174Note that the ERTL function return instruction has a boolean argument that indicates whether the result of the function may be used later or not.
     175This is exploited in certain optimisations.
    197176
    198177\begin{table}
     
    270249\end{table}
    271250
    272 \paragraph{Translation of RTL to ERTL.}
    273 The work consists in expliciting the conventions previously mentioned. These
    274 conventions appear when entering, calling and leaving a function, and when
    275 referencing a global variable or the address of a local variable.
    276 
    277 \paragraph{Optimizations.}
    278 A \emph{liveness analysis} is performed on ERTL to replace unused
    279 instructions by a $\textsf{skip}$. An instruction is tagged as unused when it
    280 performs an assignment on a register that will not be read afterwards. Also, the
    281 result of the liveness analysis is exploited by a \emph{register allocation}
    282 algorithm whose result is to efficiently associate a physical location (a
    283 hardware register or an address in the stack) to each pseudo register of the
    284 program.
     251The majority of the work involved in translating from RTL to ERTL involves making the MCS-51 calling conventions explicit.
     252These conventions appear when entering, calling and leaving a function, and when referencing a global variable or the address of a local variable.
     253
     254At this point, we also perform a \emph{liveness analysis} on ERTL to replace unused instructions by a $\textsf{skip}$, or dummy instruction with no effect on the program state.
     255Our analysis declares an instruction as unused when it performs an assignment on a register that will not be read afterwards.
     256The result of this liveness analysis is also exploited by the \emph{register allocation} algorithm whose result is to efficiently associate a physical location (a hardware register or an address in the stack) to each pseudo register of the input RTL program.
     257
     258\paragraph{LTL}
     259
     260\paragraph{LIN}
     261
     262\subsection{Target hardware}
    285263
    286264\subsubsection{LTL}
Note: See TracChangeset for help on using the changeset viewer.