Changeset 3660


Ignore:
Timestamp:
Mar 16, 2017, 2:28:50 PM (3 months ago)
Author:
mulligan
Message:

more work on compiler architecture section

File:
1 edited

Legend:

Unmodified
Added
Removed
  • Papers/jar-cerco-2017/architecture.tex

    r3643 r3660  
    88\label{sect.compiler.architecture}
    99
    10 \subsection{The compiler}
    11 
    12 This section gives an informal overview of the compiler, in particular it
    13 highlights the main features of the intermediate languages, the purpose of the
    14 compilation steps, and the optimisations.
    15 
    16 \subsubsection{Clight}
    17 Clight is a large subset of the C language that we adopt as
    18 the source language of our compiler.
    19 It features most of the types and operators
    20 of C. It includes pointer arithmetic, pointers to
    21 functions, and \texttt{struct} and \texttt{union} types, as well as
    22 all C control structures. The main difference with the C
    23 language is that Clight expressions are side-effect free, which
    24 means that side-effect operators ($\texttt{=}$,$\texttt{+=}$,$\texttt{++}$,$\ldots$) and function calls
    25 within expressions are not supported.
    26 Given a C program, we rely on the CIL tool~\cite{cil02} to deal
    27 with the idiosyncrasy of  C concrete syntax and to produce an
    28 equivalent program in Clight abstract syntax.
    29 We refer to  the CompCert project \cite{compcert}
    30 for a formal definition of the
    31 Clight language. Here we just recall  in
    32 figure \ref{syntax-clight} its syntax which is
    33 classically structured in expressions, statements, functions,
    34 and whole programs. In order to limit the implementation effort,
    35 our current compiler for Clight does {\em not} cover the operators
    36 relating to the floating point type {\tt float}. So, in a nutshell,
    37 the fragment of C we have implemented is Clight without
    38 floating point.
    39 
    40 
    41 
    42 \begin{figure}
    43   \label{syntax-clight}
    44   \footnotesize{
    45   \begin{tabular}{l l l l}
    46     Expressions: & $a$ ::= & $id$               & variable identifier \\
    47     & & $|$ $n$                                 & integer constant \\
    48 %    & & $|$ $f$                                & float constant \\
    49     & & $|$ \texttt{sizeof}($\tau$)             & size of a type \\
    50     & & $|$ $op_1$ $a$                          & unary arithmetic operation \\
    51     & & $|$ $a$ $op_2$ $a$                      & binary arithmetic operation \\
    52     & & $|$ $*a$                                & pointer dereferencing \\
    53     & & $|$ $a.id$                              & field access \\
    54     & & $|$ $\&a$                               & taking the address of \\
    55     & & $|$ $(\tau)a$                           & type cast \\
    56     & & $|$ $a ? a : a$                         & conditional expression \\[10pt]
    57     Statements: & $s$ ::= & \texttt{skip}       & empty statement \\
    58     & & $|$ $a=a$                               & assignment \\
    59     & & $|$ $a=a(a^*)$                          & function call \\
    60     & & $|$ $a(a^*)$                            & procedure call \\
    61     & & $|$ $s;s$                               & sequence \\
    62     & & $|$ \texttt{if} $a$ \texttt{then} $s$ \texttt{else} $s$ & conditional \\
    63     & & $|$ \texttt{switch} $a$ $sw$            & multi-way branch \\
    64     & & $|$ \texttt{while} $a$ \texttt{do} $s$  & ``while'' loop \\
    65     & & $|$ \texttt{do} $s$ \texttt{while} $a$  & ``do'' loop \\
    66     & & $|$ \texttt{for}($s$,$a$,$s$) $s$       & ``for'' loop\\
    67     & & $|$ \texttt{break}                      & exit from current loop \\
    68     & & $|$ \texttt{continue}                   & next iteration of the current loop \\
    69     & & $|$ \texttt{return} $a^?$               & return from current function \\
    70     & & $|$ \texttt{goto} $lbl$                 & branching \\
    71     & & $|$ $lbl$ : $s$                         & labelled statement \\[10pt]
    72     Switch cases: & $sw$ ::= & \texttt{default} : $s$           & default case \\
    73     & & $|$ \texttt{case } $n$ : $s;sw$                         & labelled case \\[10pt]
    74     Variable declarations: & $dcl$ ::= & $(\tau\quad id)^*$             & type and name\\[10pt]
    75     Functions: & $Fd$ ::= & $\tau$ $id(dcl)\{dcl;s\}$   & internal function \\
    76     & & $|$ \texttt{extern} $\tau$ $id(dcl)$                    & external function\\[10pt]
    77     Programs: & $P$ ::= & $dcl;Fd^*;\texttt{main}=id$           & global variables, functions, entry point
    78   \end{tabular}}
    79   \caption{Syntax of the Clight language}
    80 \end{figure}
    81 
    82 
    83 \subsubsection{Cminor}
    84 
    85 Cminor is a simple, low-level imperative language, comparable to a
    86 stripped-down, typeless variant of C. Again we refer
    87 to the CompCert project for its formal definition
    88 and we just recall in figure \ref{syntax-cminor}
    89 its syntax which as for Clight is structured in
    90 expressions, statements, functions, and whole programs.
    91 
    92 \begin{figure}
    93   \label{syntax-cminor}
    94   \footnotesize{
    95   \begin{tabular}{l l l l}
    96     Signatures: & $sig$ ::= & \texttt{sig} $\vec{\texttt{int}}$ $(\texttt{int}|\texttt{void})$ & arguments and result \\[10pt]
    97     Expressions: & $a$ ::= & $id$               & local variable \\
    98      & & $|$ $n$                                & integer constant \\
    99 %     & & $|$ $f$                               & float constant \\
    100      & & $|$ \texttt{addrsymbol}($id$)          & address of global symbol \\
    101      & & $|$ \texttt{addrstack}($\delta$)       & address within stack data \\
    102      & & $|$ $op_1$ $a$                         & unary arithmetic operation \\
    103      & & $|$ $op_2$ $a$ $a$                     & binary arithmetic operation \\
    104      & & $|$ $\kappa[a]$                        & memory read\\
    105      & & $|$ $a?a:a$                    & conditional expression \\[10pt]
    106     Statements: & $s$ ::= & \texttt{skip}       & empty statement \\
    107     & & $|$ $id=a$                              & assignment \\
    108     & & $|$ $\kappa[a]=a$                       & memory write \\
    109     & & $|$ $id^?=a(\vec{a}):sig$               & function call \\
    110     & & $|$ \texttt{tailcall} $a(\vec{a}):sig$  & function tail call \\
    111     & & $|$ \texttt{return}$(a^?)$              & function return \\
    112     & & $|$ $s;s$                               & sequence \\
    113     & & $|$ \texttt{if} $a$ \texttt{then} $s$ \texttt{else} $s$         & conditional  \\
    114     & & $|$ \texttt{loop} $s$                   & infinite loop \\
    115     & & $|$ \texttt{block} $s$          & block delimiting \texttt{exit} constructs \\
    116     & & $|$ \texttt{exit} $n$                   & terminate the $(n+1)^{th}$ enclosing block \\
    117     & & $|$ \texttt{switch} $a$ $tbl$           & multi-way test and exit\\
    118     & & $|$ $lbl:s$                             & labelled statement \\
    119     & & $|$ \texttt{goto} $lbl$                 & jump to a label\\[10pt]
    120     Switch tables: & $tbl$ ::= & \texttt{default:exit}($n$) & \\
    121     & & $|$ \texttt{case} $i$: \texttt{exit}($n$);$tbl$ & \\[10pt]
    122     Functions: & $Fd$ ::= & \texttt{internal} $sig$ $\vec{id}$ $\vec{id}$ $n$ $s$ & internal function: signature, parameters, \\
    123     & & & local variables, stack size and body \\
    124     & & $|$ \texttt{external} $id$ $sig$ & external function \\[10pt]
    125     Programs: & $P$ ::= & \texttt{prog} $(id=data)^*$ $(id=Fd)^*$ $id$   & global variables, functions and entry point
    126   \end{tabular}}
    127   \caption{Syntax of the Cminor language}
    128 \end{figure}
    129 
    130 \paragraph{Translation of Clight to Cminor.}
    131 As in Cminor stack operations are made explicit, one has to know
    132 which variables are stored in the stack. This
    133 information is produced by a static analysis that determines
    134 the variables whose address may be `taken'.
    135 Also space is reserved for local arrays and structures. In a
    136 second step, the proper compilation is performed: it consists mainly
    137 in translating Clight control structures to the basic
    138 ones available in Cminor.
    139 
    140 \subsubsection{RTLAbs}
    141 
    142 RTLAbs is the last architecture independent language in the
    143 compilation process.  It is a rather straightforward {\em abstraction} of
    144 the {\em architecture-dependent} RTL intermediate language
    145 available in the CompCert project and it is intended to factorize
    146 some work common to the various target assembly languages
    147 (e.g. optimizations) and thus to make retargeting of the compiler a
    148 simpler matter.
    149 
    150 We stress that in RTLAbs the structure of Cminor expressions
    151 is lost and that this may have a negative impact on the following
    152 instruction selection step.
    153 Still, the subtleties of instruction selection seem rather orthogonal
    154 to our goals and we deem the possibility of retargeting
    155 easily the compiler more important than the efficiency of the generated code.
    156 
    157 
    158 
    159 \paragraph{Syntax.}
    160 In RTLAbs, programs are represented as \emph{control flow
    161   graphs} (CFGs for short). We associate with the nodes of the graphs
    162 instructions reflecting the Cminor commands. As usual, commands that change the control
    163 flow of the program (e.g. loops, conditionals) are translated by inserting
    164 suitable branching instructions in the CFG.
    165 The syntax of the language is depicted in table
    166 \ref{RTLAbs:syntax}. Local variables are now represented by \emph{pseudo
    167   registers} that are available in unbounded number. The grammar rule $\textit{op}$ that is
    168 not detailed in table \ref{RTLAbs:syntax} defines usual arithmetic and boolean
    169 operations (\texttt{+}, \texttt{xor}, \texttt{$\le$}, etc.) as well as constants
    170 and conversions between sized integers.
    171 
    172 \begin{table}
    173 {\footnotesize
    174 \[
    175 \begin{array}{lllllll}
    176 \textit{return\_type} & ::= & \textsf{int} \Alt \textsf{void} & \qquad
    177 \textit{signature} & ::= & (\textsf{int} \rightarrow)^*\ \textit{return\_type}\\
    178 \end{array}
    179 \]
    180 
    181 \[
    182 \begin{array}{lllllll}
    183 \textit{memq} & ::= & \textsf{int8s} \Alt \textsf{int8u} \Alt \textsf{int16s} \Alt \textsf{int16u}
    184                  \Alt \textsf{int32} & \qquad
    185 \textit{fun\_ref} & ::= & \textit{fun\_name} \Alt \textit{psd\_reg}
    186 \end{array}
    187 \]
    188 
    189 \[
    190 \begin{array}{llll}
    191 \textit{instruction} & ::= & \Alt \textsf{skip} \rightarrow \textit{node} &
    192                         \quad \mbox{(no instruction)}\\
    193                 &     & \Alt \textit{psd\_reg} := \textit{op}(\textit{psd\_reg}^*)
    194                         \rightarrow \textit{node} & \quad \mbox{(operation)}\\
    195                 &     & \Alt \textit{psd\_reg} := \textsf{\&}\textit{var\_name}
    196                         \rightarrow \textit{node} &
    197                         \quad \mbox{(address of a global)}\\
    198                 &     & \Alt \textit{psd\_reg} := \textsf{\&locals}[n]
    199                         \rightarrow \textit{node} &
    200                         \quad \mbox{(address of a local)}\\
    201                 &     & \Alt \textit{psd\_reg} := \textit{fun\_name}
    202                         \rightarrow \textit{node} &
    203                         \quad \mbox{(address of a function)}\\
    204                 &     & \Alt \textit{psd\_reg} :=
    205                         \textit{memq}(\textit{psd\_reg}[\textit{psd\_reg}])
    206                         \rightarrow \textit{node} & \quad \mbox{(memory load)}\\
    207                 &     & \Alt \textit{memq}(\textit{psd\_reg}[\textit{psd\_reg}]) :=
    208                         \textit{psd\_reg}
    209                         \rightarrow \textit{node} & \quad \mbox{(memory store)}\\
    210                 &     & \Alt \textit{psd\_reg} := \textit{fun\_ref}({\it psd\_reg^*}) :
    211                         \textit{signature} \rightarrow \textit{node} &
    212                         \quad \mbox{(function call)}\\
    213                 &     & \Alt \textit{fun\_ref}({\it psd\_reg^*}) : \textit{signature}
    214                         & \quad \mbox{(function tail call)}\\
    215                 &     & \Alt \textsf{test}\ \textit{op}(\textit{psd\_reg}^*) \rightarrow
    216                         \textit{node}, \textit{node} & \quad \mbox{(branch)}\\
    217                 &     & \Alt \textsf{return}\ \textit{psd\_reg}? & \quad \mbox{(return)}
    218 \end{array}
    219 \]
    220 
    221 \[
    222 \begin{array}{lll}
    223 \textit{fun\_def} & ::= & \textit{fun\_name}(\textit{psd\_reg}^*): \textit{signature}\\
    224              &     & \textsf{result:} \textit{psd\_reg}?\\
    225              &     & \textsf{locals:} \textit{psd\_reg}^*\\
    226              &     & \textsf{stack:} n\\
    227              &     & \textsf{entry:} \textit{node}\\
    228              &     & \textsf{exit:} \textit{node}\\
    229              &     & (\textit{node:} \textit{instruction})^*
    230 \end{array}
    231 \]
    232 
    233 \[
    234 \begin{array}{lllllll}
    235 \textit{init\_datum} & ::= & \textsf{reserve}(n) \Alt \textsf{int8}(n) \Alt \textsf{int16}(n)
    236                        \Alt \textsf{int32}(n) & \qquad
    237 \textit{init\_data} & ::= & \textit{init\_datum}^+
    238 \end{array}
    239 \]
    240 
    241 \[
    242 \begin{array}{lllllll}
    243 \textit{global\_decl} & ::= & \textsf{var}\ \textit{var\_name} \textit{\{init\_data\}} & \qquad
    244 \textit{fun\_decl} & ::= & \textsf{extern}\ \textit{fun\_name} \textit{(signature)} \Alt
    245                       \textit{fun\_def}
    246 \end{array}
    247 \]
    248 
    249 \[
    250 \begin{array}{lll}
    251 \textit{program} & ::= & \textit{global\_decl}^*\\
    252             &     & \textit{fun\_decl}^*
    253 \end{array}
    254 \]}
    255 \caption{Syntax of the RTLAbs language}\label{RTLAbs:syntax}
    256 \end{table}
    257 
    258 \paragraph{Translation of Cminor to RTLAbs.}
    259 Translating Cminor programs to RTLAbs programs mainly consists in
    260 transforming Cminor commands in CFGs. Most commands are sequential and have a
    261 rather straightforward linear translation. A conditional is translated in a
    262 branch instruction; a loop is translated using a back edge in the CFG.
    263 
     10In this section we give an overview of the architecture of the CerCo verified C compiler.
     11We discuss the compiler's intermediate languages, the optimisations and other transforms that occur at each intermediate language, and discuss the compiler's target hardware: the MCS-51 8-bit microprocessor.
     12
     13Many of the intermediate languages used in the CerCo verified C compiler are inspired by analogous languages in the CompCert verified C compiler, and a compiler written by Fran\s{c}ois Pottier---used for teaching compiler construction to undergraduates.
     14Where appropriate, we make reference to the source documentation of these two compilers, to explain our intermediate languages.
     15
     16\subsection{Compiler front-end}
     17
     18The compiler front-end lowers a program written in C to a low-level language of control-flow graphs, ready for the entry-point of our compiler backend.
     19The first translation step of the compiler lowers the full C language to a high-level intermediate language, Clight---this language will be explained further below.
     20This translation phase is unverified, using the CIL C parsing utility~\cite{cil02} to parse a C file and produce a Clight AST.
     21We therefore do not expand on this here, and only note its existence.
     22
     23\paragraph{Clight}
     24Clight is a large subset of the C language, developed by the CompCert team for use in the CompCert verified C compiler, that we adopt as the source language of our compiler.
     25We refer to~\cite{compcert} for a formal definition of the Clight language.
     26
     27Briefly, Clight features most of the types and operators of C, and includes pointer arithmetic, pointers to functions, and \texttt{struct} and \texttt{union} types, as well as all of C's control structures.
     28The language is classically structured with expressions, statements, functions, and whole programs.
     29Clight expressions are side-effect free, which means that side-effect operators ($\texttt{=}$,$\texttt{+=}$,$\texttt{++}$, and so on) and function calls within expressions are not supported.
     30This is the key difference between the full C programming language and Clight.
     31
     32In order to limit the implementation effort of verifying the CerCo compiler, our current compiler for Clight does {\em not} cover the operators relating to the floating point type {\tt float}.
     33
     34\paragraph{Cminor}
     35
     36Cminor is a simple, low-level imperative language, comparable to a stripped-down, typeless variant of C.
     37The Cminor intermediate language again is taken directly from the CompCert verified C compiler, and we refer to the CompCert project for its formal definition.
     38Like Clight, the Cminor language is structured into expressions, statements, functions, and whole programs.
     39Control flow constructs are now presented in a simpler form, compared to Clight.
     40In particular, Cminor provides only a single, generic `loop' construct, into which the different looping constructs of Clight must be translated.
     41Switch statements are also present, but in a semi-explicit jump-table form.
     42
     43In Cminor, stack operations are made explicit, and one must know which variables are stored on the stack when translating from Clight to Cminor.
     44This information is produced by a static analysis that determines the variables whose address may be `taken'.
     45When translating from Clight to Cminor, space is reserved for local arrays and structures.
     46In a second step, the proper compilation of Clight programs into Cminor programs performed: it consists mainly in translating Clight control structures to the more primitive control-flow constructs available in Cminor.
     47
     48\paragraph{RTLabs}
     49
     50RTLAbs is the last architecture independent language in the compilation process.
     51It is a rather straightforward {\em abstraction} of the {\em architecture-dependent} RTL intermediate language available in the CompCert project and it is intended to factorise some work common to the various target assembly languages (e.g. optimisations) and thus to make retargeting of the compiler a simpler matter.
     52
     53In RTLAbs the structure of Cminor expressions is lost.
     54This may have a negative impact on the instruction selection steps that follow in the compiler backend.
     55However, the subtleties of instruction selection seem rather orthogonal to our goals, and we deem the possibility of retargeting easily the compiler more important than the efficiency of the generated code.
     56
     57In RTLAbs, programs are represented as \emph{control flow graphs}, or CFGs for short.
     58We associate with the nodes of the graphs instructions reflecting commands of the Cminor language.
     59As usual, commands that change the control flow of the program (e.g. loops, conditionals) are translated by inserting suitable branching instructions in the CFG.
     60
     61Local variables are now represented by \emph{pseudo registers}.
     62The number of available pseudo registers is unbounded.
     63Eventually these pseudo registers will be replaced by real (machine) registers or stack slots, in the compiler backend.
     64
     65Translating Cminor programs to RTLAbs consists of transforming Cminor commands into a CFG form.
     66Most commands are sequential and have a rather straightforward linear translation.
     67However, a conditional is translated into a branch instruction, whilst a loop is translated using a back-edge in the CFG.
     68
     69\subsection{Compiler back-end}
     70
     71\paragraph{RTL}
     72
     73\paragraph{ERTL}
     74
     75\paragraph{LTL}
     76
     77\paragraph{LIN}
     78
     79\subsection{Target hardware}
    26480
    26581
Note: See TracChangeset for help on using the changeset viewer.