# Changeset 3660

Ignore:
Timestamp:
Mar 16, 2017, 2:28:50 PM (3 months ago)
Message:

more work on compiler architecture section

File:
1 edited

### Legend:

Unmodified
 r3643 \label{sect.compiler.architecture} \subsection{The compiler} This section gives an informal overview of the compiler, in particular it highlights the main features of the intermediate languages, the purpose of the compilation steps, and the optimisations. \subsubsection{Clight} Clight is a large subset of the C language that we adopt as the source language of our compiler. It features most of the types and operators of C. It includes pointer arithmetic, pointers to functions, and \texttt{struct} and \texttt{union} types, as well as all C control structures. The main difference with the C language is that Clight expressions are side-effect free, which means that side-effect operators ($\texttt{=}$,$\texttt{+=}$,$\texttt{++}$,$\ldots$) and function calls within expressions are not supported. Given a C program, we rely on the CIL tool~\cite{cil02} to deal with the idiosyncrasy of  C concrete syntax and to produce an equivalent program in Clight abstract syntax. We refer to  the CompCert project \cite{compcert} for a formal definition of the Clight language. Here we just recall  in figure \ref{syntax-clight} its syntax which is classically structured in expressions, statements, functions, and whole programs. In order to limit the implementation effort, our current compiler for Clight does {\em not} cover the operators relating to the floating point type {\tt float}. So, in a nutshell, the fragment of C we have implemented is Clight without floating point. \begin{figure} \label{syntax-clight} \footnotesize{ \begin{tabular}{l l l l} Expressions: & $a$ ::= & $id$               & variable identifier \\ & & $|$ $n$                                 & integer constant \\ %    & & $|$ $f$                                & float constant \\ & & $|$ \texttt{sizeof}($\tau$)             & size of a type \\ & & $|$ $op_1$ $a$                          & unary arithmetic operation \\ & & $|$ $a$ $op_2$ $a$                      & binary arithmetic operation \\ & & $|$ $*a$                                & pointer dereferencing \\ & & $|$ $a.id$                              & field access \\ & & $|$ $\&a$                               & taking the address of \\ & & $|$ $(\tau)a$                           & type cast \\ & & $|$ $a ? a : a$                         & conditional expression \$10pt] Statements: & s ::= & \texttt{skip} & empty statement \\ & & | a=a & assignment \\ & & | a=a(a^*) & function call \\ & & | a(a^*) & procedure call \\ & & | s;s & sequence \\ & & | \texttt{if} a \texttt{then} s \texttt{else} s & conditional \\ & & | \texttt{switch} a sw & multi-way branch \\ & & | \texttt{while} a \texttt{do} s & while'' loop \\ & & | \texttt{do} s \texttt{while} a & do'' loop \\ & & | \texttt{for}(s,a,s) s & for'' loop\\ & & | \texttt{break} & exit from current loop \\ & & | \texttt{continue} & next iteration of the current loop \\ & & | \texttt{return} a^? & return from current function \\ & & | \texttt{goto} lbl & branching \\ & & | lbl : s & labelled statement \\[10pt] Switch cases: & sw ::= & \texttt{default} : s & default case \\ & & | \texttt{case } n : s;sw & labelled case \\[10pt] Variable declarations: & dcl ::= & (\tau\quad id)^* & type and name\\[10pt] Functions: & Fd ::= & \tau id(dcl)\{dcl;s\} & internal function \\ & & | \texttt{extern} \tau id(dcl) & external function\\[10pt] Programs: & P ::= & dcl;Fd^*;\texttt{main}=id & global variables, functions, entry point \end{tabular}} \caption{Syntax of the Clight language} \end{figure} \subsubsection{Cminor} Cminor is a simple, low-level imperative language, comparable to a stripped-down, typeless variant of C. Again we refer to the CompCert project for its formal definition and we just recall in figure \ref{syntax-cminor} its syntax which as for Clight is structured in expressions, statements, functions, and whole programs. \begin{figure} \label{syntax-cminor} \footnotesize{ \begin{tabular}{l l l l} Signatures: & sig ::= & \texttt{sig} \vec{\texttt{int}} (\texttt{int}|\texttt{void}) & arguments and result \\[10pt] Expressions: & a ::= & id & local variable \\ & & | n & integer constant \\ % & & | f & float constant \\ & & | \texttt{addrsymbol}(id) & address of global symbol \\ & & | \texttt{addrstack}(\delta) & address within stack data \\ & & | op_1 a & unary arithmetic operation \\ & & | op_2 a a & binary arithmetic operation \\ & & | \kappa[a] & memory read\\ & & | a?a:a & conditional expression \\[10pt] Statements: & s ::= & \texttt{skip} & empty statement \\ & & | id=a & assignment \\ & & | \kappa[a]=a & memory write \\ & & | id^?=a(\vec{a}):sig & function call \\ & & | \texttt{tailcall} a(\vec{a}):sig & function tail call \\ & & | \texttt{return}(a^?) & function return \\ & & | s;s & sequence \\ & & | \texttt{if} a \texttt{then} s \texttt{else} s & conditional \\ & & | \texttt{loop} s & infinite loop \\ & & | \texttt{block} s & block delimiting \texttt{exit} constructs \\ & & | \texttt{exit} n & terminate the (n+1)^{th} enclosing block \\ & & | \texttt{switch} a tbl & multi-way test and exit\\ & & | lbl:s & labelled statement \\ & & | \texttt{goto} lbl & jump to a label\\[10pt] Switch tables: & tbl ::= & \texttt{default:exit}(n) & \\ & & | \texttt{case} i: \texttt{exit}(n);tbl & \\[10pt] Functions: & Fd ::= & \texttt{internal} sig \vec{id} \vec{id} n s & internal function: signature, parameters, \\ & & & local variables, stack size and body \\ & & | \texttt{external} id sig & external function \\[10pt] Programs: & P ::= & \texttt{prog} (id=data)^* (id=Fd)^* id & global variables, functions and entry point \end{tabular}} \caption{Syntax of the Cminor language} \end{figure} \paragraph{Translation of Clight to Cminor.} As in Cminor stack operations are made explicit, one has to know which variables are stored in the stack. This information is produced by a static analysis that determines the variables whose address may be taken'. Also space is reserved for local arrays and structures. In a second step, the proper compilation is performed: it consists mainly in translating Clight control structures to the basic ones available in Cminor. \subsubsection{RTLAbs} RTLAbs is the last architecture independent language in the compilation process. It is a rather straightforward {\em abstraction} of the {\em architecture-dependent} RTL intermediate language available in the CompCert project and it is intended to factorize some work common to the various target assembly languages (e.g. optimizations) and thus to make retargeting of the compiler a simpler matter. We stress that in RTLAbs the structure of Cminor expressions is lost and that this may have a negative impact on the following instruction selection step. Still, the subtleties of instruction selection seem rather orthogonal to our goals and we deem the possibility of retargeting easily the compiler more important than the efficiency of the generated code. \paragraph{Syntax.} In RTLAbs, programs are represented as \emph{control flow graphs} (CFGs for short). We associate with the nodes of the graphs instructions reflecting the Cminor commands. As usual, commands that change the control flow of the program (e.g. loops, conditionals) are translated by inserting suitable branching instructions in the CFG. The syntax of the language is depicted in table \ref{RTLAbs:syntax}. Local variables are now represented by \emph{pseudo registers} that are available in unbounded number. The grammar rule \textit{op} that is not detailed in table \ref{RTLAbs:syntax} defines usual arithmetic and boolean operations (\texttt{+}, \texttt{xor}, \texttt{\le}, etc.) as well as constants and conversions between sized integers. \begin{table} {\footnotesize \[ \begin{array}{lllllll} \textit{return\_type} & ::= & \textsf{int} \Alt \textsf{void} & \qquad \textit{signature} & ::= & (\textsf{int} \rightarrow)^*\ \textit{return\_type}\\ \end{array}$ $\begin{array}{lllllll} \textit{memq} & ::= & \textsf{int8s} \Alt \textsf{int8u} \Alt \textsf{int16s} \Alt \textsf{int16u} \Alt \textsf{int32} & \qquad \textit{fun\_ref} & ::= & \textit{fun\_name} \Alt \textit{psd\_reg} \end{array}$ $\begin{array}{llll} \textit{instruction} & ::= & \Alt \textsf{skip} \rightarrow \textit{node} & \quad \mbox{(no instruction)}\\ & & \Alt \textit{psd\_reg} := \textit{op}(\textit{psd\_reg}^*) \rightarrow \textit{node} & \quad \mbox{(operation)}\\ & & \Alt \textit{psd\_reg} := \textsf{\&}\textit{var\_name} \rightarrow \textit{node} & \quad \mbox{(address of a global)}\\ & & \Alt \textit{psd\_reg} := \textsf{\&locals}[n] \rightarrow \textit{node} & \quad \mbox{(address of a local)}\\ & & \Alt \textit{psd\_reg} := \textit{fun\_name} \rightarrow \textit{node} & \quad \mbox{(address of a function)}\\ & & \Alt \textit{psd\_reg} := \textit{memq}(\textit{psd\_reg}[\textit{psd\_reg}]) \rightarrow \textit{node} & \quad \mbox{(memory load)}\\ & & \Alt \textit{memq}(\textit{psd\_reg}[\textit{psd\_reg}]) := \textit{psd\_reg} \rightarrow \textit{node} & \quad \mbox{(memory store)}\\ & & \Alt \textit{psd\_reg} := \textit{fun\_ref}({\it psd\_reg^*}) : \textit{signature} \rightarrow \textit{node} & \quad \mbox{(function call)}\\ & & \Alt \textit{fun\_ref}({\it psd\_reg^*}) : \textit{signature} & \quad \mbox{(function tail call)}\\ & & \Alt \textsf{test}\ \textit{op}(\textit{psd\_reg}^*) \rightarrow \textit{node}, \textit{node} & \quad \mbox{(branch)}\\ & & \Alt \textsf{return}\ \textit{psd\_reg}? & \quad \mbox{(return)} \end{array}$ $\begin{array}{lll} \textit{fun\_def} & ::= & \textit{fun\_name}(\textit{psd\_reg}^*): \textit{signature}\\ & & \textsf{result:} \textit{psd\_reg}?\\ & & \textsf{locals:} \textit{psd\_reg}^*\\ & & \textsf{stack:} n\\ & & \textsf{entry:} \textit{node}\\ & & \textsf{exit:} \textit{node}\\ & & (\textit{node:} \textit{instruction})^* \end{array}$ $\begin{array}{lllllll} \textit{init\_datum} & ::= & \textsf{reserve}(n) \Alt \textsf{int8}(n) \Alt \textsf{int16}(n) \Alt \textsf{int32}(n) & \qquad \textit{init\_data} & ::= & \textit{init\_datum}^+ \end{array}$ $\begin{array}{lllllll} \textit{global\_decl} & ::= & \textsf{var}\ \textit{var\_name} \textit{\{init\_data\}} & \qquad \textit{fun\_decl} & ::= & \textsf{extern}\ \textit{fun\_name} \textit{(signature)} \Alt \textit{fun\_def} \end{array}$ $\begin{array}{lll} \textit{program} & ::= & \textit{global\_decl}^*\\ & & \textit{fun\_decl}^* \end{array}$} \caption{Syntax of the RTLAbs language}\label{RTLAbs:syntax} \end{table} \paragraph{Translation of Cminor to RTLAbs.} Translating Cminor programs to RTLAbs programs mainly consists in transforming Cminor commands in CFGs. Most commands are sequential and have a rather straightforward linear translation. A conditional is translated in a branch instruction; a loop is translated using a back edge in the CFG. In this section we give an overview of the architecture of the CerCo verified C compiler. We discuss the compiler's intermediate languages, the optimisations and other transforms that occur at each intermediate language, and discuss the compiler's target hardware: the MCS-51 8-bit microprocessor. Many of the intermediate languages used in the CerCo verified C compiler are inspired by analogous languages in the CompCert verified C compiler, and a compiler written by Fran\s{c}ois Pottier---used for teaching compiler construction to undergraduates. Where appropriate, we make reference to the source documentation of these two compilers, to explain our intermediate languages. \subsection{Compiler front-end} The compiler front-end lowers a program written in C to a low-level language of control-flow graphs, ready for the entry-point of our compiler backend. The first translation step of the compiler lowers the full C language to a high-level intermediate language, Clight---this language will be explained further below. This translation phase is unverified, using the CIL C parsing utility~\cite{cil02} to parse a C file and produce a Clight AST. We therefore do not expand on this here, and only note its existence. \paragraph{Clight} Clight is a large subset of the C language, developed by the CompCert team for use in the CompCert verified C compiler, that we adopt as the source language of our compiler. We refer to~\cite{compcert} for a formal definition of the Clight language. Briefly, Clight features most of the types and operators of C, and includes pointer arithmetic, pointers to functions, and \texttt{struct} and \texttt{union} types, as well as all of C's control structures. The language is classically structured with expressions, statements, functions, and whole programs. Clight expressions are side-effect free, which means that side-effect operators ($\texttt{=}$,$\texttt{+=}$,$\texttt{++}$, and so on) and function calls within expressions are not supported. This is the key difference between the full C programming language and Clight. In order to limit the implementation effort of verifying the CerCo compiler, our current compiler for Clight does {\em not} cover the operators relating to the floating point type {\tt float}. \paragraph{Cminor} Cminor is a simple, low-level imperative language, comparable to a stripped-down, typeless variant of C. The Cminor intermediate language again is taken directly from the CompCert verified C compiler, and we refer to the CompCert project for its formal definition. Like Clight, the Cminor language is structured into expressions, statements, functions, and whole programs. Control flow constructs are now presented in a simpler form, compared to Clight. In particular, Cminor provides only a single, generic loop' construct, into which the different looping constructs of Clight must be translated. Switch statements are also present, but in a semi-explicit jump-table form. In Cminor, stack operations are made explicit, and one must know which variables are stored on the stack when translating from Clight to Cminor. This information is produced by a static analysis that determines the variables whose address may be `taken'. When translating from Clight to Cminor, space is reserved for local arrays and structures. In a second step, the proper compilation of Clight programs into Cminor programs performed: it consists mainly in translating Clight control structures to the more primitive control-flow constructs available in Cminor. \paragraph{RTLabs} RTLAbs is the last architecture independent language in the compilation process. It is a rather straightforward {\em abstraction} of the {\em architecture-dependent} RTL intermediate language available in the CompCert project and it is intended to factorise some work common to the various target assembly languages (e.g. optimisations) and thus to make retargeting of the compiler a simpler matter. In RTLAbs the structure of Cminor expressions is lost. This may have a negative impact on the instruction selection steps that follow in the compiler backend. However, the subtleties of instruction selection seem rather orthogonal to our goals, and we deem the possibility of retargeting easily the compiler more important than the efficiency of the generated code. In RTLAbs, programs are represented as \emph{control flow graphs}, or CFGs for short. We associate with the nodes of the graphs instructions reflecting commands of the Cminor language. As usual, commands that change the control flow of the program (e.g. loops, conditionals) are translated by inserting suitable branching instructions in the CFG. Local variables are now represented by \emph{pseudo registers}. The number of available pseudo registers is unbounded. Eventually these pseudo registers will be replaced by real (machine) registers or stack slots, in the compiler backend. Translating Cminor programs to RTLAbs consists of transforming Cminor commands into a CFG form. Most commands are sequential and have a rather straightforward linear translation. However, a conditional is translated into a branch instruction, whilst a loop is translated using a back-edge in the CFG. \subsection{Compiler back-end} \paragraph{RTL} \paragraph{ERTL} \paragraph{LTL} \paragraph{LIN} \subsection{Target hardware}