1 | \newcommand{\Alt}{ \mid\!\!\mid } |
---|
2 | |
---|
3 | % CerCo compiler architecture |
---|
4 | % Description of languages |
---|
5 | % Target hardware description |
---|
6 | |
---|
7 | \section{Compiler architecture} |
---|
8 | \label{sect.compiler.architecture} |
---|
9 | |
---|
10 | In this section we give an overview of the architecture of the CerCo verified C compiler. |
---|
11 | We discuss the compiler's intermediate languages, the optimisations and other transforms that occur at each intermediate language, and discuss the compiler's target hardware: the MCS-51 8-bit microprocessor. |
---|
12 | |
---|
13 | Many of the intermediate languages used in the CerCo verified C compiler are inspired by analogous languages in the CompCert verified C compiler, and a compiler written by Fran\s{c}ois Pottier---used for teaching compiler construction to undergraduates. |
---|
14 | Where appropriate, we make reference to the source documentation of these two compilers, to explain our intermediate languages. |
---|
15 | |
---|
16 | Note: though many of the compiler intermediate languages (especially in the front end) are shared with CompCert, the architecture of the CerCo compiler is different in quite a substantial way from that of CompCert. |
---|
17 | In particular, the CompCert compiler's backend `linearises' its code much earlier, compared to our own. |
---|
18 | Indeed, as explained below, the CerCo compiler works with a control-flow graph based representation of the program almost until producing machine code, only linearising in the translation from LTL to LIN intermediate languages. |
---|
19 | This is merely a design variation in our two compilers, and is not motivated by any real technical issue pertinent to the proof of correctness, or due to the lifting of cost models. |
---|
20 | |
---|
21 | \subsection{Compiler front-end} |
---|
22 | |
---|
23 | The compiler front-end lowers a program written in C to a low-level language of control-flow graphs, ready for the entry-point of our compiler backend. |
---|
24 | In total, there are three intermediate languages in the front-end. |
---|
25 | |
---|
26 | The first translation step of the compiler lowers the full C language to a high-level intermediate language, Clight---this language will be explained further below. |
---|
27 | This translation phase is unverified, using the CIL C parsing utility~\cite{cil02} to handle the intricacies of parsing C. |
---|
28 | We produce a Clight AST directly from a C source file, using this tool. |
---|
29 | As this translation step is unverified we do not expand any further on it here, and only note its existence. |
---|
30 | |
---|
31 | \paragraph{Clight} is a large subset of the C language, developed by the CompCert team for use in the CompCert verified C compiler, that we adopt as the source language of our compiler. |
---|
32 | We refer to~\cite{compcert} for a formal definition of the Clight language. |
---|
33 | |
---|
34 | Briefly, Clight features most of the types and operators of C, and includes pointer arithmetic, pointers to functions, and \texttt{struct} and \texttt{union} types, as well as all of C's control structures. |
---|
35 | The language is classically structured with expressions, statements, functions, and whole programs. |
---|
36 | Clight expressions are side-effect free, which means that side-effect operators ($\texttt{=}$,$\texttt{+=}$,$\texttt{++}$, and so on) and function calls within expressions are not supported. |
---|
37 | This is the key difference between the full C programming language and Clight. |
---|
38 | |
---|
39 | In order to limit the implementation effort of verifying the CerCo compiler, our current compiler for Clight does {\em not} cover the operators relating to the floating point type {\tt float}. |
---|
40 | |
---|
41 | \paragraph{Cminor} is a simple, low-level imperative language, comparable to a stripped-down, typeless variant of C. |
---|
42 | The Cminor intermediate language again is taken directly from the CompCert verified C compiler, and we refer to the CompCert project for its formal definition. |
---|
43 | Like Clight, the Cminor language is structured into expressions, statements, functions, and whole programs. |
---|
44 | Control flow constructs are now presented in a simpler form, compared to Clight. |
---|
45 | In particular, Cminor provides only a single, generic `loop' construct, into which the different looping constructs of Clight must be translated. |
---|
46 | Switch statements are also present, but in a semi-explicit jump-table form. |
---|
47 | |
---|
48 | In Cminor, stack operations are made explicit, and one must know which variables are stored on the stack when translating from Clight to Cminor. |
---|
49 | This information is produced by a static analysis that determines the variables whose address may be `taken'. |
---|
50 | When translating from Clight to Cminor, space is reserved for local arrays and structures. |
---|
51 | In a second step, the proper compilation of Clight programs into Cminor programs is performed: consisting mainly of the translation of Clight control structures to the more primitive control-flow constructs available in Cminor. |
---|
52 | |
---|
53 | \paragraph{RTLabs} is the last architecture independent language in the compilation process. |
---|
54 | It is a rather straightforward abstraction of the architecture-dependent RTL intermediate language used in the CompCert project. |
---|
55 | |
---|
56 | RTLabs is intended to ease retargeting of the compiler---should that be desired---by factorising some work common to the various target assembly languages, for example optimisations. |
---|
57 | In RTLAbs the structure of Cminor expressions is lost. |
---|
58 | This may have a negative impact on the instruction selection steps that follow in the compiler backend. |
---|
59 | However, the subtleties of instruction selection seem rather orthogonal to our goals, and we deem the possibility of retargeting easily the compiler more important than the efficiency of the generated code. |
---|
60 | |
---|
61 | In RTLAbs, programs are represented as \emph{control flow graphs}, or CFGs for short. |
---|
62 | We associate with the nodes of the graphs instructions reflecting commands of the Cminor language. |
---|
63 | As usual, commands that change the control flow of the program (e.g. loops, conditionals) are translated by inserting suitable branching instructions in the CFG. |
---|
64 | |
---|
65 | Local variables are now represented by \emph{pseudo registers}. |
---|
66 | The number of available pseudo registers is unbounded. |
---|
67 | Eventually these pseudo registers will be replaced by real (machine) registers or stack slots, in the compiler backend. |
---|
68 | |
---|
69 | Translating Cminor programs to RTLAbs consists of transforming Cminor commands into a CFG form. |
---|
70 | Most commands are sequential and have a rather straightforward linear translation. |
---|
71 | However, a conditional is translated into a branch instruction, whilst a loop is translated using a back-edge in the CFG. |
---|
72 | |
---|
73 | \subsection{Compiler back-end} |
---|
74 | |
---|
75 | \paragraph{RTL} is the first architecture-dependant intermediate language of our compiler, and the entry point of the compiler back end. |
---|
76 | RTL programs are structured as CFGs, like RTLabs. |
---|
77 | |
---|
78 | The syntax of RTL is very close to RTLAbs, but now MCS-51 instructions are made explicit. |
---|
79 | Type information completely disappears: everything is represented using 32-bit words. |
---|
80 | Moreover, each global variable of a program is associated to a fixed offset, as we begin to fix the eventual memory layout of the program in the processor's memory. |
---|
81 | The syntax of the RTL language is provided in Table~\ref{RTL:syntax}. |
---|
82 | Within the grammar $\textit{unop}$, $\textit{binop}$, $\textit{uncon}$, and $\textit{bincon}$, represent the sets of unary operations, binary operations, unary conditions and binary conditions of the MCS-51 machine code language, lifted into RTL, respectively. |
---|
83 | |
---|
84 | \begin{table}[t] |
---|
85 | {\footnotesize |
---|
86 | \[ |
---|
87 | \begin{array}{lllllll} |
---|
88 | \textit{size} & ::= & \textsf{Byte} \Alt \textsf{HalfWord} \Alt \textsf{Word} & \qquad |
---|
89 | \textit{fun\_ref} & ::= & \textit{fun\_name} \Alt \textit{psd\_reg} |
---|
90 | \end{array} |
---|
91 | \] |
---|
92 | |
---|
93 | \[ |
---|
94 | \begin{array}{llll} |
---|
95 | \textit{instruction} & ::= & \Alt \textsf{skip} \rightarrow \textit{node} & |
---|
96 | \quad \mbox{(no instruction)}\\ |
---|
97 | & & \Alt \textit{psd\_reg} := n |
---|
98 | \rightarrow \textit{node} & \quad \mbox{(constant)}\\ |
---|
99 | & & \Alt \textit{psd\_reg} := \textit{unop}(\textit{psd\_reg}) |
---|
100 | \rightarrow \textit{node} & \quad \mbox{(unary operation)}\\ |
---|
101 | & & \Alt \textit{psd\_reg} := |
---|
102 | \textit{binop}(\textit{psd\_reg},\textit{psd\_reg}) |
---|
103 | \rightarrow \textit{node} & \quad |
---|
104 | \mbox{(binary operation)}\\ |
---|
105 | & & \Alt \textit{psd\_reg} := \textsf{\&globals}[n] |
---|
106 | \rightarrow \textit{node} & |
---|
107 | \quad \mbox{(address of a global)}\\ |
---|
108 | & & \Alt \textit{psd\_reg} := \textsf{\&locals}[n] |
---|
109 | \rightarrow \textit{node} & |
---|
110 | \quad \mbox{(address of a local)}\\ |
---|
111 | & & \Alt \textit{psd\_reg} := \textit{fun\_name} |
---|
112 | \rightarrow \textit{node} & |
---|
113 | \quad \mbox{(address of a function)}\\ |
---|
114 | & & \Alt \textit{psd\_reg} := |
---|
115 | \textit{size}(\textit{psd\_reg}[n]) |
---|
116 | \rightarrow \textit{node} & \quad \mbox{(memory load)}\\ |
---|
117 | & & \Alt \textit{size}(\textit{psd\_reg}[n]) := |
---|
118 | \textit{psd\_reg} |
---|
119 | \rightarrow \textit{node} & \quad \mbox{(memory store)}\\ |
---|
120 | & & \Alt \textit{psd\_reg} := \textit{fun\_ref}({\it psd\_reg^*}) |
---|
121 | \rightarrow \textit{node} & |
---|
122 | \quad \mbox{(function call)}\\ |
---|
123 | & & \Alt \textit{fun\_ref}({\it psd\_reg^*}) |
---|
124 | & \quad \mbox{(function tail call)}\\ |
---|
125 | & & \Alt \textsf{test}\ \textit{uncon}(\textit{psd\_reg}) \rightarrow |
---|
126 | \textit{node}, \textit{node} & \quad |
---|
127 | \mbox{(branch unary condition)}\\ |
---|
128 | & & \Alt \textsf{test}\ \textit{bincon}(\textit{psd\_reg},\textit{psd\_reg}) |
---|
129 | \rightarrow \textit{node}, \textit{node} & \quad |
---|
130 | \mbox{(branch binary condition)}\\ |
---|
131 | & & \Alt \textsf{return}\ \textit{psd\_reg}? & \quad |
---|
132 | \mbox{(return)} |
---|
133 | \end{array} |
---|
134 | \] |
---|
135 | |
---|
136 | \[ |
---|
137 | \begin{array}{lllllll} |
---|
138 | \textit{fun\_def} & ::= & \textit{fun\_name}(\textit{psd\_reg}^*) & \qquad |
---|
139 | \textit{program} & ::= & \textsf{globals}: n\\ |
---|
140 | & & \textsf{result:} \textit{psd\_reg}? & \qquad |
---|
141 | & & \textit{fun\_def}^*\\ |
---|
142 | & & \textsf{locals:} \textit{psd\_reg}^*\\ |
---|
143 | & & \textsf{stack:} n\\ |
---|
144 | & & \textsf{entry:} \textit{node}\\ |
---|
145 | & & \textsf{exit:} \textit{node}\\ |
---|
146 | & & (\textit{node:} \textit{instruction})^* |
---|
147 | \end{array} |
---|
148 | \]} |
---|
149 | \caption{Syntax of the RTL language}\label{RTL:syntax} |
---|
150 | \end{table} |
---|
151 | |
---|
152 | Translating from RTLabs into RTL is mostly straightforward. |
---|
153 | An RTLAbs instruction is often directly translated to a corresponding MCS-51 instruction. |
---|
154 | However, there are a few exceptions: some RTLAbs instructions are expanded into multiple MCS-51 instructions. |
---|
155 | When the translation of an RTLAbs instruction requires more than a few simple MCS-51 instructions, we generate a function implementing the RTLabs instruction, place it in the program's preamble before the entry point, and generate a call to the relevant function, instead of directly generating the instruction sequence. |
---|
156 | |
---|
157 | \paragraph{ERTL} programs are again structured as CFGs. |
---|
158 | In ERTL, calling conventions of the MCS-51 machine language are made explicit. |
---|
159 | We provide the syntax of the language in Table~\ref{ERTL:syntax}. |
---|
160 | |
---|
161 | The main difference between RTL and ERTL is the elimination of pseudo registers, and the use of hardware registers to pass arguments to functions. |
---|
162 | Parameters are now passed in fixed hardware registers, tied directly to our target hardware. |
---|
163 | If there are too many parameters, the remaining arguments to be passed are spilled, and are stored in the stack. |
---|
164 | Pseudo registers are not entirely eliminated at this point: they are still used for all other storage barring passing arguments to functions. |
---|
165 | |
---|
166 | Further, we now enforce a convention that certain other now-explicit hardware registers are used for certain tasks. |
---|
167 | We fix registers holding the result of a function, holding the base address of the program's global variables, holding the address of the top of the stack. |
---|
168 | We also make explicit that some registers must be saved and restored when entering or returning from a function, to avoid clobbering. |
---|
169 | |
---|
170 | ERTL function calls do not list their parameters anymore---they merely assert the number of arguments that are being passed. |
---|
171 | Two new instructions are provided to allocate and deallocate on the stack some space needed by a function to execute. |
---|
172 | Along with these two instructions come two instructions to fetch or assign a value in the parameter sections of the stack; these instructions cannot yet be translated using regular load and store instructions. |
---|
173 | This is because we do not know the final size of the stack area of each function at this point in the compilation chain. |
---|
174 | |
---|
175 | Note that the ERTL function return instruction has a boolean argument that indicates whether the result of the function may be used later or not. |
---|
176 | This is exploited in certain optimisations. |
---|
177 | |
---|
178 | \begin{table}[t] |
---|
179 | {\footnotesize |
---|
180 | \[ |
---|
181 | \begin{array}{lllllll} |
---|
182 | \textit{size} & ::= & \textsf{Byte} \Alt \textsf{HalfWord} \Alt \textsf{Word} & \qquad |
---|
183 | \textit{fun\_ref} & ::= & \textit{fun\_name} \Alt \textit{psd\_reg} |
---|
184 | \end{array} |
---|
185 | \] |
---|
186 | |
---|
187 | \[ |
---|
188 | \begin{array}{llll} |
---|
189 | \textit{instruction} & ::= & \Alt \textsf{skip} \rightarrow \textit{node} & |
---|
190 | \quad \mbox{(no instruction)}\\ |
---|
191 | & & \Alt \textsf{NewFrame} \rightarrow \textit{node} & |
---|
192 | \quad \mbox{(frame creation)}\\ |
---|
193 | & & \Alt \textsf{DelFrame} \rightarrow \textit{node} & |
---|
194 | \quad \mbox{(frame deletion)}\\ |
---|
195 | & & \Alt \textit{psd\_reg} := \textsf{stack}[\textit{slot}, n] |
---|
196 | \rightarrow \textit{node} & |
---|
197 | \quad \mbox{(stack load)}\\ |
---|
198 | & & \Alt \textsf{stack}[\textit{slot}, n] := \textit{psd\_reg} |
---|
199 | \rightarrow \textit{node} & |
---|
200 | \quad \mbox{(stack store)}\\ |
---|
201 | & & \Alt \textit{hdw\_reg} := \textit{psd\_reg} |
---|
202 | \rightarrow \textit{node} & |
---|
203 | \quad \mbox{(pseudo to hardware)}\\ |
---|
204 | & & \Alt \textit{psd\_reg} := \textit{hdw\_reg} |
---|
205 | \rightarrow \textit{node} & |
---|
206 | \quad \mbox{(hardware to pseudo)}\\ |
---|
207 | & & \Alt \textit{psd\_reg} := n |
---|
208 | \rightarrow \textit{node} & \quad \mbox{(constant)}\\ |
---|
209 | & & \Alt \textit{psd\_reg} := \textit{unop}(\textit{psd\_reg}) |
---|
210 | \rightarrow \textit{node} & \quad \mbox{(unary operation)}\\ |
---|
211 | & & \Alt \textit{psd\_reg} := |
---|
212 | \textit{binop}(\textit{psd\_reg},\textit{psd\_reg}) |
---|
213 | \rightarrow \textit{node} & \quad |
---|
214 | \mbox{(binary operation)}\\ |
---|
215 | & & \Alt \textit{psd\_reg} := \textit{fun\_name} |
---|
216 | \rightarrow \textit{node} & |
---|
217 | \quad \mbox{(address of a function)}\\ |
---|
218 | & & \Alt \textit{psd\_reg} := |
---|
219 | \textit{size}(\textit{psd\_reg}[n]) |
---|
220 | \rightarrow \textit{node} & \quad \mbox{(memory load)}\\ |
---|
221 | & & \Alt \textit{size}(\textit{psd\_reg}[n]) := |
---|
222 | \textit{psd\_reg} |
---|
223 | \rightarrow \textit{node} & \quad \mbox{(memory store)}\\ |
---|
224 | & & \Alt \textit{fun\_ref}(n) \rightarrow \textit{node} & |
---|
225 | \quad \mbox{(function call)}\\ |
---|
226 | & & \Alt \textit{fun\_ref}(n) |
---|
227 | & \quad \mbox{(function tail call)}\\ |
---|
228 | & & \Alt \textsf{test}\ \textit{uncon}(\textit{psd\_reg}) \rightarrow |
---|
229 | \textit{node}, \textit{node} & \quad |
---|
230 | \mbox{(branch unary condition)}\\ |
---|
231 | & & \Alt \textsf{test}\ \textit{bincon}(\textit{psd\_reg},\textit{psd\_reg}) |
---|
232 | \rightarrow \textit{node}, \textit{node} & \quad |
---|
233 | \mbox{(branch binary condition)}\\ |
---|
234 | & & \Alt \textsf{return}\ b & \quad \mbox{(return)} |
---|
235 | \end{array} |
---|
236 | \] |
---|
237 | |
---|
238 | \[ |
---|
239 | \begin{array}{lllllll} |
---|
240 | \textit{fun\_def} & ::= & \textit{fun\_name}(n) & \qquad |
---|
241 | \textit{program} & ::= & \textsf{globals}: n\\ |
---|
242 | & & \textsf{locals:} \textit{psd\_reg}^* & \qquad |
---|
243 | & & \textit{fun\_def}^*\\ |
---|
244 | & & \textsf{stack:} n\\ |
---|
245 | & & \textsf{entry:} \textit{node}\\ |
---|
246 | & & (\textit{node:} \textit{instruction})^* |
---|
247 | \end{array} |
---|
248 | \]} |
---|
249 | \caption{Syntax of the ERTL language}\label{ERTL:syntax} |
---|
250 | \end{table} |
---|
251 | |
---|
252 | The majority of the work involved in translating from RTL to ERTL involves making the MCS-51 calling conventions explicit. |
---|
253 | These conventions appear when entering, calling and leaving a function, and when referencing a global variable or the address of a local variable. |
---|
254 | |
---|
255 | At this point, we also perform a \emph{liveness analysis} on ERTL to replace unused instructions by a $\textsf{skip}$, or dummy instruction with no effect on the program state. |
---|
256 | Our analysis declares an instruction as unused when it performs an assignment on a register that will not be read afterwards. |
---|
257 | The result of this liveness analysis is also exploited by the \emph{register allocation} algorithm whose result is to efficiently associate a physical location (a hardware register or an address in the stack) to each pseudo register of the input RTL program. |
---|
258 | |
---|
259 | \paragraph{LTL} is another CFG-based intermediate language. |
---|
260 | Pseudo registers are now completely eliminated in favour of physical locations---both hardware registers and stack addresses. |
---|
261 | |
---|
262 | Except for a few exceptions, the instructions of the language are those of ERTL with hardware registers replacing pseudo registers. |
---|
263 | Calling and returning conventions were explicited in ERTL; thus, function calls and returns do not need parameters in LTL. |
---|
264 | The syntax of LTL is presented in Table~\ref{LTL:syntax}. |
---|
265 | |
---|
266 | \begin{table}[t] |
---|
267 | {\footnotesize |
---|
268 | \[ |
---|
269 | \begin{array}{lllllll} |
---|
270 | \textit{size} & ::= & \textsf{Byte} \Alt \textsf{HalfWord} \Alt \textsf{Word} & \qquad |
---|
271 | \textit{fun\_ref} & ::= & \textit{fun\_name} \Alt \textit{hdw\_reg} |
---|
272 | \end{array} |
---|
273 | \] |
---|
274 | |
---|
275 | \[ |
---|
276 | \begin{array}{llll} |
---|
277 | \textit{instruction} & ::= & \Alt \textsf{skip} \rightarrow \textit{node} & |
---|
278 | \quad \mbox{(no instruction)}\\ |
---|
279 | & & \Alt \textsf{NewFrame} \rightarrow \textit{node} & |
---|
280 | \quad \mbox{(frame creation)}\\ |
---|
281 | & & \Alt \textsf{DelFrame} \rightarrow \textit{node} & |
---|
282 | \quad \mbox{(frame deletion)}\\ |
---|
283 | & & \Alt \textit{hdw\_reg} := n |
---|
284 | \rightarrow \textit{node} & \quad \mbox{(constant)}\\ |
---|
285 | & & \Alt \textit{hdw\_reg} := \textit{unop}(\textit{hdw\_reg}) |
---|
286 | \rightarrow \textit{node} & \quad \mbox{(unary operation)}\\ |
---|
287 | & & \Alt \textit{hdw\_reg} := |
---|
288 | \textit{binop}(\textit{hdw\_reg},\textit{hdw\_reg}) |
---|
289 | \rightarrow \textit{node} & \quad |
---|
290 | \mbox{(binary operation)}\\ |
---|
291 | & & \Alt \textit{hdw\_reg} := \textit{fun\_name} |
---|
292 | \rightarrow \textit{node} & |
---|
293 | \quad \mbox{(address of a function)}\\ |
---|
294 | & & \Alt \textit{hdw\_reg} := \textit{size}(\textit{hdw\_reg}[n]) |
---|
295 | \rightarrow \textit{node} & \quad \mbox{(memory load)}\\ |
---|
296 | & & \Alt \textit{size}(\textit{hdw\_reg}[n]) := \textit{hdw\_reg} |
---|
297 | \rightarrow \textit{node} & \quad \mbox{(memory store)}\\ |
---|
298 | & & \Alt \textit{fun\_ref}() \rightarrow \textit{node} & |
---|
299 | \quad \mbox{(function call)}\\ |
---|
300 | & & \Alt \textit{fun\_ref}() |
---|
301 | & \quad \mbox{(function tail call)}\\ |
---|
302 | & & \Alt \textsf{test}\ \textit{uncon}(\textit{hdw\_reg}) \rightarrow |
---|
303 | \textit{node}, \textit{node} & \quad |
---|
304 | \mbox{(branch unary condition)}\\ |
---|
305 | & & \Alt \textsf{test}\ \textit{bincon}(\textit{hdw\_reg},\textit{hdw\_reg}) |
---|
306 | \rightarrow \textit{node}, \textit{node} & \quad |
---|
307 | \mbox{(branch binary condition)}\\ |
---|
308 | & & \Alt \textsf{return} & \quad \mbox{(return)} |
---|
309 | \end{array} |
---|
310 | \] |
---|
311 | |
---|
312 | \[ |
---|
313 | \begin{array}{lllllll} |
---|
314 | \textit{fun\_def} & ::= & \textit{fun\_name}(n) & \qquad |
---|
315 | \textit{program} & ::= & \textsf{globals}: n\\ |
---|
316 | & & \textsf{locals:} n & \qquad |
---|
317 | & & \textit{fun\_def}^*\\ |
---|
318 | & & \textsf{stack:} n\\ |
---|
319 | & & \textsf{entry:} \textit{node}\\ |
---|
320 | & & (\textit{node:} \textit{instruction})^* |
---|
321 | \end{array} |
---|
322 | \]} |
---|
323 | \caption{Syntax of the LTL language}\label{LTL:syntax} |
---|
324 | \end{table} |
---|
325 | |
---|
326 | Translating ERTL to LTL relies on the results of the liveness analysis performed on ERTL, and mentioned above, and of the register allocation process. |
---|
327 | Unused instructions are eliminated and each pseudo register is replaced by a physical location. |
---|
328 | In LTL, the size of the stack frame of a function is now known. |
---|
329 | As a result, instructions intended to load or store values in the stack can be translated using regular load and store instructions of the hardware target. |
---|
330 | |
---|
331 | We use a \emph{graph compression} algorithm to remove empty instructions generated by previous compilation passes and by the liveness analysis. |
---|
332 | |
---|
333 | \paragraph{LIN} is a linearised intermediate language, and the final intermediate language before assembly language is produced. |
---|
334 | The body of every function within LIN is now represented as a sequence of instructions. |
---|
335 | Despite one being a CFG language, and the other being linear, the instructions of LIN are very close to those of LTL. |
---|
336 | \emph{Program labels}, \emph{gotos} and branch instructions handle the changes in the control flow. |
---|
337 | The syntax of LIN programs is presented in Table~\ref{LIN:syntax}. |
---|
338 | |
---|
339 | \begin{table}[t] |
---|
340 | {\footnotesize |
---|
341 | \[ |
---|
342 | \begin{array}{lllllll} |
---|
343 | \textit{size} & ::= & \textsf{Byte} \Alt \textsf{HalfWord} \Alt \textsf{Word} & \qquad |
---|
344 | \textit{fun\_ref} & ::= & \textit{fun\_name} \Alt \textit{hdw\_reg} |
---|
345 | \end{array} |
---|
346 | \] |
---|
347 | |
---|
348 | \[ |
---|
349 | \begin{array}{llll} |
---|
350 | \textit{instruction} & ::= & \Alt \textsf{NewFrame} & |
---|
351 | \quad \mbox{(frame creation)}\\ |
---|
352 | & & \Alt \textsf{DelFrame} & |
---|
353 | \quad \mbox{(frame deletion)}\\ |
---|
354 | & & \Alt \textit{hdw\_reg} := n & \quad \mbox{(constant)}\\ |
---|
355 | & & \Alt \textit{hdw\_reg} := \textit{unop}(\textit{hdw\_reg}) |
---|
356 | & \quad \mbox{(unary operation)}\\ |
---|
357 | & & \Alt \textit{hdw\_reg} := |
---|
358 | \textit{binop}(\textit{hdw\_reg},\textit{hdw\_reg}) |
---|
359 | & \quad \mbox{(binary operation)}\\ |
---|
360 | & & \Alt \textit{hdw\_reg} := \textit{fun\_name} |
---|
361 | & \quad \mbox{(address of a function)}\\ |
---|
362 | & & \Alt \textit{hdw\_reg} := \textit{size}(\textit{hdw\_reg}[n]) |
---|
363 | & \quad \mbox{(memory load)}\\ |
---|
364 | & & \Alt \textit{size}(\textit{hdw\_reg}[n]) := \textit{hdw\_reg} |
---|
365 | & \quad \mbox{(memory store)}\\ |
---|
366 | & & \Alt \textsf{call}\ \textit{fun\_ref} |
---|
367 | & \quad \mbox{(function call)}\\ |
---|
368 | & & \Alt \textsf{tailcall}\ \textit{fun\_ref} |
---|
369 | & \quad \mbox{(function tail call)}\\ |
---|
370 | & & \Alt \textit{uncon}(\textit{hdw\_reg}) \rightarrow |
---|
371 | \textit{node} & \quad |
---|
372 | \mbox{(branch unary condition)}\\ |
---|
373 | & & \Alt \textit{bincon}(\textit{hdw\_reg},\textit{hdw\_reg}) |
---|
374 | \rightarrow \textit{node} & \quad |
---|
375 | \mbox{(branch binary condition)}\\ |
---|
376 | & & \Alt \textit{mips\_label:} & \quad \mbox{(MCS-51 label)}\\ |
---|
377 | & & \Alt \textsf{goto}\ \textit{mips\_label} & \quad \mbox{(goto)}\\ |
---|
378 | & & \Alt \textsf{return} & \quad \mbox{(return)} |
---|
379 | \end{array} |
---|
380 | \] |
---|
381 | |
---|
382 | \[ |
---|
383 | \begin{array}{lllllll} |
---|
384 | \textit{fun\_def} & ::= & \textit{fun\_name}(n) & \qquad |
---|
385 | \textit{program} & ::= & \textsf{globals}: n\\ |
---|
386 | & & \textsf{locals:} n & \qquad |
---|
387 | & & \textit{fun\_def}^*\\ |
---|
388 | & & \textit{instruction}^* |
---|
389 | \end{array} |
---|
390 | \]} |
---|
391 | \caption{Syntax of the LIN language}\label{LIN:syntax} |
---|
392 | \end{table} |
---|
393 | |
---|
394 | Translating LTL to LIN amounts to walking the CFG representation of each LTL function, starting from the `main' function, and producing a series of lists of instructions from each function body. |
---|
395 | Some dead-code elimination is implemented here, as functions that are never called are eliminated entirely. |
---|
396 | |
---|
397 | \subsection{Target hardware} |
---|
398 | |
---|
399 | \paragraph{MCS-51 assembly language} |
---|