# Changeset 3665 for Papers

Ignore:
Timestamp:
Mar 16, 2017, 5:09:26 PM (15 months ago)
Message:

finished compiler architecture section?

File:
1 edited

### Legend:

Unmodified
 r3664 Indeed, as explained below, the CerCo compiler works with a control-flow graph based representation of the program almost until producing machine code, only linearising in the translation from LTL to LIN intermediate languages. This is merely a design variation in our two compilers, and is not motivated by any real technical issue pertinent to the proof of correctness, or due to the lifting of cost models. Like many verified compilers, we ignore linking entirely, focussing solely on compiling single module programs. \subsection{Compiler front-end} \subsection{Compiler back-end} The compiler back end lowers the machine-independent RTlabs language to an assembly language of our target architecture, the MCS-51. As each backend intermediate language is bespoke, we include the full grammar of each language. \paragraph{RTL} is the first architecture-dependant intermediate language of our compiler, and the entry point of the compiler back end. \subsection{Target hardware} \paragraph{MCS-51 assembly language} The target architecture of our compiler is the MCS-51, an 8-bit microcontroller. The MCS-51 dates from the early 1980s, when Intel first introduced the architecture, and is commonly called the 8051/8052 after the two product numbers Intel used for their own variants. Derivatives are still widely manufactured by a number of semiconductor foundries, with the processor being used especially in embedded systems. The MCS-51 has a relative paucity of features compared to its more modern brethren, with the lack of any caching or pipelining features meaning that timing of execution is predictable. We know, for example, that on the Siemens variant of the MCS-51 that we are targetting, that a \texttt{MOV} instruction will take three clock cycles, no matter what state that microcontroller is in. This fact, coupled with its popularity in the embedded software world, makes the MCS-51 an ideal target for the CerCo verified C compiler. The MCS-51's paucity of features---though an advantage in many respects---also quickly becomes a hindrance, as the MCS-51 features a relatively minuscule series of memory spaces by modern standards. As a result our C compiler, to be able to successfully compile realistic programs for embedded devices, ought to produce tight' machine code. Further, the MCS-51 instruction set also includes many variants of jump and call instructions. \emph{Short jump} (or \emph{short call}) instructions may be used when the location to jump to is within a certain distance. Otherwise, other jump instructions must be used for longer jumps. As short jump instructions take up less space in code memory than their longer counterparts, we wish to always try to select the smallest machine code jump instruction, to save space. These two facts motivate the introduction of a bespoke assembly language, with pseudoinstructions for jump, call, and other tricky machine code instructions, which a dedicated assembler will handle. This simplifies the compiler, which does not need to try to select the shortest possible jump during instruction selection. Note that, as the cost model must be lifted from machine code, back through the compilation chain, the assembler must be able to lift a cost model from machine code to assembly programs. As a result, the CerCo verified C compiler and its assembler cannot be properly separated. Under the CerCo scheme, the assembler \emph{is} part of the compiler proper, and no longer an independent program in its own right. As a result, the assembly language that we define may be though of as a final intermediate language for our compiler: \paragraph{MCS-51 assembly language} adds explicit labels for jumping to, and embeds pseudo instructions for jumps, calls, and arbitrary-sized moves to and from memory into MCS-51 machine code. These simplified forms ease the development of the compiler. The CerCo assembler is a moderately optimising assembler, in that the assembler tries to produce the smallest jump or call instructions that it can, as mentioned above. To do this, we must solve the branch displacement' problem---deciding how best to expand pseudojumps to labels in assembly language to machine code jumps. Our optimising assembler must therefore individually expand every pseudo instruction in such a way that all global constraints are satisfied and that the compiled program is minimal in size and faster in concrete time complexity. This problem is known to be computationally hard for most CISC architectures (see~\cite{hyde:branch:2006}). After assembling a program, the CerCo assembler produces encoded MCS-51 machine code in Intel HEX format, an executable file format that describes a code memory image.