Changeset 3665

Mar 16, 2017, 5:09:26 PM (9 days ago)

finished compiler architecture section?

1 edited


  • Papers/jar-cerco-2017/architecture.tex

    r3664 r3665  
    1818Indeed, as explained below, the CerCo compiler works with a control-flow graph based representation of the program almost until producing machine code, only linearising in the translation from LTL to LIN intermediate languages.
    1919This is merely a design variation in our two compilers, and is not motivated by any real technical issue pertinent to the proof of correctness, or due to the lifting of cost models.
     21Like many verified compilers, we ignore linking entirely, focussing solely on compiling single module programs.
    2123\subsection{Compiler front-end}
    7375\subsection{Compiler back-end}
     77The compiler back end lowers the machine-independent RTlabs language to an assembly language of our target architecture, the MCS-51.
     78As each backend intermediate language is bespoke, we include the full grammar of each language.
    7580\paragraph{RTL} is the first architecture-dependant intermediate language of our compiler, and the entry point of the compiler back end.
    397402\subsection{Target hardware}
    399 \paragraph{MCS-51 assembly language}
     404The target architecture of our compiler is the MCS-51, an 8-bit microcontroller.
     405The MCS-51 dates from the early 1980s, when Intel first introduced the architecture, and is commonly called the 8051/8052 after the two product numbers Intel used for their own variants.
     406Derivatives are still widely manufactured by a number of semiconductor foundries, with the processor being used especially in embedded systems.
     408The MCS-51 has a relative paucity of features compared to its more modern brethren, with the lack of any caching or pipelining features meaning that timing of execution is predictable.
     409We know, for example, that on the Siemens variant of the MCS-51 that we are targetting, that a \texttt{MOV} instruction will take three clock cycles, no matter what state that microcontroller is in.
     410This fact, coupled with its popularity in the embedded software world, makes the MCS-51 an ideal target for the CerCo verified C compiler.
     412The MCS-51's paucity of features---though an advantage in many respects---also quickly becomes a hindrance, as the MCS-51 features a relatively minuscule series of memory spaces by modern standards.
     413As a result our C compiler, to be able to successfully compile realistic programs for embedded devices, ought to produce `tight' machine code.
     415Further, the MCS-51 instruction set also includes many variants of jump and call instructions.
     416\emph{Short jump} (or \emph{short call}) instructions may be used when the location to jump to is within a certain distance.
     417Otherwise, other jump instructions must be used for longer jumps.
     418As short jump instructions take up less space in code memory than their longer counterparts, we wish to always try to select the smallest machine code jump instruction, to save space.
     420These two facts motivate the introduction of a bespoke assembly language, with pseudoinstructions for jump, call, and other tricky machine code instructions, which a dedicated assembler will handle.
     421This simplifies the compiler, which does not need to try to select the shortest possible jump during instruction selection.
     423Note that, as the cost model must be lifted from machine code, back through the compilation chain, the assembler must be able to lift a cost model from machine code to assembly programs.
     424As a result, the CerCo verified C compiler and its assembler cannot be properly separated.
     425Under the CerCo scheme, the assembler \emph{is} part of the compiler proper, and no longer an independent program in its own right.
     426As a result, the assembly language that we define may be though of as a final intermediate language for our compiler:
     428\paragraph{MCS-51 assembly language} adds explicit labels for jumping to, and embeds pseudo instructions for jumps, calls, and arbitrary-sized moves to and from memory into MCS-51 machine code.
     429These simplified forms ease the development of the compiler.
     431The CerCo assembler is a moderately optimising assembler, in that the assembler tries to produce the smallest jump or call instructions that it can, as mentioned above.
     432To do this, we must solve the `branch displacement' problem---deciding how best to expand pseudojumps to labels in assembly language to machine code jumps.
     433Our optimising assembler must therefore individually expand every pseudo instruction in such a way that all global constraints are satisfied and that the compiled program is minimal in size and faster in concrete time complexity.
     434This problem is known to be computationally hard for most CISC architectures (see~\cite{hyde:branch:2006}).
     436After assembling a program, the CerCo assembler produces encoded MCS-51 machine code in Intel HEX format, an executable file format that describes a code memory image.
Note: See TracChangeset for help on using the changeset viewer.