Changeset 2066


Timestamp:
Jun 13, 2012, 6:46:06 PM
Author:
mulligan
Message:

Finished for the day.

File:
1 edited

  • src/ASM/CPP2012-asm/cpp-2012-asm.tex

    r2063 r2066  
    7878As a result our C compiler, to have any sort of hope of successfully compiling realistic programs for embedded devices, ought to produce `tight' machine code.
    7979
    80 For example, the MCS-51 features three unconditional jump instructions: \texttt{LJMP} and \texttt{SJMP}---`long jump' and `short jump' respectively---and an 11-bit oddity of the MCS-51, \texttt{AJMP}.
    81 Each of these three instructions expects arguments in different sizes and behaves in markedly different ways: \texttt{SJMP} may only perform a `local jump'; \texttt{LJMP} may jump to any address in the MCS-51's memory space and \texttt{AJMP} may jump to any address in the current memory page.
    82 Consequently, the size of each opcode is different, and to squeeze as much code as possible into the MCS-51's limited code memory, the smallest possible opcode that will suffice should be selected.
    83 This is a well known problem to assembler writers who target RISC architectures, often referred to as `branch displacement'~\cite{holmes:branch:2006}.
    84 
    8580Branch displacement is not a simple problem to solve and requires the implementation of an optimising assembler.
    8681Labels, conditional jumps to labels, a program preamble containing global data and a \texttt{MOV} instruction for moving this global data into the MCS-51's one 16-bit register all feature in our assembly language.
     
    10499In each translation pass from intermediate language to intermediate language, we must prove that both the meaning and concrete complexity characteristics of the program are preserved.
    105100This also applies to the translation from assembly language to machine code.
    106 
    107 Naturally, this raises a question: how do we assign an \emph{accurate} cost to a pseudoinstruction?
    108 As mentioned, conditional jumps at the assembly level can jump to a label appearing anywhere in the program.
    109 However, at the machine code level, conditional jumps are limited to jumping `locally', using a measly byte offset.
    110 To translate a jump to a label, a single conditional jump pseudoinstruction may be translated into a block of three real instructions as follows (here, \texttt{JZ} is `jump if accumulator is zero'):
    111 {\small{
    112 \begin{displaymath}
    113 \begin{array}{r@{\quad}l@{\;\;}l@{\qquad}c@{\qquad}l@{\;\;}l}
    114        & \mathtt{JZ}  & \mathtt{label}                      &                 & \mathtt{JZ}   & \text{size of \texttt{SJMP} instruction} \\
    115        & \ldots       &                            & \text{translates to}   & \mathtt{SJMP} & \text{size of \texttt{LJMP} instruction} \\
    116 \mathtt{label:} & \mathtt{MOV} & \mathtt{A}\;\;\mathtt{B}   & \Longrightarrow & \mathtt{LJMP} & \text{address of \textit{label}} \\
    117        &              &                            &                 & \ldots        & \\
    118        &              &                            &                 & \mathtt{MOV}  & \mathtt{A}\;\;\mathtt{B}
    119 \end{array}
    120 \end{displaymath}}}
    121 Here, if \texttt{JZ} fails, we fall through to the \texttt{SJMP} which jumps over the \texttt{LJMP}.
    122 Naturally, if \texttt{label} is close enough, a conditional jump pseudoinstruction is mapped directly to a conditional jump machine instruction; the above translation only applies if \texttt{label} is not sufficiently local.
    123 We address the calculation of whether a label is indeed `close enough' for the simpler translation to be used below.
    124 
    125 Crucially, the above translation demonstrates the difficulty in predicting how many clock cycles a pseudoinstruction will take to execute.
    126 A conditional jump may be mapped to a single machine instruction or a block of three.
    127 Perhaps more insidious is the realisation that the number of cycles needed to execute the instructions in the two branches of a translated conditional jump may be different.
    128 Depending on the particular MCS-51 derivative at hand, an \texttt{SJMP} could in theory take a different number of clock cycles to execute than an \texttt{LJMP}.
    129 These issues must also be dealt with in order to prove that the translation pass preserves the concrete complexity of assembly code, and that the semantics of a program using the MCS-51's I/O facilities does not change.
    130 We address this problem by parameterising the semantics over a cost model.
    131 We prove the preservation of concrete complexity only for the program-dependent cost model induced by the optimisation.
    132101
    133102Yet one more question remains: how do we decide whether to expand a jump into an \texttt{SJMP} or an \texttt{LJMP}?
     
    203172\label{sect.the.proof}
    204173
    205 \subsection{The assembler and semantics of machine code}
    206 \label{subsect.the.assembler.and.semantics.of.machine.code}
     174% ---------------------------------------------------------------------------- %
     175% SECTION                                                                      %
     176% ---------------------------------------------------------------------------- %
     177
     178\subsection{Machine code semantics}
     179\label{subsect.machine.code.semantics}
    207180
    208181Our emulator centres around a \texttt{Status} record, describing the microprocessor's state.
     
    216189\end{lstlisting}
    217190The function \texttt{execute} allows one to execute an arbitrary, but fixed (due to Matita's normalisation requirement) number of steps of a program.
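To illustrate the shape of such a function, the following is a minimal OCaml sketch of this `fuel' pattern, with the single-step function and the processor state left abstract; it is not the code of the development.
\begin{lstlisting}
(* OCaml sketch of bounded execution: [step] performs one instruction,
   [n] is the fixed amount of `fuel'.  Totality is immediate because the
   recursion is on [n]. *)
let rec execute (step : 'state -> 'state) (n : int) (s : 'state) : 'state =
  if n <= 0 then s else execute step (n - 1) (step s)
\end{lstlisting}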
    218 Execution proceeds by a case analysis on the instruction fetched from code memory using the program counter of the input \texttt{Status} record.
    219 
    220 Naturally, assembly programs have analogues.
    221 The counterpart of the \texttt{Status} record is \texttt{PseudoStatus}.
    222 Instead of code memory being implemented as tries of bytes, code memory is here implemented as lists of pseudoinstructions, and program counters are merely indices into this list.
    223 Both \texttt{Status} and \texttt{PseudoStatus} are specialisations of the same \texttt{PreStatus} record, parametric in the representation of code memory.
 191The function \texttt{execute\_1} closely matches the operation of the MCS-51 hardware, as described in a manufacturer's data sheet from Siemens.
 192We first fetch from code memory, using the program counter, the first byte of the instruction to be executed, decode the resulting opcode, and fetch further bytes as necessary.
     193Decoded instructions are represented as an inductive type:
     194
     195\begin{lstlisting}
     196inductive preinstruction (A: Type[0]): Type[0] :=
     197 | ADD: [[acc_a]] → [[ registr; direct; indirect; data ]] → preinstruction A
     198 | INC: [[ acc_a; registr; direct ; indirect ; dptr ]] → preinstruction A
     199 | JB: [[bit_addr]] → A → preinstruction A
     200 | ...
     201
     202inductive instruction: Type[0] ≝
     203  | LCALL: [[addr16]] → instruction
     204  | AJMP: [[addr11]] → instruction
     205  | RealInstruction: preinstruction [[ relative ]] → instruction.
     206\end{lstlisting}
     207Here, we use dependent types to provide a precise typing for instructions, specifying in their type the permitted addressing modes that their arguments can belong to.
 208The type parameter $A$ of \texttt{preinstruction} represents the addressing mode for conditional jumps; in \texttt{instruction} we fix this type to be a relative offset in the constructor \texttt{RealInstruction}.
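To make the fetch-and-decode step described above more concrete, here is a small OCaml sketch; it is not the Matita development, and the opcodes, operand widths and tick counts shown are chosen purely for illustration.
\begin{lstlisting}
(* Hypothetical sketch: fetch one instruction starting at [pc], returning
   the decoded instruction, the updated program counter and its clock
   ticks.  Instructions have different lengths, so the number of bytes
   fetched depends on the opcode. *)
type instr = Nop | IncAcc | Sjmp of int | Ljmp of int

let fetch (code_memory : int -> int) (pc : int) : instr * int * int =
  match code_memory pc with
  | 0x00 -> (Nop, pc + 1, 1)                            (* no operand bytes  *)
  | 0x04 -> (IncAcc, pc + 1, 1)
  | 0x80 -> (Sjmp (code_memory (pc + 1)), pc + 2, 2)    (* one operand byte  *)
  | 0x02 -> (Ljmp ((code_memory (pc + 1)) lsl 8 lor code_memory (pc + 2)),
             pc + 3, 2)                                 (* two operand bytes *)
  | _    -> (Nop, pc + 1, 1)                            (* other opcodes elided *)
\end{lstlisting}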
     209
 210Once an instruction has been decoded, execution proceeds by case analysis on it, following the operation of the hardware.
     211For example:
     212
     213\begin{lstlisting}
     214  | DEC addr $\Rightarrow$
     215    let s := add_ticks1 s in
     216    let $\langle$result, flags$\rangle$ := sub_8_with_carry (get_arg_8 $\ldots$ s true addr)
     217      (bitvector_of_nat 8 1) false in
     218      set_arg_8 $\ldots$ s addr result
     219\end{lstlisting}
     220
     221Here, \texttt{add\_ticks1} models the incrementing of the internal clock of the microprocessor.
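One may picture its effect with the following hypothetical OCaml sketch, in which the clock field and the way the tick count is supplied are assumptions made for illustration only:
\begin{lstlisting}
(* Hypothetical sketch, not the development: charge [n] clock ticks to the
   status.  In the real code the number of ticks is determined by the
   instruction being executed. *)
type status = { clock : int }
let add_ticks (n : int) (s : status) : status = { clock = s.clock + n }
\end{lstlisting}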
     222
     223% ---------------------------------------------------------------------------- %
     224% SECTION                                                                      %
     225% ---------------------------------------------------------------------------- %
     226
     227\subsection{Assembly code semantics}
     228\label{subsect.assembly.code.semantics}
     229
     230An assembly program is a list of potentially labelled pseudoinstructions, bundled with a preamble consisting of a list of symbolic names for locations in data memory.
     231Pseudoinstructions are implemented as an inductive type:
     232\begin{lstlisting}
     233inductive pseudo_instruction: Type[0] ≝
     234  | Instruction: preinstruction Identifier → pseudo_instruction
     235    ...
     236  | Jmp: Identifier → pseudo_instruction
     237  | Call: Identifier → pseudo_instruction
     238  | Mov: [[dptr]] → Identifier → pseudo_instruction.
     239\end{lstlisting}
 240The pseudoinstructions \texttt{Jmp}, \texttt{Call} and \texttt{Mov} are generalisations of machine code unconditional jumps, calls and move instructions respectively, all of which act on labels, as opposed to concrete memory addresses.
     241Similarly, conditional jumps can now only jump to labels, as is implied by the first constructor of the type above.
     242All other object code instructions are available at the assembly level, with the exception of those that appeared in the \texttt{instruction} type, such as \texttt{ACALL} and \texttt{LJMP}.
     243These are jumps and calls to absolute addresses, which we do not wish to allow at the assembly level.
     244
     245Execution of pseudoinstructions is a function from \texttt{PseudoStatus} to \texttt{PseudoStatus}.
     246Both \texttt{Status} and \texttt{PseudoStatus} are instances of a more general type parameterised over the representation of code memory.
    224247This allows us to share some code that is common to both records (for instance, `setter' and `getter' functions).
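The following OCaml sketch (with hypothetical field and function names, not those of the Matita code) conveys the idea of a single record parameterised over the representation of code memory, so that accessors can be shared between the two instantiations:
\begin{lstlisting}
(* Sketch: one record, two instantiations.  Machine-level code memory
   would be a trie of bytes; assembly-level code memory a list of
   pseudoinstructions.  Only the clock is shown; other fields are elided. *)
type 'code_memory pre_status = { code_memory : 'code_memory; clock : int }

let set_clock (s : 'a pre_status) (c : int) : 'a pre_status =
  { s with clock = c }
\end{lstlisting}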
    225248
    226 Our analogue of \texttt{execute\_1} is \texttt{execute\_1\_pseudo\_instruction}:
     249Emulation for pseudoinstructions is handled by \texttt{execute\_1\_pseudo\_instruction}:
    227250\begin{lstlisting}
    228251definition execute_1_pseudo_instruction:
    229252 (Word $\rightarrow$ nat $\times$ nat) $\rightarrow$ $\forall$cm. PseudoStatus cm $\rightarrow$ PseudoStatus cm := $\ldots$
    230253\end{lstlisting}
    231 Notice, here, that the emulation function for assembly programs takes an additional argument.
 254Notice, here, that the emulation function for assembly programs takes an additional argument compared to \texttt{execute\_1}.
 255This is a function that maps program counters (at the assembly level) to pairs of natural numbers representing the number of clock ticks that the pseudoinstruction at that address needs to execute after expansion (a pair because, for instance, the two branches of an expanded conditional jump may take different numbers of ticks).
    233256We call this function a \emph{costing}, and note that the costing is induced by the particular strategy we use to expand pseudoinstructions.
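For instance, consider a conditional jump pseudoinstruction that is expanded into the \texttt{JZ}/\texttt{SJMP}/\texttt{LJMP} block discussed in this paper, and suppose, purely for illustration, that \texttt{JZ} and \texttt{SJMP} each cost two clock ticks while \texttt{LJMP} costs three (the real figures depend on the MCS-51 derivative at hand). Assuming the two components of the pair account for the two possible outcomes of the conditional, the costing would record
\begin{displaymath}
\langle\;\underbrace{2 + 3}_{\texttt{JZ}\text{ taken, then }\texttt{LJMP}}\;,\;\underbrace{2 + 2}_{\texttt{JZ}\text{ falls through, then }\texttt{SJMP}}\;\rangle
\end{displaymath}
for that pseudoinstruction; had the jump been close enough to be expanded into a single conditional jump, both components would simply be the cost of that single instruction.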
     
    238261During the proof of correctness of the assembler we relate the clocks of \texttt{Status}es and \texttt{PseudoStatus}es for the policy induced by the cost model and optimisations.
    239262
    240 The assembler, mapping programs consisting of lists of pseudoinstructions to lists of bytes, operates in a mostly straightforward manner.
    241 To a degree of approximation the assembler, on an assembly program consisting of $n$ pseudoinstructions $\mathtt{P_i}$ for $1 \leq i \leq n$, works as in the following diagram (we use $-^{*}$ to denote a combined map and flatten operation):
     263Execution proceeds by first fetching from pseudo-code memory using the program counter---treated as an index into the pseudoinstruction list.
     264No decoding is required.
 265We then proceed by case analysis over the pseudoinstruction, reusing the object code semantics for all instructions that are also present in the MCS-51's instruction set.
 266For all newly introduced pseudoinstructions, we simply translate labels to concrete addresses and then behave as the corresponding `real' instruction.
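As a sketch of what this amounts to for the \texttt{Jmp} pseudoinstruction, consider the following OCaml fragment (names and record fields are hypothetical; this is not the development):
\begin{lstlisting}
(* Hypothetical sketch: the label is resolved to a concrete address up
   front, after which the pseudoinstruction behaves like a real jump. *)
type pseudo_status = { ppc : int; clock : int }   (* other state elided *)

let execute_jmp (lookup_labels : string -> int) (label : string)
    (s : pseudo_status) : pseudo_status =
  { s with ppc = lookup_labels label }
\end{lstlisting}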
     267
     268In contrast to other approaches, we do not perform any kind of symbolic execution, wherein data is the disjoint union of bytes and addresses, with addresses kept opaque and immutable.
     269Labels are immediately translated to concrete addresses, and registers and memory locations only ever contain bytes, never labels.
     270As a consequence, we allow the programmer to mangle, change and generally adjust addresses as they want, under the proviso that the translation process may not be able to preserve the semantics of programs that do this.
     271This will be further discussed in Subsection~\ref{subsect.total.correctness.for.well.behaved.assembly.programs}.
     272
     273% ---------------------------------------------------------------------------- %
     274% SECTION                                                                      %
     275% ---------------------------------------------------------------------------- %
     276
     277\subsection{The assembler}
     278\label{subsect.the.assembler}
     279
     280Conceptually the assembler works in two passes.
     281The first pass expands pseudoinstructions into a list of machine code instructions using the function \texttt{expand\_pseudo\_instruction}.
 282The second pass encodes the expanded instruction list as a list of bytes, by mapping the function \texttt{assembly1} across the list and then flattening, as in the following diagram:
    242283\begin{displaymath}
    243284[\mathtt{P_1}, \ldots \mathtt{P_n}] \xrightarrow{\left(\mathtt{P_i} \xrightarrow{\mbox{\fontsize{7}{9}\selectfont$\mathtt{expand\_pseudo\_instruction}$}} \mathtt{[I^1_i, \ldots I^q_i]} \xrightarrow{\mbox{\fontsize{7}{9}\selectfont$\mathtt{~~~~~~~~assembly1^{*}~~~~~~~~}$}} \mathtt{[0110]}\right)^{*}} \mathtt{[010101]}
    244285\end{displaymath}
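Schematically, and ignoring policies, program counters and failure, the two passes compose as in the following OCaml sketch (names other than \texttt{assembly1} and \texttt{expand\_pseudo\_instruction} are illustrative; this is not the code of the development):
\begin{lstlisting}
(* Pass one expands each pseudoinstruction into machine instructions;
   pass two encodes each machine instruction into bytes; the results are
   flattened into a single byte list. *)
let assemble (expand : 'pseudo -> 'instr list)
             (assembly1 : 'instr -> 'byte list)
             (program : 'pseudo list) : 'byte list =
  program
  |> List.map expand    |> List.concat    (* pass one: branch displacement *)
  |> List.map assembly1 |> List.concat    (* pass two: encoding            *)
\end{lstlisting}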
    245 Here $\mathtt{I^j_i}$ for $1 \leq j \leq q$ are the $q$ machine code instructions obtained by expanding, with \texttt{expand\_pseudo\_instruction}, a single pseudoinstruction $P_i$.
    246 Each machine code instruction $\mathtt{I^i_j}$ is then assembled, using the \texttt{assembly1} function, into a list of bytes.
    247 This process is iterated for each pseudoinstruction, before the lists are flattened into a single bit list representation of the original assembly program.
    248 
    249 % ---------------------------------------------------------------------------- %
    250 % SECTION                                                                      %
    251 % ---------------------------------------------------------------------------- %
    252 \subsection{Correctness of the assembler with respect to fetching}
    253 \label{subsect.total.correctness.of.the.assembler}
    254 
    255 Throughout the proof of correctness for assembly we assume that policies are bifurcated into two functions: \texttt{sigma} mapping \texttt{Word} to \texttt{Word} and \texttt{policy} mapping \texttt{Word} to \texttt{bool}.
    256 For our purposes, \texttt{sigma} is most interesting, as it maps pseudo program counters to program counters; \texttt{policy} is merely a technical device used in jump expansion.
    257 
    258 Using our policies, we now work toward proving the total correctness of the assembler.
    259 By `total correctness', we mean that the assembly process never fails when provided with a good policy and that the process does not change the semantics of a certain class of well behaved assembly programs.
    260 By `good policy' we mean that policies provided to us will keep a lockstep correspondence between addresses at the assembly level and addresses at the machine code level:
 286The more complex of the two passes is the first, which expands pseudoinstructions and must perform the task of branch displacement~\cite{holmes:branch:2006}.
     287
 288The branch displacement problem consists of expanding pseudojumps into their concrete counterparts, preferably as efficiently as possible.
     289For instance, the MCS-51 features three unconditional jump instructions: \texttt{LJMP} and \texttt{SJMP}---`long jump' and `short jump' respectively---and an 11-bit oddity of the MCS-51, \texttt{AJMP}.
     290Each of these three instructions expects arguments in different sizes and behaves in markedly different ways: \texttt{SJMP} may only perform a `local jump'; \texttt{LJMP} may jump to any address in the MCS-51's memory space and \texttt{AJMP} may jump to any address in the current memory page.
     291Consequently, the size of each opcode is different, and to squeeze as much code as possible into the MCS-51's limited code memory, the smallest possible opcode that will suffice should be selected.
 292This is a problem well known to assembler writers who target RISC architectures.
     293
 294Similarly, a conditional pseudojump may need to be translated into a block of machine code instructions, depending on the distance to the jump's target.
     295For example, to translate a jump to a label, a single conditional jump pseudoinstruction may be translated into a block of three real instructions as follows (here, \texttt{JZ} is `jump if accumulator is zero'):
     296{\small{
     297\begin{displaymath}
     298\begin{array}{r@{\quad}l@{\;\;}l@{\qquad}c@{\qquad}l@{\;\;}l}
     299       & \mathtt{JZ}  & \mathtt{label}                      &                 & \mathtt{JZ}   & \text{size of \texttt{SJMP} instruction} \\
     300       & \ldots       &                            & \text{translates to}   & \mathtt{SJMP} & \text{size of \texttt{LJMP} instruction} \\
     301\mathtt{label:} & \mathtt{MOV} & \mathtt{A}\;\;\mathtt{B}   & \Longrightarrow & \mathtt{LJMP} & \text{address of \textit{label}} \\
     302       &              &                            &                 & \ldots        & \\
     303       &              &                            &                 & \mathtt{MOV}  & \mathtt{A}\;\;\mathtt{B}
     304\end{array}
     305\end{displaymath}}}
     306Here, if \texttt{JZ} fails, we fall through to the \texttt{SJMP} which jumps over the \texttt{LJMP}.
     307Naturally, if \texttt{label} is `close enough', a conditional jump pseudoinstruction is mapped directly to a conditional jump machine instruction; the above translation only applies if \texttt{label} is not sufficiently local.
     308
 309In order to implement branch displacement, the \texttt{expand\_pseudo\_instruction} function cannot be made completely independent of the encoding function.
 310This is because branch displacement requires the distance, in bytes, to the target of the jump.
 311Moreover, standard algorithms for the branch displacement problem work iteratively, either starting from a solution in which all jumps are long and shrinking them where possible, or starting from one in which all jumps are short and lengthening them as needed.
 312Proving the correctness of such algorithms is already quite involved, and the correctness of the assembler as a whole does not depend on the `quality' of the solution found to a branch displacement problem.
 313For this reason, we isolate the computation of a solution to the branch displacement problem from the proof of correctness of the assembler by parameterising \texttt{expand\_pseudo\_instruction} by a `policy'.
     314
     315\begin{lstlisting}
     316definition expand_pseudo_instruction:
     317 $\forall$lookup_labels: Identifier $\rightarrow$ Word.
     318 $\forall$sigma: Word $\rightarrow$ Word.
     319 $\forall$policy: Word $\rightarrow$ bool.
     320 $\forall$ppc: Word.
     321 $\forall$lookup_datalabels: Identifier $\rightarrow$ Word.
     322 $\forall$pi: pseudo_instruction.
     323  list instruction := ...
     324\end{lstlisting}
 325Here, \texttt{lookup\_labels} and \texttt{lookup\_datalabels} are the functions that map labels and data labels to program counters respectively; both are also used in the semantics of assembly.
 326The input \texttt{pi} is the pseudoinstruction to be expanded, found at address \texttt{ppc} in the assembly program.
 327The policy is defined using the two functions \texttt{sigma} and \texttt{policy}, of which \texttt{sigma} is the more interesting.
 328The function \texttt{sigma} maps pseudo program counters to program counters: the encoding of the expansion of the pseudoinstruction found at address \texttt{a} in the assembly code should be placed into code memory at address \texttt{sigma(a)}.
 329Of course this is possible only if the policy is correct, which means that the encodings of consecutive assembly instructions must be consecutive in code memory:
    261330\begin{displaymath}
    262331\texttt{sigma}(\texttt{ppc} + 1) = \texttt{pc} + \texttt{current\_instruction\_size}
    263332\end{displaymath}
    264 along with some other technical properties related to program counters falling within the bounds of our programs.
     333Here, \texttt{current\_instruction\_size} is the size in bytes of the encoding of the expanded pseudoinstruction found at \texttt{ppc}.
 334The \texttt{expand\_pseudo\_instruction} function uses the \texttt{sigma} map to determine the size of jump required when expanding pseudojumps, computing that size from the difference between the relevant program counters.
     335In some circumstances the decision as to what machine code jump a particular pseudojump should be expanded into is ambiguous, and requires the use of the \texttt{policy} function to force a decision.
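The following OCaml sketch conveys the flavour of such a decision; the offset arithmetic and the way \texttt{policy} breaks ties are assumptions made purely for illustration, not the algorithm of the development:
\begin{lstlisting}
(* Hypothetical sketch: decide whether the pseudojump at [ppc], targeting
   the pseudoaddress [target], may be expanded into a short (relative)
   jump.  The displacement is measured from the address following the
   expanded short jump; [policy] forces a long jump in ambiguous cases. *)
let fits_in_signed_byte d = -128 <= d && d <= 127

let use_short_jump (sigma : int -> int) (policy : int -> bool)
    (ppc : int) (target : int) (short_jump_size : int) : bool =
  let displacement = sigma target - (sigma ppc + short_jump_size) in
  fits_in_signed_byte displacement && not (policy ppc)
\end{lstlisting}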
     336
     337% ---------------------------------------------------------------------------- %
     338% SECTION                                                                      %
     339% ---------------------------------------------------------------------------- %
     340\subsection{Correctness of the assembler with respect to fetching}
     341\label{subsect.total.correctness.of.the.assembler}
     342
     343Throughout the proof of correctness for assembly we assume that policies are bifurcated into two functions: \texttt{sigma} mapping \texttt{Word} to \texttt{Word} and \texttt{policy} mapping \texttt{Word} to \texttt{bool}.
     344For our purposes, \texttt{sigma} is most interesting, as it maps pseudo program counters to program counters; \texttt{policy} is merely a technical device used in jump expansion.
     345
     346Using our policies, we now work toward proving the total correctness of the assembler.
     347By `total correctness', we mean that the assembly process never fails when provided with a good policy and that the process does not change the semantics of a certain class of well behaved assembly programs.
 348By `good policy' we mean a policy that keeps a lockstep correspondence between addresses at the assembly level and addresses at the machine code level.
    265349We assume the correctness of the policies given to us via a function, \texttt{sigma\_policy\_specification}, which we take as input when needed.
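As an illustration only (the precise statement in the development may well differ), such a specification can be pictured as requiring, for every pseudo program counter \texttt{ppc} lying within the bounds of the program, a lockstep condition of the form
\begin{displaymath}
\texttt{sigma}(\texttt{ppc} + 1) = \texttt{sigma}(\texttt{ppc}) + \texttt{instruction\_size}(\texttt{ppc})
\end{displaymath}
where \texttt{instruction\_size(ppc)} is the size in bytes of the encoding of the expanded pseudoinstruction at \texttt{ppc}, together with further technical side conditions on program counters remaining within bounds.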
    266350