- Timestamp:
- Jun 13, 2012, 6:46:06 PM (9 years ago)
- File:
-
- 1 edited
Legend:
- Unmodified
- Added
- Removed
-
src/ASM/CPP2012-asm/cpp-2012-asm.tex
r2063 r2066 78 78 As a result our C compiler, to have any sort of hope of successfully compiling realistic programs for embedded devices, ought to produce `tight' machine code. 79 79 80 For example, the MCS-51 features three unconditional jump instructions: \texttt{LJMP} and \texttt{SJMP}---`long jump' and `short jump' respectively---and an 11-bit oddity of the MCS-51, \texttt{AJMP}.81 Each of these three instructions expects arguments in different sizes and behaves in markedly different ways: \texttt{SJMP} may only perform a `local jump'; \texttt{LJMP} may jump to any address in the MCS-51's memory space and \texttt{AJMP} may jump to any address in the current memory page.82 Consequently, the size of each opcode is different, and to squeeze as much code as possible into the MCS-51's limited code memory, the smallest possible opcode that will suffice should be selected.83 This is a well known problem to assembler writers who target RISC architectures, often referred to as `branch displacement'~\cite{holmes:branch:2006}.84 85 80 Branch displacement is not a simple problem to solve and requires the implementation of an optimising assembler. 86 81 Labels, conditional jumps to labels, a program preamble containing global data and a \texttt{MOV} instruction for moving this global data into the MCS-51's one 16-bit register all feature in our assembly language. … … 104 99 In each translation pass from intermediate language to intermediate language, we must prove that both the meaning and concrete complexity characteristics of the program are preserved. 105 100 This also applies for the translation from assembly language to machine code. 106 107 Naturally, this raises a question: how do we assign an \emph{accurate} cost to a pseudoinstruction?108 As mentioned, conditional jumps at the assembly level can jump to a label appearing anywhere in the program.109 However, at the machine code level, conditional jumps are limited to jumping `locally', using a measly byte offset.110 To translate a jump to a label, a single conditional jump pseudoinstruction may be translated into a block of three real instructions as follows (here, \texttt{JZ} is `jump if accumulator is zero'):111 {\small{112 \begin{displaymath}113 \begin{array}{r@{\quad}l@{\;\;}l@{\qquad}c@{\qquad}l@{\;\;}l}114 & \mathtt{JZ} & \mathtt{label} & & \mathtt{JZ} & \text{size of \texttt{SJMP} instruction} \\115 & \ldots & & \text{translates to} & \mathtt{SJMP} & \text{size of \texttt{LJMP} instruction} \\116 \mathtt{label:} & \mathtt{MOV} & \mathtt{A}\;\;\mathtt{B} & \Longrightarrow & \mathtt{LJMP} & \text{address of \textit{label}} \\117 & & & & \ldots & \\118 & & & & \mathtt{MOV} & \mathtt{A}\;\;\mathtt{B}119 \end{array}120 \end{displaymath}}}121 Here, if \texttt{JZ} fails, we fall through to the \texttt{SJMP} which jumps over the \texttt{LJMP}.122 Naturally, if \texttt{label} is close enough, a conditional jump pseudoinstruction is mapped directly to a conditional jump machine instruction; the above translation only applies if \texttt{label} is not sufficiently local.123 We address the calculation of whether a label is indeed `close enough' for the simpler translation to be used below.124 125 Crucially, the above translation demonstrates the difficulty in predicting how many clock cycles a pseudoinstruction will take to execute.126 A conditional jump may be mapped to a single machine instruction or a block of three.127 Perhaps more insidious is the realisation that the number of cycles needed to execute the instructions in the two branches of a translated conditional jump may be different.128 Depending on the particular MCS-51 derivative at hand, an \texttt{SJMP} could in theory take a different number of clock cycles to execute than an \texttt{LJMP}.129 These issues must also be dealt with in order to prove that the translation pass preserves the concrete complexity of assembly code, and that the semantics of a program using the MCS-51's I/O facilities does not change.130 We address this problem by parameterising the semantics over a cost model.131 We prove the preservation of concrete complexity only for the program-dependent cost model induced by the optimisation.132 101 133 102 Yet one more question remains: how do we decide whether to expand a jump into an \texttt{SJMP} or an \texttt{LJMP}? … … 203 172 \label{sect.the.proof} 204 173 205 \subsection{The assembler and semantics of machine code} 206 \label{subsect.the.assembler.and.semantics.of.machine.code} 174 % ---------------------------------------------------------------------------- % 175 % SECTION % 176 % ---------------------------------------------------------------------------- % 177 178 \subsection{Machine code semantics} 179 \label{subsect.machine.code.semantics} 207 180 208 181 Our emulator centres around a \texttt{Status} record, describing the microprocessor's state. … … 216 189 \end{lstlisting} 217 190 The function \texttt{execute} allows one to execute an arbitrary, but fixed (due to Matita's normalisation requirement) number of steps of a program. 218 Execution proceeds by a case analysis on the instruction fetched from code memory using the program counter of the input \texttt{Status} record. 219 220 Naturally, assembly programs have analogues. 221 The counterpart of the \texttt{Status} record is \texttt{PseudoStatus}. 222 Instead of code memory being implemented as tries of bytes, code memory is here implemented as lists of pseudoinstructions, and program counters are merely indices into this list. 223 Both \texttt{Status} and \texttt{PseudoStatus} are specialisations of the same \texttt{PreStatus} record, parametric in the representation of code memory. 191 The function \texttt{execute\_1} closely matches the operation of the MCS-51 hardware, as described by a Siemen's manufacturer's data sheet. 192 We first fetch, using the program counter, from code memory the first byte of the instruction to be executed, decoding the resulting opcode, fetching more bytes as is necessary. 193 Decoded instructions are represented as an inductive type: 194 195 \begin{lstlisting} 196 inductive preinstruction (A: Type[0]): Type[0] := 197 | ADD: [[acc_a]] → [[ registr; direct; indirect; data ]] → preinstruction A 198 | INC: [[ acc_a; registr; direct ; indirect ; dptr ]] → preinstruction A 199 | JB: [[bit_addr]] → A → preinstruction A 200 | ... 201 202 inductive instruction: Type[0] ≝ 203 | LCALL: [[addr16]] → instruction 204 | AJMP: [[addr11]] → instruction 205 | RealInstruction: preinstruction [[ relative ]] → instruction. 206 \end{lstlisting} 207 Here, we use dependent types to provide a precise typing for instructions, specifying in their type the permitted addressing modes that their arguments can belong to. 208 The parameterised type $A$ of \texttt{preinstruction} represents the addressing mode for conditional jumps; in \texttt{instruction} we fix this type to be a relative offset in the constructor \texttt{RealInstruction}. 209 210 Once decoded, execution proceeds by a case analysis on the decoded instruction, following the operation of the hardware. 211 For example: 212 213 \begin{lstlisting} 214 | DEC addr $\Rightarrow$ 215 let s := add_ticks1 s in 216 let $\langle$result, flags$\rangle$ := sub_8_with_carry (get_arg_8 $\ldots$ s true addr) 217 (bitvector_of_nat 8 1) false in 218 set_arg_8 $\ldots$ s addr result 219 \end{lstlisting} 220 221 Here, \texttt{add\_ticks1} models the incrementing of the internal clock of the microprocessor. 222 223 % ---------------------------------------------------------------------------- % 224 % SECTION % 225 % ---------------------------------------------------------------------------- % 226 227 \subsection{Assembly code semantics} 228 \label{subsect.assembly.code.semantics} 229 230 An assembly program is a list of potentially labelled pseudoinstructions, bundled with a preamble consisting of a list of symbolic names for locations in data memory. 231 Pseudoinstructions are implemented as an inductive type: 232 \begin{lstlisting} 233 inductive pseudo_instruction: Type[0] ≝ 234 | Instruction: preinstruction Identifier → pseudo_instruction 235 ... 236 | Jmp: Identifier → pseudo_instruction 237 | Call: Identifier → pseudo_instruction 238 | Mov: [[dptr]] → Identifier → pseudo_instruction. 239 \end{lstlisting} 240 The pseudoinstructions \texttt{Jmp}, \texttt{Call} and \texttt{Mov} are generalisations of machine code unconditional jumps, calls and move instructions respectively, all of whom act on labels, as opposed to concrete memory addresses. 241 Similarly, conditional jumps can now only jump to labels, as is implied by the first constructor of the type above. 242 All other object code instructions are available at the assembly level, with the exception of those that appeared in the \texttt{instruction} type, such as \texttt{ACALL} and \texttt{LJMP}. 243 These are jumps and calls to absolute addresses, which we do not wish to allow at the assembly level. 244 245 Execution of pseudoinstructions is a function from \texttt{PseudoStatus} to \texttt{PseudoStatus}. 246 Both \texttt{Status} and \texttt{PseudoStatus} are instances of a more general type parameterised over the representation of code memory. 224 247 This allows us to share some code that is common to both records (for instance, `setter' and `getter' functions). 225 248 226 Our analogue of \texttt{execute\_1} is\texttt{execute\_1\_pseudo\_instruction}:249 Emulation for pseudoinstructions is handled by \texttt{execute\_1\_pseudo\_instruction}: 227 250 \begin{lstlisting} 228 251 definition execute_1_pseudo_instruction: 229 252 (Word $\rightarrow$ nat $\times$ nat) $\rightarrow$ $\forall$cm. PseudoStatus cm $\rightarrow$ PseudoStatus cm := $\ldots$ 230 253 \end{lstlisting} 231 Notice, here, that the emulation function for assembly programs takes an additional argument .254 Notice, here, that the emulation function for assembly programs takes an additional argument over \texttt{execute\_1}. 232 255 This is a function that maps program counters (at the assembly level) to pairs of natural numbers representing the number of clock ticks that the pseudoinstruction needs to execute, post expansion. 233 256 We call this function a \emph{costing}, and note that the costing is induced by the particular strategy we use to expand pseudoinstructions. … … 238 261 During the proof of correctness of the assembler we relate the clocks of \texttt{Status}es and \texttt{PseudoStatus}es for the policy induced by the cost model and optimisations. 239 262 240 The assembler, mapping programs consisting of lists of pseudoinstructions to lists of bytes, operates in a mostly straightforward manner. 241 To a degree of approximation the assembler, on an assembly program consisting of $n$ pseudoinstructions $\mathtt{P_i}$ for $1 \leq i \leq n$, works as in the following diagram (we use $-^{*}$ to denote a combined map and flatten operation): 263 Execution proceeds by first fetching from pseudo-code memory using the program counter---treated as an index into the pseudoinstruction list. 264 No decoding is required. 265 We then proceed by case analysis over the pseudoinstruction, reusing the code for object code for all instructions present in the MCS-51's instruction set. 266 For all newly introduced pseudoinstructions, we simply translate labels to concrete addresses before behaving as a `real' instruction. 267 268 In contrast to other approaches, we do not perform any kind of symbolic execution, wherein data is the disjoint union of bytes and addresses, with addresses kept opaque and immutable. 269 Labels are immediately translated to concrete addresses, and registers and memory locations only ever contain bytes, never labels. 270 As a consequence, we allow the programmer to mangle, change and generally adjust addresses as they want, under the proviso that the translation process may not be able to preserve the semantics of programs that do this. 271 This will be further discussed in Subsection~\ref{subsect.total.correctness.for.well.behaved.assembly.programs}. 272 273 % ---------------------------------------------------------------------------- % 274 % SECTION % 275 % ---------------------------------------------------------------------------- % 276 277 \subsection{The assembler} 278 \label{subsect.the.assembler} 279 280 Conceptually the assembler works in two passes. 281 The first pass expands pseudoinstructions into a list of machine code instructions using the function \texttt{expand\_pseudo\_instruction}. 282 The second pass encodes as a list of bytes the expanded instruction list by mapping the function \texttt{assembly1} across the list, and then flattening. 242 283 \begin{displaymath} 243 284 [\mathtt{P_1}, \ldots \mathtt{P_n}] \xrightarrow{\left(\mathtt{P_i} \xrightarrow{\mbox{\fontsize{7}{9}\selectfont$\mathtt{expand\_pseudo\_instruction}$}} \mathtt{[I^1_i, \ldots I^q_i]} \xrightarrow{\mbox{\fontsize{7}{9}\selectfont$\mathtt{~~~~~~~~assembly1^{*}~~~~~~~~}$}} \mathtt{[0110]}\right)^{*}} \mathtt{[010101]} 244 285 \end{displaymath} 245 Here $\mathtt{I^j_i}$ for $1 \leq j \leq q$ are the $q$ machine code instructions obtained by expanding, with \texttt{expand\_pseudo\_instruction}, a single pseudoinstruction $P_i$. 246 Each machine code instruction $\mathtt{I^i_j}$ is then assembled, using the \texttt{assembly1} function, into a list of bytes. 247 This process is iterated for each pseudoinstruction, before the lists are flattened into a single bit list representation of the original assembly program. 248 249 % ---------------------------------------------------------------------------- % 250 % SECTION % 251 % ---------------------------------------------------------------------------- % 252 \subsection{Correctness of the assembler with respect to fetching} 253 \label{subsect.total.correctness.of.the.assembler} 254 255 Throughout the proof of correctness for assembly we assume that policies are bifurcated into two functions: \texttt{sigma} mapping \texttt{Word} to \texttt{Word} and \texttt{policy} mapping \texttt{Word} to \texttt{bool}. 256 For our purposes, \texttt{sigma} is most interesting, as it maps pseudo program counters to program counters; \texttt{policy} is merely a technical device used in jump expansion. 257 258 Using our policies, we now work toward proving the total correctness of the assembler. 259 By `total correctness', we mean that the assembly process never fails when provided with a good policy and that the process does not change the semantics of a certain class of well behaved assembly programs. 260 By `good policy' we mean that policies provided to us will keep a lockstep correspondence between addresses at the assembly level and addresses at the machine code level: 286 The most complex of the two passes is the first, which expands pseudoinstructions and must perform the task of branch displacement~\cite{holmes:branch:2006}. 287 288 The branch displacement problems consists of the problem of expanding pseudojumps into their concrete counterparts, preferably as efficiently as possible. 289 For instance, the MCS-51 features three unconditional jump instructions: \texttt{LJMP} and \texttt{SJMP}---`long jump' and `short jump' respectively---and an 11-bit oddity of the MCS-51, \texttt{AJMP}. 290 Each of these three instructions expects arguments in different sizes and behaves in markedly different ways: \texttt{SJMP} may only perform a `local jump'; \texttt{LJMP} may jump to any address in the MCS-51's memory space and \texttt{AJMP} may jump to any address in the current memory page. 291 Consequently, the size of each opcode is different, and to squeeze as much code as possible into the MCS-51's limited code memory, the smallest possible opcode that will suffice should be selected. 292 This is a well known problem to assembler writers who target RISC architectures. 293 294 Similarly, a conditional pseudojump must be translated potentially into a configuration of machine code instructions, depending on the distance to the jump's target. 295 For example, to translate a jump to a label, a single conditional jump pseudoinstruction may be translated into a block of three real instructions as follows (here, \texttt{JZ} is `jump if accumulator is zero'): 296 {\small{ 297 \begin{displaymath} 298 \begin{array}{r@{\quad}l@{\;\;}l@{\qquad}c@{\qquad}l@{\;\;}l} 299 & \mathtt{JZ} & \mathtt{label} & & \mathtt{JZ} & \text{size of \texttt{SJMP} instruction} \\ 300 & \ldots & & \text{translates to} & \mathtt{SJMP} & \text{size of \texttt{LJMP} instruction} \\ 301 \mathtt{label:} & \mathtt{MOV} & \mathtt{A}\;\;\mathtt{B} & \Longrightarrow & \mathtt{LJMP} & \text{address of \textit{label}} \\ 302 & & & & \ldots & \\ 303 & & & & \mathtt{MOV} & \mathtt{A}\;\;\mathtt{B} 304 \end{array} 305 \end{displaymath}}} 306 Here, if \texttt{JZ} fails, we fall through to the \texttt{SJMP} which jumps over the \texttt{LJMP}. 307 Naturally, if \texttt{label} is `close enough', a conditional jump pseudoinstruction is mapped directly to a conditional jump machine instruction; the above translation only applies if \texttt{label} is not sufficiently local. 308 309 In order to implement branch displacement it is impossible to really make the \texttt{expand\_pseudo\_instruction} function completely independent of the encoding function. 310 This is due to branch displacement requiring the distance in bytes of the target of the jump. 311 Moreover the standard solutions for solving the branch displacement problem find their solutions iteratively, by either starting from a solution where all jumps are long, and shrinking them when possible, or starting from a state where all jumps are short and increasing their length as needed. 312 Proving the correctness of such algorithms is already quite involved and the correctness of the assembler as a whole does not depend on the `quality' of the solution found to a branch displacement problem. 313 For this reason, we try to isolate the computation of a branch displacement problem from the proof of correctness for the assembler by parameterising our \texttt{expand\_pseudo\_instruction} by a `policy'. 314 315 \begin{lstlisting} 316 definition expand_pseudo_instruction: 317 $\forall$lookup_labels: Identifier $\rightarrow$ Word. 318 $\forall$sigma: Word $\rightarrow$ Word. 319 $\forall$policy: Word $\rightarrow$ bool. 320 $\forall$ppc: Word. 321 $\forall$lookup_datalabels: Identifier $\rightarrow$ Word. 322 $\forall$pi: pseudo_instruction. 323 list instruction := ... 324 \end{lstlisting} 325 Here, the functions \texttt{lookup\_labels} and \texttt{lookup\_datalabels} are the functions that map labels and datalabels to program counters respectively, both of them used in the semantics of assembly. 326 The input \texttt{pi} is the pseudoinstruction to be expanded and is found at address \texttt{ppc} in the assembly program. 327 The policy is defined using the two functions \texttt{sigma} and \texttt{policy}, of which \texttt{sigma} is the most interesting. 328 The function $\sigma$ maps pseudo program counters to program counters: the encoding of the expansion of the pseudoinstruction found at address \texttt{a} in the assembly code should be placed into code memory at address \texttt{sigma(a)}. 329 Of course this is possible only if the policy is correct, which means that the encoding of consecutive assembly instructions must be consecutive in code memory. 261 330 \begin{displaymath} 262 331 \texttt{sigma}(\texttt{ppc} + 1) = \texttt{pc} + \texttt{current\_instruction\_size} 263 332 \end{displaymath} 264 along with some other technical properties related to program counters falling within the bounds of our programs. 333 Here, \texttt{current\_instruction\_size} is the size in bytes of the encoding of the expanded pseudoinstruction found at \texttt{ppc}. 334 The \texttt{expand\_pseudo\_instruction} function uses the \texttt{sigma} map to determine the size of jump required when expanding pseudojumps, computing the jump size by examining the size of the differences between program counters. 335 In some circumstances the decision as to what machine code jump a particular pseudojump should be expanded into is ambiguous, and requires the use of the \texttt{policy} function to force a decision. 336 337 % ---------------------------------------------------------------------------- % 338 % SECTION % 339 % ---------------------------------------------------------------------------- % 340 \subsection{Correctness of the assembler with respect to fetching} 341 \label{subsect.total.correctness.of.the.assembler} 342 343 Throughout the proof of correctness for assembly we assume that policies are bifurcated into two functions: \texttt{sigma} mapping \texttt{Word} to \texttt{Word} and \texttt{policy} mapping \texttt{Word} to \texttt{bool}. 344 For our purposes, \texttt{sigma} is most interesting, as it maps pseudo program counters to program counters; \texttt{policy} is merely a technical device used in jump expansion. 345 346 Using our policies, we now work toward proving the total correctness of the assembler. 347 By `total correctness', we mean that the assembly process never fails when provided with a good policy and that the process does not change the semantics of a certain class of well behaved assembly programs. 348 By `good policy' we mean 265 349 We assume the correctness of the policies given to us using a function, \texttt{sigma\_policy\_specification} that we take in input, when needed. 266 350
Note: See TracChangeset
for help on using the changeset viewer.