# Changeset 2066

Ignore:
Timestamp:
Jun 13, 2012, 6:46:06 PM (6 years ago)
Message:

Finished for the day.

File:
1 edited

### Legend:

Unmodified
 r2063 As a result our C compiler, to have any sort of hope of successfully compiling realistic programs for embedded devices, ought to produce tight' machine code. For example, the MCS-51 features three unconditional jump instructions: \texttt{LJMP} and \texttt{SJMP}---long jump' and short jump' respectively---and an 11-bit oddity of the MCS-51, \texttt{AJMP}. Each of these three instructions expects arguments in different sizes and behaves in markedly different ways: \texttt{SJMP} may only perform a local jump'; \texttt{LJMP} may jump to any address in the MCS-51's memory space and \texttt{AJMP} may jump to any address in the current memory page. Consequently, the size of each opcode is different, and to squeeze as much code as possible into the MCS-51's limited code memory, the smallest possible opcode that will suffice should be selected. This is a well known problem to assembler writers who target RISC architectures, often referred to as branch displacement'~\cite{holmes:branch:2006}. Branch displacement is not a simple problem to solve and requires the implementation of an optimising assembler. Labels, conditional jumps to labels, a program preamble containing global data and a \texttt{MOV} instruction for moving this global data into the MCS-51's one 16-bit register all feature in our assembly language. In each translation pass from intermediate language to intermediate language, we must prove that both the meaning and concrete complexity characteristics of the program are preserved. This also applies for the translation from assembly language to machine code. Naturally, this raises a question: how do we assign an \emph{accurate} cost to a pseudoinstruction? As mentioned, conditional jumps at the assembly level can jump to a label appearing anywhere in the program. However, at the machine code level, conditional jumps are limited to jumping locally', using a measly byte offset. To translate a jump to a label, a single conditional jump pseudoinstruction may be translated into a block of three real instructions as follows (here, \texttt{JZ} is jump if accumulator is zero'): {\small{ \begin{displaymath} \begin{array}{r@{\quad}l@{\;\;}l@{\qquad}c@{\qquad}l@{\;\;}l} & \mathtt{JZ}  & \mathtt{label}                      &                 & \mathtt{JZ}   & \text{size of \texttt{SJMP} instruction} \\ & \ldots       &                            & \text{translates to}   & \mathtt{SJMP} & \text{size of \texttt{LJMP} instruction} \\ \mathtt{label:} & \mathtt{MOV} & \mathtt{A}\;\;\mathtt{B}   & \Longrightarrow & \mathtt{LJMP} & \text{address of \textit{label}} \\ &              &                            &                 & \ldots        & \\ &              &                            &                 & \mathtt{MOV}  & \mathtt{A}\;\;\mathtt{B} \end{array} \end{displaymath}}} Here, if \texttt{JZ} fails, we fall through to the \texttt{SJMP} which jumps over the \texttt{LJMP}. Naturally, if \texttt{label} is close enough, a conditional jump pseudoinstruction is mapped directly to a conditional jump machine instruction; the above translation only applies if \texttt{label} is not sufficiently local. We address the calculation of whether a label is indeed close enough' for the simpler translation to be used below. Crucially, the above translation demonstrates the difficulty in predicting how many clock cycles a pseudoinstruction will take to execute. A conditional jump may be mapped to a single machine instruction or a block of three. Perhaps more insidious is the realisation that the number of cycles needed to execute the instructions in the two branches of a translated conditional jump may be different. Depending on the particular MCS-51 derivative at hand, an \texttt{SJMP} could in theory take a different number of clock cycles to execute than an \texttt{LJMP}. These issues must also be dealt with in order to prove that the translation pass preserves the concrete complexity of assembly code, and that the semantics of a program using the MCS-51's I/O facilities does not change. We address this problem by parameterising the semantics over a cost model. We prove the preservation of concrete complexity only for the program-dependent cost model induced by the optimisation. Yet one more question remains: how do we decide whether to expand a jump into an \texttt{SJMP} or an \texttt{LJMP}? \label{sect.the.proof} \subsection{The assembler and semantics of machine code} \label{subsect.the.assembler.and.semantics.of.machine.code} % ---------------------------------------------------------------------------- % % SECTION                                                                      % % ---------------------------------------------------------------------------- % \subsection{Machine code semantics} \label{subsect.machine.code.semantics} Our emulator centres around a \texttt{Status} record, describing the microprocessor's state. \end{lstlisting} The function \texttt{execute} allows one to execute an arbitrary, but fixed (due to Matita's normalisation requirement) number of steps of a program. Execution proceeds by a case analysis on the instruction fetched from code memory using the program counter of the input \texttt{Status} record. Naturally, assembly programs have analogues. The counterpart of the \texttt{Status} record is \texttt{PseudoStatus}. Instead of code memory being implemented as tries of bytes, code memory is here implemented as lists of pseudoinstructions, and program counters are merely indices into this list. Both \texttt{Status} and \texttt{PseudoStatus} are specialisations of the same \texttt{PreStatus} record, parametric in the representation of code memory. The function \texttt{execute\_1} closely matches the operation of the MCS-51 hardware, as described by a Siemen's manufacturer's data sheet. We first fetch, using the program counter, from code memory the first byte of the instruction to be executed, decoding the resulting opcode, fetching more bytes as is necessary. Decoded instructions are represented as an inductive type: \begin{lstlisting} inductive preinstruction (A: Type[0]): Type[0] := | ADD: [[acc_a]] → [[ registr; direct; indirect; data ]] → preinstruction A | INC: [[ acc_a; registr; direct ; indirect ; dptr ]] → preinstruction A | JB: [[bit_addr]] → A → preinstruction A | ... inductive instruction: Type[0] ≝ | LCALL: [[addr16]] → instruction | AJMP: [[addr11]] → instruction | RealInstruction: preinstruction [[ relative ]] → instruction. \end{lstlisting} Here, we use dependent types to provide a precise typing for instructions, specifying in their type the permitted addressing modes that their arguments can belong to. The parameterised type $A$ of \texttt{preinstruction} represents the addressing mode for conditional jumps; in \texttt{instruction} we fix this type to be a relative offset in the constructor \texttt{RealInstruction}. Once decoded, execution proceeds by a case analysis on the decoded instruction, following the operation of the hardware. For example: \begin{lstlisting} | DEC addr $\Rightarrow$ let s := add_ticks1 s in let $\langle$result, flags$\rangle$ := sub_8_with_carry (get_arg_8 $\ldots$ s true addr) (bitvector_of_nat 8 1) false in set_arg_8 $\ldots$ s addr result \end{lstlisting} Here, \texttt{add\_ticks1} models the incrementing of the internal clock of the microprocessor. % ---------------------------------------------------------------------------- % % SECTION                                                                      % % ---------------------------------------------------------------------------- % \subsection{Assembly code semantics} \label{subsect.assembly.code.semantics} An assembly program is a list of potentially labelled pseudoinstructions, bundled with a preamble consisting of a list of symbolic names for locations in data memory. Pseudoinstructions are implemented as an inductive type: \begin{lstlisting} inductive pseudo_instruction: Type[0] ≝ | Instruction: preinstruction Identifier → pseudo_instruction ... | Jmp: Identifier → pseudo_instruction | Call: Identifier → pseudo_instruction | Mov: [[dptr]] → Identifier → pseudo_instruction. \end{lstlisting} The pseudoinstructions \texttt{Jmp}, \texttt{Call} and \texttt{Mov} are generalisations of machine code unconditional jumps, calls and move instructions respectively, all of whom act on labels, as opposed to concrete memory addresses. Similarly, conditional jumps can now only jump to labels, as is implied by the first constructor of the type above. All other object code instructions are available at the assembly level, with the exception of those that appeared in the \texttt{instruction} type, such as \texttt{ACALL} and \texttt{LJMP}. These are jumps and calls to absolute addresses, which we do not wish to allow at the assembly level. Execution of pseudoinstructions is a function from \texttt{PseudoStatus} to \texttt{PseudoStatus}. Both \texttt{Status} and \texttt{PseudoStatus} are instances of a more general type parameterised over the representation of code memory. This allows us to share some code that is common to both records (for instance, setter' and getter' functions). Our analogue of \texttt{execute\_1} is \texttt{execute\_1\_pseudo\_instruction}: Emulation for pseudoinstructions is handled by \texttt{execute\_1\_pseudo\_instruction}: \begin{lstlisting} definition execute_1_pseudo_instruction: (Word $\rightarrow$ nat $\times$ nat) $\rightarrow$ $\forall$cm. PseudoStatus cm $\rightarrow$ PseudoStatus cm := $\ldots$ \end{lstlisting} Notice, here, that the emulation function for assembly programs takes an additional argument. Notice, here, that the emulation function for assembly programs takes an additional argument over \texttt{execute\_1}. This is a function that maps program counters (at the assembly level) to pairs of natural numbers representing the number of clock ticks that the pseudoinstruction needs to execute, post expansion. We call this function a \emph{costing}, and note that the costing is induced by the particular strategy we use to expand pseudoinstructions. During the proof of correctness of the assembler we relate the clocks of \texttt{Status}es and \texttt{PseudoStatus}es for the policy induced by the cost model and optimisations. The assembler, mapping programs consisting of lists of pseudoinstructions to lists of bytes, operates in a mostly straightforward manner. To a degree of approximation the assembler, on an assembly program consisting of $n$ pseudoinstructions $\mathtt{P_i}$ for $1 \leq i \leq n$, works as in the following diagram (we use $-^{*}$ to denote a combined map and flatten operation): Execution proceeds by first fetching from pseudo-code memory using the program counter---treated as an index into the pseudoinstruction list. No decoding is required. We then proceed by case analysis over the pseudoinstruction, reusing the code for object code for all instructions present in the MCS-51's instruction set. For all newly introduced pseudoinstructions, we simply translate labels to concrete addresses before behaving as a real' instruction. In contrast to other approaches, we do not perform any kind of symbolic execution, wherein data is the disjoint union of bytes and addresses, with addresses kept opaque and immutable. Labels are immediately translated to concrete addresses, and registers and memory locations only ever contain bytes, never labels. As a consequence, we allow the programmer to mangle, change and generally adjust addresses as they want, under the proviso that the translation process may not be able to preserve the semantics of programs that do this. This will be further discussed in Subsection~\ref{subsect.total.correctness.for.well.behaved.assembly.programs}. % ---------------------------------------------------------------------------- % % SECTION                                                                      % % ---------------------------------------------------------------------------- % \subsection{The assembler} \label{subsect.the.assembler} Conceptually the assembler works in two passes. The first pass expands pseudoinstructions into a list of machine code instructions using the function \texttt{expand\_pseudo\_instruction}. The second pass encodes as a list of bytes the expanded instruction list by mapping the function \texttt{assembly1} across the list, and then flattening. \begin{displaymath} [\mathtt{P_1}, \ldots \mathtt{P_n}] \xrightarrow{\left(\mathtt{P_i} \xrightarrow{\mbox{\fontsize{7}{9}\selectfont$\mathtt{expand\_pseudo\_instruction}$}} \mathtt{[I^1_i, \ldots I^q_i]} \xrightarrow{\mbox{\fontsize{7}{9}\selectfont$\mathtt{~~~~~~~~assembly1^{*}~~~~~~~~}$}} \mathtt{[0110]}\right)^{*}} \mathtt{[010101]} \end{displaymath} Here $\mathtt{I^j_i}$ for $1 \leq j \leq q$ are the $q$ machine code instructions obtained by expanding, with \texttt{expand\_pseudo\_instruction}, a single pseudoinstruction $P_i$. Each machine code instruction $\mathtt{I^i_j}$ is then assembled, using the \texttt{assembly1} function, into a list of bytes. This process is iterated for each pseudoinstruction, before the lists are flattened into a single bit list representation of the original assembly program. % ---------------------------------------------------------------------------- % % SECTION                                                                      % % ---------------------------------------------------------------------------- % \subsection{Correctness of the assembler with respect to fetching} \label{subsect.total.correctness.of.the.assembler} Throughout the proof of correctness for assembly we assume that policies are bifurcated into two functions: \texttt{sigma} mapping \texttt{Word} to \texttt{Word} and \texttt{policy} mapping \texttt{Word} to \texttt{bool}. For our purposes, \texttt{sigma} is most interesting, as it maps pseudo program counters to program counters; \texttt{policy} is merely a technical device used in jump expansion. Using our policies, we now work toward proving the total correctness of the assembler. By total correctness', we mean that the assembly process never fails when provided with a good policy and that the process does not change the semantics of a certain class of well behaved assembly programs. By good policy' we mean that policies provided to us will keep a lockstep correspondence between addresses at the assembly level and addresses at the machine code level: The most complex of the two passes is the first, which expands pseudoinstructions and must perform the task of branch displacement~\cite{holmes:branch:2006}. The branch displacement problems consists of the problem of expanding pseudojumps into their concrete counterparts, preferably as efficiently as possible. For instance, the MCS-51 features three unconditional jump instructions: \texttt{LJMP} and \texttt{SJMP}---long jump' and short jump' respectively---and an 11-bit oddity of the MCS-51, \texttt{AJMP}. Each of these three instructions expects arguments in different sizes and behaves in markedly different ways: \texttt{SJMP} may only perform a local jump'; \texttt{LJMP} may jump to any address in the MCS-51's memory space and \texttt{AJMP} may jump to any address in the current memory page. Consequently, the size of each opcode is different, and to squeeze as much code as possible into the MCS-51's limited code memory, the smallest possible opcode that will suffice should be selected. This is a well known problem to assembler writers who target RISC architectures. Similarly, a conditional pseudojump must be translated potentially into a configuration of machine code instructions, depending on the distance to the jump's target. For example, to translate a jump to a label, a single conditional jump pseudoinstruction may be translated into a block of three real instructions as follows (here, \texttt{JZ} is jump if accumulator is zero'): {\small{ \begin{displaymath} \begin{array}{r@{\quad}l@{\;\;}l@{\qquad}c@{\qquad}l@{\;\;}l} & \mathtt{JZ}  & \mathtt{label}                      &                 & \mathtt{JZ}   & \text{size of \texttt{SJMP} instruction} \\ & \ldots       &                            & \text{translates to}   & \mathtt{SJMP} & \text{size of \texttt{LJMP} instruction} \\ \mathtt{label:} & \mathtt{MOV} & \mathtt{A}\;\;\mathtt{B}   & \Longrightarrow & \mathtt{LJMP} & \text{address of \textit{label}} \\ &              &                            &                 & \ldots        & \\ &              &                            &                 & \mathtt{MOV}  & \mathtt{A}\;\;\mathtt{B} \end{array} \end{displaymath}}} Here, if \texttt{JZ} fails, we fall through to the \texttt{SJMP} which jumps over the \texttt{LJMP}. Naturally, if \texttt{label} is close enough', a conditional jump pseudoinstruction is mapped directly to a conditional jump machine instruction; the above translation only applies if \texttt{label} is not sufficiently local. In order to implement branch displacement it is impossible to really make the \texttt{expand\_pseudo\_instruction} function completely independent of the encoding function. This is due to branch displacement requiring the distance in bytes of the target of the jump. Moreover the standard solutions for solving the branch displacement problem find their solutions iteratively, by either starting from a solution where all jumps are long, and shrinking them when possible, or starting from a state where all jumps are short and increasing their length as needed. Proving the correctness of such algorithms is already quite involved and the correctness of the assembler as a whole does not depend on the quality' of the solution found to a branch displacement problem. For this reason, we try to isolate the computation of a branch displacement problem from the proof of correctness for the assembler by parameterising our \texttt{expand\_pseudo\_instruction} by a policy'. \begin{lstlisting} definition expand_pseudo_instruction: $\forall$lookup_labels: Identifier $\rightarrow$ Word. $\forall$sigma: Word $\rightarrow$ Word. $\forall$policy: Word $\rightarrow$ bool. $\forall$ppc: Word. $\forall$lookup_datalabels: Identifier $\rightarrow$ Word. $\forall$pi: pseudo_instruction. list instruction := ... \end{lstlisting} Here, the functions \texttt{lookup\_labels} and \texttt{lookup\_datalabels} are the functions that map labels and datalabels to program counters respectively, both of them used in the semantics of assembly. The input \texttt{pi} is the pseudoinstruction to be expanded and is found at address \texttt{ppc} in the assembly program. The policy is defined using the two functions \texttt{sigma} and \texttt{policy}, of which \texttt{sigma} is the most interesting. The function $\sigma$ maps pseudo program counters to program counters: the encoding of the expansion of the pseudoinstruction found at address \texttt{a} in the assembly code should be placed into code memory at address \texttt{sigma(a)}. Of course this is possible only if the policy is correct, which means that the encoding of consecutive assembly instructions must be consecutive in code memory. \begin{displaymath} \texttt{sigma}(\texttt{ppc} + 1) = \texttt{pc} + \texttt{current\_instruction\_size} \end{displaymath} along with some other technical properties related to program counters falling within the bounds of our programs. Here, \texttt{current\_instruction\_size} is the size in bytes of the encoding of the expanded pseudoinstruction found at \texttt{ppc}. The \texttt{expand\_pseudo\_instruction} function uses the \texttt{sigma} map to determine the size of jump required when expanding pseudojumps, computing the jump size by examining the size of the differences between program counters. In some circumstances the decision as to what machine code jump a particular pseudojump should be expanded into is ambiguous, and requires the use of the \texttt{policy} function to force a decision. % ---------------------------------------------------------------------------- % % SECTION                                                                      % % ---------------------------------------------------------------------------- % \subsection{Correctness of the assembler with respect to fetching} \label{subsect.total.correctness.of.the.assembler} Throughout the proof of correctness for assembly we assume that policies are bifurcated into two functions: \texttt{sigma} mapping \texttt{Word} to \texttt{Word} and \texttt{policy} mapping \texttt{Word} to \texttt{bool}. For our purposes, \texttt{sigma} is most interesting, as it maps pseudo program counters to program counters; \texttt{policy} is merely a technical device used in jump expansion. Using our policies, we now work toward proving the total correctness of the assembler. By total correctness', we mean that the assembly process never fails when provided with a good policy and that the process does not change the semantics of a certain class of well behaved assembly programs. By good policy' we mean We assume the correctness of the policies given to us using a function, \texttt{sigma\_policy\_specification} that we take in input, when needed.