# Changeset 2354 for Papers/cpp-asm-2012

Ignore:
Timestamp:
Sep 26, 2012, 10:39:22 PM (7 years ago)
Message:

the introduction.

File:
1 edited

### Legend:

Unmodified
 r2352 % ---------------------------------------------------------------------------- % \subsection{Machine code semantics} \subsection{Machine code and its semantics} \label{subsect.machine.code.semantics} Our emulator centres around a \texttt{Status} record, describing the microprocessor's state. This record contains fields corresponding to the microprocessor's program counter, registers, stack pointer, special function registers, and so on. At the machine code level, code memory is implemented as a compact trie of bytes addressed by the program counter. We parameterise \texttt{Status} records by this representation as a few technical tasks manipulating statuses are made simpler using this approach, as well as permitting a modicum of abstraction. We may execute a single step of a machine code program using the \texttt{execute\_1} function, which returns an updated \texttt{Status}: We implemented a realistic and efficient emulator for the MCS-51 microprocessor. An MCS-51 program is just a sequence of bytes stored in the read-only code memory of the processor, represented as a compact trie of bytes addressed by the program counter. The \texttt{Status} of the emulator is described as a record that contains the microprocessor's program counter, registers, stack pointer, clock, special function registers, code memory, and so on. The value of the code memory is a parameter of the record since it is not changed during execution. The \texttt{Status} records is itself an instance of a more general datatype \texttt{PreStatus} that abstracts over the implementation of code memory in order to reuse the same datatype for the semantics of the assembly language in the next section. The execution of a single instruction is performed by the \texttt{execute\_1} function, parametric over the content \texttt{cm} of the code memory: \begin{lstlisting} definition execute_1: $\forall$cm. Status cm $\rightarrow$ Status cm \end{lstlisting} Here \texttt{cm} is our trie of bytes representing code memory. %The function \texttt{execute} allows one to execute an arbitrary, but fixed (due to Matita's normalisation requirement) number of steps of a program. The function \texttt{execute\_1} closely matches the operation of the MCS-51 hardware, as described by a Siemen's manufacturer's data sheet~\cite{siemens:2011}. We first fetch, using the program counter, from code memory the first byte of the instruction to be executed, decoding the resulting opcode, fetching more bytes as is necessary. Decoded instructions are represented as an inductive type, where $\llbracket - \rrbracket$ denotes a fixed-length vector: The function \texttt{execute\_1} closely matches the fetch-decode-execute cycle of the MCS-51 hardware, as described by a Siemen's manufacturer's data sheet~\cite{siemens:2011}. Fetching and decoding are performed simultaneously: we first fetch, using the program counter, from code memory the first byte of the instruction to be executed, decoding the resulting opcode, fetching more bytes as is necessary to decode the arguments. Decoded instructions are represented by the \texttt{instruction} data type which extends a data type of \texttt{preinstruction}s that will be reused for the assembly language. \begin{lstlisting} inductive preinstruction (A: Type[0]): Type[0] := | ... \end{lstlisting} Here, we use dependent types to provide a precise typing for instructions, specifying in their type the permitted addressing modes that their arguments can belong to. The parameterised type $A$ of \texttt{preinstruction} represents the addressing mode for conditional jumps; in \texttt{instruction} we fix this type to be a relative offset in the constructor \texttt{RealInstruction}. The MCS-51 has many operand modes, but an unorthogonal instruction set: every opcode is only enable for a finite subset of the possible operand modes. Here we exploit dependent types and an implicit coercion to synthesize the type of arguments of opcodes from a vector of names of operand modes. For example, \texttt{ACC} has two operands, the first one constrained to be the \texttt{A} accumulator, and the second one to be a disjoint union of register, direct, indirect and data operand modes. The parameterised type $A$ of \texttt{preinstruction} represents the addressing mode allowed for conditional jumps; in the \texttt{RealInstruction} constructor we constraint it to be a relative offset. A different instantiation will be used in the next Section for assembly programs. Once decoded, execution proceeds by a case analysis on the decoded instruction, following the operation of the hardware. For example, the \texttt{DEC} instruction (decrement') is implemented as follows: For example, the \texttt{DEC} preinstruction (decrement') is executed as follows: \begin{lstlisting} | DEC addr $\Rightarrow$ \end{lstlisting} Here, \texttt{add\_ticks1} models the incrementing of the internal clock of the microprocessor. % ---------------------------------------------------------------------------- % % SECTION                                                                      % % ---------------------------------------------------------------------------- % \subsection{Assembly code semantics} Here, \texttt{add\_ticks1} models the incrementing of the internal clock of the microprocessor; it is a parameter of the semantics of \texttt{preinstruction}s that is fixed in the semantics of \texttt{instruction}s according to the manufacturer datasheet. % ---------------------------------------------------------------------------- % % SECTION                                                                      % % ---------------------------------------------------------------------------- % \subsection{Assembly code and its semantics} \label{subsect.assembly.code.semantics} An assembly program is a list of potentially labelled pseudoinstructions, bundled with a preamble consisting of a list of symbolic names for locations in data memory (i.e. global variables). Pseudoinstructions are implemented as an inductive type: All preinstructions are pseudoinstructions, but conditional jumps are now only allowed to use \texttt{Identifiers} (labels) as their target. \begin{lstlisting} inductive pseudo_instruction: Type[0] := \end{lstlisting} The pseudoinstructions \texttt{Jmp}, \texttt{Call} and \texttt{Mov} are generalisations of machine code unconditional jumps, calls and move instructions respectively, all of whom act on labels, as opposed to concrete memory addresses. Similarly, conditional jumps can now only jump to labels, as is implied by the first constructor of the type above. All other object code instructions are available at the assembly level, with the exception of those that appeared in the \texttt{instruction} type, such as \texttt{ACALL} and \texttt{LJMP}. These are jumps and calls to absolute addresses, which we do not wish to allow at the assembly level. The object code calls and jumps that act on concrete memory addresses are ruled out of assembly programs since they are not preinstructions (see previous Section). Execution of pseudoinstructions is an endofunction on \texttt{PseudoStatus}. Both \texttt{Status} and \texttt{PseudoStatus} are instances of a more general type parameterised over the representation of code memory. The more general type is crucial for sharing the majority of the semantics of the two languages. A \texttt{PseudoStatus} is an instance of \texttt{PreStatus} that differs from a \texttt{Status} only in the datatype used for code memory: a list of optionally labelled pseudoinstructions versus a trie of bytes. The \texttt{PreStatus} type is crucial for sharing the majority of the semantics of the two languages. Emulation for pseudoinstructions is handled by \texttt{execute\_1\_pseudo\_instruction}: program_counter s < $\mid$snd cm$\mid$ $\rightarrow$ PseudoStatus cm \end{lstlisting} The type of \texttt{execute\_1\_pseudo\_instruction} is interesting. Here our representation of code memory \texttt{cm} is no longer a trie of bytes, but a list of pseudoinstructions. We take in a function that maps program counters (at the assembly level) to pairs of natural numbers representing the number of clock ticks that the pseudoinstructions needs to execute, post expansion. This function is called a \emph{costing}, and the costing is induced by whatever strategy we use to expand pseudoinstructions. If we change how we expand conditional jumps to labels, for instance, then the costing needs to change, hence \texttt{execute\_1\_pseudo\_instruction}'s parametricity in the costing. This costing function expects a proof, as its second argument, stating that the program counter falls within the bounds of the pseudoprogram. A similar proof is required for the pseudo-program counter passed to the \texttt{execute\_1\_pseudo\_instruction} function. The costing returns \emph{pairs} of natural numbers because, in the case of expanding conditional jumps to labels, the expansion of the true branch' and false branch' may differ in execution time. The \texttt{add\_ticks1} function, which we have already seen used to increment the machine clock above, is determined for the assembly language from the costing function. The type of \texttt{execute\_1\_pseudo\_instruction} is more involved than that of \texttt{execute\_1}. The first difference is that execution is only defined when the program counter points to a valid instruction, i.e. it is smaller than the length $\mid$\texttt{snd cm}$\mid$ of the program. The second difference is the abstraction over the cost model, abbreviated here as \emph{costing}. The costing is a function that maps valid program counters to pairs of natural numbers representing the number of clock ticks used by the pseudoinstructions stored at those program counters. For conditional jumps the two numbers differ to represent different costs for the true branch' and the false branch'. In the next Section we will see how the optimizing assembler induces the only costing that is preserved by compilation. Obviously the induced costing is determined by the branch displacement policy that decides how to expand every pseudojump to a label into concrete instructions. Execution proceeds by first fetching from pseudo-code memory using the program counter---treated as an index into the pseudoinstruction list.