Changeset 2354 for Papers

Sep 26, 2012, 10:39:22 PM (7 years ago)

3.1 and 3.2 rewritten because now the reader has more information coming from
the introduction.

1 edited


  • Papers/cpp-asm-2012/cpp-2012-asm.tex

    r2352 r2354  
    207207% ---------------------------------------------------------------------------- %
    209 \subsection{Machine code semantics}
     209\subsection{Machine code and its semantics}
    212 Our emulator centres around a \texttt{Status} record, describing the microprocessor's state.
    213 This record contains fields corresponding to the microprocessor's program counter, registers, stack pointer, special function registers, and so on.
    214 At the machine code level, code memory is implemented as a compact trie of bytes addressed by the program counter.
    215 We parameterise \texttt{Status} records by this representation as a few technical tasks manipulating statuses are made simpler using this approach, as well as permitting a modicum of abstraction.
    217 We may execute a single step of a machine code program using the \texttt{execute\_1} function, which returns an updated \texttt{Status}:
     212We implemented a realistic and efficient emulator for the MCS-51 microprocessor.
     213An MCS-51 program is just a sequence of bytes stored in the read-only code
     214memory of the processor, represented as a compact trie of bytes addressed
     215by the program counter.
     216The \texttt{Status} of the emulator is described as
     217a record that contains the microprocessor's program counter, registers, stack
     218pointer, clock, special function registers, code memory, and so on.
     219The value of the code memory is a parameter of the record since it is not
     220changed during execution.
     222The \texttt{Status} records is itself an instance of a more general
     223datatype \texttt{PreStatus} that abstracts over the implementation of code
     224memory in order to reuse the same datatype for the semantics of the assembly
     225language in the next section.
     227The execution of a single instruction is performed by the \texttt{execute\_1}
     228function, parametric over the content \texttt{cm} of the code memory:
    219230definition execute_1: $\forall$cm. Status cm $\rightarrow$ Status cm
    221 Here \texttt{cm} is our trie of bytes representing code memory.
    222 %The function \texttt{execute} allows one to execute an arbitrary, but fixed (due to Matita's normalisation requirement) number of steps of a program.
    223 The function \texttt{execute\_1} closely matches the operation of the MCS-51 hardware, as described by a Siemen's manufacturer's data sheet~\cite{siemens:2011}.
    224 We first fetch, using the program counter, from code memory the first byte of the instruction to be executed, decoding the resulting opcode, fetching more bytes as is necessary.
    225 Decoded instructions are represented as an inductive type, where $\llbracket - \rrbracket$ denotes a fixed-length vector:
     233The function \texttt{execute\_1} closely matches the fetch-decode-execute
     234cycle of the MCS-51 hardware, as described by a Siemen's manufacturer's data sheet~\cite{siemens:2011}.
     235Fetching and decoding are performed simultaneously:
     236we first fetch, using the program counter, from code memory the first byte of the instruction to be executed, decoding the resulting opcode, fetching more bytes as is necessary to decode the arguments.
     237Decoded instructions are represented by the \texttt{instruction} data type
     238which extends a data type of \texttt{preinstruction}s that will be reused
     239for the assembly language.
    227241inductive preinstruction (A: Type[0]): Type[0] :=
    237251 | ...
    239 Here, we use dependent types to provide a precise typing for instructions, specifying in their type the permitted addressing modes that their arguments can belong to.
    240 The parameterised type $A$ of \texttt{preinstruction} represents the addressing mode for conditional jumps; in \texttt{instruction} we fix this type to be a relative offset in the constructor \texttt{RealInstruction}.
     253The MCS-51 has many operand modes, but an unorthogonal instruction set: every
     254opcode is only enable for a finite subset of the possible operand modes.
     255Here we exploit dependent types and an implicit coercion to synthesize
     256the type of arguments of opcodes from a vector of names of operand modes.
     257For example, \texttt{ACC} has two operands, the first one constrained to be
     258the \texttt{A} accumulator, and the second one to be a disjoint union of
     259register, direct, indirect and data operand modes.
     261The parameterised type $A$ of \texttt{preinstruction} represents the addressing mode allowed for conditional jumps; in the \texttt{RealInstruction} constructor
     262we constraint it to be a relative offset. A different instantiation will be
     263used in the next Section for assembly programs.
    242265Once decoded, execution proceeds by a case analysis on the decoded instruction, following the operation of the hardware.
    243 For example, the \texttt{DEC} instruction (`decrement') is implemented as follows:
     266For example, the \texttt{DEC} preinstruction (`decrement') is executed as follows:
    245268 | DEC addr $\Rightarrow$
    252 Here, \texttt{add\_ticks1} models the incrementing of the internal clock of the microprocessor.
    254 % ---------------------------------------------------------------------------- %
    255 % SECTION                                                                      %
    256 % ---------------------------------------------------------------------------- %
    258 \subsection{Assembly code semantics}
     275Here, \texttt{add\_ticks1} models the incrementing of the internal clock of the microprocessor; it is a parameter of the semantics of \texttt{preinstruction}s
     276that is fixed in the semantics of \texttt{instruction}s according to the
     277manufacturer datasheet.
     279% ---------------------------------------------------------------------------- %
     280% SECTION                                                                      %
     281% ---------------------------------------------------------------------------- %
     283\subsection{Assembly code and its semantics}
    261286An assembly program is a list of potentially labelled pseudoinstructions, bundled with a preamble consisting of a list of symbolic names for locations in data memory (i.e. global variables).
    262 Pseudoinstructions are implemented as an inductive type:
     287All preinstructions are pseudoinstructions, but conditional jumps are now
     288only allowed to use \texttt{Identifiers} (labels) as their target.
    264290inductive pseudo_instruction: Type[0] :=
    271297The pseudoinstructions \texttt{Jmp}, \texttt{Call} and \texttt{Mov} are generalisations of machine code unconditional jumps, calls and move instructions respectively, all of whom act on labels, as opposed to concrete memory addresses.
    272 Similarly, conditional jumps can now only jump to labels, as is implied by the first constructor of the type above.
    273 All other object code instructions are available at the assembly level, with the exception of those that appeared in the \texttt{instruction} type, such as \texttt{ACALL} and \texttt{LJMP}.
    274 These are jumps and calls to absolute addresses, which we do not wish to allow at the assembly level.
     298The object code calls and jumps that act on concrete memory addresses are ruled
     299out of assembly programs since they are not preinstructions (see previous
    276302Execution of pseudoinstructions is an endofunction on \texttt{PseudoStatus}.
    277 Both \texttt{Status} and \texttt{PseudoStatus} are instances of a more general type parameterised over the representation of code memory.
    278 The more general type is crucial for sharing the majority of the semantics of the two languages.
     303A \texttt{PseudoStatus} is an instance of \texttt{PreStatus} that differs
     304from a \texttt{Status} only in the datatype used for code memory: a list
     305of optionally labelled pseudoinstructions versus a trie of bytes.
     306The \texttt{PreStatus} type is crucial for sharing the majority of the
     307semantics of the two languages.
    280309Emulation for pseudoinstructions is handled by \texttt{execute\_1\_pseudo\_instruction}:
    284313  program_counter s < $\mid$snd cm$\mid$ $\rightarrow$ PseudoStatus cm
    286 The type of \texttt{execute\_1\_pseudo\_instruction} is interesting.
    287 Here our representation of code memory \texttt{cm} is no longer a trie of bytes, but a list of pseudoinstructions.
    288 We take in a function that maps program counters (at the assembly level) to pairs of natural numbers representing the number of clock ticks that the pseudoinstructions needs to execute, post expansion.
    289 This function is called a \emph{costing}, and the costing is induced by whatever strategy we use to expand pseudoinstructions.
    290 If we change how we expand conditional jumps to labels, for instance, then the costing needs to change, hence \texttt{execute\_1\_pseudo\_instruction}'s parametricity in the costing.
    291 This costing function expects a proof, as its second argument, stating that the program counter falls within the bounds of the pseudoprogram.
    292 A similar proof is required for the pseudo-program counter passed to the \texttt{execute\_1\_pseudo\_instruction} function.
    294 The costing returns \emph{pairs} of natural numbers because, in the case of expanding conditional jumps to labels, the expansion of the `true branch' and `false branch' may differ in execution time.
    295 The \texttt{add\_ticks1} function, which we have already seen used to increment the machine clock above, is determined for the assembly language from the costing function.
     315The type of \texttt{execute\_1\_pseudo\_instruction} is more involved than
     316that of \texttt{execute\_1}. The first difference is that execution is only
     317defined when the program counter points to a valid instruction, i.e.
     318it is smaller than the length $\mid$\texttt{snd cm}$\mid$ of the program.
     319The second difference is the abstraction over the cost model, abbreviated
     320here as \emph{costing}.
     321The costing is a function that maps valid program counters to pairs of natural numbers representing the number of clock ticks used by the pseudoinstructions stored at those program counters. For conditional jumps the two numbers differ
     322to represent different costs for the `true branch' and the `false branch'.
     323In the next Section we will see how the optimizing
     324assembler induces the only costing that is preserved by compilation.
     325Obviously the induced costing is determined by the branch displacement policy
     326that decides how to expand every pseudojump to a label into concrete
    297329Execution proceeds by first fetching from pseudo-code memory using the program counter---treated as an index into the pseudoinstruction list.
Note: See TracChangeset for help on using the changeset viewer.