Changeset 380

Dec 6, 2010, 5:14:59 PM (9 years ago)

More added on subtyping stuff, etc.

1 edited


  • Deliverables/D4.1/Report/report.tex

    r377 r380  
    20   {keywords={ndefinition,nlemma,ntheorem,nremark,ninductive,nrecord,nqed,nlet,rec,match,with,Type},
     21  {keywords={ndefinition,ncoercion,nlemma,ntheorem,nremark,ninductive,nrecord,nqed,nlet,let,in,rec,match,return,with,Type},
    2122   morekeywords={[2]nwhd,nnormalize,nelim,ncases,ndestruct},
    2223   mathescape=true,
     109The Grant Agreement states the D4.1/D4.2 deliverables consist of:
     112\textbf{Executable Formal Semantics of Machine Code}: Formal definition of the semantics of the target language.
     113The semantics will be given in a functional (and hence executable) form, useful for testing, validation and project assessment.
     117\textbf{CIC encoding: Back-end}: Functional Specification in the internal language of the Proof Assistant (the Calculus of Inductive Construction) of the back end of the compiler.
     118This unit is meant to be composable with the front-end of deliverable D3.2, to obtain a full working compiler for Milestone M2.
     119A first validation of the design principles and implementation choices for the Untrusted Cost-annotating OCaml Compiler D2.2 is achieved and reported in the deliverable, possibly triggering updates of the Untrusted Cost-annotating OCaml Compiler sources.
     121We now report on our implementation of these deliverables.
    108123\subsection{A brief overview of the target processor}
    122137An open source emulator for the processor, MCU8051 IDE, is also available.
    124 % Processor architecture
    125139The 8051 has a relatively straightforward architecture, unencumbered by advanced features of modern processors, making it an ideal target for formalisation.
    126140A high-level overview of the processor's memory layout is provided in Figure~\ref{fig.memory.layout}.
    128 % Internal RAM
    129142Processor RAM is divided into numerous segments, with the most prominent division being between internal and (optional) external memory.
    130143Internal memory, commonly provided on the die itself with fast access, is further divided into 128 bytes of internal RAM and numerous Special Function Registers (SFRs) which control the operation of the processor.
    134147What remains of the IRAM may be treated as general purpose memory.
    136 % External RAM
    137149External RAM (XRAM), limited to 64 kilobytes, is optional, and may be provided on or off chip, depending on the manufacturer.
    138150XRAM is accessed using a dedicated instruction.
    140152However, depending on the particular manufacturer and processor model, a dedicated on-die read-only memory area for program code may also be supplied (the processor has a Harvard architecture, where program code and data are separated).
    142 % ALU
     154Memory may be addressed in numerous ways: immediate, direct, indirect, external direct and code indirect.
     155As the latter two addressing modes hint, there are some restrictions enforced by the 8051 and its derivatives on which addressing modes may be used with specific types of memory.
     156For instance, the 128 bytes of extra internal RAM that the 8052 features cannot be addressed using indirect addressing; rather, external (in)direct addressing must be used.
    143158The 8051 series possesses an eight bit Arithmetic and Logic Unit (ALU), with a wide variety of instructions for performing arithmetic and logical operations on bits and integers.
    144159Further, the processor possesses two eight bit general purpose accumulators, A and B.
    146 % Serial I/O and the input-output lines
    147161Communication with the device is facilitated by an onboard UART serial port, and associated serial controller, which can operate in numerous modes.
    148162Serial baud rate is determined by one of two sixteen bit timers included with the 8051, which can be set to multiple modes of operation.
    150164As an additional method of communication, the 8051 also provides a four byte bit-addressable input-output port.
    152 %
    153166The programmer may take advantage of the interrupt mechanism that the processor provides.
    154167This is especially useful when dealing with input or output involving the serial device, as an interrupt can be set when a whole character is sent or received via the serial port.
    170 We provide a high-level overview of the operation of the emulator.
    172 % Intel HEX, parsing
    173 Program code is loaded onto the 8051 in a standard format, the Intel Hex (IHX) format.
    174 All compilers producing machine code for the 8051, including the SDCC compiler which we use for debugging purposes, produce compiled programs in IHX format as standard.
    175 Accordingly, our O'Caml emulator can parse IHX files and populate the emulator's code memory with their contents.
    177 Once code memory is populated, and the rest of the emulator state has been initialised (i.e. setting the program counter to zero), the O'Caml emulator fetches the instruction pointed to by the program counter from code memory.
    179183\subsection{Lack of orthogonality in instruction set}
     186The instruction set of 8051 assembly is highly irregular.
     187For instance, consider the MOV instruction, which implements a data transfer between two memory locations, which takes eighteen possible combinations of addressing modes.
    269276This is due to record typechecking in Matita being slow for large records.
    271 \subsection{Addressing modes: use of dependent types}
    272 \label{subsect.addressing.modes.use.of.dependent.types}
    274278\subsection{Dealing with partiality}
    283287An example of a function which exhibits the latter behaviour is \texttt{set\_arg\_16} from \texttt{}, which fails with a pattern match exception if called on an input representing an eight bit argument.
    285 \textbf{Assert false} may be called if the emulator finds itself in an `impossible situation'.
     289\textbf{Assert false} may be called if the emulator finds itself in an `impossible situation', such as encountering an empty list where a list containing one element is expected.
    286290In this respect, we used \texttt{assert false} in a similar way to the previously described use of incomplete pattern analysis.
    288292\textbf{Assert false} may be called is some feature of the physical 8051 processor is not implemented in the O'Caml emulator and an executing program is attempting to use it.
     295The three manifestations of partiality above can be split into two types: partiality that manifests itself due to O'Caml's type system not being strong enough to rule the cause out, and partiality that signals a `real' crash in the processor due to the user attempting to use an unimplemented feature.
     296Items 1 and 2 belong to the former class, Item 3 to the latter.
     298Clearly Items 1 and 2 above must be addressed in the Matita formalisation.
     299Item 2 is solved through extensive use of dependent types.
     300Indexing into lists and vectors, for instance, is always `type safe', as we provide probing functions with strong dependent types.
     302Item 1 is perhaps the most problematic of the three problems, as we either have to provide an exhaustive case analysis, use pattern wildcards, or find a clever way of encoding the possible patterns that are expected as input in the type of a function.
     303We employ a technique that implements the latter idea.
     304This is discussed in Subsection~\ref{subsect.addressing.modes.use.of.dependent.types}.
     306To solve Item 3 above in the Matita formalisation of the emulator, we introduce an axiom \texttt{not\_implemented} of type \texttt{False}.
     307When the emulator attempts to use an unimplemented feature, we introduce a metavariable, corresponding to an open proof obligation.
     308These obligations are closed by performing a case analysis over \texttt{not\_implemented}.
     310\subsection{Addressing modes: use of dependent types}
     313We provide an inductive data type representing all possible addressing modes of 8051 assembly.
     314This is the type that functions will pattern match against.
     318ninductive addressing_mode: Type[0] ≝
     319  DIRECT: Byte $\rightarrow$ addressing_mode
     320| INDIRECT: Bit $\rightarrow$ addressing_mode
     324However, we also wish to express in the type of our functions the \emph{impossibility} of pattern matching against certain constructors.
     325In order to do this, we introduce an inductive type of addressing mode `tags'.
     326The constructors of \texttt{addressing\_mode\_tag} are in one-one correspondence with the constructors of \texttt{addressing\_mode}:
     329ninductive addressing_mode_tag : Type[0] ≝
     330  direct: addressing_mode_tag
     331| indirect: addressing_mode_tag
     335We then provide a function that checks whether an \texttt{addressing\_mode} is `morally' an \texttt{addressing\_mode\_tag}, as follows:
     338nlet rec is_a (d:addressing_mode_tag) (A:addressing_mode) on d ≝
     339  match d with
     340   [ direct $\Rightarrow$ match A with [ DIRECT _ $\Rightarrow$ true | _ $\Rightarrow$ false ]
     341   | indirect $\Rightarrow$ match A with [ INDIRECT _ $\Rightarrow$ true | _ $\Rightarrow$ false ]
     345We also extend this check to vectors of \texttt{addressing\_mode\_tag}'s in the obvious manner:
     348nlet rec is_in (n: Nat) (l: Vector addressing_mode_tag n) (A:addressing_mode) on l ≝
     349 match l return $\lambda$m.$\lambda$_ :Vector addressing_mode_tag m.Bool with
     350  [ VEmpty $\Rightarrow$ false
     351  | VCons m he (tl: Vector addressing_mode_tag m) $\Rightarrow$
     352     is_a he A $\vee$ is_in ? tl A ].
     355Here \texttt{VEmpty} and \texttt{VCons} are the two constructors of the \texttt{Vector} data type, and $\mathtt{\vee}$ is inclusive disjunction on Booleans.
     358nrecord subaddressing_mode (n: Nat) (l: Vector addressing_mode_tag (S n)) : Type[0] ≝
     360  subaddressing_modeel :> addressing_mode;
     361  subaddressing_modein: bool_to_Prop (is_in ? l subaddressing_modeel)
     365We can now provide an inductive type of preinstructions with precise typings:
     368ninductive preinstruction (A: Type[0]): Type[0] ≝
     369   ADD: $\llbracket$ acc_a $\rrbracket$ $\rightarrow$ $\llbracket$ register; direct; indirect; data $\rrbracket$ $\rightarrow$ preinstruction A
     370 | ADDC: $\llbracket$ acc_a $\rrbracket$ $\rightarrow$ $\llbracket$ register; direct; indirect; data $\rrbracket$ $\rightarrow$ preinstruction A
     374Here $\llbracket - \rrbracket$ is syntax denoting a vector.
     375We see that the constructor \texttt{ADD} expects two parameters, the first being the accumulator A (\texttt{acc\_a}), and the second being one of a register, direct, indirect or data addressing mode.
     377The final, missing component is a pair of type coercions from \texttt{addressing\_mode} to \texttt{subaddressing\_mode} and from \texttt{subaddressing\_mode} to \texttt{Type$\lbrack0\rbrack$}, respectively.
     378The previous machinery allows us to state in the type of a function what addressing modes that function expects.
     379For instance, consider \texttt{set\_arg\_16}, which expects only a \texttt{DPTR}:
     382ndefinition set_arg_16: Status $\rightarrow$ Word $\rightarrow$ $\llbracket$ dptr $\rrbracket$ $\rightarrow$ Status ≝
     383  $\lambda$s, v, a.
     384   match a return $\lambda$x. bool_to_Prop (is_in ? $\llbracket$ dptr $\rrbracket$ x) $\rightarrow$ ? with
     385     [ DPTR $\Rightarrow$ $\lambda$_: True.
     386       let 〈 bu, bl 〉 := split $\ldots$ eight eight v in
     387       let status := set_8051_sfr s SFR_DPH bu in
     388       let status := set_8051_sfr status SFR_DPL bl in
     389         status
     390     | _ $\Rightarrow$ $\lambda$_: False.
     391       match K in False with
     392       [
     393       ]
     394     ] (subaddressing_modein $\ldots$ a).
     397All other cases are discharged by the catch-all at the bottom of the match expression.
     398Attempting to match against another addressing mode not indicated in the type (for example, \texttt{REGISTER}) will produce a type-error.
Note: See TracChangeset for help on using the changeset viewer.