Changeset 380 for Deliverables

Timestamp:
Dec 6, 2010, 5:14:59 PM
Message:

More added on subtyping stuff, etc.

File:
1 edited

\usepackage[utf8x]{inputenc}
\usepackage{listings}
\usepackage{stmaryrd}
\usepackage{url}

\lstdefinelanguage{matita}
  {keywords={ndefinition,ncoercion,nlemma,ntheorem,nremark,ninductive,nrecord,nqed,nlet,let,in,rec,match,return,with,Type},
   morekeywords={[2]nwhd,nnormalize,nelim,ncases,ndestruct},
   mathescape=true,

\label{subsect.task}

The Grant Agreement states that the D4.1/D4.2 deliverables consist of:

\begin{quotation}
\textbf{Executable Formal Semantics of Machine Code}: Formal definition of the semantics of the target language.
The semantics will be given in a functional (and hence executable) form, useful for testing, validation and project assessment.
\end{quotation}

\begin{quotation}
\textbf{CIC encoding: Back-end}: Functional Specification in the internal language of the Proof Assistant (the Calculus of Inductive Construction) of the back end of the compiler.
This unit is meant to be composable with the front-end of deliverable D3.2, to obtain a full working compiler for Milestone M2.
A first validation of the design principles and implementation choices for the Untrusted Cost-annotating OCaml Compiler D2.2 is achieved and reported in the deliverable, possibly triggering updates of the Untrusted Cost-annotating OCaml Compiler sources.
\end{quotation}

We now report on our implementation of these deliverables.

\subsection{A brief overview of the target processor}
\label{subsect.brief.overview.target.processor}

An open source emulator for the processor, MCU8051 IDE, is also available.

% Processor architecture
The 8051 has a relatively straightforward architecture, unencumbered by advanced features of modern processors, making it an ideal target for formalisation.
A high-level overview of the processor's memory layout is provided in Figure~\ref{fig.memory.layout}.

% Internal RAM
Processor RAM is divided into numerous segments, with the most prominent division being between internal and (optional) external memory.
Internal memory, commonly provided on the die itself with fast access, is further divided into 128 bytes of internal RAM and numerous Special Function Registers (SFRs) which control the operation of the processor.

What remains of the IRAM may be treated as general purpose memory.

% External RAM
External RAM (XRAM), limited to 64 kilobytes, is optional, and may be provided on or off chip, depending on the manufacturer.
XRAM is accessed using a dedicated instruction.

However, depending on the particular manufacturer and processor model, a dedicated on-die read-only memory area for program code may also be supplied (the processor has a Harvard architecture, where program code and data are separated).

% ALU
Memory may be addressed in numerous ways: immediate, direct, indirect, external direct and code indirect.
As the latter two addressing modes hint, there are some restrictions enforced by the 8051 and its derivatives on which addressing modes may be used with specific types of memory.
For instance, the 128 bytes of extra internal RAM that the 8052 features cannot be addressed using direct addressing, as direct addresses in that range select the SFRs; rather, indirect addressing must be used.

The 8051 series possesses an eight bit Arithmetic and Logic Unit (ALU), with a wide variety of instructions for performing arithmetic and logical operations on bits and integers.
Further, the processor possesses two eight bit general purpose accumulators, A and B.

% Serial I/O and the input-output lines
Communication with the device is facilitated by an onboard UART serial port, and associated serial controller, which can operate in numerous modes.
Serial baud rate is determined by one of two sixteen bit timers included with the 8051, which can be set to multiple modes of operation.
As an additional method of communication, the 8051 also provides a four byte bit-addressable input-output port.

%
The programmer may take advantage of the interrupt mechanism that the processor provides.
This is especially useful when dealing with input or output involving the serial device, as an interrupt can be set when a whole character is sent or received via the serial port.

\label{subsect.anatomy.emulator}

We provide a high-level overview of the operation of the emulator.

% Intel HEX, parsing
Program code is loaded onto the 8051 in a standard format, the Intel Hex (IHX) format.
All compilers producing machine code for the 8051, including the SDCC compiler which we use for debugging purposes, produce compiled programs in IHX format as standard.
Accordingly, our O'Caml emulator can parse IHX files and populate the emulator's code memory with their contents.

Once code memory is populated, and the rest of the emulator state has been initialised (i.e. setting the program counter to zero), the O'Caml emulator fetches the instruction pointed to by the program counter from code memory.

\subsection{Lack of orthogonality in instruction set}
\label{subsect.lack.orthogonality.instruction.set}

The instruction set of 8051 assembly is highly irregular.
For instance, consider the MOV instruction, which implements a data transfer between two memory locations and admits eighteen possible combinations of addressing modes.

\subsection{Pseudo-instructions}

This is due to record typechecking in Matita being slow for large records.

\subsection{Dealing with partiality}
\label{subsect.dealing.with.partiality}

An example of a function which exhibits the latter behaviour is \texttt{set\_arg\_16} from \texttt{ASMInterpret.ml}, which fails with a pattern match exception if called on an input representing an eight bit argument.
\item
\textbf{Assert false} may be called if the emulator finds itself in an `impossible situation', such as encountering an empty list where a list containing one element is expected.
In this respect, we used \texttt{assert false} in a similar way to the previously described use of incomplete pattern analysis.
\item
\textbf{Assert false} may be called if some feature of the physical 8051 processor is not implemented in the O'Caml emulator and an executing program is attempting to use it.
\end{enumerate}

The three manifestations of partiality above can be split into two types: partiality that manifests itself due to O'Caml's type system not being strong enough to rule the cause out, and partiality that signals a `real' crash in the processor due to the user attempting to use an unimplemented feature.
Items 1 and 2 belong to the former class, Item 3 to the latter.

Clearly Items 1 and 2 above must be addressed in the Matita formalisation.
Item 2 is solved through extensive use of dependent types.
Indexing into lists and vectors, for instance, is always `type safe', as we provide probing functions with strong dependent types.

Item 1 is perhaps the most problematic of the three problems, as we either have to provide an exhaustive case analysis, use pattern wildcards, or find a clever way of encoding the possible patterns that are expected as input in the type of a function.
We employ a technique that implements the latter idea.
This is discussed in Subsection~\ref{subsect.addressing.modes.use.of.dependent.types}.

To solve Item 3 above in the Matita formalisation of the emulator, we introduce an axiom \texttt{not\_implemented} of type \texttt{False}.
When the emulator attempts to use an unimplemented feature, we introduce a metavariable, corresponding to an open proof obligation.
These obligations are closed by performing a case analysis over \texttt{not\_implemented}.
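
The three manifestations of partiality described above can be illustrated with a small, self-contained O'Caml sketch.
It is not taken from \texttt{ASMInterpret.ml}: the cut-down type \texttt{addressing\_mode} and the functions \texttt{set\_arg\_16\_sketch}, \texttt{only\_element} and \texttt{set\_timer\_mode} are hypothetical names introduced purely for illustration.

\begin{quote}
\begin{lstlisting}[language=Caml]
(* Hypothetical, cut-down addressing modes; not the emulator's real type. *)
type addressing_mode =
  | DIRECT of int
  | INDIRECT of bool
  | DPTR

(* Item 1: incomplete pattern analysis.  Only DPTR is handled, so a call
   on any other constructor raises Match_failure at run time.  The
   attribute silences the compiler's non-exhaustiveness warning. *)
let set_arg_16_sketch (v : int) (a : addressing_mode) : int * int =
  match[@warning "-8"] a with
  | DPTR -> ((v lsr 8) land 0xFF, v land 0xFF)

(* Item 2: an impossible situation, here a list that is expected to
   contain exactly one element. *)
let only_element (l : int list) : int =
  match l with
  | [x] -> x
  | _ -> assert false

(* Item 3: a feature of the physical processor that this sketch does
   not implement; any attempt to use it aborts. *)
let set_timer_mode (_mode : int) : unit = assert false
\end{lstlisting}
\end{quote}

In the first two functions the failure is an artefact of the argument type being too coarse, whereas the third signals a genuinely unimplemented feature.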

\subsection{Addressing modes: use of dependent types}
\label{subsect.addressing.modes.use.of.dependent.types}

We provide an inductive data type representing all possible addressing modes of 8051 assembly.
This is the type that functions will pattern match against.

\begin{quote}
\begin{lstlisting}
ninductive addressing_mode: Type[0] ≝
  DIRECT: Byte $\rightarrow$ addressing_mode
| INDIRECT: Bit $\rightarrow$ addressing_mode
...
\end{lstlisting}
\end{quote}

However, we also wish to express in the type of our functions the \emph{impossibility} of pattern matching against certain constructors.
In order to do this, we introduce an inductive type of `addressing mode tags'.
The constructors of \texttt{addressing\_mode\_tag} are in one-to-one correspondence with the constructors of \texttt{addressing\_mode}:

\begin{quote}
\begin{lstlisting}
ninductive addressing_mode_tag : Type[0] ≝
  direct: addressing_mode_tag
| indirect: addressing_mode_tag
...
\end{lstlisting}
\end{quote}

We then provide a function that checks whether an \texttt{addressing\_mode} is `morally' an \texttt{addressing\_mode\_tag}, as follows:

\begin{quote}
\begin{lstlisting}
nlet rec is_a (d:addressing_mode_tag) (A:addressing_mode) on d ≝
  match d with
   [ direct $\Rightarrow$ match A with [ DIRECT _ $\Rightarrow$ true | _ $\Rightarrow$ false ]
   | indirect $\Rightarrow$ match A with [ INDIRECT _ $\Rightarrow$ true | _ $\Rightarrow$ false ]
...
\end{lstlisting}
\end{quote}

We also extend this check to vectors of \texttt{addressing\_mode\_tag}s in the obvious manner:

\begin{quote}
\begin{lstlisting}
nlet rec is_in (n: Nat) (l: Vector addressing_mode_tag n) (A:addressing_mode) on l ≝
 match l return $\lambda$m.$\lambda$_ :Vector addressing_mode_tag m.Bool with
  [ VEmpty $\Rightarrow$ false
  | VCons m he (tl: Vector addressing_mode_tag m) $\Rightarrow$
     is_a he A $\vee$ is_in ? tl A ].
\end{lstlisting}
\end{quote}

Here \texttt{VEmpty} and \texttt{VCons} are the two constructors of the \texttt{Vector} data type, and $\mathtt{\vee}$ is inclusive disjunction on Booleans.

\begin{quote}
\begin{lstlisting}
nrecord subaddressing_mode (n: Nat) (l: Vector addressing_mode_tag (S n)) : Type[0] ≝
{
  subaddressing_modeel :> addressing_mode;
  subaddressing_modein: bool_to_Prop (is_in ? l subaddressing_modeel)
}.
\end{lstlisting}
\end{quote}

We can now provide an inductive type of preinstructions with precise typings:

\begin{quote}
\begin{lstlisting}
ninductive preinstruction (A: Type[0]): Type[0] ≝
   ADD: $\llbracket$ acc_a $\rrbracket$ $\rightarrow$ $\llbracket$ register; direct; indirect; data $\rrbracket$ $\rightarrow$ preinstruction A
 | ADDC: $\llbracket$ acc_a $\rrbracket$ $\rightarrow$ $\llbracket$ register; direct; indirect; data $\rrbracket$ $\rightarrow$ preinstruction A
...
\end{lstlisting}
\end{quote}

Here $\llbracket - \rrbracket$ is syntax denoting a vector.
We see that the constructor \texttt{ADD} expects two parameters, the first being the accumulator A (\texttt{acc\_a}), and the second being one of a register, direct, indirect or data addressing mode.

The final, missing component is a pair of type coercions from \texttt{addressing\_mode} to \texttt{subaddressing\_mode} and from \texttt{subaddressing\_mode} to \texttt{Type$\lbrack0\rbrack$}, respectively.
The previous machinery allows us to state in the type of a function what addressing modes that function expects.
For instance, consider \texttt{set\_arg\_16}, which expects only a \texttt{DPTR}:

\begin{quote}
\begin{lstlisting}
ndefinition set_arg_16: Status $\rightarrow$ Word $\rightarrow$ $\llbracket$ dptr $\rrbracket$ $\rightarrow$ Status ≝
  $\lambda$s, v, a.
   match a return $\lambda$x. bool_to_Prop (is_in ? $\llbracket$ dptr $\rrbracket$ x) $\rightarrow$ ? with
     [ DPTR $\Rightarrow$ $\lambda$_: True.
       let 〈 bu, bl 〉 := split $\ldots$ eight eight v in
       let status := set_8051_sfr s SFR_DPH bu in
       let status := set_8051_sfr status SFR_DPL bl in
         status
     | _ $\Rightarrow$ $\lambda$K: False.
       match K in False with
       [
       ]
     ] (subaddressing_modein $\ldots$ a).
\end{lstlisting}
\end{quote}

All other cases are discharged by the catch-all at the bottom of the match expression.
Attempting to match against another addressing mode not indicated in the type (for example, \texttt{REGISTER}) will produce a type error.

\end{document}