Dec 7, 2010, 4:12:16 PM (9 years ago)

First draft of report finished.

1 edited


  • Deliverables/D4.1/Report/report.tex

    r382 r383  
    1919\vspace*{1cm}Project FP7-ICT-2009-C-243881 \cerco{}}
    21 \lstdefinelanguage{matita}
    2222  {keywords={ndefinition,ncoercion,nlemma,ntheorem,nremark,ninductive,nrecord,nqed,nlet,let,in,rec,match,return,with,Type},
    2323   morekeywords={[2]nwhd,nnormalize,nelim,ncases,ndestruct},
     24   morekeywords={[3]type,of},
    2425   mathescape=true,
    2526  }
    27 \lstset{language=matita,basicstyle=\small\tt,columns=flexible,breaklines=false,
    2829        keywordstyle=\color{red}\bfseries,
    2930        keywordstyle=[2]\color{blue},
     31        keywordstyle=[3]\color{blue}\bfseries,
    3032        commentstyle=\color{green},
    3133        stringstyle=\color{blue},
    96 We discuss the design and implementation of an O'Caml emulator for the Intel 8051/8052 eight bit processor, and its subsequent formalisation in the dependently typed proof assistant Matita.
     98We discuss the implementation of a prototype O'Caml emulator for the Intel 8051/8052 eight bit processor, and its subsequent formalisation in the dependently typed proof assistant Matita.
     99In particular, we focus on the decisions made during the design of both emulators, and how the design of the O'Caml emulator had to be modified in order to fit into the more stringent type system of Matita.
     101We also briefly summarise the Intel 8051/8052 processor architecture, our target processor model for the \textsf{CerCo} project.
    104 \section{Introduction}
    105 \label{sect.introduction}
    107 \subsection{Task}
    108 \label{subsect.task}
    110111The Grant Agreement states the D4.1/D4.2 deliverables consist of the following tasks:
    122123We now report on our implementation of these deliverables.
    124 \subsection{A brief overview of the target processor}
    125 \label{subsect.brief.overview.target.processor}
     125\section{A brief overview of the target processor}
    127128The MCS-51 is an eight bit microprocessor introduced by Intel in the late 1970s.
    193 \subsection{Anatomy of the emulator}
    194 \label{subsect.anatomy.emulator}
     194We discuss decisions made during the design of the prototype O'Caml emulator.
    196196\subsection{Lack of orthogonality in instruction set}
    200200For instance, consider the MOV instruction, which implements a data transfer between two memory locations, which takes eighteen possible combinations of addressing modes.
    202 \subsection{Pseudo-instructions}
    203 \label{subsect.pseudo-instructions}
     202We handle this problem by introducing `unions' of types, using O'Caml's polymorphic variants feature:
     205type ('a, 'b) union2 = [ `U1 of 'a | `U2 of 'b ]
     208(We also introduce \texttt{union3} and \texttt{union6}, which suffice for our purposes.)
     210Using these union types, we can rationalise the inductive type encoding the assembly instruction set.
     211For instance:
     214type 'addr preinstruction =
     216  | `XRL of (acc * [ data | reg | direct | indirect ],
     217             direct * [ acc | data ]) union2
     221That is, the \texttt{XRL} instruction\footnote{Exclusive disjunction.} take either the accumulator A as its first argument, followed by data with one of data, register, direct or indirect addressing modes, or takes data with a direct addressing mode as its first argument, with either the accumulator A or data with the data addressing mode as its second argument.
     223Further, all functions that must pattern match against the \texttt{(pre)instruction} inductive type are also simplified using this technique.
     224Using O'Caml's ability to perform `deep pattern' matches, we may pattern match against \texttt{`XRL(`U1(arg1, arg2))} and have the guarantee that \texttt{arg1} takes the form \texttt{`ACC\_A}.
     226\subsection{Pseudo-instructions and labels}
     229Per the description of Deliverable D4.1 in the Grant Agreement above, the 8051 emulator must eventually interface with the C compiler frontend of Deliverable D3.2, produced in Paris.
     230After consultation, it was found that the design of the compiler frontend could be simplified considerably with the introduction of \emph{pseudoinstructions} and labels.
     232We introduce three new pseudoinstructions---\texttt{Jump}, \texttt{Call}, and \texttt{Mov}---corresponding to unconditional jumps, procedure calls and data transfers respectively.
     233We also `promote' all unlabeled conditional jumps in 8051 assembly to labeled pseudojumps; one can now jump to a label conditionally, as opposed to jumping to a fixed relative offset.
     234Further, we introduce labels for jumping to, and cost annotations, used by the Paris team.
     236The three new pseudoinstructions, along with the promoted conditional jumps, allow the Paris team to abstract away from the differences between different types of unconditional jump (the 8051 has three different sorts, depending on the length of the jump), as well as abstract away the differences between memory transfers and calls.
     237However, the emulator must perform an expansion stage, during which pseudoinstructions are translated to `real' 8051 assembly instructions.
     239The introduction of labeled conditional jumps slightly complicates our type of abstract syntax for 8051 assembly.
     240We define an inductive type representing conditional jumps in 8051 assembly code, parameterised by a type representing relative offsets:s
     243type 'addr jump =
     244  [ `JC of 'addr
     245  | `JNC of 'addr
     249An inductive type of preinstructions is defined, which is also parameterised by a type representing relative offsets in assembly code, and incorporates the inductive type of conditional jumps:
     252type 'addr preinstruction =
     253  [ `ADD of acc * [ reg | direct | indirect | data ]
     255  | 'addr jump
     259A type representing instructions is defined, choosing a concrete type for relative offsets:
     262type instruction = rel preinstruction
     265Here, \texttt{rel} is a type which `wraps up' a byte.
     266Finally, this type of instructions is incorporated into a type of labelled instructions:
     269type labelled_instruction =
     270  [ instruction
     271  | `Cost of string
     272  | `Label of string
     273  | `Jmp of string
     274  | `Call of string
     275  | `Mov of dptr * string
     276  | `WithLabel of [`Label of string] jump
     280Throughout, we make heavy use of polymorphic variants to deal with issues relating to subtyping.
     282As mentioned, the emulator must now handle an additional expansion stage, removing pseudoinstructions in favour of real, 8051 assembly instructions.
     283This is relatively straightforward, and is done in two stages.
     285The first stage consists of iterating over an assembly program, building a dictionary of all labels and their position in the program.
     286The second stage consists of iterating over the same program and replacing all pseudojumps (both conditional and unconditional) with an 8051 jump to the requisite computed offset.
     287One subtletly persists, however.
     289The 8051 has three different types of unconditional jump, depending on the length of the jump to be used: \texttt{AJMP}, \texttt{JMP} and \texttt{LJMP}.
     290The instructions \texttt{AJMP} and \texttt{JMP} are short jumps, whereas \texttt{LJMP} is a long jump, capable of reaching anywhere in the program.
     291At the moment, the second pass of the expansion stage replaces all unconditional pseudojumps with a \texttt{LJMP} for simplicity.
     292We do, however, plan to improve this process for efficiency reasons, expanding to shorter jumps where feasible.
    205297In validating the design and implementation of the O'Caml emulator we used two tactics:
    211303Use of reference compilers and emulators.
    212304The operation of the emulator was manually tested by reference to \textsc{mcu 8051 ide}, an emulator for the 8051 series processor.
    213 A number of small C programs were compiled in SDCC\footnote{See the \texttt{GCC} directory for a selection of them.}, and the resulting IHX files were disassembled by \textsc{mcu 8051 ide}.
     305A number of small C programs were compiled in SDCC\footnote{See the \texttt{GCC} directory for a selection of them.}.
     306The resulting IHX files were disassembled by \textsc{mcu 8051 ide}.
     307(IHX files are a standard format for transferring compiled assembly code onto an 8051 series processor, produced by SDCC and all other compilers that target that 8051.)
    214308The status changes in both emulators were then compared.
    219313As a further check, the design and operation of the emulator was compared with the textual description of online tutorials on 8051 programming, such as those found at \url{http://www.8052.com}.
    221 \subsection{Validation}
    222 \label{subsect.validation}
    224315\section{The emulator in Matita}
     318The O'Caml emulator served as a testbed and prototype for an emulator written in the internal language of the Matita proof assistant.
     319We describe our work porting the emulator to Matita, especially where the design of the Matita emulator differs from that of the O'Caml version.
    227321\subsection{What we do not implement}
Note: See TracChangeset for help on using the changeset viewer.