# Changeset 383 for Deliverables/D4.1/Report

Ignore:
Timestamp:
Dec 7, 2010, 4:12:16 PM (10 years ago)
Message:

First draft of report finished.

Location:
Deliverables/D4.1/Report
Files:
2 edited

### Legend:

Unmodified
 r382 \vspace*{1cm}Project FP7-ICT-2009-C-243881 \cerco{}} \lstdefinelanguage{matita} \lstdefinelanguage{matita-ocaml} {keywords={ndefinition,ncoercion,nlemma,ntheorem,nremark,ninductive,nrecord,nqed,nlet,let,in,rec,match,return,with,Type}, morekeywords={[2]nwhd,nnormalize,nelim,ncases,ndestruct}, morekeywords={[3]type,of}, mathescape=true, } \lstset{language=matita,basicstyle=\small\tt,columns=flexible,breaklines=false, \lstset{language=matita-ocaml,basicstyle=\small\tt,columns=flexible,breaklines=false, keywordstyle=\color{red}\bfseries, keywordstyle=[2]\color{blue}, keywordstyle=[3]\color{blue}\bfseries, commentstyle=\color{green}, stringstyle=\color{blue}, \vspace*{7cm} \paragraph{Abstract} We discuss the design and implementation of an O'Caml emulator for the Intel 8051/8052 eight bit processor, and its subsequent formalisation in the dependently typed proof assistant Matita. We discuss the implementation of a prototype O'Caml emulator for the Intel 8051/8052 eight bit processor, and its subsequent formalisation in the dependently typed proof assistant Matita. In particular, we focus on the decisions made during the design of both emulators, and how the design of the O'Caml emulator had to be modified in order to fit into the more stringent type system of Matita. We also briefly summarise the Intel 8051/8052 processor architecture, our target processor model for the \textsf{CerCo} project. \newpage \newpage \section{Introduction} \label{sect.introduction} \subsection{Task} \label{subsect.task} \section{Task} \label{sect.task} The Grant Agreement states the D4.1/D4.2 deliverables consist of the following tasks: We now report on our implementation of these deliverables. \subsection{A brief overview of the target processor} \label{subsect.brief.overview.target.processor} \section{A brief overview of the target processor} \label{sect.brief.overview.target.processor} The MCS-51 is an eight bit microprocessor introduced by Intel in the late 1970s. \label{sect.emulator.in.ocaml} \subsection{Anatomy of the emulator} \label{subsect.anatomy.emulator} We discuss decisions made during the design of the prototype O'Caml emulator. \subsection{Lack of orthogonality in instruction set} For instance, consider the MOV instruction, which implements a data transfer between two memory locations, which takes eighteen possible combinations of addressing modes. \subsection{Pseudo-instructions} \label{subsect.pseudo-instructions} We handle this problem by introducing unions' of types, using O'Caml's polymorphic variants feature: \begin{quote} \begin{lstlisting} type ('a, 'b) union2 = [ U1 of 'a | U2 of 'b ] \end{lstlisting} \end{quote} (We also introduce \texttt{union3} and \texttt{union6}, which suffice for our purposes.) Using these union types, we can rationalise the inductive type encoding the assembly instruction set. For instance: \begin{quote} \begin{lstlisting} type 'addr preinstruction = ... | XRL of (acc * [ data | reg | direct | indirect ], direct * [ acc | data ]) union2 ... \end{lstlisting} \end{quote} That is, the \texttt{XRL} instruction\footnote{Exclusive disjunction.} take either the accumulator A as its first argument, followed by data with one of data, register, direct or indirect addressing modes, or takes data with a direct addressing mode as its first argument, with either the accumulator A or data with the data addressing mode as its second argument. Further, all functions that must pattern match against the \texttt{(pre)instruction} inductive type are also simplified using this technique. Using O'Caml's ability to perform deep pattern' matches, we may pattern match against \texttt{XRL(U1(arg1, arg2))} and have the guarantee that \texttt{arg1} takes the form \texttt{ACC\_A}. \subsection{Pseudo-instructions and labels} \label{subsect.pseudo-instructions.labels} Per the description of Deliverable D4.1 in the Grant Agreement above, the 8051 emulator must eventually interface with the C compiler frontend of Deliverable D3.2, produced in Paris. After consultation, it was found that the design of the compiler frontend could be simplified considerably with the introduction of \emph{pseudoinstructions} and labels. We introduce three new pseudoinstructions---\texttt{Jump}, \texttt{Call}, and \texttt{Mov}---corresponding to unconditional jumps, procedure calls and data transfers respectively. We also promote' all unlabeled conditional jumps in 8051 assembly to labeled pseudojumps; one can now jump to a label conditionally, as opposed to jumping to a fixed relative offset. Further, we introduce labels for jumping to, and cost annotations, used by the Paris team. The three new pseudoinstructions, along with the promoted conditional jumps, allow the Paris team to abstract away from the differences between different types of unconditional jump (the 8051 has three different sorts, depending on the length of the jump), as well as abstract away the differences between memory transfers and calls. However, the emulator must perform an expansion stage, during which pseudoinstructions are translated to real' 8051 assembly instructions. The introduction of labeled conditional jumps slightly complicates our type of abstract syntax for 8051 assembly. We define an inductive type representing conditional jumps in 8051 assembly code, parameterised by a type representing relative offsets:s \begin{quote} \begin{lstlisting} type 'addr jump = [ JC of 'addr | JNC of 'addr ... \end{lstlisting} \end{quote} An inductive type of preinstructions is defined, which is also parameterised by a type representing relative offsets in assembly code, and incorporates the inductive type of conditional jumps: \begin{quote} \begin{lstlisting} type 'addr preinstruction = [ ADD of acc * [ reg | direct | indirect | data ] ... | 'addr jump ... \end{lstlisting} \end{quote} A type representing instructions is defined, choosing a concrete type for relative offsets: \begin{quote} \begin{lstlisting} type instruction = rel preinstruction \end{lstlisting} \end{quote} Here, \texttt{rel} is a type which wraps up' a byte. Finally, this type of instructions is incorporated into a type of labelled instructions: \begin{quote} \begin{lstlisting} type labelled_instruction = [ instruction | Cost of string | Label of string | Jmp of string | Call of string | Mov of dptr * string | WithLabel of [`Label of string] jump ] \end{lstlisting} \end{quote} Throughout, we make heavy use of polymorphic variants to deal with issues relating to subtyping. As mentioned, the emulator must now handle an additional expansion stage, removing pseudoinstructions in favour of real, 8051 assembly instructions. This is relatively straightforward, and is done in two stages. The first stage consists of iterating over an assembly program, building a dictionary of all labels and their position in the program. The second stage consists of iterating over the same program and replacing all pseudojumps (both conditional and unconditional) with an 8051 jump to the requisite computed offset. One subtletly persists, however. The 8051 has three different types of unconditional jump, depending on the length of the jump to be used: \texttt{AJMP}, \texttt{JMP} and \texttt{LJMP}. The instructions \texttt{AJMP} and \texttt{JMP} are short jumps, whereas \texttt{LJMP} is a long jump, capable of reaching anywhere in the program. At the moment, the second pass of the expansion stage replaces all unconditional pseudojumps with a \texttt{LJMP} for simplicity. We do, however, plan to improve this process for efficiency reasons, expanding to shorter jumps where feasible. \subsection{Validation} \label{subsect.validation} In validating the design and implementation of the O'Caml emulator we used two tactics: Use of reference compilers and emulators. The operation of the emulator was manually tested by reference to \textsc{mcu 8051 ide}, an emulator for the 8051 series processor. A number of small C programs were compiled in SDCC\footnote{See the \texttt{GCC} directory for a selection of them.}, and the resulting IHX files were disassembled by \textsc{mcu 8051 ide}. A number of small C programs were compiled in SDCC\footnote{See the \texttt{GCC} directory for a selection of them.}. The resulting IHX files were disassembled by \textsc{mcu 8051 ide}. (IHX files are a standard format for transferring compiled assembly code onto an 8051 series processor, produced by SDCC and all other compilers that target that 8051.) The status changes in both emulators were then compared. As a further check, the design and operation of the emulator was compared with the textual description of online tutorials on 8051 programming, such as those found at \url{http://www.8052.com}. \subsection{Validation} \label{subsect.validation} \section{The emulator in Matita} \label{sect.emulator.in.matita} The O'Caml emulator served as a testbed and prototype for an emulator written in the internal language of the Matita proof assistant. We describe our work porting the emulator to Matita, especially where the design of the Matita emulator differs from that of the O'Caml version. \subsection{What we do not implement}