source: Deliverables/D4.1/Report/report.tex @ 377

Last change on this file since 377 was 377, checked in by mulligan, 10 years ago

Description of techniques related to validation of O'Caml emulator.

File size: 13.3 KB
Line 
1\documentclass[11pt, epsf, a4wide]{article}
2
3\usepackage{../../style/cerco}
4
5\usepackage{amsfonts}
6\usepackage{amsmath}
7\usepackage{amssymb} 
8\usepackage[english]{babel}
9\usepackage[utf8x]{inputenc}
10\usepackage{listings}
11\usepackage{url}
12
13\title{
14INFORMATION AND COMMUNICATION TECHNOLOGIES\\
15(ICT)\\
16PROGRAMME\\
17\vspace*{1cm}Project FP7-ICT-2009-C-243881 \cerco{}}
18
19\lstdefinelanguage{matita}
20  {keywords={ndefinition,nlemma,ntheorem,nremark,ninductive,nrecord,nqed,nlet,rec,match,with,Type},
21   morekeywords={[2]nwhd,nnormalize,nelim,ncases,ndestruct},
22   mathescape=true,
23  }
24
25\lstset{language=matita,basicstyle=\small\tt,columns=flexible,breaklines=false,
26        keywordstyle=\color{red}\bfseries,
27        keywordstyle=[2]\color{blue},
28        commentstyle=\color{green},
29        stringstyle=\color{blue},
30        showspaces=false,showstringspaces=false}
31
32\lstset{extendedchars=false}
33\lstset{inputencoding=utf8x}
34\DeclareUnicodeCharacter{8797}{:=}
35\DeclareUnicodeCharacter{10746}{++}
36\DeclareUnicodeCharacter{9001}{\ensuremath{\langle}}
37\DeclareUnicodeCharacter{9002}{\ensuremath{\rangle}}
38
39\date{}
40\author{}
41
42\begin{document}
43
44\thispagestyle{empty}
45
46\vspace*{-1cm}
47\begin{center}
48\includegraphics[width=0.6\textwidth]{../../style/cerco_logo.png}
49\end{center}
50
51\begin{minipage}{\textwidth}
52\maketitle
53\end{minipage}
54
55\vspace*{0.5cm}
56\begin{center}
57\begin{LARGE}
58\textbf{
59Report n. D4.1\\
60Intel 8051/8052 emulator prototype, and formalisation in Matita}
61\end{LARGE} 
62\end{center}
63
64\vspace*{2cm}
65\begin{center}
66\begin{large}
67Version 1.0
68\end{large}
69\end{center}
70
71\vspace*{0.5cm}
72\begin{center}
73\begin{large}
74Main Authors:\\
75Claudio Sacerdoti Coen and Dominic P. Mulligan
76\end{large}
77\end{center}
78
79\vspace*{\fill}
80
81\noindent
82Project Acronym: \cerco{}\\
83Project full title: Certified Complexity\\
84Proposal/Contract no.: FP7-ICT-2009-C-243881 \cerco{}\\
85
86\clearpage
87\pagestyle{myheadings}
88\markright{\cerco{}, FP7-ICT-2009-C-243881}
89
90\newpage
91
92\vspace*{7cm}
93\paragraph{Abstract}
94We discuss the design and implementation of an O'Caml emulator for the Intel 8051/8052 eight bit processor, and its subsequent formalisation in the dependently typed proof assistant Matita.
95
96\newpage
97
98\tableofcontents
99
100\newpage
101
102\section{Introduction}
103\label{sect.introduction}
104
105\subsection{Task}
106\label{subsect.task}
107
108\subsection{A brief overview of the target processor}
109\label{subsect.brief.overview.target.processor}
110
111\begin{figure}
112\caption{High level overview of the 8051 memory layout}
113\label{fig.memory.layout}
114\end{figure}
115
116The MCS-51 is an eight bit microprocessor introduced by Intel in the late 1970s.
117Commonly called the 8051, in the three decades since its introduction the processor has become a highly popular target for embedded systems engineers.
118Further, the processor and its immediate successor, the 8052, is still manufactured by a host of semiconductor suppliers---many of them European---including Atmel, Siemens Semiconductor, NXP (formerly Phillips Semiconductor), Texas Instruments, and Maxim (formerly Dallas Semiconductor).
119
120The 8051 is a well documented processor, and has the additional support of numerous open source and commercial tools, such as compilers for high-level languages and emulators.
121For instance, the open source Small Device C Compiler (SDCC) recognises a dialect of C, and other compilers targeting the 8051 for BASIC, Forth and Modula-2 are also extant.
122An open source emulator for the processor, MCU8051 IDE, is also available.
123
124% Processor architecture
125The 8051 has a relatively straightforward architecture, unencumbered by advanced features of modern processors, making it an ideal target for formalisation.
126A high-level overview of the processor's memory layout is provided in Figure~\ref{fig.memory.layout}.
127
128% Internal RAM
129Processor RAM is divided into numerous segments, with the most prominent division being between internal and (optional) external memory.
130Internal memory, commonly provided on the die itself with fast access, is further divided into 128 bytes of internal RAM and numerous Special Function Registers (SFRs) which control the operation of the processor.
131Internal RAM (IRAM) is further divided into a eight general purpose bit-addressable registers (R0--R7).
132These sit in the first eight bytes of IRAM, though can be programmatically `shifted up' as needed.
133Bit memory, followed by a small amount of stack space resides in the memory space immediately after the register banks.
134What remains of the IRAM may be treated as general purpose memory.
135
136% External RAM
137External RAM (XRAM), limited to 64 kilobytes, is optional, and may be provided on or off chip, depending on the manufacturer.
138XRAM is accessed using a dedicated instruction.
139External code memory is often stored in the form of an EPROM, and limited to 64 kilobytes in size.
140However, depending on the particular manufacturer and processor model, a dedicated on-die read-only memory area for program code may also be supplied (the processor has a Harvard architecture, where program code and data are separated).
141
142% ALU
143The 8051 series possesses an eight bit Arithmetic and Logic Unit (ALU), with a wide variety of instructions for performing arithmetic and logical operations on bits and integers.
144Further, the processor possesses two eight bit general purpose accumulators, A and B.
145
146% Serial I/O and the input-output lines
147Communication with the device is facilitated by an onboard UART serial port, and associated serial controller, which can operate in numerous modes.
148Serial baud rate is determined by one of two sixteen bit timers included with the 8051, which can be set to multiple modes of operation.
149(The 8052 provides an additional sixteen bit timer.)
150As an additional method of communication, the 8051 also provides a four byte bit-addressable input-output port.
151
152%
153The programmer may take advantage of the interrupt mechanism that the processor provides.
154This is especially useful when dealing with input or output involving the serial device, as an interrupt can be set when a whole character is sent or received via the serial port.
155
156Interrupts immediately halt the flow of execution of the processor, and cause the program counter to jump to a fixed address, where the requisite interrupt handler is stored.
157However, interrupts may be set to one of two priorities: low and high.
158The interrupt handler of an interrupt with high priority is executed ahead of the interrupt handler of an interrupt of lower priority, interrupting a currently executing handler of lower priority, if necessary.
159
160The 8051 has interrupts disabled by default.
161The programmer is free to handle serial input and output manually, by poking serial flags in the SFRs.
162Similarly, `exceptional circumstances' that would otherwise trigger an interrupt on more modern processors, for example, division by zero, are also signalled by setting flags.
163
164\section{The emulator in O'Caml}
165\label{sect.emulator.in.ocaml}
166
167\subsection{Anatomy of the emulator}
168\label{subsect.anatomy.emulator}
169
170We provide a high-level overview of the operation of the emulator.
171
172% Intel HEX, parsing
173Program code is loaded onto the 8051 in a standard format, the Intel Hex (IHX) format.
174All compilers producing machine code for the 8051, including the SDCC compiler which we use for debugging purposes, produce compiled programs in IHX format as standard.
175Accordingly, our O'Caml emulator can parse IHX files and populate the emulator's code memory with their contents.
176
177Once code memory is populated, and the rest of the emulator state has been initialised (i.e. setting the program counter to zero), the O'Caml emulator fetches the instruction pointed to by the program counter from code memory.
178
179\subsection{Lack of orthogonality in instruction set}
180\label{subsect.lack.orthogonality.instruction.set}
181
182\subsection{Pseudo-instructions}
183\label{subsect.pseudo-instructions}
184
185In validating the design and implementation of the O'Caml emulator we used two tactics:
186\begin{enumerate}
187\item
188Use of multiple manufacturer's data sheets (both the Siemens Semiconductor and Phillips Semiconductor specifications for the 8051, as well as online sources such as the Keil website).
189We found typographic errors in manufacturer's data sheets which were resolved by consulting an alternative sheet.
190\item
191Use of reference compilers and emulators.
192The operation of the emulator was manually tested by reference to \textsc{mcu 8051 ide}, an emulator for the 8051 series processor.
193A number of small C programs were compiled in SDCC\footnote{See the \texttt{GCC} directory for a selection of them.}, and the resulting IHX files were disassembled by \textsc{mcu 8051 ide}.
194The status changes in both emulators were then compared.
195
196For further validation, the output of the compiled C programs from SDCC was compared with the output of the same programs in GCC, in order to pre-empt the introduction of bugs in the emulator inherited from a faulty C compiler.
197\end{enumerate}
198
199As a further check, the design and operation of the emulator was compared with the textual description of online tutorials on 8051 programming, such as those found at \url{http://www.8052.com}.
200
201\subsection{Validation}
202\label{subsect.validation}
203
204\section{The emulator in Matita}
205\label{sect.emulator.in.matita}
206
207\subsection{What we do not implement}
208\label{subsect.what.we.do.not.implement}
209
210Our O'Caml 8051 emulator provides functions for reading and parsing Intel IHX format files.
211We do not implement these functions in the Matita emulator, as Matita provides no means of input or output.
212
213\subsection{Auxilliary data structures and libraries}
214\label{subsect.auxilliary.data.structures.and.libraries}
215
216A small library of data structures was written, along with basic functions operating over them.
217Implemented data structures include: Booleans, option types, lists, Cartesian products, Natural numbers, fixed-length vectors, and sparse tries.
218
219Our type of vectors, in particular, makes heavy use of dependent types.
220Probing vectors is `type safe' for instance: we cannot index into a vector beyond the vector's length.
221
222We represent bits as Boolean values.
223Nibbles, bytes, words, and so on, are represented as fixed length (bit)vectors of the requisite length.
224
225\subsection{The emulator state}
226\label{subsect.emulator.state}
227
228We represent all processor memory in the Matita emulator as a sparse (bitvector)trie:
229
230\begin{quote}
231\begin{lstlisting}
232ninductive BitVectorTrie (A: Type[0]): Nat $\rightarrow$ Type[0] ≝
233  Leaf: A $\rightarrow$ BitVectorTrie A Z
234| Node: ∀n: Nat. BitVectorTrie A n $\rightarrow$ BitVectorTrie A n $\rightarrow$ BitVectorTrie A (S n)
235| Stub: ∀n: Nat. BitVectorTrie A n.
236\end{lstlisting}
237\end{quote}
238
239Nodes are addressed by a bitvector index, representing a path through the tree.
240At any point in the tree, a \texttt{Stub} may be inserted, representing a `hole' in the tree.
241All functions operating on tries use dependent types to enforce the invariant that the height of the tree and the length of the bitvector representing a path through the tree are the same.
242
243We probe a trie with the \texttt{lookup} function.
244This takes an additional argument representing the value to be returned should a stub, representing uninitialised data, be encountered during traversal.
245
246Like the O'Caml emulator, we use a record to represent processor state:
247
248\begin{quote}
249\begin{lstlisting}
250nrecord Status: Type[0] ≝
251{
252  code_memory: BitVectorTrie Byte sixteen;
253  low_internal_ram: BitVectorTrie Byte seven;
254  high_internal_ram: BitVectorTrie Byte seven;
255  external_ram: BitVectorTrie Byte sixteen;
256 
257  program_counter: Word;
258 
259  special_function_registers_8051: Vector Byte nineteen;
260  special_function_registers_8052: Vector Byte five;
261 
262  ...
263}.
264\end{lstlisting}
265\end{quote}
266
267However, we `squash' the \texttt{Status} record in the Matita emulator by grouping all 8051 SFRs (respectively, 8052 SFRs) into a single vector of bytes, as opposed to representing them as explicit fields in the record itself.
268We then provide functions that index into the respective vector to `get' and `set' the respective SFRs.
269This is due to record typechecking in Matita being slow for large records.
270
271\subsection{Addressing modes: use of dependent types}
272\label{subsect.addressing.modes.use.of.dependent.types}
273
274\subsection{Dealing with partiality}
275\label{subsect.dealing.with.partiality}
276
277The O'Caml 8051 emulator makes use of a number of partial functions.
278These functions either \texttt{assert false}\footnote{O'Caml idiom: immediately halts execution of the running program.} or do not perform a comprehensive pattern analysis over their inputs.
279There are a number of possible reasons for this:
280\begin{enumerate}
281\item
282\textbf{Incomplete pattern analyses} are used where we are confident that the particular pattern match in question should never occur, for instance if the calling function performs a test beforehand, or where the emulator should fail anyway if a particular unchecked pattern is used as input.
283An example of a function which exhibits the latter behaviour is \texttt{set\_arg\_16} from \texttt{ASMInterpret.ml}, which fails with a pattern match exception if called on an input representing an eight bit argument.
284\item
285\textbf{Assert false} may be called if the emulator finds itself in an `impossible situation'.
286In this respect, we used \texttt{assert false} in a similar way to the previously described use of incomplete pattern analysis.
287\item
288\textbf{Assert false} may be called is some feature of the physical 8051 processor is not implemented in the O'Caml emulator and an executing program is attempting to use it.
289\end{enumerate}
290\end{document}
Note: See TracBrowser for help on using the repository browser.