source: Deliverables/D4.1/Report/report.tex @ 380

Last change on this file since 380 was 380, checked in by mulligan, 10 years ago

More added on subtyping stuff, etc.

File size: 19.8 KB
Line 
1\documentclass[11pt, epsf, a4wide]{article}
2
3\usepackage{../../style/cerco}
4
5\usepackage{amsfonts}
6\usepackage{amsmath}
7\usepackage{amssymb} 
8\usepackage[english]{babel}
9\usepackage[utf8x]{inputenc}
10\usepackage{listings}
11\usepackage{stmaryrd}
12\usepackage{url}
13
14\title{
15INFORMATION AND COMMUNICATION TECHNOLOGIES\\
16(ICT)\\
17PROGRAMME\\
18\vspace*{1cm}Project FP7-ICT-2009-C-243881 \cerco{}}
19
20\lstdefinelanguage{matita}
21  {keywords={ndefinition,ncoercion,nlemma,ntheorem,nremark,ninductive,nrecord,nqed,nlet,let,in,rec,match,return,with,Type},
22   morekeywords={[2]nwhd,nnormalize,nelim,ncases,ndestruct},
23   mathescape=true,
24  }
25
26\lstset{language=matita,basicstyle=\small\tt,columns=flexible,breaklines=false,
27        keywordstyle=\color{red}\bfseries,
28        keywordstyle=[2]\color{blue},
29        commentstyle=\color{green},
30        stringstyle=\color{blue},
31        showspaces=false,showstringspaces=false}
32
33\lstset{extendedchars=false}
34\lstset{inputencoding=utf8x}
35\DeclareUnicodeCharacter{8797}{:=}
36\DeclareUnicodeCharacter{10746}{++}
37\DeclareUnicodeCharacter{9001}{\ensuremath{\langle}}
38\DeclareUnicodeCharacter{9002}{\ensuremath{\rangle}}
39
40\date{}
41\author{}
42
43\begin{document}
44
45\thispagestyle{empty}
46
47\vspace*{-1cm}
48\begin{center}
49\includegraphics[width=0.6\textwidth]{../../style/cerco_logo.png}
50\end{center}
51
52\begin{minipage}{\textwidth}
53\maketitle
54\end{minipage}
55
56\vspace*{0.5cm}
57\begin{center}
58\begin{LARGE}
59\textbf{
60Report n. D4.1\\
61Intel 8051/8052 emulator prototype, and formalisation in Matita}
62\end{LARGE} 
63\end{center}
64
65\vspace*{2cm}
66\begin{center}
67\begin{large}
68Version 1.0
69\end{large}
70\end{center}
71
72\vspace*{0.5cm}
73\begin{center}
74\begin{large}
75Main Authors:\\
76Claudio Sacerdoti Coen and Dominic P. Mulligan
77\end{large}
78\end{center}
79
80\vspace*{\fill}
81
82\noindent
83Project Acronym: \cerco{}\\
84Project full title: Certified Complexity\\
85Proposal/Contract no.: FP7-ICT-2009-C-243881 \cerco{}\\
86
87\clearpage
88\pagestyle{myheadings}
89\markright{\cerco{}, FP7-ICT-2009-C-243881}
90
91\newpage
92
93\vspace*{7cm}
94\paragraph{Abstract}
95We discuss the design and implementation of an O'Caml emulator for the Intel 8051/8052 eight bit processor, and its subsequent formalisation in the dependently typed proof assistant Matita.
96
97\newpage
98
99\tableofcontents
100
101\newpage
102
103\section{Introduction}
104\label{sect.introduction}
105
106\subsection{Task}
107\label{subsect.task}
108
109The Grant Agreement states the D4.1/D4.2 deliverables consist of:
110
111\begin{quotation}
112\textbf{Executable Formal Semantics of Machine Code}: Formal definition of the semantics of the target language.
113The semantics will be given in a functional (and hence executable) form, useful for testing, validation and project assessment.
114\end{quotation}
115
116\begin{quotation}
117\textbf{CIC encoding: Back-end}: Functional Specification in the internal language of the Proof Assistant (the Calculus of Inductive Construction) of the back end of the compiler.
118This unit is meant to be composable with the front-end of deliverable D3.2, to obtain a full working compiler for Milestone M2.
119A first validation of the design principles and implementation choices for the Untrusted Cost-annotating OCaml Compiler D2.2 is achieved and reported in the deliverable, possibly triggering updates of the Untrusted Cost-annotating OCaml Compiler sources.
120\end{quotation}
121We now report on our implementation of these deliverables.
122
123\subsection{A brief overview of the target processor}
124\label{subsect.brief.overview.target.processor}
125
126\begin{figure}
127\caption{High level overview of the 8051 memory layout}
128\label{fig.memory.layout}
129\end{figure}
130
131The MCS-51 is an eight bit microprocessor introduced by Intel in the late 1970s.
132Commonly called the 8051, in the three decades since its introduction the processor has become a highly popular target for embedded systems engineers.
133Further, the processor and its immediate successor, the 8052, is still manufactured by a host of semiconductor suppliers---many of them European---including Atmel, Siemens Semiconductor, NXP (formerly Phillips Semiconductor), Texas Instruments, and Maxim (formerly Dallas Semiconductor).
134
135The 8051 is a well documented processor, and has the additional support of numerous open source and commercial tools, such as compilers for high-level languages and emulators.
136For instance, the open source Small Device C Compiler (SDCC) recognises a dialect of C, and other compilers targeting the 8051 for BASIC, Forth and Modula-2 are also extant.
137An open source emulator for the processor, MCU8051 IDE, is also available.
138
139The 8051 has a relatively straightforward architecture, unencumbered by advanced features of modern processors, making it an ideal target for formalisation.
140A high-level overview of the processor's memory layout is provided in Figure~\ref{fig.memory.layout}.
141
142Processor RAM is divided into numerous segments, with the most prominent division being between internal and (optional) external memory.
143Internal memory, commonly provided on the die itself with fast access, is further divided into 128 bytes of internal RAM and numerous Special Function Registers (SFRs) which control the operation of the processor.
144Internal RAM (IRAM) is further divided into a eight general purpose bit-addressable registers (R0--R7).
145These sit in the first eight bytes of IRAM, though can be programmatically `shifted up' as needed.
146Bit memory, followed by a small amount of stack space resides in the memory space immediately after the register banks.
147What remains of the IRAM may be treated as general purpose memory.
148
149External RAM (XRAM), limited to 64 kilobytes, is optional, and may be provided on or off chip, depending on the manufacturer.
150XRAM is accessed using a dedicated instruction.
151External code memory is often stored in the form of an EPROM, and limited to 64 kilobytes in size.
152However, depending on the particular manufacturer and processor model, a dedicated on-die read-only memory area for program code may also be supplied (the processor has a Harvard architecture, where program code and data are separated).
153
154Memory may be addressed in numerous ways: immediate, direct, indirect, external direct and code indirect.
155As the latter two addressing modes hint, there are some restrictions enforced by the 8051 and its derivatives on which addressing modes may be used with specific types of memory.
156For instance, the 128 bytes of extra internal RAM that the 8052 features cannot be addressed using indirect addressing; rather, external (in)direct addressing must be used.
157
158The 8051 series possesses an eight bit Arithmetic and Logic Unit (ALU), with a wide variety of instructions for performing arithmetic and logical operations on bits and integers.
159Further, the processor possesses two eight bit general purpose accumulators, A and B.
160
161Communication with the device is facilitated by an onboard UART serial port, and associated serial controller, which can operate in numerous modes.
162Serial baud rate is determined by one of two sixteen bit timers included with the 8051, which can be set to multiple modes of operation.
163(The 8052 provides an additional sixteen bit timer.)
164As an additional method of communication, the 8051 also provides a four byte bit-addressable input-output port.
165
166The programmer may take advantage of the interrupt mechanism that the processor provides.
167This is especially useful when dealing with input or output involving the serial device, as an interrupt can be set when a whole character is sent or received via the serial port.
168
169Interrupts immediately halt the flow of execution of the processor, and cause the program counter to jump to a fixed address, where the requisite interrupt handler is stored.
170However, interrupts may be set to one of two priorities: low and high.
171The interrupt handler of an interrupt with high priority is executed ahead of the interrupt handler of an interrupt of lower priority, interrupting a currently executing handler of lower priority, if necessary.
172
173The 8051 has interrupts disabled by default.
174The programmer is free to handle serial input and output manually, by poking serial flags in the SFRs.
175Similarly, `exceptional circumstances' that would otherwise trigger an interrupt on more modern processors, for example, division by zero, are also signalled by setting flags.
176
177\section{The emulator in O'Caml}
178\label{sect.emulator.in.ocaml}
179
180\subsection{Anatomy of the emulator}
181\label{subsect.anatomy.emulator}
182
183\subsection{Lack of orthogonality in instruction set}
184\label{subsect.lack.orthogonality.instruction.set}
185
186The instruction set of 8051 assembly is highly irregular.
187For instance, consider the MOV instruction, which implements a data transfer between two memory locations, which takes eighteen possible combinations of addressing modes.
188
189\subsection{Pseudo-instructions}
190\label{subsect.pseudo-instructions}
191
192In validating the design and implementation of the O'Caml emulator we used two tactics:
193\begin{enumerate}
194\item
195Use of multiple manufacturer's data sheets (both the Siemens Semiconductor and Phillips Semiconductor specifications for the 8051, as well as online sources such as the Keil website).
196We found typographic errors in manufacturer's data sheets which were resolved by consulting an alternative sheet.
197\item
198Use of reference compilers and emulators.
199The operation of the emulator was manually tested by reference to \textsc{mcu 8051 ide}, an emulator for the 8051 series processor.
200A number of small C programs were compiled in SDCC\footnote{See the \texttt{GCC} directory for a selection of them.}, and the resulting IHX files were disassembled by \textsc{mcu 8051 ide}.
201The status changes in both emulators were then compared.
202
203For further validation, the output of the compiled C programs from SDCC was compared with the output of the same programs in GCC, in order to pre-empt the introduction of bugs in the emulator inherited from a faulty C compiler.
204\end{enumerate}
205
206As a further check, the design and operation of the emulator was compared with the textual description of online tutorials on 8051 programming, such as those found at \url{http://www.8052.com}.
207
208\subsection{Validation}
209\label{subsect.validation}
210
211\section{The emulator in Matita}
212\label{sect.emulator.in.matita}
213
214\subsection{What we do not implement}
215\label{subsect.what.we.do.not.implement}
216
217Our O'Caml 8051 emulator provides functions for reading and parsing Intel IHX format files.
218We do not implement these functions in the Matita emulator, as Matita provides no means of input or output.
219
220\subsection{Auxilliary data structures and libraries}
221\label{subsect.auxilliary.data.structures.and.libraries}
222
223A small library of data structures was written, along with basic functions operating over them.
224Implemented data structures include: Booleans, option types, lists, Cartesian products, Natural numbers, fixed-length vectors, and sparse tries.
225
226Our type of vectors, in particular, makes heavy use of dependent types.
227Probing vectors is `type safe' for instance: we cannot index into a vector beyond the vector's length.
228
229We represent bits as Boolean values.
230Nibbles, bytes, words, and so on, are represented as fixed length (bit)vectors of the requisite length.
231
232\subsection{The emulator state}
233\label{subsect.emulator.state}
234
235We represent all processor memory in the Matita emulator as a sparse (bitvector)trie:
236
237\begin{quote}
238\begin{lstlisting}
239ninductive BitVectorTrie (A: Type[0]): Nat $\rightarrow$ Type[0] ≝
240  Leaf: A $\rightarrow$ BitVectorTrie A Z
241| Node: ∀n: Nat. BitVectorTrie A n $\rightarrow$ BitVectorTrie A n $\rightarrow$ BitVectorTrie A (S n)
242| Stub: ∀n: Nat. BitVectorTrie A n.
243\end{lstlisting}
244\end{quote}
245
246Nodes are addressed by a bitvector index, representing a path through the tree.
247At any point in the tree, a \texttt{Stub} may be inserted, representing a `hole' in the tree.
248All functions operating on tries use dependent types to enforce the invariant that the height of the tree and the length of the bitvector representing a path through the tree are the same.
249
250We probe a trie with the \texttt{lookup} function.
251This takes an additional argument representing the value to be returned should a stub, representing uninitialised data, be encountered during traversal.
252
253Like the O'Caml emulator, we use a record to represent processor state:
254
255\begin{quote}
256\begin{lstlisting}
257nrecord Status: Type[0] ≝
258{
259  code_memory: BitVectorTrie Byte sixteen;
260  low_internal_ram: BitVectorTrie Byte seven;
261  high_internal_ram: BitVectorTrie Byte seven;
262  external_ram: BitVectorTrie Byte sixteen;
263 
264  program_counter: Word;
265 
266  special_function_registers_8051: Vector Byte nineteen;
267  special_function_registers_8052: Vector Byte five;
268 
269  ...
270}.
271\end{lstlisting}
272\end{quote}
273
274However, we `squash' the \texttt{Status} record in the Matita emulator by grouping all 8051 SFRs (respectively, 8052 SFRs) into a single vector of bytes, as opposed to representing them as explicit fields in the record itself.
275We then provide functions that index into the respective vector to `get' and `set' the respective SFRs.
276This is due to record typechecking in Matita being slow for large records.
277
278\subsection{Dealing with partiality}
279\label{subsect.dealing.with.partiality}
280
281The O'Caml 8051 emulator makes use of a number of partial functions.
282These functions either \texttt{assert false}\footnote{O'Caml idiom: immediately halts execution of the running program.} or do not perform a comprehensive pattern analysis over their inputs.
283There are a number of possible reasons for this:
284\begin{enumerate}
285\item
286\textbf{Incomplete pattern analyses} are used where we are confident that the particular pattern match in question should never occur, for instance if the calling function performs a test beforehand, or where the emulator should fail anyway if a particular unchecked pattern is used as input.
287An example of a function which exhibits the latter behaviour is \texttt{set\_arg\_16} from \texttt{ASMInterpret.ml}, which fails with a pattern match exception if called on an input representing an eight bit argument.
288\item
289\textbf{Assert false} may be called if the emulator finds itself in an `impossible situation', such as encountering an empty list where a list containing one element is expected.
290In this respect, we used \texttt{assert false} in a similar way to the previously described use of incomplete pattern analysis.
291\item
292\textbf{Assert false} may be called is some feature of the physical 8051 processor is not implemented in the O'Caml emulator and an executing program is attempting to use it.
293\end{enumerate}
294
295The three manifestations of partiality above can be split into two types: partiality that manifests itself due to O'Caml's type system not being strong enough to rule the cause out, and partiality that signals a `real' crash in the processor due to the user attempting to use an unimplemented feature.
296Items 1 and 2 belong to the former class, Item 3 to the latter.
297
298Clearly Items 1 and 2 above must be addressed in the Matita formalisation.
299Item 2 is solved through extensive use of dependent types.
300Indexing into lists and vectors, for instance, is always `type safe', as we provide probing functions with strong dependent types.
301
302Item 1 is perhaps the most problematic of the three problems, as we either have to provide an exhaustive case analysis, use pattern wildcards, or find a clever way of encoding the possible patterns that are expected as input in the type of a function.
303We employ a technique that implements the latter idea.
304This is discussed in Subsection~\ref{subsect.addressing.modes.use.of.dependent.types}.
305
306To solve Item 3 above in the Matita formalisation of the emulator, we introduce an axiom \texttt{not\_implemented} of type \texttt{False}.
307When the emulator attempts to use an unimplemented feature, we introduce a metavariable, corresponding to an open proof obligation.
308These obligations are closed by performing a case analysis over \texttt{not\_implemented}.
309
310\subsection{Addressing modes: use of dependent types}
311\label{subsect.addressing.modes.use.of.dependent.types}
312
313We provide an inductive data type representing all possible addressing modes of 8051 assembly.
314This is the type that functions will pattern match against.
315
316\begin{quote}
317\begin{lstlisting}
318ninductive addressing_mode: Type[0] ≝
319  DIRECT: Byte $\rightarrow$ addressing_mode
320| INDIRECT: Bit $\rightarrow$ addressing_mode
321...
322\end{lstlisting}
323\end{quote}
324However, we also wish to express in the type of our functions the \emph{impossibility} of pattern matching against certain constructors.
325In order to do this, we introduce an inductive type of addressing mode `tags'.
326The constructors of \texttt{addressing\_mode\_tag} are in one-one correspondence with the constructors of \texttt{addressing\_mode}:
327\begin{quote}
328\begin{lstlisting}
329ninductive addressing_mode_tag : Type[0] ≝
330  direct: addressing_mode_tag
331| indirect: addressing_mode_tag
332...
333\end{lstlisting}
334\end{quote}
335We then provide a function that checks whether an \texttt{addressing\_mode} is `morally' an \texttt{addressing\_mode\_tag}, as follows:
336\begin{quote}
337\begin{lstlisting}
338nlet rec is_a (d:addressing_mode_tag) (A:addressing_mode) on d ≝
339  match d with
340   [ direct $\Rightarrow$ match A with [ DIRECT _ $\Rightarrow$ true | _ $\Rightarrow$ false ]
341   | indirect $\Rightarrow$ match A with [ INDIRECT _ $\Rightarrow$ true | _ $\Rightarrow$ false ]
342...
343\end{lstlisting}
344\end{quote}
345We also extend this check to vectors of \texttt{addressing\_mode\_tag}'s in the obvious manner:
346\begin{quote}
347\begin{lstlisting}
348nlet rec is_in (n: Nat) (l: Vector addressing_mode_tag n) (A:addressing_mode) on l ≝
349 match l return $\lambda$m.$\lambda$_ :Vector addressing_mode_tag m.Bool with
350  [ VEmpty $\Rightarrow$ false
351  | VCons m he (tl: Vector addressing_mode_tag m) $\Rightarrow$
352     is_a he A $\vee$ is_in ? tl A ].
353\end{lstlisting}
354\end{quote}
355Here \texttt{VEmpty} and \texttt{VCons} are the two constructors of the \texttt{Vector} data type, and $\mathtt{\vee}$ is inclusive disjunction on Booleans.
356\begin{quote}
357\begin{lstlisting}
358nrecord subaddressing_mode (n: Nat) (l: Vector addressing_mode_tag (S n)) : Type[0] ≝
359{
360  subaddressing_modeel :> addressing_mode;
361  subaddressing_modein: bool_to_Prop (is_in ? l subaddressing_modeel)
362}.
363\end{lstlisting}
364\end{quote}
365We can now provide an inductive type of preinstructions with precise typings:
366\begin{quote}
367\begin{lstlisting}
368ninductive preinstruction (A: Type[0]): Type[0] ≝
369   ADD: $\llbracket$ acc_a $\rrbracket$ $\rightarrow$ $\llbracket$ register; direct; indirect; data $\rrbracket$ $\rightarrow$ preinstruction A
370 | ADDC: $\llbracket$ acc_a $\rrbracket$ $\rightarrow$ $\llbracket$ register; direct; indirect; data $\rrbracket$ $\rightarrow$ preinstruction A
371...
372\end{lstlisting}
373\end{quote}
374Here $\llbracket - \rrbracket$ is syntax denoting a vector.
375We see that the constructor \texttt{ADD} expects two parameters, the first being the accumulator A (\texttt{acc\_a}), and the second being one of a register, direct, indirect or data addressing mode.
376
377The final, missing component is a pair of type coercions from \texttt{addressing\_mode} to \texttt{subaddressing\_mode} and from \texttt{subaddressing\_mode} to \texttt{Type$\lbrack0\rbrack$}, respectively.
378The previous machinery allows us to state in the type of a function what addressing modes that function expects.
379For instance, consider \texttt{set\_arg\_16}, which expects only a \texttt{DPTR}:
380\begin{quote}
381\begin{lstlisting}
382ndefinition set_arg_16: Status $\rightarrow$ Word $\rightarrow$ $\llbracket$ dptr $\rrbracket$ $\rightarrow$ Status ≝
383  $\lambda$s, v, a.
384   match a return $\lambda$x. bool_to_Prop (is_in ? $\llbracket$ dptr $\rrbracket$ x) $\rightarrow$ ? with
385     [ DPTR $\Rightarrow$ $\lambda$_: True.
386       let 〈 bu, bl 〉 := split $\ldots$ eight eight v in
387       let status := set_8051_sfr s SFR_DPH bu in
388       let status := set_8051_sfr status SFR_DPL bl in
389         status
390     | _ $\Rightarrow$ $\lambda$_: False.
391       match K in False with
392       [
393       ]
394     ] (subaddressing_modein $\ldots$ a).
395\end{lstlisting}
396\end{quote}
397All other cases are discharged by the catch-all at the bottom of the match expression.
398Attempting to match against another addressing mode not indicated in the type (for example, \texttt{REGISTER}) will produce a type-error.
399
400\end{document}
Note: See TracBrowser for help on using the repository browser.