source: Deliverables/D4.1/ITP-Paper/itp-2011.tex @ 518

Last change on this file since 518 was 518, checked in by mulligan, 9 years ago

small changes

File size: 30.2 KB
Line 
1\documentclass{llncs}
2
3\usepackage{amsfonts}
4\usepackage{amsmath}
5\usepackage{amssymb} 
6\usepackage[english]{babel}
7\usepackage{color}
8\usepackage{fancybox}
9\usepackage{graphicx}
10\usepackage[utf8x]{inputenc}
11\usepackage{listings}
12\usepackage{mdwlist}
13\usepackage{microtype}
14\usepackage{stmaryrd}
15\usepackage{url}
16
17\newlength{\mylength} 
18\newenvironment{frametxt}%
19        {\setlength{\fboxsep}{5pt}
20                \setlength{\mylength}{\linewidth}%
21                \addtolength{\mylength}{-2\fboxsep}%
22                \addtolength{\mylength}{-2\fboxrule}%
23                \Sbox
24                \minipage{\mylength}%
25                        \setlength{\abovedisplayskip}{0pt}%
26                        \setlength{\belowdisplayskip}{0pt}%
27                }%
28                {\endminipage\endSbox
29                        \[\fbox{\TheSbox}\]}
30
31\lstdefinelanguage{matita-ocaml}
32  {keywords={definition,coercion,lemma,theorem,remark,inductive,record,qed,let,in,rec,match,return,with,Type,try,on},
33   morekeywords={[2]whd,normalize,elim,cases,destruct},
34   morekeywords={[3]type,ofm},
35   mathescape=true,
36  }
37\lstset{language=matita-ocaml,basicstyle=\small\tt,columns=flexible,breaklines=false,
38        keywordstyle=\color{red}\bfseries,
39        keywordstyle=[2]\color{blue},
40        keywordstyle=[3]\color{blue}\bfseries,
41        commentstyle=\color{green},
42        stringstyle=\color{blue},
43        showspaces=false,showstringspaces=false}
44\lstset{extendedchars=false}
45\lstset{inputencoding=utf8x}
46\DeclareUnicodeCharacter{8797}{:=}
47\DeclareUnicodeCharacter{10746}{++}
48\DeclareUnicodeCharacter{9001}{\ensuremath{\langle}}
49\DeclareUnicodeCharacter{9002}{\ensuremath{\rangle}}
50
51\author{Claudio Sacerdoti Coen \and Dominic P. Mulligan}
52\authorrunning{C. Sacerdoti Coen and D. P. Mulligan}
53\title{An executable formalisation of the MCS-51 microprocessor in Matita}
54\titlerunning{An executable formalisation of the MCS-51}
55\institute{Dipartimento di Scienze dell'Informazione, University of Bologna}
56
57\begin{document}
58
59\maketitle
60
61\begin{abstract}
62We summarise our formalisation of an emulator for the MCS-51 microprocessor in the Matita proof assistant.
63The MCS-51 is a widely used 8-bit microprocessor, especially popular in embedded devices.
64
65We proceeded in two stages, first implementing in O'Caml a prototype emulator, where bugs could be `ironed out' quickly.
66We then ported our O'Caml emulator to Matita's internal language.
67Though mostly straight-forward, this porting presented multiple problems.
68Of particular interest is how we handle the extreme non-orthoganality of the MSC-51's instruction set.
69In O'Caml, this was handled through heavy use of polymorphic variants.
70In Matita, we achieve the same effect through a non-standard use of dependent types.
71
72Both the O'Caml and Matita emulators are `executable'.
73Assembly programs may be animated within Matita, producing a trace of instructions executed.
74
75Our formalisation is a major component of the ongoing EU-funded CerCo project.
76\end{abstract}
77
78%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
79% SECTION                                                                      %
80%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
81\section{Background}
82\label{sect.introduction}
83
84Formal methods are designed to increase our confidence in the design and implementation of software (and hardware).
85Ideally, we would like all software to come equipped with a formal specification, along with a proof of correctness for the implementation.
86Today practically all programs are written in high level languages and then compiled into low level ones.
87Specifications are therefore also given at a high level and correctness can be proved by reasoning automatically or interactively on the program's source code.
88The code that is actually run, however, is not the high level source code that we reason on, but the object code that is generated by the compiler.
89
90A few simple questions now arise:
91\begin{itemize*}
92\item
93What properties are preserved during compilation?
94\item
95What properties are affected by the compilation strategy?
96\item
97To what extent can you trust your compiler in preserving those properties?
98\end{itemize*}
99These questions, and others like them, motivate a current `hot topic' in computer science research: \emph{compiler verification}.
100So far, the field has been focused on the first and last questions only.
101In particular, much attention has been placed on verifying compiler correctness with respect to extensional properties of programs, which are easily preserved during compilation; it is sufficient to completely preserve the denotational semantics of the input program.
102
103However, if we consider intensional properties of programs---such as space, time or energy spent into computation and transmission of data---the situation is more complex.
104To even be able to express these properties, and to be able to reason about them, we are forced to adopt a cost model that assigns a cost to single, or blocks, of instructions.
105Ideally, we would like to have a compositional cost model that assigns the same cost to all occurrences of one instruction.
106However, compiler optimisations are inherently non-compositional: each occurrence of a high level instruction is usually compiled in a different way according to the context it finds itself in.
107Therefore both the cost model and intensional specifications are affected by the compilation process.
108
109In the current EU project CerCo (`Certified Complexity') we approach the problem of reasoning about intensional properties of programs as follows.
110We are currently developing a compiler that induces a cost model on the high level source code.
111Costs are assigned to each block of high level instructions by considering the costs of the corresponding blocks of compiled object code.
112The cost model is therefore inherently non-compositional.
113However, the model has the potential to be extremely \emph{precise}, capturing a program's \emph{realistic} cost, by taking into account, not ignoring, the compilation process.
114A prototype compiler, where no approximation of the cost is provided, has been developed.
115
116We believe that our approach is especially applicable to certifying real time programs.
117Here, a user can certify that all `deadlines' are met whilst wringing as many clock cycles from the processor---using a cost model that does not over-estimate---as possible.
118
119Further, we see our approach as being relevant to the field of compiler verification (and construction) itself.
120For instance, an optimisation specified only extensionally is only half specified; though the optimisation may preserve the denotational semantics of a program, there is no guarantee that any intensional properties of the program, such as space or time usage, will be improved.
121Another potential application is toward completeness and correctness of the compilation process in the presence of space constraints.
122Here, a compiler could potentially reject a source program targetting an embedded system when the size of the compiled code exceeds the available ROM size.
123Moreover, preservation of a program's semantics may only be required for those programs that do not exhaust the stack or heap.
124Hence the statement of completeness of the compiler must take in to account a realistic cost model.
125
126In the methodology proposed in CerCo we assume we are able to compute on the object code exact and realistic costs for sequential blocks of instructions.
127With modern processors, though possible~\cite{??,??,??}, it is difficult to compute exact costs or to reasonably approximate them.
128This is because the execution of a program itself has an influence on the speed of processing.
129For instance, caching, memory effects and other advanced features such as branch prediction all have a profound effect on execution speeds.
130For this reason CerCo decided to focus on 8-bit microprocessors.
131These are still widely used in embedded systems, and have the advantage of an easily predictable cost model due to the relative sparcity of features that they possess.
132
133In particular, we have fully formalised an executable formal semantics of a family of 8 bit Freescale Microprocessors~\cite{oliboni}, and provided a similar executable formal semantics for the MCS-51 microprocessor.
134The latter work is what we describe in this paper.
135The main focus of the formalisation has been on capturing the intensional behaviour of the processor.
136However, the design of the MCS-51 itself has caused problems in our formalisation.
137For example, the MCS-51 has a highly unorthogonal instruction set.
138To cope with this unorthogonality, and to produce an executable specification, we have exploited the dependent type system of Matita, an interactive proof assistant.
139
140\subsection{The 8051/8052}
141\label{subsect.8051-8052}
142
143The MCS-51 is an eight bit microprocessor introduced by Intel in the late 1970s.
144Commonly called the 8051, in the three decades since its introduction the processor has become a highly popular target for embedded systems engineers.
145Further, the processor, its immediate successor the 8052, and many derivatives are still manufactured \emph{en masse} by a host of semiconductor suppliers.
146
147The 8051 is a well documented processor, and has the additional support of numerous open source and commercial tools, such as compilers for high-level languages and emulators.
148For instance, the open source Small Device C Compiler (SDCC) recognises a dialect of C, and other compilers targeting the 8051 for BASIC, Forth and Modula-2 are also extant.
149An open source emulator for the processor, MCU-8051 IDE, is also available.
150Both MCU-8051 IDE and SDCC were used profitably in the implementation of our formalisation.
151
152\begin{figure}[t]
153\begin{center}
154\includegraphics[scale=0.5]{memorylayout.png}
155\end{center}
156\caption{High level overview of the 8051 memory layout}
157\label{fig.memory.layout}
158\end{figure}
159
160The 8051 has a relatively straightforward architecture, unencumbered by advanced features of modern processors, making it an ideal target for formalisation.
161A high-level overview of the processor's memory layout is provided in Figure~\ref{fig.memory.layout}.
162
163Processor RAM is divided into numerous segments, with the most prominent division being between internal and (optional) external memory.
164Internal memory, commonly provided on the die itself with fast access, is further divided into 128 bytes of internal RAM and numerous Special Function Registers (SFRs) which control the operation of the processor.
165Internal RAM (IRAM) is further divided into eight general purpose bit-addressable registers (R0--R7).
166These sit in the first eight bytes of IRAM, though can be programmatically `shifted up' as needed.
167Bit memory, followed by a small amount of stack space, resides in the memory space immediately after the register banks.
168What remains of the IRAM may be treated as general purpose memory.
169A schematic view of IRAM layout is provided in Figure~\ref{fig.iram.layout}.
170
171External RAM (XRAM), limited to a maximum size of 64 kilobytes, is optional, and may be provided on or off chip, depending on the manufacturer.
172XRAM is accessed using a dedicated instruction, and requires sixteen bits to address fully.
173External code memory (XCODE) is often stored in the form of an EPROM, and limited to 64 kilobytes in size.
174However, depending on the particular manufacturer and processor model, a dedicated on-die read-only memory area for program code (ICODE) may also be supplied.
175
176Memory may be addressed in numerous ways: immediate, direct, indirect, external direct and code indirect.
177As the latter two addressing modes hint, there are some restrictions enforced by the 8051 and its derivatives on which addressing modes may be used with specific types of memory.
178For instance, the 128 bytes of extra internal RAM that the 8052 features cannot be addressed using indirect addressing; rather, external (in)direct addressing must be used.
179
180The 8051 series possesses an eight bit Arithmetic and Logic Unit (ALU), with a wide variety of instructions for performing arithmetic and logical operations on bits and integers.
181Further, the processor possesses two eight bit general purpose accumulators, A and B.
182
183Communication with the device is facilitated by an onboard UART serial port, and associated serial controller, which can operate in numerous modes.
184Serial baud rate is determined by one of two sixteen bit timers included with the 8051, which can be set to multiple modes of operation.
185(The 8052 provides an additional sixteen bit timer.)
186As an additional method of communication, the 8051 also provides a four byte bit-addressable input-output port.
187
188The programmer may take advantage of the interrupt mechanism that the processor provides.
189This is especially useful when dealing with input or output involving the serial device, as an interrupt can be set when a whole character is sent or received via the serial port.
190
191Interrupts immediately halt the flow of execution of the processor, and cause the program counter to jump to a fixed address, where the requisite interrupt handler is stored.
192However, interrupts may be set to one of two priorities: low and high.
193The interrupt handler of an interrupt with high priority is executed ahead of the interrupt handler of an interrupt of lower priority, interrupting a currently executing handler of lower priority, if necessary.
194
195The 8051 has interrupts disabled by default.
196The programmer is free to handle serial input and output manually, by poking serial flags in the SFRs.
197Similarly, `exceptional circumstances' that would otherwise trigger an interrupt on more modern processors, for example, division by zero, are also signalled by setting flags.
198
199\begin{figure}[t]
200\begin{center}
201\includegraphics[scale=0.5]{iramlayout.png}
202\end{center}
203\caption{Schematic view of 8051 IRAM layout}
204\label{fig.iram.layout}
205\end{figure}
206
207%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
208% SECTION                                                                      %
209%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
210\subsection{Overview of paper}
211\label{subsect.overview.paper}
212
213In Section~\ref{sect.development.strategy} we provide a brief overview of how we designed and implemented the formalised microprocessor emulator.
214In Section~\ref{sect.design.issues.formalisation} we describe how we made use of dependent types to handle some of the idiosyncracies of the microprocessor.
215In Section~\ref{sect.related.work} we describe the relation our work has to
216
217%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
218% SECTION                                                                      %
219%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
220\section{From O'Caml prototype to Matita formalisation}
221\label{sect.from.o'caml.prototype.matita.formalisation}
222
223Our implementation progressed in two stages:
224
225\paragraph{O'Caml prototype}
226We began with an emulator written in O'Caml.
227We used this to `iron out' any bugs in our design and implementation within O'Caml's more permissive type system.
228O'Caml's ability to perform file input-output also eased debugging and validation.
229Once we were happy with the performance and design of the O'Caml emulator, we moved to the Matita formalisation.
230
231\paragraph{Matita formalisation}
232Matita's syntax is lexically similar to O'Caml's.
233This eased the translation, as large swathes of code were merely copy-pasted with minor modifications.
234However, several major issues had to be addresses when moving from O'Caml to Matita.
235These are now discussed.
236
237%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
238% SECTION                                                                      %
239%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
240\section{Design issues in the formalisation}
241\label{sect.design.issues.formalisation}
242
243From hereonin, we typeset O'Caml source with blue and Matita source with red to distinguish between the two similar syntaxes.
244
245%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
246% SECTION                                                                      %
247%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
248\subsection{Labels and pseudoinstructions}
249\label{subsect.labels.pseudoinstructions}
250
251As part of the CerCo project, a prototype compiler was being developed in parallel with the emulator.
252Easing the design of the compiler was a key objective in implementing the emulator.
253For this reason, we introduced notion of \emph{pseudoinstruction} and \emph{label}.
254
255The MCS-51's instruction set has numerous instructions for unconditional and conditional jumps to memory locations, calling procedures and moving data between memory spaces.
256For instance, the instructions \texttt{AJMP}, \texttt{JMP} and \texttt{LJMP} all perform unconditional jumps.
257However, these instructions differ in how large the maximum size of the offset of the jump to be performed can be.
258Further, all jump instructions require a concrete memory address, to jump to, to be specified.
259Requiring the compiler to compute these offsets, and select appropriate jump instructions, was seen as needleslly burdensome.
260
261Instead, we introduced generic \texttt{Jump}, \texttt{Call} and \texttt{Move} instructions.
262These are expanded into MCS-51 assembly instructions with an assembly phase, prior to program execution.
263Further, we introduced a notion of label (represented by strings), and introduced pseudoinstructions that allow conditional jumps to jump to labels.
264These are also removed during the assembly phase, and replaced by concrete memory addresses and offsets.
265
266%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
267% SECTION                                                                      %
268%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
269\subsection{Representing memory}
270\label{subsect.representing.memory}
271
272The MCS-51 has numerous different types of memory.
273In our prototype implementation, we simply used a map datastructure from the O'Caml standard library.
274However, when moving to Matita, we aimed to improve our treatment of memory.
275
276We worked under the assumption that large swathes of memory would often be uninitialized.
277Using a complete binary tree, for instance, would therefore be extremely memory inefficient.
278Instead, we chose to use a modified form of trie, where paths are represented by bitvectors.
279As bitvectors were widely used in our implementation already for representing integers, this worked well:
280\begin{quote}
281\begin{lstlisting}
282inductive BitVectorTrie (A: Type[0]): nat $\rightarrow$ Type[0] ≝
283  Leaf: A $\rightarrow$ BitVectorTrie A 0
284| Node: ∀n. BitVectorTrie A n $\rightarrow$ BitVectorTrie A n $\rightarrow$ BitVectorTrie A (S n)
285| Stub: ∀n. BitVectorTrie A n.
286\end{lstlisting}
287\end{quote}
288Here, \texttt{Stub} is a constructor that can appear at any point in our tries.
289It internalises the notion of `uninitialized data'.
290Performing a lookup in memory is now straight-forward.
291We merely traverse a path, and if at any point we encounter a \texttt{Stub}, we return a default value\footnote{All manufacturer data sheets that we consulted were silent on the subject of what should be returned if we attempt to access uninitialized memory.  We defaulted to simply returning zero, though our \texttt{lookup} function is parametric in this choice.}.
292As we are using bitvectors, we may make full use of dependent types and ensure that our bitvector paths are of the same length as the height of the tree.
293
294%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
295% SECTION                                                                      %
296%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
297\subsection{Emulator architecture}
298\label{subsect.emulator.architecture}
299
300The internal state of our Matita emulator is represented as a record:
301\begin{quote}
302\begin{lstlisting}
303record Status: Type[0] ≝
304{
305  code_memory: BitVectorTrie Byte 16;
306  low_internal_ram: BitVectorTrie Byte 7;
307  high_internal_ram: BitVectorTrie Byte 7;
308  external_ram: BitVectorTrie Byte 16;
309  program_counter: Word;
310  special_function_registers_8051: Vector Byte 19;
311  special_function_registers_8052: Vector Byte 5;
312  ...
313}.
314\end{lstlisting}
315\end{quote}
316This record neatly encapsulates the current memory contents, the program counter, the state of the current SFRs, and so on.
317One peculiarity is the packing of the 24 combined SFRs into fixed length vectors.
318This was due to a bug in Matita when we were constructing the emulator, since fixed, where the time needed to typecheck a record grew exponentially with the number of fields.
319
320
321
322%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
323% SECTION                                                                      %
324%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
325\subsection{Instruction set unorthogonality}
326\label{subsect.instruction.set.unorthogonality}
327
328A peculiarity of the MCS-51 is the non-orthogonality of its instruction set.
329For instance, the \texttt{MOV} instruction, can be invoked using one of sixteen combinations of addressing modes.
330
331Such non-orthogonality in the instruction set was handled with the use of polymorphic variants in the O'Caml emulator.
332For instance, we introduced types corresponding to each addressing mode:
333\begin{quote}
334\begin{lstlisting}
335type direct = [ `DIRECT of byte ]
336type indirect = [ `INDIRECT of bit ]
337...
338\end{lstlisting}
339\end{quote}
340Which were then used in our inductive datatype for assembly instructions, as follows:
341\begin{quote}
342\begin{lstlisting}
343type 'addr preinstruction =
344 [ `ADD of acc * [ reg | direct | indirect | data ]
345...
346 | `MOV of
347    (acc * [ reg | direct | indirect | data ],
348     [ reg | indirect ] * [ acc | direct | data ],
349     direct * [ acc | reg | direct | indirect | data ],
350     dptr * data16,
351     carry * bit,
352     bit * carry
353     ) union6
354...
355\end{lstlisting}
356\end{quote}
357Here, \texttt{union6} is a disjoint union type, defined as follows:
358\begin{quote}
359\begin{lstlisting}
360type ('a,'b,'c,'d,'e,'f) union6 = [ `U1 of 'a | ... | `U6 of 'f ]
361\end{lstlisting}
362\end{quote}
363For our purposes, the types \texttt{union2}, \texttt{union3} and \texttt{union6} sufficed.
364
365This polymorphic variant machinery worked well: it introduced a certain level of type safety (for instance, the type of our \texttt{MOV} instruction above guarantees it cannot be invoked with arguments in the \texttt{carry} and \texttt{data16} addressing modes, respectively), and also allowed us to pattern match against instructions, when necessary.
366However, this polymorphic variant machinery is \emph{not} present in Matita.
367We needed some way to produce the same effect, which Matita supported.
368For this task, we used dependent types.
369
370We first provided an inductive data type representing all possible addressing modes, a type that functions will pattern match against:
371\begin{quote}
372\begin{lstlisting}
373inductive addressing_mode: Type[0] ≝
374  DIRECT: Byte $\rightarrow$ addressing_mode
375| INDIRECT: Bit $\rightarrow$ addressing_mode
376...
377\end{lstlisting}
378\end{quote}
379We also wished to express in the type of functions the \emph{impossibility} of pattern matching against certain constructors.
380In order to do this, we introduced an inductive type of addressing mode `tags'.
381The constructors of \texttt{addressing\_mode\_tag} are in one-to-one correspondence with the constructors of \texttt{addressing\_mode}:
382\begin{quote}
383\begin{lstlisting}
384inductive addressing_mode_tag : Type[0] ≝
385  direct: addressing_mode_tag
386| indirect: addressing_mode_tag
387...
388\end{lstlisting}
389\end{quote}
390A function that checks whether an \texttt{addressing\_mode} is `morally' an \texttt{addressing\_mode\_tag} is provided, as follows:
391\begin{quote}
392\begin{lstlisting}
393let rec is_a (d: addressing_mode_tag) (A: addressing_mode) on d ≝
394  match d with
395   [ direct $\Rightarrow$ match A with [ DIRECT _ $\Rightarrow$ true | _ $\Rightarrow$ false ]
396   | indirect $\Rightarrow$ match A with [ INDIRECT _ $\Rightarrow$ true | _ $\Rightarrow$ false ]
397...
398\end{lstlisting}
399\end{quote}
400We also extend this check to vectors of \texttt{addressing\_mode\_tag}'s in the obvious manner:
401\begin{quote}
402\begin{lstlisting}
403let rec is_in (n) (l: Vector addressing_mode_tag n) (A: addressing_mode) on l ≝
404 match l return $\lambda$m.$\lambda$_: Vector addressing_mode_tag m. bool with
405  [ VEmpty $\Rightarrow$ false
406  | VCons m he (tl: Vector addressing_mode_tag m) $\Rightarrow$
407     is_a he A $\vee$ is_in ? tl A ].
408\end{lstlisting}
409\end{quote}
410Here \texttt{VEmpty} and \texttt{VCons} are the two constructors of the \texttt{Vector} data type, and $\mathtt{\vee}$ is inclusive disjunction on Booleans.
411\begin{quote}
412\begin{lstlisting}
413record subaddressing_mode (n: Nat) (l: Vector addressing_mode_tag (S n)) : Type[0] ≝
414{
415  subaddressing_modeel :> addressing_mode;
416  subaddressing_modein: bool_to_Prop (is_in ? l subaddressing_modeel)
417}.
418\end{lstlisting}
419\end{quote}
420We can now provide an inductive type of preinstructions with precise typings:
421\begin{quote}
422\begin{lstlisting}
423inductive preinstruction (A: Type[0]): Type[0] ≝
424   ADD: $\llbracket$ acc_a $\rrbracket$ $\rightarrow$ $\llbracket$ register; direct; indirect; data $\rrbracket$ $\rightarrow$ preinstruction A
425 | ADDC: $\llbracket$ acc_a $\rrbracket$ $\rightarrow$ $\llbracket$ register; direct; indirect; data $\rrbracket$ $\rightarrow$ preinstruction A
426...
427\end{lstlisting}
428\end{quote}
429Here $\llbracket - \rrbracket$ is syntax denoting a vector.
430We see that the constructor \texttt{ADD} expects two parameters, the first being the accumulator A (\texttt{acc\_a}), and the second being one of a register, direct, indirect or data addressing mode.
431
432The final, missing component is a pair of type coercions from \texttt{addressing\_mode} to \texttt{subaddressing\_mode} and from \texttt{subaddressing\_mode} to \texttt{Type$\lbrack0\rbrack$}, respectively.
433The previous machinery allows us to state in the type of a function what addressing modes that function expects.
434For instance, consider \texttt{set\_arg\_16}, which expects only a \texttt{DPTR}:
435\begin{quote}
436\begin{lstlisting}
437definition set_arg_16: Status $\rightarrow$ Word $\rightarrow$ $\llbracket$ dptr $\rrbracket$ $\rightarrow$ Status ≝
438  $\lambda$s, v, a.
439   match a return $\lambda$x. bool_to_Prop (is_in ? $\llbracket$ dptr $\rrbracket$ x) $\rightarrow$ ? with
440     [ DPTR $\Rightarrow$ $\lambda$_: True.
441       let 〈 bu, bl 〉 := split $\ldots$ eight eight v in
442       let status := set_8051_sfr s SFR_DPH bu in
443       let status := set_8051_sfr status SFR_DPL bl in
444         status
445     | _ $\Rightarrow$ $\lambda$_: False.
446       match K in False with
447       [
448       ]
449     ] (subaddressing_modein $\ldots$ a).
450\end{lstlisting}
451\end{quote}
452All other cases are discharged by the catch-all at the bottom of the match expression.
453Attempting to match against another addressing mode not indicated in the type (for example, \texttt{REGISTER}) will produce a type-error.
454
455%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
456% SECTION                                                                      %
457%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
458\section{Validation}
459\label{sect.validation}
460
461We spent considerable effort attempting to ensure that our formalisation is correct, that is, what we have formalised really is an accurate model of the MCS-51 microprocessor.
462
463First, we made use of multiple data sheets, each from a different semiconductor manufacturer.
464This helped us spot errors in the specification of the processor's instruction set, and its behaviour.
465
466The O'Caml prototype was especially useful for validation purposes.
467This is because we wrote a module for parsing and loading the Intel HEX file format.
468HEX is a standard format that all compilers targetting the MCS-51, and similar processors, produce.
469It is essentially a snapshot of the processor's code memory in compressed form.
470Using this, we were able to compile C programs with SDCC, an open source compiler, and load the resulting program directly into our emulator's code memory, ready for execution.
471Further, we are able to produce a HEX file from our emulator's code memory, for loading into third party tools.
472After each step of execution, we can print out both the instruction that had been executed, along with its arguments, and a snapshot of the processor's state, including all flags and register contents.
473For example:
474\begin{frametxt}
475\begin{verbatim}
476...
477
47808: mov 81 #07
479
480 Processor status:                               
481
482   ACC : 0 (00000000) B   : 0 (00000000)
483   PSW : 0 (00000000) with flags set as:
484     CY  : false   AC  : false
485     FO  : false   RS1 : false
486     RS0 : false   OV  : false
487     UD  : false   P   : false
488   SP  : 7 (00000111) IP  : 0 (00000000)
489   PC  : 8 (0000000000001000)
490   DPL : 0 (00000000) DPH : 0 (00000000)
491   SCON: 0 (00000000) SBUF: 0 (00000000)
492   TMOD: 0 (00000000) TCON: 0 (00000000)
493   Registers:                                   
494    R0 : 0 (00000000) R1 : 0 (00000000)
495    R2 : 0 (00000000) R3 : 0 (00000000)
496    R4 : 0 (00000000) R5 : 0 (00000000)
497    R6 : 0 (00000000) R7 : 0 (00000000)
498
499...
500\end{verbatim}
501\end{frametxt}
502Here, the traces indicates that the instruction \texttt{mov 81 \#07} has just been executed by the processor, which is now in the state indicated.
503These traces were useful in spotting anything that was `obviously' wrong with the execution of the program.
504
505Further, we made use of an open source emulator for the MCS-51, \texttt{mcu8051ide}.
506Using our execution traces, we were able to step through a compiled program, one instruction at a time, in \texttt{mcu8051ide}, and compare the resulting execution trace with the trace produced by our emulator.
507
508Our Matita formalisation was largely copied from the O'Caml source code, apart from changes related to addressing modes already mentioned.
509However, as the Matita emulator is executable, we could perform further validation by comparing the trace of a program's execution in the Matita emulator with the trace of the same program in the O'Caml emulator.
510
511%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
512% SECTION                                                                      %
513%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
514\section{Related work}
515\label{sect.related.work}
516
517%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
518% SECTION                                                                      %
519%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
520\section{Conclusions}
521\label{sect.conclusions}
522
523\end{document}
Note: See TracBrowser for help on using the repository browser.