source: Deliverables/D4.1/ITP-Paper/itp-2011.tex @ 508

Last change on this file since 508 was 508, checked in by mulligan, 8 years ago

Changes

File size: 19.7 KB
Line 
1\documentclass{llncs}
2
3\usepackage{amsfonts}
4\usepackage{amsmath}
5\usepackage{amssymb} 
6\usepackage[english]{babel}
7\usepackage{color}
8\usepackage{graphicx}
9\usepackage[utf8x]{inputenc}
10\usepackage{listings}
11\usepackage{microtype}
12\usepackage{stmaryrd}
13\usepackage{url}
14
15\lstdefinelanguage{matita-ocaml}
16  {keywords={ndefinition,ncoercion,nlemma,ntheorem,nremark,ninductive,nrecord,nqed,nlet,let,in,rec,match,return,with,Type,try},
17   morekeywords={[2]nwhd,nnormalize,nelim,ncases,ndestruct},
18   morekeywords={[3]type,of},
19   mathescape=true,
20  }
21\lstset{language=matita-ocaml,basicstyle=\small\tt,columns=flexible,breaklines=false,
22        keywordstyle=\color{red}\bfseries,
23        keywordstyle=[2]\color{blue},
24        keywordstyle=[3]\color{blue}\bfseries,
25        commentstyle=\color{green},
26        stringstyle=\color{blue},
27        showspaces=false,showstringspaces=false}
28\lstset{extendedchars=false}
29\lstset{inputencoding=utf8x}
30\DeclareUnicodeCharacter{8797}{:=}
31\DeclareUnicodeCharacter{10746}{++}
32\DeclareUnicodeCharacter{9001}{\ensuremath{\langle}}
33\DeclareUnicodeCharacter{9002}{\ensuremath{\rangle}}
34
35\author{Claudio Sacerdoti Coen \and Dominic P. Mulligan}
36\authorrunning{C. Sacerdoti Coen and D. P. Mulligan}
37\title{An executable formalisation of the MCS-51 microprocessor in Matita}
38\titlerunning{An executable formalisation of the MCS-51}
39\institute{Dipartimento di Scienze dell'Informazione, University of Bologna}
40
41\begin{document}
42
43\maketitle
44
45\begin{abstract}
46We summarise our formalisation of an emulator for the MCS-51 microprocessor in the Matita proof assistant.
47The MCS-51 is a widely used 8-bit microprocessor, especially popular in embedded devices.
48
49We proceeded in two stages, first implementing in O'Caml a prototype emulator, where bugs could be `ironed out' quickly.
50We then ported our O'Caml emulator to Matita's internal language.
51Though mostly straight-forward, this porting presented multiple problems.
52Of particular interest is how we handle the extreme non-orthoganality of the MSC-51's instruction set.
53In O'Caml, this was handled through heavy use of polymorphic variants.
54In Matita, we achieve the same effect through a non-standard use of dependent types.
55
56Both the O'Caml and Matita emulators are `executable'.
57Assembly programs may be animated within Matita, producing a trace of instructions executed.
58
59Our formalisation is a major component of the ongoing EU-funded CerCo project.
60\end{abstract}
61
62%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
63% SECTION                                                                      %
64%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
65\section{Introduction}
66\label{sect.introduction}
67
68Compiler verification, as of late, is as of late a `hot topic'.
69This rapidly growing field is motivated by one simple question: `to what extent can you trust your compiler?'
70Existing verification efforts have broadly focussed on \emph{semantic correctness}, that is, creating a compiler that is guaranteed to preserve the semantics of a program during the compilation process.
71However, there is another important facet of correctness that has not garnered much attention, that is, correctness with respect to some intensional properties of the program to be compiled.
72
73\subsection{The 8051/8052}
74\label{subsect.8051-8052}
75
76The MCS-51 is an eight bit microprocessor introduced by Intel in the late 1970s.
77Commonly called the 8051, in the three decades since its introduction the processor has become a highly popular target for embedded systems engineers.
78Further, the processor and its immediate successor, the 8052, is still manufactured by a host of semiconductor suppliers---many of them European---including Atmel, Siemens Semiconductor, NXP (formerly Phillips Semiconductor), Texas Instruments, and Maxim (formerly Dallas Semiconductor).
79
80The 8051 is a well documented processor, and has the additional support of numerous open source and commercial tools, such as compilers for high-level languages and emulators.
81For instance, the open source Small Device C Compiler (SDCC) recognises a dialect of C, and other compilers targeting the 8051 for BASIC, Forth and Modula-2 are also extant.
82An open source emulator for the processor, MCU8051 IDE, is also available.
83
84\begin{figure}[t]
85\begin{center}
86\includegraphics[scale=0.5]{memorylayout.png}
87\end{center}
88\caption{High level overview of the 8051 memory layout}
89\label{fig.memory.layout}
90\end{figure}
91
92The 8051 has a relatively straightforward architecture, unencumbered by advanced features of modern processors, making it an ideal target for formalisation.
93A high-level overview of the processor's memory layout is provided in Figure~\ref{fig.memory.layout}.
94
95Processor RAM is divided into numerous segments, with the most prominent division being between internal and (optional) external memory.
96Internal memory, commonly provided on the die itself with fast access, is further divided into 128 bytes of internal RAM and numerous Special Function Registers (SFRs) which control the operation of the processor.
97Internal RAM (IRAM) is further divided into a eight general purpose bit-addressable registers (R0--R7).
98These sit in the first eight bytes of IRAM, though can be programmatically `shifted up' as needed.
99Bit memory, followed by a small amount of stack space resides in the memory space immediately after the register banks.
100What remains of the IRAM may be treated as general purpose memory.
101A schematic view of IRAM layout is provided in Figure~\ref{fig.iram.layout}.
102
103External RAM (XRAM), limited to 64 kilobytes, is optional, and may be provided on or off chip, depending on the manufacturer.
104XRAM is accessed using a dedicated instruction.
105External code memory (XCODE) is often stored in the form of an EPROM, and limited to 64 kilobytes in size.
106However, depending on the particular manufacturer and processor model, a dedicated on-die read-only memory area for program code (ICODE) may also be supplied.
107
108Memory may be addressed in numerous ways: immediate, direct, indirect, external direct and code indirect.
109As the latter two addressing modes hint, there are some restrictions enforced by the 8051 and its derivatives on which addressing modes may be used with specific types of memory.
110For instance, the 128 bytes of extra internal RAM that the 8052 features cannot be addressed using indirect addressing; rather, external (in)direct addressing must be used.
111
112The 8051 series possesses an eight bit Arithmetic and Logic Unit (ALU), with a wide variety of instructions for performing arithmetic and logical operations on bits and integers.
113Further, the processor possesses two eight bit general purpose accumulators, A and B.
114
115Communication with the device is facilitated by an onboard UART serial port, and associated serial controller, which can operate in numerous modes.
116Serial baud rate is determined by one of two sixteen bit timers included with the 8051, which can be set to multiple modes of operation.
117(The 8052 provides an additional sixteen bit timer.)
118As an additional method of communication, the 8051 also provides a four byte bit-addressable input-output port.
119
120The programmer may take advantage of the interrupt mechanism that the processor provides.
121This is especially useful when dealing with input or output involving the serial device, as an interrupt can be set when a whole character is sent or received via the serial port.
122
123Interrupts immediately halt the flow of execution of the processor, and cause the program counter to jump to a fixed address, where the requisite interrupt handler is stored.
124However, interrupts may be set to one of two priorities: low and high.
125The interrupt handler of an interrupt with high priority is executed ahead of the interrupt handler of an interrupt of lower priority, interrupting a currently executing handler of lower priority, if necessary.
126
127The 8051 has interrupts disabled by default.
128The programmer is free to handle serial input and output manually, by poking serial flags in the SFRs.
129Similarly, `exceptional circumstances' that would otherwise trigger an interrupt on more modern processors, for example, division by zero, are also signalled by setting flags.
130
131\begin{figure}[t]
132\begin{center}
133\includegraphics[scale=0.5]{iramlayout.png}
134\end{center}
135\caption{Schematic view of 8051 IRAM layout}
136\label{fig.iram.layout}
137\end{figure}
138
139%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
140% SECTION                                                                      %
141%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
142\subsection{The CerCo project}
143\label{subsect.cerco.project}
144
145The CerCo project (`Certified Complexity') is a current European FeT-Open project incorporating three sites---the Universities of Bologna, Edinburgh and Paris Diderot 7---throughout the European Union.
146The ultimate aim of the project is to produce a certified compiler for a large subset of the C programming language targetting a microprocessor used in embedded systems.
147In this respect, the CerCo project bears a deal of similarity with CompCert, another European funded project.
148However, we see a number of important differences between the aims of the two projects.
149\begin{enumerate}
150\item
151The CerCo project aims to allow reasoning on aspects of the intensional properties of C programs.
152That is,
153\item
154The CompCert project compiled a subset of C down to the assembly level.
155A semantics for assembly language was provided.
156\end{enumerate}
157
158%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
159% SECTION                                                                      %
160%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
161\subsection{Overview of paper}
162\label{subsect.overview.paper}
163
164In Section~\ref{sect.development.strategy} we provide a brief overview of how we designed and implemented the formalised microprocessor emulator.
165In Section~\ref{sect.design.issues.formalisation} we describe how we made use of dependent types to handle some of the idiosyncracies of the microprocessor.
166In Section~\ref{sect.related.work} we describe the relation our work has to
167
168%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
169% SECTION                                                                      %
170%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
171\section{From O'Caml prototype to Matita formalisation}
172\label{sect.from.o'caml.prototype.matita.formalisation}
173
174Our implementation progressed in two stages:
175
176\paragraph{O'Caml prototype}
177We began with an emulator written in O'Caml.
178We used this to `iron out' any bugs in our design and implementation within O'Caml's more permissive type system.
179O'Caml's ability to perform file input-output also eased debugging and validation.
180Once we were happy with the performance and design of the O'Caml emulator, we moved to the Matita formalisation.
181
182\paragraph{Matita formalisation}
183Matita's syntax is lexically similar to O'Caml's.
184This eased the translation, as large swathes of code were merely copy-pasted with minor modifications.
185However, several major issues had to be addresses when moving from O'Caml to Matita.
186These are now discussed.
187
188%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
189% SECTION                                                                      %
190%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
191\section{Design issues in the formalisation}
192\label{sect.design.issues.formalisation} 
193
194%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
195% SECTION                                                                      %
196%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
197\subsection{Labels and pseudoinstructions}
198\label{subsect.labels.pseudoinstructions}
199
200As part of the CerCo project, a prototype compiler was being developed in parallel with the emulator.
201Easing the design of the compiler was a key objective in implementing the emulator.
202For this reason, we introduced notion of \emph{pseudoinstruction} and \emph{label}.
203
204The MCS-51's instruction set has numerous instructions for unconditional and conditional jumps to memory locations, calling procedures and moving data between memory spaces.
205For instance, the instructions \texttt{AJMP}, \texttt{JMP} and \texttt{LJMP} all perform unconditional jumps.
206However, these instructions differ in how large the maximum size of the offset of the jump to be performed can be.
207Further, all jump instructions require a concrete memory address, to jump to, to be specified.
208Requiring the compiler to compute these offsets, and select appropriate jump instructions, was seen as needleslly burdensome.
209
210Instead, we introduced generic \texttt{Jump}, \texttt{Call} and \texttt{Move} instructions.
211These are expanded into MCS-51 assembly instructions with an assembly phase, prior to program execution.
212Further, we introduced a notion of label (represented by strings), and introduced pseudoinstructions that allow conditional jumps to jump to labels.
213These are also removed during the assembly phase, and replaced by concrete memory addresses and offsets.
214
215%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
216% SECTION                                                                      %
217%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
218\subsection{Putting dependent types to work}
219\label{subsect.putting.dependent.types.to.work}
220
221A peculiarity of the MCS-51 is the non-orthogonality of its instruction set.
222For instance, the \texttt{MOV} instruction, can be invoked using one of sixteen combinations of addressing modes.
223
224Such non-orthogonality in the instruction set was handled with the use of polymorphic variants in the O'Caml emulator.
225For instance, we introduced types corresponding to each addressing mode:
226\begin{quote}
227\begin{lstlisting}
228type direct = [ `DIRECT of byte ]
229type indirect = [ `INDIRECT of bit ]
230...
231\end{lstlisting}
232\end{quote}
233Which were then used in our inductive datatype for assembly instructions, as follows:
234\begin{quote}
235\begin{lstlisting}
236type 'addr preinstruction =
237 [ `ADD of acc * [ reg | direct | indirect | data ]
238...
239 | `MOV of
240    (acc * [ reg | direct | indirect | data ],
241     [ reg | indirect ] * [ acc | direct | data ],
242     direct * [ acc | reg | direct | indirect | data ],
243     dptr * data16,
244     carry * bit,
245     bit * carry
246     ) union6
247...
248\end{lstlisting}
249\end{quote}
250Here, \texttt{union6} is a disjoint union type, defined as follows:
251\begin{quote}
252\begin{lstlisting}
253type ('a,'b,'c,'d,'e,'f) union6 = [ `U1 of 'a | ... | `U6 of 'f ]
254\end{lstlisting}
255\end{quote}
256For our purposes, the types \texttt{union2} and \texttt{union3} were also necessary.
257
258This polymorphic variant machinery worked well: it allowed us to pattern match
259
260We provide an inductive data type representing all possible addressing modes of 8051 assembly.
261This is the type that functions will pattern match against.
262
263\begin{quote}
264\begin{lstlisting}
265ninductive addressing_mode: Type[0] ≝
266  DIRECT: Byte $\rightarrow$ addressing_mode
267| INDIRECT: Bit $\rightarrow$ addressing_mode
268...
269\end{lstlisting}
270\end{quote}
271However, we also wish to express in the type of our functions the \emph{impossibility} of pattern matching against certain constructors.
272In order to do this, we introduce an inductive type of addressing mode `tags'.
273The constructors of \texttt{addressing\_mode\_tag} are in one-one correspondence with the constructors of \texttt{addressing\_mode}:
274\begin{quote}
275\begin{lstlisting}
276ninductive addressing_mode_tag : Type[0] ≝
277  direct: addressing_mode_tag
278| indirect: addressing_mode_tag
279...
280\end{lstlisting}
281\end{quote}
282We then provide a function that checks whether an \texttt{addressing\_mode} is `morally' an \texttt{addressing\_mode\_tag}, as follows:
283\begin{quote}
284\begin{lstlisting}
285nlet rec is_a (d:addressing_mode_tag) (A:addressing_mode) on d ≝
286  match d with
287   [ direct $\Rightarrow$ match A with [ DIRECT _ $\Rightarrow$ true | _ $\Rightarrow$ false ]
288   | indirect $\Rightarrow$ match A with [ INDIRECT _ $\Rightarrow$ true | _ $\Rightarrow$ false ]
289...
290\end{lstlisting}
291\end{quote}
292We also extend this check to vectors of \texttt{addressing\_mode\_tag}'s in the obvious manner:
293\begin{quote}
294\begin{lstlisting}
295nlet rec is_in (n: Nat) (l: Vector addressing_mode_tag n) (A:addressing_mode) on l ≝
296 match l return $\lambda$m.$\lambda$_ :Vector addressing_mode_tag m.Bool with
297  [ VEmpty $\Rightarrow$ false
298  | VCons m he (tl: Vector addressing_mode_tag m) $\Rightarrow$
299     is_a he A $\vee$ is_in ? tl A ].
300\end{lstlisting}
301\end{quote}
302Here \texttt{VEmpty} and \texttt{VCons} are the two constructors of the \texttt{Vector} data type, and $\mathtt{\vee}$ is inclusive disjunction on Booleans.
303\begin{quote}
304\begin{lstlisting}
305nrecord subaddressing_mode (n: Nat) (l: Vector addressing_mode_tag (S n)) : Type[0] ≝
306{
307  subaddressing_modeel :> addressing_mode;
308  subaddressing_modein: bool_to_Prop (is_in ? l subaddressing_modeel)
309}.
310\end{lstlisting}
311\end{quote}
312We can now provide an inductive type of preinstructions with precise typings:
313\begin{quote}
314\begin{lstlisting}
315ninductive preinstruction (A: Type[0]): Type[0] ≝
316   ADD: $\llbracket$ acc_a $\rrbracket$ $\rightarrow$ $\llbracket$ register; direct; indirect; data $\rrbracket$ $\rightarrow$ preinstruction A
317 | ADDC: $\llbracket$ acc_a $\rrbracket$ $\rightarrow$ $\llbracket$ register; direct; indirect; data $\rrbracket$ $\rightarrow$ preinstruction A
318...
319\end{lstlisting}
320\end{quote}
321Here $\llbracket - \rrbracket$ is syntax denoting a vector.
322We see that the constructor \texttt{ADD} expects two parameters, the first being the accumulator A (\texttt{acc\_a}), and the second being one of a register, direct, indirect or data addressing mode.
323
324The final, missing component is a pair of type coercions from \texttt{addressing\_mode} to \texttt{subaddressing\_mode} and from \texttt{subaddressing\_mode} to \texttt{Type$\lbrack0\rbrack$}, respectively.
325The previous machinery allows us to state in the type of a function what addressing modes that function expects.
326For instance, consider \texttt{set\_arg\_16}, which expects only a \texttt{DPTR}:
327\begin{quote}
328\begin{lstlisting}
329ndefinition set_arg_16: Status $\rightarrow$ Word $\rightarrow$ $\llbracket$ dptr $\rrbracket$ $\rightarrow$ Status ≝
330  $\lambda$s, v, a.
331   match a return $\lambda$x. bool_to_Prop (is_in ? $\llbracket$ dptr $\rrbracket$ x) $\rightarrow$ ? with
332     [ DPTR $\Rightarrow$ $\lambda$_: True.
333       let 〈 bu, bl 〉 := split $\ldots$ eight eight v in
334       let status := set_8051_sfr s SFR_DPH bu in
335       let status := set_8051_sfr status SFR_DPL bl in
336         status
337     | _ $\Rightarrow$ $\lambda$_: False.
338       match K in False with
339       [
340       ]
341     ] (subaddressing_modein $\ldots$ a).
342\end{lstlisting}
343\end{quote}
344All other cases are discharged by the catch-all at the bottom of the match expression.
345Attempting to match against another addressing mode not indicated in the type (for example, \texttt{REGISTER}) will produce a type-error.
346
347%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
348% SECTION                                                                      %
349%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
350\section{Validation}
351\label{sect.validation}
352
353%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
354% SECTION                                                                      %
355%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
356\section{Related work}
357\label{sect.related.work}
358
359%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
360% SECTION                                                                      %
361%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
362\section{Conclusions}
363\label{sect.conclusions}
364
365\end{document}
Note: See TracBrowser for help on using the repository browser.