source: Deliverables/D4.1/ITP-Paper/itp-2011.tex @ 506

Last change on this file since 506 was 506, checked in by mulligan, 9 years ago

a bit more added

File size: 17.3 KB
Line 
1\documentclass{llncs}
2
3\usepackage{amsfonts}
4\usepackage{amsmath}
5\usepackage{amssymb} 
6\usepackage[english]{babel}
7\usepackage{color}
8\usepackage{graphicx}
9\usepackage[utf8x]{inputenc}
10\usepackage{listings}
11\usepackage{microtype}
12\usepackage{stmaryrd}
13\usepackage{url}
14
15\lstdefinelanguage{matita-ocaml}
16  {keywords={ndefinition,ncoercion,nlemma,ntheorem,nremark,ninductive,nrecord,nqed,nlet,let,in,rec,match,return,with,Type,try},
17   morekeywords={[2]nwhd,nnormalize,nelim,ncases,ndestruct},
18   morekeywords={[3]type,of},
19   mathescape=true,
20  }
21\lstset{language=matita-ocaml,basicstyle=\small\tt,columns=flexible,breaklines=false,
22        keywordstyle=\color{red}\bfseries,
23        keywordstyle=[2]\color{blue},
24        keywordstyle=[3]\color{blue}\bfseries,
25        commentstyle=\color{green},
26        stringstyle=\color{blue},
27        showspaces=false,showstringspaces=false}
28\lstset{extendedchars=false}
29\lstset{inputencoding=utf8x}
30\DeclareUnicodeCharacter{8797}{:=}
31\DeclareUnicodeCharacter{10746}{++}
32\DeclareUnicodeCharacter{9001}{\ensuremath{\langle}}
33\DeclareUnicodeCharacter{9002}{\ensuremath{\rangle}}
34
35\author{Claudio Sacerdoti Coen \and Dominic P. Mulligan}
36\authorrunning{C. Sacerdoti Coen and D. P. Mulligan}
37\title{An executable formalisation of the MCS-51 microprocessor in Matita}
38\titlerunning{An executable formalisation of the MCS-51}
39\institute{Dipartimento di Scienze dell'Informazione, University of Bologna}
40
41\begin{document}
42
43\maketitle
44
45\begin{abstract}
46We summarise our formalisation of an emulator for the MCS-51 microprocessor in the Matita proof assistant.
47The MCS-51 is a widely used 8-bit microprocessor, especially popular in embedded devices.
48
49We proceeded in two stages, first implementing in O'Caml a prototype emulator, where bugs could be `ironed out' quickly.
50We then ported our O'Caml emulator to Matita's internal language.
51Though mostly straight-forward, this porting presented multiple problems.
52Of particular interest is how we handle the extreme non-orthoganality of the MSC-51's instruction set.
53In O'Caml, this was handled through heavy use of polymorphic variants.
54In Matita, we achieve the same effect through a non-standard use of dependent types.
55
56Both the O'Caml and Matita emulators are `executable'.
57Assembly programs may be animated within Matita, producing a trace of instructions executed.
58
59Our formalisation is a major component of the ongoing EU-funded CerCo project.
60\end{abstract}
61
62%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
63% SECTION                                                                      %
64%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
65\section{Introduction}
66\label{sect.introduction}
67
68Compiler verification, as of late, is as of late a `hot topic'.
69This rapidly growing field is motivated by one simple question: `to what extent can you trust your compiler?'
70Existing verification efforts have broadly focussed on \emph{semantic correctness}, that is, creating a compiler that is guaranteed to preserve the semantics of a program during the compilation process.
71However, there is another important facet of correctness that has not garnered much attention, that is, correctness with respect to some intensional properties of the program to be compiled.
72
73\subsection{The 8051/8052}
74\label{subsect.8051-8052}
75
76The MCS-51 is an eight bit microprocessor introduced by Intel in the late 1970s.
77Commonly called the 8051, in the three decades since its introduction the processor has become a highly popular target for embedded systems engineers.
78Further, the processor and its immediate successor, the 8052, is still manufactured by a host of semiconductor suppliers---many of them European---including Atmel, Siemens Semiconductor, NXP (formerly Phillips Semiconductor), Texas Instruments, and Maxim (formerly Dallas Semiconductor).
79
80The 8051 is a well documented processor, and has the additional support of numerous open source and commercial tools, such as compilers for high-level languages and emulators.
81For instance, the open source Small Device C Compiler (SDCC) recognises a dialect of C, and other compilers targeting the 8051 for BASIC, Forth and Modula-2 are also extant.
82An open source emulator for the processor, MCU8051 IDE, is also available.
83
84\begin{figure}[t]
85\begin{center}
86\includegraphics[scale=0.5]{memorylayout.png}
87\end{center}
88\caption{High level overview of the 8051 memory layout}
89\label{fig.memory.layout}
90\end{figure}
91
92The 8051 has a relatively straightforward architecture, unencumbered by advanced features of modern processors, making it an ideal target for formalisation.
93A high-level overview of the processor's memory layout is provided in Figure~\ref{fig.memory.layout}.
94
95Processor RAM is divided into numerous segments, with the most prominent division being between internal and (optional) external memory.
96Internal memory, commonly provided on the die itself with fast access, is further divided into 128 bytes of internal RAM and numerous Special Function Registers (SFRs) which control the operation of the processor.
97Internal RAM (IRAM) is further divided into a eight general purpose bit-addressable registers (R0--R7).
98These sit in the first eight bytes of IRAM, though can be programmatically `shifted up' as needed.
99Bit memory, followed by a small amount of stack space resides in the memory space immediately after the register banks.
100What remains of the IRAM may be treated as general purpose memory.
101A schematic view of IRAM layout is provided in Figure~\ref{fig.iram.layout}.
102
103External RAM (XRAM), limited to 64 kilobytes, is optional, and may be provided on or off chip, depending on the manufacturer.
104XRAM is accessed using a dedicated instruction.
105External code memory (XCODE) is often stored in the form of an EPROM, and limited to 64 kilobytes in size.
106However, depending on the particular manufacturer and processor model, a dedicated on-die read-only memory area for program code (ICODE) may also be supplied.
107
108Memory may be addressed in numerous ways: immediate, direct, indirect, external direct and code indirect.
109As the latter two addressing modes hint, there are some restrictions enforced by the 8051 and its derivatives on which addressing modes may be used with specific types of memory.
110For instance, the 128 bytes of extra internal RAM that the 8052 features cannot be addressed using indirect addressing; rather, external (in)direct addressing must be used.
111
112The 8051 series possesses an eight bit Arithmetic and Logic Unit (ALU), with a wide variety of instructions for performing arithmetic and logical operations on bits and integers.
113Further, the processor possesses two eight bit general purpose accumulators, A and B.
114
115Communication with the device is facilitated by an onboard UART serial port, and associated serial controller, which can operate in numerous modes.
116Serial baud rate is determined by one of two sixteen bit timers included with the 8051, which can be set to multiple modes of operation.
117(The 8052 provides an additional sixteen bit timer.)
118As an additional method of communication, the 8051 also provides a four byte bit-addressable input-output port.
119
120The programmer may take advantage of the interrupt mechanism that the processor provides.
121This is especially useful when dealing with input or output involving the serial device, as an interrupt can be set when a whole character is sent or received via the serial port.
122
123Interrupts immediately halt the flow of execution of the processor, and cause the program counter to jump to a fixed address, where the requisite interrupt handler is stored.
124However, interrupts may be set to one of two priorities: low and high.
125The interrupt handler of an interrupt with high priority is executed ahead of the interrupt handler of an interrupt of lower priority, interrupting a currently executing handler of lower priority, if necessary.
126
127The 8051 has interrupts disabled by default.
128The programmer is free to handle serial input and output manually, by poking serial flags in the SFRs.
129Similarly, `exceptional circumstances' that would otherwise trigger an interrupt on more modern processors, for example, division by zero, are also signalled by setting flags.
130
131\begin{figure}[t]
132\begin{center}
133\includegraphics[scale=0.5]{iramlayout.png}
134\end{center}
135\caption{Schematic view of 8051 IRAM layout}
136\label{fig.iram.layout}
137\end{figure}
138
139%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
140% SECTION                                                                      %
141%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
142\subsection{The CerCo project}
143\label{subsect.cerco.project}
144
145The CerCo project (`Certified Complexity') is a current European FeT-Open project incorporating three sites---the Universities of Bologna, Edinburgh and Paris Diderot 7---throughout the European Union.
146The ultimate aim of the project is to produce a certified compiler for a large subset of the C programming language targetting a microprocessor used in embedded systems.
147In this respect, the CerCo project bears a deal of similarity with CompCert, another European funded project.
148However, we see a number of important differences between the aims of the two projects.
149\begin{enumerate}
150\item
151The CerCo project aims to allow reasoning on aspects of the intensional properties of C programs.
152That is,
153\item
154The CompCert project compiled a subset of C down to the assembly level.
155A semantics for assembly language was provided.
156\end{enumerate}
157
158%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
159% SECTION                                                                      %
160%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
161\subsection{Overview of paper}
162\label{subsect.overview.paper}
163
164In Section~\ref{sect.development.strategy} we provide a brief overview of how we designed and implemented the formalised microprocessor emulator.
165In Section~\ref{sect.design.issues.formalisation} we describe how we made use of dependent types to handle some of the idiosyncracies of the microprocessor.
166In Section~\ref{sect.related.work} we describe the relation our work has to
167
168%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
169% SECTION                                                                      %
170%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
171\section{From O'Caml prototype to Matita formalisation}
172\label{sect.from.o'caml.prototype.matita.formalisation}
173
174Our implementation progressed in two stages:
175
176\paragraph{O'Caml prototype}
177We began with an emulator written in O'Caml.
178We used this to `iron out' any bugs in our design and implementation within O'Caml's more permissive type system.
179O'Caml's ability to perform file input-output also eased debugging and validation.
180Once we were happy with the performance and design of the O'Caml emulator, we moved to the Matita formalisation.
181
182\paragraph{Matita formalisation}
183Matita's syntax is lexically similar to O'Caml's.
184This eased the translation, as large swathes of code were merely copy-pasted with minor modifications.
185However, several major issues had to be addresses when moving from O'Caml to Matita.
186These are now discussed.
187
188%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
189% SECTION                                                                      %
190%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
191\section{Design issues in the formalisation}
192\label{sect.design.issues.formalisation}
193
194As mentioned, the emulator formalisation is part of a wider project, CerCo.
195In parallel with the implementation of the emulator was the construction of a prototype compiler targetting the MCS-51.
196
197%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
198% SECTION                                                                      %
199%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
200\subsection{Labels and pseudoinstructions}
201\label{subsect.labels.pseudoinstructions}
202
203%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
204% SECTION                                                                      %
205%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
206\subsection{Putting dependent types to work}
207\label{subsect.putting.dependent.types.to.work}
208
209We provide an inductive data type representing all possible addressing modes of 8051 assembly.
210This is the type that functions will pattern match against.
211
212\begin{quote}
213\begin{lstlisting}
214ninductive addressing_mode: Type[0] ≝
215  DIRECT: Byte $\rightarrow$ addressing_mode
216| INDIRECT: Bit $\rightarrow$ addressing_mode
217...
218\end{lstlisting}
219\end{quote}
220However, we also wish to express in the type of our functions the \emph{impossibility} of pattern matching against certain constructors.
221In order to do this, we introduce an inductive type of addressing mode `tags'.
222The constructors of \texttt{addressing\_mode\_tag} are in one-one correspondence with the constructors of \texttt{addressing\_mode}:
223\begin{quote}
224\begin{lstlisting}
225ninductive addressing_mode_tag : Type[0] ≝
226  direct: addressing_mode_tag
227| indirect: addressing_mode_tag
228...
229\end{lstlisting}
230\end{quote}
231We then provide a function that checks whether an \texttt{addressing\_mode} is `morally' an \texttt{addressing\_mode\_tag}, as follows:
232\begin{quote}
233\begin{lstlisting}
234nlet rec is_a (d:addressing_mode_tag) (A:addressing_mode) on d ≝
235  match d with
236   [ direct $\Rightarrow$ match A with [ DIRECT _ $\Rightarrow$ true | _ $\Rightarrow$ false ]
237   | indirect $\Rightarrow$ match A with [ INDIRECT _ $\Rightarrow$ true | _ $\Rightarrow$ false ]
238...
239\end{lstlisting}
240\end{quote}
241We also extend this check to vectors of \texttt{addressing\_mode\_tag}'s in the obvious manner:
242\begin{quote}
243\begin{lstlisting}
244nlet rec is_in (n: Nat) (l: Vector addressing_mode_tag n) (A:addressing_mode) on l ≝
245 match l return $\lambda$m.$\lambda$_ :Vector addressing_mode_tag m.Bool with
246  [ VEmpty $\Rightarrow$ false
247  | VCons m he (tl: Vector addressing_mode_tag m) $\Rightarrow$
248     is_a he A $\vee$ is_in ? tl A ].
249\end{lstlisting}
250\end{quote}
251Here \texttt{VEmpty} and \texttt{VCons} are the two constructors of the \texttt{Vector} data type, and $\mathtt{\vee}$ is inclusive disjunction on Booleans.
252\begin{quote}
253\begin{lstlisting}
254nrecord subaddressing_mode (n: Nat) (l: Vector addressing_mode_tag (S n)) : Type[0] ≝
255{
256  subaddressing_modeel :> addressing_mode;
257  subaddressing_modein: bool_to_Prop (is_in ? l subaddressing_modeel)
258}.
259\end{lstlisting}
260\end{quote}
261We can now provide an inductive type of preinstructions with precise typings:
262\begin{quote}
263\begin{lstlisting}
264ninductive preinstruction (A: Type[0]): Type[0] ≝
265   ADD: $\llbracket$ acc_a $\rrbracket$ $\rightarrow$ $\llbracket$ register; direct; indirect; data $\rrbracket$ $\rightarrow$ preinstruction A
266 | ADDC: $\llbracket$ acc_a $\rrbracket$ $\rightarrow$ $\llbracket$ register; direct; indirect; data $\rrbracket$ $\rightarrow$ preinstruction A
267...
268\end{lstlisting}
269\end{quote}
270Here $\llbracket - \rrbracket$ is syntax denoting a vector.
271We see that the constructor \texttt{ADD} expects two parameters, the first being the accumulator A (\texttt{acc\_a}), and the second being one of a register, direct, indirect or data addressing mode.
272
273The final, missing component is a pair of type coercions from \texttt{addressing\_mode} to \texttt{subaddressing\_mode} and from \texttt{subaddressing\_mode} to \texttt{Type$\lbrack0\rbrack$}, respectively.
274The previous machinery allows us to state in the type of a function what addressing modes that function expects.
275For instance, consider \texttt{set\_arg\_16}, which expects only a \texttt{DPTR}:
276\begin{quote}
277\begin{lstlisting}
278ndefinition set_arg_16: Status $\rightarrow$ Word $\rightarrow$ $\llbracket$ dptr $\rrbracket$ $\rightarrow$ Status ≝
279  $\lambda$s, v, a.
280   match a return $\lambda$x. bool_to_Prop (is_in ? $\llbracket$ dptr $\rrbracket$ x) $\rightarrow$ ? with
281     [ DPTR $\Rightarrow$ $\lambda$_: True.
282       let 〈 bu, bl 〉 := split $\ldots$ eight eight v in
283       let status := set_8051_sfr s SFR_DPH bu in
284       let status := set_8051_sfr status SFR_DPL bl in
285         status
286     | _ $\Rightarrow$ $\lambda$_: False.
287       match K in False with
288       [
289       ]
290     ] (subaddressing_modein $\ldots$ a).
291\end{lstlisting}
292\end{quote}
293All other cases are discharged by the catch-all at the bottom of the match expression.
294Attempting to match against another addressing mode not indicated in the type (for example, \texttt{REGISTER}) will produce a type-error.
295
296%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
297% SECTION                                                                      %
298%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
299\section{Validation}
300\label{sect.validation}
301
302%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
303% SECTION                                                                      %
304%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
305\section{Related work}
306\label{sect.related.work}
307
308%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
309% SECTION                                                                      %
310%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
311\section{Conclusions}
312\label{sect.conclusions}
313
314\end{document}
Note: See TracBrowser for help on using the repository browser.