source: Deliverables/D4.1/ITP-Paper/itp-2011.tex @ 501

Last change on this file since 501 was 501, checked in by mulligan, 9 years ago

more changes

File size: 16.3 KB
Line 
1\documentclass{llncs}
2
3\usepackage{amsfonts}
4\usepackage{amsmath}
5\usepackage{amssymb} 
6\usepackage[english]{babel}
7\usepackage{color}
8\usepackage{graphicx}
9\usepackage[utf8x]{inputenc}
10\usepackage{listings}
11\usepackage{stmaryrd}
12\usepackage{url}
13
14\lstdefinelanguage{matita-ocaml}
15  {keywords={ndefinition,ncoercion,nlemma,ntheorem,nremark,ninductive,nrecord,nqed,nlet,let,in,rec,match,return,with,Type,try},
16   morekeywords={[2]nwhd,nnormalize,nelim,ncases,ndestruct},
17   morekeywords={[3]type,of},
18   mathescape=true,
19  }
20\lstset{language=matita-ocaml,basicstyle=\small\tt,columns=flexible,breaklines=false,
21        keywordstyle=\color{red}\bfseries,
22        keywordstyle=[2]\color{blue},
23        keywordstyle=[3]\color{blue}\bfseries,
24        commentstyle=\color{green},
25        stringstyle=\color{blue},
26        showspaces=false,showstringspaces=false}
27\lstset{extendedchars=false}
28\lstset{inputencoding=utf8x}
29\DeclareUnicodeCharacter{8797}{:=}
30\DeclareUnicodeCharacter{10746}{++}
31\DeclareUnicodeCharacter{9001}{\ensuremath{\langle}}
32\DeclareUnicodeCharacter{9002}{\ensuremath{\rangle}}
33
34\author{Claudio Sacerdoti Coen \and Dominic P. Mulligan}
35\authorrunning{C. Sacerdoti Coen and D. P. Mulligan}
36\title{An executable formalisation of the MCS-51 microprocessor in Matita}
37\titlerunning{An executable formalisation of the MCS-51}
38\institute{Dipartimento di Scienze dell'Informazione, University of Bologna}
39
40\begin{document}
41
42\maketitle
43
44\begin{abstract}
45We summarise our formalisation of an emulator for the MCS-51 microprocessor in the Matita proof assistant.
46The MCS-51 is a widely used 8-bit microprocessor, especially popular in embedded devices.
47
48We proceeded in two stages, first implementing in O'Caml a prototype emulator, where bugs could be `ironed out' quickly.
49We then ported our O'Caml emulator to Matita's internal language.
50Though mostly straight-forward, this porting presented multiple problems.
51Of particular interest is how we handle the extreme non-orthoganality of the MSC-51's instruction set.
52In O'Caml, this was handled through heavy use of polymorphic variants.
53In Matita, we achieve the same effect through a non-standard use of dependent types.
54
55Both the O'Caml and Matita emulators are `executable'.
56Assembly programs may be animated within Matita, producing a trace of instructions executed.
57
58Our formalisation is a major component of the ongoing EU-funded CerCo project.
59\end{abstract}
60
61%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
62% SECTION                                                                      %
63%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
64\section{Introduction}
65\label{sect.introduction}
66
67Compiler verification, as of late, is a `hot topic' in computer science research.
68This rapidly growing field is motivated by one simple question: `to what extent can you trust your compiler?'
69
70\subsection{The 8051/8052}
71\label{subsect.8051-8052}
72
73The MCS-51 is an eight bit microprocessor introduced by Intel in the late 1970s.
74Commonly called the 8051, in the three decades since its introduction the processor has become a highly popular target for embedded systems engineers.
75Further, the processor and its immediate successor, the 8052, is still manufactured by a host of semiconductor suppliers---many of them European---including Atmel, Siemens Semiconductor, NXP (formerly Phillips Semiconductor), Texas Instruments, and Maxim (formerly Dallas Semiconductor).
76
77The 8051 is a well documented processor, and has the additional support of numerous open source and commercial tools, such as compilers for high-level languages and emulators.
78For instance, the open source Small Device C Compiler (SDCC) recognises a dialect of C, and other compilers targeting the 8051 for BASIC, Forth and Modula-2 are also extant.
79An open source emulator for the processor, MCU8051 IDE, is also available.
80
81\begin{figure}[t]
82\begin{center}
83\includegraphics[scale=0.5]{memorylayout.png}
84\end{center}
85\caption{High level overview of the 8051 memory layout}
86\label{fig.memory.layout}
87\end{figure}
88
89The 8051 has a relatively straightforward architecture, unencumbered by advanced features of modern processors, making it an ideal target for formalisation.
90A high-level overview of the processor's memory layout is provided in Figure~\ref{fig.memory.layout}.
91
92Processor RAM is divided into numerous segments, with the most prominent division being between internal and (optional) external memory.
93Internal memory, commonly provided on the die itself with fast access, is further divided into 128 bytes of internal RAM and numerous Special Function Registers (SFRs) which control the operation of the processor.
94Internal RAM (IRAM) is further divided into a eight general purpose bit-addressable registers (R0--R7).
95These sit in the first eight bytes of IRAM, though can be programmatically `shifted up' as needed.
96Bit memory, followed by a small amount of stack space resides in the memory space immediately after the register banks.
97What remains of the IRAM may be treated as general purpose memory.
98A schematic view of IRAM layout is provided in Figure~\ref{fig.iram.layout}.
99
100External RAM (XRAM), limited to 64 kilobytes, is optional, and may be provided on or off chip, depending on the manufacturer.
101XRAM is accessed using a dedicated instruction.
102External code memory (XCODE) is often stored in the form of an EPROM, and limited to 64 kilobytes in size.
103However, depending on the particular manufacturer and processor model, a dedicated on-die read-only memory area for program code (ICODE) may also be supplied.
104
105Memory may be addressed in numerous ways: immediate, direct, indirect, external direct and code indirect.
106As the latter two addressing modes hint, there are some restrictions enforced by the 8051 and its derivatives on which addressing modes may be used with specific types of memory.
107For instance, the 128 bytes of extra internal RAM that the 8052 features cannot be addressed using indirect addressing; rather, external (in)direct addressing must be used.
108
109The 8051 series possesses an eight bit Arithmetic and Logic Unit (ALU), with a wide variety of instructions for performing arithmetic and logical operations on bits and integers.
110Further, the processor possesses two eight bit general purpose accumulators, A and B.
111
112Communication with the device is facilitated by an onboard UART serial port, and associated serial controller, which can operate in numerous modes.
113Serial baud rate is determined by one of two sixteen bit timers included with the 8051, which can be set to multiple modes of operation.
114(The 8052 provides an additional sixteen bit timer.)
115As an additional method of communication, the 8051 also provides a four byte bit-addressable input-output port.
116
117The programmer may take advantage of the interrupt mechanism that the processor provides.
118This is especially useful when dealing with input or output involving the serial device, as an interrupt can be set when a whole character is sent or received via the serial port.
119
120Interrupts immediately halt the flow of execution of the processor, and cause the program counter to jump to a fixed address, where the requisite interrupt handler is stored.
121However, interrupts may be set to one of two priorities: low and high.
122The interrupt handler of an interrupt with high priority is executed ahead of the interrupt handler of an interrupt of lower priority, interrupting a currently executing handler of lower priority, if necessary.
123
124The 8051 has interrupts disabled by default.
125The programmer is free to handle serial input and output manually, by poking serial flags in the SFRs.
126Similarly, `exceptional circumstances' that would otherwise trigger an interrupt on more modern processors, for example, division by zero, are also signalled by setting flags.
127
128\begin{figure}[t]
129\begin{center}
130\includegraphics[scale=0.5]{iramlayout.png}
131\end{center}
132\caption{Schematic view of 8051 IRAM layout}
133\label{fig.iram.layout}
134\end{figure}
135
136%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
137% SECTION                                                                      %
138%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
139\subsection{The CerCo project}
140\label{subsect.cerco.project}
141
142The CerCo project (`Certified Complexity') is a current European FeT-Open project incorporating three sites---the Universities of Bologna, Edinburgh and Paris Diderot 7---throughout the European Union.
143The ultimate aim of the project is to produce a certified compiler for a large subset of the C programming language targetting a microprocessor used in embedded systems.
144In this respect, the CerCo project bears a deal of similarity with CompCert, another European funded project.
145However, we see a number of important differences between the aims of the two projects.
146\begin{enumerate}
147\item
148The CerCo project aims to allow reasoning on aspects of the intensional properties of C programs.
149That is,
150\item
151The CompCert project compiled a subset of C down to the assembly level.
152A semantics for assembly language was provided.
153\end{enumerate}
154
155%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
156% SECTION                                                                      %
157%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
158\subsection{Overview of paper}
159\label{subsect.overview.paper}
160
161In Section~\ref{sect.development.strategy} we provide a brief overview of how we designed and implemented the formalised microprocessor emulator.
162In Section~\ref{sect.design.issues.formalisation} we describe how we made use of dependent types to handle some of the idiosyncracies of the microprocessor.
163In Section~\ref{sect.related.work} we describe the relation our work has to
164
165%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
166% SECTION                                                                      %
167%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
168\section{From O'Caml prototype to Matita formalisation}
169\label{sect.from.o'caml.prototype.matita.formalisation}
170
171Our implementation followed a two-layered approach.
172
173\paragraph{O'Caml prototype}
174Our implementation began with an emulator written in O'Caml.
175This served the purpose of allowing us to `iron out' any bugs in the design of the emulator within a more permissive language.
176Further,
177
178\paragraph{Matita formalisation}
179
180%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
181% SECTION                                                                      %
182%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
183\section{Design issues in the formalisation}
184\label{sect.design.issues.formalisation}
185
186%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
187% SECTION                                                                      %
188%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
189\subsection{Labels and pseudoinstructions}
190\label{subsect.labels.pseudoinstructions}
191
192%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
193% SECTION                                                                      %
194%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
195\subsection{Putting dependent types to work}
196\label{subsect.putting.dependent.types.to.work}
197
198We provide an inductive data type representing all possible addressing modes of 8051 assembly.
199This is the type that functions will pattern match against.
200
201\begin{quote}
202\begin{lstlisting}
203ninductive addressing_mode: Type[0] ≝
204  DIRECT: Byte $\rightarrow$ addressing_mode
205| INDIRECT: Bit $\rightarrow$ addressing_mode
206...
207\end{lstlisting}
208\end{quote}
209However, we also wish to express in the type of our functions the \emph{impossibility} of pattern matching against certain constructors.
210In order to do this, we introduce an inductive type of addressing mode `tags'.
211The constructors of \texttt{addressing\_mode\_tag} are in one-one correspondence with the constructors of \texttt{addressing\_mode}:
212\begin{quote}
213\begin{lstlisting}
214ninductive addressing_mode_tag : Type[0] ≝
215  direct: addressing_mode_tag
216| indirect: addressing_mode_tag
217...
218\end{lstlisting}
219\end{quote}
220We then provide a function that checks whether an \texttt{addressing\_mode} is `morally' an \texttt{addressing\_mode\_tag}, as follows:
221\begin{quote}
222\begin{lstlisting}
223nlet rec is_a (d:addressing_mode_tag) (A:addressing_mode) on d ≝
224  match d with
225   [ direct $\Rightarrow$ match A with [ DIRECT _ $\Rightarrow$ true | _ $\Rightarrow$ false ]
226   | indirect $\Rightarrow$ match A with [ INDIRECT _ $\Rightarrow$ true | _ $\Rightarrow$ false ]
227...
228\end{lstlisting}
229\end{quote}
230We also extend this check to vectors of \texttt{addressing\_mode\_tag}'s in the obvious manner:
231\begin{quote}
232\begin{lstlisting}
233nlet rec is_in (n: Nat) (l: Vector addressing_mode_tag n) (A:addressing_mode) on l ≝
234 match l return $\lambda$m.$\lambda$_ :Vector addressing_mode_tag m.Bool with
235  [ VEmpty $\Rightarrow$ false
236  | VCons m he (tl: Vector addressing_mode_tag m) $\Rightarrow$
237     is_a he A $\vee$ is_in ? tl A ].
238\end{lstlisting}
239\end{quote}
240Here \texttt{VEmpty} and \texttt{VCons} are the two constructors of the \texttt{Vector} data type, and $\mathtt{\vee}$ is inclusive disjunction on Booleans.
241\begin{quote}
242\begin{lstlisting}
243nrecord subaddressing_mode (n: Nat) (l: Vector addressing_mode_tag (S n)) : Type[0] ≝
244{
245  subaddressing_modeel :> addressing_mode;
246  subaddressing_modein: bool_to_Prop (is_in ? l subaddressing_modeel)
247}.
248\end{lstlisting}
249\end{quote}
250We can now provide an inductive type of preinstructions with precise typings:
251\begin{quote}
252\begin{lstlisting}
253ninductive preinstruction (A: Type[0]): Type[0] ≝
254   ADD: $\llbracket$ acc_a $\rrbracket$ $\rightarrow$ $\llbracket$ register; direct; indirect; data $\rrbracket$ $\rightarrow$ preinstruction A
255 | ADDC: $\llbracket$ acc_a $\rrbracket$ $\rightarrow$ $\llbracket$ register; direct; indirect; data $\rrbracket$ $\rightarrow$ preinstruction A
256...
257\end{lstlisting}
258\end{quote}
259Here $\llbracket - \rrbracket$ is syntax denoting a vector.
260We see that the constructor \texttt{ADD} expects two parameters, the first being the accumulator A (\texttt{acc\_a}), and the second being one of a register, direct, indirect or data addressing mode.
261
262The final, missing component is a pair of type coercions from \texttt{addressing\_mode} to \texttt{subaddressing\_mode} and from \texttt{subaddressing\_mode} to \texttt{Type$\lbrack0\rbrack$}, respectively.
263The previous machinery allows us to state in the type of a function what addressing modes that function expects.
264For instance, consider \texttt{set\_arg\_16}, which expects only a \texttt{DPTR}:
265\begin{quote}
266\begin{lstlisting}
267ndefinition set_arg_16: Status $\rightarrow$ Word $\rightarrow$ $\llbracket$ dptr $\rrbracket$ $\rightarrow$ Status ≝
268  $\lambda$s, v, a.
269   match a return $\lambda$x. bool_to_Prop (is_in ? $\llbracket$ dptr $\rrbracket$ x) $\rightarrow$ ? with
270     [ DPTR $\Rightarrow$ $\lambda$_: True.
271       let 〈 bu, bl 〉 := split $\ldots$ eight eight v in
272       let status := set_8051_sfr s SFR_DPH bu in
273       let status := set_8051_sfr status SFR_DPL bl in
274         status
275     | _ $\Rightarrow$ $\lambda$_: False.
276       match K in False with
277       [
278       ]
279     ] (subaddressing_modein $\ldots$ a).
280\end{lstlisting}
281\end{quote}
282All other cases are discharged by the catch-all at the bottom of the match expression.
283Attempting to match against another addressing mode not indicated in the type (for example, \texttt{REGISTER}) will produce a type-error.
284
285%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
286% SECTION                                                                      %
287%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
288\section{Validation}
289\label{sect.validation}
290
291%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
292% SECTION                                                                      %
293%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
294\section{Related work}
295\label{sect.related.work}
296
297%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
298% SECTION                                                                      %
299%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
300\section{Conclusions}
301\label{sect.conclusions}
302
303\end{document}
Note: See TracBrowser for help on using the repository browser.