source: Deliverables/D4.1/ITP-Paper/itp-2011.tex @ 505

Last change on this file since 505 was 505, checked in by mulligan, 8 years ago

finished sentence

File size: 16.7 KB
Line 
1\documentclass{llncs}
2
3\usepackage{amsfonts}
4\usepackage{amsmath}
5\usepackage{amssymb} 
6\usepackage[english]{babel}
7\usepackage{color}
8\usepackage{graphicx}
9\usepackage[utf8x]{inputenc}
10\usepackage{listings}
11\usepackage{microtype}
12\usepackage{stmaryrd}
13\usepackage{url}
14
15\lstdefinelanguage{matita-ocaml}
16  {keywords={ndefinition,ncoercion,nlemma,ntheorem,nremark,ninductive,nrecord,nqed,nlet,let,in,rec,match,return,with,Type,try},
17   morekeywords={[2]nwhd,nnormalize,nelim,ncases,ndestruct},
18   morekeywords={[3]type,of},
19   mathescape=true,
20  }
21\lstset{language=matita-ocaml,basicstyle=\small\tt,columns=flexible,breaklines=false,
22        keywordstyle=\color{red}\bfseries,
23        keywordstyle=[2]\color{blue},
24        keywordstyle=[3]\color{blue}\bfseries,
25        commentstyle=\color{green},
26        stringstyle=\color{blue},
27        showspaces=false,showstringspaces=false}
28\lstset{extendedchars=false}
29\lstset{inputencoding=utf8x}
30\DeclareUnicodeCharacter{8797}{:=}
31\DeclareUnicodeCharacter{10746}{++}
32\DeclareUnicodeCharacter{9001}{\ensuremath{\langle}}
33\DeclareUnicodeCharacter{9002}{\ensuremath{\rangle}}
34
35\author{Claudio Sacerdoti Coen \and Dominic P. Mulligan}
36\authorrunning{C. Sacerdoti Coen and D. P. Mulligan}
37\title{An executable formalisation of the MCS-51 microprocessor in Matita}
38\titlerunning{An executable formalisation of the MCS-51}
39\institute{Dipartimento di Scienze dell'Informazione, University of Bologna}
40
41\begin{document}
42
43\maketitle
44
45\begin{abstract}
46We summarise our formalisation of an emulator for the MCS-51 microprocessor in the Matita proof assistant.
47The MCS-51 is a widely used 8-bit microprocessor, especially popular in embedded devices.
48
49We proceeded in two stages, first implementing in O'Caml a prototype emulator, where bugs could be `ironed out' quickly.
50We then ported our O'Caml emulator to Matita's internal language.
51Though mostly straight-forward, this porting presented multiple problems.
52Of particular interest is how we handle the extreme non-orthoganality of the MSC-51's instruction set.
53In O'Caml, this was handled through heavy use of polymorphic variants.
54In Matita, we achieve the same effect through a non-standard use of dependent types.
55
56Both the O'Caml and Matita emulators are `executable'.
57Assembly programs may be animated within Matita, producing a trace of instructions executed.
58
59Our formalisation is a major component of the ongoing EU-funded CerCo project.
60\end{abstract}
61
62%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
63% SECTION                                                                      %
64%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
65\section{Introduction}
66\label{sect.introduction}
67
68Compiler verification, as of late, is as of late a `hot topic'.
69This rapidly growing field is motivated by one simple question: `to what extent can you trust your compiler?'
70Existing verification efforts have broadly focussed on \emph{semantic correctness}, that is, creating a compiler that is guaranteed to preserve the semantics of a program during the compilation process.
71However, there is another important facet of correctness that has not garnered much attention, that is, correctness with respect to some intensional properties of the program to be compiled.
72
73\subsection{The 8051/8052}
74\label{subsect.8051-8052}
75
76The MCS-51 is an eight bit microprocessor introduced by Intel in the late 1970s.
77Commonly called the 8051, in the three decades since its introduction the processor has become a highly popular target for embedded systems engineers.
78Further, the processor and its immediate successor, the 8052, is still manufactured by a host of semiconductor suppliers---many of them European---including Atmel, Siemens Semiconductor, NXP (formerly Phillips Semiconductor), Texas Instruments, and Maxim (formerly Dallas Semiconductor).
79
80The 8051 is a well documented processor, and has the additional support of numerous open source and commercial tools, such as compilers for high-level languages and emulators.
81For instance, the open source Small Device C Compiler (SDCC) recognises a dialect of C, and other compilers targeting the 8051 for BASIC, Forth and Modula-2 are also extant.
82An open source emulator for the processor, MCU8051 IDE, is also available.
83
84\begin{figure}[t]
85\begin{center}
86\includegraphics[scale=0.5]{memorylayout.png}
87\end{center}
88\caption{High level overview of the 8051 memory layout}
89\label{fig.memory.layout}
90\end{figure}
91
92The 8051 has a relatively straightforward architecture, unencumbered by advanced features of modern processors, making it an ideal target for formalisation.
93A high-level overview of the processor's memory layout is provided in Figure~\ref{fig.memory.layout}.
94
95Processor RAM is divided into numerous segments, with the most prominent division being between internal and (optional) external memory.
96Internal memory, commonly provided on the die itself with fast access, is further divided into 128 bytes of internal RAM and numerous Special Function Registers (SFRs) which control the operation of the processor.
97Internal RAM (IRAM) is further divided into a eight general purpose bit-addressable registers (R0--R7).
98These sit in the first eight bytes of IRAM, though can be programmatically `shifted up' as needed.
99Bit memory, followed by a small amount of stack space resides in the memory space immediately after the register banks.
100What remains of the IRAM may be treated as general purpose memory.
101A schematic view of IRAM layout is provided in Figure~\ref{fig.iram.layout}.
102
103External RAM (XRAM), limited to 64 kilobytes, is optional, and may be provided on or off chip, depending on the manufacturer.
104XRAM is accessed using a dedicated instruction.
105External code memory (XCODE) is often stored in the form of an EPROM, and limited to 64 kilobytes in size.
106However, depending on the particular manufacturer and processor model, a dedicated on-die read-only memory area for program code (ICODE) may also be supplied.
107
108Memory may be addressed in numerous ways: immediate, direct, indirect, external direct and code indirect.
109As the latter two addressing modes hint, there are some restrictions enforced by the 8051 and its derivatives on which addressing modes may be used with specific types of memory.
110For instance, the 128 bytes of extra internal RAM that the 8052 features cannot be addressed using indirect addressing; rather, external (in)direct addressing must be used.
111
112The 8051 series possesses an eight bit Arithmetic and Logic Unit (ALU), with a wide variety of instructions for performing arithmetic and logical operations on bits and integers.
113Further, the processor possesses two eight bit general purpose accumulators, A and B.
114
115Communication with the device is facilitated by an onboard UART serial port, and associated serial controller, which can operate in numerous modes.
116Serial baud rate is determined by one of two sixteen bit timers included with the 8051, which can be set to multiple modes of operation.
117(The 8052 provides an additional sixteen bit timer.)
118As an additional method of communication, the 8051 also provides a four byte bit-addressable input-output port.
119
120The programmer may take advantage of the interrupt mechanism that the processor provides.
121This is especially useful when dealing with input or output involving the serial device, as an interrupt can be set when a whole character is sent or received via the serial port.
122
123Interrupts immediately halt the flow of execution of the processor, and cause the program counter to jump to a fixed address, where the requisite interrupt handler is stored.
124However, interrupts may be set to one of two priorities: low and high.
125The interrupt handler of an interrupt with high priority is executed ahead of the interrupt handler of an interrupt of lower priority, interrupting a currently executing handler of lower priority, if necessary.
126
127The 8051 has interrupts disabled by default.
128The programmer is free to handle serial input and output manually, by poking serial flags in the SFRs.
129Similarly, `exceptional circumstances' that would otherwise trigger an interrupt on more modern processors, for example, division by zero, are also signalled by setting flags.
130
131\begin{figure}[t]
132\begin{center}
133\includegraphics[scale=0.5]{iramlayout.png}
134\end{center}
135\caption{Schematic view of 8051 IRAM layout}
136\label{fig.iram.layout}
137\end{figure}
138
139%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
140% SECTION                                                                      %
141%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
142\subsection{The CerCo project}
143\label{subsect.cerco.project}
144
145The CerCo project (`Certified Complexity') is a current European FeT-Open project incorporating three sites---the Universities of Bologna, Edinburgh and Paris Diderot 7---throughout the European Union.
146The ultimate aim of the project is to produce a certified compiler for a large subset of the C programming language targetting a microprocessor used in embedded systems.
147In this respect, the CerCo project bears a deal of similarity with CompCert, another European funded project.
148However, we see a number of important differences between the aims of the two projects.
149\begin{enumerate}
150\item
151The CerCo project aims to allow reasoning on aspects of the intensional properties of C programs.
152That is,
153\item
154The CompCert project compiled a subset of C down to the assembly level.
155A semantics for assembly language was provided.
156\end{enumerate}
157
158%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
159% SECTION                                                                      %
160%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
161\subsection{Overview of paper}
162\label{subsect.overview.paper}
163
164In Section~\ref{sect.development.strategy} we provide a brief overview of how we designed and implemented the formalised microprocessor emulator.
165In Section~\ref{sect.design.issues.formalisation} we describe how we made use of dependent types to handle some of the idiosyncracies of the microprocessor.
166In Section~\ref{sect.related.work} we describe the relation our work has to
167
168%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
169% SECTION                                                                      %
170%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
171\section{From O'Caml prototype to Matita formalisation}
172\label{sect.from.o'caml.prototype.matita.formalisation}
173
174Our implementation followed a two-layered approach.
175
176\paragraph{O'Caml prototype}
177Our implementation began with an emulator written in O'Caml.
178This served the purpose of allowing us to `iron out' any bugs in the design of the emulator within a more permissive language.
179Further,
180
181\paragraph{Matita formalisation}
182
183%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
184% SECTION                                                                      %
185%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
186\section{Design issues in the formalisation}
187\label{sect.design.issues.formalisation}
188
189%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
190% SECTION                                                                      %
191%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
192\subsection{Labels and pseudoinstructions}
193\label{subsect.labels.pseudoinstructions}
194
195%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
196% SECTION                                                                      %
197%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
198\subsection{Putting dependent types to work}
199\label{subsect.putting.dependent.types.to.work}
200
201We provide an inductive data type representing all possible addressing modes of 8051 assembly.
202This is the type that functions will pattern match against.
203
204\begin{quote}
205\begin{lstlisting}
206ninductive addressing_mode: Type[0] ≝
207  DIRECT: Byte $\rightarrow$ addressing_mode
208| INDIRECT: Bit $\rightarrow$ addressing_mode
209...
210\end{lstlisting}
211\end{quote}
212However, we also wish to express in the type of our functions the \emph{impossibility} of pattern matching against certain constructors.
213In order to do this, we introduce an inductive type of addressing mode `tags'.
214The constructors of \texttt{addressing\_mode\_tag} are in one-one correspondence with the constructors of \texttt{addressing\_mode}:
215\begin{quote}
216\begin{lstlisting}
217ninductive addressing_mode_tag : Type[0] ≝
218  direct: addressing_mode_tag
219| indirect: addressing_mode_tag
220...
221\end{lstlisting}
222\end{quote}
223We then provide a function that checks whether an \texttt{addressing\_mode} is `morally' an \texttt{addressing\_mode\_tag}, as follows:
224\begin{quote}
225\begin{lstlisting}
226nlet rec is_a (d:addressing_mode_tag) (A:addressing_mode) on d ≝
227  match d with
228   [ direct $\Rightarrow$ match A with [ DIRECT _ $\Rightarrow$ true | _ $\Rightarrow$ false ]
229   | indirect $\Rightarrow$ match A with [ INDIRECT _ $\Rightarrow$ true | _ $\Rightarrow$ false ]
230...
231\end{lstlisting}
232\end{quote}
233We also extend this check to vectors of \texttt{addressing\_mode\_tag}'s in the obvious manner:
234\begin{quote}
235\begin{lstlisting}
236nlet rec is_in (n: Nat) (l: Vector addressing_mode_tag n) (A:addressing_mode) on l ≝
237 match l return $\lambda$m.$\lambda$_ :Vector addressing_mode_tag m.Bool with
238  [ VEmpty $\Rightarrow$ false
239  | VCons m he (tl: Vector addressing_mode_tag m) $\Rightarrow$
240     is_a he A $\vee$ is_in ? tl A ].
241\end{lstlisting}
242\end{quote}
243Here \texttt{VEmpty} and \texttt{VCons} are the two constructors of the \texttt{Vector} data type, and $\mathtt{\vee}$ is inclusive disjunction on Booleans.
244\begin{quote}
245\begin{lstlisting}
246nrecord subaddressing_mode (n: Nat) (l: Vector addressing_mode_tag (S n)) : Type[0] ≝
247{
248  subaddressing_modeel :> addressing_mode;
249  subaddressing_modein: bool_to_Prop (is_in ? l subaddressing_modeel)
250}.
251\end{lstlisting}
252\end{quote}
253We can now provide an inductive type of preinstructions with precise typings:
254\begin{quote}
255\begin{lstlisting}
256ninductive preinstruction (A: Type[0]): Type[0] ≝
257   ADD: $\llbracket$ acc_a $\rrbracket$ $\rightarrow$ $\llbracket$ register; direct; indirect; data $\rrbracket$ $\rightarrow$ preinstruction A
258 | ADDC: $\llbracket$ acc_a $\rrbracket$ $\rightarrow$ $\llbracket$ register; direct; indirect; data $\rrbracket$ $\rightarrow$ preinstruction A
259...
260\end{lstlisting}
261\end{quote}
262Here $\llbracket - \rrbracket$ is syntax denoting a vector.
263We see that the constructor \texttt{ADD} expects two parameters, the first being the accumulator A (\texttt{acc\_a}), and the second being one of a register, direct, indirect or data addressing mode.
264
265The final, missing component is a pair of type coercions from \texttt{addressing\_mode} to \texttt{subaddressing\_mode} and from \texttt{subaddressing\_mode} to \texttt{Type$\lbrack0\rbrack$}, respectively.
266The previous machinery allows us to state in the type of a function what addressing modes that function expects.
267For instance, consider \texttt{set\_arg\_16}, which expects only a \texttt{DPTR}:
268\begin{quote}
269\begin{lstlisting}
270ndefinition set_arg_16: Status $\rightarrow$ Word $\rightarrow$ $\llbracket$ dptr $\rrbracket$ $\rightarrow$ Status ≝
271  $\lambda$s, v, a.
272   match a return $\lambda$x. bool_to_Prop (is_in ? $\llbracket$ dptr $\rrbracket$ x) $\rightarrow$ ? with
273     [ DPTR $\Rightarrow$ $\lambda$_: True.
274       let 〈 bu, bl 〉 := split $\ldots$ eight eight v in
275       let status := set_8051_sfr s SFR_DPH bu in
276       let status := set_8051_sfr status SFR_DPL bl in
277         status
278     | _ $\Rightarrow$ $\lambda$_: False.
279       match K in False with
280       [
281       ]
282     ] (subaddressing_modein $\ldots$ a).
283\end{lstlisting}
284\end{quote}
285All other cases are discharged by the catch-all at the bottom of the match expression.
286Attempting to match against another addressing mode not indicated in the type (for example, \texttt{REGISTER}) will produce a type-error.
287
288%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
289% SECTION                                                                      %
290%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
291\section{Validation}
292\label{sect.validation}
293
294%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
295% SECTION                                                                      %
296%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
297\section{Related work}
298\label{sect.related.work}
299
300%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
301% SECTION                                                                      %
302%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
303\section{Conclusions}
304\label{sect.conclusions}
305
306\end{document}
Note: See TracBrowser for help on using the repository browser.