% Source: Deliverables/D4.1/ITP-Paper/itp-2011.tex @ 507, checked in by mulligan (``A bit on labels and pseudo-instructions'')
\documentclass{llncs}

\usepackage{amsfonts}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage[english]{babel}
\usepackage{color}
\usepackage{graphicx}
\usepackage[utf8x]{inputenc}
\usepackage{listings}
\usepackage{microtype}
\usepackage{stmaryrd}
\usepackage{url}

\lstdefinelanguage{matita-ocaml}
  {keywords={ndefinition,ncoercion,nlemma,ntheorem,nremark,ninductive,nrecord,nqed,nlet,let,in,rec,match,return,with,Type,try},
   morekeywords={[2]nwhd,nnormalize,nelim,ncases,ndestruct},
   morekeywords={[3]type,of},
   mathescape=true,
  }
\lstset{language=matita-ocaml,basicstyle=\small\tt,columns=flexible,breaklines=false,
        keywordstyle=\color{red}\bfseries,
        keywordstyle=[2]\color{blue},
        keywordstyle=[3]\color{blue}\bfseries,
        commentstyle=\color{green},
        stringstyle=\color{blue},
        showspaces=false,showstringspaces=false}
\lstset{extendedchars=false}
\lstset{inputencoding=utf8x}
\DeclareUnicodeCharacter{8797}{:=}
\DeclareUnicodeCharacter{10746}{++}
\DeclareUnicodeCharacter{9001}{\ensuremath{\langle}}
\DeclareUnicodeCharacter{9002}{\ensuremath{\rangle}}

\author{Claudio Sacerdoti Coen \and Dominic P. Mulligan}
\authorrunning{C. Sacerdoti Coen and D. P. Mulligan}
\title{An executable formalisation of the MCS-51 microprocessor in Matita}
\titlerunning{An executable formalisation of the MCS-51}
\institute{Dipartimento di Scienze dell'Informazione, University of Bologna}

\begin{document}

\maketitle

\begin{abstract}
We summarise our formalisation of an emulator for the MCS-51 microprocessor in the Matita proof assistant.
The MCS-51 is a widely used 8-bit microprocessor, especially popular in embedded devices.

We proceeded in two stages, first implementing in O'Caml a prototype emulator, where bugs could be `ironed out' quickly.
We then ported our O'Caml emulator to Matita's internal language.
Though mostly straightforward, this porting presented multiple problems.
Of particular interest is how we handle the extreme non-orthogonality of the MCS-51's instruction set.
In O'Caml, this was handled through heavy use of polymorphic variants.
In Matita, we achieve the same effect through a non-standard use of dependent types.

Both the O'Caml and Matita emulators are `executable'.
Assembly programs may be animated within Matita, producing a trace of instructions executed.

Our formalisation is a major component of the ongoing EU-funded CerCo project.
\end{abstract}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% SECTION                                                                      %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Introduction}
\label{sect.introduction}

Compiler verification has, of late, become a `hot topic'.
This rapidly growing field is motivated by one simple question: `to what extent can you trust your compiler?'
Existing verification efforts have broadly focussed on \emph{semantic correctness}, that is, creating a compiler that is guaranteed to preserve the semantics of a program during the compilation process.
However, there is another important facet of correctness that has not garnered much attention, that is, correctness with respect to some intensional properties of the program to be compiled.

\subsection{The 8051/8052}
\label{subsect.8051-8052}

The MCS-51 is an eight bit microprocessor introduced by Intel in the late 1970s.
Commonly called the 8051, in the three decades since its introduction the processor has become a highly popular target for embedded systems engineers.
Further, the processor and its immediate successor, the 8052, are still manufactured by a host of semiconductor suppliers, many of them European, including Atmel, Siemens Semiconductor, NXP (formerly Philips Semiconductor), Texas Instruments, and Maxim (formerly Dallas Semiconductor).

The 8051 is a well documented processor, and has the additional support of numerous open source and commercial tools, such as compilers for high-level languages and emulators.
For instance, the open source Small Device C Compiler (SDCC) recognises a dialect of C, and other compilers targeting the 8051 for BASIC, Forth and Modula-2 are also extant.
An open source emulator for the processor, MCU8051 IDE, is also available.

\begin{figure}[t]
\begin{center}
\includegraphics[scale=0.5]{memorylayout.png}
\end{center}
\caption{High level overview of the 8051 memory layout}
\label{fig.memory.layout}
\end{figure}

The 8051 has a relatively straightforward architecture, unencumbered by advanced features of modern processors, making it an ideal target for formalisation.
A high-level overview of the processor's memory layout is provided in Figure~\ref{fig.memory.layout}.

Processor RAM is divided into numerous segments, with the most prominent division being between internal and (optional) external memory.
Internal memory, commonly provided on the die itself with fast access, is further divided into 128 bytes of internal RAM and numerous Special Function Registers (SFRs) which control the operation of the processor.
Internal RAM (IRAM) is further divided, beginning with eight general purpose bit-addressable registers (R0--R7).
These sit in the first eight bytes of IRAM, though they can be programmatically `shifted up' as needed.
Bit memory, followed by a small amount of stack space, resides in the memory space immediately after the register banks.
What remains of the IRAM may be treated as general purpose memory.
A schematic view of IRAM layout is provided in Figure~\ref{fig.iram.layout}.

External RAM (XRAM), limited to 64 kilobytes, is optional, and may be provided on or off chip, depending on the manufacturer.
XRAM is accessed using a dedicated instruction.
External code memory (XCODE) is often stored in the form of an EPROM, and limited to 64 kilobytes in size.
However, depending on the particular manufacturer and processor model, a dedicated on-die read-only memory area for program code (ICODE) may also be supplied.

Memory may be addressed in numerous ways: immediate, direct, indirect, external direct and code indirect.
As the latter two addressing modes hint, there are some restrictions enforced by the 8051 and its derivatives on which addressing modes may be used with specific types of memory.
For instance, the 128 bytes of extra internal RAM that the 8052 features cannot be addressed using direct addressing; rather, indirect addressing must be used.

The 8051 series possesses an eight bit Arithmetic and Logic Unit (ALU), with a wide variety of instructions for performing arithmetic and logical operations on bits and integers.
Further, the processor possesses two eight bit general purpose accumulators, A and B.

Communication with the device is facilitated by an onboard UART serial port, and associated serial controller, which can operate in numerous modes.
Serial baud rate is determined by one of two sixteen bit timers included with the 8051, which can be set to multiple modes of operation.
(The 8052 provides an additional sixteen bit timer.)
As an additional method of communication, the 8051 also provides a four byte bit-addressable input-output port.

The programmer may take advantage of the interrupt mechanism that the processor provides.
This is especially useful when dealing with input or output involving the serial device, as an interrupt can be set when a whole character is sent or received via the serial port.

Interrupts immediately halt the flow of execution of the processor, and cause the program counter to jump to a fixed address, where the requisite interrupt handler is stored.
However, interrupts may be set to one of two priorities: low and high.
The interrupt handler of an interrupt with high priority is executed ahead of the interrupt handler of an interrupt of lower priority, interrupting a currently executing handler of lower priority, if necessary.

The 8051 has interrupts disabled by default.
The programmer is free to handle serial input and output manually, by poking serial flags in the SFRs.
Similarly, `exceptional circumstances' that would otherwise trigger an interrupt on more modern processors, for example, division by zero, are also signalled by setting flags.

\begin{figure}[t]
\begin{center}
\includegraphics[scale=0.5]{iramlayout.png}
\end{center}
\caption{Schematic view of 8051 IRAM layout}
\label{fig.iram.layout}
\end{figure}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% SECTION                                                                      %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{The CerCo project}
\label{subsect.cerco.project}

The CerCo project (`Certified Complexity') is a current European FET-Open project incorporating three sites (the Universities of Bologna, Edinburgh and Paris Diderot 7) throughout the European Union.
The ultimate aim of the project is to produce a certified compiler for a large subset of the C programming language targeting a microprocessor used in embedded systems.
In this respect, the CerCo project bears a good deal of similarity to CompCert, another European funded project.
However, we see a number of important differences between the aims of the two projects.
\begin{enumerate}
\item
The CerCo project aims to allow reasoning on aspects of the intensional properties of C programs.
That is,
\item
The CompCert project compiled a subset of C down to the assembly level.
A semantics for assembly language was provided.
\end{enumerate}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% SECTION                                                                      %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Overview of paper}
\label{subsect.overview.paper}

In Section~\ref{sect.from.o'caml.prototype.matita.formalisation} we provide a brief overview of how we designed and implemented the formalised microprocessor emulator.
In Section~\ref{sect.design.issues.formalisation} we describe how we made use of dependent types to handle some of the idiosyncrasies of the microprocessor.
In Section~\ref{sect.related.work} we describe the relation our work has to

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% SECTION                                                                      %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{From O'Caml prototype to Matita formalisation}
\label{sect.from.o'caml.prototype.matita.formalisation}

Our implementation progressed in two stages:

\paragraph{O'Caml prototype}
We began with an emulator written in O'Caml.
We used this to `iron out' any bugs in our design and implementation within O'Caml's more permissive type system.
O'Caml's ability to perform file input-output also eased debugging and validation.
Once we were happy with the performance and design of the O'Caml emulator, we moved to the Matita formalisation.

\paragraph{Matita formalisation}
Matita's syntax is lexically similar to O'Caml's.
This eased the translation, as large swathes of code were merely copy-pasted with minor modifications.
However, several major issues had to be addressed when moving from O'Caml to Matita.
These are now discussed.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% SECTION                                                                      %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Design issues in the formalisation}
\label{sect.design.issues.formalisation}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% SECTION                                                                      %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Labels and pseudoinstructions}
\label{subsect.labels.pseudoinstructions}

The MCS-51's instruction set has numerous instructions for jumping to memory locations, calling procedures and moving data between memory spaces.
For instance, the instructions \texttt{AJMP}, \texttt{JMP} and \texttt{LJMP} all perform jumps.
However, these instructions differ in the maximum size of the jump offset that they can encode.
Selecting a jump instruction therefore requires the compiler to compute the size of the offset and select the most suitable instruction.
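As a rough illustration of this selection problem, the following O'Caml sketch picks the shortest of three standard MCS-51 unconditional jump encodings: the relative short jump \texttt{SJMP}, the absolute jump \texttt{AJMP} and the long jump \texttt{LJMP}.
The type \texttt{jump}, the function \texttt{select\_jump} and the order of preference are assumptions made for this example; they are not taken from the compiler or the emulator.
\begin{quote}
\begin{lstlisting}[language={[Objective]Caml},mathescape=false]
(* Illustrative sketch only: the type [jump] and the function
   [select_jump] are hypothetical names introduced for this example. *)
type jump =
  | SJMP of int  (* signed 8-bit offset from the next instruction *)
  | AJMP of int  (* 11-bit address within the current 2K block *)
  | LJMP of int  (* full 16-bit address *)

(* [pc] is the address of the jump itself and [target] its destination.
   SJMP and AJMP both occupy two bytes, so the next instruction
   sits at [pc + 2] for either of them. *)
let select_jump ~pc ~target =
  let next = (pc + 2) land 0xFFFF in
  let offset = target - next in
  if -128 <= offset && offset <= 127 then SJMP offset
  else if next lsr 11 = target lsr 11 then AJMP (target land 0x7FF)
  else LJMP target
\end{lstlisting}
\end{quote}
Under these assumptions a nearby target yields \texttt{SJMP}, a target in the same 2K block yields \texttt{AJMP}, and anything else falls back to \texttt{LJMP}.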

A prototype compiler was implemented in parallel with the emulator.
It was recognised that it would simplify the design of the compiler if the emulator introduced a notion of \emph{pseudoinstruction}.
That is, instead of requiring that the compiler select the most appropriate jump instruction, for instance, we introduce a single pseudoinstruction \texttt{Jump}.
Similarly, we introduce pseudoinstructions for generic calls,

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% SECTION                                                                      %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Putting dependent types to work}
\label{subsect.putting.dependent.types.to.work}

We provide an inductive data type representing all possible addressing modes of 8051 assembly.
This is the type that functions will pattern match against.

\begin{quote}
\begin{lstlisting}
ninductive addressing_mode: Type[0] ≝
  DIRECT: Byte $\rightarrow$ addressing_mode
| INDIRECT: Bit $\rightarrow$ addressing_mode
...
\end{lstlisting}
\end{quote}
However, we also wish to express in the type of our functions the \emph{impossibility} of pattern matching against certain constructors.
In order to do this, we introduce an inductive type of addressing mode `tags'.
The constructors of \texttt{addressing\_mode\_tag} are in one-to-one correspondence with the constructors of \texttt{addressing\_mode}:
\begin{quote}
\begin{lstlisting}
ninductive addressing_mode_tag : Type[0] ≝
  direct: addressing_mode_tag
| indirect: addressing_mode_tag
...
\end{lstlisting}
\end{quote}
We then provide a function that checks whether an \texttt{addressing\_mode} is `morally' an \texttt{addressing\_mode\_tag}, as follows:
\begin{quote}
\begin{lstlisting}
nlet rec is_a (d:addressing_mode_tag) (A:addressing_mode) on d ≝
  match d with
   [ direct $\Rightarrow$ match A with [ DIRECT _ $\Rightarrow$ true | _ $\Rightarrow$ false ]
   | indirect $\Rightarrow$ match A with [ INDIRECT _ $\Rightarrow$ true | _ $\Rightarrow$ false ]
...
\end{lstlisting}
\end{quote}
We also extend this check to vectors of \texttt{addressing\_mode\_tag}s in the obvious manner:
\begin{quote}
\begin{lstlisting}
nlet rec is_in (n: Nat) (l: Vector addressing_mode_tag n) (A:addressing_mode) on l ≝
 match l return $\lambda$m.$\lambda$_ :Vector addressing_mode_tag m.Bool with
  [ VEmpty $\Rightarrow$ false
  | VCons m he (tl: Vector addressing_mode_tag m) $\Rightarrow$
     is_a he A $\vee$ is_in ? tl A ].
\end{lstlisting}
\end{quote}
Here \texttt{VEmpty} and \texttt{VCons} are the two constructors of the \texttt{Vector} data type, and $\mathtt{\vee}$ is inclusive disjunction on Booleans.
We then define a record that pairs an addressing mode with a proof that it is permitted by a given non-empty vector of tags:
\begin{quote}
\begin{lstlisting}
nrecord subaddressing_mode (n: Nat) (l: Vector addressing_mode_tag (S n)) : Type[0] ≝
{
  subaddressing_modeel :> addressing_mode;
  subaddressing_modein: bool_to_Prop (is_in ? l subaddressing_modeel)
}.
\end{lstlisting}
\end{quote}
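For readers more at home in O'Caml, the tag check and the record above can be loosely approximated as follows.
This is only a sketch: a list stands in for \texttt{Vector}, the \texttt{DPTR} mode is included purely for illustration, and the proof field of \texttt{subaddressing\_mode} is replaced by a run-time check returning an option, which is an assumption of this example rather than part of the formalisation.
\begin{quote}
\begin{lstlisting}[language={[Objective]Caml},mathescape=false]
(* Illustrative approximation only; constructor names mirror the
   Matita listings, everything else is assumed for this sketch. *)
type addressing_mode =
  | DIRECT of int     (* stands in for a Byte *)
  | INDIRECT of bool  (* stands in for a Bit *)
  | DPTR

type addressing_mode_tag = Direct | Indirect | Dptr

(* Does the mode [a] carry the constructor named by the tag [d]? *)
let is_a (d : addressing_mode_tag) (a : addressing_mode) : bool =
  match d, a with
  | Direct, DIRECT _ | Indirect, INDIRECT _ | Dptr, DPTR -> true
  | _ -> false

(* Is [a] permitted by at least one tag in the list [l]? *)
let is_in (l : addressing_mode_tag list) (a : addressing_mode) : bool =
  List.exists (fun d -> is_a d a) l

(* Run-time stand-in for the record: the membership proof becomes
   a check performed when the value is built. *)
let subaddressing_mode l a = if is_in l a then Some a else None
\end{lstlisting}
\end{quote}
The difference is one of timing: the sketch rejects a forbidden addressing mode when the program runs, whereas the Matita record makes such a value impossible to construct.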
We can now provide an inductive type of preinstructions with precise typings:
\begin{quote}
\begin{lstlisting}
ninductive preinstruction (A: Type[0]): Type[0] ≝
   ADD: $\llbracket$ acc_a $\rrbracket$ $\rightarrow$ $\llbracket$ register; direct; indirect; data $\rrbracket$ $\rightarrow$ preinstruction A
 | ADDC: $\llbracket$ acc_a $\rrbracket$ $\rightarrow$ $\llbracket$ register; direct; indirect; data $\rrbracket$ $\rightarrow$ preinstruction A
...
\end{lstlisting}
\end{quote}
Here $\llbracket - \rrbracket$ is syntax denoting a vector.
We see that the constructor \texttt{ADD} expects two parameters, the first being the accumulator A (\texttt{acc\_a}), and the second being one of a register, direct, indirect or data addressing mode.

The final, missing component is a pair of type coercions from \texttt{addressing\_mode} to \texttt{subaddressing\_mode} and from \texttt{subaddressing\_mode} to \texttt{Type$\lbrack0\rbrack$}, respectively.
The previous machinery allows us to state in the type of a function which addressing modes that function expects.
For instance, consider \texttt{set\_arg\_16}, which expects only a \texttt{DPTR}:
\begin{quote}
\begin{lstlisting}
ndefinition set_arg_16: Status $\rightarrow$ Word $\rightarrow$ $\llbracket$ dptr $\rrbracket$ $\rightarrow$ Status ≝
  $\lambda$s, v, a.
   match a return $\lambda$x. bool_to_Prop (is_in ? $\llbracket$ dptr $\rrbracket$ x) $\rightarrow$ ? with
     [ DPTR $\Rightarrow$ $\lambda$_: True.
       let 〈 bu, bl 〉 := split $\ldots$ eight eight v in
       let status := set_8051_sfr s SFR_DPH bu in
       let status := set_8051_sfr status SFR_DPL bl in
         status
     | _ $\Rightarrow$ $\lambda$K: False.
       match K in False with
       [
       ]
     ] (subaddressing_modein $\ldots$ a).
\end{lstlisting}
\end{quote}
All other cases are discharged by the catch-all at the bottom of the match expression, where the membership proof has type \texttt{False}.
Attempting to match against another addressing mode not indicated in the type (for example, \texttt{REGISTER}) will produce a type error.
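In O'Caml, a comparable restriction can be expressed with polymorphic variants.
The following minimal sketch assumes a hypothetical \texttt{status} record with fields \texttt{acc}, \texttt{dph} and \texttt{dpl}, and models words as integers; it is an illustration of the idea and not the code of the prototype emulator.
\begin{quote}
\begin{lstlisting}[language={[Objective]Caml},mathescape=false]
(* Hypothetical sketch: the record [status] and its field names are
   assumptions made for this example. *)
type status = { acc : int; dph : int; dpl : int }

(* The argument type [ `DPTR ] admits exactly one constructor, so no
   catch-all case is needed, and passing any other addressing mode
   is rejected by the type checker. *)
let set_arg_16 (s : status) (v : int) (a : [ `DPTR ]) : status =
  match a with
  | `DPTR ->
    (* split the 16-bit word into its upper and lower bytes *)
    { s with dph = (v lsr 8) land 0xFF; dpl = v land 0xFF }
\end{lstlisting}
\end{quote}
For example, applying \texttt{set\_arg\_16} to the word \texttt{0x1234} stores \texttt{0x12} in \texttt{dph} and \texttt{0x34} in \texttt{dpl}, leaving the other field untouched.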

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% SECTION                                                                      %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Validation}
\label{sect.validation}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% SECTION                                                                      %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Related work}
\label{sect.related.work}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% SECTION                                                                      %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Conclusions}
\label{sect.conclusions}

\end{document}