{\setlength{\fboxsep}{5pt}
        \setlength{\mylength}{\linewidth}%
        \addtolength{\mylength}{-2\fboxsep}%
        \addtolength{\mylength}{-2\fboxrule}%
        \Sbox
        \minipage{\mylength}%
                \setlength{\abovedisplayskip}{0pt}%
                \setlength{\belowdisplayskip}{0pt}%
        }%
        {\endminipage\endSbox
                \[\fbox{\TheSbox}\]}
{keywords={definition,coercion,lemma,theorem,remark,inductive,record,qed,let,in,rec,match,return,with,Type,try,on,to},
 morekeywords={[2]whd,normalize,elim,cases,destruct},
 morekeywords={[3]type,of,val,assert,let,function},
 mathescape=true,
}
 keywordstyle=\color{red}\bfseries,
 keywordstyle=[2]\color{blue},
 keywordstyle=[3]\color{blue}\bfseries,
 commentstyle=\color{green},
 stringstyle=\color{blue},
 showspaces=false,showstringspaces=false}
\author{Dominic P. Mulligan\thanks{The project CerCo acknowledges the financial support of the Future and Emerging Technologies (FET) programme within the Seventh Framework Programme for Research of the European Commission, under FET-Open grant number: 243881} \and Claudio Sacerdoti Coen$^\star$}
\authorrunning{D. P. Mulligan and C. Sacerdoti Coen}
\title{An executable formalisation of the MCS-51 microprocessor in Matita}
\titlerunning{An executable formalisation of the MCS-51}
\institute{Dipartimento di Scienze dell'Informazione, Universit\`a di Bologna}
We summarise our formalisation of an emulator for the MCS-51 microprocessor in the Matita proof assistant.
The MCS-51 is a widely used 8-bit microprocessor, especially popular in embedded devices.
We proceeded in two stages, first implementing a prototype emulator in O'Caml, where bugs could be `ironed out' quickly.
We then ported our O'Caml emulator to Matita's internal language.
Though mostly straightforward, this porting presented multiple problems.
Of particular interest is how we handle the extreme non-orthogonality of the MCS-51's instruction set.
In O'Caml, this was handled through heavy use of polymorphic variants.
In Matita, we achieve the same effect through a non-standard use of dependent types.
Both the O'Caml and Matita emulators are `executable'.
Assembly programs may be animated within Matita, producing a trace of instructions executed.
Our formalisation is a major component of the ongoing EU-funded CerCo project.
% SECTION                                                                      %
Formal methods are designed to increase our confidence in the design and implementation of software (and hardware).
Ideally, we would like all software to come equipped with a formal specification, along with a proof of correctness for the implementation.
Today, practically all programs are written in high-level languages and then compiled into low-level ones.
Specifications are therefore also given at a high level, and correctness can be proved by reasoning automatically or interactively on the program's source code.
The code that is actually run, however, is not the high-level source code that we reason on, but the object code generated by the compiler.
A few simple questions now arise:
What properties are preserved during compilation?
What properties are affected by the compilation strategy?
To what extent can you trust your compiler to preserve those properties?
These questions, and others like them, motivate a current `hot topic' in computer science research: \emph{compiler verification}.
So far, the field has focused on the first and last questions only.
In particular, much attention has been placed on verifying compiler correctness with respect to extensional properties of programs, which are easily preserved during compilation; it is sufficient to completely preserve the denotational semantics of the input program.
However, if we consider intensional properties of programs---such as the space, time or energy spent in computation and transmission of data---the situation is more complex.
To even be able to express these properties, and to be able to reason about them, we are forced to adopt a cost model that assigns a cost to single instructions, or blocks of instructions.
Ideally, we would like a compositional cost model that assigns the same cost to all occurrences of one instruction.
However, compiler optimisations are inherently non-compositional: each occurrence of a high-level instruction is usually compiled in a different way according to the context it finds itself in.
Therefore both the cost model and intensional specifications are affected by the compilation process.
In the current EU project CerCo (`Certified Complexity') we approach the problem of reasoning about intensional properties of programs as follows.
We are currently developing a compiler that induces a cost model on the high-level source code.
Costs are assigned to each block of high-level instructions by considering the costs of the corresponding blocks of compiled object code.
The cost model is therefore inherently non-compositional.
However, the model has the potential to be extremely \emph{precise}, capturing a program's \emph{realistic} cost by taking into account, rather than ignoring, the compilation process.
A prototype compiler, which provides exact rather than approximate costs, has been developed.
We believe that our approach is especially applicable to certifying real-time programs.
Here, a user can certify that all `deadlines' are met whilst wringing as many clock cycles from the processor---using a cost model that does not over-estimate---as possible.
Further, we see our approach as being relevant to the field of compiler verification (and construction) itself.
For instance, an optimisation specified only extensionally is only half specified; though the optimisation may preserve the denotational semantics of a program, there is no guarantee that any intensional properties of the program, such as space or time usage, will be improved.
Another potential application is toward completeness and correctness of the compilation process in the presence of space constraints.
Here, a compiler could potentially reject a source program targeting an embedded system when the size of the compiled code exceeds the available ROM size.
Moreover, preservation of a program's semantics may only be required for those programs that do not exhaust the stack or heap.
Hence the statement of completeness of the compiler must take into account a realistic cost model.
In the methodology proposed in CerCo we assume we are able to compute exact and realistic costs on the object code for sequential blocks of instructions.
With modern processors, though possible~\cite{??,??,??}, it is difficult to compute exact costs or even to reasonably approximate them.
This is because the execution of a program itself has an influence on the speed of processing.
For instance, caching, memory effects and other advanced features such as branch prediction all have a profound effect on execution speeds.
For this reason CerCo decided to focus on 8-bit microprocessors.
These are still widely used in embedded systems, and have the advantage of an easily predictable cost model due to the relative sparsity of features that they possess.
In particular, we have fully formalised an executable formal semantics of a family of 8-bit Freescale microprocessors~\cite{oliboni}, and provided a similar executable formal semantics for the MCS-51 microprocessor.
The latter work is what we describe in this paper.
The main focus of the formalisation has been on capturing the intensional behaviour of the processor.
However, the design of the MCS-51 itself has caused problems in our formalisation.
For example, the MCS-51 has a highly unorthogonal instruction set.
To cope with this unorthogonality, and to produce an executable specification, we have exploited the dependent type system of Matita, an interactive proof assistant.
\subsection{The 8051/8052}
The MCS-51 is an eight-bit microprocessor introduced by Intel in the late 1970s.
Commonly called the 8051, in the three decades since its introduction the processor has become a highly popular target for embedded systems engineers.
Further, the processor, its immediate successor the 8052, and many derivatives are still manufactured \emph{en masse} by a host of semiconductor suppliers.
The 8051 is a well-documented processor, and has the additional support of numerous open source and commercial tools, such as compilers for high-level languages and emulators.
For instance, the open source Small Device C Compiler (SDCC) recognises a dialect of C, and other compilers targeting the 8051 for BASIC, Forth and Modula-2 are also extant.
An open source emulator for the processor, MCU-8051 IDE, is also available.
Both MCU-8051 IDE and SDCC were used profitably in the implementation of our formalisation.
\caption{High level overview of the 8051 memory layout}
The 8051 has a relatively straightforward architecture, unencumbered by the advanced features of modern processors, making it an ideal target for formalisation.
A high-level overview of the processor's memory layout is provided in Figure~\ref{fig.memory.layout}.
Processor RAM is divided into numerous segments, with the most prominent division being between internal and (optional) external memory.
Internal memory, commonly provided on the die itself with fast access, is further divided into 128 bytes of internal RAM and numerous Special Function Registers (SFRs) which control the operation of the processor.
Internal RAM (IRAM) is further divided into eight general purpose registers (R0--R7).
These sit in the first eight bytes of IRAM, though can be programmatically `shifted up' as needed.
Bit memory, followed by a small amount of stack space, resides in the memory space immediately after the register banks.
What remains of the IRAM may be treated as general purpose memory.
A schematic view of the IRAM layout is provided in Figure~\ref{fig.iram.layout}.
External RAM (XRAM), limited to a maximum size of 64 kilobytes, is optional, and may be provided on or off chip, depending on the manufacturer.
XRAM is accessed using a dedicated instruction, and requires sixteen bits to address fully.
External code memory (XCODE) is often stored in the form of an EPROM, and limited to 64 kilobytes in size.
However, depending on the particular manufacturer and processor model, a dedicated on-die read-only memory area for program code (ICODE) may also be supplied.
Memory may be addressed in numerous ways: immediate, direct, indirect, external direct and code indirect.
As the latter two addressing modes hint, the 8051 and its derivatives enforce some restrictions on which addressing modes may be used with specific types of memory.
For instance, the 128 bytes of extra internal RAM that the 8052 features cannot be addressed using indirect addressing; rather, external (in)direct addressing must be used.
The 8051 series possesses an eight-bit Arithmetic and Logic Unit (ALU), with a wide variety of instructions for performing arithmetic and logical operations on bits and integers.
Further, the processor possesses two eight-bit general purpose accumulators, A and B.
Communication with the device is facilitated by an onboard UART serial port, and associated serial controller, which can operate in numerous modes.
The serial baud rate is determined by one of two sixteen-bit timers included with the 8051, which can be set to multiple modes of operation.
(The 8052 provides an additional sixteen-bit timer.)
As an additional method of communication, the 8051 also provides four byte-wide, bit-addressable input-output ports.
The programmer may take advantage of the interrupt mechanism that the processor provides.
This is especially useful when dealing with input or output involving the serial device, as an interrupt can be raised when a whole character is sent or received via the serial port.
Interrupts immediately halt the flow of execution of the processor, and cause the program counter to jump to a fixed address, where the requisite interrupt handler is stored.
Interrupts may be set to one of two priorities: low and high.
The interrupt handler of an interrupt with high priority is executed ahead of the interrupt handler of an interrupt of lower priority, interrupting a currently executing handler of lower priority if necessary.
The 8051 has interrupts disabled by default.
The programmer is free to handle serial input and output manually, by poking serial flags in the SFRs.
Similarly, `exceptional circumstances' that would otherwise trigger an interrupt on more modern processors, for example division by zero, are also signalled by setting flags.
\caption{Schematic view of 8051 IRAM layout}
% SECTION                                                                      %
\subsection{Overview of paper}
In Section~\ref{} we discuss design issues in the development of the formalisation.
In Section~\ref{sect.validation} we discuss how we validated the design and implementation of our emulator, to ensure that what we formalised was an accurate model of an MCS-51 series microprocessor.
In Section~\ref{} we describe previous work, with an eye toward describing its relation to the work described herein.
In Section~\ref{sect.conclusions} we conclude the paper.
% SECTION                                                                      %
\section{Design issues in the formalisation}
From here onwards, we typeset O'Caml source in \texttt{\color{blue}{blue}} and Matita source in \texttt{\color{red}{red}} to distinguish the two syntaxes.
Matita's syntax is largely straightforward to those familiar with Coq or O'Caml.
The only subtlety is the use of `\texttt{?}' in an argument position, denoting an argument that should be inferred automatically, if possible.
\subsection{Development strategy}
Our implementation progressed in two stages.
We began with an emulator written in O'Caml.
We used this to `iron out' any bugs in our design and implementation, within O'Caml's more permissive type system.
O'Caml's ability to perform file input-output also eased debugging and validation.
Once we were happy with the performance and design of the O'Caml emulator, we moved to the Matita formalisation.
Matita's syntax is lexically similar to O'Caml's.
This eased the translation, as large swathes of code were merely copy-pasted with minor modifications.
However, several major issues had to be addressed when moving from O'Caml to Matita.
These are now discussed.
% SECTION                                                                      %
\subsection{Representation of integers}
type 'a vect = bit list
type word = [`Sixteen] vect
type byte = [`Eight] vect
$\color{blue}{\mathtt{let}}$ from_nibble =
 function
    [b1;b2;b3;b4] -> b1,b2,b3,b4
  | _ -> assert false
type 'a vect
type word = [`Sixteen] vect
type byte = [`Eight] vect
val from_nibble: nibble -> bit*bit*bit*bit
\caption{Sample of O'Caml implementation and interface for bitvectors module}
Integers are represented using bitvectors, i.e.\ fixed-length vectors of booleans.
In our O'Caml emulator, we `faked' bitvectors using phantom types and polymorphic variants, as in Figure~\ref{fig.ocaml.implementation.bitvectors}.
From within the bitvector module (left column), bitvectors are just lists of bits.
However, the module's interface (right column) hides this implementation completely.
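To illustrate the technique, consider the following minimal, self-contained sketch of the phantom-type encoding; the module and function names here are ours, not those of the actual sources.
The phantom parameter ties a length to each vector type at compile time, while the runtime representation remains a plain list:

```ocaml
(* A minimal sketch of the phantom-type trick described above; the
   names are illustrative, not those of the actual project sources. *)
module Bitvector : sig
  type bit = bool
  type 'a vect
  type nibble = [`Four] vect
  type byte = [`Eight] vect
  val nibble_of_bits : bit list -> nibble  (* length checked at runtime *)
  val byte_of_bits : bit list -> byte
  val from_nibble : nibble -> bit * bit * bit * bit
end = struct
  type bit = bool
  type 'a vect = bit list                  (* phantom 'a: never inspected *)
  type nibble = [`Four] vect
  type byte = [`Eight] vect
  let nibble_of_bits bs = assert (List.length bs = 4); bs
  let byte_of_bits bs = assert (List.length bs = 8); bs
  let from_nibble = function
    | [b1; b2; b3; b4] -> (b1, b2, b3, b4)
    | _ -> assert false
end

let () =
  let n = Bitvector.nibble_of_bits [true; false; false; true] in
  assert (Bitvector.from_nibble n = (true, false, false, true))
```

Outside the module, a \texttt{byte} can never be passed where a \texttt{nibble} is expected: the phantom parameters differ, so the application is rejected at compile time.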
In Matita, we are able to use the full power of dependent types to define `real' bitvectors:
inductive Vector (A: Type[0]): nat → Type[0] ≝
  VEmpty: Vector A O
| VCons: ∀n: nat. A → Vector A n → Vector A (S n).
We define \texttt{BitVector} as a specialisation of \texttt{Vector} to \texttt{bool}.
We may use Matita's type system to provide even stronger guarantees, here on a function that splits a vector into two pieces at any index, provided that the index is smaller than the length of the \texttt{Vector} to be split:
let rec split (A: Type[0]) (m,n: nat) on m:
   Vector A (plus m n) $\rightarrow$ (Vector A m) $\times$ (Vector A n) := ...
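In O'Caml, the same function can only approximate this guarantee: the length relationship expressed by \texttt{plus m n} in the Matita type must be replaced by a runtime check.
A hypothetical list-based sketch:

```ocaml
(* O'Caml approximation of the dependently typed split above: the
   invariant `length v = m + n' becomes a runtime assertion. *)
let split (m : int) (v : 'a list) : 'a list * 'a list =
  let rec aux acc m v =
    match m, v with
    | 0, rest -> (List.rev acc, rest)
    | m, x :: rest -> aux (x :: acc) (m - 1) rest
    | _, [] -> assert false  (* ruled out statically in Matita *)
  in
  aux [] m v

let () = assert (split 2 [1; 2; 3] = ([1; 2], [3]))
```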
% SECTION                                                                      %
\subsection{Representing memory}
The MCS-51 has numerous different types of memory.
In our prototype implementation, we simply used a map datastructure from the O'Caml standard library.
Matita's standard library is relatively small, and does not contain a generic map datastructure.
We therefore had the opportunity to craft a special-purpose datastructure for the job.
We worked under the assumption that large swathes of memory would often be uninitialised.
Na\"ively using a complete binary tree, for instance, would be extremely memory-inefficient.
Instead, we chose to use a modified form of trie, where paths are represented by bitvectors.
As bitvectors were already widely used in our implementation for representing integers, this worked well:
inductive BitVectorTrie (A: Type[0]): nat $\rightarrow$ Type[0] ≝
  Leaf: A $\rightarrow$ BitVectorTrie A 0
| Node: ∀n. BitVectorTrie A n $\rightarrow$ BitVectorTrie A n $\rightarrow$ BitVectorTrie A (S n)
| Stub: ∀n. BitVectorTrie A n.
Here, \texttt{Stub} is a constructor that can appear at any point in our tries.
It internalises the notion of `uninitialised data'.
Performing a lookup in memory is now straightforward.
We merely traverse a path, and if at any point we encounter a \texttt{Stub}, we return a default value\footnote{All manufacturer data sheets that we consulted were silent on the subject of what should be returned if we attempt to access uninitialised memory.  We defaulted to simply returning zero, though our \texttt{lookup} function is parametric in this choice.  We do not believe that this is an outrageous decision, as SDCC, for instance, generates code which first `zeroes out' all memory in a preamble before executing the program proper.  This is in line with the C standard, which guarantees that all global variables will be zero-initialised piecewise.}.
As we are using bitvectors, we may make full use of dependent types and ensure that our bitvector paths are of the same length as the height of the tree.
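The lookup strategy can be sketched in O'Caml as follows; this version lacks the dependent length invariant, which in Matita rules out the ill-formed path/trie pairs statically, and the names are ours:

```ocaml
(* Sketch of lookup in a Stub-bearing trie: a path of bits selects a
   branch at each Node; hitting a Stub yields the default value. *)
type 'a trie =
  | Leaf of 'a
  | Node of 'a trie * 'a trie
  | Stub

let rec lookup default path t =
  match path, t with
  | [], Leaf x -> x
  | false :: rest, Node (l, _) -> lookup default rest l
  | true :: rest, Node (_, r) -> lookup default rest r
  | _, Stub -> default
  | _, _ -> default  (* path/height mismatch: impossible in the Matita version *)

let () =
  let memory = Node (Leaf 42, Stub) in
  assert (lookup 0 [false] memory = 42);  (* initialised cell *)
  assert (lookup 0 [true] memory = 0)     (* uninitialised: default returned *)
```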
% SECTION                                                                      %
\subsection{Labels and pseudoinstructions}
Aside from implementing the core MCS-51 instruction set, we also provided \emph{pseudoinstructions}, \emph{labels} and \emph{cost labels}.
The purpose of cost labels will be explained in Subsection~\ref{subsect.computation.cost.traces}.
Introducing pseudoinstructions had the effect of simplifying a C compiler---another component of the CerCo project---that was being implemented in parallel with our implementation.
To understand why this is so, consider the fact that the MCS-51's instruction set has numerous instructions for unconditional and conditional jumps to memory locations.
For instance, the instructions \texttt{AJMP}, \texttt{JMP} and \texttt{LJMP} all perform unconditional jumps.
However, these instructions differ in the maximum size of the offset of the jump to be performed.
Further, all jump instructions require a concrete memory address---to jump to---to be specified.
Requiring the compiler to compute these offsets, and select appropriate jump instructions, was seen as needlessly burdensome.
Introducing labels also had a simplifying effect on the design of the compiler.
Instead of jumping to a concrete address, the compiler could `just' jump to a label.
In this vein, we introduced pseudoinstructions for both unconditional and conditional jumps to a label.
Further, we also introduced labels for storing global data in a preamble before the program.
A pseudoinstruction \texttt{Mov} moves (16-bit) data stored at a label into the (16-bit) register \texttt{DPTR}.
We believe this facility, of storing global data in a preamble referenced by a label, will also make any future extension to separate compilation much simpler.
Our pseudoinstructions and labels induce an assembly language similar to that of SDCC.
All pseudoinstructions and labels are `assembled away', prior to program execution, using a preprocessing stage.
Jumps are computed in two stages.
The first stage builds a map associating memory addresses with labels, with the second stage replacing pseudojumps with concrete jumps to the correct address.
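The two stages can be sketched in O'Caml as follows; this is a simplified model with a three-constructor pseudoassembly of our own invention, whereas the real instruction type is of course far richer:

```ocaml
(* Two-stage assembly of jumps: stage one maps labels to concrete
   addresses; stage two replaces each pseudojump with a concrete jump. *)
type pseudo = Label of string | PseudoJmp of string | Instr of int
type concrete = Jmp of int | Op of int

let assemble (program : pseudo list) : concrete list =
  (* Stage one: labels occupy no space; everything else one cell. *)
  let _, env =
    List.fold_left
      (fun (addr, env) i ->
        match i with
        | Label l -> (addr, (l, addr) :: env)
        | _ -> (addr + 1, env))
      (0, []) program
  in
  (* Stage two: resolve every pseudojump against the label map. *)
  List.filter_map
    (function
      | Label _ -> None
      | PseudoJmp l -> Some (Jmp (List.assoc l env))
      | Instr n -> Some (Op n))
    program

let () =
  assert (assemble [Instr 7; Label "loop"; PseudoJmp "loop"]
          = [Op 7; Jmp 1])
```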
% SECTION                                                                      %
\subsection{Anatomy of the (Matita) emulator}
The internal state of our Matita emulator is represented as a record:
record Status: Type[0] ≝
  code_memory: BitVectorTrie Byte 16;
  low_internal_ram: BitVectorTrie Byte 7;
  high_internal_ram: BitVectorTrie Byte 7;
  external_ram: BitVectorTrie Byte 16;
  program_counter: Word;
  special_function_registers_8051: Vector Byte 19;
  special_function_registers_8052: Vector Byte 5;
  ...
This record neatly encapsulates the current memory contents, the program counter, the state of the current SFRs, and so on.
One peculiarity is the packing of the 24 combined SFRs into fixed-length vectors.
This was due to a bug, since fixed, in Matita at the time we were constructing the emulator, whereby the time needed to typecheck a record grew exponentially with the number of fields.
From the record above, it appears that the MCS-51's memory spaces are completely disjoint.
This is not so; many of them overlap with each other, and there is a many-to-many relationship between addressing modes and memory spaces.
For instance, \texttt{DIRECT} addressing can be used to address low internal RAM and the SFRs, but not high internal RAM.
For simplicity, we merely treat memory spaces as if they were completely disjoint in the \texttt{Status} record.
Overlapping, and checking which addressing modes can be used to address particular memory spaces, is handled through numerous \texttt{get\_arg\_XX} and \texttt{set\_arg\_XX} (for 1, 8 and 16 bits) functions.
Both the Matita and O'Caml emulators follow the classic `fetch-decode-execute' model of processor operation.
The next instruction to be processed, indexed by the program counter, is fetched from code memory with \texttt{fetch}.
An updated program counter, along with the concrete cost, in processor cycles, of executing this instruction, is also returned.
These costs are taken from a Siemens data sheet for the MCS-51, and will likely vary across manufacturers and particular derivatives of the processor.
definition fetch:
  BitVectorTrie Byte 16 $\rightarrow$ Word $\rightarrow$ instruction $\times$ Word $\times$ nat := ...
A single instruction is assembled into its corresponding bit encoding with \texttt{assembly1}:
definition assembly1: instruction $\rightarrow$ list Byte := ...
An assembly program, consisting of a preamble containing global data and a list of (pseudo)instructions, is assembled using \texttt{assembly}.
Pseudoinstructions and labels are eliminated in favour of concrete instructions from the MCS-51 instruction set.
A map associating memory locations with cost labels (see Subsection~\ref{subsect.computation.cost.traces}) is also produced.
definition assembly:
  assembly_program $\rightarrow$ option (list Byte $\times$ (BitVectorTrie String 16)) := ...
A single execution step of the processor is evaluated using \texttt{execute\_1}, mapping a \texttt{Status} to a \texttt{Status}:
definition execute_1: Status $\rightarrow$ Status := ...
Multiple steps of processor execution are implemented in \texttt{execute}, which wraps \texttt{execute\_1}:
let rec execute (n: nat) (s: Status) on n: Status := ...
This differs slightly from the design of the O'Caml emulator, which executed a program indefinitely, and also accepted a callback function as an argument, which could `witness' the execution as it happened and provide a print-out of the processor state and other debugging information.
Due to Matita's requirement that all functions be strongly normalising, \texttt{execute} cannot execute a program indefinitely, and must instead execute a fixed number of steps.
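The bounded loop can be sketched as follows; this is a drastically simplified model in which the hypothetical \texttt{status} record holds only a program counter, whereas the real \texttt{Status} contains the full memory and register state:

```ocaml
(* Bounded execution in the style of the Matita emulator: structural
   recursion on a natural-number `fuel' argument guarantees
   termination, as strong normalisation demands. *)
type status = { pc : int }  (* hypothetical, drastically simplified *)

(* Stand-in for one fetch-decode-execute step. *)
let execute_1 (s : status) : status = { pc = s.pc + 1 }

let rec execute (n : int) (s : status) : status =
  if n <= 0 then s else execute (n - 1) (execute_1 s)

let () = assert ((execute 5 { pc = 0 }).pc = 5)
```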
% SECTION                                                                      %
\subsection{Instruction set unorthogonality}
A peculiarity of the MCS-51 is the non-orthogonality of its instruction set.
For instance, the \texttt{MOV} instruction can be invoked using one of sixteen combinations of addressing modes.
% Show example of pattern matching with polymorphic variants
Such non-orthogonality in the instruction set was handled with the use of polymorphic variants in the O'Caml emulator.
For instance, we introduced types corresponding to each addressing mode:
type direct = [ `DIRECT of byte ]
type indirect = [ `INDIRECT of bit ]
These were then used in our inductive datatype for assembly instructions, as follows:
type 'addr preinstruction =
 [ `ADD of acc * [ reg | direct | indirect | data ]
 | `MOV of
    (acc * [ reg | direct | indirect | data ],
     [ reg | indirect ] * [ acc | direct | data ],
     direct * [ acc | reg | direct | indirect | data ],
     dptr * data16,
     carry * bit,
     bit * carry
     ) union6
Here, \texttt{union6} is a disjoint union type, defined as follows:
type ('a,'b,'c,'d,'e,'f) union6 = [ `U1 of 'a | ... | `U6 of 'f ]
For our purposes, the types \texttt{union2}, \texttt{union3} and \texttt{union6} sufficed.
This polymorphic variant machinery worked well: it introduced a certain level of type safety (for instance, the type of our \texttt{MOV} instruction above guarantees it cannot be invoked with arguments in the \texttt{carry} and \texttt{data16} addressing modes, respectively), and also allowed us to pattern match against instructions, when necessary.
However, this polymorphic variant machinery is \emph{not} present in Matita.
We needed some way to produce the same effect using features that Matita does support.
For this task, we used dependent types.
We first provided an inductive data type representing all possible addressing modes, a type that functions will pattern match against:
inductive addressing_mode: Type[0] ≝
  DIRECT: Byte $\rightarrow$ addressing_mode
| INDIRECT: Bit $\rightarrow$ addressing_mode
We also wished to express in the type of functions the \emph{impossibility} of pattern matching against certain constructors.
In order to do this, we introduced an inductive type of addressing mode `tags'.
The constructors of \texttt{addressing\_mode\_tag} are in one-to-one correspondence with the constructors of \texttt{addressing\_mode}:
inductive addressing_mode_tag : Type[0] ≝
  direct: addressing_mode_tag
| indirect: addressing_mode_tag
[510]473A function that checks whether an \texttt{addressing\_mode} is `morally' an \texttt{addressing\_mode\_tag} is provided, as follows:
[539]475let rec is_a (d: addressing_mode_tag) (A: addressing_mode) on d :=
[495]476  match d with
477   [ direct $\Rightarrow$ match A with [ DIRECT _ $\Rightarrow$ true | _ $\Rightarrow$ false ]
478   | indirect $\Rightarrow$ match A with [ INDIRECT _ $\Rightarrow$ true | _ $\Rightarrow$ false ]
We also extend this check to vectors of \texttt{addressing\_mode\_tag}s in the obvious manner:
let rec is_in (n: nat) (l: Vector addressing_mode_tag n) (A: addressing_mode) on l :=
 match l return $\lambda$m.$\lambda$_: Vector addressing_mode_tag m. bool with
  [ VEmpty $\Rightarrow$ false
  | VCons m he (tl: Vector addressing_mode_tag m) $\Rightarrow$
     is_a he A $\vee$ is_in ? tl A ].
Here $\mathtt{\vee}$ is inclusive disjunction on the \texttt{bool} datatype.
record subaddressing_mode (n: nat) (l: Vector addressing_mode_tag (S n)): Type[0] :=
  subaddressing_modeel :> addressing_mode;
  subaddressing_modein: bool_to_Prop (is_in ? l subaddressing_modeel)
We can now provide an inductive type of preinstructions with precise typings:
inductive preinstruction (A: Type[0]): Type[0] ≝
   ADD: $\llbracket$ acc_a $\rrbracket$ $\rightarrow$ $\llbracket$ register; direct; indirect; data $\rrbracket$ $\rightarrow$ preinstruction A
 | ADDC: $\llbracket$ acc_a $\rrbracket$ $\rightarrow$ $\llbracket$ register; direct; indirect; data $\rrbracket$ $\rightarrow$ preinstruction A
Here $\llbracket - \rrbracket$ is syntax denoting a vector.
We see that the constructor \texttt{ADD} expects two parameters, the first being the accumulator A (\texttt{acc\_a}), and the second being one of a register, direct, indirect or data addressing mode.
% One of these coercions opens up a proof obligation which needs discussing
% Have lemmas proving that if an element is a member of a sub, then it is a member of a superlist, and so on
The final, missing component is a pair of type coercions from \texttt{addressing\_mode} to \texttt{subaddressing\_mode} and from \texttt{subaddressing\_mode} to \texttt{Type$\lbrack0\rbrack$}, respectively.
The latter coercion is largely straightforward; however, the former is not:
coercion mk_subaddressing_mode:
  $\forall$n.  $\forall$l: Vector addressing_mode_tag (S n).
  $\forall$a: addressing_mode.
  $\forall$p: bool_to_Prop (is_in ? l a). subaddressing_mode n l :=
    mk_subaddressing_mode on a: addressing_mode to subaddressing_mode ? ?.
Using this coercion opens a proof obligation wherein we must prove that the \texttt{addressing\_mode\_tag} in correspondence with the \texttt{addressing\_mode} is a member of the \texttt{Vector} of permissible \texttt{addressing\_mode\_tag}s.
This impels us to state and prove a number of auxiliary lemmas.
For instance, we prove that if an \texttt{addressing\_mode\_tag} is a member of a \texttt{Vector}, and we possess a supervector, then the same \texttt{addressing\_mode\_tag} is a member of this supervector.
Without combining Matita's automation with lemmas such as these, our approach becomes infeasible: type checking the main \texttt{execute\_1} function opens up over 200 proof obligations.
The machinery just described allows us to state in the type of a function which addressing modes that function expects.
For instance, consider \texttt{set\_arg\_16}, which expects only a \texttt{DPTR}:
definition set_arg_16: Status $\rightarrow$ Word $\rightarrow$ $\llbracket$ dptr $\rrbracket$ $\rightarrow$ Status ≝
  $\lambda$s, v, a.
   match a return $\lambda$x. bool_to_Prop (is_in ? $\llbracket$ dptr $\rrbracket$ x) $\rightarrow$ ? with
     [ DPTR $\Rightarrow$ $\lambda$_: True.
       let 〈 bu, bl 〉 := split $\ldots$ eight eight v in
       let status := set_8051_sfr s SFR_DPH bu in
       let status := set_8051_sfr status SFR_DPL bl in
         status
     | _ $\Rightarrow$ $\lambda$K: False.
       match K in False with
       [
       ]
     ] (subaddressing_modein $\ldots$ a).

All other cases are discharged by the catch-all at the bottom of the match expression.
Attempting to match against an addressing mode not included in the type (for example, \texttt{REGISTER}) produces a type error.
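For comparison, a similar restriction can be obtained in O'Caml with polymorphic variants, as in the following hypothetical sketch (the record and field names are illustrative, not those of our emulator):

```ocaml
(* A status record reduced to the two SFRs relevant here. *)
type status = { dph : int; dpl : int }

(* The argument type [ `DPTR ] admits exactly one addressing mode, so
   passing e.g. `REGISTER is rejected by the type checker, mirroring
   the behaviour of the dependently typed Matita version. *)
let set_arg_16 (_s : status) (v : int) (a : [ `DPTR ]) : status =
  match a with
  | `DPTR -> { dph = (v lsr 8) land 0xFF; dpl = v land 0xFF }
```

For example, `set_arg_16 { dph = 0; dpl = 0 } 0x1234 `DPTR` yields a status with \texttt{DPH} set to \texttt{0x12} and \texttt{DPL} set to \texttt{0x34}, while any other variant argument is a static type error.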

% Note the execute_1 function which leaves open over 200 proof obligations which we can close automatically, provided we have the lemmas mentioned above
% Give an example of the type of proof obligations left open?
% Talk about extraction to O'Caml code, which hopefully will allow us to extract back to using polymorphic variants, or when extracting vectors we could extract using phantom types
% Discuss alternative approaches, i.e. Sigma types to piece together smaller types into larger ones, as opposed to using a predicate to `cut out' pieces of a larger type, which is what we did

% SECTION                                                                      %

\subsection{I/O and timers}

% `Real clock' for I/O and timers
The O'Caml emulator contains code for handling timers, asynchronous I/O and interrupts (these are not yet ported to the Matita emulator).
All three of these features interact with each other in subtle ways.
For instance, interrupts can `fire' when an input is detected on the processor's UART port, and, in certain modes, timers reset when a high signal is detected on one of the MCS-51's communication pins.

To accurately model timers and I/O, we add to the central \texttt{status} record of the emulator an unbounded integral field \texttt{clock}, which keeps track of the current time. This field is purely logical, since it does not correspond to any quantity stored in the actual processor.
Before every execution step, \texttt{clock} is incremented by the number of processor cycles that the instruction just fetched will take to execute.
The processor then executes the instruction, followed by the code implementing the timers and I/O\footnote{It is not fully specified by the manufacturer's data sheets whether I/O is handled at the beginning or at the end of each cycle.}.
In order to model I/O, we also store in the status a \emph{continuation}, which describes the behaviour of the environment:

type line =
  [ `P1 of byte | `P3 of byte
  | `SerialBuff of [ `Eight of byte | `Nine of BitVectors.bit * byte ]]
type continuation =
  [`In of time * line * epsilon * continuation] option *
  [`Out of (time -> line -> time * continuation)]

At each moment, the second projection of the continuation $k$ describes how
the environment will react to an output operation performed in the future by
the processor: if the processor at time $\tau$ starts an asynchronous output
$o$ either on the $P1$ or $P3$ serial lines or on the UART, then the
environment will receive the output at time $\tau'$, and moreover
the status is immediately updated with the continuation $k'$, where
$\pi_2(k)(\tau,o) = \langle \tau',k' \rangle$. Similarly, if
$\pi_1(k) = \mathtt{Some}~\langle \tau',i,\epsilon,k'\rangle$, then at time
$\tau'$ the environment will send the asynchronous input $i$ to the processor
and the status will be updated with the continuation $k'$. The input
becomes visible to the processor only at time $\tau' + \epsilon$.
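As an illustration, the following hypothetical O'Caml fragment instantiates these types with concrete, simplified representations (we take \texttt{time} and \texttt{epsilon} to be integers and bytes to be plain \texttt{int}s, whereas the emulator uses bitvectors) and builds a trivial environment:

```ocaml
(* Simplified stand-ins for the emulator's types. *)
type byte = int
type bit = bool
type time = int
type epsilon = int

type line =
  [ `P1 of byte | `P3 of byte
  | `SerialBuff of [ `Eight of byte | `Nine of bit * byte ] ]

type continuation =
  [ `In of time * line * epsilon * continuation ] option *
  [ `Out of (time -> line -> time * continuation) ]

(* A silent environment: it never sends input (first projection None)
   and acknowledges any output one time unit after it is started. *)
let rec silent : continuation =
  (None, `Out (fun tau _o -> (tau + 1, silent)))

(* Starting an output on P1 at time 5: the environment receives it at
   time 6, and the status keeps the (unchanged) continuation. *)
let () =
  let (_, `Out out) = silent in
  let (tau', _k') = out 5 (`P1 0xFF) in
  assert (tau' = 6)
```

The environment of a real test is, of course, a less degenerate continuation, scheduling inputs and computing realistic acknowledgement delays.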

The time required to perform an I/O operation is only partially
specified in the data sheets of the UART chip, and its computation is so
complicated that we prefer to abstract over it, leaving the computation
of the delay to the environment.

We use only the P1 and P3 lines, despite the MCS-51 having four I/O lines, P0--P3: P0 and P2 become inoperable if the processor is equipped with XRAM (which we assume it is). The UART port can work in several modes,
depending on the values of some SFRs. In the asynchronous modes it transmits
eight bits at a time, using a ninth line for synchronisation. In the
synchronous modes the ninth line is used to transmit an additional bit.

% SECTION                                                                      %

\subsection{Computation of cost traces}

As mentioned in Subsection~\ref{subsect.labels.pseudoinstructions}, we introduced a notion of \emph{cost label}.
Cost labels are inserted by the prototype C compiler at specific locations in the object code.
Roughly, for those familiar with control flow graphs, they are inserted at the start of every basic block.

Cost labels are used to calculate a precise costing for a program by marking the locations of basic blocks.
During the assembly phase, where labels and pseudoinstructions are eliminated, a map is generated associating cost labels with memory locations.
This map is later used by a separate analysis which computes the cost of a program by traversing it, fetching one instruction at a time and computing the cost of each block.
These block costings are stored in another map, which is later passed back to the prototype compiler.
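The block-costing traversal can be sketched in O'Caml under strong simplifying assumptions: a flat code memory, and a \texttt{fetch\_cost} oracle returning, for an address, the cycle count of the instruction there and the address of its successor. All names here are illustrative, not those of our analysis:

```ocaml
(* Sum instruction costs from [start] up to (but excluding) the next
   cost-labelled address, i.e. the next basic block boundary. *)
let block_cost fetch_cost labelled code_size start =
  let rec go pc acc =
    let (cycles, next) = fetch_cost pc in
    if next >= code_size || List.mem next labelled then acc + cycles
    else go next (acc + cycles)
  in
  go start 0

(* Toy example: five one-byte, one-cycle instructions, with cost
   labels at addresses 0 and 3, giving blocks {0,1,2} and {3,4}. *)
let () =
  let fetch_cost pc = (1, pc + 1) in
  assert (block_cost fetch_cost [0; 3] 5 0 = 3);
  assert (block_cost fetch_cost [0; 3] 5 3 = 2)
```

Iterating \texttt{block\_cost} over every labelled address yields the map from cost labels to block costings described above.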

% SECTION                                                                      %

We spent considerable effort attempting to ensure that our formalisation is correct; that is, that what we have formalised really is an accurate model of the MCS-51 microprocessor.

First, we made use of multiple data sheets, each from a different semiconductor manufacturer.
This helped us spot errors in the specification of the processor's instruction set and its behaviour.

The O'Caml prototype was especially useful for validation purposes, because for it we wrote a module for parsing and loading files in the Intel HEX format.
HEX is a standard format produced by compilers targeting the MCS-51 and similar processors.
It is essentially a textual snapshot of the processor's code memory.
Using this, we were able to compile C programs with SDCC, an open source compiler, and load the resulting program directly into our emulator's code memory, ready for execution.
Further, we are able to produce a HEX file from our emulator's code memory, for loading into third party tools.
After each step of execution, we can print out both the instruction that has just been executed, along with its arguments, and a snapshot of the processor's state, including all flags and register contents.
For example:

08: mov 81 #07

 Processor status:

   ACC : 0 (00000000) B   : 0 (00000000)
   PSW : 0 (00000000) with flags set as:
     CY  : false   AC  : false
     FO  : false   RS1 : false
     RS0 : false   OV  : false
     UD  : false   P   : false
   SP  : 7 (00000111) IP  : 0 (00000000)
   PC  : 8 (0000000000001000)
   DPL : 0 (00000000) DPH : 0 (00000000)
   SCON: 0 (00000000) SBUF: 0 (00000000)
   TMOD: 0 (00000000) TCON: 0 (00000000)
   Registers:
    R0 : 0 (00000000) R1 : 0 (00000000)
    R2 : 0 (00000000) R3 : 0 (00000000)
    R4 : 0 (00000000) R5 : 0 (00000000)
    R6 : 0 (00000000) R7 : 0 (00000000)

Here, the trace indicates that the instruction \texttt{mov 81 \#07} has just been executed by the processor, which is now in the state indicated.
These traces were useful in spotting anything that was `obviously' wrong with the execution of a program.
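The Intel HEX format mentioned above is simple enough to sketch here. A record is a line \texttt{:LLAAAATTDD...CC}, where \texttt{LL} is the data byte count, \texttt{AAAA} the load address, \texttt{TT} the record type, \texttt{DD...} the data bytes, and \texttt{CC} a checksum chosen so that the sum of all the record's bytes vanishes modulo 256. The following O'Caml fragment is a hypothetical, minimal parser, not the module used in our emulator:

```ocaml
(* Parse one Intel HEX record into (address, record type, data bytes).
   Raises Failure on malformed input or checksum mismatch. *)
let parse_record (line : string) : int * int * int list =
  if String.length line < 11 || line.[0] <> ':' then failwith "malformed record";
  let byte i = int_of_string ("0x" ^ String.sub line (1 + 2 * i) 2) in
  let count = byte 0 in
  let addr  = (byte 1 lsl 8) lor byte 2 in
  let rtype = byte 3 in
  let data  = List.init count (fun i -> byte (4 + i)) in
  (* The sum of every byte, checksum included, must be 0 mod 256. *)
  let sum = List.fold_left (+) 0 (List.init (5 + count) byte) in
  if sum land 0xFF <> 0 then failwith "bad checksum";
  (addr, rtype, data)

let () =
  (* A data record loading 02 33 7A at address 0x0030, then an
     end-of-file record. *)
  assert (parse_record ":0300300002337A1E" = (0x30, 0, [0x02; 0x33; 0x7A]));
  assert (parse_record ":00000001FF" = (0, 1, []))
```

Populating code memory then amounts to folding data records (type \texttt{00}) into a map from addresses to bytes, stopping at the end-of-file record (type \texttt{01}).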

Further, we made use of an open source emulator for the MCS-51, \texttt{mcu8051ide}.
Using our execution traces, we were able to step through a compiled program, one instruction at a time, in \texttt{mcu8051ide}, and compare the resulting execution trace with the trace produced by our emulator.

Our Matita formalisation was largely copied from the O'Caml source code, apart from the changes related to addressing modes already mentioned.
However, as the Matita emulator is executable, we could perform further validation by comparing the trace of a program's execution in the Matita emulator with the trace of the same program in the O'Caml emulator.

% SECTION                                                                      %

\section{Related work}

There is a sizeable literature on the formalisation of microprocessors.
Most of it is aimed at proving the correctness of microprocessor implementations at the microcode or circuit level.
We are instead interested in providing a precise specification of the behaviour of the microprocessor, in order to prove the correctness of compilers; in particular, we are also interested in the precise timings of the processor.
Moreover, we have formalised not only the interface of an MCS-51 processor, but a complete MCS-51 system: the UART, the I/O lines and the hardware timers.
%An assembler for MCS-51 is also provided, and it supports pseudo instructions.

One of the works closest to ours is~\cite{A Trustworthy Monadic Formalization of the ARMv7 Instruction Set Architecture Anthony Fox and Magnus O. Myreen}, which describes a formalisation of the ARMv7 instruction set architecture and also contains a good survey of related work in the literature.

Like ours, their model operates at the machine code level, as opposed to a more abstract assembly code level; in particular, instruction decoding is explicitly modelled inside the logic. In addition, we also provide an assembly level, together with an assembler that translates instructions and pseudoinstructions to machine code.

From the perspective of trustworthiness, we have only performed a non-exhaustive validation against a pre-existing emulator, whereas in~\cite{again} development boards and random testing were used to increase confidence in the formalisation. For the moment, we leave this to future work.

Another difference is that in~\cite{again} the authors provide an automation layer to derive single-step theorems: if the processor is in a state that satisfies some preconditions, then after executing an instruction it will satisfy some postconditions. We do not need single-step theorems, both because Matita is based on a logic that internalises conversion and because we gave an executable formalisation, i.e.\ an emulation function from states to states: applying the emulation function to an input state reduces to an output state that already satisfies the appropriate conditions.

Most of the difficulties we had to face are related to the non-uniformity of 8-bit architectures, in terms of instruction sets, addressing modes and memory models. No such difficulties arise when formalising the ARM instruction set.

The project closest to CerCo is CompCert~\cite{compcert}, which concerns the certification of a compiler targeting the ARM architecture. CompCert includes a formalisation in Coq of a subset of the ARM instruction set. Coq and Matita share the same logic; however, the two formalisations do not have much in common. First of all, CompCert provides an assembly-level model (with no instruction decoding), and must therefore trust an unformalised assembler and linker, whereas we provide our own. I/O is also not considered at all. Moreover, CompCert assumes an idealised, abstract and uniform memory model, while we take into account the complicated memory model of the MCS-51 architecture. Finally, CompCert formalises only about 90 of the 200+ instructions offered by the processor, and augments the assembly language with macro instructions that are turned into real instructions only during communication with the external assembler. Even technically the two formalisations differ: while we tried to exploit dependent types as much as possible, CompCert sticks to the non-dependent fragment of the logic of Coq.

In~\cite{Robert Atkey. CoqJVM: An executable specification of the Java virtual machine using dependent types. In Marino Miculan, Ivan Scagnetto, and Furio Honsell, editors, TYPES, volume 4941 of Lecture Notes in Computer Science, pages 18–32.  Springer, 2007.}, Atkey presents an executable specification of the Java virtual machine using dependent types. As in our development, dependent types are used to remove spurious partiality from the model and to reduce the need to over-specify the behaviour of the processor in impossible cases. Dependent types will also help to maintain invariants when proving the correctness of the CerCo compiler.

Finally, in~\cite{Susmit Sarkar, Pater Sewell, Francesco Zappa Nardelli, Scott Owens, Tom Ridge, Thomas Braibant Magnus O. Myreen, and Jade Alglave. The semantics of x86-CC multiprocessor machine code. In Principles of Programming Languages (POPL). ACM, 2009.}, Sarkar et al.\ provide an executable semantics for x86-CC multiprocessor machine code, an instruction set which, like that of the MCS-51, exhibits a high degree of non-uniformity. However, they consider only a very small subset of the instructions, and they over-approximate the non-orthogonality of the instruction set, dodging the problems we had to face.
The most interesting idea in their formalisation, from our point of view, is the specification of the decode function, which is particularly error-prone: they formalise in HOL a small language of patterns, the same one used in the x86 data sheets, so that the decoding function can be implemented simply by copying the relevant lines from the manual into the HOL script.
We are currently considering whether to implement a similar solution in Matita.
However, we would prefer to certify in Matita a compiler for the pattern language, so that the data sheets could be compiled down to the efficient code we provide, instead of being inefficiently interpreted every time an instruction is executed.

% SECTION                                                                      %

\CSC{Tell what is NOT formalized/formalizable: the HEX parser/pretty printer
 and/or the I/O procedure}
\CSC{Decode: two implementations}
\CSC{Discuss over-specification}

  How to test it? Specify it?

\section{Listing of main O'Caml functions}

\subsubsection{From \texttt{}}

\begin{tabular}{lp{0.7\linewidth}}
Name & Description \\
\hline
\texttt{assembly} & Assembles an abstract syntax tree representing an 8051 assembly program into a list of bytes, its compiled form. \\
\texttt{initialize} & Initializes the emulator status. \\
\texttt{load} & Loads an assembled program into the emulator's code memory. \\
\texttt{fetch} & Fetches the next instruction, and automatically increments the program counter. \\
\texttt{execute} & Emulates the processor.  Accepts as input a function that pretty prints the emulator status after every emulation loop. \\
\end{tabular}

\subsubsection{From \texttt{}}

\begin{tabular}{lp{0.7\linewidth}}
Name & Description \\
\hline
\texttt{compute} & Computes a map associating costings to basic blocks in the program.
\end{tabular}

\subsubsection{From \texttt{}}

\begin{tabular}{lp{0.7\linewidth}}
Name & Description \\
\hline
\texttt{intel\_hex\_of\_file} & Reads in a file and parses it if in Intel IHX format, otherwise raises an exception. \\
\texttt{process\_intel\_hex} & Accepts a parsed Intel IHX file and populates a hashmap (of the same type as code memory) with the contents.
\end{tabular}

\subsubsection{From \texttt{}}

\begin{tabular}{lp{0.7\linewidth}}
Name & Description \\
\hline
\texttt{subb8\_with\_c} & Performs an eight bit subtraction on bitvectors.  The function also returns the most important PSW flags for the 8051: carry, auxiliary carry and overflow. \\
\texttt{add8\_with\_c} & Performs an eight bit addition on bitvectors.  The function also returns the most important PSW flags for the 8051: carry, auxiliary carry and overflow. \\
\texttt{dec} & Decrements an eight bit bitvector with underflow, if necessary. \\
\texttt{inc} & Increments an eight bit bitvector with overflow, if necessary.
\end{tabular}
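The flag computation performed by a function such as \texttt{add8\_with\_c} can be illustrated by the following hypothetical O'Caml sketch over plain integers (the real function operates on bitvectors); the flag definitions follow the standard 8051 PSW semantics:

```ocaml
(* 8-bit addition returning (result, carry, auxiliary carry, overflow).
   CY is the carry out of bit 7, AC the carry out of bit 3, and OV the
   exclusive or of the carries into and out of bit 7. *)
let add8_with_c (a : int) (b : int) (c : bool) : int * bool * bool * bool =
  let c0 = if c then 1 else 0 in
  let ac = (a land 0x0F) + (b land 0x0F) + c0 > 0x0F in
  let sum = a + b + c0 in
  let cy = sum > 0xFF in
  let c6 = (a land 0x7F) + (b land 0x7F) + c0 > 0x7F in  (* carry into bit 7 *)
  (sum land 0xFF, cy, ac, c6 <> cy)

let () =
  (* 0x7F + 1 overflows the signed range but not the unsigned one. *)
  assert (add8_with_c 0x7F 0x01 false = (0x80, false, true, true));
  (* 0xFF + 1 wraps around with a carry but no signed overflow. *)
  assert (add8_with_c 0xFF 0x01 false = (0x00, true, true, false))
```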

\section{Listing of main Matita functions}

\subsubsection{From \texttt{}}

\begin{tabular}{lp{0.7\linewidth}}
Title & Description \\
\hline
\texttt{add\_n\_with\_carry} & Performs an $n$ bit addition on bitvectors.  The function also returns the most important PSW flags for the 8051: carry, auxiliary carry and overflow. \\
\texttt{sub\_8\_with\_carry} & Performs an eight bit subtraction on bitvectors. The function also returns the most important PSW flags for the 8051: carry, auxiliary carry and overflow. \\
\texttt{half\_add} & Performs a standard half addition on bitvectors, returning the result and carry bit. \\
\texttt{full\_add} & Performs a standard full addition on bitvectors and a carry bit, returning the result and a carry bit.
\end{tabular}

\subsubsection{From \texttt{}}

\begin{tabular}{lp{0.7\linewidth}}
Title & Description \\
\hline
\texttt{assemble1} & Assembles a single 8051 assembly instruction into its memory representation. \\
\texttt{assemble} & Assembles an 8051 assembly program into its memory representation.\\
\texttt{assemble\_unlabelled\_program} &\\& Assembles a list of (unlabelled) 8051 assembly instructions into its memory representation.
\end{tabular}

\subsubsection{From \texttt{}}

\begin{tabular}{lp{0.7\linewidth}}
Title & Description \\
\hline
\texttt{lookup} & Returns the data stored at the end of a particular path (a bitvector) from the trie.  If no data exists, returns a default value. \\
\texttt{insert} & Inserts data into a tree at the end of the path (a bitvector) indicated.  Automatically expands the tree (by filling in stubs) if necessary.
\end{tabular}

\subsubsection{From \texttt{}}

\begin{tabular}{lp{0.7\linewidth}}
Title & Description \\
\hline
\texttt{execute\_trace} & Executes an assembly program for a fixed number of steps, recording in a trace which instructions were executed.
\end{tabular}

\subsubsection{From \texttt{}}

\begin{tabular}{lp{0.7\linewidth}}
Title & Description \\
\hline
\texttt{fetch} & Decodes and returns the instruction currently pointed to by the program counter, and automatically increments the program counter by the required amount so that it points to the next instruction. \\
\end{tabular}

\subsubsection{From \texttt{}}

\begin{tabular}{lp{0.7\linewidth}}
Title & Description \\
\hline
\texttt{execute\_1} & Executes a single step of an 8051 assembly program. \\
\texttt{execute} & Executes a fixed number of steps of an 8051 assembly program.
\end{tabular}

\subsubsection{From \texttt{}}

\begin{tabular}{lp{0.7\linewidth}}
Title & Description \\
\hline
\texttt{load} & Loads an assembled 8051 assembly program into code memory.
\end{tabular}