# Changeset 817 for Deliverables/D4.1/ITP-Paper

Ignore:
Timestamp:
May 19, 2011, 3:16:20 PM (10 years ago)
Message:

final changes, now to get under 8 pages

Location:
Deliverables/D4.1/ITP-Paper
Files:
2 edited

### Legend:

Unmodified
 r815 We have fully formalised an executable formal semantics of a family of 8-bit Freescale microprocessors~\cite{oliboni:matita:2008}, and provided a similar executable formal semantics for the MCS-51 microprocessor. The latter is what we describe in this paper. The focus of the formalisation has been on capturing the intensional behaviour of the processor. The focus of the formalisation has been on capturing the intensional behaviour of the processor; this is not novel. However, the design of the MCS-51 itself has caused problems in the formalisation. For example, the MCS-51 has a highly unorthogonal instruction set. To cope with this unorthogonality, and to produce an executable specification, we rely on the dependent types of Matita, an interactive proof assistant~\cite{asperti:user:2007}. The manner in which we combined dependent types and coercions to handle this problem is novel. \paragraph*{The MCS-51}\quad \paragraph*{Overview of paper}\quad In Section~\ref{sect.design.issues.formalisation} we discuss design issues in the development of the formalisation. In Section~\ref{sect.validation} we discuss how we validated the design and implementation of the emulator to ensure that what we formalised was an accurate model of an MCS-51 series microprocessor. In Section~\ref{sect.validation} we discuss validation of the emulator to ensure that what we formalised was an accurate model of an MCS-51 series microprocessor. In Section~\ref{sect.related.work} we describe previous work, with an eye toward describing its relation with the work described herein. In Section~\ref{sect.conclusions} we conclude. \section{Design issues in the formalisation} \label{sect.design.issues.formalisation} We implemented two emulators, one in O'Caml and one in Matita. The O'Caml emulator is intended to be feature complete' with respect to the MCS-51 device. However, the Matita emulator is intended to be used as a target for a certified, complexity preserving C compiler. As a result, not all features of the MCS-51 are formalised in the Matita emulator. We designed the O'Caml emulator to be as efficient as possible, under the constraint that it would be eventually translated into Matita. One performance drain in the O'Caml emulator is the use of purely functional map datastructures to represent memory spaces, used to maintain the close correspondence between the Matita and O'Caml emulators. Matita~\cite{asperti:user:2007} is a proof assistant based on the Calculus of Coinductive constructions, similar to Coq. We traverse a path, and upon encountering \texttt{Stub}, we return a default value\footnote{All manufacturer data sheets that we consulted were silent on the subject of what should be returned if we attempt to access uninitialized memory.  We defaulted to simply returning zero, though our \texttt{lookup} function is parametric in this choice.  We do not believe that this is an outrageous decision, as SDCC for instance generates code which first zeroes out' all memory in a preamble before executing the program proper.  This is in line with the C standard, which guarantees that all global variables will be zero initialized piecewise.}. \texttt{BitVectorTrie} and \texttt{Vector} datastructures, and related functions, can be used in the formalising other microprocessors. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % SECTION                                                                      % While introducing pseudoinstructions, we also introduced labels for locations to jump to, and for global data. To specify global data via labels, we introduced a preamble before the program where labels and the size of reserved space for data is stored. A pseudoinstruction \texttt{Mov} moves (16-bit) data stored at a label into the MCS-51's one 16-bit register, \texttt{DPTR}. A pseudoinstruction \texttt{Mov} moves (16-bit) data stored at a label into the MCS-51's 16-bit register, \texttt{DPTR}. (This register is used for indirect addressing of data stored in external memory.) We use only the P1 and P3 lines despite the MCS-51 having 4 output lines, P0--P3. This is because P0 and P2 become inoperable if the processor is equipped with XRAM (we assume it is). This is because P0 and P2 become inoperable if XRAM is present (we assume it is). The UART port can work in several modes, depending on the how the SFRs are set. Using these execution traces, we were able to step through a compiled program in MCU 8051 IDE and compare the resulting execution trace with the trace produced by our emulator. We partially validated the assembler by checking that on defined opcodes the \texttt{assembly\_1} and \texttt{fetch} functions are inverse. We partially validated the assembler by proving in Matita that on all defined opcodes the \texttt{assembly\_1} and \texttt{fetch} functions are inverse. The Matita formalisation was largely copied from the O'Caml source code, apart from the changes already mentioned. \label{sect.related.work} A large body of literature on the formalisation of microprocessors exists. The majority of it deals with proving correctness of implementations of microprocessors at the microcode or gate level. The majority of it deals with proving correctness of implementations of microprocessors at the microcode or gate level, with many considering cycle accurate' models. We are interested in providing a precise specification of the behaviour of the microprocessor in order to prove the correctness of a compiler which will target the processor. In particular, we are interested in intensional properties of the processor; precise timings of instruction execution in clock cycles. In contrast, the various ARM instruction sets and memory models are relatively uniform. Perhaps the closest project to CerCo is CompCert~\cite{leroy:formally:2009}. Two close projects to CerCo are CompCert~\cite{leroy:formally:2009} and the CLI Stack. CompCert concerns the certification of a C compiler and includes a formalisation in Coq of a subset of PowerPC. The CLI Stack consists of the design and verification of a whole chain of artifacts including a 32-bit microprocessor, the Piton assembler and two compilers for high-level languages. Like CerCo, the CLI Stack compilers gave the cost of high-level instructions in processor cycles. However, unlike CerCo, both the CLI Stack high-level languages ($\mu$Gypsy and Nqthm Pure Lisp) and FM9001 microprocessor were not commercial products, but artificial' designs created for the purpose of verification (see~\cite{moore:grand:2005}). The CompCert C compiler is extracted to O'Caml using Coq's code extraction facility. We aim to make use of a similar facility in Matita. Many other formalised emulators/compilers have also been extracted from proof assistants using similar technology (e.g. see~\cite{blanqui:designing:2010}). Despite this similarity, the two formalisations do not have much in common. We aim to make use of a similar code extraction facility in Matita, but only if the extracted code exhibits the same degree of type safety, provided by polymorphic variants, and human readability that the O'Caml emulator posseses. This is because we aim to use the emulator as a library for non-certified software written directly in O'Caml. How we have used Matita's dependent types to handle the instruction set (Subsection~\ref{subsect.instruction.set.unorthogonality}) could enable code extraction to make use of polymorphic variants. Using Coq's current code extraction algorithm we could write assembly programs that would generate runtime errors when emulated. We leave this for future work. Despite the apparent similarity between CerCo and CompCert, the two formalisations do not have much in common. First, CompCert provides a formalisation at the assembly level (no instruction decoding). This impels them to trust an unformalised assembler and linker, whereas we provide our own. Our main difficulty in formalising the MCS-51 was the unorthogonality of its memory model and instruction set. These problems are easily handled in O'Caml by using advanced language features like polymorphic variants and phantom types, simulating Generalized Abstract Data Types~\cite{xi:guarded:2003}. In Matita, we use dependent types to recover the same flexibility, to reduce spurious partiality, and to grant invariants that will be later useful in other formalisations in the CerCo project. Importantly, we discovered the best manner of using dependent types to recover the same flexibility, to reduce spurious partiality, and to grant invariants that will be later useful in other formalisations in the CerCo project. The formalisation has been partially verified by computing execution traces on selected programs and comparing them with an existing emulator.