# Changeset 569

Ignore:
Timestamp:
Feb 18, 2011, 10:41:56 AM (8 years ago)
Message:

Tightened up English in the Introduction

File:
1 edited

### Legend:

Unmodified
 r568 The MCS-51 is a widely used 8-bit microprocessor, especially popular in embedded devices. We proceeded in two stages, first implementing in O'Caml a prototype emulator, where bugs could be ironed out' quickly. We then ported our O'Caml emulator to Matita's internal language. The formalisation proceeded in two stages, first implementing an O'Caml prototype for quickly ironing out' bugs. We then ported the O'Caml emulator to Matita. Though mostly straight-forward, this porting presented multiple problems. Of particular interest is how we handle the extreme non-orthoganality of the MSC-51's instruction set. In O'Caml, this was handled through heavy use of polymorphic variants. In Matita, we achieve the same effect through a non-standard use of dependent types. Of particular interest is how we handled the non-orthoganality of the MSC-51's instruction set. In O'Caml, this was handled with polymorphic variants. In Matita, we achieved the same effect with a non-standard use of dependent types. Both the O'Caml and Matita emulators are executable'. % SECTION                                                                      % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \section{Background} \section{Introduction} \label{sect.introduction} Formal methods are designed to increase our confidence in the design and implementation of software (and hardware). Ideally, we would like all software to come equipped with a formal specification, along with a proof of correctness that the software meets this specification. Today the majority of programs are written in high level languages and then compiled into low level ones. Specifications are therefore also given at a high level and correctness can be proved by reasoning automatically or interactively on the program's source code. The code that is actually run, however, is not the high level source code that we reason on, but the object code that is generated by the compiler. Formal methods aim to increase our confidence in the design and implementation of software. Ideally, all software should come equipped with a formal specification and a proof of correctness for the corresponding implementation. The majority of programs are written in high level languages and then compiled into low level ones. Specifications are therefore also given at a high level and correctness can be proved by reasoning on the program's source code. The code that is actually run, however, is not the high level source code that we reason on, but low level code generated by the compiler. A few questions now arise: \begin{itemize*} These questions, and others like them, motivate a current hot topic' in computer science research: \emph{compiler verification} (for instance~\cite{leroy:formal:2009,chlipala:verified:2010}, and many others). So far, the field has only been focused on the first and last questions. Much attention has been placed on verifying compiler correctness with respect to extensional properties of programs, which are easily preserved during compilation; it is sufficient to completely preserve the denotational semantics of the input program. Much attention has been placed on verifying compiler correctness with respect to extensional properties of programs. These are easily' preserved during compilation. If we consider intensional properties of programs---space, time, and so forth---the situation is more complex. To express these properties, and reason about them, we must adopt a cost model that assigns a cost to single, or blocks, of instructions. A compositional cost model---assigning the same cost to all occurrences of one instruction---would be ideal. A compositional cost model, assigning the same cost to all occurrences of one instruction, would be ideal. However, compiler optimisations are inherently non-compositional: each occurrence of a high level instruction may be compiled in a different way depending on its context. Therefore both the cost model and intensional specifications are affected by the compilation process. A user can certify that deadlines' are met whilst wringing as many clock cycles from the processor---using a cost model that does not over-estimate---as possible. We also see our approach as being relevant to the compiler verification (and construction) itself. An optimisation specified only extensionally is only half specified; though the optimisation may preserve the denotational semantics of a program, there is no guarantee that any intensional properties of the program will be improved. We also see our approach as being relevant to compiler verification (and construction) itself. \emph{An optimisation specified only extensionally is only half specified}. Though the optimisation may preserve the denotational semantics of a program, there is no guarantee that any intensional properties of the program will be improved. Another potential application is toward completeness and correctness of the compilation process in the presence of space constraints. A compiler could potentially reject a source program targetting an embedded system when the size of the compiled code exceeds the available ROM size. A compiler could reject a source program targetting an embedded system when the size of the compiled code exceeds the available ROM size. Preservation of a program's semantics may only be required for those programs that do not exhaust the stack or heap. Hence the statement of completeness of the compiler must take in to account a realistic cost model. With the CerCo methodology, we assume we can assign to the object code exact and realistic costs for sequential blocks of instructions. This is possible with modern processors (see~\cite{bate:wcet:2011,yan:wcet:2008} for instance) but difficult, as the execution of a program has an influence on the speed of processing. Caching, memory effects, and other advanced features such as branch prediction all have a profound effect on execution speeds. The statement of completeness of the compiler must therefore take in to account a realistic cost model. With the CerCo methodology, we assume we can assign to object code exact and realistic costs for sequential blocks of instructions. This is possible with modern processors (see~\cite{bate:wcet:2011,yan:wcet:2008} for instance) but difficult, as the structure and execution of a program itself has an influence on the speed of processing. Caching, memory effects, and advanced features such as branch prediction all have an effect on execution speed. For this reason CerCo decided to focus on 8-bit microprocessors. These are still widely used in embedded systems, with the advantage of an easily predictable cost model due to their relative paucity of features. We have fully formalised an executable formal semantics of a family of 8-bit Freescale Microprocessors~\cite{oliboni:matita:2008}, and provided a similar executable formal semantics for the MCS-51 microprocessor. These are still used in embedded systems, with the advantage of a predictable cost model due to their relative paucity of features. We have fully formalised an executable formal semantics of a family of 8-bit Freescale microprocessors~\cite{oliboni:matita:2008}, and provided a similar executable formal semantics for the MCS-51 microprocessor. The latter is what we describe in this paper. The focus of the formalisation has been on capturing the intensional behaviour of the processor. However, the design of the MCS-51 itself has caused problems in our formalisation. For example, the MCS-51 has a highly unorthogonal instruction set. To cope with this unorthogonality, and to produce an executable specification, we have Matita's dependent types. \subsection{The 8051/8052} \label{subsect.8051-8052} To cope with this unorthogonality, and to produce an executable specification, we rely on Matita's dependent types. \paragraph{The MCS-51}\quad The MCS-51 is an 8-bit microprocessor introduced by Intel in the late 1970s. Commonly called the 8051, in the three decades since its introduction the processor has become a highly popular target for embedded systems engineers. Further, the processor, its immediate successor the 8052, and many derivatives are still manufactured \emph{en masse} by a host of semiconductor suppliers. The 8051 is a well documented processor, and has the additional support of numerous open source and commercial tools, such as compilers for high-level languages and emulators. For instance, the open source Small Device C Compiler (SDCC) recognises a dialect of C~\cite{sdcc:2010}, and other compilers targeting the 8051 for BASIC, Forth and Modula-2 are also extant. An open source emulator for the processor, MCU-8051 IDE, is also available~\cite{mcu8051ide:2010}. Both MCU-8051 IDE and SDCC were used profitably in the implementation of our formalisation. Commonly called the 8051, in the decades since its introduction the processor has become a popular component of embedded systems. The processor, its successor the 8052, and derivatives are still manufactured \emph{en masse} by a host of vendors. The 8051 is a well documented processor, and has the support of numerous open source and commercial tools, such as compilers and emulators. For instance, the open source Small Device C Compiler (SDCC) recognises a dialect of C~\cite{sdcc:2010}, and other compilers for BASIC, Forth and Modula-2 are also extant. An open source emulator for the processor, MCU 8051 IDE, is also available~\cite{mcu8051ide:2010}. Both MCU 8051 IDE and SDCC were used in for validating our formalisation. \begin{figure}[t] Processor RAM is divided into numerous segments, with the most prominent division being between internal and (optional) external memory. Internal memory, commonly provided on the die itself with fast access, is composed of 256 bytes, but, in direct addressing mode, half of them are overloaded with 128 bytes of memory mapped Special Function Registers (SFRs) which control the operation of the processor. Internal RAM (IRAM) is further divided into eight general purpose bit-addressable registers (R0--R7). These sit in the first eight bytes of IRAM, though can be programmatically shifted up' as needed. Bit memory, followed by a small amount of stack space, resides in the memory space immediately after the register banks. What remains of the IRAM may be treated as general purpose memory. Internal memory, commonly provided on the die itself with fast access, is composed of 256 bytes, but, in direct addressing mode, half of them are overloaded with 128 bytes of memory-mapped Special Function Registers (SFRs). SFRs control the operation of the processor. Internal RAM (IRAM) is divided again into 8 general purpose bit-addressable registers (R0--R7). These sit in the first 8 bytes of IRAM, though can be programmatically shifted up' as needed. Bit memory, followed by a small amount of stack space, resides in the memory space immediately following the register banks. What remains of IRAM may be treated as general purpose memory. A schematic view of IRAM layout is also provided in Figure~\ref{fig.memory.layout}. External RAM (XRAM), limited to a maximum size of 64 kilobytes, is optional, and may be provided on or off chip, depending on the manufacturer. External RAM (XRAM), limited to a maximum size of 64 kilobytes, is optional, and may be provided on or off chip, depending on the vendor. XRAM is accessed using a dedicated instruction, and requires 16 bits to address fully. External code memory (XCODE) is often stored in the form of an EPROM, and limited to 64 kilobytes in size. However, depending on the particular manufacturer and processor model, a dedicated on-die read-only memory area for program code (ICODE) may also be supplied. External code memory (XCODE) is often stored as an EPROM, and limited to 64 kilobytes in size. However, depending on the particular processor model, a dedicated on-die read-only memory area for program code (ICODE) may be supplied. Memory may be addressed in numerous ways: immediate, direct, indirect, external direct and code indirect. As the latter two addressing modes hint, there are some restrictions enforced by the 8051 and its derivatives on which addressing modes may be used with specific types of memory. For instance, the 128 bytes of extra internal RAM that the 8052 features cannot be addressed using indirect addressing; rather, external (in)direct addressing must be used. Moreover, some memory segments are addressed using 8-bit pointers while others require 16-bits. As the latter two addressing modes hint, there are some restrictions enforced by the 8051, and its derivatives, on which addressing modes may be used with specific types of memory. For instance, the extra 128 bytes of IRAM of the 8052 cannot be addressed using indirect addressing; rather, external (in)direct addressing must be used. Moreover, some memory segments are addressed using 8-bit pointers while others require 16-bits. The 8051 possesses an 8-bit Arithmetic and Logic Unit (ALU), with a variety of instructions for performing arithmetic and logical operations on bits and integers. Two 8-bit general purpose accumulators, A and B, are provided. Communication with the device is facilitated by an onboard UART serial port, and associated serial controller, which can operate in numerous modes. Communication with the device is handled by an inbuilt UART serial port and controller. This can operate in numerous modes. Serial baud rate is determined by one of two 16-bit timers included with the 8051, which can be set to multiple modes of operation. (The 8052 provides an additional 16-bit timer.) As an additional method of communication, the 8051 also provides a four byte bit-addressable input-output port. The programmer may take advantage of the interrupt mechanism that the processor provides. This is especially useful when dealing with input or output involving the serial device, as an interrupt can be set when a whole character is sent or received via the serial port. The 8051 also provides a 4 byte bit-addressable I/O port. The programmer may take advantage of an interrupt mechanism. This is especially useful when dealing with I/O involving the serial device, as an interrupt can be set when a whole character is sent or received via the UART. Interrupts immediately halt the flow of execution of the processor, and cause the program counter to jump to a fixed address, where the requisite interrupt handler is stored. %\end{figure} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % SECTION                                                                      % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \subsection{Overview of paper} \label{subsect.overview.paper} \paragraph{Overview of paper}\quad In Section~\ref{sect.design.issues.formalisation} we discuss design issues in the development of the formalisation. In Section~\ref{sect.validation} we discuss how we validated the design and implementation of our emulator to ensure that what we formalised was an accurate model of an MCS-51 series microprocessor. In Matita, we are able to use the full power of dependent types to always work with vectors of a known size: \begin{lstlisting} inductive Vector (A: Type[0]): nat → Type[0] ≝ inductive Vector (A: Type[0]): nat $\rightarrow$ Type[0] ≝ VEmpty: Vector A O | VCons: ∀n: nat. A → Vector A n → Vector A (S n). | VCons: $\forall$n: nat. A $\rightarrow$ Vector A n $\rightarrow$ Vector A (S n). \end{lstlisting} We define \texttt{BitVector} as a specialization of \texttt{Vector} to \texttt{bool}. The formalisation is done at machine level and not at assembly level; we also formalise fetching and decoding. We separately provide an assembly language, enhanched with labels and pseudo-instructions, and an assembler from this language to machine code. We separately provide an assembly language, enhanched with labels and pseudoinstructions, and an assembler from this language to machine code. We introduce cost labels in the machine language to relate the data flow of the assembly program to that of the C source language, in order to associate costs to the C program. For the O'Caml version, we provide a parser and pretty printer from code memory to Intel HEX format.