Changeset 569


Ignore:
Timestamp:
Feb 18, 2011, 10:41:56 AM (6 years ago)
Author:
mulligan
Message:

Tightened up English in the Introduction

File:
1 edited

Legend:

Unmodified
Added
Removed
  • Deliverables/D4.1/ITP-Paper/itp-2011.tex

    r568 r569  
    6969The MCS-51 is a widely used 8-bit microprocessor, especially popular in embedded devices.
    7070
    71 We proceeded in two stages, first implementing in O'Caml a prototype emulator, where bugs could be `ironed out' quickly.
    72 We then ported our O'Caml emulator to Matita's internal language.
     71The formalisation proceeded in two stages, first implementing an O'Caml prototype for quickly `ironing out' bugs.
     72We then ported the O'Caml emulator to Matita.
    7373Though mostly straight-forward, this porting presented multiple problems.
    74 Of particular interest is how we handle the extreme non-orthoganality of the MSC-51's instruction set.
    75 In O'Caml, this was handled through heavy use of polymorphic variants.
    76 In Matita, we achieve the same effect through a non-standard use of dependent types.
     74Of particular interest is how we handled the non-orthoganality of the MSC-51's instruction set.
     75In O'Caml, this was handled with polymorphic variants.
     76In Matita, we achieved the same effect with a non-standard use of dependent types.
    7777
    7878Both the O'Caml and Matita emulators are `executable'.
     
    8484% SECTION                                                                      %
    8585%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    86 \section{Background}
     86\section{Introduction}
    8787\label{sect.introduction}
    8888
    89 Formal methods are designed to increase our confidence in the design and implementation of software (and hardware).
    90 Ideally, we would like all software to come equipped with a formal specification, along with a proof of correctness that the software meets this specification.
    91 Today the majority of programs are written in high level languages and then compiled into low level ones.
    92 Specifications are therefore also given at a high level and correctness can be proved by reasoning automatically or interactively on the program's source code.
    93 The code that is actually run, however, is not the high level source code that we reason on, but the object code that is generated by the compiler.
     89Formal methods aim to increase our confidence in the design and implementation of software.
     90Ideally, all software should come equipped with a formal specification and a proof of correctness for the corresponding implementation.
     91The majority of programs are written in high level languages and then compiled into low level ones.
     92Specifications are therefore also given at a high level and correctness can be proved by reasoning on the program's source code.
     93The code that is actually run, however, is not the high level source code that we reason on, but low level code generated by the compiler.
    9494A few questions now arise:
    9595\begin{itemize*}
     
    103103These questions, and others like them, motivate a current `hot topic' in computer science research: \emph{compiler verification} (for instance~\cite{leroy:formal:2009,chlipala:verified:2010}, and many others).
    104104So far, the field has only been focused on the first and last questions.
    105 Much attention has been placed on verifying compiler correctness with respect to extensional properties of programs, which are easily preserved during compilation; it is sufficient to completely preserve the denotational semantics of the input program.
     105Much attention has been placed on verifying compiler correctness with respect to extensional properties of programs.
     106These are `easily' preserved during compilation.
    106107
    107108If we consider intensional properties of programs---space, time, and so forth---the situation is more complex.
    108109To express these properties, and reason about them, we must adopt a cost model that assigns a cost to single, or blocks, of instructions.
    109 A compositional cost model---assigning the same cost to all occurrences of one instruction---would be ideal.
     110A compositional cost model, assigning the same cost to all occurrences of one instruction, would be ideal.
    110111However, compiler optimisations are inherently non-compositional: each occurrence of a high level instruction may be compiled in a different way depending on its context.
    111112Therefore both the cost model and intensional specifications are affected by the compilation process.
     
    122123A user can certify that `deadlines' are met whilst wringing as many clock cycles from the processor---using a cost model that does not over-estimate---as possible.
    123124
    124 We also see our approach as being relevant to the compiler verification (and construction) itself.
    125 An optimisation specified only extensionally is only half specified; though the optimisation may preserve the denotational semantics of a program, there is no guarantee that any intensional properties of the program will be improved.
     125We also see our approach as being relevant to compiler verification (and construction) itself.
     126\emph{An optimisation specified only extensionally is only half specified}.
     127Though the optimisation may preserve the denotational semantics of a program, there is no guarantee that any intensional properties of the program will be improved.
     128
    126129Another potential application is toward completeness and correctness of the compilation process in the presence of space constraints.
    127 A compiler could potentially reject a source program targetting an embedded system when the size of the compiled code exceeds the available ROM size.
     130A compiler could reject a source program targetting an embedded system when the size of the compiled code exceeds the available ROM size.
    128131Preservation of a program's semantics may only be required for those programs that do not exhaust the stack or heap.
    129 Hence the statement of completeness of the compiler must take in to account a realistic cost model.
    130 
    131 With the CerCo methodology, we assume we can assign to the object code exact and realistic costs for sequential blocks of instructions.
    132 This is possible with modern processors (see~\cite{bate:wcet:2011,yan:wcet:2008} for instance) but difficult, as the execution of a program has an influence on the speed of processing.
    133 Caching, memory effects, and other advanced features such as branch prediction all have a profound effect on execution speeds.
     132The statement of completeness of the compiler must therefore take in to account a realistic cost model.
     133
     134With the CerCo methodology, we assume we can assign to object code exact and realistic costs for sequential blocks of instructions.
     135This is possible with modern processors (see~\cite{bate:wcet:2011,yan:wcet:2008} for instance) but difficult, as the structure and execution of a program itself has an influence on the speed of processing.
     136Caching, memory effects, and advanced features such as branch prediction all have an effect on execution speed.
    134137For this reason CerCo decided to focus on 8-bit microprocessors.
    135 These are still widely used in embedded systems, with the advantage of an easily predictable cost model due to their relative paucity of features.
    136 
    137 We have fully formalised an executable formal semantics of a family of 8-bit Freescale Microprocessors~\cite{oliboni:matita:2008}, and provided a similar executable formal semantics for the MCS-51 microprocessor.
     138These are still used in embedded systems, with the advantage of a predictable cost model due to their relative paucity of features.
     139
     140We have fully formalised an executable formal semantics of a family of 8-bit Freescale microprocessors~\cite{oliboni:matita:2008}, and provided a similar executable formal semantics for the MCS-51 microprocessor.
    138141The latter is what we describe in this paper.
    139142The focus of the formalisation has been on capturing the intensional behaviour of the processor.
    140143However, the design of the MCS-51 itself has caused problems in our formalisation.
    141144For example, the MCS-51 has a highly unorthogonal instruction set.
    142 To cope with this unorthogonality, and to produce an executable specification, we have Matita's dependent types.
    143 
    144 \subsection{The 8051/8052}
    145 \label{subsect.8051-8052}
    146 
     145To cope with this unorthogonality, and to produce an executable specification, we rely on Matita's dependent types.
     146
     147\paragraph{The MCS-51}\quad
    147148The MCS-51 is an 8-bit microprocessor introduced by Intel in the late 1970s.
    148 Commonly called the 8051, in the three decades since its introduction the processor has become a highly popular target for embedded systems engineers.
    149 Further, the processor, its immediate successor the 8052, and many derivatives are still manufactured \emph{en masse} by a host of semiconductor suppliers.
    150 
    151 The 8051 is a well documented processor, and has the additional support of numerous open source and commercial tools, such as compilers for high-level languages and emulators.
    152 For instance, the open source Small Device C Compiler (SDCC) recognises a dialect of C~\cite{sdcc:2010}, and other compilers targeting the 8051 for BASIC, Forth and Modula-2 are also extant.
    153 An open source emulator for the processor, MCU-8051 IDE, is also available~\cite{mcu8051ide:2010}.
    154 Both MCU-8051 IDE and SDCC were used profitably in the implementation of our formalisation.
     149Commonly called the 8051, in the decades since its introduction the processor has become a popular component of embedded systems.
     150The processor, its successor the 8052, and derivatives are still manufactured \emph{en masse} by a host of vendors.
     151
     152The 8051 is a well documented processor, and has the support of numerous open source and commercial tools, such as compilers and emulators.
     153For instance, the open source Small Device C Compiler (SDCC) recognises a dialect of C~\cite{sdcc:2010}, and other compilers for BASIC, Forth and Modula-2 are also extant.
     154An open source emulator for the processor, MCU 8051 IDE, is also available~\cite{mcu8051ide:2010}.
     155Both MCU 8051 IDE and SDCC were used in for validating our formalisation.
    155156
    156157\begin{figure}[t]
     
    230231
    231232Processor RAM is divided into numerous segments, with the most prominent division being between internal and (optional) external memory.
    232 Internal memory, commonly provided on the die itself with fast access, is composed of 256 bytes, but, in direct addressing mode, half of them are overloaded with 128 bytes of memory mapped Special Function Registers (SFRs) which control the operation of the processor.
    233 Internal RAM (IRAM) is further divided into eight general purpose bit-addressable registers (R0--R7).
    234 These sit in the first eight bytes of IRAM, though can be programmatically `shifted up' as needed.
    235 Bit memory, followed by a small amount of stack space, resides in the memory space immediately after the register banks.
    236 What remains of the IRAM may be treated as general purpose memory.
     233Internal memory, commonly provided on the die itself with fast access, is composed of 256 bytes, but, in direct addressing mode, half of them are overloaded with 128 bytes of memory-mapped Special Function Registers (SFRs).
     234SFRs control the operation of the processor.
     235Internal RAM (IRAM) is divided again into 8 general purpose bit-addressable registers (R0--R7).
     236These sit in the first 8 bytes of IRAM, though can be programmatically `shifted up' as needed.
     237Bit memory, followed by a small amount of stack space, resides in the memory space immediately following the register banks.
     238What remains of IRAM may be treated as general purpose memory.
    237239A schematic view of IRAM layout is also provided in Figure~\ref{fig.memory.layout}.
    238240
    239 External RAM (XRAM), limited to a maximum size of 64 kilobytes, is optional, and may be provided on or off chip, depending on the manufacturer.
     241External RAM (XRAM), limited to a maximum size of 64 kilobytes, is optional, and may be provided on or off chip, depending on the vendor.
    240242XRAM is accessed using a dedicated instruction, and requires 16 bits to address fully.
    241 External code memory (XCODE) is often stored in the form of an EPROM, and limited to 64 kilobytes in size.
    242 However, depending on the particular manufacturer and processor model, a dedicated on-die read-only memory area for program code (ICODE) may also be supplied.
     243External code memory (XCODE) is often stored as an EPROM, and limited to 64 kilobytes in size.
     244However, depending on the particular processor model, a dedicated on-die read-only memory area for program code (ICODE) may be supplied.
    243245
    244246Memory may be addressed in numerous ways: immediate, direct, indirect, external direct and code indirect.
    245 As the latter two addressing modes hint, there are some restrictions enforced by the 8051 and its derivatives on which addressing modes may be used with specific types of memory.
    246 For instance, the 128 bytes of extra internal RAM that the 8052 features cannot be addressed using indirect addressing; rather, external (in)direct addressing must be used. Moreover, some memory segments are addressed using 8-bit pointers while others require 16-bits.
     247As the latter two addressing modes hint, there are some restrictions enforced by the 8051, and its derivatives, on which addressing modes may be used with specific types of memory.
     248For instance, the extra 128 bytes of IRAM of the 8052 cannot be addressed using indirect addressing; rather, external (in)direct addressing must be used. Moreover, some memory segments are addressed using 8-bit pointers while others require 16-bits.
    247249
    248250The 8051 possesses an 8-bit Arithmetic and Logic Unit (ALU), with a variety of instructions for performing arithmetic and logical operations on bits and integers.
    249251Two 8-bit general purpose accumulators, A and B, are provided.
    250252
    251 Communication with the device is facilitated by an onboard UART serial port, and associated serial controller, which can operate in numerous modes.
     253Communication with the device is handled by an inbuilt UART serial port and controller.
     254This can operate in numerous modes.
    252255Serial baud rate is determined by one of two 16-bit timers included with the 8051, which can be set to multiple modes of operation.
    253256(The 8052 provides an additional 16-bit timer.)
    254 As an additional method of communication, the 8051 also provides a four byte bit-addressable input-output port.
    255 
    256 The programmer may take advantage of the interrupt mechanism that the processor provides.
    257 This is especially useful when dealing with input or output involving the serial device, as an interrupt can be set when a whole character is sent or received via the serial port.
     257The 8051 also provides a 4 byte bit-addressable I/O port.
     258
     259The programmer may take advantage of an interrupt mechanism.
     260This is especially useful when dealing with I/O involving the serial device, as an interrupt can be set when a whole character is sent or received via the UART.
    258261
    259262Interrupts immediately halt the flow of execution of the processor, and cause the program counter to jump to a fixed address, where the requisite interrupt handler is stored.
     
    273276%\end{figure}
    274277
    275 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    276 % SECTION                                                                      %
    277 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    278 \subsection{Overview of paper}
    279 \label{subsect.overview.paper}
    280 
     278\paragraph{Overview of paper}\quad
    281279In Section~\ref{sect.design.issues.formalisation} we discuss design issues in the development of the formalisation.
    282280In Section~\ref{sect.validation} we discuss how we validated the design and implementation of our emulator to ensure that what we formalised was an accurate model of an MCS-51 series microprocessor.
     
    349347In Matita, we are able to use the full power of dependent types to always work with vectors of a known size:
    350348\begin{lstlisting}
    351 inductive Vector (A: Type[0]): nat Type[0] ≝
     349inductive Vector (A: Type[0]): nat $\rightarrow$ Type[0] ≝
    352350  VEmpty: Vector A O
    353 | VCons: ∀n: nat. A → Vector A n → Vector A (S n).
     351| VCons: $\forall$n: nat. A $\rightarrow$ Vector A n $\rightarrow$ Vector A (S n).
    354352\end{lstlisting}
    355353We define \texttt{BitVector} as a specialization of \texttt{Vector} to \texttt{bool}.
     
    782780
    783781The formalisation is done at machine level and not at assembly level; we also formalise fetching and decoding.
    784 We separately provide an assembly language, enhanched with labels and pseudo-instructions, and an assembler from this language to machine code.
     782We separately provide an assembly language, enhanched with labels and pseudoinstructions, and an assembler from this language to machine code.
    785783We introduce cost labels in the machine language to relate the data flow of the assembly program to that of the C source language, in order to associate costs to the C program.
    786784For the O'Caml version, we provide a parser and pretty printer from code memory to Intel HEX format.
Note: See TracChangeset for help on using the changeset viewer.