Ignore:
Timestamp:
May 18, 2011, 3:33:05 PM (9 years ago)
Author:
mulligan
Message:

changes with claudio

File:
1 edited

Legend:

Unmodified
Added
Removed
  • Deliverables/D4.1/ITP-Paper/itp-2011.tex

    r813 r814  
    7676
    7777\begin{abstract}
    78 We summarise the formalisation of an emulator for the MCS-51 microprocessor in the Matita proof assistant.
     78We summarise the formalisation of two emulators for the MCS-51 microprocessor in O'Caml and the Matita proof assistant.
    7979The MCS-51 is a widely used 8-bit microprocessor, especially popular in embedded devices.
    8080
    81 The formalisation proceeded in two stages, first implementing an O'Caml prototype, for quickly `ironing out' bugs, and then porting the O'Caml emulator to Matita.
    82 Though mostly straight-forward, this porting presented multiple problems.
    83 Of particular interest is how the unorthoganality of the MSC-51's instruction set is handled.
    84 In O'Caml, this was handled with polymorphic variants.
    85 In Matita, we achieved the same effect with a non-standard use of dependent types.
     81The O'Caml emulator is intended to be `feature complete' with respect to the MCS-51 device.
     82However, the Matita emulator is intended to be used as a target for a certified, complexity preserving C compiler, as part of the EU-funded CerCo project.
     83As a result, not all features of the MCS-51 are formalised in the Matita emulator.
     84
     85%The formalisation proceeded in two stages, first implementing an O'Caml prototype, for quickly `ironing out' bugs, and then porting the O'Caml emulator to Matita.
     86%Though mostly straight-forward, this porting presented multiple problems.
     87%Of particular interest is how the unorthoganality of the MSC-51's instruction set is handled.
     88%In O'Caml, this was handled with polymorphic variants.
     89%In Matita, we achieved the same effect with a non-standard use of dependent types.
    8690
    8791Both the O'Caml and Matita emulators are `executable'.
    8892Assembly programs may be animated within Matita, producing a trace of instructions executed.
    89 The formalisation is a major component of the ongoing EU-funded CerCo project.
    9093\end{abstract}
    9194
     
    146149
    147150With the CerCo methodology, we assume we can assign to object code exact and realistic costs for sequential blocks of instructions.
    148 This is possible with modern processors (see~\cite{bate:wcet:2011,yan:wcet:2008} for instance) but difficult, as the structure and execution of a program itself has an influence on the speed of processing.
    149 Caching, memory effects, and advanced features such as branch prediction all have an effect on execution speed.
    150 For this reason CerCo decided to focus on 8-bit microprocessors.
    151 These are still used in embedded systems, with the advantage of a predictable cost model due to their relative paucity of features.
     151The WCET community has developed complex tools for bounding the worst-case execution times of sequential blocks on modern processors.
     152WCET analysis takes place at the object code level.
     153However, it is more convenient to reason about programs at a much higher-level of abstraction.
     154Therefore, the analysis must be reflected back onto the original source code.
     155This reflection process is completely `untrusted' and makes strong assumptions about the internal design and correctness of the compiler.
     156For example, some WCET analysis tools, to maximise precision, require a programmer-provided strict upper bound on the number of loop iterations.
     157Compiler optimizations could rearrange code in such a manner that the upper bound is no longer strict.
     158The certified CerCo C compiler validates such strong assumptions, and a certified analysis tool could be obtained by combining the CerCo compiler with any certified WCET tool.
     159
     160We are interested in building a fully certified tool.
     161However we are not able to build a certified WCET tool \emph{and} certified C compiler within the confines of the CerCo project.
     162We therefore focus on certifying the compiler by targetting a microprocessor where complex WCET analyses are not required.
     163
     164Caching, memory effects, and advanced features such as branch prediction all have an effect on the complexity of WCET analyses (see~\cite{bate:wcet:2011,yan:wcet:2008}, and so on).
     165CerCo therefore decided to focus on 8-bit microprocessors, which are still used in embedded systems.
     166These have a predictable, precise cost model due to their relative paucity of features.
     167Manufacturer timesheets provide \emph{exact guarantees} for the number of processor cycles each instruction will take to execute.
    152168
    153169We have fully formalised an executable formal semantics of a family of 8-bit Freescale microprocessors~\cite{oliboni:matita:2008}, and provided a similar executable formal semantics for the MCS-51 microprocessor.
     
    158174To cope with this unorthogonality, and to produce an executable specification, we rely on the dependent types of Matita, an interactive proof assistant~\cite{asperti:user:2007}.
    159175
    160 \paragraph{The MCS-51}\quad
     176\paragraph*{The MCS-51}\quad
    161177The MCS-51 is an 8-bit microprocessor introduced by Intel in the late 1970s.
    162178Commonly called the 8051, in the decades since its introduction the processor has become a popular component of embedded systems.
    163179The processor and derivatives are still manufactured \emph{en masse} by a host of vendors.
    164 
    165 The 8051 is a well documented processor, and has the support of numerous open source and commercial tools, such as compilers and emulators.
     180Surprisingly, however, there is not yet a formal model of the MCS-51.
     181
     182The 8051 is a well documented processor, with very few underspecified behaviours (almost all of these correspond to erroneous usage of the processor).
     183The processor also has the support of numerous open source and commercial tools, such as compilers and emulators.
    166184For instance, the open source Small Device C Compiler (SDCC) recognises a dialect of C~\cite{sdcc:2010}, and other compilers for BASIC, Forth and Modula-2 are also extant.
    167185An open source emulator for the processor, MCU 8051 IDE, is also available~\cite{mcu8051ide:2010}.
     
    241259\end{figure*}
    242260
    243 The 8051 has a relatively straightforward architecture, unencumbered by advanced features of modern processors, making it an ideal target for formalisation.
     261The 8051 has a relatively straightforward architecture.
    244262A high-level overview of the processor's memory layout, along with the ways in which different memory spaces may be addressed, is provided in Figure~\ref{fig.memory.layout}.
    245263
     
    290308%\end{figure}
    291309
    292 \paragraph{Overview of paper}\quad
     310\paragraph*{Overview of paper}\quad
    293311In Section~\ref{sect.design.issues.formalisation} we discuss design issues in the development of the formalisation.
    294312In Section~\ref{sect.validation} we discuss how we validated the design and implementation of the emulator to ensure that what we formalised was an accurate model of an MCS-51 series microprocessor.
     
    303321
    304322Matita~\cite{asperti:user:2007} is a proof assistant based on the Calculus of Coinductive constructions, similar to Coq.
    305 Matita's internal language is similar to O'Caml's; we frame Matita code to distinguish the two.
    306 In Matita, `\texttt{?}' or `\texttt{$\ldots$}' denote an argument or arguments to be inferred, respectively.
     323As a programming language, Matita corresponds to the functional fragment of O'Caml extended with dependent types.
     324Matita also features a rich higher-order logic for reasoning about programs.
     325Unlike O'Caml, all recursive functions must be structurally recursive, and therefore total.
     326
     327We box Matita code to distinguish it from O'Caml code.
     328In Matita, `\texttt{?}' or `\texttt{$\ldots$}' denote arguments to be inferred automatically.
    307329
    308330A full account of the formalisation can be found in~\cite{cerco-report:2011}.
    309331All source code is available from the CerCo project website~\cite{cerco-code:2011}.
    310 
    311 \subsection{Development strategy}
    312 \label{subsect.development.strategy}
    313 
    314 The implementation progressed in two stages.
    315 We began with an emulator written in O'Caml to `iron out' any bugs in the design and implementation.
    316 O'Caml's ability to perform file I/O also eased debugging and validation.
    317 Once we were happy with the design of the O'Caml emulator, we moved to Matita.
    318 
    319 Matita's syntax is lexically similar to O'Caml's.
    320 This eased the translation, as swathes of code were copied with minor modifications.
    321 However, several major issues had to be addressed when moving to Matita.
    322 These are now discussed.
    323332
    324333%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
     
    382391The MCS-51 has numerous disjoint memory spaces addressed by differently sized pointers.
    383392In the O'Caml implementation, we use a map data structure (from the standard library) for each space.
    384 Matita's standard library is small, and does not contain a generic map data structure.
    385 We had the opportunity of crafting a dependently typed special-purpose data structure for the job to enforce the correspondence between the size of pointer and the size of the memory space.
    386 Further, we assumed that large swathes of memory would often be uninitialized.
     393In Matita, we exploited dependent types to design a data structure which enforced the correspondence between the size of pointer and the size of the memory space.
     394Further, we assumed that large swathes of memory would often be uninitialized (an assumption on the behaviour of the programmer, not the processor!)
    387395
    388396We picked a modified form of trie of fixed height $h$.
     
    505513% Show example of pattern matching with polymorphic variants
    506514
    507 Such unorthogonality in the instruction set was handled with the use of polymorphic variants in O'Caml.
     515Such unorthogonality in the instruction set was handled with the use of polymorphic variants in O'Caml~\cite{garrigue:programming:1998}.
    508516For instance, we introduced types corresponding to each addressing mode:
    509517\begin{lstlisting}
     
    580588}.
    581589\end{lstlisting}
    582 An implicit coercion is provided to promote vectors of tags (denoted with $\llbracket - \rrbracket$) to the corresponding \texttt{subaddressing\_mode} so that we can use a syntax close to that of O'Caml to specify \texttt{preinstruction}s:
     590An implicit coercion~\cite{luo:coercive:1999} is provided to promote vectors of tags (denoted with $\llbracket - \rrbracket$) to the corresponding \texttt{subaddressing\_mode} so that we can use a syntax close to that of O'Caml to specify \texttt{preinstruction}s:
    583591\begin{lstlisting}[frame=single]
    584592inductive preinstruction (A: Type[0]): Type[0] ≝
     
    638646
    639647% `Real clock' for I/O and timers
    640 The O'Caml emulator has code for handling timers, asynchronous I/O and interrupts (these are not yet ported to the Matita emulator).
     648The O'Caml emulator has code for handling timers, asynchronous I/O and interrupts (these are not in the Matita emulator as they are not relevant to CerCo).
    641649All three of these features interact with each other in subtle ways.
    642650Interrupts can `fire' when an input is detected on the processor's UART port, and, in certain modes, timers reset when a high signal is detected on one of the MCS-51's communication pins.
     
    677685In an asyncrhonous mode, the UART transmits 8 bits at a time, using a ninth line for synchronisation.
    678686In a synchronous mode the ninth line is used to transmit an additional bit.
     687All UART modes are formalised.
    679688
    680689%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
     
    691700During the assembly phase, where labels and pseudoinstructions are eliminated, a map is generated associating cost labels with memory locations.
    692701This map is later used in a separate analysis which computes the cost of a program by traversing through a program, fetching one instruction at a time, and computing the cost of blocks.
     702When targetting more complex processors, this simple analysis will need to be replaced by a more sophisticated WCET analysis.
    693703These block costings are stored in another map, and will later be passed back to the prototype compiler.
    694704
     
    699709\label{sect.validation}
    700710
    701 \begin{figure}[t]
    702 \begin{scriptsize}
    703 \begin{verbatim}
    704 08: mov 81 #07
    705 
    706  Processor status:                               
    707 
    708    ACC: 0   B: 0   PSW: 0
    709     with flags set as:
    710      CY: false    AC: false   FO: false   RS1: false
    711      RS0: false   OV: false   UD: false   P: false
    712    SP: 7   IP: 0   PC: 8   DPL: 0   DPH: 0   SCON: 0
    713    SBUF: 0   TMOD: 0   TCON: 0
    714    Registers:                                   
    715     R0: 0   R1: 0   R2: 0   R3: 0
    716     R4: 0   R5: 0   R6: 0   R7: 0
    717 \end{verbatim}
    718 \end{scriptsize}
    719 \caption{An example snippet from an emulator execution trace}
    720 \label{fig.execution.trace}
    721 \end{figure}
     711%\begin{figure}[t]
     712%\begin{scriptsize}
     713%\begin{verbatim}
     714%08: mov 81 #07
     715%
     716% Processor status:                               
     717%
     718%   ACC: 0   B: 0   PSW: 0
     719%    with flags set as:
     720%     CY: false    AC: false   FO: false   RS1: false
     721%     RS0: false   OV: false   UD: false   P: false
     722%   SP: 7   IP: 0   PC: 8   DPL: 0   DPH: 0   SCON: 0
     723%   SBUF: 0   TMOD: 0   TCON: 0
     724%   Registers:                                   
     725%    R0: 0   R1: 0   R2: 0   R3: 0
     726%    R4: 0   R5: 0   R6: 0   R7: 0
     727%\end{verbatim}
     728%\end{scriptsize}
     729%\caption{An example snippet from an emulator execution trace}
     730%\label{fig.execution.trace}
     731%\end{figure}
    722732
    723733We spent considerable effort attempting to ensure that what we have formalised is an accurate model of the MCS-51 microprocessor.
     
    726736This helped us triangulate errors in the specification of the processor's instruction set, and its behaviour, for instance, in a data sheet from Philips Semiconductor.
    727737
    728 The O'Caml prototype was especially useful for validation purposes.
     738The O'Caml emulator was especially useful for validation purposes.
    729739We wrote a module for parsing and loading Intel HEX format files.
    730740Intel HEX is a standard format that all compilers targetting the MCS-51, and similar processors, produce.
     
    733743Further, we can produce a HEX file from the emulator's code memory for loading into third party tools.
    734744After each step of execution, we can print out both the instruction that had been executed and a snapshot of the processor's state, including all flags and register contents.
    735 
    736 A snippet from an execution trace is found in Figure~\ref{fig.execution.trace}.
    737 Here, the trace indicates that the instruction \texttt{mov 81 \#07} has just been executed by the processor, which is now in the state indicated.
    738745These traces were useful in spotting anything that was `obviously' wrong with the execution of the program.
    739746
     
    772779This is because Matita is based on a logic, CIC, which internalizes conversion.
    773780In~\cite{fox:trustworthy:2010} the authors provide an automation layer to derive single step theorems: if the processor is in a state that satisfies some preconditions, then after execution of an instruction it will reside in a state satisfying some postconditions.
    774 We do not need single step theorems of this form.
     781We will not need single step theorems of this form to prove properties of the assembly code.
    775782
    776783Our main difficulties resided in the non-uniformity of an old 8-bit architecture, in terms of the instruction set, addressing modes and memory models.
    777 In contrast, the ARM instruction set and memory model is relatively uniform, simplifying any formalisation considerably.
     784In contrast, the various ARM instruction sets and memory models are relatively uniform.
    778785
    779786Perhaps the closest project to CerCo is CompCert~\cite{leroy:formally:2009}.
     
    785792This impels them to trust an unformalised assembler and linker, whereas we provide our own.
    786793Our formalisation is \emph{directly} executable, while the one in CompCert only provides a relation that describes execution.
    787 I/O is also not considered at all in CompCert.
     794In CompCert I/O is only described as a synchronous external function call and there is no I/O at the processor level.
    788795Moreover an idealized abstract and uniform memory model is assumed, while we take into account the complicated overlapping memory model of the MCS-51 architecture.
    789796Finally, 82 instructions of the more than 200 offered by the processor are formalised in CompCert, and the assembly language is augmented with macro instructions that are turned into `real' instructions only during communication with the external assembler.
     
    802809A small domain specific language of patterns is formalised in HOL4.
    803810This is similar to the specification language of the x86 instruction set found in manufacturer's data sheets.
    804 A decode function is implemented by copying lines from data sheets into the proof script, which are then interpreted.
    805 
     811A decode function is implemented by copying lines from data sheets into the proof script, which are then partially evaluated to obtain a compiler.
    806812We are currently considering implementing a similar domain specific language in Matita.
    807 However, we would prefer to certify in Matita the compiler for this language.
    808 Data sheets could then be compiled down to the efficient code that we currently provide, instead of inefficiently interpreting the data sheets every time an instruction is executed.
    809813
    810814%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
     
    829833
    830834Our main difficulty in formalising the MCS-51 was the unorthogonality of its memory model and instruction set.
    831 These problems are easily handled in O'Caml by using advanced language features like polymorphic variants and phantom types, simulating Generalized Abstract Data Types.
     835These problems are easily handled in O'Caml by using advanced language features like polymorphic variants and phantom types, simulating Generalized Abstract Data Types~\cite{xi:guarded:2003}.
    832836In Matita, we use dependent types to recover the same flexibility, to reduce spurious partiality, and to grant invariants that will be later useful in other formalisations in the CerCo project.
    833837
Note: See TracChangeset for help on using the changeset viewer.