# Changeset 814 for Deliverables/D4.1

Ignore:
Timestamp:
May 18, 2011, 3:33:05 PM (9 years ago)
Message:

changes with claudio

Location:
Deliverables/D4.1/ITP-Paper
Files:
2 edited

### Legend:

Unmodified
 r579 } @article { luo:coercive:1999, author = {Zhaohui Luo}, title = {Coercive subtyping}, journal = {Journal of Logic and Computation}, volume = {9}, number = {1}, pages = {105--130}, year = {1999} } @inproceedings { yan:wcet:2008, } @inproceedings { xi:guarded:2003, author = {Hongwei Xi and Chiyan Chen and Gang Chen}, title = {Guarded recursive datatype constructors}, booktitle = {Proceedings of the $\mathrm{30^{th}}$ Annual Symposium on Principles of Programming Languages}, pages = {224--235}, year = {2003} } @misc { cerco:2011,
 r813 \begin{abstract} We summarise the formalisation of an emulator for the MCS-51 microprocessor in the Matita proof assistant. We summarise the formalisation of two emulators for the MCS-51 microprocessor in O'Caml and the Matita proof assistant. The MCS-51 is a widely used 8-bit microprocessor, especially popular in embedded devices. The formalisation proceeded in two stages, first implementing an O'Caml prototype, for quickly ironing out' bugs, and then porting the O'Caml emulator to Matita. Though mostly straight-forward, this porting presented multiple problems. Of particular interest is how the unorthoganality of the MSC-51's instruction set is handled. In O'Caml, this was handled with polymorphic variants. In Matita, we achieved the same effect with a non-standard use of dependent types. The O'Caml emulator is intended to be feature complete' with respect to the MCS-51 device. However, the Matita emulator is intended to be used as a target for a certified, complexity preserving C compiler, as part of the EU-funded CerCo project. As a result, not all features of the MCS-51 are formalised in the Matita emulator. %The formalisation proceeded in two stages, first implementing an O'Caml prototype, for quickly ironing out' bugs, and then porting the O'Caml emulator to Matita. %Though mostly straight-forward, this porting presented multiple problems. %Of particular interest is how the unorthoganality of the MSC-51's instruction set is handled. %In O'Caml, this was handled with polymorphic variants. %In Matita, we achieved the same effect with a non-standard use of dependent types. Both the O'Caml and Matita emulators are executable'. Assembly programs may be animated within Matita, producing a trace of instructions executed. The formalisation is a major component of the ongoing EU-funded CerCo project. \end{abstract} With the CerCo methodology, we assume we can assign to object code exact and realistic costs for sequential blocks of instructions. This is possible with modern processors (see~\cite{bate:wcet:2011,yan:wcet:2008} for instance) but difficult, as the structure and execution of a program itself has an influence on the speed of processing. Caching, memory effects, and advanced features such as branch prediction all have an effect on execution speed. For this reason CerCo decided to focus on 8-bit microprocessors. These are still used in embedded systems, with the advantage of a predictable cost model due to their relative paucity of features. The WCET community has developed complex tools for bounding the worst-case execution times of sequential blocks on modern processors. WCET analysis takes place at the object code level. However, it is more convenient to reason about programs at a much higher-level of abstraction. Therefore, the analysis must be reflected back onto the original source code. This reflection process is completely untrusted' and makes strong assumptions about the internal design and correctness of the compiler. For example, some WCET analysis tools, to maximise precision, require a programmer-provided strict upper bound on the number of loop iterations. Compiler optimizations could rearrange code in such a manner that the upper bound is no longer strict. The certified CerCo C compiler validates such strong assumptions, and a certified analysis tool could be obtained by combining the CerCo compiler with any certified WCET tool. We are interested in building a fully certified tool. However we are not able to build a certified WCET tool \emph{and} certified C compiler within the confines of the CerCo project. We therefore focus on certifying the compiler by targetting a microprocessor where complex WCET analyses are not required. Caching, memory effects, and advanced features such as branch prediction all have an effect on the complexity of WCET analyses (see~\cite{bate:wcet:2011,yan:wcet:2008}, and so on). CerCo therefore decided to focus on 8-bit microprocessors, which are still used in embedded systems. These have a predictable, precise cost model due to their relative paucity of features. Manufacturer timesheets provide \emph{exact guarantees} for the number of processor cycles each instruction will take to execute. We have fully formalised an executable formal semantics of a family of 8-bit Freescale microprocessors~\cite{oliboni:matita:2008}, and provided a similar executable formal semantics for the MCS-51 microprocessor. To cope with this unorthogonality, and to produce an executable specification, we rely on the dependent types of Matita, an interactive proof assistant~\cite{asperti:user:2007}. \paragraph{The MCS-51}\quad \paragraph*{The MCS-51}\quad The MCS-51 is an 8-bit microprocessor introduced by Intel in the late 1970s. Commonly called the 8051, in the decades since its introduction the processor has become a popular component of embedded systems. The processor and derivatives are still manufactured \emph{en masse} by a host of vendors. The 8051 is a well documented processor, and has the support of numerous open source and commercial tools, such as compilers and emulators. Surprisingly, however, there is not yet a formal model of the MCS-51. The 8051 is a well documented processor, with very few underspecified behaviours (almost all of these correspond to erroneous usage of the processor). The processor also has the support of numerous open source and commercial tools, such as compilers and emulators. For instance, the open source Small Device C Compiler (SDCC) recognises a dialect of C~\cite{sdcc:2010}, and other compilers for BASIC, Forth and Modula-2 are also extant. An open source emulator for the processor, MCU 8051 IDE, is also available~\cite{mcu8051ide:2010}. \end{figure*} The 8051 has a relatively straightforward architecture, unencumbered by advanced features of modern processors, making it an ideal target for formalisation. The 8051 has a relatively straightforward architecture. A high-level overview of the processor's memory layout, along with the ways in which different memory spaces may be addressed, is provided in Figure~\ref{fig.memory.layout}. %\end{figure} \paragraph{Overview of paper}\quad \paragraph*{Overview of paper}\quad In Section~\ref{sect.design.issues.formalisation} we discuss design issues in the development of the formalisation. In Section~\ref{sect.validation} we discuss how we validated the design and implementation of the emulator to ensure that what we formalised was an accurate model of an MCS-51 series microprocessor. Matita~\cite{asperti:user:2007} is a proof assistant based on the Calculus of Coinductive constructions, similar to Coq. Matita's internal language is similar to O'Caml's; we frame Matita code to distinguish the two. In Matita, \texttt{?}' or \texttt{$\ldots$}' denote an argument or arguments to be inferred, respectively. As a programming language, Matita corresponds to the functional fragment of O'Caml extended with dependent types. Matita also features a rich higher-order logic for reasoning about programs. Unlike O'Caml, all recursive functions must be structurally recursive, and therefore total. We box Matita code to distinguish it from O'Caml code. In Matita, \texttt{?}' or \texttt{$\ldots$}' denote arguments to be inferred automatically. A full account of the formalisation can be found in~\cite{cerco-report:2011}. All source code is available from the CerCo project website~\cite{cerco-code:2011}. \subsection{Development strategy} \label{subsect.development.strategy} The implementation progressed in two stages. We began with an emulator written in O'Caml to iron out' any bugs in the design and implementation. O'Caml's ability to perform file I/O also eased debugging and validation. Once we were happy with the design of the O'Caml emulator, we moved to Matita. Matita's syntax is lexically similar to O'Caml's. This eased the translation, as swathes of code were copied with minor modifications. However, several major issues had to be addressed when moving to Matita. These are now discussed. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% The MCS-51 has numerous disjoint memory spaces addressed by differently sized pointers. In the O'Caml implementation, we use a map data structure (from the standard library) for each space. Matita's standard library is small, and does not contain a generic map data structure. We had the opportunity of crafting a dependently typed special-purpose data structure for the job to enforce the correspondence between the size of pointer and the size of the memory space. Further, we assumed that large swathes of memory would often be uninitialized. In Matita, we exploited dependent types to design a data structure which enforced the correspondence between the size of pointer and the size of the memory space. Further, we assumed that large swathes of memory would often be uninitialized (an assumption on the behaviour of the programmer, not the processor!) We picked a modified form of trie of fixed height $h$. % Show example of pattern matching with polymorphic variants Such unorthogonality in the instruction set was handled with the use of polymorphic variants in O'Caml. Such unorthogonality in the instruction set was handled with the use of polymorphic variants in O'Caml~\cite{garrigue:programming:1998}. For instance, we introduced types corresponding to each addressing mode: \begin{lstlisting} }. \end{lstlisting} An implicit coercion is provided to promote vectors of tags (denoted with $\llbracket - \rrbracket$) to the corresponding \texttt{subaddressing\_mode} so that we can use a syntax close to that of O'Caml to specify \texttt{preinstruction}s: An implicit coercion~\cite{luo:coercive:1999} is provided to promote vectors of tags (denoted with $\llbracket - \rrbracket$) to the corresponding \texttt{subaddressing\_mode} so that we can use a syntax close to that of O'Caml to specify \texttt{preinstruction}s: \begin{lstlisting}[frame=single] inductive preinstruction (A: Type[0]): Type[0] ≝ % Real clock' for I/O and timers The O'Caml emulator has code for handling timers, asynchronous I/O and interrupts (these are not yet ported to the Matita emulator). The O'Caml emulator has code for handling timers, asynchronous I/O and interrupts (these are not in the Matita emulator as they are not relevant to CerCo). All three of these features interact with each other in subtle ways. Interrupts can fire' when an input is detected on the processor's UART port, and, in certain modes, timers reset when a high signal is detected on one of the MCS-51's communication pins. In an asyncrhonous mode, the UART transmits 8 bits at a time, using a ninth line for synchronisation. In a synchronous mode the ninth line is used to transmit an additional bit. All UART modes are formalised. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% During the assembly phase, where labels and pseudoinstructions are eliminated, a map is generated associating cost labels with memory locations. This map is later used in a separate analysis which computes the cost of a program by traversing through a program, fetching one instruction at a time, and computing the cost of blocks. When targetting more complex processors, this simple analysis will need to be replaced by a more sophisticated WCET analysis. These block costings are stored in another map, and will later be passed back to the prototype compiler. \label{sect.validation} \begin{figure}[t] \begin{scriptsize} \begin{verbatim} 08: mov 81 #07 Processor status: ACC: 0   B: 0   PSW: 0 with flags set as: CY: false    AC: false   FO: false   RS1: false RS0: false   OV: false   UD: false   P: false SP: 7   IP: 0   PC: 8   DPL: 0   DPH: 0   SCON: 0 SBUF: 0   TMOD: 0   TCON: 0 Registers: R0: 0   R1: 0   R2: 0   R3: 0 R4: 0   R5: 0   R6: 0   R7: 0 \end{verbatim} \end{scriptsize} \caption{An example snippet from an emulator execution trace} \label{fig.execution.trace} \end{figure} %\begin{figure}[t] %\begin{scriptsize} %\begin{verbatim} %08: mov 81 #07 % % Processor status: % %   ACC: 0   B: 0   PSW: 0 %    with flags set as: %     CY: false    AC: false   FO: false   RS1: false %     RS0: false   OV: false   UD: false   P: false %   SP: 7   IP: 0   PC: 8   DPL: 0   DPH: 0   SCON: 0 %   SBUF: 0   TMOD: 0   TCON: 0 %   Registers: %    R0: 0   R1: 0   R2: 0   R3: 0 %    R4: 0   R5: 0   R6: 0   R7: 0 %\end{verbatim} %\end{scriptsize} %\caption{An example snippet from an emulator execution trace} %\label{fig.execution.trace} %\end{figure} We spent considerable effort attempting to ensure that what we have formalised is an accurate model of the MCS-51 microprocessor. This helped us triangulate errors in the specification of the processor's instruction set, and its behaviour, for instance, in a data sheet from Philips Semiconductor. The O'Caml prototype was especially useful for validation purposes. The O'Caml emulator was especially useful for validation purposes. We wrote a module for parsing and loading Intel HEX format files. Intel HEX is a standard format that all compilers targetting the MCS-51, and similar processors, produce. Further, we can produce a HEX file from the emulator's code memory for loading into third party tools. After each step of execution, we can print out both the instruction that had been executed and a snapshot of the processor's state, including all flags and register contents. A snippet from an execution trace is found in Figure~\ref{fig.execution.trace}. Here, the trace indicates that the instruction \texttt{mov 81 \#07} has just been executed by the processor, which is now in the state indicated. These traces were useful in spotting anything that was obviously' wrong with the execution of the program. This is because Matita is based on a logic, CIC, which internalizes conversion. In~\cite{fox:trustworthy:2010} the authors provide an automation layer to derive single step theorems: if the processor is in a state that satisfies some preconditions, then after execution of an instruction it will reside in a state satisfying some postconditions. We do not need single step theorems of this form. We will not need single step theorems of this form to prove properties of the assembly code. Our main difficulties resided in the non-uniformity of an old 8-bit architecture, in terms of the instruction set, addressing modes and memory models. In contrast, the ARM instruction set and memory model is relatively uniform, simplifying any formalisation considerably. In contrast, the various ARM instruction sets and memory models are relatively uniform. Perhaps the closest project to CerCo is CompCert~\cite{leroy:formally:2009}. This impels them to trust an unformalised assembler and linker, whereas we provide our own. Our formalisation is \emph{directly} executable, while the one in CompCert only provides a relation that describes execution. I/O is also not considered at all in CompCert. In CompCert I/O is only described as a synchronous external function call and there is no I/O at the processor level. Moreover an idealized abstract and uniform memory model is assumed, while we take into account the complicated overlapping memory model of the MCS-51 architecture. Finally, 82 instructions of the more than 200 offered by the processor are formalised in CompCert, and the assembly language is augmented with macro instructions that are turned into real' instructions only during communication with the external assembler. A small domain specific language of patterns is formalised in HOL4. This is similar to the specification language of the x86 instruction set found in manufacturer's data sheets. A decode function is implemented by copying lines from data sheets into the proof script, which are then interpreted. A decode function is implemented by copying lines from data sheets into the proof script, which are then partially evaluated to obtain a compiler. We are currently considering implementing a similar domain specific language in Matita. However, we would prefer to certify in Matita the compiler for this language. Data sheets could then be compiled down to the efficient code that we currently provide, instead of inefficiently interpreting the data sheets every time an instruction is executed. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Our main difficulty in formalising the MCS-51 was the unorthogonality of its memory model and instruction set. These problems are easily handled in O'Caml by using advanced language features like polymorphic variants and phantom types, simulating Generalized Abstract Data Types. These problems are easily handled in O'Caml by using advanced language features like polymorphic variants and phantom types, simulating Generalized Abstract Data Types~\cite{xi:guarded:2003}. In Matita, we use dependent types to recover the same flexibility, to reduce spurious partiality, and to grant invariants that will be later useful in other formalisations in the CerCo project.