# Changeset 546 for Deliverables/D4.1

Ignore:
Timestamp:
Feb 16, 2011, 6:12:45 PM (9 years ago)
Message:

Partly rewritten conclusions to fix English

File:
1 edited

### Legend:

Unmodified
 r545 In Section~\ref{sect.conclusions} we conclude the paper. In Appendices~\ref{sect.listing.main.ocaml.functions} and~\ref{sect.listing.main.matita.functions} we provide a brief overview of the main functions in our implementation, and describe at a high level what they do. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % SECTION                                                                      % Using this coercion opens a proof obligation wherein we must prove that the \texttt{addressing\_mode\_tag} in correspondence with the \texttt{addressing\_mode} is a member of the \texttt{Vector} of permissible \texttt{addressing\_mode\_tag}s. This impels us to state and prove a number of auxilliary lemmas. For instance, we prove that if an \texttt{addressing\_mode\_tag} is a member of a \texttt{Vector}, and we possess a supervector, then the same \texttt{addressing\_mode\_tag} is a member of this supervector. Without combining Matita's automation and lemmas such as these our approach becomes infeasible: type checking the main \texttt{execute1} function opens up over 200 proof obligations. For instance, we prove that if an \texttt{addressing\_mode\_tag} is a member of a \texttt{Vector}, and we possess another vector with additional elements, then the same \texttt{addressing\_mode\_tag} is a member of this vector. Using these lemmas, and Matita's automation, all proof obligations are solved easily. (Type checking the main \texttt{execute\_1} function, for instance, opens up over 200 proof obligations.) The machinery just described allows us to state in the type of a function what addressing modes that function expects. Attempting to match against another addressing mode not indicated in the type (for example, \texttt{REGISTER}) will produce a type-error. % Note the execute_1 function which leaves open over 200 proof obligations which we can close automatically, provided we have the lemmas mentioned above % Give an example of the type of proof obligations left open? % Talk about extraction to O'Caml code, which hopefully will allow us to extract back to using polymorphic variants, or when extracting vectors we could extract using phantom types % Discuss alternative approaches, i.e. Sigma types to piece together smaller types into larger ones, as opposed to using a predicate to cut out' pieces of a larger type, which is what we did Processor status: ACC : 0 (00000000) B   : 0 (00000000) PSW : 0 (00000000) with flags set as: CY  : false   AC  : false FO  : false   RS1 : false RS0 : false   OV  : false ACC : 0 (00000000) B   : 0 (00000000) PSW : 0 (00000000) with flags set as: CY  : false   AC  : false FO  : false RS1 : false   RS0 : false OV  : false UD  : false   P   : false SP  : 7 (00000111) IP  : 0 (00000000) PC  : 8 (0000000000001000) DPL : 0 (00000000) DPH : 0 (00000000) SCON: 0 (00000000) SBUF: 0 (00000000) TMOD: 0 (00000000) TCON: 0 (00000000) DPL : 0 (00000000) DPH : 0 (00000000) SCON: 0 (00000000) SBUF: 0 (00000000) TMOD: 0 (00000000) TCON: 0 (00000000) Registers: R0 : 0 (00000000) R1 : 0 (00000000) R2 : 0 (00000000) R3 : 0 (00000000) R4 : 0 (00000000) R5 : 0 (00000000) R0 : 0 (00000000) R1 : 0 (00000000) R2 : 0 (00000000) R3 : 0 (00000000) R4 : 0 (00000000) R5 : 0 (00000000) R6 : 0 (00000000) R7 : 0 (00000000) \section{Related work} \label{sect.related.work} There is a quite large litterature on formalization of micro-processors. Most of it is aimed at the proof of correctness of the implementation of micro-processor at the micro-code or electric level. Instead, we are only interested in providing a precise specification of the behaviour of the micro-processor in order to prove correctness of compilers. In particular, we are also interested in the precise timings of the processor. Moreover, we have not only formalized the interface of an MCS-51 processor, but a complete MCS-51 system: the UART, the I/O lines, the hardware timers. %An assembler for MCS-51 is also provided, and it supports pseudo instructions. One of the closest work to our is~\cite{A Trustworthy Monadic Formalization of the ARMv7 Instruction Set Architecture Anthony Fox and Magnus O. Myreen} which describes the formalization of the ARMv7 instruction set architecture. It also contains a good list of references to related work in the litterature. Like we do, their model also operates at the machine code level, as opposed to a more abstract assembly code level. In particular, instruction decoding is explicitly modelled inside the logic. In addition to this, we also provide an assembly level together with an assembler to translate the instructions and pseudo-instruction to machine code. From the perspective of trustworthyness, we have only performed a non-exhaustive validation against a pre-existent emulator, while in~\cite{again} they have used development boards and random testing to increase their confidence in the formalization. At the moment, we leave this to future works. Another difference is that in~\cite{again} the authors provide an automation layer to derive single step theorems: if the processor is in a particular state that satisfies some pre-conditions, after execution of an instruction it will satisfy some post-conditions. We do not need single step theorems since Matita is based on a logic that internalizes conversion and since we gave an executable formalization, i.e. an emulation function from states to states. Application of the emulation function on the input state reduces to an output state that already satisfies the appropriate conditions. Most of the difficulties we have had to face are related to the non uniformity of 8 bits architectures, in terms of instruction set, addressing modes and memory models. No difficulties like that are found when formalizing the ARM instruction set. The closest project to CerCo is CompCert~\cite{compcert} which is about the certification of an ARM compiler. CompCert includes a formalization in Coq of a subset of ARM. Coq and Matita shares the same logic. However, the two formalizations do not have much in common. First of all, they provide an assembly level model (no instruction decoding), and so they must trust an unformalized assembler and linker, while we provide our own. I/O is also not considered at all. Moreover they assume an idealized abstract and uniform memory model, while we take in account the complicated memory model of the MCS-51 architecture. Finally, they only formalized about 90 instructions of the 200+ offered by the processor, and they augmented the assembly with macro instructions that are turned into real instructions only during communication with the external assembler. Even technically the two formalizations differ: while we tried to exploit as much as possible dependent types, they stick to the non dependent fragment of the logic of Coq. In~\cite{Robert Atkey. CoqJVM: An executable specification of the Java virtual machine using dependent types. In Marino Miculan, Ivan Scagnetto, and Furio Honsell, editors, TYPES, volume 4941 of Lecture Notes in Computer Science, pages 18–32.  Springer, 2007.} Atkey presents an executable specification of the Java virtual machine using dependent types. As we do, dependent types are used to remove spurious partiality from the model and to lower the need of over-specifying the behaviour of the processor in impossible cases. Dependent types will also help to maintain invariants when proving the correctness of the CerCo compiler. Finally, in~\cite{Susmit Sarkar, Pater Sewell, Francesco Zappa Nardelli, Scott Owens, Tom Ridge, Thomas Braibant Magnus O. Myreen, and Jade Alglave. The semantics of x86-CC multiprocessor machine code. In Principles of Programming Languages (POPL). ACM, 2009.} Sarkar and alt. provide an executable semantics for the x86-CC multiprocessor machine code, which exhibits an high degree of non uniformity as the 8051. However, they only consider a very small subset of the There exists a large body of literature on the formalisation of microprocessors. The majority of it aims to prove correctness of the implementation of the microprocessor at the microcode or gate level. However, we are interested in providing a precise specification of the behaviour of the microprocessor in order to prove the correctness of a compiler which will target the processor. In particular, we are interested in intensional properties of the processor; precise timings of instruction execution in clock cycles. Moreover, in addition to formalising the interface of an MCS-51 processor, we have also built a complete MCS-51 ecosystem: the UART, the I/O lines, and hardware timers, along with an assembler. Similar work to ours can be found in~\cite{A Trustworthy Monadic Formalization of the ARMv7 Instruction Set Architecture Anthony Fox and Magnus O. Myreen}. Here, the authors describe the formalisation, in HOL4, of the ARMv7 instruction set architecture, and point to a good list of references to related work in the literature. This formalisation also considers the machine code level, as opposed to only considering an abstract assembly language. In particular, instruction decoding is explicitly modelled inside HOL4's logic. However, we go further in also providing an assembly language, complete with assembler, to translate instructions and pseudoinstruction to machine code. Further, in~\cite{again} the authors validated their formalisation by using development boards and random testing. However, we currently rely on non-exhaustive testing against a third party emulator. We leave similar exhaustive testing for future work. Executability is another key difference between our work and~\cite{again}. In~\cite{again} the authors provide an automation layer to derive single step theorems: if the processor is in a particular state that satisfies some preconditions, then after execution of an instruction it will reside in another state satisfying some postconditions. We do not need single step theorems of this form. This is because Matita is based on a logic that internalizes conversion. As a result, our formalisation is executable: applying the emulation function to an input state eventually reduces to an output state that already satisfies the appropriate conditions. Our main difficulties resided in the non-uniformity of an old 8-bit architecture, in terms of the instruction set, addressing modes and memory models. In contrast, the ARM instruction set and memory model is relatively uniform, simplifying any formalisation considerably. Perhaps the closest project to CerCo is CompCert~\cite{compcert}. CompCert concerns the certification of an ARM compiler and includes a formalisation in Coq of a subset of ARM. Coq and Matita essentially share the same logic. Despite this similarity, the two formalisations do not have much in common. First, CompCert provides a formalisation at the assembly level (no instruction decoding), and this impels them to trust an unformalised assembler and linker, whereas we provide our own. I/O is also not considered at all in CompCert. Moreover an idealized abstract and uniform memory model is assumed, while we take into account the complicated overlapping memory model of the MCS-51 architecture. Finally, around 90 instructions of the 200+ offered by the processor are formalised in CompCert, and the assembly language is augmented with macro instructions that are turned into real' instructions only during communication with the external assembler. Even from a technical level the two formalisations differ: while we tried to exploit dependent types as often as possible, CompCert largely sticks to the non-dependent fragment of Coq. In~\cite{Robert Atkey. CoqJVM: An executable specification of the Java virtual machine using dependent types. In Marino Miculan, Ivan Scagnetto, and Furio Honsell, editors, TYPES, volume 4941 of Lecture Notes in Computer Science, pages 18-32.  Springer, 2007.} Atkey presents an executable specification of the Java virtual machine which uses dependent types. As we do, dependent types are used to remove spurious partiality from the model, and to lower the need for over-specifying the behaviour of the processor in impossible cases. Our use of dependent types will also help to maintain invariants when we prove the correctness of the CerCo prototype compiler. Finally, in~\cite{Susmit Sarkar, Pater Sewell, Francesco Zappa Nardelli, Scott Owens, Tom Ridge, Thomas Braibant Magnus O. Myreen, and Jade Alglave. The semantics of x86-CC multiprocessor machine code. In Principles of Programming Languages (POPL). ACM, 2009.} Sarkar et al provide an executable semantics for x86-CC multiprocessor machine code, which exhibits an high degree of non uniformity as the 8051. However, they only consider a very small subset of the instructions and they over-approximate the possibilities of unorthogonality of the instruction set, dodging the problems we had to face.