Changeset 546


Ignore:
Timestamp:
Feb 16, 2011, 6:12:45 PM (6 years ago)
Author:
mulligan
Message:

Partly rewritten conclusions to fix English

File:
1 edited

Legend:

Unmodified
Added
Removed
  • Deliverables/D4.1/ITP-Paper/itp-2011.tex

    r545 r546  
    219219In Section~\ref{sect.conclusions} we conclude the paper.
    220220
     221In Appendices~\ref{sect.listing.main.ocaml.functions} and~\ref{sect.listing.main.matita.functions} we provide a brief overview of the main functions in our implementation, and describe at a high level what they do.
     222
    221223%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    222224% SECTION                                                                      %
     
    518520Using this coercion opens a proof obligation wherein we must prove that the \texttt{addressing\_mode\_tag} in correspondence with the \texttt{addressing\_mode} is a member of the \texttt{Vector} of permissible \texttt{addressing\_mode\_tag}s.
    519521This impels us to state and prove a number of auxilliary lemmas.
    520 For instance, we prove that if an \texttt{addressing\_mode\_tag} is a member of a \texttt{Vector}, and we possess a supervector, then the same \texttt{addressing\_mode\_tag} is a member of this supervector.
    521 Without combining Matita's automation and lemmas such as these our approach becomes infeasible: type checking the main \texttt{execute1} function opens up over 200 proof obligations.
     522For instance, we prove that if an \texttt{addressing\_mode\_tag} is a member of a \texttt{Vector}, and we possess another vector with additional elements, then the same \texttt{addressing\_mode\_tag} is a member of this vector.
     523Using these lemmas, and Matita's automation, all proof obligations are solved easily.
     524(Type checking the main \texttt{execute\_1} function, for instance, opens up over 200 proof obligations.)
    522525
    523526The machinery just described allows us to state in the type of a function what addressing modes that function expects.
     
    541544Attempting to match against another addressing mode not indicated in the type (for example, \texttt{REGISTER}) will produce a type-error.
    542545
    543 % Note the execute_1 function which leaves open over 200 proof obligations which we can close automatically, provided we have the lemmas mentioned above
    544 % Give an example of the type of proof obligations left open?
    545546% Talk about extraction to O'Caml code, which hopefully will allow us to extract back to using polymorphic variants, or when extracting vectors we could extract using phantom types
    546547% Discuss alternative approaches, i.e. Sigma types to piece together smaller types into larger ones, as opposed to using a predicate to `cut out' pieces of a larger type, which is what we did
     
    637638 Processor status:                               
    638639
    639    ACC : 0 (00000000) B   : 0 (00000000)
    640    PSW : 0 (00000000) with flags set as:
    641      CY  : false   AC  : false
    642      FO  : false   RS1 : false
    643      RS0 : false   OV  : false
     640   ACC : 0 (00000000) B   : 0 (00000000) PSW : 0 (00000000)
     641    with flags set as:
     642     CY  : false   AC  : false FO  : false
     643     RS1 : false   RS0 : false OV  : false
    644644     UD  : false   P   : false
    645645   SP  : 7 (00000111) IP  : 0 (00000000)
    646646   PC  : 8 (0000000000001000)
    647    DPL : 0 (00000000) DPH : 0 (00000000)
    648    SCON: 0 (00000000) SBUF: 0 (00000000)
    649    TMOD: 0 (00000000) TCON: 0 (00000000)
     647   DPL : 0 (00000000) DPH : 0 (00000000) SCON: 0 (00000000)
     648   SBUF: 0 (00000000) TMOD: 0 (00000000) TCON: 0 (00000000)
    650649   Registers:                                   
    651     R0 : 0 (00000000) R1 : 0 (00000000)
    652     R2 : 0 (00000000) R3 : 0 (00000000)
    653     R4 : 0 (00000000) R5 : 0 (00000000)
     650    R0 : 0 (00000000) R1 : 0 (00000000) R2 : 0 (00000000)
     651    R3 : 0 (00000000) R4 : 0 (00000000) R5 : 0 (00000000)
    654652    R6 : 0 (00000000) R7 : 0 (00000000)
    655653
     
    671669\section{Related work}
    672670\label{sect.related.work}
    673 There is a quite large litterature on formalization of micro-processors.
    674 Most of it is aimed at the proof of correctness of the implementation of
    675 micro-processor at the micro-code or electric level.
    676 Instead, we are only interested in providing a precise specification of the
    677 behaviour of the micro-processor in order to prove correctness of compilers.
    678 In particular, we are also interested in the precise timings of the processor.
    679 Moreover, we have not only formalized the interface of an MCS-51 processor,
    680 but a complete MCS-51 system: the UART, the I/O lines, the hardware timers.
    681 %An assembler for MCS-51 is also provided, and it supports pseudo instructions.
    682 
    683 One of the closest work to our is~\cite{A Trustworthy Monadic Formalization of the ARMv7 Instruction Set Architecture Anthony Fox and Magnus O. Myreen} which
    684 describes the formalization of the ARMv7 instruction set architecture. It also
    685 contains a good list of references to related work in the litterature.
    686 
    687 Like we do, their model also operates at the machine code level, as opposed to
    688 a more abstract assembly code level. In particular, instruction decoding is
    689 explicitly modelled inside the logic. In addition to this, we also provide
    690 an assembly level together with an assembler to translate the instructions
    691 and pseudo-instruction to machine code.
    692 
    693 From the perspective of trustworthyness, we have only performed a non-exhaustive
    694 validation against a pre-existent emulator, while in~\cite{again} they have
    695 used development boards and random testing to increase their confidence in the
    696 formalization. At the moment, we leave this to future works.
    697 
    698 Another difference is that in~\cite{again} the authors provide an automation
    699 layer to derive single step theorems: if the processor is in a particular state
    700 that satisfies some pre-conditions, after execution of an instruction it will
    701 satisfy some post-conditions. We do not need single step theorems since Matita
    702 is based on a logic that internalizes conversion and since we gave an
    703 executable formalization, i.e. an emulation function from states to states.
    704 Application of the emulation function on the input state reduces to an output
    705 state that already satisfies the appropriate conditions.
    706 
    707 Most of the difficulties we have had to face are related to the non uniformity
    708 of 8 bits architectures, in terms of instruction set, addressing modes and
    709 memory models. No difficulties like that are found when formalizing the ARM
    710 instruction set.
    711 
    712 The closest project to CerCo is CompCert~\cite{compcert} which is about
    713 the certification of an ARM compiler. CompCert includes a formalization
    714 in Coq of a subset of ARM. Coq and Matita shares the same logic.
    715 However, the two formalizations do not have much in common. First of all,
    716 they provide an assembly level model (no instruction decoding), and so they
    717 must trust an unformalized assembler and linker, while we provide our own.
    718 I/O is also not considered at all.
    719 Moreover they assume an idealized abstract and uniform memory model, while
    720 we take in account the complicated memory model of the MCS-51 architecture.
    721 Finally, they only formalized about 90 instructions of the 200+ offered by the
    722 processor, and they augmented the assembly with macro instructions that are
    723 turned into real instructions only during communication with the external
    724 assembler. Even technically the two formalizations differ: while we tried to
    725 exploit as much as possible dependent types, they stick to the non dependent
    726 fragment of the logic of Coq.
    727 
    728 In~\cite{Robert Atkey. CoqJVM: An executable specification of the Java virtual machine using dependent types. In Marino Miculan, Ivan Scagnetto, and Furio Honsell, editors, TYPES, volume 4941 of Lecture Notes in Computer Science, pages 18–32.  Springer, 2007.} Atkey presents an executable specification of the Java
    729 virtual machine using dependent types. As we do, dependent types are used
    730 to remove spurious partiality from the model and to lower the need of over-specifying the behaviour of the processor in impossible cases. Dependent types will
    731 also help to maintain invariants when proving the correctness of the CerCo compiler.
    732 
    733 Finally, in~\cite{Susmit Sarkar, Pater Sewell, Francesco Zappa Nardelli, Scott Owens, Tom Ridge, Thomas Braibant Magnus O. Myreen, and Jade Alglave. The semantics of x86-CC multiprocessor machine code. In Principles of Programming Languages (POPL). ACM, 2009.} Sarkar and alt. provide an executable semantics for the
    734 x86-CC multiprocessor machine code, which exhibits an high degree of non uniformity as the 8051. However, they only consider a very small subset of the
     671There exists a large body of literature on the formalisation of microprocessors.
     672The majority of it aims to prove correctness of the implementation of the microprocessor at the microcode or gate level.
     673However, we are interested in providing a precise specification of the behaviour of the microprocessor in order to prove the correctness of a compiler which will target the processor.
     674In particular, we are interested in intensional properties of the processor; precise timings of instruction execution in clock cycles.
     675Moreover, in addition to formalising the interface of an MCS-51 processor, we have also built a complete MCS-51 ecosystem: the UART, the I/O lines, and hardware timers, along with an assembler.
     676
     677Similar work to ours can be found in~\cite{A Trustworthy Monadic Formalization of the ARMv7 Instruction Set Architecture Anthony Fox and Magnus O. Myreen}.
     678Here, the authors describe the formalisation, in HOL4, of the ARMv7 instruction set architecture, and point to a good list of references to related work in the literature.
     679This formalisation also considers the machine code level, as opposed to only considering an abstract assembly language.
     680In particular, instruction decoding is explicitly modelled inside HOL4's logic.
     681However, we go further in also providing an assembly language, complete with assembler, to translate instructions and pseudoinstruction to machine code.
     682
     683Further, in~\cite{again} the authors validated their formalisation by using development boards and random testing.
     684However, we currently rely on non-exhaustive testing against a third party emulator.
     685We leave similar exhaustive testing for future work.
     686
     687Executability is another key difference between our work and~\cite{again}.
     688In~\cite{again} the authors provide an automation layer to derive single step theorems: if the processor is in a particular state that satisfies some preconditions, then after execution of an instruction it will reside in another state satisfying some postconditions.
     689We do not need single step theorems of this form.
     690This is because Matita is based on a logic that internalizes conversion.
     691As a result, our formalisation is executable: applying the emulation function to an input state eventually reduces to an output state that already satisfies the appropriate conditions.
     692
     693Our main difficulties resided in the non-uniformity of an old 8-bit architecture, in terms of the instruction set, addressing modes and memory models.
     694In contrast, the ARM instruction set and memory model is relatively uniform, simplifying any formalisation considerably.
     695
     696Perhaps the closest project to CerCo is CompCert~\cite{compcert}.
     697CompCert concerns the certification of an ARM compiler and includes a formalisation in Coq of a subset of ARM.
     698Coq and Matita essentially share the same logic.
     699
     700Despite this similarity, the two formalisations do not have much in common.
     701First, CompCert provides a formalisation at the assembly level (no instruction decoding), and this impels them to trust an unformalised assembler and linker, whereas we provide our own.
     702I/O is also not considered at all in CompCert.
     703Moreover an idealized abstract and uniform memory model is assumed, while we take into account the complicated overlapping memory model of the MCS-51 architecture.
     704Finally, around 90 instructions of the 200+ offered by the processor are formalised in CompCert, and the assembly language is augmented with macro instructions that are turned into `real' instructions only during communication with the external assembler.
     705Even from a technical level the two formalisations differ: while we tried to exploit dependent types as often as possible, CompCert largely sticks to the non-dependent fragment of Coq.
     706
     707In~\cite{Robert Atkey. CoqJVM: An executable specification of the Java virtual machine using dependent types. In Marino Miculan, Ivan Scagnetto, and Furio Honsell, editors, TYPES, volume 4941 of Lecture Notes in Computer Science, pages 18-32.  Springer, 2007.} Atkey presents an executable specification of the Java virtual machine which uses dependent types.
     708As we do, dependent types are used to remove spurious partiality from the model, and to lower the need for over-specifying the behaviour of the processor in impossible cases.
     709Our use of dependent types will also help to maintain invariants when we prove the correctness of the CerCo prototype compiler.
     710
     711Finally, in~\cite{Susmit Sarkar, Pater Sewell, Francesco Zappa Nardelli, Scott Owens, Tom Ridge, Thomas Braibant Magnus O. Myreen, and Jade Alglave. The semantics of x86-CC multiprocessor machine code. In Principles of Programming Languages (POPL). ACM, 2009.} Sarkar et al provide an executable semantics for x86-CC multiprocessor machine code, which exhibits an high degree of non uniformity as the 8051. However, they only consider a very small subset of the
    735712instructions and they over-approximate the possibilities of unorthogonality of
    736713the instruction set, dodging the problems we had to face.
Note: See TracChangeset for help on using the changeset viewer.