Changeset 542


Ignore:
Timestamp:
Feb 16, 2011, 5:32:51 PM (6 years ago)
Author:
sacerdot
Message:

Comparison.

File:
1 edited

Legend:

Unmodified
Added
Removed
  • Deliverables/D4.1/ITP-Paper/itp-2011.tex

    r541 r542  
    660660\section{Related work}
    661661\label{sect.related.work}
     662There is a quite large litterature on formalization of micro-processors.
     663Most of it is aimed at the proof of correctness of the implementation of
     664micro-processor at the micro-code or electric level.
     665Instead, we are only interested in providing a precise specification of the
     666behaviour of the micro-processor in order to prove correctness of compilers.
     667In particular, we are also interested in the precise timings of the processor.
     668Moreover, we have not only formalized the interface of an MCS-51 processor,
     669but a complete MCS-51 system: the UART, the I/O lines, the hardware timers.
     670%An assembler for MCS-51 is also provided, and it supports pseudo instructions.
     671
     672One of the closest work to our is~\cite{A Trustworthy Monadic Formalization of the ARMv7 Instruction Set Architecture Anthony Fox and Magnus O. Myreen} which
     673describes the formalization of the ARMv7 instruction set architecture. It also
     674contains a good list of references to related work in the litterature.
     675
     676Like we do, their model also operates at the machine code level, as opposed to
     677a more abstract assembly code level. In particular, instruction decoding is
     678explicitly modelled inside the logic. In addition to this, we also provide
     679an assembly level together with an assembler to translate the instructions
     680and pseudo-instruction to machine code.
     681
     682From the perspective of trustworthyness, we have only performed a non-exhaustive
     683validation against a pre-existent emulator, while in~\cite{again} they have
     684used development boards and random testing to increase their confidence in the
     685formalization. At the moment, we leave this to future works.
     686
     687Another difference is that in~\cite{again} the authors provide an automation
     688layer to derive single step theorems: if the processor is in a particular state
     689that satisfies some pre-conditions, after execution of an instruction it will
     690satisfy some post-conditions. We do not need single step theorems since Matita
     691is based on a logic that internalizes conversion and since we gave an
     692executable formalization, i.e. an emulation function from states to states.
     693Application of the emulation function on the input state reduces to an output
     694state that already satisfies the appropriate conditions.
     695
     696Most of the difficulties we have had to face are related to the non uniformity
     697of 8 bits architectures, in terms of instruction set, addressing modes and
     698memory models. No difficulties like that are found when formalizing the ARM
     699instruction set.
     700
     701The closest project to CerCo is CompCert~\cite{compcert} which is about
     702the certification of an ARM compiler. CompCert includes a formalization
     703in Coq of a subset of ARM. Coq and Matita shares the same logic.
     704However, the two formalizations do not have much in common. First of all,
     705they provide an assembly level model (no instruction decoding), and so they
     706must trust an unformalized assembler and linker, while we provide our own.
     707I/O is also not considered at all.
     708Moreover they assume an idealized abstract and uniform memory model, while
     709we take in account the complicated memory model of the MCS-51 architecture.
     710Finally, they only formalized about 90 instructions of the 200+ offered by the
     711processor, and they augmented the assembly with macro instructions that are
     712turned into real instructions only during communication with the external
     713assembler. Even technically the two formalizations differ: while we tried to
     714exploit as much as possible dependent types, they stick to the non dependent
     715fragment of the logic of Coq.
     716
     717In~\cite{Robert Atkey. CoqJVM: An executable specification of the Java virtual machine using dependent types. In Marino Miculan, Ivan Scagnetto, and Furio Honsell, editors, TYPES, volume 4941 of Lecture Notes in Computer Science, pages 18–32.  Springer, 2007.} Atkey presents an executable specification of the Java
     718virtual machine using dependent types. As we do, dependent types are used
     719to remove spurious partiality from the model and to lower the need of over-specifying the behaviour of the processor in impossible cases. Dependent types will
     720also help to maintain invariants when proving the correctness of the CerCo compiler.
     721
     722Finally, in~\cite{Susmit Sarkar, Pater Sewell, Francesco Zappa Nardelli, Scott Owens, Tom Ridge, Thomas Braibant Magnus O. Myreen, and Jade Alglave. The semantics of x86-CC multiprocessor machine code. In Principles of Programming Languages (POPL). ACM, 2009.} Sarkar and alt. provide an executable semantics for the
     723x86-CC multiprocessor machine code, which exhibits an high degree of non uniformity as the 8051. However, they only consider a very small subset of the
     724instructions and they over-approximate the possibilities of unorthogonality of
     725the instruction set, dodging the problems we had to face.
     726The most interesting idea to us in their formalization is the specification of
     727the decode function, that is particularly error prone. What they did is to
     728formalize in HOL a small language of patterns that is used in the data sheets
     729of the x86, so that the decoding function is later implemented simply by
     730copying the relevant lines from the manual into the HOL script.
     731We are currently considering if implementing a similar solution in Matita.
     732However, we would prefer to certify in Matita a compiler for the pattern
     733language so that the data sheets could be compiled down to the efficient code
     734that we provide in place of having to inefficiently interpret the data sheets
     735every time an instruction is executed.
     736
     737
     738%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
     739% SECTION                                                                      %
     740%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
     741\section{Conclusions}
     742\label{sect.conclusions}
    662743
    663744\CSC{Tell what is NOT formalized/formalizable: the HEX parser/pretty printer
     
    668749- WE FORMALIZE ALSO I/O ETC. NOT ONLY THE INSTRUCTION SELECTION (??)
    669750  How to test it? Specify it?
    670 
    671 ------------------------------------------
    672 
    673 Hardware verification?
    674 
    675 ------------------------------------------------------------
    676 
    677 A Trustworthy Monadic Formalization of the ARMv7 Instruction Set Architecture Anthony Fox and Magnus O. Myreen:
    678 
    679 - The model operates at the machine code level, as opposed to a more abstract
    680 assembly code level. In particular, the model does not provide assembly
    681 level “pseudo instructions” and instruction decoding is explicitly modelled
    682 inside the logic. This means that the the model can be directly validated
    683 by comparing results against the behaviour of hardware employing ARM
    684 processors – this is discussed in Section 5.
    685 
    686 - Monadic style: what do we do in OCaml/Matita?
    687 - Single Step Theorems
    688 - validation against developments boards and random testing
    689 
    690 This section discusses related work in formalizing various commercial instruction
    691 set architectures using interactive theorem provers, i.e. in ACL2, Coq, Isabelle
    692 and HOL4. There is much work that is indirectly related, but here we exclude
    693 non-commercial architectures (e.g. DLX) and informal or semi-formal ISA mod-
    694 els (e.g. in C, System Verilog, Haskell and so forth).
    695 
    696 ARM (this paper)
    697 x86 (15/7)
    698 JVM (many)
    699 
    700 WE ALSO HAVE AN ASSEMBLER (THEY DON'T)
    701 
    702 -----------------
    703 
    704 CompCert:
    705  - assembly level model (no instruction decoding, etc.)
    706  - abstract memory model
    707  They also have a more abstract view of memory which is expressed in terms of memory blocks, in contrast to our very concrete mapping from 32-bit addresses to 8-bit data.
    708  - 90 instructions of the 200+ offered by the processor; not all registers too
    709  - macro instructions that are turned into real instructions only during
    710    pretty-printing!
    711  - no proof of correctness of assembly and linking
    712 
    713 
    714 -----------------
    715 
    716 Robert Atkey. CoqJVM: An executable specification of the Java virtual machine
    717 using dependent types. In Marino Miculan, Ivan Scagnetto, and Furio Honsell,
    718 editors, TYPES, volume 4941 of Lecture Notes in Computer Science, pages 18–32.
    719 Springer, 2007.
    720 
    721 CoqJVM dependent types:
    722 
    723                                        We remove this spurious partiality from
    724 the model by making use of dependent types to maintain invariants about the
    725 state of the JVM. These invariants are then available to all proofs concerning
    726 the model.
    727 
    728             Our belief is that this will make large-scale proofs using the model
    729 easier to perform, and we have some initial evidence that this is the case, but
    730 detailed research of this claim is still required.
    731 
    732 ------------------------------------
    733 
    734 Susmit Sarkar, Pater Sewell, Francesco Zappa Nardelli, Scott Owens, Tom Ridge,
    735 Thomas Braibant Magnus O. Myreen, and Jade Alglave. The semantics of x86-CC
    736 multiprocessor machine code. In Principles of Programming Languages (POPL).
    737 ACM, 2009.
    738 
    739 (What Yann complained on was done in this paper:)
    740                                     To make the semantics
    741 scalable, without introducing many errors, we formalised
    742 the interpretation of these encodings inside the HOL logic,
    743 and built a HOL decoding function by directly copying the
    744 relevant lines from the manual into the HOL script.
    745 
    746 However, they only consider a very small subset of the instructions and they
    747 over-approximate the possibilities of unorthogonality of the instruction set.
    748 
    749 
    750 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    751 % SECTION                                                                      %
    752 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    753 \section{Conclusions}
    754 \label{sect.conclusions}
    755751
    756752\newpage
     
    890886\end{tabular*}
    891887\end{center}
    892 
    893888\end{document}
Note: See TracChangeset for help on using the changeset viewer.