Changeset 546
 Timestamp:
 Feb 16, 2011, 6:12:45 PM (8 years ago)
 File:

 1 edited
Legend:
 Unmodified
 Added
 Removed

Deliverables/D4.1/ITPPaper/itp2011.tex
r545 r546 219 219 In Section~\ref{sect.conclusions} we conclude the paper. 220 220 221 In Appendices~\ref{sect.listing.main.ocaml.functions} and~\ref{sect.listing.main.matita.functions} we provide a brief overview of the main functions in our implementation, and describe at a high level what they do. 222 221 223 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 222 224 % SECTION % … … 518 520 Using this coercion opens a proof obligation wherein we must prove that the \texttt{addressing\_mode\_tag} in correspondence with the \texttt{addressing\_mode} is a member of the \texttt{Vector} of permissible \texttt{addressing\_mode\_tag}s. 519 521 This impels us to state and prove a number of auxilliary lemmas. 520 For instance, we prove that if an \texttt{addressing\_mode\_tag} is a member of a \texttt{Vector}, and we possess a supervector, then the same \texttt{addressing\_mode\_tag} is a member of this supervector. 521 Without combining Matita's automation and lemmas such as these our approach becomes infeasible: type checking the main \texttt{execute1} function opens up over 200 proof obligations. 522 For instance, we prove that if an \texttt{addressing\_mode\_tag} is a member of a \texttt{Vector}, and we possess another vector with additional elements, then the same \texttt{addressing\_mode\_tag} is a member of this vector. 523 Using these lemmas, and Matita's automation, all proof obligations are solved easily. 524 (Type checking the main \texttt{execute\_1} function, for instance, opens up over 200 proof obligations.) 522 525 523 526 The machinery just described allows us to state in the type of a function what addressing modes that function expects. … … 541 544 Attempting to match against another addressing mode not indicated in the type (for example, \texttt{REGISTER}) will produce a typeerror. 542 545 543 % Note the execute_1 function which leaves open over 200 proof obligations which we can close automatically, provided we have the lemmas mentioned above544 % Give an example of the type of proof obligations left open?545 546 % Talk about extraction to O'Caml code, which hopefully will allow us to extract back to using polymorphic variants, or when extracting vectors we could extract using phantom types 546 547 % Discuss alternative approaches, i.e. Sigma types to piece together smaller types into larger ones, as opposed to using a predicate to `cut out' pieces of a larger type, which is what we did … … 637 638 Processor status: 638 639 639 ACC : 0 (00000000) B : 0 (00000000) 640 PSW : 0 (00000000) with flags set as: 641 CY : false AC : false 642 FO : false RS1 : false 643 RS0 : false OV : false 640 ACC : 0 (00000000) B : 0 (00000000) PSW : 0 (00000000) 641 with flags set as: 642 CY : false AC : false FO : false 643 RS1 : false RS0 : false OV : false 644 644 UD : false P : false 645 645 SP : 7 (00000111) IP : 0 (00000000) 646 646 PC : 8 (0000000000001000) 647 DPL : 0 (00000000) DPH : 0 (00000000) 648 SCON: 0 (00000000) SBUF: 0 (00000000) 649 TMOD: 0 (00000000) TCON: 0 (00000000) 647 DPL : 0 (00000000) DPH : 0 (00000000) SCON: 0 (00000000) 648 SBUF: 0 (00000000) TMOD: 0 (00000000) TCON: 0 (00000000) 650 649 Registers: 651 R0 : 0 (00000000) R1 : 0 (00000000) 652 R2 : 0 (00000000) R3 : 0 (00000000) 653 R4 : 0 (00000000) R5 : 0 (00000000) 650 R0 : 0 (00000000) R1 : 0 (00000000) R2 : 0 (00000000) 651 R3 : 0 (00000000) R4 : 0 (00000000) R5 : 0 (00000000) 654 652 R6 : 0 (00000000) R7 : 0 (00000000) 655 653 … … 671 669 \section{Related work} 672 670 \label{sect.related.work} 673 There is a quite large litterature on formalization of microprocessors. 674 Most of it is aimed at the proof of correctness of the implementation of 675 microprocessor at the microcode or electric level. 676 Instead, we are only interested in providing a precise specification of the 677 behaviour of the microprocessor in order to prove correctness of compilers. 678 In particular, we are also interested in the precise timings of the processor. 679 Moreover, we have not only formalized the interface of an MCS51 processor, 680 but a complete MCS51 system: the UART, the I/O lines, the hardware timers. 681 %An assembler for MCS51 is also provided, and it supports pseudo instructions. 682 683 One of the closest work to our is~\cite{A Trustworthy Monadic Formalization of the ARMv7 Instruction Set Architecture Anthony Fox and Magnus O. Myreen} which 684 describes the formalization of the ARMv7 instruction set architecture. It also 685 contains a good list of references to related work in the litterature. 686 687 Like we do, their model also operates at the machine code level, as opposed to 688 a more abstract assembly code level. In particular, instruction decoding is 689 explicitly modelled inside the logic. In addition to this, we also provide 690 an assembly level together with an assembler to translate the instructions 691 and pseudoinstruction to machine code. 692 693 From the perspective of trustworthyness, we have only performed a nonexhaustive 694 validation against a preexistent emulator, while in~\cite{again} they have 695 used development boards and random testing to increase their confidence in the 696 formalization. At the moment, we leave this to future works. 697 698 Another difference is that in~\cite{again} the authors provide an automation 699 layer to derive single step theorems: if the processor is in a particular state 700 that satisfies some preconditions, after execution of an instruction it will 701 satisfy some postconditions. We do not need single step theorems since Matita 702 is based on a logic that internalizes conversion and since we gave an 703 executable formalization, i.e. an emulation function from states to states. 704 Application of the emulation function on the input state reduces to an output 705 state that already satisfies the appropriate conditions. 706 707 Most of the difficulties we have had to face are related to the non uniformity 708 of 8 bits architectures, in terms of instruction set, addressing modes and 709 memory models. No difficulties like that are found when formalizing the ARM 710 instruction set. 711 712 The closest project to CerCo is CompCert~\cite{compcert} which is about 713 the certification of an ARM compiler. CompCert includes a formalization 714 in Coq of a subset of ARM. Coq and Matita shares the same logic. 715 However, the two formalizations do not have much in common. First of all, 716 they provide an assembly level model (no instruction decoding), and so they 717 must trust an unformalized assembler and linker, while we provide our own. 718 I/O is also not considered at all. 719 Moreover they assume an idealized abstract and uniform memory model, while 720 we take in account the complicated memory model of the MCS51 architecture. 721 Finally, they only formalized about 90 instructions of the 200+ offered by the 722 processor, and they augmented the assembly with macro instructions that are 723 turned into real instructions only during communication with the external 724 assembler. Even technically the two formalizations differ: while we tried to 725 exploit as much as possible dependent types, they stick to the non dependent 726 fragment of the logic of Coq. 727 728 In~\cite{Robert Atkey. CoqJVM: An executable specification of the Java virtual machine using dependent types. In Marino Miculan, Ivan Scagnetto, and Furio Honsell, editors, TYPES, volume 4941 of Lecture Notes in Computer Science, pages 18–32. Springer, 2007.} Atkey presents an executable specification of the Java 729 virtual machine using dependent types. As we do, dependent types are used 730 to remove spurious partiality from the model and to lower the need of overspecifying the behaviour of the processor in impossible cases. Dependent types will 731 also help to maintain invariants when proving the correctness of the CerCo compiler. 732 733 Finally, in~\cite{Susmit Sarkar, Pater Sewell, Francesco Zappa Nardelli, Scott Owens, Tom Ridge, Thomas Braibant Magnus O. Myreen, and Jade Alglave. The semantics of x86CC multiprocessor machine code. In Principles of Programming Languages (POPL). ACM, 2009.} Sarkar and alt. provide an executable semantics for the 734 x86CC multiprocessor machine code, which exhibits an high degree of non uniformity as the 8051. However, they only consider a very small subset of the 671 There exists a large body of literature on the formalisation of microprocessors. 672 The majority of it aims to prove correctness of the implementation of the microprocessor at the microcode or gate level. 673 However, we are interested in providing a precise specification of the behaviour of the microprocessor in order to prove the correctness of a compiler which will target the processor. 674 In particular, we are interested in intensional properties of the processor; precise timings of instruction execution in clock cycles. 675 Moreover, in addition to formalising the interface of an MCS51 processor, we have also built a complete MCS51 ecosystem: the UART, the I/O lines, and hardware timers, along with an assembler. 676 677 Similar work to ours can be found in~\cite{A Trustworthy Monadic Formalization of the ARMv7 Instruction Set Architecture Anthony Fox and Magnus O. Myreen}. 678 Here, the authors describe the formalisation, in HOL4, of the ARMv7 instruction set architecture, and point to a good list of references to related work in the literature. 679 This formalisation also considers the machine code level, as opposed to only considering an abstract assembly language. 680 In particular, instruction decoding is explicitly modelled inside HOL4's logic. 681 However, we go further in also providing an assembly language, complete with assembler, to translate instructions and pseudoinstruction to machine code. 682 683 Further, in~\cite{again} the authors validated their formalisation by using development boards and random testing. 684 However, we currently rely on nonexhaustive testing against a third party emulator. 685 We leave similar exhaustive testing for future work. 686 687 Executability is another key difference between our work and~\cite{again}. 688 In~\cite{again} the authors provide an automation layer to derive single step theorems: if the processor is in a particular state that satisfies some preconditions, then after execution of an instruction it will reside in another state satisfying some postconditions. 689 We do not need single step theorems of this form. 690 This is because Matita is based on a logic that internalizes conversion. 691 As a result, our formalisation is executable: applying the emulation function to an input state eventually reduces to an output state that already satisfies the appropriate conditions. 692 693 Our main difficulties resided in the nonuniformity of an old 8bit architecture, in terms of the instruction set, addressing modes and memory models. 694 In contrast, the ARM instruction set and memory model is relatively uniform, simplifying any formalisation considerably. 695 696 Perhaps the closest project to CerCo is CompCert~\cite{compcert}. 697 CompCert concerns the certification of an ARM compiler and includes a formalisation in Coq of a subset of ARM. 698 Coq and Matita essentially share the same logic. 699 700 Despite this similarity, the two formalisations do not have much in common. 701 First, CompCert provides a formalisation at the assembly level (no instruction decoding), and this impels them to trust an unformalised assembler and linker, whereas we provide our own. 702 I/O is also not considered at all in CompCert. 703 Moreover an idealized abstract and uniform memory model is assumed, while we take into account the complicated overlapping memory model of the MCS51 architecture. 704 Finally, around 90 instructions of the 200+ offered by the processor are formalised in CompCert, and the assembly language is augmented with macro instructions that are turned into `real' instructions only during communication with the external assembler. 705 Even from a technical level the two formalisations differ: while we tried to exploit dependent types as often as possible, CompCert largely sticks to the nondependent fragment of Coq. 706 707 In~\cite{Robert Atkey. CoqJVM: An executable specification of the Java virtual machine using dependent types. In Marino Miculan, Ivan Scagnetto, and Furio Honsell, editors, TYPES, volume 4941 of Lecture Notes in Computer Science, pages 1832. Springer, 2007.} Atkey presents an executable specification of the Java virtual machine which uses dependent types. 708 As we do, dependent types are used to remove spurious partiality from the model, and to lower the need for overspecifying the behaviour of the processor in impossible cases. 709 Our use of dependent types will also help to maintain invariants when we prove the correctness of the CerCo prototype compiler. 710 711 Finally, in~\cite{Susmit Sarkar, Pater Sewell, Francesco Zappa Nardelli, Scott Owens, Tom Ridge, Thomas Braibant Magnus O. Myreen, and Jade Alglave. The semantics of x86CC multiprocessor machine code. In Principles of Programming Languages (POPL). ACM, 2009.} Sarkar et al provide an executable semantics for x86CC multiprocessor machine code, which exhibits an high degree of non uniformity as the 8051. However, they only consider a very small subset of the 735 712 instructions and they overapproximate the possibilities of unorthogonality of 736 713 the instruction set, dodging the problems we had to face.
Note: See TracChangeset
for help on using the changeset viewer.