Ignore:
Timestamp:
Sep 10, 2011, 12:09:42 PM (8 years ago)
Author:
mulligan
Message:

implemented referees comments

Location:
Deliverables/D4.1/ITP-Paper
Files:
2 edited

Legend:

Unmodified
Added
Removed
  • Deliverables/D4.1/ITP-Paper/itp-2011.bib

    r1199 r1203  
    5555
    5656@inproceedings
     57{ myreen:extensible:2009,
     58  author = {M. O. Myreen and K. Slind and Michael J. C. Gordon},
     59  title = {Extensible proof-producing compilation},
     60  booktitle = {CC},
     61  volume = {5501},
     62  pages = {2--16},
     63  year = {2009}
     64}
     65
     66@inproceedings
     67{ hunt:verifying:2010,
     68  author = {W. A. Hunt Jr.},
     69  title = {Verifying {VIA Nano} microprocessor components},
     70  booktitle = {FMCAD},
     71  year = {2010}
     72}
     73
     74@inproceedings
    5775{ yan:wcet:2008,
    5876  author = {J. Yan and W. Zhang},
  • Deliverables/D4.1/ITP-Paper/itp-2011.tex

    r1202 r1203  
    9494Formal methods aim to increase our confidence in the design and implementation of software.
    9595Ideally, all software should come equipped with a formal specification and a proof of correctness for the corresponding implementation.
    96 The majority of programs are written in high level languages and then compiled into low level ones.
    97 Specifications are therefore also given at a high level and correctness can be proved by reasoning on the program's source code.
     96Most programs are written in high level languages and then compiled into low level ones.
     97Specifications are also given at a high level and correctness can be proved by reasoning on the program's source code.
    9898The code that is actually run is not the high level source code that we reason on, but low level code generated by the compiler.
    9999
     
    295295\paragraph*{Overview of paper}\quad
    296296In Section~\ref{sect.design.issues.formalisation} we discuss design issues in the development of the formalisation.
     297In Section~\ref{sect.correctness.pseudoinstruction.expansion} we briefly discuss some difficulties present in the ongoing proof of correctness for the assembly process.
    297298In Section~\ref{sect.validation} we discuss validation of the emulator to ensure that what we formalised was an accurate model of an MCS-51 series microprocessor.
    298 In Section~\ref{sect.related.work} we describe previous work, with an eye toward describing its relation with the work described herein.
    299 In Section~\ref{sect.conclusions} we conclude.
     299In Section~\ref{sect.related.work} we describe previous, related work.
     300Section~\ref{sect.conclusions} concludes.
    300301
    301302%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
     
    316317As a programming language, Matita corresponds to the functional fragment of O'Caml extended with dependent types.
    317318Matita also features a rich higher-order logic for reasoning about programs, including a universe hierarchy with a single impredicative universe, \texttt{Prop}, and potentially infinitely many predicative universes \texttt{Type[i]} for $0 \geq i$.
    318 Unlike O'Caml, all recursive functions admitted by the Matita typechecker must be structurally recursive and total.
     319All recursive functions admitted by the Matita typechecker must be structurally recursive and total.
    319320
    320321We box Matita code to distinguish it from O'Caml code.
     
    674675During the assembly phase, where labels and pseudoinstructions are eliminated, a map is generated associating cost labels with memory locations.
    675676This map is later used in a separate analysis which computes the cost of a program by traversing through a program, fetching one instruction at a time, and computing the cost of blocks.
    676 When targetting more complex processors, this simple analysis will need to be replaced by a more sophisticated WCET analysis.
    677 These block costings are stored in another map, and will later be passed back to the prototype compiler.
     677When targetting more complex processors, this simple analysis could be replaced by a sophisticated WCET analysis.
     678Block costings are stored in another map, and will be passed back to the compiler.
     679
     680%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
     681% SECTION                                                                      %
     682%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
     683\section{Correctness of pseudoinstruction expansion}
     684\label{sect.correctness.pseudoinstruction.expansion}
     685
     686With the addition of pseudoinstructions and labels, an assembly language is induced on top of the machine language.
     687Pseudoinstructions in an assembly program must be expanded into machine instructions prior to execution.
     688As we are interested in constructing a verified compiler, this expansion process must be shown to be `correct', avoiding any semantic changes to the assembly program.
     689
     690Demonstrating such `correctness'is subtle due to jump pseudoinstructions.
     691We would like to expand jump pseudoinstructions to the smallest possible jump machine instruction.
     692Doing this is non-trivial, as it is easy to construct programs with configurations of jumps where larger jump instructions can only be `shrunk' if and only if others are also shrunk.
     693This is a well-known problem in the assembly programming community, but causes difficulties for the correctness proof of an assembler that purports to produce small executables.
     694
     695Another problem is computing the cost of pseudoinstructions, a necessary step for lifting our cost model to the C-source level in a cost-preserving manner.
     696Conditional jumps can be expanded in complex ways, where the `true' and `false branches' of the jump instruction get expanded into pieces of machine code with differing costs.
     697What, then, is the cost of a conditional jump pseudoinstruction?
     698
     699How we solve these problems, and a full description of the assembler's correctness proof, will be described in a following publication.
    678700
    679701%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
     
    738760We have also built a complete MCS-51 ecosystem: UART, I/O lines, and hardware timers, complete with an assembler.
    739761
    740 Work closely related to our own can be found in~\cite{fox:trustworthy:2010}.
    741 Here, the authors describe the formalisation, in HOL4, of the ARMv7 instruction set architecture.
     762Fox \emph{et al} formalised, in HOL4, the ARMv7 instruction set architecture, along with various related tools, such as assemblers~~\cite{fox:trustworthy:2010}.
     763The formalisation is executable both within (using rewriting) and outwith the theorem prover.
    742764They further point to an excellent list of references to related work in the literature.
    743 This formalisation also considers the machine code level, opposed to their formalisation, which only considering an abstract assembly language.
    744 Instruction decoding is explicitly modeled inside HOL4's logic.
    745 We go further in also providing an assembly language, complete with assembler, to translate instructions and pseudoinstruction into machine code.
     765In further related work~\cite{myreen:extensible:2009}, a verified compiler for a Lisp-like language is presented, targetting this formalised microprocessor.
    746766
    747767Further, in~\cite{fox:trustworthy:2010} the authors validated their formalisation by using development boards and random testing.
    748768We currently rely on non-exhaustive testing against a third party emulator.
    749769We recognise the importance of this exhaustive testing, but currently leave it for future work.
    750 
    751 Executability is another key difference between our work and that of~\cite{fox:trustworthy:2010}.
    752 Our formalisation is executable: applying the emulation function to an input state eventually reduces to an output state.
    753 This is because Matita is based on a logic, CIC, which internalizes conversion.
    754 In~\cite{fox:trustworthy:2010} the authors provide an automation layer to derive single step theorems: if the processor is in a state that satisfies some preconditions, then after execution of an instruction it will reside in a state satisfying some postconditions.
    755 We will not need single step theorems of this form to prove properties of the assembly code.
     770Both models---ours and that of~\cite{fox:trustworthy:2010}---are `executable'.
    756771
    757772Our main difficulties resided in the non-uniformity of an old 8-bit architecture, in terms of the instruction set, addressing modes and memory models.
     
    760775Other related projects CompCert~\cite{leroy:formally:2009} and the CLI Stack.
    761776CompCert concerns the certification of a C compiler and includes a formalisation in Coq of a subset of PowerPC.
    762 The CLI Stack consists of the design and verification of a whole chain of artefacts including a 32-bit microprocessor, the Piton assembler and two compilers.
     777The CLI Stack consists of the design and verification of a whole chain of artefacts including a 32-bit microprocessor (FM9001), the Piton assembler and two compilers.
    763778Like CerCo, the CLI Stack compilers gave the cost of high-level instructions in processor cycles.
    764779The CLI Stack high-level languages ($\mu$Gypsy and Nqthm Lisp) and FM9001 microprocessor were not commercial products, but bespoke designs for the purpose of verification (see~\cite{moore:grand:2005}).
     780This work has been continued by Hunt and Swords, verifying the Centaur Technologies Via Nano and Isaiah micoprocessors in ACL2~\cite{hunt:verifying:2010}.
    765781
    766782The CompCert C compiler is extracted to O'Caml using Coq's code extraction facility.
     
    768784We aim to make use of a similar code extraction facility in Matita, but only if the extracted code exhibits the same degree of type safety, provided by polymorphic variants, and human readability that the O'Caml emulator posseses.
    769785This is because we aim to use the emulator as a library for non-certified software written directly in O'Caml.
    770 How we have used Matita's dependent types to handle the instruction set (Subsection~\ref{subsect.instruction.set.unorthogonality}) could enable code extraction to make use of polymorphic variants.
     786Matita's dependent typesm, used to handle the instruction set (Subsection~\ref{subsect.instruction.set.unorthogonality}), could enable code extraction to make use of polymorphic variants.
    771787Using Coq's current code extraction algorithm we could write assembly programs that would generate runtime errors when emulated.
    772788We leave this for future work.
     
    811827We provide an assembly language, enhanched with labels and pseudoinstructions, and an assembler from this language to machine code.
    812828This assembly language is similar to those found in `industrial strength' compilers, such as SDCC.
     829A proof of correctness for the assembly process is ongoing.
    813830We introduce cost labels in the machine language to relate the data flow of the assembly program to that of the C source language, in order to associate costs to the C program.
    814 For the O'Caml version, we provide a parser and pretty printer from code memory to Intel HEX.
    815 Hence we can perform testing on programs compiled using any free or commercial compiler.
    816831
    817832Our main difficulty in formalising the MCS-51 was the unorthogonality of its memory model and instruction set.
     
    822837All instructions have been tested at least once, but we have not yet pushed testing further, for example with random testing or by using development boards.
    823838I/O in particular has not been tested yet, and it is currently unclear how to provide exhaustive testing in the presence of I/O.
    824 Finally, we are aware of having over-specified the processor in several places, by fixing a behaviour hopefully consistent with the real machine, where manufacturer data sheets are ambiguous or under-specified.
     839Finally, we are aware of having over-specified the processor in several places, by fixing a behaviour hopefully consistent with the real machine, where manufacturer data sheets are ambiguous or under-specified.\footnote{Submitted paper 16 pages by permission of program committee.}
    825840
    826841\bibliography{itp-2011}
Note: See TracChangeset for help on using the changeset viewer.