Changeset 571


Ignore:
Timestamp:
Feb 18, 2011, 11:49:36 AM (6 years ago)
Author:
mulligan
Message:

Finished tightening everything up. Just under 16 pages. Cannot add another sentence without it going over!

Location:
Deliverables/D4.1/ITP-Paper
Files:
2 edited

Legend:

Unmodified
Added
Removed
  • Deliverables/D4.1/ITP-Paper/itp-2011.bib

    r568 r571  
    4444{ atkey:coqjvm:2007,
    4545  author = {Robert Atkey},
    46   title = {{CoqJVM}: An executable specification of the Java virtual machine using dependent types},
     46  title = {{CoqJVM}: An executable specification of the {Java Virtual Machine} using dependent types},
    4747  booktitle = {Proceedings of the Conference of the {TYPES} Project},
    4848  pages = {18--32},
  • Deliverables/D4.1/ITP-Paper/itp-2011.tex

    r570 r571  
    419419\label{subsect.anatomy.matita.emulator}
    420420
    421 The internal state of our Matita emulator is represented as a record:
     421The internal state of the Matita emulator is represented as a record:
    422422\begin{lstlisting}
    423423record Status: Type[0] ≝ {
     
    431431  ...  }.
    432432\end{lstlisting}
    433 This record neatly encapsulates the current memory contents, the program counter, the state of the current SFRs, and so on.
    434 
    435 Here the MCS-51 memory model is implemented using four disjoint memory spaces plus the SFRs.
    436 From the programmer's point of view, what matters are addressing modes that are in a many-to-many relation with the spaces.
    437 \texttt{DIRECT} addressing can be used to address either low internal RAM (if the first bit is 0) or the SFRs (if the first bit is 1), for instance.
    438 That's why DIRECT uses 8-bit addresses but pointers to the low internal RAM only use 7 bits.
    439 The complexity of the memory model is captured in the \texttt{get\_arg\_XX} and \texttt{set\_arg\_XX} functions that get and set data of size \texttt{XX} from memory, considering all addressing modes
     433This record encapsulates the current memory contents, the program counter, the state of the current SFRs, and so on.
     434
     435Here the MCS-51's memory model is implemented using four disjoint memory spaces, plus SFRs.
     436From the programmer's point of view, what \emph{really} matters are the addressing modes that are in a many-to-many relationship with the spaces.
     437\texttt{DIRECT} addressing can be used to address either lower IRAM (if the first bit is 0) or the SFRs (if the first bit is 1), for instance.
     438That's why DIRECT uses 8-bit addresses but pointers to lower IRAM only use 7 bits.
     439The complexity of the memory model is captured in a pair of functions, \texttt{get\_arg\_XX} and \texttt{set\_arg\_XX}, that `get' and `set' data of size \texttt{XX} from memory.
    440440
    441441%Overlapping, and checking which addressing modes can be used to address particular memory spaces, is handled through numerous \texttt{get\_arg\_XX} and \texttt{set\_arg\_XX} (for 1, 8 and 16 bits) functions.
    442442
    443 Both the Matita and O'Caml emulators follows the classic `fetch-decode-execute' model of processor operation.
     443Both the Matita and O'Caml emulators follow the classic `fetch-decode-execute' model of processor operation.
    444444The next instruction to be processed, indexed by the program counter, is fetched from code memory with \texttt{fetch}.
    445 An updated program counter, along with the concrete cost, in processor cycles for executing this instruction, is also returned.
    446 These costs are taken from a Siemens Semiconductor Group data sheet for the MCS-51~\cite{siemens:2011}, and will likely vary across manufacturers and particular derivatives of the processor.
     445An updated program counter, along with its concrete cost in processor cycles, is also returned.
     446These costs are taken from a Siemens Semiconductor Group data sheet for the MCS-51~\cite{siemens:2011}, and will likely vary across manufacturers and derivatives of the processor.
    447447\begin{lstlisting}
    448448definition fetch: BitVectorTrie Byte 16 $\rightarrow$ Word $\rightarrow$ instruction $\times$ Word $\times$ nat
     
    471471A callback function was also accepted as an argument, which could `witness' the execution as it happened, providing a print-out of the processor state.
    472472Due to Matita's termination requirement, \texttt{execute} cannot execute a program indefinitely.
    473 An alternative would be to produce an infinite stream of statuses representing an execution trace.
     473An alternative approach would be to produce an infinite stream of statuses representing an execution trace.
    474474Matita supports infinite streams through co-inductive types.
    475475
     
    480480\label{subsect.instruction.set.unorthogonality}
    481481
    482 A peculiarity of the MCS-51 is the non-orthogonality of its instruction set.
    483 For instance, the \texttt{MOV} instruction, can be invoked using one of 16 combinations of addressing modes out of a possible 361.
     482A peculiarity of the MCS-51 is its unorthogonal instruction set.
     483For instance, the \texttt{MOV} instruction can be invoked using one of 16 combinations of addressing modes out of a possible 361.
    484484
    485485% Show example of pattern matching with polymorphic variants
    486486
    487 Such non-orthogonality in the instruction set was handled with the use of polymorphic variants in the O'Caml emulator.
     487Such unorthogonality in the instruction set was handled with the use of polymorphic variants in O'Caml.
    488488For instance, we introduced types corresponding to each addressing mode:
    489489\begin{lstlisting}
     
    513513For our purposes, the types \texttt{union2}, \texttt{union3} and \texttt{union6} sufficed.
    514514
    515 This polymorphic variant machinery worked well: it introduced a certain level of type safety (for instance, the type of our \texttt{MOV} instruction above guarantees it cannot be invoked with arguments in the \texttt{carry} and \texttt{data16} addressing modes, respectively), and also allowed us to pattern match against instructions, when necessary.
     515This polymorphic variant machinery worked well: it introduced a certain level of type safety (for instance, the type of \texttt{MOV} above guarantees it cannot be invoked with arguments in the \texttt{carry} and \texttt{data16} addressing modes, respectively), and also allowed us to pattern match against instructions, when necessary.
    516516However, this polymorphic variant machinery is \emph{not} present in Matita.
    517517We needed some way to produce the same effect, which Matita supported.
     
    544544The \texttt{is\_in} function checks if an \texttt{addressing\_mode} matches a set of tags represented as a vector. It simply extends the \texttt{is\_a} function in the obvious manner.
    545545
    546 Finally, a \texttt{subaddressing\_mode} is an ad-hoc non empty $\Sigma$-type of addressing
    547 modes constrained to be in a given set of tags:
     546A \texttt{subaddressing\_mode} is an \emph{ad hoc} non-empty $\Sigma$-type of \texttt{addressing\_mode}s constrained to be in a set of tags:
    548547\begin{lstlisting}
    549548record subaddressing_mode n (l: Vector addressing_mode_tag (S n)): Type[0] :=
     
    551550   subaddressing_modein: bool_to_Prop (is_in ? l subaddressing_modeel) }.
    552551\end{lstlisting}
    553 An implicit coercion is provided to promote vectors of tags (denoted with
    554 $\llbracket - \rrbracket$)
    555 to the
    556 corresponding \texttt{subaddressing\_mode} so that we can use a syntax
    557 close to the O'Caml one to specify preinstructions:
     552An implicit coercion is provided to promote vectors of tags (denoted with $\llbracket - \rrbracket$) to the corresponding \texttt{subaddressing\_mode} so that we can use a syntax close to that of O'Caml to specify \texttt{preinstruction}s:
    558553\begin{lstlisting}
    559554inductive preinstruction (A: Type[0]): Type[0] ≝
     
    566561% One of these coercions opens up a proof obligation which needs discussing
    567562% Have lemmas proving that if an element is a member of a sub, then it is a member of a superlist, and so on
    568 The final, missing component is a pair of type coercions from \texttt{addressing\_mode} to \texttt{subaddressing\_mode} and from \texttt{subaddressing\_mode} to \texttt{Type$\lbrack0\rbrack$}, respectively.
     563The final component is a pair of type coercions from \texttt{addressing\_mode} to \texttt{subaddressing\_mode} and from \texttt{subaddressing\_mode} to \texttt{Type$\lbrack0\rbrack$}, respectively.
    569564The first is a forgetful coercion, while the second opens a proof obligation wherein we must prove that the provided value is in the admissible set.
    570 These coercions were first introduced by PVS to implement subset types~\cite{shankar:principles:1999}, and later in Coq as an addition~\cite{sozeau:subset:2006}.
     565These coercions were first introduced by PVS to implement subset types~\cite{shankar:principles:1999}, and later in Coq as part of Russell~\cite{sozeau:subset:2006}.
    571566In Matita all coercions can open proof obligations.
    572567
    573 Proof obligations impels us to state and prove a few auxilliary lemmas related
    574 to transitivity of subtyping. For instance, an addressing mode that belongs
    575 to an allowed set also belongs to any one of its super-set. At the moment,
    576 Matita's automation exploits these lemmas to completely solve all the proof
    577 obligations opened in our formalization, comprising the 200 proof obligations
    578 related to the main \texttt{execute\_1} function.
    579 
    580 The machinery just described allows us to restrict the set of addressing
    581 modes expected by a function and use this information during pattern matching
    582 to skip impossible cases.
     568Proof obligations require us to state and prove a few auxilliary lemmas related to the transitivity of subtyping.
     569For instance, an \texttt{addressing\_mode} that belongs to an allowed set also belongs to any one of its supersets.
     570At the moment, Matita's automation exploits these lemmas to completely solve all the proof obligations opened in our formalisation.
     571The \texttt{execute\_1} function, for instance, opens over 200 proof obligations during type checking.
     572
     573The machinery just described allows us to restrict the set of \texttt{addressing\_mode}s expected by a function and use this information during pattern matching.
     574This allows us to skip impossible cases.
    583575For instance, consider \texttt{set\_arg\_16}, which expects only a \texttt{DPTR}:
    584576\begin{lstlisting}
     
    592584     | _ $\Rightarrow$ $\lambda$_: False. $\bot$ ] $~$(subaddressing_modein $\ldots$ a).
    593585\end{lstlisting}
    594 We give a proof (the expression \texttt{(subaddressing\_modein} $\ldots$ \texttt{a)}) that the argument $a$ is in the set $\llbracket$ \texttt{dptr} $\rrbracket$ to the match expression.
     586We give a proof (the expression \texttt{(subaddressing\_modein} $\ldots$ \texttt{a)}) that the argument $a$ is in the set $\llbracket$ \texttt{dptr} $\rrbracket$ to the \texttt{match} expression.
    595587In every case but \texttt{DPTR}, the proof is a proof of \texttt{False}, and the system opens a proof obligation $\bot$ that can be discarded using \emph{ex falso}.
    596 Attempting to match against a disallowed addressing mode (replacing \texttt{False} with \texttt{True} in the branch) produces a type-error.
    597 
    598 Other dependently and non-dependently typed solutions we tried were clumsy in practice.
    599 As we need a large number of different combinations of addressing modes to describe the whole instruction set, it is unfeasible to declare a data type for each one of these combinations.
    600 The current solution is the one that best matches the corresponding O'Caml code, to the point that the translation from O'Caml to Matita is almost syntactical.
    601 We would like to investigate the possibility of changing the code extraction procedure of Matita to recognise this programming pattern and output O'Caml code using polymorphic variants.
     588Attempting to match against a disallowed addressing mode (replacing \texttt{False} with \texttt{True} in the branch) produces a type error.
     589
     590We tried other dependently and non-dependently typed solutions before settling on this approach.
     591As we need a large number of different combinations of addressing modes to describe the whole instruction set, it is infeasible to declare a datatype for each one of these combinations.
     592The current solution is closest to the corresponding O'Caml code, to the point that the translation from O'Caml to Matita is almost syntactical.
     593We would like to investigate the possibility of changing the code extraction procedure of Matita so that it recognises this programming pattern and outputs O'Caml code using polymorphic variants.
    602594
    603595% Talk about extraction to O'Caml code, which hopefully will allow us to extract back to using polymorphic variants, or when extracting vectors we could extract using phantom types
     
    616608
    617609To accurately model timers and I/O, we add an unbounded integral field \texttt{clock} to the central \texttt{status} record.
    618 This field is only logical, since it does not represent any quantity stored in the actual processor, and is used to keep track of the current processor time.
     610This field is only logical, since it does not represent any quantity stored in the physical processor, and is used to keep track of the current `processor time'.
    619611Before every execution step, \texttt{clock} is incremented by the number of processor cycles that the instruction just fetched will take to execute.
    620 The processor then executes the instruction, followed by the code implementing the timers and I/O\footnote{Though it isn't fully specified by the manufacturer's data sheets if I/O is handled at the beginning or the end of each cycle.}. In order to model I/O, we also store in the status a
    621 \emph{continuation} which is a description of the behaviour of the environment:
     612The emulator then executes the instruction, followed by the code implementing the timers and I/O\footnote{Though it isn't fully specified by the manufacturer's data sheets if I/O is handled at the beginning or the end of each cycle.}.
     613In order to model I/O, we also store in \texttt{status} a \emph{continuation} which is a description of the behaviour of the environment:
    622614\begin{lstlisting}
    623615type line =
     
    628620  [`Out of (time -> line -> time * continuation)]
    629621\end{lstlisting}
    630 At each moment, the second projection of the continuation $k$ describes how the environment will react to an output event performed in the future by the processor. Let $\pi_2(k)(\tau,o) = \langle \tau',k' \rangle$.
    631 If the processor at time $\tau$ starts an asynchronous output $o$ either on the P1 or P3 output lines, or on the UART, then the environment will receive the output at time $\tau'$.
    632 Moreover the status is immediately updated with the continuation $k'$.
    633 
    634 Further, if $\pi_1(k) = \mathtt{Some}~\langle \tau',i,\epsilon,k'\rangle$, then at time $\tau'$ the environment will send the asynchronous input $i$ to the processor and the status will be updated with the continuation $k'$.
    635 This input is visible to the processor only at time $\tau' + \epsilon$.
     622At each moment, the second projection of the continuation $k$ describes how the environment will react to an output event performed in the future by the processor.
     623Suppose $\pi_2(k)(\tau,o) = \langle \tau',k' \rangle$.
     624If the emulator at time $\tau$ starts an asynchronous output $o$ either on the P1 or P3 output lines, or on the UART, then the environment will receive the output at time $\tau'$.
     625Moreover \texttt{status} is immediately updated with the continuation $k'$.
     626
     627Further, if $\pi_1(k) = \mathtt{Some}~\langle \tau',i,\epsilon,k'\rangle$, then at time $\tau'$ the environment will send the asynchronous input $i$ to the emulator and \texttt{status} is updated with the continuation $k'$.
     628This input is visible to the emulator only at time $\tau' + \epsilon$.
    636629
    637630The time required to perform an I/O operation is partially specified in the data sheets of the UART module.
     
    640633
    641634We use only the P1 and P3 lines despite the MCS-51 having four output lines, P0--P3.
    642 This is because P0 and P2 become inoperable if the processor is equipped with XRAM (which we assume it is).
     635This is because P0 and P2 become inoperable if the processor is equipped with XRAM (we assume it is).
    643636
    644637The UART port can work in several modes, depending on the how the SFRs are set.
    645 In an asyncrhonous mode, the UART transmits 8 bits at a time, using a ninth line for syncrhonization.
    646 In a syncrhonous mode the ninth line is used to transmit an additional bit.
     638In an asyncrhonous mode, the UART transmits 8 bits at a time, using a ninth line for synchronisation.
     639In a synchronous mode the ninth line is used to transmit an additional bit.
    647640
    648641%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
     
    653646
    654647As mentioned in Subsection~\ref{subsect.labels.pseudoinstructions} we introduced a notion of \emph{cost label}.
    655 Cost labels are inserted by the prototype C compiler in specific locations in the object code.
     648Cost labels are inserted by the prototype C compiler at specific locations in the object code.
    656649Roughly, for those familiar with control flow graphs, they are inserted at the start of every basic block.
    657650
     
    669662We spent considerable effort attempting to ensure that what we have formalised is an accurate model of the MCS-51 microprocessor.
    670663
    671 First, we made use of multiple data sheets, each from a different semiconductor manufacturer.  This helped us spot errors in the specification of the processor's instruction set, and its behaviour, for instance, in a datasheet from Philips.
     664We made use of multiple data sheets, each from a different semiconductor manufacturer.
     665This helped us triangulate errors in the specification of the processor's instruction set, and its behaviour, for instance, in a data sheet from Philips Semiconductor.
    672666
    673667The O'Caml prototype was especially useful for validation purposes.
    674 This is because we wrote a module for parsing and loading the Intel HEX file format.
    675 HEX is a standard format that all compilers targetting the MCS-51, and similar processors, produce.
     668We wrote a module for parsing and loading Intel HEX format files.
     669Intel HEX is a standard format that all compilers targetting the MCS-51, and similar processors, produce.
    676670It is essentially a snapshot of the processor's code memory in compressed form.
    677 Using this, we were able to compile C programs with SDCC, an open source compiler, and load the resulting program directly into our emulator's code memory, ready for execution.
    678 Further, we are able to produce a HEX file from our emulator's code memory, for loading into third party tools.
    679 After each step of execution, we can print out both the instruction that had been executed, along with its arguments, and a snapshot of the processor's state, including all flags and register contents.
     671Using this we were able to compile C programs with SDCC and load the resulting program directly into our emulator's code memory, ready for execution.
     672Further, we can produce a HEX file from our emulator's code memory for loading into third party tools.
     673After each step of execution, we can print out both the instruction that had been executed and a snapshot of the processor's state, including all flags and register contents.
    680674For example:
    681675\begin{frametxt}
     
    699693\end{small}
    700694\end{frametxt}
    701 Here, the traces indicates that the instruction \texttt{mov 81 \#07} has just been executed by the processor, which is now in the state indicated.
     695Here, the trace indicates that the instruction \texttt{mov 81 \#07} has just been executed by the processor, which is now in the state indicated.
    702696These traces were useful in spotting anything that was `obviously' wrong with the execution of the program.
    703697
    704 Further, we used MCU 8051 IDE as a reference.
    705 Using our execution traces, we were able to step through a compiled program, one instruction at a time, in MCU 8051 IDE, and compare the resulting execution trace with the trace produced by our emulator.
    706 
    707 Our Matita formalisation was largely copied from the O'Caml source code, apart from changes related to addressing modes already mentioned.
     698We further used MCU 8051 IDE as a reference, which allows a user to step through an assembly program one instruction at a time.
     699Using our execution traces, we were able to step through a compiled program in MCU 8051 IDE and compare the resulting execution trace with the trace produced by our emulator.
     700
     701Our Matita formalisation was largely copied from the O'Caml source code, apart from the changes already mentioned.
    708702However, as the Matita emulator is executable, we could perform further validation by comparing the trace of a program's execution in the Matita emulator with the trace of the same program in the O'Caml emulator.
    709703
     
    713707\section{Related work}
    714708\label{sect.related.work}
    715 There exists a large body of literature on the formalisation of microprocessors.
    716 The majority of it aims to prove correctness of the implementation of the microprocessor at the microcode or gate level.
    717  We are interested in providing a precise specification of the behaviour of the microprocessor in order to prove the correctness of a compiler which will target the processor.
     709A large body of literature on the formalisation of microprocessors exists.
     710The majority of it deals with proving correctness of implementations of microprocessors at the microcode or gate level.
     711We are interested in providing a precise specification of the behaviour of the microprocessor in order to prove the correctness of a compiler which will target the processor.
    718712In particular, we are interested in intensional properties of the processor; precise timings of instruction execution in clock cycles.
    719 Moreover, in addition to formalising the interface of an MCS-51 processor, we have also built a complete MCS-51 ecosystem: the UART, the I/O lines, and hardware timers, along with an assembler.
    720 
    721 Similar work to ours can be found in~\cite{fox:trustworthy:2010}.
    722 Here, the authors describe the formalisation, in HOL4, of the ARMv7 instruction set architecture, and point to a good list of references to related work in the literature.
    723 This formalisation also considers the machine code level, as opposed to only considering an abstract assembly language.
    724 In particular, instruction decoding is explicitly modelled inside HOL4's logic.
    725 We go further in also providing an assembly language, complete with assembler, to translate instructions and pseudoinstruction to machine code.
     713Moreover, in addition to formalising the interface of an MCS-51 processor, we have also built a complete MCS-51 ecosystem: UART, I/O lines, and hardware timers, complete with an assembler.
     714
     715Work closely related to our own can be found in~\cite{fox:trustworthy:2010}.
     716Here, the authors describe the formalisation, in HOL4, of the ARMv7 instruction set architecture.
     717They further point to an excellent list of references to related work in the literature for the interested reader.
     718This formalisation also considers the machine code level, opposed to their formalisation, which only considering an abstract assembly language.
     719In particular, instruction decoding is explicitly modeled inside HOL4's logic.
     720We go further in also providing an assembly language, complete with assembler, to translate instructions and pseudoinstruction into machine code.
    726721
    727722Further, in~\cite{fox:trustworthy:2010} the authors validated their formalisation by using development boards and random testing.
    728723We currently rely on non-exhaustive testing against a third party emulator.
    729 We leave exhaustive testing for future work.
    730 
    731 Executability is another key difference between our work and~\cite{fox:trustworthy:2010}.
    732 Our formalisation is executable: applying the emulation function to an input state eventually reduces to an output state that already satisfies the appropriate conditions.
    733 This is because Matita is based on a logic that internalizes conversion.
     724We recognise the importance of this exhaustive testing, but currently leave it for future work.
     725
     726Executability is another key difference between our work and that of~\cite{fox:trustworthy:2010}.
     727Our formalisation is executable: applying the emulation function to an input state eventually reduces to an output state.
     728This is because Matita is based on a logic, CIC, which internalizes conversion.
    734729In~\cite{fox:trustworthy:2010} the authors provide an automation layer to derive single step theorems: if the processor is in a particular state that satisfies some preconditions, then after execution of an instruction it will reside in another state satisfying some postconditions.
    735730We do not need single step theorems of this form.
     
    740735Perhaps the closest project to CerCo is CompCert~\cite{leroy:formally:2009}.
    741736CompCert concerns the certification of a C compiler and includes a formalisation in Coq of a subset of PowerPC.
    742 Coq and Matita essentially share the same logic.
     737(Coq and Matita essentially share the same logic.)
    743738
    744739Despite this similarity, the two formalisations do not have much in common.
    745 First, CompCert provides a formalisation at the assembly level (no instruction decoding), and this impels them to trust an unformalised assembler and linker, whereas we provide our own. Our formalization is directly executable, while the
    746 one in CompCert only provides a relation that describes execution.
     740First, CompCert provides a formalisation at the assembly level (no instruction decoding).
     741This impels them to trust an unformalised assembler and linker, whereas we provide our own.
     742Our formalisation is \emph{directly} executable, while the one in CompCert only provides a relation that describes execution.
    747743I/O is also not considered at all in CompCert.
    748744Moreover an idealized abstract and uniform memory model is assumed, while we take into account the complicated overlapping memory model of the MCS-51 architecture.
    749 Finally, 82 instructions of the 200+ offered by the processor are formalised in CompCert, and the assembly language is augmented with macro instructions that are turned into `real' instructions only during communication with the external assembler.
    750 Even from a technical level the two formalisations differ: while we tried to exploit dependent types as often as possible, CompCert largely sticks to the non-dependent fragment of Coq.
    751 
    752 In~\cite{atkey:coqjvm:2007} Atkey presents an executable specification of the Java virtual machine which uses dependent types.
    753 As we do, dependent types are used to remove spurious partiality from the model, and to lower the need for over-specifying the behaviour of the processor in impossible cases.
    754 Our use of dependent types will also help to maintain invariants when we prove the correctness of the CerCo prototype compiler.
    755 
    756 Finally, Sarkar et al~\cite{sarkar:semantics:2009} provide an executable semantics for x86-CC multiprocessor machine code.
     745Finally, 82 instructions of the more than 200 offered by the processor are formalised in CompCert, and the assembly language is augmented with macro instructions that are turned into `real' instructions only during communication with the external assembler.
     746Even from a technical level the two formalisations differ: we tried to exploit dependent types whilst CompCert largely sticks to a non-dependent fragment of Coq.
     747
     748In~\cite{atkey:coqjvm:2007} an executable specification of the Java Virtual Machine, using dependent types, is presented.
     749As we do, dependent types there are used to remove spurious partiality from the model.
     750They also lower the need for over-specifying the behaviour of the processor in impossible cases.
     751Our use of dependent types will also help to maintain invariants when we prove the correctness of the CerCo prototype C compiler.
     752
     753Finally~\cite{sarkar:semantics:2009} provides an executable semantics for x86-CC multiprocessor machine code.
    757754This machine code exhibits a high degree of non-uniformity similar to the MCS-51.
    758755However, only a small subset of the instruction set is considered, and they over-approximate the possibilities of unorthogonality of the instruction set, largely dodging the problems we had to face.
     
    761758A small domain specific language of patterns is formalised in HOL4.
    762759This is similar to the specification language of the x86 instruction set found in manufacturer's data sheets.
    763 A decode function is implemented by copying lines from data sheets into the proof script.
     760A decode function is implemented by copying lines from data sheets into the proof script, which are then interpreted.
    764761
    765762We are currently considering implementing a similar domain specific language in Matita.
     
    773770\label{sect.conclusions}
    774771
    775 The CerCo project is interested in the certification of a compiler for C that induces a precise cost model on the source code.
    776 Our cost model assigns costs to blocks of instructions by tracing the way that blocks are compiled, and by computing exact costs on generated assembly code.
    777 To perform this accurately, we have provided an executable semantics for the MCS-51 family of processors, better known as 8051/8052.
    778 The formalisation was done twice, first in O'Caml and then in Matita, and captures the exact timings of the processor.
     772In CerCo, we are interested in the certification of a compiler for C that induces a precise cost model on the source code.
     773Our cost model assigns costs to blocks of instructions by tracing the way that blocks are compiled, and by computing exact costs on generated machine language.
     774To perform this accurately, we have provided an executable semantics for the MCS-51 family of processors.
     775The formalisation was done twice, first in O'Caml and then in Matita, and captures the exact timings of the processor (according to a Siemen's data sheet).
    779776Moreover, the O'Caml formalisation also considers timers and I/O.
    780 Adding support for I/O and timers in Matita is an on-going work that will not present any major problem, and was delayed only because the addition is not immediately useful for the formalisation of the CerCo compiler.
     777Adding support for I/O and timers in Matita is on-going work that will not present any major problem, and was delayed only because the addition is not immediately useful for the formalisation of the CerCo compiler.
    781778
    782779The formalisation is done at machine level and not at assembly level; we also formalise fetching and decoding.
    783780We separately provide an assembly language, enhanched with labels and pseudoinstructions, and an assembler from this language to machine code.
     781This assembly language is similar to those found in `industrial strength' compilers, such as SDCC.
    784782We introduce cost labels in the machine language to relate the data flow of the assembly program to that of the C source language, in order to associate costs to the C program.
    785783For the O'Caml version, we provide a parser and pretty printer from code memory to Intel HEX format.
     
    788786Our main difficulty in formalising the MCS-51 was the unorthogonality of its memory model and instruction set.
    789787These problems are easily handled in O'Caml by using advanced language features like polymorphic variants and phantom types, simulating Generalized Abstract Data Types.
    790 In Matita, we use dependent types to recover the same flexibility, to reduce spurious partiality, and to grant invariants that will be useful in the formalization of the CerCo compiler.
     788In Matita, we use dependent types to recover the same flexibility, to reduce spurious partiality, and to grant invariants that will be later useful in other formalisations in the CerCo project.
    791789
    792790The formalisation has been partially verified by computing execution traces on selected programs and comparing them with an existing emulator.
    793791All instructions have been tested at least once, but we have not yet pushed testing further, for example with random testing or by using development boards.
    794792I/O in particular has not been tested yet, and it is currently unclear how to provide exhaustive testing in the presence of I/O.
    795 Finally, we are aware of having over-specified the processor in several places, by fixing a behaviour hopefully consistent with the real machine, where manufacturer data sheets are ambiguous or under specified.
     793Finally, we are aware of having over-specified the processor in several places, by fixing a behaviour hopefully consistent with the real machine, where manufacturer data sheets are ambiguous or under-specified.
    796794
    797795\bibliography{itp-2011.bib}
Note: See TracChangeset for help on using the changeset viewer.