Changeset 571 for Deliverables
- Timestamp:
- Feb 18, 2011, 11:49:36 AM (10 years ago)
- Location:
- Deliverables/D4.1/ITP-Paper
- Files:
-
- 2 edited
Legend:
- Unmodified
- Added
- Removed
-
Deliverables/D4.1/ITP-Paper/itp-2011.bib
r568 r571 44 44 { atkey:coqjvm:2007, 45 45 author = {Robert Atkey}, 46 title = {{CoqJVM}: An executable specification of the Java virtual machineusing dependent types},46 title = {{CoqJVM}: An executable specification of the {Java Virtual Machine} using dependent types}, 47 47 booktitle = {Proceedings of the Conference of the {TYPES} Project}, 48 48 pages = {18--32}, -
Deliverables/D4.1/ITP-Paper/itp-2011.tex
r570 r571 419 419 \label{subsect.anatomy.matita.emulator} 420 420 421 The internal state of ourMatita emulator is represented as a record:421 The internal state of the Matita emulator is represented as a record: 422 422 \begin{lstlisting} 423 423 record Status: Type[0] ≝ { … … 431 431 ... }. 432 432 \end{lstlisting} 433 This record neatlyencapsulates the current memory contents, the program counter, the state of the current SFRs, and so on.434 435 Here the MCS-51 memory model is implemented using four disjoint memory spaces plus theSFRs.436 From the programmer's point of view, what matters are addressing modes that are in a many-to-many relationwith the spaces.437 \texttt{DIRECT} addressing can be used to address either low internalRAM (if the first bit is 0) or the SFRs (if the first bit is 1), for instance.438 That's why DIRECT uses 8-bit addresses but pointers to the low internalRAM only use 7 bits.439 The complexity of the memory model is captured in the \texttt{get\_arg\_XX} and \texttt{set\_arg\_XX} functions that get and set data of size \texttt{XX} from memory, considering all addressing modes433 This record encapsulates the current memory contents, the program counter, the state of the current SFRs, and so on. 434 435 Here the MCS-51's memory model is implemented using four disjoint memory spaces, plus SFRs. 436 From the programmer's point of view, what \emph{really} matters are the addressing modes that are in a many-to-many relationship with the spaces. 437 \texttt{DIRECT} addressing can be used to address either lower IRAM (if the first bit is 0) or the SFRs (if the first bit is 1), for instance. 438 That's why DIRECT uses 8-bit addresses but pointers to lower IRAM only use 7 bits. 439 The complexity of the memory model is captured in a pair of functions, \texttt{get\_arg\_XX} and \texttt{set\_arg\_XX}, that `get' and `set' data of size \texttt{XX} from memory. 440 440 441 441 %Overlapping, and checking which addressing modes can be used to address particular memory spaces, is handled through numerous \texttt{get\_arg\_XX} and \texttt{set\_arg\_XX} (for 1, 8 and 16 bits) functions. 442 442 443 Both the Matita and O'Caml emulators follow sthe classic `fetch-decode-execute' model of processor operation.443 Both the Matita and O'Caml emulators follow the classic `fetch-decode-execute' model of processor operation. 444 444 The next instruction to be processed, indexed by the program counter, is fetched from code memory with \texttt{fetch}. 445 An updated program counter, along with the concrete cost, in processor cycles for executing this instruction, is also returned.446 These costs are taken from a Siemens Semiconductor Group data sheet for the MCS-51~\cite{siemens:2011}, and will likely vary across manufacturers and particularderivatives of the processor.445 An updated program counter, along with its concrete cost in processor cycles, is also returned. 446 These costs are taken from a Siemens Semiconductor Group data sheet for the MCS-51~\cite{siemens:2011}, and will likely vary across manufacturers and derivatives of the processor. 447 447 \begin{lstlisting} 448 448 definition fetch: BitVectorTrie Byte 16 $\rightarrow$ Word $\rightarrow$ instruction $\times$ Word $\times$ nat … … 471 471 A callback function was also accepted as an argument, which could `witness' the execution as it happened, providing a print-out of the processor state. 472 472 Due to Matita's termination requirement, \texttt{execute} cannot execute a program indefinitely. 473 An alternative would be to produce an infinite stream of statuses representing an execution trace.473 An alternative approach would be to produce an infinite stream of statuses representing an execution trace. 474 474 Matita supports infinite streams through co-inductive types. 475 475 … … 480 480 \label{subsect.instruction.set.unorthogonality} 481 481 482 A peculiarity of the MCS-51 is the non-orthogonality of itsinstruction set.483 For instance, the \texttt{MOV} instruction ,can be invoked using one of 16 combinations of addressing modes out of a possible 361.482 A peculiarity of the MCS-51 is its unorthogonal instruction set. 483 For instance, the \texttt{MOV} instruction can be invoked using one of 16 combinations of addressing modes out of a possible 361. 484 484 485 485 % Show example of pattern matching with polymorphic variants 486 486 487 Such non-orthogonality in the instruction set was handled with the use of polymorphic variants in the O'Caml emulator.487 Such unorthogonality in the instruction set was handled with the use of polymorphic variants in O'Caml. 488 488 For instance, we introduced types corresponding to each addressing mode: 489 489 \begin{lstlisting} … … 513 513 For our purposes, the types \texttt{union2}, \texttt{union3} and \texttt{union6} sufficed. 514 514 515 This polymorphic variant machinery worked well: it introduced a certain level of type safety (for instance, the type of our \texttt{MOV} instructionabove guarantees it cannot be invoked with arguments in the \texttt{carry} and \texttt{data16} addressing modes, respectively), and also allowed us to pattern match against instructions, when necessary.515 This polymorphic variant machinery worked well: it introduced a certain level of type safety (for instance, the type of \texttt{MOV} above guarantees it cannot be invoked with arguments in the \texttt{carry} and \texttt{data16} addressing modes, respectively), and also allowed us to pattern match against instructions, when necessary. 516 516 However, this polymorphic variant machinery is \emph{not} present in Matita. 517 517 We needed some way to produce the same effect, which Matita supported. … … 544 544 The \texttt{is\_in} function checks if an \texttt{addressing\_mode} matches a set of tags represented as a vector. It simply extends the \texttt{is\_a} function in the obvious manner. 545 545 546 Finally, a \texttt{subaddressing\_mode} is an ad-hoc non empty $\Sigma$-type of addressing 547 modes constrained to be in a given set of tags: 546 A \texttt{subaddressing\_mode} is an \emph{ad hoc} non-empty $\Sigma$-type of \texttt{addressing\_mode}s constrained to be in a set of tags: 548 547 \begin{lstlisting} 549 548 record subaddressing_mode n (l: Vector addressing_mode_tag (S n)): Type[0] := … … 551 550 subaddressing_modein: bool_to_Prop (is_in ? l subaddressing_modeel) }. 552 551 \end{lstlisting} 553 An implicit coercion is provided to promote vectors of tags (denoted with 554 $\llbracket - \rrbracket$) 555 to the 556 corresponding \texttt{subaddressing\_mode} so that we can use a syntax 557 close to the O'Caml one to specify preinstructions: 552 An implicit coercion is provided to promote vectors of tags (denoted with $\llbracket - \rrbracket$) to the corresponding \texttt{subaddressing\_mode} so that we can use a syntax close to that of O'Caml to specify \texttt{preinstruction}s: 558 553 \begin{lstlisting} 559 554 inductive preinstruction (A: Type[0]): Type[0] ≝ … … 566 561 % One of these coercions opens up a proof obligation which needs discussing 567 562 % Have lemmas proving that if an element is a member of a sub, then it is a member of a superlist, and so on 568 The final , missingcomponent is a pair of type coercions from \texttt{addressing\_mode} to \texttt{subaddressing\_mode} and from \texttt{subaddressing\_mode} to \texttt{Type$\lbrack0\rbrack$}, respectively.563 The final component is a pair of type coercions from \texttt{addressing\_mode} to \texttt{subaddressing\_mode} and from \texttt{subaddressing\_mode} to \texttt{Type$\lbrack0\rbrack$}, respectively. 569 564 The first is a forgetful coercion, while the second opens a proof obligation wherein we must prove that the provided value is in the admissible set. 570 These coercions were first introduced by PVS to implement subset types~\cite{shankar:principles:1999}, and later in Coq as an addition~\cite{sozeau:subset:2006}.565 These coercions were first introduced by PVS to implement subset types~\cite{shankar:principles:1999}, and later in Coq as part of Russell~\cite{sozeau:subset:2006}. 571 566 In Matita all coercions can open proof obligations. 572 567 573 Proof obligations impels us to state and prove a few auxilliary lemmas related 574 to transitivity of subtyping. For instance, an addressing mode that belongs 575 to an allowed set also belongs to any one of its super-set. At the moment, 576 Matita's automation exploits these lemmas to completely solve all the proof 577 obligations opened in our formalization, comprising the 200 proof obligations 578 related to the main \texttt{execute\_1} function. 579 580 The machinery just described allows us to restrict the set of addressing 581 modes expected by a function and use this information during pattern matching 582 to skip impossible cases. 568 Proof obligations require us to state and prove a few auxilliary lemmas related to the transitivity of subtyping. 569 For instance, an \texttt{addressing\_mode} that belongs to an allowed set also belongs to any one of its supersets. 570 At the moment, Matita's automation exploits these lemmas to completely solve all the proof obligations opened in our formalisation. 571 The \texttt{execute\_1} function, for instance, opens over 200 proof obligations during type checking. 572 573 The machinery just described allows us to restrict the set of \texttt{addressing\_mode}s expected by a function and use this information during pattern matching. 574 This allows us to skip impossible cases. 583 575 For instance, consider \texttt{set\_arg\_16}, which expects only a \texttt{DPTR}: 584 576 \begin{lstlisting} … … 592 584 | _ $\Rightarrow$ $\lambda$_: False. $\bot$ ] $~$(subaddressing_modein $\ldots$ a). 593 585 \end{lstlisting} 594 We give a proof (the expression \texttt{(subaddressing\_modein} $\ldots$ \texttt{a)}) that the argument $a$ is in the set $\llbracket$ \texttt{dptr} $\rrbracket$ to the matchexpression.586 We give a proof (the expression \texttt{(subaddressing\_modein} $\ldots$ \texttt{a)}) that the argument $a$ is in the set $\llbracket$ \texttt{dptr} $\rrbracket$ to the \texttt{match} expression. 595 587 In every case but \texttt{DPTR}, the proof is a proof of \texttt{False}, and the system opens a proof obligation $\bot$ that can be discarded using \emph{ex falso}. 596 Attempting to match against a disallowed addressing mode (replacing \texttt{False} with \texttt{True} in the branch) produces a type -error.597 598 Other dependently and non-dependently typed solutions we tried were clumsy in practice.599 As we need a large number of different combinations of addressing modes to describe the whole instruction set, it is unfeasible to declare a datatype for each one of these combinations.600 The current solution is the one that best matchesthe corresponding O'Caml code, to the point that the translation from O'Caml to Matita is almost syntactical.601 We would like to investigate the possibility of changing the code extraction procedure of Matita to recognise this programming pattern and outputO'Caml code using polymorphic variants.588 Attempting to match against a disallowed addressing mode (replacing \texttt{False} with \texttt{True} in the branch) produces a type error. 589 590 We tried other dependently and non-dependently typed solutions before settling on this approach. 591 As we need a large number of different combinations of addressing modes to describe the whole instruction set, it is infeasible to declare a datatype for each one of these combinations. 592 The current solution is closest to the corresponding O'Caml code, to the point that the translation from O'Caml to Matita is almost syntactical. 593 We would like to investigate the possibility of changing the code extraction procedure of Matita so that it recognises this programming pattern and outputs O'Caml code using polymorphic variants. 602 594 603 595 % Talk about extraction to O'Caml code, which hopefully will allow us to extract back to using polymorphic variants, or when extracting vectors we could extract using phantom types … … 616 608 617 609 To accurately model timers and I/O, we add an unbounded integral field \texttt{clock} to the central \texttt{status} record. 618 This field is only logical, since it does not represent any quantity stored in the actual processor, and is used to keep track of the current processor time.610 This field is only logical, since it does not represent any quantity stored in the physical processor, and is used to keep track of the current `processor time'. 619 611 Before every execution step, \texttt{clock} is incremented by the number of processor cycles that the instruction just fetched will take to execute. 620 The processor then executes the instruction, followed by the code implementing the timers and I/O\footnote{Though it isn't fully specified by the manufacturer's data sheets if I/O is handled at the beginning or the end of each cycle.}. In order to model I/O, we also store in the status a621 \emph{continuation} which is a description of the behaviour of the environment:612 The emulator then executes the instruction, followed by the code implementing the timers and I/O\footnote{Though it isn't fully specified by the manufacturer's data sheets if I/O is handled at the beginning or the end of each cycle.}. 613 In order to model I/O, we also store in \texttt{status} a \emph{continuation} which is a description of the behaviour of the environment: 622 614 \begin{lstlisting} 623 615 type line = … … 628 620 [`Out of (time -> line -> time * continuation)] 629 621 \end{lstlisting} 630 At each moment, the second projection of the continuation $k$ describes how the environment will react to an output event performed in the future by the processor. Let $\pi_2(k)(\tau,o) = \langle \tau',k' \rangle$. 631 If the processor at time $\tau$ starts an asynchronous output $o$ either on the P1 or P3 output lines, or on the UART, then the environment will receive the output at time $\tau'$. 632 Moreover the status is immediately updated with the continuation $k'$. 633 634 Further, if $\pi_1(k) = \mathtt{Some}~\langle \tau',i,\epsilon,k'\rangle$, then at time $\tau'$ the environment will send the asynchronous input $i$ to the processor and the status will be updated with the continuation $k'$. 635 This input is visible to the processor only at time $\tau' + \epsilon$. 622 At each moment, the second projection of the continuation $k$ describes how the environment will react to an output event performed in the future by the processor. 623 Suppose $\pi_2(k)(\tau,o) = \langle \tau',k' \rangle$. 624 If the emulator at time $\tau$ starts an asynchronous output $o$ either on the P1 or P3 output lines, or on the UART, then the environment will receive the output at time $\tau'$. 625 Moreover \texttt{status} is immediately updated with the continuation $k'$. 626 627 Further, if $\pi_1(k) = \mathtt{Some}~\langle \tau',i,\epsilon,k'\rangle$, then at time $\tau'$ the environment will send the asynchronous input $i$ to the emulator and \texttt{status} is updated with the continuation $k'$. 628 This input is visible to the emulator only at time $\tau' + \epsilon$. 636 629 637 630 The time required to perform an I/O operation is partially specified in the data sheets of the UART module. … … 640 633 641 634 We use only the P1 and P3 lines despite the MCS-51 having four output lines, P0--P3. 642 This is because P0 and P2 become inoperable if the processor is equipped with XRAM (w hich we assume it is).635 This is because P0 and P2 become inoperable if the processor is equipped with XRAM (we assume it is). 643 636 644 637 The UART port can work in several modes, depending on the how the SFRs are set. 645 In an asyncrhonous mode, the UART transmits 8 bits at a time, using a ninth line for sync rhonization.646 In a sync rhonous mode the ninth line is used to transmit an additional bit.638 In an asyncrhonous mode, the UART transmits 8 bits at a time, using a ninth line for synchronisation. 639 In a synchronous mode the ninth line is used to transmit an additional bit. 647 640 648 641 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% … … 653 646 654 647 As mentioned in Subsection~\ref{subsect.labels.pseudoinstructions} we introduced a notion of \emph{cost label}. 655 Cost labels are inserted by the prototype C compiler inspecific locations in the object code.648 Cost labels are inserted by the prototype C compiler at specific locations in the object code. 656 649 Roughly, for those familiar with control flow graphs, they are inserted at the start of every basic block. 657 650 … … 669 662 We spent considerable effort attempting to ensure that what we have formalised is an accurate model of the MCS-51 microprocessor. 670 663 671 First, we made use of multiple data sheets, each from a different semiconductor manufacturer. This helped us spot errors in the specification of the processor's instruction set, and its behaviour, for instance, in a datasheet from Philips. 664 We made use of multiple data sheets, each from a different semiconductor manufacturer. 665 This helped us triangulate errors in the specification of the processor's instruction set, and its behaviour, for instance, in a data sheet from Philips Semiconductor. 672 666 673 667 The O'Caml prototype was especially useful for validation purposes. 674 This is because we wrote a module for parsing and loading the Intel HEX file format.675 HEX is a standard format that all compilers targetting the MCS-51, and similar processors, produce.668 We wrote a module for parsing and loading Intel HEX format files. 669 Intel HEX is a standard format that all compilers targetting the MCS-51, and similar processors, produce. 676 670 It is essentially a snapshot of the processor's code memory in compressed form. 677 Using this , we were able to compile C programs with SDCC, an open source compiler,and load the resulting program directly into our emulator's code memory, ready for execution.678 Further, we are able to produce a HEX file from our emulator's code memory,for loading into third party tools.679 After each step of execution, we can print out both the instruction that had been executed , along with its arguments,and a snapshot of the processor's state, including all flags and register contents.671 Using this we were able to compile C programs with SDCC and load the resulting program directly into our emulator's code memory, ready for execution. 672 Further, we can produce a HEX file from our emulator's code memory for loading into third party tools. 673 After each step of execution, we can print out both the instruction that had been executed and a snapshot of the processor's state, including all flags and register contents. 680 674 For example: 681 675 \begin{frametxt} … … 699 693 \end{small} 700 694 \end{frametxt} 701 Here, the trace sindicates that the instruction \texttt{mov 81 \#07} has just been executed by the processor, which is now in the state indicated.695 Here, the trace indicates that the instruction \texttt{mov 81 \#07} has just been executed by the processor, which is now in the state indicated. 702 696 These traces were useful in spotting anything that was `obviously' wrong with the execution of the program. 703 697 704 Further, we used MCU 8051 IDE as a reference.705 Using our execution traces, we were able to step through a compiled program , one instruction at a time, in MCU 8051 IDE,and compare the resulting execution trace with the trace produced by our emulator.706 707 Our Matita formalisation was largely copied from the O'Caml source code, apart from changes related to addressing modes already mentioned.698 We further used MCU 8051 IDE as a reference, which allows a user to step through an assembly program one instruction at a time. 699 Using our execution traces, we were able to step through a compiled program in MCU 8051 IDE and compare the resulting execution trace with the trace produced by our emulator. 700 701 Our Matita formalisation was largely copied from the O'Caml source code, apart from the changes already mentioned. 708 702 However, as the Matita emulator is executable, we could perform further validation by comparing the trace of a program's execution in the Matita emulator with the trace of the same program in the O'Caml emulator. 709 703 … … 713 707 \section{Related work} 714 708 \label{sect.related.work} 715 There exists a large body of literature on the formalisation of microprocessors.716 The majority of it aims to prove correctness of the implementation of the microprocessorat the microcode or gate level.717 709 A large body of literature on the formalisation of microprocessors exists. 710 The majority of it deals with proving correctness of implementations of microprocessors at the microcode or gate level. 711 We are interested in providing a precise specification of the behaviour of the microprocessor in order to prove the correctness of a compiler which will target the processor. 718 712 In particular, we are interested in intensional properties of the processor; precise timings of instruction execution in clock cycles. 719 Moreover, in addition to formalising the interface of an MCS-51 processor, we have also built a complete MCS-51 ecosystem: the UART, the I/O lines, and hardware timers, along with an assembler. 720 721 Similar work to ours can be found in~\cite{fox:trustworthy:2010}. 722 Here, the authors describe the formalisation, in HOL4, of the ARMv7 instruction set architecture, and point to a good list of references to related work in the literature. 723 This formalisation also considers the machine code level, as opposed to only considering an abstract assembly language. 724 In particular, instruction decoding is explicitly modelled inside HOL4's logic. 725 We go further in also providing an assembly language, complete with assembler, to translate instructions and pseudoinstruction to machine code. 713 Moreover, in addition to formalising the interface of an MCS-51 processor, we have also built a complete MCS-51 ecosystem: UART, I/O lines, and hardware timers, complete with an assembler. 714 715 Work closely related to our own can be found in~\cite{fox:trustworthy:2010}. 716 Here, the authors describe the formalisation, in HOL4, of the ARMv7 instruction set architecture. 717 They further point to an excellent list of references to related work in the literature for the interested reader. 718 This formalisation also considers the machine code level, opposed to their formalisation, which only considering an abstract assembly language. 719 In particular, instruction decoding is explicitly modeled inside HOL4's logic. 720 We go further in also providing an assembly language, complete with assembler, to translate instructions and pseudoinstruction into machine code. 726 721 727 722 Further, in~\cite{fox:trustworthy:2010} the authors validated their formalisation by using development boards and random testing. 728 723 We currently rely on non-exhaustive testing against a third party emulator. 729 We leave exhaustive testingfor future work.730 731 Executability is another key difference between our work and ~\cite{fox:trustworthy:2010}.732 Our formalisation is executable: applying the emulation function to an input state eventually reduces to an output state that already satisfies the appropriate conditions.733 This is because Matita is based on a logic thatinternalizes conversion.724 We recognise the importance of this exhaustive testing, but currently leave it for future work. 725 726 Executability is another key difference between our work and that of~\cite{fox:trustworthy:2010}. 727 Our formalisation is executable: applying the emulation function to an input state eventually reduces to an output state. 728 This is because Matita is based on a logic, CIC, which internalizes conversion. 734 729 In~\cite{fox:trustworthy:2010} the authors provide an automation layer to derive single step theorems: if the processor is in a particular state that satisfies some preconditions, then after execution of an instruction it will reside in another state satisfying some postconditions. 735 730 We do not need single step theorems of this form. … … 740 735 Perhaps the closest project to CerCo is CompCert~\cite{leroy:formally:2009}. 741 736 CompCert concerns the certification of a C compiler and includes a formalisation in Coq of a subset of PowerPC. 742 Coq and Matita essentially share the same logic. 737 (Coq and Matita essentially share the same logic.) 743 738 744 739 Despite this similarity, the two formalisations do not have much in common. 745 First, CompCert provides a formalisation at the assembly level (no instruction decoding), and this impels them to trust an unformalised assembler and linker, whereas we provide our own. Our formalization is directly executable, while the 746 one in CompCert only provides a relation that describes execution. 740 First, CompCert provides a formalisation at the assembly level (no instruction decoding). 741 This impels them to trust an unformalised assembler and linker, whereas we provide our own. 742 Our formalisation is \emph{directly} executable, while the one in CompCert only provides a relation that describes execution. 747 743 I/O is also not considered at all in CompCert. 748 744 Moreover an idealized abstract and uniform memory model is assumed, while we take into account the complicated overlapping memory model of the MCS-51 architecture. 749 Finally, 82 instructions of the 200+ offered by the processor are formalised in CompCert, and the assembly language is augmented with macro instructions that are turned into `real' instructions only during communication with the external assembler. 750 Even from a technical level the two formalisations differ: while we tried to exploit dependent types as often as possible, CompCert largely sticks to the non-dependent fragment of Coq. 751 752 In~\cite{atkey:coqjvm:2007} Atkey presents an executable specification of the Java virtual machine which uses dependent types. 753 As we do, dependent types are used to remove spurious partiality from the model, and to lower the need for over-specifying the behaviour of the processor in impossible cases. 754 Our use of dependent types will also help to maintain invariants when we prove the correctness of the CerCo prototype compiler. 755 756 Finally, Sarkar et al~\cite{sarkar:semantics:2009} provide an executable semantics for x86-CC multiprocessor machine code. 745 Finally, 82 instructions of the more than 200 offered by the processor are formalised in CompCert, and the assembly language is augmented with macro instructions that are turned into `real' instructions only during communication with the external assembler. 746 Even from a technical level the two formalisations differ: we tried to exploit dependent types whilst CompCert largely sticks to a non-dependent fragment of Coq. 747 748 In~\cite{atkey:coqjvm:2007} an executable specification of the Java Virtual Machine, using dependent types, is presented. 749 As we do, dependent types there are used to remove spurious partiality from the model. 750 They also lower the need for over-specifying the behaviour of the processor in impossible cases. 751 Our use of dependent types will also help to maintain invariants when we prove the correctness of the CerCo prototype C compiler. 752 753 Finally~\cite{sarkar:semantics:2009} provides an executable semantics for x86-CC multiprocessor machine code. 757 754 This machine code exhibits a high degree of non-uniformity similar to the MCS-51. 758 755 However, only a small subset of the instruction set is considered, and they over-approximate the possibilities of unorthogonality of the instruction set, largely dodging the problems we had to face. … … 761 758 A small domain specific language of patterns is formalised in HOL4. 762 759 This is similar to the specification language of the x86 instruction set found in manufacturer's data sheets. 763 A decode function is implemented by copying lines from data sheets into the proof script .760 A decode function is implemented by copying lines from data sheets into the proof script, which are then interpreted. 764 761 765 762 We are currently considering implementing a similar domain specific language in Matita. … … 773 770 \label{sect.conclusions} 774 771 775 The CerCo project isinterested in the certification of a compiler for C that induces a precise cost model on the source code.776 Our cost model assigns costs to blocks of instructions by tracing the way that blocks are compiled, and by computing exact costs on generated assembly code.777 To perform this accurately, we have provided an executable semantics for the MCS-51 family of processors , better known as 8051/8052.778 The formalisation was done twice, first in O'Caml and then in Matita, and captures the exact timings of the processor .772 In CerCo, we are interested in the certification of a compiler for C that induces a precise cost model on the source code. 773 Our cost model assigns costs to blocks of instructions by tracing the way that blocks are compiled, and by computing exact costs on generated machine language. 774 To perform this accurately, we have provided an executable semantics for the MCS-51 family of processors. 775 The formalisation was done twice, first in O'Caml and then in Matita, and captures the exact timings of the processor (according to a Siemen's data sheet). 779 776 Moreover, the O'Caml formalisation also considers timers and I/O. 780 Adding support for I/O and timers in Matita is anon-going work that will not present any major problem, and was delayed only because the addition is not immediately useful for the formalisation of the CerCo compiler.777 Adding support for I/O and timers in Matita is on-going work that will not present any major problem, and was delayed only because the addition is not immediately useful for the formalisation of the CerCo compiler. 781 778 782 779 The formalisation is done at machine level and not at assembly level; we also formalise fetching and decoding. 783 780 We separately provide an assembly language, enhanched with labels and pseudoinstructions, and an assembler from this language to machine code. 781 This assembly language is similar to those found in `industrial strength' compilers, such as SDCC. 784 782 We introduce cost labels in the machine language to relate the data flow of the assembly program to that of the C source language, in order to associate costs to the C program. 785 783 For the O'Caml version, we provide a parser and pretty printer from code memory to Intel HEX format. … … 788 786 Our main difficulty in formalising the MCS-51 was the unorthogonality of its memory model and instruction set. 789 787 These problems are easily handled in O'Caml by using advanced language features like polymorphic variants and phantom types, simulating Generalized Abstract Data Types. 790 In Matita, we use dependent types to recover the same flexibility, to reduce spurious partiality, and to grant invariants that will be useful in the formalization of the CerCo compiler.788 In Matita, we use dependent types to recover the same flexibility, to reduce spurious partiality, and to grant invariants that will be later useful in other formalisations in the CerCo project. 791 789 792 790 The formalisation has been partially verified by computing execution traces on selected programs and comparing them with an existing emulator. 793 791 All instructions have been tested at least once, but we have not yet pushed testing further, for example with random testing or by using development boards. 794 792 I/O in particular has not been tested yet, and it is currently unclear how to provide exhaustive testing in the presence of I/O. 795 Finally, we are aware of having over-specified the processor in several places, by fixing a behaviour hopefully consistent with the real machine, where manufacturer data sheets are ambiguous or under 793 Finally, we are aware of having over-specified the processor in several places, by fixing a behaviour hopefully consistent with the real machine, where manufacturer data sheets are ambiguous or under-specified. 796 794 797 795 \bibliography{itp-2011.bib}
Note: See TracChangeset
for help on using the changeset viewer.