Ignore:
Timestamp:
Sep 27, 2012, 3:16:34 PM (7 years ago)
Author:
mulligan
Message:

Down to 17 pages now.

File:
1 edited

Legend:

Unmodified
Added
Removed
  • Papers/cpp-asm-2012/cpp-2012-asm.tex

    r2357 r2358  
    4747\begin{abstract}
    4848We present a proof of correctness, in Matita, for an optimising assembler for the MCS-51 microcontroller.
    49 This assembler constitutes a major component of the EU's CerCo project.
    5049
    5150The efficient expansion of pseudoinstructions---namely jumps---into MCS-51 machine instructions is complex.
     
    9392This problem is known to be complex for most CISC architectures (see~\cite{hyde:branch:2006}).
    9493
    95 To free the CerCo C compiler from having to consider complications relating to branch displacement, we have chosen to implement an optimising assembler, whose input language the compiler will target.
     94To free CerCo's C compiler from having to consider complications relating to branch displacement, we have chosen to implement an optimising assembler, whose input language the compiler will target.
    9695Labels, conditional jumps to labels, a program preamble containing global data and a \texttt{MOV} instruction for moving this global data into the MCS-51's one 16-bit register all feature in our assembly language.
    9796We simplify the proof by assuming that all our assembly programs are pre-linked (i.e. we do not formalise a linker---this is left for future work).
     
    158157
    159158Matita is a proof assistant based on a variant of the Calculus of (Co)inductive Constructions~\cite{asperti:user:2007}.
    160 In particular, it features dependent types that we exploit in the formalisation.
    161 The syntax of the statements and definitions in the paper should be self-explanatory, at least to those exposed to dependent type theory.
     159It features dependent types that we exploit in the formalisation.
     160The (simplified) syntax of the statements and definitions in the paper should be self-explanatory.
    162161Pairs are denoted with angular brackets, $\langle-, -\rangle$.
    163162
     
    461460    instr = i $\wedge$ ticks = (ticks_of_instruction instr) $\wedge$ pc' = pc_plus_len.
    462461\end{lstlisting}
    463 In particular, we read \texttt{fetch\_assembly} as follows.
     462We read \texttt{fetch\_assembly} as follows.
    464463Given an instruction, \texttt{i}, we first assemble the instruction to obtain \texttt{assembled}, checking that the assembled instruction was stored in code memory correctly.
    465464Fetching from code memory, we obtain a tuple consisting of the instruction, new program counter, and the number of ticks this instruction will take to execute.
     
    516515The fetched sequence corresponds to the expansion, according to the policy, of the pseudoinstruction.
    517516
    518 At first, the lemma appears to immediately imply the correctness of the assembler.
    519 However, this property is \emph{not} strong enough to establish that the semantics of an assembly program has been preserved by the assembly process since it does not establish the correspondence between the semantics of a pseudoinstruction and that of its expansion.
     517At first, the lemma appears to immediately imply the correctness of the assembler, but this property is \emph{not} strong enough to establish that the semantics of an assembly program has been preserved by the assembly process since it does not establish the correspondence between the semantics of a pseudoinstruction and that of its expansion.
    520518In particular, the two semantics differ on instructions that \emph{could} directly manipulate program addresses.
    521519
     
    547545This approach, of using real addresses coupled with a weak, dynamic typing system sitting atop of memory, is similar to one taken by Tuch \emph{et al}~\cite{tuch:types:2007}, for reasoning about low-level C code.
    548546
    549 Our analogue of the semantic function above is then merely a wrapper around the function that implements the semantics of machine code, paired with a function that keeps track of addresses.
    550 This permits a large amount of code reuse, as the semantics of pseudo- and machine code are essentially shared.
     547Our analogue of the semantic function above is merely a wrapper around the function that implements the semantics of machine code, paired with a function that keeps track of addresses.
     548The semantics of pseudo- and machine code are then essentially shared.
    551549The only thing that changes at the assembly level is the presence of the new tracking function.
    552550
     
    627625\end{lstlisting}
    628626The statement is standard for forward simulation, but restricted to \texttt{PseudoStatuses} \texttt{ps} whose next instruction to be executed is well-behaved with respect to the \texttt{internal\_pseudo\_address\_map} \texttt{M}.
    629 Further, we explicitly require proof that our policy is correct, our program is well-labelled (i.e. no repeated labels, and so on) and the pseudo-program counter lies within the bounds of the program.
    630 Theorem \texttt{main\_thm} establishes the correctness of the assembly process and can simply be lifted to the forward simulation of an arbitrary number of well-behaved steps on the assembly program.
     627We explicitly require proof that the policy is correct, the program is well-labelled (i.e. no repeated labels, etc.) and the pseudo-program counter is in the program's bounds.
     628Theorem \texttt{main\_thm} establishes the correctness of the assembly process and can be lifted to the forward simulation of an arbitrary number of well-behaved steps on the assembly program.
    631629
    632630% ---------------------------------------------------------------------------- %
     
    637635
    638636We are proving the correctness of an assembler for MCS-51 assembly language.
    639 In particular, our assembly language features labels, arbitrary conditional and unconditional jumps to labels, global data and instructions for moving this data into the MCS-51's single 16-bit register.
     637Our assembly language features labels, arbitrary conditional and unconditional jumps to labels, global data and instructions for moving this data into the MCS-51's single 16-bit register.
    640638Expanding these pseudoinstructions into machine code instructions is not trivial, and the proof that the assembly process is `correct', in that the semantics of a subset of assembly programs are not changed is complex.
    641639
    642 The formalisation is a key component of the CerCo project, which aims to produce a verified concrete complexity preserving compiler for a large subset of the C programming language.
    643 The verified assembler, complete with the underlying formalisation of the semantics of MCS-51 machine code, will form the bedrock layer upon which the rest of the CerCo project will build its verified compiler platform.
     640The formalisation is a component of CerCo which aims to produce a verified concrete complexity preserving compiler for a large subset of the C language.
     641The verified assembler, complete with the underlying formalisation of the semantics of MCS-51 machine code, will form the bedrock layer upon which the rest of CerCo will build its verified compiler platform.
    644642
    645643It is interesting to compare our work to an `industrial grade' assembler for the MCS-51: SDCC~\cite{sdcc:2011}.
     
    662660Our formalisation exploits dependent types in different ways and for multiple purposes.
    663661The first purpose is to reduce potential errors in the formalisation of the microprocessor.
    664 In particular, dependent types are used to constrain the size of bitvectors and tries that represent memory quantities and memory areas respectively.
     662Dependent types are used to constrain the size of bitvectors and tries that represent memory quantities and memory areas respectively.
    665663They are also used to simulate polymorphic variants in Matita, in order to provide precise typings to various functions expecting only a subset of all possible addressing modes that the MCS-51 offers.
    666664Polymorphic variants nicely capture the absolutely unorthogonal instruction set of the MCS-51 where every opcode must accept its own subset of the 11 addressing mode of the processor.
     
    673671Not every proof has been carried out in this way: we only used this style to prove that a function satisfies a specification that only involves that function in a significant way.
    674672It would not be natural to see the proof that fetch and assembly commute as the specification of one of the two functions.
    675 
    676 \subsection{Related work}
    677 \label{subsect.related.work}
    678 
     673\paragraph{Related work}
    679674% piton
    680675We are not the first to consider the correctness of an assembler for a non-trivial assembly language.
    681 The most impressive piece of work in this domain is the Piton stack~\cite{moore:piton:1996}, a stack of verified components, written and verified in ACL2, ranging from a proprietary FM9001 microprocessor verified at the gate level, to assemblers and compilers for two high-level languages---Lisp and $\mu$Gypsy~\cite{moore:grand:2005}.
     676The most impressive piece of work in this domain is Piton~\cite{moore:piton:1996}, a stack of verified components, written and verified in ACL2, ranging from a proprietary FM9001 microprocessor verified at the gate level, to assemblers and compilers for two high-level languages---Lisp and $\mu$Gypsy~\cite{moore:grand:2005}.
    682677
    683678% jinja
     
    691686This is only possible by inducing a cost model on the source code from the optimisation strategy and input program.
    692687This will be a \emph{leit motif} of CerCo.
    693 
    694 \subsection{Resources}
    695 \label{subsect.resources}
    696 
    697 All files relating to our formalisation effort can be found online at~\url{http://cerco.cs.unibo.it}.
    698 In particular, we have assumed several properties of `library functions' related in particular to modular arithmetic and datastructure manipulation.
    699 Moreover, we have axiomatised various ancillary functions needed to complete the main theorems, as well as some `routine' proof obligations of the theorems themselves, preferring instead to focus on the main meat of the theorems.
    700 We thus believe that the proof strategy is sound and that we will be able to close soon all axioms, up to possible minor bugs that should have local fixes that do not affect the global proof strategy.
    701 
    702 The development, including the definition of the executable semantics of the MCS-51, is spread across 29 files, totalling around 18,500 lines of Matita source.
    703 The bulk of the proof described herein is contained in a series of files, \texttt{AssemblyProof.ma}, \texttt{AssemblyProofSplit.ma} and \texttt{AssemblyProofSplitSplit.ma} consisting at the moment of approximately 4200 lines of Matita source.
     688\paragraph{Resources}
     689Our source files are available at~\url{http://cerco.cs.unibo.it}.
     690We assumed several properties of `library functions', e.g. modular arithmetic and datastructure manipulation.
     691We axiomatised various small functions needed to complete the main theorems, as well as some `routine' proof obligations of the theorems themselves, in focussing on the main meat of the theorems.
     692We believe that the proof strategy is sound and that we will be able to close all axioms, up to minor bugs that should have local fixes that do not affect the global proof strategy.
     693
     694The development, including the definition of the executable semantics of the MCS-51, is spread across 29 files, with around 18,500 lines of Matita source.
     695The bulk of the proof is contained in a series of files, \texttt{AssemblyProof.ma}, \texttt{AssemblyProofSplit.ma} and \texttt{AssemblyProofSplitSplit.ma} consisting of approximately 4500 lines of Matita source.
    704696Numerous other lines of proofs are spread all over the development because of dependent types and the Russell proof style, which does not allow one to separate the code from the proofs.
    705 The low ratio between the number of lines of code and the number of lines of proof is unusual.
    706 It is justified by the fact that the pseudo-assembly and the assembly language share most constructs and that large parts of the semantics are also shared.
    707 Therefore many lines of code are required to describe the complex semantics of the processor, but, for the shared cases, the proof of preservation of the semantics is essentially trivial.
     697The low ratio between source lines and the number of lines of proof is unusual, but justified by the fact that the pseudo-assembly and the assembly language share most constructs and large swathes of the semantics are shared.
     698Many lines of code are required to describe the complex semantics of the processor, but for the shared cases the proof of preservation of the semantics is essentially trivial.
    708699
    709700\bibliography{cpp-2012-asm.bib}
Note: See TracChangeset for help on using the changeset viewer.