Changeset 2060


Ignore:
Timestamp:
Jun 13, 2012, 4:26:50 PM (5 years ago)
Author:
mulligan
Message:

More work on paper.

File:
1 edited

Legend:

Unmodified
Added
Removed
  • src/ASM/CPP2012-asm/cpp-2012-asm.tex

    r2058 r2060  
    8181Each of these three instructions expects arguments in different sizes and behaves in markedly different ways: \texttt{SJMP} may only perform a `local jump'; \texttt{LJMP} may jump to any address in the MCS-51's memory space and \texttt{AJMP} may jump to any address in the current memory page.
    8282Consequently, the size of each opcode is different, and to squeeze as much code as possible into the MCS-51's limited code memory, the smallest possible opcode that will suffice should be selected.
    83 This is a well known problem to assembler writers, often referred to as `branch displacement'.
     83This is a well known problem to assembler writers who target RISC architectures, often referred to as `branch displacement'.
    8484
    8585Branch displacement is not a simple problem to solve and requires the implementation of an optimising assembler.
     
    403403This pseudo internal RAM corresponds to an internal RAM where the stack holds the real addresses after optimisation, and all the other values remain untouched.
    404404
    405 We use an \texttt{internal\_pseudo\_address\_map} to trace addresses of code memory addresses in internal RAM.
    406 The current code is parametric on the implementation of the map itself.
    407 \begin{lstlisting}
    408 axiom internal_pseudo_address_map: Type[0].
    409 \end{lstlisting}
     405We use an \texttt{internal\_pseudo\_address\_map} to trace addresses of code memory addresses in internal RAM:
     406\begin{lstlisting}
     407definition internal_pseudo_address_map :=
     408  list ((BitVector 8) $\times$ (bool $\times$ Word)) $\times$ (option (bool $\times$ Word)).
     409\end{lstlisting}
     410The implementation of \texttt{internal\_pseudo\_address\_map} is complicated by some peculiarities of the MCS-51's instruction set.
     411Note here that all addresses are 16 bit words, but are stored (and manipulated) as 8 bit bytes.
     412All \texttt{MOV} instructions in the MCS-51 must use the accumulator \texttt{A} as an intermediary, moving a byte at a time.
     413The second component of \texttt{internal\_pseudo\_address\_map} therefore states whether the accumulator currently holds a piece of an address, and if so, whether it is the upper or lower byte of the address (using the boolean flag) complete with the corresponding, source address in full.
     414The first component, on the other hand, performs a similar task for the rest of external RAM.
     415Again, we use a boolean flag to describe whether a byte is the upper or lower component of a 16-bit address.
    410416
    411417The \texttt{low\_internal\_ram\_of\_pseudo\_low\_internal\_ram} function converts the lower internal RAM of a \texttt{PseudoStatus} into the lower internal RAM of a \texttt{Status}.
     
    475481In particular, our assembly language featured labels, arbitrary conditional and unconditional jumps to labels, global data and instructions for moving this data into the MCS-51's single 16-bit register.
    476482Expanding these pseudoinstructions into machine code instructions is not trivial, and the proof that the assembly process is `correct', in that the semantics of a subset of assembly programs are not changed is complex.
    477 Further, we have observed the `shocking' fact that any optimising assembler cannot preserve the semantics of all assembly programs.
    478483
    479484The formalisation is a key component of the CerCo project, which aims to produce a verified concrete complexity preserving compiler for a large subset of the C programming language.
    480485The verified assembler, complete with the underlying formalisation of the semantics of MCS-51 machine code (described fully in~\cite{mulligan:executable:2011}), will form the bedrock layer upon which the rest of the CerCo project will build its verified compiler platform.
    481 However, further work is needed.
    482 In particular, as it stands, the code produced by the prototype CerCo C compiler does not fall into the `semantics preserving' subset of assembly programs for our assembler.
    483 This is because the MCS-51 features a small stack space, and a larger stack is customarily manually emulated in external RAM.
    484 As a result, the majority of programs feature slices of memory addresses and program counters being moved in-and-out of external RAM via the registers, simulating the stack mechanism.
    485 At the moment, this movement is not tracked by \texttt{internal\_pseudo\_address\_map}, which only tracks the movement of memory addresses in low internal RAM.
    486 We leave extending this tracking of memory addresses throughout the whole of the MCS-51's address spaces as future work.
    487486
    488487It is interesting to compare our work to an `industrial grade' assembler for the MCS-51: SDCC~\cite{sdcc:2011}.
     
    492491However, this comes at the expense of assembler completeness: the generated program may be too large to fit into code memory.
    493492In this respect, there is a trade-off between the completeness of the assembler and the efficiency of the assembled program.
    494 The definition and proof of a complete, optimal (in the sense that object code size is minimised) and correct jump expansion policy is ongoing work.
     493The definition and proof of a terminating, correct jump expansion policy is described in a companion publication to this one.
    495494
    496495Aside from their application in verified compiler projects such as CerCo and CompCert, verified assemblers such as ours could also be applied to the verification of operating system kernels.
     
    499498
    500499Note that both CompCert and the seL4 formalisation assume the existence of `trustworthy' assemblers.
    501 Our observation that an optimising assembler cannot preserve the semantics of every assembly program may have important consequences for these projects.
     500The observation that an optimising assembler cannot preserve the semantics of every assembly program may have important consequences for these projects.
    502501If CompCert chooses to assume the existence of an optimising assembler, then care should be made to ensure that any assembly program produced by the CompCert compiler falls into the subset of programs that have a hope of having their semantics preserved by an optimising assembler.
    503502
    504 Our formalisation exploits dependent types in different ways and for multiple
    505 purposes. The first purpose is to reduce potential errors in the formalisation
    506 of the microprocessor. In particular,
    507 dependent types are used to constraint the size of bit-vectors and
    508 tries that represent memory quantities and memory areas respectively.
    509 They are also used as explained in~\cite{mulligan:executable:2011}.
    510 to simulate polymorphic variants in Matita. Polymorphic variants nicely
    511 capture the absolutely unorthogonal instruction set of the MCS-51 where every
    512 opcode must accept its own subset of the 11 addressing mode of the processor.
    513 
    514 The second purpose is to single out the sources of incompleteness. By
    515 abstracting our functions over the dependent type of correct policies, we were
    516 able to manifest the fact that the compiler never refuses to compile a program
    517 where a correct policy exists. This also allowed to simplify the
    518 initial proof by dropping lemmas establishing that one function fails if and
    519 only if some other one does so.
    520 
    521 Finally, dependent types, together with Matita's liberal system of coercions,
    522 allow to simulate almost entirely in user space the proof methodology
    523 ``Russell'' of Sozeau~\cite{sozeau:subset:2006}. However, not every
    524 proof has been done this way: we only used this style to prove that a
    525 function satisfies a specification that only involves that function in a
    526 significant way. For example, it would be unnatural to see the proof that
    527 fetch and assembly commute as the specification of one of the two functions.
     503Our formalisation exploits dependent types in different ways and for multiple purposes.
     504The first purpose is to reduce potential errors in the formalisation of the microprocessor.
     505In particular, dependent types are used to constraint the size of bitvectors and tries that represent memory quantities and memory areas respectively.
     506They are also used to simulate polymorphic variants in Matita, in order to provide precise typings to various functions expecting only a subset of all possible addressing modes than the MCS-51 offers.
     507Polymorphic variants nicely capture the absolutely unorthogonal instruction set of the MCS-51 where every opcode must accept its own subset of the 11 addressing mode of the processor.
     508
     509The second purpose is to single out the sources of incompleteness.
     510By abstracting our functions over the dependent type of correct policies, we were able to manifest the fact that the compiler never refuses to compile a program where a correct policy exists.
     511This also allowed to simplify the initial proof by dropping lemmas establishing that one function fails if and only if some other one does so.
     512
     513Finally, dependent types, together with Matita's liberal system of coercions, allow to simulate almost entirely in user space the proof methodology ``Russell'' of Sozeau~\cite{sozeau:subset:2006}.
     514However, not every proof has been done this way: we only used this style to prove that a function satisfies a specification that only involves that function in a significant way.
     515For example, it would be unnatural to see the proof that fetch and assembly commute as the specification of one of the two functions.
    528516
    529517\subsection{Related work}
     
    541529We believe some other verified assemblers exist in the literature.
    542530However, what sets our work apart from that above is our attempt to optimise the machine code generated by our assembler.
    543 This complicates any formalisation effort, as the best possible selection of machine instructions must be made, especially important on a device such as the MCS-51 with a miniscule code memory.
     531This complicates any formalisation effort, as an attempt at the best possible selection of machine instructions must be made, especially important on a device such as the MCS-51 with a miniscule code memory.
    544532Further, care must be taken to ensure that the time properties of an assembly program are not modified by the assembly process lest we affect the semantics of any program employing the MCS-51's I/O facilities.
    545533This is only possible by inducing a cost model on the source code from the optimisation strategy and input program.
    546534This will be a \emph{leit motif} of CerCo.
    547535
    548 Finally, mention of CerCo will invariably invite comparisons with CompCert~\cite{compcert:2011,leroy:formal:2009}, another verified compiler project related to CerCo.
    549 As previously mentioned, CompCert considers only extensional correctness of the compiler, and not intensional correctness, which CerCo focusses on.
    550 However, CerCo also extends CompCert in other ways.
    551 Namely, the CompCert verified compilation chain terminates at the assembly level, and takes for granted the existence of a trustworthy assembler.
    552 CerCo chooses to go further, by considering a verified compilation chain all the way down to the machine code level.
    553 The work presented in this publication is one part of CerCo's extension over CompCert.
    554 
    555536\subsection{Resources}
    556537\label{subsect.resources}
    557538
    558 All files relating to our formalisation effort can be found online at~\url{http://cerco.cs.unibo.it}. The code of the compiler has been completed, and the
    559 proof of correctness described here is still in progress. In particular, we
    560 have assumed several properties of ``library functions'' related in particular
    561 to modular arithmetics and datastructures manipulation. Moreover, we only
    562 completed the interesting cases of some of the main theorems that proceed by
    563 cases on all the possible opcodes.
    564 We thus believe that the proof strategy is sound and that we will be able to
    565 close soon all axioms, up to possible minor bugs that should have local fixes
    566 that do not affect the global proof strategy.
    567 
    568 The development, including the definition of the executable semantics of the MCS-51, is spread across 17 files, totalling around 11,500 lines of Matita source.
    569 The bulk of the proof described herein is contained in a single file, \texttt{AssemblyProof.ma}, consisting at the moment of approximately 2500 lines of Matita source. Another 1000 lines of proofs are spread all over the development because
    570 of dependent types and the Russell proof style, that do not allow to separate the code from the proofs. The low ratio between the number of lines of code and
    571 the number of lines of proof is unusual. It is justified by the fact that
    572 the pseudo-assembly and the assembly language share most constructs and that
    573 large parts of the semantics is also shared. Thus many lines of code are
    574 required to describe the complex semantics of the processor, but, for
    575 the shared cases, the proof of preservation of the semantics is essentially
    576 trivial.
     539All files relating to our formalisation effort can be found online at~\url{http://cerco.cs.unibo.it}.
     540The code of the compiler has been completed, and the proof of correctness described here is still in progress.
     541In particular, we have assumed several properties of ``library functions'' related in particular to modular arithmetic and datastructure manipulation.
     542Moreover, we have axiomatised various ancillary functions needed to complete the main theorems, as well as some `routine' proof obligations of the theorems themselves, preferring instead to focus on the main meat of the theorems.
     543We thus believe that the proof strategy is sound and that we will be able to close soon all axioms, up to possible minor bugs that should have local fixes that do not affect the global proof strategy.
     544
     545The development, including the definition of the executable semantics of the MCS-51, is spread across 29 files, totalling around 18,500 lines of Matita source.
     546The bulk of the proof described herein is contained in a series of files, \texttt{AssemblyProof.ma}, \texttt{AssemblyProofSplit.ma} and \texttt{AssemblyProofSplitSplit.ma} consisting at the moment of approximately 4200 lines of Matita source.
     547Numerous other lines of proofs are spread all over the development because of dependent types and the Russell proof style, which does not allow one to separate the code from the proofs.
     548The low ratio between the number of lines of code and the number of lines of proof is unusual.
     549It is justified by the fact that the pseudo-assembly and the assembly language share most constructs and that large parts of the semantics are also shared.
     550Thus many lines of code are required to describe the complex semantics of the processor, but, for the shared cases, the proof of preservation of the semantics is essentially trivial.
    577551
    578552\bibliography{cpp-2012-asm.bib}
Note: See TracChangeset for help on using the changeset viewer.