Changeset 995


Ignore:
Timestamp:
Jun 20, 2011, 3:27:54 PM (8 years ago)
Author:
mulligan
Message:

changes

File:
1 edited

Legend:

Unmodified
Added
Removed
  • src/ASM/CPP2011/cpp-2011.tex

    r992 r995  
    3636
    3737\title{On the correctness of an assembler for the Intel MCS-51 microprocessor}
    38 \author{Dominic P. Mulligan \and Claudio Sacerdoti Coen}
     38\author{Dominic P. Mulligan \and Claudio Sacerdoti Coen\thanks{The project CerCo acknowledges the financial support of the Future and Emerging Technologies (FET) programme within the Seventh Framework Programme for Research of the European Commission, under FET-Open grant number: 243881}}
    3939\institute{Dipartimento di Scienze dell'Informazione, Universit\'a di Bologna}
    4040
     
    5353This makes the proof of correctness for the assembler significantly more straightforward.
    5454
    55 We prove, under the assumption of the existence of a correct policy, that the assembly process preserves the semantics of assembly programs.
     55We prove, under the assumption of the existence of a correct policy, that the assembly process preserves the semantics of a subset of assembly programs.
    5656Correct policies fail to exist only in a limited number of pathological circumstances.
     57Surprisingly, it is impossible for an optimising assembler to preserve the semantics of every assembly program.
    5758\end{abstract}
    5859
     
    8283In particular, the MCS-51 features relatively miniscule memory spaces (including read-only code memory, stack and internal/external random access memory) by modern standards.
    8384As a result our compiler, to have any sort of hope of successfully compiling realistic C programs, ought to produce `tight' machine code.
    84 This is not simple.
    85 
    86 We here focus on a single issue in the MCS-51's instruction set: unconditional jumps.
    87 The MCS-51 features three conditional jump instructions: \texttt{LJMP} and \texttt{SJMP}---`long jump' and `short jump' respectively---and an 11-bit oddity of the MCS-51, \texttt{AJMP}, that the prototype CerCo compiler~\cite{cerco-report-code:2011} ignores for simplicity's sake.\footnote{Ignoring \texttt{AJMP} and its analogue \texttt{ACALL} is not idiosyncratic.  The Small Device C Compiler (SDCC)~\cite{sdcc:2011}, the leading open source C compiler for the MCS-51, also seemingly does not produce \texttt{AJMP} and \texttt{ACALL} instructions.  Their utility in a modern context remains unclear.}
    88 Each of these three instructions expects arguments in different sizes and behaves in different ways.
    89 \texttt{SJMP} may only perform a `local jump' whereas \texttt{LJMP} may jump to any address in the MCS-51's memory space.
     85This is not simple and requires the use of optimisations.
     86
     87For example, the MCS-51 features three conditional jump instructions: \texttt{LJMP} and \texttt{SJMP}---`long jump' and `short jump' respectively---and an 11-bit oddity of the MCS-51, \texttt{AJMP}.
     88Each of these three instructions expects arguments in different sizes and behaves in different ways: \texttt{SJMP} may only perform a `local jump'; \texttt{LJMP} may jump to any address in the MCS-51's memory space and \texttt{AJMP} may jump to any address in the current memory page.
    9089Consequently, the size of each opcode is different, and to squeeze as much code as possible into the MCS-51's limited code memory, the smallest possible opcode should be selected.
    9190
     
    9897However, this assembly process is not trivial, for numerous reasons.
    9998For example, our conditional jumps to labels behave differently from their machine code counterparts.
    100 At the machine code level, conditional jumps may only jump to a relative offset, expressed in a byte, of the current program counter, limiting their range.
     99At the machine code level, all conditional jumps are `short', limiting their range.
    101100However, at the assembly level, conditional jumps may jump to a label that appears anywhere in the program, significantly liberalising the use of conditional jumps and further simplifying the design of the CerCo compiler.
    102101
    103102Further, trying to na\"ively relate assembly programs with their machine code counterparts simply does not work.
    104 Machine code programs that fetch from code memory and programs that combine the program counter with constant shifts do not make sense at the assembly level.
     103Machine code programs that fetch from constant addresses in code memory or programs that combine the program counter with constant shifts do not make sense at the assembly level, since the position of instructions in code memory will be known only after assembly and optimisation.
    105104More generally, memory addresses can only be compared with other memory addresses.
    106105However, checking that memory addresses are only compared against each other at the assembly level is in fact undecidable.
    107 In short, the full preservation of the semantics of the two languages is impossible.
    108 
    109 Yet more complications are added by the peculiarities of the CerCo project itself.
    110 As mentioned, the CerCo consortium is in the business of constructing a verified compiler for the C programming language.
    111 However, unlike CompCert~\cite{compcert:2011,leroy:formal:2009,leroy:formally:2009}---which currently represents the state of the art for `industrial grade' verified compilers---CerCo considers not just the \emph{intensional correctness} of the compiler, but also its \emph{extensional correctness}.
     106In short, we come to the shocking realisation that, with optimisations, the full preservation of the semantics of the two languages is impossible.
     107We believe that this revelation is significant for large formalisation projects that assume the existence of a correct assembler.
     108Projects in this class include both the recent CompCert~\cite{compcert:2011,leroy:formal:2009} and seL4 formalisations~\cite{klein:sel4:2009}.
     109
     110Yet, the situation is even more complex than having to expand pseudoinstructions correctly.
     111In particular, when formalising the assembler, we must make sure that the assembly process does not change the timing characteristics of an assembly program for two reasons.
     112First, the semantics of some functions of the MCS-51, notably I/O, depend on the microprocessor's clock.
     113Changing how long a particular program takes to execute can affect the semantics of a program.
     114This is undesirable.
     115
     116Second, as mentioned, the CerCo consortium is in the business of constructing a verified compiler for the C programming language.
     117However, unlike CompCert~\cite{compcert:2011,leroy:formal:2009,leroy:formally:2009}---which currently represents the state of the art for `industrial grade' verified compilers---CerCo considers not just the \emph{extensional correctness} of the compiler, but also its \emph{intensional correctness}.
    112118That is, CompCert focusses solely on the preservation of the \emph{meaning} of a program during the compilation process, guaranteeing that the program's meaning does not change as it is gradually transformed into assembly code.
    113119However in any realistic compiler (even the CompCert compiler!) there is no guarantee that the program's time properties are preserved during the compilation process; a compiler's `optimisations' could, in theory, even conspire to degrade the concrete complexity of certain classes of programs.
    114120CerCo aims to expand the current state of the art by producing a compiler where this temporal degradation is guaranteed not to happen.
     121Moreover, CerCo's approach lifts a program's timing information to the source (C language) level, wherein the programmer can reason about a program's intensional properties by directly examining the source code that they write.
    115122
    116123In order to achieve this CerCo imposes a cost model on programs, or more specifically, on simple blocks of instructions.
     
    141148Perhaps more insidious, the number of cycles needed to execute the instructions in the two branches of a translated conditional jump may be different.
    142149Depending on the particular MCS-51 derivative at hand, an \texttt{SJMP} could in theory take a different number of clock cycles to execute than an \texttt{LJMP}.
    143 These issues must also be dealt with in order to prove that the translation pass preserves the concrete complexity of the code.
     150These issues must also be dealt with in order to prove that the translation pass preserves the concrete complexity of the code, and that the semantics of a program using the MCS-51's I/O facilities does not change.
    144151
    145152The question remains: how do we decide whether to expand a jump into an \texttt{SJMP} or an \texttt{LJMP}?
     
    181188\label{sect.matita}
    182189
    183 Matita is a proof assistant based on the (Co)inductive Calculus of Constructions~\cite{asperti:user:2007}.
     190Matita is a proof assistant based on the Calculus of (Co)inductive Constructions~\cite{asperti:user:2007}.
    184191For those familiar with Coq, Matita's syntax and mode of operation should be entirely familiar.
    185 We take time here to explain one of Matita's syntactic idiosyncracies, however.
    186 The use of `$\mathtt{?}$' or `$\mathtt{\ldots}$' in an argument position denotes a type or types to be inferred automatically by unification respectively.
     192However, we take time here to explain one of Matita's syntactic idiosyncrasies.
     193The use of `$\mathtt{?}$' or `$\mathtt{\ldots}$' in an argument position denotes a term or terms to be inferred automatically by unification, respectively.
    187194The use of `$\mathtt{?}$' in the body of a definition, lemma or axiom denotes an incomplete term that is to be closed, by hand, using tactics.
    188195
Note: See TracChangeset for help on using the changeset viewer.