Changeset 995 for src/ASM/CPP2011
- Timestamp:
- Jun 20, 2011, 3:27:54 PM (8 years ago)
- File:
-
- 1 edited
Legend:
- Unmodified
- Added
- Removed
-
src/ASM/CPP2011/cpp-2011.tex
r992 r995 36 36 37 37 \title{On the correctness of an assembler for the Intel MCS-51 microprocessor} 38 \author{Dominic P. Mulligan \and Claudio Sacerdoti Coen }38 \author{Dominic P. Mulligan \and Claudio Sacerdoti Coen\thanks{The project CerCo acknowledges the financial support of the Future and Emerging Technologies (FET) programme within the Seventh Framework Programme for Research of the European Commission, under FET-Open grant number: 243881}} 39 39 \institute{Dipartimento di Scienze dell'Informazione, Universit\'a di Bologna} 40 40 … … 53 53 This makes the proof of correctness for the assembler significantly more straightforward. 54 54 55 We prove, under the assumption of the existence of a correct policy, that the assembly process preserves the semantics of a ssembly programs.55 We prove, under the assumption of the existence of a correct policy, that the assembly process preserves the semantics of a subset of assembly programs. 56 56 Correct policies fail to exist only in a limited number of pathological circumstances. 57 Surprisingly, it is impossible for an optimising assembler to preserve the semantics of every assembly program. 57 58 \end{abstract} 58 59 … … 82 83 In particular, the MCS-51 features relatively miniscule memory spaces (including read-only code memory, stack and internal/external random access memory) by modern standards. 83 84 As a result our compiler, to have any sort of hope of successfully compiling realistic C programs, ought to produce `tight' machine code. 84 This is not simple. 85 86 We here focus on a single issue in the MCS-51's instruction set: unconditional jumps. 87 The MCS-51 features three conditional jump instructions: \texttt{LJMP} and \texttt{SJMP}---`long jump' and `short jump' respectively---and an 11-bit oddity of the MCS-51, \texttt{AJMP}, that the prototype CerCo compiler~\cite{cerco-report-code:2011} ignores for simplicity's sake.\footnote{Ignoring \texttt{AJMP} and its analogue \texttt{ACALL} is not idiosyncratic. The Small Device C Compiler (SDCC)~\cite{sdcc:2011}, the leading open source C compiler for the MCS-51, also seemingly does not produce \texttt{AJMP} and \texttt{ACALL} instructions. Their utility in a modern context remains unclear.} 88 Each of these three instructions expects arguments in different sizes and behaves in different ways. 89 \texttt{SJMP} may only perform a `local jump' whereas \texttt{LJMP} may jump to any address in the MCS-51's memory space. 85 This is not simple and requires the use of optimisations. 86 87 For example, the MCS-51 features three conditional jump instructions: \texttt{LJMP} and \texttt{SJMP}---`long jump' and `short jump' respectively---and an 11-bit oddity of the MCS-51, \texttt{AJMP}. 88 Each of these three instructions expects arguments in different sizes and behaves in different ways: \texttt{SJMP} may only perform a `local jump'; \texttt{LJMP} may jump to any address in the MCS-51's memory space and \texttt{AJMP} may jump to any address in the current memory page. 90 89 Consequently, the size of each opcode is different, and to squeeze as much code as possible into the MCS-51's limited code memory, the smallest possible opcode should be selected. 91 90 … … 98 97 However, this assembly process is not trivial, for numerous reasons. 99 98 For example, our conditional jumps to labels behave differently from their machine code counterparts. 100 At the machine code level, conditional jumps may only jump to a relative offset, expressed in a byte, of the current program counter, limiting their range.99 At the machine code level, all conditional jumps are `short', limiting their range. 101 100 However, at the assembly level, conditional jumps may jump to a label that appears anywhere in the program, significantly liberalising the use of conditional jumps and further simplifying the design of the CerCo compiler. 102 101 103 102 Further, trying to na\"ively relate assembly programs with their machine code counterparts simply does not work. 104 Machine code programs that fetch from co de memory and programs that combine the program counter with constant shifts do not make sense at the assembly level.103 Machine code programs that fetch from constant addresses in code memory or programs that combine the program counter with constant shifts do not make sense at the assembly level, since the position of instructions in code memory will be known only after assembly and optimisation. 105 104 More generally, memory addresses can only be compared with other memory addresses. 106 105 However, checking that memory addresses are only compared against each other at the assembly level is in fact undecidable. 107 In short, the full preservation of the semantics of the two languages is impossible. 108 109 Yet more complications are added by the peculiarities of the CerCo project itself. 110 As mentioned, the CerCo consortium is in the business of constructing a verified compiler for the C programming language. 111 However, unlike CompCert~\cite{compcert:2011,leroy:formal:2009,leroy:formally:2009}---which currently represents the state of the art for `industrial grade' verified compilers---CerCo considers not just the \emph{intensional correctness} of the compiler, but also its \emph{extensional correctness}. 106 In short, we come to the shocking realisation that, with optimisations, the full preservation of the semantics of the two languages is impossible. 107 We believe that this revelation is significant for large formalisation projects that assume the existence of a correct assembler. 108 Projects in this class include both the recent CompCert~\cite{compcert:2011,leroy:formal:2009} and seL4 formalisations~\cite{klein:sel4:2009}. 109 110 Yet, the situation is even more complex than having to expand pseudoinstructions correctly. 111 In particular, when formalising the assembler, we must make sure that the assembly process does not change the timing characteristics of an assembly program for two reasons. 112 First, the semantics of some functions of the MCS-51, notably I/O, depend on the microprocessor's clock. 113 Changing how long a particular program takes to execute can affect the semantics of a program. 114 This is undesirable. 115 116 Second, as mentioned, the CerCo consortium is in the business of constructing a verified compiler for the C programming language. 117 However, unlike CompCert~\cite{compcert:2011,leroy:formal:2009,leroy:formally:2009}---which currently represents the state of the art for `industrial grade' verified compilers---CerCo considers not just the \emph{extensional correctness} of the compiler, but also its \emph{intensional correctness}. 112 118 That is, CompCert focusses solely on the preservation of the \emph{meaning} of a program during the compilation process, guaranteeing that the program's meaning does not change as it is gradually transformed into assembly code. 113 119 However in any realistic compiler (even the CompCert compiler!) there is no guarantee that the program's time properties are preserved during the compilation process; a compiler's `optimisations' could, in theory, even conspire to degrade the concrete complexity of certain classes of programs. 114 120 CerCo aims to expand the current state of the art by producing a compiler where this temporal degradation is guaranteed not to happen. 121 Moreover, CerCo's approach lifts a program's timing information to the source (C language) level, wherein the programmer can reason about a program's intensional properties by directly examining the source code that they write. 115 122 116 123 In order to achieve this CerCo imposes a cost model on programs, or more specifically, on simple blocks of instructions. … … 141 148 Perhaps more insidious, the number of cycles needed to execute the instructions in the two branches of a translated conditional jump may be different. 142 149 Depending on the particular MCS-51 derivative at hand, an \texttt{SJMP} could in theory take a different number of clock cycles to execute than an \texttt{LJMP}. 143 These issues must also be dealt with in order to prove that the translation pass preserves the concrete complexity of the code .150 These issues must also be dealt with in order to prove that the translation pass preserves the concrete complexity of the code, and that the semantics of a program using the MCS-51's I/O facilities does not change. 144 151 145 152 The question remains: how do we decide whether to expand a jump into an \texttt{SJMP} or an \texttt{LJMP}? … … 181 188 \label{sect.matita} 182 189 183 Matita is a proof assistant based on the (Co)inductive Calculus ofConstructions~\cite{asperti:user:2007}.190 Matita is a proof assistant based on the Calculus of (Co)inductive Constructions~\cite{asperti:user:2007}. 184 191 For those familiar with Coq, Matita's syntax and mode of operation should be entirely familiar. 185 We take time here to explain one of Matita's syntactic idiosyncracies, however.186 The use of `$\mathtt{?}$' or `$\mathtt{\ldots}$' in an argument position denotes a t ype or types to be inferred automatically by unificationrespectively.192 However, we take time here to explain one of Matita's syntactic idiosyncrasies. 193 The use of `$\mathtt{?}$' or `$\mathtt{\ldots}$' in an argument position denotes a term or terms to be inferred automatically by unification, respectively. 187 194 The use of `$\mathtt{?}$' in the body of a definition, lemma or axiom denotes an incomplete term that is to be closed, by hand, using tactics. 188 195
Note: See TracChangeset
for help on using the changeset viewer.