# Changeset 995

Ignore:
Timestamp:
Jun 20, 2011, 3:27:54 PM (9 years ago)
Message:

changes

File:
1 edited

### Legend:

Unmodified
 r992 \title{On the correctness of an assembler for the Intel MCS-51 microprocessor} \author{Dominic P. Mulligan \and Claudio Sacerdoti Coen} \author{Dominic P. Mulligan \and Claudio Sacerdoti Coen\thanks{The project CerCo acknowledges the financial support of the Future and Emerging Technologies (FET) programme within the Seventh Framework Programme for Research of the European Commission, under FET-Open grant number: 243881}} \institute{Dipartimento di Scienze dell'Informazione, Universit\'a di Bologna} This makes the proof of correctness for the assembler significantly more straightforward. We prove, under the assumption of the existence of a correct policy, that the assembly process preserves the semantics of assembly programs. We prove, under the assumption of the existence of a correct policy, that the assembly process preserves the semantics of a subset of assembly programs. Correct policies fail to exist only in a limited number of pathological circumstances. Surprisingly, it is impossible for an optimising assembler to preserve the semantics of every assembly program. \end{abstract} In particular, the MCS-51 features relatively miniscule memory spaces (including read-only code memory, stack and internal/external random access memory) by modern standards. As a result our compiler, to have any sort of hope of successfully compiling realistic C programs, ought to produce tight' machine code. This is not simple. We here focus on a single issue in the MCS-51's instruction set: unconditional jumps. The MCS-51 features three conditional jump instructions: \texttt{LJMP} and \texttt{SJMP}---long jump' and short jump' respectively---and an 11-bit oddity of the MCS-51, \texttt{AJMP}, that the prototype CerCo compiler~\cite{cerco-report-code:2011} ignores for simplicity's sake.\footnote{Ignoring \texttt{AJMP} and its analogue \texttt{ACALL} is not idiosyncratic.  The Small Device C Compiler (SDCC)~\cite{sdcc:2011}, the leading open source C compiler for the MCS-51, also seemingly does not produce \texttt{AJMP} and \texttt{ACALL} instructions.  Their utility in a modern context remains unclear.} Each of these three instructions expects arguments in different sizes and behaves in different ways. \texttt{SJMP} may only perform a local jump' whereas \texttt{LJMP} may jump to any address in the MCS-51's memory space. This is not simple and requires the use of optimisations. For example, the MCS-51 features three conditional jump instructions: \texttt{LJMP} and \texttt{SJMP}---long jump' and short jump' respectively---and an 11-bit oddity of the MCS-51, \texttt{AJMP}. Each of these three instructions expects arguments in different sizes and behaves in different ways: \texttt{SJMP} may only perform a local jump'; \texttt{LJMP} may jump to any address in the MCS-51's memory space and \texttt{AJMP} may jump to any address in the current memory page. Consequently, the size of each opcode is different, and to squeeze as much code as possible into the MCS-51's limited code memory, the smallest possible opcode should be selected. However, this assembly process is not trivial, for numerous reasons. For example, our conditional jumps to labels behave differently from their machine code counterparts. At the machine code level, conditional jumps may only jump to a relative offset, expressed in a byte, of the current program counter, limiting their range. At the machine code level, all conditional jumps are short', limiting their range. However, at the assembly level, conditional jumps may jump to a label that appears anywhere in the program, significantly liberalising the use of conditional jumps and further simplifying the design of the CerCo compiler. Further, trying to na\"ively relate assembly programs with their machine code counterparts simply does not work. Machine code programs that fetch from code memory and programs that combine the program counter with constant shifts do not make sense at the assembly level. Machine code programs that fetch from constant addresses in code memory or programs that combine the program counter with constant shifts do not make sense at the assembly level, since the position of instructions in code memory will be known only after assembly and optimisation. More generally, memory addresses can only be compared with other memory addresses. However, checking that memory addresses are only compared against each other at the assembly level is in fact undecidable. In short, the full preservation of the semantics of the two languages is impossible. Yet more complications are added by the peculiarities of the CerCo project itself. As mentioned, the CerCo consortium is in the business of constructing a verified compiler for the C programming language. However, unlike CompCert~\cite{compcert:2011,leroy:formal:2009,leroy:formally:2009}---which currently represents the state of the art for industrial grade' verified compilers---CerCo considers not just the \emph{intensional correctness} of the compiler, but also its \emph{extensional correctness}. In short, we come to the shocking realisation that, with optimisations, the full preservation of the semantics of the two languages is impossible. We believe that this revelation is significant for large formalisation projects that assume the existence of a correct assembler. Projects in this class include both the recent CompCert~\cite{compcert:2011,leroy:formal:2009} and seL4 formalisations~\cite{klein:sel4:2009}. Yet, the situation is even more complex than having to expand pseudoinstructions correctly. In particular, when formalising the assembler, we must make sure that the assembly process does not change the timing characteristics of an assembly program for two reasons. First, the semantics of some functions of the MCS-51, notably I/O, depend on the microprocessor's clock. Changing how long a particular program takes to execute can affect the semantics of a program. This is undesirable. Second, as mentioned, the CerCo consortium is in the business of constructing a verified compiler for the C programming language. However, unlike CompCert~\cite{compcert:2011,leroy:formal:2009,leroy:formally:2009}---which currently represents the state of the art for industrial grade' verified compilers---CerCo considers not just the \emph{extensional correctness} of the compiler, but also its \emph{intensional correctness}. That is, CompCert focusses solely on the preservation of the \emph{meaning} of a program during the compilation process, guaranteeing that the program's meaning does not change as it is gradually transformed into assembly code. However in any realistic compiler (even the CompCert compiler!) there is no guarantee that the program's time properties are preserved during the compilation process; a compiler's optimisations' could, in theory, even conspire to degrade the concrete complexity of certain classes of programs. CerCo aims to expand the current state of the art by producing a compiler where this temporal degradation is guaranteed not to happen. Moreover, CerCo's approach lifts a program's timing information to the source (C language) level, wherein the programmer can reason about a program's intensional properties by directly examining the source code that they write. In order to achieve this CerCo imposes a cost model on programs, or more specifically, on simple blocks of instructions. Perhaps more insidious, the number of cycles needed to execute the instructions in the two branches of a translated conditional jump may be different. Depending on the particular MCS-51 derivative at hand, an \texttt{SJMP} could in theory take a different number of clock cycles to execute than an \texttt{LJMP}. These issues must also be dealt with in order to prove that the translation pass preserves the concrete complexity of the code. These issues must also be dealt with in order to prove that the translation pass preserves the concrete complexity of the code, and that the semantics of a program using the MCS-51's I/O facilities does not change. The question remains: how do we decide whether to expand a jump into an \texttt{SJMP} or an \texttt{LJMP}? \label{sect.matita} Matita is a proof assistant based on the (Co)inductive Calculus of Constructions~\cite{asperti:user:2007}. Matita is a proof assistant based on the Calculus of (Co)inductive Constructions~\cite{asperti:user:2007}. For those familiar with Coq, Matita's syntax and mode of operation should be entirely familiar. We take time here to explain one of Matita's syntactic idiosyncracies, however. The use of $\mathtt{?}$' or $\mathtt{\ldots}$' in an argument position denotes a type or types to be inferred automatically by unification respectively. However, we take time here to explain one of Matita's syntactic idiosyncrasies. The use of $\mathtt{?}$' or $\mathtt{\ldots}$' in an argument position denotes a term or terms to be inferred automatically by unification, respectively. The use of $\mathtt{?}$' in the body of a definition, lemma or axiom denotes an incomplete term that is to be closed, by hand, using tactics.