# Changeset 2327 for src/ASM/CPP2012-asm

Ignore:
Timestamp:
Sep 10, 2012, 4:45:44 PM (8 years ago)
Message:

Fixed typos in paper highlighted by referees. More substantial changes will have to wait due to us moving house.

File:
1 edited

### Legend:

Unmodified
 r2095 This assembler constitutes a major component of the EU's CerCo project. The efficient expansion of pseudoinstructions---particularly jumps---into MCS-51 machine instructions is complex. The efficient expansion of pseudoinstructions---namely jumps---into MCS-51 machine instructions is complex. We isolate the decision making over how jumps should be expanded from the expansion process itself as much as possible using policies'. This makes the proof of correctness for the assembler significantly more straightforward. Abandoning this attempt, we instead split the policy'---the decision over how any particular jump should be expanded---from the implementation that actually expands assembly programs into machine code. Assuming the existence of a correct policy, we proved the implementation of the assembler correct. Assuming the existence of a correct policy, we prove the implementation of the assembler correct. Further, we proved that the assembler fails to assemble an assembly program if and only if a correct policy does not exist. This is achieved by means of dependent types: the assembly function is total over a program, a policy and the proof that the policy is correct for that program. The rest of this paper is a detailed description of our proof that is, in part, still a work in progress. % ---------------------------------------------------------------------------- % % SECTION                                                                      % % ---------------------------------------------------------------------------- % \subsection{Overview of the paper} \label{subsect.overview.of.the.paper} We provide the reader with a brief roadmap' for the rest of the paper. In Section~\ref{sect.matita} we provide a brief overview of the Matita proof assistant for the unfamiliar reader. In Section~\ref{sect.the.proof} we discuss the design and implementation of the proof proper. Matita is a proof assistant based on a variant of the Calculus of (Co)inductive Constructions~\cite{asperti:user:2007}. In particular, it features dependent types that we heavily exploit in the formalisation. In particular, it features dependent types that we exploit in the formalisation. The syntax of the statements and definitions in the paper should be self-explanatory, at least to those exposed to dependent type theory. We only remark the use of of $\mathtt{?}$' or $\mathtt{\ldots}$' for omitting single terms or sequences of terms to be inferred automatically by the system, respectively. We only remark that the use of $\mathtt{?}$' or $\mathtt{\ldots}$' for omitting single terms or sequences of terms to be inferred automatically by the system, respectively. Those that are not inferred are left to the user as proof obligations. Pairs are denoted with angular brackets, $\langle-, -\rangle$. Our emulator centres around a \texttt{Status} record, describing the microprocessor's state. This record contains fields corresponding to the microprocessor's program counter, registers, and so on. This record contains fields corresponding to the microprocessor's program counter, registers, stack pointer, special function registers, and so on. At the machine code level, code memory is implemented as a compact trie of bytes addressed by the program counter. We parameterise \texttt{Status} records by this representation as a few technical tasks manipulating statuses are made simpler using this approach, as well as permitting a modicum of abstraction. The function \texttt{assembly\_1\_pseudo\_instruction} used in the body of the paper is essentially the composition of the two passes. The branch displacement problems consists of the problem of expanding pseudojumps into their concrete counterparts, preferably as efficiently as possible. The branch displacement problem refers to the task of expanding pseudojumps into their concrete counterparts, preferably as efficiently as possible. For instance, the MCS-51 features three unconditional jump instructions: \texttt{LJMP} and \texttt{SJMP}---long jump' and short jump' respectively---and an 11-bit oddity of the MCS-51, \texttt{AJMP}. Each of these three instructions expects arguments in different sizes and behaves in markedly different ways: \texttt{SJMP} may only perform a local jump'; \texttt{LJMP} may jump to any address in the MCS-51's memory space and \texttt{AJMP} may jump to any address in the current memory page. nth j a = nth (add $\ldots$ (sigma ppc) (bitvector_of_nat ? j))) assembled \end{lstlisting} In plain words, the type of assembly states the following. In plain words, the type of \texttt{assembly} states the following. Suppose we are given a policy that is correct for the program we are assembling, and suppose the program to be assembled is fully addressable by a 16-bit word. Then if we fetch from the pseudo program counter \texttt{ppc} we get a pseudoinstruction \texttt{pi} and a new pseudo program counter \texttt{ppc}. \end{lstlisting} Here we use $\pi_1 \ldots$ to eject out the existential witness from the Russell-typed function \texttt{assembly}. Here we use $\pi_1 \ldots$ to project the existential witness from the Russell-typed function \texttt{assembly}. We read \texttt{fetch\_assembly\_pseudo2} as follows. The fetched sequence corresponds to the expansion, according to \texttt{sigma}, of the pseudoinstruction. At first, the lemmas appears to immediately imply the correctness of the assembler. At first, the lemma appears to immediately imply the correctness of the assembler. However, this property is \emph{not} strong enough to establish that the semantics of an assembly program has been preserved by the assembly process since it does not establish the correspondence between the semantics of a pseudoinstruction and that of its expansion. In particular, the two semantics differ on instructions that \emph{could} directly manipulate program addresses. We believe that this approach is more flexible when compared to the traditional approach, as in principle it allows us to introduce some permitted \emph{benign} manipulations of addresses that the traditional approach, using opaque addresses, cannot handle, therefore expanding the set of input programs that can be assembled correctly. This approach, of using real addresses coupled with a weak, dynamic typing system sitting atop of memory, is similar to one taken by Huth \emph{et al}~\cite{tuch:types:2007}, for reasoning about low-level C code. This approach, of using real addresses coupled with a weak, dynamic typing system sitting atop of memory, is similar to one taken by Tuch \emph{et al}~\cite{tuch:types:2007}, for reasoning about low-level C code. Our analogue of the semantic function above is then merely a wrapper around the function that implements the semantics of machine code, paired with a function that keeps track of addresses. Note here that all addresses are 16 bit words, but are stored (and manipulated) as 8 bit bytes. All \texttt{MOV} instructions in the MCS-51 must use the accumulator \texttt{A} as an intermediary, moving a byte at a time. The second component of \texttt{internal\_pseudo\_address\_map} therefore states whether the accumulator currently holds a piece of an address, and if so, whether it is the upper or lower byte of the address (using the boolean flag) complete with the corresponding, source address in full. The second component of \texttt{internal\_pseudo\_address\_map} therefore states whether the accumulator currently holds a piece of an address, and if so, whether it is the upper or lower byte of the address (using the boolean flag) complete with the corresponding source address in full. The first component, on the other hand, performs a similar task for the rest of external RAM. Again, we use a boolean flag to describe whether a byte is the upper or lower component of a 16-bit address. Our formalisation exploits dependent types in different ways and for multiple purposes. The first purpose is to reduce potential errors in the formalisation of the microprocessor. In particular, dependent types are used to constraint the size of bitvectors and tries that represent memory quantities and memory areas respectively. They are also used to simulate polymorphic variants in Matita, in order to provide precise typings to various functions expecting only a subset of all possible addressing modes than the MCS-51 offers. In particular, dependent types are used to constrain the size of bitvectors and tries that represent memory quantities and memory areas respectively. They are also used to simulate polymorphic variants in Matita, in order to provide precise typings to various functions expecting only a subset of all possible addressing modes that the MCS-51 offers. Polymorphic variants nicely capture the absolutely unorthogonal instruction set of the MCS-51 where every opcode must accept its own subset of the 11 addressing mode of the processor.