# Changeset 2348

Timestamp: Sep 26, 2012, 3:43:36 PM (9 years ago)
Message: Edited up to Sec. 3.3
File: 1 edited

The formalisation is available at~\url{http://cerco.cs.unibo.it}.

In Section~\ref{subsect.machine.code.semantics} we sketch an operational semantics (a realistic and efficient emulator) for the MCS-51. We also introduce a syntax for decoded instructions that will be reused for the assembly language. In Section~\ref{subsect.assembly.code.semantics} we describe the assembly language and its operational semantics. The latter is parametric in the cost model that will be induced by the assembler. It reuses the semantics of the machine code on all `real' (i.e. non-pseudo) instructions. Branch displacement policies are introduced in Section~\ref{subsect.the.assembler}, where we also describe the assembler as a function over policies, as previously described.

The proof of correctness of the assembler consists in showing that the object code given in output, together with a cost model for the source program, simulates the source program executed using that cost model. The proof can be divided into two main lemmas. The first is correctness with respect to fetching, described in Section~\ref{subsect.total.correctness.of.the.assembler}. It roughly states that one step of fetching at the assembly level that returns the decoded instruction $I$ is simulated by $n$ steps of fetching at the object level that return the instructions $J_1,\ldots,J_n$, where $J_1,\ldots,J_n$ is, amongst the possible expansions of $I$, the one picked by the policy.
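Informally, and using notation of our own rather than that of the formal Matita statement, the shape of this first lemma can be pictured as follows, where $\sigma$ maps assembly-level program counters to object-level ones, $\mathtt{fetch\_pseudo}$ and $\mathtt{fetch}$ are illustrative names for fetching at the two levels, and $\mathit{expand}$ denotes the expansion of a pseudoinstruction selected by the policy:
\[
\mathtt{fetch\_pseudo}(\mathit{ppc}) = I
\quad\Longrightarrow\quad
\mathtt{fetch}^{\,n}(\sigma(\mathit{ppc})) = J_1,\ldots,J_n
\quad\mbox{where}\quad
J_1,\ldots,J_n = \mathit{expand}(I)
\]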
The second lemma shows that $J_1,\ldots,J_n$ simulates $I$, but only if $I$ is well-behaved, i.e. it manipulates addresses in ways that are anticipated in the correctness proof. To keep track of well-behaved address manipulations, we couple the assembly status with a map that records where addresses are currently stored, in memory or in the processor's accumulators. We then introduce a dynamic checking function that inspects the assembly status and this map to determine whether an operation is well behaved; an affirmative answer is the pre-condition of the lemma. The second lemma is detailed in Section~\ref{subsect.total.correctness.for.well.behaved.assembly.programs}, where we also establish \emph{total correctness} of our assembler as a composition of the two lemmas: programs that are well behaved when executed under the cost model induced by the compiler are correctly simulated by the compiled code.

% ---------------------------------------------------------------------------- %
\begin{lstlisting}
definition execute_1: $\forall$cm. Status cm $\rightarrow$ Status cm
\end{lstlisting}
Here \texttt{Status} is parameterised by \texttt{cm}, our trie of bytes representing code memory.
%The function \texttt{execute} allows one to execute an arbitrary, but fixed (due to Matita's normalisation requirement) number of steps of a program.
The function \texttt{execute\_1} closely matches the operation of the MCS-51 hardware, as described by the manufacturer's (Siemens) data sheet~\cite{siemens:2011}. We first fetch, using the program counter, the first byte of the instruction to be executed from code memory, decode the resulting opcode, and fetch further bytes as necessary. Decoded instructions are represented as an inductive type, where $\llbracket - \rrbracket$ denotes a fixed-length vector:

These are jumps and calls to absolute addresses, which we do not wish to allow at the assembly level. Execution of pseudoinstructions is an endofunction on \texttt{PseudoStatus}. Both \texttt{Status} and \texttt{PseudoStatus} are instances of a more general type, parameterised over the representation of code memory. The more general type is crucial for sharing the majority of the semantics of the two languages.

The type of \texttt{execute\_1\_pseudo\_instruction} is interesting. Here our representation of code memory \texttt{cm} is no longer a trie of bytes, but a list of pseudoinstructions. We take in a function that maps program counters (at the assembly level) to pairs of natural numbers representing the number of clock ticks that the pseudoinstruction needs to execute, post expansion; a rough sketch of this interface is given below.
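The following OCaml-flavoured interface is a purely illustrative sketch of the structures just described. The names are ours; the Matita development is dependently typed (\texttt{Status} and \texttt{PseudoStatus} are parameterised by the code memory itself, not merely by its representation type), uses bitvectors rather than machine integers, and threads the bound proofs mentioned below, all of which are elided here.
\begin{lstlisting}
(* OCaml-flavoured sketch (.mli style); all names are illustrative. *)
type byte                     (* 8-bit machine words                       *)
type byte_trie                (* trie of bytes: object-level code memory   *)
type pseudo_instruction       (* assembly-level instructions, incl. labels *)
type word                     (* assembly-level program counters           *)

(* Processor state, parameterised over the representation of code memory.
   Status and PseudoStatus are two instances of this single type, which is
   what lets the two languages share most of their semantics.             *)
type 'cm pre_status
type status        = byte_trie pre_status
type pseudo_status = pseudo_instruction list pre_status

(* A costing maps an assembly-level program counter to the pair of clock
   tick counts that the expansion of the pseudoinstruction at that address
   may take; the Matita version also expects a proof that the counter lies
   within the bounds of the pseudoprogram.                                *)
type costing = word -> int * int

(* One step of execution at each level; the second is parametric in the
   costing induced by the expansion strategy.                             *)
val execute_1                    : status -> status
val execute_1_pseudo_instruction : costing -> pseudo_status -> pseudo_status
\end{lstlisting}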
This mapping from program counters to pairs of clock ticks is called a \emph{costing}, and the costing is induced by whatever strategy we use to expand pseudoinstructions. If we change how we expand conditional jumps to labels, for instance, then the costing needs to change; hence \texttt{execute\_1\_pseudo\_instruction}'s parametricity in the costing. The costing function expects a proof, as its second argument, stating that the program counter falls within the bounds of the pseudoprogram. A similar proof is required for the pseudo-program counter passed to the \texttt{execute\_1\_pseudo\_instruction} function. The costing returns \emph{pairs} of natural numbers because, in the case of expanding conditional jumps to labels, the expansion of the `true branch' and `false branch' may differ in execution time.
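To illustrate why a \emph{pair} is returned, consider the usual trick of expanding a conditional jump to a distant label into an inverted conditional jump over an unconditional long jump. The following OCaml-flavoured fragment is again only a sketch: the tick parameters are placeholders rather than the MCS-51 timings used in the formalisation, and for simplicity we assume the inverted jump costs the same whether or not it is taken.
\begin{lstlisting}
(* A conditional pseudo-jump to a distant label may be expanded as
     JNZ skip    ; inverted condition
     LJMP label  ; executed only when the original branch is taken
   skip:
   so the two branches of the original pseudoinstruction cost differently. *)
let conditional_jump_ticks ~jnz_ticks ~ljmp_ticks : int * int =
  ( jnz_ticks + ljmp_ticks,  (* `true' branch: falls through JNZ, runs LJMP *)
    jnz_ticks )              (* `false' branch: JNZ skips over the LJMP     *)
\end{lstlisting}
Whether this expansion is needed at all depends on the branch displacement policy, which is why a change of policy forces the costing to change.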