# Changeset 2368 for Papers/cpp-asm-2012/cpp-2012-asm.tex

Ignore:
Timestamp:
Sep 28, 2012, 11:41:39 AM (7 years ago)
Message:

3.4 patched.

File:
1 edited

### Legend:

Unmodified
 r2367 gets you back to where you started. Lemma \texttt{fetch\_assembly\_pseudo} is obtained by composition of \texttt{expand\_pseudo\_instruction} and \texttt{assembly\_1\_pseudoinstruction}: \XXX they do not exist any longer! Remembering that \texttt{assembly\_1\_pseudo\_instruction} is the composition of \texttt{assembly1} with \texttt{expand\_pseudo\_instruction}, we can lift the previous result from instructions (already expanded) to pseudoinstructions (to be expanded): \begin{lstlisting} lemma fetch_assembly_pseudo: $\forall$program: pseudo_assembly_program. $\forall$policy. $\forall$ppc. $\forall$code_memory. $\forall$policy,ppc,code_memory. let $\langle$preamble, instr_list$\rangle$ := program in let pi := $\pi_1$ (fetch_pseudo_instruction instr_list ppc) in let pc := policy ppc in let instrs := expand_pseudo_instructio policy ppc pi in let instrs := expand_pseudo_instruction policy ppc pi in let $\langle$l, a$\rangle$ := assembly_1_pseudoinstruction policy ppc pi in let pc_plus_len := pc + l in The function \texttt{fetch\_many} fetches multiple machine code instructions from code memory and performs some routine checks. Intuitively, Lemma \texttt{fetch\_assembly\_pseudo} can be read as follows. Suppose we expand the pseudoinstruction at \texttt{ppc} with the policy, obtaining the list of machine code instructions \texttt{instrs}. Suppose we also assemble the pseudoinstruction at \texttt{ppc} to obtain \texttt{a}, a list of bytes. Then, we check with \texttt{fetch\_many} that the number of machine instructions that were fetched matches the number of instruction that \texttt{expand\_pseudo\_instruction} expanded. The final lemma in this series is \texttt{fetch\_assembly\_pseudo2} that combines the Lemma \texttt{fetch\_assembly\_pseudo} with the correctness of the functions that load object code into the processor's memory: Intuitively, Lemma \texttt{fetch\_assembly\_pseudo} says that expanding a pseudoinstruction into $n$ instructions, encoding the instructions and immediately fetching $n$ instructions back yield exactly the expansion. Combining \texttt{assembly\_ok} with the previos lemma and a proof of correctness of loading object code in memory, we finally get correctness of the assembler w.r.t. fetching: \begin{lstlisting} lemma fetch_assembly_pseudo2: $\forall$program. $\mid$snd program$\mid$ $\leq$ $2^{16}$ $\rightarrow$ $\forall$policy. policy is correct for program $\rightarrow$ $\forall$program.  $\mid$snd program$\mid$ $\leq$ $2^{16}$ $\rightarrow$ $\forall$policy.  policy is correct for program $\rightarrow$ $\forall$ppc. ppc < $\mid$snd program$\mid$ $\rightarrow$ let $\langle$labels, costs$\rangle$ := create_label_cost_map program in let $\langle$assembled, costs'$\rangle$ := $\pi_1$ (assembly program policy) in let cmem := load_code_memory assembled in Suppose we are given an assembly program which can be addressed by a 16-bit word and a policy that is correct for this program. Suppose we are able to successfully assemble an assembly program using \texttt{assembly} and produce a code memory, \texttt{cmem}. Then, fetching a pseudoinstruction from the pseudo-code memory at address \texttt{ppc} corresponds to fetching a sequence of instructions from the real code memory using \texttt{policy} to expand pseudoinstructions. The fetched sequence corresponds to the expansion, according to the policy, of the pseudoinstruction. At first, the lemma appears to immediately imply the correctness of the assembler, but this property is \emph{not} strong enough to establish that the semantics of an assembly program has been preserved by the assembly process since it does not establish the correspondence between the semantics of a pseudoinstruction and that of its expansion. In particular, the two semantics differ on instructions that \emph{could} directly manipulate program addresses. Then, fetching a pseudoinstruction from the pseudo-code memory stored in the interval $[ppc,newppc]$ corresponds to fetching a sequence of instructions from the real code memory, stored in the interval $[policy(ppc),policy(ppc+1)]$. The correspondence is precise: the fetched instructions are exactly those obtained expanding the pseudoinstruction according to policy. In order to complete the proof of correctness of the assembler, we need to prove that each pseudoinstruction is simulated by the execution of its expansion (correctness w.r.t. execution). In general this is not the case when the instruction freely manipulates program addresses. Characterizing well behaved programs and proving correctness w.r.t. expansion is the topic of the next Section. % ---------------------------------------------------------------------------- % \label{subsect.total.correctness.for.well.behaved.assembly.programs} The traditional approach to verifying the correctness of an assembler is to treat memory addresses as opaque structures that cannot be modified. The traditional approach to verifying the correctness of an assembler is to restrict the semantics of assembly code to treat memory addresses as opaque (symbolic) structures that cannot be modified. Memory is represented as a map from opaque addresses to the disjoint union of data and opaque addresses---addresses are kept opaque to prevent their possible `semantics breaking' manipulation by assembly programs: \begin{displaymath} \mathtt{Mem} : \mathtt{Addr} \rightarrow \mathtt{Bytes} + \mathtt{Addr} \qquad \llbracket - \rrbracket : \mathtt{Instr} \rightarrow \mathtt{Mem} \rightarrow \mathtt{option\ Mem} \end{displaymath} The semantics of a pseudoinstruction, $\llbracket - \rrbracket$, is given as a possibly failing function from pseudoinstructions and memory spaces to new memory spaces. The semantic function proceeds by case analysis over the operands of a given instruction, failing if either operand is an opaque address, or otherwise succeeding, updating memory. Similarly, registers may contain either data or opaque addresses. The semantics of a pseudoinstruction, $\llbracket - \rrbracket$, is given as a possibly failing function from pseudoinstructions and memory spaces (pseudostatuses to be more precise) to new memory spaces. The semantic function proceeds by case analysis over the operands of a given instruction, failing if either operand is an opaque address and the operation is meaningless on it. \begin{gather*} \llbracket \mathtt{ADD\ @A1\ @A2} \rrbracket^\mathtt{M} = \begin{cases} \llbracket \mathtt{MUL\ @A1\ @A2} \rrbracket^\mathtt{M} = \begin{cases} \mathtt{Byte\ b1},\ \mathtt{Byte\ b2} & \rightarrow \mathtt{Some}(\mathtt{M}\ \text{with}\ \mathtt{b1} + \mathtt{b2}) \\ -,\ \mathtt{Addr\ a} & \rightarrow \mathtt{None} \\