Index: /src/ASM/CPP2011/cpp2011.tex
===================================================================
 /src/ASM/CPP2011/cpp2011.tex (revision 982)
+++ /src/ASM/CPP2011/cpp2011.tex (revision 983)
@@ 46,7 +46,13 @@
\begin{abstract}
We consider the formalisation of an assembler for the Intel MCS51 8bit microprocessor in the Matita proof assistant.
+We consider the formalisation of an assembler for Intel MCS51 assembly language in the Matita proof assistant.
This formalisation forms a major component of the EUfunded CerCo project, concering the construction and formalisation of a concrete complexity preserving compiler for a large subset of the C programming language.
...
+
+The efficient expansion of pseudoinstructionsparticularly jumpsinto MCS51 machine instructions is complex.
+We employ a strategy, involving the use of `policies', that separates the decision making over how jumps should be expanded from the expansion process itself.
+This makes the proof of correctness for the assembler significantly more straightforward.
+
+We prove, under the assumption of the existence of a correct policy, that the assembly process preserves the semantics of assembly programs.
+Correct policies fail to exist only in a limited number of pathological circumstances.
\end{abstract}
@@ 294,25 +300,4 @@
For this, we use the \texttt{sigma} machinery defined at the end of Subsection~\ref{subsect.policies}.
CSC: no options using policy
\begin{lstlisting}
lemma fetch_assembly_pseudo2: $\forall$program, assembled, costs, labels.
 Some $\ldots$ $\langle$labels,costs$\rangle$ = build_maps program $\rightarrow$
 Some $\ldots$ $\langle$assembled,costs$\rangle$ = assembly program $\rightarrow$ $\forall$ppc.
 let code_memory := load_code_memory assembled in
 let preamble := $\pi_1$ program in
 let data_labels := construct_datalabels preamble in
 let lk_labels :=
 λx. sigma program (address_of_word_labels_code_mem ($\pi_2$ program) x) in
 let lk_dlabels := λx. lookup ? ? x data_labels (zero ?) in
 let expansion := jump_expansion ppc program in
 let $\langle$pi,newppc$\rangle$ := fetch_pseudo_instruction ($\pi_2$ program) ppc in
 let ppc' := sigma program ppc in
 let newppc' := sigma program newppc in
 let instructions' :=
 expand_pseudo_instruction lk_labels lk_dlabels ppc' expansion pi in
 let fetched := fetch_many code_memory newppc' ppc' instructions in
 $\exists$instructions. Some ? instructions = instructions' $\wedge$ fetched.
\end{lstlisting}

We expand pseudoinstructions using the function \texttt{expand\_pseudo\_instruction}.
This function accepts a `policy decision'an element of type \texttt{jump\_length}that is used when expanding a \texttt{Call}, \texttt{Jmp} or conditional jump to a label into the correct machine instruction.
@@ 327,4 +312,5 @@
For instance, if the policy decision dictates that we should expand a \texttt{Call} pseudoinstruction into a `short jump', then we fail, as the MCS51's instruction set only features instructions \texttt{ACALL} and \texttt{LCALL}.
+% dpm todo
\begin{lstlisting}
axiom assembly_ok: ∀program,assembled,costs,labels.
@@ 348,5 +334,19 @@
\end{lstlisting}
Lemma \texttt{fetch\_assembly\_pseudo} establishes the correctness of expanding and then assembling pseudoinstructions:
+% dpm todo
+\begin{lstlisting}
+theorem fetch_assembly: $\forall$pc, i, cmem, assembled.
+ assembled = assembly1 i $\rightarrow$
+ let len := length $\ldots$ assembled in
+ encoding_check cmem pc (pc + len) assembled $\rightarrow$
+ let fetched := fetch code_memory (bitvector_of_nat $\ldots$ pc) in
+ let $\langle$instr_pc, ticks$\rangle$ := fetched in
+ let $\langle$instr, pc'$\rangle$ := instr_pc in
+ (eq_instruction instr i $\wedge$
+ eqb ticks (ticks_of_instruction instr) $\wedge$
+ eq_bv $\ldots$ pc' (pc + len)) = true.
+\end{lstlisting}
+
+Lemma \texttt{fetch\_assembly\_pseudo} establishes a basic property between \texttt{expand\_pseudo\_instruction} and \texttt{assembly\_1\_pseudoinstruction}:
\begin{lstlisting}
lemma fetch_assembly_pseudo: $\forall$program, ppc, lk_labels, lk_dlabels.
@@ 366,21 +366,36 @@
The function \texttt{fetch\_many} fetches multiple machine code instructions from code memory and performs some routine checks.
Intuitively, lemma \texttt{fetch\_assembly\_pseudo} can be read as follows.
+Intuitively, Lemma \texttt{fetch\_assembly\_pseudo} can be read as follows.
Suppose our policy \texttt{jump\_expansion} dictates that the pseudoinstruction indexed by the pseudo program counter \texttt{ppc} in assembly program \texttt{program} gives us the policy decision \texttt{jexp}.
Further, suppose we expand the pseudoinstruction at \texttt{ppc} with the policy decision \texttt{jexp}, obtaining an (optional) list of machine code instructions \texttt{exp}.
Suppose we also

\begin{lstlisting}
theorem fetch_assembly: $\forall$pc, i, cmem, assembled.
 assembled = assembly1 i $\rightarrow$
 let len := length $\ldots$ assembled in
 encoding_check cmem pc (pc + len) assembled $\rightarrow$
 let fetched := fetch code_memory (bitvector_of_nat $\ldots$ pc) in
 let $\langle$instr_pc, ticks$\rangle$ := fetched in
 let $\langle$instr, pc'$\rangle$ := instr_pc in
 (eq_instruction instr i $\wedge$
 eqb ticks (ticks_of_instruction instr) $\wedge$
 eq_bv $\ldots$ pc' (pc + len)) = true.
\end{lstlisting}
+Suppose we also assemble the pseudoinstruction at \texttt{ppc} to obtain \texttt{ass}, a list of bytes.
+Then, under the assumption that neither the expansion of the pseudoinstruction to obtain \texttt{exp}, nor the assembly of the pseudoinstruction to obtain \texttt{ass}, failed, we check with \texttt{fetch\_many} that the number of machine instructions that were fetched matches the number of instruction that \texttt{expand\_pseudo\_instruction} expanded.
+
+At first sight, Lemma \texttt{fetch\_assembly\_pseudo2} appears to nearly establish the correctness of the assembler:
+\begin{lstlisting}
+lemma fetch_assembly_pseudo2: $\forall$program, assembled, costs, labels.
+ Some $\ldots$ $\langle$labels, costs$\rangle$ = build_maps program $\rightarrow$
+ Some $\ldots$ $\langle$assembled, costs$\rangle$ = assembly program $\rightarrow$ $\forall$ppc.
+ let code_memory := load_code_memory assembled in
+ let preamble := $\pi_1$ program in
+ let data_labels := construct_datalabels preamble in
+ let lk_labels :=
+ λx. sigma program (address_of_word_labels_code_mem ($\pi_2$ program) x) in
+ let lk_dlabels := λx. lookup ? ? x data_labels (zero ?) in
+ let expansion := jump_expansion ppc program in
+ let $\langle$pi, newppc$\rangle$ := fetch_pseudo_instruction ($\pi_2$ program) ppc in
+ let ppc' := sigma program ppc in
+ let newppc' := sigma program newppc in
+ let instructions' :=
+ expand_pseudo_instruction lk_labels lk_dlabels ppc' expansion pi in
+ let fetched := $\lambda$instr. fetch_many code_memory newppc' ppc' instr in
+ $\exists$instrs. Some ? instrs = instructions' $\wedge$ fetched instrs.
+\end{lstlisting}
+Intuitively, we may read \texttt{fetch\_assembly\_pseudo2} as follows.
+Suppose we are able to successfully assemble an assembly program using \texttt{assembly} and produce a code memory, \texttt{code\_memory}.
+Then there exists some list of machine instructions equal to the expansion of a pseudoinstruction and the number of machine instructions that need to be fetched is equal to the number of machine instructions that the pseudoinstruction was expanded into.
+
+However, this property is \emph{not} strong enough to establish that the semantics of an assembly program has been preserved by the assembly process.
+In particular, \texttt{fetch\_assembly\_pseudo2} says nothing about how
An \texttt{internal\_pseudo\_address\_map} positions in the memory of a \texttt{PseudoStatus} with a physical memory address.
@@ 426,7 +441,7 @@
\end{lstlisting}
The statement can be given an intuitive reading as follows.
Suppose our \texttt{PseudoStatus} $ps$ can be successfully converted into a \texttt{Status} $s$.
Suppose further that, after executing a single assembly instruction and converting the resulting \texttt{PseudoStatus} into a \texttt{Status}, we obtain $s''$, being careful to track the number of ticks executed.
Then, there exists some number $n$, so that executing $n$ machine code instructions in \texttt{Status} $s$ gives us \texttt{Status} $s''$.
+Suppose our \texttt{PseudoStatus}, \texttt{ps}, can be successfully converted into a \texttt{Status}, \texttt{s}.
+Suppose further that, after executing a single assembly instruction and converting the resulting \texttt{PseudoStatus} into a \texttt{Status}, we obtain \texttt{s''}, being careful to track the number of ticks executed.
+Then, there exists some number \texttt{n}, so that executing \texttt{n} machine code instructions in \texttt{Status} \texttt{s} gives us \texttt{Status} \texttt{s''}.
Theorem \texttt{main\_thm} establishes the correctness of the assembly process.
@@ 441,7 +456,10 @@
Expanding these pseudoinstructions into machine code instructions is not trivial, and the proof that the assembly process is `correct', in that the semantics of an assembly program are not changed is complex.
Aside from their application in verified compiler projects such as CerCo, verified assemblers could also be applied to the verification of operating system kernels.
+The formalisation is a key component of the CerCo project, which aims to produce a verified concrete complexity preserving compiler for a large subset of the C programming language.
+The verified assembler, complete with the underlying formalisation of the semantics of MCS51 machine code (described fully in~\cite{mulligan:executable:2011}), will form the bedrock layer upon which the rest of the CerCo project will build its verified compiler platform.
+
+Aside from their application in verified compiler projects such as CerCo, verified assemblers such as ours could also be applied to the verification of operating system kernels.
Of particular note is the verified seL4 kernel~\cite{klein:sel4:2009,klein:sel4:2010}.
This verification explicitly assumes the existence of, amongst other things, a trustworthy assembler.
+This verification explicitly assumes the existence of, amongst other things, a trustworthy assembler and compiler.
\paragraph{Use of dependent types and Russell}