Changeset 2058


Ignore:
Timestamp:
Jun 13, 2012, 3:17:24 PM (5 years ago)
Author:
mulligan
Message:

First draft of changes to main sections (i.e. those describing the proof itself).

File:
1 edited

Legend:

Unmodified
Added
Removed
  • src/ASM/CPP2012-asm/cpp-2012-asm.tex

    r2053 r2058  
    206206\label{subsect.the.assembler.and.semantics.of.machine.code}
    207207
    208 The formalisation in Matita of the semantics of MCS-51 machine code is described in~\cite{mulligan:executable:2011}.
    209 We merely describe enough here to understand the rest of the proof.
    210 
    211 The emulator centres around a \texttt{Status} record, describing the microprocessor's state.
     208Our emulator centres around a \texttt{Status} record, describing the microprocessor's state.
    212209This record contains fields corresponding to the microprocessor's program counter, registers, and so on.
    213210At the machine code level, code memory is implemented as a compact trie of bytes, addressed by the program counter.
    214 Machine code programs are loaded into \texttt{Status} using the \texttt{load\_code\_memory} function.
     211We parameterise \texttt{Status} records by this representation as a few technical tasks manipulating statuses are made simpler using this approach, as well as permitting a modicum of abstraction.
    215212
    216213We may execute a single step of a machine code program using the \texttt{execute\_1} function, which returns an updated \texttt{Status}:
    217214\begin{lstlisting}
    218 definition execute_1: Status $\rightarrow$ Status := $\ldots$
     215definition execute_1: $\forall$cm. Status cm $\rightarrow$ Status cm := $\ldots$
    219216\end{lstlisting}
    220217The function \texttt{execute} allows one to execute an arbitrary, but fixed (due to Matita's normalisation requirement) number of steps of a program.
     218Execution proceeds by a case analysis on the instruction fetched from code memory using the program counter of the input \texttt{Status} record.
    221219
    222220Naturally, assembly programs have analogues.
     
    228226Our analogue of \texttt{execute\_1} is \texttt{execute\_1\_pseudo\_instruction}:
    229227\begin{lstlisting}
    230 definition execute_1_pseudo_instruction: (Word $\rightarrow$ nat $\times$ nat) $\rightarrow$
    231                                          PseudoStatus $\rightarrow$ PseudoStatus := $\ldots$
     228definition execute_1_pseudo_instruction:
     229 (Word $\rightarrow$ nat $\times$ nat) $\rightarrow$ $\forall$cm. PseudoStatus cm $\rightarrow$ PseudoStatus cm := $\ldots$
    232230\end{lstlisting}
    233231Notice, here, that the emulation function for assembly programs takes an additional argument.
     
    241239
    242240The assembler, mapping programs consisting of lists of pseudoinstructions to lists of bytes, operates in a mostly straightforward manner.
    243 To a degree of approximation, the assembler on an assembly program, consisting of $n$ pseudoinstructions $\mathtt{P_i}$ for $1 \leq i \leq n$, works as in the following diagram (we use $-^{*}$ to denote a combined map and flatten operation):
     241To a degree of approximation the assembler, on an assembly program consisting of $n$ pseudoinstructions $\mathtt{P_i}$ for $1 \leq i \leq n$, works as in the following diagram (we use $-^{*}$ to denote a combined map and flatten operation):
    244242\begin{displaymath}
    245243[\mathtt{P_1}, \ldots \mathtt{P_n}] \xrightarrow{\left(\mathtt{P_i} \xrightarrow{\mbox{\fontsize{7}{9}\selectfont$\mathtt{expand\_pseudo\_instruction}$}} \mathtt{[I^1_i, \ldots I^q_i]} \xrightarrow{\mbox{\fontsize{7}{9}\selectfont$\mathtt{~~~~~~~~assembly1^{*}~~~~~~~~}$}} \mathtt{[0110]}\right)^{*}} \mathtt{[010101]}
     
    252250% SECTION                                                                      %
    253251% ---------------------------------------------------------------------------- %
    254 \subsection{Policies}
    255 \label{subsect.policies}
    256 
    257 Policies exist to dictate how conditional and unconditional jumps at the assembly level should be expanded into machine code instructions.
    258 Using policies, we are able to completely decouple the decision over how jumps are expanded with the act of expansion, simplifying our proofs.
    259 As mentioned, the MCS-51 instruction set includes three different jump instructions: \texttt{SJMP}, \texttt{AJMP} and \texttt{LJMP}; call these `short', `medium' and `long' jumps, respectively:
    260 \begin{lstlisting}
    261 inductive jump_length: Type[0] :=
    262   |short_jump:jump_length |medium_jump:jump_length |long_jump:jump_length.
    263 \end{lstlisting}
    264 
    265 A \texttt{jump\_expansion\_policy} is a map from pseudo program counters (implemented as \texttt{Word}s) to \texttt{jump\_length}s.
    266 Efficient implementations of policies are based on tries.
    267 Intuitively, a policy maps positions in a program (indexed using program counters implemented as \texttt{Word}s) to a particular variety of jump:
    268 \begin{lstlisting}
    269 definition policy_type := Word $\rightarrow$ jump_length.
    270 \end{lstlisting}
    271 
    272 Next, we require a series of `sigma' functions.
    273 These functions map assembly program counters to their machine code counterparts, establishing the correspondence between `positions' in an assembly program and `positions' in a machine code program.
    274 At the heart of this process is \texttt{sigma0} which traverses an assembly program building maps from pseudo program counters to program counters.
    275 This function fails if and only if an internal call to \texttt{assembly\_1\_pseudoinstruction\_safe} fails, which happens if a jump pseudoinstruction is expanded incorrectly:
    276 \begin{lstlisting}
    277 definition sigma0: pseudo_assembly_program $\rightarrow$ policy_type
    278   $\rightarrow$ option (nat $\times$ (nat $\times$ (BitVectorTrie Word 16))) := $\ldots$
    279 \end{lstlisting}
    280 Here, the returned \texttt{BitVectorTrie} is a map between pseudo program counters and program counters, and is constructed by successively expanding pseudoinstructions and incrementing the two program counters the requisite amount to keep them in correct correspondence.
    281 The two natural numbers returned are the maximum values that the two program counters attained.
    282 
    283 We eventually lift this to functions from pseudo program counters to program counters, implemented as \texttt{Word}s:
    284 \begin{lstlisting}
    285 definition sigma_safe:
    286   pseudo_assembly_program $\rightarrow$ policy_type $\rightarrow$ option (Word $\rightarrow$ Word) := $\ldots$
    287 \end{lstlisting}
    288 
    289 Now, it's possible to define what a `good policy' is for a program \texttt{p}.
    290 A policy \texttt{pol} is deemed good when it prevents \texttt{sigma\_safe} from failing on \texttt{p}.
    291 Failure is only possible when the policy dictates that short or medium jumps be expanded to jumps to locations too far away, or when the produced object code does not fit into code memory.
    292 A \texttt{policy} for a program \texttt{p} is a policy that is good for \texttt{p}:
    293 \begin{lstlisting}
    294 definition policy_ok := $\lambda$pol.$\lambda$p. sigma_safe p $\neq$ None $\ldots$
    295 definition policy :=
    296   $\lambda$p. $\Sigma$jump_expansion: policy_type. policy_ok jump_expansion p
    297 \end{lstlisting}
    298 
    299 Finally, we obtain \texttt{sigma}, a mapping from pseudo program counters to program counters that takes in input a good policy and thus never fails.
    300 Note how we avoid failure here, and in most of the remaining functions, by restricting the domain using the dependent type \texttt{policy}:
    301 \begin{lstlisting}
    302 definition sigma: $\forall$p. policy p $\rightarrow$ Word $\rightarrow$ Word := $\ldots$
    303 \end{lstlisting}
    304 
    305 % ---------------------------------------------------------------------------- %
    306 % SECTION                                                                      %
    307 % ---------------------------------------------------------------------------- %
    308252\subsection{Correctness of the assembler with respect to fetching}
    309253\label{subsect.total.correctness.of.the.assembler}
    310254
     255Throughout the proof of correctness for assembly we assume that policies are bifurcated into two functions: \texttt{sigma} mapping \texttt{Word} to \texttt{Word} and \texttt{policy} mapping \texttt{Word} to \texttt{bool}.
     256For our purposes, \texttt{sigma} is most interesting, as it maps pseudo program counters to program counters; \texttt{policy} is merely a technical device used in jump expansion.
     257
    311258Using our policies, we now work toward proving the total correctness of the assembler.
    312259By `total correctness', we mean that the assembly process never fails when provided with a good policy and that the process does not change the semantics of a certain class of well behaved assembly programs.
    313 Naturally, this necessitates keeping some sort of correspondence between addresses at the assembly level and addresses at the machine code level.
    314 For this, we use the \texttt{sigma} machinery defined at the end of Subsection~\ref{subsect.policies}.
     260By `good policy' we mean that policies provided to us will keep a lockstep correspondence between addresses at the assembly level and addresses at the machine code level:
     261\begin{displaymath}
     262\texttt{sigma}(\texttt{ppc} + 1) = \texttt{pc} + \texttt{current\_instruction\_size}
     263\end{displaymath}
     264along with some other technical properties related to program counters falling within the bounds of our programs.
     265We assume the correctness of the policies given to us using a function, \texttt{sigma\_policy\_specification} that we take in input, when needed.
    315266
    316267We expand pseudoinstructions using the function \texttt{expand\_pseudo\_instruction}.
    317 This takes an assembly program (consisting of a list of pseudoinstructions), a good policy for the program and a pointer to the pseudo code memory.
     268This takes an assembly program (consisting of a list of pseudoinstructions), a policy for the program, a map detailing the physical addresses of data labels from the pseudo program's preamble, and the pseudoinstruction to be expanded.
    318269It returns a list of instructions, corresponding to the expanded pseudoinstruction referenced by the pointer.
    319 The policy is used to decide how to expand \texttt{Call}s, \texttt{Jmp}s and conditional jumps.
    320 The function is given a dependent type that incorporates its specification.
    321 Its pre- and post-conditions are omitted in the paper due to lack of space.
    322 We show them as an example in the next function, \texttt{build\_maps}.
     270Execution proceeds by case analysis on the pseudoinstruction given as input.
     271The policy, \texttt{sigma}, is used to decide how to expand \texttt{Call}s, \texttt{Jmp}s and conditional jumps.
    323272\begin{lstlisting}
    324273definition expand_pseudo_instruction:
    325   $\forall$program. $\forall$pol: policy program.
    326   $\forall$ppc:Word. $\ldots$ $\Sigma$res. list instruction. $\ldots$ := $\ldots$
    327 \end{lstlisting}
    328 
    329 The following function, \texttt{build\_maps}, is used to construct a pair of mappings from program counters to labels and cost labels, respectively.
    330 Cost labels are a technical device used in the CerCo prototype C compiler for proving that the compiler is cost preserving.
    331 For our purposes in this paper, they can be safely ignored, though the interested reader may consult~\cite{amadio:certifying:2010} for an overview.
    332 
    333 The label map, on the other hand, records the position of labels that appear in an assembly program, so that the pseudoinstruction expansion process can replace them with real memory addresses:
    334 \begin{lstlisting}
    335 definition build_maps:
    336  $\forall$p. $\forall$pol: policy p.
    337  $\Sigma$res : ((BitVectorTrie Word 16) $\times$ (BitVectorTrie Word 16)).
    338    let $\langle$labels, costs$\rangle$ := res in
    339      $\forall$id. occurs_exactly_once id ($\pi_2$ p) $\rightarrow$
    340     let addr := address_of_word_labels_code_mem ($\pi_2$ p) id in
    341       lookup $\ldots$ id labels (zero $\ldots$) = sigma pseudo_program pol addr := $\ldots$
    342 \end{lstlisting}
    343 The type of \texttt{build\_maps} owes to our use of Matita's Russell facility to provide a strong specification for the function in the type (c.f. the use of $\Sigma$-types and coercions, through which Russell is simulated in Matita).
    344 We express that for all labels that appear exactly once in any assembly program, the newly created map used in the implementation, and the stronger \texttt{sigma} function used in the specification, agree.
    345 
    346 Using \texttt{build\_maps}, we can express the following lemma, expressing the correctness of the assembly function:
    347 \begin{lstlisting}
    348 lemma assembly_ok: $\forall$p,pol,assembled.
    349   let $\langle$labels, costs$\rangle$ := build_maps p pol in
    350   $\langle$assembled,costs$\rangle$ = assembly p pol $\rightarrow$
    351   let cmem := load_code_memory assembled in
    352   let preamble := $\pi_1$ p in
    353   let dlbls := construct_datalabels preamble in
    354   let addr := address_of_word_labels_code_mem ($\pi_2$ p) in
    355   let lk_lbls := λx. sigma p pol (addr x) in
    356   let lk_dlbls := λx. lookup $\ldots$ x datalabels (zero ?) in
    357   $\forall$ppc, pi, newppc.
    358   $\forall$prf: $\langle$pi, newppc$\rangle$ = fetch_pseudo_instruction ($\pi_2$ p) ppc.
    359   $\forall$len, assm.
    360   let spol := sigma program pol ppc in
    361   let spol_len := spol + len in
    362   let echeck := encoding_check cmem spol spol_len assm in
    363   let a1pi := assembly_1_pseudoinstruction in
    364   $\langle$len, assm$\rangle$ = a1pi p pol ppc lk_lbls lk_dlbls pi (refl $\ldots$) (refl $\ldots$) ? $\rightarrow$
    365     echeck $\wedge$ sigma p pol newppc = spol_len.
    366 \end{lstlisting}
    367 Suppose also we assemble our program \texttt{p} in accordance with a policy \texttt{pol} to obtain \texttt{assembled}.
    368 Here, we perform a `sanity check' to ensure that the two cost label maps generated are identical, before loading the assembled program into code memory \texttt{cmem}.
     274 $\forall$lookup_labels: Identifier $\rightarrow$ Word.
     275 $\forall$sigma: Word $\rightarrow$ Word.
     276 $\forall$policy: Word $\rightarrow$ bool.
     277  Word $\rightarrow$ (Identifier $\rightarrow$ Word) $\rightarrow$
     278   pseudo_instruction $\rightarrow$ list instruction := ...
     279\end{lstlisting}
     280
     281We can express the following lemma, expressing the correctness of the assembly function (slightly simplified):
     282\begin{lstlisting}
     283lemma assembly_ok:
     284 $\forall$program: pseudo_assembly_program.
     285 $\forall$sigma: Word $\rightarrow$ Word.
     286 $\forall$policy: Word $\rightarrow$ bool.
     287 $\forall$sigma_policy_witness.
     288 $\forall$assembled.
     289 $\forall$costs': BitVectorTrie costlabel 16.
     290 let $\langle$preamble, instr_list$\rangle$ := program in
     291 let $\langle$labels, costs$\rangle$ := create_label_cost_map instr_list in
     292  $\langle$assembled,costs'$\rangle$ = assembly program sigma policy $\rightarrow$
     293   let cmem := load_code_memory assembled in
     294   $\forall$ppc.
     295    let $\langle$pi, newppc$\rangle$ := fetch_pseudo_instruction instr_list ppc in     
     296    let $\langle$len,assembled$\rangle$ := assembly_1_pseudoinstruction $\ldots$ ppc $\ldots$ pi in
     297    let pc := sigma ppc in
     298    let pc_plus_len := add $\ldots$ pc (bitvector_of_nat $\ldots$ len) in
     299     encoding_check cmem pc pc_plus_len assembled $\wedge$
     300      sigma newppc = add $\ldots$ pc (bitvector_of_nat $\ldots$ len).
     301\end{lstlisting}
     302Here, \texttt{encoding\_check} is a recursive function that checks that assembled machine code is correctly stored in code memory.
     303Suppose also we assemble our program \texttt{p} in accordance with a policy \texttt{sigma} to obtain \texttt{assembled}, loading the assembled program into code memory \texttt{cmem}.
    369304Then, for every pseudoinstruction \texttt{pi}, pseudo program counter \texttt{ppc} and new pseudo program counter \texttt{newppc}, such that we obtain \texttt{pi} and \texttt{newppc} from fetching a pseudoinstruction at \texttt{ppc}, we check that assembling this pseudoinstruction produces the correct number of machine code instructions, and that the new pseudo program counter \texttt{ppc} has the value expected of it.
    370305
    371 Theorem \texttt{fetch\_assembly} establishes that the \texttt{fetch} and \texttt{assembly1} functions interact correctly.
     306Lemma \texttt{fetch\_assembly} establishes that the \texttt{fetch} and \texttt{assembly1} functions interact correctly.
    372307The \texttt{fetch} function, as its name implies, fetches the instruction indexed by the program counter in the code memory, while \texttt{assembly1} maps a single instruction to its byte encoding:
    373308\begin{lstlisting}
    374 theorem fetch_assembly: $\forall$pc, i, cmem, assembled.  assembled=assembly1 i $\rightarrow$
     309lemma fetch_assembly:
     310 $\forall$pc: Word.
     311 $\forall$i: instruction.
     312 $\forall$code_memory: BitVectorTrie Byte 16.
     313 $\forall$assembled: list Byte.
     314  assembled = assembly1 i $\rightarrow$
    375315  let len := length $\ldots$ assembled in
    376   encoding_check cmem pc (pc + len) assembled $\rightarrow$
    377     let fetched := fetch code_memory (bitvector_of_nat $\ldots$ pc) in
    378     let $\langle$instr_pc, ticks$\rangle$ := fetched in
    379     let $\langle$instr, pc'$\rangle$ := instr_pc in
    380       (eq_instruction instr i $\wedge$ eqb ticks (ticks_of_instruction instr) $\wedge$
    381        eq_bv $\ldots$ pc' (pc + len)) = true.
     316  let pc_plus_len := add $\ldots$ pc (bitvector_of_nat $\ldots$ len) in
     317   encoding_check code_memory pc pc_plus_len assembled $\rightarrow$
     318   let $\langle$instr, pc', ticks$\rangle$ := fetch code_memory pc in
     319    (eq_instruction instr i $\wedge$
     320     eqb ticks (ticks_of_instruction instr) $\wedge$
     321     eq_bv $\ldots$ pc' pc_plus_len) = true.
    382322\end{lstlisting}
    383323In particular, we read \texttt{fetch\_assembly} as follows.
    384324Given an instruction, \texttt{i}, we first assemble the instruction to obtain \texttt{assembled}, checking that the assembled instruction was stored in code memory correctly.
    385 Fetching from code memory, we obtain \texttt{fetched}, a tuple consisting of the instruction, new program counter, and the number of ticks this instruction will take to execute.
    386 Deconstructing these tuples, we finally check that the fetched instruction is the same instruction that we began with, and the number of ticks this instruction will take to execute is the same as the result returned by a lookup function, \texttt{ticks\_of\_instruction}, devoted to tracking this information.
     325Fetching from code memory, we obtain a tuple consisting of the instruction, new program counter, and the number of ticks this instruction will take to execute.
     326We finally check that the fetched instruction is the same instruction that we began with, and the number of ticks this instruction will take to execute is the same as the result returned by a lookup function, \texttt{ticks\_of\_instruction}, devoted to tracking this information.
    387327Or, in plainer words, assembling and then immediately fetching again gets you back to where you started.
    388328
     
    390330\begin{lstlisting}
    391331lemma fetch_assembly_pseudo:
    392  ∀program.∀pol:policy program.∀ppc.∀code_memory.
    393   let pi := $\pi_1$ (fetch_pseudo_instruction ($\pi_2$ program) ppc) in
    394   let instructions := expand_pseudo_instruction program pol ppc ... in
    395   let $\langle$len,assembled$\rangle$ := assembly_1_pseudoinstruction program pol ppc ... in
    396   encoding_check code_memory pc (pc + len) assembled →
    397   fetch_many code_memory (pc + len) pc instructions.
    398 \end{lstlisting}
    399 Here, \texttt{len} is the number of machine code instructions the pseudoinstruction at hand has been expanded into, and \texttt{encoding\_check} is a recursive function that checks that assembled machine code is correctly stored in code memory.
     332 $\forall$program: pseudo_assembly_program.
     333 $\forall$sigma: Word $\rightarrow$ Word.
     334 $\forall$policy: Word $\rightarrow$ bool.
     335 $\forall$ppc.
     336 $\forall$code_memory.
     337 let $\langle$preamble, instr_list$\rangle$ := program in
     338 let pi := $\pi_1$ (fetch_pseudo_instruction instr_list ppc) in
     339 let pc := sigma ppc in
     340 let instrs := expand_pseudo_instruction $\ldots$ sigma policy ppc $\ldots$ pi in
     341 let $\langle$l, a$\rangle$ := assembly_1_pseudoinstruction $\ldots$ sigma policy ppc $\ldots$ pi in
     342 let pc_plus_len := add $\ldots$ pc (bitvector_of_nat $\ldots$ l) in
     343  encoding_check code_memory pc pc_plus_len a $\rightarrow$
     344   fetch_many code_memory pc_plus_len pc instructions.
     345\end{lstlisting}
     346Here, \texttt{l} is the number of machine code instructions the pseudoinstruction at hand has been expanded into.
    400347We assemble a single pseudoinstruction with \texttt{assembly\_1\_pseudoinstruction}, which internally calls \texttt{jump\_expansion} and \texttt{expand\_pseudo\_instruction}.
    401348The function \texttt{fetch\_many} fetches multiple machine code instructions from code memory and performs some routine checks.
    402349
    403350Intuitively, Lemma \texttt{fetch\_assembly\_pseudo} can be read as follows.
    404 Suppose we expand the pseudoinstruction at \texttt{ppc} with the policy decision \texttt{pol}, obtaining the list of machine code instructions \texttt{instructions}.
    405 Suppose we also assemble the pseudoinstruction at \texttt{ppc} to obtain \texttt{assembled}, a list of bytes.
     351Suppose we expand the pseudoinstruction at \texttt{ppc} with the policy decision \texttt{sigma}, obtaining the list of machine code instructions \texttt{instrs}.
     352Suppose we also assemble the pseudoinstruction at \texttt{ppc} to obtain \texttt{a}, a list of bytes.
    406353Then, we check with \texttt{fetch\_many} that the number of machine instructions that were fetched matches the number of instruction that \texttt{expand\_pseudo\_instruction} expanded.
    407354
    408355The final lemma in this series is \texttt{fetch\_assembly\_pseudo2} that combines the Lemma \texttt{fetch\_assembly\_pseudo} with the correctness of the functions that load object code into the processor's memory.
     356Again, we slightly simplify:
    409357\begin{lstlisting}
    410358lemma fetch_assembly_pseudo2:
    411  ∀program,pol,ppc.
    412   let $\langle$labels,costs$\rangle$ := build_maps program pol in
    413   let assembled := $\pi_1$ (assembly program pol) in
    414   let code_memory := load_code_memory assembled in
    415   let data_labels := construct_datalabels ($\pi_1$ program) in
    416   let lookup_labels :=
    417     λx. sigma $\ldots$ pol (address_of_word_labels_code_mem ($\pi_2$ program) x) in
    418   let lookup_datalabels := λx. lookup ? ? x data_labels (zero ?) in
    419   let $\langle$pi,newppc$\rangle$ := fetch_pseudo_instruction ($\pi_2$ program) ppc in
    420   let instrs ≝ expand_pseudo_instruction program pol ppc ... in
    421    fetch_many code_memory (sigma $\ldots$ pol newppc) (sigma $\ldots$ pol ppc) instrs.
    422 \end{lstlisting}
     359 $\forall$program.
     360 $\forall$sigma.
     361 $\forall$policy.
     362 $\forall$sigma_policy_specification_witness.
     363 $\forall$ppc.
     364  let $\langle$preamble, instr_list$\rangle$ := program in
     365  let $\langle$labels, costs$\rangle$ := create_label_cost_map instr_list in
     366  let $\langle$assembled, costs'$\rangle$ := $\pi_1$ $\ldots$ (assembly program sigma policy) in
     367  let cmem := load_code_memory assembled in
     368  let $\langle$pi, newppc$\rangle$ := fetch_pseudo_instruction instr_list ppc in
     369  let instructions := expand_pseudo_instruction $\ldots$ sigma ppc $\ldots$ pi in
     370    fetch_many cmem (sigma newppc) (sigma ppc) instructions.
     371\end{lstlisting}
     372
     373Here we use $\pi_1 \ldots$ to eject out the existential witness from the Russell-typed function \texttt{assembly}.
    423374
    424375We read \texttt{fetch\_assembly\_pseudo2} as follows.
    425 Suppose we are able to successfully assemble an assembly program using \texttt{assembly} and produce a code memory, \texttt{code\_memory}.
    426 Then, fetching a pseudoinstruction from the pseudo code memory at address \texttt{ppc} corresponds to fetching a sequence of instructions from the real code memory at address \texttt{sigma program pol ppc}.
    427 The fetched sequence corresponds to the expansion, according to \texttt{pol}, of the pseudoinstruction.
     376Suppose we are able to successfully assemble an assembly program using \texttt{assembly} and produce a code memory, \texttt{cmem}.
     377Then, fetching a pseudoinstruction from the pseudo code memory at address \texttt{ppc} corresponds to fetching a sequence of instructions from the real code memory using $\sigma$ to expand pseudoinstructions.
     378The fetched sequence corresponds to the expansion, according to \texttt{sigma}, of the pseudoinstruction.
    428379
    429380At first, the lemmas appears to immediately imply the correctness of the assembler.
     
    437388\label{subsect.total.correctness.for.well.behaved.assembly.programs}
    438389
    439 In any `reasonable' assembly language addresses in code memory are just data that can be manipulated in multiple ways by the program.
    440 An assembly program can forge, compare and move addresses around, shift existing addresses or apply logical and arithmetical operations to them.
    441 Further, every optimising assembler is allowed to modify code memory.
    442 Hence only the semantics of a few of the aforementioned operations are preserved by an optimising assembler/compiler.
    443 Moreover, this characterisation of well behaved programs is clearly undecidable.
    444 
    445 To obtain a reasonable statement of correctness for our assembler, we need to trace memory locations (and, potentially, registers) that contain memory addresses.
    446 This is necessary for two purposes.
    447 
    448 First we must detect (at run time) programs that manipulate addresses in well behaved ways, according to some approximation of well-behavedness.
     390The traditional approach to verifying the correctness of an assembler is to treat memory addresses as opaque structures that cannot be modified.
     391This prevents assembly programs from performing any `dangerous', semantics breaking manipulations of memory addresses by making these programs simply unrepresentable in the source language.
     392
     393Here, we take a different approach to this problem: we trace memory locations (and, potentially, registers) that contain memory addresses.
     394We then prove that only those assembly programs that use addresses in `safe' ways have their semantics preserved by the assembly process.
     395We believe that this approach is more flexible when compared to the traditional approach, as in principle it allows us to introduce some permitted \emph{benign} manipulations of addresses that the traditional approach, using opaque addresses, cannot handle, therefore expanding the set of input programs that can be assembled correctly.
     396
     397However, with this approach we must detect (at run time) programs that manipulate addresses in well behaved ways, according to some approximation of well-behavedness.
    449398Second, we must compute statuses that correspond to pseudo-statuses.
    450399The contents of the program counter must be translated, as well as the contents of all traced locations, by applying the \texttt{sigma} map.
     
    473422This never fails, providing that our policy is correct:
    474423\begin{lstlisting}
    475 definition status_of_pseudo_status: internal_pseudo_address_map $\rightarrow$
    476   $\forall$ps:PseudoStatus. policy (code_memory $\ldots$ ps) $\rightarrow$ Status
     424definition status_of_pseudo_status:
     425 internal_pseudo_address_map $\rightarrow$ $\forall$pap. $\forall$ps: PseudoStatus pap.
     426 $\forall$sigma: Word $\rightarrow$ Word. $\forall$policy: Word $\rightarrow$ bool.
     427  Status (code_memory_of_pseudo_assembly_program pap sigma policy)
    477428\end{lstlisting}
    478429
     
    482433\begin{lstlisting}
    483434definition next_internal_pseudo_address_map: internal_pseudo_address_map
    484   $\rightarrow$ PseudoStatus $\rightarrow$ option internal_pseudo_address_map
    485 \end{lstlisting}
     435 $\rightarrow$ PseudoStatus $\rightarrow$ option internal_pseudo_address_map
     436\end{lstlisting}
     437Note, if we wished to allow `benign manipulations' of addresses, it would be this function that needs to be changed.
    486438
    487439The function \texttt{ticks\_of} computes how long---in clock cycles---a pseudoinstruction will take to execute when expanded in accordance with a given policy.
    488440The function returns a pair of natural numbers, needed for recording the execution times of each branch of a conditional jump.
    489441\begin{lstlisting}
    490 definition ticks_of:
    491   $\forall$p:pseudo_assembly_program. policy p $\rightarrow$ Word $\rightarrow$ nat $\times$ nat := $\ldots$
     442axiom ticks_of:
     443 $\forall$p:pseudo_assembly_program. policy p $\rightarrow$ Word $\rightarrow$ nat $\times$ nat := $\ldots$
    492444\end{lstlisting}
    493445
    494446Finally, we are able to state and prove our main theorem.
    495 This relates the execution of a single assembly instruction and the execution of (possibly) many machine code instructions, as long .
    496 That is, the assembly process preserves the semantics of an assembly program, as it is translated into machine code, as long as we are able to track memory addresses properly:
     447This relates the execution of a single assembly instruction and the execution of (possibly) many machine code instructions, as long as we are able to track memory addresses properly:
    497448\begin{lstlisting}
    498449theorem main_thm:
    499  ∀M,M':internal_pseudo_address_map.∀ps.∀pol: policy ps.
    500   next_internal_pseudo_address_map M ps = Some $\ldots$ M' →
    501    ∃n.
    502       execute n (status_of_pseudo_status M ps pol)
    503     = status_of_pseudo_status M'
    504        (execute_1_pseudo_instruction (ticks_of (code_memory $\ldots$ ps) pol) ps)
    505        [pol].
     450 $\forall$M, M': internal_pseudo_address_map.
     451 $\forall$program: pseudo_assembly_program.
     452 let $\langle$preamble, instr_list$\rangle$ := program in
     453 $\forall$is_well_labelled: is_well_labelled_p instr_list.
     454 $\forall$sigma: Word $\rightarrow$ Word.
     455 $\forall$policy: Word $\rightarrow$ bool.
     456 $\forall$sigma_policy_specification_witness.
     457 $\forall$ps: PseudoStatus program.
     458 $\forall$program_counter_in_bounds.
     459  next_internal_pseudo_address_map M program ps = Some $\ldots$ M' $\rightarrow$
     460  $\exists$n. execute n $\ldots$ (status_of_pseudo_status M $\ldots$ ps sigma policy) =
     461   status_of_pseudo_status M' $\ldots$ (execute_1_pseudo_instruction
     462     (ticks_of program sigma policy) program ps) sigma policy.
    506463\end{lstlisting}
    507464The statement is standard for forward simulation, but restricted to \texttt{PseudoStatuses} \texttt{ps} whose next instruction to be executed is well-behaved with respect to the \texttt{internal\_pseudo\_address\_map} \texttt{M}.
     465Further, we explicitly requires proof that our policy is correct and the pseudo program counter lies within the bounds of the program.
    508466Theorem \texttt{main\_thm} establishes the total correctness of the assembly process and can simply be lifted to the forward simulation of an arbitrary number of well behaved steps on the assembly program.
    509467
Note: See TracChangeset for help on using the changeset viewer.