- Timestamp:
- Jun 13, 2012, 3:17:24 PM (9 years ago)
- File:
-
- 1 edited
Legend:
- Unmodified
- Added
- Removed
-
src/ASM/CPP2012-asm/cpp-2012-asm.tex
r2053 r2058 206 206 \label{subsect.the.assembler.and.semantics.of.machine.code} 207 207 208 The formalisation in Matita of the semantics of MCS-51 machine code is described in~\cite{mulligan:executable:2011}. 209 We merely describe enough here to understand the rest of the proof. 210 211 The emulator centres around a \texttt{Status} record, describing the microprocessor's state. 208 Our emulator centres around a \texttt{Status} record, describing the microprocessor's state. 212 209 This record contains fields corresponding to the microprocessor's program counter, registers, and so on. 213 210 At the machine code level, code memory is implemented as a compact trie of bytes, addressed by the program counter. 214 Machine code programs are loaded into \texttt{Status} using the \texttt{load\_code\_memory} function.211 We parameterise \texttt{Status} records by this representation as a few technical tasks manipulating statuses are made simpler using this approach, as well as permitting a modicum of abstraction. 215 212 216 213 We may execute a single step of a machine code program using the \texttt{execute\_1} function, which returns an updated \texttt{Status}: 217 214 \begin{lstlisting} 218 definition execute_1: Status $\rightarrow$ Status:= $\ldots$215 definition execute_1: $\forall$cm. Status cm $\rightarrow$ Status cm := $\ldots$ 219 216 \end{lstlisting} 220 217 The function \texttt{execute} allows one to execute an arbitrary, but fixed (due to Matita's normalisation requirement) number of steps of a program. 218 Execution proceeds by a case analysis on the instruction fetched from code memory using the program counter of the input \texttt{Status} record. 221 219 222 220 Naturally, assembly programs have analogues. … … 228 226 Our analogue of \texttt{execute\_1} is \texttt{execute\_1\_pseudo\_instruction}: 229 227 \begin{lstlisting} 230 definition execute_1_pseudo_instruction: (Word $\rightarrow$ nat $\times$ nat) $\rightarrow$231 PseudoStatus $\rightarrow$ PseudoStatus:= $\ldots$228 definition execute_1_pseudo_instruction: 229 (Word $\rightarrow$ nat $\times$ nat) $\rightarrow$ $\forall$cm. PseudoStatus cm $\rightarrow$ PseudoStatus cm := $\ldots$ 232 230 \end{lstlisting} 233 231 Notice, here, that the emulation function for assembly programs takes an additional argument. … … 241 239 242 240 The assembler, mapping programs consisting of lists of pseudoinstructions to lists of bytes, operates in a mostly straightforward manner. 243 To a degree of approximation , the assembler on an assembly program,consisting of $n$ pseudoinstructions $\mathtt{P_i}$ for $1 \leq i \leq n$, works as in the following diagram (we use $-^{*}$ to denote a combined map and flatten operation):241 To a degree of approximation the assembler, on an assembly program consisting of $n$ pseudoinstructions $\mathtt{P_i}$ for $1 \leq i \leq n$, works as in the following diagram (we use $-^{*}$ to denote a combined map and flatten operation): 244 242 \begin{displaymath} 245 243 [\mathtt{P_1}, \ldots \mathtt{P_n}] \xrightarrow{\left(\mathtt{P_i} \xrightarrow{\mbox{\fontsize{7}{9}\selectfont$\mathtt{expand\_pseudo\_instruction}$}} \mathtt{[I^1_i, \ldots I^q_i]} \xrightarrow{\mbox{\fontsize{7}{9}\selectfont$\mathtt{~~~~~~~~assembly1^{*}~~~~~~~~}$}} \mathtt{[0110]}\right)^{*}} \mathtt{[010101]} … … 252 250 % SECTION % 253 251 % ---------------------------------------------------------------------------- % 254 \subsection{Policies}255 \label{subsect.policies}256 257 Policies exist to dictate how conditional and unconditional jumps at the assembly level should be expanded into machine code instructions.258 Using policies, we are able to completely decouple the decision over how jumps are expanded with the act of expansion, simplifying our proofs.259 As mentioned, the MCS-51 instruction set includes three different jump instructions: \texttt{SJMP}, \texttt{AJMP} and \texttt{LJMP}; call these `short', `medium' and `long' jumps, respectively:260 \begin{lstlisting}261 inductive jump_length: Type[0] :=262 |short_jump:jump_length |medium_jump:jump_length |long_jump:jump_length.263 \end{lstlisting}264 265 A \texttt{jump\_expansion\_policy} is a map from pseudo program counters (implemented as \texttt{Word}s) to \texttt{jump\_length}s.266 Efficient implementations of policies are based on tries.267 Intuitively, a policy maps positions in a program (indexed using program counters implemented as \texttt{Word}s) to a particular variety of jump:268 \begin{lstlisting}269 definition policy_type := Word $\rightarrow$ jump_length.270 \end{lstlisting}271 272 Next, we require a series of `sigma' functions.273 These functions map assembly program counters to their machine code counterparts, establishing the correspondence between `positions' in an assembly program and `positions' in a machine code program.274 At the heart of this process is \texttt{sigma0} which traverses an assembly program building maps from pseudo program counters to program counters.275 This function fails if and only if an internal call to \texttt{assembly\_1\_pseudoinstruction\_safe} fails, which happens if a jump pseudoinstruction is expanded incorrectly:276 \begin{lstlisting}277 definition sigma0: pseudo_assembly_program $\rightarrow$ policy_type278 $\rightarrow$ option (nat $\times$ (nat $\times$ (BitVectorTrie Word 16))) := $\ldots$279 \end{lstlisting}280 Here, the returned \texttt{BitVectorTrie} is a map between pseudo program counters and program counters, and is constructed by successively expanding pseudoinstructions and incrementing the two program counters the requisite amount to keep them in correct correspondence.281 The two natural numbers returned are the maximum values that the two program counters attained.282 283 We eventually lift this to functions from pseudo program counters to program counters, implemented as \texttt{Word}s:284 \begin{lstlisting}285 definition sigma_safe:286 pseudo_assembly_program $\rightarrow$ policy_type $\rightarrow$ option (Word $\rightarrow$ Word) := $\ldots$287 \end{lstlisting}288 289 Now, it's possible to define what a `good policy' is for a program \texttt{p}.290 A policy \texttt{pol} is deemed good when it prevents \texttt{sigma\_safe} from failing on \texttt{p}.291 Failure is only possible when the policy dictates that short or medium jumps be expanded to jumps to locations too far away, or when the produced object code does not fit into code memory.292 A \texttt{policy} for a program \texttt{p} is a policy that is good for \texttt{p}:293 \begin{lstlisting}294 definition policy_ok := $\lambda$pol.$\lambda$p. sigma_safe p $\neq$ None $\ldots$295 definition policy :=296 $\lambda$p. $\Sigma$jump_expansion: policy_type. policy_ok jump_expansion p297 \end{lstlisting}298 299 Finally, we obtain \texttt{sigma}, a mapping from pseudo program counters to program counters that takes in input a good policy and thus never fails.300 Note how we avoid failure here, and in most of the remaining functions, by restricting the domain using the dependent type \texttt{policy}:301 \begin{lstlisting}302 definition sigma: $\forall$p. policy p $\rightarrow$ Word $\rightarrow$ Word := $\ldots$303 \end{lstlisting}304 305 % ---------------------------------------------------------------------------- %306 % SECTION %307 % ---------------------------------------------------------------------------- %308 252 \subsection{Correctness of the assembler with respect to fetching} 309 253 \label{subsect.total.correctness.of.the.assembler} 310 254 255 Throughout the proof of correctness for assembly we assume that policies are bifurcated into two functions: \texttt{sigma} mapping \texttt{Word} to \texttt{Word} and \texttt{policy} mapping \texttt{Word} to \texttt{bool}. 256 For our purposes, \texttt{sigma} is most interesting, as it maps pseudo program counters to program counters; \texttt{policy} is merely a technical device used in jump expansion. 257 311 258 Using our policies, we now work toward proving the total correctness of the assembler. 312 259 By `total correctness', we mean that the assembly process never fails when provided with a good policy and that the process does not change the semantics of a certain class of well behaved assembly programs. 313 Naturally, this necessitates keeping some sort of correspondence between addresses at the assembly level and addresses at the machine code level. 314 For this, we use the \texttt{sigma} machinery defined at the end of Subsection~\ref{subsect.policies}. 260 By `good policy' we mean that policies provided to us will keep a lockstep correspondence between addresses at the assembly level and addresses at the machine code level: 261 \begin{displaymath} 262 \texttt{sigma}(\texttt{ppc} + 1) = \texttt{pc} + \texttt{current\_instruction\_size} 263 \end{displaymath} 264 along with some other technical properties related to program counters falling within the bounds of our programs. 265 We assume the correctness of the policies given to us using a function, \texttt{sigma\_policy\_specification} that we take in input, when needed. 315 266 316 267 We expand pseudoinstructions using the function \texttt{expand\_pseudo\_instruction}. 317 This takes an assembly program (consisting of a list of pseudoinstructions), a good policy for the program and a pointer to the pseudo code memory.268 This takes an assembly program (consisting of a list of pseudoinstructions), a policy for the program, a map detailing the physical addresses of data labels from the pseudo program's preamble, and the pseudoinstruction to be expanded. 318 269 It returns a list of instructions, corresponding to the expanded pseudoinstruction referenced by the pointer. 319 The policy is used to decide how to expand \texttt{Call}s, \texttt{Jmp}s and conditional jumps. 320 The function is given a dependent type that incorporates its specification. 321 Its pre- and post-conditions are omitted in the paper due to lack of space. 322 We show them as an example in the next function, \texttt{build\_maps}. 270 Execution proceeds by case analysis on the pseudoinstruction given as input. 271 The policy, \texttt{sigma}, is used to decide how to expand \texttt{Call}s, \texttt{Jmp}s and conditional jumps. 323 272 \begin{lstlisting} 324 273 definition expand_pseudo_instruction: 325 $\forall$program. $\forall$pol: policy program. 326 $\forall$ppc:Word. $\ldots$ $\Sigma$res. list instruction. $\ldots$ := $\ldots$ 327 \end{lstlisting} 328 329 The following function, \texttt{build\_maps}, is used to construct a pair of mappings from program counters to labels and cost labels, respectively. 330 Cost labels are a technical device used in the CerCo prototype C compiler for proving that the compiler is cost preserving. 331 For our purposes in this paper, they can be safely ignored, though the interested reader may consult~\cite{amadio:certifying:2010} for an overview. 332 333 The label map, on the other hand, records the position of labels that appear in an assembly program, so that the pseudoinstruction expansion process can replace them with real memory addresses: 334 \begin{lstlisting} 335 definition build_maps: 336 $\forall$p. $\forall$pol: policy p. 337 $\Sigma$res : ((BitVectorTrie Word 16) $\times$ (BitVectorTrie Word 16)). 338 let $\langle$labels, costs$\rangle$ := res in 339 $\forall$id. occurs_exactly_once id ($\pi_2$ p) $\rightarrow$ 340 let addr := address_of_word_labels_code_mem ($\pi_2$ p) id in 341 lookup $\ldots$ id labels (zero $\ldots$) = sigma pseudo_program pol addr := $\ldots$ 342 \end{lstlisting} 343 The type of \texttt{build\_maps} owes to our use of Matita's Russell facility to provide a strong specification for the function in the type (c.f. the use of $\Sigma$-types and coercions, through which Russell is simulated in Matita). 344 We express that for all labels that appear exactly once in any assembly program, the newly created map used in the implementation, and the stronger \texttt{sigma} function used in the specification, agree. 345 346 Using \texttt{build\_maps}, we can express the following lemma, expressing the correctness of the assembly function: 347 \begin{lstlisting} 348 lemma assembly_ok: $\forall$p,pol,assembled. 349 let $\langle$labels, costs$\rangle$ := build_maps p pol in 350 $\langle$assembled,costs$\rangle$ = assembly p pol $\rightarrow$ 351 let cmem := load_code_memory assembled in 352 let preamble := $\pi_1$ p in 353 let dlbls := construct_datalabels preamble in 354 let addr := address_of_word_labels_code_mem ($\pi_2$ p) in 355 let lk_lbls := λx. sigma p pol (addr x) in 356 let lk_dlbls := λx. lookup $\ldots$ x datalabels (zero ?) in 357 $\forall$ppc, pi, newppc. 358 $\forall$prf: $\langle$pi, newppc$\rangle$ = fetch_pseudo_instruction ($\pi_2$ p) ppc. 359 $\forall$len, assm. 360 let spol := sigma program pol ppc in 361 let spol_len := spol + len in 362 let echeck := encoding_check cmem spol spol_len assm in 363 let a1pi := assembly_1_pseudoinstruction in 364 $\langle$len, assm$\rangle$ = a1pi p pol ppc lk_lbls lk_dlbls pi (refl $\ldots$) (refl $\ldots$) ? $\rightarrow$ 365 echeck $\wedge$ sigma p pol newppc = spol_len. 366 \end{lstlisting} 367 Suppose also we assemble our program \texttt{p} in accordance with a policy \texttt{pol} to obtain \texttt{assembled}. 368 Here, we perform a `sanity check' to ensure that the two cost label maps generated are identical, before loading the assembled program into code memory \texttt{cmem}. 274 $\forall$lookup_labels: Identifier $\rightarrow$ Word. 275 $\forall$sigma: Word $\rightarrow$ Word. 276 $\forall$policy: Word $\rightarrow$ bool. 277 Word $\rightarrow$ (Identifier $\rightarrow$ Word) $\rightarrow$ 278 pseudo_instruction $\rightarrow$ list instruction := ... 279 \end{lstlisting} 280 281 We can express the following lemma, expressing the correctness of the assembly function (slightly simplified): 282 \begin{lstlisting} 283 lemma assembly_ok: 284 $\forall$program: pseudo_assembly_program. 285 $\forall$sigma: Word $\rightarrow$ Word. 286 $\forall$policy: Word $\rightarrow$ bool. 287 $\forall$sigma_policy_witness. 288 $\forall$assembled. 289 $\forall$costs': BitVectorTrie costlabel 16. 290 let $\langle$preamble, instr_list$\rangle$ := program in 291 let $\langle$labels, costs$\rangle$ := create_label_cost_map instr_list in 292 $\langle$assembled,costs'$\rangle$ = assembly program sigma policy $\rightarrow$ 293 let cmem := load_code_memory assembled in 294 $\forall$ppc. 295 let $\langle$pi, newppc$\rangle$ := fetch_pseudo_instruction instr_list ppc in 296 let $\langle$len,assembled$\rangle$ := assembly_1_pseudoinstruction $\ldots$ ppc $\ldots$ pi in 297 let pc := sigma ppc in 298 let pc_plus_len := add $\ldots$ pc (bitvector_of_nat $\ldots$ len) in 299 encoding_check cmem pc pc_plus_len assembled $\wedge$ 300 sigma newppc = add $\ldots$ pc (bitvector_of_nat $\ldots$ len). 301 \end{lstlisting} 302 Here, \texttt{encoding\_check} is a recursive function that checks that assembled machine code is correctly stored in code memory. 303 Suppose also we assemble our program \texttt{p} in accordance with a policy \texttt{sigma} to obtain \texttt{assembled}, loading the assembled program into code memory \texttt{cmem}. 369 304 Then, for every pseudoinstruction \texttt{pi}, pseudo program counter \texttt{ppc} and new pseudo program counter \texttt{newppc}, such that we obtain \texttt{pi} and \texttt{newppc} from fetching a pseudoinstruction at \texttt{ppc}, we check that assembling this pseudoinstruction produces the correct number of machine code instructions, and that the new pseudo program counter \texttt{ppc} has the value expected of it. 370 305 371 Theorem\texttt{fetch\_assembly} establishes that the \texttt{fetch} and \texttt{assembly1} functions interact correctly.306 Lemma \texttt{fetch\_assembly} establishes that the \texttt{fetch} and \texttt{assembly1} functions interact correctly. 372 307 The \texttt{fetch} function, as its name implies, fetches the instruction indexed by the program counter in the code memory, while \texttt{assembly1} maps a single instruction to its byte encoding: 373 308 \begin{lstlisting} 374 theorem fetch_assembly: $\forall$pc, i, cmem, assembled. assembled=assembly1 i $\rightarrow$ 309 lemma fetch_assembly: 310 $\forall$pc: Word. 311 $\forall$i: instruction. 312 $\forall$code_memory: BitVectorTrie Byte 16. 313 $\forall$assembled: list Byte. 314 assembled = assembly1 i $\rightarrow$ 375 315 let len := length $\ldots$ assembled in 376 encoding_check cmem pc (pc + len) assembled $\rightarrow$377 let fetched := fetch code_memory (bitvector_of_nat $\ldots$ pc) in378 let $\langle$instr_pc, ticks$\rangle$ := fetchedin379 let $\langle$instr, pc'$\rangle$ := instr_pc in380 (eq_instruction instr i $\wedge$eqb ticks (ticks_of_instruction instr) $\wedge$381 eq_bv $\ldots$ pc' (pc + len)) = true.316 let pc_plus_len := add $\ldots$ pc (bitvector_of_nat $\ldots$ len) in 317 encoding_check code_memory pc pc_plus_len assembled $\rightarrow$ 318 let $\langle$instr, pc', ticks$\rangle$ := fetch code_memory pc in 319 (eq_instruction instr i $\wedge$ 320 eqb ticks (ticks_of_instruction instr) $\wedge$ 321 eq_bv $\ldots$ pc' pc_plus_len) = true. 382 322 \end{lstlisting} 383 323 In particular, we read \texttt{fetch\_assembly} as follows. 384 324 Given an instruction, \texttt{i}, we first assemble the instruction to obtain \texttt{assembled}, checking that the assembled instruction was stored in code memory correctly. 385 Fetching from code memory, we obtain \texttt{fetched},a tuple consisting of the instruction, new program counter, and the number of ticks this instruction will take to execute.386 Deconstructing these tuples, we finally check that the fetched instruction is the same instruction that we began with, and the number of ticks this instruction will take to execute is the same as the result returned by a lookup function, \texttt{ticks\_of\_instruction}, devoted to tracking this information.325 Fetching from code memory, we obtain a tuple consisting of the instruction, new program counter, and the number of ticks this instruction will take to execute. 326 We finally check that the fetched instruction is the same instruction that we began with, and the number of ticks this instruction will take to execute is the same as the result returned by a lookup function, \texttt{ticks\_of\_instruction}, devoted to tracking this information. 387 327 Or, in plainer words, assembling and then immediately fetching again gets you back to where you started. 388 328 … … 390 330 \begin{lstlisting} 391 331 lemma fetch_assembly_pseudo: 392 ∀program.∀pol:policy program.∀ppc.∀code_memory. 393 let pi := $\pi_1$ (fetch_pseudo_instruction ($\pi_2$ program) ppc) in 394 let instructions := expand_pseudo_instruction program pol ppc ... in 395 let $\langle$len,assembled$\rangle$ := assembly_1_pseudoinstruction program pol ppc ... in 396 encoding_check code_memory pc (pc + len) assembled → 397 fetch_many code_memory (pc + len) pc instructions. 398 \end{lstlisting} 399 Here, \texttt{len} is the number of machine code instructions the pseudoinstruction at hand has been expanded into, and \texttt{encoding\_check} is a recursive function that checks that assembled machine code is correctly stored in code memory. 332 $\forall$program: pseudo_assembly_program. 333 $\forall$sigma: Word $\rightarrow$ Word. 334 $\forall$policy: Word $\rightarrow$ bool. 335 $\forall$ppc. 336 $\forall$code_memory. 337 let $\langle$preamble, instr_list$\rangle$ := program in 338 let pi := $\pi_1$ (fetch_pseudo_instruction instr_list ppc) in 339 let pc := sigma ppc in 340 let instrs := expand_pseudo_instruction $\ldots$ sigma policy ppc $\ldots$ pi in 341 let $\langle$l, a$\rangle$ := assembly_1_pseudoinstruction $\ldots$ sigma policy ppc $\ldots$ pi in 342 let pc_plus_len := add $\ldots$ pc (bitvector_of_nat $\ldots$ l) in 343 encoding_check code_memory pc pc_plus_len a $\rightarrow$ 344 fetch_many code_memory pc_plus_len pc instructions. 345 \end{lstlisting} 346 Here, \texttt{l} is the number of machine code instructions the pseudoinstruction at hand has been expanded into. 400 347 We assemble a single pseudoinstruction with \texttt{assembly\_1\_pseudoinstruction}, which internally calls \texttt{jump\_expansion} and \texttt{expand\_pseudo\_instruction}. 401 348 The function \texttt{fetch\_many} fetches multiple machine code instructions from code memory and performs some routine checks. 402 349 403 350 Intuitively, Lemma \texttt{fetch\_assembly\_pseudo} can be read as follows. 404 Suppose we expand the pseudoinstruction at \texttt{ppc} with the policy decision \texttt{ pol}, obtaining the list of machine code instructions \texttt{instructions}.405 Suppose we also assemble the pseudoinstruction at \texttt{ppc} to obtain \texttt{a ssembled}, a list of bytes.351 Suppose we expand the pseudoinstruction at \texttt{ppc} with the policy decision \texttt{sigma}, obtaining the list of machine code instructions \texttt{instrs}. 352 Suppose we also assemble the pseudoinstruction at \texttt{ppc} to obtain \texttt{a}, a list of bytes. 406 353 Then, we check with \texttt{fetch\_many} that the number of machine instructions that were fetched matches the number of instruction that \texttt{expand\_pseudo\_instruction} expanded. 407 354 408 355 The final lemma in this series is \texttt{fetch\_assembly\_pseudo2} that combines the Lemma \texttt{fetch\_assembly\_pseudo} with the correctness of the functions that load object code into the processor's memory. 356 Again, we slightly simplify: 409 357 \begin{lstlisting} 410 358 lemma fetch_assembly_pseudo2: 411 ∀program,pol,ppc. 412 let $\langle$labels,costs$\rangle$ := build_maps program pol in 413 let assembled := $\pi_1$ (assembly program pol) in 414 let code_memory := load_code_memory assembled in 415 let data_labels := construct_datalabels ($\pi_1$ program) in 416 let lookup_labels := 417 λx. sigma $\ldots$ pol (address_of_word_labels_code_mem ($\pi_2$ program) x) in 418 let lookup_datalabels := λx. lookup ? ? x data_labels (zero ?) in 419 let $\langle$pi,newppc$\rangle$ := fetch_pseudo_instruction ($\pi_2$ program) ppc in 420 let instrs ≝ expand_pseudo_instruction program pol ppc ... in 421 fetch_many code_memory (sigma $\ldots$ pol newppc) (sigma $\ldots$ pol ppc) instrs. 422 \end{lstlisting} 359 $\forall$program. 360 $\forall$sigma. 361 $\forall$policy. 362 $\forall$sigma_policy_specification_witness. 363 $\forall$ppc. 364 let $\langle$preamble, instr_list$\rangle$ := program in 365 let $\langle$labels, costs$\rangle$ := create_label_cost_map instr_list in 366 let $\langle$assembled, costs'$\rangle$ := $\pi_1$ $\ldots$ (assembly program sigma policy) in 367 let cmem := load_code_memory assembled in 368 let $\langle$pi, newppc$\rangle$ := fetch_pseudo_instruction instr_list ppc in 369 let instructions := expand_pseudo_instruction $\ldots$ sigma ppc $\ldots$ pi in 370 fetch_many cmem (sigma newppc) (sigma ppc) instructions. 371 \end{lstlisting} 372 373 Here we use $\pi_1 \ldots$ to eject out the existential witness from the Russell-typed function \texttt{assembly}. 423 374 424 375 We read \texttt{fetch\_assembly\_pseudo2} as follows. 425 Suppose we are able to successfully assemble an assembly program using \texttt{assembly} and produce a code memory, \texttt{c ode\_memory}.426 Then, fetching a pseudoinstruction from the pseudo code memory at address \texttt{ppc} corresponds to fetching a sequence of instructions from the real code memory at address \texttt{sigma program pol ppc}.427 The fetched sequence corresponds to the expansion, according to \texttt{ pol}, of the pseudoinstruction.376 Suppose we are able to successfully assemble an assembly program using \texttt{assembly} and produce a code memory, \texttt{cmem}. 377 Then, fetching a pseudoinstruction from the pseudo code memory at address \texttt{ppc} corresponds to fetching a sequence of instructions from the real code memory using $\sigma$ to expand pseudoinstructions. 378 The fetched sequence corresponds to the expansion, according to \texttt{sigma}, of the pseudoinstruction. 428 379 429 380 At first, the lemmas appears to immediately imply the correctness of the assembler. … … 437 388 \label{subsect.total.correctness.for.well.behaved.assembly.programs} 438 389 439 In any `reasonable' assembly language addresses in code memory are just data that can be manipulated in multiple ways by the program. 440 An assembly program can forge, compare and move addresses around, shift existing addresses or apply logical and arithmetical operations to them. 441 Further, every optimising assembler is allowed to modify code memory. 442 Hence only the semantics of a few of the aforementioned operations are preserved by an optimising assembler/compiler. 443 Moreover, this characterisation of well behaved programs is clearly undecidable. 444 445 To obtain a reasonable statement of correctness for our assembler, we need to trace memory locations (and, potentially, registers) that contain memory addresses. 446 This is necessary for two purposes. 447 448 First we must detect (at run time) programs that manipulate addresses in well behaved ways, according to some approximation of well-behavedness. 390 The traditional approach to verifying the correctness of an assembler is to treat memory addresses as opaque structures that cannot be modified. 391 This prevents assembly programs from performing any `dangerous', semantics breaking manipulations of memory addresses by making these programs simply unrepresentable in the source language. 392 393 Here, we take a different approach to this problem: we trace memory locations (and, potentially, registers) that contain memory addresses. 394 We then prove that only those assembly programs that use addresses in `safe' ways have their semantics preserved by the assembly process. 395 We believe that this approach is more flexible when compared to the traditional approach, as in principle it allows us to introduce some permitted \emph{benign} manipulations of addresses that the traditional approach, using opaque addresses, cannot handle, therefore expanding the set of input programs that can be assembled correctly. 396 397 However, with this approach we must detect (at run time) programs that manipulate addresses in well behaved ways, according to some approximation of well-behavedness. 449 398 Second, we must compute statuses that correspond to pseudo-statuses. 450 399 The contents of the program counter must be translated, as well as the contents of all traced locations, by applying the \texttt{sigma} map. … … 473 422 This never fails, providing that our policy is correct: 474 423 \begin{lstlisting} 475 definition status_of_pseudo_status: internal_pseudo_address_map $\rightarrow$ 476 $\forall$ps:PseudoStatus. policy (code_memory $\ldots$ ps) $\rightarrow$ Status 424 definition status_of_pseudo_status: 425 internal_pseudo_address_map $\rightarrow$ $\forall$pap. $\forall$ps: PseudoStatus pap. 426 $\forall$sigma: Word $\rightarrow$ Word. $\forall$policy: Word $\rightarrow$ bool. 427 Status (code_memory_of_pseudo_assembly_program pap sigma policy) 477 428 \end{lstlisting} 478 429 … … 482 433 \begin{lstlisting} 483 434 definition next_internal_pseudo_address_map: internal_pseudo_address_map 484 $\rightarrow$ PseudoStatus $\rightarrow$ option internal_pseudo_address_map 485 \end{lstlisting} 435 $\rightarrow$ PseudoStatus $\rightarrow$ option internal_pseudo_address_map 436 \end{lstlisting} 437 Note, if we wished to allow `benign manipulations' of addresses, it would be this function that needs to be changed. 486 438 487 439 The function \texttt{ticks\_of} computes how long---in clock cycles---a pseudoinstruction will take to execute when expanded in accordance with a given policy. 488 440 The function returns a pair of natural numbers, needed for recording the execution times of each branch of a conditional jump. 489 441 \begin{lstlisting} 490 definitionticks_of:491 442 axiom ticks_of: 443 $\forall$p:pseudo_assembly_program. policy p $\rightarrow$ Word $\rightarrow$ nat $\times$ nat := $\ldots$ 492 444 \end{lstlisting} 493 445 494 446 Finally, we are able to state and prove our main theorem. 495 This relates the execution of a single assembly instruction and the execution of (possibly) many machine code instructions, as long . 496 That is, the assembly process preserves the semantics of an assembly program, as it is translated into machine code, as long as we are able to track memory addresses properly: 447 This relates the execution of a single assembly instruction and the execution of (possibly) many machine code instructions, as long as we are able to track memory addresses properly: 497 448 \begin{lstlisting} 498 449 theorem main_thm: 499 ∀M,M':internal_pseudo_address_map.∀ps.∀pol: policy ps. 500 next_internal_pseudo_address_map M ps = Some $\ldots$ M' → 501 ∃n. 502 execute n (status_of_pseudo_status M ps pol) 503 = status_of_pseudo_status M' 504 (execute_1_pseudo_instruction (ticks_of (code_memory $\ldots$ ps) pol) ps) 505 [pol]. 450 $\forall$M, M': internal_pseudo_address_map. 451 $\forall$program: pseudo_assembly_program. 452 let $\langle$preamble, instr_list$\rangle$ := program in 453 $\forall$is_well_labelled: is_well_labelled_p instr_list. 454 $\forall$sigma: Word $\rightarrow$ Word. 455 $\forall$policy: Word $\rightarrow$ bool. 456 $\forall$sigma_policy_specification_witness. 457 $\forall$ps: PseudoStatus program. 458 $\forall$program_counter_in_bounds. 459 next_internal_pseudo_address_map M program ps = Some $\ldots$ M' $\rightarrow$ 460 $\exists$n. execute n $\ldots$ (status_of_pseudo_status M $\ldots$ ps sigma policy) = 461 status_of_pseudo_status M' $\ldots$ (execute_1_pseudo_instruction 462 (ticks_of program sigma policy) program ps) sigma policy. 506 463 \end{lstlisting} 507 464 The statement is standard for forward simulation, but restricted to \texttt{PseudoStatuses} \texttt{ps} whose next instruction to be executed is well-behaved with respect to the \texttt{internal\_pseudo\_address\_map} \texttt{M}. 465 Further, we explicitly requires proof that our policy is correct and the pseudo program counter lies within the bounds of the program. 508 466 Theorem \texttt{main\_thm} establishes the total correctness of the assembly process and can simply be lifted to the forward simulation of an arbitrary number of well behaved steps on the assembly program. 509 467
Note: See TracChangeset
for help on using the changeset viewer.