Changeset 1019 for src/ASM/CPP2011
- Timestamp:
- Jun 21, 2011, 11:39:12 AM (10 years ago)
- File:
-
- 1 edited
Legend:
- Unmodified
- Added
- Removed
-
src/ASM/CPP2011/cpp-2011.tex
r1018 r1019 331 331 % SECTION % 332 332 % ---------------------------------------------------------------------------- % 333 \subsection{ Total correctness of the assembler}333 \subsection{Correctness of the assembler with respect to fetching} 334 334 \label{subsect.total.correctness.of.the.assembler} 335 335 336 336 Using our policies, we now work toward proving the total correctness of the assembler. 337 By total correctness, we mean that the assembly process does not change the semantics of an assembly program. 337 By total correctness, we mean that the assembly process never fails when 338 provided with a good strategy and that it does not change the semantics of 339 a certain class of well behaved assembly program. 338 340 Naturally, this necessitates keeping some sort of correspondence between addresses at the assembly level and addresses at the machine code level. 339 341 For this, we use the \texttt{sigma} machinery defined at the end of Subsection~\ref{subsect.policies}. 340 342 341 We expand pseudoinstructions using the function \texttt{expand\_pseudo\_instruction}. 342 This function accepts a `policy decision'---an element of type \texttt{jump\_length}---that is used when expanding a \texttt{Call}, \texttt{Jmp} or conditional jump to a label into the correct machine instruction. 343 This \texttt{policy\_decision} is asssumed to originate from a policy as defined in Subsection~\ref{subsect.policies}. 343 We expand pseudoinstructions using the function \texttt{expand\_pseudo\_instruction} that takes a program made of pseudo-instructions, a good policy for 344 the program and a pointer to the pseudo memory. It returns the list of 345 instructions that correspond to the pseudo-instruction referenced by the 346 pointer. The policy is used to decide how to expand 347 \texttt{Call}s, \texttt{Jmp}s and conditional jumps. The function is given 348 a dependent type that incorporates its specification. The pre and 349 post-conditions are omitted in the paper due to lack of space. We show them 350 as an example for the next function, \texttt{build\_maps}. 344 351 \begin{lstlisting} 345 352 definition expand_pseudo_instruction: 346 ∀ lookup_labels, lookup_datalabels, pc, policy_decision.347 pseudo_instruction $\rightarrow$ list instruction:= $\ldots$353 ∀program.∀pol:policy program.∀ppc:Word. ... $\Sigma$res.list instruction. ... 354 := $\ldots$ 348 355 \end{lstlisting} 349 356 … … 363 370 \end{lstlisting} 364 371 The rather complex type of \texttt{build\_maps} owes to our use of Matita's Russell facility to provide a strong specification for the function in the type (c.f. the use of sigma types, through which Russell is implemented in Matita). 365 In particular, we express that any label should only appear exactly in any assembly program, and looking up a label in the newly created map is the same as applying the \texttt{sigma} function, recording the correspondence between pseudo program counters and program counters. 372 In particular, we express that for all labels that appear exactly once in any assembly program, the newly created map used in the implementation and the 373 stronger \texttt{sigma} function used in the specification agree. 366 374 367 375 Using \texttt{build\_maps}, we can express the following lemma, expressing the correctness of the assembly function: … … 410 418 Or, in plainer words, assembling and then immediately fetching again gets you back to where you started. 411 419 412 Lemma \texttt{fetch\_assembly\_pseudo} establishes a basic relationship between \texttt{expand\_pseudo\_instruction} and \texttt{assembly\_1\_pseudoinstruction}: 413 \begin{lstlisting} 414 lemma fetch_assembly_pseudo: $\forall$p. $\forall$pol: policy p. $\forall$ppc, lk_lbl, lk_dlbl. 415 $\forall$pi, code_memory, len, assembled, instructions, pc. 416 let exp := pol ppc in 417 let expand := expand_pseudo_instruction lk_lbls lk_dlbl pc exp pi in 418 let ass := assembly_1_pseudoinstruction p pol ppc pc lk_lbl lk_dlbl pi in 419 Some ? instructions = expand $\rightarrow$ 420 Some $\ldots$ $\langle$len, assembled$\rangle$ = ass $\rightarrow$ 421 encoding_check code_memory pc (pc + len) assembled $\rightarrow$ 422 fetch_many code_memory (pc + len) pc instructions. 420 Lemma \texttt{fetch\_assembly\_pseudo} (whose type is shown here slightly 421 simplified) is obtained by composition of 422 \texttt{expand\_pseudo\_instruction} and 423 \texttt{assembly\_1\_pseudoinstruction}: 424 \begin{lstlisting} 425 lemma fetch_assembly_pseudo: 426 ∀program.∀pol:policy program.∀ppc.∀code_memory. 427 let pi := fst (fetch_pseudo_instruction (snd program) ppc) in 428 let instructions := expand_pseudo_instruction program pol ppc ... in 429 let $\langle$len,assembled$\rangle$ := assembly_1_pseudoinstruction program pol ppc ... in 430 encoding_check code_memory pc (pc + len) assembled → 431 fetch_many code_memory (pc + len) pc instructions. 423 432 \end{lstlisting} 424 433 Here, \texttt{len} is the number of machine code instructions the pseudoinstruction at hand has been expanded into, and \texttt{encoding\_check} is a recursive function that checks that assembled machine code is correctly stored in code memory. … … 427 436 428 437 Intuitively, Lemma \texttt{fetch\_assembly\_pseudo} can be read as follows. 429 Suppose our policy \texttt{jump\_expansion} dictates that the pseudoinstruction indexed by the pseudo program counter \texttt{ppc} in assembly program \texttt{program} gives us the policy decision \texttt{ jexp}.430 Further, suppose we expand the pseudoinstruction at \texttt{ppc} with the policy decision \texttt{ jexp}, obtaining an (optional) list of machine code instructions \texttt{exp}.431 Suppose we also assemble the pseudoinstruction at \texttt{ppc} to obtain \texttt{ass }, a list of bytes.438 Suppose our policy \texttt{jump\_expansion} dictates that the pseudoinstruction indexed by the pseudo program counter \texttt{ppc} in assembly program \texttt{program} gives us the policy decision \texttt{pol}. 439 Further, suppose we expand the pseudoinstruction at \texttt{ppc} with the policy decision \texttt{pol}, obtaining the list of machine code instructions \texttt{instructions}. 440 Suppose we also assemble the pseudoinstruction at \texttt{ppc} to obtain \texttt{assembled}, a list of bytes. 432 441 Then, we check with \texttt{fetch\_many} that the number of machine instructions that were fetched matches the number of instruction that \texttt{expand\_pseudo\_instruction} expanded. 433 442 434 At first sight, Lemma \texttt{fetch\_assembly\_pseudo2} appears to nearly establish the correctness of the assembler: 435 \begin{lstlisting} 436 lemma fetch_assembly_pseudo2: $\forall$p. $\forall$pol: policy p. $\forall$assembled, costs, labels. 437 Some $\ldots$ $\langle$labels, costs$\rangle$ = build_maps program $\rightarrow$ 438 Some $\ldots$ $\langle$assembled, costs$\rangle$ = assembly program $\rightarrow$ $\forall$ppc. 443 The final lemma in this series is 444 \texttt{fetch\_assembly\_pseudo2} that combines the previous one with the 445 correctness of the operations that load the object code in the processor 446 memory. At first, the lemmas appears to nearly establish the correctness of 447 the assembler: 448 \begin{lstlisting} 449 lemma fetch_assembly_pseudo2: 450 ∀program,pol,ppc. 451 let $\langle$labels,costs$\rangle$ := build_maps program pol in 452 let assembled := \fst (assembly program pol) in 439 453 let code_memory := load_code_memory assembled in 440 let preamble := $\pi_1$ program in 441 let data_labels := construct_datalabels preamble in 442 let lk_labels := 443 λx. sigma program (address_of_word_labels_code_mem ($\pi_2$ program) x) in 444 let lk_dlabels := λx. lookup ? ? x data_labels (zero ?) in 445 let expansion := jump_expansion ppc program in 446 let $\langle$pi, newppc$\rangle$ := fetch_pseudo_instruction ($\pi_2$ program) ppc in 447 let ppc' := sigma program ppc in 448 let newppc' := sigma program newppc in 449 let instructions' := 450 expand_pseudo_instruction lk_labels lk_dlabels ppc' expansion pi in 451 let fetched := $\lambda$instr. fetch_many code_memory newppc' ppc' instr in 452 $\exists$instrs. Some ? instrs = instructions' $\wedge$ fetched instrs. 454 let data_labels := construct_datalabels (\fst program) in 455 let lookup_labels := 456 λx. sigma $\ldots$ pol (address_of_word_labels_code_mem (\snd program) x) in 457 let lookup_datalabels := λx. lookup ? ? x data_labels (zero ?) in 458 let $\langle$pi,newppc$\rangle$ := fetch_pseudo_instruction (\snd program) ppc in 459 let instructions ≝ expand_pseudo_instruction program pol ppc ... in 460 fetch_many code_memory (sigma program pol newppc) 461 (sigma program pol ppc) instructions. 453 462 \end{lstlisting} 454 463 455 464 Intuitively, we may read \texttt{fetch\_assembly\_pseudo2} as follows. 456 Suppose we are able to successfully assemble an assembly program using \texttt{assembly} and produce a code memory, \texttt{code\_memory}. 457 Then there exists some list of machine instructions equal to the expansion of a pseudoinstruction and the number of machine instructions that need to be fetched is equal to the number of machine instructions that the pseudoinstruction was expanded into. 458 459 However, this property is \emph{not} strong enough to establish that the semantics of an assembly program has been preserved by the assembly process. 460 In particular, \texttt{fetch\_assembly\_pseudo2} says nothing about how memory addresses evolve during assembly. 461 Memory addresses in one memory space may be mapped to memory addresses in a completely different memory space during assembly. 462 To handle this problem, we need some more machinery in order to track how memory addresses are moved around, and to record a correspondence between addresses at the pseudo assembly level and the machine code level. 463 464 We use an \texttt{internal\_pseudo\_address\_map} for this purpose. 465 An \texttt{internal\_pseudo\_address\_map} associates memory addresses in the lower internal RAM of a \texttt{PseudoStatus} with a physical memory address at the machine code level: 466 \begin{lstlisting} 467 definition internal_pseudo_address_map := list (BitVector 8). 468 \end{lstlisting} 469 470 We use a map associating memory addresses, of type \texttt{internal\_pseudo\_address\_map}, to convert the lower internal RAM of a \texttt{PseudoStatus} into the lower internal RAM of a \texttt{Status}. 471 The actual conversion process is performed by \texttt{low\_internal\_ram\_of\_pseudo\_low\_internal\_ram}:\footnote{An associated set of axioms describe how \texttt{low\_internal\_ram\_of\_pseudo\_low\_internal\_ram} behaves. This is a form of parametricity. We don't care about the particulars of the conversion functions, as long as they behave in accordance with our axioms.} 465 Suppose we are able to successfully assemble an assembly program using \texttt{assembly} and produce a code memory, \texttt{code\_memory}. The, fetching 466 a pseudo-instruction from the pseudo-code memory at address \texttt{ppc} 467 corresponds to fetching a sequence of instructions from the real code memory 468 at address \texttt{sigma program pol ppc}. The fetched sequence is granted 469 to be the expansion of the pseudo-instruction, according to the good policy 470 \texttt{pol}. 471 472 However, this property is \emph{not} strong enough to establish that the semantics of an assembly program has been preserved by the assembly process since it 473 does not establish the correspondence between the semantics of a 474 pseudo-instruction and that of its expansion. In particular, the two semantics 475 differ on instructions that \emph{could} directly manipulate program 476 addresses. 477 478 \subsection{Total correctness for well-behaved assembly programs} 479 In any reasonable assembly language addresses in code memory are just data that 480 can be manipulated in multiple ways by the program. An assembly program can 481 forge constant addresses, compare addresses, move addresses around, 482 shift existent addresses or apply logical and arithmetical operations 483 to them. Every optimizing assembler is allowed to modify the code memory. 484 Hence only the semantics of a few of the former operations is preserved by 485 an optimizing compiler. Moreover, the problem of characterizing well behaved 486 programs is clearly undecidable. 487 488 To obtain a reasonable statement of correctness for our compiler, we need to 489 trace the memory locations (and, potentially, registers) that contain memory 490 addresses. This is necessary for two purposes. The first one is to detect 491 (at run time) misbehaved programs that only manipulate addresses in well 492 behaved ways, according to some safe approximation of well behavedness. 493 The second one is to compute statuses that correspond to pseudo-statues: 494 not only the content of the program counter, but also the content of all 495 traced locations must be translated applying the \texttt{sigma} map; all 496 the remaining memory cells are copied verbatim. 497 498 For instance, after a function call, the two bytes that form the return 499 pseudo-address are pushed on top of the stack, i.e. in internal RAM. 500 This pseudo-internal RAM corresponds to an internal RAM where the stack 501 holds the real addresses after optimization, and all the other values untouched. 502 503 We use an \texttt{internal\_pseudo\_address\_map} 504 to trace addresses of code memory addresses in internal RAM. 505 The current code is parametric on the implementation of the map itself. 506 \begin{lstlisting} 507 axiom internal_pseudo_address_map: Type[0]. 508 \end{lstlisting} 509 510 The \texttt{low\_internal\_ram\_of\_pseudo\_low\_internal\_ram} function 511 converts the lower internal RAM of a \texttt{PseudoStatus} into the 512 lower internal RAM of a \texttt{Status}. A similar one exists for higher 513 internal RAM. Note that both RAM segments are index using addresses 7-bits 514 long. The function is currently axiomatized and an associated 515 set of axioms prescribe the behaviour of the function. 472 516 \begin{lstlisting} 473 517 axiom low_internal_ram_of_pseudo_low_internal_ram: 474 518 internal_pseudo_address_map $\rightarrow$ BitVectorTrie Byte 7 $\rightarrow$ BitVectorTrie Byte 7. 475 519 \end{lstlisting} 476 A similar axiom exists for high internal RAM.477 478 Notice, the MCS-51's internal RAM is addressed with a 7-bit `byte'.479 % dpm: ugly English, fix480 The whole of the internal RAM space is addressed with bytes: the first bit is used to distinguish between the programmer addressing low and high internal memory.481 520 482 521 Next, we are able to translate \texttt{PseudoStatus} records into \texttt{Status} records using \texttt{status\_of\_pseudo\_status}. 483 522 Translating a \texttt{PseudoStatus}'s code memory requires we expand pseudoinstructions and then assemble to obtain a trie of bytes. 484 This should never fail, providing that our policy is correct:523 This never fails, providing that our policy is correct: 485 524 \begin{lstlisting} 486 525 definition status_of_pseudo_status: internal_pseudo_address_map $\rightarrow$ … … 488 527 \end{lstlisting} 489 528 490 After fetching an assembly instruction we must update any \texttt{internal\_pseudo\hyp{}\_address\_map}s that may be laying around. 491 This is done with the following function: 529 The \texttt{next\_internal\_pseudo\_address\_map} function is responsible for 530 monitoring at run time the behaviour of pseudo-assembly programs to detect 531 the well behaved ones. It returns the new map that traces memory addresses 532 in internal RAM after execution of the next pseudo-instruction. It fails when 533 the instruction tampers with memory addresses in unticipated (but potentially 534 correct) ways. It thus decides membership to a correct but not complete 535 subset of well behaved programs. 492 536 \begin{lstlisting} 493 537 definition next_internal_pseudo_address_map: internal_pseudo_address_map … … 507 551 \begin{lstlisting} 508 552 theorem main_thm: 509 ∀M,M',ps,s,s''.510 next_internal_pseudo_address_map M ps = Some $\ldots$ M' $\rightarrow$511 status_of_pseudo_status M ps = Some $\ldots$ s $\rightarrow$512 status_of_pseudo_status M'513 (execute_1_pseudo_instruction514 (ticks_of (code_memory $\ldots$ ps)) ps) = Some $\ldots$ s'' $\rightarrow$515 $\exists$n. execute n s = s''.516 \end{lstlisting} 517 The statement can be given an intuitive reading as follows.518 Suppose our \texttt{PseudoStatus}, \texttt{ps}, can be successfully converted into a \texttt{Status}, \texttt{s}. 519 Suppose further that, after executing a single assembly instruction and converting the resulting \texttt{PseudoStatus} into a \texttt{Status}, making sure to map memory addresses at the pseudo assembly level to memory addresses at the machine level correctly, we obtain \texttt{s''}, being careful to track the number of ticks executed with \texttt{ticks\_of}.520 Then, there exists some number \texttt{n}, so that executing \texttt{n} machine code instructions in \texttt{Status} \texttt{s} gives us \texttt{Status} \texttt{s''}. 521 522 Theorem \texttt{main\_thm} establishes the correctness of the assembly process.553 ∀M,M':internal_pseudo_address_map.∀ps.∀pol: policy ps. 554 next_internal_pseudo_address_map M ps = Some $\ldots$ M' → 555 ∃n. 556 execute n (status_of_pseudo_status M ps pol) 557 = status_of_pseudo_status M' 558 (execute_1_pseudo_instruction (ticks_of (code_memory $\ldots$ ps) pol) ps) 559 [pol]. 560 \end{lstlisting} 561 The statement is standard for forward simulation, but restricted to 562 \texttt{PseudoStatuses} \texttt{ps} whose next instruction to be executed is 563 well-behaved with respect to the \texttt{internal\_pseudo\_address\_map} \texttt{M}. 564 565 Theorem \texttt{main\_thm} establishes the total correctness of the assembly process and can simply be lifted to the forward simulation of an arbitrary number 566 of well behaved steps on the assembly program. 523 567 524 568 % ---------------------------------------------------------------------------- %
Note: See TracChangeset
for help on using the changeset viewer.