Changeset 1021 for src/ASM/CPP2011


Ignore:
Timestamp:
Jun 21, 2011, 12:15:04 PM (8 years ago)
Author:
mulligan
Message:

tidied english in sect 3

File:
1 edited

Legend:

Unmodified
Added
Removed
  • src/ASM/CPP2011/cpp-2011.tex

    r1020 r1021  
    244244We merely describe enough here to understand the rest of the proof.
    245245
    246 At heart, the MCS-51 emulator centres around a \texttt{Status} record, describing the current state of the microprocessor.
     246The MCS-51 emulator centres around a \texttt{Status} record, describing the current state of the microprocessor.
    247247This record contains fields corresponding to the microprocessor's program counter, special function registers, and so on.
    248 At the machine code level, code memory is implemented as a trie of bytes, addressed by the program counter.
     248At the machine code level, code memory is implemented as a compact trie of bytes, addressed by the program counter.
    249249Machine code programs are loaded into \texttt{Status} using the \texttt{load\_code\_memory} function.
    250250
    251 We may execut a single step of a machine code program using the \texttt{execute\_1} function, which returns an updated \texttt{Status}:
     251We may execute a single step of a machine code program using the \texttt{execute\_1} function, which returns an updated \texttt{Status}:
    252252\begin{lstlisting}
    253253definition execute_1: Status $\rightarrow$ Status := $\ldots$
    254254\end{lstlisting}
    255 The function \texttt{execute} allows one to execute an arbitrary, but fixed (due to Matita's normalisation requirement!) number of steps of a program.
     255The function \texttt{execute} allows one to execute an arbitrary, but fixed (due to Matita's normalisation requirement) number of steps of a program.
    256256
    257257Naturally, assembly programs have analogues.
    258258The counterpart of the \texttt{Status} record is \texttt{PseudoStatus}.
    259259Instead of code memory being implemented as tries of bytes, code memory is here implemented as lists of pseudoinstructions, and program counters are merely indices into this list.
    260 In actual fact, both \texttt{Status} and \texttt{PseudoStatus} are both specialisations of the same \texttt{PreStatus} record, parametric in the representation of code memory.
     260Both \texttt{Status} and \texttt{PseudoStatus} are specialisations of the same \texttt{PreStatus} record, parametric in the representation of code memory.
    261261This allows us to share some code that is common to both records (for instance, `setter' and `getter' functions).
    262 A further benefit of this sharing is that those instructions that are completely ambivalent about the particular representation of code memory can be factored out into their own type.
    263262
    264263Our analogue of \texttt{execute\_1} is \texttt{execute\_1\_pseudo\_instruction}:
     
    285284This process is iterated for each pseudoinstruction, before the lists are flattened into a single bit list representation of the original assembly program.
    286285
    287 By inspecting the above diagram, it would appear that the best way to proceed with a proof that the assembly process does not change the semantics of an assembly program is via a decomposition of the problem into two subproblems.
    288 Namely, we first expand any and all pseudoinstructions into lists of machine instructions, and provide a proof that this process does not change our program's semantics.
    289 Finally, we assemble all machine code instructions into machine code---lists of bytes---and prove once more that this process does not have an adverse effect on a program's semantics.
    290 By composition, we then have that the whole assembly process is semantics preserving.
     286%By inspecting the above diagram, it would appear that the best way to proceed with a proof that the assembly process does not change the semantics of an assembly program is via a decomposition of the problem into two subproblems.
     287%Namely, we first expand any and all pseudoinstructions into lists of machine instructions, and provide a proof that this process does not change our program's semantics.
     288%Finally, we assemble all machine code instructions into machine code---lists of bytes---and prove once more that this process does not have an adverse effect on a program's semantics.
     289%By composition, we then have that the whole assembly process is semantics preserving.
    291290
    292291%This is a tempting approach to the proof, but ultimately the wrong approach.
     
    310309  | long_jump: jump_length.
    311310\end{lstlisting}
    312 A \texttt{jump\_expansion\_policy} is a map from pseudo program counters (implemented as \texttt{Word}s) to \texttt{jump\_length}s. Efficient implementations
    313 of policies are based on tries.
    314 Intuitively, a policy maps positions in a program (indexed using program counters implemented as \texttt{Word}s) to a particular variety of jump.
    315 \begin{lstlisting}
    316 definition policy_type ≝ Word → jump_length.
    317 \end{lstlisting}
     311
     312A \texttt{jump\_expansion\_policy} is a map from pseudo program counters (implemented as \texttt{Word}s) to \texttt{jump\_length}s.
     313Efficient implementations of policies are based on tries.
     314Intuitively, a policy maps positions in a program (indexed using program counters implemented as \texttt{Word}s) to a particular variety of jump:
     315\begin{lstlisting}
     316definition policy_type := Word $\rightarrow$ jump_length.
     317\end{lstlisting}
     318
    318319Next, we require a series of `sigma' functions.
    319320These functions map assembly program counters to their machine code counterparts, establishing the correspondence between `positions' in an assembly program and `positions' in a machine code program.
     
    332333  pseudo_assembly_program $\rightarrow$ policy_type $\rightarrow$ option (Word $\rightarrow$ Word) := $\ldots$
    333334\end{lstlisting}
     335
    334336Now, it's possible to define what a `good policy' is for a program \texttt{p}.
    335337A policy \texttt{pol} is deemed good when it prevents \texttt{sigma\_safe} from failing on \texttt{p}.
     
    338340\begin{lstlisting}
    339341definition policy_ok := $\lambda$pol.$\lambda$p. sigma_safe p $\neq$ None $\ldots$
    340 definition policy := $\lambda$p. $\Sigma$jump_expansion: policy_type. policy_ok jump_expansion p
    341 \end{lstlisting}
     342definition policy :=
     343  $\lambda$p. $\Sigma$jump_expansion: policy_type. policy_ok jump_expansion p
     344\end{lstlisting}
     345
    342346Finally, we obtain \texttt{sigma}, a mapping from pseudo program counters to program counters that takes in input a good policy and thus never fails.
    343347Note how we avoid failure here, and in most of the remaining functions, by restricting the domain using the dependent type \texttt{policy}:
     
    353357
    354358Using our policies, we now work toward proving the total correctness of the assembler.
    355 By total correctness, we mean that the assembly process never fails when
    356 provided with a good strategy and that it does not change the semantics of
    357 a certain class of well behaved assembly program.
     359By `total correctness', we mean that the assembly process never fails when provided with a good policy and that the process does not change the semantics of a certain class of well behaved assembly programs.
    358360Naturally, this necessitates keeping some sort of correspondence between addresses at the assembly level and addresses at the machine code level.
    359361For this, we use the \texttt{sigma} machinery defined at the end of Subsection~\ref{subsect.policies}.
    360362
    361 We expand pseudoinstructions using the function \texttt{expand\_pseudo\_instruction} that takes a program made of pseudo-instructions, a good policy for
    362 the program and a pointer to the pseudo memory. It returns the list of
    363 instructions that correspond to the pseudo-instruction referenced by the
    364 pointer. The policy is used to decide how to expand
    365 \texttt{Call}s, \texttt{Jmp}s and conditional jumps. The function is given
    366 a dependent type that incorporates its specification. The pre and
    367 post-conditions are omitted in the paper due to lack of space. We show them
    368 as an example for the next function, \texttt{build\_maps}.
     363We expand pseudoinstructions using the function \texttt{expand\_pseudo\_instruction}.
     364This takes an assembly program (consisting of a list of pseudoinstructions), a good policy for the program and a pointer to the pseudo code memory.
     365It returns a list of instructions, corresponding to the expanded pseudoinstruction referenced by the pointer.
     366The policy is used to decide how to expand \texttt{Call}s, \texttt{Jmp}s and conditional jumps.
     367The function is given a dependent type that incorporates its specification.
     368Its pre- and post-conditions are omitted in the paper due to lack of space.
     369We show them as an example in the next function, \texttt{build\_maps}.
    369370\begin{lstlisting}
    370371definition expand_pseudo_instruction:
    371   ∀program.∀pol:policy program.∀ppc:Word. ... $\Sigma$res.list instruction. ...
    372   := $\ldots$
     372  $\forall$program. $\forall$pol: policy program.
     373  $\forall$ppc:Word. $\ldots$ $\Sigma$res. list instruction. $\ldots$ := $\ldots$
    373374\end{lstlisting}
    374375
     
    436437Or, in plainer words, assembling and then immediately fetching again gets you back to where you started.
    437438
    438 Lemma \texttt{fetch\_assembly\_pseudo} (whose type is shown here slightly
    439 simplified) is obtained by composition of
    440 \texttt{expand\_pseudo\_instruction} and
    441 \texttt{assembly\_1\_pseudoinstruction}:
     439Lemma \texttt{fetch\_assembly\_pseudo} (whose type is shown here slightly simplified) is obtained by composition of \texttt{expand\_pseudo\_instruction} and \texttt{assembly\_1\_pseudoinstruction}:
    442440\begin{lstlisting}
    443441lemma fetch_assembly_pseudo:
     
    459457Then, we check with \texttt{fetch\_many} that the number of machine instructions that were fetched matches the number of instruction that \texttt{expand\_pseudo\_instruction} expanded.
    460458
    461 The final lemma in this series is
    462 \texttt{fetch\_assembly\_pseudo2} that combines the previous one with the
    463 correctness of the operations that load the object code in the processor
    464 memory. At first, the lemmas appears to nearly establish the correctness of
    465 the assembler:
     459The final lemma in this series is \texttt{fetch\_assembly\_pseudo2} that combines the Lemma \texttt{fetch\_aasembly\_pseudo} with the correctness of the functions that load object code into the processor's memory.
     460At first, the lemmas appears to nearly establish the correctness of the assembler:
    466461\begin{lstlisting}
    467462lemma fetch_assembly_pseudo2:
     
    481476
    482477Intuitively, we may read \texttt{fetch\_assembly\_pseudo2} as follows.
    483 Suppose we are able to successfully assemble an assembly program using \texttt{assembly} and produce a code memory, \texttt{code\_memory}. The, fetching
    484 a pseudo-instruction from the pseudo-code memory at address \texttt{ppc}
    485 corresponds to fetching a sequence of instructions from the real code memory
    486 at address \texttt{sigma program pol ppc}. The fetched sequence is granted
    487 to be the expansion of the pseudo-instruction, according to the good policy
    488 \texttt{pol}.
    489 
    490 However, this property is \emph{not} strong enough to establish that the semantics of an assembly program has been preserved by the assembly process since it
    491 does not establish the correspondence between the semantics of a
    492 pseudo-instruction and that of its expansion. In particular, the two semantics
    493 differ on instructions that \emph{could} directly manipulate program
    494 addresses.
    495 
    496 \subsection{Total correctness for well-behaved assembly programs}
    497 In any reasonable assembly language addresses in code memory are just data that
    498 can be manipulated in multiple ways by the program. An assembly program can
    499 forge constant addresses, compare addresses, move addresses around,
    500 shift existent addresses or apply logical and arithmetical operations
    501 to them. Every optimizing assembler is allowed to modify the code memory.
    502 Hence only the semantics of a few of the former operations is preserved by
    503 an optimizing compiler. Moreover, the problem of characterizing well behaved
    504 programs is clearly undecidable.
    505 
    506 To obtain a reasonable statement of correctness for our compiler, we need to
    507 trace the memory locations (and, potentially, registers) that contain memory
    508 addresses. This is necessary for two purposes. The first one is to detect
    509 (at run time) misbehaved programs that only manipulate addresses in well
    510 behaved ways, according to some safe approximation of well behavedness.
    511 The second one is to compute statuses that correspond to pseudo-statues:
    512 not only the content of the program counter, but also the content of all
    513 traced locations must be translated applying the \texttt{sigma} map; all
    514 the remaining memory cells are copied verbatim.
    515 
    516 For instance, after a function call, the two bytes that form the return
    517 pseudo-address are pushed on top of the stack, i.e. in internal RAM.
    518 This pseudo-internal RAM corresponds to an internal RAM where the stack
    519 holds the real addresses after optimization, and all the other values untouched.
    520 
    521 We use an \texttt{internal\_pseudo\_address\_map}
    522 to trace addresses of code memory addresses in internal RAM.
     478Suppose we are able to successfully assemble an assembly program using \texttt{assembly} and produce a code memory, \texttt{code\_memory}.
     479Then, fetching a pseudoinstruction from the pseudo code memory at address \texttt{ppc} corresponds to fetching a sequence of instructions from the real code memory at address \texttt{sigma program pol ppc}.
     480The fetched sequence is established as the expansion of the pseudoinstruction, according to the good policy \texttt{pol}.
     481
     482However, this property is \emph{not} strong enough to establish that the semantics of an assembly program has been preserved by the assembly process since it does not establish the correspondence between the semantics of a pseudo-instruction and that of its expansion.
     483In particular, the two semantics differ on instructions that \emph{could} directly manipulate program addresses.
     484
     485% ---------------------------------------------------------------------------- %
     486% SECTION                                                                      %
     487% ---------------------------------------------------------------------------- %
     488\subsection{Total correctness for `well behaved' assembly programs}
     489\label{subsect.total.correctness.for.well.behaved.assembly.programs}
     490
     491In any `reasonable' assembly language addresses in code memory are just data that can be manipulated in multiple ways by the program.
     492An assembly program can forge, compare and move addresses around, shift existing addresses or apply logical and arithmetical operations to them.
     493Further, every optimising assembler is allowed to modify code memory.
     494Hence only the semantics of a few of the aforementioned operations are preserved by an optimising assembler/compiler.
     495Moreover, this characterisation of well behaved programs is clearly undecidable.
     496
     497To obtain a reasonable statement of correctness for our assembler, we need to trace memory locations (and, potentially, registers) that contain memory addresses.
     498This is necessary for two purposes.
     499
     500First we must detect (at run time) programs that manipulate addresses in well behaved ways, according to some approximation of well-behavedness.
     501Second, we must compute statuses that correspond to pseudo-statuses.
     502The contents of the program counter must be translated, as well as the contents of all traced locations, by applying the \texttt{sigma} map.
     503Remaining memory cells are copied \emph{verbatim}.
     504
     505For instance, after a function call, the two bytes that form the return pseudo address are pushed on top of the stack, i.e. in internal RAM.
     506This pseudo internal RAM corresponds to an internal RAM where the stack holds the real addresses after optimisation, and all the other values remain untouched.
     507
     508We use an \texttt{internal\_pseudo\_address\_map} to trace addresses of code memory addresses in internal RAM.
    523509The current code is parametric on the implementation of the map itself.
    524510\begin{lstlisting}
     
    526512\end{lstlisting}
    527513
    528 The \texttt{low\_internal\_ram\_of\_pseudo\_low\_internal\_ram} function
    529 converts the lower internal RAM of a \texttt{PseudoStatus} into the
    530 lower internal RAM of a \texttt{Status}. A similar one exists for higher
    531 internal RAM. Note that both RAM segments are index using addresses 7-bits
    532 long. The function is currently axiomatized and an associated
    533 set of axioms prescribe the behaviour of the function.
     514The \texttt{low\_internal\_ram\_of\_pseudo\_low\_internal\_ram} function converts the lower internal RAM of a \texttt{PseudoStatus} into the lower internal RAM of a \texttt{Status}.
     515A similar function exists for higher internal RAM.
     516Note that both RAM segments are indexed using addresses 7-bits long.
     517The function is currently axiomatised, and an associated set of axioms prescribe the behaviour of the function:
    534518\begin{lstlisting}
    535519axiom low_internal_ram_of_pseudo_low_internal_ram:
     
    545529\end{lstlisting}
    546530
    547 The \texttt{next\_internal\_pseudo\_address\_map} function is responsible for
    548 monitoring at run time the behaviour of pseudo-assembly programs to detect
    549 the well behaved ones. It returns the new map that traces memory addresses
    550 in internal RAM after execution of the next pseudo-instruction. It fails when
    551 the instruction tampers with memory addresses in unticipated (but potentially
    552 correct) ways. It thus decides membership to a correct but not complete
    553 subset of well behaved programs.
     531The \texttt{next\_internal\_pseudo\_address\_map} function is responsible for run time monitoring of the behaviour of assembly programs, in order to detect well behaved ones.
     532It returns the new map that traces memory addresses in internal RAM after execution of the next pseudoinstruction.
     533It fails when the instruction tampers with memory addresses in unanticipated (but potentially correct) ways.
     534It thus decides the membership of a correct but not complete subset of well behaved programs.
    554535\begin{lstlisting}
    555536definition next_internal_pseudo_address_map: internal_pseudo_address_map
     
    577558       [pol].
    578559\end{lstlisting}
    579 The statement is standard for forward simulation, but restricted to
    580 \texttt{PseudoStatuses} \texttt{ps} whose next instruction to be executed is
    581 well-behaved with respect to the \texttt{internal\_pseudo\_address\_map} \texttt{M}.
    582 
    583 Theorem \texttt{main\_thm} establishes the total correctness of the assembly process and can simply be lifted to the forward simulation of an arbitrary number
    584 of well behaved steps on the assembly program.
     560The statement is standard for forward simulation, but restricted to \texttt{PseudoStatuses} \texttt{ps} whose next instruction to be executed is well-behaved with respect to the \texttt{internal\_pseudo\_address\_map} \texttt{M}.
     561Theorem \texttt{main\_thm} establishes the total correctness of the assembly process and can simply be lifted to the forward simulation of an arbitrary number of well behaved steps on the assembly program.
    585562
    586563% ---------------------------------------------------------------------------- %
Note: See TracChangeset for help on using the changeset viewer.