# Changeset 1021

Timestamp: Jun 21, 2011, 12:15:04 PM
Message: tidied english in sect 3

File: 1 edited

We merely describe enough here to understand the rest of the proof. At heart, the MCS-51 emulator centres around a \texttt{Status} record, describing the current state of the microprocessor. This record contains fields corresponding to the microprocessor's program counter, special function registers, and so on. At the machine code level, code memory is implemented as a compact trie of bytes, addressed by the program counter. Machine code programs are loaded into \texttt{Status} using the \texttt{load\_code\_memory} function. We may execute a single step of a machine code program using the \texttt{execute\_1} function, which returns an updated \texttt{Status}:
\begin{lstlisting}
definition execute_1: Status $\rightarrow$ Status := $\ldots$
\end{lstlisting}
The function \texttt{execute} allows one to execute an arbitrary, but fixed (due to Matita's normalisation requirement) number of steps of a program. Naturally, assembly programs have analogues. The counterpart of the \texttt{Status} record is \texttt{PseudoStatus}. Instead of code memory being implemented as tries of bytes, code memory is here implemented as lists of pseudoinstructions, and program counters are merely indices into this list.
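For illustration only, the shape of \texttt{Status}, \texttt{load\_code\_memory}, \texttt{execute\_1} and \texttt{execute} can be sketched in Python; the field names and the trivial single-step `semantics' below are inventions of the sketch, not the Matita development:

```python
from dataclasses import dataclass, field

# Illustrative sketch only: the real development is written in Matita,
# and its execute_1 decodes and runs a genuine MCS-51 instruction.
@dataclass
class Status:
    program_counter: int = 0
    special_function_registers: dict = field(default_factory=dict)
    # Code memory maps program counters to bytes (the development uses
    # a compact trie; a dict stands in for it here).
    code_memory: dict = field(default_factory=dict)

def load_code_memory(status, code):
    """Load a machine-code program into code memory."""
    status.code_memory = dict(enumerate(code))
    return status

def execute_1(status):
    """Execute a single step, returning an updated Status (here the
    `instruction' merely advances the program counter)."""
    status.program_counter += 1
    return status

def execute(n, status):
    """Execute a fixed number n of steps, mirroring the fact that the
    Matita function takes an explicit step count."""
    for _ in range(n):
        status = execute_1(status)
    return status
```

The explicit step count in \texttt{execute} reflects that Matita, like other normalising provers, cannot run an unboundedly iterated step function.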
Both \texttt{Status} and \texttt{PseudoStatus} are specialisations of the same \texttt{PreStatus} record, parametric in the representation of code memory. This allows us to share some code that is common to both records (for instance, `setter' and `getter' functions). A further benefit of this sharing is that those instructions that are completely ambivalent about the particular representation of code memory can be factored out into their own type. Our analogue of \texttt{execute\_1} is \texttt{execute\_1\_pseudo\_instruction}. This process is iterated for each pseudoinstruction, before the lists are flattened into a single bit list representation of the original assembly program. By inspecting the above diagram, it would appear that the best way to proceed with a proof that the assembly process does not change the semantics of an assembly program is via a decomposition of the problem into two subproblems. Namely, we first expand any and all pseudoinstructions into lists of machine instructions, and provide a proof that this process does not change our program's semantics. Finally, we assemble all machine code instructions into machine code---lists of bytes---and prove once more that this process does not have an adverse effect on a program's semantics. By composition, we then have that the whole assembly process is semantics preserving.
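A minimal Python sketch of this sharing (with invented field and method names) is a record parametric in its code-memory type, from which both status records are obtained by instantiation:

```python
from dataclasses import dataclass
from typing import Generic, TypeVar

M = TypeVar("M")  # the representation of code memory

# Illustrative only: PreStatus is parametric in the code-memory
# representation, so `getter' and `setter' code is written once.
@dataclass
class PreStatus(Generic[M]):
    code_memory: M
    program_counter: int = 0

    def get_pc(self):          # shared `getter'
        return self.program_counter

    def set_pc(self, pc):      # shared `setter'
        self.program_counter = pc
        return self

# Machine level: code memory is a map from addresses to bytes.
Status = PreStatus[dict]
# Assembly level: code memory is a list of pseudoinstructions,
# and the program counter is an index into that list.
PseudoStatus = PreStatus[list]
```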
This is a tempting approach to the proof, but ultimately the wrong approach.
\begin{lstlisting}
$\ldots$
 | long_jump: jump_length.
\end{lstlisting}
A \texttt{jump\_expansion\_policy} is a map from pseudo program counters (implemented as \texttt{Word}s) to \texttt{jump\_length}s. Efficient implementations of policies are based on tries. Intuitively, a policy maps positions in a program (indexed using program counters implemented as \texttt{Word}s) to a particular variety of jump:
\begin{lstlisting}
definition policy_type := Word $\rightarrow$ jump_length.
\end{lstlisting}
Next, we require a series of `sigma' functions. These functions map assembly program counters to their machine code counterparts, establishing the correspondence between `positions' in an assembly program and `positions' in a machine code program.
\begin{lstlisting}
definition sigma_safe:
 pseudo_assembly_program $\rightarrow$ policy_type $\rightarrow$
  option (Word $\rightarrow$ Word) := $\ldots$
\end{lstlisting}
Now, it's possible to define what a `good policy' is for a program \texttt{p}. A policy \texttt{pol} is deemed good when it prevents \texttt{sigma\_safe} from failing on \texttt{p}.
\begin{lstlisting}
definition policy_ok := $\lambda$pol.$\lambda$p. sigma_safe p $\neq$ None $\ldots$
definition policy := $\lambda$p. $\Sigma$jump_expansion: policy_type.
 policy_ok jump_expansion p
\end{lstlisting}
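To make the intuition concrete, here is a hypothetical Python model of policies and \texttt{sigma\_safe}. A program is reduced to a bare list of jump targets, the expansion sizes are those of the MCS-51 jump instructions (\texttt{SJMP} and \texttt{AJMP} occupy two bytes, \texttt{LJMP} three), and a short jump is only valid when its displacement fits in a signed byte; everything else about the model is invented for the sketch:

```python
from enum import Enum

class JumpLength(Enum):
    SHORT = "short_jump"
    MEDIUM = "medium_jump"
    LONG = "long_jump"

# Bytes occupied by each expansion (MCS-51: SJMP/AJMP = 2, LJMP = 3).
SIZE = {JumpLength.SHORT: 2, JumpLength.MEDIUM: 2, JumpLength.LONG: 3}

def sigma_safe(program, policy):
    """Map pseudo program counters to machine addresses under a policy,
    or return None when a short jump cannot reach its target (the toy
    `program' is just a list of jump targets)."""
    sigma, address = {}, 0
    for ppc in range(len(program)):
        sigma[ppc] = address
        address += SIZE[policy(ppc)]
    for ppc, target in enumerate(program):
        if policy(ppc) is JumpLength.SHORT:
            displacement = sigma[target] - (sigma[ppc] + 2)
            if not -128 <= displacement <= 127:
                return None        # the policy is not good for this program
    return lambda ppc: sigma[ppc]

def policy_ok(policy, program):
    """A policy is `good' when sigma_safe does not fail on the program."""
    return sigma_safe(program, policy) is not None
```

The sketch errs on the permissive side: it does not model the 2KB-page constraint on medium (\texttt{AJMP}) jumps, and it approximates the dependent type $\Sigma$\texttt{jump\_expansion. policy\_ok \ldots} with a run-time check.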
Finally, we obtain \texttt{sigma}, a mapping from pseudo program counters to program counters that takes as input a good policy and thus never fails. Note how we avoid failure here, and in most of the remaining functions, by restricting the domain using the dependent type \texttt{policy}. Using our policies, we now work toward proving the total correctness of the assembler. By `total correctness', we mean that the assembly process never fails when provided with a good policy and that the process does not change the semantics of a certain class of well behaved assembly programs. Naturally, this necessitates keeping some sort of correspondence between addresses at the assembly level and addresses at the machine code level. For this, we use the \texttt{sigma} machinery defined at the end of Subsection~\ref{subsect.policies}. We expand pseudoinstructions using the function \texttt{expand\_pseudo\_instruction}. This takes an assembly program (consisting of a list of pseudoinstructions), a good policy for the program and a pointer to the pseudo code memory.
It returns a list of instructions, corresponding to the expanded pseudoinstruction referenced by the pointer. The policy is used to decide how to expand \texttt{Call}s, \texttt{Jmp}s and conditional jumps. The function is given a dependent type that incorporates its specification. Its pre- and post-conditions are omitted in the paper due to lack of space. We show them as an example in the next function, \texttt{build\_maps}.
\begin{lstlisting}
definition expand_pseudo_instruction:
 $\forall$program. $\forall$pol: policy program.
  $\forall$ppc: Word. $\ldots$
   $\Sigma$res. list instruction. $\ldots$ := $\ldots$
\end{lstlisting}
Or, in plainer words, assembling and then immediately fetching again gets you back to where you started. Lemma \texttt{fetch\_assembly\_pseudo} (whose type is shown here slightly simplified) is obtained by composition of \texttt{expand\_pseudo\_instruction} and \texttt{assembly\_1\_pseudoinstruction}:
\begin{lstlisting}
lemma fetch_assembly_pseudo: $\ldots$
\end{lstlisting}
Then, we check with \texttt{fetch\_many} that the number of machine instructions that were fetched matches the number of instructions that \texttt{expand\_pseudo\_instruction} expanded. The final lemma in this series is \texttt{fetch\_assembly\_pseudo2}, which combines Lemma \texttt{fetch\_assembly\_pseudo} with the correctness of the functions that load object code into the processor's memory.
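For illustration, a drastically simplified Python analogue of the expansion step, continuing the toy model of a program as a list of jump targets (the opcodes are the real \texttt{SJMP} \texttt{0x80} and \texttt{LJMP} \texttt{0x02}, but the representation and helper names are invented):

```python
# Hypothetical sketch: expand one pseudo jump into machine-code bytes.
# `sigma' maps pseudo program counters to machine addresses, and the
# policy picks the jump variety, exactly as in the prose above.
def expand_pseudo_instruction(program, policy, ppc, sigma):
    target = program[ppc]
    if policy(ppc) == "short_jump":
        rel = sigma(target) - (sigma(ppc) + 2)      # SJMP is 2 bytes
        assert -128 <= rel <= 127, "policy was not good"
        return [0x80, rel & 0xFF]                   # SJMP rel8
    addr = sigma(target)
    return [0x02, (addr >> 8) & 0xFF, addr & 0xFF]  # LJMP addr16
```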
At first, the lemma appears to nearly establish the correctness of the assembler:
\begin{lstlisting}
lemma fetch_assembly_pseudo2: $\ldots$
\end{lstlisting}
Intuitively, we may read \texttt{fetch\_assembly\_pseudo2} as follows. Suppose we are able to successfully assemble an assembly program using \texttt{assembly} and produce a code memory, \texttt{code\_memory}. Then, fetching a pseudoinstruction from the pseudo code memory at address \texttt{ppc} corresponds to fetching a sequence of instructions from the real code memory at address \texttt{sigma program pol ppc}. The fetched sequence is established as the expansion of the pseudoinstruction, according to the good policy \texttt{pol}. However, this property is \emph{not} strong enough to establish that the semantics of an assembly program has been preserved by the assembly process, since it does not establish the correspondence between the semantics of a pseudoinstruction and that of its expansion. In particular, the two semantics differ on instructions that \emph{could} directly manipulate program addresses.
% ---------------------------------------------------------------------------- %
% SECTION                                                                      %
% ---------------------------------------------------------------------------- %
\subsection{Total correctness for `well behaved' assembly programs}
\label{subsect.total.correctness.for.well.behaved.assembly.programs}

In any `reasonable' assembly language addresses in code memory are just data that can be manipulated in multiple ways by the program. An assembly program can forge, compare and move addresses around, shift existing addresses or apply logical and arithmetical operations to them. Further, every optimising assembler is allowed to modify code memory. Hence only the semantics of a few of the aforementioned operations are preserved by an optimising assembler/compiler. Moreover, this characterisation of well behaved programs is clearly undecidable. To obtain a reasonable statement of correctness for our assembler, we need to trace memory locations (and, potentially, registers) that contain memory addresses. This is necessary for two purposes. First, we must detect (at run time) programs that manipulate addresses in well behaved ways, according to some approximation of well-behavedness. Second, we must compute statuses that correspond to pseudo-statuses. The contents of the program counter must be translated, as well as the contents of all traced locations, by applying the \texttt{sigma} map. Remaining memory cells are copied \emph{verbatim}. For instance, after a function call, the two bytes that form the return pseudo address are pushed on top of the stack, i.e. in internal RAM. This pseudo internal RAM corresponds to an internal RAM where the stack holds the real addresses after optimisation, and all the other values remain untouched. We use an \texttt{internal\_pseudo\_address\_map} to trace code memory addresses stored in internal RAM.
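The translation of statuses just described can be sketched as follows; the function name, the flat-dictionary RAM and the single-cell addresses are inventions of the sketch:

```python
# Illustrative sketch: build machine-level state from pseudo-level state.
# Traced locations (those recorded by the internal_pseudo_address_map)
# hold code addresses and are translated through sigma; every other
# memory cell is copied verbatim, as is the rule for the program counter.
def status_of_pseudo_status(pseudo_ram, traced_locations, pc, sigma):
    ram = dict(pseudo_ram)             # untraced cells: verbatim copy
    for location in traced_locations:  # cells holding code addresses
        ram[location] = sigma(ram[location])
    return ram, sigma(pc)
```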
The current code is parametric on the implementation of the map itself.
\begin{lstlisting}
$\ldots$
\end{lstlisting}
The \texttt{low\_internal\_ram\_of\_pseudo\_low\_internal\_ram} function converts the lower internal RAM of a \texttt{PseudoStatus} into the lower internal RAM of a \texttt{Status}. A similar function exists for higher internal RAM. Note that both RAM segments are indexed using 7-bit addresses. The function is currently axiomatised, and an associated set of axioms prescribe the behaviour of the function:
\begin{lstlisting}
axiom low_internal_ram_of_pseudo_low_internal_ram: $\ldots$
\end{lstlisting}
The \texttt{next\_internal\_pseudo\_address\_map} function is responsible for run time monitoring of the behaviour of assembly programs, in order to detect well behaved ones. It returns the new map that traces memory addresses in internal RAM after execution of the next pseudoinstruction. It fails when the instruction tampers with memory addresses in unanticipated (but potentially correct) ways. It thus decides membership of a correct but not complete subset of well behaved programs.
\begin{lstlisting}
definition next_internal_pseudo_address_map:
 internal_pseudo_address_map [pol].
\end{lstlisting}
The statement is standard for forward simulation, but restricted to \texttt{PseudoStatuses} \texttt{ps} whose next instruction to be executed is well-behaved with respect to the \texttt{internal\_pseudo\_address\_map} \texttt{M}. Theorem \texttt{main\_thm} establishes the total correctness of the assembly process and can simply be lifted to the forward simulation of an arbitrary number of well behaved steps on the assembly program.
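To give the flavour of the run-time monitoring performed by \texttt{next\_internal\_pseudo\_address\_map}, here is a hypothetical sketch; the instruction forms and the set-based map are invented, whereas the real function inspects actual pseudoinstructions:

```python
# Hypothetical sketch of run-time address tracing: `traced' is the set of
# internal RAM cells currently holding code addresses. The monitor returns
# the updated set after one pseudoinstruction, or None when the program
# manipulates a traced cell in an unanticipated (though possibly correct)
# way; it therefore accepts a correct but incomplete subset of the well
# behaved programs.
def next_internal_pseudo_address_map(traced, instr):
    op = instr[0]
    if op == "call":           # a return address is pushed into a stack cell
        return traced | {instr[1]}
    if op == "ret":            # the return address is popped again
        return traced - {instr[1]}
    if op == "add" and instr[1] in traced:
        return None            # arithmetic on a code address: reject
    return traced              # anything else leaves the map unchanged
```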