Changeset 1019


Ignore:
Timestamp:
Jun 21, 2011, 11:39:12 AM (8 years ago)
Author:
sacerdot
Message:

Finished rewriting of Section 3.

File:
1 edited

Legend:

Unmodified
Added
Removed
  • src/ASM/CPP2011/cpp-2011.tex

    r1018 r1019  
    331331% SECTION                                                                      %
    332332% ---------------------------------------------------------------------------- %
    333 \subsection{Total correctness of the assembler}
     333\subsection{Correctness of the assembler with respect to fetching}
    334334\label{subsect.total.correctness.of.the.assembler}
    335335
    336336Using our policies, we now work toward proving the total correctness of the assembler.
    337 By total correctness, we mean that the assembly process does not change the semantics of an assembly program.
     337By total correctness, we mean that the assembly process never fails when
     338provided with a good strategy and that it does not change the semantics of
     339a certain class of well behaved assembly program.
    338340Naturally, this necessitates keeping some sort of correspondence between addresses at the assembly level and addresses at the machine code level.
    339341For this, we use the \texttt{sigma} machinery defined at the end of Subsection~\ref{subsect.policies}.
    340342
    341 We expand pseudoinstructions using the function \texttt{expand\_pseudo\_instruction}.
    342 This function accepts a `policy decision'---an element of type \texttt{jump\_length}---that is used when expanding a \texttt{Call}, \texttt{Jmp} or conditional jump to a label into the correct machine instruction.
    343 This \texttt{policy\_decision} is asssumed to originate from a policy as defined in Subsection~\ref{subsect.policies}.
     343We expand pseudoinstructions using the function \texttt{expand\_pseudo\_instruction} that takes a program made of pseudo-instructions, a good policy for
     344the program and a pointer to the pseudo memory. It returns the list of
     345instructions that correspond to the pseudo-instruction referenced by the
     346pointer. The policy is used to decide how to expand
     347\texttt{Call}s, \texttt{Jmp}s and conditional jumps. The function is given
     348a dependent type that incorporates its specification. The pre and
     349post-conditions are omitted in the paper due to lack of space. We show them
     350as an example for the next function, \texttt{build\_maps}.
    344351\begin{lstlisting}
    345352definition expand_pseudo_instruction:
    346   ∀lookup_labels, lookup_datalabels, pc, policy_decision.
    347     pseudo_instruction $\rightarrow$ list instruction := $\ldots$
     353  ∀program.∀pol:policy program.∀ppc:Word. ... $\Sigma$res.list instruction. ...
     354   := $\ldots$
    348355\end{lstlisting}
    349356
     
    363370\end{lstlisting}
    364371The rather complex type of \texttt{build\_maps} owes to our use of Matita's Russell facility to provide a strong specification for the function in the type (c.f. the use of sigma types, through which Russell is implemented in Matita).
    365 In particular, we express that any label should only appear exactly in any assembly program, and looking up a label in the newly created map is the same as applying the \texttt{sigma} function, recording the correspondence between pseudo program counters and program counters.
     372In particular, we express that for all labels that appear exactly once in any assembly program, the newly created map used in the implementation and the
     373stronger \texttt{sigma} function used in the specification agree.
    366374
    367375Using \texttt{build\_maps}, we can express the following lemma, expressing the correctness of the assembly function:
     
    410418Or, in plainer words, assembling and then immediately fetching again gets you back to where you started.
    411419
    412 Lemma \texttt{fetch\_assembly\_pseudo} establishes a basic relationship between \texttt{expand\_pseudo\_instruction} and \texttt{assembly\_1\_pseudoinstruction}:
    413 \begin{lstlisting}
    414 lemma fetch_assembly_pseudo: $\forall$p. $\forall$pol: policy p. $\forall$ppc, lk_lbl, lk_dlbl.
    415   $\forall$pi, code_memory, len, assembled, instructions, pc.
    416   let exp := pol ppc in
    417   let expand := expand_pseudo_instruction lk_lbls lk_dlbl pc exp pi in
    418   let ass := assembly_1_pseudoinstruction p pol ppc pc lk_lbl lk_dlbl pi in
    419   Some ? instructions = expand $\rightarrow$
    420   Some $\ldots$ $\langle$len, assembled$\rangle$ = ass $\rightarrow$
    421   encoding_check code_memory pc (pc + len) assembled $\rightarrow$
    422     fetch_many code_memory (pc + len) pc instructions.
     420Lemma \texttt{fetch\_assembly\_pseudo} (whose type is shown here slightly
     421simplified) is obtained by composition of
     422\texttt{expand\_pseudo\_instruction} and
     423\texttt{assembly\_1\_pseudoinstruction}:
     424\begin{lstlisting}
     425lemma fetch_assembly_pseudo:
     426 ∀program.∀pol:policy program.∀ppc.∀code_memory.
     427  let pi := fst (fetch_pseudo_instruction (snd program) ppc) in
     428  let instructions := expand_pseudo_instruction program pol ppc ... in
     429  let $\langle$len,assembled$\rangle$ := assembly_1_pseudoinstruction program pol ppc ... in
     430  encoding_check code_memory pc (pc + len) assembled →
     431  fetch_many code_memory (pc + len) pc instructions.
    423432\end{lstlisting}
    424433Here, \texttt{len} is the number of machine code instructions the pseudoinstruction at hand has been expanded into, and \texttt{encoding\_check} is a recursive function that checks that assembled machine code is correctly stored in code memory.
     
    427436
    428437Intuitively, Lemma \texttt{fetch\_assembly\_pseudo} can be read as follows.
    429 Suppose our policy \texttt{jump\_expansion} dictates that the pseudoinstruction indexed by the pseudo program counter \texttt{ppc} in assembly program \texttt{program} gives us the policy decision \texttt{jexp}.
    430 Further, suppose we expand the pseudoinstruction at \texttt{ppc} with the policy decision \texttt{jexp}, obtaining an (optional) list of machine code instructions \texttt{exp}.
    431 Suppose we also assemble the pseudoinstruction at \texttt{ppc} to obtain \texttt{ass}, a list of bytes.
     438Suppose our policy \texttt{jump\_expansion} dictates that the pseudoinstruction indexed by the pseudo program counter \texttt{ppc} in assembly program \texttt{program} gives us the policy decision \texttt{pol}.
     439Further, suppose we expand the pseudoinstruction at \texttt{ppc} with the policy decision \texttt{pol}, obtaining the list of machine code instructions \texttt{instructions}.
     440Suppose we also assemble the pseudoinstruction at \texttt{ppc} to obtain \texttt{assembled}, a list of bytes.
    432441Then, we check with \texttt{fetch\_many} that the number of machine instructions that were fetched matches the number of instruction that \texttt{expand\_pseudo\_instruction} expanded.
    433442
    434 At first sight, Lemma \texttt{fetch\_assembly\_pseudo2} appears to nearly establish the correctness of the assembler:
    435 \begin{lstlisting}
    436 lemma fetch_assembly_pseudo2: $\forall$p. $\forall$pol: policy p. $\forall$assembled, costs, labels.
    437   Some $\ldots$ $\langle$labels, costs$\rangle$ = build_maps program $\rightarrow$
    438   Some $\ldots$ $\langle$assembled, costs$\rangle$ = assembly program $\rightarrow$ $\forall$ppc.
     443The final lemma in this series is
     444\texttt{fetch\_assembly\_pseudo2} that combines the previous one with the
     445correctness of the operations that load the object code in the processor
     446memory. At first, the lemmas appears to nearly establish the correctness of
     447the assembler:
     448\begin{lstlisting}
     449lemma fetch_assembly_pseudo2:
     450 ∀program,pol,ppc.
     451  let $\langle$labels,costs$\rangle$ := build_maps program pol in
     452  let assembled := \fst (assembly program pol) in
    439453  let code_memory := load_code_memory assembled in
    440   let preamble := $\pi_1$ program in
    441   let data_labels := construct_datalabels preamble in
    442   let lk_labels :=
    443     λx. sigma program (address_of_word_labels_code_mem ($\pi_2$ program) x) in
    444   let lk_dlabels := λx. lookup ? ? x data_labels (zero ?) in
    445   let expansion := jump_expansion ppc program in
    446   let $\langle$pi, newppc$\rangle$ := fetch_pseudo_instruction ($\pi_2$ program) ppc in
    447   let ppc' := sigma program ppc in
    448   let newppc' := sigma program newppc in
    449   let instructions' :=
    450     expand_pseudo_instruction lk_labels lk_dlabels ppc' expansion pi in
    451   let fetched := $\lambda$instr. fetch_many code_memory newppc' ppc' instr in
    452     $\exists$instrs. Some ? instrs = instructions' $\wedge$ fetched instrs.
     454  let data_labels := construct_datalabels (\fst program) in
     455  let lookup_labels :=
     456    λx. sigma $\ldots$ pol (address_of_word_labels_code_mem (\snd program) x) in
     457  let lookup_datalabels := λx. lookup ? ? x data_labels (zero ?) in
     458  let $\langle$pi,newppc$\rangle$ := fetch_pseudo_instruction (\snd program) ppc in
     459  let instructions ≝ expand_pseudo_instruction program pol ppc ... in
     460   fetch_many code_memory (sigma program pol newppc)
     461     (sigma program pol ppc) instructions.
    453462\end{lstlisting}
    454463
    455464Intuitively, we may read \texttt{fetch\_assembly\_pseudo2} as follows.
    456 Suppose we are able to successfully assemble an assembly program using \texttt{assembly} and produce a code memory, \texttt{code\_memory}.
    457 Then there exists some list of machine instructions equal to the expansion of a pseudoinstruction and the number of machine instructions that need to be fetched is equal to the number of machine instructions that the pseudoinstruction was expanded into.
    458 
    459 However, this property is \emph{not} strong enough to establish that the semantics of an assembly program has been preserved by the assembly process.
    460 In particular, \texttt{fetch\_assembly\_pseudo2} says nothing about how memory addresses evolve during assembly.
    461 Memory addresses in one memory space may be mapped to memory addresses in a completely different memory space during assembly.
    462 To handle this problem, we need some more machinery in order to track how memory addresses are moved around, and to record a correspondence between addresses at the pseudo assembly level and the machine code level.
    463 
    464 We use an \texttt{internal\_pseudo\_address\_map} for this purpose.
    465 An \texttt{internal\_pseudo\_address\_map} associates memory addresses in the lower internal RAM of a \texttt{PseudoStatus} with a physical memory address at the machine code level:
    466 \begin{lstlisting}
    467 definition internal_pseudo_address_map := list (BitVector 8).
    468 \end{lstlisting}
    469 
    470 We use a map associating memory addresses, of type \texttt{internal\_pseudo\_address\_map}, to convert the lower internal RAM of a \texttt{PseudoStatus} into the lower internal RAM of a \texttt{Status}.
    471 The actual conversion process is performed by \texttt{low\_internal\_ram\_of\_pseudo\_low\_internal\_ram}:\footnote{An associated set of axioms describe how \texttt{low\_internal\_ram\_of\_pseudo\_low\_internal\_ram} behaves.  This is a form of parametricity.  We don't care about the particulars of the conversion functions, as long as they behave in accordance with our axioms.}
     465Suppose we are able to successfully assemble an assembly program using \texttt{assembly} and produce a code memory, \texttt{code\_memory}. The, fetching
     466a pseudo-instruction from the pseudo-code memory at address \texttt{ppc}
     467corresponds to fetching a sequence of instructions from the real code memory
     468at address \texttt{sigma program pol ppc}. The fetched sequence is granted
     469to be the expansion of the pseudo-instruction, according to the good policy
     470\texttt{pol}.
     471
     472However, this property is \emph{not} strong enough to establish that the semantics of an assembly program has been preserved by the assembly process since it
     473does not establish the correspondence between the semantics of a
     474pseudo-instruction and that of its expansion. In particular, the two semantics
     475differ on instructions that \emph{could} directly manipulate program
     476addresses.
     477
     478\subsection{Total correctness for well-behaved assembly programs}
     479In any reasonable assembly language addresses in code memory are just data that
     480can be manipulated in multiple ways by the program. An assembly program can
     481forge constant addresses, compare addresses, move addresses around,
     482shift existent addresses or apply logical and arithmetical operations
     483to them. Every optimizing assembler is allowed to modify the code memory.
     484Hence only the semantics of a few of the former operations is preserved by
     485an optimizing compiler. Moreover, the problem of characterizing well behaved
     486programs is clearly undecidable.
     487
     488To obtain a reasonable statement of correctness for our compiler, we need to
     489trace the memory locations (and, potentially, registers) that contain memory
     490addresses. This is necessary for two purposes. The first one is to detect
     491(at run time) misbehaved programs that only manipulate addresses in well
     492behaved ways, according to some safe approximation of well behavedness.
     493The second one is to compute statuses that correspond to pseudo-statues:
     494not only the content of the program counter, but also the content of all
     495traced locations must be translated applying the \texttt{sigma} map; all
     496the remaining memory cells are copied verbatim.
     497
     498For instance, after a function call, the two bytes that form the return
     499pseudo-address are pushed on top of the stack, i.e. in internal RAM.
     500This pseudo-internal RAM corresponds to an internal RAM where the stack
     501holds the real addresses after optimization, and all the other values untouched.
     502
     503We use an \texttt{internal\_pseudo\_address\_map}
     504to trace addresses of code memory addresses in internal RAM.
     505The current code is parametric on the implementation of the map itself.
     506\begin{lstlisting}
     507axiom internal_pseudo_address_map: Type[0].
     508\end{lstlisting}
     509
     510The \texttt{low\_internal\_ram\_of\_pseudo\_low\_internal\_ram} function
     511converts the lower internal RAM of a \texttt{PseudoStatus} into the
     512lower internal RAM of a \texttt{Status}. A similar one exists for higher
     513internal RAM. Note that both RAM segments are index using addresses 7-bits
     514long. The function is currently axiomatized and an associated
     515set of axioms prescribe the behaviour of the function.
    472516\begin{lstlisting}
    473517axiom low_internal_ram_of_pseudo_low_internal_ram:
    474518  internal_pseudo_address_map $\rightarrow$ BitVectorTrie Byte 7 $\rightarrow$ BitVectorTrie Byte 7.
    475519\end{lstlisting}
    476 A similar axiom exists for high internal RAM.
    477 
    478 Notice, the MCS-51's internal RAM is addressed with a 7-bit `byte'.
    479 % dpm: ugly English, fix
    480 The whole of the internal RAM space is addressed with bytes: the first bit is used to distinguish between the programmer addressing low and high internal memory.
    481520
    482521Next, we are able to translate \texttt{PseudoStatus} records into \texttt{Status} records using \texttt{status\_of\_pseudo\_status}.
    483522Translating a \texttt{PseudoStatus}'s code memory requires we expand pseudoinstructions and then assemble to obtain a trie of bytes.
    484 This should never fail, providing that our policy is correct:
     523This never fails, providing that our policy is correct:
    485524\begin{lstlisting}
    486525definition status_of_pseudo_status: internal_pseudo_address_map $\rightarrow$
     
    488527\end{lstlisting}
    489528
    490 After fetching an assembly instruction we must update any \texttt{internal\_pseudo\hyp{}\_address\_map}s that may be laying around.
    491 This is done with the following function:
     529The \texttt{next\_internal\_pseudo\_address\_map} function is responsible for
     530monitoring at run time the behaviour of pseudo-assembly programs to detect
     531the well behaved ones. It returns the new map that traces memory addresses
     532in internal RAM after execution of the next pseudo-instruction. It fails when
     533the instruction tampers with memory addresses in unticipated (but potentially
     534correct) ways. It thus decides membership to a correct but not complete
     535subset of well behaved programs.
    492536\begin{lstlisting}
    493537definition next_internal_pseudo_address_map: internal_pseudo_address_map
     
    507551\begin{lstlisting}
    508552theorem main_thm:
    509   ∀M,M',ps,s,s''.
    510     next_internal_pseudo_address_map M ps = Some $\ldots$ M' $\rightarrow$
    511       status_of_pseudo_status M ps = Some $\ldots$ s $\rightarrow$
    512         status_of_pseudo_status M'
    513           (execute_1_pseudo_instruction
    514             (ticks_of (code_memory $\ldots$ ps)) ps) = Some $\ldots$ s'' $\rightarrow$
    515               $\exists$n. execute n s = s''.
    516 \end{lstlisting}
    517 The statement can be given an intuitive reading as follows.
    518 Suppose our \texttt{PseudoStatus}, \texttt{ps}, can be successfully converted into a \texttt{Status}, \texttt{s}.
    519 Suppose further that, after executing a single assembly instruction and converting the resulting \texttt{PseudoStatus} into a \texttt{Status}, making sure to map memory addresses at the pseudo assembly level to memory addresses at the machine level correctly, we obtain \texttt{s''}, being careful to track the number of ticks executed with \texttt{ticks\_of}.
    520 Then, there exists some number \texttt{n}, so that executing \texttt{n} machine code instructions in \texttt{Status} \texttt{s} gives us \texttt{Status} \texttt{s''}.
    521 
    522 Theorem \texttt{main\_thm} establishes the correctness of the assembly process.
     553 ∀M,M':internal_pseudo_address_map.∀ps.∀pol: policy ps.
     554  next_internal_pseudo_address_map M ps = Some $\ldots$ M' →
     555   ∃n.
     556      execute n (status_of_pseudo_status M ps pol)
     557    = status_of_pseudo_status M'
     558       (execute_1_pseudo_instruction (ticks_of (code_memory $\ldots$ ps) pol) ps)
     559       [pol].
     560\end{lstlisting}
     561The statement is standard for forward simulation, but restricted to
     562\texttt{PseudoStatuses} \texttt{ps} whose next instruction to be executed is
     563well-behaved with respect to the \texttt{internal\_pseudo\_address\_map} \texttt{M}.
     564
     565Theorem \texttt{main\_thm} establishes the total correctness of the assembly process and can simply be lifted to the forward simulation of an arbitrary number
     566of well behaved steps on the assembly program.
    523567
    524568% ---------------------------------------------------------------------------- %
Note: See TracChangeset for help on using the changeset viewer.