Changeset 561


Ignore:
Timestamp:
Feb 17, 2011, 4:20:52 PM (6 years ago)
Author:
sacerdot
Message:

...

File:
1 edited

Legend:

Unmodified
Added
Removed
  • Deliverables/D4.1/ITP-Paper/itp-2011.tex

    r560 r561  
    425425The internal state of our Matita emulator is represented as a record:
    426426\begin{lstlisting}
    427 record Status: Type[0] ≝
    428 {
     427record Status: Type[0] ≝ {
    429428  code_memory: BitVectorTrie Byte 16;
    430429  low_internal_ram: BitVectorTrie Byte 7;
     
    434433  special_function_registers_8051: Vector Byte 19;
    435434  special_function_registers_8052: Vector Byte 5;
    436   ...
    437 }.
     435  ...  }.
    438436\end{lstlisting}
    439437This record neatly encapsulates the current memory contents, the program counter, the state of the current SFRs, and so on.
    440 One peculiarity is the packing of the 24 combined SFRs into fixed length vectors.
    441 This was due to a bug in Matita when we were constructing the emulator, since fixed, where the time needed to typecheck a record grew exponentially with the number of fields.
    442 
    443 Here, it appears that the MCS-51's memory spaces are completely disjoint.
    444 This is not so; many of them overlap with each other, and there's a many-many relationship between addressing modes and memory spaces.
    445 For instance, \texttt{DIRECT} addressing can be used to address low internal RAM and the SFRs, but not high internal RAM.
    446 
    447 For simplicity, we merely treat memory spaces as if they are completely disjoint in the \texttt{Status} record.
    448 Overlapping, and checking which addressing modes can be used to address particular memory spaces, is handled through numerous \texttt{get\_arg\_XX} and \texttt{set\_arg\_XX} (for 1, 8 and 16 bits) functions.
     438
     439Here we choosed to represent the MCS-51 memory model using four disjoint
     440segments plus the SFRs. From the programmer point of view, however, what
     441matters are addressing modes that are in a many-to-many relation with the
     442segments. For instance, the \texttt{DIRECT} addressing mode can be used to
     443address either low internal RAM (if the first bit is 0) or the SFRs (if the first bit is 1). That's why DIRECT uses 8-bits address but pointers to the low
     444internal RAM only have 7 bits. Hence the complexity of the memory model
     445is incapsulated in the  \texttt{get\_arg\_XX} and \texttt{set\_arg\_XX}
     446functions that get and set data of size \texttt{XX} from the
     447memory by considering all possible addressing modes
     448
     449%Overlapping, and checking which addressing modes can be used to address particular memory spaces, is handled through numerous \texttt{get\_arg\_XX} and \texttt{set\_arg\_XX} (for 1, 8 and 16 bits) functions.
    449450
    450451Both the Matita and O'Caml emulators follows the classic `fetch-decode-execute' model of processor operation.
     
    453454These costs are taken from a Siemens Semiconductor Group data sheet for the MCS-51~\cite{siemens:2011}, and will likely vary across manufacturers and particular derivatives of the processor.
    454455\begin{lstlisting}
    455 definition fetch:
    456   BitVectorTrie Byte 16 $\rightarrow$ Word $\rightarrow$ instruction $\times$ Word $\times$ nat := ...
    457 \end{lstlisting}
    458 A single instruction is assembled into its corresponding bit encoding with \texttt{assembly1}:
    459 \begin{lstlisting}
    460 definition assembly1: instruction $\rightarrow$ list Byte := ...
     456definition fetch: BitVectorTrie Byte 16 $\rightarrow$ Word $\rightarrow$ instruction $\times$ Word $\times$ nat
     457\end{lstlisting}
     458Instruction are assembled to bit encodings by \texttt{assembly1}:
     459\begin{lstlisting}
     460definition assembly1: instruction $\rightarrow$ list Byte
    461461\end{lstlisting}
    462462An assembly program, consisting of a preamble containing global data, and a list of (pseudo)instructions, is assembled using \texttt{assembly}.
     
    465465\begin{lstlisting}
    466466definition assembly:
    467   assembly_program $\rightarrow$ option (list Byte $\times$ (BitVectorTrie String 16)) := ...
    468 \end{lstlisting}
    469 A single execution step of the processor is evaluated using \texttt{execute\_1}, mapping a \texttt{Status} to a \texttt{Status}:
    470 \begin{lstlisting}
    471 definition execute_1: Status $\rightarrow$ Status := ...
    472 \end{lstlisting}
    473 Multiple steps of processor execution are implemented in \texttt{execute}, which wraps \texttt{execute\_1}:
     467  assembly_program $\rightarrow$ option (list Byte $\times$ (BitVectorTrie String 16))
     468\end{lstlisting}
     469A single fetch-decode-execute cycle is performed by \texttt{execute\_1}:
     470\begin{lstlisting}
     471definition execute_1: Status $\rightarrow$ Status
     472\end{lstlisting}
     473The \texttt{execute} functions performs a fixed number of cycles by iterating
     474\texttt{execute\_1}:
    474475\begin{lstlisting}
    475476let rec execute (n: nat) (s: Status) on n: Status := ...
    476477\end{lstlisting}
    477 This differs slightly from the design of the O'Caml emulator, which executed a program indefinitely, and also accepted a callback function as an argument, which could `witness' the execution as it happened, and providing a print-out of the processor state, and other debugging information.
    478 Due to Matita's requirement that all functions be strongly normalizing, \texttt{execute} cannot execute a program indefinitely, and must execute a fixed number of steps.
     478This differs slightly from the design of the O'Caml emulator, which executed a program indefinitely, and also accepted a callback function as an argument, which could `witness' the execution as it happened, and provide a print-out of the processor state, and other debugging information.
     479Due to Matita's requirement that all functions be strongly normalizing, \texttt{execute} cannot execute a program indefinitely. An alternative is to
     480produce an infinite stream of statuses representing the execution trace.
     481Infinite streams are encodable in Matita as co-inductive types.
    479482
    480483%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
     
    485488
    486489A peculiarity of the MCS-51 is the non-orthogonality of its instruction set.
    487 For instance, the \texttt{MOV} instruction, can be invoked using one of sixteen combinations of addressing modes.
     490For instance, the \texttt{MOV} instruction, can be invoked using one of sixteen combinations of addressing modes out of 361.
    488491
    489492% Show example of pattern matching with polymorphic variants
     
    496499...
    497500\end{lstlisting}
    498 Which were then used in our inductive datatype for assembly instructions, as follows:
     501Which were then combined in our inductive datatype for assembly instructions using the union operator `$|$':
    499502\begin{lstlisting}
    500503type 'addr preinstruction =
     
    538541...
    539542\end{lstlisting}
    540 A function that checks whether an \texttt{addressing\_mode} is `morally' an \texttt{addressing\_mode\_tag} is provided, as follows:
     543The \texttt{is\_a} function checks if an \texttt{addressing\_mode} matches an \texttt{addressing\_mode\_tag}:
    541544\begin{lstlisting}
    542545let rec is_a (d: addressing_mode_tag) (A: addressing_mode) on d :=
     
    546549...
    547550\end{lstlisting}
    548 We also extend this check to vectors of \texttt{addressing\_mode\_tag}'s in the obvious manner:
    549 \begin{lstlisting}
    550 let rec is_in (n: nat) (l: Vector addressing_mode_tag n) (A: addressing_mode) on l :=
    551  match l return $\lambda$m.$\lambda$_: Vector addressing_mode_tag m. bool with
    552   [ VEmpty $\Rightarrow$ false
    553   | VCons m he (tl: Vector addressing_mode_tag m) $\Rightarrow$
    554      is_a he A $\vee$ is_in ? tl A ].
    555 \end{lstlisting}
    556 Here $\mathtt{\vee}$ is inclusive disjunction on the \texttt{bool} datatype.
    557 \begin{lstlisting}
    558 record subaddressing_mode (n: nat) (l: Vector addressing_mode_tag (S n)): Type[0] :=
    559 {
    560   subaddressing_modeel :> addressing_mode;
    561   subaddressing_modein: bool_to_Prop (is_in ? l subaddressing_modeel)
    562 }.
    563 \end{lstlisting}
    564 We can now provide an inductive type of preinstructions with precise typings:
     551The \texttt{is\_in} function checks if an \texttt{addressing\_mode} matches a set of tags represented as a vector. It simply extends the \texttt{is\_a} function in the obvious manner.
     552
     553Finally, a \texttt{subaddressing\_mode} is an ad-hoc non empty $\Sigma$-type of addressing
     554modes constrained to be in a given set of tags:
     555\begin{lstlisting}
     556record subaddressing_mode n (l: Vector addressing_mode_tag (S n)): Type[0] :=
     557 { subaddressing_modeel :> addressing_mode;
     558   subaddressing_modein: bool_to_Prop (is_in ? l subaddressing_modeel) }.
     559\end{lstlisting}
     560An implicit coercion is provided to promote vectors of tags (denoted with
     561$\llbracket - \rrbracket$)
     562to the
     563corresponding \texttt{subaddressing\_mode} so that we can use a syntax
     564close to the O'Caml one to specify preinstructions:
    565565\begin{lstlisting}
    566566inductive preinstruction (A: Type[0]): Type[0] ≝
     
    569569...
    570570\end{lstlisting}
    571 Here $\llbracket - \rrbracket$ is syntax denoting a vector.
    572571We see that the constructor \texttt{ADD} expects two parameters, the first being the accumulator A (\texttt{acc\_a}), and the second being one of a register, direct, indirect or data addressing mode.
    573572
     
    575574% Have lemmas proving that if an element is a member of a sub, then it is a member of a superlist, and so on
    576575The final, missing component is a pair of type coercions from \texttt{addressing\_mode} to \texttt{subaddressing\_mode} and from \texttt{subaddressing\_mode} to \texttt{Type$\lbrack0\rbrack$}, respectively.
    577 The latter coercion is largely straightforward, however the former is not:
    578 \begin{lstlisting}
    579 coercion mk_subaddressing_mode:
    580   $\forall$n.  $\forall$l: Vector addressing_mode_tag (S n).
    581   $\forall$a: addressing_mode.
    582   $\forall$p: bool_to_Prop (is_in ? l a). subaddressing_mode n l :=
    583     mk_subaddressing_mode on a: addressing_mode to subaddressing_mode ? ?.
    584 \end{lstlisting}
    585 Using this coercion opens a proof obligation wherein we must prove that the \texttt{addressing\_mode\_tag} in correspondence with the \texttt{addressing\_mode} is a member of the \texttt{Vector} of permissible \texttt{addressing\_mode\_tag}s.
    586 This impels us to state and prove a number of auxilliary lemmas.
    587 For instance, we prove that if an \texttt{addressing\_mode\_tag} is a member of a \texttt{Vector}, and we possess another vector with additional elements, then the same \texttt{addressing\_mode\_tag} is a member of this vector.
    588 Using these lemmas, and Matita's automation, all proof obligations are solved easily.
    589 (Type checking the main \texttt{execute\_1} function, for instance, opens up over 200 proof obligations.)
    590 
    591 The machinery just described allows us to state in the type of a function what addressing modes that function expects.
     576The first one is simply a forgetful coercion, while the second one opens
     577a proof obligation wherein we must prove that the provided value is in the
     578admissible set. This kind of coercions were firstly introduced in PVS to
     579implement subset types~\cite{pvs?} and then in Coq as an additional machinery~\cite{russell}. In Matita all coercions can open proof obligations.
     580
     581Proof obligations impels us to state and prove a few auxilliary lemmas related
     582to transitivity of subtyping. For instance, an addressing mode that belongs
     583to an allowed set also belongs to any one of its super-set. At the moment,
     584Matita's automation exploits these lemmas to completely solve all the proof
     585obligations opened in our formalization, comprising the 200 proof obligations
     586related to the main \texttt{execute\_1} function.
     587
     588The machinery just described allows us to restrict the set of addressing
     589modes expected by a function and use this information during pattern matching
     590to skip impossible cases.
    592591For instance, consider \texttt{set\_arg\_16}, which expects only a \texttt{DPTR}:
    593592\begin{lstlisting}
    594 definition set_arg_16: Status $\rightarrow$ Word $\rightarrow$ $\llbracket$ dptr $\rrbracket$ $\rightarrow$ Status ≝
    595   $\lambda$s, v, a.
     593definition set_arg_16: Status $\rightarrow$ Word $\rightarrow$ $\llbracket$ dptr $\rrbracket$ $\rightarrow$ Status ≝ $~\lambda$s, v, a.
    596594   match a return $\lambda$x. bool_to_Prop (is_in ? $\llbracket$ dptr $\rrbracket$ x) $\rightarrow$ ? with
    597595     [ DPTR $\Rightarrow$ $\lambda$_: True.
     
    600598       let status := set_8051_sfr status SFR_DPL bl in
    601599         status
    602      | _ $\Rightarrow$ $\lambda$_: False.
    603        match K in False with
    604        [
    605        ]
    606      ] (subaddressing_modein $\ldots$ a).
    607 \end{lstlisting}
    608 All other cases are discharged by the catch-all at the bottom of the match expression.
    609 Attempting to match against another addressing mode not indicated in the type (for example, \texttt{REGISTER}) will produce a type-error.
     600     | _ $\Rightarrow$ $\lambda$_: False. $\bot$ ] $~$(subaddressing_modein $\ldots$ a).
     601\end{lstlisting}
     602We feed to the pattern matching the proof \texttt{(subaddressing\_modein} $\ldots$ \texttt{a)} that the argument $a$ is in the set $\llbracket$ \texttt{dptr} $\rrbracket$. In all cases but \texttt{DPTR}, the proof is a proof of \texttt{False} and we can ask the system to open a proof obligation $\bot$ that will be
     603discarded automatically using ex-falso.
     604Attempting to match against a non allowed addressing mode
     605(replacing \texttt{False} with \texttt{True} in the branch) will produce a
     606type-error.
     607
     608All the other dependently and non dependently typed solutions we tried before
     609the current one resulted to be sub-optimal in practice. In particular, since
     610we need a large number of different combinations of address modes to describe
     611the whole instruction set, it is unfeasible to declare a data type for each
     612one of these combinations. Moreover, the current solution is the one that
     613matches best the corresponding O'Caml code, at the point that the translation
     614from O'Caml to Matita is almost syntactical. In particular, we would like to
     615investigate the possibility of changing the code extraction procedure of
     616Matita to recognize this programming pattern and output the original
     617code based on polymorphic variants.
    610618
    611619% Talk about extraction to O'Caml code, which hopefully will allow us to extract back to using polymorphic variants, or when extracting vectors we could extract using phantom types
Note: See TracChangeset for help on using the changeset viewer.