Index: Deliverables/D4.1/ITPPaper/itp2011.tex
===================================================================
 Deliverables/D4.1/ITPPaper/itp2011.tex (revision 560)
+++ Deliverables/D4.1/ITPPaper/itp2011.tex (revision 561)
@@ 425,6 +425,5 @@
The internal state of our Matita emulator is represented as a record:
\begin{lstlisting}
record Status: Type[0] ≝
{
+record Status: Type[0] ≝ {
code_memory: BitVectorTrie Byte 16;
low_internal_ram: BitVectorTrie Byte 7;
@@ 434,17 +433,19 @@
special_function_registers_8051: Vector Byte 19;
special_function_registers_8052: Vector Byte 5;
 ...
}.
+ ... }.
\end{lstlisting}
This record neatly encapsulates the current memory contents, the program counter, the state of the current SFRs, and so on.
One peculiarity is the packing of the 24 combined SFRs into fixed length vectors.
This was due to a bug in Matita when we were constructing the emulator, since fixed, where the time needed to typecheck a record grew exponentially with the number of fields.

Here, it appears that the MCS51's memory spaces are completely disjoint.
This is not so; many of them overlap with each other, and there's a manymany relationship between addressing modes and memory spaces.
For instance, \texttt{DIRECT} addressing can be used to address low internal RAM and the SFRs, but not high internal RAM.

For simplicity, we merely treat memory spaces as if they are completely disjoint in the \texttt{Status} record.
Overlapping, and checking which addressing modes can be used to address particular memory spaces, is handled through numerous \texttt{get\_arg\_XX} and \texttt{set\_arg\_XX} (for 1, 8 and 16 bits) functions.
+
+Here we choosed to represent the MCS51 memory model using four disjoint
+segments plus the SFRs. From the programmer point of view, however, what
+matters are addressing modes that are in a manytomany relation with the
+segments. For instance, the \texttt{DIRECT} addressing mode can be used to
+address either low internal RAM (if the first bit is 0) or the SFRs (if the first bit is 1). That's why DIRECT uses 8bits address but pointers to the low
+internal RAM only have 7 bits. Hence the complexity of the memory model
+is incapsulated in the \texttt{get\_arg\_XX} and \texttt{set\_arg\_XX}
+functions that get and set data of size \texttt{XX} from the
+memory by considering all possible addressing modes
+
+%Overlapping, and checking which addressing modes can be used to address particular memory spaces, is handled through numerous \texttt{get\_arg\_XX} and \texttt{set\_arg\_XX} (for 1, 8 and 16 bits) functions.
Both the Matita and O'Caml emulators follows the classic `fetchdecodeexecute' model of processor operation.
@@ 453,10 +454,9 @@
These costs are taken from a Siemens Semiconductor Group data sheet for the MCS51~\cite{siemens:2011}, and will likely vary across manufacturers and particular derivatives of the processor.
\begin{lstlisting}
definition fetch:
 BitVectorTrie Byte 16 $\rightarrow$ Word $\rightarrow$ instruction $\times$ Word $\times$ nat := ...
\end{lstlisting}
A single instruction is assembled into its corresponding bit encoding with \texttt{assembly1}:
\begin{lstlisting}
definition assembly1: instruction $\rightarrow$ list Byte := ...
+definition fetch: BitVectorTrie Byte 16 $\rightarrow$ Word $\rightarrow$ instruction $\times$ Word $\times$ nat
+\end{lstlisting}
+Instruction are assembled to bit encodings by \texttt{assembly1}:
+\begin{lstlisting}
+definition assembly1: instruction $\rightarrow$ list Byte
\end{lstlisting}
An assembly program, consisting of a preamble containing global data, and a list of (pseudo)instructions, is assembled using \texttt{assembly}.
@@ 465,16 +465,19 @@
\begin{lstlisting}
definition assembly:
 assembly_program $\rightarrow$ option (list Byte $\times$ (BitVectorTrie String 16)) := ...
\end{lstlisting}
A single execution step of the processor is evaluated using \texttt{execute\_1}, mapping a \texttt{Status} to a \texttt{Status}:
\begin{lstlisting}
definition execute_1: Status $\rightarrow$ Status := ...
\end{lstlisting}
Multiple steps of processor execution are implemented in \texttt{execute}, which wraps \texttt{execute\_1}:
+ assembly_program $\rightarrow$ option (list Byte $\times$ (BitVectorTrie String 16))
+\end{lstlisting}
+A single fetchdecodeexecute cycle is performed by \texttt{execute\_1}:
+\begin{lstlisting}
+definition execute_1: Status $\rightarrow$ Status
+\end{lstlisting}
+The \texttt{execute} functions performs a fixed number of cycles by iterating
+\texttt{execute\_1}:
\begin{lstlisting}
let rec execute (n: nat) (s: Status) on n: Status := ...
\end{lstlisting}
This differs slightly from the design of the O'Caml emulator, which executed a program indefinitely, and also accepted a callback function as an argument, which could `witness' the execution as it happened, and providing a printout of the processor state, and other debugging information.
Due to Matita's requirement that all functions be strongly normalizing, \texttt{execute} cannot execute a program indefinitely, and must execute a fixed number of steps.
+This differs slightly from the design of the O'Caml emulator, which executed a program indefinitely, and also accepted a callback function as an argument, which could `witness' the execution as it happened, and provide a printout of the processor state, and other debugging information.
+Due to Matita's requirement that all functions be strongly normalizing, \texttt{execute} cannot execute a program indefinitely. An alternative is to
+produce an infinite stream of statuses representing the execution trace.
+Infinite streams are encodable in Matita as coinductive types.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
@@ 485,5 +488,5 @@
A peculiarity of the MCS51 is the nonorthogonality of its instruction set.
For instance, the \texttt{MOV} instruction, can be invoked using one of sixteen combinations of addressing modes.
+For instance, the \texttt{MOV} instruction, can be invoked using one of sixteen combinations of addressing modes out of 361.
% Show example of pattern matching with polymorphic variants
@@ 496,5 +499,5 @@
...
\end{lstlisting}
Which were then used in our inductive datatype for assembly instructions, as follows:
+Which were then combined in our inductive datatype for assembly instructions using the union operator `$$':
\begin{lstlisting}
type 'addr preinstruction =
@@ 538,5 +541,5 @@
...
\end{lstlisting}
A function that checks whether an \texttt{addressing\_mode} is `morally' an \texttt{addressing\_mode\_tag} is provided, as follows:
+The \texttt{is\_a} function checks if an \texttt{addressing\_mode} matches an \texttt{addressing\_mode\_tag}:
\begin{lstlisting}
let rec is_a (d: addressing_mode_tag) (A: addressing_mode) on d :=
@@ 546,21 +549,18 @@
...
\end{lstlisting}
We also extend this check to vectors of \texttt{addressing\_mode\_tag}'s in the obvious manner:
\begin{lstlisting}
let rec is_in (n: nat) (l: Vector addressing_mode_tag n) (A: addressing_mode) on l :=
 match l return $\lambda$m.$\lambda$_: Vector addressing_mode_tag m. bool with
 [ VEmpty $\Rightarrow$ false
  VCons m he (tl: Vector addressing_mode_tag m) $\Rightarrow$
 is_a he A $\vee$ is_in ? tl A ].
\end{lstlisting}
Here $\mathtt{\vee}$ is inclusive disjunction on the \texttt{bool} datatype.
\begin{lstlisting}
record subaddressing_mode (n: nat) (l: Vector addressing_mode_tag (S n)): Type[0] :=
{
 subaddressing_modeel :> addressing_mode;
 subaddressing_modein: bool_to_Prop (is_in ? l subaddressing_modeel)
}.
\end{lstlisting}
We can now provide an inductive type of preinstructions with precise typings:
+The \texttt{is\_in} function checks if an \texttt{addressing\_mode} matches a set of tags represented as a vector. It simply extends the \texttt{is\_a} function in the obvious manner.
+
+Finally, a \texttt{subaddressing\_mode} is an adhoc non empty $\Sigma$type of addressing
+modes constrained to be in a given set of tags:
+\begin{lstlisting}
+record subaddressing_mode n (l: Vector addressing_mode_tag (S n)): Type[0] :=
+ { subaddressing_modeel :> addressing_mode;
+ subaddressing_modein: bool_to_Prop (is_in ? l subaddressing_modeel) }.
+\end{lstlisting}
+An implicit coercion is provided to promote vectors of tags (denoted with
+$\llbracket  \rrbracket$)
+to the
+corresponding \texttt{subaddressing\_mode} so that we can use a syntax
+close to the O'Caml one to specify preinstructions:
\begin{lstlisting}
inductive preinstruction (A: Type[0]): Type[0] ≝
@@ 569,5 +569,4 @@
...
\end{lstlisting}
Here $\llbracket  \rrbracket$ is syntax denoting a vector.
We see that the constructor \texttt{ADD} expects two parameters, the first being the accumulator A (\texttt{acc\_a}), and the second being one of a register, direct, indirect or data addressing mode.
@@ 575,23 +574,22 @@
% Have lemmas proving that if an element is a member of a sub, then it is a member of a superlist, and so on
The final, missing component is a pair of type coercions from \texttt{addressing\_mode} to \texttt{subaddressing\_mode} and from \texttt{subaddressing\_mode} to \texttt{Type$\lbrack0\rbrack$}, respectively.
The latter coercion is largely straightforward, however the former is not:
\begin{lstlisting}
coercion mk_subaddressing_mode:
 $\forall$n. $\forall$l: Vector addressing_mode_tag (S n).
 $\forall$a: addressing_mode.
 $\forall$p: bool_to_Prop (is_in ? l a). subaddressing_mode n l :=
 mk_subaddressing_mode on a: addressing_mode to subaddressing_mode ? ?.
\end{lstlisting}
Using this coercion opens a proof obligation wherein we must prove that the \texttt{addressing\_mode\_tag} in correspondence with the \texttt{addressing\_mode} is a member of the \texttt{Vector} of permissible \texttt{addressing\_mode\_tag}s.
This impels us to state and prove a number of auxilliary lemmas.
For instance, we prove that if an \texttt{addressing\_mode\_tag} is a member of a \texttt{Vector}, and we possess another vector with additional elements, then the same \texttt{addressing\_mode\_tag} is a member of this vector.
Using these lemmas, and Matita's automation, all proof obligations are solved easily.
(Type checking the main \texttt{execute\_1} function, for instance, opens up over 200 proof obligations.)

The machinery just described allows us to state in the type of a function what addressing modes that function expects.
+The first one is simply a forgetful coercion, while the second one opens
+a proof obligation wherein we must prove that the provided value is in the
+admissible set. This kind of coercions were firstly introduced in PVS to
+implement subset types~\cite{pvs?} and then in Coq as an additional machinery~\cite{russell}. In Matita all coercions can open proof obligations.
+
+Proof obligations impels us to state and prove a few auxilliary lemmas related
+to transitivity of subtyping. For instance, an addressing mode that belongs
+to an allowed set also belongs to any one of its superset. At the moment,
+Matita's automation exploits these lemmas to completely solve all the proof
+obligations opened in our formalization, comprising the 200 proof obligations
+related to the main \texttt{execute\_1} function.
+
+The machinery just described allows us to restrict the set of addressing
+modes expected by a function and use this information during pattern matching
+to skip impossible cases.
For instance, consider \texttt{set\_arg\_16}, which expects only a \texttt{DPTR}:
\begin{lstlisting}
definition set_arg_16: Status $\rightarrow$ Word $\rightarrow$ $\llbracket$ dptr $\rrbracket$ $\rightarrow$ Status ≝
 $\lambda$s, v, a.
+definition set_arg_16: Status $\rightarrow$ Word $\rightarrow$ $\llbracket$ dptr $\rrbracket$ $\rightarrow$ Status ≝ $~\lambda$s, v, a.
match a return $\lambda$x. bool_to_Prop (is_in ? $\llbracket$ dptr $\rrbracket$ x) $\rightarrow$ ? with
[ DPTR $\Rightarrow$ $\lambda$_: True.
@@ 600,12 +598,22 @@
let status := set_8051_sfr status SFR_DPL bl in
status
  _ $\Rightarrow$ $\lambda$_: False.
 match K in False with
 [
 ]
 ] (subaddressing_modein $\ldots$ a).
\end{lstlisting}
All other cases are discharged by the catchall at the bottom of the match expression.
Attempting to match against another addressing mode not indicated in the type (for example, \texttt{REGISTER}) will produce a typeerror.
+  _ $\Rightarrow$ $\lambda$_: False. $\bot$ ] $~$(subaddressing_modein $\ldots$ a).
+\end{lstlisting}
+We feed to the pattern matching the proof \texttt{(subaddressing\_modein} $\ldots$ \texttt{a)} that the argument $a$ is in the set $\llbracket$ \texttt{dptr} $\rrbracket$. In all cases but \texttt{DPTR}, the proof is a proof of \texttt{False} and we can ask the system to open a proof obligation $\bot$ that will be
+discarded automatically using exfalso.
+Attempting to match against a non allowed addressing mode
+(replacing \texttt{False} with \texttt{True} in the branch) will produce a
+typeerror.
+
+All the other dependently and non dependently typed solutions we tried before
+the current one resulted to be suboptimal in practice. In particular, since
+we need a large number of different combinations of address modes to describe
+the whole instruction set, it is unfeasible to declare a data type for each
+one of these combinations. Moreover, the current solution is the one that
+matches best the corresponding O'Caml code, at the point that the translation
+from O'Caml to Matita is almost syntactical. In particular, we would like to
+investigate the possibility of changing the code extraction procedure of
+Matita to recognize this programming pattern and output the original
+code based on polymorphic variants.
% Talk about extraction to O'Caml code, which hopefully will allow us to extract back to using polymorphic variants, or when extracting vectors we could extract using phantom types