# Changeset 570

Ignore:
Timestamp:
Feb 18, 2011, 10:57:24 AM (9 years ago)
Message:

Tightened English up to Subsection 2.5

File:
1 edited

### Legend:

Unmodified
 r569 \label{sect.design.issues.formalisation} From hereonin, we typeset O'Caml source with \texttt{\color{blue}{blue}} and Matita source with \texttt{\color{red}{red}} to distinguish the two syntaxes. Matita's syntax is largely straightforward to those familiar with Coq or O'Caml. The only subtlety is the use of \texttt{?}' in an argument position denoting an argument that should be inferred automatically. From hereonin, we typeset O'Caml source with \texttt{\color{blue}{blue}} and Matita source with \texttt{\color{red}{red}}. Matita's syntax is straightforward if familiar with Coq or O'Caml. One subtlety is the use of \texttt{?}' in an argument position denoting an argument to be inferred. A full account of the formalisation can be found in~\cite{cerco-report:2011}. Our implementation progressed in two stages. We began with an emulator written in O'Caml to iron out' any bugs in our design and implementation. We began with an emulator written in O'Caml to iron out' any bugs in the design and implementation. O'Caml's ability to perform file I/O also eased debugging and validation. Once we were happy with the design of the O'Caml emulator, we moved Matita. Once we were happy with the design of the O'Caml emulator, we moved to Matita. Matita's syntax is lexically similar to O'Caml's. This eased the translation, as code was merely copied with minor modifications. This eased the translation, as swathes of code were copied with minor modifications. However, several major issues had to be addresses when moving from to Matita. These are now discussed. \end{figure} The formalization of MCS-51 must deal with bytes (8-bits), words (16-bits) but also with more exoteric quantities (7-bits, 3-bits, 9-bits). To avoid size mismatch bugs difficult to spot, we represent all of these quantities using bitvectors, i.e. fixed length vectors of booleans. The formalization of MCS-51 must deal with bytes (8-bits), words (16-bits), and also more exoteric quantities (7, 3 and 9-bits). To avoid difficult-to-trace size mismatch bugs, we represented all quantities using bitvectors, i.e. fixed length vectors of booleans. In our O'Caml emulator, we faked' bitvectors using phantom types~\cite{leijen:domain:1999} implemented with polymorphic variants~\cite{garrigue:programming:1998}, as in Figure~\ref{fig.ocaml.implementation.bitvectors}. From within the bitvector module (left column) bitvectors are just lists of bits and no guarantee is provided on sizes. However, the module's interface (right column) enforces the size invariants in the rest of the code. However, the module's interface (right column) enforces size invariants in the rest of the code. In Matita, we are able to use the full power of dependent types to always work with vectors of a known size: \end{lstlisting} We define \texttt{BitVector} as a specialization of \texttt{Vector} to \texttt{bool}. We may use Matita's type system to provide precise typing for functions that are polymorphic in the size without having to duplicate the code as we did in O'Caml: We may use Matita's type system to provide precise typings for functions that are polymorphic in the size without code duplication: \begin{lstlisting} let rec split (A: Type[0]) (m,n: nat) on m: \label{subsect.representing.memory} The MCS-51 has numerous disjoint memory spaces addressed by pointers of different sizes. In our prototype implementation, we use a map data structure (from O'Caml's standard library) for each space. The MCS-51 has numerous disjoint memory spaces addressed by differently sized pointers. In our O'Caml implementation, we use a map data structure (from the standard library) for each space. Matita's standard library is small, and does not contain a generic map data structure. We had the opportunity of crafting a dependently typed special-purpose data structure for the job to enforce the correspondence between the size of pointer and the size of the memory space. We assumed that large swathes of memory would often be uninitialized. Further, we assumed that large swathes of memory would often be uninitialized. We picked a modified form of trie of fixed height $h$. It represents uninitialized data'. Performing a lookup in memory is now straight-forward. We traverse a path, and if we encounter \texttt{Stub}, we return a default value\footnote{All manufacturer data sheets that we consulted were silent on the subject of what should be returned if we attempt to access uninitialized memory.  We defaulted to simply returning zero, though our \texttt{lookup} function is parametric in this choice.  We do not believe that this is an outrageous decision, as SDCC for instance generates code which first zeroes out' all memory in a preamble before executing the program proper.  This is in line with the C standard, which guarantees that all global variables will be zero initialized piecewise.}. The only subtlety over normal trie lookup is how we handle \texttt{Stub}. We traverse a path, and upon encountering \texttt{Stub}, we return a default value\footnote{All manufacturer data sheets that we consulted were silent on the subject of what should be returned if we attempt to access uninitialized memory.  We defaulted to simply returning zero, though our \texttt{lookup} function is parametric in this choice.  We do not believe that this is an outrageous decision, as SDCC for instance generates code which first zeroes out' all memory in a preamble before executing the program proper.  This is in line with the C standard, which guarantees that all global variables will be zero initialized piecewise.}. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Further, all jump instructions require a concrete memory address---to jump to---to be specified. Compilers that support separate compilation cannot directly compute these offsets and select the appropriate jump instructions. These operations are also burdensome for compilers that do not do separate compilation and are handled by the assemblers, as we decided to do. These operations are also burdensome for compilers that do not do separate compilation and are handled by assemblers. We followed suit. While introducing pseudoinstructions, we also introduced labels for locations to jump to, and for global data. To specify global data via labels, we introduced a preamble before the program where labels and the size of reserved space for data is associated. A pseudoinstruction \texttt{Mov} moves (16-bit) data stored at a label into the (16-bit) register \texttt{DPTR}. Our pseudoinstructions and labels induce an assembly language similar to that of SDCC. To specify global data via labels, we introduced a preamble before the program where labels and the size of reserved space for data is stored. A pseudoinstruction \texttt{Mov} moves (16-bit) data stored at a label into the MCS-51's one 16-bit register, \texttt{DPTR}. Our pseudoinstructions and labels induce an assembly language similar to that of SDCC's. All pseudoinstructions and labels are `assembled away' prior to program execution. Jumps are computed in two stages. A map associating memory addresses to labels is built, before removing pseudojumps with concrete jumps to the correct address. The algorithm currently implemented does not try to minimize the object code size by always picking the shortest possible jump instruction. A better algorithm is currently left for future work. A map associating memory addresses to labels is built, before replacing pseudojumps with concrete jumps to the correct address. The algorithm currently implemented does not try to minimize object code size by picking the shortest possible jump instruction. A better algorithm is left for future work. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%