\documentclass[a4paper]{article}
\usepackage{a4wide}
\usepackage{algpseudocode}
\usepackage{alltt}
\usepackage{amsfonts}
\usepackage{amsmath}
\usepackage[british]{babel}
\usepackage{caption}
\usepackage[colorlinks]{hyperref}
\usepackage[utf8]{inputenc}
\usepackage{listings}
\usepackage{microtype}
\usepackage{subcaption}
%\usepackage{kpfonts}
\usepackage[T1]{fontenc}
%\usepackage[ttdefault=true]{AnonymousPro}
\newcommand{\All}[1]{\forall{#1}.\;}
\newcommand{\deffont}[1]{\textbf{#1}}
\newcommand{\lam}[1]{\lambda{#1}.\;}
\newcommand{\todo}[1]{\ensuremath{\texttt{[#1]}}}
\makeatletter
\def\blfootnote{\xdef\@thefnmark{}\@footnotetext}
\makeatother
\bibliographystyle{alpha}
%\renewcommand{\verb}{\lstinline}
\def\lstlanguagefiles{lst-grafite.tex}
\lstset{language=Grafite}
\author{Jaap Boender \and Dominic P. Mulligan \and Claudio Sacerdoti Coen}
\title{On the proof of correctness of a verified optimising assembler}
\begin{document}
\maketitle
\begin{abstract}
A typical microprocessor's instruction set provides several forms of conditional and unconditional jumps, procedure calls, and data movement instructions.
Unconditional jump and call instructions vary depending on the \emph{span} of the branch---the distance between the location of the instruction and the branch target---with short jump and call instructions taking less space in code memory to encode than their long jump and call counterparts.
Assembly languages are a thin veneer over machine code, with a few light abstractions provided for the benefit of the programmer.
Two such abstractions provided in assembly languages are uniform `jump' and `call' pseudoinstructions that branch to a labelled point in program source rather than a concrete machine address.
These pseudoinstructions remove the burden of selecting between long and short jump instructions, and computing concrete machine addresses, from the programmer.
Pseudoinstructions must be `assembled away' by the assembler, and a jump pseudoinstruction can always be replaced by a long jump machine code instruction with no semantic change.
Yet, such a na\"ive replacement is inefficient and wasteful in terms of code memory usage---a prime concern when targetting embedded devices---as some long jumps could have been replaced with short jumps, reducing the size of the program in code memory.
Therefore any assembler should replace as many long jumps with short jumps as possible, optimising for program size.
We present a formalisation and proof of correctness of an optimising assembler targetting a space-constrained embedded microcontroller: the 8-bit Intel MCS-51 series.
All proofs are checked by the Matita theorem prover, and the work forms a major component of the Certified Complexity (`CerCo') project, aiming to produce a verified concrete complexity-preserving compiler for a subset of the C programming language.
Our optimisations complicate the proof of correctness of the assembler proper and therefore we employ the use of `policies', separating out the decision of how a pseudoinstruction should be expanded from the expansion process itself, simplifying our proofs.
Our proof strategy also tracks `good' addresses, and only programs that use good addresses have their semantics preserved under assembly.
This tracking offers increased flexibility over the traditional approach---where addresses are kept abstract and immutable---as we may experiment with some manipulations of machine addresses.
\end{abstract}
\newpage
\tableofcontents
\blfootnote{Research supported by the CerCo project, within the Future and Emerging Technologies (FET) programme of the Seventh Framework Programme for Research of the European Commission, under FET-Open grant number 243881.}
\newpage
%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
% Section %
%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
\section{Introduction}
\label{sect.introduction}
We present an overview of work formalising and proving correct an optimising assembler for a typical embedded microcontroller.
Though the work described herein is specialised to the 8-bit Intel MCS-51 series, we believe the design decisions and techniques developed for our proof of correctness are of general interest.
% Everybody
%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
% Section %
%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
\subsection{Matita}
\label{subsect.matita}
Matita is a theorem prover based on a variant of the Calculus of Coinductive Constructions~\cite{asperti:user:2007}.
The system features a full spectrum of dependent types and (co)inductive families, a system of coercions, a tactic-driven proof construction engine~\cite{sacerdoti-coen:tinycals:2007}, and paramodulation based automation~\cite{asperti:higher-order:2007}, all of which we exploit in the formalisation described herein.
Matita's syntax is similar to the syntaxes of mainstream functional programming languages such as OCaml or Standard ML.
The type theory that Matita implements is broadly akin to that of Coq~\cite{coq:2004} and Agda~\cite{bove:brief:2009}.
Nevertheless, we provide a brief explanation of the main syntactic and type-theoretic features of Matita that will be needed to follow the body of the paper:
\begin{itemize}
\item
Non-recursive functions and definitions are introduced via the \texttt{definition} keyword.
Recursive functions are introduced with \texttt{let rec}.
Mutually recursive functions are separated via the \texttt{and} keyword.
Matita's termination checker ensures that all recursive functions are terminating before being admitted to maintain soundness of the system's logic.
\item
Matita has an infinite hierarchy of type universes.
A single impredicative universe of types, \texttt{Prop}, exists at the base of this hierarchy.
An infinite series of predicative universes, \texttt{Type[0]} : \texttt{Type[1]} : \texttt{Type[2]}, and so on and so forth, sits atop \texttt{Prop}.
Matita, unlike Coq or Agda, implements no form of typical ambiguity or universe polymorphism, with explicit concrete universe levels being preferred instead.
\item
Matita's type theory plays host to a rich and expressive higher-order logic.
Constants \texttt{True} and \texttt{False} represent truth and falsity in \texttt{Prop} respectively.
Two inductive families in \texttt{Prop} encode conjunction and disjunction---$\mathtt{P \wedge Q}$ and $\mathtt{P \vee Q}$ respectively.
As is usual, implication and universal quantification are identified with the dependent function space ($\Pi$ types), whereas (constructive) existential quantification is encoded as a dependent sum (a $\Sigma$-type).
We write $\All{x : \phi}\psi$ for the dependent function space, and abbreviate this as $\phi \rightarrow \psi$ when $x \not\in fv(\psi)$ as usual.
We use $\langle M,N \rangle$ for the pairing of $M$ and $N$.
\item
Inductive and coinductive families are introduced via the \texttt{inductive} and \texttt{coinductive} keywords respectively, with named constructor declarations separated by a bar.
Mutually inductive data family declarations are separated via \texttt{with}.
In the following declaration:
\begin{lstlisting}
inductive I ($P_1$ : $\tau_1$) $\ldots$ ($P_n$ : $\tau_n$) : $\phi_1 \rightarrow \ldots \rightarrow \phi_m \rightarrow \phi$ := $\ldots$
\end{lstlisting}
We call $P_i$ for $0 \leq i \leq n$ the \deffont{parameters} of \texttt{I} and $\phi_j$ for $0 \leq j \leq m$ the \deffont{indices} of \texttt{I}.
Matita's positivity checker ensures that constructors have strictly-positive types before admitting an inductive family to maintain soundness of the system's logic.
\item
Records are introduced with the \texttt{record} keyword.
A Matita record
\begin{lstlisting}
record R : Type[0] := { F1 : nat }.
\end{lstlisting}
may be thought of as syntactic sugar for a single-constructor inductive data type of the same name:
\begin{lstlisting}
inductive R : Type[0] :=
| mk_R : nat -> R.
\end{lstlisting}
A record field's type may depend on fields declared earlier in the record.
Records may be decomposed with projections.
Projections, one for each of field of a record, are registered in the global context.
In the example record above, \texttt{F1} of type $R \rightarrow nat$ is registered as a field projection and $mk\_R$ of type $nat \rightarrow R$ is registered as a constructor.
Record fields may also be marked as coercions.
In the following example
\begin{lstlisting}
record S : Type[1] :=
{
Carrier :> Type[0];
op : Carrier -> Carrier -> Carrier
}
\end{lstlisting}
the field \texttt{Carrier} is declared to be a coercion with `\texttt{:>}'.with the operational effect being that the field projection \texttt{Carrier} may be omitted where it could be successfully inferred by Matita.
Field coercions facilitate the informal but common mathematical practice of intentionally confusing a structure with its underlying carrier set.
\item
Terms may be freely omitted, allowing the user to write down partial types and terms.
A question mark, \texttt{?}, denotes a single term that has been omitted by the user.
Some omitted terms can be deduced by Matita's refinement system.
Other, more complex goals arising from omitted terms may require user input to solve, in which case a proof obligation is opened for each term that cannot be deduced automatically.
Three consecutive dots, \texttt{$\ldots$}, denote multiple terms or types that have been omitted.
\item
Data may be decomposed by pattern matching with a \texttt{match} expression.
We may fully annotate a \texttt{match} expression with its return type.
This is especially useful when working with indexed families of types or with invariants, expressed as types, on functions.
In the following
\begin{lstlisting}
match t return $\lam{x}x = 0 \rightarrow bool$ with
[ 0 $\Rightarrow$ $\lam{prf_1}P_1$
| S m $\Rightarrow$ $\lam{prf_2}P_2$
] (refl $\ldots$ t)
\end{lstlisting}
the \texttt{0} branch of the \texttt{match} expression returns a function from $0 = 0$ to \texttt{bool}, whereas the \texttt{S m} branch of the \texttt{match} expression returns a function from \texttt{S m = 0} to \texttt{bool}.
In both cases the annotated return type $\lam{x}x = 0 \rightarrow bool$ has been specialised given new information about \texttt{t} revealed by the act of pattern matching.
The entire term, with \texttt{match} expression applied to \texttt{refl $\ldots$ t}, has type \texttt{bool}.
\item
Matita features a liberal system of coercions (distinct from the previously mentioned record field coercions).
It is possible to define a uniform coercion $\lam{x}\langle x, ?\rangle$ from every type $T$ to the dependent product $\Sigma{x : T}. P x$.
The coercion opens a proof obligation that asks the user to prove that $P$ holds for $x$.
When a coercion is to be applied to a complex term (for example, a $\lambda$-abstraction, a local definition, or a case analysis), the system automatically propagates the coercion to the sub-terms.
For instance, to apply a coercion to force $\lam{x}M : A \rightarrow B$ to
have type $\All{x : A}\Sigma{y : B}. P x y$, the system looks for a coercion from $M : B$ to $\Sigma{y : B}. P x y$ in a context augmented with $x : A$.
This is significant when the coercion opens a proof obligation, as the user will be presented with multiple, but simpler proof obligations in the correct context.
In this way, Matita supports the `Russell' proof methodology developed by Sozeau in~\cite{sozeau:subset:2007}, in a lightweight but tightly-integrated manner.
\end{itemize}
Throughout, for reasons of clarity, conciseness, and readability, we may choose to simplify or omit parts of Matita code.
We will always ensure that these omissions do not mislead the reader.
%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
% Section %
%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
\subsection{Map of paper}
\label{subsect.map.of.paper}
% Anybody
%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
% Section %
%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
\section{The proof}
\label{sect.the.proof}
%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
% Section %
%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
\subsection{Structure of the assembler}
\label{subsect.structure.of.the.assembler}
% Everybody
%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
% Section %
%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
\subsection{Encoding the instruction set}
\label{subsect.encoding.the.instruction.set}
% DPM and CSC
% Polymorphic variants
% Non-regularity of instruction set (CISC)
% Use of dependent types in get_arg_XXX and set_arg_XXX
% Assembly code features (what pseudo-instructions are present)
%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
% Section %
%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
\subsection{Operational semantics}
\label{subsect.operational.semantics}
% DPM and CSC
One of the central data types in our formalisation is the \texttt{BitVector}.
\texttt{BitVector}s are fixed-length lists of bits (implemented as booleans) used to encode words and bytes:
\begin{lstlisting}
definition BitVector := $\lambda$n: nat. Vector bool n.
\end{lstlisting}
Arithmetic in our formalisation is carried out on \texttt{BitVector} values.
An MCS-51 byte is eight bits wide and an MCS-51 word is a double byte, or sixteen bits wide.
Some MCS-51 instructions require four, seven and eleven bit wide sub-bytes and sub-words respectively.
We capture these notions as definitions:
\begin{lstlisting}
definition Nibble := BitVector 4.
definition Byte7 := BitVector 7.
definition Byte := BitVector 8.
definition Word := BitVector 16.
definition Word11 := BitVector 11.
\end{lstlisting}
A central design decision in our formalisation is the choice of how to encode memory.
The MCS-51 has various memory spaces and registers that can be addressed with bytes and words.
Using a complete associative map to capture an entire memory space, including large swathes of uninitialised data, would be incredibly inefficient and make any proof incredibly difficult due to the strain placed on Matita.
However, using a na\"ive data structured like an association-list to encode memory would lead to inefficiencies when stepping through the semantics of a machine code program inside Matita.
We therefore use an indexed type, \texttt{BitVectorTrie}, to efficiently represent a memory space that may have `holes' of uninitialised or unspecified data:
\begin{lstlisting}
inductive BitVectorTrie (A: Type[0]): $\mathbb{N}$ $\rightarrow$ Type[0] :=
Leaf: A $\rightarrow$ BitVectorTrie A O
| Node: $\forall$n. BitVectorTrie A n $\rightarrow$ BitVectorTrie A n $\rightarrow$ BitVectorTrie A (S n)
| Stub: $\forall$n. BitVectorTrie A n.
\end{lstlisting}
The $\mathbb{N}$ index of the \texttt{BitVectorTrie} family records the maximum length of any path to stored data through the tree.
For a dependently-typed encoding of a trie data structure, the \texttt{Leaf} and \texttt{Node} constructors are standard.
The novelty of \texttt{BitVectorTrie} lies in the \texttt{Stub} constructor, which intuitively represents a `hole' in the tree, and can appear
% Trie data structure and use of dependent types
% Operational semantics of MCS-51
% Operational semantics of assembly code
%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
% Section %
%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
\subsection{Policies and the correctness proof}
\label{subsect.policies.and.the.correctness.proof}
% DPM and CSC
%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
% Section %
%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
\subsection{Correctness of branch displacement}
\label{subsect.correctness.of.branch.displacement}
% JB and CSC
One of the consequences of using the MCS-51 instruction set is that we have to take its specific branch instructions into account. The MCS-51 has three different types of branch instruction---which one should be used depends on the distance between the instruction and its target (the {\em span}). This applies to unconditional branches, conditional branches and procedure calls.
The three different types of branch instructions are:
\begin{itemize}
\item The {\em short} branch: a 2-byte instruction that can branch over a 1-byte displacement, i.e. 127 bytes forwards or 128 bytes backwards.
\item The {\em absolute} branch: a 2-byte instruction that can change the last 11 bits of the program counter. Thus, it can branch to any destination within the same 2048-byte segment.
\item The {\em long} branch: a 3-byte instruction that can branch to any destination in memory.
\end{itemize}
Not all combinations of instructions are provided. For example, there is no short call, nor is there a long conditional branch.
Apart from their difference in size, the instructions also take different amounts of time to execute: 2 cycles for the short and absolute branch, 3 cycles for the long branch.
The {\em branch optimisation problem}, a well-known problem in assembler design, is to find the smallest branch instructions for a given assembler program. This is complicated by the fact that branch instructions can influence each other: an instruction's span depends on the size of any branch instructions in between.
The branch displacement problem has been found to be NP-complete for several
similar architectures~\cite{Robertson1979,Szymanski1978}, therefore an optimal
solution could be time-consuming to find. The standard
solution~\cite{Szymanski1978,Dickson2008} is to use a fixed point algorithm.
Two flavours of solution are possible: either a greatest fixed point algorithm
(encoding all branches as long and then replacing them by shorter branches
insofar possible), or a least fixed point algorithm (encoding all branches as
short and then replacing them by longer branches as necessary). The advantage
of a greatest fixed point algorithm is that it can be stopped after any
iteration, as all the intermediate solutions are safe, if not necessarily
optimal. This is not the case for a least fixed point algorithm, but this
algorithm, if it terminates, will result in a smaller (or at least equal-sized)
solution.
In the particular case of the MCS-51 instruction set, matters are complicated
by the existence of the absolute branch. With this instruction, the encoding of
a branch does not solely depend on its span, but also on their location within
the program. This means that instead of being determined solely by the size of
instructions within its span, the encoding of a branch is determined also by
all of the branches that precede it, since lengthening or shortening a
preceding instruction could propel a jump and its destination into the same
segment (or out of it).
This also means that the standard termination argument for the least fixed
point algorithm no longer holds. With only short and long jumps, once a short
jump is replaced by a long jump, it will never be replaced by a short jump
again (since the span of jumps only increases), so the number of jumps is
an upper bound. However, with the absolute jump, it is entirely possible for
a long jump to be replaced by an absolute jump, so this argument no longer
holds.
%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
% Section %
%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
\subsection{Fitting together}
\label{subsect.fitting.together}
% Everybody
%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
% Section %
%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
\section{Conclusion}
\label{sect.conclusion}
% Everbody
%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
% Section %
%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
\subsection{Related work}
\label{subsect.related.work}
% Everybody
%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
% Section %
%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
\subsection{Formalisation}
\label{subsect.formalisation}
% Everybody
% Man-hours of effort
% Lines of code, etc.
% Pointer to web address for formalisation
\newpage
\bibliography{boender-jar-2015}
\end{document}