\section{Our algorithm}
\subsection{Design decisions}
Given the NP-completeness of the problem, finding an optimal solution
(using, for example, a constraint solver) may take a great amount of time.
The SDCC compiler~\cite{SDCC2011}, which has the MCS-51 among its target
instruction sets, simply encodes every jump as a long jump without taking the
distance into account. While certainly correct (the long jump can reach any
destination in memory) and rapid, it does result in a less than optimal
solution.
The {\tt gcc} compiler suite~\cite{GCC2012}, when compiling C for the x86
architecture, uses a greatest fixed point algorithm: it starts off with all
jumps encoded as the largest jumps available, and then tries to shrink them
as much as possible.
Such an algorithm has the advantage that any intermediate results it returns
are correct: the solution where every jump is encoded as a large jump is
always possible, and the algorithm only reduces those jumps where the
destination address is in range for a shorter jump instruction. The algorithm
can thus be stopped after any number of steps without losing
correctness.
The result, however, is not necessarily optimal, even if the algorithm is run
until it terminates naturally: the fixed point reached is the {\em greatest}
fixed point, not the least fixed point. Furthermore, {\tt gcc} (at least for
the x86 architecture) only uses short and long jumps. This makes the algorithm
faster, as shown in the previous section, but can also result in a less
optimal solution.
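The shrinking approach can be made concrete with a small Python sketch of a
greatest fixed point computation in the style described above. The program
representation, instruction sizes and the short-jump range below are our own
illustration, not {\tt gcc}'s actual implementation:

```python
# Greatest-fixed-point jump shortening: start with every jump encoded as
# long, then repeatedly shrink any long jump whose target turns out to be
# within short-jump range, until nothing changes. Every intermediate state
# is a correct (if suboptimal) encoding. Sizes and ranges are illustrative.
SHORT_SIZE, LONG_SIZE = 2, 5   # bytes per jump encoding (assumed)
SHORT_RANGE = 127              # max displacement of a short jump (assumed)

def shrink_jumps(program):
    """program: list of ('jump', target_index) or ('other', size_in_bytes)."""
    enc = ['long' if kind == 'jump' else None for kind, _ in program]
    changed = True
    while changed:
        changed = False
        # Compute the start address of every instruction under the
        # current encodings.
        addr, pc = [], 0
        for i, (kind, arg) in enumerate(program):
            addr.append(pc)
            if kind == 'jump':
                pc += SHORT_SIZE if enc[i] == 'short' else LONG_SIZE
            else:
                pc += arg
        addr.append(pc)  # end address of the program
        # Shrink any long jump whose displacement now fits a short jump.
        for i, (kind, target) in enumerate(program):
            if kind == 'jump' and enc[i] == 'long' \
                    and abs(addr[target] - addr[i]) <= SHORT_RANGE:
                enc[i] = 'short'
                changed = True
    return enc
```

Note that encodings only ever shrink, so the loop terminates after at most
one change per jump; stopping it early still yields a correct encoding.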
In the CerCo assembler, we opted at first for a least fixed point algorithm,
taking medium jumps into account.
Here, we ran into a problem with proving termination: whereas the SDCC
algorithm only switches jumps from short to long, when we add medium jumps,
it is theoretically possible for a jump to switch from medium to long and back,
as explained in the previous section.
Proving termination then becomes difficult, because there is nothing that
precludes a jump switching back and forth between medium and long indefinitely.
In fact, this mirrors the argument from~\cite{Szymanski1978}. There, it is
argued that for the problem to be NP-complete, it must be allowed to contain
{\em pathological} jumps. These are jumps that can normally not be encoded as a
short(er) jump, but gain this property when some other jumps are encoded as a
long(er) jump. This is exactly what happens in figure~\ref{f:term_example}: by
encoding the first jump as a long jump, another jump switches from long to
medium (which is shorter).
In order to keep the algorithm linear and more easily prove termination, we
decided to explicitly enforce the `jumps must always increase' requirement: if
a jump is encoded as a long jump in one step, it will also be encoded as a
long jump in all the following steps. This means that any jump can change at
most twice: once from short to medium (or long), and once from medium
to long.
There is one complicating factor: suppose that a jump is encoded in step $n$
as a medium jump, but in step $n+1$ it is determined that (because of changes
elsewhere) it can now be encoded as a short jump. Due to the requirement that
jumps must always increase, this means that the jump will be encoded as a
medium jump in step $n+1$ as well.
This is not necessarily correct, however: it is not the case that any short
jump can correctly be encoded as a medium jump (a short jump can bridge
segments, whereas a medium jump cannot). Therefore, in this situation
we decide to encode the jump as a long jump, which is always correct.
The resulting algorithm, while not optimal, is at least as good as the ones
from {\tt gcc} and SDCC, and potentially better. Its complexity remains
linear (though with a higher constant than SDCC).
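The monotone update rule, including the correction for medium jumps whose
distance shrinks, can be sketched as follows. The names and the numeric
ordering are our own illustration:

```python
# Monotone update of a jump's encoding under the 'jumps must always
# increase' rule: the new encoding is the maximum of the old encoding and
# the one the current distance calls for -- except that a jump previously
# encoded as medium whose distance now fits a short jump is promoted to
# long, since a medium jump (unlike a short one) cannot cross a segment
# boundary, so re-encoding it as a short-range medium jump may be unsafe.
SHORT, MEDIUM, LONG = 1, 2, 3  # ordered: short < medium < long

def update_encoding(old, wanted):
    if old == MEDIUM and wanted == SHORT:
        return LONG          # re-encoding medium as short is unsafe
    return max(old, wanted)  # otherwise, never decrease
```

Because the result never decreases, each jump changes at most twice, which
bounds the number of fixed point iterations.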
\subsection{The algorithm in detail}
The branch displacement algorithm forms part of the translation from
pseudo-code to assembler. More specifically, it is used by the function that
translates pseudo-addresses (natural numbers indicating the position of the
instruction in the program) to actual addresses in memory.
The original intention was to have two different functions, one function
$\mathtt{policy}: \mathbb{N} \rightarrow \{\mathtt{short}, \mathtt{medium},
\mathtt{long}\}$ to associate jumps to their intended translation, and a
function $\sigma: \mathbb{N} \rightarrow \mathtt{Word}$ to associate
pseudo-addresses to actual addresses. $\sigma$ would use $\mathtt{policy}$ to
determine the size of jump instructions.
This turned out to be suboptimal from the algorithmic point of view and
impossible to prove correct.
From the algorithmic point of view, in order to create the $\mathtt{policy}$
function, we must necessarily have a translation from pseudo-addresses
to actual addresses (i.e. a $\sigma$ function): in order to judge the distance
between a jump and its destination, we must know their memory locations.
Conversely, in order to create the $\sigma$ function, we need to have the
$\mathtt{policy}$ function, otherwise we do not know the sizes of the jump
instructions in the program.
Much the same problem appears when we try to prove the algorithm correct: the
correctness of $\mathtt{policy}$ depends on the correctness of $\sigma$, and
the correctness of $\sigma$ depends on the correctness of $\mathtt{policy}$.
We solved this problem by integrating the $\mathtt{policy}$ and $\sigma$
algorithms. We now have a function
$\sigma: \mathbb{N} \rightarrow \mathtt{Word} \times \mathtt{bool}$ which
associates a pseudo-address to an actual address. The boolean denotes a forced
long jump; as noted in the previous section, if during the fixed point
computation a medium jump needs to be re-encoded as a short jump, the result
is actually a long jump. It might therefore be the case that jumps are encoded
as long jumps without this actually being necessary.
The assembler function encodes the jumps by checking the distance between
source and destination according to $\sigma$, so it could select a medium
jump in a situation where there should be a long jump. The boolean is there
to prevent this from happening by indicating the locations where a long jump
should be encoded, even if a shorter jump is possible. This has no effect on
correctness, since a long jump is applicable in any situation.
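How the assembler might consult this boolean when encoding a jump can be
sketched as follows; the function name and the distance thresholds are
illustrative, not the actual CerCo code:

```python
# sigma maps a pseudo-address to (address, forced_long). When forced_long
# is set, a long jump is emitted even if the distance would allow a
# shorter encoding; this is always correct, since a long jump can reach
# any destination. Ranges below are assumed for illustration.
def encode_jump(src_addr, dst_addr, forced_long):
    if forced_long:
        return 'long_jump'
    d = abs(dst_addr - src_addr)
    if d <= 127:
        return 'short_jump'   # assumed short-jump range
    if d <= 2047:
        return 'medium_jump'  # assumed medium-jump range
    return 'long_jump'
```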
\begin{figure}
\begin{algorithmic}
\Function{f}{$labels$,$old\_sigma$,$instr$,$ppc$,$acc$}
\State $\langle added, pc, sigma \rangle \gets acc$
\If {$instr$ is a backward jump to $j$}
\State $length \gets \mathrm{jump\_size}(pc,sigma_1(labels(j)))$
\ElsIf {$instr$ is a forward jump to $j$}
\State $length \gets \mathrm{jump\_size}(pc,old\_sigma_1(labels(j))+added)$
\Else
\State $length \gets \mathtt{short\_jump}$
\EndIf
\State $old\_length \gets old\_sigma_2(ppc)$
\State $new\_length \gets \mathrm{max}(old\_length, length)$
\State $old\_size \gets \mathrm{instruction\_size}(instr,old\_length)$
\State $new\_size \gets \mathrm{instruction\_size}(instr,new\_length)$
\State $new\_added \gets added+(new\_size-old\_size)$
\State $new\_sigma_1(ppc+1) \gets pc+new\_size$
\State $new\_sigma_2(ppc) \gets new\_length$ \\
\Return $\langle new\_added, pc+new\_size, new\_sigma \rangle$
\EndFunction
\end{algorithmic}
\caption{The heart of the algorithm}
\label{f:jump_expansion_step}
\end{figure}
The algorithm, shown in figure~\ref{f:jump_expansion_step}, works by folding the
function {\sc f} over the entire program, thus gradually constructing $sigma$.
This constitutes one step in the fixed point calculation; successive steps
repeat the fold until a fixed point is reached.
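One such step can be rendered concretely in Python as follows. Here
$\mathrm{jump\_size}$ and $\mathrm{instruction\_size}$ are simplified
stand-ins for the real functions, the byte sizes and distance thresholds are
assumed, and the forced-long boolean discussed above is omitted for brevity:

```python
# A sketch of one fixed point step: folding f over the program.
# sigma is a dict mapping a pseudo-address to (address, jump_length).
SIZES = {'short_jump': 2, 'medium_jump': 3, 'long_jump': 5}  # assumed
ORDER = {'short_jump': 0, 'medium_jump': 1, 'long_jump': 2}

def jump_size(src, dst):
    d = abs(dst - src)
    if d <= 127:
        return 'short_jump'
    if d <= 2047:
        return 'medium_jump'
    return 'long_jump'

def instruction_size(instr, length):
    # jumps take the size of their encoding; other instructions carry
    # their own size in the program representation
    return SIZES[length] if instr[0] in ('bjump', 'fjump') else instr[1]

def f(labels, old_sigma, instr, ppc, acc):
    added, pc, sigma = acc
    if instr[0] == 'bjump':    # backward jump: target address already known
        length = jump_size(pc, sigma[labels[instr[1]]][0])
    elif instr[0] == 'fjump':  # forward jump: use old_sigma, plus 'added'
        length = jump_size(pc, old_sigma[labels[instr[1]]][0] + added)
    else:
        length = 'short_jump'
    old_length = old_sigma[ppc][1]
    new_length = max(old_length, length, key=ORDER.get)
    old_size = instruction_size(instr, old_length)
    new_size = instruction_size(instr, new_length)
    sigma[ppc + 1] = (pc + new_size, 'short_jump')  # address one slot ahead
    sigma[ppc] = (sigma[ppc][0], new_length)        # length at ppc itself
    return (added + (new_size - old_size), pc + new_size, sigma)

def one_step(labels, old_sigma, program):
    # fold f over the whole program: one iteration of the fixed point search
    acc = (0, 0, {0: (0, 'short_jump')})
    for ppc, instr in enumerate(program):
        acc = f(labels, old_sigma, instr, ppc, acc)
    return acc
```

Running `one_step` repeatedly, feeding each result back in as `old_sigma`,
reaches the fixed point; monotonicity of `new_length` guarantees termination.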
Parameters of the function {\sc f} are:
\begin{itemize}
\item a function $labels$ that associates a label to its pseudo-address;
\item $old\_sigma$, the $\sigma$ function returned by the previous
iteration of the fixed point calculation;
\item $instr$, the instruction currently under consideration;
\item $ppc$, the pseudo-address of $instr$;
\item $acc$, the fold accumulator, which contains $pc$ (the highest memory
address reached so far), $added$ (the number of bytes added to the program
size with respect to the previous iteration), and of course $sigma$, the
$\sigma$ function under construction.
\end{itemize}
The first two are parameters that remain the same throughout one iteration;
the last three are standard parameters for a fold function (including $ppc$,
which is simply the number of instructions of the program already processed).
The $\sigma$ functions used by {\sc f} are not of the same type as the final
$\sigma$ function: they are of type
$\sigma: \mathbb{N} \rightarrow \mathbb{N} \times \{\mathtt{short\_jump},
\mathtt{medium\_jump},\mathtt{long\_jump}\}$; a function that associates a
pseudo-address with a memory address and a jump length. We do this to be able
to compare jump lengths between iterations more easily. In the algorithm,
we use the notation $sigma_1(x)$ to denote the memory address corresponding to
$x$, and $sigma_2(x)$ to denote the jump length corresponding to $x$.
Note that the $\sigma$ function used for label lookup varies depending on
whether the label is behind our current position or ahead of it. For
backward jumps, where the label is behind our current position, we can use
$sigma$ for lookup, since the label's memory address is already known. However,
for forward jumps, the memory address of the label is not yet
known, so we must use $old\_sigma$.
We cannot use $old\_sigma$ unchanged, however: jumps we have already enlarged
in this iteration will have made the program longer. We must therefore
compensate by adding the size increase of the program ($added$) to the label's
memory address according to $old\_sigma$, so that jump distances are computed
correctly.
Note also that we add the pc to $sigma$ at location $ppc+1$, whereas we add the
jump length at location $ppc$. We do this so that $sigma(ppc)$ will always
return a couple with the start address of the instruction at $ppc$ and the
length of its jump; the end address of the program can be found at $sigma(n+1)$,
where $n$ is the number of instructions in the program.
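This off-by-one convention can be illustrated with a small sketch, using
0-based pseudo-addresses and illustrative instruction sizes:

```python
# sigma_1 is written at ppc+1 (the start address of the *next*
# instruction), so sigma[ppc] holds the start address of instruction ppc,
# and the final slot holds the end address of the program.
sizes = [1, 2, 1]   # instruction sizes in bytes (illustrative)
sigma = {0: 0}      # instruction 0 starts at address 0
pc = 0
for ppc, size in enumerate(sizes):
    pc += size
    sigma[ppc + 1] = pc   # address written one slot ahead
# sigma now maps pseudo-addresses 0, 1, 2 to start addresses 0, 1, 3,
# with the end address of the program in the final slot.
```

Pairing `sigma[ppc]` with the jump length stored at `ppc` then yields exactly
the couple (start address of the instruction, length of its jump).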