\section{Our algorithm}
\subsection{Design decisions}
Given the NP-completeness of the problem, to arrive at an optimal solution
(using, for example, a constraint solver) will potentially take a great amount
of time.
The SDCC compiler~\cite{SDCC2011}, which has a backend targetting the MCS-51
instruction set, simply encodes every branch instruction as a long jump
without taking the distance into account. While certainly correct (the long
jump can reach any destination in memory) and a very fast solution to compute,
it results in a less than optimal solution.
On the other hand, the {\tt gcc} compiler suite~\cite{GCC2012}, while compiling
C on the x86 architecture, uses a greatest fix point algorithm. In other words,
it starts with all branch instructions encoded as the largest jumps
available, and then tries to reduce the size of branch instructions as much as
possible.
Such an algorithm has the advantage that any intermediate result it returns
is correct: the solution where every branch instruction is encoded as a large
jump is always possible, and the algorithm only reduces those branch
instructions whose destination address is in range for a shorter jump.
The algorithm can thus be stopped after a determined number of steps without
sacrificing correctness.
The result, however, is not necessarily optimal. Even if the algorithm is run
until it terminates naturally, the fixed point reached is the {\em greatest}
fixed point, not the least fixed point. Furthermore, {\tt gcc} (at least for
the x86 architecture) only uses short and long jumps. This makes the algorithm
more efficient, as shown in the previous section, but also results in a less
optimal solution.
In the CerCo assembler, we opted at first for a least fixed point algorithm,
taking absolute jumps into account.
Here, we ran into a problem with proving termination, as explained in the
previous section: if we only take short and long jumps into account, the jump
encoding can only switch from short to long, but never in the other direction.
When we add absolute jumps, however, it is theoretically possible for a branch
instruction to switch from absolute to long and back, as previously explained.
Proving termination then becomes difficult, because there is nothing that
precludes a branch instruction from oscillating back and forth between absolute
and long jumps indefinitely.
In order to keep the algorithm in the same complexity class and more easily
prove termination, we decided to explicitly enforce the `branch instructions
must always grow longer' requirement: if a branch instruction is encoded as a
long jump in one iteration, it will also be encoded as a long jump in all the
following iterations. This means that the encoding of any branch instruction
can change at most two times: once from short to absolute (or long), and once
from absolute to long.
There is one complicating factor. Suppose that a branch instruction is encoded
in step $n$ as an absolute jump, but in step $n+1$ it is determined that
(because of changes elsewhere) it can now be encoded as a short jump. Due to
the requirement that the branch instructions must always grow longer, this
means that the branch encoding will be encoded as an absolute jump in step
$n+1$ as well.
This is not necessarily correct. A branch instruction that can be
encoded as a short jump cannot always also be encoded as an absolute jump, as a
short jump can bridge segments, whereas an absolute jump cannot. Therefore,
in this situation we have decided to encode the branch instruction as a long
jump, which is always correct.
The resulting algorithm, while not optimal, is at least as good as the ones
from SDCC and {\tt gcc}, and potentially better. Its complexity remains
the same (there are at most $2n$ iterations, where $n$ is the number of branch
instructions in the program).
\subsection{The algorithm in detail}
The branch displacement algorithm forms part of the translation from
pseudocode to assembler. More specifically, it is used by the function that
translates pseudo-addresses (natural numbers indicating the position of the
instruction in the program) to actual addresses in memory.
Our original intention was to have two different functions, one function
$\mathtt{policy}: \mathbb{N} \rightarrow \{\mathtt{short\_jump},
\mathtt{absolute\_jump}, \mathtt{long\_jump}\}$ to associate jumps to their
intended encoding, and a function $\sigma: \mathbb{N} \rightarrow
\mathtt{Word}$ to associate pseudo-addresses to machine addresses. $\sigma$
would use $\mathtt{policy}$ to determine the size of jump instructions.
This turned out to be suboptimal from the algorithmic point of view and
impossible to prove correct.
From the algorithmic point of view, in order to create the $\mathtt{policy}$
function, we must necessarily have a translation from pseudo-addresses
to machine addresses (i.e. a $\sigma$ function): in order to judge the distance
between a jump and its destination, we must know their memory locations.
Conversely, in order to create the $\sigma$ function, we need to have the
$\mathtt{policy}$ function, otherwise we do not know the sizes of the jump
instructions in the program.
Much the same problem appears when we try to prove the algorithm correct: the
correctness of $\mathtt{policy}$ depends on the correctness of $\sigma$, and
the correctness of $\sigma$ depends on the correctness of $\mathtt{policy}$.
We solved this problem by integrating the $\mathtt{policy}$ and $\sigma$
algorithms. We now have a function
$\sigma: \mathbb{N} \rightarrow \mathtt{Word} \times \mathtt{bool}$ which
associates a pseudo-address to a machine address. The boolean denotes a forced
long jump; as noted in the previous section, if during the fixed point
computation an absolute jump changes to be potentially re-encoded as a short
jump, the result is actually a long jump. It might therefore be the case that
jumps are encoded as long jumps without this actually being necessary, and this
information needs to be passed to the code generating function.
The assembler function encodes the jumps by checking the distance between
source and destination according to $\sigma$, so it could select an absolute
jump in a situation where there should be a long jump. The boolean is there
to prevent this from happening by indicating the locations where a long jump
should be encoded, even if a shorter jump is possible. This has no effect on
correctness, since a long jump is applicable in any situation.
\begin{figure}
\begin{algorithmic}
\Function{f}{$labels$,$old\_sigma$,$instr$,$ppc$,$acc$}
\State $\langle added, pc, sigma \rangle \gets acc$
\If {$instr$ is a backward jump to $j$}
\State $length \gets \mathrm{jump\_size}(pc,sigma_1(labels(j)))$
\ElsIf {$instr$ is a forward jump to $j$}
\State $length \gets \mathrm{jump\_size}(pc,old\_sigma_1(labels(j))+added)$
\Else
\State $length \gets \mathtt{short\_jump}$
\EndIf
\State $old\_length \gets \mathrm{old\_sigma_1}(ppc)$
\State $new\_length \gets \mathrm{max}(old\_length, length)$
\State $old\_size \gets \mathrm{old\_sigma_2}(ppc)$
\State $new\_size \gets \mathrm{instruction\_size}(instr,new\_length)$
\State $new\_added \gets added+(new\_size-old\_size)$
\State $new\_sigma_1(ppc+1) \gets pc+new\_size$
\State $new\_sigma_2(ppc) \gets new\_length$ \\
\Return $\langle new\_added, pc+new\_size, new\_sigma \rangle$
\EndFunction
\end{algorithmic}
\caption{The heart of the algorithm}
\label{f:jump_expansion_step}
\end{figure}
The algorithm, shown in Figure~\ref{f:jump_expansion_step}, works by folding the
function {\sc f} over the entire program, thus gradually constructing $sigma$.
This constitutes one step in the fixed point calculation; successive steps
repeat the fold until a fixed point is reached.
Parameters of the function {\sc f} are:
\begin{itemize}
\item a function $labels$ that associates a label to its pseudo-address;
\item $old\_sigma$, the $\sigma$ function returned by the previous
iteration of the fixed point calculation;
\item $instr$, the instruction currently under consideration;
\item $ppc$, the pseudo-address of $instr$;
\item $acc$, the fold accumulator, which contains $pc$ (the highest memory
address reached so far), $added$ (the number of bytes added to the program
size with respect to the previous iteration), and of course $sigma$, the
$\sigma$ function under construction.
\end{itemize}
The first two are parameters that remain the same through one iteration, the
final three are standard parameters for a fold function (including $ppc$,
which is simply the number of instructions of the program already processed).
The $\sigma$ functions used by {\sc f} are not of the same type as the final
$\sigma$ function: they are of type
$\sigma: \mathbb{N} \rightarrow \mathbb{N} \times \{\mathtt{short\_jump},
\mathtt{absolute\_jump},\mathtt{long\_jump}\}$; a function that associates a
pseudo-address with a memory address and a jump length. We do this to be able
to ease the comparison of jump lengths between iterations. In the algorithm,
we use the notation $sigma_1(x)$ to denote the memory address corresponding to
$x$, and $sigma_2(x)$ to denote the jump length corresponding to $x$.
Note that the $\sigma$ function used for label lookup varies depending on
whether the label is behind our current position or ahead of it. For
backward branches, where the label is behind our current position, we can use
$sigma$ for lookup, since its memory address is already known. However, for
forward branches, the memory address of the address of the label is not yet
known, so we must use $old\_sigma$.
We cannot use $old\_sigma$ without change: it might be the case that we have
already increased the size of some branch instructions before, making the program
longer and moving every instruction forward. We must compensate for this by
adding the size increase of the program to the label's memory address according
to $old\_sigma$, so that branch instruction spans do not get compromised.
Note also that we add the pc to $sigma$ at location $ppc+1$, whereas we add the
jump length at location $ppc$. We do this so that $sigma(ppc)$ will always
return a pair with the start address of the instruction at $ppc$ and the
length of its branch instruction (if any); the end address of the program can
be found at $sigma(n+1)$, where $n$ is the number of instructions in the
program.