Index: /src/ASM/CPP2012-policy/algorithm.tex
===================================================================
--- /src/ASM/CPP2012-policy/algorithm.tex (revision 2048)
+++ /src/ASM/CPP2012-policy/algorithm.tex (revision 2049)
@@ -1,1 +1,76 @@
\section{Our algorithm}
+
+\subsection{Design decisions}
+
+Given the NP-completeness of the problem, to arrive at an optimal solution
+within a short space of time (using, for example, a constraint solver) will
+potentially take a great amount of time.
+
+The {\tt gcc} compiler suite, for the x86 architecture, uses a greatest fix
+point algorithm. In other words, it starts off with all jumps encoded as the
+largest jumps possible, and then tries to reduce jumps as much as possible.
+
+Such an algorithm has the advantage that any intermediate results it returns
+are correct: the solution where every jump is encoded as a large jump is
+always possible, and the algorithm only reduces those jumps where the
+destination address is in range for a shorter jump instruction. The algorithm
+can thus be stopped after a determined amount of steps without losing
+correctness.
+
+The result, however, is not necessarily optimal, even if the algorithm is run
+until it terminates naturally: the fixed point reached is the {\em greatest}
+fixed point, not the least fixed point.
+
+The SDCC compiler, which has the MCS-51 among its target instruction sets, uses
+a least fix point algorithm, but does not take the presence of medium jumps
+into account. This makes the algorithm run in linear time with respect to the
+number of jumps in the program: starting out with every jump encoded as a
+short jump, each jump can be switched to a long jump once, but no jump ever
+goes back from long to short. This algorithm must be run until a fixed point
+is reached, because the intermediate solutions are not necessarily correct.
+
+This algorithm results in a least fixed point, which means its solution is
+potentially more optimal then the one reached by the {\tt gcc} algorithm.
+However, the solution is still not optimal, since there might be jumps whose
+destination is in the same segment. These jumps could be encoded as medium
+jumps, which are smaller than long jumps.
+
+Our first attempt at an algorithm was a least fixed point algorithm that took
+medium jumps into account.
+
+Here, we ran into a problem with proving termination: whereas the SDCC
+algorithm only switches jumps from short to long, when we add medium jumps,
+it is theoretically possible for a jump to switch from medium to long and back,
+as explained in the previous section.
+
+Proving termination then becomes difficult, because there is nothing that
+precludes a jump switching back and forth between medium and long indefinitely.
+
+In fact, this mirrors the argument from~\cite{Szymanski1978}. There, it is
+argued that for the problem to be NP-complete, it must be allowed to contain
+{\em pathological} jumps. These are jumps that can normally not be encoded as a
+short(er) jump, but gain this property when some other jumps are encoded as a
+long(er) jump. This is exactly what happens in figure~\ref{f:term_example}: by
+encoding the first jump as a long jump, another jump switches from long to
+medium (which is shorter).
+
+In order to keep the algorithm linear and more easily prove termination, we
+decided to explicitly enforce the `jumps must always increase' requirement: if
+a jump is encoded as a long jump in one step, it will also be encoded as a
+long jump in all the following steps. This means that any jump can change at
+maximum two times: once from short to medium (or long), and once from medium
+to long.
+
+There is one complicating factor: suppose that a jump is encoded in step $n$
+as a medium jump, but in step $n+1$ it is determined that (because of changes
+elsewhere) it can now be encoded as a short jump. Due to the requirement that
+jumps must always increase, this means that the jump will be encoded as a
+medium jump in step $n+1$ as well.
+
+This is not necessarily correct, however: it is not the case that any short
+jump can correctly be encoded as a medium jump. Therefore, in this situation
+we decide to encode the jump as a long jump, which is always correct.
+
+The resulting algorithm, while not optimal, is at least as good as the ones
+from {\tt gcc} and SDCC, and potentially better. Its complexity remains
+linear (though with a higher constant than SDCC).
Index: /src/ASM/CPP2012-policy/problem.tex
===================================================================
--- /src/ASM/CPP2012-policy/problem.tex (revision 2048)
+++ /src/ASM/CPP2012-policy/problem.tex (revision 2049)
@@ -9,6 +9,6 @@
address. Mostly this occurs with jump instructions; for example, in the
x86-64 instruction set there are eleven different forms of the unconditional
-jump instruction, all with different ranges and semantics (only six are valid
-in 64-bit mode, for example). Some examples are shown in
+jump instruction, all with different ranges, instruction sizes and semantics
+(only six are valid in 64-bit mode, for example). Some examples are shown in
figure~\ref{f:x86jumps}:
@@ -75,8 +75,7 @@
\subsection*{Adding medium jumps}
-In both the canonical solutions presented, the encoding of a jump is only
-dependent on the distance between the jump and its target: below a certain
-value a short jump can be used; above this value the jump must be encoded
-as a long jump.
+In both papers mentioned above, the encoding of a jump is only dependent on the
+distance between the jump and its target: below a certain value a short jump
+can be used; above this value the jump must be encoded as a long jump.
Here, termination of the smallest fixpoint algorithm is easy to prove. All