1 | \section{Our algorithm} |
---|
2 | |
---|
3 | \subsection{Design decisions} |
---|
4 | |
---|
5 | Given the NP-completeness of the problem, to arrive at an optimal solution |
---|
6 | within a short space of time (using, for example, a constraint solver) will |
---|
7 | potentially take a great amount of time. |
---|
8 | |
---|
9 | The {\tt gcc} compiler suite, for the x86 architecture, uses a greatest fix |
---|
10 | point algorithm. In other words, it starts off with all jumps encoded as the |
---|
11 | largest jumps possible, and then tries to reduce jumps as much as possible. |
---|
12 | |
---|
13 | Such an algorithm has the advantage that any intermediate results it returns |
---|
14 | are correct: the solution where every jump is encoded as a large jump is |
---|
15 | always possible, and the algorithm only reduces those jumps where the |
---|
16 | destination address is in range for a shorter jump instruction. The algorithm |
---|
17 | can thus be stopped after a determined amount of steps without losing |
---|
18 | correctness. |
---|
19 | |
---|
20 | The result, however, is not necessarily optimal, even if the algorithm is run |
---|
21 | until it terminates naturally: the fixed point reached is the {\em greatest} |
---|
22 | fixed point, not the least fixed point. |
---|
23 | |
---|
24 | The SDCC compiler, which has the MCS-51 among its target instruction sets, uses |
---|
25 | a least fix point algorithm, but does not take the presence of medium jumps |
---|
26 | into account. This makes the algorithm run in linear time with respect to the |
---|
27 | number of jumps in the program: starting out with every jump encoded as a |
---|
28 | short jump, each jump can be switched to a long jump once, but no jump ever |
---|
29 | goes back from long to short. This algorithm must be run until a fixed point |
---|
30 | is reached, because the intermediate solutions are not necessarily correct. |
---|
31 | |
---|
32 | This algorithm results in a least fixed point, which means its solution is |
---|
33 | potentially more optimal then the one reached by the {\tt gcc} algorithm. |
---|
34 | However, the solution is still not optimal, since there might be jumps whose |
---|
35 | destination is in the same segment. These jumps could be encoded as medium |
---|
36 | jumps, which are smaller than long jumps. |
---|
37 | |
---|
38 | Our first attempt at an algorithm was a least fixed point algorithm that took |
---|
39 | medium jumps into account. |
---|
40 | |
---|
41 | Here, we ran into a problem with proving termination: whereas the SDCC |
---|
42 | algorithm only switches jumps from short to long, when we add medium jumps, |
---|
43 | it is theoretically possible for a jump to switch from medium to long and back, |
---|
44 | as explained in the previous section. |
---|
45 | |
---|
46 | Proving termination then becomes difficult, because there is nothing that |
---|
47 | precludes a jump switching back and forth between medium and long indefinitely. |
---|
48 | |
---|
49 | In fact, this mirrors the argument from~\cite{Szymanski1978}. There, it is |
---|
50 | argued that for the problem to be NP-complete, it must be allowed to contain |
---|
51 | {\em pathological} jumps. These are jumps that can normally not be encoded as a |
---|
52 | short(er) jump, but gain this property when some other jumps are encoded as a |
---|
53 | long(er) jump. This is exactly what happens in figure~\ref{f:term_example}: by |
---|
54 | encoding the first jump as a long jump, another jump switches from long to |
---|
55 | medium (which is shorter). |
---|
56 | |
---|
57 | In order to keep the algorithm linear and more easily prove termination, we |
---|
58 | decided to explicitly enforce the `jumps must always increase' requirement: if |
---|
59 | a jump is encoded as a long jump in one step, it will also be encoded as a |
---|
60 | long jump in all the following steps. This means that any jump can change at |
---|
61 | maximum two times: once from short to medium (or long), and once from medium |
---|
62 | to long. |
---|
63 | |
---|
64 | There is one complicating factor: suppose that a jump is encoded in step $n$ |
---|
65 | as a medium jump, but in step $n+1$ it is determined that (because of changes |
---|
66 | elsewhere) it can now be encoded as a short jump. Due to the requirement that |
---|
67 | jumps must always increase, this means that the jump will be encoded as a |
---|
68 | medium jump in step $n+1$ as well. |
---|
69 | |
---|
70 | This is not necessarily correct, however: it is not the case that any short |
---|
71 | jump can correctly be encoded as a medium jump. Therefore, in this situation |
---|
72 | we decide to encode the jump as a long jump, which is always correct. |
---|
73 | |
---|
74 | The resulting algorithm, while not optimal, is at least as good as the ones |
---|
75 | from {\tt gcc} and SDCC, and potentially better. Its complexity remains |
---|
76 | linear (though with a higher constant than SDCC). |
---|