Changeset 2049

Jun 12, 2012, 3:49:27 PM (5 years ago)
  • progress
2 edited


  • src/ASM/CPP2012-policy/algorithm.tex

    r1889 r2049  
    11\section{Our algorithm}
     3\subsection{Design decisions}
     5Given the NP-completeness of the problem, to arrive at an optimal solution
     6within a short space of time (using, for example, a constraint solver) will
     7potentially take a great amount of time.
     9The {\tt gcc} compiler suite, for the x86 architecture, uses a greatest fix
     10point algorithm. In other words, it starts off with all jumps encoded as the
     11largest jumps possible, and then tries to reduce jumps as much as possible.
     13Such an algorithm has the advantage that any intermediate results it returns
     14are correct: the solution where every jump is encoded as a large jump is
     15always possible, and the algorithm only reduces those jumps where the
     16destination address is in range for a shorter jump instruction. The algorithm
     17can thus be stopped after a determined amount of steps without losing
     20The result, however, is not necessarily optimal, even if the algorithm is run
     21until it terminates naturally: the fixed point reached is the {\em greatest}
     22fixed point, not the least fixed point.
     24The SDCC compiler, which has the MCS-51 among its target instruction sets, uses
     25a least fix point algorithm, but does not take the presence of medium jumps
     26into account. This makes the algorithm run in linear time with respect to the
     27number of jumps in the program: starting out with every jump encoded as a
     28short jump, each jump can be switched to a long jump once, but no jump ever
     29goes back from long to short. This algorithm must be run until a fixed point
     30is reached, because the intermediate solutions are not necessarily correct.
     32This algorithm results in a least fixed point, which means its solution is
     33potentially more optimal then the one reached by the {\tt gcc} algorithm.
     34However, the solution is still not optimal, since there might be jumps whose
     35destination is in the same segment. These jumps could be encoded as medium
     36jumps, which are smaller than long jumps.
     38Our first attempt at an algorithm was a least fixed point algorithm that took
     39medium jumps into account.
     41Here, we ran into a problem with proving termination: whereas the SDCC
     42algorithm only switches jumps from short to long, when we add medium jumps,
     43it is theoretically possible for a jump to switch from medium to long and back,
     44as explained in the previous section.
     46Proving termination then becomes difficult, because there is nothing that
     47precludes a jump switching back and forth between medium and long indefinitely.
     49In fact, this mirrors the argument from~\cite{Szymanski1978}. There, it is
     50argued that for the problem to be NP-complete, it must be allowed to contain
     51{\em pathological} jumps. These are jumps that can normally not be encoded as a
     52short(er) jump, but gain this property when some other jumps are encoded as a
     53long(er) jump. This is exactly what happens in figure~\ref{f:term_example}: by
     54encoding the first jump as a long jump, another jump switches from long to
     55medium (which is shorter).
     57In order to keep the algorithm linear and more easily prove termination, we
     58decided to explicitly enforce the `jumps must always increase' requirement: if
     59a jump is encoded as a long jump in one step, it will also be encoded as a
     60long jump in all the following steps. This means that any jump can change at
     61maximum two times: once from short to medium (or long), and once from medium
     62to long.
     64There is one complicating factor: suppose that a jump is encoded in step $n$
     65as a medium jump, but in step $n+1$ it is determined that (because of changes
     66elsewhere) it can now be encoded as a short jump. Due to the requirement that
     67jumps must always increase, this means that the jump will be encoded as a
     68medium jump in step $n+1$ as well.
     70This is not necessarily correct, however: it is not the case that any short
     71jump can correctly be encoded as a medium jump. Therefore, in this situation
     72we decide to encode the jump as a long jump, which is always correct.
     74The resulting algorithm, while not optimal, is at least as good as the ones
     75from {\tt gcc} and SDCC, and potentially better. Its complexity remains
     76linear (though with a higher constant than SDCC).
  • src/ASM/CPP2012-policy/problem.tex

    r1889 r2049  
    99address. Mostly this occurs with jump instructions; for example, in the
    1010x86-64 instruction set there are eleven different forms of the unconditional
    11 jump instruction, all with different ranges and semantics (only six are valid
    12 in 64-bit mode, for example). Some examples are shown in
     11jump instruction, all with different ranges, instruction sizes and semantics
     12(only six are valid in 64-bit mode, for example). Some examples are shown in
    7575\subsection*{Adding medium jumps}
    77 In both the canonical solutions presented, the encoding of a jump is only
    78 dependent on the distance between the jump and its target: below a certain
    79 value a short jump can be used; above this value the jump must be encoded
    80 as a long jump.
     77In both papers mentioned above, the encoding of a jump is only dependent on the
     78distance between the jump and its target: below a certain value a short jump
     79can be used; above this value the jump must be encoded as a long jump.
    8281Here, termination of the smallest fixpoint algorithm is easy to prove. All
Note: See TracChangeset for help on using the changeset viewer.