Ignore:
Timestamp:
Jun 15, 2012, 1:35:46 PM (7 years ago)
Author:
boender
Message:
  • systematically changed 'jump' to 'branch'
File:
1 edited

Legend:

Unmodified
Added
Removed
  • src/ASM/CPP2012-policy/algorithm.tex

    r2086 r2091  
    88
    99The SDCC compiler~\cite{SDCC2011}, which has the MCS-51 among its target
    10 instruction sets, simply encodes every jump as a long jump without taking the
    11 distance into account. While certainly correct (the long jump can reach any
    12 destination in memory) and rapid, it does result in a less than optimal
    13 solution.
     10instruction sets, simply encodes every branch instruction as a long jump
     11without taking the distance into account. While certainly correct (the long
     12jump can reach any destination in memory) and rapid, it does result in a less
     13than optimal solution.
    1414
    1515The {\tt gcc} compiler suite~\cite{GCC2012}, while compiling C on the x86
    1616architecture, uses a greatest fix point algorithm. In other words, it starts
    17 off with all jumps encoded as the largest jumps available, and then tries to
    18 reduce jumps as much as possible.
     17off with all branch instructions encoded as the largest jumps available, and
     18then tries to reduce branch instructions as much as possible.
    1919
    20 Such an algorithm has the advantage that any intermediate results it returns
    21 are correct: the solution where every jump is encoded as a large jump is
    22 always possible, and the algorithm only reduces those jumps where the
    23 destination address is in range for a shorter jump instruction. The algorithm
    24 can thus be stopped after a determined amount of steps without losing
    25 correctness.
     20Such an algorithm has the advantage that any intermediate result it returns
     21is correct: the solution where every branch instruction is encoded as a large
     22jump is always possible, and the algorithm only reduces those branch
     23instructions whose destination address is in range for a shorter jump.
     24The algorithm can thus be stopped after a determined amount of steps without
     25losing correctness.
    2626
    2727The result, however, is not necessarily optimal, even if the algorithm is run
     
    3838previous section: if we only take short and long jumps into account, the jump
    3939encoding can only switch from short to long, but never in the other direction.
    40 When we add absolute jumps, however, it is theoretically possible for a jump to
    41 switch from absolute to long and back, as explained in the previous section.
     40When we add absolute jumps, however, it is theoretically possible for a branch
     41instruction to switch from absolute to long and back, as shown in the previous
     42section.
    4243
    4344Proving termination then becomes difficult, because there is nothing that
    44 precludes a jump from switching back and forth between absolute and long
    45 indefinitely.
     45precludes a branch instruction from switching back and forth between absolute
     46and long indefinitely.
    4647
    4748In order to keep the algorithm in the same complexity class and more easily
    48 prove termination, we decided to explicitly enforce the `jumps must always
    49 increase' requirement: if a jump is encoded as a long jump in one step, it will
    50 also be encoded as a long jump in all the following steps. This means that any
    51 jump can change at most two times: once from short to absolute (or long), and
    52 once from absolute to long.
     49prove termination, we decided to explicitly enforce the `branch instructions
     50must always grow longer' requirement: if a branch instruction is encoded as a
     51long jump in one iteration, it will also be encoded as a long jump in all the
     52following iterations. This means that the encoding of any branch instruction
     53can change at most two times: once from short to absolute (or long), and once
     54from absolute to long.
    5355
    54 There is one complicating factor: suppose that a jump is encoded in step $n$
    55 as an absolute jump, but in step $n+1$ it is determined that (because of changes
    56 elsewhere) it can now be encoded as a short jump. Due to the requirement that
    57 jumps must always increase, this means that the jump will be encoded as an
    58 absolute jump in step $n+1$ as well.
     56There is one complicating factor: suppose that a branch instruction is encoded
     57in step $n$ as an absolute jump, but in step $n+1$ it is determined that
     58(because of changes elsewhere) it can now be encoded as a short jump. Due to
     59the requirement that the branch instructions must always grow longer, this
     60means that the branch encoding will be encoded as an absolute jump in step
     61$n+1$ as well.
    5962
    60 This is not necessarily correct, however: it is not the case that any short
    61 jump can correctly be encoded as an absolute jump (a short jump can bridge
    62 segments, whereas an absolute jump cannot). Therefore, in this situation
    63 we decide to encode the jump as a long jump, which is always correct.
     63This is not necessarily correct, however: a branch instruction that can be
     64encoded as a short jump cannot always also be encoded as an absolute jump
     65(a short jump can bridge segments, whereas an absolute jump cannot). Therefore,
     66in this situation we decide to encode the branch instruction as a long jump,
     67which is always correct.
    6468
    6569The resulting algorithm, while not optimal, is at least as good as the ones
    6670from SDCC and {\tt gcc}, and potentially better. Its complexity remains
    67 the same (there are at most $2n$ iterations, where $n$ is the number of jumps
    68 in the program).
     71the same (there are at most $2n$ iterations, where $n$ is the number of branch
     72instructions in the program).
    6973
    7074\subsection{The algorithm in detail}
     
    172176Note that the $\sigma$ function used for label lookup varies depending on
    173177whether the label is behind our current position or ahead of it. For
    174 backward jumps, where the label is behind our current position, we can use
     178backward branches, where the label is behind our current position, we can use
    175179$sigma$ for lookup, since its memory address is already known. However, for
    176 forward jumps, the memory address of the address of the label is not yet
     180forward branches, the memory address of the address of the label is not yet
    177181known, so we must use $old\_sigma$.
    178182
    179183We cannot use $old\_sigma$ without change: it might be the case that we have
    180 already changed some jumps before, making the program longer. We must therefore
    181 compensate for this by adding the size increase of the program to the label's
    182 memory address according to $old\_sigma$, so that jump distances do not get
    183 compromised.
     184already increased the size some branch instructions before, making the program
     185longer and moving every instruction forward. We must compensate for this by
     186adding the size increase of the program to the label's memory address according
     187to $old\_sigma$, so that branch instruction spans do not get compromised.
    184188
    185189Note also that we add the pc to $sigma$ at location $ppc+1$, whereas we add the
    186190jump length at location $ppc$. We do this so that $sigma(ppc)$ will always
    187191return a couple with the start address of the instruction at $ppc$ and the
    188 length of its jump; the end address of the program can be found at $sigma(n+1)$,
    189 where $n$ is the number of instructions in the program.
     192length of its branch instruction (if any); the end address of the program can
     193be found at $sigma(n+1)$, where $n$ is the number of instructions in the
     194program.
Note: See TracChangeset for help on using the changeset viewer.