# Changeset 2091 for src/ASM/CPP2012-policy/algorithm.tex

Ignore:
Timestamp:
Jun 15, 2012, 1:35:46 PM (7 years ago)
Message:
• systematically changed 'jump' to 'branch'
File:
1 edited

### Legend:

Unmodified
 r2086 The SDCC compiler~\cite{SDCC2011}, which has the MCS-51 among its target instruction sets, simply encodes every jump as a long jump without taking the distance into account. While certainly correct (the long jump can reach any destination in memory) and rapid, it does result in a less than optimal solution. instruction sets, simply encodes every branch instruction as a long jump without taking the distance into account. While certainly correct (the long jump can reach any destination in memory) and rapid, it does result in a less than optimal solution. The {\tt gcc} compiler suite~\cite{GCC2012}, while compiling C on the x86 architecture, uses a greatest fix point algorithm. In other words, it starts off with all jumps encoded as the largest jumps available, and then tries to reduce jumps as much as possible. off with all branch instructions encoded as the largest jumps available, and then tries to reduce branch instructions as much as possible. Such an algorithm has the advantage that any intermediate results it returns are correct: the solution where every jump is encoded as a large jump is always possible, and the algorithm only reduces those jumps where the destination address is in range for a shorter jump instruction. The algorithm can thus be stopped after a determined amount of steps without losing correctness. Such an algorithm has the advantage that any intermediate result it returns is correct: the solution where every branch instruction is encoded as a large jump is always possible, and the algorithm only reduces those branch instructions whose destination address is in range for a shorter jump. The algorithm can thus be stopped after a determined amount of steps without losing correctness. The result, however, is not necessarily optimal, even if the algorithm is run previous section: if we only take short and long jumps into account, the jump encoding can only switch from short to long, but never in the other direction. When we add absolute jumps, however, it is theoretically possible for a jump to switch from absolute to long and back, as explained in the previous section. When we add absolute jumps, however, it is theoretically possible for a branch instruction to switch from absolute to long and back, as shown in the previous section. Proving termination then becomes difficult, because there is nothing that precludes a jump from switching back and forth between absolute and long indefinitely. precludes a branch instruction from switching back and forth between absolute and long indefinitely. In order to keep the algorithm in the same complexity class and more easily prove termination, we decided to explicitly enforce the jumps must always increase' requirement: if a jump is encoded as a long jump in one step, it will also be encoded as a long jump in all the following steps. This means that any jump can change at most two times: once from short to absolute (or long), and once from absolute to long. prove termination, we decided to explicitly enforce the branch instructions must always grow longer' requirement: if a branch instruction is encoded as a long jump in one iteration, it will also be encoded as a long jump in all the following iterations. This means that the encoding of any branch instruction can change at most two times: once from short to absolute (or long), and once from absolute to long. There is one complicating factor: suppose that a jump is encoded in step $n$ as an absolute jump, but in step $n+1$ it is determined that (because of changes elsewhere) it can now be encoded as a short jump. Due to the requirement that jumps must always increase, this means that the jump will be encoded as an absolute jump in step $n+1$ as well. There is one complicating factor: suppose that a branch instruction is encoded in step $n$ as an absolute jump, but in step $n+1$ it is determined that (because of changes elsewhere) it can now be encoded as a short jump. Due to the requirement that the branch instructions must always grow longer, this means that the branch encoding will be encoded as an absolute jump in step $n+1$ as well. This is not necessarily correct, however: it is not the case that any short jump can correctly be encoded as an absolute jump (a short jump can bridge segments, whereas an absolute jump cannot). Therefore, in this situation we decide to encode the jump as a long jump, which is always correct. This is not necessarily correct, however: a branch instruction that can be encoded as a short jump cannot always also be encoded as an absolute jump (a short jump can bridge segments, whereas an absolute jump cannot). Therefore, in this situation we decide to encode the branch instruction as a long jump, which is always correct. The resulting algorithm, while not optimal, is at least as good as the ones from SDCC and {\tt gcc}, and potentially better. Its complexity remains the same (there are at most $2n$ iterations, where $n$ is the number of jumps in the program). the same (there are at most $2n$ iterations, where $n$ is the number of branch instructions in the program). \subsection{The algorithm in detail} Note that the $\sigma$ function used for label lookup varies depending on whether the label is behind our current position or ahead of it. For backward jumps, where the label is behind our current position, we can use backward branches, where the label is behind our current position, we can use $sigma$ for lookup, since its memory address is already known. However, for forward jumps, the memory address of the address of the label is not yet forward branches, the memory address of the address of the label is not yet known, so we must use $old\_sigma$. We cannot use $old\_sigma$ without change: it might be the case that we have already changed some jumps before, making the program longer. We must therefore compensate for this by adding the size increase of the program to the label's memory address according to $old\_sigma$, so that jump distances do not get compromised. already increased the size some branch instructions before, making the program longer and moving every instruction forward. We must compensate for this by adding the size increase of the program to the label's memory address according to $old\_sigma$, so that branch instruction spans do not get compromised. Note also that we add the pc to $sigma$ at location $ppc+1$, whereas we add the jump length at location $ppc$. We do this so that $sigma(ppc)$ will always return a couple with the start address of the instruction at $ppc$ and the length of its jump; the end address of the program can be found at $sigma(n+1)$, where $n$ is the number of instructions in the program. length of its branch instruction (if any); the end address of the program can be found at $sigma(n+1)$, where $n$ is the number of instructions in the program.