Changeset 2091 for src/ASM/CPP2012policy/algorithm.tex
 Timestamp:
 Jun 15, 2012, 1:35:46 PM (8 years ago)
 File:

 1 edited
Legend:
 Unmodified
 Added
 Removed

src/ASM/CPP2012policy/algorithm.tex
r2086 r2091 8 8 9 9 The SDCC compiler~\cite{SDCC2011}, which has the MCS51 among its target 10 instruction sets, simply encodes every jump as a long jump without taking the11 distance into account. While certainly correct (the long jump can reach any 12 destination in memory) and rapid, it does result in a less than optimal 13 solution.10 instruction sets, simply encodes every branch instruction as a long jump 11 without taking the distance into account. While certainly correct (the long 12 jump can reach any destination in memory) and rapid, it does result in a less 13 than optimal solution. 14 14 15 15 The {\tt gcc} compiler suite~\cite{GCC2012}, while compiling C on the x86 16 16 architecture, uses a greatest fix point algorithm. In other words, it starts 17 off with all jumps encoded as the largest jumps available, and then tries to18 reduce jumps as much as possible.17 off with all branch instructions encoded as the largest jumps available, and 18 then tries to reduce branch instructions as much as possible. 19 19 20 Such an algorithm has the advantage that any intermediate result sit returns21 are correct: the solution where every jump is encoded as a large jump is 22 always possible, and the algorithm only reduces those jumps where the 23 destination address is in range for a shorter jump instruction. The algorithm 24 can thus be stopped after a determined amount of steps without losing 25 correctness.20 Such an algorithm has the advantage that any intermediate result it returns 21 is correct: the solution where every branch instruction is encoded as a large 22 jump is always possible, and the algorithm only reduces those branch 23 instructions whose destination address is in range for a shorter jump. 24 The algorithm can thus be stopped after a determined amount of steps without 25 losing correctness. 26 26 27 27 The result, however, is not necessarily optimal, even if the algorithm is run … … 38 38 previous section: if we only take short and long jumps into account, the jump 39 39 encoding can only switch from short to long, but never in the other direction. 40 When we add absolute jumps, however, it is theoretically possible for a jump to 41 switch from absolute to long and back, as explained in the previous section. 40 When we add absolute jumps, however, it is theoretically possible for a branch 41 instruction to switch from absolute to long and back, as shown in the previous 42 section. 42 43 43 44 Proving termination then becomes difficult, because there is nothing that 44 precludes a jump from switching back and forth between absolute and long45 indefinitely.45 precludes a branch instruction from switching back and forth between absolute 46 and long indefinitely. 46 47 47 48 In order to keep the algorithm in the same complexity class and more easily 48 prove termination, we decided to explicitly enforce the `jumps must always 49 increase' requirement: if a jump is encoded as a long jump in one step, it will 50 also be encoded as a long jump in all the following steps. This means that any 51 jump can change at most two times: once from short to absolute (or long), and 52 once from absolute to long. 49 prove termination, we decided to explicitly enforce the `branch instructions 50 must always grow longer' requirement: if a branch instruction is encoded as a 51 long jump in one iteration, it will also be encoded as a long jump in all the 52 following iterations. This means that the encoding of any branch instruction 53 can change at most two times: once from short to absolute (or long), and once 54 from absolute to long. 53 55 54 There is one complicating factor: suppose that a jump is encoded in step $n$ 55 as an absolute jump, but in step $n+1$ it is determined that (because of changes 56 elsewhere) it can now be encoded as a short jump. Due to the requirement that 57 jumps must always increase, this means that the jump will be encoded as an 58 absolute jump in step $n+1$ as well. 56 There is one complicating factor: suppose that a branch instruction is encoded 57 in step $n$ as an absolute jump, but in step $n+1$ it is determined that 58 (because of changes elsewhere) it can now be encoded as a short jump. Due to 59 the requirement that the branch instructions must always grow longer, this 60 means that the branch encoding will be encoded as an absolute jump in step 61 $n+1$ as well. 59 62 60 This is not necessarily correct, however: it is not the case that any short 61 jump can correctly be encoded as an absolute jump (a short jump can bridge 62 segments, whereas an absolute jump cannot). Therefore, in this situation 63 we decide to encode the jump as a long jump, which is always correct. 63 This is not necessarily correct, however: a branch instruction that can be 64 encoded as a short jump cannot always also be encoded as an absolute jump 65 (a short jump can bridge segments, whereas an absolute jump cannot). Therefore, 66 in this situation we decide to encode the branch instruction as a long jump, 67 which is always correct. 64 68 65 69 The resulting algorithm, while not optimal, is at least as good as the ones 66 70 from SDCC and {\tt gcc}, and potentially better. Its complexity remains 67 the same (there are at most $2n$ iterations, where $n$ is the number of jumps68 in the program).71 the same (there are at most $2n$ iterations, where $n$ is the number of branch 72 instructions in the program). 69 73 70 74 \subsection{The algorithm in detail} … … 172 176 Note that the $\sigma$ function used for label lookup varies depending on 173 177 whether the label is behind our current position or ahead of it. For 174 backward jumps, where the label is behind our current position, we can use178 backward branches, where the label is behind our current position, we can use 175 179 $sigma$ for lookup, since its memory address is already known. However, for 176 forward jumps, the memory address of the address of the label is not yet180 forward branches, the memory address of the address of the label is not yet 177 181 known, so we must use $old\_sigma$. 178 182 179 183 We cannot use $old\_sigma$ without change: it might be the case that we have 180 already changed some jumps before, making the program longer. We must therefore181 compensate for this by adding the size increase of the program to the label's 182 memory address according to $old\_sigma$, so that jump distances do not get 183 compromised.184 already increased the size some branch instructions before, making the program 185 longer and moving every instruction forward. We must compensate for this by 186 adding the size increase of the program to the label's memory address according 187 to $old\_sigma$, so that branch instruction spans do not get compromised. 184 188 185 189 Note also that we add the pc to $sigma$ at location $ppc+1$, whereas we add the 186 190 jump length at location $ppc$. We do this so that $sigma(ppc)$ will always 187 191 return a couple with the start address of the instruction at $ppc$ and the 188 length of its jump; the end address of the program can be found at $sigma(n+1)$, 189 where $n$ is the number of instructions in the program. 192 length of its branch instruction (if any); the end address of the program can 193 be found at $sigma(n+1)$, where $n$ is the number of instructions in the 194 program.
Note: See TracChangeset
for help on using the changeset viewer.