[1889] | 1 | \section{Our algorithm} |
---|
[2049] | 2 | |
---|
| 3 | \subsection{Design decisions} |
---|
| 4 | |
---|
[3361] | 5 | Given the NP-completeness of the problem, finding optimal solutions |
---|
| 6 | (using, for example, a constraint solver) can potentially be very costly. |
---|
[2049] | 7 | |
---|
[3362] | 8 | The SDCC compiler~\cite{SDCC2011}, which has a backend targeting the MCS-51 |
---|
[2096] | 9 | instruction set, simply encodes every branch instruction as a long jump |
---|
[2091] | 10 | without taking the distance into account. While certainly correct (the long |
---|
[2096] | 11 | jump can reach any destination in memory) and a very fast solution to compute, |
---|
[3362] | 12 | it results in a less than optimal solution in terms of output size and |
---|
| 13 | execution time. |
---|
[2049] | 14 | |
---|
[2096] | 15 | On the other hand, the {\tt gcc} compiler suite~\cite{GCC2012}, while compiling |
---|
| 16 | C on the x86 architecture, uses a greatest fix point algorithm. In other words, |
---|
| 17 | it starts with all branch instructions encoded as the largest jumps |
---|
| 18 | available, and then tries to reduce the size of branch instructions as much as |
---|
| 19 | possible. |
---|
[2080] | 20 | |
---|
[2091] | 21 | Such an algorithm has the advantage that any intermediate result it returns |
---|
| 22 | is correct: the solution where every branch instruction is encoded as a large |
---|
| 23 | jump is always possible, and the algorithm only reduces those branch |
---|
| 24 | instructions whose destination address is in range for a shorter jump. |
---|
[2096] | 25 | The algorithm can thus be stopped after a determined number of steps without |
---|
| 26 | sacrificing correctness. |
---|
[2049] | 27 | |
---|
[2096] | 28 | The result, however, is not necessarily optimal. Even if the algorithm is run |
---|
| 29 | until it terminates naturally, the fixed point reached is the {\em greatest} |
---|
[2080] | 30 | fixed point, not the least fixed point. Furthermore, {\tt gcc} (at least for |
---|
| 31 | the x86 architecture) only uses short and long jumps. This makes the algorithm |
---|
[2096] | 32 | more efficient, as shown in the previous section, but also results in a less |
---|
[2080] | 33 | optimal solution. |
---|
[2049] | 34 | |
---|
[2080] | 35 | In the CerCo assembler, we opted at first for a least fixed point algorithm, |
---|
[2085] | 36 | taking absolute jumps into account. |
---|
[2049] | 37 | |
---|
[2085] | 38 | Here, we ran into a problem with proving termination, as explained in the |
---|
| 39 | previous section: if we only take short and long jumps into account, the jump |
---|
| 40 | encoding can only switch from short to long, but never in the other direction. |
---|
[2091] | 41 | When we add absolute jumps, however, it is theoretically possible for a branch |
---|
[2096] | 42 | instruction to switch from absolute to long and back, as previously explained. |
---|
[2049] | 43 | |
---|
| 44 | Proving termination then becomes difficult, because there is nothing that |
---|
[2096] | 45 | precludes a branch instruction from oscillating back and forth between absolute |
---|
| 46 | and long jumps indefinitely. |
---|
[2049] | 47 | |
---|
[3361] | 48 | To keep the algorithm in the same complexity class and more easily |
---|
[2091] | 49 | prove termination, we decided to explicitly enforce the `branch instructions |
---|
| 50 | must always grow longer' requirement: if a branch instruction is encoded as a |
---|
| 51 | long jump in one iteration, it will also be encoded as a long jump in all the |
---|
[3361] | 52 | following iterations. Therefore the encoding of any branch instruction |
---|
[2091] | 53 | can change at most two times: once from short to absolute (or long), and once |
---|
| 54 | from absolute to long. |
---|
[2049] | 55 | |
---|
[2096] | 56 | There is one complicating factor. Suppose that a branch instruction is encoded |
---|
[2091] | 57 | in step $n$ as an absolute jump, but in step $n+1$ it is determined that |
---|
| 58 | (because of changes elsewhere) it can now be encoded as a short jump. Due to |
---|
[3361] | 59 | the requirement that the branch instructions must always grow longer, |
---|
| 60 | the branch encoding will be encoded as an absolute jump in step |
---|
[2091] | 61 | $n+1$ as well. |
---|
[2049] | 62 | |
---|
[2096] | 63 | This is not necessarily correct. A branch instruction that can be |
---|
| 64 | encoded as a short jump cannot always also be encoded as an absolute jump, as a |
---|
| 65 | short jump can bridge segments, whereas an absolute jump cannot. Therefore, |
---|
| 66 | in this situation we have decided to encode the branch instruction as a long |
---|
| 67 | jump, which is always correct. |
---|
[2049] | 68 | |
---|
[3362] | 69 | The resulting algorithm, therefore, will not return the least fixed point, as |
---|
| 70 | it might have too many long jumps. However, it is still better than the |
---|
| 71 | algorithms from SDCC and {\tt gcc}, since even in the worst case, it will still |
---|
| 72 | return a smaller or equal solution. |
---|
[2054] | 73 | |
---|
[3362] | 74 | As for complexity, there are at most $2n$ iterations, with $n$ the number of |
---|
| 75 | branch instructions. Practical tests within the CerCo project on small to |
---|
| 76 | medium pieces of code have shown that in almost all cases, a fixed point is |
---|
| 77 | reached in 3 passes. Only in one case did the algorithm need 4. |
---|
| 78 | |
---|
| 79 | This is not surprising: after all, the difference between short/absolute and |
---|
| 80 | long jumps is only one byte (three for conditional jumps). For a change from |
---|
| 81 | short/absolute to long to have an effect on other jumps is therefore relatively |
---|
| 82 | uncommon, which explains why a fixed point is reached so quickly. |
---|
| 83 | |
---|
[2054] | 84 | \subsection{The algorithm in detail} |
---|
| 85 | |
---|
| 86 | The branch displacement algorithm forms part of the translation from |
---|
[2096] | 87 | pseudocode to assembler. More specifically, it is used by the function that |
---|
[2054] | 88 | translates pseudo-addresses (natural numbers indicating the position of the |
---|
| 89 | instruction in the program) to actual addresses in memory. |
---|
| 90 | |
---|
[2096] | 91 | Our original intention was to have two different functions, one function |
---|
[2085] | 92 | $\mathtt{policy}: \mathbb{N} \rightarrow \{\mathtt{short\_jump}, |
---|
| 93 | \mathtt{absolute\_jump}, \mathtt{long\_jump}\}$ to associate jumps to their |
---|
| 94 | intended encoding, and a function $\sigma: \mathbb{N} \rightarrow |
---|
[2096] | 95 | \mathtt{Word}$ to associate pseudo-addresses to machine addresses. $\sigma$ |
---|
[2085] | 96 | would use $\mathtt{policy}$ to determine the size of jump instructions. |
---|
[2054] | 97 | |
---|
| 98 | This turned out to be suboptimal from the algorithmic point of view and |
---|
| 99 | impossible to prove correct. |
---|
| 100 | |
---|
| 101 | From the algorithmic point of view, in order to create the $\mathtt{policy}$ |
---|
| 102 | function, we must necessarily have a translation from pseudo-addresses |
---|
[2096] | 103 | to machine addresses (i.e. a $\sigma$ function): in order to judge the distance |
---|
[2054] | 104 | between a jump and its destination, we must know their memory locations. |
---|
| 105 | Conversely, in order to create the $\sigma$ function, we need to have the |
---|
| 106 | $\mathtt{policy}$ function, otherwise we do not know the sizes of the jump |
---|
| 107 | instructions in the program. |
---|
| 108 | |
---|
| 109 | Much the same problem appears when we try to prove the algorithm correct: the |
---|
| 110 | correctness of $\mathtt{policy}$ depends on the correctness of $\sigma$, and |
---|
| 111 | the correctness of $\sigma$ depends on the correctness of $\mathtt{policy}$. |
---|
| 112 | |
---|
| 113 | We solved this problem by integrating the $\mathtt{policy}$ and $\sigma$ |
---|
| 114 | algorithms. We now have a function |
---|
| 115 | $\sigma: \mathbb{N} \rightarrow \mathtt{Word} \times \mathtt{bool}$ which |
---|
[2096] | 116 | associates a pseudo-address to a machine address. The boolean denotes a forced |
---|
[2054] | 117 | long jump; as noted in the previous section, if during the fixed point |
---|
[2085] | 118 | computation an absolute jump changes to be potentially re-encoded as a short |
---|
| 119 | jump, the result is actually a long jump. It might therefore be the case that |
---|
| 120 | jumps are encoded as long jumps without this actually being necessary, and this |
---|
| 121 | information needs to be passed to the code generating function. |
---|
[2054] | 122 | |
---|
| 123 | The assembler function encodes the jumps by checking the distance between |
---|
[2085] | 124 | source and destination according to $\sigma$, so it could select an absolute |
---|
[2054] | 125 | jump in a situation where there should be a long jump. The boolean is there |
---|
| 126 | to prevent this from happening by indicating the locations where a long jump |
---|
| 127 | should be encoded, even if a shorter jump is possible. This has no effect on |
---|
| 128 | correctness, since a long jump is applicable in any situation. |
---|
| 129 | |
---|
[3361] | 130 | \begin{figure}[t] |
---|
[2054] | 131 | \begin{algorithmic} |
---|
[2064] | 132 | \Function{f}{$labels$,$old\_sigma$,$instr$,$ppc$,$acc$} |
---|
| 133 | \State $\langle added, pc, sigma \rangle \gets acc$ |
---|
| 134 | \If {$instr$ is a backward jump to $j$} |
---|
| 135 | \State $length \gets \mathrm{jump\_size}(pc,sigma_1(labels(j)))$ |
---|
| 136 | \ElsIf {$instr$ is a forward jump to $j$} |
---|
| 137 | \State $length \gets \mathrm{jump\_size}(pc,old\_sigma_1(labels(j))+added)$ |
---|
| 138 | \Else |
---|
| 139 | \State $length \gets \mathtt{short\_jump}$ |
---|
| 140 | \EndIf |
---|
| 141 | \State $old\_length \gets \mathrm{old\_sigma_1}(ppc)$ |
---|
| 142 | \State $new\_length \gets \mathrm{max}(old\_length, length)$ |
---|
| 143 | \State $old\_size \gets \mathrm{old\_sigma_2}(ppc)$ |
---|
| 144 | \State $new\_size \gets \mathrm{instruction\_size}(instr,new\_length)$ |
---|
| 145 | \State $new\_added \gets added+(new\_size-old\_size)$ |
---|
[3353] | 146 | \State $new\_sigma \gets old\_sigma$ |
---|
[2064] | 147 | \State $new\_sigma_1(ppc+1) \gets pc+new\_size$ |
---|
| 148 | \State $new\_sigma_2(ppc) \gets new\_length$ \\ |
---|
| 149 | \Return $\langle new\_added, pc+new\_size, new\_sigma \rangle$ |
---|
[2054] | 150 | \EndFunction |
---|
| 151 | \end{algorithmic} |
---|
| 152 | \caption{The heart of the algorithm} |
---|
| 153 | \label{f:jump_expansion_step} |
---|
| 154 | \end{figure} |
---|
[2064] | 155 | |
---|
[2096] | 156 | The algorithm, shown in Figure~\ref{f:jump_expansion_step}, works by folding the |
---|
[2064] | 157 | function {\sc f} over the entire program, thus gradually constructing $sigma$. |
---|
| 158 | This constitutes one step in the fixed point calculation; successive steps |
---|
| 159 | repeat the fold until a fixed point is reached. |
---|
| 160 | |
---|
| 161 | Parameters of the function {\sc f} are: |
---|
| 162 | \begin{itemize} |
---|
| 163 | \item a function $labels$ that associates a label to its pseudo-address; |
---|
| 164 | \item $old\_sigma$, the $\sigma$ function returned by the previous |
---|
[2086] | 165 | iteration of the fixed point calculation; |
---|
[2064] | 166 | \item $instr$, the instruction currently under consideration; |
---|
| 167 | \item $ppc$, the pseudo-address of $instr$; |
---|
| 168 | \item $acc$, the fold accumulator, which contains $pc$ (the highest memory |
---|
| 169 | address reached so far), $added$ (the number of bytes added to the program |
---|
| 170 | size with respect to the previous iteration), and of course $sigma$, the |
---|
| 171 | $\sigma$ function under construction. |
---|
| 172 | \end{itemize} |
---|
| 173 | |
---|
| 174 | The first two are parameters that remain the same through one iteration, the |
---|
[2096] | 175 | final three are standard parameters for a fold function (including $ppc$, |
---|
[2064] | 176 | which is simply the number of instructions of the program already processed). |
---|
| 177 | |
---|
| 178 | The $\sigma$ functions used by {\sc f} are not of the same type as the final |
---|
| 179 | $\sigma$ function: they are of type |
---|
| 180 | $\sigma: \mathbb{N} \rightarrow \mathbb{N} \times \{\mathtt{short\_jump}, |
---|
[2085] | 181 | \mathtt{absolute\_jump},\mathtt{long\_jump}\}$; a function that associates a |
---|
[2096] | 182 | pseudo-address with a memory address and a jump length. We do this to be able |
---|
[2098] | 183 | to ease the comparison of jump lengths between iterations. In the algorithm, |
---|
[2064] | 184 | we use the notation $sigma_1(x)$ to denote the memory address corresponding to |
---|
| 185 | $x$, and $sigma_2(x)$ to denote the jump length corresponding to $x$. |
---|
| 186 | |
---|
| 187 | Note that the $\sigma$ function used for label lookup varies depending on |
---|
| 188 | whether the label is behind our current position or ahead of it. For |
---|
[2091] | 189 | backward branches, where the label is behind our current position, we can use |
---|
[2064] | 190 | $sigma$ for lookup, since its memory address is already known. However, for |
---|
[2091] | 191 | forward branches, the memory address of the address of the label is not yet |
---|
[2064] | 192 | known, so we must use $old\_sigma$. |
---|
| 193 | |
---|
| 194 | We cannot use $old\_sigma$ without change: it might be the case that we have |
---|
[2096] | 195 | already increased the size of some branch instructions before, making the program |
---|
[2091] | 196 | longer and moving every instruction forward. We must compensate for this by |
---|
| 197 | adding the size increase of the program to the label's memory address according |
---|
| 198 | to $old\_sigma$, so that branch instruction spans do not get compromised. |
---|
[2064] | 199 | |
---|
| 200 | Note also that we add the pc to $sigma$ at location $ppc+1$, whereas we add the |
---|
| 201 | jump length at location $ppc$. We do this so that $sigma(ppc)$ will always |
---|
[2096] | 202 | return a pair with the start address of the instruction at $ppc$ and the |
---|
[2091] | 203 | length of its branch instruction (if any); the end address of the program can |
---|
| 204 | be found at $sigma(n+1)$, where $n$ is the number of instructions in the |
---|
| 205 | program. |
---|