[1889] | 1 | \section{The problem} |
---|
| 2 | |
---|
| 3 | The problem of branch displacement optimisation, also known as jump encoding, is |
---|
| 4 | a well-known problem in assembler design. |
---|
| 5 | |
---|
| 6 | In many instruction sets, amongst which the ubiquitous x86 architecture (both |
---|
| 7 | its 32-- and 64--bit incarnations), there are instructions whose addressing |
---|
| 8 | mode varies with the distance between the instruction and the object they |
---|
| 9 | address. Mostly this occurs with jump instructions; for example, in the |
---|
| 10 | x86-64 instruction set there are eleven different forms of the unconditional |
---|
| 11 | jump instruction, all with different ranges and semantics (only six are valid |
---|
| 12 | in 64-bit mode, for example). Some examples are shown in |
---|
| 13 | figure~\ref{f:x86jumps}: |
---|
| 14 | |
---|
| 15 | \begin{figure}[h] |
---|
| 16 | \begin{tabular}{|l|l|l|} |
---|
| 17 | \hline |
---|
| 18 | Instruction & Size (bytes) & Displacement range \\ |
---|
| 19 | \hline |
---|
| 20 | Short jump & 2 & -128 to 127 bytes \\ |
---|
| 21 | Relative near jump & 5 & $-2^{32}$ to $2^{32}-1$ bytes \\ |
---|
| 22 | Absolute near jump & 6 & one segment (64-bit address) \\ |
---|
| 23 | Far jump & 8 & entire memory \\ |
---|
| 24 | \hline |
---|
| 25 | \end{tabular} |
---|
| 26 | \caption{List of x86 jump instructions} |
---|
| 27 | \label{f:x86jumps} |
---|
| 28 | \end{figure} |
---|
| 29 | |
---|
| 30 | The chosen target architecture of the CerCo project is the Intel MCS-51, which |
---|
| 31 | features three types of jump instructions, as shown in |
---|
| 32 | figure~\ref{f:mcs51jumps}. |
---|
| 33 | |
---|
| 34 | \begin{figure}[h] |
---|
| 35 | \begin{tabular}{|l|l|l|l|} |
---|
| 36 | \hline |
---|
| 37 | Instruction & Size & Execution time & Displacement range \\ |
---|
| 38 | & (bytes) & (cycles) & \\ |
---|
| 39 | \hline |
---|
| 40 | SJMP (`short jump') & 2 & 2 & -128 to 127 bytes \\ |
---|
| 41 | AJMP (`medium jump') & 2 & 2 & one segment (11-bit address) \\ |
---|
| 42 | LJMP (`long jump') & 3 & 3 & entire memory \\ |
---|
| 43 | \hline |
---|
| 44 | \end{tabular} |
---|
| 45 | \caption{List of MCS-51 jump instructions} |
---|
| 46 | \label{f:mcs51jumps} |
---|
| 47 | \end{figure} |
---|
| 48 | |
---|
| 49 | The conditional jump instruction is only available in short form, which |
---|
| 50 | means that a conditional jump outside the short address range has to be |
---|
| 51 | encoded using two jumps; the call instruction is only available in |
---|
| 52 | medium and long forms. |
---|
| 53 | |
---|
| 54 | Note that even though the MCS-51 architecture is much less advanced and more |
---|
| 55 | simple than the x86-64 architecture, the basic types of jump remain the same: |
---|
| 56 | a short jump with a limited range, an intra-segment jump and a jump that can |
---|
| 57 | reach the entire available memory. |
---|
| 58 | |
---|
| 59 | Generally, in assembly code, there is only one way to indicate a jump; the |
---|
| 60 | algorithm used by the assembler to encode these jumps into the different |
---|
| 61 | machine instructions is known as the {\tt branch displacement algorithm}. The |
---|
| 62 | optimisation problem consists in using as small an encoding as possible, thus |
---|
| 63 | minimising program length and execution time. |
---|
| 64 | |
---|
| 65 | This problem is known to be NP-complete~\cite{Robertson1979,Szymanski1978}, |
---|
| 66 | which could make finding an optimal solution very time-consuming. |
---|
| 67 | |
---|
| 68 | The canonical solution, as shown in~\cite{Szymanski1978} or more recently |
---|
| 69 | in~\cite{Dickson2008} for the x86 instruction set, is to use a fixpoint |
---|
| 70 | algorithm that starts out with the shortest possible encoding (all jumps |
---|
| 71 | encoded as short jumps, which is very probably not a correct solution) and then |
---|
| 72 | iterates over the program to re-encode those jumps whose target is outside |
---|
| 73 | their range. |
---|
| 74 | |
---|
| 75 | \subsection*{Adding medium jumps} |
---|
| 76 | |
---|
| 77 | In both the canonical solutions presented, the encoding of a jump is only |
---|
| 78 | dependent on the distance between the jump and its target: below a certain |
---|
| 79 | value a short jump can be used; above this value the jump must be encoded |
---|
| 80 | as a long jump. |
---|
| 81 | |
---|
| 82 | Here, termination of the smallest fixpoint algorithm is easy to prove. All |
---|
| 83 | jumps start out encoded as short jumps, which means that the distance between |
---|
| 84 | any jump and its target is as short as possible. If we therefore need to encode |
---|
| 85 | a jump $j$ as a long jump, we can be sure that we can never reach a situation |
---|
| 86 | where the span of $j$ is so small that it can be encoded as a short jump. This |
---|
| 87 | reasoning holds throughout the subsequent iterations of the algorithms: short |
---|
| 88 | jumps can change into long jumps, but not vice versa. Hence, the algorithm |
---|
| 89 | either terminates when a fixpoint is reached or when all short jumps have been |
---|
| 90 | changed into long jumps. |
---|
| 91 | |
---|
| 92 | Also, we can be certain that we have reached an optimal solution: a short jump |
---|
| 93 | is only changed into a long jump if it is absolutely necessary. |
---|
| 94 | |
---|
| 95 | However, neither of these claims (termination nor optimality) hold when we add |
---|
| 96 | the medium jump. |
---|
| 97 | |
---|
| 98 | The reason for this is that with medium jumps, the encoding of a jump no longer |
---|
| 99 | depends only on the distance between the jump and its target: in order for a |
---|
| 100 | medium jump to be possible, they need to be in the same segment. It is therefore |
---|
| 101 | entirely possible for two jumps with the same span to be encoded in different |
---|
| 102 | ways (medium if the jump and its destination are in the same segment, long if |
---|
| 103 | this is not the case). |
---|
| 104 | |
---|
| 105 | This invalidates the termination argument: a jump, once encoded as a long jump, |
---|
| 106 | can be re-encoded during a later iteration as a medium jump. Consider the |
---|
| 107 | program shown in figure~\ref{f:term_example}. At the start of the first |
---|
| 108 | iteration, both the jump to {\tt X} and the jump to $\mathtt{L}_{0}$ are |
---|
| 109 | encoded as small jumps. Let us assume that in this case, the placement of |
---|
| 110 | $\mathtt{L}_{0}$ and the jump to it are such that $\mathtt{L}_{0}$ is just |
---|
| 111 | outside the segment that contains this jump. Let us also assume that the |
---|
| 112 | distance between $\mathtt{L}_{0}$ and the jump to it are too large for the jump |
---|
| 113 | to be encoded as a short jump. |
---|
| 114 | |
---|
| 115 | All this means that in the second iteration, the jump to $\mathtt{L}_{0}$ will |
---|
| 116 | be encoded as a long jump. If we assume that the jump to {\tt X} is encoded as |
---|
| 117 | a long jump as well, the size of the jump will increase and $\mathtt{L}_{0}$ |
---|
| 118 | will be `propelled' into the same segment as its jump. Hence, in the next |
---|
| 119 | iteration, it can be encoded as a medium jump. At first glance, there is |
---|
| 120 | nothing that prevents us from making a construction where two jumps interact |
---|
| 121 | in such a way as to keep switching between long and medium indefinitely. |
---|
| 122 | |
---|
| 123 | \begin{figure}[h] |
---|
| 124 | \begin{alltt} |
---|
| 125 | jmp X |
---|
| 126 | \vdots |
---|
| 127 | L\(\sb{0}\): |
---|
| 128 | \vdots |
---|
| 129 | jmp L\(\sb{0}\) |
---|
| 130 | \end{alltt} |
---|
| 131 | \caption{Example of a program where a long jump becomes medium} |
---|
| 132 | \label{f:term_example} |
---|
| 133 | \end{figure} |
---|
| 134 | |
---|
| 135 | The optimality argument no longer holds either. Let us consider the program |
---|
| 136 | shown in figure~\ref{f:opt_example}. Suppose that the distance between |
---|
| 137 | $\mathtt{L}_{0}$ and $\mathtt{L}_{1}$ is such that if {\tt jmp X} is encoded |
---|
| 138 | as a short jump, there is a segment border just after $\mathtt{L}_{1}$. Let |
---|
| 139 | us also assume that the three jumps to $\mathtt{L}_{1}$ are all in the same |
---|
| 140 | segment, but far enough away from $\mathtt{L}_{1}$ that they cannot be encoded |
---|
| 141 | as short jumps. |
---|
| 142 | |
---|
| 143 | Then, if {\tt jmp X} were to be encoded as a short jump, which is clearly |
---|
| 144 | possible, all of the jumps to $\mathtt{L}_{1}$ would have to be encoded as |
---|
| 145 | long jumps. However, if {\tt jmp X} were to be encoded as a long jump, and |
---|
| 146 | therefore increase in size, $\mathtt{L}_{1}$ would be `propelled' across the |
---|
| 147 | segment border, so that the three jumps to $\mathtt{L}_{1}$ could be encoded |
---|
| 148 | as medium jumps. Depending on the relative sizes of long and medium jumps, this |
---|
| 149 | solution might actually be smaller than the one reached by the smallest |
---|
| 150 | fixpoint algorithm. |
---|
| 151 | |
---|
| 152 | \begin{figure}[h] |
---|
| 153 | \begin{alltt} |
---|
| 154 | L\(\sb{0}\): jmp X |
---|
| 155 | X: |
---|
| 156 | \vdots |
---|
| 157 | L\(\sb{1}\): |
---|
| 158 | \vdots |
---|
| 159 | jmp L\(\sb{1}\) |
---|
| 160 | \vdots |
---|
| 161 | jmp L\(\sb{1}\) |
---|
| 162 | \vdots |
---|
| 163 | jmp L\(\sb{1}\) |
---|
| 164 | \vdots |
---|
| 165 | \end{alltt} |
---|
| 166 | \caption{Example of a program where the fixed-point algorithm is not optimal} |
---|
| 167 | \label{f:opt_example} |
---|
| 168 | \end{figure} |
---|