[2054] | 1 | \section{Introduction} |
---|
[1889] | 2 | |
---|
| 3 | The problem of branch displacement optimisation, also known as jump encoding, is |
---|
[3362] | 4 | a well-known problem in assembler design~\cite{Hyde2006}. Its origin lies in the |
---|
[2091] | 5 | fact that in many architecture sets, the encoding (and therefore size) of some |
---|
| 6 | instructions depends on the distance to their operand (the instruction 'span'). |
---|
[2096] | 7 | The branch displacement optimisation problem consists of encoding these |
---|
[2091] | 8 | span-dependent instructions in such a way that the resulting program is as |
---|
| 9 | small as possible. |
---|
[1889] | 10 | |
---|
[2085] | 11 | This problem is the subject of the present paper. After introducing the problem |
---|
| 12 | in more detail, we will discuss the solutions used by other compilers, present |
---|
[2096] | 13 | the algorithm we use in the CerCo assembler, and discuss its verification, |
---|
| 14 | that is the proofs of termination and correctness using the Matita proof |
---|
| 15 | assistant~\cite{Asperti2007}. |
---|
[2085] | 16 | |
---|
| 17 | The research presented in this paper has been executed within the CerCo project |
---|
| 18 | which aims at formally verifying a C compiler with cost annotations. The |
---|
| 19 | target architecture for this project is the MCS-51, whose instruction set |
---|
| 20 | contains span-dependent instructions. Furthermore, its maximum addressable |
---|
[2096] | 21 | memory size is very small (64 Kb), which makes it important to generate |
---|
[2085] | 22 | programs that are as small as possible. |
---|
[1889] | 23 | |
---|
[2085] | 24 | With this optimisation, however, comes increased complexity and hence |
---|
[2091] | 25 | increased possibility for error. We must make sure that the branch instructions |
---|
| 26 | are encoded correctly, otherwise the assembled program will behave |
---|
| 27 | unpredictably. |
---|
[2085] | 28 | |
---|
| 29 | \section{The branch displacement optimisation problem} |
---|
| 30 | |
---|
[2091] | 31 | In most modern instruction sets that have them, the only span-dependent |
---|
| 32 | instructions are branch instructions. Taking the ubiquitous x86-64 instruction |
---|
[2096] | 33 | set as an example, we find that it contains eleven different forms of the |
---|
[2091] | 34 | unconditional branch instruction, all with different ranges, instruction sizes |
---|
[2085] | 35 | and semantics (only six are valid in 64-bit mode, for example). Some examples |
---|
[2099] | 36 | are shown in Figure~\ref{f:x86jumps} (see also~\cite{IntelDev}). |
---|
[2085] | 37 | |
---|
[1889] | 38 | \begin{figure}[h] |
---|
[2085] | 39 | \begin{center} |
---|
[1889] | 40 | \begin{tabular}{|l|l|l|} |
---|
| 41 | \hline |
---|
| 42 | Instruction & Size (bytes) & Displacement range \\ |
---|
| 43 | \hline |
---|
| 44 | Short jump & 2 & -128 to 127 bytes \\ |
---|
| 45 | Relative near jump & 5 & $-2^{32}$ to $2^{32}-1$ bytes \\ |
---|
| 46 | Absolute near jump & 6 & one segment (64-bit address) \\ |
---|
[3354] | 47 | Far jump & 8 & entire memory (indirect jump) \\ |
---|
[1889] | 48 | \hline |
---|
| 49 | \end{tabular} |
---|
[2085] | 50 | \end{center} |
---|
[2091] | 51 | \caption{List of x86 branch instructions} |
---|
[1889] | 52 | \label{f:x86jumps} |
---|
| 53 | \end{figure} |
---|
| 54 | |
---|
| 55 | The chosen target architecture of the CerCo project is the Intel MCS-51, which |
---|
[2085] | 56 | features three types of branch instructions (or jump instructions; the two terms |
---|
[2096] | 57 | are used interchangeably), as shown in Figure~\ref{f:mcs51jumps}. |
---|
[1889] | 58 | |
---|
| 59 | \begin{figure}[h] |
---|
[2085] | 60 | \begin{center} |
---|
[1889] | 61 | \begin{tabular}{|l|l|l|l|} |
---|
| 62 | \hline |
---|
| 63 | Instruction & Size & Execution time & Displacement range \\ |
---|
| 64 | & (bytes) & (cycles) & \\ |
---|
| 65 | \hline |
---|
| 66 | SJMP (`short jump') & 2 & 2 & -128 to 127 bytes \\ |
---|
[2085] | 67 | AJMP (`absolute jump') & 2 & 2 & one segment (11-bit address) \\ |
---|
[1889] | 68 | LJMP (`long jump') & 3 & 3 & entire memory \\ |
---|
| 69 | \hline |
---|
| 70 | \end{tabular} |
---|
[2085] | 71 | \end{center} |
---|
[2091] | 72 | \caption{List of MCS-51 branch instructions} |
---|
[1889] | 73 | \label{f:mcs51jumps} |
---|
| 74 | \end{figure} |
---|
| 75 | |
---|
[2091] | 76 | Conditional branch instructions are only available in short form, which |
---|
| 77 | means that a conditional branch outside the short address range has to be |
---|
| 78 | encoded using three branch instructions (for instructions whose logical |
---|
| 79 | negation is available, it can be done with two branch instructions, but for |
---|
[3362] | 80 | some instructions this is not the case). The call instruction is |
---|
[2091] | 81 | only available in absolute and long forms. |
---|
[1889] | 82 | |
---|
[3362] | 83 | Note that even though the MCS-51 architecture is much less advanced and much |
---|
| 84 | simpler than the x86-64 architecture, the basic types of branch instruction |
---|
[2091] | 85 | remain the same: a short jump with a limited range, an intra-segment jump and a |
---|
| 86 | jump that can reach the entire available memory. |
---|
[1889] | 87 | |
---|
[2096] | 88 | Generally, in code fed to the assembler as input, the only |
---|
| 89 | difference between branch instructions is semantics, not span. This |
---|
[2091] | 90 | means that a distinction is made between an unconditional branch and the |
---|
| 91 | several kinds of conditional branch, but not between their short, absolute or |
---|
| 92 | long variants. |
---|
[2085] | 93 | |
---|
[2091] | 94 | The algorithm used by the assembler to encode these branch instructions into |
---|
| 95 | the different machine instructions is known as the {\em branch displacement |
---|
[2096] | 96 | algorithm}. The optimisation problem consists of finding as small an encoding as |
---|
[2091] | 97 | possible, thus minimising program length and execution time. |
---|
[1889] | 98 | |
---|
[3362] | 99 | Similar problems, e.g. the branch displacement optimisation problem for other |
---|
| 100 | architectures, are known to be NP-complete~\cite{Robertson1979,Szymanski1978}, |
---|
[1889] | 101 | which could make finding an optimal solution very time-consuming. |
---|
| 102 | |
---|
[2085] | 103 | The canonical solution, as shown by Szymanski~\cite{Szymanski1978} or more |
---|
| 104 | recently by Dickson~\cite{Dickson2008} for the x86 instruction set, is to use a |
---|
[2096] | 105 | fixed point algorithm that starts with the shortest possible encoding (all |
---|
| 106 | branch instruction encoded as short jumps, which is likely not a correct |
---|
[3361] | 107 | solution) and then iterates over the source to re-encode those branch |
---|
[2091] | 108 | instructions whose target is outside their range. |
---|
[1889] | 109 | |
---|
[2085] | 110 | \subsection*{Adding absolute jumps} |
---|
[1889] | 111 | |
---|
[2049] | 112 | In both papers mentioned above, the encoding of a jump is only dependent on the |
---|
| 113 | distance between the jump and its target: below a certain value a short jump |
---|
| 114 | can be used; above this value the jump must be encoded as a long jump. |
---|
[1889] | 115 | |
---|
[2064] | 116 | Here, termination of the smallest fixed point algorithm is easy to prove. All |
---|
[2091] | 117 | branch instructions start out encoded as short jumps, which means that the |
---|
| 118 | distance between any branch instruction and its target is as short as possible. |
---|
| 119 | If, in this situation, there is a branch instruction $b$ whose span is not |
---|
| 120 | within the range for a short jump, we can be sure that we can never reach a |
---|
| 121 | situation where the span of $j$ is so small that it can be encoded as a short |
---|
| 122 | jump. This argument continues to hold throughout the subsequent iterations of |
---|
[2096] | 123 | the algorithm: short jumps can change into long jumps, but not \emph{vice versa}, |
---|
| 124 | as spans only increase. Hence, the algorithm either terminates early when a fixed |
---|
[2091] | 125 | point is reached or when all short jumps have been changed into long jumps. |
---|
[1889] | 126 | |
---|
| 127 | Also, we can be certain that we have reached an optimal solution: a short jump |
---|
| 128 | is only changed into a long jump if it is absolutely necessary. |
---|
| 129 | |
---|
| 130 | However, neither of these claims (termination nor optimality) hold when we add |
---|
[3362] | 131 | the absolute jump. With absolute jumps, the encoding of a branch |
---|
[2091] | 132 | instruction no longer depends only on the distance between the branch |
---|
[3362] | 133 | instruction and its target. An absolute jump is possible when instruction and |
---|
| 134 | target are in the same segment (for the MCS-51, this means that the first 5 |
---|
[2091] | 135 | bytes of their addresses have to be equal). It is therefore entirely possible |
---|
| 136 | for two branch instructions with the same span to be encoded in different ways |
---|
| 137 | (absolute if the branch instruction and its target are in the same segment, |
---|
| 138 | long if this is not the case). |
---|
[1889] | 139 | |
---|
[3361] | 140 | \begin{figure}[t] |
---|
| 141 | \begin{subfigure}[b]{.45\linewidth} |
---|
[2096] | 142 | \begin{alltt} |
---|
| 143 | jmp X |
---|
[3361] | 144 | \ldots |
---|
| 145 | L\(\sb{0}\): \ldots |
---|
| 146 | % Start of new segment if |
---|
| 147 | % jmp X is encoded as short |
---|
| 148 | \ldots |
---|
[2096] | 149 | jmp L\(\sb{0}\) |
---|
| 150 | \end{alltt} |
---|
| 151 | \caption{Example of a program where a long jump becomes absolute} |
---|
| 152 | \label{f:term_example} |
---|
[3361] | 153 | \end{subfigure} |
---|
| 154 | \hfill |
---|
| 155 | \begin{subfigure}[b]{.45\linewidth} |
---|
| 156 | \begin{alltt} |
---|
| 157 | L\(\sb{0}\): jmp X |
---|
| 158 | X: \ldots |
---|
| 159 | \ldots |
---|
| 160 | L\(\sb{1}\): \ldots |
---|
| 161 | % Start of new segment if |
---|
| 162 | % jmp X is encoded as short |
---|
| 163 | \ldots |
---|
| 164 | jmp L\(\sb{1}\) |
---|
| 165 | \ldots |
---|
| 166 | jmp L\(\sb{1}\) |
---|
| 167 | \ldots |
---|
| 168 | jmp L\(\sb{1}\) |
---|
| 169 | \ldots |
---|
| 170 | \end{alltt} |
---|
| 171 | \caption{Example of a program where the fixed-point algorithm is not optimal} |
---|
| 172 | \label{f:opt_example} |
---|
| 173 | \end{subfigure} |
---|
[2096] | 174 | \end{figure} |
---|
| 175 | |
---|
| 176 | This invalidates our earlier termination argument: a branch instruction, once encoded |
---|
[2091] | 177 | as a long jump, can be re-encoded during a later iteration as an absolute jump. |
---|
[2096] | 178 | Consider the program shown in Figure~\ref{f:term_example}. At the start of the |
---|
[2091] | 179 | first iteration, both the branch to {\tt X} and the branch to $\mathtt{L}_{0}$ |
---|
| 180 | are encoded as small jumps. Let us assume that in this case, the placement of |
---|
| 181 | $\mathtt{L}_{0}$ and the branch to it are such that $\mathtt{L}_{0}$ is just |
---|
| 182 | outside the segment that contains this branch. Let us also assume that the |
---|
[3353] | 183 | distance between $\mathtt{L}_{0}$ and the branch to it is too large for the |
---|
[2091] | 184 | branch instruction to be encoded as a short jump. |
---|
[1889] | 185 | |
---|
[2091] | 186 | All this means that in the second iteration, the branch to $\mathtt{L}_{0}$ will |
---|
| 187 | be encoded as a long jump. If we assume that the branch to {\tt X} is encoded as |
---|
| 188 | a long jump as well, the size of the branch instruction will increase and |
---|
| 189 | $\mathtt{L}_{0}$ will be `propelled' into the same segment as its branch |
---|
| 190 | instruction, because every subsequent instruction will move one byte forward. |
---|
| 191 | Hence, in the third iteration, the branch to $\mathtt{L}_{0}$ can be encoded as |
---|
| 192 | an absolute jump. At first glance, there is nothing that prevents us from |
---|
[2096] | 193 | constructing a configuration where two branch instructions interact in such a |
---|
| 194 | way as to iterate indefinitely between long and absolute encodings. |
---|
[1889] | 195 | |
---|
[2096] | 196 | This situation mirrors the explanation by Szymanski~\cite{Szymanski1978} of why |
---|
| 197 | the branch displacement optimisation problem is NP-complete. In this explanation, |
---|
| 198 | a condition for NP-completeness is the fact that programs be allowed to contain |
---|
| 199 | {\em pathological} jumps. These are branch instructions that can normally not be |
---|
| 200 | encoded as a short(er) jump, but gain this property when some other branch |
---|
| 201 | instructions are encoded as a long(er) jump. This is exactly what happens in |
---|
| 202 | Figure~\ref{f:term_example}. By encoding the first branch instruction as a long |
---|
| 203 | jump, another branch instruction switches from long to absolute (which is |
---|
| 204 | shorter). |
---|
[1889] | 205 | |
---|
[3362] | 206 | In addition, our previous optimality argument no longer holds. Consider the |
---|
| 207 | program shown in Figure~\ref{f:opt_example}. Suppose that the distance between |
---|
[1889] | 208 | $\mathtt{L}_{0}$ and $\mathtt{L}_{1}$ is such that if {\tt jmp X} is encoded |
---|
| 209 | as a short jump, there is a segment border just after $\mathtt{L}_{1}$. Let |
---|
[3362] | 210 | us also assume that all three branches to $\mathtt{L}_{1}$ are in the same |
---|
[1889] | 211 | segment, but far enough away from $\mathtt{L}_{1}$ that they cannot be encoded |
---|
| 212 | as short jumps. |
---|
| 213 | |
---|
[2096] | 214 | Then, if {\tt jmp X} were to be encoded as a short jump, which is clearly |
---|
| 215 | possible, all of the branches to $\mathtt{L}_{1}$ would have to be encoded as |
---|
| 216 | long jumps. However, if {\tt jmp X} were to be encoded as a long jump, and |
---|
| 217 | therefore increase in size, $\mathtt{L}_{1}$ would be `propelled' across the |
---|
| 218 | segment border, so that the three branches to $\mathtt{L}_{1}$ could be encoded |
---|
| 219 | as absolute jumps. Depending on the relative sizes of long and absolute jumps, |
---|
| 220 | this solution might actually be smaller than the one reached by the smallest |
---|
| 221 | fixed point algorithm. |
---|