[2054] | 1 | \section{Introduction} |
---|
[1889] | 2 | |
---|
| 3 | The problem of branch displacement optimisation, also known as jump encoding, is |
---|
[2091] | 4 | a well-known problem in assembler design~\cite{Hyde2006}. It is caused by the |
---|
| 5 | fact that in many architecture sets, the encoding (and therefore size) of some |
---|
| 6 | instructions depends on the distance to their operand (the instruction 'span'). |
---|
[2096] | 7 | The branch displacement optimisation problem consists of encoding these |
---|
[2091] | 8 | span-dependent instructions in such a way that the resulting program is as |
---|
| 9 | small as possible. |
---|
[1889] | 10 | |
---|
[2085] | 11 | This problem is the subject of the present paper. After introducing the problem |
---|
| 12 | in more detail, we will discuss the solutions used by other compilers, present |
---|
[2096] | 13 | the algorithm we use in the CerCo assembler, and discuss its verification, |
---|
| 14 | that is the proofs of termination and correctness using the Matita proof |
---|
| 15 | assistant~\cite{Asperti2007}. |
---|
[2085] | 16 | |
---|
| 17 | The research presented in this paper has been executed within the CerCo project |
---|
| 18 | which aims at formally verifying a C compiler with cost annotations. The |
---|
| 19 | target architecture for this project is the MCS-51, whose instruction set |
---|
| 20 | contains span-dependent instructions. Furthermore, its maximum addressable |
---|
[2096] | 21 | memory size is very small (64 Kb), which makes it important to generate |
---|
[2085] | 22 | programs that are as small as possible. |
---|
[1889] | 23 | |
---|
[2085] | 24 | With this optimisation, however, comes increased complexity and hence |
---|
[2091] | 25 | increased possibility for error. We must make sure that the branch instructions |
---|
| 26 | are encoded correctly, otherwise the assembled program will behave |
---|
| 27 | unpredictably. |
---|
[2085] | 28 | |
---|
| 29 | \section{The branch displacement optimisation problem} |
---|
| 30 | |
---|
[2091] | 31 | In most modern instruction sets that have them, the only span-dependent |
---|
| 32 | instructions are branch instructions. Taking the ubiquitous x86-64 instruction |
---|
[2096] | 33 | set as an example, we find that it contains eleven different forms of the |
---|
[2091] | 34 | unconditional branch instruction, all with different ranges, instruction sizes |
---|
[2085] | 35 | and semantics (only six are valid in 64-bit mode, for example). Some examples |
---|
[2099] | 36 | are shown in Figure~\ref{f:x86jumps} (see also~\cite{IntelDev}). |
---|
[2085] | 37 | |
---|
[1889] | 38 | \begin{figure}[h] |
---|
[2085] | 39 | \begin{center} |
---|
[1889] | 40 | \begin{tabular}{|l|l|l|} |
---|
| 41 | \hline |
---|
| 42 | Instruction & Size (bytes) & Displacement range \\ |
---|
| 43 | \hline |
---|
| 44 | Short jump & 2 & -128 to 127 bytes \\ |
---|
| 45 | Relative near jump & 5 & $-2^{32}$ to $2^{32}-1$ bytes \\ |
---|
| 46 | Absolute near jump & 6 & one segment (64-bit address) \\ |
---|
| 47 | Far jump & 8 & entire memory \\ |
---|
| 48 | \hline |
---|
| 49 | \end{tabular} |
---|
[2085] | 50 | \end{center} |
---|
[2091] | 51 | \caption{List of x86 branch instructions} |
---|
[1889] | 52 | \label{f:x86jumps} |
---|
| 53 | \end{figure} |
---|
| 54 | |
---|
| 55 | The chosen target architecture of the CerCo project is the Intel MCS-51, which |
---|
[2085] | 56 | features three types of branch instructions (or jump instructions; the two terms |
---|
[2096] | 57 | are used interchangeably), as shown in Figure~\ref{f:mcs51jumps}. |
---|
[1889] | 58 | |
---|
| 59 | \begin{figure}[h] |
---|
[2085] | 60 | \begin{center} |
---|
[1889] | 61 | \begin{tabular}{|l|l|l|l|} |
---|
| 62 | \hline |
---|
| 63 | Instruction & Size & Execution time & Displacement range \\ |
---|
| 64 | & (bytes) & (cycles) & \\ |
---|
| 65 | \hline |
---|
| 66 | SJMP (`short jump') & 2 & 2 & -128 to 127 bytes \\ |
---|
[2085] | 67 | AJMP (`absolute jump') & 2 & 2 & one segment (11-bit address) \\ |
---|
[1889] | 68 | LJMP (`long jump') & 3 & 3 & entire memory \\ |
---|
| 69 | \hline |
---|
| 70 | \end{tabular} |
---|
[2085] | 71 | \end{center} |
---|
[2091] | 72 | \caption{List of MCS-51 branch instructions} |
---|
[1889] | 73 | \label{f:mcs51jumps} |
---|
| 74 | \end{figure} |
---|
| 75 | |
---|
[2091] | 76 | Conditional branch instructions are only available in short form, which |
---|
| 77 | means that a conditional branch outside the short address range has to be |
---|
| 78 | encoded using three branch instructions (for instructions whose logical |
---|
| 79 | negation is available, it can be done with two branch instructions, but for |
---|
[2096] | 80 | some instructions this is not available); the call instruction is |
---|
[2091] | 81 | only available in absolute and long forms. |
---|
[1889] | 82 | |
---|
[2096] | 83 | Note that even though the MCS-51 architecture is much less advanced and simpler |
---|
| 84 | than the x86-64 architecture, the basic types of branch instruction |
---|
[2091] | 85 | remain the same: a short jump with a limited range, an intra-segment jump and a |
---|
| 86 | jump that can reach the entire available memory. |
---|
[1889] | 87 | |
---|
[2096] | 88 | Generally, in code fed to the assembler as input, the only |
---|
| 89 | difference between branch instructions is semantics, not span. This |
---|
[2091] | 90 | means that a distinction is made between an unconditional branch and the |
---|
| 91 | several kinds of conditional branch, but not between their short, absolute or |
---|
| 92 | long variants. |
---|
[2085] | 93 | |
---|
[2091] | 94 | The algorithm used by the assembler to encode these branch instructions into |
---|
| 95 | the different machine instructions is known as the {\em branch displacement |
---|
[2096] | 96 | algorithm}. The optimisation problem consists of finding as small an encoding as |
---|
[2091] | 97 | possible, thus minimising program length and execution time. |
---|
[1889] | 98 | |
---|
| 99 | This problem is known to be NP-complete~\cite{Robertson1979,Szymanski1978}, |
---|
| 100 | which could make finding an optimal solution very time-consuming. |
---|
| 101 | |
---|
[2085] | 102 | The canonical solution, as shown by Szymanski~\cite{Szymanski1978} or more |
---|
| 103 | recently by Dickson~\cite{Dickson2008} for the x86 instruction set, is to use a |
---|
[2096] | 104 | fixed point algorithm that starts with the shortest possible encoding (all |
---|
| 105 | branch instruction encoded as short jumps, which is likely not a correct |
---|
[2091] | 106 | solution) and then iterates over the program to re-encode those branch |
---|
| 107 | instructions whose target is outside their range. |
---|
[1889] | 108 | |
---|
[2085] | 109 | \subsection*{Adding absolute jumps} |
---|
[1889] | 110 | |
---|
[2049] | 111 | In both papers mentioned above, the encoding of a jump is only dependent on the |
---|
| 112 | distance between the jump and its target: below a certain value a short jump |
---|
| 113 | can be used; above this value the jump must be encoded as a long jump. |
---|
[1889] | 114 | |
---|
[2064] | 115 | Here, termination of the smallest fixed point algorithm is easy to prove. All |
---|
[2091] | 116 | branch instructions start out encoded as short jumps, which means that the |
---|
| 117 | distance between any branch instruction and its target is as short as possible. |
---|
| 118 | If, in this situation, there is a branch instruction $b$ whose span is not |
---|
| 119 | within the range for a short jump, we can be sure that we can never reach a |
---|
| 120 | situation where the span of $j$ is so small that it can be encoded as a short |
---|
| 121 | jump. This argument continues to hold throughout the subsequent iterations of |
---|
[2096] | 122 | the algorithm: short jumps can change into long jumps, but not \emph{vice versa}, |
---|
| 123 | as spans only increase. Hence, the algorithm either terminates early when a fixed |
---|
[2091] | 124 | point is reached or when all short jumps have been changed into long jumps. |
---|
[1889] | 125 | |
---|
| 126 | Also, we can be certain that we have reached an optimal solution: a short jump |
---|
| 127 | is only changed into a long jump if it is absolutely necessary. |
---|
| 128 | |
---|
| 129 | However, neither of these claims (termination nor optimality) hold when we add |
---|
[2096] | 130 | the absolute jump, as with absolute jumps, the encoding of a branch |
---|
[2091] | 131 | instruction no longer depends only on the distance between the branch |
---|
| 132 | instruction and its target: in order for an absolute jump to be possible, they |
---|
| 133 | need to be in the same segment (for the MCS-51, this means that the first 5 |
---|
| 134 | bytes of their addresses have to be equal). It is therefore entirely possible |
---|
| 135 | for two branch instructions with the same span to be encoded in different ways |
---|
| 136 | (absolute if the branch instruction and its target are in the same segment, |
---|
| 137 | long if this is not the case). |
---|
[1889] | 138 | |
---|
[2096] | 139 | \begin{figure}[ht] |
---|
| 140 | \begin{alltt} |
---|
| 141 | jmp X |
---|
| 142 | \vdots |
---|
| 143 | L\(\sb{0}\): |
---|
| 144 | \vdots |
---|
| 145 | jmp L\(\sb{0}\) |
---|
| 146 | \end{alltt} |
---|
| 147 | \caption{Example of a program where a long jump becomes absolute} |
---|
| 148 | \label{f:term_example} |
---|
| 149 | \end{figure} |
---|
| 150 | |
---|
| 151 | This invalidates our earlier termination argument: a branch instruction, once encoded |
---|
[2091] | 152 | as a long jump, can be re-encoded during a later iteration as an absolute jump. |
---|
[2096] | 153 | Consider the program shown in Figure~\ref{f:term_example}. At the start of the |
---|
[2091] | 154 | first iteration, both the branch to {\tt X} and the branch to $\mathtt{L}_{0}$ |
---|
| 155 | are encoded as small jumps. Let us assume that in this case, the placement of |
---|
| 156 | $\mathtt{L}_{0}$ and the branch to it are such that $\mathtt{L}_{0}$ is just |
---|
| 157 | outside the segment that contains this branch. Let us also assume that the |
---|
| 158 | distance between $\mathtt{L}_{0}$ and the branch to it are too large for the |
---|
| 159 | branch instruction to be encoded as a short jump. |
---|
[1889] | 160 | |
---|
[2091] | 161 | All this means that in the second iteration, the branch to $\mathtt{L}_{0}$ will |
---|
| 162 | be encoded as a long jump. If we assume that the branch to {\tt X} is encoded as |
---|
| 163 | a long jump as well, the size of the branch instruction will increase and |
---|
| 164 | $\mathtt{L}_{0}$ will be `propelled' into the same segment as its branch |
---|
| 165 | instruction, because every subsequent instruction will move one byte forward. |
---|
| 166 | Hence, in the third iteration, the branch to $\mathtt{L}_{0}$ can be encoded as |
---|
| 167 | an absolute jump. At first glance, there is nothing that prevents us from |
---|
[2096] | 168 | constructing a configuration where two branch instructions interact in such a |
---|
| 169 | way as to iterate indefinitely between long and absolute encodings. |
---|
[1889] | 170 | |
---|
[2096] | 171 | This situation mirrors the explanation by Szymanski~\cite{Szymanski1978} of why |
---|
| 172 | the branch displacement optimisation problem is NP-complete. In this explanation, |
---|
| 173 | a condition for NP-completeness is the fact that programs be allowed to contain |
---|
| 174 | {\em pathological} jumps. These are branch instructions that can normally not be |
---|
| 175 | encoded as a short(er) jump, but gain this property when some other branch |
---|
| 176 | instructions are encoded as a long(er) jump. This is exactly what happens in |
---|
| 177 | Figure~\ref{f:term_example}. By encoding the first branch instruction as a long |
---|
| 178 | jump, another branch instruction switches from long to absolute (which is |
---|
| 179 | shorter). |
---|
[1889] | 180 | |
---|
[2096] | 181 | In addition, our previous optimality argument no longer holds. Consider the program |
---|
| 182 | shown in Figure~\ref{f:opt_example}. Suppose that the distance between |
---|
[1889] | 183 | $\mathtt{L}_{0}$ and $\mathtt{L}_{1}$ is such that if {\tt jmp X} is encoded |
---|
| 184 | as a short jump, there is a segment border just after $\mathtt{L}_{1}$. Let |
---|
[2091] | 185 | us also assume that the three branches to $\mathtt{L}_{1}$ are all in the same |
---|
[1889] | 186 | segment, but far enough away from $\mathtt{L}_{1}$ that they cannot be encoded |
---|
| 187 | as short jumps. |
---|
| 188 | |
---|
[2096] | 189 | \begin{figure}[ht] |
---|
[1889] | 190 | \begin{alltt} |
---|
| 191 | L\(\sb{0}\): jmp X |
---|
| 192 | X: |
---|
| 193 | \vdots |
---|
| 194 | L\(\sb{1}\): |
---|
| 195 | \vdots |
---|
| 196 | jmp L\(\sb{1}\) |
---|
| 197 | \vdots |
---|
| 198 | jmp L\(\sb{1}\) |
---|
| 199 | \vdots |
---|
| 200 | jmp L\(\sb{1}\) |
---|
| 201 | \vdots |
---|
| 202 | \end{alltt} |
---|
| 203 | \caption{Example of a program where the fixed-point algorithm is not optimal} |
---|
| 204 | \label{f:opt_example} |
---|
| 205 | \end{figure} |
---|
[2096] | 206 | |
---|
| 207 | Then, if {\tt jmp X} were to be encoded as a short jump, which is clearly |
---|
| 208 | possible, all of the branches to $\mathtt{L}_{1}$ would have to be encoded as |
---|
| 209 | long jumps. However, if {\tt jmp X} were to be encoded as a long jump, and |
---|
| 210 | therefore increase in size, $\mathtt{L}_{1}$ would be `propelled' across the |
---|
| 211 | segment border, so that the three branches to $\mathtt{L}_{1}$ could be encoded |
---|
| 212 | as absolute jumps. Depending on the relative sizes of long and absolute jumps, |
---|
| 213 | this solution might actually be smaller than the one reached by the smallest |
---|
| 214 | fixed point algorithm. |
---|