1 | \section{Introduction} |
---|
2 | |
---|
3 | The problem of branch displacement optimisation, also known as jump encoding, is |
---|
4 | a well-known problem in assembler design~\cite{Hyde2006}. It is caused by the |
---|
5 | fact that in many architecture sets, the encoding (and therefore size) of some |
---|
6 | instructions depends on the distance to their operand (the instruction 'span'). |
---|
7 | The branch displacement optimisation problem consists of encoding these |
---|
8 | span-dependent instructions in such a way that the resulting program is as |
---|
9 | small as possible. |
---|
10 | |
---|
11 | This problem is the subject of the present paper. After introducing the problem |
---|
12 | in more detail, we will discuss the solutions used by other compilers, present |
---|
13 | the algorithm we use in the CerCo assembler, and discuss its verification, |
---|
14 | that is the proofs of termination and correctness using the Matita proof |
---|
15 | assistant~\cite{Asperti2007}. |
---|
16 | |
---|
17 | The research presented in this paper has been executed within the CerCo project |
---|
18 | which aims at formally verifying a C compiler with cost annotations. The |
---|
19 | target architecture for this project is the MCS-51, whose instruction set |
---|
20 | contains span-dependent instructions. Furthermore, its maximum addressable |
---|
21 | memory size is very small (64 Kb), which makes it important to generate |
---|
22 | programs that are as small as possible. |
---|
23 | |
---|
24 | With this optimisation, however, comes increased complexity and hence |
---|
25 | increased possibility for error. We must make sure that the branch instructions |
---|
26 | are encoded correctly, otherwise the assembled program will behave |
---|
27 | unpredictably. |
---|
28 | |
---|
29 | \section{The branch displacement optimisation problem} |
---|
30 | |
---|
31 | In most modern instruction sets that have them, the only span-dependent |
---|
32 | instructions are branch instructions. Taking the ubiquitous x86-64 instruction |
---|
33 | set as an example, we find that it contains eleven different forms of the |
---|
34 | unconditional branch instruction, all with different ranges, instruction sizes |
---|
35 | and semantics (only six are valid in 64-bit mode, for example). Some examples |
---|
36 | are shown in Figure~\ref{f:x86jumps} (see also~\cite{IntelDev}). |
---|
37 | |
---|
38 | \begin{figure}[h] |
---|
39 | \begin{center} |
---|
40 | \begin{tabular}{|l|l|l|} |
---|
41 | \hline |
---|
42 | Instruction & Size (bytes) & Displacement range \\ |
---|
43 | \hline |
---|
44 | Short jump & 2 & -128 to 127 bytes \\ |
---|
45 | Relative near jump & 5 & $-2^{32}$ to $2^{32}-1$ bytes \\ |
---|
46 | Absolute near jump & 6 & one segment (64-bit address) \\ |
---|
47 | Far jump & 8 & entire memory \\ |
---|
48 | \hline |
---|
49 | \end{tabular} |
---|
50 | \end{center} |
---|
51 | \caption{List of x86 branch instructions} |
---|
52 | \label{f:x86jumps} |
---|
53 | \end{figure} |
---|
54 | |
---|
55 | The chosen target architecture of the CerCo project is the Intel MCS-51, which |
---|
56 | features three types of branch instructions (or jump instructions; the two terms |
---|
57 | are used interchangeably), as shown in Figure~\ref{f:mcs51jumps}. |
---|
58 | |
---|
59 | \begin{figure}[h] |
---|
60 | \begin{center} |
---|
61 | \begin{tabular}{|l|l|l|l|} |
---|
62 | \hline |
---|
63 | Instruction & Size & Execution time & Displacement range \\ |
---|
64 | & (bytes) & (cycles) & \\ |
---|
65 | \hline |
---|
66 | SJMP (`short jump') & 2 & 2 & -128 to 127 bytes \\ |
---|
67 | AJMP (`absolute jump') & 2 & 2 & one segment (11-bit address) \\ |
---|
68 | LJMP (`long jump') & 3 & 3 & entire memory \\ |
---|
69 | \hline |
---|
70 | \end{tabular} |
---|
71 | \end{center} |
---|
72 | \caption{List of MCS-51 branch instructions} |
---|
73 | \label{f:mcs51jumps} |
---|
74 | \end{figure} |
---|
75 | |
---|
76 | Conditional branch instructions are only available in short form, which |
---|
77 | means that a conditional branch outside the short address range has to be |
---|
78 | encoded using three branch instructions (for instructions whose logical |
---|
79 | negation is available, it can be done with two branch instructions, but for |
---|
80 | some instructions this is not available); the call instruction is |
---|
81 | only available in absolute and long forms. |
---|
82 | |
---|
83 | Note that even though the MCS-51 architecture is much less advanced and simpler |
---|
84 | than the x86-64 architecture, the basic types of branch instruction |
---|
85 | remain the same: a short jump with a limited range, an intra-segment jump and a |
---|
86 | jump that can reach the entire available memory. |
---|
87 | |
---|
88 | Generally, in code fed to the assembler as input, the only |
---|
89 | difference between branch instructions is semantics, not span. This |
---|
90 | means that a distinction is made between an unconditional branch and the |
---|
91 | several kinds of conditional branch, but not between their short, absolute or |
---|
92 | long variants. |
---|
93 | |
---|
94 | The algorithm used by the assembler to encode these branch instructions into |
---|
95 | the different machine instructions is known as the {\em branch displacement |
---|
96 | algorithm}. The optimisation problem consists of finding as small an encoding as |
---|
97 | possible, thus minimising program length and execution time. |
---|
98 | |
---|
99 | This problem is known to be NP-complete~\cite{Robertson1979,Szymanski1978}, |
---|
100 | which could make finding an optimal solution very time-consuming. |
---|
101 | |
---|
102 | The canonical solution, as shown by Szymanski~\cite{Szymanski1978} or more |
---|
103 | recently by Dickson~\cite{Dickson2008} for the x86 instruction set, is to use a |
---|
104 | fixed point algorithm that starts with the shortest possible encoding (all |
---|
105 | branch instruction encoded as short jumps, which is likely not a correct |
---|
106 | solution) and then iterates over the program to re-encode those branch |
---|
107 | instructions whose target is outside their range. |
---|
108 | |
---|
109 | \subsection*{Adding absolute jumps} |
---|
110 | |
---|
111 | In both papers mentioned above, the encoding of a jump is only dependent on the |
---|
112 | distance between the jump and its target: below a certain value a short jump |
---|
113 | can be used; above this value the jump must be encoded as a long jump. |
---|
114 | |
---|
115 | Here, termination of the smallest fixed point algorithm is easy to prove. All |
---|
116 | branch instructions start out encoded as short jumps, which means that the |
---|
117 | distance between any branch instruction and its target is as short as possible. |
---|
118 | If, in this situation, there is a branch instruction $b$ whose span is not |
---|
119 | within the range for a short jump, we can be sure that we can never reach a |
---|
120 | situation where the span of $j$ is so small that it can be encoded as a short |
---|
121 | jump. This argument continues to hold throughout the subsequent iterations of |
---|
122 | the algorithm: short jumps can change into long jumps, but not \emph{vice versa}, |
---|
123 | as spans only increase. Hence, the algorithm either terminates early when a fixed |
---|
124 | point is reached or when all short jumps have been changed into long jumps. |
---|
125 | |
---|
126 | Also, we can be certain that we have reached an optimal solution: a short jump |
---|
127 | is only changed into a long jump if it is absolutely necessary. |
---|
128 | |
---|
129 | However, neither of these claims (termination nor optimality) hold when we add |
---|
130 | the absolute jump, as with absolute jumps, the encoding of a branch |
---|
131 | instruction no longer depends only on the distance between the branch |
---|
132 | instruction and its target: in order for an absolute jump to be possible, they |
---|
133 | need to be in the same segment (for the MCS-51, this means that the first 5 |
---|
134 | bytes of their addresses have to be equal). It is therefore entirely possible |
---|
135 | for two branch instructions with the same span to be encoded in different ways |
---|
136 | (absolute if the branch instruction and its target are in the same segment, |
---|
137 | long if this is not the case). |
---|
138 | |
---|
139 | \begin{figure}[ht] |
---|
140 | \begin{alltt} |
---|
141 | jmp X |
---|
142 | \vdots |
---|
143 | L\(\sb{0}\): |
---|
144 | \vdots |
---|
145 | jmp L\(\sb{0}\) |
---|
146 | \end{alltt} |
---|
147 | \caption{Example of a program where a long jump becomes absolute} |
---|
148 | \label{f:term_example} |
---|
149 | \end{figure} |
---|
150 | |
---|
151 | This invalidates our earlier termination argument: a branch instruction, once encoded |
---|
152 | as a long jump, can be re-encoded during a later iteration as an absolute jump. |
---|
153 | Consider the program shown in Figure~\ref{f:term_example}. At the start of the |
---|
154 | first iteration, both the branch to {\tt X} and the branch to $\mathtt{L}_{0}$ |
---|
155 | are encoded as small jumps. Let us assume that in this case, the placement of |
---|
156 | $\mathtt{L}_{0}$ and the branch to it are such that $\mathtt{L}_{0}$ is just |
---|
157 | outside the segment that contains this branch. Let us also assume that the |
---|
158 | distance between $\mathtt{L}_{0}$ and the branch to it are too large for the |
---|
159 | branch instruction to be encoded as a short jump. |
---|
160 | |
---|
161 | All this means that in the second iteration, the branch to $\mathtt{L}_{0}$ will |
---|
162 | be encoded as a long jump. If we assume that the branch to {\tt X} is encoded as |
---|
163 | a long jump as well, the size of the branch instruction will increase and |
---|
164 | $\mathtt{L}_{0}$ will be `propelled' into the same segment as its branch |
---|
165 | instruction, because every subsequent instruction will move one byte forward. |
---|
166 | Hence, in the third iteration, the branch to $\mathtt{L}_{0}$ can be encoded as |
---|
167 | an absolute jump. At first glance, there is nothing that prevents us from |
---|
168 | constructing a configuration where two branch instructions interact in such a |
---|
169 | way as to iterate indefinitely between long and absolute encodings. |
---|
170 | |
---|
171 | This situation mirrors the explanation by Szymanski~\cite{Szymanski1978} of why |
---|
172 | the branch displacement optimisation problem is NP-complete. In this explanation, |
---|
173 | a condition for NP-completeness is the fact that programs be allowed to contain |
---|
174 | {\em pathological} jumps. These are branch instructions that can normally not be |
---|
175 | encoded as a short(er) jump, but gain this property when some other branch |
---|
176 | instructions are encoded as a long(er) jump. This is exactly what happens in |
---|
177 | Figure~\ref{f:term_example}. By encoding the first branch instruction as a long |
---|
178 | jump, another branch instruction switches from long to absolute (which is |
---|
179 | shorter). |
---|
180 | |
---|
181 | In addition, our previous optimality argument no longer holds. Consider the program |
---|
182 | shown in Figure~\ref{f:opt_example}. Suppose that the distance between |
---|
183 | $\mathtt{L}_{0}$ and $\mathtt{L}_{1}$ is such that if {\tt jmp X} is encoded |
---|
184 | as a short jump, there is a segment border just after $\mathtt{L}_{1}$. Let |
---|
185 | us also assume that the three branches to $\mathtt{L}_{1}$ are all in the same |
---|
186 | segment, but far enough away from $\mathtt{L}_{1}$ that they cannot be encoded |
---|
187 | as short jumps. |
---|
188 | |
---|
189 | \begin{figure}[ht] |
---|
190 | \begin{alltt} |
---|
191 | L\(\sb{0}\): jmp X |
---|
192 | X: |
---|
193 | \vdots |
---|
194 | L\(\sb{1}\): |
---|
195 | \vdots |
---|
196 | jmp L\(\sb{1}\) |
---|
197 | \vdots |
---|
198 | jmp L\(\sb{1}\) |
---|
199 | \vdots |
---|
200 | jmp L\(\sb{1}\) |
---|
201 | \vdots |
---|
202 | \end{alltt} |
---|
203 | \caption{Example of a program where the fixed-point algorithm is not optimal} |
---|
204 | \label{f:opt_example} |
---|
205 | \end{figure} |
---|
206 | |
---|
207 | Then, if {\tt jmp X} were to be encoded as a short jump, which is clearly |
---|
208 | possible, all of the branches to $\mathtt{L}_{1}$ would have to be encoded as |
---|
209 | long jumps. However, if {\tt jmp X} were to be encoded as a long jump, and |
---|
210 | therefore increase in size, $\mathtt{L}_{1}$ would be `propelled' across the |
---|
211 | segment border, so that the three branches to $\mathtt{L}_{1}$ could be encoded |
---|
212 | as absolute jumps. Depending on the relative sizes of long and absolute jumps, |
---|
213 | this solution might actually be smaller than the one reached by the smallest |
---|
214 | fixed point algorithm. |
---|