1 | \section{Introduction} |
---|
2 | |
---|
3 | The problem of branch displacement optimisation, also known as jump encoding, is |
---|
4 | a well-known problem in assembler design. It is caused by the fact that in |
---|
5 | many architecture sets, the encoding (and therefore size) of some instructions |
---|
6 | depends on the distance to their operand (the instruction 'span'). The branch |
---|
7 | displacement optimisation problem consists in encoding these span-dependent |
---|
8 | instructions in such a way that the resulting program is as small as possible. |
---|
9 | |
---|
10 | This problem is the subject of the present paper. After introducing the problem |
---|
11 | in more detail, we will discuss the solutions used by other compilers, present |
---|
12 | the algorithm used by us in the CerCo assembler, and discuss its verification, |
---|
13 | that is the proofs of termination and correctness. |
---|
14 | |
---|
15 | The research presented in this paper has been executed within the CerCo project |
---|
16 | which aims at formally verifying a C compiler with cost annotations. The |
---|
17 | target architecture for this project is the MCS-51, whose instruction set |
---|
18 | contains span-dependent instructions. Furthermore, its maximum addressable |
---|
19 | memory size is very small (65 Kbytes), which makes it important to generate |
---|
20 | programs that are as small as possible. |
---|
21 | |
---|
22 | With this optimisation, however, comes increased complexity and hence |
---|
23 | increased possibility for error. We must make sure that the jumps are encoded |
---|
24 | correctly, otherwise the assembled program will behave impredictably. |
---|
25 | |
---|
26 | \section{The branch displacement optimisation problem} |
---|
27 | |
---|
28 | In most modern instruction sets, the only span-dependent instructions are |
---|
29 | branch instructions. Taking the ubiquitous x86-64 instruction set as an |
---|
30 | example, we find that it contains are eleven different forms of the |
---|
31 | unconditional jump instruction, all with different ranges, instruction sizes |
---|
32 | and semantics (only six are valid in 64-bit mode, for example). Some examples |
---|
33 | are shown in figure~\ref{f:x86jumps}. |
---|
34 | |
---|
35 | \begin{figure}[h] |
---|
36 | \begin{center} |
---|
37 | \begin{tabular}{|l|l|l|} |
---|
38 | \hline |
---|
39 | Instruction & Size (bytes) & Displacement range \\ |
---|
40 | \hline |
---|
41 | Short jump & 2 & -128 to 127 bytes \\ |
---|
42 | Relative near jump & 5 & $-2^{32}$ to $2^{32}-1$ bytes \\ |
---|
43 | Absolute near jump & 6 & one segment (64-bit address) \\ |
---|
44 | Far jump & 8 & entire memory \\ |
---|
45 | \hline |
---|
46 | \end{tabular} |
---|
47 | \end{center} |
---|
48 | \caption{List of x86 jump instructions} |
---|
49 | \label{f:x86jumps} |
---|
50 | \end{figure} |
---|
51 | |
---|
52 | The chosen target architecture of the CerCo project is the Intel MCS-51, which |
---|
53 | features three types of branch instructions (or jump instructions; the two terms |
---|
54 | are used interchangeably), as shown in figure~\ref{f:mcs51jumps}. |
---|
55 | |
---|
56 | \begin{figure}[h] |
---|
57 | \begin{center} |
---|
58 | \begin{tabular}{|l|l|l|l|} |
---|
59 | \hline |
---|
60 | Instruction & Size & Execution time & Displacement range \\ |
---|
61 | & (bytes) & (cycles) & \\ |
---|
62 | \hline |
---|
63 | SJMP (`short jump') & 2 & 2 & -128 to 127 bytes \\ |
---|
64 | AJMP (`absolute jump') & 2 & 2 & one segment (11-bit address) \\ |
---|
65 | LJMP (`long jump') & 3 & 3 & entire memory \\ |
---|
66 | \hline |
---|
67 | \end{tabular} |
---|
68 | \end{center} |
---|
69 | \caption{List of MCS-51 jump instructions} |
---|
70 | \label{f:mcs51jumps} |
---|
71 | \end{figure} |
---|
72 | |
---|
73 | The conditional jump instruction is only available in short form, which |
---|
74 | means that a conditional jump outside the short address range has to be |
---|
75 | encoded using two jumps; the call instruction is only available in |
---|
76 | absolute and long forms. |
---|
77 | |
---|
78 | Note that even though the MCS-51 architecture is much less advanced and more |
---|
79 | simple than the x86-64 architecture, the basic types of jump remain the same: |
---|
80 | a short jump with a limited range, an intra-segment jump and a jump that can |
---|
81 | reach the entire available memory. |
---|
82 | |
---|
83 | Generally, in the code that is sent to the assembler as input, the only |
---|
84 | difference made between jump instructions is by semantics, not by span. This |
---|
85 | means that a distinction is made between the inconditional jump and the several |
---|
86 | kinds of conditional jump, but not between their short, absolute or long |
---|
87 | variants. |
---|
88 | |
---|
89 | The algorithm used by the assembler to encode these jumps into the different |
---|
90 | machine instructions is known as the {\tt branch displacement algorithm}. The |
---|
91 | optimisation problem consists in using as small an encoding as possible, thus |
---|
92 | minimising program length and execution time. |
---|
93 | |
---|
94 | This problem is known to be NP-complete~\cite{Robertson1979,Szymanski1978}, |
---|
95 | which could make finding an optimal solution very time-consuming. |
---|
96 | |
---|
97 | The canonical solution, as shown by Szymanski~\cite{Szymanski1978} or more |
---|
98 | recently by Dickson~\cite{Dickson2008} for the x86 instruction set, is to use a |
---|
99 | fixed point algorithm that starts out with the shortest possible encoding (all |
---|
100 | jumps encoded as short jumps, which is very probably not a correct solution) |
---|
101 | and then iterates over the program to re-encode those jumps whose target is |
---|
102 | outside their range. |
---|
103 | |
---|
104 | \subsection*{Adding absolute jumps} |
---|
105 | |
---|
106 | In both papers mentioned above, the encoding of a jump is only dependent on the |
---|
107 | distance between the jump and its target: below a certain value a short jump |
---|
108 | can be used; above this value the jump must be encoded as a long jump. |
---|
109 | |
---|
110 | Here, termination of the smallest fixed point algorithm is easy to prove. All |
---|
111 | jumps start out encoded as short jumps, which means that the distance between |
---|
112 | any jump and its target is as short as possible. If, in this situation, there |
---|
113 | is a jump $j$ whose span is not within the range for a short jump, we can be |
---|
114 | sure that we can never reach a situation where the span of $j$ is so small that |
---|
115 | it can be encoded as a short jump. This reasoning holds throughout the |
---|
116 | subsequent iterations of the algorithm: short jumps can change into long jumps, |
---|
117 | but not vice versa (spans only increase). Hence, the algorithm either |
---|
118 | terminates when a fixed point is reached or when all short jumps have been |
---|
119 | changed into long jumps. |
---|
120 | |
---|
121 | Also, we can be certain that we have reached an optimal solution: a short jump |
---|
122 | is only changed into a long jump if it is absolutely necessary. |
---|
123 | |
---|
124 | However, neither of these claims (termination nor optimality) hold when we add |
---|
125 | the absolute jump. |
---|
126 | |
---|
127 | The reason for this is that with absolute jumps, the encoding of a jump no |
---|
128 | longer depends only on the distance between the jump and its target: in order |
---|
129 | for an absolute jump to be possible, they need to be in the same segment (for |
---|
130 | the MCS-51, this means that the first 5 bytes of their addesses have to be |
---|
131 | equal). It is therefore entirely possible for two jumps with the same span to |
---|
132 | be encoded in different ways (absolute if the jump and its target are in the |
---|
133 | same segment, long if this is not the case). |
---|
134 | |
---|
135 | This invalidates the termination argument: a jump, once encoded as a long jump, |
---|
136 | can be re-encoded during a later iteration as an absolute jump. Consider the |
---|
137 | program shown in figure~\ref{f:term_example}. At the start of the first |
---|
138 | iteration, both the jump to {\tt X} and the jump to $\mathtt{L}_{0}$ are |
---|
139 | encoded as small jumps. Let us assume that in this case, the placement of |
---|
140 | $\mathtt{L}_{0}$ and the jump to it are such that $\mathtt{L}_{0}$ is just |
---|
141 | outside the segment that contains this jump. Let us also assume that the |
---|
142 | distance between $\mathtt{L}_{0}$ and the jump to it are too large for the jump |
---|
143 | to be encoded as a short jump. |
---|
144 | |
---|
145 | All this means that in the second iteration, the jump to $\mathtt{L}_{0}$ will |
---|
146 | be encoded as a long jump. If we assume that the jump to {\tt X} is encoded as |
---|
147 | a long jump as well, the size of the jump will increase and $\mathtt{L}_{0}$ |
---|
148 | will be `propelled' into the same segment as its jump. Hence, in the third |
---|
149 | iteration, it can be encoded as an absolute jump. At first glance, there is |
---|
150 | nothing that prevents us from making a construction where two jumps interact |
---|
151 | in such a way as to keep switching between long and absolute encodings for |
---|
152 | an indefinite amount of iterations. |
---|
153 | |
---|
154 | \begin{figure}[h] |
---|
155 | \begin{alltt} |
---|
156 | jmp X |
---|
157 | \vdots |
---|
158 | L\(\sb{0}\): |
---|
159 | \vdots |
---|
160 | jmp L\(\sb{0}\) |
---|
161 | \end{alltt} |
---|
162 | \caption{Example of a program where a long jump becomes absolute} |
---|
163 | \label{f:term_example} |
---|
164 | \end{figure} |
---|
165 | |
---|
166 | In fact, this situation mirrors the explanation by |
---|
167 | Szymanski~\cite{Szymanski1978} of why the branch displacement optimisation |
---|
168 | problem is NP-complete. In this explanation, a condition for NP-completeness |
---|
169 | is the fact that programs be allowed to contain {\em pathological} jumps. |
---|
170 | These are jumps that can normally not be encoded as a short(er) jump, but gain |
---|
171 | this property when some other jumps are encoded as a long(er) jump. This is |
---|
172 | exactly what happens in figure~\ref{f:term_example}: by encoding the first |
---|
173 | jump as a long jump, another jump switches from long to absolute |
---|
174 | (which is shorter). |
---|
175 | |
---|
176 | The optimality argument no longer holds either. Let us consider the program |
---|
177 | shown in figure~\ref{f:opt_example}. Suppose that the distance between |
---|
178 | $\mathtt{L}_{0}$ and $\mathtt{L}_{1}$ is such that if {\tt jmp X} is encoded |
---|
179 | as a short jump, there is a segment border just after $\mathtt{L}_{1}$. Let |
---|
180 | us also assume that the three jumps to $\mathtt{L}_{1}$ are all in the same |
---|
181 | segment, but far enough away from $\mathtt{L}_{1}$ that they cannot be encoded |
---|
182 | as short jumps. |
---|
183 | |
---|
184 | Then, if {\tt jmp X} were to be encoded as a short jump, which is clearly |
---|
185 | possible, all of the jumps to $\mathtt{L}_{1}$ would have to be encoded as |
---|
186 | long jumps. However, if {\tt jmp X} were to be encoded as a long jump, and |
---|
187 | therefore increase in size, $\mathtt{L}_{1}$ would be `propelled' across the |
---|
188 | segment border, so that the three jumps to $\mathtt{L}_{1}$ could be encoded |
---|
189 | as absolute jumps. Depending on the relative sizes of long and absolute jumps, |
---|
190 | this solution might actually be smaller than the one reached by the smallest |
---|
191 | fixed point algorithm. |
---|
192 | |
---|
193 | \begin{figure}[h] |
---|
194 | \begin{alltt} |
---|
195 | L\(\sb{0}\): jmp X |
---|
196 | X: |
---|
197 | \vdots |
---|
198 | L\(\sb{1}\): |
---|
199 | \vdots |
---|
200 | jmp L\(\sb{1}\) |
---|
201 | \vdots |
---|
202 | jmp L\(\sb{1}\) |
---|
203 | \vdots |
---|
204 | jmp L\(\sb{1}\) |
---|
205 | \vdots |
---|
206 | \end{alltt} |
---|
207 | \caption{Example of a program where the fixed-point algorithm is not optimal} |
---|
208 | \label{f:opt_example} |
---|
209 | \end{figure} |
---|