1 | \section{Introduction} |
---|
2 | |
---|
3 | The problem of branch displacement optimisation, also known as jump encoding, is |
---|
4 | a well-known problem in assembler design~\cite{Hyde2006}. Its origin lies in the |
---|
5 | fact that in many architecture sets, the encoding (and therefore size) of some |
---|
6 | instructions depends on the distance to their operand (the instruction 'span'). |
---|
7 | The branch displacement optimisation problem consists of encoding these |
---|
8 | span-dependent instructions in such a way that the resulting program is as |
---|
9 | small as possible. |
---|
10 | |
---|
11 | This problem is the subject of the present paper. After introducing the problem |
---|
12 | in more detail, we will discuss the solutions used by other compilers, present |
---|
13 | the algorithm we use in the CerCo assembler, and discuss its verification, |
---|
14 | that is the proofs of termination and correctness using the Matita proof |
---|
15 | assistant~\cite{Asperti2007}. |
---|
16 | |
---|
17 | Formulating the final statement of correctness and finding the loop invariants |
---|
18 | have been non-trivial tasks and are, indeed, the main contribution of this |
---|
19 | paper. It has required considerable care and fine-tuning to formulate not |
---|
20 | only the minimal statement required for the ulterior proof of correctness of |
---|
21 | the assembler, but also the minimal set of invariants needed for the proof |
---|
22 | of correctness of the algorithm. |
---|
23 | |
---|
24 | The research presented in this paper has been executed within the CerCo project |
---|
25 | which aims at formally verifying a C compiler with cost annotations. The |
---|
26 | target architecture for this project is the MCS-51, whose instruction set |
---|
27 | contains span-dependent instructions. Furthermore, its maximum addressable |
---|
28 | memory size is very small (64 Kb), which makes it important to generate |
---|
29 | programs that are as small as possible. With this optimisation, however, comes increased complexity and hence increased possibility for error. We must make sure that the branch instructions are encoded correctly, otherwise the assembled program will behave unpredictably. |
---|
30 | |
---|
31 | All Matita files related to this development can be found on the CerCo |
---|
32 | website, \url{http://cerco.cs.unibo.it}. The specific part that contains the |
---|
33 | branch displacement algorithm is in the {\tt ASM} subdirectory, in the files |
---|
34 | {\tt PolicyFront.ma}, {\tt PolicyStep.ma} and {\tt Policy.ma}. |
---|
35 | |
---|
36 | \section{The branch displacement optimisation problem} |
---|
37 | |
---|
38 | In most modern instruction sets that have them, the only span-dependent |
---|
39 | instructions are branch instructions. Taking the ubiquitous x86-64 instruction |
---|
40 | set as an example, we find that it contains eleven different forms of the |
---|
41 | unconditional branch instruction, all with different ranges, instruction sizes |
---|
42 | and semantics (only six are valid in 64-bit mode, for example). Some examples |
---|
43 | are shown in Figure~\ref{f:x86jumps} (see also~\cite{IntelDev}). |
---|
44 | |
---|
45 | \begin{figure}[h] |
---|
46 | \begin{center} |
---|
47 | \begin{tabular}{|l|l|l|} |
---|
48 | \hline |
---|
49 | Instruction & Size (bytes) & Displacement range \\ |
---|
50 | \hline |
---|
51 | Short jump & 2 & -128 to 127 bytes \\ |
---|
52 | Relative near jump & 5 & $-2^{32}$ to $2^{32}-1$ bytes \\ |
---|
53 | Absolute near jump & 6 & one segment (64-bit address) \\ |
---|
54 | Far jump & 8 & entire memory (indirect jump) \\ |
---|
55 | \hline |
---|
56 | \end{tabular} |
---|
57 | \end{center} |
---|
58 | \caption{List of x86 branch instructions} |
---|
59 | \label{f:x86jumps} |
---|
60 | \end{figure} |
---|
61 | |
---|
62 | The chosen target architecture of the CerCo project is the Intel MCS-51, which |
---|
63 | features three types of branch instructions (or jump instructions; the two terms |
---|
64 | are used interchangeably), as shown in Figure~\ref{f:mcs51jumps}. |
---|
65 | |
---|
66 | \begin{figure}[h] |
---|
67 | \begin{center} |
---|
68 | \begin{tabular}{|l|l|l|l|} |
---|
69 | \hline |
---|
70 | Instruction & Size & Execution time & Displacement range \\ |
---|
71 | & (bytes) & (cycles) & \\ |
---|
72 | \hline |
---|
73 | SJMP (`short jump') & 2 & 2 & -128 to 127 bytes \\ |
---|
74 | AJMP (`absolute jump') & 2 & 2 & one segment (11-bit address) \\ |
---|
75 | LJMP (`long jump') & 3 & 3 & entire memory \\ |
---|
76 | \hline |
---|
77 | \end{tabular} |
---|
78 | \end{center} |
---|
79 | \caption{List of MCS-51 branch instructions} |
---|
80 | \label{f:mcs51jumps} |
---|
81 | \end{figure} |
---|
82 | |
---|
83 | Conditional branch instructions are only available in short form, which |
---|
84 | means that a conditional branch outside the short address range has to be |
---|
85 | encoded using three branch instructions (for instructions whose logical |
---|
86 | negation is available, it can be done with two branch instructions, but for |
---|
87 | some instructions this is not the case). The call instruction is |
---|
88 | only available in absolute and long forms. |
---|
89 | |
---|
90 | Note that even though the MCS-51 architecture is much less advanced and much |
---|
91 | simpler than the x86-64 architecture, the basic types of branch instruction |
---|
92 | remain the same: a short jump with a limited range, an intra-segment jump and a |
---|
93 | jump that can reach the entire available memory. |
---|
94 | |
---|
95 | Generally, in code fed to the assembler as input, the only |
---|
96 | difference between branch instructions is semantics, not span. This |
---|
97 | means that a distinction is made between an unconditional branch and the |
---|
98 | several kinds of conditional branch, but not between their short, absolute or |
---|
99 | long variants. |
---|
100 | |
---|
101 | The algorithm used by the assembler to encode these branch instructions into |
---|
102 | the different machine instructions is known as the {\em branch displacement |
---|
103 | algorithm}. The optimisation problem consists of finding as small an encoding as |
---|
104 | possible, thus minimising program length and execution time. |
---|
105 | |
---|
106 | Similar problems, e.g. the branch displacement optimisation problem for other |
---|
107 | architectures, are known to be NP-complete~\cite{Robertson1979,Szymanski1978}, |
---|
108 | which could make finding an optimal solution very time-consuming. |
---|
109 | |
---|
110 | The canonical solution, as shown by Szymanski~\cite{Szymanski1978} or more |
---|
111 | recently by Dickson~\cite{Dickson2008} for the x86 instruction set, is to use a |
---|
112 | fixed point algorithm that starts with the shortest possible encoding (all |
---|
113 | branch instruction encoded as short jumps, which is likely not a correct |
---|
114 | solution) and then iterates over the source to re-encode those branch |
---|
115 | instructions whose target is outside their range. |
---|
116 | |
---|
117 | \subsection*{Adding absolute jumps} |
---|
118 | |
---|
119 | In both papers mentioned above, the encoding of a jump is only dependent on the |
---|
120 | distance between the jump and its target: below a certain value a short jump |
---|
121 | can be used; above this value the jump must be encoded as a long jump. |
---|
122 | |
---|
123 | Here, termination of the smallest fixed point algorithm is easy to prove. All |
---|
124 | branch instructions start out encoded as short jumps, which means that the |
---|
125 | distance between any branch instruction and its target is as short as possible |
---|
126 | (all the intervening jumps are short). |
---|
127 | If, in this situation, there is a branch instruction $b$ whose span is not |
---|
128 | within the range for a short jump, we can be sure that we can never reach a |
---|
129 | situation where the span of $j$ is so small that it can be encoded as a short |
---|
130 | jump. This argument continues to hold throughout the subsequent iterations of |
---|
131 | the algorithm: short jumps can change into long jumps, but not \emph{vice versa}, |
---|
132 | as spans only increase. Hence, the algorithm either terminates early when a fixed |
---|
133 | point is reached or when all short jumps have been changed into long jumps. |
---|
134 | |
---|
135 | Also, we can be certain that we have reached an optimal solution: a short jump |
---|
136 | is only changed into a long jump if it is absolutely necessary. |
---|
137 | |
---|
138 | However, neither of these claims (termination nor optimality) hold when we add |
---|
139 | the absolute jump. With absolute jumps, the encoding of a branch |
---|
140 | instruction no longer depends only on the distance between the branch |
---|
141 | instruction and its target. An absolute jump is possible when instruction and |
---|
142 | target are in the same segment (for the MCS-51, this means that the first 5 |
---|
143 | bytes of their addresses have to be equal). It is therefore entirely possible |
---|
144 | for two branch instructions with the same span to be encoded in different ways |
---|
145 | (absolute if the branch instruction and its target are in the same segment, |
---|
146 | long if this is not the case). |
---|
147 | |
---|
148 | \begin{figure}[t] |
---|
149 | \begin{subfigure}[b]{.45\linewidth} |
---|
150 | \small |
---|
151 | \begin{alltt} |
---|
152 | jmp X |
---|
153 | \ldots |
---|
154 | L\(\sb{0}\): \ldots |
---|
155 | % Start of new segment if |
---|
156 | % jmp X is encoded as short |
---|
157 | \ldots |
---|
158 | jmp L\(\sb{0}\) |
---|
159 | \end{alltt} |
---|
160 | \caption{Example of a program where a long jump becomes absolute} |
---|
161 | \label{f:term_example} |
---|
162 | \end{subfigure} |
---|
163 | \hfill |
---|
164 | \begin{subfigure}[b]{.45\linewidth} |
---|
165 | \small |
---|
166 | \begin{alltt} |
---|
167 | L\(\sb{0}\): jmp X |
---|
168 | X: \ldots |
---|
169 | \ldots |
---|
170 | L\(\sb{1}\): \ldots |
---|
171 | % Start of new segment if |
---|
172 | % jmp X is encoded as short |
---|
173 | \ldots |
---|
174 | jmp L\(\sb{1}\) |
---|
175 | \ldots |
---|
176 | jmp L\(\sb{1}\) |
---|
177 | \ldots |
---|
178 | jmp L\(\sb{1}\) |
---|
179 | \ldots |
---|
180 | \end{alltt} |
---|
181 | \caption{Example of a program where the fixed-point algorithm is not optimal} |
---|
182 | \label{f:opt_example} |
---|
183 | \end{subfigure} |
---|
184 | \end{figure} |
---|
185 | |
---|
186 | This invalidates our earlier termination argument: a branch instruction, once encoded |
---|
187 | as a long jump, can be re-encoded during a later iteration as an absolute jump. |
---|
188 | Consider the program shown in Figure~\ref{f:term_example}. At the start of the |
---|
189 | first iteration, both the branch to {\tt X} and the branch to $\mathtt{L}_{0}$ |
---|
190 | are encoded as small jumps. Let us assume that in this case, the placement of |
---|
191 | $\mathtt{L}_{0}$ and the branch to it are such that $\mathtt{L}_{0}$ is just |
---|
192 | outside the segment that contains this branch. Let us also assume that the |
---|
193 | distance between $\mathtt{L}_{0}$ and the branch to it is too large for the |
---|
194 | branch instruction to be encoded as a short jump. |
---|
195 | |
---|
196 | All this means that in the second iteration, the branch to $\mathtt{L}_{0}$ will |
---|
197 | be encoded as a long jump. If we assume that the branch to {\tt X} is encoded as |
---|
198 | a long jump as well, the size of the branch instruction will increase and |
---|
199 | $\mathtt{L}_{0}$ will be `propelled' into the same segment as its branch |
---|
200 | instruction, because every subsequent instruction will move one byte forward. |
---|
201 | Hence, in the third iteration, the branch to $\mathtt{L}_{0}$ can be encoded as |
---|
202 | an absolute jump. At first glance, there is nothing that prevents us from |
---|
203 | constructing a configuration where two branch instructions interact in such a |
---|
204 | way as to iterate indefinitely between long and absolute encodings. |
---|
205 | |
---|
206 | This situation mirrors the explanation by Szymanski~\cite{Szymanski1978} of why |
---|
207 | the branch displacement optimisation problem is NP-complete. In this explanation, |
---|
208 | a condition for NP-completeness is the fact that programs be allowed to contain |
---|
209 | {\em pathological} jumps. These are branch instructions that can normally not be |
---|
210 | encoded as a short(er) jump, but gain this property when some other branch |
---|
211 | instructions are encoded as a long(er) jump. This is exactly what happens in |
---|
212 | Figure~\ref{f:term_example}. By encoding the first branch instruction as a long |
---|
213 | jump, another branch instruction switches from long to absolute (which is |
---|
214 | shorter). |
---|
215 | |
---|
216 | In addition, our previous optimality argument no longer holds. Consider the |
---|
217 | program shown in Figure~\ref{f:opt_example}. Suppose that the distance between |
---|
218 | $\mathtt{L}_{0}$ and $\mathtt{L}_{1}$ is such that if {\tt jmp X} is encoded |
---|
219 | as a short jump, there is a segment border just after $\mathtt{L}_{1}$. Let |
---|
220 | us also assume that all three branches to $\mathtt{L}_{1}$ are in the same |
---|
221 | segment, but far enough away from $\mathtt{L}_{1}$ that they cannot be encoded |
---|
222 | as short jumps. |
---|
223 | |
---|
224 | Then, if {\tt jmp X} were to be encoded as a short jump, which is clearly |
---|
225 | possible, all of the branches to $\mathtt{L}_{1}$ would have to be encoded as |
---|
226 | long jumps. However, if {\tt jmp X} were to be encoded as a long jump, and |
---|
227 | therefore increase in size, $\mathtt{L}_{1}$ would be `propelled' across the |
---|
228 | segment border, so that the three branches to $\mathtt{L}_{1}$ could be encoded |
---|
229 | as absolute jumps. Depending on the relative sizes of long and absolute jumps, |
---|
230 | this solution might actually be smaller than the one reached by the smallest |
---|
231 | fixed point algorithm. |
---|