source:src/ASM/CPP2012-policy/problem.tex@2086

Last change on this file since 2086 was 2086, checked in by boender, 8 years ago
• spell-check
File size: 9.5 KB
Line
1\section{Introduction}
2
3The problem of branch displacement optimisation, also known as jump encoding, is
4a well-known problem in assembler design. It is caused by the fact that in
5many architecture sets, the encoding (and therefore size) of some instructions
6depends on the distance to their operand (the instruction 'span'). The branch
7displacement optimisation problem consists in encoding these span-dependent
8instructions in such a way that the resulting program is as small as possible.
9
10This problem is the subject of the present paper. After introducing the problem
11in more detail, we will discuss the solutions used by other compilers, present
12the algorithm used by us in the CerCo assembler, and discuss its verification,
13that is the proofs of termination and correctness.
14
15The research presented in this paper has been executed within the CerCo project
16which aims at formally verifying a C compiler with cost annotations. The
17target architecture for this project is the MCS-51, whose instruction set
18contains span-dependent instructions. Furthermore, its maximum addressable
19memory size is very small (65 Kbytes), which makes it important to generate
20programs that are as small as possible.
21
22With this optimisation, however, comes increased complexity and hence
23increased possibility for error. We must make sure that the jumps are encoded
24correctly, otherwise the assembled program will behave unpredictably.
25
26\section{The branch displacement optimisation problem}
27
28In most modern instruction sets, the only span-dependent instructions are
29branch instructions. Taking the ubiquitous x86-64 instruction set as an
30example, we find that it contains are eleven different forms of the
31unconditional jump instruction, all with different ranges, instruction sizes
32and semantics (only six are valid in 64-bit mode, for example). Some examples
33are shown in figure~\ref{f:x86jumps}.
34
35\begin{figure}[h]
36\begin{center}
37\begin{tabular}{|l|l|l|}
38\hline
39Instruction & Size (bytes) & Displacement range \\
40\hline
41Short jump & 2 & -128 to 127 bytes \\
42Relative near jump & 5 & $-2^{32}$ to $2^{32}-1$ bytes \\
43Absolute near jump & 6 & one segment (64-bit address) \\
44Far jump & 8 & entire memory \\
45\hline
46\end{tabular}
47\end{center}
48\caption{List of x86 jump instructions}
49\label{f:x86jumps}
50\end{figure}
51
52The chosen target architecture of the CerCo project is the Intel MCS-51, which
53features three types of branch instructions (or jump instructions; the two terms
54are used interchangeably), as shown in figure~\ref{f:mcs51jumps}.
55
56\begin{figure}[h]
57\begin{center}
58\begin{tabular}{|l|l|l|l|}
59\hline
60Instruction & Size    & Execution time & Displacement range \\
61            & (bytes) & (cycles) & \\
62\hline
63SJMP (short jump') & 2 & 2 & -128 to 127 bytes \\
64AJMP (absolute jump') & 2 & 2 & one segment (11-bit address) \\
65LJMP (long jump') & 3 & 3 & entire memory \\
66\hline
67\end{tabular}
68\end{center}
69\caption{List of MCS-51 jump instructions}
70\label{f:mcs51jumps}
71\end{figure}
72
73The conditional jump instruction is only available in short form, which
74means that a conditional jump outside the short address range has to be
75encoded using two jumps; the call instruction is only available in
76absolute and long forms.
77
78Note that even though the MCS-51 architecture is much less advanced and more
79simple than the x86-64 architecture, the basic types of jump remain the same:
80a short jump with a limited range, an intra-segment jump and a jump that can
81reach the entire available memory.
82
83Generally, in the code that is sent to the assembler as input, the only
84difference made between jump instructions is by semantics, not by span. This
85means that a distinction is made between the unconditional jump and the several
86kinds of conditional jump, but not between their short, absolute or long
87variants.
88
89The algorithm used by the assembler to encode these jumps into the different
90machine instructions is known as the {\tt branch displacement algorithm}. The
91optimisation problem consists in using as small an encoding as possible, thus
92minimising program length and execution time.
93
94This problem is known to be NP-complete~\cite{Robertson1979,Szymanski1978},
95which could make finding an optimal solution very time-consuming.
96
97The canonical solution, as shown by Szymanski~\cite{Szymanski1978} or more
98recently by Dickson~\cite{Dickson2008} for the x86 instruction set, is to use a
99fixed point algorithm that starts out with the shortest possible encoding (all
100jumps encoded as short jumps, which is very probably not a correct solution)
101and then iterates over the program to re-encode those jumps whose target is
102outside their range.
103
105
106In both papers mentioned above, the encoding of a jump is only dependent on the
107distance between the jump and its target: below a certain value a short jump
108can be used; above this value the jump must be encoded as a long jump.
109
110Here, termination of the smallest fixed point algorithm is easy to prove. All
111jumps start out encoded as short jumps, which means that the distance between
112any jump and its target is as short as possible. If, in this situation, there
113is a jump $j$ whose span is not within the range for a short jump, we can be
114sure that we can never reach a situation where the span of $j$ is so small that
115it can be encoded as a short jump. This reasoning holds throughout the
116subsequent iterations of the algorithm: short jumps can change into long jumps,
117but not vice versa (spans only increase). Hence, the algorithm either
118terminates when a fixed point is reached or when all short jumps have been
119changed into long jumps.
120
121Also, we can be certain that we have reached an optimal solution: a short jump
122is only changed into a long jump if it is absolutely necessary.
123
124However, neither of these claims (termination nor optimality) hold when we add
125the absolute jump.
126
127The reason for this is that with absolute jumps, the encoding of a jump no
128longer depends only on the distance between the jump and its target: in order
129for an absolute jump to be possible, they need to be in the same segment (for
130the MCS-51, this means that the first 5 bytes of their addresses have to be
131equal). It is therefore entirely possible for two jumps with the same span to
132be encoded in different ways (absolute if the jump and its target are in the
133same segment, long if this is not the case).
134
135This invalidates the termination argument: a jump, once encoded as a long jump,
136can be re-encoded during a later iteration as an absolute jump. Consider the
137program shown in figure~\ref{f:term_example}. At the start of the first
138iteration, both the jump to {\tt X} and the jump to $\mathtt{L}_{0}$ are
139encoded as small jumps. Let us assume that in this case, the placement of
140$\mathtt{L}_{0}$ and the jump to it are such that $\mathtt{L}_{0}$ is just
141outside the segment that contains this jump. Let us also assume that the
142distance between $\mathtt{L}_{0}$ and the jump to it are too large for the jump
143to be encoded as a short jump.
144
145All this means that in the second iteration, the jump to $\mathtt{L}_{0}$ will
146be encoded as a long jump. If we assume that the jump to {\tt X} is encoded as
147a long jump as well, the size of the jump will increase and $\mathtt{L}_{0}$
148will be propelled' into the same segment as its jump. Hence, in the third
149iteration, it can be encoded as an absolute jump. At first glance, there is
150nothing that prevents us from making a construction where two jumps interact
151in such a way as to keep switching between long and absolute encodings for
152an indefinite amount of iterations.
153
154\begin{figure}[h]
155\begin{alltt}
156    jmp X
157    \vdots
158L$$\sb{0}$$:
159    \vdots
160    jmp L$$\sb{0}$$
161\end{alltt}
162\caption{Example of a program where a long jump becomes absolute}
163\label{f:term_example}
164\end{figure}
165
166In fact, this situation mirrors the explanation by
167Szymanski~\cite{Szymanski1978} of why the branch displacement optimisation
168problem is NP-complete. In this explanation, a condition for NP-completeness
169is the fact that programs be allowed to contain {\em pathological} jumps.
170These are jumps that can normally not be encoded as a short(er) jump, but gain
171this property when some other jumps are encoded as a long(er) jump. This is
172exactly what happens in figure~\ref{f:term_example}: by encoding the first
173jump as a long jump, another jump switches from long to absolute
174(which is shorter).
175
176The optimality argument no longer holds either. Let us consider the program
177shown in figure~\ref{f:opt_example}. Suppose that the distance between
178$\mathtt{L}_{0}$ and $\mathtt{L}_{1}$ is such that if {\tt jmp X} is encoded
179as a short jump, there is a segment border just after $\mathtt{L}_{1}$. Let
180us also assume that the three jumps to $\mathtt{L}_{1}$ are all in the same
181segment, but far enough away from $\mathtt{L}_{1}$ that they cannot be encoded
182as short jumps.
183
184Then, if {\tt jmp X} were to be encoded as a short jump, which is clearly
185possible, all of the jumps to $\mathtt{L}_{1}$ would have to be encoded as
186long jumps. However, if {\tt jmp X} were to be encoded as a long jump, and
187therefore increase in size, $\mathtt{L}_{1}$ would be `propelled' across the
188segment border, so that the three jumps to $\mathtt{L}_{1}$ could be encoded
189as absolute jumps. Depending on the relative sizes of long and absolute jumps,
190this solution might actually be smaller than the one reached by the smallest
191fixed point algorithm.
192
193\begin{figure}[h]
194\begin{alltt}
195L$$\sb{0}$$: jmp X
196X:
197    \vdots
198L$$\sb{1}$$:
199    \vdots
200    jmp L$$\sb{1}$$
201    \vdots
202    jmp L$$\sb{1}$$
203    \vdots
204    jmp L$$\sb{1}$$
205    \vdots
206\end{alltt}
207\caption{Example of a program where the fixed-point algorithm is not optimal}
208\label{f:opt_example}
209\end{figure}
Note: See TracBrowser for help on using the repository browser.