% Source: src/ASM/CPP2012-policy/problem.tex @ 2091
% Last change on this file since 2091 was 2091, checked in by boender, 7 years ago:
% systematically changed `jump' to `branch'
\section{Introduction}

The problem of branch displacement optimisation, also known as jump encoding, is
a well-known problem in assembler design~\cite{Hyde2006}. It is caused by the
fact that in many instruction sets, the encoding (and therefore size) of some
instructions depends on the distance to their operand (the instruction `span').
The branch displacement optimisation problem consists in encoding these
span-dependent instructions in such a way that the resulting program is as
small as possible.

This problem is the subject of the present paper. After introducing the problem
in more detail, we will discuss the solutions used by other compilers, present
the algorithm we use in the CerCo assembler, and discuss its verification,
that is, the proofs of termination and correctness using the Matita theorem
prover~\cite{Asperti2007}.

The research presented in this paper was carried out within the CerCo project,
which aims at formally verifying a C compiler with cost annotations. The
target architecture for this project is the MCS-51, whose instruction set
contains span-dependent instructions. Furthermore, its maximum addressable
memory size is very small (64 Kbytes), which makes it important to generate
programs that are as small as possible.

With this optimisation, however, comes increased complexity and hence
increased possibility for error. We must make sure that the branch instructions
are encoded correctly, otherwise the assembled program will behave
unpredictably.

\section{The branch displacement optimisation problem}

In most modern instruction sets that have them, the only span-dependent
instructions are branch instructions. Taking the ubiquitous x86-64 instruction
set as an example, we find that it contains eleven different forms of the
unconditional branch instruction, all with different ranges, instruction sizes
and semantics (only six are valid in 64-bit mode, for example). Some examples
are shown in figure~\ref{f:x86jumps}.

\begin{figure}[h]
\begin{center}
\begin{tabular}{|l|l|l|}
\hline
Instruction & Size (bytes) & Displacement range \\
\hline
Short jump & 2 & $-128$ to $127$ bytes \\
Relative near jump & 5 & $-2^{31}$ to $2^{31}-1$ bytes \\
Absolute near jump & 6 & one segment (64-bit address) \\
Far jump & 8 & entire memory \\
\hline
\end{tabular}
\end{center}
\caption{List of x86 branch instructions}
\label{f:x86jumps}
\end{figure}

The chosen target architecture of the CerCo project is the Intel MCS-51, which
features three types of branch instructions (or jump instructions; the two terms
are used interchangeably), as shown in figure~\ref{f:mcs51jumps}.

\begin{figure}[h]
\begin{center}
\begin{tabular}{|l|l|l|l|}
\hline
Instruction & Size    & Execution time & Displacement range \\
            & (bytes) & (cycles) & \\
\hline
SJMP (`short jump') & 2 & 2 & $-128$ to $127$ bytes \\
AJMP (`absolute jump') & 2 & 2 & one segment (11-bit address) \\
LJMP (`long jump') & 3 & 3 & entire memory \\
\hline
\end{tabular}
\end{center}
\caption{List of MCS-51 branch instructions}
\label{f:mcs51jumps}
\end{figure}

Conditional branch instructions are only available in short form, which
means that a conditional branch outside the short address range has to be
encoded using three branch instructions (for instructions whose logical
negation is available, it can be done with two branch instructions, but for
some instructions this opposite is not available); the call instruction is
only available in absolute and long forms.

Note that even though the MCS-51 architecture is much less advanced and much
simpler than the x86-64 architecture, the basic types of branch instruction
remain the same: a short jump with a limited range, an intra-segment jump and a
jump that can reach the entire available memory.

Generally, in the code that is sent to the assembler as input, the only
difference made between branch instructions is by semantics, not by span. This
means that a distinction is made between an unconditional branch and the
several kinds of conditional branch, but not between their short, absolute or
long variants.

The algorithm used by the assembler to encode these branch instructions into
the different machine instructions is known as the {\em branch displacement
algorithm}. The optimisation problem consists in using as small an encoding as
possible, thus minimising program length and execution time.

This problem is known to be NP-complete~\cite{Robertson1979,Szymanski1978},
which could make finding an optimal solution very time-consuming.

The canonical solution, as shown by Szymanski~\cite{Szymanski1978} or more
recently by Dickson~\cite{Dickson2008} for the x86 instruction set, is to use a
fixed point algorithm that starts out with the shortest possible encoding (all
branch instructions encoded as short jumps, which is very probably not a correct
solution) and then iterates over the program to re-encode those branch
instructions whose target is outside their range.

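The fixed point iteration described above can be sketched in a few lines of
Python. This is only an illustrative sketch under simplifying assumptions, not
the algorithm of the cited papers or of the CerCo assembler: the program
representation (a list of {\tt label}, {\tt jmp} and {\tt data} items), the
function name \verb|assemble_sizes| and the example program are hypothetical,
only short (2-byte) and long (3-byte) jumps are modelled, and displacements
are measured from the end of the short jump.

\begin{verbatim}
```python
SHORT, LONG = 2, 3  # assumed jump sizes in bytes (short and long form)


def assemble_sizes(program):
    """Return (jump sizes by index, total program size in bytes).

    program is a list of ("label", name), ("jmp", name) or ("data", nbytes).
    """
    # Start with the smallest encoding: every jump is short.
    size = {i: SHORT for i, (kind, _) in enumerate(program) if kind == "jmp"}
    while True:
        # Lay out the program under the current encoding.
        addr, labels, pc = [], {}, 0
        for i, (kind, arg) in enumerate(program):
            addr.append(pc)
            if kind == "label":
                labels[arg] = pc
            elif kind == "jmp":
                pc += size[i]
            else:  # "data": arg is a byte count
                pc += arg
        # Grow every short jump whose target is out of range.
        changed = False
        for i, (kind, arg) in enumerate(program):
            if kind == "jmp" and size[i] == SHORT:
                disp = labels[arg] - (addr[i] + SHORT)
                if not -128 <= disp <= 127:
                    size[i] = LONG
                    changed = True
        if not changed:  # fixed point reached
            return size, pc


# The second jump must be long; its growth pushes "near" out of range of
# the first jump, which then becomes long in the next iteration.
program = [("jmp", "near"), ("data", 125), ("jmp", "far"),
           ("label", "near"), ("data", 200), ("label", "far")]
print(assemble_sizes(program))  # ({0: 3, 2: 3}, 331)
```
\end{verbatim}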
\subsection*{Adding absolute jumps}

In both papers mentioned above, the encoding of a jump is only dependent on the
distance between the jump and its target: below a certain value a short jump
can be used; above this value the jump must be encoded as a long jump.

Here, termination of the smallest fixed point algorithm is easy to prove. All
branch instructions start out encoded as short jumps, which means that the
distance between any branch instruction and its target is as short as possible.
If, in this situation, there is a branch instruction $b$ whose span is not
within the range for a short jump, we can be sure that we can never reach a
situation where the span of $b$ is so small that it can be encoded as a short
jump. This argument continues to hold throughout the subsequent iterations of
the algorithm: short jumps can change into long jumps, but not vice versa
(spans only increase). Hence, the algorithm terminates either when a fixed
point is reached or when all short jumps have been changed into long jumps.

Also, we can be certain that we have reached an optimal solution: a short jump
is only changed into a long jump if it is absolutely necessary.

However, neither of these claims (termination and optimality) holds once we add
the absolute jump.

The reason for this is that with absolute jumps, the encoding of a branch
instruction no longer depends only on the distance between the branch
instruction and its target: in order for an absolute jump to be possible, they
need to be in the same segment (for the MCS-51, this means that the first 5
bits of their addresses have to be equal). It is therefore entirely possible
for two branch instructions with the same span to be encoded in different ways
(absolute if the branch instruction and its target are in the same segment,
long if this is not the case).

This invalidates the termination argument: a branch instruction, once encoded
as a long jump, can be re-encoded during a later iteration as an absolute jump.
Consider the program shown in figure~\ref{f:term_example}. At the start of the
first iteration, both the branch to {\tt X} and the branch to $\mathtt{L}_{0}$
are encoded as short jumps. Let us assume that in this case, the placement of
$\mathtt{L}_{0}$ and the branch to it are such that $\mathtt{L}_{0}$ is just
outside the segment that contains this branch. Let us also assume that the
distance between $\mathtt{L}_{0}$ and the branch to it is too large for the
branch instruction to be encoded as a short jump.

All this means that in the second iteration, the branch to $\mathtt{L}_{0}$ will
be encoded as a long jump. If we assume that the branch to {\tt X} is encoded as
a long jump as well, the size of that branch instruction will increase and
$\mathtt{L}_{0}$ will be `propelled' into the same segment as its branch
instruction, because every subsequent instruction will move one byte forward.
Hence, in the third iteration, the branch to $\mathtt{L}_{0}$ can be encoded as
an absolute jump. At first glance, there is nothing that prevents us from
making a construction where two branch instructions interact in such a way as
to keep switching between long and absolute encodings for an indefinite number
of iterations.

\begin{figure}[h]
\begin{alltt}
    jmp X
    \vdots
L\(\sb{0}\):
    \vdots
    jmp L\(\sb{0}\)
\end{alltt}
\caption{Example of a program where a long jump becomes absolute}
\label{f:term_example}
\end{figure}

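The dependence of the encoding on segments, and the switch from a long to an
absolute jump in figure~\ref{f:term_example}, can be illustrated with a small
Python sketch. It is a simplification under stated assumptions: the helper
names \verb|same_segment| and \verb|choose_encoding| and all addresses are
hypothetical, segments are taken to be the $2^{11}$-byte blocks implied by the
11-bit address of the absolute jump, and, following the text, the address of
the branch itself is compared with that of its target.

\begin{verbatim}
```python
SEGMENT_BITS = 11  # assumed: an absolute jump carries an 11-bit address


def same_segment(a, b):
    """True if two 16-bit addresses share their upper 5 bits."""
    return (a >> SEGMENT_BITS) == (b >> SEGMENT_BITS)


def choose_encoding(branch_addr, target_addr):
    """Prefer a short jump, then an absolute jump, then a long jump."""
    disp = target_addr - (branch_addr + 2)  # from the end of a short jump
    if -128 <= disp <= 127:
        return "short"
    if same_segment(branch_addr, target_addr):
        return "absolute"
    return "long"


# Two branches with the same span (0x300 bytes) but different encodings.
print(choose_encoding(0x0100, 0x0400))  # absolute
print(choose_encoding(0x0700, 0x0A00))  # long

# Hypothetical addresses for the branch to L0: moving both the label and
# the branch one byte forward puts L0 in the same segment as the branch.
print(choose_encoding(0x0900, 0x07FF))  # long
print(choose_encoding(0x0901, 0x0800))  # absolute
```
\end{verbatim}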
In fact, this situation mirrors the explanation by
Szymanski~\cite{Szymanski1978} of why the branch displacement optimisation
problem is NP-complete. In this explanation, a condition for NP-completeness
is that programs be allowed to contain {\em pathological} jumps.
These are branch instructions that can normally not be encoded as a short(er)
jump, but gain this property when some other branch instructions are encoded as
a long(er) jump. This is exactly what happens in figure~\ref{f:term_example}:
by encoding the first branch instruction as a long jump, another branch
instruction switches from long to absolute (which is shorter).

The optimality argument no longer holds either. Let us consider the program
shown in figure~\ref{f:opt_example}. Suppose that the distance between
$\mathtt{L}_{0}$ and $\mathtt{L}_{1}$ is such that if {\tt jmp X} is encoded
as a short jump, there is a segment border just after $\mathtt{L}_{1}$. Let
us also assume that the three branches to $\mathtt{L}_{1}$ are all in the same
segment, but far enough away from $\mathtt{L}_{1}$ that they cannot be encoded
as short jumps.

Then, if {\tt jmp X} were to be encoded as a short jump, which is clearly
possible, all of the branches to $\mathtt{L}_{1}$ would have to be encoded as
long jumps. However, if {\tt jmp X} were to be encoded as a long jump, and
therefore increase in size, $\mathtt{L}_{1}$ would be `propelled' across the
segment border, so that the three branches to $\mathtt{L}_{1}$ could be encoded
as absolute jumps. Depending on the relative sizes of long and absolute jumps,
this solution might actually be smaller than the one reached by the smallest
fixed point algorithm.

\begin{figure}[h]
\begin{alltt}
L\(\sb{0}\): jmp X
X:
    \vdots
L\(\sb{1}\):
    \vdots
    jmp L\(\sb{1}\)
    \vdots
    jmp L\(\sb{1}\)
    \vdots
    jmp L\(\sb{1}\)
    \vdots
\end{alltt}
\caption{Example of a program where the fixed-point algorithm is not optimal}
\label{f:opt_example}
\end{figure}
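As a worked instance, assume the jump sizes of figure~\ref{f:mcs51jumps}
(2 bytes for short and absolute jumps, 3 bytes for long jumps) and count only
the four branch instructions of figure~\ref{f:opt_example}, since all other
instructions have the same size in both solutions. The smallest fixed point
solution keeps {\tt jmp X} short and needs three long jumps, whereas the
alternative uses one long jump and three absolute jumps:
\[
  2 + 3 \cdot 3 = 11
  \qquad \mbox{versus} \qquad
  3 + 3 \cdot 2 = 9 .
\]
Under these assumptions, the alternative encoding is two bytes smaller than
the one found by the smallest fixed point algorithm.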