source: Papers/sttt/problem.tex @ 3470

Last change on this file since 3470 was 3470, checked in by boender, 5 years ago
  • added first version of STTT paper
File size: 11.1 KB
Line 
1\section{Introduction}
2
3The problem of branch displacement optimisation, also known as jump encoding, is
4a well-known problem in assembler design~\cite{Hyde2006}. Its origin lies in the
5fact that in many architecture sets, the encoding (and therefore size) of some
6instructions depends on the distance to their operand (the instruction 'span').
7The branch displacement optimisation problem consists of encoding these
8span-dependent instructions in such a way that the resulting program is as
9small as possible.
10
11This problem is the subject of the present paper. After introducing the problem
12in more detail, we will discuss the solutions used by other compilers, present
13the algorithm we use in the CerCo assembler, and discuss its verification,
14that is the proofs of termination and correctness using the Matita proof
15assistant~\cite{Asperti2007}.
16
17Formulating the final statement of correctness and finding the loop invariants
18have been non-trivial tasks and are, indeed, the main contribution of this
19paper. It has required considerable care and fine-tuning to formulate not
20only the minimal statement required for the ulterior proof of correctness of
21the assembler, but also the minimal set of invariants needed for the proof
22of correctness of the algorithm.
23
24The research presented in this paper has been executed within the CerCo project
25which aims at formally verifying a C compiler with cost annotations. The
26target architecture for this project is the MCS-51, whose instruction set
27contains span-dependent instructions. Furthermore, its maximum addressable
28memory size is very small (64 Kb), which makes it important to generate
29programs that are as small as possible. With this optimisation, however, comes increased complexity and hence increased possibility for error. We must make sure that the branch instructions are encoded correctly, otherwise the assembled program will behave unpredictably.
30
31All Matita files related to this development can be found on the CerCo
32website, \url{http://cerco.cs.unibo.it}. The specific part that contains the
33branch displacement algorithm is in the {\tt ASM} subdirectory, in the files
34{\tt PolicyFront.ma}, {\tt PolicyStep.ma} and {\tt Policy.ma}.
35
36\section{The branch displacement optimisation problem}
37
38In most modern instruction sets that have them, the only span-dependent
39instructions are branch instructions. Taking the ubiquitous x86-64 instruction
40set as an example, we find that it contains eleven different forms of the
41unconditional branch instruction, all with different ranges, instruction sizes
42and semantics (only six are valid in 64-bit mode, for example). Some examples
43are shown in Figure~\ref{f:x86jumps} (see also~\cite{IntelDev}).
44
45\begin{figure}[h]
46\begin{center}
47\begin{tabular}{|l|l|l|}
48\hline
49Instruction & Size (bytes) & Displacement range \\
50\hline
51Short jump & 2 & -128 to 127 bytes \\
52Relative near jump & 5 & $-2^{32}$ to $2^{32}-1$ bytes \\
53Absolute near jump & 6 & one segment (64-bit address) \\
54Far jump & 8 & entire memory (indirect jump) \\
55\hline
56\end{tabular}
57\end{center}
58\caption{List of x86 branch instructions}
59\label{f:x86jumps}
60\end{figure}
61
62The chosen target architecture of the CerCo project is the Intel MCS-51, which
63features three types of branch instructions (or jump instructions; the two terms
64are used interchangeably), as shown in Figure~\ref{f:mcs51jumps}.
65
66\begin{figure}[h]
67\begin{center}
68\begin{tabular}{|l|l|l|l|}
69\hline
70Instruction & Size    & Execution time & Displacement range \\
71            & (bytes) & (cycles) & \\
72\hline
73SJMP (`short jump') & 2 & 2 & -128 to 127 bytes \\
74AJMP (`absolute jump') & 2 & 2 & one segment (11-bit address) \\
75LJMP (`long jump') & 3 & 3 & entire memory \\
76\hline
77\end{tabular}
78\end{center}
79\caption{List of MCS-51 branch instructions}
80\label{f:mcs51jumps}
81\end{figure}
82
83Conditional branch instructions are only available in short form, which
84means that a conditional branch outside the short address range has to be
85encoded using three branch instructions (for instructions whose logical
86negation is available, it can be done with two branch instructions, but for
87some instructions this is not the case). The call instruction is
88only available in absolute and long forms.
89
90Note that even though the MCS-51 architecture is much less advanced and much
91simpler than the x86-64 architecture, the basic types of branch instruction
92remain the same: a short jump with a limited range, an intra-segment jump and a
93jump that can reach the entire available memory.
94 
95Generally, in code fed to the assembler as input, the only
96difference between branch instructions is semantics, not span. This
97means that a distinction is made between an unconditional branch and the
98several kinds of conditional branch, but not between their short, absolute or
99long variants.
100
101The algorithm used by the assembler to encode these branch instructions into
102the different machine instructions is known as the {\em branch displacement
103algorithm}. The optimisation problem consists of finding as small an encoding as
104possible, thus minimising program length and execution time.
105
106Similar problems, e.g. the branch displacement optimisation problem for other
107architectures, are known to be NP-complete~\cite{Robertson1979,Szymanski1978},
108which could make finding an optimal solution very time-consuming.
109
110The canonical solution, as shown by Szymanski~\cite{Szymanski1978} or more
111recently by Dickson~\cite{Dickson2008} for the x86 instruction set, is to use a
112fixed point algorithm that starts with the shortest possible encoding (all
113branch instruction encoded as short jumps, which is likely not a correct
114solution) and then iterates over the source to re-encode those branch
115instructions whose target is outside their range.
116
117\subsection*{Adding absolute jumps}
118
119In both papers mentioned above, the encoding of a jump is only dependent on the
120distance between the jump and its target: below a certain value a short jump
121can be used; above this value the jump must be encoded as a long jump.
122
123Here, termination of the smallest fixed point algorithm is easy to prove. All
124branch instructions start out encoded as short jumps, which means that the
125distance between any branch instruction and its target is as short as possible
126(all the intervening jumps are short).
127If, in this situation, there is a branch instruction $b$ whose span is not
128within the range for a short jump, we can be sure that we can never reach a
129situation where the span of $j$ is so small that it can be encoded as a short
130jump. This argument continues to hold throughout the subsequent iterations of
131the algorithm: short jumps can change into long jumps, but not \emph{vice versa},
132as spans only increase. Hence, the algorithm either terminates early when a fixed
133point is reached or when all short jumps have been changed into long jumps.
134
135Also, we can be certain that we have reached an optimal solution: a short jump
136is only changed into a long jump if it is absolutely necessary.
137
138However, neither of these claims (termination nor optimality) hold when we add
139the absolute jump. With absolute jumps, the encoding of a branch
140instruction no longer depends only on the distance between the branch
141instruction and its target. An absolute jump is possible when instruction and
142target are in the same segment (for the MCS-51, this means that the first 5
143bytes of their addresses have to be equal). It is therefore entirely possible
144for two branch instructions with the same span to be encoded in different ways
145(absolute if the branch instruction and its target are in the same segment,
146long if this is not the case).
147
148\begin{figure}[t]
149\begin{subfigure}[b]{.45\linewidth}
150\small
151\begin{alltt}
152    jmp X
153    \ldots
154L\(\sb{0}\): \ldots
155% Start of new segment if
156% jmp X is encoded as short
157    \ldots
158    jmp L\(\sb{0}\)
159\end{alltt}
160\caption{Example of a program where a long jump becomes absolute}
161\label{f:term_example}
162\end{subfigure}
163\hfill
164\begin{subfigure}[b]{.45\linewidth}
165\small
166\begin{alltt}
167L\(\sb{0}\): jmp X
168X:  \ldots
169    \ldots
170L\(\sb{1}\): \ldots
171% Start of new segment if
172% jmp X is encoded as short
173    \ldots
174    jmp L\(\sb{1}\)
175    \ldots
176    jmp L\(\sb{1}\)
177    \ldots
178    jmp L\(\sb{1}\) 
179    \ldots
180\end{alltt}
181\caption{Example of a program where the fixed-point algorithm is not optimal}
182\label{f:opt_example}
183\end{subfigure}
184\end{figure}
185
186This invalidates our earlier termination argument: a branch instruction, once encoded
187as a long jump, can be re-encoded during a later iteration as an absolute jump.
188Consider the program shown in Figure~\ref{f:term_example}. At the start of the
189first iteration, both the branch to {\tt X} and the branch to $\mathtt{L}_{0}$
190are encoded as small jumps. Let us assume that in this case, the placement of
191$\mathtt{L}_{0}$ and the branch to it are such that $\mathtt{L}_{0}$ is just
192outside the segment that contains this branch. Let us also assume that the
193distance between $\mathtt{L}_{0}$ and the branch to it is too large for the
194branch instruction to be encoded as a short jump.
195
196All this means that in the second iteration, the branch to $\mathtt{L}_{0}$ will
197be encoded as a long jump. If we assume that the branch to {\tt X} is encoded as
198a long jump as well, the size of the branch instruction will increase and
199$\mathtt{L}_{0}$ will be `propelled' into the same segment as its branch
200instruction, because every subsequent instruction will move one byte forward.
201Hence, in the third iteration, the branch to $\mathtt{L}_{0}$ can be encoded as
202an absolute jump. At first glance, there is nothing that prevents us from
203constructing a configuration where two branch instructions interact in such a
204way as to iterate indefinitely between long and absolute encodings.
205
206This situation mirrors the explanation by Szymanski~\cite{Szymanski1978} of why
207the branch displacement optimisation problem is NP-complete. In this explanation,
208a condition for NP-completeness is the fact that programs be allowed to contain
209{\em pathological} jumps. These are branch instructions that can normally not be
210encoded as a short(er) jump, but gain this property when some other branch
211instructions are encoded as a long(er) jump. This is exactly what happens in
212Figure~\ref{f:term_example}. By encoding the first branch instruction as a long
213jump, another branch instruction switches from long to absolute (which is
214shorter).
215
216In addition, our previous optimality argument no longer holds. Consider the
217program shown in Figure~\ref{f:opt_example}. Suppose that the distance between
218$\mathtt{L}_{0}$ and $\mathtt{L}_{1}$ is such that if {\tt jmp X} is encoded
219as a short jump, there is a segment border just after $\mathtt{L}_{1}$. Let
220us also assume that all three branches to $\mathtt{L}_{1}$ are in the same
221segment, but far enough away from $\mathtt{L}_{1}$ that they cannot be encoded
222as short jumps.
223
224Then, if {\tt jmp X} were to be encoded as a short jump, which is clearly
225possible, all of the branches to $\mathtt{L}_{1}$ would have to be encoded as
226long jumps. However, if {\tt jmp X} were to be encoded as a long jump, and
227therefore increase in size, $\mathtt{L}_{1}$ would be `propelled' across the
228segment border, so that the three branches to $\mathtt{L}_{1}$ could be encoded
229as absolute jumps. Depending on the relative sizes of long and absolute jumps,
230this solution might actually be smaller than the one reached by the smallest
231fixed point algorithm.
Note: See TracBrowser for help on using the repository browser.