% Source: src/ASM/CPP2012-policy/problem.tex, revision 3362

\section{Introduction}

The problem of branch displacement optimisation, also known as jump encoding, is
a well-known problem in assembler design~\cite{Hyde2006}. Its origin lies in the
fact that in many instruction set architectures, the encoding (and therefore
size) of some instructions depends on the distance to their operand (the
instruction `span'). The branch displacement optimisation problem consists of
encoding these span-dependent instructions in such a way that the resulting
program is as small as possible.

This problem is the subject of the present paper. After introducing the problem
in more detail, we will discuss the solutions used by other compilers, present
the algorithm we use in the CerCo assembler, and discuss its verification,
that is, the proofs of termination and correctness using the Matita proof
assistant~\cite{Asperti2007}.

The research presented in this paper was carried out within the CerCo project,
which aims at formally verifying a C compiler with cost annotations. The
target architecture for this project is the MCS-51, whose instruction set
contains span-dependent instructions. Furthermore, its maximum addressable
memory size is very small (64 KB), which makes it important to generate
programs that are as small as possible.

With this optimisation, however, comes increased complexity and hence
increased possibility for error. We must make sure that the branch instructions
are encoded correctly, otherwise the assembled program will behave
unpredictably.

\section{The branch displacement optimisation problem}

In most modern instruction sets that have them, the only span-dependent
instructions are branch instructions. Taking the ubiquitous x86-64 instruction
set as an example, we find that it contains eleven different forms of the
unconditional branch instruction, all with different ranges, instruction sizes
and semantics (only six are valid in 64-bit mode, for example). Some examples
are shown in Figure~\ref{f:x86jumps} (see also~\cite{IntelDev}).

\begin{figure}[h]
\begin{center}
\begin{tabular}{|l|l|l|}
\hline
Instruction & Size (bytes) & Displacement range \\
\hline
Short jump & 2 & $-128$ to $127$ bytes \\
Relative near jump & 5 & $-2^{31}$ to $2^{31}-1$ bytes \\
Absolute near jump & 6 & one segment (64-bit address) \\
Far jump & 8 & entire memory (indirect jump) \\
\hline
\end{tabular}
\end{center}
\caption{List of x86 branch instructions}
\label{f:x86jumps}
\end{figure}

The chosen target architecture of the CerCo project is the Intel MCS-51, which
features three types of branch instructions (or jump instructions; the two terms
are used interchangeably), as shown in Figure~\ref{f:mcs51jumps}.

\begin{figure}[h]
\begin{center}
\begin{tabular}{|l|l|l|l|}
\hline
Instruction & Size    & Execution time & Displacement range \\
            & (bytes) & (cycles) & \\
\hline
SJMP (`short jump') & 2 & 2 & $-128$ to $127$ bytes \\
AJMP (`absolute jump') & 2 & 2 & one segment (11-bit address) \\
LJMP (`long jump') & 3 & 3 & entire memory \\
\hline
\end{tabular}
\end{center}
\caption{List of MCS-51 branch instructions}
\label{f:mcs51jumps}
\end{figure}

Conditional branch instructions are only available in short form, which
means that a conditional branch outside the short address range has to be
encoded using three branch instructions (for instructions whose logical
negation is available, it can be done with two branch instructions, but for
some instructions this is not the case). The call instruction is
only available in absolute and long forms.

Note that even though the MCS-51 architecture is much less advanced and much
simpler than the x86-64 architecture, the basic types of branch instruction
remain the same: a short jump with a limited range, an intra-segment jump and a
jump that can reach the entire available memory.

Generally, in code fed to the assembler as input, the only
difference between branch instructions is semantics, not span. This
means that a distinction is made between an unconditional branch and the
several kinds of conditional branch, but not between their short, absolute or
long variants.

The algorithm used by the assembler to encode these branch instructions into
the different machine instructions is known as the {\em branch displacement
algorithm}. The optimisation problem consists of finding as small an encoding as
possible, thus minimising program length and execution time.

Similar problems, e.g.\ the branch displacement optimisation problem for other
architectures, are known to be NP-complete~\cite{Robertson1979,Szymanski1978},
which could make finding an optimal solution very time-consuming.

The canonical solution, as shown by Szymanski~\cite{Szymanski1978} or more
recently by Dickson~\cite{Dickson2008} for the x86 instruction set, is to use a
fixed point algorithm that starts with the shortest possible encoding (all
branch instructions encoded as short jumps, which is likely not a correct
solution) and then iterates over the source to re-encode those branch
instructions whose target is outside their range.
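
As a minimal sketch of this fixed point iteration (not the code of either cited
paper), the following Python fragment handles short and long jumps only. The
program representation, the function names and the convention that a short
displacement is measured from the end of the jump instruction are assumptions
made for illustration; the sizes are those of SJMP and LJMP in
Figure~\ref{f:mcs51jumps}.

```python
SHORT, LONG = 2, 3  # sizes in bytes, as for SJMP and LJMP


def addresses(sizes):
    """Start address of every instruction, followed by the end address."""
    addrs, pc = [], 0
    for size in sizes:
        addrs.append(pc)
        pc += size
    addrs.append(pc)
    return addrs


def encode(program):
    """program: list of ("op", size) or ("jmp", target_index) tuples.

    Returns the size chosen for every instruction. Every jump starts out
    short and is only ever enlarged, so the loop must terminate.
    """
    sizes = [SHORT if kind == "jmp" else arg for kind, arg in program]
    changed = True
    while changed:
        changed = False
        addrs = addresses(sizes)
        for i, (kind, target) in enumerate(program):
            if kind != "jmp" or sizes[i] == LONG:
                continue
            # Assumption: displacement is relative to the next instruction.
            displacement = addrs[target] - addrs[i + 1]
            if not -128 <= displacement <= 127:
                sizes[i] = LONG
                changed = True
    return sizes


# Hypothetical program: enlarging the second jump pushes the target of the
# first jump one byte out of short range, so a second pass is needed.
example = [("jmp", 3), ("jmp", 5), ("op", 125), ("op", 200), ("op", 1), ("op", 1)]
sizes = encode(example)
```

In this example the first jump initially fits (its displacement is exactly 127),
but it has to be enlarged once the second jump has grown by one byte. This is
the kind of interaction that makes the iteration necessary.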

\subsection*{Adding absolute jumps}

In both papers mentioned above, the encoding of a jump is only dependent on the
distance between the jump and its target: below a certain value a short jump
can be used; above this value the jump must be encoded as a long jump.

Here, termination of the smallest fixed point algorithm is easy to prove. All
branch instructions start out encoded as short jumps, which means that the
distance between any branch instruction and its target is as short as possible.
If, in this situation, there is a branch instruction $b$ whose span is not
within the range for a short jump, we can be sure that we can never reach a
situation where the span of $b$ is so small that it can be encoded as a short
jump. This argument continues to hold throughout the subsequent iterations of
the algorithm: short jumps can change into long jumps, but not \emph{vice versa},
as spans only increase. Hence, the algorithm terminates either early, when a
fixed point is reached, or at the latest when all short jumps have been changed
into long jumps.

Also, we can be certain that we have reached an optimal solution: a short jump
is only changed into a long jump if it is absolutely necessary.

However, neither of these claims (termination and optimality) holds when we add
the absolute jump. With absolute jumps, the encoding of a branch
instruction no longer depends only on the distance between the branch
instruction and its target. An absolute jump is possible when instruction and
target are in the same segment (for the MCS-51, this means that the first 5
bits of their addresses have to be equal). It is therefore entirely possible
for two branch instructions with the same span to be encoded in different ways
(absolute if the branch instruction and its target are in the same segment,
long if this is not the case).
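
The segment condition can be sketched in a few lines of Python. This is an
illustration under stated assumptions rather than the CerCo implementation:
the function names are hypothetical, addresses are 16 bits wide with an 11-bit
offset inside a segment, the segment test compares the address of the jump
itself with that of its target (as in the text above), and a short
displacement is measured from the end of a two-byte jump.

```python
SEGMENT_BITS = 11  # an absolute jump supplies the low 11 bits of the target


def same_segment(a, b):
    """True if the top 5 bits of two 16-bit addresses agree."""
    return (a >> SEGMENT_BITS) == (b >> SEGMENT_BITS)


def encoding(jump_addr, target_addr):
    """Smallest encoding that reaches target_addr from jump_addr."""
    # Assumption: short displacement is relative to the end of a 2-byte jump.
    displacement = target_addr - (jump_addr + 2)
    if -128 <= displacement <= 127:
        return "short"
    if same_segment(jump_addr, target_addr):
        return "absolute"
    return "long"


# Two jumps with the same span (0x200 bytes) but different encodings:
inside = encoding(0x0100, 0x0300)    # both addresses in the first segment
crossing = encoding(0x0700, 0x0900)  # target lies in the next segment
```

Here {\tt inside} is an absolute jump and {\tt crossing} is a long jump, even
though both jumps span the same number of bytes.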

\begin{figure}[t]
\begin{subfigure}[b]{.45\linewidth}
\begin{alltt}
    jmp X
    \ldots
L\(\sb{0}\): \ldots
% Start of new segment if
% jmp X is encoded as short
    \ldots
    jmp L\(\sb{0}\)
\end{alltt}
\caption{Example of a program where a long jump becomes absolute}
\label{f:term_example}
\end{subfigure}
\hfill
\begin{subfigure}[b]{.45\linewidth}
\begin{alltt}
L\(\sb{0}\): jmp X
X:  \ldots
    \ldots
L\(\sb{1}\): \ldots
% Start of new segment if
% jmp X is encoded as short
    \ldots
    jmp L\(\sb{1}\)
    \ldots
    jmp L\(\sb{1}\)
    \ldots
    jmp L\(\sb{1}\)
    \ldots
\end{alltt}
\caption{Example of a program where the fixed-point algorithm is not optimal}
\label{f:opt_example}
\end{subfigure}
\end{figure}

This invalidates our earlier termination argument: a branch instruction, once encoded
as a long jump, can be re-encoded during a later iteration as an absolute jump.
Consider the program shown in Figure~\ref{f:term_example}. At the start of the
first iteration, both the branch to {\tt X} and the branch to $\mathtt{L}_{0}$
are encoded as short jumps. Let us assume that in this case, the placement of
$\mathtt{L}_{0}$ and the branch to it are such that $\mathtt{L}_{0}$ is just
outside the segment that contains this branch. Let us also assume that the
distance between $\mathtt{L}_{0}$ and the branch to it is too large for the
branch instruction to be encoded as a short jump.

All this means that in the second iteration, the branch to $\mathtt{L}_{0}$ will
be encoded as a long jump. If we assume that the branch to {\tt X} is encoded as
a long jump as well, the size of that branch instruction will increase and
$\mathtt{L}_{0}$ will be `propelled' into the same segment as its branch
instruction, because every subsequent instruction will move one byte forward.
Hence, in the third iteration, the branch to $\mathtt{L}_{0}$ can be encoded as
an absolute jump. At first glance, there is nothing that prevents us from
constructing a configuration where two branch instructions interact in such a
way as to iterate indefinitely between long and absolute encodings.
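
The sequence of encodings described for Figure~\ref{f:term_example} can be
reproduced with a small simulation. The sketch below is not the CerCo
algorithm: it naively re-encodes every jump in each round using the addresses
from the previous round, and it is cut off after a fixed number of rounds
because nothing guarantees termination. The program representation, the filler
sizes (2045 and 200 bytes, chosen so that $\mathtt{L}_{0}$ initially occupies
the last byte of the first segment) and all names are assumptions made for
illustration.

```python
SIZE = {"short": 2, "absolute": 2, "long": 3}  # bytes, MCS-51 jump sizes


def choose(jump_addr, target_addr):
    """Smallest encoding that reaches target_addr from jump_addr."""
    # Assumption: short displacement is relative to the end of a 2-byte jump.
    if -128 <= target_addr - (jump_addr + 2) <= 127:
        return "short"
    if (jump_addr >> 11) == (target_addr >> 11):  # same 2 KB segment
        return "absolute"
    return "long"


def simulate(program, rounds):
    """program: list of ("op", size) or ("jmp", target_index) tuples.

    Returns, for each round, the jump encodings in force at its start.
    """
    enc = {i: "short" for i, (kind, _) in enumerate(program) if kind == "jmp"}
    history = []
    for _ in range(rounds):
        history.append(dict(enc))
        # Lay out the program with the current encodings.
        addrs, pc = [], 0
        for i, (kind, arg) in enumerate(program):
            addrs.append(pc)
            pc += SIZE[enc[i]] if kind == "jmp" else arg
        # Re-encode every jump from scratch; encodings may grow or shrink.
        enc = {i: choose(addrs[i], addrs[program[i][1]]) for i in enc}
    return history


program = [
    ("jmp", 5),    # jmp X
    ("op", 2045),  # filler
    ("op", 1),     # L0: last byte of the first segment while jmp X is short
    ("op", 200),   # filler
    ("jmp", 2),    # jmp L0
    ("op", 1),     # X
]
history = simulate(program, 4)
l0_history = [round_enc[4] for round_enc in history]
```

With these sizes, {\tt l0\_history} lists the encodings short, long, absolute
and absolute for the branch to $\mathtt{L}_{0}$. This particular program
settles down, but the step from long to absolute shows that encodings no longer
grow monotonically.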

This situation mirrors the explanation by Szymanski~\cite{Szymanski1978} of why
the branch displacement optimisation problem is NP-complete. In this explanation,
a condition for NP-completeness is that programs are allowed to contain
{\em pathological} jumps. These are branch instructions that can normally not be
encoded as a short(er) jump, but gain this property when some other branch
instructions are encoded as a long(er) jump. This is exactly what happens in
Figure~\ref{f:term_example}. By encoding the first branch instruction as a long
jump, another branch instruction switches from long to absolute (which is
shorter).

In addition, our previous optimality argument no longer holds. Consider the
program shown in Figure~\ref{f:opt_example}. Suppose that the distance between
$\mathtt{L}_{0}$ and $\mathtt{L}_{1}$ is such that if {\tt jmp X} is encoded
as a short jump, there is a segment border just after $\mathtt{L}_{1}$. Let
us also assume that all three branches to $\mathtt{L}_{1}$ are in the same
segment, but far enough away from $\mathtt{L}_{1}$ that they cannot be encoded
as short jumps.

Then, if {\tt jmp X} were to be encoded as a short jump, which is clearly
possible, all of the branches to $\mathtt{L}_{1}$ would have to be encoded as
long jumps. However, if {\tt jmp X} were to be encoded as a long jump, and
therefore increase in size, $\mathtt{L}_{1}$ would be `propelled' across the
segment border, so that the three branches to $\mathtt{L}_{1}$ could be encoded
as absolute jumps. Depending on the relative sizes of long and absolute jumps,
this solution might actually be smaller than the one reached by the smallest
fixed point algorithm.
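
As a worked instance, assume the MCS-51 sizes from Figure~\ref{f:mcs51jumps}
(2 bytes for short and absolute jumps, 3 bytes for long jumps) and count only
the four jumps of Figure~\ref{f:opt_example}:

\[
\begin{array}{ll}
\mbox{{\tt jmp X} short, three long jumps:} & 2 + 3 \cdot 3 = 11 \mbox{ bytes} \\
\mbox{{\tt jmp X} long, three absolute jumps:} & 3 + 3 \cdot 2 = 9 \mbox{ bytes}
\end{array}
\]

Under these assumptions, the encoding that lengthens {\tt jmp X} is two bytes
smaller overall.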