source: src/ASM/CPP2012-policy/algorithm.tex @ 2091

Last change on this file since 2091 was 2091, checked in by boender, 7 years ago
  • systematically changed 'jump' to 'branch'
File size: 10.1 KB
Line 
1\section{Our algorithm}
2
3\subsection{Design decisions}
4
5Given the NP-completeness of the problem, to arrive at an optimal solution
6within a short space of time (using, for example, a constraint solver) will
7potentially take a great amount of time.
8
9The SDCC compiler~\cite{SDCC2011}, which has the MCS-51 among its target
10instruction sets, simply encodes every branch instruction as a long jump
11without taking the distance into account. While certainly correct (the long
12jump can reach any destination in memory) and rapid, it does result in a less
13than optimal solution.
14
15The {\tt gcc} compiler suite~\cite{GCC2012}, while compiling C on the x86
16architecture, uses a greatest fix point algorithm. In other words, it starts
17off with all branch instructions encoded as the largest jumps available, and
18then tries to reduce branch instructions as much as possible.
19
20Such an algorithm has the advantage that any intermediate result it returns
21is correct: the solution where every branch instruction is encoded as a large
22jump is always possible, and the algorithm only reduces those branch
23instructions whose destination address is in range for a shorter jump.
24The algorithm can thus be stopped after a determined amount of steps without
25losing correctness.
26
27The result, however, is not necessarily optimal, even if the algorithm is run
28until it terminates naturally: the fixed point reached is the {\em greatest}
29fixed point, not the least fixed point. Furthermore, {\tt gcc} (at least for
30the x86 architecture) only uses short and long jumps. This makes the algorithm
31more rapid, as shown in the previous section, but also results in a less
32optimal solution.
33
34In the CerCo assembler, we opted at first for a least fixed point algorithm,
35taking absolute jumps into account.
36
37Here, we ran into a problem with proving termination, as explained in the
38previous section: if we only take short and long jumps into account, the jump
39encoding can only switch from short to long, but never in the other direction.
40When we add absolute jumps, however, it is theoretically possible for a branch
41instruction to switch from absolute to long and back, as shown in the previous
42section.
43
44Proving termination then becomes difficult, because there is nothing that
45precludes a branch instruction from switching back and forth between absolute
46and long indefinitely.
47
48In order to keep the algorithm in the same complexity class and more easily
49prove termination, we decided to explicitly enforce the `branch instructions
50must always grow longer' requirement: if a branch instruction is encoded as a
51long jump in one iteration, it will also be encoded as a long jump in all the
52following iterations. This means that the encoding of any branch instruction
53can change at most two times: once from short to absolute (or long), and once
54from absolute to long.
55
56There is one complicating factor: suppose that a branch instruction is encoded
57in step $n$ as an absolute jump, but in step $n+1$ it is determined that
58(because of changes elsewhere) it can now be encoded as a short jump. Due to
59the requirement that the branch instructions must always grow longer, this
60means that the branch encoding will be encoded as an absolute jump in step
61$n+1$ as well.
62
63This is not necessarily correct, however: a branch instruction that can be
64encoded as a short jump cannot always also be encoded as an absolute jump
65(a short jump can bridge segments, whereas an absolute jump cannot). Therefore,
66in this situation we decide to encode the branch instruction as a long jump,
67which is always correct.
68
69The resulting algorithm, while not optimal, is at least as good as the ones
70from SDCC and {\tt gcc}, and potentially better. Its complexity remains
71the same (there are at most $2n$ iterations, where $n$ is the number of branch
72instructions in the program).
73
74\subsection{The algorithm in detail}
75
76The branch displacement algorithm forms part of the translation from
77pseudo-code to assembler. More specifically, it is used by the function that
78translates pseudo-addresses (natural numbers indicating the position of the
79instruction in the program) to actual addresses in memory.
80
81The original intention was to have two different functions, one function
82$\mathtt{policy}: \mathbb{N} \rightarrow \{\mathtt{short\_jump},
83\mathtt{absolute\_jump}, \mathtt{long\_jump}\}$ to associate jumps to their
84intended encoding, and a function $\sigma: \mathbb{N} \rightarrow
85\mathtt{Word}$ to associate pseudo-addresses to actual addresses. $\sigma$
86would use $\mathtt{policy}$ to determine the size of jump instructions.
87
88This turned out to be suboptimal from the algorithmic point of view and
89impossible to prove correct.
90
91From the algorithmic point of view, in order to create the $\mathtt{policy}$
92function, we must necessarily have a translation from pseudo-addresses
93to actual addresses (i.e. a $\sigma$ function): in order to judge the distance
94between a jump and its destination, we must know their memory locations.
95Conversely, in order to create the $\sigma$ function, we need to have the
96$\mathtt{policy}$ function, otherwise we do not know the sizes of the jump
97instructions in the program.
98
99Much the same problem appears when we try to prove the algorithm correct: the
100correctness of $\mathtt{policy}$ depends on the correctness of $\sigma$, and
101the correctness of $\sigma$ depends on the correctness of $\mathtt{policy}$.
102
103We solved this problem by integrating the $\mathtt{policy}$ and $\sigma$
104algorithms. We now have a function
105$\sigma: \mathbb{N} \rightarrow \mathtt{Word} \times \mathtt{bool}$ which
106associates a pseudo-address to an actual address. The boolean denotes a forced
107long jump; as noted in the previous section, if during the fixed point
108computation an absolute jump changes to be potentially re-encoded as a short
109jump, the result is actually a long jump. It might therefore be the case that
110jumps are encoded as long jumps without this actually being necessary, and this
111information needs to be passed to the code generating function.
112
113The assembler function encodes the jumps by checking the distance between
114source and destination according to $\sigma$, so it could select an absolute
115jump in a situation where there should be a long jump. The boolean is there
116to prevent this from happening by indicating the locations where a long jump
117should be encoded, even if a shorter jump is possible. This has no effect on
118correctness, since a long jump is applicable in any situation.
119
120\begin{figure}
121\begin{algorithmic}
122\Function{f}{$labels$,$old\_sigma$,$instr$,$ppc$,$acc$}
123        \State $\langle added, pc, sigma \rangle \gets acc$
124        \If {$instr$ is a backward jump to $j$}
125                \State $length \gets \mathrm{jump\_size}(pc,sigma_1(labels(j)))$
126        \ElsIf {$instr$ is a forward jump to $j$}
127                \State $length \gets \mathrm{jump\_size}(pc,old\_sigma_1(labels(j))+added)$
128        \Else
129                \State $length \gets \mathtt{short\_jump}$
130        \EndIf
131        \State $old\_length \gets \mathrm{old\_sigma_1}(ppc)$
132        \State $new\_length \gets \mathrm{max}(old\_length, length)$
133        \State $old\_size \gets \mathrm{old\_sigma_2}(ppc)$
134        \State $new\_size \gets \mathrm{instruction\_size}(instr,new\_length)$
135        \State $new\_added \gets added+(new\_size-old\_size)$
136        \State $new\_sigma_1(ppc+1) \gets pc+new\_size$
137        \State $new\_sigma_2(ppc) \gets new\_length$ \\
138        \Return $\langle new\_added, pc+new\_size, new\_sigma \rangle$
139\EndFunction
140\end{algorithmic}
141\caption{The heart of the algorithm}
142\label{f:jump_expansion_step}
143\end{figure}
144
145The algorithm, shown in figure~\ref{f:jump_expansion_step}, works by folding the
146function {\sc f} over the entire program, thus gradually constructing $sigma$.
147This constitutes one step in the fixed point calculation; successive steps
148repeat the fold until a fixed point is reached.
149
150Parameters of the function {\sc f} are:
151\begin{itemize}
152        \item a function $labels$ that associates a label to its pseudo-address;
153        \item $old\_sigma$, the $\sigma$ function returned by the previous
154                iteration of the fixed point calculation;
155        \item $instr$, the instruction currently under consideration;
156        \item $ppc$, the pseudo-address of $instr$;
157        \item $acc$, the fold accumulator, which contains $pc$ (the highest memory
158                address reached so far), $added$ (the number of bytes added to the program
159                size with respect to the previous iteration), and of course $sigma$, the
160                $\sigma$ function under construction.
161\end{itemize}
162
163The first two are parameters that remain the same through one iteration, the
164last three are standard parameters for a fold function (including $ppc$,
165which is simply the number of instructions of the program already processed).
166
167The $\sigma$ functions used by {\sc f} are not of the same type as the final
168$\sigma$ function: they are of type
169$\sigma: \mathbb{N} \rightarrow \mathbb{N} \times \{\mathtt{short\_jump},
170\mathtt{absolute\_jump},\mathtt{long\_jump}\}$; a function that associates a
171pseudo-address with an memory address and a jump length. We do this to be able
172to more easily compare the jump lengths between iterations. In the algorithm,
173we use the notation $sigma_1(x)$ to denote the memory address corresponding to
174$x$, and $sigma_2(x)$ to denote the jump length corresponding to $x$.
175
176Note that the $\sigma$ function used for label lookup varies depending on
177whether the label is behind our current position or ahead of it. For
178backward branches, where the label is behind our current position, we can use
179$sigma$ for lookup, since its memory address is already known. However, for
180forward branches, the memory address of the address of the label is not yet
181known, so we must use $old\_sigma$.
182
183We cannot use $old\_sigma$ without change: it might be the case that we have
184already increased the size some branch instructions before, making the program
185longer and moving every instruction forward. We must compensate for this by
186adding the size increase of the program to the label's memory address according
187to $old\_sigma$, so that branch instruction spans do not get compromised.
188
189Note also that we add the pc to $sigma$ at location $ppc+1$, whereas we add the
190jump length at location $ppc$. We do this so that $sigma(ppc)$ will always
191return a couple with the start address of the instruction at $ppc$ and the
192length of its branch instruction (if any); the end address of the program can
193be found at $sigma(n+1)$, where $n$ is the number of instructions in the
194program.
Note: See TracBrowser for help on using the repository browser.