source: src/ASM/CPP2012-policy/algorithm.tex @ 2090

Last change on this file since 2090 was 2086, checked in by boender, 8 years ago
  • spell-check
File size: 9.8 KB
1\section{Our algorithm}
3\subsection{Design decisions}
5Given the NP-completeness of the problem, to arrive at an optimal solution
6within a short space of time (using, for example, a constraint solver) will
7potentially take a great amount of time.
9The SDCC compiler~\cite{SDCC2011}, which has the MCS-51 among its target
10instruction sets, simply encodes every jump as a long jump without taking the
11distance into account. While certainly correct (the long jump can reach any
12destination in memory) and rapid, it does result in a less than optimal
15The {\tt gcc} compiler suite~\cite{GCC2012}, while compiling C on the x86
16architecture, uses a greatest fix point algorithm. In other words, it starts
17off with all jumps encoded as the largest jumps available, and then tries to
18reduce jumps as much as possible.
20Such an algorithm has the advantage that any intermediate results it returns
21are correct: the solution where every jump is encoded as a large jump is
22always possible, and the algorithm only reduces those jumps where the
23destination address is in range for a shorter jump instruction. The algorithm
24can thus be stopped after a determined amount of steps without losing
27The result, however, is not necessarily optimal, even if the algorithm is run
28until it terminates naturally: the fixed point reached is the {\em greatest}
29fixed point, not the least fixed point. Furthermore, {\tt gcc} (at least for
30the x86 architecture) only uses short and long jumps. This makes the algorithm
31more rapid, as shown in the previous section, but also results in a less
32optimal solution.
34In the CerCo assembler, we opted at first for a least fixed point algorithm,
35taking absolute jumps into account.
37Here, we ran into a problem with proving termination, as explained in the
38previous section: if we only take short and long jumps into account, the jump
39encoding can only switch from short to long, but never in the other direction.
40When we add absolute jumps, however, it is theoretically possible for a jump to
41switch from absolute to long and back, as explained in the previous section.
43Proving termination then becomes difficult, because there is nothing that
44precludes a jump from switching back and forth between absolute and long
47In order to keep the algorithm in the same complexity class and more easily
48prove termination, we decided to explicitly enforce the `jumps must always
49increase' requirement: if a jump is encoded as a long jump in one step, it will
50also be encoded as a long jump in all the following steps. This means that any
51jump can change at most two times: once from short to absolute (or long), and
52once from absolute to long.
54There is one complicating factor: suppose that a jump is encoded in step $n$
55as an absolute jump, but in step $n+1$ it is determined that (because of changes
56elsewhere) it can now be encoded as a short jump. Due to the requirement that
57jumps must always increase, this means that the jump will be encoded as an
58absolute jump in step $n+1$ as well.
60This is not necessarily correct, however: it is not the case that any short
61jump can correctly be encoded as an absolute jump (a short jump can bridge
62segments, whereas an absolute jump cannot). Therefore, in this situation
63we decide to encode the jump as a long jump, which is always correct.
65The resulting algorithm, while not optimal, is at least as good as the ones
66from SDCC and {\tt gcc}, and potentially better. Its complexity remains
67the same (there are at most $2n$ iterations, where $n$ is the number of jumps
68in the program).
70\subsection{The algorithm in detail}
72The branch displacement algorithm forms part of the translation from
73pseudo-code to assembler. More specifically, it is used by the function that
74translates pseudo-addresses (natural numbers indicating the position of the
75instruction in the program) to actual addresses in memory.
77The original intention was to have two different functions, one function
78$\mathtt{policy}: \mathbb{N} \rightarrow \{\mathtt{short\_jump},
79\mathtt{absolute\_jump}, \mathtt{long\_jump}\}$ to associate jumps to their
80intended encoding, and a function $\sigma: \mathbb{N} \rightarrow
81\mathtt{Word}$ to associate pseudo-addresses to actual addresses. $\sigma$
82would use $\mathtt{policy}$ to determine the size of jump instructions.
84This turned out to be suboptimal from the algorithmic point of view and
85impossible to prove correct.
87From the algorithmic point of view, in order to create the $\mathtt{policy}$
88function, we must necessarily have a translation from pseudo-addresses
89to actual addresses (i.e. a $\sigma$ function): in order to judge the distance
90between a jump and its destination, we must know their memory locations.
91Conversely, in order to create the $\sigma$ function, we need to have the
92$\mathtt{policy}$ function, otherwise we do not know the sizes of the jump
93instructions in the program.
95Much the same problem appears when we try to prove the algorithm correct: the
96correctness of $\mathtt{policy}$ depends on the correctness of $\sigma$, and
97the correctness of $\sigma$ depends on the correctness of $\mathtt{policy}$.
99We solved this problem by integrating the $\mathtt{policy}$ and $\sigma$
100algorithms. We now have a function
101$\sigma: \mathbb{N} \rightarrow \mathtt{Word} \times \mathtt{bool}$ which
102associates a pseudo-address to an actual address. The boolean denotes a forced
103long jump; as noted in the previous section, if during the fixed point
104computation an absolute jump changes to be potentially re-encoded as a short
105jump, the result is actually a long jump. It might therefore be the case that
106jumps are encoded as long jumps without this actually being necessary, and this
107information needs to be passed to the code generating function.
109The assembler function encodes the jumps by checking the distance between
110source and destination according to $\sigma$, so it could select an absolute
111jump in a situation where there should be a long jump. The boolean is there
112to prevent this from happening by indicating the locations where a long jump
113should be encoded, even if a shorter jump is possible. This has no effect on
114correctness, since a long jump is applicable in any situation.
119        \State $\langle added, pc, sigma \rangle \gets acc$
120        \If {$instr$ is a backward jump to $j$}
121                \State $length \gets \mathrm{jump\_size}(pc,sigma_1(labels(j)))$
122        \ElsIf {$instr$ is a forward jump to $j$}
123                \State $length \gets \mathrm{jump\_size}(pc,old\_sigma_1(labels(j))+added)$
124        \Else
125                \State $length \gets \mathtt{short\_jump}$
126        \EndIf
127        \State $old\_length \gets \mathrm{old\_sigma_1}(ppc)$
128        \State $new\_length \gets \mathrm{max}(old\_length, length)$
129        \State $old\_size \gets \mathrm{old\_sigma_2}(ppc)$
130        \State $new\_size \gets \mathrm{instruction\_size}(instr,new\_length)$
131        \State $new\_added \gets added+(new\_size-old\_size)$
132        \State $new\_sigma_1(ppc+1) \gets pc+new\_size$
133        \State $new\_sigma_2(ppc) \gets new\_length$ \\
134        \Return $\langle new\_added, pc+new\_size, new\_sigma \rangle$
137\caption{The heart of the algorithm}
141The algorithm, shown in figure~\ref{f:jump_expansion_step}, works by folding the
142function {\sc f} over the entire program, thus gradually constructing $sigma$.
143This constitutes one step in the fixed point calculation; successive steps
144repeat the fold until a fixed point is reached.
146Parameters of the function {\sc f} are:
148        \item a function $labels$ that associates a label to its pseudo-address;
149        \item $old\_sigma$, the $\sigma$ function returned by the previous
150                iteration of the fixed point calculation;
151        \item $instr$, the instruction currently under consideration;
152        \item $ppc$, the pseudo-address of $instr$;
153        \item $acc$, the fold accumulator, which contains $pc$ (the highest memory
154                address reached so far), $added$ (the number of bytes added to the program
155                size with respect to the previous iteration), and of course $sigma$, the
156                $\sigma$ function under construction.
159The first two are parameters that remain the same through one iteration, the
160last three are standard parameters for a fold function (including $ppc$,
161which is simply the number of instructions of the program already processed).
163The $\sigma$ functions used by {\sc f} are not of the same type as the final
164$\sigma$ function: they are of type
165$\sigma: \mathbb{N} \rightarrow \mathbb{N} \times \{\mathtt{short\_jump},
166\mathtt{absolute\_jump},\mathtt{long\_jump}\}$; a function that associates a
167pseudo-address with an memory address and a jump length. We do this to be able
168to more easily compare the jump lengths between iterations. In the algorithm,
169we use the notation $sigma_1(x)$ to denote the memory address corresponding to
170$x$, and $sigma_2(x)$ to denote the jump length corresponding to $x$.
172Note that the $\sigma$ function used for label lookup varies depending on
173whether the label is behind our current position or ahead of it. For
174backward jumps, where the label is behind our current position, we can use
175$sigma$ for lookup, since its memory address is already known. However, for
176forward jumps, the memory address of the address of the label is not yet
177known, so we must use $old\_sigma$.
179We cannot use $old\_sigma$ without change: it might be the case that we have
180already changed some jumps before, making the program longer. We must therefore
181compensate for this by adding the size increase of the program to the label's
182memory address according to $old\_sigma$, so that jump distances do not get
185Note also that we add the pc to $sigma$ at location $ppc+1$, whereas we add the
186jump length at location $ppc$. We do this so that $sigma(ppc)$ will always
187return a couple with the start address of the instruction at $ppc$ and the
188length of its jump; the end address of the program can be found at $sigma(n+1)$,
189where $n$ is the number of instructions in the program.
Note: See TracBrowser for help on using the repository browser.