source: src/ASM/CPP2013-policy/algorithm.tex @ 3363

Last change on this file since 3363 was 3363, checked in by boender, 6 years ago
  • renamed directory
File size: 10.8 KB
RevLine 
[1889]1\section{Our algorithm}
[2049]2
3\subsection{Design decisions}
4
[3361]5Given the NP-completeness of the problem, finding optimal solutions
6(using, for example, a constraint solver) can potentially be very costly.
[2049]7
[3362]8The SDCC compiler~\cite{SDCC2011}, which has a backend targeting the MCS-51
[2096]9instruction set, simply encodes every branch instruction as a long jump
[2091]10without taking the distance into account. While certainly correct (the long
[2096]11jump can reach any destination in memory) and a very fast solution to compute,
[3362]12it results in a less than optimal solution in terms of output size and
13execution time.
[2049]14
[2096]15On the other hand, the {\tt gcc} compiler suite~\cite{GCC2012}, while compiling
16C on the x86 architecture, uses a greatest fix point algorithm. In other words,
17it starts with all branch instructions encoded as the largest jumps
18available, and then tries to reduce the size of branch instructions as much as
19possible.
[2080]20
[2091]21Such an algorithm has the advantage that any intermediate result it returns
22is correct: the solution where every branch instruction is encoded as a large
23jump is always possible, and the algorithm only reduces those branch
24instructions whose destination address is in range for a shorter jump.
[2096]25The algorithm can thus be stopped after a determined number of steps without
26sacrificing correctness.
[2049]27
[2096]28The result, however, is not necessarily optimal. Even if the algorithm is run
29until it terminates naturally, the fixed point reached is the {\em greatest}
[2080]30fixed point, not the least fixed point. Furthermore, {\tt gcc} (at least for
31the x86 architecture) only uses short and long jumps. This makes the algorithm
[2096]32more efficient, as shown in the previous section, but also results in a less
[2080]33optimal solution.
[2049]34
[2080]35In the CerCo assembler, we opted at first for a least fixed point algorithm,
[2085]36taking absolute jumps into account.
[2049]37
[2085]38Here, we ran into a problem with proving termination, as explained in the
39previous section: if we only take short and long jumps into account, the jump
40encoding can only switch from short to long, but never in the other direction.
[2091]41When we add absolute jumps, however, it is theoretically possible for a branch
[2096]42instruction to switch from absolute to long and back, as previously explained.
[2049]43
44Proving termination then becomes difficult, because there is nothing that
[2096]45precludes a branch instruction from oscillating back and forth between absolute
46and long jumps indefinitely.
[2049]47
[3361]48To keep the algorithm in the same complexity class and more easily
[2091]49prove termination, we decided to explicitly enforce the `branch instructions
50must always grow longer' requirement: if a branch instruction is encoded as a
51long jump in one iteration, it will also be encoded as a long jump in all the
[3361]52following iterations. Therefore the encoding of any branch instruction
[2091]53can change at most two times: once from short to absolute (or long), and once
54from absolute to long.
[2049]55
[2096]56There is one complicating factor. Suppose that a branch instruction is encoded
[2091]57in step $n$ as an absolute jump, but in step $n+1$ it is determined that
58(because of changes elsewhere) it can now be encoded as a short jump. Due to
[3361]59the requirement that the branch instructions must always grow longer,
60the branch encoding will be encoded as an absolute jump in step
[2091]61$n+1$ as well.
[2049]62
[2096]63This is not necessarily correct. A branch instruction that can be
64encoded as a short jump cannot always also be encoded as an absolute jump, as a
65short jump can bridge segments, whereas an absolute jump cannot. Therefore,
66in this situation we have decided to encode the branch instruction as a long
67jump, which is always correct.
[2049]68
[3362]69The resulting algorithm, therefore, will not return the least fixed point, as
70it might have too many long jumps. However, it is still better than the
71algorithms from SDCC and {\tt gcc}, since even in the worst case, it will still
72return a smaller or equal solution.
[2054]73
[3362]74As for complexity, there are at most $2n$ iterations, with $n$ the number of
75branch instructions. Practical tests within the CerCo project on small to
76medium pieces of code have shown that in almost all cases, a fixed point is
77reached in 3 passes. Only in one case did the algorithm need 4.
78
79This is not surprising: after all, the difference between short/absolute and
80long jumps is only one byte (three for conditional jumps). For a change from
81short/absolute to long to have an effect on other jumps is therefore relatively
82uncommon, which explains why a fixed point is reached so quickly.
83
[2054]84\subsection{The algorithm in detail}
85
86The branch displacement algorithm forms part of the translation from
[2096]87pseudocode to assembler. More specifically, it is used by the function that
[2054]88translates pseudo-addresses (natural numbers indicating the position of the
89instruction in the program) to actual addresses in memory.
90
[2096]91Our original intention was to have two different functions, one function
[2085]92$\mathtt{policy}: \mathbb{N} \rightarrow \{\mathtt{short\_jump},
93\mathtt{absolute\_jump}, \mathtt{long\_jump}\}$ to associate jumps to their
94intended encoding, and a function $\sigma: \mathbb{N} \rightarrow
[2096]95\mathtt{Word}$ to associate pseudo-addresses to machine addresses. $\sigma$
[2085]96would use $\mathtt{policy}$ to determine the size of jump instructions.
[2054]97
98This turned out to be suboptimal from the algorithmic point of view and
99impossible to prove correct.
100
101From the algorithmic point of view, in order to create the $\mathtt{policy}$
102function, we must necessarily have a translation from pseudo-addresses
[2096]103to machine addresses (i.e. a $\sigma$ function): in order to judge the distance
[2054]104between a jump and its destination, we must know their memory locations.
105Conversely, in order to create the $\sigma$ function, we need to have the
106$\mathtt{policy}$ function, otherwise we do not know the sizes of the jump
107instructions in the program.
108
109Much the same problem appears when we try to prove the algorithm correct: the
110correctness of $\mathtt{policy}$ depends on the correctness of $\sigma$, and
111the correctness of $\sigma$ depends on the correctness of $\mathtt{policy}$.
112
113We solved this problem by integrating the $\mathtt{policy}$ and $\sigma$
114algorithms. We now have a function
115$\sigma: \mathbb{N} \rightarrow \mathtt{Word} \times \mathtt{bool}$ which
[2096]116associates a pseudo-address to a machine address. The boolean denotes a forced
[2054]117long jump; as noted in the previous section, if during the fixed point
[2085]118computation an absolute jump changes to be potentially re-encoded as a short
119jump, the result is actually a long jump. It might therefore be the case that
120jumps are encoded as long jumps without this actually being necessary, and this
121information needs to be passed to the code generating function.
[2054]122
123The assembler function encodes the jumps by checking the distance between
[2085]124source and destination according to $\sigma$, so it could select an absolute
[2054]125jump in a situation where there should be a long jump. The boolean is there
126to prevent this from happening by indicating the locations where a long jump
127should be encoded, even if a shorter jump is possible. This has no effect on
128correctness, since a long jump is applicable in any situation.
129
[3361]130\begin{figure}[t]
[2054]131\begin{algorithmic}
[2064]132\Function{f}{$labels$,$old\_sigma$,$instr$,$ppc$,$acc$}
133        \State $\langle added, pc, sigma \rangle \gets acc$
134        \If {$instr$ is a backward jump to $j$}
135                \State $length \gets \mathrm{jump\_size}(pc,sigma_1(labels(j)))$
136        \ElsIf {$instr$ is a forward jump to $j$}
137                \State $length \gets \mathrm{jump\_size}(pc,old\_sigma_1(labels(j))+added)$
138        \Else
139                \State $length \gets \mathtt{short\_jump}$
140        \EndIf
141        \State $old\_length \gets \mathrm{old\_sigma_1}(ppc)$
142        \State $new\_length \gets \mathrm{max}(old\_length, length)$
143        \State $old\_size \gets \mathrm{old\_sigma_2}(ppc)$
144        \State $new\_size \gets \mathrm{instruction\_size}(instr,new\_length)$
145        \State $new\_added \gets added+(new\_size-old\_size)$
[3353]146        \State $new\_sigma \gets old\_sigma$
[2064]147        \State $new\_sigma_1(ppc+1) \gets pc+new\_size$
148        \State $new\_sigma_2(ppc) \gets new\_length$ \\
149        \Return $\langle new\_added, pc+new\_size, new\_sigma \rangle$
[2054]150\EndFunction
151\end{algorithmic}
152\caption{The heart of the algorithm}
153\label{f:jump_expansion_step}
154\end{figure}
[2064]155
[2096]156The algorithm, shown in Figure~\ref{f:jump_expansion_step}, works by folding the
[2064]157function {\sc f} over the entire program, thus gradually constructing $sigma$.
158This constitutes one step in the fixed point calculation; successive steps
159repeat the fold until a fixed point is reached.
160
161Parameters of the function {\sc f} are:
162\begin{itemize}
163        \item a function $labels$ that associates a label to its pseudo-address;
164        \item $old\_sigma$, the $\sigma$ function returned by the previous
[2086]165                iteration of the fixed point calculation;
[2064]166        \item $instr$, the instruction currently under consideration;
167        \item $ppc$, the pseudo-address of $instr$;
168        \item $acc$, the fold accumulator, which contains $pc$ (the highest memory
169                address reached so far), $added$ (the number of bytes added to the program
170                size with respect to the previous iteration), and of course $sigma$, the
171                $\sigma$ function under construction.
172\end{itemize}
173
174The first two are parameters that remain the same through one iteration, the
[2096]175final three are standard parameters for a fold function (including $ppc$,
[2064]176which is simply the number of instructions of the program already processed).
177
178The $\sigma$ functions used by {\sc f} are not of the same type as the final
179$\sigma$ function: they are of type
180$\sigma: \mathbb{N} \rightarrow \mathbb{N} \times \{\mathtt{short\_jump},
[2085]181\mathtt{absolute\_jump},\mathtt{long\_jump}\}$; a function that associates a
[2096]182pseudo-address with a memory address and a jump length. We do this to be able
[2098]183to ease the comparison of jump lengths between iterations. In the algorithm,
[2064]184we use the notation $sigma_1(x)$ to denote the memory address corresponding to
185$x$, and $sigma_2(x)$ to denote the jump length corresponding to $x$.
186
187Note that the $\sigma$ function used for label lookup varies depending on
188whether the label is behind our current position or ahead of it. For
[2091]189backward branches, where the label is behind our current position, we can use
[2064]190$sigma$ for lookup, since its memory address is already known. However, for
[2091]191forward branches, the memory address of the address of the label is not yet
[2064]192known, so we must use $old\_sigma$.
193
194We cannot use $old\_sigma$ without change: it might be the case that we have
[2096]195already increased the size of some branch instructions before, making the program
[2091]196longer and moving every instruction forward. We must compensate for this by
197adding the size increase of the program to the label's memory address according
198to $old\_sigma$, so that branch instruction spans do not get compromised.
[2064]199
200Note also that we add the pc to $sigma$ at location $ppc+1$, whereas we add the
201jump length at location $ppc$. We do this so that $sigma(ppc)$ will always
[2096]202return a pair with the start address of the instruction at $ppc$ and the
[2091]203length of its branch instruction (if any); the end address of the program can
204be found at $sigma(n+1)$, where $n$ is the number of instructions in the
205program.
Note: See TracBrowser for help on using the repository browser.