1 | \section{Our algorithm} |
---|
2 | |
---|
3 | \subsection{Design decisions} |
---|
4 | |
---|
5 | Given the NP-completeness of the problem, to arrive at an optimal solution |
---|
6 | within a short space of time (using, for example, a constraint solver) will |
---|
7 | potentially take a great amount of time. |
---|
8 | |
---|
9 | The SDCC compiler~\cite{SDCC2011}, which has the MCS-51 among its target |
---|
10 | instruction sets, simply encodes every jump as a long jump without taking the |
---|
11 | distance into account. While certainly correct (the long jump can reach any |
---|
12 | destination in memory) and rapid, it does result in a less than optimal |
---|
13 | solution. |
---|
14 | |
---|
15 | The {\tt gcc} compiler suite~\cite{GCC2012}, while compiling C on the x86 |
---|
16 | architecture, uses a greatest fix point algorithm. In other words, it starts |
---|
17 | off with all jumps encoded as the largest jumps available, and then tries to |
---|
18 | reduce jumps as much as possible. |
---|
19 | |
---|
20 | Such an algorithm has the advantage that any intermediate results it returns |
---|
21 | are correct: the solution where every jump is encoded as a large jump is |
---|
22 | always possible, and the algorithm only reduces those jumps where the |
---|
23 | destination address is in range for a shorter jump instruction. The algorithm |
---|
24 | can thus be stopped after a determined amount of steps without losing |
---|
25 | correctness. |
---|
26 | |
---|
27 | The result, however, is not necessarily optimal, even if the algorithm is run |
---|
28 | until it terminates naturally: the fixed point reached is the {\em greatest} |
---|
29 | fixed point, not the least fixed point. Furthermore, {\tt gcc} (at least for |
---|
30 | the x86 architecture) only uses short and long jumps. This makes the algorithm |
---|
31 | more rapid, as shown in the previous section, but also results in a less |
---|
32 | optimal solution. |
---|
33 | |
---|
34 | In the CerCo assembler, we opted at first for a least fixed point algorithm, |
---|
35 | taking absolute jumps into account. |
---|
36 | |
---|
37 | Here, we ran into a problem with proving termination, as explained in the |
---|
38 | previous section: if we only take short and long jumps into account, the jump |
---|
39 | encoding can only switch from short to long, but never in the other direction. |
---|
40 | When we add absolute jumps, however, it is theoretically possible for a jump to |
---|
41 | switch from absolute to long and back, as explained in the previous section. |
---|
42 | |
---|
43 | Proving termination then becomes difficult, because there is nothing that |
---|
44 | precludes a jump from switching back and forth between absolute and long |
---|
45 | indefinitely. |
---|
46 | |
---|
47 | In order to keep the algorithm in the same complexity class and more easily |
---|
48 | prove termination, we decided to explicitly enforce the `jumps must always |
---|
49 | increase' requirement: if a jump is encoded as a long jump in one step, it will |
---|
50 | also be encoded as a long jump in all the following steps. This means that any |
---|
51 | jump can change at most two times: once from short to absolute (or long), and |
---|
52 | once from absolute to long. |
---|
53 | |
---|
54 | There is one complicating factor: suppose that a jump is encoded in step $n$ |
---|
55 | as an absolute jump, but in step $n+1$ it is determined that (because of changes |
---|
56 | elsewhere) it can now be encoded as a short jump. Due to the requirement that |
---|
57 | jumps must always increase, this means that the jump will be encoded as an |
---|
58 | absolute jump in step $n+1$ as well. |
---|
59 | |
---|
60 | This is not necessarily correct, however: it is not the case that any short |
---|
61 | jump can correctly be encoded as an absolute jump (a short jump can bridge |
---|
62 | segments, whereas an absolute jump cannot). Therefore, in this situation |
---|
63 | we decide to encode the jump as a long jump, which is always correct. |
---|
64 | |
---|
65 | The resulting algorithm, while not optimal, is at least as good as the ones |
---|
66 | from SDCC and {\tt gcc}, and potentially better. Its complexity remains |
---|
67 | the same (there are at most $2n$ iterations, where $n$ is the number of jumps |
---|
68 | in the program). |
---|
69 | |
---|
70 | \subsection{The algorithm in detail} |
---|
71 | |
---|
72 | The branch displacement algorithm forms part of the translation from |
---|
73 | pseudo-code to assembler. More specifically, it is used by the function that |
---|
74 | translates pseudo-addresses (natural numbers indicating the position of the |
---|
75 | instruction in the program) to actual addresses in memory. |
---|
76 | |
---|
77 | The original intention was to have two different functions, one function |
---|
78 | $\mathtt{policy}: \mathbb{N} \rightarrow \{\mathtt{short\_jump}, |
---|
79 | \mathtt{absolute\_jump}, \mathtt{long\_jump}\}$ to associate jumps to their |
---|
80 | intended encoding, and a function $\sigma: \mathbb{N} \rightarrow |
---|
81 | \mathtt{Word}$ to associate pseudo-addresses to actual addresses. $\sigma$ |
---|
82 | would use $\mathtt{policy}$ to determine the size of jump instructions. |
---|
83 | |
---|
84 | This turned out to be suboptimal from the algorithmic point of view and |
---|
85 | impossible to prove correct. |
---|
86 | |
---|
87 | From the algorithmic point of view, in order to create the $\mathtt{policy}$ |
---|
88 | function, we must necessarily have a translation from pseudo-addresses |
---|
89 | to actual addresses (i.e. a $\sigma$ function): in order to judge the distance |
---|
90 | between a jump and its destination, we must know their memory locations. |
---|
91 | Conversely, in order to create the $\sigma$ function, we need to have the |
---|
92 | $\mathtt{policy}$ function, otherwise we do not know the sizes of the jump |
---|
93 | instructions in the program. |
---|
94 | |
---|
95 | Much the same problem appears when we try to prove the algorithm correct: the |
---|
96 | correctness of $\mathtt{policy}$ depends on the correctness of $\sigma$, and |
---|
97 | the correctness of $\sigma$ depends on the correctness of $\mathtt{policy}$. |
---|
98 | |
---|
99 | We solved this problem by integrating the $\mathtt{policy}$ and $\sigma$ |
---|
100 | algorithms. We now have a function |
---|
101 | $\sigma: \mathbb{N} \rightarrow \mathtt{Word} \times \mathtt{bool}$ which |
---|
102 | associates a pseudo-address to an actual address. The boolean denotes a forced |
---|
103 | long jump; as noted in the previous section, if during the fixed point |
---|
104 | computation an absolute jump changes to be potentially re-encoded as a short |
---|
105 | jump, the result is actually a long jump. It might therefore be the case that |
---|
106 | jumps are encoded as long jumps without this actually being necessary, and this |
---|
107 | information needs to be passed to the code generating function. |
---|
108 | |
---|
109 | The assembler function encodes the jumps by checking the distance between |
---|
110 | source and destination according to $\sigma$, so it could select an absolute |
---|
111 | jump in a situation where there should be a long jump. The boolean is there |
---|
112 | to prevent this from happening by indicating the locations where a long jump |
---|
113 | should be encoded, even if a shorter jump is possible. This has no effect on |
---|
114 | correctness, since a long jump is applicable in any situation. |
---|
115 | |
---|
116 | \begin{figure} |
---|
117 | \begin{algorithmic} |
---|
118 | \Function{f}{$labels$,$old\_sigma$,$instr$,$ppc$,$acc$} |
---|
119 | \State $\langle added, pc, sigma \rangle \gets acc$ |
---|
120 | \If {$instr$ is a backward jump to $j$} |
---|
121 | \State $length \gets \mathrm{jump\_size}(pc,sigma_1(labels(j)))$ |
---|
122 | \ElsIf {$instr$ is a forward jump to $j$} |
---|
123 | \State $length \gets \mathrm{jump\_size}(pc,old\_sigma_1(labels(j))+added)$ |
---|
124 | \Else |
---|
125 | \State $length \gets \mathtt{short\_jump}$ |
---|
126 | \EndIf |
---|
127 | \State $old\_length \gets \mathrm{old\_sigma_1}(ppc)$ |
---|
128 | \State $new\_length \gets \mathrm{max}(old\_length, length)$ |
---|
129 | \State $old\_size \gets \mathrm{old\_sigma_2}(ppc)$ |
---|
130 | \State $new\_size \gets \mathrm{instruction\_size}(instr,new\_length)$ |
---|
131 | \State $new\_added \gets added+(new\_size-old\_size)$ |
---|
132 | \State $new\_sigma_1(ppc+1) \gets pc+new\_size$ |
---|
133 | \State $new\_sigma_2(ppc) \gets new\_length$ \\ |
---|
134 | \Return $\langle new\_added, pc+new\_size, new\_sigma \rangle$ |
---|
135 | \EndFunction |
---|
136 | \end{algorithmic} |
---|
137 | \caption{The heart of the algorithm} |
---|
138 | \label{f:jump_expansion_step} |
---|
139 | \end{figure} |
---|
140 | |
---|
141 | The algorithm, shown in figure~\ref{f:jump_expansion_step}, works by folding the |
---|
142 | function {\sc f} over the entire program, thus gradually constructing $sigma$. |
---|
143 | This constitutes one step in the fixed point calculation; successive steps |
---|
144 | repeat the fold until a fixed point is reached. |
---|
145 | |
---|
146 | Parameters of the function {\sc f} are: |
---|
147 | \begin{itemize} |
---|
148 | \item a function $labels$ that associates a label to its pseudo-address; |
---|
149 | \item $old\_sigma$, the $\sigma$ function returned by the previous |
---|
150 | iteration of the fixed point calculcation; |
---|
151 | \item $instr$, the instruction currently under consideration; |
---|
152 | \item $ppc$, the pseudo-address of $instr$; |
---|
153 | \item $acc$, the fold accumulator, which contains $pc$ (the highest memory |
---|
154 | address reached so far), $added$ (the number of bytes added to the program |
---|
155 | size with respect to the previous iteration), and of course $sigma$, the |
---|
156 | $\sigma$ function under construction. |
---|
157 | \end{itemize} |
---|
158 | |
---|
159 | The first two are parameters that remain the same through one iteration, the |
---|
160 | last three are standard parameters for a fold function (including $ppc$, |
---|
161 | which is simply the number of instructions of the program already processed). |
---|
162 | |
---|
163 | The $\sigma$ functions used by {\sc f} are not of the same type as the final |
---|
164 | $\sigma$ function: they are of type |
---|
165 | $\sigma: \mathbb{N} \rightarrow \mathbb{N} \times \{\mathtt{short\_jump}, |
---|
166 | \mathtt{absolute\_jump},\mathtt{long\_jump}\}$; a function that associates a |
---|
167 | pseudo-address with an memory address and a jump length. We do this to be able |
---|
168 | to more easily compare the jump lengths between iterations. In the algorithm, |
---|
169 | we use the notation $sigma_1(x)$ to denote the memory address corresponding to |
---|
170 | $x$, and $sigma_2(x)$ to denote the jump length corresponding to $x$. |
---|
171 | |
---|
172 | Note that the $\sigma$ function used for label lookup varies depending on |
---|
173 | whether the label is behind our current position or ahead of it. For |
---|
174 | backward jumps, where the label is behind our current position, we can use |
---|
175 | $sigma$ for lookup, since its memory address is already known. However, for |
---|
176 | forward jumps, the memory address of the address of the label is not yet |
---|
177 | known, so we must use $old\_sigma$. |
---|
178 | |
---|
179 | We cannot use $old\_sigma$ without change: it might be the case that we have |
---|
180 | already changed some jumps before, making the program longer. We must therefore |
---|
181 | compensate for this by adding the size increase of the program to the label's |
---|
182 | memory address according to $old\_sigma$, so that jump distances do not get |
---|
183 | compromised. |
---|
184 | |
---|
185 | Note also that we add the pc to $sigma$ at location $ppc+1$, whereas we add the |
---|
186 | jump length at location $ppc$. We do this so that $sigma(ppc)$ will always |
---|
187 | return a couple with the start address of the instruction at $ppc$ and the |
---|
188 | length of its jump; the end address of the program can be found at $sigma(n+1)$, |
---|
189 | where $n$ is the number of instructions in the program. |
---|