Changeset 3474

Sep 22, 2014, 11:26:39 AM (6 years ago)


1 edited


  • Papers/sttt/algorithm.tex

    r3470 r3474  
    1 \section{Our algorithm}
    3 \subsection{Design decisions}
    5 Given the NP-completeness of the problem, finding optimal solutions
    6 (using, for example, a constraint solver) can potentially be very costly.
    8 The SDCC compiler~\cite{SDCC2011}, which has a backend targeting the MCS-51
    9 instruction set, simply encodes every branch instruction as a long jump
    10 without taking the distance into account. While certainly correct (the long
    11 jump can reach any destination in memory) and a very fast solution to compute,
    12 it results in a less than optimal solution in terms of output size and
    13 execution time.
    15 On the other hand, the {\tt gcc} compiler suite, while compiling
    16 C on the x86 architecture, uses a greatest fix point algorithm. In other words,
    17 it starts with all branch instructions encoded as the largest jumps
    18 available, and then tries to reduce the size of branch instructions as much as
    19 possible.
    21 Such an algorithm has the advantage that any intermediate result it returns
    22 is correct: the solution where every branch instruction is encoded as a large
    23 jump is always possible, and the algorithm only reduces those branch
    24 instructions whose destination address is in range for a shorter jump.
    25 The algorithm can thus be stopped after a determined number of steps without
    26 sacrificing correctness.
    28 The result, however, is not necessarily optimal. Even if the algorithm is run
    29 until it terminates naturally, the fixed point reached is the {\em greatest}
    30 fixed point, not the least fixed point. Furthermore, {\tt gcc} (at least for
    31 the x86 architecture) only uses short and long jumps. This makes the algorithm
    32 more efficient, as shown in the previous section, but also results in a less
    33 optimal solution.
    35 In the CerCo assembler, we opted at first for a least fixed point algorithm,
    36 taking absolute jumps into account.
    38 Here, we ran into a problem with proving termination, as explained in the
    39 previous section: if we only take short and long jumps into account, the jump
    40 encoding can only switch from short to long, but never in the other direction.
    41 When we add absolute jumps, however, it is theoretically possible for a branch
    42 instruction to switch from absolute to long and back, as previously explained.
    43 Proving termination then becomes difficult, because there is nothing that
    44 precludes a branch instruction from oscillating back and forth between absolute
    45 and long jumps indefinitely.
    47 To keep the algorithm in the same complexity class and more easily
    48 prove termination, we decided to explicitly enforce the `branch instructions
    49 must always grow longer' requirement: if a branch instruction is encoded as a
    50 long jump in one iteration, it will also be encoded as a long jump in all the
    51 following iterations. Therefore the encoding of any branch instruction
    52 can change at most two times: once from short to absolute (or long), and once
    53 from absolute to long.
    55 There is one complicating factor. Suppose that a branch instruction is encoded
    56 in step $n$ as an absolute jump, but in step $n+1$ it is determined that
    57 (because of changes elsewhere) it can now be encoded as a short jump. Due to
    58 the requirement that the branch instructions must always grow longer,
    59 the branch encoding will be encoded as an absolute jump in step
    60 $n+1$ as well.
    62 This is not necessarily correct. A branch instruction that can be
    63 encoded as a short jump cannot always also be encoded as an absolute jump, as a
    64 short jump can bridge segments, whereas an absolute jump cannot. Therefore,
    65 in this situation we have decided to encode the branch instruction as a long
    66 jump, which is always correct.
    68 The resulting algorithm, therefore, will not return the least fixed point, as
    69 it might have too many long jumps. However, it is still better than the
    70 algorithms from SDCC and {\tt gcc}, since even in the worst case, it will still
    71 return a smaller or equal solution.
    73 Experimenting with our algorithm on the test suite of C programs included with
    74 gcc 2.3.3 has shown that on average, about 25 percent of jumps are encoded as short or absolute jumps by the algorithm. As not all instructions are jumps, this does not make for a large reduction in size, but it can make for a reduction in execution time: if jumps
    75 are executed multiple times, for example in loops, the fact that short jumps take less cycles to execute than long jumps can have great effect.
    77 As for complexity, there are at most $2n$ iterations, where $n$ is the number of
    78 branch instructions. Practical tests within the CerCo project on small to
    79 medium pieces of code have shown that in almost all cases, a fixed point is
    80 reached in 3 passes. Only in one case did the algorithm need 4. This is not surprising: after all, the difference between short/absolute and
    81 long jumps is only one byte (three for conditional jumps). For a change from
    82 short/absolute to long to have an effect on other jumps is therefore relatively
    83 uncommon, which explains why a fixed point is reached so quickly.
    85 \subsection{The algorithm in detail}
    87 The branch displacement algorithm forms part of the translation from
    88 pseudocode to assembler. More specifically, it is used by the function that
    89 translates pseudo-addresses (natural numbers indicating the position of the
    90 instruction in the program) to actual addresses in memory. Note that in pseudocode, all instructions are of size 1.
    92 Our original intention was to have two different functions, one function
    93 $\mathtt{policy}: \mathbb{N} \rightarrow \{\mathtt{short\_jump},
    94 \mathtt{absolute\_jump}, \mathtt{long\_jump}\}$ to associate jumps to their
    95 intended encoding, and a function $\sigma: \mathbb{N} \rightarrow
    96 \mathtt{Word}$ to associate pseudo-addresses to machine addresses. $\sigma$
    97 would use $\mathtt{policy}$ to determine the size of jump instructions. This turned out to be suboptimal from the algorithmic point of view and
    98 impossible to prove correct.
    100 From the algorithmic point of view, in order to create the $\mathtt{policy}$
    101 function, we must necessarily have a translation from pseudo-addresses
    102 to machine addresses (i.e. a $\sigma$ function): in order to judge the distance
    103 between a jump and its destination, we must know their memory locations.
    104 Conversely, in order to create the $\sigma$ function, we need to have the
    105 $\mathtt{policy}$ function, otherwise we do not know the sizes of the jump
    106 instructions in the program.
    108 Much the same problem appears when we try to prove the algorithm correct: the
    109 correctness of $\mathtt{policy}$ depends on the correctness of $\sigma$, and
    110 the correctness of $\sigma$ depends on the correctness of $\mathtt{policy}$.
    112 We solved this problem by integrating the $\mathtt{policy}$ and $\sigma$
    113 algorithms. We now have a function
    114 $\sigma: \mathbb{N} \rightarrow \mathtt{Word} \times \mathtt{bool}$ which
    115 associates a pseudo-address to a machine address. The boolean denotes a forced
    116 long jump; as noted in the previous section, if during the fixed point
    117 computation an absolute jump changes to be potentially re-encoded as a short
    118 jump, the result is actually a long jump. It might therefore be the case that
    119 jumps are encoded as long jumps without this actually being necessary, and this
    120 information needs to be passed to the code generating function.
    122 The assembler function encodes the jumps by checking the distance between
    123 source and destination according to $\sigma$, so it could select an absolute
    124 jump in a situation where there should be a long jump. The boolean is there
    125 to prevent this from happening by indicating the locations where a long jump
    126 should be encoded, even if a shorter jump is possible. This has no effect on
    127 correctness, since a long jump is applicable in any situation.
    129 \begin{figure}[t]
    130 \small
    131 \begin{algorithmic}
    132 \Function{f}{$labels$,$old\_sigma$,$instr$,$ppc$,$acc$}
    133         \State $\langle added, pc, sigma \rangle \gets acc$
    134         \If {$instr$ is a backward jump to $j$}
    135                 \State $length \gets \mathrm{jump\_size}(pc,sigma_1(labels(j)))$
    136                 \Comment compute jump distance
    137         \ElsIf {$instr$ is a forward jump to $j$}
    138                 \State $length \gets \mathrm{jump\_size}(pc,old\_sigma_1(labels(j))+added)$
    139         \EndIf
    140         \State $old\_length \gets \mathrm{old\_sigma_1}(ppc)$
    141         \State $new\_length \gets \mathrm{max}(old\_length, length)$
    142         \Comment length must never decrease
    143         \State $old\_size \gets \mathrm{old\_sigma_2}(ppc)$
    144         \State $new\_size \gets \mathrm{instruction\_size}(instr,new\_length)$
    145         \Comment compute size in bytes
    146         \State $new\_added \gets added+(new\_size-old\_size)$
    147         \Comment keep track of total added bytes
    148         \State $new\_sigma \gets old\_sigma$
    149         \State $new\_sigma_1(ppc+1) \gets pc+new\_size$
    150         \State $new\_sigma_2(ppc) \gets new\_length$
    151         \Comment update $\sigma$ \\
    152         \Return $\langle new\_added, pc+new\_size, new\_sigma \rangle$
    153 \EndFunction
    154 \end{algorithmic}
    155 \caption{The heart of the algorithm}
    156 \label{f:jump_expansion_step}
    157 \end{figure}
    159 The algorithm, shown in Figure~\ref{f:jump_expansion_step}, works by folding the
    160 function {\sc f} over the entire program, thus gradually constructing $sigma$.
    161 This constitutes one step in the fixed point calculation; successive steps
    162 repeat the fold until a fixed point is reached. We have abstracted away the case where an instruction is not a jump, since the size of these instructions is constant.
    164 Parameters of the function {\sc f} are:
    165 \begin{itemize}
    166         \item a function $labels$ that associates a label to its pseudo-address;
    167         \item $old\_sigma$, the $\sigma$ function returned by the previous
    168                 iteration of the fixed point calculation;
    169         \item $instr$, the instruction currently under consideration;
    170         \item $ppc$, the pseudo-address of $instr$;
    171         \item $acc$, the fold accumulator, which contains $added$ (the number of
    172         bytes added to the program size with respect to the previous iteration), $pc$
    173         (the highest memory address reached so far), and of course $sigma$, the
    174                 $\sigma$ function under construction.
    175 \end{itemize}
    176 The first two are parameters that remain the same through one iteration, the
    177 final three are standard parameters for a fold function (including $ppc$,
    178 which is simply the number of instructions of the program already processed).
    180 The $\sigma$ functions used by {\sc f} are not of the same type as the final
    181 $\sigma$ function: they are of type
    182 $\sigma: \mathbb{N} \rightarrow \mathbb{N} \times \{\mathtt{short\_jump},
    183 \mathtt{absolute\_jump},\mathtt{long\_jump}\}$; a function that associates a
    184 pseudo-address with a memory address and a jump length. We do this to
    185 ease the comparison of jump lengths between iterations. In the algorithm,
    186 we use the notation $sigma_1(x)$ to denote the memory address corresponding to
    187 $x$, and $sigma_2(x)$ for the jump length corresponding to $x$.
    189 Note that the $\sigma$ function used for label lookup varies depending on
    190 whether the label is behind our current position or ahead of it. For
    191 backward branches, where the label is behind our current position, we can use
    192 $sigma$ for lookup, since its memory address is already known. However, for
    193 forward branches, the memory address of the address of the label is not yet
    194 known, so we must use $old\_sigma$.
    196 We cannot use $old\_sigma$ without change: it might be the case that we have
    197 already increased the size of some branch instructions before, making the
    198 program longer and moving every instruction forward. We must compensate for this
    199 by adding the size increase of the program to the label's memory address
    200 according to $old\_sigma$, so that branch instruction spans do not get
    201 compromised.
    203 %Note also that we add the pc to $sigma$ at location $ppc+1$, whereas we add the
    204 %jump length at location $ppc$. We do this so that $sigma(ppc)$ will always
    205 %return a pair with the start address of the instruction at $ppc$ and the
    206 %length of its branch instruction (if any); the end address of the program can
    207 %be found at $sigma(n+1)$, where $n$ is the number of instructions in the
    208 %program.
Note: See TracChangeset for help on using the changeset viewer.