# Changeset 2085 for src/ASM/CPP2012-policy/algorithm.tex

Ignore:
Timestamp:
Jun 15, 2012, 11:36:47 AM (8 years ago)
Message:
• rewrote introduction
• changed 'medium' to 'absolute'
• added a bit to conclusion (CompCert?, Piton, ...)
File:
1 edited

### Legend:

Unmodified
 r2080 In the CerCo assembler, we opted at first for a least fixed point algorithm, taking medium jumps into account. taking absolute jumps into account. Here, we ran into a problem with proving termination: whereas the SDCC algorithm only switches jumps from short to long, when we add medium jumps, it is theoretically possible for a jump to switch from medium to long and back, as explained in the previous section. Here, we ran into a problem with proving termination, as explained in the previous section: if we only take short and long jumps into account, the jump encoding can only switch from short to long, but never in the other direction. When we add absolute jumps, however, it is theoretically possible for a jump to switch from absolute to long and back, as explained in the previous section. Proving termination then becomes difficult, because there is nothing that precludes a jump switching back and forth between medium and long indefinitely. precludes a jump from switching back and forth between absolute and long indefinitely. In fact, this mirrors the argument from~\cite{Szymanski1978}. There, it is argued that for the problem to be NP-complete, it must be allowed to contain {\em pathological} jumps. These are jumps that can normally not be encoded as a short(er) jump, but gain this property when some other jumps are encoded as a long(er) jump. This is exactly what happens in figure~\ref{f:term_example}: by encoding the first jump as a long jump, another jump switches from long to medium (which is shorter). In order to keep the algorithm linear and more easily prove termination, we decided to explicitly enforce the jumps must always increase' requirement: if a jump is encoded as a long jump in one step, it will also be encoded as a long jump in all the following steps. This means that any jump can change at maximum two times: once from short to medium (or long), and once from medium to long. In order to keep the algorithm in the same complexity class and more easily prove termination, we decided to explicitly enforce the jumps must always increase' requirement: if a jump is encoded as a long jump in one step, it will also be encoded as a long jump in all the following steps. This means that any jump can change at most two times: once from short to absolute (or long), and once from absolute to long. There is one complicating factor: suppose that a jump is encoded in step $n$ as a medium jump, but in step $n+1$ it is determined that (because of changes as an absolute jump, but in step $n+1$ it is determined that (because of changes elsewhere) it can now be encoded as a short jump. Due to the requirement that jumps must always increase, this means that the jump will be encoded as a medium jump in step $n+1$ as well. jumps must always increase, this means that the jump will be encoded as an absolute jump in step $n+1$ as well. This is not necessarily correct, however: it is not the case that any short jump can correctly be encoded as a medium jump (a short jump can bridge segments, whereas a medium jump cannot). Therefore, in this situation jump can correctly be encoded as an absolute jump (a short jump can bridge segments, whereas an absolute jump cannot). Therefore, in this situation we decide to encode the jump as a long jump, which is always correct. The resulting algorithm, while not optimal, is at least as good as the ones from {\tt gcc} and SDCC, and potentially better. Its complexity remains linear (though with a higher constant than SDCC). from SDCC and {\tt gcc}, and potentially better. Its complexity remains the same (there are at most $2n$ iterations, where $n$ is the number of jumps in the program). \subsection{The algorithm in detail} The original intention was to have two different functions, one function $\mathtt{policy}: \mathbb{N} \rightarrow \{\mathtt{short}, \mathtt{medium}, \mathtt{long}\}$ to associate jumps to their intended translation, and a function $\sigma: \mathbb{N} \rightarrow \mathtt{Word}$ to associate pseudo-addresses to actual addresses. $\sigma$ would use $\mathtt{policy}$ to determine the size of jump instructions. $\mathtt{policy}: \mathbb{N} \rightarrow \{\mathtt{short\_jump}, \mathtt{absolute\_jump}, \mathtt{long\_jump}\}$ to associate jumps to their intended encoding, and a function $\sigma: \mathbb{N} \rightarrow \mathtt{Word}$ to associate pseudo-addresses to actual addresses. $\sigma$ would use $\mathtt{policy}$ to determine the size of jump instructions. This turned out to be suboptimal from the algorithmic point of view and associates a pseudo-address to an actual address. The boolean denotes a forced long jump; as noted in the previous section, if during the fixed point computation a medium jump needs to be re-encoded as a short jump, the result is actually a long jump. It might therefore be the case that jumps are encoded as long jumps without this actually being necessary. computation an absolute jump changes to be potentially re-encoded as a short jump, the result is actually a long jump. It might therefore be the case that jumps are encoded as long jumps without this actually being necessary, and this information needs to be passed to the code generating function. The assembler function encodes the jumps by checking the distance between source and destination according to $\sigma$, so it could select a medium source and destination according to $\sigma$, so it could select an absolute jump in a situation where there should be a long jump. The boolean is there to prevent this from happening by indicating the locations where a long jump $\sigma$ function: they are of type $\sigma: \mathbb{N} \rightarrow \mathbb{N} \times \{\mathtt{short\_jump}, \mathtt{medium\_jump},\mathtt{long\_jump}\}$; a function that associates a \mathtt{absolute\_jump},\mathtt{long\_jump}\}\$; a function that associates a pseudo-address with an memory address and a jump length. We do this to be able to more easily compare the jump lengths between iterations. In the algorithm,