Index: src/ASM/CPP2012policy/algorithm.tex
===================================================================
 src/ASM/CPP2012policy/algorithm.tex (revision 2084)
+++ src/ASM/CPP2012policy/algorithm.tex (revision 2085)
@@ 33,43 +33,38 @@
In the CerCo assembler, we opted at first for a least fixed point algorithm,
taking medium jumps into account.
+taking absolute jumps into account.
Here, we ran into a problem with proving termination: whereas the SDCC
algorithm only switches jumps from short to long, when we add medium jumps,
it is theoretically possible for a jump to switch from medium to long and back,
as explained in the previous section.
+Here, we ran into a problem with proving termination, as explained in the
+previous section: if we only take short and long jumps into account, the jump
+encoding can only switch from short to long, but never in the other direction.
+When we add absolute jumps, however, it is theoretically possible for a jump to
+switch from absolute to long and back, as explained in the previous section.
Proving termination then becomes difficult, because there is nothing that
precludes a jump switching back and forth between medium and long indefinitely.
+precludes a jump from switching back and forth between absolute and long
+indefinitely.
In fact, this mirrors the argument from~\cite{Szymanski1978}. There, it is
argued that for the problem to be NPcomplete, it must be allowed to contain
{\em pathological} jumps. These are jumps that can normally not be encoded as a
short(er) jump, but gain this property when some other jumps are encoded as a
long(er) jump. This is exactly what happens in figure~\ref{f:term_example}: by
encoding the first jump as a long jump, another jump switches from long to
medium (which is shorter).

In order to keep the algorithm linear and more easily prove termination, we
decided to explicitly enforce the `jumps must always increase' requirement: if
a jump is encoded as a long jump in one step, it will also be encoded as a
long jump in all the following steps. This means that any jump can change at
maximum two times: once from short to medium (or long), and once from medium
to long.
+In order to keep the algorithm in the same complexity class and more easily
+prove termination, we decided to explicitly enforce the `jumps must always
+increase' requirement: if a jump is encoded as a long jump in one step, it will
+also be encoded as a long jump in all the following steps. This means that any
+jump can change at most two times: once from short to absolute (or long), and
+once from absolute to long.
There is one complicating factor: suppose that a jump is encoded in step $n$
as a medium jump, but in step $n+1$ it is determined that (because of changes
+as an absolute jump, but in step $n+1$ it is determined that (because of changes
elsewhere) it can now be encoded as a short jump. Due to the requirement that
jumps must always increase, this means that the jump will be encoded as a
medium jump in step $n+1$ as well.
+jumps must always increase, this means that the jump will be encoded as an
+absolute jump in step $n+1$ as well.
This is not necessarily correct, however: it is not the case that any short
jump can correctly be encoded as a medium jump (a short jump can bridge
segments, whereas a medium jump cannot). Therefore, in this situation
+jump can correctly be encoded as an absolute jump (a short jump can bridge
+segments, whereas an absolute jump cannot). Therefore, in this situation
we decide to encode the jump as a long jump, which is always correct.
The resulting algorithm, while not optimal, is at least as good as the ones
from {\tt gcc} and SDCC, and potentially better. Its complexity remains
linear (though with a higher constant than SDCC).
+from SDCC and {\tt gcc}, and potentially better. Its complexity remains
+the same (there are at most $2n$ iterations, where $n$ is the number of jumps
+in the program).
\subsection{The algorithm in detail}
@@ 81,9 +76,9 @@
The original intention was to have two different functions, one function
$\mathtt{policy}: \mathbb{N} \rightarrow \{\mathtt{short}, \mathtt{medium},
\mathtt{long}\}$ to associate jumps to their intended translation, and a
function $\sigma: \mathbb{N} \rightarrow \mathtt{Word}$ to associate
pseudoaddresses to actual addresses. $\sigma$ would use $\mathtt{policy}$ to
determine the size of jump instructions.
+$\mathtt{policy}: \mathbb{N} \rightarrow \{\mathtt{short\_jump},
+\mathtt{absolute\_jump}, \mathtt{long\_jump}\}$ to associate jumps to their
+intended encoding, and a function $\sigma: \mathbb{N} \rightarrow
+\mathtt{Word}$ to associate pseudoaddresses to actual addresses. $\sigma$
+would use $\mathtt{policy}$ to determine the size of jump instructions.
This turned out to be suboptimal from the algorithmic point of view and
@@ 107,10 +102,11 @@
associates a pseudoaddress to an actual address. The boolean denotes a forced
long jump; as noted in the previous section, if during the fixed point
computation a medium jump needs to be reencoded as a short jump, the result
is actually a long jump. It might therefore be the case that jumps are encoded
as long jumps without this actually being necessary.
+computation an absolute jump changes to be potentially reencoded as a short
+jump, the result is actually a long jump. It might therefore be the case that
+jumps are encoded as long jumps without this actually being necessary, and this
+information needs to be passed to the code generating function.
The assembler function encodes the jumps by checking the distance between
source and destination according to $\sigma$, so it could select a medium
+source and destination according to $\sigma$, so it could select an absolute
jump in a situation where there should be a long jump. The boolean is there
to prevent this from happening by indicating the locations where a long jump
@@ 168,5 +164,5 @@
$\sigma$ function: they are of type
$\sigma: \mathbb{N} \rightarrow \mathbb{N} \times \{\mathtt{short\_jump},
\mathtt{medium\_jump},\mathtt{long\_jump}\}$; a function that associates a
+\mathtt{absolute\_jump},\mathtt{long\_jump}\}$; a function that associates a
pseudoaddress with an memory address and a jump length. We do this to be able
to more easily compare the jump lengths between iterations. In the algorithm,
Index: src/ASM/CPP2012policy/biblio.bib
===================================================================
 src/ASM/CPP2012policy/biblio.bib (revision 2084)
+++ src/ASM/CPP2012policy/biblio.bib (revision 2085)
@@ 67,2 +67,29 @@
year = {2012}
}
+
+@article {Leroy2009,
+ author = {Leroy, Xavier},
+ affiliation = {INRIA ParisRocquencourt B.P. 105 78153 Le Chesnay France},
+ title = {A Formally Verified Compiler Backend},
+ journal = {Journal of Automated Reasoning},
+ publisher = {Springer Netherlands},
+ issn = {01687433},
+ keyword = {Computer Science},
+ pages = {363446},
+ volume = {43},
+ issue = {4},
+ url = {http://dx.doi.org/10.1007/s1081700991554},
+ note = {10.1007/s1081700991554},
+ year = {2009}
+}
+
+@book {Moore1996,
+ author = {J Strother Moore},
+ title = {Piton: A mechanically verified assembly language},
+ series = {Automated Reasoning Series},
+ volume = {3},
+ isbn = {9780792339205},
+ publisher = {Springer},
+ year = {1996}
+}
+
Index: src/ASM/CPP2012policy/conclusion.tex
===================================================================
 src/ASM/CPP2012policy/conclusion.tex (revision 2084)
+++ src/ASM/CPP2012policy/conclusion.tex (revision 2085)
@@ 5,4 +5,16 @@
termination and correctness for this algorithm, as formalised in Matita.
+The algorithm we have presented is fast and correct, but not optimal; a true
+optimal solution would need techniques like constraint solvers. While outside
+the scope of the present research, it would be interesting to see if enough
+heuristics could be find to make such a solution practical for implementing
+in an existent compiler; this would be especially useful for embedded systems,
+where it is important to have a small solution as possible.
+
+This algorithm is part of a greater whole, the CerCo project, which aims at
+the complete formalisation and verification of a compiler. More information
+on the formalisation of the assembler, of which the present work is a part,
+can be found in a companion publication.
+
\subsection{Related work}
@@ 10,6 +22,16 @@
displacement algorithm.
(Piton? CompCert?)
+The CompCert project is another project aimed at formally verifying a compiler.
+Their backend~\cite{Leroy2009} generates assembly for (amongst others) the
+PowerPC and x86 (32bit) architectures, but does not include a branch
+displacement algorithm; at least for the x86 architecture, all jumps are
+encoded as long jumps.
+There is also Piton~\cite{Moore1996}, which not only includes the
+formal verification of a compiler, but also of the machine architecture
+targeted by that compiler. However, this architecture does not have different
+jump sizes (branching is simulated by assigning values to the program counter),
+so the branch displacement problem does not occur.
+
\subsection{Formal development}
Index: src/ASM/CPP2012policy/problem.tex
===================================================================
 src/ASM/CPP2012policy/problem.tex (revision 2084)
+++ src/ASM/CPP2012policy/problem.tex (revision 2085)
@@ 2,16 +2,37 @@
The problem of branch displacement optimisation, also known as jump encoding, is
a wellknown problem in assembler design.

In many instruction sets, amongst which the ubiquitous x86 architecture (both
its 32 and 64bit incarnations), there are instructions whose addressing
mode varies with the distance between the instruction and the object they
address. Mostly this occurs with jump instructions; for example, in the
x8664 instruction set there are eleven different forms of the unconditional
jump instruction, all with different ranges, instruction sizes and semantics
(only six are valid in 64bit mode, for example). Some examples are shown in
figure~\ref{f:x86jumps}:

\begin{figure}[h]
+a wellknown problem in assembler design. It is caused by the fact that in
+many architecture sets, the encoding (and therefore size) of some instructions
+depends on the distance to their operand (the instruction 'span'). The branch
+displacement optimisation problem consists in encoding these spandependent
+instructions in such a way that the resulting program is as small as possible.
+
+This problem is the subject of the present paper. After introducing the problem
+in more detail, we will discuss the solutions used by other compilers, present
+the algorithm used by us in the CerCo assembler, and discuss its verification,
+that is the proofs of termination and correctness.
+
+The research presented in this paper has been executed within the CerCo project
+which aims at formally verifying a C compiler with cost annotations. The
+target architecture for this project is the MCS51, whose instruction set
+contains spandependent instructions. Furthermore, its maximum addressable
+memory size is very small (65 Kbytes), which makes it important to generate
+programs that are as small as possible.
+
+With this optimisation, however, comes increased complexity and hence
+increased possibility for error. We must make sure that the jumps are encoded
+correctly, otherwise the assembled program will behave impredictably.
+
+\section{The branch displacement optimisation problem}
+
+In most modern instruction sets, the only spandependent instructions are
+branch instructions. Taking the ubiquitous x8664 instruction set as an
+example, we find that it contains are eleven different forms of the
+unconditional jump instruction, all with different ranges, instruction sizes
+and semantics (only six are valid in 64bit mode, for example). Some examples
+are shown in figure~\ref{f:x86jumps}.
+
+\begin{figure}[h]
+\begin{center}
\begin{tabular}{lll}
\hline
@@ 24,4 +45,5 @@
\hline
\end{tabular}
+\end{center}
\caption{List of x86 jump instructions}
\label{f:x86jumps}
@@ 29,8 +51,9 @@
The chosen target architecture of the CerCo project is the Intel MCS51, which
features three types of jump instructions, as shown in
figure~\ref{f:mcs51jumps}.

\begin{figure}[h]
+features three types of branch instructions (or jump instructions; the two terms
+are used interchangeably), as shown in figure~\ref{f:mcs51jumps}.
+
+\begin{figure}[h]
+\begin{center}
\begin{tabular}{llll}
\hline
@@ 39,8 +62,9 @@
\hline
SJMP (`short jump') & 2 & 2 & 128 to 127 bytes \\
AJMP (`medium jump') & 2 & 2 & one segment (11bit address) \\
+AJMP (`absolute jump') & 2 & 2 & one segment (11bit address) \\
LJMP (`long jump') & 3 & 3 & entire memory \\
\hline
\end{tabular}
+\end{center}
\caption{List of MCS51 jump instructions}
\label{f:mcs51jumps}
@@ 50,5 +74,5 @@
means that a conditional jump outside the short address range has to be
encoded using two jumps; the call instruction is only available in
medium and long forms.
+absolute and long forms.
Note that even though the MCS51 architecture is much less advanced and more
@@ 57,6 +81,11 @@
reach the entire available memory.
Generally, in assembly code, there is only one way to indicate a jump; the
algorithm used by the assembler to encode these jumps into the different
+Generally, in the code that is sent to the assembler as input, the only
+difference made between jump instructions is by semantics, not by span. This
+means that a distinction is made between the inconditional jump and the several
+kinds of conditional jump, but not between their short, absolute or long
+variants.
+
+The algorithm used by the assembler to encode these jumps into the different
machine instructions is known as the {\tt branch displacement algorithm}. The
optimisation problem consists in using as small an encoding as possible, thus
@@ 66,12 +95,12 @@
which could make finding an optimal solution very timeconsuming.
The canonical solution, as shown in~\cite{Szymanski1978} or more recently
in~\cite{Dickson2008} for the x86 instruction set, is to use a fixed point
algorithm that starts out with the shortest possible encoding (all jumps
encoded as short jumps, which is very probably not a correct solution) and then
iterates over the program to reencode those jumps whose target is outside
their range.

\subsection*{Adding medium jumps}
+The canonical solution, as shown by Szymanski~\cite{Szymanski1978} or more
+recently by Dickson~\cite{Dickson2008} for the x86 instruction set, is to use a
+fixed point algorithm that starts out with the shortest possible encoding (all
+jumps encoded as short jumps, which is very probably not a correct solution)
+and then iterates over the program to reencode those jumps whose target is
+outside their range.
+
+\subsection*{Adding absolute jumps}
In both papers mentioned above, the encoding of a jump is only dependent on the
@@ 81,11 +110,12 @@
Here, termination of the smallest fixed point algorithm is easy to prove. All
jumps start out encoded as short jumps, which means that the distance between
any jump and its target is as short as possible. If we therefore need to encode
a jump $j$ as a long jump, we can be sure that we can never reach a situation
where the span of $j$ is so small that it can be encoded as a short jump. This
reasoning holds throughout the subsequent iterations of the algorithms: short
jumps can change into long jumps, but not vice versa. Hence, the algorithm
either terminates when a fixed point is reached or when all short jumps have
been changed into long jumps.
+any jump and its target is as short as possible. If, in this situation, there
+is a jump $j$ whose span is not within the range for a short jump, we can be
+sure that we can never reach a situation where the span of $j$ is so small that
+it can be encoded as a short jump. This reasoning holds throughout the
+subsequent iterations of the algorithm: short jumps can change into long jumps,
+but not vice versa (spans only increase). Hence, the algorithm either
+terminates when a fixed point is reached or when all short jumps have been
+changed into long jumps.
Also, we can be certain that we have reached an optimal solution: a short jump
@@ 93,15 +123,16 @@
However, neither of these claims (termination nor optimality) hold when we add
the medium jump.

The reason for this is that with medium jumps, the encoding of a jump no longer
depends only on the distance between the jump and its target: in order for a
medium jump to be possible, they need to be in the same segment. It is therefore
entirely possible for two jumps with the same span to be encoded in different
ways (medium if the jump and its destination are in the same segment, long if
this is not the case).
+the absolute jump.
+
+The reason for this is that with absolute jumps, the encoding of a jump no
+longer depends only on the distance between the jump and its target: in order
+for an absolute jump to be possible, they need to be in the same segment (for
+the MCS51, this means that the first 5 bytes of their addesses have to be
+equal). It is therefore entirely possible for two jumps with the same span to
+be encoded in different ways (absolute if the jump and its target are in the
+same segment, long if this is not the case).
This invalidates the termination argument: a jump, once encoded as a long jump,
can be reencoded during a later iteration as a medium jump. Consider the
+can be reencoded during a later iteration as an absolute jump. Consider the
program shown in figure~\ref{f:term_example}. At the start of the first
iteration, both the jump to {\tt X} and the jump to $\mathtt{L}_{0}$ are
@@ 115,8 +146,9 @@
be encoded as a long jump. If we assume that the jump to {\tt X} is encoded as
a long jump as well, the size of the jump will increase and $\mathtt{L}_{0}$
will be `propelled' into the same segment as its jump. Hence, in the next
iteration, it can be encoded as a medium jump. At first glance, there is
+will be `propelled' into the same segment as its jump. Hence, in the third
+iteration, it can be encoded as an absolute jump. At first glance, there is
nothing that prevents us from making a construction where two jumps interact
in such a way as to keep switching between long and medium indefinitely.
+in such a way as to keep switching between long and absolute encodings for
+an indefinite amount of iterations.
\begin{figure}[h]
@@ 128,7 +160,17 @@
jmp L\(\sb{0}\)
\end{alltt}
\caption{Example of a program where a long jump becomes medium}
+\caption{Example of a program where a long jump becomes absolute}
\label{f:term_example}
\end{figure}
+
+In fact, this situation mirrors the explanation by
+Szymanski~\cite{Szymanski1978} of why the branch displacement optimisation
+problem is NPcomplete. In this explanation, a condition for NPcompleteness
+is the fact that programs be allowed to contain {\em pathological} jumps.
+These are jumps that can normally not be encoded as a short(er) jump, but gain
+this property when some other jumps are encoded as a long(er) jump. This is
+exactly what happens in figure~\ref{f:term_example}: by encoding the first
+jump as a long jump, another jump switches from long to absolute
+(which is shorter).
The optimality argument no longer holds either. Let us consider the program
@@ 145,6 +187,6 @@
therefore increase in size, $\mathtt{L}_{1}$ would be `propelled' across the
segment border, so that the three jumps to $\mathtt{L}_{1}$ could be encoded
as medium jumps. Depending on the relative sizes of long and medium jumps, this
solution might actually be smaller than the one reached by the smallest
+as absolute jumps. Depending on the relative sizes of long and absolute jumps,
+this solution might actually be smaller than the one reached by the smallest
fixed point algorithm.
Index: src/ASM/CPP2012policy/proof.tex
===================================================================
 src/ASM/CPP2012policy/proof.tex (revision 2084)
+++ src/ASM/CPP2012policy/proof.tex (revision 2085)
@@ 5,4 +5,6 @@
The main correctness statement is as follows:
+
+\clearpage
\begin{lstlisting}
@@ 55,6 +57,6 @@
$2n$ the number of jumps in the program. This worst case is reached if at every
iteration, we change the encoding of exactly one jump; since a jump can change
from {\tt short} to {\tt medium} and from {\tt medium} to {\tt long}, there
can be at most $2n$ changes.
+from short to absolute and from absolute to long, there can be at most $2n$
+changes.
This proof has been executed in the ``Russell'' style from~\cite{Sozeau2006}.
@@ 114,4 +116,6 @@
iteration, and $p$ the current one), jump lengths either remain equal or
increase. It is needed for proving termination.
+
+\clearpage
\begin{lstlisting}
@@ 160,9 +164,9 @@
[ short_jump $\Rightarrow$ $\neg$is_call instr $\wedge$
\fst (short_jump_cond pc_plus_jmp_length addr) = true
  medium_jump $\Rightarrow$ $\neg$is_relative_jump instr $\wedge$
 \fst (medium_jump_cond pc_plus_jmp_length addr) = true $\wedge$
+  absolute_jump $\Rightarrow$ $\neg$is_relative_jump instr $\wedge$
+ \fst (absolute_jump_cond pc_plus_jmp_length addr) = true $\wedge$
\fst (short_jump_cond pc_plus_jmp_length addr) = false
 long_jump $\Rightarrow$ \fst (short_jump_cond pc_plus_jmp_length addr) = false
 $\wedge$ \fst (medium_jump_cond pc_plus_jmp_length addr) = false
+ $\wedge$ \fst (absolute_jump_cond pc_plus_jmp_length addr) = false
].
\end{lstlisting}
@@ 200,5 +204,5 @@
We need to use two different formulations, because the fact that $added$ is 0
does not guarantee that no jumps have changed: it is possible that we have
replaced a short jump with a medium jump, which does not change the size.
+replaced a short jump with a absolute jump, which does not change the size.
Therefore {\tt policy\_pc\_equal} states that $old\_sigma_1(x) = sigma_1(x)$,
@@ 245,6 +249,6 @@
\begin{lstlisting}
definition sigma_compact: list labelled_instruction → label_map → ppc_pc_map → Prop :=
 λprogram.λlabels.λsigma.
+definition sigma_compact :=
+ λprogram:list labelled_instruction.λlabels:label_map.λsigma:ppc_pc_map.
∀n.n < program →
match bvt_lookup_opt $\ldots$ (bitvector_of_nat ? n) (\snd sigma) with