Index: src/ASM/CPP2012policy/algorithm.tex
===================================================================
--- src/ASM/CPP2012policy/algorithm.tex (revision 2095)
+++ src/ASM/CPP2012policy/algorithm.tex (revision 2096)
@@ -4,17 +4,18 @@
Given the NP-completeness of the problem, to arrive at an optimal solution
within a short space of time (using, for example, a constraint solver) will
potentially take a great amount of time.
+(using, for example, a constraint solver) will potentially take a great amount
+of time.
The SDCC compiler~\cite{SDCC2011}, which has the MCS51 among its target
instruction sets, simply encodes every branch instruction as a long jump
+The SDCC compiler~\cite{SDCC2011}, which has a backend targeting the MCS51
+instruction set, simply encodes every branch instruction as a long jump
without taking the distance into account. While certainly correct (the long
jump can reach any destination in memory) and rapid, it does result in a less
than optimal solution.
+jump can reach any destination in memory) and a very fast solution to compute,
+it results in a less than optimal solution.
The {\tt gcc} compiler suite~\cite{GCC2012}, while compiling C on the x86
architecture, uses a greatest fix point algorithm. In other words, it starts
off with all branch instructions encoded as the largest jumps available, and
then tries to reduce branch instructions as much as possible.
+On the other hand, the {\tt gcc} compiler suite~\cite{GCC2012}, while compiling
+C on the x86 architecture, uses a greatest fixed point algorithm. In other words,
+it starts with all branch instructions encoded as the largest jumps
+available, and then tries to reduce the size of branch instructions as much as
+possible.
Such an algorithm has the advantage that any intermediate result it returns
@@ -22,12 +23,12 @@
jump is always possible, and the algorithm only reduces those branch
instructions whose destination address is in range for a shorter jump.
The algorithm can thus be stopped after a determined amount of steps without
losing correctness.
+The algorithm can thus be stopped after a determined number of steps without
+sacrificing correctness.
The result, however, is not necessarily optimal, even if the algorithm is run
until it terminates naturally: the fixed point reached is the {\em greatest}
+The result, however, is not necessarily optimal. Even if the algorithm is run
+until it terminates naturally, the fixed point reached is the {\em greatest}
fixed point, not the least fixed point. Furthermore, {\tt gcc} (at least for
the x86 architecture) only uses short and long jumps. This makes the algorithm
more rapid, as shown in the previous section, but also results in a less
+more efficient, as shown in the previous section, but also results in a less
optimal solution.
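This shrinking strategy can be sketched as follows (a hedged illustration with invented sizes and ranges, not the real x86 encodings; `jumps[i]` holds the index of the instruction that jump `i` targets):

```python
# Sketch of a gcc-style greatest fixed point: every jump starts out long
# (always correct), and a jump is narrowed only once its span provably
# fits the short encoding. Sizes and ranges are invented for illustration.

SHORT_SIZE, LONG_SIZE = 2, 5   # hypothetical encoding sizes in bytes
SHORT_RANGE = 127              # hypothetical reach of a short jump

def addresses(sizes):
    """Address of each instruction under the current encoding sizes."""
    out, addr = [], 0
    for s in sizes:
        out.append(addr)
        addr += s
    return out

def shrink_step(jumps, sizes):
    """One iteration: demote long jumps whose target is now in range."""
    addrs = addresses(sizes)
    changed = False
    for i, target in enumerate(jumps):
        span = abs(addrs[target] - (addrs[i] + sizes[i]))
        if sizes[i] == LONG_SIZE and span <= SHORT_RANGE:
            sizes[i] = SHORT_SIZE
            changed = True
    return changed

def shrink(jumps):
    sizes = [LONG_SIZE] * len(jumps)
    while shrink_step(jumps, sizes):  # stopping early is still correct
        pass
    return sizes
```

Because shrinking an encoding only moves instructions closer together, every intermediate value of `sizes` remains a correct (if suboptimal) solution, which is the property the text attributes to this approach.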
@@ -39,10 +40,9 @@
encoding can only switch from short to long, but never in the other direction.
When we add absolute jumps, however, it is theoretically possible for a branch
instruction to switch from absolute to long and back, as shown in the previous
section.
+instruction to switch from absolute to long and back, as previously explained.
Proving termination then becomes difficult, because there is nothing that
precludes a branch instruction from switching back and forth between absolute
and long indefinitely.
+precludes a branch instruction from oscillating back and forth between absolute
+and long jumps indefinitely.
In order to keep the algorithm in the same complexity class and more easily
@@ -54,5 +54,5 @@
from absolute to long.
There is one complicating factor: suppose that a branch instruction is encoded
+There is one complicating factor. Suppose that a branch instruction is encoded
in step $n$ as an absolute jump, but in step $n+1$ it is determined that
(because of changes elsewhere) it can now be encoded as a short jump. Due to
@@ -61,9 +61,9 @@
$n+1$ as well.
This is not necessarily correct, however: a branch instruction that can be
encoded as a short jump cannot always also be encoded as an absolute jump
(a short jump can bridge segments, whereas an absolute jump cannot). Therefore,
in this situation we decide to encode the branch instruction as a long jump,
which is always correct.
+This is not necessarily correct. A branch instruction that can be
+encoded as a short jump cannot always also be encoded as an absolute jump, as a
+short jump can bridge segments, whereas an absolute jump cannot. Therefore,
+in this situation we have decided to encode the branch instruction as a long
+jump, which is always correct.
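A minimal sketch of this decision rule, assuming a total order short < absolute < long and hypothetical predicate names (an illustration, not the Matita development):

```python
# Sketch of the encoding-choice rule: encodings may only grow, and a jump
# that was absolute but now fits a short encoding is forced to long,
# because "fits a short jump" does not imply "fits an absolute jump"
# (a short jump may bridge segments, an absolute jump may not).

ORDER = {"short": 0, "absolute": 1, "long": 2}

def choose_encoding(old, fits_short, fits_absolute):
    new = "short" if fits_short else "absolute" if fits_absolute else "long"
    if old == "absolute" and new == "short":
        return "long"                                   # the forced long jump
    return new if ORDER[new] >= ORDER[old] else old     # encodings never shrink
```

The monotonicity enforced here (at most two growth steps per jump) is what makes the termination argument tractable.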
The resulting algorithm, while not optimal, is at least as good as the ones
@@ -75,13 +75,13 @@
The branch displacement algorithm forms part of the translation from
pseudocode to assembler. More specifically, it is used by the function that
+pseudocode to assembler. More specifically, it is used by the function that
translates pseudoaddresses (natural numbers indicating the position of the
instruction in the program) to actual addresses in memory.
The original intention was to have two different functions, one function
+Our original intention was to have two different functions, one function
$\mathtt{policy}: \mathbb{N} \rightarrow \{\mathtt{short\_jump},
\mathtt{absolute\_jump}, \mathtt{long\_jump}\}$ to associate jumps to their
intended encoding, and a function $\sigma: \mathbb{N} \rightarrow
\mathtt{Word}$ to associate pseudoaddresses to actual addresses. $\sigma$
+\mathtt{Word}$ to associate pseudoaddresses to machine addresses. $\sigma$
would use $\mathtt{policy}$ to determine the size of jump instructions.
@@ -91,5 +91,5 @@
From the algorithmic point of view, in order to create the $\mathtt{policy}$
function, we must necessarily have a translation from pseudoaddresses
to actual addresses (i.e. a $\sigma$ function): in order to judge the distance
+to machine addresses (i.e. a $\sigma$ function): in order to judge the distance
between a jump and its destination, we must know their memory locations.
Conversely, in order to create the $\sigma$ function, we need to have the
@@ -104,5 +104,5 @@
algorithms. We now have a function
$\sigma: \mathbb{N} \rightarrow \mathtt{Word} \times \mathtt{bool}$ which
associates a pseudoaddress to an actual address. The boolean denotes a forced
+associates a pseudoaddress to a machine address. The boolean denotes a forced
long jump; as noted in the previous section, if during the fixed point
computation an absolute jump changes to be potentially re-encoded as a short
@@ -143,5 +143,5 @@
\end{figure}
The algorithm, shown in figure~\ref{f:jump_expansion_step}, works by folding the
+The algorithm, shown in Figure~\ref{f:jump_expansion_step}, works by folding the
function {\sc f} over the entire program, thus gradually constructing $sigma$.
This constitutes one step in the fixed point calculation; successive steps
@@ -162,5 +162,5 @@
The first two are parameters that remain the same through one iteration, the
last three are standard parameters for a fold function (including $ppc$,
+final three are standard parameters for a fold function (including $ppc$,
which is simply the number of instructions of the program already processed).
@@ -169,6 +169,6 @@
$\sigma: \mathbb{N} \rightarrow \mathbb{N} \times \{\mathtt{short\_jump},
\mathtt{absolute\_jump},\mathtt{long\_jump}\}$; a function that associates a
pseudoaddress with an memory address and a jump length. We do this to be able
to more easily compare the jump lengths between iterations. In the algorithm,
+pseudoaddress with a memory address and a jump length. We do this to be able
+to ease the comparison of jump lengths between iterations. In the algorithm,
we use the notation $sigma_1(x)$ to denote the memory address corresponding to
$x$, and $sigma_2(x)$ to denote the jump length corresponding to $x$.
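To fix intuition for this notation, a Python stand-in may help (the formalisation uses a Matita trie, not a dictionary, and the values below are invented):

```python
# sigma maps a pseudo-address to a pair of a memory address and a jump
# length; sigma_1 and sigma_2 are the two projections used in the text.

sigma = {0: (0, "short_jump"), 1: (2, "long_jump")}  # toy example

def sigma_1(s, ppc):
    return s[ppc][0]   # memory address of the instruction at ppc

def sigma_2(s, ppc):
    return s[ppc][1]   # jump length chosen for the instruction at ppc

def jump_lengths_equal(old, new):
    """Fixed point test between iterations: only the second projection
    (the jump lengths) needs to be compared."""
    return all(sigma_2(old, p) == sigma_2(new, p) for p in old)
```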
@@ -182,5 +182,5 @@
We cannot use $old\_sigma$ without change: it might be the case that we have
already increased the size some branch instructions before, making the program
+already increased the size of some branch instructions before, making the program
longer and moving every instruction forward. We must compensate for this by
adding the size increase of the program to the label's memory address according
@@ -189,5 +189,5 @@
Note also that we add the pc to $sigma$ at location $ppc+1$, whereas we add the
jump length at location $ppc$. We do this so that $sigma(ppc)$ will always
return a couple with the start address of the instruction at $ppc$ and the
+return a pair with the start address of the instruction at $ppc$ and the
length of its branch instruction (if any); the end address of the program can
be found at $sigma(n+1)$, where $n$ is the number of instructions in the
Index: src/ASM/CPP2012policy/conclusion.tex
===================================================================
--- src/ASM/CPP2012policy/conclusion.tex (revision 2095)
+++ src/ASM/CPP2012policy/conclusion.tex (revision 2096)
@@ -1,5 +1,5 @@
\section{Conclusion}
In the previous sections, we have discussed the branch displacement optimisation
+In the previous sections we have discussed the branch displacement optimisation
problem, presented an optimised solution, and discussed the proof of
termination and correctness for this algorithm, as formalised in Matita.
@@ -8,10 +8,11 @@
optimal solution would need techniques like constraint solvers. While outside
the scope of the present research, it would be interesting to see if enough
heuristics could be find to make such a solution practical for implementing
in an existent compiler; this would be especially useful for embedded systems,
where it is important to have a small solution as possible.
+heuristics could be found to make such a solution practical for implementing
+in an existing compiler; this would be especially useful for embedded systems,
+where it is important to have as small a solution as possible.
This algorithm is part of a greater whole, the CerCo project, which aims at
the complete formalisation and verification of a compiler. More information
+This algorithm is part of a greater whole, the CerCo project, which aims to
+completely formalise and verify a concrete cost-preserving compiler for a large
+subset of the C programming language. More information
on the formalisation of the assembler, of which the present work is a part,
can be found in a companion publication~\cite{DC2012}.
@@ -22,6 +23,6 @@
displacement optimisation algorithm.
The CompCert project is another project aimed at formally verifying a compiler.
Their backend~\cite{Leroy2009} generates assembly code for (amongst others) the
+CompCert is another verified compiler project.
+Their backend~\cite{Leroy2009} generates assembly code for (amongst others) subsets of the
PowerPC and x86 (32-bit) architectures. At the assembly code stage, there is
no distinction between the span-dependent jump instructions, so a branch
@@ -30,12 +31,13 @@
An offshoot of the CompCert project is the CompCertTSO project, who add thread
concurrency and synchronisation to the CompCert compiler~\cite{TSO2011}. This
compiler also generate assembly code and therefore does not include a branch
+compiler also generates assembly code and therefore does not include a branch
displacement algorithm.
There is also Piton~\cite{Moore1996}, which not only includes the
+Finally, there is also the Piton stack~\cite{Moore1996}, which not only includes the
formal verification of a compiler, but also of the machine architecture
targeted by that compiler. However, this architecture does not have different
+targeted by that compiler, a bespoke microprocessor called the FM9001.
+However, this architecture does not have different
jump sizes (branching is simulated by assigning values to the program counter),
so the branch displacement problem does not occur.
+so the branch displacement problem is irrelevant.
\subsection{Formal development}
Index: src/ASM/CPP2012policy/problem.tex
===================================================================
--- src/ASM/CPP2012policy/problem.tex (revision 2095)
+++ src/ASM/CPP2012policy/problem.tex (revision 2096)
@@ -5,5 +5,5 @@
fact that in many architecture sets, the encoding (and therefore size) of some
instructions depends on the distance to their operand (the instruction 'span').
The branch displacement optimisation problem consists in encoding these
+The branch displacement optimisation problem consists of encoding these
span-dependent instructions in such a way that the resulting program is as
small as possible.
@@ -11,7 +11,7 @@
This problem is the subject of the present paper. After introducing the problem
in more detail, we will discuss the solutions used by other compilers, present
the algorithm used by us in the CerCo assembler, and discuss its verification,
that is the proofs of termination and correctness using the Matita theorem
prover~\cite{Asperti2007}.
+the algorithm we use in the CerCo assembler, and discuss its verification,
+that is the proofs of termination and correctness using the Matita proof
+assistant~\cite{Asperti2007}.
The research presented in this paper has been executed within the CerCo project
@@ -19,5 +19,5 @@
target architecture for this project is the MCS51, whose instruction set
contains span-dependent instructions. Furthermore, its maximum addressable
memory size is very small (65 Kbytes), which makes it important to generate
+memory size is very small (64 KB), which makes it important to generate
programs that are as small as possible.
@@ -31,8 +31,8 @@
In most modern instruction sets that have them, the only span-dependent
instructions are branch instructions. Taking the ubiquitous x86-64 instruction
set as an example, we find that it contains are eleven different forms of the
+set as an example, we find that it contains eleven different forms of the
unconditional branch instruction, all with different ranges, instruction sizes
and semantics (only six are valid in 64-bit mode, for example). Some examples
are shown in figure~\ref{f:x86jumps}.
+are shown in Figure~\ref{f:x86jumps}.
\begin{figure}[h]
@@ -55,5 +55,5 @@
The chosen target architecture of the CerCo project is the Intel MCS51, which
features three types of branch instructions (or jump instructions; the two terms
are used interchangeably), as shown in figure~\ref{f:mcs51jumps}.
+are used interchangeably), as shown in Figure~\ref{f:mcs51jumps}.
\begin{figure}[h]
@@ -78,14 +78,14 @@
encoded using three branch instructions (for instructions whose logical
negation is available, it can be done with two branch instructions, but for
some instructions, this opposite is not available); the call instruction is
+some instructions this is not available); the call instruction is
only available in absolute and long forms.
Note that even though the MCS51 architecture is much less advanced and more
simple than the x86-64 architecture, the basic types of branch instruction
+Note that even though the MCS51 architecture is much less advanced and simpler
+than the x86-64 architecture, the basic types of branch instruction
remain the same: a short jump with a limited range, an intra-segment jump and a
jump that can reach the entire available memory.
Generally, in the code that is sent to the assembler as input, the only
difference made between branch instructions is by semantics, not by span. This
+Generally, in code fed to the assembler as input, the only
+difference between branch instructions is semantics, not span. This
means that a distinction is made between an unconditional branch and the
several kinds of conditional branch, but not between their short, absolute or
@@ -94,5 +94,5 @@
The algorithm used by the assembler to encode these branch instructions into
the different machine instructions is known as the {\em branch displacement
algorithm}. The optimisation problem consists in using as small an encoding as
+algorithm}. The optimisation problem consists of finding as small an encoding as
possible, thus minimising program length and execution time.
@@ -102,6 +102,6 @@
The canonical solution, as shown by Szymanski~\cite{Szymanski1978} or more
recently by Dickson~\cite{Dickson2008} for the x86 instruction set, is to use a
fixed point algorithm that starts out with the shortest possible encoding (all
branch instruction encoded as short jumps, which is very probably not a correct
+fixed point algorithm that starts with the shortest possible encoding (all
+branch instructions encoded as short jumps, which is likely not a correct
solution) and then iterates over the program to re-encode those branch
instructions whose target is outside their range.
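This least fixed point computation can be sketched as follows (a hedged illustration with invented sizes and ranges; `jumps[i]` is the target index of jump `i`):

```python
# Start with every jump short (an optimistic, probably incorrect guess)
# and widen any jump whose target lies outside its range. Since widening
# only increases spans, no jump ever reverts, so the loop terminates.

SIZE  = {"short": 2, "long": 5}       # invented encoding sizes
REACH = {"short": 127, "long": 2**16} # invented reach per encoding

def least_fixed_point(jumps):
    enc = ["short"] * len(jumps)
    changed = True
    while changed:
        changed = False
        addrs, addr = [], 0
        for e in enc:                 # recompute layout under current encodings
            addrs.append(addr)
            addr += SIZE[e]
        for i, target in enumerate(jumps):
            span = abs(addrs[target] - (addrs[i] + SIZE[enc[i]]))
            if enc[i] == "short" and span > REACH["short"]:
                enc[i] = "long"       # out of range: widen, never narrow
                changed = True
    return enc
```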
@@ -120,6 +120,6 @@
situation where the span of $j$ is so small that it can be encoded as a short
jump. This argument continues to hold throughout the subsequent iterations of
the algorithm: short jumps can change into long jumps, but not vice versa
(spans only increase). Hence, the algorithm either terminates when a fixed
+the algorithm: short jumps can change into long jumps, but not \emph{vice versa},
+as spans only increase. Hence, the algorithm either terminates early when a fixed
point is reached or when all short jumps have been changed into long jumps.
@@ -128,7 +128,5 @@
However, neither of these claims (termination nor optimality) hold when we add
the absolute jump.

The reason for this is that with absolute jumps, the encoding of a branch
+the absolute jump, as with absolute jumps, the encoding of a branch
instruction no longer depends only on the distance between the branch
instruction and its target: in order for an absolute jump to be possible, they
@@ -139,7 +137,19 @@
long if this is not the case).
This invalidates the termination argument: a branch instruction, once encoded
+\begin{figure}[ht]
+\begin{alltt}
+ jmp X
+ \vdots
+L\(\sb{0}\):
+ \vdots
+ jmp L\(\sb{0}\)
+\end{alltt}
+\caption{Example of a program where a long jump becomes absolute}
+\label{f:term_example}
+\end{figure}
+
+This invalidates our earlier termination argument: a branch instruction, once encoded
as a long jump, can be re-encoded during a later iteration as an absolute jump.
Consider the program shown in figure~\ref{f:term_example}. At the start of the
+Consider the program shown in Figure~\ref{f:term_example}. At the start of the
first iteration, both the branch to {\tt X} and the branch to $\mathtt{L}_{0}$
are encoded as small jumps. Let us assume that in this case, the placement of
@@ -156,32 +166,19 @@
Hence, in the third iteration, the branch to $\mathtt{L}_{0}$ can be encoded as
an absolute jump. At first glance, there is nothing that prevents us from
making a construction where two branch instructions interact in such a way as
to keep switching between long and absolute encodings for an indefinite amount
of iterations.

\begin{figure}[h]
\begin{alltt}
 jmp X
 \vdots
L\(\sb{0}\):
 \vdots
 jmp L\(\sb{0}\)
\end{alltt}
\caption{Example of a program where a long jump becomes absolute}
\label{f:term_example}
\end{figure}

In fact, this situation mirrors the explanation by
Szymanski~\cite{Szymanski1978} of why the branch displacement optimisation
problem is NP-complete. In this explanation, a condition for NP-completeness
is the fact that programs be allowed to contain {\em pathological} jumps.
These are branch instructions that can normally not be encoded as a short(er)
jump, but gain this property when some other branch instructions are encoded as
a long(er) jump. This is exactly what happens in figure~\ref{f:term_example}:
by encoding the first branch instruction as a long jump, another branch
instructions switches from long to absolute (which is shorter).

The optimality argument no longer holds either. Let us consider the program
shown in figure~\ref{f:opt_example}. Suppose that the distance between
+constructing a configuration where two branch instructions interact in such a
+way as to oscillate indefinitely between long and absolute encodings.
+
+This situation mirrors the explanation by Szymanski~\cite{Szymanski1978} of why
+the branch displacement optimisation problem is NP-complete. In this explanation,
+a condition for NP-completeness is that programs are allowed to contain
+{\em pathological} jumps. These are branch instructions that cannot normally be
+encoded as a short(er) jump, but gain this property when some other branch
+instructions are encoded as a long(er) jump. This is exactly what happens in
+Figure~\ref{f:term_example}. By encoding the first branch instruction as a long
+jump, another branch instruction switches from long to absolute (which is
+shorter).
+
+In addition, our previous optimality argument no longer holds. Consider the program
+shown in Figure~\ref{f:opt_example}. Suppose that the distance between
$\mathtt{L}_{0}$ and $\mathtt{L}_{1}$ is such that if {\tt jmp X} is encoded
as a short jump, there is a segment border just after $\mathtt{L}_{1}$. Let
@@ -189,4 +186,22 @@
segment, but far enough away from $\mathtt{L}_{1}$ that they cannot be encoded
as short jumps.
+
+\begin{figure}[ht]
+\begin{alltt}
+L\(\sb{0}\): jmp X
+X:
+ \vdots
+L\(\sb{1}\):
+ \vdots
+ jmp L\(\sb{1}\)
+ \vdots
+ jmp L\(\sb{1}\)
+ \vdots
+ jmp L\(\sb{1}\)
+ \vdots
+\end{alltt}
+\caption{Example of a program where the fixed-point algorithm is not optimal}
+\label{f:opt_example}
+\end{figure}
Then, if {\tt jmp X} were to be encoded as a short jump, which is clearly
@@ -198,20 +213,2 @@
this solution might actually be smaller than the one reached by the smallest
fixed point algorithm.

\begin{figure}[h]
\begin{alltt}
L\(\sb{0}\): jmp X
X:
 \vdots
L\(\sb{1}\):
 \vdots
 jmp L\(\sb{1}\)
 \vdots
 jmp L\(\sb{1}\)
 \vdots
 jmp L\(\sb{1}\)
 \vdots
\end{alltt}
\caption{Example of a program where the fixedpoint algorithm is not optimal}
\label{f:opt_example}
\end{figure}
Index: src/ASM/CPP2012policy/proof.tex
===================================================================
--- src/ASM/CPP2012policy/proof.tex (revision 2095)
+++ src/ASM/CPP2012policy/proof.tex (revision 2096)
@@ -1,41 +1,36 @@
\section{The proof}
In this section, we will present the correctness proof of the algorithm in more
detail.

The main correctness statement is as follows:
+In this section, we present the correctness proof for the algorithm in more
+detail. The main correctness statement is as follows (slightly simplified here):
\clearpage
\begin{lstlisting}
definition sigma_policy_specification :=:
$\lambda$program: pseudo_assembly_program.
$\lambda$sigma: Word → Word.
$\lambda$policy: Word → bool.
 sigma (zero $\ldots$) = zero $\ldots$ $\wedge$
 $\forall$ppc: Word.$\forall$ppc_ok.
 let instr_list := \snd program in
 let pc ≝ sigma ppc in
 let labels := \fst (create_label_cost_map (\snd program)) in
 let lookup_labels :=
 $\lambda$x.bitvector_of_nat ? (lookup_def ?? labels x 0) in
 let instruction :=
 \fst (fetch_pseudo_instruction (\snd program) ppc ppc_ok) in
 let next_pc := \fst (sigma (add ? ppc (bitvector_of_nat ? 1))) in
 (nat_of_bitvector $\ldots$ ppc ≤ instr_list →
 next_pc = add ? pc (bitvector_of_nat $\ldots$
 (instruction_size lookup_labels sigma policy ppc instruction)))
 $\wedge$
 ((nat_of_bitvector $\ldots$ ppc < instr_list →
 nat_of_bitvector $\ldots$ pc < nat_of_bitvector $\ldots$ next_pc)
 $\vee$ (nat_of_bitvector $\ldots$ ppc = instr_list → next_pc = (zero $\ldots$))).
+definition sigma_policy_specification :=
+ $\lambda$program: pseudo_assembly_program.
+ $\lambda$sigma: Word $\rightarrow$ Word.
+ $\lambda$policy: Word $\rightarrow$ bool.
+ sigma (zero $\ldots$) = zero $\ldots$ $\wedge$
+ $\forall$ppc: Word.$\forall$ppc_ok.
+ let $\langle$preamble, instr_list$\rangle$ := program in
+ let pc := sigma ppc in
+ let instruction :=
+ \fst (fetch_pseudo_instruction instr_list ppc ppc_ok) in
+ let next_pc := \fst (sigma (add ? ppc (bitvector_of_nat ? 1))) in
+ (nat_of_bitvector $\ldots$ ppc $\leq$ instr_list $\rightarrow$
+ next_pc = add ? pc (bitvector_of_nat $\ldots$
+ (instruction_size $\ldots$ sigma policy ppc instruction)))
+ $\wedge$
+ ((nat_of_bitvector $\ldots$ ppc < instr_list $\rightarrow$
+ nat_of_bitvector $\ldots$ pc < nat_of_bitvector $\ldots$ next_pc)
+ $\vee$ (nat_of_bitvector $\ldots$ ppc = instr_list $\rightarrow$ next_pc = (zero $\ldots$))).
\end{lstlisting}
Informally, this means that when fetching a pseudoinstruction at $ppc$, the
translation by $\sigma$ of $ppc+1$ is the same as $\sigma(ppc)$ plus the size
of the instruction at $ppc$; i.e. an instruction is placed immediately
+of the instruction at $ppc$. That is, an instruction is placed immediately
after the previous one, and there are no overlaps.
The other condition enforced is the fact that instructions are stocked in
+Instructions are also stored in
order: the memory address of the instruction at $ppc$ should be smaller than
the memory address of the instruction at $ppc+1$. There is one exception to
@@ -44,6 +39,5 @@
to the amount of memory).
And finally, we enforce that the program starts at address 0, i.e.
$\sigma(0) = 0$.
+Finally, we enforce that the program starts at address 0, i.e. $\sigma(0) = 0$.
Since our computation is a least fixed point computation, we must prove
@@ -51,5 +45,5 @@
a number of steps without reaching a fixed point, the solution is not
guaranteed to be correct. More specifically, branch instructions might be
encoded who do not coincide with the span between their location and their
+encoded which do not coincide with the span between their location and their
destination.
@@ -62,5 +56,5 @@
long, there can be at most $2n$ changes.
This proof has been executed in the ``Russell'' style from~\cite{Sozeau2006}.
+The proof has been carried out using the ``Russell'' style from~\cite{Sozeau2006}.
We have proven some invariants of the {\sc f} function from the previous
section; these invariants are then used to prove properties that hold for every
@@ -74,26 +68,26 @@
Note that during the fixed point computation, the $\sigma$ function is
implemented as a trie for ease access; computing $\sigma(x)$ is done by looking
+implemented as a trie for ease of access; computing $\sigma(x)$ is achieved by looking
up the value of $x$ in the trie. Actually, during the fold, the value we
pass along is a couple $\mathbb{N} \times \mathtt{ppc_pc_map}$. The natural
number is the number of bytes added to the program so far with respect to
the previous iteration, and {\tt ppc\_pc\_map} is a couple of the current
size of the program and our $\sigma$ function.
+pass along is a pair $\mathbb{N} \times \mathtt{ppc_pc_map}$. The first component
+is the number of bytes added to the program so far with respect to
+the previous iteration, and the second component, {\tt ppc\_pc\_map}, is a pair
+consisting of the current size of the program and our $\sigma$ function.
\begin{lstlisting}
definition out_of_program_none :=
$\lambda$prefix:list labelled_instruction.$\lambda$sigma:ppc_pc_map.
 $\forall$i.i < 2^16 → (i > prefix $\leftrightarrow$
+ $\forall$i.i < 2^16 $\rightarrow$ (i > prefix $\leftrightarrow$
bvt_lookup_opt $\ldots$ (bitvector_of_nat ? i) (\snd sigma) = None ?).
\end{lstlisting}
This invariant states that every pseudoaddress not yet treated cannot be
found in the lookup trie.

\begin{lstlisting}
definition not_jump_default ≝
+This invariant states that any pseudoaddress not yet examined is not
+present in the lookup trie.
+
+\begin{lstlisting}
+definition not_jump_default :=
$\lambda$prefix:list labelled_instruction.$\lambda$sigma:ppc_pc_map.
 $\forall$i.i < prefix →
 ¬is_jump (\snd (nth i ? prefix $\langle$None ?, Comment []$\rangle$)) →
+ $\forall$i.i < prefix $\rightarrow$
+ ¬is_jump (\snd (nth i ? prefix $\langle$None ?, Comment []$\rangle$)) $\rightarrow$
\snd (bvt_lookup $\ldots$ (bitvector_of_nat ? i) (\snd sigma)
$\langle$0,short_jump$\rangle$) = short_jump.
@@ -106,6 +100,6 @@
\begin{lstlisting}
definition jump_increase :=
 λprefix:list labelled_instruction.λop:ppc_pc_map.λp:ppc_pc_map.
 ∀i.i ≤ prefix →
+ $\lambda$prefix:list labelled_instruction.$\lambda$op:ppc_pc_map.$\lambda$p:ppc_pc_map.
+ $\forall$i.i $\leq$ prefix $\rightarrow$
let $\langle$opc,oj$\rangle$ :=
bvt_lookup $\ldots$ (bitvector_of_nat ? i) (\snd op) $\langle$0,short_jump$\rangle$ in
@@ -115,5 +109,5 @@
\end{lstlisting}
This invariant states that between iterations ($op$ being the previous
+This invariant states that between iterations (with $op$ being the previous
iteration, and $p$ the current one), jump lengths either remain equal or
increase. It is needed for proving termination.
@@ -123,12 +117,12 @@
\begin{lstlisting}
definition sigma_compact_unsafe :=
 λprogram:list labelled_instruction.λlabels:label_map.λsigma:ppc_pc_map.
 ∀n.n < program →
+ $\lambda$program:list labelled_instruction.$\lambda$labels:label_map.$\lambda$sigma:ppc_pc_map.
+ $\forall$n.n < program $\rightarrow$
match bvt_lookup_opt $\ldots$ (bitvector_of_nat ? n) (\snd sigma) with
 [ None ⇒ False
  Some x ⇒ let $\langle$pc,j$\rangle$ := x in
+ [ None $\Rightarrow$ False
+  Some x $\Rightarrow$ let $\langle$pc,j$\rangle$ := x in
match bvt_lookup_opt $\ldots$ (bitvector_of_nat ? (S n)) (\snd sigma) with
 [ None ⇒ False
  Some x1 ⇒ let $\langle$pc1,j1$\rangle$ ≝ x1 in
+ [ None $\Rightarrow$ False
+  Some x1 $\Rightarrow$ let $\langle$pc1,j1$\rangle$ := x1 in
pc1 = pc + instruction_size_jmplen j
(\snd (nth n ? program $\langle$None ?, Comment []$\rangle$)))
@@ -139,5 +133,5 @@
This is a temporary formulation of the main property
({\tt sigma\_policy\_specification}); its main difference
with the final version is that it uses {\tt instruction\_size\_jmplen} to
+from the final version is that it uses {\tt instruction\_size\_jmplen} to
compute the instruction size. This function uses $j$ to compute the span
of branch instructions (i.e. it uses the $\sigma$ function under construction),
@@ -149,7 +143,7 @@
\begin{lstlisting}
definition sigma_safe :=
 λprefix:list labelled_instruction.λlabels:label_map.λadded:$\mathbb{N}$.
 λold_sigma:ppc_pc_map.λsigma:ppc_pc_map.
 ∀i.i < prefix → let $\langle$pc,j$\rangle$ :=
+ $\lambda$prefix:list labelled_instruction.$\lambda$labels:label_map.$\lambda$added:$\mathbb{N}$.
+ $\lambda$old_sigma:ppc_pc_map.$\lambda$sigma:ppc_pc_map.
+ $\forall$i.i < prefix $\rightarrow$ let $\langle$pc,j$\rangle$ :=
bvt_lookup $\ldots$ (bitvector_of_nat ? i) (\snd sigma) $\langle$0,short_jump$\rangle$ in
let pc_plus_jmp_length := bitvector_of_nat ? (\fst (bvt_lookup $\ldots$
@@ -178,5 +172,5 @@
Note that we compute the distance using the memory address of the instruction
plus its size: this is due to the behaviour of the MCS51 architecture, which
+plus its size: this follows the behaviour of the MCS51 microprocessor, which
increases the program counter directly after fetching, and only then executes
the branch instruction (by changing the program counter again).
@@ -194,18 +188,18 @@
\begin{lstlisting}
(added = 0 → policy_pc_equal prefix old_sigma policy))
(policy_jump_equal prefix old_sigma policy → added = 0))
+(added = 0 $\rightarrow$ policy_pc_equal prefix old_sigma policy))
+(policy_jump_equal prefix old_sigma policy $\rightarrow$ added = 0))
\end{lstlisting}
And finally, two properties that deal with what happens when the previous
iteration does not change with respect to the current one. $added$ is the
+iteration does not change with respect to the current one. $added$ is a
variable that keeps track of the number of bytes we have added to the program
size by changing the encoding of branch instructions; if this is 0, the program
+size by changing the encoding of branch instructions. If $added$ is 0, the program
has not changed and vice versa.
We need to use two different formulations, because the fact that $added$ is 0
does not guarantee that no branch instructions have changed: it is possible
that we have replaced a short jump with a absolute jump, which does not change
the size of the branch instruction.
+does not guarantee that no branch instructions have changed. For instance,
+it is possible that we have replaced a short jump with an absolute jump, which
+does not change the size of the branch instruction.
Therefore {\tt policy\_pc\_equal} states that $old\_sigma_1(x) = sigma_1(x)$,
@@ -220,8 +214,8 @@
difference between these invariants and the fold invariants is that after the
completion of the fold, we check whether the program size does not supersede
-65 Kbytes (the maximum memory size the MCS51 can address).
+64 KB, the maximum memory size the MCS51 can address.
The type of an iteration therefore becomes an option type: {\tt None} in case
-the program becomes larger than 65 KBytes, or $\mathtt{Some}\ \sigma$
+the program becomes larger than 64 KB, or $\mathtt{Some}\ \sigma$
otherwise. We also no longer use a natural number to pass along the number of
bytes added to the program size, but a boolean that indicates whether we have
@@ -232,7 +226,7 @@
\begin{lstlisting}
definition nec_plus_ultra :=
- λprogram:list labelled_instruction.λp:ppc_pc_map.
- ¬(∀i.i < |program| →
- is_jump (\snd (nth i ? program $\langle$None ?, Comment []$\rangle$)) →
+ $\lambda$program:list labelled_instruction.$\lambda$p:ppc_pc_map.
+ ¬($\forall$i.i < |program| $\rightarrow$
+ is_jump (\snd (nth i ? program $\langle$None ?, Comment []$\rangle$)) $\rightarrow$
\snd (bvt_lookup $\ldots$ (bitvector_of_nat 16 i) (\snd p) $\langle$0,short_jump$\rangle$) =
long_jump).
@@ -241,5 +235,5 @@
This invariant is applied to $old\_sigma$; if our program becomes too large
for memory, the previous iteration cannot have every branch instruction encoded
-as a long jump. This is needed later on in the proof of termination.
+as a long jump. This is needed later in the proof of termination.
If the iteration returns $\mathtt{Some}\ \sigma$, the invariants
@@ -253,17 +247,17 @@
\begin{lstlisting}
definition sigma_compact :=
- λprogram:list labelled_instruction.λlabels:label_map.λsigma:ppc_pc_map.
- ∀n.n < |program| →
+ $\lambda$program:list labelled_instruction.$\lambda$labels:label_map.$\lambda$sigma:ppc_pc_map.
+ $\forall$n.n < |program| $\rightarrow$
match bvt_lookup_opt $\ldots$ (bitvector_of_nat ? n) (\snd sigma) with
- [ None ⇒ False
- | Some x ⇒ let $\langle$pc,j$\rangle$ := x in
+ [ None $\Rightarrow$ False
+ | Some x $\Rightarrow$ let $\langle$pc,j$\rangle$ := x in
match bvt_lookup_opt $\ldots$ (bitvector_of_nat ? (S n)) (\snd sigma) with
- [ None ⇒ False
- | Some x1 ⇒ let $\langle$pc1,j1$\rangle$ := x1 in
+ [ None $\Rightarrow$ False
+ | Some x1 $\Rightarrow$ let $\langle$pc1,j1$\rangle$ := x1 in
pc1 = pc + instruction_size
- (λid.bitvector_of_nat ? (lookup_def ?? labels id 0))
- (λppc.bitvector_of_nat ?
+ ($\lambda$id.bitvector_of_nat ? (lookup_def ?? labels id 0))
+ ($\lambda$ppc.bitvector_of_nat ?
(\fst (bvt_lookup $\ldots$ ppc (\snd sigma) $\langle$0,short_jump$\rangle$)))
- (λppc.jmpeqb long_jump (\snd (bvt_lookup $\ldots$ ppc
+ ($\lambda$ppc.jmpeqb long_jump (\snd (bvt_lookup $\ldots$ ppc
(\snd sigma) $\langle$0,short_jump$\rangle$))) (bitvector_of_nat ? n)
(\snd (nth n ? program $\langle$None ?, Comment []$\rangle$))
@@ -272,5 +266,5 @@
\end{lstlisting}
-This is the same invariant as ${\tt sigma\_compact\_unsafe}$, but instead it
+This is almost the same invariant as ${\tt sigma\_compact\_unsafe}$, but differs in that it
computes the sizes of branch instructions by looking at the distance between
position and destination using $\sigma$.
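A simplified model of this compactness property may help (an illustrative Python sketch, not the Matita definition; we pretend every pseudo-instruction is a branch, so that its size is fully determined by the jump length the policy assigns it):

```python
# Rough model of the compactness invariant (ours, not the Matita
# definition).  Every pseudo-instruction is treated as a branch, so
# its size depends only on the jump length chosen for it.

JUMP_SIZE = {"short_jump": 2, "absolute_jump": 2, "long_jump": 3}

def sigma_compact(sigma):
    """sigma: list of (pc, jump_length) pairs, one per instruction.
    Each address must equal the previous address plus the previous
    instruction's size; gaps or overlaps violate the invariant."""
    for (pc, j), (pc1, _) in zip(sigma, sigma[1:]):
        if pc1 != pc + JUMP_SIZE[j]:
            return False
    return True

assert sigma_compact([(0, "short_jump"), (2, "long_jump"), (5, "short_jump")])
```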
@@ -292,11 +286,11 @@
The proof of {\tt nec\_plus\_ultra} works as follows: if we return {\tt None},
-then the program size must be greater than 65 Kbytes. However, since the
+then the program size must be greater than 64 KB. However, since the
previous iteration did not return {\tt None} (because otherwise we would
terminate immediately), the program size in the previous iteration must have
-been smaller than 65 Kbytes.
+been smaller than 64 KB.
Suppose that all the branch instructions in the previous iteration are
-encodes as long jumps. This means that all branch instructions in this
+encoded as long jumps. This means that all branch instructions in this
iteration are long jumps as well, and therefore that both iterations are equal
in the encoding of their branch instructions. Per the invariant, this means that
@@ -304,5 +298,5 @@
But if all addresses are equal, the program sizes must be equal too, which
means that the program size in the current iteration must be smaller than
-65 Kbytes. This contradicts the earlier hypothesis, hence not all branch
+64 KB. This contradicts the earlier hypothesis, hence not all branch
instructions in the previous iteration are encoded as long jumps.
@@ -316,9 +310,9 @@
These are the invariants that hold after $2n$ iterations, where $n$ is the
program size (we use the program size for convenience; we could also use the
-number of branch instructions, but this is more complicated). Here, we only
+number of branch instructions, but this is more complex). Here, we only
need {\tt out\_of\_program\_none}, {\tt sigma\_compact} and the fact that
$\sigma(0) = 0$.
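The bound of $2n$ iterations comes from the fact that each branch encoding can only grow, and can grow at most twice (short to absolute to long). The following toy Python model (our own simplification of the algorithm described in the text, not the Matita development; AJMP reachability is modelled as "same 2 KB page as the following instruction", matching the MCS51's 11-bit absolute jumps) iterates to a fixed point within that bound:

```python
# Toy fixed-point iteration (ours).  Every pseudo-instruction is a
# branch; encodings only ever grow, and each can grow at most twice,
# so a fixed point is reached within 2n passes for n branches.

JUMP_SIZE = {"short_jump": 2, "absolute_jump": 2, "long_jump": 3}
RANK = {"short_jump": 0, "absolute_jump": 1, "long_jump": 2}

def needed(pc_after, dest):
    """Smallest encoding reaching dest from the next instruction."""
    if -128 <= dest - pc_after <= 127:
        return "short_jump"
    if dest >> 11 == pc_after >> 11:   # same 2 KB page
        return "absolute_jump"
    return "long_jump"

def fix_jumps(targets):
    """targets[i]: index of the instruction that branch i jumps to.
    Returns the fixed-point encoding and the number of passes used."""
    n = len(targets)
    enc = ["short_jump"] * n
    for it in range(2 * n + 1):
        pcs, pc = [], 0
        for e in enc:                   # addresses under current policy
            pcs.append(pc)
            pc += JUMP_SIZE[e]
        want = [needed(pcs[i] + JUMP_SIZE[enc[i]], pcs[t])
                for i, t in enumerate(targets)]
        new = [b if RANK[b] > RANK[a] else a   # never shrink an encoding
               for a, b in zip(enc, want)]
        if new == enc:
            return enc, it
        enc = new
    raise AssertionError("no fixed point within 2n passes")
```

Since every pass that is not the final one strictly increases at least one encoding's rank, and the total rank budget is $2n$, the loop cannot run out of passes.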
-Termination can now be proven through the fact that there is a $k \leq 2n$, with
+Termination can now be proved using the fact that there is a $k \leq 2n$, with
$n$ the length of the program, such that iteration $k$ is equal to iteration
$k+1$. There are two possibilities: either there is a $k < 2n$ such that this