Changeset 2091


Ignore:
Timestamp:
Jun 15, 2012, 1:35:46 PM (5 years ago)
Author:
boender
Message:
  • systematically changed 'jump' to 'branch'
Location:
src/ASM/CPP2012-policy
Files:
5 edited

Legend:

Unmodified
Added
Removed
  • src/ASM/CPP2012-policy/algorithm.tex

    r2086 r2091  
    88
    99The SDCC compiler~\cite{SDCC2011}, which has the MCS-51 among its target
    10 instruction sets, simply encodes every jump as a long jump without taking the
    11 distance into account. While certainly correct (the long jump can reach any
    12 destination in memory) and rapid, it does result in a less than optimal
    13 solution.
     10instruction sets, simply encodes every branch instruction as a long jump
     11without taking the distance into account. While certainly correct (the long
     12jump can reach any destination in memory) and rapid, it does result in a less
     13than optimal solution.
    1414
    1515The {\tt gcc} compiler suite~\cite{GCC2012}, while compiling C on the x86
    1616architecture, uses a greatest fix point algorithm. In other words, it starts
    17 off with all jumps encoded as the largest jumps available, and then tries to
    18 reduce jumps as much as possible.
     17off with all branch instructions encoded as the largest jumps available, and
     18then tries to reduce branch instructions as much as possible.
    1919
    20 Such an algorithm has the advantage that any intermediate results it returns
    21 are correct: the solution where every jump is encoded as a large jump is
    22 always possible, and the algorithm only reduces those jumps where the
    23 destination address is in range for a shorter jump instruction. The algorithm
    24 can thus be stopped after a determined amount of steps without losing
    25 correctness.
     20Such an algorithm has the advantage that any intermediate result it returns
     21is correct: the solution where every branch instruction is encoded as a large
     22jump is always possible, and the algorithm only reduces those branch
     23instructions whose destination address is in range for a shorter jump.
     24The algorithm can thus be stopped after a determined amount of steps without
     25losing correctness.
    2626
    2727The result, however, is not necessarily optimal, even if the algorithm is run
     
    3838previous section: if we only take short and long jumps into account, the jump
    3939encoding can only switch from short to long, but never in the other direction.
    40 When we add absolute jumps, however, it is theoretically possible for a jump to
    41 switch from absolute to long and back, as explained in the previous section.
     40When we add absolute jumps, however, it is theoretically possible for a branch
     41instruction to switch from absolute to long and back, as shown in the previous
     42section.
    4243
    4344Proving termination then becomes difficult, because there is nothing that
    44 precludes a jump from switching back and forth between absolute and long
    45 indefinitely.
     45precludes a branch instruction from switching back and forth between absolute
     46and long indefinitely.
    4647
    4748In order to keep the algorithm in the same complexity class and more easily
    48 prove termination, we decided to explicitly enforce the `jumps must always
    49 increase' requirement: if a jump is encoded as a long jump in one step, it will
    50 also be encoded as a long jump in all the following steps. This means that any
    51 jump can change at most two times: once from short to absolute (or long), and
    52 once from absolute to long.
     49prove termination, we decided to explicitly enforce the `branch instructions
     50must always grow longer' requirement: if a branch instruction is encoded as a
     51long jump in one iteration, it will also be encoded as a long jump in all the
     52following iterations. This means that the encoding of any branch instruction
     53can change at most two times: once from short to absolute (or long), and once
     54from absolute to long.
    5355
    54 There is one complicating factor: suppose that a jump is encoded in step $n$
    55 as an absolute jump, but in step $n+1$ it is determined that (because of changes
    56 elsewhere) it can now be encoded as a short jump. Due to the requirement that
    57 jumps must always increase, this means that the jump will be encoded as an
    58 absolute jump in step $n+1$ as well.
     56There is one complicating factor: suppose that a branch instruction is encoded
     57in step $n$ as an absolute jump, but in step $n+1$ it is determined that
     58(because of changes elsewhere) it can now be encoded as a short jump. Due to
     59the requirement that the branch instructions must always grow longer, this
     60means that the branch encoding will be encoded as an absolute jump in step
     61$n+1$ as well.
    5962
    60 This is not necessarily correct, however: it is not the case that any short
    61 jump can correctly be encoded as an absolute jump (a short jump can bridge
    62 segments, whereas an absolute jump cannot). Therefore, in this situation
    63 we decide to encode the jump as a long jump, which is always correct.
     63This is not necessarily correct, however: a branch instruction that can be
     64encoded as a short jump cannot always also be encoded as an absolute jump
     65(a short jump can bridge segments, whereas an absolute jump cannot). Therefore,
     66in this situation we decide to encode the branch instruction as a long jump,
     67which is always correct.
    6468
    6569The resulting algorithm, while not optimal, is at least as good as the ones
    6670from SDCC and {\tt gcc}, and potentially better. Its complexity remains
    67 the same (there are at most $2n$ iterations, where $n$ is the number of jumps
    68 in the program).
     71the same (there are at most $2n$ iterations, where $n$ is the number of branch
     72instructions in the program).
    6973
    7074\subsection{The algorithm in detail}
     
    172176Note that the $\sigma$ function used for label lookup varies depending on
    173177whether the label is behind our current position or ahead of it. For
    174 backward jumps, where the label is behind our current position, we can use
     178backward branches, where the label is behind our current position, we can use
    175179$sigma$ for lookup, since its memory address is already known. However, for
    176 forward jumps, the memory address of the address of the label is not yet
     180forward branches, the memory address of the address of the label is not yet
    177181known, so we must use $old\_sigma$.
    178182
    179183We cannot use $old\_sigma$ without change: it might be the case that we have
    180 already changed some jumps before, making the program longer. We must therefore
    181 compensate for this by adding the size increase of the program to the label's
    182 memory address according to $old\_sigma$, so that jump distances do not get
    183 compromised.
     184already increased the size some branch instructions before, making the program
     185longer and moving every instruction forward. We must compensate for this by
     186adding the size increase of the program to the label's memory address according
     187to $old\_sigma$, so that branch instruction spans do not get compromised.
    184188
    185189Note also that we add the pc to $sigma$ at location $ppc+1$, whereas we add the
    186190jump length at location $ppc$. We do this so that $sigma(ppc)$ will always
    187191return a couple with the start address of the instruction at $ppc$ and the
    188 length of its jump; the end address of the program can be found at $sigma(n+1)$,
    189 where $n$ is the number of instructions in the program.
     192length of its branch instruction (if any); the end address of the program can
     193be found at $sigma(n+1)$, where $n$ is the number of instructions in the
     194program.
  • src/ASM/CPP2012-policy/biblio.bib

    r2085 r2091  
    5959  title = {Small Device {C} Compiler 3.1.0},
    6060  howpublished = {\url{http://sdcc.sourceforge.net/}},
    61   year = {2011}
     61  year = {2011},
     62        key = {SDCC2011}
    6263}
    6364
     
    6566        title = {GNU Compiler Collection 4.7.0},
    6667        howpublished = {\url{http://gcc.gnu.org/}},
    67         year = {2012}
     68        year = {2012},
     69        key = {GCC2012}
    6870}
    6971
     
    9496}
    9597
     98@misc {Hyde2006,
     99  title = {Branch displacement optimisation},
     100        author = {Randall Hyde},
     101  howpublished = {\url{http://groups.google.com/group/alt.lang.asm/msg/d31192d442accad3}},
     102  year = {2006},
     103}
     104
     105@misc{DC2012,
     106        title={On the correctness of an optimising assembler for the Intel MCS-51 microprocessor},
     107        author = {Dominic P. Mulligan and Claudio {Sacerdoti Coen}},
     108        year = {2012},
     109        booktitle = {Certified Proofs and Programs {(CPP)}},
     110        note = {Submitted}
     111}
     112
     113@article{Asperti2007,
     114  author = {Andrea Asperti and Claudio {Sacerdoti Coen} and Enrico Tassi and Stefano Zacchiroli},
     115  title = {User interaction with the {Matita} proof assistant},
     116  journal = {Automated Reasoning},
     117  pages = {109--139},
     118  volume = {39},
     119  issue = {2},
     120  year = {2007}
     121}
  • src/ASM/CPP2012-policy/conclusion.tex

    r2085 r2091  
    1515the complete formalisation and verification of a compiler. More information
    1616on the formalisation of the assembler, of which the present work is a part,
    17 can be found in a companion publication.
     17can be found in a companion publication~\cite{DC2012}.
    1818
    1919\subsection{Related work}
  • src/ASM/CPP2012-policy/problem.tex

    r2086 r2091  
    22
    33The problem of branch displacement optimisation, also known as jump encoding, is
    4 a well-known problem in assembler design. It is caused by the fact that in
    5 many architecture sets, the encoding (and therefore size) of some instructions
    6 depends on the distance to their operand (the instruction 'span'). The branch
    7 displacement optimisation problem consists in encoding these span-dependent
    8 instructions in such a way that the resulting program is as small as possible.
     4a well-known problem in assembler design~\cite{Hyde2006}. It is caused by the
     5fact that in many architecture sets, the encoding (and therefore size) of some
     6instructions depends on the distance to their operand (the instruction 'span').
     7The branch displacement optimisation problem consists in encoding these
     8span-dependent instructions in such a way that the resulting program is as
     9small as possible.
    910
    1011This problem is the subject of the present paper. After introducing the problem
    1112in more detail, we will discuss the solutions used by other compilers, present
    1213the algorithm used by us in the CerCo assembler, and discuss its verification,
    13 that is the proofs of termination and correctness.
     14that is the proofs of termination and correctness using the Matita theorem
     15prover~\cite{Asperti2007}.
    1416 
    1517The research presented in this paper has been executed within the CerCo project
     
    2123
    2224With this optimisation, however, comes increased complexity and hence
    23 increased possibility for error. We must make sure that the jumps are encoded
    24 correctly, otherwise the assembled program will behave unpredictably.
     25increased possibility for error. We must make sure that the branch instructions
     26are encoded correctly, otherwise the assembled program will behave
     27unpredictably.
    2528
    2629\section{The branch displacement optimisation problem}
    2730
    28 In most modern instruction sets, the only span-dependent instructions are
    29 branch instructions. Taking the ubiquitous x86-64 instruction set as an
    30 example, we find that it contains are eleven different forms of the
    31 unconditional jump instruction, all with different ranges, instruction sizes
     31In most modern instruction sets that have them, the only span-dependent
     32instructions are branch instructions. Taking the ubiquitous x86-64 instruction
     33set as an example, we find that it contains are eleven different forms of the
     34unconditional branch instruction, all with different ranges, instruction sizes
    3235and semantics (only six are valid in 64-bit mode, for example). Some examples
    3336are shown in figure~\ref{f:x86jumps}.
     
    4649\end{tabular}
    4750\end{center}
    48 \caption{List of x86 jump instructions}
     51\caption{List of x86 branch instructions}
    4952\label{f:x86jumps}
    5053\end{figure}
     
    6770\end{tabular}
    6871\end{center}
    69 \caption{List of MCS-51 jump instructions}
     72\caption{List of MCS-51 branch instructions}
    7073\label{f:mcs51jumps}
    7174\end{figure}
    7275
    73 The conditional jump instruction is only available in short form, which
    74 means that a conditional jump outside the short address range has to be
    75 encoded using two jumps; the call instruction is only available in
    76 absolute and long forms.
     76Conditional branch instructions are only available in short form, which
     77means that a conditional branch outside the short address range has to be
     78encoded using three branch instructions (for instructions whose logical
     79negation is available, it can be done with two branch instructions, but for
     80some instructions, this opposite is not available); the call instruction is
     81only available in absolute and long forms.
    7782
    7883Note that even though the MCS-51 architecture is much less advanced and more
    79 simple than the x86-64 architecture, the basic types of jump remain the same:
    80 a short jump with a limited range, an intra-segment jump and a jump that can
    81 reach the entire available memory.
     84simple than the x86-64 architecture, the basic types of branch instruction
     85remain the same: a short jump with a limited range, an intra-segment jump and a
     86jump that can reach the entire available memory.
    8287 
    8388Generally, in the code that is sent to the assembler as input, the only
    84 difference made between jump instructions is by semantics, not by span. This
    85 means that a distinction is made between the unconditional jump and the several
    86 kinds of conditional jump, but not between their short, absolute or long
    87 variants.
    88 
    89 The algorithm used by the assembler to encode these jumps into the different
    90 machine instructions is known as the {\tt branch displacement algorithm}. The
    91 optimisation problem consists in using as small an encoding as possible, thus
    92 minimising program length and execution time.
     89difference made between branch instructions is by semantics, not by span. This
     90means that a distinction is made between an unconditional branch and the
     91several kinds of conditional branch, but not between their short, absolute or
     92long variants.
     93
     94The algorithm used by the assembler to encode these branch instructions into
     95the different machine instructions is known as the {\em branch displacement
     96algorithm}. The optimisation problem consists in using as small an encoding as
     97possible, thus minimising program length and execution time.
    9398
    9499This problem is known to be NP-complete~\cite{Robertson1979,Szymanski1978},
     
    98103recently by Dickson~\cite{Dickson2008} for the x86 instruction set, is to use a
    99104fixed point algorithm that starts out with the shortest possible encoding (all
    100 jumps encoded as short jumps, which is very probably not a correct solution)
    101 and then iterates over the program to re-encode those jumps whose target is
    102 outside their range.
     105branch instruction encoded as short jumps, which is very probably not a correct
     106solution) and then iterates over the program to re-encode those branch
     107instructions whose target is outside their range.
    103108
    104109\subsection*{Adding absolute jumps}
     
    109114
    110115Here, termination of the smallest fixed point algorithm is easy to prove. All
    111 jumps start out encoded as short jumps, which means that the distance between
    112 any jump and its target is as short as possible. If, in this situation, there
    113 is a jump $j$ whose span is not within the range for a short jump, we can be
    114 sure that we can never reach a situation where the span of $j$ is so small that
    115 it can be encoded as a short jump. This reasoning holds throughout the
    116 subsequent iterations of the algorithm: short jumps can change into long jumps,
    117 but not vice versa (spans only increase). Hence, the algorithm either
    118 terminates when a fixed point is reached or when all short jumps have been
    119 changed into long jumps.
     116branch instructions start out encoded as short jumps, which means that the
     117distance between any branch instruction and its target is as short as possible.
     118If, in this situation, there is a branch instruction $b$ whose span is not
     119within the range for a short jump, we can be sure that we can never reach a
     120situation where the span of $j$ is so small that it can be encoded as a short
     121jump. This argument continues to hold throughout the subsequent iterations of
     122the algorithm: short jumps can change into long jumps, but not vice versa
     123(spans only increase). Hence, the algorithm either terminates when a fixed
     124point is reached or when all short jumps have been changed into long jumps.
    120125
    121126Also, we can be certain that we have reached an optimal solution: a short jump
     
    125130the absolute jump.
    126131
    127 The reason for this is that with absolute jumps, the encoding of a jump no
    128 longer depends only on the distance between the jump and its target: in order
    129 for an absolute jump to be possible, they need to be in the same segment (for
    130 the MCS-51, this means that the first 5 bytes of their addresses have to be
    131 equal). It is therefore entirely possible for two jumps with the same span to
    132 be encoded in different ways (absolute if the jump and its target are in the
    133 same segment, long if this is not the case).
    134 
    135 This invalidates the termination argument: a jump, once encoded as a long jump,
    136 can be re-encoded during a later iteration as an absolute jump. Consider the
    137 program shown in figure~\ref{f:term_example}. At the start of the first
    138 iteration, both the jump to {\tt X} and the jump to $\mathtt{L}_{0}$ are
    139 encoded as small jumps. Let us assume that in this case, the placement of
    140 $\mathtt{L}_{0}$ and the jump to it are such that $\mathtt{L}_{0}$ is just
    141 outside the segment that contains this jump. Let us also assume that the
    142 distance between $\mathtt{L}_{0}$ and the jump to it are too large for the jump
    143 to be encoded as a short jump.
    144 
    145 All this means that in the second iteration, the jump to $\mathtt{L}_{0}$ will
    146 be encoded as a long jump. If we assume that the jump to {\tt X} is encoded as
    147 a long jump as well, the size of the jump will increase and $\mathtt{L}_{0}$
    148 will be `propelled' into the same segment as its jump. Hence, in the third
    149 iteration, it can be encoded as an absolute jump. At first glance, there is
    150 nothing that prevents us from making a construction where two jumps interact
    151 in such a way as to keep switching between long and absolute encodings for
    152 an indefinite amount of iterations.
     132The reason for this is that with absolute jumps, the encoding of a branch
     133instruction no longer depends only on the distance between the branch
     134instruction and its target: in order for an absolute jump to be possible, they
     135need to be in the same segment (for the MCS-51, this means that the first 5
     136bytes of their addresses have to be equal). It is therefore entirely possible
     137for two branch instructions with the same span to be encoded in different ways
     138(absolute if the branch instruction and its target are in the same segment,
     139long if this is not the case).
     140
     141This invalidates the termination argument: a branch instruction, once encoded
     142as a long jump, can be re-encoded during a later iteration as an absolute jump.
     143Consider the program shown in figure~\ref{f:term_example}. At the start of the
     144first iteration, both the branch to {\tt X} and the branch to $\mathtt{L}_{0}$
     145are encoded as small jumps. Let us assume that in this case, the placement of
     146$\mathtt{L}_{0}$ and the branch to it are such that $\mathtt{L}_{0}$ is just
     147outside the segment that contains this branch. Let us also assume that the
     148distance between $\mathtt{L}_{0}$ and the branch to it are too large for the
     149branch instruction to be encoded as a short jump.
     150
     151All this means that in the second iteration, the branch to $\mathtt{L}_{0}$ will
     152be encoded as a long jump. If we assume that the branch to {\tt X} is encoded as
     153a long jump as well, the size of the branch instruction will increase and
     154$\mathtt{L}_{0}$ will be `propelled' into the same segment as its branch
     155instruction, because every subsequent instruction will move one byte forward.
     156Hence, in the third iteration, the branch to $\mathtt{L}_{0}$ can be encoded as
     157an absolute jump. At first glance, there is nothing that prevents us from
     158making a construction where two branch instructions interact in such a way as
     159to keep switching between long and absolute encodings for an indefinite amount
     160of iterations.
    153161
    154162\begin{figure}[h]
     
    168176problem is NP-complete. In this explanation, a condition for NP-completeness
    169177is the fact that programs be allowed to contain {\em pathological} jumps.
    170 These are jumps that can normally not be encoded as a short(er) jump, but gain
    171 this property when some other jumps are encoded as a long(er) jump. This is
    172 exactly what happens in figure~\ref{f:term_example}: by encoding the first
    173 jump as a long jump, another jump switches from long to absolute
    174 (which is shorter).
     178These are branch instructions that can normally not be encoded as a short(er)
     179jump, but gain this property when some other branch instructions are encoded as
     180a long(er) jump. This is exactly what happens in figure~\ref{f:term_example}:
     181by encoding the first branch instruction as a long jump, another branch
     182instructions switches from long to absolute (which is shorter).
    175183
    176184The optimality argument no longer holds either. Let us consider the program
     
    178186$\mathtt{L}_{0}$ and $\mathtt{L}_{1}$ is such that if {\tt jmp X} is encoded
    179187as a short jump, there is a segment border just after $\mathtt{L}_{1}$. Let
    180 us also assume that the three jumps to $\mathtt{L}_{1}$ are all in the same
     188us also assume that the three branches to $\mathtt{L}_{1}$ are all in the same
    181189segment, but far enough away from $\mathtt{L}_{1}$ that they cannot be encoded
    182190as short jumps.
    183191
    184192Then, if {\tt jmp X} were to be encoded as a short jump, which is clearly
    185 possible, all of the jumps to $\mathtt{L}_{1}$ would have to be encoded as
     193possible, all of the branches to $\mathtt{L}_{1}$ would have to be encoded as
    186194long jumps. However, if {\tt jmp X} were to be encoded as a long jump, and
    187195therefore increase in size, $\mathtt{L}_{1}$ would be `propelled' across the
    188 segment border, so that the three jumps to $\mathtt{L}_{1}$ could be encoded
     196segment border, so that the three branches to $\mathtt{L}_{1}$ could be encoded
    189197as absolute jumps. Depending on the relative sizes of long and absolute jumps,
    190198this solution might actually be smaller than the one reached by the smallest
  • src/ASM/CPP2012-policy/proof.tex

    r2085 r2091  
    5050termination in order to prove correctness: if the algorithm is halted after
    5151a number of steps without reaching a fixed point, the solution is not
    52 guaranteed to be correct. More specifically, jumps might be encoded whose
    53 displacement is too great for the instruction chosen.
    54 
    55 Proof of termination rests on the fact that jumps can only increase, which
    56 means that we must reach a fixed point after at most $2n$ iterations, with
    57 $2n$ the number of jumps in the program. This worst case is reached if at every
    58 iteration, we change the encoding of exactly one jump; since a jump can change
    59 from short to absolute and from absolute to long, there can be at most $2n$
    60 changes.
     52guaranteed to be correct. More specifically, branch instructions might be
     53encoded who do not coincide with the span between their location and their
     54destination.
     55
     56Proof of termination rests on the fact that the encoding of branch
     57instructions can only grow larger, which means that we must reach a fixed point
     58after at most $2n$ iterations, with $n$ the number of branch instructions in
     59the program. This worst case is reached if at every iteration, we change the
     60encoding of exactly one branch instruction; since the encoding of any branch
     61instructions can change first from short to absolute and then from absolute to
     62long, there can be at most $2n$ changes.
    6163
    6264This proof has been executed in the ``Russell'' style from~\cite{Sozeau2006}.
     
    99101
    100102This invariant states that when we try to look up the jump length of a
    101 pseudo-address where there is no jump, we will get the default value, a short
    102 jump.
     103pseudo-address where there is no branch instruction, we will get the default
     104value, a short jump.
    103105
    104106\begin{lstlisting}
     
    138140({\tt sigma\_policy\_specification}); its main difference
    139141with the final version is that it uses {\tt instruction\_size\_jmplen} to
    140 compute the instruction size. This function uses $j$ to compute the size
    141 of jumps (i.e. it uses the $\sigma$ function under construction), instead
    142 of looking at the distance between source and destination. This is because
    143 $\sigma$ is still under construction; later on we will prove that after the
    144 final iteration, {\tt sigma\_compact\_unsafe} is equivalent to the main
     142compute the instruction size. This function uses $j$ to compute the span
     143of branch instructions  (i.e. it uses the $\sigma$ function under construction),
     144instead of looking at the distance between source and destination. This is
     145because $\sigma$ is still under construction; later on we will prove that after
     146the final iteration, {\tt sigma\_compact\_unsafe} is equivalent to the main
    145147property.
    146148
     
    172174\end{lstlisting}
    173175
    174 This is a more direct safety property: it states that jump instructions are
    175 encoded properly, and that no wrong jump instructions are chosen.
     176This is a more direct safety property: it states that branch instructions are
     177encoded properly, and that no wrong branch instructions are chosen.
    176178
    177179Note that we compute the distance using the memory address of the instruction
    178180plus its size: this is due to the behaviour of the MCS-51 architecture, which
    179181increases the program counter directly after fetching, and only then executes
    180 the jump.
     182the branch instruction (by changing the program counter again).
    181183
    182184\begin{lstlisting}
     
    199201iteration does not change with respect to the current one. $added$ is the
    200202variable that keeps track of the number of bytes we have added to the program
    201 size by changing jumps; if this is 0, the program has not changed and vice
    202 versa.
     203size by changing the encoding of branch instructions; if this is 0, the program
     204has not changed and vice versa.
    203205
    204206We need to use two different formulations, because the fact that $added$ is 0
    205 does not guarantee that no jumps have changed: it is possible that we have
    206 replaced a short jump with a absolute jump, which does not change the size.
     207does not guarantee that no branch instructions have changed: it is possible
     208that we have replaced a short jump with a absolute jump, which does not change
     209the size of the branch instruction.
    207210
    208211Therefore {\tt policy\_pc\_equal} states that $old\_sigma_1(x) = sigma_1(x)$,
     
    237240
    238241This invariant is applied to $old\_sigma$; if our program becomes too large
    239 for memory, the previous iteration cannot have every jump encoded as a long
    240 jump. This is needed later on in the proof of termination.
     242for memory, the previous iteration cannot have every branch instruction encoded
     243as a long jump. This is needed later on in the proof of termination.
    241244
    242245If the iteration returns $\mathtt{Some}\ \sigma$, the invariants
     
    270273
    271274This is the same invariant as ${\tt sigma\_compact\_unsafe}$, but instead it
    272 computes the sizes of jump instructions by looking at the distance between
     275computes the sizes of branch instructions by looking at the distance between
    273276position and destination using $\sigma$.
    274277
     
    294297been smaller than 65 Kbytes.
    295298
    296 Suppose that all the jumps in the previous iteration are long jumps. This means
    297 that all jumps in this iteration are long jumps as well, and therefore that
    298 both iterations are equal in jumps. Per the invariant, this means that
     299Suppose that all the branch instructions in the previous iteration are
     300encodes as long jumps. This means that all branch instructions in this
     301iteration are long jumps as well, and therefore that both iterations are equal
     302in the encoding of their branch instructions. Per the invariant, this means that
    299303$added = 0$, and therefore that all addresses in both iterations are equal.
    300304But if all addresses are equal, the program sizes must be equal too, which
    301305means that the program size in the current iteration must be smaller than
    302 65 Kbytes. This contradicts the earlier hypothesis, hence not all jumps in
    303 the previous iteration are long jumps.
     30665 Kbytes. This contradicts the earlier hypothesis, hence not all branch
     307instructions in the previous iteration are encoded as long jumps.
    304308
    305309The proof of {\tt sigma\_compact} follows from {\tt sigma\_compact\_unsafe} and
     
    311315
    312316These are the invariants that hold after $2n$ iterations, where $n$ is the
    313 program size. Here, we only need {\tt out\_of\_program\_none},
    314 {\tt sigma\_compact} and the fact that $\sigma(0) = 0$.
     317program size (we use the program size for convenience; we could also use the
     318number of branch instructions, but this is more complicated). Here, we only
     319need {\tt out\_of\_program\_none}, {\tt sigma\_compact} and the fact that
     320$\sigma(0) = 0$.
    315321
    316322Termination can now be proven through the fact that there is a $k \leq 2n$, with
     
    319325property holds, or every iteration up to $2n$ is different. In the latter case,
    320326since the only changes between the iterations can be from shorter jumps to
    321 longer jumps, in iteration $2n$ every jump must be long. In this case,
    322 iteration $2n$ is equal to iteration $2n+1$ and the fixpoint is reached.
     327longer jumps, in iteration $2n$ every branch instruction must be encoded as
     328a long jump. In this case, iteration $2n$ is equal to iteration $2n+1$ and the
     329fixpoint is reached.
Note: See TracChangeset for help on using the changeset viewer.