Changeset 3473


Timestamp:
Sep 22, 2014, 11:25:51 AM
Author:
mulligan
Message:

inlined section into main document, title change

Location:
Papers/sttt
Files:
4 edited

  • Papers/sttt/conclusion.tex

    r3470 r3473  
    1 \section{Conclusion}
    2 
    3 In the previous sections we have discussed the branch displacement optimisation
    4 problem, presented an optimised solution, and discussed the proof of
    5 termination and correctness for this algorithm, as formalised in Matita.
    6 
    7 The algorithm we have presented is fast and correct, but not optimal; a true
    8 optimal solution would need techniques like constraint solvers. While outside
    9 the scope of the present research, it would be interesting to see if enough
    10 heuristics could be found to make such a solution practical for implementing
    11 in an existing compiler; this would be especially useful for embedded systems,
    12 where it is important to have as small a solution as possible.
    13 
    14 In itself the algorithm is already useful, as it results in a smaller solution
    15 than the simple `every branch instruction is long' approach used up until now---and with
    16 only 64 KB of memory, every byte counts. It also results in a smaller solution
    17 than the greatest fixed point algorithm that {\tt gcc} uses. It does this
    18 without sacrificing speed or correctness.
    19 
    20 The certification of an assembler that relies on the branch displacement
    21 algorithm described in this paper was presented in~\cite{lastyear}.
    22 The assembler computes the $\sigma$ map as described in this paper and
    23 then works in two passes. In the first pass it builds a map
    24 from instruction labels to addresses in the assembly code. In the
    25 second pass it iterates over the code, translating every pseudo jump
    26 at address $src$ to a label $l$ (associated to the assembly instruction at
    27 address $dst$) into a jump of the size dictated by $\sigma$, from $(\sigma\ src)$ to
    28 $(\sigma\ dst)$. In case of conditional jumps, the translated jump may be
    29 implemented with a series of instructions.
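The two passes might be sketched as follows (an illustrative Python model of our own; the instruction and label representations are simplified stand-ins, not the Matita formalisation):

```python
# Sketch of the assembler's two passes.  Pseudo-instructions are pairs
# (opcode, label-or-None); `sigma` is a hypothetical map from
# pseudo-addresses to machine addresses, as computed by the algorithm.

def assemble(pseudo, labels_at, sigma):
    # Pass 1: build a map from instruction labels to pseudo-addresses.
    label_addr = {lab: ppc for ppc, lab in labels_at.items()}
    # Pass 2: translate every pseudo jump at src into a jump of the
    # size dictated by sigma, targeting sigma(dst).
    out = []
    for src, (op, lab) in enumerate(pseudo):
        if lab is None:
            out.append((op, None))
        else:
            dst = label_addr[lab]
            size = sigma[src + 1] - sigma[src]  # bytes allotted to this jump
            out.append((op, sigma[dst], size))
    return out

prog = [("JMP", "l0"), ("NOP", None), ("NOP", None)]
print(assemble(prog, {2: "l0"}, {0: 0, 1: 3, 2: 4, 3: 5}))
```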
    30 
    31 The proof of correctness abstracts over the algorithm used and only relies on
    32 {\tt sigma\_policy\_specification} (page~\ref{sigmapolspec}). It is a variation
    33 of a standard 1-to-many forward simulation proof~\cite{Leroy2009}. The
    34 relation $R$ between states just maps every code address $ppc$ stored in
    35 registers or memory to $(\sigma\ ppc)$. To identify the code addresses,
    36 an additional data structure is always kept together with the source
    37 state and is updated by the semantics. The semantics is preserved
    38 only for those programs whose source code operations
    39 $(f\ ppc_1\ \ldots\ ppc_n)$ applied to code addresses $ppc_1 \ldots ppc_n$ are
    40 such that $(f\ (\sigma\ ppc_1)\ \ldots\ (\sigma\ ppc_n)) = (f\ ppc_1\ \ldots\ ppc_n)$. For
    41 example, an injective $\sigma$ preserves a binary equality test $f$ for code
    42 addresses, but not pointer subtraction.
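The final example can be made concrete with a small sketch (hypothetical Python, ours): an injective address map preserves equality tests on code addresses, but not address subtraction.

```python
# A toy injective sigma: pseudo-address -> machine address.  The gaps
# are non-uniform, as if some jumps were expanded to longer encodings.
sigma = {0: 0, 1: 2, 2: 5, 3: 7}

a, b = 1, 3

# Any injective sigma preserves a binary equality test on addresses:
assert (a == b) == (sigma[a] == sigma[b])

# Pointer subtraction is not preserved: distances change under sigma.
print(b - a, sigma[b] - sigma[a])  # the two distances differ
```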
    43 
    44 The main lemma (fetching simulation), which relies on\\
    45 {\tt sigma\_policy\_specification} and is established by structural induction
    46 over the source code, says that fetching an assembly instruction at
    47 position $ppc$ is equal to fetching the translation of the instruction at
    48 position $(\sigma\ ppc)$, and that the new incremented program counter is at
    49 the beginning of the next instruction (compactness). The only exception is
    50 when the instruction fetched is placed at the end of code memory and is
    51 followed only by dead code. Execution simulation is trivial because of the
    52 restriction to programs that are well behaved w.r.t.\ $\sigma$. The condition
    53 $\sigma\ 0 = 0$ is necessary because the hardware model prescribes that the
    54 first instruction to be executed will be at address 0. For the details
    55 see~\cite{lastyear}.
    56 
    57 Instead of verifying the algorithm directly, another solution to the problem
    58 would be to run an optimisation algorithm, and then verify the safety of the
    59 result using a verified validator. Such a validator would be easier to verify
    60 than the algorithm itself and it would also be efficient, requiring only a
    61 linear pass over the source code to test the specification. However, it is
    62 surely also interesting to formally prove that the assembler never rejects
    63 programs that should be accepted, i.e. that the algorithm itself is correct.
    64 This is the topic of the current paper.
    65 
    66 \subsection{Related work}
    67 
    68 As far as we are aware, this is the first formal discussion of the branch
    69 displacement optimisation algorithm.
    70 
    71 The CompCert project is another verified compiler project.
    72 Their backend~\cite{Leroy2009} generates assembly code for (amongst others) subsets of the
    73 PowerPC and x86 (32-bit) architectures. At the assembly code stage, there is
    74 no distinction between the span-dependent jump instructions, so a branch
    75 displacement optimisation algorithm is not needed.
    76 
    77 %An offshoot of the CompCert project is the CompCertTSO project, which adds
    78 %thread concurrency and synchronisation to the CompCert compiler~\cite{TSO2011}.
    79 %This compiler also generates assembly code and therefore does not include a
    80 %branch displacement algorithm.
    81  
    82 %Finally, there is also the Piton stack~\cite{Moore1996}, which not only includes the
    83 %formal verification of a compiler, but also of the machine architecture
    84 %targeted by that compiler, a microprocessor called the FM9001.
    85 %However, this architecture does not have different
    86 %jump sizes (branching is simulated by assigning values to the program counter),
    87 %so the branch displacement problem is irrelevant.
    88  
  • Papers/sttt/main.tex

    r3472 r3473  
    1818\begin{document}
    1919
    20 \title{A verified optimising assembler
     20\title{On the proof of correctness of a verified optimising assembler
    2121\thanks{Research supported by the CerCo project, within the Future and Emerging Technologies (FET) programme of the Seventh Framework Programme for Research of the European Commission, under FET-Open grant number 243881}}
    2222\author{Jaap Boender \and Dominic P. Mulligan \and Claudio Sacerdoti Coen}
     
    2626
    2727\begin{abstract}
    28 Optimising assemblers face the `branch displacement problem', i.e. how best to choose between concrete machine code jump instructions---typically of differing instruction and offset sizes---when expanding pseudo-instructions.
    29 Ideally, a well-implemented optimising assembler would choose the set of jump expansions that minimises the size of the resulting machine code program, a task that is provably \textsc{np}-hard.
    30 
    31 As part of CerCo (`Certified Complexity')---an \textsc{eu}-funded project to develop a verified concrete complexity preserving compiler for a large subset of the C programming language---we have implemented and proved correct an assembler within an interactive theorem prover.
     28Optimising assemblers face the `branch displacement' or `jump encoding' problem, i.e. how best to choose between concrete machine code jump instructions --- typically of differing instruction and offset sizes --- when expanding pseudo-instructions.
     29Ideally, an optimising assembler would choose the set of jump expansions that minimises the size of the resulting machine code program, a task that is provably \textsc{np}-hard.
     30
     31As part of CerCo (`Certified Complexity') --- an \textsc{eu}-funded project to develop a verified concrete complexity preserving compiler for a large subset of the C programming language --- we have implemented and proved correct an assembler within an interactive theorem prover.
    3232Our assembler targets the instruction set of a typical micro-controller, the Intel \textsc{mcs}-51 series.
    3333As is common in embedded systems development, this micro-controller has a paucity of available code memory and therefore we face an additional pressure in reducing the size of any assembled machine code program.
     
    3939\end{abstract}
    4040
    41 \input{problem}
    42 \input{algorithm}
    43 \input{proof}
    44 \input{conclusion}
     41\section{Introduction}
     42
     43The problem of branch displacement optimisation, also known as jump encoding, is
     44a well-known problem in assembler design~\cite{Hyde2006}. Its origin lies in the
     45fact that in many architecture sets, the encoding (and therefore size) of some
     46instructions depends on the distance to their operand (the instruction `span').
     47The branch displacement optimisation problem consists of encoding these
     48span-dependent instructions in such a way that the resulting program is as
     49small as possible.
     50
     51This problem is the subject of the present paper. After introducing the problem
     52in more detail, we will discuss the solutions used by other compilers, present
     53the algorithm we use in the CerCo assembler, and discuss its verification,
     54that is, the proofs of termination and correctness, using the Matita proof
     55assistant~\cite{Asperti2007}.
     56
     57Formulating the final statement of correctness and finding the loop invariants
     58have been non-trivial tasks and are, indeed, the main contribution of this
     59paper. It has required considerable care and fine-tuning to formulate not
     60only the minimal statement required for the subsequent proof of correctness of
     61the assembler, but also the minimal set of invariants needed for the proof
     62of correctness of the algorithm.
     63
     64The research presented in this paper has been executed within the CerCo project,
     65which aims at formally verifying a C compiler with cost annotations. The
     66target architecture for this project is the MCS-51, whose instruction set
     67contains span-dependent instructions. Furthermore, its maximum addressable
     68memory size is very small (64 KB), which makes it important to generate
     69programs that are as small as possible. With this optimisation, however, comes increased complexity and hence increased possibility for error. We must make sure that the branch instructions are encoded correctly, otherwise the assembled program will behave unpredictably.
     70
     71All Matita files related to this development can be found on the CerCo
     72website, \url{http://cerco.cs.unibo.it}. The specific part that contains the
     73branch displacement algorithm is in the {\tt ASM} subdirectory, in the files
     74{\tt PolicyFront.ma}, {\tt PolicyStep.ma} and {\tt Policy.ma}.
     75
     76\section{The branch displacement optimisation problem}
     77
     78In most modern instruction sets that contain span-dependent instructions, the only
     79such instructions are branches. Taking the ubiquitous x86-64 instruction
     80set as an example, we find that it contains eleven different forms of the
     81unconditional branch instruction, all with different ranges, instruction sizes
     82and semantics (only six are valid in 64-bit mode, for example). Some examples
     83are shown in Figure~\ref{f:x86jumps} (see also~\cite{IntelDev}).
     84
     85\begin{figure}[h]
     86\begin{center}
     87\begin{tabular}{|l|l|l|}
     88\hline
     89Instruction & Size (bytes) & Displacement range \\
     90\hline
     91Short jump & 2 & -128 to 127 bytes \\
     92Relative near jump & 5 & $-2^{31}$ to $2^{31}-1$ bytes \\
     93Absolute near jump & 6 & one segment (64-bit address) \\
     94Far jump & 8 & entire memory (indirect jump) \\
     95\hline
     96\end{tabular}
     97\end{center}
     98\caption{List of x86 branch instructions}
     99\label{f:x86jumps}
     100\end{figure}
     101
     102The chosen target architecture of the CerCo project is the Intel MCS-51, which
     103features three types of branch instructions (or jump instructions; the two terms
     104are used interchangeably), as shown in Figure~\ref{f:mcs51jumps}.
     105
     106\begin{figure}[h]
     107\begin{center}
     108\begin{tabular}{|l|l|l|l|}
     109\hline
     110Instruction & Size    & Execution time & Displacement range \\
     111            & (bytes) & (cycles) & \\
     112\hline
     113SJMP (`short jump') & 2 & 2 & -128 to 127 bytes \\
     114AJMP (`absolute jump') & 2 & 2 & one segment (11-bit address) \\
     115LJMP (`long jump') & 3 & 3 & entire memory \\
     116\hline
     117\end{tabular}
     118\end{center}
     119\caption{List of MCS-51 branch instructions}
     120\label{f:mcs51jumps}
     121\end{figure}
     122
     123Conditional branch instructions are only available in short form, which
     124means that a conditional branch outside the short address range has to be
     125encoded using three branch instructions (for instructions whose logical
     126negation is available, it can be done with two branch instructions, but for
     127some instructions this is not the case). The call instruction is
     128only available in absolute and long forms.
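The expansion just described can be illustrated with a hypothetical Python sketch (the opcode names and trampoline layout are ours): a conditional branch whose target is out of short range is rewritten with two branch instructions when a negated form exists, and three otherwise.

```python
# Hypothetical expansion of an out-of-range conditional branch.
# JZ target  becomes:  JNZ skip; LJMP target; skip: ...
NEGATION = {"JZ": "JNZ", "JNZ": "JZ", "JC": "JNC", "JNC": "JC"}

def expand_conditional(op, target, in_short_range):
    if in_short_range:
        return [(op, target)]
    neg = NEGATION.get(op)
    if neg is not None:
        # Negated form exists: branch over an unconditional long jump.
        return [(neg, "skip"), ("LJMP", target), ("LABEL", "skip")]
    # No negated form: a third branch synthesises the negation.
    return [(op, "take"), ("SJMP", "skip"),
            ("LABEL", "take"), ("LJMP", target), ("LABEL", "skip")]

print(expand_conditional("JZ", "far_away", in_short_range=False))
```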
     129
     130Note that even though the MCS-51 architecture is much less advanced and much
     131simpler than the x86-64 architecture, the basic types of branch instruction
     132remain the same: a short jump with a limited range, an intra-segment jump and a
     133jump that can reach the entire available memory.
     134 
     135Generally, in code fed to the assembler as input, the only
     136difference between branch instructions is semantics, not span. This
     137means that a distinction is made between an unconditional branch and the
     138several kinds of conditional branch, but not between their short, absolute or
     139long variants.
     140
     141The algorithm used by the assembler to encode these branch instructions into
     142the different machine instructions is known as the {\em branch displacement
     143algorithm}. The optimisation problem consists of finding as small an encoding as
     144possible, thus minimising program length and execution time.
     145
     146Similar problems, e.g. the branch displacement optimisation problem for other
     147architectures, are known to be NP-complete~\cite{Robertson1979,Szymanski1978},
     148which could make finding an optimal solution very time-consuming.
     149
     150The canonical solution, as shown by Szymanski~\cite{Szymanski1978} or more
     151recently by Dickson~\cite{Dickson2008} for the x86 instruction set, is to use a
     152fixed point algorithm that starts with the shortest possible encoding (all
     153branch instruction encoded as short jumps, which is likely not a correct
     154solution) and then iterates over the source to re-encode those branch
     155instructions whose target is outside their range.
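This fixed-point iteration can be sketched in Python (a toy model of our own, with only short and long jumps and made-up sizes and instruction layout):

```python
# Least-fixed-point sketch: start with every jump short (likely not a
# correct solution), then repeatedly lengthen jumps whose target falls
# outside the short range, until nothing changes.
SHORT, LONG = 2, 3            # sizes in bytes (MCS-51: SJMP=2, LJMP=3)
SHORT_RANGE = range(-128, 128)

def solve(jumps, n_instr):
    # jumps: {ppc: target_ppc}; non-jump instructions have size 1 here.
    enc = {ppc: SHORT for ppc in jumps}
    while True:
        # Compute addresses under the current encoding.
        addr, pc = {}, 0
        for ppc in range(n_instr + 1):
            addr[ppc] = pc
            pc += enc.get(ppc, 1) if ppc < n_instr else 0
        changed = False
        for ppc, tgt in jumps.items():
            disp = addr[tgt] - (addr[ppc] + enc[ppc])
            if enc[ppc] == SHORT and disp not in SHORT_RANGE:
                enc[ppc] = LONG   # spans only grow, so this terminates
                changed = True
        if not changed:
            return enc

# One short backward jump, one forward jump forced long (200 away).
print(solve({0: 2, 1: 200}, 201))
```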
     156
     157\subsection*{Adding absolute jumps}
     158
     159In both papers mentioned above, the encoding of a jump is only dependent on the
     160distance between the jump and its target: below a certain value a short jump
     161can be used; above this value the jump must be encoded as a long jump.
     162
     163Here, termination of the smallest fixed point algorithm is easy to prove. All
     164branch instructions start out encoded as short jumps, which means that the
     165distance between any branch instruction and its target is as short as possible
     166(all the intervening jumps are short).
     167If, in this situation, there is a branch instruction $b$ whose span is not
     168within the range for a short jump, we can be sure that we can never reach a
     169situation where the span of $b$ is so small that it can be encoded as a short
     170jump. This argument continues to hold throughout the subsequent iterations of
     171the algorithm: short jumps can change into long jumps, but not \emph{vice versa},
     172as spans only increase. Hence, the algorithm terminates either early, when a fixed
     173point is reached, or at the latest when all short jumps have been changed into long jumps.
     174
     175Also, we can be certain that we have reached an optimal solution: a short jump
     176is only changed into a long jump if it is absolutely necessary.
     177
     178However, neither of these claims (termination nor optimality) hold when we add
     179the absolute jump. With absolute jumps, the encoding of a branch
     180instruction no longer depends only on the distance between the branch
     181instruction and its target. An absolute jump is possible when instruction and
     182target are in the same segment (for the MCS-51, this means that the most significant
     183five bits of their addresses have to be equal). It is therefore entirely possible
     184for two branch instructions with the same span to be encoded in different ways
     185(absolute if the branch instruction and its target are in the same segment,
     186long if this is not the case).
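The segment condition might be checked concretely as follows (our own sketch): the MCS-51's 64 KB address space divides into 2 KB segments, so two 16-bit addresses share a segment exactly when their top five bits agree.

```python
# AJMP can reach its target only when source and target lie in the
# same 2 KB segment, i.e. their top 5 address bits coincide.
def same_segment(a, b):
    return (a >> 11) == (b >> 11)

assert same_segment(0x0100, 0x07FF)       # both in segment 0
# Adjacent addresses separated by a segment border: span is 1 byte,
# yet an absolute jump between them is impossible.
print(same_segment(0x07FF, 0x0800))
```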
     187
     188\begin{figure}[t]
     189\begin{subfigure}[b]{.45\linewidth}
     190\small
     191\begin{alltt}
     192    jmp X
     193    \ldots
     194L\(\sb{0}\): \ldots
     195% Start of new segment if
     196% jmp X is encoded as short
     197    \ldots
     198    jmp L\(\sb{0}\)
     199\end{alltt}
     200\caption{Example of a program where a long jump becomes absolute}
     201\label{f:term_example}
     202\end{subfigure}
     203\hfill
     204\begin{subfigure}[b]{.45\linewidth}
     205\small
     206\begin{alltt}
     207L\(\sb{0}\): jmp X
     208X:  \ldots
     209    \ldots
     210L\(\sb{1}\): \ldots
     211% Start of new segment if
     212% jmp X is encoded as short
     213    \ldots
     214    jmp L\(\sb{1}\)
     215    \ldots
     216    jmp L\(\sb{1}\)
     217    \ldots
     218    jmp L\(\sb{1}\)
     219    \ldots
     220\end{alltt}
     221\caption{Example of a program where the fixed-point algorithm is not optimal}
     222\label{f:opt_example}
     223\end{subfigure}
     224\end{figure}
     225
     226This invalidates our earlier termination argument: a branch instruction, once encoded
     227as a long jump, can be re-encoded during a later iteration as an absolute jump.
     228Consider the program shown in Figure~\ref{f:term_example}. At the start of the
     229first iteration, both the branch to {\tt X} and the branch to $\mathtt{L}_{0}$
     230are encoded as short jumps. Let us assume that in this case, the placement of
     231$\mathtt{L}_{0}$ and the branch to it are such that $\mathtt{L}_{0}$ is just
     232outside the segment that contains this branch. Let us also assume that the
     233distance between $\mathtt{L}_{0}$ and the branch to it is too large for the
     234branch instruction to be encoded as a short jump.
     235
     236All this means that in the second iteration, the branch to $\mathtt{L}_{0}$ will
     237be encoded as a long jump. If we assume that the branch to {\tt X} is encoded as
     238a long jump as well, the size of the branch instruction will increase and
     239$\mathtt{L}_{0}$ will be `propelled' into the same segment as its branch
     240instruction, because every subsequent instruction will move one byte forward.
     241Hence, in the third iteration, the branch to $\mathtt{L}_{0}$ can be encoded as
     242an absolute jump. At first glance, there is nothing that prevents us from
     243constructing a configuration where two branch instructions interact in such a
     244way as to iterate indefinitely between long and absolute encodings.
     245
     246This situation mirrors the explanation by Szymanski~\cite{Szymanski1978} of why
     247the branch displacement optimisation problem is NP-complete. In this explanation,
     248a condition for NP-completeness is the fact that programs be allowed to contain
     249{\em pathological} jumps. These are branch instructions that cannot normally be
     250encoded as a short(er) jump, but gain this property when some other branch
     251instructions are encoded as a long(er) jump. This is exactly what happens in
     252Figure~\ref{f:term_example}. By encoding the first branch instruction as a long
     253jump, another branch instruction switches from long to absolute (which is
     254shorter).
     255
     256In addition, our previous optimality argument no longer holds. Consider the
     257program shown in Figure~\ref{f:opt_example}. Suppose that the distance between
     258$\mathtt{L}_{0}$ and $\mathtt{L}_{1}$ is such that if {\tt jmp X} is encoded
     259as a short jump, there is a segment border just after $\mathtt{L}_{1}$. Let
     260us also assume that all three branches to $\mathtt{L}_{1}$ are in the same
     261segment, but far enough away from $\mathtt{L}_{1}$ that they cannot be encoded
     262as short jumps.
     263
     264Then, if {\tt jmp X} were to be encoded as a short jump, which is clearly
     265possible, all of the branches to $\mathtt{L}_{1}$ would have to be encoded as
     266long jumps. However, if {\tt jmp X} were to be encoded as a long jump, and
     267therefore increase in size, $\mathtt{L}_{1}$ would be `propelled' across the
     268segment border, so that the three branches to $\mathtt{L}_{1}$ could be encoded
     269as absolute jumps. Depending on the relative sizes of long and absolute jumps,
     270this solution might actually be smaller than the one reached by the smallest
     271fixed point algorithm.
     272
     273\section{Our algorithm}
     274
     275\subsection{Design decisions}
     276
     277Given the NP-completeness of the problem, finding optimal solutions
     278(using, for example, a constraint solver) can potentially be very costly.
     279
     280The SDCC compiler~\cite{SDCC2011}, which has a backend targeting the MCS-51
     281instruction set, simply encodes every branch instruction as a long jump
     282without taking the distance into account. While certainly correct (the long
     283jump can reach any destination in memory) and a very fast solution to compute,
     284it results in a less than optimal solution in terms of output size and
     285execution time.
     286
     287On the other hand, the {\tt gcc} compiler suite, while compiling
     288C on the x86 architecture, uses a greatest fix point algorithm. In other words,
     289it starts with all branch instructions encoded as the largest jumps
     290available, and then tries to reduce the size of branch instructions as much as
     291possible.
     292
     293Such an algorithm has the advantage that any intermediate result it returns
     294is correct: the solution where every branch instruction is encoded as a large
     295jump is always possible, and the algorithm only reduces those branch
     296instructions whose destination address is in range for a shorter jump.
     297The algorithm can thus be stopped after a determined number of steps without
     298sacrificing correctness.
     299
     300The result, however, is not necessarily optimal. Even if the algorithm is run
     301until it terminates naturally, the fixed point reached is the {\em greatest}
     302fixed point, not the least fixed point. Furthermore, {\tt gcc} (at least for
     303the x86 architecture) only uses short and long jumps. This makes the algorithm
     304more efficient, as shown in the previous section, but also results in a less than
     305optimal solution.
     306
     307In the CerCo assembler, we opted at first for a least fixed point algorithm,
     308taking absolute jumps into account.
     309
     310Here, we ran into a problem with proving termination, as explained in the
     311previous section: if we only take short and long jumps into account, the jump
     312encoding can only switch from short to long, but never in the other direction.
     313When we add absolute jumps, however, it is theoretically possible for a branch
     314instruction to switch from absolute to long and back, as previously explained.
     315Proving termination then becomes difficult, because there is nothing that
     316precludes a branch instruction from oscillating back and forth between absolute
     317and long jumps indefinitely.
     318
     319To keep the algorithm in the same complexity class and more easily
     320prove termination, we decided to explicitly enforce the `branch instructions
     321must always grow longer' requirement: if a branch instruction is encoded as a
     322long jump in one iteration, it will also be encoded as a long jump in all the
     323following iterations. Therefore the encoding of any branch instruction
     324can change at most two times: once from short to absolute (or long), and once
     325from absolute to long.
     326
     327There is one complicating factor. Suppose that a branch instruction is encoded
     328in step $n$ as an absolute jump, but in step $n+1$ it is determined that
     329(because of changes elsewhere) it can now be encoded as a short jump. Due to
     330the requirement that the branch instructions must always grow longer,
     331the branch instruction will be encoded as an absolute jump in step
     332$n+1$ as well.
     333
     334This is not necessarily correct. A branch instruction that can be
     335encoded as a short jump cannot always also be encoded as an absolute jump, as a
     336short jump can bridge segments, whereas an absolute jump cannot. Therefore,
     337in this situation we have decided to encode the branch instruction as a long
     338jump, which is always correct.
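Putting the last two paragraphs together, the update rule for a branch instruction's encoding can be sketched as follows (illustrative Python; the names are ours):

```python
# Encodings ordered short < absolute < long; an encoding may only grow,
# so each instruction changes encoding at most twice in the whole run.
ORDER = {"short": 0, "absolute": 1, "long": 2}

def update(old, new):
    # Special case: an absolute jump that could now shrink to short
    # must become long instead, since a short jump may bridge a
    # segment border where an absolute jump cannot.
    if old == "absolute" and new == "short":
        return "long"
    return old if ORDER[old] >= ORDER[new] else new

print(update("short", "absolute"), update("long", "short"),
      update("absolute", "short"))
```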
     339
     340The resulting algorithm, therefore, will not return the least fixed point, as
     341it might have too many long jumps. However, it is still better than the
     342algorithms from SDCC and {\tt gcc}, since even in the worst case, it will still
     343return a smaller or equal solution.
     344
     345Experimenting with our algorithm on the test suite of C programs included with
     346{\tt gcc} 2.3.3 has shown that on average, about 25 percent of jumps are encoded as short or absolute jumps by the algorithm. As not all instructions are jumps, this does not make for a large reduction in size, but it can make for a reduction in execution time: if jumps
     347are executed multiple times, for example in loops, the fact that short jumps take fewer cycles to execute than long jumps can have a significant effect.
     348
     349As for complexity, there are at most $2n$ iterations, where $n$ is the number of
     350branch instructions. Practical tests within the CerCo project on small to
     351medium pieces of code have shown that in almost all cases, a fixed point is
     352reached in 3 passes. Only in one case did the algorithm need 4. This is not surprising: after all, the difference between short/absolute and
     353long jumps is only one byte (three for conditional jumps). For a change from
     354short/absolute to long to have an effect on other jumps is therefore relatively
     355uncommon, which explains why a fixed point is reached so quickly.
     356
     357\subsection{The algorithm in detail}
     358
     359The branch displacement algorithm forms part of the translation from
     360pseudocode to assembler. More specifically, it is used by the function that
     361translates pseudo-addresses (natural numbers indicating the position of the
     362instruction in the program) to actual addresses in memory. Note that in pseudocode, all instructions are of size 1.
     363
     364Our original intention was to have two different functions, one function
     365$\mathtt{policy}: \mathbb{N} \rightarrow \{\mathtt{short\_jump},
     366\mathtt{absolute\_jump}, \mathtt{long\_jump}\}$ to associate jumps to their
     367intended encoding, and a function $\sigma: \mathbb{N} \rightarrow
     368\mathtt{Word}$ to associate pseudo-addresses to machine addresses. $\sigma$
     369would use $\mathtt{policy}$ to determine the size of jump instructions. This turned out to be suboptimal from the algorithmic point of view and
     370impossible to prove correct.
     371
     372From the algorithmic point of view, in order to create the $\mathtt{policy}$
     373function, we must necessarily have a translation from pseudo-addresses
     374to machine addresses (i.e. a $\sigma$ function): in order to judge the distance
     375between a jump and its destination, we must know their memory locations.
     376Conversely, in order to create the $\sigma$ function, we need to have the
     377$\mathtt{policy}$ function, otherwise we do not know the sizes of the jump
     378instructions in the program.
     379
     380Much the same problem appears when we try to prove the algorithm correct: the
     381correctness of $\mathtt{policy}$ depends on the correctness of $\sigma$, and
     382the correctness of $\sigma$ depends on the correctness of $\mathtt{policy}$.
     383
     384We solved this problem by integrating the $\mathtt{policy}$ and $\sigma$
     385algorithms. We now have a function
     386$\sigma: \mathbb{N} \rightarrow \mathtt{Word} \times \mathtt{bool}$ which
     387associates a pseudo-address to a machine address. The boolean denotes a forced
     388long jump; as noted in the previous section, if during the fixed point
     389computation an absolute jump changes to be potentially re-encoded as a short
     390jump, the result is actually a long jump. It might therefore be the case that
     391jumps are encoded as long jumps without this actually being necessary, and this
     392information needs to be passed to the code generating function.
     393
     394The assembler function encodes the jumps by checking the distance between
     395source and destination according to $\sigma$, so it could select an absolute
     396jump in a situation where there should be a long jump. The boolean is there
     397to prevent this from happening by indicating the locations where a long jump
     398should be encoded, even if a shorter jump is possible. This has no effect on
     399correctness, since a long jump is applicable in any situation.
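The effect of the boolean can be sketched as follows (a Python illustration with our own names; the segment and range checks are simplified):

```python
# Sketch: sigma maps a pseudo-address to (machine address, forced-long
# flag).  The encoder picks a jump by distance, but the flag overrides
# it: a long jump is emitted even where a shorter jump would fit.
def encode_jump(src, dst, sigma):
    addr_src, forced_long = sigma[src]
    addr_dst, _ = sigma[dst]
    if forced_long:
        return "LJMP"
    if (addr_src >> 11) == (addr_dst >> 11):   # same 2 KB segment
        return "AJMP"
    if -128 <= addr_dst - addr_src < 128:
        return "SJMP"
    return "LJMP"

sigma = {0: (0x0100, False), 1: (0x0102, True), 2: (0x0120, False)}
print(encode_jump(0, 2, sigma))   # absolute jump: same segment
print(encode_jump(1, 2, sigma))   # flag forces a long jump anyway
```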
     400
     401\begin{figure}[t]
     402\small
     403\begin{algorithmic}
     404\Function{f}{$labels$,$old\_sigma$,$instr$,$ppc$,$acc$}
     405  \State $\langle added, pc, sigma \rangle \gets acc$
     406  \If {$instr$ is a backward jump to $j$}
     407    \State $length \gets \mathrm{jump\_size}(pc,sigma_1(labels(j)))$
     408    \Comment compute jump distance
     409  \ElsIf {$instr$ is a forward jump to $j$}
     410    \State $length \gets \mathrm{jump\_size}(pc,old\_sigma_1(labels(j))+added)$
     411  \EndIf
     412  \State $old\_length \gets \mathrm{old\_sigma_2}(ppc)$
     413  \State $new\_length \gets \mathrm{max}(old\_length, length)$
     414  \Comment length must never decrease
     415  \State $old\_size \gets \mathrm{instruction\_size}(instr,old\_length)$
     416  \State $new\_size \gets \mathrm{instruction\_size}(instr,new\_length)$
     417  \Comment compute size in bytes
     418  \State $new\_added \gets added+(new\_size-old\_size)$
     419  \Comment keep track of total added bytes
     420  \State $new\_sigma \gets old\_sigma$
  \State $new\_sigma_1(ppc+1) \gets pc+new\_size$
  \State $new\_sigma_2(ppc) \gets new\_length$
  \Comment update $\sigma$ \\
  \Return $\langle new\_added, pc+new\_size, new\_sigma \rangle$
\EndFunction
\end{algorithmic}
\caption{The heart of the algorithm}
\label{f:jump_expansion_step}
\end{figure}

The algorithm, shown in Figure~\ref{f:jump_expansion_step}, works by folding the
function {\sc f} over the entire program, thus gradually constructing $sigma$.
This constitutes one step in the fixed point calculation; successive steps
repeat the fold until a fixed point is reached. We have abstracted away the
case where an instruction is not a jump, since the size of these instructions
is constant.

Parameters of the function {\sc f} are:
\begin{itemize}
  \item a function $labels$ that associates a label to its pseudo-address;
  \item $old\_sigma$, the $\sigma$ function returned by the previous
    iteration of the fixed point calculation;
  \item $instr$, the instruction currently under consideration;
  \item $ppc$, the pseudo-address of $instr$;
  \item $acc$, the fold accumulator, which contains $added$ (the number of
  bytes added to the program size with respect to the previous iteration), $pc$
  (the highest memory address reached so far), and of course $sigma$, the
    $\sigma$ function under construction.
\end{itemize}
The first two are parameters that remain the same through one iteration; the
final three are standard parameters for a fold function (including $ppc$,
which is simply the number of instructions of the program already processed).
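To make the fold step concrete, the following is a minimal, executable sketch
of {\sc f} in Python. It is an illustration only: the helpers {\tt jump\_size}
and {\tt instruction\_size}, the size table, and the dictionary representation
of $\sigma$ are simplified stand-ins for the Matita definitions, and the
non-jump case is abstracted away, as in the figure.

```python
# Illustrative sketch of one fold step; all helpers are simplified
# stand-ins for the Matita definitions, not the real formalisation.

JMP_ORDER = {"short_jump": 0, "absolute_jump": 1, "long_jump": 2}
SIZES = {"short_jump": 2, "absolute_jump": 2, "long_jump": 3}

def jump_size(src, dest):
    """Classify the jump needed to cover the span from src to dest."""
    if -128 <= dest - src <= 127:
        return "short_jump"
    if dest // 2048 == src // 2048:  # same 2 KB block (11-bit address)
        return "absolute_jump"
    return "long_jump"

def instruction_size(instr, length):
    # Simplification: unconditional jumps only, size from the table.
    return SIZES[length]

def f(labels, old_sigma, instr, ppc, acc):
    """One fold step; instr is assumed to be a jump (the non-jump
    case is abstracted away, as in the paper)."""
    added, pc, sigma = acc
    dest_ppc = labels[instr["target"]]
    if dest_ppc <= ppc:  # backward jump: address already known in sigma
        length = jump_size(pc, sigma[dest_ppc][0])
    else:                # forward jump: old sigma, shifted by added bytes
        length = jump_size(pc, old_sigma[dest_ppc][0] + added)
    old_length = old_sigma[ppc][1]
    new_length = max(old_length, length, key=JMP_ORDER.get)  # never shrink
    old_size = instruction_size(instr, old_length)
    new_size = instruction_size(instr, new_length)
    new_sigma = dict(sigma)
    new_sigma[ppc] = (pc, new_length)
    new_sigma[ppc + 1] = (pc + new_size, "short_jump")
    return (added + (new_size - old_size), pc + new_size, new_sigma)
```

Folding {\tt f} over the program then yields one iteration of the fixed point
computation, with the accumulator threading $added$, the current $pc$, and the
$\sigma$ under construction.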

The $\sigma$ functions used by {\sc f} are not of the same type as the final
$\sigma$ function: they are of type
$\sigma: \mathbb{N} \rightarrow \mathbb{N} \times \{\mathtt{short\_jump},
\mathtt{absolute\_jump},\mathtt{long\_jump}\}$; a function that associates a
pseudo-address with a memory address and a jump length. We do this to
ease the comparison of jump lengths between iterations. In the algorithm,
we use the notation $sigma_1(x)$ to denote the memory address corresponding to
$x$, and $sigma_2(x)$ for the jump length corresponding to $x$.

Note that the $\sigma$ function used for label lookup varies depending on
whether the label is behind our current position or ahead of it. For
backward branches, where the label is behind our current position, we can use
$sigma$ for lookup, since its memory address is already known. However, for
forward branches, the memory address of the label is not yet
known, so we must use $old\_sigma$.

We cannot use $old\_sigma$ without change: it might be the case that we have
already increased the size of some branch instructions before, making the
program longer and moving every instruction forward. We must compensate for this
by adding the size increase of the program to the label's memory address
according to $old\_sigma$, so that branch instruction spans do not get
compromised.

%Note also that we add the pc to $sigma$ at location $ppc+1$, whereas we add the
%jump length at location $ppc$. We do this so that $sigma(ppc)$ will always
%return a pair with the start address of the instruction at $ppc$ and the
%length of its branch instruction (if any); the end address of the program can
%be found at $sigma(n+1)$, where $n$ is the number of instructions in the
%program.

\section{The proof}

In this section, we present the correctness proof for the algorithm in more
detail. The main correctness statement is shown, slightly simplified, in~Figure~\ref{statement}.
%
\begin{figure}[t]
\small
\begin{alignat*}{6}
\mathtt{sigma}&\omit\rlap{$\mathtt{\_policy\_specification} \equiv
\lambda program.\lambda sigma.$} \notag\\
  & \omit\rlap{$sigma\ 0 = 0\ \wedge$} \notag\\
  & \mathbf{let}\ & & \omit\rlap{$instr\_list \equiv code\ program\ \mathbf{in}$} \notag\\
  &&& \omit\rlap{$\forall ppc.ppc < |instr\_list| \rightarrow$} \notag\\
  &&& \mathbf{let}\ && pc \equiv sigma\ ppc\ \mathbf{in} \notag\\
  &&& \mathbf{let}\ && instruction \equiv \mathtt{fetch\_pseudo\_instruction}\ instr\_list\ ppc\ \mathbf{in} \notag\\
  &&& \mathbf{let}\ && next\_pc \equiv sigma\ (ppc+1)\ \mathbf{in}\notag\\
  &&&&& next\_pc = pc + \mathtt{instruction\_size}\ sigma\ ppc\ instruction\ \wedge\notag\\
  &&&&& (pc + \mathtt{instruction\_size}\ sigma\ ppc\ instruction < 2^{16}\ \vee\notag\\
  &&&&& (\forall ppc'.ppc' < |instr\_list| \rightarrow ppc < ppc' \rightarrow \notag\\
  &&&&& \mathbf{let}\ instruction' \equiv \mathtt{fetch\_pseudo\_instruction}\ instr\_list\ ppc'\ \mathbf{in} \notag\\
  &&&&&\ \mathtt{instruction\_size}\ sigma\ ppc'\ instruction' = 0)\ \wedge \notag\\
  &&&&& pc + \mathtt{instruction\_size}\ sigma\ ppc\ instruction = 2^{16})
\end{alignat*}
\caption{Main correctness statement\label{statement}}
\label{sigmapolspec}
\end{figure}
%
Informally, this means that when fetching a pseudo-instruction at $ppc$, the
translation by $\sigma$ of $ppc+1$ is the same as $\sigma(ppc)$ plus the size
of the instruction at $ppc$.  That is, an instruction is placed consecutively
after the previous one, and there are no overlaps. The rest of the statement
deals with memory size: either the next instruction fits within memory
($next\_pc < 2^{16}$) or it ends exactly at the memory limit, in which case
it must be the last translated instruction in the program (enforced by
specifying that the size of all subsequent instructions is 0: there may be
comments or cost annotations that are not translated).

Finally, we enforce that the program starts at address 0, i.e. $\sigma(0) = 0$.
It may seem strange that we do not explicitly include a safety property stating
that every jump instruction is of the right type with respect to its target
(akin to the lemma from Figure~\ref{sigmasafe}), but this is not necessary.
The distance is recalculated according to the instruction addresses from
$\sigma$, which implicitly expresses safety.

Since our computation is a least fixed point computation, we must prove
termination in order to prove correctness: if the algorithm is halted after
a number of steps without reaching a fixed point, the solution is not
guaranteed to be correct. More specifically, branch instructions might be
encoded which do not coincide with the span between their location and their
destination.

Proof of termination rests on the fact that the encoding of branch
instructions can only grow larger, which means that we must reach a fixed point
after at most $2n$ iterations, with $n$ the number of branch instructions in
the program. This worst case is reached if at every iteration, we change the
encoding of exactly one branch instruction; since the encoding of any branch
instruction can change first from short to absolute, and then to long, there
can be at most $2n$ changes.

%The proof has been carried out using the ``Russell'' style from~\cite{Sozeau2006}.
%We have proven some invariants of the {\sc f} function from the previous
%section; these invariants are then used to prove properties that hold for every
%iteration of the fixed point computation; and finally, we can prove some
%properties of the fixed point.

\subsection{Fold invariants}

In this section, we present the invariants that hold during the fold of {\sc f}
over the program. These will be used later on to prove the properties of the
iteration. During the fixed point computation, the $\sigma$ function is
implemented as a trie for ease of access; computing $\sigma(x)$ is achieved by
looking up the value of $x$ in the trie. Actually, during the fold, the value
we pass along is a pair $\mathbb{N} \times \mathtt{ppc\_pc\_map}$. The first
component is the number of bytes added to the program so far with respect to
the previous iteration, and the second component, {\tt ppc\_pc\_map}, is the
actual $\sigma$ trie (which we'll call $strie$ to avoid confusion).
%
{\small
\begin{alignat*}{2}
\mathtt{out} & \mathtt{\_of\_program\_none} \equiv \lambda prefix.\lambda strie. \notag\\
& \forall i.i < 2^{16} \rightarrow (i > |prefix| \leftrightarrow
 \mathtt{lookup\_opt}\ i\ (\mathtt{snd}\ strie) = \mathtt{None})
\end{alignat*}}
%
The first invariant states that any pseudo-address not yet examined is not
present in the lookup trie.
%
{\small
\begin{alignat*}{2}
\mathtt{not} & \mathtt{\_jump\_default} \equiv \lambda prefix.\lambda strie.\forall i.i < |prefix| \rightarrow\notag\\
& \neg\mathtt{is\_jump}\ (\mathtt{nth}\ i\ prefix) \rightarrow \mathtt{lookup}\ i\ (\mathtt{snd}\ strie) = \mathtt{short\_jump}
\end{alignat*}}
%
This invariant states that when we try to look up the jump length of a
pseudo-address where there is no branch instruction, we will get the default
value, a short jump.
%
{\small
\begin{alignat*}{4}
\mathtt{jump} & \mathtt{\_increase} \equiv \lambda prefix.\lambda op.\lambda p.\forall i.i < |prefix| \rightarrow \notag\\
& \mathbf{let}\ oj \equiv \mathtt{lookup}\ i\ (\mathtt{snd}\ op)\ \mathbf{in} \notag\\
& \mathbf{let}\ j \equiv \mathtt{lookup}\ i\ (\mathtt{snd}\ p)\ \mathbf{in}\ \mathtt{jmpleq}\ oj\ j
\end{alignat*}}
%
This invariant states that between iterations (with $op$ being the previous
iteration, and $p$ the current one), jump lengths either remain equal or
increase. It is needed for proving termination.
%
\begin{figure}[h]
\small
\begin{alignat*}{6}
\mathtt{sigma} & \omit\rlap{$\mathtt{\_compact\_unsafe} \equiv \lambda prefix.\lambda strie.\forall n.n < |prefix| \rightarrow$}\notag\\
& \mathbf{match}\ && \omit\rlap{$\mathtt{lookup\_opt}\ n\ (\mathtt{snd}\ strie)\ \mathbf{with}$}\notag\\
&&& \omit\rlap{$\mathtt{None} \Rightarrow \mathrm{False}$} \notag\\
&&& \omit\rlap{$\mathtt{Some}\ \langle pc, j \rangle \Rightarrow$} \notag\\
&&& \mathbf{match}\ && \mathtt{lookup\_opt}\ (n+1)\ (\mathtt{snd}\ strie)\ \mathbf{with}\notag\\
&&&&& \mathtt{None} \Rightarrow \mathrm{False} \notag\\
&&&&& \mathtt{Some}\ \langle pc_1, j_1 \rangle \Rightarrow
    pc_1 = pc + \notag\\
&&&&& \ \ \mathtt{instruction\_size\_jmplen}\ j\ (\mathtt{nth}\ n\ prefix)
\end{alignat*}
\caption{Temporary safety property}
\label{sigmacompactunsafe}
\end{figure}
%
We now proceed with the safety lemmas. The lemma in
Figure~\ref{sigmacompactunsafe} is a temporary formulation of the main
property {\tt sigma\_policy\_specification}. Its main difference from the
final version is that it uses {\tt instruction\_size\_jmplen} to compute the
instruction size. This function uses $j$ to compute the span of branch
instructions (i.e. it uses the $\sigma$ under construction), instead
of looking at the distance between source and destination. This is because
$\sigma$ is still under construction; we will prove below that after the
final iteration, {\tt sigma\_compact\_unsafe} is equivalent to the main
property in Figure~\ref{sigmasafe}, which holds at the end of the computation.
%
\begin{figure}[h]
\small
\begin{alignat*}{6}
\mathtt{sigma} & \omit\rlap{$\mathtt{\_safe} \equiv \lambda prefix.\lambda labels.\lambda old\_strie.\lambda strie.\forall i.i < |prefix| \rightarrow$} \notag\\
& \omit\rlap{$\forall dest\_label.\mathtt{is\_jump\_to}\ (\mathtt{nth}\ i\ prefix)\ dest\_label \rightarrow$} \notag\\
& \mathbf{let} && \omit\rlap{$\ paddr \equiv \mathtt{lookup}\ labels\ dest\_label\ \mathbf{in}$} \notag\\
& \mathbf{let} && \omit\rlap{$\ \langle j, src, dest \rangle \equiv \mathbf{if} \ paddr\ \leq\ i\ \mathbf{then}$}\notag\\
&&&&& \mathbf{let}\ \langle \_, j \rangle \equiv \mathtt{lookup}\ i\ (\mathtt{snd}\ strie)\ \mathbf{in} \notag\\
&&&&& \mathbf{let}\ \langle pc\_plus\_jl, \_ \rangle \equiv \mathtt{lookup}\ (i+1)\ (\mathtt{snd}\ strie)\ \mathbf{in}\notag\\
&&&&& \mathbf{let}\ \langle addr, \_ \rangle \equiv \mathtt{lookup}\ paddr\ (\mathtt{snd}\ strie)\ \mathbf{in}\notag\\
&&&&& \langle j, pc\_plus\_jl, addr \rangle\notag\\
&&&\mathbf{else} \notag\\
&&&&&\mathbf{let}\ \langle \_, j \rangle \equiv \mathtt{lookup}\ i\ (\mathtt{snd}\ strie)\ \mathbf{in} \notag\\
&&&&&\mathbf{let}\ \langle pc\_plus\_jl, \_ \rangle \equiv \mathtt{lookup}\ (i+1)\ (\mathtt{snd}\ old\_strie)\ \mathbf{in}\notag\\
&&&&&\mathbf{let}\ \langle addr, \_ \rangle \equiv \mathtt{lookup}\ paddr\ (\mathtt{snd}\ old\_strie)\ \mathbf{in}\notag\\
&&&&&\langle j, pc\_plus\_jl, addr \rangle\ \mathbf{in} \notag\\
&&&\mathbf{match} && \ j\ \mathbf{with} \notag\\
&&&&&\mathrm{short\_jump} \Rightarrow \mathtt{short\_jump\_valid}\ src\ dest\notag\\
&&&&&\mathrm{absolute\_jump} \Rightarrow \mathtt{absolute\_jump\_valid}\ src\ dest\notag\\
&&&&&\mathrm{long\_jump} \Rightarrow \mathrm{True}
\end{alignat*}
\caption{Safety property}
\label{sigmasafe}
\end{figure}
%
We compute the distance using the memory address of the instruction
plus its size. This follows the behaviour of the MCS-51 microprocessor, which
increases the program counter directly after fetching, and only then executes
the branch instruction (by changing the program counter again).
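As a concrete reading of this convention, here is a small Python sketch; the
helper names mirror those in the safety property, but the definitions are our
own simplification. The point is that $src$ is the address {\em after} the
branch instruction, because the program counter has already been incremented
past it when the displacement is applied.

```python
# Span checks following the MCS-51 convention: src is the program
# counter *after* fetching the branch instruction. These definitions
# are simplified stand-ins for short_jump_valid / absolute_jump_valid.

def short_jump_valid(src, dest):
    # SJMP: signed 8-bit displacement relative to the incremented pc.
    return -128 <= dest - src <= 127

def absolute_jump_valid(src, dest):
    # AJMP: an 11-bit target replaces the low bits of the incremented
    # pc, so src and dest must lie in the same 2 KB block.
    return src // 2048 == dest // 2048

# A 2-byte branch at address 0x80 can reach back as far as 0x82 - 128.
assert short_jump_valid(0x80 + 2, 0x02)
assert not short_jump_valid(0x80 + 2, 0x01)
```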

There are also some simple properties that ensure that our policy
remains consistent, and that keep track of whether the fixed point has been
reached. We do not include them here in detail. Two of these properties give
the values of $\sigma$ for the start and end of the program: $\sigma(0) = 0$,
and $\sigma(n)$, where $n$ is the number of instructions up until now, is
equal to the maximum memory address so far. There are also two properties that
deal with what happens when the previous iteration does not change with
respect to the current one. $added$ is a variable that keeps track of the
number of bytes we have added to the program size by changing the encoding of
branch instructions. If $added$ is 0, the program has not changed and vice
versa.

%{\small
%\begin{align*}
%& \mathtt{lookup}\ 0\ (\mathtt{snd}\ strie) = 0 \notag\\
%& \mathtt{lookup}\ |prefix|\ (\mathtt{snd}\ strie) = \mathtt{fst}\ strie
%\end{align*}}

%{\small
%\begin{align*}
%& added = 0\ \rightarrow\ \mathtt{policy\_pc\_equal}\ prefix\ old\_strie\ strie \notag\\
%& \mathtt{policy\_jump\_equal}\ prefix\ old\_strie\ strie\ \rightarrow\ added = 0
%\end{align*}}

We need to use two different formulations, because the fact that $added$ is 0
does not guarantee that no branch instructions have changed.  For instance,
it is possible that we have replaced a short jump with an absolute jump, which
does not change the size of the branch instruction. Therefore
{\tt policy\_pc\_equal} states that $old\_sigma_1(x) = sigma_1(x)$, whereas
{\tt policy\_jump\_equal} states that $old\_sigma_2(x) = sigma_2(x)$. This
formulation is sufficient to prove termination and compactness.
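The MCS-51 instruction sizes make this distinction concrete: SJMP and AJMP are
both two bytes long, so re-encoding a short jump as an absolute jump leaves
$added$ at 0 even though the policy has changed. A minimal check, with the
size table taken from the MCS-51 jump instructions listed earlier:

```python
# Jump sizes in bytes, per the MCS-51 instruction set.
SIZE = {"short_jump": 2, "absolute_jump": 2, "long_jump": 3}

# Re-encoding short -> absolute adds no bytes, so added = 0 does not
# imply that the jump encodings are unchanged: policy_pc_equal and
# policy_jump_equal are genuinely distinct properties.
assert SIZE["absolute_jump"] - SIZE["short_jump"] == 0
```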

Proving these invariants is simple, usually by induction on the prefix length.

\subsection{Iteration invariants}

These are invariants that hold after the completion of an iteration. The main
difference between these invariants and the fold invariants is that after the
completion of the fold, we check whether the program size does not exceed
64 Kb, the maximum memory size the MCS-51 can address. The type of an
iteration therefore becomes an option type: {\tt None} in case the program
becomes larger than 64 Kb, or $\mathtt{Some}\ \sigma$ otherwise. We also no
longer pass along the number of bytes added to the program size, but a boolean
that indicates whether we have changed something during the iteration or not.

If the iteration returns {\tt None}, meaning that the program has become too
large for memory, there is an invariant that states that the previous
iteration cannot have every branch instruction encoded as a long jump. This is
needed later in the proof of termination. If the iteration returns
$\mathtt{Some}\ \sigma$, the fold invariants are retained without change.

Instead of using {\tt sigma\_compact\_unsafe}, we can now use the proper
invariant:
%
{\small
\begin{alignat*}{6}
\mathtt{sigma} & \omit\rlap{$\mathtt{\_compact} \equiv \lambda program.\lambda sigma.$} \notag\\
& \omit\rlap{$\forall n.n < |program|\ \rightarrow$} \notag\\
& \mathbf{match}\ && \omit\rlap{$\mathtt{lookup\_opt}\ n\ (\mathtt{snd}\ sigma)\ \mathbf{with}$}\notag\\
&&& \omit\rlap{$\mathrm{None}\ \Rightarrow\ \mathrm{False}$}\notag\\
&&& \omit\rlap{$\mathrm{Some}\ \langle pc, j \rangle \Rightarrow$}\notag\\
&&& \mathbf{match}\ && \mathtt{lookup\_opt}\ (n+1)\ (\mathtt{snd}\ sigma)\ \mathbf{with}\notag\\
&&&&& \mathrm{None}\ \Rightarrow\ \mathrm{False}\notag\\
&&&&& \mathrm{Some}\ \langle pc_1, j_1 \rangle \Rightarrow\notag\\
&&&&& \ \ pc_1 = pc + \mathtt{instruction\_size}\ n\ (\mathtt{nth}\ n\ program)
\end{alignat*}}
%
This is almost the same invariant as {\tt sigma\_compact\_unsafe}, but differs
in that it computes the sizes of branch instructions by looking at the distance
between position and destination using $\sigma$. In actual use, the invariant
is qualified: $\sigma$ is compact if there have been no changes (i.e. the
boolean passed along is {\tt true}). This is to reflect the fact that we are
doing a least fixed point computation: the result is only correct when we have
reached the fixed point.

There is another, trivial, invariant in case the iteration returns
$\mathtt{Some}\ \sigma$: it must hold that $\mathtt{fst}\ sigma < 2^{16}$.
We need this invariant to make sure that addresses do not overflow.

The proof of {\tt nec\_plus\_ultra} goes as follows: if we return {\tt None},
then the program size must be greater than 64 Kb. However, since the
previous iteration did not return {\tt None} (because otherwise we would
terminate immediately), the program size in the previous iteration must have
been smaller than 64 Kb.

Suppose that all the branch instructions in the previous iteration are
encoded as long jumps. This means that all branch instructions in this
iteration are long jumps as well, and therefore that both iterations are equal
in the encoding of their branch instructions. Per the invariant, this means that
$added = 0$, and therefore that all addresses in both iterations are equal.
But if all addresses are equal, the program sizes must be equal too, which
means that the program size in the current iteration must be smaller than
64 Kb. This contradicts the earlier hypothesis, hence not all branch
instructions in the previous iteration are encoded as long jumps.

The proof of {\tt sigma\_compact} follows from {\tt sigma\_compact\_unsafe} and
the fact that we have reached a fixed point, i.e. the previous iteration and
the current iteration are the same. This means that the results of
{\tt instruction\_size\_jmplen} and {\tt instruction\_size} are the same.

\subsection{Final properties}

These are the invariants that hold after $2n$ iterations, where $n$ is the
program size (we use the program size for convenience; we could also use the
number of branch instructions, but this is more complex). Here, we only
need {\tt out\_of\_program\_none}, {\tt sigma\_compact} and the fact that
$\sigma(0) = 0$.

Termination can now be proved using the fact that there is a $k \leq 2n$, with
$n$ the length of the program, such that iteration $k$ is equal to iteration
$k+1$. There are two possibilities: either there is a $k < 2n$ such that this
property holds, or every iteration up to $2n$ is different. In the latter case,
since the only changes between the iterations can be from shorter jumps to
longer jumps, in iteration $2n$ every branch instruction must be encoded as
a long jump. In this case, iteration $2n$ is equal to iteration $2n+1$ and the
fixed point is reached.

\section{Conclusion}

In the previous sections we have discussed the branch displacement optimisation
problem, presented an optimised solution, and discussed the proof of
termination and correctness for this algorithm, as formalised in Matita.

The algorithm we have presented is fast and correct, but not optimal; a truly
optimal solution would need techniques like constraint solvers. While outside
the scope of the present research, it would be interesting to see if enough
heuristics could be found to make such a solution practical to implement
in an existing compiler; this would be especially useful for embedded systems,
where it is important to have as small a solution as possible.

In itself the algorithm is already useful, as it results in a smaller solution
than the simple `every branch instruction is long' approach used up until
now---and with only 64 Kb of memory, every byte counts. It also results in a
smaller solution than the greatest fixed point algorithm that {\tt gcc} uses.
It does this without sacrificing speed or correctness.

The certification of an assembler that relies on the branch displacement
algorithm described in this paper was presented in~\cite{lastyear}.
The assembler computes the $\sigma$ map as described in this paper and
then works in two passes. In the first pass it builds a map
from instruction labels to addresses in the assembly code. In the
second pass it iterates over the code, translating every pseudo jump
at address $src$ to a label $l$ associated to the assembly instruction at
address $dst$ into a jump of the size dictated by $(\sigma\ src)$ to
$(\sigma\ dst)$. In case of conditional jumps, the translated jump may be
implemented with a series of instructions.
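The two passes can be sketched as follows. This is a hypothetical Python
rendering, not the Matita code: the pseudo-instruction representation and the
shape of $\sigma$ (a map from pseudo-addresses to pairs of an address and a
jump length) are our own simplification.

```python
# Hypothetical sketch of the two assembly passes. sigma maps each
# pseudo-address to a pair (address, jump_length), as in the paper.

def assemble(program, sigma):
    # Pass 1: map every instruction label to its pseudo-address.
    labels = {}
    for ppc, instr in enumerate(program):
        for lab in instr.get("labels", []):
            labels[lab] = ppc
    # Pass 2: translate each pseudo jump into a jump of the size
    # dictated by sigma, targeting the address that sigma assigns
    # to the labelled instruction.
    out = []
    for ppc, instr in enumerate(program):
        if instr["kind"] == "jump":
            _, length = sigma[ppc]
            dest_addr, _ = sigma[labels[instr["target"]]]
            out.append((length, dest_addr))
        else:
            out.append(("instr", instr["op"]))
    return out
```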

The proof of correctness abstracts over the algorithm used and only relies on
{\tt sigma\_policy\_specification} (page~\ref{sigmapolspec}). It is a variation
of a standard 1-to-many forward simulation proof~\cite{Leroy2009}. The
relation $R$ between states just maps every code address $ppc$ stored in
registers or memory to $(\sigma\ ppc)$. To identify the code addresses,
an additional data structure is always kept together with the source
state and is updated by the semantics. The semantics is preserved
only for those programs whose source code operations
$(f\ ppc_1\ \ldots\ ppc_n)$ applied to code addresses $ppc_1 \ldots ppc_n$ are
such that $(f\ (\sigma\ ppc_1)\ \ldots\ (\sigma\ ppc_n)) = (f\ ppc_1\ \ldots\ ppc_n)$. For
example, an injective $\sigma$ preserves a binary equality test $f$ for code
addresses, but not pointer subtraction.

The main lemma (fetching simulation), which relies on\\
{\tt sigma\_policy\_specification} and is established by structural induction
over the source code, says that fetching an assembly instruction at
position $ppc$ is equal to fetching the translation of the instruction at
position $(\sigma\ ppc)$, and that the new incremented program counter is at
the beginning of the next instruction (compactness). The only exception is
when the instruction fetched is placed at the end of code memory and is
followed only by dead code. Execution simulation is trivial because of the
restriction over well behaved programs w.r.t. $\sigma$. The condition
$\sigma\ 0 = 0$ is necessary because the hardware model prescribes that the
first instruction to be executed will be at address 0. For the details
see~\cite{lastyear}.

Instead of verifying the algorithm directly, another solution to the problem
would be to run an optimisation algorithm, and then verify the safety of the
result using a verified validator. Such a validator would be easier to verify
than the algorithm itself and it would also be efficient, requiring only a
linear pass over the source code to test the specification. However, it is
surely also interesting to formally prove that the assembler never rejects
programs that should be accepted, i.e. that the algorithm itself is correct.
This is the topic of the current paper.

\subsection{Related work}

As far as we are aware, this is the first formal discussion of the branch
displacement optimisation algorithm.

The CompCert project is another verified compiler project.
Their backend~\cite{Leroy2009} generates assembly code for (amongst others)
subsets of the PowerPC and x86 (32-bit) architectures. At the assembly code
stage, there is no distinction between the span-dependent jump instructions,
so a branch displacement optimisation algorithm is not needed.

%An offshoot of the CompCert project is the CompCertTSO project, which adds
%thread concurrency and synchronisation to the CompCert compiler~\cite{TSO2011}.
%This compiler also generates assembly code and therefore does not include a
%branch displacement algorithm.

%Finally, there is also the Piton stack~\cite{Moore1996}, which not only includes the
%formal verification of a compiler, but also of the machine architecture
%targeted by that compiler, a microprocessor called the FM9001.
%However, this architecture does not have different
%jump sizes (branching is simulated by assigning values to the program counter),
%so the branch displacement problem is irrelevant.

\bibliography{biblio}
% Papers/sttt/problem.tex
\section{Introduction}

The problem of branch displacement optimisation, also known as jump encoding, is
a well-known problem in assembler design~\cite{Hyde2006}. Its origin lies in the
fact that in many instruction sets, the encoding (and therefore size) of some
instructions depends on the distance to their operand (the instruction `span').
The branch displacement optimisation problem consists of encoding these
span-dependent instructions in such a way that the resulting program is as
small as possible.

This problem is the subject of the present paper. After introducing the problem
in more detail, we will discuss the solutions used by other compilers, present
the algorithm we use in the CerCo assembler, and discuss its verification,
that is, the proofs of termination and correctness using the Matita proof
assistant~\cite{Asperti2007}.

Formulating the final statement of correctness and finding the loop invariants
have been non-trivial tasks and are, indeed, the main contribution of this
paper. It has required considerable care and fine-tuning to formulate not
only the minimal statement required for the subsequent proof of correctness of
the assembler, but also the minimal set of invariants needed for the proof
of correctness of the algorithm.

The research presented in this paper has been executed within the CerCo
project, which aims at formally verifying a C compiler with cost annotations.
The target architecture for this project is the MCS-51, whose instruction set
contains span-dependent instructions. Furthermore, its maximum addressable
memory size is very small (64 Kb), which makes it important to generate
programs that are as small as possible. With this optimisation, however, comes
increased complexity and hence increased possibility for error. We must make
sure that the branch instructions are encoded correctly, otherwise the
assembled program will behave unpredictably.

All Matita files related to this development can be found on the CerCo
website, \url{http://cerco.cs.unibo.it}. The specific part that contains the
branch displacement algorithm is in the {\tt ASM} subdirectory, in the files
{\tt PolicyFront.ma}, {\tt PolicyStep.ma} and {\tt Policy.ma}.

\section{The branch displacement optimisation problem}

In most modern instruction sets that have them, the only span-dependent
instructions are branch instructions. Taking the ubiquitous x86-64 instruction
set as an example, we find that it contains eleven different forms of the
unconditional branch instruction, all with different ranges, instruction sizes
and semantics (only six are valid in 64-bit mode, for example). Some examples
are shown in Figure~\ref{f:x86jumps} (see also~\cite{IntelDev}).

\begin{figure}[h]
\begin{center}
\begin{tabular}{|l|l|l|}
\hline
Instruction & Size (bytes) & Displacement range \\
\hline
Short jump & 2 & -128 to 127 bytes \\
Relative near jump & 5 & $-2^{31}$ to $2^{31}-1$ bytes \\
Absolute near jump & 6 & one segment (64-bit address) \\
Far jump & 8 & entire memory (indirect jump) \\
\hline
\end{tabular}
\end{center}
\caption{List of x86 branch instructions}
\label{f:x86jumps}
\end{figure}

The chosen target architecture of the CerCo project is the Intel MCS-51, which
features three types of branch instructions (or jump instructions; the two terms
are used interchangeably), as shown in Figure~\ref{f:mcs51jumps}.

\begin{figure}[h]
\begin{center}
\begin{tabular}{|l|l|l|l|}
\hline
Instruction & Size    & Execution time & Displacement range \\
            & (bytes) & (cycles) & \\
\hline
SJMP (`short jump') & 2 & 2 & -128 to 127 bytes \\
AJMP (`absolute jump') & 2 & 2 & one segment (11-bit address) \\
LJMP (`long jump') & 3 & 3 & entire memory \\
\hline
\end{tabular}
\end{center}
\caption{List of MCS-51 branch instructions}
\label{f:mcs51jumps}
\end{figure}

Conditional branch instructions are only available in short form, which
means that a conditional branch outside the short address range has to be
encoded using three branch instructions (for instructions whose logical
negation is available, it can be done with two branch instructions, but for
some instructions this is not the case). The call instruction is
only available in absolute and long forms.
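A hypothetical sketch of this expansion follows; the mnemonics are real MCS-51
instructions, but operand details are elided and the relative offsets shown
simply skip over the following instruction.

```python
# Expanding an out-of-range conditional branch. When the logical
# negation of the condition exists, two instructions suffice; for
# instructions such as DJNZ or CJNE, three are needed.
NEGATION = {"JZ": "JNZ", "JNZ": "JZ", "JC": "JNC", "JNC": "JC"}

def expand_conditional(op, target):
    if op in NEGATION:
        # Invert the test and hop over a 3-byte LJMP.
        return [(NEGATION[op], "+3"), ("LJMP", target)]
    # No negation available: branch into a nearby trampoline holding
    # the LJMP, and fall through (via SJMP) past it otherwise.
    return [(op, "+2"), ("SJMP", "+3"), ("LJMP", target)]
```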
    89 
    90 Note that even though the MCS-51 architecture is much less advanced and much
    91 simpler than the x86-64 architecture, the basic types of branch instruction
    92 remain the same: a short jump with a limited range, an intra-segment jump and a
    93 jump that can reach the entire available memory.
    94  
    95 Generally, in code fed to the assembler as input, the only
    96 difference between branch instructions is semantics, not span. This
    97 means that a distinction is made between an unconditional branch and the
    98 several kinds of conditional branch, but not between their short, absolute or
    99 long variants.
    100 
    101 The algorithm used by the assembler to encode these branch instructions into
    102 the different machine instructions is known as the {\em branch displacement
    103 algorithm}. The optimisation problem consists of finding as small an encoding as
    104 possible, thus minimising program length and execution time.
    105 
    106 Similar problems, e.g. the branch displacement optimisation problem for other
    107 architectures, are known to be NP-complete~\cite{Robertson1979,Szymanski1978},
    108 which could make finding an optimal solution very time-consuming.
    109 
    110 The canonical solution, as shown by Szymanski~\cite{Szymanski1978} or more
    111 recently by Dickson~\cite{Dickson2008} for the x86 instruction set, is to use a
    112 fixed point algorithm that starts with the shortest possible encoding (all
    113 branch instructions encoded as short jumps, which is likely not a correct
    114 solution) and then iterates over the source to re-encode those branch
    115 instructions whose target is outside their range.
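A minimal sketch of this fixed-point scheme, restricted to the two-encoding (short/long) case, might look as follows; the sizes, ranges and data representation are illustrative assumptions rather than the algorithm verified in Matita:

```python
# Illustrative sketch of the classic fixed-point approach: start with
# every branch short, then repeatedly grow the ones whose target is out
# of range until nothing changes. Sizes and ranges are assumptions.

SHORT, LONG = 2, 3                # instruction sizes in bytes (assumed)
SHORT_RANGE = range(-128, 128)    # reachable displacements for a short jump

def fix_point(jumps, n_instr, other_size=1):
    """jumps: {index: target_index}. Returns {index: 'short'|'long'}."""
    enc = {i: SHORT for i in jumps}          # shortest possible encoding
    while True:
        # compute instruction addresses under the current encoding
        addr, pc = {}, 0
        for i in range(n_instr):
            addr[i] = pc
            pc += enc.get(i, other_size)
        changed = False
        for i, t in jumps.items():
            disp = addr[t] - (addr[i] + enc[i])   # span measured after fetch
            if enc[i] == SHORT and disp not in SHORT_RANGE:
                enc[i] = LONG                     # grow; never shrink
                changed = True
        if not changed:                           # fixed point reached
            return {i: ("short" if s == SHORT else "long")
                    for i, s in enc.items()}
```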
    116 
    117 \subsection*{Adding absolute jumps}
    118 
    119 In both papers mentioned above, the encoding of a jump is only dependent on the
    120 distance between the jump and its target: below a certain value a short jump
    121 can be used; above this value the jump must be encoded as a long jump.
    122 
    123 Here, termination of the smallest fixed point algorithm is easy to prove. All
    124 branch instructions start out encoded as short jumps, which means that the
    125 distance between any branch instruction and its target is as short as possible
    126 (all the intervening jumps are short).
    127 If, in this situation, there is a branch instruction $b$ whose span is not
    128 within the range for a short jump, we can be sure that we can never reach a
    129 situation where the span of $b$ is so small that it can be encoded as a short
    130 jump. This argument continues to hold throughout the subsequent iterations of
    131 the algorithm: short jumps can change into long jumps, but not \emph{vice versa},
    132 as spans only increase. Hence, the algorithm terminates: either a fixed point
    133 is reached early, or eventually all short jumps have been changed into long jumps.
    134 
    135 Also, we can be certain that we have reached an optimal solution: a short jump
    136 is only changed into a long jump if it is absolutely necessary.
    137 
    138 However, neither of these claims (termination nor optimality) hold when we add
    139 the absolute jump. With absolute jumps, the encoding of a branch
    140 instruction no longer depends only on the distance between the branch
    141 instruction and its target. An absolute jump is possible when instruction and
    142 target are in the same segment (for the MCS-51, this means that the upper 5
    143 bits of their 16-bit addresses have to be equal). It is therefore entirely possible
    144 for two branch instructions with the same span to be encoded in different ways
    145 (absolute if the branch instruction and its target are in the same segment,
    146 long if this is not the case).
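Under the assumption that a segment corresponds to sharing the upper 5 bits of a 16-bit address (a 2 KB page), the segment test can be pictured as follows; the real hardware compares against the address of the instruction following the jump, a detail elided in this sketch:

```python
def same_segment(src, dest):
    """Sketch: two 16-bit addresses lie in the same 2 KB segment exactly
    when their upper 5 bits coincide (an assumption-level simplification)."""
    return (src >> 11) == (dest >> 11)
```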
    147 
    148 \begin{figure}[t]
    149 \begin{subfigure}[b]{.45\linewidth}
    150 \small
    151 \begin{alltt}
    152     jmp X
    153     \ldots
    154 L\(\sb{0}\): \ldots
    155 % Start of new segment if
    156 % jmp X is encoded as short
    157     \ldots
    158     jmp L\(\sb{0}\)
    159 \end{alltt}
    160 \caption{Example of a program where a long jump becomes absolute}
    161 \label{f:term_example}
    162 \end{subfigure}
    163 \hfill
    164 \begin{subfigure}[b]{.45\linewidth}
    165 \small
    166 \begin{alltt}
    167 L\(\sb{0}\): jmp X
    168 X:  \ldots
    169     \ldots
    170 L\(\sb{1}\): \ldots
    171 % Start of new segment if
    172 % jmp X is encoded as short
    173     \ldots
    174     jmp L\(\sb{1}\)
    175     \ldots
    176     jmp L\(\sb{1}\)
    177     \ldots
    178     jmp L\(\sb{1}\)
    179     \ldots
    180 \end{alltt}
    181 \caption{Example of a program where the fixed-point algorithm is not optimal}
    182 \label{f:opt_example}
    183 \end{subfigure}
    184 \end{figure}
    185 
    186 This invalidates our earlier termination argument: a branch instruction, once encoded
    187 as a long jump, can be re-encoded during a later iteration as an absolute jump.
    188 Consider the program shown in Figure~\ref{f:term_example}. At the start of the
    189 first iteration, both the branch to {\tt X} and the branch to $\mathtt{L}_{0}$
    190 are encoded as short jumps. Let us assume that in this case, the placement of
    191 $\mathtt{L}_{0}$ and the branch to it are such that $\mathtt{L}_{0}$ is just
    192 outside the segment that contains this branch. Let us also assume that the
    193 distance between $\mathtt{L}_{0}$ and the branch to it is too large for the
    194 branch instruction to be encoded as a short jump.
    195 
    196 All this means that in the second iteration, the branch to $\mathtt{L}_{0}$ will
    197 be encoded as a long jump. If we assume that the branch to {\tt X} is encoded as
    198 a long jump as well, the size of the branch instruction will increase and
    199 $\mathtt{L}_{0}$ will be `propelled' into the same segment as its branch
    200 instruction, because every subsequent instruction will move one byte forward.
    201 Hence, in the third iteration, the branch to $\mathtt{L}_{0}$ can be encoded as
    202 an absolute jump. At first glance, there is nothing that prevents us from
    203 constructing a configuration where two branch instructions interact in such a
    204 way as to iterate indefinitely between long and absolute encodings.
    205 
    206 This situation mirrors the explanation by Szymanski~\cite{Szymanski1978} of why
    207 the branch displacement optimisation problem is NP-complete. In this explanation,
    208 a condition for NP-completeness is that programs are allowed to contain
    209 {\em pathological} jumps. These are branch instructions that cannot normally be
    210 encoded as a short(er) jump, but gain this property when some other branch
    211 instructions are encoded as a long(er) jump. This is exactly what happens in
    212 Figure~\ref{f:term_example}. By encoding the first branch instruction as a long
    213 jump, another branch instruction switches from long to absolute (which is
    214 shorter).
    215 
    216 In addition, our previous optimality argument no longer holds. Consider the
    217 program shown in Figure~\ref{f:opt_example}. Suppose that the distance between
    218 $\mathtt{L}_{0}$ and $\mathtt{L}_{1}$ is such that if {\tt jmp X} is encoded
    219 as a short jump, there is a segment border just after $\mathtt{L}_{1}$. Let
    220 us also assume that all three branches to $\mathtt{L}_{1}$ are in the same
    221 segment, but far enough away from $\mathtt{L}_{1}$ that they cannot be encoded
    222 as short jumps.
    223 
    224 Then, if {\tt jmp X} were to be encoded as a short jump, which is clearly
    225 possible, all of the branches to $\mathtt{L}_{1}$ would have to be encoded as
    226 long jumps. However, if {\tt jmp X} were to be encoded as a long jump, and
    227 therefore increase in size, $\mathtt{L}_{1}$ would be `propelled' across the
    228 segment border, so that the three branches to $\mathtt{L}_{1}$ could be encoded
    229 as absolute jumps. Depending on the relative sizes of long and absolute jumps,
    230 this solution might actually be smaller than the one reached by the smallest
    231 fixed point algorithm.
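Using the sizes from Figure~\ref{f:mcs51jumps} (SJMP and AJMP take 2 bytes, LJMP takes 3), the four branch instructions of Figure~\ref{f:opt_example} can be totalled to make the comparison concrete; the bookkeeping below is purely illustrative:

```python
# Branch-instruction bytes only; the rest of the program is identical in
# both scenarios. Sizes follow the MCS-51 table in the text.
SJMP, AJMP, LJMP = 2, 2, 3

# jmp X kept short: L1 ends up beyond the segment border, so the three
# branches to L1 must all be encoded as long jumps.
fixed_point_total = SJMP + 3 * LJMP    # 2 + 9 = 11 bytes

# jmp X made long: L1 is `propelled' across the border into the same
# segment as the three branches, which then fit as absolute jumps.
alternative_total = LJMP + 3 * AJMP    # 3 + 6 = 9 bytes
```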
  • Papers/sttt/proof.tex

    1 \section{The proof}
    2 
    3 In this section, we present the correctness proof for the algorithm in more
    4 detail. The main correctness statement is shown, slightly simplified, in~Figure~\ref{statement}.
    5 %
    6 \begin{figure}[t]
    7 \small
    8 \begin{alignat*}{6}
    9 \mathtt{sigma}&\omit\rlap{$\mathtt{\_policy\_specification} \equiv
    10 \lambda program.\lambda sigma.$} \notag\\
    11   & \omit\rlap{$sigma\ 0 = 0\ \wedge$} \notag\\
    12         & \mathbf{let}\ & & \omit\rlap{$instr\_list \equiv code\ program\ \mathbf{in}$} \notag\\
    13         &&& \omit\rlap{$\forall ppc.ppc < |instr\_list| \rightarrow$} \notag\\
    14         &&& \mathbf{let}\ && pc \equiv sigma\ ppc\ \mathbf{in} \notag\\
    15         &&& \mathbf{let}\ && instruction \equiv \mathtt{fetch\_pseudo\_instruction}\ instr\_list\ ppc\ \mathbf{in} \notag\\
    16         &&& \mathbf{let}\ && next\_pc \equiv sigma\ (ppc+1)\ \mathbf{in}\notag\\
    17         &&&&& next\_pc = pc + \mathtt{instruction\_size}\ sigma\ ppc\ instruction\ \wedge\notag\\
    18   &&&&& (pc + \mathtt{instruction\_size}\ sigma\ ppc\ instruction < 2^{16}\ \vee\notag\\
    19   &&&&& (\forall ppc'.ppc' < |instr\_list| \rightarrow ppc < ppc' \rightarrow \notag\\
    20         &&&&& \mathbf{let}\ instruction' \equiv \mathtt{fetch\_pseudo\_instruction}\ instr\_list\ ppc'\ \mathbf{in} \notag\\
    21         &&&&&\ \mathtt{instruction\_size}\ sigma\ ppc'\ instruction' = 0)\ \wedge \notag\\
    22         &&&&& pc + \mathtt{instruction\_size}\ sigma\ ppc\ instruction = 2^{16})
    23 \end{alignat*}
    24 \caption{Main correctness statement\label{statement}}
    25 \label{sigmapolspec}
    26 \end{figure}
    27 %
    28 Informally, this means that when fetching a pseudo-instruction at $ppc$, the
    29 translation by $\sigma$ of $ppc+1$ is the same as $\sigma(ppc)$ plus the size
    30 of the instruction at $ppc$.  That is, an instruction is placed consecutively
    31 after the previous one, and there are no overlaps. The rest of the statement deals with memory size: either the next instruction fits within memory ($next\_pc < 2^{16}$) or it ends exactly at the memory limit,
    32 in which case it must be the last translated instruction in the program (enforced by specifying that the size of all subsequent instructions is 0: there may be comments or cost annotations that are not translated).
    33 
    34 Finally, we enforce that the program starts at address 0, i.e. $\sigma(0) = 0$. It may seem strange that we do not explicitly include a safety property stating that every jump instruction is of the right type with respect to its target (akin to the lemma from Figure~\ref{sigmasafe}), but this is not necessary. The distance is recalculated according to the instruction addresses from $\sigma$, which implicitly expresses safety.
    35 
    36 Since our computation is a least fixed point computation, we must prove
    37 termination in order to prove correctness: if the algorithm is halted after
    38 a number of steps without reaching a fixed point, the solution is not
    39 guaranteed to be correct. More specifically, branch instructions might be
    40 encoded which do not coincide with the span between their location and their
    41 destination.
    42 
    43 Proof of termination rests on the fact that the encoding of branch
    44 instructions can only grow larger, which means that we must reach a fixed point
    45 after at most $2n$ iterations, with $n$ the number of branch instructions in
    46 the program. This worst case is reached if at every iteration, we change the
    47 encoding of exactly one branch instruction; since the encoding of any branch
    48 instruction can change first from short to absolute, and then to long, there
    49 can be at most $2n$ changes.
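The $2n$ bound can be read off the encoding chain directly; a trivial sketch, assuming the three-element chain short $\to$ absolute $\to$ long:

```python
# Encodings only ever move forward along this chain, never back.
CHAIN = ["short", "absolute", "long"]

def max_encoding_changes(n_branches):
    # each branch instruction can grow at most len(CHAIN) - 1 = 2 times,
    # so a run changing one encoding per iteration takes at most 2n steps
    return (len(CHAIN) - 1) * n_branches
```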
    50 
    51 %The proof has been carried out using the ``Russell'' style from~\cite{Sozeau2006}.
    52 %We have proven some invariants of the {\sc f} function from the previous
    53 %section; these invariants are then used to prove properties that hold for every
    54 %iteration of the fixed point computation; and finally, we can prove some
    55 %properties of the fixed point.
    56 
    57 \subsection{Fold invariants}
    58 
    59 In this section, we present the invariants that hold during the fold of {\sc f}
    60 over the program. These will be used later on to prove the properties of the
    61 iteration. During the fixed point computation, the $\sigma$ function is
    62 implemented as a trie for ease of access; computing $\sigma(x)$ is achieved by
    63 looking up the value of $x$ in the trie. Actually, during the fold, the value
    64 we pass along is a pair $\mathbb{N} \times \mathtt{ppc\_pc\_map}$. The first
    65 component is the number of bytes added to the program so far with respect to
    66 the previous iteration, and the second component, {\tt ppc\_pc\_map}, is the
    67 actual $\sigma$ trie (which we'll call $strie$ to avoid confusion).
    68 %
    69 {\small
    70 \begin{alignat*}{2}
    71 \mathtt{out} & \mathtt{\_of\_program\_none} \equiv \lambda prefix.\lambda strie. \notag\\
    72 & \forall i.i < 2^{16} \rightarrow (i > |prefix| \leftrightarrow
    73  \mathtt{lookup\_opt}\ i\ (\mathtt{snd}\ strie) = \mathtt{None})
    74 \end{alignat*}}
    75 %
    76 The first invariant states that any pseudo-address not yet examined is not
    77 present in the lookup trie.
    78 %
    79 {\small
    80 \begin{alignat*}{2}
    81 \mathtt{not} & \mathtt{\_jump\_default} \equiv \lambda prefix.\lambda strie.\forall i.i < |prefix| \rightarrow\notag\\
    82 & \neg\mathtt{is\_jump}\ (\mathtt{nth}\ i\ prefix) \rightarrow \mathtt{lookup}\ i\ (\mathtt{snd}\ strie) = \mathtt{short\_jump}
    83 \end{alignat*}}
    84 %
    85 This invariant states that when we try to look up the jump length of a
    86 pseudo-address where there is no branch instruction, we will get the default
    87 value, a short jump.
    88 %
    89 {\small
    90 \begin{alignat*}{4}
    91 \mathtt{jump} & \mathtt{\_increase} \equiv \lambda pc.\lambda op.\lambda p.\forall i.i < |prefix| \rightarrow \notag\\ 
    92 &       \mathbf{let}\  oj \equiv \mathtt{lookup}\ i\ (\mathtt{snd}\ op)\ \mathbf{in} \notag\\
    93 &       \mathbf{let}\ j \equiv \mathtt{lookup}\ i\ (\mathtt{snd}\ p)\ \mathbf{in}\ \mathtt{jmpleq}\ oj\ j
    94 \end{alignat*}}
    95 %
    96 This invariant states that between iterations (with $op$ being the previous
    97 iteration, and $p$ the current one), jump lengths either remain equal or
    98 increase. It is needed for proving termination.
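A plausible reading of {\tt jmpleq}, assumed here to be the total order short $\le$ absolute $\le$ long, makes the monotonicity statement executable:

```python
# Assumed ordering of jump lengths; jump_increase then says that the
# lookup at every pseudo-address is monotone from one iteration to the next.
ORDER = {"short_jump": 0, "absolute_jump": 1, "long_jump": 2}

def jmpleq(oj, j):
    """oj: jump length in the previous iteration, j: in the current one."""
    return ORDER[oj] <= ORDER[j]
```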
    99 %
    100 \begin{figure}[h]
    101 \small
    102 \begin{alignat*}{6}
    103 \mathtt{sigma} & \omit\rlap{$\mathtt{\_compact\_unsafe} \equiv \lambda prefix.\lambda strie.\forall n.n < |prefix| \rightarrow$}\notag\\
    104 & \mathbf{match}\ && \omit\rlap{$\mathtt{lookup\_opt}\ n\ (\mathtt{snd}\ strie)\ \mathbf{with}$}\notag\\
    105 &&& \omit\rlap{$\mathtt{None} \Rightarrow \mathrm{False}$} \notag\\
    106 &&& \omit\rlap{$\mathtt{Some}\ \langle pc, j \rangle \Rightarrow$} \notag\\
    107 &&& \mathbf{match}\ && \mathtt{lookup\_opt}\ (n+1)\ (\mathtt{snd}\ strie)\ \mathbf{with}\notag\\
    108 &&&&& \mathtt{None} \Rightarrow \mathrm{False} \notag\\
    109 &&&&& \mathtt{Some}\ \langle pc_1, j_1 \rangle \Rightarrow
    110                 pc_1 = pc + \notag\\
    111 &&&&& \ \ \mathtt{instruction\_size\_jmplen}\ j\ (\mathtt{nth}\ n\ prefix)
    112 \end{alignat*}
    113 \caption{Temporary safety property}
    114 \label{sigmacompactunsafe}
    115 \end{figure}
    116 %
    117 We now proceed with the safety lemmas. The lemma in
    118 Figure~\ref{sigmacompactunsafe} is a temporary formulation of the main
    119 property {\tt sigma\_policy\_specification}. Its main difference from the
    120 final version is that it uses {\tt instruction\_size\_jmplen} to compute the
    121 instruction size. This function uses $j$ to compute the span of branch
    122 instructions  (i.e. it uses the $\sigma$ under construction), instead
    123 of looking at the distance between source and destination. This is because
    124 $\sigma$ is still under construction; we will prove below that after the
    125 final iteration, {\tt sigma\_compact\_unsafe} is equivalent to the main
    126 property in Figure~\ref{sigmasafe} which holds at the end of the computation.
    127 %
    128 \begin{figure}[h]
    129 \small
    130 \begin{alignat*}{6}
    131 \mathtt{sigma} & \omit\rlap{$\mathtt{\_safe} \equiv \lambda prefix.\lambda labels.\lambda old\_strie.\lambda strie.\forall i.i < |prefix| \rightarrow$} \notag\\
    132 & \omit\rlap{$\forall dest\_label.\mathtt{is\_jump\_to\ (\mathtt{nth}\ i\ prefix})\ dest\_label \rightarrow$} \notag\\
    133 & \mathbf{let} && \omit\rlap{$\ paddr \equiv \mathtt{lookup}\ labels\ dest\_label\ \mathbf{in}$} \notag\\
    134 & \mathbf{let} && \omit\rlap{$\ \langle j, src, dest \rangle \equiv \mathbf{if} \ paddr\ \leq\ i\ \mathbf{then}$}\notag\\
    135 &&&&& \mathbf{let}\ \langle \_, j \rangle \equiv \mathtt{lookup}\ i\ (\mathtt{snd}\ strie)\ \mathbf{in} \notag\\
    136 &&&&& \mathbf{let}\ \langle pc\_plus\_jl, \_ \rangle \equiv \mathtt{lookup}\ (i+1)\ (\mathtt{snd}\ strie)\ \mathbf{in}\notag\\
    137 &&&&& \mathbf{let}\ \langle addr, \_ \rangle \equiv \mathtt{lookup}\ paddr\ (\mathtt{snd}\ strie)\ \mathbf{in}\notag\\
    138 &&&&&   \langle j, pc\_plus\_jl, addr \rangle\notag\\
    139 &&&\mathbf{else} \notag\\
    140 &&&&&\mathbf{let}\ \langle \_, j \rangle \equiv \mathtt{lookup}\ i\ (\mathtt{snd}\ strie)\ \mathbf{in} \notag\\
    141 &&&&&\mathbf{let}\ \langle pc\_plus\_jl, \_ \rangle \equiv \mathtt{lookup}\ (i+1)\ (\mathtt{snd}\ old\_strie)\ \mathbf{in}\notag\\
    142 &&&&&\mathbf{let}\ \langle addr, \_ \rangle \equiv \mathtt{lookup}\ paddr\ (\mathtt{snd}\ old\_strie)\ \mathbf{in}\notag\\
    143 &&&&&\langle j, pc\_plus\_jl, addr \rangle \mathbf{in}\ \notag\\
    144 &&&\mathbf{match} && \ j\ \mathbf{with} \notag\\
    145 &&&&&\mathrm{short\_jump} \Rightarrow \mathtt{short\_jump\_valid}\ src\ dest\notag\\
    146 &&&&&\mathrm{absolute\_jump} \Rightarrow \mathtt{absolute\_jump\_valid}\ src\ dest\notag\\
    147 &&&&&\mathrm{long\_jump} \Rightarrow \mathrm{True}
    148 \end{alignat*}
    149 \caption{Safety property}
    150 \label{sigmasafe}
    151 \end{figure}
    152 %
    153 We compute the distance using the memory address of the instruction
    154 plus its size. This follows the behaviour of the MCS-51 microprocessor, which
    155 increases the program counter directly after fetching, and only then executes
    156 the branch instruction (by changing the program counter again).
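As a small sketch of this convention (names are illustrative, not the Matita definitions), the span of a branch is measured from the address just past the instruction:

```python
def span(sigma_src, instr_size, sigma_dest):
    """Displacement as the MCS-51 computes it: the program counter has
    already been advanced past the branch instruction when it executes."""
    return sigma_dest - (sigma_src + instr_size)
```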
    157 
    158 There are also some simple properties to make sure that our policy
    159 remains consistent, and to keep track of whether the fixed point has been
    160 reached. We do not include them here in detail. Two of these properties give the values of $\sigma$ for the start and end of the program; $\sigma(0) = 0$ and $\sigma(n)$, where $n$ is the number of instructions up until now, is equal to the maximum memory address so far. There are also two properties that deal with what happens when the previous
    161 iteration does not change with respect to the current one. $added$ is a
    162 variable that keeps track of the number of bytes we have added to the program
    163 size by changing the encoding of branch instructions. If $added$ is 0, the program
    164 has not changed and vice versa.
    165 
    166 %{\small
    167 %\begin{align*}
    168 %& \mathtt{lookup}\ 0\ (\mathtt{snd}\ strie) = 0 \notag\\
    169 %& \mathtt{lookup}\ |prefix|\ (\mathtt{snd}\ strie) = \mathtt{fst}\ strie
    170 %\end{align*}}
    171 
    172 
    173 %{\small
    174 %\begin{align*}
    175 %& added = 0\ \rightarrow\ \mathtt{policy\_pc\_equal}\ prefix\ old\_strie\ strie \notag\\
    176 %& \mathtt{policy\_jump\_equal}\ prefix\ old\_strie\ strie\ \rightarrow\ added = 0
    177 %\end{align*}}
    178 
    179 We need to use two different formulations, because the fact that $added$ is 0
    180 does not guarantee that no branch instructions have changed.  For instance,
    181 it is possible that we have replaced a short jump with an absolute jump, which
    182 does not change the size of the branch instruction. Therefore {\tt policy\_pc\_equal} states that $old\_sigma_1(x) = sigma_1(x)$, whereas {\tt policy\_jump\_equal} states that $old\_sigma_2(x) = sigma_2(x)$. This formulation is sufficient to prove termination and compactness. 
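The distinction matters because a change of encoding need not change any address; a minimal illustration, assuming the MCS-51 sizes from the table:

```python
# SJMP and AJMP are both 2 bytes, so replacing one by the other changes
# the policy (the jump-length components) while adding 0 bytes.
SIZE = {"short_jump": 2, "absolute_jump": 2, "long_jump": 3}

old_policy = {0: "short_jump"}
new_policy = {0: "absolute_jump"}

added = SIZE[new_policy[0]] - SIZE[old_policy[0]]
```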
    183 
    184 Proving these invariants is simple, usually by induction on the prefix length.
    185 
    186 \subsection{Iteration invariants}
    187 
    188 These are invariants that hold after the completion of an iteration. The main
    189 difference between these invariants and the fold invariants is that after the
    190 completion of the fold, we check whether the program size does not exceed
    191 64 Kb, the maximum memory size the MCS-51 can address. The type of an iteration therefore becomes an option type: {\tt None} in case
    192 the program becomes larger than 64 Kb, or $\mathtt{Some}\ \sigma$
    193 otherwise. We also no longer pass along the number of bytes added to the
    194 program size, but a boolean that indicates whether we have changed something
    195 during the iteration or not.
    196 
    197 If the iteration returns {\tt None}, which means that the program has become too large for memory, there is an invariant that states that the previous iteration cannot
    198 have every branch instruction encoded as a long jump. This is needed later in the proof of termination. If the iteration returns $\mathtt{Some}\ \sigma$, the fold invariants are retained without change.
    199 
    200 Instead of using {\tt sigma\_compact\_unsafe}, we can now use the proper
    201 invariant:
    202 %
    203 {\small
    204 \begin{alignat*}{6}
    205 \mathtt{sigma} & \omit\rlap{$\mathtt{\_compact} \equiv \lambda program.\lambda sigma.$} \notag\\
    206 & \omit\rlap{$\forall n.n < |program|\ \rightarrow$} \notag\\
    207 & \mathbf{match}\ && \omit\rlap{$\mathtt{lookup\_opt}\ n\ (\mathtt{snd}\ sigma)\ \mathbf{with}$}\notag\\
    208 &&& \omit\rlap{$\mathrm{None}\ \Rightarrow\ \mathrm{False}$}\notag\\
    209 &&& \omit\rlap{$\mathrm{Some}\ \langle pc, j \rangle \Rightarrow$}\notag\\
    210 &&& \mathbf{match}\ && \mathtt{lookup\_opt}\ (n+1)\ (\mathtt{snd}\ sigma)\ \mathbf{with}\notag\\
    211 &&&&&   \mathrm{None}\ \Rightarrow\ \mathrm{False}\notag\\
    212 &&&&& \mathrm{Some}\ \langle pc_1, j_1 \rangle \Rightarrow\notag\\
    213 &&&&& \ \ pc_1 = pc + \mathtt{instruction\_size}\ n\ (\mathtt{nth}\ n\ program)
    214 \end{alignat*}}
    215 %
    216 This is almost the same invariant as ${\tt sigma\_compact\_unsafe}$, but differs in that it
    217 computes the sizes of branch instructions by looking at the distance between
    218 position and destination using $\sigma$. In actual use, the invariant is qualified: $\sigma$ is compact if there have
    219 been no changes (i.e. the boolean passed along is {\tt true}). This is to
    220 reflect the fact that we are doing a least fixed point computation: the result
    221 is only correct when we have reached the fixed point.
    222 
    223 There is another, trivial, invariant in case the iteration returns
    224 $\mathtt{Some}\ \sigma$: it must hold that $\mathtt{fst}\ sigma < 2^{16}$.
    225 We need this invariant to make sure that addresses do not overflow.
    226 
    227 The proof of {\tt nec\_plus\_ultra} goes as follows: if we return {\tt None},
    228 then the program size must be greater than 64 Kb. However, since the
    229 previous iteration did not return {\tt None} (because otherwise we would
    230 terminate immediately), the program size in the previous iteration must have
    231 been smaller than 64 Kb.
    232 
    233 Suppose that all the branch instructions in the previous iteration are
    234 encoded as long jumps. This means that all branch instructions in this
    235 iteration are long jumps as well, and therefore that both iterations are equal
    236 in the encoding of their branch instructions. Per the invariant, this means that
    237 $added = 0$, and therefore that all addresses in both iterations are equal.
    238 But if all addresses are equal, the program sizes must be equal too, which
    239 means that the program size in the current iteration must be smaller than
    240 64 Kb. This contradicts the earlier hypothesis, hence not all branch
    241 instructions in the previous iteration are encoded as long jumps.
    242 
    243 The proof of {\tt sigma\_compact} follows from {\tt sigma\_compact\_unsafe} and
    244 the fact that we have reached a fixed point, i.e. the previous iteration and
    245 the current iteration are the same. This means that the results of
    246 {\tt instruction\_size\_jmplen} and {\tt instruction\_size} are the same.
    247 
    248 \subsection{Final properties}
    249 
    250 These are the invariants that hold after $2n$ iterations, where $n$ is the
    251 program size (we use the program size for convenience; we could also use the
    252 number of branch instructions, but this is more complex). Here, we only
    253 need {\tt out\_of\_program\_none}, {\tt sigma\_compact} and the fact that
    254 $\sigma(0) = 0$.
    255 
    256 Termination can now be proved using the fact that there is a $k \leq 2n$, with
    257 $n$ the length of the program, such that iteration $k$ is equal to iteration
    258 $k+1$. There are two possibilities: either there is a $k < 2n$ such that this
    259 property holds, or every iteration up to $2n$ is different. In the latter case,
    260 since the only changes between the iterations can be from shorter jumps to
    261 longer jumps, in iteration $2n$ every branch instruction must be encoded as
    262 a long jump. In this case, iteration $2n$ is equal to iteration $2n+1$ and the
    263 fixed point is reached.