source: Papers/sttt/main.tex @ 3477

Last change on this file since 3477 was 3477, checked in by mulligan, 5 years ago

changes to intro, in progress...

  • Property svn:executable set to *
File size: 51.3 KB
Line 
1%\documentclass[twocolumn,draft]{svjour3}
2\documentclass[a4paper]{article}
3\usepackage{algpseudocode}
4%\usepackage{algorithmicx}
5\usepackage{alltt}
6\usepackage{amsfonts}
7\usepackage{amsmath}
8\usepackage[british]{babel}
9\usepackage{caption}
10\usepackage[colorlinks]{hyperref}
11\usepackage[utf8]{inputenc}
12\usepackage{listings}
13\usepackage{microtype}
14\usepackage{subcaption}
15
16\renewcommand{\verb}{\lstinline}
17\def\lstlanguagefiles{lst-grafite.tex}
18\lstset{language=Grafite}
19
20\begin{document}
21
22\title{On the proof of correctness of a verified optimising assembler
23\thanks{Research supported by the CerCo project, within the Future and Emerging Technologies (FET) programme of the Seventh Framework Programme for Research of the European Commission, under FET-Open grant number 243881}}
24\author{Jaap Boender \and Dominic P. Mulligan \and Claudio Sacerdoti Coen}
25%\institute{Department of Computer Science, University of Middlesex \and Computer Laboratory, University of Cambridge \and Dipartimento di Informatica, University of Bologna}
26
27\maketitle
28
29\begin{abstract}
30Optimising assemblers face the `branch displacement' or `jump encoding' problem, that is: how best to choose between concrete machine code jump instructions---typically of differing instruction and offset sizes---when expanding pseudo-instructions.
31Ideally, on space constrained hardware, an optimising assembler would choose the set of pseudo-instruction expansions that minimises the size of the resulting machine code program, a task that is provably \textsc{np}-hard.
32
33As part of CerCo (`Certified Complexity')---an \textsc{eu}-funded project to develop a verified concrete complexity preserving compiler for a large subset of the C programming language---we have implemented and proved correct an optimising assembler within the Matita interactive theorem prover.
34Our assembler targets the instruction set of a typical micro-controller, the Intel \textsc{mcs-51} series.
35As is common with embedded systems development this micro-controller has a paucity of available code memory and therefore we face an additional pressure in reducing the size of any assembled machine code program.
36Out of necessity, then, our assembler implements an algorithm for solving the branch displacement problem, and we have proved that this algorithm is correct.
37
38However, this efficient expansion of jump pseudoinstructions into machine code equivalents is complex, and therefore could unneccessarily complicate the proof of correctness for the rest of the assembler.
39We therefore isolate the decision making over how jumps should be expanded from the expansion process itself as much as possible using `policies', making the rest of the proof of correctness more straightforward.
40Our proof strategy contains a tracking facility for `good addresses' and only programs that use good addresses have their semantics preserved under assembly, as we observe that it is impossible for an assembler to preserve the semantics of every assembly program.
41Our strategy offers increased flexibility over the traditional approach to proving the correctness of assemblers, wherein addresses in assembly are kept opaque and immutable.
42In particular, our approach permits experimentation with the benign manipulation of addresses.
43
44We discuss wider issues associated with a proof of correctness for an assembler, detail our algorithm solving the `branch displacement' problem, and discuss our proof of correctness, employing `policies', for the assembler.
45
46%\keywords{Formal verification, interactive theorem proving, assembler, branch displacement optimisation, CerCo (`Certified Complexity'), MCS-51 microcontroller, Matita proof assistant}
47\end{abstract}
48
49\section{Introduction}
50
51We consider the formalisation of an assembler for the Intel MCS-51 8-bit microprocessor in the Matita proof assistant~\cite{asperti:user:2007}.
52This formalisation forms a major component of the EU-funded CerCo (`Certified Complexity') project~\cite{cerco:2011}, concerning the construction and formalisation of a concrete complexity preserving compiler for a large subset of the C programming language.
53
54The \textsc{mcs-51}, commonly called the 8051, is an 8-bit Harvard architecture \textsc{cisc} instruction set micro-controller dating from the early 1980s, and was originally manufactured by Intel.
55An extended variant, the \textsc{mcs-52} or 8052, was subsequently released adding extra \textsc{ram} and \textsc{rom} above and beyond that offered by the original \textsc{mcs-51}, and an extra timer.
56Derivatives are still widely manufactured by a number of semiconductor foundries, with the processor being used especially in embedded systems.
57
58The MCS-51 has a relative paucity of features compared to its more modern brethren, with a lack of any caching or pipelining features meaning that timing of execution is predictable, making the MCS-51 very attractive for CerCo's ends.
59However, the MCS-51's paucity of features---though an advantage in many respects---also quickly becomes a hindrance, as the MCS-51 features a relatively minuscule series of memory spaces by modern standards.
60As a result our C compiler, to be able to successfully compile realistic programs for embedded devices, ought to produce `tight' machine code.
61
62To do this, we must solve the `branch displacement' problem---deciding how best to expand pseudojumps to labels in assembly language to machine code jumps.
63This problem, also known as `jump encoding', is
64a well-known problem amongst assembler implementors~\cite{Hyde2006} and arises when pseudojumps can be expanded
65in different ways to real machine instructions, but the different expansions
66are not equivalent (e.g. jumps that differ in the distance that they `span', or instructions that differ in the number of clock cycles needed to completely execute them) and are not always
67correct (e.g. correctness is only up to global constraints over the compiled
68code). For instance, some jump instructions (short jumps) are very small
69and can be executed quickly, but they can only reach destinations within a
70certain distance from the current instruction. When the destinations are
71too far away, larger and slower long jumps must be used. The use of a long jump may
72augment the distance between another pseudojump and its target, forcing
73another long jump use, in a cascade. The job of the optimising
74compiler (assembler) is to individually expand every pseudo-instruction in such a way
75that all global constraints are satisfied and that the compiled program
76is minimal in size and faster in concrete time complexity.
77This problem is known to be computationally hard for most CISC architectures (\textsc{np}-hard, see~\cite{hyde:branch:2006}).
78
79To simplify the CerCo C compiler we have chosen to implement an optimising assembler whose input language the compiler will target.
80Labels, conditional jumps to labels, a program preamble containing global data and a \texttt{MOV} instruction for moving this global data into the MCS-51's one 16-bit register all feature in our assembly language.
81We further simplify by ignoring linking, assuming that all our assembly programs are pre-linked.
82
83The requirements of the CerCo project add yet more complications to our proof of correctness, as we must also address a cost model.
84CerCo imposes a cost model on C programs or, more specifically, on simple blocks of instructions.
85This cost model is induced by the compilation process itself, and its non-compositional nature allows us to assign different costs to identical C statements depending on how they are compiled.
86In short, we aim to obtain a very precise costing for a program by embracing the compilation process, not ignoring it.
87At the assembler level, this is reflected by our need to induce a cost
88model on the assembly code as a function of the assembly program and the
89strategy used to solve the branch displacement problem. In particular, our
90optimising assembler should also return a map that assigns a cost (in clock
91cycles) to every instruction in the source program. We expect the induced cost
92to be preserved by the assembler: we will prove that the compiled code
93tightly simulates the source code by taking exactly the predicted amount of
94time.
95Note that the temporal `tightness' of the simulation is a fundamental prerequisite
96of the correctness of the simulation, as some functions of the MCS-51---timers and \textsc{i/o}---depend on the microprocessor's clock.
97If the pseudo- and concrete clock differ the result of an \textsc{i/o} operation may not be preserved.
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119% This problem is the subject of the present paper. After introducing the problem
120% in more detail, we will discuss the solutions used by other compilers, present
121% the algorithm we use in the CerCo assembler, and discuss its verification,
122% that is the proofs of termination and correctness using the Matita proof
123% assistant~\cite{Asperti2007}.
124
125% Formulating the final statement of correctness and finding the loop invariants
126% have been non-trivial tasks and are, indeed, the main contribution of this
127% paper. It has required considerable care and fine-tuning to formulate not
128% only the minimal statement required for the ulterior proof of correctness of
129% the assembler, but also the minimal set of invariants needed for the proof
130% of correctness of the algorithm.
131
132% The research presented in this paper has been executed within the CerCo project
133% which aims at formally verifying a C compiler with cost annotations. The
134% target architecture for this project is the MCS-51, whose instruction set
135% contains span-dependent instructions. Furthermore, its maximum addressable
136% memory size is very small (64 Kb), which makes it important to generate
137% programs that are as small as possible. With this optimisation, however, comes increased complexity and hence increased possibility for error. We must make sure that the branch instructions are encoded correctly, otherwise the assembled program will behave unpredictably.
138
139% All Matita files related to this development can be found on the CerCo
140% website, \url{http://cerco.cs.unibo.it}. The specific part that contains the
141% branch displacement algorithm is in the {\tt ASM} subdirectory, in the files
142% {\tt PolicyFront.ma}, {\tt PolicyStep.ma} and {\tt Policy.ma}.
143
144\section{The branch displacement optimisation problem}
145
146In most modern instruction sets that have them, the only span-dependent
147instructions are branch instructions. Taking the ubiquitous x86-64 instruction
148set as an example, we find that it contains eleven different forms of the
149unconditional branch instruction, all with different ranges, instruction sizes
150and semantics (only six are valid in 64-bit mode, for example). Some examples
151are shown in Figure~\ref{f:x86jumps} (see also~\cite{IntelDev}).
152
153\begin{figure}[h]
154\begin{center}
155\begin{tabular}{|l|l|l|}
156\hline
157Instruction & Size (bytes) & Displacement range \\
158\hline
159Short jump & 2 & -128 to 127 bytes \\
160Relative near jump & 5 & $-2^{32}$ to $2^{32}-1$ bytes \\
161Absolute near jump & 6 & one segment (64-bit address) \\
162Far jump & 8 & entire memory (indirect jump) \\
163\hline
164\end{tabular}
165\end{center}
166\caption{List of x86 branch instructions}
167\label{f:x86jumps}
168\end{figure}
169
170The chosen target architecture of the CerCo project is the Intel MCS-51, which
171features three types of branch instructions (or jump instructions; the two terms
172are used interchangeably), as shown in Figure~\ref{f:mcs51jumps}.
173
174\begin{figure}[h]
175\begin{center}
176\begin{tabular}{|l|l|l|l|}
177\hline
178Instruction & Size    & Execution time & Displacement range \\
179            & (bytes) & (cycles) & \\
180\hline
181SJMP (`short jump') & 2 & 2 & -128 to 127 bytes \\
182AJMP (`absolute jump') & 2 & 2 & one segment (11-bit address) \\
183LJMP (`long jump') & 3 & 3 & entire memory \\
184\hline
185\end{tabular}
186\end{center}
187\caption{List of MCS-51 branch instructions}
188\label{f:mcs51jumps}
189\end{figure}
190
191Conditional branch instructions are only available in short form, which
192means that a conditional branch outside the short address range has to be
193encoded using three branch instructions (for instructions whose logical
194negation is available, it can be done with two branch instructions, but for
195some instructions this is not the case). The call instruction is
196only available in absolute and long forms.
197
198Note that even though the MCS-51 architecture is much less advanced and much
199simpler than the x86-64 architecture, the basic types of branch instruction
200remain the same: a short jump with a limited range, an intra-segment jump and a
201jump that can reach the entire available memory.
202 
203Generally, in code fed to the assembler as input, the only
204difference between branch instructions is semantics, not span. This
205means that a distinction is made between an unconditional branch and the
206several kinds of conditional branch, but not between their short, absolute or
207long variants.
208
209The algorithm used by the assembler to encode these branch instructions into
210the different machine instructions is known as the {\em branch displacement
211algorithm}. The optimisation problem consists of finding as small an encoding as
212possible, thus minimising program length and execution time.
213
214Similar problems, e.g. the branch displacement optimisation problem for other
215architectures, are known to be NP-complete~\cite{Robertson1979,Szymanski1978},
216which could make finding an optimal solution very time-consuming.
217
218The canonical solution, as shown by Szymanski~\cite{Szymanski1978} or more
219recently by Dickson~\cite{Dickson2008} for the x86 instruction set, is to use a
220fixed point algorithm that starts with the shortest possible encoding (all
221branch instruction encoded as short jumps, which is likely not a correct
222solution) and then iterates over the source to re-encode those branch
223instructions whose target is outside their range.
224
225\subsection*{Adding absolute jumps}
226
227In both papers mentioned above, the encoding of a jump is only dependent on the
228distance between the jump and its target: below a certain value a short jump
229can be used; above this value the jump must be encoded as a long jump.
230
231Here, termination of the smallest fixed point algorithm is easy to prove. All
232branch instructions start out encoded as short jumps, which means that the
233distance between any branch instruction and its target is as short as possible
234(all the intervening jumps are short).
235If, in this situation, there is a branch instruction $b$ whose span is not
236within the range for a short jump, we can be sure that we can never reach a
237situation where the span of $j$ is so small that it can be encoded as a short
238jump. This argument continues to hold throughout the subsequent iterations of
239the algorithm: short jumps can change into long jumps, but not \emph{vice versa},
240as spans only increase. Hence, the algorithm either terminates early when a fixed
241point is reached or when all short jumps have been changed into long jumps.
242
243Also, we can be certain that we have reached an optimal solution: a short jump
244is only changed into a long jump if it is absolutely necessary.
245
246However, neither of these claims (termination nor optimality) hold when we add
247the absolute jump. With absolute jumps, the encoding of a branch
248instruction no longer depends only on the distance between the branch
249instruction and its target. An absolute jump is possible when instruction and
250target are in the same segment (for the MCS-51, this means that the first 5
251bytes of their addresses have to be equal). It is therefore entirely possible
252for two branch instructions with the same span to be encoded in different ways
253(absolute if the branch instruction and its target are in the same segment,
254long if this is not the case).
255
256\begin{figure}[t]
257\begin{subfigure}[b]{.45\linewidth}
258\small
259\begin{alltt}
260    jmp X
261    \ldots
262L\(\sb{0}\): \ldots
263% Start of new segment if
264% jmp X is encoded as short
265    \ldots
266    jmp L\(\sb{0}\)
267\end{alltt}
268\caption{Example of a program where a long jump becomes absolute}
269\label{f:term_example}
270\end{subfigure}
271\hfill
272\begin{subfigure}[b]{.45\linewidth}
273\small
274\begin{alltt}
275L\(\sb{0}\): jmp X
276X:  \ldots
277    \ldots
278L\(\sb{1}\): \ldots
279% Start of new segment if
280% jmp X is encoded as short
281    \ldots
282    jmp L\(\sb{1}\)
283    \ldots
284    jmp L\(\sb{1}\)
285    \ldots
286    jmp L\(\sb{1}\) 
287    \ldots
288\end{alltt}
289\caption{Example of a program where the fixed-point algorithm is not optimal}
290\label{f:opt_example}
291\end{subfigure}
292\end{figure}
293
294This invalidates our earlier termination argument: a branch instruction, once encoded
295as a long jump, can be re-encoded during a later iteration as an absolute jump.
296Consider the program shown in Figure~\ref{f:term_example}. At the start of the
297first iteration, both the branch to {\tt X} and the branch to $\mathtt{L}_{0}$
298are encoded as small jumps. Let us assume that in this case, the placement of
299$\mathtt{L}_{0}$ and the branch to it are such that $\mathtt{L}_{0}$ is just
300outside the segment that contains this branch. Let us also assume that the
301distance between $\mathtt{L}_{0}$ and the branch to it is too large for the
302branch instruction to be encoded as a short jump.
303
304All this means that in the second iteration, the branch to $\mathtt{L}_{0}$ will
305be encoded as a long jump. If we assume that the branch to {\tt X} is encoded as
306a long jump as well, the size of the branch instruction will increase and
307$\mathtt{L}_{0}$ will be `propelled' into the same segment as its branch
308instruction, because every subsequent instruction will move one byte forward.
309Hence, in the third iteration, the branch to $\mathtt{L}_{0}$ can be encoded as
310an absolute jump. At first glance, there is nothing that prevents us from
311constructing a configuration where two branch instructions interact in such a
312way as to iterate indefinitely between long and absolute encodings.
313
314This situation mirrors the explanation by Szymanski~\cite{Szymanski1978} of why
315the branch displacement optimisation problem is NP-complete. In this explanation,
316a condition for NP-completeness is the fact that programs be allowed to contain
317{\em pathological} jumps. These are branch instructions that can normally not be
318encoded as a short(er) jump, but gain this property when some other branch
319instructions are encoded as a long(er) jump. This is exactly what happens in
320Figure~\ref{f:term_example}. By encoding the first branch instruction as a long
321jump, another branch instruction switches from long to absolute (which is
322shorter).
323
324In addition, our previous optimality argument no longer holds. Consider the
325program shown in Figure~\ref{f:opt_example}. Suppose that the distance between
326$\mathtt{L}_{0}$ and $\mathtt{L}_{1}$ is such that if {\tt jmp X} is encoded
327as a short jump, there is a segment border just after $\mathtt{L}_{1}$. Let
328us also assume that all three branches to $\mathtt{L}_{1}$ are in the same
329segment, but far enough away from $\mathtt{L}_{1}$ that they cannot be encoded
330as short jumps.
331
332Then, if {\tt jmp X} were to be encoded as a short jump, which is clearly
333possible, all of the branches to $\mathtt{L}_{1}$ would have to be encoded as
334long jumps. However, if {\tt jmp X} were to be encoded as a long jump, and
335therefore increase in size, $\mathtt{L}_{1}$ would be `propelled' across the
336segment border, so that the three branches to $\mathtt{L}_{1}$ could be encoded
337as absolute jumps. Depending on the relative sizes of long and absolute jumps,
338this solution might actually be smaller than the one reached by the smallest
339fixed point algorithm.
340
341\section{Our algorithm}
342
343\subsection{Design decisions}
344
345Given the NP-completeness of the problem, finding optimal solutions
346(using, for example, a constraint solver) can potentially be very costly.
347
348The SDCC compiler~\cite{SDCC2011}, which has a backend targeting the MCS-51
349instruction set, simply encodes every branch instruction as a long jump
350without taking the distance into account. While certainly correct (the long
351jump can reach any destination in memory) and a very fast solution to compute,
352it results in a less than optimal solution in terms of output size and
353execution time.
354
355On the other hand, the {\tt gcc} compiler suite, while compiling
356C on the x86 architecture, uses a greatest fix point algorithm. In other words,
357it starts with all branch instructions encoded as the largest jumps
358available, and then tries to reduce the size of branch instructions as much as
359possible.
360
361Such an algorithm has the advantage that any intermediate result it returns
362is correct: the solution where every branch instruction is encoded as a large
363jump is always possible, and the algorithm only reduces those branch
364instructions whose destination address is in range for a shorter jump.
365The algorithm can thus be stopped after a determined number of steps without
366sacrificing correctness.
367
368The result, however, is not necessarily optimal. Even if the algorithm is run
369until it terminates naturally, the fixed point reached is the {\em greatest}
370fixed point, not the least fixed point. Furthermore, {\tt gcc} (at least for
371the x86 architecture) only uses short and long jumps. This makes the algorithm
372more efficient, as shown in the previous section, but also results in a less
373optimal solution.
374
375In the CerCo assembler, we opted at first for a least fixed point algorithm,
376taking absolute jumps into account.
377
378Here, we ran into a problem with proving termination, as explained in the
379previous section: if we only take short and long jumps into account, the jump
380encoding can only switch from short to long, but never in the other direction.
381When we add absolute jumps, however, it is theoretically possible for a branch
382instruction to switch from absolute to long and back, as previously explained.
383Proving termination then becomes difficult, because there is nothing that
384precludes a branch instruction from oscillating back and forth between absolute
385and long jumps indefinitely.
386
387To keep the algorithm in the same complexity class and more easily
388prove termination, we decided to explicitly enforce the `branch instructions
389must always grow longer' requirement: if a branch instruction is encoded as a
390long jump in one iteration, it will also be encoded as a long jump in all the
391following iterations. Therefore the encoding of any branch instruction
392can change at most two times: once from short to absolute (or long), and once
393from absolute to long.
394
395There is one complicating factor. Suppose that a branch instruction is encoded
396in step $n$ as an absolute jump, but in step $n+1$ it is determined that
397(because of changes elsewhere) it can now be encoded as a short jump. Due to
398the requirement that the branch instructions must always grow longer,
399the branch encoding will be encoded as an absolute jump in step
400$n+1$ as well.
401
402This is not necessarily correct. A branch instruction that can be
403encoded as a short jump cannot always also be encoded as an absolute jump, as a
404short jump can bridge segments, whereas an absolute jump cannot. Therefore,
405in this situation we have decided to encode the branch instruction as a long
406jump, which is always correct.
407
408The resulting algorithm, therefore, will not return the least fixed point, as
409it might have too many long jumps. However, it is still better than the
410algorithms from SDCC and {\tt gcc}, since even in the worst case, it will still
411return a smaller or equal solution.
412
413Experimenting with our algorithm on the test suite of C programs included with
414gcc 2.3.3 has shown that on average, about 25 percent of jumps are encoded as short or absolute jumps by the algorithm. As not all instructions are jumps, this does not make for a large reduction in size, but it can make for a reduction in execution time: if jumps
415are executed multiple times, for example in loops, the fact that short jumps take less cycles to execute than long jumps can have great effect.
416
417As for complexity, there are at most $2n$ iterations, where $n$ is the number of
418branch instructions. Practical tests within the CerCo project on small to
419medium pieces of code have shown that in almost all cases, a fixed point is
420reached in 3 passes. Only in one case did the algorithm need 4. This is not surprising: after all, the difference between short/absolute and
421long jumps is only one byte (three for conditional jumps). For a change from
422short/absolute to long to have an effect on other jumps is therefore relatively
423uncommon, which explains why a fixed point is reached so quickly.
424
425\subsection{The algorithm in detail}
426
427The branch displacement algorithm forms part of the translation from
428pseudocode to assembler. More specifically, it is used by the function that
429translates pseudo-addresses (natural numbers indicating the position of the
430instruction in the program) to actual addresses in memory. Note that in pseudocode, all instructions are of size 1.
431
432Our original intention was to have two different functions, one function
433$\mathtt{policy}: \mathbb{N} \rightarrow \{\mathtt{short\_jump},
434\mathtt{absolute\_jump}, \mathtt{long\_jump}\}$ to associate jumps to their
435intended encoding, and a function $\sigma: \mathbb{N} \rightarrow
436\mathtt{Word}$ to associate pseudo-addresses to machine addresses. $\sigma$
437would use $\mathtt{policy}$ to determine the size of jump instructions. This turned out to be suboptimal from the algorithmic point of view and
438impossible to prove correct.
439
440From the algorithmic point of view, in order to create the $\mathtt{policy}$
441function, we must necessarily have a translation from pseudo-addresses
442to machine addresses (i.e. a $\sigma$ function): in order to judge the distance
443between a jump and its destination, we must know their memory locations.
444Conversely, in order to create the $\sigma$ function, we need to have the
445$\mathtt{policy}$ function, otherwise we do not know the sizes of the jump
446instructions in the program.
447
448Much the same problem appears when we try to prove the algorithm correct: the
449correctness of $\mathtt{policy}$ depends on the correctness of $\sigma$, and
450the correctness of $\sigma$ depends on the correctness of $\mathtt{policy}$.
451
452We solved this problem by integrating the $\mathtt{policy}$ and $\sigma$
453algorithms. We now have a function
454$\sigma: \mathbb{N} \rightarrow \mathtt{Word} \times \mathtt{bool}$ which
455associates a pseudo-address to a machine address. The boolean denotes a forced
456long jump; as noted in the previous section, if during the fixed point
457computation an absolute jump changes to be potentially re-encoded as a short
458jump, the result is actually a long jump. It might therefore be the case that
459jumps are encoded as long jumps without this actually being necessary, and this
460information needs to be passed to the code generating function.
461
462The assembler function encodes the jumps by checking the distance between
463source and destination according to $\sigma$, so it could select an absolute
464jump in a situation where there should be a long jump. The boolean is there
465to prevent this from happening by indicating the locations where a long jump
466should be encoded, even if a shorter jump is possible. This has no effect on
467correctness, since a long jump is applicable in any situation.
468
469\begin{figure}[t]
470\small
471\begin{algorithmic}
472\Function{f}{$labels$,$old\_sigma$,$instr$,$ppc$,$acc$}
473  \State $\langle added, pc, sigma \rangle \gets acc$
474  \If {$instr$ is a backward jump to $j$}
475    \State $length \gets \mathrm{jump\_size}(pc,sigma_1(labels(j)))$
476    \Comment compute jump distance
477  \ElsIf {$instr$ is a forward jump to $j$}
478    \State $length \gets \mathrm{jump\_size}(pc,old\_sigma_1(labels(j))+added)$
479  \EndIf
480  \State $old\_length \gets \mathrm{old\_sigma_1}(ppc)$
481  \State $new\_length \gets \mathrm{max}(old\_length, length)$
482  \Comment length must never decrease
483  \State $old\_size \gets \mathrm{old\_sigma_2}(ppc)$
484  \State $new\_size \gets \mathrm{instruction\_size}(instr,new\_length)$
485  \Comment compute size in bytes
486  \State $new\_added \gets added+(new\_size-old\_size)$
487  \Comment keep track of total added bytes
488  \State $new\_sigma \gets old\_sigma$
489  \State $new\_sigma_1(ppc+1) \gets pc+new\_size$
490  \State $new\_sigma_2(ppc) \gets new\_length$
491  \Comment update $\sigma$ \\
492  \Return $\langle new\_added, pc+new\_size, new\_sigma \rangle$
493\EndFunction
494\end{algorithmic}
495\caption{The heart of the algorithm}
496\label{f:jump_expansion_step}
497\end{figure}
498
499The algorithm, shown in Figure~\ref{f:jump_expansion_step}, works by folding the
500function {\sc f} over the entire program, thus gradually constructing $sigma$.
501This constitutes one step in the fixed point calculation; successive steps
502repeat the fold until a fixed point is reached. We have abstracted away the case where an instruction is not a jump, since the size of these instructions is constant.
503
504Parameters of the function {\sc f} are:
505\begin{itemize}
506  \item a function $labels$ that associates a label to its pseudo-address;
507  \item $old\_sigma$, the $\sigma$ function returned by the previous
508    iteration of the fixed point calculation;
509  \item $instr$, the instruction currently under consideration;
510  \item $ppc$, the pseudo-address of $instr$;
511  \item $acc$, the fold accumulator, which contains $added$ (the number of
512  bytes added to the program size with respect to the previous iteration), $pc$
513  (the highest memory address reached so far), and of course $sigma$, the
514    $\sigma$ function under construction.
515\end{itemize}
516The first two are parameters that remain the same through one iteration, the
517final three are standard parameters for a fold function (including $ppc$,
518which is simply the number of instructions of the program already processed).
519
520The $\sigma$ functions used by {\sc f} are not of the same type as the final
521$\sigma$ function: they are of type
522$\sigma: \mathbb{N} \rightarrow \mathbb{N} \times \{\mathtt{short\_jump},
523\mathtt{absolute\_jump},\mathtt{long\_jump}\}$; a function that associates a
524pseudo-address with a memory address and a jump length. We do this to
525ease the comparison of jump lengths between iterations. In the algorithm,
526we use the notation $sigma_1(x)$ to denote the memory address corresponding to
527$x$, and $sigma_2(x)$ for the jump length corresponding to $x$.
528
529Note that the $\sigma$ function used for label lookup varies depending on
530whether the label is behind our current position or ahead of it. For
531backward branches, where the label is behind our current position, we can use
532$sigma$ for lookup, since its memory address is already known. However, for
533forward branches, the memory address of the address of the label is not yet
534known, so we must use $old\_sigma$.
535
536We cannot use $old\_sigma$ without change: it might be the case that we have
537already increased the size of some branch instructions before, making the
538program longer and moving every instruction forward. We must compensate for this
539by adding the size increase of the program to the label's memory address
540according to $old\_sigma$, so that branch instruction spans do not get
541compromised.
542
543%Note also that we add the pc to $sigma$ at location $ppc+1$, whereas we add the
544%jump length at location $ppc$. We do this so that $sigma(ppc)$ will always
545%return a pair with the start address of the instruction at $ppc$ and the
546%length of its branch instruction (if any); the end address of the program can
547%be found at $sigma(n+1)$, where $n$ is the number of instructions in the
548%program.
549
550\section{The proof}
551
552In this section, we present the correctness proof for the algorithm in more
553detail. The main correctness statement is shown, slightly simplified, in~Figure~\ref{statement}.
554%
555\begin{figure}[t]
556\small
557\begin{alignat*}{6}
558\mathtt{sigma}&\omit\rlap{$\mathtt{\_policy\_specification} \equiv
559\lambda program.\lambda sigma.$} \notag\\
560  & \omit\rlap{$sigma\ 0 = 0\ \wedge$} \notag\\
561  & \mathbf{let}\ & & \omit\rlap{$instr\_list \equiv code\ program\ \mathbf{in}$} \notag\\
562  &&& \omit\rlap{$\forall ppc.ppc < |instr\_list| \rightarrow$} \notag\\
563  &&& \mathbf{let}\ && pc \equiv sigma\ ppc\ \mathbf{in} \notag\\
564  &&& \mathbf{let}\ && instruction \equiv \mathtt{fetch\_pseudo\_instruction}\ instr\_list\ ppc\ \mathbf{in} \notag\\
565  &&& \mathbf{let}\ && next\_pc \equiv sigma\ (ppc+1)\ \mathbf{in}\notag\\
566  &&&&& next\_pc = pc + \mathtt{instruction\_size}\ sigma\ ppc\ instruction\ \wedge\notag\\
567  &&&&& (pc + \mathtt{instruction\_size}\ sigma\ ppc\ instruction < 2^{16}\ \vee\notag\\
568  &&&&& (\forall ppc'.ppc' < |instr\_list| \rightarrow ppc < ppc' \rightarrow \notag\\
569  &&&&& \mathbf{let}\ instruction' \equiv \mathtt{fetch\_pseudo\_instruction}\ instr\_list\ ppc'\ \mathbf{in} \notag\\
570  &&&&&\ \mathtt{instruction\_size}\ sigma\ ppc'\ instruction' = 0)\ \wedge \notag\\
571  &&&&& pc + \mathtt{instruction\_size}\ sigma\ ppc\ instruction = 2^{16})
572\end{alignat*}
573\caption{Main correctness statement\label{statement}}
574\label{sigmapolspec}
575\end{figure}
576%
577Informally, this means that when fetching a pseudo-instruction at $ppc$, the
578translation by $\sigma$ of $ppc+1$ is the same as $\sigma(ppc)$ plus the size
579of the instruction at $ppc$.  That is, an instruction is placed consecutively
580after the previous one, and there are no overlaps. The rest of the statement deals with memory size: either the next instruction fits within memory ($next\_pc < 2^{16}$) or it ends exactly at the limit memory,
581in which case it must be the last translated instruction in the program (enforced by specfiying that the size of all subsequent instructions is 0: there may be comments or cost annotations that are not translated).
582
583Finally, we enforce that the program starts at address 0, i.e. $\sigma(0) = 0$. It may seem strange that we do not explicitly include a safety property stating that every jump instruction is of the right type with respect to its target (akin to the lemma from Figure~\ref{sigmasafe}), but this is not necessary. The distance is recalculated according to the instruction addresses from $\sigma$, which implicitly expresses safety.
584
585Since our computation is a least fixed point computation, we must prove
586termination in order to prove correctness: if the algorithm is halted after
587a number of steps without reaching a fixed point, the solution is not
588guaranteed to be correct. More specifically, branch instructions might be
589encoded which do not coincide with the span between their location and their
590destination.
591
592Proof of termination rests on the fact that the encoding of branch
593instructions can only grow larger, which means that we must reach a fixed point
594after at most $2n$ iterations, with $n$ the number of branch instructions in
595the program. This worst case is reached if at every iteration, we change the
596encoding of exactly one branch instruction; since the encoding of any branch
597instruction can change first from short to absolute, and then to long, there
598can be at most $2n$ changes.
599
600%The proof has been carried out using the ``Russell'' style from~\cite{Sozeau2006}.
601%We have proven some invariants of the {\sc f} function from the previous
602%section; these invariants are then used to prove properties that hold for every
603%iteration of the fixed point computation; and finally, we can prove some
604%properties of the fixed point.
605
606\subsection{Fold invariants}
607
608In this section, we present the invariants that hold during the fold of {\sc f}
609over the program. These will be used later on to prove the properties of the
610iteration. During the fixed point computation, the $\sigma$ function is
611implemented as a trie for ease of access; computing $\sigma(x)$ is achieved by
612looking up the value of $x$ in the trie. Actually, during the fold, the value
613we pass along is a pair $\mathbb{N} \times \mathtt{ppc\_pc\_map}$. The first
614component is the number of bytes added to the program so far with respect to
615the previous iteration, and the second component, {\tt ppc\_pc\_map}, is the
616actual $\sigma$ trie (which we'll call $strie$ to avoid confusion).
617%
618{\small
619\begin{alignat*}{2}
620\mathtt{out} & \mathtt{\_of\_program\_none} \equiv \lambda prefix.\lambda strie. \notag\\
621& \forall i.i < 2^{16} \rightarrow (i > |prefix| \leftrightarrow
622 \mathtt{lookup\_opt}\ i\ (\mathtt{snd}\ strie) = \mathtt{None})
623\end{alignat*}}
624%
625The first invariant states that any pseudo-address not yet examined is not
626present in the lookup trie.
627%
628{\small
629\begin{alignat*}{2}
630\mathtt{not} & \mathtt{\_jump\_default} \equiv \lambda prefix.\lambda strie.\forall i.i < |prefix| \rightarrow\notag\\
631& \neg\mathtt{is\_jump}\ (\mathtt{nth}\ i\ prefix) \rightarrow \mathtt{lookup}\ i\ (\mathtt{snd}\ strie) = \mathtt{short\_jump}
632\end{alignat*}}
633%
634This invariant states that when we try to look up the jump length of a
635pseudo-address where there is no branch instruction, we will get the default
636value, a short jump.
637%
638{\small
639\begin{alignat*}{4}
640\mathtt{jump} & \mathtt{\_increase} \equiv \lambda pc.\lambda op.\lambda p.\forall i.i < |prefix| \rightarrow \notag\\ 
641& \mathbf{let}\  oj \equiv \mathtt{lookup}\ i\ (\mathtt{snd}\ op)\ \mathbf{in} \notag\\
642& \mathbf{let}\ j \equiv \mathtt{lookup}\ i\ (\mathtt{snd}\ p)\ \mathbf{in}\ \mathtt{jmpleq}\ oj\ j
643\end{alignat*}}
644%
645This invariant states that between iterations (with $op$ being the previous
646iteration, and $p$ the current one), jump lengths either remain equal or
647increase. It is needed for proving termination.
648%
649\begin{figure}[h]
650\small
651\begin{alignat*}{6}
652\mathtt{sigma} & \omit\rlap{$\mathtt{\_compact\_unsafe} \equiv \lambda prefix.\lambda strie.\forall n.n < |prefix| \rightarrow$}\notag\\
653& \mathbf{match}\ && \omit\rlap{$\mathtt{lookup\_opt}\ n\ (\mathtt{snd}\ strie)\ \mathbf{with}$}\notag\\
654&&& \omit\rlap{$\mathtt{None} \Rightarrow \mathrm{False}$} \notag\\
655&&& \omit\rlap{$\mathtt{Some}\ \langle pc, j \rangle \Rightarrow$} \notag\\
656&&& \mathbf{match}\ && \mathtt{lookup\_opt}\ (n+1)\ (\mathtt{snd}\ strie)\ \mathbf{with}\notag\\
657&&&&& \mathtt{None} \Rightarrow \mathrm{False} \notag\\
658&&&&& \mathtt{Some}\ \langle pc_1, j_1 \rangle \Rightarrow
659    pc_1 = pc + \notag\\
660&&&&& \ \ \mathtt{instruction\_size\_jmplen}\ j\ (\mathtt{nth}\ n\ prefix)
661\end{alignat*}
662\caption{Temporary safety property}
663\label{sigmacompactunsafe}
664\end{figure}
665%
666We now proceed with the safety lemmas. The lemma in
667Figure~\ref{sigmacompactunsafe} is a temporary formulation of the main
668property {\tt sigma\_policy\_specification}. Its main difference from the
669final version is that it uses {\tt instruction\_size\_jmplen} to compute the
670instruction size. This function uses $j$ to compute the span of branch
671instructions  (i.e. it uses the $\sigma$ under construction), instead
672of looking at the distance between source and destination. This is because
673$\sigma$ is still under construction; we will prove below that after the
674final iteration, {\tt sigma\_compact\_unsafe} is equivalent to the main
675property in Figure~\ref{sigmasafe} which holds at the end of the computation.
676%
677\begin{figure}[h]
678\small
679\begin{alignat*}{6}
680\mathtt{sigma} & \omit\rlap{$\mathtt{\_safe} \equiv \lambda prefix.\lambda labels.\lambda old\_strie.\lambda strie.\forall i.i < |prefix| \rightarrow$} \notag\\
681& \omit\rlap{$\forall dest\_label.\mathtt{is\_jump\_to\ (\mathtt{nth}\ i\ prefix})\ dest\_label \rightarrow$} \notag\\
682& \mathbf{let} && \omit\rlap{$\ paddr \equiv \mathtt{lookup}\ labels\ dest\_label\ \mathbf{in}$} \notag\\
683& \mathbf{let} && \omit\rlap{$\ \langle j, src, dest \rangle \equiv \mathbf{if} \ paddr\ \leq\ i\ \mathbf{then}$}\notag\\
684&&&&& \mathbf{let}\ \langle \_, j \rangle \equiv \mathtt{lookup}\ i\ (\mathtt{snd}\ strie)\ \mathbf{in} \notag\\
685&&&&& \mathbf{let}\ \langle pc\_plus\_jl, \_ \rangle \equiv \mathtt{lookup}\ (i+1)\ (\mathtt{snd}\ strie)\ \mathbf{in}\notag\\
686&&&&& \mathbf{let}\ \langle addr, \_ \rangle \equiv \mathtt{lookup}\ paddr\ (\mathtt{snd}\ strie)\ \mathbf{in}\notag\\
687&&&&& \langle j, pc\_plus\_jl, addr \rangle\notag\\
688&&&\mathbf{else} \notag\\
689&&&&&\mathbf{let}\ \langle \_, j \rangle \equiv \mathtt{lookup}\ i\ (\mathtt{snd}\ strie)\ \mathbf{in} \notag\\
690&&&&&\mathbf{let}\ \langle pc\_plus\_jl, \_ \rangle \equiv \mathtt{lookup}\ (i+1)\ (\mathtt{snd}\ old\_strie)\ \mathbf{in}\notag\\
691&&&&&\mathbf{let}\ \langle addr, \_ \rangle \equiv \mathtt{lookup}\ paddr\ (\mathtt{snd}\ old\_strie)\ \mathbf{in}\notag\\
692&&&&&\langle j, pc\_plus\_jl, addr \rangle \mathbf{in}\ \notag\\
693&&&\mathbf{match} && \ j\ \mathbf{with} \notag\\
694&&&&&\mathrm{short\_jump} \Rightarrow \mathtt{short\_jump\_valid}\ src\ dest\notag\\
695&&&&&\mathrm{absolute\_jump} \Rightarrow \mathtt{absolute\_jump\_valid}\ src\ dest\notag\\
696&&&&&\mathrm{long\_jump} \Rightarrow \mathrm{True}
697\end{alignat*}
698\caption{Safety property}
699\label{sigmasafe}
700\end{figure}
701%
702We compute the distance using the memory address of the instruction
703plus its size. This follows the behaviour of the MCS-51 microprocessor, which
704increases the program counter directly after fetching, and only then executes
705the branch instruction (by changing the program counter again).
706
707There are also some simple, properties to make sure that our policy
708remains consistent, and to keep track of whether the fixed point has been
709reached. We do not include them here in detail. Two of these properties give the values of $\sigma$ for the start and end of the program; $\sigma(0) = 0$ and $\sigma(n)$, where $n$ is the number of instructions up until now, is equal to the maximum memory address so far. There are also two properties that deal with what happens when the previous
710iteration does not change with respect to the current one. $added$ is a
711variable that keeps track of the number of bytes we have added to the program
712size by changing the encoding of branch instructions. If $added$ is 0, the program
713has not changed and vice versa.
714
715%{\small
716%\begin{align*}
717%& \mathtt{lookup}\ 0\ (\mathtt{snd}\ strie) = 0 \notag\\
718%& \mathtt{lookup}\ |prefix|\ (\mathtt{snd}\ strie) = \mathtt{fst}\ strie
719%\end{align*}}
720
721
722%{\small
723%\begin{align*}
724%& added = 0\ \rightarrow\ \mathtt{policy\_pc\_equal}\ prefix\ old\_strie\ strie \notag\\
725%& \mathtt{policy\_jump\_equal}\ prefix\ old\_strie\ strie\ \rightarrow\ added = 0
726%\end{align*}}
727
728We need to use two different formulations, because the fact that $added$ is 0
729does not guarantee that no branch instructions have changed.  For instance,
730it is possible that we have replaced a short jump with an absolute jump, which
731does not change the size of the branch instruction. Therefore {\tt policy\_pc\_equal} states that $old\_sigma_1(x) = sigma_1(x)$, whereas {\tt policy\_jump\_equal} states that $old\_sigma_2(x) = sigma_2(x)$. This formulation is sufficient to prove termination and compactness. 
732
733Proving these invariants is simple, usually by induction on the prefix length.
734
735\subsection{Iteration invariants}
736
737These are invariants that hold after the completion of an iteration. The main
738difference between these invariants and the fold invariants is that after the
739completion of the fold, we check whether the program size does not supersede
74064 Kb, the maximum memory size the MCS-51 can address. The type of an iteration therefore becomes an option type: {\tt None} in case
741the program becomes larger than 64 Kb, or $\mathtt{Some}\ \sigma$
742otherwise. We also no longer pass along the number of bytes added to the
743program size, but a boolean that indicates whether we have changed something
744during the iteration or not.
745
746If the iteration returns {\tt None}, which means that it has become too large for memory, there is an invariant that states that the previous iteration cannot
747have every branch instruction encoded as a long jump. This is needed later in the proof of termination. If the iteration returns $\mathtt{Some}\ \sigma$, the fold invariants are retained without change.
748
749Instead of using {\tt sigma\_compact\_unsafe}, we can now use the proper
750invariant:
751%
752{\small
753\begin{alignat*}{6}
754\mathtt{sigma} & \omit\rlap{$\mathtt{\_compact} \equiv \lambda program.\lambda sigma.$} \notag\\
755& \omit\rlap{$\forall n.n < |program|\ \rightarrow$} \notag\\
756& \mathbf{match}\ && \omit\rlap{$\mathtt{lookup\_opt}\ n\ (\mathtt{snd}\ sigma)\ \mathbf{with}$}\notag\\
757&&& \omit\rlap{$\mathrm{None}\ \Rightarrow\ \mathrm{False}$}\notag\\
758&&& \omit\rlap{$\mathrm{Some}\ \langle pc, j \rangle \Rightarrow$}\notag\\
759&&& \mathbf{match}\ && \mathtt{lookup\_opt}\ (n+1)\ (\mathtt{snd}\ sigma)\ \mathbf{with}\notag\\
760&&&&& \mathrm{None}\ \Rightarrow\ \mathrm{False}\notag\\
761&&&&& \mathrm{Some} \langle pc1, j1 \rangle \Rightarrow\notag\\
762&&&&& \ \ pc1 = pc + \mathtt{instruction\_size}\ n\ (\mathtt{nth}\ n\ program)
763\end{alignat*}}
764%
765This is almost the same invariant as ${\tt sigma\_compact\_unsafe}$, but differs in that it
766computes the sizes of branch instructions by looking at the distance between
767position and destination using $\sigma$. In actual use, the invariant is qualified: $\sigma$ is compact if there have
768been no changes (i.e. the boolean passed along is {\tt true}). This is to
769reflect the fact that we are doing a least fixed point computation: the result
770is only correct when we have reached the fixed point.
771
772There is another, trivial, invariant in case the iteration returns
773$\mathtt{Some}\ \sigma$: it must hold that $\mathtt{fst}\ sigma < 2^{16}$.
774We need this invariant to make sure that addresses do not overflow.
775
776The proof of {\tt nec\_plus\_ultra} goes as follows: if we return {\tt None},
777then the program size must be greater than 64 Kb. However, since the
778previous iteration did not return {\tt None} (because otherwise we would
779terminate immediately), the program size in the previous iteration must have
780been smaller than 64 Kb.
781
782Suppose that all the branch instructions in the previous iteration are
783encoded as long jumps. This means that all branch instructions in this
784iteration are long jumps as well, and therefore that both iterations are equal
785in the encoding of their branch instructions. Per the invariant, this means that
786$added = 0$, and therefore that all addresses in both iterations are equal.
787But if all addresses are equal, the program sizes must be equal too, which
788means that the program size in the current iteration must be smaller than
78964 Kb. This contradicts the earlier hypothesis, hence not all branch
790instructions in the previous iteration are encoded as long jumps.
791
792The proof of {\tt sigma\_compact} follows from {\tt sigma\_compact\_unsafe} and
793the fact that we have reached a fixed point, i.e. the previous iteration and
794the current iteration are the same. This means that the results of
795{\tt instruction\_size\_jmplen} and {\tt instruction\_size} are the same.
796
797\subsection{Final properties}
798
799These are the invariants that hold after $2n$ iterations, where $n$ is the
800program size (we use the program size for convenience; we could also use the
801number of branch instructions, but this is more complex). Here, we only
802need {\tt out\_of\_program\_none}, {\tt sigma\_compact} and the fact that
803$\sigma(0) = 0$.
804
805Termination can now be proved using the fact that there is a $k \leq 2n$, with
806$n$ the length of the program, such that iteration $k$ is equal to iteration
807$k+1$. There are two possibilities: either there is a $k < 2n$ such that this
808property holds, or every iteration up to $2n$ is different. In the latter case,
809since the only changes between the iterations can be from shorter jumps to
810longer jumps, in iteration $2n$ every branch instruction must be encoded as
811a long jump. In this case, iteration $2n$ is equal to iteration $2n+1$ and the
812fixed point is reached.
813
814\section{Conclusion}
815
816In the previous sections we have discussed the branch displacement optimisation
817problem, presented an optimised solution, and discussed the proof of
818termination and correctness for this algorithm, as formalised in Matita.
819
820The algorithm we have presented is fast and correct, but not optimal; a true
821optimal solution would need techniques like constraint solvers. While outside
822the scope of the present research, it would be interesting to see if enough
823heuristics could be found to make such a solution practical for implementing
824in an existing compiler; this would be especially useful for embedded systems,
825where it is important to have as small a solution as possible.
826
827In itself the algorithm is already useful, as it results in a smaller solution
828than the simple `every branch instruction is long' used up until now---and with
829only 64 Kb of memory, every byte counts. It also results in a smaller solution
830than the greatest fixed point algorithm that {\tt gcc} uses. It does this
831without sacrificing speed or correctness.
832
833The certification of an assembler that relies on the branch displacement
834algorithm described in this paper was presented in~\cite{lastyear}.
835The assembler computes the $\sigma$ map as described in this paper and
836then works in two passes. In the first pass it builds a map
837from instruction labels to addresses in the assembly code. In the
838second pass it iterates over the code, translating every pseudo jump
839at address $src$ to a label l associated to the assembly instruction at
840address $dst$ to a jump of the size dictated by $(\sigma\ src)$ to
841$(\sigma\ dst)$. In case of conditional jumps, the translated jump may be
842implemented with a series of instructions.
843
844The proof of correctness abstracts over the algorithm used and only relies on
845{\tt sigma\_policy\_specification} (page~\ref{sigmapolspec}). It is a variation
846of a standard 1-to-many forward simulation proof~\cite{Leroy2009}. The
847relation R between states just maps every code address $ppc$ stored in
848registers or memory to $(\sigma\ ppc)$. To identify the code addresses,
849an additional data structure is always kept together with the source
850state and is updated by the semantics. The semantics is preserved
851only for those programs whose source code operations
852$(f\ ppc_1\ \ldots\ ppc_n)$ applied to code addresses $ppc_1 \ldots ppc_n$ are
853such that $(f\ (\sigma\ ppc_1)\ ...\ (\sigma\ ppc_n) = f\ ppc_1\ ppc_n))$. For
854example, an injective $\sigma$ preserves a binary equality test f for code
855addresses, but not pointer subtraction.
856
857The main lemma (fetching simulation), which relies on\\
858{\tt sigma\_policy\_specification} and is established by structural induction
859over the source code, says that fetching an assembly instruction at
860position ppc is equal to fetching the translation of the instruction at
861position $(\sigma\ ppc)$, and that the new incremented program counter is at
862the beginning of the next instruction (compactness). The only exception is
863when the instruction fetched is placed at the end of code memory and is
864followed only by dead code. Execution simulation is trivial because of the
865restriction over well behaved programs w.r.t. sigma. The condition
866$\sigma\ 0 = 0$ is necessary because the hardware model prescribes that the
867first instruction to be executed will be at address 0. For the details
868see~\cite{lastyear}.
869
870Instead of verifying the algorithm directly, another solution to the problem
871would be to run an optimisation algorithm, and then verify the safety of the
872result using a verified validator. Such a validator would be easier to verify
873than the algorithm itself and it would also be efficient, requiring only a
874linear pass over the source code to test the specification. However, it is
875surely also interesting to formally prove that the assembler never rejects
876programs that should be accepted, i.e. that the algorithm itself is correct.
877This is the topic of the current paper.
878
879\subsection{Related work}
880
881As far as we are aware, this is the first formal discussion of the branch
882displacement optimisation algorithm.
883
884The CompCert project is another verified compiler project.
885Their backend~\cite{Leroy2009} generates assembly code for (amongst others) subsets of the
886PowerPC and x86 (32-bit) architectures. At the assembly code stage, there is
887no distinction between the span-dependent jump instructions, so a branch
888displacement optimisation algorithm is not needed.
889
890%An offshoot of the CompCert project is the CompCertTSO project, which adds
891%thread concurrency and synchronisation to the CompCert compiler~\cite{TSO2011}.
892%This compiler also generates assembly code and therefore does not include a
893%branch displacement algorithm.
894 
895%Finally, there is also the Piton stack~\cite{Moore1996}, which not only includes the
896%formal verification of a compiler, but also of the machine architecture
897%targeted by that compiler, a microprocessor called the FM9001.
898%However, this architecture does not have different
899%jump sizes (branching is simulated by assigning values to the program counter),
900%so the branch displacement problem is irrelevant.
901 
902
903\bibliographystyle{alpha}
904\bibliography{biblio}
905%\bibliographystyle{spbasic}
906
907\end{document}
Note: See TracBrowser for help on using the repository browser.