\section{Conclusion}

In the previous sections we have discussed the branch displacement
optimisation problem, presented an optimised solution, and outlined the
proof of termination and correctness of this algorithm, as formalised in
Matita.

The algorithm we have presented is fast and correct, but not optimal; a
truly optimal solution would require techniques such as constraint solving.
While outside the scope of the present research, it would be interesting to
see whether enough heuristics could be found to make such a solution
practical to implement in an existing compiler; this would be especially
useful for embedded systems, where it is important to keep the generated
code as small as possible.

In itself the algorithm is already useful, as it results in a smaller
solution than the simple `every branch instruction is long' approach used
up to now---and with only 64 KB of memory, every byte counts. It also
results in a smaller solution than the greatest fixed point algorithm used
by {\tt gcc}, and it does so without sacrificing speed or correctness.

The certification of an assembler that relies on the branch displacement
algorithm described in this paper was presented in~\cite{lastyear}.
The assembler computes the $\sigma$ map as described in this paper and
then works in two passes. In the first pass it builds a map from
instruction labels to addresses in the assembly code. In the second pass
it iterates over the code, translating every pseudo jump at address $src$
to a label $l$ associated to the assembly instruction at address $dst$
into a jump of the size dictated by $(\sigma\ src)$, targeting
$(\sigma\ dst)$. In the case of conditional jumps, the translated jump may
be implemented with a series of instructions.
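
To fix ideas, the following OCaml sketch shows the shape of the two passes;
it is a simplification, not the Matita code: the address map and the size
choice, which the development reads off $\sigma$, are split here into
hypothetical {\tt sigma} and {\tt policy} functions, and the expansion of
conditional jumps into a series of instructions is elided.

\begin{verbatim}
type label = string

type pseudo_instr =
  | Jump     of label          (* unconditional pseudo jump     *)
  | CondJump of label          (* conditional pseudo jump       *)
  | Other    of string         (* any non-branching instruction *)

type jump_size = Short | Absolute | Long

type machine_instr =
  | MJump  of jump_size * int  (* jump to a machine address     *)
  | MOther of string

let assemble ~(sigma : int -> int) ~(policy : int -> jump_size)
    (program : (label option * pseudo_instr) list) : machine_instr list =
  (* First pass: map every label to the pseudo address it annotates. *)
  let labels = Hashtbl.create 17 in
  List.iteri
    (fun ppc (lbl, _) ->
      match lbl with Some l -> Hashtbl.replace labels l ppc | None -> ())
    program;
  (* Second pass: translate each pseudo jump at pseudo address src, whose
     label is attached to the instruction at pseudo address dst, into a
     jump of the size chosen at src, targeting (sigma dst). *)
  List.mapi
    (fun src (_, instr) ->
      match instr with
      | Jump l | CondJump l ->
          let dst = Hashtbl.find labels l in
          MJump (policy src, sigma dst)
      | Other op -> MOther op)
    program
\end{verbatim}
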
The proof of correctness abstracts over the algorithm used and relies only
on {\tt sigma\_policy\_specification} (page~\ref{sigmapolspec}). It is a
variation of a standard 1-to-many forward simulation proof~\cite{Leroy2009}.
The relation $R$ between states simply maps every code address $ppc$ stored
in registers or memory to $(\sigma\ ppc)$. To identify the code addresses,
an additional data structure is always kept together with the source state
and is updated by the semantics. The semantics is preserved only for those
programs whose source code operations $(f\ ppc_1\ \ldots\ ppc_n)$ applied to
code addresses $ppc_1 \ldots ppc_n$ are such that
$f\ (\sigma\ ppc_1)\ \ldots\ (\sigma\ ppc_n) = f\ ppc_1\ \ldots\ ppc_n$. For
example, an injective $\sigma$ preserves a binary equality test $f$ on code
addresses, but not pointer subtraction.
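
The following toy OCaml fragment illustrates this condition with a
hypothetical injective {\tt sigma}: equality tests on code addresses are
preserved, whereas pointer subtraction in general is not.

\begin{verbatim}
(* An injective map that is not distance-preserving. *)
let sigma ppc = 2 * ppc

let () =
  (* Equality of code addresses is preserved by any injective sigma. *)
  assert ((sigma 3 = sigma 5) = (3 = 5));
  (* Pointer subtraction is not: 10 - 6 = 4, while 5 - 3 = 2. *)
  assert (sigma 5 - sigma 3 <> 5 - 3)
\end{verbatim}
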
The main lemma (fetching simulation), which relies on\\
{\tt sigma\_policy\_specification} and is established by structural induction
over the source code, says that fetching an assembly instruction at position
$ppc$ corresponds to fetching its translation at position $(\sigma\ ppc)$,
and that the new, incremented program counter lies at the beginning of the
next instruction (compactness). The only exception is when the fetched
instruction is placed at the end of code memory and is followed only by dead
code. Execution simulation is then trivial because of the restriction to
programs that are well behaved with respect to $\sigma$. The condition
$\sigma\ 0 = 0$ is necessary because the hardware model prescribes that the
first instruction to be executed is at address 0. For the details
see~\cite{lastyear}.
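
Stated informally as an executable property (an OCaml sketch in which
{\tt fetch\_pseudo}, {\tt fetch} and {\tt translate} are illustrative
stand-ins, not the functions of the formal development), the lemma reads
roughly as follows.

\begin{verbatim}
(* Fetching at (sigma ppc) in the assembled code yields the translation
   of the pseudo instruction at ppc, and the incremented program counter
   coincides with sigma (ppc + 1), i.e. the start of the next instruction
   (compactness). *)
let fetching_simulation ~fetch_pseudo ~fetch ~translate ~sigma ppc =
  let pseudo_instr = fetch_pseudo ppc in
  let instr, next_pc = fetch (sigma ppc) in
  instr = translate ppc pseudo_instr && next_pc = sigma (ppc + 1)
\end{verbatim}
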
Instead of verifying the algorithm directly, another solution to the problem
would be to run an optimisation algorithm and then verify the safety of the
result using a verified validator. Such a validator would be easier to
verify than the algorithm itself and it would also be efficient, requiring
only a linear pass over the source code to check the specification. However,
it is surely also interesting to formally prove that the assembler never
rejects programs that should be accepted, i.e.\ that the algorithm itself is
correct. This is the topic of the current paper.
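
For illustration, such a validator could take roughly the following shape
(an OCaml sketch; {\tt target}, {\tt fits} and the size type are hypothetical
stand-ins for the actual content of {\tt sigma\_policy\_specification}): one
linear pass checks, for every branch, that the size chosen by the policy can
encode the displacement assigned by $\sigma$.

\begin{verbatim}
type jump_size = Short | Absolute | Long

let validate ~(sigma : int -> int) ~(policy : int -> jump_size)
    ~(target : int -> int option)        (* branch destination, if any   *)
    ~(fits : jump_size -> int -> bool)   (* size admits the displacement *)
    (program_length : int) : bool =
  let ok = ref true in
  for src = 0 to program_length - 1 do
    match target src with
    | None -> ()                         (* not a branch instruction *)
    | Some dst ->
        (* The displacement is taken relative to the address following
           the translated branch, i.e. (sigma (src + 1)) by compactness. *)
        let displacement = sigma dst - sigma (src + 1) in
        if not (fits (policy src) displacement) then ok := false
  done;
  !ok
\end{verbatim}
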

\subsection{Related work}

As far as we are aware, this is the first formal discussion of the branch
displacement optimisation algorithm.

CompCert is another verified compiler project.
Its backend~\cite{Leroy2009} generates assembly code for (amongst others)
subsets of the PowerPC and x86 (32-bit) architectures. At the assembly code
stage, no distinction is made between the span-dependent jump instructions,
so a branch displacement optimisation algorithm is not needed.

%An offshoot of the CompCert project is the CompCertTSO project, which adds
%thread concurrency and synchronisation to the CompCert compiler~\cite{TSO2011}.
%This compiler also generates assembly code and therefore does not include a
%branch displacement algorithm.

%Finally, there is also the Piton stack~\cite{Moore1996}, which not only includes the
%formal verification of a compiler, but also of the machine architecture
%targeted by that compiler, a microprocessor called the FM9001.
%However, this architecture does not have different
%jump sizes (branching is simulated by assigning values to the program counter),
%so the branch displacement problem is irrelevant.