Changeset 3362


Ignore:
Timestamp:
Jun 17, 2013, 1:08:27 PM (4 years ago)
Author:
boender
Message:
  • added some bits as per Claudio's mail
  • rewrote some small things
  • general reread, spell check, grammar check
  • 16 pages again now
Location:
src/ASM/CPP2012-policy
Files:
5 edited

Legend:

Unmodified
Added
Removed
  • src/ASM/CPP2012-policy/algorithm.tex

    r3361 r3362  
    66(using, for example, a constraint solver) can potentially be very costly.
    77
    8 The SDCC compiler~\cite{SDCC2011}, which has a backend targetting the MCS-51
     8The SDCC compiler~\cite{SDCC2011}, which has a backend targeting the MCS-51
    99instruction set, simply encodes every branch instruction as a long jump
    1010without taking the distance into account. While certainly correct (the long
    1111jump can reach any destination in memory) and a very fast solution to compute,
    12 it results in a less than optimal solution.
     12it results in a less than optimal solution in terms of output size and
     13execution time.
    1314
    1415On the other hand, the {\tt gcc} compiler suite~\cite{GCC2012}, while compiling
     
    6667jump, which is always correct.
    6768
    68 The resulting algorithm, while not optimal, is at least as good as the ones
    69 from SDCC and {\tt gcc}, and potentially better. Its complexity remains
    70 the same (there are at most $2n$ iterations, where $n$ is the number of branch
    71 instructions in the program).
     69The resulting algorithm, therefore, will not return the least fixed point, as
     70it might have too many long jumps. However, it is still better than the
     71algorithms from SDCC and {\tt gcc}, since even in the worst case, it will still
     72return a smaller or equal solution.
     73
     74As for complexity, there are at most $2n$ iterations, with $n$ the number of
     75branch instructions. Practical tests within the CerCo project on small to
     76medium pieces of code have shown that in almost all cases, a fixed point is
     77reached in 3 passes. Only in one case did the algorithm need 4.
     78
     79This is not surprising: after all, the difference between short/absolute and
     80long jumps is only one byte (three for conditional jumps). For a change from
     81short/absolute to long to have an effect on other jumps is therefore relatively
     82uncommon, which explains why a fixed point is reached so quickly.
    7283
    7384\subsection{The algorithm in detail}
  • src/ASM/CPP2012-policy/biblio.bib

    r2099 r3362  
    166166 keywords = {relaxed memory models, semantics, verifying compilation},
    167167}
     168
     169@incollection{lastyear,
     170year={2012},
     171isbn={978-3-642-35307-9},
     172booktitle={Certified Programs and Proofs},
     173volume={7679},
     174series={Lecture Notes in Computer Science},
     175editor={Hawblitzel, Chris and Miller, Dale},
     176doi={10.1007/978-3-642-35308-6_8},
     177title={An Executable Semantics for CompCert C},
     178url={http://dx.doi.org/10.1007/978-3-642-35308-6_8},
     179publisher={Springer Berlin Heidelberg},
     180author={Campbell, Brian},
     181pages={60-75}
     182}
     183
     184@misc
     185{ compcert:2011,
     186  title = {The {CompCert} project},
     187  howpublished = {\url{http://compcert.inria.fr/}},
     188  year = {2011},
     189  key = {compcert:2011}
     190}
  • src/ASM/CPP2012-policy/conclusion.tex

    r2099 r3362  
    1010heuristics could be found to make such a solution practical for implementing
    1111in an existing compiler; this would be especially useful for embedded systems,
    12 where it is important to have as small solution as possible.
     12where it is important to have as small a solution as possible.
    1313
    1414In itself the algorithm is already useful, as it results in a smaller solution
     
    1818without sacrificing speed or correctness.
    1919
    20 This algorithm is part of a greater whole, the CerCo project, which aims to
    21 completely formalise and verify a concrete cost preserving compiler for a large
    22 subset of the C programming language. More information on the formalisation of
    23 the assembler, of which the present work is a part, can be found in a companion
    24 publication~\cite{DC2012}.
     20The certification of an assembler that relies on the branch displacement
     21algorithm described in this paper was presented in~\cite{lastyear}.
     22The assembler computes the $\sigma$ map as described in this paper and
     23then works in two passes. In the first pass it builds a map
     24from instruction labels to addresses in the assembly code. In the
     25second pass it iterates over the code, translating every pseudo jump
     26at address src to a label l associated to the assembly instruction at
     27address dst to a jump of the size dictated by $(\sigma\ src)$ to
     28$(\sigma\ dst)$. In case of conditional jumps, the translated jump may be
     29implemented with a series of instructions.
     30
     31The proof of correctness abstracts over the algorithm used and only relies on
     32{\tt sigma\_policy\_specification} (page~\ref{sigmapolspec}). It is a variation
     33of a standard 1-to-many forward simulation proof~\cite{compcert:2011}. The
     34relation R between states just maps every code address ppc stored in
     35registers or memory to $(\sigma\ ppc)$. To identify the code addresses,
     36an additional data structure is always kept together with the source
     37state and is updated by the semantics. The semantics is preserved
     38only for those programs whose source code operations
     39$(f\ ppc1\ \ldots\ ppcn)$ applied to code addresses $ppc1 \ldots ppcn$ are such
     40that $(f\ (\sigma\ ppc1)\ ...\ (\sigma\ ppc) = f\ ppc\ ppcn))$. For example,
     41an injective $\sigma$ preserves a binary equality test f for code addresses,
     42but not pointer subtraction.
     43
     44The main lemma (fetching simulation), which relies on\\
     45{\tt sigma\_policy\_specification} and is established by structural induction
     46over the source code, says that fetching an assembly instruction at
     47position ppc is equal to fetching the translation of the instruction at
     48position $(\sigma\ ppc)$, and that the new incremented program counter is at
     49the beginning of the next instruction (compactness). The only exception is
     50when the instruction fetched is placed at the end of code memory and is
     51followed only by dead code. Execution simulation is trivial because of the
     52restriction over well behaved programs w.r.t. sigma. The condition
     53$\sigma\ 0 = 0$ is necessary because the hardware model prescribes that the
     54first instruction to be executed will be at address 0. For the details
     55see~\cite{lastyear}.
    2556
    2657\subsection{Related work}
  • src/ASM/CPP2012-policy/problem.tex

    r3361 r3362  
    22
    33The problem of branch displacement optimisation, also known as jump encoding, is
    4 a well-known problem in assembler design~\cite{Hyde2006}. It is caused by the
     4a well-known problem in assembler design~\cite{Hyde2006}. Its origin lies in the
    55fact that in many architecture sets, the encoding (and therefore size) of some
    66instructions depends on the distance to their operand (the instruction 'span').
     
    7878encoded using three branch instructions (for instructions whose logical
    7979negation is available, it can be done with two branch instructions, but for
    80 some instructions this is not available); the call instruction is
     80some instructions this is not the case). The call instruction is
    8181only available in absolute and long forms.
    8282
    83 Note that even though the MCS-51 architecture is much less advanced and simpler
    84 than the x86-64 architecture, the basic types of branch instruction
     83Note that even though the MCS-51 architecture is much less advanced and much
     84simpler than the x86-64 architecture, the basic types of branch instruction
    8585remain the same: a short jump with a limited range, an intra-segment jump and a
    8686jump that can reach the entire available memory.
     
    9797possible, thus minimising program length and execution time.
    9898
    99 This problem is known to be NP-complete~\cite{Robertson1979,Szymanski1978},
     99Similar problems, e.g. the branch displacement optimisation problem for other
     100architectures, are known to be NP-complete~\cite{Robertson1979,Szymanski1978},
    100101which could make finding an optimal solution very time-consuming.
    101102
     
    128129
    129130However, neither of these claims (termination nor optimality) hold when we add
    130 the absolute jump, as with absolute jumps, the encoding of a branch
     131the absolute jump. With absolute jumps, the encoding of a branch
    131132instruction no longer depends only on the distance between the branch
    132 instruction and its target: an absolute jump is possible when they
    133 are in the same segment (for the MCS-51, this means that the first 5
     133instruction and its target. An absolute jump is possible when instruction and
     134target are in the same segment (for the MCS-51, this means that the first 5
    134135bytes of their addresses have to be equal). It is therefore entirely possible
    135136for two branch instructions with the same span to be encoded in different ways
     
    203204shorter).
    204205
    205 In addition, our previous optimality argument no longer holds. Consider the program
    206 shown in Figure~\ref{f:opt_example}. Suppose that the distance between
     206In addition, our previous optimality argument no longer holds. Consider the
     207program shown in Figure~\ref{f:opt_example}. Suppose that the distance between
    207208$\mathtt{L}_{0}$ and $\mathtt{L}_{1}$ is such that if {\tt jmp X} is encoded
    208209as a short jump, there is a segment border just after $\mathtt{L}_{1}$. Let
    209 us also assume that the three branches to $\mathtt{L}_{1}$ are all in the same
     210us also assume that all three branches to $\mathtt{L}_{1}$ are in the same
    210211segment, but far enough away from $\mathtt{L}_{1}$ that they cannot be encoded
    211212as short jumps.
  • src/ASM/CPP2012-policy/proof.tex

    r3361 r3362  
    22
    33In this section, we present the correctness proof for the algorithm in more
    4 detail.  The main correctness statement is shown, slightly simplified, in~Figure~\cite{statement}.
    5 
     4detail.  The main correctness statement is shown, slightly simplified, in~Figure~\ref{statement}.
     5
     6\label{sigmapolspec}
    67\begin{figure}[t]
    78\begin{alignat*}{6}
     
    6162\subsection{Fold invariants}
    6263
    63 These are the invariants that hold during the fold of {\sc f} over the program,
    64 and that will later on be used to prove the properties of the iteration.
     64In this section, we present the invariants that hold during the fold of {\sc f}
     65over the program. These will be used later on to prove the properties of the
     66iteration.
    6567
    6668Note that during the fixed point computation, the $\sigma$ function is
    67 implemented as a trie for ease of access; computing $\sigma(x)$ is achieved by looking
    68 up the value of $x$ in the trie. Actually, during the fold, the value we
    69 pass along is a pair $\mathbb{N} \times \mathtt{ppc\_pc\_map}$. The first component
    70 is the number of bytes added to the program so far with respect to
     69implemented as a trie for ease of access; computing $\sigma(x)$ is achieved by
     70looking up the value of $x$ in the trie. Actually, during the fold, the value
     71we pass along is a pair $\mathbb{N} \times \mathtt{ppc\_pc\_map}$. The first
     72component is the number of bytes added to the program so far with respect to
    7173the previous iteration, and the second component, {\tt ppc\_pc\_map}, is the
    72 acual $\sigma$ trie.
     74actual $\sigma$ trie.
    7375
    7476\begin{alignat*}{2}
     
    7880\end{alignat*}
    7981
    80 This invariant states that any pseudo-address not yet examined is not
     82The first invariant states that any pseudo-address not yet examined is not
    8183present in the lookup trie.
    8284
     
    104106increase. It is needed for proving termination.
    105107
     108\begin{figure}[ht]
    106109\begin{alignat*}{6}
    107110\mathtt{sigma} & \omit\rlap{$\mathtt{\_compact\_unsafe} \equiv \lambda prefix.\lambda sigma.$}\notag\\
     
    116119&&&&& \ \ \mathtt{instruction\_size\_jmplen}\ j\ (\mathtt{nth}\ n\ prefix)
    117120\end{alignat*}
    118 
    119 This is a temporary formulation of the main property\\
    120 ({\tt sigma\_policy\_specification}); its main difference
     121\caption{Temporary safety property}
     122\label{sigmacompactunsafe}
     123\end{figure}
     124
     125We can now proceed with the lemmas that are needed for algorithm safety.
     126The lemma in Figure~\ref{sigmacompactunsafe} is a temporary formulation of
     127the main property\\ ({\tt sigma\_policy\_specification}). Its main difference
    121128from the final version is that it uses {\tt instruction\_size\_jmplen} to
    122129compute the instruction size. This function uses $j$ to compute the span
     
    127134property.
    128135
     136\begin{figure}[ht]
    129137\begin{alignat*}{6}
    130138\mathtt{sigma} & \omit\rlap{$\mathtt{\_safe} \equiv \lambda prefix.\lambda labels.\lambda old\_sigma.\lambda sigma$}\notag\\
     
    149157&&&&&\mathrm{long\_jump} \Rightarrow \mathrm{True}
    150158\end{alignat*}
    151 
    152 This is a more direct safety property: it states that branch instructions are
    153 encoded properly, and that no wrong branch instructions are chosen.
     159\caption{Safety property}
     160\label{sigmasafe}
     161\end{figure}
     162
     163The lemma in figure~\ref{sigmasafe} is a more direct safety property. It states
     164that branch instructions are encoded properly, and that no wrong branch
     165instructions are chosen.
    154166
    155167Note that we compute the distance using the memory address of the instruction
    156 plus its size: this follows the behaviour of the MCS-51 microprocessor, which
     168plus its size. This follows the behaviour of the MCS-51 microprocessor, which
    157169increases the program counter directly after fetching, and only then executes
    158170the branch instruction (by changing the program counter again).
     171
     172We now encode some final, simple, properties to make sure that our policy
     173remains consistent, and to keep track of whether the fixed point has been
     174reached.
    159175
    160176\begin{align*}
     
    198214The type of an iteration therefore becomes an option type: {\tt None} in case
    199215the program becomes larger than 64 Kb, or $\mathtt{Some}\ \sigma$
    200 otherwise. We also no longer use a natural number to pass along the number of
    201 bytes added to the program size, but a boolean that indicates whether we have
    202 changed something during the iteration or not.
     216otherwise. We also no longer pass along the number of bytes added to the
     217program size, but a boolean that indicates whether we have changed something
     218during the iteration or not.
    203219
    204220If an iteration returns {\tt None}, we have the following invariant:
     
    246262is only correct when we have reached the fixed point.
    247263
    248 There is another, trivial, invariant if the iteration returns
    249 $\mathtt{Some}\ \sigma$: $\mathtt{fst}\ sigma = 2^{16}$.
     264There is another, trivial, invariant in case the iteration returns
     265$\mathtt{Some}\ \sigma$: it must hold that $\mathtt{fst}\ sigma < 2^{16}$.
     266We need this invariant to make sure that addresses do not overflow.
    250267
    251268The invariants that are taken directly from the fold invariants are trivial to
    252269prove.
    253270
    254 The proof of {\tt nec\_plus\_ultra} works as follows: if we return {\tt None},
     271The proof of {\tt nec\_plus\_ultra} goes as follows: if we return {\tt None},
    255272then the program size must be greater than 64 Kb. However, since the
    256273previous iteration did not return {\tt None} (because otherwise we would
     
    288305longer jumps, in iteration $2n$ every branch instruction must be encoded as
    289306a long jump. In this case, iteration $2n$ is equal to iteration $2n+1$ and the
    290 fixpoint is reached.
     307fixed point is reached.
Note: See TracChangeset for help on using the changeset viewer.