Changeset 3362
 Timestamp:
 Jun 17, 2013, 1:08:27 PM (4 years ago)
 Location:
 src/ASM/CPP2012policy
 Files:

 5 edited
Legend:
 Unmodified
 Added
 Removed

src/ASM/CPP2012policy/algorithm.tex
r3361 r3362 6 6 (using, for example, a constraint solver) can potentially be very costly. 7 7 8 The SDCC compiler~\cite{SDCC2011}, which has a backend target ting the MCS518 The SDCC compiler~\cite{SDCC2011}, which has a backend targeting the MCS51 9 9 instruction set, simply encodes every branch instruction as a long jump 10 10 without taking the distance into account. While certainly correct (the long 11 11 jump can reach any destination in memory) and a very fast solution to compute, 12 it results in a less than optimal solution. 12 it results in a less than optimal solution in terms of output size and 13 execution time. 13 14 14 15 On the other hand, the {\tt gcc} compiler suite~\cite{GCC2012}, while compiling … … 66 67 jump, which is always correct. 67 68 68 The resulting algorithm, while not optimal, is at least as good as the ones 69 from SDCC and {\tt gcc}, and potentially better. Its complexity remains 70 the same (there are at most $2n$ iterations, where $n$ is the number of branch 71 instructions in the program). 69 The resulting algorithm, therefore, will not return the least fixed point, as 70 it might have too many long jumps. However, it is still better than the 71 algorithms from SDCC and {\tt gcc}, since even in the worst case, it will still 72 return a smaller or equal solution. 73 74 As for complexity, there are at most $2n$ iterations, with $n$ the number of 75 branch instructions. Practical tests within the CerCo project on small to 76 medium pieces of code have shown that in almost all cases, a fixed point is 77 reached in 3 passes. Only in one case did the algorithm need 4. 78 79 This is not surprising: after all, the difference between short/absolute and 80 long jumps is only one byte (three for conditional jumps). For a change from 81 short/absolute to long to have an effect on other jumps is therefore relatively 82 uncommon, which explains why a fixed point is reached so quickly. 72 83 73 84 \subsection{The algorithm in detail} 
src/ASM/CPP2012policy/biblio.bib
r2099 r3362 166 166 keywords = {relaxed memory models, semantics, verifying compilation}, 167 167 } 168 169 @incollection{lastyear, 170 year={2012}, 171 isbn={9783642353079}, 172 booktitle={Certified Programs and Proofs}, 173 volume={7679}, 174 series={Lecture Notes in Computer Science}, 175 editor={Hawblitzel, Chris and Miller, Dale}, 176 doi={10.1007/9783642353086_8}, 177 title={An Executable Semantics for CompCert C}, 178 url={http://dx.doi.org/10.1007/9783642353086_8}, 179 publisher={Springer Berlin Heidelberg}, 180 author={Campbell, Brian}, 181 pages={6075} 182 } 183 184 @misc 185 { compcert:2011, 186 title = {The {CompCert} project}, 187 howpublished = {\url{http://compcert.inria.fr/}}, 188 year = {2011}, 189 key = {compcert:2011} 190 } 
src/ASM/CPP2012policy/conclusion.tex
r2099 r3362 10 10 heuristics could be found to make such a solution practical for implementing 11 11 in an existing compiler; this would be especially useful for embedded systems, 12 where it is important to have as small solution as possible.12 where it is important to have as small a solution as possible. 13 13 14 14 In itself the algorithm is already useful, as it results in a smaller solution … … 18 18 without sacrificing speed or correctness. 19 19 20 This algorithm is part of a greater whole, the CerCo project, which aims to 21 completely formalise and verify a concrete cost preserving compiler for a large 22 subset of the C programming language. More information on the formalisation of 23 the assembler, of which the present work is a part, can be found in a companion 24 publication~\cite{DC2012}. 20 The certification of an assembler that relies on the branch displacement 21 algorithm described in this paper was presented in~\cite{lastyear}. 22 The assembler computes the $\sigma$ map as described in this paper and 23 then works in two passes. In the first pass it builds a map 24 from instruction labels to addresses in the assembly code. In the 25 second pass it iterates over the code, translating every pseudo jump 26 at address src to a label l associated to the assembly instruction at 27 address dst to a jump of the size dictated by $(\sigma\ src)$ to 28 $(\sigma\ dst)$. In case of conditional jumps, the translated jump may be 29 implemented with a series of instructions. 30 31 The proof of correctness abstracts over the algorithm used and only relies on 32 {\tt sigma\_policy\_specification} (page~\ref{sigmapolspec}). It is a variation 33 of a standard 1tomany forward simulation proof~\cite{compcert:2011}. The 34 relation R between states just maps every code address ppc stored in 35 registers or memory to $(\sigma\ ppc)$. To identify the code addresses, 36 an additional data structure is always kept together with the source 37 state and is updated by the semantics. The semantics is preserved 38 only for those programs whose source code operations 39 $(f\ ppc1\ \ldots\ ppcn)$ applied to code addresses $ppc1 \ldots ppcn$ are such 40 that $(f\ (\sigma\ ppc1)\ ...\ (\sigma\ ppc) = f\ ppc\ ppcn))$. For example, 41 an injective $\sigma$ preserves a binary equality test f for code addresses, 42 but not pointer subtraction. 43 44 The main lemma (fetching simulation), which relies on\\ 45 {\tt sigma\_policy\_specification} and is established by structural induction 46 over the source code, says that fetching an assembly instruction at 47 position ppc is equal to fetching the translation of the instruction at 48 position $(\sigma\ ppc)$, and that the new incremented program counter is at 49 the beginning of the next instruction (compactness). The only exception is 50 when the instruction fetched is placed at the end of code memory and is 51 followed only by dead code. Execution simulation is trivial because of the 52 restriction over well behaved programs w.r.t. sigma. The condition 53 $\sigma\ 0 = 0$ is necessary because the hardware model prescribes that the 54 first instruction to be executed will be at address 0. For the details 55 see~\cite{lastyear}. 25 56 26 57 \subsection{Related work} 
src/ASM/CPP2012policy/problem.tex
r3361 r3362 2 2 3 3 The problem of branch displacement optimisation, also known as jump encoding, is 4 a wellknown problem in assembler design~\cite{Hyde2006}. It is caused bythe4 a wellknown problem in assembler design~\cite{Hyde2006}. Its origin lies in the 5 5 fact that in many architecture sets, the encoding (and therefore size) of some 6 6 instructions depends on the distance to their operand (the instruction 'span'). … … 78 78 encoded using three branch instructions (for instructions whose logical 79 79 negation is available, it can be done with two branch instructions, but for 80 some instructions this is not available); the call instruction is80 some instructions this is not the case). The call instruction is 81 81 only available in absolute and long forms. 82 82 83 Note that even though the MCS51 architecture is much less advanced and simpler84 than the x8664 architecture, the basic types of branch instruction83 Note that even though the MCS51 architecture is much less advanced and much 84 simpler than the x8664 architecture, the basic types of branch instruction 85 85 remain the same: a short jump with a limited range, an intrasegment jump and a 86 86 jump that can reach the entire available memory. … … 97 97 possible, thus minimising program length and execution time. 98 98 99 This problem is known to be NPcomplete~\cite{Robertson1979,Szymanski1978}, 99 Similar problems, e.g. the branch displacement optimisation problem for other 100 architectures, are known to be NPcomplete~\cite{Robertson1979,Szymanski1978}, 100 101 which could make finding an optimal solution very timeconsuming. 101 102 … … 128 129 129 130 However, neither of these claims (termination nor optimality) hold when we add 130 the absolute jump , as with absolute jumps, the encoding of a branch131 the absolute jump. With absolute jumps, the encoding of a branch 131 132 instruction no longer depends only on the distance between the branch 132 instruction and its target : an absolute jump is possible when they133 are in the same segment (for the MCS51, this means that the first 5133 instruction and its target. An absolute jump is possible when instruction and 134 target are in the same segment (for the MCS51, this means that the first 5 134 135 bytes of their addresses have to be equal). It is therefore entirely possible 135 136 for two branch instructions with the same span to be encoded in different ways … … 203 204 shorter). 204 205 205 In addition, our previous optimality argument no longer holds. Consider the program206 shown in Figure~\ref{f:opt_example}. Suppose that the distance between206 In addition, our previous optimality argument no longer holds. Consider the 207 program shown in Figure~\ref{f:opt_example}. Suppose that the distance between 207 208 $\mathtt{L}_{0}$ and $\mathtt{L}_{1}$ is such that if {\tt jmp X} is encoded 208 209 as a short jump, there is a segment border just after $\mathtt{L}_{1}$. Let 209 us also assume that the three branches to $\mathtt{L}_{1}$ are allin the same210 us also assume that all three branches to $\mathtt{L}_{1}$ are in the same 210 211 segment, but far enough away from $\mathtt{L}_{1}$ that they cannot be encoded 211 212 as short jumps. 
src/ASM/CPP2012policy/proof.tex
r3361 r3362 2 2 3 3 In this section, we present the correctness proof for the algorithm in more 4 detail. The main correctness statement is shown, slightly simplified, in~Figure~\cite{statement}. 5 4 detail. The main correctness statement is shown, slightly simplified, in~Figure~\ref{statement}. 5 6 \label{sigmapolspec} 6 7 \begin{figure}[t] 7 8 \begin{alignat*}{6} … … 61 62 \subsection{Fold invariants} 62 63 63 These are the invariants that hold during the fold of {\sc f} over the program, 64 and that will later on be used to prove the properties of the iteration. 64 In this section, we present the invariants that hold during the fold of {\sc f} 65 over the program. These will be used later on to prove the properties of the 66 iteration. 65 67 66 68 Note that during the fixed point computation, the $\sigma$ function is 67 implemented as a trie for ease of access; computing $\sigma(x)$ is achieved by looking68 up the value of $x$ in the trie. Actually, during the fold, the value we69 pass along is a pair $\mathbb{N} \times \mathtt{ppc\_pc\_map}$. The first component70 is the number of bytes added to the program so far with respect to69 implemented as a trie for ease of access; computing $\sigma(x)$ is achieved by 70 looking up the value of $x$ in the trie. Actually, during the fold, the value 71 we pass along is a pair $\mathbb{N} \times \mathtt{ppc\_pc\_map}$. The first 72 component is the number of bytes added to the program so far with respect to 71 73 the previous iteration, and the second component, {\tt ppc\_pc\_map}, is the 72 ac ual $\sigma$ trie.74 actual $\sigma$ trie. 73 75 74 76 \begin{alignat*}{2} … … 78 80 \end{alignat*} 79 81 80 Th isinvariant states that any pseudoaddress not yet examined is not82 The first invariant states that any pseudoaddress not yet examined is not 81 83 present in the lookup trie. 82 84 … … 104 106 increase. It is needed for proving termination. 105 107 108 \begin{figure}[ht] 106 109 \begin{alignat*}{6} 107 110 \mathtt{sigma} & \omit\rlap{$\mathtt{\_compact\_unsafe} \equiv \lambda prefix.\lambda sigma.$}\notag\\ … … 116 119 &&&&& \ \ \mathtt{instruction\_size\_jmplen}\ j\ (\mathtt{nth}\ n\ prefix) 117 120 \end{alignat*} 118 119 This is a temporary formulation of the main property\\ 120 ({\tt sigma\_policy\_specification}); its main difference 121 \caption{Temporary safety property} 122 \label{sigmacompactunsafe} 123 \end{figure} 124 125 We can now proceed with the lemmas that are needed for algorithm safety. 126 The lemma in Figure~\ref{sigmacompactunsafe} is a temporary formulation of 127 the main property\\ ({\tt sigma\_policy\_specification}). Its main difference 121 128 from the final version is that it uses {\tt instruction\_size\_jmplen} to 122 129 compute the instruction size. This function uses $j$ to compute the span … … 127 134 property. 128 135 136 \begin{figure}[ht] 129 137 \begin{alignat*}{6} 130 138 \mathtt{sigma} & \omit\rlap{$\mathtt{\_safe} \equiv \lambda prefix.\lambda labels.\lambda old\_sigma.\lambda sigma$}\notag\\ … … 149 157 &&&&&\mathrm{long\_jump} \Rightarrow \mathrm{True} 150 158 \end{alignat*} 151 152 This is a more direct safety property: it states that branch instructions are 153 encoded properly, and that no wrong branch instructions are chosen. 159 \caption{Safety property} 160 \label{sigmasafe} 161 \end{figure} 162 163 The lemma in figure~\ref{sigmasafe} is a more direct safety property. It states 164 that branch instructions are encoded properly, and that no wrong branch 165 instructions are chosen. 154 166 155 167 Note that we compute the distance using the memory address of the instruction 156 plus its size : this follows the behaviour of the MCS51 microprocessor, which168 plus its size. This follows the behaviour of the MCS51 microprocessor, which 157 169 increases the program counter directly after fetching, and only then executes 158 170 the branch instruction (by changing the program counter again). 171 172 We now encode some final, simple, properties to make sure that our policy 173 remains consistent, and to keep track of whether the fixed point has been 174 reached. 159 175 160 176 \begin{align*} … … 198 214 The type of an iteration therefore becomes an option type: {\tt None} in case 199 215 the program becomes larger than 64 Kb, or $\mathtt{Some}\ \sigma$ 200 otherwise. We also no longer use a natural number to pass along the number of201 bytes added to the program size, but a boolean that indicates whether we have 202 changed somethingduring the iteration or not.216 otherwise. We also no longer pass along the number of bytes added to the 217 program size, but a boolean that indicates whether we have changed something 218 during the iteration or not. 203 219 204 220 If an iteration returns {\tt None}, we have the following invariant: … … 246 262 is only correct when we have reached the fixed point. 247 263 248 There is another, trivial, invariant if the iteration returns 249 $\mathtt{Some}\ \sigma$: $\mathtt{fst}\ sigma = 2^{16}$. 264 There is another, trivial, invariant in case the iteration returns 265 $\mathtt{Some}\ \sigma$: it must hold that $\mathtt{fst}\ sigma < 2^{16}$. 266 We need this invariant to make sure that addresses do not overflow. 250 267 251 268 The invariants that are taken directly from the fold invariants are trivial to 252 269 prove. 253 270 254 The proof of {\tt nec\_plus\_ultra} works as follows: if we return {\tt None},271 The proof of {\tt nec\_plus\_ultra} goes as follows: if we return {\tt None}, 255 272 then the program size must be greater than 64 Kb. However, since the 256 273 previous iteration did not return {\tt None} (because otherwise we would … … 288 305 longer jumps, in iteration $2n$ every branch instruction must be encoded as 289 306 a long jump. In this case, iteration $2n$ is equal to iteration $2n+1$ and the 290 fix point is reached.307 fixed point is reached.
Note: See TracChangeset
for help on using the changeset viewer.