# Changeset 3362

Ignore:
Timestamp:
Jun 17, 2013, 1:08:27 PM (6 years ago)
Message:
• added some bits as per Claudio's mail
• rewrote some small things
• general reread, spell check, grammar check
• 16 pages again now
Location:
src/ASM/CPP2012-policy
Files:
5 edited

Unmodified
Removed
• ## src/ASM/CPP2012-policy/algorithm.tex

 r3361 (using, for example, a constraint solver) can potentially be very costly. The SDCC compiler~\cite{SDCC2011}, which has a backend targetting the MCS-51 The SDCC compiler~\cite{SDCC2011}, which has a backend targeting the MCS-51 instruction set, simply encodes every branch instruction as a long jump without taking the distance into account. While certainly correct (the long jump can reach any destination in memory) and a very fast solution to compute, it results in a less than optimal solution. it results in a less than optimal solution in terms of output size and execution time. On the other hand, the {\tt gcc} compiler suite~\cite{GCC2012}, while compiling jump, which is always correct. The resulting algorithm, while not optimal, is at least as good as the ones from SDCC and {\tt gcc}, and potentially better. Its complexity remains the same (there are at most $2n$ iterations, where $n$ is the number of branch instructions in the program). The resulting algorithm, therefore, will not return the least fixed point, as it might have too many long jumps. However, it is still better than the algorithms from SDCC and {\tt gcc}, since even in the worst case, it will still return a smaller or equal solution. As for complexity, there are at most $2n$ iterations, with $n$ the number of branch instructions. Practical tests within the CerCo project on small to medium pieces of code have shown that in almost all cases, a fixed point is reached in 3 passes. Only in one case did the algorithm need 4. This is not surprising: after all, the difference between short/absolute and long jumps is only one byte (three for conditional jumps). For a change from short/absolute to long to have an effect on other jumps is therefore relatively uncommon, which explains why a fixed point is reached so quickly. \subsection{The algorithm in detail}
• ## src/ASM/CPP2012-policy/biblio.bib

 r2099 keywords = {relaxed memory models, semantics, verifying compilation}, } @incollection{lastyear, year={2012}, isbn={978-3-642-35307-9}, booktitle={Certified Programs and Proofs}, volume={7679}, series={Lecture Notes in Computer Science}, editor={Hawblitzel, Chris and Miller, Dale}, doi={10.1007/978-3-642-35308-6_8}, title={An Executable Semantics for CompCert C}, url={http://dx.doi.org/10.1007/978-3-642-35308-6_8}, publisher={Springer Berlin Heidelberg}, author={Campbell, Brian}, pages={60-75} } @misc { compcert:2011, title = {The {CompCert} project}, howpublished = {\url{http://compcert.inria.fr/}}, year = {2011}, key = {compcert:2011} }
• ## src/ASM/CPP2012-policy/conclusion.tex

 r2099 heuristics could be found to make such a solution practical for implementing in an existing compiler; this would be especially useful for embedded systems, where it is important to have as small solution as possible. where it is important to have as small a solution as possible. In itself the algorithm is already useful, as it results in a smaller solution without sacrificing speed or correctness. This algorithm is part of a greater whole, the CerCo project, which aims to completely formalise and verify a concrete cost preserving compiler for a large subset of the C programming language. More information on the formalisation of the assembler, of which the present work is a part, can be found in a companion publication~\cite{DC2012}. The certification of an assembler that relies on the branch displacement algorithm described in this paper was presented in~\cite{lastyear}. The assembler computes the $\sigma$ map as described in this paper and then works in two passes. In the first pass it builds a map from instruction labels to addresses in the assembly code. In the second pass it iterates over the code, translating every pseudo jump at address src to a label l associated to the assembly instruction at address dst to a jump of the size dictated by $(\sigma\ src)$ to $(\sigma\ dst)$. In case of conditional jumps, the translated jump may be implemented with a series of instructions. The proof of correctness abstracts over the algorithm used and only relies on {\tt sigma\_policy\_specification} (page~\ref{sigmapolspec}). It is a variation of a standard 1-to-many forward simulation proof~\cite{compcert:2011}. The relation R between states just maps every code address ppc stored in registers or memory to $(\sigma\ ppc)$. To identify the code addresses, an additional data structure is always kept together with the source state and is updated by the semantics. The semantics is preserved only for those programs whose source code operations $(f\ ppc1\ \ldots\ ppcn)$ applied to code addresses $ppc1 \ldots ppcn$ are such that $(f\ (\sigma\ ppc1)\ ...\ (\sigma\ ppc) = f\ ppc\ ppcn))$. For example, an injective $\sigma$ preserves a binary equality test f for code addresses, but not pointer subtraction. The main lemma (fetching simulation), which relies on\\ {\tt sigma\_policy\_specification} and is established by structural induction over the source code, says that fetching an assembly instruction at position ppc is equal to fetching the translation of the instruction at position $(\sigma\ ppc)$, and that the new incremented program counter is at the beginning of the next instruction (compactness). The only exception is when the instruction fetched is placed at the end of code memory and is followed only by dead code. Execution simulation is trivial because of the restriction over well behaved programs w.r.t. sigma. The condition $\sigma\ 0 = 0$ is necessary because the hardware model prescribes that the first instruction to be executed will be at address 0. For the details see~\cite{lastyear}. \subsection{Related work}
• ## src/ASM/CPP2012-policy/problem.tex

 r3361 The problem of branch displacement optimisation, also known as jump encoding, is a well-known problem in assembler design~\cite{Hyde2006}. It is caused by the a well-known problem in assembler design~\cite{Hyde2006}. Its origin lies in the fact that in many architecture sets, the encoding (and therefore size) of some instructions depends on the distance to their operand (the instruction 'span'). encoded using three branch instructions (for instructions whose logical negation is available, it can be done with two branch instructions, but for some instructions this is not available); the call instruction is some instructions this is not the case). The call instruction is only available in absolute and long forms. Note that even though the MCS-51 architecture is much less advanced and simpler than the x86-64 architecture, the basic types of branch instruction Note that even though the MCS-51 architecture is much less advanced and much simpler than the x86-64 architecture, the basic types of branch instruction remain the same: a short jump with a limited range, an intra-segment jump and a jump that can reach the entire available memory. possible, thus minimising program length and execution time. This problem is known to be NP-complete~\cite{Robertson1979,Szymanski1978}, Similar problems, e.g. the branch displacement optimisation problem for other architectures, are known to be NP-complete~\cite{Robertson1979,Szymanski1978}, which could make finding an optimal solution very time-consuming. However, neither of these claims (termination nor optimality) hold when we add the absolute jump, as with absolute jumps, the encoding of a branch the absolute jump. With absolute jumps, the encoding of a branch instruction no longer depends only on the distance between the branch instruction and its target: an absolute jump is possible when they are in the same segment (for the MCS-51, this means that the first 5 instruction and its target. An absolute jump is possible when instruction and target are in the same segment (for the MCS-51, this means that the first 5 bytes of their addresses have to be equal). It is therefore entirely possible for two branch instructions with the same span to be encoded in different ways shorter). In addition, our previous optimality argument no longer holds. Consider the program shown in Figure~\ref{f:opt_example}. Suppose that the distance between In addition, our previous optimality argument no longer holds. Consider the program shown in Figure~\ref{f:opt_example}. Suppose that the distance between $\mathtt{L}_{0}$ and $\mathtt{L}_{1}$ is such that if {\tt jmp X} is encoded as a short jump, there is a segment border just after $\mathtt{L}_{1}$. Let us also assume that the three branches to $\mathtt{L}_{1}$ are all in the same us also assume that all three branches to $\mathtt{L}_{1}$ are in the same segment, but far enough away from $\mathtt{L}_{1}$ that they cannot be encoded as short jumps.
• ## src/ASM/CPP2012-policy/proof.tex

 r3361 In this section, we present the correctness proof for the algorithm in more detail.  The main correctness statement is shown, slightly simplified, in~Figure~\cite{statement}. detail.  The main correctness statement is shown, slightly simplified, in~Figure~\ref{statement}. \label{sigmapolspec} \begin{figure}[t] \begin{alignat*}{6} \subsection{Fold invariants} These are the invariants that hold during the fold of {\sc f} over the program, and that will later on be used to prove the properties of the iteration. In this section, we present the invariants that hold during the fold of {\sc f} over the program. These will be used later on to prove the properties of the iteration. Note that during the fixed point computation, the $\sigma$ function is implemented as a trie for ease of access; computing $\sigma(x)$ is achieved by looking up the value of $x$ in the trie. Actually, during the fold, the value we pass along is a pair $\mathbb{N} \times \mathtt{ppc\_pc\_map}$. The first component is the number of bytes added to the program so far with respect to implemented as a trie for ease of access; computing $\sigma(x)$ is achieved by looking up the value of $x$ in the trie. Actually, during the fold, the value we pass along is a pair $\mathbb{N} \times \mathtt{ppc\_pc\_map}$. The first component is the number of bytes added to the program so far with respect to the previous iteration, and the second component, {\tt ppc\_pc\_map}, is the acual $\sigma$ trie. actual $\sigma$ trie. \begin{alignat*}{2} \end{alignat*} This invariant states that any pseudo-address not yet examined is not The first invariant states that any pseudo-address not yet examined is not present in the lookup trie. increase. It is needed for proving termination. \begin{figure}[ht] \begin{alignat*}{6} \mathtt{sigma} & \omit\rlap{$\mathtt{\_compact\_unsafe} \equiv \lambda prefix.\lambda sigma.$}\notag\\ &&&&& \ \ \mathtt{instruction\_size\_jmplen}\ j\ (\mathtt{nth}\ n\ prefix) \end{alignat*} This is a temporary formulation of the main property\\ ({\tt sigma\_policy\_specification}); its main difference \caption{Temporary safety property} \label{sigmacompactunsafe} \end{figure} We can now proceed with the lemmas that are needed for algorithm safety. The lemma in Figure~\ref{sigmacompactunsafe} is a temporary formulation of the main property\\ ({\tt sigma\_policy\_specification}). Its main difference from the final version is that it uses {\tt instruction\_size\_jmplen} to compute the instruction size. This function uses $j$ to compute the span property. \begin{figure}[ht] \begin{alignat*}{6} \mathtt{sigma} & \omit\rlap{$\mathtt{\_safe} \equiv \lambda prefix.\lambda labels.\lambda old\_sigma.\lambda sigma$}\notag\\ &&&&&\mathrm{long\_jump} \Rightarrow \mathrm{True} \end{alignat*} This is a more direct safety property: it states that branch instructions are encoded properly, and that no wrong branch instructions are chosen. \caption{Safety property} \label{sigmasafe} \end{figure} The lemma in figure~\ref{sigmasafe} is a more direct safety property. It states that branch instructions are encoded properly, and that no wrong branch instructions are chosen. Note that we compute the distance using the memory address of the instruction plus its size: this follows the behaviour of the MCS-51 microprocessor, which plus its size. This follows the behaviour of the MCS-51 microprocessor, which increases the program counter directly after fetching, and only then executes the branch instruction (by changing the program counter again). We now encode some final, simple, properties to make sure that our policy remains consistent, and to keep track of whether the fixed point has been reached. \begin{align*} The type of an iteration therefore becomes an option type: {\tt None} in case the program becomes larger than 64 Kb, or $\mathtt{Some}\ \sigma$ otherwise. We also no longer use a natural number to pass along the number of bytes added to the program size, but a boolean that indicates whether we have changed something during the iteration or not. otherwise. We also no longer pass along the number of bytes added to the program size, but a boolean that indicates whether we have changed something during the iteration or not. If an iteration returns {\tt None}, we have the following invariant: is only correct when we have reached the fixed point. There is another, trivial, invariant if the iteration returns $\mathtt{Some}\ \sigma$: $\mathtt{fst}\ sigma = 2^{16}$. There is another, trivial, invariant in case the iteration returns $\mathtt{Some}\ \sigma$: it must hold that $\mathtt{fst}\ sigma < 2^{16}$. We need this invariant to make sure that addresses do not overflow. The invariants that are taken directly from the fold invariants are trivial to prove. The proof of {\tt nec\_plus\_ultra} works as follows: if we return {\tt None}, The proof of {\tt nec\_plus\_ultra} goes as follows: if we return {\tt None}, then the program size must be greater than 64 Kb. However, since the previous iteration did not return {\tt None} (because otherwise we would longer jumps, in iteration $2n$ every branch instruction must be encoded as a long jump. In this case, iteration $2n$ is equal to iteration $2n+1$ and the fixpoint is reached. fixed point is reached.
Note: See TracChangeset for help on using the changeset viewer.