Changeset 3473
- Timestamp:
- Sep 22, 2014, 11:25:51 AM (6 years ago)
- Location:
- Papers/sttt
- Files:
-
- 4 edited
Legend:
- Unmodified
- Added
- Removed
-
Papers/sttt/conclusion.tex
r3470 r3473 1 \section{Conclusion}2 3 In the previous sections we have discussed the branch displacement optimisation4 problem, presented an optimised solution, and discussed the proof of5 termination and correctness for this algorithm, as formalised in Matita.6 7 The algorithm we have presented is fast and correct, but not optimal; a true8 optimal solution would need techniques like constraint solvers. While outside9 the scope of the present research, it would be interesting to see if enough10 heuristics could be found to make such a solution practical for implementing11 in an existing compiler; this would be especially useful for embedded systems,12 where it is important to have as small a solution as possible.13 14 In itself the algorithm is already useful, as it results in a smaller solution15 than the simple `every branch instruction is long' used up until now---and with16 only 64 Kb of memory, every byte counts. It also results in a smaller solution17 than the greatest fixed point algorithm that {\tt gcc} uses. It does this18 without sacrificing speed or correctness.19 20 The certification of an assembler that relies on the branch displacement21 algorithm described in this paper was presented in~\cite{lastyear}.22 The assembler computes the $\sigma$ map as described in this paper and23 then works in two passes. In the first pass it builds a map24 from instruction labels to addresses in the assembly code. In the25 second pass it iterates over the code, translating every pseudo jump26 at address $src$ to a label l associated to the assembly instruction at27 address $dst$ to a jump of the size dictated by $(\sigma\ src)$ to28 $(\sigma\ dst)$. In case of conditional jumps, the translated jump may be29 implemented with a series of instructions.30 31 The proof of correctness abstracts over the algorithm used and only relies on32 {\tt sigma\_policy\_specification} (page~\ref{sigmapolspec}). It is a variation33 of a standard 1-to-many forward simulation proof~\cite{Leroy2009}. The34 relation R between states just maps every code address $ppc$ stored in35 registers or memory to $(\sigma\ ppc)$. To identify the code addresses,36 an additional data structure is always kept together with the source37 state and is updated by the semantics. The semantics is preserved38 only for those programs whose source code operations39 $(f\ ppc_1\ \ldots\ ppc_n)$ applied to code addresses $ppc_1 \ldots ppc_n$ are40 such that $(f\ (\sigma\ ppc_1)\ ...\ (\sigma\ ppc_n) = f\ ppc_1\ ppc_n))$. For41 example, an injective $\sigma$ preserves a binary equality test f for code42 addresses, but not pointer subtraction.43 44 The main lemma (fetching simulation), which relies on\\45 {\tt sigma\_policy\_specification} and is established by structural induction46 over the source code, says that fetching an assembly instruction at47 position ppc is equal to fetching the translation of the instruction at48 position $(\sigma\ ppc)$, and that the new incremented program counter is at49 the beginning of the next instruction (compactness). The only exception is50 when the instruction fetched is placed at the end of code memory and is51 followed only by dead code. Execution simulation is trivial because of the52 restriction over well behaved programs w.r.t. sigma. The condition53 $\sigma\ 0 = 0$ is necessary because the hardware model prescribes that the54 first instruction to be executed will be at address 0. For the details55 see~\cite{lastyear}.56 57 Instead of verifying the algorithm directly, another solution to the problem58 would be to run an optimisation algorithm, and then verify the safety of the59 result using a verified validator. Such a validator would be easier to verify60 than the algorithm itself and it would also be efficient, requiring only a61 linear pass over the source code to test the specification. However, it is62 surely also interesting to formally prove that the assembler never rejects63 programs that should be accepted, i.e. that the algorithm itself is correct.64 This is the topic of the current paper.65 66 \subsection{Related work}67 68 As far as we are aware, this is the first formal discussion of the branch69 displacement optimisation algorithm.70 71 The CompCert project is another verified compiler project.72 Their backend~\cite{Leroy2009} generates assembly code for (amongst others) subsets of the73 PowerPC and x86 (32-bit) architectures. At the assembly code stage, there is74 no distinction between the span-dependent jump instructions, so a branch75 displacement optimisation algorithm is not needed.76 77 %An offshoot of the CompCert project is the CompCertTSO project, which adds78 %thread concurrency and synchronisation to the CompCert compiler~\cite{TSO2011}.79 %This compiler also generates assembly code and therefore does not include a80 %branch displacement algorithm.81 82 %Finally, there is also the Piton stack~\cite{Moore1996}, which not only includes the83 %formal verification of a compiler, but also of the machine architecture84 %targeted by that compiler, a microprocessor called the FM9001.85 %However, this architecture does not have different86 %jump sizes (branching is simulated by assigning values to the program counter),87 %so the branch displacement problem is irrelevant.88 -
Papers/sttt/main.tex
r3472 r3473 18 18 \begin{document} 19 19 20 \title{ Averified optimising assembler20 \title{On the proof of correctness of a verified optimising assembler 21 21 \thanks{Research supported by the CerCo project, within the Future and Emerging Technologies (FET) programme of the Seventh Framework Programme for Research of the European Commission, under FET-Open grant number 243881}} 22 22 \author{Jaap Boender \and Dominic P. Mulligan \and Claudio Sacerdoti Coen} … … 26 26 27 27 \begin{abstract} 28 Optimising assemblers face the `branch displacement problem', i.e. how best to choose between concrete machine code jump instructions---typically of differing instruction and offset sizes---when expanding pseudo-instructions.29 Ideally, a well-implementedoptimising assembler would choose the set of jump expansions that minimises the size of the resulting machine code program, a task that is provably \textsc{np}-hard.30 31 As part of CerCo (`Certified Complexity') ---an \textsc{eu}-funded project to develop a verified concrete complexity preserving compiler for a large subset of the C programming language---we have implemented and proved correct an assembler within an interactive theorem prover.28 Optimising assemblers face the `branch displacement' or `jump encoding' problem, i.e. how best to choose between concrete machine code jump instructions --- typically of differing instruction and offset sizes --- when expanding pseudo-instructions. 29 Ideally, an optimising assembler would choose the set of jump expansions that minimises the size of the resulting machine code program, a task that is provably \textsc{np}-hard. 30 31 As part of CerCo (`Certified Complexity') --- an \textsc{eu}-funded project to develop a verified concrete complexity preserving compiler for a large subset of the C programming language --- we have implemented and proved correct an assembler within an interactive theorem prover. 32 32 Our assembler targets the instruction set of a typical micro-controller, the Intel \textsc{mcs}-51 series. 33 33 As is common in embedded systems development, this micro-controller has a paucity of available code memory and therefore we face an additional pressure in reducing the size of any assembled machine code program. … … 39 39 \end{abstract} 40 40 41 \input{problem} 42 \input{algorithm} 43 \input{proof} 44 \input{conclusion} 41 \section{Introduction} 42 43 The problem of branch displacement optimisation, also known as jump encoding, is 44 a well-known problem in assembler design~\cite{Hyde2006}. Its origin lies in the 45 fact that in many architecture sets, the encoding (and therefore size) of some 46 instructions depends on the distance to their operand (the instruction 'span'). 47 The branch displacement optimisation problem consists of encoding these 48 span-dependent instructions in such a way that the resulting program is as 49 small as possible. 50 51 This problem is the subject of the present paper. After introducing the problem 52 in more detail, we will discuss the solutions used by other compilers, present 53 the algorithm we use in the CerCo assembler, and discuss its verification, 54 that is the proofs of termination and correctness using the Matita proof 55 assistant~\cite{Asperti2007}. 56 57 Formulating the final statement of correctness and finding the loop invariants 58 have been non-trivial tasks and are, indeed, the main contribution of this 59 paper. It has required considerable care and fine-tuning to formulate not 60 only the minimal statement required for the ulterior proof of correctness of 61 the assembler, but also the minimal set of invariants needed for the proof 62 of correctness of the algorithm. 63 64 The research presented in this paper has been executed within the CerCo project 65 which aims at formally verifying a C compiler with cost annotations. The 66 target architecture for this project is the MCS-51, whose instruction set 67 contains span-dependent instructions. Furthermore, its maximum addressable 68 memory size is very small (64 Kb), which makes it important to generate 69 programs that are as small as possible. With this optimisation, however, comes increased complexity and hence increased possibility for error. We must make sure that the branch instructions are encoded correctly, otherwise the assembled program will behave unpredictably. 70 71 All Matita files related to this development can be found on the CerCo 72 website, \url{http://cerco.cs.unibo.it}. The specific part that contains the 73 branch displacement algorithm is in the {\tt ASM} subdirectory, in the files 74 {\tt PolicyFront.ma}, {\tt PolicyStep.ma} and {\tt Policy.ma}. 75 76 \section{The branch displacement optimisation problem} 77 78 In most modern instruction sets that have them, the only span-dependent 79 instructions are branch instructions. Taking the ubiquitous x86-64 instruction 80 set as an example, we find that it contains eleven different forms of the 81 unconditional branch instruction, all with different ranges, instruction sizes 82 and semantics (only six are valid in 64-bit mode, for example). Some examples 83 are shown in Figure~\ref{f:x86jumps} (see also~\cite{IntelDev}). 84 85 \begin{figure}[h] 86 \begin{center} 87 \begin{tabular}{|l|l|l|} 88 \hline 89 Instruction & Size (bytes) & Displacement range \\ 90 \hline 91 Short jump & 2 & -128 to 127 bytes \\ 92 Relative near jump & 5 & $-2^{32}$ to $2^{32}-1$ bytes \\ 93 Absolute near jump & 6 & one segment (64-bit address) \\ 94 Far jump & 8 & entire memory (indirect jump) \\ 95 \hline 96 \end{tabular} 97 \end{center} 98 \caption{List of x86 branch instructions} 99 \label{f:x86jumps} 100 \end{figure} 101 102 The chosen target architecture of the CerCo project is the Intel MCS-51, which 103 features three types of branch instructions (or jump instructions; the two terms 104 are used interchangeably), as shown in Figure~\ref{f:mcs51jumps}. 105 106 \begin{figure}[h] 107 \begin{center} 108 \begin{tabular}{|l|l|l|l|} 109 \hline 110 Instruction & Size & Execution time & Displacement range \\ 111 & (bytes) & (cycles) & \\ 112 \hline 113 SJMP (`short jump') & 2 & 2 & -128 to 127 bytes \\ 114 AJMP (`absolute jump') & 2 & 2 & one segment (11-bit address) \\ 115 LJMP (`long jump') & 3 & 3 & entire memory \\ 116 \hline 117 \end{tabular} 118 \end{center} 119 \caption{List of MCS-51 branch instructions} 120 \label{f:mcs51jumps} 121 \end{figure} 122 123 Conditional branch instructions are only available in short form, which 124 means that a conditional branch outside the short address range has to be 125 encoded using three branch instructions (for instructions whose logical 126 negation is available, it can be done with two branch instructions, but for 127 some instructions this is not the case). The call instruction is 128 only available in absolute and long forms. 129 130 Note that even though the MCS-51 architecture is much less advanced and much 131 simpler than the x86-64 architecture, the basic types of branch instruction 132 remain the same: a short jump with a limited range, an intra-segment jump and a 133 jump that can reach the entire available memory. 134 135 Generally, in code fed to the assembler as input, the only 136 difference between branch instructions is semantics, not span. This 137 means that a distinction is made between an unconditional branch and the 138 several kinds of conditional branch, but not between their short, absolute or 139 long variants. 140 141 The algorithm used by the assembler to encode these branch instructions into 142 the different machine instructions is known as the {\em branch displacement 143 algorithm}. The optimisation problem consists of finding as small an encoding as 144 possible, thus minimising program length and execution time. 145 146 Similar problems, e.g. the branch displacement optimisation problem for other 147 architectures, are known to be NP-complete~\cite{Robertson1979,Szymanski1978}, 148 which could make finding an optimal solution very time-consuming. 149 150 The canonical solution, as shown by Szymanski~\cite{Szymanski1978} or more 151 recently by Dickson~\cite{Dickson2008} for the x86 instruction set, is to use a 152 fixed point algorithm that starts with the shortest possible encoding (all 153 branch instruction encoded as short jumps, which is likely not a correct 154 solution) and then iterates over the source to re-encode those branch 155 instructions whose target is outside their range. 156 157 \subsection*{Adding absolute jumps} 158 159 In both papers mentioned above, the encoding of a jump is only dependent on the 160 distance between the jump and its target: below a certain value a short jump 161 can be used; above this value the jump must be encoded as a long jump. 162 163 Here, termination of the smallest fixed point algorithm is easy to prove. All 164 branch instructions start out encoded as short jumps, which means that the 165 distance between any branch instruction and its target is as short as possible 166 (all the intervening jumps are short). 167 If, in this situation, there is a branch instruction $b$ whose span is not 168 within the range for a short jump, we can be sure that we can never reach a 169 situation where the span of $j$ is so small that it can be encoded as a short 170 jump. This argument continues to hold throughout the subsequent iterations of 171 the algorithm: short jumps can change into long jumps, but not \emph{vice versa}, 172 as spans only increase. Hence, the algorithm either terminates early when a fixed 173 point is reached or when all short jumps have been changed into long jumps. 174 175 Also, we can be certain that we have reached an optimal solution: a short jump 176 is only changed into a long jump if it is absolutely necessary. 177 178 However, neither of these claims (termination nor optimality) hold when we add 179 the absolute jump. With absolute jumps, the encoding of a branch 180 instruction no longer depends only on the distance between the branch 181 instruction and its target. An absolute jump is possible when instruction and 182 target are in the same segment (for the MCS-51, this means that the first 5 183 bytes of their addresses have to be equal). It is therefore entirely possible 184 for two branch instructions with the same span to be encoded in different ways 185 (absolute if the branch instruction and its target are in the same segment, 186 long if this is not the case). 187 188 \begin{figure}[t] 189 \begin{subfigure}[b]{.45\linewidth} 190 \small 191 \begin{alltt} 192 jmp X 193 \ldots 194 L\(\sb{0}\): \ldots 195 % Start of new segment if 196 % jmp X is encoded as short 197 \ldots 198 jmp L\(\sb{0}\) 199 \end{alltt} 200 \caption{Example of a program where a long jump becomes absolute} 201 \label{f:term_example} 202 \end{subfigure} 203 \hfill 204 \begin{subfigure}[b]{.45\linewidth} 205 \small 206 \begin{alltt} 207 L\(\sb{0}\): jmp X 208 X: \ldots 209 \ldots 210 L\(\sb{1}\): \ldots 211 % Start of new segment if 212 % jmp X is encoded as short 213 \ldots 214 jmp L\(\sb{1}\) 215 \ldots 216 jmp L\(\sb{1}\) 217 \ldots 218 jmp L\(\sb{1}\) 219 \ldots 220 \end{alltt} 221 \caption{Example of a program where the fixed-point algorithm is not optimal} 222 \label{f:opt_example} 223 \end{subfigure} 224 \end{figure} 225 226 This invalidates our earlier termination argument: a branch instruction, once encoded 227 as a long jump, can be re-encoded during a later iteration as an absolute jump. 228 Consider the program shown in Figure~\ref{f:term_example}. At the start of the 229 first iteration, both the branch to {\tt X} and the branch to $\mathtt{L}_{0}$ 230 are encoded as small jumps. Let us assume that in this case, the placement of 231 $\mathtt{L}_{0}$ and the branch to it are such that $\mathtt{L}_{0}$ is just 232 outside the segment that contains this branch. Let us also assume that the 233 distance between $\mathtt{L}_{0}$ and the branch to it is too large for the 234 branch instruction to be encoded as a short jump. 235 236 All this means that in the second iteration, the branch to $\mathtt{L}_{0}$ will 237 be encoded as a long jump. If we assume that the branch to {\tt X} is encoded as 238 a long jump as well, the size of the branch instruction will increase and 239 $\mathtt{L}_{0}$ will be `propelled' into the same segment as its branch 240 instruction, because every subsequent instruction will move one byte forward. 241 Hence, in the third iteration, the branch to $\mathtt{L}_{0}$ can be encoded as 242 an absolute jump. At first glance, there is nothing that prevents us from 243 constructing a configuration where two branch instructions interact in such a 244 way as to iterate indefinitely between long and absolute encodings. 245 246 This situation mirrors the explanation by Szymanski~\cite{Szymanski1978} of why 247 the branch displacement optimisation problem is NP-complete. In this explanation, 248 a condition for NP-completeness is the fact that programs be allowed to contain 249 {\em pathological} jumps. These are branch instructions that can normally not be 250 encoded as a short(er) jump, but gain this property when some other branch 251 instructions are encoded as a long(er) jump. This is exactly what happens in 252 Figure~\ref{f:term_example}. By encoding the first branch instruction as a long 253 jump, another branch instruction switches from long to absolute (which is 254 shorter). 255 256 In addition, our previous optimality argument no longer holds. Consider the 257 program shown in Figure~\ref{f:opt_example}. Suppose that the distance between 258 $\mathtt{L}_{0}$ and $\mathtt{L}_{1}$ is such that if {\tt jmp X} is encoded 259 as a short jump, there is a segment border just after $\mathtt{L}_{1}$. Let 260 us also assume that all three branches to $\mathtt{L}_{1}$ are in the same 261 segment, but far enough away from $\mathtt{L}_{1}$ that they cannot be encoded 262 as short jumps. 263 264 Then, if {\tt jmp X} were to be encoded as a short jump, which is clearly 265 possible, all of the branches to $\mathtt{L}_{1}$ would have to be encoded as 266 long jumps. However, if {\tt jmp X} were to be encoded as a long jump, and 267 therefore increase in size, $\mathtt{L}_{1}$ would be `propelled' across the 268 segment border, so that the three branches to $\mathtt{L}_{1}$ could be encoded 269 as absolute jumps. Depending on the relative sizes of long and absolute jumps, 270 this solution might actually be smaller than the one reached by the smallest 271 fixed point algorithm. 272 273 \section{Our algorithm} 274 275 \subsection{Design decisions} 276 277 Given the NP-completeness of the problem, finding optimal solutions 278 (using, for example, a constraint solver) can potentially be very costly. 279 280 The SDCC compiler~\cite{SDCC2011}, which has a backend targeting the MCS-51 281 instruction set, simply encodes every branch instruction as a long jump 282 without taking the distance into account. While certainly correct (the long 283 jump can reach any destination in memory) and a very fast solution to compute, 284 it results in a less than optimal solution in terms of output size and 285 execution time. 286 287 On the other hand, the {\tt gcc} compiler suite, while compiling 288 C on the x86 architecture, uses a greatest fix point algorithm. In other words, 289 it starts with all branch instructions encoded as the largest jumps 290 available, and then tries to reduce the size of branch instructions as much as 291 possible. 292 293 Such an algorithm has the advantage that any intermediate result it returns 294 is correct: the solution where every branch instruction is encoded as a large 295 jump is always possible, and the algorithm only reduces those branch 296 instructions whose destination address is in range for a shorter jump. 297 The algorithm can thus be stopped after a determined number of steps without 298 sacrificing correctness. 299 300 The result, however, is not necessarily optimal. Even if the algorithm is run 301 until it terminates naturally, the fixed point reached is the {\em greatest} 302 fixed point, not the least fixed point. Furthermore, {\tt gcc} (at least for 303 the x86 architecture) only uses short and long jumps. This makes the algorithm 304 more efficient, as shown in the previous section, but also results in a less 305 optimal solution. 306 307 In the CerCo assembler, we opted at first for a least fixed point algorithm, 308 taking absolute jumps into account. 309 310 Here, we ran into a problem with proving termination, as explained in the 311 previous section: if we only take short and long jumps into account, the jump 312 encoding can only switch from short to long, but never in the other direction. 313 When we add absolute jumps, however, it is theoretically possible for a branch 314 instruction to switch from absolute to long and back, as previously explained. 315 Proving termination then becomes difficult, because there is nothing that 316 precludes a branch instruction from oscillating back and forth between absolute 317 and long jumps indefinitely. 318 319 To keep the algorithm in the same complexity class and more easily 320 prove termination, we decided to explicitly enforce the `branch instructions 321 must always grow longer' requirement: if a branch instruction is encoded as a 322 long jump in one iteration, it will also be encoded as a long jump in all the 323 following iterations. Therefore the encoding of any branch instruction 324 can change at most two times: once from short to absolute (or long), and once 325 from absolute to long. 326 327 There is one complicating factor. Suppose that a branch instruction is encoded 328 in step $n$ as an absolute jump, but in step $n+1$ it is determined that 329 (because of changes elsewhere) it can now be encoded as a short jump. Due to 330 the requirement that the branch instructions must always grow longer, 331 the branch encoding will be encoded as an absolute jump in step 332 $n+1$ as well. 333 334 This is not necessarily correct. A branch instruction that can be 335 encoded as a short jump cannot always also be encoded as an absolute jump, as a 336 short jump can bridge segments, whereas an absolute jump cannot. Therefore, 337 in this situation we have decided to encode the branch instruction as a long 338 jump, which is always correct. 339 340 The resulting algorithm, therefore, will not return the least fixed point, as 341 it might have too many long jumps. However, it is still better than the 342 algorithms from SDCC and {\tt gcc}, since even in the worst case, it will still 343 return a smaller or equal solution. 344 345 Experimenting with our algorithm on the test suite of C programs included with 346 gcc 2.3.3 has shown that on average, about 25 percent of jumps are encoded as short or absolute jumps by the algorithm. As not all instructions are jumps, this does not make for a large reduction in size, but it can make for a reduction in execution time: if jumps 347 are executed multiple times, for example in loops, the fact that short jumps take less cycles to execute than long jumps can have great effect. 348 349 As for complexity, there are at most $2n$ iterations, where $n$ is the number of 350 branch instructions. Practical tests within the CerCo project on small to 351 medium pieces of code have shown that in almost all cases, a fixed point is 352 reached in 3 passes. Only in one case did the algorithm need 4. This is not surprising: after all, the difference between short/absolute and 353 long jumps is only one byte (three for conditional jumps). For a change from 354 short/absolute to long to have an effect on other jumps is therefore relatively 355 uncommon, which explains why a fixed point is reached so quickly. 356 357 \subsection{The algorithm in detail} 358 359 The branch displacement algorithm forms part of the translation from 360 pseudocode to assembler. More specifically, it is used by the function that 361 translates pseudo-addresses (natural numbers indicating the position of the 362 instruction in the program) to actual addresses in memory. Note that in pseudocode, all instructions are of size 1. 363 364 Our original intention was to have two different functions, one function 365 $\mathtt{policy}: \mathbb{N} \rightarrow \{\mathtt{short\_jump}, 366 \mathtt{absolute\_jump}, \mathtt{long\_jump}\}$ to associate jumps to their 367 intended encoding, and a function $\sigma: \mathbb{N} \rightarrow 368 \mathtt{Word}$ to associate pseudo-addresses to machine addresses. $\sigma$ 369 would use $\mathtt{policy}$ to determine the size of jump instructions. This turned out to be suboptimal from the algorithmic point of view and 370 impossible to prove correct. 371 372 From the algorithmic point of view, in order to create the $\mathtt{policy}$ 373 function, we must necessarily have a translation from pseudo-addresses 374 to machine addresses (i.e. a $\sigma$ function): in order to judge the distance 375 between a jump and its destination, we must know their memory locations. 376 Conversely, in order to create the $\sigma$ function, we need to have the 377 $\mathtt{policy}$ function, otherwise we do not know the sizes of the jump 378 instructions in the program. 379 380 Much the same problem appears when we try to prove the algorithm correct: the 381 correctness of $\mathtt{policy}$ depends on the correctness of $\sigma$, and 382 the correctness of $\sigma$ depends on the correctness of $\mathtt{policy}$. 383 384 We solved this problem by integrating the $\mathtt{policy}$ and $\sigma$ 385 algorithms. We now have a function 386 $\sigma: \mathbb{N} \rightarrow \mathtt{Word} \times \mathtt{bool}$ which 387 associates a pseudo-address to a machine address. The boolean denotes a forced 388 long jump; as noted in the previous section, if during the fixed point 389 computation an absolute jump changes to be potentially re-encoded as a short 390 jump, the result is actually a long jump. It might therefore be the case that 391 jumps are encoded as long jumps without this actually being necessary, and this 392 information needs to be passed to the code generating function. 393 394 The assembler function encodes the jumps by checking the distance between 395 source and destination according to $\sigma$, so it could select an absolute 396 jump in a situation where there should be a long jump. The boolean is there 397 to prevent this from happening by indicating the locations where a long jump 398 should be encoded, even if a shorter jump is possible. This has no effect on 399 correctness, since a long jump is applicable in any situation. 400 401 \begin{figure}[t] 402 \small 403 \begin{algorithmic} 404 \Function{f}{$labels$,$old\_sigma$,$instr$,$ppc$,$acc$} 405 \State $\langle added, pc, sigma \rangle \gets acc$ 406 \If {$instr$ is a backward jump to $j$} 407 \State $length \gets \mathrm{jump\_size}(pc,sigma_1(labels(j)))$ 408 \Comment compute jump distance 409 \ElsIf {$instr$ is a forward jump to $j$} 410 \State $length \gets \mathrm{jump\_size}(pc,old\_sigma_1(labels(j))+added)$ 411 \EndIf 412 \State $old\_length \gets \mathrm{old\_sigma_1}(ppc)$ 413 \State $new\_length \gets \mathrm{max}(old\_length, length)$ 414 \Comment length must never decrease 415 \State $old\_size \gets \mathrm{old\_sigma_2}(ppc)$ 416 \State $new\_size \gets \mathrm{instruction\_size}(instr,new\_length)$ 417 \Comment compute size in bytes 418 \State $new\_added \gets added+(new\_size-old\_size)$ 419 \Comment keep track of total added bytes 420 \State $new\_sigma \gets old\_sigma$ 421 \State $new\_sigma_1(ppc+1) \gets pc+new\_size$ 422 \State $new\_sigma_2(ppc) \gets new\_length$ 423 \Comment update $\sigma$ \\ 424 \Return $\langle new\_added, pc+new\_size, new\_sigma \rangle$ 425 \EndFunction 426 \end{algorithmic} 427 \caption{The heart of the algorithm} 428 \label{f:jump_expansion_step} 429 \end{figure} 430 431 The algorithm, shown in Figure~\ref{f:jump_expansion_step}, works by folding the 432 function {\sc f} over the entire program, thus gradually constructing $sigma$. 433 This constitutes one step in the fixed point calculation; successive steps 434 repeat the fold until a fixed point is reached. We have abstracted away the case where an instruction is not a jump, since the size of these instructions is constant. 435 436 Parameters of the function {\sc f} are: 437 \begin{itemize} 438 \item a function $labels$ that associates a label to its pseudo-address; 439 \item $old\_sigma$, the $\sigma$ function returned by the previous 440 iteration of the fixed point calculation; 441 \item $instr$, the instruction currently under consideration; 442 \item $ppc$, the pseudo-address of $instr$; 443 \item $acc$, the fold accumulator, which contains $added$ (the number of 444 bytes added to the program size with respect to the previous iteration), $pc$ 445 (the highest memory address reached so far), and of course $sigma$, the 446 $\sigma$ function under construction. 447 \end{itemize} 448 The first two are parameters that remain the same through one iteration, the 449 final three are standard parameters for a fold function (including $ppc$, 450 which is simply the number of instructions of the program already processed). 451 452 The $\sigma$ functions used by {\sc f} are not of the same type as the final 453 $\sigma$ function: they are of type 454 $\sigma: \mathbb{N} \rightarrow \mathbb{N} \times \{\mathtt{short\_jump}, 455 \mathtt{absolute\_jump},\mathtt{long\_jump}\}$; a function that associates a 456 pseudo-address with a memory address and a jump length. We do this to 457 ease the comparison of jump lengths between iterations. In the algorithm, 458 we use the notation $sigma_1(x)$ to denote the memory address corresponding to 459 $x$, and $sigma_2(x)$ for the jump length corresponding to $x$. 460 461 Note that the $\sigma$ function used for label lookup varies depending on 462 whether the label is behind our current position or ahead of it. For 463 backward branches, where the label is behind our current position, we can use 464 $sigma$ for lookup, since its memory address is already known. However, for 465 forward branches, the memory address of the address of the label is not yet 466 known, so we must use $old\_sigma$. 467 468 We cannot use $old\_sigma$ without change: it might be the case that we have 469 already increased the size of some branch instructions before, making the 470 program longer and moving every instruction forward. We must compensate for this 471 by adding the size increase of the program to the label's memory address 472 according to $old\_sigma$, so that branch instruction spans do not get 473 compromised. 474 475 %Note also that we add the pc to $sigma$ at location $ppc+1$, whereas we add the 476 %jump length at location $ppc$. We do this so that $sigma(ppc)$ will always 477 %return a pair with the start address of the instruction at $ppc$ and the 478 %length of its branch instruction (if any); the end address of the program can 479 %be found at $sigma(n+1)$, where $n$ is the number of instructions in the 480 %program. 481 482 \section{The proof} 483 484 In this section, we present the correctness proof for the algorithm in more 485 detail. The main correctness statement is shown, slightly simplified, in~Figure~\ref{statement}. 486 % 487 \begin{figure}[t] 488 \small 489 \begin{alignat*}{6} 490 \mathtt{sigma}&\omit\rlap{$\mathtt{\_policy\_specification} \equiv 491 \lambda program.\lambda sigma.$} \notag\\ 492 & \omit\rlap{$sigma\ 0 = 0\ \wedge$} \notag\\ 493 & \mathbf{let}\ & & \omit\rlap{$instr\_list \equiv code\ program\ \mathbf{in}$} \notag\\ 494 &&& \omit\rlap{$\forall ppc.ppc < |instr\_list| \rightarrow$} \notag\\ 495 &&& \mathbf{let}\ && pc \equiv sigma\ ppc\ \mathbf{in} \notag\\ 496 &&& \mathbf{let}\ && instruction \equiv \mathtt{fetch\_pseudo\_instruction}\ instr\_list\ ppc\ \mathbf{in} \notag\\ 497 &&& \mathbf{let}\ && next\_pc \equiv sigma\ (ppc+1)\ \mathbf{in}\notag\\ 498 &&&&& next\_pc = pc + \mathtt{instruction\_size}\ sigma\ ppc\ instruction\ \wedge\notag\\ 499 &&&&& (pc + \mathtt{instruction\_size}\ sigma\ ppc\ instruction < 2^{16}\ \vee\notag\\ 500 &&&&& (\forall ppc'.ppc' < |instr\_list| \rightarrow ppc < ppc' \rightarrow \notag\\ 501 &&&&& \mathbf{let}\ instruction' \equiv \mathtt{fetch\_pseudo\_instruction}\ instr\_list\ ppc'\ \mathbf{in} \notag\\ 502 &&&&&\ \mathtt{instruction\_size}\ sigma\ ppc'\ instruction' = 0)\ \wedge \notag\\ 503 &&&&& pc + \mathtt{instruction\_size}\ sigma\ ppc\ instruction = 2^{16}) 504 \end{alignat*} 505 \caption{Main correctness statement\label{statement}} 506 \label{sigmapolspec} 507 \end{figure} 508 % 509 Informally, this means that when fetching a pseudo-instruction at $ppc$, the 510 translation by $\sigma$ of $ppc+1$ is the same as $\sigma(ppc)$ plus the size 511 of the instruction at $ppc$. That is, an instruction is placed consecutively 512 after the previous one, and there are no overlaps. The rest of the statement deals with memory size: either the next instruction fits within memory ($next\_pc < 2^{16}$) or it ends exactly at the limit memory, 513 in which case it must be the last translated instruction in the program (enforced by specfiying that the size of all subsequent instructions is 0: there may be comments or cost annotations that are not translated). 514 515 Finally, we enforce that the program starts at address 0, i.e. $\sigma(0) = 0$. It may seem strange that we do not explicitly include a safety property stating that every jump instruction is of the right type with respect to its target (akin to the lemma from Figure~\ref{sigmasafe}), but this is not necessary. The distance is recalculated according to the instruction addresses from $\sigma$, which implicitly expresses safety. 516 517 Since our computation is a least fixed point computation, we must prove 518 termination in order to prove correctness: if the algorithm is halted after 519 a number of steps without reaching a fixed point, the solution is not 520 guaranteed to be correct. More specifically, branch instructions might be 521 encoded which do not coincide with the span between their location and their 522 destination. 523 524 Proof of termination rests on the fact that the encoding of branch 525 instructions can only grow larger, which means that we must reach a fixed point 526 after at most $2n$ iterations, with $n$ the number of branch instructions in 527 the program. This worst case is reached if at every iteration, we change the 528 encoding of exactly one branch instruction; since the encoding of any branch 529 instruction can change first from short to absolute, and then to long, there 530 can be at most $2n$ changes. 531 532 %The proof has been carried out using the ``Russell'' style from~\cite{Sozeau2006}. 533 %We have proven some invariants of the {\sc f} function from the previous 534 %section; these invariants are then used to prove properties that hold for every 535 %iteration of the fixed point computation; and finally, we can prove some 536 %properties of the fixed point. 537 538 \subsection{Fold invariants} 539 540 In this section, we present the invariants that hold during the fold of {\sc f} 541 over the program. These will be used later on to prove the properties of the 542 iteration. During the fixed point computation, the $\sigma$ function is 543 implemented as a trie for ease of access; computing $\sigma(x)$ is achieved by 544 looking up the value of $x$ in the trie. Actually, during the fold, the value 545 we pass along is a pair $\mathbb{N} \times \mathtt{ppc\_pc\_map}$. The first 546 component is the number of bytes added to the program so far with respect to 547 the previous iteration, and the second component, {\tt ppc\_pc\_map}, is the 548 actual $\sigma$ trie (which we'll call $strie$ to avoid confusion). 549 % 550 {\small 551 \begin{alignat*}{2} 552 \mathtt{out} & \mathtt{\_of\_program\_none} \equiv \lambda prefix.\lambda strie. \notag\\ 553 & \forall i.i < 2^{16} \rightarrow (i > |prefix| \leftrightarrow 554 \mathtt{lookup\_opt}\ i\ (\mathtt{snd}\ strie) = \mathtt{None}) 555 \end{alignat*}} 556 % 557 The first invariant states that any pseudo-address not yet examined is not 558 present in the lookup trie. 559 % 560 {\small 561 \begin{alignat*}{2} 562 \mathtt{not} & \mathtt{\_jump\_default} \equiv \lambda prefix.\lambda strie.\forall i.i < |prefix| \rightarrow\notag\\ 563 & \neg\mathtt{is\_jump}\ (\mathtt{nth}\ i\ prefix) \rightarrow \mathtt{lookup}\ i\ (\mathtt{snd}\ strie) = \mathtt{short\_jump} 564 \end{alignat*}} 565 % 566 This invariant states that when we try to look up the jump length of a 567 pseudo-address where there is no branch instruction, we will get the default 568 value, a short jump. 569 % 570 {\small 571 \begin{alignat*}{4} 572 \mathtt{jump} & \mathtt{\_increase} \equiv \lambda pc.\lambda op.\lambda p.\forall i.i < |prefix| \rightarrow \notag\\ 573 & \mathbf{let}\ oj \equiv \mathtt{lookup}\ i\ (\mathtt{snd}\ op)\ \mathbf{in} \notag\\ 574 & \mathbf{let}\ j \equiv \mathtt{lookup}\ i\ (\mathtt{snd}\ p)\ \mathbf{in}\ \mathtt{jmpleq}\ oj\ j 575 \end{alignat*}} 576 % 577 This invariant states that between iterations (with $op$ being the previous 578 iteration, and $p$ the current one), jump lengths either remain equal or 579 increase. It is needed for proving termination. 580 % 581 \begin{figure}[h] 582 \small 583 \begin{alignat*}{6} 584 \mathtt{sigma} & \omit\rlap{$\mathtt{\_compact\_unsafe} \equiv \lambda prefix.\lambda strie.\forall n.n < |prefix| \rightarrow$}\notag\\ 585 & \mathbf{match}\ && \omit\rlap{$\mathtt{lookup\_opt}\ n\ (\mathtt{snd}\ strie)\ \mathbf{with}$}\notag\\ 586 &&& \omit\rlap{$\mathtt{None} \Rightarrow \mathrm{False}$} \notag\\ 587 &&& \omit\rlap{$\mathtt{Some}\ \langle pc, j \rangle \Rightarrow$} \notag\\ 588 &&& \mathbf{match}\ && \mathtt{lookup\_opt}\ (n+1)\ (\mathtt{snd}\ strie)\ \mathbf{with}\notag\\ 589 &&&&& \mathtt{None} \Rightarrow \mathrm{False} \notag\\ 590 &&&&& \mathtt{Some}\ \langle pc_1, j_1 \rangle \Rightarrow 591 pc_1 = pc + \notag\\ 592 &&&&& \ \ \mathtt{instruction\_size\_jmplen}\ j\ (\mathtt{nth}\ n\ prefix) 593 \end{alignat*} 594 \caption{Temporary safety property} 595 \label{sigmacompactunsafe} 596 \end{figure} 597 % 598 We now proceed with the safety lemmas. The lemma in 599 Figure~\ref{sigmacompactunsafe} is a temporary formulation of the main 600 property {\tt sigma\_policy\_specification}. Its main difference from the 601 final version is that it uses {\tt instruction\_size\_jmplen} to compute the 602 instruction size. This function uses $j$ to compute the span of branch 603 instructions (i.e. it uses the $\sigma$ under construction), instead 604 of looking at the distance between source and destination. This is because 605 $\sigma$ is still under construction; we will prove below that after the 606 final iteration, {\tt sigma\_compact\_unsafe} is equivalent to the main 607 property in Figure~\ref{sigmasafe} which holds at the end of the computation. 608 % 609 \begin{figure}[h] 610 \small 611 \begin{alignat*}{6} 612 \mathtt{sigma} & \omit\rlap{$\mathtt{\_safe} \equiv \lambda prefix.\lambda labels.\lambda old\_strie.\lambda strie.\forall i.i < |prefix| \rightarrow$} \notag\\ 613 & \omit\rlap{$\forall dest\_label.\mathtt{is\_jump\_to\ (\mathtt{nth}\ i\ prefix})\ dest\_label \rightarrow$} \notag\\ 614 & \mathbf{let} && \omit\rlap{$\ paddr \equiv \mathtt{lookup}\ labels\ dest\_label\ \mathbf{in}$} \notag\\ 615 & \mathbf{let} && \omit\rlap{$\ \langle j, src, dest \rangle \equiv \mathbf{if} \ paddr\ \leq\ i\ \mathbf{then}$}\notag\\ 616 &&&&& \mathbf{let}\ \langle \_, j \rangle \equiv \mathtt{lookup}\ i\ (\mathtt{snd}\ strie)\ \mathbf{in} \notag\\ 617 &&&&& \mathbf{let}\ \langle pc\_plus\_jl, \_ \rangle \equiv \mathtt{lookup}\ (i+1)\ (\mathtt{snd}\ strie)\ \mathbf{in}\notag\\ 618 &&&&& \mathbf{let}\ \langle addr, \_ \rangle \equiv \mathtt{lookup}\ paddr\ (\mathtt{snd}\ strie)\ \mathbf{in}\notag\\ 619 &&&&& \langle j, pc\_plus\_jl, addr \rangle\notag\\ 620 &&&\mathbf{else} \notag\\ 621 &&&&&\mathbf{let}\ \langle \_, j \rangle \equiv \mathtt{lookup}\ i\ (\mathtt{snd}\ strie)\ \mathbf{in} \notag\\ 622 &&&&&\mathbf{let}\ \langle pc\_plus\_jl, \_ \rangle \equiv \mathtt{lookup}\ (i+1)\ (\mathtt{snd}\ old\_strie)\ \mathbf{in}\notag\\ 623 &&&&&\mathbf{let}\ \langle addr, \_ \rangle \equiv \mathtt{lookup}\ paddr\ (\mathtt{snd}\ old\_strie)\ \mathbf{in}\notag\\ 624 &&&&&\langle j, pc\_plus\_jl, addr \rangle \mathbf{in}\ \notag\\ 625 &&&\mathbf{match} && \ j\ \mathbf{with} \notag\\ 626 &&&&&\mathrm{short\_jump} \Rightarrow \mathtt{short\_jump\_valid}\ src\ dest\notag\\ 627 &&&&&\mathrm{absolute\_jump} \Rightarrow \mathtt{absolute\_jump\_valid}\ src\ dest\notag\\ 628 &&&&&\mathrm{long\_jump} \Rightarrow \mathrm{True} 629 \end{alignat*} 630 \caption{Safety property} 631 \label{sigmasafe} 632 \end{figure} 633 % 634 We compute the distance using the memory address of the instruction 635 plus its size. This follows the behaviour of the MCS-51 microprocessor, which 636 increases the program counter directly after fetching, and only then executes 637 the branch instruction (by changing the program counter again). 638 639 There are also some simple, properties to make sure that our policy 640 remains consistent, and to keep track of whether the fixed point has been 641 reached. We do not include them here in detail. Two of these properties give the values of $\sigma$ for the start and end of the program; $\sigma(0) = 0$ and $\sigma(n)$, where $n$ is the number of instructions up until now, is equal to the maximum memory address so far. There are also two properties that deal with what happens when the previous 642 iteration does not change with respect to the current one. $added$ is a 643 variable that keeps track of the number of bytes we have added to the program 644 size by changing the encoding of branch instructions. If $added$ is 0, the program 645 has not changed and vice versa. 646 647 %{\small 648 %\begin{align*} 649 %& \mathtt{lookup}\ 0\ (\mathtt{snd}\ strie) = 0 \notag\\ 650 %& \mathtt{lookup}\ |prefix|\ (\mathtt{snd}\ strie) = \mathtt{fst}\ strie 651 %\end{align*}} 652 653 654 %{\small 655 %\begin{align*} 656 %& added = 0\ \rightarrow\ \mathtt{policy\_pc\_equal}\ prefix\ old\_strie\ strie \notag\\ 657 %& \mathtt{policy\_jump\_equal}\ prefix\ old\_strie\ strie\ \rightarrow\ added = 0 658 %\end{align*}} 659 660 We need to use two different formulations, because the fact that $added$ is 0 661 does not guarantee that no branch instructions have changed. For instance, 662 it is possible that we have replaced a short jump with an absolute jump, which 663 does not change the size of the branch instruction. Therefore {\tt policy\_pc\_equal} states that $old\_sigma_1(x) = sigma_1(x)$, whereas {\tt policy\_jump\_equal} states that $old\_sigma_2(x) = sigma_2(x)$. This formulation is sufficient to prove termination and compactness. 664 665 Proving these invariants is simple, usually by induction on the prefix length. 666 667 \subsection{Iteration invariants} 668 669 These are invariants that hold after the completion of an iteration. The main 670 difference between these invariants and the fold invariants is that after the 671 completion of the fold, we check whether the program size does not supersede 672 64 Kb, the maximum memory size the MCS-51 can address. The type of an iteration therefore becomes an option type: {\tt None} in case 673 the program becomes larger than 64 Kb, or $\mathtt{Some}\ \sigma$ 674 otherwise. We also no longer pass along the number of bytes added to the 675 program size, but a boolean that indicates whether we have changed something 676 during the iteration or not. 677 678 If the iteration returns {\tt None}, which means that it has become too large for memory, there is an invariant that states that the previous iteration cannot 679 have every branch instruction encoded as a long jump. This is needed later in the proof of termination. If the iteration returns $\mathtt{Some}\ \sigma$, the fold invariants are retained without change. 680 681 Instead of using {\tt sigma\_compact\_unsafe}, we can now use the proper 682 invariant: 683 % 684 {\small 685 \begin{alignat*}{6} 686 \mathtt{sigma} & \omit\rlap{$\mathtt{\_compact} \equiv \lambda program.\lambda sigma.$} \notag\\ 687 & \omit\rlap{$\forall n.n < |program|\ \rightarrow$} \notag\\ 688 & \mathbf{match}\ && \omit\rlap{$\mathtt{lookup\_opt}\ n\ (\mathtt{snd}\ sigma)\ \mathbf{with}$}\notag\\ 689 &&& \omit\rlap{$\mathrm{None}\ \Rightarrow\ \mathrm{False}$}\notag\\ 690 &&& \omit\rlap{$\mathrm{Some}\ \langle pc, j \rangle \Rightarrow$}\notag\\ 691 &&& \mathbf{match}\ && \mathtt{lookup\_opt}\ (n+1)\ (\mathtt{snd}\ sigma)\ \mathbf{with}\notag\\ 692 &&&&& \mathrm{None}\ \Rightarrow\ \mathrm{False}\notag\\ 693 &&&&& \mathrm{Some} \langle pc1, j1 \rangle \Rightarrow\notag\\ 694 &&&&& \ \ pc1 = pc + \mathtt{instruction\_size}\ n\ (\mathtt{nth}\ n\ program) 695 \end{alignat*}} 696 % 697 This is almost the same invariant as ${\tt sigma\_compact\_unsafe}$, but differs in that it 698 computes the sizes of branch instructions by looking at the distance between 699 position and destination using $\sigma$. In actual use, the invariant is qualified: $\sigma$ is compact if there have 700 been no changes (i.e. the boolean passed along is {\tt true}). This is to 701 reflect the fact that we are doing a least fixed point computation: the result 702 is only correct when we have reached the fixed point. 703 704 There is another, trivial, invariant in case the iteration returns 705 $\mathtt{Some}\ \sigma$: it must hold that $\mathtt{fst}\ sigma < 2^{16}$. 706 We need this invariant to make sure that addresses do not overflow. 707 708 The proof of {\tt nec\_plus\_ultra} goes as follows: if we return {\tt None}, 709 then the program size must be greater than 64 Kb. However, since the 710 previous iteration did not return {\tt None} (because otherwise we would 711 terminate immediately), the program size in the previous iteration must have 712 been smaller than 64 Kb. 713 714 Suppose that all the branch instructions in the previous iteration are 715 encoded as long jumps. This means that all branch instructions in this 716 iteration are long jumps as well, and therefore that both iterations are equal 717 in the encoding of their branch instructions. Per the invariant, this means that 718 $added = 0$, and therefore that all addresses in both iterations are equal. 719 But if all addresses are equal, the program sizes must be equal too, which 720 means that the program size in the current iteration must be smaller than 721 64 Kb. This contradicts the earlier hypothesis, hence not all branch 722 instructions in the previous iteration are encoded as long jumps. 723 724 The proof of {\tt sigma\_compact} follows from {\tt sigma\_compact\_unsafe} and 725 the fact that we have reached a fixed point, i.e. the previous iteration and 726 the current iteration are the same. This means that the results of 727 {\tt instruction\_size\_jmplen} and {\tt instruction\_size} are the same. 728 729 \subsection{Final properties} 730 731 These are the invariants that hold after $2n$ iterations, where $n$ is the 732 program size (we use the program size for convenience; we could also use the 733 number of branch instructions, but this is more complex). Here, we only 734 need {\tt out\_of\_program\_none}, {\tt sigma\_compact} and the fact that 735 $\sigma(0) = 0$. 736 737 Termination can now be proved using the fact that there is a $k \leq 2n$, with 738 $n$ the length of the program, such that iteration $k$ is equal to iteration 739 $k+1$. There are two possibilities: either there is a $k < 2n$ such that this 740 property holds, or every iteration up to $2n$ is different. In the latter case, 741 since the only changes between the iterations can be from shorter jumps to 742 longer jumps, in iteration $2n$ every branch instruction must be encoded as 743 a long jump. In this case, iteration $2n$ is equal to iteration $2n+1$ and the 744 fixed point is reached. 745 746 \section{Conclusion} 747 748 In the previous sections we have discussed the branch displacement optimisation 749 problem, presented an optimised solution, and discussed the proof of 750 termination and correctness for this algorithm, as formalised in Matita. 751 752 The algorithm we have presented is fast and correct, but not optimal; a true 753 optimal solution would need techniques like constraint solvers. While outside 754 the scope of the present research, it would be interesting to see if enough 755 heuristics could be found to make such a solution practical for implementing 756 in an existing compiler; this would be especially useful for embedded systems, 757 where it is important to have as small a solution as possible. 758 759 In itself the algorithm is already useful, as it results in a smaller solution 760 than the simple `every branch instruction is long' used up until now---and with 761 only 64 Kb of memory, every byte counts. It also results in a smaller solution 762 than the greatest fixed point algorithm that {\tt gcc} uses. It does this 763 without sacrificing speed or correctness. 764 765 The certification of an assembler that relies on the branch displacement 766 algorithm described in this paper was presented in~\cite{lastyear}. 767 The assembler computes the $\sigma$ map as described in this paper and 768 then works in two passes. In the first pass it builds a map 769 from instruction labels to addresses in the assembly code. In the 770 second pass it iterates over the code, translating every pseudo jump 771 at address $src$ to a label l associated to the assembly instruction at 772 address $dst$ to a jump of the size dictated by $(\sigma\ src)$ to 773 $(\sigma\ dst)$. In case of conditional jumps, the translated jump may be 774 implemented with a series of instructions. 775 776 The proof of correctness abstracts over the algorithm used and only relies on 777 {\tt sigma\_policy\_specification} (page~\ref{sigmapolspec}). It is a variation 778 of a standard 1-to-many forward simulation proof~\cite{Leroy2009}. The 779 relation R between states just maps every code address $ppc$ stored in 780 registers or memory to $(\sigma\ ppc)$. To identify the code addresses, 781 an additional data structure is always kept together with the source 782 state and is updated by the semantics. The semantics is preserved 783 only for those programs whose source code operations 784 $(f\ ppc_1\ \ldots\ ppc_n)$ applied to code addresses $ppc_1 \ldots ppc_n$ are 785 such that $(f\ (\sigma\ ppc_1)\ ...\ (\sigma\ ppc_n) = f\ ppc_1\ ppc_n))$. For 786 example, an injective $\sigma$ preserves a binary equality test f for code 787 addresses, but not pointer subtraction. 788 789 The main lemma (fetching simulation), which relies on\\ 790 {\tt sigma\_policy\_specification} and is established by structural induction 791 over the source code, says that fetching an assembly instruction at 792 position ppc is equal to fetching the translation of the instruction at 793 position $(\sigma\ ppc)$, and that the new incremented program counter is at 794 the beginning of the next instruction (compactness). The only exception is 795 when the instruction fetched is placed at the end of code memory and is 796 followed only by dead code. Execution simulation is trivial because of the 797 restriction over well behaved programs w.r.t. sigma. The condition 798 $\sigma\ 0 = 0$ is necessary because the hardware model prescribes that the 799 first instruction to be executed will be at address 0. For the details 800 see~\cite{lastyear}. 801 802 Instead of verifying the algorithm directly, another solution to the problem 803 would be to run an optimisation algorithm, and then verify the safety of the 804 result using a verified validator. Such a validator would be easier to verify 805 than the algorithm itself and it would also be efficient, requiring only a 806 linear pass over the source code to test the specification. However, it is 807 surely also interesting to formally prove that the assembler never rejects 808 programs that should be accepted, i.e. that the algorithm itself is correct. 809 This is the topic of the current paper. 810 811 \subsection{Related work} 812 813 As far as we are aware, this is the first formal discussion of the branch 814 displacement optimisation algorithm. 815 816 The CompCert project is another verified compiler project. 817 Their backend~\cite{Leroy2009} generates assembly code for (amongst others) subsets of the 818 PowerPC and x86 (32-bit) architectures. At the assembly code stage, there is 819 no distinction between the span-dependent jump instructions, so a branch 820 displacement optimisation algorithm is not needed. 821 822 %An offshoot of the CompCert project is the CompCertTSO project, which adds 823 %thread concurrency and synchronisation to the CompCert compiler~\cite{TSO2011}. 824 %This compiler also generates assembly code and therefore does not include a 825 %branch displacement algorithm. 826 827 %Finally, there is also the Piton stack~\cite{Moore1996}, which not only includes the 828 %formal verification of a compiler, but also of the machine architecture 829 %targeted by that compiler, a microprocessor called the FM9001. 830 %However, this architecture does not have different 831 %jump sizes (branching is simulated by assigning values to the program counter), 832 %so the branch displacement problem is irrelevant. 833 834 45 835 46 836 \bibliography{biblio} -
Papers/sttt/problem.tex
r3470 r3473 1 \section{Introduction}2 3 The problem of branch displacement optimisation, also known as jump encoding, is4 a well-known problem in assembler design~\cite{Hyde2006}. Its origin lies in the5 fact that in many architecture sets, the encoding (and therefore size) of some6 instructions depends on the distance to their operand (the instruction 'span').7 The branch displacement optimisation problem consists of encoding these8 span-dependent instructions in such a way that the resulting program is as9 small as possible.10 11 This problem is the subject of the present paper. After introducing the problem12 in more detail, we will discuss the solutions used by other compilers, present13 the algorithm we use in the CerCo assembler, and discuss its verification,14 that is the proofs of termination and correctness using the Matita proof15 assistant~\cite{Asperti2007}.16 17 Formulating the final statement of correctness and finding the loop invariants18 have been non-trivial tasks and are, indeed, the main contribution of this19 paper. It has required considerable care and fine-tuning to formulate not20 only the minimal statement required for the ulterior proof of correctness of21 the assembler, but also the minimal set of invariants needed for the proof22 of correctness of the algorithm.23 24 The research presented in this paper has been executed within the CerCo project25 which aims at formally verifying a C compiler with cost annotations. The26 target architecture for this project is the MCS-51, whose instruction set27 contains span-dependent instructions. Furthermore, its maximum addressable28 memory size is very small (64 Kb), which makes it important to generate29 programs that are as small as possible. With this optimisation, however, comes increased complexity and hence increased possibility for error. We must make sure that the branch instructions are encoded correctly, otherwise the assembled program will behave unpredictably.30 31 All Matita files related to this development can be found on the CerCo32 website, \url{http://cerco.cs.unibo.it}. The specific part that contains the33 branch displacement algorithm is in the {\tt ASM} subdirectory, in the files34 {\tt PolicyFront.ma}, {\tt PolicyStep.ma} and {\tt Policy.ma}.35 36 \section{The branch displacement optimisation problem}37 38 In most modern instruction sets that have them, the only span-dependent39 instructions are branch instructions. Taking the ubiquitous x86-64 instruction40 set as an example, we find that it contains eleven different forms of the41 unconditional branch instruction, all with different ranges, instruction sizes42 and semantics (only six are valid in 64-bit mode, for example). Some examples43 are shown in Figure~\ref{f:x86jumps} (see also~\cite{IntelDev}).44 45 \begin{figure}[h]46 \begin{center}47 \begin{tabular}{|l|l|l|}48 \hline49 Instruction & Size (bytes) & Displacement range \\50 \hline51 Short jump & 2 & -128 to 127 bytes \\52 Relative near jump & 5 & $-2^{32}$ to $2^{32}-1$ bytes \\53 Absolute near jump & 6 & one segment (64-bit address) \\54 Far jump & 8 & entire memory (indirect jump) \\55 \hline56 \end{tabular}57 \end{center}58 \caption{List of x86 branch instructions}59 \label{f:x86jumps}60 \end{figure}61 62 The chosen target architecture of the CerCo project is the Intel MCS-51, which63 features three types of branch instructions (or jump instructions; the two terms64 are used interchangeably), as shown in Figure~\ref{f:mcs51jumps}.65 66 \begin{figure}[h]67 \begin{center}68 \begin{tabular}{|l|l|l|l|}69 \hline70 Instruction & Size & Execution time & Displacement range \\71 & (bytes) & (cycles) & \\72 \hline73 SJMP (`short jump') & 2 & 2 & -128 to 127 bytes \\74 AJMP (`absolute jump') & 2 & 2 & one segment (11-bit address) \\75 LJMP (`long jump') & 3 & 3 & entire memory \\76 \hline77 \end{tabular}78 \end{center}79 \caption{List of MCS-51 branch instructions}80 \label{f:mcs51jumps}81 \end{figure}82 83 Conditional branch instructions are only available in short form, which84 means that a conditional branch outside the short address range has to be85 encoded using three branch instructions (for instructions whose logical86 negation is available, it can be done with two branch instructions, but for87 some instructions this is not the case). The call instruction is88 only available in absolute and long forms.89 90 Note that even though the MCS-51 architecture is much less advanced and much91 simpler than the x86-64 architecture, the basic types of branch instruction92 remain the same: a short jump with a limited range, an intra-segment jump and a93 jump that can reach the entire available memory.94 95 Generally, in code fed to the assembler as input, the only96 difference between branch instructions is semantics, not span. This97 means that a distinction is made between an unconditional branch and the98 several kinds of conditional branch, but not between their short, absolute or99 long variants.100 101 The algorithm used by the assembler to encode these branch instructions into102 the different machine instructions is known as the {\em branch displacement103 algorithm}. The optimisation problem consists of finding as small an encoding as104 possible, thus minimising program length and execution time.105 106 Similar problems, e.g. the branch displacement optimisation problem for other107 architectures, are known to be NP-complete~\cite{Robertson1979,Szymanski1978},108 which could make finding an optimal solution very time-consuming.109 110 The canonical solution, as shown by Szymanski~\cite{Szymanski1978} or more111 recently by Dickson~\cite{Dickson2008} for the x86 instruction set, is to use a112 fixed point algorithm that starts with the shortest possible encoding (all113 branch instruction encoded as short jumps, which is likely not a correct114 solution) and then iterates over the source to re-encode those branch115 instructions whose target is outside their range.116 117 \subsection*{Adding absolute jumps}118 119 In both papers mentioned above, the encoding of a jump is only dependent on the120 distance between the jump and its target: below a certain value a short jump121 can be used; above this value the jump must be encoded as a long jump.122 123 Here, termination of the smallest fixed point algorithm is easy to prove. All124 branch instructions start out encoded as short jumps, which means that the125 distance between any branch instruction and its target is as short as possible126 (all the intervening jumps are short).127 If, in this situation, there is a branch instruction $b$ whose span is not128 within the range for a short jump, we can be sure that we can never reach a129 situation where the span of $j$ is so small that it can be encoded as a short130 jump. This argument continues to hold throughout the subsequent iterations of131 the algorithm: short jumps can change into long jumps, but not \emph{vice versa},132 as spans only increase. Hence, the algorithm either terminates early when a fixed133 point is reached or when all short jumps have been changed into long jumps.134 135 Also, we can be certain that we have reached an optimal solution: a short jump136 is only changed into a long jump if it is absolutely necessary.137 138 However, neither of these claims (termination nor optimality) hold when we add139 the absolute jump. With absolute jumps, the encoding of a branch140 instruction no longer depends only on the distance between the branch141 instruction and its target. An absolute jump is possible when instruction and142 target are in the same segment (for the MCS-51, this means that the first 5143 bytes of their addresses have to be equal). It is therefore entirely possible144 for two branch instructions with the same span to be encoded in different ways145 (absolute if the branch instruction and its target are in the same segment,146 long if this is not the case).147 148 \begin{figure}[t]149 \begin{subfigure}[b]{.45\linewidth}150 \small151 \begin{alltt}152 jmp X153 \ldots154 L\(\sb{0}\): \ldots155 % Start of new segment if156 % jmp X is encoded as short157 \ldots158 jmp L\(\sb{0}\)159 \end{alltt}160 \caption{Example of a program where a long jump becomes absolute}161 \label{f:term_example}162 \end{subfigure}163 \hfill164 \begin{subfigure}[b]{.45\linewidth}165 \small166 \begin{alltt}167 L\(\sb{0}\): jmp X168 X: \ldots169 \ldots170 L\(\sb{1}\): \ldots171 % Start of new segment if172 % jmp X is encoded as short173 \ldots174 jmp L\(\sb{1}\)175 \ldots176 jmp L\(\sb{1}\)177 \ldots178 jmp L\(\sb{1}\)179 \ldots180 \end{alltt}181 \caption{Example of a program where the fixed-point algorithm is not optimal}182 \label{f:opt_example}183 \end{subfigure}184 \end{figure}185 186 This invalidates our earlier termination argument: a branch instruction, once encoded187 as a long jump, can be re-encoded during a later iteration as an absolute jump.188 Consider the program shown in Figure~\ref{f:term_example}. At the start of the189 first iteration, both the branch to {\tt X} and the branch to $\mathtt{L}_{0}$190 are encoded as small jumps. Let us assume that in this case, the placement of191 $\mathtt{L}_{0}$ and the branch to it are such that $\mathtt{L}_{0}$ is just192 outside the segment that contains this branch. Let us also assume that the193 distance between $\mathtt{L}_{0}$ and the branch to it is too large for the194 branch instruction to be encoded as a short jump.195 196 All this means that in the second iteration, the branch to $\mathtt{L}_{0}$ will197 be encoded as a long jump. If we assume that the branch to {\tt X} is encoded as198 a long jump as well, the size of the branch instruction will increase and199 $\mathtt{L}_{0}$ will be `propelled' into the same segment as its branch200 instruction, because every subsequent instruction will move one byte forward.201 Hence, in the third iteration, the branch to $\mathtt{L}_{0}$ can be encoded as202 an absolute jump. At first glance, there is nothing that prevents us from203 constructing a configuration where two branch instructions interact in such a204 way as to iterate indefinitely between long and absolute encodings.205 206 This situation mirrors the explanation by Szymanski~\cite{Szymanski1978} of why207 the branch displacement optimisation problem is NP-complete. In this explanation,208 a condition for NP-completeness is the fact that programs be allowed to contain209 {\em pathological} jumps. These are branch instructions that can normally not be210 encoded as a short(er) jump, but gain this property when some other branch211 instructions are encoded as a long(er) jump. This is exactly what happens in212 Figure~\ref{f:term_example}. By encoding the first branch instruction as a long213 jump, another branch instruction switches from long to absolute (which is214 shorter).215 216 In addition, our previous optimality argument no longer holds. Consider the217 program shown in Figure~\ref{f:opt_example}. Suppose that the distance between218 $\mathtt{L}_{0}$ and $\mathtt{L}_{1}$ is such that if {\tt jmp X} is encoded219 as a short jump, there is a segment border just after $\mathtt{L}_{1}$. Let220 us also assume that all three branches to $\mathtt{L}_{1}$ are in the same221 segment, but far enough away from $\mathtt{L}_{1}$ that they cannot be encoded222 as short jumps.223 224 Then, if {\tt jmp X} were to be encoded as a short jump, which is clearly225 possible, all of the branches to $\mathtt{L}_{1}$ would have to be encoded as226 long jumps. However, if {\tt jmp X} were to be encoded as a long jump, and227 therefore increase in size, $\mathtt{L}_{1}$ would be `propelled' across the228 segment border, so that the three branches to $\mathtt{L}_{1}$ could be encoded229 as absolute jumps. Depending on the relative sizes of long and absolute jumps,230 this solution might actually be smaller than the one reached by the smallest231 fixed point algorithm. -
Papers/sttt/proof.tex
r3470 r3473 1 \section{The proof}2 3 In this section, we present the correctness proof for the algorithm in more4 detail. The main correctness statement is shown, slightly simplified, in~Figure~\ref{statement}.5 %6 \begin{figure}[t]7 \small8 \begin{alignat*}{6}9 \mathtt{sigma}&\omit\rlap{$\mathtt{\_policy\_specification} \equiv10 \lambda program.\lambda sigma.$} \notag\\11 & \omit\rlap{$sigma\ 0 = 0\ \wedge$} \notag\\12 & \mathbf{let}\ & & \omit\rlap{$instr\_list \equiv code\ program\ \mathbf{in}$} \notag\\13 &&& \omit\rlap{$\forall ppc.ppc < |instr\_list| \rightarrow$} \notag\\14 &&& \mathbf{let}\ && pc \equiv sigma\ ppc\ \mathbf{in} \notag\\15 &&& \mathbf{let}\ && instruction \equiv \mathtt{fetch\_pseudo\_instruction}\ instr\_list\ ppc\ \mathbf{in} \notag\\16 &&& \mathbf{let}\ && next\_pc \equiv sigma\ (ppc+1)\ \mathbf{in}\notag\\17 &&&&& next\_pc = pc + \mathtt{instruction\_size}\ sigma\ ppc\ instruction\ \wedge\notag\\18 &&&&& (pc + \mathtt{instruction\_size}\ sigma\ ppc\ instruction < 2^{16}\ \vee\notag\\19 &&&&& (\forall ppc'.ppc' < |instr\_list| \rightarrow ppc < ppc' \rightarrow \notag\\20 &&&&& \mathbf{let}\ instruction' \equiv \mathtt{fetch\_pseudo\_instruction}\ instr\_list\ ppc'\ \mathbf{in} \notag\\21 &&&&&\ \mathtt{instruction\_size}\ sigma\ ppc'\ instruction' = 0)\ \wedge \notag\\22 &&&&& pc + \mathtt{instruction\_size}\ sigma\ ppc\ instruction = 2^{16})23 \end{alignat*}24 \caption{Main correctness statement\label{statement}}25 \label{sigmapolspec}26 \end{figure}27 %28 Informally, this means that when fetching a pseudo-instruction at $ppc$, the29 translation by $\sigma$ of $ppc+1$ is the same as $\sigma(ppc)$ plus the size30 of the instruction at $ppc$. That is, an instruction is placed consecutively31 after the previous one, and there are no overlaps. The rest of the statement deals with memory size: either the next instruction fits within memory ($next\_pc < 2^{16}$) or it ends exactly at the limit memory,32 in which case it must be the last translated instruction in the program (enforced by specfiying that the size of all subsequent instructions is 0: there may be comments or cost annotations that are not translated).33 34 Finally, we enforce that the program starts at address 0, i.e. $\sigma(0) = 0$. It may seem strange that we do not explicitly include a safety property stating that every jump instruction is of the right type with respect to its target (akin to the lemma from Figure~\ref{sigmasafe}), but this is not necessary. The distance is recalculated according to the instruction addresses from $\sigma$, which implicitly expresses safety.35 36 Since our computation is a least fixed point computation, we must prove37 termination in order to prove correctness: if the algorithm is halted after38 a number of steps without reaching a fixed point, the solution is not39 guaranteed to be correct. More specifically, branch instructions might be40 encoded which do not coincide with the span between their location and their41 destination.42 43 Proof of termination rests on the fact that the encoding of branch44 instructions can only grow larger, which means that we must reach a fixed point45 after at most $2n$ iterations, with $n$ the number of branch instructions in46 the program. This worst case is reached if at every iteration, we change the47 encoding of exactly one branch instruction; since the encoding of any branch48 instruction can change first from short to absolute, and then to long, there49 can be at most $2n$ changes.50 51 %The proof has been carried out using the ``Russell'' style from~\cite{Sozeau2006}.52 %We have proven some invariants of the {\sc f} function from the previous53 %section; these invariants are then used to prove properties that hold for every54 %iteration of the fixed point computation; and finally, we can prove some55 %properties of the fixed point.56 57 \subsection{Fold invariants}58 59 In this section, we present the invariants that hold during the fold of {\sc f}60 over the program. These will be used later on to prove the properties of the61 iteration. During the fixed point computation, the $\sigma$ function is62 implemented as a trie for ease of access; computing $\sigma(x)$ is achieved by63 looking up the value of $x$ in the trie. Actually, during the fold, the value64 we pass along is a pair $\mathbb{N} \times \mathtt{ppc\_pc\_map}$. The first65 component is the number of bytes added to the program so far with respect to66 the previous iteration, and the second component, {\tt ppc\_pc\_map}, is the67 actual $\sigma$ trie (which we'll call $strie$ to avoid confusion).68 %69 {\small70 \begin{alignat*}{2}71 \mathtt{out} & \mathtt{\_of\_program\_none} \equiv \lambda prefix.\lambda strie. \notag\\72 & \forall i.i < 2^{16} \rightarrow (i > |prefix| \leftrightarrow73 \mathtt{lookup\_opt}\ i\ (\mathtt{snd}\ strie) = \mathtt{None})74 \end{alignat*}}75 %76 The first invariant states that any pseudo-address not yet examined is not77 present in the lookup trie.78 %79 {\small80 \begin{alignat*}{2}81 \mathtt{not} & \mathtt{\_jump\_default} \equiv \lambda prefix.\lambda strie.\forall i.i < |prefix| \rightarrow\notag\\82 & \neg\mathtt{is\_jump}\ (\mathtt{nth}\ i\ prefix) \rightarrow \mathtt{lookup}\ i\ (\mathtt{snd}\ strie) = \mathtt{short\_jump}83 \end{alignat*}}84 %85 This invariant states that when we try to look up the jump length of a86 pseudo-address where there is no branch instruction, we will get the default87 value, a short jump.88 %89 {\small90 \begin{alignat*}{4}91 \mathtt{jump} & \mathtt{\_increase} \equiv \lambda pc.\lambda op.\lambda p.\forall i.i < |prefix| \rightarrow \notag\\92 & \mathbf{let}\ oj \equiv \mathtt{lookup}\ i\ (\mathtt{snd}\ op)\ \mathbf{in} \notag\\93 & \mathbf{let}\ j \equiv \mathtt{lookup}\ i\ (\mathtt{snd}\ p)\ \mathbf{in}\ \mathtt{jmpleq}\ oj\ j94 \end{alignat*}}95 %96 This invariant states that between iterations (with $op$ being the previous97 iteration, and $p$ the current one), jump lengths either remain equal or98 increase. It is needed for proving termination.99 %100 \begin{figure}[h]101 \small102 \begin{alignat*}{6}103 \mathtt{sigma} & \omit\rlap{$\mathtt{\_compact\_unsafe} \equiv \lambda prefix.\lambda strie.\forall n.n < |prefix| \rightarrow$}\notag\\104 & \mathbf{match}\ && \omit\rlap{$\mathtt{lookup\_opt}\ n\ (\mathtt{snd}\ strie)\ \mathbf{with}$}\notag\\105 &&& \omit\rlap{$\mathtt{None} \Rightarrow \mathrm{False}$} \notag\\106 &&& \omit\rlap{$\mathtt{Some}\ \langle pc, j \rangle \Rightarrow$} \notag\\107 &&& \mathbf{match}\ && \mathtt{lookup\_opt}\ (n+1)\ (\mathtt{snd}\ strie)\ \mathbf{with}\notag\\108 &&&&& \mathtt{None} \Rightarrow \mathrm{False} \notag\\109 &&&&& \mathtt{Some}\ \langle pc_1, j_1 \rangle \Rightarrow110 pc_1 = pc + \notag\\111 &&&&& \ \ \mathtt{instruction\_size\_jmplen}\ j\ (\mathtt{nth}\ n\ prefix)112 \end{alignat*}113 \caption{Temporary safety property}114 \label{sigmacompactunsafe}115 \end{figure}116 %117 We now proceed with the safety lemmas. The lemma in118 Figure~\ref{sigmacompactunsafe} is a temporary formulation of the main119 property {\tt sigma\_policy\_specification}. Its main difference from the120 final version is that it uses {\tt instruction\_size\_jmplen} to compute the121 instruction size. This function uses $j$ to compute the span of branch122 instructions (i.e. it uses the $\sigma$ under construction), instead123 of looking at the distance between source and destination. This is because124 $\sigma$ is still under construction; we will prove below that after the125 final iteration, {\tt sigma\_compact\_unsafe} is equivalent to the main126 property in Figure~\ref{sigmasafe} which holds at the end of the computation.127 %128 \begin{figure}[h]129 \small130 \begin{alignat*}{6}131 \mathtt{sigma} & \omit\rlap{$\mathtt{\_safe} \equiv \lambda prefix.\lambda labels.\lambda old\_strie.\lambda strie.\forall i.i < |prefix| \rightarrow$} \notag\\132 & \omit\rlap{$\forall dest\_label.\mathtt{is\_jump\_to\ (\mathtt{nth}\ i\ prefix})\ dest\_label \rightarrow$} \notag\\133 & \mathbf{let} && \omit\rlap{$\ paddr \equiv \mathtt{lookup}\ labels\ dest\_label\ \mathbf{in}$} \notag\\134 & \mathbf{let} && \omit\rlap{$\ \langle j, src, dest \rangle \equiv \mathbf{if} \ paddr\ \leq\ i\ \mathbf{then}$}\notag\\135 &&&&& \mathbf{let}\ \langle \_, j \rangle \equiv \mathtt{lookup}\ i\ (\mathtt{snd}\ strie)\ \mathbf{in} \notag\\136 &&&&& \mathbf{let}\ \langle pc\_plus\_jl, \_ \rangle \equiv \mathtt{lookup}\ (i+1)\ (\mathtt{snd}\ strie)\ \mathbf{in}\notag\\137 &&&&& \mathbf{let}\ \langle addr, \_ \rangle \equiv \mathtt{lookup}\ paddr\ (\mathtt{snd}\ strie)\ \mathbf{in}\notag\\138 &&&&& \langle j, pc\_plus\_jl, addr \rangle\notag\\139 &&&\mathbf{else} \notag\\140 &&&&&\mathbf{let}\ \langle \_, j \rangle \equiv \mathtt{lookup}\ i\ (\mathtt{snd}\ strie)\ \mathbf{in} \notag\\141 &&&&&\mathbf{let}\ \langle pc\_plus\_jl, \_ \rangle \equiv \mathtt{lookup}\ (i+1)\ (\mathtt{snd}\ old\_strie)\ \mathbf{in}\notag\\142 &&&&&\mathbf{let}\ \langle addr, \_ \rangle \equiv \mathtt{lookup}\ paddr\ (\mathtt{snd}\ old\_strie)\ \mathbf{in}\notag\\143 &&&&&\langle j, pc\_plus\_jl, addr \rangle \mathbf{in}\ \notag\\144 &&&\mathbf{match} && \ j\ \mathbf{with} \notag\\145 &&&&&\mathrm{short\_jump} \Rightarrow \mathtt{short\_jump\_valid}\ src\ dest\notag\\146 &&&&&\mathrm{absolute\_jump} \Rightarrow \mathtt{absolute\_jump\_valid}\ src\ dest\notag\\147 &&&&&\mathrm{long\_jump} \Rightarrow \mathrm{True}148 \end{alignat*}149 \caption{Safety property}150 \label{sigmasafe}151 \end{figure}152 %153 We compute the distance using the memory address of the instruction154 plus its size. This follows the behaviour of the MCS-51 microprocessor, which155 increases the program counter directly after fetching, and only then executes156 the branch instruction (by changing the program counter again).157 158 There are also some simple, properties to make sure that our policy159 remains consistent, and to keep track of whether the fixed point has been160 reached. We do not include them here in detail. Two of these properties give the values of $\sigma$ for the start and end of the program; $\sigma(0) = 0$ and $\sigma(n)$, where $n$ is the number of instructions up until now, is equal to the maximum memory address so far. There are also two properties that deal with what happens when the previous161 iteration does not change with respect to the current one. $added$ is a162 variable that keeps track of the number of bytes we have added to the program163 size by changing the encoding of branch instructions. If $added$ is 0, the program164 has not changed and vice versa.165 166 %{\small167 %\begin{align*}168 %& \mathtt{lookup}\ 0\ (\mathtt{snd}\ strie) = 0 \notag\\169 %& \mathtt{lookup}\ |prefix|\ (\mathtt{snd}\ strie) = \mathtt{fst}\ strie170 %\end{align*}}171 172 173 %{\small174 %\begin{align*}175 %& added = 0\ \rightarrow\ \mathtt{policy\_pc\_equal}\ prefix\ old\_strie\ strie \notag\\176 %& \mathtt{policy\_jump\_equal}\ prefix\ old\_strie\ strie\ \rightarrow\ added = 0177 %\end{align*}}178 179 We need to use two different formulations, because the fact that $added$ is 0180 does not guarantee that no branch instructions have changed. For instance,181 it is possible that we have replaced a short jump with an absolute jump, which182 does not change the size of the branch instruction. Therefore {\tt policy\_pc\_equal} states that $old\_sigma_1(x) = sigma_1(x)$, whereas {\tt policy\_jump\_equal} states that $old\_sigma_2(x) = sigma_2(x)$. This formulation is sufficient to prove termination and compactness.183 184 Proving these invariants is simple, usually by induction on the prefix length.185 186 \subsection{Iteration invariants}187 188 These are invariants that hold after the completion of an iteration. The main189 difference between these invariants and the fold invariants is that after the190 completion of the fold, we check whether the program size does not supersede191 64 Kb, the maximum memory size the MCS-51 can address. The type of an iteration therefore becomes an option type: {\tt None} in case192 the program becomes larger than 64 Kb, or $\mathtt{Some}\ \sigma$193 otherwise. We also no longer pass along the number of bytes added to the194 program size, but a boolean that indicates whether we have changed something195 during the iteration or not.196 197 If the iteration returns {\tt None}, which means that it has become too large for memory, there is an invariant that states that the previous iteration cannot198 have every branch instruction encoded as a long jump. This is needed later in the proof of termination. If the iteration returns $\mathtt{Some}\ \sigma$, the fold invariants are retained without change.199 200 Instead of using {\tt sigma\_compact\_unsafe}, we can now use the proper201 invariant:202 %203 {\small204 \begin{alignat*}{6}205 \mathtt{sigma} & \omit\rlap{$\mathtt{\_compact} \equiv \lambda program.\lambda sigma.$} \notag\\206 & \omit\rlap{$\forall n.n < |program|\ \rightarrow$} \notag\\207 & \mathbf{match}\ && \omit\rlap{$\mathtt{lookup\_opt}\ n\ (\mathtt{snd}\ sigma)\ \mathbf{with}$}\notag\\208 &&& \omit\rlap{$\mathrm{None}\ \Rightarrow\ \mathrm{False}$}\notag\\209 &&& \omit\rlap{$\mathrm{Some}\ \langle pc, j \rangle \Rightarrow$}\notag\\210 &&& \mathbf{match}\ && \mathtt{lookup\_opt}\ (n+1)\ (\mathtt{snd}\ sigma)\ \mathbf{with}\notag\\211 &&&&& \mathrm{None}\ \Rightarrow\ \mathrm{False}\notag\\212 &&&&& \mathrm{Some} \langle pc1, j1 \rangle \Rightarrow\notag\\213 &&&&& \ \ pc1 = pc + \mathtt{instruction\_size}\ n\ (\mathtt{nth}\ n\ program)214 \end{alignat*}}215 %216 This is almost the same invariant as ${\tt sigma\_compact\_unsafe}$, but differs in that it217 computes the sizes of branch instructions by looking at the distance between218 position and destination using $\sigma$. In actual use, the invariant is qualified: $\sigma$ is compact if there have219 been no changes (i.e. the boolean passed along is {\tt true}). This is to220 reflect the fact that we are doing a least fixed point computation: the result221 is only correct when we have reached the fixed point.222 223 There is another, trivial, invariant in case the iteration returns224 $\mathtt{Some}\ \sigma$: it must hold that $\mathtt{fst}\ sigma < 2^{16}$.225 We need this invariant to make sure that addresses do not overflow.226 227 The proof of {\tt nec\_plus\_ultra} goes as follows: if we return {\tt None},228 then the program size must be greater than 64 Kb. However, since the229 previous iteration did not return {\tt None} (because otherwise we would230 terminate immediately), the program size in the previous iteration must have231 been smaller than 64 Kb.232 233 Suppose that all the branch instructions in the previous iteration are234 encoded as long jumps. This means that all branch instructions in this235 iteration are long jumps as well, and therefore that both iterations are equal236 in the encoding of their branch instructions. Per the invariant, this means that237 $added = 0$, and therefore that all addresses in both iterations are equal.238 But if all addresses are equal, the program sizes must be equal too, which239 means that the program size in the current iteration must be smaller than240 64 Kb. This contradicts the earlier hypothesis, hence not all branch241 instructions in the previous iteration are encoded as long jumps.242 243 The proof of {\tt sigma\_compact} follows from {\tt sigma\_compact\_unsafe} and244 the fact that we have reached a fixed point, i.e. the previous iteration and245 the current iteration are the same. This means that the results of246 {\tt instruction\_size\_jmplen} and {\tt instruction\_size} are the same.247 248 \subsection{Final properties}249 250 These are the invariants that hold after $2n$ iterations, where $n$ is the251 program size (we use the program size for convenience; we could also use the252 number of branch instructions, but this is more complex). Here, we only253 need {\tt out\_of\_program\_none}, {\tt sigma\_compact} and the fact that254 $\sigma(0) = 0$.255 256 Termination can now be proved using the fact that there is a $k \leq 2n$, with257 $n$ the length of the program, such that iteration $k$ is equal to iteration258 $k+1$. There are two possibilities: either there is a $k < 2n$ such that this259 property holds, or every iteration up to $2n$ is different. In the latter case,260 since the only changes between the iterations can be from shorter jumps to261 longer jumps, in iteration $2n$ every branch instruction must be encoded as262 a long jump. In this case, iteration $2n$ is equal to iteration $2n+1$ and the263 fixed point is reached.
Note: See TracChangeset
for help on using the changeset viewer.