Changes between Version 6 and Version 7 of Proposal1.2


Ignore:
Timestamp:
Jun 15, 2010, 4:10:20 PM (8 years ago)
Author:
sacerdot
Comment:

--

Legend:

Unmodified
Added
Removed
Modified
  • Proposal1.2

    v6 v7  
    2828'''Figure 2:''' Quicksort
    2929
    30 As a simple example, consider the quicksort program of Fig.2. This algorithm performs in-place sorting of input array img3.png whose bounds are img4.png and img5.png; initially img4.png is expected to be zero, while img5.png is the length of the array minus one. The outermost conditional terminates when the bounds of the array are illegal. (Sorting an empty array will end the recursive behaviour of the algorithm.) The variable img6.png is the so called pivot: a selected element of the array that will be compared with all other elements. Bigger elements will be moved (by the swap function) to the end of the array (the upper part), while smaller elements are placed at the beginning of the array (the lower part). Then the pivot is placed between the lower and the upper part of the array, in position img7.png, its position in the resulting sorted array; all elements before the pivot are smaller and all elements following it are bigger. The algorithm completes the sorting with recursive calls on the lower and on the upper parts of the array.
     30As a simple example, consider the quicksort program of Fig.2. This algorithm performs in-place sorting of input array ''t'' whose bounds are ''l'' and ''r''; initially ''l'' is expected to be zero, while ''r'' is the length of the array minus one. The outermost conditional terminates when the bounds of the array are illegal. (Sorting an empty array will end the recursive behaviour of the algorithm.) The variable ''v'' is the so called pivot: a selected element of the array that will be compared with all other elements. Bigger elements will be moved (by the swap function) to the end of the array (the upper part), while smaller elements are placed at the beginning of the array (the lower part). Then the pivot is placed between the lower and the upper part of the array, in position ''m'', its position in the resulting sorted array; all elements before the pivot are smaller and all elements following it are bigger. The algorithm completes the sorting with recursive calls on the lower and on the upper parts of the array.
    3131
    32 In the body of the `quick_sort` procedure there are only two execution paths, corresponding to the two cases img8.png and img9.png. The latter is a trivial path, immediately leading to termination. The former leads to the while loop (that is regarded as a procedure call), the call to swap, and the two recursive calls. Similarly, the body of the while loop is composed by two paths, corresponding to the two conditions img10.png and img11.png.
     32In the body of the `quick_sort` procedure there are only two execution paths, corresponding to the two cases ''l<r'' and ''l ≥ r''. The latter is a trivial path, immediately leading to termination. The former leads to the while loop (that is regarded as a procedure call), the call to swap, and the two recursive calls. Similarly, the body of the while loop is composed by two paths, corresponding to the two conditions ''i≤r'' and ''i>r''.
    3333
    3434All operations performed along any of these paths takes some constant time c. The complexity magnitude of the program only depends on the loops and recursive calls met along its execution paths, but not on their associated constants. On the other hand, if we want to give tight performance bounds to the execution time, we have to compute the real constants on the executable.
    3535
    36 The compiler must be able to return a set of pairs img12.png, where each img13.png is an execution path, and img14.png is its actual cost[[footnode.html#foot154|^2^]]. It is then up to the user to guess (and to prove, assisted by interactive tools) the complexity of the program: the compiler only provides the infrastructure required to map a complexity analysis on the source into a faithful analog on the target. This approach looks compatible with most local optimizations. Moreover, since we work on a cycle-by-cycle (procedure-by-procedure) basis, the approach should scale up well.
     36The compiler must be able to return a set of pairs ''p(i),c(i)'', where each ''p(i)'' is an execution path, and ''c(i)'' is its actual cost[[footnode.html#foot154|^2^]]. It is then up to the user to guess (and to prove, assisted by interactive tools) the complexity of the program: the compiler only provides the infrastructure required to map a complexity analysis on the source into a faithful analog on the target. This approach looks compatible with most local optimizations. Moreover, since we work on a cycle-by-cycle (procedure-by-procedure) basis, the approach should scale up well.
    3737
    3838[[BR]]
     
    5050  1. The user writes her code in C�(see Fig.�[[node6.html#code|2]]) and compiles it with our CerCo (Certified Complexity) compiler.
    5151  1. The CerCo compiler outputs both the object code and an annotated copy of the C source (Fig.�[[#annotated|4]]). '''Figure 4:''' Cost annotated C code (generated by the compiler)||  ||
    52   '''Figure 5:''' Automatic inference of cost annotations from assembly code|| || ||<tablestyle="width:30%"> C source[[BR]]� [[BR]]`~- void quicksort(t,l,r) {[[BR]] if (l < r) {[[BR]] int i = l + 1;[[BR]] int m = l;[[BR]] int v = t[l];[[BR]] while (i <= r) {[[BR]] if (t[i] < v) { [[BR]] m++; [[BR]] swap(t, i, m); }[[BR]] i++;}[[BR]] swap(t, l, m);[[BR]] quicksort(t, l, m - 1);[[BR]] quicksort(t, m + 1, r);[[BR]]}[[BR]]-~` || || ||<tablestyle="width:30%"> Pseudo-Assembly code[[BR]]� [[BR]]`~- 24: r <- r3[[BR]]29: l <- r2[[BR]]34: cmp l r[[BR]]36: t <- r1[[BR]]3a: jump c4 if l >= r[[BR]]40: i <- l + 1[[BR]]44: r8 <- t[[BR]]48: r7 <- l[[BR]]4b: m <- l[[BR]][[BR]]img20.png[[BR]][[BR]]97: r1 <- t[[BR]]9b: r3 <- m[[BR]]9e: r2 <- l[[BR]]a1: call swap[[BR]]a6: r1 <- t[[BR]]aa: r3 <- m - 0x1[[BR]]af: r2 <- l[[BR]]b2: call quicksort[[BR]]bc: l <- r6[[BR]]bf: call quicksort[[BR]]c4: ret -~` || || ||<tablestyle="width:30%"> Execution Paths[[BR]]� [[BR]]`~- l >= r img21.png [[BR]]24: r <- r3[[BR]]29: l <- r2[[BR]]34: cmp l r[[BR]]36: t <- r1[[BR]]3a: jump c4 if l >= r[[BR]]c4: ret[[BR]]�[[BR]]total: 6 clock cycles[[BR]]�[[BR]]l < r img21.png [[BR]]24: r <- r3[[BR]]29: l <- r2[[BR]]34: cmp l r[[BR]]36: t <- r1[[BR]]3a: jump c4 if l >= r[[BR]]40: i <- l + 1[[BR]]44: r8 <- t[[BR]]48: r7 <- l[[BR]]4b: m <- l[[BR]]while loop[[BR]]97: r1 <- t[[BR]]9b: r3 <- m[[BR]]9e: r2 <- l[[BR]]a1: call swap [[BR]]swap [[BR]]a6: r1 <- t[[BR]]aa: r3 <- m - 0x1[[BR]]af: r2 <- l[[BR]]b2: call quicksort[[BR]]quicksort[[BR]]bc: l <- r6[[BR]]bf: call quicksort[[BR]]quicksort[[BR]]c4: ret[[BR]]�[[BR]]total: 21 clock cycles + function calls -~` || || ||
    53    Each loop and function body is annotated with the cost of one iteration, along all its possible execution paths. The cost is expressed as a function of the state of the program, which comprises the value of every variable. (In the example, we use a textual annotation for simplicity, but we expect to produce a more structured output.) The leftmost column of Fig.�[[#runs|5]] shows the original source code. Colours are used to relate source code statements with their respective (human readable) assembly instructions, reported in the central column. That assembly was produced by gcc[[footnode.html#foot390|^3^]] with a moderate level of optimizations for an Intel 386 family microprocessor. We used img4.png, img5.png, img3.png, img6.png and img7.png to mention locations (registers and/or stack frame temporaries) in which the corresponding C variables are placed, while img22.png are other register or temporaries that have no direct mapping to C. The calling convention puts the first three parameters in img23.png, img24.png and img25.png, and it is up to the callee to eventually store them in local temporaries. Assignment is denoted with `<-`, addition and multiplication with `+` and `*`; the jump instruction is followed by the target address, and when the jump is conditional a C like expression follows (but its evaluation is performed early by the `cmp` instruction, that sets a CPU flag recording the result of the comparison). The only tricky expression is ```*(r8 + r7 * 4)`'', that exploits an advanced addressing mechanism corresponding to array indexing (4 is the size of an array cell in bytes, img26.png is the index and img27.png is the address at which the array starts). It amounts to the C statement ```t[l]`'' that computes the pivot.The rightmost column shows two possible execution paths, with a precise estimation of their cost (here 6 and 21 CPU cycles plus the cost of function calls) and the algebraic conditions characterizing these paths.More precisely
     52  '''Figure 5:''' Automatic inference of cost annotations from assembly code|| ||
     53     ||<tablestyle="width:30%"> C source[[BR]]� [[BR]]`~- void quicksort(t,l,r) {[[BR]] if (l < r) {[[BR]] int i = l + 1;[[BR]] int m = l;[[BR]] int v = t[l];[[BR]] while (i <= r) {[[BR]] if (t[i] < v) { [[BR]] m++; [[BR]] swap(t, i, m); }[[BR]] i++;}[[BR]] swap(t, l, m);[[BR]] quicksort(t, l, m - 1);[[BR]] quicksort(t, m + 1, r);[[BR]]}[[BR]]-~` || || ||<tablestyle="width:30%"> Pseudo-Assembly code[[BR]]� [[BR]]`~- 24: r <- r3[[BR]]29: l <- r2[[BR]]34: cmp l r[[BR]]36: t <- r1[[BR]]3a: jump c4 if l >= r[[BR]]40: i <- l + 1[[BR]]44: r8 <- t[[BR]]48: r7 <- l[[BR]]4b: m <- l[[BR]][[BR]]img20.png[[BR]][[BR]]97: r1 <- t[[BR]]9b: r3 <- m[[BR]]9e: r2 <- l[[BR]]a1: call swap[[BR]]a6: r1 <- t[[BR]]aa: r3 <- m - 0x1[[BR]]af: r2 <- l[[BR]]b2: call quicksort[[BR]]bc: l <- r6[[BR]]bf: call quicksort[[BR]]c4: ret -~` || || ||<tablestyle="width:30%"> Execution Paths[[BR]]� [[BR]]`~- l >= r img21.png [[BR]]24: r <- r3[[BR]]29: l <- r2[[BR]]34: cmp l r[[BR]]36: t <- r1[[BR]]3a: jump c4 if l >= r[[BR]]c4: ret[[BR]]�[[BR]]total: 6 clock cycles[[BR]]�[[BR]]l < r img21.png [[BR]]24: r <- r3[[BR]]29: l <- r2[[BR]]34: cmp l r[[BR]]36: t <- r1[[BR]]3a: jump c4 if l >= r[[BR]]40: i <- l + 1[[BR]]44: r8 <- t[[BR]]48: r7 <- l[[BR]]4b: m <- l[[BR]]while loop[[BR]]97: r1 <- t[[BR]]9b: r3 <- m[[BR]]9e: r2 <- l[[BR]]a1: call swap [[BR]]swap [[BR]]a6: r1 <- t[[BR]]aa: r3 <- m - 0x1[[BR]]af: r2 <- l[[BR]]b2: call quicksort[[BR]]quicksort[[BR]]bc: l <- r6[[BR]]bf: call quicksort[[BR]]quicksort[[BR]]c4: ret[[BR]]�[[BR]]total: 21 clock cycles + function calls -~` || || ||
     54   Each loop and function body is annotated with the cost of one iteration, along all its possible execution paths. The cost is expressed as a function of the state of the program, which comprises the value of every variable. (In the example, we use a textual annotation for simplicity, but we expect to produce a more structured output.) The leftmost column of Fig.�[[#runs|5]] shows the original source code. Colours are used to relate source code statements with their respective (human readable) assembly instructions, reported in the central column. That assembly was produced by gcc[[footnode.html#foot390|^3^]] with a moderate level of optimizations for an Intel 386 family microprocessor. We used ''l,r,t,v'' and ''m'' to mention locations (registers and/or stack frame temporaries) in which the corresponding C variables are placed, while ''r1,...,r9'' are other register or temporaries that have no direct mapping to C. The calling convention puts the first three parameters in ''r1,r2'' and ''r3'', and it is up to the callee to eventually store them in local temporaries. Assignment is denoted with `<-`, addition and multiplication with `+` and `*`; the jump instruction is followed by the target address, and when the jump is conditional a C like expression follows (but its evaluation is performed early by the `cmp` instruction, that sets a CPU flag recording the result of the comparison). The only tricky expression is ```*(r8 + r7 * 4)`'', that exploits an advanced addressing mechanism corresponding to array indexing (4 is the size of an array cell in bytes, ''r7'' is the index and ''r8'' is the address at which the array starts). It amounts to the C statement ```t[l]`'' that computes the pivot.The rightmost column shows two possible execution paths, with a precise estimation of their cost (here 6 and 21 CPU cycles plus the cost of function calls) and the algebraic conditions characterizing these paths.More precisely
    5455    * The CerCo compiler avoids intra-procedural optimisations and loop optimisations that may change the number of iterations performed in a non trivial way.
    5556    * Some intra-procedural or loop optimisations (like the `while` to `repeat` pre-hoisting optimisation applied by gcc in Fig.�[[#runs|5]]) can be allowed, provided that the compiler records them precisely.
     
    5758  1. The user computes (by hand or semi-automatically) the complexity invariants of each cycle and (recursive) function, and he adds them to the C code as special comments[[footnode.html#foot404|^4^]] (Fig.�[[#invariants|6]]). '''Figure 6:''' Invariants annotated C code. The invariants are user provided.|| img28.png ||
    5859  '''Figure 7:''' Complexity obligations (automatically generated). The user should prove every complexity obligation.|| img29.png ||
    59    The quicksort complexity invariant states the maximum number of clock cycles required by its execution on an array delimited by img4.png and img5.png. Since the procedure can be called with wrong indices (img30.png) the formula has to take into account that the img31.png could be negative using the img32.png function to raise that difference to zero when needed. The literature suggests that this quicksort implementation (where the pivot img6.png is chosen deterministically) has a quadratic complexity in the worst case. Cleaning up the formula from multiplicative and additive constants one obtains the expected asymptotic complexity img33.png. The coefficients are those returned by the cost-annotating compiler. Similarly, the user has to give a complexity invariant for the inner while loop.
     60   The quicksort complexity invariant states the maximum number of clock cycles required by its execution on an array delimited by ''l'' and ''r''. Since the procedure can be called with wrong indices (''r<l'') the formula has to take into account that ''r-l'' could be negative using the ''max'' function to raise that difference to zero when needed. The literature suggests that this quicksort implementation (where the pivot ''v'' is chosen deterministically) has a quadratic complexity in the worst case. Cleaning up the formula from multiplicative and additive constants one obtains the expected asymptotic complexity ''(r-l)*(r-l)''. The coefficients are those returned by the cost-annotating compiler. Similarly, the user has to give a complexity invariant for the inner while loop.
    6061  1. The user and compiler annotated C code is fed into an already existing tool (in this example, Caduceus, [[[node60.html#caduceus|Filli�tre and March�]]]) that produces one complexity obligation for each execution path (Fig.�[[#obligations|7]]).
    6162  1. The user should prove all complexity obligations. The proofs are the certificate that the user provided complexity invariant is correct. In many cases, the obligations can be proved automatically using a general purpose automatic theorem prover or an ad-hoc procedure. For instance, to prove the complexity obligations of Fig.�[[#obligations|7]], we must show that a system of inequations holds, which may be done automatically. When an automatic proof is not possible, the user can resort to an interactive proof.