== [[BR]] Progress beyond state of the art ==
Automatic verification of properties of software components has reached a level of maturity allowing complete correctness proofs of entire compilers; that is of the semantic equivalence between the generated assembly code and its source program. For instance, in the framework of the http://www.info.uni-karlsruhe.de/publications.php/id=213Verifix Project a compiler from a subset of Common Lisp to Transputer code was formally checked in PVS (see [[[node60.html#Dold|Dold and Vialard]]]). [[[node60.html#Strecker|Strecker]]] and [[[node60.html#Klein|Klein and Nipkow]]] certified bytecode compilers from a subset of Java to a subset of the Java Virtual Machine in Isabelle. In the same system, [[[node60.html#Leinenbach|Leinenbach et al.]]] formally verified a compiler from a subset of C to a DLX assembly code. [[[node60.html#Chlipala|Chlipala]]] recently wrote in Coq a certified compiler from the simply-typed lambda calculus to assembly language, making an extensive use of dependent types. Perhaps, the most advanced project is http://pauillac.inria.fr/�xleroy/compcert/Compcert, headed by Leroy, based on the use of Coq both for programming the compiler and proving its correctness. In particular, both the back-end ([[[node60.html#Leroy06|Leroy]]]) and the front-end ([[[node60.html#Leroy08|Leroy and Tristan]]]) of an optimising compiler from a subset of C to PowerPC assembly have been certified in this way.
However, very little is known about the preservation of intensional properties of programs, and in particular about their (concrete) complexity. The theoretical study of the complexity impact of program transformations between different computational models has so far been confined to very foundational devices. Here we propose to address a concrete case of compilation from a typical high-level language to assembly. It is worth remarking that it is unlikely to have constant-time transformations between foundational models: for instance coding a multitape Turing machine into a single tape one could introduce a polynomial slow-down. Thus, complexity is architecture dependent, and the claim that you may pass from one language to another, preserving the performance of your algorithms, must be taken with the due caution. In particular, as surprising as it may be, very little is known about the complexity behaviour of a compiled code with respect to its source; as a matter of fact, most industries producing robots or devices with strong temporal constraints (such as, e.g. photoelectric safety barriers) still program such devices in assembly.
The tacit assumption that the complexity of algorithms is preserved along compilation, while plausible under the suitable assumptions, is not supported by any practical or theoretical study. For instance, a single register is usually used to point to activation records, implicitly delimiting their number; you may take more registers to this purpose, but unless you fix a priori their number (hence fixing the size of the stack), you cannot expect to access data in activation records in constant time. In particular, the memory model assumed by Leroy assumes an infinite memory, where allocation requests always succeed, that clearly conflicts with the reality of embedded software, where one has to work within precise (often relatively small) memory bounds. If working in restricted space on one side allows us to properly weight memory access as a unit cost, on the other side it introduces a subtle interplay between space complexity, time complexity and correctness that will be one of the crucial issues of the project.
Even admitting (as we hope to prove) that in a confined universe we may actually preserve complexity, the main interest of the project is in producing a (certified) computational cost (in terms of clock cycles) for all instruction slices of the source program with O(1) complexity, thus providing precise values for all constants appearing in the cost function for the source. This opens the possibility of computing time constraints for executable code by reasoning directly on the high level input language. In particular, we are not aiming to help analyse the complexity (or termination) of programs (that has to be guessed by the user, as he guesses invariants in axiomatic semantics), but we shall build the necessary infrastructure to reflect a high-level, abstract complexity analysis of the source on a concrete instantiation of its target code.
Such instantiation depends on the target architecture: for instance some microcontrollers lack the multiplication instruction as a primitive operation, preventing to count such an operation with a fixed cost. Moreover, if we are interested in a really tight complexity measure, we cannot expect to have a uniform cost for input instructions since, due to register allocation and optimisations, their actual cost depends on their surrounding context. In other words, we have to face the non compositional nature (in terms of the source structure) of most compiler techniques and optimizations.
The Compcert project represents the current baseline for any future work on compiler certification, comprising the one we plan to do. We will improve on Compcert in two directions: by assuming a formal model where resources (memory) are constrained; and by preserving complexity of O(1) operations, also tracing the way they are mapped to assembly to reflect actual computational costs on the source code. Both improvements greatly increase the exploitation potentials, in particular in the domain of embedded systems and real time computation.
[[BR]]
=== The CerCo approach ===
The complexity of a program only depends on its control-flow structure, and in particular on its cycles (procedures calls are just a special case of cycles). Proving that a compiler preserves complexity amounts to proving that it preserves (up to local modifications, like loop unrolling, in-line expansion, etc.) the control-flow structure of the source[[footnode.html#foot144|^1^]] and, less trivially, that all other instructions are compiled into assembly code whose execution takes a bounded number of clock-cycles (i.e. with O(1) complexity). The interest of the project lies in the possibility to compute these costs directly on the target code then refer them back to the source program, allowing the possibility to make precise and trusted temporal assertions about execution from reasoning on the source code.
As already mentioned, the main problem in the backward translation of costs from target code to source code is the fact that, apart from the overall control flow structure, all remaining structure of the input is usually irremediably lost during compilation: optimizations can move instructions around, change the order of jumps, and in general perform operations that are far from compositional w.r.t. the high level syntactic structure of the input program. So there is no hope to compute costs on an instruction-by-instruction basis of the source language, since the actual cost of the executable is not compositional in these figures. We have to find another, eventually coarser, level of granularity where the source can be sensibly annotated by target costs.
We regard a C program as a collection of mutually defined procedures. The flow inside each procedure is determined by branching instructions like if-then-else; ``while'' loops can be regarded as a special kind of tail recursive procedures. The resulting flow can thus be represented as a directed acyclic graph (DAG). We call a path of the directed acyclic graph an ''execution path''.
[[Image(img3.png)]]
'''Figure 2:''' Quicksort
As a simple example, consider the quicksort program of Fig.2. This algorithm performs in-place sorting of input array ''t'' whose bounds are ''l'' and ''r''; initially ''l'' is expected to be zero, while ''r'' is the length of the array minus one. The outermost conditional terminates when the bounds of the array are illegal. (Sorting an empty array will end the recursive behaviour of the algorithm.) The variable ''v'' is the so called pivot: a selected element of the array that will be compared with all other elements. Bigger elements will be moved (by the swap function) to the end of the array (the upper part), while smaller elements are placed at the beginning of the array (the lower part). Then the pivot is placed between the lower and the upper part of the array, in position ''m'', its position in the resulting sorted array; all elements before the pivot are smaller and all elements following it are bigger. The algorithm completes the sorting with recursive calls on the lower and on the upper parts of the array.
In the body of the `quick_sort` procedure there are only two execution paths, corresponding to the two cases ''lr''.
All operations performed along any of these paths takes some constant time c. The complexity magnitude of the program only depends on the loops and recursive calls met along its execution paths, but not on their associated constants. On the other hand, if we want to give tight performance bounds to the execution time, we have to compute the real constants on the executable.
The compiler must be able to return a set of pairs ''p(i),c(i)'', where each ''p(i)'' is an execution path, and ''c(i)'' is its actual cost[[footnode.html#foot154|^2^]]. It is then up to the user to guess (and to prove, assisted by interactive tools) the complexity of the program: the compiler only provides the infrastructure required to map a complexity analysis on the source into a faithful analog on the target. This approach looks compatible with most local optimizations. Moreover, since we work on a cycle-by-cycle (procedure-by-procedure) basis, the approach should scale up well.
[[BR]]
=== User interaction flow ===
The left part of Fig.3 shows the interaction diagram for the final user of the system (the right part is a planned case study for a possible automation of the process and will be discussed later).
[[Image(interaction_diagram.png)]][[Image(case_study_diagram.png)]]
'''Figure 3:''' Interaction and automation diagrams
The interaction is done in several steps. We illustrate them using the quicksort program above.
1. The user writes her code in C (see Fig.2) and compiles it with our CerCo (Certified Complexity) compiler.
2. The CerCo compiler outputs both the object code and an annotated copy of the C source (Fig.4).
Each loop and function body is annotated with the cost of one iteration, along all its possible execution paths. The cost is expressed as a function of the state of the program, which comprises the value of every variable. (In the example, we use a textual annotation for simplicity, but we expect to produce a more structured output.) The leftmost column of Fig.5 shows the original source code. Colours are used to relate source code statements with their respective (human readable) assembly instructions, reported in the central column. That assembly was produced by gcc[[footnode.html#foot390|^3^]] with a moderate level of optimizations for an Intel 386 family microprocessor. We used ''l,r,t,v'' and ''m'' to mention locations (registers and/or stack frame temporaries) in which the corresponding C variables are placed, while '' r1,...,r9'' are other register or temporaries that have no direct mapping to C. The calling convention puts the first three parameters in ''r1,r2'' and ''r3'', and it is up to the callee to eventually store them in local temporaries. Assignment is denoted with `<-`, addition and multiplication with `+` and `*`; the jump instruction is followed by the target address, and when the jump is conditional a C like expression follows (but its evaluation is performed early by the `cmp` instruction, that sets a CPU flag recording the result of the comparison). The only tricky expression is ```*(r8 + r7 * 4)`'', that exploits an advanced addressing mechanism corresponding to array indexing (4 is the size of an array cell in bytes, ''r7'' is the index and ''r8'' is the address at which the array starts). It amounts to the C statement ```t[l]`'' that computes the pivot.The rightmost column shows two possible execution paths, with a precise estimation of their cost (here 6 and 21 CPU cycles plus the cost of function calls) and the algebraic conditions characterizing these paths.More precisely
* The CerCo compiler avoids intra-procedural optimisations and loop optimisations that may change the number of iterations performed in a non trivial way.
* Some intra-procedural or loop optimisations (like the `while` to `repeat` pre-hoisting optimisation applied by gcc in Fig.�[[#runs|5]]) can be allowed, provided that the compiler records them precisely.
* Once the assembly code is produced, the assembly-level control flow graph is analysed in order to compute the cost of execution paths. Fig.�[[#runs|5]] shows two of them in the rightmost column; the analysis of the `while` loop has been omitted, but is similar.
||[[Image(img20.png)]] ||
||'''Figure 4:''' Cost annotated C code (generated by the compiler) ||
|| [[Image(assembly.png,1200)]] ||
||'''Figure 5:''' Automatic inference of cost annotations from assembly code||
3. The user computes (by hand or semi-automatically) the complexity invariants of each cycle and (recursive) function, and he adds them to the C code as special comments[[footnode.html#foot404|^4^]] (Fig.6).
||[[Image(img29.png)]]||
|| '''Figure 6:''' Invariants annotated C code. The invariants are user provided.||
The quicksort complexity invariant states the maximum number of clock cycles required by its execution on an array delimited by ''l'' and ''r''. Since the procedure can be called with wrong indices (''r