# Changes between Version 1 and Version 2 of Proposal1.2

Timestamp: Jun 15, 2010, 1:18:56 PM
== Progress beyond state of the art ==

Automatic verification of properties of software components has reached a level of maturity that allows complete correctness proofs of entire compilers, that is, proofs of the semantic equivalence between the generated assembly code and its source program. For instance, in the framework of the \href{http://www.info.uni-karlsruhe.de/publications.php/id=213}{Verifix Project}, a compiler from a subset of Common Lisp to Transputer code was formally checked in PVS (see \cite{Dold}). \cite{Strecker} and \cite{Klein} certified bytecode compilers from a subset of Java to a subset of the Java Virtual Machine in Isabelle. In the same system, \cite{Leinenbach} formally verified a compiler from a subset of C to DLX assembly code. \cite{Chlipala} recently wrote in Coq a certified compiler from the simply-typed lambda calculus to assembly language, making extensive use of dependent types. Perhaps the most advanced project is \href{http://pauillac.inria.fr/~xleroy/compcert/}{Compcert}, headed by Leroy, which uses Coq both for programming the compiler and for proving its correctness. In particular, both the back-end (\cite{Leroy06}) and the front-end (\cite{Leroy08}) of an optimising compiler from a subset of C to PowerPC assembly have been certified in this way.

However, very little is known about the preservation of intensional properties of programs, and in particular about their (concrete) complexity. The theoretical study of the complexity impact of program transformations between different computational models has so far been confined to very foundational devices. Here we propose to address a concrete case of compilation from a typical high-level language to assembly. It is worth remarking that transformations with only constant overhead are unlikely to exist even between foundational models: for instance, coding a multitape Turing machine into a single-tape one can introduce a polynomial slow-down. Thus, complexity is architecture dependent, and the claim that one may pass from one language to another while preserving the performance of one's algorithms must be taken with due caution. In particular, as surprising as it may be, very little is known about the complexity behaviour of compiled code with respect to its source; as a matter of fact, most industries producing robots or devices with strong temporal constraints (such as, e.g.\ photoelectric safety barriers) still program such devices in assembly.
The tacit assumption that the complexity of algorithms is preserved by compilation, while plausible under suitable assumptions, is not supported by any practical or theoretical study. For instance, a single register is usually used to point to activation records, implicitly delimiting their number; more registers may be devoted to this purpose, but unless their number is fixed a priori (hence fixing the size of the stack), data in activation records cannot be expected to be accessible in constant time. In particular, the memory model adopted by Leroy assumes an infinite memory, where allocation requests always succeed; this clearly conflicts with the reality of embedded software, where one has to work within precise (often relatively small) memory bounds. If, on the one hand, working in restricted space allows us to properly weigh memory access as a unit cost, on the other hand it introduces a subtle interplay between space complexity, time complexity and correctness that will be one of the crucial issues of the project.

Even admitting (as we hope to prove) that in a confined universe we may actually preserve complexity, the main interest of the project is in producing a (certified) computational cost (in terms of clock cycles) for all instruction slices of the source program with O(1) complexity, thus providing precise values for all constants appearing in the cost function for the source. This opens the possibility of computing time constraints for executable code by reasoning directly on the high-level input language. In particular, we are not aiming to help analyse the complexity (or termination) of programs (which has to be guessed by the user, just as invariants are guessed in axiomatic semantics), but we shall build the necessary infrastructure to reflect a high-level, abstract complexity analysis of the source on a concrete instantiation of its target code.
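As a purely illustrative instance of what "precise values for all constants" means, consider a loop executed $n$ times whose set-up path costs $c_0$ clock cycles and whose iterations each cost at most $c_1$ clock cycles. The symbols $c_0$, $c_1$, $n$ and the numeric values below are assumptions made for this sketch; they are not taken from the project.

```latex
\[
  T(n) \;\le\; c_0 + c_1\, n .
\]
% Asymptotic analysis alone only gives T(n) = O(n).
% With certified constants, say c_0 = 6 and c_1 = 5 (hypothetical values),
% the bound becomes T(n) <= 6 + 5n, i.e. at most 506 clock cycles for n = 100.
```

The shape of the bound comes from the user's analysis of the source; only the constants would come from the compiler.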
Such instantiation depends on the target architecture: for instance, some microcontrollers lack multiplication as a primitive instruction, so that such an operation cannot be counted with a fixed cost. Moreover, if we are interested in a really tight complexity measure, we cannot expect to have a uniform cost for input instructions since, due to register allocation and optimisations, their actual cost depends on their surrounding context. In other words, we have to face the non-compositional nature (in terms of the source structure) of most compiler techniques and optimizations.

The Compcert project represents the current baseline for any future work on compiler certification, including the work we plan to do. We will improve on Compcert in two directions: by assuming a formal model where resources (memory) are constrained; and by preserving the complexity of O(1) operations, also tracing the way they are mapped to assembly in order to reflect actual computational costs on the source code. Both improvements greatly increase the exploitation potential, in particular in the domain of embedded systems and real-time computation.

== The !CerCo approach ==

The complexity of a program only depends on its control-flow structure, and in particular on its cycles (procedure calls are just a special case of cycles). Proving that a compiler preserves complexity amounts to proving that it preserves (up to local modifications, like loop unrolling, in-line expansion, etc.) the control-flow structure of the source\footnote{This requires, in turn, the preservation of semantics: the right conditions must be tested, and procedures must be called with the correct parameters.} and, less trivially, that all other instructions are compiled into assembly code whose execution takes a bounded number of clock cycles (i.e. with O(1) complexity). The interest of the project lies in the possibility of computing these costs directly on the target code and then referring them back to the source program, which makes it possible to make precise and trusted temporal assertions about execution by reasoning on the source code.

As already mentioned, the main problem in the backward translation of costs from target code to source code is the fact that, apart from the overall control-flow structure, all remaining structure of the input is usually irremediably lost during compilation: optimizations can move instructions around, change the order of jumps, and in general perform operations that are far from compositional w.r.t.\ the high-level syntactic structure of the input program. So there is no hope of computing costs on an instruction-by-instruction basis of the source language, since the actual cost of the executable is not compositional in these figures. We have to find another, possibly coarser, level of granularity where the source can be sensibly annotated by target costs.

We regard a C program as a collection of mutually defined procedures. The flow inside each procedure is determined by branching instructions like if-then-else; \texttt{while} loops can be regarded as a special kind of tail-recursive procedure. The resulting flow can thus be represented as a directed acyclic graph (DAG). We call a path of the directed acyclic graph an {\em execution path}.
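The notion of execution path can be made concrete with a small C sketch. The `flow_dag` type, the node numbering and the function names are hypothetical, introduced only for illustration; the example graph has the shape of a procedure body guarded by a single conditional (entry test, guarded block, exit).

```c
#include <assert.h>

#define MAX_NODES 8

/* Hypothetical representation of the flow inside one procedure:
   edge[i][j] != 0 means that control can flow from node i to node j. */
typedef struct {
    int n;                          /* number of nodes actually used */
    int edge[MAX_NODES][MAX_NODES]; /* adjacency matrix */
} flow_dag;

/* Count the execution paths leading from node `from` to node `to` by
   depth-first search. Termination relies on the graph being acyclic. */
int count_paths(const flow_dag *g, int from, int to)
{
    assert(from >= 0 && from < g->n);
    if (from == to)
        return 1;
    int total = 0;
    for (int next = 0; next < g->n; next++)
        if (g->edge[from][next])
            total += count_paths(g, next, to);
    return total;
}

/* Body guarded by one conditional:
   node 0 = entry test, node 1 = guarded block, node 2 = exit. */
flow_dag guarded_body_dag(void)
{
    flow_dag g = {0};
    g.n = 3;
    g.edge[0][1] = 1; /* test succeeds: run the guarded block */
    g.edge[0][2] = 1; /* test fails: leave immediately */
    g.edge[1][2] = 1; /* guarded block falls through to the exit */
    return g;
}
```

For the graph built by `guarded_body_dag()`, `count_paths(&g, 0, 2)` returns 2: one path through the guarded block and one trivial path. Loops and calls do not appear as edges here because, as stated above, they are treated as separate procedures.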
\begin{center}
\begin{figure}[htb]
\begin{verbatim}
void quicksort(int t[], int l, int r) {
  if (l < r) {
    int v = t[l];
    int m = l;
    int i = l + 1;
    while (i <= r) {
      if (t[i] < v) { m++; swap(t, i, m); }
      i++;
    }
    swap(t, l, m);
    quicksort(t, l, m - 1);
    quicksort(t, m + 1, r);
  }
}
\end{verbatim}
\caption{\label{code}Quicksort}
\end{figure}
\end{center}

As a simple example, consider the quicksort program of Fig.~\ref{code}. This algorithm performs in-place sorting of the input array $t$ whose bounds are $l$ and $r$; initially $l$ is expected to be zero, while $r$ is the length of the array minus one. The outermost conditional makes the procedure terminate when the bounds of the array are illegal. (Sorting an empty array will end the recursive behaviour of the algorithm.) The variable $v$ is the so-called pivot: a selected element of the array that will be compared with all other elements. Bigger elements will be moved (by the swap function) to the end of the array (the upper part), while smaller elements are placed at the beginning of the array (the lower part). Then the pivot is placed between the lower and the upper part of the array, in position $m$, its position in the resulting sorted array; all elements before the pivot are smaller and all elements following it are bigger. The algorithm completes the sorting with recursive calls on the lower and on the upper parts of the array.

In the body of the \verb+quicksort+ procedure there are only two execution paths, corresponding to the two cases $l < r$ and $l \ge r$. The latter is a trivial path, immediately leading to termination. The former leads to the while loop (which is regarded as a procedure call), the call to swap, and the two recursive calls. Similarly, the body of the while loop is composed of two paths, corresponding to the two conditions $i \le r$ and $i > r$.

All operations performed along any of these paths take some constant time $c$. The complexity magnitude of the program only depends on the loops and recursive calls met along its execution paths, but not on their associated constants. On the other hand, if we want to give tight performance bounds on the execution time, we have to compute the real constants on the executable.

The compiler must be able to return a set of pairs $(p_i,c_i)$, where each $p_i$ is an execution path, and $c_i$ is its actual cost\footnote{A more flexible result would consist in returning pairs $(p_i,a_i)$ where $a_i$ is the sequence of assembly instructions corresponding to $p_i$; this would allow space to be taken into consideration, as well as time.}. It is then up to the user to guess (and to prove, assisted by interactive tools) the complexity of the program: the compiler only provides the infrastructure required to map a complexity analysis on the source into a faithful analogue on the target. This approach looks compatible with most local optimizations. Moreover, since we work on a cycle-by-cycle (procedure-by-procedure) basis, the approach should scale up well.
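The idea of attaching a cost to each execution path can be sketched as an instrumented version of the quicksort procedure, in which every path adds its constant to a running counter. This is only an illustration under stated assumptions: the counter name `cost_cycles` and the enum names are hypothetical, the four constants are the illustrative values that appear in this document's cost-annotated and execution-path figures, and the cost of `swap` and of the calls themselves is deliberately not charged.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative per-path constants (in clock cycles), copied from the
   figures of this document; a real compiler run would supply its own. */
enum {
    COST_BODY_RECURSE = 21, /* quicksort body, path l < r  */
    COST_BODY_RETURN  = 6,  /* quicksort body, path l >= r */
    COST_LOOP_SWAP    = 4,  /* one loop iteration, t[i] < v  */
    COST_LOOP_SKIP    = 5   /* one loop iteration, t[i] >= v */
};

/* Running total of the cycles charged so far (hypothetical name). */
unsigned long cost_cycles = 0;

/* Exchange t[a] and t[b]; its own cost is not modelled in this sketch. */
void swap(int t[], int a, int b)
{
    int tmp = t[a];
    t[a] = t[b];
    t[b] = tmp;
}

/* Same algorithm as the quicksort figure, with one counter update per
   execution path of the body and per iteration of the loop. */
void quicksort(int t[], int l, int r)
{
    assert(t != NULL);
    if (l < r) {
        cost_cycles += COST_BODY_RECURSE;
        int v = t[l];
        int m = l;
        int i = l + 1;
        while (i <= r) {
            if (t[i] < v) {
                cost_cycles += COST_LOOP_SWAP;
                m++;
                swap(t, i, m);
            } else {
                cost_cycles += COST_LOOP_SKIP;
            }
            i++;
        }
        swap(t, l, m);
        quicksort(t, l, m - 1);
        quicksort(t, m + 1, r);
    } else {
        cost_cycles += COST_BODY_RETURN;
    }
}
```

Sorting the two-element array `{2, 1}` with `quicksort(a, 0, 1)` charges 21 cycles for the outer body, 4 for the single loop iteration and 6 for each of the two trivial recursive calls, so `cost_cycles` ends at 37. A user-supplied complexity invariant would bound this counter as a function of `l` and `r`.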
== User interaction flow ==

The left part of Fig.~\ref{interaction} shows the interaction diagram for the final user of the system (the right part is a planned case study for a possible automation of the process and will be discussed later).

\begin{figure}[tb]
\begin{center}
\includegraphics[width=0.40\textwidth]{interaction_diagram}
\hspace{0.5cm}
\vrule
\hspace{1cm}
\includegraphics[width=0.45\textwidth]{case_study_diagram}
\caption{\label{interaction}Interaction and automation diagrams}
\end{center}
\end{figure}

The interaction is done in several steps. We illustrate them using the quicksort program above.

\begin{enumerate}
\item The user writes her code in C~(see Fig.~\ref{code}) and compiles it with our CerCo (Certified Complexity) compiler.
\item The CerCo compiler outputs both the object code and an annotated copy of the C source (Fig.~\ref{annotated}).

\begin{figure}[tb]
\begin{quote}
\begin{verbatim}
void quick_rec(int t[], int l, int r) {
  /* Cost annotation for quick_rec body (1 cycle only)
     @ if (l < r) time += 21; if (l >= r) time += 6; */
  if (l < r) {
    int v = t[l];
    int m = l;
    int i = l + 1;
    while (i <= r) {
      /* Cost annotation for while (1 cycle only)
         @ if (t[i] < v) time += 4; else time += 5; */
      if (t[i] < v) { m++; swap(t, i, m); }
      i++;
    }
    swap(t, l, m);
    quick_rec(t, l, m - 1);
    quick_rec(t, m + 1, r);
  }
}
\end{verbatim}
\end{quote}
\caption{\label{annotated}Cost annotated C code (generated by the compiler)}
\end{figure}
\begin{figure}[p] \begin{center} \begin{tabular}{ccc} \begin{minipage}[t]{.3\textwidth} \textbf{C source}\\~\\ \texttt{\scriptsize{ \colorbox{cola}{void quicksort(t,l,r) \{}\\ \colorbox{colb}{\ \ if (l {\textless} r) \{}\\ \colorbox{colc}{\ \ \ \ int i = l + 1;}\\ \colorbox{cold}{\ \ \ \ int m = l;}\\ \colorbox{cole}{\ \ \ \ int v = t[l];}\\ \colorbox{colf}{\ \ \ \ while (i {\textless}= r) \{}\\ \colorbox{colg}{\ \ \ \ \ \ if (t[i] {\textless} v) \{ }\\ \colorbox{colh}{\ \ \ \ \ \ \ \ m++; }\\ \colorbox{coli}{\ \ \ \ \ \ \ \ swap(t, i, m); \}}\\ \colorbox{colj}{\ \ \ \ \ \ i++;\}}\\ \colorbox{coll}{\ \ \ \ swap(t, l, m);}\\ \colorbox{colm}{\ \ \ \ quicksort(t, l, m {}- 1);}\\ \colorbox{coln}{\ \ \ \ quicksort(t, m + 1, r);}\\ \colorbox{cola}{\}}\\ }} \end{minipage} & \begin{minipage}[t]{.3\textwidth} \textbf{Pseudo-Assembly code}\\~\\ \texttt{\scriptsize{ \colorbox{cola}{24: r <- r3}\\ \colorbox{cola}{29: l <- r2}\\ \colorbox{colb}{34: cmp l r}\\ \colorbox{cola}{36: t <- r1}\\ \colorbox{colb}{3a: jump c4 if l >= r}\\ \colorbox{colc}{40: i <- l + 1}\\ \colorbox{cole}{44: r8 <- t}\\ \colorbox{cole}{48: r7 <- l}\\ \colorbox{cold}{4b: m <- l} \vspace{0.4em}\\ \framebox[0.7\linewidth]{ \begin{minipage}[t]{0.6\textwidth} 1. The user writes her code in C�(see Fig.�[[node6.html#code|2]]) and compiles it with our CerCo (Certified Complexity) compiler. 1. The CerCo compiler outputs both the object code and an annotated copy of the C source (Fig.�[[#annotated|4]]). 
'''Figure 4:''' Cost annotated C code (generated by the compiler)||  || '''Figure 5:''' Automatic inference of cost annotations from assembly code|| || || C source[[BR]]� [[BR]]~- void quicksort(t,l,r) {[[BR]] if (l < r) {[[BR]] int i = l + 1;[[BR]] int m = l;[[BR]] int v = t[l];[[BR]] while (i <= r) {[[BR]] if (t[i] < v) { [[BR]] m++; [[BR]] swap(t, i, m); }[[BR]] i++;}[[BR]] swap(t, l, m);[[BR]] quicksort(t, l, m - 1);[[BR]] quicksort(t, m + 1, r);[[BR]]}[[BR]]-~ || || || Pseudo-Assembly code[[BR]]� [[BR]]~- 24: r <- r3[[BR]]29: l <- r2[[BR]]34: cmp l r[[BR]]36: t <- r1[[BR]]3a: jump c4 if l >= r[[BR]]40: i <- l + 1[[BR]]44: r8 <- t[[BR]]48: r7 <- l[[BR]]4b: m <- l[[BR]][[BR]]img20.png[[BR]][[BR]]97: r1 <- t[[BR]]9b: r3 <- m[[BR]]9e: r2 <- l[[BR]]a1: call swap[[BR]]a6: r1 <- t[[BR]]aa: r3 <- m - 0x1[[BR]]af: r2 <- l[[BR]]b2: call quicksort[[BR]]bc: l <- r6[[BR]]bf: call quicksort[[BR]]c4: ret -~ || || || Execution Paths[[BR]]� [[BR]]~- l >= r img21.png [[BR]]24: r <- r3[[BR]]29: l <- r2[[BR]]34: cmp l r[[BR]]36: t <- r1[[BR]]3a: jump c4 if l >= r[[BR]]c4: ret[[BR]]�[[BR]]total: 6 clock cycles[[BR]]�[[BR]]l < r img21.png [[BR]]24: r <- r3[[BR]]29: l <- r2[[BR]]34: cmp l r[[BR]]36: t <- r1[[BR]]3a: jump c4 if l >= r[[BR]]40: i <- l + 1[[BR]]44: r8 <- t[[BR]]48: r7 <- l[[BR]]4b: m <- l[[BR]]while loop[[BR]]97: r1 <- t[[BR]]9b: r3 <- m[[BR]]9e: r2 <- l[[BR]]a1: call swap [[BR]]swap [[BR]]a6: r1 <- t[[BR]]aa: r3 <- m - 0x1[[BR]]af: r2 <- l[[BR]]b2: call quicksort[[BR]]quicksort[[BR]]bc: l <- r6[[BR]]bf: call quicksort[[BR]]quicksort[[BR]]c4: ret[[BR]]�[[BR]]total: 21 clock cycles + function calls -~ || || || Each loop and function body is annotated with the cost of one iteration, along all its possible execution paths. The cost is expressed as a function of the state of the program, which comprises the value of every variable. (In the example, we use a textual annotation for simplicity, but we expect to produce a more structured output.) 
The leftmost column of Fig.�[[#runs|5]] shows the original source code. Colours are used to relate source code statements with their respective (human readable) assembly instructions, reported in the central column. That assembly was produced by gcc[[footnode.html#foot390|^3^]] with a moderate level of optimizations for an Intel 386 family microprocessor. We used img4.png, img5.png, img3.png, img6.png and img7.png to mention locations (registers and/or stack frame temporaries) in which the corresponding C variables are placed, while img22.png are other register or temporaries that have no direct mapping to C. The calling convention puts the first three parameters in img23.png, img24.png and img25.png, and it is up to the callee to eventually store them in local temporaries. Assignment is denoted with <-, addition and multiplication with + and *; the jump instruction is followed by the target address, and when the jump is conditional a C like expression follows (but its evaluation is performed early by the cmp instruction, that sets a CPU flag recording the result of the comparison). The only tricky expression is *(r8 + r7 * 4)'', that exploits an advanced addressing mechanism corresponding to array indexing (4 is the size of an array cell in bytes, img26.png is the index and img27.png is the address at which the array starts). It amounts to the C statement t[l]'' that computes the pivot.The rightmost column shows two possible execution paths, with a precise estimation of their cost (here 6 and 21 CPU cycles plus the cost of function calls) and the algebraic conditions characterizing these paths.More precisely * The CerCo compiler avoids intra-procedural optimisations and loop optimisations that may change the number of iterations performed in a non trivial way. * Some intra-procedural or loop optimisations (like the while to repeat pre-hoisting optimisation applied by gcc in Fig.�[[#runs|5]]) can be allowed, provided that the compiler records them precisely. 
* Once the assembly code is produced, the assembly-level control flow graph is analysed in order to compute the cost of execution paths. Fig.�[[#runs|5]] shows two of them in the rightmost column; the analysis of the while loop has been omitted, but is similar. 1. The user computes (by hand or semi-automatically) the complexity invariants of each cycle and (recursive) function, and he adds them to the C code as special comments[[footnode.html#foot404|^4^]] (Fig.�[[#invariants|6]]). '''Figure 6:''' Invariants annotated C code. The invariants are user provided.|| img28.png || '''Figure 7:''' Complexity obligations (automatically generated). The user should prove every complexity obligation.|| img29.png || The quicksort complexity invariant states the maximum number of clock cycles required by its execution on an array delimited by img4.png and img5.png. Since the procedure can be called with wrong indices (img30.png) the formula has to take into account that the img31.png could be negative using the img32.png function to raise that difference to zero when needed. The literature suggests that this quicksort implementation (where the pivot img6.png is chosen deterministically) has a quadratic complexity in the worst case. Cleaning up the formula from multiplicative and additive constants one obtains the expected asymptotic complexity img33.png. The coefficients are those returned by the cost-annotating compiler. Similarly, the user has to give a complexity invariant for the inner while loop. 1. The user and compiler annotated C code is fed into an already existing tool (in this example, Caduceus, [[[node60.html#caduceus|Filli�tre and March�]]]) that produces one complexity obligation for each execution path (Fig.�[[#obligations|7]]). 1. The user should prove all complexity obligations. The proofs are the certificate that the user provided complexity invariant is correct. 
In many cases, the obligations can be proved automatically using a general purpose automatic theorem prover or an ad-hoc procedure. For instance, to prove the complexity obligations of Fig.�[[#obligations|7]], we must show that a system of inequations holds, which may be done automatically. When an automatic proof is not possible, the user can resort to an interactive proof. \colorbox{white}{while loop}\\ \tiny{ \colorbox{colf}{4e: cmp i r}\\ \colorbox{cole}{53: v <- *(r8 + r7 * 4) }\\ \colorbox{colf}{57: jump 97 if i > r}\\ \colorbox{colg}{59: r7 <- i}\\ \colorbox{colg}{5c: r9 <- r8 + r7 * 4}\\ \colorbox{colf}{60: jump 6e}\\ \colorbox{colj}{62: i <- i + 0x1}\\ \colorbox{colg}{65: r9 <- r9 + 0x4}\\ \colorbox{colf}{69: cmp i r}\\ \colorbox{colf}{6c: jump 92 if i > r}\\ \colorbox{colg}{6e: cmp v *r9}\\ \colorbox{colg}{72: jump 62 if v <= r9}\\ \colorbox{coli}{74: r1 <- t}\\ \colorbox{colh}{78: m <- m + 0x1}\\ \colorbox{coli}{7c: r2 <- i}\\ \colorbox{coli}{7e: r3 <- r12d}\\ \colorbox{colj}{81: i <- i + 0x1}\\ \colorbox{colg}{84: r9 <- r9 + 0x4}\\ \colorbox{coli}{88: call swap}\\ \colorbox{colf}{8d: cmp i r}\\ \colorbox{colf}{90: jump 6e if i <= r}\\ \colorbox{coln}{92: r6 <- m + 0x1}\\ } \end{minipage} } \vspace{0.4em} \\ The right part of Fig.�[[#interaction|3]] describes a planned case study for the automation of the complexity proof. We start with a synchronous program which is compiled to C code. The CerCo compiler then produces suitable cost annotations which are used by an invariant synthesizer to build complexity assertions on the C code. The synthesizer can take advantage of the high level control flow information contained in the source synchronous program. The deductive platform (Caduceus) generates complexity obligations which are passed to an ad-hoc proof generator to produce a machine-checked proof from which we can extract a certified bound on the reaction time of the original synchronous program. 
The proof generator can also take advantage of the high level information coming from the original source program, and user interaction can be used to drive the generator in critical cases. \colorbox{coll}{97: r1 <- t}\\ \colorbox{coll}{9b: r3 <- m}\\ \colorbox{coll}{9e: r2 <- l}\\ \colorbox{coll}{a1: call swap}\\ \colorbox{colm}{a6: r1 <- t}\\ \colorbox{colm}{aa: r3 <- m {}- 0x1}\\ \colorbox{colm}{af: r2 <- l}\\ \colorbox{colm}{b2: call quicksort}\\ \colorbox{coln}{bc: l <- r6}\\ \colorbox{coln}{bf: call quicksort}\\ \colorbox{cola}{c4: ret} }} \end{minipage} & \begin{minipage}[t]{.3\textwidth} \textbf{Execution Paths}\\~\\ \texttt{\scriptsize{ \colorbox{white}{l >= r $\rightarrow$ }\\ \colorbox{cola}{24: r <- r3}\\ \colorbox{cola}{29: l <- r2}\\ \colorbox{colb}{34: cmp l r}\\ \colorbox{cola}{36: t <- r1}\\ \colorbox{colb}{3a: jump c4 if l >= r}\\ \colorbox{cola}{c4: ret}\\ \colorbox{white}{~}\\ \colorbox{white}{total: 6 clock cycles}\\ \colorbox{white}{~}\\ \colorbox{white}{l < r $\rightarrow$ }\\ \colorbox{cola}{24: r <- r3}\\ \colorbox{cola}{29: l <- r2}\\ \colorbox{colb}{34: cmp l r}\\ \colorbox{cola}{36: t <- r1}\\ \colorbox{colb}{3a: jump c4 if l >= r}\\ \colorbox{colc}{40: i <- l + 1}\\ \colorbox{cole}{44: r8 <- t}\\ \colorbox{cole}{48: r7 <- l}\\ \colorbox{cold}{4b: m <- l}\\ \colorbox{white}{while loop}\\ [[BR]] \colorbox{coll}{97: r1 <- t}\\ \colorbox{coll}{9b: r3 <- m}\\ \colorbox{coll}{9e: r2 <- l}\\ \colorbox{coll}{a1: call swap }\\ \colorbox{white}{swap }\\ \colorbox{colm}{a6: r1 <- t}\\ \colorbox{colm}{aa: r3 <- m {}- 0x1}\\ \colorbox{colm}{af: r2 <- l}\\ \colorbox{colm}{b2: call quicksort}\\ \colorbox{white}{quicksort}\\ \colorbox{coln}{bc: l <- r6}\\ \colorbox{coln}{bf: call quicksort}\\ \colorbox{white}{quicksort}\\ \colorbox{cola}{c4: ret}\\ \colorbox{white}{~}\\ \colorbox{white}{total: 21 clock cycles + function calls} }} \end{minipage} \end{tabular} \end{center} \caption{\label{runs}Automatic inference of cost annotations from assembly code} 
\end{figure} Each loop and function body is annotated with the cost of one iteration, along all its possible execution paths. The cost is expressed as a function of the state of the program, which comprises the value of every variable.  (In the example, we use a textual annotation for simplicity, but we expect to produce a more structured output.) === Certification: tools and techniques === In order to trust the process described in the previous section, we need to trust the CerCo compiler. I.e. we need to fulfil the following requirements: The leftmost column of Fig.~\ref{runs} shows the original source code.  Colours are used to relate source code statements with their respective (human readable) assembly instructions, reported in the central column.  That assembly was produced by gcc\footnote{GNU compiler collection, version 4.2} with a moderate level of optimizations for an Intel 386 family microprocessor.  We used $l$, $r$, $t$, $v$ and $m$ to mention locations (registers and/or stack frame temporaries) in which the corresponding C variables are placed, while $r1, \ldots, r9$ are other registers or temporaries that have no direct mapping to C. The calling convention puts the first three parameters in $r1$, $r2$ and $r3$, and it is up to the callee to eventually store them in local temporaries.  Assignment is denoted with \texttt{<-}, addition and multiplication with \texttt{+} and \texttt{*}; the jump instruction is followed by the target address, and when the jump is conditional a C-like expression follows (but its evaluation is performed early by the \texttt{cmp} instruction, which sets a CPU flag recording the result of the comparison). The only tricky expression is ``\texttt{*(r8 + r7 * 4)}'', which exploits an advanced addressing mechanism corresponding to array indexing (4 is the size of an array cell in bytes, $r7$ is the index and $r8$ is the address at which the array starts). It amounts to the C statement ``\texttt{t[l]}'' that computes the pivot. 1. 
the compiled assembly program respects the semantics of the C program 1. we need to know that the number of iterations performed by the C and assembly programs is the same, i.e. to prove that the compiler preserves the complexity 1. we need to prove that the cost annotations generated by the compiler really correspond to the number of clock cycles spent by the hardware The rightmost column shows two possible execution paths, with a precise estimation of their cost (here 6 and 21 CPU cycles plus the cost of function calls) and the algebraic conditions characterizing these paths. For this reason, we plan to[[footnode.html#foot438|^5^]]: 1. develop an untrusted CerCo compiler prototype in a high-level programming language; 1. provide an executable formal specification of the target microprocessor; 1. provide an executable formal specification of the C language; 1. develop an executable version of the CerCo compiler in a language suitable for formal correctness proofs; 1. give a machine-checkable proof that the latter implementation satisfies all the requirements mentioned above. More precisely \begin{itemize} \item The CerCo compiler avoids intra-procedural optimisations and loop optimisations that may change the number of iterations performed in a non-trivial way. \item Some intra-procedural or loop optimisations (like the \texttt{while} to \texttt{repeat} pre-hoisting optimisation applied by gcc in Fig.~\ref{runs}) can be allowed, provided that the compiler records them precisely. \item Once the assembly code is produced, the assembly-level control flow graph is analysed in order to compute the cost of execution paths. Fig.~\ref{runs} shows two of them in the rightmost column; the analysis  of the \texttt{while} loop has been omitted, but is similar. 
\end{itemize} \item The user computes (by hand or semi-automatically) the complexity invariants of each cycle and (recursive) function, and he adds them to the C code as special comments\footnote{Again, more interactive forms of annotations can be considered.} (Fig.~\ref{invariants}). \begin{figure}[p] \begin{quote} \begin{verbatim} The untrusted compiler will be written in the http://caml.inria.fr/OCaml programming language, developed and distributed by INRIA, France's national research institute for computer science. OCaml is a general-purpose programming language, especially suited for symbolic manipulation of tree-like data structures, of the kind typically used during compilation. It is a simple and efficient language, designed for program safety and reliability and particularly suited for rapid prototyping. /* User provided invariant for the quicksort function. @ ensures (time <= \old(time) + max(0,((1+r)-l))*(max(0,(r-l))*5+21) + 6) */ void quick_rec(int t[], int l, int r) { /* Cost annotation for quick_rec body (1 cycle only) @ if (l=r) time += 6; */ if (l < r) { int v = t[l]; int m = l; int i = l + 1; /* @ label L User provided invariant for the while loop. To perform i iterations we need at most (i-l)*5 clock cycles. @ invariant time <= \at(time,L) + (i-l) * 5 && Additional invariants of the loop @ l <= m && l+1 <= i && m <= r && i <= r+1 && m < i && l < r */ while (i <= r) { /* Cost annotation for while (1 cycle only) @ if (t[i] < v) time += 4; else time += 5; */ if (t[i] < v) { m++; swap(t, i, m); } i++; } swap(t, l, m); quick_rec(t, l, m - 1); quick_rec(t, m + 1, r); } } \end{verbatim} \end{quote} \vspace{-15pt} \caption{\label{invariants}Invariants annotated C code. 
The invariants are user provided.} \end{figure} \begin{figure}[p] \begin{center} \framebox[0.9\linewidth]{ \begin{minipage}[t]{0.8\linewidth} \vspace{0.5em} For the certification of the compiler we plan to use the [http://matita.cs.unibo.it Matita] Interactive Theorem Prover, developed at the Computer Science Department of the University of Bologna. Matita is based on the Calculus of Inductive Constructions, the same foundational paradigm as INRIA's Coq system, and it is partially compatible with it. It adopts a tactic-based editing mode. Proof objects (XML-encoded) are produced for storage and exchange. Its graphical interface, inspired by CtCoq and Proof General, supports high-quality bidimensional rendering of proofs and formulae, transformed on-the-fly to MathML markup. In spite of its young age it has already been used for complex formalizations, including non-trivial results in Number Theory and problems from the Poplmark challenge [[[node60.html#poplmark|POPLmark]]]. An executable specification for all models of Freescale 8-bit ALUs (Families HC05/HC08/RS08/HCS08) and memories (RAM, ROM, Flash) has also already been formalised in Matita. \emph{First complexity obligation}: case where $l < r$. 
The number of clock cycles spent in one iteration of the recursive function must be greater than or equal to the sum of the number of clock cycles spent in the function body, in the \texttt{while} loop and in the two recursive calls: $$\begin{array}{l} \forall i,l,m,r.~ l < r \land l+1 \leq r \land r < i \leq r+1 \land m \leq r \land l+1 \leq i \land l \leq m \Rightarrow\\ (i - l) * 5 + 1 + \\ max(0,1 + r - (m + 1)) * (max(0,r - (m + 1)) * 5 + 21) + 6 + \\ max(0,1 + (m - 1) - l) * (max(0,m - 1 - l) * 5 + 21) + \\ 6 \\ \leq max(0,1 + r - l) * (max(0,r - l) * 5 + 21) + 6 \end{array}$$ For the management of cost annotations and proof obligation synthesis we plan to interface with the [http://caduceus.lri.fr/ Caduceus] verification tool for C programs, developed by the Computer Science Laboratory of the University of Paris-Sud. Caduceus is built on top of Why, a general-purpose verification condition generator, exploiting Dijkstra's weakest precondition calculus. The Why tool allows the declaration of logical models (types, functions, predicates and axioms) that can be used in programs and annotations; moreover, it can be interfaced to a wide set of existing provers for verification of the resulting conditions. In particular, we will rely on [http://alt-ergo.lri.fr Alt-Ergo], which includes a decision procedure for linear arithmetic. %\item \emph{Second complexity obligation}: case $l \leq r$. The number of clock cycles spent in one iteration must be greater than the number of clock cycles required when the \texttt{if} statement fails. $$l \leq r \Rightarrow 6 \leq max(0,1 + r - l) * (max(0,r - l) * 5 + 21) + 6$$ We plan to support almost every ANSI C construct (functions, pointers, arrays and structures) and data-types (integers of various sizes) except function pointers, explicit jumps (goto) and pointer aliasing or casting. 
These features do not seem to pose major additional challenges to the technology we plan to develop, but could be time-consuming to implement, formalize and prove correct. Moreover, they are not currently supported by Caduceus, posing additional problems for the development of the proof-of-concept prototype of the whole application. We could also support floating point numbers, but the kind of micro-controller we are targeting (8- and 16-bit processors) does not usually provide instructions to efficiently process them. Moreover, floating point numbers are seldom used in embedded software for that very reason, making them a feature of ANSI C of limited interest in our scenario. %\end{itemize} A few other complexity obligations are automatically generated but, being trivial, are automatically proved by the system.\\ \end{minipage} } \caption{\label{obligations}Complexity obligations (automatically generated). The user should prove every complexity obligation.} \end{center} \end{figure} \noindent The quicksort complexity invariant states the maximum number of clock cycles required by its execution on an array delimited by $l$ and $r$. Since the procedure can be called with wrong indices We stress that the proposed approach handles cost annotations for C programs as a special case of standard annotations for imperative programs whose management we plan to automate with tools such as Caduceus. As explained above, the choice of the tool does have an impact on the fragment of ANSI C constructs we will handle, and future advances in this domain could enlarge the fragment of ANSI C under consideration. On the other hand, tools like Caduceus pose no limitation on the invariants, which can be freely described in a computational and very expressive logic. Hence, every technique to automatically infer invariants and cost annotations can be exploited (and often automated) in Caduceus. 
($r < l$) the formula has to take into account that the $r-l$ could be negative using the $max$ function to raise that difference to zero when needed. The literature suggests that this quicksort implementation (where the pivot $v$ is chosen deterministically) has a quadratic complexity in the worst case. Cleaning up the formula from multiplicative and additive constants one obtains the expected asymptotic complexity $(r-l)^2$. The coefficients are those returned by the cost-annotating compiler. Similarly, the user has to give a complexity invariant for the inner while loop. \item The user and compiler annotated C code is fed into an already existing tool (in this example, Caduceus, \cite{caduceus}) that produces one complexity obligation for each execution path (Fig.~\ref{obligations}). \item The user should prove all complexity obligations. The proofs are the certificate that the user provided complexity invariant is correct.  In many cases, the obligations can be proved automatically using a general purpose automatic theorem prover or an ad-hoc procedure.  For instance, to prove the complexity obligations of Fig.~\ref{obligations}, we must show that a system of inequations holds, which may be done automatically. When an automatic proof is not possible, the user can resort to an interactive proof. \end{enumerate} The right part of Fig.~\ref{interaction} describes a planned case study for the automation of the complexity proof. We start with a synchronous program which is compiled to C code. The CerCo compiler then produces suitable cost annotations which are used by an invariant synthesizer to build complexity assertions on the C code. The synthesizer can take advantage of the high level control flow information contained in the source synchronous program. 
The deductive platform (Caduceus) generates complexity obligations which are passed to an ad-hoc proof generator to produce a machine-checked proof from which we can extract a certified bound on the reaction time of the original synchronous program. The proof generator can also take advantage of the high level information coming from the original source program, and user interaction can be used to drive the generator in critical cases. \subsubsection{Certification: tools and techniques} In order to trust the process described in the previous section, we need to trust the CerCo compiler. I.e.\ we need to fulfil the following requirements: \begin{enumerate} \item the compiled assembly program respects the semantics of the C program \item we need to know that the number of iterations performed by the C and assembly programs is the same, i.e. to prove that the compiler preserves the complexity \item we need to prove that the cost annotations generated by the compiler really correspond to the number of clock cycles spent by the hardware \end{enumerate} For this reason, we plan to\footnote{See next section for a more articulated description of the methodology}: \begin{enumerate} \item develop an untrusted CerCo compiler prototype in a high-level programming language; \item provide an executable formal specification of the target microprocessor; \item provide an executable formal specification of the C language; \item develop an executable version of the CerCo compiler in a language suitable for formal correctness proofs; \item give a machine-checkable proof that the latter implementation satisfies all the requirements mentioned above. \end{enumerate} The untrusted compiler will be written in the \href{http://caml.inria.fr/}{OCaml} programming language, developed and distributed by \nobreak{INRIA}, France's national research institute for computer science. 
OCaml is a general-purpose programming language, especially suited for symbolic manipulation of tree-like data structures, of the kind typically used during compilation.  It is a simple and efficient language, designed for program safety and reliability and particularly suited for rapid prototyping. For the certification of the compiler we plan to use the \href{http://matita.cs.unibo.it}{Matita} Interactive Theorem Prover, developed at the Computer Science Department of the University of Bologna. Matita is based on the Calculus of Inductive Constructions, the same foundational paradigm as INRIA's Coq system, and it is partially compatible with it. It adopts a tactic-based editing mode. Proof objects (XML-encoded) are produced for storage and exchange. Its graphical interface, inspired by CtCoq and Proof General, supports high-quality bidimensional rendering of proofs and formulae, transformed on-the-fly to MathML markup. In spite of its young age it has already been used for complex formalizations, including non-trivial results in Number Theory and problems from the Poplmark challenge~\cite{poplmark}. An executable specification for all models of Freescale 8-bit ALUs (Families HC05/HC08/RS08/HCS08) and memories (RAM, ROM, Flash) has also already been formalised in Matita. For the management of cost annotations and proof obligation synthesis we plan to interface with the \href{http://caduceus.lri.fr/}{Caduceus} verification tool for C programs, developed by the Computer Science Laboratory of the University of Paris-Sud. Caduceus is built on top of Why, a general-purpose verification condition generator, exploiting Dijkstra's weakest precondition calculus. The Why tool allows the declaration of logical models (types, functions, predicates and axioms) that can be used in programs and annotations; moreover, it can be interfaced to a wide set of existing provers for verification of the resulting conditions. 
In particular, we will rely on \href{http://alt-ergo.lri.fr}{Alt-Ergo}, which includes a decision procedure for linear arithmetic. We plan to support almost every ANSI C construct (functions, pointers, arrays and structures) and data-types (integers of various sizes) except function pointers, explicit jumps (goto) and pointer aliasing or casting. These features do not seem to pose major additional challenges to the technology we plan to develop, but could be time-consuming to implement, formalize and prove correct. Moreover, they are not currently supported by Caduceus, posing additional problems for the development of the proof-of-concept prototype of the whole application. We could also support floating point numbers, but the kind of micro-controller we are targeting (8- and 16-bit processors) does not usually provide instructions to efficiently process them. Moreover, floating point numbers are seldom used in embedded software for that very reason, making them a feature of ANSI C of limited interest in our scenario. We stress that the proposed approach handles cost annotations for C programs as a special case of standard annotations for imperative programs whose management we plan to automate with tools such as Caduceus. As explained above, the choice of the tool does have an impact on the fragment of ANSI C constructs we will handle, and future advances in this domain could enlarge the fragment of ANSI C under consideration. On the other hand, tools like Caduceus pose no limitation on the invariants, which can be freely described in a computational and very expressive logic. Hence, every technique to automatically infer invariants and cost annotations can be exploited (and often automated) in Caduceus. [[BR]]
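As an illustration of what the complexity obligations of the quicksort example amount to, the following C fragment is a minimal sketch that checks them by brute force on small index ranges. It is only a sanity check under stated assumptions, not a proof and not part of the planned tool chain: the function names (max0, quick_bound, first_obligation_lhs, count_first_violations, count_second_violations) and the bounded search range are hypothetical choices made for this example, while the formulas are transcribed term by term from the user-provided invariant and from the two obligations.

```c
/* max(0, x): clamps negative array lengths to zero, as in the invariant. */
long max0(long x) { return x > 0 ? x : 0; }

/* Claimed upper bound (in clock cycles) for quick_rec on t[l..r],
   transcribed from the user-provided invariant. */
long quick_bound(long l, long r) {
    return max0(1 + r - l) * (max0(r - l) * 5 + 21) + 6;
}

/* Left-hand side of the first obligation (case l < r): cost of the
   while loop and function body, plus the bounds of the two recursive
   calls quick_rec(t, m + 1, r) and quick_rec(t, l, m - 1). */
long first_obligation_lhs(long i, long l, long m, long r) {
    return (i - l) * 5 + 1
         + quick_bound(m + 1, r)
         + quick_bound(l, m - 1);
}

/* Counts violations of the first obligation for 0 <= l < r <= n.
   The hypotheses r < i <= r + 1 force i = r + 1, and l <= m <= r.
   The range [0, n] is an assumption made for this sketch. */
long count_first_violations(long n) {
    long bad = 0;
    for (long l = 0; l <= n; l++)
        for (long r = l + 1; r <= n; r++)
            for (long m = l; m <= r; m++)
                if (first_obligation_lhs(r + 1, l, m, r) > quick_bound(l, r))
                    bad++;
    return bad;
}

/* Counts violations of the second obligation, 6 <= bound, over every
   pair (l, r) in [0, n], which covers the stated hypothesis l <= r. */
long count_second_violations(long n) {
    long bad = 0;
    for (long l = 0; l <= n; l++)
        for (long r = 0; r <= n; r++)
            if (quick_bound(l, r) < 6)
                bad++;
    return bad;
}
```

Calling count_first_violations(n) and count_second_violations(n) from a main function for a small n is enough to exercise the check. A result of zero only shows that no counterexample exists in the explored range; the certificate itself still has to come from an automatic or interactive proof of the obligations.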