Changes between Version 1 and Version 2 of Proposal1.2
 Timestamp:
 Jun 15, 2010, 1:18:56 PM (8 years ago)
Legend:
 Unmodified
 Added
 Removed
 Modified

Proposal1.2
v1 v2 1 = Progress beyond state of the art=1 == [[BR]] Progress beyond state of the art == 2 2 3 Automatic verification of properties of software components has reached a level of maturity allowing complete correctness proofs of entire compilers; that is of the semantic equivalence between the generated assembly code and its source program. For instance, in the framework of the http://www.info.unikarlsruhe.de/publications.php/id=213Verifix Project a compiler from a subset of Common Lisp to Transputer code was formally checked in PVS (see [[[node60.html#DoldDold and Vialard]]]). [[[node60.html#StreckerStrecker]]] and [[[node60.html#KleinKlein and Nipkow]]] certified bytecode compilers from a subset of Java to a subset of the Java Virtual Machine in Isabelle. In the same system, [[[node60.html#LeinenbachLeinenbach et al.]]] formally verified a compiler from a subset of C to a DLX assembly code. [[[node60.html#ChlipalaChlipala]]] recently wrote in Coq a certified compiler from the simplytyped lambda calculus to assembly language, making an extensive use of dependent types. Perhaps, the most advanced project is http://pauillac.inria.fr/�xleroy/compcert/Compcert, headed by Leroy, based on the use of Coq both for programming the compiler and proving its correctness. In particular, both the backend ([[[node60.html#Leroy06Leroy]]]) and the frontend ([[[node60.html#Leroy08Leroy and Tristan]]]) of an optimising compiler from a subset of C to PowerPC assembly have been certified in this way. 3 4 4 Automatic verification of properties of software components has 5 reached a level of maturity allowing complete correctness proofs of 6 entire compilers; that is of the semantic equivalence between the 7 generated assembly code and its source program. For instance, in the 8 framework of the 9 \href{http://www.info.unikarlsruhe.de/publications.php/id=213}{Verifix Project} 10 a compiler from a subset of Common Lisp to Transputer code was formally checked 11 in PVS (see \cite{Dold}). \cite{Strecker} and \cite{Klein} 12 certified bytecode compilers from a subset of Java to a subset of the Java 13 Virtual Machine in Isabelle. 14 In the same system, \cite{Leinenbach} 15 formally verified a compiler 16 from a subset of C to a DLX assembly code. \cite{Chlipala} recently wrote 17 in Coq a certified compiler from the simplytyped lambda calculus to 18 assembly language, making an extensive use of dependent types. 19 Perhaps, the most advanced project is 20 \href{http://pauillac.inria.fr/~xleroy/compcert/}{Compcert}, headed by Leroy, 21 based on the use of Coq both for programming the compiler and proving its 22 correctness. In particular, both the backend (\cite{Leroy06}) 23 and the frontend (\cite{Leroy08}) 24 of an optimising compiler from a subset of C to PowerPC 25 assembly have been certified in this way. 5 However, very little is known about the preservation of intensional properties of programs, and in particular about their (concrete) complexity. The theoretical study of the complexity impact of program transformations between different computational models has so far been confined to very foundational devices. Here we propose to address a concrete case of compilation from a typical highlevel language to assembly. It is worth remarking that it is unlikely to have constanttime transformations between foundational models: for instance coding a multitape Turing machine into a single tape one could introduce a polynomial slowdown. Thus, complexity is architecture dependent, and the claim that you may pass from one language to another, preserving the performance of your algorithms, must be taken with the due caution. In particular, as surprising as it may be, very little is known about the complexity behaviour of a compiled code with respect to its source; as a matter of fact, most industries producing robots or devices with strong temporal constraints (such as, e.g. photoelectric safety barriers) still program such devices in assembly. 26 6 7 The tacit assumption that the complexity of algorithms is preserved along compilation, while plausible under the suitable assumptions, is not supported by any practical or theoretical study. For instance, a single register is usually used to point to activation records, implicitly delimiting their number; you may take more registers to this purpose, but unless you fix a priori their number (hence fixing the size of the stack), you cannot expect to access data in activation records in constant time. In particular, the memory model assumed by Leroy assumes an infinite memory, where allocation requests always succeed, that clearly conflicts with the reality of embedded software, where one has to work within precise (often relatively small) memory bounds. If working in restricted space on one side allows us to properly weight memory access as a unit cost, on the other side it introduces a subtle interplay between space complexity, time complexity and correctness that will be one of the crucial issues of the project. 27 8 28 However, very little is known about the preservation of intensional 29 properties of programs, and in particular about their (concrete) 30 complexity. The theoretical study of the complexity impact of program 31 transformations between different computational models has so far been 32 confined to very foundational devices. Here we propose to address a 33 concrete case of compilation from a typical highlevel language to 34 assembly. It is worth remarking that it is unlikely to have 35 constanttime transformations between foundational models: for 36 instance coding a multitape Turing machine into a single tape one 37 could introduce a polynomial slowdown. Thus, complexity is 38 architecture dependent, and the claim that you may pass from one 39 language to another, preserving the performance of your algorithms, 40 must be taken with the due caution. In particular, as surprising as 41 it may be, very little is known about the complexity behaviour of a 42 compiled code with respect to its source; as a matter of fact, most 43 industries producing robots or devices with strong temporal 44 constraints (such as, e.g.\ photoelectric safety barriers) still 45 program such devices in assembly. 9 Even admitting (as we hope to prove) that in a confined universe we may actually preserve complexity, the main interest of the project is in producing a (certified) computational cost (in terms of clock cycles) for all instruction slices of the source program with O(1) complexity, thus providing precise values for all constants appearing in the cost function for the source. This opens the possibility of computing time constraints for executable code by reasoning directly on the high level input language. In particular, we are not aiming to help analyse the complexity (or termination) of programs (that has to be guessed by the user, as he guesses invariants in axiomatic semantics), but we shall build the necessary infrastructure to reflect a highlevel, abstract complexity analysis of the source on a concrete instantiation of its target code. 46 10 47 The tacit assumption that the complexity of algorithms is preserved 48 along compilation, while plausible under the suitable assumptions, is 49 not supported by any practical or theoretical study. For instance, a 50 single register is usually used to point to activation records, 51 implicitly delimiting their number; you may take more registers to 52 this purpose, but unless you fix a priori their number (hence fixing 53 the size of the stack), you cannot expect to access data in activation 54 records in constant time. In particular, the memory model assumed by 55 Leroy assumes an infinite memory, where allocation requests always 56 succeed, that clearly conflicts with the reality of embedded software, 57 where one has to work within precise (often relatively small) memory 58 bounds. If working in restricted space on one side allows us to 59 properly weight memory access as a unit cost, on the other side it 60 introduces a subtle interplay between space complexity, time 61 complexity and correctness that will be one of the crucial issues of 62 the project. 11 Such instantiation depends on the target architecture: for instance some microcontrollers lack the multiplication instruction as a primitive operation, preventing to count such an operation with a fixed cost. Moreover, if we are interested in a really tight complexity measure, we cannot expect to have a uniform cost for input instructions since, due to register allocation and optimisations, their actual cost depends on their surrounding context. In other words, we have to face the non compositional nature (in terms of the source structure) of most compiler techniques and optimizations. 63 12 64 Even admitting (as we hope to prove) that in a confined universe we 65 may actually preserve complexity, the main interest of the project is 66 in producing a (certified) computational cost (in terms of clock 67 cycles) for all instruction slices of the source program with O(1) 68 complexity, thus providing precise values for all constants appearing 69 in the cost function for the source. This opens the possibility of 70 computing time constraints for executable code by reasoning directly 71 on the high level input language. In particular, we are not aiming to 72 help analyse the complexity (or termination) of programs (that has to 73 be guessed by the user, as he guesses invariants in axiomatic 74 semantics), but we shall build the necessary infrastructure to reflect 75 a highlevel, abstract complexity analysis of the source on a concrete 76 instantiation of its target code. 13 The Compcert project represents the current baseline for any future work on compiler certification, comprising the one we plan to do. We will improve on Compcert in two directions: by assuming a formal model where resources (memory) are constrained; and by preserving complexity of O(1) operations, also tracing the way they are mapped to assembly to reflect actual computational costs on the source code. Both improvements greatly increase the exploitation potentials, in particular in the domain of embedded systems and real time computation. 77 14 78 Such instantiation depends on the target architecture: for instance 79 some microcontrollers lack the multiplication instruction as a 80 primitive operation, preventing to count such an operation with a 81 fixed cost. Moreover, if we are interested in a really tight 82 complexity measure, we cannot expect to have a uniform cost for input 83 instructions since, due to register allocation and optimisations, 84 their actual cost depends on their surrounding context. In other 85 words, we have to face the non compositional nature (in terms of the 86 source structure) of most compiler techniques and optimizations. 15 [[BR]] 87 16 88 The Compcert project represents the current baseline for any future work on 89 compiler certification, comprising the one we plan to do. We will improve on 90 Compcert in two directions: by assuming a formal model where resources 91 (memory) are constrained; and by preserving complexity of O(1) operations, also 92 tracing the way they are mapped to assembly to reflect actual computational 93 costs on the source code. Both improvements greatly increase the exploitation 94 potentials, in particular in the domain of embedded systems and real time 95 computation. 17 === The CerCo approach === 96 18 97 == The !CerCo approach == 19 The complexity of a program only depends on its controlflow structure, and in particular on its cycles (procedures calls are just a special case of cycles). Proving that a compiler preserves complexity amounts to proving that it preserves (up to local modifications, like loop unrolling, inline expansion, etc.) the controlflow structure of the source[[footnode.html#foot144^1^]] and, less trivially, that all other instructions are compiled into assembly code whose execution takes a bounded number of clockcycles (i.e. with O(1) complexity). The interest of the project lies in the possibility to compute these costs directly on the target code then refer them back to the source program, allowing the possibility to make precise and trusted temporal assertions about execution from reasoning on the source code. 98 20 99 The complexity of a program only depends on its controlflow 100 structure, and in particular on its cycles (procedures calls are just 101 a special case of cycles). Proving that a compiler preserves 102 complexity amounts to proving that it preserves (up to local 103 modifications, like loop unrolling, inline expansion, etc.) the 104 controlflow structure of the source\footnote{This requires, in turn, 105 the preservation of semantics: the right conditions must be tested, 106 and procedures must be called with the correct parameters.} and, less 107 trivially, that all other instructions are compiled into assembly code 108 whose execution takes a bounded number of clockcycles (i.e. with O(1) 109 complexity). The interest of the project lies in the possibility to 110 compute these costs directly on the target code then refer them back 111 to the source program, allowing the possibility to make precise and 112 trusted temporal assertions about execution from reasoning on the 113 source code. 21 As already mentioned, the main problem in the backward translation of costs from target code to source code is the fact that, apart from the overall control flow structure, all remaining structure of the input is usually irremediably lost during compilation: optimizations can move instructions around, change the order of jumps, and in general perform operations that are far from compositional w.r.t. the high level syntactic structure of the input program. So there is no hope to compute costs on an instructionbyinstruction basis of the source language, since the actual cost of the executable is not compositional in these figures. We have to find another, eventually coarser, level of granularity where the source can be sensibly annotated by target costs. 114 22 23 We regard a C program as a collection of mutually defined procedures. The flow inside each procedure is determined by branching instructions like ifthenelse; ``while'' loops can be regarded as a special kind of tail recursive procedures. The resulting flow can thus be represented as a directed acyclic graph (DAG). We call a path of the directed acyclic graph an ''execution path''. 115 24 116 As already mentioned, the main problem in the backward translation of 117 costs from target code to source code is the fact that, apart from the 118 overall control flow structure, all remaining structure of the input 119 is usually irremediably lost during compilation: optimizations can 120 move instructions around, change the order of jumps, and in general 121 perform operations that are far from compositional w.r.t.\ the high 122 level syntactic structure of the input program. So there is no hope 123 to compute costs on an instructionbyinstruction basis of the source 124 language, since the actual cost of the executable is not compositional 125 in these figures. We have to find another, eventually coarser, level 126 of granularity where the source can be sensibly annotated by target 127 costs. 25 '''Figure 2:''' Quicksort 128 26 27   129 28 130 We regard a C program as a collection of mutually defined 131 procedures. The flow inside each procedure is determined by branching 132 instructions like ifthenelse; ``while'' loops can be regarded as a 133 special kind of tail recursive procedures. The resulting flow can thus 134 be represented as a directed acyclic graph (DAG). We call a path of 135 the directed acyclic graph an {\em execution path}. 136 \begin{center} 137 \begin{figure}[htb] 138 \begin{verbatim} 139 void quicksort(int t[], int l, int r) { 140 if (l < r) { 141 int v = t[l]; 142 int m = l; 143 int i = l + 1; 144 while (i <= r) { 145 if (t[i] < v) { m++; swap(t, i, m); } 146 i++; 147 } 148 swap(t, l, m); 149 quicksort(t, l, m  1); 150 quicksort(t, m + 1, r); 151 } 152 } 29 As a simple example, consider the quicksort program of Fig.�[[#code2]]. This algorithm performs inplace sorting of input array img3.png whose bounds are img4.png and img5.png; initially img4.png is expected to be zero, while img5.png is the length of the array minus one. The outermost conditional terminates when the bounds of the array are illegal. (Sorting an empty array will end the recursive behaviour of the algorithm.) The variable img6.png is the so called pivot: a selected element of the array that will be compared with all other elements. Bigger elements will be moved (by the swap function) to the end of the array (the upper part), while smaller elements are placed at the beginning of the array (the lower part). Then the pivot is placed between the lower and the upper part of the array, in position img7.png, its position in the resulting sorted array; all elements before the pivot are smaller and all elements following it are bigger. The algorithm completes the sorting with recursive calls on the lower and on the upper parts of the array. 153 30 154 \end{verbatim} 155 \caption{\label{code}Quicksort} 156 \end{figure} 157 \end{center} 158 As a simple example, consider the quicksort program of 159 Fig.~\ref{code}. This algorithm performs inplace sorting of input 160 array $t$ whose bounds are $l$ and $r$; initially $l$ is expected to 161 be zero, while $r$ is the length of the array minus one. The outermost 162 conditional terminates when the bounds of the array are illegal. 163 (Sorting an empty array will end the recursive behaviour of the 164 algorithm.) The variable $v$ is the so called pivot: a selected 165 element of the array that will be compared with all other elements. 166 Bigger elements will be moved (by the swap function) to the end of the 167 array (the upper part), while smaller elements are placed at the 168 beginning of the array (the lower part). Then the pivot is placed 169 between the lower and the upper part of the array, in position $m$, 170 its position in the resulting sorted array; all elements before the 171 pivot are smaller and all elements following it are bigger. The 172 algorithm completes the sorting with recursive calls on the lower and 173 on the upper parts of the array. 31 In the body of the `quick_sort` procedure there are only two execution paths, corresponding to the two cases img8.png and img9.png. The latter is a trivial path, immediately leading to termination. The former leads to the while loop (that is regarded as a procedure call), the call to swap, and the two recursive calls. Similarly, the body of the while loop is composed by two paths, corresponding to the two conditions img10.png and img11.png. 174 32 33 All operations performed along any of these paths takes some constant time c. The complexity magnitude of the program only depends on the loops and recursive calls met along its execution paths, but not on their associated constants. On the other hand, if we want to give tight performance bounds to the execution time, we have to compute the real constants on the executable. 175 34 176 In the body of the \verb+quick_sort+ procedure there are only two 177 execution paths, corresponding to the two cases $l < r$ and $l \ge 178 r$. The latter is a trivial path, immediately leading to 179 termination. The former leads to the while loop (that is regarded as a 180 procedure call), the call to swap, and the two recursive calls. 181 Similarly, the body of the while loop is composed by two paths, 182 corresponding to the two conditions $i \le r$ and $i > r$. 35 The compiler must be able to return a set of pairs img12.png, where each img13.png is an execution path, and img14.png is its actual cost[[footnode.html#foot154^2^]]. It is then up to the user to guess (and to prove, assisted by interactive tools) the complexity of the program: the compiler only provides the infrastructure required to map a complexity analysis on the source into a faithful analog on the target. This approach looks compatible with most local optimizations. Moreover, since we work on a cyclebycycle (procedurebyprocedure) basis, the approach should scale up well. 183 36 184 All operations performed along any of these paths takes some constant 185 time c. The complexity magnitude of the program only depends on the 186 loops and recursive calls met along its execution paths, but not on 187 their associated constants. On the other hand, if we want to give 188 tight performance bounds to the execution time, we have to compute the 189 real constants on the executable. 37 [[BR]] 190 38 39 === User interaction flow === 191 40 192 The compiler must be able to return a set of pairs $(p_i,c_i)$, where 193 each $p_i$ is an execution path, and $c_i$ is its actual 194 cost\footnote{A more flexible result would consist in returning pairs 195 $(p_i,a_i)$ where $a_i$ is the sequence of assembly instructions 196 corresponding to $p_i$; this would allow to take space into 197 consideration, as well as time.}. It is then up to the user to guess 198 (and to prove, assisted by interactive tools) the complexity of the 199 program: the compiler only provides the infrastructure required to map 200 a complexity analysis on the source into a faithful analog on the 201 target. This approach looks compatible with most local optimizations. 202 Moreover, since we work on a cyclebycycle (procedurebyprocedure) 203 basis, the approach should scale up well. 41 The left part of Fig.�[[#interaction3]] shows the interaction diagram for the final user of the system (the right part is a planned case study for a possible automation of the process and will be discussed later). 204 42 205 == User interaction flow == 43 '''Figure 3:''' Interaction and automation diagrams 206 44 207 The left part of Fig.~\ref{interaction} shows the interaction diagram for the final user of 208 the system (the right part is a planned case study for a possible 209 automation of the process and will be discussed later). 210 \begin{figure}[tb] 211 \begin{center} 212 \includegraphics[width=0.40\textwidth]{interaction_diagram} 213 \hspace{0.5cm} 214 \vrule 215 \hspace{1cm} 216 \includegraphics[width=0.45\textwidth]{case_study_diagram} 217 \caption{\label{interaction}Interaction and automation diagrams} 218 \end{center} 219 \end{figure} 220 The interaction is done in several steps. We illustrate them using the 221 quicksort program above. 222 \begin{enumerate} 223 \item The user writes her code in C~(see Fig.~\ref{code}) and compiles 224 it with our CerCo (Certified Complexity) compiler. 225 \item The CerCo compiler outputs both the object code and an annotated copy 226 of the C source (Fig.~\ref{annotated}). 45  img17.png  227 46 228 \begin{figure}[tb] 229 \begin{quote} 230 \begin{verbatim} 231 void quick_rec(int t[], int l, int r) { 232 /* Cost annotation for quick_rec body (1 cycle only) 233 @ if (l<r) time += 21; 234 @ if (l>=r) time += 6; */ 235 if (l < r) { 236 int v = t[l]; 237 int m = l; 238 int i = l + 1; 239 while (i <= r) { 240 /* Cost annotation for while (1 cycle only) 241 @ if (t[i] < v) time += 4; else time += 5; */ 242 if (t[i] < v) { m++; swap(t, i, m); } 243 i++; 244 } 245 swap(t, l, m); 246 quick_rec(t, l, m  1); 247 quick_rec(t, m + 1, r); 248 } 249 } 250 \end{verbatim} 251 \end{quote} 252 \caption{\label{annotated}Cost annotated C code (generated by the compiler)} 253 \end{figure} 47 The interaction is done in several steps. We illustrate them using the quicksort program above. 254 48 255 \begin{figure}[p] 256 \begin{center} 257 \begin{tabular}{ccc} 258 \begin{minipage}[t]{.3\textwidth} 259 \textbf{C source}\\~\\ 260 \texttt{\scriptsize{ 261 \colorbox{cola}{void quicksort(t,l,r) \{}\\ 262 \colorbox{colb}{\ \ if (l {\textless} r) \{}\\ 263 \colorbox{colc}{\ \ \ \ int i = l + 1;}\\ 264 \colorbox{cold}{\ \ \ \ int m = l;}\\ 265 \colorbox{cole}{\ \ \ \ int v = t[l];}\\ 266 \colorbox{colf}{\ \ \ \ while (i {\textless}= r) \{}\\ 267 \colorbox{colg}{\ \ \ \ \ \ if (t[i] {\textless} v) \{ }\\ 268 \colorbox{colh}{\ \ \ \ \ \ \ \ m++; }\\ 269 \colorbox{coli}{\ \ \ \ \ \ \ \ swap(t, i, m); \}}\\ 270 \colorbox{colj}{\ \ \ \ \ \ i++;\}}\\ 271 \colorbox{coll}{\ \ \ \ swap(t, l, m);}\\ 272 \colorbox{colm}{\ \ \ \ quicksort(t, l, m {} 1);}\\ 273 \colorbox{coln}{\ \ \ \ quicksort(t, m + 1, r);}\\ 274 \colorbox{cola}{\}}\\ 275 }} 276 \end{minipage} 277 & 278 \begin{minipage}[t]{.3\textwidth} 279 \textbf{PseudoAssembly code}\\~\\ 280 \texttt{\scriptsize{ 281 \colorbox{cola}{24: r < r3}\\ 282 \colorbox{cola}{29: l < r2}\\ 283 \colorbox{colb}{34: cmp l r}\\ 284 \colorbox{cola}{36: t < r1}\\ 285 \colorbox{colb}{3a: jump c4 if l >= r}\\ 286 \colorbox{colc}{40: i < l + 1}\\ 287 \colorbox{cole}{44: r8 < t}\\ 288 \colorbox{cole}{48: r7 < l}\\ 289 \colorbox{cold}{4b: m < l} 290 \vspace{0.4em}\\ 291 \framebox[0.7\linewidth]{ 292 \begin{minipage}[t]{0.6\textwidth} 49 1. The user writes her code in C�(see Fig.�[[node6.html#code2]]) and compiles it with our CerCo (Certified Complexity) compiler. 50 1. The CerCo compiler outputs both the object code and an annotated copy of the C source (Fig.�[[#annotated4]]). '''Figure 4:''' Cost annotated C code (generated by the compiler)  51 '''Figure 5:''' Automatic inference of cost annotations from assembly code  <tablestyle="width:30%"> C source[[BR]]� [[BR]]`~ void quicksort(t,l,r) {[[BR]] if (l < r) {[[BR]] int i = l + 1;[[BR]] int m = l;[[BR]] int v = t[l];[[BR]] while (i <= r) {[[BR]] if (t[i] < v) { [[BR]] m++; [[BR]] swap(t, i, m); }[[BR]] i++;}[[BR]] swap(t, l, m);[[BR]] quicksort(t, l, m  1);[[BR]] quicksort(t, m + 1, r);[[BR]]}[[BR]]~`   <tablestyle="width:30%"> PseudoAssembly code[[BR]]� [[BR]]`~ 24: r < r3[[BR]]29: l < r2[[BR]]34: cmp l r[[BR]]36: t < r1[[BR]]3a: jump c4 if l >= r[[BR]]40: i < l + 1[[BR]]44: r8 < t[[BR]]48: r7 < l[[BR]]4b: m < l[[BR]][[BR]]img20.png[[BR]][[BR]]97: r1 < t[[BR]]9b: r3 < m[[BR]]9e: r2 < l[[BR]]a1: call swap[[BR]]a6: r1 < t[[BR]]aa: r3 < m  0x1[[BR]]af: r2 < l[[BR]]b2: call quicksort[[BR]]bc: l < r6[[BR]]bf: call quicksort[[BR]]c4: ret ~`   <tablestyle="width:30%"> Execution Paths[[BR]]� [[BR]]`~ l >= r img21.png [[BR]]24: r < r3[[BR]]29: l < r2[[BR]]34: cmp l r[[BR]]36: t < r1[[BR]]3a: jump c4 if l >= r[[BR]]c4: ret[[BR]]�[[BR]]total: 6 clock cycles[[BR]]�[[BR]]l < r img21.png [[BR]]24: r < r3[[BR]]29: l < r2[[BR]]34: cmp l r[[BR]]36: t < r1[[BR]]3a: jump c4 if l >= r[[BR]]40: i < l + 1[[BR]]44: r8 < t[[BR]]48: r7 < l[[BR]]4b: m < l[[BR]]while loop[[BR]]97: r1 < t[[BR]]9b: r3 < m[[BR]]9e: r2 < l[[BR]]a1: call swap [[BR]]swap [[BR]]a6: r1 < t[[BR]]aa: r3 < m  0x1[[BR]]af: r2 < l[[BR]]b2: call quicksort[[BR]]quicksort[[BR]]bc: l < r6[[BR]]bf: call quicksort[[BR]]quicksort[[BR]]c4: ret[[BR]]�[[BR]]total: 21 clock cycles + function calls ~`    52 Each loop and function body is annotated with the cost of one iteration, along all its possible execution paths. The cost is expressed as a function of the state of the program, which comprises the value of every variable. (In the example, we use a textual annotation for simplicity, but we expect to produce a more structured output.) The leftmost column of Fig.�[[#runs5]] shows the original source code. Colours are used to relate source code statements with their respective (human readable) assembly instructions, reported in the central column. That assembly was produced by gcc[[footnode.html#foot390^3^]] with a moderate level of optimizations for an Intel 386 family microprocessor. We used img4.png, img5.png, img3.png, img6.png and img7.png to mention locations (registers and/or stack frame temporaries) in which the corresponding C variables are placed, while img22.png are other register or temporaries that have no direct mapping to C. The calling convention puts the first three parameters in img23.png, img24.png and img25.png, and it is up to the callee to eventually store them in local temporaries. Assignment is denoted with `<`, addition and multiplication with `+` and `*`; the jump instruction is followed by the target address, and when the jump is conditional a C like expression follows (but its evaluation is performed early by the `cmp` instruction, that sets a CPU flag recording the result of the comparison). The only tricky expression is ```*(r8 + r7 * 4)`'', that exploits an advanced addressing mechanism corresponding to array indexing (4 is the size of an array cell in bytes, img26.png is the index and img27.png is the address at which the array starts). It amounts to the C statement ```t[l]`'' that computes the pivot.The rightmost column shows two possible execution paths, with a precise estimation of their cost (here 6 and 21 CPU cycles plus the cost of function calls) and the algebraic conditions characterizing these paths.More precisely 53 * The CerCo compiler avoids intraprocedural optimisations and loop optimisations that may change the number of iterations performed in a non trivial way. 54 * Some intraprocedural or loop optimisations (like the `while` to `repeat` prehoisting optimisation applied by gcc in Fig.�[[#runs5]]) can be allowed, provided that the compiler records them precisely. 55 * Once the assembly code is produced, the assemblylevel control flow graph is analysed in order to compute the cost of execution paths. Fig.�[[#runs5]] shows two of them in the rightmost column; the analysis of the `while` loop has been omitted, but is similar. 56 1. The user computes (by hand or semiautomatically) the complexity invariants of each cycle and (recursive) function, and he adds them to the C code as special comments[[footnode.html#foot404^4^]] (Fig.�[[#invariants6]]). '''Figure 6:''' Invariants annotated C code. The invariants are user provided. img28.png  57 '''Figure 7:''' Complexity obligations (automatically generated). The user should prove every complexity obligation. img29.png  58 The quicksort complexity invariant states the maximum number of clock cycles required by its execution on an array delimited by img4.png and img5.png. Since the procedure can be called with wrong indices (img30.png) the formula has to take into account that the img31.png could be negative using the img32.png function to raise that difference to zero when needed. The literature suggests that this quicksort implementation (where the pivot img6.png is chosen deterministically) has a quadratic complexity in the worst case. Cleaning up the formula from multiplicative and additive constants one obtains the expected asymptotic complexity img33.png. The coefficients are those returned by the costannotating compiler. Similarly, the user has to give a complexity invariant for the inner while loop. 59 1. The user and compiler annotated C code is fed into an already existing tool (in this example, Caduceus, [[[node60.html#caduceusFilli�tre and March�]]]) that produces one complexity obligation for each execution path (Fig.�[[#obligations7]]). 60 1. The user should prove all complexity obligations. The proofs are the certificate that the user provided complexity invariant is correct. In many cases, the obligations can be proved automatically using a general purpose automatic theorem prover or an adhoc procedure. For instance, to prove the complexity obligations of Fig.�[[#obligations7]], we must show that a system of inequations holds, which may be done automatically. When an automatic proof is not possible, the user can resort to an interactive proof. 293 61 294 \colorbox{white}{while loop}\\ 295 \tiny{ 296 \colorbox{colf}{4e: cmp i r}\\ 297 \colorbox{cole}{53: v < *(r8 + r7 * 4) }\\ 298 \colorbox{colf}{57: jump 97 if i > r}\\ 299 \colorbox{colg}{59: r7 < i}\\ 300 \colorbox{colg}{5c: r9 < r8 + r7 * 4}\\ 301 \colorbox{colf}{60: jump 6e}\\ 302 \colorbox{colj}{62: i < i + 0x1}\\ 303 \colorbox{colg}{65: r9 < r9 + 0x4}\\ 304 \colorbox{colf}{69: cmp i r}\\ 305 \colorbox{colf}{6c: jump 92 if i > r}\\ 306 \colorbox{colg}{6e: cmp v *r9}\\ 307 \colorbox{colg}{72: jump 62 if v <= r9}\\ 308 \colorbox{coli}{74: r1 < t}\\ 309 \colorbox{colh}{78: m < m + 0x1}\\ 310 \colorbox{coli}{7c: r2 < i}\\ 311 \colorbox{coli}{7e: r3 < r12d}\\ 312 \colorbox{colj}{81: i < i + 0x1}\\ 313 \colorbox{colg}{84: r9 < r9 + 0x4}\\ 314 \colorbox{coli}{88: call swap}\\ 315 \colorbox{colf}{8d: cmp i r}\\ 316 \colorbox{colf}{90: jump 6e if i <= r}\\ 317 \colorbox{coln}{92: r6 < m + 0x1}\\ 318 } 319 \end{minipage} 320 } 321 \vspace{0.4em} 322 \\ 62 The right part of Fig.�[[#interaction3]] describes a planned case study for the automation of the complexity proof. We start with a synchronous program which is compiled to C code. The CerCo compiler then produces suitable cost annotations which are used by an invariant synthesizer to build complexity assertions on the C code. The synthesizer can take advantage of the high level control flow information contained in the source synchronous program. The deductive platform (Caduceus) generates complexity obligations which are passed to an adhoc proof generator to produce a machinechecked proof from which we can extract a certified bound on the reaction time of the original synchronous program. The proof generator can also take advantage of the high level information coming from the original source program, and user interaction can be used to drive the generator in critical cases. 323 63 324 \colorbox{coll}{97: r1 < t}\\ 325 \colorbox{coll}{9b: r3 < m}\\ 326 \colorbox{coll}{9e: r2 < l}\\ 327 \colorbox{coll}{a1: call swap}\\ 328 \colorbox{colm}{a6: r1 < t}\\ 329 \colorbox{colm}{aa: r3 < m {} 0x1}\\ 330 \colorbox{colm}{af: r2 < l}\\ 331 \colorbox{colm}{b2: call quicksort}\\ 332 \colorbox{coln}{bc: l < r6}\\ 333 \colorbox{coln}{bf: call quicksort}\\ 334 \colorbox{cola}{c4: ret} 335 }} 336 \end{minipage} 337 & 338 \begin{minipage}[t]{.3\textwidth} 339 \textbf{Execution Paths}\\~\\ 340 \texttt{\scriptsize{ 341 \colorbox{white}{l >= r $\rightarrow$ }\\ 342 \colorbox{cola}{24: r < r3}\\ 343 \colorbox{cola}{29: l < r2}\\ 344 \colorbox{colb}{34: cmp l r}\\ 345 \colorbox{cola}{36: t < r1}\\ 346 \colorbox{colb}{3a: jump c4 if l >= r}\\ 347 \colorbox{cola}{c4: ret}\\ 348 \colorbox{white}{~}\\ 349 \colorbox{white}{total: 6 clock cycles}\\ 350 \colorbox{white}{~}\\ 351 \colorbox{white}{l < r $\rightarrow$ }\\ 352 \colorbox{cola}{24: r < r3}\\ 353 \colorbox{cola}{29: l < r2}\\ 354 \colorbox{colb}{34: cmp l r}\\ 355 \colorbox{cola}{36: t < r1}\\ 356 \colorbox{colb}{3a: jump c4 if l >= r}\\ 357 \colorbox{colc}{40: i < l + 1}\\ 358 \colorbox{cole}{44: r8 < t}\\ 359 \colorbox{cole}{48: r7 < l}\\ 360 \colorbox{cold}{4b: m < l}\\ 361 \colorbox{white}{while loop}\\ 64 [[BR]] 362 65 363 \colorbox{coll}{97: r1 < t}\\ 364 \colorbox{coll}{9b: r3 < m}\\ 365 \colorbox{coll}{9e: r2 < l}\\ 366 \colorbox{coll}{a1: call swap }\\ 367 \colorbox{white}{swap }\\ 368 \colorbox{colm}{a6: r1 < t}\\ 369 \colorbox{colm}{aa: r3 < m {} 0x1}\\ 370 \colorbox{colm}{af: r2 < l}\\ 371 \colorbox{colm}{b2: call quicksort}\\ 372 \colorbox{white}{quicksort}\\ 373 \colorbox{coln}{bc: l < r6}\\ 374 \colorbox{coln}{bf: call quicksort}\\ 375 \colorbox{white}{quicksort}\\ 376 \colorbox{cola}{c4: ret}\\ 377 \colorbox{white}{~}\\ 378 \colorbox{white}{total: 21 clock cycles + function calls} 379 }} 380 \end{minipage} 381 \end{tabular} 382 \end{center} 383 \caption{\label{runs}Automatic inference of cost annotations from assembly code} 384 \end{figure} 385 Each loop and function body is annotated with the cost of one 386 iteration, along all its possible execution paths. The cost is 387 expressed as a function of the state of the program, which comprises 388 the value of every variable. (In the example, we use a textual 389 annotation for simplicity, but we expect to produce a more structured 390 output.) 66 === Certification: tools and techniques === 391 67 68 In order to trust the process described in the previous section, we need to trust the CerCo compiler. I.e. we need to fulfil the following requirements: 392 69 393 The leftmost column of Fig.~\ref{runs} shows the original source code. Colours 394 are used to relate source code statements with their respective (human 395 readable) assembly instructions, reported in the central column. That assembly 396 was produced by gcc\footnote{GNU compiler collection, version 4.2} with a 397 moderate level of optimizations for an Intel 386 family microprocessor. We 398 used $l$, $r$, $t$, $v$ and $m$ to mention locations (registers and/or stack 399 frame temporaries) in which the corresponding C variables are placed, while $r1, 400 \ldots, r9$ are other register or temporaries that have no direct mapping to C. 401 The calling convention puts the first three parameters in $r1$, $r2$ 402 and $r3$, and it is up to the callee to eventually store them in local 403 temporaries. Assignment is denoted with \texttt{<}, addition and 404 multiplication with 405 \texttt{+} and \texttt{*}; the jump instruction is followed by the target 406 address, and when the jump is conditional a C like expression follows (but its 407 evaluation is performed early by the \texttt{cmp} instruction, that sets a CPU 408 flag recording the result of the comparison). The only tricky expression is 409 ``\texttt{*(r8 + r7 * 4)}'', that exploits an advanced addressing mechanism 410 corresponding to array indexing (4 is the size of an array cell in bytes, $r7$ 411 is the index and $r8$ is the address at which the array starts). It amounts to 412 the C statement ``\texttt{t[l]}'' that computes the pivot. 70 1. the compiled assembly program respects the semantics of the C program 71 1. we need to know that the number of iterations performed by the C and assembly programs are the same, i.e. to prove that the compiler preserves the complexity 72 1. we need to prove that the cost annotations generated by the compiler really correspond to the number of clock cycles spent by the hardware 413 73 414 The rightmost column shows two possible execution paths, with a 415 precise estimation of their cost (here 6 and 21 CPU cycles plus the 416 cost of function calls) and the algebraic conditions characterizing 417 these paths. 74 For this reason, we plan to[[footnode.html#foot438^5^]]: 418 75 76 1. develop an untrusted CerCo compiler prototype in high level programming language; 77 1. provide an executable formal specification of the target microprocessor; 78 1. provide an executable formal specification of the C language; 79 1. develop an executable version of the CerCo compiler in a language suitable to formal correctness proofs; 80 1. give a machine checkable proof that the latter implementation satisfies all the requirements mentioned above. 419 81 420 More precisely 421 \begin{itemize} 422 \item The CerCo compiler avoids intraprocedural optimisations and loop 423 optimisations that may change the number of 424 iterations performed in a non trivial way. 425 \item Some intraprocedural or loop optimisations (like the 426 \texttt{while} to \texttt{repeat} prehoisting optimisation applied 427 by gcc in Fig.~\ref{runs}) 428 can be allowed, provided that the compiler records them precisely. 429 \item Once the assembly code is produced, the assemblylevel control flow 430 graph is analysed in order 431 to compute the cost of execution paths. Fig.~\ref{runs} 432 shows two of them in the rightmost column; the analysis of 433 the \texttt{while} loop has been omitted, but is similar. 434 \end{itemize} 435 \item The user computes (by hand or semiautomatically) the complexity 436 invariants of each cycle and (recursive) function, and he adds them 437 to the C code as special comments\footnote{Again, more interactive forms 438 of annotations can be considered.} (Fig.~\ref{invariants}). 439 \begin{figure}[p] 440 \begin{quote} 441 \begin{verbatim} 82 The untrusted compiler will be written in the http://caml.inria.fr/OCaml programming language, developed and distributed by INRIA, France's national research institute for computer science. OCaml is a generalpurpose programming language, especially suited for symbolic manipulation of treelike data structures, of the kind typically used during compilation. It is a simple and efficient language, designed for program safety and reliability and particularly suited for rapid prototyping. 442 83 443 /* User provided invariant for the quicksort function. 444 @ ensures (time <= \old(time) + max(0,((1+r)l))*(max(0,(rl))*5+21) + 6) */ 445 void quick_rec(int t[], int l, int r) { 446 /* Cost annotation for quick_rec body (1 cycle only) 447 @ if (l<r) time += 21; 448 @ if (l>=r) time += 6; */ 449 if (l < r) { 450 int v = t[l]; 451 int m = l; 452 int i = l + 1; 453 /* @ label L 454 User provided invariant for the while loop. To perform i iterations 455 we need at most (il)*5 clock cycles. 456 @ invariant time <= \at(time,L) + (il) * 5 && 457 Additional invariants of the loop 458 @ l <= m && l+1 <= i && m <= r && i <= r+1 && m < i && l < r */ 459 while (i <= r) { 460 /* Cost annotation for while (1 cycle only) 461 @ if (t[i] < v) time += 4; else time += 5; */ 462 if (t[i] < v) { m++; swap(t, i, m); } 463 i++; 464 } 465 swap(t, l, m); 466 quick_rec(t, l, m  1); 467 quick_rec(t, m + 1, r); 468 } 469 } 470 \end{verbatim} 471 \end{quote} 472 \vspace{15pt} 473 \caption{\label{invariants}Invariants annotated C code. The invariants are 474 user provided.} 475 \end{figure} 476 \begin{figure}[p] 477 \begin{center} 478 \framebox[0.9\linewidth]{ 479 \begin{minipage}[t]{0.8\linewidth} 480 \vspace{0.5em} 84 For the certification of the compiler we plan to use the http://matita.cs.unibo.itMatita Interactive Theorem Prover, developed at the Computer Science Department of the University of Bologna. Matita is based on the Calculus of Inductive Constructions, the same foundational paradigm as INRIA's Coq system, and it is partially compatible with it. It adopts a tactic based editing mode. Proof objects (XMLencoded) are produced for storage and exchange. Its graphical interface, inspired by CtCoq and Proof General, supports high quality bidimensional rendering of proofs and formulae, transformed onthefly to MathML markup. In spite of its young age it has already been used for complex formalizations, including non trivial results in Number Theory and problems from the Poplmark challenge�[[[node60.html#poplmarkPOPLmark]]]. An executable specification for all models of Freescale 8bit ALUs (Families HC05/HC08/RS08/HCS08) and memories (RAM, ROM, Flash) has also already been formalised in Matita. 481 85 482 \emph{First complexity obligation}: case where $l < r$. 483 The number of clock cycles spent in one iteration 484 of the recursive function must be greater or equal than the sum of the number 485 of clock cycles spent in the function body, in the \texttt{while} loop and 486 in the two recursive calls: 487 $$\begin{array}{l} 488 \forall i,l,m,r.~ l < r \land l+1 \leq r \land r < i \leq r+1 \land 489 m \leq r \land l+1 \leq i \land l \leq m \Rightarrow\\ 490 (i  l) * 5 + 1 + \\ 491 max(0,1 + r  (m + 1)) * (max(0,r  (m + 1)) * 5 + 21) + 6 + \\ 492 max(0,1 + (m  1)  l) * (max(0,m  1  l) * 5 + 21) + \\ 493 6 \\ 494 \leq 495 max(0,1 + r  l) * (max(0,r  l) * 5 + 21) + 6 496 \end{array} 497 $$ 86 For the management of cost annotations and proof obligation synthesis we plan to interface with the http://caduceus.lri.fr/Caduceus verification tool for C programs, developed by the Computer Science Laboratory of the University of Paris sud. Caduceus is built on top of Why, a generalpurpose verification condition generator, exploiting Dijkstra's weakest precondition calculus. The Why tool allows the declaration of logical models (types, functions, predicates and axioms) that can be used in programs and annotations; moreover, it can be interfaced to a wide set of existing provers for verification of the resulting conditions. In particular, we will rely on http://altergo.lri.frAltErgo, which includes a decision procedure for linear arithmetic. 498 87 499 %\item 500 \emph{Second complexity obligation}: case $l \leq r$. The number of clock cycles spent 501 in one iteration must be greater than the number of clock cycles required when 502 the \texttt{if} statement fails. 503 $$l \leq r \Rightarrow 6 \leq max(0,1 + r  l) * (max(0,r  l) * 5 + 21) + 6 504 $$ 88 We plan to support almost every ANSI C construct (functions, pointers, arrays and structures) and datatypes (integers of various sizes) except function pointers, explicit jumps (goto) and pointer aliasing or casting. These features do not seem to pose major additional challenges to the technology we plan to develop, but could be time expensive to implement, formalize and prove correct. Moreover, they are not currently supported by Caduceus, posing additional problems for the development of the proofofconcept prototype of the whole application. We could also support floating point numbers, but the kind of microcontroller we are targeting (8 and 16 bits processors) do not usually provide instructions to efficiently process them. Moreover floating point numbers are seldom used in embedded software for that very reason, making them a feature of ANSI C of limited interest in our scenario. 505 89 506 %\end{itemize} 507 A few others complexity obligations are automatically generated but, being 508 trivial, are automatically proved by the system.\\ 509 \end{minipage} 510 } 511 \caption{\label{obligations}Complexity obligations (automatically generated). 512 The user should prove every complexity obligation.} 513 \end{center} 514 \end{figure} 515 \noindent 516 The quicksort complexity invariant states the maximum 517 number of clock cycles required by its execution on an array delimited 518 by $l$ and $r$. Since the procedure can be called with wrong indices 90 We stress that the proposed approach handles costa annotations for C programs as a special case of standard annotations for imperative programs whose management we plan to automatize with tools such as Caduceus. As explained above, the choice of the tool does have an impact on the fragment of ANSI C constructs we will handle, and future advances in this domain could enlarge the fragment of ANSI C under consideration. On the other hand, tools like Caduceus pose no limitation on the invariants, which can be freely described in a computational and very expressive logic. Hence, every technique to automatically infer invariants and cost annotations can be exploited (and often automatized) in Caduceus. 519 91 520 ($r < l$) the formula has to take into account that the $rl$ could 521 be negative using the $max$ function to raise that difference to zero 522 when needed. 523 The literature suggests that this quicksort implementation (where 524 the pivot $v$ is chosen deterministically) has a quadratic complexity 525 in the worst case. Cleaning up the 526 formula from multiplicative and additive constants one obtains the 527 expected asymptotic complexity $(rl)^2$. 528 529 The coefficients are those returned by the costannotating compiler. 530 Similarly, the user has to give a complexity invariant for the 531 inner while loop. 532 \item The user and compiler annotated C code is fed into an already existing 533 tool (in this example, Caduceus, \cite{caduceus}) that produces 534 one complexity obligation for each execution path (Fig.~\ref{obligations}). 535 \item The user should prove all complexity obligations. The proofs 536 are the certificate that the user provided complexity invariant is 537 correct. In many cases, the obligations can be proved automatically 538 using a general purpose automatic theorem prover or an adhoc 539 procedure. For instance, to prove the complexity obligations of 540 Fig.~\ref{obligations}, we must show that a system of inequations 541 holds, which may be done automatically. When an automatic proof is 542 not possible, the user can resort to an interactive proof. 543 \end{enumerate} 544 545 546 The right part of Fig.~\ref{interaction} describes a planned case study for the 547 automation of the complexity proof. We start with a synchronous program 548 which is compiled to C code. The CerCo compiler then produces suitable 549 cost annotations which are used by an invariant synthesizer to build 550 complexity assertions on the C code. The synthesizer can take advantage of the 551 high level control flow information contained in the source synchronous program. 552 The deductive platform (Caduceus) generates complexity obligations which are passed to an adhoc proof generator to produce a machinechecked proof from which 553 we can extract a certified bound on the reaction time of the original 554 synchronous program. The proof generator can also take advantage of the high 555 level information coming from the original source program, and user interaction 556 can be used to drive the generator in critical cases. 557 558 \subsubsection{Certification: tools and techniques} 559 In order to trust the process described in the previous section, 560 we need to trust the CerCo compiler. 561 I.e.\ we need to fulfil the following requirements: 562 \begin{enumerate} 563 \item the compiled assembly program respects the semantics of the C program 564 \item we need to know that the number of iterations performed by the C 565 and assembly programs are the same, i.e. to prove that the 566 compiler preserves the complexity 567 \item we need to prove that the cost annotations generated by the compiler 568 really correspond to the number of clock cycles spent by the hardware 569 \end{enumerate} 570 571 572 For this reason, we plan to\footnote{See next section for a 573 more articulated description of the methodology}: 574 \begin{enumerate} 575 \item develop an untrusted CerCo compiler prototype in high level 576 programming language; 577 \item provide an executable 578 formal specification of the target microprocessor; 579 \item provide an executable 580 formal specification of the C language; 581 \item develop an executable version of the CerCo compiler in a language 582 suitable to formal correctness proofs; 583 \item give a machine checkable proof that the latter implementation 584 satisfies all the requirements mentioned above. 585 \end{enumerate} 586 587 The untrusted compiler will be written in the 588 \href{http://caml.inria.fr/}{OCaml} programming 589 language, developed and distributed by \nobreak{INRIA}, France's 590 national research institute for computer science. OCaml is a 591 generalpurpose programming language, especially suited for symbolic 592 manipulation of treelike data structures, of the kind typically used 593 during compilation. It is a simple and efficient language, designed 594 for program safety and reliability and particularly suited for rapid 595 prototyping. 596 597 For the certification of the compiler we plan to use the 598 \href{http://matita.cs.unibo.it}{Matita} 599 Interactive Theorem Prover, developed at the Computer Science 600 Department of the University of Bologna. 601 Matita is based on the Calculus of Inductive Constructions, 602 the same foundational paradigm as INRIA's Coq system, 603 and it is partially compatible with it. 604 It adopts a tactic based editing mode. 605 Proof objects (XMLencoded) are produced for storage and exchange. 606 Its graphical interface, inspired by CtCoq and Proof General, 607 supports high quality bidimensional rendering of proofs and formulae, 608 transformed onthefly to MathML markup. 609 In spite of its young age it has already been used for complex 610 611 formalizations, including non trivial results in Number Theory 612 and problems from the Poplmark challenge~\cite{poplmark}. 613 An executable specification for all models of Freescale 8bit 614 ALUs (Families HC05/HC08/RS08/HCS08) and memories (RAM, ROM, Flash) has 615 also already been formalised in Matita. 616 617 For the management of cost annotations and proof obligation synthesis 618 we plan to interface with the \href{http://caduceus.lri.fr/}{Caduceus} 619 verification tool for C programs, developed by the Computer Science 620 Laboratory of the University of Paris sud. Caduceus is built on top of 621 Why, a generalpurpose verification condition generator, exploiting 622 Dijkstra's weakest precondition calculus. The Why tool allows the 623 declaration of logical models (types, functions, predicates and 624 axioms) that can be used in programs and annotations; moreover, it can 625 be interfaced to a wide set of existing provers for verification of 626 the resulting conditions. In particular, we will rely on 627 \href{http://altergo.lri.fr}{AltErgo}, which includes a decision 628 procedure for linear arithmetic. 629 630 631 We plan to support almost every ANSI C construct (functions, pointers, arrays 632 and structures) and datatypes (integers of various sizes) except function 633 pointers, explicit jumps (goto) and pointer aliasing or casting. 634 These features do not seem to pose major additional challenges to the 635 technology we plan to develop, but could be time expensive to implement, 636 formalize and prove correct. Moreover, they are not currently supported 637 by Caduceus, posing additional problems for the development 638 of the proofofconcept prototype of the whole application. We could also 639 support floating point numbers, but the kind of microcontroller 640 we are targeting (8 and 16 bits processors) do not usually provide 641 instructions to efficiently process them. Moreover floating point 642 numbers are seldom used in embedded software for that very reason, 643 making them a feature of ANSI C of limited interest in our scenario. 644 645 We stress that the proposed approach handles costa annotations for C programs 646 as a special case of standard annotations for imperative programs whose 647 management we plan to automatize with tools such as Caduceus. As explained 648 above, the choice of the tool does have an impact on the fragment of ANSI C 649 constructs we will handle, and future advances in this domain could enlarge 650 the fragment of ANSI C under consideration. On the other hand, tools like 651 Caduceus pose no limitation on the invariants, which can be freely described in 652 a computational and very expressive logic. Hence, every technique to 653 automatically infer invariants and cost annotations can be exploited (and 654 often automatized) in Caduceus. 655 92 [[BR]]