\RequirePackage{fix-cm}

\documentclass[smallextended]{svjour3}

\usepackage{amsfonts}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage[british]{babel}
\usepackage{color}
\usepackage{fancybox}
\usepackage{fancyvrb}
\usepackage{graphicx}
\usepackage[colorlinks,
            bookmarks,bookmarksopen,bookmarksdepth=2]{hyperref}
\usepackage{hyphenat}
\usepackage[utf8]{inputenc}
\usepackage{listings}
\usepackage{mdwlist}
\usepackage{microtype}
\usepackage{stmaryrd}
\usepackage{url}

\usepackage{tikz}
\usetikzlibrary{positioning,calc,patterns,chains,shapes.geometric,scopes}

\lstset{language=C,basicstyle=\ttfamily,basewidth=.5em,lineskip=-1.5pt}

\newlength{\mylength}
\newenvironment{frametxt}%
  {\setlength{\fboxsep}{5pt}
   \setlength{\mylength}{\linewidth}%
   \addtolength{\mylength}{-2\fboxsep}%
   \addtolength{\mylength}{-2\fboxrule}%
   \Sbox
   \minipage{\mylength}%
   \setlength{\abovedisplayskip}{0pt}%
   \setlength{\belowdisplayskip}{0pt}%
  }%
  {\endminipage\endSbox
   \[\fbox{\TheSbox}\]}

\smartqed

\title{CerCo: Certified Complexity\thanks{The project CerCo acknowledges the
financial support of the Future and Emerging Technologies (FET) programme within
the Seventh Framework Programme for Research of the European Commission, under
FET-Open grant number 243881.}}
\subtitle{Verified lifting of concrete complexity annotations through a realistic C compiler}
\journalname{Journal of Automated Reasoning}
\titlerunning{Certified Complexity}
\date{Received: date / Accepted: date}
\author{Jaap Boender \and Brian Campbell \and Dominic P. Mulligan \and Claudio {Sacerdoti Coen}} % who else?
\authorrunning{Boender, Campbell, Mulligan, and Sacerdoti Coen}
\institute{Jaap Boender \at
              Faculty of Science and Technology,\\
              Middlesex University London,\\
              United Kingdom.\\
              \email{J.Boender@mdx.ac.uk}
           \and
           Brian Campbell \at
              Department of Informatics,\\
              University of Edinburgh,\\
              United Kingdom.\\
              \email{Brian.Campbell@ed.ac.uk}
           \and
           Dominic P. Mulligan \at
              Computer Laboratory,\\
              University of Cambridge,\\
              United Kingdom.\\
              \email{Dominic.Mulligan@cl.cam.ac.uk}
           \and
           Claudio Sacerdoti Coen \at
              Dipartimento di Informatica---Scienza e Ingegneria (DISI),\\
              University of Bologna,\\
              Italy.\\
              \email{Claudio.SacerdotiCoen@unibo.it}}

\begin{document}

\maketitle

\begin{abstract}
We provide an overview of the FET-Open project CerCo (`Certified Complexity').
Our main achievement is the development of a technique for analysing non-functional properties of programs (time, space) at the source level with little or no loss of accuracy and a small trusted code base.
The core component is a C compiler, verified in the Matita theorem prover, that produces an instrumented copy of the source code in addition to generating object code.
This instrumentation exposes, and tracks precisely, the actual (non-asymptotic) computational cost of the input program at the source level.
Untrusted invariant generators and trusted theorem provers may then be used to compute and certify the parametric execution time of the code.
\keywords{Verified compilation \and Complexity analysis \and CerCo (`Certified Complexity')}
\end{abstract}

% ---------------------------------------------------------------------------- %
% SECTION                                                                      %
% ---------------------------------------------------------------------------- %

\section{Introduction}
\label{sect.introduction}

%\paragraph{Problem statement.}
Programs can be specified with both
functional constraints (what the program must do) and non-functional constraints (what time, space or other resources the program may use). In the current
state of the art, functional properties are verified
by combining user annotations---preconditions, invariants, and so on---with a
multitude of automated analyses---invariant generators, type systems, abstract
interpretation, theorem proving, and so on---on the program's high-level source code.
By contrast, many non-functional properties
are verified using analyses on low-level object code,
%\footnote{A notable
% exception is the explicit allocation of heap space in languages like C, which
% can be handled at the source level.}
but these analyses may then need information
about the high-level functional behaviour of the program that must be reconstructed.
Analysing low-level object code presents several problems:
\begin{itemize*}
\item
It can be hard to deduce the high-level structure of the program after compiler optimisations.
The object code produced by an optimising compiler may have radically different control flow to the original source code program.
\item
Techniques that operate on object code are not useful early in the development process of a program, yet problems with a program's design or implementation are cheaper to resolve earlier in the process, rather than later.
\item
Parametric cost analysis is very hard: how can we relate a cost that depends on the execution state, for example the
value of a register or a carry bit, to a cost that the user can understand
by looking at the source code?
\item
Performing functional analyses on object code makes it hard for the programmer to provide information about the program and its expected execution, leading to a loss of precision in the resulting analyses.
\end{itemize*}
\paragraph{Vision and approach.}
We want to reconcile functional and
non-functional analyses: to share information and perform both at the same time
on high-level source code.
%
What has previously prevented this approach is the lack of a uniform and precise
cost model for high-level code, as each statement occurrence is compiled
differently, optimisations may change control flow, and the cost of an object
code instruction may depend on the runtime state of hardware components like
pipelines and caches, none of which is visible in the source code.

We envision a new generation of compilers that track program structure through compilation and optimisation and exploit this
information to define a precise, non-uniform cost model for source code that accounts for runtime state. With such a cost model we can
reduce non-functional verification to the functional case and exploit the state
of the art in automated high-level verification~\cite{survey}. The techniques
currently used by the Worst Case Execution Time (WCET) community, who perform analyses on object code,
are still available but can be coupled with additional source-level
analyses. Where our approach produces overly complex cost models, safe approximations can be used to trade precision for simplicity.
Finally, source code analysis can be used early in the development process, when
components have been specified but not implemented, as modularity means
that it is enough to specify the non-functional behaviour of missing
components.
\paragraph{Contributions.}
We have developed \emph{the labelling approach}~\cite{labelling}, a
technique to implement compilers that induce cost models on source programs by
very lightweight tracking of code changes through compilation. We have studied
how to formally prove the correctness of compilers implementing this technique, and
have implemented such a compiler from C to object binaries for the 8051
microcontroller, predicting execution time and stack space usage, and
verified it in an interactive theorem prover. As we are targeting
an embedded microcontroller we do not consider dynamic memory allocation.

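To give the flavour of the approach before the complete example
of~\autoref{sec:workflow}: the compiler emits an instrumented copy of the
source program in which a call at the start of each basic block accounts for
the cycles spent executing the object code generated for that block. The
sketch below illustrates the shape of this instrumentation only; the function
and the cycle count are placeholders, not actual compiler output.
\begin{lstlisting}
int sum(char *p, int len) {
  int s = 0, j;
  for (j = 0; j < len; j++) {
    __cost_incr(78); /* placeholder cost, in cycles, of the
                        object code emitted for this loop body */
    s += *p++;
  }
  return s;
}
\end{lstlisting}
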
To demonstrate source-level verification of costs we have implemented
a Frama-C plugin~\cite{framac} that invokes the compiler on a source
program and uses it to generate invariants on the high-level source
that correctly model low-level costs. The plugin certifies that the
program respects these costs by calling automated theorem provers, an
innovative technique in the field of cost analysis. Finally,
we have conducted several case studies, including showing that the
plugin can automatically compute and certify the exact reaction time
of Lustre~\cite{lustre} data flow programs compiled into C.

% ---------------------------------------------------------------------------- %
% SECTION                                                                      %
% ---------------------------------------------------------------------------- %

\section{Project context and approach}
\label{sect.project.context.and.approach}

Formal methods for verifying functional properties of programs have
now reached a level of maturity and automation such that their adoption is slowly
increasing in production environments. For safety-critical code, it is
becoming commonplace to combine rigorous software engineering methodologies and testing
with static analyses, taking the strengths of each and mitigating
their weaknesses. Of particular interest are open frameworks
for the combination of different formal methods, where programs can be
progressively specified and enriched with new safety
guarantees: every method contributes knowledge (e.g. new invariants) that
becomes an assumption for later analysis.

The outlook for verifying non-functional properties of programs (time spent,
memory used, energy consumed) is bleaker.
% and the future seems to be getting even worse.
Most industries verify that real-time systems meet their deadlines
by simply performing many runs of the system and timing their execution, computing the
maximum time and adding an empirical safety margin, claiming the result to be a
bound for the WCET of the program. Formal methods and software to statically
analyse the WCET of programs exist, but they often produce bounds that are too
pessimistic to be useful. Recent advances in hardware architecture
have focused on improving average-case performance, not the
predictability of the worst case. Execution time is becoming increasingly
dependent on execution history and the internal state of
hardware components like pipelines and caches. Multi-core processors and non-uniform
memory models are drastically reducing the possibility of performing
static analysis in isolation, because programs are less and less time-composable.
Clock-precise hardware models are necessary for static analysis, and
obtaining them is becoming harder due to the increased sophistication
of hardware design.

Despite these problems, the need for reliable real-time
systems and programs is increasing, and there is pressure
from the research community for the introduction of
hardware with more predictable behaviour, which would be more suitable
for static analysis. One example, being investigated by the Proartis
project~\cite{proartis}, is to decouple execution time from execution
history by introducing randomisation.

In CerCo~\cite{cerco} we do not address this problem, optimistically
assuming that improvements in low-level timing analysis or architecture will make
verification feasible in the longer term. Instead, the main objective of our work is
to bring together the static analysis of functional and non-functional
properties, which in the current state of the art are
independent activities with limited exchange of information: while the
functional properties are verified on the source code, the analysis of
non-functional properties is performed on object code to exploit
clock-precise hardware models.

\subsection{Current object-code methods}

Analysis currently takes place on object code for two main reasons.
First, there cannot be a uniform, precise cost model for source
code instructions (or even basic blocks). During compilation, high-level
instructions are broken up and reassembled in context-specific ways so that
matching a fragment of object code with a single high-level instruction is
infeasible. Even the control flow of the object and source code can be very
different as a result of optimisations; for example, aggressive loop
optimisations may completely transform source-level loops. Despite the lack of a uniform, compilation- and
program-independent cost model on the source language, the literature on the
analysis of non-asymptotic execution time on high-level languages assuming
such a model is growing and gaining momentum. However, unless we provide a
replacement for such cost models, this literature's future practical impact looks
to be minimal. Some hope has been provided by the EmBounded project~\cite{embounded}, which
compositionally compiles high-level code to a byte code that is executed by an
interpreter with guarantees on the maximal execution time spent for each byte code
instruction. This provides a uniform model at the expense of the model's
precision (each cost is a pessimistic upper bound) and the performance of the
executed code (because the byte code is interpreted compositionally instead of
undergoing fully non-compositional compilation).

The second reason to perform the analysis on the object code is that bounding
the worst-case execution time of small code fragments in isolation (e.g. loop
bodies) and then adding up the bounds yields very poor estimates, as no
knowledge of the hardware state prior to executing the fragment can be assumed. By
analysing longer runs the bound obtained becomes more precise because the lack
of information about the initial state has a relatively small impact.

To calculate the cost of an execution, value and control flow analyses
are required to bound the number of times each basic block is
executed. Currently, state-of-the-art
WCET analysis tools, such as AbsInt's aiT toolset~\cite{absint}, perform these analyses on
object code, where the logic of the program is harder to reconstruct
and most information available at the source code level has been lost;
see~\cite{stateart} for a survey. Imprecision in the analysis can lead to useless bounds. To
augment precision, the tools ask the user to provide constraints on
the object code control flow, usually in the form of bounds on the
number of iterations of loops or linear inequalities on them. This
requires the user to manually link the source and object code,
translating their assumptions on the source code (which may be wrong) into
object code constraints. The task is error-prone and hard, especially
in the presence of complex compiler optimisations.

Traditional techniques for WCET that work on object code are also affected by
another problem: they cannot be applied before the generation of the object
code. Functional properties can be analysed in early development stages, while
analysis of non-functional properties may come too late to avoid expensive
changes to the program architecture.

\subsection{CerCo's approach}

In CerCo we propose a radically new approach to the problem: we reject the idea
of a uniform cost model and propose that the compiler, which knows how the
code is translated, must return the cost model for basic blocks of high-level
instructions. It must do so by keeping track of control flow modifications,
so that they can be reversed, and by interfacing with processor timing analysis.

By embracing compilation, instead of avoiding it like EmBounded did, a CerCo
compiler can both produce efficient code and return costs that are
as precise as the processor timing analysis can be. Moreover, our costs can be
parametric: the cost of a block can depend on actual program data, on a
summary of the execution history, or on an approximated representation of the
hardware state. For example, loop optimisations may assign a cost to a loop body
that is a function of the number of iterations performed. As another example,
the cost of a block may be a function of the vector of stalled pipeline states,
which can be exposed in the source code and updated at each basic block exit. It
is parametricity that allows one to analyse small code fragments without losing
precision. In the analysis of the code fragment we do not have to ignore the
initial hardware state; rather, we may assume that we know exactly which
state (or mode, as the WCET literature calls it) we are in.

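Schematically, and purely as an illustration of this last example, an
instrumented source in the style of~\autoref{sec:workflow} could expose a
mode variable and charge a block cost that depends on it; the variable name
and the cycle counts below are invented for the sketch:
\begin{lstlisting}
int __pipeline_stalled = 0; /* hypothetical summary of the
                               hardware state, updated at
                               each basic block exit */

void block(void) {
  /* the cost charged depends on the mode on entry;
     42 and 30 are illustrative cycle counts */
  __cost_incr(__pipeline_stalled ? 42 : 30);
  /* ... block body ... */
}
\end{lstlisting}
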
The CerCo approach has the potential to dramatically improve the state of the
art. By performing control and data flow analyses on the source code, the
error-prone translation of invariants is completely avoided. Instead, this
work is done at the source level using tools of the user's choice.
Any available technique for the verification of functional properties
can be immediately reused, and multiple techniques can collaborate to
infer and certify cost invariants for the program. There are no
limitations on the types of loops or data structures involved. Parametric cost analysis
becomes the default, with non-parametric bounds used as a last resort when the user
decides to trade precision for a simpler analysis. \emph{A priori}, no
technique previously used in traditional WCET is lost: processor
timing analyses can be used by the compiler on the object code, and the rest can be applied
at the source code level.
Our approach can also work in the early
stages of development by axiomatically attaching costs to unimplemented components.

Software used to verify properties of programs must be as bug-free as
possible. The trusted code base for verification consists of the code that needs
to be trusted to believe that the property holds. The trusted code base of
state-of-the-art WCET tools is very large: one needs to trust the control flow
analyser, the linear programming libraries used, and also the formal models
of the hardware under analysis, for example. In CerCo we are moving the control flow analysis to the source
code and introducing a non-standard compiler too. To reduce the trusted
code base, we implemented a prototype compiler and a static analyser in an interactive
theorem prover, which was used to certify that the costs added to the source
code are indeed those incurred by the hardware. Formal models of the
hardware and of the high-level source languages were also implemented in the
interactive theorem prover. Control flow analysis on the source code has been
obtained using invariant generators, tools to produce proof obligations from
generated invariants, and automatic theorem provers to verify the obligations. If
these tools are able to generate proof traces that can be
independently checked, the only remaining component that enters the trusted code
base is an off-the-shelf invariant generator which, in turn, can be proved
correct using an interactive theorem prover. Therefore we achieve the double
objective of allowing the use of more off-the-shelf components (e.g. provers and
invariant generators) while reducing the trusted code base.

%\paragraph{Summary of the CerCo objectives.} To summarize, the goal of CerCo is
% to reconcile functional with non-functional analysis by performing them together
% on the source code, sharing common knowledge about execution invariants. We want
% to achieve the goal implementing a new generation of compilers that induce a
% parametric, precise cost model for basic blocks on the source code. The compiler
% should be certified using an interactive theorem prover to minimize the trusted
% code base of the analysis. Once the cost model is induced, off-the-shelf tools
% and techniques can be combined together to infer and prove parametric cost
% bounds.
%The long term benefits of the CerCo vision are expected to be:
%1. the possibility to perform static analysis during early development stages
%2. parametric bounds made easier
%3. the application of off-the-shelf techniques currently unused for the
% analysis of non-functional properties, like automated proving and type systems
%4. simpler and safer interaction with the user, that is still asked for
% knowledge, but on the source code, with the additional possibility of actually
% verifying the provided knowledge
%5. a reduced trusted code base
%6. the increased accuracy of the bounds themselves.
%
%The long term success of the project is hindered by the increased complexity of
% the static prediction of the non-functional behaviour of modern hardware. In the
% time frame of the European contribution we have focused on the general
% methodology and on the difficulties related to the development and certification
% of a cost model inducing compiler.

% ---------------------------------------------------------------------------- %
% SECTION                                                                      %
% ---------------------------------------------------------------------------- %

\section{The typical CerCo workflow}
\label{sec:workflow}

\begin{figure}[!t]
\begin{tabular}{l@{\hspace{0.2cm}}|@{\hspace{0.2cm}}l}
\begin{lstlisting}
char a[] = {3, 2, 7, 14};
char threshold = 4;

int count(char *p, int len) {
  char j;
  int found = 0;
  for (j=0; j < len; j++) {
    if (*p <= threshold)
      found++;
    p++;
  }
  return found;
}

int main() {
  return count(a,4);
}
\end{lstlisting}
&
% $\vcenter{\includegraphics[width=7.5cm]{interaction_diagram.pdf}}$
\begin{tikzpicture}[
    baseline={([yshift=-.5ex]current bounding box)},
    element/.style={draw, text width=1.6cm, on chain, text badly centered},
    >=stealth
  ]
  {[start chain=going below, node distance=.5cm]
    \node [element] (cerco) {CerCo\\compiler};
    \node [element] (cost) {CerCo\\cost plugin};
    {[densely dashed]
      \node [element] (ded) {Deductive\\platform};
      \node [element] (check) {Proof\\checker};
    }
  }
  \coordinate [left=3.5cm of cerco] (left);
  {[every node/.style={above, text width=3.5cm, text badly centered,
                       font=\scriptsize}]
    \draw [<-] ([yshift=-1ex]cerco.north west) coordinate (t) --
        node {C source}
        (t-|left);
    \draw [->] (cerco) -- (cost);
    \draw [->] ([yshift=1ex]cerco.south west) coordinate (t) --
        node {C source+\color{red}{cost annotations}}
        (t-|left) coordinate (cerco out);
    \draw [->] ([yshift=1ex]cost.south west) coordinate (t) --
        node {C source+\color{red}{cost annotations}\\+\color{blue}{synthesized assertions}}
        (t-|left) coordinate (out);
    {[densely dashed]
      \draw [<-] ([yshift=-1ex]ded.north west) coordinate (t) --
          node {C source+\color{red}{cost annotations}\\+\color{blue}{complexity assertions}}
          (t-|left) coordinate (ded in) .. controls +(-.5, 0) and +(-.5, 0) .. (out);
      \draw [->] ([yshift=1ex]ded.south west) coordinate (t) --
          node {complexity obligations}
          (t-|left) coordinate (out);
      \draw [<-] ([yshift=-1ex]check.north west) coordinate (t) --
          node {complexity proof}
          (t-|left) .. controls +(-.5, 0) and +(-.5, 0) .. (out);
      \draw [dash phase=2.5pt] (cerco out) .. controls +(-1, 0) and +(-1, 0) ..
          (ded in);
  }}
  %% user:
  % head
  \draw (current bounding box.west) ++(-.2,.5)
      circle (.2) ++(0,-.2) -- ++(0,-.1) coordinate (t);
  % arms
  \draw (t) -- +(-.3,-.2);
  \draw (t) -- +(.3,-.2);
  % body
  \draw (t) -- ++(0,-.4) coordinate (t);
  % legs
  \draw (t) -- +(-.2,-.6);
  \draw (t) -- +(.2,-.6);
\end{tikzpicture}
\end{tabular}
\caption{On the left: C code to count the number of elements in an array
  that are less than or equal to a given threshold. On the right: CerCo's
  interaction diagram. Components provided by CerCo are drawn with a solid
  border.}
\label{test1}
\end{figure}
We illustrate the workflow we envisage (on the right
of~\autoref{test1}) on an example program (on the left
of~\autoref{test1}). The user writes the program and feeds it to the
CerCo compiler, which outputs an instrumented version of the same
program that updates global variables recording the elapsed
execution time and the stack space usage. The red lines in
\autoref{itest1} introducing variables, functions and function calls
starting with \lstinline'__cost' and \lstinline'__stack' are the instrumentation introduced by the
compiler. For example, the two calls at the start of
\lstinline'count' say that 4 bytes of stack are required, and that it
takes 111 cycles to reach the next cost annotation (in the loop body).
The compiler measures these costs on the labelled object code that it generates.

The annotated program can then be enriched with complexity
assertions in the style of Hoare logic, which are passed to a deductive
platform (in our case Frama-C). We provide, as a Frama-C cost plugin, a
simple automatic synthesiser for complexity assertions which can
be overridden by the user to increase or decrease accuracy. These are the blue
comments starting with \lstinline'/*@' in \autoref{itest1}, written in
Frama-C's specification language, ACSL. From the
assertions, a general purpose deductive platform produces proof
obligations which in turn can be closed by automatic or interactive
provers, ending in a proof certificate.

% NB: if you try this example with the live CD you should increase the timeout

Twelve proof obligations are generated from~\autoref{itest1} (to prove
that the loop invariant holds after one execution if it holds before,
to prove that the whole program execution takes at most 1358 cycles, and so on). Note that the synthesised time bound for \lstinline'count',
$178+214*(1+\text{\lstinline'len'})$ cycles, is parametric in the length of
the array. The CVC3 prover
closes all obligations within half a minute on routine commodity
hardware. A simpler non-parametric version can be solved in a few
seconds.
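Instantiating the synthesised bound for \lstinline'count' at
$\text{\lstinline'len'}=4$ and adding the 110 cycles charged at the start of
\lstinline'main' in \autoref{itest1} recovers the whole-program bound:
\[
110 + 178 + 214*(1+4) = 1358 \text{ cycles.}
\]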
%
\fvset{commandchars=\\\{\}}
\lstset{morecomment=[s][\color{blue}]{/*@}{*/},
        moredelim=[is][\color{blue}]{$}{$},
        moredelim=[is][\color{red}]{|}{|},
        lineskip=-2pt}
\begin{figure}[!p]
\begin{lstlisting}
|int __cost = 33, __stack = 5, __stack_max = 5;|
|void __cost_incr(int incr) { __cost += incr; }|
|void __stack_incr(int incr) {
  __stack += incr;
  __stack_max = __stack_max < __stack ? __stack : __stack_max;
}|

char a[4] = {3, 2, 7, 14}; char threshold = 4;

/*@ behavior stack_cost:
      ensures __stack_max <= __max(\old(__stack_max), 4+\old(__stack));
      ensures __stack == \old(__stack);
    behavior time_cost:
      ensures __cost <= \old(__cost)+(178+214*__max(1+\at(len,Pre), 0));
*/
int count(char *p, int len) {
  char j; int found = 0;
  |__stack_incr(4); __cost_incr(111);|
  $__l: /* internal */$
  /*@ for time_cost: loop invariant
        __cost <= \at(__cost,__l)+
                  214*(__max(\at((len-j)+1,__l), 0)-__max(1+(len-j), 0));
      for stack_cost: loop invariant
        __stack_max == \at(__stack_max,__l);
      for stack_cost: loop invariant
        __stack == \at(__stack,__l);
      loop variant len-j;
  */
  for (j = 0; j < len; j++) {
    |__cost_incr(78);|
    if (*p <= threshold) { |__cost_incr(136);| found ++; }
    else { |__cost_incr(114);| }
    p ++;
  }
  |__cost_incr(67); __stack_incr(-4);|
  return found;
}

/*@ behavior stack_cost:
      ensures __stack_max <= __max(\old(__stack_max), 6+\old(__stack));
      ensures __stack == \old(__stack);
    behavior time_cost:
      ensures __cost <= \old(__cost)+1358;
*/
int main(void) {
  int t;
  |__stack_incr(2); __cost_incr(110);|
  t = count(a,4);
  |__stack_incr(-2);|
  return t;
}
\end{lstlisting}
\caption{The instrumented version of the program in \autoref{test1},
  with instrumentation added by the CerCo compiler in red and cost invariants
  added by the CerCo Frama-C plugin in blue. The \lstinline'__cost',
  \lstinline'__stack' and \lstinline'__stack_max' variables hold the elapsed time
  in clock cycles and the current and maximum stack usage. Their initial values
  hold the clock cycles spent in initialising the global data before calling
  \lstinline'main' and the space required by global data (and thus unavailable
  for the stack).}
\label{itest1}
\end{figure}
% ---------------------------------------------------------------------------- %
% SECTION                                                                      %
% ---------------------------------------------------------------------------- %

\section{Conclusions}
\label{sect.conclusions}

\begin{acknowledgements}
\end{acknowledgements}

\bibliographystyle{spmpsci}
\bibliography{cerco}

\end{document}