Changeset 3320


Timestamp:
Jun 6, 2013, 11:49:38 AM
Author:
sacerdot
Message:

...

File:
1 edited

  • Papers/fopara2013/fopara13.tex

    r3319 r3320  
    4141
    4242\title{Certified Complexity (CerCo)\thanks{The project CerCo acknowledges the financial support of the Future and Emerging Technologies (FET) programme within the Seventh Framework Programme for Research of the European Commission, under FET-Open grant number: 243881}}
    43 \author{Roberto M. Amadio$^{3,4}$ \and Nicolas Ayache$^4$ \and François Bobot$^4$ \and Jaap Boender$^1$ \and Brian Campbell$^2$ \and Ilias Garnier$^2$ \and
    44 Antoine Madet$^4$ \and James McKinna$^2$ \and
    45 Dominic P. Mulligan$^1$ \and Mauro Piccolo$^1$ \and Yann R\'egis-Gianas$^4$ \and
    46 Claudio Sacerdoti Coen$^1$ \and Ian Stark$^2$ \and Paolo Tranquilli$^1$}
     43\author{
     44%Roberto
     45R. M. Amadio$^{3,4}$ \and
     46%Nicolas
     47N. Ayache$^4$ \and
     48%François
     49F. Bobot$^4$ \and
     50%Jaap
     51J. P. Boender$^1$ \and
     52%Brian
     53B. Campbell$^2$ \and
     54%Ilias
     55I. Garnier$^2$ \and
     56%Antoine
     57A. Madet$^4$ \and
     58%James
     59J. McKinna$^2$ \and
     60%Dominic
     61D. P. Mulligan$^1$ \and
     62%Mauro
     63M. Piccolo$^1$ \and
     64%Yann
     65Y. R\'egis-Gianas$^4$ \and
     66%Claudio
     67C. Sacerdoti Coen$^1$ \and
     68%Ian
     69I. Stark$^2$ \and
     70%Paolo
     71P. Tranquilli$^1$}
    4772\institute{Dipartimento di Informatica - Scienza e Ingegneria, Universit\`a di Bologna \and
    4873LFCS, University of Edinburgh
     
    5984
    6085\begin{abstract}
    61 This paper provides an overview of the just completed FET-Open Project CerCo
    62 (Certified Complexity), 2010-2013. The main achievement of the project has been
     86This paper provides an overview of the FET-Open Project CerCo
     87(Certified Complexity). The main achievement of the project has been
    6388the introduction of a technique to perform static analysis of non functional
    6489properties of programs (time, space) at the source level, without losing
     
    82107This situation presents several problems: 1) it can be hard to infer this high-level structure in the presence of compiler optimizations; 2) techniques working on object code are not useful in early development, yet problems detected later are more expensive to tackle; 3) parametric cost analysis is very hard: how can we relate a cost that depends on the execution state (e.g. the value of a register or a carry bit) to a cost that the user can understand by looking at the source code?; 4) functional analysis performed only on object code leaves out any contribution from the programmer, reducing the precision of the cost estimates compared to what could be obtained from the source code.
    83108
    84 \paragraph{CerCo vision and approach:} We propose a reconciliation of functional and non-functional analysis: to share information and perform both at the same time on source code.
     109\paragraph{Vision and approach:} We want to reconcile functional and non-functional analysis: to share information between them and to perform both at the same time on the source code.
    85110
    86111What has previously prevented this approach is the lack of a uniform and precise cost model for high-level code: 1) each occurrence of a statement is compiled differently and optimizations may change the control flow; 2) the cost of an object code instruction may depend on the runtime state of hardware components like pipelines and caches, which is not visible in the source code.
     
    91116
    92117\section{Project context and objectives}
    93 Formal methods for verification of functional properties of programs have reached a level of maturity and automation that is allowing a slow but increasing adoption rate in production environments. For safety critical code, it is getting usual to combine rigorous software engineering methodologies and testing with static analysis in order to benefits from the strong points of every approach and mitigate the weaknesses. Particularly interesting are open frameworks for the combination of different formal methods, where the programs can be progressively specified and are continuously enriched with new safety guarantees: every method contributes knowledge (e.g. new invariants) that becomes an assumption for later analysis.
     118Formal methods for the verification of functional properties of programs have reached a level of maturity and automation that is allowing a slow but steadily increasing adoption in production environments. For safety-critical code, it is becoming common to combine rigorous software engineering methodologies and testing with static analysis, in order to benefit from the strong points of each approach and mitigate their weaknesses. Particularly interesting are open frameworks for the combination of different formal methods, where programs can be progressively specified and continuously enriched with new safety guarantees: every method contributes knowledge (e.g. new invariants) that becomes an assumption for later analysis.
    94119
    95120The scenario for the verification of non functional properties (time spent, memory used, energy consumed) is bleaker, and the future seems to be getting even worse. Most industries verify that real-time systems meet their deadlines simply by measuring the time spent in many runs of the system, taking the maximum and adding an empirical safety margin, and claiming the result to be a bound on the Worst Case Execution Time (WCET) of the program. Formal methods and tools to statically analyse the WCET of programs exist, but they often produce bounds that are too pessimistic to be useful. Recent advances in hardware architecture have focused on improving average case performance, not the predictability of the worst case. Execution time is becoming more and more dependent on the execution history, which determines the internal state of hardware components like pipelines and caches. Multi-core processors and non-uniform memory models are drastically reducing the possibility of performing static analysis in isolation, because programs are less and less time-composable. Clock-precise hardware models are necessary for static analysis, and obtaining them is becoming harder as a consequence of the increased hardware complexity.
     
    97122Despite this scenario, the need for reliable real-time systems and programs is increasing, and there is growing pressure from the research community towards the differentiation of hardware. The aim is to introduce alternative hardware whose behavior is more predictable and more suitable for static analysis, for example by decoupling execution time from the execution history through randomization.
    98123
    99 In the CerCo project we do not try to address this problem, optimistically assuming that static analysis of non functional properties of programs will return to be feasible in the long term. The main objective of our work is instead to bring together static analysis of functional and non functional properties, which, according to the current state of the art, are completely independent activities with limited exchange of information: while the functional properties are verified on the source code of programs written in high level languages, the analysis of non functional properties is entirely performed on the object code to exploit clock precise hardware models.
    100 
    101 There are two main reasons to currently perform the analysis on the object code. The first one is the obvious lack of a uniform, precise cost model for source code instructions (or even basic blocks). During compilation, high level instructions are torn apart and reassembled in context specific ways so that there is no way to identify a fragment of object code with a single high level instruction. Even the control flow of the object and source code can be very different as a result of optimizations. For instance, loop optimizations reduce the number or the order of the iterations of loops, and may assign different object code, and thus different costs, to different iterations.  Despite the lack of a uniform, compilation and program independent cost model on the source language, the literature on the analysis of non asymptotic execution time on high level languages that assumes such a model is growing and getting momentum. Its practical usefulness is doomed to be minimal, unless we can provide a replacement for such cost models. Some hope has been provided by the EmBounded project (FP6 FET-Open STReP, IST-510255), which compositionally compiles high level code to a byte code that is executed by an emulator with guarantees on the maximal execution time spent for each byte code instruction. The approach indeed provides a uniform model, at the price of loosing precision of the model (each cost is a pessimistic upper bound) and performance of the executed code (because the byte code is emulated compositionally instead of performing a fully non compositional compilation).
    102 
    103 The second reason to perform the analysis on the object code is that bounding the worst case execution time of small code fragments in isolation (e.g. loop bodies) and then adding up the bounds yields very poor estimations because no knowledge on the hardware state can be assumed when executing the fragment. By analysing longer runs (e.g. by full unrolling loops) the bound obtained becomes more precise because the lack of knowledge on the initial state has less effects on longer computations.
     124In the CerCo project we do not try to address this problem, optimistically assuming that static analysis of non functional properties of programs will become feasible again in the long term. The main objective of our work is instead to bring together the static analysis of functional and non functional properties, which, according to the current state of the art, are completely independent activities with limited exchange of information: while functional properties are verified on the source code, the analysis of non functional properties is entirely performed on the object code to exploit clock-precise hardware models.
     125
     126There are currently two main reasons to perform the analysis on the object code. The first is the obvious lack of a uniform, precise cost model for source code instructions (or even basic blocks). During compilation, high-level instructions are torn apart and reassembled in context-specific ways, so that there is no way to identify a fragment of object code with a single high-level instruction. Even the control flow of the object and source code can be very different as a result of optimizations. For instance, loop optimizations may reduce the number of iterations of a loop or change their order, and may assign different object code, and thus different costs, to different iterations. Despite the lack of a uniform, compilation- and program-independent cost model on the source language, the literature on the analysis of non-asymptotic execution time of high-level programs that assumes such a model is growing and gaining momentum. Its practical usefulness is doomed to be minimal unless we can provide a replacement for such cost models. Some hope has been provided by the EmBounded project, which compositionally compiles high-level code to a byte code that is executed by an emulator with guarantees on the maximal execution time spent for each byte code instruction. The approach indeed provides a uniform model, at the price of losing precision in the model (each cost is a pessimistic upper bound) and performance of the executed code (because the byte code is emulated compositionally instead of being compiled in a fully non-compositional way).
     127
     128The second reason to perform the analysis on the object code is that bounding the worst case execution time of small code fragments in isolation (e.g. loop bodies) and then adding up the bounds yields very poor estimates, because no knowledge of the hardware state can be assumed when executing the fragment. By analysing longer runs the bound obtained becomes more precise, because the lack of knowledge of the initial state has less effect on longer computations.
    104129
    105130In CerCo we propose a radically new approach to the problem: we reject the idea of a uniform cost model and we propose that the compiler, which knows how the code is translated, must return the cost model for basic blocks of high level instructions. It must do so by keeping track of control flow modifications, so that they can be reversed, and by interfacing with static analysers.
     
    109134The cost of an execution is always the sum of the costs of its basic blocks, each multiplied by the number of times the block is executed; the latter is a functional property of the program. Therefore, in order to perform (parametric) time analysis of programs, it is necessary to combine a cost model with control and data flow analysis. Current state-of-the-art WCET technology performs the analysis on the object code, where the logic of the program is harder to reconstruct and most information available in the source code (e.g. types) has been lost. Imprecision in the analysis leads to useless bounds. To improve precision, the tools ask the user to provide constraints on the object code control flow, usually in the form of bounds on the number of iterations of loops or linear inequalities over them. This requires the user to manually link the source and object code, translating assumptions made on the source code, which are often wrong, into constraints on the object code. The task is error-prone and, in the presence of complex optimizations, may be very hard if not impossible.
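As an illustrative formalisation (the notation below is ours, not the paper's): if $c(b)$ is the cost the compiler assigns to basic block $b$ and $n(b)$ is the number of times $b$ is executed in a given run, then
% sketch only: c comes from the induced cost model, n from functional analysis
\[
  \mathrm{cost}(\mathit{run}) \;=\; \sum_{b \in \mathit{Blocks}} c(b)\cdot n(b),
\]
where the counts $n(b)$ are exactly the functional information that control and data flow analysis must supply.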
    110135
    111 The CerCo approach has the potentiality to dramatically improve the state of the art. By performing control and data flow analysis on the source code, the error prone translation of invariants is completely avoided. It is in fact performed by the compiler itself when it induces the cost model on the source language. Moreover, any available technique for the verification of functional properties can be immediately reused and multiple techniques can collaborate together to infer and certify cost invariants for the program. Parametric cost analysis becomes the default one, with non parametric bounds used as last resorts when trading the complexity of the analysis with its precision. A priori, no technique previously used in traditional WCET is lost (e.g. full unrolling for non parametric costs): they can just be applied on the source code.
     136The CerCo approach has the potential to dramatically improve the state of the art. By performing control and data flow analysis on the source code, the error-prone translation of invariants is completely avoided. It is in fact performed by the compiler itself when it induces the cost model on the source language. Moreover, any available technique for the verification of functional properties can be immediately reused, and multiple techniques can collaborate to infer and certify cost invariants for the program. Parametric cost analysis becomes the default, with non parametric bounds used as a last resort when trading the precision of the analysis against its complexity. A priori, no technique previously used in traditional WCET analysis is lost: they can simply be applied on the source code.
    112137
    113138Traditional techniques for WCET that work on the object code are also affected by another problem: they cannot be applied before the object code is generated. Therefore, while the analysis of functional properties can already start in the early development stages, by the time the analysis of non functional properties becomes possible the cost of changes to the program architecture can already be very high. Our approach already works in the early development stages, by axiomatically attaching costs to components that are not implemented yet.
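As a hedged illustration (in ACSL-style C; the function \texttt{read\_sensor} and the bound of 500 cycles are hypothetical, and this is not the project's actual syntax), an unimplemented component can be given an assumed cost so that the analysis of its callers can proceed:

\begin{lstlisting}
/* Hypothetical sketch: a component with no implementation yet receives an
   axiomatic (assumed) worst-case cost, so that time analysis of the code
   that calls it can already be carried out in early development stages.  */
extern unsigned long __cost;      /* elapsed clock cycles (instrumentation) */

/*@ assigns __cost;
  @ ensures __cost <= \old(__cost) + 500;
  @*/
extern int read_sensor(void);     /* assumed to cost at most 500 cycles     */
\end{lstlisting}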
     
    115140All software used to verify properties of programs must be as bug-free as possible. The trusted code base for verification consists of the code that needs to be trusted in order to believe that the property holds. The trusted code base of state-of-the-art WCET tools is very large: one needs to trust the control flow analyser, the linear programming libraries it uses, and the formal models of the hardware. In CerCo we are moving the control flow analysis to the source code and we are also introducing a non-standard compiler. To reduce the trusted code base, we implemented a prototype and a static analyser in an interactive theorem prover, which was used to certify that the cost computed on the source code is indeed the one actually spent by the hardware. Formal models of the hardware and of the high level source languages were also implemented in the interactive theorem prover. Control flow analysis on the source code has been obtained using invariant generators, tools to produce proof obligations from the generated invariants, and automatic theorem provers to verify the obligations. If the automatic provers are able to generate proof traces that can be independently checked, the only remaining component that enters the trusted code base is an off-the-shelf invariant generator, which, in turn, can be proved correct using an interactive theorem prover. We therefore achieve the double objective of using more off-the-shelf components (e.g. provers and invariant generators) while at the same time reducing the trusted code base.
    116141
    117 \paragraph{Summary of the CerCo objectives.} To summarize, the goal of CerCo is to reconcile functional with non functional analysis by performing them together on the source code, sharing common knowledge about execution invariants. We want to achieve the goal implementing a new generation of compilers that induce a parametric, precise cost model for basic blocks on the source code. The compiler should be certified using an interactive theorem prover to minimize the trusted code base of the analysis. Once the cost model is induced, off-the-shelf tools and techniques can be combined together to infer and prove parametric cost bounds.
    118 The long term benefits of the CerCo vision are expected to be:
    119 1. the possibility to perform static analysis during early development stages
    120 2.  parametric bounds made easier
    121 3.  the application of off-the-shelf techniques currently unused for the analysis of non functional properties, like automated proving and type systems
    122 4. simpler and safer interaction with the user, that is still asked for knowledge, but on the source code, with the additional possibility of actually verifying the provided knowledge
    123 5. a reduced trusted code base
    124 6. the increased accuracy of the bounds themselves.
    125 
    126 The long term success of the project is hindered by the increased complexity of the static prediction of the non functional behavior of modern hardware. In the time frame of the European contribution we have focused on the general methodology and on the difficulties related to the development and certification of a cost model inducing compiler.
     142%\paragraph{Summary of the CerCo objectives.} To summarize, the goal of CerCo is to reconcile functional with non functional analysis by performing them together on the source code, sharing common knowledge about execution invariants. We want to achieve the goal implementing a new generation of compilers that induce a parametric, precise cost model for basic blocks on the source code. The compiler should be certified using an interactive theorem prover to minimize the trusted code base of the analysis. Once the cost model is induced, off-the-shelf tools and techniques can be combined together to infer and prove parametric cost bounds.
     143%The long term benefits of the CerCo vision are expected to be:
     144%1. the possibility to perform static analysis during early development stages
     145%2.  parametric bounds made easier
     146%3.  the application of off-the-shelf techniques currently unused for the analysis of non functional properties, like automated proving and type systems
     147%4. simpler and safer interaction with the user, that is still asked for knowledge, but on the source code, with the additional possibility of actually verifying the provided knowledge
     148%5. a reduced trusted code base
     149%6. the increased accuracy of the bounds themselves.
     150%
     151%The long term success of the project is hindered by the increased complexity of the static prediction of the non functional behavior of modern hardware. In the time frame of the European contribution we have focused on the general methodology and on the difficulties related to the development and certification of a cost model inducing compiler.
    127152
    128153\section{The typical CerCo workflow}
    129 \begin{figure}[t]
     154\begin{figure}[!t]
    130155\begin{tabular}{l@{\hspace{0.2cm}}|@{\hspace{0.2cm}}l}
    131156\begin{lstlisting}
     
    199224\end{tikzpicture}
    200225\end{tabular}
    201 \caption{On the left: a C program that counts the array's elements
     226\caption{On the left: code to count the array's elements
    202227 greater than or equal to the threshold. On the right: CerCo's interaction
    203228 diagram. CerCo's components are drawn solid.}
    204229\label{test1}
    205230\end{figure}
    206 In order to illustrate the typical workflow we envision, we will
    207 follow what is done on an example program (on the left of \autoref{test1},
    208 while on the right a depiction of the workflow can be found).
    209 The user writes the C program and feeds it to the Cerco compiler, which outputs
    210 an instrumented version of the same program. The annotated program can then be enriched
    211 with complexity assertions in the style of Hoare logic, that are passed to a deductive
    212 platform (in our case Frama-C). Optionally a part of these assertions can be automatically
    213 sinthesised by the Frama-C cost plugin. The deductive platform builds the
    214 proof obligations which in turn can be closed by automatic or interactive provers,
    215 ending in a proof certificate.
    216 
    217 \autoref{itest1} shows the result of the process as far as the CerCo part is
    218 concerned. The red lines show the instrumentation of the code as given by the CerCo
    219 compiler. We can keep track of the time and stack cost by reasoning on
    220 variables added by the instrumentation. \texttt{\_\_cost} holds the clock cycles
    221 incurred by executing the compiled program, while \texttt{\_\_stack} and
    222 \texttt{\_\_max\_stack} hold respectively the current and maximal stack usage,
    223 including the space used up by global variables. We therefore have two kinds of annotations:
    224 time increments (in clock cycles) and stack variations (in bytes). In this particular case
    225 all data (including the return address) is kept in hardware registers, so no stack is used.
    226 The initialisation of the \texttt{\_\_cost} and \texttt{\_\_stack} variables tell respectively
    227 the initialisation time cost (before \texttt{main} is called) and the amount of memory occupied by global data.
     231We illustrate the workflow we envision (on the right of \autoref{test1})
     232on an example program (on the left of \autoref{test1}).
     233The user writes the program and feeds it to the CerCo compiler, which outputs
     234an instrumented version of the same program that updates global variables
     235that record the elapsed execution time and the stack space usage.
     236The red lines in \autoref{itest1} are the instrumentation introduced by the
     237compiler. The annotated program can then be enriched with complexity assertions in the style of Hoare logic, which are passed to a deductive platform (in our case Frama-C). We provide, in a Frama-C cost plug-in, a simple automatic synthesiser for complexity assertions, which can be overridden by the user to increase or decrease accuracy. The blue lines in \autoref{itest1} are the assertions added
     238by the plug-in. From the assertions, a general purpose deductive platform produces proof obligations which in turn can be closed by automatic or interactive provers, ending in a proof certificate. Nine proof obligations are generated
     239from~\autoref{itest1} (to prove that the loop invariant holds after
     240one execution if it holds before, to prove that the whole program execution
     241takes at most 1228 cycles, etc.). The CVC3 prover closes all obligations
     242in a few seconds on standard hardware.
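To make the description concrete for readers without the figures at hand, here is a rough, hedged sketch of the shape of the compiler's output (our reconstruction only: the constants, the helper names \texttt{\_\_cost\_incr} and \texttt{\_\_stack\_incr}, and the invariant are invented for illustration; the actual instrumented listing is the one in \autoref{itest1}):

\begin{lstlisting}
/* Hedged sketch of CerCo-style instrumentation: only the overall shape is
   meant to be accurate; every constant and helper name below is invented. */
int __cost  = 33;                   /* cycles spent initialising global data */
int __stack = 5, __stack_max = 5;   /* bytes occupied by global data         */

void __cost_incr(int k)  { __cost += k; }           /* time, in clock cycles */
void __stack_incr(int k) {                          /* stack, in bytes       */
  __stack += k;
  if (__stack > __stack_max) __stack_max = __stack;
}

/* The contract below mimics the style of the cost invariants synthesised by
   the Frama-C plug-in; 91 and 47 are made-up per-block costs.               */
/*@ requires n >= 0;
  @ ensures __cost <= \old(__cost) + 91 + 47*n;
  @*/
int count(int *a, int n, int threshold) {
  int i, found = 0;
  __cost_incr(91);                  /* cost of the blocks outside the loop   */
  for (i = 0; i < n; i++) {
    __cost_incr(47);                /* cost of one loop iteration            */
    if (a[i] >= threshold) found++;
  }
  return found;                     /* no __stack_incr calls: in this example
                                       all data fits in registers            */
}
\end{lstlisting}

The real listing additionally tracks stack usage with the same pattern, which is why \autoref{itest1} also reports \texttt{\_\_stack} and \texttt{\_\_stack\_max}.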
    228243%
    229244\fvset{commandchars=\\\{\}}
     
    284299\end{lstlisting}
    285300\caption{The instrumented version of the program in \autoref{test1},
    286  with lines added by the CerCo compiler in red and those added by the CerCo
    287  Frama-C plugin in blue.}
     301 with instrumentation added by the CerCo compiler in red and cost invariants
     302 added by the CerCo Frama-C plugin in blue. The \texttt{\_\_cost},
     303 \texttt{\_\_stack} and \texttt{\_\_stack\_max} variables hold, respectively, the elapsed time in clock cycles and the current and maximal stack usage. Their initial values
     304hold the clock cycles spent initializing the global data before calling
     305\texttt{main} and the space required by global data (and thus unavailable for
     306the stack).
     307}
    288308\label{itest1}
    289309\end{figure}
    290 
    291 The lines in blue in \autoref{itest1} show the output of the Frama-C cost plugin. Of particular interest
    292 are the annotations of the \texttt{main} function. Proving those correct (which
    293 the automatic prover CVC3 preforms in 8 seconds), together with the results on the
    294 CerCo compiler, provides the mechanical proof that the function does not
    295 use stack and takes 1228 cycles to run. More complex annotations can outline the
    296 dependency of the costs from the function parameters..
    297 
    298310\section{Main S\&T results}
    299311We will now review the main S\&T results achieved in the CerCo project. We will address them in the following order: