# Changeset 3334

Timestamp: Jun 7, 2013, 11:36:25 AM
Message:

\begin{abstract}
This paper provides an overview of the FET-Open Project CerCo (`Certified
Complexity'). The project's main achievement is the development of a technique
for performing static analyses of non-functional properties of programs (time,
space) at the source level, without loss of accuracy and with a small, trusted
code base. The main software component developed is a formally certified
complexity-certifying compiler. The compiler translates source code to object
code and produces an instrumented copy of the source code. This
instrumentation exposes at the source level---and tracks precisely---the
actual (non-asymptotic) computational cost of the input program. Untrusted
invariant generators and trusted theorem provers

\section{Introduction}

\paragraph{Problem statement}
Computer programs can be specified with both functional constraints (what a
program must do) and non-functional constraints (e.g. what resources---time,
space, etc.---the program may use). In the current state of the art,
functional properties are verified for high-level source code by combining
user annotations (e.g. preconditions and invariants) with a multitude of
automated analyses (invariant generators, type systems, abstract
interpretation, theorem proving, and so on). By contrast, non-functional
properties are generally checked on low-level object code, but also demand
information about high-level functional behaviour that must somehow be
recreated. This situation presents several problems: 1) it can be hard to
infer this high-level structure in the presence of compiler optimisations;
2) techniques working on object code are not useful in early development, yet
problems detected later are more expensive to tackle; 3) parametric cost
analysis is very hard: how can we reflect a cost that depends on the execution
state (e.g. the value of a register or a carry bit) to a cost that the user
can understand looking at source code? 4) functional analysis performed only
on object code makes any contribution from the programmer hard, leading to
less precision in the estimates.

\paragraph{Vision and approach}
We want to reconcile functional and non-functional analyses: to share
information and perform both at the same time on source code.
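
To make the intended source-level view of costs concrete, the following is a
minimal sketch of what an instrumented copy of a source program could look
like. The counter \texttt{\_\_cost}, the placement of the increments and the
numeric constants are illustrative assumptions of ours, not CerCo's actual
output; the point is only that the object-code cost becomes an ordinary
program variable that source-level verification tools can reason about.
\begin{verbatim}
/* Illustrative sketch only (not CerCo output): an instrumented copy of a
   program counting the array elements greater or equal to a threshold.
   The increments of __cost stand for the cycle costs that the compiler
   attaches to each basic block; all constants are made up. */
unsigned long __cost = 0;        /* assumed cycle counter                  */

int count_ge(int a[], int n, int t) {
  int i, c = 0;
  __cost += 91;                  /* cost of function entry (made up)       */
  for (i = 0; i < n; i++) {
    __cost += 109;               /* cost of one loop iteration (made up)   */
    if (a[i] >= t) c++;
  }
  __cost += 37;                  /* cost of loop exit and return (made up) */
  return c;
}
\end{verbatim}
Any invariant generator that can bound \texttt{i} can now also bound
\texttt{\_\_cost}, which is what lets functional and non-functional reasoning
share the same machinery.
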
%
What has previously prevented this approach is the lack of a uniform and
precise cost model for high-level code: 1) each statement occurrence is
compiled differently and optimisations may change control flow; 2) the cost of
an object code instruction may depend on the runtime state of hardware
components like pipelines and caches, all of which are not visible in the
source code. To solve the issue, we envision a new generation of compilers
able to keep track of, and account for, runtime state. With such a
source-level cost model we can reduce non-functional verification to the
functional case and exploit the state of the art in automated high-level
verification~\cite{survey}. The techniques currently used by the Worst Case
Execution Time (WCET) community, which performs its analysis on object code,
are still available but can now be coupled with an additional source-level
analysis. Where the approach produces precise cost models too complex to
reason about, safe approximations can be used to trade complexity with
precision.

\paragraph{Contributions}
We have developed what we refer to as \emph{the labeling approach}
\cite{labeling}, a technique to implement compilers that induce cost models on
source programs by very lightweight tracking of code changes through
compilation. We have studied

\section{Project context and objectives}

Formal methods for verification of functional properties of programs have now
reached a level of maturity and automation that is facilitating a slow but
increasing adoption in production environments. For safety critical code, it
is becoming commonplace to combine rigorous software engineering methodologies
and testing with static analysis, taking the strong points of each approach
and mitigating their weaknesses. Of particular interest are open frameworks
for the combination of different formal methods, where the programs can be
progressively specified and are continuously enriched with new safety
guarantees.

The scenario for the verification of non-functional properties (time spent,
memory used, energy consumed) is bleaker and the future seems to be getting
even worse. Most industries verify that real time systems meet their deadlines
by simply performing many runs of the system and timing their execution,
computing the maximum time and adding an empirical safety margin, claiming the
result to be a bound for the WCET of the program. Formal methods and software
to statically analyse the WCET of programs exist, but they often produce
bounds that are too pessimistic to be useful.
Recent advancements in hardware architectures are all focused on the
improvement of the average case performance, not the predictability of the
worst case. Execution time is becoming increasingly dependent on execution
history and the internal state of hardware components like pipelines and
caches. Multi-core processors and non-uniform memory models are drastically
reducing the possibility of performing static analysis in isolation, because
programs are less and less time composable. Clock-precise hardware models are
necessary for static analysis, and obtaining them is becoming harder as a
consequence of the increased sophistication of hardware design.

Despite the aforementioned problems, the need for reliable real time systems
and programs is increasing, and there is growing pressure from the research
community towards the differentiation of hardware. The aim is to introduce
alternative hardware with more predictable behaviour and hence more
suitability for static analyses, for example, the decoupling of execution time
from execution history by introducing randomisation \cite{proartis}.

In the CerCo project \cite{cerco} we do not try to address this problem,
optimistically assuming that static analysis of non-functional properties of
programs will become feasible in the longer term. The main objective of our
work is instead to bring together static analysis of functional and
non-functional properties, which, according to the current state of the art,
are completely independent: while functional properties are verified on the
source code, the analysis of non-functional properties is entirely performed
on the object code to exploit clock-precise hardware models.

Analysis currently takes place on object code for two main reasons. First,
there is a lack of a uniform, precise cost model for source code instructions
(or even basic blocks). During compilation, high level instructions are torn
apart and reassembled in context-specific ways so that identifying a fragment
of object code with a single high level instruction is infeasible. Even the
control flow of the object and source code can be very different as a result
of optimisations, for example aggressive loop optimisations may completely
transform source level loops.
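
As a concrete illustration of this mismatch (our example, not one from the
project), consider a trivial accumulation loop: a compiler may unroll it,
vectorise it, or fuse it with a neighbouring loop, so the object code contains
no fragment corresponding one-to-one to the source statement, and no fixed
per-statement cost exists independently of the particular compilation.
\begin{verbatim}
/* Sketch: why a uniform source-level cost model breaks down under
   optimisation.  If the compiler unrolls this loop four times, the object
   code executes one larger block per four source iterations plus a clean-up
   loop, so the statement `s += a[i];' has no fixed, compilation-independent
   cost. */
int sum(int a[], int n) {
  int s = 0;
  for (int i = 0; i < n; i++)
    s += a[i];
  return s;
}
\end{verbatim}
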
Despite the lack of a uniform, compilation- and program-independent cost model
on the source language, the literature on the analysis of non-asymptotic
execution time on high level languages that assumes such a model is growing
and gaining momentum. However, unless we can provide a replacement for such
cost models, this literature's future practical impact looks to be minimal.
Some hope has been provided by the EmBounded project \cite{embounded}, which
compositionally compiles high-level code to a byte code that is executed by an
emulator with guarantees on the maximal execution time spent for each byte
code instruction. The approach provides a uniform model at the price of the
model's precision (each cost is a pessimistic upper bound) and the performance
of the executed code (because the byte code is emulated compositionally
instead of performing a fully non-compositional compilation).

%
The second reason to perform the analysis on the object code is that bounding
the execution time of short code fragments in isolation is imprecise, because
little knowledge on the hardware state can be assumed when executing the
fragment. By analysing longer runs the bound obtained becomes more precise,
because the lack of knowledge on the initial state has less of an effect on
longer computations.

In CerCo we propose a radically new approach to the problem: we reject the
idea of a uniform, program-independent cost model and instead have the
compiler itself induce a precise cost model on the source program. The cost of
a block may, for example, be a function of the vector of stalled pipeline
states, which can be exposed in the source code and updated at each basic
block exit. It is parametricity that allows one to analyse small code
fragments without losing precision: in the analysis of the code fragment we do
not have to ignore the initial hardware state. On the contrary, we can assume
that we know exactly which state (or mode, as the WCET literature calls it) we
are in. The cost of an execution is the sum of the cost of basic blocks
multiplied by the number of times they are executed.
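
Spelled out as a formula (our paraphrase of the preceding sentence, not
notation taken from the project's own documents), the statically determined
cost of a run is
\[
  \mathit{cost}(\mathit{run}) \;=\; \sum_{b \in \mathit{Blocks}}
  \mathit{cost}(b) \cdot \mathit{count}(b),
\]
where $\mathit{count}(b)$, the number of times basic block $b$ is executed, is
a functional property that can be bounded by ordinary source-level reasoning,
while $\mathit{cost}(b)$ is supplied by the compiler and may itself be
parametric in the exposed hardware state.
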
Current state of the art WCET technology \cite{stateart} performs the analysis
on the object code, where the logic of the program is harder to reconstruct
and most information available at the source code level has been lost.
Imprecision in the analysis leads to useless bounds. To augment precision, the
tools ask the user to provide constraints on the object code control flow,
usually in the form of bounds on the number of iterations of loops. This
forces the user to manually relate the source and object code, translating his
assumptions on the source code (which may be wrong) to object code
constraints. The task is error prone and hard, especially in the presence of
complex compiler optimisations.

The CerCo approach has the potential to dramatically improve the state of the
art. By performing control and data flow analyses on the source code, the
error prone translation of invariants is completely avoided. It is in fact
performed by the compiler itself when it induces the cost model on the source
language. Off-the-shelf tools for functional verification can then be used to
infer and certify cost invariants for the program. Parametric cost analysis
becomes the default one, with non-parametric bounds used as a last resort when
trading the complexity of the analysis with its precision. \emph{A priori}, no
technique previously used in traditional WCET is lost: they can just be
applied at the source code level.

Traditional techniques for WCET that work on object code are also affected by
the size of their trusted code base: one must trust the control flow analyser
and the linear programming libraries it uses and also the formal models of the
hardware. In CerCo we are moving the control flow analysis to the source code
and we are introducing a non-standard compiler too. To reduce the trusted code
base, we implemented a prototype and a static analyser in an interactive
theorem prover, which was used to certify that the cost computed on the source
code is correct; the untrusted code base is an off-the-shelf invariant
generator which, in turn, can be proved correct using an interactive theorem
prover. Therefore we achieve the double objective of allowing the use of more
off-the-shelf components (e.g. provers and invariant generators) whilst
reducing the trusted code base at the same time.

%\paragraph{Summary of the CerCo objectives.} To summarize, the goal of CerCo is
% of a cost model inducing compiler.

\section{The typical CerCo workflow}

\begin{figure}[!t]
\begin{tabular}{l@{\hspace{0.2cm}}|@{\hspace{0.2cm}}l}
\end{tabular}
\caption{On the left: code to count the array's elements that are greater or
equal to the threshold. On the right: CerCo's interaction diagram. CerCo's
components are drawn solid.}
\label{test1}
\end{figure}

We illustrate the workflow we envisage (on the right of~\autoref{test1}) on an
example program (on the left of~\autoref{test1}). The user writes the program
and feeds it to the CerCo compiler, which outputs an instrumented version of
the same program that updates global variables tracking the computational cost
of the code executed so far, as determined by the compiler. The annotated
program can then be enriched with complexity assertions in the style of Hoare
logic, that are passed to a deductive platform (in our case Frama-C).
We provide as a Frama-C cost plugin a simple automatic synthesiser for
complexity assertions (the blue lines in \autoref{itest1}), which can be
overridden by the user to increase or decrease accuracy. From the assertions,
proof obligations are generated (for example, to prove that the loop invariant
holds after one execution if it holds before, to prove that the whole program
execution takes at most 1228 cycles, etc.). The CVC3 prover closes all
obligations in a few seconds on routine commodity hardware.
% \fvset{commandchars=\\\{\}}
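
For a flavour of what the complexity assertions could look like, here is a
sketch in ACSL-like syntax over the instrumented counting function sketched in
the introduction. The contract, the loop annotations and the numeric costs are
our own illustrative guesses rather than the plugin's actual output; a
concrete whole-run figure such as the 1228 cycles quoted above only appears
once the input size is fixed.
\begin{verbatim}
/* Sketch only: Hoare-style cost assertions in ACSL-like syntax over the
   instrumented program.  __cost is the counter maintained by the
   instrumentation; every numeric constant below is made up. */
unsigned long __cost = 0;

/*@ requires n >= 0;
    ensures  __cost <= \old(__cost) + 128 + 109 * n;  */
int count_ge(int a[], int n, int t) {
  int i, c = 0;
  __cost += 91;
  /*@ loop invariant 0 <= i <= n;
      loop invariant __cost <= \at(__cost, Pre) + 91 + 109 * i;
      loop variant   n - i;  */
  for (i = 0; i < n; i++) {
    __cost += 109;
    if (a[i] >= t) c++;
  }
  __cost += 37;
  return c;
}
\end{verbatim}
In this style, proving the whole-run bound reduces to ordinary deductive
verification of the loop invariant, which is exactly the kind of obligation
that the text above reports CVC3 discharging in a few seconds.
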