# Changeset 3430 for Papers/fopara2013

Timestamp:
Feb 13, 2014, 2:12:30 PM
Message:

Revisions up to the end of section 3.

File:
1 edited

[…] the development of a technique for analysing non-functional properties of programs (time, space) at the source level, with little or no loss of accuracy and with a small trusted code base. The main software component developed is a certified compiler producing cost annotations. The compiler translates source code to object code and produces an instrumented copy of the […]

\section{Introduction}

%\paragraph{Problem statement.}
Computer programs can be specified with both functional constraints (what a program must do) and non-functional constraints (what resources---time, space, etc.---the program may use). In the current state of the art, functional properties are verified for high-level source code by combining user annotations (e.g. preconditions and invariants) with a variety of automated analyses (e.g. abstract interpretation, theorem proving, and so on). By contrast, non-functional properties are generally checked on low-level object code, but also demand information about high-level functional behaviour that must somehow be reconstructed.

This situation presents several problems: 1) it can be hard to infer this […] hard: how can we reflect a cost that depends on the execution state (e.g. the value of a register or a carry bit) to a cost that the user can understand looking at source code? 4) performing functional analysis on the object code makes it hard for the programmer to provide information, leading to less precision in the estimates.

[…] of program structure through compilation and optimisation, and to exploit that information to define a cost model for source code that is precise, non-uniform, and which accounts for runtime state. With such a source-level cost model we can reduce non-functional verification to the functional case and exploit the state of the art in automated high-level verification~\cite{survey}. The techniques […] analysis. Where the approach produces precise cost models too complex to reason about, safe approximations can be used to trade complexity with precision. Finally, source code analysis can be used during the early stages of development, when components have been specified but not implemented: source code modularity means that it is enough to specify the non-functional behaviour of unimplemented […]

[…] how to formally prove the correctness of compilers implementing this technique. We have implemented such a compiler from C to object binaries for the 8051 micro-controller, and verified it in an interactive theorem prover.
To demonstrate source-level verification of costs we have implemented a Frama-C plugin \cite{framac} that invokes the compiler on a source program and uses this to generate invariants on the high-level source that correctly model low-level costs. The plugin certifies that the program respects these costs by calling automated theorem provers, a new and innovative technique in the field of cost analysis. Finally, we have conducted several case studies, including showing that the plugin can automatically compute and certify the exact reaction time of Lustre~\cite{lustre} data flow programs compiled into C.

\section{Project context and approach}

Formal methods for verification of functional properties of programs have now reached a level of maturity and automation that is facilitating a slow but […] The scenario for the verification of non-functional properties (time spent, memory used, energy consumed) is bleaker.
% and the future seems to be getting even worse.
Most industries verify that real time systems meet their deadlines by simply performing many runs of the system and timing their execution, computing the maximum time and adding an empirical safety margin, claiming the result to be a bound for the WCET of the program. Formal methods and software to statically analyse the WCET of programs exist, but they often produce bounds that are too pessimistic to be useful. Recent advancements in hardware architecture have been focused on the improvement of the average case performance, not the predictability of the worst case. Execution time is becoming increasingly […] static analysis in isolation, because programs are less and less time composable. Clock-precise hardware models are necessary for static analysis, and obtaining them is becoming harder due to the increased sophistication of hardware design.

Despite the aforementioned problems, the need for reliable real time systems and programs is increasing, and there is increasing pressure from the research community towards the introduction of alternative hardware with more predictable behaviour, which would be more suitable for static analysis. One example being investigated by the Proartis project~\cite{proartis} is to decouple execution time from execution history by introducing randomisation.

In the CerCo project \cite{cerco} we do not try to address this problem, optimistically assuming that improvements in low-level timing analysis or architecture will make verification feasible in the longer term.
The main objective of our work is instead to bring together static analysis of functional and non-functional properties, which, according to the current state of the art, are completely […]

[…] program-independent cost model on the source language, the literature on the analysis of non-asymptotic execution time on high level languages that assumes such a model is growing and gaining momentum. However, unless we provide a replacement for such cost models, this literature's future practical impact looks to be minimal. Some hope has been provided by the EmBounded project \cite{embounded}, which compositionally compiles high-level code to a byte code that is executed by an interpreter with guarantees on the maximal execution time spent for each byte code instruction. This provides a uniform model at the expense of the model's precision (each cost is a pessimistic upper bound) and the performance of the executed code (because the byte code is interpreted compositionally instead of performing a fully non-compositional compilation).

[…] the worst case execution time of small code fragments in isolation (e.g. loop bodies) and then adding up the bounds yields very poor estimates, because no knowledge of the hardware state when executing the fragment can be assumed. By analysing longer runs the bound obtained becomes more precise because the lack of information about the initial state has a relatively small impact. To calculate the cost of an execution, value and control flow analyses are required to bound the number of times each basic block is executed.
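Spelled out (the notation below is ours, introduced only to make the dependency explicit; it does not appear in the changeset), the cost of a run $\pi$ decomposes as
\[
  \mathrm{cost}(\pi) \;=\; \sum_{b \in \mathit{Blocks}} \mathrm{cost}(b)\cdot \#_\pi(b),
\]
where $\mathrm{cost}(b)$ is the (possibly parametric) cost attached to basic block $b$ and $\#_\pi(b)$ is the number of times $b$ is executed along $\pi$; bounding the $\#_\pi(b)$ is precisely what the value and control flow analyses provide.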
Current state of the art WCET technology such as AbsInt performs these analyses on the object code, where the logic of the program is harder to reconstruct and most information available at the source code level has been lost; see~\cite{stateart} for a survey. Imprecision in the analysis can lead to useless bounds. To augment precision, the tools ask the user to provide constraints on the object code control flow, usually in the form of bounds on the number of iterations of loops or linear inequalities on them. This requires the user to manually link the source and object code, translating his assumptions on the source code (which may be wrong) to object code constraints. The task is error prone and hard, especially in the presence of complex compiler optimisations. Traditional techniques for WCET that work on object code are also affected by […]

[…] code is translated, must return the cost model for basic blocks of high level instructions. It must do so by keeping track of the control flow modifications to reverse them and by interfacing with processor timing analysis. By embracing compilation, instead of avoiding it like EmBounded did, a CerCo compiler can both produce efficient code and return costs that are as precise as the processor timing analysis can be. Moreover, our costs can be parametric: the cost of a block can depend on actual program data, on a summary of the execution history, or on an approximated representation of the hardware state. For example, loop optimizations may assign to a loop body a cost that is a function of the number of iterations performed. As another example, […]

[…] can be immediately reused and multiple techniques can collaborate together to infer and certify cost invariants for the program. Parametric cost analysis becomes the default one, with non-parametric bounds used as a last resort when trading the complexity of the analysis with its precision. \emph{A priori}, no technique previously used in traditional WCET is lost: processor timing analyses can be used by the compiler on the object code, and the rest can be applied at the source code level. Our approach can also work in the early […]

[…] All software used to verify properties of programs must be as bug free as possible. The trusted code base for verification consists of the code that needs to be trusted to believe that the property holds. The trusted code base of state-of-the-art WCET tools is very large: one needs to trust the control flow […] code and we are introducing a non-standard compiler too. To reduce the trusted code base, we implemented a prototype and a static analyser in an interactive theorem prover, which was used to certify that the costs added to the source code are indeed those incurred by the hardware. Formal models of the hardware and of the high level source languages were also implemented in the interactive theorem prover. Control flow analysis on the source code has been […] execution time and the stack space usage.

The red lines in \autoref{itest1} starting with \lstinline'__cost' and \lstinline'__stack' are the instrumentation introduced by the compiler. For example, the two calls at the start of \lstinline'count' say that 4 bytes of stack are required, and that it takes 111 cycles to reach the next cost annotation (in the loop body). The compiler measures these on the labeled object code that it generates. The annotated program can then be enriched with complexity assertions in the style of Hoare logic, which are passed to a deductive platform (in our case Frama-C). We provide as a Frama-C cost plugin a simple automatic synthesiser for complexity assertions which can be overridden by the user to increase or decrease accuracy. These are the blue comments starting with \lstinline'/*@' in \autoref{itest1}. From the assertions, a general purpose deductive platform produces proof obligations which in turn can be closed by automatic or interactive […] to prove that the whole program execution takes at most 1358 cycles, etc.). Note that the synthesised time bound for \lstinline'count', $178+214*(1+\text{\lstinline'len'})$ cycles, is parametric in the length of the array. The CVC3 prover closes all obligations within half a minute on routine commodity hardware.

From the listing in \autoref{itest1}:

    ensures __cost <= \old(__cost)+(178+214*__max(1+\at(len,Pre), 0)); */
    int count(char *p, int len) {
      char j;  int found = 0;
      |__stack_incr(4);  __cost_incr(111);|
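To give a concrete, compilable picture of the instrumentation and of the synthesised assertion discussed above, the following is a minimal sketch assembled around the fragment shown just before it. Only the signature, the local declarations, the two initial calls (__stack_incr(4), __cost_incr(111)) and the ensures clause come from the changeset; the definitions of the __cost/__stack helpers, the loop body, the per-iteration increment of 214 cycles and the __stack_decr call are assumptions made purely for illustration.

```c
#include <stdio.h>

/* Stand-in definitions for the instrumentation emitted by the compiler,
   modelled here as ordinary globals so that the sketch compiles alone. */
unsigned long __cost = 0;        /* clock cycles accounted for so far  */
unsigned long __stack = 0;       /* current stack usage in bytes       */
unsigned long __stack_max = 0;   /* high-water mark of __stack         */

void __cost_incr(unsigned long k)  { __cost += k; }
void __stack_incr(unsigned long k) {
  __stack += k;
  if (__stack > __stack_max) __stack_max = __stack;
}
void __stack_decr(unsigned long k) { __stack -= k; }  /* assumed counterpart */

/*@ ensures __cost <= \old(__cost)+(178+214*__max(1+\at(len,Pre), 0)); */
int count(char *p, int len) {
  char j;  int found = 0;
  __stack_incr(4);    /* 4 bytes of stack are needed for this call           */
  __cost_incr(111);   /* 111 cycles to reach the next annotation (loop body) */
  for (j = 0; ; j++) {
    __cost_incr(214); /* assumed: one annotation per loop test, giving the
                         214*(1+len) term of the synthesised bound           */
    if (j >= len) break;
    if (p[j] == 0) found++;   /* assumed loop body, for illustration only    */
  }
  __stack_decr(4);    /* assumed: frame released on return                   */
  return found;
}

int main(void) {
  char buf[4] = {0, 1, 0, 2};
  int n = count(buf, 4);
  printf("found=%d, __cost=%lu cycles, stack high-water=%lu bytes\n",
         n, __cost, __stack_max);
  return 0;
}
```

Compiling and running the sketch merely replays the instrumentation; in the actual toolchain the increments are measured on the labelled object code produced by the compiler, and the Frama-C plugin discharges the bound statically rather than by execution.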
[…] many changes and more compiler code, but it is not fundamentally more complex. The proof of correctness, however, becomes much harder.

\item We target a microprocessor that has a non-uniform memory model, which is still often the case for microprocessors used in embedded systems and that is becoming common again in multi-core processors. Therefore the compiler has to […]