# Changeset 3317

Ignore:
Timestamp:
Jun 5, 2013, 7:02:47 PM (6 years ago)
Message:

moved workflow figure and added a descriptive text

File:
1 edited

### Legend:

Unmodified
 r3316 The long term success of the project is hindered by the increased complexity of the static prediction of the non functional behavior of modern hardware. In the time frame of the European contribution we have focused on the general methodology and on the difficulties related to the development and certification of a cost model inducing compiler. \section{Main S\&T results} We will now review the main S\&T results achieved in the CerCo project. We will address them in the following order: \begin{enumerate} \item \emph{The (basic) labelling approach.} It is the main technique that underlies all the developments in CerCo. It allows to track the evolution of basic blocks during compilation in order to propagate the cost model from the object code to the source code without loosing precision in the process. \item \emph{Dependent labelling.} The basic labelling approach assumes a bijective mapping between object code and source code O(1) blocks (called basic blocks). This assumption is violated by many program optimizations (e.g. loop peeling and loop unrolling). It also assumes the cost model computed on the object code to be non parametric: every block must be assigned a cost that does not depend on the state. This assumption is violated by stateful hardware like pipelines, caches, branch prediction units. The dependent labelling approach is an extension of the basic labelling approach that allows to handle parametric cost models. We showed how the method allows to deal with loop optimizations and pipelines, and we speculated about its applications to caches. \item \emph{Techniques to exploit the induced cost model.} Every technique used for the analysis of functional properties of programs can be adapted to analyse the non functional properties of the source code instrumented by compilers that implement the labelling approach. In order to gain confidence in this claim, we showed how to implement a cost invariant generator combining abstract interpretation with separation logic ideas. We integrated everything in the Frama-C modular architecture, in order to automatically compute proof obligations from the functional and the cost invariants and to use automatic theorem provers to proof them. This is an example of a new technique that is not currently exploited in WCET analysis. It also shows how precise functional invariants benefits the non functional analysis too. Finally, we show how to fully automatically analyse the reaction time of Lustre nodes that are first compiled to C using a standard Lustre compiler and then processed by a C compiler that implements the labelling approach. \item \emph{The CerCo compiler.} This is a compiler from a large subset of the C program to 8051/8052 object code. The compiler implements the labelling approach and integrates a static analyser for 8051 executables. The latter can be implemented easily and does not require dependent costs because the 8051 microprocessor is a very simple processor whose instructions generally have a constant cost. It was picked to separate the issue of exact propagation of the cost model from the target to the source language from the orthogonal problem of the static analysis of object code that requires approximations or dependent costs. The compiler comes in several versions: some are prototypes implemented directly in OCaml, and they implement both the basic and dependent labelling approaches; the final version is extracted from a Matita certification and at the moment implements only the basic approach. \item \emph{A formal cost certification of the CerCo compiler.} We implemented the CerCo compiler in the interactive theorem prover Matita and have formally certified that the cost model induced on the source code correctly and precisely predicts the object code behavior. We actually induce two cost models, one for time and one for stack space used. We show the correctness of the prediction only for those programs that do not exhaust the available machine stack space, a property that thanks to the stack cost model we can statically analyse on the source code using our Frama-C tool. The preservation of functional properties we take as an assumption, not itself formally proved in CerCo.  Other projects have already certified the preservation of functional semantics in similar compilers, and we have not attempted to directly repeat that work. In order to complete the proof for non-functional properties, we have introduced a new semantics for programming languages based on a new kind of structured observables with the relative notions of forward similarity and the study of the intentional consequences of forward similarity. We have also introduced a unified representation for back-end intermediate languages that was exploited to provide a uniform proof of forward similarity. \end{enumerate} \subsection{The (basic) labelling approach.} \paragraph{Problem statement:} given a source program P, we want to obtain an instrumented source program P',  written in the same programming language, and the object code O such that: 1) P' is obtained by inserting into P some additional instructions to update global cost information like the amount of time spent during execution or the maximal stack space required; 2) P and P' must have the same functional behavior, i.e., they must produce that same output and intermediate observables; 3) P and O must have the same functional behavior; 4) after execution and in interesting points during execution, the cost information computed by P' must be an upper bound of the one spent by O  to perform the corresponding operations (soundness property); 5) the difference between the costs computed by P' and the execution costs of O must be bounded by a program dependent constant (precision property). \paragraph{The labeling software components:} we solve the problem in four stages, implemented by four software components that are used in sequence. \begin{enumerate} \item The first component labels the source program P by injecting label emission statements in the program in appropriate positions. The set of labels with their positions is called labelling. The syntax and semantics of the source programming language is augmented with label emission statements. The statement “EMIT l” behaves like a NOP instruction that does not affect the program state or control flow, but it changes the semantics by making the label l observable. Therefore the observables of a run of a program becomes a stream of label emissions: l1 … ln, called the program trace. We clarify the conditions that the labelling must respect later. \item The second component is a labelling preserving compiler. It can be obtained from an existing compiler by adding label emission statements to every intermediate language and by propagating label emission statements during compilation. The compiler is correct if it preserves both the functional behavior of the program and the generated traces. We may also ask that the function that erases the cost emission statements commute with compilation. This optional property grants that the labelling does not interfere with the original compiler behavior. A further set of requirements will be added later. \item The third component is a static object code analyser. It takes in input the object code augmented with label emission statements and it computes, for every such statement, its scope. The scope of a label emission statement is the tree of instructions that may be executed after the statement and before a new label emission statement is found. It is a tree and not a sequence because the scope may contain a branching statement. In order to grant that such a finite tree exists, the object code must not contain any loop that is not broken by a label emission statement. This is the first requirement of a sound labelling. The analyser fails if the labelling is unsound. For each scope, the analyser computes an upper bound of the execution time required by the scope, using the maximum of the costs of the two branches in case of a conditional statement. Finally, the analyser computes the cost of a label by taking the maximum of the costs of the scopes of every statement that emits that label. \item The fourth and last component takes in input the cost model computed at step 3 and the labelled code computed at step 1. It outputs a source program obtained by replacing each label emission statement with a statement that increments the global cost variable with the cost associated to the label by the cost model.  The obtained source code is the instrumented source code. \end{enumerate} \paragraph{Correctness:} Requirements 1, 2 and 3 of the program statement obviously hold, with 2 and 3 being a consequence of the definition of a correct labelling preserving compiler. It is also obvious that the value of the global cost variable of an instrumented source code is at any time equal to the sum of the costs of the labels emitted by the corresponding labelled code. Moreover, because the compiler preserves all traces, the sum of the costs of the labels emitted in the source and target labelled code are the same. Therefore, to satisfy the 4th requirement, we need to grant that the time taken to execute the object code is equal to the sum of the costs of the labels emitted by the object code. We collect all the necessary conditions for this to happen in the definition of a sound labelling: a) all loops must be broken by a cost emission statement;  b) all program instructions must be in the scope of some cost emission statement. To satisfy also the 5th requirement, additional requirements must be imposed on the object code labelling to avoid all uses of the maximum in the cost computation: the labelling is precise if every label is emitted at most once and both branches of a conditional jump start with a label emission statement. The correctness and precision of the labelling approach only rely on the correctness and precision of the object code labelling. The simplest, but not necessary, way to achieve them is to impose correctness and precision requirements also on the initial labelling produced by the first software component, and to ask the labelling preserving compiler to preserve these properties too. The latter requirement imposes serious limitations on the compilation strategy and optimizations: the compiler may not duplicate any code that contains label emission statements, like loop bodies. Therefore several loop optimizations like peeling or unrolling are prevented. Moreover, precision of the object code labelling is not sufficient per se to obtain global precision: we also implicitly assumed the static analysis to be able to associate a precise constant cost to every instruction. This is not possible in presence of stateful hardware whose state influences the cost of operations, like pipelines and caches. In the next section we will see an extension of the basic labelling approach to cover this situation. The labelling approach described in this section can be applied equally well and with minor modifications to imperative and functional languages. In the CerCo project, we have tested it on a simple imperative language without functions (a While language), to a subset of C and to two compilation chains for a purely functional higher order language. The two main changes required to deal with functional languages are: 1) because global variables and updates are not available, the instrumentation phase produces monadic code to “update” the global costs; 2) the requirements for a sound and precise labelling of the source code must be changed when the compilation is based on CPS translations. In particular, we need to introduce both labels emitted before a statement is executed and labels emitted after a statement is executed. The latter capture code that is inserted by the CPS translation and that would escape all label scopes. Phases 1, 2 and 3 can be applied as well to logic languages (e.g. Prolog). However, the instrumentation phase cannot: in standard Prolog there is no notion of (global) variable whose state is not retracted during backtracking. Therefore, the cost of executing computations that are later backtracked would not be correctly counted in. Any extension of logic languages with non-backtrackable state should support the labelling approach. \subsection{Dependent labelling.} The core idea of the basic labelling approach is to establish a tight connection between basic blocks executed in the source and target languages. Once the connection is established, any cost model computed on the object code can be transferred to the source code, without affecting the code of the compiler or its proof. In particular, it is immediate that we can also transport cost models that associate to each label functions from hardware state to natural numbers. However, a problem arises during the instrumentation phase that replaces cost emission statements with increments of global cost variables. The global cost variable must be incremented with the result of applying the function associated to the label to the hardware state at the time of execution of the block. The hardware state comprises both the “functional” state that affects the computation (the value of the registers and memory) and the “non functional” state that does not (the pipeline and caches content for example). The former is in correspondence with the source code state, but reconstructing the correspondence may be hard and lifting the cost model to work on the source code state is likely to produce cost expressions that are too hard to reason on. Luckily enough, in all modern architectures the cost of executing single instructions is either independent of the functional state or the jitter --- the difference between the worst and best case execution times --- is small enough to be bounded without loosing too much precision. Therefore we can concentrate on dependencies over the “non functional” parts of the state only. The non functional state has no correspondence in the high level state and does not influence the functional properties. What can be done is to expose the non functional state in the source code. We just present here the basic intuition in a simplified form: the technical details that allow to handle the general case are more complex and can be found in the CerCo deliverables. We add to the source code an additional global variable that represents the non functional state and another one that remembers the last labels emitted. The state variable must be updated at every label emission statement, using an update function which is computed during static analysis too. The update function associates to each label a function from the recently emitted labels and old state to the new state. It is computed composing the semantics of every instruction in a basic block and restricting it to the non functional part of the state. Not all the details of the non functional state needs to be exposed, and the technique works better when the part of state that is required can be summarized in a simple data structure. For example, to handle simple but realistic pipelines it is sufficient to remember a short integer that encodes the position of bubbles (stuck instructions) in the pipeline. In any case, the user does not need to understand the meaning of the state to reason over the properties of the program. Moreover, at any moment the user or the invariant generator tools that analyse the instrumented source code produced by the compiler can decide to trade precision of the analysis with simplicity by approximating the parametric cost with safe non parametric bounds. Interestingly, the functional analysis of the code can determine which blocks are executed more frequently in order to approximate more aggressively the ones that are executed less. The idea of dependent labelling can also be applied to allow the compiler to duplicate blocks that contain labels (e.g. to allow loop optimizations). The effect of duplication is to assign a different cost to the different occurrences of a duplicated label. For example, loop peeling turns a loop into the concatenation of a copy of the loop body (that executes the first iteration) with the conditional execution of the loop (for the successive iterations). Because of further optimizations, the two copies of the loop will be compiled differently, with the first body usually taking more time. By introducing a variable that keep tracks of the iteration number, we can associate to the label a cost that is a function of the iteration number. The same technique works for loop unrolling without modifications: the function will assign a cost to the even iterations and another cost to the odd ones. The actual work to be done consists in introducing in the source code and for each loop a variable that counts the number of iterations. The loop optimization code that duplicate the loop bodies must also modify the code to propagate correctly the update of the iteration numbers. The technical details are more complex and can be found in the CerCo reports and publications. The implementation, however, is quite simple and the changes to a loop optimizing compiler are minimal. \subsection{Techniques to exploit the induced cost model.} We review the cost synthesis techniques developed in the project. The starting hypothesis is that we have a certified methodology to annotate blocks in the source code with constants which provide a sound and possibly precise upper bound on the cost of executing the blocks after compilation to object code. The principle that we have followed in designing the cost synthesis tools is that the synthetic bounds should be expressed and proved within a general purpose tool built to reason on the source code. In particular, we rely on the Frama − C tool to reason on C code and on the Coq proof-assistant to reason on higher-order functional programs. This principle entails that: 1) The inferred synthetic bounds are indeed correct as long as the general purpose tool is. 2) There is no limitation on the class of programs that can be handled as long as the user is willing to carry on an interactive proof. Of course, automation is desirable whenever possible. Within this framework, automation means writing programs that give hints to the general purpose tool. These hints may take the form, say, of loop invariants/variants, of predicates describing the structure of the heap, or of types in a light logic. If these hints are correct and sufficiently precise the general purpose tool will produce a proof automatically, otherwise, user interaction is required. \paragraph{The Cost plug-in and its application to the Lustre compiler} Frama − C is a set of analysers for C programs with a specification language called ACSL. New analyses can be dynamically added through a plug-in system. For instance, the Jessie plug-in allows deductive verification of C programs with respect to their specification in ACSL, with various provers as back-end tools. We developed the CerCo Cost plug-in for the Frama − C platform as a proof of concept of an automatic environment exploiting the cost annotations produced by the CerCo compiler. It consists of an OCaml program which in first approximation takes the following actions: (1) it receives as input a C program, (2) it applies the CerCo compiler to produce a related C program with cost annotations, (3) it applies some heuristics to produce a tentative bound on the cost of executing the C functions of the program as a function of the value of their parameters, (4) the user can then call the Jessie tool to discharge the related proof obligations. In the following we elaborate on the soundness of the framework and the experiments we performed with the Cost tool on the C programs produced by a Lustre compiler. \paragraph{Soundness} The soundness of the whole framework depends on the cost annotations added by the CerCo compiler, the synthetic costs produced by the Cost plug-in, the verification conditions (VCs) generated by Jessie, and the external provers discharging the VCs. The synthetic costs being in ACSL format, Jessie can be used to verify them. Thus, even if the added synthetic costs are incorrect (relatively to the cost annotations), the process as a whole is still correct: indeed, Jessie will not validate incorrect costs and no conclusion can be made about the WCET of the program in this case. In other terms, the soundness does not really depend on the action of the Cost plug-in, which can in principle produce any synthetic cost. However, in order to be able to actually prove a WCET of a C function, we need to add correct annotations in a way that Jessie and subsequent automatic provers have enough information to deduce their validity. In practice this is not straightforward even for very simple programs composed of branching and assignments (no loops and no recursion) because a fine analysis of the VCs associated with branching may lead to a complexity blow up. \paragraph{Experience with Lustre} Lustre is a data-flow language to program synchronous systems and the language comes with a compiler to C. We designed a wrapper for supporting Lustre files. The C function produced by the compiler implements the step function of the synchronous system and computing the WCET of the function amounts to obtain a bound on the reaction time of the system. We tested the Cost plug-in and the Lustre wrapper on the C programs generated by the Lustre compiler. For programs consisting of a few hundreds loc, the Cost plug-in computes a WCET and Alt − Ergo is able to discharge all VCs automatically. \paragraph{Handling C programs with simple loops} The cost annotations added by the CerCo compiler take the form of C instructions that update by a constant a fresh global variable called the cost variable. Synthesizing a WCET of a C function thus consists in statically resolving an upper bound of the difference between the value of the cost variable before and after the execution of the function, i.e. find in the function the instructions that update the cost variable and establish the number of times they are passed through during the flow of execution. In order to do the analysis the plugin makes the following assumptions on the programs: 1. No recursive functions. 2. Every loop must be annotated with a variant. The variants of ‘for’ loops are automatically inferred. The plugin proceeds as follows. \begin{enumerate} \item First the call graph of the program is computed. If the function f calls the function g then the function g is processed before the function f . \item The computation of the cost of the function is performed by traversing its control flow graph. The cost at a node is the maximum of the costs of the successors. \item In the case of a loop with a body that has a constant cost for every step of the loop, the cost is the product of the cost of the body and of the variant taken at the start of the loop. \item In the case of a loop with a body whose cost depends on the values of some free variables, a fresh logic function f is introduced to represent the cost of the loop in the logic assertions. This logic function takes the variant as a first parameter. The other parameters of f are the free variables of the body of the loop. An axiom is added to account the fact that the cost is accumulated at each step of the loop. \item The cost of the function is directly added as post-condition of the function \end{enumerate} The user can influence the annotation by different means: by using more precise variants or by annotating function with cost specification. The plugin will use this cost for the function instead of computing it. \paragraph{C programs with pointers} When it comes to verifying programs involving pointer-based data structures, such as linked lists, trees, or graphs, the use of traditional first-order logic to specify, and of SMT solvers to verify, shows some limitations. Separation logic is then an elegant alternative. Designed at the turn of the century, it is a program logic with a new notion of conjunction to express spatial heap separation.Bobot has recently introduced in the Jessie plug-in automatically generated separation predicates that allow to simulate separation logic reasoning in a traditional verification framework where the specification language, the verification condition generator, and the theorem provers were not designed with separation logic in mind. CerCo's plug-in can exploit the separation predicates to automatically reason on the cost of execution of simple heap manipulation programs such as an in-place list reversal. \subsection{The CerCo Compiler} In CerCo we have developed a certain number of cost preserving compilers based on the labelling approach. Excluding an initial certified compiler for a While language, all remaining compilers target realistic source languages --- a pure higher order functional language and a large subset of C with pointers, gotos and all data structures --- and real world target processors --- MIPS and the Intel 8051/8052 processor family. Moreover, they achieve a level of optimization that ranges from moderate (comparable to gcc level 1) to intermediate (including loop peeling and unrolling, hoisting and late constant propagation). The so called Trusted CerCo Compiler is the only one that was implemented in the interactive theorem prover Matita and its costs certified. The code distributed is obtained extracting OCaml code from the Matita implementation. In the rest of this section we will only focus on the Trusted CerCo Compiler, that only targets the C language and the 8051/8052 family, and that does not implement the advanced optimizations yet. Its user interface, however, is the same as the one of the other versions, in order to trade safety with additional performances. In particular, the Frama-C CerCo plug-in can work without recompilation with all compilers. The (trusted) CerCo compiler implements the following optimizations: cast simplification, constant propagation in expressions, liveness analysis driven spilling of registers, dead code elimination, branch displacement, tunneling. The two latter optimizations are performed by the optimizing assembler which is part of the compiler. The back-end of the compiler works on three addresses instructions, preferred to static single assignment code for the simplicity of the formal certification. The CerCo compiler is loosely based on the CompCert compiler, a recently developed certified compiler from C to the PowerPC, ARM and x86 microprocessors. Contrarily to CompCert, both the CerCo code and its certification are open source. Some data structures and language definitions for the front-end are directly taken from CompCert, while the back-end is a redesign and reimplementation of a didactic compiler from Pascal to MIPS used by Francois Pottier for a course at the Ecole Polytechnique. The main peculiarities of the CerCo compiler are: \begin{enumerate} \item all of our intermediate languages include label emitting instructions to implement the labelling approach, and the compiler preserves execution traces. \item tradiniotally compilers target an assembly language with additional macro-instructions to be expanded before assembly; for CerCo we need to go all the way down to object code in order to perform the static analysis. Therefore we developed also an optimizing assembler and a static analyser, all integrated in the compiler. \item to avoid implementing a linker and a loader, we do not support separate compilation and external calls. Adding a linker and a loader is a transparent process to the labelling approach and should create no unknown problem \item we target an 8-bit processor. Targeting an 8 bit processor requires several changes and increased code size, but it is not fundamentally more complex. The proof of correctness, however, becomes much harder. \item we target a microprocessor that has a non uniform memory model, which is still often the case for microprocessors used in embedded systems and that is becoming common again in multi-core processors. Therefore the compiler has to keep track of the position of data and it must move data between memory regions in the proper way. Also the size of pointers to different regions is not uniform. In our case, an additional difficulty was that the space available for the stack in internal memory in the 8051 is tiny, allowing only a minor number of nested calls. To support full recursion in order to test the CerCo tools also on recursive programs, the compiler manually manages a stack in external memory. \end{enumerate} \section{A formal certification of the CerCo compiler} The Trusted CerCo Compiler has been implemented and certified using the interactive theorem prover Matita. The details on the proof techniques employed and the proof sketch can be collected from the CerCo deliverables and papers. In this section we will only hint at the correctness statement, which turned out to be more complex than what we expected at the beginning. \paragraph{The statement} The most natural statement of correctness for our complexity preserving compiler is that the time spent for execution by a terminating object code program should be the time predicted on the source code by adding up the cost of every label emission statement. This statement, however, is too naïve to be useful for real world real time programs like those used in embedded systems. Real time programs are often reactive programs that loop forever responding to events (inputs) by performing some computation followed by some action (output) and the return to the initial state. For looping programs the overall execution time does not make sense. The same happens for reactive programs that spend an unpredictable amount of time in I/O. What is interesting is the reaction time that measure the time spent between I/O events. Moreover, we are interested in predicting and ruling out programs that crash running out of space on a certain input. Therefore we need to look for a more complex statement that talks about sub-runs of a program. The most natural statement is that the time predicted on the source code and spent on the object code by two corresponding sub-runs are the same. The problem to solve to make this statement formal is how to identify the corresponding sub-runs and how to single out those that are meaningful. The solution we found is based on the notion of measurability. We say that a run has a measurable sub-run when both the prefix of the sub-run and the sub-run do not exhaust the stack space, the number of function calls and returns in the sub-run is the same, the sub-run does not perform any I/O and the sub-run starts with a label emission statement and ends with a return or another label emission statements. The stack usage is estimated using the stack usage model that is computed by the compiler. The statement that we want to formally prove is that for each C run with a measurable sub-run there exists an object code run with a sub-run such that the observables of the pairs of prefixes and sub-runs are the same and the time spent by the object code in the sub-run is the same as the one predicted on the source code using the time cost model generated by the compiler. We briefly discuss the constraints for measurability. Not exhausting the stack space is a clear requirement of meaningfulness of a run, because source programs do not crash for lack of space while object code ones do. The balancing of function calls/returns is a requirement for precision: the labelling approach allows the scope of label emission statements to extend after function calls to minimize the number of labels. Therefore a label pays for all the instructions in a block, excluding those executed in nested function calls. If the number of calls/returns is unbalanced, it means that there is a call we have not returned to that could be followed by additional instructions whose cost has already been taken in account. To make the statement true (but less precise) in this situation, we could only say that the cost predicted on the source code is a safe bound, not that it is exact. The last condition on the entry/exit points of a run is used to identify sub-runs whose code correspond to a sequence of blocks that we can measure precisely. Any other choice would start/end the run in the middle of a block and we would be forced again to weaken the statement taking as a bound the cost obtained counting in all the instructions that precede the starting one in the block/follow the final one in the block. I/O operations can be performed in the prefix of the run, but not in the measurable sub-run. Therefore we prove that we can predict reaction times, but not I/O times, as it should be. \section{The typical CerCo workflow} \begin{figure}[t] \begin{tabular}{l@{\hspace{0.2cm}}|@{\hspace{0.2cm}}l} \label{test1} \end{figure} In order to illustrate the typical workflow we envision, we will follow what is done on an example program (on the left of \autoref{test1}, while on the right a depiction of the workflow can be found). The user writes the C program and feeds it to the Cerco compiler, which outputs an annotated version of the same program. The annotated program can then be enriched with complexity assertions in the style of Hoare logic, that are passed to a deductive platform (in our case Frama-C). Optionally a part of these assertions can be automatically sinthesised by the Frama-C cost plugin. The deductive platform builds the proof obligations which in turn can be closed by automatic or interactive provers, ending in a proof certificate. \autoref{itest1} shows the result of the process as far as the CerCo part is concerned. The original code is annotated by two kinds of annotations: time increments (in clock cycles) and stack variations (in bytes). In this particular case all data (including the return address) is kept in hardware registers, so no stack is used. The initialisation of the \texttt{\_\_cost} and \texttt{\_\_stack} variables tell respectively the initialisation time cost (before \texttt{main} is called) and the amount of memory occupied by global data. The output shown is verified by the automatic prover CVC3 in 8 seconds. % \fvset{commandchars=\\\{\}} \lstset{morecomment=[s][\color{blue}]{/*@}{*/}, moredelim=[is][\color{blue}]{$}{$}, moredelim=[is][\color{red}]{|}{|}} \begin{figure}[!p] \begin{lstlisting} } \end{lstlisting} \caption{The instrumented version of the program in \autoref{test1}. Lines in red are instrumentation added by the CerCo compiler. The cost invariants in blue are added by the Frama-C plugin and automatically verified by CVC3 in 8s. The time costs are in clock cycles. Being non recursive, the main function uses no stack (all the variables are kept in registers). The global data occupies 5 bytes and 33 clock cycles for the initialization (before calling \texttt{main}). } \caption{The instrumented version of the program in \autoref{test1}, with lines added by the CerCo compiler in red and those added by the CerCo Frama-C plugin in blue.} \label{itest1} \end{figure} \section{Main S\&T results} We will now review the main S\&T results achieved in the CerCo project. We will address them in the following order: \begin{enumerate} \item \emph{The (basic) labelling approach.} It is the main technique that underlies all the developments in CerCo. It allows to track the evolution of basic blocks during compilation in order to propagate the cost model from the object code to the source code without loosing precision in the process. \item \emph{Dependent labelling.} The basic labelling approach assumes a bijective mapping between object code and source code O(1) blocks (called basic blocks). This assumption is violated by many program optimizations (e.g. loop peeling and loop unrolling). It also assumes the cost model computed on the object code to be non parametric: every block must be assigned a cost that does not depend on the state. This assumption is violated by stateful hardware like pipelines, caches, branch prediction units. The dependent labelling approach is an extension of the basic labelling approach that allows to handle parametric cost models. We showed how the method allows to deal with loop optimizations and pipelines, and we speculated about its applications to caches. \item \emph{Techniques to exploit the induced cost model.} Every technique used for the analysis of functional properties of programs can be adapted to analyse the non functional properties of the source code instrumented by compilers that implement the labelling approach. In order to gain confidence in this claim, we showed how to implement a cost invariant generator combining abstract interpretation with separation logic ideas. We integrated everything in the Frama-C modular architecture, in order to automatically compute proof obligations from the functional and the cost invariants and to use automatic theorem provers to proof them. This is an example of a new technique that is not currently exploited in WCET analysis. It also shows how precise functional invariants benefits the non functional analysis too. Finally, we show how to fully automatically analyse the reaction time of Lustre nodes that are first compiled to C using a standard Lustre compiler and then processed by a C compiler that implements the labelling approach. \item \emph{The CerCo compiler.} This is a compiler from a large subset of the C program to 8051/8052 object code. The compiler implements the labelling approach and integrates a static analyser for 8051 executables. The latter can be implemented easily and does not require dependent costs because the 8051 microprocessor is a very simple processor whose instructions generally have a constant cost. It was picked to separate the issue of exact propagation of the cost model from the target to the source language from the orthogonal problem of the static analysis of object code that requires approximations or dependent costs. The compiler comes in several versions: some are prototypes implemented directly in OCaml, and they implement both the basic and dependent labelling approaches; the final version is extracted from a Matita certification and at the moment implements only the basic approach. \item \emph{A formal cost certification of the CerCo compiler.} We implemented the CerCo compiler in the interactive theorem prover Matita and have formally certified that the cost model induced on the source code correctly and precisely predicts the object code behavior. We actually induce two cost models, one for time and one for stack space used. We show the correctness of the prediction only for those programs that do not exhaust the available machine stack space, a property that thanks to the stack cost model we can statically analyse on the source code using our Frama-C tool. The preservation of functional properties we take as an assumption, not itself formally proved in CerCo.  Other projects have already certified the preservation of functional semantics in similar compilers, and we have not attempted to directly repeat that work. In order to complete the proof for non-functional properties, we have introduced a new semantics for programming languages based on a new kind of structured observables with the relative notions of forward similarity and the study of the intentional consequences of forward similarity. We have also introduced a unified representation for back-end intermediate languages that was exploited to provide a uniform proof of forward similarity. \end{enumerate} \subsection{The (basic) labelling approach.} \paragraph{Problem statement:} given a source program P, we want to obtain an instrumented source program P',  written in the same programming language, and the object code O such that: 1) P' is obtained by inserting into P some additional instructions to update global cost information like the amount of time spent during execution or the maximal stack space required; 2) P and P' must have the same functional behavior, i.e., they must produce that same output and intermediate observables; 3) P and O must have the same functional behavior; 4) after execution and in interesting points during execution, the cost information computed by P' must be an upper bound of the one spent by O  to perform the corresponding operations (soundness property); 5) the difference between the costs computed by P' and the execution costs of O must be bounded by a program dependent constant (precision property). \paragraph{The labeling software components:} we solve the problem in four stages, implemented by four software components that are used in sequence. \begin{enumerate} \item The first component labels the source program P by injecting label emission statements in the program in appropriate positions. The set of labels with their positions is called labelling. The syntax and semantics of the source programming language is augmented with label emission statements. The statement “EMIT l” behaves like a NOP instruction that does not affect the program state or control flow, but it changes the semantics by making the label l observable. Therefore the observables of a run of a program becomes a stream of label emissions: l1 … ln, called the program trace. We clarify the conditions that the labelling must respect later. \item The second component is a labelling preserving compiler. It can be obtained from an existing compiler by adding label emission statements to every intermediate language and by propagating label emission statements during compilation. The compiler is correct if it preserves both the functional behavior of the program and the generated traces. We may also ask that the function that erases the cost emission statements commute with compilation. This optional property grants that the labelling does not interfere with the original compiler behavior. A further set of requirements will be added later. \item The third component is a static object code analyser. It takes in input the object code augmented with label emission statements and it computes, for every such statement, its scope. The scope of a label emission statement is the tree of instructions that may be executed after the statement and before a new label emission statement is found. It is a tree and not a sequence because the scope may contain a branching statement. In order to grant that such a finite tree exists, the object code must not contain any loop that is not broken by a label emission statement. This is the first requirement of a sound labelling. The analyser fails if the labelling is unsound. For each scope, the analyser computes an upper bound of the execution time required by the scope, using the maximum of the costs of the two branches in case of a conditional statement. Finally, the analyser computes the cost of a label by taking the maximum of the costs of the scopes of every statement that emits that label. \item The fourth and last component takes in input the cost model computed at step 3 and the labelled code computed at step 1. It outputs a source program obtained by replacing each label emission statement with a statement that increments the global cost variable with the cost associated to the label by the cost model.  The obtained source code is the instrumented source code. \end{enumerate} \paragraph{Correctness:} Requirements 1, 2 and 3 of the program statement obviously hold, with 2 and 3 being a consequence of the definition of a correct labelling preserving compiler. It is also obvious that the value of the global cost variable of an instrumented source code is at any time equal to the sum of the costs of the labels emitted by the corresponding labelled code. Moreover, because the compiler preserves all traces, the sum of the costs of the labels emitted in the source and target labelled code are the same. Therefore, to satisfy the 4th requirement, we need to grant that the time taken to execute the object code is equal to the sum of the costs of the labels emitted by the object code. We collect all the necessary conditions for this to happen in the definition of a sound labelling: a) all loops must be broken by a cost emission statement;  b) all program instructions must be in the scope of some cost emission statement. To satisfy also the 5th requirement, additional requirements must be imposed on the object code labelling to avoid all uses of the maximum in the cost computation: the labelling is precise if every label is emitted at most once and both branches of a conditional jump start with a label emission statement. The correctness and precision of the labelling approach only rely on the correctness and precision of the object code labelling. The simplest, but not necessary, way to achieve them is to impose correctness and precision requirements also on the initial labelling produced by the first software component, and to ask the labelling preserving compiler to preserve these properties too. The latter requirement imposes serious limitations on the compilation strategy and optimizations: the compiler may not duplicate any code that contains label emission statements, like loop bodies. Therefore several loop optimizations like peeling or unrolling are prevented. Moreover, precision of the object code labelling is not sufficient per se to obtain global precision: we also implicitly assumed the static analysis to be able to associate a precise constant cost to every instruction. This is not possible in presence of stateful hardware whose state influences the cost of operations, like pipelines and caches. In the next section we will see an extension of the basic labelling approach to cover this situation. The labelling approach described in this section can be applied equally well and with minor modifications to imperative and functional languages. In the CerCo project, we have tested it on a simple imperative language without functions (a While language), to a subset of C and to two compilation chains for a purely functional higher order language. The two main changes required to deal with functional languages are: 1) because global variables and updates are not available, the instrumentation phase produces monadic code to “update” the global costs; 2) the requirements for a sound and precise labelling of the source code must be changed when the compilation is based on CPS translations. In particular, we need to introduce both labels emitted before a statement is executed and labels emitted after a statement is executed. The latter capture code that is inserted by the CPS translation and that would escape all label scopes. Phases 1, 2 and 3 can be applied as well to logic languages (e.g. Prolog). However, the instrumentation phase cannot: in standard Prolog there is no notion of (global) variable whose state is not retracted during backtracking. Therefore, the cost of executing computations that are later backtracked would not be correctly counted in. Any extension of logic languages with non-backtrackable state should support the labelling approach. \subsection{Dependent labelling.} The core idea of the basic labelling approach is to establish a tight connection between basic blocks executed in the source and target languages. Once the connection is established, any cost model computed on the object code can be transferred to the source code, without affecting the code of the compiler or its proof. In particular, it is immediate that we can also transport cost models that associate to each label functions from hardware state to natural numbers. However, a problem arises during the instrumentation phase that replaces cost emission statements with increments of global cost variables. The global cost variable must be incremented with the result of applying the function associated to the label to the hardware state at the time of execution of the block. The hardware state comprises both the “functional” state that affects the computation (the value of the registers and memory) and the “non functional” state that does not (the pipeline and caches content for example). The former is in correspondence with the source code state, but reconstructing the correspondence may be hard and lifting the cost model to work on the source code state is likely to produce cost expressions that are too hard to reason on. Luckily enough, in all modern architectures the cost of executing single instructions is either independent of the functional state or the jitter --- the difference between the worst and best case execution times --- is small enough to be bounded without loosing too much precision. Therefore we can concentrate on dependencies over the “non functional” parts of the state only. The non functional state has no correspondence in the high level state and does not influence the functional properties. What can be done is to expose the non functional state in the source code. We just present here the basic intuition in a simplified form: the technical details that allow to handle the general case are more complex and can be found in the CerCo deliverables. We add to the source code an additional global variable that represents the non functional state and another one that remembers the last labels emitted. The state variable must be updated at every label emission statement, using an update function which is computed during static analysis too. The update function associates to each label a function from the recently emitted labels and old state to the new state. It is computed composing the semantics of every instruction in a basic block and restricting it to the non functional part of the state. Not all the details of the non functional state needs to be exposed, and the technique works better when the part of state that is required can be summarized in a simple data structure. For example, to handle simple but realistic pipelines it is sufficient to remember a short integer that encodes the position of bubbles (stuck instructions) in the pipeline. In any case, the user does not need to understand the meaning of the state to reason over the properties of the program. Moreover, at any moment the user or the invariant generator tools that analyse the instrumented source code produced by the compiler can decide to trade precision of the analysis with simplicity by approximating the parametric cost with safe non parametric bounds. Interestingly, the functional analysis of the code can determine which blocks are executed more frequently in order to approximate more aggressively the ones that are executed less. The idea of dependent labelling can also be applied to allow the compiler to duplicate blocks that contain labels (e.g. to allow loop optimizations). The effect of duplication is to assign a different cost to the different occurrences of a duplicated label. For example, loop peeling turns a loop into the concatenation of a copy of the loop body (that executes the first iteration) with the conditional execution of the loop (for the successive iterations). Because of further optimizations, the two copies of the loop will be compiled differently, with the first body usually taking more time. By introducing a variable that keep tracks of the iteration number, we can associate to the label a cost that is a function of the iteration number. The same technique works for loop unrolling without modifications: the function will assign a cost to the even iterations and another cost to the odd ones. The actual work to be done consists in introducing in the source code and for each loop a variable that counts the number of iterations. The loop optimization code that duplicate the loop bodies must also modify the code to propagate correctly the update of the iteration numbers. The technical details are more complex and can be found in the CerCo reports and publications. The implementation, however, is quite simple and the changes to a loop optimizing compiler are minimal. \subsection{Techniques to exploit the induced cost model.} We review the cost synthesis techniques developed in the project. The starting hypothesis is that we have a certified methodology to annotate blocks in the source code with constants which provide a sound and possibly precise upper bound on the cost of executing the blocks after compilation to object code. The principle that we have followed in designing the cost synthesis tools is that the synthetic bounds should be expressed and proved within a general purpose tool built to reason on the source code. In particular, we rely on the Frama − C tool to reason on C code and on the Coq proof-assistant to reason on higher-order functional programs. This principle entails that: 1) The inferred synthetic bounds are indeed correct as long as the general purpose tool is. 2) There is no limitation on the class of programs that can be handled as long as the user is willing to carry on an interactive proof. Of course, automation is desirable whenever possible. Within this framework, automation means writing programs that give hints to the general purpose tool. These hints may take the form, say, of loop invariants/variants, of predicates describing the structure of the heap, or of types in a light logic. If these hints are correct and sufficiently precise the general purpose tool will produce a proof automatically, otherwise, user interaction is required. \paragraph{The Cost plug-in and its application to the Lustre compiler} Frama − C is a set of analysers for C programs with a specification language called ACSL. New analyses can be dynamically added through a plug-in system. For instance, the Jessie plug-in allows deductive verification of C programs with respect to their specification in ACSL, with various provers as back-end tools. We developed the CerCo Cost plug-in for the Frama − C platform as a proof of concept of an automatic environment exploiting the cost annotations produced by the CerCo compiler. It consists of an OCaml program which in first approximation takes the following actions: (1) it receives as input a C program, (2) it applies the CerCo compiler to produce a related C program with cost annotations, (3) it applies some heuristics to produce a tentative bound on the cost of executing the C functions of the program as a function of the value of their parameters, (4) the user can then call the Jessie tool to discharge the related proof obligations. In the following we elaborate on the soundness of the framework and the experiments we performed with the Cost tool on the C programs produced by a Lustre compiler. \paragraph{Soundness} The soundness of the whole framework depends on the cost annotations added by the CerCo compiler, the synthetic costs produced by the Cost plug-in, the verification conditions (VCs) generated by Jessie, and the external provers discharging the VCs. The synthetic costs being in ACSL format, Jessie can be used to verify them. Thus, even if the added synthetic costs are incorrect (relatively to the cost annotations), the process as a whole is still correct: indeed, Jessie will not validate incorrect costs and no conclusion can be made about the WCET of the program in this case. In other terms, the soundness does not really depend on the action of the Cost plug-in, which can in principle produce any synthetic cost. However, in order to be able to actually prove a WCET of a C function, we need to add correct annotations in a way that Jessie and subsequent automatic provers have enough information to deduce their validity. In practice this is not straightforward even for very simple programs composed of branching and assignments (no loops and no recursion) because a fine analysis of the VCs associated with branching may lead to a complexity blow up. \paragraph{Experience with Lustre} Lustre is a data-flow language to program synchronous systems and the language comes with a compiler to C. We designed a wrapper for supporting Lustre files. The C function produced by the compiler implements the step function of the synchronous system and computing the WCET of the function amounts to obtain a bound on the reaction time of the system. We tested the Cost plug-in and the Lustre wrapper on the C programs generated by the Lustre compiler. For programs consisting of a few hundreds loc, the Cost plug-in computes a WCET and Alt − Ergo is able to discharge all VCs automatically. \paragraph{Handling C programs with simple loops} The cost annotations added by the CerCo compiler take the form of C instructions that update by a constant a fresh global variable called the cost variable. Synthesizing a WCET of a C function thus consists in statically resolving an upper bound of the difference between the value of the cost variable before and after the execution of the function, i.e. find in the function the instructions that update the cost variable and establish the number of times they are passed through during the flow of execution. In order to do the analysis the plugin makes the following assumptions on the programs: 1. No recursive functions. 2. Every loop must be annotated with a variant. The variants of ‘for’ loops are automatically inferred. The plugin proceeds as follows. \begin{enumerate} \item First the call graph of the program is computed. If the function f calls the function g then the function g is processed before the function f . \item The computation of the cost of the function is performed by traversing its control flow graph. The cost at a node is the maximum of the costs of the successors. \item In the case of a loop with a body that has a constant cost for every step of the loop, the cost is the product of the cost of the body and of the variant taken at the start of the loop. \item In the case of a loop with a body whose cost depends on the values of some free variables, a fresh logic function f is introduced to represent the cost of the loop in the logic assertions. This logic function takes the variant as a first parameter. The other parameters of f are the free variables of the body of the loop. An axiom is added to account the fact that the cost is accumulated at each step of the loop. \item The cost of the function is directly added as post-condition of the function \end{enumerate} The user can influence the annotation by different means: by using more precise variants or by annotating function with cost specification. The plugin will use this cost for the function instead of computing it. \paragraph{C programs with pointers} When it comes to verifying programs involving pointer-based data structures, such as linked lists, trees, or graphs, the use of traditional first-order logic to specify, and of SMT solvers to verify, shows some limitations. Separation logic is then an elegant alternative. Designed at the turn of the century, it is a program logic with a new notion of conjunction to express spatial heap separation.Bobot has recently introduced in the Jessie plug-in automatically generated separation predicates that allow to simulate separation logic reasoning in a traditional verification framework where the specification language, the verification condition generator, and the theorem provers were not designed with separation logic in mind. CerCo's plug-in can exploit the separation predicates to automatically reason on the cost of execution of simple heap manipulation programs such as an in-place list reversal. \subsection{The CerCo Compiler} In CerCo we have developed a certain number of cost preserving compilers based on the labelling approach. Excluding an initial certified compiler for a While language, all remaining compilers target realistic source languages --- a pure higher order functional language and a large subset of C with pointers, gotos and all data structures --- and real world target processors --- MIPS and the Intel 8051/8052 processor family. Moreover, they achieve a level of optimization that ranges from moderate (comparable to gcc level 1) to intermediate (including loop peeling and unrolling, hoisting and late constant propagation). The so called Trusted CerCo Compiler is the only one that was implemented in the interactive theorem prover Matita and its costs certified. The code distributed is obtained extracting OCaml code from the Matita implementation. In the rest of this section we will only focus on the Trusted CerCo Compiler, that only targets the C language and the 8051/8052 family, and that does not implement the advanced optimizations yet. Its user interface, however, is the same as the one of the other versions, in order to trade safety with additional performances. In particular, the Frama-C CerCo plug-in can work without recompilation with all compilers. The (trusted) CerCo compiler implements the following optimizations: cast simplification, constant propagation in expressions, liveness analysis driven spilling of registers, dead code elimination, branch displacement, tunneling. The two latter optimizations are performed by the optimizing assembler which is part of the compiler. The back-end of the compiler works on three addresses instructions, preferred to static single assignment code for the simplicity of the formal certification. The CerCo compiler is loosely based on the CompCert compiler, a recently developed certified compiler from C to the PowerPC, ARM and x86 microprocessors. Contrarily to CompCert, both the CerCo code and its certification are open source. Some data structures and language definitions for the front-end are directly taken from CompCert, while the back-end is a redesign and reimplementation of a didactic compiler from Pascal to MIPS used by Francois Pottier for a course at the Ecole Polytechnique. The main peculiarities of the CerCo compiler are: \begin{enumerate} \item all of our intermediate languages include label emitting instructions to implement the labelling approach, and the compiler preserves execution traces. \item tradiniotally compilers target an assembly language with additional macro-instructions to be expanded before assembly; for CerCo we need to go all the way down to object code in order to perform the static analysis. Therefore we developed also an optimizing assembler and a static analyser, all integrated in the compiler. \item to avoid implementing a linker and a loader, we do not support separate compilation and external calls. Adding a linker and a loader is a transparent process to the labelling approach and should create no unknown problem \item we target an 8-bit processor. Targeting an 8 bit processor requires several changes and increased code size, but it is not fundamentally more complex. The proof of correctness, however, becomes much harder. \item we target a microprocessor that has a non uniform memory model, which is still often the case for microprocessors used in embedded systems and that is becoming common again in multi-core processors. Therefore the compiler has to keep track of the position of data and it must move data between memory regions in the proper way. Also the size of pointers to different regions is not uniform. In our case, an additional difficulty was that the space available for the stack in internal memory in the 8051 is tiny, allowing only a minor number of nested calls. To support full recursion in order to test the CerCo tools also on recursive programs, the compiler manually manages a stack in external memory. \end{enumerate} \section{A formal certification of the CerCo compiler} The Trusted CerCo Compiler has been implemented and certified using the interactive theorem prover Matita. The details on the proof techniques employed and the proof sketch can be collected from the CerCo deliverables and papers. In this section we will only hint at the correctness statement, which turned out to be more complex than what we expected at the beginning. \paragraph{The statement} The most natural statement of correctness for our complexity preserving compiler is that the time spent for execution by a terminating object code program should be the time predicted on the source code by adding up the cost of every label emission statement. This statement, however, is too naïve to be useful for real world real time programs like those used in embedded systems. Real time programs are often reactive programs that loop forever responding to events (inputs) by performing some computation followed by some action (output) and the return to the initial state. For looping programs the overall execution time does not make sense. The same happens for reactive programs that spend an unpredictable amount of time in I/O. What is interesting is the reaction time that measure the time spent between I/O events. Moreover, we are interested in predicting and ruling out programs that crash running out of space on a certain input. Therefore we need to look for a more complex statement that talks about sub-runs of a program. The most natural statement is that the time predicted on the source code and spent on the object code by two corresponding sub-runs are the same. The problem to solve to make this statement formal is how to identify the corresponding sub-runs and how to single out those that are meaningful. The solution we found is based on the notion of measurability. We say that a run has a measurable sub-run when both the prefix of the sub-run and the sub-run do not exhaust the stack space, the number of function calls and returns in the sub-run is the same, the sub-run does not perform any I/O and the sub-run starts with a label emission statement and ends with a return or another label emission statements. The stack usage is estimated using the stack usage model that is computed by the compiler. The statement that we want to formally prove is that for each C run with a measurable sub-run there exists an object code run with a sub-run such that the observables of the pairs of prefixes and sub-runs are the same and the time spent by the object code in the sub-run is the same as the one predicted on the source code using the time cost model generated by the compiler. We briefly discuss the constraints for measurability. Not exhausting the stack space is a clear requirement of meaningfulness of a run, because source programs do not crash for lack of space while object code ones do. The balancing of function calls/returns is a requirement for precision: the labelling approach allows the scope of label emission statements to extend after function calls to minimize the number of labels. Therefore a label pays for all the instructions in a block, excluding those executed in nested function calls. If the number of calls/returns is unbalanced, it means that there is a call we have not returned to that could be followed by additional instructions whose cost has already been taken in account. To make the statement true (but less precise) in this situation, we could only say that the cost predicted on the source code is a safe bound, not that it is exact. The last condition on the entry/exit points of a run is used to identify sub-runs whose code correspond to a sequence of blocks that we can measure precisely. Any other choice would start/end the run in the middle of a block and we would be forced again to weaken the statement taking as a bound the cost obtained counting in all the instructions that precede the starting one in the block/follow the final one in the block. I/O operations can be performed in the prefix of the run, but not in the measurable sub-run. Therefore we prove that we can predict reaction times, but not I/O times, as it should be. \section{Future work}