Changeset 3640
Timestamp: Mar 8, 2017, 1:50:54 PM
Message: more work on intro...
File: 1 edited
Information could be shared between the two analyses, and both could be performed concurrently on the same high-level source code. What has previously prevented high-level reasoning about non-functional properties is the lack of a precise cost model for programs written in programming languages such as C. As modern compilers---as discussed previously---are inherently non-compositional in their translations, any precise concrete cost model for a high-level language must also be non-compositional, and is inherently tied to how an individual program is compiled. Moreover, as optimisations may change control flow, and the cost of an object code instruction may depend on the runtime state of hardware components like pipelines and caches, none of which are visible in the source code, it has not been clear how one could go about defining such a cost model. In this paper, we report on the scientific and technical contributions of the Certified Complexity (`CerCo') project~\cite{cerco}, an EU FET-Open funded collaboration between the Universities of Bologna, Edinburgh, and Paris 7. The central insight of the CerCo project is that a cost model for a high-level programming language can be defined using a compiler's implicit knowledge of how expressions and statements are translated to machine code.
This implicit knowledge of how a compiler translates source-language phrases has to date been largely ignored when designing static analysis tools for concrete complexity measures. However, by embracing the compilation process itself, one is able to define a non-uniform, or non-compositional, cost model that is customised to the exact translation process that the compiler applied to the source-language program. Accordingly, we first present a non-uniform, precise cost model for a large fragment of the C programming language---\emph{CerCo C}, which is based on early versions of \emph{CompCert C}~\cite{compcert}. Using a source-level verification plugin built around this cost model, we have conducted case studies, including showing that the plugin can automatically compute and certify the exact reaction time of Lustre~\cite{lustre} data flow programs compiled into C. The compiler is verified using the Matita interactive theorem prover, an implementation of the Calculus of Coinductive Constructions, developed at the University of Bologna.

\subsection{Our approach in more detail}

As mentioned above, in CerCo we reject the idea of a uniform, or compositional, cost model for a high-level programming language. We view any uniform cost model as being by its nature imprecise, as modern compilers themselves are highly non-compositional in their translations.
Far better is to engineer the compiler---which knows how the code is translated---to return the cost model for basic blocks of high-level instructions. To do this, the compiler must track modifications to the control flow of the program, in order to reverse them, and interface with a processor timing analysis.
By embracing compilation, the CerCo compiler can both produce efficient code and return costs that are as precise as the processor timing analysis can be. Moreover, our costs can be parametric: the cost of a block can depend on actual program data, on a summary of the execution history, or on an approximated representation of the hardware state. For example, loop optimisations may assign a cost to a loop body that is a function of the number of iterations performed. As another example, the cost of a block may be a function of the vector of stalled pipeline states, which can be exposed in the source code and updated at each basic block exit. It is parametricity that allows one to analyse small code fragments without losing precision. In the analysis of the code fragment we do not have to ignore the initial hardware state; rather, we can assume that we know exactly which state (or mode, as the WCET literature calls it) we are in. Note that this approach also extends naturally to the early stages of program development. Users may axiomatically attach costs to unimplemented sub-components, or modules, which may be refined as program development proceeds. The CerCo approach is an improvement on the state of the art. By performing control and data flow analyses on the source code, the error-prone translation of invariants is avoided entirely. Instead, this work is done at the source code level using tools of the user's choice. Indeed, any available technique for the verification of functional properties can be easily reused, and multiple techniques can collaborate to infer and certify cost invariants for the program. There are no limitations on the types of loops or data structures involved.
Parametric cost analysis becomes the default, with non-parametric bounds used only as a last resort when the user decides to trade the complexity of the analysis for more precision. Note that we do not aim to supplant existing WCET analyses, but rather to augment and potentially embrace them as subcomponents of our own analysis. \emph{A priori}, no technique previously used in traditional WCET is obsolete: processor timing analysis can be used by the compiler on the object code, and other techniques can be applied at the source code level. In this work, we have instantiated the CerCo approach with a cost model for the MCS-51 hardware that our prototype compiler targets. As this hardware is relatively simple, it admits a precise cost model, and this is reflected in our final theorem, which demonstrates that our high-level cost annotations are both accurate and precise. On more modern hardware, exhibiting features such as caches, pipelines, and speculative execution, an exact cost model for time or space will be hard, or impossible, to define. Instead, existing WCET analyses could be used, and then lifted to high-level annotations using our non-standard compiler architecture. We take it as an article of faith that software used to verify properties of programs must be as bug-free as possible. One way to achieve this is to reduce the trusted code base of any tools used. Here, `trusted code base' refers to the corpus of code that must be trusted to believe that the analysis being carried out is accurate. The trusted code base of state-of-the-art WCET tools is significant: one needs to trust the control flow analyser, the linear programming libraries used, and also the formal models of the hardware under analysis. In CerCo we move the control flow analysis to the source code level, and we introduce a non-standard compiler. As this compiler architecture is completely novel, we wish to ensure that our implementation (and, by extension, our entire approach) is correct.
Therefore, to reduce the size of our trusted code base, we have implemented a prototype compiler and static analyser in an interactive theorem prover---the Matita interactive theorem prover, an implementation of the Calculus of Coinductive Constructions---which was used to certify that the costs added to the source code are indeed those incurred by the hardware. We have also implemented formal models of the hardware and of the high-level source language in the interactive theorem prover. Control flow analysis on the source code has been obtained using invariant generators, tools to produce proof obligations from generated invariants, and automatic theorem provers to verify the obligations. If these tools are able to generate proof traces that can be independently checked, the only remaining component that enters the trusted code base is an off-the-shelf invariant generator which, in turn, can be proved correct using an interactive theorem prover. With these methods, we enable the use of more off-the-shelf components (e.g.\ provers and invariant generators) whilst reducing the trusted code base at the same time.

\subsection{Introduction to Matita}

We now briefly introduce Matita, the interactive theorem prover that was used in CerCo to certify all proofs. Matita is an interactive theorem proving system based on a variant of the Calculus of Coinductive Constructions~\cite{asperti:user:2007}. The system features a full spectrum of dependent types and (co)inductive families, a system of coercions, a tactic-driven proof construction engine~\cite{sacerdoti-coen:tinycals:2007}, and paramodulation-based automation~\cite{asperti:higher-order:2007}, all of which we exploit in the formalisation described herein.