Changeset 3640


Timestamp: Mar 8, 2017, 1:50:54 PM (6 months ago)
Author: mulligan
Message: more work on intro...
File: 1 edited

  • Papers/jar-cerco-2017/introduction.tex

    r3639 r3640  
Information could be shared between the two analyses, and both could be performed concurrently on the same high-level source code.

-What has previously prevented high-level reasoning about non-functional properties is the lack of a uniform and precise cost model for programs written in programming languages, such as C.
-Since modern compilers---as discussed previously---may compile each expression and statement occurrence in radically different ways, optimisations may change control flow, and the cost of an object code instruction may depend on the runtime state of hardware components like pipelines and caches, none of which are visible in the source code, it has not been clear how one could go about defining such a cost model.
+What has previously prevented high-level reasoning about non-functional properties is the lack of a precise cost model for programs written in high-level programming languages such as C.
+Since modern compilers---as mentioned above---are inherently non-compositional in their translations, any precise concrete cost model for a high-level language must also be non-compositional, and is inherently tied to how an individual program is compiled.
+However, as optimisations may change control flow, and the cost of an object code instruction may depend on the runtime state of hardware components like pipelines and caches, none of which are visible in the source code, it has not been clear how one could go about defining such a cost model.

In this paper, we report on the scientific and technical contributions of the Certified Complexity (`CerCo') project~\cite{cerco}, an EU FET-Open funded collaboration between the Universities of Bologna, Edinburgh, and Paris 7.

The central insight of the CerCo project is that a cost model for a high-level programming language can be defined using a compiler's implicit knowledge of how expressions and statements are translated to machine code.
This implicit knowledge of how a compiler translates source-language phrases has to date been largely ignored when designing static analysis tools for concrete complexity measures.
-However, by embracing the compilation process itself, one is able to define a non-uniform (non-compositional) cost model that is customised to the exact translation process that was subjected to the source-language program by the compiler.
+However, by embracing the compilation process itself, one is able to define a non-uniform, or non-compositional, cost model that is customised to the exact translation process that the compiler applied to the source-language program.

Accordingly, we first present a non-uniform, precise cost model for a large fragment of the C programming language---\emph{CerCo C}, which is based on early versions of \emph{CompCert C}~\cite{compcert}.

…

Using this plugin, we have conducted case studies, including showing that the plugin can automatically compute and certify the exact reaction time of Lustre~\cite{lustre} data flow programs compiled into C.

-The compiler is verified using the Matita interactive theorem prover, an implementation of the Calculus of Coinductive Constructions, developed at the University of Bologna.
-
-\subsection{The CerCo approach, in more detail}
-
-As mentioned above, in CerCo we reject the idea of a uniform (compositional) cost model for a high-level programming language.
-We view any uniform cost model as being hopelessly imprecise, as modern compilers themselves are highly non-compositional in their translations.
+\subsection{Our approach in more detail}
+
+As mentioned above, in CerCo we reject the idea of a uniform, or compositional, cost model for a high-level programming language.
+We view any uniform cost model as being by its nature imprecise, as modern compilers themselves are highly non-compositional in their translations.
Far better is to try to engineer the compiler---which knows how the code is translated---to return the cost model for basic blocks of high-level instructions.
To do this, the compiler must track modifications to the control flow of the program, in order to reverse them, and interface with a processor timing analysis.

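To make the idea of a compiler-generated cost model for basic blocks concrete, the following sketch shows what instrumented C source of this kind could look like. The names `__cost' and `__cost_incr', and the cycle counts, are hypothetical and chosen only for illustration; they are not taken from the CerCo prototype's actual output.

    /* Hypothetical instrumentation: a global cycle counter, incremented at
       the start of each basic block by the cost of the object code that
       the compiler emitted for that block.                                */
    unsigned long __cost = 0;

    void __cost_incr(unsigned long k)
    {
        __cost += k;           /* charge k clock cycles */
    }

    int max(int a, int b)
    {
        __cost_incr(23);       /* prologue and comparison */
        if (a > b) {
            __cost_incr(11);   /* then-branch and return  */
            return a;
        } else {
            __cost_incr(13);   /* else-branch and return  */
            return b;
        }
    }

Reasoning about the final value of `__cost' at the source level then bounds the execution time of the generated object code, with the constants supplied by the processor timing analysis.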
-By embracing compilation, instead of avoiding it like EmBounded did, a CerCo
-compiler can both produce efficient code and return costs that are as precise
-as the processor timing analysis can be. Moreover, our costs can be parametric:
-the cost of a block can depend on actual program data, on a summary of the
-execution history, or on an approximated representation of the hardware state.
-
-For example, loop optimisations may assign a cost to a loop body that is a
-function of the number of iterations performed. As another example, the cost of
-a block may be a function of the vector of stalled pipeline states, which can
-be exposed in the source code and updated at each basic block exit.
-
-It is parametricity that allows one to analyse small code fragments without
-losing precision. In the analysis of the code fragment we do not have to ignore
-the initial hardware state; rather, we can assume that we know exactly which
-state (or mode, as the WCET literature calls it) we are in.
-
-The CerCo approach has the potential to dramatically improve the state of the
-art. By performing control and data flow analyses on the source code, the
-error-prone translation of invariants is avoided entirely. Instead, this work
-is done at the source code level using tools of the user's choice.
-
-Any available technique for the verification of functional properties can be
-easily reused and multiple techniques can collaborate to infer and certify cost
-invariants for the program. There are no limitations on the types of loops or
-data structures involved.
-
-Parametric cost analysis becomes the default, with non-parametric bounds used
-only as a last resort when the user decides to trade the complexity of the
-analysis for more precision.
-
-\emph{A priori}, no technique previously used in traditional WCET is obsolete:
-processor timing analysis can be used by the compiler on the object code, and
-other techniques can be applied at the source code level.
-
-Our approach also works in the early stages of development by allowing the user
-to axiomatically attach costs to unimplemented components.
-
-Software used to verify properties of programs must be as bug-free as possible.
-The trusted code base for verification consists of the code that needs to be
-trusted to believe that the property holds.
-
-The trusted code base of state-of-the-art WCET tools is very large: one needs
-to trust the control flow analyser, the linear programming libraries used, and
-also the formal models of the hardware under analysis.
-
-In CerCo we move the control flow analysis to the source code level, and we
-introduce a non-standard compiler. To reduce the size of the trusted code base,
-we have implemented a prototype compiler and static analyser in an interactive
-theorem prover, which was used to certify that the costs added to the source
-code are indeed those incurred by the hardware. We have also implemented formal models of the hardware and of the high level source language in the interactive
-theorem prover.
-
-Control flow analysis on the source code has been obtained using invariant
-generators, tools to produce proof obligations from generated invariants and
-automatic theorem provers to verify the obligations. If these tools are able to
-generate proof traces that can be independently checked, the only remaining
-component that enters the trusted code base is an off-the-shelf invariant
-generator which, in turn, can be proved correct using an interactive theorem
-prover.
-
-With these methods, we achieve the objective of allowing the use of more
-off-the-shelf components (e.g. provers and invariant generators) whilst
-reducing the trusted code base at the same time.
+By embracing compilation, the CerCo compiler can both produce efficient code and return costs that are as precise as the processor timing analysis can be.
+Moreover, our costs can be parametric: the cost of a block can depend on actual program data, on a summary of the execution history, or on an approximated representation of the hardware state.
+
+For example, loop optimisations may assign a cost to a loop body that is a function of the number of iterations performed.
+As another example, the cost of a block may be a function of the vector of stalled pipeline states, which can be exposed in the source code and updated at each basic block exit.
+
+It is parametricity that allows one to analyse small code fragments without losing precision.
+In the analysis of a code fragment we do not have to ignore the initial hardware state; rather, we can assume that we know exactly which state (or mode, as the WCET literature calls it) we are in.
+Note that this approach also extends naturally to the early stages of program development.
+Users may axiomatically attach costs to unimplemented sub-components, or modules, which may be refined as program development proceeds.
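A minimal sketch of such a parametric cost annotation, under the same hypothetical instrumentation as above (the function, the constants, and the names are invented for the example): the cost charged to the loop depends on the dynamic value of `n' rather than on a fixed worst-case bound.

    /* Reuses the hypothetical __cost_incr from the previous sketch.       */
    extern void __cost_incr(unsigned long k);

    /* The compiler charges a set-up cost plus a per-iteration cost, so the
       annotation is parametric in the number of iterations performed.      */
    void fill(unsigned char *p, unsigned char v, unsigned int n)
    {
        __cost_incr(14 + 9 * (unsigned long)n);
        for (unsigned int i = 0; i < n; i++)
            p[i] = v;
    }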
+
+The CerCo approach is an improvement on the state of the art.
+By performing control and data flow analyses on the source code, the error-prone translation of invariants is avoided entirely.
+Instead, this work is done at the source code level using tools of the user's choice.
+Indeed, any available technique for the verification of functional properties can be easily reused and multiple techniques can collaborate to infer and certify cost invariants for the program.
+There are no limitations on the types of loops or data structures involved.
+Parametric cost analysis becomes the default, with non-parametric bounds used only as a last resort when the user decides to trade the complexity of the analysis for more precision.
+
+Note that we do not aim to supplant existing WCET analyses, but rather to augment and potentially embrace them as subcomponents of our own analysis.
+\emph{A priori}, no technique previously used in traditional WCET is obsolete: processor timing analysis can be used by the compiler on the object code, and other techniques can be applied at the source code level.
+In this work, we have instantiated the CerCo approach with a cost model for the MCS-51 hardware that our prototype compiler targets.
+As this hardware is relatively simple, it admits a precise cost model, and this is reflected in our final theorem, which demonstrates that our high-level cost annotations are both accurate and precise.
+On more modern hardware, exhibiting features such as caches, pipelines, and speculative execution, an exact cost model for time or space will be hard, if not impossible, to define.
+Instead, existing WCET analyses could be used, and then lifted to high-level annotations using our non-standard compiler architecture.
+
+We take it as an article of faith that software used to verify properties of programs must be as bug-free as possible.
+One way to achieve this is to reduce the trusted code base of any tools used.
+Here, `trusted code base' refers to the corpus of code that must be trusted to believe that the analysis being carried out is accurate.
+The trusted code base of state-of-the-art WCET tools is significant: one needs to trust the control flow analyser, the linear programming libraries used, and also the formal models of the hardware under analysis.
+In CerCo we move the control flow analysis to the source code level, and we introduce a non-standard compiler.
+As this compiler architecture is completely novel, we wish to ensure that our implementation (and, by extension, our entire approach) is correct.
+Therefore, to reduce the size of our trusted code base, we have implemented a prototype compiler and static analyser in an interactive theorem prover---Matita, an implementation of the Calculus of Coinductive Constructions---which we used to certify that the costs added to the source code are indeed those incurred by the hardware.
+We have also implemented formal models of the hardware and of the high-level source language in the interactive theorem prover.
+
+Control flow analysis on the source code has been obtained using invariant generators, tools that produce proof obligations from the generated invariants, and automatic theorem provers that verify those obligations.
+If these tools are able to generate proof traces that can be independently checked, the only remaining component that enters the trusted code base is an off-the-shelf invariant generator which, in turn, can be proved correct using an interactive theorem prover.
+With these methods, we enable the use of more off-the-shelf components (e.g. provers and invariant generators) whilst reducing the trusted code base at the same time.
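As a sketch of the kind of source-level proof obligation such a toolchain might discharge, and assuming the hypothetical instrumentation of the earlier examples, a cost bound can be stated as an ordinary postcondition in ACSL-style notation and handed to off-the-shelf invariant generators and provers; the bound and the names are illustrative only.

    /*@ requires \valid(p + (0 .. n-1));
      @ assigns __cost, p[0 .. n-1];
      @ ensures __cost <= \old(__cost) + 14 + 9 * n;
      @*/
    void fill(unsigned char *p, unsigned char v, unsigned int n);

An invariant generator would supply the loop invariant relating the partial cost to the loop counter, and a prover would then discharge the resulting obligations.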

\subsection{Introduction to Matita}

-Matita is a theorem prover based on a variant of the Calculus of Coinductive Constructions~\cite{asperti:user:2007}.
+We now briefly introduce Matita, the interactive theorem prover that was used in CerCo to certify all proofs.
+
+Matita is an interactive theorem proving system based on a variant of the Calculus of Coinductive Constructions~\cite{asperti:user:2007}.
The system features a full spectrum of dependent types and (co)inductive families, a system of coercions, a tactic-driven proof construction engine~\cite{sacerdoti-coen:tinycals:2007}, and paramodulation based automation~\cite{asperti:higher-order:2007}, all of which we exploit in the formalisation described herein.
