Mar 6, 2017, 12:43:28 PM (3 years ago)

fopara intro as a base

1 edited


  • Papers/jar-cerco-2017/cerco.tex

    r3603 r3604  
     51        {\setlength{\fboxsep}{5pt}
     52                \setlength{\mylength}{\linewidth}%
     53                \addtolength{\mylength}{-2\fboxsep}%
     54                \addtolength{\mylength}{-2\fboxrule}%
     55                \Sbox
     56                \minipage{\mylength}%
     57                        \setlength{\abovedisplayskip}{0pt}%
     58                        \setlength{\belowdisplayskip}{0pt}%
     59                }%
     60                {\endminipage\endSbox
     61                        \[\fbox{\TheSbox}\]}
     115% ---------------------------------------------------------------------------- %
     116% SECTION                                                                      %
     117% ---------------------------------------------------------------------------- %
     120%\paragraph{Problem statement.}
     121Programs can be specified with both
     122functional constraints (what the program must do) and non-functional constraints (what time, space or other resources the program may use).  In the current
     123state of the art, functional properties are verified
     124by combining user annotations---preconditions, invariants, and so on---with a
     125multitude of automated analyses---invariant generators, type systems, abstract
     126interpretation, theorem proving, and so on---on the program's high-level source code.
     127By contrast, many non-functional properties
     128are verified using analyses on low-level object code,
     129%\footnote{A notable
     130%  exception is the explicit allocation of heap space in languages like C, which
     131%  can be handled at the source level.}
     132but these analyses may then need information
     133about the high-level functional behaviour of the program that must then be reconstructed.
     134This analysis on low-level object code has several problems:
     137It can be hard to deduce the high-level structure of the program after compiler optimisations.
     138The object code produced by an optimising compiler may have radically different control flow to the original source code program.
     140Techniques that operate on object code are not useful early in the development process of a program, yet problems with a program's design or implementation are cheaper to resolve earlier in the process, rather than later.
     142Parametric cost analysis is very hard: how can we reflect a cost that depends on the execution state, for example the
     143value of a register or a carry bit, to a cost that the user can understand
     144looking at the source code?
     146Performing functional analyses on object code makes it hard for the programmer to provide information about the program and its expected execution, leading to a loss of precision in the resulting analyses.
     148\paragraph{Vision and approach.}
     149We want to reconcile functional and
     150non-functional analyses: to share information and perform both at the same time
     151on high-level source code.
     153What has previously prevented this approach is the lack of a uniform and precise
     154cost model for high-level code as each statement occurrence is compiled
     155differently, optimisations may change control flow, and the cost of an object
     156code instruction may depend on the runtime state of hardware components like
     157pipelines and caches, all of which are not visible in the source code.
     159We envision a new generation of compilers that track program structure through compilation and optimisation and exploit this
     160information to define a precise, non-uniform cost model for source code that accounts for runtime state. With such a cost model we can
     161reduce non-functional verification to the functional case and exploit the state
     162of the art in automated high-level verification~\cite{survey}. The techniques
     163currently used by the Worst Case Execution Time (WCET) community, who perform analyses on object code,
     164are still available but can be coupled with additional source-level
     165analyses. Where our approach produces overly complex cost models, safe approximations can be used to trade complexity with precision.
     166Finally, source code analysis can be used early in the development process, when
     167components have been specified but not implemented, as modularity means
     168that it is enough to specify the non-functional behaviour of missing
     171We have developed \emph{the labelling approach}~\cite{labelling}, a
     172technique to implement compilers that induce cost models on source programs by
     173very lightweight tracking of code changes through compilation. We have studied
     174how to formally prove the correctness of compilers implementing this technique, and
     175have implemented such a compiler from C to object binaries for the 8051
     176microcontroller for predicting execution time and stack space usage,
     177verifying it in an interactive theorem prover.  As we are targeting
     178an embedded microcontroller we do not consider dynamic memory allocation.
     180To demonstrate source-level verification of costs we have implemented
     181a Frama-C plugin~\cite{framac} that invokes the compiler on a source
     182program and uses it to generate invariants on the high-level source
     183that correctly model low-level costs. The plugin certifies that the
     184program respects these costs by calling automated theorem provers, a
     185new and innovative technique in the field of cost analysis. Finally,
     186we have conducted several case studies, including showing that the
     187plugin can automatically compute and certify the exact reaction time
     188of Lustre~\cite{lustre} data flow programs compiled into C.
     190\section{Project context and approach}
     191Formal methods for verifying functional properties of programs have 
     192now reached a level of maturity and automation that their adoption is slowly
     193increasing in production environments. For safety critical code, it is
     194becoming commonplace to combine rigorous software engineering methodologies and testing
     195with static analyses, taking the strengths of each and mitigating
     196their weaknesses. Of particular interest are open frameworks
     197for the combination of different formal methods, where the programs can be
     198progressively specified and enriched with new safety
     199guarantees: every method contributes knowledge (e.g. new invariants) that
     200becomes an assumption for later analysis.
     202The outlook for verifying non-functional properties of programs (time spent,
     203memory used, energy consumed) is bleaker.
     204% and the future seems to be getting even worse.
     205Most industries verify that real time systems meet their deadlines
     206by simply performing many runs of the system and timing their execution,  computing the
     207maximum time and adding an empirical safety margin, claiming the result to be a
     208bound for the WCET of the program. Formal methods and software to statically
     209analyse the WCET of programs exist, but they often produce bounds that are too
     210pessimistic to be useful. Recent advancements in hardware architecture
     211have been
     212focused on the improvement of the average case performance, not the
     213predictability of the worst case. Execution time is becoming increasingly
     214dependent on execution history and the internal state of
     215hardware components like pipelines and caches. Multi-core processors and non-uniform
     216memory models are drastically reducing the possibility of performing
     217static analysis in isolation, because programs are less and less time
     218composable. Clock-precise hardware models are necessary for static analysis, and
     219obtaining them is becoming harder due to the increased sophistication
     220of hardware design.
     222Despite these problems, the need for reliable real time
     223systems and programs is increasing, and there is pressure
     224from the research community for the introduction of
     225hardware with more predictable behaviour, which would be more suitable
     226for static analysis.  One example, being investigated by the Proartis
     227project~\cite{proartis}, is to decouple execution time from execution
     228history by introducing randomisation.
     230In CerCo~\cite{cerco} we do not address this problem, optimistically
     231assuming that improvements in low-level timing analysis or architecture will make
     232verification feasible in the longer term. Instead, the main objective of our work is
     233to bring together the static analysis of functional and non-functional
     234properties, which in the current state of the art are
     235independent activities with limited exchange of information: while the
     236functional properties are verified on the source code, the analysis of
     237non-functional properties is performed on object code to exploit
     238clock-precise hardware models.
     240\subsection{Current object-code methods}
     242Analysis currently takes place on object code for two main reasons.
     243First, there cannot be a uniform, precise cost model for source
     244code instructions (or even basic blocks). During compilation, high level
     245instructions are broken up and reassembled in context-specific ways so that
     246identifying a fragment of object code and a single high level instruction is
     247infeasible. Even the control flow of the object and source code can be very
     248different as a result of optimisations, for example aggressive loop
     249optimisations may completely transform source level loops. Despite the lack of a uniform, compilation- and
     250program-independent cost model on the source language, the literature on the
     251analysis of non-asymptotic execution time on high level languages assuming
     252such a model is growing and gaining momentum. However, unless we provide a
     253replacement for such cost models, this literature's future practical impact looks
     254to be minimal. Some hope has been provided by the EmBounded project \cite{embounded}, which
     255compositionally compiles high-level code to a byte code that is executed by an
     256interpreter with guarantees on the maximal execution time spent for each byte code
     257instruction. This provides a uniform model at the expense of the model's
     258precision (each cost is a pessimistic upper bound) and the performance of the
     259executed code (because the byte code is interpreted compositionally instead of
     260performing a fully non-compositional compilation).
     262The second reason to perform the analysis on the object code is that bounding
     263the worst case execution time of small code fragments in isolation (e.g. loop
     264bodies) and then adding up the bounds yields very poor estimates as no
     265knowledge of the hardware state prior to executing the fragment can be assumed. By
     266analysing longer runs the bound obtained becomes more precise because the lack
     267of information about the initial state has a relatively small impact.
     269To calculate the cost of an execution, value and control flow analyses
     270are required to bound the number of times each basic block is
     271executed.  Currently, state
     272of the art WCET analysis tools, such as AbsInt's aiT toolset~\cite{absint}, perform these analyses on
     273object code, where the logic of the program is harder to reconstruct
     274and most information available at the source code level has been lost;
     275see~\cite{stateart} for a survey. Imprecision in the analysis can lead to useless bounds. To
     276augment precision, the tools ask the user to provide constraints on
     277the object code control flow, usually in the form of bounds on the
     278number of iterations of loops or linear inequalities on them. This
     279requires the user to manually link the source and object code,
     280translating his assumptions on the source code (which may be wrong) to
     281object code constraints. The task is error prone and hard, especially
     282in the presence of complex compiler optimisations.
     284Traditional techniques for WCET that work on object code are also affected by
     285another problem: they cannot be applied before the generation of the object
     286code. Functional properties can be analysed in early development stages, while
     287analysis of non-functional properties may come too late to avoid expensive
     288changes to the program architecture.
     290\subsection{CerCo's approach}
     292In CerCo we propose a radically new approach to the problem: we reject the idea
     293of a uniform cost model and we propose that the compiler, which knows how the
     294code is translated, must return the cost model for basic blocks of high level
     295instructions. It must do so by keeping track of the control flow modifications
     296to reverse them and by interfacing with processor timing analysis.
     298By embracing compilation, instead of avoiding it like EmBounded did, a CerCo
     299compiler can both produce efficient code and return costs that are
     300as precise as the processor timing analysis can be. Moreover, our costs can be
     301parametric: the cost of a block can depend on actual program data, on a
     302summary of the execution history, or on an approximated representation of the
     303hardware state. For example, loop optimisations may assign a cost to a loop body
     304that is a function of the number of iterations performed. As another example,
     305the cost of a block may be a function of the vector of stalled pipeline states,
     306which can be exposed in the source code and updated at each basic block exit. It
     307is parametricity that allows one to analyse small code fragments without losing
     308precision. In the analysis of the code fragment we do not have to ignore the
     309initial hardware state, rather, we may assume that we know exactly which
     310state (or mode, as the WCET literature calls it) we are in.
     312The CerCo approach has the potential to dramatically improve the state of the
     313art. By performing control and data flow analyses on the source code, the error
     314prone translation of invariants is completely avoided. Instead, this
     315work is done at the source level using tools of the user's choice.
     316Any available technique for the verification of functional properties
     317can be immediately reused and multiple techniques can collaborate together to
     318infer and certify cost invariants for the program.  There are no
     319limitations on the types of loops or data structures involved. Parametric cost analysis
     320becomes the default one, with non-parametric bounds used as a last resort when the user
     321decides to trade the complexity of the analysis with its precision. \emph{A priori}, no
     322technique previously used in traditional WCET is lost: processor
     323timing analyses can be used by the compiler on the object code, and the rest can be applied
     324at the source code level.
     325 Our approach can also work in the early
     326stages of development by axiomatically attaching costs to unimplemented components.
     329Software used to verify properties of programs must be as bug free as
     330possible. The trusted code base for verification consists of the code that needs
     331to be trusted to believe that the property holds. The trusted code base of
     332state-of-the-art WCET tools is very large: one needs to trust the control flow
     333analyser, the linear programming libraries used, and also the formal models
     334of the hardware under analysis, for example. In CerCo we are moving the control flow analysis to the source
     335code and we are introducing a non-standard compiler too. To reduce the trusted
     336code base, we implemented a prototype and a static analyser in an interactive
     337theorem prover, which was used to certify that the costs added to the source
     338code are indeed those incurred by the hardware. Formal models of the
     339hardware and of the high level source languages were also implemented in the
     340interactive theorem prover. Control flow analysis on the source code has been
     341obtained using invariant generators, tools to produce proof obligations from
     342generated invariants and automatic theorem provers to verify the obligations. If
     343these tools are able to generate proof traces that can be
     344independently checked, the only remaining component that enters the trusted code
     345base is an off-the-shelf invariant generator which, in turn, can be proved
     346correct using an interactive theorem prover. Therefore we achieve the double
     347objective of allowing the use of more off-the-shelf components (e.g. provers and
     348invariant generators) whilst reducing the trusted code base at the same time.
     350%\paragraph{Summary of the CerCo objectives.} To summarize, the goal of CerCo is
     351% to reconcile functional with non-functional analysis by performing them together
     352% on the source code, sharing common knowledge about execution invariants. We want
     353% to achieve the goal implementing a new generation of compilers that induce a
     354% parametric, precise cost model for basic blocks on the source code. The compiler
     355% should be certified using an interactive theorem prover to minimize the trusted
     356% code base of the analysis. Once the cost model is induced, off-the-shelf tools
     357% and techniques can be combined together to infer and prove parametric cost
     358% bounds.
     359%The long term benefits of the CerCo vision are expected to be:
     360%1. the possibility to perform static analysis during early development stages
     361%2.  parametric bounds made easier
     362%3.  the application of off-the-shelf techniques currently unused for the
     363% analysis of non-functional properties, like automated proving and type systems
     364%4. simpler and safer interaction with the user, that is still asked for
     365% knowledge, but on the source code, with the additional possibility of actually
     366% verifying the provided knowledge
     367%5. a reduced trusted code base
     368%6. the increased accuracy of the bounds themselves.
     370%The long term success of the project is hindered by the increased complexity of
     371% the static prediction of the non-functional behaviour of modern hardware. In the
     372% time frame of the European contribution we have focused on the general
     373% methodology and on the difficulties related to the development and certification
     374% of a cost model inducing compiler.
     376\section{The typical CerCo workflow}
     381char a[] = {3, 2, 7, 14};
     382char threshold = 4;
     384int count(char *p, int len) {
     385  char j;
     386  int found = 0;
     387  for (j=0; j < len; j++) {
     388    if (*p <= threshold)
     389      found++;
     390    p++;
     391    }
     392  return found;
     395int main() {
     396  return count(a,4);
     400%  $\vcenter{\includegraphics[width=7.5cm]{interaction_diagram.pdf}}$
     402    baseline={([yshift=-.5ex]current bounding box)},
     403    element/.style={draw, text width=1.6cm, on chain, text badly centered},
     404    >=stealth
     405    ]
     406{[start chain=going below, node distance=.5cm]
     407\node [element] (cerco) {CerCo\\compiler};
     408\node [element] (cost)  {CerCo\\cost plugin};
     409{[densely dashed]
     410\node [element] (ded)   {Deductive\\platform};
     411\node [element] (check) {Proof\\checker};
     414\coordinate [left=3.5cm of cerco] (left);
     415{[every node/.style={above, text width=3.5cm, text badly centered,
     416                     font=\scriptsize}]
     417\draw [<-] ([yshift=-1ex]cerco.north west) coordinate (t) --
     418    node {C source}
     419    (t-|left);
     420\draw [->] (cerco) -- (cost);
     421\draw [->] ([yshift=1ex]cerco.south west) coordinate (t) --
     422    node {C source+\color{red}{cost annotations}}
     423    (t-|left) coordinate (cerco out);
     424\draw [->] ([yshift=1ex]cost.south west) coordinate (t) --
     425    node {C source+\color{red}{cost annotations}\\+\color{blue}{synthesized assertions}}
     426    (t-|left) coordinate (out);
     427{[densely dashed]
     428\draw [<-] ([yshift=-1ex]ded.north west) coordinate (t) --
     429    node {C source+\color{red}{cost annotations}\\+\color{blue}{complexity assertions}}
     430    (t-|left) coordinate (ded in) .. controls +(-.5, 0) and +(-.5, 0) .. (out);
     431\draw [->] ([yshift=1ex]ded.south west) coordinate (t) --
     432    node {complexity obligations}
     433    (t-|left) coordinate (out);
     434\draw [<-] ([yshift=-1ex]check.north west) coordinate (t) --
     435    node {complexity proof}
     436    (t-|left) .. controls +(-.5, 0) and +(-.5, 0) .. (out);
     437\draw [dash phase=2.5pt] (cerco out) .. controls +(-1, 0) and +(-1, 0) ..
     438    (ded in);
     440%% user:
     441% head
     442\draw (current bounding box.west) ++(-.2,.5)
     443    circle (.2) ++(0,-.2) -- ++(0,-.1) coordinate (t);
     444% arms
     445\draw (t) -- +(-.3,-.2);
     446\draw (t) -- +(.3,-.2);
     447% body
     448\draw (t) -- ++(0,-.4) coordinate (t);
     449% legs
     450\draw (t) -- +(-.2,-.6);
     451\draw (t) -- +(.2,-.6);
     454\caption{On the left: C code to count the number of elements in an array
     455 that are less than or equal to a given threshold. On the right: CerCo's interaction
     456 diagram. Components provided by CerCo are drawn with a solid border.}
     459We illustrate the workflow we envisage (on the right
     460of~\autoref{test1}) on an example program (on the left
     461of~\autoref{test1}).  The user writes the program and feeds it to the
     462CerCo compiler, which outputs an instrumented version of the same
     463program that updates global variables that record the elapsed
     464execution time and the stack space usage.  The red lines in
     465\autoref{itest1} introducing variables, functions and function calls
     466starting with \lstinline'__cost' and \lstinline'__stack' are the instrumentation introduced by the
     467compiler.  For example, the two calls at the start of
     468\lstinline'count' say that 4 bytes of stack are required, and that it
     469takes 111 cycles to reach the next cost annotation (in the loop body).
     470The compiler measures these on the labelled object code that it generates.
     472 The annotated program can then be enriched with complexity
     473assertions in the style of Hoare logic, that are passed to a deductive
     474platform (in our case Frama-C). We provide as a Frama-C cost plugin a
     475simple automatic synthesiser for complexity assertions which can
     476be overridden by the user to increase or decrease accuracy.  These are the blue
     477comments starting with \lstinline'/*@' in \autoref{itest1}, written in
     478Frama-C's specification language, ACSL.  From the
     479assertions, a general purpose deductive platform produces proof
     480obligations which in turn can be closed by automatic or interactive
     481provers, ending in a proof certificate.
     483% NB: if you try this example with the live CD you should increase the timeout
     485Twelve proof obligations are generated from~\autoref{itest1} (to prove
     486that the loop invariant holds after one execution if it holds before,
     487to prove that the whole program execution takes at most 1358 cycles, and so on).  Note that the synthesised time bound for \lstinline'count',
     488$178+214*(1+\text{\lstinline'len'})$ cycles, is parametric in the length of
     489the array. The CVC3 prover
     490closes all obligations within half a minute on routine commodity
     491hardware.  A simpler non-parametric version can be solved in a few
     496        moredelim=[is][\color{blue}]{$}{$},
     497        moredelim=[is][\color{red}]{|}{|},
     498        lineskip=-2pt}
     501|int __cost = 33, __stack = 5, __stack_max = 5;|
     502|void __cost_incr(int incr) { __cost += incr; }|
     503|void __stack_incr(int incr) {
     504  __stack += incr;
     505  __stack_max = __stack_max < __stack ? __stack : __stack_max;
     508char a[4] = {3, 2, 7, 14};  char threshold = 4;
     510/*@ behavior stack_cost:
     511      ensures __stack_max <= __max(\old(__stack_max), 4+\old(__stack));
     512      ensures __stack == \old(__stack);
     513    behavior time_cost:
     514      ensures __cost <= \old(__cost)+(178+214*__max(1+\at(len,Pre), 0));
     516int count(char *p, int len) {
     517  char j;  int found = 0;
     518  |__stack_incr(4);  __cost_incr(111);|
     519  $__l: /* internal */$
     520  /*@ for time_cost: loop invariant
     521        __cost <= \at(__cost,__l)+
     522                  214*(__max(\at((len-j)+1,__l), 0)-__max(1+(len-j), 0));
     523      for stack_cost: loop invariant
     524        __stack_max == \at(__stack_max,__l);
     525      for stack_cost: loop invariant
     526        __stack == \at(__stack,__l);
     527      loop variant len-j;
     528  */
     529  for (j = 0; j < len; j++) {
     530    |__cost_incr(78);|
     531    if (*p <= threshold) { |__cost_incr(136);| found ++; }
     532    else { |__cost_incr(114);| }
     533    p ++;
     534  }
     535  |__cost_incr(67);  __stack_incr(-4);|
     536  return found;
     539/*@ behavior stack_cost:
     540      ensures __stack_max <= __max(\old(__stack_max), 6+\old(__stack));
     541      ensures __stack == \old(__stack);
     542    behavior time_cost:
     543      ensures __cost <= \old(__cost)+1358;
     545int main(void) {
     546  int t;
     547  |__stack_incr(2);  __cost_incr(110);|
     548  t = count(a,4);
     549  |__stack_incr(-2);|
     550  return t;
     553\caption{The instrumented version of the program in \autoref{test1},
     554 with instrumentation added by the CerCo compiler in red and cost invariants
     555 added by the CerCo Frama-C plugin in blue. The \lstinline'__cost',
     556 \lstinline'__stack' and \lstinline'__stack_max' variables hold the elapsed time
     557in clock cycles and the current and maximum stack usage. Their initial values
     558hold the clock cycles spent in initialising the global data before calling
     559\lstinline'main' and the space required by global data (and thus unavailable for
     560the stack).
Note: See TracChangeset for help on using the changeset viewer.