\documentclass{llncs}

\usepackage{amsfonts}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage[english]{babel}
\usepackage{color}
\usepackage{fancybox}
\usepackage{fancyvrb}
\usepackage{graphicx}
\usepackage[colorlinks,
            bookmarks,bookmarksopen,bookmarksdepth=2]{hyperref}
\usepackage{hyphenat}
\usepackage[utf8x]{inputenc}
\usepackage{listings}
\usepackage{mdwlist}
\usepackage{microtype}
\usepackage{stmaryrd}
\usepackage{url}

\usepackage{tikz}
\usetikzlibrary{positioning,calc,patterns,chains,shapes.geometric,scopes}

%\renewcommand{\verb}{\lstinline}
%\def\lstlanguagefiles{lst-grafite.tex}
%\lstset{language=Grafite}
\lstset{language=C,basicstyle=\tt,basewidth=.5em,lineskip=-1.5pt}

\newlength{\mylength}
\newenvironment{frametxt}%
        {\setlength{\fboxsep}{5pt}
                \setlength{\mylength}{\linewidth}%
                \addtolength{\mylength}{-2\fboxsep}%
                \addtolength{\mylength}{-2\fboxrule}%
                \Sbox
                \minipage{\mylength}%
                        \setlength{\abovedisplayskip}{0pt}%
                        \setlength{\belowdisplayskip}{0pt}%
                }%
                {\endminipage\endSbox
                        \[\fbox{\TheSbox}\]}

\title{Certified Complexity (CerCo)\thanks{The project CerCo acknowledges the
financial support of the Future and Emerging Technologies (FET) programme within
the Seventh Framework Programme for Research of the European Commission, under
FET-Open grant number: 243881}}
\author{
%Roberto
R.~M. Amadio$^{4}$ \and
%Nicolas
N.~Ayache$^{3,4}$ \and
%François
F.~Bobot$^{3,4}$ \and
%Jacob
J.~P. Boender$^1$ \and
%Brian
B.~Campbell$^2$ \and
%Ilias
I.~Garnier$^2$ \and
%Antoine
A.~Madet$^4$ \and
%James
J.~McKinna$^2$ \and
%Dominic
D.~P. Mulligan$^1$ \and
%Mauro
M.~Piccolo$^1$ \and
%Randy
R.~Pollack$^2$ \and
%Yann
Y.~R\'egis-Gianas$^{3,4}$ \and
%Claudio
C.~Sacerdoti Coen$^1$ \and
%Ian
I.~Stark$^2$ \and
%Paolo
P.~Tranquilli$^1$}
\institute{Dipartimento di Informatica - Scienza e Ingegneria, Universit\`a di
Bologna \and
LFCS, School of Informatics, University of Edinburgh
\and INRIA (Team $\pi r^2$)
\and
Universit\'e Paris Diderot
}

\bibliographystyle{splncs03}

\begin{document}

\maketitle

% Brian: I've changed "without loss of accuracy" because we discuss
% important precision/complexity tradeoffs for advanced architectures
% in the dependent labelling section.
\begin{abstract}
  We provide an overview of the FET-Open Project CerCo (`Certified
  Complexity'). Our main achievement is the development of a technique
  for analysing non-functional properties of programs (time, space) at
  the source level with little or no loss of accuracy and a small
  trusted code base. The core component is a C compiler, verified in
  Matita, that produces an instrumented copy of the source code in
  addition to generating object code.  This instrumentation exposes, and
  tracks precisely, the actual (non-asymptotic) computational cost of
  the input program at the source level. Untrusted invariant
  generators and trusted theorem provers may then be used to compute
  and certify the parametric execution time of the code.
\end{abstract}

% ---------------------------------------------------------------------------- %
% SECTION                                                                      %
% ---------------------------------------------------------------------------- %
\section{Introduction}

%\paragraph{Problem statement.}
Programs can be specified with both
functional constraints (what the program must do) and non-functional constraints
(what time, space or other resources the program may use).  In the current
state of the art, functional properties are verified
by combining user annotations---preconditions, invariants, and so on---with a
multitude of automated analyses---invariant generators, type systems, abstract
interpretation, theorem proving, and so on---on the program's high-level source code.
By contrast, many non-functional properties
are verified using analyses on low-level object code,
%\footnote{A notable
%  exception is the explicit allocation of heap space in languages like C, which
%  can be handled at the source level.}
but these analyses may then need information
about the high-level functional behaviour of the program that must be reconstructed.
Analysing low-level object code presents several problems:
\begin{itemize*}
\item
It can be hard to deduce the high-level structure of the program after compiler optimisations.
The object code produced by an optimising compiler may have radically different control flow from that of the original source program.
\item
Techniques that operate on object code are not useful early in the development process of a program, yet problems with a program's design or implementation are cheaper to resolve earlier in the process, rather than later.
\item
Parametric cost analysis is very hard: how can we translate a cost that depends on the execution state (for example the
value of a register or a carry bit) into a cost that the user can understand
when looking at the source code?
\item
Performing functional analyses on object code makes it hard for the programmer to provide information about the program and its expected execution, leading to a loss of precision in the resulting analyses.
\end{itemize*}
\paragraph{Vision and approach.}
We want to reconcile functional and
non-functional analyses: to share information and perform both at the same time
on high-level source code.
%
What has previously prevented this approach is the lack of a uniform and precise
cost model for high-level code: each statement occurrence is compiled
differently, optimisations may change control flow, and the cost of an object
code instruction may depend on the runtime state of hardware components like
pipelines and caches, none of which is visible in the source code.

We envision a new generation of compilers that track program structure through compilation and optimisation and exploit this
information to define a precise, non-uniform cost model for source code that accounts for runtime state. With such a cost model we can
reduce non-functional verification to the functional case and exploit the state
of the art in automated high-level verification~\cite{survey}. The techniques
currently used by the Worst Case Execution Time (WCET) community, who perform analyses on object code,
are still available but can be coupled with additional source-level
analyses. Where our approach produces overly complex cost models, safe approximations can be used to trade precision for simplicity.
Finally, source code analysis can be used early in the development process, when
components have been specified but not implemented, as modularity means
that it is enough to specify the non-functional behaviour of missing
components.
\paragraph{Contributions.}
We have developed \emph{the labelling approach}~\cite{labelling}, a
technique to implement compilers that induce cost models on source programs by
very lightweight tracking of code changes through compilation. We have studied
how to formally prove the correctness of compilers implementing this technique,
have implemented such a compiler from C to object binaries for the 8051
microcontroller, which predicts execution time and stack space usage,
and have verified it in an interactive theorem prover.  As we are targeting
an embedded microcontroller we do not consider dynamic memory allocation.

To demonstrate source-level verification of costs we have implemented
a Frama-C plugin~\cite{framac} that invokes the compiler on a source
program and uses it to generate invariants on the high-level source
that correctly model low-level costs. The plugin certifies that the
program respects these costs by calling automated theorem provers, a
new and innovative technique in the field of cost analysis. Finally,
we have conducted several case studies, including showing that the
plugin can automatically compute and certify the exact reaction time
of Lustre~\cite{lustre} data flow programs compiled into C.

\section{Project context and approach}
Formal methods for verifying functional properties of programs have
now reached a level of maturity and automation at which their adoption is slowly
increasing in production environments. For safety critical code, it is
becoming commonplace to combine rigorous software engineering methodologies and testing
with static analyses, taking the strengths of each and mitigating
their weaknesses. Of particular interest are open frameworks
for the combination of different formal methods, where the programs can be
progressively specified and enriched with new safety
guarantees: every method contributes knowledge (e.g. new invariants) that
becomes an assumption for later analysis.

The outlook for verifying non-functional properties of programs (time spent,
memory used, energy consumed) is bleaker.
% and the future seems to be getting even worse.
Most industries verify that real-time systems meet their deadlines
simply by performing many runs of the system and timing their execution, computing the
maximum time and adding an empirical safety margin, then claiming the result to be a
bound for the WCET of the program. Formal methods and software to statically
analyse the WCET of programs exist, but they often produce bounds that are too
pessimistic to be useful. Recent advances in hardware architecture
have focused on improving average case performance, not the
predictability of the worst case. Execution time is becoming increasingly
dependent on execution history and the internal state of
hardware components like pipelines and caches. Multi-core processors and non-uniform
memory models are drastically reducing the possibility of performing
static analysis in isolation, because programs are less and less time
composable. Clock-precise hardware models are necessary for static analysis, and
obtaining them is becoming harder due to the increased sophistication
of hardware design.

Despite these problems, the need for reliable real-time
systems and programs is increasing, and there is pressure
from the research community for the introduction of
hardware with more predictable behaviour, which would be more suitable
for static analysis.  One example, being investigated by the Proartis
project~\cite{proartis}, is to decouple execution time from execution
history by introducing randomisation.

In CerCo~\cite{cerco} we do not address this problem, optimistically
assuming that improvements in low-level timing analysis or architecture will make
verification feasible in the longer term. Instead, the main objective of our work is
to bring together the static analysis of functional and non-functional
properties, which in the current state of the art are
independent activities with limited exchange of information: while
functional properties are verified on the source code, the analysis of
non-functional properties is performed on object code to exploit
clock-precise hardware models.

\subsection{Current object-code methods}

Analysis currently takes place on object code for two main reasons.
First, there cannot be a uniform, precise cost model for source
code instructions (or even basic blocks). During compilation, high level
instructions are broken up and reassembled in context-specific ways so that
identifying a fragment of object code with a single high level instruction is
infeasible. Even the control flow of the object and source code can be very
different as a result of optimisations, for example aggressive loop
optimisations may completely transform source level loops. Despite the lack of a uniform, compilation- and
program-independent cost model on the source language, the literature on the
analysis of non-asymptotic execution time on high level languages assuming
such a model is growing and gaining momentum. However, unless we provide a
replacement for such cost models, this literature's future practical impact looks
to be minimal. Some hope has been provided by the EmBounded project \cite{embounded}, which
compositionally compiles high-level code to a byte code that is executed by an
interpreter with guarantees on the maximal execution time spent for each byte code
instruction. This provides a uniform model at the expense of the model's
precision (each cost is a pessimistic upper bound) and the performance of the
executed code (because the byte code is interpreted compositionally instead of
being compiled in a fully non-compositional way).

The second reason to perform the analysis on the object code is that bounding
the worst case execution time of small code fragments in isolation (e.g. loop
bodies) and then adding up the bounds yields very poor estimates as no
knowledge of the hardware state prior to executing the fragment can be assumed. By
analysing longer runs the bound obtained becomes more precise because the lack
of information about the initial state has a relatively small impact.

To calculate the cost of an execution, value and control flow analyses
are required to bound the number of times each basic block is
executed.  Currently, state
of the art WCET analysis tools, such as AbsInt's aiT toolset~\cite{absint}, perform these analyses on
object code, where the logic of the program is harder to reconstruct
and most information available at the source code level has been lost;
see~\cite{stateart} for a survey. Imprecision in the analysis can lead to useless bounds. To
augment precision, the tools ask the user to provide constraints on
the object code control flow, usually in the form of bounds on the
number of iterations of loops or linear inequalities on them. This
requires the user to manually link the source and object code,
translating their assumptions on the source code (which may be wrong) to
object code constraints. The task is error prone and hard, especially
in the presence of complex compiler optimisations.

Traditional techniques for WCET that work on object code are also affected by
another problem: they cannot be applied before the generation of the object
code. Functional properties can be analysed in early development stages, while
analysis of non-functional properties may come too late to avoid expensive
changes to the program architecture.

\subsection{CerCo's approach}

In CerCo we propose a radically new approach to the problem: we reject the idea
of a uniform cost model and we propose that the compiler, which knows how the
code is translated, must return the cost model for basic blocks of high level
instructions. It must do so by keeping track of the control flow modifications
in order to reverse them, and by interfacing with processor timing analysis.

By embracing compilation, instead of avoiding it like EmBounded did, a CerCo
compiler can both produce efficient code and return costs that are
as precise as the processor timing analysis can be. Moreover, our costs can be
parametric: the cost of a block can depend on actual program data, on a
summary of the execution history, or on an approximated representation of the
hardware state. For example, loop optimisations may assign a cost to a loop body
that is a function of the number of iterations performed. As another example,
the cost of a block may be a function of the vector of stalled pipeline states,
which can be exposed in the source code and updated at each basic block exit. It
is parametricity that allows one to analyse small code fragments without losing
precision. In the analysis of the code fragment we do not have to ignore the
initial hardware state; rather, we may assume that we know exactly which
state (or mode, as the WCET literature calls it) we are in.

The CerCo approach has the potential to dramatically improve the state of the
art. By performing control and data flow analyses on the source code, the error
prone translation of invariants is completely avoided. Instead, this
work is done at the source level using tools of the user's choice.
Any available technique for the verification of functional properties
can be immediately reused, and multiple techniques can collaborate to
infer and certify cost invariants for the program.  There are no
limitations on the types of loops or data structures involved. Parametric cost analysis
becomes the default, with non-parametric bounds used as a last resort when the user
decides to trade the precision of the analysis for simplicity. \emph{A priori}, no
technique previously used in traditional WCET is lost: processor
timing analyses can be used by the compiler on the object code, and the rest can be applied
at the source code level. Our approach can also work in the early
stages of development by axiomatically attaching costs to unimplemented components.


Software used to verify properties of programs must be as bug-free as
possible. The trusted code base for verification consists of the code that needs
to be trusted to believe that the property holds. The trusted code base of
state-of-the-art WCET tools is very large: one needs to trust the control flow
analyser, the linear programming libraries used, and also the formal models
of the hardware under analysis, for example. In CerCo we are moving the control flow analysis to the source
code and introducing a non-standard compiler too. To reduce the trusted
code base, we implemented a prototype and a static analyser in an interactive
theorem prover, which was used to certify that the costs added to the source
code are indeed those incurred by the hardware. Formal models of the
hardware and of the high level source languages were also implemented in the
interactive theorem prover. Control flow analysis on the source code has been
obtained using invariant generators, tools to produce proof obligations from
generated invariants, and automatic theorem provers to verify the obligations. If
these tools are able to generate proof traces that can be
independently checked, the only remaining component that enters the trusted code
base is an off-the-shelf invariant generator which, in turn, can be proved
correct using an interactive theorem prover. Therefore we achieve the double
objective of allowing the use of more off-the-shelf components (e.g. provers and
invariant generators) whilst reducing the trusted code base.

%\paragraph{Summary of the CerCo objectives.} To summarize, the goal of CerCo is
% to reconcile functional with non-functional analysis by performing them together
% on the source code, sharing common knowledge about execution invariants. We want
% to achieve the goal implementing a new generation of compilers that induce a
% parametric, precise cost model for basic blocks on the source code. The compiler
% should be certified using an interactive theorem prover to minimize the trusted
% code base of the analysis. Once the cost model is induced, off-the-shelf tools
% and techniques can be combined together to infer and prove parametric cost
% bounds.
%The long term benefits of the CerCo vision are expected to be:
%1. the possibility to perform static analysis during early development stages
%2.  parametric bounds made easier
%3.  the application of off-the-shelf techniques currently unused for the
% analysis of non-functional properties, like automated proving and type systems
%4. simpler and safer interaction with the user, that is still asked for
% knowledge, but on the source code, with the additional possibility of actually
% verifying the provided knowledge
%5. a reduced trusted code base
%6. the increased accuracy of the bounds themselves.
%
%The long term success of the project is hindered by the increased complexity of
% the static prediction of the non-functional behaviour of modern hardware. In the
% time frame of the European contribution we have focused on the general
% methodology and on the difficulties related to the development and certification
% of a cost model inducing compiler.

\section{The typical CerCo workflow}
\label{sec:workflow}
\begin{figure}[!t]
\begin{tabular}{l@{\hspace{0.2cm}}|@{\hspace{0.2cm}}l}
\begin{lstlisting}
char a[] = {3, 2, 7, 14};
char threshold = 4;

int count(char *p, int len) {
  char j;
  int found = 0;
  for (j = 0; j < len; j++) {
    if (*p <= threshold)
      found++;
    p++;
  }
  return found;
}

int main() {
  return count(a,4);
}
\end{lstlisting}
&
%  $\vcenter{\includegraphics[width=7.5cm]{interaction_diagram.pdf}}$
\begin{tikzpicture}[
    baseline={([yshift=-.5ex]current bounding box)},
    element/.style={draw, text width=1.6cm, on chain, text badly centered},
    >=stealth
    ]
{[start chain=going below, node distance=.5cm]
\node [element] (cerco) {CerCo\\compiler};
\node [element] (cost)  {CerCo\\cost plugin};
{[densely dashed]
\node [element] (ded)   {Deductive\\platform};
\node [element] (check) {Proof\\checker};
}
}
\coordinate [left=3.5cm of cerco] (left);
{[every node/.style={above, text width=3.5cm, text badly centered,
                     font=\scriptsize}]
\draw [<-] ([yshift=-1ex]cerco.north west) coordinate (t) --
    node {C source}
    (t-|left);
\draw [->] (cerco) -- (cost);
\draw [->] ([yshift=1ex]cerco.south west) coordinate (t) --
    node {C source+\color{red}{cost annotations}}
    (t-|left) coordinate (cerco out);
\draw [->] ([yshift=1ex]cost.south west) coordinate (t) --
    node {C source+\color{red}{cost annotations}\\+\color{blue}{synthesized assertions}}
    (t-|left) coordinate (out);
{[densely dashed]
\draw [<-] ([yshift=-1ex]ded.north west) coordinate (t) --
    node {C source+\color{red}{cost annotations}\\+\color{blue}{complexity assertions}}
    (t-|left) coordinate (ded in) .. controls +(-.5, 0) and +(-.5, 0) .. (out);
\draw [->] ([yshift=1ex]ded.south west) coordinate (t) --
    node {complexity obligations}
    (t-|left) coordinate (out);
\draw [<-] ([yshift=-1ex]check.north west) coordinate (t) --
    node {complexity proof}
    (t-|left) .. controls +(-.5, 0) and +(-.5, 0) .. (out);
\draw [dash phase=2.5pt] (cerco out) .. controls +(-1, 0) and +(-1, 0) ..
    (ded in);
}}
%% user:
% head
\draw (current bounding box.west) ++(-.2,.5)
    circle (.2) ++(0,-.2) -- ++(0,-.1) coordinate (t);
% arms
\draw (t) -- +(-.3,-.2);
\draw (t) -- +(.3,-.2);
% body
\draw (t) -- ++(0,-.4) coordinate (t);
% legs
\draw (t) -- +(-.2,-.6);
\draw (t) -- +(.2,-.6);
\end{tikzpicture}
\end{tabular}
\caption{On the left: C code to count the number of elements in an array
 that are less than or equal to a given threshold. On the right: CerCo's interaction
 diagram. Components provided by CerCo are drawn with a solid border.}
\label{test1}
\end{figure}
We illustrate the workflow we envisage (on the right
of~\autoref{test1}) on an example program (on the left
of~\autoref{test1}).  The user writes the program and feeds it to the
CerCo compiler, which outputs an instrumented version of the same
program that updates global variables recording the elapsed
execution time and the stack space usage.  The red lines in
\autoref{itest1} introducing variables, functions and function calls
starting with \lstinline'__cost' and \lstinline'__stack' are the instrumentation introduced by the
compiler.  For example, the two calls at the start of
\lstinline'count' say that 4 bytes of stack are required, and that it
takes 111 cycles to reach the next cost annotation (in the loop body).
The compiler measures these costs on the labelled object code that it generates.

The annotated program can then be enriched with complexity
assertions in the style of Hoare logic, which are passed to a deductive
platform (in our case Frama-C). As a Frama-C cost plugin we provide a
simple automatic synthesiser for complexity assertions, which can
be overridden by the user to increase or decrease accuracy.  These are the blue
comments starting with \lstinline'/*@' in \autoref{itest1}, written in
Frama-C's specification language, ACSL.  From the
assertions, a general purpose deductive platform produces proof
obligations which in turn can be closed by automatic or interactive
provers, ending in a proof certificate.

% NB: if you try this example with the live CD you should increase the timeout

Twelve proof obligations are generated from~\autoref{itest1} (to prove
that the loop invariant holds after one execution if it holds before,
to prove that the whole program execution takes at most 1358 cycles, and so on).  Note that the synthesised time bound for \lstinline'count',
$178+214*(1+\text{\lstinline'len'})$ cycles, is parametric in the length of
the array. The CVC3 prover
closes all obligations within half a minute on routine commodity
hardware.  A simpler non-parametric version can be solved in a few
seconds.
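
As a quick sanity check of these figures, instantiating the bound for
\lstinline'count' at \lstinline'len'~$= 4$ and adding the 110 cycles that
\autoref{itest1} charges to \lstinline'main' before the call recovers the
whole-program bound:
\[ 178 + 214 \times (1 + 4) = 1248, \qquad 110 + 1248 = 1358. \]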
%
\fvset{commandchars=\\\{\}}
\lstset{morecomment=[s][\color{blue}]{/*@}{*/},
        moredelim=[is][\color{blue}]{$}{$},
        moredelim=[is][\color{red}]{|}{|},
        lineskip=-2pt}
\begin{figure}[!p]
\begin{lstlisting}
|int __cost = 33, __stack = 5, __stack_max = 5;|
|void __cost_incr(int incr) { __cost += incr; }|
|void __stack_incr(int incr) {
  __stack += incr;
  __stack_max = __stack_max < __stack ? __stack : __stack_max;
}|

char a[4] = {3, 2, 7, 14};  char threshold = 4;

/*@ behavior stack_cost:
      ensures __stack_max <= __max(\old(__stack_max), 4+\old(__stack));
      ensures __stack == \old(__stack);
    behavior time_cost:
      ensures __cost <= \old(__cost)+(178+214*__max(1+\at(len,Pre), 0));
*/
int count(char *p, int len) {
  char j;  int found = 0;
  |__stack_incr(4);  __cost_incr(111);|
  $__l: /* internal */$
  /*@ for time_cost: loop invariant
        __cost <= \at(__cost,__l)+
                  214*(__max(\at((len-j)+1,__l), 0)-__max(1+(len-j), 0));
      for stack_cost: loop invariant
        __stack_max == \at(__stack_max,__l);
      for stack_cost: loop invariant
        __stack == \at(__stack,__l);
      loop variant len-j;
  */
  for (j = 0; j < len; j++) {
    |__cost_incr(78);|
    if (*p <= threshold) { |__cost_incr(136);| found++; }
    else { |__cost_incr(114);| }
    p++;
  }
  |__cost_incr(67);  __stack_incr(-4);|
  return found;
}

/*@ behavior stack_cost:
      ensures __stack_max <= __max(\old(__stack_max), 6+\old(__stack));
      ensures __stack == \old(__stack);
    behavior time_cost:
      ensures __cost <= \old(__cost)+1358;
*/
int main(void) {
  int t;
  |__stack_incr(2);  __cost_incr(110);|
  t = count(a,4);
  |__stack_incr(-2);|
  return t;
}
\end{lstlisting}
\caption{The instrumented version of the program in \autoref{test1},
 with instrumentation added by the CerCo compiler in red and cost invariants
 added by the CerCo Frama-C plugin in blue. The \lstinline'__cost',
 \lstinline'__stack' and \lstinline'__stack_max' variables hold the elapsed time
in clock cycles and the current and maximum stack usage. Their initial values
hold the clock cycles spent in initialising the global data before calling
\lstinline'main' and the space required by global data (and thus unavailable for
the stack).
}
\label{itest1}
\end{figure}
\section{Main scientific and technical results}
First we describe the basic labelling approach and our compiler
implementations that use it.  This is suitable for basic architectures
with simple cost models.  Then we will discuss the dependent labelling
extension, which is suitable for more advanced processor architectures
and compiler optimisations.  At the end of this section we will
demonstrate automated high level reasoning about the source level
costs provided by the compilers.

% \emph{Dependent labelling~\cite{?}.} The basic labelling approach assumes a
% bijective mapping between object code and source code O(1) blocks (called basic
% blocks). This assumption is violated by many program optimizations (e.g. loop
% peeling and loop unrolling). It also assumes the cost model computed on the
% object code to be non parametric: every block must be assigned a cost that does
% not depend on the state. This assumption is violated by stateful hardware like
% pipelines, caches, branch prediction units. The dependent labelling approach is
% an extension of the basic labelling approach that allows to handle parametric
% cost models. We showed how the method allows to deal with loop optimizations and
% pipelines, and we speculated about its applications to caches.
%
% \emph{Techniques to exploit the induced cost model.} Every technique used for
% the analysis of functional properties of programs can be adapted to analyse the
% non-functional properties of the source code instrumented by compilers that
% implement the labelling approach. In order to gain confidence in this claim, we
% showed how to implement a cost invariant generator combining abstract
% interpretation with separation logic ideas \cite{separation}. We integrated
% everything in the Frama-C modular architecture, in order to compute proof
% obligations from functional and cost invariants and to use automatic theorem
% provers on them. This is an example of a new technique that is not currently
% exploited in WCET analysis. It also shows how precise functional invariants
% benefits the non-functional analysis too. Finally, we show how to fully
% automatically analyse the reaction time of Lustre nodes that are first compiled
% to C using a standard Lustre compiler and then processed by a C compiler that
% implements the labelling approach.

% \emph{The CerCo compiler.} This is a compiler from a large subset of the C
% program to 8051/8052 object code,
% integrating the labelling approach and a static analyser for 8051 executables.
% The latter can be implemented easily and does not require dependent costs
% because the 8051 microprocessor is a very simple one, with constant-cost
% instruction. It was chosen to separate the issue of exact propagation of the
% cost model from the orthogonal problem of the static analysis of object code
% that may require approximations or dependent costs. The compiler comes in
% several versions: some are prototypes implemented directly in OCaml, and they
% implement both the basic and dependent labelling approaches; the final version
% is extracted from a Matita certification and at the moment implements only the
% basic approach.
605
[3432]606\subsection{The (basic) labelling approach}
607The labelling approach is the foundational insight that underlies all the developments in CerCo.
[3462]608It allows the evolution of basic blocks to be tracked throughout the compilation process in order to propagate the cost model from the
[3336]609object code to the source code without losing precision in the process.
[3328]610\paragraph{Problem statement.} Given a source program $P$, we want to obtain an
[3327]611instrumented source program $P'$,  written in the same programming language, and
612the object code $O$ such that: 1) $P'$ is obtained by inserting into $P$ some
613additional instructions to update global cost information like the amount of
614time spent during execution or the maximal stack space required; 2) $P$ and $P'$ 
[3329]615must have the same functional behaviour, i.e.\ they must produce that same output
[3327]616and intermediate observables; 3) $P$ and $O$ must have the same functional
[3329]617behaviour; 4) after execution and in interesting points during execution, the
[3327]618cost information computed by $P'$ must be an upper bound of the one spent by $O$ 
[3433]619to perform the corresponding operations (\emph{soundness} property); 5) the difference
[3327]620between the costs computed by $P'$ and the execution costs of $O$ must be
[3433]621bounded by a program-dependent constant (\emph{precision} property).
[3317]622
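Schematically, writing $\mathit{cost}_{P'}$ for the cost computed by the
instrumentation in $P'$ and $\mathit{cost}_O$ for the cost actually incurred
by $O$ on corresponding runs, requirements 4 and 5 amount to the following
(a simplified rendering, not the formal statement we prove):
\[ \mathit{cost}_{P'} \geq \mathit{cost}_O
   \qquad\text{and}\qquad
   \mathit{cost}_{P'} - \mathit{cost}_O \leq k_P, \]
for some constant $k_P$ that depends only on the program $P$.
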
\paragraph{The labelling software components.} We solve the problem in four
stages \cite{labelling}, implemented by four software components that are used
in sequence.

The first component labels the source program $P$ by injecting label emission
statements in appropriate positions to mark the beginning of basic blocks.
These are the positions where the cost instrumentation will appear in the final output.
% The
% set of labels with their positions is called labelling.
The syntax and semantics
of the source programming language are augmented with label emission statements.
The statement ``EMIT $\ell$'' behaves like a NOP instruction that does not affect the
program state or control flow, but its execution is observable.
% Therefore the observables of a run of a program becomes a stream
% of label emissions: $\ell_1,\ldots,\ell_n$, called the program trace. We clarify the conditions
% that the labelling must respect later.
For the example in Section~\ref{sec:workflow} this is just the original C code
with ``EMIT'' instructions added at every point a \lstinline'__cost_incr' call
appears in the final code.

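To fix intuitions, the labelled version of \lstinline'count' might be rendered
as the following runnable sketch (our illustration, not the compiler's actual
intermediate representation), where ``EMIT'' is modelled as an observable
no-op that appends a label to a trace:
\begin{lstlisting}
enum label { L1, L2, L3, L4, L5 };
enum label trace[1024];      /* observable trace of emitted labels */
int trace_len = 0;
void EMIT(enum label l) { trace[trace_len++] = l; }

extern char threshold;       /* the global from the earlier example */

int count(char *p, int len) {
  char j;  int found = 0;
  EMIT(L1);                  /* entry block */
  for (j = 0; j < len; j++) {
    EMIT(L2);                /* loop body: this label breaks the loop */
    if (*p <= threshold) { EMIT(L3); found++; }
    else { EMIT(L4); }
    p++;
  }
  EMIT(L5);                  /* block after the loop */
  return found;
}
\end{lstlisting}
Replacing each ``EMIT'' with the corresponding \lstinline'__cost_incr' call
recovers the instrumented code of \autoref{itest1}.
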
The second component is a labelling-preserving compiler. It can be obtained from
an existing compiler by adding label emission statements to every intermediate
language and by propagating label emission statements during compilation. The
compiler is correct if it preserves both the functional behaviour of the program
and the traces of observables, including the labels `emitted'.
% We may also ask that the function that erases the cost
% emission statements commute with compilation. This optional property grants that
% the labelling does not interfere with the original compiler behaviour. A further
% set of requirements will be added later.

The third component analyses the labelled
object code to compute the scope of each of its label emission statements,
i.e.\ the instructions that may be executed after the statement and before
a new label emission is encountered, and then computes the maximum cost of
each.  Note that we only have enough information at this point to compute
the cost of loop-free portions of code.  We will consider how to ensure that
every loop is broken by a cost label shortly.
%It is a tree and not a sequence because the scope
%may contain a branching statement. In order to grant that such a finite tree
%exists, the object code must not contain any loop that is not broken by a label
%emission statement. This is the first requirement of a sound labelling. The
%analyser fails if the labelling is unsound. For each scope, the analyser
%computes an upper bound of the execution time required by the scope, using the
%maximum of the costs of the two branches in case of a conditional statement.
%Finally, the analyser computes the cost of a label by taking the maximum of the
%costs of the scopes of every statement that emits that label.

The fourth and final component replaces the labels in the labelled
version of the source code produced at the start with the costs
computed for each label's scope.  This yields the instrumented source
code.  For the example, this is the code in \autoref{itest1}, except
for the specifications in comments, which we consider in
Section~\ref{sec:exploit}.

\paragraph{Correctness.} Requirements 1 and 2 hold because of the
non-invasive labelling procedure.  Requirement 3 can be satisfied by
implementing compilation correctly.  It is obvious that the value of
the global cost variable of the instrumented source code is always
equal to the sum of the costs of the labels emitted by the
corresponding labelled code. Moreover, because the compiler preserves
all traces, the sum of the costs of the labels emitted in the source
and target labelled code are the same. Therefore, to satisfy the
soundness requirement, we need to ensure that the time taken to
execute the object code is equal to the sum of the costs of the labels
emitted by the object code. We collect all the necessary conditions
for this to happen in the definition of a \emph{sound} labelling: a)
all loops must be broken by a cost emission statement; b) all program
instructions must be in the scope of some cost emission statement.
This ensures that every label's scope is a tree of instructions, with
the cost being the most expensive path. To satisfy the precision
requirement as well, we must make the scopes flat sequences of instructions.
We require a \emph{precise} labelling where every label is emitted at
most once and both branches of each conditional jump start with a label
emission statement.

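The following sketch of ours, reusing the ``EMIT'' marker from the earlier
sketch, illustrates both conditions:
\begin{lstlisting}
void examples(int x, int c) {
  /* Unsound: no label breaks the loop, so no finite cost
     can be assigned to the scope of L1. */
  EMIT(L1);
  while (x > 0) x--;

  /* Sound but imprecise: only one branch starts with a label,
     so L2 must be charged the maximum of the two branches. */
  EMIT(L2);
  if (c) x++; else { EMIT(L3); x--; }
}
\end{lstlisting}
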
The correctness and precision of the labelling approach only rely on the
correctness and precision of the object code labelling. The simplest
way to achieve this is to impose correctness and precision
requirements on the source code labelling produced at the start, and
to demand that the compiler preserves these
properties too. The latter requirement imposes serious limitations on the
compilation strategy and optimisations: the compiler may not duplicate any code
that contains label emission statements, like loop bodies. Therefore various
loop optimisations like peeling or unrolling are prevented. Moreover, precision
of the object code labelling is not sufficient \emph{per se} to obtain global
precision: we implicitly assumed that a precise constant cost can be assigned
to every instruction. This is not possible in
the presence of stateful hardware whose state influences the cost of operations,
like pipelines and caches. In Section~\ref{lab:deplabel} we will see an extension of the
basic labelling approach which tackles these problems.

In CerCo we have developed several cost preserving compilers based on
the labelling approach. Excluding an initial certified compiler for a
`while' language, all remaining compilers target realistic source
languages---a pure higher order functional language and a large subset
of C with pointers, \lstinline'goto's and all data structures---and real world
target processors---MIPS and the Intel 8051 processor
family. Moreover, they achieve a level of optimisation that ranges
from moderate (comparable to GCC level 1) to intermediate (including
loop peeling and unrolling, hoisting and late constant propagation).
We describe the C compilers in detail in the following section.

Two compilation chains were implemented for a purely functional
higher-order language~\cite{labelling2}. The two main changes required
to deal with functional languages are: 1) because global variables and
updates are not available, the instrumentation phase produces monadic
code to `update' the global costs; 2) the requirements for a sound and
precise labelling of the source code must be changed when the
compilation is based on CPS translations.  In particular, we need to
introduce labels emitted before a statement is executed and also
labels emitted after a statement is executed. The latter capture code
that is inserted by the CPS translation and that would escape all
label scopes.

% Brian: one of the reviewers pointed out that standard Prolog implementations
% do have some global state that apparently survives backtracking and might be
% used.  As we haven't experimented with this, I think it's best to elide it
% entirely.

% Phases 1, 2 and 3 can be applied as well to logic languages (e.g. Prolog).
% However, the instrumentation phase cannot: in standard Prolog there is no notion
% of (global) variable whose state is not retracted during backtracking.
% Therefore, the cost of executing computations that are later backtracked would
% not be correctly counted in. Any extension of logic languages with
% non-backtrackable state could support our labelling approach.

\subsection{The CerCo C compilers}
We implemented two C compilers, one implemented directly in OCaml and
the other implemented in Matita, an interactive theorem prover~\cite{matita}.  The first acted as a prototype for the second,
but also supported MIPS and acted as a testbed for more advanced
features such as the dependent labelling approach
in Section~\ref{lab:deplabel}.

The second C compiler is the \emph{Trusted CerCo Compiler}, whose cost
predictions are formally verified. The executable code
is OCaml code extracted from the Matita implementation. The Trusted
CerCo Compiler only targets
the C language and the 8051/8052 family, and does not yet implement any
advanced optimisations. Its user interface, however, is the same as the
other version for interoperability purposes. In
particular, the Frama-C CerCo plugin described in
Section~\ref{sec:exploit} can work without recompilation
with both of our C compilers.

The 8051 microprocessor is a very simple one, with constant-cost
instructions. It was chosen to separate the issue of exact propagation of the
cost model from the orthogonal problem of low-level timing analysis of object code
that may require approximation or dependent costs.

The (trusted) CerCo compiler implements the following optimisations: cast
simplification, constant propagation in expressions, liveness analysis driven
spilling of registers, dead code elimination, branch displacement, and tunnelling.
The last two optimisations are performed by our optimising assembler
\cite{correctness}. The back-end of the compiler works on three address
instructions, preferred to static single assignment code for the simplicity of
the formal certification.

The CerCo compiler is loosely based on the CompCert compiler \cite{compcert}, a
recently developed certified compiler from C to the PowerPC, ARM and x86
microprocessors. In contrast to CompCert, both the CerCo code and its
certification are fully open source. Some data structures and language definitions for
the front-end are directly taken from CompCert, while the back-end is a redesign
of a compiler from Pascal to MIPS used by Fran\c{c}ois Pottier for a course at the
\'{E}cole Polytechnique.
The main differences in the CerCo compiler are:
\begin{itemize*}
\item All the intermediate languages include label emitting instructions to
implement the labelling approach, and the compiler preserves execution traces.
\item Instead of targeting an assembly language with additional
macro-instructions which are expanded before assembly, we directly
produce object code in order to perform the timing analysis, using
an integrated optimising assembler.
\item In order to avoid the additional work of implementing a linker
 and a loader, we do not support separate
compilation and external calls. Adding them is orthogonal
to the labelling approach and should not introduce extra problems.
\item We target an 8-bit processor, in contrast to CompCert's 32-bit targets. This requires
many changes and more compiler code, but it is not fundamentally more
complex. The proof of correctness, however, becomes much harder.
\item We target a microprocessor that has a non-uniform memory model, which is
still often the case for microprocessors used in embedded systems and that is
becoming common again in multi-core processors. Therefore the compiler has to
keep track of data and move it between memory regions in the proper
way. Moreover the size of pointers to different regions is not uniform (see the
sketch after this list). %An
%additional difficulty was that the space available for the stack in internal
%memory in the 8051 is tiny, allowing only a minor number of nested calls. To
%support full recursion in order to test the CerCo tools also on recursive
%programs, the compiler implements a stack in external memory.
\end{itemize*}
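For illustration, the SDCC dialect of C for the 8051 family (not CerCo's own
syntax, shown only to make the non-uniform memory model concrete) names the
distinct memory spaces explicitly, and pointers into different spaces have
different sizes:
\begin{lstlisting}
__data  char counter;        /* 8-bit-addressed internal RAM  */
__xdata char buffer[256];    /* 16-bit-addressed external RAM */
__code  const char table[4] = {3, 2, 7, 14};  /* program memory */
\end{lstlisting}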

\subsection{Formal certification of the CerCo compiler}
We have formally
certified in Matita that the cost models induced on the source code by the
Trusted CerCo Compiler correctly and precisely
predict the object code behaviour. There are two cost models, one for execution
time and one for stack space consumption. We show the correctness of the prediction
only for those programs that do not exhaust the available stack space, a
property that---thanks to the stack cost model---we can statically analyse on the
source code, in sharp contrast to other certified compilers.  Other projects have
already certified the preservation of functional semantics in similar compilers,
so we have not attempted to directly repeat that work and assume
functional correctness for most passes. In order to complete the
proof for non-functional properties, we have introduced a new,
structured form of execution trace, with related notions of forward
similarity and the intensional
consequences of forward similarity. We have also introduced a unified
representation for back-end intermediate languages that was exploited to provide
a uniform proof of forward similarity.

The details of the proof techniques employed
and
the proof sketch can be found in the CerCo deliverables and papers~\cite{cerco-website}.
In this section we will only hint at the correctness statement, which turned
out to be more complex than expected.

\paragraph{The correctness statement.}
Real-time programs are often reactive programs that loop forever responding to
events (inputs) by performing some computation followed by some action (output)
and continuing as before. For these programs the overall execution
time does not make sense. The same is true for reactive programs that spend an
unpredictable amount of time in I/O. Instead, what is interesting is the reaction
time---the time spent between I/O events. Moreover, we are interested in
predicting and ruling out crashes due to running out of space on certain
inputs.
Therefore we need a statement that talks about sub-runs of a
program. A natural candidate is that the time predicted on the source
code and spent on the object code by two corresponding sub-runs are the same.
To make this statement formal we must identify the
corresponding sub-runs and how to single out those that are meaningful.
We introduce the notion of a \emph{measurable} sub-run of a run:
one which does not exhaust the available stack before or during the
sub-run, in which the number of function calls and returns
is the same, which does not perform any I/O, and which
starts with a label emission statement and ends with a return or another label
emission statement. The stack usage is bounded using the stack usage model
that is computed by the compiler.

The statement that we formally proved is: for each C run with a measurable
sub-run, there exists an object code run with a sub-run, with the same
execution trace for both the prefix of the run and the sub-run itself,
and where the time spent by the
object code in the sub-run is the same as the time predicted on the source code
using the time cost model generated by the compiler.

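Schematically, the statement has the following shape (a simplified rendering
of ours that elides the precise notions of trace and state correspondence):
if $s_0 \rightarrow^{*} s_1 \rightarrow^{*} s_2$ is a source run whose
sub-run $s_1 \rightarrow^{*} s_2$ is measurable, then there is an object code
run $t_0 \rightarrow^{*} t_1 \rightarrow^{*} t_2$ with the same observable
traces, such that
\[ \mathrm{clock}(t_2) - \mathrm{clock}(t_1)
   \;=\; \sum_{\ell \in \mathrm{trace}(s_1 \rightarrow^{*} s_2)} \mathrm{cost}(\ell), \]
where $\mathrm{cost}$ is the time cost model generated by the compiler.
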
We briefly discuss the constraints for measurability. Not exhausting the stack
space is necessary for a run to be meaningful, because the source semantics
has no notion of running out of memory. Balancing
function calls and returns is a requirement for precision: the labelling approach
allows the scope of a label to extend after function calls to
minimize the number of labels. (The scope excludes the called
function's execution.) If the number of
calls/returns is unbalanced, it means that there is a call we have not returned
to that could be followed by additional instructions whose cost has already been
taken into account.  The last condition on the start and end points of
a run is also required to make the bound precise.  With these
restrictions and the 8051's simple timing model we obtain \emph{exact}
predictions.  If we relax these conditions then we obtain a corollary
with an upper bound on the cost.
Finally, I/O operations can be performed in the prefix of the run, but not in the
measurable sub-run. Therefore we prove that we can predict reaction times, but
not I/O times, as desired.

\subsection{Dependent labelling}
\label{lab:deplabel}
The core idea of the basic labelling approach is to establish a tight connection
between basic blocks executed in the source and target languages. Once the
connection is established, any cost model computed on the object code can be
transferred to the source code, without affecting the code of the compiler or
its proof. In particular, we can also transport cost models
that associate to each label a \emph{function} from the hardware state
to a natural number.
However, a problem arises during the instrumentation phase that replaces label
emission statements with increments of global cost variables. They are
incremented by the result of applying the label's cost function
to the hardware state at the time of execution of the block. The
hardware state, however, comprises both the functional state that affects the
computation (the values of the registers and memory) and the non-functional
state that does not (the pipeline and cache contents, for example).
We can find corresponding information for the former in the source code state, but constructing the
correspondence may be hard, and lifting the cost model to work on the source code
state is likely to produce cost expressions that are too complex to understand and reason about.
Fortunately, in modern architectures the cost of executing single
instructions is either independent of the functional state or the jitter---the
difference between the worst and best case execution times---is small enough
to be bounded without losing too much precision. Therefore we only consider
dependencies on the `non-functional' parts of the state.

The non-functional state is not directly related to the high level
state and does not influence the functional properties. What can be
done is to expose key aspects of the non-functional state in the
source code. We present here the basic intuition in a simplified form:
the technical details that allow us to handle the general case are
more complex and can be found in~\cite{paolo}. We add to the source
code an additional global variable that represents the non-functional
state and another one that remembers the last few labels emitted. The
state variable must be updated at every label emission statement,
using an update function which is computed during the processor timing
analysis.
This update function assigns to each label a function from the
recently emitted labels and old state to the new state. It is computed
by composing the semantics of every instruction in a basic block,
restricted to the non-functional part of the state.

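Concretely, the instrumentation might then take the following shape (a
schematic sketch of ours with invented constants, in the style of
\autoref{itest1}):
\begin{lstlisting}
unsigned int __state = 0;   /* summary of the non-functional state,
                               e.g. pipeline bubbles */
int __cost = 0;

void __emit_L1(void) {
  /* the label's cost now depends on the current state... */
  __cost += (__state == 0) ? 110 : 115;
  /* ...and the state is updated by a function computed
     during the processor timing analysis */
  __state = 1;
}
\end{lstlisting}
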
[3327]927Not all the details of the non-functional state needs to be exposed, and the
[3444]928technique works better when the part of state that is required can be summarised
[3327]929in a simple data structure. For example, to handle simple but realistic
930pipelines it is sufficient to remember a short integer that encodes the position
[3417]931of bubbles (stuck instructions) in the pipeline. In any case, it is not necessary
932for the user to understand the meaning of the state to reason over the properties
933of the
[3435]934program. Moreover, the user, or the invariant generator tools that
[3417]935analyse the instrumented source code produced by the compiler, can decide to
[3428]936trade precision of the analysis for simplicity by approximating the
[3435]937cost by safe bounds that do not depend on the processor state. Interestingly, the functional analysis of
[3424]938the code could determine which blocks are executed more frequently in order to
[3435]939use more aggressive approximations for those that are executed least.
[3317]940
[3432]941Dependent labelling can also be applied to allow the compiler to duplicate
[3336]942blocks that contain labels (e.g. in loop optimisations)~\cite{paolo}. The effect is to assign
[3327]943a different cost to the different occurrences of a duplicated label. For
944example, loop peeling turns a loop into the concatenation of a copy of the loop
[3424]945body for the first iteration and the conditional execution of the
946loop for successive iterations. Further optimisations will compile the two
947copies of the loop differently, with the first body usually
[3327]948taking more time.
[3317]949
By introducing a variable that keeps track of the iteration number, we can
associate to the label a cost that is a function of the iteration number. The
same technique works for loop unrolling without modification: the function will
assign one cost to the even iterations and another cost to the odd
ones. The optimisation pass that duplicates the loop bodies must also modify the
code so that updates to the iteration numbers are propagated correctly. The technical details are more complicated and
can be found in the CerCo reports and publications. The implementation, however,
is quite simple (and forms part of our OCaml version of the compiler)
and the changes to a loop optimising compiler are minimal.
[3317]960
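As an illustration, here is a hand-written sketch (not compiler output) of a
loop instrumented with a hypothetical iteration counter \lstinline'__k'; after
unrolling by a factor of two, the cost charged for the body depends on the
parity of \lstinline'__k'. The cycle counts are illustrative.

\begin{lstlisting}
int __cost = 0;

/* Sketch only: __k tracks the iteration number introduced by the
   compiler; after unrolling by two, even and odd iterations may
   compile differently and so carry different (illustrative) costs. */
int sum(int *a, int n) {
  int s = 0;
  for (int i = 0, __k = 0; i < n; i++, __k++) {
    __cost += (__k % 2 == 0) ? 35 : 22;
    s += a[i];
  }
  return s;
}
\end{lstlisting}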
[3328]961\subsection{Techniques to exploit the induced cost model}
[3433]962\label{sec:exploit}
[3425]963We now turn our attention to synthesising high-level costs, such as
[3436]964the reaction time of a real-time program.  We consider as our starting point source level costs
[3432]965provided by basic labelling, in other words annotations
[3425]966on the source code which are constants that provide a sound and sufficiently
[3327]967precise upper bound on the cost of executing the blocks after compilation to
968object code.
[3317]969
[3327]970The principle that we have followed in designing the cost synthesis tools is
[3436]971that the synthesised bounds should be expressed and proved within a general
[3327]972purpose tool built to reason on the source code. In particular, we rely on the
[3328]973Frama-C tool to reason on C code and on the Coq proof-assistant to reason on
[3327]974higher-order functional programs.
This principle entails that
the inferred synthetic bounds are indeed correct as long as the general purpose
tool is, and that there is no limitation on the class of programs that can be handled:
any program can be analysed, if necessary by resorting to interactive proof.
[3317]979
[3327]980Of course, automation is desirable whenever possible. Within this framework,
981automation means writing programs that give hints to the general purpose tool.
982These hints may take the form, say, of loop invariants/variants, of predicates
983describing the structure of the heap, or of types in a light logic. If these
984hints are correct and sufficiently precise the general purpose tool will produce
985a proof automatically, otherwise, user interaction is required.
[3317]986
[3339]987\paragraph{The Cost plugin and its application to the Lustre compiler.}
[3328]988Frama-C \cite{framac} is a set of analysers for C programs with a
[3455]989specification language, ACSL. New analyses can be added dynamically
990via plugins. For instance, the Jessie plugin~\cite{jessie} allows deductive
[3327]991verification of C programs with respect to their specification in ACSL, with
992various provers as back-end tools.
[3328]993We developed the CerCo Cost plugin for the Frama-C platform as a proof of
[3327]994concept of an automatic environment exploiting the cost annotations produced by
[3417]995the CerCo compiler. It consists of an OCaml program which essentially
[3436]996uses the CerCo compiler to produce a related C program with cost annotations,
997and applies some heuristics to produce a tentative bound on the cost of
[3327]998executing the C functions of the program as a function of the value of their
[3436]999parameters. The user can then call the Jessie plugin to discharge the
[3327]1000related proof obligations.
1001In the following we elaborate on the soundness of the framework and the
[3436]1002experiments we performed with the Cost tool on C programs, including some produced by a
[3327]1003Lustre compiler.
[3317]1004
\paragraph{Soundness.} The soundness of the whole framework depends on the cost
annotations added by the CerCo compiler,
the verification conditions (VCs) generated by Jessie, and the
external provers discharging the VCs. Jessie can be used to verify the
synthesised bounds because our plugin generates them in ACSL format. Thus, even if the added synthetic costs are
incorrect (relative to the cost annotations), the process as a whole is still
correct: indeed, Jessie will not validate incorrect costs and no conclusion can
be drawn about the WCET of the program in this case. In other words, the
soundness does not depend on the cost plugin, which can in
principle produce any synthetic cost. However, in order to be able to actually
prove a WCET bound for a C function, we need to add correct annotations in such a way that
Jessie and the subsequent automatic provers have enough information to deduce their
validity. In practice this is not straightforward, even for very simple programs
composed of branching and assignments (no loops and no recursion), because a fine
analysis of the VCs associated with branching may lead to a complexity blow-up.
[3436]1020
\paragraph{Experience with Lustre.} Lustre~\cite{lustre} is a data-flow language for programming
synchronous systems, with a compiler which targets C. We designed a
wrapper for supporting Lustre files.
The C function produced by the compiler is relatively simple loop-free
code which implements the step function of the
synchronous system; computing the WCET of this function amounts to obtaining a
bound on the reaction time of the system. We tested the Cost plugin and the
Lustre wrapper on the C programs generated by the Lustre compiler. For programs
consisting of a few hundred lines of code, the cost plugin computes a
WCET and Alt-Ergo is able to discharge all VCs automatically.
[3317]1031
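The following hand-written fragment suggests the shape of such code (it is not
actual output of the Lustre compiler or of CerCo; the names and costs are
illustrative): a loop-free step function whose WCET is the maximum over
finitely many paths.

\begin{lstlisting}
int __cost = 0;        /* cost variable added by the CerCo compiler */
static int pre_x = 0;  /* internal state implementing Lustre's `pre' */

/* One reaction of the synchronous system: loop-free branching code,
   so bounding __cost amounts to bounding the most expensive path. */
int step(int x) {
  int y;
  __cost += 23;                       /* illustrative block costs */
  if (x > pre_x) { __cost += 11; y = x; }
  else           { __cost += 9;  y = pre_x; }
  pre_x = x;
  return y;
}
\end{lstlisting}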
[3339]1032\paragraph{Handling C programs with simple loops.}
The cost annotations added by the CerCo compiler take the form of C instructions
that update a fresh global variable, called the cost variable, by a constant.
Synthesising a WCET bound for a C function thus consists of statically resolving an
upper bound on the difference between the value of the cost variable before and
after the execution of the function, i.e. finding the instructions
that update the cost variable and establishing the number of times they are passed
through during the flow of execution. To perform the analysis the plugin
assumes that there are no recursive functions in the program, and that
every loop is annotated with a variant. In the case of `for' loops the
variants are automatically inferred where a loop counter can be
syntactically detected.
[3317]1044
[3436]1045The plugin computes a call-graph and proceeds to calculate bounds for
1046each function from the leaves up to the main function.
1047The computation of the cost of each function is performed by traversing its
1048control flow graph, where the cost of a node is the maximum of the
[3327]1049costs of the successors.
1050In the case of a loop with a body that has a constant cost for every step of the
1051loop, the cost is the product of the cost of the body and of the variant taken
1052at the start of the loop.
1053In the case of a loop with a body whose cost depends on the values of some free
[3335]1054variables, a fresh logic function $f$ is introduced to represent the cost of the
[3327]1055loop in the logic assertions. This logic function takes the variant as a first
[3335]1056parameter. The other parameters of $f$ are the free variables of the body of the
[3444]1057loop. An axiom is added to account for the fact that the cost is accumulated at each
[3327]1058step of the loop.
The resulting cost bound is attached to the function as a post-condition.
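
The traversal itself is straightforward; the following C sketch conveys the
idea for an acyclic control flow graph. The actual implementation is part of
the OCaml plugin, so the data structure and names here are purely illustrative.

\begin{lstlisting}
#define MAX_SUCC 2

struct cfg_node {
  int block_cost;                    /* cost annotation of the block */
  int nsucc;                         /* number of successors         */
  struct cfg_node *succ[MAX_SUCC];   /* control-flow successors      */
};

/* Worst-case cost from a node: its own block cost plus the maximum
   over the costs of its successors (acyclic CFG; loops are handled
   separately by multiplying the body cost by the variant). */
int node_cost(const struct cfg_node *n) {
  int worst = 0;
  for (int i = 0; i < n->nsucc; i++) {
    int c = node_cost(n->succ[i]);
    if (c > worst) worst = c;
  }
  return n->block_cost + worst;
}
\end{lstlisting}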
[3321]1060
[3436]1061The user can also specify more precise variants and annotate functions
1062with their own cost specifications. The plugin will use these
1063instead of computing its own, allowing greater precision and the
1064ability to analyse programs which the variant generator does not
1065support.
1066
1067In addition to the loop-free Lustre code, this method was successfully
[3437]1068applied to a small range of cryptographic code.  See~\cite{labelling}
1069for more details.  The example in Section~\ref{sec:workflow} was also
1070produced using the plug-in.  The variant was calculated automatically
1071by noticing that \lstinline'j' is a loop counter with maximum value
1072\lstinline'len'.  The most expensive path through the loop body
1073($78+136 = 214$) is then multiplied by the number of iterations to
1074give the cost of the loop.
[3436]1075
[3339]1076\paragraph{C programs with pointers.}
Using first-order logic and SMT solvers to specify and verify programs involving pointer-based
data structures such as linked lists or graphs has some limitations.
Separation logic, a program logic
with a new notion of conjunction to express spatial heap separation, is an elegant alternative.
Bobot has
recently introduced automatically generated separation
predicates that simulate separation logic reasoning in the Jessie plugin, whose specification language, verification condition
generator, and theorem provers were not designed with separation logic in
mind~\cite{bobot}. CerCo's plugin can exploit these predicates to automatically
reason about the cost of execution of simple heap manipulation programs such as
in-place list reversal.
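
For instance, the cost of the following in-place reversal can be bounded
linearly in the length of the list. This is a sketch: the per-node cost and
the shape of the bound are illustrative, and the separation predicates needed
to establish that the list is acyclic are omitted.

\begin{lstlisting}
#include <stddef.h>

struct node { struct node *next; int val; };

int __cost = 0;

/* In-place list reversal: once separation predicates establish that
   the list is acyclic, a bound of the (hypothetical) form
   __cost <= \old(__cost) + k * length(l) becomes provable. */
struct node *reverse(struct node *l) {
  struct node *r = NULL;
  while (l != NULL) {
    __cost += 47;          /* illustrative per-node block cost */
    struct node *next = l->next;
    l->next = r;
    r = l;
    l = next;
  }
  return r;
}
\end{lstlisting}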
[3317]1088
[3417]1089\section{Conclusions and future work}
[3316]1090
[3462]1091All CerCo software and deliverables may be found on the project homepage~\cite{cerco-website}.
[3316]1092
1093The results obtained so far are encouraging and provide evidence that
1094it is possible to perform static time and space analysis at the source level
[3337]1095without losing accuracy, reducing the trusted code base and reconciling the
[3327]1096study of functional and non-functional properties of programs. The
techniques introduced seem to be scalable, cover both imperative and
functional languages, and are compatible with every compiler optimisation
we have considered so far.
1100
[3337]1101To prove that compilers can keep track of optimisations
[3316]1102and induce a precise cost model on the source code, we targeted a simple
1103architecture that admits a cost model that is execution history independent.
[3322]1104The most important future work is dealing with hardware architectures
[3444]1105characterised by history-dependent stateful components, like caches and
[3428]1106pipelines. The main issue is to assign a parametric, dependent cost
[3432]1107to basic blocks that can be later transferred by the labelling approach to
[3316]1108the source code and represented in a meaningful way to the user. The dependent
[3432]1109labelling approach that we have studied seems a promising tool to achieve
[3428]1110this goal, but more work is required to provide good source level
1111approximations of the relevant processor state.
[3316]1112
Other examples of future work are to improve the cost invariant
generator algorithms and the coverage of compiler optimisations, to combine
the labelling approach with the type and effect discipline of~\cite{typeffects}
to handle languages with implicit memory management, and to experiment with
our tools in the early phases of development. Larger case studies are also necessary
to evaluate the CerCo prototype on realistic, industrial-scale programs.
[3316]1119
1121\bibliography{fopara13}
[3307]1201
[3328]1202
1204
[3438]1205\appendix
1206
1207\include{appendix}
1208
[3307]1209\end{document}