\section{Introduction}
\label{sect.introduction}

%\paragraph{Problem statement.}
Programs can be specified using both functional constraints (what the program must do) and non-functional constraints (what time, space or other resources the program can use).  At the current state of the art, functional properties are verified by combining user annotations---preconditions, invariants, and so on---with a multitude of automated analyses---invariant generators, type systems, abstract interpretation, theorem proving, and so on---on the program's high-level source code.

By contrast, many non-functional properties are verified by analysing the low-level object code. Analysis on low-level object code has several problems, however:
\begin{itemize*}
\item
It can be hard to deduce the high-level structure of the program after compiler optimisations. The object code produced by an optimising compiler may have radically different control flow from the original source program.
\item
Techniques that operate on object code are not useful early in the development process of a program, yet problems with a program's design or implementation are cheaper to resolve earlier in the process than later.
\item
Parametric cost analysis is very hard: how can we translate a cost that depends on the execution state, for example the value of a register or a carry bit, into a cost that the user can understand while looking at the source code?
\item
Performing functional analysis on object code makes it hard for the programmer to provide information about the program and its expected execution, leading to a loss of precision in the resulting analysis.
\end{itemize*}

\paragraph{Vision and approach.}
We want to reconcile functional and non-functional analysis: to share information between them and perform both at the same time on high-level source code.

What has previously prevented this approach from being applied is the lack of a uniform and precise cost model for high-level code, since each statement
occurrence is compiled differently, optimisations may change control flow, and the cost of an object code instruction may depend on the runtime state of hardware components like pipelines and caches, none of which are visible in the source code.

We envision a new generation of compilers that track program structure through compilation and optimisation and exploit this
information to define a precise, non-uniform cost model for source code that accounts for runtime state. With such a cost model we can
reduce non-functional verification to the functional case and exploit the state
of the art in automated high-level verification~\cite{survey}. The techniques
currently used by the Worst Case Execution Time (WCET) community, which perform
analysis on object code, are still available but can be coupled with additional source-level analysis. In cases where our approach produces overly complex cost
models, safe approximations can be used to trade complexity for precision.
Finally, source code analysis can be used early in the development process,
when components have been specified but not implemented, as modularity means
that it is enough to specify the non-functional behaviour of missing
components.

\paragraph{Contributions.}
We have developed \emph{the labelling approach}~\cite{labelling}, a
technique to implement compilers that induce cost models on source programs by
very lightweight tracking of code changes through compilation. We have studied
how to formally prove the correctness of compilers implementing this technique,
and we have implemented such a compiler, from C to object binaries for the 8051
microcontroller, that predicts execution time and stack space usage, and
verified it in an interactive theorem prover. As we are targeting an embedded
microcontroller we do not consider dynamic memory allocation.

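To give a concrete flavour of the labelling approach, the sketch below shows a
source program instrumented with cost annotations of the kind the approach makes
possible. The instrumentation style, the names \texttt{\_\_cost} and
\texttt{\_\_cost\_incr}, and the cycle counts are purely illustrative
assumptions, not the exact output of our compiler.
\begin{verbatim}
/* Illustrative sketch: a source program annotated with the cost of
   the object code emitted for each basic block.  The names and the
   cycle counts are hypothetical. */
int __cost = 0;                      /* running clock-cycle counter */
void __cost_incr(int k) { __cost += k; }

int sum(int *a, int n) {
  int s = 0;
  __cost_incr(23);                   /* prologue and initialisation */
  for (int i = 0; i < n; i++) {
    __cost_incr(42);                 /* one iteration of the loop body */
    s += a[i];
  }
  __cost_incr(11);                   /* loop exit and return */
  return s;
}
\end{verbatim}
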
To demonstrate source-level verification of costs we have implemented
a Frama-C plugin~\cite{framac} that invokes the compiler on a source
program and uses it to generate invariants on the high-level source
that correctly model low-level costs. The plugin certifies that the
program respects these costs by calling automated theorem provers, an
innovative technique in the field of cost analysis. Finally,
we have conducted several case studies, including showing that the
plugin can automatically compute and certify the exact reaction time
of Lustre~\cite{lustre} data flow programs compiled into C.

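As an illustration of what such source-level cost invariants can look like, the
ACSL-style contract below bounds the cost of the hypothetical instrumented
\texttt{sum} function from the previous sketch. The variable \texttt{\_\_cost}
and the concrete constants are again assumptions made for the sake of the
example, not output produced verbatim by the plugin.
\begin{verbatim}
/* Illustrative sketch: an ACSL contract stating that a call to sum()
   increases the (hypothetical) global cost counter by at most a
   constant plus a per-iteration cost.  Automated provers can then be
   asked to discharge the corresponding proof obligations. */
extern int __cost;

/*@ requires n >= 0;
  @ ensures  __cost <= \old(__cost) + 34 + 42 * n;
  @*/
int sum(int *a, int n);
\end{verbatim}
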
\subsection{Project context and approach}

Formal methods for verifying functional properties of programs have
now reached a level of maturity and automation at which their adoption is
slowly increasing in production environments. For safety critical code, it is
becoming commonplace to combine rigorous software engineering methodologies and
testing with static analyses, taking the strengths of each and mitigating
their weaknesses. Of particular interest are open frameworks
for the combination of different formal methods, where programs can be
progressively specified and enriched with new safety
guarantees: every method contributes knowledge (e.g. new invariants) that
becomes an assumption for later analysis.

The outlook for verifying non-functional properties of programs (time spent,
memory used, energy consumed) is bleaker.
% and the future seems to be getting even worse.
Most industries verify that real time systems meet their deadlines
by simply performing many runs of the system and timing their execution,
computing the maximum time and adding an empirical safety margin, and claiming
the result to be a bound on the WCET of the program. Formal methods and software
to statically analyse the WCET of programs exist, but they often produce bounds
that are too pessimistic to be useful. Recent advances in hardware architecture
have focused on improving average case performance, not the predictability of
the worst case. Execution time is becoming increasingly dependent on execution
history and the internal state of hardware components like pipelines and
caches. Multi-core processors and non-uniform memory models are drastically
reducing the possibility of performing static analysis in isolation, because
programs are less and less time composable. Clock-precise hardware models are
necessary for static analysis, and obtaining them is becoming harder due to the
increased sophistication of hardware design.

Despite these problems, the need for reliable real time
systems and programs is increasing, and there is pressure
from the research community for the introduction of
hardware with more predictable behaviour, which would be more suitable
for static analysis.  One example, being investigated by the Proartis
project~\cite{proartis}, is to decouple execution time from execution
history by introducing randomisation.

In CerCo~\cite{cerco} we do not address this problem, optimistically
assuming that improvements in low-level timing analysis or architecture will
make verification feasible in the longer term. Instead, the main objective of
our work is to bring together the static analysis of functional and
non-functional properties, which in the current state of the art are
independent activities with limited exchange of information: while the
functional properties are verified on the source code, the analysis of
non-functional properties is performed on object code to exploit
clock-precise hardware models.

\subsection{Current object-code methods}

Analysis currently takes place on object code for two main reasons.
First, there cannot be a uniform, precise cost model for source
code instructions (or even basic blocks). During compilation, high level
instructions are broken up and reassembled in context-specific ways, so that
identifying a fragment of object code with a single high level instruction is
infeasible. Even the control flow of the object and source code can be very
different as a result of optimisations; for example, aggressive loop
optimisations may completely transform source level loops. Despite the lack of
a uniform, compilation- and program-independent cost model on the source
language, the literature on the analysis of non-asymptotic execution time on
high level languages assuming such a model is growing and gaining momentum.
However, unless we provide a replacement for such cost models, the future
practical impact of this literature looks to be minimal. Some hope has been
provided by the EmBounded project~\cite{embounded}, which compositionally
compiles high-level code to a byte code that is executed by an interpreter with
guarantees on the maximal execution time spent for each byte code
instruction. This provides a uniform model at the expense of the model's
precision (each cost is a pessimistic upper bound) and the performance of the
executed code (because the byte code is interpreted compositionally instead of
being compiled in a fully non-compositional way).

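The sketch below illustrates the general idea behind such interpreter-based
approaches: every byte code instruction carries a fixed worst-case cost, so a
uniform source-level cost model exists by construction, but each bound must be
pessimistic. The opcodes and cycle counts are invented for the example and are
not taken from EmBounded.
\begin{verbatim}
#include <stddef.h>

/* Illustrative sketch: a byte code in which every opcode has a fixed,
   pessimistic worst-case cost.  The cost of a run is the sum of the
   per-instruction bounds along the path taken; the opcodes and the
   cycle counts are hypothetical. */
enum opcode { OP_PUSH = 0, OP_ADD, OP_JMPZ, OP_HALT };

static const unsigned wcet_of[] = {
  [OP_PUSH] = 18, [OP_ADD] = 12, [OP_JMPZ] = 20, [OP_HALT] = 4
};

/* Accumulate the worst-case cost of interpreting a straight-line byte
   code program; execution proper is elided. */
unsigned long wcet_bound(const unsigned char *code, size_t len) {
  unsigned long cost = 0;
  for (size_t pc = 0; pc < len; pc++)
    cost += wcet_of[code[pc]];
  return cost;
}
\end{verbatim}
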
The second reason to perform the analysis on the object code is that bounding
the worst case execution time of small code fragments in isolation (e.g. loop
bodies) and then adding up the bounds yields very poor estimates, as no
knowledge of the hardware state prior to executing the fragment can be assumed.
By analysing longer runs the bound obtained becomes more precise because the
lack of information about the initial state has a relatively small impact.

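A toy calculation makes the point; the cycle counts are invented. Suppose a
memory access costs $40$ cycles on a cache miss and $2$ cycles on a hit, and a
loop body performs one access to the same location on each of its $n$
iterations. Analysed in isolation, every iteration must be assumed to miss,
whereas a whole-run analysis can see that at most the first access misses:
\[
\underbrace{40\,n}_{\text{fragments in isolation}}
\qquad\text{versus}\qquad
\underbrace{40 + 2\,(n-1)}_{\text{whole run}}.
\]
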
To calculate the cost of an execution, value and control flow analyses
are required to bound the number of times each basic block is
executed. Currently, state of the art WCET analysis tools, such as AbsInt's
aiT toolset~\cite{absint}, perform these analyses on object code, where the
logic of the program is harder to reconstruct and most information available
at the source code level has been lost; see~\cite{stateart} for a survey.
Imprecision in the analysis can lead to useless bounds. To
improve precision, the tools ask the user to provide constraints on
the object code control flow, usually in the form of bounds on the
number of iterations of loops or linear inequalities on them. This
requires the user to manually link the source and object code,
translating their assumptions on the source code (which may be wrong) into
object code constraints. The task is error prone and hard, especially
in the presence of complex compiler optimisations.

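The fragility of this manual translation is easy to see even on a small
example. In the hand-written sketch below (purely illustrative; no particular
compiler or annotation syntax is implied), the source loop obviously executes
$n$ times, but after unrolling by four the object-level code contains two
loops, executing $\lfloor n/4 \rfloor$ and $n \bmod 4$ times respectively, so a
user-supplied bound of ``$n$ iterations'' no longer corresponds to any loop in
the object code.
\begin{verbatim}
/* Source-level loop: the natural bound is n iterations. */
void scale(int *a, int n) {
  for (int i = 0; i < n; i++)
    a[i] *= 2;
}

/* Roughly what an unrolling compiler may produce (hand-written here):
   the object-level bounds become n/4 and n%4 iterations. */
void scale_unrolled(int *a, int n) {
  int i = 0;
  for (; i + 4 <= n; i += 4) {          /* main loop: n/4 iterations */
    a[i] *= 2; a[i+1] *= 2; a[i+2] *= 2; a[i+3] *= 2;
  }
  for (; i < n; i++)                    /* epilogue: n%4 iterations  */
    a[i] *= 2;
}
\end{verbatim}
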
Traditional techniques for WCET that work on object code are also affected by
another problem: they cannot be applied before the generation of the object
code. Functional properties can be analysed in early development stages, while
analysis of non-functional properties may come too late to avoid expensive
changes to the program architecture.

\subsection{CerCo's approach}

In CerCo we propose a radically new approach to the problem: we reject the idea
of a uniform cost model and we propose that the compiler, which knows how the
code is translated, must return the cost model for basic blocks of high level
instructions. It must do so by keeping track of control flow modifications so
that they can be reversed, and by interfacing with processor timing analysis.

By embracing compilation, instead of avoiding it like EmBounded did, a CerCo
compiler can both produce efficient code and return costs that are
as precise as the processor timing analysis can be. Moreover, our costs can be
parametric: the cost of a block can depend on actual program data, on a
summary of the execution history, or on an approximated representation of the
hardware state. For example, loop optimisations may assign a cost to a loop body
that is a function of the number of iterations performed. As another example,
the cost of a block may be a function of the vector of stalled pipeline states,
which can be exposed in the source code and updated at each basic block exit. It
is parametricity that allows one to analyse small code fragments without losing
precision. In the analysis of the code fragment we do not have to ignore the
initial hardware state; rather, we may assume that we know exactly which
state (or mode, as the WCET literature calls it) we are in.

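Continuing the hypothetical instrumentation used earlier (the names and
constants are still assumptions, not our compiler's literal output), a
parametric cost model for the unrolled loop above could be expressed as a
single annotation whose argument is a function of \texttt{n}:
\begin{verbatim}
extern void __cost_incr(int);   /* hypothetical counter, as before */

void scale_unrolled(int *a, int n) {
  /* Parametric cost of the whole loop nest: a function of n rather
     than a constant, reflecting the unrolled object code. */
  __cost_incr(20 + 48 * (n / 4) + 14 * (n % 4));
  int i = 0;
  for (; i + 4 <= n; i += 4) {
    a[i] *= 2; a[i+1] *= 2; a[i+2] *= 2; a[i+3] *= 2;
  }
  for (; i < n; i++)
    a[i] *= 2;
}
\end{verbatim}
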
The CerCo approach has the potential to dramatically improve the state of the
art. By performing control and data flow analyses on the source code, the error
prone translation of invariants is completely avoided. Instead, this
work is done at the source level using tools of the user's choice.
Any available technique for the verification of functional properties
can be immediately reused, and multiple techniques can collaborate to
infer and certify cost invariants for the program.  There are no
limitations on the types of loops or data structures involved. Parametric cost
analysis becomes the default, with non-parametric bounds used as a last resort
when the user decides to trade the complexity of the analysis for precision.
\emph{A priori}, no technique previously used in traditional WCET is lost:
processor timing analyses can be used by the compiler on the object code, and
the rest can be applied at the source code level. Our approach can also work in
the early stages of development by axiomatically attaching costs to
unimplemented components.

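For instance (a hypothetical sketch, not a feature we claim verbatim), the cost
of a yet-unimplemented component can be postulated as a contract on the same
cost counter, and the rest of the analysis can proceed against that assumption:
\begin{verbatim}
extern int __cost;   /* hypothetical cost counter, as before */

/* Unimplemented component: its non-functional behaviour is simply
   axiomatised as an assumed bound on the cost it may incur. */
/*@ assigns __cost;
  @ ensures __cost <= \old(__cost) + 500;
  @*/
extern void send_packet(const unsigned char *buf, unsigned len);
\end{verbatim}
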
Software used to verify properties of programs must be as bug free as
possible. The trusted code base for verification consists of the code that needs
to be trusted to believe that the property holds. The trusted code base of
state-of-the-art WCET tools is very large: one needs to trust the control flow
analyser, the linear programming libraries used, and also the formal models
of the hardware under analysis, for example. In CerCo we are moving the control
flow analysis to the source code and we are introducing a non-standard compiler
too. To reduce the trusted code base, we implemented a prototype compiler and a
static analyser in an interactive theorem prover, which was used to certify that
the costs added to the source code are indeed those incurred by the
hardware. Formal models of the hardware and of the high level source languages
were also implemented in the interactive theorem prover. Control flow analysis
on the source code has been obtained using invariant generators, tools to
produce proof obligations from generated invariants, and automatic theorem
provers to verify the obligations. If these tools are able to generate proof
traces that can be independently checked, the only remaining component that
enters the trusted code base is an off-the-shelf invariant generator which, in
turn, can be proved correct using an interactive theorem prover. Therefore we
achieve the double objective of allowing the use of more off-the-shelf
components (e.g. provers and invariant generators) while reducing the trusted
code base.