source: Papers/jar-cerco-2017/introduction.tex @ 3615

Last change on this file since 3615 was 3615, checked in by boender, 3 years ago

Moved paper structure comments to their relevant sections

File size: 15.4 KB
Line 
1% Introduction
2%   Problem being solved, background, etc.
3%   Current state of the art (and problem with it)
4%   The `CerCo approach' (tm)
5%   Brief Matita overview
6%   Map of paper
7
8\section{Introduction}
9\label{sect.introduction}
10
11%\paragraph{Problem statement.}
12Programs can be specified using both functional constraints (what the program
13must do) and non-functional constraints (what time, space or other resources
14the program can use).  At the current state of the art, functional properties
15are verified by combining user annotations---preconditions, invariants, and so
16on---with a multitude of automated analyses---invariant generators, type
17systems, abstract interpretation, theorem proving, and so on---on the
18high-level source code of the program.
19
20By contrast, many non-functional properties are verified by analysing the
21low-level object code. Analysis on low-level object code has several problems,
22however:
23
24\begin{itemize*}
25\item
26It can be hard to deduce the high-level structure of the program after compiler
27optimisations. The object code produced by an optimising compiler may have a
28radically different control flow to the original source code program.
29\item
30Techniques that operate on object code are not useful early in the development
31process of a program, yet problems with the design or implementation of a
32program are cheaper to resolve early in the development process.
33\item
34Parametric cost analysis is very hard: how can we translate a cost that depends
35on the execution state, for example the value of a register or a carry bit, to
36a cost that the user can understand while looking at the source code?
37\item
38Performing functional analysis on object code makes it hard for the programmer
39to provide information about the program and its expected execution, leading to
40a loss of precision in the resulting analysis.
41\end{itemize*}
42
43\paragraph{Vision and approach.}
44We want to reconcile functional and non-functional analysis: to share
45information between them and perform both at the same time on high-level source
46code.
47 
48What has previously prevented this approach from being applied is the lack of a
49uniform and precise cost model for high-level code, since each statement
50occurrence is compiled differently, optimisations may change control flow, and
51the cost of an object code instruction may depend on the runtime state of
52hardware components like pipelines and caches, none of which are visible in the
53source code.
54
55We envision a new generation of compilers that track program structure through
56compilation and optimisation and exploit this information to define a precise,
57non-uniform cost model for source code that accounts for runtime state. With
58such a cost model we can perform non-functional verification in a similar way
59to functional verification and exploit the state of the art in automated
60high-level verification~\cite{survey}.
61
62The techniques currently used by the Worst Case Execution Time (WCET) community,
63which perform analysis on object code, are still applicable but can be coupled
64with additional source-level analysis. In cases where our approach produces
65overly complex cost models, safe approximations can be used to trade complexity
66for precision.
67
68Finally, source code analysis can be used early in the development process,
69when components have been specified but not implemented, as modularity means
70that it is enough to specify the non-functional behaviour of missing components.
71
72\paragraph{Contributions.}
73We have developed \emph{the labelling approach}~\cite{labelling}, a technique
74to implement compilers that induce cost models on source programs by very
75lightweight tracking of code changes through the compilation pipeline.
76
77We have studied how to formally prove the correctness of compilers implementing
78this technique, and have implemented such a compiler from C to object binaries
79for the 8051 microcontroller that predicts execution time and stack space
80usage, verifying it in an interactive theorem prover. As we are targeting an
81embedded microcontroller we do not consider dynamic memory allocation.
82
83To demonstrate source-level verification of costs we have implemented a Frama-C
84plugin~\cite{framac} that invokes the compiler on a source program and uses it
85to generate invariants on the high-level source that correctly model low-level
86costs. The plugin certifies that the program respects these costs by calling
87automated theorem provers, a new and innovative technique in the field of cost
88analysis.
89
90Finally, we have conducted several case studies, including showing that the
91plugin can automatically compute and certify the exact reaction time of
92Lustre~\cite{lustre} data flow programs compiled into C.
93
94\subsection{Project context and approach}
95
96Formal methods for verifying functional properties of programs have now reached
97such a level of maturity and automation that their adoption is slowly
98increasing in production environments.
99
100For safety critical code, it is becoming commonplace to combine rigorous
101software engineering methodologies and testing with static analyses, taking the
102strengths of each and mitigating their weaknesses. Of particular interest are
103open frameworks for the combination of different formal methods, where the
104programs can be progressively specified and enriched with new safety
105guarantees: every method contributes knowledge (e.g. new invariants) that
106can be built upon by other analysis methods.
107
108The outlook for verification of non-functional properties of programs (time
109spent, memory used, energy consumed) is bleaker. In most cases, verifying that
110real-time systems meet their deadlines is done by simply performing many runs
111of the system and timing their execution, computing the maximum time and adding
112an empirical safety margin, claiming the result to be a bound for the WCET of
113the program.
114
115Formal methods and software to statically analyse the WCET of programs exist,
116but they often produce bounds that are too pessimistic to be useful. Recent
117advances in hardware architecture have focused on improving average case
118performance, not predictability of the worst case.
119
120Execution time is becoming increasingly dependent on execution history and the
121internal state of hardware components like pipelines and caches. Multi-core
122processors and non-uniform memory models are drastically reducing the
123possibility of performing static analysis in isolation, because programs are
124less and less time-composable. Clock-precise hardware models are necessary for
125static analysis, and obtaining them is becoming harder due to the increased
126sophistication of hardware design.
127
128The need for reliable real-time systems and programs is increasing, and there
129is pressure from the research community for the introduction of hardware with
130more predictable behaviour, which would be more suitable for static analysis.
131One example, being investigated by the Proartis project~\cite{proartis}, is to
132decouple execution time from execution history by introducing randomisation.
133
134In CerCo~\cite{cerco} we do not address this problem, optimistically assuming
135that improvements in low-level timing analysis or architecture will make
136verification feasible in the longer term.
137
138Instead, the main objective of our work is to bring together static analysis of
139functional and non-functional properties, which in the current state of the art
140are independent activities with limited exchange of information: while the
141functional properties are verified on the source code, the analysis of
142non-functional properties is performed on object code to exploit clock-precise
143hardware models.
144
145\subsection{Current object-code methods}
146
147Analysis currently takes place on object code for two main reasons.
148
149Firstly, there cannot be a uniform, precise cost model for source code
150instructions (or even basic blocks). During compilation, high level
151instructions are broken up and reassembled in context-specific ways so that
152identifying a fragment of object code and a single high level instruction is
153infeasible.
154
155Additionally, the control flow of the object and source code can be very
156different as a result of optimisations. For example, aggressive loop
157optimisations can completely transform source level loops.
158
159Despite the lack of a uniform, compilation- and program-independent cost model
160on the source language, research on the analysis of non-asymptotic execution
161time on high level languages assuming such a model is growing and gaining
162momentum.
163
164Unless such cost models are developed, the future practical impact of this
165research looks to be minimal. One existing approach is the EmBounded
166project~\cite{embounded}, which compositionally compiles high-level code to a
167byte code that is executed by an interpreter with guarantees on the maximal
168execution time spent for each byte code instruction. This provides a model
169that is uniform, though at the expense of precision (each cost is a pessimistic
170upper bound) and the performance of the executed code (the byte code is
171interpreted compositionally).
172
173The second reason to perform analysis on the object code is that bounding
174the worst case execution time of small code fragments in isolation (e.g. loop
175bodies) and then adding up the bounds yields very poor estimates as no
176knowledge of the hardware state prior to executing the fragment can be assumed.
177
178By analysing longer runs the bound obtained becomes more precise because the
179lack of information about the initial state has a relatively small impact.
180
181To calculate the cost of an execution, value and control flow analyses are
182required to bound the number of times each basic block is executed. Currently,
183state of the art WCET analysis tools, such as AbsInt's aiT
184toolset~\cite{absint}, perform these analyses on object code, where the logic
185of the program is harder to reconstruct and most information available at the
186source code level has been lost; see~\cite{stateart} for a survey.
187
188Imprecision in the analysis can lead to useless bounds. To augment precision,
189currently tools ask the user to provide constraints on the object code control
190flow, usually in the form of bounds on the number of iterations of loops or
191linear inequalities on them. This requires the user to manually link the source and object code, translating their assumptions on the source code (which may be
192wrong) to object code constraints. This task is hard and error-prone,
193especially in the presence of complex compiler optimisations.
194
195Traditional techniques for WCET that work on object code are also affected by
196another problem: they cannot be applied before the generation of the object
197code. Functional properties can be analysed in early development stages, while
198analysis of non-functional properties may come too late to avoid expensive
199changes to the program architecture.
200
201\subsection{The CerCo approach}
202
203In CerCo we propose a radically new approach to the problem: we reject the idea
204of a uniform cost model and we propose that the compiler, which knows how the
205code is translated, must return the cost model for basic blocks of high level
206instructions. It must do so by keeping track of the control flow modifications
207to reverse them and by interfacing with processor timing analysis.
208
209By embracing compilation, instead of avoiding it like EmBounded did, a CerCo
210compiler can both produce efficient code and return costs that are as precise
211as the processor timing analysis can be. Moreover, our costs can be parametric:
212the cost of a block can depend on actual program data, on a summary of the
213execution history, or on an approximated representation of the hardware state.
214
215For example, loop optimisations may assign a cost to a loop body that is a
216function of the number of iterations performed. As another example, the cost of
217a block may be a function of the vector of stalled pipeline states, which can
218be exposed in the source code and updated at each basic block exit.
219
220It is parametricity that allows one to analyse small code fragments without
221losing precision. In the analysis of the code fragment we do not have to ignore
222the initial hardware state; rather, we can assume that we know exactly which
223state (or mode, as the WCET literature calls it) we are in.
224
225The CerCo approach has the potential to dramatically improve the state of the
226art. By performing control and data flow analyses on the source code, the
227error-prone translation of invariants is avoided entirely. Instead, this work
228is done at the source code level using tools of the user's choice.
229
230Any available technique for the verification of functional properties can be
231easily reused and multiple techniques can collaborate to infer and certify cost
232invariants for the program. There are no limitations on the types of loops or
233data structures involved.
234
235Parametric cost analysis becomes the default, with non-parametric bounds used
236only as a last resort when the user decides to trade the complexity of the
237analysis for more precision.
238
239\emph{A priori}, no technique previously used in traditional WCET is obsolete:
240processor timing analysis can be used by the compiler on the object code, and
241other techniques can be applied at the source code level.
242
243Our approach also works in the early stages of development by allowing the user
244to axiomatically attach costs to unimplemented components.
245
246Software used to verify properties of programs must be as bug-free as possible.
247The trusted code base for verification consists of the code that needs to be
248trusted to believe that the property holds.
249
250The trusted code base of state-of-the-art WCET tools is very large: one needs
251to trust the control flow analyser, the linear programming libraries used, and
252also the formal models of the hardware under analysis.
253
254In CerCo we move the control flow analysis to the source code level, and we
255introduce a non-standard compiler. To reduce the size of the trusted code base,
256we have implemented a prototype compiler and static analyser in an interactive
257theorem prover, which was used to certify that the costs added to the source
258code are indeed those incurred by the hardware. We have also implemented formal models of the hardware and of the high level source language in the interactive
259theorem prover.
260
261Control flow analysis on the source code has been obtained using invariant
262generators, tools to produce proof obligations from generated invariants and
263automatic theorem provers to verify the obligations. If these tools are able to
264generate proof traces that can be independently checked, the only remaining
265component that enters the trusted code base is an off-the-shelf invariant
266generator which, in turn, can be proved correct using an interactive theorem
267prover.
268
269With these methods, we achieve the objective of allowing the use of more
270off-the-shelf components (e.g. provers and invariant generators) whilst
271reducing the trusted code base at the same time.
272
273\subsection{Introduction to Matita}
274
275The theorem prover used to implement the prototype compiler is
276Matita~\cite{matita}.
277
278\subsection{Map of the paper}
279
280The rest of the paper is stuctured as follows.
281
282In section~\ref{sect.compiler.architecture}, we describe the architecture of the
283CerCo compiler, as well as the intermediate languages that it uses. We also
284describe the target hardware and its formal model.
285
286In section~\ref{sect.compiler.proof}, we describe the proof of correctness of
287the compiler in more detail. We explain our use of structured traces, the
288labelling approach, and discuss the assembler.
289
290In section~\ref{sect.formal.development}, we present data on the formal
291development.
292
293In section~\ref{sect.framac.plugin}, we discuss the Frama-C plugin, as well as
294some of the case studies we have performed to validate it.
295
296Finally, in section~\ref{sect:conclusions} we present conclusions, as well as
297related and future work.
Note: See TracBrowser for help on using the repository browser.