source: Papers/jar-cerco-2017/introduction.tex @ 3614

Last change on this file since 3614 was 3614, checked in by boender, 3 years ago

Rewrote introduction, and moved development section

File size: 15.2 KB
Line 
1\section{Introduction}
2\label{sect.introduction}
3
4%\paragraph{Problem statement.}
5Programs can be specified using both functional constraints (what the program
6must do) and non-functional constraints (what time, space or other resources
7the program can use).  At the current state of the art, functional properties
8are verified by combining user annotations---preconditions, invariants, and so
9on---with a multitude of automated analyses---invariant generators, type
10systems, abstract interpretation, theorem proving, and so on---on the
11high-level source code of the program.
12
13By contrast, many non-functional properties are verified by analysing the
14low-level object code. Analysis on low-level object code has several problems,
15however:
16
17\begin{itemize*}
18\item
19It can be hard to deduce the high-level structure of the program after compiler
20optimisations. The object code produced by an optimising compiler may have a
21radically different control flow to the original source code program.
22\item
23Techniques that operate on object code are not useful early in the development
24process of a program, yet problems with the design or implementation of a
25program are cheaper to resolve early in the development process.
26\item
27Parametric cost analysis is very hard: how can we translate a cost that depends
28on the execution state, for example the value of a register or a carry bit, to
29a cost that the user can understand while looking at the source code?
30\item
31Performing functional analysis on object code makes it hard for the programmer
32to provide information about the program and its expected execution, leading to
33a loss of precision in the resulting analysis.
34\end{itemize*}
35
36\paragraph{Vision and approach.}
37We want to reconcile functional and non-functional analysis: to share
38information between them and perform both at the same time on high-level source
39code.
40 
41What has previously prevented this approach from being applied is the lack of a
42uniform and precise cost model for high-level code, since each statement
43occurrence is compiled differently, optimisations may change control flow, and
44the cost of an object code instruction may depend on the runtime state of
45hardware components like pipelines and caches, none of which are visible in the
46source code.
47
48We envision a new generation of compilers that track program structure through
49compilation and optimisation and exploit this information to define a precise,
50non-uniform cost model for source code that accounts for runtime state. With
51such a cost model we can perform non-functional verification in a similar way
52to functional verification and exploit the state of the art in automated
53high-level verification~\cite{survey}.
54
55The techniques currently used by the Worst Case Execution Time (WCET) community,
56which perform analysis on object code, are still applicable but can be coupled
57with additional source-level analysis. In cases where our approach produces
58overly complex cost models, safe approximations can be used to trade complexity
59for precision.
60
61Finally, source code analysis can be used early in the development process,
62when components have been specified but not implemented, as modularity means
63that it is enough to specify the non-functional behaviour of missing components.
64
65\paragraph{Contributions.}
66We have developed \emph{the labelling approach}~\cite{labelling}, a technique
67to implement compilers that induce cost models on source programs by very
68lightweight tracking of code changes through the compilation pipeline.
69
70We have studied how to formally prove the correctness of compilers implementing
71this technique, and have implemented such a compiler from C to object binaries
72for the 8051 microcontroller that predicts execution time and stack space
73usage, verifying it in an interactive theorem prover. As we are targeting an
74embedded microcontroller we do not consider dynamic memory allocation.
75
76To demonstrate source-level verification of costs we have implemented a Frama-C
77plugin~\cite{framac} that invokes the compiler on a source program and uses it
78to generate invariants on the high-level source that correctly model low-level
79costs. The plugin certifies that the program respects these costs by calling
80automated theorem provers, a new and innovative technique in the field of cost
81analysis.
82
83Finally, we have conducted several case studies, including showing that the
84plugin can automatically compute and certify the exact reaction time of
85Lustre~\cite{lustre} data flow programs compiled into C.
86
87\subsection{Project context and approach}
88
89Formal methods for verifying functional properties of programs have now reached
90such a level of maturity and automation that their adoption is slowly
91increasing in production environments.
92
93For safety critical code, it is becoming commonplace to combine rigorous
94software engineering methodologies and testing with static analyses, taking the
95strengths of each and mitigating their weaknesses. Of particular interest are
96open frameworks for the combination of different formal methods, where the
97programs can be progressively specified and enriched with new safety
98guarantees: every method contributes knowledge (e.g. new invariants) that
99can be built upon by other analysis methods.
100
101The outlook for verification of non-functional properties of programs (time
102spent, memory used, energy consumed) is bleaker. In most cases, verifying that
103real-time systems meet their deadlines is done by simply performing many runs
104of the system and timing their execution, computing the maximum time and adding
105an empirical safety margin, claiming the result to be a bound for the WCET of
106the program.
107
108Formal methods and software to statically analyse the WCET of programs exist,
109but they often produce bounds that are too pessimistic to be useful. Recent
110advances in hardware architecture have focused on improving average case
111performance, not predictability of the worst case.
112
113Execution time is becoming increasingly dependent on execution history and the
114internal state of hardware components like pipelines and caches. Multi-core
115processors and non-uniform memory models are drastically reducing the
116possibility of performing static analysis in isolation, because programs are
117less and less time-composable. Clock-precise hardware models are necessary for
118static analysis, and obtaining them is becoming harder due to the increased
119sophistication of hardware design.
120
121The need for reliable real-time systems and programs is increasing, and there
122is pressure from the research community for the introduction of hardware with
123more predictable behaviour, which would be more suitable for static analysis.
124One example, being investigated by the Proartis project~\cite{proartis}, is to
125decouple execution time from execution history by introducing randomisation.
126
127In CerCo~\cite{cerco} we do not address this problem, optimistically assuming
128that improvements in low-level timing analysis or architecture will make
129verification feasible in the longer term.
130
131Instead, the main objective of our work is to bring together static analysis of
132functional and non-functional properties, which in the current state of the art
133are independent activities with limited exchange of information: while the
134functional properties are verified on the source code, the analysis of
135non-functional properties is performed on object code to exploit clock-precise
136hardware models.
137
138\subsection{Current object-code methods}
139
140Analysis currently takes place on object code for two main reasons.
141
142Firstly, there cannot be a uniform, precise cost model for source code
143instructions (or even basic blocks). During compilation, high level
144instructions are broken up and reassembled in context-specific ways so that
145identifying a fragment of object code and a single high level instruction is
146infeasible.
147
148Additionally, the control flow of the object and source code can be very
149different as a result of optimisations. For example, aggressive loop
150optimisations can completely transform source level loops.
151
152Despite the lack of a uniform, compilation- and program-independent cost model
153on the source language, research on the analysis of non-asymptotic execution
154time on high level languages assuming such a model is growing and gaining
155momentum.
156
157Unless such cost models are developed, the future practical impact of this
158research looks to be minimal. One existing approach is the EmBounded
159project~\cite{embounded}, which compositionally compiles high-level code to a
160byte code that is executed by an interpreter with guarantees on the maximal
161execution time spent for each byte code instruction. This provides a model
162that is uniform, though at the expense of precision (each cost is a pessimistic
163upper bound) and the performance of the executed code (the byte code is
164interpreted compositionally).
165
166The second reason to perform analysis on the object code is that bounding
167the worst case execution time of small code fragments in isolation (e.g. loop
168bodies) and then adding up the bounds yields very poor estimates as no
169knowledge of the hardware state prior to executing the fragment can be assumed.
170
171By analysing longer runs the bound obtained becomes more precise because the
172lack of information about the initial state has a relatively small impact.
173
174To calculate the cost of an execution, value and control flow analyses are
175required to bound the number of times each basic block is executed. Currently,
176state of the art WCET analysis tools, such as AbsInt's aiT
177toolset~\cite{absint}, perform these analyses on object code, where the logic
178of the program is harder to reconstruct and most information available at the
179source code level has been lost; see~\cite{stateart} for a survey.
180
181Imprecision in the analysis can lead to useless bounds. To augment precision,
182currently tools ask the user to provide constraints on the object code control
183flow, usually in the form of bounds on the number of iterations of loops or
184linear inequalities on them. This requires the user to manually link the source and object code, translating their assumptions on the source code (which may be
185wrong) to object code constraints. This task is hard and error-prone,
186especially in the presence of complex compiler optimisations.
187
188Traditional techniques for WCET that work on object code are also affected by
189another problem: they cannot be applied before the generation of the object
190code. Functional properties can be analysed in early development stages, while
191analysis of non-functional properties may come too late to avoid expensive
192changes to the program architecture.
193
194\subsection{The CerCo approach}
195
196In CerCo we propose a radically new approach to the problem: we reject the idea
197of a uniform cost model and we propose that the compiler, which knows how the
198code is translated, must return the cost model for basic blocks of high level
199instructions. It must do so by keeping track of the control flow modifications
200to reverse them and by interfacing with processor timing analysis.
201
202By embracing compilation, instead of avoiding it like EmBounded did, a CerCo
203compiler can both produce efficient code and return costs that are as precise
204as the processor timing analysis can be. Moreover, our costs can be parametric:
205the cost of a block can depend on actual program data, on a summary of the
206execution history, or on an approximated representation of the hardware state.
207
208For example, loop optimisations may assign a cost to a loop body that is a
209function of the number of iterations performed. As another example, the cost of
210a block may be a function of the vector of stalled pipeline states, which can
211be exposed in the source code and updated at each basic block exit.
212
213It is parametricity that allows one to analyse small code fragments without
214losing precision. In the analysis of the code fragment we do not have to ignore
215the initial hardware state; rather, we can assume that we know exactly which
216state (or mode, as the WCET literature calls it) we are in.
217
218The CerCo approach has the potential to dramatically improve the state of the
219art. By performing control and data flow analyses on the source code, the
220error-prone translation of invariants is avoided entirely. Instead, this work
221is done at the source code level using tools of the user's choice.
222
223Any available technique for the verification of functional properties can be
224easily reused and multiple techniques can collaborate to infer and certify cost
225invariants for the program. There are no limitations on the types of loops or
226data structures involved.
227
228Parametric cost analysis becomes the default, with non-parametric bounds used
229only as a last resort when the user decides to trade the complexity of the
230analysis for more precision.
231
232\emph{A priori}, no technique previously used in traditional WCET is obsolete:
233processor timing analysis can be used by the compiler on the object code, and
234other techniques can be applied at the source code level.
235
236Our approach also works in the early stages of development by allowing the user
237to axiomatically attach costs to unimplemented components.
238
239Software used to verify properties of programs must be as bug-free as possible.
240The trusted code base for verification consists of the code that needs to be
241trusted to believe that the property holds.
242
243The trusted code base of state-of-the-art WCET tools is very large: one needs
244to trust the control flow analyser, the linear programming libraries used, and
245also the formal models of the hardware under analysis.
246
247In CerCo we move the control flow analysis to the source code level, and we
248introduce a non-standard compiler. To reduce the size of the trusted code base,
249we have implemented a prototype compiler and static analyser in an interactive
250theorem prover, which was used to certify that the costs added to the source
251code are indeed those incurred by the hardware. We have also implemented formal models of the hardware and of the high level source language in the interactive
252theorem prover.
253
254Control flow analysis on the source code has been obtained using invariant
255generators, tools to produce proof obligations from generated invariants and
256automatic theorem provers to verify the obligations. If these tools are able to
257generate proof traces that can be independently checked, the only remaining
258component that enters the trusted code base is an off-the-shelf invariant
259generator which, in turn, can be proved correct using an interactive theorem
260prover.
261
262With these methods, we achieve the objective of allowing the use of more
263off-the-shelf components (e.g. provers and invariant generators) whilst
264reducing the trusted code base at the same time.
265
266\subsection{Introduction to Matita}
267
268The theorem prover used to implement the prototype compiler is
269Matita~\cite{matita}.
270
271\subsection{Map of the paper}
272
273The rest of the paper is stuctured as follows.
274
275In section~\ref{sect.compiler.architecture}, we describe the architecture of the
276CerCo compiler, as well as the intermediate languages that it uses. We also
277describe the target hardware and its formal model.
278
279In section~\ref{sect.compiler.proof}, we describe the proof of correctness of
280the compiler in more detail. We explain our use of structured traces, the
281labelling approach, and discuss the assembler.
282
283In section~\ref{sect.formal.development}, we present data on the formal
284development.
285
286In section~\ref{sect.framac.plugin}, we discuss the Frama-C plugin, as well as
287some of the case studies we have performed to validate it.
288
289Finally, in section~\ref{sect:conclusions} we present conclusions, as well as
290related and future work.
Note: See TracBrowser for help on using the repository browser.