# source:Deliverables/D6.3/report.tex@3117

Last change on this file since 3117 was 3117, checked in by sacerdot, 8 years ago

...

File size: 22.6 KB
Line
1\documentclass[11pt, epsf, a4wide]{article}
2
3\usepackage{../style/cerco}
4
5\usepackage{amsfonts}
6\usepackage{amsmath}
7\usepackage{amssymb}
8\usepackage[english]{babel}
9\usepackage{graphicx}
10\usepackage[utf8x]{inputenc}
11\usepackage{listings}
12\usepackage{stmaryrd}
13\usepackage{url}
14\usepackage{bbm}
15
16\title{
17INFORMATION AND COMMUNICATION TECHNOLOGIES\\
18(ICT)\\
19PROGRAMME\\
20\vspace*{1cm}Project FP7-ICT-2009-C-243881 \cerco{}}
21
22\lstdefinelanguage{matita-ocaml}
23  {keywords={ndefinition,ncoercion,nlemma,ntheorem,nremark,ninductive,nrecord,nqed,nlet,let,in,rec,match,return,with,Type,try},
24   morekeywords={[2]nwhd,nnormalize,nelim,ncases,ndestruct},
25   morekeywords={[3]type,of},
26   mathescape=true,
27  }
28
29\lstset{language=matita-ocaml,basicstyle=\small\tt,columns=flexible,breaklines=false,
30        keywordstyle=\color{red}\bfseries,
31        keywordstyle=[2]\color{blue},
32        keywordstyle=[3]\color{blue}\bfseries,
34        stringstyle=\color{blue},
35        showspaces=false,showstringspaces=false}
36
37\lstset{extendedchars=false}
38\lstset{inputencoding=utf8x}
39\DeclareUnicodeCharacter{8797}{:=}
40\DeclareUnicodeCharacter{10746}{++}
41\DeclareUnicodeCharacter{9001}{\ensuremath{\langle}}
42\DeclareUnicodeCharacter{9002}{\ensuremath{\rangle}}
43
44\date{}
45\author{}
46
47\begin{document}
48
49\thispagestyle{empty}
50
51\vspace*{-1cm}
52\begin{center}
53\includegraphics[width=0.6\textwidth]{../style/cerco_logo.png}
54\end{center}
55
56\begin{minipage}{\textwidth}
57\maketitle
58\end{minipage}
59
60\vspace*{0.5cm}
61\begin{center}
62\begin{LARGE}
63\textbf{
64Report n. D6.3\\
65Final Report on User Validation}
66\end{LARGE}
67\end{center}
68
69\vspace*{2cm}
70\begin{center}
71\begin{large}
72Version 1.0
73\end{large}
74\end{center}
75
76\vspace*{0.5cm}
77\begin{center}
78\begin{large}
79Main Authors:\\
80XXXX %Dominic P. Mulligan and Claudio Sacerdoti Coen
81\end{large}
82\end{center}
83
84\vspace*{\fill}
85
86\noindent
87Project Acronym: \cerco{}\\
88Project full title: Certified Complexity\\
89Proposal/Contract no.: FP7-ICT-2009-C-243881 \cerco{}\\
90
91\clearpage
93\markright{\cerco{}, FP7-ICT-2009-C-243881}
94
95\newpage
96
97\vspace*{7cm}
98\paragraph{Abstract}
99xxx
100
101\newpage
102
103\tableofcontents
104
105\newpage
106
109
110The Grant Agreement describes deliverable D6.3 as follows:
111\begin{quotation}
112\textbf{Final Report on User Validation}: An articulated analysis and critics of the user validation experiences. In particular we will review the effectiveness of the techniques for cost annotation exploitement that have been employed in the project and that have been validated on simple and non trivial examples. We will also identify additional techniques that could be exploited in the middle and long term to bring the CerCo compiler to its full potentialities.
113
114\end{quotation}
115
116\section{Middle and Long Term Improvements}
117In order to identify the middle and long term improvements, we briefly review
118here the premises and goals of the CerCo approach to resource analysis.
119\begin{itemize}
120 \item There is a lot of recent and renewed activity in the formal method
121   community to accomodate resource analysis using techniques derived from
122   functional analysis (type systems, logics, abstract interpretation,
123   amortized analysis applied to data structures, etc.)
124 \item Most of this work, which currently remains at theoretical level,
125   is focused on high level languages and it assumes the existence of correct
126   and compositional resource cost models.
127 \item High level languages are compiled to object code by compilers that
128   should respect the functional properties of the program. However, because
129   of optimizations and the inherently non compositional nature of compilation,
130   compilers do not respect compositional cost models that are imposed a priori
131   on the source language. By controlling the compiler and coupling it with a
132   WCET analyser, it is actually possible to
133   choose the cost model in such a way that the cost bounds are high enough
134   to bound the cost of every produced code. This was attempted for instance
135   in the EmBounded project with good success. However, we believe that bounds
136   obtained in this way have few possibilities of being tight.
137 \item Therefore our approach consists in having the compiler generate the
138   cost model for the user by combining tracking of basic blocks during code
139   transformations with a static resource analysis on the object code for basic
140   blocks. We formally prove the compiler to respect the cost model that is
141   induced on the source level based on a very few assumptions: basically the
142   cost of a sequence of instructions should be associative and commutative
143   and it should not depend on the machine status, except its program counter.
144   Commutativity can be relaxed at the price of introducing more cost updates
145   in the instrumented source code.
146 \item The cost model for basic blocks induced on the source language must then
147   be exploited to derive cost invariants and to prove them automatically.
148   In CerCo we have shown how even simple invariant generations techniques
149   are sufficient to enable the fully automatic proving of parametric WCET
150   bounds for simple C programs and for Lustre programs of arbitrary complexity.
151\end{itemize}
152
153Compared to traditional WCET techniques, our approach currently has many
154similarities, some advantages and some limitations. Both techniques need to
155perform data flow analysis on the control flow graph of the program and both
156techniques need to estimate the cost of control blocks of instructions.
157
158\subsection{Control flow analysis}
159
160The first main difference is in the control flow analysis. Traditional WCET
161starts from object code and reconstructs the control flow graph from it.
162Moreover, abstract interpretation is heavily employed to bound the number of
163executions of cycles. In order to improve the accuracy of estimation, control
164flow constraints are provided by the user, usually as systems of (linear)
165inequalities. In order to do this, the user, helped by the system, needs to
166relate the object code control flow graph with the source one, because it is
167on the latter that the bounds can be figured out and be understood. This
168operations is untrusted and potentially error prone for complex optimizations
169(like aggressive loop optimizations). Efficient tools from linear algebra are
170then used to solve the systems of inequations obtained by the abstract
171interpreter and from the user constraints.
172
173In CerCo, instead, we assume full control on the compiler that
174is able to relate, even in non trivial ways, the object code control flow graph
175onto the source code control flow graph. A clear disadvantage is the
176impossibility of applying the tool on the object code produced by third party
177compilers. On the other hand, we get rid of the possibility of errors
178in the reconstruction of the control flow graph and in the translation of
179high level constraints into low level constraints. The second potentially
180important advantage is that, once we are dealing with the source language,
181we can augment the precision of our dataflow analysis by combining together
182functional and non functional invariants. This is what we attempted with
183the CerCo Cost Annotating Frama-C Plug-In. The Frama-C architecture allows
184several plug-ins to perform all kind of static analisys on the source code,
185reusing results from other plug-ins and augmenting the source code with their
186results. The techniques are absolutely not limited to linear algebra and
187abstract interpretation, and the most important plug-ins call domain specific
188and general purpose automated theorem provers to close proof obligations of
189arbitrary shape and complexity.
190
191In principle, the extended flexibility of the analysis should allow for a major
192advantage of our technique in terms of precision, also
193considering that all analysis used in traditional WCET can still be implemented
194as plug-ins. In particular, the target we have in mind are systems that are
195both (hard) real time and safety critical. Being safety critical, we can already
196expect them to be fully or partially specified at the functional level.
197Therefore we expect that the additional functional invariants should allow to
198augment the precision of the cost bounds, up to the point where the parametric
199cost bound is fully precise.
200In practice, we have not had the time to perform extensive
201comparisons on the kind of code used by industry in production systems.
202The first middle term improvement of CerCo would then consist in this kind
203of analysis, to support or disprove our expectations. It seems that the
204newborn TACLe Cost Action (Timing Analysis on Code Level) would be the
205best framework to achieve this improvement.
206In the case where our
207technique remains promising, the next long term improvement would consist in
208integrating in the Frama-C plug-in ad-hoc analysis and combinations of analysis
209that would augment the coverage of the efficiency of the cost estimation
210techniques.
211
212\subsection{Static analysis of costs of basic blocks}
213At the beginning of the project we have deliberately decided to focus our
214attention on the control flow preservation, the cost model propagation and
215the exploitation of the cost model induced on the high level code. For this
216reason we have devoted almost no attention to the static analysis of basic
217blocks. This was achieved by picking a very simple hardware architecture
218(the 8051 microprocessor family) whose cost model is fully predictable and
219compositional: the cost of every instruction --- except those that deal with
220I/O --- is constant, i.e. it does not depend on the machine status.
221We do not regret this choice because, with the limited amount of man power
222available in the project, it would have been difficult to also consider this
223aspect. However, without showing if the approach can scale to most complex
224architectures, our methodology remains of limited interest for the industry.
225Therefore, the next important middle term improvement will be the extension
226of our methodology to cover pipelines and simple caches. We will now present our
227ideas on how we intend to address the problem. The obvious long term
228improvement would be to take in consideration multicores system and complex
229memory architectures like the ones currently in use in networks on chips.
230The problem of execution time analysis for these systems is currently
231considered extremely hard or even unfeasible and at the moments it seems
232unlikely that our methodology could contribute to the solution of the problem.
233
234\subsubsection{Static analysis of costs of basic blocks revisited}
235We will now describe what currently seems to be the most interesting technique
236for the static analysis of the cost of basic blocks in presence of complex
237hardware architectures that do not have non compositional cost models.
238
239We start presenting an idealized model of the execution of a generic
240microprocessor (with caches) that has all interrupts disabled and no I/O
241instructions. We then classify the models according to some properties of
242their cost model. Then we show how to extend the labelling approach of CerCo
243to cover models that are classified in a certain way.
244
245\paragraph{The microprocessor model}
246
247Let $\sigma,\sigma_1,\ldots$ range over $\Sigma$, the set of the fragments of
248the microprocessor states that hold the program counter, the program status word and
249all the data manipulated by the object code program, i.e. registers and
250memory cells. We call these fragments the \emph{data states}.
251
252Let $\delta,\delta_1,\ldots$ range over $\Delta$, the set of the
253fragments of the microprocessor state that holds the \emph{internal state} of the
254microprocessor (e.g. the content of the pipeline and caches, the status of
255the branch prediction unit, etc.).
256The internal state of the microprocessor influences the execution cost of the
257next instruction, but it has no effect on the functional behaviour of the
258processor. The whole state of the processor is represented by a pair
259$(\sigma,\delta)$.
260
261Let $I,I_1,\ldots$ range over $\mathcal{I}$, the
262the set of \emph{instructions} of the processor and let
263$\gamma,\gamma_1,\ldots$ range over $\Gamma$, the set of \emph{operands}
264of instructions after the fetching and decoding passes.
265Thus a pair $(I,\gamma)$ represents a \emph{decoded instruction} and already contains
266the data required for execution. Execution needs to access the data state only
267to write the result.
268
269Let $fetch: \Sigma \to \mathcal{I} \times \Gamma$ be the function that
270performs the fetching and execution phases, returning the decoded instruction
271ready for execution. This is not meant to be the real fetch-decode function,
272that exploits the internal state too to speed up execution (e.g. by retrieving
273the instruction arguments from caches) and that, in case of pipelines, works
274in several stages. However, such a function exists and
275it is observationally equivalent to the real fetch-decode.
276
277We capture the semantics of the microprocessor with the following set of
278functions:
279\begin{itemize}
280 \item The \emph{functional transition} function $\longrightarrow : \Sigma \to \Sigma$
281   over data states. This is the only part of the semantics that is relevant
282   to functional analysis.
283 \item The \emph{internal state transition} function
284   $\Longrightarrow : \Sigma \times \Delta \to \Delta$ that updates the internal
285   state.
286 \item The \emph{cost function} $K: \mathcal{I} \times \Gamma \times \Delta \to \mathbb{N}$
287   that assigns a cost to transitions. Since decoded instructions hold the data
288   they act on, the cost of an instruction may depend both on the data being
289   manipulated and on the internal state.
290\end{itemize}
291
292Given a processor state $(\sigma,\delta)$, the processor evolves in the
293new state $(\sigma',\delta')$ in $n$ cost units if
294$\sigma \longrightarrow \sigma'$ and $(\sigma,\delta) \Longrightarrow \delta'$
295and $fetch(\sigma) = (I,\gamma)$ and $K(I,\gamma,\delta) = n$.
296
297An \emph{execution history} is a stream of states and transitions
298$\sigma_0 \longrightarrow \sigma_1 \longrightarrow \sigma_2 \ldots$
299that can be either finite or infinite. Given an execution history, the
300corresponding \emph{execution path} is the stream of program counters
301obtained from the execution history by forgetting all the remaining information.
302The execution path of the history $\sigma_0 \longrightarrow \sigma_1 \longrightarrow \sigma_2 \ldots$ is $pc_0,pc_1,\ldots$ where each $pc_i$ is the program
303counter of $\sigma_i$. We denote the set of finite execution paths with $EP$.
304
305We claim this simple model to be generic enough to cover real world
306architectures.
307
308\paragraph{Classification of cost models}
309A cost function is \emph{exact} if it assigns to transitions the real cost
310incurred. It is \emph{approximated} if it returns an upper bound of the real
311cost.
312
313A cost function is \emph{operand insensitive} if it does not depend on the
314operands of the instructions to be executed. Formally, $K$ is operand insensitive
315if there exists a $K': \mathcal{I} \times \Delta \to \mathbb{N}$ such that
316$K(I,\gamma,\delta) = K'(I,\delta)$. In this case, with an abuse of terminology,
317we will identify $K$ with $K'$.
318
319The cost functions of simple hardware architectures, in particular RISC ones,
320are naturally operand insensitive. In the other cases an exact operand sensitive
321cost function can always be turned into an approximated operand insensitive one
322by taking $K'(I,\delta) = \max\{K(I,\gamma,\delta)~|~\gamma \in \Gamma\}$.
323The question when one performs these approximation is how severe the
324approsimation is. A measure is given by the \emph{jitter}, which is defined
325as the difference between the best and worst cases. In our case, the jitter
326of the approximation $K'$ would be $\max\{K(I,\gamma,\delta)~|~\gamma \in \Gamma\} - \min\{K(I,\gamma,\delta)~|~\gamma \in \Gamma\}$. According to experts
327of WCET analysis, the jitters relative to operand sensitivity in modern
328architectures are small enough to make WCET estimations still useful.
329Therefore, in the sequel we will focus on operand insensitive cost models only.
330
331Note that cost model that are operand insensitive may still have significant
332dependencies over the data manipulated by the instructions, because of the
333dependency over internal states. For example, an instruction that reads data
334from the memory may change the state of the cache and thus greatly affect the
335execution time of successive instructions.
336
337Nevertheless, operand insensitivity is an important property for the labelling
338approach. In~\cite{tranquilli} we introduced \emph{dependent labels} and
339\emph{dependent costs}, which are the possibility of assigning costs to basic
340blocks of instructions which are also dependent on the state of the high level
341program at the beginning of the block. The idea we will now try to pursue is
342to exploit dependent costs to capture cost models that are sensitive to
343the internal states of the microprocessor. Operand sensitivity, however, is
344a major issue in presence of dependent labels: to lift a data sensitive cost
345model from the object code to the source language, we need a function that
346maps high level states to the operands of the instructions to be executed,
347and we need these functions to be simple enough to allow reasoning over them.
348Complex optimizations performed by the compiler, however, make the mappings
349extremely cumbersomes and history dependent. Moreover, keeping
350track of status transformations during compilation would be a significant
351departure from classical compilation models which we are not willing to
352undertake. By assuming or removing operand sensitivity, we get rid of part
353of the problem: we only need to make our costs dependent on the internal
354state of the microprocessor. The latter, however, is not at all visible
355in the high level code. Our next step is to make it visible.
356
357In general, the value of the internal state at a certain point in the program
358history is affected by all the preceding history. For instance, the
359pipeline stages either hold operands of instructions in execution or bubbles
360\footnote{A bubble is formed in the pipeline stage $n$ when an instruction is stucked in the pipeline stage $n-1$, waiting for some data which is not available yet.}. The execution history contains data states that in turn contain the
361object code data which we do not know how to relate simply to the source code
362data. We therefore introduce a new classification.
363
364A \emph{view} over internal states is a pair $(\mathcal{V},|.|)$ where
365$\mathcal{V}$ is a non empty set and $|.|:\Delta \to \mathcal{V}$ is
366a forgetful function over internal states.
367
368The operand insensitive cost function $K$ is \emph{dependent on the view
369$\mathcal{V}$} if there exists a $K': \mathcal{I} \times \mathcal{V} \to \mathbb{N}$ such that $K(I,\delta) = K'(I,|\delta|)$. In this case, with an abuse of terminology, we will identify $K$ with $K'$.
370
371Among the possible views, the ones that we will easily be able to work
372with in the labelling approach are the \emph{execution history dependent views}.
373A view $(\mathcal{V},|.|)$ is execution history dependent when there exists
374a transition function $\hookrightarrow : PC \times \mathcal{V} \to \mathcal{V}$
375such that for all $(\sigma,\delta)$ and $pc$ such that $pc$ is the program
376counter of $\sigma$, $(\sigma,\delta) \Longrightarrow \delta'$ iff
377$(pc,|\delta|) \hookrightarrow |\delta'|$.
378
379The reference to execution historys in the names is due to the following
380fact: every execution history dependent transition function $\hookrightarrow$
381can be lifted to the type $EP \times \mathcal{V} \to \mathcal{V}$ by folding
382the definition over the path trace:
383$((pc_0,\ldots,pc_n),v_0) \hookrightarrow v_{n+1}$ iff for all $i \leq n$,
384$(pc_i,v_i) \hookrightarrow v_{i+1}$.
385Moreover, the folding is clearly associative:
386$(\tau_1 @ \tau_2, v) \hookrightarrow v''$ iff
387$(\tau_1,v) \hookrightarrow v'$ and $(\tau_2,v') \hookrightarrow v''$.
388
389As a final definition, we say that a cost function $K$ is
390\emph{data independent} if it is dependent on a view that is execution path
391dependent. In two paragraphs we will show how we can extend the
392labelling approach to deal with data independent cost models.
393
394Before that, we show that the class of data independent cost functions
395is not too restricted to be interesting. In particular, at least simple
396pipeline models admit data independent cost functions.
397
398\paragraph{A data independent cost function for simple pipelines}
399We consider here a simple model for a pipeline with $n$ stages without branch
400prediction and hazards. We also assume that the actual value of the operands
401of the instruction that is being read have no influence on stalls (i.e. the
402creation of bubbles) nor on the execution cost. The type of operands, however,
403can. For example, reading the value 4 from a register may stall a pipeline
404if the register has not been written yet, while reading 4 from a different
405register may not stall the pipeline.
406
407The internal states $\Delta$ of the pipeline are $n$-tuples of decoded
408instructions or bubbles:
409$\Delta = (\mathcal{I} \times \Gamma \cup \mathbbm{1})^n$.
410This representation is not meant to describe the real data structures used
411in the pipeline: in the implementation the operands are not present in every
412stage of the pipeline, but are progressively fetched. A state
413$(x_1,x_2,\ldots,(I,\gamma))$ represents the state of the pipeline just before
414the completion of instruction $(I,\gamma)$. The first $n-1$ instructions that
415follow $I$ may already be stored in the pipeline, unless bubbles have delayed
416one or more of them.
417
418We introduce the following view over internal states:
419$(\{0,1\}^n,|.|)$ where $\mathbb{N}_n = {0,\ldots,2^n-1}$ and
420$|(x_1,\ldots,x_n)| = (y_1,\ldots,y_n)$ where $y_i$ is 1 iff $x_i$ is a bubble.
421Thus the view only remembers which stages of the pipelines are stuck.
422The view holds enough information to reconstruct the internal state given
423the current data state: from the data state we can fetch the program counter
424of the current and the next $n-1$ instructions and, by simulating at most
425$n$ execution steps and by knowing where the bubbles were, we can fill up
426the internal state of the pipeline.
427
428The assumptions on the lack of influence of operands values on stalls and
429execution times ensures the existence of the transition function
430$\hookrightarrow : PC \times \{0,1\}^n \to \{0,1\}^n$ and the data
431independent cost function $K: PC \times \{0,1\}^n \to \mathbb{N}$.
432
433While the model is a bit simplicistic, it can nevertheless be used to describe
434existing pipelines. It is also simple to be convinced that the same model
435also captures static branch prediction: speculative execution of conditional
436jumps is performed by always taking the same branch which does not depend on
437the execution history. In order to take in account jumpt predictions based on
438the execution history, we just need to incorporate in the status and the view
439statistical informations on the last executions of the branch.
440
441\paragraph{The labelling appraoch for data independent cost models}
442We now describe how the labelling approach can be slightly modified to deal
443with data independent cost models.
444
445\end{document}
Note: See TracBrowser for help on using the repository browser.