\documentclass[11pt, epsf, a4wide]{article}

\usepackage{../style/cerco}

\usepackage{amsfonts}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage[english]{babel}
\usepackage{graphicx}
\usepackage[utf8x]{inputenc}
\usepackage{listings}
\usepackage{stmaryrd}
\usepackage{url}
\usepackage{bbm}

\begin{document}

\title{Dependent labelling applied to stateful hardware components}
\author{G. Pulcini$^*$, C. Sacerdoti Coen\footnote{The project CerCo acknowledges the financial support of the Future and Emerging Technologies (FET) programme within the Seventh Framework Programme for Research of the European Commission, under FET-Open grant number: 243881}\\~\\
\texttt{\{pulcini,sacerdot\}@cs.unibo.it}\\~
\\Dipartimento di Informatica - Scienza ed Ingegneria\\
Universit\`a di Bologna}
\date{03/04/2013}

\maketitle

\abstract{The Certified Complexity (CerCo) EU Project~\cite{cerco} aims at integrating and unifying the functional and non functional analyses of safety and time critical software by performing both of them together on the source code. It is based on the labelling approach~\cite{labelling}, which makes it possible to write compilers that induce on the source code a sound and precise cost model for basic blocks. The cost model is computed on object code blocks and then transferred to the source code by reversing the optimizations to the control flow~\cite{paolo}.

In this technical report we address the important issue of stateful hardware, whose instruction cost is a function of the internal state. Typical examples are pipelines and caches. In order to avoid loss of precision, the cost model must be parametric on the state, which has no counterpart at the source level. We show how to enrich the source code with the minimal amount of information that allows costs to be computed precisely. We also briefly argue in favour of probabilistic static time analysis and we examine how it is already supported by the labelling approach.
}

\section{Introduction}
The labelling approach~\cite{labelling,paolo} is a new technique to implement
compilers that induce on the source code a sound and precise cost model for
basic blocks. The cost model is computed on object code blocks and then
transferred to the source code by reversing the optimizations to the control
flow~\cite{paolo}. In the rest of this technical report we assume that the
reader is familiar with the labelling approach.

At the end of the CerCo project, we are convinced that the labelling approach
can support most optimizations currently supported by real world compilers.
Moreover, we have formally certified the code of a compiler that transfers
cost models according to the labelling approach, greatly reducing the trusted
code base of time analysis. The main issue that is left is to identify the
classes of cost models that can be expressed at the source level and that we
can automatically reason on. In particular, we know that cost models for
modern hardware are complex objects that assign to each instruction a cost which
also depends on the internal state of stateful hardware components that are
designed to optimize the average case. Pipelines, caches and branch predictors are
typical examples of components that have no impact on the functional behavior
of the code, but that greatly improve its performance.

Ignorance of the internal state of these components during a WCET analysis
forces the analysis to assume the worst case, leading to useless time bounds.
Therefore, standard WCET tools tend to statically execute code fragments as
much as possible, to minimize the ignorance of the state. Loops, for example,
are totally unrolled. In CerCo we could do the same, departing from the
labelling approach. However, it is more natural to stay closer to the labelling
approach and avoid losing precision by computing parametric exact costs.
This is the topic of this research report.

In Section~\ref{premises} we briefly review the premises and goals of the
CerCo approach in order to compare it to the traditional
WCET one, which is able to deal with modern hardware.
In Section~\ref{controlflow} we compare the control flow analyses performed
in CerCo and in traditional WCET, arguing that our approach compares favourably.
In Section~\ref{static} we compare the static analyses performed in CerCo
and in traditional WCET. Only the latter can deal with modern hardware.
Section~\ref{static2} contains the new material. We revise our static analysis
and the labelling approach in general to accommodate modern hardware.
Preliminary conclusions are drawn in Section~\ref{conclusions}.

\section{\label{premises}CerCo: premises and goals}

We briefly review
here the premises and goals of the CerCo approach to resource analysis.
\begin{itemize}
\item There is a lot of recent and renewed activity in the formal method
community to accommodate resource analysis using techniques derived from
functional analysis (type systems, logics, abstract interpretation,
amortized analysis applied to data structures, etc.)
\item Most of this work, which currently remains at a theoretical level,
is focused on high level languages and it assumes the existence of correct
and compositional resource cost models.
\item High level languages are compiled to object code by compilers that
should respect the functional properties of the program. However, because
of optimizations and the inherently non compositional nature of compilation,
compilers do not respect compositional cost models that are imposed a priori
on the source language. By controlling the compiler and coupling it with a
WCET analyser, it is actually possible to
choose the cost model in such a way that the cost bounds are high enough
to bound the cost of every produced code. This was attempted for instance
in the EMBounded project~\cite{embounded} with good success. However, we believe that bounds
obtained in this way are unlikely to be tight.
\item Therefore our approach consists in having the compiler generate the
cost model for the user by combining tracking of basic blocks during code
transformations with a static resource analysis on the object code for basic
blocks. We formally prove that the compiler respects the cost model that is
induced on the source level based on very few assumptions: basically the
cost of a sequence of instructions should be associative and commutative
and it should not depend on the machine status, except its program counter.
Commutativity can be relaxed at the price of introducing more cost updates
in the instrumented source code.
\item The cost model for basic blocks induced on the source language must then
be exploited to derive cost invariants and to prove them automatically.
In CerCo we have shown how even simple invariant generation techniques
are sufficient to enable the fully automatic proving of parametric WCET
bounds for simple C programs and for Lustre programs of arbitrary complexity.
\end{itemize}

Compared to traditional WCET techniques, our approach currently has many
similarities, some advantages and some limitations. Both techniques need to
perform data flow analysis on the control flow graph of the program and both
techniques need to estimate the cost of blocks of instructions.

\section{Control flow analysis\label{controlflow}}

The first main difference is in the control flow analysis. Traditional WCET
starts from object code and reconstructs the control flow graph from it.
Moreover, abstract interpretation is heavily employed to bound the number of
loop iterations. In order to improve the accuracy of the estimation, control
flow constraints are provided by the user, usually as systems of (linear)
inequalities. In order to do this, the user, helped by the system, needs to
relate the object code control flow graph with the source one, because it is
on the latter that the bounds can be figured out and understood. This
operation is untrusted and potentially error prone for complex optimizations
(like aggressive loop optimizations). Efficient tools from linear algebra are
then used to solve the systems of inequalities obtained from the abstract
interpreter and from the user constraints.

In CerCo, instead, we assume full control of the compiler, which is
able to relate, even in non trivial ways, the object code control flow graph
to the source code control flow graph. A clear disadvantage is the
impossibility of applying the tool to the object code produced by third party
compilers. On the other hand, we get rid of the possibility of errors
in the reconstruction of the control flow graph and in the translation of
high level constraints into low level constraints. The second potentially
important advantage is that, once we are dealing with the source language,
we can augment the precision of our dataflow analysis by combining
functional and non functional invariants. This is what we attempted with
the CerCo Cost Annotating Frama-C Plug-In. The Frama-C architecture allows
several plug-ins to perform all kinds of static analyses on the source code,
reusing results from other plug-ins and augmenting the source code with their
results. The techniques are by no means limited to linear algebra and
abstract interpretation, and the most important plug-ins call domain specific
and general purpose automated theorem provers to close proof obligations of
arbitrary shape and complexity.

In principle, the extended flexibility of the analysis should give our
technique a major advantage in terms of precision, also
considering that all the analyses used in traditional WCET can still be
implemented as plug-ins. In particular, the targets we have in mind are systems
that are both (hard) real time and safety critical. Being safety critical, we
can already expect them to be fully or partially specified at the functional
level. Therefore we expect that the additional functional invariants should
make it possible to augment the precision of the cost bounds, up to the point
where the parametric cost bound is fully precise.
In practice, we have not had the time to perform extensive
comparisons on the kind of code used by industry in production systems.
The first medium term improvement of CerCo would then consist in this kind
of analysis, to support or disprove our expectations. It seems that the
newborn TACLe Cost Action (Timing Analysis on Code Level) would be the
best framework to achieve this improvement.
If our technique remains promising, the next long term improvement would
consist in integrating in the Frama-C plug-in ad-hoc analyses and combinations
of analyses that would extend the coverage and efficiency of the cost
estimation techniques.

\section{Static analysis of costs of basic blocks\label{static}}
At the beginning of the project we deliberately decided to focus our
attention on control flow preservation, cost model propagation and
the exploitation of the cost model induced on the high level code. For this
reason we have devoted almost no attention to the static analysis of basic
blocks. This was achieved by picking a very simple hardware architecture
(the 8051 microprocessor family) whose cost model is fully predictable and
compositional: the cost of every instruction --- except those that deal with
I/O --- is constant, i.e. it does not depend on the machine status.
We do not regret this choice because, with the limited amount of man power
available in the project, it would have been difficult to also consider this
aspect. However, without showing that the approach can scale to more complex
architectures, our methodology remains of limited interest for industry.
Therefore, the next important medium term improvement will be the extension
of our methodology to cover pipelines and simple caches. We will now present our
ideas on how we intend to address the problem. The obvious long term
improvement would be to take into consideration multicore systems and complex
memory architectures like the ones currently in use in networks-on-chip.
The problem of execution time analysis for these systems is currently
considered extremely hard or even infeasible and at the moment it seems
unlikely that our methodology could contribute to the solution of the problem.

\section{Static analysis of costs of basic blocks revisited\label{static2}}
We will now describe what currently seems to be the most interesting technique
for the static analysis of the cost of basic blocks in the presence of complex
hardware architectures that do not have compositional cost models.

We start by presenting an idealized model of the execution of a generic
microprocessor (with caches) that has all interrupts disabled and no I/O
instructions. We then classify the models according to some properties of
their cost model. Finally we show how to extend the labelling approach of CerCo
to cover models that are classified in a certain way.

\paragraph{The microprocessor model}

Let $\sigma,\sigma_1,\ldots$ range over $\Sigma$, the set of the fragments of
the microprocessor states that hold the program counter, the program status word and
all the data manipulated by the object code program, i.e. registers and
memory cells. We call these fragments the \emph{data states}.

Let $\delta,\delta_1,\ldots$ range over $\Delta$, the set of the
fragments of the microprocessor state that hold the \emph{internal state} of the
microprocessor (e.g. the content of the pipeline and caches, the status of
the branch prediction unit, etc.).
The internal state of the microprocessor influences the execution cost of the
next instruction, but it has no effect on the functional behaviour of the
processor. The whole state of the processor is represented by a pair
$(\sigma,\delta)$.

Let $I,I_1,\ldots$ range over $\mathcal{I}$, the
set of \emph{instructions} of the processor and let
$\gamma,\gamma_1,\ldots$ range over $\Gamma$, the set of \emph{operands}
of instructions after the fetching and decoding passes.
Thus a pair $(I,\gamma)$ represents a \emph{decoded instruction} and already contains
the data required for execution. Execution needs to access the data state only
to write the result.

Let $fetch: \Sigma \to \mathcal{I} \times \Gamma$ be the function that
performs the fetching and decoding phases, returning the decoded instruction
ready for execution. This is not meant to be the real fetch-decode function,
which also exploits the internal state to speed up execution (e.g. by retrieving
the instruction arguments from caches) and which, in the case of pipelines, works
in several stages. However, such a function exists and
it is observationally equivalent to the real fetch-decode.

We capture the semantics of the microprocessor with the following set of
functions:
\begin{itemize}
\item The \emph{functional transition} function $\longrightarrow : \Sigma \to \Sigma$
over data states. This is the only part of the semantics that is relevant
to functional analysis.
\item The \emph{internal state transition} function
$\Longrightarrow : \Sigma \times \Delta \to \Delta$ that updates the internal
state.
\item The \emph{cost function} $K: \mathcal{I} \times \Gamma \times \Delta \to \mathbb{N}$
that assigns a cost to transitions. Since decoded instructions hold the data
they act on, the cost of an instruction may depend both on the data being
manipulated and on the internal state.
\end{itemize}

Given a processor state $(\sigma,\delta)$, the processor evolves into the
new state $(\sigma',\delta')$ in $n$ cost units if
$\sigma \longrightarrow \sigma'$ and $(\sigma,\delta) \Longrightarrow \delta'$
and $fetch(\sigma) = (I,\gamma)$ and $K(I,\gamma,\delta) = n$.
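
To make the model concrete, the following C sketch simulates one step of the
idealized processor. All types and the stub bodies are hypothetical
illustrations of the four semantic functions above, not actual CerCo code.

\begin{verbatim}
#include <stdint.h>

/* Hypothetical encodings of the two state fragments. */
typedef struct { uint16_t pc; /* + PSW, registers, memory ... */ } sigma_t;
typedef struct { int pipeline; /* + caches, predictors ...    */ } delta_t;
typedef struct { int opcode; int operands; } decoded_t; /* (I, gamma) */

/* fetch : Sigma -> I x Gamma  (stub)                          */
decoded_t fetch(sigma_t s)         { decoded_t d = {0, 0}; return d; }
/* functional transition  --> : Sigma -> Sigma  (stub)         */
sigma_t functional_step(sigma_t s) { s.pc++; return s; }
/* internal transition    ==> : Sigma x Delta -> Delta  (stub) */
delta_t internal_step(sigma_t s, delta_t d) { return d; }
/* cost function K : (I,gamma) x Delta -> N  (stub)            */
unsigned K(decoded_t i, delta_t d) { return 1; }

/* One step: (sigma, delta) evolves into (sigma', delta')
   in n cost units, exactly as in the definition above.        */
unsigned step(sigma_t *s, delta_t *d) {
    decoded_t i = fetch(*s);        /* decoded instruction     */
    unsigned n  = K(i, *d);         /* cost before the update  */
    *d = internal_step(*s, *d);     /* new internal state      */
    *s = functional_step(*s);       /* new data state          */
    return n;
}
\end{verbatim}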

An \emph{execution history} is a stream of states and transitions
$\sigma_0 \longrightarrow \sigma_1 \longrightarrow \sigma_2 \ldots$
that can be either finite or infinite. Given an execution history, the
corresponding \emph{execution path} is the stream of program counters
obtained from the execution history by forgetting all the remaining information.
The execution path of the history $\sigma_0 \longrightarrow \sigma_1 \longrightarrow \sigma_2 \ldots$ is $pc_0,pc_1,\ldots$ where each $pc_i$ is the program
counter of $\sigma_i$. We denote the set of finite execution paths with $EP$.

We claim this simple model to be generic enough to cover real world
architectures.

\paragraph{Classification of cost models}
A cost function is \emph{exact} if it assigns to transitions the real cost
incurred. It is \emph{approximated} if it returns an upper bound of the real
cost.

A cost function is \emph{operand insensitive} if it does not depend on the
operands of the instructions to be executed. Formally, $K$ is operand insensitive
if there exists a $K': \mathcal{I} \times \Delta \to \mathbb{N}$ such that
$K(I,\gamma,\delta) = K'(I,\delta)$. In this case, with an abuse of terminology,
we will identify $K$ with $K'$.

The cost functions of simple hardware architectures, in particular RISC ones,
are naturally operand insensitive. In the other cases an exact operand sensitive
cost function can always be turned into an approximated operand insensitive one
by taking $K'(I,\delta) = \max\{K(I,\gamma,\delta)~|~\gamma \in \Gamma\}$.
The question, when one performs this approximation, is how severe the
approximation is. A measure is given by the \emph{jitter}, which is defined
as the difference between the best and worst cases. In our case, the jitter
of the approximation $K'$ would be $\max\{K(I,\gamma,\delta)~|~\gamma \in \Gamma\} - \min\{K(I,\gamma,\delta)~|~\gamma \in \Gamma\}$. According to experts
in WCET analysis, the jitters relative to operand sensitivity in modern
architectures are small enough to make WCET estimations still useful.
Therefore, in the sequel we will focus on operand insensitive cost models only.
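
The following C sketch shows the approximation and its jitter for a toy cost
function. The finite operand space and the function \texttt{K\_exact} are
hypothetical placeholders for an exact operand sensitive cost model.

\begin{verbatim}
#include <limits.h>

#define N_OPERANDS 8  /* hypothetical finite operand space Gamma */

/* Placeholder for the exact operand sensitive cost K(I,gamma,delta). */
unsigned K_exact(int I, int gamma, int delta) { return 1 + (gamma % 3); }

/* Approximated operand insensitive cost:
   K'(I,delta) = max over gamma of K(I,gamma,delta).             */
unsigned K_insensitive(int I, int delta) {
    unsigned max = 0;
    for (int g = 0; g < N_OPERANDS; g++) {
        unsigned k = K_exact(I, g, delta);
        if (k > max) max = k;
    }
    return max;
}

/* Jitter of the approximation: worst case minus best case.      */
unsigned jitter(int I, int delta) {
    unsigned max = 0, min = UINT_MAX;
    for (int g = 0; g < N_OPERANDS; g++) {
        unsigned k = K_exact(I, g, delta);
        if (k > max) max = k;
        if (k < min) min = k;
    }
    return max - min;
}
\end{verbatim}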

Note that cost models that are operand insensitive may still have significant
dependencies on the data manipulated by the instructions, because of the
dependency on internal states. For example, an instruction that reads data
from memory may change the state of the cache and thus greatly affect the
execution time of successive instructions.

Nevertheless, operand insensitivity is an important property for the labelling
approach. In~\cite{paolo} we introduced \emph{dependent labels} and
\emph{dependent costs}, i.e. the possibility of assigning costs to basic
blocks of instructions which are also dependent on the state of the high level
program at the beginning of the block. The idea we will now try to pursue is
to exploit dependent costs to capture cost models that are sensitive to
the internal states of the microprocessor. Operand sensitivity, however, is
a major issue in the presence of dependent labels: to lift a data sensitive cost
model from the object code to the source language, we need a function that
maps high level states to the operands of the instructions to be executed,
and we need these functions to be simple enough to allow reasoning over them.
Complex optimizations performed by the compiler, however, make the mappings
extremely cumbersome and history dependent. Moreover, keeping
track of status transformations during compilation would be a significant
departure from classical compilation models which we are not willing to
undertake. By assuming operand insensitivity, we get rid of part
of the problem: we only need to make our costs dependent on the internal
state of the microprocessor. The latter, however, is not at all visible
in the high level code. Our next step is to make it visible.

In general, the value of the internal state at a certain point in the program
history is affected by all the preceding history. For instance, the
pipeline stages either hold operands of instructions in execution or bubbles%
\footnote{A bubble is formed in pipeline stage $n$ when an instruction is stuck in pipeline stage $n-1$, waiting for some data that is not yet available.}. The execution history contains data states that in turn contain the
object code data which we do not know how to relate simply to the source code
data. We therefore introduce a new classification.

A \emph{view} over internal states is a pair $(\mathcal{V},|.|)$ where
$\mathcal{V}$ is a finite non empty set and $|.|:\Delta \to \mathcal{V}$ is
a forgetful function over internal states.

The operand insensitive cost function $K$ is \emph{dependent on the view
$\mathcal{V}$} if there exists a $K': \mathcal{I} \times \mathcal{V} \to \mathbb{N}$ such that $K(I,\delta) = K'(I,|\delta|)$. In this case, with an abuse of terminology, we will identify $K$ with $K'$.

Among the possible views, the ones that we will easily be able to work
with in the labelling approach are the \emph{execution history dependent views}.
A view $(\mathcal{V},|.|)$ is execution history dependent with a lookahead
of length $n$ when there exists
a transition function $\hookrightarrow : PC^n \times \mathcal{V} \to \mathcal{V}$
such that for all $(\sigma,\delta)$ and $pc_1,\ldots,pc_n$ such that every
$pc_i$ is the program counter of $\sigma_i$ defined by $\sigma \longrightarrow^i \sigma_i$, we have $(\sigma,\delta) \Longrightarrow \delta'$ iff
$((pc_1,\ldots,pc_n),|\delta|) \hookrightarrow |\delta'|$.

Less formally, a view is
dependent on the execution history if the evolution of the views is fully
determined by the evolution of the program counters. To better understand
the definition, consider the case where the next instruction to be executed
is a conditional jump. Without knowing the values of the registers, it is
impossible to determine whether the true or the false branch will be taken.
Therefore it is likely to be impossible to determine the value of the view that
follows the current one. On the other hand, knowing the program counter that
will be reached by executing the conditional branch, we also know which
branch will be taken and this may be sufficient to compute the new view.
Lookaheads longer than 1 will be used in the case of pipelines: when executing
one instruction in a system with a pipeline of length $n$, the internal state
of the pipeline already holds information on the next $n$ instructions to
be executed.

The reference to execution histories in the name is due to the following
fact: every execution history dependent transition function $\hookrightarrow$
can be lifted to the type $EP \times \mathcal{V} \to \mathcal{V}$ by folding
the definition over the path trace:
$((pc_0,\ldots,pc_m),v_0) \hookrightarrow v_{m-n+1}$ iff for all $i \leq m - n$,
$((pc_i,\ldots,pc_{i+n}),v_i) \hookrightarrow v_{i+1}$.
Moreover, the folding is clearly associative:
$(\tau_1 @ \tau_2, v) \hookrightarrow v''$ iff
$(\tau_1,v) \hookrightarrow v'$ and $(\tau_2,v') \hookrightarrow v''$.

As a final definition, we say that a cost function $K$ is
\emph{data independent} if it is dependent on a view that is execution history
dependent. Two paragraphs below we will show how we can extend the
labelling approach to deal with data independent cost models.

Before that, we show that the class of data independent cost functions
is not too restricted to be interesting. In particular, at least simple
pipeline models admit data independent cost functions.

\paragraph{A data independent cost function for simple pipelines}
We consider here a simple model for a pipeline with $n$ stages without branch
prediction and hazards. We also assume that the actual values of the operands
of the instruction that is being read have no influence on stalls (i.e. the
creation of bubbles) nor on the execution cost. The type of the operands,
however, can. For example, reading the value 4 from a register may stall the
pipeline if the register has not been written yet, while reading 4 from a
different register may not stall the pipeline.

The internal states $\Delta$ of the pipeline are $n$-tuples of decoded
instructions or bubbles:
$\Delta = (\mathcal{I} \times \Gamma \cup \mathbbm{1})^n$.
This representation is not meant to describe the real data structures used
in the pipeline: in the implementation the operands are not present in every
stage of the pipeline, but are progressively fetched. A state
$(x_1,x_2,\ldots,(I,\gamma))$ represents the state of the pipeline just before
the completion of instruction $(I,\gamma)$. The first $n-1$ instructions that
follow $I$ may already be stored in the pipeline, unless bubbles have delayed
one or more of them.

We introduce the following view over internal states:
$(\{0,1\}^n,|.|)$ where
$|(x_1,\ldots,x_n)| = (y_1,\ldots,y_n)$ and $y_i$ is 1 iff $x_i$ is a bubble.
Thus the view only remembers which stages of the pipeline are stuck.
The view holds enough information to reconstruct the internal state given
the current data state: from the data state we can fetch the program counter
of the current and the next $n-1$ instructions and, by simulating at most
$n$ execution steps and by knowing where the bubbles were, we can fill up
the internal state of the pipeline.

The assumption on the lack of influence of operand values on stalls and
execution times ensures the existence of the data independent cost function
$K: PC \times \{0,1\}^n \to \mathbb{N}$. The transition function for a
pipeline with $n$ stages may require $n$ lookaheads:
$\hookrightarrow : PC^n \times \{0,1\}^n \to \{0,1\}^n$.
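
As an illustration, the following C sketch implements one step of such a
transition function for a 4-stage pipeline, encoding a view as a bit vector.
The stall oracle \texttt{stalls} is a hypothetical function that would be
derived from the object code (e.g. from the dependencies between the
instructions at the given program counters); the shift-and-insert update is
one plausible rule, not the model of any specific processor.

\begin{verbatim}
#define STAGES 4   /* hypothetical pipeline depth n */

/* A view is a bit vector: bit i is set iff stage i holds a bubble. */
typedef unsigned view_t;

/* Hypothetical stall oracle: does the next instruction stall, given
   the lookahead of program counters and the current view?          */
int stalls(const unsigned pc[STAGES], view_t v) { return 0; /* stub */ }

/* One step of  PC^n x V -> V : the pipeline advances by one stage
   and a bubble enters the first stage whenever a stall occurs.     */
view_t next_view(const unsigned pc[STAGES], view_t v) {
    view_t shifted = (v << 1) & ((1u << STAGES) - 1); /* advance stages */
    if (stalls(pc, v))
        shifted |= 1u;              /* a new bubble enters stage 0 */
    return shifted;
}
\end{verbatim}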

While the model is a bit simplistic, it can nevertheless be used to describe
existing pipelines. It is also easy to see that the same model
captures static branch prediction: speculative execution of conditional
jumps is performed by always taking the same branch, which does not depend on
the execution history. In order to take into account jump predictions based on
the execution history, we just need to incorporate in the status and the view
statistical information on the last executions of the branch.

\paragraph{The labelling approach for data independent cost models}
We now describe how the labelling approach can be slightly modified to deal
with a data independent cost model $(\hookrightarrow,K)$ built over
$(\mathcal{V},|.|)$.

In the labelling approach, every basic block in the object code is identified
with a unique label $L$ which is also associated to the corresponding
basic block at the source level. Let us assume that labels are also inserted
after every function call and that this property is preserved during
compilation. Adding labels after calls makes the instrumented code harder
to read and it generates more proof obligations on the instrumented code, but
it does not create any additional problems. The preservation during compilation
creates some significant technical complications in the proof of correctness
of the compiler, but those can be solved.

The static analysis performed in the last step of the basic labelling approach
analyses the object code in order to assign a cost to every label or,
equivalently, to every basic block. The cost is simply the sum of the costs
of every instruction in the basic block.

In our scenario, instructions no longer have a fixed cost, because the cost
function $K$ takes as input not only a program counter but also a view $v$.
Therefore we replace the static analysis with the computation, for every basic
block and every $v \in \mathcal{V}$, of the sum of the costs of the
instructions in the block, starting in the initial view $v$. Formally, let the
sequence of the program counters of the basic block form the execution path
$pc_0,\ldots,pc_n$. The cost $K(v_0,L)$ associated to the block labelled with
$L$ and the initial view $v_0$ is
$K(pc_0,v_0) + K(pc_1,v_1) + \ldots + K(pc_n,v_n)$ where for every $i < n$,
$((pc_i,\ldots,pc_{i+l}),v_i) \hookrightarrow v_{i+1}$ where $l$ is the
lookahead required. When the lookahead requires program counters outside the
block under analysis, we are free to use dummy ones because the parts of the
view that deal with future behaviour have no impact on the cost of the
previous operations by assumption.
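
A minimal C sketch of this computation follows. The functions \texttt{cost}
(the per-instruction $K$) and \texttt{next\_view} (the transition function
$\hookrightarrow$) are assumed to be given by the processor model; the names
and the fixed lookahead are illustrative.

\begin{verbatim}
#define LOOKAHEAD 1   /* lookahead l of the transition function   */

unsigned cost(unsigned pc, unsigned view);              /* K(pc,v) */
unsigned next_view(const unsigned *pcs, unsigned view); /* window
                                   pcs[0..LOOKAHEAD] and the view  */

/* Dependent cost of one basic block for the initial view v0:
   sum the instruction costs while threading the view through the
   block. pcs holds the block's program counters followed by
   LOOKAHEAD dummy entries for the lookahead past the block's end. */
unsigned block_cost(const unsigned *pcs, unsigned len, unsigned v0) {
    unsigned v = v0, k = 0;
    for (unsigned i = 0; i < len; i++) {
        k += cost(pcs[i], v);        /* K(pc_i, v_i)                */
        v  = next_view(&pcs[i], v);  /* v_{i+1} from pc_i..pc_{i+l} */
    }
    return k;
}
\end{verbatim}

The static analysis simply tabulates \texttt{block\_cost} for every label and
every view in $\mathcal{V}$.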

The static analysis can be performed
in linear time in the size of the program because the cardinality of the set
of labels (i.e. the number of basic blocks) is bounded by the size of the
program and because the set $\mathcal{V}$ is finite. In the case of the
pipelines of the previous paragraph, the static analysis is $2^n$ times more
expensive than that of the basic labelling approach, where $n$ is the number of
pipeline stages.

The first important theorem in the labelling approach is the correctness of the
static analysis: if the (dependent) cost associated to a label $L$ is $k$,
then executing a program from the beginning of the basic block to the end
of the basic block should take exactly $k$ cost units. The proof only relies
on associativity and commutativity of the composition of costs. Commutativity
is only required if basic blocks can be nested, i.e. if a basic block does not
terminate when it reaches a call, but continues after the called function
returns. By assuming to have a cost label after each function call, we no
longer need commutativity, which does not hold for the definition of $K$ we
just gave. The reason is that, if $pc_i$ is a function call executed in
the view (state) $v_i$, it is not true that, after the return, the state
will be the $v_{i+1}$ defined by $((pc_i,pc_{i+1}),v_i) \hookrightarrow v_{i+1}$
(assuming a lookahead of 1, which is already problematic).
Indeed $pc_{i+1}$ is the program counter of the instruction that follows
the call, whereas the next program counter to be reached is that of the
body of the call. Moreover, even if the computation made sense,
$v_{i+1}$ would be the state at the beginning of the execution of the body of
the call, while we should know the state after the function returns.
The latter cannot be
statically predicted. That is why we had to impose labels after calls.
Associativity, on the other hand, trivially holds. Therefore the proof
of correctness of the static analysis can be reused without any change.

So far, we have computed the dependent cost $K: \mathcal{V} \times \mathcal{L} \to \mathbb{N}$ that associates a cost to basic blocks and views.
The second step consists in statically computing the transition function
$\hookrightarrow : \mathcal{L} \times \mathcal{L} \times \mathcal{V} \to \mathcal{V}$ that
associates to each pair of consecutively executed basic blocks and input view
the view obtained at the end of the execution of the first block.

The definition is the following:
$(L,L',v) \hookrightarrow v'$ iff $((pc_0,\ldots,pc_n,pc'_0,\ldots,pc'_m),v) \hookrightarrow v'$ where $(pc_0,\ldots,pc_n)$ are the program counters of the block labelled by $L$ and $(pc'_0,\ldots,pc'_m)$ are those of the block labelled
with $L'$. We assume here that $m$ is always greater than or equal to the
lookahead required by the transition function $\hookrightarrow$ over execution
paths. When this is not the case we could make the new transition function take
as input a longer lookahead of labels. Or we may assume to introduce enough
\texttt{NOP}s at the beginning of the block $L'$ to enforce the property.
In the rest of the paragraph we assume to have followed the second approach
to simplify the presentation.

The extended transition function over labels is not present in the basic labelling approach. Actually, the basic labelling
approach can be understood as the generalized approach where the set of views $\mathcal{V} = \mathbbm{1}$.
The computation of the extended $\hookrightarrow$ transition function
is again linear in the size of the program.

Both the versions of $K$ and $\hookrightarrow$ defined over labels can be
lifted to work over traces by folding them over the list of labels in the
trace: for $K$ we have $K(v,(L_1,\ldots,L_n)) = K(v,L_1) + K(v',(L_2,\ldots,L_n))$ where
$(L_1,L_2,v) \hookrightarrow v'$; for $\hookrightarrow$ we have
$((L_1,\ldots,L_n),v) \hookrightarrow v''$ iff $(L_1,L_2,v) \hookrightarrow v'$ and $((L_2,\ldots,L_n),v') \hookrightarrow v''$. The two definitions are also
clearly associative.
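
The fold for $K$ can be transcribed directly into C. Here labels are encoded
as integers and \texttt{K\_label} and \texttt{next} stand for the statically
computed tables of the previous steps (hypothetical names).

\begin{verbatim}
unsigned K_label(unsigned view, unsigned label);        /* K(v,L)   */
unsigned next(unsigned l1, unsigned l2, unsigned view); /* (L,L',v) */

/* Cost of a trace of labels starting in view v, folding K over
   the list of labels while threading the view via next.          */
unsigned trace_cost(const unsigned *labels, unsigned n, unsigned v) {
    unsigned k = 0;
    for (unsigned i = 0; i < n; i++) {
        k += K_label(v, labels[i]);
        if (i + 1 < n)
            v = next(labels[i], labels[i + 1], v);
    }
    return k;
}
\end{verbatim}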

The second main theorem of the labelling approach is trace preservation:
the trace produced by the object code is the same as the trace produced by
the source code. Without any need to change the proof, we immediately obtain
as a corollary that for every view $v$, the cost $K(v,\tau)$ computed from
the source code trace $\tau$ is the same as the cost $K(v,\tau)$ computed
on the object code trace, which is again $\tau$.

The final step of the labelling approach is source code instrumentation.
In the basic labelling approach it consists in adding a global variable
\texttt{\_\_cost}, initialized with 0, which is incremented at the beginning
of every basic block with the cost of the label of the basic block.
Here we just need a more complex instrumentation that keeps track of the
values of the views during execution:
\begin{itemize}
\item We define three global variables: \texttt{\_\_cost}, initialized with 0,
\texttt{\_\_label}, initialized with \texttt{NULL}, and
\texttt{\_\_view}, uninitialized.
\item At the beginning of every basic block labelled by $L$ we add the
following code fragment:

\begin{tabular}{l}
\texttt{\_\_view = \_\_next(\_\_label,}$L$\texttt{,\_\_view);}\\
\texttt{\_\_cost += }$K(\mbox{\texttt{\_\_view}},L)$\texttt{;}\\
\texttt{\_\_label = }$L$\texttt{;}
\end{tabular}

where \texttt{\_\_next(}$L_1$\texttt{,}$L_2$\texttt{,}$v$\texttt{)} $= v'$ iff
$(L_1,L_2,v) \hookrightarrow v'$ unless $L_1$ is \texttt{NULL}. In that case
\texttt{\_\_next(NULL,}$L$\texttt{,}$v$\texttt{)} $= v_0$ where
$v_0 = |\delta_0|$ and $\delta_0$ is the initial value of the internal state
at the beginning of program execution.

The first line of the code fragment computes the view at the beginning of
the execution of the block from the view at the end of the previous block.
Then we increment the cost variable with the cost of the block. Finally we
record the current block to use it for the computation of the next view
at the beginning of the next block.
\end{itemize}

\paragraph{An example of instrumentation in presence of a pipeline}
In Figure~\ref{factprogi} we show what the instrumentation of a program
that computes the factorial of 10 looks like in the presence of a pipeline.
The instrumentation has been manually produced. The \texttt{\_\_next}
function says that the body of the internal loop of the \texttt{fact}
function can be executed in two different internal states, summarized by
views 2 and 3. View 2 holds at the beginning of odd iterations,
while view 3 holds at the beginning of even ones. Therefore even and odd
iterations are assigned a different cost. The code after the loop can also
be executed in two different states, depending on the parity of the last
loop iteration.

\begin{figure}[t]
\begin{tiny}
\begin{verbatim}
int fact (int n) {
  int i, res = 1;
  for (i = 1 ; i <= n ; i++) res *= i;
  return res;
}

int main () {
  return (fact(10));
}
\end{verbatim}
\end{tiny}
\caption{A simple program that computes the factorial of 10.\label{factprog}}
\end{figure}

The definitions of \texttt{\_\_next} and \texttt{\_\_K} are just examples.
For instance, it is equally possible that each one of the 10 iterations
is executed in a different internal state.

\begin{figure}[t]
\begin{tiny}
\begin{verbatim}
int __cost = 8;
int __label = 0;
int __view;

void __cost_incr(int incr) {
  __cost = __cost + incr;
}

int __next(int label1, int label2, int view) {
  if (label1 == 0 && label2 == 0) return 0;      /* initial view v0 */
  else if (label1 == 0 && label2 == 1) return 1;
  else if (label1 == 1 && label2 == 2) return 2;
  else if (label1 == 2 && label2 == 2 && view == 2) return 3;
  else if (label1 == 2 && label2 == 2 && view == 3) return 2;
  else if (label1 == 2 && label2 == 3 && view == 2) return 1;
  else if (label1 == 2 && label2 == 3 && view == 3) return 0;
  else if (label1 == 3 && label2 == 4 && view == 0) return 0;
  else if (label1 == 3 && label2 == 4 && view == 1) return 0;
  else return 0;                                 /* unreachable */
}

int __K(int view, int label) {
  if (view == 0 && label == 0) return 3;
  else if (view == 1 && label == 1) return 14;
  else if (view == 2 && label == 2) return 35;
  else if (view == 3 && label == 2) return 26;
  else if (view == 0 && label == 3) return 6;
  else if (view == 1 && label == 3) return 8;
  else if (view == 0 && label == 4) return 6;
  else return 0;                                 /* unreachable */
}

int fact(int n)
{
  int i;
  int res;
  __view = __next(__label,1,__view); __cost_incr(__K(__view,1)); __label = 1;
  res = 1;
  for (i = 1; i <= n; i = i + 1) {
    __view = __next(__label,2,__view); __cost_incr(__K(__view,2)); __label = 2;
    res = res * i;
  }
  __view = __next(__label,3,__view); __cost_incr(__K(__view,3)); __label = 3;
  return res;
}

int main(void)
{
  int t;
  __view = __next(__label,0,__view); __cost_incr(__K(__view,0)); __label = 0;
  t = fact(10);
  __view = __next(__label,4,__view); __cost_incr(__K(__view,4)); __label = 4;
  return t;
}
\end{verbatim}
\end{tiny}
\caption{The instrumented version of the program in Figure~\ref{factprog}.\label{factprogi}}
\end{figure}

\paragraph{Considerations on the instrumentation}
The example of instrumentation in the previous paragraph shows that the
approach we are proposing exposes at the source level a certain amount
of information about the machine behavior. Syntactically, the additional
details are almost entirely confined to the \texttt{\_\_next} and
\texttt{\_\_K} functions and they do not affect the functional behaviour
of the program at all. In particular, all invariants, proof obligations and
proofs that deal with the functional behavior only are preserved.

The interesting question, then, is what is the impact of the additional
details on non functional (intensional) invariants and proof obligations.
At the moment, without a working implementation to perform some large scale
tests, it is difficult to understand the level of automation that can be
achieved and the techniques that are likely to work better without introducing
major approximations. In any case, the preliminary considerations of the
project remain valid:
\begin{itemize}
\item The task of computing and proving invariants can be simplified,
even automatically, at the price of some precision. For example, the
most aggressive approximation simply replaces the cost function
\texttt{\_\_K} with the function that ignores the view and returns for each
label the maximum cost over all possible views. Correspondingly, the
function \texttt{\_\_next} can be dropped since it returns views that
are later ignored.

A more refined possibility consists in approximating the output only on
those labels whose jitter is small or on those that mark basic blocks
that are executed only a small number of times. By simplifying the
\texttt{\_\_next} function accordingly, it is possible to considerably
reduce the search space for automated provers.
\item The situation is not worse than what happens with time analysis on
the object code (the current state of the art). There it is common practice
to analyse larger chunks of execution to minimize the effect of later
approximations. For example, if it is known that a loop can be executed at
most 10 times, computing the cost of 10 iterations yields a
better bound than multiplying the worst case of a single iteration by 10.

We clearly can do the same at the source level.
More generally, every algorithm that works in standard WCET tools on the
object code is likely to have a counterpart on the source code.
We also expect to be able to do better working on the source code.
The reason is that we assume to know more functional properties
of the program and more high level invariants, and to have more techniques
and tools at our disposal. Even if at the moment we have no evidence to
support our claims, we think that this approach is worth pursuing in the
long term.
\end{itemize}

\paragraph{The problem with caches}
Cost models for pipelines --- at least simple ones --- are data independent,
i.e. they are dependent on a view that is execution history dependent. In other
words, knowledge of the sequence of executed instructions is sufficient
to predict the cost of future instructions.

The same property does not hold for caches. The cost of accessing a memory
cell strongly depends on the addresses of the memory cells that have been read
in the past. In turn, the accessed addresses are a function of the low level
data state, which cannot be correlated to the source program state.

The strong correlation between the internal state of caches and the data
accessed in the past is one of the two main causes of the lack of
precision of static analysis on modern single-core architectures. The other one
is the lack of precise knowledge of the real behaviour of modern hardware
systems. In order to overcome both problems, which Cazorla et al.~\cite{proartis}
call the ``\emph{Timing Analysis Walls}'', the PROARTIS European Project has
proposed to design ``\emph{a hardware/software architecture whose execution
timing behaviour eradicates dependence on execution history}'' (\cite{proartis}, Section 1.2). The statement is obviously too strong. What is concretely
proposed by PROARTIS is the design of a hardware/software architecture whose
execution timing depends only on the execution path (is \emph{execution
history dependent}, in our terminology).

We have already seen that we are able to accommodate in the labelling approach
cost functions that are dependent on views that are execution history dependent.
Before fully embracing the PROARTIS vision,
we need to check whether there are other aspects of the PROARTIS proposal
that can be problematic for CerCo.

\paragraph{Static Probabilistic Time Analysis}
The approach of PROARTIS to achieve execution history dependent cost models
consists in turning the hard-to-analyze deterministic hardware components
(e.g. the cache) into probabilistic hardware components. Intuitively,
algorithms that took decisions based on the program history now throw dice.
The typical example, which has been thoroughly studied in
PROARTIS~\cite{proartis2}, is that of caches. There the proposal is to replace
the commonly used deterministic placement and replacement algorithms (e.g. LRU)
with fully probabilistic choices: when the cache needs to evict a line, the
line to be evicted is randomly selected according to the uniform distribution.

The expectation is that probabilistic hardware will have worse performance
in the average case, but it will exhibit the worst case performance only with
negligible probability. Therefore, it is no longer interesting to estimate
the actual worst case bound. What becomes interesting is to plot the probability
that the execution time will exceed a certain threshold. For all practical
purposes, a program that misses its deadline with a negligible probability
(e.g. $10^{-9}$ per hour of operation) will be perfectly acceptable when
deployed on a hardware system (e.g. a car or an airplane) that is already
specified in such a way.

In order to plot the probability distribution of execution times, PROARTIS
proposes two methodologies: Static Probabilistic Time Analysis (SPTA) and
Measurement Based Probabilistic Time Analysis (MBPTA). The first one is
similar to traditional static analysis, but it operates on probabilistic
hardware. It is the one that we would like to embrace. The second one is
based on measurements and it is justified by the following assumption: if
the probabilities associated to every hardware operation are all independent
and identically distributed, then measuring the time spent on full runs of
sub-system components yields a probabilistic estimate that remains valid
when the sub-system is plugged into a larger one. Moreover, the probabilistic
distribution of past runs must be equal to the one of future runs.

We understand that MBPTA is useful to analyze closed (sub)-systems whose
functional behavior is deterministic. It does not seem to have immediate
applications to parametric time analysis, which we are interested in.
Therefore we focus on SPTA.

According to~\cite{proartis},
``\emph{in SPTA, execution time probability distributions for individual operations \ldots are determined statically
from a model of the processor. The design principles of PROARTIS
will ensure \ldots that the probabilities for the execution time of each
instruction are independent. \ldots SPTA is performed by calculating the
convolution ($\oplus$) of the discrete probability distributions which describe
the execution time for each instruction on a CPU; this provides a probability
distribution \ldots representing the timing behaviour of the entire sequence of
instructions.}''

We will now analyze to what extent we can embrace SPTA in CerCo.

\paragraph{The labelling approach for Static Probabilistic Time Analysis}

To summarize, the main practical differences between standard static time
analysis and SPTA are:
\begin{itemize}
\item The cost functions for single instructions or sequences of instructions
no longer return natural numbers (numbers of cost units) but integral
random variables.
\item Cost functions are extended from single instructions to sequences of
instructions by using the associative convolution operator $\oplus$
(a sketch of which is given below).
\end{itemize}
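
The convolution itself is straightforward. The following C sketch computes
$a \oplus b$ for two discrete distributions over execution times; the
representation (a dense array of probabilities indexed by cycle count) is an
illustrative choice, not the PROARTIS one.

\begin{verbatim}
#include <stdlib.h>

/* A discrete probability distribution over execution times:
   p[t] is the probability that execution takes t cycles.      */
typedef struct { unsigned len; double *p; } dist_t;

/* Convolution a (+) b: the distribution of the sum of two
   independent execution times.                                 */
dist_t convolve(dist_t a, dist_t b) {
    dist_t c;
    c.len = a.len + b.len - 1;
    c.p = calloc(c.len, sizeof(double));
    for (unsigned i = 0; i < a.len; i++)
        for (unsigned j = 0; j < b.len; j++)
            c.p[i + j] += a.p[i] * b.p[j];
    return c;
}
\end{verbatim}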
---|
779 | |
---|
780 | By reviewing the papers that described the labelling approach, it is easy to |
---|
781 | get convinced that the codomain of the cost analysis can be lifted from |
---|
782 | that of natural numbers to any group. Moreover, by imposing labels after |
---|
783 | every function call, commutativity can be dropped and the approach works on |
---|
784 | every monoid (usually called \emph{cost monoids} in the literature). |
---|
785 | Because random variables and convolutions form a monoid, we immediately have |
---|
786 | that the labelling approach extends itself to SPTA. The instrumented code |
---|
787 | produced by an SPTA-CerCo compiler will then have random variables (on a finite |
---|
788 | domain) as costs and convolutions in place of the \texttt\{\_\_cost\_incr\} |
---|
789 | function. |
---|
790 | |
---|
791 | What is left to be understood is the way to state and compute the |
---|
792 | probabilistic invariants to do \emph{parametric SPTA}. Indeed, it seems that |
---|
793 | PROARTIS only got interested into non parametric PTA. For example, it is |
---|
794 | well known that actually computing the convolutions results in an exponential |
---|
795 | growth of the memory required to represent the result of the convolutions. |
---|
796 | Therefore, the analysis should work symbolically until the moment where |
---|
797 | we are interested into extracting information from the convolution. |
---|
798 | |
---|
799 | Moreover, assuming that the problem of computing invariants is solved, |
---|
800 | the actual behavior of automated theorem |
---|
801 | provers on probabilistic invariants is to be understood. It is likely that |
---|
802 | a good amount of domain specific knowledge about probability theory must be |
---|
803 | exploited and incorporated into automatic provers to achieve concrete results. |
---|
804 | |
---|
805 | Parametric SPTA using the methodology developed in CerCo is a future research |
---|
806 | direction that we believe to be worth exploring in the middle and long term. |
---|
807 | |
\paragraph{Static Probabilistic Time Analysis for Caches in CerCo}

As a final remark, we note that the analysis in CerCo of systems that implement
probabilistic caches requires a combination of SPTA and data independent cost
models. The need for a probabilistic analysis is obvious but, as we saw in
the previous paragraph, it requires no modification of the labelling approach.

In order to understand the need for dependent labelling (to work on data
independent cost functions), we need to review the behaviour of probabilistic
caches as proposed by PROARTIS. The interested reader can
consult~\cite{proartis2} for further information.

In a randomized cache, the probability of evicting a given line on every access
is $1/N$, where $N$ is the number of cache entries. Therefore the
hit probability of a specific access to such a cache is
$P(\mathit{hit}) = \left(\frac{N-1}{N}\right)^K$, where $K$ is the number of
cache misses between
two consecutive accesses to the same cache entry. For the purposes of our
analysis, we must assume that every cache access could cause an eviction.
Therefore, we define $K$ (the \emph{reuse distance}) to be the number of
memory accesses between two consecutive accesses to the same cache entry,
including the access for which we are computing $K$. In order to compute
$K$ for every code memory address, we need to know the execution path (in
our terminology). In other words, we need a view that records for each
cache entry the number of memory accesses that have occurred since the last
access.

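The following minimal sketch in C makes the view concrete; it is hypothetical
and not part of the CerCo tools, and the entry count $N = 256$ is an arbitrary
assumption. For each cache entry we record the number of accesses seen since
that entry was last touched; on each access we recover the conservative reuse
distance $K$ and the corresponding hit probability
$\left(\frac{N-1}{N}\right)^K$.
\begin{lstlisting}[language=C]
#include <math.h>

#define N 256  /* number of cache entries: an arbitrary assumption */

/* The view: accesses seen since entry i was last accessed. */
static unsigned since_last[N];

/* On an access to entry e, return its hit probability.
   K counts every access since the previous access to e,
   including the current one. */
double access_entry(unsigned e) {
  for (unsigned i = 0; i < N; i++)
    since_last[i]++;
  unsigned K = since_last[e];
  since_last[e] = 0;
  return pow((double)(N - 1) / (double)N, (double)K);
}
\end{lstlisting}
For instance, with $N = 256$ and a reuse distance of $K = 16$, the sketch
returns a hit probability of about $0.94$.
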
For pipelines with $n$ stages, the number of possible views is bounded by
$2^n$: a view can usually be represented by a single word. This is not the
case for the views on caches, which are in principle very large. Therefore,
the dependent labelling approach for data independent cost functions that we
have presented here could still be impractical for caches. If that turns out
to be the case, a possible strategy is to use abstract interpretation
techniques on the object code to reduce the size of the views exposed at the
source level, at the price of an early loss of precision in the analysis.

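The $2^n$ bound can be made concrete with a hypothetical encoding, ours and
not taken from the CerCo tools: one bit per pipeline stage, so every view of
an $n$-stage pipeline with $n \le 32$ fits in a machine word.
\begin{lstlisting}[language=C]
/* Hypothetical encoding of a pipeline view: one bit per stage,
   so the 2^n possible views of an n-stage pipeline (n <= 32)
   fit in a single unsigned word. */
typedef unsigned int view_t;

/* Test whether stage i is marked in view v. */
#define STAGE_SET(v, i) (((v) >> (i)) & 1u)

/* Mark and clear stage i, yielding the updated view. */
#define MARK_STAGE(v, i)  ((v) | (1u << (i)))
#define CLEAR_STAGE(v, i) ((v) & ~(1u << (i)))
\end{lstlisting}
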
At the current stage, more research is needed to understand whether
caches can be analyzed, even probabilistically, using the CerCo technology.
This is left for future work and will be postponed until after the work on
pipelines.

\section{Conclusions\label{conclusions}}
At the current state of the art, functional properties of programs are better
proved on high level languages, while non functional ones are proved on the
corresponding object code. The non functional analysis, however, depends on
functional invariants, e.g.\ to bound or correlate the number of iterations of
loops.

The aim of the CerCo project is to reconcile the two analyses by performing
the non functional analysis on the source code. This requires computing a cost
model on the object code and reflecting the cost model on the source code.
We achieve this in CerCo by designing a certified Cost Annotating Compiler that
keeps track of transformations of basic blocks, in order to create a
correspondence (not necessarily a bijection) between the basic blocks of the
source and target language. We then prove that the sequences of basic blocks
encountered in the source and target executions are correlated. Then
we perform a static analysis of the cost of basic blocks on the target language
and we use it to compute the cost model for the source language basic blocks.
Finally, we compute cost invariants on the source code from the inferred cost
model and from the functional program invariants; we generate proof obligations
for the invariants; and we use automatic provers to try to close the proof
obligations.

The cost of single instructions on modern architectures depends on the internal
state of various hardware components (pipelines, caches, branch prediction
units, etc.). The internal states are determined by the previous execution
history, meaning both the instructions executed and the data manipulated
by those instructions; therefore the cost of basic blocks is parametric on the
execution history. The CerCo approach is able to correlate the sequence
of blocks of source instructions with the sequence of blocks of target
instructions. It does not correlate the high level and the low level data.
Therefore, in the general case, we are not able to lift to the source code a
cost model that is parametric on the execution history.

To overcome the problem, we have identified a particular class of cost models
that do not depend on the data manipulated. We argue that the CerCo
approach can cover this scenario by exposing in the source program a finite
data type of views over internal machine states. The cost of basic blocks
is parametric on these views, and the current view is updated at basic block
entry according to some abstraction of the machine hardware, which the user
does not need to understand. Further studies are needed to understand how
invariant generators and automatic provers can cope with these updates and
parametric costs.

We have argued that pipelines, at least simple ones, are captured by the
previous scenario and can in principle be handled using the CerCo tools.
The same is not true for caches, whose behaviour deeply depends on the
data manipulated. By embracing the PROARTIS proposal of turning caches into
probabilistic components, we can break the data dependency. Nevertheless,
cache analysis remains more problematic because of the size of the views.
Further studies must focus on caches to understand whether the size of the
views can be tamed in practice without ruining the whole approach.

\begin{thebibliography}{}

\bibitem{cerco} \textbf{Certified Complexity}, R. Amadio, A. Asperti, N. Ayache, B. Campbell, D. Mulligan, R. Pollack, Y. R\'egis-Gianas, C. Sacerdoti Coen, I. Stark, in Procedia Computer Science, Volume 7, 2011, Proceedings of the 2nd European Future Technologies Conference and Exhibition 2011 (FET 11), 175--177.

\bibitem{labelling} \textbf{Certifying and Reasoning on Cost Annotations in C Programs}, N. Ayache, R.M. Amadio, Y. R\'egis-Gianas, in Proc. FMICS, Springer LNCS 7437: 32--46, 2012, DOI:10.1007/978-3-642-32469-7\_3.

\bibitem{paolo} \textbf{Indexed Labels for Loop Iteration Dependent Costs}, P. Tranquilli, in Proceedings of the 11th International Workshop on Quantitative Aspects of Programming Languages and Systems (QAPL 2013), Rome, 23rd--24th March 2013, Electronic Proceedings in Theoretical Computer Science, to appear in 2013.

\bibitem{embounded} \textbf{The EmBounded project (project paper)}, K. Hammond, R. Dyckhoff, C. Ferdinand, R. Heckmann, M. Hofmann, H. Loidl, G. Michaelson, J. Serot, A. Wallace, in Trends in Functional Programming, Volume 6, Intellect Press, 2006.

\bibitem{proartis} \textbf{PROARTIS: Probabilistically Analysable Real-Time Systems}, F.J. Cazorla, E. Qui\~nones, T. Vardanega, L. Cucu, B. Triquet, G. Bernat, E. Berger, J. Abella, F. Wartel, M. Houston, et al., in ACM Transactions on Embedded Computing Systems, 2012.

\bibitem{proartis2} \textbf{A Cache Design for Probabilistic Real-Time Systems}, L. Kosmidis, J. Abella, E. Qui\~nones, and F. Cazorla, in Design, Automation, and Test in Europe (DATE), Grenoble, France, 03/2013.

\end{thebibliography}

\end{document}