\documentclass[11pt, epsf, a4wide]{article}

\usepackage{../style/cerco}

\usepackage{amsfonts}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage[english]{babel}
\usepackage{graphicx}
\usepackage[utf8x]{inputenc}
\usepackage{listings}
\usepackage{stmaryrd}
\usepackage{url}
\usepackage{bbm}

\begin{document}

\title{??????}
\author{??????}
\date{dd/mm/yy}

\maketitle

\section{Middle and Long Term Improvements}
In order to identify the middle and long term improvements, we briefly review
here the premises and goals of the CerCo approach to resource analysis.
\begin{itemize}
\item There is a lot of recent and renewed activity in the formal methods
 community to accommodate resource analysis using techniques derived from
 functional analysis (type systems, logics, abstract interpretation,
 amortized analysis applied to data structures, etc.).
\item Most of this work, which currently remains at a theoretical level,
 is focused on high level languages and it assumes the existence of correct
 and compositional resource cost models.
\item High level languages are compiled to object code by compilers that
 should respect the functional properties of the program. However, because
 of optimizations and the inherently non compositional nature of compilation,
 compilers do not respect compositional cost models that are imposed a priori
 on the source language. By controlling the compiler and coupling it with a
 WCET analyser, it is actually possible to
 choose the cost model in such a way that the cost bounds are high enough
 to bound the cost of all produced code. This was attempted, for instance,
 in the EmBounded project with good success. However, we believe that bounds
 obtained in this way have little chance of being tight.
\item Therefore our approach consists in having the compiler generate the
 cost model for the user by combining tracking of basic blocks during code
 transformations with a static resource analysis on the object code for basic
 blocks. We formally prove that the compiler respects the cost model that is
 induced on the source level based on very few assumptions: basically, the
 cost of a sequence of instructions should be associative and commutative
 and it should not depend on the machine status, except its program counter.
 Commutativity can be relaxed at the price of introducing more cost updates
 in the instrumented source code.
\item The cost model for basic blocks induced on the source language must then
 be exploited to derive cost invariants and to prove them automatically.
 In CerCo we have shown how even simple invariant generation techniques
 are sufficient to enable the fully automatic proving of parametric WCET
 bounds for simple C programs and for Lustre programs of arbitrary complexity.
\end{itemize}

Compared to traditional WCET techniques, our approach currently has many
similarities, some advantages and some limitations. Both techniques need to
perform data flow analysis on the control flow graph of the program and both
techniques need to estimate the cost of blocks of instructions.

\subsection{Control flow analysis}

The first main difference is in the control flow analysis. Traditional WCET
analysis starts from object code and reconstructs the control flow graph from
it. Moreover, abstract interpretation is heavily employed to bound the number
of executions of cycles. In order to improve the accuracy of the estimation,
control flow constraints are provided by the user, usually as systems of
(linear) inequalities. In order to do this, the user, helped by the system,
needs to relate the object code control flow graph with the source one,
because it is on the latter that the bounds can be figured out and understood.
This operation is untrusted and potentially error prone for complex
optimizations (like aggressive loop optimizations). Efficient tools from
linear algebra are then used to solve the systems of inequalities obtained by
the abstract interpreter and from the user constraints.

In CerCo, instead, we assume full control on the compiler, which
is able to relate, even in non trivial ways, the object code control flow graph
to the source code control flow graph. A clear disadvantage is the
impossibility of applying the tool to the object code produced by third party
compilers. On the other hand, we remove the possibility of errors
in the reconstruction of the control flow graph and in the translation of
high level constraints into low level constraints. The second potentially
important advantage is that, once we are dealing with the source language,
we can augment the precision of our dataflow analysis by combining
functional and non functional invariants. This is what we attempted with
the CerCo Cost Annotating Frama-C Plug-In. The Frama-C architecture allows
several plug-ins to perform all kinds of static analysis on the source code,
reusing results from other plug-ins and augmenting the source code with their
results. The techniques are absolutely not limited to linear algebra and
abstract interpretation, and the most important plug-ins call domain specific
and general purpose automated theorem provers to close proof obligations of
arbitrary shape and complexity.

In principle, the extended flexibility of the analysis should give our
technique a major advantage in terms of precision, also
considering that all the analyses used in traditional WCET can still be
implemented as plug-ins. In particular, the target we have in mind are systems
that are both (hard) real time and safety critical. Being safety critical, we
can already expect them to be fully or partially specified at the functional
level. Therefore we expect that the additional functional invariants should
allow us to augment the precision of the cost bounds, up to the point where
the parametric cost bound is fully precise.
In practice, we have not had the time to perform extensive
comparisons on the kind of code used by industry in production systems.
The first middle term improvement of CerCo would then consist in this kind
of analysis, to support or disprove our expectations. It seems that the
newborn TACLe Cost Action (Timing Analysis on Code Level) would be the
best framework to achieve this improvement.
If our technique proves promising, the next long term improvement would
consist in integrating into the Frama-C plug-in ad-hoc analyses and
combinations of analyses that would improve the coverage and efficiency of
the cost estimation techniques.

\subsection{Static analysis of costs of basic blocks}
At the beginning of the project we deliberately decided to focus our
attention on control flow preservation, cost model propagation and
the exploitation of the cost model induced on the high level code. For this
reason we have devoted almost no attention to the static analysis of basic
blocks. This was achieved by picking a very simple hardware architecture
(the 8051 microprocessor family) whose cost model is fully predictable and
compositional: the cost of every instruction --- except those that deal with
I/O --- is constant, i.e. it does not depend on the machine status.
We do not regret this choice because, with the limited amount of man power
available in the project, it would have been difficult to also consider this
aspect. However, without showing that the approach can scale to more complex
architectures, our methodology remains of limited interest to industry.
Therefore, the next important middle term improvement will be the extension
of our methodology to cover pipelines and simple caches. We will now present
our ideas on how we intend to address the problem. The obvious long term
improvement would be to take into consideration multicore systems and complex
memory architectures like the ones currently in use in networks on chips.
The problem of execution time analysis for these systems is currently
considered extremely hard or even infeasible and at the moment it seems
unlikely that our methodology could contribute to the solution of the problem.

\subsubsection{Static analysis of costs of basic blocks revisited}
We will now describe what currently seems to be the most interesting technique
for the static analysis of the cost of basic blocks in the presence of complex
hardware architectures that do not have compositional cost models.

We start by presenting an idealized model of the execution of a generic
microprocessor (with caches) that has all interrupts disabled and no I/O
instructions. We then classify the models according to some properties of
their cost model. Finally, we show how to extend the labelling approach of
CerCo to cover models that are classified in a certain way.

\paragraph{The microprocessor model}

Let $\sigma,\sigma_1,\ldots$ range over $\Sigma$, the set of the fragments of
the microprocessor states that hold the program counter, the program status
word and all the data manipulated by the object code program, i.e. registers
and memory cells. We call these fragments the \emph{data states}.

Let $\delta,\delta_1,\ldots$ range over $\Delta$, the set of the
fragments of the microprocessor state that hold the \emph{internal state} of
the microprocessor (e.g. the content of the pipeline and caches, the status of
the branch prediction unit, etc.).
The internal state of the microprocessor influences the execution cost of the
next instruction, but it has no effect on the functional behaviour of the
processor. The whole state of the processor is represented by a pair
$(\sigma,\delta)$.

Let $I,I_1,\ldots$ range over $\mathcal{I}$, the set of \emph{instructions} of
the processor and let
$\gamma,\gamma_1,\ldots$ range over $\Gamma$, the set of \emph{operands}
of instructions after the fetching and decoding passes.
Thus a pair $(I,\gamma)$ represents a \emph{decoded instruction} and already
contains the data required for execution. Execution needs to access the data
state only to write the result.

Let $fetch: \Sigma \to \mathcal{I} \times \Gamma$ be the function that
performs the fetching and decoding phases, returning the decoded instruction
ready for execution. This is not meant to be the real fetch-decode function,
which exploits the internal state too to speed up execution (e.g. by
retrieving the instruction arguments from caches) and which, in the case of
pipelines, works in several stages. However, such a function exists and
it is observationally equivalent to the real fetch-decode.

We capture the semantics of the microprocessor with the following set of
functions:
\begin{itemize}
\item The \emph{functional transition} function $\longrightarrow : \Sigma \to \Sigma$
 over data states. This is the only part of the semantics that is relevant
 to functional analysis.
\item The \emph{internal state transition} function
 $\Longrightarrow : \Sigma \times \Delta \to \Delta$ that updates the internal
 state.
\item The \emph{cost function} $K: \mathcal{I} \times \Gamma \times \Delta \to \mathbb{N}$
 that assigns a cost to transitions. Since decoded instructions hold the data
 they act on, the cost of an instruction may depend both on the data being
 manipulated and on the internal state.
\end{itemize}

Given a processor state $(\sigma,\delta)$, the processor evolves into the
new state $(\sigma',\delta')$ in $n$ cost units if
$\sigma \longrightarrow \sigma'$ and $(\sigma,\delta) \Longrightarrow \delta'$
and $fetch(\sigma) = (I,\gamma)$ and $K(I,\gamma,\delta) = n$.

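As a concrete illustration of the model, the following Python sketch instantiates the four ingredients --- $fetch$, the functional transition, the internal state transition and the cost function $K$ --- on a two-instruction toy program. Every concrete detail (the instruction set, the shape of data and internal states, the costs) is an invented assumption, not part of the formal model.

```python
# Toy rendering of the model: data states are pairs (pc, registers) and
# internal states record a single cached pc. All details are invented.

PROGRAM = {
    0: ("INC", "r0"),   # decoded instructions (I, gamma)
    1: ("NOP", None),
}

def fetch(sigma):
    """fetch : Sigma -> I x Gamma, decoding the instruction at the pc."""
    pc, _regs = sigma
    return PROGRAM[pc]

def functional_step(sigma):
    """sigma --> sigma': the functional transition over data states."""
    pc, regs = sigma
    I, gamma = fetch(sigma)
    regs = dict(regs)
    if I == "INC":
        regs[gamma] += 1
    return (pc + 1, regs)

def internal_step(sigma, delta):
    """(sigma, delta) ==> delta': the internal state transition."""
    pc, _regs = sigma
    return {"cached": pc}

def K(I, gamma, delta):
    """Cost of a decoded instruction in a given internal state."""
    return 1 if delta.get("cached") is not None else 3  # warm vs cold toy cost

def step(sigma, delta):
    """One evolution of the whole processor: new state plus its cost."""
    I, gamma = fetch(sigma)
    return functional_step(sigma), internal_step(sigma, delta), K(I, gamma, delta)
```

Note how the cost of the second step drops once the (toy) internal state records a cached program counter, while the functional transition never inspects $\delta$.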
An \emph{execution history} is a stream of states and transitions
$\sigma_0 \longrightarrow \sigma_1 \longrightarrow \sigma_2 \ldots$
that can be either finite or infinite. Given an execution history, the
corresponding \emph{execution path} is the stream of program counters
obtained from the execution history by forgetting all the remaining
information. The execution path of the history
$\sigma_0 \longrightarrow \sigma_1 \longrightarrow \sigma_2 \ldots$ is
$pc_0,pc_1,\ldots$ where each $pc_i$ is the program counter of $\sigma_i$.
We denote the set of finite execution paths with $EP$.

We claim this simple model to be generic enough to cover real world
architectures.

\paragraph{Classification of cost models}
A cost function is \emph{exact} if it assigns to transitions the real cost
incurred. It is \emph{approximated} if it returns an upper bound of the real
cost.

A cost function is \emph{operand insensitive} if it does not depend on the
operands of the instructions to be executed. Formally, $K$ is operand
insensitive if there exists a $K': \mathcal{I} \times \Delta \to \mathbb{N}$
such that $K(I,\gamma,\delta) = K'(I,\delta)$. In this case, with an abuse of
terminology, we will identify $K$ with $K'$.

The cost functions of simple hardware architectures, in particular RISC ones,
are naturally operand insensitive. In the other cases an exact operand
sensitive cost function can always be turned into an approximated operand
insensitive one by taking
$K'(I,\delta) = \max\{K(I,\gamma,\delta)~|~\gamma \in \Gamma\}$.
The question, when one performs this approximation, is how severe the
approximation is. A measure is given by the \emph{jitter}, which is defined
as the difference between the best and worst cases. In our case, the jitter
of the approximation $K'$ would be
$\max\{K(I,\gamma,\delta)~|~\gamma \in \Gamma\} - \min\{K(I,\gamma,\delta)~|~\gamma \in \Gamma\}$.
According to experts in WCET analysis, the jitters relative to operand
sensitivity in modern architectures are small enough to make WCET estimations
still useful. Therefore, in the sequel we will focus on operand insensitive
cost models only.

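A minimal sketch of this approximation and of the jitter it introduces, with an invented cost table standing in for $K$ (one toy instruction, two operand classes, two internal states):

```python
# Turning an exact operand sensitive cost function into an approximated
# operand insensitive one, and measuring the jitter. Toy data only.

# K[(I, gamma, delta)]: exact costs for an invented MUL instruction.
K = {
    ("MUL", "reg", "hot"):  2, ("MUL", "mem", "hot"):  5,
    ("MUL", "reg", "cold"): 4, ("MUL", "mem", "cold"): 9,
}
OPERANDS = ("reg", "mem")

def K_insensitive(I, delta):
    """K'(I, delta) = max over all operands gamma of K(I, gamma, delta)."""
    return max(K[(I, g, delta)] for g in OPERANDS)

def jitter(I, delta):
    """Worst-case minus best-case cost over the operands."""
    costs = [K[(I, g, delta)] for g in OPERANDS]
    return max(costs) - min(costs)
```

With these numbers the approximation always charges the memory-operand cost; the jitter quantifies the pessimism paid when the operand is in fact a register.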
Note that cost models that are operand insensitive may still have significant
dependencies on the data manipulated by the instructions, because of the
dependency on internal states. For example, an instruction that reads data
from memory may change the state of the cache and thus greatly affect the
execution time of subsequent instructions.

Nevertheless, operand insensitivity is an important property for the labelling
approach. In~\cite{tranquilli} we introduced \emph{dependent labels} and
\emph{dependent costs}, i.e. the possibility of assigning to basic
blocks of instructions costs that also depend on the state of the high level
program at the beginning of the block. The idea we will now try to pursue is
to exploit dependent costs to capture cost models that are sensitive to
the internal states of the microprocessor. Operand sensitivity, however, is
a major issue in the presence of dependent labels: to lift a data sensitive
cost model from the object code to the source language, we need a function
that maps high level states to the operands of the instructions to be
executed, and we need these functions to be simple enough to allow reasoning
over them. Complex optimizations performed by the compiler, however, make the
mappings extremely cumbersome and history dependent. Moreover, keeping
track of status transformations during compilation would be a significant
departure from classical compilation models which we are not willing to
undertake. By assuming operand insensitivity, we get rid of part
of the problem: we only need to make our costs dependent on the internal
state of the microprocessor. The latter, however, is not at all visible
in the high level code. Our next step is to make it visible.

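To give a feel for how dependent costs could surface in the instrumented source, here is a minimal Python sketch: the instrumented program threads a ghost variable tracking the (lifted) internal state and charges a state-dependent cost at each label. The label name, the cost table and the state transition are all hypothetical, and actual CerCo instrumentation emits C, not Python.

```python
# Hypothetical dependent-cost instrumentation: the cost charged at label
# L_loop depends on a ghost variable tracking the microprocessor state.

COST = {("L_loop", 0): 7, ("L_loop", 1): 3}   # invented dependent costs
NEXT_VIEW = {0: 1, 1: 1}                      # invented state evolution

def instrumented_loop(iterations):
    cost, view = 0, 0          # ghost state added by the instrumentation
    for _ in range(iterations):
        cost += COST[("L_loop", view)]   # emitted cost update at L_loop
        view = NEXT_VIEW[view]
        # ... original loop body would run here ...
    return cost
```

The first iteration is charged the "cold" cost, later ones the "warm" cost, mirroring the cache-like dependency described above.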
In general, the value of the internal state at a certain point in the program
history is affected by all the preceding history. For instance, the
pipeline stages either hold operands of instructions in execution or
bubbles\footnote{A bubble is formed in pipeline stage $n$ when an instruction
is stuck in pipeline stage $n-1$, waiting for some data which is not available
yet.}. The execution history contains data states that in turn contain the
object code data, which we do not know how to relate simply to the source code
data. We therefore introduce a new classification.

A \emph{view} over internal states is a pair $(\mathcal{V},|.|)$ where
$\mathcal{V}$ is a finite non empty set and $|.|:\Delta \to \mathcal{V}$ is
a forgetful function over internal states.

The operand insensitive cost function $K$ is \emph{dependent on the view
$\mathcal{V}$} if there exists a $K': \mathcal{I} \times \mathcal{V} \to \mathbb{N}$
such that $K(I,\delta) = K'(I,|\delta|)$. In this case, with an abuse of
terminology, we will identify $K$ with $K'$.

Among the possible views, the ones that we will easily be able to work
with in the labelling approach are the \emph{execution history dependent
views}. A view $(\mathcal{V},|.|)$ is execution history dependent with a
lookahead of length $n$ when there exists
a transition function $\hookrightarrow : PC^n \times \mathcal{V} \to \mathcal{V}$
such that for all $(\sigma,\delta)$ and $pc_1,\ldots,pc_n$ such that every
$pc_i$ is the program counter of $\sigma_i$ defined by
$\sigma \longrightarrow^i \sigma_i$, we have
$(\sigma,\delta) \Longrightarrow \delta'$ iff
$((pc_1,\ldots,pc_n),|\delta|) \hookrightarrow |\delta'|$.

Less formally, a view is
dependent on the execution history if the evolution of the views is fully
determined by the evolution of the program counters. To better understand
the definition, consider the case where the next instruction to be executed
is a conditional jump. Without knowing the values of the registers, it is
impossible to determine whether the true or the false branch will be taken.
Therefore it is likely to be impossible to determine the value of the view
that follows the current one. On the other hand, knowing the program counter
that will be reached by executing the conditional branch, we also know which
branch will be taken and this may be sufficient to compute the new view.
Lookaheads longer than 1 will be used in the case of pipelines: when executing
one instruction in a system with a pipeline of length $n$, the internal state
of the pipeline already holds information on the next $n$ instructions to
be executed.

The reference to execution histories in the name is due to the following
fact: every execution history dependent transition function $\hookrightarrow$
can be lifted to the type $EP \times \mathcal{V} \to \mathcal{V}$ by folding
the definition over the path trace:
$((pc_0,\ldots,pc_m),v_0) \hookrightarrow v_{m-n+1}$ iff for all $i \leq m - n$,
$((pc_i,\ldots,pc_{i+n}),v_i) \hookrightarrow v_{i+1}$.
Moreover, the folding is clearly associative:
$(\tau_1 @ \tau_2, v) \hookrightarrow v''$ iff
$(\tau_1,v) \hookrightarrow v'$ and $(\tau_2,v') \hookrightarrow v''$.

As a final definition, we say that a cost function $K$ is
\emph{data independent} if it is dependent on a view that is execution
history dependent. Below we will show how we can extend the
labelling approach to deal with data independent cost models.

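The folding just described can be sketched as follows. The concrete step function (a window of length $n = 1$ over an invented four-element view set) is purely an assumption for the example; only the folding scheme and its associativity reflect the definition above.

```python
# Lifting a lookahead-n transition function to finite execution paths
# by folding it along the path. Toy step function, window length n = 1.

def step(pcs, v):
    """((pc,), v) |--> v': toy view evolution over V = {0, 1, 2, 3}."""
    (pc,) = pcs
    return (v + pc) % 4

def lift(path, v, n=1):
    """Fold the windowed step function along an execution path."""
    for i in range(len(path) - n + 1):
        v = step(tuple(path[i:i + n]), v)
    return v
```

The second assertion below checks the associativity property: folding over a concatenated path agrees with folding over the two halves in sequence.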
Before that, we show that the class of data independent cost functions
is not too restricted to be interesting. In particular, at least simple
pipeline models admit data independent cost functions.

\paragraph{A data independent cost function for simple pipelines}
We consider here a simple model for a pipeline with $n$ stages without branch
prediction and hazards. We also assume that the actual values of the operands
of the instruction that is being read have no influence on stalls (i.e. the
creation of bubbles) nor on the execution cost. The types of the operands,
however, can. For example, reading the value 4 from a register may stall a
pipeline if the register has not been written yet, while reading 4 from a
different register may not stall the pipeline.

The internal states $\Delta$ of the pipeline are $n$-tuples of decoded
instructions or bubbles:
$\Delta = (\mathcal{I} \times \Gamma \cup \mathbbm{1})^n$.
This representation is not meant to describe the real data structures used
in the pipeline: in the implementation the operands are not present in every
stage of the pipeline, but are progressively fetched. A state
$(x_1,x_2,\ldots,(I,\gamma))$ represents the state of the pipeline just before
the completion of instruction $(I,\gamma)$. The first $n-1$ instructions that
follow $I$ may already be stored in the pipeline, unless bubbles have delayed
one or more of them.

We introduce the following view over internal states:
$(\{0,1\}^n,|.|)$ where
$|(x_1,\ldots,x_n)| = (y_1,\ldots,y_n)$ and $y_i$ is 1 iff $x_i$ is a bubble.
Thus the view only remembers which stages of the pipeline are stuck.
The view holds enough information to reconstruct the internal state given
the current data state: from the data state we can fetch the program counter
of the current and the next $n-1$ instructions and, by simulating at most
$n$ execution steps and by knowing where the bubbles were, we can fill up
the internal state of the pipeline.

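A sketch of the bubble view, with invented decoded instructions standing in for the pipeline content of a three-stage pipeline:

```python
# The bubble view of an n-stage pipeline: internal states are n-tuples
# of decoded instructions or bubbles (None); the view forgets everything
# except which stages hold a bubble. Instruction names are invented.

BUBBLE = None

def view(delta):
    """|.| : Delta -> {0,1}^n, remembering only where the bubbles are."""
    return tuple(1 if stage is BUBBLE else 0 for stage in delta)

# A 3-stage pipeline about to complete an ADD, with a bubble in stage 2.
delta = (("LOAD", "r1"), BUBBLE, ("ADD", ("r0", "r1")))
```

The forgetful map collapses the (unboundedly many) internal states into the finite set $\{0,1\}^3$, which is what makes the static analysis of the next paragraph feasible.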
The assumptions on the lack of influence of operand values on stalls and
execution times ensure the existence of the data independent cost function
$K: PC \times \{0,1\}^n \to \mathbb{N}$. The transition function for a
pipeline with $n$ stages may require $n$ lookaheads:
$\hookrightarrow : PC^n \times \{0,1\}^n \to \{0,1\}^n$.

While the model is a bit simplistic, it can nevertheless be used to describe
existing pipelines. It is also easy to see that the same model
also captures static branch prediction: speculative execution of conditional
jumps is performed by always taking the same branch, which does not depend on
the execution history. In order to take into account jump predictions based on
the execution history, we just need to incorporate into the state and the view
statistical information on the last executions of the branch.

\paragraph{The labelling approach for data independent cost models}
We now describe how the labelling approach can be slightly modified to deal
with a data independent cost model $(\hookrightarrow,K)$ built over
$(\mathcal{V},|.|)$.

In the labelling approach, every basic block in the object code is identified
by a unique label $L$ which is also associated to the corresponding
basic block at the source level. Let us assume that labels are also inserted
after every function call and that this property is preserved during
compilation. Adding labels after calls makes the instrumented code heavier
to read and it generates more proof obligations on the instrumented code, but
it does not create any additional problems. The preservation during
compilation creates some significant technical complications in the proof of
correctness of the compiler, but those can be solved.

The static analysis performed in the last step of the basic labelling approach
analyses the object code in order to assign a cost to every label or,
equivalently, to every basic block. The cost is simply the sum of the costs
of every instruction in the basic block.

In our scenario, instructions no longer have a fixed cost, because the cost
function $K$ takes as input not only a program counter but also a view $v$.
Therefore we replace the static analysis with the computation, for every basic
block and every $v \in \mathcal{V}$, of the sum of the costs of the
instructions in the block, starting in the initial view $v$. Formally, let the
sequence of the program counters of the basic block form the execution path
$pc_0,\ldots,pc_n$. The cost $K(v_0,L)$ associated to the block labelled with
$L$ and the initial view $v_0$ is
$K(pc_0,v_0) + K(pc_1,v_1) + \ldots + K(pc_n,v_n)$ where for every $i < n$,
$((pc_i,\ldots,pc_{i+l}),v_i) \hookrightarrow v_{i+1}$ where $l$ is the
lookahead required. When the lookahead requires program counters outside the
block under analysis, we are free to use dummy ones because the parts of the
view that deal with future behaviour have no impact on the cost of the
previous operations by assumption.

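The modified static analysis can be sketched as follows, with toy lookahead-1 step and cost functions standing in for $\hookrightarrow$ and $K$ (both invented for the example), and a two-element view set:

```python
# Dependent cost of a basic block: sum the per-instruction costs along
# the block's execution path, threading the view. Toy data throughout.

def step(pc, v):
    """Toy lookahead-1 view transition over V = {0, 1}."""
    return (v + pc) % 2

def K(pc, v):
    """Toy per-instruction cost: cheaper in view 0 than in view 1."""
    return 1 if v == 0 else 2

def block_cost(path, v0):
    """K(v0, L): cost of the block with execution path `path` from view v0."""
    total, v = 0, v0
    for pc in path:
        total += K(pc, v)
        v = step(pc, v)
    return total

# The dependent cost of one block, computed for every initial view.
dependent_cost = {v: block_cost([0, 1, 1], v) for v in (0, 1)}
```

The analysis thus produces, per label, a table indexed by $\mathcal{V}$ rather than a single number.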
The static analysis can be performed
in linear time in the size of the program because the cardinality of the set
of labels (i.e. the number of basic blocks) is bounded by the size of the
program and because the set $\mathcal{V}$ is finite. In the case of the
pipelines of the previous paragraph, the static analysis is $2^n$ times more
expensive than the one of the basic labelling approach, where $n$ is the
number of pipeline stages.

The first important theorem in the labelling approach is the correctness of
the static analysis: if the (dependent) cost associated to a label $L$ is $k$,
then executing a program from the beginning of the basic block to the end
of the basic block should take exactly $k$ cost units. The proof only relies
on associativity and commutativity of the composition of costs. Commutativity
is only required if basic blocks can be nested, i.e. if a basic block does not
terminate when it reaches a call, but continues after the called function
returns. By assuming to have a cost label after each call, we no longer need
commutativity, which does not hold for the definition of $K$ we
just gave. The reason is that, if $pc_i$ is a function call executed in
the view (state) $v_i$, it is not true that, after the return, the state
will be the $v_{i+1}$ defined by $(pc_i,pc_{i+1},v_i) \hookrightarrow v_{i+1}$
(assuming a lookahead of 1, which is already problematic).
Indeed $pc_{i+1}$ is the program counter of the instruction that follows
the call, whereas the next program counter to be reached is that of the
body of the call. Moreover, even if the computation made sense,
$v_{i+1}$ would be the state at the beginning of the execution of the body of
the call, while we should know the state after the function returns.
The latter cannot be
statically predicted. That is why we had to impose labels after calls.
Associativity, on the other hand, trivially holds. Therefore the proof
of correctness of the static analysis can be reused without any change.

So far, we have computed the dependent cost
$K: \mathcal{V} \times \mathcal{L} \to \mathbb{N}$ that associates a cost to
basic blocks and views.
The second step consists in statically computing the transition function
$\hookrightarrow : \mathcal{L} \times \mathcal{L} \times \mathcal{V} \to \mathcal{V}$
that associates to each pair of consecutively executed basic blocks and input
view the view obtained at the end of the execution of the first block.

The definition is the following:
$(L,L',v) \hookrightarrow v'$ iff
$((pc_0,\ldots,pc_n,pc'_0,\ldots,pc'_m),v) \hookrightarrow v'$ where
$(pc_0,\ldots,pc_n)$ are the program counters of the block labelled by $L$ and
$(pc'_0,\ldots,pc'_m)$ are those of the block labelled with $L'$. We assume
here that $m$ is always greater than or equal to the lookahead
required by the transition function $\hookrightarrow$ over execution paths.
When this is not the case we could make the new transition function take as
input a longer lookahead of labels. Alternatively, we may assume to introduce
enough \texttt{NOP}s at the beginning of the block $L'$ to enforce the
property. In the rest of the paragraph we assume to have followed the second
approach to simplify the presentation.

The extended transition function over labels is not present in the basic
labelling approach. In fact, the basic labelling approach can be understood as
the special case of the generalized approach in which the set of views
$\mathcal{V} = \mathbbm{1}$ is a singleton.
The computation of the extended $\hookrightarrow$ transition function
is again linear in the size of the program.

Both versions of $K$ and $\hookrightarrow$ defined over labels can be
lifted to work over traces by folding them over the list of labels in the
trace: for $K$ we have $K((L_1,\ldots,L_n),v) = K(L_1,v) + K((L_2,\ldots,L_n),v')$ where
$(L_1,L_2,v) \hookrightarrow v'$; for $\hookrightarrow$ we have
$((L_1,\ldots,L_n),v) \hookrightarrow v''$ iff $(L_1,L_2,v) \hookrightarrow v'$ and $((L_2,\ldots,L_n),v') \hookrightarrow v''$. The two definitions are also
clearly associative.

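As an illustration, the fold above transcribes directly into code. The sketch below is ours: \texttt{cost\_K} and \texttt{next\_view} are made-up stand-ins for the statically computed $K$ and $\hookrightarrow$ tables, not output of the CerCo tools.

```c
#include <stddef.h>

/* Hypothetical per-label dependent cost K(v, L) (stand-in table). */
static int cost_K(int view, int label) { return 10 * label + view + 1; }

/* Hypothetical transition (L1, L2, v) -> v': here the view just toggles
   between 0 and 1, mimicking a two-state pipeline. */
static int next_view(int l1, int l2, int view) {
  (void)l1; (void)l2;
  return 1 - view;
}

/* Fold of K over a trace:
   K((L1,...,Ln), v) = K(L1, v) + K((L2,...,Ln), v')
   where (L1, L2, v) -> v'. */
static int trace_cost(const int *labels, size_t n, int view) {
  int cost = 0;
  for (size_t i = 0; i < n; i++) {
    cost += cost_K(view, labels[i]);
    if (i + 1 < n)
      view = next_view(labels[i], labels[i + 1], view);
  }
  return cost;
}
```

The same loop structure computes the lifted $\hookrightarrow$ as a side effect: the value of \texttt{view} after the fold is the output view of the trace.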
The second main theorem of the labelling approach is trace preservation:
the trace produced by the object code is the same as the trace produced by
the source code. Without any need to change the proof, we immediately obtain
as a corollary that, for every view $v$, the cost $K(\tau,v)$ computed from
the source code trace $\tau$ is the same as the cost $K(\tau,v)$ computed
on the object code trace, which is again $\tau$.

The final step of the labelling approach is source code instrumentation.
In the basic labelling approach it consists in adding a global variable
\texttt{\_\_cost}, initialized with 0, which is incremented at the beginning
of every basic block with the cost of the label of that basic block.
Here we just need a more complex instrumentation that keeps track of the
values of the views during execution:
\begin{itemize}
\item We define three global variables: \texttt{\_\_cost}, initialized at 0,
  \texttt{\_\_label}, initialized with \texttt{NULL}, and
  \texttt{\_\_view}, uninitialized.
\item At the beginning of every basic block labelled by $L$ we add the
  following code fragment:

  \begin{tabular}{l}
  \texttt{\_\_view = \_\_next(\_\_label,}$L$\texttt{,\_\_view);}\\
  \texttt{\_\_cost += }$K(\mbox{\texttt{\_\_view}},L)$\texttt{;}\\
  \texttt{\_\_label = }$L$\texttt{;}
  \end{tabular}

  where \texttt{\_\_next(}$L_1,L_2,v$\texttt{)} = $v'$ iff
  $(L_1,L_2,v) \hookrightarrow v'$, unless $L_1$ is \texttt{NULL}.
  In that case \texttt{\_\_next(NULL,}$L$\texttt{,}$v$\texttt{)} = $v_0$ for
  every $v$, where $v_0 = |\delta_0|$ and $\delta_0$ is the initial value of
  the internal state at the beginning of the program execution.

  The first line of the code fragment computes the view at the beginning of
  the execution of the block from the view at the end of the previous block.
  Then we increment the cost variable by the cost of the block. Finally, we
  record the current block to use it for the computation of the next view
  at the beginning of the next block.
\end{itemize}

\paragraph{An example of instrumentation in the presence of a pipeline}
In Figure~\ref{factprogi} we show what the instrumentation of a program
that computes the factorial of 10 looks like in the presence of a pipeline.
The instrumentation has been produced by hand. The \texttt{\_\_next}
function says that the body of the internal loop of the \texttt{fact}
function can be executed in two different internal states, summarized by the
views 2 and 3. View 2 holds at the beginning of even iterations,
while view 3 holds at the beginning of odd ones. Therefore even and odd
iterations are assigned different costs. The code after the loop can also
be executed in two different states, depending on the parity of the last
loop iteration.

\begin{figure}[t]
\begin{tiny}
\begin{verbatim}
int fact (int n) {
  int i, res = 1;
  for (i = 1 ; i <= n ; i++) res *= i;
  return res;
}

int main () {
  return (fact(10));
}
\end{verbatim}
\end{tiny}
\caption{A simple program that computes the factorial of 10.\label{factprog}}
\end{figure}

The definitions of \texttt{\_\_next} and \texttt{\_\_K} are just examples.
It is equally possible, for instance, that each one of the 10 iterations
is executed in a different internal state.

\begin{figure}[t]
\begin{tiny}
\begin{verbatim}
int __cost = 8;
int __label = 0;
int __view;

void __cost_incr(int incr) {
  __cost = __cost + incr;
}

int __next(int label1, int label2, int view) {
  if (label1 == 0 && label2 == 0) return 0;
  else if (label1 == 0 && label2 == 1) return 1;
  else if (label1 == 1 && label2 == 2) return 2;
  else if (label1 == 2 && label2 == 2 && view == 2) return 3;
  else if (label1 == 2 && label2 == 2 && view == 3) return 2;
  else if (label1 == 2 && label2 == 3 && view == 2) return 1;
  else if (label1 == 2 && label2 == 3 && view == 3) return 0;
  else if (label1 == 3 && label2 == 4 && view == 0) return 0;
  else if (label1 == 3 && label2 == 4 && view == 1) return 0;
  else return 0; /* unreachable on the traces of this program */
}

int __K(int view, int label) {
  if (view == 0 && label == 0) return 3;
  else if (view == 1 && label == 1) return 14;
  else if (view == 2 && label == 2) return 35;
  else if (view == 3 && label == 2) return 26;
  else if (view == 0 && label == 3) return 6;
  else if (view == 1 && label == 3) return 8;
  else if (view == 0 && label == 4) return 6;
  else return 0; /* unreachable on the traces of this program */
}

int fact(int n)
{
  int i;
  int res;
  __view = __next(__label,1,__view); __cost_incr(__K(__view,1)); __label = 1;
  res = 1;
  for (i = 1; i <= n; i = i + 1) {
    __view = __next(__label,2,__view); __cost_incr(__K(__view,2)); __label = 2;
    res = res * i;
  }
  __view = __next(__label,3,__view); __cost_incr(__K(__view,3)); __label = 3;
  return res;
}

int main(void)
{
  int t;
  __view = __next(__label,0,__view); __cost_incr(__K(__view,0)); __label = 0;
  t = fact(10);
  __view = __next(__label,4,__view); __cost_incr(__K(__view,4)); __label = 4;
  return t;
}
\end{verbatim}
\end{tiny}
\caption{The instrumented version of the program in Figure~\ref{factprog}.\label{factprogi}}
\end{figure}

\paragraph{Considerations on the instrumentation}
The example of instrumentation in the previous paragraph shows that the
approach we are proposing exposes at the source level a certain amount
of information about the machine behaviour. Syntactically, the additional
details are almost entirely confined to the \texttt{\_\_next} and
\texttt{\_\_K} functions and they do not affect at all the functional
behaviour of the program. In particular, all invariants, proof obligations
and proofs that deal only with the functional behaviour are preserved.

The interesting question, then, is what is the impact of the additional
details on non functional (intensional) invariants and proof obligations.
At the moment, without a working implementation to perform some large scale
tests, it is difficult to understand the level of automation that can be
achieved and the techniques that are likely to work better without introducing
major approximations. In any case, the preliminary considerations of the
project remain valid:
\begin{itemize}
\item The task of computing and proving invariants can be simplified,
even automatically, by trading precision for simplicity while preserving
correctness. For example, the most aggressive approximation simply replaces
the cost function \texttt{\_\_K} with the function that ignores the view and
returns, for each label, the maximum cost over all possible views.
Correspondingly, the function \texttt{\_\_next} can be dropped, since it
returns views that are later ignored.

A more refined possibility consists in approximating the output only on
those labels whose jitter is small, or on those that mark basic blocks
that are executed only a small number of times. By simplifying the
\texttt{\_\_next} function accordingly, it is possible to considerably
reduce the search space for automated provers.
\item The situation is no worse than what happens with time analysis on
the object code (the current state of the art). There it is common practice
to analyse larger chunks of execution to minimize the effect of later
approximations. For example, if it is known that a loop can be executed at
most 10 times, computing the cost of 10 iterations yields a
better bound than multiplying the worst case of a single iteration by 10.

We can clearly do the same at the source level.
More generally, every algorithm that works in standard WCET tools on the
object code is likely to have a counterpart on the source code.
We also expect to be able to do better when working on the source code,
because we assume to know more functional properties of the program and more
high level invariants, and to have more techniques and tools at our disposal.
Even if at the moment we have no evidence to support our claims, we think
that this approach is worth pursuing in the long term.
\end{itemize}

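The most aggressive approximation described in the considerations above can be sketched in code, reusing the cost table of the (hypothetical, hand-made) factorial example; \texttt{K\_max} is our own name for the view-independent cost function, not part of the CerCo tools.

```c
#define NVIEWS  4
#define NLABELS 5

/* Dependent cost table K[view][label], copied from the __K function of
   the factorial example (0 for view/label pairs that never occur). */
static const int K[NVIEWS][NLABELS] = {
  /* label:    0   1   2  3  4 */
  /* v0 */  {  3,  0,  0, 6, 6 },
  /* v1 */  {  0, 14,  0, 8, 0 },
  /* v2 */  {  0,  0, 35, 0, 0 },
  /* v3 */  {  0,  0, 26, 0, 0 },
};

/* Sound but imprecise approximation: ignore the view and charge every
   block the maximum cost it can have over all views.  With this cost
   function, __next can be dropped from the instrumentation entirely. */
static int K_max(int label) {
  int max = 0;
  for (int v = 0; v < NVIEWS; v++)
    if (K[v][label] > max)
      max = K[v][label];
  return max;
}
```

The approximation over-estimates the cost of every odd loop iteration (charged 35 instead of 26), which is exactly the precision traded away for a simpler invariant.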
\paragraph{The problem with caches}
Cost models for pipelines --- at least simple ones --- are data independent,
i.e.\ they depend only on views that are execution path dependent. In other
words, knowledge of the sequence of executed instructions is sufficient
to predict the cost of future instructions.

The same property does not hold for caches. The cost of accessing a memory
cell strongly depends on the addresses of the memory cells that have been read
in the past. In turn, the accessed addresses are a function of the low level
data state, which cannot be correlated to the source program state.

The strong correlation between the internal state of caches and the data
accessed in the past is one of the two main causes of the lack of
precision of static analysis on modern uni-core architectures. The other one
is the lack of precise knowledge of the real behaviour of modern hardware
systems. In order to overcome both problems, which Cazorla et
al.~\cite{proartis} call the ``\emph{Timing Analysis Walls}'', the PROARTIS
European Project has proposed to design ``\emph{a hardware/software
architecture whose execution timing behaviour eradicates dependence on
execution history}'' (\cite{proartis}, Section 1.2). The statement is
obviously too strong. What is concretely proposed by PROARTIS is the design
of a hardware/software architecture whose execution timing is
\emph{execution path dependent} (in our terminology).

We have already seen that the labelling approach can accommodate
cost functions that depend on views that are execution path dependent.
Before fully embracing the PROARTIS vision,
we need to check whether there are other aspects of the PROARTIS proposal
that can be problematic for CerCo.

\paragraph{Static Probabilistic Time Analysis}
The approach of PROARTIS to achieving execution path dependent cost models
consists in turning the hard-to-analyse deterministic hardware components
(e.g.\ the cache) into probabilistic hardware components. Intuitively,
algorithms that used to take decisions based on the program history now
throw dice. The typical example, which has been thoroughly studied in
PROARTIS~\cite{proartis2}, is that of caches. There the proposal is to replace
the commonly used deterministic placement and replacement algorithms (e.g.\ LRU)
with fully probabilistic choices: when the cache needs to evict a page, the
page to be evicted is selected randomly according to the uniform distribution.

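A minimal sketch of such a replacement policy, in our own illustrative code (not PROARTIS material): a tiny fully associative cache in which the eviction victim is drawn uniformly at random, so the eviction decision no longer depends on the access history as it would under LRU.

```c
#include <stdlib.h>

#define NLINES 4

static int lines[NLINES]; /* cached addresses */
static int valid[NLINES]; /* validity bits    */

/* Returns 1 on hit, 0 on miss; on a miss, every line is equally likely
   to be evicted.  (rand() % n is slightly biased when n does not divide
   RAND_MAX+1, but that is irrelevant for this illustration.) */
static int access_cache(int addr) {
  for (int i = 0; i < NLINES; i++)
    if (valid[i] && lines[i] == addr)
      return 1; /* hit */
  int victim = rand() % NLINES; /* uniform random replacement */
  lines[victim] = addr;
  valid[victim] = 1;
  return 0; /* miss */
}
```

Note that the hit/miss outcome is now a random variable whose distribution depends only on the sequence of accessed addresses, which is the property exploited by the probabilistic analyses below.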
The expectation is that probabilistic hardware will have worse performance
in the average case, but will exhibit the worst case performance only with
negligible probability. Therefore, it is no longer interesting to estimate
the actual worst case bound. What becomes interesting is to plot the
probability that the execution time will exceed a certain threshold. For all
practical purposes, a program that misses its deadline with a negligible
probability (e.g.\ $10^{-9}$ per hour of operation) will be perfectly
acceptable when deployed on a hardware system (e.g.\ a car or an airplane)
that is already specified in such a way.

In order to plot the probability distribution of execution times, PROARTIS
proposes two methodologies: Static Probabilistic Time Analysis (SPTA) and
Measurement Based Probabilistic Time Analysis (MBPTA). The first one is
similar to traditional static analysis, but operates on probabilistic
hardware. It is the one that we would like to embrace. The second one is
based on measurements and is justified by the following assumption: if
the probabilities associated with every hardware operation are all independent
and identically distributed, then measuring the time spent on full runs of
sub-system components yields a probabilistic estimate that remains valid
when the sub-system is plugged into a larger one. Moreover, the probabilistic
distribution of past runs must be equal to that of future runs.

We understand that MBPTA is useful to analyse closed (sub-)systems whose
functional behaviour is deterministic. It does not seem to have immediate
applications to parametric time analysis, which we are interested in.
Therefore we focus on SPTA.

According to~\cite{proartis},
``\emph{in SPTA, execution time probability distributions for individual operations \ldots are determined statically
from a model of the processor. The design principles of PROARTIS
will ensure \ldots that the probabilities for the execution time of each
instruction are independent. \ldots SPTA is performed by calculating the
convolution ($\oplus$) of the discrete probability distributions which describe
the execution time for each instruction on a CPU; this provides a probability
distribution \ldots representing the timing behaviour of the entire sequence of
instructions.}''

We will now analyse to what extent we can embrace SPTA in CerCo.

\paragraph{The labelling approach for Static Probabilistic Time Analysis}

To summarize, the main practical differences between standard static time
analysis and SPTA are:
\begin{itemize}
\item The cost functions for single instructions or sequences of instructions
no longer return natural numbers (numbers of cost units) but integer-valued
random variables.
\item Cost functions are extended from single instructions to sequences of
instructions by using the associative convolution operator $\oplus$.
\end{itemize}

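The convolution operator $\oplus$ on discrete execution time distributions can be sketched as follows. The code is our own illustration: a distribution is represented as an array where entry $i$ is the probability that the instruction takes $i$ cycles, and convolving two such arrays yields the distribution of the summed execution time.

```c
#include <stddef.h>

/* Convolution (+) of two discrete probability distributions:
   out[k] = sum over i+j == k of p[i] * q[j].
   out must have room for np + nq - 1 entries. */
static void convolve(const double *p, size_t np,
                     const double *q, size_t nq,
                     double *out) {
  for (size_t k = 0; k < np + nq - 1; k++)
    out[k] = 0.0;
  for (size_t i = 0; i < np; i++)
    for (size_t j = 0; j < nq; j++)
      out[i + j] += p[i] * q[j];
}
```

Note the quadratic cost of a single convolution and the growth of the support (\texttt{np + nq - 1} entries), which foreshadows the memory blow-up of repeated explicit convolutions discussed below.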
By reviewing the papers that describe the labelling approach, it is easy to
get convinced that the codomain of the cost analysis can be lifted from
natural numbers to any group. Moreover, by imposing labels after
every function call, commutativity can be dropped and the approach works over
every monoid (usually called a \emph{cost monoid} in the literature).
Because random variables with convolution form a monoid, the labelling
approach immediately extends to SPTA. The instrumented code
produced by an SPTA-CerCo compiler will then have random variables (over a
finite domain) as costs and convolutions in place of the
\texttt{\_\_cost\_incr} function.

What is left to be understood is how to state and compute the
probabilistic invariants needed for \emph{parametric SPTA}. Indeed, it seems
that PROARTIS was only interested in non-parametric PTA. For example, it is
well known that actually computing the convolutions results in an exponential
growth of the memory required to represent their result.
Therefore, the analysis should work symbolically until the moment when
we are interested in extracting information from the convolution.

Moreover, assuming that the problem of computing invariants is solved,
the actual behaviour of automated theorem
provers on probabilistic invariants remains to be understood. It is likely
that a good amount of domain specific knowledge about probability theory must
be exploited and incorporated into automatic provers to achieve concrete
results.

Parametric SPTA using the methodology developed in CerCo is a future research
direction that we believe to be worth exploring in the middle and long term.

\paragraph{Static Probabilistic Time Analysis for Caches in CerCo}

As a final remark, we note that the analysis in CerCo of systems that implement
probabilistic caches requires a combination of SPTA and data independent cost
models. The need for a probabilistic analysis is obvious but, as we saw in
the previous paragraph, it requires no modification of the labelling approach.

In order to understand the need for dependent labelling (to work on data
independent cost functions), we need to review the behaviour of probabilistic
caches as proposed by PROARTIS. The interested reader can
consult~\cite{proartis2-articolo30} for further information.

In a randomized cache, the probability of evicting a given line on every access
is $1/N$, where $N$ is the number of cache entries. Therefore the
hit probability of a specific access to such a cache is
$P(\mathit{hit}) = (\frac{N-1}{N})^K$, where $K$ is the number of cache misses
between two consecutive accesses to the same cache entry. For the purposes of
our analysis, we must assume that every cache access could cause an eviction.
Therefore, we define $K$ (the \emph{reuse distance}) to be the number of
memory accesses between two consecutive accesses to the same cache entry,
including the access for which we are computing $K$. In order to compute
$K$ for every code memory address, we need to know the execution path (in
our terminology). In other words, we need a view that records, for each
cache entry, the number of memory accesses that have occurred since the last
access.

For pipelines with $n$ stages, the number of possible views is limited to
$2^n$: a view can usually be represented by a single machine word. This is not
the case for the views on caches, which are in principle very large. Therefore,
the dependent labelling approach for data independent cost functions that we
have presented here could still be impractical for caches. If that turns out
to be the case, a possible strategy is the use of abstract interpretation
techniques on the object code to reduce the size of the views exposed at the
source level, at the price of an early loss of precision in the analysis.

More research must be performed at the current stage to understand whether
caches can be analysed, even probabilistically, using the CerCo technology.
This is left for future work and will be postponed until after the work on
pipelines.

\paragraph{Conclusions}
At the current state of the art, functional properties of programs are better
proved on high level languages, while non functional ones are proved on the
corresponding object code. The non functional analysis, however, depends on
functional invariants, e.g.\ to bound or correlate the number of iterations
of loops.

The aim of the CerCo project is to reconcile the two analyses by performing
the non functional analysis on the source code. This requires computing a cost
model on the object code and reflecting the cost model on the source code.
We achieve this in CerCo by designing a certified Cost Annotating Compiler that
keeps track of the transformations of basic blocks, in order to create a
correspondence (not necessarily a bijection) between the basic blocks of the
source and target language. We then prove that the sequences of basic blocks
met in the source and target executions are correlated. Then
we perform a static analysis of the cost of basic blocks on the target language
and use it to compute the cost model for the source language basic blocks.
Finally, we compute cost invariants on the source code from the inferred cost
model and from the functional program invariants, we generate proof obligations
for the invariants, and we use automatic provers to try to close the proof
obligations.

The cost of single instructions on modern architectures depends on the internal
state of various hardware components (pipelines, caches, branch prediction
units, etc.). The internal states are determined by the previous execution
history, that is, both the instructions executed and the data manipulated
by those instructions. Therefore the cost of basic blocks is parametric in the
execution history. The CerCo approach is able to correlate the sequence
of blocks of source instructions with the sequence of blocks of target
instructions, but it does not correlate the high level and the low level data.
Therefore, in the general case, we are not able to lift to the source code a
cost model that is parametric in the execution history.

To overcome the problem, we have identified a particular class of cost models
that do not depend on the data manipulated. We argue that the CerCo
approach can cover this scenario by exposing in the source program a finite
data type of views over internal machine states. The cost of basic blocks
is parametric in these views, and the current view is updated at basic block
entry according to some abstraction of the machine hardware that does not
need to be understood. Further studies are needed to understand how invariant
generators and automatic provers can cope with these updates and parametric
costs.

We have argued that pipelines, at least simple ones, are captured by the
previous scenario and can in principle be handled using the CerCo tools.
The same is not true for caches, whose behaviour deeply depends on the
data manipulated. By embracing the PROARTIS proposal of turning caches into
probabilistic components, we can break the data dependency. Nevertheless,
cache analysis remains more problematic because of the size of the views.
Further studies must focus on caches to understand whether the size of the
views can be tamed in practice without ruining the whole approach.

\end{document}