1 | \documentclass[11pt, epsf, a4wide]{article} |
---|
2 | |
---|
3 | \usepackage{../style/cerco} |
---|
4 | |
---|
5 | \usepackage{amsfonts} |
---|
6 | \usepackage{amsmath} |
---|
7 | \usepackage{amssymb} |
---|
8 | \usepackage[english]{babel} |
---|
9 | \usepackage{graphicx} |
---|
10 | \usepackage[utf8x]{inputenc} |
---|
11 | \usepackage{listings} |
---|
12 | \usepackage{stmaryrd} |
---|
13 | \usepackage{url} |
---|
14 | \usepackage{bbm} |
---|
15 | |
---|
16 | \title{ |
---|
17 | INFORMATION AND COMMUNICATION TECHNOLOGIES\\ |
---|
18 | (ICT)\\ |
---|
19 | PROGRAMME\\ |
---|
20 | \vspace*{1cm}Project FP7-ICT-2009-C-243881 \cerco{}} |
---|
21 | |
---|
22 | \lstdefinelanguage{matita-ocaml} |
---|
23 | {keywords={ndefinition,ncoercion,nlemma,ntheorem,nremark,ninductive,nrecord,nqed,nlet,let,in,rec,match,return,with,Type,try}, |
---|
24 | morekeywords={[2]nwhd,nnormalize,nelim,ncases,ndestruct}, |
---|
25 | morekeywords={[3]type,of}, |
---|
26 | mathescape=true, |
---|
27 | } |
---|
28 | |
---|
29 | \lstset{language=matita-ocaml,basicstyle=\small\tt,columns=flexible,breaklines=false, |
---|
30 | keywordstyle=\color{red}\bfseries, |
---|
31 | keywordstyle=[2]\color{blue}, |
---|
32 | keywordstyle=[3]\color{blue}\bfseries, |
---|
33 | commentstyle=\color{green}, |
---|
34 | stringstyle=\color{blue}, |
---|
35 | showspaces=false,showstringspaces=false} |
---|
36 | |
---|
37 | \lstset{extendedchars=false} |
---|
38 | \lstset{inputencoding=utf8x} |
---|
39 | \DeclareUnicodeCharacter{8797}{:=} |
---|
40 | \DeclareUnicodeCharacter{10746}{++} |
---|
41 | \DeclareUnicodeCharacter{9001}{\ensuremath{\langle}} |
---|
42 | \DeclareUnicodeCharacter{9002}{\ensuremath{\rangle}} |
---|
43 | |
---|
44 | \date{} |
---|
45 | \author{} |
---|
46 | |
---|
47 | \begin{document} |
---|
48 | |
---|
49 | \thispagestyle{empty} |
---|
50 | |
---|
51 | \vspace*{-1cm} |
---|
52 | \begin{center} |
---|
53 | \includegraphics[width=0.6\textwidth]{../style/cerco_logo.png} |
---|
54 | \end{center} |
---|
55 | |
---|
56 | \begin{minipage}{\textwidth} |
---|
57 | \maketitle |
---|
58 | \end{minipage} |
---|
59 | |
---|
60 | \vspace*{0.5cm} |
---|
61 | \begin{center} |
---|
62 | \begin{LARGE} |
---|
63 | \textbf{ |
---|
64 | Report n. D6.3\\ |
---|
65 | Final Report on User Validation} |
---|
66 | \end{LARGE} |
---|
67 | \end{center} |
---|
68 | |
---|
69 | \vspace*{2cm} |
---|
70 | \begin{center} |
---|
71 | \begin{large} |
---|
72 | Version 1.0 |
---|
73 | \end{large} |
---|
74 | \end{center} |
---|
75 | |
---|
76 | \vspace*{0.5cm} |
---|
77 | \begin{center} |
---|
78 | \begin{large} |
---|
79 | Main Authors:\\ |
---|
80 | XXXX %Dominic P. Mulligan and Claudio Sacerdoti Coen |
---|
81 | \end{large} |
---|
82 | \end{center} |
---|
83 | |
---|
84 | \vspace*{\fill} |
---|
85 | |
---|
86 | \noindent |
---|
87 | Project Acronym: \cerco{}\\ |
---|
88 | Project full title: Certified Complexity\\ |
---|
89 | Proposal/Contract no.: FP7-ICT-2009-C-243881 \cerco{}\\ |
---|
90 | |
---|
91 | \clearpage |
---|
92 | \pagestyle{myheadings} |
---|
93 | \markright{\cerco{}, FP7-ICT-2009-C-243881} |
---|
94 | |
---|
95 | \newpage |
---|
96 | |
---|
97 | \vspace*{7cm} |
---|
98 | \paragraph{Abstract} |
---|
99 | xxx |
---|
100 | |
---|
101 | \newpage |
---|
102 | |
---|
103 | \tableofcontents |
---|
104 | |
---|
105 | \newpage |
---|
106 | |
---|
107 | \section{Task} |
---|
108 | \label{sect.task} |
---|
109 | |
---|
110 | The Grant Agreement describes deliverable D6.3 as follows: |
---|
111 | \begin{quotation} |
---|
112 | \textbf{Final Report on User Validation}: An articulated analysis and critics of the user validation experiences. In particular we will review the effectiveness of the techniques for cost annotation exploitement that have been employed in the project and that have been validated on simple and non trivial examples. We will also identify additional techniques that could be exploited in the middle and long term to bring the CerCo compiler to its full potentialities. |
---|
113 | |
---|
114 | \end{quotation} |
---|
115 | |
---|
116 | \section{Middle and Long Term Improvements} |
---|
117 | In order to identify the middle and long term improvements, we briefly review |
---|
118 | here the premises and goals of the CerCo approach to resource analysis. |
---|
119 | \begin{itemize} |
---|
120 | \item There is a lot of recent and renewed activity in the formal method |
---|
121 | community to accomodate resource analysis using techniques derived from |
---|
122 | functional analysis (type systems, logics, abstract interpretation, |
---|
123 | amortized analysis applied to data structures, etc.) |
---|
124 | \item Most of this work, which currently remains at theoretical level, |
---|
125 | is focused on high level languages and it assumes the existence of correct |
---|
126 | and compositional resource cost models. |
---|
127 | \item High level languages are compiled to object code by compilers that |
---|
128 | should respect the functional properties of the program. However, because |
---|
129 | of optimizations and the inherently non compositional nature of compilation, |
---|
130 | compilers do not respect compositional cost models that are imposed a priori |
---|
131 | on the source language. By controlling the compiler and coupling it with a |
---|
132 | WCET analyser, it is actually possible to |
---|
133 | choose the cost model in such a way that the cost bounds are high enough |
---|
134 | to bound the cost of every produced code. This was attempted for instance |
---|
135 | in the EmBounded project with good success. However, we believe that bounds |
---|
136 | obtained in this way have few possibilities of being tight. |
---|
137 | \item Therefore our approach consists in having the compiler generate the |
---|
138 | cost model for the user by combining tracking of basic blocks during code |
---|
139 | transformations with a static resource analysis on the object code for basic |
---|
140 | blocks. We formally prove the compiler to respect the cost model that is |
---|
141 | induced on the source level based on a very few assumptions: basically the |
---|
142 | cost of a sequence of instructions should be associative and commutative |
---|
143 | and it should not depend on the machine status, except its program counter. |
---|
144 | Commutativity can be relaxed at the price of introducing more cost updates |
---|
145 | in the instrumented source code. |
---|
146 | \item The cost model for basic blocks induced on the source language must then |
---|
147 | be exploited to derive cost invariants and to prove them automatically. |
---|
148 | In CerCo we have shown how even simple invariant generations techniques |
---|
149 | are sufficient to enable the fully automatic proving of parametric WCET |
---|
150 | bounds for simple C programs and for Lustre programs of arbitrary complexity. |
---|
151 | \end{itemize} |
---|
152 | |
---|
153 | Compared to traditional WCET techniques, our approach currently has many |
---|
154 | similarities, some advantages and some limitations. Both techniques need to |
---|
155 | perform data flow analysis on the control flow graph of the program and both |
---|
156 | techniques need to estimate the cost of control blocks of instructions. |
---|
157 | |
---|
158 | \subsection{Control flow analysis} |
---|
159 | |
---|
160 | The first main difference is in the control flow analysis. Traditional WCET |
---|
161 | starts from object code and reconstructs the control flow graph from it. |
---|
162 | Moreover, abstract interpretation is heavily employed to bound the number of |
---|
163 | executions of cycles. In order to improve the accuracy of estimation, control |
---|
164 | flow constraints are provided by the user, usually as systems of (linear) |
---|
165 | inequalities. In order to do this, the user, helped by the system, needs to |
---|
166 | relate the object code control flow graph with the source one, because it is |
---|
167 | on the latter that the bounds can be figured out and be understood. This |
---|
168 | operations is untrusted and potentially error prone for complex optimizations |
---|
169 | (like aggressive loop optimizations). Efficient tools from linear algebra are |
---|
170 | then used to solve the systems of inequations obtained by the abstract |
---|
171 | interpreter and from the user constraints. |
---|
172 | |
---|
173 | In CerCo, instead, we assume full control on the compiler that |
---|
174 | is able to relate, even in non trivial ways, the object code control flow graph |
---|
175 | onto the source code control flow graph. A clear disadvantage is the |
---|
176 | impossibility of applying the tool on the object code produced by third party |
---|
177 | compilers. On the other hand, we get rid of the possibility of errors |
---|
178 | in the reconstruction of the control flow graph and in the translation of |
---|
179 | high level constraints into low level constraints. The second potentially |
---|
180 | important advantage is that, once we are dealing with the source language, |
---|
181 | we can augment the precision of our dataflow analysis by combining together |
---|
182 | functional and non functional invariants. This is what we attempted with |
---|
183 | the CerCo Cost Annotating Frama-C Plug-In. The Frama-C architecture allows |
---|
184 | several plug-ins to perform all kind of static analisys on the source code, |
---|
185 | reusing results from other plug-ins and augmenting the source code with their |
---|
186 | results. The techniques are absolutely not limited to linear algebra and |
---|
187 | abstract interpretation, and the most important plug-ins call domain specific |
---|
188 | and general purpose automated theorem provers to close proof obligations of |
---|
189 | arbitrary shape and complexity. |
---|
190 | |
---|
191 | In principle, the extended flexibility of the analysis should allow for a major |
---|
192 | advantage of our technique in terms of precision, also |
---|
193 | considering that all analysis used in traditional WCET can still be implemented |
---|
194 | as plug-ins. In particular, the target we have in mind are systems that are |
---|
195 | both (hard) real time and safety critical. Being safety critical, we can already |
---|
196 | expect them to be fully or partially specified at the functional level. |
---|
197 | Therefore we expect that the additional functional invariants should allow to |
---|
198 | augment the precision of the cost bounds, up to the point where the parametric |
---|
199 | cost bound is fully precise. |
---|
200 | In practice, we have not had the time to perform extensive |
---|
201 | comparisons on the kind of code used by industry in production systems. |
---|
202 | The first middle term improvement of CerCo would then consist in this kind |
---|
203 | of analysis, to support or disprove our expectations. It seems that the |
---|
204 | newborn TACLe Cost Action (Timing Analysis on Code Level) would be the |
---|
205 | best framework to achieve this improvement. |
---|
206 | In the case where our |
---|
207 | technique remains promising, the next long term improvement would consist in |
---|
208 | integrating in the Frama-C plug-in ad-hoc analysis and combinations of analysis |
---|
209 | that would augment the coverage of the efficiency of the cost estimation |
---|
210 | techniques. |
---|
211 | |
---|
212 | \subsection{Static analysis of costs of basic blocks} |
---|
213 | At the beginning of the project we have deliberately decided to focus our |
---|
214 | attention on the control flow preservation, the cost model propagation and |
---|
215 | the exploitation of the cost model induced on the high level code. For this |
---|
216 | reason we have devoted almost no attention to the static analysis of basic |
---|
217 | blocks. This was achieved by picking a very simple hardware architecture |
---|
218 | (the 8051 microprocessor family) whose cost model is fully predictable and |
---|
219 | compositional: the cost of every instruction --- except those that deal with |
---|
220 | I/O --- is constant, i.e. it does not depend on the machine status. |
---|
221 | We do not regret this choice because, with the limited amount of man power |
---|
222 | available in the project, it would have been difficult to also consider this |
---|
223 | aspect. However, without showing if the approach can scale to most complex |
---|
224 | architectures, our methodology remains of limited interest for the industry. |
---|
225 | Therefore, the next important middle term improvement will be the extension |
---|
226 | of our methodology to cover pipelines and simple caches. We will now present our |
---|
227 | ideas on how we intend to address the problem. The obvious long term |
---|
228 | improvement would be to take in consideration multicores system and complex |
---|
229 | memory architectures like the ones currently in use in networks on chips. |
---|
230 | The problem of execution time analysis for these systems is currently |
---|
231 | considered extremely hard or even unfeasible and at the moments it seems |
---|
232 | unlikely that our methodology could contribute to the solution of the problem. |
---|
233 | |
---|
234 | \subsubsection{Static analysis of costs of basic blocks revisited} |
---|
235 | We will now describe what currently seems to be the most interesting technique |
---|
236 | for the static analysis of the cost of basic blocks in presence of complex |
---|
237 | hardware architectures that do not have non compositional cost models. |
---|
238 | |
---|
239 | We start presenting an idealized model of the execution of a generic |
---|
240 | microprocessor (with caches) that has all interrupts disabled and no I/O |
---|
241 | instructions. We then classify the models according to some properties of |
---|
242 | their cost model. Then we show how to extend the labelling approach of CerCo |
---|
243 | to cover models that are classified in a certain way. |
---|
244 | |
---|
245 | \paragraph{The microprocessor model} |
---|
246 | |
---|
247 | Let $\sigma,\sigma_1,\ldots$ range over $\Sigma$, the set of the fragments of |
---|
248 | the microprocessor states that hold the program counter, the program status word and |
---|
249 | all the data manipulated by the object code program, i.e. registers and |
---|
250 | memory cells. We call these fragments the \emph{data states}. |
---|
251 | |
---|
252 | Let $\delta,\delta_1,\ldots$ range over $\Delta$, the set of the |
---|
253 | fragments of the microprocessor state that holds the \emph{internal state} of the |
---|
254 | microprocessor (e.g. the content of the pipeline and caches, the status of |
---|
255 | the branch prediction unit, etc.). |
---|
256 | The internal state of the microprocessor influences the execution cost of the |
---|
257 | next instruction, but it has no effect on the functional behaviour of the |
---|
258 | processor. The whole state of the processor is represented by a pair |
---|
259 | $(\sigma,\delta)$. |
---|
260 | |
---|
261 | Let $I,I_1,\ldots$ range over $\mathcal{I}$, the |
---|
262 | the set of \emph{instructions} of the processor and let |
---|
263 | $\gamma,\gamma_1,\ldots$ range over $\Gamma$, the set of \emph{operands} |
---|
264 | of instructions after the fetching and decoding passes. |
---|
265 | Thus a pair $(I,\gamma)$ represents a \emph{decoded instruction} and already contains |
---|
266 | the data required for execution. Execution needs to access the data state only |
---|
267 | to write the result. |
---|
268 | |
---|
269 | Let $fetch: \Sigma \to \mathcal{I} \times \Gamma$ be the function that |
---|
270 | performs the fetching and execution phases, returning the decoded instruction |
---|
271 | ready for execution. This is not meant to be the real fetch-decode function, |
---|
272 | that exploits the internal state too to speed up execution (e.g. by retrieving |
---|
273 | the instruction arguments from caches) and that, in case of pipelines, works |
---|
274 | in several stages. However, such a function exists and |
---|
275 | it is observationally equivalent to the real fetch-decode. |
---|
276 | |
---|
277 | We capture the semantics of the microprocessor with the following set of |
---|
278 | functions: |
---|
279 | \begin{itemize} |
---|
280 | \item The \emph{functional transition} function $\longrightarrow : \Sigma \to \Sigma$ |
---|
281 | over data states. This is the only part of the semantics that is relevant |
---|
282 | to functional analysis. |
---|
283 | \item The \emph{internal state transition} function |
---|
284 | $\Longrightarrow : \Sigma \times \Delta \to \Delta$ that updates the internal |
---|
285 | state. |
---|
286 | \item The \emph{cost function} $K: \mathcal{I} \times \Gamma \times \Delta \to \mathbb{N}$ |
---|
287 | that assigns a cost to transitions. Since decoded instructions hold the data |
---|
288 | they act on, the cost of an instruction may depend both on the data being |
---|
289 | manipulated and on the internal state. |
---|
290 | \end{itemize} |
---|
291 | |
---|
292 | Given a processor state $(\sigma,\delta)$, the processor evolves in the |
---|
293 | new state $(\sigma',\delta')$ in $n$ cost units if |
---|
294 | $\sigma \longrightarrow \sigma'$ and $(\sigma,\delta) \Longrightarrow \delta'$ |
---|
295 | and $fetch(\sigma) = (I,\gamma)$ and $K(I,\gamma,\delta) = n$. |
---|
296 | |
---|
297 | An \emph{execution history} is a stream of states and transitions |
---|
298 | $\sigma_0 \longrightarrow \sigma_1 \longrightarrow \sigma_2 \ldots$ |
---|
299 | that can be either finite or infinite. Given an execution history, the |
---|
300 | corresponding \emph{execution path} is the stream of program counters |
---|
301 | obtained from the execution history by forgetting all the remaining information. |
---|
302 | The execution path of the history $\sigma_0 \longrightarrow \sigma_1 \longrightarrow \sigma_2 \ldots$ is $pc_0,pc_1,\ldots$ where each $pc_i$ is the program |
---|
303 | counter of $\sigma_i$. We denote the set of finite execution paths with $EP$. |
---|
304 | |
---|
305 | We claim this simple model to be generic enough to cover real world |
---|
306 | architectures. |
---|
307 | |
---|
308 | \paragraph{Classification of cost models} |
---|
309 | A cost function is \emph{exact} if it assigns to transitions the real cost |
---|
310 | incurred. It is \emph{approximated} if it returns an upper bound of the real |
---|
311 | cost. |
---|
312 | |
---|
313 | A cost function is \emph{operand insensitive} if it does not depend on the |
---|
314 | operands of the instructions to be executed. Formally, $K$ is operand insensitive |
---|
315 | if there exists a $K': \mathcal{I} \times \Delta \to \mathbb{N}$ such that |
---|
316 | $K(I,\gamma,\delta) = K'(I,\delta)$. In this case, with an abuse of terminology, |
---|
317 | we will identify $K$ with $K'$. |
---|
318 | |
---|
319 | The cost functions of simple hardware architectures, in particular RISC ones, |
---|
320 | are naturally operand insensitive. In the other cases an exact operand sensitive |
---|
321 | cost function can always be turned into an approximated operand insensitive one |
---|
322 | by taking $K'(I,\delta) = \max\{K(I,\gamma,\delta)~|~\gamma \in \Gamma\}$. |
---|
323 | The question when one performs these approximation is how severe the |
---|
324 | approsimation is. A measure is given by the \emph{jitter}, which is defined |
---|
325 | as the difference between the best and worst cases. In our case, the jitter |
---|
326 | of the approximation $K'$ would be $\max\{K(I,\gamma,\delta)~|~\gamma \in \Gamma\} - \min\{K(I,\gamma,\delta)~|~\gamma \in \Gamma\}$. According to experts |
---|
327 | of WCET analysis, the jitters relative to operand sensitivity in modern |
---|
328 | architectures are small enough to make WCET estimations still useful. |
---|
329 | Therefore, in the sequel we will focus on operand insensitive cost models only. |
---|
330 | |
---|
331 | Note that cost model that are operand insensitive may still have significant |
---|
332 | dependencies over the data manipulated by the instructions, because of the |
---|
333 | dependency over internal states. For example, an instruction that reads data |
---|
334 | from the memory may change the state of the cache and thus greatly affect the |
---|
335 | execution time of successive instructions. |
---|
336 | |
---|
337 | Nevertheless, operand insensitivity is an important property for the labelling |
---|
338 | approach. In~\cite{tranquilli} we introduced \emph{dependent labels} and |
---|
339 | \emph{dependent costs}, which are the possibility of assigning costs to basic |
---|
340 | blocks of instructions which are also dependent on the state of the high level |
---|
341 | program at the beginning of the block. The idea we will now try to pursue is |
---|
342 | to exploit dependent costs to capture cost models that are sensitive to |
---|
343 | the internal states of the microprocessor. Operand sensitivity, however, is |
---|
344 | a major issue in presence of dependent labels: to lift a data sensitive cost |
---|
345 | model from the object code to the source language, we need a function that |
---|
346 | maps high level states to the operands of the instructions to be executed, |
---|
347 | and we need these functions to be simple enough to allow reasoning over them. |
---|
348 | Complex optimizations performed by the compiler, however, make the mappings |
---|
349 | extremely cumbersomes and history dependent. Moreover, keeping |
---|
350 | track of status transformations during compilation would be a significant |
---|
351 | departure from classical compilation models which we are not willing to |
---|
352 | undertake. By assuming or removing operand sensitivity, we get rid of part |
---|
353 | of the problem: we only need to make our costs dependent on the internal |
---|
354 | state of the microprocessor. The latter, however, is not at all visible |
---|
355 | in the high level code. Our next step is to make it visible. |
---|
356 | |
---|
357 | In general, the value of the internal state at a certain point in the program |
---|
358 | history is affected by all the preceding history. For instance, the |
---|
359 | pipeline stages either hold operands of instructions in execution or bubbles |
---|
360 | \footnote{A bubble is formed in the pipeline stage $n$ when an instruction is stucked in the pipeline stage $n-1$, waiting for some data which is not available yet.}. The execution history contains data states that in turn contain the |
---|
361 | object code data which we do not know how to relate simply to the source code |
---|
362 | data. We therefore introduce a new classification. |
---|
363 | |
---|
364 | A \emph{view} over internal states is a pair $(\mathcal{V},|.|)$ where |
---|
365 | $\mathcal{V}$ is a non empty set and $|.|:\Delta \to \mathcal{V}$ is |
---|
366 | a forgetful function over internal states. |
---|
367 | |
---|
368 | The operand insensitive cost function $K$ is \emph{dependent on the view |
---|
369 | $\mathcal{V}$} if there exists a $K': \mathcal{I} \times \mathcal{V} \to \mathbb{N}$ such that $K(I,\delta) = K'(I,|\delta|)$. In this case, with an abuse of terminology, we will identify $K$ with $K'$. |
---|
370 | |
---|
371 | Among the possible views, the ones that we will easily be able to work |
---|
372 | with in the labelling approach are the \emph{execution history dependent views}. |
---|
373 | A view $(\mathcal{V},|.|)$ is execution history dependent when there exists |
---|
374 | a transition function $\hookrightarrow : PC \times \mathcal{V} \to \mathcal{V}$ |
---|
375 | such that for all $(\sigma,\delta)$ and $pc$ such that $pc$ is the program |
---|
376 | counter of $\sigma$, $(\sigma,\delta) \Longrightarrow \delta'$ iff |
---|
377 | $(pc,|\delta|) \hookrightarrow |\delta'|$. |
---|
378 | |
---|
379 | The reference to execution historys in the names is due to the following |
---|
380 | fact: every execution history dependent transition function $\hookrightarrow$ |
---|
381 | can be lifted to the type $EP \times \mathcal{V} \to \mathcal{V}$ by folding |
---|
382 | the definition over the path trace: |
---|
383 | $((pc_0,\ldots,pc_n),v_0) \hookrightarrow v_{n+1}$ iff for all $i \leq n$, |
---|
384 | $(pc_i,v_i) \hookrightarrow v_{i+1}$. |
---|
385 | Moreover, the folding is clearly associative: |
---|
386 | $(\tau_1 @ \tau_2, v) \hookrightarrow v''$ iff |
---|
387 | $(\tau_1,v) \hookrightarrow v'$ and $(\tau_2,v') \hookrightarrow v''$. |
---|
388 | |
---|
389 | As a final definition, we say that a cost function $K$ is |
---|
390 | \emph{data independent} if it is dependent on a view that is execution path |
---|
391 | dependent. In two paragraphs we will show how we can extend the |
---|
392 | labelling approach to deal with data independent cost models. |
---|
393 | |
---|
394 | Before that, we show that the class of data independent cost functions |
---|
395 | is not too restricted to be interesting. In particular, at least simple |
---|
396 | pipeline models admit data independent cost functions. |
---|
397 | |
---|
398 | \paragraph{A data independent cost function for simple pipelines} |
---|
399 | We consider here a simple model for a pipeline with $n$ stages without branch |
---|
400 | prediction and hazards. We also assume that the actual value of the operands |
---|
401 | of the instruction that is being read have no influence on stalls (i.e. the |
---|
402 | creation of bubbles) nor on the execution cost. The type of operands, however, |
---|
403 | can. For example, reading the value 4 from a register may stall a pipeline |
---|
404 | if the register has not been written yet, while reading 4 from a different |
---|
405 | register may not stall the pipeline. |
---|
406 | |
---|
407 | The internal states $\Delta$ of the pipeline are $n$-tuples of decoded |
---|
408 | instructions or bubbles: |
---|
409 | $\Delta = (\mathcal{I} \times \Gamma \cup \mathbbm{1})^n$. |
---|
410 | This representation is not meant to describe the real data structures used |
---|
411 | in the pipeline: in the implementation the operands are not present in every |
---|
412 | stage of the pipeline, but are progressively fetched. A state |
---|
413 | $(x_1,x_2,\ldots,(I,\gamma))$ represents the state of the pipeline just before |
---|
414 | the completion of instruction $(I,\gamma)$. The first $n-1$ instructions that |
---|
415 | follow $I$ may already be stored in the pipeline, unless bubbles have delayed |
---|
416 | one or more of them. |
---|
417 | |
---|
418 | We introduce the following view over internal states: |
---|
419 | $(\{0,1\}^n,|.|)$ where $\mathbb{N}_n = {0,\ldots,2^n-1}$ and |
---|
420 | $|(x_1,\ldots,x_n)| = (y_1,\ldots,y_n)$ where $y_i$ is 1 iff $x_i$ is a bubble. |
---|
421 | Thus the view only remembers which stages of the pipelines are stuck. |
---|
422 | The view holds enough information to reconstruct the internal state given |
---|
423 | the current data state: from the data state we can fetch the program counter |
---|
424 | of the current and the next $n-1$ instructions and, by simulating at most |
---|
425 | $n$ execution steps and by knowing where the bubbles were, we can fill up |
---|
426 | the internal state of the pipeline. |
---|
427 | |
---|
428 | The assumptions on the lack of influence of operands values on stalls and |
---|
429 | execution times ensures the existence of the transition function |
---|
430 | $\hookrightarrow : PC \times \{0,1\}^n \to \{0,1\}^n$ and the data |
---|
431 | independent cost function $K: PC \times \{0,1\}^n \to \mathbb{N}$. |
---|
432 | |
---|
433 | While the model is a bit simplicistic, it can nevertheless be used to describe |
---|
434 | existing pipelines. It is also simple to be convinced that the same model |
---|
435 | also captures static branch prediction: speculative execution of conditional |
---|
436 | jumps is performed by always taking the same branch which does not depend on |
---|
437 | the execution history. In order to take in account jumpt predictions based on |
---|
438 | the execution history, we just need to incorporate in the status and the view |
---|
439 | statistical informations on the last executions of the branch. |
---|
440 | |
---|
441 | \paragraph{The labelling appraoch for data independent cost models} |
---|
442 | We now describe how the labelling approach can be slightly modified to deal |
---|
443 | with data independent cost models. |
---|
444 | |
---|
445 | \end{document} |
---|