Changeset 3126
 Timestamp:
 Apr 12, 2013, 1:00:03 AM (5 years ago)
 Location:
 Deliverables/D6.3
 Files:

 1 added
 1 edited
Legend:
 Unmodified
 Added
 Removed

Deliverables/D6.3/report.tex
r3125 r3126 2 2 3 3 \usepackage{../style/cerco} 4 \usepackage{pdfpages} 4 5 5 6 \usepackage{amsfonts} … … 114 115 \end{quotation} 115 116 116 \section{Middle and Long Term Improvements} 117 In order to identify the middle and long term improvements, we briefly review 118 here the premises and goals of the CerCo approach to resource analysis. 119 \begin{itemize} 120 \item There is a lot of recent and renewed activity in the formal method 121 community to accomodate resource analysis using techniques derived from 122 functional analysis (type systems, logics, abstract interpretation, 123 amortized analysis applied to data structures, etc.) 124 \item Most of this work, which currently remains at theoretical level, 125 is focused on high level languages and it assumes the existence of correct 126 and compositional resource cost models. 127 \item High level languages are compiled to object code by compilers that 128 should respect the functional properties of the program. However, because 129 of optimizations and the inherently non compositional nature of compilation, 130 compilers do not respect compositional cost models that are imposed a priori 131 on the source language. By controlling the compiler and coupling it with a 132 WCET analyser, it is actually possible to 133 choose the cost model in such a way that the cost bounds are high enough 134 to bound the cost of every produced code. This was attempted for instance 135 in the EmBounded project with good success. However, we believe that bounds 136 obtained in this way have few possibilities of being tight. 137 \item Therefore our approach consists in having the compiler generate the 138 cost model for the user by combining tracking of basic blocks during code 139 transformations with a static resource analysis on the object code for basic 140 blocks. We formally prove the compiler to respect the cost model that is 141 induced on the source level based on a very few assumptions: basically the 142 cost of a sequence of instructions should be associative and commutative 143 and it should not depend on the machine status, except its program counter. 144 Commutativity can be relaxed at the price of introducing more cost updates 145 in the instrumented source code. 146 \item The cost model for basic blocks induced on the source language must then 147 be exploited to derive cost invariants and to prove them automatically. 148 In CerCo we have shown how even simple invariant generations techniques 149 are sufficient to enable the fully automatic proving of parametric WCET 150 bounds for simple C programs and for Lustre programs of arbitrary complexity. 151 \end{itemize} 117 \newpage 152 118 153 Compared to traditional WCET techniques, our approach currently has many 154 similarities, some advantages and some limitations. Both techniques need to 155 perform data flow analysis on the control flow graph of the program and both 156 techniques need to estimate the cost of control blocks of instructions. 157 158 \subsection{Control flow analysis} 159 160 The first main difference is in the control flow analysis. Traditional WCET 161 starts from object code and reconstructs the control flow graph from it. 162 Moreover, abstract interpretation is heavily employed to bound the number of 163 executions of cycles. In order to improve the accuracy of estimation, control 164 flow constraints are provided by the user, usually as systems of (linear) 165 inequalities. In order to do this, the user, helped by the system, needs to 166 relate the object code control flow graph with the source one, because it is 167 on the latter that the bounds can be figured out and be understood. This 168 operations is untrusted and potentially error prone for complex optimizations 169 (like aggressive loop optimizations). Efficient tools from linear algebra are 170 then used to solve the systems of inequations obtained by the abstract 171 interpreter and from the user constraints. 172 173 In CerCo, instead, we assume full control on the compiler that 174 is able to relate, even in non trivial ways, the object code control flow graph 175 onto the source code control flow graph. A clear disadvantage is the 176 impossibility of applying the tool on the object code produced by third party 177 compilers. On the other hand, we get rid of the possibility of errors 178 in the reconstruction of the control flow graph and in the translation of 179 high level constraints into low level constraints. The second potentially 180 important advantage is that, once we are dealing with the source language, 181 we can augment the precision of our dataflow analysis by combining together 182 functional and non functional invariants. This is what we attempted with 183 the CerCo Cost Annotating FramaC PlugIn. The FramaC architecture allows 184 several plugins to perform all kind of static analisys on the source code, 185 reusing results from other plugins and augmenting the source code with their 186 results. The techniques are absolutely not limited to linear algebra and 187 abstract interpretation, and the most important plugins call domain specific 188 and general purpose automated theorem provers to close proof obligations of 189 arbitrary shape and complexity. 190 191 In principle, the extended flexibility of the analysis should allow for a major 192 advantage of our technique in terms of precision, also 193 considering that all analysis used in traditional WCET can still be implemented 194 as plugins. In particular, the target we have in mind are systems that are 195 both (hard) real time and safety critical. Being safety critical, we can already 196 expect them to be fully or partially specified at the functional level. 197 Therefore we expect that the additional functional invariants should allow to 198 augment the precision of the cost bounds, up to the point where the parametric 199 cost bound is fully precise. 200 In practice, we have not had the time to perform extensive 201 comparisons on the kind of code used by industry in production systems. 202 The first middle term improvement of CerCo would then consist in this kind 203 of analysis, to support or disprove our expectations. It seems that the 204 newborn TACLe Cost Action (Timing Analysis on Code Level) would be the 205 best framework to achieve this improvement. 206 In the case where our 207 technique remains promising, the next long term improvement would consist in 208 integrating in the FramaC plugin adhoc analysis and combinations of analysis 209 that would augment the coverage of the efficiency of the cost estimation 210 techniques. 211 212 \subsection{Static analysis of costs of basic blocks} 213 At the beginning of the project we have deliberately decided to focus our 214 attention on the control flow preservation, the cost model propagation and 215 the exploitation of the cost model induced on the high level code. For this 216 reason we have devoted almost no attention to the static analysis of basic 217 blocks. This was achieved by picking a very simple hardware architecture 218 (the 8051 microprocessor family) whose cost model is fully predictable and 219 compositional: the cost of every instruction  except those that deal with 220 I/O  is constant, i.e. it does not depend on the machine status. 221 We do not regret this choice because, with the limited amount of man power 222 available in the project, it would have been difficult to also consider this 223 aspect. However, without showing if the approach can scale to most complex 224 architectures, our methodology remains of limited interest for the industry. 225 Therefore, the next important middle term improvement will be the extension 226 of our methodology to cover pipelines and simple caches. We will now present our 227 ideas on how we intend to address the problem. The obvious long term 228 improvement would be to take in consideration multicores system and complex 229 memory architectures like the ones currently in use in networks on chips. 230 The problem of execution time analysis for these systems is currently 231 considered extremely hard or even unfeasible and at the moments it seems 232 unlikely that our methodology could contribute to the solution of the problem. 233 234 \subsubsection{Static analysis of costs of basic blocks revisited} 235 We will now describe what currently seems to be the most interesting technique 236 for the static analysis of the cost of basic blocks in presence of complex 237 hardware architectures that do not have non compositional cost models. 238 239 We start presenting an idealized model of the execution of a generic 240 microprocessor (with caches) that has all interrupts disabled and no I/O 241 instructions. We then classify the models according to some properties of 242 their cost model. Then we show how to extend the labelling approach of CerCo 243 to cover models that are classified in a certain way. 244 245 \paragraph{The microprocessor model} 246 247 Let $\sigma,\sigma_1,\ldots$ range over $\Sigma$, the set of the fragments of 248 the microprocessor states that hold the program counter, the program status word and 249 all the data manipulated by the object code program, i.e. registers and 250 memory cells. We call these fragments the \emph{data states}. 251 252 Let $\delta,\delta_1,\ldots$ range over $\Delta$, the set of the 253 fragments of the microprocessor state that holds the \emph{internal state} of the 254 microprocessor (e.g. the content of the pipeline and caches, the status of 255 the branch prediction unit, etc.). 256 The internal state of the microprocessor influences the execution cost of the 257 next instruction, but it has no effect on the functional behaviour of the 258 processor. The whole state of the processor is represented by a pair 259 $(\sigma,\delta)$. 260 261 Let $I,I_1,\ldots$ range over $\mathcal{I}$, the 262 the set of \emph{instructions} of the processor and let 263 $\gamma,\gamma_1,\ldots$ range over $\Gamma$, the set of \emph{operands} 264 of instructions after the fetching and decoding passes. 265 Thus a pair $(I,\gamma)$ represents a \emph{decoded instruction} and already contains 266 the data required for execution. Execution needs to access the data state only 267 to write the result. 268 269 Let $fetch: \Sigma \to \mathcal{I} \times \Gamma$ be the function that 270 performs the fetching and execution phases, returning the decoded instruction 271 ready for execution. This is not meant to be the real fetchdecode function, 272 that exploits the internal state too to speed up execution (e.g. by retrieving 273 the instruction arguments from caches) and that, in case of pipelines, works 274 in several stages. However, such a function exists and 275 it is observationally equivalent to the real fetchdecode. 276 277 We capture the semantics of the microprocessor with the following set of 278 functions: 279 \begin{itemize} 280 \item The \emph{functional transition} function $\longrightarrow : \Sigma \to \Sigma$ 281 over data states. This is the only part of the semantics that is relevant 282 to functional analysis. 283 \item The \emph{internal state transition} function 284 $\Longrightarrow : \Sigma \times \Delta \to \Delta$ that updates the internal 285 state. 286 \item The \emph{cost function} $K: \mathcal{I} \times \Gamma \times \Delta \to \mathbb{N}$ 287 that assigns a cost to transitions. Since decoded instructions hold the data 288 they act on, the cost of an instruction may depend both on the data being 289 manipulated and on the internal state. 290 \end{itemize} 291 292 Given a processor state $(\sigma,\delta)$, the processor evolves in the 293 new state $(\sigma',\delta')$ in $n$ cost units if 294 $\sigma \longrightarrow \sigma'$ and $(\sigma,\delta) \Longrightarrow \delta'$ 295 and $fetch(\sigma) = (I,\gamma)$ and $K(I,\gamma,\delta) = n$. 296 297 An \emph{execution history} is a stream of states and transitions 298 $\sigma_0 \longrightarrow \sigma_1 \longrightarrow \sigma_2 \ldots$ 299 that can be either finite or infinite. Given an execution history, the 300 corresponding \emph{execution path} is the stream of program counters 301 obtained from the execution history by forgetting all the remaining information. 302 The execution path of the history $\sigma_0 \longrightarrow \sigma_1 \longrightarrow \sigma_2 \ldots$ is $pc_0,pc_1,\ldots$ where each $pc_i$ is the program 303 counter of $\sigma_i$. We denote the set of finite execution paths with $EP$. 304 305 We claim this simple model to be generic enough to cover real world 306 architectures. 307 308 \paragraph{Classification of cost models} 309 A cost function is \emph{exact} if it assigns to transitions the real cost 310 incurred. It is \emph{approximated} if it returns an upper bound of the real 311 cost. 312 313 A cost function is \emph{operand insensitive} if it does not depend on the 314 operands of the instructions to be executed. Formally, $K$ is operand insensitive 315 if there exists a $K': \mathcal{I} \times \Delta \to \mathbb{N}$ such that 316 $K(I,\gamma,\delta) = K'(I,\delta)$. In this case, with an abuse of terminology, 317 we will identify $K$ with $K'$. 318 319 The cost functions of simple hardware architectures, in particular RISC ones, 320 are naturally operand insensitive. In the other cases an exact operand sensitive 321 cost function can always be turned into an approximated operand insensitive one 322 by taking $K'(I,\delta) = \max\{K(I,\gamma,\delta)~~\gamma \in \Gamma\}$. 323 The question when one performs these approximation is how severe the 324 approsimation is. A measure is given by the \emph{jitter}, which is defined 325 as the difference between the best and worst cases. In our case, the jitter 326 of the approximation $K'$ would be $\max\{K(I,\gamma,\delta)~~\gamma \in \Gamma\}  \min\{K(I,\gamma,\delta)~~\gamma \in \Gamma\}$. According to experts 327 of WCET analysis, the jitters relative to operand sensitivity in modern 328 architectures are small enough to make WCET estimations still useful. 329 Therefore, in the sequel we will focus on operand insensitive cost models only. 330 331 Note that cost model that are operand insensitive may still have significant 332 dependencies over the data manipulated by the instructions, because of the 333 dependency over internal states. For example, an instruction that reads data 334 from the memory may change the state of the cache and thus greatly affect the 335 execution time of successive instructions. 336 337 Nevertheless, operand insensitivity is an important property for the labelling 338 approach. In~\cite{tranquilli} we introduced \emph{dependent labels} and 339 \emph{dependent costs}, which are the possibility of assigning costs to basic 340 blocks of instructions which are also dependent on the state of the high level 341 program at the beginning of the block. The idea we will now try to pursue is 342 to exploit dependent costs to capture cost models that are sensitive to 343 the internal states of the microprocessor. Operand sensitivity, however, is 344 a major issue in presence of dependent labels: to lift a data sensitive cost 345 model from the object code to the source language, we need a function that 346 maps high level states to the operands of the instructions to be executed, 347 and we need these functions to be simple enough to allow reasoning over them. 348 Complex optimizations performed by the compiler, however, make the mappings 349 extremely cumbersomes and history dependent. Moreover, keeping 350 track of status transformations during compilation would be a significant 351 departure from classical compilation models which we are not willing to 352 undertake. By assuming or removing operand sensitivity, we get rid of part 353 of the problem: we only need to make our costs dependent on the internal 354 state of the microprocessor. The latter, however, is not at all visible 355 in the high level code. Our next step is to make it visible. 356 357 In general, the value of the internal state at a certain point in the program 358 history is affected by all the preceding history. For instance, the 359 pipeline stages either hold operands of instructions in execution or bubbles 360 \footnote{A bubble is formed in the pipeline stage $n$ when an instruction is stucked in the pipeline stage $n1$, waiting for some data which is not available yet.}. The execution history contains data states that in turn contain the 361 object code data which we do not know how to relate simply to the source code 362 data. We therefore introduce a new classification. 363 364 A \emph{view} over internal states is a pair $(\mathcal{V},.)$ where 365 $\mathcal{V}$ is a finite non empty set and $.:\Delta \to \mathcal{V}$ is 366 a forgetful function over internal states. 367 368 The operand insensitive cost function $K$ is \emph{dependent on the view 369 $\mathcal{V}$} if there exists a $K': \mathcal{I} \times \mathcal{V} \to \mathbb{N}$ such that $K(I,\delta) = K'(I,\delta)$. In this case, with an abuse of terminology, we will identify $K$ with $K'$. 370 371 Among the possible views, the ones that we will easily be able to work 372 with in the labelling approach are the \emph{execution history dependent views}. 373 A view $(\mathcal{V},.)$ is execution history dependent with a lookahead 374 of length $n$ when there exists 375 a transition function $\hookrightarrow : PC^n \times \mathcal{V} \to \mathcal{V}$ 376 such that for all $(\sigma,\delta)$ and $pc_1,\ldots,pc_n$ such that every 377 $pc_i$ is the program counter of $\sigma_i$ defined by $\sigma \longrightarrow^i \sigma_i$, we have $(\sigma,\delta) \Longrightarrow \delta'$ iff 378 $((pc_1,\ldots,pc_n),\delta) \hookrightarrow \delta'$. 379 380 Less formally, a view is 381 dependent on the execution history if the evolution of the views is fully 382 determined by the evolution of the program counters. To better understand 383 the definition, consider the case where the next instruction to be executed 384 is a conditional jump. Without knowing the values of the registers, it is 385 impossible to determine if the true or false branches will be taken. Therefore 386 it is likely to be impossible to determine the value of the view the follows 387 the current one. On the other hand, knowing the program counter that will be 388 reached executing the conditional branch, we also know which 389 branch will be taken and this may be sufficient to compute the new view. 390 Lookaheads longer then 1 will be used in case of pipelines: when executing 391 one instruction in a system with a pipeline of length $n$, the internal state 392 of the pipeline already holds information on the next $n$ instructions to 393 be executed. 394 395 The reference to execution historys in the names is due to the following 396 fact: every execution history dependent transition function $\hookrightarrow$ 397 can be lifted to the type $EP \times \mathcal{V} \to \mathcal{V}$ by folding 398 the definition over the path trace: 399 $((pc_0,\ldots,pc_m),v_0) \hookrightarrow v_n$ iff for all $i \leq m  n$, 400 $((pc_i,\ldots,pc_{i+n}),v_i) \hookrightarrow v_{i+1}$. 401 Moreover, the folding is clearly associative: 402 $(\tau_1 @ \tau_2, v) \hookrightarrow v''$ iff 403 $(\tau_1,v) \hookrightarrow v'$ and $(\tau_2,v') \hookrightarrow v''$. 404 405 As a final definition, we say that a cost function $K$ is 406 \emph{data independent} if it is dependent on a view that is execution path 407 dependent. In two paragraphs we will show how we can extend the 408 labelling approach to deal with data independent cost models. 409 410 Before that, we show that the class of data independent cost functions 411 is not too restricted to be interesting. In particular, at least simple 412 pipeline models admit data independent cost functions. 413 414 \paragraph{A data independent cost function for simple pipelines} 415 We consider here a simple model for a pipeline with $n$ stages without branch 416 prediction and hazards. We also assume that the actual value of the operands 417 of the instruction that is being read have no influence on stalls (i.e. the 418 creation of bubbles) nor on the execution cost. The type of operands, however, 419 can. For example, reading the value 4 from a register may stall a pipeline 420 if the register has not been written yet, while reading 4 from a different 421 register may not stall the pipeline. 422 423 The internal states $\Delta$ of the pipeline are $n$tuples of decoded 424 instructions or bubbles: 425 $\Delta = (\mathcal{I} \times \Gamma \cup \mathbbm{1})^n$. 426 This representation is not meant to describe the real data structures used 427 in the pipeline: in the implementation the operands are not present in every 428 stage of the pipeline, but are progressively fetched. A state 429 $(x_1,x_2,\ldots,(I,\gamma))$ represents the state of the pipeline just before 430 the completion of instruction $(I,\gamma)$. The first $n1$ instructions that 431 follow $I$ may already be stored in the pipeline, unless bubbles have delayed 432 one or more of them. 433 434 We introduce the following view over internal states: 435 $(\{0,1\}^n,.)$ where $\mathbb{N}_n = {0,\ldots,2^n1}$ and 436 $(x_1,\ldots,x_n) = (y_1,\ldots,y_n)$ where $y_i$ is 1 iff $x_i$ is a bubble. 437 Thus the view only remembers which stages of the pipelines are stuck. 438 The view holds enough information to reconstruct the internal state given 439 the current data state: from the data state we can fetch the program counter 440 of the current and the next $n1$ instructions and, by simulating at most 441 $n$ execution steps and by knowing where the bubbles were, we can fill up 442 the internal state of the pipeline. 443 444 The assumptions on the lack of influence of operands values on stalls and 445 execution times ensures the existence of the data independent cost function 446 $K: PC \times \{0,1\}^n \to \mathbb{N}$. The transition function for a 447 pipeline with $n$ stages may require $n$ lookaheads: 448 $\hookrightarrow : PC^n \times \{0,1\}^n \to \{0,1\}^n$. 449 450 While the model is a bit simplicistic, it can nevertheless be used to describe 451 existing pipelines. It is also simple to be convinced that the same model 452 also captures static branch prediction: speculative execution of conditional 453 jumps is performed by always taking the same branch which does not depend on 454 the execution history. In order to take in account jumpt predictions based on 455 the execution history, we just need to incorporate in the status and the view 456 statistical informations on the last executions of the branch. 457 458 \paragraph{The labelling approach for data independent cost models} 459 We now describe how the labelling approach can be slightly modified to deal 460 with a data independent cost model $(\hookrightarrow,K)$ built over 461 $(\mathcal{V},.)$. 462 463 In the labelling approach, every basic block in the object code is identified 464 with a unique label $L$ which is also associated to the corresponding 465 basic block in the source level. Let us assume that labels are also inserted 466 after every function call and that this property is preserved during 467 compilation. Adding labels after calls makes the instrumented code heavier 468 to read and it generates more proof obligations on the instrumented code, but 469 it does not create any additional problems. The preservation during compilation 470 creates some significant technical complications in the proof of correctness 471 of the compiler, but those can be solved. 472 473 The static analysis performed in the last step of the basic labelling approach 474 analyses the object code in order to assign a cost to every label or, 475 equivalently, to every basic block. The cost is simply the sum of the cost 476 of very instruction in the basic block. 477 478 In our scenario, instructions no longer have a cost, because the cost function 479 $K$ takes in input a program counter but also a view $v$. Therefore we 480 replace the static analysis with the computation, for every basic block and 481 every $v \in \mathcal{V}$, of the sum of the costs of the instructions in the 482 block, starting in the initial view $v$. Formally, let the sequence of the 483 program counters of the basic block form the execution path $pc_0,\ldots,pc_n$. 484 The cost $K(v_0,L)$ associated to the block labelled with 485 $L$ and the initial view $v_0$ is 486 $K(pc_0,v_0) + K(pc_1,v_1) + \ldots + K(pc_n,v_n)$ where for every $i < n$, 487 $((pc_i,\ldots,pc_{i+l}),v_i) \hookrightarrow v_{k+1}$ where $l$ is the 488 lookahead required. When the lookahead requires program counters outside the 489 block under analysis, we are free to use dummy ones because the parts of the 490 view that deal with future behaviour have no impact on the cost of the 491 previous operations by assumption. 492 493 The static analysis can be performed 494 in linear time in the size of the program because the cardinality of the sets 495 of labels (i.e. the number of basic blocks) is bounded by the size of the 496 program and because the set $V$ is finite. In the case of the pipelines of 497 the previous paragraph, the static analysis is $2^n$ times more expensive 498 than the one of the basic labelling approach, where $n$ is the number of 499 pipeline stages. 500 501 The first imporant theorem in the labelling approach is the correctness of the 502 static analysis: if the (dependent) cost associated to a label $L$ is $k$, 503 then executing a program from the beginning of the basic block to the end 504 of the basic block should take exactly $k$ cost units. The proof only relies 505 on associativity and commutativity of the composition of costs. Commutativity 506 is only required if basic blocks can be nested, i.e. if a basic block does not 507 terminate when it reaches a call, but it continues after the called function 508 returns. By assuming to have a cost label after each block, we do not need 509 commutativity any longer, which does not hold for the definition of $K$ we 510 just gave. The reason is that, if $pc_i$ is a function call executed in 511 the view (state) $v_i$, it is not true that, after return, the state 512 will be $v_i+1$ defined by $(pc_i,pc_{i+1},v_i) \hookrightarrow v_{i+1}$ 513 (assuming a lookahead of 1, which is already problematic). 514 Indeed $pc_{i+1}$ is the program counter of the instruction that follows 515 the call, whereas the next program counter to be reached is the one of the 516 body of the call. Moreover, even if the computation would make sense, 517 $v_{i+1}$ would be the state at the beginning of the execution of the body of 518 the call, while we should know the state after the function returns. 519 The latter cannot be 520 statically predicted. That's why we had to impose labels after calls. 521 Associativity, on the other hand, trivially holds. Therefore the proof 522 of correctness of the static analysis can be reused without any change. 523 524 So far, we have computed the dependent cost $K: \mathcal{V} \times \mathcal{L} \to \mathbb{N}$ that associates a cost to basic blocks and views. 525 The second step consists in statically computing the transition function 526 $\hookrightarrow : \mathcal{L} \times \mathcal{L} \times \mathcal{V} \to \mathcal{V}$ that 527 associates to each pair of consecutively executed basic blocks and input view 528 the view obtained at the end of the execution of the first block. 529 530 The definition is the following: 531 $(L,L',v) \hookrightarrow v'$ iff $((pc_0,\ldots,pc_n,pc'_0,\ldots,pc'_m),v) \hookrightarrow v'$ where $(pc_0,\ldots,pc_n)$ are the program counters of the block labelled by $L$ and $(pc'_0,\ldots,pc'_m)$ are those of the block labelled 532 with $L'$. We assume here that $m$ is always longer or equal to the lookahead 533 required by the transition function $\hookrightarrow$ over execution paths. 534 When this is not the case we could make the new transition function take in 535 input a longer lookahead of labels. Or we may assume to introduce enough 536 \texttt{NOP}s at the beginning of the block $L'$ to enforce the property. 537 In the rest of the paragraph we assume to have followed the second approach 538 to simplify the presentation. 539 540 The extended transition function over labels is not present in the basic labelling approach. Actually, the basic labelling 541 approach can be understood as the generalized approach where the view $\mathcal{V} = \mathbbm{1}$. 542 The computation of the extended $\hookrightarrow$ transition function 543 is again linear in the size of the program. 544 545 Both the versions of $K$ and $\hookrightarrow$ defined over labels can be 546 lifted to work over traces by folding them over the list of labels in the 547 trace: for $K$ we have $K((L_1,\ldots,L_n),v) = K(L_1,v) + K((L_2,\ldots,L_n),v')$ where 548 $(L_1,L_2,v) \hookrightarrow v'$; for $\hookrightarrow$ we have 549 $((L_1,\ldots,L_n),v) \hookrightarrow v''$ iff $(L_1,L_2,v) \hookrightarrow v'$ and $((L_2,\ldots,L_n),v') \hookrightarrow v''$. The two definitions are also 550 clearly associative. 551 552 The second main theorem of the labelling approach is trace preservation: 553 the trace produced by the object code is the same as the trace produced by 554 the source code. Without any need to change the proof, we immediately obtain 555 as a corollary that for every view $v$, the cost $K(\tau,v)$ computed from 556 the source code trace $\tau$ is the same than the cost $K(\tau,v)$ computed 557 on the object code trace, which is again $\tau$. 558 559 The final step of the labelling approach is source code instrumentation. 560 In the basic labelling approach it consists in adding a global variable 561 \texttt{\_\_cost}, initialized with 0, which is incremented at the beginning 562 of every basic block with the cost of the label of the basic block. 563 Here we just need a more complex instrumentation that keeps track of the 564 values of the views during execution: 565 \begin{itemize} 566 \item We define three global variables \texttt{\_\_cost}, initialized at 0, 567 \texttt{\_\_label}, initialized with \texttt{NULL}, and 568 \texttt{\_\_view}, uninitialized. 569 \item At the beginning of every basic block labelled by $L$ we add the 570 following code fragment: 571 572 \begin{tabular}{l} 573 \texttt{\_\_view = \_\_next(\_\_label,}$L$\texttt{,\_\_view);}\\ 574 \texttt{\_\_cost += }$K(\mbox{\texttt{\_\_view}},L)$\texttt{;}\\ 575 \texttt{\_\_label = }$L$\texttt{;} 576 \end{tabular} 577 578 where \texttt{\_\_next(}$L_1,L_2,v$\texttt{)} = $v'$ iff 579 $(L_1,L_2,v) \hookrightarrow v'$ unless $L_1$ is \texttt{NULL}. 580 In that case \texttt{(\_\_next(NULL,}$L$\texttt{)} = $v_0$ where 581 $v_0 = \delta_0$ and $\delta_0$ is the initial value of the internal stat 582 at the beginning of program execution. 583 584 The first line of the code fragment computes the view at the beginning of 585 the execution of the block from the view at the end of the previous block. 586 Then we update the cost function with the cost of the block. Finally we 587 remember the current block to use it for the computation of the next view 588 at the beginning of the next block. 589 \end{itemize} 590 591 \paragraph{An example of instrumentation in presence of a pipeline} 592 In Figure~\ref{factprogi} we show how the instrumentationof a program 593 that computes the factorial of 10 would look like in presence of a pipeline. 594 The instrumentation has been manually produced. The \texttt{\_\_next} 595 function says that the body of the internal loop of the \texttt{fact} 596 function can be executed in two different internal states, summarized by the 597 views 2 and 3. The view 2 holds at the beginning of even iterations, 598 while the view 3 holds at the beginning of odd ones. Therefore even and odd 599 iterations are assigned a different cost. Also the code after the loop can 600 be executed in two different states, depending on the oddness of the last 601 loop iteration. 602 603 \begin{figure}[t] 604 \begin{tiny} 605 \begin{verbatim} 606 int fact (int n) { 607 int i, res = 1; 608 for (i = 1 ; i <= n ; i++) res *= i; 609 return res; 610 } 611 612 int main () { 613 return (fact(10)); 614 } 615 \end{verbatim} 616 \end{tiny} 617 \caption{A simple program that computes the factorial of 10.\label{factprog}} 618 \end{figure} 619 620 The definitions of \texttt{\_\_next} and \texttt{\_\_K} are just examples. 621 For instance, it is possible as well that each one of the 10 iterations 622 is executed in a different internal state. 623 624 \begin{figure}[t] 625 \begin{tiny} 626 \begin{verbatim} 627 int __cost = 8; 628 int __label = 0; 629 int __view; 630 631 void __cost_incr(int incr) { 632 __cost = __cost + incr; 633 } 634 635 int __next(int label1, int label2, int view) { 636 if (label1 == 0) return 0; 637 else if (label1 == 0 && label2 == 1) return 1; 638 else if (label1 == 1 && label2 == 2) return 2; 639 else if (label1 == 2 && label2 == 2 && view == 2) return 3; 640 else if (label1 == 2 && label2 == 2 && view == 3) return 2; 641 else if (label1 == 2 && label2 == 3 && view == 2) return 1; 642 else if (label1 == 2 && label2 == 3 && view == 3) return 0; 643 else if (label1 == 3 && label2 == 4 && view == 0) return 0; 644 else if (label1 == 3 && label2 == 4 && view == 1) return 0; 645 } 646 647 int __K(int view, int label) { 648 if (view == 0 && label == 0) return 3; 649 else if (view == 1 && label == 1) return 14; 650 else if (view == 2 && label == 2) return 35; 651 else if (view == 3 && label == 2) return 26; 652 else if (view == 0 && label == 3) return 6; 653 else if (view == 1 && label == 3) return 8; 654 else if (view == 0 && label == 4) return 6; 655 } 656 657 int fact(int n) 658 { 659 int i; 660 int res; 661 __view = __next(__label,1,__view); __cost_incr(_K(__view,1)); __label = 1; 662 res = 1; 663 for (i = 1; i <= n; i = i + 1) { 664 __view = __next(__label,2,__view); __cost_incr(__K(__view,2)); __label = 2; 665 res = res * i; 666 } 667 __view = __next(__label,3,__view); __cost_incr(K(__view,3)); __label = 3; 668 return res; 669 } 670 671 int main(void) 672 { 673 int t; 674 __view = __next(__label,0,__view); __cost_incr(__K(__view,0)); __label = 0; 675 t = fact(10); 676 __view = __next(__label,4,__view); __cost_incr(__K(__view,4)); __label = 4; 677 return t; 678 } 679 \end{verbatim} 680 \end{tiny} 681 \caption{The instrumented version of the program in Figure~\ref{factprog}.\label{factprogi}} 682 \end{figure} 683 684 \paragraph{Considerations on the instrumentation} 685 The example of instrumentation in the previous paragraph shows that the 686 approach we are proposing exposes at the source level a certain amount 687 of information about the machine behavior. Syntactically, the additional 688 details, are almost entirely confined into the \texttt{\_\_next} and 689 \texttt{\_\_K} functions and they do not affect at all the functional behaviour 690 of the program. In particular, all invariants, proof obligations and proofs 691 that deal with the functional behavior only are preserved. 692 693 The interesting question, then, is what is the impact of the additional 694 details on non functional (intensional) invariants and proof obligations. 695 At the moment, without a working implementation to perform some large scale 696 tests, it is difficult to understand the level of automation that can be 697 achieved and the techniques that are likely to work better without introducing 698 major approximations. In any case, the preliminary considerations of the 699 project remain valid: 700 \begin{itemize} 701 \item The task of computing and proving invariants can be simplified, 702 even automatically, trading correctness with precision. For example, the 703 most aggressive approximation simply replaces the cost function 704 \texttt{\_\_K} with the function that ignores the view and returns for each 705 label the maximum cost over all possible views. Correspondingly, the 706 function \texttt{\_\_next} can be dropped since it returns views that 707 are later ignored. 708 709 A more refined possibility consists in approximating the output only on 710 those labels whose jitter is small or for those that mark basic blocks 711 that are executed only a small number of times. By simplyfing the 712 \texttt{\_\_next} function accordingly, it is possible to considerably 713 reduce the search space for automated provers. 714 \item The situation is not worse than what happens with time analysis on 715 the object code (the current state of the art). There it is common practice 716 to analyse larger chunks of execution to minimize the effect of later 717 approximations. For example, if it is known that a loop can be executed at 718 most 10 times, computing the cost of 10 iterations yields a 719 better bound than multiplying by 10 the worst case of a single interation. 720 721 We clearly can do the same on the source level. 722 More generally, every algorithm that works in standard WCET tools on the 723 object code is likely to have a counterpart on the source code. 724 We also expect to be able to do better working on the source code. 725 The reason is that we assume to know more functional properties 726 of the program and more high level invariants, and to have more techniques 727 and tools at our disposal. Even if at the moment we have no evidence to 728 support our claims, we think that this approach is worth pursuing in the 729 long term. 730 \end{itemize} 731 732 \paragraph{The problem with caches} 733 Cost models for pipelines  at least simple ones  are data independent, 734 i.e. they are dependent on a view that is execution path dependent. In other 735 words, the knowledge about the sequence of executed instructions is sufficient 736 to predict the cost of future instructions. 737 738 The same property does not hold for caches. The cost of accessing a memory 739 cell strongly depends on the addresses of the memory cells that have been read 740 in the past. In turn, the accessed addresses are a function of the low level 741 data state, that cannot be correlated to the source program state. 742 743 The strong correlation between the internal state of caches and the data 744 accessed in the past is one of the two main responsibles for the lack of 745 precision of static analysis in modern unicore architectures. The other one 746 is the lack of precise knowledge on the real behaviour of modern hardware 747 systems. In order to overcome both problems, that Cazorla\&alt.~\cite{proartis} 748 call the ``\emph{Timing Analysis Walls}'', the PROARTIS European Project has 749 proposed to design ``\emph{a hardware/software architecture whose execution 750 timing behaviour eradicates dependence on execution history}'' (\cite{proartis}, Section 1.2). The statement is obviously too strong. What is concretely 751 proposed by PROARTIS is the design of a hardware/software architecture whose 752 execution timing is \emph{execution path dependent} (our terminology). 753 754 We have already seen that we are able to accomodate in the labelling approach 755 cost functions that are dependent on views that are execution path dependent. 756 Before fully embracing the PROARTIS vision, 757 we need to check if there are other aspects of the PROARTIS proposal 758 that can be problematic for CerCo. 759 760 \paragraph{Static Probabilistic Time Analysis} 761 The approach of PROARTIS to achieve execution path dependent cost models 762 consists in turning the hardtoanalyze deterministic hardware components 763 (e.g. the cache) into probabilistic hardware components. Intuitively, 764 algorithms that took decision based on the program history now throw a dice. 765 The typical example which has been thoroughly studied in 766 PROARTIS~\cite{proartis2} is that of caches. There the proposal is to replace 767 the commonly used deterministic placement and replacement algorithms (e.g. LRU) 768 with fully probabilistic choices: when the cache needs to evict a page, the 769 page to be evicted is randomly selected according to the uniform distribution. 770 771 The expectation is that probabilistic hardware will have worse performances 772 in the average case, but it will exhibit the worst case performance only with 773 negligible probability. Therefore, it becomes no longer interesting to estimate 774 the actual worst case bound. What becomes interesting is to plot the probability 775 that the execution time will exceed a certain treshold. For all practical 776 purposes, a program that misses its deadline with a negligible probability 777 (e.g. $10^9$ per hour of operation) will be perfectly acceptable when deployed 778 on an hardware system (e.g. a car or an airplane) that is already specified in 779 such a way. 780 781 In order to plot the probability distribution of execution times, PROARTIS 782 proposes two methodologies: Static Probabilistic Time Analysis (SPTA) and 783 Measurement Based Probabilistic Time Analysis (MBPTA). The first one is 784 similar to traditional static analysis, but it operates on probabilistic 785 hardware. It is the one that we would like to embrace. The second one is 786 based on measurements and it is justified by the following assumption: if 787 the probabilities associated to every hardware operation are all independent 788 and identically distributed, then measuring the fime spent on full runs of 789 subsystems components yields a probabilistic estimate that remains valid 790 when the subsystem is plugged in a larger one. Moreover, the probabilistic 791 distribution of past runs must be equal to the one of future runs. 792 793 We understand that MBPTA is useful to analyze closed (sub)systems whose 794 functional behavior is deterministic. It does not seem to have immediate 795 applications to parametric time analysis, which we are interested in. 796 Therefore we focus on SPTA. 797 798 According to~\cite{proartis}, 799 ``\emph{in SPTA, execution time probability distributions for individual operations \ldots are determined statically 800 from a model of the processor. The design principles of PROARTIS 801 will ensure \ldots that the probabilities for the execution time of each 802 instruction are independent. \ldots SPTA is performed by calculating the 803 convolution ($\oplus$) of the discrete probability distributions which describe 804 the execution time for each instruction on a CPU; this provides a probability 805 distribution \ldots representing the timing behaviour of the entire sequence of 806 instructions.}'' 807 808 We will now analyze to what extend we can embrace SPTA in CerCo. 809 810 \paragraph{The labelling approach for Static Probabilistic Time Analysis} 811 812 To summarize, the main practical differences between standard static time 813 analysis and SPTA are: 814 \begin{itemize} 815 \item The cost functions for single instructions or sequences of instructions 816 no longer return a natural numbers (number of cost units) but integral 817 random variables. 818 \item Cost functions are extended from single instructions to sequences of 819 instructions by using the associative convolution operator $\oplus$ 820 \end{itemize} 821 822 By reviewing the papers that described the labelling approach, it is easy to 823 get convinced that the codomain of the cost analysis can be lifted from 824 that of natural numbers to any group. Moreover, by imposing labels after 825 every function call, commutativity can be dropped and the approach works on 826 every monoid (usually called \emph{cost monoids} in the litterature). 827 Because random variables and convolutions form a monoid, we immediately have 828 that the labelling approach extends itself to SPTA. The instrumented code 829 produced by an SPTACerCo compiler will then have random variables (on a finite 830 domain) as costs and convolutions in place of the \texttt\{\_\_cost\_incr\} 831 function. 832 833 What is left to be understood is the way to state and compute the 834 probabilistic invariants to do \emph{parametric SPTA}. Indeed, it seems that 835 PROARTIS only got interested into non parametric PTA. For example, it is 836 well known that actually computing the convolutions results in an exponential 837 growth of the memory required to represent the result of the convolutions. 838 Therefore, the analysis should work symbolically until the moment where 839 we are interested into extracting information from the convolution. 840 841 Moreover, assuming that the problem of computing invariants is solved, 842 the actual behavior of automated theorem 843 provers on probabilistic invariants is to be understood. It is likely that 844 a good amount of domain specific knowledge about probability theory must be 845 exploited and incorporated into automatic provers to achieve concrete results. 846 847 Parametric SPTA using the methodology developed in CerCo is a future research 848 direction that we believe to be worth exploring in the middle and long term. 849 850 \paragraph{Static Probabilistic Time Analysis for Caches in CerCo} 851 852 As a final remark, we note that the analysis in CerCo of systems that implement 853 probabilistic caches requires a combination of SPTA and data independent cost 854 models. The need for a probabilistic analysis is obvious but, as we saw in 855 the previous paragraph, it requires no modification of the labelling approach. 856 857 In order to understand the need for dependent labelling (to work on data 858 independent cost functions), we need to review the behaviour of probabilistic 859 caches as proposed by PROARTIS. The interested reader can 860 consult~\cite{proartis2articolo30} for further informations. 861 862 In a randomized cache, the probability of evicting a given line on every access 863 is $1/N$ where $N$ stands for the number of cache entries. Therefore the 864 hit probability of a specific access to such a cache is 865 $P(hit) = (\frac{N1}{N})^K$ where $K$ is the number of cache misses between 866 two consecutive accesses to the same cache entry. For the purposes of our 867 analysis, we must assume that every cache access could cause an eviction. 868 Therefore, we define $K$ (the \emph{reuse distance}) to be the number of 869 memory accesses between two consecutive accesses to the same cache entry, 870 including the access for which we are computing $K$. In order to compute 871 $K$ for every code memory address, we need to know the execution path (in 872 our terminology). In other words, we need a view that records for each 873 cache entry the number of memory accesses that has occurred since the last 874 access. 875 876 For pipelines with $n$ stages, the number of possible views is limited to 877 $2^n$: a view can usually just be represented by a word. This is not the 878 case for the views on caches, which are in principle very large. Therefore, 879 the dependent labelling approach for data indepedent cost functions that we 880 have presented here could still be unpractical for caches. If that turns out 881 to be the case, a possible strategy is the use of abstract interpretations 882 techniques on the object code to reduce the size of views exposed at the 883 source level, at the price of an early loss of precision in the analysis. 884 885 More research work must be performed at the current stage to understand if 886 caches can be analyzed, even probabilistically, using the CerCo technology. 887 This is left for future work and it will be postponed after the work on 888 pipelines. 889 890 \paragraph{Conclusions} 891 At the current state of the art functional properties of programs are better 892 proved high level languages, but the non functional ones are proved on the 893 corresponding object code. The non functional analysis, however, depends on 894 functional invariants, e.g. to bound or correlate the number of executions of 895 cycles. 896 897 The aim of the CerCo project is to reconcile the two analysis by performing 898 non functional analysis on the source code. This requires computing a cost 899 model on the object code and reflecting the cost model on the source code. 900 We achieve this in CerCo by designing a certified Cost Annotating Compiler that 901 keeps tracks of transformations of basic blocks, in order to create a 902 correspondence (not necessaritly bijection) between the basic blocks of the 903 source and target language. We then prove that the sequence of basic blocks 904 that are met in the source and target executions is correlated. Then, 905 we perform a static analysis of the cost of basic blocks on the target language 906 and we use it to compute the cost model for the source language basic blocks. 907 Finally, we compute cost invariants on the source code from the inferred cost 908 model and from the functional program invariants; we generate proof obligations 909 for the invariants; we use automatic provers to try to close the proof 910 obligations. 911 912 The cost of single instructions on modern architectures depend on the internal 913 state of various hardware components (pipelines, caches, branch predicting 914 units, etc.). The internal states are determined by the previous execution 915 history. Therefore the cost of basic blocks is parametric on the execution 916 history, which means both the instructions executed and the data manipulated 917 by the instructions. The CerCo approach is able to correlate the sequence 918 of blocks of source instructions with the sequence of blocks of target 919 instructions. It does not correlate the high level and the low level data. 920 Therefore we are not able in the general case to lift a cost model parametric 921 on the execution history on the source code. 922 923 To overcome the problem, we have identified a particular class of cost models 924 that are not dependent on the data manipulated. We argue that the CerCo 925 approach can cover this scenario by exposing in the source program a finite 926 data type of views over internal machine states. The costs of basic blocks 927 is parametric on these views, and the current view is updated at basic block 928 entry according to some abstraction of the machine hardware that does not 929 need to be understood. Further studies are needed to understand how invariant 930 generators and automatic provers can cope with these updates and parametric 931 costs. 932 933 We have argued how pipelines, at least simple ones, are captured by the 934 previous scenario and can in principle be manipulated using CerCo tools. 935 The same is not true for caches, whose behaviour deeply depends on the 936 data manipulated. By embracing the PROARTIS proposal of turning caches into 937 probabilistic components, we can break the data dependency. Nevertheless, 938 cache analysis remains more problematic because of the size of the views. 939 Further studies need to be focused on caches to understand if the problem of 940 size of the views can be tamed in practice without ruining the whole approach. 119 \includepdf[pages={}]{pipelines.pdf} 941 120 942 121 \end{document}
Note: See TracChangeset
for help on using the changeset viewer.