Changeset 1790 for Deliverables
 Timestamp:
 Feb 28, 2012, 4:35:26 PM (8 years ago)
 File:

 1 edited
Legend:
 Unmodified
 Added
 Removed

Deliverables/D1.2/CompilerProofOutline/outline.tex
r1788 r1790 78 78 \begin{LARGE} 79 79 \bf 80 Proof outline for the correctness of 80 Proof outline for the correctness of\\the CerCo compiler 81 81 \end{LARGE} 82 82 \end{center} … … 93 93 \begin{large} 94 94 Main Authors:\\ 95 B. Campbell, D. Mulligan, P. Tranquilli, C. Sacerdoti Coen95 J. Boender, B. Campbell, D. Mulligan, P. Tranquilli, C. Sacerdoti Coen 96 96 \end{large} 97 97 \end{center} … … 110 110 \label{sect.introduction} 111 111 112 In the last project review of the CerCo project, the project reviewers 113 recommended us to quickly outline a paperandpencil correctness proof 114 for each of the stages of the CerCo compiler in order to allow for an 115 estimation of the complexity and time required to complete the formalization 116 of the proof. This has been possible starting from month 18 when we have 117 completed the formalization in Matita of the datastructures and code of 118 the compiler. 119 120 In this document we provide a very highlevel, penandpaper 121 sketch of what we view as the best path to completing the correctness proof 122 for the compiler. In particular, for every translation between two intermediate languages, in both the front and backends, we identify the key translation steps, and identify some invariants that we view as being important for the correctness proof. We sketch the overall correctness results, and also briefly describe the parts of the proof that have already 123 been completed at the end of the First Period. 124 125 In the last section we finally present an estimation of the effort required 126 for the certification in Matita of the compiler and we draw conclusions. 112 In the last project review of the CerCo project, the project reviewers recommended that we briefly outline a pencilandpaper correctness proof for each of the stages of the CerCo compiler in order to facilitate an estimation of the complexity and time required to complete the formalisation of the proof. 113 This has been possible starting from month eighteen, as we have now completed the formalisation in Matita of the data structures and code of the compiler. 114 115 In this document we provide a very highlevel, pencilandpaper sketch of what we view as the best path to completing the correctness proof for the compiler. 116 In particular, for every translation between two intermediate languages, in both the front and backends, we identify the key translation steps, and identify some invariants that we view as being important for the correctness proof. 117 We sketch the overall correctness results, and also briefly describe the parts of the proof that have already been completed at the end of the First Period. 118 119 Finally, in the last section we present an estimation of the effort required for the certification in Matita of the compiler and draw conclusions. 127 120 128 121 \section{Frontend: Clight to RTLabs} … … 139 132 meeting we intend to move this transformation to the backend}\\ 140 133 \> $\downarrow$ \> cost labelling\\ 141 \> $\downarrow$ \> loop optimi zations\footnote{\label{lab:opt2}To be ported from the untrusted compiler and certified only in case of early completion of the certification of the other passes.} (an endotransformation)\\134 \> $\downarrow$ \> loop optimisations\footnote{\label{lab:opt2}To be ported from the untrusted compiler and certified only in case of early completion of the certification of the other passes.} (an endotransformation)\\ 142 135 \> $\downarrow$ \> partial redundancy elimination$^{\mbox{\scriptsize \ref{lab:opt2}}}$ (an endotransformation)\\ 143 136 \> $\downarrow$ \> stack variable allocation and control structure … … 154 147 155 148 Here, by `endotransformation', we mean a mapping from language back to itself: 156 the loop optimi zation step maps the Clight language to itself.149 the loop optimisation step maps the Clight language to itself. 157 150 158 151 %Our overall statements of correctness with respect to costs will … … 357 350 \> $\downarrow$ \quad \= \kill 358 351 \textsf{RTLabs}\\ 359 \> $\downarrow$ \> copy propagation\footnote{\label{lab:opt}To be ported from the untrusted compiler and certified only in case ofearly completion of the certification of the other passes.} (an endotransformation) \\352 \> $\downarrow$ \> copy propagation\footnote{\label{lab:opt}To be ported from the untrusted compiler and certified only in the case of an early completion of the certification of the other passes.} (an endotransformation) \\ 360 353 \> $\downarrow$ \> instruction selection\\ 361 354 \> $\downarrow$ \> change of memory models in compiler\\ … … 380 373 381 374 \subsection{Graph translations} 382 RTLabs and most intermediate languages in the backend have a graph 383 representation: 384 the code of each function is represented by a graph of instructions. 385 The graph maps a set of labels (the names of the nodes) to the instruction 386 stored at that label (the nodes of the graph). 387 Instructions reference zero or more additional labels that are the immediate 388 successors of the instruction: zero for return from functions; more than one 389 for conditional jumps and calls; one in all other cases. The references 390 from one instruction to its immediates are the arcs of the graph. 391 392 Status of graph languages always have a program counter that holds a 393 representation of a reference to the current instruction. 394 395 A translation between two consecutive graph languages maps each instruction 396 stored at location $l$ in the first graph and with immediate successors 397 $\{l_1,\ldots,l_n\}$ to a subgraph of the output graph that has a single 398 entry point at location $l$ and exit arcs to $\{l_1,\ldots,l_n\}$. Moreover, 399 the labels of all non entry nodes in the subgraph are distinct from all the 400 labels in the source graph. 401 402 In order to simplify the translations and the relative proofs of forward 403 simulation, after the release of D4.2 and D4.3, we have provided: 375 RTLabs and most intermediate languages in the backend have a graph representation: the code of each function is represented by a graph of instructions. 376 The graph maps a set of labels (the names of the nodes) to the instruction stored at that label (the nodes of the graph). 377 Instructions reference zero or more additional labels that are the immediate successors of the instruction: zero for return from functions, more than one for conditional jumps and calls, one in all other cases. 378 The references from one instruction to its immediate successors are the arcs of the graph. 379 380 The statuses of graph languages always contain a program counter that holds a representation of a reference to the current instruction. 381 382 A translation between two consecutive graph languages maps each instruction stored at location $l$ in the first graph and with immediate successors $\{l_1,\ldots,l_n\}$ to a subgraph of the output graph that has a single entry point at location $l$ and exit arcs to $\{l_1,\ldots,l_n\}$. 383 Moreover, the labels of all nonentry nodes in the subgraph are distinct from all the labels in the source graph. 384 385 In order to simplify the translations and the relative proofs of forward simulation, after the release of D4.2 and D4.3, we have provided: 404 386 \begin{itemize} 405 \item a new data type (called \texttt{blist}) that represents a 406 sequence of instructions to be added to the output graph. 407 The ``b'' in the name stands for binder, since a \texttt{blist} is 408 either empty, an extension of a \texttt{blist} with an instruction 409 at the front, or the generation of a fresh quantity followed by a 410 \texttt{blist}. The latter feature is used, for instance, to generate 411 fresh register names. The instructions in the list are unlabelled and 412 all of them but the last one are also sequential, like in a linear program. 413 \item a new iterator (called \texttt{b\_graph\_translate}) of type 387 \item 388 A new data type (called \texttt{blist}) that represents a sequence of instructions to be added to the output graph. 389 The ``b'' in the name stands for binder, since a \texttt{blist} is either empty, an extension of a \texttt{blist} with an instruction at the front, or the generation of a fresh quantity followed by a \texttt{blist}. 390 The latter feature is used, for instance, to generate fresh register names. 391 The instructions in the list are unlabelled and all of them but the last one are also sequential, like in a linear program. 392 \item 393 A new iterator (called \texttt{b\_graph\_translate}) of type 414 394 \begin{displaymath} 415 395 \mathtt{b\_graph\_translate}: (\mathtt{label} \rightarrow \mathtt{blist}) 416 396 \rightarrow \mathtt{graph} \rightarrow \mathtt{graph} 417 397 \end{displaymath} 418 The iterator transform the input graph in the output graph by replacing 419 each node with the graph that corresponds to the linear \texttt{blist} 420 obtained by applying the function in input to the node label. 398 The iterator transform the input graph in the output graph by replacing each node with the graph that corresponds to the linear \texttt{blist} obtained by applying the function in input to the node label. 421 399 \end{itemize} 422 400 423 Using the iterator above, the code can be written in such a way that 424 the programmer does not see any distinction between writing a transformation 425 on linear or graph languages. 426 427 In order to prove simulations for translations obtained using the iterator, 428 we will prove the following theorem: 401 Using the iterator above, the code can be written in such a way that the programmer does not see any distinction between writing a transformatio on linear or graph languages. 402 403 In order to prove simulations for translations obtained using the iterator, we will prove the following theorem: 429 404 430 405 \begin{align*} … … 434 409 \end{align*} 435 410 436 Here \texttt{subgraph} is a computational predicate that given a \texttt{blist} 437 $[i_1, \ldots, i_n]$, an entry label $l$, an exit label $l'$ and a graph $G$ 438 expands to the fact that fetching from $G$ at address $l$ one retrieves a node 439 $i_1$ with a successor $l_1$ that, when fetched, yields a node $i_2$ with a 440 successor $l_2$ such that \ldots. The successor of $i_n$ is $l'$. 441 442 Proving a forward simulation diagram of the following kind using the theorem 443 above is now as simple as doing the same using standard small step operational 444 semantics over linear languages. 411 Here \texttt{subgraph} is a computational predicate that given a \texttt{blist} $[i_1, \ldots, i_n]$, an entry label $l$, an exit label $l'$ and a graph $G$ expands to the fact that fetching from $G$ at address $l$ one retrieves a node $i_1$ with a successor $l_1$ that, when fetched, yields a node $i_2$ with a successor $l_2$ such that \ldots. The successor of $i_n$ is $l'$. 412 413 Proving a forward simulation diagram of the following kind using the aforementioned theorem is now as straightforward as doing the same using standard smallstep operational semantics over linear languages. 445 414 446 415 \begin{align*} … … 451 420 \end{align*} 452 421 453 Because of the fact that graph translation preserves entry and exit labels of 454 translated statements, the state translation function $\sigma$ will simply 455 preserve the value of the program counter. The program code, which is 456 part of the state, is translated using the iterator. 457 458 The proof is then roughly the following. Let $l$ be the program counter of the 459 input state $s$. We proceed by cases on the current instruction of $s$. 460 Let $[i_1, \ldots, i_n]$ be the \texttt{blist} associated to $l$ and $s$ 461 by the translation function. The witness required for the existential 462 statement is simply $n$. By applying the theorem above we know that the 463 next $n$ instructions that will be fetched from $s\ \sigma$ will be 464 $[i_1, \ldots, i_n]$ and it is now sufficient to prove that they simulate 465 the original instruction. 422 Because graph translations preserve entry and exit labels of translated statements, the state translation function $\sigma$ will simply preserve the value of the program counter. 423 The program code, which is part of the state, is translated using the iterator. 424 425 The proof is then roughly as follows. 426 Let $l$ be the program counter of the input state $s$. 427 We proceed by cases on the current instruction of $s$. 428 Let $[i_1, \ldots, i_n]$ be the \texttt{blist} associated to $l$ and $s$ by the translation function. 429 The witness required for the existential statement is simply $n$. 430 By applying the theorem above we know that the next $n$ instructions that will be fetched from $s\ \sigma$ will be $[i_1, \ldots, i_n]$ and it is now sufficient to prove that they simulate the original instruction. 466 431 467 432 \subsection{The RTLabs to RTL translation} … … 479 444 \end{displaymath} 480 445 481 In the frontend, we have both integer and float values, where integer values are `sized', along with null values and pointers. Some frontenv values are482 representables in a byte, but some others require more bits.446 In the frontend, we have both integer and float values, where integer values are `sized', along with null values and pointers. 447 Some frontend values are representables in a byte, but some others require more bits. 483 448 484 449 In the backend model all values are meant to be represented in a single byte. 485 Values can thefore be undefined, be one byte long integers or be indexed 486 fragments of a pointer, null or not. Floats values are no longer present, as floating point arithmetic is not supported by the CerCo compiler. 487 488 The $\sigma$ map implements a onetomany relation: a single frontend value 489 is mapped to a sequence of backend values when its size is more then one byte. 490 491 We further require a map, $\sigma$, which maps the frontend \texttt{Memory} and the backend's notion of \texttt{BEMemory}. Both kinds of memory can be 492 thought as an instance of a generic \texttt{Mem} data type parameterized over 493 the kind of values stored in memory. 450 Values can thefore be undefined, be one byte long integers or be indexed fragments of a pointer, null or not. 451 Floats values are no longer present, as floating point arithmetic is not supported by the CerCo compiler. 452 453 The $\sigma$ map implements a onetomany relation: a single frontend value is mapped to a sequence of backend values when its size is more then one byte. 454 455 We further require a map, $\sigma$, which maps the frontend \texttt{Memory} and the backend's notion of \texttt{BEMemory}. 456 Both kinds of memory can be thought as an instance of a generic \texttt{Mem} data type parameterized over the kind of values stored in memory: 494 457 495 458 \begin{displaymath} 496 459 \mathtt{Mem}\ \alpha = \mathtt{Block} \rightarrow (\mathbb{Z} \rightarrow \alpha) 497 460 \end{displaymath} 498 499 461 Here, \texttt{Block} consists of a \texttt{Region} paired with an identifier. 500 462 … … 502 464 \mathtt{Block} ::= \mathtt{Region} \times \mathtt{ID} 503 465 \end{displaymath} 504 505 466 We now have what we need for defining what is meant by the `memory' in the backend memory model. 506 467 Namely, we instantiate the previously defined \texttt{Mem} type with the type of backend memory values. … … 509 470 \mathtt{BEMem} = \mathtt{Mem}~\mathtt{BEValue} 510 471 \end{displaymath} 511 512 472 Memory addresses consist of a pair of backend memory values: 513 473 … … 515 475 \mathtt{Address} = \mathtt{BEValue} \times \mathtt{BEValue} \\ 516 476 \end{displaymath} 517 518 477 The back and frontend memory models differ in how they represent sized integeer values in memory. 519 478 In particular, the frontend stores integer values as a header, with size information, followed by a string of `continuation' blocks, marking out the full representation of the value in memory. … … 555 514 \texttt{load}^* \sigma(a)\ (\mathtt{store}\ \sigma(a')\ \sigma(v)\ \sigma(M)) = \mathtt{load}^*\ \sigma(s)\ \sigma(a)\ \sigma(M) 556 515 \end{displaymath} 557 That is, suppose we store a transformed value $\mathtt{\sigma(v)}$ into a backend memory $\mathtt{\sigma(M)}$ at address $\mathtt{\sigma(a')}$, using \texttt{store}, and then load from the address $\sigma(a)$. Even if $a$ and $a'$ are 558 distinct by hypothesis, there is a priori no guarantee that the consecutive 559 bytes for the value stored at $\sigma(a)$ are disjoint from those for the 560 values stored at $\sigma(a')$. The fact that this holds is a nontrivial 561 property of $\sigma$ to be proved. 516 That is, suppose we store a transformed value $\mathtt{\sigma(v)}$ into a backend memory $\mathtt{\sigma(M)}$ at address $\mathtt{\sigma(a')}$, using \texttt{store}, and then load from the address $\sigma(a)$. Even if $a$ and $a'$ are distinct by hypothesis, there is \emph{a priori} no guarantee that the consecutive bytes for the value stored at $\sigma(a)$ are disjoint from those for the values stored at $\sigma(a')$. 517 The fact that this holds is a nontrivial property of $\sigma$ that must be explicitly proved. 562 518 563 519 \subsubsection*{Translation of RTLabs states} 520 564 521 RTLabs states come in three flavours: 565 522 \begin{displaymath} … … 570 527 \end{array} 571 528 \end{displaymath} 572 \texttt{State} is the default state in which RTLabs programs are almost always in .529 \texttt{State} is the default state in which RTLabs programs are almost always in for the duration of their execution. 573 530 The \texttt{Call} state is only entered when a call instruction is being executed, and then we immediately return to being in \texttt{State}. 574 531 Similarly, \texttt{Return} is only entered when a return instruction is being executed, before returning immediately to \texttt{State}. … … 577 534 578 535 RTL states differ from their RTLabs counterparts, in including a program counter \texttt{PC}, stackpointer \texttt{SP}, internal stack pointer \texttt{ISP}, a carry flag \texttt{CARRY} and a set of registers \texttt{REGS}: 536 579 537 \begin{displaymath} 580 538 \mathtt{State} ::= \mathtt{Frame}^* \times \mathtt{PC} \times \mathtt{SP} \times \mathtt{ISP} \times \mathtt{CARRY} \times \mathtt{REGS} … … 584 542 As a result, we have two stack pointers in our state: \texttt{ISP}, which is the real, hardware stack, and \texttt{SP}, which is the stack pointer of the emulated stack in memory. 585 543 The emulated stack is used for pushing and popping stack frames when calling or returning from function calls, however this is done using the hardware stack, indexed by \texttt{ISP} as an intermediary. 586 Instructions like \texttt{LCALL} and \texttt{ACALL} are hardwired by the processor's design to push the return address on to the hardware stack. Therefore after a call has been made, and before a call returns, the compiler emits code to move the return address back and forth the two stacks. Parameters, return values 587 and local variables are only present in the external stack. 544 Instructions like \texttt{LCALL} and \texttt{ACALL} are hardwired by the processor's design to push the return address on to the hardware stack. 545 Therefore after a call has been made, and before a call returns, the compiler emits code to move the return address back and forth the two stacks. 546 Parameters, return values and local variables are only present in the external stack. 588 547 As a result, for most of the execution of the processor, the hardware stack is empty, or contains a single item ready to be moved into external memory. 589 548 590 549 Once more, we require a relation $\sigma$ between RTLabs states and RTL states. 591 Because $\sigma$ is onetomany and, morally, a multifunction, 592 we use in the following the functional notation for $\sigma$, using $\star$ 593 in the output of $\sigma$ to mean that any value is accepted. 550 Because $\sigma$ is onetomany and, morally, a multivalued function, we use in the sequel the functional notation for $\sigma$, using $\star$ as a distinct marker in the range of $\sigma$ to mean that any value is accepted. 594 551 \begin{displaymath} 595 552 \mathtt{State} \stackrel{\sigma}{\longrightarrow} \mathtt{State} … … 601 558 \sigma(\mathtt{State} (\mathtt{Frame}^* \times \mathtt{\langle PC, REGS, SP \rangle})) \longrightarrow ((\sigma(\mathtt{Frame}^*), \sigma(\mathtt{PC}), \sigma(\mathtt{SP}), \star, \star, \sigma(\mathtt{REGS})), \sigma(\mathtt{Mem})) 602 559 \end{displaymath} 603 Translation then proceeds by translating the remaining stack frames, as well as the contents of the top stack frame. Any value for the internal stack pointer604 and the carry bit is admitted.560 Translation then proceeds by translating the remaining stack frames, as well as the contents of the top stack frame. 561 An arbitrary value for the internal stack pointer and the carry bit is admitted. 605 562 606 563 Translating \texttt{Call} and \texttt{Return} states is more involved, as a commutation between a single step of execution and the translation process must hold: … … 614 571 615 572 Here \emph{return one step} and \emph{call one step} refer to a pair of commuting diagrams relating the onestep execution of a call and return state and translation of both. 616 We provide the one step commuting diagrams in Figure~\ref{fig.commuting.diagrams}. The fact that one execution step in the source language is not performed 617 in the target language is not problematic for preservation of divergence 618 because it is easy to show that every step from a \texttt{Call} or 619 \texttt{Return} state is always preceeded/followed by one step that is always 620 simulated. 573 We provide the one step commuting diagrams in Figure~\ref{fig.commuting.diagrams}. 574 The fact that one execution step in the source language is not performed in the target language is not problematic for preservation of divergence because it is easy to show that every step from a \texttt{Call} or \texttt{Return} state is always preceeded or followed by one step that is always simulated. 621 575 622 576 \begin{figure} … … 641 595 642 596 \subsubsection*{The forward simulation proof} 643 The forward simulation proof for all steps that do not involve function calls are lengthy, but routine. 597 598 The forward simulation proofs for all steps that do not involve function calls are lengthy, but routine. 644 599 They consist of simulating a frontend operation on frontend pseudoregisters and the frontend memory with sequences of backend operations on the backend pseudoregisters and backend memory. 645 600 The properties of $\sigma$ presented before that relate values and memories will need to be heavily exploited. … … 688 643 689 644 690 Eventually, a Return instruction is faced, 691 the return value is fetched from the registers map, 692 the stack frame is deallocated 693 and a Return state is entered: 645 Eventually, a \texttt{RET} instruction is faced, the return value is fetched from the registers map, the stack frame is deallocated and a \texttt{Return} state is entered: 694 646 \begin{displaymath} 695 647 \begin{array}{rcl} 696 \mathtt{R eturn(id,\ args,\ dst,\ pc) \in State(Frame^*, Frame)} & \longrightarrow & \mathtt{free(sp)} \\648 \mathtt{RET(id,\ args,\ dst,\ pc) \in State(Frame^*, Frame)} & \longrightarrow & \mathtt{free(sp)} \\ 697 649 & & \mathtt{Return(M(ret\_val), dst, Frames)} 698 650 \end{array} 699 651 \end{displaymath} 700 652 701 Then the returnstate restores the program counter by popping the stack, and execution proceeds in a new \texttt{State}, like the case for external functions:653 Then the \texttt{Return} state restores the program counter by popping the stack, and execution proceeds in a new \texttt{State}, like the case for external functions: 702 654 \begin{displaymath} 703 655 \begin{array}{rcl} … … 708 660 709 661 \subparagraph{The RTLabs to RTL translation for function calls} 662 710 663 Return instructions are translated to return instructions: 711 664 \begin{displaymath} 712 \mathtt{Return} \longrightarrow \mathtt{Return} 713 \end{displaymath} 714 715 \texttt{Call} instructions are translated to \texttt{Call\_ID} instructions: 716 \begin{displaymath} 717 \mathtt{Call(id,\ args,\ dst,\ pc)} \longrightarrow \mathtt{CALL\_ID(id,\ \Sigma'(args),\ \Sigma(dst),\ pc)} 718 \end{displaymath} 719 Here $\Sigma$ is the map, computed by the compiler, 720 that translate pseudoregisters holding frontend values to list of 721 pseudoregisters holding the chunks for the frontend values. 665 \mathtt{RET} \longrightarrow \mathtt{RET} 666 \end{displaymath} 667 668 \texttt{CALL} instructions are translated to \texttt{CALL\_ID} instructions: 669 \begin{displaymath} 670 \mathtt{CALL(id,\ args,\ dst,\ pc)} \longrightarrow \mathtt{CALL\_ID(id,\ \Sigma'(args),\ \Sigma(dst),\ pc)} 671 \end{displaymath} 672 Here $\Sigma$ is the map, computed by the compiler, that translate pseudoregisters holding frontend values to list of pseudoregisters holding the chunks for the frontend values. 722 673 The specification for $\Sigma$ is that for every state $s$, 723 $$\sigma(s(r)) = (\sigma(s))(\Sigma(r))$$ 674 \begin{displaymath} 675 \sigma(s(r)) = (\sigma(s))(\Sigma(r)) 676 \end{displaymath} 724 677 725 678 \subparagraph{Function call/return in RTL} … … 754 707 \mathtt{PUSH(current\_frame[pc := after\_return])} 755 708 \end{displaymath} 756 was executed. To summarize, up to the different numer of transitions required 757 to do the job, the RTL code for internal function calls closely simulates 758 the RTLabs code. 709 was executed. 710 To summarize, up to the different numer of transitions required to do the job, the RTL code for internal function calls closely simulates the RTLabs code. 759 711 760 712 The execution of \texttt{Return} in RTL is similarly straightforward, with the return address, stack pointer, and so on, being computed by popping off the top of the stack, and the return value computed by the function being retrieved from memory: … … 768 720 To summarize, the forward simulation diagrams for function call/return 769 721 XXX 770 771 722 772 723 Translation and execution must satisfy a pair of commutation properties for the \texttt{Return} and \texttt{Call} cases. … … 842 793 \subsection{The ERTL to LTL translation} 843 794 \label{subsect.ertl.ltl.translation} 844 During the ERTL to LTL translation pseudoregisters are stored in hardware 845 registers or spilled on to the stack frame. The decision is based on liveness 846 analysis of the ERTL code to determine what pair of pseudoregisters are live 847 at the same time at a given location. A coloring algorithm is then used to 848 choose where to store the pseudoregisters, allowing pseudoregisters that 849 are never live at the same time to share the same location. 850 851 We will not certify any coloring algorithm or control flow analysis. 852 Instead, we axiomatically assume the existence of solutions to the 853 coloring and liveness problems. In a later phase we plan to validate the 854 solutions by writing and certifying the code of a validator. 855 856 We describe the liveness analysis and colouring analysis first and then 857 the ERTL to LTL translation. 858 859 Throughout this section, we denote pseudoregisters with the type $\mathtt{register}$ and hardware ones with $\mathtt{hdwregister}$. 795 During the ERTL to LTL translation pseudoregisters are stored in hardware registers or spilled onto the stack frame. 796 The decision is based on a liveness analysis performed on the ERTL code to determine what pair of pseudoregisters are live at the same time for a given location. 797 A colouring algorithm is then used to choose where to store the pseudoregisters, permitting pseudoregisters that are deemed never to be live at the same time to share the same location. 798 799 We will not certify any colouring algorithm or control flow analysis. 800 Instead, we axiomatically assume the existence of `oracles' that implement the colouring and liveness analyses. 801 In a later phase we plan to validate the solutions by writing and certifying the code of a validator. 802 803 We describe the liveness analysis and colouring analysis first and then the ERTL to LTL translation after. 804 805 Throughout this section, we denote pseudoregisters with the type $\mathtt{register}$ and hardware registers with $\mathtt{hdwregister}$. 860 806 \subsubsection{Liveness analysis} 861 807 \newcommand{\declsf}[1]{\expandafter\newcommand\expandafter{\csname #1\endcsname}{\mathop{\mathsf{#1}}\nolimits}} … … 867 813 \declsf{StatementSem} 868 814 869 For the liveness analysis, we aim at a map815 For the liveness analysis, we aim to construct a map 870 816 $\ell \in \mathtt{label} \mapsto $ live registers at $\ell$. 871 We define the following operators on ERTL statements. 872 $$ 817 We define the following operators on ERTL statements: 818 819 \begin{displaymath} 873 820 \begin{array}{lL>{(ex. $}L<{)$}} 874 821 \Defined(\ell) & registers defined at $\ell$ & \ell:r_1\leftarrow r_2+r_3 \mapsto \{r_1,C\}, \ell:\mathtt{CALL}~id\mapsto \text{callersave} … … 876 823 \Used(\ell) & registers used at $\ell$ & \ell:r_1\leftarrow r_2+r_3 \mapsto \{r_2,r_3\}, \ell:\mathtt{CALL}~id\mapsto \text{parameters} 877 824 \end{array} 878 $$ 879 Given $LA:\mathtt{label}\to\mathtt{lattice}$ (where $\mathtt{lattice}$ 880 is the type of sets of registers\footnote{More precisely, it is the lattice 881 $\mathtt{set}(\mathtt{register})\times 882 \mathtt{set}(\mathtt{hdwregister})$, 883 with pointwise operations.}), we also have have the following 884 predicates: 885 $$ 825 \end{displaymath} 826 Given $LA:\mathtt{label}\to\mathtt{lattice}$ (where $\mathtt{lattice}$ is the type of sets of registers\footnote{More precisely, it is the lattice $\mathtt{set}(\mathtt{register}) \times \mathtt{set}(\mathtt{hdwregister})$, with pointwise operations.}), we also have have the following predicates: 827 \begin{displaymath} 886 828 \begin{array}{lL} 887 829 \Eliminable_{LA}(\ell) & iff executing $\ell$ has sideeffects only on $r\notin LA(\ell)$ … … 896 838 \end{cases}$ 897 839 \end{array} 898 $$ 899 In particular, $\Livebefore$ has type $(\mathtt{label}\to\mathtt{lattice})\to 900 \mathtt{label}\to\mathtt{lattice}$. 901 902 The equation on which we build the fixpoint is then 903 $$\Liveafter(\ell) \doteq \bigcup_{\ell <_1 \ell'} \Livebefore_{\Liveafter}(\ell')$$ 904 where $\ell <_1 \ell'$ denotes that $\ell'$ is an immediate successor of $\ell$ 905 in the graph. We do not require the fixpoint to be the least one, so the hypothesis 906 on $\Liveafter$ that we require is840 \end{displaymath} 841 In particular, $\Livebefore$ has type $(\mathtt{label}\to\mathtt{lattice})\to \mathtt{label}\to\mathtt{lattice}$. 842 843 The equation upon which we build the fixpoint is then 844 \begin{displaymath} 845 \Liveafter(\ell) \doteq \bigcup_{\ell <_1 \ell'} \Livebefore_{\Liveafter}(\ell') 846 \end{displaymath} 847 where $\ell <_1 \ell'$ denotes that $\ell'$ is an immediate successor of $\ell$ in the graph. 848 We do not require the fixpoint to be the least one, so the hypothesis on $\Liveafter$ that we require is 907 849 \begin{equation} 908 850 \label{eq:livefixpoint} 909 851 \Liveafter(\ell) \supseteq \bigcup_{\ell <_1 \ell'} \Livebefore(\ell') 910 852 \end{equation} 911 (for shortnesswe drop the subscript from $\Livebefore$).853 (for brevity we drop the subscript from $\Livebefore$). 912 854 913 855 \subsubsection{Colouring} 856 914 857 \declsf{Colour} 915 858 \newcommand{\at}{\mathrel{@}} 916 The aim of the liveness analysis is to define what properties we need 917 of the colouring function, which is a map (computed separately for each 918 internal function) 919 $$\Colour:\mathtt{register}\to\mathtt{hdwregister}+\mathtt{nat}$$ 920 which identifies pseudoregisters with hardware ones if it is able to, otherwise 921 it spills them to the stack. We will just state what property we need from such 922 a map. First, we extend the definition to all types of registers by: 923 $$\begin{aligned} 859 860 The aim of liveness analysis is to define what properties we need of the colouring function, which is a map (computed separately for each internal function) 861 \begin{displaymath} 862 \Colour:\mathtt{register}\to\mathtt{hdwregister}+\mathtt{nat} 863 \end{displaymath} 864 which identifies pseudoregisters with hardware ones if it is able to, otherwise it spills them to the stack. 865 We will just state what property we require from such a map. 866 First, we extend the definition to all types of registers by: 867 \begin{displaymath} 868 \begin{aligned} 924 869 \Colour^+:\mathtt{hdwregister}+\mathtt{register} &\to \mathtt{hdwregister}+\mathtt{nat}\\ 925 870 r & \mapsto … … 928 873 r &\text{if $r\in\mathtt{hdwregister}$,}. 929 874 \end{cases} 930 \end{aligned}$$ 931 The other piece of information we compute for each function is a \emph{similarity}932 relation, which is an equivalence relation on all kinds of registers which depends 933 on the point of the program.We write934 $$x\sim y \at \ell$$ 935 to state that registers $x$ and $y$ are similar at $\ell$. The formal definition 936 of this relation's property will be given next, but intuitively it means that those two registers \emph{must}937 have the same value after $\ell$. The analysis that produces this information can be 938 coarse: in our case, we just set two different registers to be similar at $\ell$ 939 if at $\ell$ itself there is a move instruction between the two.875 \end{aligned} 876 \end{displaymath} 877 The other piece of information we compute for each function is a \emph{similarity} relation, which is an equivalence relation on all kinds of registers which depends on the point of the program. 878 We write 879 \begin{displaymath} 880 x\sim y \at \ell 881 \end{displaymath} 882 to state that registers $x$ and $y$ are similar at $\ell$. 883 The formal definition of this relation's property will be given next, but intuitively it means that those two registers \emph{must} have the same value after $\ell$. 884 The analysis that produces this information can be coarse: in our case, we just set two different registers to be similar at $\ell$ if at $\ell$ itself there is a move instruction between the two. 940 885 941 886 The property required of colouring is the following: … … 947 892 948 893 \subsubsection{The translation} 894 949 895 For example: 950 $$\ell : r_1\leftarrow r_2+r_3 \mapsto \begin{cases} 896 \begin{displaymath} 897 \ell : r_1\leftarrow r_2+r_3 \mapsto \begin{cases} 951 898 \varepsilon & \text{if $\Eliminable(\ell)$},\\ 952 899 \Colour(r_1) \leftarrow \Colour(r_2) + \Colour(r_3) & \text{otherwise}. 953 \end{cases}$$ 954 where $\varepsilon$ is the empty block of instructions (i.e.\ a \texttt{GOTO}), 955 and $\Colour(r_1) \leftarrow \Colour(r_2) + \Colour(r_3)$ is a notation for a 956 block of instructions that take into account: 900 \end{cases} 901 \end{displaymath} 902 where $\varepsilon$ is the empty block of instructions (i.e.\ a \texttt{GOTO}), and $\Colour(r_1) \leftarrow \Colour(r_2) + \Colour(r_3)$ is a notation for a block of instructions that take into account: 957 903 \begin{itemize} 958 \item load and store ops on the stack if any colouring is in fact a spilling; 959 \item using the accumulator to store intermediate values. 904 \item 905 Load and store ops on the stack if any colouring is in fact a spilling; 906 \item 907 Using the accumulator to store intermediate values. 960 908 \end{itemize} 961 The overall effect is that if $T$ is an LTL state with $\ell(T)=\ell$ 962 then we will have $T\to^+T'$ where $T'(\Colour(r_1))=T(\Colour(r_2))+T(\Colour(r_2))$, 963 while $T'(y)=T(y)$ for any $y$ \emph{in the image of $\Colour$} different from 964 $\Colour(r_1)$. Some hardware registers that are used for bookkeeping and which 965 are explicitly excluded from colouring may have different values. 909 The overall effect is that if $T$ is an LTL state with $\ell(T)=\ell$ then we will have $T\to^+T'$ where $T'(\Colour(r_1))=T(\Colour(r_2))+T(\Colour(r_2))$, while $T'(y)=T(y)$ for any $y$ \emph{in the image of $\Colour$} different from $\Colour(r_1)$. 910 Some hardware registers that are used for bookkeeping and which are explicitly excluded from colouring may have different values. 966 911 967 912 We skip the details of correctly dealing with the stack and its size. 913 968 914 \subsubsection{The relation between ERTL and LTL states} 969 Given a state $S$ in ERTL, we abuse the notation by using $S$ as the underlying map 970 $$S : \mathtt{hdwregister}+\mathtt{register} \to \mathtt{Value}.$$ 971 The program counter in $S$ is written as $\ell(S)$. At this point we can state 972 the property asked from similarity: 915 916 Given a state $S$ in ERTL, we abuse notation by using $S$ as the underlying map 917 \begin{displaymath} 918 S : \mathtt{hdwregister}+\mathtt{register} \to \mathtt{Value} 919 \end{displaymath} 920 We write $\ell(S)$ for the program counter in $S$. 921 At this point we can state the property asked from similarity: 973 922 \begin{equation} 974 923 \label{eq:similprop} … … 976 925 \end{equation} 977 926 978 Next, we relate ERTL states with LTL ones. For a state $T$ in LTL we again 979 abuse the notation using $T$ as a map 980 $$T: \mathtt{hdwregister}+\mathtt{nat} \to \mathtt{Value}$$ 981 which maps hardware registers and \emph{local stack offsets} to values (in particular, 982 $T$ as a map depends on the saved frames for computing the correct absolute 983 stack values). 984 985 The relation existing between the states at the two sides of this translation step, 986 which depends on liveness and colouring, is 987 then defined as 988 $$S\mathrel\sigma T \iff \ldots \wedge \forall x. x\in \Livebefore(\ell(S))\Rightarrow T(\Colour^+(x)) = S(x).$$ 989 The ellipsis stands for other straightforward preservation, among which the properties 990 $\ell(T) = \ell(S)$ and, inductively, the preservation of frames. 927 Next, we relate ERTL states with LTL ones. 928 For a state $T$ in LTL we again abuse notation using $T$ as a map 929 \begin{displaymath} 930 T: \mathtt{hdwregister}+\mathtt{nat} \to \mathtt{Value} 931 \end{displaymath} 932 which maps hardware registers and \emph{local stack offsets} to values (in particular, $T$ as a map depends on the saved frames for computing the correct absolute stack values). 933 934 The relation existing between the states at the two sides of this translation step, which depends on liveness and colouring, is then defined as 935 \begin{displaymath} 936 S\mathrel\sigma T \iff \ldots \wedge \forall x. x\in \Livebefore(\ell(S))\Rightarrow T(\Colour^+(x)) = S(x) 937 \end{displaymath} 938 The ellipsis stands for other straightforward preservation, among which the properties $\ell(T) = \ell(S)$ and, inductively, the preservation of frames. 991 939 992 940 \subsubsection{Proof of preservation} 941 993 942 We will prove the following proposition: 994 $$\forall S, T. S \mathrel\sigma T \Rightarrow S \to S' \Rightarrow \exists T'.T\to^+ T' \wedge S'\mathrel\sigma T'$$ 995 (with appropriate costlabelled trace preservation which we omit). We will call $S\mathrel \sigma T$ 996 the inductive hypothsis, as it will be such in the complete proof by induction on the trace of the program. 997 As usual, this step is done by cases 998 on the statement at $\ell(S)$ and how it is translated. We carry out in some detail a single case, the one of 999 a binary operation on registers. 943 \begin{displaymath} 944 \forall S, T. S \mathrel\sigma T \Rightarrow S \to S' \Rightarrow \exists T'.T\to^+ T' \wedge S'\mathrel\sigma T' 945 \end{displaymath} 946 (with appropriate costlabelled trace preservation which we omit). 947 We will call $S\mathrel \sigma T$ the inductive hypothsis, as it will be such in the complete proof by induction on the trace of the program. 948 As usual, this step is done by cases on the statement at $\ell(S)$ and how it is translated. 949 We carry out the case of a binary operation on registers in some detail. 1000 950 1001 951 Suppose that $\ell(S):r_1 \leftarrow r_2+r_3$, so that 1002 $$S'(x)=\begin{cases}S(r_1)+S(r_2) &\text{if $x=r_1$,}\\S(x) &\text{otherwise.}\end{cases}$$ 952 \begin{displaymath} 953 S'(x)=\begin{cases}S(r_1)+S(r_2) &\text{if $x=r_1$,}\\S(x) &\text{otherwise.}\end{cases} 954 \end{displaymath} 955 1003 956 \paragraph*{Case $\Eliminable(\ell(S))$.} 1004 By definition we have $r_1\notin \Liveafter(\ell(S))$, and the translation 1005 of the operation yields a \texttt{GOTO}. We take $T'$ the immediate successor 1006 of $T$.957 958 By definition we have $r_1\notin \Liveafter(\ell(S))$, and the translation of the operation yields a \texttt{GOTO}. 959 We take $T'$ the immediate successor of $T$. 1007 960 1008 961 Now in order to prove $S'\mathrel\sigma T'$, take any 1009 $$x\in\Livebefore(\ell(S'))\subseteq \Liveafter(\ell(S)) = \Livebefore(\ell(S))$$ 1010 where we have used property~\eqref{eq:livefixpoint} and the definition 1011 of $\Livebefore$ when $\Eliminable(\ell(S))$. We get the following chain of equalities: 1012 $$T'(\Colour^+(x))\stackrel 1=T(\Colour^+(x))\stackrel 2=S(x) \stackrel 3= S'(x)$$ 962 \begin{displaymath} 963 x\in\Livebefore(\ell(S'))\subseteq \Liveafter(\ell(S)) = \Livebefore(\ell(S)) 964 \end{displaymath} 965 where we have used property~\eqref{eq:livefixpoint} and the definition of $\Livebefore$ when $\Eliminable(\ell(S))$. 966 We get the following chain of equalities: 967 968 \begin{displaymath} 969 T'(\Colour^+(x))\stackrel 1=T(\Colour^+(x))\stackrel 2=S(x) \stackrel 3= S'(x) 970 \end{displaymath} 1013 971 where 1014 972 \begin{enumerate} 1015 \item is because $T'$ has the same store as $T$, 1016 \item is by inductive hypothesis as $x\in\Livebefore(\ell(S))$, 1017 \item is because $x\neq r_1$, as $r_1\notin \Liveafter(\ell(S))\ni x$. 973 \item 974 follows as $T'$ has the same store as $T$, 975 \item 976 follows from the inductive hypothesis as $x\in\Livebefore(\ell(S))$, 977 \item 978 follows as $x\neq r_1$, as $r_1\notin \Liveafter(\ell(S))\ni x$. 1018 979 \end{enumerate} 980 1019 981 \paragraph*{Case $\neg\Eliminable(\ell(S))$.} 982 1020 983 We then have $r_1\in\Liveafter(\ell(S))$, and 1021 $$\Livebefore(\ell(S))=(\Liveafter(\ell(S))\setminus\{r_1\})\cup\{r_2,r_3\}.$$ 1022 Moreover the statement is translated to $\Colour(r_1)\leftarrow\Colour(r_2)+\Colour(r_3)$, 1023 and we take the $T'\leftarrow^+T$ such that $T'(\Colour(r_1))=T(\Colour(r_2))+T(\Colour(r_3))$ and 1024 $T'(\Colour^+(x))=T(\Colour^+(x))$ for all $x$ with $\Colour^+(x)\neq\Colour(r_1)$. 984 985 \begin{displaymath} 986 \Livebefore(\ell(S))=(\Liveafter(\ell(S))\setminus\{r_1\})\cup\{r_2,r_3\} 987 \end{displaymath} 988 Moreover the statement is translated to $\Colour(r_1)\leftarrow\Colour(r_2)+\Colour(r_3)$, and we take the $T'\leftarrow^+T$ such that $T'(\Colour(r_1))=T(\Colour(r_2))+T(\Colour(r_3))$ and $T'(\Colour^+(x))=T(\Colour^+(x))$ for all $x$ with $\Colour^+(x)\neq\Colour(r_1)$. 1025 989 1026 990 Take any $x\in\Livebefore(\ell(S'))\subseteq \Liveafter(\ell(S))$ (by property~\eqref{eq:livefixpoint}). 1027 991 1028 If $\Colour^+(x)=\Colour(r_1)$, we have by property~\eqref{eq:colourprop} 1029 that $x\sim r_1\at \ell(S)$ (as both $r_1,x\in\Liveafter(\ell(S))$, so that 1030 $$T'(\Colour^+(x))=T(\Colour(r_2))+T(\Colour(r_3))\stackrel 1=S(r_2)+S(r_3)=S'(r_1)\stackrel 2=S(x)$$ 992 If $\Colour^+(x)=\Colour(r_1)$, we have by property~\eqref{eq:colourprop} that $x\sim r_1\at \ell(S)$ (as both $r_1,x\in\Liveafter(\ell(S))$, so that 993 \begin{displaymath} 994 T'(\Colour^+(x))=T(\Colour(r_2))+T(\Colour(r_3))\stackrel 1=S(r_2)+S(r_3)=S'(r_1)\stackrel 2=S(x) 995 \end{displaymath} 1031 996 where 1032 997 \begin{enumerate} 1033 \item is by two uses of inductive hypothesis, as $r_2,r_3\in\Livebefore(\ell(S))$, 1034 \item is by property~\eqref{eq:similprop}\footnote{Notice that in our particular implementation 1035 for this case of binary op $x\sim r_1\at\ell(S)$ implies $x=r_1$. But nothing prevents us from 1036 employing more fine euristics for similarity.}. 998 \item 999 follows from two uses of the inductive hypothesis, as $r_2,r_3\in\Livebefore(\ell(S))$, 1000 \item 1001 follows from property~\eqref{eq:similprop}\footnote{Notice that in our particular implementation for this case of binary op $x\sim r_1\at\ell(S)$ implies $x=r_1$. 1002 However, nothing prevents us from employing more finegrained heuristics for similarity.}. 1037 1003 \end{enumerate} 1038 1004 1039 If $\Colour^+(x)\neq\Colour(r_1)$ (so in particular $x\neq r_1$), then $x\in\Livebefore(\ell(S))$, 1040 so by inductive hypothesis we have 1041 $$T'(\Colour^+(x))=T(\Colour^+(x))=S(x)=S'(x).$$ 1005 If $\Colour^+(x)\neq\Colour(r_1)$ (so in particular $x\neq r_1$), then $x\in\Livebefore(\ell(S))$, so by inductive hypothesis we have 1006 \begin{displaymath} 1007 T'(\Colour^+(x))=T(\Colour^+(x))=S(x)=S'(x) 1008 \end{displaymath} 1042 1009 1043 1010 \subsection{The LTL to LIN translation} 1044 1011 \label{subsect.ltl.lin.translation} 1045 Ad detailed elsewhere in the reports, due to the parameterized representation of 1046 the backend languages, the pass described here is actually much more generic 1047 than the translation from LTL to LIN. It consists in a linearization pass 1048 that maps any graphbased backend language to its corresponding linear form, 1049 preserving its semantics. In the rest of the section, however, we will keep 1050 the names LTL and LIN for the two partial instantiations of the parameterized 1051 language. 1012 As detailed elsewhere in the reports, due to the parameterised representation of the backend languages, the pass described here is actually much more generic than the translation from LTL to LIN. 1013 It consists in a linearisation pass that maps any graphbased backend language to its corresponding linear form, preserving its semantics. 1014 In the rest of the section, however, we will keep the names LTL and LIN for the two partial instantiations of the parameterized language for convenience. 1052 1015 1053 1016 We require a map, $\sigma$, from LTL statuses, where program counters are represented as labels in a graph data structure, to LIN statuses, where program counters are natural numbers: … … 1106 1069 \end{enumerate} 1107 1070 1108 Note, because the LTL to LIN transformation is the first time the code of 1109 a function is linearised in the backend, we must discover a notion of `well formed function code' suitable for linearised forms. 1110 In particular, we see the notion of well formedness (yet to be formally defined) resting on the following conditions: 1071 Note, because the LTL to LIN transformation is the first time the code of a function is linearised in the backend, we must discover a notion of `wellformed function code' suitable for linearised forms. 1072 In particular, we see the notion of wellformedness (yet to be formally defined) resting on the following conditions: 1111 1073 1112 1074 \begin{enumerate} … … 1116 1078 Each label is unique, appearing only once in the function code, 1117 1079 \item 1118 The final instruction of a function code must be a return or an unconditional 1119 jump. 1080 The final instruction of a function code must be a return or an unconditional jump. 1120 1081 \end{enumerate} 1121 1082 … … 1132 1093 1133 1094 The ASM to MCS51 machine code translation step, and the required statements of correctness, are found in an unpublished manuscript attached to this document. 1134 This is the most complex translation because of the huge number of cases 1135 to be addressed and because of the complexity of the two semantics. 1136 Moreover, in the assembly code we have conditional and unconditional jumps 1137 to arbitrary locations in the code, which are not supported by the MCS51 1138 instruction set. The latter has several kind of jumps characterized by a 1139 different instruction size and execution time, but limited in range. For 1140 instance, conditional jumps to locations whose destination is more than 1141 $2^7$ bytes away from the jump instruction location are not supported at 1142 all and need to be emulated with a code transformation. The problem, which 1143 is known in the litterature as branch displacement and that applies also 1144 to modern architectures, is known to be hard and is often NP. As far as we 1145 know, we will provide the first formally verified proof of correctness for 1146 an assembler that implements branch displacement. We are also providing 1147 the first verified proof of correctness of a mildly optimizing branch 1148 displacement algorithm that, at the moment, is almost finished, but not 1149 described in the companion paper. This proof by itself took about 6 men 1150 months. 1095 This is the most complex translation because of the huge number of cases to be addressed and because of the complexity of the two semantics. 1096 Moreover, in the assembly code we have conditional and unconditional jumps to arbitrary locations in the code, which are not supported by the MCS51 instruction set. 1097 The latter has several kind of jumps characterized by a different instruction size and execution time, but limited in range. 1098 For instance, conditional jumps to locations whose destination is more than $2^7$ bytes away from the jump instruction location are not supported at all and need to be emulated with a code transformation. 1099 This problem, which is known in the literature as branch displacement and is a universal problem for all architectures of microcontroller, is known to be computationally hard, often lying inside NP, depending on the exact characteristics of the target architecture. 1100 As far as we know, we will provide the first formally verified proof of correctness for an assembler that implements branch displacement. 1101 We are also providing the first verified proof of correctness of a mildly optimising branch displacement algorithm that, at the moment, is almost finished, but not described in the companion paper. 1102 This proof, in isolation, took around 6 man months. 1151 1103 1152 1104 \section{Correctness of cost prediction} 1153 Roughly speaking, 1154 the proof of correctness of cost prediction shows that the cost of executing 1155 a labelled object code program is the same as the sum over all labels in the 1156 program execution trace of the cost statically associated to the label and 1157 computed on the object code itself. 1158 1159 In presence of object level function calls, the previous statement is, however, 1160 incorrect. The reason is twofold. First of all, a function call may diverge. 1161 To the last labels that comes before the call, however, we also associate 1162 the cost of the instructions that follow the call. Therefore, in the 1163 sum over all labels, when we meet a label we prepay for the instructions 1164 after function calls, assuming all calls to be terminating. This choice is 1165 driven by considerations on the source code. Functions can be called also 1166 inside expressions and it would be too disruptive to put labels inside 1167 expressions to capture the cost of instructions that follow a call. Moreover, 1168 adding a label after each call would produce a much higher number of proof 1169 obligations in the certification of source programs using FramaC. The 1170 proof obligations, moreover, would be guarded by termination of all functions 1171 involved, that also generates lots of additional complex proof obligations 1172 that have little to do with execution costs. With our approach, instead, we 1173 put less burden on the user, at the price of proving a weaker statement: 1174 the estimated and actual costs will be the same if and only if the high level 1175 program is converging. For prefixes of diverging programs we can provide 1176 a similar result where the equality is replaced by an inequality (loss of 1177 precision). 1178 1179 Assuming totality of functions is however not sufficient yet at the object 1180 level. Even if a function returns, there is no guarantee that it will transfer 1181 control back to the calling point. For instance, the function could have 1182 manipulated the return address from its stack frame. Moreover, an object level 1183 program can forge any address and transfer control to it, with no guarantee 1184 on the execution behaviour and labelling properties of the called program. 1185 1186 To solve the problem, we introduced the notion of \emph{structured trace} 1187 that come in two flavours: structured traces for total programs (an inductive 1188 type) and structured traces for diverging programs (a coinductive type based 1189 on the previous one). Roughly speaking, a structured trace represents the 1190 execution of a well behaved program that is subject to several constraints 1191 like 1105 1106 Roughly speaking, the proof of correctness of cost prediction shows that the cost of executing a labelled objectcode program is the same as the summation over all labels in the program execution trace of the cost statically associated to the label and computed on the object code itself. 1107 1108 In the presence of objectlevel function calls, the previous statement is, however, incorrect. 1109 The reason is twofold. 1110 First of all, a function call may diverge. 1111 However, to the labels that appears just before a call, we also associate the cost of the instructions that follow the call. 1112 Therefore, in the summation over all labels, when we meet a label we prepay for the instructions that appear after function calls, assuming all calls to be terminating. 1113 1114 This choice is driven by several considerations on the source code. 1115 Namely, functions can be called inside expressions, and it would be too disruptive to put labels inside expressions to capture the cost of instructions that follow a call. 1116 Moreover, adding a label after each call would produce a much higher number of proof obligations in the certification of source programs using FramaC. 1117 The proof obligations, further, would be guarded by a requirement to demonstrate the termination of all functions involved, that also generates lots of additional complex proof obligations that have little to do with execution costs. 1118 1119 With our approach, instead, we put less burden on the user, at the price of proving a weaker statement: the estimated and actual costs will be the same if and only if the highlevel program is converging. 1120 For prefixes of diverging programs we can provide a similar result where the equality is replaced by an inequality (loss of precision). 1121 1122 Assuming totality of functions is however not a sufficiently strong condition at the objectlevel. 1123 Even if a function returns, there is no guarantee that it will transfer control back to the calling point. 1124 For instance, the function could have manipulated the return address from its stack frame. 1125 Moreover, an objectlevel program can forge any address and transfer control to it, with no guarantees about the execution behaviour and labelling properties of the called program. 1126 1127 To solve the problem, we introduced the notion of a \emph{structured trace} that come in two flavours: structured traces for total programs (an inductive type) and structured traces for diverging programs (a coinductive type based on the previous one). 1128 Roughly speaking, a structured trace represents the execution of a well behaved program that is subject to several constraints, such as: 1192 1129 \begin{enumerate} 1193 \item All function calls return control just after the calling point 1194 \item The execution of all function bodies start with a label and end with 1195 a RET (even the ones reached by invoking a function pointer) 1196 \item All instructions are covered by a label (required by correctness of 1197 the labelling approach) 1198 \item The target of all conditional jumps must be labelled (a sufficient 1199 but not necessary condition for precision of the labelling approach) 1200 \item \label{prop5} Two structured traces with the same structure yield the same 1201 cost traces. 1130 \item 1131 All function calls return control just after the calling point, 1132 \item 1133 The execution of all function bodies start with a label and end with a \texttt{RET} instruction (even those reached by invoking a function pointer), 1134 \item 1135 All instructions are covered by a label (required by the correctness of the labelling approach), 1136 \item 1137 The target of all conditional jumps must be labelled (a sufficient but not necessary condition for precision of the labelling approach) 1138 \item 1139 \label{prop5} 1140 Two structured traces with the same structure yield the same cost traces. 1202 1141 \end{enumerate} 1203 1142 1204 Correctness of cost predictions is proved only for structured execution traces, 1205 i.e. well behaved programs. The forward simulation proof for all backend 1206 passes will actually be a proof of preservation of the structure of 1207 the structured traces that, because of property \ref{prop5}, will imply 1208 correctness of the cost prediction for the backend. The Clight to RTLabs 1209 will also include a proof that associates to each converging execution its 1210 converging structured trace and to each diverging execution its diverging 1211 structured trace. 1212 1213 There are also other two issues that invalidate the naive statement of 1214 correctness of cost prediciton given above. The algorithm that statically 1215 computes the cost of blocks is correct only when the object code is \emph{well 1216 formed} and the program counter is \emph{reachable}. 1217 A well formed object code is such that 1218 the program counter will never overflow after the execution step of 1219 the processor. An overflow that occurs during fetching but is overwritten 1220 during execution is, however, correct and necessary to accept correct 1221 programs that are as large as the program memory. Temporary overflows add 1222 complications to the proof. A reachable address is an address that can be 1223 obtained by fetching (not executing!) a finite number of times from the 1224 beginning of the code memory without ever overflowing. The complication is that 1225 the static prediction traverses the code memory assuming that the memory will 1226 be read sequentially from the beginning and that all jumps jump only to 1227 reachable addresses. When this property is violated, the way the code memory 1228 is interpreted is uncorrect and the cost computed is totally meaningless. 1229 The reachability relation is closed by fetching for well formed programs. 1230 The property that calls to function pointers only target reachable and 1231 well labelled locations, however, is not statically predictable and it is 1232 enforced in the structured trace. 1233 1234 The proof of correctness of cost predictions has been quite complex. Setting 1235 up the good invariants (structured traces, well formed programs, reachability) 1236 and completing the proof has required more than 3 men months while the initally 1237 estimated effort was much lower. In the paperandpencil proof for IMP, the 1238 corresponding proof was obvious and only took two lines. 1239 1240 The proof itself is quite involved. We 1241 basically need to show as an important lemma that the sum of the execution 1242 costs over a structured trace, where the costs are summed in execution order, 1243 is equivalent to the sum of the execution costs in the order of prepayment. 1244 The two orders are quite different and the proof is by mutual recursion over 1245 the definition of the converging structured traces, which is a family of three 1246 mutual inductive types. The fact that this property only holds for converging 1247 function calls in hidden in the definition of the structured traces. 1248 Then we need to show that the order of prepayment 1249 corresponds to the order induced by the cost traces extracted from the 1250 structured trace. Finally, we need to show that the statically computed cost 1251 for one block corresponds to the cost dinamically computed in prepayment 1252 order. 1143 Correctness of cost predictions is only proved for structured execution traces, i.e. well behaved programs. 1144 The forward simulation proof for all backend passes will actually be a proof of preservation of the structure of the structured traces that, because of property \ref{prop5}, will imply correctness of the cost prediction for the backend. 1145 The Clight to RTLabs correctness proof will also include a proof that associates to each converging execution its converging structured trace and to each diverging execution its diverging structured trace. 1146 1147 There are also two more issues that invalidate the na\"ive statement of correctness for cost prediciton given above. 1148 The algorithm that statically computes the costs of blocks is correct only when the object code is \emph{wellformed} and the program counter is \emph{reachable}. 1149 A wellformed objectcode is such that the program counter will never overflow after any execution step of the processor. 1150 An overflow that occurs during fetching but is overwritten during execution is, however, correct and necessary to accept correct programs that are as large as the processor's code memory. 1151 Temporary overflows add complications to the proof. 1152 A reachable address is an address that can be obtained by fetching (\emph{not executing!}) a finite number of times from the beginning of the code memory without ever overflowing. 1153 The complication is that the static prediction traverses the code memory assuming that the memory will be read sequentially from the beginning and that all jumps jump only to reachable addresses. 1154 When this property is violated, the way the code memory is interpreted is incorrect and the cost computed is meaningless. 1155 The reachability relation is closed by fetching for wellformed programs. 1156 The property that states that function pointers only target reachable and welllabelled locations, however, is not statically predictable and it is therefore enforced in the structured trace. 1157 1158 The proof of correctness for cost predictions, and even discovering the correct statement, has been quite complex. 1159 Setting up good invariants (i.e. structured traces, well formed programs, reachability) and completing the proof has required more than 3 man months, while the initally estimated effort was much lower. 1160 In the paperandpencil proof for IMP, the corresponding proof was `obvious' and only took two lines. 1161 1162 The proof itself is quite involved. 1163 We must show, as an important lemma, that the sum of the execution costs over a structured trace, where the costs are summed in execution order, is equivalent to the sum of the execution costs in the order of prepayment. 1164 The two orders are quite different and the proof is by mutual recursion over the definition of the converging structured traces, which is a family of three mutual inductive types. 1165 The fact that this property only holds for converging function calls is hidden in the definition of the structured traces. 1166 Then we need to show that the order of prepayment corresponds to the order induced by the cost traces extracted from the structured trace. 1167 Finally, we need to show that the statically computed cost for one block corresponds to the cost dynamically computed in prepayment order. 1253 1168 1254 1169 \section{Overall results} 1255 1170 1256 Functional correctness of the compiled code can be shown by composing 1257 the simulations to show that the target behaviour matches the 1258 behaviour of the source program, if the source program does not `go 1259 wrong'. More precisely, we show that there is a forward simulation 1260 between the source trace and a (flattened structured) trace of the 1261 output, and conclude equivalence because the target's semantics are 1262 in the form of an executable function, and hence 1263 deterministic. 1264 1265 Combining this with the correctness of the assignment of costs to cost 1266 labels at the ASM level for a structured trace, we can show that the 1267 cost of executing any compiled function (including the main function) 1268 is equal to the sum of all the values for cost labels encountered in 1269 the \emph{source code's} trace of the function. 1171 Functional correctness of the compiled code can be shown by composing the simulations to show that the target behaviour matches the behaviour of the source program, if the source program does not `go wrong'. 1172 More precisely, we show that there is a forward simulation between the source trace and a (flattened structured) trace of the output, and conclude equivalence because the target's semantics are in the form of an executable function, and hence deterministic. 1173 1174 Combining this with the correctness of the assignment of costs to cost labels at the ASM level for a structured trace, we can show that the cost of executing any compiled function (including the toplevel, main function of a C program) is equal to the sum of all the values for cost labels encountered in the \emph{source code's} trace of the function. 1270 1175 1271 1176 \section{Estimated effort} 1272 Based on the rough analysis performed so far we can estimate the total 1273 effort for the certification of the compiler. We obtain this estimation by 1274 combining, for each pass: 1) the number of lines of code to be certified; 1275 2) the ratio of number of lines of proof to number of lines of code from 1276 the CompCert project~\cite{compcert} for the CompCert pass that is closest to 1277 ours; 3) an estimation of the complexity of the pass according to the 1278 analysis above. The result is shown in Table~\ref{table}. 1177 1178 Based on a rough analysis performed so far we can estimate the total effort required for the certification of the whole compiler. 1179 We obtain this estimation by combining, for each pass: 1180 \begin{enumerate} 1181 \item 1182 The number of lines of code to be certified, 1183 \item 1184 The ratio of number of lines of proof to number of lines of code from the CompCert project~\cite{compcert} for the CompCert pass that is closest in spirit to our own, 1185 \item 1186 An estimation of the complexity of the pass according to the analysis above. 1187 \end{enumerate} 1188 The result is shown in Table~\ref{table}. 1279 1189 1280 1190 \begin{table}{h} … … 1302 1212 \end{table} 1303 1213 1304 We provide now some additional informations on the methodology used in the 1305 computation. The passes in Cerco and CompCert frontend closely match each 1306 other. However, there is no clear correspondence between the two backends. 1307 For instance, we enforce the calling convention immediately after instruction 1308 selection, whereas in CompCert this is performed in a later phase. Or we 1309 linearize the code at the very end, whereas CompCert performs linearization 1310 as soon as possible. Therefore, the first part of the exercise has consisted 1311 in shuffling and partitioning the CompCert code in order to assign to each 1312 CerCo pass the CompCert code that performs the same transformation. 1313 1314 After this preliminary step, using the data given in~\cite{compcert} (which 1315 are relative to an early version of CompCert) we computed the ratio between 1316 men months and lines of code in CompCert for each CerCo pass. This is shown 1317 in the third column of Table~\ref{table}. For those CerCo passes that 1318 have no correspondence in CompCert (like the optimizing assembler) or where 1319 we have insufficient data, we have used the average of the ratios computed 1320 above. 1321 1322 The first column of the table shows the number of lines of code for each 1323 pass in CerCo. The third column is obtained multiplying the first with the 1324 CompCert ratio. It provides an estimate of the effort required (in men months) 1325 if the complexity of the proofs for CerCo and Compcert would be the same. 1326 1327 The two proof styles, however, are on purpose completely different. Where 1328 CompCert uses non executable semantics, describing the various semantics with 1329 inductive types, we have preferred executable semantics. Therefore, CompCert 1330 proofs by induction and inversion become proof by functional inversion, 1331 performed using the Russel methodology (now called Program in Coq, but whose 1332 behaviour differs from Matita's one). Moreover, CompCert code is written using 1333 only types that belong to the HindleyMilner fragment, whereas we have 1334 heavily exploited dependent types all over the code. The dependent type 1335 discipline offers many advantages from the point of view of clarity of the 1336 invariants involved and early detection of errors and it naturally combines 1337 well with the Russel approach which is based on dependent types. However, it 1338 is also well known to introduce technical problems all over the code, like 1339 the need to explicitly prove type equalities to be able to manipulate 1340 expressions in certain ways. In many situations, the difficulties encountered 1341 with manipulating dependent types are better addressed by improving the Matita 1342 system, according to the formalization driven system development. For this 1343 reason, and assuming a pessimistic point of view on our performance, the 1344 fourth columns presents the final estimation of the effort required, that also 1345 takes in account the complexity of the proof suggested by the informal proofs 1346 sketched in the previous section. 1214 We provide now some additional informations on the methodology used in the computation. 1215 The passes in Cerco and CompCert frontend closely match each other. 1216 However, there is no clear correspondence between the two backends. 1217 For instance, we enforce the calling convention immediately after instruction selection, whereas in CompCert this is performed in a later phase. 1218 Further, we linearise the code at the very end, whereas CompCert performs linearisation as soon as possible. 1219 Therefore, the first part of the estimation exercise has consisted of shuffling and partitioning the CompCert code in order to assign to each CerCo pass the CompCert code that performs a similar transformation. 1220 1221 After this preliminary step, using the data given in~\cite{compcert} (which refers to an early version of CompCert) we computed the ratio between man months and lines of code in CompCert for each CerCo pass. 1222 This is shown in the third column of Table~\ref{table}. 1223 For those CerCo passes that have no correspondence in CompCert (like the optimising assembler) or where we have insufficient data, we have used the average of the ratios computed above. 1224 1225 The first column of the table shows the number of lines of code for each pass in CerCo. 1226 The third column is obtained multiplying the first with the CompCert ratio. 1227 It provides an estimate of the effort required (in man months) if the complexity of the proofs for CerCo and CompCert would be the same. 1228 1229 The two proof styles, however, are purposefully completely different. 1230 Where CompCert uses nonexecutable semantics, describing the various semantics of languages with inductive types, we have preferred executable semantics. 1231 Therefore, CompCert proofs by induction and inversion become proofs by functional inversion, performed using the Russell methodology (now called Program in Coq, but whose behaviour differs from Matita's one). 1232 Moreover, CompCert code is written using only types that belong to the HindleyMilner fragment, whereas we have heavily exploited dependent types throughout the codebase. 1233 The dependent type discipline offers many advantages, especially from the point of view of clarity of the invariants involved and early detection of errors. 1234 It also naturally combines well with the Russell approach which is based on dependent types. 1235 However, it is also known to introduce several technical problems, like the need to explicitly prove type equalities to be able to manipulate expressions in certain ways. 1236 In many situations, the difficulties encountered with manipulating dependent types are better addressed by improving the Matita system, according to the formalisation driven system development. 1237 For this reason, and assuming a pessimistic estimation of our performance, the fourth columns presents the final estimation of the effort required, that also takes into account the complexity of the proof suggested by the informal proofs sketched in the previous section. 1347 1238 1348 1239 \subsection{Contingency plan} 1349 On the basis of the proof strategy sketched in this document and the 1350 estimated effort, we can refine our contingency plan. In case we will end 1351 the certification of the basic compiler in advance we will have the choice 1352 of either proving loop optimizations and/or partial redundancy elimination 1353 correct (both tasks that seem difficult to achieve in a short time) or 1354 considering the MCS51 specific extensions introduced during the first period 1355 and underused in the formalized prototype. Yet another possibility would be 1356 to better study retargeting of the code and the commutation property between 1357 different compiler passes. The latter study is easily enabled by our 1358 approach where all backend languages are instances of the same parameterized 1359 language. 1360 1361 In the case of a consistent delay in the certification of some 1362 components, we will address first the passes that are more likely to have 1363 undetected bugs and we will follow a topdown approach, axiomatizing 1364 the properties of the data structured used in the compiler to focus more 1365 on the algorithms. The rational is that data structures are easier then 1366 algorithms to test using well known methodologies. 1367 The effort table clearly shows that commond definitions 1368 and data structures are 1/4th of the size of the code and require slightly 1369 less than 1/3rd of the total effort. At least half of this effort really goes 1370 into simple data structures (vectors, bounded and unbounded integers, tries 1371 and maps) whose certification is not interesting and whose code could be 1372 taken without reproving it from the library of any other theorem prover. 1240 1241 On the basis of the proof strategy sketched in this document and the estimated effort, we can refine our contingency plan. 1242 In case we will end the certification of the basic compiler in advance we will have the choice of either proving loop optimisations and/or partial redundancy elimination correct (both tasks seem difficult to achieve in a short time) or considering the MCS51 specific extensions introduced during the first period and underused in the formalised prototype. 1243 Yet another possibility would be to better study retargeting of the code and the commutation property between different compiler passes. 1244 The latter study is easily enabled by our approach where all backend languages are instances of the same parameterized language. 1245 1246 In the case of a consistent delay in the certification of some components, we will first address the passes that are more likely to have undetected bugs and we will follow a topdown approach, axiomatizing the properties of the data structures used in the compiler to focus more on the algorithms. 1247 The rationale is that data structures are easier than algorithms to test using wellknown methodologies. 1248 The effort table clearly shows that common definitions and data structures are one quarter of the size of the current codebase code and require slightly less than one third of the total effort. 1249 At least half of this effort really goes into simple data structures (vectors, bounded and unbounded integers, tries and maps) whose certification is not interesting and whose code could be taken without reproving it from the library of any other theorem prover. 1373 1250 1374 1251 \section{Conclusions} 1375 The overall exercise, whose details have been only minimally reported here, 1376 has been very useful. It has allowed to spot in an early moment some criticities 1377 of the proof that have required major changes in the proof plan. It has also 1378 shown that the last passes of the compilation (e.g. assembly) and cost 1379 prediction on the object code are much more involved than more high level 1380 passes. 1381 1382 The final estimation for the effort is surely affected by a low degree of 1383 confidence. It is however sufficient to conclude that the effort required 1384 is in line with the man power that was scheduled for the second half of the 1385 second period and for the third period. Compared to the number of men months 1386 declared in Annex I of the contract, we will need more men months. However, 1387 both at UNIBO and UEDIN there have been major differences in hiring with 1388 respect to the Annex. Therefore both sites have now an higher number of men 1389 months available, with the tradeoff of a lower level of maturity of the 1390 people employed. 1391 1392 The reviewers suggested that we use this estimation to compare two possible 1393 scenarios: a) proceed as planned, porting all the CompCert proofs to Matita 1394 or b) port D3.1 and D4.1 to Coq and reuse the CompCert proofs. 1395 We remark here again that the backend of the two compilers, from the 1396 memory model on, are sensibly different: we are not reproving correctness 1397 of the same piece of code. Moreover, the proof techniques are different for 1398 the frontend too. Switching to the CompCert formalization would imply 1399 the abandon of the untrusted compiler, the abandon of the experiment with 1400 a different proof technique, the abandon of the quest for an open source 1401 proof, and the abandon of the codevelopment of the formalization and the 1402 Matita proof assistant. In the Commitment Letter~\cite{letter} delivered 1403 to the Officer in May we clarified our personal perspective on the project 1404 goals and objectives. We do not redescribe here the point of view presented 1405 in the letter that we can condense in ``we value diversity''. 1406 1407 Clearly, if the execise would have suggested the infeasability in terms of 1408 effort of concluding the formalization or getting close to that, we would have 1409 abandoned our path and embraced the reviewer's suggestion. However, we 1410 have been comforted in the analysis we did in autumn and further progress done 1411 during the winter does not show yet any major delay with respect to the 1412 proof schedule. We are thus planning to continue the certification according 1413 to the more detailed proof plan that came out from the exercise reported in 1414 this manuscript. 1252 1253 The overall exercise, whose details have only been sketched here, has been very useful. 1254 It has brought to light some errors in our proof that have required major changes in the proof plan. 1255 It has also shown that the last passes of the compilation (e.g. assembly) and cost prediction on the objectcode are much more involved than more highlevel passes. 1256 1257 The final estimation for the effort required to complete the proof suffers from a low degree of confidence engendered in the numbers, due to the difficulty in relating our work, and compiler design, with that of CompCert. 1258 It is however sufficient to conclude that the effort required is in line with the man power that was scheduled for the second half of the second period and for the third period. 1259 Compared to the number of man months declared in Annex I of the contract, we will need more man months. 1260 However, at both UNIBO and UEDIN there have been major differences in hiring with respect to the Annex. 1261 Therefore both sites now have more manpower available, though with the associated tradeoff of a lower level of maturity for the people employed. 1262 1263 Our reviewers suggested that we use this estimation to compare two possible scenarios: a) proceed as planned, porting all the CompCert proofs to Matita or b) port D3.1 and D4.1 to Coq and reuse the CompCert proofs. 1264 We remark here again that the backend of the two compilers, from the memory model on, are quite different: we are not reproving correctness of the same piece of code. 1265 Moreover, the proof techniques are different for the frontend too. 1266 Switching to the CompCert formalisation would imply the abandoning of the untrusted compiler, the abandoning of the experiment with a different proof technique, the abandoning of the quest for an opensource proof, and the abandoning of the codevelopment of the formalisation and the Matita proof assistant. 1267 In the Commitment Letter~\cite{letter}, delivered to the Officer in May, we clarified our personal perspectives on the project goals and objectives. 1268 We do not redescribe here the point of view presented in the letter, other than the condensed soundbite that ``we value diversity''. 1269 1270 Clearly, if the execise would have suggested the infeasability in terms of effort of concluding the formalisation, or getting close to that, we would have abandoned our path and embraced the reviewer's suggestion. 1271 However, we have been comforted in the analysis that we did in Autumn and further progress completed during the winter does not yet show any major delay with respect to the proof schedule. 1272 We are thus planning to continue the certification according to the more detailed proof plan that came out from the exercise reported in this manuscript. 1415 1273 1416 1274 \begin{thebibliography}{2}
Note: See TracChangeset
for help on using the changeset viewer.