Changeset 3231
 Timestamp:
 Apr 30, 2013, 4:53:05 PM (5 years ago)
 Location:
 Deliverables/D3.4/Report
 Files:

 2 edited
Legend:
 Unmodified
 Added
 Removed

Deliverables/D3.4/Report/report.bib
r3212 r3231 93 93 pages = {263288} 94 94 } 95 96 @article{2008LeroyBlazymemorymodel, 97 author = {Xavier Leroy and Sandrine Blazy}, 98 title = {Formal verification of a {C}like memory model 99 and its uses for verifying program transformations}, 100 journal = {Journal of Automated Reasoning}, 101 volume = 41, 102 number = 1, 103 pages = {131}, 104 url = {http://gallium.inria.fr/~xleroy/publi/memorymodeljournal.pdf}, 105 urlpublisher = {http://dx.doi.org/10.1007/s1081700890990}, 106 hal = {http://hal.inria.fr/inria00289542/}, 107 year = 2008, 108 pubkind = {journalintmulti} 109 } 
Deliverables/D3.4/Report/report.tex
r3228 r3231 132 132 of the compiler can be layered on top of normal forward simulation results, 133 133 if we split those results into local callstructure preserving simulations. 134 This split allowed us to concentrate on the intensionalproofs by135 axiomatising some of the simulation results that are very similar to136 existing compiler correctness results .134 This split allowed us to concentrate on the \textbf{intensional} proofs by 135 axiomatising some of the extensional simulation results that are very similar to 136 existing compiler correctness results, such as CompCert. 137 137 138 138 This report is about the correctness results that are deliverable … … 168 168 into the verification of the whole compiler. 169 169 170 A key part of this work was to layer the intensionalcorrectness170 A key part of this work was to layer the \emph{intensional} correctness 171 171 results that show that the costs produced are correct on top of the 172 proofs about the compiled code's extensionalbehaviour (that is, the172 proofs about the compiled code's \emph{extensional} behaviour (that is, the 173 173 functional correctness of the compiler). Unfortunately, the ambitious 174 174 goal of completely verifying the entire compiler was not feasible 175 175 within the time available, but thanks to this separation of 176 extensional and intensional proofs we are able to axiomati ze some176 extensional and intensional proofs we are able to axiomatise some extensional 177 177 simulation results which are similar to those in other compiler verification 178 178 projects and concentrate on the novel intensional proofs. We were … … 192 192 describe the statements we need to prove about the intermediate \textsf{RTLabs} 193 193 programs for the backend proofs. 194 Section~\ref{sec:inputtolabelling} covers the passes which produce the194 Section~\ref{sec:inputtolabelling} covers the compiler passes which produce the 195 195 annotated source program and Section~\ref{sec:measurablelifting} the rest 196 of the transformations in the frontend. Then the compile 196 of the transformations in the frontend. Then the compiletime checks 197 197 for good cost labelling are detailed in Section~\ref{sec:costchecks} 198 198 and the proofs that the structured traces required by the backend 199 199 exist are discussed in Section~\ref{sec:structuredtrace}. 200 200 201 \section{The compiler and main goals}202 203 The un formalised\ocaml{} \cerco{} compiler was originally described201 \section{The compiler and its correctness statement} 202 203 The uncertified prototype \ocaml{} \cerco{} compiler was originally described 204 204 in Deliverables 2.1 and 2.2. Its design was replicated in the formal 205 205 \matita{} code, which was presented in Deliverables 3.2 and 4.2, for … … 217 217 middle two lines of Figure~\ref{fig:compilerlangs}. The upper line 218 218 represents the frontend of the compiler, and the lower the backend, 219 finishing with 8051 binary code. Not all of the frontendpasses220 introduce sa new language, and Figure~\ref{fig:summary} presents a219 finishing with Intel 8051 binary code. Not all of the frontend compiler passes 220 introduce a new language, and Figure~\ref{fig:summary} presents a 221 221 list of every pass involved. 222 222 … … 226 226 \begin{tabbing} 227 227 \quad \= $\downarrow$ \quad \= \kill 228 \textsf{C} (unformali zed)\\229 \> $\downarrow$ \> CIL parser (unformali zed \ocaml)\\228 \textsf{C} (unformalised)\\ 229 \> $\downarrow$ \> CIL parser (unformalised \ocaml)\\ 230 230 \textsf{Clight}\\ 231 231 %\> $\downarrow$ \> add runtime functions\\ … … 250 250 251 251 \label{page:switchintro} 252 The annotated source code is taken afterthe cost labelling phase.252 The annotated source code is produced by the cost labelling phase. 253 253 Note that there is a pass to replace C \lstinline[language=C]'switch' 254 254 statements before labelling  we need to remove them because the … … 291 291 annotated source code, it also inserts the actual costs alongside the 292 292 cost labels, and optionally adds a global cost variable and 293 instrumentation to support further reasoning. 293 instrumentation to support further reasoning in external tools such as 294 FramaC. 294 295 295 296 \subsection{Revisions to the prototype compiler} … … 310 311 311 312 The use of dependent types to capture simple intermediate language 312 invariants makes every frontend pass except \textsf{Clight} to313 \textsf{C minor} and the cost checks total functions. Hence various314 wellformedness and type safety checks are performed on ce between313 invariants makes every frontend pass a total function, except 314 \textsf{Clight} to \textsf{Cminor} and the cost checks. Hence various 315 wellformedness and type safety checks are performed only once between 315 316 \textsf{Clight} and \textsf{Cminor}, and the invariants rule out any 316 difficulties in the later stages. 317 With the benefit of hindsight we would have included an initial 318 checking phase to produce a `wellformed' variant of \textsf{Clight}, 319 conjecturing that this would simplify various parts of the proofs for 320 the \textsf{Clight} stages which deal with potentially illformed 321 code. 322 323 Following D2.2, we previous generated code for global variable 317 difficulties in the later stages. With the benefit of hindsight we 318 would have included an initial checking phase to produce a 319 `wellformed' variant of \textsf{Clight}, conjecturing that this would 320 simplify various parts of the proofs for the \textsf{Clight} stages 321 which deal with potentially illformed code. 322 323 Following D2.2, we previously generated code for global variable 324 324 initialisation in \textsf{Cminor}, for which we reserved a cost label 325 325 to represent the execution time for initialisation. However, the … … 328 328 initialisation code to the backend and merge the costs. 329 329 330 \subsection{Main goals}330 \subsection{Main correctness statement} 331 331 332 332 Informally, our main intensional result links the time difference in a source … … 433 433 theorem in the file \lstinline'correctness.ma'. 434 434 435 \section{ Goalsfor the frontend}435 \section{Correctness statement for the frontend} 436 436 \label{sec:fegoals} 437 437 … … 588 588 \end{itemize} 589 589 In order to tackle the first point, we implemented a version of memory 590 extensions similar to CompCert's.590 extensions similar to those of CompCert. 591 591 592 592 For the simulation we decided to prove a sufficient amount to give us … … 609 609 The invariant we would need is the fact that a global label lookup on a freshly 610 610 created goto is equivalent to a local lookup. This would in turn require the 611 propagation of some freshness hypotheses on labels. For reasons expressed above,611 propagation of some freshness hypotheses on labels. As discussed, 612 612 we decided to omit this part of the correctness proof. 613 613 … … 622 622 during the execution of a \lstinline[language=C]'while' loop. 623 623 624 We do not attempt to capture any cost properties of the labelling in625 the simulation proof\footnote{We describe how the cost properties are 626 established in Section~\ref{sec:costchecks}.}, allowingthe proof to be oblivious to the choice624 We do not attempt to capture any cost properties of the labelling\footnote{We describe how the cost properties are 625 established in Section~\ref{sec:costchecks}.} in 626 the simulation proof, which allows the proof to be oblivious to the choice 627 627 of cost labels. Hence we do not have to reason about the threading of 628 628 name generation through the labelling function, greatly reducing the … … 726 726 introduce extra steps, for example to store parameters in memory in 727 727 \textsf{Cminor}. Thus we have a version of the call simulation 728 that deals with both in one result.729 730 In addition to the subtrace we are interested in measuring we must731 alsoprove that the earlier part of the trace is also preserved in732 order to use the simulation from the initial state. Italso728 that deals with both the call and the cost label in one result. 729 730 In addition to the subtrace we are interested in measuring, we must 731 prove that the earlier part of the trace is also preserved in 732 order to use the simulation from the initial state. This proof also 733 733 guarantees that we do not run out of stack space before the subtrace 734 734 we are interested in. The lemmas for this prefix and the measurable … … 739 739 \begin{lstlisting}[language=matita] 740 740 let rec will_return_aux C (depth:nat) 741 (trace:list (cs_state $...$ C ×trace)) on trace : bool :=741 (trace:list (cs_state $...$ C $\times$ trace)) on trace : bool := 742 742 match trace with 743 743 [ nil $\Rightarrow$ false … … 766 766 Combining the lemmas about the prefix and the measurable subtrace 767 767 requires a little care because the states joining the two might not be 768 in the simulation relation. In particular, if the measurable subtrace768 related in the simulation. In particular, if the measurable subtrace 769 769 starts from the cost label at the beginning of the function there may 770 770 be some extra instructions in the target code to execute to complete 771 771 function entry before the states are back in the relation. Hence we 772 carefully phrased the lemmas to allow for theseextra steps.772 carefully phrased the lemmas to allow for such extra steps. 773 773 774 774 Together, these then gives us an overall result for any simulation fitting the … … 830 830 The \textsf{simplify\_inv} invariant encodes either the conservation 831 831 of the semantics during the transformation corresponding to the failure 832 of the transformation (\textsf{Inv\_eq} constructor), or the successful832 of the coercion (\textsf{Inv\_eq} constructor), or the successful 833 833 downcast of the considered expression to the target type 834 834 (\textsf{Inv\_coerce\_ok}). … … 900 900 Our memory injections are modelled after the work of Blazy \& Leroy. 901 901 However, the corresponding paper is based on the first version of the 902 CompCert memory model , whereas we use a much more concrete model, allowing bytelevel902 CompCert memory model~\cite{2008LeroyBlazymemorymodel}, whereas we use a much more concrete model, allowing bytelevel 903 903 manipulations (as in the later version of CompCert's memory model). We proved 904 roughly 80 \% of the required lemmas. Some difficulties encountered were notably905 due to someoverly relaxed conditions on pointer validity (fixed during development).904 roughly 80 \% of the required lemmas. Notably, some of the difficulties encountered were 905 due to overly relaxed conditions on pointer validity (fixed during development). 906 906 Some more side conditions had to be added to take care of possible overflows when converting 907 from \textbf{Z} block bounds to $16$ bit pointer offsets (in practice, theseoverflows907 from \textbf{Z} block bounds to $16$ bit pointer offsets (in practice, such overflows 908 908 only occur in edge cases that are easily ruled out  but this fact is not visible 909 909 in memory injections). Concretely, some of the lemmas on the preservation of simulation of … … 917 917 translate loops because our approach differs from CompCert by 918 918 converting directly to \textsf{Cminor} \lstinline[language=C]'goto's 919 rather than maintaining a notion of loop s. We also have a partial919 rather than maintaining a notion of loop in \textsf{Cminor}. We also have a partial 920 920 proof for function entry, covering the setup of the memory injection, 921 921 but not function exit. Exits, and the remaining statements, have been 922 922 axiomatised. 923 923 924 Careful management of the proof state was required due to the proof 925 terms that are embedded in \textsf{Cminor} code to show that some 926 invariants are respected. These proof terms can be large and awkward, 927 and while generalising them away is usually sufficient, it can be 928 difficult when they appear under a binder. 924 Careful management of the proof state was required because proof terms 925 are embedded in \textsf{Cminor} code to show that invariants are 926 respected. These proof terms appear in the proof state when inverting 927 the translation functions, and they can be large and awkward. While 928 generalising them away is usually sufficient, it can be difficult when 929 they appear under a binder. 929 930 930 931 %The correctness proof for this transformation was not completed. We proved the … … 940 941 The translation from \textsf{Cminor} to \textsf{RTLabs} is a fairly 941 942 routine control flow graph (CFG) construction. As such, we chose to 942 axiomatise itssimulation results. However, we did prove several943 axiomatise the associated extensional simulation results. However, we did prove several 943 944 properties of the generated programs: 944 945 \begin{itemize} … … 950 951 951 952 These properties rely on similar properties about type safety and the 952 presence of \lstinline[language=C]'goto'labels for Cminorprograms953 presence of \lstinline[language=C]'goto'labels for \textsf{Cminor} programs 953 954 which are checked at the preceding stage. As a result, this 954 955 transformation is total and any compilation failures must occur when … … 959 960 graph inclusion. We automated these proofs using a small amount of 960 961 reflection, making the obligations much easier to handle. One 961 drawback to enforcing invariants th oroughlyis that temporarily962 drawback to enforcing invariants throughout is that temporarily 962 963 breaking them can be awkward. For example, \lstinline'return' 963 964 statements were originally used as placeholders for … … 994 995 995 996 \subsection{Implementation and correctness} 997 \label{sec:costchecksimpl} 996 998 997 999 For a cost labelling to be sound and precise we need a cost label at … … 1005 1007 The implementation progresses through the set of nodes in the graph, 1006 1008 following successors until a cost label is found or a labelfree cycle 1007 is discovered (in which case the property does not hold and we stop). 1008 This is made easier by the prior knowledge that any branch is followed 1009 by cost labels, so we do not need to search each branch. When a label 1010 is found, we remove the chain from the set and continue from another 1011 node in the set until it is empty, at which point we know that there 1012 is a bound for every node in the graph. 1013 1014 Directly reasoning about the function that implements this would be 1009 is discovered (in which case the property does not hold and we return 1010 an error). This is made easier by the prior knowledge that every 1011 successor of a branch instruction is a cost label, so we do not need 1012 to search each branch. When a label is found, we remove the chain of 1013 program counters from the set and continue from another node in the 1014 set until it is empty, at which point we know that there is a bound 1015 for every node in the graph. 1016 1017 Directly reasoning about the function that implements this procedure would be 1015 1018 rather awkward, so an inductive specification of a single step of its 1016 1019 behaviour was written and proved to match the implementation. This … … 1031 1034 \label{sec:structuredtrace} 1032 1035 1033 The \emph{structured traces} introduced in Section~\ref{sec:fegoals} enrich the execution trace of a program by 1034 nesting function calls in a mixedstep style and embedding the cost labelling 1035 properties of the program. It was originally designed to support the 1036 proof of correctness for the timing analysis of the object code in the 1037 backend, then generalised to provide a common structure to use from 1038 the end of the frontend to the object code. See 1039 Figure~\ref{fig:strtrace} on page~\pageref{fig:strtrace} for an 1040 illustration of a structured trace. 1036 The \emph{structured trace} idea introduced in 1037 Section~\ref{sec:fegoals} enriches the execution trace of a program by 1038 nesting function calls in a mixedstep style and embedding the cost 1039 labelling properties of the program. See Figure~\ref{fig:strtrace} on 1040 page~\pageref{fig:strtrace} for an illustration of a structured trace. 1041 It was originally designed to support the proof of correctness for the 1042 timing analysis of the object code in the backend, then generalised 1043 to provide a common structure to use from the end of the frontend to 1044 the object code. 1041 1045 1042 1046 To make the definition generic we abstract over the semantics of the … … 1056 1060 }. 1057 1061 \end{lstlisting} 1058 which gives a type of states, an execution relation\footnote{All of1062 which requires a type of states, an execution relation\footnote{All of 1059 1063 our semantics are executable, but using a relation was simpler in 1060 the abstraction.}, some notion of 1061 program counter swith decidable equality, the classification of states,1064 the abstraction.}, some notion of abstract 1065 program counter with decidable equality, the classification of states, 1062 1066 and functions to extract the observable intensional information (cost 1063 1067 labels and the identity of functions that are called). The … … 1093 1097 The construction of the structured trace replaces syntactic cost 1094 1098 labelling properties, which place requirements on where labels appear 1095 in the program, with semantic sproperties that constrain the execution1099 in the program, with semantic properties that constrain the execution 1096 1100 traces of the program. The construction begins by defining versions 1097 1101 of the sound and precise labelling properties on states and global … … 1100 1104 \textsf{RTLabs} semantics. 1101 1105 1102 Then we show that each cost labelling property the structured traces1103 definition requires is locally satisfied. These are broken up by the1104 classification of states. Similarly, we prove a stepbystep stack 1105 preservation result, which states that the \textsf{RTLabs} semantics 1106 never changes the lower parts of the stack.1106 Then we show that each cost labelling property required by the 1107 definition of structured traces is locally satisfied. These proofs are 1108 broken up by the classification of states. Similarly, we prove a 1109 stepbystep stack preservation result, which states that the 1110 \textsf{RTLabs} semantics never changes the lower parts of the stack. 1107 1111 1108 1112 The core part of the construction of a structured trace is to use the … … 1121 1125 with the property that there is a bound on the number of successor 1122 1126 instructions you can follow in the CFG before you encounter a cost 1123 label .1127 label (from Section~\ref{sec:costchecksimpl}). 1124 1128 1125 1129 \subsubsection{Complete execution structured traces}
Note: See TracChangeset
for help on using the changeset viewer.