# Changeset 3216 for Deliverables

Timestamp:
Apr 29, 2013, 11:15:46 PM
Message:

Revisions throughout 3.4.

File:
1 edited

## Deliverables/D3.4/Report/report.tex

\subsection{Main goals}

TODO: need an example for this

Informally, our main intensional result links the time difference in a
source code execution to the time difference in the object code,
expressing the time guarantees on particular subtraces; they must
start at a cost label and end at the return of the enclosing function
of the cost label\footnote{We expect that this would generalise to
more general subtraces by subtracting costs for unwanted measurable
suffixes of a measurable subtrace.}.  A typical example of such a
subtrace is the execution of an entire function from the cost label at
the start of the function until it returns.  We call any such subtrace
\emph{measurable} if it (and the prefix of the trace from the start to
the subtrace) can also be executed within the available stack space.

Now we can give the main intensional statement for the compiler: for
any measurable subtrace of the source execution there is a
corresponding subtrace of the object code execution with the same
observables, in particular the same function calls and returns.  More
formally, the definition of this statement in \matita{} is
\begin{lstlisting}[language=matita]
definition simulates :=
  $\lambda$p: compiler_output.
  let initial_status := initialise_status $...$ (cm (c_labelled_object_code $...$ p)) in
  $\forall$m1,m2.
   measurable Clight_pcs (c_labelled_clight $...$ p) m1 m2
     (stack_sizes (c_stack_cost $...$ p)) (c_max_stack $...$ p) $\rightarrow$
  $\forall$c1,c2.
   clock_after Clight_pcs (c_labelled_clight $...$ p) m1 (c_clight_cost_map $...$ p) = OK $...$ c1 $\rightarrow$
   clock_after Clight_pcs (c_labelled_clight $...$ p) (m1+m2) (c_clight_cost_map $...$ p) = OK $...$ c2 $\rightarrow$
  $\exists$n1,n2.
   observables Clight_pcs (c_labelled_clight $...$ p) m1 m2 =
   observables (OC_preclassified_system (c_labelled_object_code $...$ p))
     (c_labelled_object_code $...$ p) n1 n2
  $\wedge$
   clock ?? (execute (n1+n2) ? initial_status) =
   clock ?? (execute n1 ? initial_status) + (c2-c1).
\end{lstlisting}
where the \lstinline'measurable', \lstinline'clock_after' and
\lstinline'observables' definitions are generic definitions for
multiple languages; in this case the \lstinline'Clight_pcs' record
applies them to \textsf{Clight} programs.

In addition, the labelling pass is shown to preserve the behaviour of
the program:
\begin{lstlisting}[language=matita]
$\forall$input_program,output.
 compile input_program = return output $\rightarrow$
 not_wrong $...$ (exec_inf $...$ clight_fullexec input_program) $\rightarrow$
 sim_with_labels
  (exec_inf $...$ clight_fullexec input_program)
  (exec_inf $...$ clight_fullexec (c_labelled_clight $...$ output))
\end{lstlisting}
That is, any successful compilation produces a labelled program that
simulates the input program, so long as the input program does not `go
wrong' with undefined behaviour.  Note that this statement provides
full functional correctness, including preservation of
(non-)termination.  The intensional result above does not do this
directly --- it does not guarantee the same result or same
termination.  There are two mitigating factors, however: first, to
prove the intensional property you need local simulation results ---
these can be pieced together to form full behavioural equivalence,
only time constraints have prevented us from doing so.  Second, if we
wish to \ldots

The essential parts of the intensional proof were outlined during work
on a toy compiler in Task
2.1~\cite{d2.1,springerlink:10.1007/978-3-642-32469-7_3}.  These are
\begin{enumerate}
\item functional correctness, in particular preserving the trace of
  cost labels.

Jointly, we generalised the structured traces to apply to any of the
intermediate languages which have some idea of program counter.  This
means that they are introduced part way through the compiler, see
Figure~\ref{fig:compiler}.  Proving that a structured trace can be
constructed relies on each step being a step of such a language, and
so the structural properties are ensured by the semantics.
\item Many of the back-end languages from \textsf{RTL} onwards share a
  common core set of definitions, and using structured traces
  throughout increases this uniformity.
\end{enumerate}

We start with the transformations in \textsf{Clight} that produce the
source program with cost labels, then show that measurable traces can
be lifted to \textsf{RTLabs}, and finally show that we can construct
the properties listed above ready for the back-end proofs.

As explained on page~\pageref{page:switchintro}, the costs of complex
C \lstinline[language=C]'switch' statements cannot be represented with
the simple labelling used in the formalised compiler.  Our first pass
replaces these statements with simpler C code, allowing our second
pass to perform the cost labelling.  We show that the behaviour of
programs is unchanged by these passes using forward
simulations\footnote{All of our languages are deterministic, which can
be seen directly from their executable definitions.  Thus we know that
forward simulations are sufficient because the target cannot have any
other behaviour.}.

\subsection{Switch removal}

The intermediate languages of the front-end have no jump tables.
Hence, we have to compile the \lstinline[language=C]'switch'
constructs away before going from \textsf{Clight} to \textsf{Cminor}.
We compile away \lstinline[language=C]'switch' statements into more
basic \textsf{Clight} code.  Note that this transformation does not
necessarily deteriorate the efficiency of the generated code: for
instance, compilers such as GCC use similar constructions for some
\lstinline[language=C]'switch' statements.  The statements in each
case may contain \lstinline[language=C]'break' statements, which have
the effect of exiting the switch statement.
In the absence of \lstinline[language=C]'break', execution falls
through each case sequentially.  In our implementation, we produce an
equivalent sequence of ``if-then'' blocks chained by gotos, testing a
fresh variable that holds the value of the switch expression:
\begin{lstlisting}[language=C]
if(fresh == v1) {
lbl_case1:
  $\llbracket$stmt1$\rrbracket$;
  goto lbl_case2;
};
if(fresh == v2) {
lbl_case2:
  $\llbracket$stmt2$\rrbracket$;
  $\ldots$
};
\end{lstlisting}
The proof had to tackle the following points:
\begin{itemize}
\item the source and target memories are not the same (due to the
  fresh variable),
\item the flow of control is changed in a non-local way
  (e.g. \textbf{goto} instead of \textbf{break}).
\end{itemize}
To match the source and target executions, the simulation relates one
source step to several target steps, including extra steps for the
added cost labels.  The extra instructions do not change the memory or
local environments, although we have to keep track of the extra
instructions that appear in continuations, for example during the
execution of a \lstinline[language=C]'while' loop.

We do not attempt to capture any cost properties of the labelling in
the simulation proof\footnote{We describe how the cost properties are
established in Section~\ref{sec:costchecks}.}, allowing the proof to
be oblivious to the choice of cost labels.  Hence we do not have to
reason about the threading of name generation through the labelling
function, greatly reducing the amount of effort required.
%TODO: both give one-step-sim-by-many forward sim results; switch
\item transformation to \textsf{RTLabs} control flow graph
\end{enumerate}

\todo{I keep mentioning forward sim results --- I probably ought to
say something about determinacy}

We have taken a common approach to each pass: first we build (or
axiomatise) forward simulation results that are similar to normal
compiler proofs, but which are slightly more fine-grained so that we
can see that the call structure and relative placement of cost labels
is preserved.  From this result we can find a measurable subtrace of
the execution of the \textsf{RTLabs} code, suitable for the
construction of a structured trace (see
Section~\ref{sec:structuredtrace}).  This is essentially an extra
layer on top of the simulation proofs that provides us with the
additional information required for our intensional correctness proof.

\subsection{Generic measurable subtrace lifting proof}

Our generic proof is parameterised on the semantics for the source and
target language, a classification of states (the same form of
classification is used when defining structured traces), a simulation
relation which respects the classification and cost labelling, and
four simulation results.  The simulations are split by the starting
state's classification and whether it is a cost label, which will
allow us to observe that the call structure is preserved.
They are:
\begin{enumerate}
\item a step from a `normal' state (which is not classified as a call
  or return) to another state, \ldots
\end{enumerate}

Part of the proof requires us to know whether each function call in
the subtrace returns within it; this is computed by the following
function:
\label{prog:terminationproof}
\begin{lstlisting}[language=matita]
let rec will_return_aux C (depth:nat)
                        (trace:list (cs_state $...$ C × trace)) on trace : bool :=
match trace with
[ nil $\Rightarrow$ false
| cons h tl $\Rightarrow$
  let $\langle$s,tr$\rangle$ := h in
  match cs_classify C s with
  [ cl_call $\Rightarrow$ will_return_aux C (S depth) tl
  | cl_return $\Rightarrow$
      match depth with
      [ O $\Rightarrow$ match tl with [ nil $\Rightarrow$ true | _ $\Rightarrow$ false ]
      | S d $\Rightarrow$ will_return_aux C d tl ]
  | _ $\Rightarrow$ will_return_aux C depth tl ]
].
\end{lstlisting}
The \lstinline'depth' is the number of return states we need to see
before the enclosing function itself returns.

The stack space requirement that is embedded in \lstinline'measurable'
is a consequence of the preservation of observables, because it is
determined by the functions called and returned from, which are
observable.

TODO: how to deal with untidy edges wrt to sim rel; anything to say
about obs?

TODO: cost labels are statements/exprs in these languages; split
call/return gives simpler simulations

\subsection{Simulation results for each pass}
We now consider the simulation results for the passes, each of which
is used to instantiate the
\lstinline[language=matita]'measured_subtrace_preserved' theorem to
construct the measurable subtrace for the next language.

\subsubsection{Cast simplification}

The parser used in \cerco{} introduces a lot of explicit type casts.
If left as they are, these constructs can greatly hamper the quality
of the generated code -- especially as the architecture we consider is
an $8$-bit one.  In \textsf{Clight}, casts are expressions.  Hence,
most of the work of this transformation proceeds on expressions.  The
transformation proceeds by recursively simplifying expressions:
\begin{lstlisting}[language=matita]
let rec simplify_expr (e:expr) (target_sz:intsize) (target_sg:signedness)
  : $\Sigma$result:bool$\times$expr.
    $\forall$ge,en,m. simplify_inv ge en m e (\snd result) target_sz target_sg (\fst result) := $\ldots$
and simplify_inside (e:expr) : $\Sigma$result:expr. conservation e result := $\ldots$
\end{lstlisting}
The \lstinline'simplify_inv' invariant relates the original and
simplified expressions:
\begin{lstlisting}[language=matita]
inductive simplify_inv (ge : genv) (en : env) (m : mem)
  (e1 : expr) (e2 : expr) (target_sz : intsize) (target_sg : signedness) : bool $\rightarrow$ Prop :=
| Inv_eq : $\forall$result_flag. $\ldots$
    simplify_inv ge en m e1 e2 target_sz target_sg result_flag
| Inv_coerce_ok : $\forall$src_sz,src_sg.
    typeof e1 = Tint src_sz src_sg $\rightarrow$
    typeof e2 = Tint target_sz target_sg $\rightarrow$
    smaller_integer_val src_sz target_sz src_sg (exec_expr ge en m e1) (exec_expr ge en m e2) $\rightarrow$
    simplify_inv ge en m e1 e2 target_sz target_sg true.
\end{lstlisting}
The \textsf{conservation} invariant for \textsf{simplify\_inside}
simply states the conservation of the semantics, as in the
\textsf{Inv\_eq} constructor of the previous invariant.
\begin{lstlisting}[language=matita]
definition conservation := $\lambda$e,result. $\forall$ge,en,m.
    res_sim ? (exec_expr ge en m e) (exec_expr ge en m result)
  $\wedge$ res_sim ? (exec_lvalue ge en m e) (exec_lvalue ge en m result)
  $\wedge$ typeof e = typeof result.
\end{lstlisting}
A difficulty is that the parser may produce inconsistently typed
programs, a canonical case being a particular integer constant of a
certain size typed with another size.  This prompted the need to
introduce numerous type checks, making both the implementation and the
proof more complex, even though more comprehensive checks are made in
the next stage.
\todo{Make this a particular case of the more general statement on
baking more invariants in the Clight language}
\subsection{Clight to Cminor}

This pass is the last one operating on the \textsf{Clight} language.
Its input is a full \textsf{Clight} program, and its output is a
\textsf{Cminor} program.  Note that we do not use an equivalent of
Compcert's \textsf{C\#minor} language: we translate directly to a
variant of \textsf{Cminor}.  This presents the advantage of not
requiring the special loop constructs, nor the explicit block
structure.  Another salient point of our approach is that a
significant number of the properties needed for the simulation proof
were directly encoded in dependently typed translation functions.  In
particular, freshness conditions and well-typedness conditions are
included.  The main effects of the transformation from \textsf{Clight}
to \textsf{Cminor} are listed below.
\begin{itemize}
\item The storage of some local variables is moved out of the modelled
  memory and stored in a dedicated environment.
\item In \textsf{Clight}, each local variable has a dedicated memory
  block, whereas stack-allocated locals are bundled together on a
  function-by-function basis.
\item Loops are converted to jumps.
\end{itemize}
Relating the source and target memories is done via memory injections
that account for pointer manipulations (as in the later version of
Compcert's memory model).  We proved roughly 80\% of the required
lemmas.  Some difficulties encountered were notably due to some overly
relaxed conditions on pointer validity (fixed during development).
Some more side conditions had to be added to take care of possible
overflows when converting from \textbf{Z} block bounds to $16$ bit
pointer offsets (in practice, these overflows only occur in edge cases
that are easily ruled out -- but this fact is not visible in memory
injections).  Concretely, some of the lemmas on the preservation of
simulation of loads after writes were axiomatised, due to a lack of
time.

\subparagraph{Simulation proof.}

The correctness proof for this transformation was not completed.  We
proved the simulation result for expressions and for some subset of
the critical statement cases.  Notably lacking are the function entry
and exit, where the memory injection is properly set up.  As would be
expected, a significant amount of work has to be performed to show the
conservation of all invariants at each simulation step.

\subsection{Checking cost labelling properties}
\label{sec:costchecks}

Ideally, we would provide proofs that the cost labelling pass always
produces programs that are soundly and precisely labelled and that
each subsequent pass preserves these properties.  This would match our
use of dependent types to eliminate impossible sources of errors
during compilation, but checking these properties directly is more
difficult.  Instead we check the labelling on the \textsf{RTLabs}
code.  The implementation progresses through the set of nodes in the
graph, following successors until a cost label is found or a
label-free cycle is discovered (in which case the property does not
hold and we stop).  A high-level specification of the labelling
properties was then used to prove the implementation sound and
complete.
While we have not attempted to prove that the cost labelling
properties are established and preserved earlier in the compiler, we
expect that the effort for the \textsf{Cminor} to \textsf{RTLabs}
stage alone would be similar to the work outlined above, because it
involves the change from requiring a cost label at particular
positions to requiring cost labels to break loops in the CFG.  As
there are another three passes to consider (including the labelling
itself), we believe that using the check above is much simpler
overall.

% TODO? Found some Clight to Cminor bugs quite quickly

\subsection{Structured traces}
\label{sec:structuredtrace}

The \emph{structured traces} introduced in Section~\ref{sec:fegoals}
enrich the execution trace of a program by nesting function calls in a
mixed-step style and embedding the cost labelling properties of the
program.  This structure was originally designed to support the proof
of correctness for the timing analysis of the object code in the
back-end.
Each language's semantics is abstracted by a record which gives a type
of states, an execution relation\footnote{All of our semantics are
executable, but using a relation was simpler in the abstraction.},
some notion of program counters with decidable equality, the
classification of states, and functions to extract the observable
intensional information (cost labels and the identity of functions
that are called).

The structured traces are defined using three mutually inductive
types.  The core data structure is \lstinline'trace_any_label', which
captures some straight-line execution until the next cost label or the
return from the enclosing function.  Any function calls are embedded
as a single step, with its own trace nested inside and the before and
after states linked by \lstinline'as_after_return'; and states
classified as a `jump' (in particular branches) must be followed by a
cost label.

The second type, \lstinline'trace_label_label', is a
\lstinline'trace_any_label' that starts with a cost label.

The construction of the structured trace replaces syntactic cost
labelling properties, which place requirements on where labels appear
in the program, with semantic properties that constrain the execution
traces of the program.  The construction begins by defining versions
of the sound and precise labelling properties on states and global
environments (for the code that appears in each of them) rather than
whole programs, and showing that these are preserved by steps of the
\textsf{RTLabs} semantics.
Then we show that each cost labelling property the structured trace
definition requires is locally satisfied.  These are broken up by the
classification of states.  Similarly, we prove a step-by-step stack
preservation result, which states that the \textsf{RTLabs} semantics
respects the given stack bounds at each step up to the next cost
label.

\subsubsection{Complete execution structured traces}

The development of the construction above started relatively early,
before the measurable subtrace preservation proofs.  To be confident
that the traces were well-formed at that time, we also developed a
complete execution form that embeds the traces above.  This includes
non-terminating program executions, where an infinite number of the
terminating structured traces are embedded.  This construction
confirmed that our definition of structured traces was consistent,
although we later found that we did not require the whole execution
version for the compiler correctness results.

To construct these we need to know whether each function call will
eventually return.

By concentrating on the novel intensional parts of the proof, we have
shown that it is possible to construct certifying compilers that
correctly report execution time and stack space costs.
The layering of intensional correctness proofs on top of normal
simulation results provides a useful separation of concerns, and could
permit the reuse of existing results.

TODO: appendix on code layout?