# Changeset 3356

Timestamp:
Jun 14, 2013, 4:21:50 PM

\label{labelling}
% \subsection{A brief introduction to the labelling approach}
We briefly explain the labelling approach as introduced
in~\cite{easylabelling} on the example in \autoref{examplewhile}. The user
wants to analyse the execution time of the program (the black lines in
\autoref{subfig:example_input}). He compiles the program using a special
compiler that first inserts in the code three label emission statements
(\verb+EMIT L_i+) to mark the beginning of basic blocks
(\autoref{subfig:example_input}); then the compiler compiles the code to
machine code (\autoref{subfig:example_oc}), granting that the execution of the
source and object code emits the same sequence of labels. This is achieved by
keeping track of basic blocks during compilation, avoiding all optimizations
that alter the control flow. The latter can be recovered with a more refined
version of the labelling approach~\cite{loopoptimizations}, but in the present
paper we stick to this simple variant. Once the object code is produced, the
compiler runs a static code analyzer to associate to each label the cost of
the corresponding basic block. This makes it possible to prove on the source
code that the execution time never exceeds a certain bound~\cite{cerco}, which
is now a functional property of the code.
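As an illustration of the resulting cost computation, the following sketch (in
Python, with purely hypothetical label names and per-block costs; the paper's
analyzer works on real object code) sums the statically computed basic-block
costs along the sequence of labels emitted by a run:

```python
# Hypothetical output of the static analyzer: cost (in clock cycles) of the
# basic block that starts at each label. The numbers are illustrative only.
COSTS = {"L_1": 23, "L_2": 88, "L_3": 37}

def run_cost(label_trace):
    """Cost of a run, computed from the sequence of labels it emits."""
    return sum(COSTS[label] for label in label_trace)

# A run that enters L_1 once, passes twice through L_2, then exits via L_3:
print(run_cost(["L_1", "L_2", "L_2", "L_3"]))  # 23 + 88 + 88 + 37 = 236
```

This is exactly why compilation must preserve the label trace: the sum is only
meaningful if source and object code emit the same sequence of labels.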
\vspace{-0.5cm}
\begin{figure}[!h]
\begin{subfigure}[t]{.32\linewidth}
\begin{lstlisting}[moredelim={[is][\color{red}]{|}{|}}]
...
\end{lstlisting}
\end{subfigure}
\end{figure}

\section{Extending the labelling approach to function calls}

Let us now consider a simple program written in C that contains a function
pointer call:
\begin{lstlisting}[xleftmargin=20pt]
void main() {
  EMIT $L_1$;
  $I_1$;
  (*f)();
  $I_2$;
}
void g() {
  EMIT $L_2$;
  $I_3$;
}
\end{lstlisting}
A possible compilation to (pseudo) object code is:
\begin{lstlisting}[xleftmargin=20pt]
main:
  EMIT $L_1$
  $I_4$
  CALL
  $I_5$
  RETURN
g:
  EMIT $L_2$
  $I_6$
  RETURN
\end{lstlisting}
A compiler could, for instance, save on the stack an address inside $I_5$, so
that the next \verb+RETURN+ would start at the second instruction of $I_5$.
The compiler would still be perfectly correct if a random, dead code
instruction was added after the \verb+CALL+ as the first instruction of $I_5$.
More generally, \emph{there is no guarantee that a correct compiler that
respects the functional behaviour of a program also respects the calling
structure of the source code}. What we need is that every \verb+RETURN+
transfers control to the instruction that follows the corresponding
\verb+CALL+ (whose address is saved on the stack). This property, however, is
a property of the runs of object code programs, and not a property of the
object code that can be easily statically verified (as the ones required in
\autoref{labelling} in absence of function calls).
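The calling-structure property just discussed is a property of runs, and it
can be made concrete by replaying a run. The sketch below (a hand-rolled event
encoding, not the paper's formal semantics) checks that every \verb+RETURN+
resumes at the address saved for the matching \verb+CALL+, using an explicit
stack of return addresses:

```python
# Illustrative check, not the paper's formal definition: a run is a sequence
# of events; every "ret" must transfer control to the instruction following
# the matching "call".
def respects_calling_structure(events):
    """events: list of ("call", ret_addr), ("ret", target) or ("step",)."""
    stack = []
    for ev in events:
        if ev[0] == "call":
            stack.append(ev[1])          # address of the instruction after CALL
        elif ev[0] == "ret":
            if not stack or stack.pop() != ev[1]:
                return False             # returned somewhere unexpected
    return not stack                     # every call has returned

good = [("call", 4), ("step",), ("ret", 4)]
bad  = [("call", 4), ("step",), ("ret", 5)]   # RETURN lands inside $I_5$
print(respects_calling_structure(good), respects_calling_structure(bad))
```

Note that the check inspects the whole run, which is precisely why the
property cannot be verified statically on the code alone.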
Therefore, we now need to single out those runs whose cost behaviour can be
statically predicted, and we need to prove that every run of programs
generated by the compiler is of that kind. We proceed as follows: 1) we give
unstructured languages a semantics based on structured traces over
deterministic labelled transition systems; 2) we show that on the object code
we can correctly compute the execution time of a structured run from the
sequence of labels observed; 3) we show that on the source code we can
correctly compute the execution time of a program if the compiler produces
object code whose runs are weakly similar to the source code runs and
structured. The proof of correctness of such a compiler is harder than a
traditional proof of preservation of functional properties, and a standard
forward simulation argument does not work. In \autoref{simulation} we present
a refinement of forward simulation that grants all required correctness
properties. All the definitions and theorems presented in the paper have been
formalized in Matita. The invariants that a run must respect are captured
directly in the semantics; this is the topic of the next section.

\subsection{Structured traces}
\label{semantics}

The semantics of an unstructured programming language is given as a
deterministic labelled transition system~\cite{LTS} $(S,\Lambda,\to)$ that
refines the SOS by observing function calls/returns and the beginning of
basic blocks.
$S$ is the set of states of the program and
$\Lambda = \{ \tau, RET \} \cup \Labels \cup \Functions$ is the set of
observables: silent steps, returns, label emissions and function calls. A
special treatment is needed for the cases when the fragment starts at the
beginning of a function body (e.g. the one of \texttt{main}), because in that
case nobody would have emitted the observable $\ell(f)$. An alternative
treatment is possible; we plan to compare the two approaches in the future.

\paragraph{Measurable execution fragments and their cost prediction.}
The first main theorem of CerCo deals with the object code. It states that
the execution cost of certain execution fragments, that we call
\emph{measurable fragments}, can be computed from the sequence of labels
observed during their execution. In particular, a measurable fragment is a
structured fragment $s_0 \To s_n$ where $s_n$ must be a label emission
statement. Since the cost model is accurate for the object code, we get as a
corollary that it is also accurate for the source code if the compiler
preserves weak traces and propagates measurability. Thus, as prescribed by
CerCo's methodology~\cite{fopara}, it becomes possible to compute cost models
on the object code, transfer them to the source code and then reason
comfortably on the source code only.

\section{Proving the correctness of the compiler}
\label{simulation}

Because of \autoref{preservation}, to certify a compiler for the labelling
approach we need to prove that it respects the functional semantics of the
program, that it preserves weak traces, and that it propagates measurability.
We achieve this by independently proving the three properties for each
compiler pass. The first property is standard and can be proved by means of a
forward simulation argument (see for example~\cite{compcert}) that runs like
this.
First a relation between the corresponding source and target states is
defined. Then a lemma establishes a local simulation condition: given two
states in relation, if the source one performs one step then the target one
performs zero or more steps and the two resulting states are synchronized
again according to the relation. No requirements are imposed on the
intermediate target states. Finally, the lemma is iterated over the execution
trace to establish the final result.

Compiler passes are typically implemented as some kind of local or global
analysis of the code followed by a compositional, order preserving translation
of every instruction. In order to produce structured traces, however, code
emission cannot be fully compositional any longer, and requirements need to
be enforced on intermediate target states.

For example, consider requirement~\ref{req3} on function bodies. In a forward
simulation, the state just after the return in the source code is matched by
the state $s_2$ reached after zero or more steps in the object code. However,
the requirement does not involve $s_2$, but the intermediate state reached
after the return in the object code. Therefore this requirement too cannot be
enforced with the standard forward simulation argument.

We summarise here the results of the previous sections. Each intermediate
unstructured language can be given a semantics based on structured traces,
which single out those runs that respect a certain number of invariants. A
cost model can be computed on the object code and it can be used to predict
the execution costs of runs that produce structured traces.
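For a finite-state toy example, the local simulation condition described
above can be checked mechanically. The following sketch (hand-rolled
transition systems; it checks only the plain 1-to-many condition and ignores
the observables that weak similarity would additionally require) verifies
that every source step from related states is matched by zero or more target
steps ending in related states:

```python
def steps(trans, s, max_len=3):
    """States reachable from s in 0..max_len steps of the transition map."""
    frontier, seen = {s}, {s}
    for _ in range(max_len):
        frontier = {t for u in frontier for (_lbl, t) in trans.get(u, [])}
        seen |= frontier
    return seen

def local_simulation(src, tgt, rel):
    """Local condition: if s1 R s2 and s1 -> s1', then some s2' reachable
    from s2 in zero or more steps satisfies s1' R s2'."""
    for (s1, s2) in rel:
        for (_lbl, s1p) in src.get(s1, []):
            if not any((s1p, s2p) in rel for s2p in steps(tgt, s2)):
                return False
    return True

# The source does one step a -> b; the target needs an internal step first.
src = {"a": [("l", "b")]}
tgt = {"x": [("tau", "y")], "y": [("l", "z")]}
rel = {("a", "x"), ("b", "z")}
print(local_simulation(src, tgt, rel))  # True
```

No requirement is imposed on the intermediate target state \texttt{y},
matching the description above; it is exactly such intermediate states that
the refined argument of this section must additionally constrain.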
The cost model can be lifted from the target to the source code of a pass if
the pass maps structured traces to similar structured traces. The latter
property is called a \emph{forward simulation}. As for labelled transition
systems, in order to establish the forward simulation we are interested in
(preservation of observables), we are forced to prove a stronger notion of
forward simulation that introduces an explicit relation between states. The
classical notion of a 1-to-many forward simulation is the existence of a
relation $\S$ over states such that if $s_1 \S s_2$ and $s_1 \to^1 s_1'$ then
there exists an $s_2'$ such that $s_2 \to^* s_2'$ and $s_1' \S s_2'$. In our
context, we need to replace the one step and multi step transition relations
$\to^n$ with the existence of a structured trace between the two states, and
we need to add the requirement that the two structured traces are similar.
Thus what we would like to state is something like:\\
for all $s_1,s_2,s_1'$ such that there is a $\tau_1$ from $s_1$ to $s_1'$ and
$s_1 \S s_2$ there exists an $s_2'$ such that $s_1' \S s_2'$ and a $\tau_2$
from $s_2$ to $s_2'$ such that $\tau_1$ is similar to $\tau_2$. We call this
particular form of forward simulation \emph{trace reconstruction}.

The statement just introduced, however, is too simplistic and not provable in
the general case. To understand why, consider the case of a function call and
the pass that fixes the parameter passing conventions. A function call in the
source code takes as input an arbitrary number of pseudo-registers (the
actual parameters to pass) and returns an arbitrary number of
pseudo-registers (where the result is stored). A function call in the target
language has no input or output parameters. The pass must add explicit code
before and after the function call to move the pseudo-registers content
from/to the hardware registers or the stack in order to implement the
parameter passing strategy.
Similarly, each function body must be augmented with a preamble and a
postamble to complete/initiate the parameter passing strategy for the
call/return phase. Therefore what used to be a call followed by the next
instruction to execute after the function return now becomes a sequence of
instructions, followed by a call, followed by another sequence. The two
states at the beginning of the first sequence and at the end of the second
sequence are in relation with the state before/after the call in the source
code, as in a usual forward simulation. How can we prove, however, the
additional condition for function calls that asks that when the function
returns the instruction immediately after the function call is executed? To
grant this invariant, there must be another relation between the address of
the function call in the source and in the target code. This additional
relation is to be used in particular to relate the two stacks.

Another example is given by the preservation of code emission statements. A
single code emission instruction can be simulated by a sequence of steps,
followed by a code emission, followed by another sequence. Clearly the
initial and final states of the sequence are to be in relation with the state
before/after the code emission in the source code. In order to preserve the
invariants of structured traces, however, we must consider a second relation
between states that traces the preservation of the code emission statement.

Therefore we have successfully split as much as possible the proof of
preservation of functional properties from that of non-functional ones.
Secondly, combined with the results in the previous section, it implies that
the cost model can be computed on the object code and lifted to the source
code to reason on non-functional properties, assuming that the 1-to-many
forward simulation conditions are fulfilled for every compiler pass.

\paragraph{Relation sets.}
Let $S_1$ and $S_2$ be two deterministic labelled transition systems as
described in \autoref{semantics}. We now introduce the four relations
$\mathcal{S,C,R,L}\subseteq S_1\times S_2$ between their states. Two states
$s_1$ and $s_2$ are $\R$-related if every time $s_1$ is the successor of a
call state $s_1'$ that is $\C$-related to a call state $s_2'$ in the target
code, then $s_2$ is the successor of $s_2'$. Formally:
$$s_1\R s_2 \iffdef \forall s_1',s_2'.\ s_1'\C s_2' \to s_1'\ar s_1 \to
s_2' \ar s_2.$$
We will require all pairs of states that return from related calls to be
$\R$-related. This, in combination with a dual requirement on function calls,
will grant that calls return exactly where they are supposed to. Finally, we
say that states $s_1\in S_1$ and $s_2\in S_2$ are label-related ($s_1\L s_2$)
when they are $\S$-related and both are about to emit the same label.

Given the relations $\S$ and $\C$, \autoref{fig:forwardsim} defines a set of
local simulation conditions that are sufficient to grant the correctness of
the compiler.
\begin{figure}
\centering
\begin{tikzpicture}
...
\draw [new, overlay] (t1) to [bend left ] node [rel] {$\R$} (r);
\end{tikzpicture}
\caption{Local simulation conditions. Each one states that the existence of
states in~XXX such that the dashed relations hold is implied by the
assumptions drawn as solid lines.}
\label{fig:forwardsim}
\end{figure}

\begin{lemma}
If $S_1,S_2,\S,\C$ satisfy the diagrams in \autoref{fig:forwardsim},
$T_1=s_1\To s_1'$ is a structured fragment not starting with a $\ell(f)$
emission, and $s_1\S s_2$, then there is $T_2=s_2\To s_2'$ with
$T_1\approx T_2$ and $s_1'\S s_2'$.
\end{lemma}
\begin{theorem}
If $S_1,S_2,\S,\C$ satisfy the diagrams in \autoref{fig:forwardsim},
$M_1:s_1\to^{*} s_1'$ is a measurable fragment of $S_1$ and $s_2$ is such
that $s_1\L s_2$, then there is $M_2:s_2\to^{*} s_2'$ with $M_1\approx M_2$.
\end{theorem}

\section{Conclusions}
The labelling approach relates object code to source code so that the cost of
object code executions can be computed precisely on the source. In this paper
we scaled the labelling approach to cover a programming language with
function calls. This introduces new difficulties when the language is
unstructured, i.e. it allows function calls to return anywhere in the code,
destroying the hope of a static prediction of the cost of basic blocks. We
restored static predictability by introducing a new semantics for
unstructured programs that singles out well structured executions.
The latter are represented by structured traces, a generalisation of streams
of observables that capture several structural invariants of the execution,
like the well nesting of function calls or the fact that every basic block
must start with a code emission statement. We showed that structured traces
are sufficiently well behaved to statically compute a precise cost model on
the object code. We also proved that the cost model computed on the object
code is also valid on the source code if the compiler respects two
requirements: the weak execution traces of the source and target code must be
the same and the object code execution fragments must be structured.

To prove that a compiler respects the requirements we extended the notion of
forward simulation proof for a labelled transition system to grant
preservation of structured fragments. If the source language of the compiler
is structured, all its execution fragments are, allowing us to deduce from
preservation of structure that object code fragments are structured too.
Finally, we identified measurable execution fragments, i.e. those whose
execution time (once compiled) can be exactly computed by looking at the
object code weak execution traces only. A final instrumentation pass on the
source code can then be used to turn the non-functional property of having a
certain cost into the functional property of granting that a certain global
variable, incremented at the beginning of every block according to the
induced cost model, has a certain value.

All results presented in the paper are part of a larger certification effort
for the CerCo compiler. A further benefit of the labelling approach is that
it is insensitive to the resource being measured. Indeed, any cost model
computed on the object code can be lifted to the source code (e.g. stack
space used, energy consumed, etc.) simply re-proving an analogue
of~\autoref{static}. For example, in CerCo we transported to the source level
not only the execution time cost model, but also the amount of stack used by
function calls. On the contrary, clock functions only talk about the number
of transition steps. In order to extend the approach with clock functions to
other resources, additional functions must be introduced. Moreover, the