Changeset 3366


Ignore:
Timestamp:
Jun 17, 2013, 6:48:16 PM (4 years ago)
Author:
tranquil
Message:

...

File:
1 edited

Legend:

Unmodified
Added
Removed
  • Papers/itp-2013/ccexec2.tex

    r3360 r3366  
    1 \documentclass[bookmarksdepth=2,bookmarksopen]{llncs}
     1\documentclass[bookmarksdepth=2,bookmarksopen,envcountsame]{llncs}
    22\usepackage{hyperref}
    33\usepackage{graphicx}
     
    1818% a built-in definition.
    1919%\newtheorem{theorem}{Theorem}
    20 \newtheorem{condition}{Condition}
     20% \usepackage{aliascnt} %% this will make autoref give correct names
     21% \def\mynewtheorem#1[#2]#3{%
     22%   \newaliascnt{#1}{#2}%
     23%   \newtheorem{#1}[#1]{#3}%
     24%   \aliascntresetthe{#1}%
     25%   \expandafter\def\csname #1autorefname\endcsname{#3}%
     26% }
     27% \let\proof\relax
     28% \let\endproof\relax
     29% \usepackage{amsthm}
     30% theorem environments
     31% \theoremstyle{definition}
     32\spnewtheorem{condition}[theorem]{Condition}{\bfseries}{}
    2133
    2234\lstdefinelanguage{coq}
     
    108120\def\R{\mathrel{\mathcal R}}
    109121\def\C{\mathrel{\mathcal C}}
     122\def\LS{\mathrel{\mathcal{L_S}}}
    110123
    111124\def\Labels{\mathbb L}
     
    131144\def\to{\@ifnextchar[{\new@to}{\oldto}}
    132145\def\new@to[#1]{\xrightarrow{#1}}
    133 \def\st@ck#1[#2]{\stackrel{#2}{#1}}
    134 \def\defst@ck#1#2{\def#1{\@ifnextchar[{\st@ck#2}{#2}}}
    135 \defst@ck\To\implies
    136 \defst@ck\MeasTo\rightsquigarrow
     146% \def\st@ck#1[#2]{\stackrel{#2}{#1}}
     147% \def\defst@ck#1#2{\def#1{\@ifnextchar[{\st@ck#2}{#2}}}
     148\let\To\implies
     149% \let\MeasTo\rightsquigarrow
    137150\makeatother
    138151
     
    229242part that deals with the preservation of structure.
    230243
    231 The plan of this paper is the following. In Section~\ref{labelling} we
     244The plan of this paper is the following. In Section~\ref{sec:labelling} we
    232245sketch the labelling method and the problems derived from the application
    233 to languages with function calls. In Section~\ref{semantics} we introduce
     246to languages with function calls. In Section~\ref{sec:semantics} we introduce
    234247a generic description of an unstructured imperative language and the
    235248corresponding structured traces (the novel semantics). In
    236 Section~\ref{simulation} we describe the forward simulation proof.
    237 Conclusions and future works are in Section~\ref{conclusions}
     249Section~\ref{sec:simulation} we describe the forward simulation proof.
     250Conclusions and future works are in Section~\ref{sec:conclusions}
    238251
    239252\section{The labelling approach}
    240 \label{labelling}
     253\label{sec:labelling}
    241254% \subsection{A brief introduction to the labelling approach}
    242255We briefly explain the labelling approach as introduced in~\cite{easylabelling}
     
    417430stack). This property, however, is a property of the runs of object code
    418431programs, and not a property of the object code that can be easily statically
    419 verified (as the ones required in \autoref{labelling} in absence of function calls).
     432verified (as the ones required in \autoref{sec:labelling} in absence of function calls).
    420433Therefore, we now need to single out those runs whose cost behaviour can be
    421434statically predicted, and we need to prove that every run of programs generated
     
    440453The proof of correctness of such a compiler is harder than a traditional
    441454proof of preservation of functional properties, and a standard forward
    442 simulation argument does not work. In \autoref{simulation} we present
     455simulation argument does not work. In \autoref{sec:simulation} we present
    443456a refinement of forward simulation that grants all required correctness
    444457properties.
     
    695708
    696709\subsection{Structured traces}
    697 \label{semantics}
     710\label{sec:semantics}
    698711
    699712Let's consider a generic unstructured language already equipped with a
     
    763776   $s_i\to[f]s_{i+1}$. This is a technical condition needed to ensure that a
    764777   label associated with a function is only used at the beginning of its
    765    body. Its use will become clear in~\autoref{simulation}.
     778   body. Its use will become clear in~\autoref{sec:simulation}.
    766779 \item For every $i$, if the instruction to be executed in $s_i$ is a
    767780   conditional branch, then there is an $L$ such that $s_{i+1} \to[L] s_{i+2}$ or, equivalently, that $s_{i+1}$ must start execution with an
     
    788801
    789802\begin{theorem}
    790  \label{static}
     803 \label{thm:static}
    791804 for all measurable fragment $T = s_0 \to^{*} s_n$,\\
    792805 $$\Delta_t := \verb+clock+_{s_n} - \verb+clock+_{s_0} = \Sigma_{o \in |T|} k(o)$$
     
    835848holds for each compiler pass.
    836849
    837 Having proved in~\autoref{static} that the statically computed cost model is
     850Having proved in~\autoref{thm:static} that the statically computed cost model is
    838851accurate for the object code, we get as a corollary that it is also accurate
    839852for the source code if the compiler preserves weak traces and
     
    842855on the source code only.
    843856
    844 \begin{theorem}\label{preservation}
     857\begin{theorem}\label{thm:preservation}
    845858Given a compiler that preserves weak traces and propagates measurability,
    846859for all measurable execution fragment $T_1 = s_1 \to^{*} s_1'$ of the source
     
    852865
    853866\section{Proving the compiler correctness}
    854 \label{simulation}
    855 Because of \autoref{preservation}, to certify a compiler for the labelling
     867\label{sec:simulation}
     868Because of \autoref{thm:preservation}, to certify a compiler for the labelling
    856869approach we need to both prove that it respects the functional semantics of the
    857870program, and that it preserves weak traces and propagates measurability.
     
    13731386% As should be expected, even though the rules are asymmetric $\approx$ is in fact
    13741387% an equivalence relation.
    1375 
    1376 @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
    1377 
    1378 Therefore we now introduce an abstract notion of relation set between abstract
    1379 statuses and an abstract notion of 1-to-many forward simulation conditions.
    1380 These two definitions enjoy the following remarkable properties:
     1388%
     1389This argument enjoys the following remarkable properties:
    13811390\begin{enumerate}
    1382  \item they are generic enough to accommodate all passes of the CerCo compiler
    1383  \item the conjunction of the 1-to-many forward simulation conditions are
     1391 \item it is generic enough to accommodate all passes of the CerCo compiler;
     1392 \item the requested conditions are
    13841393       just slightly stricter than the statement of a 1-to-many forward
    1385        simulation in the classical case. In particular, they only require
    1386        the construction of very simple forms of structured traces made of
    1387        silent states only.
    1388  \item they allow to prove our main result of the paper: the 1-to-many
    1389        forward simulation conditions are sufficient to prove the trace
    1390        reconstruction theorem
     1394       simulation in the classical case;
     1395%        . In particular, they only require
     1396%        the construction of very simple forms of structured traces made of
     1397%        silent states only.
     1398 \item they allow to prove our main result of the paper: the conditions we will
     1399 present are sufficient to prove the intensional preservation needed to prove
     1400 the compiler correct (\autoref{thm:main}).
    13911401\end{enumerate}
    1392 
    1393 Point 3. is the important one. First of all it means that we have reduced
    1394 the complex problem of trace reconstruction to a much simpler one that,
     1402The last point is the important one. First of all it means that we have reduced
     1403the complex problem of preserving the structure of fragments to a much simpler one that,
    13951404moreover, can be solved with slight adaptations of the forward simulation proof
    13961405that is performed for a compiler that only cares about functional properties.
    1397 Therefore we have successfully splitted as much as possible the proof of
     1406Therefore we have successfully split as much as possible the proof of
    13981407preservation of functional properties from that of non-functional ones.
    13991408
    1400 @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
     1409% @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
    14011410
    14021411\paragraph{Relation sets.}
    14031412Let $S_1$ and $S_2$ be two deterministic labelled transition systems as described
    1404 in \autoref{semantics}. We introduce now the four relations $\mathcal{S,C,R,L}\subseteq S_1\times S_2$
     1413in \autoref{sec:semantics}. We introduce now the four relations $\mathcal{S,C,R,L}\subseteq S_1\times S_2$
    14051414between states of the two systems. The first two are abstract and must be instantiated
    14061415by every pass. The remaining two are derived.
     
    14181427% the target code of every cost emitting statement in the source code.
    14191428
     1429\begin{definition}[$\R$ and $\LS$]
     1430\label{def:R_LS}
    14201431Two states
    14211432$s_1$ and $s_2$ are $\R$-related if every time $s_1$ is the
     
    14231434$s_2'$ in the target code, then $s_2$ is the successor of $s_2'$. Formally:
    14241435$$s_1\R s_2 \iffdef \forall s_1',s_2'.s_1'\C s_2' \to s_1'\ar s_1 \to s_2' \ar s_2.$$
    1425 We will require all pairs of states that return from related calls to be
    1426 $\R$-related. This, in combinantion with a dual requirement on function calls,
    1427 will grant that calls return exactly where they are supposed to be.
    14281436
    14291437We say states in $s_1\in S_1$ and $s_2\in S_2$ are label-related
    1430 (marked $s_1\L s_2$) if
     1438(marked $s_1\LS s_2$) if
    14311439\begin{itemize}
    14321440\item they both emit the same label $L$;
     
    14351443 $s_1\S s_2'$, otherwise if $L\notin\ell(\Functions)$ then $s_1\S s_2$.
    14361444\end{itemize}
    1437 This relation captures the fact that synchronisation on labels can be decoupled from
    1438 ``semantic synchronisation'' (in particular at the start of functions). $s_1\L s_2$
     1445\end{definition}
     1446We will require all pairs of states that return from related calls to be
     1447$\R$-related. This, in combinantion with a dual requirement on function calls,
     1448will grant that calls return exactly where they are supposed to be.
     1449On the other hand the $\LS$ relation captures the fact that synchronisation on labels can be decoupled from
     1450``semantic synchronisation'' (in particular at the start of functions). $s_1\LS s_2$
    14391451tells that the two state are synchronised as to labels, and $s_2$ will eventually
    14401452synchronise semantically using only silent execution steps (apart from the very first
     
    14431455Given the relations $\S$ and $\C$, \autoref{fig:forwardsim} defines a set of
    14441456local simulation conditions that are sufficient to grant the correctness of
    1445 the compiler.
    1446 
     1457the compiler, as stated by the results that follow.
    14471458\begin{figure}
    14481459\centering
     
    15501561    \draw [new, overlay] (t1) to [bend left ] node [rel] {$\R$} (r);
    15511562\end{tikzpicture}
    1552 \caption{Local simulation conditions. Each one states that the existence of
    1553 states in~XXX such that the dashed relations holds is implied by the assumptions
    1554 drawn as solid lines.}
     1563\caption{Local simulation conditions. Each one states that the assumptions
     1564         drawn solid imply the existence of the white states and the dashed relations.}
    15551565\label{fig:forwardsim}
    15561566\end{figure}
    1557 
    1558 \begin{lemma}
     1567%
     1568\begin{lemma}[Preservation of structured fragments]
    15591569If $S_1$, $S_2$, $\S$ and $\C$ satisfy the diagrams in \autoref{fig:forwardsim},
    15601570$T_1=s_1\To s_1'$ is a structured fragment not starting with a $\ell(f)$ emission,
    15611571and $s_1\S s_2$, then there is $T_2=s_2\To s_2'$ with $T\approx T'$ and $s_1'\S s_2'$.
    15621572\end{lemma}
    1563 \begin{theorem}
     1573\begin{theorem}[Preservation of measurable fragments]
     1574\label{thm:main}
    15641575If $S_1$, $S_2$, $\S$ and $\C$ satisfy the diagrams in \autoref{fig:forwardsim},
    1565 $M_1:s_1\MeasTo s_1'$ is a measurable fragment of $S_1$ and $s_2$ is such that
    1566 $s_1\L s_2$, then there is $M_2:s_2\MeasTo s_2'$ with $M_1\approx M_2$. Moreover,
     1576$M_1 = s_1\to^* s_1'$ is a measurable fragment of $S_1$ and $s_2$ is such that
     1577$s_1\LS s_2$, then there is a measurable $M_2 = s_2\to^* s_2'$ with $|M_1|=|M_2|$. Moreover,
    15671578there is a state $s_2''$ with $s_2'\to[\tau *]s_2''$ and $s_1'\S s_2''$.
    15681579\end{theorem}
    1569 @@@@@@@@@@@@@@@@@@@@@@@
     1580In particular, the above theorem
     1581applied to the \verb+main+ function of the program together with an assumption
     1582stating that initial states are $\LS$-related shows that
     1583for each measurable execution of the source code,
     1584the compiled code produces a measurable execution with the same observables.
     1585Combined with \autoref{thm:static} and by proving the conditions in
     1586\autoref{fig:forwardsim} for every pass, the compiler is proved correct.
     1587
    15701588
    15711589% \begin{figure}
     
    19521970\end{verbatim}
    19531971\end{comment}
    1954 
    1955 \section{Conclusions and future works}
    1956 AGGIUNGERE DISCORSO SU APPROCCIO ALTERNATIVO: LABEL DOPO OGNI CALL. COMUNQUE
    1957 CON TRACCE STRUTTURATE, MA DIVERSAMENTE
    1958 \label{conclusions}
    1959 The labelling approach is a technique to implement compilers that induce on
    1960 the source code a non uniform cost model determined from the object code
    1961 produced. The cost model assigns a cost to each basic block of the program.
    1962 The main theorem of the approach says that there is an exact
    1963 correspondence between the sequence of basic blocks started in the source
    1964 and object code, and that no instruction in the source or object code is
    1965 executed outside a basic block. Thus the cost of object code execution
    1966 can be computed precisely on the source.
    1967 
    1968 In this paper we scaled the labelling approach to cover a programming language
    1969 with function calls. This introduces new difficulties when the language
    1970 is unstructured, i.e. it allows function calls to return anywhere in the code,
    1971 destroying the hope of a static prediction of the cost of basic blocks.
    1972 We restored static predictability by introducing a new semantics for unstructured
    1973 programs that single outs well structured executions. The latter are represented
    1974 by structured traces, a generalisation of streams of observables that capture
    1975 several structural invariants of the execution, like well nesting of functions
    1976 or the fact that every basic block must start with a code emission statement.
    1977 We showed that structured traces are sufficiently well behaved to statically compute a precise cost model on the object code.
    1978 
    1979 We also proved that the cost model computed on the object code is also valid
    1980 on the source code if the compiler respects two requirements: the weak execution
    1981 traces of the source and target code must be the same and the object
    1982 code execution fragments are structured.
    1983 
    1984 To prove that a compiler respects the requirement we extended the notion
    1985 of forward simulation proof for a labelled transition system to grant
    1986 preservation of structured fragments. If the source language of the compiler
    1987 is structured, all its execution fragments are, allowing to deduce from
    1988 preservation of structure that object code fragments are structured too.
    1989 
    1990 Finally, we identified measurable execution fragments that are those whose
    1991 execution time (once compiled) can be exactly computed looking at the object
    1992 code weak execution traces only. A final instrumentation pass on the source
    1993 code can then be used to turn the non functional property of having a certain
    1994 cost into the functional property of granting that a certain global variable
    1995 incremented at the beginning of every block according to the induced cost model
    1996 has a certain value.
    1997 
    1998 All results presented in the paper are part of a larger certification of a
    1999 C compiler which is based on the labelling approach. The certification, done
    2000 in Matita, is the main deliverable of the FET-Open Certified Complexity (CerCo).
    2001 
    2002 The short term future work consists in the completion of the certification of
    2003 the CerCo compiler exploiting the main theorem of this paper.
    2004 
    2005 \paragraph{Related works.}
    2006 CerCo is the first project that explicitly tries to induce a
    2007 precise cost model on the source code in order to establish non-functional
    2008 properties of programs on an high level language. Traditional certifications
    2009 of compilers, like~\cite{compcert2,piton}, only explicitly prove preservation
    2010 of the functional properties.
    2011 
    2012 Usually forward simulations take the following form: for each transition
    2013 from $s_1$ to $s_2$ in the source code, there exists an equivalent sequence of
    2014 transitions in the target code of length $n$. The number $n$ of transition steps
    2015 in the target code can just be the witness of the existential statement.
    2016 An equivalent alternative when the proof of simulation is constructive consists
    2017 in providing an explicit function, called \emph{clock function} in the
    2018 literature~\cite{clockfunctions}, that computes $n$ from $s_1$. Every clock
    2019 function constitutes then a cost model for the source code, in the spirit of
    2020 what we are doing in CerCo. However, we believe our solution to be superior
    2021 in the following respects: 1) the machinery of the labelling approach is
    2022 insensible to the resource being measured. Indeed, any cost model computed on
    2023 the object code can be lifted to the source code (e.g. stack space used,
    2024 energy consumed, etc.) simply re-proving an analogue of~\autoref{static}.
    2025 For example, in CerCo we transported to the source level not only the execution
    2026 time cost model, but also the amount of stack used by function calls.
    2027 On the contrary, clock functions only talk about
    2028 number of transition steps. In order to extend the approach with clock functions
    2029 to other resources, additional functions must be introduced. Moreover, the
    2030 additional functions would be handled differently in the proof.
    2031 2) the cost models induced by the labelling approach have a simple presentation.
    2032 In particular, they associate a number to each basic block. More complex
    2033 models can be induced when the approach is scaled to cover, for instance,
    2034 loop optimisations~\cite{loopoptimizations}, but the costs are still meant to
    2035 be easy to understand and manipulate in an interactive theorem prover or
    2036 in Frama-C.
    2037 On the contrary, a clock function is a complex function of the state $s_1$
    2038 which, as a function, is an opaque object that is difficult to reify as
    2039 source code in order to reason on it.
    2040 
    20411972\section{The formalisation}
    20421973As we already explained, for the sake of presentation we explained the formal
     
    20501981 \item Rather than having a generic notion of fragment and a predicate of
    20511982 structuredness and measurability, we use inductive definitions internalising
    2052  the conditions. Among other things this turns the proof of \autoref{preservation} a matter of
     1983 the conditions. Among other things this turns the proof of \autoref{thm:preservation} a matter of
    20531984 structural induction, when it would require more complex induction schemes
    20541985 otherwise.
     
    20611992 with the tradition of labelled transition systems and named sequences of states
    20621993 and steps \emph{execution fragments}. In the development we named our structures
    2063  \emph{traces}. In the remainder of this section we will use both names.
     1994 \emph{traces} (that usually refer to the sequence of emitted labels only).
     1995 In the remainder of this section we will use both names.
    20641996\end{itemize}
    20651997
     
    22692201
    22702202\paragraph{The forward simulation result.}
     2203In the formalisation, the equivalent conditions of those
     2204depicted in \autoref{fig:forwardsim} can be seen in \autoref{fig:forwardsim'}.
     2205Again we must produce for each pass the relations $\S$ and $\C$. Another derived
     2206relation is $\L$, which holds for states $s_1$ and $s_2$ when $L~s_1=L~s_2$.
     2207%
     2208Some details are skipped in the figure regarding the nature of the repeated steps depicted with
     2209an asterisk.
     2210There are three kinds of iterated steps without
     2211calls nor returns involved in the conditions:
     2212\begin{itemize}
     2213 \item sequences where all states strictly after the first one are unlabelled;
     2214 these are what can be safely prepended to \verb+TAL+'s, and are modelled
     2215 by the inductive definition \verb+trace_any_any+ (\verb+TAA+) in the formalisation;
     2216 \item sequences where all ``internal'' states strictly after the first and before the last
     2217 are unlabelled; these are \verb+trace_any_any_free+ in the formalisation (\verb+TAAF+, which
     2218 is just a \verb+TAA+ followed by a step);
     2219 \item sequences where all states strictly before the last are unlabelled that we
     2220 will call here \verb+trace_any_any_right+ (\verb+TAAR+); in the formalisation
     2221 these are in fact \verb+TAAF+'s with an additional condition.
     2222\end{itemize}
     2223%
    22712224\begin{figure}
    22722225\centering
     
    22982251% \end{subfigure}
    22992252% &
    2300 \begin{subfigure}{.25\linewidth}
     2253\begin{subfigure}[b]{.25\linewidth}
    23012254\centering
    23022255\begin{tikzpicture}[every join/.style={ar}, join all, thick,
     
    23172270\end{subfigure}
    23182271&
    2319 \begin{subfigure}{.375\linewidth}
     2272\begin{subfigure}[b]{.375\linewidth}
    23202273\centering
    23212274\begin{tikzpicture}[every join/.style={ar}, join all, thick,
     
    23382291    \draw [new] (s1) to [bend left] node [rel] {$\C$} (c);
    23392292    \end{tikzpicture}
    2340 \caption{The \verb+cl_call+ case.}
     2293\caption{The \verb+cl_call+ case.\\\vspace{3.1ex}}
    23412294\label{subfig:cl_call}
    23422295\end{subfigure}
    23432296&
    2344 \begin{subfigure}{.375\linewidth}
     2297\begin{subfigure}[b]{.375\linewidth}
    23452298\centering
    23462299\begin{tikzpicture}[every join/.style={ar}, join all, thick,
     
    23622315    \draw [new, overlay] (t1) to [bend left ] node [rel] {$\R$} (r);
    23632316\end{tikzpicture}
    2364 \caption{The \verb+cl_return+ case.}
     2317\caption{The \verb+cl_return+ case.\\\vspace{3.1ex}}
    23652318\label{subfig:cl_return}
    23662319\end{subfigure}
    23672320\end{tabular}
    2368 \caption{Mnemonic diagrams depicting the hypotheses for the preservation of structured traces.
    2369          Dashed lines
    2370          and arrows indicates how the diagrams must be closed when solid relations
    2371          are present.
    2372          DA CAMBIARE: $\L$ va in conflitto con l'altra definizione}
    2373 \label{fig:forwardsim}
     2321\caption{Mnemonic diagrams depicting the hypotheses for the preservation of structured traces.}
     2322\label{fig:forwardsim'}
    23742323\end{figure}
     2324This prolification of different types of fragments is a downside of shifting
     2325labelling from steps to states.
     2326The actual conditions are as follows.
     2327%
     2328\begin{condition}[\verb+cl_other+ and \verb+cl_jump+, \autoref{subfig:cl_other_jump}]
     2329\label{cond:other}
     2330 For all $s_1,s_1',s_2$ such that $s_1 \S s_1'$, and
     2331 $s_1\exec s_1'$, and either $s_1 \class \verb+cl_other+$ or
     2332 both $s_1\class\verb+cl_other+$ and $\ell~s_1'$,
     2333 there exists an $s_2'$ and a $taaf:\verb+trace_any_any_free+~s_2~s_2'$
     2334 such that $s_1' \mathrel{{\S} \cap {\L}} s_2'$ and either
     2335 $taaf$ is non empty, or one among $s_1$ and $s_1'$ is not labelled.
     2336 The last condition is needed to prevent the collapsing of two
     2337 label-emitting consecutive states.
     2338\end{condition}
     2339
     2340% In the above condition depicted in ,
     2341% a $\verb+trace_any_any_free+~s_1~s_2$ (which from now on
     2342% will be shorthanded as \verb+TAAF+) is an
     2343% inductive type of structured traces that do not contain function calls or
     2344% cost emission statements. Differently from a \verb+TAA+, the
     2345% instruction to be executed in the lookahead state $s_2$ may be a cost emission
     2346% statement.
     2347%
     2348% The intuition of the condition is that one step can be replaced with zero or more steps if it
     2349% preserves the relation between the data and if the two final statuses are
     2350% labelled in the same way. Moreover, we must take special care of the empty case
     2351% to avoid collapsing two consecutive states that emit a label, missing one of the two emissions.
     2352%
     2353\begin{condition}[\verb+cl_call+, \autoref{subfig:cl_call}]
     2354\label{cond:call}
     2355 For all $s_1,s_1',s_2$ s.t. $s_1 \S s_1'$ and
     2356 $s_1\exec s_1'$ and $s_1 \class \verb+cl_call+$, there exists $s_a, s_b, s_2'$, a
     2357$\verb+TAA+~s_2~s_a$, and a
     2358$\verb+TAAF+~s_b~s_2'$ such that:
     2359$s_a\class\verb+cl_call+$, the \verb+as_call_ident+'s of
     2360the two call states are the same, $s_1 \C s_a$,
     2361$s_a\exec s_b$, $s_1' \L s_b$ and
     2362$s_1' \S s_2'$.
     2363\end{condition}
     2364
     2365% The condition, depicted in \autoref{subfig:cl_call} says that, to simulate a function call, we can perform a
     2366% sequence of silent actions before and after the function call itself.
     2367% The old and new call states must be $\C$-related, the old and new
     2368% states at the beginning of the function execution must be $\L$-related
     2369% and, finally, the two initial and final states must be $\S$-related
     2370% as usual.
     2371
     2372\begin{condition}[\verb+cl_return+, \autoref{subfig:cl_return}]
     2373\label{cond:return}
     2374 For all $s_1,s_1',s_2$ s.t. $s_1 \S s_1'$,
     2375 $s_1\exec s_1'$ and $s_1 \class \verb+cl_return+$, there exists $s_a, s_b, s_2'$, a
     2376$\verb+TAA+~s_2~s_a$ and a
     2377$\verb+TAAR+~s_b~s_2'$ such that:
     2378$s_a\class\verb+cl_return+$,
     2379$s_a\exec s_b$,
     2380$s_1' \R s_b$ and
     2381$s_1' \mathrel{{\S} \cap {\L}} s_2'$.
     2382\end{condition}
     2383
     2384\paragraph{Main result.}
     2385Let us assume that $\S$ and $\C$ are given such that
     2386Conditions~\ref{cond:other}, \ref{cond:call} and~\ref{cond:return}
     2387are satisfied. If we reword the
     2388relation $\LS$ of \autoref{def:R_LS} by defining
     2389$$s_1\LS s_2\iffdef s_1\L s_2\wedge \exists s_2', taa:\verb+TAA+~s_2~s_2'. s_1\S s_2',$$
     2390we can prove the trace reconstruction theorem,
     2391which is the analogue of \autoref{thm:main}.
     2392%
     2393\begin{theorem}[\verb+status_simulation_produce_tlr+]
     2394\label{thm:main'}
     2395For every $s_1,s_1',s_2$ s.t.
     2396$s_1\LS s_2$ and there is a $tlr_1:\verb+TLR+~s_1~s_1'$,
     2397then there exist $s_2'$ and $tlr_2:\verb+TLR+~s_2~s_2'$
     2398with $|tlr_1|=|tlr_2|$. Moreover there are $s_2''$ and
     2399a $\verb+TAAR+~s_2'~s_2''$ with $s_2'\mathrel{{\S}\cap{\L}} s_2''$.
     2400\end{theorem}
     2401% In particular, the \verb+status_simulation_produce_tlr+ theorem
     2402% applied to the \verb+main+ function of the program and equal
     2403% $s_{2_b}$ and $s_2$ states shows that, for every initial state in the
     2404% source code that induces a structured trace in the source code,
     2405% the compiled code produces a similar structured trace.
     2406The theorem states that a \verb+trace_label_return+ in the source code
     2407together with a precomputed preamble of silent states
     2408(hidden in the definition of $\LS$) in the target code induces a
     2409similar \verb+trace_label_return+ in the target code.
     2410% Note that the statement does not
     2411% require the produced \verb+trace_label_return+ to start with the
     2412% precomputed preamble, even if this is likely to be the case in concrete
     2413% implementations.
     2414The preamble in input and the postamble in output are necessary for compositionality,
     2415and follow directly from the the fact that points semantically related
     2416may not correspond to points where the same observable events are fired, in particular
     2417cost labels and $RET$'s that mark the borders of measurability.
     2418
     2419The proof proceeds by mutual structural induction on the traces involved.
     2420In the actual formalisation, in place of $|tlr_1|=|tlr_2|$ we use a
     2421recursively defined simulation relation between traces that implies the required
     2422equality.
     2423
     2424\section{Conclusions and future works}
     2425\label{sec:conclusions}
     2426The labelling approach is a technique to implement compilers that induce on
     2427the source code a non uniform cost model determined from the object code
     2428produced. The cost model assigns a cost to each basic block of the program.
     2429The main theorem of the approach says that there is an exact
     2430correspondence between the sequence of basic blocks started in the source
     2431and object code, and that no instruction in the source or object code is
     2432executed outside a basic block. Thus the cost of object code execution
     2433can be computed precisely on the source.
     2434
     2435In this paper we scaled the labelling approach to cover a programming language
     2436with function calls. This introduces new difficulties when the language
     2437is unstructured, i.e. it allows function calls to return anywhere in the code,
     2438destroying the hope of a static prediction of the cost of basic blocks.
     2439We restored static predictability by introducing a new semantics for unstructured
     2440programs that single outs well structured executions. The latter are represented
     2441by structured traces, a generalisation of streams of observables that capture
     2442several structural invariants of the execution, like well nesting of functions
     2443or the fact that every basic block must start with a code emission statement.
     2444We showed that structured traces are sufficiently well behaved to statically compute a precise cost model on the object code.
     2445
     2446We also proved that the cost model computed on the object code is also valid
     2447on the source code if the compiler respects two requirements: the weak execution
     2448traces of the source and target code must be the same and the object
     2449code execution fragments are structured.
     2450
     2451To prove that a compiler respects the requirement we extended the notion
     2452of forward simulation proof for a labelled transition system to grant
     2453preservation of structured fragments. If the source language of the compiler
     2454is structured, all its execution fragments are, allowing to deduce from
     2455preservation of structure that object code fragments are structured too.
     2456
     2457Finally, we identified measurable execution fragments that are those whose
     2458execution time (once compiled) can be exactly computed looking at the object
     2459code weak execution traces only. A final instrumentation pass on the source
     2460code can then be used to turn the non functional property of having a certain
     2461cost into the functional property of granting that a certain global variable
     2462incremented at the beginning of every block according to the induced cost model
     2463has a certain value.
     2464
     2465All results presented in the paper are part of a larger certification of a
     2466C compiler which is based on the labelling approach. The certification, done
     2467in Matita, is the main deliverable of the FET-Open Certified Complexity (CerCo).
     2468
     2469The short term objective consists in the completion of the certification of
     2470the CerCo compiler exploiting the main theorem of this paper. An alternative approach
     2471to the same problem that we would like to investigate consists in labelling
     2472every instruction that follows a call. This would still require a form of structured
     2473execution fragments, stating that no only calls but also returns are always followed
     2474by an emission. The main downside is the pollution of the instrumented code
     2475with many cost annotations.
     2476
     2477\paragraph{Related works.}
     2478CerCo is the first project that explicitly tries to induce a
     2479precise cost model on the source code in order to establish non-functional
     2480properties of programs on an high level language. Traditional certifications
     2481of compilers, like~\cite{compcert2,piton}, only explicitly prove preservation
     2482of the functional properties.
     2483
     2484Usually forward simulations take the following form: for each transition
     2485from $s_1$ to $s_2$ in the source code, there exists an equivalent sequence of
     2486transitions in the target code of length $n$. The number $n$ of transition steps
     2487in the target code can just be the witness of the existential statement.
     2488An equivalent alternative when the proof of simulation is constructive consists
     2489in providing an explicit function, called \emph{clock function} in the
     2490literature~\cite{clockfunctions}, that computes $n$ from $s_1$. Every clock
     2491function constitutes then a cost model for the source code, in the spirit of
     2492what we are doing in CerCo. However, we believe our solution to be superior
     2493in the following respects: 1) the machinery of the labelling approach is
     2494insensible to the resource being measured. Indeed, any cost model computed on
     2495the object code can be lifted to the source code (e.g. stack space used,
     2496energy consumed, etc.) simply re-proving an analogue of~\autoref{thm:static}.
     2497For example, in CerCo we transported to the source level not only the execution
     2498time cost model, but also the amount of stack used by function calls.
     2499On the contrary, clock functions only talk about
     2500number of transition steps. In order to extend the approach with clock functions
     2501to other resources, additional functions must be introduced. Moreover, the
     2502additional functions would be handled differently in the proof.
     25032) the cost models induced by the labelling approach have a simple presentation.
     2504In particular, they associate a number to each basic block. More complex
     2505models can be induced when the approach is scaled to cover, for instance,
     2506loop optimisations~\cite{loopoptimizations}, but the costs are still meant to
     2507be easy to understand and manipulate in an interactive theorem prover or
     2508in Frama-C.
     2509On the contrary, a clock function is a complex function of the state $s_1$
     2510which, as a function, is an opaque object that is difficult to reify as
     2511source code in order to reason on it.
     2512
     2513% e.g.
     2514% because the 1-to-many forward simulation conditions allow in the
     2515% case of function calls to execute a preamble of silent instructions just after
     2516% the function call.
     2517
     2518% Similarly to the call condition, to simulate a return we can perform a
     2519% sequence of silent actions before and after the return statement itself,
     2520% as depicted in \autoref{subfig:cl_return}.
     2521% The old and the new statements after the return must be $\R$-related,
     2522% to grant that they returned to corresponding calls.
     2523% The two initial and final states must be $\S$-related
     2524% as usual and, moreover, they must exhibit the same labels. Finally, when
     2525% the suffix is non empty we must take care of not inserting a new
     2526% unmatched cost emission statement just after the return statement.
     2527
    23752528
    23762529% There are three mutual structural recursive functions, one for each of
Note: See TracChangeset for help on using the changeset viewer.