Changeset 2634 for Papers


Ignore:
Timestamp:
Feb 6, 2013, 11:23:28 PM (6 years ago)
Author:
sacerdot
Message:

...

File:
1 edited

Legend:

Unmodified
Added
Removed
  • Papers/itp-2013/ccexec.tex

    r2633 r2634  
    4848special observables are required for function calls. In this paper we
    4949present a generic forward simulation proof that preserves all these observables.
    50 The proof statement is based on a new mechanized semantics that traces the
     50The proof statement is based on a new mechanised semantics that traces the
    5151structure of execution when the language is unstructured. The generic semantics
    52 and simulation proof have been mechanized in the interactive theorem prover
     52and simulation proof have been mechanised in the interactive theorem prover
    5353Matita.
    5454\end{abstract}
     
    6868to every occurrence, and the exact cost can only be known after compilation.
    6969
    70 According to the labeling approach, the compiler is free to compile and optimize
     70According to the labelling approach, the compiler is free to compile and optimise
    7171the source code without any major restriction, but it must keep trace
    7272of what happens to basic blocks during the compilation. The cost model is
     
    9090In particular in~\cite{functionallabelling} it has been applied to a functional
    9191language and in~\cite{loopoptimizations} it has been shown that the approach
    92 can be slightly complicated to handle loop optimizations and, more generally,
    93 program optimizations that do not preserve the structure of basic blocks.
     92can be slightly complicated to handle loop optimisations and, more generally,
     93program optimisations that do not preserve the structure of basic blocks.
    9494On-going work also shows that the labelling approach is also compatible with
    9595the complex analyses required to obtain a cost model for object code
     
    9999In the European Project CerCo (Certified Complexity~\footnote{\url{http://cerco.cs.unibo.it}})~\cite{cerco} we are certifying a labelling approach based compiler for a large subset of C to
    1001008051 object code. The compiler is
    101 moderately optimizing and implements a compilation chain that is largely
    102 inspired to that of CompCert~\cite{compcert1,compcert2}. Compared to work done in~\cite{easylabelling}, the main novely and source of difficulties is due to the presence
     101moderately optimising and implements a compilation chain that is largely
     102inspired to that of CompCert~\cite{compcert1,compcert2}. Compared to work done in~\cite{easylabelling}, the main novelty and source of difficulties is due to the presence
    103103of function calls. Surprisingly, the addition of function calls require a
    104 rivisitation of the proof technique given in~\cite{easylabelling}. In
     104revisitation of the proof technique given in~\cite{easylabelling}. In
    105105particular, at the core of the labelling approach there is a forward
    106106simulation proof that, in the case of \texttt{While}, is only minimally
     
    110110the back-end languages must grant a whole new set of invariants.
    111111
    112 In this paper we present a formalization in the Matita interactive theorem
     112In this paper we present a formalisation in the Matita interactive theorem
    113113prover of a generic version of the simulation proof required for unstructured
    114114languages. All back-end languages of the CerCo compiler are unstructured
     
    154154
    155155Let now $C$ be a compiler from $\mathcal{P}$ to the object code $\mathcal{M}$,
    156 that is organized in passes. Let $\mathcal{Q}_i$ be the $i$-th intermediate
     156that is organised in passes. Let $\mathcal{Q}_i$ be the $i$-th intermediate
    157157language used by the compiler. We can easily extend every
    158158intermediate language (and its semantics) with an $\texttt{emit L}$ statement
     
    174174statements, observables now capture both functional and non-functional
    175175behaviours.
    176 This correctness property is called in the litterature a forward simulation
     176This correctness property is called in the literature a forward simulation
    177177and is sufficient for correctness when the target language is
    178178deterministic~\cite{compcert3}.
     
    239239that a function will return control just after the function call point.
    240240The semantics of the return statement, indeed, consists in fetching the
    241 return address from some internal structure (tipically the control stack) and
     241return address from some internal structure (typically the control stack) and
    242242jumping directly to it. The code can freely manipulate the control stack to
    243243make the procedure returns to whatever position. Indeed, it is also possible
     
    248248point? The answer is negative and the property is not implied by forward
    249249simulation proofs. For instance, imagine to modify a correct compiler pass
    250 by sistematically adding one to the return address on the stack and by
     250by systematically adding one to the return address on the stack and by
    251251putting a \texttt{NOP} (or any other instruction that takes one byte) after
    252252every function call. The obtained code will be functionally indistinguishable,
     
    271271non-functional properties, the user (or the automation tool) will have to handle
    272272larger expressions. Second, the more labels are emitted, the more difficult it
    273 becomes to implement powerful optimizations respecting the code structure.
     273becomes to implement powerful optimisations respecting the code structure.
    274274Indeed, function calls are usually implemented in such a way that most registers
    275275are preserved by the call, so that the static analysis of the block is not
    276 interrupted by the call and an optimization can involve both the code before
     276interrupted by the call and an optimisation can involve both the code before
    277277and after the function call. Third, instrumenting the source code may require
    278278unpleasant modification of it. Take, for example, the code
     
    331331defined inductively for terminating programs and coinductively for diverging
    332332ones. In the paper we focus only on the inductive structure, i.e. we assume
    333 that all programs that are given a semantics are total. The Matita formalization
     333that all programs that are given a semantics are total. The Matita formalisation
    334334also shows the coinductive definitions. The semantics of a program is then
    335335defined as a function that maps an initial state into a structured trace.
     
    349349$s_2$ in one step;\\ $\texttt{as\_classify}~s~c$ holds if the next instruction
    350350to be executed in $s$ is classified according to $c \in
    351 \{\texttt{cl\_return,cl\_jump,cl\_call,cl\_other}\}$ (we omit tailcalls for simplicity);
     351\{\texttt{cl\_return,cl\_jump,cl\_call,cl\_other}\}$ (we omit tail-calls for simplicity);
    352352the predicate $\texttt{as\_costed}~s$ holds if the next instruction to be
    353353executed in $s$ is a cost emission statement (also classified
     
    577577instruction. The theorem is proved by structural induction over the structured
    578578trace, and is based on the invariant that
    579 iff the function that computes the cost model has analyzed the instruction
     579iff the function that computes the cost model has analysed the instruction
    580580to be executed at $s_2$ after the one to be executed at $s_1$, and if
    581581the structured trace starts with $s_1$, then eventually it will contain also
     
    616616Therefore we introduce here a similarity relation between traces with
    617617the same structure. Theorem~\texttt{tlr\_rel\_to\_traces\_same\_flatten}
    618 in the Matita formalization shows that~\ref{th2} holds for every pair
     618in the Matita formalisation shows that~\ref{th2} holds for every pair
    619619$(\tau_1,\tau_2)$ of similar traces.
    620620
     
    626626the relation maps function calls to function calls to the same function,
    627627label emission statements to emissions of the same label, concatenation of
    628 subtraces to concatenation of subtraces of the same lenghth and starting with
    629 the same emissione statement, etc.
    630 
    631 In the formalization the three similarity relations --- one for each trace
     628subtraces to concatenation of subtraces of the same length and starting with
     629the same emission statement, etc.
     630
     631In the formalisation the three similarity relations --- one for each trace
    632632kind --- are defined by structural recursion on the first trace and pattern
    633633matching over the second. Here we turn
     
    783783\label{simulation}
    784784
    785 We summarize here the results of the previous sections. Each intermediate
     785We summarise here the results of the previous sections. Each intermediate
    786786unstructured language can be given a semantics based on structured traces,
    787787that single out those runs that respect a certain number of invariants.
     
    809809simulation \emph{trace reconstruction}.
    810810
    811 The statement just introduced, however, is too simplicistic and not provable
     811The statement just introduced, however, is too simplistic and not provable
    812812in the general case. To understand why, consider the case of a function call
    813813and the pass that fixes the parameter passing conventions. A function
     
    841841These two definitions enjoy the following remarkable properties:
    842842\begin{enumerate}
    843  \item they are generic enough to accomodate all passes of the CerCo compiler
     843 \item they are generic enough to accommodate all passes of the CerCo compiler
    844844 \item the conjunction of the 1-to-0-or-many forward simulation conditions are
    845845       just slightly stricter than the statement of a 1-to-0-or-many forward
     
    861861that the cost model can be computed on the object code and lifted to the
    862862source code to reason on non-functional properties, assuming that
    863 the 1-to-0-or-many forward simulation conditions are fullfilled for every
     863the 1-to-0-or-many forward simulation conditions are fulfilled for every
    864864compiler pass.
    865865
     
    12031203We restore static predictability by introducing a new semantics for unstructured
    12041204programs that single outs well structured executions. The latter are represented
    1205 by structured traces, a generalization of streams of observables that capture
     1205by structured traces, a generalisation of streams of observables that capture
    12061206several structural invariants of the execution, like well nesting of functions
    12071207or the fact that every basic block must start with a code emission statement.
     
    12411241An equivalent alternative when the proof of simulation is constructive consists
    12421242in providing an explicit function, called \emph{clock function} in the
    1243 litterature~\cite{clockfunctions}, that computes $n$ from $s_1$. Every clock
     1243literature~\cite{clockfunctions}, that computes $n$ from $s_1$. Every clock
    12441244function constitutes then a cost model for the source code, in the spirit of
    12451245what we are doing in CerCo. However, we believe our solution to be superior
     
    12541254In particular, they associate a number to each basic block. More complex
    12551255models can be induced when the approach is scaled to cover, for instance,
    1256 loop optimizations~\cite{loopoptimizations}, but the costs are still meant to
     1256loop optimisations~\cite{loopoptimizations}, but the costs are still meant to
    12571257be easy to understand and manipulate in an interactive theorem prover or
    12581258in Frama-C.
Note: See TracChangeset for help on using the changeset viewer.