# Changeset 2634

Timestamp:
Feb 6, 2013, 11:23:28 PM
Message:

...

File:
1 edited

special observables are required for function calls. In this paper we present a generic forward simulation proof that preserves all these observables. The proof statement is based on a new mechanised semantics that traces the structure of execution when the language is unstructured. The generic semantics and simulation proof have been mechanised in the interactive theorem prover Matita. \end{abstract}

...

to every occurrence, and the exact cost can only be known after compilation. According to the labelling approach, the compiler is free to compile and optimise the source code without any major restriction, but it must keep track of what happens to basic blocks during the compilation. The cost model is

...

In particular in~\cite{functionallabelling} it has been applied to a functional language, and in~\cite{loopoptimizations} it has been shown that the approach can be slightly complicated to handle loop optimisations and, more generally, program optimisations that do not preserve the structure of basic blocks. On-going work shows that the labelling approach is also compatible with the complex analyses required to obtain a cost model for object code

...

In the European Project CerCo (Certified Complexity\footnote{\url{http://cerco.cs.unibo.it}})~\cite{cerco} we are certifying a labelling-approach-based compiler for a large subset of C to 8051 object code. The compiler is moderately optimising and implements a compilation chain that is largely inspired by that of CompCert~\cite{compcert1,compcert2}.
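To make the labelling approach concrete, the following sketch (plain Python with invented labels and costs; the actual development compiles C to 8051 code) shows the key idea: since the compiler preserves cost-label emissions, per-label costs measured on the object code can be lifted back to a source-level cost model.

```python
# A minimal sketch of the labelling approach. Each basic block starts by
# emitting a cost label; the compiler keeps emissions in place, so costs
# computed per label on the object code instrument the source program.

# Hypothetical per-label costs obtained by statically analysing the
# compiled code of each basic block (e.g. summing instruction cycles).
object_cost = {"L1": 4, "L2": 7}

def lifted_cost(emitted):
    """Source-level cost of a run: sum the cost of every emitted label,
    exactly what the instrumented source program would compute."""
    return sum(object_cost[label] for label in emitted)

# A run that enters block L1 once and iterates block L2 ten times:
print(lifted_cost(["L1"] + ["L2"] * 10))   # 4 + 10 * 7 = 74
```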
Compared to the work done in~\cite{easylabelling}, the main novelty and source of difficulties is the presence of function calls. Surprisingly, the addition of function calls requires a revisitation of the proof technique given in~\cite{easylabelling}. In particular, at the core of the labelling approach there is a forward simulation proof that, in the case of \texttt{While}, is only minimally

...

the back-end languages must grant a whole new set of invariants.

...

In this paper we present a formalisation in the Matita interactive theorem prover of a generic version of the simulation proof required for unstructured languages. All back-end languages of the CerCo compiler are unstructured

...

Now let $C$ be a compiler from $\mathcal{P}$ to the object code $\mathcal{M}$ that is organised in passes. Let $\mathcal{Q}_i$ be the $i$-th intermediate language used by the compiler. We can easily extend every intermediate language (and its semantics) with an $\texttt{emit L}$ statement

...

statements, observables now capture both functional and non-functional behaviours. This correctness property is called in the literature a forward simulation and is sufficient for correctness when the target language is deterministic~\cite{compcert3}.

...

that a function will return control just after the function call point.
The semantics of the return statement, indeed, consists in fetching the return address from some internal structure (typically the control stack) and jumping directly to it. The code can freely manipulate the control stack to make the procedure return to an arbitrary position. Indeed, it is also possible

...

point? The answer is negative and the property is not implied by forward simulation proofs. For instance, imagine modifying a correct compiler pass by systematically adding one to the return address on the stack and by putting a \texttt{NOP} (or any other instruction that takes one byte) after every function call. The obtained code will be functionally indistinguishable,

...

non-functional properties, the user (or the automation tool) will have to handle larger expressions. Second, the more labels are emitted, the more difficult it becomes to implement powerful optimisations respecting the code structure. Indeed, function calls are usually implemented in such a way that most registers are preserved by the call, so that the static analysis of the block is not interrupted by the call and an optimisation can involve both the code before and after the function call. Third, instrumenting the source code may require unpleasant modifications to it. Take, for example, the code

...

defined inductively for terminating programs and coinductively for diverging ones. In the paper we focus only on the inductive structure, i.e.\ we assume that all programs that are given a semantics are total. The Matita formalisation also shows the coinductive definitions.
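The counterexample above can be replayed on a toy stack machine (an illustration invented here, not the paper's formal semantics): incrementing every saved return address and padding each call with a `NOP` leaves the observable behaviour intact, yet control never returns to the instruction just after the call.

```python
# Toy stack machine: instructions are tuples; "call" carries an explicit
# saved return address so the "+1" trick of the modified pass is visible.

def run(prog):
    pc, stack, out, visited = 0, [], [], set()
    while prog[pc][0] != "halt":
        visited.add(pc)
        op = prog[pc]
        if op[0] == "call":          # op = ("call", target, return_addr)
            stack.append(op[2])
            pc = op[1]
        elif op[0] == "ret":
            pc = stack.pop()
        elif op[0] == "emit":
            out.append(op[1])
            pc += 1
        else:                        # "nop"
            pc += 1
    return out, visited

# Original code: the call saves the address of the next instruction (1).
orig = [("call", 3, 1), ("emit", "done"), ("halt",),
        ("emit", "f"), ("ret",)]

# Modified code: return address incremented, NOP inserted after the call.
mod = [("call", 4, 2), ("nop",), ("emit", "done"), ("halt",),
       ("emit", "f"), ("ret",)]

print(run(orig)[0] == run(mod)[0])   # same observables in both runs
print(1 in run(mod)[1])              # the NOP at address 1 is never executed
```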
The semantics of a program is then defined as a function that maps an initial state into a structured trace.

...

$s_2$ in one step;\\ $\texttt{as\_classify}~s~c$ holds if the next instruction to be executed in $s$ is classified according to $c \in \{\texttt{cl\_return,cl\_jump,cl\_call,cl\_other}\}$ (we omit tail-calls for simplicity); the predicate $\texttt{as\_costed}~s$ holds if the next instruction to be executed in $s$ is a cost emission statement (also classified

...

instruction. The theorem is proved by structural induction over the structured trace, and is based on the invariant that iff the function that computes the cost model has analysed the instruction to be executed at $s_2$ after the one to be executed at $s_1$, and if the structured trace starts with $s_1$, then eventually it will contain also

...

Therefore we introduce here a similarity relation between traces with the same structure. Theorem~\texttt{tlr\_rel\_to\_traces\_same\_flatten} in the Matita formalisation shows that~\ref{th2} holds for every pair $(\tau_1,\tau_2)$ of similar traces.

...

the relation maps function calls to function calls to the same function, label emission statements to emissions of the same label, concatenation of subtraces to concatenation of subtraces of the same length and starting with the same emission statement, etc. In the formalisation the three similarity relations --- one for each trace kind --- are defined by structural recursion on the first trace and pattern matching over the second. Here we turn

...

\label{simulation}
We summarise here the results of the previous sections. Each intermediate unstructured language can be given a semantics based on structured traces that single out those runs that respect a certain number of invariants.

...

simulation \emph{trace reconstruction}. The statement just introduced, however, is too simplistic and not provable in the general case. To understand why, consider the case of a function call and the pass that fixes the parameter passing conventions. A function

...

These two definitions enjoy the following remarkable properties:
\begin{enumerate}
\item they are generic enough to accommodate all passes of the CerCo compiler
\item the conjunction of the 1-to-0-or-many forward simulation conditions is just slightly stricter than the statement of a 1-to-0-or-many forward

...

that the cost model can be computed on the object code and lifted to the source code to reason on non-functional properties, assuming that the 1-to-0-or-many forward simulation conditions are fulfilled for every compiler pass.

...

We restore static predictability by introducing a new semantics for unstructured programs that singles out well-structured executions. The latter are represented by structured traces, a generalisation of streams of observables that capture several structural invariants of the execution, like the well nesting of functions or the fact that every basic block must start with a code emission statement.

...

An equivalent alternative, when the proof of simulation is constructive, consists in providing an explicit function, called \emph{clock function} in the literature~\cite{clockfunctions}, that computes $n$ from $s_1$.
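The 1-to-0-or-many condition can be illustrated with a small invented check (not the CerCo proof): every source step must be matched by a possibly empty sequence of target steps producing the same observables.

```python
# Runs are sequences of steps; "emit" steps are observable, "tau" steps
# are silent. The check below is a greedy matching of observables.

src = [("emit", "L1"), ("tau",), ("emit", "L2")]                      # source run
tgt = [("tau",), ("emit", "L1"), ("tau",), ("tau",), ("emit", "L2")]  # target run

def simulate(src, tgt):
    """Check that the target run matches the source run observable by
    observable; a silent source step may be matched by zero target steps."""
    j = 0
    for op in src:
        if op[0] == "emit":
            while j < len(tgt) and tgt[j][0] == "tau":  # 0-or-many silent steps
                j += 1
            if j == len(tgt) or tgt[j] != op:
                return False
            j += 1
    return all(step[0] == "tau" for step in tgt[j:])

print(simulate(src, tgt))   # the two runs flatten to the same observables
```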
Every clock function then constitutes a cost model for the source code, in the spirit of what we are doing in CerCo. However, we believe our solution to be superior

...

In particular, they associate a number to each basic block. More complex models can be induced when the approach is scaled to cover, for instance, loop optimisations~\cite{loopoptimizations}, but the costs are still meant to be easy to understand and manipulate in an interactive theorem prover or in Frama-C.
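The idea of a clock function can be sketched as follows; the statement names and per-statement step counts are invented for illustration, not taken from~\cite{clockfunctions}.

```python
# A clock function: from a source state, return the number n of target
# steps the compiled code performs before the executions re-synchronise.

steps_in_target = {"assign": 2, "branch": 3, "call": 5, "emit": 1}

def clock(s1):
    """Clock function: target steps simulated starting from state s1."""
    return steps_in_target[s1["next_stmt"]]

def run_cost(stmts):
    # Summing the clock function along a run yields a source-level cost
    # model, in the spirit of the one CerCo computes via labelling.
    return sum(clock({"next_stmt": s}) for s in stmts)

print(run_cost(["emit", "assign", "branch", "assign"]))   # 1 + 2 + 3 + 2 = 8
```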