source: Papers/itp-2013/ccexec2.tex @ 3346

Last change on this file since 3346 was 3346, checked in by sacerdot, 6 years ago

...

File size: 86.9 KB
RevLine 
[3343]1\documentclass{llncs}
2\usepackage{hyperref}
3\usepackage{graphicx}
4\usepackage{color}
5\usepackage{listings}
6\usepackage{bcprules}%\bcprulessavespace
7\usepackage{verbatim}
8\usepackage{alltt}
9\usepackage{subcaption}
10\usepackage{listings}
11\usepackage{amssymb}
12% \usepackage{amsmath}
13\usepackage{multicol}
14
15\providecommand{\eqref}[1]{(\ref{#1})}
16
17% NB: might be worth removing this if changing class in favour of
18% a built-in definition.
19%\newtheorem{theorem}{Theorem}
20\newtheorem{condition}{Condition}
21
22\lstdefinelanguage{coq}
23  {keywords={Definition,Lemma,Theorem,Remark,Qed,Save,Inductive,Record},
24   morekeywords={[2]if,then,else,forall,Prop,match,with,end,let},
25  }
26
27\lstdefinelanguage[mine]{C}[ANSI]{C}{
28  morekeywords={int8_t},
29  mathescape
30}
31
32\lstset{language=[mine]C,basicstyle=\footnotesize\tt,columns=flexible,breaklines=false,
33        keywordstyle=\color{red}\bfseries,
34        keywordstyle=[2]\color{blue},
35        commentstyle=\color{green},
36        stringstyle=\color{blue},
37        showspaces=false,showstringspaces=false,
38        xleftmargin=1em}
39
40\usepackage{tikz}
41\usetikzlibrary{positioning,calc,patterns,chains,shapes.geometric,scopes}
42\makeatletter
43\pgfutil@ifundefined{pgf@arrow@code@implies}{% supply for lack of double arrow special arrow tip if it is not there%
44  \pgfarrowsdeclare{implies}{implies}%
45  {%
46  \pgfarrowsleftextend{2.2pt}%
47  \pgfarrowsrightextend{2.2pt}%
48  }%
49  {%
50    \pgfsetdash{}{0pt} % do not dash%
51    \pgfsetlinewidth{.33pt}%
52    \pgfsetroundjoin   % fix join%
53    \pgfsetroundcap    % fix cap%
54    \pgfpathmoveto{\pgfpoint{-1.5pt}{2.5pt}}%
55    \pgfpathcurveto{\pgfpoint{-.75pt}{1.5pt}}{\pgfpoint{0.5pt}{.5pt}}{\pgfpoint{2pt}{0pt}}%
56    \pgfpathcurveto{\pgfpoint{0.5pt}{-.5pt}}{\pgfpoint{-.75pt}{-1.5pt}}{\pgfpoint{-1.5pt}{-2.5pt}}%
57    \pgfusepathqstroke%
58  }%
59}{}%
60\makeatother
61
62\tikzset{state/.style={inner sep = 0, outer sep = 2pt, draw, fill},
63         every node/.style={inner sep=2pt},
64         every on chain/.style = {inner sep = 0, outer sep = 2pt},
65         join all/.style = {every on chain/.append style={join}},
66         on/.style={on chain={#1}, state},
67         m/.style={execute at begin node=$, execute at end node=$},
68         node distance=3mm,
69         is other/.style={circle, minimum size = 3pt, state},
70         other/.style={on, is other},
71         is jump/.style={diamond, minimum size = 6pt, state},
72         jump/.style={on, is jump},
73         is call/.style={regular polygon, regular polygon sides=3, minimum size=5pt, state},
74         call/.style={on=going below, is call, node distance=6mm, label=above left:$#1$},
75         is ret/.style={regular polygon, regular polygon sides=3, minimum size=5pt, shape border rotate=180, state},
76         ret/.style={on=going above, is ret, node distance=6mm},
77         chain/.style={start chain=#1 going left},
78         rev ar/.style={stealth-, thick},
79         ar/.style={-stealth, thick},
80         every join/.style={rev ar},
81         labelled/.style={fill=white, label=above:$#1$},
82         vcenter/.style={baseline={([yshift=-.5ex]current bounding box)}},
83         every picture/.style={thick},
84         double equal sign distance/.prefix style={double distance=1.5pt}, %% if already defined (newest version of pgf) it should be ignored%}
85         implies/.style={double, -implies, thin, double equal sign distance, shorten <=5pt, shorten >=5pt},
86         new/.style={densely dashed},
87         rel/.style={font=\scriptsize, fill=white, inner sep=2pt},
88         diag/.style={row sep={11mm,between origins},
89                      column sep={11mm,between origins},
90                      every node/.style={draw, is other}},
91         small vgap/.style={row sep={7mm,between origins}},
92         % for article, maybe temporary
93         is jump/.style=is other,
94         is call/.style=is other,
95         is ret/.style=is other,
96}
97
98\def\L{\mathrel{\mathcal L}}
99\def\S{\mathrel{\mathcal S}}
100\def\R{\mathrel{\mathcal R}}
101\def\C{\mathrel{\mathcal C}}
102
103\newsavebox{\execbox}
104\savebox{\execbox}{\tikz[baseline=-.5ex]\draw [-stealth] (0,0) -- ++(1em, 0);}
105\newcommand{\exec}{\ensuremath{\mathrel{\usebox{\execbox}}}}
106\let\ar\rightsquigarrow
107\renewcommand{\verb}{\lstinline[mathescape]}
108
109\let\class\triangleright
110\let\andalso\quad
111
112\newcommand{\append}{\mathbin{@}}
113\newcommand{\iffdef}{\mathrel{:\Leftrightarrow}}
114
115\begin{document}
116\pagestyle{plain}
117
118\title{Certification of the Preservation of Structure by a Compiler's Back-end Pass\thanks{The project CerCo acknowledges the financial support of the Future and
119Emerging Technologies (FET) programme within the Seventh Framework
120Programme for Research of the European Commission, under FET-Open grant
121number: 243881}}
122\author{Paolo Tranquilli \and Claudio Sacerdoti Coen}
123\institute{Department of Computer Science and Engineering, University of Bologna,\\\email{Paolo.Tranquilli@unibo.it}, \email{Claudio.SacerdotiCoen@unibo.it}}
124\maketitle
125\begin{abstract}
126The labelling approach is a technique to lift cost models for non-functional
127properties of programs from the object code to the source code. It is based
128on the preservation of the structure of the high level program in every
129intermediate language used by the compiler. Such structure is captured by
130observables that are added to the semantics and that needs to be preserved
131by the forward simulation proof of correctness of the compiler. Additional
132special observables are required for function calls. In this paper we
133present a generic forward simulation proof that preserves all these observables.
134The proof statement is based on a new mechanised semantics that traces the
135structure of execution when the language is unstructured. The generic semantics
136and simulation proof have been mechanised in the interactive theorem prover
137Matita.
138\end{abstract}
139
140\section{Introduction}
141The \emph{labelling approach} has been introduced in~\cite{easylabelling} as
142a technique to \emph{lift} cost models for non-functional properties of programs
143from the object code to the source code. Examples of non-functional properties
144are execution time, amount of stack/heap space consumed and energy required for
145communication. The basic idea of the approach is that it is impossible to
146provide a \emph{uniform} cost model for an high level language that is preserved
147\emph{precisely} by a compiler. For instance, two instances of an assignment
148$x = y$ in the source code can be compiled very differently according to the
149place (registers vs stack) where $x$ and $y$ are stored at the moment of
150execution. Therefore a precise cost model must assign a different cost
151to every occurrence, and the exact cost can only be known after compilation.
152
153According to the labelling approach, the compiler is free to compile and optimise
154the source code without any major restriction, but it must keep trace
155of what happens to basic blocks during the compilation. The cost model is
156then computed on the object code. It assigns a cost to every basic block.
157Finally, the compiler propagates back the cost model to the source level,
158assigning a cost to each basic block of the source code.
159
160Implementing the labelling approach in a certified compiler
161allows to reason formally on the high level source code of a program to prove
162non-functional properties that are granted to be preserved by the compiler
163itself. The trusted code base is then reduced to 1) the interactive theorem
164prover (or its kernel) used in the certification of the compiler and
1652) the software used to certify the property on the source language, that
166can be itself certified further reducing the trusted code base.
167In~\cite{easylabelling} the authors provide an example of a simple
168certified compiler that implements the labelling approach for the
169imperative \texttt{While} language~\cite{while}, that does not have
170pointers and function calls.
171
172The labelling approach has been shown to scale to more interesting scenarios.
173In particular in~\cite{functionallabelling} it has been applied to a functional
174language and in~\cite{loopoptimizations} it has been shown that the approach
175can be slightly complicated to handle loop optimisations and, more generally,
176program optimisations that do not preserve the structure of basic blocks.
177On-going work also shows that the labelling approach is also compatible with
178the complex analyses required to obtain a cost model for object code
179on processors that implement advanced features like pipelining, superscalar
180architectures and caches.
181
182In the European Project CerCo (Certified Complexity~\footnote{\url{http://cerco.cs.unibo.it}})~\cite{cerco} we are certifying a labelling approach based compiler for a large subset of C to
1838051 object code. The compiler is
184moderately optimising and implements a compilation chain that is largely
185inspired to that of CompCert~\cite{compcert1,compcert2}. Compared to work done in~\cite{easylabelling}, the main novelty and source of difficulties is due to the presence
186of function calls. Surprisingly, the addition of function calls require a
187revisitation of the proof technique given in~\cite{easylabelling}. In
188particular, at the core of the labelling approach there is a forward
189simulation proof that, in the case of \texttt{While}, is only minimally
190more complex than the proof required for the preservation of the
191functional properties only. In the case of a programming language with
192function calls, instead, it turns out that the forward simulation proof for
193the back-end languages must grant a whole new set of invariants.
194
195In this paper we present a formalisation in the Matita interactive theorem
196prover~\cite{matita1,matita2} of a generic version of the simulation proof required for unstructured
197languages. All back-end languages of the CerCo compiler are unstructured
198languages, so the proof covers half of the correctness of the compiler.
199The statement of the generic proof is based on a new semantics
200for imperative unstructured languages that is based on \emph{structured
201traces} and that restores the preservation of structure in the observables of
202the semantics. The generic proof allows to almost completely split the
203part of the simulation that deals with functional properties only from the
204part that deals with the preservation of structure.
205
206The plan of this paper is the following. In Section~\ref{labelling} we
207sketch the labelling method and the problems derived from the application
208to languages with function calls. In Section~\ref{semantics} we introduce
209a generic description of an unstructured imperative language and the
210corresponding structured traces (the novel semantics). In
211Section~\ref{simulation} we describe the forward simulation proof.
212Conclusions and future works are in Section~\ref{conclusions}
213
214\section{The labelling approach}
215\label{labelling}
[3344]216
217\section{A brief introduction to the labelling approach}
218
219\begin{figure}
220\begin{verbatim}
221EMIT L_1;                         EMIT L_1         cost += k_1;
222I_1;                              I_3              I_1;
223for (i=0; i<2; i++) {        l_1: COND l_2         for (i=0; i<2; i++) {
224  EMIT L_2;                       EMIT L_2           cost += k_2;           
225  I_2;                            I_4                I_2;               
226 }                                GOTO l_1          }                   
227EMIT L_3;                    l_2: EMIT L_3         cost += k_3;           
228\end{verbatim}
229\caption{The labelling approach applied to a simple program.\label{examplewhile}. The $I_i$ are sequences of instructions not containing jumps or loops. }
230\end{figure}
231We briefly explain the labelling approach on the example in Figure~\ref{examplewhile}. The user wants to analyse the execution time of the program made by
232the black lines in the r.h.s. of the figure. He compiles the program using
233a special compiler that first inserts in the code three label emission
234statements (\texttt{EMIT L$_i$}, in red) to mark the beginning of basic blocks;
235then the compiler compiles the code to machine code (in the
236middle of the figure), granting that the execution of the source and object
237code emits the same sequence of labels ($L_1; L_2; L_2; L_3$ in the example).
238This is achieved by keeping track of basic blocks during compilation, avoiding
239all optimizations that alter the control flow. The latter can be recovered with
240a more refined version of the labelling approach~\cite{tranquill}, but in the
241present paper we stick to this simple variant for simplicity. Once the object
242code is produced, the compiler runs a static code analyzer to associate to
243each label $L_1, \ldots, L_3$ the cost (in clock cycles) of the instructions
244that belong to the corresponding basic block. For example, the cost $k_1$
245associated to $L_1$ is the number of cycles required to execute the block
246$I_3$ and $COND l_2$, while the cost $k_2$ associated to $L_2$ counts the
247cycles required by the block $I_4$ and $GOTO l_1$. The compiler also guarantees
248that every executed instruction is in the scope of some code emission label,
249that each scope does not contain loops (to associate a finite cost), and that
250both branches of a conditional statement are followed by a code emission
251statement. Under these assumptions it is true that the total execution cost
252of the program $\Delta_t$ is equal to the sum over the sequence of emitted
253labels of the cost associated to every label:
254$\Delta t = k(L_1; L_2; L_2; L_3) = k_1 + k_2 + k_2 + k_3$.
255Finally, the compiler emits an instrumented version of the source code
256(in the r.h.s. of the figure) where label emission statements are replaced
257by increments of a global variable cost that, before every increment, holds the
258exact number of clock cycles spent by the microprocessor so far:
259the difference $\Delta clock$ between the final and initial value of the variable clock is $\Delta clock = k_1 + k_2 + k_2 + k_3 = \Delta t$. Finally, the
260user can employ any available method (e.g. Hoare logic, invariant generators,
261abstract interpretation and automated provers) to certify that $\Delta clock$
262never exceeds a certain bound~\cite{cerco}, which is now a functional property
263of the code.
264
265\section{The labelling approach in presence of loops}
266
267Let's now consider a simple program written in C that contains a function
268pointer call inside the scope of the cost label $L_1$.
269\begin{verbatim}
270main: EMIT L_1       g: EMIT L_3   EMIT L_1     g: EMIT L_3
271      I_1;              I_3;       I_4;            I_6;
272      (*f)();           return;    CALL            RETURN
273      I_2;                         I_5;
274      EMIT L_2                     EMIT L_2
275\end{verbatim}
276The labelling method works exactly as before, inserting
277code emission statements/\texttt{cost} variable increments at the beginning
278of every basic block and at the beginning of every function. The compiler
279still grants that the sequence of labels observed on the two programs are
280the same. A new difficulty happears when the compiler needs to statically
281analyze the object code to assign a cost to every label. What should the scope
282of the $L_1$ label be? After executing the $I_4$ block, the \texttt{CALL}
283statement passes control to a function that cannot be determined statically.
284Therefore the cost of executing the body must be paid by some other label
285(hence the requirement that every function starts with a code emission
286statement). What label should pay for the cost for the block $I_5$? The only
287reasonable answer is $L_1$, i.e. \emph{the scope of labels should extend to the
288next label emission statement, stepping over function calls}.
289
290The latter definition of scope is adeguate on the source level because
291C is a structured language that guarantees that every function call, if it
292returns, passes control to the first instruction that follows the call. However,
293this is not guaranteed for object code, the backend languages of a compiler
294and, more generally, for unstructured
295languages that use a writable control stack to store the return address of
296calls. For example, $I_6$ could increment by $1$ the return address on the
297stack so that the next \texttt{RETURN} would start at the second instruction
298of $I_5$. The compiler would still be perfectly correct if a random, dead
299code instruction was also added just after each \texttt{CALL}. More generally,
300\emph{there is no guarantee that a correct compiler that respects the functional
301behaviour of a program also respects the calling structure of the source code}.
302Without such an assumption, however, it may not be true that the execution cost
303of the program is the sum of the costs associated to the labels emitted. In our
304example, the cost of $I_5$ is paid by $L_1$, but in place of $I_5$ the processor could execute any other code after $g$ returns.
305
306Obviously, any reasonably written compiler produces object code that behaves
307as if the language was structured (i.e. by properly nesting function
308calls/returns and without tampering with the return addresses on the control
309stack). This property, however, is a property of the runs of object code
310programs, and not a property of the object code that can be easily statically
311verified (as the ones we required for the basic labelling method).
312Therefore, we now need to single out those runs whose cost behaviour can be
313statically predicted, and we need to prove that every run of programs generated
314by our compiler are of that type. We call them \emph{structured} since their
315main property is to respect properties that hold for free on the source code
316because the source language is structured. Moreover, in order to avoid proving
317too many preservation properties of our compiler, we drop the original
318requirements on the object code (all instructons must be in scope of some labels, no loops inside a scope, etc.) in favour of the corresponding requirement
319for structured runs (a structured run must start with a label emission, no instruction can be executed twice between two emissions, etc.).
320
321We will therefore proceed as follows. In the following section
3221) we formally introduce the notion of
323structured trace, which captures structured runs in the style of labelled
324transition systems; 2) we show that on the object code we can correctly
325compute the execution time of a structured run from the sequence of labels
326observed; 3) we give unstructured languages a semantics in terms of structured
327traces; 4) we show that on the source code we can correctly compute the
328execution time of a program if the compiler produces object code whose
329runs are weakly similar to the source code runs.
330
331The notion of weak bisimulation for structured traces is a global property
332which is hard to prove formally and much more demanding than the simple forward
333simulation required for proofs of preservation of functional properties.
334Therefore in Section~\ref{XXX} we will present a set of local simulation
335conditions that refine the corresponding conditions for forward simulation and
336that are sufficient to grant the production of weakly similar traces.
337
338All the definitions and theorems presented in the paper have been formalized
339in the interactive theorem prover Matita and are being used to certify
340the complexity preserving compiler developed in the CerCo project~\cite{cerco}.
341The formalization can be
342found at~\ref{YYY} and it heavily relies on algebraic and dependent types for
343both structured traces and the definition of weak similarity. In the paper
344we did not try to stay close to the formalization. On the contrary,
345the definitions given in the paper are the result of a significant
346simplification effort for
347the sake of presentation and to make easier the re-implementation of the
348concepts in a proof assistant which is not based on the Calculus of Inductive
349Constructions. However the formalization is heavily commented to allow the
350reader to understand the technical details of the formalization.
351
352
353====================================================
354
[3343]355We briefly sketch here a simplified version of the labelling approach as
356introduced in~\cite{easylabelling}. The simplification strengthens the
357sufficient conditions given in~\cite{easylabelling} to allow a simpler
358explanation. The simplified conditions given here are also used in the
359CerCo compiler to simplify the proof.
360
361Let $\mathcal{P}$ be a programming language whose semantics is given in
362terms of observables: a run of a program yields a finite or infinite
363stream of observables. We also assume for the time being that function
364calls are not available in $\mathcal{P}$. We want to associate a cost
365model to a program $P$ written in $\mathcal{P}$. The first step is to
366extend the syntax of $\mathcal{P}$ with a new construct $\texttt{emit L}$
367where $L$ is a label distinct from all observables of $\mathcal{P}$.
368The semantics of $\texttt{emit L}$ is the emission of the observable
369\texttt{L} that is meant to signal the beginning of a basic block.
370
371There exists an automatic procedure that injects into the program $P$ an
372$\texttt{emit L}$ at the beginning of each basic block, using a fresh
373\texttt{L} for each block. In particular, the bodies of loops, both branches
374of \texttt{if-then-else}s and the targets of \texttt{goto}s must all start
375with an emission statement.
376
377Let now $C$ be a compiler from $\mathcal{P}$ to the object code $\mathcal{M}$,
378that is organised in passes. Let $\mathcal{Q}_i$ be the $i$-th intermediate
379language used by the compiler. We can easily extend every
380intermediate language (and its semantics) with an $\texttt{emit L}$ statement
381as we did for $\mathcal{P}$. The same is possible for $\mathcal{M}$ too, with
382the additional difficulty that the syntax of object code is given as a
383sequence of bytes. The injection of an emission statement in the object code
384can be done using a map that maps two consecutive code addresses with the
385statement. The intended semantics is that, if $(pc_1,pc_2) \mapsto \texttt{emit L}$ then the observable \texttt{L} is emitted after the execution of the
386instruction stored at $pc_1$ and before the execution of the instruction
387stored at $pc_2$. The two program counters are necessary because the
388instruction stored at $pc_1$ can have multiple possible successors (e.g.
389in case of a conditional branch or an indirect call). Dually, the instruction
390stored at $pc_2$ can have multiple possible predecessors (e.g. if it is the
391target of a jump).
392
393The compiler, to be functionally correct, must preserve the observational
394equivalence, i.e. executing the program after each compiler pass should
395yield the same stream of observables. After the injection of emission
396statements, observables now capture both functional and non-functional
397behaviours.
398This correctness property is called in the literature a forward simulation
399and is sufficient for correctness when the target language is
400deterministic~\cite{compcert3}.
401We also require a stronger, non-functional preservation property: after each
402pass all basic blocks must start with an emission statement, and all labels
403\texttt{L} must be unique.
404
405Now let $M$ be the object code obtained for the program $P$. Let us suppose
406that we can statically inspect the code $M$ and associate to each basic block
407a cost (e.g. the number of clock cycles required to execute all instructions
408in the basic block, or an upper bound to that time). Every basic block is
409labelled with an unique label \texttt{L}, thus we can actually associate the
410cost to \texttt{L}. Let call it $k(\texttt{L})$.
411
412The function $k$ is defined as the cost model for the object code control
413blocks. It can be equally used as well as the cost model for the source
414control blocks. Indeed, if the semantics of $P$ is the stream
415$L_1 L_2 \ldots$, then, because of forward simulation, the semantics of $M$ is
416also $L_1 L_2 \ldots$ and its actual execution cost is $\Sigma_i k(L_i)$ because
417every instruction belongs to a control block and every control block is
418labelled. Thus it is correct to say that the execution cost of $P$ is also
419$\Sigma_i k(L_i)$. In other words, we have obtained a cost model $k$ for
420the blocks of the high level program $P$ that is preserved by compilation.
421
422How can the user profit from the high level cost model? Suppose, for instance,
423that he wants to prove that the WCET of his program is bounded by $c$. It
424is sufficient for him to prove that $\Sigma_i k(L_i) \leq c$, which is now
425a purely functional property of the code. He can therefore use any technique
426available to certify functional properties of the source code.
427What is suggested in~\cite{easylabelling} is to actually instrument the
428source code $P$ by replacing every label emission statement
429$\texttt{emit L}$ with the instruction $\texttt{cost += k(L)}$ that increments
430a global fresh variable \texttt{cost}. The bound is now proved by establishing
431the program invariant $\texttt{cost} \leq c$, which can be done for example
432using the Frama-C~\cite{framaC} suite if the source code is some variant of
433C.
434
435In order to extend the labeling approach to function calls we make
436\verb+CALL f+ emit the observable \verb+f+ and \verb+RET+ emit a distinguished observable
437\verb+ret+.
438
439For example the following execution history of the program in \autoref{fig:esempio}
440$$I_1; \verb+CALL f+; \verb+COND l+; \verb+EMIT $\ell_2$+; I_3; \verb+RET+; I_2; \verb+RET+$$
441emits the trace
442$$\verb+main+, \verb+f+$$
443\begin{figure}
444\hfil
445\begin{minipage}{.2\linewidth}
446\begin{lstlisting}
447main: $\!I_1$
448      CALL f
449      $I_2$
450      RET
451\end{lstlisting}
452\end{minipage}
453\begin{minipage}{.1\linewidth}
454\begin{lstlisting}
455main
456main
457main
458main
459\end{lstlisting}
460\end{minipage}
461\hfil
462\begin{minipage}{.2\linewidth}
463\begin{lstlisting}
464f: $\!$COND l
465   EMIT $\ell_2$
466   RET
467l: $\!$EMIT $\ell_3$
468   $I_3$
469   RET
470\end{lstlisting}
471\end{minipage}
472\begin{minipage}{.1\linewidth}
473\begin{lstlisting}
474f
475
476$\ell_2$
477
478$\ell_3$
479$\ell_3$
480\end{lstlisting}
481\end{minipage}
482\hfil{}
483\caption{}
484\label{fig:esempio}
485\end{figure}
486
487
488\subsection{Labelling function calls}
489We now want to extend the labelling approach to support function calls.
490On the high level, \emph{structured} programming language $\mathcal{P}$ there
491is not much to change.
492When a function is invoked, the current basic block is temporarily exited
493and the basic block the function starts with take control. When the function
494returns, the execution of the original basic block is resumed. Thus the only
495significant change is that basic blocks can now be nested. Let \texttt{E}
496be the label of the external block and \texttt{I} the label of a nested one.
497Since the external starts before the internal, the semantics observed will be
498\texttt{E I} and the cost associated to it on the source language will be
499$k(\texttt{E}) + k(\texttt{I})$, i.e. the cost of executing all instructions
500in the block \texttt{E} first plus the cost of executing all the instructions in
501the block \texttt{I}. However, we know that some instructions in \texttt{E} are
502executed after the last instruction in \texttt{I}. This is actually irrelevant
503because we are here assuming that costs are additive, so that we can freely
504permute them\footnote{The additivity assumption fails on modern processors that have stateful subsystems, like caches and pipelines. The extension of the labelling approach to those systems is therefore non trivial and under development in the CerCo project.}. Note that, in the present discussion, we are assuming that
505the function call terminates and yields back control to the basic block
506\texttt{E}. If the call diverges, the instrumentation
507$\texttt{cost += k(E)}$ executed at the beginning of \texttt{E} is still valid,
508but just as an upper bound to the real execution cost: only precision is lost.
509
510Let now consider what happens when we move down the compilation chain to an
511unstructured intermediate or final language. Here unstructured means that
512the only control operators are conditional and unconditional jumps, function
513calls and returns. Unlike a structured language, though, there is no guarantee
514that a function will return control just after the function call point.
515The semantics of the return statement, indeed, consists in fetching the
516return address from some internal structure (typically the control stack) and
517jumping directly to it. The code can freely manipulate the control stack to
518make the procedure returns to whatever position. Indeed, it is also possible
519to break the well nesting of function calls/returns.
520
521Is it the case that the code produced by a correct compiler must respect the
522additional property that every function returns just after its function call
523point? The answer is negative and the property is not implied by forward
524simulation proofs. For instance, imagine to modify a correct compiler pass
525by systematically adding one to the return address on the stack and by
526putting a \texttt{NOP} (or any other instruction that takes one byte) after
527every function call. The obtained code will be functionally indistinguishable,
528and the added instructions will all be dead code.
529
530This lack of structure in the semantics badly interferes with the labelling
531approach. The reason is the following: when a basic block labelled with
532\texttt{E} contains a function call, it no longer makes any sense to associate
533to a label \texttt{E} the sum of the costs of all the instructions in the block.
534Indeed, there is no guarantee that the function will return into the block and
535that the instructions that will be executed after the return will be the ones
536we are paying for in the cost model.
537
538How can we make the labelling approach work in this scenario? We only see two
539possible ways. The first one consists in injecting an emission statement after
540every function call: basic blocks no longer contain function calls, but are now
541terminated by them. This completely solves the problem and allows the compiler
542to break the structure of function calls/returns at will. However, the
543technique has several drawbacks. First of all, it greatly augments the number
544of cost labels that are injected in the source code and that become
545instrumentation statements. Thus, when reasoning on the source code to prove
546non-functional properties, the user (or the automation tool) will have to handle
547larger expressions. Second, the more labels are emitted, the more difficult it
548becomes to implement powerful optimisations respecting the code structure.
549Indeed, function calls are usually implemented in such a way that most registers
550are preserved by the call, so that the static analysis of the block is not
551interrupted by the call and an optimisation can involve both the code before
552and after the function call. Third, instrumenting the source code may require
553unpleasant modification of it. Take, for example, the code
554\texttt{f(g(x));}. We need to inject an emission statement/instrumentation
555instruction just after the execution of \texttt{g}. The only way to do that
556is to rewrite the code as \texttt{y = g(x); emit L; f(y);} for some fresh
557variable \texttt{y}. It is pretty clear how in certain situations the obtained
558code would be more obfuscated and then more difficult to manually reason on.
559
560For the previous reasons, in this paper and in the CerCo project we adopt a
561different approach. We do not inject emission statements after every
562function call. However, we want to propagate a strong additional invariant in
563the forward simulation proof. The invariant is the propagation of the structure
564 of the original high level code, even if the target language is unstructured.
565The structure we want to propagate, that will become more clear in the next
566section, comprises 1) the property that every function should return just after
567the function call point, which in turns imply well nesting of function calls;
5682) the property that every basic block starts with a code emission statement.
569
570In the original labelling approach of~\cite{easylabelling}, the second property
571was granted syntactically as a property of the generated code.
572In our revised approach, instead, we will impose the property on the runs:
573it will be possible to generate code that does not respect the syntactic
574property, as soon as all possible runs respect it. For instance, dead code will no longer
575be required to have all basic blocks correctly labelled. The switch is suggested
576from the fact that the first of the two properties --- that related to
577function calls/returns --- can only be defined as property of runs,
578not of the static code. The switch is
579beneficial to the proof because the original proof was made of two parts:
580the forward simulation proof and the proof that the static property was granted.
581In our revised approach the latter disappears and only the forward simulation
582is kept.
583
584In order to capture the structure semantics so that it is preserved
585by a forward simulation argument, we need to make the structure observable
586in the semantics. This is the topic of the next section.
587
588\section{Structured traces}
589\label{semantics}
[3345]590
591Let's consider a generic unstructured language already equipped with a
592small step structured operational semantics (SOS). We introduce a
593deterministic labelled transition system~\cite{LTS} $(S,s_i,\Lambda,\to)$
594that refines the
595SOS by observing function calls and the beginning of basic blocks.
596$S$ is the set of states of the program, $s_i$ the initial state and
597$\Lambda = \{ \tau, RET \} \cup \mathcal{L} \cup \mathcal{F}$
598where $\mathcal{F}$ is the set of names of functions that can occur in the
599program, $\mathcal{L}$ is a set of labels disjoint from $\mathcal{F}$
600and $\tau$ and $RET$ do not belong to $\mathcal{F} \cup \mathcal{L}$.
601The transition function is defined as $s_1 \stackrel{o}{\to} s_2$ if
602$s_1$ moves to $s_2$ according to the SOS; moreover $o = f \in \mathcal{F}$ if
603the function $f$ is called, $o = RET$ if a \texttt{RETURN} is executed,
604$o = L \in \mathcal{L}$ if a \texttt{EMIT L} is executed to signal the
605beginning of a basic block, and $o = \tau$ in all other cases.
606Because we assume the language to be deterministic, the label emitted can
607actually be computed simply observing $s_1$.
608
609Among all possible finite execution fragments
[3346]610$s_0 \stackrel{o_0}{\to} s_1 \ldots \stackrel{o_n}{\to} s_{n+1}$ we want to
[3345]611identify the ones that satisfy the requirements we sketched in the previous
612section. We say that an execution fragment is \emph{structured} iff
613\begin{enumerate}
614 \item for every $i$, if $s_i \stackrel{f}{\to} s_{i+1}$ then there is a
615   label $L$ such that $s_{i+1} \stackrel{L}{\to} s_{i+2}$.
616   Equivalently, $s_{i+1}$ must start execution with an \texttt{EMIT L}.
617   This captures the requirement that the body of function calls always start
618   with a label emission statement.
619 \item for every $i$, if $s_i \stackrel{f}{\to} s_{i+1}$ then
620   there is a strucuted $s_{i+1} \stackrel{o_{i+1}}{\to} \ldots
621   \stackrel{o_n}{\to} s_{n+1}$ such that $s_{n+1} \stackrel{RET}{\to} s_{n+2}$
622   and the instruction to be executed in $s_{n+2}$ follows the call that was
623   executed in $s_i$. This captures the double requirement that every function
624   call must converge and that it must yield back control just after the call.
625 \item for every $i$, if the instruction to be executed in $s_i$ is a
626   conditional branch, then there is an $L$ such that $s_{i+1} \stackrel{L}{\to} s_{i+2}$ or, equivalently, that $s_{i+1}$ must start execution with an
627   \texttt{EMIT L}. This captures the requirement that every branch which is
628   live code must start with a label emission.
629\end{enumerate}
630
[3346]631Let $T = s_0 \stackrel{o_0}{\to} s_1 \ldots \stackrel{o_n}{\to} s_{n+1}$ be an
632execution fragment. The \emph{weak trace} $|T|$ associated to $T$ is the
633subsequence $o_{i_0} \ldots o_{i_m}$ of $o_0 \ldots o_n$ obtained dropping
634every internal action $\tau$.
[3345]635
[3346]636Let $k$ be a cost model that maps observables actions to any commutative cost
637monoid (e.g. natural numbers). We extend the domain of $k$ to weak traces
638by posing $k(\mathcal{T}) = \Sigma_{o \in \mathcal{T}} k(o)$.
[3345]639
[3346]640The labelling approach is based on the idea that the execution cost of
641certain execution fragments, that we call \emph{measurable fragments}, can be
642computed from their weak trace by choosing a $k$ that assigns to any label
643the cost of the instructions in its scope.
644A fragment
645$T = s_0 \stackrel{o_0}{\to} s_1 \ldots \stackrel{o_n}{\to} s_{n+1}$ is
646measurable if it does not start or end in the middle of a basic block.
647Ending in the middle of a block would mean having pre-paid more instructions
648than the ones executed, and starting in the middle would mean not paying any
649instruction up to the first label emission.
650Formally we require $o_0 \in \mathcal{L}$ (or equivalently
651in $s_0$ the program must emit a label) and $s_{n+1}$ must either be a \texttt{RETURN} or a label emission statement (or, equivalently, $s_{n+1} \stackrel{o_{n+1}} s_{n+2}$ with $o_{n+1} \in \mathcal{L} \cup \{RET\}$).
[3345]652
[3346]653Given two deterministic unstructured programming languages with their own
654operational semantics, we say that a state $s_2$ of the second language
655(weakly) simulates the state $s_1$ of the first iff the two unique weak traces
656that originate from them are equal. If $s_1$ also (weakly) simulates $s_2$,
657then the two states are weakly trace equivalent.
658
659 or, equivalently
660because of determinism, that they are weakly bisimilar.
661
662
[3345]663========================================================================
664
[3343]665The program semantics adopted in the traditional labelling approach is based
666on labelled deductive systems. Given a set of observables $\mathcal{O}$ and
667a set of states $\S$, the semantics of one deterministic execution
668step is
669defined as a function $S \to S \times O^*$ where $O^*$ is a (finite) stream of
670observables. The semantics is then lifted compositionally to multiple (finite
671or infinite) execution steps.
672Finally, the semantics of a a whole program execution is obtained by forgetting
673about the final state (if any), yielding a function $S \to O^*$ that given an
674initial status returns the finite or infinite stream of observables in output.
675
676We present here a new definition of semantics where the structure of execution,
677as defined in the previous section, is now observable. The idea is to replace
678the stream of observables with a structured data type that makes explicit
679function call and returns and that grants some additional invariants by
680construction. The data structure, called \emph{structured traces}, is
681defined inductively for terminating programs and coinductively for diverging
682ones. In the paper we focus only on the inductive structure, i.e. we assume
683that all programs that are given a semantics are total. The Matita formalisation
684also shows the coinductive definitions. The semantics of a program is then
685defined as a function that maps an initial state into a structured trace.
686
687In order to have a definition that works on multiple intermediate languages,
688we abstract the type of structure traces over an abstract data type of
689abstract statuses, which we aptly call $\texttt{abstract\_status}$. The fields
690of this record are the following.
691\begin{itemize}
692 \item \verb+S : Type[0]+, the type of states.
693 \item \verb+as_execute : S $\to$ S $\to$ Prop+, a binary predicate stating
694 an execution step. We write $s_1\exec s_2$ for $\verb+as_execute+~s_1~s_2$.
695 \item \verb+as_classifier : S $\to$ classification+, a function tagging all
696 states with a class in
697 $\{\texttt{cl\_return,cl\_jump,cl\_call,cl\_other}\}$, depending on the instruction
698 that is about to be executed (we omit tail-calls for simplicity). We will
699 use $s \class c$ as a shorthand for both $\texttt{as\_classifier}~s=c$
700 (if $c$ is a classification) and $\texttt{as\_classifier}~s\in c$
701 (if $c$ is a set of classifications).
702 \item \verb+as_label : S $\to$ option label+, telling whether the
703 next instruction to be executed in $s$ is a cost emission statement,
704 and if yes returning the associated cost label. Our shorthand for this function
705 will be $\ell$, and we will also abuse the notation by using $\ell~s$ as a
706 predicate stating that $s$ is labelled.
707 \item \verb+as_call_ident : ($\Sigma$s:S. s $\class$ cl_call) $\to$ label+,
708 telling the identifier of the function which is being called in a
709 \verb+cl_call+ state. We will use the shorthand $s\uparrow f$ for
710 $\verb+as_call_ident+~s = f$.
711 \item \verb+as_after_return : ($\Sigma$s:S. s $\class$ cl_call) $\to$ S $\to$ Prop+,
712 which holds on the \verb+cl_call+ state $s_1$ and a state $s_2$ when the
713 instruction to be executed in $s_2$ follows the function call to be
714 executed in (the witness of the $\Sigma$-type) $s_1$. We will use the notation
715 $s_1\ar s_2$ for this relation.
716\end{itemize}
717
718% \begin{alltt}
719% record abstract_status := \{ S: Type[0];
720%  as_execute: S \(\to\) S \(\to\) Prop;   as_classifier: S \(\to\) classification;
721%  as_label: S \(\to\) option label;    as_called: (\(\Sigma\)s:S. c s = cl_call) \(\to\) label;
722%  as_after_return: (\(\Sigma\)s:S. c s = cl_call) \(\to\) S \(\to\) Prop \}
723% \end{alltt}
724
725The inductive type for structured traces is actually made by three multiple
726inductive types with the following semantics:
727\begin{enumerate}
728 \item $(\texttt{trace\_label\_return}~s_1~s_2)$ (shorthand $\verb+TLR+~s_1~s_2$)
729   is a trace that begins in
730   the state $s_1$ (included) and ends just before the state $s_2$ (excluded)
731   such that the instruction to be executed in $s_1$ is a label emission
732   statement and the one to be executed in the state before $s_2$ is a return
733   statement. Thus $s_2$ is the state after the return. The trace
734   may contain other label emission statements. It captures the structure of
735   the execution of function bodies: they must start with a cost emission
736   statement and must end with a return; they are obtained by concatenating
737   one or more basic blocks, all starting with a label emission
738   (e.g. in case of loops).
739 \item $(\texttt{trace\_any\_label}~b~s_1~s_2)$ (shorthand $\verb+TAL+~b~s_1~s_2$)
740   is a trace that begins in
741   the state $s_1$ (included) and ends just before the state $s_2$ (excluded)
742   such that the instruction to be executed in $s_2$/in the state before
743   $s_2$ is either a label emission statement or
744   or a return, according to the boolean $b$. It must not contain
745   any label emission statement. It captures the notion of a suffix of a
746   basic block.
747 \item $(\texttt{trace\_label\_label}~b~s_1~s_2)$ (shorthand $\verb+TLL+~b~s_1~s_2$ is the special case of
748   $\verb+TAL+~b~s_1~s_2)$ such that the instruction to be
749   executed in $s_1$ is a label emission statement. It captures the notion of
750   a basic block.
751\end{enumerate}
752
753\begin{multicols}{3}
754\infrule[\verb+tlr_base+]
755 {\texttt{TLL}~true~s_1~s_2}
756 {\texttt{TLR}~s_1~s_2}
757
758\infrule[\verb+tlr_step+]
759 {\texttt{TLL}~false~s_1~s_2 \andalso
760  \texttt{TLR}~s_2~s_3
761 }
762 {\texttt{TLR}~s_1~s_3}
763
764\infrule[\verb+tll_base+]
765 {\texttt{TAL}~b~s_1~s_2 \andalso
766  \ell~s_1
767 }
768 {\texttt{TLL}~b~s_1~s_2}
769\end{multicols}
770
771\infrule[\verb+tal_base_not_return+]
772 {s_1\exec s_2 \andalso
773  s_1\class\{\verb+cl_jump+, \verb+cl_other+\}\andalso
774  \ell~s_2
775 }
776 {\texttt{TAL}~false~s_1~s_2}
777
778\infrule[\verb+tal_base_return+]
779 {s_1\exec s_2 \andalso
780  s_1 \class \texttt{cl\_return}
781 }
782 {\texttt{TAL}~true~s_1~s_2}
783
784\infrule[\verb+tal_base_call+]
785 {s_1\exec s_2 \andalso
786  s_1 \class \texttt{cl\_call} \andalso
787  s_1\ar s_3 \andalso
788  \texttt{TLR}~s_2~s_3 \andalso
789  \ell~s_3
790 }
791 {\texttt{TAL}~false~s_1~s_3}
792
793\infrule[\verb+tal_step_call+]
794 {s_1\exec s_2 \andalso
795  s_1 \class \texttt{cl\_call} \andalso
796  s_1\ar s_3 \andalso
797  \texttt{TLR}~s_2~s_3 \andalso
798  \texttt{TAL}~b~s_3~s_4
799 }
800 {\texttt{TAL}~b~s_1~s_4}
801
802\infrule[\verb+tal_step_default+]
803 {s_1\exec s_2 \andalso
804  \lnot \ell~s_2 \andalso
805  \texttt{TAL}~b~s_2~s_3\andalso
806  s_1 \class \texttt{cl\_other}
807 }
808 {\texttt{TAL}~b~s_1~s_3}
809\begin{comment}
810\begin{verbatim}
811inductive trace_label_return (S:abstract_status) : S → S → Type[0] ≝
812  | tlr_base:
813      ∀status_before: S.
814      ∀status_after: S.
815        trace_label_label S ends_with_ret status_before status_after →
816        trace_label_return S status_before status_after
817  | tlr_step:
818      ∀status_initial: S.
819      ∀status_labelled: S.
820      ∀status_final: S.
821        trace_label_label S doesnt_end_with_ret status_initial status_labelled →
822        trace_label_return S status_labelled status_final →
823          trace_label_return S status_initial status_final
824with trace_label_label: trace_ends_with_ret → S → S → Type[0] ≝
825  | tll_base:
826      ∀ends_flag: trace_ends_with_ret.
827      ∀start_status: S.
828      ∀end_status: S.
829        trace_any_label S ends_flag start_status end_status →
830        as_costed S start_status →
831          trace_label_label S ends_flag start_status end_status
832with trace_any_label: trace_ends_with_ret → S → S → Type[0] ≝
833  (* Single steps within a function which reach a label.
834     Note that this is the only case applicable for a jump. *)
835  | tal_base_not_return:
836      ∀start_status: S.
837      ∀final_status: S.
838        as_execute S start_status final_status →
839        (as_classifier S start_status cl_jump ∨
840         as_classifier S start_status cl_other) →
841        as_costed S final_status →
842          trace_any_label S doesnt_end_with_ret start_status final_status
843  | tal_base_return:
844      ∀start_status: S.
845      ∀final_status: S.
846        as_execute S start_status final_status →
847        as_classifier S start_status cl_return →
848          trace_any_label S ends_with_ret start_status final_status
849  (* A call followed by a label on return. *)
850  | tal_base_call:
851      ∀status_pre_fun_call: S.
852      ∀status_start_fun_call: S.
853      ∀status_final: S.
854        as_execute S status_pre_fun_call status_start_fun_call →
855        ∀H:as_classifier S status_pre_fun_call cl_call.
856          as_after_return S «status_pre_fun_call, H» status_final →
857          trace_label_return S status_start_fun_call status_final →
858          as_costed S status_final →
859            trace_any_label S doesnt_end_with_ret status_pre_fun_call status_final
860  (* A call followed by a non-empty trace. *)
861  | tal_step_call:
862      ∀end_flag: trace_ends_with_ret.
863      ∀status_pre_fun_call: S.
864      ∀status_start_fun_call: S.
865      ∀status_after_fun_call: S.
866      ∀status_final: S.
867        as_execute S status_pre_fun_call status_start_fun_call →
868        ∀H:as_classifier S status_pre_fun_call cl_call.
869          as_after_return S «status_pre_fun_call, H» status_after_fun_call →
870          trace_label_return S status_start_fun_call status_after_fun_call →
871          ¬ as_costed S status_after_fun_call →
872          trace_any_label S end_flag status_after_fun_call status_final →
873            trace_any_label S end_flag status_pre_fun_call status_final
874  | tal_step_default:
875      ∀end_flag: trace_ends_with_ret.
876      ∀status_pre: S.
877      ∀status_init: S.
878      ∀status_end: S.
879        as_execute S status_pre status_init →
880        trace_any_label S end_flag status_init status_end →
881        as_classifier S status_pre cl_other →
882        ¬ (as_costed S status_init) →
883          trace_any_label S end_flag status_pre status_end.
884\end{verbatim}
885\end{comment}
886A \texttt{trace\_label\_return} is isomorphic to a list of
887\texttt{trace\_label\_label}s that ends with a cost emission followed by a
888return terminated \texttt{trace\_label\_label}.
889The interesting cases are those of $\texttt{trace\_any\_label}~b~s_1~s_2$.
890A \texttt{trace\_any\_label} is a sequence of steps built by a syntax directed
891definition on the classification of $s_1$. The constructors of the datatype
892impose several invariants that are meant to impose a structure to the
893otherwise unstructured execution. In particular, the following invariants are
894imposed:
895\begin{enumerate}
896 \item the trace is never empty; it ends with a return iff $b$ is
897       true
898 \item a jump must always be the last instruction of the trace, and it must
899       be followed by a cost emission statement; i.e. the target of a jump
900       is always the beginning of a new basic block; as such it must start
901       with a cost emission statement
902 \item a cost emission statement can never occur inside the trace, only in
903       the status immediately after
904 \item the trace for a function call step is made of a subtrace for the
905       function body of type
906       $\texttt{trace\_label\_return}~s_1~s_2$, possibly followed by the
907       rest of the trace for this basic block. The subtrace represents the
908       function execution. Being an inductive datum, it grants totality of
909       the function call. The status $s_2$ is the one that follows the return
910       statement. The next instruction of $s_2$ must follow the function call
911       instruction. As a consequence, function calls are also well nested.
912\end{enumerate}
913
914There are three mutual structural recursive functions, one for each of
915\verb+TLR+, \verb+TLL+ and \verb+TAL+, for which we use the same notation
916$|\,.\,|$: the \emph{flattening} of the traces. These functions
917allow to extract from a structured trace the list of emitted cost labels.
918%  We only show here the type of one
919% of them:
920% \begin{alltt}
921% flatten_trace_label_return:
922%  \(\forall\)S: abstract_status. \(\forall\)\(s_1,s_2\).
923%   trace_label_return \(s_1\) \(s_2\) \(\to\) list (as_cost_label S)
924% \end{alltt}
925
926\paragraph{Cost prediction on structured traces.}
927
928The first main theorem of CerCo about traces
929(theorem \texttt{compute\_max\_trace\_label\_return\_cost\_ok\_with\_trace})
930holds for the
931instantiation
932of the structured traces to the concrete status of object code programs.
933Simplifying a bit, it states that
934\begin{equation}\label{th1}
935\begin{array}{l}\forall s_1,s_2. \forall \tau: \texttt{TLR}~s_1~s_2.~
936  \texttt{clock}~s_2 = \texttt{clock}~s_1 +
937  \Sigma_{\alpha \in |\tau|}\;k(\alpha)
938\end{array}
939\end{equation}
940where the cost model $k$ is statically computed from the object code
941by associating to each label $\alpha$ the sum of the cost of the instructions
942in the basic block that starts at $\alpha$ and ends before the next labelled
943instruction. The theorem is proved by structural induction over the structured
944trace, and is based on the invariant that
945iff the function that computes the cost model has analysed the instruction
946to be executed at $s_2$ after the one to be executed at $s_1$, and if
947the structured trace starts with $s_1$, then eventually it will contain also
948$s_2$. When $s_1$ is not a function call, the result holds trivially because
949of the $s_1\exec s_2$ condition obtained by inversion on
950the trace. The only non
951trivial case is the one of function calls: the cost model computation function
952does recursion on the first instruction that follows that function call; the
953\texttt{as\_after\_return} condition of the \texttt{tal\_base\_call} and
954\texttt{tal\_step\_call} grants exactly that the execution will eventually reach
955this state.
956
957\paragraph{Structured traces similarity and cost prediction invariance.}
958
959A compiler pass maps source to object code and initial states to initial
960states. The source code and initial state uniquely determine the structured
961trace of a program, if it exists. The structured trace fails to exists iff
962the structural conditions are violated by the program execution (e.g. a function
963body does not start with a cost emission statement). Let us assume that the
964target structured trace exists.
965
966What is the relation between the source and target structured traces?
967In general, the two traces can be arbitrarily different. However, we are
968interested only in those compiler passes that maps a trace $\tau_1$ to a trace
969$\tau_2$ such that
970\begin{equation}|\tau_1| = |\tau_2|.\label{th2}\end{equation}
971The reason is that the combination of~\eqref{th1} with~\eqref{th2} yields the
972corollary
973\begin{equation}\label{th3}
974\forall s_1,s_2. \forall \tau: \texttt{TLR}~s_1~s_2.~
975  \texttt{clock}~s_2 - \texttt{clock}~s_1 =
976  \Sigma_{\alpha \in |\tau_1|}\;k(\alpha) =
977  \Sigma_{\alpha \in |\tau_2|}\;k(\alpha).
978\end{equation}
979This corollary states that the actual execution time of the program can be computed equally well on the source or target language. Thus it becomes possible to
980transfer the cost model from the target to the source code and reason on the
981source code only.
982
983We are therefore interested in conditions stronger than~\eqref{th2}.
984Therefore we introduce here a similarity relation between traces with
985the same structure. Theorem~\texttt{tlr\_rel\_to\_traces\_same\_flatten}
986in the Matita formalisation shows that~\eqref{th2} holds for every pair
987$(\tau_1,\tau_2)$ of similar traces.
988
989Intuitively, two traces are similar when one can be obtained from
990the other by erasing or inserting silent steps, i.e. states that are
991not \texttt{as\_costed} and that are classified as \texttt{cl\_other}.
992Silent steps do not alter the structure of the traces.
993In particular,
994the relation maps function calls to function calls to the same function,
995label emission statements to emissions of the same label, concatenation of
996subtraces to concatenation of subtraces of the same length and starting with
997the same emission statement, etc.
998
999In the formalisation the three similarity relations --- one for each trace
1000kind --- are defined by structural recursion on the first trace and pattern
1001matching over the second. Here we turn
1002the definition into the inference rules shown in \autoref{fig:txx_rel}
1003for the sake of readability. We also omit from trace constructors all arguments,
1004but those that are traces or that
1005are used in the premises of the rules. By abuse of notation we denote all three
1006relations by infixing $\approx$.
1007
1008\begin{figure}
1009\begin{multicols}{2}
1010\infrule
1011 {tll_1\approx tll_2
1012 }
1013 {\texttt{tlr\_base}~tll_1 \approx \texttt{tlr\_base}~tll_2}
1014
1015\infrule
1016 {tll_1 \approx tll_2 \andalso
1017  tlr_1 \approx tlr_2
1018 }
1019 {\texttt{tlr\_step}~tll_1~tlr_1 \approx \texttt{tlr\_step}~tll_2~tlr_2}
1020\end{multicols}
1021\vspace{3ex}
1022\begin{multicols}{2}
1023\infrule
1024 {\ell~s_1 = \ell~s_2 \andalso
1025  tal_1\approx tal_2
1026 }
1027 {\texttt{tll\_base}~s_1~tal_1 \approx \texttt{tll\_base}~s_2~tal_2}
1028
1029\infrule
1030 {tal_1\approx tal_2
1031 }
1032 {\texttt{tal\_step\_default}~tal_1 \approx tal_2}
1033\end{multicols}
1034\vspace{3ex}
1035\infrule
1036 {}
1037 {\texttt{tal\_base\_not\_return}\approx taa \append \texttt{tal\_base\_not\_return}}
1038\vspace{1ex}
1039\infrule
1040 {}
1041 {\texttt{tal\_base\_return}\approx taa \append \texttt{tal\_base\_return}}
1042\vspace{1ex}
1043\infrule
1044 {tlr_1\approx tlr_2 \andalso
1045  s_1 \uparrow f \andalso s_2\uparrow f
1046 }
1047 {\texttt{tal\_base\_call}~s_1~tlr_1\approx taa \append \texttt{tal\_base\_call}~s_2~tlr_2}
1048\vspace{1ex}
1049\infrule
1050 {tlr_1\approx tlr_2 \andalso
1051  s_1 \uparrow f \andalso s_2\uparrow f \andalso
1052  \texttt{tal\_collapsable}~tal_2
1053 }
1054 {\texttt{tal\_base\_call}~s_1~tlr_1 \approx taa \append \texttt{tal\_step\_call}~s_2~tlr_2~tal_2)}
1055\vspace{1ex}
1056\infrule
1057 {tlr_1\approx tlr_2 \andalso
1058  s_1 \uparrow f \andalso s_2\uparrow f \andalso
1059  \texttt{tal\_collapsable}~tal_1
1060 }
1061 {\texttt{tal\_step\_call}~s_1~tlr_1~tal_1 \approx taa \append \texttt{tal\_base\_call}~s_2~tlr_2)}
1062\vspace{1ex}
1063\infrule
1064 {tlr_1 \approx tlr_2 \andalso
1065  s_1 \uparrow f \andalso s_2\uparrow f\andalso
1066  tal_1 \approx tal_2 \andalso
1067 }
1068 {\texttt{tal\_step\_call}~s_1~tlr_1~tal_1 \approx taa \append \texttt{tal\_step\_call}~s_2~tlr_2~tal_2}
1069\caption{The inference rule for the relation $\approx$.}
1070\label{fig:txx_rel}
1071\end{figure}
1072%
1073\begin{comment}
1074\begin{verbatim}
1075let rec tlr_rel S1 st1 st1' S2 st2 st2'
1076  (tlr1 : trace_label_return S1 st1 st1')
1077  (tlr2 : trace_label_return S2 st2 st2') on tlr1 : Prop ≝
1078match tlr1 with
1079  [ tlr_base st1 st1' tll1 ⇒
1080    match tlr2 with
1081    [ tlr_base st2 st2' tll2 ⇒ tll_rel … tll1 tll2
1082    | _ ⇒ False
1083    ]
1084  | tlr_step st1 st1' st1'' tll1 tl1 ⇒
1085    match tlr2 with
1086    [ tlr_step st2 st2' st2'' tll2 tl2 ⇒
1087      tll_rel … tll1 tll2 ∧ tlr_rel … tl1 tl2
1088    | _ ⇒ False
1089    ]
1090  ]
1091and tll_rel S1 fl1 st1 st1' S2 fl2 st2 st2'
1092 (tll1 : trace_label_label S1 fl1 st1 st1')
1093 (tll2 : trace_label_label S2 fl2 st2 st2') on tll1 : Prop ≝
1094  match tll1 with
1095  [ tll_base fl1 st1 st1' tal1 H ⇒
1096    match tll2 with
1097    [ tll_base fl2 st2 st2 tal2 G ⇒
1098      as_label_safe … («?, H») = as_label_safe … («?, G») ∧
1099      tal_rel … tal1 tal2
1100    ]
1101  ]
1102and tal_rel S1 fl1 st1 st1' S2 fl2 st2 st2'
1103 (tal1 : trace_any_label S1 fl1 st1 st1')
1104 (tal2 : trace_any_label S2 fl2 st2 st2')
1105   on tal1 : Prop ≝
1106  match tal1 with
1107  [ tal_base_not_return st1 st1' _ _ _ ⇒
1108    fl2 = doesnt_end_with_ret ∧
1109    ∃st2mid,taa,H,G,K.
1110    tal2 ≃ taa_append_tal ? st2 ??? taa
1111      (tal_base_not_return ? st2mid st2' H G K)
1112  | tal_base_return st1 st1' _ _ ⇒
1113    fl2 = ends_with_ret ∧
1114    ∃st2mid,taa,H,G.
1115    tal2 ≃ taa_append_tal ? st2 ? st2mid st2' taa
1116      (tal_base_return ? st2mid st2' H G)
1117  | tal_base_call st1 st1' st1'' _ prf _ tlr1 _ ⇒
1118    fl2 = doesnt_end_with_ret ∧
1119    ∃st2mid,G.as_call_ident S2 («st2mid, G») = as_call_ident ? «st1, prf» ∧
1120    ∃taa : trace_any_any ? st2 st2mid.∃st2mid',H.
1121    (* we must allow a tal_base_call to be similar to a call followed
1122      by a collapsable trace (trace_any_any followed by a base_not_return;
1123      we cannot use trace_any_any as it disallows labels in the end as soon
1124      as it is non-empty) *)
1125    (∃K.∃tlr2 : trace_label_return ? st2mid' st2'.∃L.
1126      tal2 ≃ taa @ (tal_base_call … H G K tlr2 L) ∧ tlr_rel … tlr1 tlr2) ∨
1127    ∃st2mid'',K.∃tlr2 : trace_label_return ? st2mid' st2mid''.∃L.
1128    ∃tl2 : trace_any_label … doesnt_end_with_ret st2mid'' st2'.
1129      tal2 ≃ taa @ (tal_step_call … H G K tlr2 L tl2) ∧
1130      tlr_rel … tlr1 tlr2 ∧ tal_collapsable … tl2
1131  | tal_step_call fl1 st1 st1' st1'' st1''' _ prf _ tlr1 _ tl1 ⇒
1132    ∃st2mid,G.as_call_ident S2 («st2mid, G») = as_call_ident ? «st1, prf» ∧
1133    ∃taa : trace_any_any ? st2 st2mid.∃st2mid',H.
1134    (fl2 = doesnt_end_with_ret ∧ ∃K.∃tlr2 : trace_label_return ? st2mid' st2'.∃L.
1135      tal2 ≃ taa @ tal_base_call … H G K tlr2 L ∧
1136      tal_collapsable … tl1 ∧ tlr_rel … tlr1 tlr2) ∨
1137    ∃st2mid'',K.∃tlr2 : trace_label_return ? st2mid' st2mid''.∃L.
1138    ∃tl2 : trace_any_label ? fl2 st2mid'' st2'.
1139      tal2 ≃ taa @ (tal_step_call … H G K tlr2 L tl2) ∧
1140      tal_rel … tl1 tl2 ∧ tlr_rel … tlr1 tlr2
1141  | tal_step_default fl1 st1 st1' st1'' _ tl1 _ _ ⇒
1142    tal_rel … tl1 tal2 (* <- this makes it many to many *)
1143  ].
1144\end{verbatim}
1145\end{comment}
1146%
1147In the preceding rules, a $taa$ is an inhabitant of the
1148$\texttt{trace\_any\_any}~s_1~s_2$ (shorthand $\texttt{TAA}~s_1~s_2$),
1149an inductive data type whose definition
1150is not in the paper for lack of space. It is the type of valid
1151prefixes (even empty ones) of \texttt{TAL}'s that do not contain
1152any function call. Therefore it
1153is possible to concatenate (using ``$\append$'') a \texttt{TAA} to the
1154left of a \texttt{TAL}. A \texttt{TAA} captures
1155a sequence of silent moves.
1156The \texttt{tal\_collapsable} unary predicate over \texttt{TAL}'s
1157holds when the argument does not contain any function call and it ends
1158with a label (not a return). The intuition is that after a function call we
1159can still perform a sequence of silent actions while remaining similar.
1160
1161As should be expected, even though the rules are asymmetric $\approx$ is in fact
1162an equivalence relation.
1163\section{Forward simulation}
1164\label{simulation}
1165
1166We summarise here the results of the previous sections. Each intermediate
1167unstructured language can be given a semantics based on structured traces,
1168that single out those runs that respect a certain number of invariants.
1169A cost model can be computed on the object code and it can be used to predict
1170the execution costs of runs that produce structured traces. The cost model
1171can be lifted from the target to the source code of a pass if the pass maps
1172structured traces to similar structured traces. The latter property is called
1173a \emph{forward simulation}.
1174
1175As for labelled transition systems, in order to establish the forward
1176simulation we are interested in (preservation of observables), we are
1177forced to prove a stronger notion of forward simulation that introduces
1178an explicit relation between states. The classical notion of a 1-to-many
1179forward simulation is the existence of a relation $\S$ over states such that
1180if $s_1 \S s_2$ and $s_1 \to^1 s_1'$ then there exists an $s_2'$ such that
1181$s_2 \to^* s_2'$ and $s_1' \S s_2'$. In our context, we need to replace the
1182one and multi step transition relations $\to^n$ with the existence of
1183a structured trace between the two states, and we need to add the request that
1184the two structured traces are similar. Thus what we would like to state is
1185something like:\\
1186for all $s_1,s_2,s_1'$ such that there is a $\tau_1$ from
1187$s_1$ to $s_1'$ and $s_1 \S s_2$ there exists an $s_2'$ such that
1188$s_1' \S s_2'$ and a $\tau_2$ from $s_2$ to $s_2'$ such that
1189$\tau_1$ is similar to $\tau_2$. We call this particular form of forward
1190simulation \emph{trace reconstruction}.
1191
1192The statement just introduced, however, is too simplistic and not provable
1193in the general case. To understand why, consider the case of a function call
1194and the pass that fixes the parameter passing conventions. A function
1195call in the source code takes in input an arbitrary number of pseudo-registers (the actual parameters to pass) and returns an arbitrary number of pseudo-registers (where the result is stored). A function call in the target language has no
1196input nor output parameters. The pass must add explicit code before and after
1197the function call to move the pseudo-registers content from/to the hardware
1198registers or the stack in order to implement the parameter passing strategy.
1199Similarly, each function body must be augmented with a preamble and a postamble
1200to complete/initiate the parameter passing strategy for the call/return phase.
1201Therefore what used to be a call followed by the next instruction to execute
1202after the function return, now becomes a sequence of instructions, followed by
1203a call, followed by another sequence. The two states at the beginning of the
1204first sequence and at the end of the second sequence are in relation with
1205the status before/after the call in the source code, like in an usual forward
1206simulation. How can we prove however the additional condition for function calls
1207that asks that when the function returns the instruction immediately after the
1208function call is called? To grant this invariant, there must be another relation
1209between the address of the function call in the source and in the target code.
1210This additional relation is to be used in particular to relate the two stacks.
1211
1212Another example is given by preservation of code emission statements. A single
1213code emission instruction can be simulated by a sequence of steps, followed
1214by a code emission, followed by another sequence. Clearly the initial and final
1215statuses of the sequence are to be in relation with the status before/after the
1216code emission in the source code. In order to preserve the structured traces
1217invariants, however, we must consider a second relation between states that
1218traces the preservation of the code emission statement.
1219
1220Therefore we now introduce an abstract notion of relation set between abstract
1221statuses and an abstract notion of 1-to-many forward simulation conditions.
1222These two definitions enjoy the following remarkable properties:
1223\begin{enumerate}
1224 \item they are generic enough to accommodate all passes of the CerCo compiler
1225 \item the conjunction of the 1-to-many forward simulation conditions are
1226       just slightly stricter than the statement of a 1-to-many forward
1227       simulation in the classical case. In particular, they only require
1228       the construction of very simple forms of structured traces made of
1229       silent states only.
1230 \item they allow to prove our main result of the paper: the 1-to-many
1231       forward simulation conditions are sufficient to prove the trace
1232       reconstruction theorem
1233\end{enumerate}
1234
1235Point 3. is the important one. First of all it means that we have reduced
1236the complex problem of trace reconstruction to a much simpler one that,
1237moreover, can be solved with slight adaptations of the forward simulation proof
1238that is performed for a compiler that only cares about functional properties.
1239Therefore we have successfully splitted as much as possible the proof of
1240preservation of functional properties from that of non-functional ones.
1241Secondly, combined with the results in the previous section, it implies
1242that the cost model can be computed on the object code and lifted to the
1243source code to reason on non-functional properties, assuming that
1244the 1-to-many forward simulation conditions are fulfilled for every
1245compiler pass.
1246
1247\paragraph{Relation sets.}
1248
1249We introduce now the four relations $\mathcal{S,C,L,R}$ between abstract
1250statuses that are used to correlate the corresponding statues before and
1251after a compiler pass. The first two are abstract and must be instantiated
1252by every pass. The remaining two are derived relations.
1253
1254The $\S$ relation between states is the classical relation used
1255in forward simulation proofs. It correlates the data of the status
1256(e.g. registers, memory, etc.).
1257
1258The $\C$ relation correlates call states. It allows to track the
1259position in the target code of every call in the source code.
1260
1261The $\L$ relation simply says that the two states are both label
1262emitting states that emit the same label, \emph{i.e.}\ $s_1\L s_2\iffdef \ell~s_1=\ell~s_2$.
1263It allows to track the position in
1264the target code of every cost emitting statement in the source code.
1265
1266Finally the $\R$ relation is the more complex one. Two states
1267$s_1$ and $s_2$ are $\R$ correlated if every time $s_1$ is the
1268successors of a call state that is $\C$-related to a call state
1269$s_2'$ in the target code, then $s_2$ is the successor of $s_2'$. Formally:
1270$$s_1\R s_2 \iffdef \forall s_1',s_2'.s_1'\C s_2' \to s_1'\ar s_1 \to s_2' \ar s_2.$$
1271We will require all pairs of states that follow a related call to be
1272$\R$-related. This is the fundamental requirement granting
1273that the target trace is well structured, \emph{i.e.}\ that calls are well
1274nested and returning where they are supposed to.
1275
1276% \begin{alltt}
1277% record status_rel (S1,S2 : abstract_status) : Type[1] := \{
1278%   \(\S\): S1 \(\to\) S2 \(\to\) Prop;
1279%   \(\C\): (\(\Sigma\)s.as_classifier S1 s cl_call) \(\to\)
1280%      (\(\Sigma\)s.as_classifier S2 s cl_call) \(\to\) Prop \}.
1281%
1282% definition \(\L\) S1 S2 st1 st2 := as_label S1 st1 = as_label S2 st2.
1283%
1284% definition \(\R\) S1 S2 (R: status_rel S1 S2) s1_ret s2_ret ≝
1285%  \(\forall\)s1_pre,s2_pre.
1286%   as_after_return s1_pre s1_ret \(\to\) s1_pre \(\R\) s2_pre \(\to\)
1287%    as_after_return s2_pre s2_ret.
1288% \end{alltt}
1289
1290\begin{figure}
1291\centering
1292\begin{tabular}{@{}c@{}c@{}c@{}}
1293% \begin{subfigure}{.475\linewidth}
1294% \centering
1295% \begin{tikzpicture}[every join/.style={ar}, join all, thick,
1296%                             every label/.style=overlay, node distance=10mm]
1297%     \matrix [diag] (m) {%
1298%          \node (s1) [is jump] {}; & \node [fill=white] (t1) {};\\
1299%          \node (s2) {}; & \node (t2) {}; \\
1300%     };
1301%     \node [above=0 of t1, overlay] {$\alpha$};
1302%     {[-stealth]
1303%     \draw (s1) -- (t1);
1304%     \draw [new] (s2) -- node [above] {$*$} (t2);
1305%     }
1306%     \draw (s1) to node [rel] {$\S$} (s2);
1307%     \draw [new] (t1) to node [rel] {$\S,\L$} (t2);
1308% \end{tikzpicture}
1309% \caption{The \texttt{cl\_jump} case.}
1310% \label{subfig:cl_jump}
1311% \end{subfigure}
1312% &
1313\begin{subfigure}{.25\linewidth}
1314\centering
1315\begin{tikzpicture}[every join/.style={ar}, join all, thick,
1316                            every label/.style=overlay, node distance=10mm]
1317    \matrix [diag] (m) {%
1318         \node (s1) {}; & \node (t1) {};\\
1319         \node (s2) {}; & \node (t2) {}; \\
1320    };
1321    {[-stealth]
1322    \draw (s1) -- (t1);
1323    \draw [new] (s2) -- node [above] {$*$} (t2);
1324    }
1325    \draw (s1) to node [rel] {$\S$} (s2);
1326    \draw [new] (t1) to node [rel] {$\S,\L$} (t2);
1327\end{tikzpicture}
1328\caption{The \texttt{cl\_oher} and \texttt{cl\_jump} cases.}
1329\label{subfig:cl_other_jump}
1330\end{subfigure}
1331&
1332\begin{subfigure}{.375\linewidth}
1333\centering
1334\begin{tikzpicture}[every join/.style={ar}, join all, thick,
1335                            every label/.style=overlay, node distance=10mm]
1336    \matrix [diag, small vgap] (m) {%
1337        \node (t1) {}; \\
1338         \node (s1) [is call] {}; \\
1339         & \node (l) {}; & \node (t2) {};\\
1340         \node (s2) {}; & \node (c) [is call] {};\\   
1341    };
1342    {[-stealth]
1343    \draw (s1) -- node [left] {$f$} (t1);
1344    \draw [new] (s2) -- node [above] {$*$} (c);
1345    \draw [new] (c) -- node [right] {$f$} (l);
1346    \draw [new] (l) -- node [above] {$*$} (t2);
1347    }
1348    \draw (s1) to node [rel] {$\S$} (s2);
1349    \draw [new] (t1) to [bend left] node [rel] {$\S$} (t2);
1350    \draw [new] (t1) to [bend left] node [rel] {$\L$} (l);
1351    \draw [new] (t1) to node [rel] {$\C$} (c);
1352    \end{tikzpicture}
1353\caption{The \texttt{cl\_call} case.}
1354\label{subfig:cl_call}
1355\end{subfigure}
1356&
1357\begin{subfigure}{.375\linewidth}
1358\centering
1359\begin{tikzpicture}[every join/.style={ar}, join all, thick,
1360                            every label/.style=overlay, node distance=10mm]
1361    \matrix [diag, small vgap] (m) {%
1362        \node (s1) [is ret] {}; \\
1363        \node (t1) {}; \\
1364        \node (s2) {}; & \node (c) [is ret] {};\\
1365        & \node (r) {}; & \node (t2) {}; \\   
1366    };
1367    {[-stealth]
1368    \draw (s1) -- (t1);
1369    \draw [new] (s2) -- node [above] {$*$} (c);
1370    \draw [new] (c) -- (r);
1371    \draw [new] (r) -- node [above] {$*$} (t2);
1372    }
1373    \draw (s1) to [bend right=45] node [rel] {$\S$} (s2);
1374    \draw [new, overlay] (t1) to [bend left=90, looseness=1] node [rel] {$\S,\L$} (t2);
1375    \draw [new, overlay] (t1) to [bend left=90, looseness=1.2] node [rel] {$\R$} (r);
1376\end{tikzpicture}
1377\caption{The \texttt{cl\_return} case.}
1378\label{subfig:cl_return}
1379\end{subfigure}
1380\end{tabular}
1381\caption{Mnemonic diagrams depicting the hypotheses for the preservation of structured traces.
1382         Dashed lines
1383         and arrows indicates how the diagrams must be closed when solid relations
1384         are present.}
1385\label{fig:forwardsim}
1386\end{figure}
1387
1388\paragraph{1-to-many forward simulation conditions.}
1389\begin{condition}[Cases \texttt{cl\_other} and \texttt{cl\_jump}]
1390 For all $s_1,s_1',s_2$ such that $s_1 \S s_1'$, and
1391 $s_1\exec s_1'$, and either $s_1 \class \texttt{cl\_other}$ or
1392 both $s_1\class\texttt{cl\_other}\}$ and $\ell~s_1'$,
1393 there exists an $s_2'$ and a $\texttt{trace\_any\_any\_free}~s_2~s_2'$ called $taaf$
1394 such that $s_1' \mathrel{{\S} \cap {\L}} s_2'$ and either
1395$taaf$ is non empty, or one among $s_1$ and $s_1'$ is \texttt{as\_costed}.
1396\end{condition}
1397
1398In the above condition depicted in \autoref{subfig:cl_other_jump},
1399a $\texttt{trace\_any\_any\_free}~s_1~s_2$ (which from now on
1400will be shorthanded as \verb+TAAF+) is an
1401inductive type of structured traces that do not contain function calls or
1402cost emission statements. Differently from a \verb+TAA+, the
1403instruction to be executed in the lookahead state $s_2$ may be a cost emission
1404statement.
1405
1406The intuition of the condition is that one step can be replaced with zero or more steps if it
1407preserves the relation between the data and if the two final statuses are
1408labelled in the same way. Moreover, we must take special care of the empty case
1409to avoid collapsing two consecutive states that emit a label, missing one of the two emissions.
1410
1411\begin{condition}[Case \texttt{cl\_call}]
1412 For all $s_1,s_1',s_2$ s.t. $s_1 \S s_1'$ and
1413 $s_1\exec s_1'$ and $s_1 \class \texttt{cl\_call}$, there exists $s_a, s_b, s_2'$, a
1414$\verb+TAA+~s_2~s_a$, and a
1415$\verb+TAAF+~s_b~s_2'$ such that:
1416$s_a\class\texttt{cl\_call}$, the \texttt{as\_call\_ident}'s of
1417the two call states are the same, $s_1 \C s_a$,
1418$s_a\exec s_b$, $s_1' \L s_b$ and
1419$s_1' \S s_2'$.
1420\end{condition}
1421
1422The condition, depicted in \autoref{subfig:cl_call} says that, to simulate a function call, we can perform a
1423sequence of silent actions before and after the function call itself.
1424The old and new call states must be $\C$-related, the old and new
1425states at the beginning of the function execution must be $\L$-related
1426and, finally, the two initial and final states must be $\S$-related
1427as usual.
1428
1429\begin{condition}[Case \texttt{cl\_return}]
1430 For all $s_1,s_1',s_2$ s.t. $s_1 \S s_1'$,
1431 $s_1\exec s_1'$ and $s_1 \class \texttt{cl\_return}$, there exists $s_a, s_b, s_2'$, a
1432$\verb+TAA+~s_2~s_a$, a
1433$\verb+TAAF+~s_b~s_2'$ called $taaf$ such that:
1434$s_a\class\texttt{cl\_return}$,
1435$s_a\exec s_b$,
1436$s_1' \R s_b$ and
1437$s_1' \mathrel{{\S} \cap {\L}} s_2'$ and either
1438$taaf$ is non empty, or $\lnot \ell~s_a$.
1439\end{condition}
1440
1441Similarly to the call condition, to simulate a return we can perform a
1442sequence of silent actions before and after the return statement itself,
1443as depicted in \autoref{subfig:cl_return}.
1444The old and the new statements after the return must be $\R$-related,
1445to grant that they returned to corresponding calls.
1446The two initial and final states must be $\S$-related
1447as usual and, moreover, they must exhibit the same labels. Finally, when
1448the suffix is non empty we must take care of not inserting a new
1449unmatched cost emission statement just after the return statement.
1450
1451\begin{comment}
1452\begin{verbatim}
1453definition status_simulation ≝
1454  λS1 : abstract_status.
1455  λS2 : abstract_status.
1456  λsim_status_rel : status_rel S1 S2.
1457    ∀st1,st1',st2.as_execute S1 st1 st1' →
1458    sim_status_rel st1 st2 →
1459    match as_classify … st1 with
1460    [ None ⇒ True
1461    | Some cl ⇒
1462      match cl with
1463      [ cl_call ⇒ ∀prf.
1464        (*
1465             st1' ------------S----------\
1466              ↑ \                         \
1467             st1 \--L--\                   \
1468              | \       \                   \
1469              S  \-C-\  st2_after_call →taa→ st2'
1470              |       \     ↑
1471             st2 →taa→ st2_pre_call
1472        *)
1473        ∃st2_pre_call.
1474        as_call_ident ? st2_pre_call = as_call_ident ? («st1, prf») ∧
1475        call_rel ?? sim_status_rel «st1, prf» st2_pre_call ∧
1476        ∃st2_after_call,st2'.
1477        ∃taa2 : trace_any_any … st2 st2_pre_call.
1478        ∃taa2' : trace_any_any … st2_after_call st2'.
1479        as_execute … st2_pre_call st2_after_call ∧
1480        sim_status_rel st1' st2' ∧
1481        label_rel … st1' st2_after_call
1482      | cl_return ⇒
1483        (*
1484             st1
1485            / ↓
1486           | st1'----------S,L------------\
1487           S   \                           \
1488            \   \-----R-------\            |     
1489             \                 |           |
1490             st2 →taa→ st2_ret |           |
1491                          ↓   /            |
1492                     st2_after_ret →taaf→ st2'
1493
1494           we also ask that st2_after_ret be not labelled if the taaf tail is
1495           not empty
1496        *)
1497        ∃st2_ret,st2_after_ret,st2'.
1498        ∃taa2 : trace_any_any … st2 st2_ret.
1499        ∃taa2' : trace_any_any_free … st2_after_ret st2'.
1500        (if taaf_non_empty … taa2' then ¬as_costed … st2_after_ret else True) ∧
1501        as_classifier … st2_ret cl_return ∧
1502        as_execute … st2_ret st2_after_ret ∧ sim_status_rel st1' st2' ∧
1503        ret_rel … sim_status_rel st1' st2_after_ret ∧
1504        label_rel … st1' st2'
1505      | cl_other ⇒
1506          (*         
1507          st1 → st1'
1508            |      \
1509            S      S,L
1510            |        \
1511           st2 →taaf→ st2'
1512           
1513           the taaf can be empty (e.g. tunneling) but we ask it must not be the
1514           case when both st1 and st1' are labelled (we would be able to collapse
1515           labels otherwise)
1516         *)
1517        ∃st2'.
1518        ∃taa2 : trace_any_any_free … st2 st2'.
1519        (if taaf_non_empty … taa2 then True else (¬as_costed … st1 ∨ ¬as_costed … st1')) ∧
1520        sim_status_rel st1' st2' ∧
1521        label_rel … st1' st2'
1522      | cl_jump ⇒
1523        (* just like cl_other, but with a hypothesis more *)
1524        as_costed … st1' →
1525        ∃st2'.
1526        ∃taa2 : trace_any_any_free … st2 st2'.
1527        (if taaf_non_empty … taa2 then True else (¬as_costed … st1 ∨ ¬as_costed … st1')) ∧
1528        sim_status_rel st1' st2' ∧
1529        label_rel … st1' st2'
1530      ]
1531    ].
1532\end{verbatim}
1533\end{comment}
1534
1535\paragraph{Main result: the 1-to-many forward simulation conditions
1536are sufficient to trace reconstruction}
1537
1538Let us assume that a relation set is given such that the 1-to-many
1539forward simulation conditions are satisfied. Under this assumption we
1540can prove the following three trace reconstruction theorems by mutual
1541structural induction over the traces given in input between the
1542$s_1$ and $s_1'$ states.
1543
1544In particular, the \texttt{status\_simulation\_produce\_tlr} theorem
1545applied to the \texttt{main} function of the program and equal
1546$s_{2_b}$ and $s_2$ states shows that, for every initial state in the
1547source code that induces a structured trace in the source code,
1548the compiled code produces a similar structured trace.
1549
1550\begin{theorem}[\texttt{status\_simulation\_produce\_tlr}]
1551For every $s_1,s_1',s_{2_b},s_2$ s.t.
1552there is a $\texttt{TLR}~s_1~s_1'$ called $tlr_1$ and a
1553$\verb+TAA+~s_{2_b}~s_2$ and $s_1 \L s_{2_b}$ and
1554$s_1 \S s_2$, there exists $s_{2_m},s_2'$ s.t.
1555there is a $\texttt{TLR}~s_{2_b}~s_{2_m}$ called $tlr_2$ and
1556there is a $\verb+TAAF+~s_{2_m}~s_2'$ called $taaf$
1557s.t. if $taaf$ is non empty then $\lnot (\ell~s_{2_m})$,
1558and $tlr_1\approx tlr_2$
1559and $s_1' \mathrel{{\S} \cap {\L}} s_2'$ and
1560$s_1' \R s_{2_m}$.
1561\end{theorem}
1562
1563The theorem states that a \texttt{trace\_label\_return} in the source code
1564together with a precomputed preamble of silent states
1565(the \verb+TAA+) in the target code induces a
1566similar \texttt{trace\_label\_return} in the target code which can be
1567followed by a sequence of silent states. Note that the statement does not
1568require the produced \texttt{trace\_label\_return} to start with the
1569precomputed preamble, even if this is likely to be the case in concrete
1570implementations. The preamble in input is necessary for compositionality, e.g.
1571because the 1-to-many forward simulation conditions allow in the
1572case of function calls to execute a preamble of silent instructions just after
1573the function call.
1574
1575Clearly similar results are also available for the other two types of structured
1576traces (in fact, they are all proved simultaneously by mutual induction).
1577% \begin{theorem}[\texttt{status\_simulation\_produce\_tll}]
1578% For every $s_1,s_1',s_{2_b},s_2$ s.t.
1579% there is a $\texttt{TLL}~b~s_1~s_1'$ called $tll_1$ and a
1580% $\verb+TAA+~s_{2_b}~s_2$ and $s_1 \L s_{2_b}$ and
1581% $s_1 \S s_2$, there exists $s_{2_m},s_2'$ s.t.
1582% \begin{itemize}
1583%  \item if $b$ (the trace ends with a return) then there exists $s_{2_m},s_2'$
1584%        and a trace $\texttt{TLL}~b~s_{2_b}~s_{2_m}$ called $tll_2$
1585%        and a $\texttt{TAAF}~s_{2_m}~s_2'$ called $taa_2$ s.t.
1586%        $s_1' \mathrel{{\S} \cap {\L}} s_2'$ and
1587%        $s_1' \R s_{2_m}$ and
1588%        $tll_1\approx tll_2$ and
1589%        if $taa_2$ is non empty then $\lnot \ell~s_{2_m}$;
1590%  \item else there exists $s_2'$ and a
1591%        $\texttt{TLL}~b~s_{2_b}~s_2'$ called $tll_2$ such that
1592%        $s_1' \mathrel{{\S} \cap {\L}} s_2'$ and
1593%        $tll_1\approx tll_2$.
1594% \end{itemize}
1595% \end{theorem}
1596%
1597% The statement is similar to the previous one: a source
1598% \texttt{trace\_label\_label} and a given target preamble of silent states
1599% in the target code induce a similar \texttt{trace\_label\_label} in the
1600% target code, possibly followed by a sequence of silent moves that become the
1601% preamble for the next \texttt{trace\_label\_label} translation.
1602%
1603% \begin{theorem}[\texttt{status\_simulation\_produce\_tal}]
1604% For every $s_1,s_1',s_2$ s.t.
1605% there is a $\texttt{TAL}~b~s_1~s_1'$ called $tal_1$ and
1606% $s_1 \S s_2$
1607% \begin{itemize}
1608%  \item if $b$ (the trace ends with a return) then there exists $s_{2_m},s_2'$
1609%    and a trace $\texttt{TAL}~b~s_2~s_{2_m}$ called $tal_2$ and a
1610%    $\texttt{TAAF}~s_{2_m}~s_2'$ called $taa_2$ s.t.
1611%    $s_1' \mathrel{{\S} \cap {\L}} s_2'$ and
1612%    $s_1' \R s_{2_m}$ and
1613%    $tal_1 \approx tal_2$ and
1614%    if $taa_2$ is non empty then $\lnot \ell~s_{2_m}$;
1615%  \item else there exists $s_2'$ and a
1616%    $\texttt{TAL}~b~s_2~s_2'$ called $tal_2$ such that
1617%    either $s_1' \mathrel{{\S} \cap {\L}} s_2'$ and
1618%        $tal_1\approx tal_2$
1619%    or $s_1' \mathrel{{\S} \cap {\L}} s_2$ and
1620%    $\texttt{tal\_collapsable}~tal_1$ and $\lnot \ell~s_1$.
1621% \end{itemize}
1622% \end{theorem}
1623%
1624% The statement is also similar to the previous ones, but for the lack of
1625% the target code preamble.
1626
1627\begin{comment}
1628\begin{corollary}
1629For every $s_1,s_1',s_2$ s.t.
1630there is a $\texttt{trace\_label\_return}~s_1~s_1'$ called $tlr_1$ and
1631$s_1 (\L \cap \S) s_2$
1632there exists $s_{2_m},s_2'$ s.t.
1633there is a $\texttt{trace\_label\_return}~s_2~s_{2_m}$ called $tlr_2$ and
1634there is a $\texttt{trace\_any\_any\_free}~s_{2_m}~s_2'$ called $taaf$
1635s.t. if $taaf$ is non empty then $\lnot (\texttt{as\_costed}~s_{2_m})$,
1636and $\texttt{tlr\_rel}~tlr_1~tlr_2$
1637and $s_1' (\S \cap \L) s_2'$ and
1638$s_1' \R s_{2_m}$.
1639\end{corollary}
1640\end{comment}
1641
1642\begin{comment}
1643\begin{verbatim}
1644status_simulation_produce_tlr S1 S2 R
1645(* we start from this situation
1646     st1 →→→→tlr→→→→ st1'
1647      | \
1648      L  \---S--\
1649      |          \
1650   st2_lab →taa→ st2   (the taa preamble is in general either empty or given
1651                        by the preceding call)
1652   
1653   and we produce
1654     st1 →→→→tlr→→→→ st1'
1655             \\      /  \
1656             //     R    \-L,S-\
1657             \\     |           \
1658   st2_lab →tlr→ st2_mid →taaf→ st2'
1659*)
1660  st1 st1' st2_lab st2
1661  (tlr1 : trace_label_return S1 st1 st1')
1662  (taa2_pre : trace_any_any S2 st2_lab st2)
1663  (sim_execute : status_simulation S1 S2 R)
1664  on tlr1 : R st1 st2 → label_rel … st1 st2_lab →
1665  ∃st2_mid.∃st2'.
1666  ∃tlr2 : trace_label_return S2 st2_lab st2_mid.
1667  ∃taa2 : trace_any_any_free … st2_mid st2'.
1668  (if taaf_non_empty … taa2 then ¬as_costed … st2_mid else True) ∧
1669  R st1' st2' ∧ ret_rel … R st1' st2_mid ∧ label_rel … st1' st2' ∧
1670  tlr_rel … tlr1 tlr2
1671\end{verbatim}
1672\end{comment}
1673
1674\section{Conclusions and future works}
1675\label{conclusions}
1676The labelling approach is a technique to implement compilers that induce on
1677the source code a non uniform cost model determined from the object code
1678produced. The cost model assigns a cost to each basic block of the program.
1679The main theorem of the approach says that there is an exact
1680correspondence between the sequence of basic blocks started in the source
1681and object code, and that no instruction in the source or object code is
1682executed outside a basic block. Thus the cost of object code execution
1683can be computed precisely on the source.
1684
1685In this paper we scale the labelling approach to cover a programming language
1686with function calls. This introduces new difficulties only when the language
1687is unstructured, i.e. it allows function calls to return anywhere in the code,
1688destroying the hope of a static prediction of the cost of basic blocks.
1689We restore static predictability by introducing a new semantics for unstructured
1690programs that single outs well structured executions. The latter are represented
1691by structured traces, a generalisation of streams of observables that capture
1692several structural invariants of the execution, like well nesting of functions
1693or the fact that every basic block must start with a code emission statement.
1694We show that structured traces are sufficiently structured to statically compute
1695a precise cost model on the object code.
1696
1697We introduce a similarity relation on structured traces that must hold between
1698source and target traces. When the relation holds for every program, we prove
1699that the cost model can be lifted from the object to the source code.
1700
1701In order to prove that similarity holds, we present a generic proof of forward
1702simulation that is aimed at pulling apart as much as possible the part of the
1703simulation related to non-functional properties (preservation of structure)
1704from that related to functional properties. In particular, we reduce the
1705problem of preservation of structure to that of showing a 1-to-many
1706forward simulation that only adds a few additional proof obligations to those
1707of a traditional, function properties only, proof.
1708
1709All results presented in the paper are part of a larger certification of a
1710C compiler which is based on the labelling approach. The certification, done
1711in Matita, is the main deliverable of the FET-Open Certified Complexity (CerCo).
1712
1713The short term future work consists in the completion of the certification of
1714the CerCo compiler exploiting the main theorem of this paper.
1715
1716\paragraph{Related works.}
1717CerCo is the first project that explicitly tries to induce a
1718precise cost model on the source code in order to establish non-functional
1719properties of programs on an high level language. Traditional certifications
1720of compilers, like~\cite{compcert2,piton}, only explicitly prove preservation
1721of the functional properties.
1722
1723Usually forward simulations take the following form: for each transition
1724from $s_1$ to $s_2$ in the source code, there exists an equivalent sequence of
1725transitions in the target code of length $n$. The number $n$ of transition steps
1726in the target code can just be the witness of the existential statement.
1727An equivalent alternative when the proof of simulation is constructive consists
1728in providing an explicit function, called \emph{clock function} in the
1729literature~\cite{clockfunctions}, that computes $n$ from $s_1$. Every clock
1730function constitutes then a cost model for the source code, in the spirit of
1731what we are doing in CerCo. However, we believe our solution to be superior
1732in the following respects: 1) the machinery of the labelling approach is
1733insensible to the resource being measured. Indeed, any cost model computed on
1734the object code can be lifted to the source code (e.g. stack space used,
1735energy consumed, etc.). On the contrary, clock functions only talk about
1736number of transition steps. In order to extend the approach with clock functions
1737to other resources, additional functions must be introduced. Moreover, the
1738additional functions would be handled differently in the proof.
17392) the cost models induced by the labelling approach have a simple presentation.
1740In particular, they associate a number to each basic block. More complex
1741models can be induced when the approach is scaled to cover, for instance,
1742loop optimisations~\cite{loopoptimizations}, but the costs are still meant to
1743be easy to understand and manipulate in an interactive theorem prover or
1744in Frama-C.
1745On the contrary, a clock function is a complex function of the state $s_1$
1746which, as a function, is an opaque object that is difficult to reify as
1747source code in order to reason on it.
1748
1749\bibliographystyle{splncs03}
1750\bibliography{ccexec}
1751
1752% \appendix
1753% \section{Notes for the reviewers}
1754%
1755% The results described in the paper are part of a larger formalization
1756% (the certification of the CerCo compiler). At the moment of the submission
1757% we need to single out from the CerCo formalization the results presented here.
1758% Before the 16-th of February we will submit an attachment that contains the
1759% minimal subset of the CerCo formalization that allows to prove those results.
1760% At that time it will also be possible to measure exactly the size of the
1761% formalization described here. At the moment a rough approximation suggests
1762% about 2700 lines of Matita code.
1763%
1764% We will also attach the development version of the interactive theorem
1765% prover Matita that compiles the submitted formalization. Another possibility
1766% is to backport the development to the last released version of the system
1767% to avoid having to re-compile Matita from scratch.
1768%
1769% The programming and certification style used in the formalization heavily
1770% exploit dependent types. Dependent types are used: 1) to impose invariants
1771% by construction on the data types and operations (e.g. a traces from a state
1772% $s_1$ to a state $s_2$ can be concatenad to a trace from a state
1773% $s_2'$ to a state $s_3$ only if $s_2$ is convertible with $s_2'$); 2)
1774% to state and prove the theorems by using the Russell methodology of
1775% Matthieu Sozeau\footnote{Subset Coercions in Coq in TYPES'06. Matthieu Sozeau. Thorsten Altenkirch and Conor McBride (Eds). Volume 4502 of Lecture Notes in Computer Science. Springer, 2007, pp.237-252.
1776% }, better known in the Coq world as ``\texttt{Program}'' and reimplemented in a simpler way in Matita using coercion propagations\footnote{Andrea Asperti, Wilmer Ricciotti, Claudio Sacerdoti Coen, Enrico Tassi: A Bi-Directional Refinement Algorithm for the Calculus of (Co)Inductive Constructions. Logical Methods in Computer Science 8(1) (2012)}. However, no result presented depends
1777% mandatorily on dependent types: it should be easy to adapt the technique
1778% and results presented in the paper to HOL.
1779%
1780% Finally, Matita and Coq are based on minor variations of the Calculus of
1781% (Co)Inductive Constructions. These variations do not affect the CerCo
1782% formalization. Therefore a porting of the proofs and ideas to Coq would be
1783% rather straightforward.
1784
1785\end{document}
Note: See TracBrowser for help on using the repository browser.