source: Papers/itp-2013/ccexec2.tex @ 3348

Last change on this file since 3348 was 3348, checked in by tranquil, 6 years ago

ipotesi della forward simulation

File size: 94.5 KB
12% \usepackage{amsmath}
17% NB: might be worth removing this if changing class in favour of
18% a built-in definition.
23  {keywords={Definition,Lemma,Theorem,Remark,Qed,Save,Inductive,Record},
24   morekeywords={[2]if,then,else,forall,Prop,match,with,end,let},
25  }
28  morekeywords={int8_t},
29  mathescape,
30  escapechar=\%
36%         keywordstyle=\color{red}\bfseries,
37%         keywordstyle=[2]\color{blue},
38%         commentstyle=\color{green},
39%         stringstyle=\color{blue},
40        showspaces=false,showstringspaces=false,
41%         xleftmargin=1em
42        }
47\pgfutil@ifundefined{pgf@arrow@code@implies}{% supply for lack of double arrow special arrow tip if it is not there%
48  \pgfarrowsdeclare{implies}{implies}%
49  {%
50  \pgfarrowsleftextend{2.2pt}%
51  \pgfarrowsrightextend{2.2pt}%
52  }%
53  {%
54    \pgfsetdash{}{0pt} % do not dash%
55    \pgfsetlinewidth{.33pt}%
56    \pgfsetroundjoin   % fix join%
57    \pgfsetroundcap    % fix cap%
58    \pgfpathmoveto{\pgfpoint{-1.5pt}{2.5pt}}%
59    \pgfpathcurveto{\pgfpoint{-.75pt}{1.5pt}}{\pgfpoint{0.5pt}{.5pt}}{\pgfpoint{2pt}{0pt}}%
60    \pgfpathcurveto{\pgfpoint{0.5pt}{-.5pt}}{\pgfpoint{-.75pt}{-1.5pt}}{\pgfpoint{-1.5pt}{-2.5pt}}%
61    \pgfusepathqstroke%
62  }%
66\tikzset{state/.style={inner sep = 0, outer sep = 2pt, draw, fill},
67         every node/.style={inner sep=2pt},
68         every on chain/.style = {inner sep = 0, outer sep = 2pt},
69         join all/.style = {every on chain/.append style={join}},
70         on/.style={on chain={#1}, state},
71         m/.style={execute at begin node=$, execute at end node=$},
72         node distance=3mm,
73         is other/.style={circle, minimum size = 3pt, state},
74         other/.style={on, is other},
75         is jump/.style={diamond, minimum size = 6pt, state},
76         jump/.style={on, is jump},
77         is call/.style={regular polygon, regular polygon sides=3, minimum size=5pt, state},
78         call/.style={on=going below, is call, node distance=6mm, label=above left:$#1$},
79         is ret/.style={regular polygon, regular polygon sides=3, minimum size=5pt, shape border rotate=180, state},
80         ret/.style={on=going above, is ret, node distance=6mm},
81         chain/.style={start chain=#1 going left},
82         rev ar/.style={stealth-, thick},
83         ar/.style={-stealth, thick},
84         every join/.style={rev ar},
85         labelled/.style={fill=white, label=above:$#1$},
86         vcenter/.style={baseline={([yshift=-.5ex]current bounding box)}},
87         every picture/.style={thick},
88         double equal sign distance/.prefix style={double distance=1.5pt}, %% if already defined (newest version of pgf) it should be ignored%}
89         implies/.style={double, -implies, thin, double equal sign distance, shorten <=5pt, shorten >=5pt},
90         new/.style={densely dashed},
91         rel/.style={font=\scriptsize, fill=white, inner sep=2pt},
92         diag/.style={row sep={11mm,between origins},
93                      column sep={11mm,between origins},
94                      every node/.style={execute at begin node=$, execute at end node=$},
95%                       every node/.style={draw, is other},
96                      },
97         small vgap/.style={row sep={7mm,between origins}},
98         small hgap/.style={column sep={7mm,between origins}},
99         small gap/.style={small vgap, small hgap},
100         % for article, maybe temporary
101         is jump/.style=is other,
102         is call/.style=is other,
103         is ret/.style=is other,
106\def\L{\mathrel{\mathcal L}}
107\def\S{\mathrel{\mathcal S}}
108\def\R{\mathrel{\mathcal R}}
109\def\C{\mathrel{\mathcal C}}
111\def\Labels{\mathbb L}
112\def\Functions{\mathbb F}
115\savebox{\execbox}{\tikz[baseline=-.5ex]\draw [-stealth] (0,0) -- ++(1em, 0);}
139\title{Certification of the Preservation of Structure by a Compiler's Back-end Pass\thanks{The project CerCo acknowledges the financial support of the Future and
140Emerging Technologies (FET) programme within the Seventh Framework
141Programme for Research of the European Commission, under FET-Open grant
142number: 243881}}
143\author{Paolo Tranquilli \and Claudio Sacerdoti Coen}
144\institute{Department of Computer Science and Engineering, University of Bologna,\\\email{}, \email{}}
147The labelling approach is a technique to lift cost models for non-functional
148properties of programs from the object code to the source code. It is based
149on the preservation of the structure of the high level program in every
150intermediate language used by the compiler. Such structure is captured by
151observables that are added to the semantics and that needs to be preserved
152by the forward simulation proof of correctness of the compiler. Additional
153special observables are required for function calls. In this paper we
154present a generic forward simulation proof that preserves all these observables.
155The proof statement is based on a new mechanised semantics that traces the
156structure of execution when the language is unstructured. The generic semantics
157and simulation proof have been mechanised in the interactive theorem prover
162The \emph{labelling approach} has been introduced in~\cite{easylabelling} as
163a technique to \emph{lift} cost models for non-functional properties of programs
164from the object code to the source code. Examples of non-functional properties
165are execution time, amount of stack/heap space consumed and energy required for
166communication. The basic idea of the approach is that it is impossible to
167provide a \emph{uniform} cost model for an high level language that is preserved
168\emph{precisely} by a compiler. For instance, two instances of an assignment
169$x = y$ in the source code can be compiled very differently according to the
170place (registers vs stack) where $x$ and $y$ are stored at the moment of
171execution. Therefore a precise cost model must assign a different cost
172to every occurrence, and the exact cost can only be known after compilation.
174According to the labelling approach, the compiler is free to compile and optimise
175the source code without any major restriction, but it must keep trace
176of what happens to basic blocks during the compilation. The cost model is
177then computed on the object code. It assigns a cost to every basic block.
178Finally, the compiler propagates back the cost model to the source level,
179assigning a cost to each basic block of the source code.
181Implementing the labelling approach in a certified compiler
182allows to reason formally on the high level source code of a program to prove
183non-functional properties that are granted to be preserved by the compiler
184itself. The trusted code base is then reduced to 1) the interactive theorem
185prover (or its kernel) used in the certification of the compiler and
1862) the software used to certify the property on the source language, that
187can be itself certified further reducing the trusted code base.
188In~\cite{easylabelling} the authors provide an example of a simple
189certified compiler that implements the labelling approach for the
190imperative \verb+While+ language~\cite{while}, that does not have
191pointers and function calls.
193The labelling approach has been shown to scale to more interesting scenarios.
194In particular in~\cite{functionallabelling} it has been applied to a functional
195language and in~\cite{loopoptimizations} it has been shown that the approach
196can be slightly complicated to handle loop optimisations and, more generally,
197program optimisations that do not preserve the structure of basic blocks.
198On-going work also shows that the labelling approach is also compatible with
199the complex analyses required to obtain a cost model for object code
200on processors that implement advanced features like pipelining, superscalar
201architectures and caches.
203In the European Project CerCo (Certified Complexity~\footnote{\url{}})~\cite{cerco} we are certifying a labelling approach based compiler for a large subset of C to
2048051 object code. The compiler is
205moderately optimising and implements a compilation chain that is largely
206inspired to that of CompCert~\cite{compcert1,compcert2}. Compared to work done in~\cite{easylabelling}, the main novelty and source of difficulties is due to the presence
207of function calls. Surprisingly, the addition of function calls require a
208revisitation of the proof technique given in~\cite{easylabelling}. In
209particular, at the core of the labelling approach there is a forward
210simulation proof that, in the case of \verb+While+, is only minimally
211more complex than the proof required for the preservation of the
212functional properties only. In the case of a programming language with
213function calls, instead, it turns out that the forward simulation proof for
214the back-end languages must grant a whole new set of invariants.
216In this paper we present a formalisation in the Matita interactive theorem
217prover~\cite{matita1,matita2} of a generic version of the simulation proof required for unstructured
218languages. All back-end languages of the CerCo compiler are unstructured
219languages, so the proof covers half of the correctness of the compiler.
220The statement of the generic proof is based on a new semantics
221for imperative unstructured languages that is based on \emph{structured
222traces} and that restores the preservation of structure in the observables of
223the semantics. The generic proof allows to almost completely split the
224part of the simulation that deals with functional properties only from the
225part that deals with the preservation of structure.
227The plan of this paper is the following. In Section~\ref{labelling} we
228sketch the labelling method and the problems derived from the application
229to languages with function calls. In Section~\ref{semantics} we introduce
230a generic description of an unstructured imperative language and the
231corresponding structured traces (the novel semantics). In
232Section~\ref{simulation} we describe the forward simulation proof.
233Conclusions and future works are in Section~\ref{conclusions}
235\section{The labelling approach}
237% \subsection{A brief introduction to the labelling approach}
238We briefly explain the labelling approach on the example in \autoref{examplewhile}.
239The user wants to analyse the execution time of the program (the black lines in
240\autoref{subfig:example_input}). He compiles the program using
241a special compiler that first inserts in the code three label emission
242statements (\verb+EMIT L_i+) to mark the beginning of basic blocks;
243then the compiler compiles the code to machine code (\autoref{subfig:example_oc}),
244granting that the execution of the source and object
245code emits the same sequence of labels ($L_1; L_2; L_2; L_3$ in the example).
246This is achieved by keeping track of basic blocks during compilation, avoiding
247all optimizations that alter the control flow. The latter can be recovered with
248a more refined version of the labelling approach~\cite{tranquill}, but in the
249present paper we stick to this simple variant for simplicity. Once the object
250code is produced, the compiler runs a static code analyzer to associate to
251each label $L_1, \ldots, L_3$ the cost (in clock cycles) of the instructions
252that belong to the corresponding basic block. For example, the cost $k_1$
253associated to $L_1$ is the number of cycles required to execute the block
254$I_3$ and \verb+COND l_2+, while the cost $k_2$ associated to $L_2$ counts the
255cycles required by the block $I_4$ and \verb+GOTO l_1+. The compiler also guarantees
256that every executed instruction is in the scope of some code emission label,
257that each scope does not contain loops (to associate a finite cost), and that
258both branches of a conditional statement are followed by a code emission
259statement. Under these assumptions it is true that the total execution cost
260of the program $\Delta_t$ is equal to the sum over the sequence of emitted
261labels of the cost associated to every label:
262$\Delta_t = k(L_1; L_2; L_2; L_3) = k_1 + k_2 + k_2 + k_3$.
263Finally, the compiler emits an instrumented version of the source code
264(\autoref{subfig:example_instrument}) where label emission statements are replaced
265by increments of a global variable \verb+cost+ that, before every increment, holds the
266exact number of clock cycles spent by the microprocessor so far:
267the difference $\Deltacost$ between the final and initial value of the
268internal clock is $\Deltacost = k_1 + k_2 + k_2 + k_3 = \Delta_t$. Finally, the
269user can employ any available method (e.g. Hoare logic, invariant generators,
270abstract interpretation and automated provers) to certify that $\Deltacost$
271never exceeds a certain bound~\cite{cerco}, which is now a functional property
272of the code.
276|EMIT L_1;|
278for (i=0; i<2; i++) {
279  |EMIT L_2;|
280  $I_2$;
282|EMIT L_3;|
284\caption{The input program (black lines) with its labelling (red lines).}
290     EMIT L_1
291     $I_3$
292l_1: COND l_2
293     EMIT L_2
294     $I_4$
295     GOTO l_1
296l_2: EMIT L_3
298\caption{The output labelled object code.}
304cost += $k_1$;
306for (i=0; i<2; i++) {
307cost += $k_2$;
308  $I_2$;
310cost += $k_3$;           
312\caption{The output instrumented code.}
315% \begin{verbatim}
316% EMIT L_1;                         EMIT L_1         cost += k_1;
317% I_1;                              I_3              I_1;
318% for (i=0; i<2; i++) {        l_1: COND l_2         for (i=0; i<2; i++) {
319%   EMIT L_2;                       EMIT L_2           cost += k_2;           
320%   I_2;                            I_4                I_2;               
321%  }                                GOTO l_1          }                   
322% EMIT L_3;                    l_2: EMIT L_3         cost += k_3;           
323% \end{verbatim}
324\caption{The labelling approach applied to a simple program.\label{examplewhile}. The $I_i$ are sequences of instructions not containing jumps or loops. }
327\subsection{The labelling approach in presence of calls}
329Let's now consider a simple program written in C that contains a function
330pointer call inside the scope of the cost label $L_1$, in \autoref{subfig:call_input}.
331The labelling method works exactly as before, inserting
332code emission statements/\verb+cost+ variable increments at the beginning
333of every basic block and at the beginning of every function. The compiler
334still grants that the sequence of labels observed on the two programs are
335the same. A new difficulty appears when the compiler needs to statically
336analyze the object code to assign a cost to every label. What should the scope
337of the $L_1$ label be? After executing the $I_4$ block, the \verb+CALL+
338statement passes control to a function that cannot be determined statically.
339Therefore the cost of executing the body must be paid by some other label
340(hence the requirement that every function starts with a code emission
341statement). What label should pay for the cost for the block $I_5$? The only
342reasonable answer is $L_1$, i.e. \emph{the scope of labels should extend to the
343next label emission statement or the end of the function, stepping over function calls}.
350void main() {
351  EMIT L_1;
352  $I_1$;
353  (*f)();
354  $I_2$;
356void g() {
357  EMIT L_2;
358  $I_3$;
361\caption{The input labelled C program.}
369  EMIT L_1
370  $I_4$
371  CALL
372  $I_5$
375  EMIT L_2
376  $I_6$
379\caption{The output labelled object code.}
383\caption{An example compilation of a simple program with a function pointer
384         call.}
388The latter definition of scope is adeguate on the source level because
389C is a structured language that guarantees that every function call, if it
390returns, passes control to the first instruction that follows the call. However,
391this is not guaranteed for object code, the backend languages of a compiler
392and, more generally, for unstructured
393languages that use a writable control stack to store the return address of
394calls. For example, $I_6$ could increment by $1$ the return address on the
395stack so that the next \verb+RETURN+ would start at the second instruction
396of $I_5$. The compiler would still be perfectly correct if a random, dead
397code instruction was also added just after each \verb+CALL+. More generally,
398\emph{there is no guarantee that a correct compiler that respects the functional
399behaviour of a program also respects the calling structure of the source code}.
400Without such an assumption, however, it may not be true that the execution cost
401of the program is the sum of the costs associated to the labels emitted. In our
402example, the cost of $I_5$ is paid by $L_1$, but in place of $I_5$ the processor could execute any other code after $g$ returns.
404Obviously, any reasonably written compiler produces object code that behaves
405as if the language was structured (i.e. by properly nesting function
406calls/returns and without tampering with the return addresses on the control
407stack). This property, however, is a property of the runs of object code
408programs, and not a property of the object code that can be easily statically
409verified (as the ones we required for the basic labelling method).
410Therefore, we now need to single out those runs whose cost behaviour can be
411statically predicted, and we need to prove that every run of programs generated
412by our compiler are of that type. We call them \emph{structured} since their
413main property is to respect properties that hold for free on the source code
414because of structure. Moreover, in order to avoid proving
415too many preservation properties of our compiler, we drop the original
416requirements on the object code (all instructons must be in scope of some labels,
417no loops inside a scope, etc.) in favour of the corresponding requirement
418for structured runs (a structured run must start with a label emission, no
419instruction can be executed twice between two emissions, etc.).
421We will therefore proceed as follows. In the following section
4221) we formally introduce the notion of
423structured trace, which captures structured runs in the style of labelled
424transition systems; 2) we show that on the object code we can correctly
425compute the execution time of a structured run from the sequence of labels
426observed; 3) we give unstructured languages a semantics in terms of structured
427traces; 4) we show that on the source code we can correctly compute the
428execution time of a program if the compiler produces object code whose
429runs are weakly similar to the source code runs.
431The notion of weak bisimulation for structured traces is a global property
432which is hard to prove formally and much more demanding than the simple forward
433simulation required for proofs of preservation of functional properties.
434Therefore in \autoref{simulation} we will present a set of local simulation
435conditions that refine the corresponding conditions for forward simulation and
436that are sufficient to grant the production of weakly similar traces.
438All the definitions and theorems presented in the paper have been formalized
439in the interactive theorem prover Matita and are being used to certify
440the complexity preserving compiler developed in the CerCo project~\cite{cerco}.
441The formalization can be
442found at~\ref{YYY} and it heavily relies on algebraic and dependent types for
443both structured traces and the definition of weak similarity. In the paper
444we did not try to stay close to the formalization. On the contrary,
445the definitions given in the paper are the result of a significant
446simplification effort for
447the sake of presentation and to make easier the re-implementation of the
448concepts in a proof assistant which is not based on the Calculus of Inductive
449Constructions. In any case the formalization is heavily commented to allow the
450reader to understand the technical details of the formalization.
453% @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
455% We briefly sketch here a simplified version of the labelling approach as
456% introduced in~\cite{easylabelling}. The simplification strengthens the
457% sufficient conditions given in~\cite{easylabelling} to allow a simpler
458% explanation. The simplified conditions given here are also used in the
459% CerCo compiler to simplify the proof.
461% Let $\mathcal{P}$ be a programming language whose semantics is given in
462% terms of observables: a run of a program yields a finite or infinite
463% stream of observables. We also assume for the time being that function
464% calls are not available in $\mathcal{P}$. We want to associate a cost
465% model to a program $P$ written in $\mathcal{P}$. The first step is to
466% extend the syntax of $\mathcal{P}$ with a new construct $\verb+emit L+$
467% where $L$ is a label distinct from all observables of $\mathcal{P}$.
468% The semantics of $\verb+emit L+$ is the emission of the observable
469% \verb+L+ that is meant to signal the beginning of a basic block.
471% There exists an automatic procedure that injects into the program $P$ an
472% $\verb+emit L+$ at the beginning of each basic block, using a fresh
473% \verb+L+ for each block. In particular, the bodies of loops, both branches
474% of \verb+if-then-else+s and the targets of \verb+goto+s must all start
475% with an emission statement.
477% Let now $C$ be a compiler from $\mathcal{P}$ to the object code $\mathcal{M}$,
478% that is organised in passes. Let $\mathcal{Q}_i$ be the $i$-th intermediate
479% language used by the compiler. We can easily extend every
480% intermediate language (and its semantics) with an $\verb+emit L+$ statement
481% as we did for $\mathcal{P}$. The same is possible for $\mathcal{M}$ too, with
482% the additional difficulty that the syntax of object code is given as a
483% sequence of bytes. The injection of an emission statement in the object code
484% can be done using a map that maps two consecutive code addresses with the
485% statement. The intended semantics is that, if $(pc_1,pc_2) \mapsto \verb+emit L+$ then the observable \verb+L+ is emitted after the execution of the
486% instruction stored at $pc_1$ and before the execution of the instruction
487% stored at $pc_2$. The two program counters are necessary because the
488% instruction stored at $pc_1$ can have multiple possible successors (e.g.
489% in case of a conditional branch or an indirect call). Dually, the instruction
490% stored at $pc_2$ can have multiple possible predecessors (e.g. if it is the
491% target of a jump).
493% The compiler, to be functionally correct, must preserve the observational
494% equivalence, i.e. executing the program after each compiler pass should
495% yield the same stream of observables. After the injection of emission
496% statements, observables now capture both functional and non-functional
497% behaviours.
498% This correctness property is called in the literature a forward simulation
499% and is sufficient for correctness when the target language is
500% deterministic~\cite{compcert3}.
501% We also require a stronger, non-functional preservation property: after each
502% pass all basic blocks must start with an emission statement, and all labels
503% \verb+L+ must be unique.
505% Now let $M$ be the object code obtained for the program $P$. Let us suppose
506% that we can statically inspect the code $M$ and associate to each basic block
507% a cost (e.g. the number of clock cycles required to execute all instructions
508% in the basic block, or an upper bound to that time). Every basic block is
509% labelled with an unique label \verb+L+, thus we can actually associate the
510% cost to \verb+L+. Let call it $k(\verb+L+)$.
512% The function $k$ is defined as the cost model for the object code control
513% blocks. It can be equally used as well as the cost model for the source
514% control blocks. Indeed, if the semantics of $P$ is the stream
515% $L_1 L_2 \ldots$, then, because of forward simulation, the semantics of $M$ is
516% also $L_1 L_2 \ldots$ and its actual execution cost is $\Sigma_i k(L_i)$ because
517% every instruction belongs to a control block and every control block is
518% labelled. Thus it is correct to say that the execution cost of $P$ is also
519% $\Sigma_i k(L_i)$. In other words, we have obtained a cost model $k$ for
520% the blocks of the high level program $P$ that is preserved by compilation.
522% How can the user profit from the high level cost model? Suppose, for instance,
523% that he wants to prove that the WCET of his program is bounded by $c$. It
524% is sufficient for him to prove that $\Sigma_i k(L_i) \leq c$, which is now
525% a purely functional property of the code. He can therefore use any technique
526% available to certify functional properties of the source code.
527% What is suggested in~\cite{easylabelling} is to actually instrument the
528% source code $P$ by replacing every label emission statement
529% $\verb+emit L+$ with the instruction $\verb+cost += k(L)+$ that increments
530% a global fresh variable \verb+cost+. The bound is now proved by establishing
531% the program invariant $\verb+cost+ \leq c$, which can be done for example
532% using the Frama-C~\cite{framaC} suite if the source code is some variant of
533% C.
535% In order to extend the labelling approach to function calls we make
536% \verb+CALL f+ emit the observable \verb+f+ and \verb+RET+ emit a distinguished observable
537% \verb+ret+.
539% For example the following execution history of the program in \autoref{fig:esempio}
540% $$I_1; \verb+CALL f+; \verb+COND l+; \verb+EMIT $\ell_2$+; I_3; \verb+RET+; I_2; \verb+RET+$$
541% emits the trace
542% $$\verb+main+, \verb+f+$$
543% \begin{figure}
544% \hfil
545% \begin{minipage}{.2\linewidth}
546% \begin{lstlisting}
547% main: $\!I_1$
548%       CALL f
549%       $I_2$
550%       RET
551% \end{lstlisting}
552% \end{minipage}
553% \begin{minipage}{.1\linewidth}
554% \begin{lstlisting}
555% main
556% main
557% main
558% main
559% \end{lstlisting}
560% \end{minipage}
561% \hfil
562% \begin{minipage}{.2\linewidth}
563% \begin{lstlisting}
564% f: $\!$COND l
565%    EMIT $\ell_2$
566%    RET
567% l: $\!$EMIT $\ell_3$
568%    $I_3$
569%    RET
570% \end{lstlisting}
571% \end{minipage}
572% \begin{minipage}{.1\linewidth}
573% \begin{lstlisting}
574% f
576% $\ell_2$
578% $\ell_3$
579% $\ell_3$
580% \end{lstlisting}
581% \end{minipage}
582% \hfil{}
583% \caption{}
584% \label{fig:esempio}
585% \end{figure}
588% \subsection{Labelling function calls}
589% We now want to extend the labelling approach to support function calls.
590% On the high level, \emph{structured} programming language $\mathcal{P}$ there
591% is not much to change.
592% When a function is invoked, the current basic block is temporarily exited
593% and the basic block the function starts with take control. When the function
594% returns, the execution of the original basic block is resumed. Thus the only
595% significant change is that basic blocks can now be nested. Let \verb+E+
596% be the label of the external block and \verb+I+ the label of a nested one.
597% Since the external starts before the internal, the semantics observed will be
598% \verb+E I+ and the cost associated to it on the source language will be
599% $k(\verb+E+) + k(\verb+I+)$, i.e. the cost of executing all instructions
600% in the block \verb+E+ first plus the cost of executing all the instructions in
601% the block \verb+I+. However, we know that some instructions in \verb+E+ are
602% executed after the last instruction in \verb+I+. This is actually irrelevant
603% because we are here assuming that costs are additive, so that we can freely
604% permute them\footnote{The additivity assumption fails on modern processors that have stateful subsystems, like caches and pipelines. The extension of the labelling approach to those systems is therefore non trivial and under development in the CerCo project.}. Note that, in the present discussion, we are assuming that
605% the function call terminates and yields back control to the basic block
606% \verb+E+. If the call diverges, the instrumentation
607% $\verb+cost += k(E)+$ executed at the beginning of \verb+E+ is still valid,
608% but just as an upper bound to the real execution cost: only precision is lost.
610% Let now consider what happens when we move down the compilation chain to an
611% unstructured intermediate or final language. Here unstructured means that
612% the only control operators are conditional and unconditional jumps, function
613% calls and returns. Unlike a structured language, though, there is no guarantee
614% that a function will return control just after the function call point.
615% The semantics of the return statement, indeed, consists in fetching the
616% return address from some internal structure (typically the control stack) and
617% jumping directly to it. The code can freely manipulate the control stack to
618% make the procedure returns to whatever position. Indeed, it is also possible
619% to break the well nesting of function calls/returns.
621% Is it the case that the code produced by a correct compiler must respect the
622% additional property that every function returns just after its function call
623% point? The answer is negative and the property is not implied by forward
624% simulation proofs. For instance, imagine to modify a correct compiler pass
625% by systematically adding one to the return address on the stack and by
626% putting a \verb+NOP+ (or any other instruction that takes one byte) after
627% every function call. The obtained code will be functionally indistinguishable,
628% and the added instructions will all be dead code.
630% This lack of structure in the semantics badly interferes with the labelling
631% approach. The reason is the following: when a basic block labelled with
632% \verb+E+ contains a function call, it no longer makes any sense to associate
633% to a label \verb+E+ the sum of the costs of all the instructions in the block.
634% Indeed, there is no guarantee that the function will return into the block and
635% that the instructions that will be executed after the return will be the ones
636% we are paying for in the cost model.
638% How can we make the labelling approach work in this scenario? We only see two
639% possible ways. The first one consists in injecting an emission statement after
640% every function call: basic blocks no longer contain function calls, but are now
641% terminated by them. This completely solves the problem and allows the compiler
642% to break the structure of function calls/returns at will. However, the
643% technique has several drawbacks. First of all, it greatly augments the number
644% of cost labels that are injected in the source code and that become
645% instrumentation statements. Thus, when reasoning on the source code to prove
646% non-functional properties, the user (or the automation tool) will have to handle
647% larger expressions. Second, the more labels are emitted, the more difficult it
648% becomes to implement powerful optimisations respecting the code structure.
649% Indeed, function calls are usually implemented in such a way that most registers
650% are preserved by the call, so that the static analysis of the block is not
651% interrupted by the call and an optimisation can involve both the code before
652% and after the function call. Third, instrumenting the source code may require
653% unpleasant modification of it. Take, for example, the code
654% \verb+f(g(x));+. We need to inject an emission statement/instrumentation
655% instruction just after the execution of \verb+g+. The only way to do that
656% is to rewrite the code as \verb+y = g(x); emit L; f(y);+ for some fresh
657% variable \verb+y+. It is pretty clear how in certain situations the obtained
658% code would be more obfuscated and then more difficult to manually reason on.
660% For the previous reasons, in this paper and in the CerCo project we adopt a
661% different approach. We do not inject emission statements after every
662% function call. However, we want to propagate a strong additional invariant in
663% the forward simulation proof. The invariant is the propagation of the structure
664%  of the original high level code, even if the target language is unstructured.
665% The structure we want to propagate, that will become more clear in the next
666% section, comprises 1) the property that every function should return just after
667% the function call point, which in turns imply well nesting of function calls;
668% 2) the property that every basic block starts with a code emission statement.
670% In the original labelling approach of~\cite{easylabelling}, the second property
671% was granted syntactically as a property of the generated code.
672% In our revised approach, instead, we will impose the property on the runs:
673% it will be possible to generate code that does not respect the syntactic
674% property, as soon as all possible runs respect it. For instance, dead code will no longer
675% be required to have all basic blocks correctly la. The switch is suggested
676% from the fact that the first of the two properties --- that related to
677% function calls/returns --- can only be defined as property of runs,
678% not of the static code. The switch is
679% beneficial to the proof because the original proof was made of two parts:
680% the forward simulation proof and the proof that the static property was granted.
681% In our revised approach the latter disappears and only the forward simulation
682% is kept.
684% In order to capture the structure semantics so that it is preserved
685% by a forward simulation argument, we need to make the structure observable
686% in the semantics. This is the topic of the next section.
688\section{Structured traces}
691Let's consider a generic unstructured language already equipped with a
692small step structured operational semantics (SOS). We introduce a
693deterministic labelled transition system~\cite{LTS} $(S,s_{\mathrm{init}},\Lambda,\to)$
694that refines the
695SOS by observing function calls and the beginning of basic blocks.
696$S$ is the set of states of the program, $s_\mathrm{init}$ the initial state and
697$\Lambda = \{ \tau, RET \} \cup \Labels \cup \Functions$
698where $\Functions$ is the set of names of functions that can occur in the
699program, $\Labels$ is a set of labels disjoint from $\Functions$
700and $\tau$ and $RET$ do not belong to $\Functions \cup \Labels$. Moreover there
701is an injective function $\ell : \Functions \to \Labels$ that tells the
702starting label of the body of each function, and $\ell(\Functions)\subseteq \Labels$
703denotes the image of this function.
704The transition function is defined as $s_1 \to[o] s_2$ if
705$s_1$ moves to $s_2$ according to the SOS; moreover $o = f \in \Functions$ if
706the function $f$ is called, $o = RET$ if a \verb+RETURN+ is executed,
707$o = L \in \Labels$ if an \verb+EMIT $L$+ is executed to signal the
708beginning of a basic block, and $o = \tau$ in all other cases.
709Because we assume the language to be deterministic, the label emitted can
710actually be computed simply observing $s_1$. Finally, $S$ is also endowed with
711a relation $s\ar s'$ ($s'$ \emph{follows} $s$) when the instruction to be executed
712in $s'$ is just after the one in $s$.
714Among all possible finite execution fragments we want to
715identify the ones that satisfy the requirements we sketched in the previous
716section. We say that an execution fragment
717$s_0 \to[o_0] s_1 \to[o_1] \ldots \to[o_n] s_n$
718is \emph{structured} (marking it as $s_0 \To s_n$) iff the following conditions
719are met.
721 \item For every $i$, if $s_i \to[f] s_{i+1}$ then there is a
722   label $L$ and a $k\ge i+2$ such that
723    $s_{i+1} \to[\ell(f)] s_{i+2} \To s_k \to[RET] s_{k+1}$, with
724    $s_i \ar s_{k+1}$.
725   In other words, $s_{i+1}$ must start execution with \verb+EMIT $\ell(f)$+
726   and then continue with a structured fragment returning control
727   to the instruction immediately following the call.
728   This captures the requirements that the body of function calls always start
729   with a label emission statement, and every function
730   call must converge yielding back control just after it.
731 \item For every $i$ and $f$, if $s_{i+1}\to[\ell(f)]s_{i+2}$ then
732   $s_i\to[f]s_{i+1}$. This is a technical condition needed to ensure that
733   labels associated with functions always follow a call.
734 \item The number of $RET$'s in the fragment is equal to the number of
735   calls, i.e.\ the number of observables in $\Functions$. This, together with
736   the above condition, captures the well-bracketing of the fragment with
737   respect to function calls.
738 \item For every $i$, if the instruction to be executed in $s_i$ is a
739   conditional branch, then there is an $L$ such that $s_{i+1} \to[L] s_{i+2}$ or, equivalently, that $s_{i+1}$ must start execution with an
740   \verb+EMIT $L$+. This captures the requirement that every branch which is
741   live code must start with a label emission.
743One might wonder why $f$ and $\ell(f)$, that aways appear in this order, are not
744collapsed into a single observable. This would indeed simplify some aspects of
745the formalisation, but has the problem of misassagning the cost of calls, which would
746fall under the associated label. As different call instructions with different costs
747are possible, this is not acceptable.
749Let $T = s_0 \to[o_0] s_1 \ldots \to[o_n] s_{n+1}$ be an
750execution fragment. The \emph{weak trace} $|T|$ associated to $T$ is the
751subsequence $o_{i_0} \ldots o_{i_m}$ of $o_0 \ldots o_n$ obtained dropping
752every internal action $\tau$.
754Let $k$ be a cost model that maps observables actions to any commutative cost
755monoid (e.g. natural numbers). We extend the domain of $k$ to fragments
756by posing $k(T) = \Sigma_{o \in |T|} k(o)$.
758The labelling approach is based on the idea that the execution cost of
759certain execution fragments, that we call \emph{measurable fragments}, can be
760computed from their weak trace by choosing a $k$ that assigns to any label
761the cost of the instructions in its scope.
762A structured fragment
763$T = s_0 \To s_n$ is
764measurable if it does not start or end in the middle of a basic block.
765Ending in the middle of a block would mean having pre-paid more instructions
766than the ones executed, and starting in the middle would mean not paying any
767instruction up to the first label emission.
768Formally we require $o_0 \in \Labels$ (or equivalently
769in $s_0$ the program must emit a label) and either
770$s_{n-1}\to[RET]s_n$ or $s_n$ must be a label emission statement
771(i.e.\ $s_n \to[L] s_{n+1}$).
773Given two deterministic unstructured programming languages with their own
774operational semantics, we say that a state $s_2$ of the second language
775(weakly) simulates the state $s_1$ of the first iff the two unique weak traces
776that originate from them are equal. If $s_1$ also (weakly) simulates $s_2$,
777then the two states are weakly trace equivalent.
779 or, equivalently
780because of determinism, that they are weakly bisimilar.
781% @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
783% The program semantics adopted in the traditional labelling approach is based
784% on labelled deductive systems. Given a set of observables $\mathcal{O}$ and
785% a set of states $\S$, the semantics of one deterministic execution
786% step is
787% defined as a function $S \to S \times O^*$ where $O^*$ is a (finite) stream of
788% observables. The semantics is then lifted compositionally to multiple (finite
789% or infinite) execution steps.
790% Finally, the semantics of a a whole program execution is obtained by forgetting
791% about the final state (if any), yielding a function $S \to O^*$ that given an
792% initial status returns the finite or infinite stream of observables in output.
794% We present here a new definition of semantics where the structure of execution,
795% as defined in the previous section, is now observable. The idea is to replace
796% the stream of observables with a structured data type that makes explicit
797% function call and returns and that grants some additional invariants by
798% construction. The data structure, called \emph{structured traces}, is
799% defined inductively for terminating programs and coinductively for diverging
800% ones. In the paper we focus only on the inductive structure, i.e. we assume
801% that all programs that are given a semantics are total. The Matita formalisation
802% also shows the coinductive definitions. The semantics of a program is then
803% defined as a function that maps an initial state into a structured trace.
805% In order to have a definition that works on multiple intermediate languages,
806% we abstract the type of structure traces over an abstract data type of
807% abstract statuses, which we aptly call $\verb+abstract_status+$. The fields
808% of this record are the following.
809% \begin{itemize}
810%  \item \verb+S : Type[0]+, the type of states.
811%  \item \verb+as_execute : S $\to$ S $\to$ Prop+, a binary predicate stating
812%  an execution step. We write $s_1\exec s_2$ for $\verb+as_execute+~s_1~s_2$.
813%  \item \verb+as_classifier : S $\to$ classification+, a function tagging all
814%  states with a class in
815%  $\{\verb+cl_return,cl_jump,cl_call,cl_other+\}$, depending on the instruction
816%  that is about to be executed (we omit tail-calls for simplicity). We will
817%  use $s \class c$ as a shorthand for both $\verb+as_classifier+~s=c$
818%  (if $c$ is a classification) and $\verb+as_classifier+~s\in c$
819%  (if $c$ is a set of classifications).
820%  \item \verb+as_label : S $\to$ option label+, telling whether the
821%  next instruction to be executed in $s$ is a cost emission statement,
822%  and if yes returning the associated cost label. Our shorthand for this function
823%  will be $\ell$, and we will also abuse the notation by using $\ell~s$ as a
824%  predicate stating that $s$ is labelled.
825%  \item \verb+as_call_ident : ($\Sigma$s:S. s $\class$ cl_call) $\to$ label+,
826%  telling the identifier of the function which is being called in a
827%  \verb+cl_call+ state. We will use the shorthand $s\uparrow f$ for
828%  $\verb+as_call_ident+~s = f$.
829%  \item \verb+as_after_return : ($\Sigma$s:S. s $\class$ cl_call) $\to$ S $\to$ Prop+,
830%  which holds on the \verb+cl_call+ state $s_1$ and a state $s_2$ when the
831%  instruction to be executed in $s_2$ follows the function call to be
832%  executed in (the witness of the $\Sigma$-type) $s_1$. We will use the notation
833%  $s_1\ar s_2$ for this relation.
834% \end{itemize}
836% % \begin{alltt}
837% % record abstract_status := \{ S: Type[0];
838% %  as_execute: S \(\to\) S \(\to\) Prop;   as_classifier: S \(\to\) classification;
839% %  as_label: S \(\to\) option label;    as_called: (\(\Sigma\)s:S. c s = cl_call) \(\to\) label;
840% %  as_after_return: (\(\Sigma\)s:S. c s = cl_call) \(\to\) S \(\to\) Prop \}
841% % \end{alltt}
843% The inductive type for structured traces is actually made by three multiple
844% inductive types with the following semantics:
845% \begin{enumerate}
846%  \item $(\verb+trace_label_return+~s_1~s_2)$ (shorthand $\verb+TLR+~s_1~s_2$)
847%    is a trace that begins in
848%    the state $s_1$ (included) and ends just before the state $s_2$ (excluded)
849%    such that the instruction to be executed in $s_1$ is a label emission
850%    statement and the one to be executed in the state before $s_2$ is a return
851%    statement. Thus $s_2$ is the state after the return. The trace
852%    may contain other label emission statements. It captures the structure of
853%    the execution of function bodies: they must start with a cost emission
854%    statement and must end with a return; they are obtained by concatenating
855%    one or more basic blocks, all starting with a label emission
856%    (e.g. in case of loops).
857%  \item $(\verb+trace_any_label+~b~s_1~s_2)$ (shorthand $\verb+TAL+~b~s_1~s_2$)
858%    is a trace that begins in
859%    the state $s_1$ (included) and ends just before the state $s_2$ (excluded)
860%    such that the instruction to be executed in $s_2$/in the state before
861%    $s_2$ is either a label emission statement or
862%    or a return, according to the boolean $b$. It must not contain
863%    any label emission statement. It captures the notion of a suffix of a
864%    basic block.
865%  \item $(\verb+trace_label_label+~b~s_1~s_2)$ (shorthand $\verb+TLL+~b~s_1~s_2$ is the special case of
866%    $\verb+TAL+~b~s_1~s_2)$ such that the instruction to be
867%    executed in $s_1$ is a label emission statement. It captures the notion of
868%    a basic block.
869% \end{enumerate}
871% \begin{multicols}{3}
872% \infrule[\verb+tlr_base+]
873%  {\verb+TLL+~true~s_1~s_2}
874%  {\verb+TLR+~s_1~s_2}
876% \infrule[\verb+tlr_step+]
877%  {\verb+TLL+~false~s_1~s_2 \andalso
878%   \verb+TLR+~s_2~s_3
879%  }
880%  {\verb+TLR+~s_1~s_3}
882% \infrule[\verb+tll_base+]
883%  {\verb+TAL+~b~s_1~s_2 \andalso
884%   \ell~s_1
885%  }
886%  {\verb+TLL+~b~s_1~s_2}
887% \end{multicols}
889% \infrule[\verb+tal_base_not_return+]
890%  {s_1\exec s_2 \andalso
891%   s_1\class\{\verb+cl_jump+, \verb+cl_other+\}\andalso
892%   \ell~s_2
893%  }
894%  {\verb+TAL+~false~s_1~s_2}
896% \infrule[\verb+tal_base_return+]
897%  {s_1\exec s_2 \andalso
898%   s_1 \class \verb+cl_return+
899%  }
900%  {\verb+TAL+~true~s_1~s_2}
902% \infrule[\verb+tal_base_call+]
903%  {s_1\exec s_2 \andalso
904%   s_1 \class \verb+cl_call+ \andalso
905%   s_1\ar s_3 \andalso
906%   \verb+TLR+~s_2~s_3 \andalso
907%   \ell~s_3
908%  }
909%  {\verb+TAL+~false~s_1~s_3}
911% \infrule[\verb+tal_step_call+]
912%  {s_1\exec s_2 \andalso
913%   s_1 \class \verb+cl_call+ \andalso
914%   s_1\ar s_3 \andalso
915%   \verb+TLR+~s_2~s_3 \andalso
916%   \verb+TAL+~b~s_3~s_4
917%  }
918%  {\verb+TAL+~b~s_1~s_4}
920% \infrule[\verb+tal_step_default+]
921%  {s_1\exec s_2 \andalso
922%   \lnot \ell~s_2 \andalso
923%   \verb+TAL+~b~s_2~s_3\andalso
924%   s_1 \class \verb+cl_other+
925%  }
926%  {\verb+TAL+~b~s_1~s_3}
927% \begin{comment}
928% \begin{verbatim}
929% inductive trace_label_return (S:abstract_status) : S → S → Type[0] ≝
930%   | tlr_base:
931%       ∀status_before: S.
932%       ∀status_after: S.
933%         trace_label_label S ends_with_ret status_before status_after →
934%         trace_label_return S status_before status_after
935%   | tlr_step:
936%       ∀status_initial: S.
937%       ∀status_labelled: S.
938%       ∀status_final: S.
939%         trace_label_label S doesnt_end_with_ret status_initial status_labelled →
940%         trace_label_return S status_labelled status_final →
941%           trace_label_return S status_initial status_final
942% with trace_label_label: trace_ends_with_ret → S → S → Type[0] ≝
943%   | tll_base:
944%       ∀ends_flag: trace_ends_with_ret.
945%       ∀start_status: S.
946%       ∀end_status: S.
947%         trace_any_label S ends_flag start_status end_status →
948%         as_costed S start_status →
949%           trace_label_label S ends_flag start_status end_status
950% with trace_any_label: trace_ends_with_ret → S → S → Type[0] ≝
951%   (* Single steps within a function which reach a label.
952%      Note that this is the only case applicable for a jump. *)
953%   | tal_base_not_return:
954%       ∀start_status: S.
955%       ∀final_status: S.
956%         as_execute S start_status final_status →
957%         (as_classifier S start_status cl_jump ∨
958%          as_classifier S start_status cl_other) →
959%         as_costed S final_status →
960%           trace_any_label S doesnt_end_with_ret start_status final_status
961%   | tal_base_return:
962%       ∀start_status: S.
963%       ∀final_status: S.
964%         as_execute S start_status final_status →
965%         as_classifier S start_status cl_return →
966%           trace_any_label S ends_with_ret start_status final_status
967%   (* A call followed by a label on return. *)
968%   | tal_base_call:
969%       ∀status_pre_fun_call: S.
970%       ∀status_start_fun_call: S.
971%       ∀status_final: S.
972%         as_execute S status_pre_fun_call status_start_fun_call →
973%         ∀H:as_classifier S status_pre_fun_call cl_call.
974%           as_after_return S «status_pre_fun_call, H» status_final →
975%           trace_label_return S status_start_fun_call status_final →
976%           as_costed S status_final →
977%             trace_any_label S doesnt_end_with_ret status_pre_fun_call status_final
978%   (* A call followed by a non-empty trace. *)
979%   | tal_step_call:
980%       ∀end_flag: trace_ends_with_ret.
981%       ∀status_pre_fun_call: S.
982%       ∀status_start_fun_call: S.
983%       ∀status_after_fun_call: S.
984%       ∀status_final: S.
985%         as_execute S status_pre_fun_call status_start_fun_call →
986%         ∀H:as_classifier S status_pre_fun_call cl_call.
987%           as_after_return S «status_pre_fun_call, H» status_after_fun_call →
988%           trace_label_return S status_start_fun_call status_after_fun_call →
989%           ¬ as_costed S status_after_fun_call →
990%           trace_any_label S end_flag status_after_fun_call status_final →
991%             trace_any_label S end_flag status_pre_fun_call status_final
992%   | tal_step_default:
993%       ∀end_flag: trace_ends_with_ret.
994%       ∀status_pre: S.
995%       ∀status_init: S.
996%       ∀status_end: S.
997%         as_execute S status_pre status_init →
998%         trace_any_label S end_flag status_init status_end →
999%         as_classifier S status_pre cl_other →
1000%         ¬ (as_costed S status_init) →
1001%           trace_any_label S end_flag status_pre status_end.
1002% \end{verbatim}
1003% \end{comment}
1004% A \verb+trace_label_return+ is isomorphic to a list of
1005% \verb+trace_label_label+s that ends with a cost emission followed by a
1006% return terminated \verb+trace_label_label+.
1007% The interesting cases are those of $\verb+trace_any_label+~b~s_1~s_2$.
1008% A \verb+trace_any_label+ is a sequence of steps built by a syntax directed
1009% definition on the classification of $s_1$. The constructors of the datatype
1010% impose several invariants that are meant to impose a structure to the
1011% otherwise unstructured execution. In particular, the following invariants are
1012% imposed:
1013% \begin{enumerate}
1014%  \item the trace is never empty; it ends with a return iff $b$ is
1015%        true
1016%  \item a jump must always be the last instruction of the trace, and it must
1017%        be followed by a cost emission statement; i.e. the target of a jump
1018%        is always the beginning of a new basic block; as such it must start
1019%        with a cost emission statement
1020%  \item a cost emission statement can never occur inside the trace, only in
1021%        the status immediately after
1022%  \item the trace for a function call step is made of a subtrace for the
1023%        function body of type
1024%        $\verb+trace_label_return+~s_1~s_2$, possibly followed by the
1025%        rest of the trace for this basic block. The subtrace represents the
1026%        function execution. Being an inductive datum, it grants totality of
1027%        the function call. The status $s_2$ is the one that follows the return
1028%        statement. The next instruction of $s_2$ must follow the function call
1029%        instruction. As a consequence, function calls are also well nested.
1030% \end{enumerate}
1032% There are three mutual structural recursive functions, one for each of
1033% \verb+TLR+, \verb+TLL+ and \verb+TAL+, for which we use the same notation
1034% $|\,.\,|$: the \emph{flattening} of the traces. These functions
1035% allow to extract from a structured trace the list of emitted cost labels.
1036% %  We only show here the type of one
1037% % of them:
1038% % \begin{alltt}
1039% % flatten_trace_label_return:
1040% %  \(\forall\)S: abstract_status. \(\forall\)\(s_1,s_2\).
1041% %   trace_label_return \(s_1\) \(s_2\) \(\to\) list (as_cost_label S)
1042% % \end{alltt}
1044% \paragraph{Cost prediction on structured traces.}
1046% The first main theorem of CerCo about traces
1047% (theorem \verb+compute_max_trace_label_return_cost_ok_with_trace+)
1048% holds for the
1049% instantiation
1050% of the structured traces to the concrete status of object code programs.
1051% Simplifying a bit, it states that
1052% \begin{equation}\label{th1}
1053% \begin{array}{l}\forall s_1,s_2. \forall \tau: \verb+TLR+~s_1~s_2.~
1054%   \verb+clock+~s_2 = \verb+clock+~s_1 +
1055%   \Sigma_{\alpha \in |\tau|}\;k(\alpha)
1056% \end{array}
1057% \end{equation}
1058% where the cost model $k$ is statically computed from the object code
1059% by associating to each label $\alpha$ the sum of the cost of the instructions
1060% in the basic block that starts at $\alpha$ and ends before the next labelled
1061% instruction. The theorem is proved by structural induction over the structured
1062% trace, and is based on the invariant that
1063% iff the function that computes the cost model has analysed the instruction
1064% to be executed at $s_2$ after the one to be executed at $s_1$, and if
1065% the structured trace starts with $s_1$, then eventually it will contain also
1066% $s_2$. When $s_1$ is not a function call, the result holds trivially because
1067% of the $s_1\exec s_2$ condition obtained by inversion on
1068% the trace. The only non
1069% trivial case is the one of function calls: the cost model computation function
1070% does recursion on the first instruction that follows that function call; the
1071% \verb+as_after_return+ condition of the \verb+tal_base_call+ and
1072% \verb+tal_step_call+ grants exactly that the execution will eventually reach
1073% this state.
1075% \paragraph{Structured traces similarity and cost prediction invariance.}
1077% A compiler pass maps source to object code and initial states to initial
1078% states. The source code and initial state uniquely determine the structured
1079% trace of a program, if it exists. The structured trace fails to exists iff
1080% the structural conditions are violated by the program execution (e.g. a function
1081% body does not start with a cost emission statement). Let us assume that the
1082% target structured trace exists.
1084% What is the relation between the source and target structured traces?
1085% In general, the two traces can be arbitrarily different. However, we are
1086% interested only in those compiler passes that maps a trace $\tau_1$ to a trace
1087% $\tau_2$ such that
1088% \begin{equation}|\tau_1| = |\tau_2|.\label{th2}\end{equation}
1089% The reason is that the combination of~\eqref{th1} with~\eqref{th2} yields the
1090% corollary
1091% \begin{equation}\label{th3}
1092% \forall s_1,s_2. \forall \tau: \verb+TLR+~s_1~s_2.~
1093%   \verb+clock+~s_2 - \verb+clock+~s_1 =
1094%   \Sigma_{\alpha \in |\tau_1|}\;k(\alpha) =
1095%   \Sigma_{\alpha \in |\tau_2|}\;k(\alpha).
1096% \end{equation}
1097% This corollary states that the actual execution time of the program can be computed equally well on the source or target language. Thus it becomes possible to
1098% transfer the cost model from the target to the source code and reason on the
1099% source code only.
1101% We are therefore interested in conditions stronger than~\eqref{th2}.
1102% Therefore we introduce here a similarity relation between traces with
1103% the same structure. Theorem~\verb+tlr_rel_to_traces_same_flatten+
1104% in the Matita formalisation shows that~\eqref{th2} holds for every pair
1105% $(\tau_1,\tau_2)$ of similar traces.
1107% Intuitively, two traces are similar when one can be obtained from
1108% the other by erasing or inserting silent steps, i.e. states that are
1109% not \verb+as_costed+ and that are classified as \verb+cl_other+.
1110% Silent steps do not alter the structure of the traces.
1111% In particular,
1112% the relation maps function calls to function calls to the same function,
1113% label emission statements to emissions of the same label, concatenation of
1114% subtraces to concatenation of subtraces of the same length and starting with
1115% the same emission statement, etc.
1117% In the formalisation the three similarity relations --- one for each trace
1118% kind --- are defined by structural recursion on the first trace and pattern
1119% matching over the second. Here we turn
1120% the definition into the inference rules shown in \autoref{fig:txx_rel}
1121% for the sake of readability. We also omit from trace constructors all arguments,
1122% but those that are traces or that
1123% are used in the premises of the rules. By abuse of notation we denote all three
1124% relations by infixing $\approx$.
1126% \begin{figure}
1127% \begin{multicols}{2}
1128% \infrule
1129%  {tll_1\approx tll_2
1130%  }
1131%  {\verb+tlr_base+~tll_1 \approx \verb+tlr_base+~tll_2}
1133% \infrule
1134%  {tll_1 \approx tll_2 \andalso
1135%   tlr_1 \approx tlr_2
1136%  }
1137%  {\verb+tlr_step+~tll_1~tlr_1 \approx \verb+tlr_step+~tll_2~tlr_2}
1138% \end{multicols}
1139% \vspace{3ex}
1140% \begin{multicols}{2}
1141% \infrule
1142%  {\ell~s_1 = \ell~s_2 \andalso
1143%   tal_1\approx tal_2
1144%  }
1145%  {\verb+tll_base+~s_1~tal_1 \approx \verb+tll_base+~s_2~tal_2}
1147% \infrule
1148%  {tal_1\approx tal_2
1149%  }
1150%  {\verb+tal_step_default+~tal_1 \approx tal_2}
1151% \end{multicols}
1152% \vspace{3ex}
1153% \infrule
1154%  {}
1155%  {\verb+tal_base_not_return+\approx taa \append \verb+tal_base_not_return+}
1156% \vspace{1ex}
1157% \infrule
1158%  {}
1159%  {\verb+tal_base_return+\approx taa \append \verb+tal_base_return+}
1160% \vspace{1ex}
1161% \infrule
1162%  {tlr_1\approx tlr_2 \andalso
1163%   s_1 \uparrow f \andalso s_2\uparrow f
1164%  }
1165%  {\verb+tal_base_call+~s_1~tlr_1\approx taa \append \verb+tal_base_call+~s_2~tlr_2}
1166% \vspace{1ex}
1167% \infrule
1168%  {tlr_1\approx tlr_2 \andalso
1169%   s_1 \uparrow f \andalso s_2\uparrow f \andalso
1170%   \verb+tal_collapsable+~tal_2
1171%  }
1172%  {\verb+tal_base_call+~s_1~tlr_1 \approx taa \append \verb+tal_step_call+~s_2~tlr_2~tal_2)}
1173% \vspace{1ex}
1174% \infrule
1175%  {tlr_1\approx tlr_2 \andalso
1176%   s_1 \uparrow f \andalso s_2\uparrow f \andalso
1177%   \verb+tal_collapsable+~tal_1
1178%  }
1179%  {\verb+tal_step_call+~s_1~tlr_1~tal_1 \approx taa \append \verb+tal_base_call+~s_2~tlr_2)}
1180% \vspace{1ex}
1181% \infrule
1182%  {tlr_1 \approx tlr_2 \andalso
1183%   s_1 \uparrow f \andalso s_2\uparrow f\andalso
1184%   tal_1 \approx tal_2 \andalso
1185%  }
1186%  {\verb+tal_step_call+~s_1~tlr_1~tal_1 \approx taa \append \verb+tal_step_call+~s_2~tlr_2~tal_2}
1187% \caption{The inference rule for the relation $\approx$.}
1188% \label{fig:txx_rel}
1189% \end{figure}
1190% %
1191% \begin{comment}
1192% \begin{verbatim}
1193% let rec tlr_rel S1 st1 st1' S2 st2 st2'
1194%   (tlr1 : trace_label_return S1 st1 st1')
1195%   (tlr2 : trace_label_return S2 st2 st2') on tlr1 : Prop ≝
1196% match tlr1 with
1197%   [ tlr_base st1 st1' tll1 ⇒
1198%     match tlr2 with
1199%     [ tlr_base st2 st2' tll2 ⇒ tll_rel … tll1 tll2
1200%     | _ ⇒ False
1201%     ]
1202%   | tlr_step st1 st1' st1'' tll1 tl1 ⇒
1203%     match tlr2 with
1204%     [ tlr_step st2 st2' st2'' tll2 tl2 ⇒
1205%       tll_rel … tll1 tll2 ∧ tlr_rel … tl1 tl2
1206%     | _ ⇒ False
1207%     ]
1208%   ]
1209% and tll_rel S1 fl1 st1 st1' S2 fl2 st2 st2'
1210%  (tll1 : trace_label_label S1 fl1 st1 st1')
1211%  (tll2 : trace_label_label S2 fl2 st2 st2') on tll1 : Prop ≝
1212%   match tll1 with
1213%   [ tll_base fl1 st1 st1' tal1 H ⇒
1214%     match tll2 with
1215%     [ tll_base fl2 st2 st2 tal2 G ⇒
1216%       as_label_safe … («?, H») = as_label_safe … («?, G») ∧
1217%       tal_rel … tal1 tal2
1218%     ]
1219%   ]
1220% and tal_rel S1 fl1 st1 st1' S2 fl2 st2 st2'
1221%  (tal1 : trace_any_label S1 fl1 st1 st1')
1222%  (tal2 : trace_any_label S2 fl2 st2 st2')
1223%    on tal1 : Prop ≝
1224%   match tal1 with
1225%   [ tal_base_not_return st1 st1' _ _ _ ⇒
1226%     fl2 = doesnt_end_with_ret ∧
1227%     ∃st2mid,taa,H,G,K.
1228%     tal2 ≃ taa_append_tal ? st2 ??? taa
1229%       (tal_base_not_return ? st2mid st2' H G K)
1230%   | tal_base_return st1 st1' _ _ ⇒
1231%     fl2 = ends_with_ret ∧
1232%     ∃st2mid,taa,H,G.
1233%     tal2 ≃ taa_append_tal ? st2 ? st2mid st2' taa
1234%       (tal_base_return ? st2mid st2' H G)
1235%   | tal_base_call st1 st1' st1'' _ prf _ tlr1 _ ⇒
1236%     fl2 = doesnt_end_with_ret ∧
1237%     ∃st2mid,G.as_call_ident S2 («st2mid, G») = as_call_ident ? «st1, prf» ∧
1238%     ∃taa : trace_any_any ? st2 st2mid.∃st2mid',H.
1239%     (* we must allow a tal_base_call to be similar to a call followed
1240%       by a collapsable trace (trace_any_any followed by a base_not_return;
1241%       we cannot use trace_any_any as it disallows labels in the end as soon
1242%       as it is non-empty) *)
1243%     (∃K.∃tlr2 : trace_label_return ? st2mid' st2'.∃L.
1244%       tal2 ≃ taa @ (tal_base_call … H G K tlr2 L) ∧ tlr_rel … tlr1 tlr2) ∨
1245%     ∃st2mid'',K.∃tlr2 : trace_label_return ? st2mid' st2mid''.∃L.
1246%     ∃tl2 : trace_any_label … doesnt_end_with_ret st2mid'' st2'.
1247%       tal2 ≃ taa @ (tal_step_call … H G K tlr2 L tl2) ∧
1248%       tlr_rel … tlr1 tlr2 ∧ tal_collapsable … tl2
1249%   | tal_step_call fl1 st1 st1' st1'' st1''' _ prf _ tlr1 _ tl1 ⇒
1250%     ∃st2mid,G.as_call_ident S2 («st2mid, G») = as_call_ident ? «st1, prf» ∧
1251%     ∃taa : trace_any_any ? st2 st2mid.∃st2mid',H.
1252%     (fl2 = doesnt_end_with_ret ∧ ∃K.∃tlr2 : trace_label_return ? st2mid' st2'.∃L.
1253%       tal2 ≃ taa @ tal_base_call … H G K tlr2 L ∧
1254%       tal_collapsable … tl1 ∧ tlr_rel … tlr1 tlr2) ∨
1255%     ∃st2mid'',K.∃tlr2 : trace_label_return ? st2mid' st2mid''.∃L.
1256%     ∃tl2 : trace_any_label ? fl2 st2mid'' st2'.
1257%       tal2 ≃ taa @ (tal_step_call … H G K tlr2 L tl2) ∧
1258%       tal_rel … tl1 tl2 ∧ tlr_rel … tlr1 tlr2
1259%   | tal_step_default fl1 st1 st1' st1'' _ tl1 _ _ ⇒
1260%     tal_rel … tl1 tal2 (* <- this makes it many to many *)
1261%   ].
1262% \end{verbatim}
1263% \end{comment}
1264% %
1265% In the preceding rules, a $taa$ is an inhabitant of the
1266% $\verb+trace_any_any+~s_1~s_2$ (shorthand $\verb+TAA+~s_1~s_2$),
1267% an inductive data type whose definition
1268% is not in the paper for lack of space. It is the type of valid
1269% prefixes (even empty ones) of \verb+TAL+'s that do not contain
1270% any function call. Therefore it
1271% is possible to concatenate (using ``$\append$'') a \verb+TAA+ to the
1272% left of a \verb+TAL+. A \verb+TAA+ captures
1273% a sequence of silent moves.
1274% The \verb+tal_collapsable+ unary predicate over \verb+TAL+'s
1275% holds when the argument does not contain any function call and it ends
1276% with a label (not a return). The intuition is that after a function call we
1277% can still perform a sequence of silent actions while remaining similar.
1279% As should be expected, even though the rules are asymmetric $\approx$ is in fact
1280% an equivalence relation.
1281\section{Forward simulation}
1284We summarise here the results of the previous sections. Each intermediate
1285unstructured language can be given a semantics based on structured traces,
1286that single out those runs that respect a certain number of invariants.
1287A cost model can be computed on the object code and it can be used to predict
1288the execution costs of runs that produce structured traces. The cost model
1289can be lifted from the target to the source code of a pass if the pass maps
1290structured traces to similar structured traces. The latter property is called
1291a \emph{forward simulation}.
1293As for labelled transition systems, in order to establish the forward
1294simulation we are interested in (preservation of observables), we are
1295forced to prove a stronger notion of forward simulation that introduces
1296an explicit relation between states. The classical notion of a 1-to-many
1297forward simulation is the existence of a relation $\S$ over states such that
1298if $s_1 \S s_2$ and $s_1 \to^1 s_1'$ then there exists an $s_2'$ such that
1299$s_2 \to^* s_2'$ and $s_1' \S s_2'$. In our context, we need to replace the
1300one and multi step transition relations $\to^n$ with the existence of
1301a structured trace between the two states, and we need to add the request that
1302the two structured traces are similar. Thus what we would like to state is
1303something like:\\
1304for all $s_1,s_2,s_1'$ such that there is a $\tau_1$ from
1305$s_1$ to $s_1'$ and $s_1 \S s_2$ there exists an $s_2'$ such that
1306$s_1' \S s_2'$ and a $\tau_2$ from $s_2$ to $s_2'$ such that
1307$\tau_1$ is similar to $\tau_2$. We call this particular form of forward
1308simulation \emph{trace reconstruction}.
1310The statement just introduced, however, is too simplistic and not provable
1311in the general case. To understand why, consider the case of a function call
1312and the pass that fixes the parameter passing conventions. A function
1313call in the source code takes in input an arbitrary number of pseudo-registers (the actual parameters to pass) and returns an arbitrary number of pseudo-registers (where the result is stored). A function call in the target language has no
1314input nor output parameters. The pass must add explicit code before and after
1315the function call to move the pseudo-registers content from/to the hardware
1316registers or the stack in order to implement the parameter passing strategy.
1317Similarly, each function body must be augmented with a preamble and a postamble
1318to complete/initiate the parameter passing strategy for the call/return phase.
1319Therefore what used to be a call followed by the next instruction to execute
1320after the function return, now becomes a sequence of instructions, followed by
1321a call, followed by another sequence. The two states at the beginning of the
1322first sequence and at the end of the second sequence are in relation with
1323the status before/after the call in the source code, like in an usual forward
1324simulation. How can we prove however the additional condition for function calls
1325that asks that when the function returns the instruction immediately after the
1326function call is called? To grant this invariant, there must be another relation
1327between the address of the function call in the source and in the target code.
1328This additional relation is to be used in particular to relate the two stacks.
1330Another example is given by preservation of code emission statements. A single
1331code emission instruction can be simulated by a sequence of steps, followed
1332by a code emission, followed by another sequence. Clearly the initial and final
1333statuses of the sequence are to be in relation with the status before/after the
1334code emission in the source code. In order to preserve the structured traces
1335invariants, however, we must consider a second relation between states that
1336traces the preservation of the code emission statement.
1338Therefore we now introduce an abstract notion of relation set between abstract
1339statuses and an abstract notion of 1-to-many forward simulation conditions.
1340These two definitions enjoy the following remarkable properties:
1342 \item they are generic enough to accommodate all passes of the CerCo compiler
1343 \item the conjunction of the 1-to-many forward simulation conditions are
1344       just slightly stricter than the statement of a 1-to-many forward
1345       simulation in the classical case. In particular, they only require
1346       the construction of very simple forms of structured traces made of
1347       silent states only.
1348 \item they allow to prove our main result of the paper: the 1-to-many
1349       forward simulation conditions are sufficient to prove the trace
1350       reconstruction theorem
1353Point 3. is the important one. First of all it means that we have reduced
1354the complex problem of trace reconstruction to a much simpler one that,
1355moreover, can be solved with slight adaptations of the forward simulation proof
1356that is performed for a compiler that only cares about functional properties.
1357Therefore we have successfully splitted as much as possible the proof of
1358preservation of functional properties from that of non-functional ones.
1359Secondly, combined with the results in the previous section, it implies
1360that the cost model can be computed on the object code and lifted to the
1361source code to reason on non-functional properties, assuming that
1362the 1-to-many forward simulation conditions are fulfilled for every
1363compiler pass.
1365\paragraph{Relation sets.}
1367We introduce now the four relations $\mathcal{S,C,L,R}$ between abstract
1368statuses that are used to correlate the corresponding statues before and
1369after a compiler pass. The first two are abstract and must be instantiated
1370by every pass. The remaining two are derived relations.
1372The $\S$ relation between states is the classical relation used
1373in forward simulation proofs. It correlates the data of the status
1374(e.g. registers, memory, etc.).
1376The $\C$ relation correlates call states. It allows to track the
1377position in the target code of every call in the source code.
1379The $\L$ relation simply says that the two states are both label
1380emitting states that emit the same label, \emph{i.e.}\ $s_1\L s_2\iffdef \ell~s_1=\ell~s_2$.
1381It allows to track the position in
1382the target code of every cost emitting statement in the source code.
1384Finally the $\R$ relation is the more complex one. Two states
1385$s_1$ and $s_2$ are $\R$ correlated if every time $s_1$ is the
1386successors of a call state that is $\C$-related to a call state
1387$s_2'$ in the target code, then $s_2$ is the successor of $s_2'$. Formally:
1388$$s_1\R s_2 \iffdef \forall s_1',s_2'.s_1'\C s_2' \to s_1'\ar s_1 \to s_2' \ar s_2.$$
1389We will require all pairs of states that follow a related call to be
1390$\R$-related. This is the fundamental requirement granting
1391that the target trace is well structured, \emph{i.e.}\ that calls are well
1392nested and returning where they are supposed to.
1394% \begin{alltt}
1395% record status_rel (S1,S2 : abstract_status) : Type[1] := \{
1396%   \(\S\): S1 \(\to\) S2 \(\to\) Prop;
1397%   \(\C\): (\(\Sigma\)s.as_classifier S1 s cl_call) \(\to\)
1398%      (\(\Sigma\)s.as_classifier S2 s cl_call) \(\to\) Prop \}.
1400% definition \(\L\) S1 S2 st1 st2 := as_label S1 st1 = as_label S2 st2.
1402% definition \(\R\) S1 S2 (R: status_rel S1 S2) s1_ret s2_ret ≝
1403%  \(\forall\)s1_pre,s2_pre.
1404%   as_after_return s1_pre s1_ret \(\to\) s1_pre \(\R\) s2_pre \(\to\)
1405%    as_after_return s2_pre s2_ret.
1406% \end{alltt}
1409% \begin{subfigure}{.475\linewidth}
1410% \centering
1411% \begin{tikzpicture}[every join/.style={ar}, join all, thick,
1412%                             every label/.style=overlay, node distance=10mm]
1413%     \matrix [diag] (m) {%
1414%          \node (s1) [is jump] {}; & \node [fill=white] (t1) {};\\
1415%          \node (s2) {}; & \node (t2) {}; \\
1416%     };
1417%     \node [above=0 of t1, overlay] {$\alpha$};
1418%     {[-stealth]
1419%     \draw (s1) -- (t1);
1420%     \draw [new] (s2) -- node [above] {$*$} (t2);
1421%     }
1422%     \draw (s1) to node [rel] {$\S$} (s2);
1423%     \draw [new] (t1) to node [rel] {$\S,\L$} (t2);
1424% \end{tikzpicture}
1425% \caption{The \verb+cl_jump+ case.}
1426% \label{subfig:cl_jump}
1427% \end{subfigure}
1428% &
1431    \matrix [diag] (m) {%
1432         \node (s1) {s_1}; & \node (t1) {s_1'};\\
1433         \node (s2) {s_2}; & \node (t2) {s_2'}; \\
1434    };
1435    {[-stealth]
1436    \draw (s1) -- node [above] {$\tau$} (t1);
1437    \draw [new] (s2) -- node [above] {$\tau *$} (t2);
1438    }
1439    \draw (s1) to [bend right, anchor=mid] node [rel] {$\S$} (s2);
1440    \draw [new] (t1) to [bend left, anchor=mid] node [rel] {$\S$} (t2);
1444    \matrix [diag] (m) {%
1445         \node (s1) {s_1}; & \node (t1) {s_1'};\\
1446         \node (s2) {s_2}; & \node (t2) {s_2'}; \\
1447    };
1448    {[-stealth]
1449    \draw (s1) -- node [above] {$L$} (t1);
1450    \draw [new] (s2) -- node [above] {$L$} (t2);
1451    }
1452    \draw (s1) to [bend right, anchor=mid] node [rel] {$\S$} (s2);
1453    \draw [new] (t1) to [bend left, anchor=mid] node [rel] {$\S$} (t2);
1455\text{ if $L\notin\ell(\Functions)$}
1458    \matrix [diag] (m) {%
1459         \node (s1) {s_1}; & \node (t1) {s_1'};\\
1460         \node (s2) {s_2};\\
1461    };
1462    {[-stealth]
1463    \draw (s1) -- node [above] {$\ell(f)$} (t1);
1464    }
1465    \draw (s1) to [bend right, anchor=mid] node [rel] {$\S$} (s2);
1466    \draw [new] (t1) to [bend left, anchor=mid] node [rel] {$\S$} (s2);
1470    \matrix [diag, small vgap] (m) {%
1471         &\node (t1) {s_1'}; \\
1472         \node (s1) {s_1}; \\
1473         && \node (l1) {s_b}; & \node (l2) {s_c}; & \node (t2) {s_2'};\\
1474         \node (s2) {s_2}; & \node (c) {s_a};\\   
1475    };
1476    {[-stealth]
1477    \draw (s1) -- node [above left] {$f$} (t1);
1478    \draw [new] (s2) -- node [above] {$\tau *$} (c);
1479    \draw [new] (c) -- node [above left] {$f$} (l1);
1480    \draw [new] (l1) -- node [above] {$\ell(f)$} (l2);
1481    \draw [new] (l2) -- node [above] {$\tau *$} (t2);
1482    }
1483    \draw (s1) to [bend right] node [rel] {$\S$} (s2);
1484    \draw [new] (t1) to [bend left] node [rel] {$\S$} (t2);
1485    \draw [new] (t1) to [bend right] node [rel] {$\C$} (c);
1489    \matrix [diag, small vgap] (m) {%
1490        \node (s1) {s_1}; \\
1491        &\node (t1) {s_1'}; \\
1492        \node (s2) {s_2}; & \node (c) {s_a};\\
1493        && \node (r) {s_b}; & \node (t2) {s_2'}; \\   
1494    };
1495    {[-stealth]
1496    \draw (s1) -- node [above right] {$RET$} (t1);
1497    \draw [new] (s2) -- node [above] {$\tau *$} (c);
1498    \draw [new] (c) -- node [below left] {$RET$} (r);
1499    \draw [new] (r) -- node [above] {$\tau *$} (t2);
1500    }
1501    \draw (s1) to [bend right] node [rel] {$\S$} (s2);
1502    \draw [new, overlay] (t1) to [bend left=60] node [rel] {$\S$} (t2);
1503    \draw [new, overlay] (t1) to [bend left ] node [rel] {$\R$} (r);
1505\caption{The hypotheses of ???. Dashed relations are implied by solid ones.}
1510% \begin{figure}
1511% \centering
1512% \begin{tabular}{@{}c@{}c@{}c@{}}
1513% % \begin{subfigure}{.475\linewidth}
1514% % \centering
1515% % \begin{tikzpicture}[every join/.style={ar}, join all, thick,
1516% %                             every label/.style=overlay, node distance=10mm]
1517% %     \matrix [diag] (m) {%
1518% %          \node (s1) [is jump] {}; & \node [fill=white] (t1) {};\\
1519% %          \node (s2) {}; & \node (t2) {}; \\
1520% %     };
1521% %     \node [above=0 of t1, overlay] {$\alpha$};
1522% %     {[-stealth]
1523% %     \draw (s1) -- (t1);
1524% %     \draw [new] (s2) -- node [above] {$*$} (t2);
1525% %     }
1526% %     \draw (s1) to node [rel] {$\S$} (s2);
1527% %     \draw [new] (t1) to node [rel] {$\S,\L$} (t2);
1528% % \end{tikzpicture}
1529% % \caption{The \verb+cl_jump+ case.}
1530% % \label{subfig:cl_jump}
1531% % \end{subfigure}
1532% % &
1533% \begin{subfigure}{.25\linewidth}
1534% \centering
1535% \begin{tikzpicture}[every join/.style={ar}, join all, thick,
1536%                             every label/.style=overlay, node distance=10mm]
1537%     \matrix [diag] (m) {%
1538%          \node (s1) {}; & \node (t1) {};\\
1539%          \node (s2) {}; & \node (t2) {}; \\
1540%     };
1541%     {[-stealth]
1542%     \draw (s1) -- (t1);
1543%     \draw [new] (s2) -- node [above] {$*$} (t2);
1544%     }
1545%     \draw (s1) to [bend right, anchor=mid] node [rel] {$\S$} (s2);
1546%     \draw [new] (t1) to [bend left, anchor=mid] node [rel] {$\S,\L$} (t2);
1547% \end{tikzpicture}
1548% \caption{The \verb+cl_oher+ and \verb+cl_jump+ cases.}
1549% \label{subfig:cl_other_jump}
1550% \end{subfigure}
1551% &
1552% \begin{subfigure}{.375\linewidth}
1553% \centering
1554% \begin{tikzpicture}[every join/.style={ar}, join all, thick,
1555%                             every label/.style=overlay, node distance=10mm]
1556%     \matrix [diag, small gap] (m) {%
1557%          &\node (t1) {}; \\
1558%          \node (s1) [is call] {}; \\
1559%          && \node (l) {}; & \node (t2) {};\\
1560%          \node (s2) {}; & \node (c) [is call] {};\\   
1561%     };
1562%     {[-stealth]
1563%     \draw (s1) -- node [above left] {$f$} (t1);
1564%     \draw [new] (s2) -- node [above] {$*$} (c);
1565%     \draw [new] (c) -- node [below right] {$f$} (l);
1566%     \draw [new] (l) -- node [above] {$*$} (t2);
1567%     }
1568%     \draw (s1) to [bend right] node [rel] {$\S$} (s2);
1569%     \draw [new] (t1) to [bend left] node [rel] {$\S$} (t2);
1570%     \draw [new] (t1) to [bend left] node [rel] {$\L$} (l);
1571%     \draw [new] (t1) to [bend right] node [rel] {$\C$} (c);
1572%     \end{tikzpicture}
1573% \caption{The \verb+cl_call+ case.}
1574% \label{subfig:cl_call}
1575% \end{subfigure}
1576% &
1577% \begin{subfigure}{.375\linewidth}
1578% \centering
1579% \begin{tikzpicture}[every join/.style={ar}, join all, thick,
1580%                             every label/.style=overlay, node distance=10mm]
1581%     \matrix [diag, small gap] (m) {%
1582%         \node (s1) [is ret] {}; \\
1583%         &\node (t1) {}; \\
1584%         \node (s2) {}; & \node (c) [is ret] {};\\
1585%         && \node (r) {}; & \node (t2) {}; \\   
1586%     };
1587%     {[-stealth]
1588%     \draw (s1) -- node [above right] {$RET$} (t1);
1589%     \draw [new] (s2) -- node [above] {$*$} (c);
1590%     \draw [new] (c) -- node [below left] {$RET$} (r);
1591%     \draw [new] (r) -- node [above] {$*$} (t2);
1592%     }
1593%     \draw (s1) to [bend right] node [rel] {$\S$} (s2);
1594%     \draw [new, overlay] (t1) to [bend left=60] node [rel] {$\S,\L$} (t2);
1595%     \draw [new, overlay] (t1) to [bend left ] node [rel] {$\R$} (r);
1596% \end{tikzpicture}
1597% \caption{The \verb+cl_return+ case.}
1598% \label{subfig:cl_return}
1599% \end{subfigure}
1600% \end{tabular}
1601% \caption{Mnemonic diagrams depicting the hypotheses for the preservation of structured traces.
1602%          Dashed lines
1603%          and arrows indicates how the diagrams must be closed when solid relations
1604%          are present.}
1605% \label{fig:forwardsim}
1606% \end{figure}
1608\paragraph{1-to-many forward simulation conditions.}
1609\begin{condition}[Cases \verb+cl_other+ and \verb+cl_jump+]
1610 For all $s_1,s_1',s_2$ such that $s_1 \S s_1'$, and
1611 $s_1\exec s_1'$, and either $s_1 \class \verb+cl_other+$ or
1612 both $s_1\class\verb+cl_other+\}$ and $\ell~s_1'$,
1613 there exists an $s_2'$ and a $\verb+trace_any_any_free+~s_2~s_2'$ called $taaf$
1614 such that $s_1' \mathrel{{\S} \cap {\L}} s_2'$ and either
1615$taaf$ is non empty, or one among $s_1$ and $s_1'$ is \verb+as_costed+.
1618In the above condition depicted in \autoref{subfig:cl_other_jump},
1619a $\verb+trace_any_any_free+~s_1~s_2$ (which from now on
1620will be shorthanded as \verb+TAAF+) is an
1621inductive type of structured traces that do not contain function calls or
1622cost emission statements. Differently from a \verb+TAA+, the
1623instruction to be executed in the lookahead state $s_2$ may be a cost emission
1626The intuition of the condition is that one step can be replaced with zero or more steps if it
1627preserves the relation between the data and if the two final statuses are
1628labelled in the same way. Moreover, we must take special care of the empty case
1629to avoid collapsing two consecutive states that emit a label, missing one of the two emissions.
1631\begin{condition}[Case \verb+cl_call+]
1632 For all $s_1,s_1',s_2$ s.t. $s_1 \S s_1'$ and
1633 $s_1\exec s_1'$ and $s_1 \class \verb+cl_call+$, there exists $s_a, s_b, s_2'$, a
1634$\verb+TAA+~s_2~s_a$, and a
1635$\verb+TAAF+~s_b~s_2'$ such that:
1636$s_a\class\verb+cl_call+$, the \verb+as_call_ident+'s of
1637the two call states are the same, $s_1 \C s_a$,
1638$s_a\exec s_b$, $s_1' \L s_b$ and
1639$s_1' \S s_2'$.
1642The condition, depicted in \autoref{subfig:cl_call} says that, to simulate a function call, we can perform a
1643sequence of silent actions before and after the function call itself.
1644The old and new call states must be $\C$-related, the old and new
1645states at the beginning of the function execution must be $\L$-related
1646and, finally, the two initial and final states must be $\S$-related
1647as usual.
1649\begin{condition}[Case \verb+cl_return+]
1650 For all $s_1,s_1',s_2$ s.t. $s_1 \S s_1'$,
1651 $s_1\exec s_1'$ and $s_1 \class \verb+cl_return+$, there exists $s_a, s_b, s_2'$, a
1652$\verb+TAA+~s_2~s_a$, a
1653$\verb+TAAF+~s_b~s_2'$ called $taaf$ such that:
1655$s_a\exec s_b$,
1656$s_1' \R s_b$ and
1657$s_1' \mathrel{{\S} \cap {\L}} s_2'$ and either
1658$taaf$ is non empty, or $\lnot \ell~s_a$.
1661Similarly to the call condition, to simulate a return we can perform a
1662sequence of silent actions before and after the return statement itself,
1663as depicted in \autoref{subfig:cl_return}.
1664The old and the new statements after the return must be $\R$-related,
1665to grant that they returned to corresponding calls.
1666The two initial and final states must be $\S$-related
1667as usual and, moreover, they must exhibit the same labels. Finally, when
1668the suffix is non empty we must take care of not inserting a new
1669unmatched cost emission statement just after the return statement.
1673definition status_simulation ≝
1674  λS1 : abstract_status.
1675  λS2 : abstract_status.
1676  λsim_status_rel : status_rel S1 S2.
1677    ∀st1,st1',st2.as_execute S1 st1 st1' →
1678    sim_status_rel st1 st2 →
1679    match as_classify … st1 with
1680    [ None ⇒ True
1681    | Some cl ⇒
1682      match cl with
1683      [ cl_call ⇒ ∀prf.
1684        (*
1685             st1' ------------S----------\
1686              ↑ \                         \
1687             st1 \--L--\                   \
1688              | \       \                   \
1689              S  \-C-\  st2_after_call →taa→ st2'
1690              |       \     ↑
1691             st2 →taa→ st2_pre_call
1692        *)
1693        ∃st2_pre_call.
1694        as_call_ident ? st2_pre_call = as_call_ident ? («st1, prf») ∧
1695        call_rel ?? sim_status_rel «st1, prf» st2_pre_call ∧
1696        ∃st2_after_call,st2'.
1697        ∃taa2 : trace_any_any … st2 st2_pre_call.
1698        ∃taa2' : trace_any_any … st2_after_call st2'.
1699        as_execute … st2_pre_call st2_after_call ∧
1700        sim_status_rel st1' st2' ∧
1701        label_rel … st1' st2_after_call
1702      | cl_return ⇒
1703        (*
1704             st1
1705            / ↓
1706           | st1'----------S,L------------\
1707           S   \                           \
1708            \   \-----R-------\            |     
1709             \                 |           |
1710             st2 →taa→ st2_ret |           |
1711                          ↓   /            |
1712                     st2_after_ret →taaf→ st2'
1714           we also ask that st2_after_ret be not labelled if the taaf tail is
1715           not empty
1716        *)
1717        ∃st2_ret,st2_after_ret,st2'.
1718        ∃taa2 : trace_any_any … st2 st2_ret.
1719        ∃taa2' : trace_any_any_free … st2_after_ret st2'.
1720        (if taaf_non_empty … taa2' then ¬as_costed … st2_after_ret else True) ∧
1721        as_classifier … st2_ret cl_return ∧
1722        as_execute … st2_ret st2_after_ret ∧ sim_status_rel st1' st2' ∧
1723        ret_rel … sim_status_rel st1' st2_after_ret ∧
1724        label_rel … st1' st2'
1725      | cl_other ⇒
1726          (*         
1727          st1 → st1'
1728            |      \
1729            S      S,L
1730            |        \
1731           st2 →taaf→ st2'
1733           the taaf can be empty (e.g. tunneling) but we ask it must not be the
1734           case when both st1 and st1' are labelled (we would be able to collapse
1735           labels otherwise)
1736         *)
1737        ∃st2'.
1738        ∃taa2 : trace_any_any_free … st2 st2'.
1739        (if taaf_non_empty … taa2 then True else (¬as_costed … st1 ∨ ¬as_costed … st1')) ∧
1740        sim_status_rel st1' st2' ∧
1741        label_rel … st1' st2'
1742      | cl_jump ⇒
1743        (* just like cl_other, but with a hypothesis more *)
1744        as_costed … st1' →
1745        ∃st2'.
1746        ∃taa2 : trace_any_any_free … st2 st2'.
1747        (if taaf_non_empty … taa2 then True else (¬as_costed … st1 ∨ ¬as_costed … st1')) ∧
1748        sim_status_rel st1' st2' ∧
1749        label_rel … st1' st2'
1750      ]
1751    ].
1755\paragraph{Main result: the 1-to-many forward simulation conditions
1756are sufficient to trace reconstruction}
1758Let us assume that a relation set is given such that the 1-to-many
1759forward simulation conditions are satisfied. Under this assumption we
1760can prove the following three trace reconstruction theorems by mutual
1761structural induction over the traces given in input between the
1762$s_1$ and $s_1'$ states.
1764In particular, the \verb+status_simulation_produce_tlr+ theorem
1765applied to the \verb+main+ function of the program and equal
1766$s_{2_b}$ and $s_2$ states shows that, for every initial state in the
1767source code that induces a structured trace in the source code,
1768the compiled code produces a similar structured trace.
1771For every $s_1,s_1',s_{2_b},s_2$ s.t.
1772there is a $\verb+TLR+~s_1~s_1'$ called $tlr_1$ and a
1773$\verb+TAA+~s_{2_b}~s_2$ and $s_1 \L s_{2_b}$ and
1774$s_1 \S s_2$, there exists $s_{2_m},s_2'$ s.t.
1775there is a $\verb+TLR+~s_{2_b}~s_{2_m}$ called $tlr_2$ and
1776there is a $\verb+TAAF+~s_{2_m}~s_2'$ called $taaf$
1777s.t. if $taaf$ is non empty then $\lnot (\ell~s_{2_m})$,
1778and $tlr_1\approx tlr_2$
1779and $s_1' \mathrel{{\S} \cap {\L}} s_2'$ and
1780$s_1' \R s_{2_m}$.
1783The theorem states that a \verb+trace_label_return+ in the source code
1784together with a precomputed preamble of silent states
1785(the \verb+TAA+) in the target code induces a
1786similar \verb+trace_label_return+ in the target code which can be
1787followed by a sequence of silent states. Note that the statement does not
1788require the produced \verb+trace_label_return+ to start with the
1789precomputed preamble, even if this is likely to be the case in concrete
1790implementations. The preamble in input is necessary for compositionality, e.g.
1791because the 1-to-many forward simulation conditions allow in the
1792case of function calls to execute a preamble of silent instructions just after
1793the function call.
1795Clearly similar results are also available for the other two types of structured
1796traces (in fact, they are all proved simultaneously by mutual induction).
1797% \begin{theorem}[\verb+status_simulation_produce_tll+]
1798% For every $s_1,s_1',s_{2_b},s_2$ s.t.
1799% there is a $\verb+TLL+~b~s_1~s_1'$ called $tll_1$ and a
1800% $\verb+TAA+~s_{2_b}~s_2$ and $s_1 \L s_{2_b}$ and
1801% $s_1 \S s_2$, there exists $s_{2_m},s_2'$ s.t.
1802% \begin{itemize}
1803%  \item if $b$ (the trace ends with a return) then there exists $s_{2_m},s_2'$
1804%        and a trace $\verb+TLL+~b~s_{2_b}~s_{2_m}$ called $tll_2$
1805%        and a $\verb+TAAF+~s_{2_m}~s_2'$ called $taa_2$ s.t.
1806%        $s_1' \mathrel{{\S} \cap {\L}} s_2'$ and
1807%        $s_1' \R s_{2_m}$ and
1808%        $tll_1\approx tll_2$ and
1809%        if $taa_2$ is non empty then $\lnot \ell~s_{2_m}$;
1810%  \item else there exists $s_2'$ and a
1811%        $\verb+TLL+~b~s_{2_b}~s_2'$ called $tll_2$ such that
1812%        $s_1' \mathrel{{\S} \cap {\L}} s_2'$ and
1813%        $tll_1\approx tll_2$.
1814% \end{itemize}
1815% \end{theorem}
1817% The statement is similar to the previous one: a source
1818% \verb+trace_label_label+ and a given target preamble of silent states
1819% in the target code induce a similar \verb+trace_label_label+ in the
1820% target code, possibly followed by a sequence of silent moves that become the
1821% preamble for the next \verb+trace_label_label+ translation.
1823% \begin{theorem}[\verb+status_simulation_produce_tal+]
1824% For every $s_1,s_1',s_2$ s.t.
1825% there is a $\verb+TAL+~b~s_1~s_1'$ called $tal_1$ and
1826% $s_1 \S s_2$
1827% \begin{itemize}
1828%  \item if $b$ (the trace ends with a return) then there exists $s_{2_m},s_2'$
1829%    and a trace $\verb+TAL+~b~s_2~s_{2_m}$ called $tal_2$ and a
1830%    $\verb+TAAF+~s_{2_m}~s_2'$ called $taa_2$ s.t.
1831%    $s_1' \mathrel{{\S} \cap {\L}} s_2'$ and
1832%    $s_1' \R s_{2_m}$ and
1833%    $tal_1 \approx tal_2$ and
1834%    if $taa_2$ is non empty then $\lnot \ell~s_{2_m}$;
1835%  \item else there exists $s_2'$ and a
1836%    $\verb+TAL+~b~s_2~s_2'$ called $tal_2$ such that
1837%    either $s_1' \mathrel{{\S} \cap {\L}} s_2'$ and
1838%        $tal_1\approx tal_2$
1839%    or $s_1' \mathrel{{\S} \cap {\L}} s_2$ and
1840%    $\verb+tal_collapsable+~tal_1$ and $\lnot \ell~s_1$.
1841% \end{itemize}
1842% \end{theorem}
1844% The statement is also similar to the previous ones, but for the lack of
1845% the target code preamble.
1849For every $s_1,s_1',s_2$ s.t.
1850there is a $\verb+trace_label_return+~s_1~s_1'$ called $tlr_1$ and
1851$s_1 (\L \cap \S) s_2$
1852there exists $s_{2_m},s_2'$ s.t.
1853there is a $\verb+trace_label_return+~s_2~s_{2_m}$ called $tlr_2$ and
1854there is a $\verb+trace_any_any_free+~s_{2_m}~s_2'$ called $taaf$
1855s.t. if $taaf$ is non empty then $\lnot (\verb+as_costed+~s_{2_m})$,
1856and $\verb+tlr_rel+~tlr_1~tlr_2$
1857and $s_1' (\S \cap \L) s_2'$ and
1858$s_1' \R s_{2_m}$.
1864status_simulation_produce_tlr S1 S2 R
1865(* we start from this situation
1866     st1 →→→→tlr→→→→ st1'
1867      | \
1868      L  \---S--\
1869      |          \
1870   st2_lab →taa→ st2   (the taa preamble is in general either empty or given
1871                        by the preceding call)
1873   and we produce
1874     st1 →→→→tlr→→→→ st1'
1875             \\      /  \
1876             //     R    \-L,S-\
1877             \\     |           \
1878   st2_lab →tlr→ st2_mid →taaf→ st2'
1880  st1 st1' st2_lab st2
1881  (tlr1 : trace_label_return S1 st1 st1')
1882  (taa2_pre : trace_any_any S2 st2_lab st2)
1883  (sim_execute : status_simulation S1 S2 R)
1884  on tlr1 : R st1 st2 → label_rel … st1 st2_lab →
1885  ∃st2_mid.∃st2'.
1886  ∃tlr2 : trace_label_return S2 st2_lab st2_mid.
1887  ∃taa2 : trace_any_any_free … st2_mid st2'.
1888  (if taaf_non_empty … taa2 then ¬as_costed … st2_mid else True) ∧
1889  R st1' st2' ∧ ret_rel … R st1' st2_mid ∧ label_rel … st1' st2' ∧
1890  tlr_rel … tlr1 tlr2
1894\section{Conclusions and future works}
1896The labelling approach is a technique to implement compilers that induce on
1897the source code a non uniform cost model determined from the object code
1898produced. The cost model assigns a cost to each basic block of the program.
1899The main theorem of the approach says that there is an exact
1900correspondence between the sequence of basic blocks started in the source
1901and object code, and that no instruction in the source or object code is
1902executed outside a basic block. Thus the cost of object code execution
1903can be computed precisely on the source.
1905In this paper we scale the labelling approach to cover a programming language
1906with function calls. This introduces new difficulties only when the language
1907is unstructured, i.e. it allows function calls to return anywhere in the code,
1908destroying the hope of a static prediction of the cost of basic blocks.
1909We restore static predictability by introducing a new semantics for unstructured
1910programs that single outs well structured executions. The latter are represented
1911by structured traces, a generalisation of streams of observables that capture
1912several structural invariants of the execution, like well nesting of functions
1913or the fact that every basic block must start with a code emission statement.
1914We show that structured traces are sufficiently structured to statically compute
1915a precise cost model on the object code.
1917We introduce a similarity relation on structured traces that must hold between
1918source and target traces. When the relation holds for every program, we prove
1919that the cost model can be lifted from the object to the source code.
1921In order to prove that similarity holds, we present a generic proof of forward
1922simulation that is aimed at pulling apart as much as possible the part of the
1923simulation related to non-functional properties (preservation of structure)
1924from that related to functional properties. In particular, we reduce the
1925problem of preservation of structure to that of showing a 1-to-many
1926forward simulation that only adds a few additional proof obligations to those
1927of a traditional, function properties only, proof.
1929All results presented in the paper are part of a larger certification of a
1930C compiler which is based on the labelling approach. The certification, done
1931in Matita, is the main deliverable of the FET-Open Certified Complexity (CerCo).
1933The short term future work consists in the completion of the certification of
1934the CerCo compiler exploiting the main theorem of this paper.
1936\paragraph{Related works.}
1937CerCo is the first project that explicitly tries to induce a
1938precise cost model on the source code in order to establish non-functional
1939properties of programs on an high level language. Traditional certifications
1940of compilers, like~\cite{compcert2,piton}, only explicitly prove preservation
1941of the functional properties.
1943Usually forward simulations take the following form: for each transition
1944from $s_1$ to $s_2$ in the source code, there exists an equivalent sequence of
1945transitions in the target code of length $n$. The number $n$ of transition steps
1946in the target code can just be the witness of the existential statement.
1947An equivalent alternative when the proof of simulation is constructive consists
1948in providing an explicit function, called \emph{clock function} in the
1949literature~\cite{clockfunctions}, that computes $n$ from $s_1$. Every clock
1950function constitutes then a cost model for the source code, in the spirit of
1951what we are doing in CerCo. However, we believe our solution to be superior
1952in the following respects: 1) the machinery of the labelling approach is
1953insensible to the resource being measured. Indeed, any cost model computed on
1954the object code can be lifted to the source code (e.g. stack space used,
1955energy consumed, etc.). On the contrary, clock functions only talk about
1956number of transition steps. In order to extend the approach with clock functions
1957to other resources, additional functions must be introduced. Moreover, the
1958additional functions would be handled differently in the proof.
19592) the cost models induced by the labelling approach have a simple presentation.
1960In particular, they associate a number to each basic block. More complex
1961models can be induced when the approach is scaled to cover, for instance,
1962loop optimisations~\cite{loopoptimizations}, but the costs are still meant to
1963be easy to understand and manipulate in an interactive theorem prover or
1964in Frama-C.
1965On the contrary, a clock function is a complex function of the state $s_1$
1966which, as a function, is an opaque object that is difficult to reify as
1967source code in order to reason on it.
1972% \appendix
1973% \section{Notes for the reviewers}
1975% The results described in the paper are part of a larger formalization
1976% (the certification of the CerCo compiler). At the moment of the submission
1977% we need to single out from the CerCo formalization the results presented here.
1978% Before the 16-th of February we will submit an attachment that contains the
1979% minimal subset of the CerCo formalization that allows to prove those results.
1980% At that time it will also be possible to measure exactly the size of the
1981% formalization described here. At the moment a rough approximation suggests
1982% about 2700 lines of Matita code.
1984% We will also attach the development version of the interactive theorem
1985% prover Matita that compiles the submitted formalization. Another possibility
1986% is to backport the development to the last released version of the system
1987% to avoid having to re-compile Matita from scratch.
1989% The programming and certification style used in the formalization heavily
1990% exploit dependent types. Dependent types are used: 1) to impose invariants
1991% by construction on the data types and operations (e.g. a traces from a state
1992% $s_1$ to a state $s_2$ can be concatenad to a trace from a state
1993% $s_2'$ to a state $s_3$ only if $s_2$ is convertible with $s_2'$); 2)
1994% to state and prove the theorems by using the Russell methodology of
1995% Matthieu Sozeau\footnote{Subset Coercions in Coq in TYPES'06. Matthieu Sozeau. Thorsten Altenkirch and Conor McBride (Eds). Volume 4502 of Lecture Notes in Computer Science. Springer, 2007, pp.237-252.
1996% }, better known in the Coq world as ``\verb+Program+'' and reimplemented in a simpler way in Matita using coercion propagations\footnote{Andrea Asperti, Wilmer Ricciotti, Claudio Sacerdoti Coen, Enrico Tassi: A Bi-Directional Refinement Algorithm for the Calculus of (Co)Inductive Constructions. Logical Methods in Computer Science 8(1) (2012)}. However, no result presented depends
1997% mandatorily on dependent types: it should be easy to adapt the technique
1998% and results presented in the paper to HOL.
2000% Finally, Matita and Coq are based on minor variations of the Calculus of
2001% (Co)Inductive Constructions. These variations do not affect the CerCo
2002% formalization. Therefore a porting of the proofs and ideas to Coq would be
2003% rather straightforward.
Note: See TracBrowser for help on using the repository browser.