Index: /Papers/itp2013/ccexec2.tex
===================================================================
 /Papers/itp2013/ccexec2.tex (revision 3349)
+++ /Papers/itp2013/ccexec2.tex (revision 3350)
@@ 255,5 +255,5 @@
associated to $L_1$ is the number of cycles required to execute the block
$I_3$ and \verb+COND l_2+, while the cost $k_2$ associated to $L_2$ counts the
cycles required by the block $I_4$ and \verb+GOTO l_1+. The compiler also guarantees
+cycles required by the block $I_4$, \verb+GOTO l_1+ and \verb+COND l_2+. The compiler also guarantees
that every executed instruction is in the scope of some code emission label,
that each scope does not contain loops (to associate a finite cost), and that
@@ 431,5 +431,5 @@
runs are weakly similar to the source code runs.
The notion of weak bisimulation for structured traces is a global property
+The notion of weak simulation for structured traces is a global property
which is hard to prove formally and much more demanding than the simple forward
simulation required for proofs of preservation of functional properties.
@@ 693,8 +693,8 @@
Let's consider a generic unstructured language already equipped with a
small step structured operational semantics (SOS). We introduce a
deterministic labelled transition system~\cite{LTS} $(S,s_{\mathrm{init}},\Lambda,\to)$
+deterministic labelled transition system~\cite{LTS} $(S,\Lambda,\to)$
that refines the
SOS by observing function calls and the beginning of basic blocks.
$S$ is the set of states of the program, $s_\mathrm{init}$ the initial state and
+$S$ is the set of states of the program and
$\Lambda = \{ \tau, RET \} \cup \Labels \cup \Functions$
where $\Functions$ is the set of names of functions that can occur in the
@@ 705,5 +705,5 @@
denotes the image of this function.
The transition function is defined as $s_1 \to[o] s_2$ if
$s_1$ moves to $s_2$ according to the SOS; moreover $o = f \in \Functions$ if
+$s_1$ moves to $s_2$ according to the SOS and $o = f \in \Functions$ if
the function $f$ is called, $o = RET$ if a \verb+RETURN+ is executed,
$o = L \in \Labels$ if an \verb+EMIT $L$+ is executed to signal the
@@ 711,12 +711,24 @@
Because we assume the language to be deterministic, the label emitted can
actually be computed simply observing $s_1$. Finally, $S$ is also endowed with
a relation $s\ar s'$ ($s'$ \emph{follows} $s$) when the instruction to be executed
in $s'$ is just after the one in $s$.

+a relation $s\ar s'$ ($s'$ \emph{follows} $s$) that holds when the instruction
+to be executed in $s'$ follows syntactically the one in $s$ in the source program.
+
+In the rest of the paper we write $s_0 \to^{*} s_n$ for the finite execution
+fragment $T = s_0 \to[o_0] s_1 \to[o_1] \ldots \to[o_{n1}] s_n$
+and, we call \emph{weak trace} of $T$ (denoted as $T$) the
+subsequence $o_{i_0} \ldots o_{i_m}$ of $o_0 \ldots o_{n1}$ obtained dropping
+every internal action $\tau$.
+
+%Let $k$ be a cost model for observables actions that maps elements of
+%$\Lambda \setminus \{\tau\}$ to any commutative cost monoid
+%(e.g. natural numbers). We extend the domain of $k$ to executable fragments
+%by posing $k(T) = \Sigma_{o \in T} k(o)$.
+
+\paragraph{Structured execution fragments}
Among all possible finite execution fragments we want to
identify the ones that satisfy the requirements we sketched in the previous
section. We say that an execution fragment
$s_0 \to[o_0] s_1 \to[o_1] \ldots \to[o_n] s_n$
is \emph{structured} (marking it as $s_0 \To s_n$) iff the following conditions
+$s_0 \to^{*} s_n$
+is \emph{structured} (and we denote it as $s_0 \To s_n$) iff the following conditions
are met.
\begin{enumerate}
@@ 726,59 +738,177 @@
$s_i \ar s_{k+1}$.
In other words, $s_{i+1}$ must start execution with \verb+EMIT $\ell(f)$+
+  so that no instruction falls outside the scope of every label 
and then continue with a structured fragment returning control
to the instruction immediately following the call.
 This captures the requirements that the body of function calls always start
 with a label emission statement, and every function
 call must converge yielding back control just after it.
 \item For every $i$ and $f$, if $s_{i+1}\to[\ell(f)]s_{i+2}$ then
 $s_i\to[f]s_{i+1}$. This is a technical condition needed to ensure that
 labels associated with functions always follow a call.
+
+ The condition also enforces convergence of every function call, which is
+ necessary to bound the cost of the fragment. Note that
+ non convergent programs may still have structured execution fragments
+ that are worth measuring. For example, we can measure the reaction time
+ of a server implemented as an unbounded loop whose body waits for an
+ input, process it and performs an output before looping: the processing
+ steps form a structured execution fragment.
\item The number of $RET$'s in the fragment is equal to the number of
 calls, i.e.\ the number of observables in $\Functions$. This, together with
 the above condition, captures the wellbracketing of the fragment with
 respect to function calls.
+ calls performed. In combination with the previous condition, this ensures
+ wellbacketing of function calls.
+ \item
+ \label{req3}
+ For every $i$ and $f$, if $s_{i+1}\to[\ell(f)]s_{i+2}$ then
+ $s_i\to[f]s_{i+1}$. This is a technical condition needed to ensure that a
+ label associated with a function is only used at the beginning of its
+ body. Its use will become clear in~\autoref{simulation}.
\item For every $i$, if the instruction to be executed in $s_i$ is a
conditional branch, then there is an $L$ such that $s_{i+1} \to[L] s_{i+2}$ or, equivalently, that $s_{i+1}$ must start execution with an
\verb+EMIT $L$+. This captures the requirement that every branch which is
 live code must start with a label emission.
+ live code must start with a label emission. Otherwise, it would be possible
+ to have conditional instructions whose branches are assigned different
+ costs, making impossible to assign a single cost to the label whose scope
+ contains the jump.
\end{enumerate}
One might wonder why $f$ and $\ell(f)$, that aways appear in this order, are not
collapsed into a single observable. This would indeed simplify some aspects of
the formalisation, but has the problem of misassagning the cost of calls, which would
fall under the associated label. As different call instructions with different costs
are possible, this is not acceptable.

Let $T = s_0 \to[o_0] s_1 \ldots \to[o_n] s_{n+1}$ be an
execution fragment. The \emph{weak trace} $T$ associated to $T$ is the
subsequence $o_{i_0} \ldots o_{i_m}$ of $o_0 \ldots o_n$ obtained dropping
every internal action $\tau$.

Let $k$ be a cost model that maps observables actions to any commutative cost
monoid (e.g. natural numbers). We extend the domain of $k$ to fragments
by posing $k(T) = \Sigma_{o \in T} k(o)$.

The labelling approach is based on the idea that the execution cost of
+collapsed into a single observable. This would simplify some aspects of the
+formalisation at the price of others. For example, we should add special
+cases when the fragment starts at the beginning of a function body
+(e.g. the one of \texttt{main}) because in that case nobody would have emitted
+the observable $\ell(f)$.
+
+\paragraph{Measurable execution fragments and their cost prediction.}
+The first main theorem of CerCo deals with programs written in object code.
+It states that the execution cost of
certain execution fragments, that we call \emph{measurable fragments}, can be
computed from their weak trace by choosing a $k$ that assigns to any label
the cost of the instructions in its scope.
A structured fragment
$T = s_0 \To s_n$ is
measurable if it does not start or end in the middle of a basic block.
Ending in the middle of a block would mean having prepaid more instructions
than the ones executed, and starting in the middle would mean not paying any
instruction up to the first label emission.
Formally we require $o_0 \in \Labels$ (or equivalently
+computed from their weak trace by choosing the cost model $k$ that assigns to
+any label the cost (in clock cycles) of the instructions in its scope, and
+$0$ to function calls and $RET$ observables.
+
+\begin{theorem}
+ \label{static}
+ for all measurable fragment $T = s_0 \to^{*} s_n$,\\
+ $$\Delta_t := \verb+clock+_{s_n}  \verb+clock+_{s_0} = \Sigma_{o \in T} k(o)$$
+\end{theorem}
+
+An execution fragment $s_0 \to^{*} s_n$ is
+measurable if it is structured (up to a possible final \texttt{RETURN}) and
+if it does not start or end in the middle of a basic block.
+Ending in the middle of a block would mean that the last label encountered
+would have prepaid more instructions than the ones executed; starting in the
+middle would mean not paying any instruction up to the first label emission.
+
+Formally, $s_0 \to^{*} s_n$ is measurable iff $o_0 \in \Labels$ (or equivalently
in $s_0$ the program must emit a label) and either
$s_{n1}\to[RET]s_n$ or $s_n$ must be a label emission statement
(i.e.\ $s_n \to[L] s_{n+1}$).

+$s_0 \To s_{n1}$ and $s_{n1}\to[RET]s_n$ or
+$s_0 \To s_n$ and $s_n$ must be a label emission statement.
+
+\textbf{CSC: PROVA}
+% The theorem is proved by structural induction over the structured
+% trace, and is based on the invariant that
+% iff the function that computes the cost model has analysed the instruction
+% to be executed at $s_2$ after the one to be executed at $s_1$, and if
+% the structured trace starts with $s_1$, then eventually it will contain also
+% $s_2$. When $s_1$ is not a function call, the result holds trivially because
+% of the $s_1\exec s_2$ condition obtained by inversion on
+% the trace. The only non
+% trivial case is the one of function calls: the cost model computation function
+% does recursion on the first instruction that follows that function call; the
+% \verb+as_after_return+ condition of the \verb+tal_base_call+ and
+% \verb+tal_step_call+ grants exactly that the execution will eventually reach
+% this state.
+
+\paragraph{Weak similarity and cost invariance.}
Given two deterministic unstructured programming languages with their own
operational semantics, we say that a state $s_2$ of the second language
(weakly) simulates the state $s_1$ of the first iff the two unique weak traces
that originate from them are equal. If $s_1$ also (weakly) simulates $s_2$,
then the two states are weakly trace equivalent.

 or, equivalently
because of determinism, that they are weakly bisimilar.
+operational semantics, we say that two execution fragments are
+\emph{weakly trace equivalent} if their weak traces are equal.
+
+A compiler (pass) that preserves the program semantics also preserves weak
+traces and propagates measurability iff for every measurable
+fragment $T_1 = s_1 \to^{*} s_1'$ of the source code, the corresponding
+execution fragment $T_2 = s_2 \to^{*} s_2'$ of the object code is measurable
+and $T_1$ and $T_2$ are weakly trace equivalent. The very intuitive notion of
+``corresponding fragment'' is made clear in the forward simulation proof of
+preservation of the semantics of the program by saying that $s_2$ and $s_1$
+are in a certain relation. Clearly the property holds for a compiler if it
+holds for each compiler pass.
+
+Having proved in~\autoref{static} that the statically computed cost model is
+accurate for the object code, we get as a corollary that it is also accurate
+for the source code if the compiler preserves weak traces and
+propagates measurability. Thus it becomes possible to compute cost models
+on the object code, transfer it to the source code and then reason comfortably
+on the source code only.
+
+\begin{theorem}\label{preservation}
+Given a compiler that preserves weak traces and propagates measurability,
+for all measurable execution fragment $T_1 = s_1 \to^{*} s_1'$ of the source
+code such that $T_2 = s_2 \to^{*} s_2'$ is the corresponding fragment of the
+object code,
+
+$$\Delta_t := \verb+clock+_{s_2'}  \verb+clock+_{s_2} = \Sigma_{o \in T_2} k(o) = \Sigma_{o \in T_1} k(o)$$
+\end{theorem}
+
+\section{Forward simulation}
+\label{simulation}
+Because of \autoref{preservation}, to certify a compiler for the labelling
+approach we need to both prove that it respects the functional semantics of the
+program, and that it preserves weak traces and propagates measurability.
+The first property is standard and can be proved by means of a forward simulation argument (see for example~\cite{compcert}) that runs like this.
+First a relation between the corresponding
+source and target states is established. Then a lemma establishes
+a local simulation condition: given two states in relation, if the source
+one performs one step then the target one performs zero or more steps and
+the two resulting states are synchronized again according to the relation.
+Finally, the lemma is iterated over the execution trace to establish the
+final result.
+
+In principle, preservation of weak traces could be easily shown with the same
+argument (and also at the same time). Surprisingly, propagation of
+measurability cannot. What makes the standard forward
+simulation proof work is the fact that usually a compiler pass performs some
+kind of local or global analysis of the code followed by a compositional, order
+preserving translation of every instruction. In order to produce structured
+traces, however, code emission cannot be fully compositional any longer.
+
+For example, consider~requirement \ref{req3} that asks every function body
+to start with a label emission statement. Some compiler passes must
+add preambles to functions, for example to take care of the parameter passing
+convention. In order to not violate the requirement, the preamble must be
+inserted after the label emission. In the forward simulation proof, however,
+function call steps in the source language are simulated by the new function
+call followed by the execution of the preamble, and only at the end of the
+preamble the reached states are again in the expected relation. In the meantime,
+however, the object code has already performed the label emission statement,
+that still needs to be executed in the source code, breaking forward simulation.
+
+Another reason why the standard argument breaks is due to the requirement that
+function calls should yield back control after the calling point. This must be
+enforced just after
+
+\textbf{XXXXXXXXXXXXXXXXXX}
+
+A compiler preserves the program semantics by suppressing or introducing
+$\tau$ actions.
+
+Intuitively, it is because
+
+To understand why, consider the case of a function call
+and the pass that fixes the parameter passing conventions. A function
+call in the source code takes in input an arbitrary number of pseudoregisters (the actual parameters to pass) and returns an arbitrary number of pseudoregisters (where the result is stored). A function call in the target language has no
+input nor output parameters. The pass must add explicit code before and after
+the function call to move the pseudoregisters content from/to the hardware
+registers or the stack in order to implement the parameter passing strategy.
+Similarly, each function body must be augmented with a preamble and a postamble
+to complete/initiate the parameter passing strategy for the call/return phase.
+Therefore what used to be a call followed by the next instruction to execute
+after the function return, now becomes a sequence of instructions, followed by
+a call, followed by another sequence. The two states at the beginning of the
+first sequence and at the end of the second sequence are in relation with
+the status before/after the call in the source code, like in an usual forward
+simulation. How can we prove however the additional condition for function calls
+that asks that when the function returns the instruction immediately after the
+function call is called? To grant this invariant, there must be another relation
+between the address of the function call in the source and in the target code.
+This additional relation is to be used in particular to relate the two stacks.
+
+
+
+
% @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
%
@@ 1043,35 +1173,4 @@
% % trace_label_return \(s_1\) \(s_2\) \(\to\) list (as_cost_label S)
% % \end{alltt}
%
% \paragraph{Cost prediction on structured traces.}
%
% The first main theorem of CerCo about traces
% (theorem \verb+compute_max_trace_label_return_cost_ok_with_trace+)
% holds for the
% instantiation
% of the structured traces to the concrete status of object code programs.
% Simplifying a bit, it states that
% \begin{equation}\label{th1}
% \begin{array}{l}\forall s_1,s_2. \forall \tau: \verb+TLR+~s_1~s_2.~
% \verb+clock+~s_2 = \verb+clock+~s_1 +
% \Sigma_{\alpha \in \tau}\;k(\alpha)
% \end{array}
% \end{equation}
% where the cost model $k$ is statically computed from the object code
% by associating to each label $\alpha$ the sum of the cost of the instructions
% in the basic block that starts at $\alpha$ and ends before the next labelled
% instruction. The theorem is proved by structural induction over the structured
% trace, and is based on the invariant that
% iff the function that computes the cost model has analysed the instruction
% to be executed at $s_2$ after the one to be executed at $s_1$, and if
% the structured trace starts with $s_1$, then eventually it will contain also
% $s_2$. When $s_1$ is not a function call, the result holds trivially because
% of the $s_1\exec s_2$ condition obtained by inversion on
% the trace. The only non
% trivial case is the one of function calls: the cost model computation function
% does recursion on the first instruction that follows that function call; the
% \verb+as_after_return+ condition of the \verb+tal_base_call+ and
% \verb+tal_step_call+ grants exactly that the execution will eventually reach
% this state.
%
% \paragraph{Structured traces similarity and cost prediction invariance.}
@@ 1281,6 +1380,4 @@
% As should be expected, even though the rules are asymmetric $\approx$ is in fact
% an equivalence relation.
\section{Forward simulation}
\label{simulation}
We summarise here the results of the previous sections. Each intermediate