\section{Introduction}\label{intro-sec}
The formal description and certification of software components is
reaching a level of maturity, with impressive case studies
ranging from compilers to operating system kernels.  A
well-documented example is the proof of functional correctness of a
moderately optimizing compiler from a large subset of the $\C$ language
to a typical assembly language of the kind used in embedded systems
\cite{Leroy09}.

In the framework of the {\em Certified Complexity} ($\cerco$) project~\cite{Cerco10}, we
aim to refine this line of work by focusing on the issue of the {\em
execution cost} of the compiled code.  Specifically, we aim to build a
formally verified $\C$ compiler that, given a source program, automatically
produces a functionally equivalent object code plus an annotation
of the source code which is a sound and precise description
of the execution cost of the object code.

We target in particular the kind of $\C$ programs produced for embedded
applications; these programs are eventually compiled to binaries
executable on specific processors.  The current state of the art in
commercial products such as $\scade$~\cite{SCADE,Fornari10} is that the {\em reaction
time} of the program is estimated by means of abstract interpretation
methods (such as those developed by $\absint$~\cite{AbsInt,AbsintScade}) that operate on the
binaries.  These methods rely on specific knowledge of the
architecture of the processor and may require explicit annotations of
the binaries to determine the number of times a loop is iterated (see,
{\em e.g.},~\cite{W09} for a survey of the state of the art).

In this context, our aim is to produce a mechanically verified
compiler which can {\em lift}, in a provably correct way, the pieces of
information on the execution cost of the binary code to cost
annotations on the source $\C$ code.  Eventually, we plan to
manipulate the cost annotations with automatic tools such as
$\framac$~\cite{Frama-C} to infer more synthetic cost annotations.  In
order to carry out our project, we need a clear and flexible picture
of: (i) the meaning of cost annotations, (ii) the method to prove them
sound and precise, and (iii) the way such proofs can be composed.  Our
purpose here is to propose a methodology addressing these three
questions and to consider its concrete application to a simple toy
compiler and to a moderately optimizing untrusted $\C$ compiler.


\paragraph{Meaning of cost annotations}
The execution cost of the source programs we are interested in depends on
their control structure. Typically, the source programs are composed
of mutually recursive procedures and loops, and their execution cost
depends, up to some multiplicative constant, on the number of times
procedure calls and loop iterations are performed.
Producing a {\em cost annotation} of a source program amounts to:
\begin{itemize}

\item enriching the program with a collection of {\em global cost variables}
to measure resource consumption (time, stack size, heap size, $\ldots$);

\item injecting suitable code at some critical points (procedures, loops, $\ldots$) to keep track of the execution cost.

\end{itemize}

Thus producing a cost annotation of a source program $P$ amounts to
building an {\em annotated program} $\w{An}(P)$ which behaves as $P$
while self-monitoring its execution cost. In particular, if we do {\em
  not} observe the cost variables then we expect the annotated
program $\w{An}(P)$ to be functionally equivalent to $P$.  Notice that
in the proposed approach an annotated program is a program in the
source language. Therefore the meaning of the cost
annotations is automatically defined by the semantics of the source
language, and tools developed to reason about the source programs
can be directly applied to the annotated programs too.


\paragraph{Soundness and precision of cost annotations}
Suppose we have a functionally correct compiler $\cl{C}$ that
associates with a program $P$ in the source language a program
$\cl{C}(P)$ in the object language.  Further suppose we have some
obvious way of defining the execution cost of an object code. For
instance, we have a good estimate of the number of cycles needed for
the execution of each instruction of the object code.  Now the
annotation of the source program $\w{An}(P)$ is {\em sound} if its
prediction of the execution cost is an upper bound for the `real'
execution cost. Moreover, we say that the annotation is {\em precise}
with respect to the cost model if the {\em difference} between the
predicted and real execution costs is bounded by a constant which
depends on the program.

\paragraph{Compositionality}\label{comp-intro}
In order to master the complexity of the compilation process (and of its
verification), the compilation function $\cl{C}$ must be regarded as
the result of the composition of a certain number of program transformations
$\cl{C}=\cl{C}_{k} \comp \cdots \comp \cl{C}_{1}$.
When building a system of
cost annotations on top of an existing compiler,
a number of problems arise.
First, the estimated cost of executing a piece of source code
is determined only at the {\em end} of the compilation process.
Thus, while we are used to defining the compilation
functions $\cl{C}_i$ in increasing order (from left to right),
the annotation function $\w{An}$ is the result of
a progressive abstraction from
the object to the source code (from right to left).
Second, we must be able to foresee in the source language
the looping and branching points of the object code.
Missing a loop may lead to unsound cost annotations,
while missing a branching point may lead to rough cost
predictions. This means that we must have a rather good
idea of the way the source code will eventually be compiled
to object code.
Third, the definition of the annotation of the source
code depends heavily on {\em contextual information}. For instance,
the cost of the compiled code associated with
a simple expression such as $x+1$ will depend on the
place in the memory hierarchy where the variable $x$ is allocated.
A previous experience, described
in appendix \ref{direct-sec},
suggests that the process
of pushing `hidden parameters' into the definitions of cost annotations
and of directly manipulating numerical costs is error-prone and
produces complex proofs. For this reason, we advocate below a
`labelling approach' where costs are handled at
an abstract level and numerical values are produced at the very end of the
construction.
\paragraph{Labelling approach to cost annotations}\label{label-intro}
The `labelling' approach to the problem of building
cost annotations is summarized in the following diagram.

\newcolumntype{M}[1]{>{\raggedright}m{#1}}
\-
{\footnotesize

\begin{tabular}{M{8cm}||M{3cm}}
$$
\xymatrix{
% Row 1
  L_1 \\
%
% Row 2
  L_{1,\ell}
  \ar[u]^{\cl{I}}
  \ar@/^/[d]^{\w{er}_1}
  \ar[r]^{\cl{C}_1}
%
& L_{2,\ell}
  \ar[d]^{\w{er}_2}
%
& \ldots \hspace{0.3cm}\ar[r]^{\cl{C}_k}
%
& L_{k+1, \ell}
  \ar[d]^{\w{er}_{k+1}}  \\
%
% Row 3
  L_1
  \ar@/^/[u]^{\cl{L}}
  \ar[r]^{\cl{C}_1}
& L_2
& \ldots\hspace{0.3cm}
  \ar[r]^{\cl{C}_{k}}
& L_{k+1}
}
$$
&
$$
\begin{array}{ccc}
\w{er}_{i+1} \comp \cl{C}_{i} &= &\cl{C}_{i} \comp \w{er}_{i} \\

\w{er}_{1} \comp \cl{L} &= &\w{id}_{L_{1}} \\

\w{An} &= &\cl{I} \comp \cl{L}
\end{array}
$$
\end{tabular}
}
\-

For each language $L_i$ considered in the compilation process,
we define an extended {\em labelled} language $L_{i,\ell}$ and an
extended operational semantics. The labels are used to mark
certain points of the control flow. The semantics ensures
that whenever we cross a labelled control point, a labelled and observable
transition is produced.

For each labelled language there is an obvious function $\w{er}_i$
erasing all labels and producing a program in the corresponding
unlabelled language.
The compilation functions $\cl{C}_i$ are extended from the unlabelled
to the labelled languages so that they commute
with the erasure functions. Moreover, we lift the soundness properties of
the compilation functions from the unlabelled to the labelled languages
and transition systems.
A {\em labelling} $\cl{L}$ of the source language $L_1$ is just a function
such that $\w{er}_{1} \comp \cl{L}$ is the identity function.
An {\em instrumentation} $\cl{I}$ of the source labelled language $L_{1,\ell}$
is a function replacing the labels with suitable increments of, say,
a fresh global \w{cost} variable. Then an {\em annotation} $\w{An}$ of the
source program can be derived simply as the composition of the labelling
and the instrumentation functions: $\w{An} = \cl{I}\comp \cl{L}$.

Suppose $s$ is some adequate representation
of the state of a program. Let $P$ be a source program and suppose
that its annotation satisfies the following property:
\begin{equation}\label{STEP1}
(\w{An}(P),s[c/\w{cost}])  \eval s'[c+\delta/\w{cost}]
\end{equation}
where $c$ and $\delta$ are some non-negative numbers.
Then the definition of the instrumentation, and the fact that
the soundness proofs of the compilation functions have been lifted
to the labelled languages, allow us to conclude that
\begin{equation}\label{step2}
(\cl{C}(\cl{L}(P)),s[c/\w{cost}])  \eval (s'[c/\w{cost}],\lambda)
\end{equation}
where $\cl{C} = \cl{C}_{k} \comp \cdots \comp \cl{C}_{1}$ and
$\lambda$ is a sequence (or a multi-set) of labels whose `cost'
corresponds to the number $\delta$ produced by the
annotated program.
Then the commutation properties of the erasure and compilation functions
allow us to conclude that the {\em erasure} of the compiled
labelled code $\w{er}_{k+1}(\cl{C}(\cl{L}(P)))$ is actually equal to
the compiled code $\cl{C}(P)$ we are interested in.
Given this, the following question arises:
under which conditions is the sequence $\lambda$, {\em i.e.}, the increment $\delta$,
a sound and possibly precise description of the
execution cost of the object code?

To answer this question, we observe that the object code we are
interested in is some kind of assembly code and its control flow can
be easily represented as a control flow graph. The fact that we have
to prove the soundness of the compilation functions means that we have
plenty of information on the way the control flows in the compiled
code, in particular as far as procedure calls and returns are
concerned.  These pieces of information allow us to build a rather
accurate representation of the control flow of the compiled code at
run time.

The idea is then to perform two simple checks on the control flow
graph. The first check verifies that all loops go through a labelled node.
If this is the case then we can associate a finite cost with
every label and prove that the cost annotations are sound.
The second check amounts to verifying that all paths starting from
a label have the same cost. If this check is successful then we can
conclude that the cost annotations are precise.
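
As an illustration, the following OCaml sketch (OCaml being the
language of our prototype) shows one way the two checks, together with
the computation of the label costs, might be implemented on a control
flow graph. The graph representation and all helper names are
hypothetical, not those of the actual implementation.

{\footnotesize
\begin{verbatim}
(* Control flow graph: nodes are instruction positions 0..size-1,
   the root is node 0 (hypothetical representation). *)
type cfg = {
  size        : int;
  succ        : int -> int list;   (* successors of a node *)
  is_labelled : int -> bool;       (* does the node carry a label? *)
  cost        : int -> int;        (* estimated cost of the instruction *)
}

(* First check (soundness): node 0 is labelled and no loop avoids the
   labels, i.e. the subgraph of unlabelled nodes is acyclic. *)
let sound g =
  let state = Array.make g.size `White in
  let rec dfs i =                  (* cycle detection on unlabelled nodes *)
    if g.is_labelled i then true
    else match state.(i) with
      | `Grey  -> false            (* back edge: loop without a label *)
      | `Black -> true
      | `White ->
          state.(i) <- `Grey;
          let ok = List.for_all dfs (g.succ i) in
          state.(i) <- `Black; ok
  in
  g.is_labelled 0 && List.for_all dfs (List.init g.size (fun i -> i))

(* Costs of the paths from node i to the next label or to a leaf;
   terminates when the code is soundly labelled. *)
let rec path_costs g i =
  match g.succ i with
  | [] -> [g.cost i]
  | js ->
      List.concat_map
        (fun j ->
           if g.is_labelled j then [g.cost i]
           else List.map (( + ) (g.cost i)) (path_costs g j))
        js

(* Second check (precision) and cost of the label at node i: the
   labelling is precise at i when a unique cost remains; otherwise we
   soundly over-approximate with the maximum. *)
let label_cost g i =
  match List.sort_uniq compare (path_costs g i) with
  | [c] -> c
  | cs  -> List.fold_left max 0 cs
\end{verbatim}
}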

\paragraph{A toy compiler}\label{toy-intro}
As a first case study for the labelling approach to cost
annotations sketched above, we introduce a
{\em toy compiler}, summarized by the
following diagram.
{\footnotesize\[
\xymatrix{
\imp
\ar[r]^{\cl{C}}
& \vm
\ar[r]^{\cl{C'}}
& \mips
}
\]}%

The three languages considered can be briefly described as follows:
$\imp$ is a very simple imperative language
with pure expressions and branching and looping commands,
$\vm$ is an assembly-like language enriched with a stack, and
$\mips$ is a $\mips$-like assembly language with
registers and main memory.
The first compilation function $\cl{C}$ relies on the stack
of the $\vm$ language to implement expression evaluation, while
the second compilation function $\cl{C'}$ allocates (statically)
the base of the stack in the registers and the rest in main memory.
This is of course a naive strategy, but it suffices to expose
some of the problems that arise in defining a compositional approach.

\paragraph{A C compiler}\label{Ccompiler-intro}
As a second, more complex, case study we consider a $\C$
compiler. Even if the long-term goal of the $\cerco$ project is to
build a mechanically verified compiler using a proof assistant, we
have only developed an untrusted compiler prototype in $\ocaml$, in
order to first experiment with the scalability of our approach.  The
architecture of this prototype is summarized by the following
diagram:

{\footnotesize
\[
\begin{array}{cccccccccc}
&&\C &\arrow &\Clight &\arrow &\Cminor &\arrow &\RTLAbs &\qquad \mbox{(front end)}\\
                                              &&&&&&&&\downarrow \\
 \mips \text{ or } \intel &\leftarrow &\LIN &\leftarrow  &\LTL &\leftarrow  &\ERTL  &\leftarrow &\RTL &\qquad \mbox{(back end)}
\end{array}
\]
}%

The structure closely follows that of the $\compcert$
compiler~\cite{Leroy09}.  Notable differences are that some
compilation steps are fused, that the front end stops at $\RTLAbs$
(rather than $\Cminor$), and that we target the $\mips$ and Intel~$\intel$
assembly languages (rather than $\powerpc$).  These differences are
contingent on the way we built the compiler.  The compilation from
$\C$ to $\Clight$ relies on the $\cil$ front end~\cite{CIL02}. The one
from $\Clight$ to $\RTL$ has been programmed from scratch and is
partly based on the $\coq$ definitions available in the $\compcert$
compiler.  Finally, the back end from $\RTL$ to $\mips$ is based on a
compiler developed in $\ocaml$ for pedagogical
purposes~\cite{Pottier}.  We extended this back end to also target the
Intel~$\intel$~\cite{intel94}, an 8-bit micro-controller used in embedded
systems.\footnote{According to \cite{share}, in 2008 $\intel$ was the most widespread processor in the
microcontroller market, with a 19\% market share.}
The main optimizations the back end performs are
liveness analysis, register allocation, and graph compression. We
ran some benchmarks to ensure that our prototype implementation is
realistic. The results are given in
appendix~\ref{C-compiler-benchmarks} and the compiler is available
from the authors.

\paragraph{Organization}
The rest of the paper is organized as follows.
Section~\ref{label-toy-sec} describes the application of
the labelling approach to the toy compiler.
Section~\ref{C-label-sec} reports our experience in implementing
and testing the labelling approach on the $\C$ compiler.
Section~\ref{conclusion-sec} summarizes our contribution and
outlines some perspectives for future work.
Appendix~\ref{toy-compiler-sec} gives standard definitions that
are skipped in the presentation for the sake of conciseness.
Appendix~\ref{paper-proofs} sketches the proofs that have not
been mechanically checked in $\coq$, and
appendix~\ref{C-compiler-sec} provides some details on the
structure of the $\C$ compiler we have implemented.
Appendix~\ref{direct-sec} describes a {\em direct approach}
to cost annotations that in our experience fails to scale to
a realistic compiler architecture, and
appendix~\ref{related-sec} discusses informally some related approaches.
\section{Labelling approach for the toy compiler}\label{label-toy-sec}

\begin{figure}
{\footnotesize
\begin{tabular}{M{6cm}M{4cm}}
$$
{\footnotesize\begin{array}{rcl}
\multicolumn{3}{l}{\textrm{Syntax for }\imp}\\
\w{id}&::=& x\Alt y\Alt \ldots       \\ % (identifiers)
\w{n} &::=& 0 \Alt -1 \Alt +1 \Alt \ldots \\ % (integers)
v &::=& n \Alt \s{true} \Alt \s{false}  \\ % (values)
e &::=& \w{id} \Alt n \Alt e+e \\ % (numerical expressions)
b &::=& e<e  \\ % (boolean conditions)
S &::=& \s{skip} \Alt \w{id}:=e \Alt S;S \\
  &\Alt& \s{if} \ b \ \s{then} \ S \ \s{else} \ S \\
  &\Alt& \s{while} \ b \ \s{do} \ S \\ % (commands)
P &::= & \s{prog} \ S    \\ % (programs)
\end{array}}
$$
&
$$
{\footnotesize
\begin{array}{rcl}
\multicolumn{3}{l}{\textrm{Syntax for }\vm}\\
instr_\vm &::=& {\tt cnst} (n) \Alt {\tt var}(n) \\
  &\Alt& {\tt setvar}(n) \Alt {\tt add} \\
  &\Alt& {\tt branch}(k) \Alt {\tt bge}(k) \Alt {\tt halt}\\
\\[0.1em]
\multicolumn{3}{l}{\textrm{Syntax for }\mips}\\
instr_\mips &::=& {\tt loadi }\, R, n \Alt {\tt load }\, R, l \\
  &\Alt& {\tt store }\, R, l \Alt {\tt add }\, R, R, R \\
  &\Alt& {\tt branch }\, k \Alt {\tt bge }\, R, R, k \Alt {\tt halt}
\end{array}}
$$
\end{tabular}
}
\caption{Syntax definitions.}
\label{fig:syntaxes}
\end{figure}

A first case study for the labelling approach is performed on a toy
compiler. The syntax of the source, intermediate and target languages
is given in Figure~\ref{fig:syntaxes}. The semantics of $\imp$ is
defined over configurations $(S,K,s)$ where $S$ is a statement, $K$ is
a continuation and $s$ is a state. A {\em continuation} $K$ is a list
of commands which terminates with a special symbol \s{halt}. The
semantics of $\vm$ is defined over stack-based machine configurations
$C \vdash (i, \sigma, s)$ where $C$ is a program, $i$ is a program
counter, $\sigma$ is a stack and $s$ is a state. The semantics of
$\mips$ is defined over register-based machine configurations
$C \vdash (i, m)$ where $C$ is a program, $i$ is a program counter and
$m$ is a machine memory (with registers and main memory). The
compilation functions $\cl{C}$ from $\imp$ to $\vm$ and $\cl{C'}$ from
$\vm$ to $\mips$ are standard and thus omitted (see
Appendix~\ref{toy-compiler-sec} for formal details about the semantics and
the compilation chain).
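
Although $\cl{C}$ is standard, its shape matters for the labelling, so
we sketch it. The following OCaml fragment (a sketch, not the actual
development) compiles $\imp$ expressions and statements to $\vm$ code,
under one plausible convention for relative jumps: an instruction with
offset $k$ at position $i$ transfers control to $i+1+k$. The datatypes
mirror Figure~\ref{fig:syntaxes}; the environment \verb+index+ mapping
variables to $\vm$ slots is assumed.

{\footnotesize
\begin{verbatim}
type expr = Var of string | Int of int | Add of expr * expr
type cond = Lt of expr * expr
type stmt =
  | Skip
  | Assign of string * expr
  | Seq of stmt * stmt
  | If of cond * stmt * stmt
  | While of cond * stmt

type vm_instr =
  | Cnst of int | Varr of int | Setvar of int | VAdd
  | Branch of int | Bge of int | Halt

(* Expressions leave their value on top of the vm stack. *)
let rec comp_e index = function
  | Var x -> [Varr (index x)]
  | Int n -> [Cnst n]
  | Add (e1, e2) -> comp_e index e1 @ comp_e index e2 @ [VAdd]

(* e1 < e2: we assume bge k pops both operands and skips k
   instructions forward when the test fails, i.e. when e1 >= e2. *)
let comp_b index (Lt (e1, e2)) k =
  comp_e index e1 @ comp_e index e2 @ [Bge k]

let rec comp_s index = function
  | Skip -> []
  | Assign (x, e) -> comp_e index e @ [Setvar (index x)]
  | Seq (s1, s2) -> comp_s index s1 @ comp_s index s2
  | If (b, s1, s2) ->
      let c1 = comp_s index s1 and c2 = comp_s index s2 in
      comp_b index b (List.length c1 + 1)       (* false: go to else *)
      @ c1 @ [Branch (List.length c2)]          (* true: skip the else *)
      @ c2
  | While (b, s) ->
      let c = comp_s index s in
      let cb = comp_b index b (List.length c + 1) in
      cb @ c @ [Branch (- (List.length cb + List.length c + 1))]

let comp_prog index s = comp_s index s @ [Halt]  (* prog S *)
\end{verbatim}
}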

We apply the labelling approach introduced in section
\ref{label-intro} to this toy compiler, which results in the following
diagram.

{\footnotesize

\begin{tabular}{M{7cm}||M{3cm}}
$$
\xymatrix{
% Row 1
  \imp \\
%
% Row 2
  \imp_\ell
  \ar[u]^{\cl{I}}
  \ar@/^/[d]^{\w{er}_\imp}
  \ar[r]^{\cl{C}}
%
& \vm_\ell
  \ar[d]^{\w{er}_\vm}
  \ar[r]^{\cl{C}'}
%
& \mips_\ell
  \ar[d]^{\w{er}_{\mips}}  \\
%
% Row 3
  \imp
  \ar@/^/[u]^{\cl{L}}
  \ar[r]^{\cl{C}}
& \vm
  \ar[r]^{\cl{C}'}
& \mips
}
$$
&
$$
\begin{array}{ccc}
\w{er}_\vm \comp \cl{C} &= &\cl{C} \comp \w{er}_{\imp} \\

\w{er}_\mips \comp \cl{C'} &= &\cl{C'} \comp \w{er}_{\vm} \\

\w{er}_\imp \comp \cl{L} & = & id_{\imp} \\

\w{An}_\imp &= &\cl{I} \comp \cl{L}
\end{array}
$$
\end{tabular}}
\subsection{Labelled $\imp$}
We extend the syntax so that statements
can be labelled: $S::=  \ldots \Alt \ell:S$.
For instance,
$\ell:(\s{while} \ (n<x) \ \s{do} \ \ell:S )$ is a labelled command.
The small-step semantics of statements
is extended by the following rule.

{\footnotesize
\[
(\ell:S,K,s) \act{\ell} (S,K,s)
\]}

We denote with $\lambda,\lambda',\ldots$
finite sequences of labels. In particular, we denote with $\epsilon$ the empty sequence
and identify an unlabelled transition with a transition labelled with $\epsilon$.
Then the small-step reduction relation we have defined
on statements becomes a {\em labelled transition system}.
There is an obvious {\em erasure} function $\w{er}_{\imp}$
from the labelled language to the unlabelled one, which is the identity on expressions
and boolean conditions
and traverses commands removing all labels.
We derive a {\em labelled} big-step semantics as follows:
$(S,s)\eval (s',\lambda)$ if $(S,\s{halt},s) \act{\lambda_1} \cdots \act{\lambda_n}
                                             (\s{skip},\s{halt},s')$  and $\lambda =\lambda_1 \cdots \lambda_n$.
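
To make the labelled semantics concrete, here is a minimal,
self-contained OCaml sketch of a big-step evaluator for labelled
$\imp$ that returns the final state together with the trace $\lambda$
of labels crossed; the datatypes repeat those of
Figure~\ref{fig:syntaxes}, extended with the $\ell:S$ construct.

{\footnotesize
\begin{verbatim}
type label = string
type expr = Var of string | Int of int | Add of expr * expr
type cond = Lt of expr * expr
type stmt =
  | Skip
  | Assign of string * expr
  | Seq of stmt * stmt
  | If of cond * stmt * stmt
  | While of cond * stmt
  | Lab of label * stmt                   (* l : S *)

type state = (string * int) list
let update st x v = (x, v) :: List.remove_assoc x st

let rec eval st = function
  | Var x -> List.assoc x st
  | Int n -> n
  | Add (e1, e2) -> eval st e1 + eval st e2

let test st (Lt (e1, e2)) = eval st e1 < eval st e2

(* (S,s) evaluates to (s',lambda) *)
let rec exec st = function
  | Skip -> (st, [])
  | Assign (x, e) -> (update st x (eval st e), [])
  | Seq (s1, s2) ->
      let st1, l1 = exec st s1 in
      let st2, l2 = exec st1 s2 in
      (st2, l1 @ l2)
  | If (b, s1, s2) -> exec st (if test st b then s1 else s2)
  | While (b, s) as w ->
      if test st b then exec st (Seq (s, w)) else (st, [])
  | Lab (l, s) ->                         (* crossing the label emits it *)
      let st', trace = exec st s in
      (st', l :: trace)
\end{verbatim}
}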
\subsection{Labelled $\vm$}
We introduce a new instruction
$\s{nop}(\ell)$
whose semantics is defined as follows:

{\footnotesize
\[
C \Gives (i,\sigma,s) \act{\ell} (i+1,\sigma,s) \qquad \mbox{if }C[i]=\s{nop}(\ell)~.
\]}


The erasure function $\w{er}_{\vm}$ amounts to removing from a
$\vm$ code $C$ all the $\s{nop}(\ell)$ instructions and recomputing jumps
accordingly. Specifically, let $n(C,i,j)$ be the number of \s{nop} instructions
in the interval $[i,j]$. Then, assuming $C[i]=\s{branch}(k)$, we replace the offset
$k$ with an offset $k'$ determined as follows:

{\footnotesize
\[
k' =
\left\{
\begin{array}{ll}

k-n(C,i,i+k) &\mbox{if }k\geq 0 \\

k+n(C,i+1+k,i)   &\mbox{if }k<0

\end{array}
\right.
\]}
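
Concretely, the offset adjustment might be computed as in the
following OCaml sketch, where the code is an array and the predicate
recognizing $\s{nop}(\ell)$ instructions is a parameter (hypothetical
names).

{\footnotesize
\begin{verbatim}
(* n is_nop c i j: number of nop instructions in positions i..j. *)
let n is_nop (c : 'a array) i j =
  let k = ref 0 in
  for p = i to j do if is_nop c.(p) then incr k done;
  !k

(* New offset k' for a branch(k) found at position i, as above. *)
let adjust is_nop c i k =
  if k >= 0 then k - n is_nop c i (i + k)
  else k + n is_nop c (i + 1 + k) i
\end{verbatim}
}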
%
The compilation function $\cl{C}$ is extended to $\imp_\ell$ by defining:

{\footnotesize
\[
\begin{array}{llll}
\cl{C}(\ell:b,k) &=(\s{nop}(\ell))\cdot \cl{C}(b,k) \qquad
&\cl{C}(\ell:S)   &=   (\s{nop}(\ell))\cdot \cl{C}(S)~.
\end{array}
\]}

\begin{proposition}\label{labelled-sim-imp-vm}
For all commands $S$ in $\imp_\ell$ we have that:

\Defitem{(1)} $\w{er}_{\vm}(\cl{C}(S)) = \cl{C}(\w{er}_{\imp}(S))$.

\Defitem{(2)}
If $(S,s)\eval (s',\lambda)$ then $(\cl{C}(S),s)\eval(s',\lambda)$.
\end{proposition}

\begin{remark}
\label{rem:multi-set}
In the current formulation,
a sequence of transitions $\lambda$ in the source code
must be simulated by the same sequence of transitions
in the object code. However, in the actual computation
of the costs, the order of the labels occurring in the sequence is
immaterial. Therefore one may consider a more relaxed notion of simulation
where $\lambda$ is a multi-set of labels.
\end{remark}

\subsection{Labelled $\mips$}
The labelled extension of $\mips$ is similar to that of $\vm$.
We add an instruction $\s{nop} \ \ell$ whose semantics is defined
as follows:

{\footnotesize
\[
\begin{array}{ll}
M\Gives (i,m) \act{\ell} (i+1,m)    &\mbox{if }M[i]=(\s{nop} \ \ell)~.
\end{array}
\]}
%
The {\em erasure function} $\w{er}_{\mips}$ is also similar to that
of $\vm$, as it amounts to removing from a $\mips$ code all the $(\s{nop} \ \ell)$
instructions and recomputing jumps accordingly.
The compilation function $\cl{C'}$ is extended to $\vm_\ell$ by
simply translating $\s{nop}(\ell)$ as $(\s{nop} \ \ell)$:
%

{\footnotesize
\[
\begin{array}{ll}
\cl{C'}(i,C) = (\s{nop} \ \ell)
&\mbox{if }C[i]=\s{nop}(\ell)
\end{array}
\]}
The evaluation predicate for labelled $\mips$ is defined as
$(M,m)\eval (m',\lambda)$
if
$M\Gives (0,m) \act{\lambda_1} \cdots \act{\lambda_n}  (j,m')$,
$\lambda =\lambda_1 \cdots \lambda_n$  and $M[j]=\s{halt}$.
The following proposition relates $\vm_\ell$ code and its compilation,
and is similar to proposition~\ref{labelled-sim-imp-vm}.
Here $m \real \sigma, s$ means ``the low-level
$\mips$ memory~$m$ realizes the $\vm$ stack~$\sigma$ and state $s$''.
%
\begin{proposition}\label{sim-vm-mips-prop}
Let $C$ be a $\vm_\ell$ code. Then:

\Defitem{(1)} $\w{er}_{\mips}(\cl{C'}(C)) = \cl{C'}(\w{er}_{\vm}(C))$.

\Defitem{(2)}
If $(C,s) \eval (s',\lambda)$ and $m\real \epsilon,s$ then
$(\cl{C'}(C),m) \eval (m',\lambda)$ and $m'\real \epsilon,s'$.
\end{proposition}

\subsection{Labellings and instrumentations}
Assuming a function $\kappa$ which associates an integer with each
label, and a distinct variable \w{cost} which does not occur in the program $P$ under consideration,
we abbreviate with $\w{inc}(\ell)$ the assignment $\w{cost}:=\w{cost}+\kappa(\ell)$.
Then we define the instrumentation $\cl{I}$ (relative to $\kappa$ and $\w{cost}$)
as follows:
\[
\cl{I}(\ell:S) = \w{inc}(\ell) ; \cl{I}(S) ~.
\]
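
On the datatypes of the labelled evaluator sketched earlier, the
instrumentation could read as follows ($\kappa$ is the given
label-cost function; this is a sketch, not the code of the prototype).

{\footnotesize
\begin{verbatim}
(* inc(l) : cost := cost + kappa(l) *)
let inc kappa l = Assign ("cost", Add (Var "cost", Int (kappa l)))

(* I replaces each label by the increment and distributes over the
   other constructs. *)
let rec instrument kappa = function
  | Lab (l, s)     -> Seq (inc kappa l, instrument kappa s)
  | Seq (s1, s2)   -> Seq (instrument kappa s1, instrument kappa s2)
  | If (b, s1, s2) -> If (b, instrument kappa s1, instrument kappa s2)
  | While (b, s)   -> While (b, instrument kappa s)
  | (Skip | Assign _) as s -> s

(* kappa extended to a sequence of labels. *)
let kappa_seq kappa = List.fold_left (fun a l -> a + kappa l) 0
\end{verbatim}
}
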
The function $\cl{I}$ just distributes over the other operators of the language.
We extend the function $\kappa$ from labels to sequences of labels by defining
$\kappa(\ell_1,\ldots,\ell_n) = \kappa(\ell_1)+\cdots +\kappa(\ell_n)$.
The instrumented $\imp$ program relates to the labelled one as follows.
%
\begin{proposition}\label{lab-instr-erasure-imp}
Let $S$ be an $\imp_\ell$ command. If
$(\cl{I}(S),s[c/\w{cost}]) \eval s'[c+\delta/\w{cost}]$
then
$\xst{\lambda}{\kappa(\lambda)=\delta \mbox{ and }(S,s[c/\w{cost}])\eval(s'[c/\w{cost}],\lambda)}$.
\end{proposition}
%
\begin{definition}\label{labelling-def}
A {\em labelling} is a function $\cl{L}$ from an unlabelled language to
the corresponding labelled one such that
$\w{er}_{\imp} \comp \cl{L}$ is the identity function on the
$\imp$ language.
\end{definition}
%
\begin{proposition}\label{global-commutation-prop}
For any labelling function $\cl{L}$ and $\imp$ program $P$, the following holds:
\begin{equation}
\w{er}_{\mips}(\cl{C'}(\cl{C}(\cl{L}(P)))) = \cl{C'}(\cl{C}(P))~.
\end{equation}
\end{proposition}
%
\begin{proposition}\label{instrument-to-label-prop}
Given a function $\kappa$ for the labels and a labelling function $\cl{L}$,
for all programs $P$ of the source language, if
$(\cl{I}(\cl{L}(P)),s[c/\w{cost}]) \eval s'[c+\delta/\w{cost}]$
and $m\real \epsilon,s[c/\w{cost}]$ then
$(\cl{C'}(\cl{C}(\cl{L}(P))),m) \eval (m',\lambda)$,
$m'\real \epsilon,s'[c/\w{cost}]$ and
$\kappa(\lambda)=\delta$.
\end{proposition}

\subsection{Sound and precise labellings}\label{sound-label-sec}
With any $\mips_\ell$ code $M$ we can associate a directed and rooted
(control flow) graph whose nodes are the instruction positions $\set{0,\ldots,|M|-1}$,
whose root is the node $0$, and
whose directed edges correspond to the possible transitions between instructions.
We say that a node is labelled if it corresponds to an instruction $\s{nop}
\ \ell$.
%
\begin{definition}
A {\em simple path} in a $\mips_\ell$ code $M$ is a directed finite path in the graph associated with
$M$ where the first node is labelled, the last node is the predecessor of either a labelled node or a leaf,
and all the other nodes are unlabelled.
\end{definition}
%
\begin{definition}
A $\mips_\ell$ code $M$ is {\em soundly labelled} if,
in the associated graph,
the root node $0$ is labelled
and there are no loops that do not go through a labelled node.
\end{definition}
%
In a soundly labelled graph there are finitely many simple paths.
Thus, given a soundly labelled $\mips$ code $M$, we can associate
with every label $\ell$ a number $\kappa(\ell)$ which is the
maximum (estimated) cost of executing a simple
path whose first node is labelled with $\ell$.
We stress that in the following we assume that
the cost of a simple path is proportional to the number of $\mips$
instructions that are crossed in the path.
%
\begin{proposition}\label{sound-label-prop}
If $M$ is soundly labelled and $(M,m) \eval (m',\lambda)$ then the cost
of the computation is bounded by $\kappa(\lambda)$.
\end{proposition}
%
Thus for a soundly labelled $\mips$ code the sequence of labels associated
with a computation provides significant information on the execution cost.
%
\begin{definition}
We say that a soundly labelled code is {\em precise}
if for every label $\ell$ in the code, the simple
paths starting from a node labelled with $\ell$ have
the same cost.
\end{definition}
%
In particular, a code is precise if we can associate at most one simple
path with every label.
%
\begin{proposition}\label{precise-label-prop}
If $M$ is precisely labelled and $(M,m) \eval (m',\lambda)$ then the cost
of the computation is exactly $\kappa(\lambda)$.
\end{proposition}
%
The next point we have to check is that
there are labelling functions (of the source code)
such that the compilation functions do produce sound and possibly
precise labelled $\mips$ code.  To discuss this point, we introduce in Table
\ref{labelling} two labelling functions $\cl{L}_s$ and $\cl{L}_p$ for the
$\imp$ language. The first labelling relies on just one label, while
the second relies on a function ``\w{new}'', which is meant to return fresh
labels, and on an auxiliary function $\cl{L'}_p$, which returns a labelled command
and a binary directive $d\in \set{0,1}$. If $d=1$
then the command that follows (if any) must be labelled.
%
\begin{table}
{\footnotesize
\[
\begin{array}{ll}
\cl{L}_s(\s{prog} \ S) &=  \s{prog}\  \ell: \cl{L}_s(S) \\

\cl{L}_s(\s{skip})     &=\s{skip} \\

\cl{L}_s(x:=e)         &=x:=e \\

\cl{L}_s(S;S')         &= \cl{L}_s(S);\cl{L}_s(S') \\

\cl{L}_s(\s{if} \ b \ \s{then} \ S_1 \ \s{else} \ S_2) &=
\s{if} \ b \ \s{then} \ \cl{L}_s(S_1) \ \s{else} \ \cl{L}_s(S_2) \\

\cl{L}_s(\s{while} \ b \ \s{do} \ S) &= \s{while} \ b \ \s{do} \ \ell:\cl{L}_s(S) \\  \\

\cl{L}_p(\s{prog} \ S) &= \s{prog}\   \cl{L}_p(S) \\

\cl{L}_p(S) &= \w{let} \ \ell=\w{new}, \ (S',d) =  \cl{L'}_p(S)  \ \w{in} \ \ell: S' \\

\cl{L'}_p(S)   &=(S,0)\quad \mbox{if }S=\s{skip}\mbox{ or }S=(x:=e) \\

\cl{L'}_p(\s{if} \ b \ \s{then} \ S_1 \ \s{else} \ S_2)
&= (\s{if} \ b \ \s{then} \ \cl{L}_p(S_1) \ \s{else} \ \cl{L}_p(S_2), 1) \\

\cl{L'}_p(\s{while} \ b \ \s{do} \ S)
&= (\s{while} \ b \ \s{do} \ \cl{L}_p(S), 1) \\

\cl{L'}_p(S_1;S_2)         &= \w{let} \ (S'_1,d_1)= \cl{L'}_p(S_1), \ (S'_2,d_2)= \cl{L'}_p(S_2) \ \w{in} \\
                           &\qquad \w{case} \ d_1\\
                           &\qquad 0: (S'_1;S'_2,d_2) \\
                           &\qquad 1: \w{let} \ \ell= \w{new} \ \w{in} \ (S'_1;\ell:S'_2,d_2)

\end{array}
\]}
\caption{Two labellings for the $\imp$ language} \label{labelling}
\end{table}
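
For comparison, here is how $\cl{L}_p$ of Table~\ref{labelling} might
be written in OCaml on the labelled $\imp$ datatypes used above, with
a counter-based function playing the role of \w{new} (a sketch under
these assumptions).

{\footnotesize
\begin{verbatim}
let new_label =
  let c = ref 0 in
  fun () -> incr c; Printf.sprintf "_l%d" !c

(* lab_p' returns the labelled statement and the directive d:
   d = 1 means the statement that follows (if any) must be labelled. *)
let rec lab_p' s =
  match s with
  | Skip | Assign _ -> (s, 0)
  | If (b, s1, s2)  -> (If (b, lab_p s1, lab_p s2), 1)
  | While (b, s1)   -> (While (b, lab_p s1), 1)
  | Seq (s1, s2) ->
      let s1', d1 = lab_p' s1 in
      let s2', d2 = lab_p' s2 in
      if d1 = 0 then (Seq (s1', s2'), d2)
      else (Seq (s1', Lab (new_label (), s2')), d2)
  | Lab (_, _) -> (s, 0)          (* the input is assumed unlabelled *)

and lab_p s =
  let s', _ = lab_p' s in
  Lab (new_label (), s')
\end{verbatim}
}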
%
\begin{proposition}\label{lab-sound}
For all $\imp$ programs $P$:

\Defitem{(1)}
$\cl{C'}(\cl{C}(\cl{L}_s(P)))$
is a soundly labelled $\mips$ code.

\Defitem{(2)}
$\cl{C'}(\cl{C}(\cl{L}_p(P)))$ is a
soundly and precisely labelled $\mips$ code.
\end{proposition}
%
For an example of a command which is not soundly labelled, consider
$\ell: \s{while} \ 0<x \ \s{do} \ x:=x+1$, which, when compiled, produces
a loop that does not go through any label.  On the other hand, for an
example of a program which is not precisely labelled, consider
$\ell:(\s{if} \ 0<x \ \s{then} \ x:=x+1 \ \s{else} \ \s{skip})$.  In the compiled code, we
find two simple paths associated with the label $\ell$, whose costs will
in general be quite different.

Once a sound and possibly precise labelling $\cl{L}$ has been
designed, we can determine the cost of each label and define an instrumentation
$\cl{I}$ whose composition with $\cl{L}$ will produce the desired
cost annotation.

\begin{definition}
Given a labelling function $\cl{L}$ for the source language $\imp$
and a program $P$ in the $\imp$ language, we define an
annotation for the source program as follows:
\[
\w{An}_{\imp}(P) = \cl{I}(\cl{L}(P))~.
\]
\end{definition}


\begin{proposition}\label{ann-correct}
If $P$ is a program and $\cl{C'}(\cl{C}(\cl{L}(P)))$ is soundly (soundly and precisely)
labelled, then $(\w{An}_{\imp}(P),s[c/\w{cost}]) \eval s'[c+\delta/\w{cost}]$
and $m\real \epsilon,s[c/\w{cost}]$ entail that
$(\cl{C'}(\cl{C}(P)),m) \eval m'$, $m'\real \epsilon,s'[c/\w{cost}]$,
and the cost of the execution is bounded by (is exactly) $\delta$.
\end{proposition}

To summarize, producing sound and precise labellings is mainly a
matter of designing the labelled source language so that
the labelling is sufficiently {\em fine-grained}.
For instance, in the toy compiler, it is enough to label commands, while it is not necessary to label
boolean conditions and expressions.

Besides soundness and precision, a third criterion by which to evaluate
labellings is that they do not introduce too many unnecessary labels.
We call this property {\em economy}. In practice, it seems that one
can first produce a sound and possibly precise labelling and then
apply heuristics to eliminate unnecessary labels.

We stress that our approach to labelling is based on the hypothesis that we can
obtain precise information on the execution time of each instruction
of the generated binary code.  This hypothesis is indeed realised in
the processors of the $\intel$ family we are considering. On the other
hand, the execution time of instructions running on more complex
architectures including, {\em e.g.}, cache memory or pipelines is
much less predictable.  This lack of precision may somehow be
compensated by analysing the worst-case execution time of (long) sequences of
instructions. This point is further developed in appendix \ref{related-sec}.

\section{Labelling approach for the $\C$ compiler}\label{C-label-sec}

This section informally describes the labelled extensions of the languages in
the compilation chain (see Appendix~\ref{C-compiler-sec} for details),
the way the labels are propagated by the compilation functions, the
labelling of the source code, the hypotheses on the control flow of
the labelled assembly code and the verification that we
perform on it, the way we build the instrumentation, and finally the
way the labelling approach has been tested. The process of annotating
a $\Clight$ program using the labelling approach is detailed in the
following sections. It is summarized by the following steps (sketched
as code right after the list).

{\begin{enumerate}\footnotesize
  \item
  Label the input $\Clight$ program.

  \item
  Compile the labelled $\Clight$ program in the labelled world. This
  produces a labelled assembly code.

  \item
  For each label of the labelled assembly code, compute the cost of the
  instructions under its scope and generate a \emph{label-cost mapping}. An
  unlabelled assembly code --- the result of the compilation --- is obtained by
  removing the labels from the labelled assembly code.

  \item
  Add a fresh \emph{cost variable} to the labelled $\Clight$ program and
  replace the labels by an increment of this cost variable according to the
  label-cost mapping. The result is an \emph{annotated} $\Clight$ program with no
  label.
\end{enumerate}}
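
In OCaml terms, the four steps compose as in the following sketch;
every name here is a hypothetical stand-in for a phase of the
prototype, not its actual interface.

{\footnotesize
\begin{verbatim}
(* Hypothetical interfaces for the four phases. *)
module type ANNOTATOR = sig
  type clight              (* source programs *)
  type clight_l            (* labelled source programs *)
  type asm_l               (* labelled assembly *)
  type asm                 (* unlabelled assembly *)
  type label

  val label      : clight -> clight_l                    (* step 1 *)
  val compile    : clight_l -> asm_l                     (* step 2 *)
  val cost_map   : asm_l -> (label -> int)               (* step 3 *)
  val erase      : asm_l -> asm
  val instrument : (label -> int) -> clight_l -> clight  (* step 4 *)
end

(* The whole process, given such an implementation: it returns the
   annotated source together with the unlabelled object code. *)
module Annotate (A : ANNOTATOR) = struct
  let run p =
    let pl = A.label p in
    let cl = A.compile pl in
    let k  = A.cost_map cl in
    (A.instrument k pl, A.erase cl)
end
\end{verbatim}
}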

\subsection{Labelled languages}
Both the $\Clight$ and $\Cminor$ languages are extended in the same way,
by labelling both statements and expressions (by comparison, in the
toy language $\imp$ we just labelled statements).
The labelling of expressions aims to capture precisely their execution cost.
Indeed, $\Clight$ and $\Cminor$ include expressions such
as $a_1?a_2:a_3$ whose evaluation cost depends on the boolean value of $a_1$.
As both languages are extended in the same way, the extended
compilation does nothing more than sending $\Clight$ labelled
statements and expressions to those of $\Cminor$.

The labelled versions of $\RTLAbs$ and the languages in the back end
simply add a new instruction whose semantics is to emit a label
without modifying the state. For the CFG-based languages
($\RTLAbs$ to $\LTL$), this new instruction is $\s{emit}\ \w{label} \rightarrow
\w{node}$. For $\LIN$, $\mips$ and $\intel$, it is $\s{emit}\ \w{label}$. The translation
of these label instructions is immediate.
In $\mips$ and $\intel$, we also
rely on a reserved label $\s{begin\_function}$ to pinpoint the
beginning of a function's code (cf. section \ref{fun-call-sec}).
1151%% $\s{begin\_function}$. \marginpar{Is this really a new instruction or just a reserved label?}
1152%% Its semantics has no effect on the current environment,
1153%% it is only used as a syntactic marker to easily pinpoint the start of a function
1154%% in the $\mips$ code. We will explain the reason for such an instruction when
1155%% detailing the labelling of function calls.
1156
1157\subsection{Labelling of the source language}
1158As for the toy compiler (cf. end of section \ref{label-toy-sec}), the goals of a labelling are
1159soundness, precision, and possibly economy.
1160%%
1161%% The goal here is to add labels in the source program that cover every reachable
1162%% instruction of the program and avoid unlabelled loops; this can be seen as a \emph{soundness}
1163%% property. Another important point is \emph{precision}, meaning that a label
1164%% might cover several paths to the next labels only if those paths have equal
1165%% costs. Several labellings might satisfy the soundness and precision conditions,
1166%% but from an engineering point of view, a labelling that makes obvious which
1167%% instruction is under the scope of which label would be better. There is a thin
1168%% line to find between too many labels --- which may obfuscate the code --- and
1169%% too few labels --- which makes it harder to see which instruction is under the
1170%% scope of which label. The balance leans a bit towards the \emph{economy} of
1171%% labels because the cost of executing an assembly instruction often depends on
1172%% its context (for instance by the status of the cache memory).
We explain our labelling by considering the constructions of $\Clight$ and their compilation to
assembly code.

\paragraph{Sequential instructions}
A sequence of $\Clight$ instructions that compiles to sequential assembly code,
such as a sequence of assignments, can be handled by a single
label, which covers the unique execution path.
\paragraph{Ternary expressions and conditionals}
Most $\Clight$ expressions compile to sequential assembly code.
{\em Ternary expressions}, which introduce a branching in the control
flow, are one exception. In this case, we achieve precision by associating a label with each branch.
This is similar to the treatment of the conditional already discussed
in section \ref{label-toy-sec}. As for the $\Clight$ operations \texttt{\&\&} and \texttt{||},
which have a lazy semantics, they are transformed into ternary expressions
\emph{before} computing the labelling.
\paragraph{Loops}
Loops in $\Clight$ are guarded by a condition. Following the arguments for the
previous cases, we add two labels when encountering a loop construct: one label
to start the loop's body, and one label when exiting the loop.
This is similar to the treatment of \s{while} loops discussed in section \ref{label-toy-sec}, and it
is enough to guarantee that the loop in the compiled code goes through a label.
\paragraph{Program Labels and Gotos}
In $\Clight$, program labels and gotos are intraprocedural. Their only effect on
the control flow of the resulting assembly code is to potentially introduce an
unguarded loop. This loop must contain at least one cost label in order to
satisfy the soundness condition, which we ensure by adding a cost label right
after a program label.

\begin{center}
  {\footnotesize
    \begin{tabular}{lllll}
      $\Clight$ & $\xrightarrow{Labelling}$ & Labelled $\Clight$ &
      $\xrightarrow{Compilation}$ & Labelled $\mips$\\\\
      \codeex{lbl:} & & \codeex{lbl:} & & \codeex{lbl:}\\
      \codeex{i++;} & & \textbf{\_cost:} & &
      \codeex{emit} \textbf{\_cost}\\
      \codeex{...} & & \codeex{i++;} & &
      \codeex{li\ \ \ \$v0, 1}\\
      \codeex{goto lbl;} & & \codeex{...} & &
      \codeex{add\ \ \$a0, \$a0, \$v0}\\
      & & \codeex{goto lbl;} & & \codeex{...}\\
      & & & & \codeex{j\ \ \ \ lbl}
    \end{tabular}}
\end{center}

1315\paragraph{Function calls}\label{fun-call-sec}
1316Function calls in assembly code are performed by indirect jumps, the
1317address of the callee being in the memory. In the general case, this
1318address cannot be inferred statically. Even though the destination
1319point of a function call is unknown, when the considered assembly code
1320has been produced by our compiler, we know for a fact that this
1321function ends with a return statement that transfers the control back
1322to the instruction following the function call in the caller. As a
1323result, we treat function calls according to the following global
1324invariants of the compilation: (1) the instructions of a function are
1325covered by the labels inside this function, (2) we assume a function
1326call always returns and runs the instruction following the call.
1327Invariant (1) entails in particular that each function must contain at
1328least one label. To ensure this, we simply add a starting label in
1329every function definition. The example below illustrates this point:

\begin{center}
  {\footnotesize
    \begin{tabular}{lllll}
      $\Clight$ & $\xrightarrow{Labelling}$ & Labelled $\Clight$ &
      $\xrightarrow{Compilation}$ & Labelled $\mips$\\\\
      \codeex{void f () \{} & & \codeex{void f () \{} & & \codeex{f\_start:}\\
      \codeex{\ \ f}\emph{'s body} & & \textbf{\ \ \_cost:} & &
      \emph{Frame Creation}\\
      \codeex{\}} & & \codeex{\ \ f}\emph{'s body} & & \emph{Initializations}\\
      & & \codeex{\}} & & \codeex{emit} \textbf{\_cost} \\
      & & & & \codeex{f}\emph{'s body}\\
      & & & & \emph{Frame Deletion}\\
      & & & & \codeex{return}
    \end{tabular}}
\end{center}

We notice that some instructions in the assembly code are
inserted \emph{before} the first label is emitted. These instructions
relate to the frame creation and/or variable initializations, and are
composed of sequential instructions (no branching). To deal with this
issue, we adopt the convention that the instructions that precede the
first label in a function code are actually under the scope of the
first label.  Invariant (2) is of course an over-approximation of the
program behavior, as a function might fail to return because of an
infinite loop.  In this case, the proposed labelling remains correct:
it just assumes that the instructions following the function call will
be executed, and takes their cost into consideration. The final
computed cost is still an over-approximation of the actual cost.

\subsection{Verifications on the object code}

The labelling previously described has been designed so that the
compiled assembly code satisfies the soundness and precision
conditions. However, we do not need to prove this; instead, we have to
devise an algorithm that checks the conditions on the compiled code.
The algorithm assumes a correct management of function calls in the
compiled code. In particular, when we call a function we always jump
to the first instruction of the corresponding code segment, and when we
return we always jump to an instruction that follows a call.  We
stress that this is a reasonable hypothesis that is essentially
subsumed by the proof that the object code {\em simulates} the source
code.

In our current implementation, we check the soundness and the
precision conditions while building at the same time the label-cost
mapping. To this end, the algorithm takes the following main steps
(the cost computation of the last two steps is sketched after the list).
{\footnotesize
\begin{itemize}
\item First, for each function a control flow graph is built.

\item For each graph, we check whether there is a unique
label that is reachable from the root by a unique path.
This unique path corresponds to the instructions generated by
the calling conventions as discussed in section \ref{fun-call-sec}.
We shift the occurrence of the label to the root of the graph.

\item
By a strongly connected components algorithm,
we check whether every loop in the graphs goes through at least one label.

\item We perform a (depth-first) search of the graph. Whenever we reach a
  labelled node, we perform a second (depth-first) search that stops at labelled
  nodes and computes an upper bound on the cost of the occurrence of the label. Of
  course, when crossing a branching instruction, we take the maximum cost of the
  branches.  When the second search stops we update the current cost of the
  label-cost mapping (by taking a maximum) and we continue the first search.

\item Warning messages are emitted whenever the maximum is taken between two
  different values, as in this case the precision condition may be violated.

\end{itemize}}
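
To fix ideas, the last two steps admit a compact sketch in $\ocaml$, our
implementation language. The representation below (one record per control flow
graph node) and all names are illustrative assumptions, not the data structures
of our actual implementation; the strongly connected components check is
subsumed here by detecting a revisit between two labels during the search.

{\footnotesize
\begin{verbatim}
(* Sketch: [succ] lists the CFG successors of a node, [cost] the
   cycle count of its instruction, [label] the cost label emitted
   there, if any.  Revisiting a node before crossing a label means
   a loop without labels, i.e. the soundness condition fails. *)
type node = { succ : int list; cost : int; label : string option }

(* maximum over the branches, warning when they disagree *)
let max_cost costs =
  let m = List.fold_left max 0 costs in
  if List.exists (fun c -> c <> m) costs then
    prerr_endline "warning: precision condition may be violated";
  m

(* upper bound on the cost of the simple paths starting right after
   labelled node [start] and stopping at the next labelled node *)
let path_cost (g : node array) (start : int) : int =
  let rec visit seen i =
    if List.mem i seen then failwith "unsound: loop without label"
    else match g.(i).label with
    | Some _ -> 0                  (* stop at the next label *)
    | None ->
        g.(i).cost + max_cost (List.map (visit (i :: seen)) g.(i).succ)
  in
  g.(start).cost + max_cost (List.map (visit [start]) g.(start).succ)
\end{verbatim}}

The label-cost mapping is then updated, for each label, with the maximum of
\codeex{path\_cost} over the occurrences of that label.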

In the particular case of the $\intel$, which is a very basic
micro-controller, the number of cycles required by each instruction is
fully specified. Therefore, if the labelling is precise, the cost
associated with a label is the exact number of cycles necessary to
execute every simple execution path starting from that label.

\subsection{Building the cost annotation}

Once the label-cost mapping is computed, instrumenting the labelled source code is an easy task. A fresh global variable, which we call the
\emph{cost variable}, is added to the source program with the purpose of holding the cost value; it is initialized at the very beginning of the \codeex{main} function.
Then, every label is replaced by an increment of the
cost variable according to the label-cost
mapping.
Following this replacement, the cost labels disappear and the result is
a $\Clight$ program with annotations in the form of assignments.
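
As an illustration, on a toy statement syntax the replacement can be sketched in
$\ocaml$ as follows; the actual implementation performs the same traversal on the
$\Clight$ abstract syntax, and the names below, including the cost variable
\codeex{\_\_cost}, are illustrative assumptions.

{\footnotesize
\begin{verbatim}
(* Sketch of the instrumentation on a toy statement syntax; [kappa]
   is the label-cost mapping computed on the object code. *)
type stmt =
  | Instr of string                (* any atomic statement *)
  | Seq of stmt * stmt
  | CostLabel of string * stmt     (* _cost_l : S *)

let rec instrument kappa = function
  | Instr _ as s -> s
  | Seq (s1, s2) -> Seq (instrument kappa s1, instrument kappa s2)
  | CostLabel (l, s) ->
      (* the label disappears; an increment of the cost variable,
         by the cost kappa(l) of the label, remains *)
      Seq (Instr (Printf.sprintf "__cost = __cost + %d;"
                    (List.assoc l kappa)),
           instrument kappa s)
\end{verbatim}}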

There is one final problem: labels inside expressions.
As we already mentioned, $\Clight$ does not allow
writing side-effecting operations --- such as cost
increments --- inside expressions. To cope with this restriction, we first produce
an instrumented $\C$ program --- with side-effects in expressions
--- that we translate back to $\Clight$ using $\cil$.
This process is summarized below.

\begin{center}
  {\footnotesize
    \begin{tabular}{lllll}
      $\left . \begin{array}{l}
      \text{Labelled $\Clight$}\\
      \text{label-cost mapping}
    \end{array} \right \}$ & $\xrightarrow{\text{Instrumentation}}$ &
      Instrumented $\C$ & $\xrightarrow{\cil}$ & Instrumented $\Clight$
    \end{tabular}}
\end{center}

\subsection{Testing}
It is desirable to test the coherence of the labelling from $\Clight$
to assembly code.  To this end, each labelled language comes with an
interpreter that produces the trace of the labels encountered during
the computation.

Then, a naive approach is to test the equality of the traces
produced by the program at the different stages of the compilation.
Our current implementation passes this kind of test.
For some optimizations that may re-order computations, the weaker
condition mentioned in remark~\ref{rem:multi-set} could be considered
(see the sketch below).
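
A minimal sketch of these two checks in $\ocaml$, where the interpreter functions
\codeex{run\_src} and \codeex{run\_obj} are assumptions standing for the
interpreters of two stages of the compilation chain:

{\footnotesize
\begin{verbatim}
(* naive coherence test: the label traces must be equal *)
let traces_agree (run_src : unit -> string list)
                 (run_obj : unit -> string list) : bool =
  run_src () = run_obj ()

(* weaker check for optimisations that may reorder computations:
   compare the traces as multisets rather than as sequences *)
let traces_agree_up_to_order t1 t2 =
  List.sort compare t1 = List.sort compare t2
\end{verbatim}}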

Furthermore, in the case of the $\intel$ back-end, we can run our compiled
code in a dedicated simulator and check that the statically computed
number of cycles for each simple path coincides with the measured one.

\section{Conclusion and future work}\label{conclusion-sec}
We have discussed the problem of building a compiler which can {\em
  lift} in a provably correct way pieces of information on the
execution cost of the object code to cost annotations on the source
code.  To this end, we have introduced the so-called
{\em labelling} approach and discussed its formal application to a
toy compiler. Based on this experience, we have argued that the
approach has good scalability properties, and to substantiate this
claim, we have reported on our successful experience in implementing
and testing the labelling approach on top of a prototype compiler
written in $\ocaml$ for a large fragment of the $\C$ language which
can be briefly described as $\Clight$ without floating point.

We discuss next a few directions for future work.
First, we are currently testing the compiler on
the kind of $\C$ code produced for embedded applications
by a $\lustre$ compiler.
Starting from the annotated $\C$ code, we rely on
the $\framac$ tool to produce automatically meaningful information on, say,
the reaction time of a given synchronous program.
Second, we plan to formalize and validate in the {\em Calculus of Inductive
Constructions} the prototype implementation of the labelling approach
for the $\C$ compiler described in section \ref{C-label-sec}.
This requires a major implementation effort which will be carried
out in collaboration with our partners of the $\cerco$ project~\cite{Cerco10}.

{\footnotesize
\begin{thebibliography}{99}

\bibitem{AbsInt}
AbsInt Angewandte Informatik. {\tt http://www.absint.com/}.

\bibitem{Cerco10}
Certified Complexity (Project description).
\newblock ICT-2007.8.0 FET Open, Grant 243881.
{\tt http://cerco.cs.unibo.it}.

\bibitem{SCADE}
Esterel Technologies.
{\tt http://www.esterel-technologies.com}.

\bibitem{Frama-C}
$\framac$ software analysers.
{\tt http://frama-c.com/}.

\bibitem{AbsintScade}
C.~Ferdinand, R.~Heckmann, T.~Le~Sergent, D.~Lopes, B.~Martin, X.~Fornari, and F.~Martin.
\newblock Combining a high-level design tool for safety-critical
systems with a tool for {WCET} analysis of executables.
\newblock In {\em Embedded Real Time Software (ERTS)}, 2008.

\bibitem{Fornari10}
X.~Fornari.
\newblock Understanding how {SCADE} suite {KCG} generates safe {C} code.
\newblock White paper, Esterel Technologies, 2010.

\bibitem{Larus05}
J.~Larus.
\newblock Assemblers, linkers, and the {SPIM} simulator.
\newblock Appendix of {\em Computer Organization and Design: the hw/sw interface},
by Hennessy and Patterson, 2005.

\bibitem{intel94}
Intel Corporation.
\newblock {MCS} 51 Microcontroller Family User's Manual.
\newblock Publication number 121517, 1994.

\bibitem{Leroy09}
X.~Leroy.
\newblock Formal verification of a realistic compiler.
\newblock {\em Commun. ACM}, 52(7):107--115, 2009.

\bibitem{Leroy09-ln}
X.~Leroy.
\newblock Mechanized semantics, with applications to program proof and compiler verification.
\newblock {\em Marktoberdorf summer school}, 2009.

\bibitem{CIL02}
G.~Necula, S.~McPeak, S.P.~Rahul, and W.~Weimer.
\newblock {CIL}: Intermediate language and tools for analysis and transformation of {C} programs.
\newblock In {\em Proceedings of the Conference on Compiler Construction},
Springer LNCS 2304:213--228, 2002.

\bibitem{Pottier}
F.~Pottier.
\newblock Compilation (INF 564), \'Ecole Polytechnique, 2009--2010.
{\tt http://www.enseignement.polytechnique.fr/informatique/INF564/}.

\bibitem{W09}
R.~Wilhelm et al.
\newblock The worst-case execution-time problem: overview of methods and survey of tools.
\newblock {\em ACM Trans. Embedded Comput. Syst.}, 7(3), 2008.

\bibitem{share}
Microcontroller market and technology, analysis report.
\newblock EMITT Solutions, 2008.

\end{thebibliography}}

\newpage
\appendix

\section{A toy compiler}\label{toy-compiler-sec}
We formalize the toy compiler
introduced in section \ref{toy-intro}\footnote{A mechanized proof in the
$\coq$ proof assistant can be found in K.~Memarian's
{\em Travail d'\'etude et de recherche}
entitled {\em Complexit\'e Certifi\'ee},
which can be downloaded at
\url{http://www.pps.jussieu.fr/~yrg/miniCerCo/}.}.

\subsection{\imp: language and semantics}\label{imp-sec}
The syntax of the $\imp$ language is described below.
This is a rather standard imperative language with
\s{while} loops and \s{if-then-else}.

{\footnotesize
\[
\begin{array}{lll}

\w{id}&::= x\Alt y\Alt \ldots       &\mbox{(identifiers)} \\
\w{n} &::= 0 \Alt -1 \Alt +1 \Alt \ldots &\mbox{(integers)} \\
v &::= n \Alt \s{true} \Alt \s{false}  &\mbox{(values)} \\
e &::= \w{id} \Alt n \Alt e+e    &\mbox{(numerical expressions)} \\
b &::= e<e  &\mbox{(boolean conditions)} \\
S &::= \s{skip} \Alt \w{id}:=e \Alt S;S \Alt
  \s{if} \ b \ \s{then} \ S \ \s{else} \ S
   \Alt \s{while} \ b \ \s{do} \ S  &\mbox{(commands)} \\
P &::= \s{prog} \ S      &\mbox{(programs)}

\end{array}
\]}
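
For concreteness, the grammar above transcribes directly into $\ocaml$ datatypes
(the language of our prototype); the constructor names are illustrative, not
those of the mechanized development.

{\footnotesize
\begin{verbatim}
type expr = Var of string | Int of int | Add of expr * expr
type cond = Lt of expr * expr                  (* e < e *)
type stmt =
  | Skip
  | Assign of string * expr                    (* id := e *)
  | Seq of stmt * stmt                         (* S ; S *)
  | If of cond * stmt * stmt
  | While of cond * stmt
type prog = Prog of stmt
\end{verbatim}}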

Let $s$ be a total function from identifiers to integers representing
the {\bf state}.  If $s$ is a state, $x$ an identifier, and $n$ an
integer then $s[n/x]$ is the `updated' state such that $s[n/x](x)=n$
and $s[n/x](y)=s(y)$ if $x\neq y$.
The {\em big-step} operational semantics of $\imp$
expressions and boolean conditions is defined as follows:

{\footnotesize
\[
\begin{array}{c}

\infer{}
{(v,s)\eval v}

\qquad

\infer{}
{(x,s) \eval s(x)}

\qquad

\infer{(e,s)\eval v\quad (e',s)\eval v'}
{(e+e',s)\eval (v+_{\Z} v')}

\qquad
\infer{(e,s)\eval v\quad (e',s)\eval v'}
{(e<e',s) \eval (v<_{\Z} v')}
\end{array}\]}

A {\em continuation} $K$ is a list of commands which terminates with a special
symbol \s{halt}:
$K::= \s{halt} \Alt S \cdot K$.
Table \ref{small-step-imp} defines a small-step semantics of $\imp$ commands
whose basic judgement has the shape
$(S,K,s) \arrow  (S',K',s')$.
We define the semantics of a program $\s{prog} \ S$ as the semantics
of the command $S$ with continuation $\s{halt}$.
We derive a big-step semantics from the small-step one as follows:
$(S,s) \eval s'$  if $(S,\s{halt},s) \arrow \cdots \arrow (\s{skip},\s{halt},s')$.
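
The rules of table \ref{small-step-imp} admit a direct functional reading. The
following $\ocaml$ sketch, over the types of the previous sketch and with
illustrative names, computes one reduction step:

{\footnotesize
\begin{verbatim}
type state = string -> int
type cont = Halt | Cons of stmt * cont

let update s x n = fun y -> if y = x then n else s y

let rec eval s = function
  | Var x -> s x
  | Int n -> n
  | Add (e1, e2) -> eval s e1 + eval s e2

let test s (Lt (e1, e2)) = eval s e1 < eval s e2

(* one reduction step (S,K,s) -> (S',K',s') *)
let step (cfg : stmt * cont * state) =
  match cfg with
  | Assign (x, e), k, s -> Skip, k, update s x (eval s e)
  | Seq (s1, s2), k, s -> s1, Cons (s2, k), s
  | If (b, s1, s2), k, s -> (if test s b then s1 else s2), k, s
  | (While (b, body) as w), k, s ->
      if test s b then (body, Cons (w, k), s) else (Skip, k, s)
  | Skip, Cons (s1, k), s -> s1, k, s
  | Skip, Halt, _ -> cfg           (* terminal configuration *)
\end{verbatim}}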

\begin{table}
{\footnotesize
\[
\begin{array}{lll}
(x:=e,K,s) &\arrow &(\s{skip},K,s[v/x]) \qquad\mbox{if }(e,s)\eval v \\ \\

(S;S',K,s) &\arrow &(S,S'\cdot K,s) \\ \\

(\s{if} \ b \ \s{then} \ S \ \s{else} \ S',K,s) &\arrow
&\left\{
\begin{array}{ll}
(S,K,s) &\mbox{if }(b,s)\eval \s{true} \\
(S',K,s) &\mbox{if }(b,s)\eval \s{false}
\end{array}
\right. \\ \\

(\s{while} \ b \ \s{do} \ S ,K,s) &\arrow
&\left\{
\begin{array}{ll}
(S,(\s{while} \ b \ \s{do} \ S)\cdot K,s) &\mbox{if }(b,s)\eval \s{true} \\
(\s{skip},K,s) &\mbox{if }(b,s)\eval \s{false}
\end{array}
\right. \\ \\

(\s{skip},S\cdot K,s) &\arrow
&(S,K,s)

\end{array}
\]}
\caption{Small-step operational semantics of $\imp$ commands}\label{small-step-imp}
\end{table}

\subsection{\vm: language and semantics}\label{vm-sec}
Following~\cite{Leroy09-ln},
we define a virtual machine $\vm$ and its programming language.
The machine includes the following elements:
(1) a fixed code $C$ (a possibly empty sequence of instructions),
(2) a program counter \w{pc},
(3) a store $s$ (as for the source program), and
(4) a stack of integers $\sigma$.

Given a sequence $C$, we denote with $|C|$ its length and
with $C[i]$ its $i^{\w{th}}$ element (the leftmost element
being the $0^{\w{th}}$ element).
The operational semantics of the instructions is formalized by rules of the shape
$C \Gives (i,\sigma,s) \arrow (j,\sigma',s')$
and is fully described in table \ref{semantics-vm}.
Notice that $\imp$ and $\vm$ semantics share the same notion of store.
We write, {\em e.g.}, $n\cdot \sigma$ to stress that the top element
of the stack exists and is $n$.
We also write $(C,s)\eval s'$ if
$C\Gives (0,\epsilon,s) \trarrow (i,\epsilon,s')$ and $C[i]=\s{halt}$.
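
As a sanity check, the rules can be read off as an executable step function. In
the $\ocaml$ sketch below the constructor names (prefixed with \codeex{I} to
keep them distinct from the source-level syntax) are illustrative, and the store
is a function from identifiers to integers, as for $\imp$:

{\footnotesize
\begin{verbatim}
type instr =
  | ICnst of int | IVar of string | ISetvar of string
  | IAdd | IBranch of int | IBge of int | IHalt

let update s x n = fun y -> if y = x then n else s y

exception Stuck   (* halt reached, or not enough stack elements *)

(* one step C |- (i, stack, store) -> (j, stack', store') *)
let step (c : instr array) (i, stk, s) =
  match c.(i), stk with
  | ICnst n, _              -> i + 1, n :: stk, s
  | IVar x, _               -> i + 1, s x :: stk, s
  | ISetvar x, n :: stk'    -> i + 1, stk', update s x n
  | IAdd, n :: n' :: stk'   -> i + 1, (n + n') :: stk', s
  | IBranch k, _            -> i + k + 1, stk, s
  | IBge k, n :: n' :: stk' ->
      (if n >= n' then i + k + 1 else i + 1), stk', s
  | _ -> raise Stuck
\end{verbatim}}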
\begin{table}
{\footnotesize
\[
\begin{array}{l|l}

\mbox{Rule}                                                 & C[i]= \\\hline

C \Gives (i, \sigma, s) \arrow (i+1, n\cdot \sigma,s)  & {\tt cnst}(n) \\
C \Gives (i, \sigma, s) \arrow (i+1,s(x)\cdot \sigma,s)  & {\tt var}(x) \\
C \Gives (i, n \cdot \sigma, s) \arrow (i+1,\sigma,s[n/x])  & {\tt setvar}(x) \\
C \Gives (i, n \cdot n' \cdot \sigma, s) \arrow (i+1, (n+_{\Z} n')\cdot \sigma,s)  & {\tt add} \\
C \Gives (i, \sigma, s) \arrow (i+k+1, \sigma,s)  & {\tt branch}(k) \\
C \Gives (i, n \cdot n' \cdot \sigma, s) \arrow (i+1, \sigma,s)  & {\tt bge}(k) \mbox{ and }n<_{\Z} n'\\
C \Gives (i, n \cdot n' \cdot \sigma, s) \arrow (i+k+1, \sigma,s)  & {\tt bge}(k) \mbox{ and }n\geq_{\Z} n'\\

\end{array}
\]}
\caption{Operational semantics of $\vm$ programs}\label{semantics-vm}
\end{table}

Code coming from the compilation of $\imp$ programs has specific
properties that are used in the following compilation step when values
on the stack are allocated either in registers or in main memory.  In
particular, it turns out that for every instruction of the compiled
code it is possible to predict statically the {\em height of the stack}
whenever the instruction is executed.  We now proceed to define a simple notion
of {\em well-formed} code and show that it enjoys this property.  In
the following section, we will define the compilation function from
$\imp$ to $\vm$ and show that it produces well-formed code.

\begin{definition}
We say that a sequence of instructions $C$ is well formed if there
is a function $h:\set{0,\ldots,|C|} \arrow \Nat$ which satisfies the
conditions listed in table \ref{well-formed-instr} for $0\leq i \leq |C|-1$.
In this case we write $C:h$.
\end{definition}

\begin{table}
{\footnotesize
\[
\begin{array}{l|l}

C[i]=         & \mbox{Conditions for }C:h \\\hline

\s{cnst}(n)
\mbox{ or } \s{var}(x)
&h(i+1) = h(i)+1 \\

\s{add}
& h(i)\geq 2, \quad h(i+1)=h(i)-1 \\

\s{setvar}(x)
& h(i)=1, \quad h(i+1)=0 \\

\s{branch}(k)
& 0\leq i+k+1 \leq |C|, \quad h(i)=h(i+1)=h(i+k+1)=0 \\

\s{bge}(k)
& 0\leq i+k+1 \leq |C|, \quad h(i)=2, \quad h(i+1)=h(i+k+1)=0 \\

\s{halt}
&i=|C|-1, \quad h(i)=h(i+1)=0

\end{array}
\]}
\caption{Conditions for well-formed code}\label{well-formed-instr}
\end{table}

The conditions defining the predicate $C:h$ are strong enough
to entail that $h$ correctly predicts the stack height
and to guarantee the uniqueness of $h$ up to the initial condition.
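
The uniqueness argument also suggests a simple checker: propagate heights from
$h(0)=0$ from left to right, setting or checking each constraint of table
\ref{well-formed-instr}. A sketch in $\ocaml$, over the \codeex{instr} type of
the $\vm$ sketch and with illustrative names:

{\footnotesize
\begin{verbatim}
exception Ill_formed

(* set h.(j) to v, or check consistency if already set *)
let assign h j v =
  match h.(j) with
  | None -> h.(j) <- Some v
  | Some v' -> if v <> v' then raise Ill_formed

(* infer the unique h with h(0) = 0, or fail *)
let infer_height (c : instr array) : int array =
  let n = Array.length c in
  let h = Array.make (n + 1) None in
  assign h 0 0;
  for i = 0 to n - 1 do
    let hi = match h.(i) with Some v -> v | None -> raise Ill_formed in
    match c.(i) with
    | ICnst _ | IVar _ -> assign h (i + 1) (hi + 1)
    | IAdd ->
        if hi < 2 then raise Ill_formed;
        assign h (i + 1) (hi - 1)
    | ISetvar _ ->
        if hi <> 1 then raise Ill_formed;
        assign h (i + 1) 0
    | IBranch k | IBge k ->
        let t = i + k + 1 in
        if t < 0 || t > n then raise Ill_formed;
        if hi <> (match c.(i) with IBge _ -> 2 | _ -> 0) then
          raise Ill_formed;
        assign h (i + 1) 0;
        assign h t 0
    | IHalt ->
        if i <> n - 1 || hi <> 0 then raise Ill_formed;
        assign h (i + 1) 0
  done;
  Array.map (function Some v -> v | None -> raise Ill_formed) h
\end{verbatim}}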

\begin{proposition}\label{unicity-height-prop}
(1)
If $C:h$, $C\Gives (i,\sigma,s) \trarrow (j,\sigma',s')$, and
$h(i)=|\sigma|$ then $h(j)=|\sigma'|$.
(2) If $C:h$, $C:h'$, and $h(0)=h'(0)$ then $h=h'$.
\end{proposition}

\subsection{Compilation from $\imp$ to $\vm$}\label{compil-imp-vm-sec}
In table \ref{compil-imp-vm},
we define compilation functions $\cl{C}$ from $\imp$ to $\vm$ which
operate on expressions, boolean conditions, statements, and programs.
We write $\w{sz}(e)$, $\w{sz}(b)$, $\w{sz}(S)$ for the number of instructions
the compilation function associates with the expression $e$, the boolean
condition $b$, and the statement $S$, respectively.
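
These equations transcribe directly into $\ocaml$ over the types of the previous
sketches; $\s{skip}$, which is left implicit in the table, compiles to the empty
sequence, and \codeex{sz} is just \codeex{List.length}. All names are
illustrative.

{\footnotesize
\begin{verbatim}
let rec comp_expr = function
  | Var x -> [IVar x]
  | Int n -> [ICnst n]
  | Add (e1, e2) -> comp_expr e1 @ comp_expr e2 @ [IAdd]

(* C(e < e', k) = C(e') . C(e) . bge(k) *)
let comp_cond (Lt (e1, e2)) k = comp_expr e2 @ comp_expr e1 @ [IBge k]

let rec comp_stmt = function
  | Skip -> []
  | Assign (x, e) -> comp_expr e @ [ISetvar x]
  | Seq (s1, s2) -> comp_stmt s1 @ comp_stmt s2
  | If (b, s1, s2) ->
      let c1 = comp_stmt s1 and c2 = comp_stmt s2 in
      comp_cond b (List.length c1 + 1) @ c1
      @ [IBranch (List.length c2)] @ c2
  | While (b, body) ->
      (* sz(b) does not depend on the offset argument *)
      let sb = List.length (comp_cond b 0) in
      let cs = comp_stmt body in
      comp_cond b (List.length cs + 1) @ cs
      @ [IBranch (- (sb + List.length cs + 1))]

let comp_prog (Prog s) = comp_stmt s @ [IHalt]
\end{verbatim}}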

\begin{table}
{\footnotesize
\[
\begin{array}{c}

\cl{C}(x) = {\tt var}(x)   \qquad
\cl{C}(n)  ={\tt cnst}(n)   \qquad
\cl{C}(e+e') =\cl{C}(e) \cdot \cl{C}(e') \cdot {\tt add} \\ \\

\cl{C}(e<e',k) = \cl{C}(e') \cdot \cl{C}(e) \cdot {\tt bge}(k) \\ \\

\cl{C}(x:=e) = \cl{C}(e) \cdot {\tt setvar}(x)
\qquad \cl{C}(S;S') = \cl{C}(S) \cdot \cl{C}(S')\\ \\

\cl{C}(\s{if} \ b \ \s{then} \ S\  \s{else} \ S') =
\cl{C}(b,k) \cdot \cl{C}(S) \cdot \s{branch}(k') \cdot \cl{C}(S')  \\
\mbox{where: } k= \w{sz}(S)+1, \quad k'=\w{sz}(S')
\\ \\

\cl{C}(\s{while} \ b \ \s{do} \  S )
= \cl{C}(b,k)  \cdot \cl{C}(S)  \cdot \s{branch}(k') \\
\mbox{where: } k=\w{sz}(S)+1, \quad k'=-(\w{sz}(b)+\w{sz}(S)+1)\\\\

\cl{C}(\s{prog} \ S) =
\cl{C}(S) \cdot \s{halt}

\end{array}
\]}

\caption{Compilation from $\imp$ to $\vm$}\label{compil-imp-vm}
\end{table}

We follow~\cite{Leroy09-ln} for
the proof of soundness of the compilation function for
expressions and boolean conditions.

\begin{proposition}\label{soundness-big-step-prop}
The following properties hold:

\Defitem{(1)}
If $(e,s)\eval v$ then
$C \cdot \cl{C}(e) \cdot C'
\Gives (i,\sigma,s) \trarrow (j,v\cdot \sigma,s)$
where $i=|C|$ and $j=|C \cdot \cl{C}(e)|$.

\Defitem{(2)}
If $(b,s)\eval \s{true}$ then
$C \cdot \cl{C}(b,k) \cdot C'
\Gives (i,\sigma,s) \trarrow (j,\sigma,s)$
where $i=|C|$ and $j=|C \cdot \cl{C}(b,k)|$.

\Defitem{(3)}
If $(b,s)\eval \s{false}$
then $C \cdot \cl{C}(b,k) \cdot C'
\Gives (i,\sigma,s) \trarrow (j+k,\sigma,s)$
where $i=|C|$ and $j=|C \cdot \cl{C}(b,k)|$.
\end{proposition}

Next we focus on the compilation of statements.
We introduce a ternary relation $R(C,i,K)$ which relates
a $\vm$ code $C$, a number $i\in\set{0,\ldots,|C|-1}$, and a continuation
$K$. The intuition is that, relative to the code $C$,
the instruction $i$ can be regarded as having continuation $K$.
(A formal definition is given with the proof of this proposition in section~\ref{paper-proofs}.)
We can then state the correctness of the compilation function as follows.

\begin{proposition}\label{soundness-small-step}
If $(S,K,s) \arrow (S',K',s')$ and $R(C,i,S\cdot K)$ then
$C\Gives (i,\sigma,s) \trarrow (j,\sigma,s')$ and
$R(C,j,S'\cdot K')$.
\end{proposition}

As announced, we can prove that the result of the compilation
is a well-formed code.

\begin{proposition}\label{well-formed-prop}
For any program $P$ there is a unique $h$ such that $\cl{C}(P):h$.
\end{proposition}

\subsection{$\mips$: language and semantics}\label{mips-sec}
We consider a $\mips$-like machine~\cite{Larus05}
which includes the following elements:
(1) a fixed code $M$ (a sequence of instructions),
(2) a program counter \w{pc},
(3) a finite set of registers including the registers
$A$, $B$, and $R_0,\ldots,R_{b-1}$,
and (4) an (infinite) main memory which maps
locations to integers.

We denote with $R,R',\ldots$ registers, with $l,l',\ldots$ locations,
and with $m,m',\ldots$ memories, which are total functions from
registers and locations to (unbounded) integers.
The operational semantics is formalized in
table \ref{semantics-mips} by rules of the shape
$M \Gives (i,m) \arrow (j,m')$,
where $M$ is a list of $\mips$ instructions, $i,j$ are natural numbers, and $m,m'$ are memories.
We write $(M,m) \eval m'$ if
$M\Gives (0,m) \trarrow (j,m')$ and $M[j]=\s{halt}$.

\begin{table}
{\footnotesize
\[
\begin{array}{l|l}

\mbox{Rule}                                                 & M[i]= \\\hline

M \Gives (i, m) \arrow (i+1, m[n/R])  & \s{loadi} \ R,n \\

M \Gives (i, m) \arrow (i+1,m[m(l)/R])  & \s{load} \ R,l \\

M \Gives (i, m) \arrow (i+1,m[m(R)/l])  & \s{store} \ R,l \\

M \Gives (i, m)
\arrow (i+1, m[m(R')+m(R'')/R])  & \s{add}\ R,R',R'' \\

M \Gives (i, m) \arrow (i+k+1, m)  & \s{branch}\ k \\

M \Gives (i, m) \arrow (i+1, m)  & \s{bge}\ R,R',k \mbox{ and }m(R)<_{\Z} m(R')\\

M \Gives (i, m) \arrow (i+k+1, m)  & \s{bge}\ R,R',k \mbox{ and }m(R)\geq_{\Z} m(R')
\end{array}
\]}
\caption{Operational semantics of $\mips$ programs}\label{semantics-mips}
\end{table}

\subsection{Compilation from $\vm$ to $\mips$}\label{compil-vm-mips-sec}
In order to compile $\vm$ programs to $\mips$ programs we make the following
hypotheses:
(1) for every $\vm$ program variable $x$ we reserve an address $l_x$,
(2) for every natural number $h\geq b$, we reserve an address $l_h$ (the addresses $l_x,l_h,\ldots$ are all distinct), and
(3) we store the first $b$ elements of the stack $\sigma$ in the registers
$R_0,\ldots,R_{b-1}$ and the remaining ones (if any) at the addresses
$l_b,l_{b+1},\ldots$.

We say that the memory $m$ represents the stack $\sigma$ and
the store $s$, and write $m\real \sigma,s$,
if the following conditions are satisfied:
(1) $s(x) = m(l_x)$, and (2) if $0\leq i < |\sigma|$ then
$\sigma[i] = m(R_i)$ if $i<b$, and
$\sigma[i] = m(l_i)$ if $i\geq b$.

\begin{table}
{\footnotesize
\[
\begin{array}{l|l}

C[i]=          &\cl{C'}(i,C)=
\\\hline

\s{cnst}(n)
& \left\{ \begin{array}{lr}
     (\s{loadi} \ R_h,n)                              &\mbox{if } h=h(i)<b \\
     (\s{loadi} \ A,n ) \cdot (\s{store} \ A, l_h)    &\mbox{otherwise}
  \end{array} \right. \\

\s{var}(x)
& \left\{ \begin{array}{lr}
     (\s{load} \ R_h,l_x)                             &\mbox{if } h=h(i)<b \\
     (\s{load} \ A,l_x ) \cdot (\s{store} \ A, l_h)   &\mbox{otherwise}
  \end{array} \right. \\

\s{add}
& \left\{ \begin{array}{lr}
     (\s{add} \ R_{h-2} , R_{h-2} , R_{h-1})          &\mbox{if }h=h(i)<(b-1) \\
     (\s{load} \ A,l_{h-1}) \cdot (\s{add} \ R_{h-2}, R_{h-2}, A)  &\mbox{if }h=h(i)=(b-1) \\
     (\s{load} \ A, l_{h-1}) \cdot (\s{load} \ B,l_{h-2}) \cdot    &\mbox{if }h=h(i)>(b-1) \\
     (\s{add} \ A,B,A) \cdot (\s{store} \ A, l_{h-2})
  \end{array}\right. \\

\s{setvar}(x)
& \left\{ \begin{array}{lr}
     (\s{store} \ R_{h-1}, l_x)                       &\mbox{if }h=h(i)<b\\
     (\s{load} \ A,l_{h-1})\cdot (\s{store} \ A,l_{x}) &\mbox{if }h=h(i)\geq b
  \end{array}\right. \\

\s{branch}(k)
& (\s{branch} \ k')  \quad \mbox{if }k'=p(i+k+1,C)-p(i+1,C)\\

\s{bge}(k)
&\left\{ \begin{array}{lr}
     (\s{bge} \ R_{h-2} , R_{h-1} , k')               &\mbox{if }h=h(i)<(b-1) \\
     (\s{load} \ A,l_{h-1}) \cdot (\s{bge} \ R_{h-2}, A, k')  &\mbox{if }h=h(i)=(b-1) \\
     (\s{load} \ A, l_{h-2}) \cdot (\s{load} \ B,l_{h-1}) \cdot
     (\s{bge} \ A,B,k')                               &\mbox{if }h=h(i)>(b-1) \\
     \multicolumn{2}{l}{\mbox{where in all cases }k'=p(i+k+1,C)-p(i+1,C)}
  \end{array}\right. \\

\s{halt}
&\s{halt}

\end{array}
\]}
\caption{Compilation from $\vm$ to $\mips$}\label{compil-vm-mips}
\end{table}

The compilation function $\cl{C'}$ from
$\vm$ to $\mips$ is described in table \ref{compil-vm-mips}.
It operates on a well-formed $\vm$ code $C$ whose last instruction
is $\s{halt}$. Hence, by proposition \ref{well-formed-prop},
there is a unique $h$ such that $C:h$.
We denote with $\cl{C'}(C)$ the concatenation $\cl{C'}(0,C)\cdots \cl{C'}(|C|-1,C)$.
Given a well-formed $\vm$ code $C$ with $i<|C|$, we denote with
$p(i,C)$ the position of the first instruction in $\cl{C'}(C)$ which
corresponds to the compilation of the instruction with position $i$ in $C$.
This is defined as\footnote{There is an obvious circularity in this definition that can be easily eliminated by defining first the function
$d$ following the case
analysis in table \ref{compil-vm-mips}, then the function $p$, and finally the function $\cl{C'}$ as in table \ref{compil-vm-mips}.}
$p(i,C) = \Sigma_{0\leq j <i} \ d(j,C)$,
where the function $d(j,C)$ is defined as
$d(j,C) = |\cl{C'}(j,C)|$.
Hence $d(i,C)$ is the number of $\mips$ instructions associated
with the $i^{\w{th}}$ instruction of the (well-formed) code $C$.
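
For instance, $d$ and $p$ can be read off the case analysis of table
\ref{compil-vm-mips}, as in the following $\ocaml$ sketch (reusing the
\codeex{instr} type of the $\vm$ sketch; $h$ is the height function, $b$ the
number of registers, and all names are illustrative):

{\footnotesize
\begin{verbatim}
(* d i = number of MIPS instructions generated for the i-th VM
   instruction, following the case analysis of the table *)
let d (h : int -> int) (b : int) (c : instr array) (i : int) : int =
  match c.(i) with
  | ICnst _ | IVar _ -> if h i < b then 1 else 2
  | ISetvar _        -> if h i < b then 1 else 2
  | IAdd ->
      if h i < b - 1 then 1 else if h i = b - 1 then 2 else 4
  | IBge _ ->
      if h i < b - 1 then 1 else if h i = b - 1 then 2 else 3
  | IBranch _ | IHalt -> 1

(* p i = position in the MIPS code of the first instruction
   compiled from VM instruction i *)
let p h b c i =
  let rec sum j = if j >= i then 0 else d h b c j + sum (j + 1) in
  sum 0
\end{verbatim}}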
The functional correctness of the compilation function can then be stated
as follows.

\begin{proposition}\label{simulation-vm-mips}
Let $C : h$ be a well-formed code.
If $C \Gives (i,\sigma,s) \arrow (j,\sigma',s')$ with $h(i) = |\sigma|$ and $m \real \sigma,s$,
then
$\cl{C'}(C) \Gives (p(i,C),m) \trarrow (p(j,C),m')$ and $m'\real \sigma',s'$.
\end{proposition}

\newpage

\section{Proofs}\label{paper-proofs}

We omit the proofs that have been mechanically checked by K.~Memarian
and R.~Saillard with the $\coq$ proof assistant\footnote{
These proofs can be downloaded at \url{http://www.pps.jussieu.fr/~yrg/miniCerCo/}
and at \url{http://www.pps.jussieu.fr/~saillard}.
}.

\subsection{Notation}
Let $\act{t}$ be a family of reduction relations where
$t$ ranges over the set of labels and $\epsilon$. Then we
define:
\[
\wact{t} = \left\{
\begin{array}{ll}
(\act{\epsilon})^* &\mbox{if }t=\epsilon \\
(\act{\epsilon})^* \comp \act{t} \comp (\act{\epsilon})^*  &\mbox{otherwise}
\end{array}
\right.
\]
where, as usual, $R^*$ denotes the reflexive and transitive closure of the relation $R$
and $\comp$ denotes the composition of relations.

\subsection{Proof of proposition \ref{labelled-sim-imp-vm}}

\Defitemf{(1)} By induction on the structure of the command $S$.

\Defitem{(2)} By iterating the following proposition.

\begin{proposition}
  If $(S,K,s) \stackrel{t}{\arrow} (S',K',s')$ and $R(C,i,S\cdot K)$, with $t=\ell$ or $t=\epsilon$, then
  $C\Gives (i,\sigma,s) \wact{t} (j,\sigma,s')$ and
  $R(C,j,S'\cdot K')$.
\end{proposition}

This is an extension of proposition \ref{soundness-small-step} and it is proven in the same way, with
an additional case for labelled commands. \qed

\subsection{Proof of proposition \ref{sim-vm-mips-prop}}
\Proofitemf{(1)} The compilation of the $\vm$ instruction $\s{nop}(\ell)$ is the $\mips$ instruction $(\s{nop} \ \ell)$.

\Proofitem{(2)}
By iterating the following proposition.

\begin{proposition}
Let $C : h$ be a well-formed code.
If $C \Gives (i,\sigma,s) \stackrel{t}\arrow (j,\sigma',s')$ with $t=\ell$ or $t=\epsilon$, $h(i) = |\sigma|$, and $m \real \sigma,s$,
then
$\cl{C'}(C) \Gives (p(i,C),m) \wact{t} (p(j,C),m')$ and $m'\real \sigma',s'$.
\end{proposition}

This is an extension of proposition \ref{simulation-vm-mips} and it is proven in the same way,
with an additional case for the \s{nop} instruction. \qed

\subsection{Proof of proposition \ref{lab-instr-erasure-imp}}
We extend the instrumentation to the continuations by defining:
\[
\cl{I}(S\cdot K) = \cl{I}(S)\cdot \cl{I}(K) \qquad
\cl{I}(\s{halt}) = \s{halt}~.
\]
Then we examine the possible reductions of a configuration
$(\cl{I}(S),\cl{I}(K),s[c/\w{cost}])$.

\begin{itemize}

\item
If $S$ is an unlabelled statement such as $\s{while} \ b \ \s{do} \ S'$
then $\cl{I}(S) = \s{while} \ b \ \s{do} \ \cl{I}(S')$ and,
assuming $(b,s) \eval \s{true}$, the reduction step is:
\[
(\cl{I}(S),\cl{I}(K),s[c/\w{cost}]) \arrow
(\cl{I}(S'), \cl{I}(S)\cdot \cl{I}(K),s[c/\w{cost}])~.
\]
Noticing that $\cl{I}(S)\cdot \cl{I}(K) = \cl{I}(S\cdot K)$,
this step is matched in the labelled language as follows:
\[
(S,K,s[c/\w{cost}]) \arrow
(S', S\cdot K,s[c/\w{cost}])~.
\]

\item On the other hand, if $S=\ell:S'$ is a labelled statement
then $\cl{I}(S)= \w{inc}(\ell);\cl{I}(S')$
and, by a sequence of reduction steps, we have:
\[
(\cl{I}(S),\cl{I}(K),s[c/\w{cost}]) \trarrow
(\cl{I}(S'), \cl{I}(K),s[c+\kappa(\ell)/\w{cost}])~.
\]
This step is matched by the labelled reduction:
\[
(S,K,s[c/\w{cost}]) \act{\ell}
(S', K,s[c/\w{cost}])~.
\]
~\qed
\end{itemize}

\subsection{Proof of proposition \ref{global-commutation-prop}}
By diagram chasing, using propositions \ref{labelled-sim-imp-vm}(1),
\ref{sim-vm-mips-prop}(1), and the definition \ref{labelling-def} of labelling. \qed

\subsection{Proof of proposition \ref{instrument-to-label-prop}}
Suppose that:
$$ (\cl{I}(\cl{L}(P)),s[c/\w{cost}]) \eval s'[c+\delta/\w{cost}] \text{ and } m\real s[c/\w{cost}]~. $$
Then, by proposition \ref{lab-instr-erasure-imp}, for some $\lambda$:
$$ (\cl{L}(P),s[c/\w{cost}])\eval (s'[c/\w{cost}],\lambda) \text{ and } \kappa(\lambda)=\delta~. $$
Finally, by propositions \ref{labelled-sim-imp-vm}(2) and \ref{sim-vm-mips-prop}(2):
$$ (\cl{C'}(\cl{C}(\cl{L}(P))),m) \eval (m',\lambda) \text{ and } m'\real s'[c/\w{cost}]~.$$
\qed

\subsection{Proof of proposition \ref{sound-label-prop}}
If $\lambda=\ell_1 \cdots \ell_n$ then the
computation is the concatenation of simple paths
labelled with $\ell_1,\ldots,\ell_n$. Since $\kappa(\ell_i)$
bounds the cost of a simple path labelled with $\ell_i$, the
cost of the overall computation is bounded by $\kappa(\lambda)=
\kappa(\ell_1)+\cdots+\kappa(\ell_n)$. \qed

\subsection{Proof of proposition \ref{precise-label-prop}}
Same proof as for proposition \ref{sound-label-prop}, replacing the word \emph{bounds} by \emph{is exactly} and the words \emph{bounded by} by \emph{exactly}. \qed

\subsection{Proof of proposition \ref{lab-sound}}
In both labellings under consideration the root node is labelled.  An
obvious observation is that only commands of the shape \s{while} $b$
\s{do} $S$
introduce loops in the compiled code.
We notice that both labellings introduce a label in the loop (though at different places).
Thus all loops go through a label and the compiled code is always sound.

To show the precision of the second labelling $\cl{L}_p$, we note the
following property.

\begin{lemma}\label{precise-lemma}
A soundly labelled graph is precise if each label occurs at most once in the graph and
if the immediate successors of the \s{bge} nodes are either \s{halt} (no successor) or labelled nodes.
\end{lemma}

Indeed, in such a graph, starting from a labelled node we can follow a unique path
up to a leaf, another labelled node, or a $\s{bge}$ node. In the last case, the
hypotheses of lemma \ref{precise-lemma} guarantee that the two
simple paths one can follow from the $\s{bge}$ node have
the same length/cost.  \qed

\subsection{Proof of proposition \ref{ann-correct}}
By applying consecutively proposition \ref{instrument-to-label-prop}
and then proposition \ref{sound-label-prop} or \ref{precise-label-prop}. \qed

\subsection{Proof of proposition \ref{soundness-small-step}}
Given a $\vm$ code $C$, we define an `accessibility relation'
$\access{C}$ as the least binary relation on $\set{0,\ldots,|C|-1}$ such that:

{\footnotesize
\[
\begin{array}{ll}

\infer{}
{i\access{C} i}
\qquad \qquad
&\infer{C[i]=\s{branch}(k)\quad (i+k+1)\access{C} j}
{i\access{C} j}

\end{array}
\]}

We also introduce a ternary relation $R(C,i,K)$ which relates
a $\vm$ code $C$, a number $i\in\set{0,\ldots,|C|-1}$, and a continuation
$K$.
The relation is
defined as the least one that satisfies the following conditions.

{\footnotesize
\[
\begin{array}{ll}

\infer{
\begin{array}{c}
\\
i\access{C} j\qquad C[j]=\s{halt}
\end{array}}
{R(C,i,\s{halt})}

\qquad \qquad
&\infer{
\begin{array}{c}
i\access{C} i'\quad C=C_1 \cdot \cl{C}(S) \cdot C_2\\
i'=|C_1|\quad j=|C_1\cdot \cl{C}(S)| \quad R(C,j,K)
\end{array}}
{R(C,i,S\cdot K)}~.

\end{array}
\]}

The following properties are useful.

\begin{lemma}\label{access-lemma}
\Defitemf{(1)} The relation $\access{C}$ is transitive.

\Defitem{(2)} If $i\access{C} j$ and $R(C,j,K)$ then $R(C,i,K)$.
\end{lemma}

The first property can be proven by induction on the definition of $\access{C}$, and the second by induction on the structure of $K$.

Next we can focus on the proposition.
The notation $C \stackrel{i}{\cdot} C'$ means that $i=|C|$.
Suppose that:

{\footnotesize
\[
\begin{array}{lll}
(S,K,s) \arrow (S',K',s') \quad (1)  &\text{and}  &R(C,i,S\cdot K) \quad (2)~.
\end{array}
\]}
From $(2)$, we know that there exist $i'$ and $i''$ such that:

{\footnotesize\[
\begin{array}{llll}
i \access{C} i' \quad (3),
&C=C_1\stackrel{i'}{\cdot}\cl{C}(S)\stackrel{i''}{\cdot}C_2 \quad (4),
&\text{and}
&R(C,i'',K) \quad (5)
\end{array}
\]}
and from $(3)$ it follows that:

{\footnotesize
\[ C\Gives (i,\sigma,s) \trarrow (i',\sigma,s) \quad (3')~.
\]}
We are looking for $j$ such that:

{\footnotesize
\[
\begin{array}{lll}
C \Gives (i,\sigma,s) \trarrow (j,\sigma,s') \quad (6)
&\text{and}
&R(C,j,S'\cdot K') \quad (7)~.
\end{array}
\]}
We proceed by case analysis on $S$. We just detail the case of the conditional command, as
the remaining cases have similar proofs.
If $S=\s{if} \ e_1<e_2 \ \s{then} \ S_1 \ \s{else} \ S_2$ then $(4)$ is rewritten as follows:

{\footnotesize  $$ C=C_1\stackrel{i'}{\cdot}\cl{C}(e_2)\cdot \cl{C}(e_1)\cdot\s{bge}(k_1)\stackrel{a}{\cdot}\cl{C}(S_1)\stackrel{b}{\cdot}\s{branch}(k_2)
\stackrel{c}{\cdot} \cl{C}(S_2)\stackrel{i''}{\cdot}C_2~ $$}
where $c = a+k_1$ and $i''=c+k_2$.
We distinguish two cases according to the evaluation of the boolean condition; we describe the case
$(e_1<e_2) \eval \s{true}$ and set $j=a$.
\begin{itemize}
        \item The instance of $(1)$ is $(S,K,s) \arrow (S_1,K,s)$.
        \item The reduction required in $(6)$ takes the form
$C \Gives (i,\sigma,s) \trarrow (i',\sigma,s) \trarrow (a,\sigma,s')$,
and it follows from $(3')$, the fact that $(e_1<e_2) \eval \s{true}$, and
proposition \ref{soundness-big-step-prop}(2).

        \item Property $(7)$ follows from lemma \ref{access-lemma}(2), fact $(5)$, and
the following proof tree:

{\footnotesize
$$ \infer{j\access{C} j \quad \infer{b \access{C} i'' \quad R(C,i'',K)}{R(C,b,K)}}{R(C,j,S_1\cdot K)} ~.$$ }
~\qed
\end{itemize}

\newpage

\section{A $\C$ compiler}\label{C-compiler-sec}

This section gives an informal overview of the compiler; in particular, it
highlights the main features of the intermediate languages, the purpose of the
compilation steps, and the optimisations.
2657
\subsection{$\Clight$}
$\Clight$ is a large subset of the $\C$ language that we adopt as
the source language of our compiler.
It features most of the types and operators
of $\C$. It includes pointer arithmetic, pointers to
functions, and \texttt{struct} and \texttt{union} types, as well as
all $\C$ control structures. The main difference from the $\C$
language is that $\Clight$ expressions are side-effect free, which
means that side-effect operators ($\codeex{=}$,$\codeex{+=}$,$\codeex{++}$,$\ldots$) and function calls
within expressions are not supported.
Given a $\C$ program, we rely on the $\cil$ tool~\cite{CIL02} to deal
with the idiosyncrasies of the $\C$ concrete syntax and to produce an
equivalent program in $\Clight$ abstract syntax.
We refer to the $\compcert$ project~\cite{Leroy09}
for a formal definition of the
$\Clight$ language. Here we just recall in
figure \ref{syntax-clight} its syntax, which is
classically structured in expressions, statements, functions,
and whole programs. In order to limit the implementation effort,
our current compiler for $\Clight$ does {\em not} cover the operators
relating to the floating point type {\tt float}. So, in a nutshell,
the fragment of $\C$ we have implemented is $\Clight$ without
floating point.
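
As a concrete illustration of the side-effect-free discipline enforced by
the $\cil$ preprocessing, consider the following hedged sketch (the
temporaries \texttt{tmp1} and \texttt{tmp2} are our own names, and the
exact output of $\cil$ may differ):

{\footnotesize
\begin{verbatim}
/* Original C: side effects and calls nested inside an expression. */
int r = f(x++) + g(y);

/* A Clight-style equivalent: every call and side effect becomes a
   separate statement and fresh temporaries hold intermediate values
   (one of the evaluation orders allowed by C is fixed). */
int tmp1, tmp2, r;
tmp1 = f(x);
x = x + 1;
tmp2 = g(y);
r = tmp1 + tmp2;
\end{verbatim}}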

There is a notable difficulty in compiling the $\C$ language into
$\intel$ assembly code due to the fact that $\intel$'s machine words are
8 bits long while $\C$ has 32-bit primitive datatypes. To deal with this
dissimilarity, we first translate the $\Clight$ input
program into a $\Clight$ program that only uses 8-bit primitive
datatypes.
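
As a hedged illustration of what this preliminary translation must
achieve (the byte-array representation and the helper below are our own,
not the compiler's), a 32-bit addition can be simulated by four 8-bit
additions with an explicit carry:

{\footnotesize
\begin{verbatim}
typedef unsigned char u8;

/* Add two 32-bit integers represented as four 8-bit bytes
   (least significant byte first), using only 8-bit arithmetic. */
void add32(u8 a[4], u8 b[4], u8 r[4])
{
    u8 carry = 0;
    for (int i = 0; i < 4; i++) {
        u8 s  = (u8)(a[i] + b[i]);   /* 8-bit wrap-around addition  */
        u8 c1 = (u8)(s < a[i]);      /* overflow of a[i] + b[i]     */
        r[i]  = (u8)(s + carry);
        u8 c2 = (u8)(r[i] < s);      /* overflow when adding carry  */
        carry = (u8)(c1 | c2);       /* at most one of c1, c2 is 1  */
    }
}
\end{verbatim}}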

\begin{figure}
  \footnotesize{
  \begin{tabular}{l l l l}
    Expressions: & $a$ ::= & $id$               & variable identifier \\
    & & $|$ $n$                                 & integer constant \\
%    & & $|$ $f$                                & float constant \\
    & & $|$ \texttt{sizeof}($\tau$)             & size of a type \\
    & & $|$ $op_1$ $a$                          & unary arithmetic operation \\
    & & $|$ $a$ $op_2$ $a$                      & binary arithmetic operation \\
    & & $|$ $*a$                                & pointer dereferencing \\
    & & $|$ $a.id$                              & field access \\
    & & $|$ $\&a$                               & taking the address of \\
    & & $|$ $(\tau)a$                           & type cast \\
    & & $|$ $a ? a : a$                         & conditional expression \\[10pt]
    Statements: & $s$ ::= & \texttt{skip}       & empty statement \\
    & & $|$ $a=a$                               & assignment \\
    & & $|$ $a=a(a^*)$                          & function call \\
    & & $|$ $a(a^*)$                            & procedure call \\
    & & $|$ $s;s$                               & sequence \\
    & & $|$ \texttt{if} $a$ \texttt{then} $s$ \texttt{else} $s$ & conditional \\
    & & $|$ \texttt{switch} $a$ $sw$            & multi-way branch \\
    & & $|$ \texttt{while} $a$ \texttt{do} $s$  & ``while'' loop \\
    & & $|$ \texttt{do} $s$ \texttt{while} $a$  & ``do'' loop \\
    & & $|$ \texttt{for}($s$,$a$,$s$) $s$       & ``for'' loop\\
    & & $|$ \texttt{break}                      & exit from current loop \\
    & & $|$ \texttt{continue}                   & next iteration of the current loop \\
    & & $|$ \texttt{return} $a^?$               & return from current function \\
    & & $|$ \texttt{goto} $lbl$                 & branching \\
    & & $|$ $lbl$ : $s$                         & labelled statement \\[10pt]
    Switch cases: & $sw$ ::= & \texttt{default} : $s$           & default case \\
    & & $|$ \texttt{case } $n$ : $s;sw$                         & labelled case \\[10pt]
    Variable declarations: & $dcl$ ::= & $(\tau\quad id)^*$             & type and name\\[10pt]
    Functions: & $Fd$ ::= & $\tau$ $id(dcl)\{dcl;s\}$   & internal function \\
    & & $|$ \texttt{extern} $\tau$ $id(dcl)$                    & external function\\[10pt]
    Programs: & $P$ ::= & $dcl;Fd^*;\texttt{main}=id$           & global variables, functions, entry point
  \end{tabular}}
  \caption{Syntax of the $\Clight$ language}
  \label{syntax-clight}
\end{figure}


\subsection{$\Cminor$}

$\Cminor$ is a simple, low-level imperative language, comparable to a
stripped-down, typeless variant of $\C$. Again we refer
to the $\compcert$ project for its formal definition
and we just recall in figure \ref{syntax-cminor}
its syntax, which, as for $\Clight$, is structured in
expressions, statements, functions, and whole programs.

\begin{figure}
  \footnotesize{
  \begin{tabular}{l l l l}
    Signatures: & $sig$ ::= & \texttt{sig} $\vec{\texttt{int}}$ $(\texttt{int}|\texttt{void})$ & arguments and result \\[10pt]
    Expressions: & $a$ ::= & $id$               & local variable \\
     & & $|$ $n$                                & integer constant \\
%     & & $|$ $f$                               & float constant \\
     & & $|$ \texttt{addrsymbol}($id$)          & address of global symbol \\
     & & $|$ \texttt{addrstack}($\delta$)       & address within stack data \\
     & & $|$ $op_1$ $a$                         & unary arithmetic operation \\
     & & $|$ $op_2$ $a$ $a$                     & binary arithmetic operation \\
     & & $|$ $\kappa[a]$                        & memory read\\
     & & $|$ $a?a:a$                    & conditional expression \\[10pt]
    Statements: & $s$ ::= & \texttt{skip}       & empty statement \\
    & & $|$ $id=a$                              & assignment \\
    & & $|$ $\kappa[a]=a$                       & memory write \\
    & & $|$ $id^?=a(\vec{a}):sig$               & function call \\
    & & $|$ \texttt{tailcall} $a(\vec{a}):sig$  & function tail call \\
    & & $|$ \texttt{return}$(a^?)$              & function return \\
    & & $|$ $s;s$                               & sequence \\
    & & $|$ \texttt{if} $a$ \texttt{then} $s$ \texttt{else} $s$         & conditional  \\
    & & $|$ \texttt{loop} $s$                   & infinite loop \\
    & & $|$ \texttt{block} $s$          & block delimiting \texttt{exit} constructs \\
    & & $|$ \texttt{exit} $n$                   & terminate the $(n+1)^{th}$ enclosing block \\
    & & $|$ \texttt{switch} $a$ $tbl$           & multi-way test and exit\\
    & & $|$ $lbl:s$                             & labelled statement \\
    & & $|$ \texttt{goto} $lbl$                 & jump to a label\\[10pt]
    Switch tables: & $tbl$ ::= & \texttt{default:exit}($n$) & \\
    & & $|$ \texttt{case} $i$: \texttt{exit}($n$);$tbl$ & \\[10pt]
    Functions: & $Fd$ ::= & \texttt{internal} $sig$ $\vec{id}$ $\vec{id}$ $n$ $s$ & internal function: signature, parameters, \\
    & & & local variables, stack size and body \\
    & & $|$ \texttt{external} $id$ $sig$ & external function \\[10pt]
    Programs: & $P$ ::= & \texttt{prog} $(id=data)^*$ $(id=Fd)^*$ $id$   & global variables, functions and entry point
  \end{tabular}}
  \caption{Syntax of the $\Cminor$ language}
  \label{syntax-cminor}
\end{figure}

\paragraph{Translation of $\Clight$ to $\Cminor$}
Since stack operations are made explicit in $\Cminor$, one has to know
which variables are stored on the stack. This
information is produced by a static analysis that determines
the variables whose address may be `taken'.
Space is also reserved for local arrays and structures. In a
second step, the compilation proper is performed: it consists mainly
in translating $\Clight$ control structures to the basic
ones available in $\Cminor$.
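
For instance (our own minimal example), in the function below the static
analysis must allocate \texttt{a} in the stack frame because its address
is taken, while \texttt{b} can remain a $\Cminor$ local variable:

{\footnotesize
\begin{verbatim}
int f(void)
{
    int a = 1;      /* address taken below: stack-allocated       */
    int b = 2;      /* never aliased: kept as a Cminor local      */
    int *p = &a;    /* the static analysis flags a at this point  */
    *p = b;
    return a + b;
}
\end{verbatim}}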

\subsection{$\RTLAbs$}

$\RTLAbs$ is the last architecture-independent language in the
compilation process.  It is a rather straightforward {\em abstraction} of
the {\em architecture-dependent} $\RTL$ intermediate language
available in the $\compcert$ project and it is intended to factor out
some work common to the various target assembly languages
(e.g. optimisations) and thus to make retargeting the compiler a
simpler matter.

We stress that in $\RTLAbs$ the structure of $\Cminor$ expressions
is lost and that this may have a negative impact on the subsequent
instruction selection step.
Still, the subtleties of instruction selection seem rather orthogonal
to our goals and we deem the possibility of retargeting
the compiler easily more important than the efficiency of the generated code.


\paragraph{Syntax.}
In $\RTLAbs$, programs are represented as \emph{control flow
  graphs} (CFGs for short). We associate with the nodes of the graphs
instructions reflecting the $\Cminor$ commands. As usual, commands that change the control
flow of the program (e.g. loops, conditionals) are translated by inserting
suitable branching instructions in the CFG.
The syntax of the language is depicted in table
\ref{RTLAbs:syntax}. Local variables are now represented by \emph{pseudo
  registers}, which are available in unbounded number. The grammar rule $\w{op}$, which is
not detailed in table \ref{RTLAbs:syntax}, defines the usual arithmetic and boolean
operations (\codeex{+}, \codeex{xor}, \codeex{$\le$}, etc.) as well as constants
and conversions between sized integers.

\begin{table}
{\footnotesize
\[
\begin{array}{lllllll}
\w{return\_type} & ::= & \s{int} \Alt \s{void} & \qquad
\w{signature} & ::= & (\s{int} \rightarrow)^*\ \w{return\_type}\\
\end{array}
\]

\[
\begin{array}{lllllll}
\w{memq} & ::= & \s{int8s} \Alt \s{int8u} \Alt \s{int16s} \Alt \s{int16u}
                 \Alt \s{int32} & \qquad
\w{fun\_ref} & ::= & \w{fun\_name} \Alt \w{psd\_reg}
\end{array}
\]

\[
\begin{array}{llll}
\w{instruction} & ::= & \Alt \s{skip} \rightarrow \w{node} &
                        \quad \mbox{(no instruction)}\\
                &     & \Alt \w{psd\_reg} := \w{op}(\w{psd\_reg}^*)
                        \rightarrow \w{node} & \quad \mbox{(operation)}\\
                &     & \Alt \w{psd\_reg} := \s{\&}\w{var\_name}
                        \rightarrow \w{node} &
                        \quad \mbox{(address of a global)}\\
                &     & \Alt \w{psd\_reg} := \s{\&locals}[n]
                        \rightarrow \w{node} &
                        \quad \mbox{(address of a local)}\\
                &     & \Alt \w{psd\_reg} := \w{fun\_name}
                        \rightarrow \w{node} &
                        \quad \mbox{(address of a function)}\\
                &     & \Alt \w{psd\_reg} :=
                        \w{memq}(\w{psd\_reg}[\w{psd\_reg}])
                        \rightarrow \w{node} & \quad \mbox{(memory load)}\\
                &     & \Alt \w{memq}(\w{psd\_reg}[\w{psd\_reg}]) :=
                        \w{psd\_reg}
                        \rightarrow \w{node} & \quad \mbox{(memory store)}\\
                &     & \Alt \w{psd\_reg} := \w{fun\_ref}(\w{psd\_reg^*}) :
                        \w{signature} \rightarrow \w{node} &
                        \quad \mbox{(function call)}\\
                &     & \Alt \w{fun\_ref}(\w{psd\_reg^*}) : \w{signature}
                        & \quad \mbox{(function tail call)}\\
                &     & \Alt \s{test}\ \w{op}(\w{psd\_reg}^*) \rightarrow
                        \w{node}, \w{node} & \quad \mbox{(branch)}\\
                &     & \Alt \s{return}\ \w{psd\_reg}? & \quad \mbox{(return)}
\end{array}
\]

\[
\begin{array}{lll}
\w{fun\_def} & ::= & \w{fun\_name}(\w{psd\_reg}^*): \w{signature}\\
             &     & \s{result:} \w{psd\_reg}?\\
             &     & \s{locals:} \w{psd\_reg}^*\\
             &     & \s{stack:} n\\
             &     & \s{entry:} \w{node}\\
             &     & \s{exit:} \w{node}\\
             &     & (\w{node:} \w{instruction})^*
\end{array}
\]

\[
\begin{array}{lllllll}
\w{init\_datum} & ::= & \s{reserve}(n) \Alt \s{int8}(n) \Alt \s{int16}(n)
                       \Alt \s{int32}(n) & \qquad
\w{init\_data} & ::= & \w{init\_datum}^+
\end{array}
\]

\[
\begin{array}{lllllll}
\w{global\_decl} & ::= & \s{var}\ \w{var\_name} \w{\{init\_data\}} & \qquad
\w{fun\_decl} & ::= & \s{extern}\ \w{fun\_name} \w{(signature)} \Alt
                      \w{fun\_def}
\end{array}
\]

\[
\begin{array}{lll}
\w{program} & ::= & \w{global\_decl}^*\\
            &     & \w{fun\_decl}^*
\end{array}
\]}
\caption{Syntax of the $\RTLAbs$ language}\label{RTLAbs:syntax}
\end{table}

\paragraph{Translation of $\Cminor$ to $\RTLAbs$.}
Translating $\Cminor$ programs to $\RTLAbs$ programs mainly consists in
transforming $\Cminor$ commands into CFGs. Most commands are sequential and have a
rather straightforward linear translation. A conditional is translated into a
branch instruction; a loop is translated using a back edge in the CFG, as illustrated below.
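
As an illustration (labels and code are ours), the CFG of a loop such as
\texttt{while} $(x<n)$ \texttt{do} $x=x+1$ can be rendered as the
following $\C$ fragment with explicit gotos, where the jump from the body
back to the test is the back edge:

{\footnotesize
\begin{verbatim}
test: if (!(x < n)) goto done;  /* branch node: two successors */
      x = x + 1;                /* operation node (loop body)  */
      goto test;                /* back edge closing the loop  */
done: ;
\end{verbatim}}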



\subsection{$\RTL$}

As in $\RTLAbs$, the structure of $\RTL$ programs is based on
CFGs. $\RTL$ is the first architecture-dependent intermediate language
of our compiler which, in its current version, targets the $\mips$ and
$\intel$ assembly languages.

\paragraph{Syntax.}
$\RTL$ is very close to $\RTLAbs$. It is based on CFGs and makes explicit
the $\mips$ or $\intel$ instructions corresponding to the
$\RTLAbs$ instructions. Type information disappears: everything is
represented using machine words. Moreover, each global of the
program is associated with an offset. The syntax of the language can be
found in table \ref{RTL:syntax}. The grammar rules $\w{unop}$,
$\w{binop}$, $\w{uncon}$, and $\w{bincon}$, respectively, represent
the sets of unary operations, binary operations, unary conditions and
binary conditions of the target assembly language.

\begin{table}
{\footnotesize
\[
\begin{array}{lllllll}
\w{size} & ::= & \s{Byte} \Alt \s{HalfWord} \Alt \s{Word}  & \qquad
\w{fun\_ref} & ::= & \w{fun\_name} \Alt \w{psd\_reg}
\end{array}
\]

\[
\begin{array}{llll}
\w{instruction} & ::= & \Alt \s{skip} \rightarrow \w{node} &
                        \quad \mbox{(no instruction)}\\
                &     & \Alt \w{psd\_reg} := n
                        \rightarrow \w{node} & \quad \mbox{(constant)}\\
                &     & \Alt \w{psd\_reg} := \w{unop}(\w{psd\_reg})
                        \rightarrow \w{node} & \quad \mbox{(unary operation)}\\
                &     & \Alt \w{psd\_reg} :=
                        \w{binop}(\w{psd\_reg},\w{psd\_reg})
                        \rightarrow \w{node} & \quad
                        \mbox{(binary operation)}\\
                &     & \Alt \w{psd\_reg} := \s{\&globals}[n]
                        \rightarrow \w{node} &
                        \quad \mbox{(address of a global)}\\
                &     & \Alt \w{psd\_reg} := \s{\&locals}[n]
                        \rightarrow \w{node} &
                        \quad \mbox{(address of a local)}\\
                &     & \Alt \w{psd\_reg} := \w{fun\_name}
                        \rightarrow \w{node} &
                        \quad \mbox{(address of a function)}\\
                &     & \Alt \w{psd\_reg} :=
                        \w{size}(\w{psd\_reg}[n])
                        \rightarrow \w{node} & \quad \mbox{(memory load)}\\
                &     & \Alt \w{size}(\w{psd\_reg}[n]) :=
                        \w{psd\_reg}
                        \rightarrow \w{node} & \quad \mbox{(memory store)}\\
                &     & \Alt \w{psd\_reg} := \w{fun\_ref}(\w{psd\_reg^*})
                        \rightarrow \w{node} &
                        \quad \mbox{(function call)}\\
                &     & \Alt \w{fun\_ref}(\w{psd\_reg^*})
                        & \quad \mbox{(function tail call)}\\
                &     & \Alt \s{test}\ \w{uncon}(\w{psd\_reg}) \rightarrow
                        \w{node}, \w{node} & \quad
                        \mbox{(branch unary condition)}\\
                &     & \Alt \s{test}\ \w{bincon}(\w{psd\_reg},\w{psd\_reg})
                        \rightarrow \w{node}, \w{node} & \quad
                        \mbox{(branch binary condition)}\\
                &     & \Alt \s{return}\ \w{psd\_reg}? & \quad
                        \mbox{(return)}
\end{array}
\]

\[
\begin{array}{lllllll}
\w{fun\_def} & ::= & \w{fun\_name}(\w{psd\_reg}^*) & \qquad
\w{program} & ::= & \s{globals}: n\\
             &     & \s{result:} \w{psd\_reg}? & \qquad
            &     & \w{fun\_def}^*\\
             &     & \s{locals:} \w{psd\_reg}^*\\
             &     & \s{stack:} n\\
             &     & \s{entry:} \w{node}\\
             &     & \s{exit:} \w{node}\\
             &     & (\w{node:} \w{instruction})^*
\end{array}
\]}
\caption{Syntax of the $\RTL$ language}\label{RTL:syntax}
\end{table}

\paragraph{Translation of $\RTLAbs$ to $\RTL$.}
This translation is mostly straightforward. An $\RTLAbs$ instruction is often
directly translated to a corresponding assembly instruction. There are a few
exceptions: some $\RTLAbs$ instructions are expanded into two or more assembly
instructions. When the translation of an $\RTLAbs$ instruction requires more than
a few simple assembly instructions, it is translated into a call to a function
defined in the preamble of the compilation result.

\subsection{$\ERTL$}

As in $\RTL$, the structure of $\ERTL$ programs is based on
CFGs. $\ERTL$ makes explicit the calling conventions of the $\mips$
assembly language. In the back-end for $\intel$, we defined our own
calling convention since there was no standard one.

\paragraph{Syntax.}
The syntax of the language is given in table \ref{ERTL:syntax}. The main
difference between $\RTL$ and $\ERTL$ is the use of hardware registers.
Parameters are passed in specific hardware registers; if there are too many
parameters, the remaining ones are stored on the stack. Other hardware registers
have conventional uses: a register that holds the result of a function, a
register that holds the base address of the globals, a register that holds the
address of the top of the stack, and some registers that need to be saved when
entering a function and whose values are restored when leaving a
function. Following these conventions, function calls do not list their
parameters anymore; they only mention their number. Two new instructions appear
that allocate and deallocate the stack space needed by a function to
execute. Along with these two instructions come two instructions to fetch or
assign a value in the parameter sections of the stack; these instructions cannot
yet be translated using regular load and store instructions because we do not
know the final size of the stack area of each function. Finally, the return
instruction has a boolean argument that tells whether the result of the function
may be used later or not (this is exploited for optimisations).

\begin{table}
{\footnotesize
\[
\begin{array}{lllllll}
\w{size} & ::= & \s{Byte} \Alt \s{HalfWord} \Alt \s{Word}  & \qquad
\w{fun\_ref} & ::= & \w{fun\_name} \Alt \w{psd\_reg}
\end{array}
\]

\[
\begin{array}{llll}
\w{instruction} & ::= & \Alt \s{skip} \rightarrow \w{node} &
                        \quad \mbox{(no instruction)}\\
                &     & \Alt \s{NewFrame} \rightarrow \w{node} &
                        \quad \mbox{(frame creation)}\\
                &     & \Alt \s{DelFrame} \rightarrow \w{node} &
                        \quad \mbox{(frame deletion)}\\
                &     & \Alt \w{psd\_reg} := \s{stack}[\w{slot}, n]
                        \rightarrow \w{node} &
                        \quad \mbox{(stack load)}\\
                &     & \Alt \s{stack}[\w{slot}, n] := \w{psd\_reg}
                        \rightarrow \w{node} &
                        \quad \mbox{(stack store)}\\
                &     & \Alt \w{hdw\_reg} := \w{psd\_reg}
                        \rightarrow \w{node} &
                        \quad \mbox{(pseudo to hardware)}\\
                &     & \Alt \w{psd\_reg} := \w{hdw\_reg}
                        \rightarrow \w{node} &
                        \quad \mbox{(hardware to pseudo)}\\
                &     & \Alt \w{psd\_reg} := n
                        \rightarrow \w{node} & \quad \mbox{(constant)}\\
                &     & \Alt \w{psd\_reg} := \w{unop}(\w{psd\_reg})
                        \rightarrow \w{node} & \quad \mbox{(unary operation)}\\
                &     & \Alt \w{psd\_reg} :=
                        \w{binop}(\w{psd\_reg},\w{psd\_reg})
                        \rightarrow \w{node} & \quad
                        \mbox{(binary operation)}\\
                &     & \Alt \w{psd\_reg} := \w{fun\_name}
                        \rightarrow \w{node} &
                        \quad \mbox{(address of a function)}\\
                &     & \Alt \w{psd\_reg} :=
                        \w{size}(\w{psd\_reg}[n])
                        \rightarrow \w{node} & \quad \mbox{(memory load)}\\
                &     & \Alt \w{size}(\w{psd\_reg}[n]) :=
                        \w{psd\_reg}
                        \rightarrow \w{node} & \quad \mbox{(memory store)}\\
                &     & \Alt \w{fun\_ref}(n) \rightarrow \w{node} &
                        \quad \mbox{(function call)}\\
                &     & \Alt \w{fun\_ref}(n)
                        & \quad \mbox{(function tail call)}\\
                &     & \Alt \s{test}\ \w{uncon}(\w{psd\_reg}) \rightarrow
                        \w{node}, \w{node} & \quad
                        \mbox{(branch unary condition)}\\
                &     & \Alt \s{test}\ \w{bincon}(\w{psd\_reg},\w{psd\_reg})
                        \rightarrow \w{node}, \w{node} & \quad
                        \mbox{(branch binary condition)}\\
                &     & \Alt \s{return}\ b & \quad \mbox{(return)}
\end{array}
\]

\[
\begin{array}{lllllll}
\w{fun\_def} & ::= & \w{fun\_name}(n) & \qquad
\w{program} & ::= & \s{globals}: n\\
             &     & \s{locals:} \w{psd\_reg}^* & \qquad
            &     & \w{fun\_def}^*\\
             &     & \s{stack:} n\\
             &     & \s{entry:} \w{node}\\
             &     & (\w{node:} \w{instruction})^*
\end{array}
\]}
\caption{Syntax of the $\ERTL$ language}\label{ERTL:syntax}
\end{table}

\paragraph{Translation of $\RTL$ to $\ERTL$.}
The work consists in making explicit the conventions previously mentioned. These
conventions appear when entering, calling and leaving a function, and when
referencing a global variable or the address of a local variable.

\paragraph{Optimizations.}
A \emph{liveness analysis} is performed on $\ERTL$ to replace unused
instructions by a $\s{skip}$. An instruction is tagged as unused when it
performs an assignment on a register that will not be read afterwards (see the
sketch below). Also, the
result of the liveness analysis is exploited by a \emph{register allocation}
algorithm whose task is to efficiently associate a physical location (a
hardware register or an address in the stack) to each pseudo register of the
program.
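
The following hedged $\C$-level sketch (our own example) shows the kind of
instruction the liveness analysis removes: the first assignment to
\texttt{t} is dead because \texttt{t} is overwritten before being read, so
the corresponding $\ERTL$ instruction is replaced by a $\s{skip}$.

{\footnotesize
\begin{verbatim}
int g(int x)
{
    int t;
    t = x * 2;   /* dead: t is redefined below before any read */
    t = x + 1;   /* live: this value of t reaches the return   */
    return t;
}
\end{verbatim}}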

\subsection{$\LTL$}

As in $\ERTL$, the structure of $\LTL$ programs is based on CFGs. Pseudo
registers are not used anymore; instead, they are replaced by physical locations
(a hardware register or an address in the stack).

\paragraph{Syntax.}
With a few exceptions, the instructions of the language are those of
$\ERTL$ with hardware registers replacing pseudo registers. Calling and
returning conventions were made explicit in $\ERTL$; thus, function calls and
returns do not need parameters in $\LTL$. The syntax is defined in table
\ref{LTL:syntax}.

\begin{table}
{\footnotesize
\[
\begin{array}{lllllll}
\w{size} & ::= & \s{Byte} \Alt \s{HalfWord} \Alt \s{Word}  & \qquad
\w{fun\_ref} & ::= & \w{fun\_name} \Alt \w{hdw\_reg}
\end{array}
\]

\[
\begin{array}{llll}
\w{instruction} & ::= & \Alt \s{skip} \rightarrow \w{node} &
                        \quad \mbox{(no instruction)}\\
                &     & \Alt \s{NewFrame} \rightarrow \w{node} &
                        \quad \mbox{(frame creation)}\\
                &     & \Alt \s{DelFrame} \rightarrow \w{node} &
                        \quad \mbox{(frame deletion)}\\
                &     & \Alt \w{hdw\_reg} := n
                        \rightarrow \w{node} & \quad \mbox{(constant)}\\
                &     & \Alt \w{hdw\_reg} := \w{unop}(\w{hdw\_reg})
                        \rightarrow \w{node} & \quad \mbox{(unary operation)}\\
                &     & \Alt \w{hdw\_reg} :=
                        \w{binop}(\w{hdw\_reg},\w{hdw\_reg})
                        \rightarrow \w{node} & \quad
                        \mbox{(binary operation)}\\
                &     & \Alt \w{hdw\_reg} := \w{fun\_name}
                        \rightarrow \w{node} &
                        \quad \mbox{(address of a function)}\\
                &     & \Alt \w{hdw\_reg} := \w{size}(\w{hdw\_reg}[n])
                        \rightarrow \w{node} & \quad \mbox{(memory load)}\\
                &     & \Alt \w{size}(\w{hdw\_reg}[n]) := \w{hdw\_reg}
                        \rightarrow \w{node} & \quad \mbox{(memory store)}\\
                &     & \Alt \w{fun\_ref}() \rightarrow \w{node} &
                        \quad \mbox{(function call)}\\
                &     & \Alt \w{fun\_ref}()
                        & \quad \mbox{(function tail call)}\\
                &     & \Alt \s{test}\ \w{uncon}(\w{hdw\_reg}) \rightarrow
                        \w{node}, \w{node} & \quad
                        \mbox{(branch unary condition)}\\
                &     & \Alt \s{test}\ \w{bincon}(\w{hdw\_reg},\w{hdw\_reg})
                        \rightarrow \w{node}, \w{node} & \quad
                        \mbox{(branch binary condition)}\\
                &     & \Alt \s{return} & \quad \mbox{(return)}
\end{array}
\]

\[
\begin{array}{lllllll}
\w{fun\_def} & ::= & \w{fun\_name}(n) & \qquad
\w{program} & ::= & \s{globals}: n\\
             &     & \s{locals:} n & \qquad
            &     & \w{fun\_def}^*\\
             &     & \s{stack:} n\\
             &     & \s{entry:} \w{node}\\
             &     & (\w{node:} \w{instruction})^*
\end{array}
\]}
\caption{Syntax of the $\LTL$ language}\label{LTL:syntax}
\end{table}

\paragraph{Translation of $\ERTL$ to $\LTL$.} The translation relies on the
results of the liveness analysis and of the register allocation. Unused
instructions are eliminated and each pseudo register is replaced by a physical
location. In $\LTL$, the size of the stack frame of a function is known, so
instructions intended to load or store values on the stack are translated
using regular load and store instructions.

\paragraph{Optimizations.} A \emph{graph compression} algorithm removes the empty
instructions generated by previous compilation passes and by the liveness
analysis.

\subsection{$\LIN$}

In $\LIN$, the structure of a program is no longer based on CFGs. Every function
is represented as a sequence of instructions.

\paragraph{Syntax.}
The instructions of $\LIN$ are very close to those of $\LTL$. \emph{Program
  labels}, \emph{gotos} and branch instructions handle the changes in the
control flow. The syntax of $\LIN$ programs is shown in table \ref{LIN:syntax}.

\begin{table}
{\footnotesize
\[
\begin{array}{lllllll}
\w{size} & ::= & \s{Byte} \Alt \s{HalfWord} \Alt \s{Word}  & \qquad
\w{fun\_ref} & ::= & \w{fun\_name} \Alt \w{hdw\_reg}
\end{array}
\]

\[
\begin{array}{llll}
\w{instruction} & ::= & \Alt \s{NewFrame} &
                        \quad \mbox{(frame creation)}\\
                &     & \Alt \s{DelFrame} &
                        \quad \mbox{(frame deletion)}\\
                &     & \Alt \w{hdw\_reg} := n & \quad \mbox{(constant)}\\
                &     & \Alt \w{hdw\_reg} := \w{unop}(\w{hdw\_reg})
                        & \quad \mbox{(unary operation)}\\
                &     & \Alt \w{hdw\_reg} :=
                        \w{binop}(\w{hdw\_reg},\w{hdw\_reg})
                        & \quad \mbox{(binary operation)}\\
                &     & \Alt \w{hdw\_reg} := \w{fun\_name}
                        & \quad \mbox{(address of a function)}\\
                &     & \Alt \w{hdw\_reg} := \w{size}(\w{hdw\_reg}[n])
                        & \quad \mbox{(memory load)}\\
                &     & \Alt \w{size}(\w{hdw\_reg}[n]) := \w{hdw\_reg}
                        & \quad \mbox{(memory store)}\\
                &     & \Alt \s{call}\ \w{fun\_ref}
                        & \quad \mbox{(function call)}\\
                &     & \Alt \s{tailcall}\ \w{fun\_ref}
                        & \quad \mbox{(function tail call)}\\
                &     & \Alt \w{uncon}(\w{hdw\_reg}) \rightarrow
                        \w{asm\_label} & \quad
                        \mbox{(branch unary condition)}\\
                &     & \Alt \w{bincon}(\w{hdw\_reg},\w{hdw\_reg})
                        \rightarrow \w{asm\_label} & \quad
                        \mbox{(branch binary condition)}\\
                &     & \Alt \w{asm\_label:} & \quad \mbox{(assembly label)}\\
                &     & \Alt \s{goto}\ \w{asm\_label} & \quad \mbox{(goto)}\\
                &     & \Alt \s{return} & \quad \mbox{(return)}
\end{array}
\]

\[
\begin{array}{lllllll}
\w{fun\_def} & ::= & \w{fun\_name}(n) & \qquad
\w{program} & ::= & \s{globals}: n\\
             &     & \s{locals:} n & \qquad
            &     & \w{fun\_def}^*\\
             &     & \w{instruction}^*
\end{array}
\]}
\caption{Syntax of the $\LIN$ language}\label{LIN:syntax}
\end{table}

\paragraph{Translation of $\LTL$ to $\LIN$.}
This translation amounts to transforming, in an efficient way, the graph structure of
functions into a linear sequence of instructions.

\subsection{Assembly}

$\mips$ is a rather simple assembly language whereas $\intel$'s is
rather complex~\cite{intel94}. We only describe $\mips$. As for other
assembly languages, a program in $\mips$ is a sequence of
instructions. The $\mips$ code produced by the compilation of a
$\Clight$ program starts with a preamble in which some useful
non-primitive functions are predefined (e.g. conversion from 8-bit
unsigned integers to 32-bit integers). The subset of the $\mips$
assembly language that the compilation produces is defined in
table \ref{mips:syntax}.

\begin{table}
{\footnotesize
\[
\begin{array}{lllllllllll}
\w{load} & ::= & \s{lb} \Alt \s{lhw} \Alt \s{lw} & \qquad
\w{store} & ::= & \s{sb} \Alt \s{shw} \Alt \s{sw} & \qquad
\w{fun\_ref} & ::= & \w{fun\_name} \Alt \w{hdw\_reg}
\end{array}
\]

\[
\begin{array}{llll}
\w{instruction} & ::= & \Alt \s{nop} & \quad \mbox{(empty instruction)}\\
                &     & \Alt \s{li}\ \w{hdw\_reg}, n & \quad \mbox{(constant)}\\
                &     & \Alt \w{unop}\ \w{hdw\_reg}, \w{hdw\_reg}
                        & \quad \mbox{(unary operation)}\\
                &     & \Alt \w{binop}\
                        \w{hdw\_reg},\w{hdw\_reg},\w{hdw\_reg}
                        & \quad \mbox{(binary operation)}\\
                &     & \Alt \s{la}\ \w{hdw\_reg}, \w{fun\_name}
                        & \quad \mbox{(address of a function)}\\
                &     & \Alt \w{load}\ \w{hdw\_reg}, n(\w{hdw\_reg})
                        & \quad \mbox{(memory load)}\\
                &     & \Alt \w{store}\ \w{hdw\_reg}, n(\w{hdw\_reg})
                        & \quad \mbox{(memory store)}\\
                &     & \Alt \s{call}\ \w{fun\_ref}
                        & \quad \mbox{(function call)}\\
                &     & \Alt \w{uncon}\ \w{hdw\_reg}, \w{mips\_label} & \quad
                        \mbox{(branch unary condition)}\\
                &     & \Alt \w{bincon}\ \w{hdw\_reg},\w{hdw\_reg},\w{mips\_label}
                        & \quad \mbox{(branch binary condition)}\\
                &     & \Alt \w{mips\_label:} & \quad \mbox{($\mips$ label)}\\
                &     & \Alt \s{j}\ \w{mips\_label} & \quad \mbox{(goto)}\\
                &     & \Alt \s{return} & \quad \mbox{(return)}
\end{array}
\]

\[
\begin{array}{lll}
\w{program} & ::= & \s{globals}: n\\
            &     & \s{entry}: \w{mips\_label}^*\\
            &     & \w{instruction}^*
\end{array}
\]}
\caption{Syntax of the $\mips$ language}\label{mips:syntax}
\end{table}

\paragraph{Translation of $\LIN$ to $\mips$.} This final translation is simple
enough. Stack allocation and deallocation are made explicit and the function
definitions are sequentialised.


\subsection{Benchmarks}
\label{C-compiler-benchmarks}
\begin{figure}

\begin{center}
\begin{tabular}{r|rrr}
& \texttt{gcc -O0} & \texttt{acc} & \texttt{gcc -O1} \\
\hline
badsort & 55.93 & 34.51 & 12.96 \\
fib & 76.24 & 34.28 & 45.68 \\
mat\_det & 163.42 & 156.20 & 54.76 \\
min & 12.21 & 16.25 & 3.95 \\
quicksort & 27.46 & 17.95 & 9.41 \\
search & 463.19 & 623.79 & 155.38 \\
\hline
\end{tabular}
\end{center}

\caption{Benchmark results (execution times are given in seconds).}
\label{fig:benchmark-results}
\end{figure}

To ensure that our prototype compiler is realistic, we performed some
preliminary benchmarks on a 183MHz MIPS 4KEc processor running a
Linux-based distribution. We compared the wall-clock execution time of
several simple~\C\ programs compiled with our compiler (\texttt{acc})
against that of the binaries produced by {\sc Gcc} with optimisation
levels 0 and 1. As shown by figure~\ref{fig:benchmark-results}, our
prototype compiler produces executables that are on average faster than
those produced by {\sc Gcc} without optimisations.


\newpage

\section{A direct approach}\label{direct-sec}
Our first attempt at building cost annotations followed
a `direct' approach which is summarised by the following diagram.


\[
\xymatrix{
% Row 1
  L_1 \ar[d]^{An_1} \ar[r]^{\cl{C}_1}
& L_2 \ar[d]^{An_2}
& \ldots \hspace{0.3cm}\ar[r]^{\cl{C}_k}
& L_{k+1} \ar[d]^{An_{k+1}}  \\
% Row 2
  L_1
& L_2
& & L_{k+1}
}
\]

With respect to the introductory discussion in section \ref{intro-sec},
$L_1$ is the source language with the related annotation function $\w{An}_1$ while
$L_{k+1}$ is the object language with a related annotation
$\w{An}_{k+1}$. This annotation of the object code is supposed to be truly
straightforward and it is taken as an `axiomatic' definition of
the `real' execution cost of the program.
The languages $L_i$, for $2\leq i \leq k$, are
intermediate languages which are also equipped with increasingly `realistic'
annotation functions.
Suppose we denote with $S$ the source program,
with $\cl{C}(S)$ the compiled program, and
that we write $(P,s)\eval s'$ to mean that
the (source or object) program $P$ in the state $s$
terminates successfully in the state $s'$.
The soundness proof of the compilation function
guarantees that if $(S,s)\eval s'$ then
$(\cl{C}(S),s)\eval s'$.
In the direct approach, the {\em proof of soundness} of
the cost annotations amounts to lifting the proof of functional
equivalence of the source program and the object code to a proof of
`quasi-equivalence' of the respective instrumented codes.
Suppose we write $s[c/\w{cost}]$ for a state that associates $c$
with the cost variable $\w{cost}$. Then what we want to show is
that whenever $(\w{An}_{1}(S),s[c/\w{cost}])\eval s'[c'/\w{cost}]$
we have that $(\w{An}_{k+1}(\cl{C}(S)),s[d/\w{cost}])\eval s'[d'/\w{cost}]$
and $|d'-d|\leq |c'-c|+k$.
This means that the increment in the annotated source program bounds, up to
an additive constant, the increment in the annotated object program.
We will also say that the cost annotation is precise if we can also prove that
$|c'-c|\leq |d'-d|$, {\em i.e.}, the `estimated' cost is not too far
away from the `real' one.
We will see that while in theory one can build sound and precise
annotation functions, in practice definitions and proofs become
unwieldy.



Applying the direct approach to the toy compiler
results in the following diagram.


\[
\xymatrix{
\imp
\ar[r]^{\cl{C}}
\ar[d]^{An_{\imp}}
&
\vm
\ar[r]^{\cl{C'}}
\ar[d]^{An_{\vm}}
&
\mips
\ar[d]^{An_{\mips}}
\\
\imp
&
\vm
& \mips\\
}
\]


\subsection{$\mips$ and $\vm$ cost annotations}
The definition of the cost annotation $\w{An}_{\mips}(M)$ for a $\mips$ code $M$ goes as follows, assuming that all $\mips$ instructions have cost $1$.

\begin{enumerate}

\item We select fresh locations $l_{\w{cost}},l_A,l_B$ not used in $M$.

\item Before each instruction in $M$ we insert the following
list of $8$ instructions whose effect is to
increase by $1$ the contents of location $l_{\w{cost}}$:

{\footnotesize
\[
\begin{array}{l}
(\s{store} \ A,l_{A}) \cdot
(\s{store} \ B,l_{B}) \cdot
(\s{loadi} \ A, 1) \cdot
(\s{load} \ B, l_{\w{cost}}) \cdot \\
(\s{add} \ A, A, B) \cdot
(\s{store} \ A,l_{\w{cost}}) \cdot
(\s{load} \ A, l_{A}) \cdot
(\s{load} \ B, l_{B})~.
\end{array}
\]}

\item We update the offsets of the branching instructions according to the map
$k \mapsto (9\cdot k)$.

\end{enumerate}

The definition of the cost annotation
$\w{An}_{\vm}(C)$ for a well-formed $\vm$ code $C$ such that
$C:h$ goes as follows.

\begin{enumerate}

\item We select a fresh $\w{cost}$ variable not used in $C$.

\item Before the $i^{\w{th}}$ instruction in $C$,
we insert the following list of 4 instructions whose
effect is to increase the $\w{cost}$ variable by the maximum number
$\kappa(C[i])$ of $\mips$ instructions associated with $C[i]$ (as specified
in table \ref{cost-approx-instr}):

{\footnotesize\[
(\s{cnst}(\kappa(C[i]))) \cdot
(\s{var}(\w{cost})) \cdot
(\s{add}) \cdot
(\s{setvar}(\w{cost}))~.
\]}

\item We update the offsets of the branching instructions according to the map
$k\mapsto (5 \cdot k)$.

\end{enumerate}


\begin{table}
{\footnotesize
\[
\begin{array}{c|cccccc}
C[i]=
&\s{cnst}(n) \mbox{ or } \s{var}(x)
&\s{add}
&\s{setvar}(x)
&\s{branch}(k)
&\s{bge}(k)
&\s{halt} \\ \hline
\kappa(C[i])=
&2
&4
&2
&1
&3
&1
\end{array}
\]}
\caption{Upper bounds on the cost of $\vm$ instructions}\label{cost-approx-instr}
\end{table}


To summarise, the situation is that the instruction $i$ of the $\vm$
code $C$ corresponds to: the instruction $5\cdot i +4$ of the
annotated $\vm$ code $\w{An}_{\vm}(C)$, the instruction $p(i,C)$ of
the $\mips$ code $\cl{C'}(C)$, and the instruction $9\cdot p(i,C)+8$
of the instrumented $\mips$ code $\w{An}_{\mips}(\cl{C'}(C))$.

The following lemma describes the effect of the injected sequences
of instructions.

\begin{lemma}\label{annot-vm-mips}
The following properties hold:

\Defitem{(1)} If $M$ is a $\mips$ instruction then $C_1 \cdot \w{An}_{\mips}(M) \cdot C_2 \Gives (|C_1|,m[c/l_{\w{cost}}]) \trarrow (|C_1|+8,m[c+1/l_{\w{cost}},m(A)/l_A,m(B)/l_B])$.

\Defitem{(2)} If $C$ is a $\vm$ instruction then $C_1 \cdot \w{An}_{\vm}(C) \cdot C_2 \Gives (|C_1|,\sigma,s[c/\w{cost}]) \trarrow (|C_1|+4,\sigma,s[c + \kappa(C)/\w{cost}])$.

\end{lemma}


The simulation proposition \ref{simulation-vm-mips} is extended to the
annotated codes as follows, where we write $\arrow^k$ for the relation
obtained by composing $k$ times the reduction relation $\arrow$.


\begin{proposition}\label{labelled-simulation-vm-mips}
Let $C : h$ be a well-formed code. If $\w{An}_{\vm}(C) \Gives (5 \cdot
i, \sigma, s) \arrow^5 (5 \cdot j, \sigma', s')$, $m \real \sigma,
s$, and $h(i) = |\sigma|$ then $\w{An}_{\mips}(\cl{C'}(C)) \Gives (9
\cdot p(i,C),m) \trarrow (9\cdot p(j,C),m')$, with $m'[s'(\w{cost}) /
  l_{\w{cost}}] \real \sigma', s'$ and
\[
m'(l_{\w{cost}}) - m(l_{\w{cost}}) \le s'(\w{cost}) - s(\w{cost})~.
\]
\end{proposition}


\subsection{$\imp$ cost annotation}\label{imp-cost-annotation}
The cost annotation $\w{An}_{\imp}(P)$ for an $\imp$
program $P$ is defined in table \ref{annotation-imp} and it relies on
an auxiliary function $\kappa$ which provides an upper bound on the
cost of executing the various operators of the language.
For the sake of simplicity, this
annotation introduces two main {\em approximations}:

\begin{itemize}

\item It over-approximates the cost of
an $\s{if\_then\_else}$ by taking the maximum
of the costs of the two branches.

\item It always considers the worst case, in which
the data on the stack reside in the main memory.

\end{itemize}

We will discuss in section \ref{limitations-direct-sec} to what extent
these approximations can be removed.

\begin{table}
{\footnotesize
\[
\begin{array}{ll}
\w{An}_{\imp}(\s{prog} \ S) &= \w{cost}:=\w{cost}+\kappa(S); \w{An}_{\imp}(S) \\
\w{An}_{\imp}(\s{skip})     &= \s{skip} \\
\w{An}_{\imp}(x:=e)         &= x:=e \\
\w{An}_{\imp}(S;S')         &= \w{An}_{\imp}(S);\w{An}_{\imp}(S') \\
\w{An}_{\imp}(\s{if} \ b \ \s{then} \ S \ \s{else} \ S')
&=  (\s{if} \ b \ \s{then} \  \w{An}_{\imp}(S) \\
&\qquad\qquad  \s{else} \ \w{An}_{\imp}(S')) \\
\w{An}_{\imp}(\s{while} \ b \ \s{do} \ S)
&= (\s{while} \ b \
\s{do} \ \w{cost}:=\w{cost}+\kappa(b)+\kappa(S)+1; \w{An}_{\imp}(S)  )
\end{array}
\]
\[
\begin{array}{ll}
\kappa(\s{skip})  =0
&\kappa(x:=e)       =\kappa(e) + \kappa(\s{setvar}) \\
\kappa(S;S')       = \kappa(S)+\kappa(S')
&\kappa(\s{if} \ b \ \s{then} \ S \ \s{else} \ S')
                   =\kappa(b)+\w{max}(\kappa(S) + \kappa(\s{branch}),\kappa(S')) \\
\kappa(\s{while} \ b \ \s{do} \ S)
                   =\kappa(b)  \\
\kappa(e<e')      =\kappa(e)+\kappa(e')+ \kappa(\s{bge})
&\kappa(e+e')    = \kappa(e)+\kappa(e')+ \kappa(\s{add})  \\
\kappa(n)       = \kappa(\s{cnst})
&\kappa(x)       = \kappa(\s{var})
\end{array}
\]}
\caption{Annotation for $\imp$ programs}\label{annotation-imp}
\end{table}
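
To make the definitions of table \ref{annotation-imp} concrete, here is a
hedged $\C$-style rendering of the annotation of a small program; the
constants \texttt{K\_PRE} and \texttt{K\_ITER} are our own names for the
values computed by $\kappa$:

{\footnotesize
\begin{verbatim}
unsigned long cost = 0;   /* the fresh global cost variable */

/* source:  prog (while x < n do x := x + 1)                */
cost += K_PRE;            /* kappa(while b do S) = kappa(b):
                             pays for the final, failing test */
while (x < n) {
    cost += K_ITER;       /* kappa(b) + kappa(S) + 1: one more test,
                             one body, and the back branch   */
    x = x + 1;
}
\end{verbatim}}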

Next we formulate the soundness of the $\imp$ annotation with respect
to the $\vm$ annotation, along the pattern of proposition
\ref{soundness-big-step-prop}.

\begin{proposition}\label{annot-imp-vm}
The following properties hold.

\Defitem{(1)} If $(e,s)\eval v$ then
\[
C \cdot \w{An}_{\vm}(\cl{C}(e)) \cdot C'
\Gives (i,\sigma,s[c/\w{cost}]) \trarrow
(j,v \cdot \sigma,s[c+\kappa(e)/\w{cost}])
\]
where $i=|C|$, $j=|C\cdot \w{An}_{\vm}(\cl{C}(e))|$.

\Defitem{(2)} If $(b,s)\eval \s{true}$ then
\[
C \cdot \w{An}_{\vm}(\cl{C}(b,k)) \cdot C'
\Gives (i,\sigma,s[c/\w{cost}]) \trarrow
(5\cdot k +j, \sigma,s[c+\kappa(b)/\w{cost}])
\]
where $i=|C|$, $j=|C\cdot \w{An}_{\vm}(\cl{C}(b,k))|$.

\Defitem{(3)} If
$(b,s)\eval \s{false}$ then
\[
C \cdot \w{An}_{\vm}(\cl{C}(b,k)) \cdot C'
\Gives (i,\sigma,s[c/\w{cost}]) \trarrow
(j,\sigma,s[c+\kappa(b)/\w{cost}])
\]
where $i=|C|$, $j=|C\cdot \w{An}_{\vm}(\cl{C}(b,k))|$.

\Defitem{(4)}
If $(\w{An}_{\imp}(S),s[c/\w{cost}])\eval s'[c'/\w{cost}]$ then
\[
C \cdot \w{An}_{\vm}(\cl{C}(S)) \cdot C'
\Gives (i,\sigma,s[d/\w{cost}]) \trarrow (j,\sigma,s'[d'/\w{cost}])
\]
where $i=|C|$, $j=|C\cdot \w{An}_{\vm}(\cl{C}(S))|$,
and $(d'-d) \leq (c'-c)+\kappa(S)$.
\end{proposition}


\subsection{Composition}
The soundness of the cost annotations can be composed
so as to obtain the soundness of the cost annotations of the source
program relative to those of the object code.

\begin{proposition}\label{annot-composition}
If $(\w{An}_{\imp}(P),s[0/\w{cost}])\eval s'[c'/\w{cost}]$
and $m\real \epsilon,s[0/l_{\w{cost}}]$ then
\[
(\w{An}_{\mips}(\cl{C'}(\cl{C}(P))),m)
 \eval m'
\]
where $m'(l_{\w{cost}})\leq c'$ and $m'[c'/l_{\w{cost}}] \real \epsilon,s'[c'/\w{cost}]$.
\end{proposition}


\subsection{$\coq$ development}\label{coq-development-sec}
We have formalised and mechanically checked in {\sc Coq} the
application of the direct approach to the toy compiler (except for
propositions \ref{labelled-simulation-vm-mips} and
\ref{annot-composition}, for which
we produced `paper proofs').
By current standards, this is a small-size
development comprising 1000 lines of specifications and 4000 lines of
proofs.  Still, there are a couple of points that deserve to be
mentioned.  First, we did not find a way of re-using the soundness
proof of the compiler in a modular way.  As a matter of fact, the
soundness proof of the annotations is intertwined with the soundness
proof of the compiler.  Second, the manipulation of the cost variables
in the annotated programs entails a significant increase in the size
of the proofs.  In particular, the soundness proof for the compilation
function $\cl{C}$ from $\imp$ to $\vm$ available in \cite{Leroy09-ln}
is roughly $7$ times smaller than the soundness proof of the
annotation function of the $\imp$ code relative to that of the
$\vm$ code.

\subsection{Limitations of the direct approach}\label{limitations-direct-sec}
As mentioned in section \ref{imp-cost-annotation}, the annotation
function for the source language introduces some over-approximation,
thus {\em failing to be precise}.  On one hand, it is easy to modify
the definitions in table \ref{annotation-imp} so that they compute the
cost of each branch of an {\tt if\_then\_else} separately rather than
taking the maximum cost of the two branches.  On the other hand, it is
rather difficult to refine the annotation function so that it accounts
for the memory hierarchy of the $\mips$ machine; one needs to pass to
the function $\kappa$ a `hidden parameter' which corresponds to the
stack height.  This process of pushing hidden parameters into the
definition of the annotation is error-prone and it seems unlikely to
work in practice for a realistic compiler. We programmed sound (but
not precise) cost annotations for the $\C$ compiler introduced in
section \ref{Ccompiler-intro} and found that the approach is difficult
to test because an over-approximation of the cost at some point may
easily compensate for an under-approximation somewhere else. By contrast,
in the labelling approach introduced in section \ref{label-intro}, we
manipulate costs at an abstract level, as labels, and produce numerical
values only at the very end of the construction.



\newpage


\section{Related approaches}\label{related-sec}
Our fine-grained approach to labelling is based on the hypothesis that we can
obtain precise information on the execution time of each instruction
of the generated binary code.  This hypothesis is indeed realised in
the processors of the $\intel$ family we are considering. On the other
hand, the execution times of instructions running on more complex
architectures including, {\em e.g.}, cache memory or pipelines are
much less predictable.  This lack of precision may somehow be
compensated by analysing the worst-case execution time of sequences of
instructions.  As mentioned in the introduction, tools such as those
developed by $\absint$ require explicit annotations of the binaries
to determine the number of times a loop is iterated and apply abstract
interpretation methods in order to provide an estimation of the
execution time of the code.  From the point of view of a {\em formal}
certification this approach raises at least two questions.

First, one needs a formal certification of the fact that the abstract
interpretation method does indeed produce correct results for a given
processor.  This in turn supposes a formal modelling of the processor
and a proof that the abstract interpretation method does indeed
abstract the processor's behaviour.  Taking a pragmatic (and
informal) approach, one could argue that the results of the tool could
be trusted, in the same way we trust the execution times given by the
manufacturers.


Second, one needs a formal certification of the fact that the
annotations on the loops are indeed correct.  For instance, suppose we
can identify in the source code {\em for-loops} that iterate a {\em
constant} number of times over some sequence of statements.
%To make the discussion concrete,
%let us assume our toy $\imp$ language is enriched with an instruction
%$\s{loop} \ n \ S$ that executes $n$ times the command $S$.
In our approach, we must insert a label in the body of
the loop so as to prevent the control flow graph associated
with the binary code from containing a loop that does not cross any label.
Having determined the cost of one iteration of the body of the loop,
we can then reason at the level of the source code to obtain the cost
of several iterations.

In an alternative approach, we could try to propagate to the control flow graph
the information that the body of the loop is iterated a constant number of
times and then invoke tools such as those mentioned above to get
an estimation of the cost of running the whole iteration.
As mentioned above, the rationale for this approach is to get a more precise
estimation of the cost of executing the iteration on a complex architecture.
While the approach seems easily implementable, it is not clear how
one should proceed in order to formally certify its correctness.
Specifically, one needs to propagate enough information at the level
of the binary code to be able to verify automatically that the
predicted number of iterations of the body of the loop is indeed correct.
In turn, this means being able to track down the initial values of a certain
number of memory locations (registers and/or stack locations)
and to guarantee that their contents are modified at each iteration, leading
eventually to the exit of the loop. While this seems within the
reach of a {\em proof-carrying code} approach, the challenge is to devise
a more lightweight approach which targets the specific properties of interest.


\newpage

\section{Assessment of the deliverable within the $\cerco$ project, with hindsight}\label{asmt-sec}


The port of the compiler to the $\intel$ processor has required
a major and rather unrewarding implementation effort which is
described in deliverable D2.2.

One problem that has emerged is that the compilation of general-purpose
$\C$ programs with recursion and/or 16/32-bit integers
produces binaries whose size quickly approaches the limits of the
architecture (program memory is limited to 64 Kbytes).

This is a problem, for instance, for the case study planned in
deliverable D5.3 (Case study: analysis of synchronous code), whose
purpose is to produce automatically cost annotations for the $\C$
programs generated by a compiler of a synchronous language such as
{\sc Lustre}.  The generated $\C$ programs are rather large
(though their control flow is simple) and assume 32-bit integers.

For the time being, two options are being considered as far as
deliverable D5.3 is concerned.

\begin{enumerate}

\item Improve the efficiency of the compilation of 16/32-bit
integers for the $\intel$ compiler.

\item Conduct the case study relying on a compiler to a 32-bit
processor.

\end{enumerate}

If the second option is chosen, one possibility is to target the {\sc
Mips} processor, for which a prototype compiler has already been
developed.
More generally, one advantage of the second option is that it allows
us to conduct realistic case studies on a larger class of programming constructs.
For instance, we are interested in studying the extension of the
labelling approach to a compilation chain from
a functional language of the {\sc ML} family to $\C$.