source: LTS/paper/ccexec2.tex

Last change on this file was 3594, checked in by sacerdot, 3 years ago


% \usepackage{amsmath}
% NB: might be worth removing this if changing class in favour of
% a built-in definition.
% \usepackage{aliascnt} %% this will make autoref give correct names
% \def\mynewtheorem#1[#2]#3{%
%   \newaliascnt{#1}{#2}%
%   \newtheorem{#1}[#1]{#3}%
%   \aliascntresetthe{#1}%
%   \expandafter\def\csname #1autorefname\endcsname{#3}%
% }
% \let\proof\relax
% \let\endproof\relax
% \usepackage{amsthm}
% theorem environments
% \theoremstyle{definition}
  {keywords={Definition,Lemma,Theorem,Remark,Qed,Save,Inductive,Record},
   morekeywords={[2]if,then,else,forall,Prop,match,with,end,let},
  }
  morekeywords={int8_t},
  mathescape,
  escapechar=\%
%         keywordstyle=\color{red}\bfseries,
%         keywordstyle=[2]\color{blue},
%         commentstyle=\color{green},
%         stringstyle=\color{blue},
        showspaces=false,showstringspaces=false,
%         xleftmargin=1em
        }
\pgfutil@ifundefined{pgf@arrow@code@implies}{% supply for lack of double arrow special arrow tip if it is not there%
  \pgfarrowsdeclare{implies}{implies}%
  {%
  \pgfarrowsleftextend{2.2pt}%
  \pgfarrowsrightextend{2.2pt}%
  }%
  {%
    \pgfsetdash{}{0pt} % do not dash%
    \pgfsetlinewidth{.33pt}%
    \pgfsetroundjoin   % fix join%
    \pgfsetroundcap    % fix cap%
    \pgfpathmoveto{\pgfpoint{-1.5pt}{2.5pt}}%
    \pgfpathcurveto{\pgfpoint{-.75pt}{1.5pt}}{\pgfpoint{0.5pt}{.5pt}}{\pgfpoint{2pt}{0pt}}%
    \pgfpathcurveto{\pgfpoint{0.5pt}{-.5pt}}{\pgfpoint{-.75pt}{-1.5pt}}{\pgfpoint{-1.5pt}{-2.5pt}}%
    \pgfusepathqstroke%
  }%
\tikzset{state/.style={inner sep = 0, outer sep = 2pt, draw, fill},
         every node/.style={inner sep=2pt},
         every on chain/.style = {inner sep = 0, outer sep = 2pt},
         join all/.style = {every on chain/.append style={join}},
         on/.style={on chain={#1}, state},
         m/.style={execute at begin node=$, execute at end node=$},
         node distance=3mm,
         is other/.style={circle, minimum size = 3pt, state},
         other/.style={on, is other},
         is jump/.style={diamond, minimum size = 6pt, state},
         jump/.style={on, is jump},
         is call/.style={regular polygon, regular polygon sides=3, minimum size=5pt, state},
         call/.style={on=going below, is call, node distance=6mm, label=above left:$#1$},
         is ret/.style={regular polygon, regular polygon sides=3, minimum size=5pt, shape border rotate=180, state},
         ret/.style={on=going above, is ret, node distance=6mm},
         chain/.style={start chain=#1 going left},
         rev ar/.style={stealth-, thick},
         ar/.style={-stealth, thick},
         every join/.style={rev ar},
         vcenter/.style={baseline={([yshift=-.5ex]current bounding box)}},
         every picture/.style={thick},
         double equal sign distance/.prefix style={double distance=1.5pt}, %% if already defined (newest version of pgf) it should be ignored%}
         implies/.style={double, -implies, thin, double equal sign distance, shorten <=5pt, shorten >=5pt},
         new/.style={densely dashed},
         newn/.style={solid, fill=white},
         rel/.style={font=\scriptsize, fill=white, inner sep=2pt},
         diag/.style={row sep={11mm,between origins},
                      column sep={11mm,between origins},
                      every node/.style={execute at begin node=$, execute at end node=$},
%                       every node/.style={draw, is other},
                      },
         small vgap/.style={row sep={7mm,between origins}},
         small hgap/.style={column sep={7mm,between origins}},
         small gap/.style={small vgap, small hgap},
         % for article, maybe temporary
         is jump/.style=is other,
         is call/.style=is other,
         is ret/.style=is other,
\def\L{\mathrel{\mathcal L}}
\def\S{\mathrel{\mathcal S}}
\def\R{\mathrel{\mathcal R}}
\def\C{\mathrel{\mathcal C}}
\def\Labels{\mathbb L}
\def\Functions{\mathbb F}
\savebox{\execbox}{\tikz[baseline=-.5ex]\draw [-stealth] (0,0) -- ++(1em, 0);}
% \newcommand{\exec}{\ensuremath{\mathrel{\usebox{\execbox}}}}
                                identifierstyle=\texttt]}
% \def\st@ck#1[#2]{\stackrel{#2}{#1}}
% \def\defst@ck#1#2{\def#1{\@ifnextchar[{\st@ck#2}{#2}}}
% \let\MeasTo\rightsquigarrow
\title{The labelling approach to precise resource analysis\\ on the source code, revisited\thanks{The project CerCo acknowledges the financial support of the Future and
Emerging Technologies (FET) programme within the Seventh Framework
Programme for Research of the European Commission, under FET-Open grant
number: 243881}}
Mauro Piccolo$^{1}$ \and
Claudio Sacerdoti Coen$^{2}$ \and
Paolo Tranquilli$^{2}$
Department of Computer Science, University of Turin$^{1}$,\\
Department of Computer Science and Engineering, University of Bologna$^{2}$
The labelling approach is a technique to lift cost models for non-functional
properties of programs from the object code to the source code. It makes it
possible to perform precise resource analysis of programs directly on the
source code, reconciling functional and non-functional analysis.
The labelling approach is based on the preservation of the structure of the
high-level program in every intermediate language used by the compiler.
In order to prove the cost model correct, the semantics of programs is described
with a labelled transition system that makes the program structure observable.
The original version of the labelling approach does not scale smoothly to function
calls, and it may not work properly with source instructions that have several
predecessors. In the talk we will present an improved version that takes care
of both limitations and that is more modular.
Most of the results presented have been mechanised in the interactive
theorem prover Matita.
The \emph{labelling approach} has been introduced in~\cite{easylabelling} as
a technique to \emph{lift} cost models for non-functional properties of programs
from the object code to the source code. Examples of non-functional properties
are execution time, amount of stack/heap space consumed and energy required for
communication. The basic premise of the approach is that it is impossible to
provide a \emph{uniform} cost model for a high-level language that is preserved
\emph{precisely} by a compiler. For instance, two instances of an assignment
$x = y$ in the source code can be compiled very differently according to the
place (registers vs stack) where $x$ and $y$ are stored at the moment of
execution. Therefore a precise cost model must assign a different cost
to every occurrence, and the exact cost can only be known after compilation.
According to the labelling approach, the compiler is free to compile and optimise
the source code without any major restriction, but it must keep track
of what happens to basic blocks during the compilation. The cost model is
then computed on the object code: it assigns a cost to every basic block.
Finally, the compiler propagates the cost model back to the source level,
assigning a cost to each basic block of the source code.
Implementing the labelling approach in a certified compiler
makes it possible to reason formally on the high-level source code of a program to prove
non-functional properties that are guaranteed to be preserved by the compiler
itself. The trusted code base is then reduced to 1) the interactive theorem
prover (or its kernel) used in the certification of the compiler and
2) the software used to certify the property on the source language, which
can itself be certified, further reducing the trusted code base.
In~\cite{easylabelling} the authors provide an example of a simple
certified compiler that implements the labelling approach for the
imperative \verb+While+ language~\cite{while}, which does not have
pointers and function calls.
The labelling approach has been shown to scale to more interesting scenarios.
In particular, in~\cite{functionallabelling} it has been applied to a functional
language, and in~\cite{loopoptimizations} it has been shown that the approach
can be adapted, at the price of some complication, to handle loop optimisations and, more generally,
program optimisations that do not preserve the structure of basic blocks.
On-going work also shows that the labelling approach is compatible with
the complex analyses required to obtain a cost model for object code
on processors that implement advanced features like pipelining, superscalar
architectures and caches.
In the European Project CerCo (Certified Complexity, \url{})~\cite{cerco} we are certifying a labelling-approach-based compiler for a large subset of C to 8051 object code. The compiler is
moderately optimising and implements a compilation chain that is largely
inspired by that of CompCert~\cite{compcert1,compcert2}.
Compared to the work done in~\cite{easylabelling}, the main novelties and sources of difficulty are 1) the presence of function calls; 2) their interaction with
stateful hardware components; 3) the realisation that instructions with multiple
predecessors and instructions that need to immediately precede the beginning
of a basic block could not be compiled preserving all the invariants required
by the labelling approach.
The aforementioned difficulties forced us to rethink the labelling approach.
Therefore we proposed in~\cite{report} a much more complex solution that
accommodates function calls, but does not solve the other problems.
In this talk we will describe a third version of the labelling approach which
greatly simplifies the second version and solves at once all the issues known
at this time. Most of the results presented have also been formalised in
the Matita interactive theorem prover, and we expect to complete the
formalisation in the near future.
The plan of the abstract is the following.
In Section~\ref{sec:labelling} we summarise the original version of the
labelling method. In Section~\ref{sec:limitations} we only hint at the issues
involved with the original method and we suggest possible
solutions. In the talk we plan to present the technical statements
and proofs as formalised in Matita.
\section{The (basic) labelling approach}
% \subsection{A brief introduction to the labelling approach}
We briefly explain the labelling approach as introduced in~\cite{easylabelling}
on the example in \autoref{examplewhile}.
The user wants to analyse the execution time of the program (the black lines in
\autoref{subfig:example_input}). He compiles the program using
a special compiler that first inserts in the code three label emission
statements (\verb+EMIT $L_i$+) to mark the beginning of basic blocks;
then the compiler compiles the code to machine code (\autoref{subfig:example_oc}),
guaranteeing that the execution of the source and object
code emits the same sequence of labels ($L_1; L_2; L_2; L_3$ in the example).
This is achieved by keeping track of basic blocks during compilation, avoiding
all optimisations that alter the control flow. The latter can be recovered with
a more refined version of the labelling approach~\cite{loopoptimizations}, but in the
present abstract we stick to this simple variant. Once the object
code is produced, the compiler runs a static code analyser to associate to
each label $L_1, \ldots, L_3$ the cost (in clock cycles) of the instructions
that belong to the corresponding basic block. For example, the cost $k_1$
associated to $L_1$ is the number of cycles required to execute the block
$I_3$ and \verb+COND $l_2$+, while the cost $k_2$ associated to $L_2$ counts the
cycles required by the block $I_4$, \verb+GOTO $l_1$+ and \verb+COND $l_2$+. The compiler also guarantees
that every executed instruction is in the scope of some code emission label,
that each scope does not contain loops (to associate a finite cost), and that
both branches of a conditional statement are followed by a code emission
statement. Under these assumptions it is true that the total execution cost
of the program $\Delta_t$ is equal to the sum over the sequence of emitted
labels of the cost associated to every label:
$\Delta_t = k(L_1; L_2; L_2; L_3) = k_1 + k_2 + k_2 + k_3$.
Finally, the compiler emits an instrumented version of the source code
(\autoref{subfig:example_instrument}) where label emission statements are replaced
by increments of a global variable \verb+cost+ that, before every increment, holds the
exact number of clock cycles spent by the microprocessor so far:
the difference $\Deltacost$ between the final and initial value of the
internal clock is $\Deltacost = k_1 + k_2 + k_2 + k_3 = \Delta_t$. The user
can then employ any available method (e.g. Hoare logic, invariant generators,
abstract interpretation and automated provers) to certify that $\Deltacost$
never exceeds a certain bound~\cite{cerco}, which is now a functional property
of the code.
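The accounting above can be replayed in a small executable sketch. The cycle counts below are hypothetical (the real $k_i$ come from the static analysis of the 8051 object code); the point is only that the trace-based sum and the instrumented source compute the same $\Delta_t$:

```python
# Hypothetical per-block cycle counts k1, k2, k3 for labels L1, L2, L3.
k = {"L1": 7, "L2": 12, "L3": 4}

def object_code_trace():
    """Labels emitted by the object code: L1; L2; L2; L3 for a 2-iteration loop."""
    trace = ["L1"]
    for _ in range(2):
        trace.append("L2")
    trace.append("L3")
    return trace

def instrumented_source():
    """The instrumented source: each EMIT is replaced by a cost increment."""
    cost = k["L1"]
    for _ in range(2):
        cost += k["L2"]
    cost += k["L3"]
    return cost

# Delta_t computed from the emitted trace equals the instrumented total.
delta_t = sum(k[label] for label in object_code_trace())
assert delta_t == instrumented_source() == k["L1"] + 2 * k["L2"] + k["L3"]
```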
|EMIT $L_1$;|
for (i=0; i<2; i++) {
  |EMIT $L_2$;|
  $I_2$;
|EMIT $L_3$;|
\caption{The input program (black lines) with its labelling (red lines).}
   EMIT $L_1$
   $I_3$
$l_1$:$\!$  COND $l_2$
   EMIT $L_2$
   $I_4$
   GOTO $l_1$
$l_2$:$\!$  EMIT $L_3$
\caption{The output labelled object code.}
cost += $k_1$;
for (i=0; i<2; i++) {
  cost += $k_2$;
  $I_2$;
cost += $k_3$;
\caption{The output instrumented code.}
% \begin{verbatim}
% EMIT L_1;                         EMIT L_1         cost += k_1;
% I_1;                              I_3              I_1;
% for (i=0; i<2; i++) {        l_1: COND l_2         for (i=0; i<2; i++) {
%   EMIT L_2;                       EMIT L_2           cost += k_2;
%   I_2;                            I_4                I_2;
%  }                                GOTO l_1          }
% EMIT L_3;                    l_2: EMIT L_3         cost += k_3;
% \end{verbatim}
\caption{The labelling approach applied to a simple program. The $I_i$ are sequences of instructions not containing jumps or loops.\label{examplewhile}}
\section{The labelling approach revisited}
We briefly revise here some hidden assumptions and limitations of the original
labelling approach described in Section~\ref{sec:labelling}, called the
\emph{basic} labelling approach in the rest of this abstract. We also hint
at the solutions. The technical details and all the formal statements will
be given in the presentation only.
\subsection{Conditional statements and multiple predecessors}
In order for the cost model to account for every instruction, object
code instructions with multiple successors (e.g. conditional branches) must
pass control to basic blocks that start with a label emission statement.
Moreover, the cost of execution of the branching instruction must be
constant whatever branch is taken.
\paragraph{The problem:} In order to guarantee the first assumption above, label emissions are introduced
after every branch in the source code and the compilation is expected to
preserve this invariant. The instruction that starts a branch, however, may
have multiple predecessors. While this is rarely the case in a high-level
structured language, it is still possible in C (due to \texttt{goto}s and
the fall-through behaviour of \texttt{switch} branches) and it becomes very common
during intermediate compilation phases (e.g. the instruction that follows a
\texttt{while} loop may be reached from two different jumps if the loop hoisting
optimisation is applied). Moreover, in actual architectures it may be the case
that a high-level branching instruction needs to be compiled to an
instruction that locally branches to an unconditional jump to the actual
beginning of the branch. Example: \texttt{\mbox{if(E) \{ EMIT L; I \}}} may be
compiled to \texttt{\mbox{SJNEQ l1; JUMP l3; l1: JUMP l2; \ldots\; l2: EMIT L; I}} where
the short conditional branch \texttt{SJNEQ} can only jump within 256 bytes
and the label emission is too far away. Hence the need for the indirect jump
through \texttt{JUMP l2}. This happens, for instance, in the
hardware architecture we picked in CerCo.
Clearly, the example code above violates the requirement because
the \texttt{JUMP l3} instruction is not in the scope of any label and its cost
will be missed. We would like to move the label emission away from
\texttt{I} and put it just before the jump. This is incorrect as well when
the block of \texttt{I} can be reached from other predecessors because in this
case the label \texttt{L} would be emitted only if coming from the \texttt{if}
instruction. The traces emitted in the source and object
code would be different and the cost computed at the source level would be
wrong.
\paragraph{The solution:} in order to solve the problem, we change the
labelling approach by modifying the syntax (and semantics) of all the
languages involved. In place of (or in addition to) having label emission
statements, we incorporate label emissions into all branching instructions.
For example, an \texttt{if} statement would now have syntax
\texttt{if(E) L:\{I\}}, meaning that the label \texttt{L} is emitted when
the block \texttt{I} is reached coming from the \texttt{if} branch. If
\texttt{I} has another predecessor (e.g. a \texttt{GOTO} statement), this
other predecessor will be label emitting too (e.g. \texttt{GOTO L:l} to mean
jump to location \texttt{l} emitting the label \texttt{L}). Instrumentation
of the source code is now more complex, requiring a memory cell to remember
the last block visited and thus charge the correct cost at the
beginning of a basic block.
The issue seen above is now solved: the \texttt{SJNEQ} instruction will emit
the label \texttt{L} that will pay for both \texttt{JUMP~l3} and \texttt{I}.
Similarly, the other assembly level predecessor of \texttt{I} will emit another
label that will pay for~\texttt{I}.
This new schema also makes it possible to cope with architectures where a branching
instruction has a different cost according to the branch taken. It suffices
to make the labels emitted by the instruction pay the cost of the branching
instruction. This is not possible in the basic labelling approach because the
label emissions can have multiple predecessors.
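The instrumentation with a memory cell can be sketched as follows, with hypothetical costs: label \texttt{L} (emitted when \texttt{I} is entered from the \texttt{if}) pays for \texttt{JUMP l3} plus \texttt{I}, while a second label \texttt{Lg} (emitted by the label-emitting \texttt{GOTO}) pays for the other path into \texttt{I}:

```python
# Hypothetical costs: each label also pays for the branch that emits it,
# so the two predecessors of block I are charged differently.
k = {"L": 11, "Lg": 13}  # L: emitted by if(E) L:{I}; Lg: emitted by GOTO Lg:l

def enter_block_I(came_from_goto):
    """Entering block I: the memory cell remembers the last emitted label,
    and the instrumentation charges that label's cost at block entry."""
    last_label = "Lg" if came_from_goto else "L"
    cost = k[last_label]
    return last_label, cost

assert enter_block_I(False) == ("L", 11)
assert enter_block_I(True) == ("Lg", 13)
```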
\subsection{Function calls}
Let us now consider a simple program written in C that contains a function
pointer call inside the scope of the cost label $L_1$, in \autoref{subfig:call_input}.
The labelling method works exactly as before, inserting
code emission statements/\verb+cost+ variable increments at the beginning
of every basic block and at the beginning of every function. The compiler
still guarantees that the sequences of labels observed on the two programs are
the same. A new difficulty appears when the compiler needs to statically
analyse the object code to assign a cost to every label. What should the scope
of the $L_1$ label be? After executing the $I_4$ block, the \verb+CALL+
statement passes control to a function that cannot be determined statically.
Therefore the cost of executing the body must be paid by some other label
(hence the requirement that every function starts with a code emission
statement). What label should pay for the cost of the block $I_5$? The only
reasonable answer is $L_1$, i.e. \emph{the scope of labels should extend to the
next label emission statement or the end of the function, stepping over function calls}.
void main() {
  EMIT $L_1$;
  $I_1$;
  (*f)();
  $I_2$;
void g() {
  EMIT $L_2$;
  $I_3$;
\caption{The input labelled C program.}
  EMIT $L_1$
  $I_4$
  CALL
  $I_5$
  EMIT $L_2$
  $I_6$
\caption{The output labelled object code.}
\caption{An example compilation of a simple program with a function pointer
         call.}
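The scope computation with call-stepping can be sketched as a one-pass scan of the object code. Cycle counts are hypothetical; note that $I_5$, which sits after the \verb+CALL+, is still charged to $L_1$:

```python
# Hypothetical object code as (mnemonic, payload) pairs; for EMIT the payload
# is the label, otherwise it is a made-up cycle count.
main_code = [("EMIT", "L1"), ("I4", 3), ("CALL", 2), ("I5", 5)]
g_code = [("EMIT", "L2"), ("I6", 4), ("RETURN", 1)]

def label_costs(code):
    """Charge each instruction to the current label; a label's scope extends
    to the next EMIT or the end of the function, stepping over CALLs."""
    costs, current = {}, None
    for op, arg in code:
        if op == "EMIT":
            current = arg
            costs.setdefault(current, 0)
        else:
            costs[current] += arg
    return costs

assert label_costs(main_code) == {"L1": 3 + 2 + 5}  # I5 after the CALL: still L1
assert label_costs(g_code) == {"L2": 4 + 1}
```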
The latter definition of scope is adequate on the source level because
C is a structured language that guarantees that every function call, if it
returns, passes control to the first instruction that follows the call. However,
this is not guaranteed for object code, the backend languages of a compiler
and, more generally, for unstructured
languages that use a writable control stack to store the return address of
calls. For example, $I_6$ could increment by $1$ the return address on the
stack so that the next \verb+RETURN+ would start at the second instruction
of $I_5$. The compiler would still be perfectly correct if a random dead-code
instruction were added after the \verb+CALL+ as the first instruction of $I_5$. More generally,
\emph{there is no guarantee that a correct compiler that respects the functional
behaviour of a program also respects the calling structure of the source code}.
Without such an assumption, however, it may not be true that the execution cost
of the program is the sum of the costs associated to the labels emitted. In our
example, the cost of $I_5$ is paid by $L_1$, but in place of $I_5$ the processor could execute any other code after $g$ returns. We are therefore in the following situation.
\paragraph{Assumptions:} to statically associate a cost to every block/label,
the compiler needs to statically predict that all instructions in the block will
be reached during computation. Moreover, if the sequence of instructions in the
block is not sequentially executed, the function that computes the cost of the
block given the cost of the instructions needs to be commutative to allow
reordering of instructions.
\paragraph{The problem:} in the intermediate languages and the object code
it is not possible to statically predict which instruction will be executed
after a function returns. Thus, in order to extend the scope of blocks after
function calls, we need to guarantee an additional property on the object code
that is not granted by the basic labelling approach. Moreover, in the latter case
the body of the function called is executed before the instructions
after the function returns, requiring the function that computes the cost to
be commutative. Because of stateful hardware components (e.g. caches and
pipelines), the cost of execution of an instruction depends on the execution
history. Therefore the function that computes the total cost from the steps
in the history is not commutative. Thus extending blocks after function calls
does not scale to modern hardware.
~\\\noindent We investigated two possible solutions.
\paragraph{First solution:} we add more structure to execution traces. An
execution trace is no longer a flat stream of observables. Instead, it becomes
a recursively defined datatype where observables that correspond to function
calls carry a nested copy of the datatype that describes the instructions
executed in the function body. Moreover, it is now possible to observe the
address of instructions in memory. Finally, we impose the requirement that the
observed function call and the instruction that follows the sub-datatype must
be consecutive in memory. The only programs whose traces satisfy the requirement
are those that do not tamper with the stack and such that every function call
terminates, returning just after the call.
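The recursive datatype and the well-bracketing requirement can be sketched as follows (field names, the instruction width, and the addresses are all made up; the actual formalisation is in Matita, not Python):

```python
from dataclasses import dataclass, field
from typing import List, Union

@dataclass
class Call:
    call_addr: int      # observed address of the CALL instruction
    body: "Trace"       # nested trace of the callee's execution
    return_addr: int    # observed address executed after the call returns

@dataclass
class Trace:
    # A structured trace: label observables interleaved with nested calls.
    steps: List[Union[str, Call]] = field(default_factory=list)

CALL_SIZE = 1  # hypothetical width of a CALL instruction

def well_bracketed(t: Trace) -> bool:
    """Every observed call must resume at the instruction consecutive to it
    in memory, recursively through all nested bodies."""
    return all(
        isinstance(s, str)
        or (s.return_addr == s.call_addr + CALL_SIZE and well_bracketed(s.body))
        for s in t.steps
    )

good = Trace(["L1", Call(10, Trace(["L2"]), 11), "L1"])
bad = Trace(["L1", Call(10, Trace(["L2"]), 15)])  # tampered return address
assert well_bracketed(good) and not well_bracketed(bad)
```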
The next step consists in defining local simulation conditions that have
the following property: if every step in the source program is simulated
by a structured trace fragment that satisfies the local conditions, then every
trace in the source code is simulated by a structured trace in the object code
that satisfies the requirement.
The main drawbacks of this solution are: 1) the local simulation conditions
are quite technical and the beauty of the simulation argument is lost in
the details. We will present them and the simulation argument during the
talk; 2) it allows for block scopes that extend after function calls,
but it does not solve the problem with stateful hardware components since it
still requires the cost function to be commutative.
\paragraph{Second solution:} intuitively, we just ask every function call to be
immediately followed by a label emission statement. Thus the scope of blocks
no longer extends after function calls, solving both issues. More technically,
we require return instructions to emit the cost label associated to the
predecessor of the instruction control is returned to. No additional structure
on execution traces is required and the simulation argument is essentially the
same used with standard LTSs and the basic labelling approach. Moreover, stateful
hardware components are seamlessly supported.
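On the running example, the second solution amounts to the instrumentation below, where a new label $L_1'$ (written \texttt{L1p} here; all costs hypothetical) is charged when \texttt{g} returns, so no block scope crosses the call:

```python
# Hypothetical block costs; "L1p" stands for the fresh label L1' emitted
# by the return instruction for the block that follows the call.
k = {"L1": 4, "Lg": 3, "L1p": 5}

def g(state):
    state["cost"] += k["Lg"]    # every function body starts with an emission

def main_prog():
    state = {"cost": 0}
    state["cost"] += k["L1"]    # EMIT L1; I1
    g(state)                    # the call; its return emits L1p
    state["cost"] += k["L1p"]   # EMIT L1p; I2
    return state["cost"]

# The total is a sum of increments, one of which occurs only if g returns.
assert main_prog() == k["L1"] + k["Lg"] + k["L1p"]
```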
The main drawbacks of this solution are: 1) the solution requires many more
labels to be inserted in the source code, since blocks are now smaller. In
turn, this makes the computation of cost invariants for source programs more
difficult, the proof that the invariants hold harder and the computed cost
less readable; 2) in order to reason on the cost of the source program, the
user now needs to know that the value of the \texttt{cost} variable at the
end of a block that contains function calls is the sum of the increments
that occur after every call. This is not trivial because it requires a proof
of termination of every call involved. In the first solution, instead, just
a global assumption of termination of the whole program was required.
\paragraph{Amendment to the second solution:} in order to mitigate the
problems of the second solution, we studied a preliminary compilation step
that turns a program labelled according to the first solution into one labelled
according to the second solution. In practice, more observables are introduced
just after function calls. The transformation also builds a sort of
inverse function that maps traces of the compiled program back to traces of the
source program (removing the new observables) and costs of blocks of the
compiled program to costs of blocks of the source program (by consolidating
the blocks together).
The benefit of this pass is that the user can still reason with smaller
blocks and simpler costs as in the first solution, and the remaining parts
of the compiler do not need to care about extra invariants as in the second
solution. The proof of simulation is also as simple as in the second solution.
The drawback is that the consolidation is not always possible in the case of non-commutative
costs (e.g. in the presence of stateful hardware components).
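The inverse mapping can be sketched as follows (label and cost names are made up, with \texttt{L1p} again standing for the fresh post-call label $L_1'$). Note that the consolidation step only works here because the costs combine by plain addition, i.e. commutatively:

```python
# Labels introduced after calls by the preliminary pass (hypothetical).
fresh = {"L1p"}
# Costs in the second-solution (fine-grained) labelling.
k2 = {"L1": 4, "Lg": 3, "L1p": 5}

def back(trace):
    """Map a trace of the compiled program back to a source trace by
    erasing the new observables."""
    return [label for label in trace if label not in fresh]

def consolidate(k):
    """Fold each fresh label's cost into the block it was split from."""
    merged = dict(k)
    merged["L1"] += merged.pop("L1p")  # L1's block reabsorbs the post-call block
    return merged

t2 = ["L1", "Lg", "L1p"]
k1 = consolidate(k2)
assert back(t2) == ["L1", "Lg"]
# The consolidated source-level cost agrees with the fine-grained one.
assert sum(k1[l] for l in back(t2)) == sum(k2[l] for l in t2)
```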
The additional pass was suggested as an obvious improvement of the second
solution by an anonymous reviewer of a previous paper. Surprisingly at first,
the proof of correctness of the pass is actually quite complex and its
mechanisation in Matita is not finished yet. We will describe in the talk
the reasons for the complexity. Intuitively, however, it is clear that the
pass is correct only under non-trivial assumptions like termination
of every initiated function call. Moreover, following the literature we
described the pass on a simple programming language equipped with an SOS
semantics. In SOS it happens that parts of the source program are duplicated
during execution (e.g. when entering the body of a \texttt{while} loop).
In order for the proof to go through, we need to establish global invariants
of the machine state that relate all copies of the same statements. This is
required to keep track of labels that, being attached to instructions, are
duplicated as well. In retrospect, it is likely that the proof could be
greatly simplified by abandoning the SOS semantics in favour of some alternative
description where the code is read-only. This will be discussed in the talk,
but left as future work.