# Changeset 1677

Ignore:
Timestamp:
Feb 7, 2012, 12:23:04 PM (6 years ago)
Message:

changes to paolo's english in the report, about a 1/4 of the way through

File:
1 edited

### Legend:

Unmodified
\usepackage{xspace}
%\usepackage{fancyvrb}
\usepackage[all]{xy}
%packages pour LNCS
%\usepackage{semantic}

\section{Introduction}

In~\cite{D2.1}, Amadio et al.
propose an approach for building a compiler for a large fragment of the \textsc{c} programming language. The novelty of their proposal lies in the fact that the design is capable of lifting execution cost information from the compiled code and presenting it to the user. This idea is foundational for the CerCo project, which strives to produce a mechanically certified version of such a compiler. To summarise, Amadio's proposal consists of decorating the source code with labels inserted at key points. These labels are preserved as compilation progresses, from one intermediate language to another. Once the final object code is produced, the labels should correspond to the parts of the compiled code that have a constant cost.

Two properties must hold of any cost estimate. The first property, paramount to the correctness of the method, is \emph{soundness}: the actual execution cost is bounded by the estimate. In the labelling approach, this is guaranteed if every loop in the control flow of the compiled code passes through at least one cost label. The second property, optional but desirable, is \emph{preciseness}: the estimate \emph{is} the actual cost. In the labelling approach, this holds if, for every label, every possible execution of the compiled code starting from that label yields the same cost before hitting another label. In simple architectures, such as the 8051 micro-controller, this can be guaranteed by placing labels at the start of any branch in the control flow, and by ensuring that no labels are duplicated. The reader should note that the above-mentioned requirements must hold when executing the code obtained at the end of the compilation chain.
So even if one is careful about injecting the labels at suitable places in the source code, the requirements might still fail because of two main obstacles:
\begin{itemize}
\item The compilation process introduces important changes in the control flow, inserting loops or branches. For example, functions may be inserted into the source code to replace instructions that are unavailable in the target architecture. This requires loops to be inserted (for example, for multi-word division and generic shift in the 8051 architecture), or effort to be spent in providing unbranching translations of higher-level instructions~\cite{D2.2}.
\item Even if the compiled code \emph{does}---as far as the syntactic control flow graph is concerned---respect the conditions for soundness and preciseness, the cost of blocks of instructions might not be independent of context, so that different passes through a label might have different costs. This becomes a concern if one wishes to apply the approach to more complex architectures, for example one with caching or pipelining.
\end{itemize}
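As a concrete instance of the first obstacle, consider a variable-distance shift on an architecture such as the 8051, which lacks such an instruction: the compiler must insert a loop. The following is a minimal illustrative C sketch of such a routine, not CerCo's actual runtime code:

```c
/* Software generic shift-left, of the kind a compiler inserts when
 * the target (e.g. the 8051) has no variable-distance shift
 * instruction.  The inserted loop changes the control flow of the
 * compiled code even though the source contains no loop here. */
static unsigned char shift_left(unsigned char x, unsigned char k) {
    while (k-- > 0)
        x = (unsigned char)(x << 1);
    return x;
}
```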
The first point unveils a weakness of the current labelling approach when it comes to some common code transformations performed along a compilation chain. In particular, most \emph{loop optimisations} are disruptive, in the sense outlined in the first bullet point above. An example optimisation of this kind is \emph{loop peeling}. This optimisation is employed by compilers in order to trigger other optimisations, for example, dead code elimination or invariant code motion. Here, a first iteration of the loop is hoisted out of the body of the loop, possibly being assigned a different cost than later iterations.

The second bullet point above highlights a difference between existing tools for estimating execution costs and CerCo's approach advocated in~\cite{D2.1}, in favour of those existing tools and to the detriment of Amadio's proposal. For example, the well-known tool \s{aiT}~\cite{absint}---based on abstract interpretation---allows the user to estimate the worst-case execution time (\textsc{wcet}) of a piece of source code, taking into account advanced features of the target architecture. \s{aiT}'s ability to produce tight estimates of execution costs, even if these costs depend on the context of execution, would enhance the effectiveness of the CerCo compiler, either by integrating such techniques in its development, or by directly calling this tool when ascertaining the cost of blocks of compiled code.
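To make the peeling transformation concrete, here is an illustrative C sketch; the function names are ours, chosen for the example, and do not come from CerCo's code base:

```c
/* Original loop: sums the first n naturals. */
static int sum_plain(int n) {
    int s = 0;
    for (int i = 0; i < n; i++)
        s += i;
    return s;
}

/* The same function after one step of loop peeling: the first
 * iteration is hoisted out of the loop, guarded by the loop
 * condition.  Later passes (e.g. propagating i == 0 into the
 * peeled copy) may then simplify it, giving the peeled iteration
 * a different cost from the remaining ones. */
static int sum_peeled(int n) {
    int s = 0;
    int i = 0;
    if (i < n) {   /* peeled first iteration */
        s += i;
        i++;
    }
    for (; i < n; i++)
        s += i;
    return s;
}
```

The two functions compute the same result; only the control flow, and hence the per-iteration cost profile, differs.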
For instance, a typical case where cache analysis yields a difference in the execution cost of a block is in loops: the first iteration will usually stumble upon more cache misses than subsequent iterations.

If one looks closely, the source of the weakness of the labelling approach as presented in~\cite{D2.1} is common to both points: the inability to state different costs for different occurrences of labels, where the difference might originate from labels being duplicated along the compilation, or from the costs being sensitive to the current state of execution. The preliminary work we present here addresses this weakness by introducing cost labels that are dependent on which iteration of their containing loops they occur in. This is achieved by means of \emph{indexed labels}: all cost labels are decorated with formal indexes coming from the loops containing them. These indexes allow us to rebuild, even after multiple loop transformations, which iterations of the original loops in the source code a particular label occurrence belongs to. During the annotation stage of the source code, this information is presented to the user by means of \emph{dependent costs}. We concentrate on integrating the labelling approach with two loop transformations.
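As an illustration of what such a dependent cost might look like in annotated C source, consider the following hand-written sketch. The identifiers \verb+__cost+ and \verb+_i_0+ and the constant costs 35 and 22 are hypothetical, chosen only to mirror the ternary-expression style of annotation:

```c
/* Hand-written sketch of a dependent cost annotation: the iteration
 * with loop index _i_0 == 0 is assigned a different constant cost
 * from later iterations (e.g. because it was peeled and then
 * simplified).  All names and constants here are illustrative. */
static long __cost = 0;

static int sum(int n) {
    int s = 0;
    for (int _i_0 = 0; _i_0 < n; _i_0++) {
        __cost += (_i_0 == 0) ? 35 : 22;   /* dependent cost label */
        s += _i_0;
    }
    return s;
}
```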
For general information on compiler optimisations (and loop optimisations in particular) we refer the reader to the vast literature on the subject (e.g.~\cite{muchnick,morgan}).

\paragraph{Loop peeling}
As already mentioned, loop peeling consists in preceding the loop with a copy of its body, appropriately guarded. This is used, in general, to trigger further optimisations, such as those that rely on execution information which can be computed at compile time but which is erased by further iterations of the loop, or those that use the hoisted code to be more effective at eliminating redundant code. Integrating this transformation into the labelling approach would also allow the common case of cache analysis explained above to be integrated: the analysis of cache hits and misses usually benefits from a form of \emph{virtual} loop peeling~\cite{cacheprediction}.

\paragraph{Loop unrolling}
This optimisation consists of the repetition of several copies of the body of the loop inside the loop itself (inserting appropriate guards, or avoiding them altogether if enough information about the loop's guard is available at compile time). This can limit the number of (conditional or unconditional) jumps executed by the code and trigger further optimisations dealing with pipelining, if appropriate for the architecture.

Whilst we cover only two loop optimisations in this report, we argue that the work presented herein poses a good foundation for extending the labelling approach to cover more common optimisations, as well as giving insight into how to integrate advanced cost estimation techniques, such as cache analysis, into the CerCo compiler. Moreover, loop peeling itself has the fortuitous property of enhancing and enabling other optimisations.
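The unrolling rewrite just described can be sketched in C as follows, here with an unrolling factor of two; the function name is ours, for illustration only:

```c
/* A loop summing the first n naturals, unrolled by a factor of two.
 * Because n may be odd, the second copy of the body keeps its guard;
 * with enough compile-time information about the bound, that guard
 * could be dropped altogether. */
static int sum_unrolled(int n) {
    int s = 0;
    int i = 0;
    while (i < n) {
        s += i; i++;        /* first copy of the body */
        if (i < n) {        /* guarded second copy */
            s += i; i++;
        }
    }
    return s;
}
```

The rewrite halves the number of loop-back jumps executed while preserving the result.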
Experimentation with CerCo's untrusted prototype compiler, which implements constant propagation and partial redundancy elimination~\cite{PRE,muchnick}, shows how loop peeling enhances those other optimisations.

\paragraph{Outline}
We will present our approach on a minimal `toy' imperative language, \imp{} with \s{goto}s, which we present in \autoref{sec:defimp} along with formal definitions of the loop transformations. This language already presents most of the difficulties encountered when dealing with \textsc{c}, so we stick to it for the sake of this presentation. In \autoref{sec:labelling} we summarise the labelling approach as presented in~\cite{D2.1}. \autoref{sec:indexedlabels} presents \emph{indexed labels}, our proposal for dependent labels which are able to describe precise costs even in the presence of the various loop transformations we consider. Finally, \autoref{sec:conc} goes into more detail regarding the implementation of indexed labels in CerCo's untrusted compiler and speculates on further work on the subject.

\section{\imp{} with goto}\label{sec:defimp}

We briefly outline the toy language, \imp{} extended with \s{goto}s and labelled statements, which contains all the instructions relevant to presenting the development and the problems it is called on to solve. \s{break} and \s{continue} statements can be added at no particular expense. The language was designed to pose problems for the existing labelling approach, and to serve as a testing ground for our new notion of indexed labels. Its syntax and operational semantics are presented in \autoref{fig:minimp}.
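As a taste of the fragment, here is a small C program using only features that \imp{} with \s{goto}s models (assignments, a conditional, a labelled statement and a jump); the function name is ours:

```c
/* A small program in the goto fragment: computes 3 * n using only
 * assignments, a conditional, a labelled statement and a goto. */
static int triple(int n) {
    int s = 0;
    int i = 0;
loop:                      /* labelled statement */
    if (i < n) {
        s = s + 3;
        i = i + 1;
        goto loop;         /* jump back to the label */
    }
    return s;
}
```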
\begin{figureplr}
\label{fig:minimp}
\end{figureplr}
The precise grammar for expressions is not particularly relevant, so we do not give one in full. For the sake of conciseness we also treat boolean and arithmetic expressions together (with the usual \textsc{c} convention of an expression being true iff non-zero). We may omit the \s{else} clause of a conditional if it leads to a \s{skip} statement. We will presuppose that all programs are \emph{well-labelled}, i.e.\ every label labels at most one occurrence of a statement in a program, and every \s{goto} points to a label actually present in the program. The \s{find} helper function has the task of not only finding the labelled statement in the program, but also of building the correct continuation. The continuation built by \s{find} replaces the current continuation in the case of a jump.
\paragraph{Further down the compilation chain}
We abstract over the rest of the compilation chain. We simply posit that every language L further down the chain has a suitable notion of `sequential instruction' (one with a single natural successor), to which we can add our own.

\subsection{Loop transformations}

We call a loop $L$ \emph{single-entry} in $P$ if there is no \s{goto} in $P$ outside of $L$ which jumps into $L$.\footnote{This is a reasonable approximation: it also classifies as multi-entry those loops that have an external but unreachable \s{goto} jumping into them.} Many loop optimisations do not preserve the semantics of multi-entry loops in general, or are otherwise rendered ineffective on them. Usually compilers implement single-entry loop detection, which prevents multi-entry loops from being targeted by optimisations~\cite{muchnick,morgan}. The loop transformations we present are local, i.e.\ they target a single loop and transform it. Which loops are targeted may be decided by some \emph{ad hoc} heuristic.
However, the precise details of which loops are targeted, and how, are not important here.

\paragraph{Loop peeling}
$$\sop{while}b\sbin{do}S \mapsto \sop{if}b\sbin{then} S; \sop{while} b \sbin{do} S[\ell'_i/\ell_i]$$
where $\ell'_i$ is a fresh label for any $\ell_i$ labelling a statement in $S$. This relabelling is safe with respect to \s{goto}s occurring outside the loop because of the single-entry condition. Note that any \s{break} and \s{continue} statements in the peeled body $S$ must be replaced with \s{goto}s.

\paragraph{Loop unrolling}
Adapting the indexed labelling approach to cost-labelled expressions does not pose particular problems.

\paragraph{Simplification of dependent costs}
As previously mentioned, the na\"{i}ve application of the procedure described in~\autoref{ssec:depcosts} produces unwieldy cost annotations. In our implementation several transformations are used to simplify such complex dependent costs. Disjunctions of simple conditions are closed under all logical operations, and it can be computed whether such a disjunction implies a simple condition or its negation.
This can be used to eliminate useless branches of dependent costs, to merge branches that share the same value, and possibly to simplify the third case of simple condition. Examples of the three transformations are, respectively:
\begin{itemize}
\item
\end{itemize}
The second transformation tends to accumulate disjunctions, to the detriment of readability. A further transformation swaps two branches of the ternary expression if the negation of the condition can be expressed with fewer clauses. For example:
$$\verb+(_i_0 % 3 == 0 || _i_0 % 3 == 1)?+x\verb+:+y \mapsto \verb+(_i_0 % 3 == 2)?+y\verb+:+x.$$

\paragraph{Updates to the Frama-C cost plugin}
CerCo's Frama-C~\cite{framac} cost plugin\todo{is there a reference for this?}{} has been updated to take into account our new notion of dependent costs.
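For reference, the two annotation styles at issue look like this in C; the identifiers \verb+__cost+ and \verb+_i_0+ and the constant costs are hypothetical:

```c
/* A dependent cost annotation written as a ternary cost expression.
 * __cost, _i_0 and the constants 12 and 5 are illustrative. */
static long cost_ternary(int _i_0) {
    long __cost = 0;
    __cost += (_i_0 % 3 == 2) ? 12 : 5;   /* ternary annotation */
    return __cost;
}

/* The same annotation written as an explicit branch statement,
 * avoiding ternary expressions (and hence Frama-C's temporaries). */
static long cost_branch(int _i_0) {
    long __cost = 0;
    if (_i_0 % 3 == 2)
        __cost += 12;
    else
        __cost += 5;
    return __cost;
}
```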
The Frama-C framework expands ternary expressions into actual branch statements, introducing temporaries along the way, which makes the task of analysing ternary cost expressions rather daunting. It was therefore deemed necessary to provide an option in the compiler to use actual branch statements for cost annotations rather than ternary expressions, so that at least Frama-C's use of temporaries in cost annotations could be avoided. The cost analysis carried out by the plugin now takes such dependent costs into account. The only limitation (which provided a simplification in the code) is that, within a dependent cost, simple conditions involving a modulus on the same loop index should not be modulo different numbers, which corresponds to the reasonable limitation of not applying loop unrolling multiple times to