Changeset 1677


Timestamp:
Feb 7, 2012, 12:23:04 PM
Author:
mulligan
Message:

changes to paolo's english in the report, about a 1/4 of the way through

File:
1 edited

  • Deliverables/addenda/indexed_labels/report.tex

    r1676 r1677  
    2525\usepackage{xspace}
    2626%\usepackage{fancyvrb}
    27 \usepackage[all]{xy}
     27%\usepackage[all]{xy}
    2828%packages pour LNCS
    2929%\usepackage{semantic}
     
    626626
    627627\section{Introduction}
    628 In~\cite{D2.1} an approach is presented tackling the problem of building a
    629 compiler from a large fragment of C which is capable of lifting execution cost information
    630 from the compiled code and present it to the user. This endeavour is central to
    631 the CerCo project, which strives to produce a mechanically certified version of
    632 such a compiler.
    633 
    634 In rough words, the proposed approach consists in decorating the source code with
    635 labels at key points, and preserving such labels along the compilation chain.
    636 Once the final object code is produced, such labels should correspond to parts
    637 of the compiled code which have constant cost.
    638 
    639 There are two properties one asks of the cost estimate. The first one, paramount to
    640 the correctness of the method, is \emph{soundness}: the actual execution cost
    641 is bounded by the estimate. In the labelling approach, this is guaranteed if every
    642 loop in the control flow of the compiled code passes through at least one cost
    643 label. The second one, optional but desirable, is \emph{preciseness}: the estimate
    644 \emph{is} the actual cost. In the labelling approach, this will
    645 be true if for every label every possible execution of the compiled code starting
    646 from such a label yields the same cost before hitting another one. In simple
    647 architectures such as the 8051
    648 micro-controller this can be guaranteed by having labels at the immediate start of any
    649 branch of the control flow, and by ensuring no duplicate labels are present.
    650 
    651 It should be underlined that the above mentioned requirements must hold when
    652 executing the code obtained at the
    653 end of the compilation chain. So even If one is careful injecting the labels at good
    654 key places in the source code, the requirements might still fail because of two main obstacles:
     628In~\cite{D2.1}, Amadio et al. propose an approach for building a compiler for a large fragment of the \textsc{c} programming language.
     629The novelty of their proposal lies in the fact that their proposed design is capable of lifting execution cost information from the compiled code and presenting it to the user.
     630This idea is foundational for the CerCo project, which strives to produce a mechanically certified version of such a compiler.
     631
     632To summarise, Amadio et al.'s proposal consists of decorating the source code with labels inserted at key points.
     633These labels are preserved as compilation progresses, from one intermediate language to another.
     634Once the final object code is produced, such labels should correspond to the parts of the compiled code that have a constant cost.
     635
     636Two properties must hold of any cost estimate.
     637The first property, paramount to the correctness of the method, is \emph{soundness}, that is, that the actual execution cost is bounded by the estimate.
     638In the labelling approach, this is guaranteed if every loop in the control flow of the compiled code passes through at least one cost label.
     639The second property, optional but desirable, is \emph{preciseness}: the estimate \emph{is} the actual cost.
     640In the labelling approach, this will be true if, for every label, every possible execution of the compiled code starting from such a label yields the same cost before hitting another one.
     641In simple architectures such as the 8051 micro-controller this can be guaranteed by placing labels at the start of any branch in the control flow, and by ensuring that no labels are duplicated.
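     The labelling approach can be pictured concretely. The following C sketch is our own illustration (the `__cost` counter and the constant block costs are invented for exposition, not output of the CerCo compiler): each cost label becomes an increment of a global counter by the constant cost of the block it heads, and soundness holds because the only loop passes through the label `_L2`.

```c
/* Hypothetical sketch of the labelling approach: every basic block
   is headed by a cost label, and each label is compiled into an
   increment of a global cost counter by that block's constant cost. */
unsigned long __cost = 0;   /* assumed instrumentation counter */

/* Sum the first n integers.  One label at function entry (_L1) and
   one at the loop body (_L2), so every loop in the control flow of
   the compiled code passes through at least one label (soundness). */
int sum(int n) {
    __cost += 5;            /* _L1: entry block, assumed cost 5 */
    int s = 0, i = 0;
    while (i < n) {
        __cost += 3;        /* _L2: loop body, assumed cost 3 */
        s += i;
        i++;
    }
    return s;
}
```

After running `sum(4)` the counter holds 5 + 4·3 = 17, the exact cost predicted by the annotation; this is the preciseness property in miniature.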
     642
     643The reader should note that the above mentioned requirements must hold when executing the code obtained at the end of the compilation chain.
     644So even if one is careful about injecting the labels at suitable places in the source code, the requirements might still fail because of two main obstacles:
    655645\begin{itemize}
    656  \item compilation introduces important changes in the control flow, inserting
    657 loops or branches: an example in addressing this is the insertion of functions
    658 in the source code replacing instructions unavailable in the target architecture
    659 and that require loops to be carried out (e.g.\ multi-word division and generic
    660 shift in the 8051 architecture), or the effort put in
    661 providing unbranching translations of higher level instructions~
    662 \cite{D2.2};
    663  \item even if the compiled code \emph{does}, as long as the the syntactic
    664   control flow graph is concerned, respect the conditions for soundness and
    665   preciseness, the cost of blocks of instructions might not be independent of context,
    666   so that different passes through a label might have different costs: this
    667   becomes a concern if one wishes to apply the approach to more complex architectures,
    668   for example one with cache or pipelining.
     646\item
     647The compilation process introduces important changes in the control flow, inserting loops or branches.
     648For example, functions may be inserted into the source code to replace instructions that are unavailable in the target architecture.
     649These replacement functions may require loops (for example, for multi-word division and generic shift in the 8051 architecture), or effort spent in providing unbranching translations of higher-level instructions~\cite{D2.2}.
     650\item
     651Even if the compiled code \emph{does}---as far as the syntactic control flow graph is concerned---respect the conditions for soundness and preciseness, the cost of blocks of instructions might not be independent of context, so that different passes through a label might have different costs.
     652This becomes a concern if one wishes to apply the approach to more complex architectures, for example one with caching or pipelining.
    669653\end{itemize}
    670 
    671 The first point unveils a weakness of the current labelling approach when it
    672 comes to some common code transformations done along a compilation chain. In
    673 particular most of \emph{loop optimizations} disrupt such conditions directly.
    674 In what we will call \emph{loop peeling}, for example, a first iteration of the
    675 loop is hoisted ahead of it, possibly to have a different cost than later iterations
    676 because of further transformation such as dead code elimination or invariant
    677 code motion.
    678 
    679 The second point strikes a difference in favour of existing tools for the
    680 estimate of execution costs and to the detriment of CerCo's approach advocated in~
    681 \cite{D2.1}. We will take as an example the well known tool \s{aiT}~\cite{absint},
    682 based on abstract interpretation: such a tool allows to estimate the
    683 worst-case execution time (WCET) taking into account advanced aspects of the
    684 target architecture. \s{aiT}'s ability to
    685 do close estimates of execution costs even when these depend on the context of
    686 execution would enhance the effectiveness of CerCo's compiler, either by
    687 integrating such techniques in its development, or by directly calling this tool
    688 when ascertaining the cost of blocks of compiled code. A typical case where
    689 cache analysis yields a difference in the execution cost of a block is in loops:
    690 the first iteration will usually stumble upon more cache misses than subsequent
    691 iterations.
    692 
    693 If one looks closely, the source of the weakness of the labelling approach as
    694 presented in~\cite{D2.1} is common to both points: the inability to state
    695 different costs for different occurrences of labels, where the difference
    696 might be originated by labels being duplicated along the compilation
    697 or the costs being sensitive to the current state of execution.
    698 
    699 The preliminary work
    700 we present here addresses this weakness by introducing cost labels that are dependent
    701 on which iteration of its containing loops it occurs in. This is achieved by
    702 means of \emph{indexed labels}: all cost labels are decorated with formal
    703 indexes coming from the loops containing such labels. These indexes allow to
    704 rebuild, even after several steps of loop transformations,
    705 which iterations of the original loops in the source code a particular label
    706 occurrence belongs to. During annotation of source code, this information
    707 is given to the user by means of dependent costs.
     654The first point unveils a weakness of the current labelling approach when it comes to some common code transformations performed along a compilation chain.
     655In particular, most \emph{loop optimisations} are disruptive, in the sense outlined in the first bullet point above.
     656An example optimisation of this kind is \emph{loop peeling}.
     657This optimisation is employed by compilers in order to trigger other optimisations, for example, dead code elimination or invariant code motion.
     658Here, a first iteration of the loop is hoisted out of the body of the loop, possibly being assigned a different cost than later iterations.
     659
     660The second bullet point above highlights a difference between existing tools for estimating execution costs and CerCo's approach advocated in~\cite{D2.1}, in favour of those existing tools and to the detriment of Amadio et al.'s proposal.
     661For example, the well known tool \s{aiT}~\cite{absint}---based on abstract interpretation---allows the user to estimate the worst-case execution time (\textsc{wcet}) of a piece of source code, taking into account advanced features of the target architecture.
     662\s{aiT}'s ability to produce tight estimates of execution costs, even if these costs depend on the context of execution, would enhance the effectiveness of the CerCo compiler, either by integrating such techniques in its development, or by directly calling this tool when ascertaining the cost of blocks of compiled code.
     663For instance, a typical case where cache analysis yields a difference in the execution cost of a block is in loops: the first iteration will usually incur more cache misses than subsequent iterations.
     664
     665If one looks closely, the source of the weakness of the labelling approach as presented in~\cite{D2.1} is common to both points: the inability to state different costs for different occurrences of labels, where the difference might be originated by labels being duplicated along the compilation, or the costs being sensitive to the current state of execution.
     666The preliminary work we present here addresses this weakness by introducing cost labels that depend on which iteration of their containing loops they occur in.
     667This is achieved by means of \emph{indexed labels}; all cost labels are decorated with formal indexes coming from the loops containing such labels.
     668These indexes allow us to rebuild, even after multiple loop transformations, which iterations of the original loops in the source code a particular label occurrence belongs to.
     669During the annotation stage of the source code, this information is presented to the user by means of \emph{dependent costs}.
    708670
    709671We concentrate on integrating the labelling approach with two loop transformations.
    710 For general information on compiler optimization (and loop optimizations in
    711 particular) we refer the reader to the vast literature on the subject
    712 (e.g.~\cite{muchnick,morgan}).
    713 
    714 \emph{Loop peeling}, as already mentioned, consists in preceding the loop with
    715 a copy of its body, appropriately guarded. This is in general useful to trigger
    716 further optimizations, such as ones relying on execution information which can
    717 be computed at compile time but which is erased by further iterations of the loop,
    718 or ones that use the hoisted code to be more effective at eliminating redundant
    719 code. Integrating this transformation to the labelling approach would also allow
    720 to integrate the common case of cache analysis explained above: the analysis
    721 of cache hits and misses usually benefits from a form of
    722 \emph{virtual} loop peeling~\cite{cacheprediction}.
    723 
    724 \emph{Loop unrolling} consists in repeating several times the body of the loop
    725 inside the loop itself (inserting appropriate guards, or avoiding them altogether
    726 if enough information on the loop's guard is available at compile time). This
    727 can limit the number of (conditional or unconditional) jumps executed by the
    728 code and trigger further optimizations dealing with pipelining, if applicable
    729 to the architecture.
    730 
    731 While covering only two loop optimizations, the work presented here poses good
    732 bases to extending the labelling approach to cover more and more of them, as well
    733 as giving hints as to how integrating in CerCo's compiler advanced cost estimation
    734 techniques such as cache analysis. Moreover loop peeling has the substantial
    735 advantage of enhancing other optimizations. Experimentation with CerCo's
    736 untrusted prototype compiler with constant propagation and
    737 partial redundancy elimination~\cite{PRE,muchnick} show how loop peeling enhances
    738 those other optimizations.
    739 
    740 \paragraph{Outline.}
    741 We will present our approach on a minimal toy imperative language, \imp{}
    742 with gotos, which we present in \autoref{sec:defimp} along with formal
    743 definition of the loop transformations. This language already
    744 presents most of the difficulties encountered when dealing with C, so
    745 we stick to it for the sake of this presentation. In \autoref{sec:labelling}
    746 we summarize the labelling approach as presented in~\cite{D2.1}. The following
    747 \autoref{sec:indexedlabels} explains \emph{indexed labels}, our proposal for
    748 dependent labels which are able to describe precise costs even in the presence
    749 of the loop transformations we consider. Finally \autoref{sec:conc} goes into
    750 more details regarding the implementation of indexed labels in CerCo's
    751 untrusted compiler and speculates on further work on the subject.
    752 
     672For general information on compiler optimisations (and loop optimisations in particular) we refer the reader to the vast literature on the subject (e.g.~\cite{muchnick,morgan}).
     673
     674\paragraph{Loop peeling}
     675As already mentioned, loop peeling consists in preceding the loop with a copy of its body, appropriately guarded.
     676This is used, in general, to trigger further optimisations, such as those that rely on execution information which can be computed at compile time, but which is erased by further iterations of the loop, or those that use the hoisted code to be more effective at eliminating redundant code.
     677Integrating this transformation into the labelling approach would also allow the integration of the common case of cache analysis explained above; the analysis of cache hits and misses usually benefits from a form of \emph{virtual} loop peeling~\cite{cacheprediction}.
     678
     679\paragraph{Loop unrolling}
     680This optimisation consists of placing several copies of the body of the loop inside the loop itself (inserting appropriate guards, or avoiding them altogether if enough information about the loop's guard is available at compile time).
     681This can limit the number of (conditional or unconditional) jumps executed by the code and trigger further optimisations dealing with pipelining, if appropriate for the architecture.
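     As an illustration, a factor-two unrolling of a simple summation loop might look as follows in C; this is our own sketch, not code produced by any compiler discussed here. The unrolled loop executes one conditional jump per two copies of the body, and a residual loop guards the case of an odd trip count.

```c
/* Illustrative loop unrolling by a factor of two.  The function
   name and shape are ours, chosen only to demonstrate the
   transformation: two body copies per iteration, plus a residual
   loop for any leftover iteration. */
int sum_unrolled(int n) {
    int s = 0, i = 0;
    /* unrolled body: two copies, one backward jump per pair */
    while (i + 1 < n) {
        s += i;
        s += i + 1;
        i += 2;
    }
    /* guard for the leftover iteration when n is odd */
    while (i < n) {
        s += i;
        i++;
    }
    return s;
}
```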
     682\\\\
     683Whilst we cover only two loop optimisations in this report, we argue that the work presented herein provides a good foundation for extending the labelling approach to cover more common optimisations, as well as insight into how to integrate advanced cost estimation techniques, such as cache analysis, into the CerCo compiler.
     684Moreover, loop peeling itself has the fortunate property of enhancing and enabling other optimisations.
     685Experimentation with CerCo's untrusted prototype compiler, which implements constant propagation and partial redundancy elimination~\cite{PRE,muchnick}, shows how loop peeling enhances those other optimisations.
     686
     687\paragraph{Outline}
     688We will present our approach on a minimal `toy' imperative language, \imp{} with \s{goto}s, which we present in \autoref{sec:defimp} along with formal definitions of the loop transformations.
     689This language already presents most of the difficulties encountered when dealing with \textsc{c}, so we stick to it for the sake of this presentation.
     690In \autoref{sec:labelling} we summarise the labelling approach as presented in~\cite{D2.1}.
     691\autoref{sec:indexedlabels} presents \emph{indexed labels}, our proposal for dependent labels which are able to describe precise costs even in the presence of the various loop transformations we consider.
     692Finally, \autoref{sec:conc} goes into more detail regarding the implementation of indexed labels in CerCo's untrusted compiler and speculates on further work on the subject.
    753693
    754694\section{\imp{} with goto}\label{sec:defimp}
    755 We briefly outline the toy language which contains all the relevant instructions
    756 to present the development and the problems it is called to solve.
    757 
    758 The version of the minimal imperative language \imp{} that we will present has,
    759 with respect to the bare-bone usual version, \s{goto}s and labelled statements.
    760 Break and continue statements can be added at no particular expense. Its syntax
    761 and operational semantics is presented in \autoref{fig:minimp}.
     695We briefly outline the toy language, \imp{} with \s{goto}s.
     696The language was designed in order to pose problems for the existing labelling approach, and as a testing ground for our new notion of indexed labels.
     697
     698The syntax and operational semantics of our toy language are presented in \autoref{fig:minimp}.
     699Note that we may augment the language further, with \s{break} and \s{continue}, at no further expense.
    762700\begin{figureplr}
    763701$$\begin{gathered}
     
    826764\label{fig:minimp}
    827765\end{figureplr}
    828 The actual
    829 grammar for expressions is not particularly relevant so we do not give a
    830 precise one. For the sake of conciseness we also treat boolean and
    831 arithmetic expressions together (with the usual convention of an expression
    832 being true iff non-zero). We may omit the \s{else} clause of a conditional
    833 if it leads to a \s{skip} statement.
    834 
    835 We will suppose programs are
    836 \emph{well-labelled}, i.e.\ every label labels at most one occurrence of statement
    837 in the program, and every \s{goto} points to a label actually present in the program.
    838 The \s{find} helper function has the task of not only finding the labelled
    839 statement in the program, but also building the correct continuation. The continuation
    840 built by \s{find} replaces the current continuation in the case of a jump.
    841 
    842 \paragraph{Further down the compilation chain.}
    843 As for the rest of the compilation chain, we abstract over it. We just posit
    844 every language $L$ further down the compilation chain has a suitable notion of
    845 sequential instructions (with one natural successor), to which we can add our
    846 own.
     766The precise grammar for expressions is not particularly relevant so we do not give one in full.
     767For the sake of conciseness we also treat boolean and arithmetic expressions together (with the usual \textsc{c} convention of an expression being true iff non-zero).
     768We may omit the \s{else} clause of a conditional if it leads to a \s{skip} statement.
     769
     770We will presuppose that all programs are \emph{well-labelled}, i.e.\ every label labels at most one occurrence of a statement in a program, and every \s{goto} points to a label actually present in the program.
     771The \s{find} helper function has the task of not only finding the labelled statement in the program, but also building the correct continuation.
     772The continuation built by \s{find} replaces the current continuation in the case of a jump.
     773
     774\paragraph{Further down the compilation chain}
     775We abstract over the rest of the compilation chain.
     776For every language $L$ further down the compilation chain, we posit the existence of a suitable notion of `sequential instruction', each having a single natural successor, to which we can add our own instructions.
    847777
    848778\subsection{Loop transformations}
    849 We call a loop $L$ \emph{single-entry} in $P$ if there is no \s{goto} of $P$
    850 outside of $L$ which jumps to within $L$\footnote{This is a reasonable
    851 aproximation: it considers multi-entry also those loops having an external but
    852 unreachable \s{goto} jumping to them.}. Many loop optimizations do not preserve
    853 the semantics of multi-entry loops in general, or are otherwise rendered
    854 ineffective. Usually compilers implement a single-entry loop detection which
    855 avoids the multi-entry ones from being targeted by optimizations~\cite{muchnick,morgan}.
    856 The loop transformations we present are local, i.e.\ they target a single loop
    857 and transform it. Which loops are targeted may be decided by an \emph{ad hoc}
    858 heuristic, however this is not important here.
    859 
    860 
    861 \paragraph{Loop peeling.}
    862 $$
    863 \sop{while}b\sbin{do}S\mapsto
    864 \sop{if}b\sbin{then} S; \sop{while} b \sbin{do} S[\ell'_i/\ell_i]
     779We call a loop $L$ \emph{single-entry} in $P$ if there is no \s{goto} of $P$ outside of $L$ which jumps into $L$.\footnote{This is a reasonable approximation: it also classifies as multi-entry those loops with an external but unreachable \s{goto} jumping into them.}
     780Many loop optimisations do not preserve the semantics of multi-entry loops in general, or are otherwise rendered ineffective.
     781Usually compilers implement a single-entry loop detection which avoids the multi-entry ones from being targeted by optimisations~\cite{muchnick,morgan}.
     782The loop transformations we present are local, i.e.\ they target a single loop and transform it.
     783Which loops are targeted may be decided by some \emph{ad hoc} heuristic.
     784However, the precise details of which loops are targeted, and how, are not important here.
     785
     786\paragraph{Loop peeling}
     787$$
     788\sop{while}b\sbin{do}S \mapsto \sop{if}b\sbin{then} S; \sop{while} b \sbin{do} S[\ell'_i/\ell_i]
    865789$$
    866790where $\ell'_i$ is a fresh label for any $\ell_i$ labelling a statement in $S$.
    867 This relabelling is safe as to \s{goto}s external to the loop because of the
    868 single-entry condition. Notice that in case of break and continue statements,
    869 those should be replaced with \s{goto}s in the peeled body $S$.
     791This relabelling is safe for \s{goto}s occurring outside the loop because of the single-entry condition.
     792Note that any \s{break} and \s{continue} statements in the peeled body $S$ must be replaced with \s{goto}s.
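     In C, the rule above corresponds to the following sketch (our own illustrative code): the first iteration is hoisted in front of the loop, guarded by the loop's own condition, while the remaining iterations keep a renamed copy of the body (the $\ell_i \mapsto \ell'_i$ relabelling of the rule).

```c
/* Illustrative loop peeling, following the rule
     while b do S  |->  if b then { S; while b do S[l'_i/l_i] }
   Function name and body are ours, for exposition only. */
int sum_peeled(int n) {
    int s = 0, i = 0;
    if (i < n) {              /* guard of the peeled copy */
        s += i;               /* peeled first iteration (would carry
                                 fresh labels l'_i after peeling) */
        i++;
        while (i < n) {       /* remaining iterations */
            s += i;
            i++;
        }
    }
    return s;
}
```

Note how the peeled copy may later be optimised differently from the loop body (e.g. by constant propagation on `i == 0`), which is exactly why its cost can diverge from that of later iterations.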
    870793
    871794\paragraph{Loop unrolling.}
     
    13981321Adapting the indexed labelled approach to cost-labelled expressions does not
    13991322pose particular problems.
    1400 \paragraph{Simplification of dependent costs.}
    1401 As already mentioned, the blind application of the procedure described in
    1402 \autoref{ssec:depcosts} produces unwieldy cost annotations. In the implementation
    1403 several transformations are used to simplify such dependent costs.
    1404 
    1405 Disjunctions of simple conditions are closed under all logical operations, and
    1406 it can be computed whether such a disjunction implies a simple condition
    1407 or its negation. This can be used to eliminate useless branches of dependent costs,
    1408 to merge branches that have the same value, and possibly to simplify the third
    1409 case of simple condition. Examples of the three transformations are respectively
     1323
     1324\paragraph{Simplification of dependent costs}
     1325As previously mentioned, the na\"{i}ve application of the procedure described in~\autoref{ssec:depcosts} produces unwieldy cost annotations.
     1326In our implementation several transformations are used to simplify such complex dependent costs.
     1327
     1328Disjunctions of simple conditions are closed under all logical operations, and it can be computed whether such a disjunction implies a simple condition or its negation.
     1329This can be used to eliminate useless branches of dependent costs, to merge branches that share the same value, and possibly to simplify the third case of a simple condition.
     1330Examples of the three transformations are respectively:
    14101331\begin{itemize}
    14111332\item $
     
    14251346$\end{tabular}
    14261347\end{itemize}
    1427 The second transformation tends to accumulate disjunctions, again to the detriment
    1428 of readability. A further transformation swaps two branches of the ternary
    1429 expression if the negation of the condition can be expressed with less clauses.
    1430 An example is
     1348The second transformation tends to accumulate disjunctions, to the detriment of readability.
     1349A further transformation swaps two branches of the ternary expression if the negation of the condition can be expressed with fewer clauses.
     1350For example:
    14311351$$ \verb+(_i_0 % 3 == 0 || _i_0 % 3 == 1)?+x\verb+:+y \mapsto
    14321352\verb+(_i_0 % 3 == 2)?+y\verb+:+x.
    14331353$$
    14341354
    1435 \paragraph{Updates to the frama-C cost plugin.}
    1436 Cerco's frama-C~\cite{framac} cost plugin\todo{is there a reference for this?}{}
    1437 has been updated to take into account dependent
    1438 costs. The frama-c framework explodes ternary expressions to actual branch
    1439 statements introducing temporaries along the way, which makes the task of
    1440 analyzing ternary cost expressions rather daunting. It was deemed necessary to provide
    1441 an option in the compiler to use actual branch statements for cost annotations
    1442 rather than ternary expressions, so that at least frama-C's use of temporaries in
    1443 cost annotation be avoided. The cost analysis carried out by the plugin now
    1444 takes into account such dependent costs.
    1445 
    1446 The only limitation (which provided
    1447 a simplification in the code) is that within a dependent cost
    1448 simple conditions with modulus on the
     1355\paragraph{Updates to the Frama-C cost plugin}
     1356CerCo's Frama-C~\cite{framac} cost plugin\todo{is there a reference for this?}{} has been updated to take into account our new notion of dependent costs.
     1357The Frama-C framework expands ternary expressions into branch statements, introducing temporaries along the way.
     1358This makes the task of analysing ternary cost expressions rather daunting.
     1359It was therefore deemed necessary to provide an option in the compiler to use actual branch statements for cost annotations rather than ternary expressions, so that at least Frama-C's introduction of temporaries into cost annotations could be avoided.
     1360The cost analysis carried out by the plugin now takes into account such dependent costs.
     1361
     1362The only limitation (which provided a simplification in the code) is that, within a dependent cost, simple conditions involving the modulus of the
     14491363same loop index should not be taken modulo different numbers, which corresponds to
     14501364the reasonable limitation of not applying loop unrolling multiple times to