source: Papers/jar-cerco-2017/conclusions.tex

Last change on this file was 3661, checked in by boender, 3 years ago

Added second part of technical report as basis for Frama-C section

File size: 24.8 KB
4% Conclusions
5%   Summary
6%   Related work
7%   Future work
9The concrete, non-functional properties of programs---time and space usage, for example---are an important factor in the specification of a program, and therefore overall program correctness.
10Indeed, for many application domains, concrete complexity is a critical component of overall correctness: `real time' programs, with bounds on application response time, and libraries exporting cryptographic primitives, designed to be impervious to timing side-channel attacks, are two important classes of application fitting this pattern, for example.
12Worst Case Execution Time (WCET) static analysers currently represent the state of the art in determining accurate concrete resource bounds for programs.
13These tools possess a number of disadvantages, however.
14For instance, their analyses are at once sophisticated and opaque, with trust in the analysis devolving to trust in the complex WCET tool implementation.
15Further, they also require that the analysis be performed on machine code produced by a compiler, where all high-level program structure has been `compiled away', making analyses potentially hard to understand for programmers accustomed to reasoning at the source-code level.
17More ideal would be a mechanism to `lift' a cost model from the machine code generated by a compiler, back to the source code level, where analyses could be performed in terms of source code, abstractions, and control-flow constructs written and understood by the programmer.
18A major challenge here is obtaining a source-level resource analysis that can compete with the precision offered by traditional WCET tools.
19Further, how to perform this high-level analysis reliably is an open question.
21In this paper we have presented an approach to engineering a realistic, mildly-optimising compiler for a large fragment of the C programming language that combines the ergonomics of reasoning at the source-level, with the accuracy and precision one would expect of traditional static analyses for non-functional properties of programs.
22The central insight of our approach is that effective, high-level reasoning about resource usage can be facilitated by embracing the implicit knowledge of program translation that the compiler possesses, not ignoring it.
23In particular, the CerCo verified C compiler tracks the translation of phrases of high-level source code into machine code, wherein a cost measure is applied, before being reflected back through the compiler to the source-code level.
24This tracking is facilitated by a novel technique---the \emph{labelling approach}.
25Thereafter, this cost model is reflected as parametric cost assertions, embellishing the program's source code, which can be used by external reasoning tools---trusted or untrusted---to provide hard bounds on a program's resource consumption.
27The CerCo verified C compiler and associated toolchain presented in this paper have demonstrated that it is indeed possible to perform static concrete resource analysis at the source level with the precision one would expect of traditional WCET tools.
28Indeed, for the MCS-51, our compiler's target architecture, our parametric cost assertions are capable of producing \emph{exact} bounds on resource consumption.
29Further, as the CerCo compiler's lifting of the cost model is fully verified in Matita, the trusted code base is minimised---one no longer need rely on the correctness of a WCET tool to obtain a trustworthy analysis.
31We believe that the work presented in this paper represents a general approach to structuring compilers for lifting cost models to the source-level.
32In particular, the CerCo verified C compiler itself is a lightly-optimising compiler for a large subset of the C programming language, and whilst not as aggressive in its optimisation passes as commercial and open-source C compilers, such as \texttt{gcc}~\cite{gcc}, \texttt{clang}~\cite{clang}, or \texttt{icc}~\cite{icc}, the compiler is still a relatively substantial piece of software.
33As a result, the application of the techniques described herein, first described with pen-and-paper proofs, to the full CerCo compiler is, in our view, already evidence of their scalability, and we see no reason why these techniques could not be applied to a more aggressively optimising compiler, given that none of our techniques are tied to any particular choice of intermediate language or set of optimisations.
34Further, the techniques that we have presented are not tied to any particular programming language, or to any one programming paradigm, and are in fact amenable to arbitrary imperative and functional programming languages.
35This last claim is bolstered by a pen-and-paper proof of the correctness of the labelling approach applied to a `toy' compiler for an applicative language. %cite
37We envision a new generation of compilers that track program structure through compilation and optimisation and exploit this information to define a precise, non-uniform cost model for source code that accounts for runtime state.
38With such a cost model we can perform non-functional verification in a similar way to functional verification and exploit the state of the art in automated high-level verification~\cite{survey}.
40The techniques currently used by the Worst Case Execution Time (WCET) community, which perform analysis on object code, are still applicable but can be coupled with additional source-level analysis.
41In cases where our approach produces overly complex cost models, safe approximations can be used to trade complexity for precision.
42Finally, source code analysis can be used early in the development process, when components have been specified but not implemented, as modularity means that it is enough to specify the non-functional behaviour of missing components.
44\subsection{Related work}
47Many different approaches to resource analysis---asymptotic and concrete, working at low- and high-levels of abstraction---are present within the literature.
48We now survey these approaches, drawing a distinction between low-level approaches that operate directly on the object code (or bytecode) produced by a compiler, and approaches that operate at a higher-level of abstraction.
50\subsubsection{Object-code approaches}
52One existing approach to predicting concrete program execution time at the object-code level was developed in the EmBounded project~\cite{embounded}.
53The EmBounded compiler compositionally compiles high-level Hume programs to the byte code of the Hume Abstract Machine (HAM).
54The HAM provides strong guarantees on the maximal execution time spent for each byte code instruction, derived from empirically testing the maximum running time taken for each instruction on a given architecture~\cite{BoneAFerdCHH2007:IFL}.
55This provides a model that is uniform, though at the expense of precision---each cost is a pessimistic upper bound---and the performance of the executed code---the byte code is interpreted compositionally.
56Experiments were conducted using a prototype static analysis tool, and a mainstream WCET analyser, to derive upper bounds for running time of a Hume program.
57Case studies demonstrated the feasibility of using the Hume programming language for developing resource-constrained applications~~\cite{BoneAChenZHMWW2007:ACM}.
59The current state of the art in concrete timing analysis is represented by WCET tools, which use Abstract Interpretation~\cite{Cousot1977}, control flow analyses, and detailed models of hardware to derive tight time bounds on expected program execution time.
60See~\cite{stateart} for a survey of state-of-the-art techniques in the WCET field.
61A variety of academic and commercial sources provide advanced WCET tooling: for example, aiT~\cite{absint}, bound-T~\cite{bound-T}, SWEET~\cite{Lisper2014}, Heptane~\cite{heptane}, and OTAWA~\cite{otawa}.
62Unlike the approach presented in this paper, the analyses performed by mainstream WCET tools are unverified: the WCET tool itself, typically a large, complex piece of software, must become part of one's trusted code base.
64However, recent work on the CompCert verified C compiler has investigated the verification of loop-bound analysis, a subcomponent of wider WCET analyses for the Clight intermediate language~\cite{DBLP:conf/wcet/MaronezeBPP14}.
65Note that this work is related but orthogonal to the work presented in this paper: our cost annotations at the C source-level are \emph{parametric} in the number of iterations of loops appearing in the source code.
66Indeed, to obtain a final resource analysis using our toolchain, the user must manually bound loop iterations.\footnote{This is typical for embedded and safety- or mission-critical applications, where unrestricted looping and recursion are not permitted.
67See for example rule 16.2 of MISRA C 2004, which prohibits recursive function calls entirely in safety-critical code, and Nasa JPL's programming guidelines, which states all loops must have a bounded number of iterations.} %cite
68As a result, one could potentially imagine pairing the loop-bound analysis with our parametric cost model to reduce the amount of user interaction required when using it.
70Abstract Interpretation may also be used to derive upper bounds for stack usage (e.g.~\cite{Regehr:2005:ESO:1113830.1113833}), and Abstract Interpretation tools for predicting stack space usage, for example the StackAnalyzer tool~\cite{stackanalyzer}, are also commercially available.
71An Abstract Interpretation-based analysis of resource usage for bytecode programs of a stack-based virtual machine was presented in~\cite{DalZilio2005}.
73Program logics for analysing resource use of low-level code have also been investigated (similar program logics for high-level programs will be discussed below).
74Barthe et al. developed a program logic for low-level Java bytecode programs, able to certify bounds on a bytecode program's heap usage~\cite{DBLP:conf/sefm/BarthePS05}.
76\paragraph{High-level approaches}
78Perhaps the most closely related piece of work to our own is the recent work by Carbonneaux \emph{et al}~\cite{DBLP:conf/pldi/Carbonneaux0RS14}, who extended the CompCert verified C compiler in order to lift a concrete cost model for stack space usage back to the C source-level.
79Like CerCo, this work was verified within a theorem prover---in this case Coq, rather than Matita.
80Though Carbonneaux \emph{et al} were working entirely independently of the project described herein\footnote{Personal communication with Carbonneaux and Hoffmann}, the two pieces of work share some remarkable similarities, with both projects developing an analogue of the `structured trace' data structure described earlier in this paper in order to facilitate the verification of the lifting.
81However, differences between the two projects exist.
82Carbonneaux \emph{et al} developed a `quantitative' Hoare logic for manually reasoning about stack space usage of compiled programs, coupled with an automated stack-space usage analyser built at the Clight intermediate language level; we developed a FramaC plugin for automatically establishing resource bounds, from the parametric complexity analysis offered by our compiler, modulo a degree of user input.
83Carbonneaux \emph{at al} lift a cost model for stack space usage; we lift a cost model for time and stack space usage.
84Carbonneaux \emph{at al} lift their model from the assembly generated by the CompCert compiler; we must go further, to the level of machine code, in order to lift our timing cost model, necessitating the development of a verified assembler.
85Nevertheless, despite these differences, the two projects are clearly closely related.
87Another closely related piece of work to our own is COSTA~\cite{Albert2012}, a heap resource bound static analysis system for the Java programming language, building on previous work investigating cost models for Java bytecode programs~\cite{ALBERT200767}.
88COSTA is able to produce parametric heap resource bounds for a given Java program, which are made manifest as Java Modelling Language (JML)~\cite{Chalin2006} assertions, inserted into the Java source file.
89Coupled with the KeY symbolic reasoning/automated reasoning system~\cite{KeYBook2016}, these assertions may be automatically verified, obtaining trustworthy upper bounds on a Java program's dynamic heap space usage at runtime.
91Similarly, the automatic instrumentation provided in~\cite{He2009}, combined with the Hip/SLEEK automated reasoning tool, can produce certified bounds on a program's memory usage.
92This instrumentation is proven sound with respect to an operational semantics of the programming language considered.
94Other work has explored the use of Computer Algebra Software, such as the Wolfram Mathematica system~\cite{mathematica}, for deriving closed-form upper-bounds of a program's resource usage~\cite{Albert2011}.
95A high-level static analysis for synthesising non-linear bounds on the memory usage of programs written in a Java-like programming language, extended with regions, made use of Bernstein bases in~\cite{symbolic-prediction-2007,DBLP:journals/jot/BrabermanGY06}.
96A series of work presented various static analyses of time- and space-usage of programs written in a simple functional programming language with regions, called Safe~\cite{Montenegro2010,Montenegro2010a}.
97One such analysis was verified in Isabelle/HOL~\cite{deDios2011}.
98An intraprocedural approach for deriving symbolic bounds on the number of times statements in a procedure are executed, given as a function of the procedure's scalar inputs, was developed by Gulwani et al~\cite{speed-precise-and-efficient-static-estimation-of-program-computational-complexity-2}.
99The intraprocedural analysis was implemented in the SPEED tool, and used to compute bounds for high-level programs, in several case studies.
100Another analysis for producing symbolic bounds---this time on stack and heap usage---of embedded code written in C, or similar languages, was presented in~\cite{Chin:2008:AMR:1375634.1375656}.
102Resource bounds may also be checked dynamically, at run time, using automatically inserted instrumentation code.
103This naturally incurs a runtime cost.
104A hybrid static-dynamic checker of resource bounds, for a novel programming language specifically designed to ease the implementation of the technique, was explored in~\cite{Chander2005}.
106The use of types to bound, or program logics to reason about, a program's resource usage have also been extensively investigated in previous work.
108Resource Aware ML (RAML) is a first-order statically typed functional programming language~\cite{Hoffmann11,HAH12Toplas,HoffHof10,HAH12,DBLP:journals/jfp/0002S15}.
109The RAML type system is able to derive polynomial resource bounds from first-order functional programs, using a variant of the `potential method'.
110This approach was first pioneered by Hoffman and Jost for amortized, linear bounds on resource consumption~\cite{DBLP:conf/esop/HofmannJ06}, who focussed on predicting heap-space usage.
111Campbell also used a variant of this approach to predict stack space usage~\cite{bac-thesis08,bac-giveback08,bac-esop09}, whilst Atkey extended the approach to imperative languages~\cite{Atkey2010}, using ideas from Separation Logic~\cite{Reynolds:2002:SLL:645683.664578}.
112Recent work has extended this approach to include more features of the core programming language~\cite{Hoffmann2015}, improving the resource analysis~\cite{Hoffmann:2012:MAR:2362389.2362393}, and to apply extensions of the technique to case studies, for example in demonstrating that cryptographic code is free from side-channel attacks~\cite{chan-ngo-verifying-2017}.
114In related work, Salvucci and Chailloux~\cite{Salvucci201627} used a static type and effect system to bound runtime heap usage of an impure functional programming language, similar to ML.
115Domain-specific type systems and calculi of mobile processes, for bounding agents in a distributed setting, have also been investigated~\cite{Barbanera2003,Teller2002}.
116Brady and Hammond also investigated a dependent type system extended with features for bounding a program's dynamic resource usage~\cite{Brady2006}.
117Similar ideas had previously been investigated by Crary and Weirich~\cite{Crary:2000:RBC:325694.325716}.
118Libraries within dependently-typed languages, such as Coq~\cite{McCarthy2016} and Agda~\cite{Danielsson:2008:LST:1328897.1328457}, have also been written for reasoning about the complexity of algorithms written in the language.
119One-off machine-checked verifications of the algorithmic complexity of various data structure implementations, and operations over them, have also been investigated (e.g.~\cite{Chargueraud2015,Nipkow2015}).
121Implicit Computational Complexity represents another related approach to bounding resource usage via types~\cite{DalLago2012}.
122Here, a fragment of Linear Logic~\cite{GIRARD19871}, or some other substructural logic, is taken as the language of types of some calculus, with a provable guarantee that typeable terms of the calculus normalise to a value in some bounded number of reduction steps (e.g. polynomial in the size of the term, as in the case of type systems inspired by Intuitionistic Light Affine Logic~\cite{DBLP:journals/tocl/AspertiR02}).
123Note that compared to the concrete complexity measures dealt with in this work, bounding reduction steps is rather coarse-grained: one reduction step in the calculus may decompose into some non-constant number of primitive machine operations, when executed on physical hardware.
124Indeed, the focus of Implicit Computational Complexity is not to capture \emph{concrete} resource usage of programs, but to provide a type-theoretic interpretation of complexity classes.
125A formalisation of a cost semantics for System T~\cite{DLTC:DLTC280} using ideas from Implicit Computational Complexity was formalised in Coq~\cite{Danner:2013:SCA:2428116.2428123}.
127Several program logics for reasoning about a program's resource consumption have been investigated in the literature.
129Aspinall et al. developed a program logic for a subset of the Java Virtual Machine's instruction set, capable of establishing resource bounds on JVM programs~\cite{ASPINALL2007411}, similar in spirit to the `quantitative Hoare logic' mentioned above.
130The soundness of the logic was formally verified in Isabelle/HOL.
131Another program logic for reasoning about Java resource usage, based on ideas taken from Separation Logic, was also developed by Aspinall et al.~\cite{Aspinall2010}.
133\paragraph{Verified compilation}
135The CerCo verified C compiler is naturally related to other verified compilation projects (see e.g.~\cite{Kumar:2014:CVI:2578855.2535841,LEINENBACH200823,Strecker2002,KLEIN200427,Wang:2014:CVM:2714064.2660201,Chlipala:2010:VCI:1707801.1706312,Chlipala:2007:CTC:1273442.1250742,Myreen:2009:VLI:1616077.1616105,Li:2008:TST:1792734.1792783,Myreen:2009:EPC:1532828.1532830} for more recent work, and~\cite{Dave:2003:CVB:966221.966235} for a now slightly outdated overview of the field) in that we consider the general problem of proving a program's semantics---whether functional or non-functional---are preserved under compilation, generally construed.
137However, out of all verified compiler projects, we view the CompCert verified C compiler~\cite{compcert} as the most closely related verified compiler to our own.
138Indeed, the intermediate languages chosen into the CerCo verified C compiler are at least partially inspired by those implemented in the CompCert compiler.
139Several differences between the two projects stand out, however.
141Most importantly, the vanilla CompCert compiler does not consider the lifting of any cost model from machine code to the source-level, which we do.
142Indeed, the correctness theorem for the CompCert compiler focusses exclusively on the preservation of extensional (functional) properties of the program under compilation.
143Whilst we consider extensional correctness, we also consider the tracking of non-functional properties of the program, too.
144This focus on lifting a cost model from machine code to high-level source code means that we must consider the generation of machine code, rather than assembly code, from our compiler.
145Therefore, unlike CompCert, the CerCo C compiler also contains a verified lightly-optimising assembler for our target architecture~\cite{DBLP:conf/tacas/BoenderC14,DBLP:conf/cpp/MulliganC12}.
147Further, the CompCert verified C compiler, since its initial release, has served as a nexus for research into the verification of realistic C compilers.
148As a result, various optimisations (e.g.~\cite{Tatlock:2010:BEV:1809028.1806611,Tristan-Leroy-LCM,Rideau-Leroy-regalloc,Blazy-Robillard-splitting}), program analyses (e.g.~\cite{Robert-Leroy-2012,2006-Leroy-Bertot-Gregoire}), extensions to the base CompCert C language and its semantics (e.g.~\cite{Boldo-Jourdan-Leroy-Melquiond-2013,Stewart:2015:CC:2775051.2676985,Kang:2016:LVS:2914770.2837642}), retargettings of the compiler to alternative hardware platforms (e.g.~\cite{Sevcik:2013:CVC:2487241.2487248}), and various program logics (e.g.~\cite{Leroy-Appel-Blazy-Stewart-memory}) have all extended the base CompCert verified compiler.
149None of this work has been implemented for the CerCo verified C compiler, meaning that CompCert is a much more aggressively optimising compiler, with a better developed toolchain surrounding the compiler, than the verified compiler presented in this paper.
151Lastly, the implementation styles of the two compilers, and their proofs, differ quite markedly.
152CompCert was intentionally written using as few indexed families of types, and as few advanced features of the Coq proof assistant, as possible.
153CerCo, on the other hand, was partially intended to stress-test the Matita type checker and reduction machinery, and also intended as a project that would explore the design space of verified compilers in a dependently-typed proof assistant.
154As a result, the Cerco implementation and proof of correctness makes extensive use of indexed type families and dependent types.
156% CompCert, CompCertTSO, CakeML, C0, etc.
158\subsection{Further work}
161Important future work is retargetting the CerCo verified C compiler to target hardware architectures characterised by history-dependent stateful components, like caches and pipelines.
162Most moden microprocessors, including all commodity microprocessors that ship with desktop computers and servers, fall into this class.
163A key issue here is the assignment of a parametric, dependent cost to basic blocks that can be later lifted to the source-code level by our labelling approach.
164Further, once this cost model is lifted, it must be represented in a meaningful and easy-to-understand way to the user.
165The dependent labelling approach that we have studied seems a promising tool to achieve this goal.
166However, more work is required to provide good source level approximations of the relevant processor state.
168Another key question that we wish to address is how one could retrofit an existing compiler with facilities for lifting cost models, as presented in this paper.
169Whilst we engineered the CerCo verified C compiler to track how each expression and statement of source-level code is translated, to a certain extent a more general tracking mechanism is already present in many commercial compilers which produce DWARF debug information~\cite{dwarf}.
170In particular, DWARF contains various tables, and other debugging data, on how a compiler translated code.
171This data includes a line number table, stored in the \texttt{.debug\_line} section, that contains the instructions of a simple state machine that may be executed to map code memory addresses back to file names and line numbers (and column numbers, for compilers that generate this data) at the source level.\footnote{Indeed, running \texttt{objdump -S} on an executable binary containing embedded DWARF debug information will display an interleaving of high-level source code and machine code, using this information.}
173Whether this debugging information can be harnessed to lift cost models back through the compilation chain is an open question.
174Several problems immediately stand out.
175First, advanced compilers often produce buggy DWARF, due to the challenge of precisely tracking how expressions are translated through aggressive control flow-altering optimisations.
176Indeed, the difficulty of enacting this tracking reliably is a major reason why we verified the approach presented in this work.
177Second, some of the information needed to effect the lifting of the cost model is not generated as a matter-of-course by mainstream compilers, despite being supported by the DWARF standard.
178In particular, many compilers do not produce column information, needed to uniquely identify the subexpression or statement from which machine code originated, preferring only to generate line numbers.
180Other examples of future work are to improve the cost invariant generator algorithms and the coverage of compiler optimisations, to combining the labelling approach with the type and effect discipline of~\cite{typeffects} to handle languages with implicit memory management, and to experiment with our tools in the early phases of development.
Note: See TracBrowser for help on using the repository browser.