\section{Conclusions}
\label{sect.conclusions}

% Conclusions
%   Summary
%   Related work
%   Future work

Intensional properties of programs---time and space usage, for example---are an important factor in the specification of a program, and therefore in overall program correctness.
Here, `intensional' properties can be analysed \emph{asymptotically} or \emph{concretely}, with the latter analyses computing resource bounds in terms of clock cycles, (milli)seconds, bits transmitted, or other basic units of resource consumption, for a program execution.
For many application domains, concrete complexity is the more critical of the two analyses.
`Soft real time' programs, with bounds on application response time, and libraries exporting cryptographic primitives, designed to be impervious to timing side-channel attacks, are two important classes of application fitting this pattern.

Worst Case Execution Time (WCET) tools currently represent the state of the art in static analyses that determine accurate concrete resource bounds for programs.
These tools possess a number of disadvantages.
For instance, their analyses are at once sophisticated and opaque, with trust in the analysis devolving to trust in the complex WCET tool implementation.
Further, they require that the analysis be performed on the machine code produced by a compiler, where all high-level program structure has been `compiled away', making analyses potentially hard to understand for programmers accustomed to reasoning at the source-code level.

A more attractive alternative would be a mechanism to `lift' a cost model from the machine code generated by a compiler back to the source-code level, where analyses could be performed in terms of the source code, abstractions, and control-flow constructs written and understood by the programmer.
The major challenges here are obtaining a source-level resource analysis that can compete with the precision offered by traditional WCET tools, and performing this high-level analysis reliably.

The central insight of the CerCo project is that a compiler for a substantial fragment of the C programming language can be written in a style that facilitates high-level reasoning about concrete resource usage.
CerCo's resource analysis proceeds by fully embracing the compilation process, exploiting rather than ignoring the implicit knowledge of program translation that the compiler possesses.
In particular, the CerCo compiler tracks the translation of phrases of high-level source code into machine code, where a cost measure is applied, before reflecting this measure back through the compiler to the source-code level.
There, the cost model takes the form of parametric cost assertions embellishing the program's source code, which can be used by external reasoning tools---trusted or untrusted---to establish hard bounds on the program's resource consumption.

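To give a flavour of this, the following sketch shows the general shape of instrumented source code produced by such cost-model lifting; the global counter \texttt{\_\_cost} and the per-block cycle constants are purely illustrative, and not the literal output of the CerCo compiler:
\begin{verbatim}
#include <stdio.h>

/* Illustrative only: a global clock-cycle counter in the style of
   the instrumented source produced by cost-model lifting.  The name
   `__cost' and the per-block constants are hypothetical. */
unsigned long __cost = 0;

int sum(int *a, int n) {
  int s = 0, i;
  __cost += 23;                /* function prologue block        */
  for (i = 0; i < n; i++) {
    __cost += 47;              /* one iteration of the loop body */
    s += a[i];
  }
  __cost += 11;                /* loop exit and epilogue         */
  return s;
}

int main(void) {
  int a[4] = {1, 2, 3, 4};
  int s = sum(a, 4);
  printf("sum = %d, cost = %lu cycles\n", s, __cost);
  return 0;
}
\end{verbatim}
External tools can then reason about \texttt{\_\_cost} exactly as they would reason about any other program variable.
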
Further, the CerCo verified compiler and associated toolchain presented in this paper have demonstrated that it is indeed possible to perform static concrete resource analysis at the source level with the precision one would expect of traditional WCET tools.
Indeed, for the MCS-51, our compiler's target architecture, our parametric cost assertions are capable of producing \emph{exact} bounds on resource consumption.
Moreover, as the CerCo compiler's lifting of the cost model is fully verified in Matita, the trusted code base is minimised---one need no longer rely on the correctness of a WCET tool to obtain a trustworthy analysis.

With CerCo, we believe that we have developed a general approach to structuring compilers so that they lift cost models to the source level.
The CerCo compiler itself is a lightly-optimising compiler for a large subset of the C programming language, and whilst not as aggressive in its optimisation passes as commercial and open-source C compilers such as \texttt{gcc}, \texttt{clang}, or \texttt{icc}, it is still a relatively substantial piece of software.
As a result, the application of the techniques described herein---first developed in pen-and-paper proofs---to the full CerCo compiler is, in our view, already evidence of their scalability.
We see no reason why these techniques could not be applied to a more aggressively optimising compiler, as none of them is tied to any particular choice of intermediate language or set of optimisations.
Nor are the techniques tied to any particular programming language: they are applicable to arbitrary imperative and functional programming languages.
This last claim is bolstered by a pen-and-paper proof of the correctness of the labelling approach applied to a `toy' compiler for an applicative language. %cite

\subsection{Related work}
\label{subsect.related.work}

Many different approaches to resource analysis---both asymptotic and concrete---are present within the literature.

\paragraph{Compiler-based cost-model lifting}
Perhaps the most similar piece of work to our own is the recent work by Carbonneaux \emph{et al}~\cite{DBLP:conf/pldi/Carbonneaux0RS14}, who extended the CompCert verified C compiler in order to lift a concrete cost model for stack space usage back to the C source level.
Like CerCo, this work was verified within a theorem prover---in this case Coq, rather than Matita.
Though Carbonneaux \emph{et al} were working entirely independently of the project described herein\footnote{Personal communication with Carbonneaux and Hoffmann.}, the two pieces of work share some remarkable similarities, with both projects developing an analogue of the `structured trace' data structure described earlier in this paper in order to facilitate the verification of the lifting.
However, differences between the two projects exist.
Carbonneaux \emph{et al} developed a `quantitative' Hoare logic for manually reasoning about the stack space usage of compiled programs, coupled with an automated stack-space usage analyser built at the level of the Clight intermediate language; we developed a Frama-C plugin for automatically establishing resource bounds from the parametric complexity analysis offered by our compiler, modulo a degree of user input.
Carbonneaux \emph{et al} lift a cost model for stack space usage; we lift a cost model for both time and stack space usage.
Carbonneaux \emph{et al} lift their model from the assembly generated by the CompCert compiler; we must go further, to the level of machine code, in order to lift our timing cost model, necessitating the development of a verified assembler.
Nevertheless, despite these differences, the two projects are clearly closely related.

\paragraph{WCET and abstract interpretation}
As mentioned above, the current state of the art in concrete resource analysis is represented by WCET tools such as AbsInt's aiT static analyser, which use abstract interpretation, control-flow analyses, and detailed models of the hardware to derive tight bounds on worst-case program execution time.

Unlike the approach presented in this paper, the analyses performed by mainstream WCET tools are unverified: the WCET tool itself, typically a large, complex piece of software, must become part of one's trusted code base.
However, recent work has investigated the verification of loop-bound analysis, a subcomponent of wider WCET analyses, for an intermediate language of the CompCert verified C compiler.
This work is related but orthogonal to the work presented in this paper: our cost annotations at the C source level are \emph{parametric} in the number of iterations of the loops appearing in the source code.
Indeed, to obtain a final resource analysis using our toolchain, the user must manually bound loop iterations.\footnote{This is typical for embedded and safety- or mission-critical applications, where unrestricted looping and recursion are not permitted.  See, for example, rule 16.2 of MISRA C 2004, which prohibits recursive function calls entirely in safety-critical code, and NASA JPL's programming guidelines, which state that all loops must have a bounded number of iterations.} %cite
As a result, one could imagine pairing such a loop-bound analysis with our parametric cost model to reduce the amount of user interaction required.

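As an illustration of how the two analyses might be combined, a cost contract in the spirit of those synthesised by our Frama-C plugin could take the following ACSL-style shape, parametric in the iteration count \texttt{n}; the constants and the \texttt{\_\_cost} instrumentation variable are again hypothetical:
\begin{verbatim}
/* Illustrative ACSL-style cost contract, parametric in n.  The
   constants and the `__cost' variable are hypothetical, not the
   literal output of our plugin. */
/*@ requires n >= 0;
  @ ensures __cost <= \old(__cost) + 34 + 47 * n;
  @*/
int sum(int *a, int n);
\end{verbatim}
Once a concrete bound on \texttt{n} is supplied---whether manually, or by a verified loop-bound analysis---the parametric assertion collapses to a hard numeric bound.
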
\paragraph{Type-based analyses}

Resource Aware ML (RAML), developed by Hoffmann and Hofmann, is a first-order, statically-typed functional programming language. %cite
RAML's sophisticated type system is able to automatically infer polynomial bounds on a program's resource consumption, expressed as functions of the sizes of the program's inputs, by means of an automatic amortised resource analysis.

\paragraph{Other approaches}

The Hume programming language, developed as part of the EU's EmBounded project, is a typed functional programming language intended for embedded applications, compiled to a custom abstract machine---the Hume Abstract Machine (HAM). %cite
Each instruction of the HAM has a fixed concrete timing cost, derived experimentally by repeatedly executing the abstract machine instruction on the target hardware and averaging.
These costs induce a cost model on compiled Hume code, which can be used by a prototype static analysis tool, or by AbsInt's aiT WCET analyser, to derive upper bounds on the running time of a given Hume program.

\subsection{Further work}
\label{subsect.further.work}

To show that compilers can keep track of optimisations and induce a precise cost model on the source code, we targeted a simple architecture that admits an execution-history-independent cost model.
The most important item of future work is therefore dealing with hardware architectures characterised by history-dependent, stateful components, like caches and pipelines.
The main issue is to assign a parametric, dependent cost to basic blocks that can later be transferred by the labelling approach to the source code and represented in a meaningful way to the user.
The dependent labelling approach that we have studied seems a promising tool for achieving this goal, but more work is required to provide good source-level approximations of the relevant processor state.

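As a sketch of what such a dependent cost might look like at the source level, consider charging a basic block one of two costs according to a source-level abstraction of the data cache; all names and constants below are hypothetical:
\begin{verbatim}
/* Hypothetical sketch of a dependent cost: the charge for a block
   depends on an abstraction of the processor state, here a single
   flag recording whether `a' is resident in the data cache. */
unsigned long __cost = 0;
int __a_cached = 0;

int deref(int *a) {
  __cost += __a_cached ? 12 : 58;  /* cache hit vs. cache miss */
  __a_cached = 1;                  /* the access caches `a'    */
  return *a;
}
\end{verbatim}
The difficulty alluded to above is to find abstractions, like \texttt{\_\_a\_cached}, that are both sound with respect to the hardware and intelligible to the programmer.
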
Other avenues for future work are to improve the cost invariant generator algorithms and the coverage of compiler optimisations, to combine the labelling approach with the type and effect discipline of~\cite{typeffects} to handle languages with implicit memory management, and to experiment with our tools in the early phases of development.
Larger case studies are also necessary to evaluate the CerCo prototype on realistic, industrial-scale programs.