\newcommand{\All}[1]{\forall{#1}.\;}
\newcommand{\lam}[1]{\lambda{#1}.\;}

% Introduction
%   Problem being solved, background, etc.
%   Current state of the art (and problem with it)
%   The `CerCo approach' (tm)
%   Brief Matita overview
%   Map of paper

\section{Introduction}
\label{sect.introduction}

Programs are specified using both functional and non-functional constraints.
Functional constraints dictate what tasks a program must do; non-functional constraints limit the resources the program may consume whilst completing those tasks.

Depending on the application domain, non-functional constraints are as important as functional constraints when specifying a program.
Real-time systems, with hard limits on application response times; implementations of cryptographic primitives, which must be hardened against timing side-channel attacks; and a heart pacemaker's embedded controller, which must fit inside a limited code memory space, all fit this pattern.
A cryptography library susceptible to a timing side-channel attack is not an annoyance---it is an implementation error that undermines the entire purpose of the library.

A program's non-functional constraints may be given \emph{concretely} or \emph{asymptotically}.
Asymptotic complexity, as every Computer Science undergraduate knows, is important---but so too is concrete complexity for many applications, including the three examples highlighted above.
Response times for a real-time system are measured in seconds, milliseconds, or some other fundamental unit of time; a cryptographic library must have all execution paths execute in the same number of processor cycles, independent of any input passed by the client; the size of an embedded controller for a pacemaker is measured in bits, bytes, or some other unit of computer memory capacity.
In all cases, resource consumption is measured, and specified, in some concrete, fundamental unit of measure.

Currently, a program's functional properties can be established by combining user annotations (preconditions, invariants, and so on) with various automated and semi-automated analyses (invariant generators, type systems, abstract interpretations, applications of theorem proving, and so on), all operating on the high-level source code of the program.
Functional properties of a program are therefore established by reasoning about the source code that the application programmer actually sees and writes.
Further, the results of any analysis can be communicated to the programmer in terms of abstractions, control flow, and an overall system design with which they are familiar.

By contrast, a program's concrete, non-functional properties are established by reasoning on low-level object code produced not by a programmer, but by a compiler.
Whilst analyses operating at this level can and do produce very accurate results---Worst Case Execution Time (WCET) analysis can be extraordinarily accurate, for example---analysis at such a low level of abstraction invariably has disadvantages:
\begin{itemize}
\item
It can be hard to deduce the high-level structure of the program after compiler optimisations.
The object code produced by an optimising compiler may have a radically different control flow to the original source code program.
\item
Object code analysis is unstable.
Modern compilers and linkers are highly non-compositional: they implement a variety of sophisticated optimisation passes, and use an abundance of intra-procedural, inter-procedural, module-level, and link-time analyses to direct these optimisations.
As a result, small changes in high-level source code can produce radically different object code outputs from a compiler, affecting any analysis tied to that object code.
\item
It is well understood by Software Engineers that problems with the design or implementation of a program are cheaper to resolve early in the development process. %cite
Despite this, techniques that operate on object code are not useful early in the development process of a program, when the program may be incomplete, with missing functionality or missing modules.
\item
Parametric cost analysis is very hard: how can we translate a cost that depends on the execution state, for example the value of a register or a carry bit, to a cost that the user can understand whilst looking at the source code?
\item
Performing these analyses on object code makes it hard for the programmer to provide information about the program and its expected execution, leading to a loss of precision in the resulting analysis.
It is also hard for the programmer to understand the results of the analysis, or direct its execution, as the high-level abstractions, control flow constructs, and so on, introduced by the programmer are `translated away'.
\end{itemize}

A better approach would be a high-level analysis of concrete, non-functional properties, coupled with the accuracy one expects of existing WCET analysers.
This would lead to a reconciliation of functional and non-functional analyses.
Information could be shared between the two analyses, and both could be performed concurrently on the same high-level source code.

What has previously prevented high-level reasoning about non-functional properties is the lack of a precise cost model for programs written in programming languages such as C.
As modern compilers are, as mentioned above, inherently non-compositional in their translations, any precise concrete cost model for a high-level language must also be non-compositional, and is inherently tied to how an individual program is compiled.
However, as optimisations may change control flow, and the cost of an object code instruction may depend on the runtime state of hardware components like pipelines and caches, none of which are visible in the source code, it has not been clear how one could go about defining such a cost model.

In this paper, we report on the scientific and technical contributions of the Certified Complexity (`CerCo') project~\cite{cerco}, an EU FET-Open funded collaboration between the Universities of Bologna, Edinburgh, and Paris 7.

The central insight of the CerCo project is that a cost model for a high-level programming language can be defined using a compiler's implicit knowledge of how expressions and statements are translated to machine code.
This implicit knowledge of how a compiler translates source-language phrases has to date been largely ignored when designing static analysis tools for concrete complexity measures.
However, by embracing the compilation process itself, one is able to define a non-uniform, or non-compositional, cost model that is customised to the exact translation that the compiler applied to the source-language program.

Accordingly, we first present a non-uniform, precise cost model for a large fragment of the C programming language---\emph{CerCo C}, which is based on early versions of \emph{CompCert C}~\cite{compcert}.
To define this cost model, we have developed a technical device---dubbed \emph{the labelling approach}~\cite{labelling}---which is a technique for implementing compilers that induce cost models on source programs via a lightweight tracking of code changes through the compilation pipeline, as sketched below.
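
As a first impression of the technique, consider the following sketch; it is purely illustrative, and the cycle counts, the \texttt{\_\_cost} counter, and the function \texttt{max} are all invented for this example.
Conceptually, a cost label marks the start of each basic block of the source program and survives every compiler pass; once machine code is emitted, the cycles of the instructions covered by each label are summed, and the label is replaced by an increment of a global cost counter:
\begin{lstlisting}
int __cost = 0;                           /* elapsed clock cycles (hypothetical) */
void __cost_incr(int k) { __cost += k; }

int max(int a, int b) {
  __cost_incr(12);   /* was a cost label: entry block, 12 cycles of object code */
  if (a > b) {
    __cost_incr(5);  /* was a cost label: `then' branch */
    return a;
  } else {
    __cost_incr(7);  /* was a cost label: `else' branch */
    return b;
  }
}
\end{lstlisting}
Figure~\ref{itest1} shows the instrumentation our compiler actually produces for a complete program.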

To establish that this tracking works, and scales beyond `toy' languages to more realistic programming languages, we have implemented a lightly-optimising compiler employing this technique.
The \emph{CerCo verified C compiler} compiles the CerCo C language fragment to object binaries for the MCS-51 8-bit embedded microcontroller.
This is a well-known and still popular processor in the embedded systems community.
Our compiler lifts the cost model induced on the machine code that it produces back through the compilation chain to the source level, making use of the labelling approach to effect the tracking and lifting.
At the source level, this cost model is made manifest as parametric cost annotations, inserted into the original source code file of the program being compiled.
These annotations may be used by the programmer to predict concrete execution time and stack space usage.

As we are targeting an embedded microcontroller, we have not considered dynamic memory allocation.
However, we believe that our labelling approach is general enough to handle resources other than execution time and stack space usage, including dynamic memory allocation.

To demonstrate source-level verification of costs, we have implemented a Frama-C plugin~\cite{framac} that invokes our compiler on a source program and uses it to generate invariants on the high-level source that correctly model low-level costs.
The plugin certifies that the program respects these costs by calling automated theorem provers, a novel technique in the field of static analysis for non-functional properties.
Using this plugin, we have conducted case studies, including showing that the plugin can automatically compute and certify the exact reaction time of Lustre~\cite{lustre} data flow programs compiled into C.

\subsection{Our approach in more detail}

As mentioned above, in CerCo we reject the idea of a uniform, or compositional, cost model for a high-level programming language.
We view any uniform cost model as being by its nature imprecise, as modern compilers themselves are highly non-compositional in their translations.
Far better is to engineer the compiler---which knows how the code is translated---to return the cost model for basic blocks of high-level instructions.
To do this, the compiler must track modifications to the control flow of the program, in order to reverse them, and interface with a processor timing analysis.

By embracing compilation, the CerCo compiler can both produce efficient code and return costs that are as precise as the processor timing analysis can be.
Moreover, our costs can be parametric: the cost of a block can depend on actual program data, on a summary of the execution history, or on an approximated representation of the hardware state.

For example, loop optimisations may assign a cost to a loop body that is a function of the number of iterations performed, as in the sketch below.
As another example, the cost of a block may be a function of the vector of stalled pipeline states, which can be exposed in the source code and updated at each basic block exit.
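
The following sketch illustrates the first of these examples; it is not the output of our compiler, and the function \texttt{scale}, the four-way unrolling, and the cycle counts of 90 and 40 are all invented.
After unrolling, the steady-state blocks and the residual iterations receive different costs, so the total cost charged to the loop is a function of \texttt{n}:
\begin{lstlisting}
int __cost = 0;                           /* elapsed clock cycles (hypothetical) */
void __cost_incr(int k) { __cost += k; }

void scale(int *a, int n) {
  int i;
  /* steady state after four-way unrolling: one block per four elements,
     at a hypothetical 90 cycles per block */
  for (i = 0; i + 4 <= n; i += 4) {
    __cost_incr(90);
    a[i] *= 2; a[i+1] *= 2; a[i+2] *= 2; a[i+3] *= 2;
  }
  /* residual iterations: a hypothetical 40 cycles each */
  for (; i < n; i++) {
    __cost_incr(40);
    a[i] *= 2;
  }
}
\end{lstlisting}
Here the cost charged to the loop is $90\lfloor \mathtt{n}/4\rfloor + 40(\mathtt{n} \bmod 4)$ cycles, a function of the number of iterations rather than a constant.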

It is parametricity that allows one to analyse small code fragments without losing precision.
In the analysis of the code fragment we do not have to ignore the initial hardware state; rather, we can assume that we know exactly which state (or mode, as the WCET literature calls it) we are in.
Note that this approach also extends naturally to the early stages of program development.
Users may axiomatically attach costs to unimplemented sub-components, or modules, which may be refined as program development proceeds.
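A hypothetical sketch of this axiomatic style, in the ACSL notation used later in Figure~\ref{itest1} (the function \texttt{decode\_packet} and its 500-cycle bound are invented for illustration): an unimplemented module is given a postulated contract, which a compiler-computed bound can later replace:
\begin{lstlisting}
extern int __cost;  /* cycle counter maintained by the CerCo instrumentation */

/* The 500-cycle bound below is postulated, not computed; it can be replaced
   by a compiler-derived bound once the body of the module is implemented. */
/*@ behavior time_cost:
      ensures __cost <= \old(__cost) + 500;
*/
int decode_packet(char *buf, int len);
\end{lstlisting}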

The CerCo approach is an improvement on the state of the art.
By performing control and data flow analyses on the source code, the error-prone translation of invariants is avoided entirely.
Instead, this work is done at the source code level using tools of the user's choice.
Indeed, any available technique for the verification of functional properties can be easily reused, and multiple techniques can collaborate to infer and certify cost invariants for the program.
There are no limitations on the types of loops or data structures involved.
Parametric cost analysis becomes the default, with non-parametric bounds used only as a last resort when the user decides to trade the complexity of the analysis for more precision.

Note that we do not aim to supplant existing WCET analyses, but rather to augment and potentially embrace them as subcomponents of our own analysis.
\emph{A priori}, no technique previously used in traditional WCET analysis is made obsolete: processor timing analysis can be used by the compiler on the object code, and other techniques can be applied at the source code level.
In this work, we have instantiated the CerCo approach with a cost model for the MCS-51 hardware that our prototype compiler targets.
As this hardware is relatively simple, it admits a precise cost model, and this is reflected in our final theorem, which demonstrates that our high-level cost annotations are both accurate and precise.
On more modern hardware, exhibiting features such as caches, pipelines, and speculative execution, an exact cost model for time or space will be hard, if not impossible, to define.
Instead, existing WCET analyses could be used, and then lifted to high-level annotations using our non-standard compiler architecture.

We take it as an article of faith that software used to verify other software must itself be as bug-free as possible---preferably verified itself.
One way to achieve this is to reduce the trusted code base of any tools used.
Here, `trusted code base' refers to the corpus of code that must be trusted to believe that the analysis being carried out is accurate.
The trusted code base of state-of-the-art static analysis tools, such as WCET analysers, is significant.
These are sophisticated, complex pieces of software that are built around a control flow analyser, linear programming libraries, and detailed formal models of the hardware under analysis.
To trust a WCET tool, each of these subcomponents, along with their eventual combination, must be trusted.

In CerCo we move the control flow analysis to the source code level, and we introduce a non-standard compiler.
As this compiler architecture is completely novel, we wish to ensure that our implementation (and by extension, our entire approach to lifting cost models) is correct.
Therefore, to reduce the size of our trusted code base, we have implemented a prototype compiler and static analyser in an interactive theorem prover---Matita~\cite{matita,asperti:user:2007}, an implementation of the Calculus of Coinductive Constructions---which was used to certify that the cost annotations supplementing the program's source code are indeed an accurate reflection of those incurred by the hardware.
We have also implemented formal models of the hardware and of the high-level source language in the interactive theorem prover.

Control flow analysis on the source code is performed using invariant generators, tools that produce proof obligations from generated invariants, and automatic theorem provers that verify those obligations.
If these tools are able to generate proof traces that can be independently checked, the only remaining component that enters the trusted code base is an off-the-shelf invariant generator, which, in turn, can be proved correct using an interactive theorem prover.
With these methods, we enable the use of more off-the-shelf components (e.g. provers and invariant generators) whilst reducing the trusted code base at the same time.
\subsection{A proposed workflow}

\begin{figure}[!t]
\begin{tabular}{l@{\hspace{0.2cm}}|@{\hspace{0.2cm}}l}
\begin{lstlisting}
char a[] = {3, 2, 7, 14};
char threshold = 4;

int count(char *p, int len) {
  char j;
  int found = 0;
  for (j=0; j < len; j++) {
    if (*p <= threshold)
      found++;
    p++;
  }
  return found;
}

int main() {
  return count(a,4);
}
\end{lstlisting}
&
%  $\vcenter{\includegraphics[width=7.5cm]{interaction_diagram.pdf}}$
\begin{tikzpicture}[
    baseline={([yshift=-.5ex]current bounding box)},
    element/.style={draw, text width=1.6cm, on chain, text badly centered},
    >=stealth
    ]
{[start chain=going below, node distance=.5cm]
\node [element] (cerco) {CerCo\\compiler};
\node [element] (cost)  {CerCo\\cost plugin};
{[densely dashed]
\node [element] (ded)   {Deductive\\platform};
\node [element] (check) {Proof\\checker};
}
}
\coordinate [left=3.5cm of cerco] (left);
{[every node/.style={above, text width=3.5cm, text badly centered,
                     font=\scriptsize}]
\draw [<-] ([yshift=-1ex]cerco.north west) coordinate (t) --
    node {C source}
    (t-|left);
\draw [->] (cerco) -- (cost);
\draw [->] ([yshift=1ex]cerco.south west) coordinate (t) --
    node {C source+\color{red}{cost annotations}}
    (t-|left) coordinate (cerco out);
\draw [->] ([yshift=1ex]cost.south west) coordinate (t) --
    node {C source+\color{red}{cost annotations}\\+\color{blue}{synthesized assertions}}
    (t-|left) coordinate (out);
{[densely dashed]
\draw [<-] ([yshift=-1ex]ded.north west) coordinate (t) --
    node {C source+\color{red}{cost annotations}\\+\color{blue}{complexity assertions}}
    (t-|left) coordinate (ded in) .. controls +(-.5, 0) and +(-.5, 0) .. (out);
\draw [->] ([yshift=1ex]ded.south west) coordinate (t) --
    node {complexity obligations}
    (t-|left) coordinate (out);
\draw [<-] ([yshift=-1ex]check.north west) coordinate (t) --
    node {complexity proof}
    (t-|left) .. controls +(-.5, 0) and +(-.5, 0) .. (out);
\draw [dash phase=2.5pt] (cerco out) .. controls +(-1, 0) and +(-1, 0) ..
    (ded in);
}}
%% user:
% head
\draw (current bounding box.west) ++(-.2,.5)
    circle (.2) ++(0,-.2) -- ++(0,-.1) coordinate (t);
% arms
\draw (t) -- +(-.3,-.2);
\draw (t) -- +(.3,-.2);
% body
\draw (t) -- ++(0,-.4) coordinate (t);
% legs
\draw (t) -- +(-.2,-.6);
\draw (t) -- +(.2,-.6);
\end{tikzpicture}
\end{tabular}
\caption{On the left: C code to count the number of elements in an array
 that are less than or equal to a given threshold. On the right: CerCo's interaction
 diagram. Components provided by CerCo are drawn with a solid border.}
\label{test1}
\end{figure}

\fvset{commandchars=\\\{\}}
\lstset{morecomment=[s][\color{blue}]{/*@}{*/},
        moredelim=[is][\color{blue}]{$}{$},
        moredelim=[is][\color{red}]{|}{|},
        lineskip=-2pt}
\begin{figure}[!p]
\begin{lstlisting}
|int __cost = 33, __stack = 5, __stack_max = 5;|
|void __cost_incr(int incr) { __cost += incr; }|
|void __stack_incr(int incr) {
  __stack += incr;
  __stack_max = __stack_max < __stack ? __stack : __stack_max;
}|

char a[4] = {3, 2, 7, 14};  char threshold = 4;

/*@ behavior stack_cost:
      ensures __stack_max <= __max(\old(__stack_max), 4+\old(__stack));
      ensures __stack == \old(__stack);
    behavior time_cost:
      ensures __cost <= \old(__cost)+(178+214*__max(1+\at(len,Pre), 0));
*/
int count(char *p, int len) {
  char j;  int found = 0;
  |__stack_incr(4);  __cost_incr(111);|
  $__l: /* internal */$
  /*@ for time_cost: loop invariant
        __cost <= \at(__cost,__l)+
                  214*(__max(\at((len-j)+1,__l), 0)-__max(1+(len-j), 0));
      for stack_cost: loop invariant
        __stack_max == \at(__stack_max,__l);
      for stack_cost: loop invariant
        __stack == \at(__stack,__l);
      loop variant len-j;
  */
  for (j = 0; j < len; j++) {
    |__cost_incr(78);|
    if (*p <= threshold) { |__cost_incr(136);| found++; }
    else { |__cost_incr(114);| }
    p++;
  }
  |__cost_incr(67);  __stack_incr(-4);|
  return found;
}

/*@ behavior stack_cost:
      ensures __stack_max <= __max(\old(__stack_max), 6+\old(__stack));
      ensures __stack == \old(__stack);
    behavior time_cost:
      ensures __cost <= \old(__cost)+1358;
*/
int main(void) {
  int t;
  |__stack_incr(2);  __cost_incr(110);|
  t = count(a,4);
  |__stack_incr(-2);|
  return t;
}
\end{lstlisting}
\caption{The C program in Figure~\ref{test1} embellished with instrumentation added by the CerCo compiler in red and cost invariants added by the CerCo Frama-C plugin in blue.
The \lstinline'__cost', \lstinline'__stack' and \lstinline'__stack_max' global variables hold the elapsed time in clock cycles and the current and maximum stack usage.
Their initial values hold the clock cycles spent initialising global data in the preamble before calling \lstinline'main' and the space required by global data (and thus unavailable for the stack).
}
\label{itest1}
\end{figure}

How should a Software Engineer use the CerCo toolset in anger?
We provide an illustrative C program on the left of Figure~\ref{test1}, along with a prospective workflow making use of our toolset on the right of the same figure.
An engineer writes their C program and feeds it as input to the CerCo compiler, which outputs an instrumented version of the same program that updates fresh global variables recording the elapsed execution time and the stack space usage.
Figure~\ref{itest1} demonstrates one of the two outputs of the CerCo C compiler---an embellished version of the original C source code, with cost annotations automatically inserted (the other output being an object code file, not included here).
The red lines in Figure~\ref{itest1}, introducing variables, functions, and function calls starting with \lstinline'__cost' and \lstinline'__stack', are the instrumentation introduced by the compiler.
For example, the two calls at the start of \lstinline'count' say that 4 bytes of stack are required, and that it takes 111 cycles to reach the next cost annotation (in the loop body).
The compiler establishes these costs from the labelled object code that it generates, and lifts them back through the compilation chain, where they are finally reflected as cost annotations at the source level.

The annotated program can then be enriched with complexity assertions in the style of Hoare logic, which are passed to a deductive platform (in our case Frama-C~\cite{framac}).
We have written a simple automatic synthesiser for complexity assertions as a plugin for Frama-C.
The behaviour of this plugin can be overridden by the user to increase or decrease accuracy.
The synthesised assertions are the blue comments starting with \lstinline'/*@' in Figure~\ref{itest1}, written in Frama-C's specification language, ACSL~\cite{acsl}.
From these assertions, a general purpose deductive platform produces proof obligations which in turn can be closed by automatic or interactive provers, ending in a proof certificate.

% NB: if you try this example with the live CD you should increase the timeout

Twelve proof obligations are generated from Figure~\ref{itest1}.
These obligations include proving that the loop invariant holds after one iteration if it held before, that the whole program execution takes at most 1358 cycles, and so on.
Note that the synthesised time bound for \lstinline'count'---$178+214*(1+\text{\lstinline'len'})$ cycles---is parametric in the length of the array.
The CVC3 SMT solver~\cite{DBLP:conf/cav/BarrettT07} closes all obligations within 30 seconds on commodity hardware.
A simpler, non-parametric version can be solved in seconds.
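As a quick sanity check of these constants, one can recover the bound for \lstinline'main' from the contract of \lstinline'count' in Figure~\ref{itest1}: instantiating the time bound at $\text{\lstinline'len'} = 4$ gives $178 + 214 \times (1 + 4) = 1248$ cycles, and adding the $110$ cycles charged by the \lstinline'__cost_incr(110)' call in \lstinline'main' before \lstinline'count' is invoked yields exactly the $1358$-cycle bound in the contract of \lstinline'main'.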

\subsection{A brief introduction to Matita}

We now briefly introduce Matita, the interactive theorem prover that was used in CerCo to certify all proofs.

Matita is an interactive theorem proving system based on a variant of the Calculus of Coinductive Constructions~\cite{matita,asperti:user:2007}.
The system features (co)inductive families of types, a system of coercions, a tactic-driven proof construction engine~\cite{sacerdoti-coen:tinycals:2007}, and paramodulation-based automation~\cite{asperti:higher-order:2007}, all of which we exploit in the formalisation described herein.

Matita's syntax is similar to that of mainstream functional programming languages such as OCaml or Standard ML.
The type theory that Matita implements is broadly akin to that of Coq~\cite{coq:2004} and Agda~\cite{bove:brief:2009}.
Nevertheless, we provide a brief explanation of the main syntactic and type-theoretic features of Matita that will be needed to follow the body of the paper:
\begin{itemize}
\item
Non-recursive functions and definitions are introduced via the \texttt{definition} keyword.
Recursive functions are introduced with \texttt{let rec}.
Mutually recursive functions are separated via the \texttt{and} keyword.
Matita's termination checker ensures that all recursive functions are terminating before they are admitted into the global environment, to maintain soundness.
\item
Matita has an infinite hierarchy of type universes.
A single impredicative universe of types, \texttt{Prop}, sits at the base of this hierarchy.
An unbounded series of predicative universes, \texttt{Type[0]}, \texttt{Type[1]}, \texttt{Type[2]}, and so on, sits atop \texttt{Prop}.
One peculiarity of Matita is its user-customisable hierarchy of universes.
In our development, we work in a context where \texttt{Prop} is contained in \texttt{Type[0]}, \texttt{Type[0]} is contained in \texttt{Type[1]}, and so on.
Matita, unlike Coq or Agda, implements no form of typical ambiguity or universe polymorphism; explicit concrete universe levels are used instead.
\item
Matita's type theory plays host to a rich and expressive higher-order logic.
Constants \texttt{True} and \texttt{False} represent truth and falsity in \texttt{Prop} respectively.
Two inductive families in \texttt{Prop} encode conjunction and disjunction---$\mathtt{P \wedge Q}$ and $\mathtt{P \vee Q}$ respectively.
As is usual, implication and universal quantification are identified with the dependent function space ($\Pi$ types), whereas (constructive) existential quantification is encoded as a dependent sum (a $\Sigma$-type).
We write $\All{x : \phi}\psi$ for the dependent function space, and abbreviate this as $\phi \rightarrow \psi$ when $x \not\in \mathrm{fv}(\psi)$ as usual.
We use $\langle M,N \rangle$ for the pairing of $M$ and $N$.
\item
Inductive and coinductive families are introduced via the \texttt{inductive} and \texttt{coinductive} keywords respectively, with named constructor declarations separated by a bar.
Mutually inductive data family declarations are separated via \texttt{with}.
In the following declaration:
\begin{lstlisting}[language=Grafite]
inductive I ($P_1$ : $\tau_1$) $\ldots$ ($P_n$ : $\tau_n$) : $\phi_1 \rightarrow \ldots \rightarrow \phi_m \rightarrow \phi$ := $\ldots$
\end{lstlisting}
we call $P_i$ for $1 \leq i \leq n$ the \textbf{parameters} of \texttt{I} and $\phi_j$ for $1 \leq j \leq m$ the \textbf{indices} of \texttt{I}.
Matita's positivity checker ensures that constructors have strictly-positive types before a (co)inductive family of types is admitted into the global environment, to maintain soundness.
\item
Records are introduced with the \texttt{record} keyword.
A Matita record
\begin{lstlisting}[language=Grafite]
record R : Type[0] := { F1 : nat }.
\end{lstlisting}
may be thought of as syntactic sugar for a single-constructor inductive type of the same name:
\begin{lstlisting}[language=Grafite]
inductive R : Type[0] :=
  | mk_R : nat -> R.
\end{lstlisting}
Types of later fields may depend on fields declared earlier in the record.
Records may be decomposed with projections.
Projections, one for each field of a record, are registered in the global context.
In the example record above, \texttt{F1} of type $R \rightarrow nat$ is registered as a field projection and \texttt{mk\_R} of type $nat \rightarrow R$ is registered as a constructor.
Record fields may also be marked as coercions.
In the following example
\begin{lstlisting}[language=Grafite]
record S : Type[1] :=
{
  Carrier :> Type[0];
  op : Carrier -> Carrier -> Carrier
}
\end{lstlisting}
the field \texttt{Carrier} is declared to be a coercion with `\texttt{:>}', the operational effect being that the field projection \texttt{Carrier} may be omitted where it can be successfully inferred by Matita.
Field coercions facilitate the informal but common mathematical practice of intentionally confusing a structure with its underlying carrier set.
\item
Terms may be freely omitted, allowing the user to write down partial types and terms.
A question mark, \texttt{?}, denotes a single term that has been omitted by the user.
Some omitted terms can be deduced by Matita's refinement system~\cite{DBLP:journals/corr/abs-1202-4905}.
Other, more complex goals arising from omitted terms may require user input to solve, in which case a proof obligation is opened for each term that cannot be deduced automatically.
Three consecutive dots, \texttt{$\ldots$}, denote a vector of terms or types that have been omitted, and need to be inferred.
\item
(Co)data may be decomposed by pattern matching with a \texttt{match} expression.
We may fully annotate a \texttt{match} expression with its return type.
This is especially useful when working with indexed families of types or with invariants, expressed as types, on functions.
In the following
\begin{lstlisting}[language=Grafite]
match t return $\lam{x}x = 0 \rightarrow bool$ with
[ 0    $\Rightarrow$ $\lam{prf_1}P_1$
| S m $\Rightarrow$ $\lam{prf_2}P_2$
] (refl $\ldots$ t)
\end{lstlisting}
the \texttt{0} branch of the \texttt{match} expression returns a function from $0 = 0$ to \texttt{bool}, whereas the \texttt{S m} branch returns a function from \texttt{S m = 0} to \texttt{bool}.
In both cases the annotated return type $\lam{x}x = 0 \rightarrow bool$ has been specialised given the new information about \texttt{t} revealed by the act of pattern matching.
The entire term, with the \texttt{match} expression applied to \texttt{refl $\ldots$ t}, has type \texttt{bool}.
\item
Matita features a liberal system of coercions (distinct from the record field coercions mentioned previously).
It is possible to define a uniform coercion $\lam{x}\langle x, ?\rangle$ from every type $T$ to the dependent product $\Sigma{x : T}. P x$.
The coercion opens a proof obligation that asks the user to prove that $P$ holds for $x$.
When a coercion is to be applied to a complex term (for example, a $\lambda$-abstraction, a local definition, or a case analysis), the system automatically propagates the coercion to the sub-terms.
For instance, to apply a coercion to force $\lam{x}M : A \rightarrow B$ to have type $\All{x : A}\Sigma{y : B}. P x y$, the system looks for a coercion from $M : B$ to $\Sigma{y : B}. P x y$ in a context augmented with $x : A$.
This is significant when the coercion opens a proof obligation, as the user will be presented with multiple, but simpler, proof obligations in the correct context.
In this way, Matita supports the `Russell' proof methodology developed by Sozeau~\cite{sozeau:subset:2007} in a lightweight but tightly-integrated manner.
\end{itemize}
Throughout, for reasons of clarity, conciseness, and readability, we may choose to simplify or omit parts of Matita code.
We will always ensure that these omissions do not mislead the reader.

\subsection{Map of the paper}

The rest of the paper is structured as follows:

In Section~\ref{sect.compiler.architecture}, we describe the architecture of the CerCo compiler.
We describe the compiler's intermediate languages, from CerCo C down to MCS-51 machine code, and their formal semantics.
We also describe the optimisations that the CerCo verified C compiler implements.

In Section~\ref{sect.compiler.proof}, we describe the proof of correctness of the compiler in more detail.
We explain \emph{structured traces}, a technical device developed to complete the proof, and our labelling approach.
We further discuss the verification of our translation and optimisation passes for the intermediate languages, and discuss our verified assembler.

In Section~\ref{sect.formal.development}, we present statistical data on the formal development, including line counts, and discuss the axioms remaining in our proof of the compiler's correctness.
We also briefly discuss `lessons learnt' in the formal development of the CerCo verified C compiler.

In Section~\ref{sect.framac.plugin}, we discuss the Frama-C plugin for certifying the concrete complexity of programs written in CerCo C.
We also present a case study, demonstrating the plugin in action.

Finally, in Section~\ref{sect.conclusions} we conclude, discussing related and future work.