\newcommand{\All}[1]{\forall{#1}.\;}
\newcommand{\lam}[1]{\lambda{#1}.\;}

% Introduction
% Problem being solved, background, etc.
% Current state of the art (and problem with it)
% The `CerCo approach' (tm)
% Brief Matita overview
% Map of paper

\section{Introduction}
\label{sect.introduction}

Programs are specified using both functional and non-functional constraints.
Functional constraints dictate what tasks a program must do; non-functional constraints limit the resources the program may consume whilst completing those tasks.

Depending on the application domain, non-functional constraints are as important as functional constraints when specifying a program.
Examples fitting this pattern include real-time systems, which place hard limits on application response times; implementations of cryptographic primitives, which must be hardened against timing side-channel attacks; and a heart pacemaker's embedded controller, which must fit inside a limited code memory space.
A cryptography library susceptible to a timing side-channel attack is not a mere annoyance---it is an implementation error that undermines the entire purpose of the library.

A program's non-functional constraints may be given \emph{concretely} or \emph{asymptotically}.
Asymptotic complexity, as every Computer Science undergraduate knows, is important---but so too is concrete complexity for many applications, including the three examples highlighted above.
A real-time system's response time is measured in seconds, milliseconds, or some other fundamental unit of time; a cryptographic library must have all execution paths execute in the same number of processor cycles, independent of any input passed by the client; and the size of an embedded controller for a pacemaker is measured in bits, bytes, or some other unit of computer memory capacity.
In all cases, resource consumption is measured, and specified, in some concrete unit of measure.

Currently, a program's functional properties can be established by combining user annotations---preconditions, invariants, and so on---with various automated and semi-automated analyses---invariant generators, type systems, abstract interpretations, applications of theorem proving, and so on---on the program's high-level source code.
Functional properties are therefore established by reasoning about the source code that the application programmer actually sees and writes.
Further, the results of any analysis can be communicated to the programmer in terms of the abstractions, control flow, and overall system design with which they are familiar.

By contrast, a program's concrete, non-functional properties are established by reasoning about low-level object code produced not by a programmer, but by a compiler.
Whilst analyses operating at this level can and do produce very accurate results---Worst Case Execution Time (WCET) analysis can be extraordinarily accurate, for example---analysis at such a low level of abstraction invariably has disadvantages:
\begin{itemize}
\item
It can be hard to deduce the high-level structure of the program after compiler optimisations.
The object code produced by an optimising compiler may have a radically different control flow to the original source code program.
\item
Object code analysis is unstable.
Modern compilers and linkers are highly non-compositional: they implement a variety of sophisticated optimisation passes, and use an abundance of intra-procedural, inter-procedural, module-level, and link-time analyses to direct these optimisations.
As a result, small changes in high-level source code can produce radically different object code outputs from a compiler, affecting any analysis tied to that object code.
\item
It is well understood by software engineers that problems with the design or implementation of a program are cheaper to resolve early in the development process. %cite
Despite this, techniques that operate on object code are not useful early in a program's development, when the program may be incomplete, with missing functionality or missing modules.
\item
Parametric cost analysis is very hard: how can we translate a cost that depends on the execution state, for example the value of a register or a carry bit, into a cost that the user can understand whilst looking at the source code?
\item
Performing analyses on object code makes it hard for the programmer to provide information about the program and its expected execution, leading to a loss of precision in the resulting analysis.
It is also hard for the programmer to understand the results of the analysis, or to direct it, as the high-level abstractions, control flow constructs, and so on, introduced by the programmer are `translated away'.
\end{itemize}

More desirable would be a high-level analysis of concrete, non-functional properties, coupled with the accuracy one would expect of e.g. existing WCET analysers.
This would lead to a reconciliation of functional and non-functional analyses.
Information could be shared between the two analyses, and both could be performed concurrently on the same high-level source code.

What has previously prevented high-level reasoning about non-functional properties is the lack of a uniform and precise cost model for programs written in high-level programming languages such as C.
As discussed above, modern compilers may compile each expression and statement occurrence in radically different ways, optimisations may change control flow, and the cost of an object code instruction may depend on the runtime state of hardware components like pipelines and caches, none of which are visible in the source code.
It has therefore not been clear how such a cost model could even be defined.

In this paper, we report on the scientific and technical contributions of the Certified Complexity (`CerCo') project, an EU-funded collaboration between the Universities of Bologna, Edinburgh, and Paris 7.

We first present a uniform, precise cost model for a large fragment of the C programming language---\emph{CerCo C}, which is similar to early versions of \emph{CompCert C}.
To define this cost model, we have developed a technique that we have dubbed \emph{the labelling approach}~\cite{labelling}: a way of implementing compilers that induce cost models on source programs through very lightweight tracking of code changes through the compilation pipeline.

To establish that our approach works, and scales beyond `toy' languages to more realistic programming languages, we have implemented a compiler using this technique.
The \emph{CerCo verified C compiler} compiles the CerCo C language fragment to object binaries for the MCS-51 8-bit embedded microcontroller.
This is a well-known and still popular processor in the embedded systems community.
Our compiler lifts the cost model induced on the machine code that it produces back through the compilation chain to the source level.
At the source level, this cost model is made manifest as parametric cost annotations, inserted into the original source code of the program being compiled.
These annotations may be used by the programmer to predict concrete execution time and stack space usage.
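For illustration, an instrumented program might take the following form (the counter name \texttt{\_\_cost} and the constants are invented for exposition; in reality the compiler computes the values from the target's timing model):
\begin{lstlisting}[language=C]
int __cost = 0;        /* clock cycles consumed so far */

int sum (int *a, int n) {
  int i, s = 0;
  __cost += 23;        /* cost of function entry and initialisation */
  for (i = 0; i < n; i++) {
    __cost += 47;      /* cost of one loop iteration */
    s += a[i];
  }
  __cost += 15;        /* cost of the return path */
  return s;
}
\end{lstlisting}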

As we are targeting an embedded microcontroller, we have not considered dynamic memory allocation.
However, we believe that our labelling approach is general enough to handle resources other than execution time and stack space usage, including dynamic memory allocation.

To demonstrate source-level verification of costs, we have implemented a Frama-C plugin~\cite{framac} that invokes our compiler on a source program and uses it to generate invariants on the high-level source that correctly model low-level costs.
The plugin certifies that the program respects these costs by calling automated theorem provers, a novel technique in the field of static analysis for non-functional properties.
Using this plugin, we have conducted case studies, including showing that the plugin can automatically compute and certify the exact reaction time of Lustre~\cite{lustre} data flow programs compiled into C.
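For illustration, the cost invariants generated by such a plugin might resemble the following ACSL contract (a sketch only; the names and constants here are invented, and the precise shape of the annotations the plugin emits differs):
\begin{lstlisting}[language=C]
extern int __cost;  /* hypothetical global cycle counter */

/*@ requires n >= 0;
    ensures __cost - \old(__cost) <= 97 + 47 * n; */
void wait (int n) {
  int i;
  for (i = 0; i < n; i++) {
    __cost += 47;   /* cost of one loop iteration */
  }
  __cost += 97;     /* cost of entry, exit, and loop overhead */
}
\end{lstlisting}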

The compiler is verified using Matita, an interactive theorem prover developed at the University of Bologna that implements a variant of the Calculus of Coinductive Constructions.

\paragraph{Vision.}

We envision a new generation of compilers that track program structure through compilation and optimisation and exploit this information to define a precise, non-uniform cost model for source code that accounts for runtime state.
With such a cost model we can perform non-functional verification in a similar way to functional verification and exploit the state of the art in automated high-level verification~\cite{survey}.

The techniques currently used by the Worst Case Execution Time (WCET) community, which perform analysis on object code, remain applicable, but can be coupled with additional source-level analysis.
In cases where our approach produces overly complex cost models, safe approximations can be used to trade precision for simplicity.

Finally, source code analysis can be used early in the development process, when components have been specified but not yet implemented, as modularity means that it is enough to specify the non-functional behaviour of the missing components.

\subsection{Project context and approach}

Formal methods for verifying functional properties of programs have now reached such a level of maturity and automation that their adoption is slowly increasing in production environments.

For safety-critical code, it is becoming commonplace to combine rigorous software engineering methodologies and testing with static analyses, taking the strengths of each and mitigating their weaknesses.
Of particular interest are open frameworks for the combination of different formal methods, where programs can be progressively specified and enriched with new safety guarantees: every method contributes knowledge (e.g. new invariants) that can be built upon by other analysis methods.

The outlook for verification of non-functional properties of programs (time spent, memory used, energy consumed) is bleaker.
In most cases, verifying that real-time systems meet their deadlines is done simply by performing many runs of the system and timing their execution, computing the maximum time, adding an empirical safety margin, and claiming the result to be a bound on the WCET of the program.

Formal methods and software to statically analyse the WCET of programs exist, but they often produce bounds that are too pessimistic to be useful.
Recent advances in hardware architecture have focused on improving average case performance, not predictability of the worst case.

Execution time is becoming increasingly dependent on execution history and the internal state of hardware components like pipelines and caches.
Multi-core processors and non-uniform memory models are drastically reducing the possibility of performing static analysis in isolation, because programs are less and less time-composable.
Clock-precise hardware models are necessary for static analysis, and obtaining them is becoming harder due to the increased sophistication of hardware design.

The need for reliable real-time systems and programs is increasing, and there is pressure from the research community for the introduction of hardware with more predictable behaviour, which would be more suitable for static analysis.
One example, being investigated by the Proartis project~\cite{proartis}, is to decouple execution time from execution history by introducing randomisation.

In CerCo~\cite{cerco} we do not address this problem, optimistically assuming that improvements in low-level timing analysis or architecture will make verification feasible in the longer term.

Instead, the main objective of our work is to bring together static analysis of functional and non-functional properties, which in the current state of the art are independent activities with limited exchange of information: while the functional properties are verified on the source code, the analysis of non-functional properties is performed on object code to exploit clock-precise hardware models.

\subsection{Current object code methods}

Analysis currently takes place on object code for two main reasons.

| 153 | Firstly, there cannot be a uniform, precise cost model for source code |
---|
| 154 | instructions (or even basic blocks). During compilation, high level |
---|
[3613] | 155 | instructions are broken up and reassembled in context-specific ways so that |
---|
| 156 | identifying a fragment of object code and a single high level instruction is |
---|
[3616] | 157 | unfeasible. |
---|
[3613] | 158 | |
---|
Additionally, the control flow of the object and source code can be very different as a result of optimisations.
For example, aggressive loop optimisations can completely transform source level loops.

Despite the lack of a uniform, compilation- and program-independent cost model on the source language, research on the analysis of the non-asymptotic execution time of programs in high-level languages, assuming such a model, is growing and gaining momentum.

Unless such cost models are developed, however, the future practical impact of this research looks to be minimal.
One existing approach is the EmBounded project~\cite{embounded}, which compositionally compiles high-level code to a byte code that is executed by an interpreter with guarantees on the maximal execution time spent for each byte code instruction.
This provides a model that is uniform, though at the expense of precision (each cost is a pessimistic upper bound) and of the performance of the executed code (the byte code is interpreted compositionally).

The second reason to perform analysis on the object code is that bounding the worst case execution time of small code fragments in isolation (e.g. loop bodies) and then adding up the bounds yields very poor estimates, as no knowledge of the hardware state prior to executing the fragment can be assumed.

By analysing longer runs, the bound obtained becomes more precise, because the lack of information about the initial state has a relatively small impact.

To calculate the cost of an execution, value and control flow analyses are required to bound the number of times each basic block is executed.
Currently, state-of-the-art WCET analysis tools, such as AbsInt's aiT toolset~\cite{absint}, perform these analyses on object code, where the logic of the program is harder to reconstruct and most information available at the source code level has been lost; see~\cite{stateart} for a survey.

Imprecision in the analysis can lead to useless bounds.
To improve precision, current tools ask the user to provide constraints on the object code control flow, usually in the form of bounds on the number of iterations of loops, or linear inequalities on them.
This requires the user to manually link the source and object code, translating their assumptions on the source code (which may be wrong) into object code constraints.
This task is hard and error-prone, especially in the presence of complex compiler optimisations.

Traditional WCET techniques that work on object code are also affected by another problem: they cannot be applied before the object code is generated.
Functional properties can be analysed in the early stages of development, while analysis of non-functional properties may come too late to avoid expensive changes to the program architecture.

\subsection{The CerCo approach}

In CerCo we propose a radically new approach to the problem: we reject the idea of a uniform cost model and instead propose that the compiler, which knows how the code is translated, must return the cost model for basic blocks of high-level instructions.
It must do so by keeping track of control flow modifications, in order to reverse them, and by interfacing with processor timing analysis.

By embracing compilation, instead of avoiding it as EmBounded did, a CerCo compiler can both produce efficient code and return costs that are as precise as the processor timing analysis can be.
Moreover, our costs can be parametric: the cost of a block can depend on actual program data, on a summary of the execution history, or on an approximated representation of the hardware state.

For example, loop optimisations may assign a cost to a loop body that is a function of the number of iterations performed.
As another example, the cost of a block may be a function of the vector of stalled pipeline states, which can be exposed in the source code and updated at each basic block exit.
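As a sketch of the first possibility (again, all names and constants are invented for illustration), a loop unrolled by a factor of two might induce a source-level cost update that is parametric in the trip count \texttt{n}:
\begin{lstlisting}[language=C]
extern int __cost;        /* hypothetical global cycle counter */

void unrolled_loop_cost (int n) {
  __cost += 35 * (n / 2)  /* iterations executed two at a time */
          + 19 * (n % 2)  /* one leftover iteration when n is odd */
          + 12;           /* loop setup and exit */
}
\end{lstlisting}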

It is parametricity that allows one to analyse small code fragments without losing precision.
In the analysis of the code fragment, we do not have to ignore the initial hardware state; rather, we can assume that we know exactly which state (or mode, as the WCET literature calls it) we are in.

The CerCo approach has the potential to dramatically improve the state of the art.
By performing control and data flow analyses on the source code, the error-prone translation of invariants is avoided entirely.
Instead, this work is done at the source code level using tools of the user's choice.

Any available technique for the verification of functional properties can easily be reused, and multiple techniques can collaborate to infer and certify cost invariants for the program.
There are no limitations on the types of loops or data structures involved.

Parametric cost analysis becomes the default, with non-parametric bounds used only as a last resort when the user decides to trade precision for a simpler analysis.

\emph{A priori}, no technique previously used in traditional WCET analysis is rendered obsolete: processor timing analysis can be used by the compiler on the object code, and other techniques can be applied at the source code level.

Our approach also works in the early stages of development by allowing the user to axiomatically attach costs to unimplemented components.

Software used to verify properties of programs must be as bug-free as possible.
The trusted code base for verification consists of the code that needs to be trusted in order to believe that the property holds.

The trusted code base of state-of-the-art WCET tools is very large: one needs to trust the control flow analyser, the linear programming libraries used, and also the formal models of the hardware under analysis.

In CerCo we move the control flow analysis to the source code level, and we introduce a non-standard compiler.
To reduce the size of the trusted code base, we have implemented a prototype compiler and static analyser in an interactive theorem prover, which was used to certify that the costs added to the source code are indeed those incurred by the hardware.
We have also implemented formal models of the hardware and of the high-level source language in the interactive theorem prover.

Control flow analysis on the source code has been implemented using invariant generators, tools to produce proof obligations from generated invariants, and automatic theorem provers to verify the obligations.
If these tools are able to generate proof traces that can be independently checked, the only remaining component that enters the trusted code base is an off-the-shelf invariant generator which, in turn, can be proved correct using an interactive theorem prover.

With these methods, we achieve the objective of allowing the use of more off-the-shelf components (e.g. provers and invariant generators) whilst reducing the trusted code base at the same time.

\subsection{Introduction to Matita}

Matita is a theorem prover based on a variant of the Calculus of Coinductive Constructions~\cite{asperti:user:2007}.
The system features a full spectrum of dependent types and (co)inductive families, a system of coercions, a tactic-driven proof construction engine~\cite{sacerdoti-coen:tinycals:2007}, and paramodulation-based automation~\cite{asperti:higher-order:2007}, all of which we exploit in the formalisation described herein.

Matita's syntax is similar to that of mainstream functional programming languages such as OCaml or Standard ML.
The type theory that Matita implements is broadly akin to that of Coq~\cite{coq:2004} and Agda~\cite{bove:brief:2009}.
Nevertheless, we provide a brief explanation of the main syntactic and type-theoretic features of Matita that will be needed to follow the body of the paper:
\begin{itemize}
\item
Non-recursive functions and definitions are introduced via the \texttt{definition} keyword, whilst recursive functions are introduced with \texttt{let rec}.
Mutually recursive functions are separated via the \texttt{and} keyword, as in the sketch below.
Matita's termination checker ensures that all recursive functions are terminating before they are admitted, in order to maintain the soundness of the system's logic.
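For example, a pair of mutually recursive Boolean tests on natural numbers may be declared as follows (an illustrative fragment, not drawn from our formalisation):
\begin{lstlisting}[language=Grafite]
let rec even (n : nat) : bool :=
  match n with [ O $\Rightarrow$ true | S m $\Rightarrow$ odd m ]
and odd (n : nat) : bool :=
  match n with [ O $\Rightarrow$ false | S m $\Rightarrow$ even m ].
\end{lstlisting}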
\item
Matita has an infinite hierarchy of type universes.
A single impredicative universe of types, \texttt{Prop}, sits at the base of this hierarchy.
An infinite series of predicative universes, \texttt{Type[0]} : \texttt{Type[1]} : \texttt{Type[2]}, and so on, sits above \texttt{Prop}.
Unlike Coq or Agda, Matita implements no form of typical ambiguity or universe polymorphism; explicit, concrete universe levels are used instead.
\item
Matita's type theory plays host to a rich and expressive higher-order logic.
The constants \texttt{True} and \texttt{False} represent truth and falsity in \texttt{Prop}, respectively.
Two inductive families in \texttt{Prop} encode conjunction and disjunction---$\mathtt{P \wedge Q}$ and $\mathtt{P \vee Q}$, respectively.

As is usual, implication and universal quantification are identified with the dependent function space ($\Pi$-types), whereas (constructive) existential quantification is encoded as a dependent sum (a $\Sigma$-type).
We write $\All{x : \phi}\psi$ for the dependent function space, and abbreviate this as $\phi \rightarrow \psi$ when $x \not\in fv(\psi)$.
We use $\langle M,N \rangle$ for the pairing of $M$ and $N$.
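By way of illustration (a hypothetical fragment, assuming the standard constructor \texttt{conj} for conjunction), commutativity of conjunction may be witnessed directly by a term:
\begin{lstlisting}[language=Grafite]
definition and_comm : $\forall$P, Q : Prop. P $\wedge$ Q $\rightarrow$ Q $\wedge$ P :=
  $\lambda$P, Q, H. match H with [ conj p q $\Rightarrow$ conj Q P q p ].
\end{lstlisting}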
\item
Inductive and coinductive families are introduced via the \texttt{inductive} and \texttt{coinductive} keywords respectively, with named constructor declarations separated by a bar.
Mutually inductive data family declarations are separated via \texttt{with}.
In the following declaration:
\begin{lstlisting}[language=Grafite]
inductive I ($P_1$ : $\tau_1$) $\ldots$ ($P_n$ : $\tau_n$) : $\phi_1 \rightarrow \ldots \rightarrow \phi_m \rightarrow \phi$ := $\ldots$
\end{lstlisting}
we call $P_i$ for $1 \leq i \leq n$ the \textbf{parameters} of \texttt{I} and $\phi_j$ for $1 \leq j \leq m$ the \textbf{indices} of \texttt{I}.
Matita's positivity checker ensures that constructors have strictly-positive types before an inductive family is admitted, in order to maintain the soundness of the system's logic.
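For instance, the following declaration (a standard example, with illustrative names) introduces fixed-length vectors, where the element type \texttt{A} is a parameter and the length is an index:
\begin{lstlisting}[language=Grafite]
inductive Vector (A : Type[0]) : nat $\rightarrow$ Type[0] :=
| VEmpty : Vector A O
| VCons : $\forall$n : nat. A $\rightarrow$ Vector A n $\rightarrow$ Vector A (S n).
\end{lstlisting}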
\item
Records are introduced with the \texttt{record} keyword.
A Matita record
\begin{lstlisting}[language=Grafite]
record R : Type[0] := { F1 : nat }.
\end{lstlisting}
may be thought of as syntactic sugar for a single-constructor inductive data type of the same name:
\begin{lstlisting}[language=Grafite]
inductive R : Type[0] :=
| mk_R : nat -> R.
\end{lstlisting}
A record field's type may depend on fields declared earlier in the record.

Records may be decomposed with projections.
Projections, one for each field of a record, are registered in the global context.
In the example record above, \texttt{F1} of type $R \rightarrow nat$ is registered as a field projection and \texttt{mk\_R} of type $nat \rightarrow R$ is registered as a constructor.

Record fields may also be marked as coercions.
In the following example
\begin{lstlisting}[language=Grafite]
record S : Type[1] :=
{
  Carrier :> Type[0];
  op : Carrier -> Carrier -> Carrier
}
\end{lstlisting}
the field \texttt{Carrier} is declared to be a coercion with `\texttt{:>}', the operational effect being that the field projection \texttt{Carrier} may be omitted wherever it can be successfully inferred by Matita.
Field coercions facilitate the informal but common mathematical practice of intentionally confusing a structure with its underlying carrier set.
\item
Terms may be freely omitted, allowing the user to write down partial types and terms.
A question mark, \texttt{?}, denotes a single term that has been omitted by the user.
Some omitted terms can be deduced by Matita's refinement system.
Other, more complex goals arising from omitted terms may require user input to solve, in which case a proof obligation is opened for each term that cannot be deduced automatically.
Three consecutive dots, \texttt{$\ldots$}, denote multiple terms or types that have been omitted.
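For instance, in the hypothetical fragment below, the question mark stands for the omitted type argument of \texttt{refl}, which the refinement system deduces to be \texttt{nat}:
\begin{lstlisting}[language=Grafite]
definition two_eq_two : 2 = 2 := refl ? 2.
\end{lstlisting}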
\item
Data may be decomposed by pattern matching with a \texttt{match} expression.
We may fully annotate a \texttt{match} expression with its return type.
This is especially useful when working with indexed families of types, or with invariants, expressed as types, on functions.
In the following
\begin{lstlisting}[language=Grafite]
match t return $\lam{x}x = 0 \rightarrow bool$ with
[ 0 $\Rightarrow$ $\lam{prf_1}P_1$
| S m $\Rightarrow$ $\lam{prf_2}P_2$
] (refl $\ldots$ t)
\end{lstlisting}
the \texttt{0} branch of the \texttt{match} expression returns a function from $0 = 0$ to \texttt{bool}, whereas the \texttt{S m} branch returns a function from \texttt{S m = 0} to \texttt{bool}.
In both cases the annotated return type $\lam{x}x = 0 \rightarrow bool$ has been specialised with the new information about \texttt{t} revealed by the act of pattern matching.
The entire term, with the \texttt{match} expression applied to \texttt{refl $\ldots$ t}, has type \texttt{bool}.
\item
Matita features a liberal system of coercions (distinct from the record field coercions mentioned above).
It is possible to define a uniform coercion $\lam{x}\langle x, ?\rangle$ from every type $T$ to the dependent sum $\Sigma{x : T}. P x$.
The coercion opens a proof obligation that asks the user to prove that $P$ holds for $x$.
When a coercion is to be applied to a complex term (for example, a $\lambda$-abstraction, a local definition, or a case analysis), the system automatically propagates the coercion to the sub-terms.
For instance, to apply a coercion to force $\lam{x}M : A \rightarrow B$ to have type $\All{x : A}\Sigma{y : B}. P x y$, the system looks for a coercion from $M : B$ to $\Sigma{y : B}. P x y$ in a context augmented with $x : A$.
This is significant when the coercion opens a proof obligation, as the user will be presented with multiple, but simpler, proof obligations in the correct context.
In this way, Matita supports the `Russell' proof methodology developed by Sozeau~\cite{sozeau:subset:2007} in a lightweight but tightly-integrated manner.
\end{itemize}
Throughout, for reasons of clarity, conciseness, and readability, we may choose to simplify or omit parts of Matita code.
We will always ensure that these omissions do not mislead the reader.

\subsection{Map of the paper}

The rest of the paper is structured as follows.

In section~\ref{sect.compiler.architecture}, we describe the architecture of the CerCo compiler, as well as the intermediate languages that it uses.
We also describe the target hardware and its formal model.

In section~\ref{sect.compiler.proof}, we describe the proof of correctness of the compiler in more detail.
We explain our use of structured traces and the labelling approach, and discuss the assembler.

In section~\ref{sect.formal.development}, we present data on the formal development.

In section~\ref{sect.framac.plugin}, we discuss the Frama-C plugin, as well as some of the case studies we have performed to validate it.

Finally, in section~\ref{sect.conclusions}, we present our conclusions, as well as related and future work.
---|