# Changeset 3642

Timestamp: Mar 8, 2017, 4:24:40 PM
Message: ...
Location: Papers/jar-cerco-2017
Files: 2 edited

In this work, we have instantiated the CerCo approach with a cost model for the MCS-51 hardware that our prototype compiler targets. As this hardware is relatively simple, it admits a precise cost model, and this is reflected in our final theorem, which demonstrates that our high-level cost annotations are both accurate and precise. On more modern hardware, exhibiting features such as caches, pipelines, and speculative execution, an exact cost model for time or space will be hard, if not impossible, to define. Instead, existing WCET analyses could be used, and then lifted to high-level annotations using our non-standard compiler architecture.

We take it as an article of faith that software used to verify other software must itself be as bug-free as possible---preferably verified itself. One way to achieve this is to reduce the trusted code base of any tools used. Here, `trusted code base' refers to the corpus of code that must be trusted in order to believe that the analysis being carried out is accurate. The trusted code base of state-of-the-art static analysis tools, such as WCET analysers, is significant. These are sophisticated, complex pieces of software, built around a control flow analyser, linear programming libraries, and detailed formal models of the hardware under analysis. To trust a WCET tool, each of these subcomponents, along with their eventual combination, must be trusted.

In CerCo we move the control flow analysis to the source code level, and we introduce a non-standard compiler. As this compiler architecture is completely novel, we wish to ensure that our implementation (and by extension, our entire approach to lifting cost models) is correct. Therefore, to reduce the size of our trusted code base, we have implemented a prototype compiler and static analyser in an interactive theorem prover---Matita~\cite{matita,asperti:user:2007}, an implementation of the Calculus of Coinductive Constructions---which was used to certify that the cost annotations supplementing the program's source code are indeed an accurate reflection of those incurred by the hardware. We have also implemented formal models of the hardware and of the high-level source language in the interactive theorem prover.

We now briefly introduce Matita, the interactive theorem prover that was used in CerCo to certify all proofs. Matita is an interactive theorem proving system based on a variant of the Calculus of Coinductive Constructions~\cite{matita,asperti:user:2007}. The system features (co)inductive families of types, a system of coercions, a tactic-driven proof construction engine~\cite{sacerdoti-coen:tinycals:2007}, and paramodulation-based automation~\cite{asperti:higher-order:2007}, all of which we exploit in the formalisation described herein.

Matita's syntax is similar to the syntaxes of mainstream functional programming languages such as OCaml or Standard ML. The following features of the system are of particular relevance here:
\begin{itemize}
\item
Recursive functions are introduced with \texttt{let rec}. Mutually recursive functions are separated via the \texttt{and} keyword. Matita's termination checker ensures that all recursive functions are terminating before being admitted into the global environment, to maintain soundness.
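As a brief, purely illustrative sketch (not drawn from the CerCo development), two mutually recursive functions over Peano naturals may be declared thus:
\begin{lstlisting}[language=Grafite]
let rec even (n : nat) on n : bool :=
  match n with
  [ O ⇒ true
  | S m ⇒ odd m
  ]
and odd (n : nat) on n : bool :=
  match n with
  [ O ⇒ false
  | S m ⇒ even m
  ].
\end{lstlisting}
The \texttt{on n} clause singles out the argument that must decrease structurally at every recursive call; it is this argument that the termination checker inspects.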
\item
Matita has an infinite hierarchy of type universes. A single impredicative universe of types, \texttt{Prop}, sits at the base of this hierarchy. An unbounded series of predicative universes, \texttt{Type[0]}, \texttt{Type[1]}, \texttt{Type[2]}, and so on, sits atop \texttt{Prop}. One peculiarity of Matita is its user-customisable hierarchy of universes: in our development, we work in a context where \texttt{Prop} is contained in \texttt{Type[0]}, \texttt{Type[0]} is contained in \texttt{Type[1]}, and so on. Matita, unlike Coq or Agda, implements no form of typical ambiguity or universe polymorphism; explicit, concrete universe levels are preferred instead.
\item
The constants \texttt{True} and \texttt{False} represent truth and falsity in \texttt{Prop}, respectively. Two inductive families in \texttt{Prop} encode conjunction and disjunction---$\mathtt{P \wedge Q}$ and $\mathtt{P \vee Q}$, respectively. As is usual, implication and universal quantification are identified with the dependent function space ($\Pi$-types), whereas (constructive) existential quantification is encoded as a dependent sum (a $\Sigma$-type). We write $\All{x : \phi}\psi$ for the dependent function space, and abbreviate this as $\phi \rightarrow \psi$ when $x \not\in fv(\psi)$, as usual.
\item
Inductive families of types are introduced with the \texttt{inductive} keyword, with declarations of the following general shape:
\begin{lstlisting}[language=Grafite]
inductive I (P0 : S0) ... (Pn : Sn) : phi0 -> ... -> phim -> Type[0] :=
| C0 : ...
...
\end{lstlisting}
We call $P_i$ for $0 \leq i \leq n$ the \textbf{parameters} of \texttt{I} and $\phi_j$ for $0 \leq j \leq m$ the \textbf{indices} of \texttt{I}.
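For instance (a purely illustrative declaration, with names chosen for exposition), the family of length-indexed lists has the element type as a parameter and the length as an index:
\begin{lstlisting}[language=Grafite]
inductive Vector (A : Type[0]) : nat -> Type[0] :=
| vnil  : Vector A O
| vcons : ∀n : nat. A -> Vector A n -> Vector A (S n).
\end{lstlisting}
The parameter \texttt{A} remains fixed throughout the declaration, whereas the \texttt{nat} index varies from constructor to constructor: \texttt{vcons} builds a vector one element longer than its argument.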
Matita's positivity checker ensures that constructors have strictly-positive types before admitting a (co)inductive family of types into the global environment, to maintain soundness.
\item
Records are introduced with the \texttt{record} keyword. A record declaration such as
\begin{lstlisting}[language=Grafite]
record R : Type[0] := { F1 : nat }.
\end{lstlisting}
may be thought of as syntactic sugar for a single-constructor inductive type of the same name:
\begin{lstlisting}[language=Grafite]
inductive R : Type[0] :=
| mk_R : nat -> R.
\end{lstlisting}
Types of later fields may depend on fields declared earlier in the record. Records may be decomposed with projections: one projection per field is registered in the global context. In the example above, \texttt{F1} of type $R \rightarrow nat$ is registered as a field projection, and $mk\_R$ of type $nat \rightarrow R$ is registered as a constructor. Record fields may also be marked as coercions.
\item
Terms may be freely omitted, allowing the user to write down partial types and terms. A question mark, \texttt{?}, denotes a single term that has been omitted by the user. Some omitted terms can be deduced by Matita's refinement system~\cite{DBLP:journals/corr/abs-1202-4905}; other, more complex goals arising from omitted terms may require user input to solve, in which case a proof obligation is opened for each term that cannot be deduced automatically. Three consecutive dots, \texttt{$\ldots$}, denote a vector of terms or types that have been omitted and need to be inferred.
\item
(Co)data may be decomposed by pattern matching with a \texttt{match} expression. We may fully annotate a \texttt{match} expression with its return type. This is especially useful when working with indexed families of types, or with invariants, expressed as types, on functions, as the sketch below demonstrates.
\end{itemize}
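The following sketch (again illustrative, assuming the standard library's \texttt{unit} type with constructor \texttt{it}) shows a \texttt{match} whose return type is computed from the value being inspected; the \texttt{return} annotation makes this dependency explicit:
\begin{lstlisting}[language=Grafite]
definition pred_type : nat -> Type[0] :=
  λn. match n with [ O ⇒ unit | S _ ⇒ nat ].

definition safe_pred : ∀n : nat. pred_type n :=
  λn. match n return (λm. pred_type m) with
  [ O ⇒ it
  | S m ⇒ m
  ].
\end{lstlisting}
Here \texttt{safe\_pred} returns the predecessor of a non-zero natural, and the trivial value \texttt{it} at zero; the \texttt{return} annotation allows the typechecker to assign each branch its own type.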
\subsection{Map of the paper}

The rest of the paper is structured as follows. In Section~\ref{sect.compiler.architecture}, we describe the architecture of the CerCo compiler, together with its intermediate languages, from CerCo C down to MCS-51 machine code, and their formal semantics. We also describe the optimisations that the CerCo verified C compiler implements. In Section~\ref{sect.compiler.proof}, we describe the proof of correctness of the compiler in more detail. We explain \emph{structured traces}, a technical device developed to complete the proof, and our labelling approach. We further discuss the verification of our translation and optimisation passes for the intermediate languages, and discuss our verified assembler. In Section~\ref{sect.formal.development}, we present statistical data on the formal development, including line counts, and discuss the axioms remaining in our proof of the compiler's correctness. We also briefly discuss `lessons learnt' in the formal development of the CerCo verified C compiler. In Section~\ref{sect.framac.plugin}, we discuss the Frama-C plugin for certifying the concrete complexity of programs written in CerCo C, and present a case study demonstrating the plugin in action. Finally, in Section~\ref{sect.conclusions} we conclude, discussing related and future work.