1 | \documentclass{llncs} |
---|
2 | \usepackage{hyperref} |
---|
3 | \usepackage{graphicx} |
---|
4 | \usepackage{color} |
---|
5 | \usepackage{listings} |
---|
6 | |
---|
7 | % NB: might be worth removing this if changing class in favour of |
---|
8 | % a built-in definition. |
---|
9 | %\newtheorem{theorem}{Theorem} |
---|
10 | |
---|
11 | \lstdefinelanguage{coq} |
---|
12 | {keywords={Definition,Lemma,Theorem,Remark,Qed,Save,Inductive,Record}, |
---|
13 | morekeywords={[2]if,then,else,forall,Prop,match,with,end,let}, |
---|
14 | } |
---|
15 | |
---|
16 | \lstdefinelanguage[mine]{C}[ANSI]{C}{ |
---|
17 | morekeywords={int8_t} |
---|
18 | } |
---|
19 | |
---|
20 | \lstset{language=[mine]C,basicstyle=\footnotesize\tt,columns=flexible,breaklines=false, |
---|
21 | keywordstyle=\color{red}\bfseries, |
---|
22 | keywordstyle=[2]\color{blue}, |
---|
23 | commentstyle=\color{green}, |
---|
24 | stringstyle=\color{blue}, |
---|
25 | showspaces=false,showstringspaces=false, |
---|
26 | xleftmargin=1em} |
---|
27 | |
---|
28 | \begin{document} |
---|
29 | |
---|
30 | \title{An Executable Semantics for CompCert C\thanks{The project CerCo acknowledges the financial support of the Future and |
---|
31 | Emerging Technologies (FET) programme within the Seventh Framework |
---|
32 | Programme for Research of the European Commission, under FET-Open grant |
---|
33 | number: 243881}} |
---|
34 | \author{Brian Campbell} |
---|
35 | \institute{LFCS, University of Edinburgh,\\\email{Brian.Campbell@ed.ac.uk}} |
---|
36 | \maketitle |
---|
37 | \begin{abstract} |
---|
38 | CompCert is a C compiler developed by Leroy et al, the |
---|
39 | majority of which is formalised and verified in the Coq proof |
---|
40 | assistant. The correctness theorem is defined in terms of a |
---|
41 | semantics for the `CompCert C' language, but how can we gain faith |
---|
42 | in those semantics? We explore one approach: building an equivalent |
---|
43 | executable semantics that we can check test suites of code |
---|
44 | against. |
---|
45 | \end{abstract} |
---|
46 | |
---|
47 | Flaws in a compiler are often reflected in the output they produce: |
---|
48 | buggy compilers produce buggy code. Moreover, bugs can evade testing |
---|
49 | when they only manifest themselves on source code of a particular shape, |
---|
50 | when particular optimisations are used. This has made compilers an |
---|
51 | appealing target for mechanized verification, from early work such as |
---|
52 | Milner and Weyhrauch's simple compiler in LCF~\cite{MilnerWeyhrauch72} |
---|
53 | to modern work on practical compilers, such as the Verisoft C0 |
---|
54 | compiler~\cite{Leinenbach200823} which is used as a key component of a |
---|
55 | larger verified software system. |
---|
56 | |
---|
57 | % TODO: does the NICTA work do anything like this? |
---|
58 | |
---|
59 | The CompCert project~\cite{Leroy:2009:FVR:1538788.1538814} has become |
---|
60 | a nexus of activity in recent years, including the project members' |
---|
61 | own efforts to refine the compiler and add certified |
---|
62 | optimisations~\cite{1328444,springerlink:10.1007/978-3-642-11970-5_13,springerlink:10.1007/978-3-642-28869-2_20}, |
---|
63 | and external projects such as an extensible optimisation |
---|
64 | framework~\cite{Tatlock:2010:BEV:1806596.1806611} and compiling |
---|
65 | concurrent programs~\cite{Sevcik:2011:RCV:1926385.1926393}. The main |
---|
66 | result of the project is a C compiler which targets the assembly |
---|
67 | languages for a number of popular processor architectures. Most of |
---|
68 | the compiler is formalised in the Coq proof |
---|
69 | assistant % TODO: consider citation? |
---|
70 | and accompanied by correctness results, most notably proving that any |
---|
71 | behaviour given to the assembly output must be an acceptable behaviour |
---|
72 | for the C source program. |
---|
73 | |
---|
74 | To state these correctness properties requires giving some semantics to |
---|
75 | the source and target languages. In CompCert these are given by |
---|
76 | inductively defined small-step relations. Errors and omissions in |
---|
77 | these semantics can weaken the overall theorems, potentially masking |
---|
78 | bugs in the compiler. In particular, if no semantics is given to a |
---|
79 | legitimate C program then any behaviour is acceptable for the |
---|
80 | generated assembly, and the compiler is free to generate bad |
---|
81 | code\footnote{By design, the compiler is also free to fail with an |
---|
82 | error for \emph{any} program, even if it is well-defined.}. This |
---|
83 | corresponds to the C standard's notion of \emph{undefined behaviour}, |
---|
84 | but this notion is left implicit in the semantics, so there is a |
---|
85 | danger of underdefining C. |
---|
86 | |
---|
87 | % TODO: clight paper also mentions executable semantics, in some detail |
---|
88 | Several methods could be used to gain faith in the semantics; in the |
---|
89 | conclusions of~\cite{Leroy:2009:FVR:1538788.1538814} Leroy suggests |
---|
90 | manual review, testing of executable versions and proving connections |
---|
91 | to alternative forms of semantics. Here we investigate testing an |
---|
92 | executable semantics which, by construction, is closely and formally |
---|
93 | related to CompCert's input language. The testing is facilitated by |
---|
94 | Coq's extraction mechanism, which produces OCaml code corresponding to |
---|
95 | the executable semantics that can be integrated with CompCert's |
---|
96 | existing parser to produce a C interpreter. |
---|
97 | |
---|
98 | Our motivation for this work comes principally from the CerCo project |
---|
99 | where we use executable semantics throughout~\cite{Amadio2011175}, and in particular have |
---|
100 | an executable version of CompCert's Clight intermediate |
---|
101 | language~\cite{EDI-INF-RR-1412}. There we found an omission involving function |
---|
102 | pointers during testing. We wished to attempt to replicate this in |
---|
103 | CompCert C and search for any other evident problems. An executable |
---|
104 | semantics is far more practical for this task than manual animation of |
---|
105 | the inductive definitions\footnote{This is partly because manual |
---|
106 | animation requires providing witnesses for the results or careful |
---|
107 | management of existential metavariables, and partly because we can |
---|
108 | prove results once for the executable semantics rather than once per |
---|
109 | program. Automatic proof search has similar problems.}. |
---|
110 | Morever, CompCert C is a more complex and interesting language than |
---|
111 | Clight, especially due to the presence of side-effects and |
---|
112 | non-determinism in the evaluation of expressions. |
---|
113 | |
---|
114 | The main contributions of this paper are |
---|
115 | \begin{enumerate} |
---|
116 | \item demonstrating the successful retrofitting of a small-step |
---|
117 | executable semantics to an existing verified compiler with |
---|
118 | equivalence proofs against the original semantics, |
---|
119 | \item showing that testing of the resulting interpreter does lead to |
---|
120 | the identification of bugs in both the semantics and the compiler, |
---|
121 | \item showing that the executable semantics can illustrate the |
---|
122 | limitations of the semantics, and fill in a gap in the relationship |
---|
123 | between the original deterministic and non-deterministic versions of |
---|
124 | the semantics, and |
---|
125 | \item demonstrate that a mixture of formal Coq and informal OCaml code can make |
---|
126 | testing more effective by working around known bugs with little |
---|
127 | effort. |
---|
128 | \end{enumerate} |
---|
129 | % TODO: consider moving out of footnote to make it less footnote heavy |
---|
130 | The full development is available online as a modified version of CompCert\footnote{\url{http://homepages.inf.ed.ac.uk/bcampbe2/compcert/}}. |
---|
131 | |
---|
132 | In Section~\ref{sec:compcert} we will give an overview of the relevant |
---|
133 | parts of the CompCert compiler. In Section~\ref{sec:construction} we |
---|
134 | will discuss the construction of the executable semantics, its |
---|
135 | relationship to the inductively defined semantics from CompCert (both |
---|
136 | formalised in Coq) and |
---|
137 | the additional OCaml code used to support testing. This is followed by the |
---|
138 | results of testing the semantics in Section~\ref{sec:testing}, |
---|
139 | including descriptions of the problems encountered. |
---|
140 | Finally, Section~\ref{sec:related} discusses related work, including |
---|
141 | an interpreter Leroy added to newer versions of CompCert following |
---|
142 | this work. |
---|
143 | |
---|
144 | \section{Overview of the CompCert compiler} |
---|
145 | \label{sec:compcert} |
---|
146 | |
---|
147 | CompCert compiles a simplified but substantial subset of the C |
---|
148 | programming language to one of three different architectures (as of |
---|
149 | version 1.8.2, which we refer to throughout unless stated otherwise). |
---|
150 | The main stages of the compiler, from an abstract syntax tree of the |
---|
151 | `CompCert C' language to target specific pseudo-assembly code, are |
---|
152 | written in the Calculus of Inductive Constructions (CIC) --- the |
---|
153 | dependently-typed language that underlies the Coq proof |
---|
154 | assistant~\cite{coq83}. The formal proofs of correctness refer to |
---|
155 | these definitions. Coq's extraction facilities~\cite{coqextr02} |
---|
156 | produce OCaml code corresponding to them, which can be |
---|
157 | combined with a C parser and assembler text generation to complete the |
---|
158 | compiler. The resulting compilation chain is summarised in |
---|
159 | Figure~\ref{fig:chain}. |
---|
160 | \begin{figure} |
---|
161 | \begin{center} |
---|
162 | \includegraphics{cc-diagram.pdf} |
---|
163 | \end{center} |
---|
164 | \caption{Start and end of the CompCert compilation chain} |
---|
165 | \label{fig:chain} |
---|
166 | \end{figure} |
---|
167 | |
---|
168 | Note that the unformalised C parser is not a trivial piece of code. |
---|
169 | Originally based on the CIL library for C parsing~\cite{cil02} but |
---|
170 | heavily adapted, it not only produces an abstract syntax tree, but |
---|
171 | also calculates type information and can perform several |
---|
172 | transformations to provide partial support for features that are not |
---|
173 | present in the formalised compiler. In particular, bitfields, |
---|
174 | structure passing and assignment, and \lstinline'long double' and |
---|
175 | \lstinline'long long' types are (optionally) handled here. There is a |
---|
176 | further transformation to simplify the annotated C abstract syntax |
---|
177 | tree into the CompCert C language. |
---|
178 | |
---|
179 | Earlier versions of CompCert did not feature CompCert C at all, but the |
---|
180 | less complex Clight language. Clight lives on as the next |
---|
181 | intermediate language in the compiler. Hence several publications on |
---|
182 | CompCert refer to this |
---|
183 | language~\cite{springerlink:10.1007/s10817-009-9148-3}. The major |
---|
184 | difference between the languages is that the syntax and semantics of |
---|
185 | expressions in Clight are much simpler because they are deterministic |
---|
186 | and side-effect free. Moreover, CompCert's original semantics were in |
---|
187 | big-step form and lacked support for \texttt{goto} statements. |
---|
188 | Version 1.5 added these using a small-step semantics, and 1.8 added the |
---|
189 | CompCert C language with its C-like expressions. The latter |
---|
190 | effectively moved some of the work done by the OCaml parser into the |
---|
191 | formalised compiler. |
---|
192 | % TODO: mention parser changes in 1.7? |
---|
193 | % TODO: In response to Csmith? |
---|
194 | |
---|
195 | CompCert C is also interesting because it has two semantics: one with |
---|
196 | non-deterministic behaviour for expressions which models the |
---|
197 | uncertainty about evaluation order in C, and a deterministic semantics |
---|
198 | which resolves this uncertainty. The former can be regarded as the |
---|
199 | intended input language of the compiler, and the latter as a more |
---|
200 | specific version that the compiler actually implements. In addition, |
---|
201 | these semantics have slightly different forms: in particular the |
---|
202 | non-deterministic version is always small-step, whereas the |
---|
203 | deterministic form makes a single big-step for side-effect free |
---|
204 | subexpressions. The two are connected by a result showing that |
---|
205 | programs which are \emph{safe} in the non-deterministic semantics |
---|
206 | admit some execution in the deterministic semantics, where by |
---|
207 | \emph{safe} we mean that the program never gets stuck regardless of |
---|
208 | the order of evaluation used (for a fixed choice of I/O behaviour). |
---|
209 | This result essentially holds because the deterministic semantics is |
---|
210 | choosing one of the possible evaluation orders. |
---|
211 | |
---|
212 | Note that throughout the definitions of all of these semantics we |
---|
213 | benefit from a specialisation of C to the targets that CompCert |
---|
214 | supports. Various \emph{implementation-defined} parts of the language |
---|
215 | such as integer representation and sizes are fixed (in fact, the |
---|
216 | formalisation heavily uses the assumption of a 32-bit word size). |
---|
217 | % which is the largest integer type supported, making long long a pain |
---|
218 | Also, some \emph{undefined behaviour} is given a meaning; notably |
---|
219 | mixtures of reads and writes to a variable in a single expression are |
---|
220 | given a non-deterministic behaviour rather than ruled out as per the C |
---|
221 | standard~\cite[\S 6.5.16]{c99}. |
---|
222 | |
---|
223 | \section{Construction of the executable semantics} |
---|
224 | \label{sec:construction} |
---|
225 | |
---|
226 | The executable semantics consists of several parts: existing |
---|
227 | executable definitions from CompCert, functions which closely |
---|
228 | correspond to the original relational semantics, a mechanism for |
---|
229 | resolving non-deterministic behaviour, and finally the proofs that |
---|
230 | steps of the relational semantics are equivalent to steps of the |
---|
231 | executable semantics. We do not consider whole program executions or |
---|
232 | interaction with the outside world (I/O) here, but only individual |
---|
233 | steps of the semantics. It is not difficult to include these (we have done |
---|
234 | this for Clight as part of the CerCo project~\cite{EDI-INF-RR-1412}), but we do |
---|
235 | not expect that it would yield any worthwhile insights during testing |
---|
236 | to justify the effort involved arising from the extra non-determinism |
---|
237 | from I/O and the coinductive reasoning for non-terminating programs. |
---|
238 | |
---|
239 | The existing definitions that we can reuse include the memory model |
---|
240 | that is used throughout the CompCert development~\cite{compcert-mm08}. |
---|
241 | This model features symbolic pointers with discrete abstract blocks of |
---|
242 | memory, and so is more suitable for our purposes than non-executable |
---|
243 | alternatives such as a concrete model with non-deterministic choice of |
---|
244 | fresh locations for allocations. The semantics of operations on |
---|
245 | values (for example, \texttt{+} and \texttt{==}) are naturally defined |
---|
246 | as auxiliary functions in the relational semantics, which we reuse. |
---|
247 | Casts and the conversion of values to booleans are exceptions to this; |
---|
248 | they are defined inductively in the relational semantics and we treat |
---|
249 | them in the same way as the rest of the inductive definitions, below. |
---|
250 | |
---|
251 | Local and global variable environments are necessarily executable in |
---|
252 | CompCert because they are used throughout the actual compiler code in |
---|
253 | addition to the semantics. We also make use of the compiler's error |
---|
254 | monad to recover the partiality of the relational semantics (that is, |
---|
255 | CIC is a language of total functions, so we must model failure |
---|
256 | explicitly). |
---|
257 | |
---|
258 | The environments are efficiently implemented because of their use in |
---|
259 | the compiler itself. However, the memory model was not intended to be |
---|
260 | efficient and builds large chains of closures to represent changes to |
---|
261 | memory. Fortunately, the performance was sufficient for our testing, |
---|
262 | so we did not attempt to optimise it. |
---|
263 | |
---|
264 | \subsection{Executing a step of the semantics} |
---|
265 | |
---|
266 | The relational semantics uses a series of inductive definitions to |
---|
267 | describe the steps of the abstract machine: one for reducing |
---|
268 | \emph{lvalues}, expressions which represent a location that can be |
---|
269 | read from, written to, or extracted as a pointer; one for |
---|
270 | \emph{rvalues} expressions that only yield a value, such as addition; |
---|
271 | and one for setting up function calls: |
---|
272 | \begin{lstlisting}[language=coq] |
---|
273 | Inductive lred: expr -> mem -> expr -> mem -> Prop := ... |
---|
274 | Inductive rred: expr -> mem -> expr -> mem -> Prop := ... |
---|
275 | Inductive callred: expr -> fundef -> list val -> type -> Prop := ... |
---|
276 | \end{lstlisting} |
---|
277 | The \lstinline'estep' relation represents whole state transitions |
---|
278 | corresponding to the reductions on expressions given by the first |
---|
279 | three relations, and the \lstinline'sstep' relation performs steps of |
---|
280 | statements and function entry and exit: |
---|
281 | \begin{lstlisting}[language=coq] |
---|
282 | Inductive estep: state -> trace -> state -> Prop := ... |
---|
283 | Inductive sstep: state -> trace -> state -> Prop := ... |
---|
284 | \end{lstlisting} |
---|
285 | The union of these two relations gives the overall \lstinline'step' relation. |
---|
286 | |
---|
287 | The executable semantics uses functions corresponding to each of these |
---|
288 | relations. The syntax directed nature of the inductive definitions |
---|
289 | makes this task straightforward. For example, consider the |
---|
290 | \lstinline'rred' rules for conditional expressions: |
---|
291 | \begin{lstlisting}[language=coq] |
---|
292 | | red_condition_true: forall v1 ty1 r1 r2 ty m, |
---|
293 | is_true v1 ty1 -> typeof r1 = ty -> |
---|
294 | rred (Econdition (Eval v1 ty1) r1 r2 ty) m |
---|
295 | (Eparen r1 ty) m |
---|
296 | | red_condition_false: forall v1 ty1 r1 r2 ty m, |
---|
297 | is_false v1 ty1 -> typeof r2 = ty -> |
---|
298 | rred (Econdition (Eval v1 ty1) r1 r2 ty) m |
---|
299 | (Eparen r2 ty) m |
---|
300 | \end{lstlisting} |
---|
301 | These state that a true conditional reduces to its left subexpression |
---|
302 | and a false one to its right, subject to a typing constraint. |
---|
303 | |
---|
304 | We rearrange the rules to fit in a tree of pattern matching, check |
---|
305 | type equalities where necessary, and introduce error messages where no |
---|
306 | reduction is possible: |
---|
307 | \begin{lstlisting}[language=coq] |
---|
308 | Definition exec_rred (e:expr) (m:mem) : res (expr * mem) := |
---|
309 | match e with |
---|
310 | ... |
---|
311 | | Econdition (Eval v1 ty1) r1 r2 ty => |
---|
312 | do b <- exec_bool_val v1 ty1; |
---|
313 | let r := if b then r1 else r2 in |
---|
314 | match type_eq (typeof r) ty with |
---|
315 | | left _ => OK (Eparen r ty, m) |
---|
316 | | right _ => Error (msg "type mismatch in Econdition") |
---|
317 | end |
---|
318 | \end{lstlisting} |
---|
319 | This transforms the value \lstinline[language=coq]'v1' to a boolean, |
---|
320 | selects the appropriate subexpression and checks its type. The |
---|
321 | `\lstinline[language=coq]'do'' notation is a use of the error monad: |
---|
322 | the conversion to a boolean is not always defined and may return an |
---|
323 | error, which the monad propogates to the caller of |
---|
324 | \lstinline[language=coq]'exec_rred'. The auxiliary relations defining |
---|
325 | the relationship between values and booleans, |
---|
326 | \lstinline[language=coq]'exec_bool_val', and casting are also defined |
---|
327 | as functions. |
---|
328 | |
---|
329 | Defining a function for the statement reduction, \lstinline'sstep', is |
---|
330 | similar. One part of function entry handling deserves particular |
---|
331 | attention. CompCert has a notion of \emph{external functions}, which |
---|
332 | are a set of primitive CompCert C functions that are handled outside |
---|
333 | of the program to provide system calls, dynamic memory allocation, |
---|
334 | volatile memory accesses and annotations. Inductive predicates |
---|
335 | describe what changes to the program's state are allowed. We have |
---|
336 | only implemented the dynamic memory allocation (by constructing a |
---|
337 | function similar to the predicates for \lstinline'malloc' and |
---|
338 | \lstinline'free'), and provide a predicate on states to identify uses |
---|
339 | of the other calls so that we may give a partial completeness result |
---|
340 | in the next section. |
---|
341 | |
---|
342 | The expression relation \lstinline'estep' is more difficult. This is |
---|
343 | the point in the semantics at which non-deterministic behaviour |
---|
344 | appears --- any subexpression in a suitable \emph{context} can be |
---|
345 | reduced. The contexts are defined by an inductive description of |
---|
346 | which subexpressions can be evaluated in the current state (for |
---|
347 | example, only the guard of a conditional expression can be evaluated |
---|
348 | before the conditional expression itself, whereas either subexpression |
---|
349 | of $e_1$ \lstinline'+' $e_2$ can be reduced). We could execute the |
---|
350 | non-deterministic semantics by collecting all of the possible |
---|
351 | reductions in a set (in the form of the possible successor states), but we are primarily interested in testing the |
---|
352 | semantics on well behaved C programs, and an existing theorem in |
---|
353 | CompCert shows that any program with a \emph{safe} non-deterministic |
---|
354 | behaviour also has a behaviour in the deterministic semantics. |
---|
355 | |
---|
356 | Hence we are satisfied with following a single path of execution, and |
---|
357 | so parametrise the executable semantics by a \emph{strategy} function |
---|
358 | which determines the evaluation order by picking a particular |
---|
359 | subexpression, and also returns its context. |
---|
360 | These contexts should be valid with respect to the inductive definition |
---|
361 | mentioned above. |
---|
362 | Moreover, if we choose a function matching the deterministic semantics |
---|
363 | then we know that any well-defined C program which goes wrong is |
---|
364 | demonstrating a bug (in the form of missing behaviour) which affects \emph{both} sets of semantics. Regardless |
---|
365 | of the choice of strategy, we can used failed executions to pinpoint |
---|
366 | rules in the semantics which concern us. We will discuss the |
---|
367 | implemented strategies and their implications in the following |
---|
368 | section. % TODO: make sure it is the following section later or change to a ref |
---|
369 | |
---|
370 | \label{sec:nonstuck} |
---|
371 | Finally, to match the non-deterministic semantics of \lstinline'estep' precisely we must |
---|
372 | also check that there are no \emph{stuck} subexpressions. This |
---|
373 | detects subexpressions which cannot be reduced because they are |
---|
374 | erroneous, rather than because they are already values. Normally the |
---|
375 | lack of any reducible subexpression would indicate an error, but the |
---|
376 | extra check is required because an accompanying non-terminating |
---|
377 | subexpression will mask it. The CompCert documentation gives |
---|
378 | the example |
---|
379 | \lstinline[language=C]'f() + (10 / x)' where \lstinline'x' is |
---|
380 | initially 0 and \lstinline'f' is non-terminating. Without the check, |
---|
381 | a program containing this expression would appear to be well-defined |
---|
382 | because \lstinline'f' can always be reduced, whereas we should make |
---|
383 | the program undefined because of the potential division by zero. |
---|
384 | |
---|
385 | Informally, the condition for \lstinline'estep' states that \emph{any subexpression in an |
---|
386 | evaluation context is either a value, or has a further subexpression |
---|
387 | that is reducible}. This form suggests an inefficient executable |
---|
388 | approach: for every context perform a search for the reducible |
---|
389 | subexpression. Fortunately it can be implemented by a |
---|
390 | relatively direct recursive function because we can reuse reducible |
---|
391 | subexpressions that we have already found in a larger context. Thus |
---|
392 | we only have to check that an expression can be reduced (using the |
---|
393 | reduction functions discussed above) when all of the subexpressions have |
---|
394 | already been reduced to values. |
---|
395 | |
---|
396 | The deterministic semantics does not enforce this condition because it |
---|
397 | commits to one particular evaluation, regardless of whether other |
---|
398 | evaluations may fail. |
---|
399 | This means that the relationship between the non-deterministic and |
---|
400 | deterministic semantics is not a straightforward inclusion: the |
---|
401 | deterministic semantics accepts some executions where the stuckness |
---|
402 | check fails, but rejects the executions allowed by the |
---|
403 | non-deterministic semantics under a different expression |
---|
404 | reduction order. |
---|
405 | Thus we make the executable version of the |
---|
406 | check optional so that we may implement either semantics. |
---|
407 | |
---|
408 | \subsection{Equivalence to the non-deterministic semantics} |
---|
409 | |
---|
410 | We show the equivalence of the executable semantics to the original |
---|
411 | relational semantics in two parts, summarising the proofs from the formal Coq development. First, we show that any successful |
---|
412 | execution of a step corresponds to some derivation in the original |
---|
413 | semantics, regardless of the strategy we choose. |
---|
414 | |
---|
415 | \begin{theorem}[Soundness] |
---|
416 | Given a strategy which picks valid contexts, the execution of any step |
---|
417 | $s_1$ to $s_2$ with trace $t$ implies that there exists a derivation |
---|
418 | of \lstinline'step' $s_1$ $t$ $s_2$. |
---|
419 | \end{theorem} |
---|
420 | |
---|
421 | The proof consists of building similar lemmas for each relation in the |
---|
422 | semantics, using case analysis on the states, statements and |
---|
423 | expressions involved, reduction of the executable hypothesis |
---|
424 | and reasoning on equalities % TODO: woolly |
---|
425 | until the premises of the constructor are realised. The check for |
---|
426 | \emph{stuck} subexpressions is the most involved task: we must prove a |
---|
427 | more general result differentiating between reducible and fully |
---|
428 | reduced expressions, and show that reducible terms found in |
---|
429 | subexpressions can be lifted. If no such term exists, we show that |
---|
430 | attempting to reduce the whole term is sufficient. |
---|
431 | |
---|
432 | The second part is to show that any step of the original semantics is |
---|
433 | successfully executed by the executable semantics. |
---|
434 | |
---|
435 | \begin{theorem}[Completeness --- non-deterministic] |
---|
436 | For any derivation of \lstinline'step' $s_1$ $t$ $s_2$ which is not |
---|
437 | about to call an unsupported external function there exists a strategy |
---|
438 | $p$ such that |
---|
439 | \lstinline'exec_step' $p$ $s_1$ = \lstinline'OK' $(t,s_2).$ |
---|
440 | \end{theorem} |
---|
441 | |
---|
442 | As in the soundness proof, a lemma is shown for each relation. The general |
---|
443 | form of the proofs is to perform inversion on the hypothesis for the |
---|
444 | derivation and use rewriting and the prior lemmas to reduce the goal. |
---|
445 | Again, the stuckness check is more difficult: it follows by induction |
---|
446 | on the expression, and showing that if a subexpression gets stuck then |
---|
447 | the parent expression gets stuck too. |
---|
448 | |
---|
449 | Note that each step might require a different strategy because we have |
---|
450 | restricted our attention to strategies that can be expressed as |
---|
451 | functions on the abstract machine state. This rules out strategies |
---|
452 | where the execution order depends on (for example) previous states or |
---|
453 | randomness, but these do not occur in the CompCert compiler. |
---|
454 | |
---|
455 | \subsection{Strategies and the deterministic semantics} |
---|
456 | |
---|
457 | Two evaluation strategies were implemented for the executable |
---|
458 | semantics. The first was a simple leftmost-innermost strategy that |
---|
459 | picks the first non-value subexpression that we are allowed to |
---|
460 | reduce. It was chosen because it is simple to implement, sufficient |
---|
461 | for testing (any bugs were likely to be independent of evaluation |
---|
462 | order, and our experience with the second strategy supported this), |
---|
463 | and could be used as the basis for the second strategy. |
---|
464 | |
---|
465 | % TODO: have I included a discussion of missing behaviour vs wrong |
---|
466 | % behaviour vs extra behaviour? |
---|
467 | |
---|
468 | However, to provide the greatest benefit from the executable semantics |
---|
469 | we wanted to get executions which precisely match the deterministic |
---|
470 | semantics that the compiler actually implements. That is, we wish to |
---|
471 | have an interpreter that predicts exactly the behaviour of the |
---|
472 | compiled program. This is not entirely straightforward because the |
---|
473 | original deterministic semantics are formalised in a slightly |
---|
474 | different way to non-deterministic executions: expressions with no |
---|
475 | side effects are dealt with by a big-step relation, and effectful |
---|
476 | expressions use a small-step relation as before. |
---|
477 | Here `effectful' includes both changes to the state and to control flow such as the conditional operator. |
---|
478 | This |
---|
479 | prioritisation of the effectful subexpressions is also what causes the |
---|
480 | difference in evaluation order. For example, |
---|
481 | \lstinline[language=C]'x+(x=1)' when \lstinline'x' $\not = 1$ gives a different |
---|
482 | answer with the second strategy because the assignment is evaluated |
---|
483 | first. |
---|
484 | |
---|
485 | Despite the difference in construction, we can still encode the resulting evaluation order as a |
---|
486 | strategy: |
---|
487 | \begin{enumerate} |
---|
488 | \item find the leftmost-innermost \emph{effectful} subexpression (or |
---|
489 | take the whole expression if it is effect-free), then |
---|
490 | \item |
---|
491 | \begin{enumerate} |
---|
492 | \item pick the leftmost-innermost \emph{effect-free} non-value |
---|
493 | subexpression of that if one is present, otherwise |
---|
494 | \item reduce the whole subexpression. |
---|
495 | \end{enumerate} |
---|
496 | \end{enumerate} |
---|
497 | Intuitively, this works because the effect-free subexpressions can be |
---|
498 | reduced in any order that is allowed without changing the result. |
---|
499 | |
---|
500 | To show formally that this matches the relational semantics we first |
---|
501 | show that the big-steps of effect-free subexpressions can be |
---|
502 | decomposed into a series of small-steps using the relations from the |
---|
503 | non-deterministic semantics. We can then show that iterating the |
---|
504 | executable semantics will perform the same steps, reaching the same |
---|
505 | end state as the big-step. Then using some lemmas to show that the |
---|
506 | \lstinline'leftcontext's (the deterministic version of expression contexts) used in the semantics are found by |
---|
507 | our leftmost-innermost effectful subexpression strategy, we can show |
---|
508 | that any step of the deterministic semantics is performed by some |
---|
509 | number of steps of the executable semantics: |
---|
510 | \begin{theorem}[Completeness --- deterministic] |
---|
511 | For any derivation of \lstinline'step' $s_1$ $t$ $s_2$ in the |
---|
512 | deterministic semantics which is not about to call an unsupported |
---|
513 | external function, repeated use of the \lstinline'exec_step' |
---|
514 | function starting with $s_1$ using the above strategy yields the |
---|
515 | state $s_2$. Moreover, the concatenation of the resulting traces is |
---|
516 | $t$. |
---|
517 | \end{theorem} |
---|
518 | Again, a precise version of the above argument is formalised in Coq. |
---|
519 | |
---|
520 | Together with the connections between the non-deterministic semantics |
---|
521 | and the executable semantics, and the result from CompCert linking the |
---|
522 | two relational semantics we get the relationships depicted in |
---|
523 | Figure~\ref{fig:relationships}. |
---|
524 | \begin{figure} |
---|
525 | \begin{center} |
---|
526 | \includegraphics{semantics.pdf} |
---|
527 | \end{center} |
---|
528 | \caption{Relationships between the different CompCert C semantics} |
---|
529 | \label{fig:relationships} |
---|
530 | \end{figure} |
---|
531 | We can see that the new result completes a loop --- confirming the |
---|
532 | intuition that the deterministic semantics is the same as the |
---|
533 | non-deterministic semantics with a particular choice of evaluation order, |
---|
534 | but without the `stuckness-check'. |
---|
535 | % TODO: describe the diagram's meaning a little more? |
---|
536 | |
---|
537 | We also see directly from the executable semantics that the evaluation |
---|
538 | order is the \emph{only} possible source of non-determinism besides |
---|
539 | I/O, because each step of the semantics can be expressed as a function |
---|
540 | rather than a relation. |
---|
541 | |
---|
542 | \subsection{Informal OCaml code} |
---|
543 | \label{sec:driver} |
---|
544 | |
---|
545 | To produce the actual interpreter we add some OCaml code which uses the |
---|
546 | existing CompCert parser to produce CompCert C from C source code, and then |
---|
547 | iterates the step function extracted from Coq until the program successfully |
---|
548 | completes, or fails with an error. |
---|
549 | |
---|
550 | Ideally we would fix each bug that we encounter when testing this |
---|
551 | interpreter, but this can involve repairing the specifications, the |
---|
552 | compiler, and the proofs; all of which CompCert's developers are |
---|
553 | better placed to deal with. Nonetheless, we wish to continue testing |
---|
554 | in spite of any bugs found. To achieve this we detect troublesome |
---|
555 | states in the OCaml main loop and override the normal behaviour of the |
---|
556 | executable semantics without changing the formal development. To |
---|
557 | assist in the implementation of these workarounds we use a higher order |
---|
558 | function to apply a local change to an expression to any applicable |
---|
559 | subexpression appearing in an evaluation context. |
---|
560 | |
---|
561 | We also take the opportunity to add some support functions that are |
---|
562 | outside the scope of the semantics, but which are informally supported by the |
---|
563 | compiler. They are implemented by detecting states which call these |
---|
564 | functions and running an OCaml version instead. We provide |
---|
565 | \lstinline'exit' and \lstinline'abort' (trivial functions to halt |
---|
566 | execution), \lstinline'memcpy' and \lstinline'memcmp' (impossible to |
---|
567 | implement in CompCert C because the memory model does not provide |
---|
568 | byte-by-byte representations of pointers\footnote{Implementation of |
---|
569 | \lstinline'memcpy' and \lstinline'memcmp' was assisted by the fact |
---|
570 | that Coq's extraction process exposes the internal representation of |
---|
571 | the memory model, rather than forcing us to implement |
---|
572 | them through the model's interface.}) and a limited version of |
---|
573 | \lstinline'printf'. CompCert's semantics also assume that the entire |
---|
574 | program is present in a single file, so we provide some support for |
---|
575 | merging definitions from multiple files to simulate separate |
---|
576 | compilation. |
---|
577 | |
---|
578 | All of these can be added without requiring any extra proof or |
---|
579 | changing the compiler or the semantics. They can also be turned off |
---|
580 | to be sure that problems we encounter come from the formalised |
---|
581 | semantics and not the workarounds. |
---|
582 | |
---|
583 | Finally, we note that the execution of the semantics can be examined |
---|
584 | in the debugger packaged with OCaml when absolute certainty about the |
---|
585 | cause of a problem is required. Indeed, during the testing below the |
---|
586 | debugger was used to retrieve extra information about the program |
---|
587 | state that was not included in the interpreter's error message. |
---|
588 | |
---|
589 | |
---|
590 | \section{Testing} |
---|
591 | \label{sec:testing} |
---|
592 | |
---|
593 | After some basic testing to ensure that the interpreter was |
---|
594 | functioning correctly (in particular, testing the informal driver code), |
---|
595 | we proceeded with the example that illustrated an omission in CerCo's |
---|
596 | Clight semantics, then attempted more demanding tests. |
---|
597 | |
---|
598 | \subsection{Function pointers} |
---|
599 | |
---|
600 | The simple program |
---|
601 | \begin{lstlisting}[language=C] |
---|
602 | int zero(void) { return 0; } |
---|
603 | |
---|
604 | int main(void) { |
---|
605 | int (*f)(void) = zero; |
---|
606 | return f(); |
---|
607 | } |
---|
608 | \end{lstlisting} |
---|
609 | returns zero by a call to the \lstinline'zero' function via a function |
---|
610 | pointer. It failed during testing of CerCo's executable semantics due |
---|
611 | to a bad type check, and CompCert C was suspected to suffer from the |
---|
612 | same problem. Indeed, execution fails at the call with the same type |
---|
613 | error: the variable \lstinline'f''s type is a pointer to a function and the |
---|
614 | semantics only accepts a function type. |
---|
615 | |
---|
616 | There are two noteworthy aspects to this bug. First, the compiler |
---|
617 | itself features the correct type check and works perfectly. The error |
---|
618 | only affects the specification of the compiler; this is extra |
---|
619 | behaviour that has not been proved correct. The |
---|
620 | semantics can be easily modified to use the same check, and the only |
---|
621 | other change required is to \emph{remove} some steps from the proof |
---|
622 | scripts. Second, the bug only became relevant from version 1.7 of |
---|
623 | CompCert --- before which the parser always added an explicit |
---|
624 | dereference. This illustrates a general issue: the intended semantics |
---|
625 | of CompCert C is unclear because it is difficult to understand what we |
---|
626 | should assume about the output of the complex parser. |
---|
627 | |
---|
628 | \subsection{Csmith} |
---|
629 | |
---|
630 | Csmith is a tool for the random generation of test cases for C |
---|
631 | compilers that uses a mixture of static and dynamic checks to ensure |
---|
632 | that the generated code has well-defined |
---|
633 | behaviour~\cite{Yang:2011:FUB:1993498.1993532}. Failing test cases |
---|
634 | are identified by the compiler failing with a crash or error, the |
---|
635 | generated code crashing, or by comparing the output of the test case |
---|
636 | with that obtained using different compilers. Each generated test |
---|
637 | case outputs a checksum to summarise its runtime behaviour for |
---|
638 | comparison. |
---|
639 | |
---|
640 | Csmith is particularly interesting because it has already been used to |
---|
641 | test the CompCert compiler as a whole where it detected several bugs |
---|
642 | in the informal part of the compiler and a mismatch between the |
---|
643 | assembler output and the assembler's behaviour. These problems have |
---|
644 | since been corrected, partly by the introduction of CompCert C as the |
---|
645 | input language. Thus it gives us the opportunity to compare the |
---|
646 | semantics against the actual compiler. However, it is not an ideal |
---|
647 | tool for testing semantics because it focuses on detecting |
---|
648 | `middle-end' bugs, and thus only generates a limited range of |
---|
649 | front-end language features. |
---|
650 | |
---|
651 | Given the previous testing of the compiler, it was unsurprising that |
---|
652 | we found no problems when executing the randomly generated code with |
---|
653 | the interpreter and comparing the results against the |
---|
654 | compiler\footnote{Using Csmith version 2.0.}. However, we did |
---|
655 | encounter a failure with the \emph{non-random} support code which |
---|
656 | implements the dynamic checks that prevent undefined behaviour in |
---|
657 | arithmetic operations. The failing code can be seen in |
---|
658 | Figure~\ref{fig:csmithfail}. |
---|
659 | \begin{figure} |
---|
660 | \begin{lstlisting} |
---|
661 | int8_t lshift_func_int8_t_s_s(int8_t left, int right) |
---|
662 | { |
---|
663 | return |
---|
664 | ((left < 0) || |
---|
665 | (((int)right) < 0) || |
---|
666 | (((int)right) >= 32) || |
---|
667 | (left > (INT8_MAX >> ((int)right)))) ? |
---|
668 | |
---|
669 | left : |
---|
670 | |
---|
671 | (left << ((int)right)); |
---|
672 | } |
---|
673 | \end{lstlisting} |
---|
674 | \caption{Excerpt of the safe arithmetic code from Csmith (edited for readability)} |
---|
675 | \label{fig:csmithfail} |
---|
676 | \end{figure} |
---|
677 | |
---|
678 | The failure is caused by the reduction rules for conditional expressions: |
---|
679 | \begin{lstlisting}[language=coq] |
---|
680 | | red_condition_true: forall v1 ty1 r1 r2 ty m, |
---|
681 | is_true v1 ty1 -> typeof r1 = ty -> |
---|
682 | rred (Econdition (Eval v1 ty1) r1 r2 ty) m |
---|
683 | (Eparen r1 ty) m |
---|
684 | | red_condition_false: forall v1 ty1 r1 r2 ty m, |
---|
685 | is_false v1 ty1 -> typeof r2 = ty -> |
---|
686 | rred (Econdition (Eval v1 ty1) r1 r2 ty) m |
---|
687 | (Eparen r2 ty) m |
---|
688 | \end{lstlisting} |
---|
689 | Each rule requires the type of the chosen subexpression to equal the |
---|
690 | type of the entire expression. However, here one branch is an 8-bit |
---|
691 | integer, and the other a 32-bit integer so one of the rules cannot be |
---|
692 | applied. The C standard requires that `the usual arithmetic |
---|
693 | conversions' should be applied to coerce the result to a common |
---|
694 | type~\cite[\S6.5.15]{c99}, so the 8-bit integer should be promoted to |
---|
695 | a 32-bit one. |
---|
696 | |
---|
697 | As with the function pointers, this test works with the compiler |
---|
698 | because the two types have a common representation --- all integers |
---|
699 | are handled as 32-bit words in CompCert. It is only when they are |
---|
700 | loaded and stored to memory that the integer size is taken into |
---|
701 | account. Nonetheless, this failure provides us with enough |
---|
702 | information to construct an example on which the compiler fails |
---|
703 | because the representations differ: |
---|
704 | \begin{lstlisting}[language=C] |
---|
705 | double f(int x, int a, double b) { |
---|
706 | return x ? a : b; |
---|
707 | } |
---|
708 | \end{lstlisting} |
---|
709 | Fortunately, the compiler performs some type reconstruction for the RTL |
---|
710 | intermediate language which makes it produce an error rather than |
---|
711 | bad code. Curiously, this means that the overall correctness theorem |
---|
712 | still holds if the semantics are corrected because the compiler |
---|
713 | is always allowed to fail, even on reasonable programs. However, the |
---|
714 | intermediate results about the front-end become false because the bug |
---|
715 | is present much earlier in the compiler than the type reconstruction. |
---|
716 | |
---|
717 | Alleviating this bug was essential for proper testing with Csmith, but |
---|
718 | a full solution requires fixing the semantics, compiler and proofs and |
---|
719 | is beyond the scope of this work. By implementing an alternative |
---|
720 | reduction rule in OCaml as a workaround we were able to continue |
---|
721 | testing without difficulty. |
---|
722 | |
---|
723 | \subsection{gcc-torture} |
---|
724 | |
---|
725 | The GCC compiler contains an extensive test suite for several of its |
---|
726 | supported languages~\cite[\S 6.4]{gccint4.4}. The gcc-torture suite |
---|
727 | contains an executable subset of the tests in C. Many of the test |
---|
728 | cases use GCC-specific features, but we are able to reuse a subset |
---|
729 | selected by another executable C semantics project, |
---|
730 | \texttt{kcc}~\cite{Ellison:2012:EFS:2103656.2103719}. Unlike the |
---|
731 | Csmith generated tests, gcc-torture contains many specialised tests |
---|
732 | for corner-cases of the C standard. In addition to finding errors, |
---|
733 | these also serve to highlight deliberate limitations in CompCert's |
---|
734 | semantics. |
---|
735 | |
---|
736 | A small test harness ran each test case with the executable |
---|
737 | semantics. The failing cases were manually classified and in a few |
---|
738 | cases a workaround or fix to the parser was added to prevent known |
---|
739 | bugs hiding further issues. |
---|
740 | |
---|
741 | The tests revealed that the semantics were missing some minor casting |
---|
742 | rules and that zero initialisation of file scope variables was not |
---|
743 | performed when initialising the memory model. Both were handled |
---|
744 | correctly by the compiler, in the latter case by the informal code |
---|
745 | that produces the assembler output. |
---|
746 | |
---|
747 | Several issues with the OCaml parser appeared: initialisation of an |
---|
748 | array by a shorter string did not add null-padding; arrays were not |
---|
749 | permitted to the left of a \lstinline'->' dereference; incomplete |
---|
750 | array types were not fully supported; and the reported line numbers in |
---|
751 | error messages could be wrong. Ironically, the latter was caused by |
---|
752 | side-effects and OCaml's own non-deterministic evaluation order. |
---|
753 | |
---|
754 | Deliberate limitations of CompCert were strongly apparent in the |
---|
755 | result too. Unsupported and partially supported constructs such as |
---|
756 | \lstinline'long long' integer types and bitfields caused many test |
---|
757 | case failures. A more interesting case is that the comparison of |
---|
758 | pointers is undefined when one points just beyond the end of the |
---|
759 | object (which is explicitly permitted by the C standard), or when a |
---|
760 | function pointer is used. However, we were unable to find any mention |
---|
761 | of this limitation in the documentation, and without testing would |
---|
762 | never have known about it. |
---|
763 | |
---|
764 | \section{Related work} |
---|
765 | \label{sec:related} |
---|
766 | |
---|
767 | The CompCert developers had already proposed implementing an |
---|
768 | executable semantics for the first version of Clight, but in a |
---|
769 | big-step style with a bound on the depth of the evaluation to ensure |
---|
770 | termination~\cite[\S 5.4]{springerlink:10.1007/s10817-009-9148-3}. |
---|
771 | However, it would be difficult to implement the workarounds described |
---|
772 | in Section~\ref{sec:driver} with a big-step semantics. |
---|
773 | |
---|
774 | Reaction to this work from the CompCert developers has been positive: |
---|
775 | bugs identified before the next release were fixed (in the case of the |
---|
776 | missing casts, independently before they were identified by testing), |
---|
777 | and Leroy has subsequently added a similar small-step interpreter. |
---|
778 | The new interpreter differs from the present work by focusing solely |
---|
779 | on the non-deterministic semantics and computing all of the successor |
---|
780 | states at once. This greatly simplifies the checking and proof of |
---|
781 | non-stuckness described in Section~\ref{sec:nonstuck}, and provides an |
---|
782 | option to explore the entire space of evaluation orders (which is |
---|
783 | surprisingly tractable). However, there is no easy way to mimic just |
---|
784 | the deterministic evaluation order, and no tricks in the driver code |
---|
785 | to make testing easier. Together with the bug fixes, this makes |
---|
786 | directly comparing test results with the current work infeasible, although |
---|
787 | some brief testing with the gcc-torture suite behaved as expected. |
---|
788 | The latest version of CompCert, 1.11, also features a more efficient |
---|
789 | memory model, improving the interpreter's performance. |
---|
790 | |
---|
791 | % NB: if I change the formatting of Clight, change ClightTSO too |
---|
792 | \v{S}ev\v{c}\'{i}k et al.~\cite{Sevcik:2011:RCV:1926385.1926393} have |
---|
793 | created CompCertTSO, a derivative of an earlier version of CompCert |
---|
794 | which supports concurrent shared-memory programs with a \emph{Total |
---|
795 | Store Ordering} (TSO) memory model. The source language for the |
---|
796 | formalised part of the compiler is ClightTSO and is given an |
---|
797 | executable semantics in addition to the usual relational semantics, |
---|
798 | which `revealed a number of subtle errors.' The main issues found in |
---|
799 | the present work did not arise because it was based on a version of |
---|
800 | CompCert that predated their introduction, and the pointer comparisons |
---|
801 | were not a problem because they were specified differently in |
---|
802 | CompCertTSO. |
---|
803 | |
---|
804 | Ellison and Ro{\c s}u~\cite{Ellison:2012:EFS:2103656.2103719} have |
---|
805 | developed a semantics for C based on rewriting logic that aims to be |
---|
806 | as close to the standard as possible, including many parts of the |
---|
807 | language that are not currently supported by CompCert. An interpreter |
---|
808 | called \texttt{kcc} is derived from the semantics and has been tested |
---|
809 | against the gcc-torture test suite that we reused. They go further |
---|
810 | and perform coverage analysis to gauge the proportion of the semantics |
---|
811 | exercised by the tests. These semantics |
---|
812 | are not used for compiler verification, but are intended for studying |
---|
813 | (and ultimately verifying) the behaviour of C programs. |
---|
814 | |
---|
815 | The Piton compiler~\cite{springerlink:10.1007/BF00243133} was |
---|
816 | formalised in the Boyer-Moore prover, which is an example of an |
---|
817 | environment where an executable semantics is the natural choice. The |
---|
818 | Piton language is a very low-level systems language defined for the |
---|
819 | project, so they do not benefit from preexisting test suites. |
---|
820 | However, the work is particularly interesting due to the connected |
---|
821 | hardware formalisation of the target. A later use of executable semantics |
---|
822 | in ACL2 demonstrated the usefulness of executable semantics in hardware |
---|
823 | verification because running an existing test suite for AMD's RTL language |
---|
824 | against the semantics was crucial for convincing managers that the model |
---|
825 | was relevant~\cite[\S 3]{springerlink:10.1007/3-540-49519-3_22}. |
---|
826 | |
---|
827 | Lochbihler and |
---|
828 | Bulwahn~\cite{springerlink:10.1007/978-3-642-22863-6_17} applied and |
---|
829 | extended Isabelle's code extraction and predicate |
---|
830 | compiler~\cite{springerlink:10.1007/978-3-642-03359-9_11} to animate |
---|
831 | the semantics of a multithreaded Java formalisation in Isabelle/HOL, |
---|
832 | JinjaThreads. This is a very appealing approach because the predicate |
---|
833 | compiler deals with most of the burden of writing an executable |
---|
834 | version of an inductively defined semantics, although JinjaThreads |
---|
835 | still required a substantial amount of work to deal with some |
---|
836 | difficult definitions. Their description of testing focuses on |
---|
837 | performance rather than correctness, but executable versions of the |
---|
838 | code have been tested on an ongoing basis since the earlier Jinja |
---|
839 | formalisation that it is based |
---|
840 | on~\cite{Klein:2006:MMJ:1146809.1146811}. |
---|
841 | |
---|
842 | % TODO: Norrish, C0, other exec sem? |
---|
843 | |
---|
844 | \section{Conclusions} |
---|
845 | |
---|
846 | We have shown that executable semantics can be useful enough for |
---|
847 | the necessary task of |
---|
848 | validating semantics to justify retrofitting them to existing |
---|
849 | verified compilers, and in the case of CompCert found a real compiler |
---|
850 | bug in the formalised front-end, alongside numerous minor issues. It |
---|
851 | also illustrates that testing of the semantics can be more sensitive |
---|
852 | than testing the compiler: our original failing case for the |
---|
853 | conditional expressions bug would (and did) pass compiler testing, |
---|
854 | but using that failure in the semantics we were able to derive a test |
---|
855 | case that also failed in the compiler. |
---|
856 | |
---|
857 | In addition to the testing, the executable semantics were also useful |
---|
858 | for demonstrating the limitations of the semantics, both known and |
---|
859 | unknown. Moreover, we were able to take the opportunity to prove that |
---|
860 | an intuitive relationship between the deterministic and |
---|
861 | non-deterministic semantics of CompCert C holds. |
---|
862 | |
---|
863 | \bibliographystyle{splncs03} |
---|
864 | \bibliography{ccexec} |
---|
865 | |
---|
866 | \end{document} |
---|