# Changeset 2382

Ignore:
Timestamp:
Oct 1, 2012, 11:55:26 AM
Message:

Final version of executable semantics paper.

Location:
Papers/cpp-exec-2012
Files:
3 edited
1 moved

the compiler is formalised in the Coq proof assistant % TODO: consider citation?
and accompanied by correctness results, most notably proving that any behaviour given to the assembly output must be an acceptable behaviour for the C source program.  To state these correctness properties requires giving some semantics to the source and target languages; any errors in these semantics can weaken the overall theorems, potentially masking bugs in the compiler.  In particular, if no semantics is given to a legitimate C program then any behaviour is acceptable for the generated assembly, and the compiler is free to generate bad code\footnote{By design, the compiler is also free to fail with an error for \emph{any} program, even if it is well-defined.}.  This corresponds to the C standard's notion of \emph{undefined behaviour}, but this notion is left implicit in the semantics, so there is a danger of underdefining C.
% TODO: clight paper also mentions executable semantics, in some detail

between the original deterministic and non-deterministic versions of the semantics, and
\item demonstrate that a mixture of formal Coq and informal OCaml code can make testing more effective by working around known bugs with little effort.
\end{enumerate}

% TODO: consider moving out of footnote to make it less footnote heavy
The full development is available online as a modified version of CompCert\footnote{\url{http://homepages.inf.ed.ac.uk/bcampbe2/compcert/}}.

In Section~\ref{sec:compcert} we will give an overview of the relevant parts of the CompCert compiler.
In Section~\ref{sec:construction} we will discuss the construction of the executable semantics, its relationship to the inductively defined semantics from CompCert (both formalised in Coq) and the additional OCaml code used to support testing.  This is followed by the results of testing the semantics in Section~\ref{sec:testing}, including descriptions of the problems encountered.  Finally, Section~\ref{sec:related} discusses related work, including an interpreter Leroy added to newer versions of CompCert following this work.

\section{Overview of the CompCert compiler}

CompCert compiles a simplified but substantial subset of the C programming language to one of three different architectures (as of version 1.8.2, which we refer to throughout unless stated otherwise).  The main stages of the compiler, from an abstract syntax tree of the

tree into the CompCert C language.  Earlier versions of CompCert did not feature CompCert C at all, but the less complex Clight language.  Clight lives on as the next intermediate language in the compiler.  Hence several publications on this

for Clight as part of the CerCo project~\cite{EDI-INF-RR-1412}), but we do not expect that it would yield any worthwhile insights during testing to justify the effort involved arising from the extra non-determinism from I/O and the coinductive reasoning for non-terminating programs.

The existing definitions that we can reuse include the memory model

which subexpressions can be evaluated in the current state (for example, only the guard of a conditional expression can be evaluated before the conditional expression itself, whereas either subexpression of $e_1$ \lstinline'+' $e_2$ can be reduced).
We could execute the non-deterministic semantics by collecting all of the possible reductions in a set (in the form of the possible successor states), but we are primarily interested in testing the semantics on well behaved C programs, and an existing theorem in CompCert shows that any program with a \emph{safe} non-deterministic

Moreover, if we choose a function matching the deterministic semantics then we know that any well-defined C program which goes wrong is demonstrating a bug (in the form of missing behaviour) which affects \emph{both} sets of semantics.  Regardless of the choice of strategy, we can use failed executions to pinpoint rules in the semantics which concern us.  We will discuss the

\label{sec:nonstuck}

Finally, to match the non-deterministic semantics of \lstinline'estep' precisely we must also check that there are no \emph{stuck} subexpressions.  This detects subexpressions which cannot be reduced because they are

the program undefined because of the potential division by zero.  Informally, the condition for \lstinline'estep' states that \emph{any subexpression in an evaluation context is either a value, or has a further subexpression that is reducible}.  This form suggests an inefficient executable

The deterministic semantics does not enforce this condition because it commits to one particular evaluation, regardless of whether other evaluations may fail.
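The two executable ingredients above can be illustrated on a toy expression language.  The following OCaml sketch is our own invention, not the formal development's definitions (the constructors, \lstinline'step', and \lstinline'not_stuck' are all hypothetical names): \lstinline'step' collects the possible successor states of a non-deterministic small-step relation, and \lstinline'not_stuck' implements the informal condition that every subexpression in an evaluation context is either a value or contains a reducible subexpression.

```ocaml
(* Toy illustration, not CompCert's definitions. *)
type expr =
  | Val of int
  | Add of expr * expr          (* either operand may reduce first *)
  | Div of expr * expr          (* no rule when the divisor is zero *)
  | Cond of expr * expr * expr  (* only the guard is in an evaluation context *)

(* reductions at the root of the expression only *)
let head_step = function
  | Add (Val a, Val b) -> [ Val (a + b) ]
  | Div (Val a, Val b) when b <> 0 -> [ Val (a / b) ]
  | Cond (Val 0, _, e_else) -> [ e_else ]
  | Cond (Val _, e_then, _) -> [ e_then ]
  | _ -> []

(* immediate subexpressions that sit in an evaluation context *)
let sub_contexts = function
  | Val _ -> []
  | Add (e1, e2) | Div (e1, e2) -> [ e1; e2 ]
  | Cond (g, _, _) -> [ g ]

(* all possible successor states: reduce at the root, or inside a context *)
let rec step e =
  let inside =
    match e with
    | Val _ -> []
    | Add (e1, e2) ->
        List.map (fun e1' -> Add (e1', e2)) (step e1)
        @ List.map (fun e2' -> Add (e1, e2')) (step e2)
    | Div (e1, e2) ->
        List.map (fun e1' -> Div (e1', e2)) (step e1)
        @ List.map (fun e2' -> Div (e1, e2')) (step e2)
    | Cond (g, e_then, e_else) ->
        List.map (fun g' -> Cond (g', e_then, e_else)) (step g)
  in
  head_step e @ inside

let is_value = function Val _ -> true | _ -> false

(* some subexpression in an evaluation context can reduce *)
let rec reducible e =
  head_step e <> [] || List.exists reducible (sub_contexts e)

(* stuckness check: the expression and every subexpression in an evaluation
   context is a value or contains a reducible subexpression *)
let rec not_stuck e =
  (is_value e || reducible e) && List.for_all not_stuck (sub_contexts e)
```

Note that \lstinline'not_stuck' rejects an addition whose operand is a stuck division even when the other operand can still reduce, whereas a division by zero inside an unevaluated conditional branch is harmless because that branch is not in an evaluation context.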
This means that the relationship between the non-deterministic and deterministic semantics is not a straightforward inclusion: the deterministic semantics accepts some executions where the stuckness check fails, but rejects the executions allowed by the non-deterministic semantics under a different expression reduction order.  Thus we make the executable version of the check optional so that we may implement either semantics.

We show the equivalence of the executable semantics to the original relational semantics in two parts, summarising the proofs from the formal Coq development.  First, we show that any successful execution of a step corresponds to some derivation in the original semantics, regardless of the strategy we choose.

different way to non-deterministic executions: expressions with no side effects are dealt with by a big-step relation, and effectful expressions use a small-step relation as before.  Here `effectful' includes both changes to the state and to control flow such as the conditional operator.  This prioritisation of the effectful subexpressions is also what causes the difference in evaluation order.  For example, \lstinline[language=C]'x+(x=1)' when \lstinline'x' $\not = 1$ gives a different answer with the second strategy because the assignment is evaluated first.  Despite the difference in construction, we can still encode the resulting evaluation order as a strategy:
\begin{enumerate}

executable semantics will perform the same steps, reaching the same end state as the big-step.
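The \lstinline[language=C]'x+(x=1)' example can be mimicked in a few lines of OCaml (a hypothetical sketch, not part of the development; \lstinline'eval_sum' and its parameters are invented for illustration): the addition's operands are a read of \lstinline'x' and an assignment to it, and the strategy decides which side effect happens first.

```ocaml
(* Hypothetical illustration of the evaluation-order difference for x+(x=1). *)
let eval_sum ~init ~effectful_first =
  let x = ref init in
  let read () = !x in
  let assign n = x := n; n in
  if effectful_first then begin
    (* second strategy: the effectful operand (the assignment) runs first *)
    let b = assign 1 in
    let a = read () in
    a + b
  end
  else begin
    (* plain left-to-right order: the read happens before the write *)
    let a = read () in
    let b = assign 1 in
    a + b
  end
```

With \lstinline'x' initially 0, left-to-right order yields $0 + 1 = 1$ while evaluating the effectful operand first yields $1 + 1 = 2$; when \lstinline'x' is already 1 the two orders agree, matching the side condition \lstinline'x' $\not = 1$ above.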
Then using some lemmas to show that the \lstinline'leftcontext's (the deterministic version of expression contexts) used in the semantics are found by our leftmost-innermost effectful subexpression strategy, we can show that any step of the deterministic semantics is performed by

some $t$.
\end{theorem}

Again, a precise version of the above argument is formalised in Coq.  Together with the connections between the non-deterministic semantics

Finally, we note that the execution of the semantics can be examined in the debugger packaged with OCaml when absolute certainty about the cause of a problem is required.  Indeed, during the testing below the debugger was used to retrieve extra information about the program state that was not included in the interpreter's error message.

\section{Testing}

There are two noteworthy aspects to this bug.  First, the compiler itself features the correct type check and works perfectly.  The error only affects the specification of the compiler; this is extra behaviour that has not been proved correct.  The semantics can be easily modified to use the same check, and the only other change required is to \emph{remove} some steps from the proof

The tests revealed that the semantics were missing some minor casting rules and that zero initialisation of file scope variables was not performed when initialising the memory model.  Both were handled correctly by the compiler, in the latter case by the informal code that produces the assembler output.
However, there is no easy way to mimic just the deterministic evaluation order, and no tricks in the driver code to make testing easier. to make testing easier.  Together with the bug fixes, this makes directly comparing test results with the current work infeasible, although some brief testing with the gcc-torture suite behaved as expected. The latest version of CompCert, 1.11, also features a more efficient memory model, improving the interpreter's performance. % NB: if I change the formatting of Clight, change ClightTSO too language that are not currently supported by CompCert.  An interpreter called \texttt{kcc} is derived from the semantics and has been tested against the gcc-torture test suite that we reused.  These semantics against the gcc-torture test suite that we reused.  They go further and perform coverage analysis to gauge the proportion of the semantics exercised by the tests. These semantics are not used for compiler verification, but are intended for studying (and ultimately verifying) the behaviour of C programs. We have shown that executable semantics can be useful enough for the necessary task of validating semantics to justify retrofitting them to existing verified compilers, and in the case of CompCert found a real compiler