# Changeset 1261

Ignore:
Timestamp:
Sep 23, 2011, 4:17:44 PM (8 years ago)
Message:

More of D3.2.

File:
1 edited

### Legend:

Unmodified
 r1235 We also report on progress enriching these transformations with dependent types so as to establish key invariants in each intermediate language, which will assist in future work proving correctness properties of the compiler. removes potential sources of failure within the compiler and will assist in future work proving correctness properties. \newpage % TODO: fix unicode in listings % TODO: make it clear that our correctness is intended to go beyond compcert's % TODO: capitalise deliverable/task? \section{Introduction} \begin{figure} \begin{center} \begin{minipage}{.8\linewidth} \begin{tabbing} \quad \= $\downarrow$ \quad \= \kill \textsf{RTLabs}\\ \> $\downarrow$ \> start of target specific backend\\ \ \  \dots \>\quad \vdots \end{tabbing} \end{minipage} \end{center} \caption{Front-end languages and transformations} \label{fig:summary} two intermediate languages involved are \begin{description} \item[\textsf{Cminor}] --- a C-like language where local variables are no longer held explicitly in memory and control structures are simpler \item[\textsf{RTLabs}] --- a language in the form of a control flow graph but retaining the front end operations from \textsf{Cminor} \item[\textsf{Cminor}] --- a C-like language where local variables are not explicitly allocated memory and control structures are simpler \item[\textsf{RTLabs}] --- a language in the form of a control flow graph which retains the values and front end operations from \textsf{Cminor} \end{description} More details on the formalisation of the syntax and semantics of these We have been tracking revisions to the prototype compiler during the development of the formalized version.  These were usually minor, with the exception of the following major revision to the compiler. development of the formalized version.  Most of these changes were minor, but one exception is a major change to the structure of the compiler. The original plan for the front-end featured a \textsf{Clight} to \textsf{Clight8} phase near the start which replaced all of the integer values and operations by 8 bit counterparts, while pointers were split into bytes at a later stage.  Experimentation found that it would be difficult to bytes at a later stage.  Experience has shown that it would be difficult to produce good code with this approach.  Instead, we now have: \begin{itemize} \item a cast removal stage which simplifies \textsf{Clight} expressions such as \begin{lstlisting}[language=C] (char)((int)x + (int)y) \begin{lstlisting}[language=C,belowskip=0pt] (char)((int)x + (int)y) \end{lstlisting} which are produced by integer promotion built into the CIL parser into equivalent operations on simpler types, \lstinline'x+y' in this case. into equivalent operations on simpler types, \lstinline'x+y' in this case. The cast removal is important because C requires \emph{arithmetic promotion} of integer types to (at least) \lstinline'int' before an operation is performed.  The \textsf{Clight} semantics do not perform the promotions, instead they are added as casts by the CIL-based parser.  However, our targets benefit immensely from performing operations on the smallest possible integer type, so it is important that we remove promotions where possible. \end{itemize} This document describes the formalized front end after these changes. \section{Clight phases} \section{\textsf{Clight} phases} In addition to the conversion to \textsf{Cminor}, there are several transformations that act directly on the \textsf{Clight} language. transformations which act directly on the \textsf{Clight} language. \subsection{Cast simplification} The prototype version worked by recognising fixed patterns in the \textsf{Clight} abstract syntax tree such as $(t)((t_1)e_1 op (t_2)e_2)$ subject to restrictions on the types, and replaces them with a simpler version.  These deep pattern matches are slightly awkward in \matita{} and they do not capture compositions of operations, such as $(t)((t_1)e_1\ op\ (t_2)e_2),$ subject to restrictions on the types.  These are replaced with a simpler version without the casts.  Such deep' pattern matching is slightly awkward in \matita{} and this approach does not capture compositions of operations, such as \begin{lstlisting}[language=C] (char)((int)a + (int)b + (int)c) (char)(((int)a + (int)b) + (int)c) \end{lstlisting} where \lstinline'a', \lstinline'b' and \lstinline'c' are of type \lstinline'char', because the intermediate expression is not cast to and from \lstinline'char'. the subexpression is of type \lstinline'char' and eliminates the cast. Moreover, when the recursive processing is complete the \lstinline'char' cast is also eliminated because its subexpression is now of the correct type. This has been formalized in \matita.  We have also performed a few proofs that is also eliminated because its subexpression is already of the correct type. This has been implemented in \matita.  We have also performed a few proofs that the arithmetic behind these changes is correct to gain confidence in the technique.  During task 3.4 we will extend these proofs to cover more necessary because \textsf{Clight} allows arbitrary \emph{lvalue} expressions as the destination for the returned value, but \textsf{Cminor} only allows local variables.  However, the variable names are supplied with the input program, but without any method for generating fresh names. local variables.  All other variable names in the \textsf{Cminor} program came from the \textsf{Clight} program, but we need to construct a method for generating fresh names for the temporaries. Our identifiers are based on binary numbers, and generation of fresh names is handled by keeping track of the highest allocated number.  Thus we create a new name generator from the input program by finding the maximum number used. \section{Cminor phases} Only two phases deal with \textsf{Cminor} programs: handled by keeping track of the highest allocated number.  Normally this is initialised at zero, but if initialised by the largest existing identifier in the \textsf{Clight} program then the generated names will be fresh. To do this, we extract the maximum identifier by recursively finding the maximum variable name used in every expression, statement and function of the program. \section{\textsf{Cminor} phases} \textsf{Cminor} programs are processed by two passes: one deals with the initialisation of global variables, and the other produces \textsf{RTLabs} code. \subsection{Initialisation code} initialisation data that this pass takes as input, and one with only size information that is the output.  In addition to reflecting the purpose of this pass in the types, it also ensures that it cannot be accidentally left out. \subsection{Conversion to RTLabs} pass in its type, it also ensures that the pass cannot be accidentally omitted. \subsection{Conversion to \textsf{RTLabs}} This pass breaks down the structure of the \textsf{Cminor} program into a One possible variation would be to explicitly define a state monad to carry the function under construction around, but it is not yet clear if this will make the correctness easier to prove. make the correctness results easier to prove. \section{Adding and using invariants} in nature, and will evolve during task 3.4. Invariants on \textsf{Cminor} programs are defined using a higher order predicate which applies a given predicate to a statement and recursively to each of its substatements (and expressions and subexpressions, respectively). Thus during the \textsf{Clight} to \textsf{Cminor} stage the values returned are not just \textsf{Cminor} expressions and statements, but dependent pairs that also return invariants to establish that only local variables appear in the generated terms, and all labels appearing in \texttt{goto} statements are defined in the result.  The latter is proved by showing two properties: first by checking whether the labels are in the set defined by the original \textsf{Clight} function body, and second that the translation preserves the set of label statements. These two properties are used at the end of translation to show the invariant attached to \textsf{Cminor} \emph{functions}: that all \texttt{goto} labels are defined in the body.  Similarly, the invariant for variables is slightly different to the translation's: we go from the fact that every variable was classified as local to the appearance of every variable in the locals or parameters.  This is done using a lemma showing that the classification only allows parameters and locals to be classed as local'. The next translation to \textsf{RTLabs}\dots The use of the invariants follows a common pattern.  Each language embeds invariants in the function record that constrain the function body by other information in the record (such as the list of local variables and types, or the set of labels declared).  However, during the transformations they typically need to be refined to constraints on individual statements and expressions with respect to data structures used in the transformation. A similar change in invariants is required between the transformation and the new function. For example, consider the use of local variables in the \textsf{Cminor} to \textsf{RTLabs} stage.  We start with \begin{lstlisting}[language=matita] record internal_function : Type[0] ≝ { f_return    : option typ ; f_params    : list (ident × typ) ; f_vars      : list (ident × typ) ; f_stacksize : nat ; f_body      : stmt ; f_inv       : stmt_P (λs.stmt_vars (λi.Exists ? (λx.\fst x = i) (f_params @ f_vars)) s ∧ stmt_labels (λl.Exists ? (λl'.l' = l) (labels_of f_body)) s) f_body }. \end{lstlisting} where the first half of \lstinline'f_inv' requires every variable in the function body to appear in the parameter or variable list.  In the translation to \textsf{RTLabs}, variable lookups are performed in a map to \textsf{RTLabs} pseudo-registers: \begin{lstlisting}[language=matita] definition env ≝ identifier_map SymbolTag register. let rec add_expr (le:label_env) (env:env) (ty:typ) (e:expr ty) (Env:expr_vars ty e (present ?? env)) (dst:register) (f:partial_fn le) on e : Σf':partial_fn le. fn_graph_included le f f' ≝ match e return λty,e.expr_vars ty e (present ?? env) → Σf':partial_fn le. fn_graph_included le f f' with [ Id _ i ⇒ λEnv. let r ≝ lookup_reg env i Env in ... \end{lstlisting} Note that \lstinline'lookup_reg' returns a register without any possibility of error.  The reason this works is that the pattern match on \lstinline'e' refines the type of the invariant \lstinline'Env' to a proof that the variable \lstinline'i' is present.  We then pass this proof to the lookup function to rule out failure. When this map \lstinline'env' is constructed at the start of the phase, we prove that the proof \lstinline'f_inv' from the function implies the invariant on variables needed by \lstinline'add_expr' and its equivalent on statements: \begin{lstlisting}[language=matita] lemma populates_env : ∀l,e,u,l',e',u'. populate_env e u l = 〈l',e',u'〉 → ∀i. Exists ? (λx.\fst x = i) l → present ?? e' i. \end{lstlisting} A similar mechanism is used to show that \texttt{goto} labels are always declared in the function. Also note the return type of \lstinline'add_expr' is a dependent pair.  We build the resulting \textsf{RTLabs} function incrementally, using a type \lstinline'partial_fn' that does not contain the final invariant for functions.  We always require the \lstinline'fn_graph_included' property for partially built functions to show that the graph only gets larger, a key part of the proof that the resulting control flow graph is closed.  Dependent pairs are used in a similar manner in the \textsf{Clight} to \textsf{Cminor} phase too. This work does not currently cover all of the possible sources of failure; in particular some structural constraints on functions are not yet covered and some properties of \textsf{RTLabs} programs that may be useful for later stages or the correctness proofs are not produced.  Moreover, we may experiment with variations to try to make the proof obligations and syntax simpler to deal with.  However, it does show that retrofitting these properties using dependent types in \matita{} is feasible. \section{Testing} To provide some early testing and bug fixing of this code we constructed it in concert with the executable semantics described in deliverable 3.3, and \matita{} term pretty printers in the prototype compiler.  Using these, we were able to test the phases individually and together by running programs within the proof assistant itself, and comparing the results with the expected output. \section{Conclusion}