Changeset 1261 for Deliverables/D3.2

Sep 23, 2011, 4:17:44 PM (10 years ago)

More of D3.2.

1 edited


  • Deliverables/D3.2/Report/report.tex

    r1235 r1261  
    107107We also report on progress enriching these transformations with dependent
    108108types so as to establish key invariants in each intermediate language, which
    109 will assist in future work proving correctness properties of the compiler.
     109removes potential sources of failure within the compiler and
     110will assist in future work proving correctness properties.
    118119% TODO: fix unicode in listings
    119120% TODO: make it clear that our correctness is intended to go beyond compcert's
     121% TODO: capitalise deliverable/task?
    139143\quad \= $\downarrow$ \quad \= \kill
    152156\> $\downarrow$ \> start of target specific backend\\
    153 \ \  \dots
     157\>\quad \vdots
    155161\caption{Front-end languages and transformations}
    160166two intermediate languages involved are
    162 \item[\textsf{Cminor}] --- a C-like language where local variables are no
    163 longer held explicitly in memory and control structures are simpler
    165 \item[\textsf{RTLabs}] --- a language in the form of a control flow graph but
    166 retaining the front end operations from \textsf{Cminor}
     168\item[\textsf{Cminor}] --- a C-like language where local variables are not
     169explicitly allocated memory and control structures are simpler
     171\item[\textsf{RTLabs}] --- a language in the form of a control flow graph
     172which retains the values and front end operations from \textsf{Cminor}
    168174More details on the formalisation of the syntax and semantics of these
    175181We have been tracking revisions to the prototype compiler during the
    176 development of the formalized version.  These were usually minor, with the
    177 exception of the following major revision to the compiler.
     182development of the formalized version.  Most of these changes were minor, but
     183one exception is a major change to the structure of the compiler.
    179185The original plan for the front-end featured a \textsf{Clight} to
    180186\textsf{Clight8} phase near the start which replaced all of the integer
    181187values and operations by 8 bit counterparts, while pointers were split into
    182 bytes at a later stage.  Experimentation found that it would be difficult to
     188bytes at a later stage.  Experience has shown that it would be difficult to
    183189produce good code with this approach.  Instead, we now have:
    187193\item a cast removal stage which simplifies \textsf{Clight} expressions such
    189 \begin{lstlisting}[language=C]
    190   (char)((int)x + (int)y)
     196    (char)((int)x + (int)y)
    192 which are produced by integer promotion built into the CIL parser into
    193 equivalent operations on simpler types, \lstinline'x+y' in this case.
     198into equivalent operations on simpler types, \lstinline'x+y' in this case.
     199The cast removal is important because C requires \emph{arithmetic promotion}
     200of integer types to (at least) \lstinline'int' before an operation is
     201performed.  The \textsf{Clight} semantics do not perform the promotions,
     202instead they are added as casts by the CIL-based parser.  However, our targets
     203benefit immensely from performing operations on the smallest possible integer
     204type, so it is important that we remove promotions where possible.
    196207This document describes the formalized front end after these changes.
    198 \section{Clight phases}
     209\section{\textsf{Clight} phases}
    200211In addition to the conversion to \textsf{Cminor}, there are several
    201 transformations that act directly on the \textsf{Clight} language.
     212transformations which act directly on the \textsf{Clight} language.
    203214\subsection{Cast simplification}
    211222The prototype version worked by recognising fixed patterns in the
    212223\textsf{Clight} abstract syntax tree such as
    213 \[ (t)((t_1)e_1 op (t_2)e_2) \]
    214 subject to restrictions on the types, and replaces them with a simpler
    215 version.  These deep pattern matches are slightly awkward in \matita{} and
    216 they do not capture compositions of operations, such as
     224\[ (t)((t_1)e_1\ op\ (t_2)e_2), \]
     225subject to restrictions on the types.  These are replaced with a simpler
     226version without the casts.  Such `deep' pattern matching is slightly awkward in
     227\matita{} and this approach does not capture compositions of operations, such as
    218 (char)((int)a + (int)b + (int)c)
     229    (char)(((int)a + (int)b) + (int)c)
    220231where \lstinline'a', \lstinline'b' and \lstinline'c' are of type
     232\lstinline'char', because the intermediate expression is not cast to and from
    227239the subexpression is of type \lstinline'char' and eliminates the cast.
    228240Moreover, when the recursive processing is complete the \lstinline'char' cast
    229 is also eliminated because its subexpression is now of the correct type.
    231 This has been formalized in \matita.  We have also performed a few proofs that
     241is also eliminated because its subexpression is already of the correct type.
     243This has been implemented in \matita.  We have also performed a few proofs that
    232244the arithmetic behind these changes is correct to gain confidence in the
    233245technique.  During task 3.4 we will extend these proofs to cover more
    267279necessary because \textsf{Clight} allows arbitrary \emph{lvalue} expressions
    268280as the destination for the returned value, but \textsf{Cminor} only allows
    269 local variables.  However, the variable names are supplied with the input
    270 program, but without any method for generating fresh names.
     281local variables.  All other variable names in the \textsf{Cminor} program came
     282from the \textsf{Clight} program, but we need to construct a method for
     283generating fresh names for the temporaries.
    272285Our identifiers are based on binary numbers, and generation of fresh names is
    273 handled by keeping track of the highest allocated number.  Thus we create a
    274 new name generator from the input program by finding the maximum number used.
    276 \section{Cminor phases}
    278 Only two phases deal with \textsf{Cminor} programs:
     286handled by keeping track of the highest allocated number.  Normally this is
     287initialised at zero, but if initialised by the largest existing identifier in
     288the \textsf{Clight} program then the generated names will be fresh.
     289To do this, we extract the maximum identifier by recursively finding the maximum
     290variable name used in every expression, statement and function of the program.
     292\section{\textsf{Cminor} phases}
     294\textsf{Cminor} programs are processed by two passes: one deals with the
     295initialisation of global variables, and the other produces \textsf{RTLabs}
    280298\subsection{Initialisation code}
    285303initialisation data that this pass takes as input, and one with only size
    286304information that is the output.  In addition to reflecting the purpose of this
    287 pass in the types, it also ensures that it cannot be accidentally left out.
    289 \subsection{Conversion to RTLabs}
     305pass in its type, it also ensures that the pass cannot be accidentally omitted.
     307\subsection{Conversion to \textsf{RTLabs}}
    291309This pass breaks down the structure of the \textsf{Cminor} program into a
    299317One possible variation would be to explicitly define a state monad to carry
    300318the function under construction around, but it is not yet clear if this will
    301 make the correctness easier to prove.
     319make the correctness results easier to prove.
    303321\section{Adding and using invariants}
    315333in nature, and will evolve during task 3.4.
    317 Invariants on \textsf{Cminor} programs are defined using a higher order
    318 predicate which applies a given predicate to a statement and recursively to
    319 each of its substatements (and expressions and subexpressions, respectively).
    321 Thus during the \textsf{Clight} to \textsf{Cminor} stage the values returned
    322 are not just \textsf{Cminor} expressions and statements, but dependent pairs
    323 that also return invariants to establish that only local variables appear in
    324 the generated terms, and all labels appearing in \texttt{goto} statements are
    325 defined in the result.  The latter is proved by showing two properties: first
    326 by checking whether the labels are in the set defined by the original
    327 \textsf{Clight} function body, and second that the translation preserves the
    328 set of label statements.
    330 These two properties are used at the end of translation to show the invariant
    331 attached to \textsf{Cminor} \emph{functions}: that all \texttt{goto} labels
    332 are defined in the body.  Similarly, the invariant for variables is slightly
    333 different to the translation's: we go from the fact that every variable was
    334 classified as local to the appearance of every variable in the locals or
    335 parameters.  This is done using a lemma showing that the classification only
    336 allows parameters and locals to be classed as `local'.
    338 The next translation to \textsf{RTLabs}\dots
     335The use of the invariants follows a common pattern.  Each language embeds
     336invariants in the function record that constrain the function body by other
     337information in the record (such as the list of local variables and types, or
     338the set of labels declared).  However, during the transformations they
     339typically need to be refined to constraints on individual statements and
     340expressions with respect to data structures used in the transformation.
     341A similar change in invariants is required between the transformation and the
     342new function.
     344For example, consider the use of local variables in the \textsf{Cminor} to
     345\textsf{RTLabs} stage.  We start with
     347record internal_function : Type[0] ≝
     348{ f_return    : option typ
     349; f_params    : list (ident × typ)
     350; f_vars      : list (ident × typ)
     351; f_stacksize : nat
     352; f_body      : stmt
     353; f_inv       : stmt_P (λs.stmt_vars (λi.Exists ? (λx.\fst x = i) (f_params @ f_vars)) s ∧
     354                           stmt_labels (λl.Exists ? (λl'.l' = l) (labels_of f_body)) s) f_body
     357where the first half of \lstinline'f_inv' requires every variable in the
     358function body to appear in the parameter or variable list.  In the translation
     359to \textsf{RTLabs}, variable lookups are performed in a map to \textsf{RTLabs}
     362definition env ≝ identifier_map SymbolTag register.
     364let rec add_expr (le:label_env) (env:env) (ty:typ) (e:expr ty)
     365                   (Env:expr_vars ty e (present ?? env))
     366                   (dst:register) (f:partial_fn le) on e
     367     : Σf':partial_fn le. fn_graph_included le f f' ≝
     368match e return λty,e.expr_vars ty e (present ?? env) →
     369                 Σf':partial_fn le. fn_graph_included le f f' with
     370[ Id _ i ⇒ λEnv.
     371    let r ≝ lookup_reg env i Env in
     372    ...
     374Note that \lstinline'lookup_reg' returns a register without any possibility of
     375error.  The reason this works is that the pattern match on \lstinline'e'
     376refines the type of the invariant \lstinline'Env' to a proof that the variable
     377\lstinline'i' is present.  We then pass this proof to the lookup function to
     378rule out failure.
     380When this map \lstinline'env' is constructed at the start of the phase, we
     381prove that the proof \lstinline'f_inv' from the function implies the invariant
     382on variables needed by \lstinline'add_expr' and its equivalent on statements:
     384lemma populates_env : ∀l,e,u,l',e',u'.
     385  populate_env e u l = 〈l',e',u'〉 →
     386  ∀i. Exists ? (λx.\fst x = i) l →
     387      present ?? e' i.
     389A similar mechanism is used to show that \texttt{goto} labels are always
     390declared in the function.
     392Also note the return type of \lstinline'add_expr' is a dependent pair.  We
     393build the resulting \textsf{RTLabs} function incrementally, using a type
     394\lstinline'partial_fn' that does not contain the final invariant for
     395functions.  We always require the \lstinline'fn_graph_included' property for
     396partially built functions to show that the graph only gets larger, a key part
     397of the proof that the resulting control flow graph is closed.  Dependent pairs
     398are used in a similar manner in the \textsf{Clight} to \textsf{Cminor} phase
     401This work does not currently cover all of the possible sources of failure; in
     402particular some structural constraints on functions are not yet covered and
     403some properties of \textsf{RTLabs} programs that may be useful for later
     404stages or the correctness proofs are not produced.  Moreover, we may
     405experiment with variations to try to make the proof obligations and syntax
     406simpler to deal with.  However, it does show that retrofitting these
     407properties using dependent types in \matita{} is feasible.
     411To provide some early testing and bug fixing of this code we constructed it in
     412concert with the executable semantics described in deliverable 3.3, and
     413\matita{} term pretty printers in the prototype compiler.  Using these, we
     414were able to test the phases individually and together by running programs
     415within the proof assistant itself, and comparing the results with the expected
Note: See TracChangeset for help on using the changeset viewer.