Changeset 1265


Ignore:
Timestamp:
Sep 26, 2011, 11:29:19 AM (8 years ago)
Author:
campbell
Message:

Revisions to D3.3.

File:
1 edited

Legend:

Unmodified
Added
Removed
  • Deliverables/D3.3/Report/report.tex

    r1235 r1265  
    101101We report on the formalization of the front end intermediate languages for the
    102102\cerco{} project's compiler using executable semantics in the \matita{} proof
    103 assistant.
     103assistant.  This includes common definitions for fresh identifier handling,
     104$n$-bit arithmetic and operations and testing of the smeantics.  We also
     105consider embedding invariants into the semantics for showing correctness
     106properties.
    104107
    105108\newpage
     
    112115% TODO: mention the restricted form of types after Clight
    113116% TODO: fix unicode in listings
     117% TODO: say something about what CIC is, and why we're doing this.
    114118
    115119\section{Introduction}
     
    117121This work formalizes the front end intermediate languages from the \cerco{}
    118122prototype compiler, described in previous deliverables~\cite{d2.1,d2.2}.  The
    119 front end of the compiler is summarized in Figure~\ref{fig:summary}.  We
    120 have also refined parts of the formal development that are used for several of
    121 the languages in the compiler.
    122 
    123 We will also report on work to add several invariants to the languages.  This
    124 activity overlaps with task 3.4 on the correctness of the compiler front end.
    125 However, the use of dependent types mean that this work is tied closely to the
    126 definition of the languages and the transformations of the front end in task
    127 3.2.  By considering it now we can experiment with and judge its impact on the
    128 formal semantics, and how feasible retrofitting such invariants is.
     123front end of the compiler is summarized in Figure~\ref{fig:summary} including
     124the intermediate languages and the compiler passes described in the
     125accompanying deliverable 3.2.  We have also refined parts of the formal
     126development that are used for several of the languages in the compiler.
    129127
    130128The input language to the formalized front end is the \textsf{Clight}
    131129language.  The executable semantics for this language were presented in a
    132 previous deliverable~\cite{d3.1}.  In the present work we will report on some
    133 minor changes to the semantics made to better align it with the whole
     130previous deliverable~\cite{d3.1}.  Here we will report on some
     131minor changes to its semantics made to better align it with the whole
    134132development.
    135133
     134Finally, we will report on work to add several invariants to the
     135languages.  This activity overlaps with task 3.4 on the correctness of the
     136compiler front end.  However, the use of dependent types mean that this work
     137is tied closely to the definition of the languages and the transformations of
     138the front end in task 3.2.  By considering it now we can experiment with and
     139judge its impact on the formal semantics, and how feasible retrofitting such
     140invariants is.
     141
    136142\begin{figure}
     143\begin{center}
     144\begin{minipage}{.8\linewidth}
    137145\begin{tabbing}
    138146\quad \= $\downarrow$ \quad \= \kill
     
    150158\textsf{RTLabs}\\
    151159\> $\downarrow$ \> start of target specific backend\\
    152 \ \  \dots
     160\>\quad \vdots
    153161\end{tabbing}
     162\end{minipage}
     163\end{center}
    154164\caption{Front end languages and transformations}
    155165\label{fig:summary}
     
    158168\subsection{Revisions to the prototype compiler}
    159169
    160 Ongoing work to maintain and improve the prototype compiler resulted in
    161 several changes.  Most importantly, the transformations to replace 16 and 32
    162 bit types have been moved from the \textsf{Clight} language to the
    163 target-specific stage between \textsf{RTLabs} and \textsf{RTL} to help
    164 generate better code, and the addition of a \textsf{Clight} cast removal
    165 transformation to reduce the number of 16 and 32 bit operations.
     170Ongoing work to maintain and improve the prototype compiler has resulted in
     171several changes, mostly minor.  The most important change is that the
     172transformations to replace 16 and 32 bit types have been moved from the
     173\textsf{Clight} language to the target-specific stage between \textsf{RTLabs}
     174and \textsf{RTL} to help generate better code, and the addition of a
     175\textsf{Clight} cast removal transformation to reduce the number of 16 and 32
     176bit operations.
    166177
    167178The formalized semantics have tracked these changes, and in this report we
     
    177188we will revise the memory model to make it better suited for describing the
    178189back-end of our compiler.}.  The failure and I/O monad was introduced in the
    179 previous deliverable on \textsf{Clight}~\cite[\S4.2]{d3.1}.  The other parts
     190previous deliverable on \textsf{Clight}~\cite[\S4.2]{d3.1}.  In all of the
     191languages except \textsf{Clight} we have a basic form of type, \lstinline'typ'
     192identifying integers and pointers along with their sizes.  The other parts
    180193are discussed below, with the only change to the runtime values being the
    181194representation of integers.
     
    183196\subsection{Identifiers}
    184197
    185 Each language features one or more kinds of names to represent variables,
    186 registers, \texttt{goto} labels and/or RTL graph labels.  We also need to
    187 describe various maps keyed on identifiers in the semantics and during
    188 compilation.
     198Each language features one or more kinds of names to represent variables, such
     199as registers, \texttt{goto} labels or RTL graph labels.  We also need to
     200describe various maps whose domain is a set of identifiers when defining the
     201semantics and compilation.
    189202
    190203Previous work on the executable semantics of the target machine
     
    194207identifiers and maps respectively.
    195208
    196 However, given the wide variety of identifiers used in the compiler we also
     209One difficulty with using fixed size bit vectors for identifiers is that
     210fresh name generation can fail if generate too many.  While we use an error
     211monad to deal with failures, we wish to minimize its use in the compiler.
     212Thus we add a flag to detect overflows, and check it after the phase of the
     213compiler is complete to report exhaustion.  The rest of the phase can then be
     214written as if name generation always succeeds.  In practice this will never
     215occur on normal programs because more identifiers of each sort are available
     216than bytes of code memory on the target.
     217
     218Given the wide variety of identifiers used in the compiler we also
    197219wish to separate the different classes of identifier.  Thus we encapsulate the
    198220bit vector representing the identifier in a datatype that also carries a tag
    199221identifying which class of identifier we are using:
    200222\begin{lstlisting}
    201 inductive identifier (tag:String) : Type[0] ≝
    202   an_identifier : Word → identifier tag.
    203 \end{lstlisting}
    204 The tries are also tagged in the same manner.
    205 
    206 One difficultly with using fixed size bit vectors for identifiers is that
    207 fresh name generation can fail if generate too many.  While we use an error
    208 monad to deal with failures, we wish to minimize its use in the compiler.
    209 Thus we add a flag to detect overflows, and check it after the phase of the
    210 compiler is complete to report exhaustion.  The rest of the phase can then be
    211 written as if name generation always succeeds.
     223    inductive identifier (tag:String) : Type[0] ≝
     224      an_identifier : Word → identifier tag.
     225\end{lstlisting}
     226The tries are also tagged in the same manner.  These tags have also proved
     227useful during testing by making the resulting terms more readable.
    212228
    213229\subsection{Machine integers and arithmetic}
     
    227243pointers is treated as a different operation from subtraction of integers.
    228244
    229 A common semantics is given for these operations, as simple functions on the
    230 operation and runtime values.
    231 
    232 \section{Clight}
     245A common semantics is given for these operations in the form of simple
     246CIC functions on the operation and runtime values.
     247
     248\section{\textsf{Clight} modifications}
    233249
    234250The \textsf{Clight} input language remained largely the same as in the
    235251previous deliverable~\cite{d3.1}.  The principal changes were to use the
    236252identifiers and arithmetic described above in place of the arbitrarily large
    237 integers used above.  For the identifiers, this relieved us of the burden of
     253integers used before.  For the identifiers, this relieved us of the burden of
    238254adding an efficient datatype for maps by reusing the bit vector tries instead.
    239255
    240 The arithmetic replaced a dependent pair of an arbitrary integer and proof
     256The arithmetic replaced a dependent pair of an arbitrary integer and a proof
    241257that it was in range of 32 bit integers by the exact bit vector for each
    242258size of integer.  This direct approach is closer to the implementation and
    243259more obviously correct --- no extra precision can be left in by accident.
    244260
    245 \section{Cminor}
     261\section{\textsf{Cminor}}
    246262
    247263The \textsf{Cminor} language does not store local variables in memory, and has
    248 simpler control structures than \textsf{Clight}.  The syntax is similar to the
    249 prototype, except that the types attached to expressions are restricted so
    250 that some corner cases are ruled out in the \textsf{Cminor} to \textsf{RTLabs}
    251 stage (see the accompanying deliverable 3.2 for details):
     264simpler control structures than \textsf{Clight}.  It is similar in nature to
     265the \textsf{Cminor} language in CompCert, although the semantics have been
     266based on the \cerco{} prototype rather than ported from CompCert.  The syntax
     267is similar to the prototype, except that the types attached to expressions are
     268restricted so that some corner cases are ruled out in the \textsf{Cminor} to
     269\textsf{RTLabs} stage (see the accompanying deliverable 3.2 for details):
    252270\begin{lstlisting}
    253 inductive expr : typ → Type[0] ≝
    254 | Id : ∀t. ident → expr t
    255 | Cst : ∀t. constant → expr t
    256 | Op1 : ∀t,t'. unary_operation → expr t → expr t'
    257 | Op2 : ∀t1,t2,t'. binary_operation → expr t1 → expr t2 → expr t'
    258 | Mem : ∀t,r. memory_chunk → expr (ASTptr r) → expr t
    259 | Cond : ∀sz,sg,t. expr (ASTint sz sg) → expr t → expr t → expr t
    260 | Ecost : ∀t. costlabel → expr t → expr t.
    261 \end{lstlisting}
     271    inductive expr : typ → Type[0] ≝
     272    | Id : ∀t. ident → expr t
     273    | Cst : ∀t. constant → expr t
     274    | Op1 : ∀t,t'. unary_operation → expr t → expr t'
     275    | Op2 : ∀t1,t2,t'. binary_operation → expr t1 → expr t2 → expr t'
     276    | Mem : ∀t,r. memory_chunk → expr (ASTptr r) → expr t
     277    | Cond : ∀sz,sg,t. expr (ASTint sz sg) → expr t → expr t → expr t
     278    | Ecost : ∀t. costlabel → expr t → expr t.
     279\end{lstlisting}
     280For example, note that conditional expressions only switch on integer
     281expressions.
    262282In principle we could extend this to statically ensure that only well-typed
    263283\textsf{Cminor} expressions are considered, and we will consider this as part
     
    272292environments, arithmetic and operations mentioned above.
    273293
    274 \section{RTLabs}
     294\section{\textsf{RTLabs}}
    275295
    276296The \textsf{RTLabs} language provides a target independent Register Transfer
     
    281301a \texttt{goto} label from Cminor where a graph label should appear).
    282302
    283 Otherwise, the syntax and semantics of RTLabs mirrors that of the prototype
    284 compiler.  The same common elements are used as for \textsf{Cminor}, including
    285 the front end operations mentioned above.
     303Otherwise, the syntax and semantics of \textsf{RTLabs} mirrors that of the
     304prototype compiler.  The same common elements are used as for \textsf{Cminor},
     305including the front end operations mentioned above.
     306
     307% TODO: give fragment of syntax, in particular the type of the graph
    286308
    287309\section{Testing}
     
    290312support the testing described in the accompanying deliverable 3.2, we have
    291313adapted the pretty printers in the prototype compiler to produce \matita{}
    292 terms in the syntax for each language described above.
     314terms for the syntax of each language described above.
    293315
    294316A few common definitions were added for animating the small step semantics
     
    296318small selection of test cases to ensure basic functionality.  However, this is
    297319still a time consuming process, so more testing will carried out once the
    298 extraction to \ocaml{} programs is implemented in \matita.
     320extraction of CIC terms to \ocaml{} programs is implemented in \matita.
    299321
    300322\section{Embedding invariants}
     
    305327a \texttt{break} statement outside of a loop or \texttt{switch}, and so on.
    306328We wish to restrict our intermediate languages using dependent types to remove
    307 as many `junk' programs as possible.  We also hope that such restrictions will
    308 help in other correctness proofs.
     329as many `junk' programs as possible to rule out such failures.  We also hope
     330that such restrictions will help in other correctness proofs.
    309331
    310332This goal lies in the overlap between several tasks in the project: it
     
    315337repository and the final form will be decided and merged in during task 3.4.
    316338
    317 So far we have tried adding two forms of invariant --- one by using dependent
     339So far we have tried adding two forms of invariant --- one using dependent
    318340types to index statements in \textsf{Cminor} by their block depth, and the
    319341other asserts that variables and labels are present in the appropriate
     
    323345to globals are well-defined.
    324346
    325 \subsection{Cminor block depth}
    326 
    327 The \textsf{Cminor} language simplifies loops with a single infinite loop
    328 construct and the switch statement does not contain the individual cases.
    329 Instead, statements are provided for infinite loops, non-looping blocks and
    330 exiting an arbitrary number of blocks (which is also what the switch statement
    331 does\footnote{We are considering replacing the \textsf{Cminor} switch
    332 statement with one that uses \texttt{goto}-like labels in both the prototype
    333 and the formalized compilers, but for now we stick with this CompCert-like
    334 arrangement.}.
     347\subsection{\textsf{Cminor} block depth}
     348
     349The \textsf{Cminor} language has relatively simple control structures.
     350Statements are provided for infinite loops, non-looping blocks and
     351exiting an arbitrary number of blocks (for \texttt{break}, failing loop guards
     352and the switch statement\footnote{We are considering replacing the
     353\textsf{Cminor} switch statement with one that uses \texttt{goto}-like labels
     354in both the prototype and the formalized compilers, but for now we stick with
     355this CompCert-style arrangement.}).
    335356
    336357However, this means that there are badly-formed \textsf{Cminor} programs such
    337358as
    338359\begin{lstlisting}[language={}]
    339   int main() {
    340     block {
    341       exit 5
     360    int main() {
     361      block {
     362        loop {
     363          exit 5
     364        }
     365      }
    342366    }
    343   }
    344367\end{lstlisting}
    345368where we attempt to exit more blocks than exist.  To rule these out (including
     
    351374integers in the exit and switch statements:
    352375\begin{lstlisting}[language=matita]
    353 inductive stmt : ∀blockdepth:nat. Type[0] ≝
    354 | St_skip : ∀n. stmt n
    355 ...
    356 | St_loop : ∀n. stmt n → stmt n
    357 | St_block : ∀n. stmt (S n) → stmt n
    358 | St_exit : ∀n. Fin n → stmt n
    359 (* expr to switch on, table of <switch value, #blocks to exit>, default *)
    360 | St_switch : expr → ∀n. list (int × (Fin n)) → Fin n → stmt n
    361 ...
    362 \end{lstlisting}
    363 where \texttt{stmt n} is a statement enclosed in \texttt{n} blocks or loops,
     376    inductive stmt : ∀blockdepth:nat. Type[0] ≝
     377    | St_skip : ∀n. stmt n
     378    ...
     379    | St_loop : ∀n. stmt n → stmt n
     380    | St_block : ∀n. stmt (S n) → stmt n
     381    | St_exit : ∀n. Fin n → stmt n
     382    (* expr to switch on, table of <switch value, #blocks to exit>, default *)
     383    | St_switch : expr → ∀n. list (int × (Fin n)) → Fin n → stmt n
     384    ...
     385\end{lstlisting}
     386where \texttt{stmt n} is a statement enclosed in \texttt{n} blocks,
    364387and \texttt{Fin n} is a standard construction for a natural number which is at
    365388most \texttt{n}.
     
    368391state, and the function to find the continuation from an exit statement can be
    369392made failure-free.
    370 
    371393We note in passing that adding this parameter detected a small mistake in the
    372394semantics concerning continuations and tail calls, although the mistake itself
     
    375397\subsection{Identifier invariants}
    376398
    377 For showing that the variables and labels occurring in the body of a function
    378 are present in the relevant structures are add an additional invariant to the
     399To show that the variables and labels occurring in the body of a function
     400are present in the relevant structures we add an additional invariant to the
    379401function records.
    380402
    381 For \textsf{Cminor} we add a higher-order predicate which recursively applies
     403For \textsf{Cminor} we use a higher-order predicate which recursively applies
    382404a predicate to all substatements:
    383405\begin{lstlisting}[language=matita]
     
    393415].
    394416\end{lstlisting}
    395 Dependent pattern matching on statements thus allows \lstinline'stmt_P' to
    396 unfold to the predicate on the current statement as well as the predicate
    397 applied to all substatements.
    398 
    399 We use two properties in \textsf{Cminor}:
     417Dependent pattern matching on statements thus allows an accompanying
     418\lstinline'stmt_P' fact to be unfold to the predicate on the current statement
     419and the predicate applied to all substatements.
     420
     421We require two properties to hold in \textsf{Cminor} functions:
    400422\begin{enumerate}
    401 \item All variables are present in the list of parameters or the list of
    402 variables for the function (this also uses a similar recursive predicate on
    403 expressions).
     423\item All variables in the body are present in the list of parameters or the
     424list of variables for the function (this also uses a similar recursive
     425predicate on expressions).
    404426\item All labels in \texttt{goto} statements appear in a label statement.
    405427\end{enumerate}
     
    416438}.
    417439\end{lstlisting}
    418 where \lstinline'stmt_vars' and \lstinline'stmt_labels' return the
     440where \lstinline'stmt_vars' and \lstinline'stmt_labels' constrain the
    419441variables and labels that appear directly in a statement (but not
    420 substatements), and
     442substatements) to appear in the given list, and
    421443\lstinline'labels_of' returns a list of all the labels defined in a
    422444statement.
     
    427449state and continuations.  It was convenient to split the continuations between
    428450the local continuation representing the rest of the code to be executed within
    429 the function, and the stack of function calls.  The invariant for variables is
     451the function, and the stack of function calls.
     452% TODO: because???
     453  The invariant for variables is
    430454slightly different --- we require that every variable appear in the local
    431455environment.  We use \lstinline'f_inv' from the function to establish this
     
    435459witnesses that the invariants are those we wanted, but makes no difference to
    436460the actual execution of the program, especially as the execution can still
    437 fail due to runtime errors.  Moreover, it is unclear what effect the presence
    438 of proof terms and more dependent pattern matching in the semantics will have
    439 on the complexity of future correctness proofs.  We plan to examine this issue
    440 during task 3.4.
     461fail due to genuine runtime errors.  Moreover, it is unclear what effect the
     462presence of proof terms and more dependent pattern matching in the semantics
     463will have on the complexity of future correctness proofs.  We plan to examine
     464this issue during task 3.4.
    441465
    442466We use a similar method to specify the invariant that the \textsf{RTLabs}
Note: See TracChangeset for help on using the changeset viewer.