# Changeset 1265 for Deliverables/D3.3/Report

Ignore:
Timestamp:
Sep 26, 2011, 11:29:19 AM (9 years ago)
Message:

Revisions to D3.3.

File:
1 edited

### Legend:

Unmodified
 r1235 We report on the formalization of the front end intermediate languages for the \cerco{} project's compiler using executable semantics in the \matita{} proof assistant. assistant.  This includes common definitions for fresh identifier handling, $n$-bit arithmetic and operations and testing of the smeantics.  We also consider embedding invariants into the semantics for showing correctness properties. \newpage % TODO: mention the restricted form of types after Clight % TODO: fix unicode in listings % TODO: say something about what CIC is, and why we're doing this. \section{Introduction} This work formalizes the front end intermediate languages from the \cerco{} prototype compiler, described in previous deliverables~\cite{d2.1,d2.2}.  The front end of the compiler is summarized in Figure~\ref{fig:summary}.  We have also refined parts of the formal development that are used for several of the languages in the compiler. We will also report on work to add several invariants to the languages.  This activity overlaps with task 3.4 on the correctness of the compiler front end. However, the use of dependent types mean that this work is tied closely to the definition of the languages and the transformations of the front end in task 3.2.  By considering it now we can experiment with and judge its impact on the formal semantics, and how feasible retrofitting such invariants is. front end of the compiler is summarized in Figure~\ref{fig:summary} including the intermediate languages and the compiler passes described in the accompanying deliverable 3.2.  We have also refined parts of the formal development that are used for several of the languages in the compiler. The input language to the formalized front end is the \textsf{Clight} language.  The executable semantics for this language were presented in a previous deliverable~\cite{d3.1}.  In the present work we will report on some minor changes to the semantics made to better align it with the whole previous deliverable~\cite{d3.1}.  Here we will report on some minor changes to its semantics made to better align it with the whole development. Finally, we will report on work to add several invariants to the languages.  This activity overlaps with task 3.4 on the correctness of the compiler front end.  However, the use of dependent types mean that this work is tied closely to the definition of the languages and the transformations of the front end in task 3.2.  By considering it now we can experiment with and judge its impact on the formal semantics, and how feasible retrofitting such invariants is. \begin{figure} \begin{center} \begin{minipage}{.8\linewidth} \begin{tabbing} \quad \= $\downarrow$ \quad \= \kill \textsf{RTLabs}\\ \> $\downarrow$ \> start of target specific backend\\ \ \  \dots \>\quad \vdots \end{tabbing} \end{minipage} \end{center} \caption{Front end languages and transformations} \label{fig:summary} \subsection{Revisions to the prototype compiler} Ongoing work to maintain and improve the prototype compiler resulted in several changes.  Most importantly, the transformations to replace 16 and 32 bit types have been moved from the \textsf{Clight} language to the target-specific stage between \textsf{RTLabs} and \textsf{RTL} to help generate better code, and the addition of a \textsf{Clight} cast removal transformation to reduce the number of 16 and 32 bit operations. Ongoing work to maintain and improve the prototype compiler has resulted in several changes, mostly minor.  The most important change is that the transformations to replace 16 and 32 bit types have been moved from the \textsf{Clight} language to the target-specific stage between \textsf{RTLabs} and \textsf{RTL} to help generate better code, and the addition of a \textsf{Clight} cast removal transformation to reduce the number of 16 and 32 bit operations. The formalized semantics have tracked these changes, and in this report we we will revise the memory model to make it better suited for describing the back-end of our compiler.}.  The failure and I/O monad was introduced in the previous deliverable on \textsf{Clight}~\cite[\S4.2]{d3.1}.  The other parts previous deliverable on \textsf{Clight}~\cite[\S4.2]{d3.1}.  In all of the languages except \textsf{Clight} we have a basic form of type, \lstinline'typ' identifying integers and pointers along with their sizes.  The other parts are discussed below, with the only change to the runtime values being the representation of integers. \subsection{Identifiers} Each language features one or more kinds of names to represent variables, registers, \texttt{goto} labels and/or RTL graph labels.  We also need to describe various maps keyed on identifiers in the semantics and during compilation. Each language features one or more kinds of names to represent variables, such as registers, \texttt{goto} labels or RTL graph labels.  We also need to describe various maps whose domain is a set of identifiers when defining the semantics and compilation. Previous work on the executable semantics of the target machine identifiers and maps respectively. However, given the wide variety of identifiers used in the compiler we also One difficulty with using fixed size bit vectors for identifiers is that fresh name generation can fail if generate too many.  While we use an error monad to deal with failures, we wish to minimize its use in the compiler. Thus we add a flag to detect overflows, and check it after the phase of the compiler is complete to report exhaustion.  The rest of the phase can then be written as if name generation always succeeds.  In practice this will never occur on normal programs because more identifiers of each sort are available than bytes of code memory on the target. Given the wide variety of identifiers used in the compiler we also wish to separate the different classes of identifier.  Thus we encapsulate the bit vector representing the identifier in a datatype that also carries a tag identifying which class of identifier we are using: \begin{lstlisting} inductive identifier (tag:String) : Type[0] ≝ an_identifier : Word → identifier tag. \end{lstlisting} The tries are also tagged in the same manner. One difficultly with using fixed size bit vectors for identifiers is that fresh name generation can fail if generate too many.  While we use an error monad to deal with failures, we wish to minimize its use in the compiler. Thus we add a flag to detect overflows, and check it after the phase of the compiler is complete to report exhaustion.  The rest of the phase can then be written as if name generation always succeeds. inductive identifier (tag:String) : Type[0] ≝ an_identifier : Word → identifier tag. \end{lstlisting} The tries are also tagged in the same manner.  These tags have also proved useful during testing by making the resulting terms more readable. \subsection{Machine integers and arithmetic} pointers is treated as a different operation from subtraction of integers. A common semantics is given for these operations, as simple functions on the operation and runtime values. \section{Clight} A common semantics is given for these operations in the form of simple CIC functions on the operation and runtime values. \section{\textsf{Clight} modifications} The \textsf{Clight} input language remained largely the same as in the previous deliverable~\cite{d3.1}.  The principal changes were to use the identifiers and arithmetic described above in place of the arbitrarily large integers used above.  For the identifiers, this relieved us of the burden of integers used before.  For the identifiers, this relieved us of the burden of adding an efficient datatype for maps by reusing the bit vector tries instead. The arithmetic replaced a dependent pair of an arbitrary integer and proof The arithmetic replaced a dependent pair of an arbitrary integer and a proof that it was in range of 32 bit integers by the exact bit vector for each size of integer.  This direct approach is closer to the implementation and more obviously correct --- no extra precision can be left in by accident. \section{Cminor} \section{\textsf{Cminor}} The \textsf{Cminor} language does not store local variables in memory, and has simpler control structures than \textsf{Clight}.  The syntax is similar to the prototype, except that the types attached to expressions are restricted so that some corner cases are ruled out in the \textsf{Cminor} to \textsf{RTLabs} stage (see the accompanying deliverable 3.2 for details): simpler control structures than \textsf{Clight}.  It is similar in nature to the \textsf{Cminor} language in CompCert, although the semantics have been based on the \cerco{} prototype rather than ported from CompCert.  The syntax is similar to the prototype, except that the types attached to expressions are restricted so that some corner cases are ruled out in the \textsf{Cminor} to \textsf{RTLabs} stage (see the accompanying deliverable 3.2 for details): \begin{lstlisting} inductive expr : typ → Type[0] ≝ | Id : ∀t. ident → expr t | Cst : ∀t. constant → expr t | Op1 : ∀t,t'. unary_operation → expr t → expr t' | Op2 : ∀t1,t2,t'. binary_operation → expr t1 → expr t2 → expr t' | Mem : ∀t,r. memory_chunk → expr (ASTptr r) → expr t | Cond : ∀sz,sg,t. expr (ASTint sz sg) → expr t → expr t → expr t | Ecost : ∀t. costlabel → expr t → expr t. \end{lstlisting} inductive expr : typ → Type[0] ≝ | Id : ∀t. ident → expr t | Cst : ∀t. constant → expr t | Op1 : ∀t,t'. unary_operation → expr t → expr t' | Op2 : ∀t1,t2,t'. binary_operation → expr t1 → expr t2 → expr t' | Mem : ∀t,r. memory_chunk → expr (ASTptr r) → expr t | Cond : ∀sz,sg,t. expr (ASTint sz sg) → expr t → expr t → expr t | Ecost : ∀t. costlabel → expr t → expr t. \end{lstlisting} For example, note that conditional expressions only switch on integer expressions. In principle we could extend this to statically ensure that only well-typed \textsf{Cminor} expressions are considered, and we will consider this as part environments, arithmetic and operations mentioned above. \section{RTLabs} \section{\textsf{RTLabs}} The \textsf{RTLabs} language provides a target independent Register Transfer a \texttt{goto} label from Cminor where a graph label should appear). Otherwise, the syntax and semantics of RTLabs mirrors that of the prototype compiler.  The same common elements are used as for \textsf{Cminor}, including the front end operations mentioned above. Otherwise, the syntax and semantics of \textsf{RTLabs} mirrors that of the prototype compiler.  The same common elements are used as for \textsf{Cminor}, including the front end operations mentioned above. % TODO: give fragment of syntax, in particular the type of the graph \section{Testing} support the testing described in the accompanying deliverable 3.2, we have adapted the pretty printers in the prototype compiler to produce \matita{} terms in the syntax for each language described above. terms for the syntax of each language described above. A few common definitions were added for animating the small step semantics small selection of test cases to ensure basic functionality.  However, this is still a time consuming process, so more testing will carried out once the extraction to \ocaml{} programs is implemented in \matita. extraction of CIC terms to \ocaml{} programs is implemented in \matita. \section{Embedding invariants} a \texttt{break} statement outside of a loop or \texttt{switch}, and so on. We wish to restrict our intermediate languages using dependent types to remove as many junk' programs as possible.  We also hope that such restrictions will help in other correctness proofs. as many junk' programs as possible to rule out such failures.  We also hope that such restrictions will help in other correctness proofs. This goal lies in the overlap between several tasks in the project: it repository and the final form will be decided and merged in during task 3.4. So far we have tried adding two forms of invariant --- one by using dependent So far we have tried adding two forms of invariant --- one using dependent types to index statements in \textsf{Cminor} by their block depth, and the other asserts that variables and labels are present in the appropriate to globals are well-defined. \subsection{Cminor block depth} The \textsf{Cminor} language simplifies loops with a single infinite loop construct and the switch statement does not contain the individual cases. Instead, statements are provided for infinite loops, non-looping blocks and exiting an arbitrary number of blocks (which is also what the switch statement does\footnote{We are considering replacing the \textsf{Cminor} switch statement with one that uses \texttt{goto}-like labels in both the prototype and the formalized compilers, but for now we stick with this CompCert-like arrangement.}. \subsection{\textsf{Cminor} block depth} The \textsf{Cminor} language has relatively simple control structures. Statements are provided for infinite loops, non-looping blocks and exiting an arbitrary number of blocks (for \texttt{break}, failing loop guards and the switch statement\footnote{We are considering replacing the \textsf{Cminor} switch statement with one that uses \texttt{goto}-like labels in both the prototype and the formalized compilers, but for now we stick with this CompCert-style arrangement.}). However, this means that there are badly-formed \textsf{Cminor} programs such as \begin{lstlisting}[language={}] int main() { block { exit 5 int main() { block { loop { exit 5 } } } } \end{lstlisting} where we attempt to exit more blocks than exist.  To rule these out (including integers in the exit and switch statements: \begin{lstlisting}[language=matita] inductive stmt : ∀blockdepth:nat. Type[0] ≝ | St_skip : ∀n. stmt n ... | St_loop : ∀n. stmt n → stmt n | St_block : ∀n. stmt (S n) → stmt n | St_exit : ∀n. Fin n → stmt n (* expr to switch on, table of , default *) | St_switch : expr → ∀n. list (int × (Fin n)) → Fin n → stmt n ... \end{lstlisting} where \texttt{stmt n} is a statement enclosed in \texttt{n} blocks or loops, inductive stmt : ∀blockdepth:nat. Type[0] ≝ | St_skip : ∀n. stmt n ... | St_loop : ∀n. stmt n → stmt n | St_block : ∀n. stmt (S n) → stmt n | St_exit : ∀n. Fin n → stmt n (* expr to switch on, table of , default *) | St_switch : expr → ∀n. list (int × (Fin n)) → Fin n → stmt n ... \end{lstlisting} where \texttt{stmt n} is a statement enclosed in \texttt{n} blocks, and \texttt{Fin n} is a standard construction for a natural number which is at most \texttt{n}. state, and the function to find the continuation from an exit statement can be made failure-free. We note in passing that adding this parameter detected a small mistake in the semantics concerning continuations and tail calls, although the mistake itself \subsection{Identifier invariants} For showing that the variables and labels occurring in the body of a function are present in the relevant structures are add an additional invariant to the To show that the variables and labels occurring in the body of a function are present in the relevant structures we add an additional invariant to the function records. For \textsf{Cminor} we add a higher-order predicate which recursively applies For \textsf{Cminor} we use a higher-order predicate which recursively applies a predicate to all substatements: \begin{lstlisting}[language=matita] ]. \end{lstlisting} Dependent pattern matching on statements thus allows \lstinline'stmt_P' to unfold to the predicate on the current statement as well as the predicate applied to all substatements. We use two properties in \textsf{Cminor}: Dependent pattern matching on statements thus allows an accompanying \lstinline'stmt_P' fact to be unfold to the predicate on the current statement and the predicate applied to all substatements. We require two properties to hold in \textsf{Cminor} functions: \begin{enumerate} \item All variables are present in the list of parameters or the list of variables for the function (this also uses a similar recursive predicate on expressions). \item All variables in the body are present in the list of parameters or the list of variables for the function (this also uses a similar recursive predicate on expressions). \item All labels in \texttt{goto} statements appear in a label statement. \end{enumerate} }. \end{lstlisting} where \lstinline'stmt_vars' and \lstinline'stmt_labels' return the where \lstinline'stmt_vars' and \lstinline'stmt_labels' constrain the variables and labels that appear directly in a statement (but not substatements), and substatements) to appear in the given list, and \lstinline'labels_of' returns a list of all the labels defined in a statement. state and continuations.  It was convenient to split the continuations between the local continuation representing the rest of the code to be executed within the function, and the stack of function calls.  The invariant for variables is the function, and the stack of function calls. % TODO: because??? The invariant for variables is slightly different --- we require that every variable appear in the local environment.  We use \lstinline'f_inv' from the function to establish this witnesses that the invariants are those we wanted, but makes no difference to the actual execution of the program, especially as the execution can still fail due to runtime errors.  Moreover, it is unclear what effect the presence of proof terms and more dependent pattern matching in the semantics will have on the complexity of future correctness proofs.  We plan to examine this issue during task 3.4. fail due to genuine runtime errors.  Moreover, it is unclear what effect the presence of proof terms and more dependent pattern matching in the semantics will have on the complexity of future correctness proofs.  We plan to examine this issue during task 3.4. We use a similar method to specify the invariant that the \textsf{RTLabs}