Index: /Deliverables/D3.2/Report/report.tex
===================================================================
 /Deliverables/D3.2/Report/report.tex (revision 1260)
+++ /Deliverables/D3.2/Report/report.tex (revision 1261)
@@ 107,5 +107,6 @@
We also report on progress enriching these transformations with dependent
types so as to establish key invariants in each intermediate language, which
will assist in future work proving correctness properties of the compiler.
+removes potential sources of failure within the compiler and
+will assist in future work proving correctness properties.
\newpage
@@ 118,4 +119,5 @@
% TODO: fix unicode in listings
% TODO: make it clear that our correctness is intended to go beyond compcert's
+% TODO: capitalise deliverable/task?
\section{Introduction}
@@ 136,4 +138,6 @@
\begin{figure}
+\begin{center}
+\begin{minipage}{.8\linewidth}
\begin{tabbing}
\quad \= $\downarrow$ \quad \= \kill
@@ 151,6 +155,8 @@
\textsf{RTLabs}\\
\> $\downarrow$ \> start of target specific backend\\
\ \ \dots
+\>\quad \vdots
\end{tabbing}
+\end{minipage}
+\end{center}
\caption{Frontend languages and transformations}
\label{fig:summary}
@@ 160,9 +166,9 @@
two intermediate languages involved are
\begin{description}
\item[\textsf{Cminor}]  a Clike language where local variables are no
longer held explicitly in memory and control structures are simpler

\item[\textsf{RTLabs}]  a language in the form of a control flow graph but
retaining the front end operations from \textsf{Cminor}
+\item[\textsf{Cminor}]  a Clike language where local variables are not
+explicitly allocated memory and control structures are simpler
+
+\item[\textsf{RTLabs}]  a language in the form of a control flow graph
+which retains the values and front end operations from \textsf{Cminor}
\end{description}
More details on the formalisation of the syntax and semantics of these
@@ 174,11 +180,11 @@
We have been tracking revisions to the prototype compiler during the
development of the formalized version. These were usually minor, with the
exception of the following major revision to the compiler.
+development of the formalized version. Most of these changes were minor, but
+one exception is a major change to the structure of the compiler.
The original plan for the frontend featured a \textsf{Clight} to
\textsf{Clight8} phase near the start which replaced all of the integer
values and operations by 8 bit counterparts, while pointers were split into
bytes at a later stage. Experimentation found that it would be difficult to
+bytes at a later stage. Experience has shown that it would be difficult to
produce good code with this approach. Instead, we now have:
\begin{itemize}
@@ 187,17 +193,22 @@
\item a cast removal stage which simplifies \textsf{Clight} expressions such
as
\begin{lstlisting}[language=C]
 (char)((int)x + (int)y)
+\begin{lstlisting}[language=C,belowskip=0pt]
+ (char)((int)x + (int)y)
\end{lstlisting}
which are produced by integer promotion built into the CIL parser into
equivalent operations on simpler types, \lstinline'x+y' in this case.
+into equivalent operations on simpler types, \lstinline'x+y' in this case.
+The cast removal is important because C requires \emph{arithmetic promotion}
+of integer types to (at least) \lstinline'int' before an operation is
+performed. The \textsf{Clight} semantics do not perform the promotions,
+instead they are added as casts by the CILbased parser. However, our targets
+benefit immensely from performing operations on the smallest possible integer
+type, so it is important that we remove promotions where possible.
\end{itemize}
This document describes the formalized front end after these changes.
\section{Clight phases}
+\section{\textsf{Clight} phases}
In addition to the conversion to \textsf{Cminor}, there are several
transformations that act directly on the \textsf{Clight} language.
+transformations which act directly on the \textsf{Clight} language.
\subsection{Cast simplification}
@@ 211,12 +222,13 @@
The prototype version worked by recognising fixed patterns in the
\textsf{Clight} abstract syntax tree such as
\[ (t)((t_1)e_1 op (t_2)e_2) \]
subject to restrictions on the types, and replaces them with a simpler
version. These deep pattern matches are slightly awkward in \matita{} and
they do not capture compositions of operations, such as
+\[ (t)((t_1)e_1\ op\ (t_2)e_2), \]
+subject to restrictions on the types. These are replaced with a simpler
+version without the casts. Such `deep' pattern matching is slightly awkward in
+\matita{} and this approach does not capture compositions of operations, such as
\begin{lstlisting}[language=C]
(char)((int)a + (int)b + (int)c)
+ (char)(((int)a + (int)b) + (int)c)
\end{lstlisting}
where \lstinline'a', \lstinline'b' and \lstinline'c' are of type
+\lstinline'char', because the intermediate expression is not cast to and from
\lstinline'char'.
@@ 227,7 +239,7 @@
the subexpression is of type \lstinline'char' and eliminates the cast.
Moreover, when the recursive processing is complete the \lstinline'char' cast
is also eliminated because its subexpression is now of the correct type.

This has been formalized in \matita. We have also performed a few proofs that
+is also eliminated because its subexpression is already of the correct type.
+
+This has been implemented in \matita. We have also performed a few proofs that
the arithmetic behind these changes is correct to gain confidence in the
technique. During task 3.4 we will extend these proofs to cover more
@@ 267,14 +279,20 @@
necessary because \textsf{Clight} allows arbitrary \emph{lvalue} expressions
as the destination for the returned value, but \textsf{Cminor} only allows
local variables. However, the variable names are supplied with the input
program, but without any method for generating fresh names.
+local variables. All other variable names in the \textsf{Cminor} program came
+from the \textsf{Clight} program, but we need to construct a method for
+generating fresh names for the temporaries.
Our identifiers are based on binary numbers, and generation of fresh names is
handled by keeping track of the highest allocated number. Thus we create a
new name generator from the input program by finding the maximum number used.

\section{Cminor phases}

Only two phases deal with \textsf{Cminor} programs:
+handled by keeping track of the highest allocated number. Normally this is
+initialised at zero, but if initialised by the largest existing identifier in
+the \textsf{Clight} program then the generated names will be fresh.
+To do this, we extract the maximum identifier by recursively finding the maximum
+variable name used in every expression, statement and function of the program.
+
+\section{\textsf{Cminor} phases}
+
+\textsf{Cminor} programs are processed by two passes: one deals with the
+initialisation of global variables, and the other produces \textsf{RTLabs}
+code.
\subsection{Initialisation code}
@@ 285,7 +303,7 @@
initialisation data that this pass takes as input, and one with only size
information that is the output. In addition to reflecting the purpose of this
pass in the types, it also ensures that it cannot be accidentally left out.

\subsection{Conversion to RTLabs}
+pass in its type, it also ensures that the pass cannot be accidentally omitted.
+
+\subsection{Conversion to \textsf{RTLabs}}
This pass breaks down the structure of the \textsf{Cminor} program into a
@@ 299,5 +317,5 @@
One possible variation would be to explicitly define a state monad to carry
the function under construction around, but it is not yet clear if this will
make the correctness easier to prove.
+make the correctness results easier to prove.
\section{Adding and using invariants}
@@ 315,26 +333,86 @@
in nature, and will evolve during task 3.4.
Invariants on \textsf{Cminor} programs are defined using a higher order
predicate which applies a given predicate to a statement and recursively to
each of its substatements (and expressions and subexpressions, respectively).

Thus during the \textsf{Clight} to \textsf{Cminor} stage the values returned
are not just \textsf{Cminor} expressions and statements, but dependent pairs
that also return invariants to establish that only local variables appear in
the generated terms, and all labels appearing in \texttt{goto} statements are
defined in the result. The latter is proved by showing two properties: first
by checking whether the labels are in the set defined by the original
\textsf{Clight} function body, and second that the translation preserves the
set of label statements.

These two properties are used at the end of translation to show the invariant
attached to \textsf{Cminor} \emph{functions}: that all \texttt{goto} labels
are defined in the body. Similarly, the invariant for variables is slightly
different to the translation's: we go from the fact that every variable was
classified as local to the appearance of every variable in the locals or
parameters. This is done using a lemma showing that the classification only
allows parameters and locals to be classed as `local'.

The next translation to \textsf{RTLabs}\dots
+The use of the invariants follows a common pattern. Each language embeds
+invariants in the function record that constrain the function body by other
+information in the record (such as the list of local variables and types, or
+the set of labels declared). However, during the transformations they
+typically need to be refined to constraints on individual statements and
+expressions with respect to data structures used in the transformation.
+A similar change in invariants is required between the transformation and the
+new function.
+
+For example, consider the use of local variables in the \textsf{Cminor} to
+\textsf{RTLabs} stage. We start with
+\begin{lstlisting}[language=matita]
+record internal_function : Type[0] ≝
+{ f_return : option typ
+; f_params : list (ident × typ)
+; f_vars : list (ident × typ)
+; f_stacksize : nat
+; f_body : stmt
+; f_inv : stmt_P (λs.stmt_vars (λi.Exists ? (λx.\fst x = i) (f_params @ f_vars)) s ∧
+ stmt_labels (λl.Exists ? (λl'.l' = l) (labels_of f_body)) s) f_body
+}.
+\end{lstlisting}
+where the first half of \lstinline'f_inv' requires every variable in the
+function body to appear in the parameter or variable list. In the translation
+to \textsf{RTLabs}, variable lookups are performed in a map to \textsf{RTLabs}
+pseudoregisters:
+\begin{lstlisting}[language=matita]
+definition env ≝ identifier_map SymbolTag register.
+
+let rec add_expr (le:label_env) (env:env) (ty:typ) (e:expr ty)
+ (Env:expr_vars ty e (present ?? env))
+ (dst:register) (f:partial_fn le) on e
+ : Σf':partial_fn le. fn_graph_included le f f' ≝
+match e return λty,e.expr_vars ty e (present ?? env) →
+ Σf':partial_fn le. fn_graph_included le f f' with
+[ Id _ i ⇒ λEnv.
+ let r ≝ lookup_reg env i Env in
+ ...
+\end{lstlisting}
+Note that \lstinline'lookup_reg' returns a register without any possibility of
+error. The reason this works is that the pattern match on \lstinline'e'
+refines the type of the invariant \lstinline'Env' to a proof that the variable
+\lstinline'i' is present. We then pass this proof to the lookup function to
+rule out failure.
+
+When this map \lstinline'env' is constructed at the start of the phase, we
+prove that the proof \lstinline'f_inv' from the function implies the invariant
+on variables needed by \lstinline'add_expr' and its equivalent on statements:
+\begin{lstlisting}[language=matita]
+lemma populates_env : ∀l,e,u,l',e',u'.
+ populate_env e u l = 〈l',e',u'〉 →
+ ∀i. Exists ? (λx.\fst x = i) l →
+ present ?? e' i.
+\end{lstlisting}
+A similar mechanism is used to show that \texttt{goto} labels are always
+declared in the function.
+
+Also note the return type of \lstinline'add_expr' is a dependent pair. We
+build the resulting \textsf{RTLabs} function incrementally, using a type
+\lstinline'partial_fn' that does not contain the final invariant for
+functions. We always require the \lstinline'fn_graph_included' property for
+partially built functions to show that the graph only gets larger, a key part
+of the proof that the resulting control flow graph is closed. Dependent pairs
+are used in a similar manner in the \textsf{Clight} to \textsf{Cminor} phase
+too.
+
+This work does not currently cover all of the possible sources of failure; in
+particular some structural constraints on functions are not yet covered and
+some properties of \textsf{RTLabs} programs that may be useful for later
+stages or the correctness proofs are not produced. Moreover, we may
+experiment with variations to try to make the proof obligations and syntax
+simpler to deal with. However, it does show that retrofitting these
+properties using dependent types in \matita{} is feasible.
+
+\section{Testing}
+
+To provide some early testing and bug fixing of this code we constructed it in
+concert with the executable semantics described in deliverable 3.3, and
+\matita{} term pretty printers in the prototype compiler. Using these, we
+were able to test the phases individually and together by running programs
+within the proof assistant itself, and comparing the results with the expected
+output.
\section{Conclusion}