source: Deliverables/D3.3/Report/report.tex @ 1234

Last change on this file since 1234 was 1234, checked in by campbell, 8 years ago

Add (rather brief) draft of D3.3.

File size: 18.0 KB
Line 
1\documentclass[11pt,epsf,a4wide]{article}
2\usepackage[mathletters]{ucs}
3\usepackage[utf8x]{inputenc}
4\usepackage{listings}
5\usepackage{../../style/cerco}
6\newcommand{\ocaml}{OCaml}
7\newcommand{\clight}{Clight}
8\newcommand{\matita}{Matita}
9\newcommand{\sdcc}{\texttt{sdcc}}
10
11\newcommand{\textSigma}{\ensuremath{\Sigma}}
12
13% LaTeX Companion, p 74
14\newcommand{\todo}[1]{\marginpar{\raggedright - #1}}
15
16\lstdefinelanguage{coq}
17  {keywords={Definition,Lemma,Theorem,Remark,Qed,Save,Inductive,Record},
18   morekeywords={[2]if,then,else},
19  }
20
21\lstdefinelanguage{matita}
22  {keywords={definition,lemma,theorem,remark,inductive,record,qed,let,rec,match,with,Type,and,on},
23   morekeywords={[2]whd,normalize,elim,cases,destruct},
24   mathescape=true,
25   morecomment=[n]{(*}{*)},
26  }
27
28\lstset{language=matita,basicstyle=\small\tt,columns=flexible,breaklines=false,
29        keywordstyle=\color{red}\bfseries,
30        keywordstyle=[2]\color{blue},
31        commentstyle=\color{green},
32        stringstyle=\color{blue},
33        showspaces=false,showstringspaces=false}
34
35\lstset{extendedchars=false}
36\lstset{inputencoding=utf8x}
37\DeclareUnicodeCharacter{8797}{:=}
38\DeclareUnicodeCharacter{10746}{++}
39\DeclareUnicodeCharacter{9001}{\ensuremath{\langle}}
40\DeclareUnicodeCharacter{9002}{\ensuremath{\rangle}}
41
42
43\title{
44INFORMATION AND COMMUNICATION TECHNOLOGIES\\
45(ICT)\\
46PROGRAMME\\
47\vspace*{1cm}Project FP7-ICT-2009-C-243881 \cerco{}}
48
49\date{ }
50\author{}
51
52\begin{document}
53\thispagestyle{empty}
54
55\vspace*{-1cm}
56\begin{center}
57\includegraphics[width=0.6\textwidth]{../../style/cerco_logo.png}
58\end{center}
59
60\begin{minipage}{\textwidth}
61\maketitle
62\end{minipage}
63
64
65\vspace*{0.5cm}
66\begin{center}
67\begin{LARGE}
68\bf
69Report n. D3.3\\
70Executable Formal Semantics of front end intermediate languages\\
71\end{LARGE} 
72\end{center}
73
74\vspace*{2cm}
75\begin{center}
76\begin{large}
77Version 1.0
78\end{large}
79\end{center}
80
81\vspace*{0.5cm}
82\begin{center}
83\begin{large}
84Main Authors:\\
85Brian~Campbell
86\end{large}
87\end{center}
88
89\vspace*{\fill}
90\noindent
91Project Acronym: \cerco{}\\
92Project full title: Certified Complexity\\
93Proposal/Contract no.: FP7-ICT-2009-C-243881 \cerco{}\\
94
95\clearpage \pagestyle{myheadings} \markright{\cerco{}, FP7-ICT-2009-C-243881}
96
97\newpage
98
99\vspace*{7cm}
100\paragraph{Abstract}
101We report on the formalization of the front end intermediate languages for the
102\cerco{} project's compiler using executable semantics in the \matita{} proof
103assistant.
104
105\newpage 
106
107\tableofcontents
108
109% TODO: clear up any -ize vs -ise
110% TODO: clear up any "front end" vs "front-end"
111% TODO: clear up any mentions of languages that aren't textsf'd.
112% TODO: mention the restricted form of types after Clight
113% TODO: fix unicode in listings
114
115\section{Introduction}
116
117This work formalizes the front end intermediate languages from the \cerco{}
118prototype compiler, described in previous deliverables~\cite{d2.1,d2.2}.  The
119front end of the compiler is summarized in Figure~\ref{fig:summary}.  We
120have also refined parts of the formal development that are used for several of
121the languages in the compiler.
122
123We will also report on work to add several invariants to the languages.  This
124activity overlaps with task 3.4 on the correctness of the compiler front end.
125However, the use of dependent types mean that this work is tied closely to the
126definition of the languages and the transformations of the front end in task
1273.2.  By considering it now we can experiment with and judge its impact on the
128formal semantics, and how feasible retrofitting such invariants is.
129
130The input language to the formalized front end is the \textsf{Clight}
131language.  The executable semantics for this language were presented in a
132previous deliverable~\cite{d3.1}.  In the present work we will report on some
133minor changes to the semantics made to better align it with the whole
134development.
135
136\begin{figure}
137\begin{tabbing}
138\quad \= $\downarrow$ \quad \= \kill
139\textsf{C} (unformalized)\\
140\> $\downarrow$ \> CIL parser (unformalized \ocaml)\\
141\textsf{Clight}\\
142\> $\downarrow$ \> cast removal\\
143\> $\downarrow$ \> add runtime functions\\
144\> $\downarrow$ \> labelling\\
145\> $\downarrow$ \> stack variable allocation and control structure
146 simplification\\
147\textsf{Cminor}\\
148\> $\downarrow$ \> generate global variable initialisation code\\
149\> $\downarrow$ \> transform to RTL graph\\
150\textsf{RTLabs}\\
151\> $\downarrow$ \> start of target specific backend\\
152\ \  \dots
153\end{tabbing}
154\caption{Front end languages and transformations}
155\label{fig:summary}
156\end{figure}
157
158\subsection{Revisions to the prototype compiler}
159
160Ongoing work to maintain and improve the prototype compiler resulted in
161several changes.  Most importantly, the transformations to replace 16 and 32
162bit types have been moved from the \textsf{Clight} language to the
163target-specific stage between \textsf{RTLabs} and \textsf{RTL} to help
164generate better code, and the addition of a \textsf{Clight} cast removal
165transformation to reduce the number of 16 and 32 bit operations.
166
167The formalized semantics have tracked these changes, and in this report we
168describe them as they currently stand.
169
170\section{Definitions common to several languages}
171
172The semantics for many of the languages in the compiler share some core parts:
173the memory model, environments, runtime values, definitions of operations and
174a monad for encapsulating failure and I/O (using resumptions).  The executable
175memory model was ported from CompCert as part of the work on \textsf{Clight}
176and was reused for the front end languages\footnote{However, it is likely that
177we will revise the memory model to make it better suited for describing the
178back-end of our compiler.}.  The failure and I/O monad was introduced in the
179previous deliverable on \textsf{Clight}~\cite[\S4.2]{d3.1}.  The other parts
180are discussed below, with the only change to the runtime values being the
181representation of integers.
182
183\subsection{Identifiers}
184
185Each language features one or more kinds of names to represent variables,
186registers, \texttt{goto} labels and/or RTL graph labels.  We also need to
187describe various maps keyed on identifiers in the semantics and during
188compilation.
189
190Previous work on the executable semantics of the target machine
191code included bit vectors and bit vector tries to define various integers in
192the semantics, and give a low level view of memory~\cite{d4.1}.  To keep the
193size of the development down we have reused these data structures for
194identifiers and maps respectively.
195
196However, given the wide variety of identifiers used in the compiler we also
197wish to separate the different classes of identifier.  Thus we encapsulate the
198bit vector representing the identifier in a datatype that also carries a tag
199identifying which class of identifier we are using:
200\begin{lstlisting}
201inductive identifier (tag:String) : Type[0] ≝
202  an_identifier : Word → identifier tag.
203\end{lstlisting}
204The tries are also tagged in the same manner.
205
206One difficultly with using fixed size bit vectors for identifiers is that
207fresh name generation can fail if generate too many.  While we use an error
208monad to deal with failures, we wish to minimize its use in the compiler.
209Thus we add a flag to detect overflows, and check it after the phase of the
210compiler is complete to report exhaustion.  The rest of the phase can then be
211written as if name generation always succeeds.
212
213\subsection{Machine integers and arithmetic}
214
215The bit vectors in~\cite{d4.1} also came equipped with some basic arithmetic
216for the target semantics.  The front end required these operations to be
217generalized and extended.
218
219% TODO: expand, note need to sufficiently good performance for testing
220
221\subsection{Front end operations}
222
223The two front end intermediate languages, \textsf{Cminor} and \textsf{RTLabs},
224share the same set of operations on values.  They differ from
225\textsf{Clight}'s operations by incorporating casts and by having a separate
226operation for each type of data operated upon.  For example, subtraction of
227pointers is treated as a different operation from subtraction of integers.
228
229A common semantics is given for these operations, as simple functions on the
230operation and runtime values.
231
232\section{Clight}
233
234The \textsf{Clight} input language remained largely the same as in the
235previous deliverable~\cite{d3.1}.  The principal changes were to use the
236identifiers and arithmetic described above in place of the arbitrarily large
237integers used above.  For the identifiers, this relieved us of the burden of
238adding an efficient datatype for maps by reusing the bit vector tries instead.
239
240The arithmetic replaced a dependent pair of an arbitrary integer and proof
241that it was in range of 32 bit integers by the exact bit vector for each
242size of integer.  This direct approach is closer to the implementation and
243more obviously correct --- no extra precision can be left in by accident.
244
245\section{Cminor}
246
247The \textsf{Cminor} language does not store local variables in memory, and has
248simpler control structures than \textsf{Clight}.  The syntax is similar to the
249prototype, except that the types attached to expressions are restricted so
250that some corner cases are ruled out in the \textsf{Cminor} to \textsf{RTLabs}
251stage (see the accompanying deliverable 3.2 for details):
252\begin{lstlisting}
253inductive expr : typ → Type[0] ≝
254| Id : ∀t. ident → expr t
255| Cst : ∀t. constant → expr t
256| Op1 : ∀t,t'. unary_operation → expr t → expr t'
257| Op2 : ∀t1,t2,t'. binary_operation → expr t1 → expr t2 → expr t'
258| Mem : ∀t,r. memory_chunk → expr (ASTptr r) → expr t
259| Cond : ∀sz,sg,t. expr (ASTint sz sg) → expr t → expr t → expr t
260| Ecost : ∀t. costlabel → expr t → expr t.
261\end{lstlisting}
262In principle we could extend this to statically ensure that only well-typed
263\textsf{Cminor} expressions are considered, and we will consider this as part
264of the work on correctness in task 3.4.
265
266We also provide a variant of the syntax where the only initialization data is
267the size of each global variable, for use after the initialization code has
268been generated.
269
270The definition of the semantics is routine: a functional definition of a
271single small step of the machine is given, reusing the memory model,
272environments, arithmetic and operations mentioned above.
273
274\section{RTLabs}
275
276The \textsf{RTLabs} language provides a target independent Register Transfer
277Language, where programs are represented as control flow graphs.  We use the
278identifiers described above for the graph labels and the maps for the graph
279itself.  The tagging mechanism ensures that labels cannot be mixed up with
280other identifiers in the program (in particular, we cannot accidentally reuse
281a \texttt{goto} label from Cminor where a graph label should appear).
282
283Otherwise, the syntax and semantics of RTLabs mirrors that of the prototype
284compiler.  The same common elements are used as for \textsf{Cminor}, including
285the front end operations mentioned above.
286
287\section{Testing}
288
289To provide some assurance that the semantics were properly implemented, and to
290support the testing described in the accompanying deliverable 3.2, we have
291adapted the pretty printers in the prototype compiler to produce \matita{}
292terms in the syntax for each language described above.
293
294A few common definitions were added for animating the small step semantics
295definitions for any of the front end languages in \matita.  We then used a
296small selection of test cases to ensure basic functionality.  However, this is
297still a time consuming process, so more testing will carried out once the
298extraction to \ocaml{} programs is implemented in \matita.
299
300\section{Embedding invariants}
301
302Each phase of the prototype compiler can fail in a number of places if the
303input language permits programs that are badly structured in some sense: a
304missing label in a \texttt{goto} statement or CFG, an undefined variable name,
305a \texttt{break} statement outside of a loop or \texttt{switch}, and so on.
306We wish to restrict our intermediate languages using dependent types to remove
307as many `junk' programs as possible.  We also hope that such restrictions will
308help in other correctness proofs.
309
310This goal lies in the overlap between several tasks in the project: it
311involves manipulating the syntax and semantics of the intermediate languages
312(the present work), the encoding of the front-end compiler phases in \matita{}
313(task 3.2) and the correctness of the front-end (task 3.4).  Thus this work is
314rather experimental; it is being carried out on branches in our source code
315repository and the final form will be decided and merged in during task 3.4.
316
317So far we have tried adding two forms of invariant --- one by using dependent
318types to index statements in \textsf{Cminor} by their block depth, and the
319other asserts that variables and labels are present in the appropriate
320environments by adding a separate invariant to each function.
321
322\subsection{Cminor block depth}
323
324The \textsf{Cminor} language simplifies loops with a single infinite loop
325construct and the switch statement does not contain the individual cases.
326Instead, statements are provided for infinite loops, non-looping blocks and
327exiting an arbitrary number of blocks (which is also what the switch statement
328does\footnote{We are considering replacing the \textsf{Cminor} switch
329statement with one that uses \texttt{goto}-like labels in both the prototype
330and the formalized compilers, but for now we stick with this CompCert-like
331arrangement.}.
332
333However, this means that there are badly-formed \textsf{Cminor} programs such
334as
335\begin{lstlisting}[language={}]
336  int main() {
337    block {
338      exit 5
339    }
340  }
341\end{lstlisting}
342where we attempt to exit more blocks than exist.  To rule these out (including
343demonstrating that the previous phase of the compiler does not generate them)
344we can index the statements of the language by the depth of the enclosing
345blocks.
346
347The adaption of the syntax adds the depth to every statement, and uses bounded
348integers in the exit and switch statements:
349\begin{lstlisting}[language=matita]
350inductive stmt : ∀blockdepth:nat. Type[0] ≝
351| St_skip : ∀n. stmt n
352...
353| St_loop : ∀n. stmt n → stmt n
354| St_block : ∀n. stmt (S n) → stmt n
355| St_exit : ∀n. Fin n → stmt n
356(* expr to switch on, table of <switch value, #blocks to exit>, default *)
357| St_switch : expr → ∀n. list (int × (Fin n)) → Fin n → stmt n
358...
359\end{lstlisting}
360where \texttt{stmt n} is a statement enclosed in \texttt{n} blocks or loops,
361and \texttt{Fin n} is a standard construction for a natural number which is at
362most \texttt{n}.
363
364In the semantics the number of blocks is also added to the continuations and
365state, and the function to find the continuation from an exit statement can be
366made failure-free.
367
368We note in passing that adding this parameter detected a small mistake in the
369semantics concerning continuations and tail calls, although the mistake itself
370was benign.
371
372\subsection{Identifier invariants}
373
374For showing that the variables and labels occurring in the body of a function
375are present in the relevant structures are add an additional invariant to the
376function records.
377
378For \textsf{Cminor} we add a higher-order predicate which recursively applies
379a predicate to all substatements:
380\begin{lstlisting}[language=matita]
381let rec stmt_P (P:stmt → Prop) (s:stmt) on s : Prop ≝
382match s with
383[ St_seq s1 s2 ⇒ P s ∧ stmt_P P s1 ∧ stmt_P P s2
384| St_ifthenelse _ _ _ s1 s2 ⇒ P s ∧ stmt_P P s1 ∧ stmt_P P s2
385| St_loop s' ⇒ P s ∧ stmt_P P s'
386| St_block s' ⇒ P s ∧ stmt_P P s'
387| St_label _ s' ⇒ P s ∧ stmt_P P s'
388| St_cost _ s' ⇒ P s ∧ stmt_P P s'
389| _ ⇒ P s
390].
391\end{lstlisting}
392Dependent pattern matching on statements thus allows \lstinline'stmt_P' to
393unfold to the predicate on the current statement as well as the predicate
394applied to all substatements.
395
396We use two properties in \textsf{Cminor}:
397\begin{enumerate}
398\item All variables are present in the list of parameters or the list of
399variables for the function (this also uses a similar recursive predicate on
400expressions).
401\item All labels in \texttt{goto} statements appear in a label statement.
402\end{enumerate}
403The function definition thus becomes:
404\begin{lstlisting}[language=matita]
405record internal_function : Type[0] ≝
406{ f_return    : option typ
407; f_params    : list (ident × typ)
408; f_vars      : list (ident × typ)
409; f_stacksize : nat
410; f_body      : stmt
411; f_inv       : stmt_P (λs.stmt_vars (λi.Exists ? (λx.\fst x = i) (f_params @ f_vars)) s ∧
412                           stmt_labels (λl.Exists ? (λl'.l' = l) (labels_of f_body)) s) f_body
413}.
414\end{lstlisting}
415where \lstinline'stmt_vars' and \lstinline'stmt_labels' return the
416variables and labels that appear directly in a statement (but not
417substatements), and
418\lstinline'labels_of' returns a list of all the labels defined in a
419statement.
420
421The \textsf{Clight} semantics can be amended to use these invariants, although
422the main benefit is for the compiler stages (see the accompanying deliverable
4233.2 for details).  The semantics require the invariants to be added to the
424state and continuations.  It was convenient to split the continuations between
425the local continuation representing the rest of the code to be executed within
426the function, and the stack of function calls.  The invariant for variables is
427slightly different --- we require that every variable appear in the local
428environment.  We use \lstinline'f_inv' from the function to establish this
429invariant when the environment is set up on function entry.
430
431It is unclear whether changing the semantics is really worthwhile.  It
432witnesses that the invariants are those we wanted, but makes no difference to
433the actual execution of the program, especially as the execution can still
434fail due to runtime errors.  Moreover, it is unclear what effect the presence
435of proof terms and more dependent pattern matching in the semantics will have
436on the complexity of future correctness proofs.  We plan to examine this issue
437during task 3.4.
438
439We use a similar method to specify the invariant that the \textsf{RTLabs}
440graph is closed --- that is, any successor labels in a statement in the graph
441are present in the graph.  The definition is simpler in \textsf{RTLabs}
442because the flat representation of the graph does not require recursive
443definitions like \lstinline'stmt_P' above.
444
445\section{Conclusion}
446
447\bibliographystyle{plain}
448\bibliography{report}
449
450\end{document}
Note: See TracBrowser for help on using the repository browser.