source: Deliverables/D4.2-4.3/reports/D4-2.tex @ 1361

Last change on this file since 1361 was 1361, checked in by mulligan, 9 years ago

more changes

File size: 17.5 KB
Line 
1\documentclass[11pt, epsf, a4wide]{article}
2
3\usepackage{../../style/cerco}
4
5\usepackage{amsfonts}
6\usepackage{amsmath}
7\usepackage{amssymb} 
8\usepackage[english]{babel}
9\usepackage{graphicx}
10\usepackage[utf8x]{inputenc}
11\usepackage{listings}
12\usepackage{stmaryrd}
13\usepackage{url}
14
15\title{
16INFORMATION AND COMMUNICATION TECHNOLOGIES\\
17(ICT)\\
18PROGRAMME\\
19\vspace*{1cm}Project FP7-ICT-2009-C-243881 \cerco{}}
20
21\lstdefinelanguage{matita-ocaml}
22  {keywords={ndefinition,ncoercion,nlemma,ntheorem,nremark,ninductive,nrecord,nqed,nlet,let,in,rec,match,return,with,Type,try},
23   morekeywords={[2]nwhd,nnormalize,nelim,ncases,ndestruct},
24   morekeywords={[3]type,of},
25   mathescape=true,
26  }
27
28\lstset{language=matita-ocaml,basicstyle=\small\tt,columns=flexible,breaklines=false,
29        keywordstyle=\color{red}\bfseries,
30        keywordstyle=[2]\color{blue},
31        keywordstyle=[3]\color{blue}\bfseries,
32        commentstyle=\color{green},
33        stringstyle=\color{blue},
34        showspaces=false,showstringspaces=false}
35
36\lstset{extendedchars=false}
37\lstset{inputencoding=utf8x}
38\DeclareUnicodeCharacter{8797}{:=}
39\DeclareUnicodeCharacter{10746}{++}
40\DeclareUnicodeCharacter{9001}{\ensuremath{\langle}}
41\DeclareUnicodeCharacter{9002}{\ensuremath{\rangle}}
42
43\date{}
44\author{}
45
46\begin{document}
47
48\thispagestyle{empty}
49
50\vspace*{-1cm}
51\begin{center}
52\includegraphics[width=0.6\textwidth]{../../style/cerco_logo.png}
53\end{center}
54
55\begin{minipage}{\textwidth}
56\maketitle
57\end{minipage}
58
59\vspace*{0.5cm}
60\begin{center}
61\begin{LARGE}
62\textbf{
63Report n. D4.2\\
64Functional encoding in the Calculus of Constructions
65}
66\end{LARGE} 
67\end{center}
68
69\vspace*{2cm}
70\begin{center}
71\begin{large}
72Version 1.0
73\end{large}
74\end{center}
75
76\vspace*{0.5cm}
77\begin{center}
78\begin{large}
79Main Authors:\\
80Dominic P. Mulligan and Claudio Sacerdoti Coen
81\end{large}
82\end{center}
83
84\vspace*{\fill}
85
86\noindent
87Project Acronym: \cerco{}\\
88Project full title: Certified Complexity\\
89Proposal/Contract no.: FP7-ICT-2009-C-243881 \cerco{}\\
90
91\clearpage
92\pagestyle{myheadings}
93\markright{\cerco{}, FP7-ICT-2009-C-243881}
94
95\newpage
96
97\vspace*{7cm}
98\paragraph{Abstract}
99We describe the encoding, in the Calculus of Constructions, of the intermediate languages for the CerCo compiler.
100The CerCo backend consists of four distinct intermediate languages---RTL, ERTL, LTL and LIN, respectively---though we also handle the translations from the last frontend intermediate language, RTLabs, into RTL, and the translation of LIN into assembly language.
101Where feasible, we have made use of dependent types to enforce invariants and to make otherwise partial functions total.
102\newpage
103
104\tableofcontents
105
106\newpage
107
108%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
109% SECTION.                                                                    %
110%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
111\section{Task}
112\label{sect.task}
113
114The Grant Agreement states that Task T4.2, entitled `Functional encoding in the Calculus of Constructions' has associated Deliverable D4.2, consisting of the following:
115\begin{quotation}
116Executable Formal Semantics of back-end intermediate languages: This prototype is the formal counterpart of deliverable D2.1 for the back end side of the compiler and validates it.
117\end{quotation}
118This report details our implementation of this deliverable.
119
120%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
121% SECTION.                                                                    %
122%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
123\subsection{Connections with other deliverables}
124\label{subsect.connections.with.other.deliverables}
125
126Deliverable D4.2 enjoys a close relationship with three other deliverables, namely deliverables D2.1, D4.3 and D4.4.
127
128Deliverable D2.1, the O'Caml implementation of a cost preserving compiler for a large subset of the C programming language, is the basis upon which we have implemented the current deliverable.
129In particular, the architecture of the compiler, its intermediate languages, and the overall implementation of the Matita encodings has been taken from the O'Caml compiler.
130Any variations from the O'Caml design are due to bugs identified in the prototype compiler during the Matita implementation, our identification of code that can be abstracted and made generic, or our use of Matita's much stronger type system to enforce invariants through the use of dependent types.
131
132Deliverable D4.3 can be seen as a `sister' deliverable to the deliverable reported on herein.
133In particular, where this deliverable reports on the encoding in the Calculus of Constructions of the backend translations, D4.3 is the encoding in the Calculus of Constructions of the semantics of those languages.
134As a result, a substantial amount of Matita code is shared between the two deliverables.
135
136Deliverable D4.4, the backend correctness proofs, is the immediate successor of this deliverable.
137
138%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
139% SECTION.                                                                    %
140%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
141\subsection{A brief overview of the backend compilation chain}
142\label{subsect.brief.overview.backend.compilation.chain}
143
144The compiler's backend consists of four distinct intermediate languages: RTL, ERTL, LTL and LIN.
145A fifth language, RTLabs, serves as the exit point of the backend and the entry point of the frontend.
146RTL, ERTL and LTL are `graph based' languages, whereas LIN is a linearised language
147
148\paragraph{RTL (Register Transfer Language)}
149
150\paragraph{ERTL (Extended Register Transfer Language)}
151
152\paragraph{LTL (Linearised Transfer Language)}
153
154\paragraph{LIN (Linearised)}
155
156%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
157% SECTION.                                                                    %
158%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
159\section{The backend intermediate languages in Matita}
160\label{sect.backend.intermediate.languages.matita}
161
162We now discuss the encoding of the compiler backend languages in the Calculus of Constructions proper.
163We pay particular heed to changes that we made from the O'Caml prototype.
164In particular, many aspects of the backend languages have been unified into a single `joint' language.
165We have also made heavy use of dependent types to reduce `spurious partiality' and to encode invariants.
166
167%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
168% SECTION.                                                                    %
169%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
170\subsection{Abstracting related languages}
171\label{subsect.abstracting.related.languages}
172
173The O'Caml compiler is written in the following manner.
174Each intermediate language has its own dedicated syntax, notions of internal function, and so on.
175Translations map syntaxes to syntaxes, and internal functions to internal functions explicitly.
176
177This is a perfectly valid way to write a compiler, where everything is made explicit, but writing a \emph{verified} compiler poses new challenges.
178In particular, we must look ahead to see how our choice of encodings will affect the size and complexity of the forthcoming proofs of correctness.
179We now discuss some abstractions, introduced in the Matita code, which we hope will make our proofs shorter, amongst other benefits.
180
181\paragraph{Shared code, reduced proofs}
182Many features of individual backend intermediate languages are shared with other intermediate languages.
183For instance, RTLabs, RTL, ERTL and LTL are all graph based languages, where functions are represented as a graph of statements that form their bodies.
184Functions for adding statements to a graph, searching the graph, and so on, are remarkably similar across all languages, but are duplicated in the O'Caml code.
185
186\paragraph{Changes between languages made explicit}
187
188\paragraph{Possible language commutations}
189
190\paragraph{Instruction selection}
191
192%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
193% SECTION.                                                                    %
194%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
195\subsection{Use of dependent types}
196\label{subsect.use.of.dependent.types}
197
198We use dependent types in the backend for three reasons.
199
200First, we encode informal invariants, or uses of \texttt{assert false} in the O'Caml code, with dependent types, converting partial functions into total functions.
201There are numerous examples of this throughout the backend.
202For example, in the \texttt{RTLabs} to \texttt{RTL} transformation pass, many functions only `make sense' when lists of registers passed to them as arguments conform to some specific length.
203For instance, the \texttt{translate\_negint} function, which translates a negative integer constant:
204\begin{lstlisting}
205definition translate_negint ≝
206  $\lambda$globals: list ident.
207  $\lambda$destrs: list register.
208  $\lambda$srcrs: list register.
209  $\lambda$start_lbl: label.
210  $\lambda$dest_lbl: label.
211  $\lambda$def: rtl_internal_function globals.
212  $\lambda$prf: |destrs| = |srcrs|. (* assert in caml code *)
213    ...
214\end{lstlisting}
215The last argument to the function, \texttt{prf}, is a proof that the lengths of the lists of source and destination registers are the same.
216This was an assertion in the O'Caml code.
217
218Secondly, we make use of dependent types to make the Matita code easier to read, and eventually the proofs of correctness for the compiler easier to write.
219For instance, many intermediate languages in the backend of the compiler, from RTLabs to LTL, are graph based languages.
220Here, function definitions consist of a graph (i.e. a map from labels to statements) and a pair of labels denoting the entry and exit points of this graph.
221Practically, we would always like to ensure that the entry and exit labels are present in the statement graph.
222We ensure that this is so with a dependent sum type in the \texttt{joint\_internal\_function} record, which all graph based languages specialise to obtain their own internal function representation:
223\begin{lstlisting}
224record joint_internal_function (globals: list ident) (p: params globals): Type[0] :=
225{
226  ...
227  joint_if_code     : codeT $\ldots$ p;
228  joint_if_entry    : $\Sigma$l: label. lookup $\ldots$ joint_if_code l $\neq$ None $\ldots$;
229  ...
230}.
231\end{lstlisting}
232Here, \texttt{codeT} is a parameterised type representing the `structure' of the function's body (a graph in graph based languages, and a list in the linearised LIN language).
233Specifically, the \texttt{joint\_if\_entry} is a dependent pair consisting of a label and a proof that the label in question is a vertex in the function's graph.
234A similar device exists for the exit label.
235
236Finally, we make use of dependent types for another reason: experimentation.
237Namely, CompCert makes little use of dependent types to encode invariants.
238In contrast, we wish to make as much use of dependent types as possible, both to experiment with different ways of encoding compilers in a proof assistant, but also as a way of `stress testing' Matita's support for dependent types.
239
240%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
241% SECTION.                                                                    %
242%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
243\subsection{What we do not implement}
244\label{subsect.what.we.do.not.implement}
245
246There are several classes of functionality that we have chosen not to implement in the backend languages:
247\begin{itemize}
248\item
249\textbf{Datatypes and functions over these datatypes that are not supported by the compiler.}
250In particular, the compiler does not support the floating point datatype, nor accompanying functions over that datatype.
251At the moment, frontend languages within the compiler possess constructors corresponding to floating point code.
252These are removed during instruction selection (in the RTLabs to RTL transformation) using a daemon.
253However, at some point, we would like the front end of the compiler to recognise programs that use floating point code and reject them as being invalid.
254\item
255\textbf{Functions and operations on datatypes that are implemented in the C runtime.}
256The compiler emits only a subset of the instructions available in the MCS-51's instruction architecture.
257In particular, integer modulus at the C source level is transformed into a call to a runtime function implementing the modulus operation during the translation to C-light.
258
259However, all datatypes corresponding to valid C operations over integers and floats mention integer modulus in the backend, and the translation of these operations is discharged using a daemon.
260We have plans to dispense with this `precooking' processs at the C-light level and move the translation of these operations into the RTLabs to RTL pass, where they can be translated properly.
261\item
262\textbf{Axiomatised components that will be implemented using external oracles.}
263Several large, complex pieces of compiler infrastructure, most noticably register colouring and fixed point calculation during liveness analysis have been axiomatised.
264This was already agreed upon before the start of the project, and is clearly marked in the project proposal, following comments by those involved with the CompCert project about the difficulty in formalising register colouring in that project.
265Instead, these components are axiomatised, along with the properties that they need to satisfy in order for the rest of the compilation chain to be correct.
266These axiomatised components are found in the ERTL to LTL pass.
267\item
268\textbf{A few non-computational proof obligations.}
269A few difficult-to-close, but non-computational (i.e. they do not prevent us from executing the compiler inside Matita), proof obligations have been closed using daemons in the backend.
270These proof obligations originate with our use of dependent types for expressing invariants in the compiler.
271\textbf{`Real' tailcalls}
272For the time being, tailcalls in the backend are translated to `vanilla' function calls during the ERTL to LTL pass.
273This follows the O'Caml compiler, which did not implement tailcalls, and did this simplification step.
274`Real' tailcalls are being implemented in the O'Caml compiler, and when this implementation is complete, we aim to port this code to the Matita compiler.
275\end{itemize}
276
277%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
278% SECTION.                                                                    %
279%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
280\section{Future work}
281\label{sect.future.work}
282
283As mentioned in Section~\ref{subsect.what.we.do.not.implement}, there are several unimplemented features in the compiler, and several aspects of the Matita code that can be improved in order to make currently partial functions total.
284We summarise this future work here:
285\begin{itemize}
286\item
287We plan to make use of dependent types to identify `floating point' free programs and make all functions total over such programs.
288This will remove a swathe of uses of daemons.
289This should be routine.
290\item
291We plan to move expansion of integer modulus, and other related functions, into the instruction selection (RTLabs to RTL) phase.
292This will also help to remove a swathe of uses of daemons, as well as potentially introduce new opportunities for optimisations that we currently miss in expanding these instructions at the C-light level.
293\item
294We plan to close all existing proof obligations that are closed using daemons, arising from our use of dependent types in the backend.
295This should be routine.
296\item
297We plan to port the O'Caml compiler's implementation of tailcalls when this is completed.
298This should not cause any major problems.
299\item
300We plan to validate the backend translations, removing any obvious bugs, by executing the translation inside Matita on small C programs.
301\end{itemize}
302
303\newpage
304
305%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
306% SECTION.                                                                    %
307%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
308\section{Code listing}
309\label{sect.code.listing}
310
311%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
312% SECTION.                                                                    %
313%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
314\subsection{Listing of files}
315\label{subsect.listing.files}
316
317Translation specific files (files relating to language semantics have been omitted):
318\begin{center}
319\begin{tabular*}{0.9\textwidth}{p{5cm}p{8cm}}
320Title & Description \\
321\hline
322\texttt{RTLabs/syntax.ma} & The syntax of RTLabs \\
323\texttt{RTLabs/RTLabsToRTL.ma} & The translation from RTLabs to RTL \\
324\texttt{joint/Joint.ma} & Abstracted syntax for backend languages \\
325\texttt{joint/TranslateUtils.ma} & Generic translation utilities \\
326\texttt{RTL/RTL.ma} & The syntax of RTL \\
327\texttt{RTL/RTLToERTL.ma} & The translation from RTL to ERTL \\
328\texttt{RTL/RTLtailcall.ma} & Elimination of tailcalls \\
329\texttt{ERTL/ERTL.ma} & The syntax of ERTL \\
330\texttt{ERTL/ERTLToLTL.ma} & The translation from ERTL to LTL \\
331\texttt{ERTL/Interference.ma} & Axiomatised graph colouring component \\
332\texttt{ERTL/liveness.ma} & Liveness analysis \\
333\texttt{LTL/LTL.ma} & The syntax of LTL \\
334\texttt{LTL/LTLToLIN.ma} & The translation from LTL to LIN \\
335\texttt{LIN/LIN.ma} & The syntax of LIN \\
336\texttt{LIN/LINToASM.ma} & The translation from LIN to assembly language
337\end{tabular*}
338\end{center}
339
340\end{document}
Note: See TracBrowser for help on using the repository browser.