# source:Deliverables/D4.2-4.3/reports/D4-2.tex@1360

Last change on this file since 1360 was 1360, checked in by mulligan, 9 years ago

added more on use of dependent types, and also discussing the abstraction of the intermediate languages

File size: 16.9 KB
Line
1\documentclass[11pt, epsf, a4wide]{article}
2
3\usepackage{../../style/cerco}
4
5\usepackage{amsfonts}
6\usepackage{amsmath}
7\usepackage{amssymb}
8\usepackage[english]{babel}
9\usepackage{graphicx}
10\usepackage[utf8x]{inputenc}
11\usepackage{listings}
12\usepackage{stmaryrd}
13\usepackage{url}
14
15\title{
16INFORMATION AND COMMUNICATION TECHNOLOGIES\\
17(ICT)\\
18PROGRAMME\\
19\vspace*{1cm}Project FP7-ICT-2009-C-243881 \cerco{}}
20
21\lstdefinelanguage{matita-ocaml}
22  {keywords={ndefinition,ncoercion,nlemma,ntheorem,nremark,ninductive,nrecord,nqed,nlet,let,in,rec,match,return,with,Type,try},
23   morekeywords={[2]nwhd,nnormalize,nelim,ncases,ndestruct},
24   morekeywords={[3]type,of},
25   mathescape=true,
26  }
27
28\lstset{language=matita-ocaml,basicstyle=\small\tt,columns=flexible,breaklines=false,
29        keywordstyle=\color{red}\bfseries,
30        keywordstyle=[2]\color{blue},
31        keywordstyle=[3]\color{blue}\bfseries,
33        stringstyle=\color{blue},
34        showspaces=false,showstringspaces=false}
35
36\lstset{extendedchars=false}
37\lstset{inputencoding=utf8x}
38\DeclareUnicodeCharacter{8797}{:=}
39\DeclareUnicodeCharacter{10746}{++}
40\DeclareUnicodeCharacter{9001}{\ensuremath{\langle}}
41\DeclareUnicodeCharacter{9002}{\ensuremath{\rangle}}
42
43\date{}
44\author{}
45
46\begin{document}
47
48\thispagestyle{empty}
49
50\vspace*{-1cm}
51\begin{center}
52\includegraphics[width=0.6\textwidth]{../../style/cerco_logo.png}
53\end{center}
54
55\begin{minipage}{\textwidth}
56\maketitle
57\end{minipage}
58
59\vspace*{0.5cm}
60\begin{center}
61\begin{LARGE}
62\textbf{
63Report n. D4.2\\
64Functional encoding in the Calculus of Constructions
65}
66\end{LARGE}
67\end{center}
68
69\vspace*{2cm}
70\begin{center}
71\begin{large}
72Version 1.0
73\end{large}
74\end{center}
75
76\vspace*{0.5cm}
77\begin{center}
78\begin{large}
79Main Authors:\\
80Dominic P. Mulligan and Claudio Sacerdoti Coen
81\end{large}
82\end{center}
83
84\vspace*{\fill}
85
86\noindent
87Project Acronym: \cerco{}\\
88Project full title: Certified Complexity\\
89Proposal/Contract no.: FP7-ICT-2009-C-243881 \cerco{}\\
90
91\clearpage
93\markright{\cerco{}, FP7-ICT-2009-C-243881}
94
95\newpage
96
97\vspace*{7cm}
98\paragraph{Abstract}
99We describe the encoding, in the Calculus of Constructions, of the intermediate languages for the CerCo compiler.
100The CerCo backend consists of four distinct intermediate languages---RTL, ERTL, LTL and LIN, respectively---though we also handle the translations from the last frontend intermediate language, RTLabs, into RTL, and the translation of LIN into assembly language.
101Where feasible, we have made use of dependent types to enforce invariants and to make otherwise partial functions total.
102\newpage
103
104\tableofcontents
105
106\newpage
107
108%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
109% SECTION.                                                                    %
110%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
113
114The Grant Agreement states that Task T4.2, entitled Functional encoding in the Calculus of Constructions' has associated Deliverable D4.2, consisting of the following:
115\begin{quotation}
116Executable Formal Semantics of back-end intermediate languages: This prototype is the formal counterpart of deliverable D2.1 for the back end side of the compiler and validates it.
117\end{quotation}
118This report details our implementation of this deliverable.
119
120%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
121% SECTION.                                                                    %
122%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
123\subsection{Connections with other deliverables}
124\label{subsect.connections.with.other.deliverables}
125
126Deliverable D4.2 enjoys a close relationship with three other deliverables, namely deliverables D2.1, D4.3 and D4.4.
127
128Deliverable D2.1, the O'Caml implementation of a cost preserving compiler for a large subset of the C programming language, is the basis upon which we have implemented the current deliverable.
129In particular, the architecture of the compiler, its intermediate languages, and the overall implementation of the Matita encodings has been taken from the O'Caml compiler.
130Any variations from the O'Caml design are due to bugs identified in the prototype compiler during the Matita implementation, our identification of code that can be abstracted and made generic, or our use of Matita's much stronger type system to enforce invariants through the use of dependent types.
131
132Deliverable D4.3 can be seen as a sister' deliverable to the deliverable reported on herein.
133In particular, where this deliverable reports on the encoding in the Calculus of Constructions of the backend translations, D4.3 is the encoding in the Calculus of Constructions of the semantics of those languages.
134As a result, a substantial amount of Matita code is shared between the two deliverables.
135
136Deliverable D4.4, the backend correctness proofs, is the immediate successor of this deliverable.
137
138%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
139% SECTION.                                                                    %
140%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
141\subsection{A brief overview of the compilation chain}
142\label{subsect.brief.overview.compilation.chain}
143
144
145
146%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
147% SECTION.                                                                    %
148%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
149\section{The backend intermediate languages in Matita}
150\label{sect.backend.intermediate.languages.matita}
151
152We now discuss the encoding of the compiler backend languages in the Calculus of Constructions proper.
153We pay particular heed to changes that we made from the O'Caml prototype.
154In particular, many aspects of the backend languages have been unified into a single joint' language.
155We have also made heavy use of dependent types to reduce spurious partiality' and to encode invariants.
156
157%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
158% SECTION.                                                                    %
159%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
160\subsection{Abstracting related languages}
161\label{subsect.abstracting.related.languages}
162
163The O'Caml compiler is written in the following manner.
164Each intermediate language has its own dedicated syntax, notions of internal function, and so on.
165Translations map syntaxes to syntaxes, and internal functions to internal functions explicitly.
166
167This is a perfectly valid way to write a compiler, where everything is made explicit, but writing a \emph{verified} compiler poses new challenges.
168In particular, we must look ahead to see how our choice of encodings will affect the size and complexity of the forthcoming proofs of correctness.
169We now discuss some abstractions, introduced in the Matita code, which we hope will make our proofs shorter, amongst other benefits.
170
171\paragraph{Shared code}
172Many features of individual backend intermediate languages are shared with other intermediate languages.
173For instance, RTLabs, RTL, ERTL and LTL are all graph based languages, where functions are represented as a graph of statements that form their bodies.
174Functions for adding statements to a graph, searching the graph, and so on, are remarkably similar across all languages, but are duplicated in the O'Caml code.
175
176%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
177% SECTION.                                                                    %
178%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
179\subsection{Use of dependent types}
180\label{subsect.use.of.dependent.types}
181
182We use dependent types in the backend for three reasons.
183
184First, we encode informal invariants, or uses of \texttt{assert false} in the O'Caml code, with dependent types, converting partial functions into total functions.
185There are numerous examples of this throughout the backend.
186For example, in the \texttt{RTLabs} to \texttt{RTL} transformation pass, many functions only make sense' when lists of registers passed to them as arguments conform to some specific length.
187For instance, the \texttt{translate\_negint} function, which translates a negative integer constant:
188\begin{lstlisting}
189definition translate_negint ≝
190  $\lambda$globals: list ident.
191  $\lambda$destrs: list register.
192  $\lambda$srcrs: list register.
193  $\lambda$start_lbl: label.
194  $\lambda$dest_lbl: label.
195  $\lambda$def: rtl_internal_function globals.
196  $\lambda$prf: |destrs| = |srcrs|. (* assert in caml code *)
197    ...
198\end{lstlisting}
199The last argument to the function, \texttt{prf}, is a proof that the lengths of the lists of source and destination registers are the same.
200This was an assertion in the O'Caml code.
201
202Secondly, we make use of dependent types to make the Matita code easier to read, and eventually the proofs of correctness for the compiler easier to write.
203For instance, many intermediate languages in the backend of the compiler, from RTLabs to LTL, are graph based languages.
204Here, function definitions consist of a graph (i.e. a map from labels to statements) and a pair of labels denoting the entry and exit points of this graph.
205Practically, we would always like to ensure that the entry and exit labels are present in the statement graph.
206We ensure that this is so with a dependent sum type in the \texttt{joint\_internal\_function} record, which all graph based languages specialise to obtain their own internal function representation:
207\begin{lstlisting}
208record joint_internal_function (globals: list ident) (p: params globals): Type[0] :=
209{
210  ...
211  joint_if_code     : codeT $\ldots$ p;
212  joint_if_entry    : $\Sigma$l: label. lookup $\ldots$ joint_if_code l $\neq$ None $\ldots$;
213  ...
214}.
215\end{lstlisting}
216Here, \texttt{codeT} is a parameterised type representing the structure' of the function's body (a graph in graph based languages, and a list in the linearised LIN language).
217Specifically, the \texttt{joint\_if\_entry} is a dependent pair consisting of a label and a proof that the label in question is a vertex in the function's graph.
218A similar device exists for the exit label.
219
220Finally, we make use of dependent types for another reason: experimentation.
221Namely, CompCert makes little use of dependent types to encode invariants.
222In contrast, we wish to make as much use of dependent types as possible, both to experiment with different ways of encoding compilers in a proof assistant, but also as a way of stress testing' Matita's support for dependent types.
223
224%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
225% SECTION.                                                                    %
226%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
227\subsection{What we do not implement}
228\label{subsect.what.we.do.not.implement}
229
230There are several classes of functionality that we have chosen not to implement in the backend languages:
231\begin{itemize}
232\item
233\textbf{Datatypes and functions over these datatypes that are not supported by the compiler.}
234In particular, the compiler does not support the floating point datatype, nor accompanying functions over that datatype.
235At the moment, frontend languages within the compiler possess constructors corresponding to floating point code.
236These are removed during instruction selection (in the RTLabs to RTL transformation) using a daemon.
237However, at some point, we would like the front end of the compiler to recognise programs that use floating point code and reject them as being invalid.
238\item
239\textbf{Functions and operations on datatypes that are implemented in the C runtime.}
240The compiler emits only a subset of the instructions available in the MCS-51's instruction architecture.
241In particular, integer modulus at the C source level is transformed into a call to a runtime function implementing the modulus operation during the translation to C-light.
242
243However, all datatypes corresponding to valid C operations over integers and floats mention integer modulus in the backend, and the translation of these operations is discharged using a daemon.
244We have plans to dispense with this precooking' processs at the C-light level and move the translation of these operations into the RTLabs to RTL pass, where they can be translated properly.
245\item
246\textbf{Axiomatised components that will be implemented using external oracles.}
247Several large, complex pieces of compiler infrastructure, most noticably register colouring and fixed point calculation during liveness analysis have been axiomatised.
248This was already agreed upon before the start of the project, and is clearly marked in the project proposal, following comments by those involved with the CompCert project about the difficulty in formalising register colouring in that project.
249Instead, these components are axiomatised, along with the properties that they need to satisfy in order for the rest of the compilation chain to be correct.
250These axiomatised components are found in the ERTL to LTL pass.
251\item
252\textbf{A few non-computational proof obligations.}
253A few difficult-to-close, but non-computational (i.e. they do not prevent us from executing the compiler inside Matita), proof obligations have been closed using daemons in the backend.
254These proof obligations originate with our use of dependent types for expressing invariants in the compiler.
255\textbf{Real' tailcalls}
256For the time being, tailcalls in the backend are translated to vanilla' function calls during the ERTL to LTL pass.
257This follows the O'Caml compiler, which did not implement tailcalls, and did this simplification step.
258Real' tailcalls are being implemented in the O'Caml compiler, and when this implementation is complete, we aim to port this code to the Matita compiler.
259\end{itemize}
260
261%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
262% SECTION.                                                                    %
263%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
264\section{Future work}
265\label{sect.future.work}
266
267As mentioned in Section~\ref{subsect.what.we.do.not.implement}, there are several unimplemented features in the compiler, and several aspects of the Matita code that can be improved in order to make currently partial functions total.
268We summarise this future work here:
269\begin{itemize}
270\item
271We plan to make use of dependent types to identify floating point' free programs and make all functions total over such programs.
272This will remove a swathe of uses of daemons.
273This should be routine.
274\item
275We plan to move expansion of integer modulus, and other related functions, into the instruction selection (RTLabs to RTL) phase.
276This will also help to remove a swathe of uses of daemons, as well as potentially introduce new opportunities for optimisations that we currently miss in expanding these instructions at the C-light level.
277\item
278We plan to close all existing proof obligations that are closed using daemons, arising from our use of dependent types in the backend.
279This should be routine.
280\item
281We plan to port the O'Caml compiler's implementation of tailcalls when this is completed.
282This should not cause any major problems.
283\item
284We plan to validate the backend translations, removing any obvious bugs, by executing the translation inside Matita on small C programs.
285\end{itemize}
286
287\newpage
288
289%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
290% SECTION.                                                                    %
291%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
292\section{Code listing}
293\label{sect.code.listing}
294
295%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
296% SECTION.                                                                    %
297%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
298\subsection{Listing of files}
299\label{subsect.listing.files}
300
301Translation specific files (files relating to language semantics have been omitted):
302\begin{center}
303\begin{tabular*}{0.9\textwidth}{p{5cm}p{8cm}}
304Title & Description \\
305\hline
306\texttt{RTLabs/syntax.ma} & The syntax of RTLabs \\
307\texttt{RTLabs/RTLabsToRTL.ma} & The translation from RTLabs to RTL \\
308\texttt{joint/Joint.ma} & Abstracted syntax for backend languages \\
309\texttt{joint/TranslateUtils.ma} & Generic translation utilities \\
310\texttt{RTL/RTL.ma} & The syntax of RTL \\
311\texttt{RTL/RTLToERTL.ma} & The translation from RTL to ERTL \\
312\texttt{RTL/RTLtailcall.ma} & Elimination of tailcalls \\
313\texttt{ERTL/ERTL.ma} & The syntax of ERTL \\
314\texttt{ERTL/ERTLToLTL.ma} & The translation from ERTL to LTL \\
315\texttt{ERTL/Interference.ma} & Axiomatised graph colouring component \\
316\texttt{ERTL/liveness.ma} & Liveness analysis \\
317\texttt{LTL/LTL.ma} & The syntax of LTL \\
318\texttt{LTL/LTLToLIN.ma} & The translation from LTL to LIN \\
319\texttt{LIN/LIN.ma} & The syntax of LIN \\
320\texttt{LIN/LINToASM.ma} & The translation from LIN to assembly language
321\end{tabular*}
322\end{center}
323
324\end{document}
Note: See TracBrowser for help on using the repository browser.