source: Deliverables/D4.2-4.3/reports/D4-2.tex @ 1997

Last change on this file since 1997 was 1997, checked in by mulligan, 7 years ago

Changed titles of reports to match correct deliverable title

File size: 40.7 KB
Line 
1\documentclass[11pt, epsf, a4wide]{article}
2
3\usepackage{../../style/cerco}
4
5\usepackage{amsfonts}
6\usepackage{amsmath}
7\usepackage{amssymb} 
8\usepackage[english]{babel}
9\usepackage{graphicx}
10\usepackage[utf8x]{inputenc}
11\usepackage{listings}
12\usepackage{lscape}
13\usepackage{stmaryrd}
14\usepackage{threeparttable}
15\usepackage{url}
16
17\title{
18INFORMATION AND COMMUNICATION TECHNOLOGIES\\
19(ICT)\\
20PROGRAMME\\
21\vspace*{1cm}Project FP7-ICT-2009-C-243881 \cerco{}}
22
23\lstdefinelanguage{matita-ocaml}
24  {keywords={definition,coercion,lemma,theorem,remark,inductive,record,qed,let,in,rec,match,return,with,Type,try},
25   morekeywords={[2]whd,normalize,elim,cases,destruct},
26   morekeywords={[3]type,of},
27   mathescape=true,
28  }
29
30\lstset{language=matita-ocaml,basicstyle=\small\tt,columns=flexible,breaklines=false,
31        keywordstyle=\color{red}\bfseries,
32        keywordstyle=[2]\color{blue},
33        keywordstyle=[3]\color{blue}\bfseries,
34        commentstyle=\color{green},
35        stringstyle=\color{blue},
36        showspaces=false,showstringspaces=false}
37
38\lstset{extendedchars=false}
39\lstset{inputencoding=utf8x}
40\DeclareUnicodeCharacter{8797}{:=}
41\DeclareUnicodeCharacter{10746}{++}
42\DeclareUnicodeCharacter{9001}{\ensuremath{\langle}}
43\DeclareUnicodeCharacter{9002}{\ensuremath{\rangle}}
44
45\date{}
46\author{}
47
48\begin{document}
49
50\thispagestyle{empty}
51
52\vspace*{-1cm}
53\begin{center}
54\includegraphics[width=0.6\textwidth]{../../style/cerco_logo.png}
55\end{center}
56
57\begin{minipage}{\textwidth}
58\maketitle
59\end{minipage}
60
61\vspace*{0.5cm}
62\begin{center}
63\begin{LARGE}
64\textbf{
65Report n. D4.2\\
66CIC encoding: Back-end
67}
68\end{LARGE} 
69\end{center}
70
71\vspace*{2cm}
72\begin{center}
73\begin{large}
74Version 1.0
75\end{large}
76\end{center}
77
78\vspace*{0.5cm}
79\begin{center}
80\begin{large}
81Main Authors:\\
82Dominic P. Mulligan and Claudio Sacerdoti Coen
83\end{large}
84\end{center}
85
86\vspace*{\fill}
87
88\noindent
89Project Acronym: \cerco{}\\
90Project full title: Certified Complexity\\
91Proposal/Contract no.: FP7-ICT-2009-C-243881 \cerco{}\\
92
93\clearpage
94\pagestyle{myheadings}
95\markright{\cerco{}, FP7-ICT-2009-C-243881}
96
97\newpage
98
99\vspace*{7cm}
100\paragraph{Abstract}
101We describe the encoding, in the Calculus of Constructions, of the intermediate languages for the CerCo compiler.
102The CerCo backend consists of four distinct intermediate languages---RTL, ERTL, LTL and LIN, respectively---though we also handle the translations from the last frontend intermediate language, RTLabs, into RTL, and the translation of LIN into assembly language.
103Where feasible, we have made use of dependent types to enforce invariants and to make otherwise partial functions total.
104\newpage
105
106\tableofcontents
107
108\newpage
109
110%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
111% SECTION.                                                                    %
112%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
113\section{Task}
114\label{sect.task}
115
116The Grant Agreement states that Task T4.2, entitled `Functional encoding in the Calculus of Constructions' has associated Deliverable D4.2, consisting of the following:
117\begin{quotation}
118CIC encoding: Back-end: Functional Specification in the internal language of the Proof Assistant (the Calculus of Inductive Construction) of the back end of the compiler. This unit is meant to be composable with the front-end of deliverable D3.2, to obtain a full working compiler for Milestone M2. A first validation of the design principles and implementation choices for the Untrusted Cost-annotating OCaml Compiler D2.2 is achieved and reported in the deliverable, possibly triggering updates of the Untrusted Cost-annotating OCaml Compiler sources.
119\end{quotation}
120This report details our implementation of this deliverable.
121
122%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
123% SECTION.                                                                    %
124%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
125\subsection{Connections with other deliverables}
126\label{subsect.connections.with.other.deliverables}
127
128Deliverable D4.2 enjoys a close relationship with three other deliverables, namely deliverables D2.2, D4.3 and D4.4.
129
130Deliverable D2.2, the O'Caml implementation of a cost preserving compiler for a large subset of the C programming language, is the basis upon which we have implemented the current deliverable.
131In particular, the architecture of the compiler, its intermediate languages, and the overall implementation of the Matita encodings has been taken from the O'Caml compiler.
132Any variations from the O'Caml design are due to bugs identified in the prototype compiler during the Matita implementation, our identification of code that can be abstracted and made generic, or our use of Matita's much stronger type system to enforce invariants through the use of dependent types.
133
134Deliverable D4.3 can be seen as a `sister' deliverable to the deliverable reported on herein.
135In particular, where this deliverable reports on the encoding in the Calculus of Constructions of the backend translations, D4.3 is the encoding in the Calculus of Constructions of the semantics of those languages.
136As a result, a substantial amount of Matita code is shared between the two deliverables.
137
138Deliverable D4.4, the backend correctness proofs, is the immediate successor of this deliverable.
139
140%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
141% SECTION.                                                                    %
142%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
143\subsection{A brief overview of the backend compilation chain}
144\label{subsect.brief.overview.backend.compilation.chain}
145
146The Matita compiler's backend consists of five distinct intermediate languages: RTL, RTLntl, ERTL, LTL and LIN.
147A sixth language, RTLabs, serves as the entry point of the backend and the exit point of the frontend.
148RTL, RTLntl, ERTL and LTL are `control flow graph based' languages, whereas LIN is a linearised language, the final language before translation to assembly.
149
150We now briefly discuss the properties of the intermediate languages, and discuss the various transformations that take place during the translation process:
151
152\paragraph{RTLabs ((Abstract) Register Transfer Language)}
153As mentioned, this is the final language of the compiler's frontend and the entry point for the backend.
154This language uses pseudoregisters, not hardware registers.\footnote{There are an unbounded number of pseudoregisters.  Pseudoregisters are converted to hardware registers or stack positions during register allocation.}
155Functions still use stackframes, where arguments are passed on the stack and results are stored in addresses.
156During the pass to RTL instruction selection is carried out.
157
158\paragraph{RTL (Register Transfer Language)}
159This language uses pseudoregisters, not hardware registers.
160Tailcall elimination is carried out during the translation from RTL to RTLntl.
161
162\paragraph{RTLntl (Register Transfer Language --- No Tailcalls)}
163This language is a pseudoregister, graph based language where all tailcalls are eliminated.
164RTLntl is not present in the O'Caml compiler, the RTL language is reused for this purpose.
165
166\paragraph{ERTL (Explicit Register Transfer Language)}
167This is a language very similar to RTLntl.
168However, the calling convention is made explicit, in that functions no longer receive and return inputs and outputs via a high-level mechanism, but rather use stack slots or hadware registers.
169The ERTL to LTL pass performs the following transformations: liveness analysis, register colouring and register/stack slot allocation.
170
171\paragraph{LTL (Linearisable Transfer Language)}
172Another graph based language, but uses hardware registers instead of pseudoregisters.
173Tunnelling (branch compression) should be implemented here.
174
175\paragraph{LIN (Linearised)}
176This is a linearised form of the LTL language; function graphs have been linearised into lists of statements.
177All registers have been translated into hardware registers or stack addresses.
178This is the final stage of compilation before translating directly into assembly language.
179
180%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
181% SECTION.                                                                    %
182%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
183\section{The backend intermediate languages in Matita}
184\label{sect.backend.intermediate.languages.matita}
185
186We now discuss the encoding of the compiler backend languages in the Calculus of Constructions proper.
187We pay particular heed to changes that we made from the O'Caml prototype.
188In particular, many aspects of the backend languages have been unified into a single `joint' language.
189We have also made heavy use of dependent types to reduce `spurious partiality' and to encode invariants.
190
191%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
192% SECTION.                                                                    %
193%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
194\subsection{Abstracting related languages}
195\label{subsect.abstracting.related.languages}
196
197The O'Caml compiler is written in the following manner.
198Each intermediate language has its own dedicated syntax, notions of internal function, and so on.
199Here, we make a distinction between `internal functions'---other functions that are explicitly written by the programmer---and `external functions', which belong to external library and require explictly linking.
200In particular, IO can be seen as a special case of the `external function' mechanism.
201Internal functions are represented as a record consisting of a sequential structure of statements, entry and exit points to this structure, and other book keeping devices.
202This sequential structure can either be a control flow graph or a linearised list of statements, depending on the language.
203Translations between intermediate language map syntaxes to syntaxes, and internal function representations to internal function representations explicitly.
204
205This is a perfectly valid way to write a compiler, where everything is made explicit, but writing a \emph{verified} compiler poses new challenges.
206In particular, we must look ahead to see how our choice of encodings will affect the size and complexity of the forthcoming proofs of correctness.
207We now discuss some abstractions, introduced in the Matita code, which we hope will make our proofs shorter, amongst other benefits.
208
209\paragraph{Changes between languages made explicit}
210Due to the bureaucracy inherent in explicating each intermediate language's syntax in the O'Caml compiler, it can often be hard to see exactly what changes between each successive intermediate language.
211By abstracting the syntax of the RTL, ERTL, LTL and LIN intermediate languages, we make these changes much clearer.
212
213Our abstraction takes the following form:
214\begin{lstlisting}
215inductive joint_instruction (p: params__) (globals: list ident): Type[0] :=
216  | COMMENT: String $\rightarrow$ joint_instruction p globals
217  ...
218  | INT: generic_reg p $\rightarrow$ Byte $\rightarrow$ joint_instruction p globals
219  ...
220  | OP1: Op1 $
221ightarrow$ acc_a_reg p $
222ightarrow$ acc_a_reg p $
223ightarrow$ joint_instruction p globals
224  ...
225  | extension: extend_statements p $\rightarrow$ joint_instruction p globals.
226\end{lstlisting}
227We first note that for the majority of intermediate languages, many instructions are shared.
228However, these instructions expect different register types (either a pseudoregister or a hardware register) as arguments.
229We must therefore parameterise the joint syntax with a record of parameters that will be specialised to each intermediate language.
230In the type above, this parameterisation is realised with the \texttt{params\_\_} record.
231As a result of this parameterisation, we have also added a degree of `type safety' to the intermediate languages' syntaxes.
232In particular, we note that the \texttt{OP1} constructor expects quite a specific type, in that the two register arguments must both be what passes for the accumulator A in that language.
233In some languages, for example LIN, this is the hardware accumulator, whilst in others this is any pseudoregister.
234Contrast this with the \texttt{INT} constructor, which expects a \texttt{generic\_reg}, corresponding to an `arbitrary' register type.
235
236Further, we note that some intermediate languages have language specific instructions (i.e. the instructions that change between languages).
237We therefore add a new constructor to the syntax, \texttt{extension}, which expects a value of type \texttt{extend\_statements p}.
238As \texttt{p} varies between intermediate languages, we can provide language specific extensions to the syntax of the joint language.
239For example, ERTL's extended syntax consists of the following extra statements:
240\begin{lstlisting}
241inductive ertl_statement_extension: Type[0] :=
242  | ertl_st_ext_new_frame: ertl_statement_extension
243  | ertl_st_ext_del_frame: ertl_statement_extension
244  | ertl_st_ext_frame_size: register $\rightarrow$ ertl_statement_extension.
245\end{lstlisting}
246These are further packaged into an ERTL specific instance of \texttt{params\_\_} as follows:
247\begin{lstlisting}
248definition ertl_params__: params__ :=
249  mk_params__ register register ... ertl_statement_extension.
250\end{lstlisting}
251
252\paragraph{Shared code, reduced proofs}
253Many features of individual backend intermediate languages are shared with other intermediate languages.
254For instance, RTLabs, RTL, ERTL and LTL are all graph based languages, where functions are represented as a control flow graph of statements that form their bodies.
255Functions for adding statements to a graph, searching the graph, and so on, are remarkably similar across all languages, but are duplicated in the O'Caml code.
256
257As a result, we chose to abstract the representation of internal functions for the RTL, ERTL, LTL and LIN intermediate languages into a `joint' representation.
258This representation is parameterised by a record that dictates the layout of the function body for each intermediate language.
259For instance, in RTL, the layout is graph like, whereas in LIN, the layout is a linearised list of statements.
260Further, a generalised way of accessing the successor statement to the one currently under consideration is needed, and so forth.
261
262Our joint internal function record looks like so:
263\begin{lstlisting}
264record joint_internal_function (globals: list ident) (p:params globals) : Type[0] :=
265{ 
266  ...
267  joint_if_params   : paramsT p;
268  joint_if_locals   : localsT p;
269  ...
270  joint_if_code     : codeT ... p;
271  ...
272}.
273\end{lstlisting}
274In particular, everything that can vary between differing intermediate languages has been parameterised.
275Here, we see the location where to fetch parameters, the listing of local variables, and the internal code representation has been parameterised.
276Other particulars are also parameterised, though here omitted.
277
278Hopefully this abstraction process will reduce the number of proofs that need to be written, dealing with internal functions of different languages characterised by parameters.
279
280\paragraph{Dependency on instruction selection}
281We note that the backend languages are all essentially `post instruction selection languages'.
282The `joint' syntax makes this especially clear.
283For instance, in the definition:
284\begin{lstlisting}
285inductive joint_instruction (p:params__) (globals: list ident): Type[0] :=
286  ...
287  | INT: generic_reg p $\rightarrow$ Byte $\rightarrow$ joint_instruction p globals
288  | MOVE: pair_reg p $\rightarrow$ joint_instruction p globals
289  ...
290  | PUSH: acc_a_reg p $\rightarrow$ joint_instruction p globals
291  ...
292  | extension: extend_statements p $\rightarrow$ joint_instruction p globals.
293\end{lstlisting}
294The capitalised constructors---\texttt{INT}, \texttt{MOVE}, and so on---are all machine specific instructions.
295Retargetting the compiler to another microprocessor, improving instruction selection, or simply enlarging the subset of the machine language that the compiler can use, would entail replacing these constructors with constructors that correspond to the instructions of the new target.
296We feel that this makes which instructions are target dependent, and which are not (i.e. those language specific instructions that fall inside the \texttt{extension} constructor) much more explicit.
297In the long term, we would really like to try to directly embed the target language in the syntax, in order to reuse the target language's semantics.
298
299\paragraph{Independent development and testing}
300We have essentially modularised the intermediate languages in the compiler backend.
301As with any form of modularisation, we reap benefits in the ability to independently test and develop each intermediate language separately, with the benefit of fixing bugs just once.
302
303\paragraph{Future reuse for other compiler projects}
304Another advantage of our modularisation scheme is the ability to quickly use and reuse intermediate languages for other compiler projects.
305For instance, in creating a cost-preserving compiler for a functional language, we may choose to target a linearised version of RTL directly.
306Adding such an intermediate language would involve the addition of just a few lines of code.
307
308\paragraph{Easy addition of new compiler passes}
309Under our modularisation and abstraction scheme, new compiler passes can easily be injected into the backend.
310We have a concrete example of this in the RTLntl language, an intermediate language that was not present in the original O'Caml code.
311To specify a new intermediate language we must simply specify, through the use of the statement extension mechanism, what differs in the new intermediate language from the `joint' language, and configure a new notion of internal function record, by specialising parameters, to the new language.
312As generic code for the `joint' language exists, for example to add statements to control flow graphs, this code can be reused for the new intermediate language.
313
314\paragraph{Possible commutations of translation passes}
315The backend translation passes of the CerCo compiler differ quite a bit from the CompCert compiler.
316In the CompCert compiler, linearisation occurs much earlier in the compilation chain, and passes such as register colouring and allocation are carried out on a linearised form of program.
317Contrast this with our own approach, where the code is represented as a graph for much longer.
318Similarly, in CompCert the calling conventions are enforced after register allocation, whereas we do register allocation before enforcing the calling convention.
319
320However, by abstracting the representation of intermediate functions, we are now much more free to reorder translation passes as we see fit.
321The linearisation process, for instance, now no longer cares about the specific representation of code in the source and target languages.
322It just relies on a common interface.
323We are therefore, in theory, free to pick where we wish to linearise our representation.
324This adds an unusual flexibility into the compilation process, and allows us to freely experiment with different orderings of translation passes.
325
326%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
327% SECTION.                                                                    %
328%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
329\subsection{Use of dependent types}
330\label{subsect.use.of.dependent.types}
331
332We see several potential ways in which a compiler can fail to compile a program:
333\begin{enumerate}
334\item
335The program is malformed, and there is no hope of making sense of the program.
336\item
337The compiler is buggy, or an invariant in the compiler is invalidated.
338\item
339An incomplete heuristic in the compiler fails.
340\item
341The compiled code exhausts some bounded resource, for instance the processor's code memory.
342\end{enumerate}
343Standard compilers can fail for all the above reasons.
344Certified compilers are only required to rule out the second class of failures, but they can still fail for all the remaining reasons.
345In particular, a compiler that systematically refuses to compile any well formed program is a sound compiler.
346On the contrary, we would like our certified compiler to fail only in the fourth case.
347We plan to achieve this with the following strategy.
348
349First, the compiler is abstracted over all incomplete heuristics, seen as total functions.
350To obtain executable code, the compiler is eventually composed with implementations of the abstracted strategies, with the composition taking care of any potential failure of the heuristics in finding a solution.
351
352Second, we reject all malformed programs using dependent types: only well-formed programs should typecheck and the compiler can be applied only to well-typed programs.
353
354Finally, exhaustion of bounded resources can be checked only at the very end of compilation.
355Therefore, all intermediate compilation steps are now total functions that cannot diverge, nor fail: these properties are directly guaranteed by the type system of Matita.
356
357Presently, the plan is not yet fulfilled.
358However, we are improving the implemented code according to our plan.
359We are doing this by progressively strenghthening the code through the use of dependent types.
360We detail the different ways in which dependent types have been used so far.
361
362First, we encode informal invariants, or uses of \texttt{assert false} in the O'Caml code, with dependent types, converting partial functions into total functions.
363There are numerous examples of this throughout the backend.
364For example, in the \texttt{RTLabs} to \texttt{RTL} transformation pass, many functions only `make sense' when lists of registers passed to them as arguments conform to some specific length.
365For instance, the \texttt{translate\_negint} function, which translates a negative integer constant:
366\begin{lstlisting}
367definition translate_negint :=
368  $\lambda$globals: list ident.
369  $\lambda$destrs: list register.
370  $\lambda$srcrs: list register.
371  $\lambda$start_lbl: label.
372  $\lambda$dest_lbl: label.
373  $\lambda$def: rtl_internal_function globals.
374  $\lambda$prf: |destrs| = |srcrs|. (* assert here *)
375    ...
376\end{lstlisting}
377The last argument to the function, \texttt{prf}, is a proof that the lengths of the lists of source and destination registers are the same.
378This was an assertion in the O'Caml code.
379
380Secondly, we make use of dependent types to make the Matita code correct by construction, and eventually the proofs of correctness for the compiler easier to write.
381For instance, many intermediate languages in the backend of the compiler, from RTLabs to LTL, are graph based languages.
382Here, function definitions consist of a graph (i.e. a map from labels to statements) and a pair of labels denoting the entry and exit points of this graph.
383Practically, we would always like to ensure that the entry and exit labels are present in the statement graph.
384We ensure that this is so with a dependent sum type in the \texttt{joint\_internal\_function} record, which all graph based languages specialise to obtain their own internal function representation:
385\begin{lstlisting}
386record joint_internal_function (globals: list ident) (p: params globals): Type[0] :=
387{
388  ...
389  joint_if_code     : codeT $\ldots$ p;
390  joint_if_entry    : $\Sigma$l: label. lookup $\ldots$ joint_if_code l $\neq$ None $\ldots$;
391  ...
392}.
393\end{lstlisting}
394Here, \texttt{codeT} is a parameterised type representing the `structure' of the function's body (a graph in graph based languages, and a list in the linearised LIN language).
395Specifically, the \texttt{joint\_if\_entry} is a dependent pair consisting of a label and a proof that the label in question is a vertex in the function's graph. A similar device exists for the exit label.
396
397We make use of dependent types also for another reason: experimentation.
398Namely, CompCert makes little use of dependent types to encode invariants.
399In contrast, we wish to make as much use of dependent types as possible, both to experiment with different ways of encoding compilers in a proof assistant, but also as a way of `stress testing' Matita's support for dependent types.
400
401Moreover, at the moment we make practically no use of inductive predicates to specify compiler invariants and to describe the semantics of intermediate languages.
402On the contrary, all predicates are computable functions.
403Therefore, the proof style that we will adopt will be necessarily significantly different from, say, CompCert's one.
404At the moment, in Matita `Russell-'-style reasoning (in the sense of~\cite{sozeau:subset:2006}) seems to be the best solution for working with computable functions.
405This style is heavily based on the idea that all computable functions should be specified using dependent types to describe their pre- and post-conditions.
406As a consequence, it is natural to add dependent types almost everywhere in the Matita compiler's codebase.
407
408%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
409% SECTION.                                                                    %
410%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
411\subsection{What we do not implement}
412\label{subsect.what.we.do.not.implement}
413
414There are several classes of functionality that we have chosen not to implement in the backend languages:
415\begin{itemize}
416\item
417\textbf{Datatypes and functions over these datatypes that are not supported by the compiler.}
418In particular, the compiler does not support the floating point datatype, nor accompanying functions over that datatype.
419At the moment, frontend languages within the compiler possess constructors corresponding to floating point code.
420These are removed during instruction selection (in the RTLabs to RTL transformation) using a daemon.\footnote{A Girardism.  An axiom of type \texttt{False}, from which we can prove anything.}
421However, at some point, we would like the front end of the compiler to recognise programs that use floating point code and reject them as being invalid.
422\item
423\textbf{Axiomatised components that will be implemented using external oracles.}
424Several large, complex pieces of compiler infrastructure, most noticably register colouring and fixed point calculation during liveness analysis have been axiomatised.
425This was already agreed upon before the start of the project, and is clearly marked in the project proposal, following comments by those involved with the CompCert project about the difficulty in formalising register colouring in that project.
426Instead, these components are axiomatised, along with the properties that they need to satisfy in order for the rest of the compilation chain to be correct.
427These axiomatised components are found in the ERTL to LTL pass.
428
429It should be noted that these axiomatised components fall into the following pattern: whilst their implementation is complex, and their proof of correctness is difficult, we are able to quickly and easily verify that any answer that they provide is correct. Therefore, we plan to provide implementations in OCaml only,
430and to provide certified verifiers in Matita.
431At the moment, the implementation of the certified verifiers is left as future work.
432\item
433\textbf{A few non-computational proof obligations.}
434A few difficult-to-close, but non-computational (i.e. they do not prevent us from executing the compiler inside Matita), proof obligations have been closed using daemons in the backend.
435These proof obligations originate with our use of dependent types for expressing invariants in the compiler.
436However, here, it should be mentioned that many open proof obligations are simply impossible to close until we start to obtain stronger invariants from the proof of correctness for the compiler proper.
437In particular, in the RTLabs to RTL pass, several proof obligations relating to lists of registers stored in a `local environment' appear to fall into this pattern.
438\item
439\textbf{Branch compression (tunnelling).}
440This was a feature of the O'Caml compiler.
441It is not yet currently implemented in the Matita compiler.
442This feature is only an optimisation, and will not affect the correctness of the compiler.
443\item
444\textbf{`Real' tailcalls}
445For the time being, tailcalls in the backend are translated to `vanilla' function calls during the ERTL to LTL pass.
446This follows the O'Caml compiler, which did not implement tailcalls, and did this simplification step.
447`Real' tailcalls are being implemented in the O'Caml compiler, and when this implementation is complete, we aim to port this code to the Matita compiler.
448\end{itemize}
449
450%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
451% SECTION.                                                                    %
452%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
453\section{Associated changes to O'Caml compiler}
454\label{sect.associated.changes.to.ocaml.compiler}
455
456At the moment, only bugfixes, but no architectural changes we have made in the Matita backend have made their way back into the O'Caml compiler.
457We do not see the heavy process of modularisation and abstraction as making its way back into the O'Caml codebase, as this is a significant rewrite of the backend code that is supposed to yield the same code after instantiating parameters, anyway.
458
459%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
460% SECTION.                                                                    %
461%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
462\section{Future work}
463\label{sect.future.work}
464
465As mentioned in Section~\ref{subsect.what.we.do.not.implement}, there are several unimplemented features in the compiler, and several aspects of the Matita code that can be improved in order to make currently partial functions total.
466We summarise this future work here:
467\begin{itemize}
468\item
469We plan to make use of dependent types to identify `floating point' free programs and make all functions total over such programs.
470This will remove a swathe of uses of daemons.
471This should be routine.
472\item
473We plan to move expansion of integer modulus, and other related functions, into the instruction selection (RTLabs to RTL) phase.
474This will also help to remove a swathe of uses of daemons, as well as potentially introduce new opportunities for optimisations that we currently miss in expanding these instructions at the C-light level.
475\item
476We plan to close all existing proof obligations that are closed using daemons, arising from our use of dependent types in the backend.
477However, many may not be closable until we have completed Deliverable D4.4, the certification of the whole compiler, as we may not have invariants strong enough at the present time.
478\item
479We plan to port the O'Caml compiler's implementation of tailcalls when this is completed, and eventually port the branch compression code currently in the O'Caml compiler to the Matita implementation.
480This should not cause any major problems.
481\item
482We plan to validate the backend translations, removing any obvious bugs, by executing the translation inside Matita on small C programs.
483This is not critical, as the certification process will find all bugs anyway.
484\item We plan to provide certified validators for all results provided by
485external oracles written in OCaml. At the moment, we have axiomatized oracles
486for computing least fixpoints during liveness analysis, for colouring
487registers and for branch displacement in the assembler code.
488\end{itemize}
489
490%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
491% SECTION.                                                                    %
492%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
493\section{Code listing}
494\label{sect.code.listing}
495
496%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
497% SECTION.                                                                    %
498%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
499\subsection{Listing of files}
500\label{subsect.listing.files}
501
502Translation specific files (files relating to language semantics have been omitted) are presented in Table~\ref{table.syntax}.
503Here, the O'Caml column denotes the O'Caml source file in the prototype compiler's implementation that corresponds to the Matita script in question.
504The ratios are the linecounts of the Matita file divided by the line counts of the corresponding O'Caml file(s).
505These are computed with \texttt{wc -l}, a standard Unix tool.
506
507Individual file's ratios are an over approximation, due to the fact that it's hard to relate an individual O'Caml file to the abstracted Matita code that has been spread across multiple files.
508The ratio between total Matita code lines and total O'Caml code lines is more reflective of the compressed and abstracted state of the Matita translation.
509
510Translations and utilities are presented in Table~\ref{table.translation.utilities}.
511
512Given that Matita code is much more verbose than O'Caml code, with explicit typing and inline proofs, we have achieved respectable line count ratios in the translation.
513Some of these ratio, however, need explanation.
514In particular, the RTLabs to RTL translation stands out.
515Note that RTLabs is not subject to our abstraction process, and the language's syntax is fully explicated.
516Further, the RTLabs to RTL translation is quite involved, including instruction selection.
517
518Other, longer translations include the file \texttt{ERTLtoLTL.ma}.
519This is actually the concatenation of the files \texttt{ERTLtoLTLI.ml} and \texttt{ERTLtoLTL.ml} in the O'Caml source, hence its length.
520
521Further, many translations are actually significantly shorter than their O'Caml counterparts due to axiomatisation, and the lack of structure and functor declarations in Matita.
522
523We note that the O'Caml backend codebase consists of 6770 lines of O'Caml code (including comments).
524The Matita codebase consists of 6447 lines of Matita code (including comments).
525This is a ratio of 0.95.
526
527\begin{landscape}
528\begin{table}
529\begin{threeparttable}
530\begin{tabular}{llrlrl}
531Description & Matita & Lines & O'Caml & Lines & Ratio \\
532\hline
533Abstracted syntax for backend languages & \texttt{joint/Joint.ma} & 173 & N/A & N/A & N/A \\
534The syntax of RTLabs & \texttt{RTLabs/syntax.ma} & 73 & \texttt{RTLabs/RTLabs.mli} & 113 & 0.65 \\
535The syntax of RTL & \texttt{RTL/RTL.ma} & 49 & \texttt{RTL/RTL.mli} & 120 & 1.85\tnote{a} \\
536The syntax of ERTL & \texttt{ERTL/ERTL.ma} & 25 & \texttt{ERTL/ERTL.mli} & 191 & 1.04\tnote{a} \\
537The syntax of the abstracted combined LTL and LIN language & \texttt{LIN/joint\_LTL\_LIN.ma} & 10 & N/A & N/A & N/A \\
538The specialisation of the above file to the syntax of LTL & \texttt{LTL/LTL.ma} & 10 & \texttt{LTL/LTL.mli} & 104 & 1.86\tnote{b} \\
539The specialisation of the above file to the syntax of LIN & \texttt{LIN/LIN.ma} & 17 & \texttt{LIN/LIN.mli} & 88 & 2.27\tnote{b} \\
540\end{tabular}
541\begin{tablenotes}
542  \item[a] After inlining of \texttt{joint/Joint.ma}.
543  \item[b] After inlining of \texttt{joint/Joint\_LTL\_LIN.ma} and \texttt{joint/Joint.ma}.\\
544  \begin{tabular}{ll}
545  Total lines of Matita code for the above files:& 347 \\
546  Total lines of O'Caml code for the above files:& 616 \\
547  Ratio of total lines:& 0.56
548  \end{tabular}
549\end{tablenotes}
550\end{threeparttable}
551\caption{Syntax files}
552\label{table.syntax}
553\end{table}
554\end{landscape}
555\begin{landscape}
556\begin{table}
557\begin{threeparttable}
558\begin{tabular}{llrlrl}
559Description & Matita & Lines & O'Caml & Lines & Ratio \\
560\hline
561Generic translation utilities & \texttt{joint/TranslateUtils.ma} & 59 & N/A & N/A & N/A \\
562Translation from RTLabs to RTL & \texttt{RTLabs/RTLabsToRTL.ma} & 1251 & \texttt{RTLabs/RTLabsToRTL.ml} & 778 & 1.61 \\
563Translation from RTL to ERTL & \texttt{RTL/RTLToERTL.ma} & 417 & \texttt{RTL/RTLToERTL.ml} & 472 & 1.01\tnote{a} \\
564Elimination of tailcalls & \texttt{RTL/RTLTailcall.ma} & 52 & \texttt{RTL/RTLtailcall.ml} & 25 & 2.02 \\
565Translation from ERTL to LTL & \texttt{ERTL/ERTLToLTL.ma} & 464 & \texttt{ERTL/ERTLToLTL.ml} & 429\tnote{b} & 1.22\tnote{c} \\
566Axiomatised graph colouring component & \texttt{ERTL/Interference.ma} & 26 & \texttt{common/interference.ml} & 861 & 0.03\tnote{a} \\
567Liveness analysis & \texttt{ERTL/liveness.ma} & 298 & \texttt{ERTL/liveness.ml} & 323 & 0.92 \\
568Translation from LTL to LIN & \texttt{LTL/LTLToLIN.ma} & 92 & \texttt{LTL/LTLToLIN.ml} & 302\tnote{d} & 0.53\tnote{e} \\
569Generic code for LTL and LIN languages & \texttt{LIN/joint\_LTL\_LIN.ma} & 10 & N/A & N/A & N/A \\
570Translation from LIN to assembly & \texttt{LIN/LINToASM.ma} & 335 & \texttt{LIN/LINToASM.ml} & 142 & 2.85\tnote{f}
571\end{tabular}
572\begin{tablenotes}
573  \item[a] After inlining of \texttt{joint/TranslateUtils.ma}.
574  \item[b] After inlining of \texttt{ERTL/ERTLToLTLI.ml}.
575  \item[c] After inlining of \texttt{joint/TranslateUtils.ma} and \texttt{ERTL/ERTLToLTLI.ml}.
576  \item[d] After inlining of \texttt{LTL/LTLToLINI.ml}.
577  \item[e] After inlining of \texttt{joint/TranslateUtils.ma}, \texttt{LTL/LTLToLINI.ml} and \texttt{joint/joint\_LTL\_LIN.ma}.
578  \item[f] After inlining of \texttt{joint/TranslateUtils.ma}, and \texttt{joint/joint\_LTL\_LIN.ma}.\\
579  \begin{tabular}{ll}
580  Total lines of Matita code for the above files:& 3032. \\
581  Total lines of O'Caml code for the above files:& 3332. \\
582  Ratio of total lines:& 0.91.
583  \end{tabular}
584\end{tablenotes}
585\end{threeparttable}
586\caption{Translation files.}
587\label{table.translation.utilities}
588\end{table}
589\end{landscape}
590
591%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
592% SECTION.                                                                    %
593%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
594\subsection{Listing of important functions and axioms}
595\label{subsect.listing.important.functions.and.axioms}
596
597We list some important functions and axioms in the backend compilation:
598
599\paragraph{From RTLabs/RTLabsToRTL.ma}
600\begin{center}
601\begin{tabular*}{0.9\textwidth}{p{5cm}p{8cm}}
602Title & Description \\
603\hline
604\texttt{translate\_stmt} & Translation of an RTLabs statement to an RTL statement \\
605\texttt{translate\_internal} & Translation of an RTLabs internal function to an RTL internal function \\
606\texttt{rtlabs\_to\_rtl} & Translation of an RTLabs program to an RTL program
607\end{tabular*}
608\end{center}
609
610\paragraph{From joint/TranslateUtils.ma}
611\begin{center}
612\begin{tabular*}{0.9\textwidth}{p{5cm}p{8cm}}
613Title & Description \\
614\hline
615\texttt{fresh\_regs} & Generic fresh pseudoregister generation, for any intermediate language \\
616\texttt{adds\_graph} & Generic means of adding a statement to a graph, for any intermediate language
617\end{tabular*}
618\end{center}
619
620\paragraph{From RTL/RTLTailcall.ma}
621\begin{center}
622\begin{tabular*}{0.9\textwidth}{p{5cm}p{8cm}}
623Title & Description \\
624\hline
625\texttt{simplify\_statement} & Remove a single tailcall \\
626\texttt{simplify\_graph} & Remove all tailcalls in the function graph \\
627\texttt{tailcall\_simplify} & Simplify an RTL program by removing tailcalls
628\end{tabular*}
629\end{center}
630
631\paragraph{From RTL/RTLToERTL.ma}
632\begin{center}
633\begin{tabular*}{0.9\textwidth}{p{5cm}p{8cm}}
634Title & Description \\
635\hline
636\texttt{translate\_stmt} & Translation of an RTL statement to an ERTL statement \\
637\texttt{translate\_funct\_internal} & Translation of an RTL internal function to an ERTL internal function \\
638\texttt{translate} & Translation of an RTL program to an ERTL program
639\end{tabular*}
640\end{center}
641
642\paragraph{From ERTL/liveness.ma}
643\begin{center}
644\begin{tabular*}{0.9\textwidth}{p{5cm}p{8cm}}
645Title & Description \\
646\hline
647\texttt{analyse} & Dead code analysis
648\end{tabular*}
649\end{center}
650
651\paragraph{From ERTL/Interference.ma}
652\begin{center}
653\begin{tabular*}{0.9\textwidth}{p{5cm}p{8cm}}
654Title & Description \\
655\hline
656\texttt{build} & The (axiomatised) graph colouring for register and stack slot allocation
657\end{tabular*}
658\end{center}
659
660\paragraph{From ERTL/ERTLToLTL.ma}
661\begin{center}
662\begin{tabular*}{0.9\textwidth}{p{5cm}p{8cm}}
663Title & Description \\
664\hline
665\texttt{translate\_statement} & Translation of an ERTL statement into multiple LTL statements \\
666\texttt{translate\_internal} & Translation of an ERTL internal function into an LTL internal function \\
667\texttt{ertl\_to\_ltl} & Translation of an ERTL program into an LTL program
668\end{tabular*}
669\end{center}
670
671\paragraph{From LTL/LTLToLIN.ma}
672\begin{center}
673\begin{tabular*}{0.9\textwidth}{p{5cm}p{8cm}}
674Title & Description \\
675\hline
676\texttt{visit} & Visits, in order, every node in the statement graph for linearisation \\
677\texttt{translate\_stmt} & Translation of an LTL statement to a LIN statement \\
678\texttt{branch\_compress} & Place holder (currently identity) function for branch compression \\
679\texttt{translate\_internal} & Translation of an LTL internal function into a LIN internal function \\
680\texttt{ltl\_to\_lin} & Translation of an LTL program to a LIN program
681\end{tabular*}
682\end{center}
683
684\paragraph{From LIN/LINToASM.ma}
685\begin{center}
686\begin{tabular*}{0.9\textwidth}{p{5cm}p{8cm}}
687Title & Description \\
688\hline
689\texttt{translate\_statements} & Translation of a LIN statement to mutliple assembly instructions \\
690\texttt{translate\_fun\_def} & Translation of a LIN internal function definition into assembly \\
691\texttt{translate} & Translation of a LIN program into assembly
692\end{tabular*}
693\end{center}
694
695\bibliographystyle{alpha}
696\bibliography{report}
697
698\end{document}
Note: See TracBrowser for help on using the repository browser.