# source:Deliverables/D4.2-4.3/reports/D4-2.tex@1373

Last change on this file since 1373 was 1373, checked in by mulligan, 8 years ago

changes to file based on claudio's suggestions

File size: 34.5 KB
Line
1\documentclass[11pt, epsf, a4wide]{article}
2
3\usepackage{../../style/cerco}
4
5\usepackage{amsfonts}
6\usepackage{amsmath}
7\usepackage{amssymb}
8\usepackage[english]{babel}
9\usepackage{graphicx}
10\usepackage[utf8x]{inputenc}
11\usepackage{listings}
12\usepackage{stmaryrd}
13\usepackage{url}
14
15\title{
16INFORMATION AND COMMUNICATION TECHNOLOGIES\\
17(ICT)\\
18PROGRAMME\\
19\vspace*{1cm}Project FP7-ICT-2009-C-243881 \cerco{}}
20
21\lstdefinelanguage{matita-ocaml}
22  {keywords={definition,coercion,lemma,theorem,remark,inductive,record,qed,let,in,rec,match,return,with,Type,try},
23   morekeywords={[2]whd,normalize,elim,cases,destruct},
24   morekeywords={[3]type,of},
25   mathescape=true,
26  }
27
28\lstset{language=matita-ocaml,basicstyle=\small\tt,columns=flexible,breaklines=false,
29        keywordstyle=\color{red}\bfseries,
30        keywordstyle=[2]\color{blue},
31        keywordstyle=[3]\color{blue}\bfseries,
33        stringstyle=\color{blue},
34        showspaces=false,showstringspaces=false}
35
36\lstset{extendedchars=false}
37\lstset{inputencoding=utf8x}
38\DeclareUnicodeCharacter{8797}{:=}
39\DeclareUnicodeCharacter{10746}{++}
40\DeclareUnicodeCharacter{9001}{\ensuremath{\langle}}
41\DeclareUnicodeCharacter{9002}{\ensuremath{\rangle}}
42
43\date{}
44\author{}
45
46\begin{document}
47
48\thispagestyle{empty}
49
50\vspace*{-1cm}
51\begin{center}
52\includegraphics[width=0.6\textwidth]{../../style/cerco_logo.png}
53\end{center}
54
55\begin{minipage}{\textwidth}
56\maketitle
57\end{minipage}
58
59\vspace*{0.5cm}
60\begin{center}
61\begin{LARGE}
62\textbf{
63Report n. D4.2\\
64Functional encoding in the Calculus of Constructions
65}
66\end{LARGE}
67\end{center}
68
69\vspace*{2cm}
70\begin{center}
71\begin{large}
72Version 1.0
73\end{large}
74\end{center}
75
76\vspace*{0.5cm}
77\begin{center}
78\begin{large}
79Main Authors:\\
80Dominic P. Mulligan and Claudio Sacerdoti Coen
81\end{large}
82\end{center}
83
84\vspace*{\fill}
85
86\noindent
87Project Acronym: \cerco{}\\
88Project full title: Certified Complexity\\
89Proposal/Contract no.: FP7-ICT-2009-C-243881 \cerco{}\\
90
91\clearpage
93\markright{\cerco{}, FP7-ICT-2009-C-243881}
94
95\newpage
96
97\vspace*{7cm}
98\paragraph{Abstract}
99We describe the encoding, in the Calculus of Constructions, of the intermediate languages for the CerCo compiler.
100The CerCo backend consists of four distinct intermediate languages---RTL, ERTL, LTL and LIN, respectively---though we also handle the translations from the last frontend intermediate language, RTLabs, into RTL, and the translation of LIN into assembly language.
101Where feasible, we have made use of dependent types to enforce invariants and to make otherwise partial functions total.
102\newpage
103
104\tableofcontents
105
106\newpage
107
108%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
109% SECTION.                                                                    %
110%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
113
114The Grant Agreement states that Task T4.2, entitled Functional encoding in the Calculus of Constructions' has associated Deliverable D4.2, consisting of the following:
115\begin{quotation}
116CIC encoding: Back-end: Functional Specification in the internal language of the Proof Assistant (the Calculus of Inductive Construction) of the back end of the compiler. This unit is meant to be composable with the front-end of deliverable D3.2, to obtain a full working compiler for Milestone M2. A first validation of the design principles and implementation choices for the Untrusted Cost-annotating OCaml Compiler D2.2 is achieved and reported in the deliverable, possibly triggering updates of the Untrusted Cost-annotating OCaml Compiler sources.
117\end{quotation}
118This report details our implementation of this deliverable.
119
120%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
121% SECTION.                                                                    %
122%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
123\subsection{Connections with other deliverables}
124\label{subsect.connections.with.other.deliverables}
125
126Deliverable D4.2 enjoys a close relationship with three other deliverables, namely deliverables D2.2, D4.3 and D4.4.
127
128Deliverable D2.2, the O'Caml implementation of a cost preserving compiler for a large subset of the C programming language, is the basis upon which we have implemented the current deliverable.
129In particular, the architecture of the compiler, its intermediate languages, and the overall implementation of the Matita encodings has been taken from the O'Caml compiler.
130Any variations from the O'Caml design are due to bugs identified in the prototype compiler during the Matita implementation, our identification of code that can be abstracted and made generic, or our use of Matita's much stronger type system to enforce invariants through the use of dependent types.
131
132Deliverable D4.3 can be seen as a sister' deliverable to the deliverable reported on herein.
133In particular, where this deliverable reports on the encoding in the Calculus of Constructions of the backend translations, D4.3 is the encoding in the Calculus of Constructions of the semantics of those languages.
134As a result, a substantial amount of Matita code is shared between the two deliverables.
135
136Deliverable D4.4, the backend correctness proofs, is the immediate successor of this deliverable.
137
138%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
139% SECTION.                                                                    %
140%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
141\subsection{A brief overview of the backend compilation chain}
142\label{subsect.brief.overview.backend.compilation.chain}
143
144The Matita compiler's backend consists of five distinct intermediate languages: RTL, RTLntl, ERTL, LTL and LIN.
145A fifth language, RTLabs, serves as the entry point of the backend and the exit point of the frontend.
146RTL, RTLntl, ERTL and LTL are graph based' languages, whereas LIN is a linearised language, the final language before translation to assembly.
147
148We now briefly discuss the properties of the intermediate languages, and discuss the various transformations that take place during the translation process:
149
150\paragraph{RTLabs ((Abstract) Register Transfer Language)}
151As mentioned, this is the final language of the compiler's frontend and the entry point for the backend.
152This language uses pseudoregisters, not hardware registers.\footnote{There are an unbounded number of pseudoregisters.  Pseudoregisters are converted to hardware registers of stack positions during register allocation.}
153Functions still use stackframes, where arguments are passed on the stack and results are stored in addresses.
154During the pass to RTL, these are eliminated, and instruction selection is carried out.
155
156\paragraph{RTL (Register Transfer Language)}
157This language uses pseudoregisters, not hardware registers.
158Tailcall elimination is carried out during the translation from RTL to RTLntl.
159
160\paragraph{RTLntl (Register Transfer Language --- No Tailcalls)}
161This language is a pseudoregister, graph based language where all tailcalls are eliminated.
162RTLntl is not present in the O'Caml compiler.
163
164\paragraph{ERTL (Extended Register Transfer Language)}
165In this language most instructions still operate on pseudoregisters, apart from instructions that move data to, and from, the accumulator.
166The ERTL to LTL pass performs the following transformations: liveness analysis, register colouring and register/stack slot allocation.
167
168\paragraph{LTL (Linearisable Transfer Language)}
169Another graph based language, but uses hardware registers instead of pseudoregisters.
170Tunnelling (branch compression) should be implemented here.
171
172\paragraph{LIN (Linearised)}
173This is a linearised form of the LTL language; function graphs have been linearised into lists of statements.
174All registers have been translated into hardware registers or stack addresses.
175This is the final stage of compilation before translating directly into assembly language.
176
177%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
178% SECTION.                                                                    %
179%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
180\section{The backend intermediate languages in Matita}
181\label{sect.backend.intermediate.languages.matita}
182
183We now discuss the encoding of the compiler backend languages in the Calculus of Constructions proper.
184We pay particular heed to changes that we made from the O'Caml prototype.
185In particular, many aspects of the backend languages have been unified into a single joint' language.
186We have also made heavy use of dependent types to reduce spurious partiality' and to encode invariants.
187
188%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
189% SECTION.                                                                    %
190%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
191\subsection{Abstracting related languages}
192\label{subsect.abstracting.related.languages}
193
194The O'Caml compiler is written in the following manner.
195Each intermediate language has its own dedicated syntax, notions of internal function, and so on.
196Here, we make a distinction between internal functions'---other functions that are explicitly written by the programmer, and external functions', which belong to external library and require explictly linking.
197Internal functions are represented as a record, consisting of a sequential structure, of some description, of statements, entry and exit points to this structure, and other book keeping devices.
198Translations between intermediate language map syntaxes to syntaxes, and internal function representations to internal function representations explicitly.
199
200This is a perfectly valid way to write a compiler, where everything is made explicit, but writing a \emph{verified} compiler poses new challenges.
201In particular, we must look ahead to see how our choice of encodings will affect the size and complexity of the forthcoming proofs of correctness.
202We now discuss some abstractions, introduced in the Matita code, which we hope will make our proofs shorter, amongst other benefits.
203
204\paragraph{Changes between languages made explicit}
205Due to the bureaucracy inherent in explicating each intermediate language's syntax in the O'Caml compiler, it can often be hard to see exactly what changes between each successive intermediate language.
206By abstracting the syntax of the RTL, ERTL, LTL and LIN intermediate languages, we make these changes much clearer.
207
208Our abstraction takes the following form:
209\begin{lstlisting}
210inductive joint_instruction (p: params__) (globals: list ident): Type[0] :=
211  | COMMENT: String $\rightarrow$ joint_instruction p globals
212  ...
213  | INT: generic_reg p $\rightarrow$ Byte $\rightarrow$ joint_instruction p globals
214  ...
215  | OP1: Op1 → acc_a_reg p → acc_a_reg p → joint_instruction p globals
216  ...
217  | extension: extend_statements p $\rightarrow$ joint_instruction p globals.
218\end{lstlisting}
219We first note that for the majority of intermediate languages, many instructions are shared.
220However, these instructions expect different register types (either a pseudoregister or a hardware register) as arguments.
221We must therefore parameterise the joint syntax with a record of parameters that will be specialised to each intermediate language.
222In the type above, this parameterisation is realised with the \texttt{params\_\_} record.
223As a result of this parameterisation, we have also added a degree of type safety' to the intermediate languages' syntaxes.
224In particular, we note that the \texttt{OP1} constructor expects quite a specific type, in that the two register arguments must both be the accumulator A.
225Contrast this with the \texttt{INT} constructor, which expects a \texttt{generic\_reg}, corresponding to an arbitrary' register type.
226
227Further, we note that some intermediate languages have language specific instructions (i.e. the instructions that change between languages).
228We therefore add a new constructor to the syntax, \texttt{extension}, which expects a value of type \texttt{extend\_statements p}.
229As \texttt{p} varies between intermediate languages, we can provide language specific extensions to the syntax of the joint language.
230For example, ERTL's extended syntax consists of the following extra statements:
231\begin{lstlisting}
232inductive ertl_statement_extension: Type[0] :=
233  | ertl_st_ext_new_frame: ertl_statement_extension
234  | ertl_st_ext_del_frame: ertl_statement_extension
235  | ertl_st_ext_frame_size: register $\rightarrow$ ertl_statement_extension.
236\end{lstlisting}
237These are further packaged into an ERTL specific instance of \texttt{params\_\_} as follows:
238\begin{lstlisting}
239definition ertl_params__: params__ :=
240  mk_params__ register register ... ertl_statement_extension.
241\end{lstlisting}
242
243\paragraph{Shared code, reduced proofs}
244Many features of individual backend intermediate languages are shared with other intermediate languages.
245For instance, RTLabs, RTL, ERTL and LTL are all graph based languages, where functions are represented as a graph of statements that form their bodies.
246Functions for adding statements to a graph, searching the graph, and so on, are remarkably similar across all languages, but are duplicated in the O'Caml code.
247
248As a result, we chose to abstract the representation of internal functions for the RTL, ERTL, LTL and LIN intermediate languages into a joint' representation.
249This representation is parameterised by a record that dictates the layout of the function body for each intermediate language.
250For instance, in RTL, the layout is graph like, whereas in LIN, the layout is a linearised list of statements.
251Further, a generalised way of accessing the successor statement to the one currently under consideration is needed, and so forth.
252
253Our joint internal function record looks like so:
254\begin{lstlisting}
255record joint_internal_function (globals: list ident) (p:params globals) : Type[0] ≝
256{
257  ...
258  joint_if_params   : paramsT p;
259  joint_if_locals   : localsT p;
260  ...
261  joint_if_code     : codeT … p;
262  ...
263}.
264\end{lstlisting}
265In particular, everything that can vary between differing intermediate languages has been parameterised.
266Here, we see the number of parameters, the listing of local variables, and the internal code representation has been parameterised.
267Other particulars are also parameterised, though here omitted.
268
269Hopefully this abstraction process will reduce the number of proofs that need to be written, dealing with internal functions.
270We only need to prove once that fetching a statement's successor is correct', and we inherit this property for free for every intermediate language.
271
272\paragraph{Dependency on instruction selection}
273We note that the backend languages are all essentially post instruction selection languages'.
274The joint' syntax makes this especially clear.
275For instance, in the definition:
276\begin{lstlisting}
277inductive joint_instruction (p:params__) (globals: list ident): Type[0] ≝
278  ...
279  | INT: generic_reg p → Byte → joint_instruction p globals
280  | MOVE: pair_reg p → joint_instruction p globals
281  ...
282  | PUSH: acc_a_reg p → joint_instruction p globals
283  ...
284  | extension: extend_statements p → joint_instruction p globals.
285\end{lstlisting}
286The capitalised constructors---\texttt{INT}, \texttt{MOVE}, and so on---are all machine specific instructions.
287Retargetting the compiler to another microprocessor would entail replacing these constructors with constructors that correspond to the instructions of the new target.
288We feel that this makes which instructions are target dependent, and which are not (i.e. those language specific instructions that fall inside the \texttt{extension} constructor) much more explicit.
289
290\paragraph{Independent development and testing}
291We have essentially modularised the intermediate languages in the compiler backend.
292As with any form of modularisation, we reap benefits in the ability to independently test and develop each intermediate language separately.
293
294\paragraph{Future reuse for other compiler projects}
295Another advantage of our modularisation scheme is the ability to quickly use and reuse intermediate languages for other compiler projects.
296For instance, in creating a cost-preserving compiler for a functional language, we may choose to target the RTL language directly.
297Naturally, the register requirements for a functional language may differ from those of an imperative language, a reconfiguration which our parameterisation makes easy.
298
299\paragraph{Easy addition of new compiler passes}
300Under our modularisation and abstraction scheme, new compiler passes can easily be injected into the backend.
301We have a concrete example of this in the RTLntl language, an intermediate language that was not present in the original O'Caml code.
302To specify a new intermediate language we must simply specify, through the use of the statement extension mechanism, what differs in the new intermediate language from the joint' language, and configure a new notion of internal function record, by specialising parameters, to the new language.
303As generic code for the joint' language exists, for example to add statements to control flow graphs, this code can be reused for the new intermediate language.
304
305\paragraph{Possible language commutations}
306The backend translation passes of the CerCo compiler differ quite a bit from the CompCert compiler.
307In the CompCert compiler, linearisation occurs much earlier in the compilation chain, and passes such as register colouring and allocation are carried out on a linearised form of program.
308Contrast this with our own approach, where the code is represented as a graph for much longer.
309
310However, by abstracting the representation of intermediate functions, we are now much more free to reorder translation passes as we see fit.
311The linearisation process, for instance, now no longer cares about the specific representation of code in the source and target languages.
312It just relies on a common interface.
313We are therefore, in theory, free to pick where we wish to linearise our representation.
314This adds an unusual flexibility into the compilation process, and allows us to freely experiment with different orderings of translation passes.
315
316%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
317% SECTION.                                                                    %
318%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
319\subsection{Use of dependent types}
320\label{subsect.use.of.dependent.types}
321
322We see three potential ways in which a compiler can fail to compile a program:
323\begin{enumerate}
324\item
325The program is malformed, and there is no hope of making sense of the program.
326\item
327A heuristic or algorithm in the compiler is implemented incorrectly, in which case an otherwise correct source program fails to be compiler to correct assembly code.
328\item
329An invariant in the compiler is invalidated.
330\end{enumerate}
331The first source of failure we are unable to do anything about.
332The latter two sources of failure should be interpreted as a compiler bug, and, as part of a verified compiler project, we'd like to rule out all such bugs.
333In CerCo, we aim to use dependent types to help us enforce invariants and prove our heuristics and algorithms correct.
334
335First, we encode informal invariants, or uses of \texttt{assert false} in the O'Caml code, with dependent types, converting partial functions into total functions.
336There are numerous examples of this throughout the backend.
337For example, in the \texttt{RTLabs} to \texttt{RTL} transformation pass, many functions only make sense' when lists of registers passed to them as arguments conform to some specific length.
338For instance, the \texttt{translate\_negint} function, which translates a negative integer constant:
339\begin{lstlisting}
340definition translate_negint ≝
341  $\lambda$globals: list ident.
342  $\lambda$destrs: list register.
343  $\lambda$srcrs: list register.
344  $\lambda$start_lbl: label.
345  $\lambda$dest_lbl: label.
346  $\lambda$def: rtl_internal_function globals.
347  $\lambda$prf: |destrs| = |srcrs|. (* assert here *)
348    ...
349\end{lstlisting}
350The last argument to the function, \texttt{prf}, is a proof that the lengths of the lists of source and destination registers are the same.
351This was an assertion in the O'Caml code.
352
353Secondly, we make use of dependent types to make the Matita code easier to read, and eventually the proofs of correctness for the compiler easier to write.
354For instance, many intermediate languages in the backend of the compiler, from RTLabs to LTL, are graph based languages.
355Here, function definitions consist of a graph (i.e. a map from labels to statements) and a pair of labels denoting the entry and exit points of this graph.
356Practically, we would always like to ensure that the entry and exit labels are present in the statement graph.
357We ensure that this is so with a dependent sum type in the \texttt{joint\_internal\_function} record, which all graph based languages specialise to obtain their own internal function representation:
358\begin{lstlisting}
359record joint_internal_function (globals: list ident) (p: params globals): Type[0] :=
360{
361  ...
362  joint_if_code     : codeT $\ldots$ p;
363  joint_if_entry    : $\Sigma$l: label. lookup $\ldots$ joint_if_code l $\neq$ None $\ldots$;
364  ...
365}.
366\end{lstlisting}
367Here, \texttt{codeT} is a parameterised type representing the structure' of the function's body (a graph in graph based languages, and a list in the linearised LIN language).
368Specifically, the \texttt{joint\_if\_entry} is a dependent pair consisting of a label and a proof that the label in question is a vertex in the function's graph.
369A similar device exists for the exit label.
370
371Finally, we make use of dependent types for another reason: experimentation.
372Namely, CompCert makes little use of dependent types to encode invariants.
373In contrast, we wish to make as much use of dependent types as possible, both to experiment with different ways of encoding compilers in a proof assistant, but also as a way of stress testing' Matita's support for dependent types.
374
375%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
376% SECTION.                                                                    %
377%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
378\subsection{What we do not implement}
379\label{subsect.what.we.do.not.implement}
380
381There are several classes of functionality that we have chosen not to implement in the backend languages:
382\begin{itemize}
383\item
384\textbf{Datatypes and functions over these datatypes that are not supported by the compiler.}
385In particular, the compiler does not support the floating point datatype, nor accompanying functions over that datatype.
386At the moment, frontend languages within the compiler possess constructors corresponding to floating point code.
387These are removed during instruction selection (in the RTLabs to RTL transformation) using a daemon.\footnote{A Girardism.  An axiom of type \texttt{False}, from which we can prove anything.}
388However, at some point, we would like the front end of the compiler to recognise programs that use floating point code and reject them as being invalid.
389\item
390\textbf{Axiomatised components that will be implemented using external oracles.}
391Several large, complex pieces of compiler infrastructure, most noticably register colouring and fixed point calculation during liveness analysis have been axiomatised.
392This was already agreed upon before the start of the project, and is clearly marked in the project proposal, following comments by those involved with the CompCert project about the difficulty in formalising register colouring in that project.
393Instead, these components are axiomatised, along with the properties that they need to satisfy in order for the rest of the compilation chain to be correct.
394These axiomatised components are found in the ERTL to LTL pass.
395
396It should be noted that these axiomatised components fall into the following pattern: whilst their implementation is complex, and their proof of correctness is difficult, we are able to quickly and easily verify that any answer that they provide is correct.
397As a result, we do not see this axiomatisation process as being too onerous.
398\item
399\textbf{A few non-computational proof obligations.}
400A few difficult-to-close, but non-computational (i.e. they do not prevent us from executing the compiler inside Matita), proof obligations have been closed using daemons in the backend.
401These proof obligations originate with our use of dependent types for expressing invariants in the compiler.
402However, here, it should be mentioned that many open proof obligations are simply impossible to close until we start to obtain stronger invariants from the proof of correctness for the compiler proper.
403In particular, in the RTLabs to RTL pass, several proof obligations relating to lists of registers stored in a local environment' appear to fall into this pattern.
404\item
405\textbf{Branch compression (tunnelling).}
406This was a feature of the O'Caml compiler.
407It is not yet currently implemented in the Matita compiler.
408This feature is only an optimisation, and will not affect the correctness of the compiler.
409\item
410\textbf{Real' tailcalls}
411For the time being, tailcalls in the backend are translated to vanilla' function calls during the ERTL to LTL pass.
412This follows the O'Caml compiler, which did not implement tailcalls, and did this simplification step.
413Real' tailcalls are being implemented in the O'Caml compiler, and when this implementation is complete, we aim to port this code to the Matita compiler.
414\end{itemize}
415
416%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
417% SECTION.                                                                    %
418%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
419\section{Associated changes to O'Caml compiler}
420\label{sect.associated.changes.to.ocaml.compiler}
421
422At the moment, no changes we have made in the Matita backend have made their way back into the O'Caml compiler.
423We do not see the heavy process of modularisation and abstraction as making its way back into the O'Caml codebase, as this is a significant rewrite of the backend code.
424However, several bugfixes, and the identification of hidden invariants' in the O'Caml code will be incorporated back into the prototype.
425
426%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
427% SECTION.                                                                    %
428%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
429\section{Future work}
430\label{sect.future.work}
431
432As mentioned in Section~\ref{subsect.what.we.do.not.implement}, there are several unimplemented features in the compiler, and several aspects of the Matita code that can be improved in order to make currently partial functions total.
433We summarise this future work here:
434\begin{itemize}
435\item
436We plan to make use of dependent types to identify floating point' free programs and make all functions total over such programs.
437This will remove a swathe of uses of daemons.
438This should be routine.
439\item
440We plan to move expansion of integer modulus, and other related functions, into the instruction selection (RTLabs to RTL) phase.
441This will also help to remove a swathe of uses of daemons, as well as potentially introduce new opportunities for optimisations that we currently miss in expanding these instructions at the C-light level.
442\item
443We plan to close all existing proof obligations that are closed using daemons, arising from our use of dependent types in the backend.
444However, many may not be closable until we have completed Deliverable D4.4, the certification of the whole compiler, as we may not have invariants strong enough at the present time.
445\item
446We plan to port the O'Caml compiler's implementation of tailcalls when this is completed, and eventually port the branch compression code currently in the O'Caml compiler to the Matita implementation.
447This should not cause any major problems.
448\item
449We plan to validate the backend translations, removing any obvious bugs, by executing the translation inside Matita on small C programs.
450This is not critical, as the certification process will find all bugs anyway.
451\end{itemize}
452
453\newpage
454
455%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
456% SECTION.                                                                    %
457%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
458\section{Code listing}
459\label{sect.code.listing}
460
461%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
462% SECTION.                                                                    %
463%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
464\subsection{Listing of files}
465\label{subsect.listing.files}
466
467Translation specific files (files relating to language semantics have been omitted).
468Syntax:
469\begin{center}
470\begin{tabular*}{\textwidth}{p{3.5cm}p{5.5cm}p{3.5cm}p{1cm}}
471Title & Description & O'Caml & Ratio \\
472\hline
473\texttt{RTLabs/syntax.ma} & The syntax of RTLabs & \texttt{RTLabs/RTLabs.mli} & 0.65 \\
474\texttt{joint/Joint.ma} & Joint syntax for backend languages & N/A & N/A \\
475\texttt{RTL/RTL.ma} & The syntax of RTL & \texttt{RTL/RTL.mli} & 0.41 \\
476\texttt{ERTL/ERTL.ma} & The syntax of ERTL & \texttt{ERTL/ERTL.mli} & 0.13 \\
477\texttt{LTL/LTL.ma} & The syntax of LTL & \texttt{LTL/LTL.mli} & 0.13 \\
478\texttt{LIN/LIN.ma} & The syntax of LIN & \texttt{LIN/LIN.mli} & 0.36
479\end{tabular*}
480\end{center}
481Here, the O'Caml column denotes the O'Caml source file in the prototype compiler's implementation that corresponds to the Matita script in question.
482The ratios are the linecounts of the Matita file divided by the line counts of the corresponding O'Caml file.
483These are computed with \texttt{wc -l}, a standard Unix tool.
484
485\noindent
486Translations and utilities:
487\begin{center}
488\begin{tabular*}{\textwidth}{p{4.5cm}p{4.5cm}p{4.5cm}p{1cm}}
489Title & Description & O'Caml & Ratio \\
490\hline
491\texttt{RTLabs/RTLabsToRTL.ma} & The translation from RTLabs to RTL & \texttt{RTLabs/RTLabsToRTL.ml} & 1.61 \\
492\texttt{joint/TranslateUtils.ma} & Generic translation utilities & N/A & N/A \\
493\texttt{RTL/RTLToERTL.ma} & The translation from RTL to ERTL & \texttt{RTL/RTLToERTL.ml} & 0.88 \\
494\texttt{RTL/RTLtailcall.ma} & Elimination of tailcalls & \texttt{RTL/RTLtailcall.ml} & 2.08 \\
495\texttt{ERTL/ERTLToLTL.ma} & The translation from ERTL to LTL & \texttt{ERTL/ERTLToRTL.ml} & 3.46 \\
496\texttt{ERTL/Interference.ma} & Axiomatised graph colouring component & \texttt{common/interference.ml} & 0.03\footnote{The majority of this file is axiomatised.} \\
497\texttt{ERTL/liveness.ma} & Liveness analysis & \texttt{ERTL/liveness.ml} & 0.92 \\
498\texttt{LTL/LTLToLIN.ma} & The translation from LTL to LIN & \texttt{LTL/LTLToLIN.ml} & 0.75 \\
499\texttt{LIN/LINToASM.ma} & The translation from LIN to assembly language & \texttt{LIN/LINToASM.ml} 2.45 &
500\end{tabular*}
501\end{center}
502Given that Matita code is much more verbose than O'Caml code, with explicit typing and inline proofs, we have achieved respectable line count ratios in the translation.
503
504%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
505% SECTION.                                                                    %
506%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
507\subsection{Listing of important functions and axioms}
508\label{subsect.listing.important.functions.and.axioms}
509
510We list some important functions and axioms in the backend compilation:
511
512\paragraph{From RTL/RTLabsToRTL.ma}
513\begin{center}
514\begin{tabular*}{0.9\textwidth}{p{5cm}p{8cm}}
515Title & Description \\
516\hline
517\texttt{translate\_stmt} & Translation of an RTLabs statement to an RTL statement \\
518\texttt{translate\_internal} & Translation of an RTLabs internal function to an RTL internal function \\
519\texttt{rtlabs\_to\_rtl} & Translation of an RTLabs program to an RTL program
520\end{tabular*}
521\end{center}
522
523\paragraph{From joint/TranslateUtils.ma}
524\begin{center}
525\begin{tabular*}{0.9\textwidth}{p{5cm}p{8cm}}
526Title & Description \\
527\hline
528\texttt{fresh\_regs} & Generic fresh pseudoregister generation, for any intermediate language \\
529\texttt{adds\_graph} & Generic means of adding a statement to a graph, for any intermediate language
530\end{tabular*}
531\end{center}
532
533\paragraph{From RTL/RTLTailcall.ma}
534\begin{center}
535\begin{tabular*}{0.9\textwidth}{p{5cm}p{8cm}}
536Title & Description \\
537\hline
538\texttt{simplify\_statement} & Remove a single tailcall \\
539\texttt{simplify\_graph} & Remove all tailcalls in the function graph \\
540\texttt{tailcall\_simplify} & Simplify an RTL program by removing tailcalls
541\end{tabular*}
542\end{center}
543
544\paragraph{From RTL/RTLToERTL.ma}
545\begin{center}
546\begin{tabular*}{0.9\textwidth}{p{5cm}p{8cm}}
547Title & Description \\
548\hline
549\texttt{translate\_stmt} & Translation of an RTL statement to an ERTL statement \\
550\texttt{translate\_funct\_internal} & Translation of an RTL internal function to an ERTL internal function \\
551\texttt{translate} & Translation of an RTL program to an ERTL program
552\end{tabular*}
553\end{center}
554
555\paragraph{From ERTL/liveness.ma}
556\begin{center}
557\begin{tabular*}{0.9\textwidth}{p{5cm}p{8cm}}
558Title & Description \\
559\hline
560\texttt{analyse} & Dead code analysis
561\end{tabular*}
562\end{center}
563
564\paragraph{From ERTL/Interference.ma}
565\begin{center}
566\begin{tabular*}{0.9\textwidth}{p{5cm}p{8cm}}
567Title & Description \\
568\hline
569\texttt{build} & The (axiomatised) graph colouring for register and stack slot allocation
570\end{tabular*}
571\end{center}
572
573\paragraph{From ERTL/ERTLToLTL.ma}
574\begin{center}
575\begin{tabular*}{0.9\textwidth}{p{5cm}p{8cm}}
576Title & Description \\
577\hline
578\texttt{translate\_statement} & Translation of an ERTL statement into multiple LTL statements \\
579\texttt{translate\_internal} & Translation of an ERTL internal function into an LTL internal function \\
580\texttt{ertl\_to\_ltl} & Translation of an ERTL program into an LTL program
581\end{tabular*}
582\end{center}
583
584\paragraph{From LTL/LTLToLIN.ma}
585\begin{center}
586\begin{tabular*}{0.9\textwidth}{p{5cm}p{8cm}}
587Title & Description \\
588\hline
589\texttt{visit} & Visits, in order, every node in the statement graph for linearisation \\
590\texttt{translate\_stmt} & Translation of an LTL statement to a LIN statement \\
591\texttt{branch\_compress} & Place holder (currently identity) function for branch compression \\
592\texttt{translate\_internal} & Translation of an LTL internal function into a LIN internal function \\
593\texttt{ltl\_to\_lin} & Translation of an LTL program to a LIN program
594\end{tabular*}
595\end{center}
596
597\paragraph{From LIN/LINToASM.ma}
598\begin{center}
599\begin{tabular*}{0.9\textwidth}{p{5cm}p{8cm}}
600Title & Description \\
601\hline
602\texttt{translate\_statements} & Translation of a LIN statement to mutliple assembly instructions \\
603\texttt{translate\_fun\_def} & Translation of a LIN internal function definition into assembly \\
604\texttt{translate} & Translation of a LIN program into assembly
605\end{tabular*}
606\end{center}
607
608\end{document}
Note: See TracBrowser for help on using the repository browser.