\documentclass[11pt, epsf, a4wide]{article}

\usepackage{../../style/cerco}

\usepackage{amsfonts}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage[english]{babel}
\usepackage{graphicx}
\usepackage[utf8x]{inputenc}
\usepackage{listings}
\usepackage{stmaryrd}
\usepackage{url}

\title{
INFORMATION AND COMMUNICATION TECHNOLOGIES\\
(ICT)\\
PROGRAMME\\
\vspace*{1cm}Project FP7-ICT-2009-C-243881 \cerco{}}

\lstdefinelanguage{matita-ocaml}
  {keywords={definition,coercion,lemma,theorem,remark,inductive,record,qed,let,let,in,rec,match,return,with,Type,try},
   morekeywords={[2]whd,normalize,elim,cases,destruct},
   morekeywords={[3]type,of},
   mathescape=true,
  }

\lstset{language=matita-ocaml,basicstyle=\small\tt,columns=flexible,breaklines=false,
        keywordstyle=\color{red}\bfseries,
        keywordstyle=[2]\color{blue},
        keywordstyle=[3]\color{blue}\bfseries,
        stringstyle=\color{blue},
        showspaces=false,showstringspaces=false}

\lstset{extendedchars=false}
\lstset{inputencoding=utf8x}
\DeclareUnicodeCharacter{8797}{:=}
\DeclareUnicodeCharacter{10746}{++}
\DeclareUnicodeCharacter{9001}{\ensuremath{\langle}}
\DeclareUnicodeCharacter{9002}{\ensuremath{\rangle}}

\date{}
\author{}

\begin{document}

\thispagestyle{empty}

\vspace*{-1cm}
\begin{center}
\includegraphics[width=0.6\textwidth]{../../style/cerco_logo.png}
\end{center}

\begin{minipage}{\textwidth}
\maketitle
\end{minipage}

\vspace*{0.5cm}
\begin{center}
\begin{LARGE}
\textbf{
Report n. D4.3\\
Formal semantics of intermediate languages
}
\end{LARGE}
\end{center}

\vspace*{2cm}
\begin{center}
\begin{large}
Version 1.0
\end{large}
\end{center}

\vspace*{0.5cm}
\begin{center}
\begin{large}
Main Authors:\\
Dominic P. Mulligan and Claudio Sacerdoti Coen
\end{large}
\end{center}

\vspace*{\fill}

\noindent
Project Acronym: \cerco{}\\
Project full title: Certified Complexity\\
Proposal/Contract no.: FP7-ICT-2009-C-243881 \cerco{}\\

\clearpage
\markright{\cerco{}, FP7-ICT-2009-C-243881}

\newpage

\vspace*{7cm}
\paragraph{Abstract}
We describe the encoding in the Calculus of Constructions of the semantics of the CerCo compiler's backend intermediate languages.
The CerCo backend consists of five distinct languages: RTL, RTLntl, ERTL, LTL and LIN.
We describe a process of heavy abstraction of the intermediate languages and their semantics.
We hope that this process will ease the burden of Deliverable D4.4, the proof of correctness for the compiler.

\newpage

\tableofcontents

\newpage

%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
% SECTION.                                                                    %
%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%

The Grant Agreement states that Task T4.3, entitled `Formal semantics of intermediate languages', has associated Deliverable D4.3, consisting of the following:
\begin{quotation}
Executable Formal Semantics of back-end intermediate languages: This prototype is the formal counterpart of deliverable D2.1 for the back end side of the compiler and validates it.
\end{quotation}
This report details our implementation of this deliverable.

%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
% SECTION.                                                                    %
%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
\subsection{Connections with other deliverables}
\label{subsect.connections.with.other.deliverables}

Deliverable D4.3 enjoys a close relationship with three other deliverables, namely deliverables D2.2, D4.2 and D4.4.

Deliverable D2.2, the O'Caml implementation of a cost-preserving compiler for a large subset of the C programming language, is the basis upon which we have implemented the current deliverable.
In particular, the architecture of the compiler, its intermediate languages and their semantics, and the overall implementation of the Matita encodings have been taken from the O'Caml compiler.
Any variations from the O'Caml design are due to bugs identified in the prototype compiler during the Matita implementation, our identification of code that can be abstracted and made generic, or our use of Matita's much stronger type system to enforce invariants through the use of dependent types.

Deliverable D4.2 can be seen as a `sister' deliverable to the deliverable reported on herein.
In particular, where this deliverable reports on the encoding in the Calculus of Constructions of the backend semantics, D4.2 is the encoding in the Calculus of Constructions of the mutual translations of those languages.
As a result, a substantial amount of Matita code is shared between the two deliverables.

Deliverable D4.4, the backend correctness proofs, is the immediate successor of this deliverable.

%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
% SECTION.                                                                    %
%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
\section{The backend intermediate languages' semantics in Matita}
\label{sect.backend.intermediate.languages.semantics.matita}

%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
% SECTION.                                                                    %
%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
\subsection{Abstracting related languages}
\label{subsect.abstracting.related.languages}

As mentioned in the report for Deliverable D4.2, a systematic process of abstraction over the O'Caml code has taken place in the Matita encoding.
In particular, we have merged many of the syntaxes of the intermediate languages (i.e. RTL, ERTL, LTL and LIN) into a single `joint' syntax, which is parameterised by various types.
Equivalent intermediate languages to those present in the O'Caml code can be recovered by specialising this joint structure.

As mentioned in the report for Deliverable D4.2, there are a number of advantages that this process of abstraction brings, from code reuse to allowing us to get a clearer view of the intermediate languages and their structure.
The semantics of the intermediate languages allow us to demonstrate this improvement in clarity concretely: the semantics of LTL and the semantics of LIN are identical.
In particular, the semantics of both LTL and LIN are implemented in exactly the same way.
The only difference between the two languages is how the next instruction to be interpreted is fetched.
In LTL, this involves looking up in a graph, whereas in LIN, this involves fetching from a list of instructions.

As a result, we see that the semantics of LIN and LTL are both instances of a single, more general language that is parametric in how the next instruction is fetched.
Furthermore, any prospective proof that the semantics of LTL and LIN are identical is now almost trivial, saving a good deal of work in Deliverable D4.4.
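
The idea of a single semantics that is parametric in the fetch operation can be illustrated outside Matita.
The following O'Caml fragment is only a minimal sketch and is not taken from the CerCo sources: the toy statement type, the state record, and the names \texttt{run}, \texttt{fetch\_graph} and \texttt{fetch\_list} are all hypothetical.
It defines one interpreter loop and instantiates it once with a graph lookup (in the style of LTL) and once with a list lookup (in the style of LIN):
\begin{lstlisting}
(* Hypothetical toy statements: not the CerCo joint language. *)
type stmt =
  | Add of int   (* add a constant to the accumulator *)
  | Halt         (* stop and return the accumulator *)

type state = { pc : int; acc : int }

(* One interpreter, parametric in the fetch function.  A fetch returns
   the statement at [pc] together with its successor, or [None] when
   [pc] is not a valid program point, in which case [run] is stuck. *)
let rec run (fetch : int -> (stmt * int) option) (st : state) : int option =
  match fetch st.pc with
  | None -> None
  | Some (Halt, _) -> Some st.acc
  | Some (Add n, next) -> run fetch { pc = next; acc = st.acc + n }

(* Graph-style fetch: every node stores its statement and successor. *)
let fetch_graph (g : (int * (stmt * int)) list) (pc : int) =
  List.assoc_opt pc g

(* List-style fetch: the successor is implicitly the next position. *)
let fetch_list (l : stmt list) (pc : int) =
  match List.nth_opt l pc with
  | Some s -> Some (s, pc + 1)
  | None -> None

(* The same toy program in both representations. *)
let graph = [ (0, (Add 2, 7)); (7, (Add 3, 4)); (4, (Halt, 4)) ]
let linear = [ Add 2; Add 3; Halt ]
\end{lstlisting}
Under these assumptions, \texttt{run (fetch\_graph graph)} and \texttt{run (fetch\_list linear)}, both started from program point 0 with an accumulator of 0, evaluate to \texttt{Some 5}; only the fetch function differs between the two runs.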

%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
% SECTION.                                                                    %
%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
\subsection{Type parameters, and their purpose}
\label{subsect.type.parameters.their.purpose}

We mentioned in the Deliverable D4.2 report that all joint languages are parameterised by a number of types, which are later specialised to each distinct intermediate language.
As this parameterisation process is also dependent on design decisions in the language semantics, we have so far held off summarising the role of each parameter.

We begin the abstraction process with the \texttt{params\_\_} record.
This holds the types of the representations of the different register varieties in the intermediate languages:
\begin{lstlisting}
record params__: Type[1] ≝
{
  acc_a_reg: Type[0];
  acc_b_reg: Type[0];
  dpl_reg: Type[0];
  dph_reg: Type[0];
  pair_reg: Type[0];
  generic_reg: Type[0];
  call_args: Type[0];
  call_dest: Type[0];
  extend_statements: Type[0]
}.
\end{lstlisting}
We summarise what these types mean, and how they are used in both the semantics and the translation process:
\begin{center}
\begin{tabular*}{\textwidth}{p{4cm}p{11cm}}
Type & Explanation \\
\hline
\texttt{acc\_a\_reg} & The type of the accumulator A register.  In some languages this is implemented as the hardware accumulator, whereas in others this is a pseudoregister.\\
\texttt{acc\_b\_reg} & Similar to the accumulator A field, but for the processor's auxiliary accumulator, B. \\
\texttt{dpl\_reg} & The type of the representation of DPL, the low eight bits of the MCS-51's single 16-bit register.  Can be either a pseudoregister or the hardware DPL register. \\
\texttt{dph\_reg} & Similar to the DPL register, but for the high eight bits of the 16-bit register. \\
\texttt{pair\_reg} & Various different `move' instructions have been merged into a single move instruction in the joint language.  A value can either be moved to or from the accumulator in some languages, or moved to and from an arbitrary pseudoregister in others.  This type encodes how we should move data around the registers and accumulators. \\
\texttt{generic\_reg} & The representation of generic registers (i.e. those that are not devoted to a specific task). \\
\texttt{call\_args} & The number of arguments to a function.  For some languages this is irrelevant. \\
\texttt{call\_dest} & \\
\texttt{extend\_statements} & Instructions that are specific to a particular intermediate language, and which cannot be abstracted into the joint language.
\end{tabular*}
\end{center}

As mentioned in the report for Deliverable D4.2, the record \texttt{params\_\_} is enough to be able to specify the instructions of the joint languages:
\begin{lstlisting}
inductive joint_instruction (p: params__) (globals: list ident): Type[0] :=
  | COMMENT: String → joint_instruction p globals
  | COST_LABEL: costlabel → joint_instruction p globals
  ...
\end{lstlisting}

We extend \texttt{params\_\_} with a type corresponding to labels, \texttt{succ}, obtaining a new record type of parameters called \texttt{params\_}:
\begin{lstlisting}
record params_: Type[1] ≝
{
  pars__ :> params__;
  succ: Type[0]
}.
\end{lstlisting}
The type \texttt{succ} corresponds to labels, in the case of control flow graph based languages, or is instantiated to the unit type for the linearised language, LIN.
Using \texttt{params\_} we can define statements of the joint language:
\begin{lstlisting}
inductive joint_statement (p:params_) (globals: list ident): Type[0] :=
  | sequential: joint_instruction p globals → succ p → joint_statement p globals
  | GOTO: label → joint_statement p globals
  | RETURN: joint_statement p globals.
\end{lstlisting}
Note that in the joint language, instructions are `linear', in that they have an immediate successor.
Statements, on the other hand, consist of either a linear instruction, or a \texttt{GOTO} or \texttt{RETURN} statement, both of which can jump to an arbitrary place in the program.
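
The role played by the \texttt{succ} parameter can be sketched in O'Caml with an ordinary type parameter.
The fragment below is a hedged illustration and not CerCo code: the instruction type is cut down to two constructors, labels are plain strings, and the names \texttt{graph\_statement}, \texttt{lin\_statement} and \texttt{successors} are hypothetical.
\begin{lstlisting}
(* Hypothetical, simplified stand-ins for labels and instructions. *)
type label = string
type instruction = Comment of string | Cost_label of string

(* Statements are parametric in the successor information ['succ]. *)
type 'succ statement =
  | Sequential of instruction * 'succ
  | Goto of label
  | Return

(* Graph-based languages: a sequential statement names its successor. *)
type graph_statement = label statement

(* Linearised language: the successor is implicit, so it carries no data. *)
type lin_statement = unit statement

let gr : graph_statement = Sequential (Comment "entry", "l1")
let lin : lin_statement = Sequential (Comment "entry", ())

(* The labels a statement mentions explicitly.  The caller says how
   successor information is turned into labels. *)
let successors (to_labels : 'succ -> label list) = function
  | Sequential (_, s) -> to_labels s
  | Goto lbl -> [ lbl ]
  | Return -> []
\end{lstlisting}
Instantiating the parameter with \texttt{unit} mirrors the treatment of LIN described above: a sequential statement in a list needs no explicit successor, because the next statement in the list is the one that follows it.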

For the semantics, we need further parameterised types.
In particular, we parameterise the result and parameter type of an internal function call in \texttt{params0}:
\begin{lstlisting}
record params0: Type[1] ≝
 { pars__' :> params__
 ; resultT: Type[0]
 ; paramsT: Type[0]
 }.
\end{lstlisting}
We further extend \texttt{params0} with a type for local variables in internal function calls:
\begin{lstlisting}
record params1 : Type[1] ≝
 { pars0 :> params0
 ; localsT: Type[0]
 }.
\end{lstlisting}
Again, we expand our parameters with types corresponding to the code representation (either a control flow graph or a list of statements).
Further, we hypothesise a generic method for looking up the next instruction in the graph, called \texttt{lookup}.
Note that \texttt{lookup} may fail, and returns an \texttt{option} type:
\begin{lstlisting}
record params (globals: list ident): Type[1] ≝
 { succ_ : Type[0]
 ; pars1 :> params1
 ; codeT: Type[0]
 ; lookup: codeT → label → option (joint_statement (mk_params_ pars1 succ_) globals)
 }.
\end{lstlisting}
We now have what we need to define internal functions for the joint language.
The first two `universe' fields are only used in the compilation process, for generating fresh names, and do not affect the semantics.
The rest of the fields affect both compilation and semantics.
In particular, we have parameterised result types, function parameter types and the type of local variables.
Note also that we have lifted the hypothesised \texttt{lookup} function from \texttt{params} into a dependent sigma type, which pairs a label (the entry or exit point of the control flow graph or list) with a proof that the label is in the graph structure:
\begin{lstlisting}
record joint_internal_function (globals: list ident) (p:params globals) : Type[0] :=
{
  joint_if_luniverse: universe LabelTag;
  joint_if_runiverse: universe RegisterTag;
  joint_if_result   : resultT p;
  joint_if_params   : paramsT p;
  joint_if_locals   : localsT p;
  joint_if_stacksize: nat;
  joint_if_code     : codeT … p;
  joint_if_entry    : $\Sigma$l: label. lookup … joint_if_code l ≠ None ?;
  joint_if_exit     : $\Sigma$l: label. lookup … joint_if_code l ≠ None ?
}.
\end{lstlisting}
Naturally, a question arises as to why we have chosen to split up the parameterisation into so many intermediate records, each slightly extending earlier ones.
The reason is that some intermediate languages share a host of parameters, and only differ on some others.
For instance, in instantiating the ERTL language, certain parameters are shared with RTL, whilst others are ERTL specific:
\begin{lstlisting}
...
definition ertl_params__: params__ :=
 mk_params__ register register register register (move_registers × move_registers)
  register nat unit ertl_statement_extension.
...
definition ertl_params1: params1 := rtl_ertl_params1 ertl_params0.
definition ertl_params: ∀globals. params globals ≝ rtl_ertl_params ertl_params0.
...
definition ertl_statement := joint_statement ertl_params_.

definition ertl_internal_function :=
  $\lambda$globals.joint_internal_function … (ertl_params globals).
\end{lstlisting}
Here, \texttt{rtl\_ertl\_params1} are the common parameters of the ERTL and RTL languages:
\begin{lstlisting}
definition rtl_ertl_params1 := $\lambda$pars0. mk_params1 pars0 (list register).
\end{lstlisting}

\begin{lstlisting}
record more_sem_params (p:params_): Type[1] :=
{
  framesT: Type[0];
  regsT: Type[0];
  greg_store_: generic_reg p → beval → regsT → res regsT;
  greg_retrieve_: regsT → generic_reg p → res beval;
  acca_store_: acc_a_reg p → beval → regsT → res regsT;
  acca_retrieve_: regsT → acc_a_reg p → res beval;
  accb_store_: acc_b_reg p → beval → regsT → res regsT;
  accb_retrieve_: regsT → acc_b_reg p → res beval;
  dpl_store_: dpl_reg p → beval → regsT → res regsT;
  dpl_retrieve_: regsT → dpl_reg p → res beval;
  dph_store_: dph_reg p → beval → regsT → res regsT;
  dph_retrieve_: regsT → dph_reg p → res beval;
  pair_reg_move_: regsT → pair_reg p → res regsT;
  pointer_of_label: label → $\Sigma$p:pointer. ptype p = Code
}.
\end{lstlisting}

\begin{lstlisting}
record sem_params: Type[1] :=
{
  spp :> params_;
  more_sem_pars :> more_sem_params spp
}.
\end{lstlisting}

\begin{lstlisting}
record more_sem_params2 (globals: list ident) (p: params globals) : Type[1] :=
{
  more_sparams1 :> more_sem_params p;
  fetch_statement: genv … p → state (mk_sem_params … more_sparams1) → res (joint_statement (mk_sem_params … more_sparams1) globals);
  fetch_ra: state (mk_sem_params … more_sparams1) → res ((state (mk_sem_params … more_sparams1)) × address);
  result_regs: genv globals p → state (mk_sem_params … more_sparams1) → res (list (generic_reg p));
  init_locals : localsT p → regsT … more_sparams1 → regsT … more_sparams1;
  save_frame: address → nat → paramsT … p → call_args p → call_dest p → state (mk_sem_params … more_sparams1) → res (state (mk_sem_params … more_sparams1));
  pop_frame: genv globals p → state (mk_sem_params … more_sparams1) → res ((state (mk_sem_params … more_sparams1)));
  fetch_external_args: external_function → state (mk_sem_params … more_sparams1) → res (list val);
  set_result: list val → state (mk_sem_params … more_sparams1) → res (state (mk_sem_params … more_sparams1));
  exec_extended: genv globals p → extend_statements (mk_sem_params … more_sparams1) → succ p → state (mk_sem_params … more_sparams1) → IO io_out io_in (trace × (state (mk_sem_params … more_sparams1)))
 }.
\end{lstlisting}

\begin{lstlisting}
record sem_params2 (globals: list ident): Type[1] :=
{
  p2 :> params globals;
  more_sparams2 :> more_sem_params2 globals p2
}.
\end{lstlisting}

The \texttt{state} record holds the current state of the interpreter:
\begin{lstlisting}
record state (p: sem_params): Type[0] :=
{
  st_frms: framesT ? p;
  sp: pointer;
  carry: beval;
  regs: regsT ? p;
  m: bemem
}.
\end{lstlisting}
Here \texttt{st\_frms} represents stack frames, \texttt{pc} the program counter, \texttt{sp} the stack pointer, \texttt{carry} the carry flag, \texttt{regs} the generic registers and \texttt{m} external RAM.
We use the function \texttt{eval\_statement} to evaluate a single joint statement:
\begin{lstlisting}
definition eval_statement:
  ∀globals: list ident.∀p:sem_params2 globals.
    genv globals p → state p → IO io_out io_in (trace × (state p)) :=
...
\end{lstlisting}
We examine the type here.

%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
% SECTION.                                                                    %
%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%

Monads are a categorical notion that has recently gained an amount of traction in functional programming circles.
In particular, it was noted by Moggi that monads could be used to sequence \emph{effectful} computations in a pure manner.
Here, `effectful computations' cover a lot of ground, from writing to files and generating fresh names to updating an ambient notion of state.

In the semantics of both front and backend intermediate languages, we make use of monads.
In particular, we make use of two forms of monad:
\begin{enumerate}
\item
An `error monad', which signals that a computation either has completed successfully, or returns with an error message.
The sequencing operation of the error monad ensures that a chain of computations returns the error message of the first failed computation.
This monad is used extensively in the semantics to signal a state which cannot be recovered from.
For instance, in the semantics of RTLabs, we make use of the error monad to signal bad final states:
\begin{lstlisting}
XXX
\end{lstlisting}
\item
An `IO' monad, signalling the emission or reading of data to some external location or memory address.
Here, the monad's sequencing operation ensures that emissions and reads are maintained in the correct order (i.e. it maintains a `trace', or ordered sequence of IO events).
Most functions in the intermediate language semantics fall into the IO monad.
\end{enumerate}
This monadic infrastructure is shared between the frontend and backend languages.
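
The behaviour of the error monad's sequencing operation can be sketched in a few lines of O'Caml.
This is a minimal illustration under stated assumptions and not the Matita definition used in CerCo: the type \texttt{res} and its constructors are simplified stand-ins, and \texttt{lookup}, \texttt{safe\_div} and \texttt{example} are hypothetical helpers introduced only for the example.
\begin{lstlisting}
(* Hypothetical error monad: a value, or an error message. *)
type 'a res = OK of 'a | Error of string

let return (x : 'a) : 'a res = OK x

(* Sequencing: run [f] only if [m] succeeded; otherwise propagate
   the error of [m] unchanged, so later steps are never executed. *)
let ( >>= ) (m : 'a res) (f : 'a -> 'b res) : 'b res =
  match m with
  | OK x -> f x
  | Error msg -> Error msg

(* Two computations that may fail. *)
let lookup (env : (string * int) list) (k : string) : int res =
  match List.assoc_opt k env with
  | Some v -> OK v
  | None -> Error ("unbound: " ^ k)

let safe_div (x : int) (y : int) : int res =
  if y = 0 then Error "division by zero" else OK (x / y)

(* A chain of computations fails with the first error encountered. *)
let example (env : (string * int) list) : int res =
  lookup env "a" >>= fun a ->
  lookup env "b" >>= fun b ->
  safe_div a b
\end{lstlisting}
Applying \texttt{example} to an environment that binds neither name yields the error for \texttt{a}, the first lookup to fail; the lookup of \texttt{b} and the division are never attempted.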

%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
% SECTION.                                                                    %
%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
\section{Future work}
\label{sect.future.work}

A few small axioms remain to be closed.
These relate to fetching the next instruction to be interpreted from the control flow graph, or linearised representation, of the language.
Closing these axioms should not be a problem.
No further work remains, aside from `tidying up' the code.

\newpage

%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
% SECTION.                                                                    %
%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
\section{Code listing}
\label{sect.code.listing}

%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
% SECTION.                                                                    %
%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
\subsection{Listing of files}
\label{subsect.listing.files}

Semantics specific files (files relating to language translations omitted):
\begin{center}
\begin{tabular*}{0.9\textwidth}{p{5cm}p{8cm}}
Title & Description \\
\hline
\texttt{RTLabs/syntax.ma} & The syntax of RTLabs \\
\texttt{RTLabs/semantics.ma} & The semantics of RTLabs \\
\texttt{joint/Joint.ma} & Abstracted syntax for backend languages \\
\texttt{joint/SemanticUtils.ma} & Generic utilities used in the semantics of all `joint' intermediate languages \\
\texttt{RTL/RTL.ma} & The syntax of RTL \\
\texttt{RTL/semantics.ma} & The semantics of RTL \\
\texttt{ERTL/ERTL.ma} & The syntax of ERTL \\
\texttt{ERTL/semantics.ma} & The semantics of ERTL \\
\texttt{LTL/LTL.ma} & The syntax of LTL \\
\texttt{LTL/semantics.ma} & The semantics of LTL \\
\texttt{LIN/LIN.ma} & The syntax of LIN \\
\texttt{LIN/semantics.ma} & The semantics of LIN
\end{tabular*}
\end{center}

%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
% SECTION.                                                                    %
%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
\subsection{Listing of important functions and axioms}
\label{subsect.listing.important.functions.axioms}

We list some important functions and axioms in the backend semantics:

\end{document}