source: Deliverables/D4.2-4.3/reports/D4-3.tex @ 1968

Last change on this file since 1968 was 1968, checked in by campbell, 8 years ago

Update D4.3's title, memory model details, and some typographical
inconsistencies and glitches.

File size: 38.5 KB
Line 
1\documentclass[11pt, epsf, a4wide]{article}
2
3\usepackage{../../style/cerco}
4
5\usepackage{amsfonts}
6\usepackage{amsmath}
7\usepackage{amssymb} 
8\usepackage[english]{babel}
9\usepackage{graphicx}
10\usepackage[utf8x]{inputenc}
11\usepackage{listings}
12\usepackage{lscape}
13\usepackage{stmaryrd}
14\usepackage{threeparttable}
15\usepackage{url}
16
17\title{
18INFORMATION AND COMMUNICATION TECHNOLOGIES\\
19(ICT)\\
20PROGRAMME\\
21\vspace*{1cm}Project FP7-ICT-2009-C-243881 \cerco{}}
22
23\lstdefinelanguage{matita-ocaml}
24  {keywords={definition,coercion,lemma,theorem,remark,inductive,record,qed,let,let,in,rec,match,return,with,Type,try},
25   morekeywords={[2]whd,normalize,elim,cases,destruct},
26   morekeywords={[3]type,of},
27   mathescape=true,
28  }
29
30\lstset{language=matita-ocaml,basicstyle=\small\tt,columns=flexible,breaklines=false,
31        keywordstyle=\color{red}\bfseries,
32        keywordstyle=[2]\color{blue},
33        keywordstyle=[3]\color{blue}\bfseries,
34        commentstyle=\color{green},
35        stringstyle=\color{blue},
36        showspaces=false,showstringspaces=false}
37
38\lstset{extendedchars=false}
39\lstset{inputencoding=utf8x}
40\DeclareUnicodeCharacter{8797}{:=}
41\DeclareUnicodeCharacter{10746}{++}
42\DeclareUnicodeCharacter{9001}{\ensuremath{\langle}}
43\DeclareUnicodeCharacter{9002}{\ensuremath{\rangle}}
44
45\date{}
46\author{}
47
48\begin{document}
49
50\thispagestyle{empty}
51
52\vspace*{-1cm}
53\begin{center}
54\includegraphics[width=0.6\textwidth]{../../style/cerco_logo.png}
55\end{center}
56
57\begin{minipage}{\textwidth}
58\maketitle
59\end{minipage}
60
61\vspace*{0.5cm}
62\begin{center}
63\begin{LARGE}
64\textbf{
65Report n. D4.3\\
66Executable Formal Semantics of back-end intermediate languages
67}
68\end{LARGE} 
69\end{center}
70
71\vspace*{2cm}
72\begin{center}
73\begin{large}
74Version 1.1
75\end{large}
76\end{center}
77
78\vspace*{0.5cm}
79\begin{center}
80\begin{large}
81Main Authors:\\
82Dominic P. Mulligan and Claudio Sacerdoti Coen
83\end{large}
84\end{center}
85
86\vspace*{\fill}
87
88\noindent
89Project Acronym: \cerco{}\\
90Project full title: Certified Complexity\\
91Proposal/Contract no.: FP7-ICT-2009-C-243881 \cerco{}\\
92
93\clearpage
94\pagestyle{myheadings}
95\markright{\cerco{}, FP7-ICT-2009-C-243881}
96
97\newpage
98
99\vspace*{7cm}
100\paragraph{Abstract}
101We describe the encoding in the Calculus of Constructions of the semantics of the CerCo compiler's back-end intermediate languages.
102The CerCo back-end consists of five distinct languages: RTL, RTLntl, ERTL, LTL and LIN.
103We describe a process of heavy abstraction of the intermediate languages and their semantics.
104We hope that this process will ease the burden of Deliverable D4.4, the proof of correctness for the compiler.
105
106\newpage
107
108\tableofcontents
109
110\newpage
111
112%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
113% SECTION.                                                                    %
114%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
115\section{Task}
116\label{sect.task}
117
118The Grant Agreement states that Task T4.3, entitled `Formal semantics of intermediate languages' has associated Deliverable D4.3, consisting of the following:
119\begin{quotation}
120Executable Formal Semantics of back-end intermediate languages: This prototype is the formal counterpart of deliverable D2.1 for the back end side of the compiler and validates it.
121\end{quotation}
122This report details our implementation of this deliverable.
123
124%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
125% SECTION.                                                                    %
126%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
127\subsection{Connections with other deliverables}
128\label{subsect.connections.with.other.deliverables}
129
130Deliverable D4.3 enjoys a close relationship with three other deliverables, namely deliverables D2.2, D4.3 and D4.4.
131
132Deliverable D2.2, the OCaml implementation of a cost preserving compiler for a large subset of the C programming language, is the basis upon which we have implemented the current deliverable.
133In particular, the architecture of the compiler, its intermediate languages and their semantics, and the overall implementation of the Matita encodings has been taken from the OCaml compiler.
134Any variations from the OCaml design are due to bugs identified in the prototype compiler during the Matita implementation, our identification of code that can be abstracted and made generic, or our use of Matita's much stronger type system to enforce invariants through the use of dependent types.
135
136Deliverable D4.2 can be seen as a `sister' deliverable to the deliverable reported on herein.
137In particular, where this deliverable reports on the encoding in the Calculus of Constructions of the back-end semantics, D4.2 is the encoding in the Calculus of Constructions of the mutual translations of those languages.
138As a result, a substantial amount of Matita code is shared between the two deliverables.
139
140Deliverable D4.4, the back-end correctness proofs, is the immediate successor of this deliverable.
141
142%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
143% SECTION.                                                                    %
144%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
145\section{The back-end intermediate languages' semantics in Matita}
146\label{sect.backend.intermediate.languages.semantics.matita}
147
148%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
149% SECTION.                                                                    %
150%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
151\subsection{Abstracting related languages}
152\label{subsect.abstracting.related.languages}
153
154As mentioned in the report for Deliverable D4.2, a systematic process of abstraction, over the OCaml code, has taken place in the Matita encoding.
155In particular, we have merged many of the syntaxes of the intermediate languages (i.e. RTL, ERTL, LTL and LIN) into a single `joint' syntax, which is parameterised by various types.
156Equivalent intermediate languages to those present in the OCaml code can be recovered by specialising this joint structure.
157
158As mentioned in the report for Deliverable D4.2, there are a number of advantages that this process of abstraction brings, from code reuse to allowing us to get a clearer view of the intermediate languages and their structure.
159However, the semantics of the intermediate languages allow us to concretely demonstrate this improvement in clarity, by noting that the semantics of the LTL and the semantics of the LIN languages are identical.
160In particular, the semantics of both LTL and LIN are implemented in exactly the same way.
161The only difference between the two languages is how the next instruction to be interpreted is fetched.
162In LTL, this involves looking up in a graph, whereas in LTL, this involves fetching from a list of instructions.
163
164As a result, we see that the semantics of LIN and LTL are both instances of a single, more general language that is parametric in how the next instruction is fetched.
165Furthermore, any prospective proof that the semantics of LTL and LIN are identical is now almost trivial, saving a deal of work in Deliverable D4.4.
166
167%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
168% SECTION.                                                                    %
169%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
170\subsection{Type parameters, and their purpose}
171\label{subsect.type.parameters.their.purpose}
172
173We mentioned in the Deliverable D4.2 report that all joint languages are parameterised by a number of types, which are later specialised to each distinct intermediate language.
174As this parameterisation process is also dependent on designs decisions in the language semantics, we have so far held off summarising the role of each parameter.
175
176We begin the abstraction process with the \texttt{params\_\_} record.
177This holds the types of the representations of the different register varieties in the intermediate languages:
178\begin{lstlisting}
179record params__: Type[1] :=
180{
181  acc_a_reg: Type[0];
182  acc_b_reg: Type[0];
183  dpl_reg: Type[0];
184  dph_reg: Type[0];
185  pair_reg: Type[0];
186  generic_reg: Type[0];
187  call_args: Type[0];
188  call_dest: Type[0];
189  extend_statements: Type[0]
190}.
191\end{lstlisting}
192We summarise what these types mean, and how they are used in both the semantics and the translation process:
193\begin{center}
194\begin{tabular*}{\textwidth}{p{4cm}p{11cm}}
195Type & Explanation \\
196\hline
197\texttt{acc\_a\_reg} & The type of the accumulator A register.  In some languages this is implemented as the hardware accumulator, whereas in others this is a pseudoregister.\\
198\texttt{acc\_b\_reg} & Similar to the accumulator A field, but for the processor's auxilliary accumulator, B. \\
199\texttt{dpl\_reg} & The type of the representation of the low eight bit register of the MCS-51's single 16 bit register, DPL.  Can be either a pseudoregister or the hardware DPL register. \\
200\texttt{dph\_reg} & Similar to the DPL register but for the eight high bits of the 16-bit register. \\
201\texttt{pair\_reg} & Various different `move' instructions have been merged into a single move instruction in the joint language.  A value can either be moved to or from the accumulator in some languages, or moved to and from an arbitrary pseudoregister in others.  This type encodes how we should move data around the registers and accumulators. \\
202\texttt{generic\_reg} & The representation of generic registers (i.e. those that are not devoted to a specific task). \\
203\texttt{call\_args} & The actual arguments passed to a function.  For some languages this is simply the number of arguments passed to the function. \\
204\texttt{call\_dest} & The destination of the function call. \\
205\texttt{extend\_statements} & Instructions that are specific to a particular intermediate language, and which cannot be abstracted into the joint language.
206\end{tabular*}
207\end{center}
208
209As mentioned in the report for Deliverable D4.2, the record \texttt{params\_\_} is enough to be able to specify the instructions of the joint languages:
210\begin{lstlisting}
211inductive joint_instruction (p: params__) (globals: list ident): Type[0] :=
212  | COMMENT: String $\rightarrow$ joint_instruction p globals
213  | COST_LABEL: costlabel $\rightarrow$ joint_instruction p globals
214  ...
215  | OP1: Op1 $\rightarrow$ acc_a_reg p $\rightarrow$ acc_a_reg p $\rightarrow$ joint_instruction p globals
216  | COND: acc_a_reg p $\rightarrow$ label $\rightarrow$ joint_instruction p globals
217  ...
218\end{lstlisting}
219Here, we see that the instruction \texttt{OP1} (a unary operation on the accumulator A) can be given quite a specific type, through the use of the \texttt{params\_\_} data structure.
220
221Joint statements can be split into two subclasses: those who simply pass the flow of control onto their successor statement, and those that jump to a potentially remote location in the program.
222Naturally, as some intermediate languages are graph based, and others linearised, the passing act of passing control on to the `successor' instruction can either be the act of following a graph edge in a control flow graph, or incrementing an index into a list.
223We make a distinction between instructions that pass control onto their immediate successors, and those that jump elsewhere in the program, through the use of \texttt{succ}, denoting the immediate successor of the current instruction, in the \texttt{params\_} record described below.
224\begin{lstlisting}
225record params_: Type[1] :=
226{
227  pars__ :> params__;
228  succ: Type[0]
229}.
230\end{lstlisting}
231The type \texttt{succ} corresponds to labels, in the case of control flow graph based languages, or is instantiated to the unit type for the linearised language, LIN.
232Using \texttt{param\_} we can define statements of the joint language:
233\begin{lstlisting}
234inductive joint_statement (p:params_) (globals: list ident): Type[0] :=
235  | sequential: joint_instruction p globals $\rightarrow$ succ p $\rightarrow$ joint_statement p globals
236  | GOTO: label $\rightarrow$ joint_statement p globals
237  | RETURN: joint_statement p globals.
238\end{lstlisting}
239Note that in the joint language, instructions are `linear', in that they have an immediate successor.
240Statements, on the other hand, consist of either a linear instruction, or a \texttt{GOTO} or \texttt{RETURN} statement, both of which can jump to an arbitrary place in the program. The conditional jump instruction COND is `linear', since it
241has an immediate successor, but it also takes an arbitrary location (a label)
242to jump to.
243
244For the semantics, we need further parametererised types.
245In particular, we parameterise the result and parameter type of an internal function call in \texttt{params0}:
246\begin{lstlisting}
247record params0: Type[1] :=
248{
249  pars__' :> params__;
250  resultT: Type[0];
251  paramsT: Type[0]
252}.
253\end{lstlisting}
254Here, \texttt{paramsT} and \texttt{resultT} typically are the (pseudo)registers that store the parameters and result of a function.
255
256We further extend \texttt{params0} with a type for local variables in internal function calls:
257\begin{lstlisting}
258record params1 : Type[1] :=
259{
260  pars0 :> params0;
261  localsT: Type[0]
262}.
263\end{lstlisting}
264Again, we expand our parameters with types corresponding to the code representation (either a control flow graph or a list of statements).
265Further, we hypothesise a generic method for looking up the next instruction in the graph, called \texttt{lookup}.
266Note that \texttt{lookup} may fail, and returns an \texttt{option} type:
267\begin{lstlisting}
268record params (globals: list ident): Type[1] :=
269{
270  succ_ : Type[0];
271  pars1 :> params1;
272  codeT : Type[0];
273  lookup: codeT $\rightarrow$ label $\rightarrow$ option (joint_statement (mk_params_ pars1 succ_) globals)
274}.
275\end{lstlisting}
276We now have what we need to define internal functions for the joint language.
277The first two `universe' fields are only used in the compilation process, for generating fresh names, and do not affect the semantics.
278The rest of the fields affect both compilation and semantics.
279In particular, we have a description of the result, parameters and the local variables of a function.
280Note also that we have lifted the hypothesised \texttt{lookup} function from \texttt{params} into a dependent sigma type, which combines a label (the entry and exit points of the control flow graph or list) combined with a proof that the label is in the graph structure:
281\begin{lstlisting}
282record joint_internal_function (globals: list ident) (p:params globals) : Type[0] :=
283{
284  joint_if_luniverse: universe LabelTag;
285  joint_if_runiverse: universe RegisterTag;
286  joint_if_result   : resultT p;
287  joint_if_params   : paramsT p;
288  joint_if_locals   : localsT p;
289  joint_if_stacksize: nat;
290  joint_if_code     : codeT ... p;
291  joint_if_entry    : $\Sigma$l: label. lookup ... joint_if_code l $\neq$ None ?;
292  joint_if_exit     : $\Sigma$l: label. lookup ... joint_if_code l $\neq$ None ?
293}.
294\end{lstlisting}
295Naturally, a question arises as to why we have chosen to split up the parameterisation into so many intermediate records, each slightly extending earlier ones.
296The reason is because some intermediate languages share a host of parameters, and only differ on some others.
297For instance, in instantiating the ERTL language, certain parameters are shared with RTL, whilst others are ERTL specific:
298\begin{lstlisting}
299...
300definition ertl_params__: params__ :=
301 mk_params__ register register register register (move_registers $\times$ move_registers)
302  register nat unit ertl_statement_extension.
303...
304definition ertl_params1: params1 := rtl_ertl_params1 ertl_params0.
305definition ertl_params: ∀globals. params globals := rtl_ertl_params ertl_params0.
306...
307definition ertl_statement := joint_statement ertl_params_.
308
309definition ertl_internal_function :=
310  $\lambda$globals.joint_internal_function ... (ertl_params globals).
311\end{lstlisting}
312Here, \texttt{rtl\_ertl\_params1} are the common parameters of the ERTL and RTL languages:
313\begin{lstlisting}
314definition rtl_ertl_params1 := $\lambda$pars0. mk_params1 pars0 (list register).
315\end{lstlisting}
316
317The record \texttt{more\_sem\_params} bundles together functions that store and retrieve values in various forms of register:
318\begin{lstlisting}
319record more_sem_params (p:params_): Type[1] :=
320{
321  framesT: Type[0];
322  empty_framesT: framesT;
323
324  regsT: Type[0];
325  empty_regsT: regsT;
326
327  call_args_for_main: call_args p;
328  call_dest_for_main: call_dest p;
329
330  greg_store_: generic_reg p $\rightarrow$ beval $\rightarrow$ regsT $\rightarrow$ res regsT;
331  greg_retrieve_: regsT $\rightarrow$ generic_reg p $\rightarrow$ res beval;
332  acca_store_: acc_a_reg p $\rightarrow$ beval $\rightarrow$ regsT $\rightarrow$ res regsT;
333  acca_retrieve_: regsT $\rightarrow$ acc_a_reg p $\rightarrow$ res beval;
334  ...
335  dpl_store_: dpl_reg p $\rightarrow$ beval $\rightarrow$ regsT $\rightarrow$ res regsT;
336  dpl_retrieve_: regsT $\rightarrow$ dpl_reg p $\rightarrow$ res beval;
337  ...
338  pair_reg_move_: regsT $\rightarrow$ pair_reg p $\rightarrow$ res regsT;
339}.
340\end{lstlisting}
341Here, the fields \texttt{empty\_framesT}, \texttt{empty\_regsT}, \texttt{call\_args\_for\_main} and \texttt{call\_dest\_for\_main} are used for state initialisation.
342
343The fields \texttt{greg\_store\_} and \texttt{greg\_retrieve\_} store and retrieve values from a generic register, respectively.
344Similarly, \texttt{pair\_reg\_move} implements the generic move instruction of the joint language.
345Here \texttt{framesT} is the type of stack frames, with \texttt{empty\_framesT} an empty stack frame.
346
347The two hypothesised values \texttt{call\_args\_for\_main} and \texttt{call\_dest\_for\_main} deal with problems with the \texttt{main} function of the program, and how it is handled.
348In particular, we need to know when the \texttt{main} function has finished executing.
349But this is complicated, in C, by the fact that the \texttt{main} function is explicitly allowed to be recursive (disallowed in C++).
350Therefore, to understand whether the exiting \texttt{main} function is really exiting, or just recursively calling itself, we need to remember the address to which \texttt{main} will return control once the initial call to \texttt{main} has finished executing.
351This is done with \texttt{call\_dest\_for\_main}, whereas \texttt{call\_args\_for\_main} holds the \texttt{main} function's arguments.
352
353We extend \texttt{more\_sem\_params} with yet more parameters via \texttt{more\_sem\_params2}:
354\begin{lstlisting}
355record more_sem_params1 (globals: list ident) (p: params globals) : Type[1] :=
356{
357  more_sparams1 :> more_sem_params p;
358
359  succ_pc: succ p $\rightarrow$ address $\rightarrow$ res address;
360  pointer_of_label: genv ... p $\rightarrow$ pointer $\rightarrow$
361    label $\rightarrow$ res ($\Sigma$p:pointer. ptype p = Code);
362  ...
363  fetch_statement:
364    genv ... p $\rightarrow$ state (mk_sem_params ... more_sparams1) $\rightarrow$
365    res (joint_statement (mk_sem_params ... more_sparams1) globals);
366  ...
367  save_frame:
368    address $\rightarrow$ nat $\rightarrow$ paramsT ... p $\rightarrow$ call_args p $\rightarrow$ call_dest p $\rightarrow$
369    state (mk_sem_params ... more_sparams1) $\rightarrow$
370    res (state (mk_sem_params ... more_sparams1));
371  pop_frame:
372    genv globals p $\rightarrow$ state (mk_sem_params ... more_sparams1) $\rightarrow$
373    res ((state (mk_sem_params ... more_sparams1)));
374  ...
375  set_result:
376    list val $\rightarrow$ state (mk_sem_params ... more_sparams1) $\rightarrow$
377    res (state (mk_sem_params ... more_sparams1));
378  exec_extended:
379    genv globals p $\rightarrow$ extend_statements (mk_sem_params ... more_sparams1) $\rightarrow$
380    succ p $\rightarrow$ state (mk_sem_params ... more_sparams1) $\rightarrow$
381    IO io_out io_in (trace $\times$ (state (mk_sem_params ... more_sparams1)))
382 }.
383\end{lstlisting}
384The field \texttt{succ\_pc} takes an address, and a `successor' label, and returns the address of the instruction immediately succeeding the one at hand.
385
386Here, \texttt{fetch\_statement} fetches the next statement to be executed.
387The fields \texttt{save\_frame} and \texttt{pop\_frame} manipulate stack frames.
388In particular, \texttt{save\_frame} creates a new stack frame on the top of the stack, saving the destination and parameters of a function, and returning an updated state.
389The field \texttt{pop\_frame} destructively pops a stack frame from the stack, returning an updated state.
390Further, \texttt{set\_result} saves the result of the function computation, and \texttt{exec\_extended} is a function that executes the extended statements, peculiar to each individual intermediate language.
391
392We bundle \texttt{params} and \texttt{sem\_params} together into a single record.
393This will be used in the function \texttt{eval\_statement} which executes a single statement of the joint language:
394\begin{lstlisting}
395record sem_params2 (globals: list ident): Type[1] :=
396{
397  p2 :> params globals;
398  more_sparams2 :> more_sem_params2 globals p2
399}.
400\end{lstlisting}
401\noindent
402The \texttt{state} record holds the current state of the interpreter:
403\begin{lstlisting}
404record state (p: sem_params): Type[0] :=
405{
406  st_frms: framesT ? p;
407  pc: address;
408  sp: pointer;
409  isp: pointer;
410  carry: beval;
411  regs: regsT ? p;
412  m: bemem
413}.
414\end{lstlisting}
415Here \texttt{st\_frms} represent stack frames, \texttt{pc} the program counter, \texttt{sp} the stack pointer, \texttt{isp} the internal stack pointer, \texttt{carry} the carry flag, \texttt{regs} the registers (hardware and pseudoregisters) and \texttt{m} external RAM.
416Note that we have two stack pointers, as we have two stacks: the physical stack of the MCS-51 microprocessor, and an emulated stack in external RAM.
417The MCS-51's own stack is minuscule, therefore it is usual to emulate a much larger, more useful stack in external RAM.
418We require two stack pointers as the MCS-51's \texttt{PUSH} and \texttt{POP} instructions manipulate the physical stack, and not the emulated one.
419
420We use the function \texttt{eval\_statement} to evaluate a single joint statement:
421\begin{lstlisting}
422definition eval_statement:
423  $\forall$globals: list ident.$\forall$p:sem_params2 globals.
424    genv globals p $\rightarrow$ state p $\rightarrow$ IO io_out io_in (trace $\times$ (state p)) :=
425...
426\end{lstlisting}
427We examine the type of this function.
428Note that it returns a monadic action, \texttt{IO}, denoting that it may have an IO \emph{side effect}, where the program reads or writes to some external device or memory address.
429Monads and their use are further discussed in Subsection~\ref{subsect.use.of.monads}.
430Further, the function returns a new state, updated by the single step of execution of the program.
431Finally, a \emph{trace} is also returned, which records externally observable `events', such as the calling of external functions and the emission of cost labels.
432
433%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
434% SECTION.                                                                    %
435%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
436\subsection{Use of monads}
437\label{subsect.use.of.monads}
438
439Monads are a categorical notion that have recently gained an amount of traction in functional programming circles.
440In particular, it was noted by Moggi that monads could be used to sequence \emph{effectful} computations in a pure manner.
441Here, `effectful computations' cover a lot of ground, from writing to files, generating fresh names, or updating an ambient notion of state.
442
443A monad can be characterised by the following:
444\begin{itemize}
445\item
446A data type, $M$.
447For instance, the \texttt{option} type in OCaml or Matita.
448\item
449A way to `inject' or `lift' pure values into this data type (usually called \texttt{return}).
450We call this function \texttt{return} and say that it must have type $\alpha \rightarrow M \alpha$, where $M$ is the name of the monad.
451In our example, the `lifting' function for the \texttt{option} monad can be implemented as:
452\begin{lstlisting}
453let return x = Some x
454\end{lstlisting}
455\item
456A way to `sequence' monadic functions together, to form another monadic function, usually called \texttt{bind}.
457Bind has type $M \alpha \rightarrow (\alpha \rightarrow M \beta) \rightarrow M \beta$.
458We can see that bind `unpacks' a monadic value, applies a function after unpacking, and `repacks' the new value in the monad.
459In our example, the sequencing function for the \texttt{option} monad can be implemented as:
460\begin{lstlisting}
461let bind o f =
462  match o with
463    None -> None
464    Some s -> f s
465\end{lstlisting}
466\item
467A series of algebraic laws that relate \texttt{return} and \texttt{bind}, ensuring that the sequencing operation `does the right thing' by retaining the order of effects.
468These \emph{monad laws} should also be useful in reasoning about monadic computations in the proof of correctness of the compiler.
469\end{itemize}
470In the semantics of both front and back-end intermediate languages, we make use of monads.
471This monadic infrastructure is shared between the front-end and back-end languages.
472
473In particular, an `IO' monad, signalling the emission of a cost label, or the calling of an external function, is heavily used in the semantics of the intermediate languages.
474Here, the monad's sequencing operation ensures that cost label emissions and function calls are maintained in the correct order.
475We have already seen how the \texttt{eval\_statement} function of the joint language is monadic, with type:
476\begin{lstlisting}
477definition eval_statement:
478  $\forall$globals: list ident.$\forall$p:sem_params2 globals.
479    genv globals p $\rightarrow$ state p $\rightarrow$ IO io_out io_in (trace $\times$ (state p)) :=
480...
481\end{lstlisting}
482If we examine the body of \texttt{eval\_statement}, we may also see how the monad sequences effects.
483For instance, in the case for the \texttt{LOAD} statement, we have the following:
484\begin{lstlisting}
485definition eval_statement:
486  $\forall$globals: list ident. $\forall$p:sem_params2 globals.
487    genv globals p $\rightarrow$ state p $\rightarrow$ IO io_out io_in (trace $\times$ (state p)) :=
488  $\lambda$globals, p, ge, st.
489  ...
490  match s with
491  | LOAD dst addrl addrh ⇒
492    ! vaddrh $\leftarrow$ dph_retrieve ... st addrh;
493    ! vaddrl $\leftarrow$ dpl_retrieve ... st addrl;
494    ! vaddr $\leftarrow$ pointer_of_address $\langle$vaddrl,vaddrh$\rangle$;
495    ! v $\leftarrow$ opt_to_res ... (msg FailedLoad) (beloadv (m ... st) vaddr);
496    ! st $\leftarrow$ acca_store p ... dst v st;
497    ! st $\leftarrow$ next ... l st ;
498      ret ? $\langle$E0, st$\rangle$
499\end{lstlisting}
500Here, we employ a certain degree of syntactic sugaring.
501The syntax
502\begin{lstlisting}
503  ...
504! vaddrh $\leftarrow$ dph_retrieve ... st addrh;
505! vaddrl $\leftarrow$ dpl_retrieve ... st addrl;
506  ...
507\end{lstlisting}
508is sugaring for the \texttt{IO} monad's binding operation.
509We can expand this sugaring to the following much more verbose code:
510\begin{lstlisting}
511  ...
512  bind (dph_retrieve ... st addrh) ($\lambda$vaddrh. bind (dpl_retrieve ... st addrl)
513    ($\lambda$vaddrl. ...))
514\end{lstlisting}
515Note also that the function \texttt{ret} is implementing the `lifting', or return function of the \texttt{IO} monad.
516
517We believe the sugaring for the monadic bind operation makes the program much more readable, and therefore easier to reason about.
518In particular, note that the functions \texttt{dph\_retrieve}, \texttt{pointer\_of\_address}, \texttt{acca\_store} and \texttt{next} are all monadic.
519
520Note, however, that inside this monadic code, there is also another monad hiding.
521The \texttt{res} monad signals failure, along with an error message.
522The monad's sequencing operation ensures the order of error messages does not get rearranged.
523The function \texttt{opt\_to\_res} lifts an option type into this monad, with an error message to be used in case of failure.
524The \texttt{res} monad is then coerced into the \texttt{IO} monad, ensuring the whole code snippet typechecks.
525
526\subsection{Memory models}
527\label{subsect.memory.models}
528
529Currently, the semantics of the front and back-end intermediate languages are built around two distinct memory models.
530The front-end languages reuse the CompCert 1.6 memory model, whereas the back-end languages employ a version tailored to their needs.
531This split between the memory models reflects the fact that the front-end and back-end languages have different requirements from their memory models.
532
533In particular, the CompCert 1.6 memory model places quite heavy restrictions on where in memory one can read from.
534To read a value in this memory model, you must supply an address, complete with the size of `chunk' to read following that address.
535The read is only successful if you attempt to read at a genuine `value boundary', and read the appropriate amount of memory for that value.
536As a result, with that memory model you are unable to read the third byte of a 32-bit integer value directly from memory, for instance.
537This has some consequences for the compiler, namely an inability to write a \texttt{memcpy} routine.
538
539However, the CerCo memory model operates differently, as we need to move data `piecemeal' between stacks in the back-end of the compiler.
540As a result, the back-end memory model allows one to read data at any memory location, not just on value boundaries.
541This has the advantage that we can successfully give a semantics to a \texttt{memcpy} routine in the back-end of the CerCo compiler (remembering that \texttt{memcpy} is nothing more than `read a byte, copy a byte' repeated in a loop), an advantage over CompCert.  However, the front-end of CerCo cannot because its memory model and values are the similar to CompCert 1.6.
542
543More recent versions of CompCert's memory model have evolved in a similar direction, with a byte-by-byte representation of memory blocks.  However, there remains an important difference in the handling of pointer values in the rest of the formalisation.  In particular, in CompCert 1.10 only complete pointer values can be loaded in all of the languages in the compiler, whereas in CerCo we need to represent individual bytes of a pointer in the back-end to support our 8-bit target architecture.
544
545Right now, the two memory models are interfaced during the translation from RTLabs to RTL.
546It is an open question whether we will unify the two memory models, using only the back-end, bespoke memory model throughout the compiler, as the CompCert memory model seems to work fine for the front-end, where such byte-by-byte copying is not needed.
547However, should we decide to port the front-end to the new memory model, it has been written in such an abstract way that doing so would be relatively straightforward.
548
549%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
550% SECTION.                                                                    %
551%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
552\section{Future work}
553\label{sect.future.work}
554
555Most things related to external function calls are currently axiomatised.
556This is due to there being a difficulty with how stackframes are handled with external function calls.
557We leave this for further work, due to there being no pressing need to implement this feature at the present time.
558
559There is also, as mentioned, an open problem as to whether the front-end languages should use the same memory model as the back-end languages, as opposed to reusing the CompCert memory model.
560Should this decision be taken, this will likely be straightforward but potentially time consuming\footnote{After the original version of this deliverable was written we ported the front-end languages' semantics to the back-end memory model.  This turned out not to be time consuming, and moreover used definitions linking front-end and back-end values that are required for the correctness proofs anyway.  However, the front-end still cannot give a semantics to \texttt{memcpy}, for the same reason as CompCert 1.10; the language currently has no representation for a single byte of a pointer.}.
561
562%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
563% SECTION.                                                                    %
564%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
565\section{Code listing}
566\label{sect.code.listing}
567
568%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
569% SECTION.                                                                    %
570%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
571\subsection{Listing of files}
572\label{subsect.listing.files}
573
574Syntax specific files are presented in Table~\ref{table.syntax} (files relating to language translations omitted).
575Here, the OCaml column denotes the OCaml source file(s) in the prototype compiler's implementation that corresponds to the Matita script in question.
576The ratios are the linecounts of the Matita file divided by the line counts of the corresponding OCaml file.
577These are computed with \texttt{wc -l}, a standard Unix tool.
578
579Individual file's ratios are an over approximation, due to the fact that it's hard to relate an individual OCaml file to the abstracted Matita code that has been spread across multiple files.
580The ratio between total Matita code lines and total OCaml code lines is more reflective of the compressed and abstracted state of the Matita translation.
581
582Semantics specific files are presented in Table~\ref{table.semantics}.
583\begin{landscape}
584\begin{table}
585\begin{threeparttable}
586\begin{tabular}{llrlrl}
587Description & Matita & Lines & OCaml & Lines & Ratio \\
588\hline
589Abstracted syntax for back-end languages & \texttt{joint/Joint.ma} & 173 & N/A & N/A & N/A \\
590The syntax of RTLabs & \texttt{RTLabs/syntax.ma} & 73 & \texttt{RTLabs/RTLabs.mli} & 113 & 0.65 \\
591The syntax of RTL & \texttt{RTL/RTL.ma} & 49 & \texttt{RTL/RTL.mli} & 120 & 1.85\tnote{a} \\
592The syntax of ERTL & \texttt{ERTL/ERTL.ma} & 25 & \texttt{ERTL/ERTL.mli} & 191 & 1.04\tnote{a} \\
593The syntax of the abstracted combined LTL and LIN language & \texttt{LIN/joint\_LTL\_LIN.ma} & 10 & N/A & N/A & N/A \\
594The specialisation of the above file to the syntax of LTL & \texttt{LTL/LTL.ma} & 10 & \texttt{LTL/LTL.mli} & 104 & 1.86\tnote{b} \\
595The specialisation of the above file to the syntax of LIN & \texttt{LIN/LIN.ma} & 17 & \texttt{LIN/LIN.mli} & 88 & 2.27\tnote{b} \\
596\end{tabular}
597\begin{tablenotes}
598  \item[a] After inlining of \texttt{joint/Joint.ma}.
599  \item[b] After inlining of \texttt{joint/Joint\_LTL\_LIN.ma} and \texttt{joint/Joint.ma}.\\
600  \begin{tabular}{ll}
601  Total lines of Matita code for the above files:& 347 \\
602  Total lines of OCaml code for the above files:& 616 \\
603  Ratio of total lines:& 0.56
604  \end{tabular}
605
606\end{tablenotes}
607\end{threeparttable}
608\caption{Syntax specific files in the intermediate language semantics}
609\label{table.syntax}
610\end{table}
611\end{landscape}
612\begin{landscape}
613\begin{table}
614\begin{threeparttable}
615\begin{tabular}{llrlrl}
616Description & Matita & Lines & OCaml & Lines & Ratio \\
617\hline
618Semantics of the abstracted languages & \texttt{joint/semantics.ma} & 434 & N/A & N/A & N/A  \\
619Generic utilities used in semantics `joint' languages & \texttt{joint/SemanticUtils.ma} & 70 & N/A & N/A & N/A \\
620Semantics of RTLabs & \texttt{RTLabs/semantics.ma} & 223 & \texttt{RTLabs/RTLabsInterpret.ml} & 355 & 0.63 \\
621Semantics of RTL & \texttt{RTL/semantics.ma} & 173 & \texttt{RTL/RTLInterpret.ml} & 324 & 2.01\tnote{a} \\
622Semantics of ERTL & \texttt{ERTL/semantics.ma} & 130 & \texttt{ERTL/ERTLInterpret.ml} & 504  & 1.26\tnote{a} \\
623Semantics of the joint LTL-LIN language & \texttt{LIN/joint\_LTL\_LIN\_semantics.ma} & 67 & N/A & N/A & N/A \\
624Semantics of LTL & \texttt{LTL/semantics.ma} & 5 & \texttt{LTL/LTLInterpret.ml} & 416 & 1.38\tnote{b} \\
625Semantics of LIN & \texttt{LIN/semantics.ma} & 43 & \texttt{LIN/LINInterpret.ml} & 379 & 1.62\tnote{b}
626\end{tabular}
627\begin{tablenotes}
628  \item{a} Includes \texttt{joint/semantics.ma} and \texttt{joint/SemanticUtils.ma}.
629  \item{b} Includes \texttt{joint/semantics.ma}, \texttt{joint/SemanticUtils.ma} and \texttt{joint/joint\_LTL\_LIN\_semantics.ma}. \\
630 \begin{tabular}{ll}
631  Total lines of Matita code for the above files:& 1145 \\
632  Total lines of OCaml code for the above files:& 1978 \\
633  Ration of total lines:& 0.58
634 \end{tabular}
635\end{tablenotes}
636\end{threeparttable}
637\caption{Semantics specific files in the intermediate language semantics}
638\label{table.semantics}
639\end{table}
640\end{landscape}
641
642%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
643% SECTION.                                                                    %
644%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
645\subsection{Listing of important functions and axioms}
646\label{subsect.listing.important.functions.axioms}
647
648We list some important functions and axioms in the back-end semantics:
649
650\paragraph{From RTLabs/semantics.ma}
651\begin{center}
652\begin{tabular*}{0.9\textwidth}{p{5cm}p{8cm}}
653Title & Description \\
654\hline
655\texttt{make\_initial\_state} & Build an initial state \\
656\texttt{eval\_statement} & Evaluate a single RTLabs statement \\
657\texttt{is\_final} & Check whether a state is in a `final' configuration \\
658\texttt{RTLabs\_exec} & Execute an RTLabs program
659\end{tabular*}
660\end{center}
661
662\paragraph{From RTL/semantics.ma}
663\begin{center}
664\begin{tabular*}{0.9\textwidth}{p{5cm}p{8cm}}
665Title & Description \\
666\hline
667\texttt{rtl\_exec\_extended} & Execute a single step of the RTL language's extended instructions \\
668\texttt{rtl\_fullexec} & Execute an RTL program
669\end{tabular*}
670\end{center}
671
672\paragraph{From ERTL/semantics.ma}
673\begin{center}
674\begin{tabular*}{0.9\textwidth}{p{5cm}p{8cm}}
675Title & Description \\
676\hline
677\texttt{ertl\_exec\_extended} & Execute a single step of the ERTL language's extended instructions \\
678\texttt{ertl\_fullexec} & Execute an ERTL program
679\end{tabular*}
680\end{center}
681
682\paragraph{From LTL/semantics.ma}
683\begin{center}
684\begin{tabular*}{0.9\textwidth}{p{5cm}p{8cm}}
685Title & Description \\
686\hline
687\texttt{ltl\_fullexec} & Execute an LTL program
688\end{tabular*}
689\end{center}
690
691\paragraph{From LIN/semantics.ma}
692\begin{center}
693\begin{tabular*}{0.9\textwidth}{p{5cm}p{8cm}}
694Title & Description \\
695\hline
696\texttt{lin\_fullexec} & Execute a LIN program
697\end{tabular*}
698\end{center}
699
700\paragraph{From LIN/joint\_LTL\_LIN\_semantics.ma}
701\begin{center}
702\begin{tabular*}{0.9\textwidth}{p{5cm}p{8cm}}
703Title & Description \\
704\hline
705\texttt{ltl\_lin\_exec\_extended} & Execute a single step of the joint LTL-LIN language's extended instructions \\
706\texttt{ltl\_lin\_fullexec} & Execute a joint LTL-LIN language program
707\end{tabular*}
708\end{center}
709
710\paragraph{From joint/semantics.ma}
711\begin{center}
712\begin{tabular*}{0.9\textwidth}{p{5cm}p{8cm}}
713Title & Description \\
714\hline
715\texttt{eval\_statement} & Evaluate a single joint language statement \\
716\texttt{is\_final} & Check whether a state is in a `final' configuration \\
717\texttt{joint\_fullexec} & Execute a joint language program
718\end{tabular*}
719\end{center}
720
721\paragraph{From joint/SemanticUtils.ma}
722\begin{center}
723\begin{tabular*}{0.9\textwidth}{p{5cm}p{8cm}}
724Title & Description \\
725\hline
726\texttt{graph\_fetch\_statement} & Fetch a statement from a control flow graph
727\end{tabular*}
728\end{center}
729
730\end{document}
Note: See TracBrowser for help on using the repository browser.