1 | \documentclass[11pt, epsf, a4wide]{article} |
---|
2 | |
---|
3 | \usepackage{../../style/cerco} |
---|
4 | |
---|
5 | \usepackage{amsfonts} |
---|
6 | \usepackage{amsmath} |
---|
7 | \usepackage{amssymb} |
---|
8 | \usepackage[english]{babel} |
---|
9 | \usepackage{graphicx} |
---|
10 | \usepackage[utf8x]{inputenc} |
---|
11 | \usepackage{listings} |
---|
12 | \usepackage{stmaryrd} |
---|
13 | \usepackage{url} |
---|
14 | |
---|
15 | \title{ |
---|
16 | INFORMATION AND COMMUNICATION TECHNOLOGIES\\ |
---|
17 | (ICT)\\ |
---|
18 | PROGRAMME\\ |
---|
19 | \vspace*{1cm}Project FP7-ICT-2009-C-243881 \cerco{}} |
---|
20 | |
---|
21 | \lstdefinelanguage{matita-ocaml} |
---|
22 | {keywords={definition,coercion,lemma,theorem,remark,inductive,record,qed,let,let,in,rec,match,return,with,Type,try}, |
---|
23 | morekeywords={[2]whd,normalize,elim,cases,destruct}, |
---|
24 | morekeywords={[3]type,of}, |
---|
25 | mathescape=true, |
---|
26 | } |
---|
27 | |
---|
28 | \lstset{language=matita-ocaml,basicstyle=\small\tt,columns=flexible,breaklines=false, |
---|
29 | keywordstyle=\color{red}\bfseries, |
---|
30 | keywordstyle=[2]\color{blue}, |
---|
31 | keywordstyle=[3]\color{blue}\bfseries, |
---|
32 | commentstyle=\color{green}, |
---|
33 | stringstyle=\color{blue}, |
---|
34 | showspaces=false,showstringspaces=false} |
---|
35 | |
---|
36 | \lstset{extendedchars=false} |
---|
37 | \lstset{inputencoding=utf8x} |
---|
38 | \DeclareUnicodeCharacter{8797}{:=} |
---|
39 | \DeclareUnicodeCharacter{10746}{++} |
---|
40 | \DeclareUnicodeCharacter{9001}{\ensuremath{\langle}} |
---|
41 | \DeclareUnicodeCharacter{9002}{\ensuremath{\rangle}} |
---|
42 | |
---|
43 | \date{} |
---|
44 | \author{} |
---|
45 | |
---|
46 | \begin{document} |
---|
47 | |
---|
48 | \thispagestyle{empty} |
---|
49 | |
---|
50 | \vspace*{-1cm} |
---|
51 | \begin{center} |
---|
52 | \includegraphics[width=0.6\textwidth]{../../style/cerco_logo.png} |
---|
53 | \end{center} |
---|
54 | |
---|
55 | \begin{minipage}{\textwidth} |
---|
56 | \maketitle |
---|
57 | \end{minipage} |
---|
58 | |
---|
59 | \vspace*{0.5cm} |
---|
60 | \begin{center} |
---|
61 | \begin{LARGE} |
---|
62 | \textbf{ |
---|
63 | Report n. D4.3\\ |
---|
64 | Formal semantics of intermediate languages |
---|
65 | } |
---|
66 | \end{LARGE} |
---|
67 | \end{center} |
---|
68 | |
---|
69 | \vspace*{2cm} |
---|
70 | \begin{center} |
---|
71 | \begin{large} |
---|
72 | Version 1.0 |
---|
73 | \end{large} |
---|
74 | \end{center} |
---|
75 | |
---|
76 | \vspace*{0.5cm} |
---|
77 | \begin{center} |
---|
78 | \begin{large} |
---|
79 | Main Authors:\\ |
---|
80 | Dominic P. Mulligan and Claudio Sacerdoti Coen |
---|
81 | \end{large} |
---|
82 | \end{center} |
---|
83 | |
---|
84 | \vspace*{\fill} |
---|
85 | |
---|
86 | \noindent |
---|
87 | Project Acronym: \cerco{}\\ |
---|
88 | Project full title: Certified Complexity\\ |
---|
89 | Proposal/Contract no.: FP7-ICT-2009-C-243881 \cerco{}\\ |
---|
90 | |
---|
91 | \clearpage |
---|
92 | \pagestyle{myheadings} |
---|
93 | \markright{\cerco{}, FP7-ICT-2009-C-243881} |
---|
94 | |
---|
95 | \newpage |
---|
96 | |
---|
97 | \vspace*{7cm} |
---|
98 | \paragraph{Abstract} |
---|
99 | We describe the encoding in the Calculus of Constructions of the semantics of the CerCo compiler's backend intermediate languages. |
---|
100 | The CerCo backend consists of five distinct languages: RTL, RTLntl, ERTL, LTL and LIN. |
---|
101 | We describe a process of heavy abstraction of the intermediate languages and their semantics. |
---|
102 | We hope that this process will ease the burden of Deliverable D4.4, the proof of correctness for the compiler. |
---|
103 | |
---|
104 | \newpage |
---|
105 | |
---|
106 | \tableofcontents |
---|
107 | |
---|
108 | \newpage |
---|
109 | |
---|
110 | %-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-% |
---|
111 | % SECTION. % |
---|
112 | %-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-% |
---|
113 | \section{Task} |
---|
114 | \label{sect.task} |
---|
115 | |
---|
116 | The Grant Agreement states that Task T4.3, entitled `Formal semantics of intermediate languages' has associated Deliverable D4.3, consisting of the following: |
---|
117 | \begin{quotation} |
---|
118 | Executable Formal Semantics of back-end intermediate languages: This prototype is the formal counterpart of deliverable D2.1 for the back end side of the compiler and validates it. |
---|
119 | \end{quotation} |
---|
120 | This report details our implementation of this deliverable. |
---|
121 | |
---|
122 | %-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-% |
---|
123 | % SECTION. % |
---|
124 | %-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-% |
---|
125 | \subsection{Connections with other deliverables} |
---|
126 | \label{subsect.connections.with.other.deliverables} |
---|
127 | |
---|
128 | Deliverable D4.3 enjoys a close relationship with three other deliverables, namely deliverables D2.2, D4.3 and D4.4. |
---|
129 | |
---|
130 | Deliverable D2.2, the O'Caml implementation of a cost preserving compiler for a large subset of the C programming language, is the basis upon which we have implemented the current deliverable. |
---|
131 | In particular, the architecture of the compiler, its intermediate languages and their semantics, and the overall implementation of the Matita encodings has been taken from the O'Caml compiler. |
---|
132 | Any variations from the O'Caml design are due to bugs identified in the prototype compiler during the Matita implementation, our identification of code that can be abstracted and made generic, or our use of Matita's much stronger type system to enforce invariants through the use of dependent types. |
---|
133 | |
---|
134 | Deliverable D4.2 can be seen as a `sister' deliverable to the deliverable reported on herein. |
---|
135 | In particular, where this deliverable reports on the encoding in the Calculus of Constructions of the backend semantics, D4.2 is the encoding in the Calculus of Constructions of the mutual translations of those languages. |
---|
136 | As a result, a substantial amount of Matita code is shared between the two deliverables. |
---|
137 | |
---|
138 | Deliverable D4.4, the backend correctness proofs, is the immediate successor of this deliverable. |
---|
139 | |
---|
140 | %-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-% |
---|
141 | % SECTION. % |
---|
142 | %-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-% |
---|
143 | \section{The backend intermediate languages' semantics in Matita} |
---|
144 | \label{sect.backend.intermediate.languages.semantics.matita} |
---|
145 | |
---|
146 | %-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-% |
---|
147 | % SECTION. % |
---|
148 | %-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-% |
---|
149 | \subsection{Abstracting related languages} |
---|
150 | \label{subsect.abstracting.related.languages} |
---|
151 | |
---|
152 | As mentioned in the report for Deliverable D4.2, a systematic process of abstraction, over the O'Caml code, has taken place in the Matita encoding. |
---|
153 | In particular, we have merged many of the syntaxes of the intermediate languages (i.e. RTL, ERTL, LTL and LIN) into a single `joint' syntax, which is parameterised by various types. |
---|
154 | Equivalent intermediate languages to those present in the O'Caml code can be recovered by specialising this joint structure. |
---|
155 | |
---|
156 | As mentioned in the report for Deliverable D4.2, there are a number of advantages that this process of abstraction brings, from code reuse to allowing us to get a clearer view of the intermediate languages and their structure. |
---|
157 | However, the semantics of the intermediate languages allow us to concretely demonstrate this improvement in clarity, by noting that the semantics of the LTL and the semantics of the LIN languages are identical. |
---|
158 | In particular, the semantics of both LTL and LIN are implemented in exactly the same way. |
---|
159 | The only difference between the two languages is how the next instruction to be interpreted is fetched. |
---|
160 | In LTL, this involves looking up in a graph, whereas in LTL, this involves fetching from a list of instructions. |
---|
161 | |
---|
162 | As a result, we see that the semantics of LIN and LTL are both instances of a single, more general language that is parametric in how the next instruction is fetched. |
---|
163 | Furthermore, any prospective proof that the semantics of LTL and LIN are identical is not almost trivial, saving a deal of work in Deliverable D4.4. |
---|
164 | |
---|
165 | %-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-% |
---|
166 | % SECTION. % |
---|
167 | %-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-% |
---|
168 | \subsection{Type parameters, and their purpose} |
---|
169 | \label{subsect.type.parameters.their.purpose} |
---|
170 | |
---|
171 | We mentioned in the Deliverable D4.2 report that all joint languages are parameterised by a number of types, which are later specialised to each distinct intermediate language. |
---|
172 | As this parameterisation process is also dependent on designs decisions in the language semantics, we have so far held off summarising the role of each parameter. |
---|
173 | |
---|
174 | We begin the abstraction process with the \texttt{params\_\_} record. |
---|
175 | This holds the types of the representations of the different register varieties in the intermediate languages: |
---|
176 | \begin{lstlisting} |
---|
177 | record params__: Type[1] ≝ |
---|
178 | { |
---|
179 | acc_a_reg: Type[0]; |
---|
180 | acc_b_reg: Type[0]; |
---|
181 | dpl_reg: Type[0]; |
---|
182 | dph_reg: Type[0]; |
---|
183 | pair_reg: Type[0]; |
---|
184 | generic_reg: Type[0]; |
---|
185 | call_args: Type[0]; |
---|
186 | call_dest: Type[0]; |
---|
187 | extend_statements: Type[0] |
---|
188 | }. |
---|
189 | \end{lstlisting} |
---|
190 | We summarise what these types mean, and how they are used in both the semantics and the translation process: |
---|
191 | \begin{center} |
---|
192 | \begin{tabular*}{\textwidth}{p{4cm}p{11cm}} |
---|
193 | Type & Explanation \\ |
---|
194 | \hline |
---|
195 | \texttt{acc\_a\_reg} & The type of the accumulator A register. In some languages this is implemented as the hardware accumulator, whereas in others this is a pseudoregister.\\ |
---|
196 | \texttt{acc\_b\_reg} & Similar to the accumulator A field, but for the processor's auxilliary accumulator, B. \\ |
---|
197 | \texttt{dpl\_reg} & The type of the representation of the low eight bit register of the MCS-51's single 16 bit register, DPL. Can be either a pseudoregister or the hardware DPL register. \\ |
---|
198 | \texttt{dph\_reg} & Similar to the DPL register but for the eight high bits of the 16-bit register. \\ |
---|
199 | \texttt{pair\_reg} & Various different `move' instructions have been merged into a single move instruction in the joint language. A value can either be moved to or from the accumulator in some languages, or moved to and from an arbitrary pseudoregister in others. This type encodes how we should move data around the registers and accumulators. \\ |
---|
200 | \texttt{generic\_reg} & The representation of generic registers (i.e. those that are not devoted to a specific task). \\ |
---|
201 | \texttt{call\_args} & The number of arguments to a function. For some languages this is irrelevant. \\ |
---|
202 | \texttt{call\_dest} & \\ |
---|
203 | \texttt{extend\_statements} & Instructions that are specific to a particular intermediate language, and which cannot be abstracted into the joint language. |
---|
204 | \end{tabular*} |
---|
205 | \end{center} |
---|
206 | |
---|
207 | As mentioned in the report for Deliverable D4.2, the record \texttt{params\_\_} is enough to be able to specify the instructions of the joint languages: |
---|
208 | \begin{lstlisting} |
---|
209 | inductive joint_instruction (p: params__) (globals: list ident): Type[0] := |
---|
210 | | COMMENT: String → joint_instruction p globals |
---|
211 | | COST_LABEL: costlabel → joint_instruction p globals |
---|
212 | ... |
---|
213 | \end{lstlisting} |
---|
214 | |
---|
215 | We extend \texttt{params\_\_} with a type corresponding to labels, \texttt{succ}, obtaining a new record type of parameters called \texttt{params\_}: |
---|
216 | \begin{lstlisting} |
---|
217 | record params_: Type[1] ≝ |
---|
218 | { |
---|
219 | pars__ :> params__; |
---|
220 | succ: Type[0] |
---|
221 | }. |
---|
222 | \end{lstlisting} |
---|
223 | The type \texttt{succ} corresponds to labels, in the case of control flow graph based languages, or is instantiated to the unit type for the linearised language, LIN. |
---|
224 | Using \texttt{param\_} we can define statements of the joint language: |
---|
225 | \begin{lstlisting} |
---|
226 | inductive joint_statement (p:params_) (globals: list ident): Type[0] := |
---|
227 | | sequential: joint_instruction p globals → succ p → joint_statement p globals |
---|
228 | | GOTO: label → joint_statement p globals |
---|
229 | | RETURN: joint_statement p globals. |
---|
230 | \end{lstlisting} |
---|
231 | Note that in the joint language, instructions are `linear', in that they have an immediate successor. |
---|
232 | Statements, on the other hand, consist of either a linear instruction, or a \texttt{GOTO} or \texttt{RETURN} statement, both of which can jump to an arbitrary place in the program. |
---|
233 | |
---|
234 | For the semantics, we need further parametererised types. |
---|
235 | In particular, we parameterise the result and parameter type of an internal function call in \texttt{params0}: |
---|
236 | \begin{lstlisting} |
---|
237 | record params0: Type[1] ≝ |
---|
238 | { pars__' :> params__ |
---|
239 | ; resultT: Type[0] |
---|
240 | ; paramsT: Type[0] |
---|
241 | }. |
---|
242 | \end{lstlisting} |
---|
243 | We further extend \texttt{params0} with a type for local variables in internal function calls: |
---|
244 | \begin{lstlisting} |
---|
245 | record params1 : Type[1] ≝ |
---|
246 | { pars0 :> params0 |
---|
247 | ; localsT: Type[0] |
---|
248 | }. |
---|
249 | \end{lstlisting} |
---|
250 | Again, we expand our parameters with types corresponding to the code representation (either a control flow graph or a list of statements). |
---|
251 | Further, we hypothesise a generic method for looking up the next instruction in the graph, called \texttt{lookup}. |
---|
252 | Note that \texttt{lookup} may fail, and returns an \texttt{option} type: |
---|
253 | \begin{lstlisting} |
---|
254 | record params (globals: list ident): Type[1] ≝ |
---|
255 | { succ_ : Type[0] |
---|
256 | ; pars1 :> params1 |
---|
257 | ; codeT: Type[0] |
---|
258 | ; lookup: codeT → label → option (joint_statement (mk_params_ pars1 succ_) globals) |
---|
259 | }. |
---|
260 | \end{lstlisting} |
---|
261 | We now have what we need to define internal functions for the joint language. |
---|
262 | The first two `universe' fields are only used in the compilation process, for generating fresh names, and do not affect the semantics. |
---|
263 | The rest of the fields affect both compilation and semantics. |
---|
264 | In particular, we have parameterised result types, function parameter types and the type of local variables. |
---|
265 | Note also that we have lifted the hypothesised \texttt{lookup} function from \texttt{params} into a dependent sigma type, which combines a label (the entry and exit points of the control flow graph or list) combined with a proof that the label is in the graph structure: |
---|
266 | \begin{lstlisting} |
---|
267 | record joint_internal_function (globals: list ident) (p:params globals) : Type[0] := |
---|
268 | { |
---|
269 | joint_if_luniverse: universe LabelTag; |
---|
270 | joint_if_runiverse: universe RegisterTag; |
---|
271 | joint_if_result : resultT p; |
---|
272 | joint_if_params : paramsT p; |
---|
273 | joint_if_locals : localsT p; |
---|
274 | joint_if_stacksize: nat; |
---|
275 | joint_if_code : codeT … p; |
---|
276 | joint_if_entry : $\Sigma$l: label. lookup … joint_if_code l ≠ None ?; |
---|
277 | joint_if_exit : $\Sigma$l: label. lookup … joint_if_code l ≠ None ? |
---|
278 | }. |
---|
279 | \end{lstlisting} |
---|
280 | Naturally, a question arises as to why we have chosen to split up the parameterisation into so many intermediate records, each slightly extending earlier ones. |
---|
281 | The reason is because some intermediate languages share a host of parameters, and only differ on some others. |
---|
282 | For instance, in instantiating the ERTL language, certain parameters are shared with RTL, whilst others are ERTL specific: |
---|
283 | \begin{lstlisting} |
---|
284 | ... |
---|
285 | definition ertl_params__: params__ := |
---|
286 | mk_params__ register register register register (move_registers × move_registers) |
---|
287 | register nat unit ertl_statement_extension. |
---|
288 | ... |
---|
289 | definition ertl_params1: params1 := rtl_ertl_params1 ertl_params0. |
---|
290 | definition ertl_params: ∀globals. params globals ≝ rtl_ertl_params ertl_params0. |
---|
291 | ... |
---|
292 | definition ertl_statement := joint_statement ertl_params_. |
---|
293 | |
---|
294 | definition ertl_internal_function := |
---|
295 | $\lambda$globals.joint_internal_function … (ertl_params globals). |
---|
296 | \end{lstlisting} |
---|
297 | Here, \texttt{rtl\_ertl\_params1} are the common parameters of the ERTL and RTL languages: |
---|
298 | \begin{lstlisting} |
---|
299 | definition rtl_ertl_params1 := $\lambda$pars0. mk_params1 pars0 (list register). |
---|
300 | \end{lstlisting} |
---|
301 | |
---|
302 | The record \texttt{more\_sem\_params} bundles together functions that store and retrieve values in various forms of register: |
---|
303 | \begin{lstlisting} |
---|
304 | record more_sem_params (p:params_): Type[1] := |
---|
305 | { |
---|
306 | framesT: Type[0]; |
---|
307 | empty_framesT: framesT; |
---|
308 | regsT: Type[0]; |
---|
309 | empty_regsT: regsT; |
---|
310 | call_args_for_main: call_args p; |
---|
311 | call_dest_for_main: call_dest p; |
---|
312 | succ_pc: succ p → address → res address; |
---|
313 | greg_store_: generic_reg p → beval → regsT → res regsT; |
---|
314 | greg_retrieve_: regsT → generic_reg p → res beval; |
---|
315 | acca_store_: acc_a_reg p → beval → regsT → res regsT; |
---|
316 | acca_retrieve_: regsT → acc_a_reg p → res beval; |
---|
317 | ... |
---|
318 | dpl_store_: dpl_reg p → beval → regsT → res regsT; |
---|
319 | dpl_retrieve_: regsT → dpl_reg p → res beval; |
---|
320 | ... |
---|
321 | pair_reg_move_: regsT → pair_reg p → res regsT; |
---|
322 | pointer_of_label: label → $\Sigma$p:pointer. ptype p = Code |
---|
323 | }. |
---|
324 | \end{lstlisting} |
---|
325 | For instance, \texttt{greg\_store\_} and \texttt{greg\_retrieve\_} store and retrieve values from a generic register, respectively. |
---|
326 | Similarly, \texttt{pair\_reg\_move} implements the generic move instruction of the joint language. |
---|
327 | Here \texttt{framesT} is the type of stack frames, with \texttt{empty\_framesT} an empty stack frame. |
---|
328 | |
---|
329 | The two hypothesised values \texttt{call\_args\_for\_main} and \texttt{call\_dest\_for\_main} deal with problems with the \texttt{main} function of the program, and how it is handled. |
---|
330 | In particular, we need to know when the \texttt{main} function has finished executing. |
---|
331 | But this is complicated, in C, by the fact that the \texttt{main} function is explicitly allowed to be recursive (disallowed in C++). |
---|
332 | Therefore, to understand whether the exiting \texttt{main} function is really exiting, or just recursively calling itself, we need to remember the address at which \texttt{main} is located. |
---|
333 | This is done with \texttt{call\_dest\_for\_main}, whereas \texttt{call\_args\_for\_main} holds the \texttt{main} function's arguments. |
---|
334 | |
---|
335 | We extend \texttt{more\_sem\_params} with yet more parameters via \texttt{more\_sem\_params2}: |
---|
336 | \begin{lstlisting} |
---|
337 | record more_sem_params2 (globals: list ident) (p: params globals) : Type[1] := |
---|
338 | { |
---|
339 | more_sparams1 :> more_sem_params p; |
---|
340 | fetch_statement: |
---|
341 | genv … p → state (mk_sem_params … more_sparams1) → |
---|
342 | res (joint_statement (mk_sem_params … more_sparams1) globals); |
---|
343 | ... |
---|
344 | save_frame: |
---|
345 | address → nat → paramsT … p → call_args p → call_dest p → |
---|
346 | state (mk_sem_params … more_sparams1) → |
---|
347 | res (state (mk_sem_params … more_sparams1)); |
---|
348 | pop_frame: |
---|
349 | genv globals p → state (mk_sem_params … more_sparams1) → |
---|
350 | res ((state (mk_sem_params … more_sparams1))); |
---|
351 | ... |
---|
352 | set_result: |
---|
353 | list val → state (mk_sem_params … more_sparams1) → |
---|
354 | res (state (mk_sem_params … more_sparams1)); |
---|
355 | exec_extended: |
---|
356 | genv globals p → extend_statements (mk_sem_params … more_sparams1) → |
---|
357 | succ p → state (mk_sem_params … more_sparams1) → |
---|
358 | IO io_out io_in (trace × (state (mk_sem_params … more_sparams1))) |
---|
359 | }. |
---|
360 | \end{lstlisting} |
---|
361 | Here, \texttt{fetch\_statement} fetches the next statement to be executed, \texttt{save\_frame} and \texttt{pop\_frame} manipulate stack frames, \texttt{set\_result} saves the result of the function computation, and \texttt{exec\_extended} is a function that executes the extended statements, peculiar to each individual intermediate language. |
---|
362 | |
---|
363 | We bundle \texttt{params} and \texttt{sem\_params} together into a single record. |
---|
364 | This will be used in the function \texttt{eval\_statement} which executes a single statement of the joint language: |
---|
365 | \begin{lstlisting} |
---|
366 | record sem_params2 (globals: list ident): Type[1] := |
---|
367 | { |
---|
368 | p2 :> params globals; |
---|
369 | more_sparams2 :> more_sem_params2 globals p2 |
---|
370 | }. |
---|
371 | \end{lstlisting} |
---|
372 | \noindent |
---|
373 | The \texttt{state} record holds the current state of the interpreter: |
---|
374 | \begin{lstlisting} |
---|
375 | record state (p: sem_params): Type[0] := |
---|
376 | { |
---|
377 | st_frms: framesT ? p; |
---|
378 | pc: address; |
---|
379 | sp: pointer; |
---|
380 | carry: beval; |
---|
381 | regs: regsT ? p; |
---|
382 | m: bemem |
---|
383 | }. |
---|
384 | \end{lstlisting} |
---|
385 | Here \texttt{st\_frms} represent stack frames, \texttt{pc} the program counter, \texttt{sp} the stack pointer, \texttt{carry} the carry flag, \texttt{regs} the generic registers and \texttt{m} external RAM. |
---|
386 | We use the function \texttt{eval\_statement} to evaluate a single joint statement: |
---|
387 | \begin{lstlisting} |
---|
388 | definition eval_statement: |
---|
389 | ∀globals: list ident.∀p:sem_params2 globals. |
---|
390 | genv globals p → state p → IO io_out io_in (trace × (state p)) := |
---|
391 | ... |
---|
392 | \end{lstlisting} |
---|
393 | We examine the type of this function. |
---|
394 | Note that it returns a monadic action, \texttt{IO}, denoting that it may have an IO \emph{side effect}, where the program reads or writes to some external device or memory address. |
---|
395 | Monads and their use are further discussed in Subsection~\ref{subsect.use.of.monads}. |
---|
396 | Further, the function returns a new state, updated by the single step of execution of the program. |
---|
397 | Finally, a \emph{trace} is also returned, which records the trace of cost labels that the program passes through, in order to calculate the cost model for the CerCo compiler. |
---|
398 | |
---|
399 | %-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-% |
---|
400 | % SECTION. % |
---|
401 | %-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-% |
---|
402 | \subsection{Use of monads} |
---|
403 | \label{subsect.use.of.monads} |
---|
404 | |
---|
405 | Monads are a categorical notion that have recently gained an amount of traction in functional programming circles. |
---|
406 | In particular, it was noted by Moggi that monads could be used to sequence \emph{effectful} computations in a pure manner. |
---|
407 | Here, `effectful computations' cover a lot of ground, from writing to files, generating fresh names, or updating an ambient notion of state. |
---|
408 | |
---|
409 | A monad can be characterised by the following: |
---|
410 | \begin{itemize} |
---|
411 | \item |
---|
412 | A data type, $M$. |
---|
413 | For instance, the \texttt{option} type in O'Caml or Matita. |
---|
414 | \item |
---|
415 | A way to `inject' or `lift' pure values into this data type (usually called \texttt{return}). |
---|
416 | We call this function \texttt{return} and say that it must have type $\alpha \rightarrow M \alpha$, where $M$ is the name of the monad. |
---|
417 | In our example, the `lifting' function for the \texttt{option} monad can be implemented as: |
---|
418 | \begin{lstlisting} |
---|
419 | let return x = Some x |
---|
420 | \end{lstlisting} |
---|
421 | \item |
---|
422 | A way to `sequence' monadic functions together, to form another monadic function, usually called \texttt{bind}. |
---|
423 | Bind has type $M \alpha \rightarrow (\alpha \rightarrow M \beta) \rightarrow M \beta$. |
---|
424 | We can see that bind `unpacks' a monadic value, applies a function after unpacking, and `repacks' the new value in the monad. |
---|
425 | In our example, the sequencing function for the \texttt{option} monad can be implemented as: |
---|
426 | \begin{lstlisting} |
---|
427 | let bind o f = |
---|
428 | match o with |
---|
429 | None -> None |
---|
430 | Some s -> f s |
---|
431 | \end{lstlisting} |
---|
432 | \item |
---|
433 | A series of algebraic laws that relate \texttt{return} and \texttt{bind}, ensuring that the sequencing operation `does the right thing' by retaining the order of effects. |
---|
434 | These \emph{monad laws} should also be useful in reasoning about monadic computations in the proof of correctness of the compiler. |
---|
435 | \end{itemize} |
---|
436 | In the semantics of both front and backend intermediate languages, we make use of monads. |
---|
437 | This monadic infrastructure is shared between the frontend and backend languages. |
---|
438 | |
---|
439 | In particular, an `IO' monad, signalling the emission or reading of data to some external location or memory address, is heavily used in the semantics of the intermediate languages. |
---|
440 | Here, the monad's sequencing operation ensures that emissions and reads are maintained in the correct order. |
---|
441 | Most functions in the intermediate language semantics fall into the IO monad. |
---|
442 | In particular, we have already seen how the \texttt{eval\_statement} function of the joint language is monadic, with type: |
---|
443 | \begin{lstlisting} |
---|
444 | definition eval_statement: |
---|
445 | ∀globals: list ident.∀p:sem_params2 globals. |
---|
446 | genv globals p → state p → IO io_out io_in (trace × (state p)) := |
---|
447 | ... |
---|
448 | \end{lstlisting} |
---|
449 | If we in the body of \texttt{eval\_statement}, we may also see how the monad sequences effects. |
---|
450 | For instance, in the case for the \texttt{LOAD} statement, we have the following: |
---|
451 | \begin{lstlisting} |
---|
452 | definition eval_statement: |
---|
453 | ∀globals: list ident. ∀p:sem_params2 globals. |
---|
454 | genv globals p → state p → IO io_out io_in (trace × (state p)) := |
---|
455 | $\lambda$globals, p, ge, st. |
---|
456 | ... |
---|
457 | match s with |
---|
458 | | LOAD dst addrl addrh ⇒ |
---|
459 | ! vaddrh $\leftarrow$ dph_retrieve … st addrh; |
---|
460 | ! vaddrl $\leftarrow$ dpl_retrieve … st addrl; |
---|
461 | ! vaddr $\leftarrow$ pointer_of_address 〈vaddrl,vaddrh〉; |
---|
462 | ! v $\leftarrow$ opt_to_res … (msg FailedLoad) (beloadv (m … st) vaddr); |
---|
463 | ! st $\leftarrow$ acca_store p … dst v st; |
---|
464 | ! st $\leftarrow$ next … l st ; |
---|
465 | ret ? $\langle$E0, st$\rangle$ |
---|
466 | \end{lstlisting} |
---|
467 | Here, we employ a certain degree of syntactic sugaring. |
---|
468 | The syntax |
---|
469 | \begin{lstlisting} |
---|
470 | ... |
---|
471 | ! vaddrh $\leftarrow$ dph_retrieve … st addrh; |
---|
472 | ! vaddrl $\leftarrow$ dpl_retrieve … st addrl; |
---|
473 | ... |
---|
474 | \end{lstlisting} |
---|
475 | is sugaring for the \texttt{IO} monad's binding operation. |
---|
476 | We can expand this sugaring to the following much more verbose code: |
---|
477 | \begin{lstlisting} |
---|
478 | ... |
---|
479 | bind (dph_retrieve … st addrh) ($\lambda$vaddrh. bind (dpl_retrieve … st addrl) |
---|
480 | ($\lambda$vaddrl. ...)) |
---|
481 | \end{lstlisting} |
---|
482 | Note also that the function \texttt{ret} is implementing the `lifting', or return function of the \texttt{IO} monad. |
---|
483 | |
---|
484 | We believe the sugaring for the monadic bind operation makes the program much more readable, and therefore easier to reason about. |
---|
485 | In particular, note that the functions \texttt{dph\_retrieve}, \texttt{pointer\_of\_address}, \texttt{acca\_store} and \texttt{next} are all monadic. |
---|
486 | The function \texttt{opt\_to\_res} is also --- this is a `utility' function that lifts an option type into the \texttt{IO} monad. |
---|
487 | |
---|
488 | \subsection{Memory models} |
---|
489 | \label{subsect.memory.models} |
---|
490 | |
---|
491 | Currently, the semantics of the front and backend intermediate languages are built around two distinct memory models. |
---|
492 | The frontend languages reuse the CompCert memory model, whereas the backend languages employ a bespoke model tailored to their needs. |
---|
493 | This split between the memory models is reflective of the fact that the front and backend language have different requirements from their memory models, to a certain extent. |
---|
494 | |
---|
495 | In particular, the CompCert memory model places quite heavy restrictions on where in memory one can read from. |
---|
496 | To read a value in this memory model, you must supply an address, complete with a number of `chunks' to read following that address. |
---|
497 | The read is only successful if you attempt to read at a genuine `value boundary', and read the appropriate number of memory chunks for that value. |
---|
498 | As a result, with the CompCert memory model you are unable to read the third byte of a 32-bit integer value directly from memory, for instance. |
---|
499 | This has some consequences for the CompCert compiler, namely an inability to write a \texttt{memcpy} routine. |
---|
500 | |
---|
501 | However, the CerCo memory model operates differently, as we need to move data `piecemeal' between stacks in the backend of the compiler. |
---|
502 | As a result, the bespoke memory model allows one to read data at any memory location, not just on value boundaries. |
---|
503 | This has the advantage that we can successfully write a \texttt{memcpy} routine with the CerCo compiler (remembering that \texttt{memcpy} is nothing more than `read a byte, copy a byte' repeated in a loop), an advantage over CompCert. |
---|
504 | |
---|
505 | Right now, the two memory models are interfaced during the translation from RTLabs to RTL. |
---|
506 | It is an open question whether we will unify the two memory models, using only the backend, bespoke memory model throughout the compiler, as the CompCert memory model seems to work fine for the frotend, where such byte-by-byte copying is not needed. |
---|
507 | However, should we decide to port the frontend to the new memory model, it has been written in such an abstract way that doing so would be relatively straightforward. |
---|
508 | |
---|
509 | %-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-% |
---|
510 | % SECTION. % |
---|
511 | %-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-% |
---|
512 | \section{Future work} |
---|
513 | \label{sect.future.work} |
---|
514 | |
---|
515 | A few small axioms remain to be closed. |
---|
516 | These relate to fetching the next instruction to be interpreted from the control flow graph, or linearised representation, of the language. |
---|
517 | Closing these axioms should not be a problem. |
---|
518 | |
---|
519 | Further, tailcalls to external functions are currently axiomatised. |
---|
520 | This is due to there being a difficulty with how stackframes are handled with external function calls. |
---|
521 | We leave this for further work, due to there being no pressing need to implement this feature at the present time. |
---|
522 | |
---|
523 | There is also, as mentioned, an open problem as to whether the frontend languages should use the same memory model as the backend languages, as opposed to reusing the CompCert memory model. |
---|
524 | Should this decision be taken, this will likely be straightforward but potentially time consuming. |
---|
525 | |
---|
526 | \newpage |
---|
527 | |
---|
528 | %-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-% |
---|
529 | % SECTION. % |
---|
530 | %-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-% |
---|
531 | \section{Code listing} |
---|
532 | \label{sect.code.listing} |
---|
533 | |
---|
534 | %-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-% |
---|
535 | % SECTION. % |
---|
536 | %-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-% |
---|
537 | \subsection{Listing of files} |
---|
538 | \label{subsect.listing.files} |
---|
539 | |
---|
540 | Semantics specific files (files relating to language translations ommitted). |
---|
541 | Syntax specific files: |
---|
542 | \begin{center} |
---|
543 | \begin{tabular*}{\textwidth}{p{5.5cm}p{9.5cm}} |
---|
544 | Title & Description \\ |
---|
545 | \hline |
---|
546 | \texttt{RTLabs/syntax.ma} & The syntax of RTLabs \\ |
---|
547 | \texttt{joint/Joint.ma} & Abstracted syntax for backend languages \\ |
---|
548 | \texttt{RTL/RTL.ma} & The syntax of RTL \\ |
---|
549 | \texttt{ERTL/ERTL.ma} & The syntax of ERTL \\ |
---|
550 | \texttt{LIN/joint\_LTL\_LIN.ma} & The syntax of the abstracted combined LTL and LIN language \\ |
---|
551 | \texttt{LTL/LTL.ma} & The specialisation of the above file to the syntax of LTL \\ |
---|
552 | \texttt{LIN/LIN.ma} & The specialisation of the above file to the syntax of LIN \\ |
---|
553 | \end{tabular*} |
---|
554 | \end{center} |
---|
555 | |
---|
556 | \noindent |
---|
557 | Semantics specific files: |
---|
558 | \begin{center} |
---|
559 | \begin{tabular*}{\textwidth}{p{5.5cm}p{9.5cm}} |
---|
560 | Title & Description \\ |
---|
561 | \hline |
---|
562 | \texttt{RTLabs/semantics.ma} & The semantics of RTLabs \\ |
---|
563 | \texttt{joint/semantics.ma} & The semantics of the abstracted backend languages. \\ |
---|
564 | \texttt{joint/SemanticUtils.ma} & Generic utilities used in the semantics of all `joint' intermediate languages \\ |
---|
565 | \texttt{RTL/semantics.ma} & The semantics of RTL \\ |
---|
566 | \texttt{ERTL/semantics.ma} & The semantics of ERTL \\ |
---|
567 | \texttt{LIN/joint\_LTL\_LIN\_semantics.ma} & The semantics of the joint LTL-LIN language \\ |
---|
568 | \texttt{LTL/semantics.ma} & The semantics of LTL \\ |
---|
569 | \texttt{LIN/semantics.ma} & The semantics of LIN |
---|
570 | \end{tabular*} |
---|
571 | \end{center} |
---|
572 | |
---|
573 | %-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-% |
---|
574 | % SECTION. % |
---|
575 | %-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-% |
---|
576 | \subsection{Listing of important functions and axioms} |
---|
577 | \label{subsect.listing.important.functions.axioms} |
---|
578 | |
---|
579 | We list some important functions and axioms in the backend semantics: |
---|
580 | |
---|
581 | \end{document} |
---|