% Compiler proof
% Structure of proof, and high-level discussion
% Technical devices: structured traces, labelling, etc.
% Assembler proof
% Technical issues in front end (Brian?)
% Main theorem statement

\section{Compiler proof}
\label{sect.compiler.proof}

\subsection{Labelling approach}

In this section, we explore the labelling approach mentioned in the introduction in more detail.

The `labelling' approach to the problem of building cost annotations is summarized in the following diagram.

\newcolumntype{M}[1]{>{\raggedright}m{#1}}

\-

\begin{tabular}{M{10cm}||M{4cm}}
$$
\xymatrix{
% Row 1
L_1 \\
%
% Row 2
L_{1,\ell}
\ar[u]^{\cl{I}}
\ar@/^/[d]^{\w{er}_1}
\ar[r]^{\cl{C}_1}
%
& L_{2,\ell}
\ar[d]^{\w{er}_2}
%
& \ldots \hspace{0.3cm}\ar[r]^{\cl{C}_k}
%
& L_{k+1, \ell}
\ar[d]^{\w{er}_{k+1}} \\
%
% Row 3
L_1
\ar@/^/[u]^{\cl{L}}
\ar[r]^{\cl{C}_1}
& L_2
& \ldots\hspace{0.3cm}
\ar[r]^{\cl{C}_{k}}
& L_{k+1}
}
$$
&
$$
\begin{array}{ccc}
\w{er}_{i+1} \comp \cl{C}_{i} &=& \cl{C}_{i} \comp \w{er}_{i} \\
\w{er}_{1} \comp \cl{L} &=& \w{id}_{L_{1}} \\
\w{An}_{1} &=& \cl{I} \comp \cl{L}
\end{array}
$$
\end{tabular}

\-

For each language $L_i$ considered in the compilation process,
we define an extended {\em labelled} language $L_{i,\ell}$ and an
extended operational semantics. The labels are used to mark
certain points of the control flow. The semantics makes sure
that whenever we cross a labelled control point a labelled and observable
transition is produced.

For each labelled language there is an obvious function $\w{er}_i$
erasing all labels and producing a program in the corresponding
unlabelled language.
The compilation functions $\cl{C}_i$ are extended from the unlabelled
to the labelled language so that they commute
with the erasure functions. Moreover, we lift the soundness properties of
the compilation functions from the unlabelled to the labelled languages
and transition systems.

A {\em labelling} $\cl{L}$ of the source language $L_1$ is just a function
such that $\w{er}_{1} \comp \cl{L}$ is the identity function.
An {\em instrumentation} $\cl{I}$ of the source labelled language $L_{1,\ell}$
is a function replacing the labels with suitable increments of, say,
a fresh \w{cost} variable. Then an {\em annotation} $\w{An}_{1}$ of the
source program can be derived simply as the composition of the labelling
and the instrumentation functions: $\w{An}_{1} = \cl{I}\comp \cl{L}$.
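As a concrete illustration, consider the following small fragment together with one possible labelling and the resulting annotation (the label placement and the constants $c_\alpha$, $c_\beta$ are hypothetical, chosen only to show how $\cl{L}$ and $\cl{I}$ compose):
$$
\begin{array}{l|l|l}
\multicolumn{1}{c|}{S} & \multicolumn{1}{c|}{\cl{L}(S)} & \multicolumn{1}{c}{\w{An}_1(S)=\cl{I}(\cl{L}(S))}\\
\hline
x\ass 0; & \alpha: x\ass 0; & \w{cost}\ass\w{cost}+c_\alpha;\ x\ass 0;\\
\sop{while}x<n\sbin{do} & \sop{while}x<n\sbin{do} & \sop{while}x<n\sbin{do}\\
\quad x\ass x+1 & \quad \beta: x\ass x+1 & \quad (\w{cost}\ass\w{cost}+c_\beta;\ x\ass x+1)
\end{array}
$$
Erasing the labels of the middle column recovers the source on the left, as required by $\w{er}_{1}\comp\cl{L}=\w{id}_{L_1}$.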
---|

As for the direct approach, suppose $s$ is some adequate representation
of the state of a program. Let $S$ be a source program and suppose
that its annotation satisfies the following property:
\begin{equation}\label{STEP1}
(\w{An}_{1}(S),s[c/\w{cost}]) \eval s'[c+\delta/\w{cost}]
\end{equation}
where $\delta$ is some non-negative number.
Then the definition of the instrumentation and the fact that
the soundness proofs of the compilation functions have been lifted
to the labelled languages allow us to conclude that
\begin{equation}\label{step2}
(\cl{C}(\cl{L}(S)),s[c/\w{cost}]) \eval (s'[c/\w{cost}],\lambda)
\end{equation}
where $\cl{C} = \cl{C}_{k} \comp \cdots \comp \cl{C}_{1}$ and
$\lambda$ is a sequence (or a multi-set) of labels whose `cost'
corresponds to the number $\delta$ produced by the
annotated program.
Then the commutation properties of the erasure and compilation functions
allow us to conclude that the {\em erasure} of the compiled
labelled code, $\w{er}_{k+1}(\cl{C}(\cl{L}(S)))$, is actually equal to
the compiled code $\cl{C}(S)$ we are interested in.
Given this, the following question arises:

\begin{quote}
Under which conditions is the sequence $\lambda$, {\em i.e.}, the increment $\delta$,
a sound and possibly precise description of the
execution cost of the object code?
\end{quote}

To answer this question, we observe that the object code we are interested in is some kind of assembly code,
whose control flow can be easily represented as a control flow
graph. The fact that we have to prove the soundness of the compilation
functions means that we have plenty of information
on the way the control flows in the compiled code, in particular as
far as procedure calls and returns are concerned.
This information allows us to build a rather accurate representation
of the control flow of the compiled code at run time.

The idea is then to perform two simple checks on the control flow
graph. The first check verifies that all loops go through a labelled node.
If this is the case then we can associate a finite cost with
every label and prove that the cost annotations are sound.
The second check amounts to verifying that all paths starting from
a label have the same cost. If this check is successful then we can
conclude that the cost annotations are precise.
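The two checks can be phrased as small graph algorithms. The following Python sketch (our own toy CFG encoding, not CerCo's actual checker) verifies soundness by checking that the subgraph of unlabelled nodes is acyclic, and precision by comparing the costs of all paths between consecutive labels:

```python
# A CFG node is a triple (cost, label_or_None, successor_indices).

def loops_are_labelled(g):
    """Soundness: every cycle must cross a labelled node, i.e. the
    subgraph of unlabelled nodes is acyclic (DFS with colours)."""
    colour = [0] * len(g)              # 0 = new, 1 = on stack, 2 = done
    def acyclic(i):
        cost, label, succs = g[i]
        if label is not None or colour[i] == 2:
            return True                # labelled nodes cut every cycle
        if colour[i] == 1:
            return False               # an unlabelled cycle was found
        colour[i] = 1
        ok = all(acyclic(j) for j in succs)
        colour[i] = 2
        return ok
    return all(acyclic(i) for i in range(len(g)))

def path_cost(g, i):
    """Cost accumulated from node i up to (and excluding) the next
    labelled node; None if diverging paths disagree.  Terminates
    whenever the soundness check above has passed."""
    cost, label, succs = g[i]
    if label is not None:
        return 0
    branches = {path_cost(g, j) for j in succs}
    if not branches:
        return cost
    if len(branches) > 1 or None in branches:
        return None
    return cost + branches.pop()

def labels_are_precise(g):
    """Precision: all paths leaving a labelled node must carry the
    same cost to the next label."""
    for cost, label, succs in g:
        if label is not None and succs:
            branches = {path_cost(g, j) for j in succs}
            if len(branches) > 1 or None in branches:
                return False
    return True
```

With soundness established, the cost associated with a label is simply its node's cost plus the (unique) path cost of its successors.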
---|

This approach, as published in~\cite{Ayache2012}, is the one used in the CerCo project. However, it has some weaknesses, which we describe in the next subsection, together with a method to resolve them.

\subsection{Indexed labelling}

The labelling approach as described above does not deal well with some optimisations the compiler might introduce, such as changes in the control flow. For example, the 8051 architecture has no instruction for
multi-word division, so the compiler must insert into the code a function containing loops. This is symptomatic of a more general weakness: the inability to state different costs for different occurrences of labels,
where the difference might originate from labels being duplicated during compilation, or from costs being sensitive to the current state of execution.

We present a solution that addresses this weakness by introducing cost labels that depend on the iteration of their containing loop.
This is achieved by means of \emph{indexed labels}; all cost labels are decorated with formal indices coming from the loops containing such labels.
These indices allow us to keep track, even after multiple loop transformations, of which iteration of the original loop in the source code a particular label occurrence belongs to.
During the annotation stage of the source code, this information is presented to the user by means of \emph{dependent costs}.

Over the course of this section, we use two specific types of optimisation to introduce the indexed labelling approach. We argue that these two types form a good foundation for extending the labelling approach
to cover more, and more common, optimisations, as well as for gaining insight into how to integrate advanced cost estimation techniques, such as cache analysis, into the CerCo compiler.
Moreover, loop peeling itself has the fortuitous property of enhancing and enabling other optimisations.
Experimentation with CerCo's untrusted prototype compiler, which implements constant propagation and partial redundancy elimination~\cite{PRE,muchnick}, shows how loop peeling enhances those other optimisations.

\paragraph{Loop peeling}
Loop peeling consists in preceding the loop with a copy of its body, appropriately guarded.
In general, it is used to trigger further optimisations: those that rely on execution information which can be computed at compile time but is erased by further iterations of the loop, or those that use the hoisted code to be more effective at eliminating redundant code.
Integrating this transformation into the labelling approach would also allow the integration of the common case of cache analysis explained above; the analysis of cache hits and misses usually benefits from a form of \emph{virtual} loop peeling~\cite{cacheprediction}.

\paragraph{Loop unrolling}
This optimisation consists in placing several copies of the loop body inside the loop itself, inserting appropriate guards (or avoiding them altogether if enough information about the loop's guard is available at compile time).
This can limit the number of (conditional or unconditional) jumps executed by the code and trigger further optimisations dealing with pipelining, if appropriate for the architecture.

\subsubsection{\imp{} with goto}\label{sec:defimp}
We briefly outline a toy language, \imp{} with \s{goto}s.
This language was designed to pose problems for the existing labelling approach, and to serve as a testing ground for our new notion of indexed labels.

The syntax and operational semantics of \imp{} are presented in figure~\ref{fig:minimp}.
Note that we can augment the language with \s{break} and \s{continue} statements at no further expense.

\begin{figureplr}
$$\begin{gathered}
\begin{array}{nlBl>(R<)n}
\multicolumn{4}{C}{\bfseries Syntax}\\
\multicolumn{4}{ncn}{
\ell,\ldots \hfill \text{(labels)} \hfill x,y,\ldots \hfill
\text{(identifiers)}
\hfill e,f,\ldots \hfill \text{(expressions)}
}\\
P,S,T,\ldots &\gramm& \s{skip} \mid s;t
\mid \sop{if}e\sbin{then}s\sbin{else}t
\mid \sop{while} e \sbin{do} s \mid
x \ass e
\\&\mid&
\ell : s \mid \sop{goto}\ell& \spanr{-2}{statements}\\
\\
\multicolumn{4}{C}{\bfseries Semantics}\\
K,\ldots &\gramm& \s{halt} \mid S \cdot K & continuations
\end{array}
\\[15pt]
\s{find}(\ell,S,K) \ass
\left\{\begin{array}{lL}
\bot & if $S=\s{skip},\sop{goto} \ell'$ or $x\ass e$,\\
(T, K) & if $S=\ell:T$,\\
\s{find}(\ell,T,K) & otherwise, if $S = \ell':T$,\\
\s{find}(\ell,T_1,T_2\cdot K) & if defined and $S=T_1;T_2$,\\
\s{find}(\ell,T_1,K) & if defined and
$S=\sop{if}b\sbin{then}T_1\sbin{else}T_2$,\\
\s{find}(\ell,T_2,K) & otherwise, if $S=T_1;T_2$ or
$\sop{if}b\sbin{then}T_1\sbin{else}T_2$,\\
\s{find}(\ell,T,S\cdot K) & if $S = \sop{while}b\sbin{do}T$.
\end{array}\right.
\\[15pt]
\begin{array}{lBl}
(x:=e,K,s) &\to_P& (\s{skip},K,s[v/x]) \qquad\mbox{if }(e,s)\eval v \\ \\
(S;T,K,s) &\to_P& (S,T\cdot K,s) \\ \\
(\s{if} \ b \ \s{then} \ S \ \s{else} \ T,K,s)
&\to_P&\left\{
\begin{array}{ll}
(S,K,s) &\mbox{if }(b,s)\eval v \neq 0 \\
(T,K,s) &\mbox{if }(b,s)\eval 0
\end{array}
\right. \\ \\
(\s{while} \ b \ \s{do} \ S ,K,s)
&\to_P&\left\{
\begin{array}{ll}
(S,\s{while} \ b \ \s{do} \ S \cdot K,s) &\mbox{if }(b,s)\eval v \neq 0 \\
(\s{skip},K,s) &\mbox{if }(b,s)\eval 0
\end{array}
\right. \\ \\
(\s{skip},S\cdot K,s) &\to_P&(S,K,s) \\ \\
(\ell : S, K, s) &\to_P& (S,K,s) \\ \\
(\sop{goto}\ell,K,s) &\to_P& (\s{find}(\ell,P,\s{halt}),s)
\end{array}
\end{gathered}$$
\caption{The syntax and operational semantics of \imp.}
\label{fig:minimp}
\end{figureplr}

The precise grammar for expressions is not particularly relevant, so we do not give it in full.
For the sake of conciseness we also treat boolean and arithmetic expressions together (with the usual \textsc{c} convention of an expression being true iff non-zero).
We may omit the \s{else} clause of a conditional if it leads to a \s{skip} statement.

We presuppose that all programs are \emph{well-labelled}, \ie every label labels at most one occurrence of a statement in a program, and every \s{goto} points to a label actually present in the program.
The \s{find} helper function has the task of not only finding the labelled statement in the program, but also building the correct continuation.
The continuation built by \s{find} replaces the current continuation in the case of a jump.
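The case analysis of \s{find} transcribes directly into executable form. The following Python sketch (a toy encoding of our own devising, with statements as tagged tuples) mirrors the definition in figure~\ref{fig:minimp}:

```python
# Statements as tagged tuples: ("skip",), ("assign", x, e), ("seq", s, t),
# ("if", b, s, t), ("while", b, s), ("label", l, s), ("goto", l).
# Continuations: ("halt",) or ("cons", stmt, cont).

def find(l, s, k):
    """Locate the statement labelled l inside s, building the
    continuation with which execution resumes after the jump;
    None plays the role of the undefined case (bottom)."""
    tag = s[0]
    if tag in ("skip", "assign", "goto"):
        return None
    if tag == "label":
        _, l2, t = s
        return (t, k) if l == l2 else find(l, t, k)
    if tag == "seq":
        _, t1, t2 = s
        # searching t1 pushes t2 on the continuation
        return find(l, t1, ("cons", t2, k)) or find(l, t2, k)
    if tag == "if":
        _, _b, t1, t2 = s
        return find(l, t1, k) or find(l, t2, k)
    if tag == "while":
        _, _b, t = s
        # the whole loop is pushed on the continuation, as in the figure
        return find(l, t, ("cons", s, k))
```

Note how the \s{while} case pushes the entire loop itself, so that jumping into a loop body resumes iteration correctly.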
---|

\paragraph{Further down the compilation chain}
We abstract over the rest of the compilation chain.
For every language $L$ further down the chain, we posit the existence of a suitable notion of `sequential instructions', wherein each instruction has a single natural successor to which we can add our own.

\subsubsection{Loop transformations}
We call a loop $L$ \emph{single-entry} in $P$ if there is no \s{goto} in $P$ outside of $L$ which jumps into $L$.\footnote{This is a reasonable approximation: it defines a loop as multi-entry if it has an external but unreachable \s{goto} jumping into it.}
Many loop optimisations do not preserve the semantics of multi-entry loops in general, or are otherwise rendered ineffective.
Usually compilers implement single-entry loop detection, which prevents multi-entry loops from being targeted by optimisations~\cite{muchnick,morgan}.
The loop transformations we present are local, \ie they target a single loop and transform it.
Which loops are targeted may be decided by some \emph{ad hoc} heuristic; however, the precise details of which loops are targeted, and how, are not important here.

\paragraph{Loop peeling}
$$
\sop{while}b\sbin{do}S \mapsto \sop{if}b\sbin{then} S; \sop{while} b \sbin{do} S[\ell'_i/\ell_i]
$$
where $\ell'_i$ is a fresh label for any $\ell_i$ labelling a statement in $S$.
This relabelling is safe for \s{goto}s occurring outside the loop because of the single-entry condition.
Note that any \s{break} and \s{continue} statements in the peeled body $S$ must be replaced with \s{goto}s.

\paragraph{Loop unrolling}
$$
\sop{while}b\sbin{do}S\mapsto
\sop{while} b \sbin{do} (S ;
\sop{if} b \sbin{then} (S[\ell^1_i/\ell_i] ;
\cdots
\sop{if} b \sbin{then} S[\ell^n_i/\ell_i]) \cdots)
$$
where the $\ell^j_i$ are again fresh labels for any $\ell_i$ labelling a statement in $S$.
This is a wilfully na\"{i}ve version of loop unrolling, which usually targets less general loops.
The problems this transformation poses to CerCo's labelling approach are, however, independent of the sophistication of the actual transformation.
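Setting the relabelling aside, both transformations are short functions on the syntax tree. A minimal Python sketch over the same toy tuple encoding of statements (our own, purely illustrative):

```python
# Statements as tagged tuples: ("skip",), ("seq", s, t),
# ("if", b, s, t), ("while", b, s); guards are left abstract.

def peel(s):
    """while b do S  ->  if b then (S; while b do S).
    (Relabelling of the remaining loop is omitted in this sketch.)"""
    if s[0] != "while":
        return s
    _, b, body = s
    return ("if", b, ("seq", body, s), ("skip",))

def unroll(s, n):
    """Naive n-way unrolling: chain n guarded copies of the body
    inside the loop, innermost copy first."""
    if s[0] != "while" or n < 1:
        return s
    _, b, body = s
    acc = body
    for _ in range(n - 1):
        acc = ("seq", body, ("if", b, acc, ("skip",)))
    return ("while", b, acc)
```

Erasing labels from the formal definitions above yields exactly these two rewrites, which is what makes the sketch faithful despite ignoring relabelling.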
---|

\begin{example}
In \autoref{fig:example1} we show a program (a wilfully inefficient computation of the
sum of the first $n$ factorials) and a possible transformation of it, combining loop
peeling and loop unrolling.
\begin{figureplr}
$$
\fbox{$\begin{array}{l}
s\ass 0;\\
i\ass 0;\\
\sop{while}i<n\sbin{do}\\
\inde p\ass 1;\\
\inde j\ass 1;\\
\inde \sop{while}j \le i\sbin{do}\\
\inde \inde p\ass j*p\\
\inde \inde j\ass j+1;\\
\inde s\ass s+p;\\
\inde i\ass i+1;\\
\end{array}
$}
\mapsto
\fbox{$\begin{array}{l}
s\ass 0;\\
i\ass 0;\\
\tikztargetm{a}{\s{if}}\ i<n\sbin{then}\\
\inde p\ass 1;\\
\inde j\ass 1;\\
\inde \sop{while}j \le i\sbin{do}\\
\inde \inde p\ass j*p\\
\inde \inde j\ass j+1;\\
\inde s\ass s+p;\\
\inde i\ass i+1;\\
\inde \tikztargetm{d}{\s{while}}\ i<n\sbin{do}\\
\inde \inde p\ass 1;\\
\inde \inde j\ass 1;\\
\inde \inde \tikztargetm{b}{\s{if}}\ j \le i\sbin{then}\\
\inde \inde \inde p\ass j*p\\
\inde \inde \inde j\ass j+1;\\
\inde \inde \inde \sop{if}j \le i\sbin{then}\\
\inde \inde \inde \inde p\ass j*p\\
\inde \inde \inde \inde j\ass j+1;\\
\inde \inde \inde \inde \s{while}\ j \le i\sbin{do}\\
\inde \inde \inde \inde \inde p\ass j*p\\
\inde \inde \inde \inde \inde j\ass j+1;\\
\inde \inde \inde \inde \inde \sop{if}j \le i\sbin{then}\\
\inde \inde \inde \inde \inde \inde p\ass j*p\\
\inde \inde \inde \inde \inde \inde \tikztargetm{c}j\ass j+1;\\
\inde \inde s\ass s+p;\\
\inde \inde i\ass i+1;\\
\inde \inde \sop{if}i<n\sbin{then}\\
\inde \inde \inde p\ass 1;\\
\inde \inde \inde j\ass 1;\\
\inde \inde \inde \tikztargetm{e}{\s{while}}\ j < i\sbin{do}\\
\inde \inde \inde \inde p\ass j*p\\
\inde \inde \inde \inde j\ass j+1;\\
\inde \inde \inde \inde \s{if}\ j < i\sbin{then}\\
\inde \inde \inde \inde \inde p\ass j*p\\
\inde \inde \inde \inde \inde \tikztargetm{f}j\ass j+1;\\
\inde \inde \inde s\ass s+p;\\
\inde \inde \inde i\ass i+1\tikztargetm{g};{}
\end{array}
$}\tikztargetm{right}{}
\begin{tikzpicture}[overlay, remember picture, thick,
brace/.style = {decorate, decoration={brace, amplitude = 15pt}},
label/.style = {sloped, anchor = base, yshift = 17pt, font = \large}]
\draw [brace, transform canvas={xshift=5pt}] (b.north-|right) -- node[label]{peeled} (c.south-|right);
\draw [brace, transform canvas={xshift=30pt}] (b.north-|right) -- node[label]{unrolled} (c.south-|right);
\draw [brace, transform canvas={xshift=5pt}] (e.north-|right) -- node[label]{unrolled} (f.south-|right);
\draw [brace, transform canvas={xshift=55pt}] (d.north-|right) -- node[label]{unrolled} (g.south-|right);
\draw [brace, transform canvas={xshift=80pt}] (a.north-|right) -- node[label]{peeled} (g.south-|right);
\end{tikzpicture}
\hspace{85pt}{}
$$
\caption{An example of loop transformations done on an \imp{} program. Parentheses are omitted in favour of
blocks by indentation.}
\label{fig:example1}
\end{figureplr}
\end{example}
---|

\subsubsection{Indexed labels}
\label{sec:indexedlabels}
This section presents the core of the new approach.
In brief, it amounts to the following:
\begin{enumerate}[\bfseries~\ref*{sec:indexedlabels}.1.]
\item
\label{en:outline1}
Enrich cost labels with formal indices corresponding, at the beginning of the process, to which iteration of the loop they belong to.
\item
\label{en:outline2}
Each time a loop transformation is applied and a cost label is split into different occurrences, each of these is reindexed so that whenever it is emitted its position in the original loop can be reconstructed.
\item
\label{en:outline3}
Along the compilation chain, alongside the \s{emit} instruction we add other instructions updating the indices, so that iterations of the original loops can be rebuilt at the operational semantics level.
\item
\label{en:outline4}
The machinery computing the cost mapping will still work, but it will assign costs to indexed cost labels rather than to plain cost labels as we would wish.
However, \emph{dependent costs} can be calculated, where the dependency is on which iteration of the containing loops we are in.
\end{enumerate}

\subsubsection{Indexing the cost labels}
\label{ssec:indlabs}

\paragraph{Formal indices and $\iota\ell\imp$}
Let $i_0,i_1,\ldots$ be a sequence of distinguished fresh identifiers that will be used as loop indices.
A \emph{simple expression} is an affine arithmetical expression in one of these indices, that is $a*i_k+b$ with $a,b,k \in \mathbb N$.
Simple expressions $e_1=a_1*i_k+b_1$, $e_2=a_2*i_k+b_2$ in the same index can be composed, yielding $e_1\circ e_2\ass (a_1a_2)*i_k + (a_1b_2+b_1)$, and this operation has an identity element in $id_k \ass 1*i_k+0$.
Constants can be expressed as simple expressions, so that we identify a natural $c$ with $0*i_k+c$.
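Representing a simple expression $a*i_k+b$ as the pair $(a,b)$, composition, the identity, and constants are one-liners; a Python sketch (ours, purely illustrative):

```python
# A simple expression a*i_k + b over a fixed index, as the pair (a, b).

def compose(e1, e2):
    """e1 o e2 substitutes e2 for the index inside e1:
    a1*(a2*i + b2) + b1 = (a1*a2)*i + (a1*b2 + b1)."""
    (a1, b1), (a2, b2) = e1, e2
    return (a1 * a2, a1 * b2 + b1)

IDENT = (1, 0)          # the identity id_k = 1*i_k + 0

def const(c):
    """A natural c seen as the simple expression 0*i_k + c."""
    return (0, c)

def evaluate(e, i):
    """Evaluate a*i_k + b at a concrete value of the index."""
    a, b = e
    return a * i + b
```

Composition agrees with substitution pointwise: evaluating `compose(e1, e2)` at `i` gives the same result as evaluating `e1` at `evaluate(e2, i)`.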
---|

An \emph{indexing} (with metavariables $I$, $J$, \ldots) is a list of transformations of successive formal indices dictated by simple expressions, that is a mapping\footnote{Here we restrict each mapping to be a simple expression \emph{on the same index}.
This might not be the case if more loop optimisations are accounted for (for example, interchanging two nested loops).}
$$
i_0\mapsto a_0*i_0+b_0,\dots, i_{k-1} \mapsto a_{k-1}*i_{k-1}+b_{k-1}
$$

An \emph{indexed cost label} (metavariables $\alphab$, $\betab$, \ldots) is the combination of a cost label $\alpha$ and an indexing $I$, written $\alpha\la I\ra$.
The cost label underlying an indexed one is called its \emph{atom}.
All plain labels can be considered as indexed ones by taking an empty indexing.

\imp{} with indexed labels ($\iota\ell\imp$) is defined by adding to \imp{} statements with indexed labels, and by having loops with formal indices attached to them:
$$
S,T,\ldots \gramm \cdots \mid i_k:\sop{while}e\sbin{do}S\mid \alphab : S
$$
Note that unindexed loops still exist in the language: they will correspond to multi-entry loops, which are ignored by indexing and optimisations.
We discuss the semantics later.

\paragraph{Indexed labelling}
Given an $\imp$ program $P$, in order to index loops and assign indexed labels, we must first distinguish single-entry loops.
We sketch how this can be computed in the sequel.

A first pass of the program $P$ can easily compute two maps: $\s{loopof}_P$, from each label $\ell$ to the occurrence (\ie the path) of a \s{while} loop containing $\ell$, or the empty path if none exists; and $\s{gotosof}_P$, from a label $\ell$ to the occurrences of \s{goto}s pointing to it.
Then the set $\s{multientry}_P$ of multi-entry loops of $P$ can be computed by
$$
\s{multientry}_P\ass\{\, p \mid \exists \ell,q.\ p =\s{loopof}_P(\ell),\ q\in\s{gotosof}_P(\ell),\ q \not\le p\,\}
$$
Here $\le$ is the prefix relation.\footnote{Possible simplifications to this procedure include keeping track of just the \s{while} loops containing labels and \s{goto}s (rather than paths in the syntactic tree of the program), and making two passes while avoiding building the map to sets $\s{gotosof}$.}
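With occurrences represented as paths (tuples of child indices from the root of the syntax tree), the computation of $\s{multientry}_P$ is a direct transcription. A Python sketch under that hypothetical representation:

```python
# An occurrence is a path: a tuple of child indices from the root.

def is_prefix(p, q):
    """Path p is a prefix of path q, i.e. q lies inside the subtree at p."""
    return q[:len(p)] == p

def multientry(labels, loopof, gotosof):
    """Loops with a goto jumping into them from outside.
    loopof maps a label to the path of a while loop containing it
    (the empty path () if none); gotosof maps a label to the paths
    of the gotos pointing to it."""
    return {loopof(l) for l in labels
            if loopof(l) != ()
            and any(not is_prefix(loopof(l), q) for q in gotosof(l))}
```

A goto at path `q` is external to the loop at path `p` exactly when `p` is not a prefix of `q`, which is the condition written $q \not\le p$ above.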
---|

Let $Id_k$ be the indexing of length $k$ made from identity simple expressions, \ie the sequence $i_0\mapsto id_0, \ldots , i_{k-1}\mapsto id_{k-1}$.
We define the tiered indexed labelling $\Ell^\iota_P (S,k)$ in program $P$, for an occurrence $S$ of a statement in $P$ and a natural $k$, by recursion, setting:
$$
\Ell^\iota_P(S,k)\ass
\left\{
\begin{array}{lh{-100pt}l}
(i_k:\sop{while}b\sbin{do}\alpha\la Id_{k+1}\ra : \Ell^\iota_P(T,k+1));\beta\la Id_k \ra : \s{skip}
\\& \text{if $S=\sop{while}b\sbin{do}T$ and $S\notin \s{multientry}_P$,}\\[3pt]
(\sop{while}b\sbin{do}\alpha\la Id_k \ra : \Ell^\iota_P(T,k));\beta\la Id_k \ra : \s{skip}
\\& \text{otherwise, if $S=\sop{while}b\sbin{do}T$,}\\[3pt]
\sop{if}b\sbin{then} \alpha\la Id_k \ra : \Ell^\iota_P(T_1,k) \sbin{else} \beta\la Id_k \ra : \Ell^\iota_P(T_2,k)
\\&\text{if $S=\sop{if}b\sbin{then}T_1\sbin{else}T_2$,}\\[3pt]
\ell:\alpha\la Id_k\ra : \Ell_P^\iota(T,k) & \text{if $S = \ell : T$,}\\[3pt]
\ldots
\end{array}
\right.
$$
Here, as usual, $\alpha$ and $\beta$ are fresh cost labels, and the remaining cases simply make the recursive calls on the substatements.
The \emph{indexed labelling} of a program $P$ is then defined as $\alpha\la \ra : \Ell^\iota_P(P,0)$, \ie a further fresh unindexed cost label is added at the start, and we start from level $0$.

In plainer words: each single-entry loop is indexed by $i_k$, where $k$ is the number of other single-entry loops containing this one, and all cost labels under the scope of a single-entry loop indexed by $i_k$ are indexed by all the indices $i_0,\ldots,i_k$, without any transformation.

\subsubsection{Indexed labels and loop transformations}\label{ssec:looptrans}
We define the \emph{reindexing} $I \circ (i_k\mapsto a*i_k+b)$ as an operator on indexings by setting:
\begin{multline*}
(i_0\mapsto e_0,\ldots, i_k \mapsto e_k,\ldots,i_n\mapsto e_n)
\circ(i_k\mapsto a*i_k+b)
\ass\\
i_0\mapsto e_0,\ldots, i_k \mapsto e_k \circ(a*i_k+b),\ldots,i_n\mapsto e_n.
\end{multline*}
We further extend this to indexed labels (by $\alpha\la I\ra\circ(i_k\mapsto e)\ass \alpha\la I\circ (i_k\mapsto e)\ra$) and to statements in $\iota\ell\imp$ (by applying the above transformation to all indexed labels).
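Continuing the pair representation of simple expressions, an indexing can be a tuple of pairs, and reindexing composes at a single position; a hedged Python sketch (again our own encoding):

```python
# An indexing as a tuple of simple expressions (a, b); entry j is the
# expression mapped to index i_j.

def compose(e1, e2):
    """Substitute e2 for the index inside e1 (see ssec. above)."""
    (a1, b1), (a2, b2) = e1, e2
    return (a1 * a2, a1 * b2 + b1)

def reindex(indexing, k, e):
    """I o (i_k -> e): compose the k-th entry with e, leaving the
    other entries unchanged."""
    return tuple(compose(ej, e) if j == k else ej
                 for j, ej in enumerate(indexing))
```

For indexed loop peeling below, the peeled copy is reindexed with `(0, 0)` (that is, $i_k\mapsto 0$) and the remaining loop with `(1, 1)` (that is, $i_k\mapsto i_k+1$).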
---|

We can then redefine loop peeling and loop unrolling, taking into account indexed labels.
It will only be possible to apply the transformations to indexed loops, that is to single-entry ones.
The attentive reader will notice that no assumptions are made on the labelling of the statements involved;
in particular the transformations can be repeated and composed at will.
Note also that, after erasing all labelling information (\ie indexed cost labels and loop indices), we recover exactly the transformations presented in~\ref{sec:defimp}.

\paragraph{Indexed loop peeling}
$$
i_k:\sop{while}b\sbin{do}S\mapsto
\sop{if}b\sbin{then} S\circ (i_k\mapsto 0); i_k : \sop{while} b \sbin{do} S[\ell'_i/\ell_i]\circ(i_k\mapsto i_k + 1)
$$
As can be expected, the peeled iteration of the loop is reindexed: it is always the first iteration of the loop, while the iterations of the remaining loop are shifted by $1$. Notice that this transformation can lower the actual depth of some loops; however, their index is left untouched.

\paragraph{Indexed loop unrolling}
$$
\begin{array}{l}
\begin{array}{ncn}
i_k:\sop{while}b\sbin{do}S\\
\tikz\node[rotate=-90,inner sep=0pt]{$\mapsto$};
\end{array}\\
i_k:\sop{while} b \sbin{do}\\
\quad (S\circ(i_k\mapsto n*i_k) ;\\
\quad \sop{if} b \sbin{then}\\
\quad\quad (S[\ell^1_i/\ell_i]\circ(i_k\mapsto n*i_k+1) ;\\
\quad\quad\quad \vdots \\
\quad\quad \quad \sop{if} b \sbin{then}\\
\quad \quad \quad \quad S[\ell^n_i/\ell_i]\circ(i_k\mapsto n*i_k+n-1)
)\cdots )
\end{array}
$$
Again, the reindexing is as expected: each copy of the unrolled body has its indices remapped so that, when they are executed, the original iteration of the loop to which they correspond can be recovered.

\subsubsection{Semantics and compilation of indexed labels}
In order to make sense of loop indices, one must keep track of their values in the state.
A \emph{constant indexing} (metavariables $C,\ldots$) is an indexing which employs only constant simple expressions.
The evaluation of an indexing $I$ in a constant indexing $C$, noted $I|_C$, is defined by:
$$
I|_{i_0\mapsto c_0,\ldots, i_{k-1}\mapsto c_{k-1}} \ass I\circ(i_0\mapsto c_0)\circ\cdots\circ(i_{k-1}\mapsto c_{k-1})
$$
Here, we are using the definition of ${-}\circ{-}$ given in~\ref{ssec:indlabs}.
We consider the above defined only if the resulting indexing is a constant one too.\footnote{For example $(i_0\mapsto 2*i_0,i_1\mapsto i_1+1)|_{i_0\mapsto 2}$ is undefined, but $(i_0\mapsto 2*i_0,i_1\mapsto 0)|_{i_0\mapsto 2}= i_0\mapsto 4,i_1\mapsto 0$ is indeed a constant indexing, even if the domain of the original indexing is not covered by the constant one.}
The definition is extended to indexed labels by $\alpha\la I\ra|_C\ass \alpha\la I|_C\ra$.

Constant indexings will be used to keep track of the exact iterations of the original code that the emitted labels belong to.
We thus define two basic actions to update constant indexings: $C[i_k{\uparrow}]$ increments the value of $i_k$ by one, and $C[i_k{\downarrow}0]$ resets it to $0$.
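A constant indexing can then be just a tuple of integers, with the two update actions and the instantiation of an indexing's entries as below (a sketch under our tuple encoding; the `instantiate` function requires the constant indexing to cover all the indices of $I$):

```python
# A constant indexing: the current value c[j] of each loop index i_j.

def up(c, k):
    """C[i_k up-arrow]: increment the value of index i_k by one."""
    return c[:k] + (c[k] + 1,) + c[k + 1:]

def reset(c, k):
    """C[i_k down-arrow 0]: reset the value of index i_k to zero."""
    return c[:k] + (0,) + c[k + 1:]

def instantiate(indexing, c):
    """I|_C: evaluate each entry (a, b) of I, i.e. a*i_j + b, at the
    current value c[j] of its index, yielding a constant indexing."""
    return tuple(a * c[j] + b for j, (a, b) in enumerate(indexing))
```

Resetting on loop entry and incrementing on each repetition is exactly how the semantics below maintains $C$.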
---|
490 | |
---|
491 | We are ready to update the definition of the operational semantics of indexed labelled \imp. |
---|
492 | The emitted cost labels will now be ones indexed by constant indexings. |
---|
493 | We add a special indexed loop construct for continuations that keeps track of active indexed loop indices: |
---|
494 | $$ |
---|
495 | K,\ldots \gramm \cdots | i_k:\sop{while} b \sbin {do} S \sbin{then} K |
---|
496 | $$ |
---|
497 | The difference between the regular stack concatenation $i_k:\sop{while}b\sbin{do}S\cdot K$ and the new constructor is that the latter indicates the loop is the active one in which we already are, while the former is a loop that still needs to be started\footnote{In the presence of \s{continue} and \s{break} statements active loops need to be kept track of in any case.}. |
---|
498 | The \s{find} function is updated accordingly with the case |
---|
499 | $$ |
---|
500 | \s{find}(\ell, i_k:\sop{while}b\sbin{do}S, K) \ass \s{find}(\ell, S, i_k: \sop{while}b\sbin{do}S\sbin{then}K) |
---|
501 | $$ |
---|
502 | The state will now be a 4-tuple $(S,K,s,C)$ which adds a constant indexing to the triple of the regular semantics. |
---|
The small-step rules for all statements other than those regarding cost labels and indexed loops remain the same, and leave the $C$ parameter untouched (in particular, unindexed loops behave as usual).
---|
504 | The remaining cases are: |
---|
505 | $$\begin{aligned} |
---|
506 | (\alphab : S,K,s,C) &\to[\alphab|_C]_P (S,K,s,C)\\ |
---|
(i_k:\sop{while}b\sbin{do}S,K,s,C) &\to[\varepsilon]_P
---|
508 | \begin{cases} |
---|
509 | (S,i_k:\sop{while}b\sbin{do}S\sbin{then} K,s,C[i_k{\downarrow}0]) |
---|
510 | \\\hskip 125pt \text{if $(b,s)\eval v\neq 0$,}\\ |
---|
511 | \rlap{(\s{skip}, K, s, C)}\hskip 125pt \text{otherwise} |
---|
512 | \end{cases}\\ |
---|
(\s{skip}, i_k:\sop{while}b\sbin{do}S\sbin{then}K,s,C) &\to[\varepsilon]_P
---|
514 | \begin{cases} |
---|
515 | (S,i_k:\sop{while}b\sbin{do}S\sbin{then} K,s,C[i_k{\uparrow}]) |
---|
516 | \\\hskip 125pt \text{if $(b,s)\eval v\neq 0$,}\\ |
---|
517 | \rlap{(\s{skip}, K, s, C)} \hskip 125pt \text{otherwise} |
---|
518 | \end{cases} |
---|
519 | \end{aligned}$$ |
---|
520 | Some explanations are in order: |
---|
521 | \begin{itemize} |
---|
522 | \item |
---|
523 | Emitting a label always instantiates it with the current indexing. |
---|
524 | \item |
---|
525 | Hitting an indexed loop the first time initializes the corresponding index to 0; continuing the same loop increments the index as expected. |
---|
526 | \item |
---|
527 | The \s{find} function ignores the current indexing: this is correct under the assumption that all indexed loops are single entry, so that when we land inside an indexed loop with a \s{goto}, we are sure that its current index is right. |
---|
528 | \item |
---|
The starting state with store $s$ for a program $P$ is $(P,\s{halt},s,(i_0\mapsto 0,\dots,i_{n-1}\mapsto 0))$ where $i_0,\ldots,i_{n-1}$ cover all loop indices of $P$\footnote{For a program which is the indexed labelling of an \imp{} one this corresponds to the maximum nesting of single-entry loops.
---|
530 | We can also avoid computing this value in advance if we define $C[i{\downarrow}0]$ to extend $C$'s domain as needed, so that the starting constant indexing can be the empty one.}. |
---|
531 | \end{itemize} |
---|
532 | |
---|
533 | \paragraph{Compilation} |
---|
534 | Further down the compilation chain the loop structure is usually partially or completely lost. |
---|
535 | We cannot rely on it anymore to keep track of the original source code iterations. |
---|
536 | We therefore add, alongside the \s{emit} instruction, two other sequential instructions $\sop{ind_reset}k$ and $\sop{ind_inc}k$ whose sole effect is to reset to 0 (resp.\ increment by 1) the loop index $i_k$, as kept track of in a constant indexing accompanying the state. |
---|
537 | |
---|
538 | The first step of compilation from $\iota\ell\imp$ consists of prefixing the translation of an indexed loop $i_k:\s{while}\ b\ \s{do}\ S$ with $\sop{ind_reset}k$ and postfixing the translation of its body $S$ with $\sop{ind_inc}k$. |
---|
539 | Later in the compilation chain we must propagate the instructions dealing with cost labels. |
---|
540 | |
---|
541 | We would like to stress the fact that this machinery is only needed to give a suitable semantics of observables on which preservation proofs can be done. |
---|
542 | By no means are the added instructions and the constant indexing in the state meant to change the actual (let us say denotational) semantics of the programs. |
---|
In this regard the two new instructions play a role similar to that of the \s{emit} instruction.
A forgetful mapping of everything (syntax, states, operational semantics rules) can be defined by erasing all occurrences of cost labels and loop indices, and the result will always be a regular version of the language considered.
---|
545 | |
---|
546 | \paragraph{Stating the preservation of semantics} |
---|
In fact, the statement of preservation of semantics does not change at all, except that we now consider traces of evaluated indexed cost labels rather than traces of plain ones.
---|
548 | |
---|
549 | \subsubsection{Dependent costs in the source code} |
---|
550 | \label{ssec:depcosts} |
---|
551 | The task of producing dependent costs from constant costs induced by indexed labels is quite technical. |
---|
552 | Before presenting it here, we would like to point out that the annotations produced by the procedure described in this Subsection, even if correct, can be enormous and unreadable. |
---|
553 | In Section~\ref{sec:conc}, where we detail the actual implementation, we will also sketch how we mitigated this problem. |
---|
554 | |
---|
Given the result of compiling the indexed labelling $\Ell^\iota(P)$ of an \imp{} program $P$, we may suppose as before that a cost mapping can be computed, but now from indexed labels to naturals.
We want to annotate the source code, so we need a way to express and compute the costs of cost labels, \ie to group the costs of indexed labels into costs of their atoms.
---|
557 | In order to do so we introduce \emph{dependent costs}. |
---|
558 | Let us suppose that for the sole purpose of annotation, we have available in the language ternary expressions of the form |
---|
559 | $$\tern e {f_1}{f_2},$$ |
---|
560 | and that we have access to common operators on integers such as equality, order and modulus. |
---|
561 | |
---|
562 | \paragraph{Simple conditions} |
---|
563 | |
---|
564 | First, we need to shift from \emph{transformations} of loop indices to \emph{conditions} on them. |
---|
565 | We identify a set of conditions on natural numbers which are able to express the image of any composition of simple expressions. |
---|
566 | \emph{Simple conditions} are of three possible forms: |
---|
567 | \begin{itemize} |
---|
568 | \item |
---|
569 | Equality $i_k=n$ for some natural $n$. |
---|
570 | \item |
---|
571 | Inequality $i_k\ge n$ for some natural $n$. |
---|
572 | \item |
---|
573 | Modular equality together with inequality $i_k\bmod a = b\wedge i_k\ge n$ for naturals $a, b, n$. |
---|
574 | \end{itemize} |
---|
575 | The `always true' simple condition is given by $i_k\ge 0$. |
---|
576 | We write $i_k\bmod a = b$ as a simple condition for $i_k\bmod a = b\wedge i_k\ge 0$. |
---|
577 | |
---|
578 | Given a simple condition $p$ and a constant indexing $C$ we can easily define when $p$ holds for $C$ (written $p\circ C$). |
---|
579 | A \emph{dependent cost expression} is an expression built solely out of integer constants and ternary expressions with simple conditions at their head. |
---|
580 | Given a dependent cost expression $e$ where all of the loop indices appearing in it are in the domain of a constant indexing $C$, we can define the value $e\circ C\in \mathbb N$ by: |
---|
581 | $$n\circ C\ass n,\qquad (\tern p e f)\circ C\ass |
---|
582 | \begin{cases} |
---|
583 | e\circ C& \text{if $p\circ C$,}\\ |
---|
584 | f\circ C& \text{otherwise.} |
---|
585 | \end{cases}$$ |
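Both $p\circ C$ and $e\circ C$ translate directly into recursive functions. The Python sketch below is our own illustration (the encoding of simple conditions as tagged tuples is an assumption of the sketch, not CerCo's representation):

```python
# Simple conditions encoded as ('eq', k, n) for i_k = n, ('ge', k, n) for
# i_k >= n, and ('mod', k, a, b, n) for i_k mod a = b /\ i_k >= n.
# A dependent cost expression is an int or a ternary (p, e, f).

def holds(p, C):
    """p o C: does the simple condition p hold for constant indexing C?"""
    tag = p[0]
    if tag == 'eq':
        return C[p[1]] == p[2]
    if tag == 'ge':
        return C[p[1]] >= p[2]
    _, k, a, b, n = p
    return C[k] % a == b and C[k] >= n

def value(e, C):
    """e o C: the natural a dependent cost expression evaluates to under C."""
    if isinstance(e, int):
        return e
    p, e1, e2 = e
    return value(e1, C) if holds(p, C) else value(e2, C)

# (i_0 = 0) ? 5 : (i_1 mod 2 = 0) ? 7 : 9
cost = (('eq', 0, 0), 5, (('mod', 1, 2, 0, 0), 7, 9))
```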
---|
586 | |
---|
587 | \paragraph{From indexed costs to dependent ones} |
---|
588 | Every simple expression $e$ corresponds to a simple condition $p(e)$ which expresses the set of values that $e$ can take. |
---|
589 | Following is the definition of such a relation. |
---|
590 | We recall that in this development, loop indices are always mapped to simple expressions over the same index. |
---|
If this were not the case, the condition obtained from an expression would be on the mapped index, not on the indeterminate of the simple expression.
---|
592 | We leave all generalisations of what we present here for further work: |
---|
593 | $$ |
---|
594 | p(a*i_k+b)\ass |
---|
595 | \begin{cases} |
---|
596 | i_k = b & \text{if $a = 0$,}\\ |
---|
597 | i_k \ge b & \text{if $a = 1$,}\\ |
---|
598 | i_k\bmod a = b' \wedge i_k \ge b & \text{otherwise, where $b' = b\bmod a$}. |
---|
599 | \end{cases} |
---|
600 | $$ |
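This map is easy to implement; the Python sketch below (our own encoding, with simple conditions as tagged tuples as before) also checks on sample values that every value in the image of $a*i_k+b$ does satisfy $p(a*i_k+b)$:

```python
def p(k, a, b):
    """The simple condition p(a*i_k + b) describing the image of the
    simple expression a*i_k + b."""
    if a == 0:
        return ('eq', k, b)
    if a == 1:
        return ('ge', k, b)
    return ('mod', k, a, b % a, b)

def satisfies(cond, v):
    """Does the value v satisfy a simple condition on its index?"""
    tag = cond[0]
    if tag == 'eq':
        return v == cond[2]
    if tag == 'ge':
        return v >= cond[2]
    _, _, a, b, n = cond
    return v % a == b and v >= n

# Soundness check: every value a*n + b taken by the expression
# satisfies the condition computed for it.
ok = all(satisfies(p(0, a, b), a * n + b)
         for a, b in [(0, 5), (1, 3), (2, 7), (3, 1)]
         for n in range(50))
```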
---|
601 | Now, suppose we are given a mapping $\kappa$ from indexed labels to natural numbers. |
---|
We will transform it into a mapping (identified, by abuse of notation, with the same symbol $\kappa$) from atoms to \imp{} expressions built with ternary expressions which depend solely on loop indices.
To that end we define an auxiliary function $\kappa^\alpha_L$, parameterised by atoms and words of simple expressions, and defined on \emph{sets} of $n$-tuples of simple expressions (with $n$ constant across each such set, \ie each set is made of words all of the same length).
---|
604 | |
---|
We will employ a bijection between words of simple expressions and indexings, given by:\footnote{Lists of simple expressions are in fact how indexings are represented in CerCo's current implementation of the compiler.}
---|
606 | $$ |
---|
607 | i_0\mapsto e_0,\ldots,i_{k-1}\mapsto e_{k-1} \cong e_0\cdots e_{k-1}. |
---|
608 | $$ |
---|
609 | As usual, $\varepsilon$ denotes the empty word/indexing, and juxtaposition is used to denote word concatenation. |
---|
610 | |
---|
For every set $S$ of $n$-tuples of simple expressions, we are in one of the following three exclusive cases:
---|
612 | \begin{itemize} |
---|
613 | \item |
---|
614 | $S=\emptyset$. |
---|
615 | \item |
---|
616 | $S=\{\varepsilon\}$. |
---|
617 | \item |
---|
There is a simple expression $e$ such that $S$ can be decomposed as $eS'+S''$, with $S'\neq \emptyset$ and none of the words in $S''$ starting with $e$.
---|
619 | \end{itemize} |
---|
620 | Here $eS'$ denotes prepending $e$ to all elements of $S'$ and $+$ is disjoint union. |
---|
621 | This classification can serve as the basis of a definition by recursion on $n+\card S$ where $n$ is the size of tuples in $S$ and $\card S$ is its cardinality. |
---|
Indeed, in the third case the size of the tuples in $S'$ decreases strictly (and the cardinality does not increase), while for $S''$ the size of the tuples remains the same but the cardinality strictly decreases.
---|
623 | The expression $e$ of the third case will be chosen as minimal for some total order\footnote{The specific order used does not change the correctness of the procedure, but different orders can give more or less readable results. A ``good'' order is the lexicographic one, with $a*i_k+b \le a'*i_k+b'$ if $a<a'$ or $a=a'$ and $b\le b'$.}. |
---|
624 | |
---|
Following is the definition of the auxiliary function $\kappa^\alpha_L$, which proceeds by the recursion scheme presented above:
---|
626 | $$ |
---|
627 | \begin{aligned} |
---|
628 | \kappa^\alpha_L(\emptyset) &\ass 0\\ |
---|
629 | \kappa^\alpha_L(\{\varepsilon\}) &\ass \kappa(\alpha\la L\ra) \\ |
---|
630 | \kappa^\alpha_L(eS'+S'') &\ass \tern{p(e)}{\kappa^\alpha_{Le}(S')}{\kappa^\alpha_L(S'')} |
---|
631 | \end{aligned} |
---|
632 | $$ |
---|
633 | \noindent |
---|
Finally, the desired dependent cost mapping is defined by
---|
635 | $$ |
---|
636 | \kappa(\alpha)\ass\kappa^\alpha_\varepsilon(\{\,L\mid \alpha\la L\ra \text{ appears in the compiled code}\,\}) |
---|
637 | $$ |
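The recursion scheme translates almost literally into code. The following Python sketch is our own rendering (words as tuples of $(a,b)$ pairs encoding $a*i_k+b$, $\kappa$ as a dictionary from complete words to naturals, simple conditions as tagged tuples); the real implementation lives in CerCo's OCaml compiler:

```python
def p(k, a, b):
    # simple condition for a*i_k + b, as defined in the text
    if a == 0:
        return ('eq', k, b)
    if a == 1:
        return ('ge', k, b)
    return ('mod', k, a, b % a, b)

def build(kappa, S, L=()):
    """kappa^alpha_L(S): S is a set of equal-length words of simple
    expressions (tuples of (a, b) pairs), kappa maps complete words to
    naturals. Returns a dependent cost (int or nested ternary)."""
    if not S:
        return 0
    if S == {()}:
        return kappa[L]
    e = min(w[0] for w in S)            # minimal head, lexicographic order
    S1 = {w[1:] for w in S if w[0] == e}
    S2 = {w for w in S if w[0] != e}
    k = len(L)                          # the head expression concerns i_k
    return (p(k, *e), build(kappa, S1, L + (e,)), build(kappa, S2, L))

# Two length-1 words: i_0 |-> 0 with cost 3, i_0 |-> i_0 + 1 with cost 5.
dep = build({((0, 0),): 3, ((1, 1),): 5}, {((0, 0),), ((1, 1),)})
```

The result is the ternary $(i_0 = 0)\mathbin{?}3 : (i_0\ge 1)\mathbin{?}5 : 0$, in the same shape as the big expression of the detailed example.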
---|
638 | |
---|
639 | \paragraph{Indexed instrumentation} |
---|
640 | The \emph{indexed instrumentation} generalises the instrumentation presented in~\ref{sec:labelling}. |
---|
641 | We described above how cost atoms can be mapped to dependent costs. |
---|
642 | The instrumentation must also insert code dealing with the loop indices. |
---|
643 | As instrumentation is done on the code produced by the labelling phase, all cost labels are indexed by identity indexings. |
---|
644 | The relevant cases of the recursive definition (supposing $c$ is the cost variable) are then: |
---|
645 | $$ |
---|
646 | \begin{aligned} |
---|
647 | \mathcal I^\iota(\alpha\la Id_k\ra:S) &= c\ass c + \kappa(\alpha);\mathcal I^\iota(S)\\ |
---|
648 | \mathcal I^\iota(i_k : \sop{while}b\sbin{do}S) &= |
---|
649 | i_k \ass 0; \sop{while}b\sbin{do}(\mathcal I^\iota (S); i_k \ass i_k + 1) |
---|
650 | \end{aligned} |
---|
651 | $$ |
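Concretely, instrumenting an indexed loop materialises the loop index as a program variable. The following Python rendering of the two cases is illustrative only (the names `c`, `i0` and `kappa_gamma`, and the dependent cost used, are hypothetical):

```python
def kappa_gamma(i0):
    # hypothetical dependent cost for the body's label:
    # (i_0 == 0) ? 5 : 7
    return 5 if i0 == 0 else 7

def instrumented(n):
    """i_0 : while i < n do (gamma<i_0> : i := i + 1), instrumented."""
    c = 0       # the cost variable
    i = 0
    i0 = 0      # i_k := 0 inserted before the indexed loop
    while i < n:
        c += kappa_gamma(i0)  # c := c + kappa(gamma), at the cost label
        i += 1
        i0 += 1  # i_k := i_k + 1 appended to the loop body
    return c
```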
---|
652 | |
---|
653 | \subsubsection{A detailed example}\label{ssec:detailedex} |
---|
654 | Take the program in \autoref{fig:example1}. Its initial labelling will be: |
---|
655 | $$\begin{array}{l} |
---|
656 | \alpha\la\ra : s\ass 0;\\ |
---|
657 | i\ass 0;\\ |
---|
658 | i_0:\sop{while}i<n\sbin{do}\\ |
---|
659 | \inde \beta\la i_0\ra : p\ass 1;\\ |
---|
660 | \inde j\ass 1;\\ |
---|
661 | \inde i_1:\sop{while}j \le i\sbin{do}\\ |
---|
662 | \inde \inde \gamma\la i_0, i_1\ra : p\ass j*p\\ |
---|
663 | \inde \inde j\ass j+1;\\ |
---|
664 | \inde \delta\la i_0\ra : s\ass s+p;\\ |
---|
665 | \inde i\ass i+1;\\ |
---|
666 | \epsilon\la\ra:\s{skip} |
---|
667 | \end{array} |
---|
668 | $$ |
---|
669 | (a single \s{skip} after the $\delta$ label has been suppressed, and we are using the identification |
---|
670 | between indexings and tuples of simple expressions explained in \autoref{ssec:depcosts}). |
---|
Supposing, for example, $n=3$,
---|
672 | the trace of the program will be |
---|
673 | $$\alpha\la\ra\,\beta\la 0 \ra\, \delta\la 0\ra\,\beta\la 1\ra\,\gamma\la 1,0\ra\, |
---|
674 | \delta\la 1\ra\,\beta\la 2\ra\,\gamma\la 2,0\ra\,\gamma\la 2, 1\ra\,\delta\la 2\ra\, |
---|
675 | \epsilon\la\ra$$ |
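This trace can be checked mechanically. The Python sketch below is our own transliteration of the program; it maintains the constant indexing $C$ exactly as the semantics prescribes, resetting an index on loop entry and incrementing it on every subsequent iteration, and emits labels evaluated in $C$:

```python
def run(n):
    """Trace of the labelled program: pairs of an atom and the tuple of
    constant indices the label is evaluated at."""
    trace, C = [], {}
    def emit(atom, *ks):                # emit the label evaluated in C
        trace.append((atom, tuple(C[k] for k in ks)))
    emit('alpha')
    s, i = 0, 0
    C[0] = 0                            # C[i_0 (down) 0] on entering loop i_0
    while i < n:
        emit('beta', 0)
        p, j = 1, 1
        C[1] = 0                        # C[i_1 (down) 0]
        while j <= i:
            emit('gamma', 0, 1)
            p, j = j * p, j + 1
            C[1] += 1                   # C[i_1 (up)] when the loop continues
        emit('delta', 0)
        s, i = s + p, i + 1
        C[0] += 1                       # C[i_0 (up)]
    emit('epsilon')
    return trace

expected = [('alpha', ()), ('beta', (0,)), ('delta', (0,)),
            ('beta', (1,)), ('gamma', (1, 0)), ('delta', (1,)),
            ('beta', (2,)), ('gamma', (2, 0)), ('gamma', (2, 1)),
            ('delta', (2,)), ('epsilon', ())]
```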
---|
Now let us apply the transformations to the program of \autoref{fig:example1}, with the additional
information detailed in \autoref{ssec:looptrans}. The result is shown in
\autoref{fig:example2}.
---|
679 | \begin{figureplr} |
---|
680 | $$ |
---|
681 | \begin{array}{l} |
---|
682 | \mbox{\color{blue}\boldmath$\alpha\la\ra $}:s\ass 0;\\ |
---|
683 | i\ass 0;\\ |
---|
684 | \tikztargetm{a}{\s{if}}\ i<n\sbin{then}\\ |
---|
685 | \inde \mbox{\color{blue}\boldmath$\beta\la0\ra $}:p\ass 1;\\ |
---|
686 | \inde j\ass 1;\\ |
---|
687 | \inde i_1:\sop{while}j \le i\sbin{do}\\ |
---|
688 | \inde \inde \mbox{\color{blue}\boldmath$\gamma\la 0, i_1\ra $}:p\ass j*p\\ |
---|
689 | \inde \inde j\ass j+1;\\ |
---|
690 | \inde \mbox{\color{blue}\boldmath$\delta\la 0\ra $}: s\ass s+p;\\ |
---|
691 | \inde i\ass i+1;\\ |
---|
692 | \inde i_0:\tikztargetm{d}{\s{while}}\ i<n\sbin{do}\\ |
---|
693 | \inde \inde \mbox{\color{blue}\boldmath$\beta\la 2*i_0+1\ra $}:p\ass 1;\\ |
---|
694 | \inde \inde j\ass 1;\\ |
---|
695 | \inde \inde \tikztargetm{b}{\s{if}}\ j \le i\sbin{then}\\ |
---|
696 | \inde \inde \inde \mbox{\color{blue}\boldmath$\gamma\la 2*i_0+1, 0\ra $}:p\ass j*p\\ |
---|
697 | \inde \inde \inde j\ass j+1;\\ |
---|
698 | \inde \inde \inde \sop{if}j \le i\sbin{then}\\ |
---|
699 | \inde \inde \inde \inde \mbox{\color{blue}\boldmath$\gamma\la 2*i_0+1, 1\ra $}: p\ass j*p\\ |
---|
700 | \inde \inde \inde \inde j\ass j+1;\\ |
---|
701 | \inde \inde \inde \inde i_1:\s{while}\ j \le i\sbin{do}\\ |
---|
702 | \inde \inde \inde \inde \inde \mbox{\color{blue}\boldmath$\gamma\la 2*i_0+1, 2*i_1 + 2 \ra $}:p\ass j*p\\ |
---|
703 | \inde \inde \inde \inde \inde j\ass j+1;\\ |
---|
704 | \inde \inde \inde \inde \inde \sop{if}j \le i\sbin{then}\\ |
---|
705 | \inde \inde \inde \inde \inde \inde \mbox{\color{blue}\boldmath$\gamma\la 2*i_0+1, 2*i_1 + 3\ra $}:p\ass j*p\\ |
---|
706 | \inde \inde \inde \inde \inde \inde \tikztargetm{c}j\ass j+1;\\ |
---|
707 | \inde \inde \mbox{\color{blue}\boldmath$\delta\la 2*i_0+1\ra$}: s\ass s+p;\\ |
---|
708 | \inde \inde i\ass i+1;\\ |
---|
709 | \inde \inde \sop{if}i<n\sbin{then}\\ |
---|
710 | \inde \inde \inde \mbox{\color{blue}\boldmath$\beta\la 2*i_0+2\ra $}:p\ass 1;\\ |
---|
711 | \inde \inde \inde j\ass 1;\\ |
---|
712 | \inde \inde \inde i_1:\tikztargetm{e}{\s{while}}\ j < i\sbin{do}\\ |
---|
713 | \inde \inde \inde \inde \mbox{\color{blue}\boldmath$\gamma\la 2*i_0+2, 2*i_1\ra$}: p\ass j*p\\ |
---|
714 | \inde \inde \inde \inde j\ass j+1;\\ |
---|
715 | \inde \inde \inde \inde \s{if}\ j < i\sbin{then}\\ |
---|
716 | \inde \inde \inde \inde \inde \mbox{\color{blue}\boldmath$\gamma\la2*i_0+2, 2*i_1+1\ra$}: p\ass j*p\\ |
---|
717 | \inde \inde \inde \inde \inde \tikztargetm{f}j\ass j+1;\\ |
---|
718 | \inde \inde \inde \mbox{\color{blue}\boldmath$\delta\la 2*i_0+2\ra $}: s\ass s+p;\\ |
---|
719 | \inde \inde \inde i\ass i+1\tikztargetm{g};{}\\ |
---|
720 | \mbox{\color{blue}\boldmath$\epsilon\la\ra $}:\s{skip} |
---|
721 | \end{array}$$ |
---|
722 | \caption{The result of applying reindexing loop transformations on the |
---|
723 | program in \autoref{fig:example1}.}\label{fig:example2} |
---|
724 | \end{figureplr} |
---|
One can check that the transformed code emits the same trace when executed.
---|
726 | |
---|
727 | Now let us compute the dependent cost of $\gamma$, supposing no other loop transformations |
---|
728 | are done. Ordering its indexings we |
---|
729 | have the following list: |
---|
730 | \begin{equation} |
---|
731 | \label{eq:inds} |
---|
732 | \begin{aligned} |
---|
733 | &0, i_1\\ |
---|
734 | &2*i_0+1, 0\\ |
---|
735 | &2*i_0+1, 1\\ |
---|
736 | &2*i_0+1, 2*i_1+2\\ |
---|
737 | &2*i_0+1, 2*i_1+3\\ |
---|
738 | &2*i_0+2, 2*i_1\\ |
---|
739 | &2*i_0+2, 2*i_1+1 |
---|
740 | \end{aligned} |
---|
741 | \end{equation} |
---|
742 | |
---|
743 | The resulting dependent cost will then be |
---|
744 | \def\indetern#1#2#3{\begin{tabular}[t]{nL}(#1)\mathrel ?{}\\\hskip 15pt #2:{}\\\hskip 15pt #3\end{tabular}} |
---|
745 | \def\tern#1#2#3{(#1)\mathrel ? #2:#3} |
---|
746 | \begin{equation}\label{eq:ex} |
---|
747 | \kappa^\iota(\gamma)= |
---|
748 | \indetern{i_0 = 0} |
---|
749 | {\tern{i_1\ge 0}a0} |
---|
750 | {\indetern{i_0\bmod 2 = 1 \wedge i_0\ge 1} |
---|
751 | {\indetern{i_1=0} |
---|
752 | b |
---|
753 | {\indetern{i_1 = 1} |
---|
754 | c |
---|
755 | {\indetern{i_1\bmod 2 = 0 \wedge i_1\ge 2} |
---|
756 | d |
---|
757 | {\tern{i_1\bmod 2 = 1 \wedge i_1\ge 3}e0} |
---|
758 | } |
---|
759 | } |
---|
760 | } |
---|
761 | {\indetern{i_0\bmod 2 = 0 \wedge i_0\ge 2} |
---|
762 | {\indetern{i_1 \bmod 2 = 0 \wedge i_1 \ge 0} |
---|
763 | f |
---|
764 | {\tern{i_1 \bmod 2 = 1 \wedge i_1 \ge 1}g0} |
---|
765 | } |
---|
766 | 0 |
---|
767 | } |
---|
768 | } |
---|
769 | \end{equation} |
---|
770 | We will see later on \autopageref{pag:continued} how such an expression can be simplified. |
---|
771 | |
---|
772 | \subsection{Notes on the implementation and further work} |
---|
773 | \label{sec:conc} |
---|
Implementing the indexed label approach in CerCo's untrusted OCaml prototype does not introduce many new challenges beyond what has already been presented for the toy language, \imp{} with \s{goto}s.
\s{Clight}, the \s{C} fragment source language of CerCo's compilation chain~\cite{D2.1}, has several more features, but few of them demand changes in the indexed labelled approach.
---|
776 | |
---|
777 | \paragraph{Indexed loops \emph{vs}. index update instructions} |
---|
778 | In our presentation we have indexed loops in $\iota\ell\imp$, while we hinted that later languages in the compilation chain would have specific index update instructions. |
---|
779 | In CerCo's actual compilation chain from \s{Clight} to 8051 assembly, indexed loops are only in \s{Clight}, while from \s{Cminor} onward all languages have the same three cost-involving instructions: label emitting, index resetting and index incrementing. |
---|
780 | |
---|
781 | \paragraph{Loop transformations in the front end} |
---|
782 | We decided to implement the two loop transformations in the front end, namely in \s{Clight}. |
---|
783 | This decision is due to user readability concerns: if costs are to be presented to the programmer, they should depend on structures written by the programmer himself. |
---|
If loop transformations were performed later, it would be harder to create a correspondence between loops in the control flow graph and actual loops written in the source code.
---|
785 | However, another solution would be to index loops in the source code and then use these indices later in the compilation chain to pinpoint explicit loops of the source code: loop indices can be used to preserve such information, just like cost labels. |
---|
786 | |
---|
787 | \paragraph{Break and continue statements} |
---|
788 | \s{Clight}'s loop flow control statements for breaking and continuing a loop are equivalent to appropriate \s{goto} statements. |
---|
The only differences are that we are assured that they cannot cause loops to be multi-entry, and that once a transformation such as loop peeling is complete, they need to be replaced by actual \s{goto}s (which happens further down the compilation chain anyway).
---|
790 | |
---|
791 | \paragraph{Function calls} |
---|
792 | Every internal function definition has its own space of loop indices. |
---|
793 | Executable semantics must thus take into account saving and resetting the constant indexing of current loops upon hitting a function call, and restoring it upon return of control. |
---|
794 | A peculiarity is that this cannot be attached to actions that save and restore frames: namely in the case of tail calls the constant indexing needs to be saved whereas the frame does not. |
---|
795 | |
---|
796 | \paragraph{Cost-labelled expressions} |
---|
797 | In labelled \s{Clight}, expressions also get cost labels, due to the presence of ternary conditional expressions (and lazy logical operators, which get translated to ternary expressions too). |
---|
798 | Adapting the indexed labelled approach to cost-labelled expressions does not pose any particular problems. |
---|
799 | |
---|
800 | \paragraph{Simplification of dependent costs} |
---|
801 | As previously mentioned, the na\"{i}ve application of the procedure described in~\ref{ssec:depcosts} produces unwieldy cost annotations. |
---|
802 | In our implementation several transformations are used to simplify such complex dependent costs. |
---|
803 | |
---|
804 | Disjunctions of simple conditions are closed under all logical operations, and it can be computed whether such a disjunction implies a simple condition or its negation. |
---|
This can be used to eliminate useless branches of dependent costs, to merge branches that share the same value, and possibly to simplify the third form of simple conditions.
---|
806 | Examples of the three transformations are respectively: |
---|
807 | \begin{itemize} |
---|
808 | \item $ |
---|
809 | \verb+(_i_0 == 0)?+x\verb+:(_i_0 >= 1)?+y\verb+:+z |
---|
810 | \mapsto |
---|
811 | \verb+(_i_0 == 0)?+x\verb+:+y, |
---|
812 | $ |
---|
813 | \item $ |
---|
814 | c\texttt{?}x\verb+:(+d\texttt{?}x\texttt{:}y\verb+)+ |
---|
815 | \mapsto |
---|
816 | \texttt{(}c\texttt{ || }d\texttt{)?}x\texttt{:}y, |
---|
817 | $ |
---|
818 | \item \begin{tabular}[t]{np{\linewidth}n} |
---|
819 | $\verb+(_i_0 == 0)?+x\verb+:(_i_0 % 2 == 0 && _i_0 >= 2)?+y\verb+:+z |
---|
820 | \mapsto$ \\\hfill |
---|
821 | $\verb+(_i_0 == 0)?+x\verb+:(_i_0 % 2 == 0)?+y\verb+:+z. |
---|
822 | $\end{tabular} |
---|
823 | \end{itemize} |
---|
824 | The second transformation tends to accumulate disjunctions, to the detriment of readability. |
---|
825 | A further transformation swaps two branches of the ternary expression if the negation of the condition can be expressed with fewer clauses. |
---|
826 | For example: |
---|
827 | $$ \verb+(_i_0 % 3 == 0 || _i_0 % 3 == 1)?+x\verb+:+y \mapsto |
---|
828 | \verb+(_i_0 % 3 == 2)?+y\verb+:+x. |
---|
829 | $$ |
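As a taste of how such simplifications can be implemented, here is a Python sketch of the useless-branch elimination of the first example only, in a deliberately reduced setting (a single loop index, conditions encoded as `('eq', n)` and `('ge', n)`, all of which are our own assumptions): once the equalities $i=0,\ldots,i=n-1$ have all failed, a guard $i\ge n$ is always true over the naturals, so its else branch is unreachable.

```python
def simplify(e, failed=()):
    """Dependent costs are ints or ternaries (cond, x, y) over a single
    index i. `failed` lists values v for which i = v is known false."""
    if isinstance(e, int):
        return e
    cond, x, y = e
    if cond[0] == 'ge' and set(range(cond[1])) <= set(failed):
        # not(i=0), ..., not(i=n-1) together imply i >= n over naturals,
        # so the condition is always true here: drop the else branch.
        return simplify(x, failed)
    if cond[0] == 'eq':
        # in the else branch, i = n is known to have failed
        return (cond, simplify(x, failed), simplify(y, failed + (cond[1],)))
    return (cond, simplify(x, failed), simplify(y, failed))

# (i == 0) ? 10 : (i >= 1) ? 20 : 30   |->   (i == 0) ? 10 : 20
before = (('eq', 0), 10, (('ge', 1), 20, 30))
after = simplify(before)
```

The full implementation works with disjunctions of simple conditions over several indices, but the shape of the traversal is the same.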
---|
830 | Picking up again the example depicted in \autoref{ssec:detailedex}, \label{pag:continued} |
---|
831 | we can see that the cost in \eqref{eq:ex} can be simplified to the following, |
---|
832 | using some of the transformation described above: |
---|
833 | $$ |
---|
834 | \kappa^\iota(\gamma)= |
---|
835 | \indetern{i_0 = 0} |
---|
836 | a |
---|
837 | {\indetern{i_0\bmod 2 = 1} |
---|
838 | {\indetern{i_1=0} |
---|
839 | b |
---|
840 | {\indetern{i_1 = 1} |
---|
841 | c |
---|
842 | {\indetern{i_1\bmod 2 = 0} |
---|
843 | de |
---|
844 | } |
---|
845 | } |
---|
846 | } |
---|
847 | {\indetern{i_1 \bmod 2 = 0} |
---|
848 | fg |
---|
849 | } |
---|
850 | } |
---|
851 | $$ |
---|
One should keep in mind that the example was wilfully complicated; in practice,
the cost expressions produced rarely have more clauses
than the number of nested loops containing the annotation.
---|
855 | \paragraph{Updates to the frama-C cost plugin} |
---|
CerCo's Frama-C~\cite{framac} cost plugin has been updated to take into account our new notion of dependent costs.
The Frama-C framework expands ternary expressions to branch statements, introducing temporaries along the way, which makes the task of analysing ternary cost expressions rather daunting.
It was therefore deemed necessary to provide an option in the compiler to use actual branch statements for cost annotations rather than ternary expressions, so that at least Frama-C's use of temporaries in cost annotations could be avoided.
---|
860 | The cost analysis carried out by the plugin now takes into account such dependent costs. |
---|
861 | |
---|
The only limitation (which actually simplifies the code) is that, within a dependent cost, simple conditions involving a modulus on the same loop index must all use the same modulus.
---|
863 | This corresponds to a reasonable limitation on the number of times loop unrolling may be applied to the same loop: at most once. |
---|
864 | |
---|
---|
866 | |
---|
867 | \section{Introduction} |
---|
868 | |
---|
869 | The \cerco{} compiler produces a version of the source code containing |
---|
870 | annotations describing the timing behaviour of the object code, as |
---|
871 | well as the object code itself. It compiles C code, targeting |
---|
872 | microcontrollers implementing the Intel 8051 architecture. There are |
---|
873 | two versions: first, an initial prototype was implemented in |
---|
874 | \ocaml{}~\cite{d2.2}, then a version was formalised in the \matita{} |
---|
875 | proof assistant~\cite{d3.2,d4.2} and extracted to \ocaml{} code to |
---|
876 | produce an executable compiler. In this document we present results |
---|
877 | from Deliverable 3.4, the formalised proofs in \matita{} about the |
---|
878 | front-end of the latter version of the compiler (culminating in the |
---|
879 | \lstinline'front_end_correct' lemma), and describe how that fits |
---|
880 | into the verification of the whole compiler. |
---|
881 | |
---|
882 | A key part of this work was to layer the \emph{intensional} correctness |
---|
883 | results that show that the costs produced are correct on top of the |
---|
884 | proofs about the compiled code's \emph{extensional} behaviour (that is, the |
---|
885 | functional correctness of the compiler). Unfortunately, the ambitious |
---|
886 | goal of completely verifying the entire compiler was not feasible |
---|
887 | within the time available, but thanks to this separation of |
---|
888 | extensional and intensional proofs we are able to axiomatise some extensional |
---|
889 | simulation results which are similar to those in other compiler verification |
---|
890 | projects and concentrate on the novel intensional proofs. We were |
---|
891 | also able to add stack space costs to obtain a stronger result. The |
---|
892 | proofs were made more tractable by introducing compile-time checks for |
---|
893 | the `sound and precise' cost labelling properties rather than proving |
---|
894 | that they are preserved throughout. |
---|
895 | |
---|
896 | The overall statement of correctness says that the annotated program has the |
---|
897 | same behaviour as the input, and that for any suitably well-structured part of |
---|
898 | the execution (which we call \emph{measurable}), the object code will execute |
---|
899 | the same behaviour taking precisely the time given by the cost annotations in |
---|
900 | the annotated source program. |
---|
901 | |
---|
902 | In the next section we recall the structure of the compiler and make the overall |
---|
903 | statement more precise. Following that, in Section~\ref{sec:fegoals} we |
---|
904 | describe the statements we need to prove about the intermediate \textsf{RTLabs} |
---|
905 | programs for the back-end proofs. |
---|
906 | Section~\ref{sec:inputtolabelling} covers the compiler passes which produce the |
---|
907 | annotated source program and Section~\ref{sec:measurablelifting} the rest |
---|
908 | of the transformations in the front-end. Then the compile-time checks |
---|
909 | for good cost labelling are detailed in Section~\ref{sec:costchecks} |
---|
910 | and the proofs that the structured traces required by the back-end |
---|
911 | exist are discussed in Section~\ref{sec:structuredtrace}. |
---|
912 | |
---|
913 | \section{The compiler and its correctness statement} |
---|
914 | |
---|
915 | The uncertified prototype \ocaml{} \cerco{} compiler was originally described |
---|
916 | in Deliverables 2.1 and 2.2. Its design was replicated in the formal |
---|
917 | \matita{} code, which was presented in Deliverables 3.2 and 4.2, for |
---|
918 | the front-end and back-end respectively. |
---|
919 | |
---|
920 | \begin{figure} |
---|
921 | \begin{center} |
---|
922 | \includegraphics[width=0.5\linewidth]{compiler-plain.pdf} |
---|
923 | \end{center} |
---|
924 | \caption{Languages in the \cerco{} compiler} |
---|
925 | \label{fig:compilerlangs} |
---|
926 | \end{figure} |
---|
927 | |
---|
The compiler uses a number of intermediate languages, as outlined in the
---|
929 | middle two lines of Figure~\ref{fig:compilerlangs}. The upper line |
---|
930 | represents the front-end of the compiler, and the lower the back-end, |
---|
931 | finishing with Intel 8051 binary code. Not all of the front-end compiler passes |
---|
932 | introduce a new language, and Figure~\ref{fig:summary} presents a |
---|
933 | list of every pass involved. |
---|
934 | |
---|
935 | \begin{figure} |
---|
936 | \begin{center} |
---|
937 | \begin{minipage}{.8\linewidth} |
---|
938 | \begin{tabbing} |
---|
939 | \quad \= $\downarrow$ \quad \= \kill |
---|
940 | \textsf{C} (unformalised)\\ |
---|
941 | \> $\downarrow$ \> CIL parser (unformalised \ocaml)\\ |
---|
942 | \textsf{Clight}\\ |
---|
943 | %\> $\downarrow$ \> add runtime functions\\ |
---|
944 | \> $\downarrow$ \> \lstinline[language=C]'switch' removal\\ |
---|
945 | \> $\downarrow$ \> labelling\\ |
---|
946 | \> $\downarrow$ \> cast removal\\ |
---|
947 | \> $\downarrow$ \> stack variable allocation and control structure |
---|
948 | simplification\\ |
---|
949 | \textsf{Cminor}\\ |
---|
950 | %\> $\downarrow$ \> generate global variable initialization code\\ |
---|
951 | \> $\downarrow$ \> transform to RTL graph\\ |
---|
952 | \textsf{RTLabs}\\ |
---|
953 | \> $\downarrow$ \> check cost labelled properties of RTL graph\\ |
---|
954 | \> $\downarrow$ \> start of target specific back-end\\ |
---|
955 | \>\quad \vdots |
---|
956 | \end{tabbing} |
---|
957 | \end{minipage} |
---|
958 | \end{center} |
---|
959 | \caption{Front-end languages and compiler passes} |
---|
960 | \label{fig:summary} |
---|
961 | \end{figure} |
---|
962 | |
---|
\label{page:switchintro}
The annotated source code is produced by the cost labelling phase.
Note that there is a pass to replace C \lstinline[language=C]'switch'
statements before labelling --- we need to remove them because the
simple form of labelling used in the formalised compiler is not quite
capable of capturing their execution time costs, largely due to C's
`fall-through' behaviour, where execution of one branch continues into
the next unless there is an explicit \lstinline[language=C]'break'.

The cast removal phase which follows cost labelling simplifies
expressions to prevent unnecessary arithmetic promotion, which is
specified by the C standard but costly for an 8-bit target. The
transformations to \textsf{Cminor} and subsequently \textsf{RTLabs}
bear considerable resemblance to some passes of the CompCert
compiler~\cite{Blazy-Leroy-Clight-09,Leroy-backend}, although we use a simpler \textsf{Cminor} where
all loops use \lstinline[language=C]'goto' statements, and the
\textsf{RTLabs} language retains a target-independent flavour. The
back-end takes \textsf{RTLabs} code as input.

The whole compilation function returns the following information on success:
\begin{lstlisting}[language=Grafite]
record compiler_output : Type[0] :=
 { c_labelled_object_code: labelled_object_code
 ; c_stack_cost: stack_cost_model
 ; c_max_stack: nat
 ; c_init_costlabel: costlabel
 ; c_labelled_clight: clight_program
 ; c_clight_cost_map: clight_cost_map
 }.
\end{lstlisting}
It consists of annotated 8051 object code, a mapping from function
identifiers to each function's stack space usage, the space available for the
stack after global variable allocation, a cost label covering the
execution time for the initialisation of global variables and the call
to the \lstinline[language=C]'main' function, the annotated source
code, and finally a mapping from cost labels to actual execution time
costs.

An \ocaml{} pretty printer is used to provide a concrete version of
the output code and annotated source code. In the case of the
annotated source code, it also inserts the actual costs alongside the
cost labels, and optionally adds a global cost variable and
instrumentation to support further reasoning in external tools such as
Frama-C.
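To illustrate, instrumented output has roughly the following shape. This is a hypothetical sketch: the variable name \lstinline[language=C]'__cost', the label placement and the cost constants are invented here, not taken from the compiler's actual output.

```c
#include <assert.h>

/* Hypothetical instrumentation: a global counter incremented at each
   cost label by that label's statically determined cost.  The name
   __cost and the constants 23 and 11 are invented for illustration. */
unsigned long __cost = 0;

#define COST_INCR(k) (__cost += (k))

int sum_upto(int n) {
  int i, s = 0;
  COST_INCR(23);            /* cost label at function entry */
  for (i = 0; i < n; i++) {
    COST_INCR(11);          /* cost label at the head of the loop body */
    s += i;
  }
  return s;
}
```

An external tool can then reason about the final value of the cost variable rather than about cycle-level behaviour of the object code.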

\subsection{Revisions to the prototype compiler}

Our focus on intensional properties prompted us to consider whether we
could incorporate stack space into the costs presented to the user.
We only allocate one fixed-size frame per function, so modelling this
was relatively simple. It is the only form of dynamic memory
allocation provided by the compiler, so we were able to strengthen the
statement of the goal to guarantee successful execution whenever the
stack space obeys the \lstinline'c_max_stack' bound calculated by
subtracting the global variable requirements from the total memory
available.
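The guarantee can be pictured with a small sketch (the event representation and frame sizes are invented): replaying the call and return events of an execution against the fixed per-function frame sizes yields a high-water mark, and execution is guaranteed to succeed whenever that mark stays within the \lstinline'c_max_stack' bound.

```c
#include <assert.h>
#include <stddef.h>

/* Sketch: replay the call/return events of an execution against fixed
   per-function frame sizes and record the maximum stack usage.  The
   event representation is invented for illustration. */
enum stack_ev { EV_CALL, EV_RET };
struct stack_step { enum stack_ev ev; size_t frame; };

size_t max_stack(const struct stack_step *t, size_t n) {
  size_t cur = 0, high = 0;
  for (size_t i = 0; i < n; i++) {
    if (t[i].ev == EV_CALL) {
      cur += t[i].frame;            /* entering a function claims its frame */
      if (cur > high) high = cur;
    } else {
      cur -= t[i].frame;            /* returning releases it */
    }
  }
  return high;                      /* execution fits if this <= c_max_stack */
}
```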

The cost labelling checks at the end of Figure~\ref{fig:summary} have been
introduced to reduce the proof burden, and are described in
Section~\ref{sec:costchecks}.

The use of dependent types to capture simple intermediate language
invariants makes every front-end pass a total function, except
\textsf{Clight} to \textsf{Cminor} and the cost checks. Hence various
well-formedness and type safety checks are performed only once between
\textsf{Clight} and \textsf{Cminor}, and the invariants rule out any
difficulties in the later stages. With the benefit of hindsight we
would have included an initial checking phase to produce a
`well-formed' variant of \textsf{Clight}, conjecturing that this would
simplify various parts of the proofs for the \textsf{Clight} stages
which deal with potentially ill-formed code.

Following D2.2, we previously generated code for global variable
initialisation in \textsf{Cminor}, for which we reserved a cost label
to represent the execution time for initialisation. However, the
back-end must also add an initial call to the main function, whose
cost must be accounted for too, so we decided to move the
initialisation code to the back-end and merge the costs.

\subsection{Main correctness statement}

[[relocated to specification section]]

\section{Correctness statement for the front-end}
\label{sec:fegoals}

The essential parts of the intensional proof were outlined during work
on a toy compiler in Task
2.1~\cite{springerlink:10.1007/978-3-642-32469-7_3}. These are
\begin{enumerate}
\item functional correctness, in particular preserving the trace of
  cost labels,
\item the \emph{soundness} and \emph{precision} of the cost labelling
  on the object code, and
\item the correctness of the timing analysis on the object code, which
  produces a mapping from cost labels to time.
\end{enumerate}

However, that toy development did not include function calls. For the
full \cerco{} compiler we also need to maintain the invariant that
functions return to the correct program location in the caller, as we
mentioned in the previous section. During work on the back-end timing
analysis (described in more detail in the companion deliverable, D4.4)
the notion of a \emph{structured trace} was developed to enforce this
return property, and also most of the cost labelling properties.

\begin{figure}
\begin{center}
\includegraphics[width=0.5\linewidth]{compiler.pdf}
\end{center}
\caption{The compiler and proof outline}
\label{fig:compiler}
\end{figure}

Jointly, we generalised the structured traces to apply to any of the
intermediate languages which have some notion of a program counter. This means
that they are introduced part way through the compiler; see
Figure~\ref{fig:compiler}. Proving that a structured trace can be
constructed at \textsf{RTLabs} has several virtues:
\begin{itemize}
\item This is the first language where every operation has its own
  unique, easily addressable statement.
\item Function calls and returns are still handled implicitly in the
  language and so the structural properties are ensured by the
  semantics.
\item Many of the back-end languages from \textsf{RTL} onwards share a common
  core set of definitions, and using structured traces throughout
  increases this uniformity.
\end{itemize}

\begin{figure}
\begin{center}
\includegraphics[width=0.6\linewidth]{strtraces.pdf}
\end{center}
\caption{Nesting of functions in structured traces}
\label{fig:strtrace}
\end{figure}
A structured trace is a mutually inductive data type which
contains the steps from a normal program trace, but arranged into a
nested structure which groups entire function calls together and
aggregates individual steps between cost labels (or between the final
cost label and the return from the function); see
Figure~\ref{fig:strtrace}. This captures the idea that the cost labels
only represent costs \emph{within} a function --- calls to other
functions are accounted for in the nested trace for their execution, and we
can locally regard function calls as a single step.
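The cost view that this nesting imposes on a flat trace can be sketched concretely. The step kinds below are an invented simplification: steps belonging to nested calls are not charged to the enclosing function's labels, and each balanced call--return pair counts locally as a single step.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified step kinds for a flat execution trace (invented). */
enum tr_kind { TR_LABEL, TR_STEP, TR_CALL, TR_RET };

/* Count the steps charged to the function at the head of the trace:
   steps inside nested calls are charged to those calls' own labels,
   and a balanced call...return counts locally as a single step. */
int charged_steps(const enum tr_kind *t, size_t n) {
  int count = 0, depth = 0;
  for (size_t i = 0; i < n; i++) {
    switch (t[i]) {
    case TR_CALL:  if (depth++ == 0) count++; break; /* one local step */
    case TR_RET:   depth--; break;
    case TR_STEP:  if (depth == 0) count++; break;   /* our own step */
    case TR_LABEL: break;
    }
  }
  return count;
}
```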

These structured traces form the core part of the intermediate results
that we must prove so that the back-end can complete the main
intensional result stated above. In full, we provide the back-end
with
\begin{enumerate}
\item A normal trace of the \textbf{prefix} of the program's execution
  before reaching the measurable subtrace. (This needs to be
  preserved so that we know that the stack space consumed is correct,
  and to set up the simulation results.)
\item The \textbf{structured trace} corresponding to the measurable
  subtrace.
\item An additional property about the structured trace that no
  `program counter' is \textbf{repeated} between cost labels. Together with
  the structure in the trace, this takes over from showing that
  cost labelling is sound and precise.
\item A proof that the \textbf{observables} have been preserved.
\item A proof that the \textbf{stack limit} is still observed by the prefix and
  the structured trace. (This is largely a consequence of the
  preservation of observables.)
\end{enumerate}
The \lstinline'front_end_correct' lemma in the
\lstinline'correctness.ma' file provides a record containing these.

Following the outline in Figure~\ref{fig:compiler}, we will first deal
with the transformations in \textsf{Clight} that produce the source
program with cost labels, then show that measurable traces can be
lifted to \textsf{RTLabs}, and finally show that we can construct the
properties listed above ready for the back-end proofs.

\section{Input code to cost labelled program}
\label{sec:inputtolabelling}

As explained on page~\pageref{page:switchintro}, the costs of complex
C \lstinline[language=C]'switch' statements cannot be represented with
the simple labelling used in the formalised compiler. Our first pass
replaces these statements with simpler C code, allowing our second
pass to perform the cost labelling. We show that the behaviour of
programs is unchanged by these passes using forward
simulations\footnote{All of our languages are deterministic, which can
  be seen directly from their executable definitions. Thus we know that
  forward simulations are sufficient because the target cannot have any
  other behaviour.}.

\subsection{Switch removal}

We compile away \lstinline[language=C]'switch' statements into more
basic \textsf{Clight} code.
Note that this transformation does not necessarily degrade the
efficiency of the generated code. For instance, compilers such as GCC
introduce balanced trees of ``if-then-else'' constructs for small
switches. However, our implementation strategy is much simpler. Let
us consider the following input statement.

\begin{lstlisting}[language=C]
switch(e) {
case v1:
  stmt1;
case v2:
  stmt2;
default:
  stmt_default;
}
\end{lstlisting}

1173 | |
---|
1174 | Note that \textsf{stmt1}, \textsf{stmt2}, \ldots \textsf{stmt\_default} |
---|
1175 | may contain \lstinline[language=C]'break' statements, which have the |
---|
1176 | effect of exiting the switch statement. In the absence of break, the |
---|
1177 | execution falls through each case sequentially. In our implementation, |
---|
1178 | we produce an equivalent sequence of ``if-then'' chained by gotos: |
---|
\begin{lstlisting}[language=C]
fresh = e;
if(fresh == v1) {
  $\llbracket$stmt1$\rrbracket$;
  goto lbl_case2;
};
if(fresh == v2) {
  lbl_case2:
  $\llbracket$stmt2$\rrbracket$;
  goto lbl_default;
};
lbl_default:
$\llbracket$stmt_default$\rrbracket$;
exit_label:
\end{lstlisting}

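As a concrete (invented) instance of this scheme, the following pair of functions exercises both fall-through and \lstinline[language=C]'break'; the two versions can be checked to compute the same results on every input.

```c
#include <assert.h>

/* The original switch, with fall-through from case 1 into case 2 and a
   break in case 2 (values and statement bodies are invented). */
int with_switch(int e) {
  int r = 0;
  switch (e) {
  case 1:
    r += 10;          /* stmt1: falls through to case 2 */
  case 2:
    r += 20;          /* stmt2 */
    break;            /* exits the switch */
  default:
    r += 30;          /* stmt_default */
  }
  return r;
}

/* The same computation after the transformation: ifs chained by gotos
   encode fall-through, and break becomes a jump to exit_label. */
int with_gotos(int e) {
  int r = 0;
  int fresh = e;
  if (fresh == 1) {
    r += 10;
    goto lbl_case2;   /* fall through into case 2 */
  }
  if (fresh == 2) {
  lbl_case2:
    r += 20;
    goto exit_label;  /* the translated break */
  }
  r += 30;
exit_label:
  return r;
}
```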
1193 | |
---|
1194 | The proof had to tackle the following points: |
---|
1195 | \begin{itemize} |
---|
1196 | \item the source and target memories are not the same (due to the fresh variable), |
---|
1197 | \item the flow of control is changed in a non-local way (e.g. \textbf{goto} |
---|
1198 | instead of \textbf{break}). |
---|
1199 | \end{itemize} |
---|
1200 | In order to tackle the first point, we implemented a version of memory |
---|
1201 | extensions similar to those of CompCert. |
---|
1202 | |
---|
1203 | For the simulation we decided to prove a sufficient amount to give us |
---|
1204 | confidence in the definitions and approach, but to curtail the proof |
---|
1205 | because this pass does not contribute to the intensional correctness |
---|
1206 | result. We tackled several simple cases, that do not interact with |
---|
1207 | the switch removal per se, to show that the definitions were usable, |
---|
1208 | and part of the switch case to check that the approach is |
---|
1209 | reasonable. This comprises propagating the memory extension through |
---|
1210 | each statement (except switch), as well as various invariants that are |
---|
1211 | needed for the switch case (in particular, freshness hypotheses). The |
---|
1212 | details of the evaluation process for the source switch statement and |
---|
1213 | its target counterpart can be found in the file |
---|
1214 | \lstinline'switchRemoval.ma', along more details on the transformation |
---|
1215 | itself. |
---|
1216 | |
---|
1217 | Proving the correctness of the second point would require reasoning on the |
---|
1218 | semantics of \lstinline[language=C]'goto' statements. In the \textsf{Clight} |
---|
1219 | semantics, this is implemented as a function-wide lookup of the target label. |
---|
1220 | The invariant we would need is the fact that a global label lookup on a freshly |
---|
1221 | created goto is equivalent to a local lookup. This would in turn require the |
---|
1222 | propagation of some freshness hypotheses on labels. As discussed, |
---|
1223 | we decided to omit this part of the correctness proof. |
---|
1224 | |
---|
\subsection{Cost labelling}

The simulation for the cost labelling pass is the simplest in the
front-end. The main argument is that any step of the source program
is simulated by the same step of the labelled one, plus any extra
steps for the added cost labels. The extra instructions do not change
the memory or local environments, although we have to keep track of
the extra instructions that appear in continuations, for example
during the execution of a \lstinline[language=C]'while' loop.

We do not attempt to capture any cost properties of the labelling\footnote{We describe how the cost properties are
  established in Section~\ref{sec:costchecks}.} in
the simulation proof, which allows the proof to be oblivious to the choice
of cost labels. Hence we do not have to reason about the threading of
name generation through the labelling function, greatly reducing the
amount of effort required.

%TODO: both give one-step-sim-by-many forward sim results; switch
%removal tricky, uses aux var to keep result of expr, not central to
%intensional correctness so curtailed proof effort once reasonable
%level of confidence in code gained; labelling much simpler; don't care
%what the labels are at this stage, just need to know when to go
%through extra steps. Rolled up into a single result with a cofixpoint
%to obtain coinductive statement of equivalence (show).

\section{Finding corresponding measurable subtraces}
\label{sec:measurablelifting}

There follow the three main passes of the front-end:
\begin{enumerate}
\item simplification of casts in \textsf{Clight} code,
\item \textsf{Clight} to \textsf{Cminor} translation, performing stack
  variable allocation and simplifying control structures, and
\item transformation to \textsf{RTLabs} control flow graph.
\end{enumerate}
We have taken a common approach to
each pass: first we build (or axiomatise) forward simulation results
that are similar to normal compiler proofs, but which are slightly more
fine-grained so that we can see that the call structure and relative
placement of cost labels are preserved.

Then we instantiate a general result which shows that we can find a
\emph{measurable} subtrace in the target of the pass that corresponds
to the measurable subtrace in the source. By repeated application of
this result we can find a measurable subtrace of the execution of the
\textsf{RTLabs} code, suitable for the construction of a structured
trace (see Section~\ref{sec:structuredtrace}). This is essentially an
extra layer on top of the simulation proofs that provides us with the
additional information required for our intensional correctness proof.

\subsection{Generic measurable subtrace lifting proof}

Our generic proof is parametrised on a record containing small-step
semantics for the source and target language, a classification of
states (the same form of classification is used when defining
structured traces), a simulation relation which respects the
classification and cost labelling, and
four simulation results. The simulations are split by the starting state's
classification and whether it is a cost label, which will allow us to
observe that the call structure is preserved. They are:
\begin{enumerate}
\item a step from a `normal' state (which is not classified as a call
  or return) which is not a cost label is simulated by zero or more
  `normal' steps;
\item a step from a `call' state followed by a cost label step is
  simulated by a step from a `call' state, a corresponding label step,
  then zero or more `normal' steps;
\item a step from a `call' state not followed by a cost label is
  simulated similarly (note that this case cannot occur in a well-labelled
  program, but we do not have enough information locally to exploit
  this); and
\item a cost label step is simulated by a cost label step.
\end{enumerate}
Finally, we need to know that a successfully translated program will
have an initial state in the simulation relation with the original
program's initial state.

The back-end has similar requirements for lifting simulations to
structured traces. Fortunately, our treatment of calls and returns
can be slightly simpler because we have special call and return states
corresponding to function entry and return that are separate from
the actual instructions. This was originally inherited from our port
of CompCert's \textsf{Clight} semantics, but proves useful here
because we only need to consider adding extra steps \emph{after} a
call or return state: the instruction step's simulation deals with any extra
steps that occur before. The back-end makes all of the call and
return machinery explicit, and thus needs more complex statements
about extra steps before and after each call and return.

\begin{figure}
\begin{center}
\includegraphics[width=0.5\linewidth]{meassim.pdf}
\end{center}
\caption{Tiling of simulation for a measurable subtrace}
\label{fig:tiling}
\end{figure}

To find the measurable subtrace in the target program's execution we
walk along the original program's execution trace applying the
appropriate simulation result by induction on the number of steps.
While the number of steps taken varies, the overall structure is
preserved, as illustrated in Figure~\ref{fig:tiling}. By preserving
the structure we also maintain the same intensional observables. One
delicate point is that the cost label following a call must remain
directly afterwards\footnote{The prototype compiler allowed some
  straight-line code to appear before the cost label until a later
  stage of the compiler, but we must move the requirement forward to
  fit with the structured traces.}
% Damn it, I should have just moved the cost label forwards in RTLabs,
% like the prototype does in RTL to ERTL; the result would have been
% simpler. Or was there some reason not to do that?
(both in the program code and in the execution trace), even if we
introduce extra steps, for example to store parameters in memory in
\textsf{Cminor}. Thus we have a version of the call simulation
that deals with both the call and the cost label in one result.
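The tiling argument can be sketched as follows (the trace representation is invented): a pass may insert extra `silent' steps, but filtering each trace down to its intensional observables must yield identical sequences.

```c
#include <assert.h>
#include <stddef.h>

/* Invented classification of trace steps: silent steps may be added by
   a pass, observable ones must be preserved in order. */
enum obs_kind { OBS_SILENT, OBS_LABEL, OBS_CALL, OBS_RET };

/* Extract the observable subsequence of a trace into out; returns its
   length.  Two traces are intensionally equivalent when their
   extracted subsequences coincide. */
size_t observables(const enum obs_kind *t, size_t n, enum obs_kind *out) {
  size_t m = 0;
  for (size_t i = 0; i < n; i++)
    if (t[i] != OBS_SILENT) out[m++] = t[i];
  return m;
}
```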

In addition to the subtrace we are interested in measuring, we must
prove that the earlier part of the trace is also preserved in
order to use the simulation from the initial state. This proof also
guarantees that we do not run out of stack space before the subtrace
we are interested in. The lemmas for this prefix and the measurable
subtrace are similar, following the pattern above. However, the
measurable subtrace also requires us to rebuild the termination
proof. This is defined recursively:
\label{prog:terminationproof}
\begin{lstlisting}[language=Grafite]
let rec will_return_aux C (depth:nat)
    (trace:list (cs_state $...$ C $\times$ trace)) on trace : bool :=
  match trace with
  [ nil $\Rightarrow$ false
  | cons h tl $\Rightarrow$
    let $\langle$s,tr$\rangle$ := h in
    match cs_classify C s with
    [ cl_call $\Rightarrow$ will_return_aux C (S depth) tl
    | cl_return $\Rightarrow$
      match depth with
      [ O $\Rightarrow$ match tl with [ nil $\Rightarrow$ true | _ $\Rightarrow$ false ]
      | S d $\Rightarrow$ will_return_aux C d tl
      ]
    | _ $\Rightarrow$ will_return_aux C depth tl
    ]
  ].
\end{lstlisting}
The \lstinline'depth' is the number of return states we need to see
before we have returned to the original function (initially zero) and
\lstinline'trace' the measurable subtrace obtained from running
the semantics for the correct number of steps. This definition
unfolds tail recursively for each step, and once the corresponding
simulation result has been applied a new one for the target can be
asserted by unfolding and applying the induction hypothesis on the
shorter trace.
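A transliteration into C may clarify the definition (a sketch only; the real definition also carries the states and traces rather than bare classifications):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* The state classification, reduced to what the definition inspects. */
enum cls { CL_OTHER, CL_CALL, CL_RETURN };

/* Transliteration of will_return_aux: succeed only when the trace ends
   exactly at the return from the original function. */
bool will_return(const enum cls *t, size_t n) {
  size_t depth = 0;                 /* returns still owed to callers */
  for (size_t i = 0; i < n; i++) {
    switch (t[i]) {
    case CL_CALL:   depth++; break;
    case CL_RETURN:
      if (depth == 0) return i + 1 == n;  /* the O case: tl must be nil */
      depth--;
      break;
    default: break;
    }
  }
  return false;                     /* the trace ran out: nil => false */
}
```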

Combining the lemmas about the prefix and the measurable subtrace
requires a little care because the states joining the two might not be
related in the simulation. In particular, if the measurable subtrace
starts from the cost label at the beginning of the function there may
be some extra instructions in the target code to execute to complete
function entry before the states are back in the relation. Hence we
carefully phrased the lemmas to allow for such extra steps.

Together, these give us an overall result for any simulation fitting the
requirements above (contained in the \lstinline'meas_sim' record):
\begin{lstlisting}[language=Grafite]
theorem measured_subtrace_preserved :
  $\forall$MS:meas_sim.
  $\forall$p1,p2,m,n,stack_cost,max.
  ms_compiled MS p1 p2 $\rightarrow$
  measurable (ms_C1 MS) p1 m n stack_cost max $\rightarrow$
  $\exists$m',n'.
    measurable (ms_C2 MS) p2 m' n' stack_cost max $\wedge$
    observables (ms_C1 MS) p1 m n = observables (ms_C2 MS) p2 m' n'.
\end{lstlisting}
The stack space requirement that is embedded in \lstinline'measurable'
is a consequence of the preservation of observables, because it is
determined by the functions called and returned from, which are observable.

\subsection{Simulation results for each pass}

We now consider the simulation results for the passes, each of which
is used to instantiate the
\lstinline[language=Grafite]'measured_subtrace_preserved' theorem to
construct the measurable subtrace for the next language.

\subsubsection{Cast simplification}

The parser used in \cerco{} introduces a lot of explicit type casts.
If left as they are, these constructs can greatly hamper the
quality of the generated code -- especially as the architecture
we consider is an $8$-bit one. In \textsf{Clight}, casts are
expressions. Hence, most of the work of this transformation
operates on expressions. The transformation proceeds by recursively
trying to coerce an expression to a particular integer type, which
is in practice smaller than the original one. This functionality
is implemented by two mutually recursive functions whose signatures
are the following.

\begin{lstlisting}[language=Grafite]
let rec simplify_expr (e:expr) (target_sz:intsize) (target_sg:signedness)
  : $\Sigma$result:bool$\times$expr.
    $\forall$ge,en,m. simplify_inv ge en m e (\snd result) target_sz target_sg (\fst result) := $\ldots$

and simplify_inside (e:expr) : $\Sigma$result:expr. conservation e result := $\ldots$
\end{lstlisting}

1428 | |
---|
1429 | The \textsf{simplify\_inside} acts as a wrapper for |
---|
1430 | \textsf{simplify\_expr}. Whenever \textsf{simplify\_inside} encounters |
---|
1431 | a \textsf{Ecast} expression, it tries to coerce the sub-expression |
---|
1432 | to the desired type using \textsf{simplify\_expr}, which tries to |
---|
1433 | perform the actual coercion. In return, \textsf{simplify\_expr} calls |
---|
1434 | back \textsf{simplify\_inside} in some particular positions, where we |
---|
1435 | decided to be conservative in order to simplify the proofs. However, |
---|
1436 | the current design allows to incrementally revert to a more aggressive |
---|
1437 | version, by replacing recursive calls to \textsf{simplify\_inside} by |
---|
1438 | calls to \textsf{simplify\_expr} \emph{and} proving the corresponding |
---|
1439 | invariants -- where possible. |
---|

The \textsf{simplify\_inv} invariant encodes either the conservation
of the semantics during the transformation corresponding to the failure
of the coercion (\textsf{Inv\_eq} constructor), or the successful
downcast of the considered expression to the target type
(\textsf{Inv\_coerce\_ok}).

\begin{lstlisting}[language=Grafite]
inductive simplify_inv
  (ge : genv) (en : env) (m : mem)
  (e1 : expr) (e2 : expr) (target_sz : intsize) (target_sg : signedness) : bool $\rightarrow$ Prop :=
| Inv_eq : $\forall$result_flag. $\ldots$
    simplify_inv ge en m e1 e2 target_sz target_sg result_flag
| Inv_coerce_ok : $\forall$src_sz,src_sg.
    typeof e1 = Tint src_sz src_sg $\rightarrow$
    typeof e2 = Tint target_sz target_sg $\rightarrow$
    smaller_integer_val src_sz target_sz src_sg (exec_expr ge en m e1) (exec_expr ge en m e2) $\rightarrow$
    simplify_inv ge en m e1 e2 target_sz target_sg true.
\end{lstlisting}

The \textsf{conservation} invariant for \textsf{simplify\_inside} simply states the conservation
of the semantics, as in the \textsf{Inv\_eq} constructor of the previous
invariant.

\begin{lstlisting}[language=Grafite]
definition conservation := $\lambda$e,result. $\forall$ge,en,m.
    res_sim ? (exec_expr ge en m e) (exec_expr ge en m result)
  $\wedge$ res_sim ? (exec_lvalue ge en m e) (exec_lvalue ge en m result)
  $\wedge$ typeof e = typeof result.
\end{lstlisting}

This invariant is then easily lifted to statement evaluations.
The main problem encountered with this particular pass was dealing with
inconsistently typed programs, a canonical case being an
integer constant of a certain size typed with another size. This
prompted the need to introduce numerous type checks, making
both the implementation and the proof more complex, even though more
comprehensive checks are made in the next stage.
%\todo{Make this a particular case of the more general statement on baking more invariants in the Clight language}
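The arithmetic fact that makes this kind of downcast sound for operations such as addition can be stated concretely (a sketch, not taken from the formalisation): truncating the promoted result to 8 bits coincides with computing modulo $2^8$ directly, so the compiler may emit 8-bit arithmetic instead of promoting to \lstinline[language=C]'int'.

```c
#include <assert.h>
#include <stdint.h>

/* What the C standard specifies: promote to int, add, then truncate. */
uint8_t add_promoted(uint8_t a, uint8_t b) {
  return (uint8_t)((int)a + (int)b);
}

/* What an 8-bit target can do instead: addition modulo 2^8. */
uint8_t add_narrow(uint8_t a, uint8_t b) {
  return (uint8_t)((a + b) & 0xff);
}
```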

\subsubsection{Clight to Cminor}

This pass is the last one operating on the \textsf{Clight} language.
Its input is a full \textsf{Clight} program, and its output is a
\textsf{Cminor} program. Note that we do not use an equivalent of
CompCert's \textsf{C\#minor} language: we translate directly to a
variant of \textsf{Cminor}. This presents the advantage of requiring
neither the special loop constructs nor the explicit block
structure. Another salient point of our approach is that a significant
number of the properties needed for the simulation proof were directly
encoded in dependently typed translation functions. In particular,
freshness conditions and well-typedness conditions are included. The
main effects of the transformation from \textsf{Clight} to
\textsf{Cminor} are listed below.

\begin{itemize}
\item Variables are classified as being either globals, stack-allocated
locals or potentially register-allocated locals. The value of register-allocated
local variables is moved out of the modelled memory and stored in a
dedicated environment.
\item In \textsf{Clight}, each local variable has a dedicated memory block, whereas
in \textsf{Cminor} stack-allocated locals are bundled together on a function-by-function basis.
\item Loops are converted to jumps.
\end{itemize}
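
The last point can be illustrated schematically. Eliding the cost labels that the labelling pass places after each branch, a \lstinline[language=C]'while' loop becomes a conditional forward jump plus an explicit back edge (the labels \lstinline'entry' and \lstinline'exit' are hypothetical names, used only for this illustration):

\begin{lstlisting}[language=C]
/* Clight */           /* Cminor, schematically */
while (e) s;           entry: if (! e) goto exit;
                              s;
                              goto entry;
                       exit:  ;
\end{lstlisting}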

The first two points require memory injections which are more flexible than those
needed in the switch removal case. In the remainder of this section, we briefly
discuss our implementation of memory injections, and then the simulation proof.

\paragraph{Memory injections.}

Our memory injections are modelled after the work of Blazy \& Leroy.
However, the corresponding paper is based on the first version of the
CompCert memory model~\cite{2008-Leroy-Blazy-memory-model}, whereas we use a much more concrete model, allowing byte-level
manipulations (as in the later version of CompCert's memory model). We proved
roughly 80\% of the required lemmas. Notably, some of the difficulties encountered were
due to overly relaxed conditions on pointer validity (fixed during development).
Additional side conditions had to be added to take care of possible overflows when converting
from \textbf{Z} block bounds to $16$ bit pointer offsets (in practice, such overflows
only occur in edge cases that are easily ruled out -- but this fact is not visible
in memory injections). Concretely, some of the lemmas on the preservation of simulation of
loads after writes were axiomatised, due to a lack of time.
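
The shape of these extra side conditions can be sketched as follows. This is an illustration only: the name and the exact form of the bound are hypothetical simplifications of the conditions in the actual development.

\begin{lstlisting}[language=Grafite]
(* Illustrative only: an injection shifting a block's offsets by delta
   must keep every valid offset representable in a 16 bit pointer. *)
definition offset_no_overflow := $\lambda$delta,high_bound: Z.
  0 $\leq$ delta $\wedge$ delta + high_bound < 65536.
\end{lstlisting}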

\paragraph{Simulation proof.}

We proved the simulation result for expressions and a representative
selection of statements. In particular, we tackled
\lstinline[language=C]'while' statements to ensure that we correctly
translate loops, since our approach differs from CompCert's: we convert
directly to \textsf{Cminor} \lstinline[language=C]'goto's
rather than maintaining a notion of loop in \textsf{Cminor}. We also have a partial
proof for function entry, covering the setup of the memory injection,
but not function exit. Exits, and the remaining statements, have been
axiomatised.

Careful management of the proof state was required because proof terms
are embedded in \textsf{Cminor} code to show that invariants are
respected. These proof terms appear in the proof state when inverting
the translation functions, and they can be large and awkward. While
generalising them away is usually sufficient, it can be difficult when
they appear under a binder.

%The correctness proof for this transformation was not completed. We proved the
%simulation result for expressions and for some subset of the critical statement cases.
%Notably lacking are the function entry and exit, where the memory injection is
%properly set up. As would be expected, a significant amount of work has to be performed
%to show the conservation of all invariants at each simulation step.

%\todo{list cases, explain while loop, explain labeling problem}

\subsubsection{Cminor to RTLabs}

The translation from \textsf{Cminor} to \textsf{RTLabs} is a fairly
routine control flow graph (CFG) construction. As such, we chose to
axiomatise the associated extensional simulation results. However, we did prove several
properties of the generated programs:
\begin{itemize}
\item All statements are type correct with respect to the declared
pseudo-register type environment.
\item The CFG is closed, and has a distinguished entry node and a
unique exit node.
\end{itemize}

These properties rely on similar properties about type safety and the
presence of \lstinline[language=C]'goto'-labels for \textsf{Cminor} programs,
which are checked at the preceding stage. As a result, this
transformation is total, and any compilation failures must occur
earlier, while the corresponding \textsf{Clight} source is still
available and a better error message can be generated.

The proof obligations for these properties include many instances of
graph inclusion. We automated these proofs using a small amount of
reflection, making the obligations much easier to handle. One
drawback to enforcing invariants throughout is that temporarily
breaking them can be awkward. For example, \lstinline'return'
statements were originally used as placeholders for
\lstinline[language=C]'goto' destinations that had not yet been
translated. However, this made establishing the single exit node
property rather difficult, and a different placeholder was chosen
instead. In other circumstances it is possible to prove a more
complex invariant and then simplify it at the end of the transformation.
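
The reflection used for the graph inclusion obligations follows the standard pattern: define a boolean inclusion check, prove it sound, and discharge each obligation by computation. In sketch form (the names here are hypothetical):

\begin{lstlisting}[language=Grafite]
(* A boolean test for graph inclusion, with its soundness lemma. A
   concrete obligation graph_included g1 g2 is then closed by applying
   the lemma and computing the boolean test to true. *)
definition graph_included_b: graph statement $\rightarrow$ graph statement $\rightarrow$ bool := ... .

lemma graph_included_b_sound: $\forall$g1,g2.
  graph_included_b g1 g2 = true $\rightarrow$ graph_included g1 g2.
\end{lstlisting}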

\section{Checking cost labelling properties}
\label{sec:costchecks}

Ideally, we would provide proofs that the cost labelling pass always
produces programs that are soundly and precisely labelled and that
each subsequent pass preserves these properties. This would match our
use of dependent types to eliminate impossible sources of errors
during compilation, in particular retaining intermediate language type
information.

However, given the limited amount of time available we realised that
implementing a compile-time check for a sound and precise labelling of
the \textsf{RTLabs} intermediate code would reduce the proof burden
considerably. This is similar in spirit to the use of translation
validation in certified compilation, which makes a similar trade-off
between the potential for compile-time failure and the volume of proof
required.

The check cannot be pushed into a later stage of the compiler because
much of the information is embedded into the structured traces.
However, if an alternative method were used to show that function
returns in the compiled code are sufficiently well-behaved, then we
could consider pushing the cost property checks into the timing
analysis itself. We leave this as a possible area for future work.

\subsection{Implementation and correctness}
\label{sec:costchecksimpl}

For a cost labelling to be sound and precise we need a cost label at
the start of each function, after each branch, and at least one in
every loop. The first two parts are trivial to check by examining the
code. In \textsf{RTLabs} the last part is specified by saying
that there is a bound on the number of successive instruction nodes in
the CFG that can be followed before a cost label is encountered;
checking this is more difficult.

The implementation progresses through the set of nodes in the graph,
following successors until a cost label is found or a label-free cycle
is discovered (in which case the property does not hold and we return
an error). This is made easier by the prior knowledge that every
successor of a branch instruction is a cost label, so we do not need
to search each branch. When a label is found, we remove the chain of
program counters from the set and continue from another node in the
set until it is empty, at which point we know that there is a bound
for every node in the graph.
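
One step of this search can be sketched as the following function, which follows successor nodes from \lstinline'pc', accumulating the label-free chain visited so far. This is an illustration only: the names \lstinline'successor', \lstinline'is_cost_label' and \lstinline'is_in' are hypothetical, and the explicit fuel argument stands in for the termination argument of the real implementation.

\begin{lstlisting}[language=Grafite]
let rec follow (g: graph statement) (chain: list label) (pc: label)
               (fuel: nat) on fuel : option (list label) :=
match fuel with
[ O $\Rightarrow$ None ?
| S fuel' $\Rightarrow$
    if is_in pc chain then None ?          (* label-free cycle: fail *)
    else if is_cost_label g pc             (* bound found: the whole  *)
    then Some ? (pc::chain)                (* chain can be removed    *)
    else follow g (pc::chain) (successor g pc) fuel' ].
\end{lstlisting}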

Directly reasoning about the function that implements this procedure would be
rather awkward, so an inductive specification of a single step of its
behaviour was written and proved to match the implementation. This
was then used to prove the implementation sound and complete.

While we have not attempted to prove that the cost labelling properties
are established and preserved earlier in the compiler, we expect that
the effort for the \textsf{Cminor} to \textsf{RTLabs} stage alone
would be similar to the work outlined above, because it involves the
change from requiring a cost label at particular positions to
requiring cost labels to break loops in the CFG. As there are another
three passes to consider (including the labelling itself), we believe
that using the check above is much simpler overall.

% TODO? Found some Clight to Cminor bugs quite quickly

\section{Existence of a structured trace}
\label{sec:structuredtrace}

The \emph{structured trace} idea introduced in
Section~\ref{sec:fegoals} enriches the execution trace of a program by
nesting function calls in a mixed-step style and embedding the cost
labelling properties of the program. See Figure~\ref{fig:strtrace} on
page~\pageref{fig:strtrace} for an illustration of a structured trace.
It was originally designed to support the proof of correctness for the
timing analysis of the object code in the back-end, and was then generalised
to provide a common structure to use from the end of the front-end to
the object code.

To make the definition generic we abstract over the semantics of the
language,
\begin{lstlisting}[language=Grafite]
record abstract_status : Type[1] :=
{ as_status :> Type[0]
; as_execute : relation as_status
; as_pc : DeqSet
; as_pc_of : as_status $\rightarrow$ as_pc
; as_classify : as_status $\rightarrow$ status_class
; as_label_of_pc : as_pc $\rightarrow$ option costlabel
; as_after_return : ($\Sigma$s:as_status. as_classify s = cl_call) $\rightarrow$ as_status $\rightarrow$ Prop
; as_result: as_status $\rightarrow$ option int
; as_call_ident : ($\Sigma$s:as_status.as_classify s = cl_call) $\rightarrow$ ident
; as_tailcall_ident : ($\Sigma$s:as_status.as_classify s = cl_tailcall) $\rightarrow$ ident
}.
\end{lstlisting}
which requires a type of states, an execution relation\footnote{All of
our semantics are executable, but using a relation was simpler in
the abstraction.}, some notion of abstract
program counter with decidable equality, the classification of states,
and functions to extract the observable intensional information (cost
labels and the identity of functions that are called). The
\lstinline'as_after_return' property links the state before a function
call with the state after return, providing the evidence that
execution returns to the correct place. The precise form varies
between stages; in \textsf{RTLabs} it insists that the CFG, the pointer to
the CFG node to execute next, and some call stack information are
preserved.

The structured traces are defined using three mutually inductive
types. The core data structure is \lstinline'trace_any_label', which
captures some straight-line execution until the next cost label or the
return from the enclosing function. Each function call is embedded as
a single step, with its own trace nested inside and the before and
after states linked by \lstinline'as_after_return'; and states
classified as a `jump' (in particular branches) must be followed by a
cost label.

The second type, \lstinline'trace_label_label', is a
\lstinline'trace_any_label' where the initial state is cost labelled.
Thus a trace in this type identifies a series of steps whose cost is
entirely accounted for by the label at the start.

Finally, \lstinline'trace_label_return' is a sequence of
\lstinline'trace_label_label' values ending in the return from the
function. These correspond to a measurable subtrace, and in
particular include executions of an entire function call (and so are
used for the nested calls in \lstinline'trace_any_label').
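
In outline, the three types fit together as follows. This sketch is deliberately simplified: the actual definitions are additionally indexed by start and end states over an \lstinline'abstract_status', and also track whether a trace ends in a return.

\begin{lstlisting}[language=Grafite]
inductive trace_label_return: Type[0] :=
| tlr_base: trace_label_label $\rightarrow$ trace_label_return
| tlr_step: trace_label_label $\rightarrow$ trace_label_return $\rightarrow$ trace_label_return
with trace_label_label: Type[0] :=
  (* a trace_any_label whose initial state is cost labelled *)
| tll: trace_any_label $\rightarrow$ trace_label_label
with trace_any_label: Type[0] :=
| tal_base_return: ...  (* a return step ends the trace            *)
| tal_step: ...         (* one ordinary step, then the rest        *)
| tal_call: ...         (* a call as one step, nesting a           *)
                        (* trace_label_return for the callee       *)
\end{lstlisting}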

\subsection{Construction}

The construction of the structured trace replaces syntactic cost
labelling properties, which place requirements on where labels appear
in the program, with semantic properties that constrain the execution
traces of the program. The construction begins by defining versions
of the sound and precise labelling properties on states and global
environments (for the code that appears in each of them) rather than
whole programs, and showing that these are preserved by steps of the
\textsf{RTLabs} semantics.

Then we show that each cost labelling property required by the
definition of structured traces is locally satisfied. These proofs are
broken up by the classification of states. Similarly, we prove a
step-by-step stack preservation result, which states that the
\textsf{RTLabs} semantics never changes the lower parts of the stack.

The core part of the construction of a structured trace is to use the
proof of termination from the measurable trace (defined on
page~\pageref{prog:terminationproof}) to `fold up' the execution into
the nested form. The results outlined above fill in the proof
obligations for the cost labelling properties, and the stack
preservation result shows that calls return to the correct location.

The structured trace alone is not sufficient to capture the property
that the program is soundly labelled. While the structured trace
guarantees termination, it still permits a loop to be executed a
finite number of times without encountering a cost label. We
eliminate this by proving that no `program counter' repeats within any
\lstinline'trace_any_label' section, by showing that a repeat would be
incompatible with the property that there is a bound on the number of
successor instructions that can be followed in the CFG before a cost
label is encountered (from Section~\ref{sec:costchecksimpl}).

\subsubsection{Complete execution structured traces}

The development of the construction above started relatively early,
before the measurable subtrace preservation proofs. To be confident
that the traces were well-formed at that time, we also developed a
complete execution form that embeds the traces above. This includes
non-terminating program executions, where an infinite number of the terminating
structured traces are embedded. This construction confirmed that our
definition of structured traces was consistent, although we later
found that we did not require the whole execution version for the
compiler correctness results.

To construct these we need to know whether each function call will
eventually terminate, requiring the use of the excluded middle. This
classical reasoning is local to the construction of whole program
traces and is not necessary for our main results.

\section{Conclusion}

In combination with the work on the CerCo back-end and by
concentrating on the novel intensional parts of the proof, we have
shown that it is possible to construct certifying compilers that
correctly report execution time and stack space costs. The layering
of intensional correctness proofs on top of normal simulation results
provides a useful separation of concerns, and could permit the reuse
of existing results.

\section{Files}

The following table gives a high-level overview of the \matita{}
source files in Deliverable 3.4:

\bigskip

\begin{tabular}{rp{.7\linewidth}}
\lstinline'compiler.ma' & Top-level compiler definitions, in particular
\lstinline'front_end', and the whole compiler definition
\lstinline'compile'. \\
\lstinline'correctness.ma' & Correctness results: \lstinline'front_end_correct'
and \lstinline'correct', respectively. \\
\lstinline'Clight/*' & \textsf{Clight}: proofs for switch
removal, cost labelling, cast simplification and conversion to
\textsf{Cminor}. \\
\lstinline'Cminor/*' & \textsf{Cminor}: axioms of conversion to
\textsf{RTLabs}. \\
\lstinline'RTLabs/*' & \textsf{RTLabs}: definitions and proofs for
compile-time cost labelling checks, construction of structured traces.
\\
\lstinline'common/Measurable.ma' & Definitions for measurable
subtraces. \\
\lstinline'common/FEMeasurable.ma' & Generic measurable subtrace
lifting proof. \\
\lstinline'common/*' & Other common definitions relevant to many parts
of the compiler and proof. \\
\lstinline'utilities/*' & General purpose definitions used throughout,
including extensions to the standard \matita{} library.
\end{tabular}

\subsection{A brief overview of the backend compilation chain}
\label{subsect.brief.overview.backend.compilation.chain}

The Matita compiler's backend consists of five distinct intermediate languages: RTL, RTLntl, ERTL, LTL and LIN.
A sixth language, RTLabs, serves as the entry point of the backend and the exit point of the frontend.
RTL, RTLntl, ERTL and LTL are `control flow graph based' languages, whereas LIN is a linearised language, the final language before translation to assembly.

We now briefly discuss the properties of the intermediate languages, and the various transformations that take place during the translation process:

\paragraph{RTLabs ((Abstract) Register Transfer Language)}
As mentioned, this is the final language of the compiler's frontend and the entry point for the backend.
This language uses pseudoregisters, not hardware registers.\footnote{There are an unbounded number of pseudoregisters. Pseudoregisters are converted to hardware registers or stack positions during register allocation.}
Functions still use stackframes, where arguments are passed on the stack and results are stored in addresses.
Instruction selection is carried out during the pass to RTL.

\paragraph{RTL (Register Transfer Language)}
This language uses pseudoregisters, not hardware registers.
Tailcall elimination is carried out during the translation from RTL to RTLntl.

\paragraph{RTLntl (Register Transfer Language --- No Tailcalls)}
This language is a pseudoregister, graph based language in which all tailcalls have been eliminated.
RTLntl is not present in the O'Caml compiler; there, the RTL language is reused for this purpose.

\paragraph{ERTL (Explicit Register Transfer Language)}
This is a language very similar to RTLntl.
However, the calling convention is made explicit, in that functions no longer receive and return inputs and outputs via a high-level mechanism, but rather use stack slots or hardware registers.
The ERTL to LTL pass performs the following transformations: liveness analysis, register colouring and register/stack slot allocation.

\paragraph{LTL (Linearisable Transfer Language)}
Another graph based language, but one that uses hardware registers instead of pseudoregisters.
Tunnelling (branch compression) should be implemented here.

\paragraph{LIN (Linearised)}
This is a linearised form of the LTL language; function graphs have been linearised into lists of statements.
All registers have been translated into hardware registers or stack addresses.
This is the final stage of compilation before translating directly into assembly language.

%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
% SECTION.                                                                     %
%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
\section{The backend intermediate languages in Matita}
\label{sect.backend.intermediate.languages.matita}

We now discuss the encoding of the compiler backend languages in the Calculus of Constructions proper.
We pay particular heed to changes that we made from the O'Caml prototype.
In particular, many aspects of the backend languages have been unified into a single `joint' language.
We have also made heavy use of dependent types to reduce `spurious partiality' and to encode invariants.
%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
% SECTION.                                                                     %
%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
\subsection{Abstracting related languages}
\label{subsect.abstracting.related.languages}

The O'Caml compiler is written in the following manner.
Each intermediate language has its own dedicated syntax, notions of internal function, and so on.
Here, we make a distinction between `internal functions'---functions that are explicitly written by the programmer---and `external functions', which belong to an external library and require explicit linking.
In particular, IO can be seen as a special case of the `external function' mechanism.
Internal functions are represented as a record consisting of a sequential structure of statements, entry and exit points to this structure, and other bookkeeping devices.
This sequential structure can either be a control flow graph or a linearised list of statements, depending on the language.
Translations between intermediate languages map syntaxes to syntaxes, and internal function representations to internal function representations explicitly.

This is a perfectly valid way to write a compiler, where everything is made explicit, but writing a \emph{verified} compiler poses new challenges.
In particular, we must look ahead to see how our choice of encodings will affect the size and complexity of the forthcoming proofs of correctness.
We now discuss some abstractions, introduced in the Matita code, which we hope will make our proofs shorter, amongst other benefits.

\paragraph{Changes between languages made explicit}
Due to the bureaucracy inherent in explicating each intermediate language's syntax in the O'Caml compiler, it can often be hard to see exactly what changes between each successive intermediate language.
By abstracting the syntax of the RTL, ERTL, LTL and LIN intermediate languages, we make these changes much clearer.

Our abstraction takes the following form:
\begin{lstlisting}[language=Grafite]
inductive joint_instruction (p: params__) (globals: list ident): Type[0] :=
| COMMENT: String $\rightarrow$ joint_instruction p globals
...
| INT: generic_reg p $\rightarrow$ Byte $\rightarrow$ joint_instruction p globals
...
| OP1: Op1 $\rightarrow$ acc_a_reg p $\rightarrow$ acc_a_reg p $\rightarrow$ joint_instruction p globals
...
| extension: extend_statements p $\rightarrow$ joint_instruction p globals.
\end{lstlisting}
We first note that for the majority of intermediate languages, many instructions are shared.
However, these instructions expect different register types (either a pseudoregister or a hardware register) as arguments.
We must therefore parameterise the joint syntax with a record of parameters that will be specialised to each intermediate language.
In the type above, this parameterisation is realised with the \texttt{params\_\_} record.
As a result of this parameterisation, we have also added a degree of `type safety' to the intermediate languages' syntaxes.
In particular, we note that the \texttt{OP1} constructor expects quite a specific type, in that the two register arguments must both be what passes for the accumulator A in that language.
In some languages, for example LIN, this is the hardware accumulator, whilst in others this is any pseudoregister.
Contrast this with the \texttt{INT} constructor, which expects a \texttt{generic\_reg}, corresponding to an `arbitrary' register type.

Further, we note that some intermediate languages have language specific instructions (i.e. the instructions that change between languages).
We therefore add a new constructor to the syntax, \texttt{extension}, which expects a value of type \texttt{extend\_statements p}.
As \texttt{p} varies between intermediate languages, we can provide language specific extensions to the syntax of the joint language.
For example, ERTL's extended syntax consists of the following extra statements:
\begin{lstlisting}[language=Grafite]
inductive ertl_statement_extension: Type[0] :=
| ertl_st_ext_new_frame: ertl_statement_extension
| ertl_st_ext_del_frame: ertl_statement_extension
| ertl_st_ext_frame_size: register $\rightarrow$ ertl_statement_extension.
\end{lstlisting}
These are further packaged into an ERTL specific instance of \texttt{params\_\_} as follows:
\begin{lstlisting}[language=Grafite]
definition ertl_params__: params__ :=
mk_params__ register register ... ertl_statement_extension.
\end{lstlisting}

\paragraph{Shared code, reduced proofs}
Many features of individual backend intermediate languages are shared with other intermediate languages.
For instance, RTLabs, RTL, ERTL and LTL are all graph based languages, where functions are represented as a control flow graph of statements that form their bodies.
Functions for adding statements to a graph, searching the graph, and so on, are remarkably similar across all languages, but are duplicated in the O'Caml code.

As a result, we chose to abstract the representation of internal functions for the RTL, ERTL, LTL and LIN intermediate languages into a `joint' representation.
This representation is parameterised by a record that dictates the layout of the function body for each intermediate language.
For instance, in RTL, the layout is graph like, whereas in LIN, the layout is a linearised list of statements.
Further, we need a generalised way of accessing the successor of the statement currently under consideration, and so forth.

Our joint internal function record looks like so:
\begin{lstlisting}[language=Grafite]
record joint_internal_function (globals: list ident) (p:params globals) : Type[0] :=
{
...
joint_if_params : paramsT p;
joint_if_locals : localsT p;
...
joint_if_code : codeT ... p;
...
}.
\end{lstlisting}
In particular, everything that can vary between differing intermediate languages has been parameterised.
Here, the locations from which to fetch parameters, the listing of local variables, and the internal code representation have all been parameterised.
Other particulars are also parameterised, though omitted here.

We hope that this abstraction will reduce the number of proofs that need to be written, since a proof about internal functions carried out once over the joint representation applies to every language obtained by instantiating its parameters.

---|
\paragraph{Dependency on instruction selection}
We note that the backend languages are all essentially `post instruction selection' languages.
The `joint' syntax makes this especially clear.
For instance, in the definition:
\begin{lstlisting}[language=Grafite]
inductive joint_instruction (p: params__) (globals: list ident): Type[0] :=
...
| INT: generic_reg p $\rightarrow$ Byte $\rightarrow$ joint_instruction p globals
| MOVE: pair_reg p $\rightarrow$ joint_instruction p globals
...
| PUSH: acc_a_reg p $\rightarrow$ joint_instruction p globals
...
| extension: extend_statements p $\rightarrow$ joint_instruction p globals.
\end{lstlisting}
the capitalised constructors---\texttt{INT}, \texttt{MOVE}, and so on---are all machine-specific instructions.
Retargeting the compiler to another microprocessor, improving instruction selection, or simply enlarging the subset of the machine language that the compiler can use, would entail replacing these constructors with constructors corresponding to the instructions of the new target.
We feel that this makes it much more explicit which instructions are target dependent and which are not (i.e.\ those language-specific instructions that fall inside the \texttt{extension} constructor).
In the long term, we would like to try to embed the target language directly in the syntax, in order to reuse the target language's semantics.

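The split between target-dependent constructors and per-language extensions can be sketched in OCaml with an ordinary variant type; the constructor and extension names below are invented for illustration and simplified with respect to the Grafite definition above.

```ocaml
(* Hypothetical sketch: machine-specific instructions are plain constructors,
   while language-specific statements pass through a single extension
   constructor whose content is a type parameter. *)
type ('reg, 'ext) joint_instruction =
  | Int of 'reg * int        (* load an immediate: machine specific *)
  | Move of 'reg * 'reg      (* machine specific *)
  | Push of 'reg             (* machine specific *)
  | Extension of 'ext        (* per-language statement extensions *)

(* Illustrative extensions for an ERTL-like language. *)
type ertl_ext = NewFrame | DelFrame

(* Code generic in 'ext is unaffected by changes to the extensions. *)
let is_machine_specific = function
  | Extension _ -> false
  | _ -> true
```

Retargeting, in this picture, means replacing the capitalised constructors wholesale, while everything written generically over \texttt{'ext} survives unchanged.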
---|
\paragraph{Independent development and testing}
We have essentially modularised the intermediate languages in the compiler backend.
As with any form of modularisation, we reap benefits in the ability to test and develop each intermediate language independently, with the additional benefit of fixing bugs just once.

\paragraph{Future reuse for other compiler projects}
Another advantage of our modularisation scheme is the ability to quickly reuse intermediate languages in other compiler projects.
For instance, in creating a cost-preserving compiler for a functional language, we may choose to target a linearised version of RTL directly.
Adding such an intermediate language would involve the addition of just a few lines of code.

\paragraph{Easy addition of new compiler passes}
Under our modularisation and abstraction scheme, new compiler passes can easily be injected into the backend.
We have a concrete example of this in the RTLntl language, an intermediate language that was not present in the original O'Caml code.
To specify a new intermediate language we must simply specify, through the statement extension mechanism, how it differs from the `joint' language, and configure a new notion of internal function record for it by specialising parameters.
As generic code already exists for the `joint' language, for example for adding statements to control flow graphs, this code can be reused for the new intermediate language.

\paragraph{Possible commutations of translation passes}
The backend translation passes of the CerCo compiler differ quite a bit from those of the CompCert compiler.
In the CompCert compiler, linearisation occurs much earlier in the compilation chain, and passes such as register colouring and allocation are carried out on a linearised form of program.
Contrast this with our own approach, where the code is represented as a graph for much longer.
Similarly, in CompCert the calling conventions are enforced after register allocation, whereas we perform register allocation before enforcing the calling convention.

However, by abstracting the representation of intermediate functions, we are now much freer to reorder translation passes as we see fit.
The linearisation process, for instance, no longer cares about the specific representation of code in the source and target languages; it relies only on a common interface.
We are therefore, in theory, free to choose where to linearise our representation.
This adds an unusual flexibility to the compilation process, and allows us to experiment freely with different orderings of translation passes.

---|
%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
% SECTION.                                                                    %
%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
\subsection{Use of dependent types}
\label{subsect.use.of.dependent.types}

We see several potential ways in which a compiler can fail to compile a program:
\begin{enumerate}
\item
The program is malformed, and there is no hope of making sense of it.
\item
The compiler is buggy, or an invariant in the compiler is invalidated.
\item
An incomplete heuristic in the compiler fails.
\item
The compiled code exhausts some bounded resource, for instance the processor's code memory.
\end{enumerate}
Standard compilers can fail for all of the above reasons.
Certified compilers are only required to rule out the second class of failures, but they can still fail for any of the remaining reasons.
In particular, a compiler that systematically refuses to compile any well-formed program is a sound compiler.
On the contrary, we would like our certified compiler to fail only in the fourth case.
We plan to achieve this with the following strategy.

First, the compiler is abstracted over all incomplete heuristics, seen as total functions.
To obtain executable code, the compiler is eventually composed with implementations of the abstracted strategies, with the composition taking care of any potential failure of the heuristics in finding a solution.

Second, we reject all malformed programs using dependent types: only well-formed programs should typecheck, and the compiler can be applied only to well-typed programs.

Finally, exhaustion of bounded resources can be checked only at the very end of compilation.
Therefore, all intermediate compilation steps are now total functions that can neither diverge nor fail: these properties are directly guaranteed by the type system of Matita.

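The first point of the strategy can be sketched in OCaml. In this minimal, invented example (no names below come from the CerCo sources), the pass itself assumes a \emph{total} allocation strategy; the possibly failing heuristic is composed in only at the very end, and its failure surfaces as a single compiler-level error rather than being scattered through the passes.

```ocaml
(* A candidate assignment of variables to register numbers. *)
type colouring = (string * int) list

(* The pass, written against a total allocator: it cannot itself fail. *)
let compile_with (allocate : string list -> colouring) (vars : string list) : colouring =
  allocate vars

(* The real heuristic is partial: it may run out of registers. *)
let greedy_allocator ~registers (vars : string list) : colouring option =
  if List.length vars <= registers
  then Some (List.mapi (fun i v -> (v, i)) vars)
  else None

(* The composition handles heuristic failure in one place. *)
let compile vars =
  match greedy_allocator ~registers:8 vars with
  | Some alloc -> Ok (compile_with (fun _ -> alloc) vars)
  | None       -> Error "register allocation failed"
```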
---|
Presently, the plan is not yet fulfilled.
However, we are improving the implemented code according to our plan, by progressively strengthening the code through the use of dependent types.
We detail the different ways in which dependent types have been used so far.

First, we encode informal invariants, or uses of \texttt{assert false} in the O'Caml code, with dependent types, converting partial functions into total functions.
There are numerous examples of this throughout the backend.
For example, in the \texttt{RTLabs} to \texttt{RTL} transformation pass, many functions only `make sense' when the lists of registers passed to them as arguments conform to some specific length.
For instance, the \texttt{translate\_negint} function, which translates a negative integer constant:
\begin{lstlisting}[language=Grafite]
definition translate_negint :=
  $\lambda$globals: list ident.
  $\lambda$destrs: list register.
  $\lambda$srcrs: list register.
  $\lambda$start_lbl: label.
  $\lambda$dest_lbl: label.
  $\lambda$def: rtl_internal_function globals.
  $\lambda$prf: |destrs| = |srcrs|. (* assert here *)
  ...
\end{lstlisting}
The last argument to the function, \texttt{prf}, is a proof that the lengths of the lists of source and destination registers are the same.
This was an assertion in the O'Caml code.

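For contrast, the corresponding O'Caml idiom can only check the invariant at run time. The following is an illustrative sketch, not the actual prototype code (the body is invented): the runtime \texttt{assert} plays the role that the dependent argument \texttt{prf} plays statically in the Matita version.

```ocaml
(* Hypothetical sketch: the length invariant as a runtime assertion. *)
type register = R of int

let translate_negint (destrs : register list) (srcrs : register list) =
  assert (List.length destrs = List.length srcrs);  (* Matita: the prf argument *)
  List.combine destrs srcrs  (* pair each source with its destination *)
```

Where the OCaml version can raise \texttt{Assert\_failure} at run time, the Matita version simply cannot be applied to ill-matched lists.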
---|
Secondly, we make use of dependent types to make the Matita code correct by construction, and eventually to make the proofs of correctness for the compiler easier to write.
For instance, many intermediate languages in the backend of the compiler, from RTLabs to LTL, are graph based.
Here, function definitions consist of a graph (i.e.\ a map from labels to statements) and a pair of labels denoting the entry and exit points of this graph.
Naturally, we would always like to ensure that the entry and exit labels are present in the statement graph.
We ensure that this is so with a dependent sum type in the \texttt{joint\_internal\_function} record, which all graph based languages specialise to obtain their own internal function representation:
\begin{lstlisting}[language=Grafite]
record joint_internal_function (globals: list ident) (p: params globals): Type[0] :=
{
  ...
  joint_if_code : codeT $\ldots$ p;
  joint_if_entry: $\Sigma$l: label. lookup $\ldots$ joint_if_code l $\neq$ None $\ldots$;
  ...
}.
\end{lstlisting}
Here, \texttt{codeT} is a parameterised type representing the `structure' of the function's body (a graph in graph based languages, and a list in the linearised LIN language).
Specifically, \texttt{joint\_if\_entry} is a dependent pair consisting of a label and a proof that the label in question is a vertex in the function's graph.
A similar device exists for the exit label.

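Without dependent types, the best one can do is establish the invariant dynamically with a checking (`smart') constructor, as in this invented OCaml sketch; the sigma type above instead carries the proof with the value, so the invariant can never be violated once established.

```ocaml
(* Hypothetical sketch: the `entry label is in the graph' invariant enforced
   by a smart constructor rather than by a dependent pair. *)
type 'stmt internal_function = { code : (string * 'stmt) list; entry : string }

let make_function code entry =
  if List.mem_assoc entry code
  then Some { code; entry }  (* the invariant is checked exactly once, here *)
  else None
```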
---|
We make use of dependent types for another reason too: experimentation.
CompCert makes little use of dependent types to encode invariants.
In contrast, we wish to make as much use of dependent types as possible, both to experiment with different ways of encoding compilers in a proof assistant, and as a way of `stress testing' Matita's support for dependent types.

Moreover, at the moment we make practically no use of inductive predicates to specify compiler invariants or to describe the semantics of intermediate languages.
On the contrary, all predicates are computable functions.
Therefore, the proof style that we will adopt will necessarily be significantly different from, say, CompCert's.
At the moment, `Russell'-style reasoning in Matita (in the sense of~\cite{sozeau:subset:2006}) seems to be the best solution for working with computable functions.
This style is heavily based on the idea that all computable functions should be specified using dependent types to describe their pre- and post-conditions.
As a consequence, it is natural to add dependent types almost everywhere in the Matita compiler's codebase.

%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
% SECTION.                                                                    %
%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
\subsection{What we do not implement}
\label{subsect.what.we.do.not.implement}

There are several classes of functionality that we have chosen not to implement in the backend languages:
\begin{itemize}
\item
\textbf{Datatypes, and functions over these datatypes, that are not supported by the compiler.}
In particular, the compiler does not support the floating point datatype, nor the accompanying functions over that datatype.
At the moment, the frontend languages within the compiler possess constructors corresponding to floating point code.
These are removed during instruction selection (in the RTLabs to RTL transformation) using a daemon.\footnote{A Girardism: an axiom of type \texttt{False}, from which we can prove anything.}
However, at some point, we would like the front end of the compiler to recognise programs that use floating point code and reject them as invalid.
\item
\textbf{Axiomatised components that will be implemented using external oracles.}
Several large, complex pieces of compiler infrastructure, most noticeably register colouring and fixed point calculation during liveness analysis, have been axiomatised.
This was agreed upon before the start of the project, and is clearly marked in the project proposal, following comments by those involved with the CompCert project about the difficulty of formalising register colouring in that project.
Instead, these components are axiomatised, along with the properties that they need to satisfy in order for the rest of the compilation chain to be correct.
These axiomatised components are found in the ERTL to LTL pass.

It should be noted that these axiomatised components fall into a common pattern: whilst their implementation is complex, and their proof of correctness is difficult, we are able to quickly and easily verify that any answer that they provide is correct.
Therefore, we plan to provide implementations in OCaml only, and to provide certified verifiers in Matita.
At the moment, the implementation of the certified verifiers is left as future work.
\item
\textbf{A few non-computational proof obligations.}
A few difficult-to-close, but non-computational (i.e.\ they do not prevent us from executing the compiler inside Matita), proof obligations have been closed using daemons in the backend.
These proof obligations originate with our use of dependent types for expressing invariants in the compiler.
However, it should be mentioned that many open proof obligations are simply impossible to close until we start to obtain stronger invariants from the proof of correctness for the compiler proper.
In particular, in the RTLabs to RTL pass, several proof obligations relating to lists of registers stored in a `local environment' appear to fall into this pattern.
\item
\textbf{Branch compression (tunnelling).}
This was a feature of the O'Caml compiler that is not yet implemented in the Matita compiler.
It is only an optimisation, and does not affect the correctness of the compiler.
\item
\textbf{`Real' tailcalls.}
For the time being, tailcalls in the backend are translated to `vanilla' function calls during the ERTL to LTL pass.
This follows the O'Caml compiler, which did not implement tailcalls, and performed this simplification step.
`Real' tailcalls are being implemented in the O'Caml compiler, and when this implementation is complete, we aim to port the code to the Matita compiler.
\end{itemize}
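The oracle/verifier split mentioned above can be illustrated with a toy OCaml sketch (a deliberately naive graph `colouring'; none of these names are from the CerCo sources): the oracle may be arbitrarily complex and entirely untrusted, while only the much simpler checker would need to be certified, since it is easy to verify that a given answer is correct.

```ocaml
(* Interference edges between vertices. *)
type graph = (int * int) list

(* Untrusted oracle: any procedure producing a candidate colouring. *)
let oracle_colour (_ : graph) : int -> int = fun v -> v mod 2

(* Trusted checker: a linear scan verifying that no edge is monochromatic. *)
let check_colouring (edges : graph) (colour : int -> int) : bool =
  List.for_all (fun (u, v) -> colour u <> colour v) edges
```

Checking a colouring is linear in the number of edges, whereas producing a good one is the hard part; this asymmetry is exactly what makes the pattern attractive for certification.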
---|

%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
% SECTION.                                                                    %
%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
\section{Associated changes to the O'Caml compiler}
\label{sect.associated.changes.to.ocaml.compiler}

At the moment, only the bugfixes, and none of the architectural changes, that we have made in the Matita backend have made their way back into the O'Caml compiler.
We do not foresee the heavy process of modularisation and abstraction making its way back into the O'Caml codebase, as this would amount to a significant rewrite of the backend code, which is supposed to yield the same code after instantiating parameters anyway.

%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
% SECTION.                                                                    %
%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
\section{Future work}
\label{sect.future.work}

As mentioned in Section~\ref{subsect.what.we.do.not.implement}, there are several unimplemented features in the compiler, and several aspects of the Matita code that can be improved in order to make currently partial functions total.
We summarise this future work here:
\begin{itemize}
\item
We plan to make use of dependent types to identify `floating point free' programs and make all functions total over such programs.
This will remove a swathe of uses of daemons.
This should be routine.
\item
We plan to move the expansion of integer modulus, and other related functions, into the instruction selection (RTLabs to RTL) phase.
This will also help to remove a swathe of uses of daemons, as well as potentially introduce new opportunities for optimisations that we currently miss by expanding these instructions at the C-light level.
\item
We plan to close all existing proof obligations that are currently closed using daemons, arising from our use of dependent types in the backend.
However, many may not be closable until we have completed Deliverable D4.4, the certification of the whole compiler, as we may not have sufficiently strong invariants at the present time.
\item
We plan to port the O'Caml compiler's implementation of tailcalls when it is completed, and eventually to port the branch compression code currently in the O'Caml compiler to the Matita implementation.
This should not cause any major problems.
\item
We plan to validate the backend translations, removing any obvious bugs, by executing the translations inside Matita on small C programs.
This is not critical, as the certification process will find all bugs anyway.
\item
We plan to provide certified validators for all results provided by external oracles written in OCaml.
At the moment, we have axiomatised oracles for computing least fixpoints during liveness analysis, for colouring registers, and for branch displacement in the assembler code.
\end{itemize}

\section{The back-end intermediate languages' semantics in Matita}
\label{sect.backend.intermediate.languages.semantics.matita}

%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
% SECTION.                                                                    %
%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
\subsection{Abstracting related languages}
\label{subsect.abstracting.related.languages}

As mentioned in the report for Deliverable D4.2, a systematic process of abstraction over the OCaml code has taken place in the Matita encoding.
In particular, we have merged many of the syntaxes of the intermediate languages (i.e.\ RTL, ERTL, LTL and LIN) into a single `joint' syntax, which is parameterised by various types.
Intermediate languages equivalent to those present in the OCaml code can be recovered by specialising this joint structure.

As mentioned in the report for Deliverable D4.2, this process of abstraction brings a number of advantages, from code reuse to a clearer view of the intermediate languages and their structure.
The semantics of the intermediate languages allow us to demonstrate this improvement in clarity concretely, by noting that the semantics of the LTL and LIN languages are identical.
In particular, the semantics of both LTL and LIN are implemented in exactly the same way.
The only difference between the two languages is how the next instruction to be interpreted is fetched.
In LTL, this involves a lookup in a graph, whereas in LIN it involves fetching from a list of instructions.

As a result, we see that the semantics of LIN and LTL are both instances of a single, more general language that is parametric in how the next instruction is fetched.
Furthermore, any prospective proof that the semantics of LTL and LIN are identical is now almost trivial, saving a good deal of work in Deliverable D4.4.

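A toy OCaml sketch of this parametricity (with an invented two-instruction language; nothing here is taken from the CerCo sources): a single executor is written once against an abstract \texttt{fetch} function, and graph lookup versus list indexing then give LTL-style and LIN-style semantics respectively.

```ocaml
(* A minimal instruction set for illustration. *)
type instr =
  | Incr of int  (* increment the accumulator, continue at the given label *)
  | Halt

(* One executor, parametric in instruction fetch. *)
let rec run fetch pc acc =
  match fetch pc with
  | Some (Incr next) -> run fetch next (acc + 1)
  | Some Halt | None -> acc

(* LIN-style fetch: index into a list of statements. *)
let lin_fetch prog pc = List.nth_opt prog pc

(* LTL-style fetch: look the label up in a graph. *)
let ltl_fetch graph pc = List.assoc_opt pc graph
```

Proving the two semantics identical then reduces to relating the two \texttt{fetch} instantiations, rather than relating two separate interpreters.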
---|
%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
% SECTION.                                                                    %
%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
\subsection{Type parameters, and their purpose}
\label{subsect.type.parameters.their.purpose}

We mentioned in the Deliverable D4.2 report that all of the joint languages are parameterised by a number of types, which are later specialised to each distinct intermediate language.
As this parameterisation process also depends on design decisions in the language semantics, we have so far held off summarising the role of each parameter.

We begin the abstraction process with the \texttt{params\_\_} record.
This holds the types of the representations of the different register varieties in the intermediate languages:
\begin{lstlisting}[language=Grafite]
record params__: Type[1] :=
{
  acc_a_reg: Type[0];
  acc_b_reg: Type[0];
  dpl_reg: Type[0];
  dph_reg: Type[0];
  pair_reg: Type[0];
  generic_reg: Type[0];
  call_args: Type[0];
  call_dest: Type[0];
  extend_statements: Type[0]
}.
\end{lstlisting}
We summarise what these types mean, and how they are used in both the semantics and the translation process:
\begin{center}
\begin{tabular*}{\textwidth}{p{4cm}p{11cm}}
Type & Explanation \\
\hline
\texttt{acc\_a\_reg} & The type of the accumulator A register. In some languages this is implemented as the hardware accumulator, whereas in others it is a pseudoregister. \\
\texttt{acc\_b\_reg} & Similar to the accumulator A field, but for the processor's auxiliary accumulator, B. \\
\texttt{dpl\_reg} & The type of the representation of the low eight bits of the MCS-51's single 16-bit register, DPL. Can be either a pseudoregister or the hardware DPL register. \\
\texttt{dph\_reg} & Similar to the DPL register, but for the eight high bits of the 16-bit register. \\
\texttt{pair\_reg} & Various different `move' instructions have been merged into a single move instruction in the joint language. A value can be moved to or from the accumulator in some languages, or to and from an arbitrary pseudoregister in others. This type encodes how we should move data around the registers and accumulators. \\
\texttt{generic\_reg} & The representation of generic registers (i.e.\ those that are not devoted to a specific task). \\
\texttt{call\_args} & The actual arguments passed to a function. For some languages this is simply the number of arguments passed to the function. \\
\texttt{call\_dest} & The destination of the function call. \\
\texttt{extend\_statements} & Instructions that are specific to a particular intermediate language, and which cannot be abstracted into the joint language.
\end{tabular*}
\end{center}

As mentioned in the report for Deliverable D4.2, the record \texttt{params\_\_} is enough to specify the instructions of the joint languages:
\begin{lstlisting}[language=Grafite]
inductive joint_instruction (p: params__) (globals: list ident): Type[0] :=
| COMMENT: String $\rightarrow$ joint_instruction p globals
| COST_LABEL: costlabel $\rightarrow$ joint_instruction p globals
...
| OP1: Op1 $\rightarrow$ acc_a_reg p $\rightarrow$ acc_a_reg p $\rightarrow$ joint_instruction p globals
| COND: acc_a_reg p $\rightarrow$ label $\rightarrow$ joint_instruction p globals
...
\end{lstlisting}
Here, we see that the instruction \texttt{OP1} (a unary operation on the accumulator A) can be given quite a specific type, through the use of the \texttt{params\_\_} data structure.

Joint statements can be split into two subclasses: those that simply pass the flow of control on to their successor statement, and those that jump to a potentially remote location in the program.
Naturally, as some intermediate languages are graph based and others linearised, the act of passing control on to the `successor' instruction can either be the act of following an edge in a control flow graph, or of incrementing an index into a list.
We make a distinction between instructions that pass control on to their immediate successors and those that jump elsewhere in the program through the use of \texttt{succ}, denoting the immediate successor of the current instruction, in the \texttt{params\_} record described below.
\begin{lstlisting}[language=Grafite]
record params_: Type[1] :=
{
  pars__ :> params__;
  succ: Type[0]
}.
\end{lstlisting}
The type \texttt{succ} corresponds to labels in the case of control flow graph based languages, and is instantiated to the unit type for the linearised language, LIN.
Using \texttt{params\_} we can define the statements of the joint language:
\begin{lstlisting}[language=Grafite]
inductive joint_statement (p: params_) (globals: list ident): Type[0] :=
| sequential: joint_instruction p globals $\rightarrow$ succ p $\rightarrow$ joint_statement p globals
| GOTO: label $\rightarrow$ joint_statement p globals
| RETURN: joint_statement p globals.
\end{lstlisting}
Note that in the joint language, instructions are `linear', in that they have an immediate successor.
Statements, on the other hand, consist of either a linear instruction, or a \texttt{GOTO} or \texttt{RETURN} statement, both of which can jump to an arbitrary place in the program.
The conditional jump instruction \texttt{COND} is `linear', since it has an immediate successor, but it also takes an arbitrary location (a label) to jump to.

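The role of \texttt{succ} can be mimicked in OCaml with a type parameter, as in this invented sketch: graph-based languages instantiate the successor type with labels, while in the linearised language the successor is implicit (the next list element), so the parameter collapses to \texttt{unit}.

```ocaml
(* Hypothetical sketch of joint statements, parameterised by successor type. *)
type ('instr, 'succ) joint_statement =
  | Sequential of 'instr * 'succ
  | Goto of string
  | Return

(* RTL/ERTL/LTL-style: the successor is an explicit label in the graph. *)
let rtl_stmt : (string, string) joint_statement = Sequential ("nop", "l2")

(* LIN-style: the successor is just the next element of the list. *)
let lin_stmt : (string, unit) joint_statement = Sequential ("nop", ())
```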
---|
2241 | For the semantics, we need further parametererised types. |
---|
2242 | In particular, we parameterise the result and parameter type of an internal function call in \texttt{params0}: |
---|
2243 | \begin{lstlisting}[language=Grafite] |
---|
2244 | record params0: Type[1] := |
---|
2245 | { |
---|
2246 | pars__' :> params__; |
---|
2247 | resultT: Type[0]; |
---|
2248 | paramsT: Type[0] |
---|
2249 | }. |
---|
2250 | \end{lstlisting} |
---|
2251 | Here, \texttt{paramsT} and \texttt{resultT} typically are the (pseudo)registers that store the parameters and result of a function. |
---|
2252 | |
---|
2253 | We further extend \texttt{params0} with a type for local variables in internal function calls: |
---|
2254 | \begin{lstlisting}[language=Grafite] |
---|
2255 | record params1 : Type[1] := |
---|
2256 | { |
---|
2257 | pars0 :> params0; |
---|
2258 | localsT: Type[0] |
---|
2259 | }. |
---|
2260 | \end{lstlisting} |
---|
2261 | Again, we expand our parameters with types corresponding to the code representation (either a control flow graph or a list of statements). |
---|
2262 | Further, we hypothesise a generic method for looking up the next instruction in the graph, called \texttt{lookup}. |
---|
2263 | Note that \texttt{lookup} may fail, and returns an \texttt{option} type: |
---|
2264 | \begin{lstlisting}[language=Grafite] |
---|
2265 | record params (globals: list ident): Type[1] := |
---|
2266 | { |
---|
2267 | succ_ : Type[0]; |
---|
2268 | pars1 :> params1; |
---|
2269 | codeT : Type[0]; |
---|
2270 | lookup: codeT $\rightarrow$ label $\rightarrow$ option (joint_statement (mk_params_ pars1 succ_) globals) |
---|
2271 | }. |
---|
2272 | \end{lstlisting} |
---|
2273 | We now have what we need to define internal functions for the joint language. |
---|
2274 | The first two `universe' fields are only used in the compilation process, for generating fresh names, and do not affect the semantics. |
---|
2275 | The rest of the fields affect both compilation and semantics. |
---|
2276 | In particular, we have a description of the result, parameters and the local variables of a function. |
---|
2277 | Note also that we have lifted the hypothesised \texttt{lookup} function from \texttt{params} into a dependent sigma type, which combines a label (the entry and exit points of the control flow graph or list) combined with a proof that the label is in the graph structure: |
---|
2278 | \begin{lstlisting}[language=Grafite] |
---|
2279 | record joint_internal_function (globals: list ident) (p:params globals) : Type[0] := |
---|
2280 | { |
---|
2281 | joint_if_luniverse: universe LabelTag; |
---|
2282 | joint_if_runiverse: universe RegisterTag; |
---|
2283 | joint_if_result : resultT p; |
---|
2284 | joint_if_params : paramsT p; |
---|
2285 | joint_if_locals : localsT p; |
---|
2286 | joint_if_stacksize: nat; |
---|
2287 | joint_if_code : codeT ... p; |
---|
2288 | joint_if_entry : $\Sigma$l: label. lookup ... joint_if_code l $\neq$ None ?; |
---|
2289 | joint_if_exit : $\Sigma$l: label. lookup ... joint_if_code l $\neq$ None ? |
---|
2290 | }. |
---|
2291 | \end{lstlisting} |
---|
2292 | Naturally, a question arises as to why we have chosen to split up the parameterisation into so many intermediate records, each slightly extending earlier ones. |
---|
2293 | The reason is because some intermediate languages share a host of parameters, and only differ on some others. |
---|
2294 | For instance, in instantiating the ERTL language, certain parameters are shared with RTL, whilst others are ERTL specific: |
---|
2295 | \begin{lstlisting}[language=Grafite] |
---|
2296 | ... |
---|
2297 | definition ertl_params__: params__ := |
---|
2298 | mk_params__ register register register register (move_registers $\times$ move_registers) |
---|
2299 | register nat unit ertl_statement_extension. |
---|
2300 | ... |
---|
2301 | definition ertl_params1: params1 := rtl_ertl_params1 ertl_params0. |
---|
2302 | definition ertl_params: $\forall$globals. params globals := rtl_ertl_params ertl_params0. |
---|
2303 | ... |
---|
2304 | definition ertl_statement := joint_statement ertl_params_. |
---|
2305 | |
---|
2306 | definition ertl_internal_function := |
---|
2307 | $\lambda$globals.joint_internal_function ... (ertl_params globals). |
---|
2308 | \end{lstlisting} |
---|
2309 | Here, \texttt{rtl\_ertl\_params1} are the common parameters of the ERTL and RTL languages: |
---|
2310 | \begin{lstlisting}[language=Grafite] |
---|
2311 | definition rtl_ertl_params1 := $\lambda$pars0. mk_params1 pars0 (list register). |
---|
2312 | \end{lstlisting} |
---|
2313 | |
---|
2314 | The record \texttt{more\_sem\_params} bundles together functions that store and retrieve values in various forms of register: |
---|
\begin{lstlisting}[language=Grafite]
record more_sem_params (p:params_): Type[1] :=
{
  framesT: Type[0];
  empty_framesT: framesT;

  regsT: Type[0];
  empty_regsT: regsT;

  call_args_for_main: call_args p;
  call_dest_for_main: call_dest p;

  greg_store_: generic_reg p $\rightarrow$ beval $\rightarrow$ regsT $\rightarrow$ res regsT;
  greg_retrieve_: regsT $\rightarrow$ generic_reg p $\rightarrow$ res beval;
  acca_store_: acc_a_reg p $\rightarrow$ beval $\rightarrow$ regsT $\rightarrow$ res regsT;
  acca_retrieve_: regsT $\rightarrow$ acc_a_reg p $\rightarrow$ res beval;
  ...
  dpl_store_: dpl_reg p $\rightarrow$ beval $\rightarrow$ regsT $\rightarrow$ res regsT;
  dpl_retrieve_: regsT $\rightarrow$ dpl_reg p $\rightarrow$ res beval;
  ...
  pair_reg_move_: regsT $\rightarrow$ pair_reg p $\rightarrow$ res regsT;
}.
\end{lstlisting}
Here, the fields \texttt{empty\_framesT}, \texttt{empty\_regsT}, \texttt{call\_args\_for\_main} and \texttt{call\_dest\_for\_main} are used for state initialisation.

The fields \texttt{greg\_store\_} and \texttt{greg\_retrieve\_} store and retrieve values from a generic register, respectively.
Similarly, \texttt{pair\_reg\_move\_} implements the generic move instruction of the joint language.
Here \texttt{framesT} is the type of stack frames, with \texttt{empty\_framesT} an empty stack frame.

The two hypothesised values \texttt{call\_args\_for\_main} and \texttt{call\_dest\_for\_main} deal with the problem of how the \texttt{main} function of the program is handled.
In particular, we need to know when the \texttt{main} function has finished executing.
This is complicated, in C, by the fact that the \texttt{main} function is explicitly allowed to be recursive (something disallowed in C++).
Therefore, to determine whether an exiting \texttt{main} function is really exiting, or merely returning from a recursive call to itself, we need to remember the address to which \texttt{main} will return control once the initial call to \texttt{main} has finished executing.
This is done with \texttt{call\_dest\_for\_main}, whereas \texttt{call\_args\_for\_main} holds the \texttt{main} function's arguments.

We extend \texttt{more\_sem\_params} with yet more parameters via \texttt{more\_sem\_params1}:
2351 | \begin{lstlisting}[language=Grafite] |
---|
2352 | record more_sem_params1 (globals: list ident) (p: params globals) : Type[1] := |
---|
2353 | { |
---|
2354 | more_sparams1 :> more_sem_params p; |
---|
2355 | |
---|
2356 | succ_pc: succ p $\rightarrow$ address $\rightarrow$ res address; |
---|
2357 | pointer_of_label: genv ... p $\rightarrow$ pointer $\rightarrow$ |
---|
2358 | label $\rightarrow$ res ($\Sigma$p:pointer. ptype p = Code); |
---|
2359 | ... |
---|
2360 | fetch_statement: |
---|
2361 | genv ... p $\rightarrow$ state (mk_sem_params ... more_sparams1) $\rightarrow$ |
---|
2362 | res (joint_statement (mk_sem_params ... more_sparams1) globals); |
---|
2363 | ... |
---|
2364 | save_frame: |
---|
2365 | address $\rightarrow$ nat $\rightarrow$ paramsT ... p $\rightarrow$ call_args p $\rightarrow$ call_dest p $\rightarrow$ |
---|
2366 | state (mk_sem_params ... more_sparams1) $\rightarrow$ |
---|
2367 | res (state (mk_sem_params ... more_sparams1)); |
---|
2368 | pop_frame: |
---|
2369 | genv globals p $\rightarrow$ state (mk_sem_params ... more_sparams1) $\rightarrow$ |
---|
2370 | res ((state (mk_sem_params ... more_sparams1))); |
---|
2371 | ... |
---|
2372 | set_result: |
---|
2373 | list val $\rightarrow$ state (mk_sem_params ... more_sparams1) $\rightarrow$ |
---|
2374 | res (state (mk_sem_params ... more_sparams1)); |
---|
2375 | exec_extended: |
---|
2376 | genv globals p $\rightarrow$ extend_statements (mk_sem_params ... more_sparams1) $\rightarrow$ |
---|
2377 | succ p $\rightarrow$ state (mk_sem_params ... more_sparams1) $\rightarrow$ |
---|
2378 | IO io_out io_in (trace $\times$ (state (mk_sem_params ... more_sparams1))) |
---|
2379 | }. |
---|
2380 | \end{lstlisting} |
---|
The field \texttt{succ\_pc} takes an address and a `successor' label, and returns the address of the instruction immediately succeeding the one at hand.

Here, \texttt{fetch\_statement} fetches the next statement to be executed.
The fields \texttt{save\_frame} and \texttt{pop\_frame} manipulate stack frames.
In particular, \texttt{save\_frame} creates a new stack frame on the top of the stack, saving the destination and parameters of a function, and returning an updated state.
The field \texttt{pop\_frame} destructively pops a stack frame from the stack, returning an updated state.
Further, \texttt{set\_result} saves the result of the function's computation, and \texttt{exec\_extended} executes the extended statements peculiar to each individual intermediate language.

We bundle \texttt{params} and \texttt{more\_sem\_params2} together into a single record, \texttt{sem\_params2}.
This record is used in the function \texttt{eval\_statement}, which executes a single statement of the joint language:
\begin{lstlisting}[language=Grafite]
record sem_params2 (globals: list ident): Type[1] :=
{
  p2 :> params globals;
  more_sparams2 :> more_sem_params2 globals p2
}.
\end{lstlisting}
\noindent
The \texttt{state} record holds the current state of the interpreter:
\begin{lstlisting}[language=Grafite]
record state (p: sem_params): Type[0] :=
{
  st_frms: framesT ? p;
  pc: address;
  sp: pointer;
  isp: pointer;
  carry: beval;
  regs: regsT ? p;
  m: bemem
}.
\end{lstlisting}
Here \texttt{st\_frms} represents the stack frames, \texttt{pc} the program counter, \texttt{sp} the stack pointer, \texttt{isp} the internal stack pointer, \texttt{carry} the carry flag, \texttt{regs} the registers (hardware and pseudoregisters) and \texttt{m} external RAM.
Note that we have two stack pointers, as we have two stacks: the physical stack of the MCS-51 microprocessor, and an emulated stack in external RAM.
The MCS-51's own stack is minuscule, so it is usual to emulate a much larger, more useful stack in external RAM.
We require two stack pointers because the MCS-51's \texttt{PUSH} and \texttt{POP} instructions manipulate the physical stack, and not the emulated one.

We use the function \texttt{eval\_statement} to evaluate a single joint statement:
\begin{lstlisting}[language=Grafite]
definition eval_statement:
 $\forall$globals: list ident.$\forall$p:sem_params2 globals.
  genv globals p $\rightarrow$ state p $\rightarrow$ IO io_out io_in (trace $\times$ (state p)) :=
...
\end{lstlisting}
We examine the type of this function.
Note that it returns a monadic action, \texttt{IO}, denoting that it may have an IO \emph{side effect}, where the program reads or writes to some external device or memory address.
Monads and their use are further discussed in Subsection~\ref{subsect.use.of.monads}.
Further, the function returns a new state, updated by the single step of execution of the program.
Finally, a \emph{trace} is also returned, which records externally observable `events', such as the calling of external functions and the emission of cost labels.
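The overall shape of such a step function can be sketched in OCaml. This is a simplified sketch with invented types: the real \texttt{eval\_statement} operates over the joint language's states and the \texttt{IO} monad, whereas here failure is modelled with the standard \texttt{result} type and the trace as a plain list of events.
\begin{lstlisting}[language=Grafite]
(* Simplified sketch with invented types, standing in for the Matita
   ones.  One step either fails or returns a trace of observable
   events paired with an updated state. *)
type event = Cost_label of string | External_call of string
type state = { pc : int; acc : int }

let eval_statement (st : state) : (event list * state, string) result =
  if st.pc < 0 then Error "invalid program counter"
  else
    (* Executing one statement here emits a cost label and advances
       the program counter. *)
    Ok ([ Cost_label "k1" ], { st with pc = st.pc + 1 })
\end{lstlisting}
A driver loop would thread the state through repeated calls, concatenating the traces; this is the pattern the joint semantics follows, with the monad handling the threading.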

%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
% SECTION.                                                                     %
%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
\subsection{Use of monads}
\label{subsect.use.of.monads}

Monads are a categorical notion that has recently gained traction in functional programming circles.
In particular, it was noted by Moggi that monads could be used to sequence \emph{effectful} computations in a pure manner.
Here, `effectful computations' covers a lot of ground, from writing to files and generating fresh names to updating an ambient notion of state.

A monad can be characterised by the following:
\begin{itemize}
\item
A data type, $M$.
For instance, the \texttt{option} type in OCaml or Matita.
\item
A way to `inject' or `lift' pure values into this data type, usually called \texttt{return}, which must have type $\alpha \rightarrow M \alpha$, where $M$ is the name of the monad.
In our example, the `lifting' function for the \texttt{option} monad can be implemented as:
\begin{lstlisting}[language=Grafite]
let return x = Some x
\end{lstlisting}
\item
A way to `sequence' monadic functions together to form another monadic function, usually called \texttt{bind}, with type $M \alpha \rightarrow (\alpha \rightarrow M \beta) \rightarrow M \beta$.
We can see that bind `unpacks' a monadic value, applies a function to the unpacked value, and `repacks' the result in the monad.
In our example, the sequencing function for the \texttt{option} monad can be implemented as:
\begin{lstlisting}[language=Grafite]
let bind o f =
  match o with
    None -> None
  | Some s -> f s
\end{lstlisting}
\item
A series of algebraic laws relating \texttt{return} and \texttt{bind}, ensuring that the sequencing operation `does the right thing' by retaining the order of effects.
These \emph{monad laws} should also be useful when reasoning about monadic computations in the proof of correctness of the compiler.
\end{itemize}
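The pieces above can be assembled into a complete OCaml example, sequencing a partial function with \texttt{bind}; the monad laws are checked here on sample values (by testing, not by proof).
\begin{lstlisting}[language=Grafite]
(* The option monad: return and bind as given above. *)
let return x = Some x

let bind o f =
  match o with
    None -> None
  | Some s -> f s

(* A partial function to sequence: division fails on a zero divisor. *)
let safe_div x y = if y = 0 then None else Some (x / y)

(* Sequencing two divisions: any intermediate failure propagates
   as None without further computation. *)
let example = bind (safe_div 100 5) (fun a -> safe_div a 2)
\end{lstlisting}
Here \texttt{example} evaluates to \texttt{Some 10}, while replacing either divisor with zero collapses the whole chain to \texttt{None}.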
We make use of monads in the semantics of both the front-end and back-end intermediate languages, and this monadic infrastructure is shared between the two.

In particular, an `IO' monad, signalling the emission of a cost label or the calling of an external function, is heavily used in the semantics of the intermediate languages.
Here, the monad's sequencing operation ensures that cost label emissions and function calls are maintained in the correct order.
We have already seen that the \texttt{eval\_statement} function of the joint language is monadic, with type:
\begin{lstlisting}[language=Grafite]
definition eval_statement:
 $\forall$globals: list ident.$\forall$p:sem_params2 globals.
  genv globals p $\rightarrow$ state p $\rightarrow$ IO io_out io_in (trace $\times$ (state p)) :=
...
\end{lstlisting}
If we examine the body of \texttt{eval\_statement}, we may also see how the monad sequences effects.
For instance, in the case for the \texttt{LOAD} statement, we have the following:
\begin{lstlisting}[language=Grafite]
definition eval_statement:
 $\forall$globals: list ident. $\forall$p:sem_params2 globals.
  genv globals p $\rightarrow$ state p $\rightarrow$ IO io_out io_in (trace $\times$ (state p)) :=
 $\lambda$globals, p, ge, st.
 ...
 match s with
 | LOAD dst addrl addrh $\Rightarrow$
   ! vaddrh $\leftarrow$ dph_retrieve ... st addrh;
   ! vaddrl $\leftarrow$ dpl_retrieve ... st addrl;
   ! vaddr $\leftarrow$ pointer_of_address $\langle$vaddrl,vaddrh$\rangle$;
   ! v $\leftarrow$ opt_to_res ... (msg FailedLoad) (beloadv (m ... st) vaddr);
   ! st $\leftarrow$ acca_store p ... dst v st;
   ! st $\leftarrow$ next ... l st;
   ret ? $\langle$E0, st$\rangle$
\end{lstlisting}
Here, we employ a degree of syntactic sugar.
The syntax
\begin{lstlisting}[language=Grafite]
...
! vaddrh $\leftarrow$ dph_retrieve ... st addrh;
! vaddrl $\leftarrow$ dpl_retrieve ... st addrl;
...
\end{lstlisting}
is syntactic sugar for the \texttt{IO} monad's binding operation.
We can expand this sugar into the following, much more verbose, code:
\begin{lstlisting}[language=Grafite]
...
bind (dph_retrieve ... st addrh) ($\lambda$vaddrh. bind (dpl_retrieve ... st addrl)
 ($\lambda$vaddrl. ...))
\end{lstlisting}
Note also that the function \texttt{ret} implements the `lifting', or return, function of the \texttt{IO} monad.

We believe the sugared syntax for the monadic bind operation makes the program much more readable, and therefore easier to reason about.
In particular, note that the functions \texttt{dph\_retrieve}, \texttt{pointer\_of\_address}, \texttt{acca\_store} and \texttt{next} are all monadic.

Note, however, that another monad is hiding inside this monadic code.
The \texttt{res} monad signals failure, along with an error message.
This monad's sequencing operation ensures that the order of error messages does not get rearranged.
The function \texttt{opt\_to\_res} lifts an option type into this monad, with an error message to be used in case of failure.
The \texttt{res} monad is then coerced into the \texttt{IO} monad, ensuring the whole code snippet typechecks.
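A hedged OCaml sketch of such a \texttt{res}-style monad follows; the constructor and function names are chosen to mirror, but are not, the Matita definitions.
\begin{lstlisting}[language=Grafite]
(* Illustrative sketch: a failure monad carrying an error message. *)
type 'a res = OK of 'a | Error of string

(* Sequencing: the first failure short-circuits the computation,
   preserving its error message. *)
let bind (r : 'a res) (f : 'a -> 'b res) : 'b res =
  match r with
    OK v -> f v
  | Error msg -> Error msg

(* Lift an option into res, supplying the message to be used on
   failure, in the style of opt_to_res. *)
let opt_to_res (msg : string) (o : 'a option) : 'a res =
  match o with
    Some v -> OK v
  | None -> Error msg
\end{lstlisting}
A failed lookup such as the \texttt{beloadv} call in the \texttt{LOAD} case then surfaces as an \texttt{Error} carrying its message, rather than as a bare \texttt{None}.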

\subsection{Memory models}
\label{subsect.memory.models}

Currently, the semantics of the front-end and back-end intermediate languages are built around two distinct memory models.
The front-end languages reuse the CompCert 1.6 memory model, whereas the back-end languages employ a version tailored to their needs.
This split reflects the fact that the front-end and back-end languages have different requirements of their memory models.

In particular, the CompCert 1.6 memory model places quite heavy restrictions on where in memory one can read from.
To read a value in this memory model, one must supply an address, together with the size of the `chunk' to read at that address.
The read is only successful if one attempts to read at a genuine `value boundary', reading the appropriate amount of memory for that value.
As a result, in that memory model one cannot, for instance, read the third byte of a 32-bit integer value directly from memory.
This has consequences for the compiler, namely an inability to write a \texttt{memcpy} routine.
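The contrast can be sketched in OCaml over a hypothetical toy memory (not either formalisation's actual model): a chunk-based load succeeds only when it starts at a value boundary and requests the value's full size, whereas a byte-based load may read anywhere.
\begin{lstlisting}[language=Grafite]
(* Toy memory: each cell records its byte, whether it begins a stored
   value, and that value's size in bytes.  Names are invented. *)
type cell = { byte : int; starts_value : bool; value_size : int }

(* Chunk-based load (CompCert 1.6 style): succeeds only at a value
   boundary, with the requested size matching the stored value. *)
let load_chunk mem addr size =
  let c = mem.(addr) in
  if c.starts_value && c.value_size = size then Some c.byte else None

(* Byte-based load (CerCo back-end style): reads any single byte. *)
let load_byte mem addr = mem.(addr).byte

(* A 32-bit value stored at address 0 as four bytes. *)
let mem =
  Array.mapi
    (fun i b -> { byte = b; starts_value = (i = 0); value_size = 4 })
    [| 0x12; 0x34; 0x56; 0x78 |]
\end{lstlisting}
Under this sketch, reading the third byte of the stored value fails in the chunk-based model but succeeds in the byte-based one, which is precisely what a byte-at-a-time \texttt{memcpy} needs.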

The CerCo back-end memory model operates differently, however, as we need to move data `piecemeal' between stacks in the back-end of the compiler.
As a result, the back-end memory model allows one to read data at any memory location, not just on value boundaries.
This has the advantage that we can successfully give a semantics to a \texttt{memcpy} routine in the back-end of the CerCo compiler (remembering that \texttt{memcpy} is nothing more than `read a byte, copy a byte' repeated in a loop), an advantage over CompCert.
The front-end of CerCo cannot, however, as its memory model and values are similar to those of CompCert 1.6.

More recent versions of CompCert's memory model have evolved in a similar direction, with a byte-by-byte representation of memory blocks.
However, there remains an important difference in the handling of pointer values in the rest of the formalisation.
In particular, in CompCert 1.10 only complete pointer values can be loaded in all of the languages in the compiler, whereas in CerCo we need to represent individual bytes of a pointer in the back-end to support our 8-bit target architecture.
2541 | |
---|
2542 | Right now, the two memory models are interfaced during the translation from RTLabs to RTL. |
---|
2543 | It is an open question whether we will unify the two memory models, using only the back-end, bespoke memory model throughout the compiler, as the CompCert memory model seems to work fine for the front-end, where such byte-by-byte copying is not needed. |
---|
2544 | However, should we decide to port the front-end to the new memory model, it has been written in such an abstract way that doing so would be relatively straightforward. |
---|
2545 | |
---|
2546 | |
---|