% Compiler proof
%   Structure of proof, and high-level discussion
%   Technical devices: structured traces, labelling, etc.
%   Assembler proof
%   Technical issues in front end (Brian?)
%   Main theorem statement

\section{Compiler proof}
\label{sect.compiler.proof}

\subsection{Labelling approach}

In this section, we will explore the labelling approach mentioned in the introduction in more detail.

The `labelling' approach to the problem of building
cost annotations is summarized in the following diagram.

\newcolumntype{M}[1]{>{\raggedright}m{#1}}

\-

\begin{tabular}{M{10cm}||M{4cm}}
$$
\xymatrix{
% Row 1
  L_1 \\
%
% Row 2
  L_{1,\ell}
  \ar[u]^{\cl{I}}
  \ar@/^/[d]^{\w{er}_1}
  \ar[r]^{\cl{C}_1}
%
& L_{2,\ell}
  \ar[d]^{\w{er}_2}
%
& \ldots \hspace{0.3cm}\ar[r]^{\cl{C}_k}
%
& L_{k+1, \ell}
  \ar[d]^{\w{er}_{k+1}}  \\
%
% Row 3
  L_1
  \ar@/^/[u]^{\cl{L}}
  \ar[r]^{\cl{C}_1}
& L_2
& \ldots\hspace{0.3cm}
  \ar[r]^{\cl{C}_{k}}
& L_{k+1}
}
$$
&
$$
\begin{array}{ccc}
\w{er}_{i+1} \comp \cl{C}_{i} &= &\cl{C}_{i} \comp \w{er}_{i} \\

\w{er}_{1} \comp \cl{L} &= &\w{id}_{L_{1}} \\

\w{An}_{1} &= &\cl{I} \comp \cl{L}
\end{array}
$$
\end{tabular}

\-

For each language $L_i$ considered in the compilation process,
we define an extended {\em labelled} language $L_{i,\ell}$ and an
extended operational semantics. The labels are used to mark
certain points of the control flow. The semantics ensures
that whenever we cross a labelled control point a labelled and observable
transition is produced.

For each labelled language there is an obvious function $\w{er}_i$
erasing all labels and producing a program in the corresponding
unlabelled language.
The compilation functions $\cl{C}_i$ are extended from the unlabelled
to the labelled language so that they commute
with the erasure functions. Moreover, we lift the soundness properties of
the compilation functions from the unlabelled to the labelled languages
and transition systems.

A {\em labelling} $\cl{L}$ of the source language $L_1$ is just a function
such that $\w{er}_{1} \comp \cl{L}$ is the identity function.
An {\em instrumentation} $\cl{I}$ of the source labelled language $L_{1,\ell}$
is a function replacing the labels with suitable increments of, say,
a fresh \w{cost} variable. Then an {\em annotation} $\w{An}_{1}$ of the
source program can be derived simply as the composition of the labelling
and the instrumentation functions: $\w{An}_{1} = \cl{I}\comp \cl{L}$.

As for the direct approach, suppose $s$ is some adequate representation
of the state of a program. Let $S$ be a source program and suppose
that its annotation satisfies the following property:
\begin{equation}\label{STEP1}
(\w{An}_{1}(S),s[c/\w{cost}])  \eval s'[c+\delta/\w{cost}]
\end{equation}
where $\delta$ is some non-negative number.
Then the definition of the instrumentation and the fact that
the soundness proofs of the compilation functions have been lifted
to the labelled languages allow us to conclude that
\begin{equation}\label{step2}
(\cl{C}(\cl{L}(S)),s[c/\w{cost}])  \eval (s'[c/\w{cost}],\lambda)
\end{equation}
where $\cl{C} = \cl{C}_{k} \comp \cdots \comp \cl{C}_{1}$ and
$\lambda$ is a sequence (or a multi-set) of labels whose `cost'
corresponds to the number $\delta$ produced by the
annotated program.
Then the commutation properties of the erasure and compilation functions
allow us to conclude that the {\em erasure} of the compiled
labelled code $\w{er}_{k+1}(\cl{C}(\cl{L}(S)))$ is actually equal to
the compiled code $\cl{C}(S)$ we are interested in.
Given this, the following question arises:

\begin{quote}
Under which conditions is the sequence $\lambda$, {\em i.e.}, the
increment $\delta$, a sound and possibly precise description of the
execution cost of the object code?
\end{quote}

To answer this question, we observe that the object code we are interested in is some kind of assembly code
and its control flow can be easily represented as a control flow
graph. The fact that we have to prove the soundness of the compilation
functions means that we have plenty of pieces of information
on the way the control flows in the compiled code, in particular as
far as procedure calls and returns are concerned.
These pieces of information allow us to build a rather accurate representation
of the control flow of the compiled code at run time.

The idea is then to perform two simple checks on the control flow
graph. The first check is to verify that all loops go through a labelled node.
If this is the case then we can associate a finite cost with
every label and prove that the cost annotations are sound.
The second check amounts to verifying that all paths starting from
a label have the same cost. If this check is successful then we can
conclude that the cost annotations are precise.

This approach, as published in~\cite{Ayache2012}, is the one used in the CerCo project. However, it has some weaknesses, which we will describe in the next subsection. We also propose a method to resolve these weaknesses.

\subsection{Indexed labelling}

The labelling approach as described above does not deal well with some optimisations the compiler might introduce, such as changes in the control flow. For example, the 8051 architecture does not have an instruction for
multi-word division, which means that a function containing loops has to be inserted into the code. In fact, this is symptomatic of a more general weakness: the inability to state different costs for different occurrences of labels,
where the difference might originate from labels being duplicated along the compilation chain, or from costs being sensitive to the current state of execution.

We present a solution that addresses this weakness by introducing cost labels that are dependent on the iteration of their containing loop.
This is achieved by means of \emph{indexed labels}; all cost labels are decorated with formal indices coming from the loops containing such labels.
These indices allow us to keep track, even after multiple loop transformations, of which iteration of the original loop in the source code a particular label occurrence belongs to.
During the annotation stage of the source code, this information is presented to the user by means of \emph{dependent costs}.

Over the course of this section, we will use two specific types of optimisations to introduce the indexed labelling approach. We argue that these two types pose a good foundation for extending the labelling approach
to cover more common optimisations, as well as for gaining insight into how to integrate advanced cost estimation techniques, such as cache analysis, into the CerCo compiler.
Moreover, loop peeling itself has the fortuitous property of enhancing and enabling other optimisations.
Experimentation with CerCo's untrusted prototype compiler, which implements constant propagation and partial redundancy elimination~\cite{PRE,muchnick}, shows how loop peeling enhances those other optimisations.

\paragraph{Loop peeling}
Loop peeling consists of preceding the loop with a copy of its body, appropriately guarded.
This is used, in general, to trigger further optimisations, such as those that rely on execution information which can be computed at compile time, but which is erased by further iterations of the loop, or those that use the hoisted code to be more effective at eliminating redundant code.
Integrating this transformation into the labelling approach would also allow the integration of the common case of cache analysis explained above; the analysis of cache hits and misses usually benefits from a form of \emph{virtual} loop peeling~\cite{cacheprediction}.

\paragraph{Loop unrolling}
This optimisation consists of the repetition of several copies of the body of the loop inside the loop itself (inserting appropriate guards, or avoiding them altogether if enough information about the loop's guard is available at compile time).
This can limit the number of (conditional or unconditional) jumps executed by the code and trigger further optimisations dealing with pipelining, if appropriate for the architecture.

\subsubsection{\imp{} with goto}\label{sec:defimp}
We briefly outline a toy language, \imp{} with \s{goto}s.
This language was designed in order to pose problems for the existing labelling approach, and as a testing ground for our new notion of indexed labels.

The syntax and operational semantics of \imp{} are presented in Figure~\ref{fig:minimp}.
Note that we can augment the language with \s{break} and \s{continue} statements at no further expense.

\begin{figureplr}
$$\begin{gathered}
\begin{array}{nlBl>(R<)n}
\multicolumn{4}{C}{\bfseries Syntax}\\
\multicolumn{4}{ncn}{
  \ell,\ldots \hfill \text{(labels)} \hfill x,y,\ldots \hfill
\text{(identifiers)}
\hfill e,f,\ldots \hfill \text{(expressions)}
}\\
P,S,T,\ldots &\gramm& \s{skip} \mid s;t
\mid \sop{if}e\sbin{then}s\sbin{else}t
\mid \sop{while} e \sbin{do} s \mid
  x \ass e
\\&\mid&
\ell : s \mid \sop{goto}\ell& \spanr{-2}{statements}\\
\\
\multicolumn{4}{C}{\bfseries Semantics}\\
K,\ldots  &\gramm& \s{halt} \mid S \cdot K & continuations
\end{array}
\\[15pt]
\s{find}(\ell,S,K) \ass
\left\{\begin{array}{lL}
\bot & if $S=\s{skip}$, $\sop{goto} \ell'$ or $x\ass e$,\\
(T, K) & if $S=\ell:T$,\\
\s{find}(\ell,T,K) & otherwise, if $S = \ell':T$,\\
\s{find}(\ell,T_1,T_2\cdot K) & if defined and $S=T_1;T_2$,\\
\s{find}(\ell,T_1,K) & if defined and
$S=\sop{if}b\sbin{then}T_1\sbin{else}T_2$,\\
\s{find}(\ell,T_2,K) & otherwise, if $S=T_1;T_2$ or
$\sop{if}b\sbin{then}T_1\sbin{else}T_2$,\\
\s{find}(\ell,T,S\cdot K) & if $S = \sop{while}b\sbin{do}T$.
\end{array}\right.
\\[15pt]
\begin{array}{lBl}
(x\ass e,K,s)  &\to_P& (\s{skip},K,s[v/x]) \qquad\mbox{if }(e,s)\eval v \\ \\

(S;T,K,s)  &\to_P& (S,T\cdot K,s) \\ \\

(\s{if} \ b \ \s{then} \ S \ \s{else} \ T,K,s)
&\to_P&\left\{
\begin{array}{ll}
(S,K,s) &\mbox{if }(b,s)\eval v \neq 0 \\
(T,K,s) &\mbox{if }(b,s)\eval 0
\end{array}
\right. \\ \\

(\s{while} \ b \ \s{do} \ S ,K,s)
&\to_P&\left\{
\begin{array}{ll}
(S,\s{while} \ b \ \s{do} \ S \cdot K,s) &\mbox{if }(b,s)\eval v \neq 0 \\
(\s{skip},K,s) &\mbox{if }(b,s)\eval 0
\end{array}
\right. \\ \\

(\s{skip},S\cdot K,s)  &\to_P&(S,K,s) \\ \\

(\ell : S, K, s)  &\to_P& (S,K,s) \\ \\

(\sop{goto}\ell,K,s)  &\to_P& (\s{find}(\ell,P,\s{halt}),s)
\end{array}
\end{gathered}$$
\caption{The syntax and operational semantics of \imp.}
\label{fig:minimp}
\end{figureplr}

The precise grammar for expressions is not particularly relevant, so we do not give one in full.
For the sake of conciseness we also treat boolean and arithmetic expressions together (with the usual \textsc{c} convention of an expression being true iff non-zero).
We may omit the \s{else} clause of a conditional if it leads to a \s{skip} statement.

We will presuppose that all programs are \emph{well-labelled}, \ie every label labels at most one occurrence of a statement in a program, and every \s{goto} points to a label actually present in the program.
The \s{find} helper function has the task of not only finding the labelled statement in the program, but also building the correct continuation.
The continuation built by \s{find} replaces the current continuation in the case of a jump.

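To make the construction of continuations concrete, the following \ocaml{} sketch shows one way \s{find} could be implemented over an abstract syntax tree for \imp{}; the datatype and function names are ours, chosen for illustration, and do not come from the CerCo sources.
\begin{lstlisting}[language=[Objective]Caml]
(* Statements of Imp, parameterised over the type of expressions. *)
type 'e stmt =
  | Skip
  | Seq    of 'e stmt * 'e stmt
  | If     of 'e * 'e stmt * 'e stmt
  | While  of 'e * 'e stmt
  | Assign of string * 'e
  | Label  of string * 'e stmt
  | Goto   of string

(* Continuations: halt, or a statement pushed onto a continuation. *)
type 'e cont = Halt | Push of 'e stmt * 'e cont

(* find l s k mirrors the clauses of the figure: it returns the
   statement labelled l in s together with the continuation built
   along the way, or None (the bottom case) if l does not occur. *)
let rec find l s k =
  match s with
  | Skip | Goto _ | Assign _ -> None
  | Label (l', t) -> if l = l' then Some (t, k) else find l t k
  | Seq (t1, t2) ->
      (match find l t1 (Push (t2, k)) with
       | Some _ as r -> r
       | None -> find l t2 k)
  | If (_, t1, t2) ->
      (match find l t1 k with
       | Some _ as r -> r
       | None -> find l t2 k)
  | While (_, t) -> find l t (Push (s, k))
\end{lstlisting}
Well-labelledness guarantees that at most one branch of the search can succeed, so the order in which the alternatives are tried does not affect the result.
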
\paragraph{Further down the compilation chain}
We abstract over the rest of the compilation chain.
For every language $L$ further down the compilation chain, we posit the existence of a suitable notion of `sequential instructions', wherein each instruction has a single natural successor to which we can add our own.

\subsubsection{Loop transformations}
We call a loop $L$ \emph{single-entry} in $P$ if there is no \s{goto} in $P$ outside of $L$ which jumps into $L$.\footnote{This is a reasonable approximation: it defines a loop as multi-entry if it has an external but unreachable \s{goto} jumping into it.}
Many loop optimisations do not preserve the semantics of multi-entry loops in general, or are otherwise rendered ineffective.
Usually compilers implement a single-entry loop detection which prevents multi-entry loops from being targeted by optimisations~\cite{muchnick,morgan}.
The loop transformations we present are local, \ie they target a single loop and transform it.
Which loops are targeted may be decided by some \emph{ad hoc} heuristic.
However, the precise details of which loops are targeted and how are not important here.

\paragraph{Loop peeling}
$$
\sop{while}b\sbin{do}S \mapsto \sop{if}b\sbin{then} S; \sop{while} b \sbin{do} S[\ell'_i/\ell_i]
$$
where $\ell'_i$ is a fresh label for any $\ell_i$ labelling a statement in $S$.
This relabelling is safe for \s{goto}s occurring outside the loop because of the single-entry condition.
Note that any \s{break} and \s{continue} statements in the peeled body $S$ must be replaced with \s{goto}s.

\paragraph{Loop unrolling}
$$
\sop{while}b\sbin{do}S\mapsto
\sop{while} b \sbin{do} (S ;
  \sop{if} b \sbin{then} (S[\ell^1_i/\ell_i] ;
  \cdots
  \sop{if} b \sbin{then} S[\ell^n_i/\ell_i]) \cdots)
$$
where $\ell^j_i$ are again fresh labels for any $\ell_i$ labelling a statement in $S$.
This is a wilfully na\"{i}ve version of loop unrolling, which usually targets less general loops.
The problems this transformation poses to CerCo's labelling approach are independent of the sophistication of the actual transformation.

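For reference, here is a minimal \ocaml{} sketch of the two transformations over the abstract syntax introduced above (types repeated for self-containment); the relabelling $S[\ell'_i/\ell_i]$ is omitted as orthogonal, and the names are again ours:
\begin{lstlisting}[language=[Objective]Caml]
type 'e stmt =
  | Skip | Goto of string | Assign of string * 'e
  | Seq of 'e stmt * 'e stmt
  | If of 'e * 'e stmt * 'e stmt
  | While of 'e * 'e stmt
  | Label of string * 'e stmt

(* while b do s  |->  if b then (s; while b do s) else skip *)
let peel = function
  | While (b, s) as w -> If (b, Seq (s, w), Skip)
  | s -> s

(* while b do s  |->  while b do (s; if b then (s; ... if b then s))
   with n copies of the body in total, every copy after the first
   guarded by the loop condition. *)
let unroll n = function
  | While (b, s) when n >= 1 ->
      let rec body k =
        if k = 1 then s
        else Seq (s, If (b, body (k - 1), Skip)) in
      While (b, body n)
  | s -> s
\end{lstlisting}
A combination of \texttt{peel} and \texttt{unroll} applied to the various loops of the factorial program below produces, up to the omitted relabelling, the right-hand side of \autoref{fig:example1}.
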
\begin{example}
In \autoref{fig:example1} we show a program (a wilfully inefficient computation of the
sum of the first $n$ factorials) and a possible transformation of it, combining loop
peeling and loop unrolling.
\begin{figureplr}
$$
\fbox{$\begin{array}{l}
s\ass 0;\\
i\ass 0;\\
\sop{while}i<n\sbin{do}\\
\inde p\ass 1;\\
\inde j\ass 1;\\
\inde \sop{while}j \le i\sbin{do}\\
\inde \inde p\ass j*p\\
\inde \inde j\ass j+1;\\
\inde s\ass s+p;\\
\inde i\ass i+1;\\
\end{array}
$}
\mapsto
\fbox{$\begin{array}{l}
s\ass 0;\\
i\ass 0;\\
\tikztargetm{a}{\s{if}}\ i<n\sbin{then}\\
\inde p\ass 1;\\
\inde j\ass 1;\\
\inde \sop{while}j \le i\sbin{do}\\
\inde \inde p\ass j*p\\
\inde \inde j\ass j+1;\\
\inde s\ass s+p;\\
\inde i\ass i+1;\\
\inde \tikztargetm{d}{\s{while}}\ i<n\sbin{do}\\
\inde \inde p\ass 1;\\
\inde \inde j\ass 1;\\
\inde \inde \tikztargetm{b}{\s{if}}\ j \le i\sbin{then}\\
\inde \inde \inde p\ass j*p\\
\inde \inde \inde j\ass j+1;\\
\inde \inde \inde \sop{if}j \le i\sbin{then}\\
\inde \inde \inde \inde p\ass j*p\\
\inde \inde \inde \inde j\ass j+1;\\
\inde \inde \inde \inde \s{while}\ j \le i\sbin{do}\\
\inde \inde \inde \inde \inde p\ass j*p\\
\inde \inde \inde \inde \inde j\ass j+1;\\
\inde \inde \inde \inde \inde \sop{if}j \le i\sbin{then}\\
\inde \inde \inde \inde \inde \inde p\ass j*p\\
\inde \inde \inde \inde \inde \inde \tikztargetm{c}j\ass j+1;\\
\inde \inde s\ass s+p;\\
\inde \inde i\ass i+1;\\
\inde \inde \sop{if}i<n\sbin{then}\\
\inde \inde \inde p\ass 1;\\
\inde \inde \inde j\ass 1;\\
\inde \inde \inde \tikztargetm{e}{\s{while}}\ j < i\sbin{do}\\
\inde \inde \inde \inde p\ass j*p\\
\inde \inde \inde \inde j\ass j+1;\\
\inde \inde \inde \inde \s{if}\ j < i\sbin{then}\\
\inde \inde \inde \inde \inde p\ass j*p\\
\inde \inde \inde \inde \inde \tikztargetm{f}j\ass j+1;\\
\inde \inde \inde s\ass s+p;\\
\inde \inde \inde i\ass i+1\tikztargetm{g};{}
\end{array}
$}\tikztargetm{right}{}
\begin{tikzpicture}[overlay, remember picture, thick,
brace/.style = {decorate, decoration={brace, amplitude = 15pt}},
label/.style = {sloped, anchor = base, yshift = 17pt, font = \large}]
\draw [brace, transform canvas={xshift=5pt}] (b.north-|right) -- node[label]{peeled} (c.south-|right);
\draw [brace, transform canvas={xshift=30pt}] (b.north-|right) -- node[label]{unrolled} (c.south-|right);
\draw [brace, transform canvas={xshift=5pt}] (e.north-|right) -- node[label]{unrolled} (f.south-|right);
\draw [brace, transform canvas={xshift=55pt}] (d.north-|right) -- node[label]{unrolled} (g.south-|right);
\draw [brace, transform canvas={xshift=80pt}] (a.north-|right) -- node[label]{peeled} (g.south-|right);
\end{tikzpicture}
\hspace{85pt}{}
$$
\caption{An example of loop transformations done on an \imp{} program. Parentheses are omitted in favour of
blocks by indentation.}
\label{fig:example1}
\end{figureplr}
\end{example}

\subsubsection{Indexed labels}
\label{sec:indexedlabels}
This section presents the core of the new approach.
In brief points it amounts to the following:
\begin{enumerate}[\bfseries~\ref*{sec:indexedlabels}.1.]
\item
\label{en:outline1}
Enrich cost labels with formal indices corresponding, at the beginning of the process, to which iteration of the loop they belong to.
\item
\label{en:outline2}
Each time a loop transformation is applied and a cost label is split into different occurrences, each of these will be reindexed, so that every time they are emitted their position in the original loop can be reconstructed.
\item
\label{en:outline3}
Along the compilation chain, alongside the \s{emit} instruction we add other instructions updating the indices, so that iterations of the original loops can be rebuilt at the operational semantics level.
\item
\label{en:outline4}
The machinery computing the cost mapping will still work, but assigning costs to indexed cost labels, rather than to cost labels as we wish.
However, \emph{dependent costs} can be calculated, where dependency is on which iteration of the containing loops we are in.
\end{enumerate}

\subsubsection{Indexing the cost labels}
\label{ssec:indlabs}

\paragraph{Formal indices and $\iota\ell\imp$}
Let $i_0,i_1,\ldots$ be a sequence of distinguished fresh identifiers that will be used as loop indices.
A \emph{simple expression} is an affine arithmetical expression in one of these indices, that is $a*i_k+b$ with $a,b,k \in \mathbb N$.
Simple expressions $e_1=a_1*i_k+b_1$, $e_2=a_2*i_k+b_2$ in the same index can be composed, yielding $e_1\circ e_2\ass (a_1a_2)*i_k + (a_1b_2+b_1)$, and this operation has an identity element in $id_k \ass 1*i_k+0$.
Constants can be expressed as simple expressions, so that we identify a natural $c$ with $0*i_k+c$.

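For instance, composing $e_1 = 2*i_k+1$ with $e_2 = 3*i_k+4$ (\ie substituting $e_2$ for the index of $e_1$) yields
$$
e_1\circ e_2 = (2\cdot3)*i_k + (2\cdot4+1) = 6*i_k+9,
$$
which is indeed $2*(3*i_k+4)+1$.
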
An \emph{indexing} (with metavariables $I$, $J$, \ldots) is a list of transformations of successive formal indices dictated by simple expressions, that is a mapping\footnote{Here we restrict each mapping to be a simple expression \emph{on the same index}.
This might not be the case if more loop optimisations are accounted for (for example, interchanging two nested loops).}
$$
i_0\mapsto a_0*i_0+b_0,\dots, i_{k-1} \mapsto a_{k-1}*i_{k-1}+b_{k-1}
$$

An \emph{indexed cost label} (metavariables $\alphab$, $\betab$, \ldots) is the combination of a cost label $\alpha$ and an indexing $I$, written $\alpha\la I\ra$.
The cost label underlying an indexed one is called its \emph{atom}.
All plain labels can be considered as indexed ones by taking an empty indexing.

\imp{} with indexed labels ($\iota\ell\imp$) is defined by adding to $\imp$ statements with indexed labels, and by having loops with formal indices attached to them:
$$
S,T,\ldots \gramm \cdots \mid i_k:\sop{while}e\sbin{do}S\mid \alphab : S
$$
Note that unindexed loops still exist in the language: they will correspond to multi-entry loops, which are ignored by indexing and optimisations.
We will discuss the semantics later.

\paragraph{Indexed labelling}
Given an $\imp$ program $P$, in order to index loops and assign indexed labels, we must first distinguish single-entry loops.
We sketch how this can be computed in the sequel.

A first pass of the program $P$ can easily compute two maps: $\s{loopof}_P$ from each label $\ell$ to the occurrence (\ie the path) of a $\s{while}$ loop containing $\ell$, or the empty path if none exists; and $\s{gotosof}_P$ from a label $\ell$ to the occurrences of \s{goto}s pointing to it.
Then the set $\s{multientry}_P$ of multi-entry loops of $P$ can be computed by
$$
\s{multientry}_P\ass\{\, p \mid \exists \ell,q.\ p =\s{loopof}_P(\ell),\ q\in\s{gotosof}_P(\ell),\ q \not\le p\,\}
$$
Here $\le$ is the prefix relation.\footnote{Possible simplifications to this procedure include keeping track of just the while loops containing labels and \s{goto}s (rather than paths in the syntactic tree of the program), and making two passes while avoiding building the map to sets $\s{gotosof}$.}

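Assuming occurrences are represented as paths in the syntax tree (lists of child positions) and that the two maps have already been computed, the check can be sketched in \ocaml{} as follows; the names are ours:
\begin{lstlisting}[language=[Objective]Caml]
type path = int list   (* a path in the syntax tree, root first *)

(* The prefix relation <= on paths. *)
let rec is_prefix p q =
  match p, q with
  | [], _ -> true
  | x :: p', y :: q' -> x = y && is_prefix p' q'
  | _, [] -> false

(* loopof: innermost while loop containing a label, if any;
   gotosof: occurrences of gotos pointing to a label.
   A loop is multi-entry if some goto to a label inside it does
   not itself occur inside the loop. *)
let multientry (loopof : string -> path option)
               (gotosof : string -> path list)
               (labels : string list) : path list =
  List.concat_map
    (fun l ->
      match loopof l with
      | None -> []
      | Some p ->
          if List.exists (fun q -> not (is_prefix p q)) (gotosof l)
          then [p] else [])
    labels
\end{lstlisting}
(\texttt{List.concat\_map} requires OCaml 4.10 or later; a fold would do otherwise.)
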
Let $Id_k$ be the indexing of length $k$ made from identity simple expressions, \ie the sequence $i_0\mapsto id_0, \ldots , i_{k-1}\mapsto id_{k-1}$.
We define the tiered indexed labelling $\Ell^\iota_P (S,k)$ in program $P$ for occurrence $S$ of a statement in $P$ and a natural $k$ by recursion, setting:
$$
\Ell^\iota_P(S,k)\ass
\left\{
\begin{array}{lh{-100pt}l}
 (i_k:\sop{while}b\sbin{do}\alpha\la Id_{k+1}\ra : \Ell^\iota_P(T,k+1));\beta\la Id_k \ra : \s{skip}
\\& \text{if $S=\sop{while}b\sbin{do}T$ and $S\notin \s{multientry}_P$,}\\[3pt]
(\sop{while}b\sbin{do}\alpha\la Id_k \ra : \Ell^\iota_P(T,k));\beta\la Id_k \ra : \s{skip}
\\& \text{otherwise, if $S=\sop{while}b\sbin{do}T$,}\\[3pt]
\sop{if}b\sbin{then} \alpha\la Id_k \ra : \Ell^\iota_P(T_1,k) \sbin{else} \beta\la Id_k \ra : \Ell^\iota_P(T_2,k)
\\&\text{if $S=\sop{if}b\sbin{then}T_1\sbin{else}T_2$,}\\[3pt]
\ell:\alpha\la Id_k\ra : \Ell_P^\iota(T,k) & \text{if $S = \ell : T$,}\\[3pt]
\ldots
\end{array}
\right.
$$
Here, as usual, $\alpha$ and $\beta$ are fresh cost labels, and the other cases just make the recursive calls on the substatements.
The \emph{indexed labelling} of a program $P$ is then defined as $\alpha\la \ra : \Ell^\iota_P(P,0)$, \ie a further fresh unindexed cost label is added at the start, and we start from level $0$.

In plainer words: each single-entry loop is indexed by $i_k$ where $k$ is the number of other single-entry loops containing this one, and all cost labels under the scope of a single-entry loop indexed by $i_k$ are indexed by all indices $i_0,\ldots,i_k$, without any transformation.

\subsubsection{Indexed labels and loop transformations}\label{ssec:looptrans}
We define the \emph{reindexing} $I \circ (i_k\mapsto a*i_k+b)$ as an operator on indexings by setting:
\begin{multline*}
(i_0\mapsto e_0,\ldots, i_k \mapsto e_k,\ldots,i_n\mapsto e_n)
\circ(i_k\mapsto a*i_k+b)
\ass\\
i_0\mapsto e_0,\ldots, i_k \mapsto e_k \circ(a*i_k+b),\ldots,i_n\mapsto e_n.
\end{multline*}
We further extend to indexed labels (by $\alpha\la I\ra\circ(i_k\mapsto e)\ass \alpha\la I\circ (i_k\mapsto e)\ra$) and also to statements in $\iota\ell\imp$ (by applying the above transformation to all indexed labels).

We can then redefine loop peeling and loop unrolling, taking into account indexed labels.
It will only be possible to apply the transformation to indexed loops, that is loops that are single-entry.
The attentive reader will notice that no assumptions are made on the labelling of the statements that are involved.
In particular the transformation can be repeated and composed at will.
Also, note that after erasing all labelling information (\ie indexed cost labels and loop indices) we recover exactly the same transformations presented in~\ref{sec:defimp}.

\paragraph{Indexed loop peeling}
$$
i_k:\sop{while}b\sbin{do}S\mapsto
\sop{if}b\sbin{then} S\circ (i_k\mapsto 0); i_k : \sop{while} b \sbin{do} S[\ell'_i/\ell_i]\circ(i_k\mapsto i_k + 1)
$$
As can be expected, the peeled iteration of the loop gets reindexed, always being the first iteration of the loop, while the iterations of the remaining loop are shifted by $1$. Notice that this transformation can lower the actual depth of some loops; however, their index is left untouched.

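As a small worked instance (with a hypothetical label $\gamma$), peeling $i_0:\sop{while}b\sbin{do}\,\gamma\la i_0\ra : T$ gives
$$
\sop{if}b\sbin{then}\gamma\la 0\ra : T;\ i_0:\sop{while}b\sbin{do}\,\gamma'\la i_0+1\ra : T
$$
so that iteration $j$ of the residual loop emits $\gamma'\la j+1\ra$, correctly reporting itself as iteration $j+1$ of the original loop.
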
\paragraph{Indexed loop unrolling}
$$
\begin{array}{l}
\begin{array}{ncn}
i_k:\sop{while}b\sbin{do}S\\
\tikz\node[rotate=-90,inner sep=0pt]{$\mapsto$};
\end{array}\\
i_k:\sop{while} b \sbin{do}\\
\quad (S\circ(i_k\mapsto n*i_k) ;\\
\quad \sop{if} b \sbin{then}\\
\quad\quad (S[\ell^1_i/\ell_i]\circ(i_k\mapsto n*i_k+1) ;\\
\quad\quad\quad \vdots \\
\quad\quad \quad \sop{if} b \sbin{then}\\
\quad \quad \quad \quad S[\ell^n_i/\ell_i]\circ(i_k\mapsto n*i_k+n-1)
)\cdots )
\end{array}
$$
Again, the reindexing is as expected: each copy of the unrolled body has its indices remapped so that when they are executed, the original iteration of the loop to which they correspond can be recovered.

\subsubsection{Semantics and compilation of indexed labels}
In order to make sense of loop indices, one must keep track of their values in the state.
A \emph{constant indexing} (metavariables $C,\ldots$) is an indexing which employs only constant simple expressions.
The evaluation of an indexing $I$ in a constant indexing $C=(i_0\mapsto c_0,\ldots, i_{k-1}\mapsto c_{k-1})$, written $I|_C$, is defined by:
$$
I|_C \ass I\circ(i_0\mapsto c_0)\circ\cdots\circ(i_{k-1}\mapsto c_{k-1})
$$
Here, we are using the definition of ${-}\circ{-}$ given in~\ref{ssec:indlabs}.
We consider the above defined only if the resulting indexing is a constant one too\footnote{For example $(i_0\mapsto 2*i_0,i_1\mapsto i_1+1)|_{i_0\mapsto 2}$ is undefined, but $(i_0\mapsto 2*i_0,i_1\mapsto 0)|_{i_0\mapsto 2}= i_0\mapsto 4,i_1\mapsto 0$ is indeed a constant indexing, even if the domain of the original indexing is not covered by the constant one.}.
The definition is extended to indexed labels by $\alpha\la I\ra|_C\ass \alpha\la I|_C\ra$.

Constant indexings will be used to keep track of the exact iterations of the original code that the emitted labels belong to.
We thus define two basic actions to update constant indexings: $C[i_k{\uparrow}]$ increments the value of $i_k$ by one, and $C[i_k{\downarrow}0]$ resets it to $0$.

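One concrete representation of these notions, sketched in \ocaml{} with our own naming (an indexing as an array of simple expressions, a constant indexing as an array of naturals), could be:
\begin{lstlisting}[language=[Objective]Caml]
type sexpr = { a : int; b : int }   (* the simple expression a*i_k + b *)

(* e1 o e2: composition of simple expressions on the same index. *)
let compose e1 e2 = { a = e1.a * e2.a; b = e1.a * e2.b + e1.b }

(* I|_C: evaluate indexing i under constant indexing c.  Entries of i
   beyond the domain of c must already be constant, otherwise the
   evaluation is undefined (None). *)
let eval_indexing (i : sexpr array) (c : int array) : int array option =
  let exception Undefined in
  try
    Some (Array.mapi
            (fun k e ->
              if k < Array.length c then e.a * c.(k) + e.b
              else if e.a = 0 then e.b
              else raise Undefined)
            i)
  with Undefined -> None

(* The two update actions C[i_k up] and C[i_k down 0]. *)
let inc c k   = let c' = Array.copy c in c'.(k) <- c.(k) + 1; c'
let reset c k = let c' = Array.copy c in c'.(k) <- 0; c'
\end{lstlisting}
On the footnote's example, \texttt{eval\_indexing} maps $(i_0\mapsto 2*i_0,i_1\mapsto 0)$ and $i_0\mapsto 2$ to $i_0\mapsto 4,i_1\mapsto 0$, and rejects the first indexing there as undefined.
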
We are ready to update the definition of the operational semantics of indexed labelled \imp.
The emitted cost labels will now be ones indexed by constant indexings.
We add a special indexed loop construct for continuations that keeps track of active indexed loop indices:
$$
K,\ldots  \gramm \cdots \mid i_k:\sop{while} b \sbin {do} S \sbin{then}  K
$$
The difference between the regular stack concatenation $i_k:\sop{while}b\sbin{do}S\cdot K$ and the new constructor is that the latter indicates the loop is the active one in which we already are, while the former is a loop that still needs to be started\footnote{In the presence of \s{continue} and \s{break} statements active loops need to be kept track of in any case.}.
The \s{find} function is updated accordingly with the case
$$
\s{find}(\ell, i_k:\sop{while}b\sbin{do}S, K) \ass \s{find}(\ell, S, i_k: \sop{while}b\sbin{do}S\sbin{then}K)
$$
The state will now be a 4-tuple $(S,K,s,C)$ which adds a constant indexing to the triple of the regular semantics.
The small-step rules for all statements remain the same, without touching the $C$ parameter (in particular unindexed loops behave the same as usual), apart from the ones regarding cost labels and indexed loops.
The remaining cases are:
$$\begin{aligned}
   (\alphab : S,K,s,C) &\to[\alphab|_C]_P (S,K,s,C)\\
   (i_k:\sop{while}b\sbin{do}S,K,s,C) &\to[\varepsilon]_P
    \begin{cases}
     (S,i_k:\sop{while}b\sbin{do}S\sbin{then} K,s,C[i_k{\downarrow}0])
     \\\hskip 125pt \text{if $(b,s)\eval v\neq 0$,}\\
     \rlap{$(\s{skip}, K, s, C)$}\hskip 125pt \text{otherwise}
    \end{cases}\\
   (\s{skip}, i_k:\sop{while}b\sbin{do}S\sbin{then}K,s,C) &\to[\varepsilon]_P
    \begin{cases}
     (S,i_k:\sop{while}b\sbin{do}S\sbin{then} K,s,C[i_k{\uparrow}])
      \\\hskip 125pt \text{if $(b,s)\eval v\neq 0$,}\\
     \rlap{$(\s{skip}, K, s, C)$} \hskip 125pt \text{otherwise}
    \end{cases}
  \end{aligned}$$
Some explanations are in order:
\begin{itemize}
\item
Emitting a label always instantiates it with the current indexing.
\item
Hitting an indexed loop the first time initializes the corresponding index to 0; continuing the same loop increments the index as expected.
\item
The \s{find} function ignores the current indexing: this is correct under the assumption that all indexed loops are single-entry, so that when we land inside an indexed loop with a \s{goto}, we are sure that its current index is right.
\item
The starting state with store $s$ for a program $P$ is $(P,\s{halt},s,(i_0\mapsto 0,\dots,i_{n-1}\mapsto 0))$ where $i_0,\ldots,i_{n-1}$ cover all loop indices of $P$\footnote{For a program which is the indexed labelling of an \imp{} one this corresponds to the maximum nesting of single-entry loops.
We can also avoid computing this value in advance if we define $C[i_k{\downarrow}0]$ to extend $C$'s domain as needed, so that the starting constant indexing can be the empty one.}.
\end{itemize}

\paragraph{Compilation}
Further down the compilation chain the loop structure is usually partially or completely lost.
We cannot rely on it anymore to keep track of the original source code iterations.
We therefore add, alongside the \s{emit} instruction, two other sequential instructions $\sop{ind\_reset}k$ and $\sop{ind\_inc}k$ whose sole effect is to reset to 0 (resp.\ increment by 1) the loop index $i_k$, as kept track of in a constant indexing accompanying the state.

The first step of compilation from $\iota\ell\imp$ consists of prefixing the translation of an indexed loop $i_k:\s{while}\ b\ \s{do}\ S$ with $\sop{ind\_reset}k$ and postfixing the translation of its body $S$ with $\sop{ind\_inc}k$.
Later in the compilation chain we must propagate the instructions dealing with cost labels.

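For illustration, in a lower-level language with explicit jumps the translation of an indexed loop might take the following shape, where $\ulcorner S\urcorner$ stands for the translation of the body (a sketch of the idea, not the precise code generated by the compiler):
$$
i_k:\sop{while}b\sbin{do}S
\quad\mapsto\quad
\sop{ind\_reset}k;\ \ell: \sop{if}b\sbin{then}(\ulcorner S\urcorner;\ \sop{ind\_inc}k;\ \sop{goto}\ell)
$$
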
We would like to stress the fact that this machinery is only needed to give a suitable semantics of observables on which preservation proofs can be done.
By no means are the added instructions and the constant indexing in the state meant to change the actual (let us say denotational) semantics of the programs.
In this regard the two new instructions have a similar role as the \s{emit} one.
A forgetful mapping of everything (syntax, states, operational semantics rules) can be defined erasing all occurrences of cost labels and loop indices, and the result will always be a regular version of the language considered.

\paragraph{Stating the preservation of semantics}
In fact, the statement of preservation of semantics does not change at all, except that the traces now consist of evaluated indexed cost labels rather than plain ones.

\subsubsection{Dependent costs in the source code}
\label{ssec:depcosts}
The task of producing dependent costs from constant costs induced by indexed labels is quite technical.
Before presenting it here, we would like to point out that the annotations produced by the procedure described in this subsection, even if correct, can be enormous and unreadable.
In Section~\ref{sec:conc}, where we detail the actual implementation, we will also sketch how we mitigated this problem.

Having the result of compiling the indexed labelling $\Ell^\iota(P)$ of an \imp{} program $P$, we may still suppose that a cost mapping can be computed, but from indexed labels to naturals.
We want to annotate the source code, so we need a way to express and compute the costs of cost labels, \ie group the costs of indexed labels into costs of their atoms.
In order to do so we introduce \emph{dependent costs}.
Let us suppose that for the sole purpose of annotation, we have available in the language ternary expressions of the form
$$\tern e {f_1}{f_2},$$
and that we have access to common operators on integers such as equality, order and modulus.

\paragraph{Simple conditions}

First, we need to shift from \emph{transformations} of loop indices to \emph{conditions} on them.
We identify a set of conditions on natural numbers which are able to express the image of any composition of simple expressions.
\emph{Simple conditions} are of three possible forms:
\begin{itemize}
\item
Equality $i_k=n$ for some natural $n$.
\item
Inequality $i_k\ge n$ for some natural $n$.
\item
Modular equality together with inequality: $i_k\bmod a = b\wedge i_k\ge n$ for naturals $a, b, n$.
\end{itemize}
The `always true' simple condition is given by $i_k\ge 0$.
We write $i_k\bmod a = b$ as a simple condition for $i_k\bmod a = b\wedge i_k\ge 0$.

Given a simple condition $p$ and a constant indexing $C$ we can easily define when $p$ holds for $C$ (written $p\circ C$).
A \emph{dependent cost expression} is an expression built solely out of integer constants and ternary expressions with simple conditions at their head.
Given a dependent cost expression $e$ where all of the loop indices appearing in it are in the domain of a constant indexing $C$, we can define the value $e\circ C\in \mathbb N$ by:
$$n\circ C\ass n,\qquad (\tern p e f)\circ C\ass
\begin{cases}
  e\circ C& \text{if $p\circ C$,}\\
  f\circ C& \text{otherwise.}
\end{cases}$$

\paragraph{From indexed costs to dependent ones}
Every simple expression $e$ corresponds to a simple condition $p(e)$ which expresses the set of values that $e$ can take.
The definition of this correspondence follows.
We recall that in this development, loop indices are always mapped to simple expressions over the same index.
If this were not the case, the condition obtained from an expression should be on the mapped index, not the indeterminate of the simple expression.
We leave all generalisations of what we present here for further work:
$$
p(a*i_k+b)\ass
\begin{cases}
i_k = b & \text{if $a = 0$,}\\
i_k \ge b & \text{if $a = 1$,}\\
i_k\bmod a = b' \wedge i_k \ge b & \text{otherwise, where $b' = b\bmod a$}.
\end{cases}
$$
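For example, $p(2*i_0+5) = (i_0\bmod 2 = 1 \wedge i_0\ge 5)$: the values taken by $2*i_0+5$ as $i_0$ ranges over $\mathbb N$ are exactly the odd numbers from $5$ onwards.
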
Now, suppose we are given a mapping $\kappa$ from indexed labels to natural numbers.
We will transform it into a mapping (identified, via abuse of notation, with the same symbol $\kappa$) from atoms to \imp{} expressions built with ternary expressions which depend solely on loop indices.
To that end we define an auxiliary function $\kappa^\alpha_L$, parameterized by atoms and words of simple expressions, and defined on \emph{sets} of $n$-tuples of simple expressions (with $n$ constant across each such set, \ie each set is made of words all of the same length).

We will employ a bijection between words of simple expressions and indexings, given by:\footnote{Lists of simple expressions are in fact how indexings are represented in CerCo's current implementation of the compiler.}
$$
i_0\mapsto e_0,\ldots,i_{k-1}\mapsto e_{k-1} \cong e_0\cdots e_{k-1}.
$$
As usual, $\varepsilon$ denotes the empty word/indexing, and juxtaposition is used to denote word concatenation.

For every set $S$ of $n$-tuples of simple expressions, we are in one of the following three exclusive cases:
\begin{itemize}
\item
$S=\emptyset$.
\item
$S=\{\varepsilon\}$.
\item
There is a simple expression $e$ such that $S$ can be decomposed as $eS'+S''$, with $S'\neq \emptyset$ and none of the words in $S''$ starting with $e$.
\end{itemize}
Here $eS'$ denotes prepending $e$ to all elements of $S'$ and $+$ is disjoint union.
This classification can serve as the basis of a definition by recursion on $n+\card S$ where $n$ is the size of tuples in $S$ and $\card S$ is its cardinality.
Indeed in the third case in $S'$ the size of tuples decreases strictly (and cardinality does not increase) while for $S''$ the size of tuples remains the same but cardinality strictly decreases.
The expression $e$ of the third case will be chosen as minimal for some total order\footnote{The specific order used does not change the correctness of the procedure, but different orders can give more or less readable results. A ``good'' order is the lexicographic one, with $a*i_k+b \le a'*i_k+b'$ if $a<a'$ or $a=a'$ and $b\le b'$.}.

The auxiliary function $\kappa^\alpha_L$ is then defined following the recursion scheme presented above:
$$
\begin{aligned}
\kappa^\alpha_L(\emptyset) &\ass 0\\
\kappa^\alpha_L(\{\varepsilon\}) &\ass \kappa(\alpha\la L\ra) \\
\kappa^\alpha_L(eS'+S'') &\ass \tern{p(e)}{\kappa^\alpha_{Le}(S')}{\kappa^\alpha_L(S'')}
\end{aligned}
$$
\noindent
Finally, the desired dependent cost mapping is defined by
$$
\kappa(\alpha)\ass\kappa^\alpha_\varepsilon(\{\,L\mid \alpha\la L\ra \text{ appears in the compiled code}\,\})
$$

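The recursion scheme translates almost directly into code. The following \ocaml{} sketch (our own rendering: words as lists of simple expressions, sets as duplicate-free lists of words of equal length, dependent costs as a small AST) computes $\kappa^\alpha_\varepsilon$, with the atom $\alpha$ folded into the \texttt{lookup} function:
\begin{lstlisting}[language=[Objective]Caml]
type sexpr = { a : int; b : int }           (* a*i_k + b *)
type dcost =
  | Const of int
  | Tern of int * sexpr * dcost * dcost
      (* Tern (k, e, d1, d2): if p(e) holds of i_k then d1 else d2 *)

(* kappa lookup rev_l words: lookup gives the indexed cost
   kappa(alpha<L>) of the word L; rev_l is the prefix built so far,
   kept in reverse.  Words are assumed duplicate-free and all of the
   same length, as in the classification above. *)
let rec kappa lookup rev_l = function
  | [] -> Const 0                            (* S = empty set *)
  | [[]] -> Const (lookup (List.rev rev_l))  (* S = {epsilon} *)
  | words ->
      (* minimal head e in the lexicographic order on (a, b) *)
      let heads = List.map List.hd words in
      let e = List.fold_left min (List.hd heads) heads in
      let s', s'' = List.partition (fun w -> List.hd w = e) words in
      let s' = List.map List.tl s' in
      let k = List.length rev_l in           (* the index i_k tested *)
      Tern (k, e, kappa lookup (e :: rev_l) s', kappa lookup rev_l s'')

let kappa_alpha lookup words = kappa lookup [] words
\end{lstlisting}
The polymorphic \texttt{min} on the record \texttt{sexpr} compares the coefficient first and the constant second, matching the ``good'' lexicographic order of the footnote.
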
\paragraph{Indexed instrumentation}
The \emph{indexed instrumentation} generalises the instrumentation presented in~\ref{sec:labelling}.
We described above how cost atoms can be mapped to dependent costs.
The instrumentation must also insert code dealing with the loop indices.
As instrumentation is done on the code produced by the labelling phase, all cost labels are indexed by identity indexings.
The relevant cases of the recursive definition (supposing $c$ is the cost variable) are then:
$$
\begin{aligned}
\mathcal I^\iota(\alpha\la Id_k\ra:S) &= c\ass c + \kappa(\alpha);\mathcal I^\iota(S)\\
\mathcal I^\iota(i_k : \sop{while}b\sbin{do}S) &=
  i_k \ass 0; \sop{while}b\sbin{do}(\mathcal I^\iota (S); i_k \ass i_k + 1)
\end{aligned}
$$

\subsubsection{A detailed example}\label{ssec:detailedex}
Take the program in \autoref{fig:example1}. Its initial labelling will be:
$$\begin{array}{l}
\alpha\la\ra : s\ass 0;\\
i\ass 0;\\
i_0:\sop{while}i<n\sbin{do}\\
\inde \beta\la i_0\ra : p\ass 1;\\
\inde j\ass 1;\\
\inde i_1:\sop{while}j \le i\sbin{do}\\
\inde \inde \gamma\la i_0, i_1\ra : p\ass j*p\\
\inde \inde j\ass j+1;\\
\inde \delta\la i_0\ra : s\ass s+p;\\
\inde i\ass i+1;\\
\epsilon\la\ra:\s{skip}
\end{array}
$$
(a single \s{skip} after the $\delta$ label has been suppressed, and we are using the identification
between indexings and tuples of simple expressions explained in \autoref{ssec:depcosts}).
Supposing, for example, $n=3$,
the trace of the program will be
$$\alpha\la\ra\,\beta\la 0 \ra\, \delta\la 0\ra\,\beta\la 1\ra\,\gamma\la 1,0\ra\,
\delta\la 1\ra\,\beta\la 2\ra\,\gamma\la 2,0\ra\,\gamma\la 2, 1\ra\,\delta\la 2\ra\,
\epsilon\la\ra$$
Now let us apply the transformations of \autoref{fig:example1} with the additional
information detailed in \autoref{ssec:looptrans}. The result is shown in
\autoref{fig:example2}.
\begin{figureplr}
$$
\begin{array}{l}
\mbox{\color{blue}\boldmath$\alpha\la\ra $}:s\ass 0;\\
i\ass 0;\\
\tikztargetm{a}{\s{if}}\ i<n\sbin{then}\\
\inde \mbox{\color{blue}\boldmath$\beta\la0\ra $}:p\ass 1;\\
\inde j\ass 1;\\
\inde i_1:\sop{while}j \le i\sbin{do}\\
\inde \inde \mbox{\color{blue}\boldmath$\gamma\la 0, i_1\ra $}:p\ass j*p\\
\inde \inde j\ass j+1;\\
\inde \mbox{\color{blue}\boldmath$\delta\la 0\ra $}: s\ass s+p;\\
\inde i\ass i+1;\\
\inde i_0:\tikztargetm{d}{\s{while}}\ i<n\sbin{do}\\
\inde \inde \mbox{\color{blue}\boldmath$\beta\la 2*i_0+1\ra $}:p\ass 1;\\
\inde \inde j\ass 1;\\
\inde \inde \tikztargetm{b}{\s{if}}\ j \le i\sbin{then}\\
\inde \inde \inde \mbox{\color{blue}\boldmath$\gamma\la  2*i_0+1, 0\ra $}:p\ass j*p\\
\inde \inde \inde j\ass j+1;\\
\inde \inde \inde \sop{if}j \le i\sbin{then}\\
\inde \inde \inde \inde \mbox{\color{blue}\boldmath$\gamma\la  2*i_0+1, 1\ra $}: p\ass j*p\\
\inde \inde \inde \inde j\ass j+1;\\
\inde \inde \inde \inde i_1:\s{while}\ j \le i\sbin{do}\\
\inde \inde \inde \inde \inde \mbox{\color{blue}\boldmath$\gamma\la  2*i_0+1, 2*i_1 + 2 \ra $}:p\ass j*p\\
\inde \inde \inde \inde \inde j\ass j+1;\\
\inde \inde \inde \inde \inde \sop{if}j \le i\sbin{then}\\
\inde \inde \inde \inde \inde \inde \mbox{\color{blue}\boldmath$\gamma\la  2*i_0+1, 2*i_1 + 3\ra $}:p\ass j*p\\
\inde \inde \inde \inde \inde \inde \tikztargetm{c}j\ass j+1;\\
\inde \inde \mbox{\color{blue}\boldmath$\delta\la 2*i_0+1\ra$}: s\ass s+p;\\
\inde \inde i\ass i+1;\\
\inde \inde \sop{if}i<n\sbin{then}\\
\inde \inde \inde \mbox{\color{blue}\boldmath$\beta\la 2*i_0+2\ra $}:p\ass 1;\\
\inde \inde \inde j\ass 1;\\
\inde \inde \inde i_1:\tikztargetm{e}{\s{while}}\ j < i\sbin{do}\\
\inde \inde \inde \inde \mbox{\color{blue}\boldmath$\gamma\la 2*i_0+2, 2*i_1\ra$}: p\ass j*p\\
\inde \inde \inde \inde j\ass j+1;\\
\inde \inde \inde \inde \s{if}\ j < i\sbin{then}\\
\inde \inde \inde \inde \inde \mbox{\color{blue}\boldmath$\gamma\la2*i_0+2, 2*i_1+1\ra$}: p\ass j*p\\
\inde \inde \inde \inde \inde \tikztargetm{f}j\ass j+1;\\
\inde \inde \inde \mbox{\color{blue}\boldmath$\delta\la 2*i_0+2\ra $}: s\ass s+p;\\
\inde \inde \inde i\ass i+1\tikztargetm{g};{}\\
\mbox{\color{blue}\boldmath$\epsilon\la\ra $}:\s{skip}
\end{array}$$
\caption{The result of applying reindexing loop transformations on the
program in \autoref{fig:example1}.}\label{fig:example2}
\end{figureplr}
One can check that the transformed code leaves the same trace when executed.

Now let us compute the dependent cost of $\gamma$, supposing no other loop transformations
are done. Ordering its indexings we
have the following list:
\begin{equation}
\label{eq:inds}
\begin{aligned}
  &0, i_1\\
  &2*i_0+1, 0\\
  &2*i_0+1, 1\\
  &2*i_0+1, 2*i_1+2\\
  &2*i_0+1, 2*i_1+3\\
  &2*i_0+2, 2*i_1\\
  &2*i_0+2, 2*i_1+1
\end{aligned}
\end{equation}

The resulting dependent cost will then be
\def\indetern#1#2#3{\begin{tabular}[t]{nL}(#1)\mathrel ?{}\\\hskip 15pt #2:{}\\\hskip 15pt #3\end{tabular}}
\def\tern#1#2#3{(#1)\mathrel ? #2:#3}
\begin{equation}\label{eq:ex}
\kappa^\iota(\gamma)=
\indetern{i_0 = 0}
  {\tern{i_1\ge 0}a0}
  {\indetern{i_0\bmod 2 = 1 \wedge i_0\ge 1}
    {\indetern{i_1=0}
      b
      {\indetern{i_1 = 1}
        c
        {\indetern{i_1\bmod 2 = 0 \wedge i_1\ge 2}
          d
          {\tern{i_1\bmod 2 = 1 \wedge i_1\ge 3}e0}
        }
      }
    }
    {\indetern{i_0\bmod 2 = 0 \wedge i_0\ge 2}
      {\indetern{i_1 \bmod 2 = 0 \wedge i_1 \ge 0}
        f
        {\tern{i_1 \bmod 2 = 1 \wedge i_1 \ge 1}g0}
      }
      0
    }
  }
\end{equation}
We will see later on \autopageref{pag:continued} how such an expression can be simplified.

\subsection{Notes on the implementation and further work}
\label{sec:conc}
Implementing the indexed label approach in CerCo's untrusted OCaml prototype does not introduce many new challenges beyond what has already been presented for the toy language, \imp{} with \s{goto}s.
\s{Clight}, the \s{C} fragment source language of CerCo's compilation chain~\cite{D2.1}, has several more features, but few demand changes in the indexed labelled approach.

\paragraph{Indexed loops \emph{vs.} index update instructions}
In our presentation we have indexed loops in $\iota\ell\imp$, while we hinted that later languages in the compilation chain would have specific index update instructions.
In CerCo's actual compilation chain from \s{Clight} to 8051 assembly, indexed loops are only present in \s{Clight}, while from \s{Cminor} onward all languages have the same three cost-involving instructions: label emitting, index resetting and index incrementing.

\paragraph{Loop transformations in the front end}
We decided to implement the two loop transformations in the front end, namely in \s{Clight}.
This decision is due to user readability concerns: if costs are to be presented to the programmer, they should depend on structures written by the programmer.
If loop transformations were performed later it would be harder to create a correspondence between loops in the control flow graph and actual loops written in the source code.
However, another solution would be to index loops in the source code and then use these indices later in the compilation chain to pinpoint explicit loops of the source code: loop indices can be used to preserve such information, just like cost labels.

\paragraph{Break and continue statements}
\s{Clight}'s loop flow control statements for breaking and continuing a loop are equivalent to appropriate \s{goto} statements.
The only difference is that we are assured that they cannot cause loops to be multi-entry, and that when a transformation such as loop peeling is complete, they need to be replaced by actual \s{goto}s (which happens further down the compilation chain anyway).

\paragraph{Function calls}
Every internal function definition has its own space of loop indices.
Executable semantics must thus take into account saving and resetting the constant indexing of current loops upon hitting a function call, and restoring it upon return of control.
A peculiarity is that this cannot be attached to actions that save and restore frames: namely in the case of tail calls the constant indexing needs to be saved whereas the frame does not.

\paragraph{Cost-labelled expressions}
In labelled \s{Clight}, expressions also get cost labels, due to the presence of ternary conditional expressions (and lazy logical operators, which get translated to ternary expressions too).
Adapting the indexed labelled approach to cost-labelled expressions does not pose any particular problems.

\paragraph{Simplification of dependent costs}
As previously mentioned, the na\"{i}ve application of the procedure described in~\ref{ssec:depcosts} produces unwieldy cost annotations.
In our implementation several transformations are used to simplify such complex dependent costs.

Disjunctions of simple conditions are closed under all logical operations, and it can be computed whether such a disjunction implies a simple condition or its negation.
This can be used to eliminate useless branches of dependent costs, to merge branches that share the same value, and possibly to simplify the third case of simple conditions.
Examples of the three transformations are respectively:
\begin{itemize}
\item $
\verb+(_i_0 == 0)?+x\verb+:(_i_0 >= 1)?+y\verb+:+z
\mapsto
\verb+(_i_0 == 0)?+x\verb+:+y,
$
\item $
c\texttt{?}x\verb+:(+d\texttt{?}x\texttt{:}y\verb+)+
\mapsto
\texttt{(}c\texttt{ || }d\texttt{)?}x\texttt{:}y,
$
\item \begin{tabular}[t]{np{\linewidth}n}
$\verb+(_i_0 == 0)?+x\verb+:(_i_0 % 2 == 0 && _i_0 >= 2)?+y\verb+:+z
\mapsto$ \\\hfill
$\verb+(_i_0 == 0)?+x\verb+:(_i_0 % 2 == 0)?+y\verb+:+z.
$\end{tabular}
\end{itemize}
The second transformation tends to accumulate disjunctions, to the detriment of readability.
A further transformation swaps two branches of the ternary expression if the negation of the condition can be expressed with fewer clauses.
For example:
$$ \verb+(_i_0 % 3 == 0 || _i_0 % 3 == 1)?+x\verb+:+y \mapsto
\verb+(_i_0 % 3 == 2)?+y\verb+:+x.
$$
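As a sketch of how the branch-merging transformation can be implemented, consider the following \ocaml{} fragment over a small AST of dependent costs (our own types, with conditions kept abstract; this is illustrative, not the code of the actual implementation):
\begin{lstlisting}[language=[Objective]Caml]
type ('c, 'v) dcost =
  | Const of 'v
  | Tern of 'c * ('c, 'v) dcost * ('c, 'v) dcost   (* c ? d1 : d2 *)

(* Merge branches sharing the same value, bottom-up:
   c ? x : (d ? x : y)  |->  (c || d) ? x : y.
   disj builds the disjunction of two conditions. *)
let rec merge disj = function
  | Const v -> Const v
  | Tern (c, x, y) ->
      let x = merge disj x and y = merge disj y in
      (match y with
       | Tern (d, x', y') when x = x' -> Tern (disj c d, x, y')
       | _ -> Tern (c, x, y))
\end{lstlisting}
A single bottom-up pass of this kind collapses chains of equal branches as produced by the procedure of~\ref{ssec:depcosts}; the other simplifications (branch elimination and condition weakening) additionally need the decision procedure for implication between disjunctions of simple conditions mentioned above.
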
Picking up again the example depicted in \autoref{ssec:detailedex}, \label{pag:continued}
we can see that the cost in \eqref{eq:ex} can be simplified to the following,
using some of the transformations described above:
$$
\kappa^\iota(\gamma)=
\indetern{i_0 = 0}
  a
  {\indetern{i_0\bmod 2 = 1}
    {\indetern{i_1=0}
      b
      {\indetern{i_1 = 1}
        c
        {\indetern{i_1\bmod 2 = 0}
          de
        }
      }
    }
    {\indetern{i_1 \bmod 2 = 0}
      fg
    }
  }
$$
One should keep in mind that the example was wilfully complicated; in practice,
the cost expressions produced rarely have more clauses
than the number of nested loops containing the annotation.
\paragraph{Updates to the Frama-C cost plugin}
CerCo's Frama-C~\cite{framac} cost plugin\todo{is there a reference for this?}{} has been updated to take into account our new notion of dependent costs.
The Frama-C framework expands ternary expressions to branch statements, introducing temporaries along the way.
This makes the task of analyzing ternary cost expressions rather daunting.
It was deemed necessary to provide an option in the compiler to use actual branch statements for cost annotations rather than ternary expressions, so that at least Frama-C's use of temporaries in cost annotations could be avoided.
The cost analysis carried out by the plugin now takes into account such dependent costs.

The only limitation (which actually simplifies the code) is that, within a dependent cost, simple conditions with modulus on the same loop index should not be modulo different numbers.
This corresponds to a reasonable limitation on the number of times loop unrolling may be applied to the same loop: at most once.

%%% HERE BEGINS A NEW BIT

\section{Introduction}

The \cerco{} compiler produces a version of the source code containing
annotations describing the timing behaviour of the object code, as
well as the object code itself. It compiles C code, targeting
microcontrollers implementing the Intel 8051 architecture.  There are
two versions: first, an initial prototype was implemented in
\ocaml{}~\cite{d2.2}, then a version was formalised in the \matita{}
proof assistant~\cite{d3.2,d4.2} and extracted to \ocaml{} code to
produce an executable compiler.  In this document we present results
from Deliverable 3.4, the formalised proofs in \matita{} about the
front-end of the latter version of the compiler (culminating in the
\lstinline'front_end_correct' lemma), and describe how that fits
into the verification of the whole compiler.

A key part of this work was to layer the \emph{intensional} correctness
results that show that the costs produced are correct on top of the
proofs about the compiled code's \emph{extensional} behaviour (that is, the
functional correctness of the compiler).  Unfortunately, the ambitious
goal of completely verifying the entire compiler was not feasible
within the time available, but thanks to this separation of
extensional and intensional proofs we are able to axiomatise some extensional
simulation results which are similar to those in other compiler verification
projects and concentrate on the novel intensional proofs.  We were
also able to add stack space costs to obtain a stronger result.  The
proofs were made more tractable by introducing compile-time checks for
the `sound and precise' cost labelling properties rather than proving
that they are preserved throughout.

The overall statement of correctness says that the annotated program has the
same behaviour as the input, and that for any suitably well-structured part of
the execution (which we call \emph{measurable}), the object code will execute
the same behaviour taking precisely the time given by the cost annotations in
the annotated source program.

In the next section we recall the structure of the compiler and make the overall
statement more precise.  Following that, in Section~\ref{sec:fegoals} we
describe the statements we need to prove about the intermediate \textsf{RTLabs}
programs for the back-end proofs.
Section~\ref{sec:inputtolabelling} covers the compiler passes which produce the
annotated source program and Section~\ref{sec:measurablelifting} the rest
of the transformations in the front-end.  Then the compile-time checks
for good cost labelling are detailed in Section~\ref{sec:costchecks}
and the proofs that the structured traces required by the back-end
exist are discussed in Section~\ref{sec:structuredtrace}.

\section{The compiler and its correctness statement}

The uncertified prototype \ocaml{} \cerco{} compiler was originally described
in Deliverables 2.1 and 2.2.  Its design was replicated in the formal
\matita{} code, which was presented in Deliverables 3.2 and 4.2, for
the front-end and back-end respectively.

\begin{figure}
\begin{center}
\includegraphics[width=0.5\linewidth]{compiler-plain.pdf}
\end{center}
\caption{Languages in the \cerco{} compiler}
\label{fig:compilerlangs}
\end{figure}

The compiler uses a number of intermediate languages, as outlined in the
middle two lines of Figure~\ref{fig:compilerlangs}.  The upper line
represents the front-end of the compiler, and the lower the back-end,
finishing with Intel 8051 binary code.  Not all of the front-end compiler passes
introduce a new language, and Figure~\ref{fig:summary} presents a
list of every pass involved.

\begin{figure}
\begin{center}
\begin{minipage}{.8\linewidth}
\begin{tabbing}
\quad \= $\downarrow$ \quad \= \kill
\textsf{C} (unformalised)\\
\> $\downarrow$ \> CIL parser (unformalised \ocaml)\\
\textsf{Clight}\\
%\> $\downarrow$ \> add runtime functions\\
\> $\downarrow$ \> \lstinline[language=C]'switch' removal\\
\> $\downarrow$ \> labelling\\
\> $\downarrow$ \> cast removal\\
\> $\downarrow$ \> stack variable allocation and control structure
 simplification\\
\textsf{Cminor}\\
%\> $\downarrow$ \> generate global variable initialization code\\
\> $\downarrow$ \> transform to RTL graph\\
\textsf{RTLabs}\\
\> $\downarrow$ \> check cost labelled properties of RTL graph\\
\> $\downarrow$ \> start of target specific back-end\\
\>\quad \vdots
\end{tabbing}
\end{minipage}
\end{center}
\caption{Front-end languages and compiler passes}
\label{fig:summary}
\end{figure}

963\label{page:switchintro}
964The annotated source code is produced by the cost labelling phase.
965Note that there is a pass to replace C \lstinline[language=C]'switch'
966statements before labelling --- we need to remove them because the
967simple form of labelling used in the formalised compiler is not quite
968capable of capturing their execution time costs, largely due to C's
969`fall-through' behaviour where execution from one branch continues in
970the next unless there is an explicit \lstinline[language=C]'break'.
971
972The cast removal phase which follows cost labelling simplifies
973expressions to prevent unnecessary arithmetic promotion, which is
974specified by the C standard but costly for an 8-bit target.  The
975transformation to \textsf{Cminor} and subsequently \textsf{RTLabs}
976bear considerable resemblance to some passes of the CompCert
977compiler~\cite{Blazy-Leroy-Clight-09,Leroy-backend}, although we use a simpler \textsf{Cminor} where
978all loops use \lstinline[language=C]'goto' statements, and the
979\textsf{RTLabs} language retains a target-independent flavour.  The
980back-end takes \textsf{RTLabs} code as input.
981
982The whole compilation function returns the following information on success:
983\begin{lstlisting}[language=Grafite]
984  record compiler_output : Type[0] :=
985   { c_labelled_object_code: labelled_object_code
986   ; c_stack_cost: stack_cost_model
987   ; c_max_stack: nat
988   ; c_init_costlabel: costlabel
989   ; c_labelled_clight: clight_program
990   ; c_clight_cost_map: clight_cost_map
991   }.
992\end{lstlisting}
993It consists of annotated 8051 object code, a mapping from function
994identifiers to the function's stack space usage, the space available for the
995stack after global variable allocation, a cost label covering the
996execution time for the initialisation of global variables and the call
997to the \lstinline[language=C]'main' function, the annotated source
998code, and finally a mapping from cost labels to actual execution time
999costs.
1000
1001An \ocaml{} pretty printer is used to provide a concrete version of
1002the output code and annotated source code.  In the case of the
1003annotated source code, it also inserts the actual costs alongside the
1004cost labels, and optionally adds a global cost variable and
1005instrumentation to support further reasoning in external tools such as
1006Frama-C.
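
For instance, the instrumented source might take the following shape
(a hypothetical fragment: the name \lstinline'__cost' and the
constants are purely illustrative, not drawn from an actual run):
\begin{lstlisting}[language=C]
   int __cost = 0;          /* global cost variable              */

   void f(int x) {
     __cost += 23;          /* cost of function entry            */
     while (x > 0) {
       __cost += 17;        /* cost of one loop iteration        */
       x = x - 1;
     }
     __cost += 5;           /* cost of the loop exit path        */
   }
\end{lstlisting}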
1007
1008\subsection{Revisions to the prototype compiler}
1009
1010Our focus on intensional properties prompted us to consider whether we
1011could incorporate stack space into the costs presented to the user.
1012We only allocate one fixed-size frame per function, so modelling this
1013was relatively simple.  It is the only form of dynamic memory
1014allocation provided by the compiler, so we were able to strengthen the
1015statement of the goal to guarantee successful execution whenever the
1016stack space obeys the \lstinline'c_max_stack' bound calculated by
1017subtracting the global variable requirements from the total memory
1018available.
1019
1020The cost labelling checks at the end of Figure~\ref{fig:summary} have been
1021introduced to reduce the proof burden, and are described in
1022Section~\ref{sec:costchecks}.
1023
1024The use of dependent types to capture simple intermediate language
1025invariants makes every front-end pass a total function, except
1026\textsf{Clight} to \textsf{Cminor} and the cost checks.  Hence various
1027well-formedness and type safety checks are performed only once between
1028\textsf{Clight} and \textsf{Cminor}, and the invariants rule out any
1029difficulties in the later stages.  With the benefit of hindsight we
1030would have included an initial checking phase to produce a
1031`well-formed' variant of \textsf{Clight}, conjecturing that this would
1032simplify various parts of the proofs for the \textsf{Clight} stages
1033which deal with potentially ill-formed code.
1034
1035Following D2.2, we previously generated code for global variable
1036initialisation in \textsf{Cminor}, for which we reserved a cost label
1037to represent the execution time for initialisation.  However, the
1038back-end must also add an initial call to the main function, whose
1039cost must also be accounted for, so we decided to move the
1040initialisation code to the back-end and merge the costs.
1041
1042\subsection{Main correctness statement}
1043
1044[[relocated to specification section]]
1045
1046\section{Correctness statement for the front-end}
1047\label{sec:fegoals}
1048
1049The essential parts of the intensional proof were outlined during work
1050on a toy compiler in Task
10512.1~\cite{springerlink:10.1007/978-3-642-32469-7_3}.  These are
1052\begin{enumerate}
1053\item functional correctness, in particular preserving the trace of
1054  cost labels,
1055\item the \emph{soundness} and \emph{precision} of the cost labelling
1056  on the object code, and
\item a timing analysis on the object code that produces a correct
  mapping from cost labels to time.
1059\end{enumerate}
1060
1061However, that toy development did not include function calls.  For the
1062full \cerco{} compiler we also need to maintain the invariant that
1063functions return to the correct program location in the caller, as we
1064mentioned in the previous section.  During work on the back-end timing
analysis (described in more detail in the companion deliverable, D4.4)
the notion of a \emph{structured trace} was developed to enforce this
return property, along with most of the cost labelling properties.
1068
1069\begin{figure}
1070\begin{center}
1071\includegraphics[width=0.5\linewidth]{compiler.pdf}
1072\end{center}
1073\caption{The compiler and proof outline}
1074\label{fig:compiler}
1075\end{figure}
1076
1077Jointly, we generalised the structured traces to apply to any of the
intermediate languages which have some notion of a program counter.  This means
that they are introduced part way through the compiler; see
1080Figure~\ref{fig:compiler}.  Proving that a structured trace can be
1081constructed at \textsf{RTLabs} has several virtues:
1082\begin{itemize}
1083\item This is the first language where every operation has its own
  unique, easily addressable statement.
1085\item Function calls and returns are still handled implicitly in the
1086  language and so the structural properties are ensured by the
1087  semantics.
1088\item Many of the back-end languages from \textsf{RTL} onwards share a common
1089  core set of definitions, and using structured traces throughout
1090  increases this uniformity.
1091\end{itemize}
1092
1093\begin{figure}
1094\begin{center}
1095\includegraphics[width=0.6\linewidth]{strtraces.pdf}
1096\end{center}
1097\caption{Nesting of functions in structured traces}
1098\label{fig:strtrace}
1099\end{figure}
1100A structured trace is a mutually inductive data type which
1101contains the steps from a normal program trace, but arranged into a
1102nested structure which groups entire function calls together and
1103aggregates individual steps between cost labels (or between the final
1104cost label and the return from the function), see
1105Figure~\ref{fig:strtrace}.  This captures the idea that the cost labels
1106only represent costs \emph{within} a function --- calls to other
1107functions are accounted for in the nested trace for their execution, and we
1108can locally regard function calls as a single step.
1109
1110These structured traces form the core part of the intermediate results
1111that we must prove so that the back-end can complete the main
1112intensional result stated above.  In full, we provide the back-end
1113with
1114\begin{enumerate}
1115\item A normal trace of the \textbf{prefix} of the program's execution
1116  before reaching the measurable subtrace.  (This needs to be
1117  preserved so that we know that the stack space consumed is correct,
1118  and to set up the simulation results.)
1119\item The \textbf{structured trace} corresponding to the measurable
1120  subtrace.
\item An additional property of the structured trace, namely that no
1122  `program counter' is \textbf{repeated} between cost labels.  Together with
1123  the structure in the trace, this takes over from showing that
1124  cost labelling is sound and precise.
1125\item A proof that the \textbf{observables} have been preserved.
1126\item A proof that the \textbf{stack limit} is still observed by the prefix and
  the structured trace.  (This is largely a consequence of the
1128  preservation of observables.)
1129\end{enumerate}
1130The \lstinline'front_end_correct' lemma in the
1131\lstinline'correctness.ma' file provides a record containing these.
1132
1133Following the outline in Figure~\ref{fig:compiler}, we will first deal
1134with the transformations in \textsf{Clight} that produce the source
1135program with cost labels, then show that measurable traces can be
1136lifted to \textsf{RTLabs}, and finally show that we can construct the
1137properties listed above ready for the back-end proofs.
1138
1139\section{Input code to cost labelled program}
1140\label{sec:inputtolabelling}
1141
1142As explained on page~\pageref{page:switchintro}, the costs of complex
1143C \lstinline[language=C]'switch' statements cannot be represented with
1144the simple labelling used in the formalised compiler.  Our first pass
1145replaces these statements with simpler C code, allowing our second
1146pass to perform the cost labelling.  We show that the behaviour of
1147programs is unchanged by these passes using forward
1148simulations\footnote{All of our languages are deterministic, which can
1149be seen directly from their executable definitions.  Thus we know that
1150forward simulations are sufficient because the target cannot have any
1151other behaviour.}.
1152
1153\subsection{Switch removal}
1154
1155We compile away \lstinline[language=C]'switch' statements into more
1156basic \textsf{Clight} code.
Note that this transformation does not necessarily degrade the
1158efficiency of the generated code. For instance, compilers such as GCC
1159introduce balanced trees of ``if-then-else'' constructs for small
1160switches. However, our implementation strategy is much simpler. Let
1161us consider the following input statement.
1162
1163\begin{lstlisting}[language=C]
1164   switch(e) {
1165   case v1:
1166     stmt1;
1167   case v2:
1168     stmt2;
1169   default:
1170     stmt_default;
1171   }
1172\end{lstlisting}
1173
1174Note that \textsf{stmt1}, \textsf{stmt2}, \ldots \textsf{stmt\_default}
1175may contain \lstinline[language=C]'break' statements, which have the
1176effect of exiting the switch statement. In the absence of break, the
1177execution falls through each case sequentially. In our implementation,
we produce an equivalent sequence of ``if-then'' blocks chained by gotos:
1179\begin{lstlisting}[language=C]
1180   fresh = e;
1181   if(fresh == v1) {
1182     $\llbracket$stmt1$\rrbracket$;
1183     goto lbl_case2;
1184   };
1185   if(fresh == v2) {
1186     lbl_case2:
1187     $\llbracket$stmt2$\rrbracket$;
     goto lbl_default;
   };
   lbl_default:
   $\llbracket$stmt_default$\rrbracket$;
   exit_label: ;
1192\end{lstlisting}
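
A \lstinline[language=C]'break' occurring anywhere inside one of the
case bodies is turned by the recursive translation
$\llbracket\cdot\rrbracket$ into a jump to the end of the construct;
schematically:
\begin{lstlisting}[language=C]
   break;             /* source, inside stmt2                    */
   goto exit_label;   /* its translation, inside $\llbracket$stmt2$\rrbracket$ */
\end{lstlisting}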
1193
1194The proof had to tackle the following points:
1195\begin{itemize}
1196\item the source and target memories are not the same (due to the fresh variable),
\item the flow of control is changed in a non-local way (e.g.\ \lstinline[language=C]'goto'
  instead of \lstinline[language=C]'break').
1199\end{itemize}
1200In order to tackle the first point, we implemented a version of memory
1201extensions similar to those of CompCert.
1202
1203For the simulation we decided to prove a sufficient amount to give us
1204confidence in the definitions and approach, but to curtail the proof
1205because this pass does not contribute to the intensional correctness
result.  We tackled several simple cases that do not interact with
1207the switch removal per se, to show that the definitions were usable,
1208and part of the switch case to check that the approach is
1209reasonable. This comprises propagating the memory extension through
1210each statement (except switch), as well as various invariants that are
1211needed for the switch case (in particular, freshness hypotheses). The
1212details of the evaluation process for the source switch statement and
1213its target counterpart can be found in the file
\lstinline'switchRemoval.ma', along with more details on the transformation
1215itself.
1216
Proving the correctness of the second point would require reasoning about the
1218semantics of \lstinline[language=C]'goto' statements. In the \textsf{Clight} 
1219semantics, this is implemented as a function-wide lookup of the target label.
1220The invariant we would need is the fact that a global label lookup on a freshly
1221created goto is equivalent to a local lookup. This would in turn require the
1222propagation of some freshness hypotheses on labels. As discussed,
1223we decided to omit this part of the correctness proof.
1224
1225\subsection{Cost labelling}
1226
1227The simulation for the cost labelling pass is the simplest in the
1228front-end.  The main argument is that any step of the source program
1229is simulated by the same step of the labelled one, plus any extra
1230steps for the added cost labels.  The extra instructions do not change
1231the memory or local environments, although we have to keep track of
1232the extra instructions that appear in continuations, for example
1233during the execution of a \lstinline[language=C]'while' loop.
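
For example (a schematic illustration in C-like syntax; the concrete
labelled \textsf{Clight} syntax differs, and the label names are
ours), labelling a loop places one cost label on each outcome of the
loop condition, and both labels appear as extra steps relative to the
source program:
\begin{lstlisting}[language=C]
   /* before labelling */
   while (e) { body; }
   /* after labelling (schematic) */
   while (e) { _cost_k: body; }
   _cost_l: ;
\end{lstlisting}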
1234
1235We do not attempt to capture any cost properties of the labelling\footnote{We describe how the cost properties are
1236established in Section~\ref{sec:costchecks}.} in
1237the simulation proof, which allows the proof to be oblivious to the choice
1238of cost labels.  Hence we do not have to reason about the threading of
1239name generation through the labelling function, greatly reducing the
1240amount of effort required.
1241
1249
1250\section{Finding corresponding measurable subtraces}
1251\label{sec:measurablelifting}
1252
1253There follow the three main passes of the front-end:
1254\begin{enumerate}
1255\item simplification of casts in \textsf{Clight} code
1256\item \textsf{Clight} to \textsf{Cminor} translation, performing stack
1257  variable allocation and simplifying control structures
1258\item transformation to \textsf{RTLabs} control flow graph
1259\end{enumerate}
1260We have taken a common approach to
1261each pass: first we build (or axiomatise) forward simulation results
1262that are similar to normal compiler proofs, but which are slightly more
1263fine-grained so that we can see that the call structure and relative
placement of cost labels are preserved.
1265
1266Then we instantiate a general result which shows that we can find a
1267\emph{measurable} subtrace in the target of the pass that corresponds
1268to the measurable subtrace in the source.  By repeated application of
1269this result we can find a measurable subtrace of the execution of the
1270\textsf{RTLabs} code, suitable for the construction of a structured
1271trace (see Section~\ref{sec:structuredtrace}).  This is essentially an
1272extra layer on top of the simulation proofs that provides us with the
1273additional information required for our intensional correctness proof.
1274
1275\subsection{Generic measurable subtrace lifting proof}
1276
1277Our generic proof is parametrised on a record containing small-step
1278semantics for the source and target language, a classification of
1279states (the same form of classification is used when defining
1280structured traces), a simulation relation which respects the
classification and cost labelling, and
1282four simulation results.  The simulations are split by the starting state's
1283classification and whether it is a cost label, which will allow us to
1284observe that the call structure is preserved.  They are:
1285\begin{enumerate}
1286\item a step from a `normal' state (which is not classified as a call
1287  or return) which is not a cost label is simulated by zero or more
1288  `normal' steps;
1289\item a step from a `call' state followed by a cost label step is
1290  simulated by a step from a `call' state, a corresponding label step,
1291  then zero or more `normal' steps;
1292\item a step from a `call' state not followed by a cost label
1293  similarly (note that this case cannot occur in a well-labelled
1294  program, but we do not have enough information locally to exploit
1295  this); and
1296\item a cost label step is simulated by a cost label step.
1297\end{enumerate}
1298Finally, we need to know that a successfully translated program will
1299have an initial state in the simulation relation with the original
1300program's initial state.
1301
1302The back-end has similar requirements for lifting simulations to
1303structured traces.  Fortunately, our treatment of calls and returns
can be slightly simpler because we have special call and return states,
corresponding to function entry and return, that are separate from
the actual instructions.  This was originally inherited from our port
1307of CompCert's \textsf{Clight} semantics, but proves useful here
1308because we only need to consider adding extra steps \emph{after} a
call or return state; the instruction step itself deals with any extra
steps that occur before.  The back-end makes all of the call and
1311return machinery explicit, and thus needs more complex statements
1312about extra steps before and after each call and return.
1313
1314\begin{figure}
1315\begin{center}
1316\includegraphics[width=0.5\linewidth]{meassim.pdf}
1317\end{center}
1318\caption{Tiling of simulation for a measurable subtrace}
1319\label{fig:tiling}
1320\end{figure}
1321
1322To find the measurable subtrace in the target program's execution we
1323walk along the original program's execution trace applying the
1324appropriate simulation result by induction on the number of steps.
1325While the number of steps taken varies, the overall structure is
1326preserved, as illustrated in Figure~\ref{fig:tiling}.  By preserving
1327the structure we also maintain the same intensional observables.  One
1328delicate point is that the cost label following a call must remain
1329directly afterwards\footnote{The prototype compiler allowed some
1330  straight-line code to appear before the cost label until a later
1331  stage of the compiler, but we must move the requirement forward to
1332  fit with the structured traces.}
1336(both in the program code and in the execution trace), even if we
1337introduce extra steps, for example to store parameters in memory in
1338\textsf{Cminor}.  Thus we have a version of the call simulation
1339that deals with both the call and the cost label in one result.
1340
1341In addition to the subtrace we are interested in measuring, we must
1342prove that the earlier part of the trace is also preserved in
1343order to use the simulation from the initial state.  This proof also
1344guarantees that we do not run out of stack space before the subtrace
1345we are interested in.  The lemmas for this prefix and the measurable
1346subtrace are similar, following the pattern above.  However, the
1347measurable subtrace also requires us to rebuild the termination
1348proof.  This is defined recursively:
1349\label{prog:terminationproof}
1350\begin{lstlisting}[language=Grafite]
1351  let rec will_return_aux C (depth:nat)
1352                               (trace:list (cs_state $...$ C $\times$ trace)) on trace : bool :=
1353  match trace with
1354  [ nil $\Rightarrow$ false
1355  | cons h tl $\Rightarrow$
1356    let $\langle$s,tr$\rangle$ := h in
1357    match cs_classify C s with
1358    [ cl_call $\Rightarrow$ will_return_aux C (S depth) tl
1359    | cl_return $\Rightarrow$
1360        match depth with
1361        [ O $\Rightarrow$ match tl with [ nil $\Rightarrow$ true | _ $\Rightarrow$ false ]
1362        | S d $\Rightarrow$ will_return_aux C d tl
1363        ]
1364    | _ $\Rightarrow$ will_return_aux C depth tl
1365    ]
1366  ].
1367\end{lstlisting}
1368The \lstinline'depth' is the number of return states we need to see
1369before we have returned to the original function (initially zero) and
\lstinline'trace' is the measurable subtrace obtained by running
the semantics for the correct number of steps.  This definition
1372unfolds tail recursively for each step, and once the corresponding
1373simulation result has been applied a new one for the target can be
1374asserted by unfolding and applying the induction hypothesis on the
1375shorter trace.
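
To illustrate the depth counting, here is a small sketch in C (our
own encoding of the state classifications, mirroring the Matita
definition above rather than reproducing it):
\begin{lstlisting}[language=C]
#include <stdbool.h>
#include <stddef.h>

enum cls { CL_OTHER, CL_CALL, CL_RETURN };

/* true iff the trace ends with the return from the original function */
bool will_return(const enum cls *trace, size_t len) {
  unsigned depth = 0;                /* calls still awaiting a return */
  for (size_t i = 0; i < len; i++) {
    switch (trace[i]) {
    case CL_CALL:   depth++; break;
    case CL_RETURN:
      if (depth == 0)
        return i + 1 == len;         /* must be the final step        */
      depth--;
      break;
    default:        break;           /* ordinary steps change nothing */
    }
  }
  return false;                      /* trace ran out before a return */
}
\end{lstlisting}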
1376
1377Combining the lemmas about the prefix and the measurable subtrace
1378requires a little care because the states joining the two might not be
1379related in the simulation.  In particular, if the measurable subtrace
1380starts from the cost label at the beginning of the function there may
1381be some extra instructions in the target code to execute to complete
1382function entry before the states are back in the relation.  Hence we
1383carefully phrased the lemmas to allow for such extra steps.
1384
Together, these give us an overall result for any simulation fitting the
1386requirements above (contained in the \lstinline'meas_sim' record):
1387\begin{lstlisting}[language=Grafite]
1388theorem measured_subtrace_preserved :
1389  $\forall$MS:meas_sim.
1390  $\forall$p1,p2,m,n,stack_cost,max.
1391  ms_compiled MS p1 p2 $\rightarrow$
1392  measurable (ms_C1 MS) p1 m n stack_cost max $\rightarrow$
1393  $\exists$m',n'.
1394    measurable (ms_C2 MS) p2 m' n' stack_cost max $\wedge$
1395    observables (ms_C1 MS) p1 m n = observables (ms_C2 MS) p2 m' n'.
1396\end{lstlisting}
1397The stack space requirement that is embedded in \lstinline'measurable'
1398is a consequence of the preservation of observables, because it is
1399determined by the functions called and returned from, which are observable.
1400
1401\subsection{Simulation results for each pass}
1402
1403We now consider the simulation results for the passes, each of which
1404is used to instantiate the
1405\lstinline[language=Grafite]'measured_subtrace_preserved' theorem to
1406construct the measurable subtrace for the next language.
1407
1408\subsubsection{Cast simplification}
1409
1410The parser used in \cerco{} introduces a lot of explicit type casts.
If left as they are, these constructs can greatly degrade the
1412quality of the generated code -- especially as the architecture
1413we consider is an $8$-bit one. In \textsf{Clight}, casts are
expressions. Hence, most of the work of this transformation
is performed on expressions. The transformation proceeds by recursively
1416trying to coerce an expression to a particular integer type, which
1417is in practice smaller than the original one. This functionality
1418is implemented by two mutually recursive functions whose signature
1419is the following.
1420
1421\begin{lstlisting}[language=Grafite]
1422let rec simplify_expr (e:expr) (target_sz:intsize) (target_sg:signedness)
1423  : $\Sigma$result:bool$\times$expr.
1424    $\forall$ge,en,m. simplify_inv ge en m e (\snd result) target_sz target_sg (\fst result) := $\ldots$
1425
1426and simplify_inside (e:expr) : $\Sigma$result:expr. conservation e result := $\ldots$
1427\end{lstlisting}
1428
The \textsf{simplify\_inside} function acts as a wrapper for
\textsf{simplify\_expr}. Whenever \textsf{simplify\_inside} encounters
an \textsf{Ecast} expression, it tries to coerce the sub-expression
to the desired type using \textsf{simplify\_expr}, which attempts to
perform the actual coercion. In return, \textsf{simplify\_expr} calls
back \textsf{simplify\_inside} in certain positions, where we
decided to be conservative in order to simplify the proofs. However,
the current design allows us to move incrementally to a more aggressive
version, by replacing recursive calls to \textsf{simplify\_inside} with
calls to \textsf{simplify\_expr} \emph{and} proving the corresponding
invariants -- where possible.
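
As an illustration (our own example, not drawn from the
formalisation), consider storing the sum of two 8-bit variables in an
8-bit variable; the parser makes C's promotion to
\lstinline[language=C]'int' explicit, and the transformation recovers
the byte-sized operation:
\begin{lstlisting}[language=C]
   unsigned char a, b, c;
   /* as parsed: arithmetic promoted to int */
   c = (unsigned char)((int)a + (int)b);
   /* after cast simplification: a one-byte addition suffices,
      because the truncated results coincide */
   c = a + b;
\end{lstlisting}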
1440
1441The \textsf{simplify\_inv} invariant encodes either the conservation
of the semantics during the transformation, corresponding to the failure
of the coercion (the \textsf{Inv\_eq} constructor), or the successful
1444downcast of the considered expression to the target type
1445(\textsf{Inv\_coerce\_ok}).
1446
1447\begin{lstlisting}[language=Grafite]
1448inductive simplify_inv
1449  (ge : genv) (en : env) (m : mem)
1450  (e1 : expr) (e2 : expr) (target_sz : intsize) (target_sg : signedness) : bool $\rightarrow$ Prop :=
1451| Inv_eq : $\forall$result_flag. $\ldots$
1452     simplify_inv ge en m e1 e2 target_sz target_sg result_flag
1453| Inv_coerce_ok : $\forall$src_sz,src_sg.
1454     typeof e1 = Tint src_sz src_sg $\rightarrow$
1455     typeof e2 = Tint target_sz target_sg $\rightarrow$     
1456     smaller_integer_val src_sz target_sz src_sg (exec_expr ge en m e1) (exec_expr ge en m e2) $\rightarrow$
1457     simplify_inv ge en m e1 e2 target_sz target_sg true.
1458\end{lstlisting}
1459
1460The \textsf{conservation} invariant for \textsf{simplify\_inside} simply states the conservation
1461of the semantics, as in the \textsf{Inv\_eq} constructor of the previous
1462invariant.
1463
1464\begin{lstlisting}[language=Grafite]
1465definition conservation := $\lambda$e,result. $\forall$ge,en,m.
1466          res_sim ? (exec_expr ge en m e) (exec_expr ge en m result)
1467       $\wedge$ res_sim ? (exec_lvalue ge en m e) (exec_lvalue ge en m result)
1468       $\wedge$ typeof e = typeof result.
1469\end{lstlisting}
1470
1471This invariant is then easily lifted to statement evaluations.
1472The main problem encountered with this particular pass was dealing with
1473inconsistently typed programs, a canonical case being a particular
1474integer constant of a certain size typed with another size. This
1475prompted the need to introduce numerous type checks, making
1476both the implementation and the proof more complex, even though more
1477comprehensive checks are made in the next stage.
1479
1480\subsubsection{Clight to Cminor}
1481
1482This pass is the last one operating on the \textsf{Clight} language.
1483Its input is a full \textsf{Clight} program, and its output is a
1484\textsf{Cminor} program. Note that we do not use an equivalent of
1485CompCert's \textsf{C\#minor} language: we translate directly to a
variant of \textsf{Cminor}. This has the advantage of requiring
neither the special loop constructs nor the explicit block
1488structure. Another salient point of our approach is that a significant
1489number of the properties needed for the simulation proof were directly
1490encoded in dependently typed translation functions.  In particular,
1491freshness conditions and well-typedness conditions are included. The
1492main effects of the transformation from \textsf{Clight} to
1493\textsf{Cminor} are listed below.
1494
1495\begin{itemize}
\item Variables are classified as being either globals, stack-allocated
  locals or potentially register-allocated locals (see the example
  after this list). The value of register-allocated
1498  local variables is moved out of the modelled memory and stored in a
1499  dedicated environment.
1500\item In \textsf{Clight}, each local variable has a dedicated memory block, whereas
1501  stack-allocated locals are bundled together on a function-by-function basis.
1502\item Loops are converted to jumps.
1503\end{itemize}
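
The following fragment (an illustrative example of the
classification; the criterion sketched in the comments is whether a
variable's address is taken) contains all three kinds of variable:
\begin{lstlisting}[language=C]
   int g;                  /* global: remains in memory           */

   int f(int x) {
     int a;                /* address never taken: a candidate
                              for the register environment        */
     int b;                /* address taken below: allocated in
                              the function's single stack frame   */
     int *p = &b;
     a = x + g;
     *p = a;
     return b;
   }
\end{lstlisting}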
1504
The first two points require memory injections which are more flexible than those
1506needed in the switch removal case. In the remainder of this section, we briefly
1507discuss our implementation of memory injections, and then the simulation proof.
1508
1509\paragraph{Memory injections.}
1510
1511Our memory injections are modelled after the work of Blazy \& Leroy.
1512However, the corresponding paper is based on the first version of the
1513CompCert memory model~\cite{2008-Leroy-Blazy-memory-model}, whereas we use a much more concrete model, allowing byte-level
1514manipulations (as in the later version of CompCert's memory model). We proved
1515roughly 80 \% of the required lemmas. Notably, some of the difficulties encountered were
1516due to overly relaxed conditions on pointer validity (fixed during development).
1517Some more side conditions had to be added to take care of possible overflows when converting
1518from \textbf{Z} block bounds to $16$ bit pointer offsets (in practice, such overflows
1519only occur in edge cases that are easily ruled out -- but this fact is not visible
1520in memory injections). Concretely, some of the lemmas on the preservation of simulation of
1521loads after writes were axiomatised, due to a lack of time.
1522
1523\paragraph{Simulation proof.}
1524
1525We proved the simulation result for expressions and a representative
1526selection of statements.  In particular we tackled
1527\lstinline[language=C]'while' statements to ensure that we correctly
1528translate loops because our approach differs from CompCert by
1529converting directly to \textsf{Cminor} \lstinline[language=C]'goto's
1530rather than maintaining a notion of loop in \textsf{Cminor}.  We also have a partial
1531proof for function entry, covering the setup of the memory injection,
1532but not function exit.  Exits, and the remaining statements, have been
1533axiomatised.
1534
1535Careful management of the proof state was required because proof terms
1536are embedded in \textsf{Cminor} code to show that invariants are
1537respected.  These proof terms appear in the proof state when inverting
1538the translation functions, and they can be large and awkward.  While
1539generalising them away is usually sufficient, it can be difficult when
1540they appear under a binder.
1541
1549
1550\subsubsection{Cminor to RTLabs}
1551
1552The translation from \textsf{Cminor} to \textsf{RTLabs} is a fairly
1553routine control flow graph (CFG) construction.  As such, we chose to
1554axiomatise the associated extensional simulation results.  However, we did prove several
1555properties of the generated programs:
1556\begin{itemize}
1557\item All statements are type correct with respect to the declared
1558  pseudo-register type environment.
1559\item The CFG is closed, and has a distinguished entry node and a
1560  unique exit node.
1561\end{itemize}
1562
1563These properties rely on similar properties about type safety and the
1564presence of \lstinline[language=C]'goto'-labels for \textsf{Cminor} programs
1565which are checked at the preceding stage.  As a result, this
transformation is total, and any compilation failure must occur earlier,
when the corresponding \textsf{Clight} source is still available and a better
error message can be generated.
1569
1570The proof obligations for these properties include many instances of
1571graph inclusion.  We automated these proofs using a small amount of
1572reflection, making the obligations much easier to handle.  One
1573drawback to enforcing invariants throughout is that temporarily
1574breaking them can be awkward.  For example, \lstinline'return'
1575statements were originally used as placeholders for
1576\lstinline[language=C]'goto' destinations that had not yet been
1577translated.  However, this made establishing the single exit node
1578property rather difficult, and a different placeholder was chosen
1579instead.  In other circumstances it is possible to prove a more
complex invariant and then simplify it at the end of the transformation.
1581
1582\section{Checking cost labelling properties}
1583\label{sec:costchecks}
1584
1585Ideally, we would provide proofs that the cost labelling pass always
1586produces programs that are soundly and precisely labelled and that
1587each subsequent pass preserves these properties.  This would match our
1588use of dependent types to eliminate impossible sources of errors
1589during compilation, in particular retaining intermediate language type
1590information.
1591
1592However, given the limited amount of time available we realised that
1593implementing a compile-time check for a sound and precise labelling of
1594the \textsf{RTLabs} intermediate code would reduce the proof burden
1595considerably.  This is similar in spirit to the use of translation
1596validation in certified compilation, which makes a similar trade-off
1597between the potential for compile-time failure and the volume of proof
1598required.
1599
1600The check cannot be pushed into a later stage of the compiler because
1601much of the information is embedded into the structured traces.
However, if an alternative method were used to show that function
1603returns in the compiled code are sufficiently well-behaved, then we
1604could consider pushing the cost property checks into the timing
1605analysis itself.  We leave this as a possible area for future work.
1606
1607\subsection{Implementation and correctness}
1608\label{sec:costchecksimpl}
1609
1610For a cost labelling to be sound and precise we need a cost label at
1611the start of each function, after each branch and at least one in
1612every loop.  The first two parts are trivial to check by examining the
1613code.  In \textsf{RTLabs} the last part is specified by saying
1614that there is a bound on the number of successive instruction nodes in
1615the CFG that you can follow before you encounter a cost label, and
1616checking this is more difficult.
1617
1618The implementation progresses through the set of nodes in the graph,
1619following successors until a cost label is found or a label-free cycle
1620is discovered (in which case the property does not hold and we return
1621an error).  This is made easier by the prior knowledge that every
1622successor of a branch instruction is a cost label, so we do not need
1623to search each branch.  When a label is found, we remove the chain of
1624program counters from the set and continue from another node in the
1625set until it is empty, at which point we know that there is a bound
1626for every node in the graph.
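
The following sketch (in C, over a hypothetical array representation
of the CFG with one successor per node; the formalised version works
directly on the \textsf{RTLabs} graph and returns an error rather
than a boolean) shows the essence of the procedure:
\begin{lstlisting}[language=C]
#include <stdbool.h>

#define N 6              /* number of CFG nodes (example value)     */
bool label[N];           /* does node i carry a cost label?         */
int  succ[N];            /* single successor per node: branches can
                            be skipped, because their successors are
                            all cost labels in a well-labelled
                            program                                  */

bool bounded_by_labels(void) {
  bool done[N] = { false };          /* nodes already known bounded  */
  for (int start = 0; start < N; start++) {
    bool onchain[N] = { false };     /* chain followed from start    */
    int n = start;
    /* follow successors until a label or a checked node is reached */
    while (!label[n] && !done[n]) {
      if (onchain[n]) return false;  /* label-free cycle discovered  */
      onchain[n] = true;
      n = succ[n];
    }
    /* the whole chain is bounded; remove it from consideration */
    for (int i = 0; i < N; i++)
      if (onchain[i]) done[i] = true;
  }
  return true;
}
\end{lstlisting}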
1627
1628Directly reasoning about the function that implements this procedure would be
1629rather awkward, so an inductive specification of a single step of its
1630behaviour was written and proved to match the implementation.  This
1631was then used to prove the implementation sound and complete.
1632
While we have not attempted to prove that the cost labelling properties
1634are established and preserved earlier in the compiler, we expect that
1635the effort for the \textsf{Cminor} to \textsf{RTLabs} stage alone
1636would be similar to the work outlined above, because it involves the
1637change from requiring a cost label at particular positions to
1638requiring cost labels to break loops in the CFG.  As there are another
1639three passes to consider (including the labelling itself), we believe
1640that using the check above is much simpler overall.
1641
1643
1644\section{Existence of a structured trace}
1645\label{sec:structuredtrace}
1646
1647The \emph{structured trace} idea introduced in
1648Section~\ref{sec:fegoals} enriches the execution trace of a program by
1649nesting function calls in a mixed-step style and embedding the cost
1650labelling properties of the program.  See Figure~\ref{fig:strtrace} on
1651page~\pageref{fig:strtrace} for an illustration of a structured trace.
1652It was originally designed to support the proof of correctness for the
1653timing analysis of the object code in the back-end, then generalised
1654to provide a common structure to use from the end of the front-end to
1655the object code.
1656
1657To make the definition generic we abstract over the semantics of the
1658language,
1659\begin{lstlisting}[language=Grafite]
1660record abstract_status : Type[1] :=
1661  { as_status :> Type[0]
1662  ; as_execute : relation as_status
1663  ; as_pc : DeqSet
1664  ; as_pc_of : as_status $\rightarrow$ as_pc
1665  ; as_classify : as_status $\rightarrow$ status_class
1666  ; as_label_of_pc : as_pc $\rightarrow$ option costlabel
1667  ; as_after_return : ($\Sigma$s:as_status. as_classify s =  cl_call) $\rightarrow$ as_status $\rightarrow$ Prop
1668  ; as_result: as_status $\rightarrow$ option int
1669  ; as_call_ident : ($\Sigma$s:as_status.as_classify s = cl_call) $\rightarrow$ ident
1670  ; as_tailcall_ident : ($\Sigma$s:as_status.as_classify s = cl_tailcall) $\rightarrow$ ident
1671  }.
1672\end{lstlisting}
1673which requires a type of states, an execution relation\footnote{All of
1674  our semantics are executable, but using a relation was simpler in
1675  the abstraction.}, some notion of abstract
1676program counter with decidable equality, the classification of states,
1677and functions to extract the observable intensional information (cost
1678labels and the identity of functions that are called).  The
1679\lstinline'as_after_return' property links the state before a function
1680call with the state after return, providing the evidence that
1681execution returns to the correct place.  The precise form varies
between stages; in \textsf{RTLabs} it insists that the CFG, the pointer to
the CFG node to execute next, and some call stack information are
preserved.
1685
1686The structured traces are defined using three mutually inductive
1687types.  The core data structure is \lstinline'trace_any_label', which
1688captures some straight-line execution until the next cost label or the
return from the enclosing function.  Any function call is embedded as
a single step, with its own trace nested inside and the before and
1691after states linked by \lstinline'as_after_return'; and states
1692classified as a `jump' (in particular branches) must be followed by a
1693cost label.
1694
1695The second type, \lstinline'trace_label_label', is a
1696\lstinline'trace_any_label' where the initial state is cost labelled.
1697Thus a trace in this type identifies a series of steps whose cost is
1698entirely accounted for by the label at the start.
1699
1700Finally, \lstinline'trace_label_return' is a sequence of
1701\lstinline'trace_label_label' values which end in the return from the
1702function.  These correspond to a measurable subtrace, and in
1703particular include executions of an entire function call (and so are
1704used for the nested calls in \lstinline'trace_any_label').
1705
1706\subsection{Construction}
1707
1708The construction of the structured trace replaces syntactic cost
1709labelling properties, which place requirements on where labels appear
1710in the program, with semantic properties that constrain the execution
1711traces of the program.  The construction begins by defining versions
1712of the sound and precise labelling properties on states and global
1713environments (for the code that appears in each of them) rather than
1714whole programs, and showing that these are preserved by steps of the
1715\textsf{RTLabs} semantics.
1716
1717Then we show that each cost labelling property required by the
1718definition of structured traces is locally satisfied.  These proofs are
1719broken up by the classification of states.  Similarly, we prove a
1720step-by-step stack preservation result, which states that the
1721\textsf{RTLabs} semantics never changes the lower parts of the stack.
1722
1723The core part of the construction of a structured trace is to use the
1724proof of termination from the measurable trace (defined on
1725page~\pageref{prog:terminationproof}) to `fold up' the execution into
1726the nested form.  The results outlined above fill in the proof
1727obligations for the cost labelling properties and the stack
1728preservation result shows that calls return to the correct location.
1729
1730The structured trace alone is not sufficient to capture the property
1731that the program is soundly labelled.  While the structured trace
1732guarantees termination, it still permits a loop to be executed a
1733finite number of times without encountering a cost label.  We
1734eliminate this by proving that no `program counter' repeats within any
1735\lstinline'trace_any_label' section by showing that it is incompatible
1736with the property that there is a bound on the number of successor
1737instructions you can follow in the CFG before you encounter a cost
1738label (from Section~\ref{sec:costchecksimpl}).
1739
1740\subsubsection{Complete execution structured traces}
1741
1742The development of the construction above started relatively early,
1743before the measurable subtrace preservation proofs.  To be confident
1744that the traces were well-formed at that time, we also developed a
1745complete execution form that embeds the traces above.  This includes
1746non-terminating program executions, where an infinite number of the terminating
1747structured traces are embedded.  This construction confirmed that our
1748definition of structured traces was consistent, although we later
1749found that we did not require the whole execution version for the
1750compiler correctness results.
1751
1752To construct these we need to know whether each function call will
1753eventually terminate, requiring the use of the excluded middle.  This
1754classical reasoning is local to the construction of whole program
1755traces and is not necessary for our main results.
1756
1757\section{Conclusion}
1758
1759In combination with the work on the CerCo back-end and by
1760concentrating on the novel intensional parts of the proof, we have
1761shown that it is possible to construct certifying compilers that
1762correctly report execution time and stack space costs.  The layering
1763of intensional correctness proofs on top of normal simulation results
1764provides a useful separation of concerns, and could permit the reuse
1765of existing results.
1766
1767\section{Files}
1768
1769The following table gives a high-level overview of the \matita{}
1770source files in Deliverable 3.4:
1771
1772\bigskip
1773
1774\begin{tabular}{rp{.7\linewidth}}
1775\lstinline'compiler.ma' & Top-level compiler definitions, in particular
1776\lstinline'front_end', and the whole compiler definition
1777\lstinline'compile'. \\
\lstinline'correctness.ma' & Correctness results: \lstinline'front_end_correct'
for the front-end and \lstinline'correct' for the whole compiler. \\
1780\lstinline'Clight/*' & \textsf{Clight}: proofs for switch
1781removal, cost labelling, cast simplification and conversion to
1782\textsf{Cminor}. \\
1783\lstinline'Cminor/*' & \textsf{Cminor}: axioms of conversion to
1784\textsf{RTLabs}. \\
1785\lstinline'RTLabs/*' & \textsf{RTLabs}: definitions and proofs for
1786compile-time cost labelling checks, construction of structured traces.
1787\\
1788\lstinline'common/Measurable.ma' & Definitions for measurable
1789subtraces. \\
1790\lstinline'common/FEMeasurable.ma' & Generic measurable subtrace
1791lifting proof. \\
1792\lstinline'common/*' & Other common definitions relevant to many parts
1793of the compiler and proof. \\
1794\lstinline'utilities/*' & General purpose definitions used throughout,
1795including extensions to the standard \matita{} library.
1796\end{tabular}
1797
1798\subsection{A brief overview of the backend compilation chain}
1799\label{subsect.brief.overview.backend.compilation.chain}
1800
The Matita compiler's backend uses five distinct intermediate languages: RTL, RTLntl, ERTL, LTL and LIN.
1802A sixth language, RTLabs, serves as the entry point of the backend and the exit point of the frontend.
1803RTL, RTLntl, ERTL and LTL are `control flow graph based' languages, whereas LIN is a linearised language, the final language before translation to assembly.
1804
We now briefly discuss the properties of the intermediate languages, and the various transformations that take place during the translation process:
1806
1807\paragraph{RTLabs ((Abstract) Register Transfer Language)}
1808As mentioned, this is the final language of the compiler's frontend and the entry point for the backend.
1809This language uses pseudoregisters, not hardware registers.\footnote{There are an unbounded number of pseudoregisters.  Pseudoregisters are converted to hardware registers or stack positions during register allocation.}
Functions still use stack frames, where arguments are passed on the stack and results are stored in addresses.
Instruction selection is carried out during the pass to RTL.
1812
1813\paragraph{RTL (Register Transfer Language)}
1814This language uses pseudoregisters, not hardware registers.
1815Tailcall elimination is carried out during the translation from RTL to RTLntl.
1816
1817\paragraph{RTLntl (Register Transfer Language --- No Tailcalls)}
This is a pseudoregister, graph-based language in which all tailcalls have been eliminated.
RTLntl is not present in the O'Caml compiler; there, the RTL language is reused for this purpose.
1820
1821\paragraph{ERTL (Explicit Register Transfer Language)}
1822This is a language very similar to RTLntl.
However, the calling convention is made explicit, in that functions no longer receive and return inputs and outputs via a high-level mechanism, but rather use stack slots or hardware registers.
1824The ERTL to LTL pass performs the following transformations: liveness analysis, register colouring and register/stack slot allocation.
1825
1826\paragraph{LTL (Linearisable Transfer Language)}
Another graph-based language, but one that uses hardware registers instead of pseudoregisters.
1828Tunnelling (branch compression) should be implemented here.
1829
1830\paragraph{LIN (Linearised)}
1831This is a linearised form of the LTL language; function graphs have been linearised into lists of statements.
1832All registers have been translated into hardware registers or stack addresses.
1833This is the final stage of compilation before translating directly into assembly language.
1834
1835%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
1836% SECTION.                                                                    %
1837%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
1838\section{The backend intermediate languages in Matita}
1839\label{sect.backend.intermediate.languages.matita}
1840
1841We now discuss the encoding of the compiler backend languages in the Calculus of Constructions proper.
1842We pay particular heed to changes that we made from the O'Caml prototype.
1843In particular, many aspects of the backend languages have been unified into a single `joint' language.
1844We have also made heavy use of dependent types to reduce `spurious partiality' and to encode invariants.
1845
1846%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
1847% SECTION.                                                                    %
1848%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
1849\subsection{Abstracting related languages}
1850\label{subsect.abstracting.related.languages}
1851
1852The O'Caml compiler is written in the following manner.
1853Each intermediate language has its own dedicated syntax, notions of internal function, and so on.
Here, we make a distinction between `internal functions'---functions that are explicitly written by the programmer---and `external functions', which belong to an external library and require explicit linking.
1855In particular, IO can be seen as a special case of the `external function' mechanism.
Internal functions are represented as a record consisting of a sequential structure of statements, entry and exit points to this structure, and other book-keeping devices.
This sequential structure can either be a control flow graph or a linearised list of statements, depending on the language.
Translations between intermediate languages explicitly map syntaxes to syntaxes, and internal function representations to internal function representations.
1859
1860This is a perfectly valid way to write a compiler, where everything is made explicit, but writing a \emph{verified} compiler poses new challenges.
1861In particular, we must look ahead to see how our choice of encodings will affect the size and complexity of the forthcoming proofs of correctness.
1862We now discuss some abstractions, introduced in the Matita code, which we hope will make our proofs shorter, amongst other benefits.
1863
1864\paragraph{Changes between languages made explicit}
1865Due to the bureaucracy inherent in explicating each intermediate language's syntax in the O'Caml compiler, it can often be hard to see exactly what changes between each successive intermediate language.
1866By abstracting the syntax of the RTL, ERTL, LTL and LIN intermediate languages, we make these changes much clearer.
1867
1868Our abstraction takes the following form:
1869\begin{lstlisting}[language=Grafite]
1870inductive joint_instruction (p: params__) (globals: list ident): Type[0] :=
1871  | COMMENT: String $\rightarrow$ joint_instruction p globals
1872  ...
1873  | INT: generic_reg p $\rightarrow$ Byte $\rightarrow$ joint_instruction p globals
1874  ...
1875  | OP1: Op1 $\rightarrow$ acc_a_reg p $\rightarrow$ acc_a_reg p $\rightarrow$ joint_instruction p globals
1876  ...
1877  | extension: extend_statements p $\rightarrow$ joint_instruction p globals.
1878\end{lstlisting}
1879We first note that for the majority of intermediate languages, many instructions are shared.
1880However, these instructions expect different register types (either a pseudoregister or a hardware register) as arguments.
1881We must therefore parameterise the joint syntax with a record of parameters that will be specialised to each intermediate language.
1882In the type above, this parameterisation is realised with the \texttt{params\_\_} record.
1883As a result of this parameterisation, we have also added a degree of `type safety' to the intermediate languages' syntaxes.
1884In particular, we note that the \texttt{OP1} constructor expects quite a specific type, in that the two register arguments must both be what passes for the accumulator A in that language.
1885In some languages, for example LIN, this is the hardware accumulator, whilst in others this is any pseudoregister.
1886Contrast this with the \texttt{INT} constructor, which expects a \texttt{generic\_reg}, corresponding to an `arbitrary' register type.
1887
1888Further, we note that some intermediate languages have language specific instructions (i.e. the instructions that change between languages).
1889We therefore add a new constructor to the syntax, \texttt{extension}, which expects a value of type \texttt{extend\_statements p}.
1890As \texttt{p} varies between intermediate languages, we can provide language specific extensions to the syntax of the joint language.
1891For example, ERTL's extended syntax consists of the following extra statements:
1892\begin{lstlisting}[language=Grafite]
1893inductive ertl_statement_extension: Type[0] :=
1894  | ertl_st_ext_new_frame: ertl_statement_extension
1895  | ertl_st_ext_del_frame: ertl_statement_extension
1896  | ertl_st_ext_frame_size: register $\rightarrow$ ertl_statement_extension.
1897\end{lstlisting}
1898These are further packaged into an ERTL specific instance of \texttt{params\_\_} as follows:
1899\begin{lstlisting}[language=Grafite]
1900definition ertl_params__: params__ :=
1901  mk_params__ register register ... ertl_statement_extension.
1902\end{lstlisting}
1903
1904\paragraph{Shared code, reduced proofs}
1905Many features of individual backend intermediate languages are shared with other intermediate languages.
1906For instance, RTLabs, RTL, ERTL and LTL are all graph based languages, where functions are represented as a control flow graph of statements that form their bodies.
1907Functions for adding statements to a graph, searching the graph, and so on, are remarkably similar across all languages, but are duplicated in the O'Caml code.
1908
1909As a result, we chose to abstract the representation of internal functions for the RTL, ERTL, LTL and LIN intermediate languages into a `joint' representation.
1910This representation is parameterised by a record that dictates the layout of the function body for each intermediate language.
1911For instance, in RTL, the layout is graph like, whereas in LIN, the layout is a linearised list of statements.
1912Further, a generalised way of accessing the successor statement to the one currently under consideration is needed, and so forth.
1913
1914Our joint internal function record looks like so:
1915\begin{lstlisting}[language=Grafite]
1916record joint_internal_function (globals: list ident) (p:params globals) : Type[0] :=
1917{ 
1918  ...
1919  joint_if_params   : paramsT p;
1920  joint_if_locals   : localsT p;
1921  ...
1922  joint_if_code     : codeT ... p;
1923  ...
1924}.
1925\end{lstlisting}
1926In particular, everything that can vary between differing intermediate languages has been parameterised.
Here, we see that the location from which to fetch parameters, the listing of local variables, and the internal code representation have all been parameterised.
Other particulars are also parameterised, though omitted here.
1929
We hope that this abstraction process will reduce the number of proofs that need to be written, as results about the parameterised representation apply to the internal functions of every language at once.
1931
1932\paragraph{Dependency on instruction selection}
1933We note that the backend languages are all essentially `post instruction selection languages'.
1934The `joint' syntax makes this especially clear.
1935For instance, in the definition:
1936\begin{lstlisting}[language=Grafite]
1937inductive joint_instruction (p:params__) (globals: list ident): Type[0] :=
1938  ...
1939  | INT: generic_reg p $\rightarrow$ Byte $\rightarrow$ joint_instruction p globals
1940  | MOVE: pair_reg p $\rightarrow$ joint_instruction p globals
1941  ...
1942  | PUSH: acc_a_reg p $\rightarrow$ joint_instruction p globals
1943  ...
1944  | extension: extend_statements p $\rightarrow$ joint_instruction p globals.
1945\end{lstlisting}
1946The capitalised constructors---\texttt{INT}, \texttt{MOVE}, and so on---are all machine specific instructions.
1947Retargetting the compiler to another microprocessor, improving instruction selection, or simply enlarging the subset of the machine language that the compiler can use, would entail replacing these constructors with constructors that correspond to the instructions of the new target.
We feel that this makes it much more explicit which instructions are target dependent and which are not (i.e.\ those language specific instructions that fall inside the \texttt{extension} constructor).
1949In the long term, we would really like to try to directly embed the target language in the syntax, in order to reuse the target language's semantics.
1950
1951\paragraph{Independent development and testing}
1952We have essentially modularised the intermediate languages in the compiler backend.
1953As with any form of modularisation, we reap benefits in the ability to independently test and develop each intermediate language separately, with the benefit of fixing bugs just once.
1954
1955\paragraph{Future reuse for other compiler projects}
1956Another advantage of our modularisation scheme is the ability to quickly use and reuse intermediate languages for other compiler projects.
1957For instance, in creating a cost-preserving compiler for a functional language, we may choose to target a linearised version of RTL directly.
1958Adding such an intermediate language would involve the addition of just a few lines of code.
1959
1960\paragraph{Easy addition of new compiler passes}
1961Under our modularisation and abstraction scheme, new compiler passes can easily be injected into the backend.
1962We have a concrete example of this in the RTLntl language, an intermediate language that was not present in the original O'Caml code.
1963To specify a new intermediate language we must simply specify, through the use of the statement extension mechanism, what differs in the new intermediate language from the `joint' language, and configure a new notion of internal function record, by specialising parameters, to the new language.
1964As generic code for the `joint' language exists, for example to add statements to control flow graphs, this code can be reused for the new intermediate language.
1965
1966\paragraph{Possible commutations of translation passes}
1967The backend translation passes of the CerCo compiler differ quite a bit from the CompCert compiler.
1968In the CompCert compiler, linearisation occurs much earlier in the compilation chain, and passes such as register colouring and allocation are carried out on a linearised form of program.
1969Contrast this with our own approach, where the code is represented as a graph for much longer.
1970Similarly, in CompCert the calling conventions are enforced after register allocation, whereas we do register allocation before enforcing the calling convention.
1971
1972However, by abstracting the representation of intermediate functions, we are now much more free to reorder translation passes as we see fit.
1973The linearisation process, for instance, now no longer cares about the specific representation of code in the source and target languages.
1974It just relies on a common interface.
1975We are therefore, in theory, free to pick where we wish to linearise our representation.
1976This adds an unusual flexibility into the compilation process, and allows us to freely experiment with different orderings of translation passes.
1977
1978%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
1979% SECTION.                                                                    %
1980%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
1981\subsection{Use of dependent types}
1982\label{subsect.use.of.dependent.types}
1983
1984We see several potential ways in which a compiler can fail to compile a program:
1985\begin{enumerate}
1986\item
1987The program is malformed, and there is no hope of making sense of the program.
1988\item
1989The compiler is buggy, or an invariant in the compiler is invalidated.
1990\item
1991An incomplete heuristic in the compiler fails.
1992\item
1993The compiled code exhausts some bounded resource, for instance the processor's code memory.
1994\end{enumerate}
1995Standard compilers can fail for all the above reasons.
1996Certified compilers are only required to rule out the second class of failures, but they can still fail for all the remaining reasons.
In particular, a compiler that systematically refuses to compile any well-formed program is a sound compiler.
In contrast, we would like our certified compiler to fail only in the fourth case.
1999We plan to achieve this with the following strategy.
2000
2001First, the compiler is abstracted over all incomplete heuristics, seen as total functions.
2002To obtain executable code, the compiler is eventually composed with implementations of the abstracted strategies, with the composition taking care of any potential failure of the heuristics in finding a solution.
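As a simple illustration of this composition, consider the following OCaml sketch (the names are illustrative, not CerCo's actual interface): an untrusted, possibly failing strategy is combined with a naive but total fallback to obtain the total heuristic that the certified passes expect.
\begin{lstlisting}[language=Grafite]
(* A sketch only: build the total heuristic expected by a certified
   pass from an untrusted strategy that may fail to find a solution
   and a naive but total fallback. *)
let totalise (clever : 'a -> 'b option) (fallback : 'a -> 'b) : 'a -> 'b =
  fun x ->
    match clever x with
    | Some y -> y          (* the clever heuristic found a solution *)
    | None   -> fallback x (* otherwise use the total default *)
\end{lstlisting}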
2003
2004Second, we reject all malformed programs using dependent types: only well-formed programs should typecheck and the compiler can be applied only to well-typed programs.
2005
2006Finally, exhaustion of bounded resources can be checked only at the very end of compilation.
Therefore, all intermediate compilation steps are now total functions that can neither diverge nor fail: these properties are directly guaranteed by the type system of Matita.
2008
This plan is not yet fully realised.
However, we are bringing the implemented code into line with it by progressively strengthening the code through the use of dependent types.
2012We detail the different ways in which dependent types have been used so far.
2013
2014First, we encode informal invariants, or uses of \texttt{assert false} in the O'Caml code, with dependent types, converting partial functions into total functions.
2015There are numerous examples of this throughout the backend.
2016For example, in the \texttt{RTLabs} to \texttt{RTL} transformation pass, many functions only `make sense' when lists of registers passed to them as arguments conform to some specific length.
2017For instance, the \texttt{translate\_negint} function, which translates a negative integer constant:
2018\begin{lstlisting}[language=Grafite]
2019definition translate_negint :=
2020  $\lambda$globals: list ident.
2021  $\lambda$destrs: list register.
2022  $\lambda$srcrs: list register.
2023  $\lambda$start_lbl: label.
2024  $\lambda$dest_lbl: label.
2025  $\lambda$def: rtl_internal_function globals.
2026  $\lambda$prf: |destrs| = |srcrs|. (* assert here *)
2027    ...
2028\end{lstlisting}
2029The last argument to the function, \texttt{prf}, is a proof that the lengths of the lists of source and destination registers are the same.
2030This was an assertion in the O'Caml code.
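For comparison, a reconstructed sketch of the corresponding O'Caml idiom follows (not the verbatim CerCo code; \texttt{translate\_negint\_checked} is a hypothetical name standing for the translation proper):
\begin{lstlisting}[language=Grafite]
(* Sketch: the invariant is checked at runtime, and a violation aborts
   the compiler; in Matita this check becomes the proof argument prf. *)
let translate_negint globals destrs srcrs start_lbl dest_lbl def =
  assert (List.length destrs = List.length srcrs); (* may fail at runtime *)
  translate_negint_checked globals destrs srcrs start_lbl dest_lbl def
\end{lstlisting}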
2031
Secondly, we use dependent types to make the Matita code correct by construction and, eventually, to make the proofs of correctness for the compiler easier to write.
2033For instance, many intermediate languages in the backend of the compiler, from RTLabs to LTL, are graph based languages.
2034Here, function definitions consist of a graph (i.e. a map from labels to statements) and a pair of labels denoting the entry and exit points of this graph.
In practice, we always want to ensure that the entry and exit labels are present in the statement graph.
2036We ensure that this is so with a dependent sum type in the \texttt{joint\_internal\_function} record, which all graph based languages specialise to obtain their own internal function representation:
2037\begin{lstlisting}[language=Grafite]
2038record joint_internal_function (globals: list ident) (p: params globals): Type[0] :=
2039{
2040  ...
2041  joint_if_code     : codeT $\ldots$ p;
2042  joint_if_entry    : $\Sigma$l: label. lookup $\ldots$ joint_if_code l $\neq$ None $\ldots$;
2043  ...
2044}.
2045\end{lstlisting}
2046Here, \texttt{codeT} is a parameterised type representing the `structure' of the function's body (a graph in graph based languages, and a list in the linearised LIN language).
Specifically, \texttt{joint\_if\_entry} is a dependent pair consisting of a label and a proof that the label in question is a vertex in the function's graph; a similar device exists for the exit label.
2048
2049We make use of dependent types also for another reason: experimentation.
2050Namely, CompCert makes little use of dependent types to encode invariants.
In contrast, we wish to make as much use of dependent types as possible, both to experiment with different ways of encoding compilers in a proof assistant and to `stress test' Matita's support for dependent types.
2052
2053Moreover, at the moment we make practically no use of inductive predicates to specify compiler invariants and to describe the semantics of intermediate languages.
Instead, all predicates are computable functions.
Therefore, the proof style that we adopt will necessarily be significantly different from, say, CompCert's.
2056At the moment, in Matita `Russell-'-style reasoning (in the sense of~\cite{sozeau:subset:2006}) seems to be the best solution for working with computable functions.
2057This style is heavily based on the idea that all computable functions should be specified using dependent types to describe their pre- and post-conditions.
2058As a consequence, it is natural to add dependent types almost everywhere in the Matita compiler's codebase.
2059
2060%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
2061% SECTION.                                                                    %
2062%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
2063\subsection{What we do not implement}
2064\label{subsect.what.we.do.not.implement}
2065
2066There are several classes of functionality that we have chosen not to implement in the backend languages:
2067\begin{itemize}
2068\item
2069\textbf{Datatypes and functions over these datatypes that are not supported by the compiler.}
2070In particular, the compiler does not support the floating point datatype, nor accompanying functions over that datatype.
2071At the moment, frontend languages within the compiler possess constructors corresponding to floating point code.
2072These are removed during instruction selection (in the RTLabs to RTL transformation) using a daemon.\footnote{A Girardism.  An axiom of type \texttt{False}, from which we can prove anything.}
2073However, at some point, we would like the front end of the compiler to recognise programs that use floating point code and reject them as being invalid.
2074\item
2075\textbf{Axiomatised components that will be implemented using external oracles.}
Several large, complex pieces of compiler infrastructure, most noticeably register colouring and fixed-point calculation during liveness analysis, have been axiomatised.
2077This was already agreed upon before the start of the project, and is clearly marked in the project proposal, following comments by those involved with the CompCert project about the difficulty in formalising register colouring in that project.
2078Instead, these components are axiomatised, along with the properties that they need to satisfy in order for the rest of the compilation chain to be correct.
2079These axiomatised components are found in the ERTL to LTL pass.
2080
It should be noted that these axiomatised components fall into a common pattern: whilst their implementation is complex, and their proof of correctness difficult, we are able to quickly and easily verify that any answer they provide is correct (a sketch of this pattern is given after this list). Therefore, we plan to provide implementations in OCaml only,
and to provide certified verifiers in Matita.
2083At the moment, the implementation of the certified verifiers is left as future work.
2084\item
2085\textbf{A few non-computational proof obligations.}
A few difficult-to-close but non-computational proof obligations (i.e. ones that do not prevent us from executing the compiler inside Matita) have been closed using daemons in the backend.
2087These proof obligations originate with our use of dependent types for expressing invariants in the compiler.
2088However, here, it should be mentioned that many open proof obligations are simply impossible to close until we start to obtain stronger invariants from the proof of correctness for the compiler proper.
2089In particular, in the RTLabs to RTL pass, several proof obligations relating to lists of registers stored in a `local environment' appear to fall into this pattern.
2090\item
2091\textbf{Branch compression (tunnelling).}
2092This was a feature of the O'Caml compiler.
It is not yet implemented in the Matita compiler.
2094This feature is only an optimisation, and will not affect the correctness of the compiler.
2095\item
\textbf{`Real' tailcalls.}
2097For the time being, tailcalls in the backend are translated to `vanilla' function calls during the ERTL to LTL pass.
This follows the O'Caml compiler, which did not implement tailcalls and performed the same simplification.
2099`Real' tailcalls are being implemented in the O'Caml compiler, and when this implementation is complete, we aim to port this code to the Matita compiler.
2100\end{itemize}
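To illustrate the oracle-and-verifier pattern mentioned above for axiomatised components, the following OCaml sketch (all types and names are illustrative, not CerCo's actual interface) guards an untrusted register colouring oracle with a cheap validity check:
\begin{lstlisting}[language=Grafite]
(* Sketch: the oracle is complex and untrusted, but its answers are
   cheap to check, so bogus answers can simply be rejected. *)
let colour_registers (g : interference_graph) : colouring option =
  let c = colouring_oracle g in (* untrusted, complex implementation *)
  if valid_colouring g c        (* cheap check: no clashing colours  *)
  then Some c
  else None
\end{lstlisting}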
2101
2102%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
2103% SECTION.                                                                    %
2104%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
2105\section{Associated changes to O'Caml compiler}
2106\label{sect.associated.changes.to.ocaml.compiler}
2107
At the moment, only bugfixes, but no architectural changes, made in the Matita backend have found their way back into the O'Caml compiler.
We do not expect the heavy process of modularisation and abstraction to make its way back into the O'Caml codebase, as it amounts to a significant rewrite of the backend code that is, in any case, supposed to yield the same code once its parameters are instantiated.
2110
2111%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
2112% SECTION.                                                                    %
2113%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
2114\section{Future work}
2115\label{sect.future.work}
2116
2117As mentioned in Section~\ref{subsect.what.we.do.not.implement}, there are several unimplemented features in the compiler, and several aspects of the Matita code that can be improved in order to make currently partial functions total.
2118We summarise this future work here:
2119\begin{itemize}
2120\item
2121We plan to make use of dependent types to identify `floating point' free programs and make all functions total over such programs.
2122This will remove a swathe of uses of daemons.
2123This should be routine.
2124\item
2125We plan to move expansion of integer modulus, and other related functions, into the instruction selection (RTLabs to RTL) phase.
2126This will also help to remove a swathe of uses of daemons, as well as potentially introduce new opportunities for optimisations that we currently miss in expanding these instructions at the C-light level.
2127\item
2128We plan to close all existing proof obligations that are closed using daemons, arising from our use of dependent types in the backend.
2129However, many may not be closable until we have completed Deliverable D4.4, the certification of the whole compiler, as we may not have invariants strong enough at the present time.
2130\item
2131We plan to port the O'Caml compiler's implementation of tailcalls when this is completed, and eventually port the branch compression code currently in the O'Caml compiler to the Matita implementation.
2132This should not cause any major problems.
2133\item
2134We plan to validate the backend translations, removing any obvious bugs, by executing the translation inside Matita on small C programs.
2135This is not critical, as the certification process will find all bugs anyway.
\item We plan to provide certified validators for all results provided by
external oracles written in OCaml. At the moment, we have axiomatised oracles
for computing least fixpoints during liveness analysis, for colouring
registers and for branch displacement in the assembler code.
2140\end{itemize}
2141
2142\section{The back-end intermediate languages' semantics in Matita}
2143\label{sect.backend.intermediate.languages.semantics.matita}
2144
2145%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
2146% SECTION.                                                                    %
2147%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
2148\subsection{Abstracting related languages}
2149\label{subsect.abstracting.related.languages}
2150
As mentioned in the report for Deliverable D4.2, a systematic process of abstraction over the OCaml code has taken place in the Matita encoding.
2152In particular, we have merged many of the syntaxes of the intermediate languages (i.e. RTL, ERTL, LTL and LIN) into a single `joint' syntax, which is parameterised by various types.
2153Equivalent intermediate languages to those present in the OCaml code can be recovered by specialising this joint structure.
2154
2155As mentioned in the report for Deliverable D4.2, there are a number of advantages that this process of abstraction brings, from code reuse to allowing us to get a clearer view of the intermediate languages and their structure.
However, the semantics of the intermediate languages allow us to demonstrate this improvement in clarity concretely, by noting that the semantics of the LTL and LIN languages are identical, and indeed implemented in exactly the same way.
The only difference between the two languages is how the next instruction to be interpreted is fetched: in LTL, this requires a lookup in the control flow graph, whereas in LIN it means fetching the next instruction from a list.
2160
2161As a result, we see that the semantics of LIN and LTL are both instances of a single, more general language that is parametric in how the next instruction is fetched.
Furthermore, any prospective proof that the semantics of LTL and LIN are identical is now almost trivial, saving a good deal of work in Deliverable D4.4.
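The following OCaml sketch captures the idea (with placeholder types standing in for the real ones): the generic semantics is parametric in a fetch function, and LTL and LIN merely supply different instantiations.
\begin{lstlisting}[language=Grafite]
(* Sketch: the single generic semantics is parametric in how the next
   statement is fetched; LTL and LIN differ only in this function. *)
module LabelMap = Map.Make (String) (* labels as strings, for illustration *)
type statement = unit               (* placeholder statement type *)

type ('code, 'pc) fetch = 'code -> 'pc -> statement option

(* graph-based LTL: the program counter is a label, code a finite map *)
let graph_fetch : (statement LabelMap.t, string) fetch =
  fun code l -> LabelMap.find_opt l code

(* linearised LIN: the program counter is an index, code a list *)
let list_fetch : (statement list, int) fetch =
  fun code n -> List.nth_opt code n
\end{lstlisting}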
2163
2164%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
2165% SECTION.                                                                    %
2166%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
2167\subsection{Type parameters, and their purpose}
2168\label{subsect.type.parameters.their.purpose}
2169
2170We mentioned in the Deliverable D4.2 report that all joint languages are parameterised by a number of types, which are later specialised to each distinct intermediate language.
As this parameterisation process also depends on design decisions in the language semantics, we have so far held off summarising the role of each parameter.
2172
2173We begin the abstraction process with the \texttt{params\_\_} record.
2174This holds the types of the representations of the different register varieties in the intermediate languages:
2175\begin{lstlisting}[language=Grafite]
2176record params__: Type[1] :=
2177{
2178  acc_a_reg: Type[0];
2179  acc_b_reg: Type[0];
2180  dpl_reg: Type[0];
2181  dph_reg: Type[0];
2182  pair_reg: Type[0];
2183  generic_reg: Type[0];
2184  call_args: Type[0];
2185  call_dest: Type[0];
2186  extend_statements: Type[0]
2187}.
2188\end{lstlisting}
2189We summarise what these types mean, and how they are used in both the semantics and the translation process:
2190\begin{center}
2191\begin{tabular*}{\textwidth}{p{4cm}p{11cm}}
2192Type & Explanation \\
2193\hline
2194\texttt{acc\_a\_reg} & The type of the accumulator A register.  In some languages this is implemented as the hardware accumulator, whereas in others this is a pseudoregister.\\
\texttt{acc\_b\_reg} & Similar to the accumulator A field, but for the processor's auxiliary accumulator, B. \\
\texttt{dpl\_reg} & The type of the representation of DPL, the register holding the low eight bits of the MCS-51's single 16-bit register.  Can be either a pseudoregister or the hardware DPL register. \\
\texttt{dph\_reg} & Similar to the DPL register, but for the high eight bits of the 16-bit register. \\
2198\texttt{pair\_reg} & Various different `move' instructions have been merged into a single move instruction in the joint language.  A value can either be moved to or from the accumulator in some languages, or moved to and from an arbitrary pseudoregister in others.  This type encodes how we should move data around the registers and accumulators. \\
2199\texttt{generic\_reg} & The representation of generic registers (i.e. those that are not devoted to a specific task). \\
2200\texttt{call\_args} & The actual arguments passed to a function.  For some languages this is simply the number of arguments passed to the function. \\
2201\texttt{call\_dest} & The destination of the function call. \\
2202\texttt{extend\_statements} & Instructions that are specific to a particular intermediate language, and which cannot be abstracted into the joint language.
2203\end{tabular*}
2204\end{center}
2205
2206As mentioned in the report for Deliverable D4.2, the record \texttt{params\_\_} is enough to be able to specify the instructions of the joint languages:
2207\begin{lstlisting}[language=Grafite]
2208inductive joint_instruction (p: params__) (globals: list ident): Type[0] :=
2209  | COMMENT: String $\rightarrow$ joint_instruction p globals
2210  | COST_LABEL: costlabel $\rightarrow$ joint_instruction p globals
2211  ...
2212  | OP1: Op1 $\rightarrow$ acc_a_reg p $\rightarrow$ acc_a_reg p $\rightarrow$ joint_instruction p globals
2213  | COND: acc_a_reg p $\rightarrow$ label $\rightarrow$ joint_instruction p globals
2214  ...
2215\end{lstlisting}
2216Here, we see that the instruction \texttt{OP1} (a unary operation on the accumulator A) can be given quite a specific type, through the use of the \texttt{params\_\_} data structure.
2217
Joint statements can be split into two subclasses: those that simply pass the flow of control on to their successor statement, and those that jump to a potentially remote location in the program.
Naturally, as some intermediate languages are graph based and others linearised, passing control on to the `successor' instruction can mean either following an edge in a control flow graph or incrementing an index into a list.
We make a distinction between instructions that pass control on to their immediate successors and those that jump elsewhere in the program through the use of \texttt{succ}, denoting the immediate successor of the current instruction, in the \texttt{params\_} record described below.
2221\begin{lstlisting}[language=Grafite]
2222record params_: Type[1] :=
2223{
2224  pars__ :> params__;
2225  succ: Type[0]
2226}.
2227\end{lstlisting}
2228The type \texttt{succ} corresponds to labels, in the case of control flow graph based languages, or is instantiated to the unit type for the linearised language, LIN.
Using \texttt{params\_} we can define statements of the joint language:
2230\begin{lstlisting}[language=Grafite]
2231inductive joint_statement (p:params_) (globals: list ident): Type[0] :=
2232  | sequential: joint_instruction p globals $\rightarrow$ succ p $\rightarrow$ joint_statement p globals
2233  | GOTO: label $\rightarrow$ joint_statement p globals
2234  | RETURN: joint_statement p globals.
2235\end{lstlisting}
2236Note that in the joint language, instructions are `linear', in that they have an immediate successor.
Statements, on the other hand, consist of either a linear instruction, or a \texttt{GOTO} or \texttt{RETURN} statement, both of which can jump to an arbitrary place in the program.
The conditional jump instruction \texttt{COND} is `linear', since it has an immediate successor, but it also takes an arbitrary location (a label) to jump to.
2240
For the semantics, we need further parameterised types.
2242In particular, we parameterise the result and parameter type of an internal function call in \texttt{params0}:
2243\begin{lstlisting}[language=Grafite]
2244record params0: Type[1] :=
2245{
2246  pars__' :> params__;
2247  resultT: Type[0];
2248  paramsT: Type[0]
2249}.
2250\end{lstlisting}
Here, \texttt{paramsT} and \texttt{resultT} are typically instantiated to the (pseudo)registers that store the parameters and result of a function.
2252
2253We further extend \texttt{params0} with a type for local variables in internal function calls:
2254\begin{lstlisting}[language=Grafite]
2255record params1 : Type[1] :=
2256{
2257  pars0 :> params0;
2258  localsT: Type[0]
2259}.
2260\end{lstlisting}
Next, we expand our parameters with types corresponding to the code representation (either a control flow graph or a list of statements).
Further, we hypothesise a generic method for looking up the next statement in the code, called \texttt{lookup}.
Note that \texttt{lookup} may fail, and so returns an \texttt{option} type:
2264\begin{lstlisting}[language=Grafite]
2265record params (globals: list ident): Type[1] :=
2266{
2267  succ_ : Type[0];
2268  pars1 :> params1;
2269  codeT : Type[0];
2270  lookup: codeT $\rightarrow$ label $\rightarrow$ option (joint_statement (mk_params_ pars1 succ_) globals)
2271}.
2272\end{lstlisting}
2273We now have what we need to define internal functions for the joint language.
2274The first two `universe' fields are only used in the compilation process, for generating fresh names, and do not affect the semantics.
2275The rest of the fields affect both compilation and semantics.
2276In particular, we have a description of the result, parameters and the local variables of a function.
Note also that we have lifted the hypothesised \texttt{lookup} function from \texttt{params} into a dependent sigma type, which combines a label (the entry or exit point of the control flow graph or list) with a proof that the label is present in the code:
2278\begin{lstlisting}[language=Grafite]
2279record joint_internal_function (globals: list ident) (p:params globals) : Type[0] :=
2280{
2281  joint_if_luniverse: universe LabelTag;
2282  joint_if_runiverse: universe RegisterTag;
2283  joint_if_result   : resultT p;
2284  joint_if_params   : paramsT p;
2285  joint_if_locals   : localsT p;
2286  joint_if_stacksize: nat;
2287  joint_if_code     : codeT ... p;
2288  joint_if_entry    : $\Sigma$l: label. lookup ... joint_if_code l $\neq$ None ?;
2289  joint_if_exit     : $\Sigma$l: label. lookup ... joint_if_code l $\neq$ None ?
2290}.
2291\end{lstlisting}
2292Naturally, a question arises as to why we have chosen to split up the parameterisation into so many intermediate records, each slightly extending earlier ones.
The reason is that some intermediate languages share a host of parameters, and differ only in others.
2294For instance, in instantiating the ERTL language, certain parameters are shared with RTL, whilst others are ERTL specific:
2295\begin{lstlisting}[language=Grafite]
2296...
2297definition ertl_params__: params__ :=
2298 mk_params__ register register register register (move_registers $\times$ move_registers)
2299  register nat unit ertl_statement_extension.
2300...
2301definition ertl_params1: params1 := rtl_ertl_params1 ertl_params0.
2302definition ertl_params: $\forall$globals. params globals := rtl_ertl_params ertl_params0.
2303...
2304definition ertl_statement := joint_statement ertl_params_.
2305
2306definition ertl_internal_function :=
2307  $\lambda$globals.joint_internal_function ... (ertl_params globals).
2308\end{lstlisting}
2309Here, \texttt{rtl\_ertl\_params1} are the common parameters of the ERTL and RTL languages:
2310\begin{lstlisting}[language=Grafite]
2311definition rtl_ertl_params1 := $\lambda$pars0. mk_params1 pars0 (list register).
2312\end{lstlisting}
2313
2314The record \texttt{more\_sem\_params} bundles together functions that store and retrieve values in various forms of register:
2315\begin{lstlisting}[language=Grafite]
2316record more_sem_params (p:params_): Type[1] :=
2317{
2318  framesT: Type[0];
2319  empty_framesT: framesT;
2320
2321  regsT: Type[0];
2322  empty_regsT: regsT;
2323
2324  call_args_for_main: call_args p;
2325  call_dest_for_main: call_dest p;
2326
2327  greg_store_: generic_reg p $\rightarrow$ beval $\rightarrow$ regsT $\rightarrow$ res regsT;
2328  greg_retrieve_: regsT $\rightarrow$ generic_reg p $\rightarrow$ res beval;
2329  acca_store_: acc_a_reg p $\rightarrow$ beval $\rightarrow$ regsT $\rightarrow$ res regsT;
2330  acca_retrieve_: regsT $\rightarrow$ acc_a_reg p $\rightarrow$ res beval;
2331  ...
2332  dpl_store_: dpl_reg p $\rightarrow$ beval $\rightarrow$ regsT $\rightarrow$ res regsT;
2333  dpl_retrieve_: regsT $\rightarrow$ dpl_reg p $\rightarrow$ res beval;
2334  ...
2335  pair_reg_move_: regsT $\rightarrow$ pair_reg p $\rightarrow$ res regsT;
2336}.
2337\end{lstlisting}
2338Here, the fields \texttt{empty\_framesT}, \texttt{empty\_regsT}, \texttt{call\_args\_for\_main} and \texttt{call\_dest\_for\_main} are used for state initialisation.
2339
2340The fields \texttt{greg\_store\_} and \texttt{greg\_retrieve\_} store and retrieve values from a generic register, respectively.
Similarly, \texttt{pair\_reg\_move\_} implements the generic move instruction of the joint language.
2342Here \texttt{framesT} is the type of stack frames, with \texttt{empty\_framesT} an empty stack frame.
2343
The two hypothesised values \texttt{call\_args\_for\_main} and \texttt{call\_dest\_for\_main} deal with the handling of the program's \texttt{main} function.
2345In particular, we need to know when the \texttt{main} function has finished executing.
2346But this is complicated, in C, by the fact that the \texttt{main} function is explicitly allowed to be recursive (disallowed in C++).
2347Therefore, to understand whether the exiting \texttt{main} function is really exiting, or just recursively calling itself, we need to remember the address to which \texttt{main} will return control once the initial call to \texttt{main} has finished executing.
2348This is done with \texttt{call\_dest\_for\_main}, whereas \texttt{call\_args\_for\_main} holds the \texttt{main} function's arguments.
2349
We extend \texttt{more\_sem\_params} with yet more parameters via \texttt{more\_sem\_params1}:
2351\begin{lstlisting}[language=Grafite]
2352record more_sem_params1 (globals: list ident) (p: params globals) : Type[1] :=
2353{
2354  more_sparams1 :> more_sem_params p;
2355
2356  succ_pc: succ p $\rightarrow$ address $\rightarrow$ res address;
2357  pointer_of_label: genv ... p $\rightarrow$ pointer $\rightarrow$
2358    label $\rightarrow$ res ($\Sigma$p:pointer. ptype p = Code);
2359  ...
2360  fetch_statement:
2361    genv ... p $\rightarrow$ state (mk_sem_params ... more_sparams1) $\rightarrow$
2362    res (joint_statement (mk_sem_params ... more_sparams1) globals);
2363  ...
2364  save_frame:
2365    address $\rightarrow$ nat $\rightarrow$ paramsT ... p $\rightarrow$ call_args p $\rightarrow$ call_dest p $\rightarrow$
2366    state (mk_sem_params ... more_sparams1) $\rightarrow$
2367    res (state (mk_sem_params ... more_sparams1));
2368  pop_frame:
2369    genv globals p $\rightarrow$ state (mk_sem_params ... more_sparams1) $\rightarrow$
2370    res ((state (mk_sem_params ... more_sparams1)));
2371  ...
2372  set_result:
2373    list val $\rightarrow$ state (mk_sem_params ... more_sparams1) $\rightarrow$
2374    res (state (mk_sem_params ... more_sparams1));
2375  exec_extended:
2376    genv globals p $\rightarrow$ extend_statements (mk_sem_params ... more_sparams1) $\rightarrow$
2377    succ p $\rightarrow$ state (mk_sem_params ... more_sparams1) $\rightarrow$
2378    IO io_out io_in (trace $\times$ (state (mk_sem_params ... more_sparams1)))
2379 }.
2380\end{lstlisting}
2381The field \texttt{succ\_pc} takes an address, and a `successor' label, and returns the address of the instruction immediately succeeding the one at hand.
2382
2383Here, \texttt{fetch\_statement} fetches the next statement to be executed.
2384The fields \texttt{save\_frame} and \texttt{pop\_frame} manipulate stack frames.
2385In particular, \texttt{save\_frame} creates a new stack frame on the top of the stack, saving the destination and parameters of a function, and returning an updated state.
2386The field \texttt{pop\_frame} destructively pops a stack frame from the stack, returning an updated state.
2387Further, \texttt{set\_result} saves the result of the function computation, and \texttt{exec\_extended} is a function that executes the extended statements, peculiar to each individual intermediate language.
2388
We bundle \texttt{params} and \texttt{more\_sem\_params2} together into a single record, \texttt{sem\_params2}.
2390This will be used in the function \texttt{eval\_statement} which executes a single statement of the joint language:
2391\begin{lstlisting}[language=Grafite]
2392record sem_params2 (globals: list ident): Type[1] :=
2393{
2394  p2 :> params globals;
2395  more_sparams2 :> more_sem_params2 globals p2
2396}.
2397\end{lstlisting}
2398\noindent
2399The \texttt{state} record holds the current state of the interpreter:
2400\begin{lstlisting}[language=Grafite]
2401record state (p: sem_params): Type[0] :=
2402{
2403  st_frms: framesT ? p;
2404  pc: address;
2405  sp: pointer;
2406  isp: pointer;
2407  carry: beval;
2408  regs: regsT ? p;
2409  m: bemem
2410}.
2411\end{lstlisting}
Here \texttt{st\_frms} represents the stack frames, \texttt{pc} the program counter, \texttt{sp} the stack pointer, \texttt{isp} the internal stack pointer, \texttt{carry} the carry flag, \texttt{regs} the registers (hardware and pseudoregisters) and \texttt{m} external RAM.
2413Note that we have two stack pointers, as we have two stacks: the physical stack of the MCS-51 microprocessor, and an emulated stack in external RAM.
The MCS-51's own stack is minuscule; it is therefore usual to emulate a much larger, more useful stack in external RAM.
2415We require two stack pointers as the MCS-51's \texttt{PUSH} and \texttt{POP} instructions manipulate the physical stack, and not the emulated one.
2416
2417We use the function \texttt{eval\_statement} to evaluate a single joint statement:
2418\begin{lstlisting}[language=Grafite]
2419definition eval_statement:
2420  $\forall$globals: list ident.$\forall$p:sem_params2 globals.
2421    genv globals p $\rightarrow$ state p $\rightarrow$ IO io_out io_in (trace $\times$ (state p)) :=
2422...
2423\end{lstlisting}
2424We examine the type of this function.
2425Note that it returns a monadic action, \texttt{IO}, denoting that it may have an IO \emph{side effect}, where the program reads or writes to some external device or memory address.
2426Monads and their use are further discussed in Subsection~\ref{subsect.use.of.monads}.
2427Further, the function returns a new state, updated by the single step of execution of the program.
2428Finally, a \emph{trace} is also returned, which records externally observable `events', such as the calling of external functions and the emission of cost labels.
2429
2430%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
2431% SECTION.                                                                    %
2432%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
2433\subsection{Use of monads}
2434\label{subsect.use.of.monads}
2435
Monads are a categorical notion that has recently gained a good deal of traction in functional programming circles.
In particular, it was noted by Moggi that monads could be used to sequence \emph{effectful} computations in a pure manner.
Here, `effectful computations' cover a lot of ground, from writing to files and generating fresh names to updating an ambient notion of state.
2439
2440A monad can be characterised by the following:
2441\begin{itemize}
2442\item
2443A data type, $M$.
2444For instance, the \texttt{option} type in OCaml or Matita.
2445\item
A way to `inject' or `lift' pure values into this data type, usually called \texttt{return}, with type $\alpha \rightarrow M \alpha$, where $M$ is the name of the monad.
2448In our example, the `lifting' function for the \texttt{option} monad can be implemented as:
2449\begin{lstlisting}[language=Grafite]
2450let return x = Some x
2451\end{lstlisting}
2452\item
2453A way to `sequence' monadic functions together, to form another monadic function, usually called \texttt{bind}.
2454Bind has type $M \alpha \rightarrow (\alpha \rightarrow M \beta) \rightarrow M \beta$.
2455We can see that bind `unpacks' a monadic value, applies a function after unpacking, and `repacks' the new value in the monad.
2456In our example, the sequencing function for the \texttt{option} monad can be implemented as:
2457\begin{lstlisting}[language=Grafite]
let bind o f =
  match o with
  | None   -> None
  | Some s -> f s
2462\end{lstlisting}
2463\item
A series of algebraic laws relating \texttt{return} and \texttt{bind}, ensuring that the sequencing operation `does the right thing' by retaining the order of effects; the standard laws are recalled after this list.
These \emph{monad laws} should also be useful in reasoning about monadic computations in the proof of correctness of the compiler.
2466\end{itemize}
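For reference, the three standard monad laws (left identity, right identity and associativity, respectively) are:
$$
\begin{array}{rcl}
\w{bind}\ (\w{return}\ a)\ f & = & f\ a \\
\w{bind}\ m\ \w{return} & = & m \\
\w{bind}\ (\w{bind}\ m\ f)\ g & = & \w{bind}\ m\ (\lambda x.\ \w{bind}\ (f\ x)\ g)
\end{array}
$$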
2467In the semantics of both front and back-end intermediate languages, we make use of monads.
2468This monadic infrastructure is shared between the front-end and back-end languages.
2469
2470In particular, an `IO' monad, signalling the emission of a cost label, or the calling of an external function, is heavily used in the semantics of the intermediate languages.
2471Here, the monad's sequencing operation ensures that cost label emissions and function calls are maintained in the correct order.
2472We have already seen how the \texttt{eval\_statement} function of the joint language is monadic, with type:
2473\begin{lstlisting}[language=Grafite]
2474definition eval_statement:
2475  $\forall$globals: list ident.$\forall$p:sem_params2 globals.
2476    genv globals p $\rightarrow$ state p $\rightarrow$ IO io_out io_in (trace $\times$ (state p)) :=
2477...
2478\end{lstlisting}
2479If we examine the body of \texttt{eval\_statement}, we may also see how the monad sequences effects.
2480For instance, in the case for the \texttt{LOAD} statement, we have the following:
2481\begin{lstlisting}[language=Grafite]
2482definition eval_statement:
2483  $\forall$globals: list ident. $\forall$p:sem_params2 globals.
2484    genv globals p $\rightarrow$ state p $\rightarrow$ IO io_out io_in (trace $\times$ (state p)) :=
2485  $\lambda$globals, p, ge, st.
2486  ...
2487  match s with
2488  | LOAD dst addrl addrh $\Rightarrow$
2489    ! vaddrh $\leftarrow$ dph_retrieve ... st addrh;
2490    ! vaddrl $\leftarrow$ dpl_retrieve ... st addrl;
2491    ! vaddr $\leftarrow$ pointer_of_address $\langle$vaddrl,vaddrh$\rangle$;
2492    ! v $\leftarrow$ opt_to_res ... (msg FailedLoad) (beloadv (m ... st) vaddr);
2493    ! st $\leftarrow$ acca_store p ... dst v st;
2494    ! st $\leftarrow$ next ... l st ;
2495      ret ? $\langle$E0, st$\rangle$
2496\end{lstlisting}
2497Here, we employ a certain degree of syntactic sugaring.
2498The syntax
2499\begin{lstlisting}[language=Grafite]
2500  ...
2501! vaddrh $\leftarrow$ dph_retrieve ... st addrh;
2502! vaddrl $\leftarrow$ dpl_retrieve ... st addrl;
2503  ...
2504\end{lstlisting}
is syntactic sugar for the \texttt{IO} monad's binding operation.
2506We can expand this sugaring to the following much more verbose code:
2507\begin{lstlisting}[language=Grafite]
2508  ...
2509  bind (dph_retrieve ... st addrh) ($\lambda$vaddrh. bind (dpl_retrieve ... st addrl)
2510    ($\lambda$vaddrl. ...))
2511\end{lstlisting}
Note also that the function \texttt{ret} implements the `lifting', or return, function of the \texttt{IO} monad.
2513
2514We believe the sugaring for the monadic bind operation makes the program much more readable, and therefore easier to reason about.
2515In particular, note that the functions \texttt{dph\_retrieve}, \texttt{pointer\_of\_address}, \texttt{acca\_store} and \texttt{next} are all monadic.
2516
2517Note, however, that inside this monadic code, there is also another monad hiding.
2518The \texttt{res} monad signals failure, along with an error message.
2519The monad's sequencing operation ensures the order of error messages does not get rearranged.
2520The function \texttt{opt\_to\_res} lifts an option type into this monad, with an error message to be used in case of failure.
2521The \texttt{res} monad is then coerced into the \texttt{IO} monad, ensuring the whole code snippet typechecks.
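A plausible shape for this lifting, as a sketch rather than the exact Matita definition (the constructor names mirror the usual \texttt{OK}/\texttt{Error} style but are illustrative), is:
\begin{lstlisting}[language=Grafite]
(* Sketch of lifting an option into an error monad carrying a message *)
type 'a res = OK of 'a | Error of string

let opt_to_res (msg : string) (o : 'a option) : 'a res =
  match o with
  | None   -> Error msg (* failure: report the supplied message *)
  | Some x -> OK x      (* success: wrap the value *)
\end{lstlisting}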
2522
2523\subsection{Memory models}
2524\label{subsect.memory.models}
2525
2526Currently, the semantics of the front and back-end intermediate languages are built around two distinct memory models.
2527The front-end languages reuse the CompCert 1.6 memory model, whereas the back-end languages employ a version tailored to their needs.
2528This split between the memory models reflects the fact that the front-end and back-end languages have different requirements from their memory models.
2529
2530In particular, the CompCert 1.6 memory model places quite heavy restrictions on where in memory one can read from.
To read a value in this memory model, you must supply an address, together with the size of the `chunk' to read at that address.
2532The read is only successful if you attempt to read at a genuine `value boundary', and read the appropriate amount of memory for that value.
2533As a result, with that memory model you are unable to read the third byte of a 32-bit integer value directly from memory, for instance.
2534This has some consequences for the compiler, namely an inability to write a \texttt{memcpy} routine.
2535
2536However, the CerCo memory model operates differently, as we need to move data `piecemeal' between stacks in the back-end of the compiler.
2537As a result, the back-end memory model allows one to read data at any memory location, not just on value boundaries.
This has the advantage that we can successfully give a semantics to a \texttt{memcpy} routine in the back-end of the CerCo compiler (remembering that \texttt{memcpy} is nothing more than `read a byte, copy a byte' repeated in a loop), an advantage over CompCert.  The front-end of CerCo, however, cannot, because its memory model and values are similar to those of CompCert 1.6.
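To make the point concrete, the following OCaml sketch (with a simplified stand-in for the back-end memory model) shows why byte-granular loads and stores suffice for \texttt{memcpy}:
\begin{lstlisting}[language=Grafite]
(* Sketch: with byte-granular access, memcpy is just `read a byte,
   copy a byte' in a loop.  The memory type is a simplified stand-in. *)
type mem = (int, char) Hashtbl.t (* address -> byte, for illustration *)

let load_byte (m : mem) (a : int) : char option = Hashtbl.find_opt m a
let store_byte (m : mem) (a : int) (b : char) : unit = Hashtbl.replace m a b

let memcpy (m : mem) (dst : int) (src : int) (n : int) : bool =
  let ok = ref true in
  for i = 0 to n - 1 do
    match load_byte m (src + i) with
    | Some b -> store_byte m (dst + i) b
    | None   -> ok := false (* reading unallocated memory fails *)
  done;
  !ok
\end{lstlisting}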
2539
2540More recent versions of CompCert's memory model have evolved in a similar direction, with a byte-by-byte representation of memory blocks.  However, there remains an important difference in the handling of pointer values in the rest of the formalisation.  In particular, in CompCert 1.10 only complete pointer values can be loaded in all of the languages in the compiler, whereas in CerCo we need to represent individual bytes of a pointer in the back-end to support our 8-bit target architecture.
2541
2542Right now, the two memory models are interfaced during the translation from RTLabs to RTL.
It is an open question whether we will unify the two memory models, using the bespoke back-end memory model throughout the compiler, since the CompCert memory model seems to work well for the front-end, where such byte-by-byte copying is not needed.
2544However, should we decide to port the front-end to the new memory model, it has been written in such an abstract way that doing so would be relatively straightforward.
2545
2546