% Source: Papers/jar-cerco-2017/proof.tex@3657 (last changed by mulligan)
% Compiler proof
%   Structure of proof, and high-level discussion
%   Technical devices: structured traces, labelling, etc.
%   Assembler proof
%   Technical issues in front end (Brian?)
%   Main theorem statement

\section{Compiler proof}
\label{sect.compiler.proof}

\section{Introduction}

The \cerco{} compiler produces a version of the source code containing
annotations describing the timing behaviour of the object code, as
well as the object code itself. It compiles C code, targeting
microcontrollers implementing the Intel 8051 architecture.  There are
two versions: first, an initial prototype was implemented in
\ocaml{}~\cite{d2.2}, then a version was formalised in the \matita{}
proof assistant~\cite{d3.2,d4.2} and extracted to \ocaml{} code to
produce an executable compiler.  In this document we present results
from Deliverable 3.4, the formalised proofs in \matita{} about the
front-end of the latter version of the compiler (culminating in the
\lstinline'front_end_correct' lemma), and describe how that fits
into the verification of the whole compiler.

A key part of this work was to layer the \emph{intensional} correctness
results, which show that the costs produced are correct, on top of the
proofs about the compiled code's \emph{extensional} behaviour (that is, the
functional correctness of the compiler).  Unfortunately, the ambitious
goal of completely verifying the entire compiler was not feasible
within the time available, but thanks to this separation of
extensional and intensional proofs we are able to axiomatise some extensional
simulation results which are similar to those in other compiler verification
projects and concentrate on the novel intensional proofs.  We were
also able to add stack space costs to obtain a stronger result.  The
proofs were made more tractable by introducing compile-time checks for
the `sound' and `precise' cost labelling properties rather than proving
that they are preserved throughout.

The overall statement of correctness says that the annotated program has the
same behaviour as the input, and that for any suitably well-structured part of
the execution (which we call \emph{measurable}), the object code will execute
the same behaviour, taking precisely the time given by the cost annotations in
the annotated source program.

In the next section we recall the structure of the compiler and make the overall
statement more precise.  Following that, in Section~\ref{sec:fegoals} we
describe the statements we need to prove about the intermediate \textsf{RTLabs}
programs for the back-end proofs.
Section~\ref{sec:inputtolabelling} covers the compiler passes which produce the
annotated source program and Section~\ref{sec:measurablelifting} the rest
of the transformations in the front-end.  Then the compile-time checks
for good cost labelling are detailed in Section~\ref{sec:costchecks}
and the proofs that the structured traces required by the back-end
exist are discussed in Section~\ref{sec:structuredtrace}.

\section{The compiler and its correctness statement}

The uncertified prototype \ocaml{} \cerco{} compiler was originally described
in Deliverables 2.1 and 2.2.  Its design was replicated in the formal
\matita{} code, which was presented in Deliverables 3.2 and 4.2, for
the front-end and back-end respectively.

\begin{figure}
\begin{center}
\includegraphics[width=0.5\linewidth]{compiler-plain.pdf}
\end{center}
\caption{Languages in the \cerco{} compiler}
\label{fig:compilerlangs}
\end{figure}

The compiler uses a number of intermediate languages, as outlined in the
middle two lines of Figure~\ref{fig:compilerlangs}.  The upper line
represents the front-end of the compiler, and the lower the back-end,
finishing with Intel 8051 binary code.  Not all of the front-end compiler passes
introduce a new language, and Figure~\ref{fig:summary} presents a
list of every pass involved.
\begin{figure}
\begin{center}
\begin{minipage}{.8\linewidth}
\begin{tabbing}
\quad \= $\downarrow$ \quad \= \kill
\textsf{C} (unformalised)\\
\> $\downarrow$ \> CIL parser (unformalised \ocaml)\\
\textsf{Clight}\\
%\> $\downarrow$ \> add runtime functions\\
\> $\downarrow$ \> \lstinline[language=C]'switch' removal\\
\> $\downarrow$ \> labelling\\
\> $\downarrow$ \> cast removal\\
\> $\downarrow$ \> stack variable allocation and control structure
 simplification\\
\textsf{Cminor}\\
%\> $\downarrow$ \> generate global variable initialization code\\
\> $\downarrow$ \> transform to RTL graph\\
\textsf{RTLabs}\\
\> $\downarrow$ \> check cost labelled properties of RTL graph\\
\> $\downarrow$ \> start of target specific back-end\\
\end{tabbing}
\end{minipage}
\end{center}
\caption{Front-end languages and compiler passes}
\label{fig:summary}
\end{figure}

\label{page:switchintro}
The annotated source code is produced by the cost labelling phase.
Note that there is a pass to replace C \lstinline[language=C]'switch'
statements before labelling --- we need to remove them because the
simple form of labelling used in the formalised compiler is not quite
capable of capturing their execution time costs, largely due to C's
`fall-through' behaviour, where execution from one branch continues in
the next unless there is an explicit \lstinline[language=C]'break'.

The cast removal phase which follows cost labelling simplifies
expressions to prevent unnecessary arithmetic promotion, which is
specified by the C standard but costly for an 8-bit target.  The
transformations to \textsf{Cminor} and subsequently \textsf{RTLabs}
bear considerable resemblance to some passes of the CompCert
compiler~\cite{Blazy-Leroy-Clight-09,Leroy-backend}, although we use a simpler \textsf{Cminor} where
all loops use \lstinline[language=C]'goto' statements, and the
\textsf{RTLabs} language retains a target-independent flavour.  The
back-end takes \textsf{RTLabs} code as input.

The whole compilation function returns the following information on success:
\begin{lstlisting}[language=Grafite]
  record compiler_output : Type[0] :=
   { c_labelled_object_code: labelled_object_code
   ; c_stack_cost: stack_cost_model
   ; c_max_stack: nat
   ; c_init_costlabel: costlabel
   ; c_labelled_clight: clight_program
   ; c_clight_cost_map: clight_cost_map
   }.
\end{lstlisting}
It consists of annotated 8051 object code, a mapping from function
identifiers to the function's stack space usage, the space available for the
stack after global variable allocation, a cost label covering the
execution time for the initialisation of global variables and the call
to the \lstinline[language=C]'main' function, the annotated source
code, and finally a mapping from cost labels to actual execution time
costs.

An \ocaml{} pretty printer is used to provide a concrete version of
the output code and annotated source code.  In the case of the
annotated source code, it also inserts the actual costs alongside the
cost labels, and optionally adds a global cost variable and
instrumentation to support further reasoning in external tools such as
Frama-C.

\subsection{Revisions to the prototype compiler}

Our focus on intensional properties prompted us to consider whether we
could incorporate stack space into the costs presented to the user.
We only allocate one fixed-size frame per function, so modelling this
was relatively simple.  It is the only form of dynamic memory
allocation provided by the compiler, so we were able to strengthen the
statement of the goal to guarantee successful execution whenever the
stack space obeys the \lstinline'c_max_stack' bound calculated by
subtracting the global variable requirements from the total memory
available.

The cost labelling checks at the end of Figure~\ref{fig:summary} have been
introduced to reduce the proof burden, and are described in
Section~\ref{sec:costchecks}.

The use of dependent types to capture simple intermediate language
invariants makes every front-end pass a total function, except
\textsf{Clight} to \textsf{Cminor} and the cost checks.  Hence various
well-formedness and type safety checks are performed only once between
\textsf{Clight} and \textsf{Cminor}, and the invariants rule out any
difficulties in the later stages.  With the benefit of hindsight we
would have included an initial checking phase to produce a
`well-formed' variant of \textsf{Clight}, conjecturing that this would
simplify various parts of the proofs for the \textsf{Clight} stages
which deal with potentially ill-formed code.

Following D2.2, we previously generated code for global variable
initialisation in \textsf{Cminor}, for which we reserved a cost label
to represent the execution time for initialisation.  However, the
back-end must also add an initial call to the main function, whose
cost must also be accounted for, so we decided to move the
initialisation code to the back-end and merge the costs.

\subsection{Main correctness statement}

Informally, our main intensional result links the time difference in a source
code execution to the time difference in the object code, expressing the time
for the source by summing the values for the cost labels in the trace, and the
time for the target by a clock built in to the 8051 executable semantics.
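The arithmetic on the two sides can be sketched in C.  All names here are hypothetical illustrations of the statement's shape, not definitions from the formalisation.

```c
#include <assert.h>
#include <stddef.h>

/* Source-side time: sum the cost map over the labels emitted in a
   (sub)trace.  cost_map[l] is the cycle count attributed to label l. */
unsigned source_time(const unsigned *labels, size_t n,
                     const unsigned *cost_map) {
  unsigned t = 0;
  for (size_t i = 0; i < n; i++)
    t += cost_map[labels[i]];
  return t;
}

/* Target-side time: a difference of the clock built into the 8051
   semantics, read before and after the corresponding subtrace. */
unsigned target_time(unsigned clock_start, unsigned clock_end) {
  return clock_end - clock_start;
}
```

The correctness statement then asserts that, for measurable subtraces, these two quantities coincide exactly.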

The availability of precise timing information for 8051
implementations and the design of the compiler allow it to give exact
time costs in terms of processor cycles, not just upper bounds.
However, these exact results are only available if the subtrace we
measure starts and ends at suitable points.  In particular, pure
computation with no observable effects may be reordered and moved past
cost labels, so we cannot measure time between arbitrary statements in
the program.

There is also a constraint on the subtraces that we
measure due to the requirements of the correctness proof for the
object code timing analysis.  To be sure that the timings are assigned
to the correct cost label, we need to know that each return from a
function call goes to the correct return address.  It is difficult
to observe this property locally in the object code because it relies
on much earlier stages in the compiler.  To convey this information to
the timing analysis, extra structure is imposed on the subtraces, which
is described in Section~\ref{sec:fegoals}.

% Regarding the footnote, would there even be much point?
% TODO: this might be quite easy to add ('just' subtract the
% measurable subtrace from the second label to the end).  Could also
% measure other traces in this manner.
These restrictions are reflected in the subtraces that we give timing
guarantees on; they must start at a cost label and end at the return
of the enclosing function of the cost label\footnote{We expect that
  this would generalise to more general subtraces by subtracting costs
  for unwanted measurable suffixes of a measurable subtrace.}.  A
typical example of such a subtrace is the execution of an entire
function from the cost label at the start of the function until it
returns.  We call any such subtrace \emph{measurable} if it (and
the prefix of the trace from the start to the subtrace) can also be
executed within the available stack space.

Now we can give the main intensional statement for the compiler.
Given a \emph{measurable} subtrace for a labelled \textsf{Clight}
program, there is a subtrace of the 8051 object code program where the
time differences match.  Moreover, \emph{observable} parts of the
trace also match --- these are the appearance of cost labels and
function calls and returns.

More formally, the definition of this statement in \matita{} is
\begin{lstlisting}[language=Grafite]
definition simulates :=
  $\lambda$p: compiler_output.
  let initial_status := initialise_status $...$ (cm (c_labelled_object_code $...$ p)) in
  $\forall$m1,m2.
   measurable Clight_pcs (c_labelled_clight $...$ p) m1 m2
       (stack_sizes (c_stack_cost $...$ p)) (c_max_stack $...$ p) $\rightarrow$
  $\forall$c1,c2.
   clock_after Clight_pcs (c_labelled_clight $...$ p) m1 (c_clight_cost_map $...$ p) = OK $...$ c1 $\rightarrow$
   clock_after Clight_pcs (c_labelled_clight $...$ p) (m1+m2) (c_clight_cost_map $...$ p) = OK $...$ c2 $\rightarrow$
  $\exists$n1,n2.
   observables Clight_pcs (c_labelled_clight $...$ p) m1 m2 =
     observables (OC_preclassified_system (c_labelled_object_code $...$ p))
          (c_labelled_object_code $...$ p) n1 n2
  $\wedge$
   clock ?? (execute (n1+n2) ? initial_status) =
     clock ?? (execute n1 ? initial_status) + (c2-c1).
\end{lstlisting}
where the \lstinline'measurable', \lstinline'clock_after' and
\lstinline'observables' definitions are generic definitions for multiple
languages; in this case the \lstinline'Clight_pcs' record applies them
to \textsf{Clight} programs.

There is a second part to the statement, which says that the initial
processing of the input program to produce the cost labelled version
does not affect the semantics of the program:
% Yes, I'm paraphrasing the result a tiny bit to remove the observe non-function
\begin{lstlisting}[language=Grafite]
  $\forall$input_program,output.
  compile input_program = return output $\rightarrow$
  not_wrong $...$ (exec_inf $...$ clight_fullexec input_program) $\rightarrow$
  sim_with_labels
   (exec_inf $...$ clight_fullexec input_program)
   (exec_inf $...$ clight_fullexec (c_labelled_clight $...$ output))
\end{lstlisting}
That is, any successful compilation produces a labelled program that
has identical behaviour to the original, so long as there is no
`undefined behaviour'.

Note that this statement provides full functional correctness, including
preservation of (non-)termination.  The intensional result above does
not do this directly --- it does not guarantee the same result or same
termination.  There are two mitigating factors, however: first, to
prove the intensional property we need local simulation results, and these
can be pieced together to form full behavioural equivalence; only time
constraints have prevented us from doing so.  Second, if we wish to
confirm a result, termination, or non-termination we could add an
observable witness, such as a function that is only called if the
correct result is given.  The intensional result guarantees that the
observable witness is preserved, so the program must behave correctly.

These two results are combined in the \lstinline'correct'
theorem in the file \lstinline'correctness.ma'.

\section{Correctness statement for the front-end}
\label{sec:fegoals}

The essential parts of the intensional proof were outlined during
work on a toy compiler in an earlier task of the project.  They are:
\begin{enumerate}
\item functional correctness, in particular preserving the trace of
  cost labels,
\item the \emph{soundness} and \emph{precision} of the cost labelling
  on the object code, and
\item the timing analysis on the object code produces a correct
  mapping from cost labels to time.
\end{enumerate}

However, that toy development did not include function calls.  For the
full \cerco{} compiler we also need to maintain the invariant that
functions return to the correct program location in the caller, as we
mentioned in the previous section.  During work on the back-end timing
analysis (described in more detail in the companion deliverable, D4.4)
the notion of a \emph{structured trace} was developed to enforce this
return property, and also most of the cost labelling properties.

\begin{figure}
\begin{center}
\includegraphics[width=0.5\linewidth]{compiler.pdf}
\end{center}
\caption{The compiler and proof outline}
\label{fig:compiler}
\end{figure}

Jointly, we generalised the structured traces to apply to any of the
intermediate languages which have some notion of a program counter.  This means
that they are introduced part way through the compiler; see
Figure~\ref{fig:compiler}.  Proving that a structured trace can be
constructed at \textsf{RTLabs} has several virtues:
\begin{itemize}
\item This is the first language where every operation has its own
  statement.
\item Function calls and returns are still handled implicitly in the
  language and so the structural properties are ensured by the
  semantics.
\item Many of the back-end languages from \textsf{RTL} onwards share a common
  core set of definitions, and using structured traces throughout
  increases this uniformity.
\end{itemize}

\begin{figure}
\begin{center}
\includegraphics[width=0.6\linewidth]{strtraces.pdf}
\end{center}
\caption{Nesting of functions in structured traces}
\label{fig:strtrace}
\end{figure}
A structured trace is a mutually inductive data type which
contains the steps from a normal program trace, but arranged into a
nested structure which groups entire function calls together and
aggregates individual steps between cost labels (or between the final
cost label and the return from the function), see
Figure~\ref{fig:strtrace}.  This captures the idea that the cost labels
only represent costs \emph{within} a function --- calls to other
functions are accounted for in the nested trace for their execution, and we
can locally regard function calls as a single step.
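The nesting idea can be illustrated with a small C model.  This is a hypothetical sketch of the shape of the data, not the Matita mutually inductive definition: each step carries the cost of the block it begins, and a call step additionally carries the callee's whole subtrace, so that locally a call counts as one step.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative model of a structured trace (names are assumptions). */
typedef struct step {
  unsigned cost;            /* cost attributed to this block */
  struct step *call_body;   /* nested subtrace if this step is a call */
  struct step *next;        /* next step within the same function */
} step;

/* Total time: a function's own blocks plus, for each call, the cost
   of the callee's nested execution. */
unsigned total_cost(const step *s) {
  unsigned t = 0;
  for (; s != NULL; s = s->next) {
    t += s->cost;
    if (s->call_body != NULL)
      t += total_cost(s->call_body);
  }
  return t;
}
```

A caller with blocks of cost 1 and 2 that calls a function with blocks of cost 3 and 4 has total cost 10, with the callee's 7 cycles hidden inside the single call step.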

These structured traces form the core part of the intermediate results
that we must prove so that the back-end can complete the main
intensional result stated above.  In full, we provide the back-end
with
\begin{enumerate}
\item A normal trace of the \textbf{prefix} of the program's execution
  before reaching the measurable subtrace.  (This needs to be
  preserved so that we know that the stack space consumed is correct,
  and to set up the simulation results.)
\item The \textbf{structured trace} corresponding to the measurable
  subtrace.
\item A proof that no `program counter' is \textbf{repeated} between cost
  labels.  Together with
  the structure in the trace, this takes over from showing that
  cost labelling is sound and precise.
\item A proof that the \textbf{observables} have been preserved.
\item A proof that the \textbf{stack limit} is still observed by the prefix and
  the structured trace.  (This is largely a consequence of the
  preservation of observables.)
\end{enumerate}
The \lstinline'front_end_correct' lemma in the
\lstinline'correctness.ma' file provides a record containing these.

Following the outline in Figure~\ref{fig:compiler}, we will first deal
with the transformations in \textsf{Clight} that produce the source
program with cost labels, then show that measurable traces can be
lifted to \textsf{RTLabs}, and finally show that we can construct the
properties listed above ready for the back-end proofs.

\section{Input code to cost labelled program}
\label{sec:inputtolabelling}

As explained on page~\pageref{page:switchintro}, the costs of complex
C \lstinline[language=C]'switch' statements cannot be represented with
the simple labelling used in the formalised compiler.  Our first pass
replaces these statements with simpler C code, allowing our second
pass to perform the cost labelling.  We show that the behaviour of
programs is unchanged by these passes using forward
simulations\footnote{All of our languages are deterministic, which can
be seen directly from their executable definitions.  Thus we know that
forward simulations are sufficient because the target cannot have any
other behaviour.}.

\subsection{Switch removal}

We compile away \lstinline[language=C]'switch' statements into more
basic \textsf{Clight} code.
Note that this transformation does not necessarily degrade the
efficiency of the generated code. For instance, compilers such as GCC
introduce balanced trees of ``if-then-else'' constructs for small
switches. However, our implementation strategy is much simpler. Let
us consider the following input statement.

\begin{lstlisting}[language=C]
   switch(e) {
   case v1:
     stmt1;
   case v2:
     stmt2;
   default:
     stmt_default;
   }
\end{lstlisting}

Note that \textsf{stmt1}, \textsf{stmt2}, \ldots, \textsf{stmt\_default}
may contain \lstinline[language=C]'break' statements, which have the
effect of exiting the switch statement. In the absence of break, the
execution falls through each case sequentially. In our implementation,
we produce an equivalent sequence of ``if-then'' blocks chained by gotos:
\begin{lstlisting}[language=C]
   fresh = e;
   if(fresh == v1) {
     $\llbracket$stmt1$\rrbracket$;
     goto lbl_case2;
   };
   if(fresh == v2) {
     lbl_case2:
     $\llbracket$stmt2$\rrbracket$;
     goto lbl_default;
   };
   lbl_default:
   $\llbracket$stmt_default$\rrbracket$;
   exit_label:
\end{lstlisting}

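The translation scheme can be sanity-checked directly in C.  The sketch below is ours, not the compiler's output: concrete statements stand in for \textsf{stmt1} etc., and the case values are fixed to small integers.

```c
#include <assert.h>

/* Original form: cases 1 and 2 fall through towards the default. */
int with_switch(int e) {
  int r = 0;
  switch (e) {
  case 1: r += 10;    /* falls through */
  case 2: r += 100;   /* falls through */
  default: r += 1000;
  }
  return r;
}

/* Hand-applied translation: chained ifs on a fresh copy of e, with
   gotos realising the fall-through between cases. */
int with_gotos(int e) {
  int r = 0;
  int fresh = e;
  if (fresh == 1) { r += 10; goto lbl_case2; }
  if (fresh == 2) { lbl_case2: r += 100; goto lbl_default; }
  lbl_default:
  r += 1000;
  return r;
}
```

The two functions agree on every input, matching the intended semantics-preservation of the pass.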
The proof had to tackle the following points:
\begin{itemize}
\item the source and target memories are not the same (due to the fresh variable),
\item the flow of control is changed in a non-local way (e.g. by the
  introduced \textbf{goto} statements).
\end{itemize}
In order to tackle the first point, we implemented a version of memory
extensions similar to those of CompCert.
448For the simulation we decided to prove a sufficient amount to give us
449confidence in the definitions and approach, but to curtail the proof
450because this pass does not contribute to the intensional correctness
451result.  We tackled several simple cases, that do not interact with
452the switch removal per se, to show that the definitions were usable,
453and part of the switch case to check that the approach is
454reasonable. This comprises propagating the memory extension through
455each statement (except switch), as well as various invariants that are
456needed for the switch case (in particular, freshness hypotheses). The
457details of the evaluation process for the source switch statement and
458its target counterpart can be found in the file
459\lstinline'switchRemoval.ma', along more details on the transformation
460itself.
461
462Proving the correctness of the second point would require reasoning on the
463semantics of \lstinline[language=C]'goto' statements. In the \textsf{Clight}
464semantics, this is implemented as a function-wide lookup of the target label.
465The invariant we would need is the fact that a global label lookup on a freshly
466created goto is equivalent to a local lookup. This would in turn require the
467propagation of some freshness hypotheses on labels. As discussed,
468we decided to omit this part of the correctness proof.

\subsection{Cost labelling}

The simulation for the cost labelling pass is the simplest in the
front-end.  The main argument is that any step of the source program
is simulated by the same step of the labelled one, plus any extra
steps for the added cost labels.  The extra instructions do not change
the memory or local environments, although we have to keep track of
the extra instructions that appear in continuations, for example
during the execution of a \lstinline[language=C]'while' loop.

We do not attempt to capture any cost properties of the labelling\footnote{We describe how the cost properties are
established in Section~\ref{sec:costchecks}.} in
the simulation proof, which allows the proof to be oblivious to the choice
of cost labels.  Hence we do not have to reason about the threading of
name generation through the labelling function, greatly reducing the
amount of effort required.

%TODO: both give one-step-sim-by-many forward sim results; switch
%removal tricky, uses aux var to keep result of expr, not central to
%intensional correctness so curtailed proof effort once reasonable
%level of confidence in code gained; labelling much simpler; don't care
%what the labels are at this stage, just need to know when to go
%through extra steps.  Rolled up into a single result with a cofixpoint
%to obtain coinductive statement of equivalence (show).

\section{Finding corresponding measurable subtraces}
\label{sec:measurablelifting}

There follow the three main passes of the front-end:
\begin{enumerate}
\item simplification of casts in \textsf{Clight} code
\item \textsf{Clight} to \textsf{Cminor} translation, performing stack
  variable allocation and simplifying control structures
\item transformation to \textsf{RTLabs} control flow graph
\end{enumerate}
We have taken a common approach to
each pass: first we build (or axiomatise) forward simulation results
that are similar to normal compiler proofs, but which are slightly more
fine-grained so that we can see that the call structure and relative
placement of cost labels is preserved.

Then we instantiate a general result which shows that we can find a
\emph{measurable} subtrace in the target of the pass that corresponds
to the measurable subtrace in the source.  By repeated application of
this result we can find a measurable subtrace of the execution of the
\textsf{RTLabs} code, suitable for the construction of a structured
trace (see Section~\ref{sec:structuredtrace}).  This is essentially an
extra layer on top of the simulation proofs that provides us with the
additional information required for our intensional correctness proof.

\subsection{Generic measurable subtrace lifting proof}

Our generic proof is parametrised on a record containing small-step
semantics for the source and target language, a classification of
states (the same form of classification is used when defining
structured traces), a simulation relation which respects the
classification and cost labelling, and
four simulation results.  The simulations are split by the starting state's
classification and whether it is a cost label, which will allow us to
observe that the call structure is preserved.  They are:
\begin{enumerate}
\item a step from a `normal' state (which is not classified as a call
  or return) which is not a cost label is simulated by zero or more
  `normal' steps;
\item a step from a `call' state followed by a cost label step is
  simulated by a step from a `call' state, a corresponding label step,
  then zero or more `normal' steps;
\item a step from a `call' state not followed by a cost label is
  simulated similarly (note that this case cannot occur in a well-labelled
  program, but we do not have enough information locally to exploit
  this); and
\item a cost label step is simulated by a cost label step.
\end{enumerate}
Finally, we need to know that a successfully translated program will
have an initial state in the simulation relation with the original
program's initial state.

The back-end has similar requirements for lifting simulations to
structured traces.  Fortunately, our treatment of calls and returns
can be slightly simpler because we have special call and return states
that correspond to function entry and return and are separate from
the actual instructions.  This was originally inherited from our port
of CompCert's \textsf{Clight} semantics, but proves useful here
because we only need to consider adding extra steps \emph{after} a
call or return state; the instruction step deals with extra
steps that occur before.  The back-end makes all of the call and
return machinery explicit, and thus needs more complex statements
about extra steps before and after each call and return.

\begin{figure}
\begin{center}
\includegraphics[width=0.5\linewidth]{meassim.pdf}
\end{center}
\caption{Tiling of simulation for a measurable subtrace}
\label{fig:tiling}
\end{figure}

To find the measurable subtrace in the target program's execution we
walk along the original program's execution trace applying the
appropriate simulation result by induction on the number of steps.
While the number of steps taken varies, the overall structure is
preserved, as illustrated in Figure~\ref{fig:tiling}.  By preserving
the structure we also maintain the same intensional observables.  One
delicate point is that the cost label following a call must remain
directly afterwards\footnote{The prototype compiler allowed some
  straight-line code to appear before the cost label until a later
  stage of the compiler, but we must move the requirement forward to
  fit with the structured traces.}
% Damn it, I should have just moved the cost label forwards in RTLabs,
% like the prototype does in RTL to ERTL; the result would have been
% simpler.  Or was there some reason not to do that?
(both in the program code and in the execution trace), even if we
introduce extra steps, for example to store parameters in memory in
\textsf{Cminor}.  Thus we have a version of the call simulation
that deals with both the call and the cost label in one result.

In addition to the subtrace we are interested in measuring, we must
prove that the earlier part of the trace is also preserved in
order to use the simulation from the initial state.  This proof also
guarantees that we do not run out of stack space before the subtrace
we are interested in.  The lemmas for this prefix and the measurable
subtrace are similar, following the pattern above.  However, the
measurable subtrace also requires us to rebuild the termination
proof.  This is defined recursively:
\label{prog:terminationproof}
\begin{lstlisting}[language=Grafite]
  let rec will_return_aux C (depth:nat)
                               (trace:list (cs_state $...$ C $\times$ trace)) on trace : bool :=
  match trace with
  [ nil $\Rightarrow$ false
  | cons h tl $\Rightarrow$
    let $\langle$s,tr$\rangle$ := h in
    match cs_classify C s with
    [ cl_call $\Rightarrow$ will_return_aux C (S depth) tl
    | cl_return $\Rightarrow$
        match depth with
        [ O $\Rightarrow$ match tl with [ nil $\Rightarrow$ true | _ $\Rightarrow$ false ]
        | S d $\Rightarrow$ will_return_aux C d tl
        ]
    | _ $\Rightarrow$ will_return_aux C depth tl
    ]
  ].
\end{lstlisting}
The \lstinline'depth' is the number of return states we need to see
before we have returned to the original function (initially zero), and
\lstinline'trace' is the measurable subtrace obtained from running
the semantics for the correct number of steps.  This definition
unfolds tail recursively for each step, and once the corresponding
simulation result has been applied a new one for the target can be
asserted by unfolding and applying the induction hypothesis on the
shorter trace.
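The same recursion can be rendered over a plain array of state classifications.  This is a hypothetical C model of \lstinline'will_return_aux' for illustration, not the Matita code; the enum names are assumptions.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { CL_CALL, CL_RETURN, CL_OTHER } cl_class;

/* Mirrors will_return_aux: calls increase the nesting depth, returns
   decrease it, and a return at depth zero must be the final step. */
int will_return(const cl_class *trace, size_t len, unsigned depth) {
  for (size_t i = 0; i < len; i++) {
    switch (trace[i]) {
    case CL_CALL:
      depth++;
      break;
    case CL_RETURN:
      if (depth == 0)
        return i + 1 == len;  /* true only if nothing follows */
      depth--;
      break;
    default:                  /* ordinary steps leave depth unchanged */
      break;
    }
  }
  return 0;                   /* trace exhausted with no final return */
}
```

For example, a trace classified call, other, return, other, return terminates the enclosing function at its last step, whereas one beginning with a return followed by further steps does not.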

Combining the lemmas about the prefix and the measurable subtrace
requires a little care because the states joining the two might not be
related in the simulation.  In particular, if the measurable subtrace
starts from the cost label at the beginning of the function there may
be some extra instructions in the target code to execute to complete
function entry before the states are back in the relation.  Hence we
carefully phrased the lemmas to allow for such extra steps.

Together, these give us an overall result for any simulation fitting the
requirements above (contained in the \lstinline'meas_sim' record):
\begin{lstlisting}[language=Grafite]
theorem measured_subtrace_preserved :
  $\forall$MS:meas_sim.
  $\forall$p1,p2,m,n,stack_cost,max.
  ms_compiled MS p1 p2 $\rightarrow$
  measurable (ms_C1 MS) p1 m n stack_cost max $\rightarrow$
  $\exists$m',n'.
    measurable (ms_C2 MS) p2 m' n' stack_cost max $\wedge$
    observables (ms_C1 MS) p1 m n = observables (ms_C2 MS) p2 m' n'.
\end{lstlisting}
The stack space requirement that is embedded in \lstinline'measurable'
is a consequence of the preservation of observables, because it is
determined by the functions called and returned from, which are observable.

\subsection{Simulation results for each pass}

We now consider the simulation results for the passes, each of which
is used to instantiate the
\lstinline[language=Grafite]'measured_subtrace_preserved' theorem to
construct the measurable subtrace for the next language.

\subsubsection{Cast simplification}

The parser used in \cerco{} introduces a lot of explicit type casts.
If left as they are, these constructs can greatly hamper the
quality of the generated code -- especially as the architecture
we consider is an $8$-bit one. In \textsf{Clight}, casts are
expressions. Hence, most of the work of this transformation
proceeds on expressions. The transformation proceeds by recursively
trying to coerce an expression to a particular integer type, which
is in practice smaller than the original one. This functionality
is implemented by two mutually recursive functions whose signatures
are the following.

\begin{lstlisting}[language=Grafite]
let rec simplify_expr (e:expr) (target_sz:intsize) (target_sg:signedness)
  : $\Sigma$result:bool$\times$expr.
    $\forall$ge,en,m. simplify_inv ge en m e (\snd result) target_sz target_sg (\fst result) := $\ldots$

and simplify_inside (e:expr) : $\Sigma$result:expr. conservation e result := $\ldots$
\end{lstlisting}

The \textsf{simplify\_inside} function acts as a wrapper for
\textsf{simplify\_expr}. Whenever \textsf{simplify\_inside} encounters
an \textsf{Ecast} expression, it tries to coerce the sub-expression
to the desired type using \textsf{simplify\_expr}, which tries to
perform the actual coercion. In return, \textsf{simplify\_expr} calls
back \textsf{simplify\_inside} in some particular positions, where we
decided to be conservative in order to simplify the proofs. However,
the current design allows us to incrementally switch to a more aggressive
version, by replacing recursive calls to \textsf{simplify\_inside} with
calls to \textsf{simplify\_expr} \emph{and} proving the corresponding
invariants -- where possible.
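The downcast strategy can be pictured with a toy model: try to push a
narrowing cast into subexpressions, and fall back to the original
expression whenever the coercion fails.  This Python sketch uses an
invented tuple representation of expressions and is only an illustration
of the idea; the real pass works on \textsf{Clight} expressions and
carries the invariants described below.

```python
# Toy model of the downcast strategy: simplify(e, bits) returns a pair
# (success, e') where e' computes e's value modulo truncation to `bits`.
# The tuple encoding of expressions is invented for illustration.
def simplify(expr, target_bits):
    kind = expr[0]
    if kind == "const":                  # constants can always be truncated
        _, value, _bits = expr
        return True, ("const", value % (1 << target_bits), target_bits)
    if kind == "cast":                   # (int_n) e: retry on the subexpression
        _, _, body = expr
        return simplify(body, target_bits)
    if kind == "add":                    # addition commutes with truncation
        _, lhs, rhs = expr
        ok_l, lhs2 = simplify(lhs, target_bits)
        ok_r, rhs2 = simplify(rhs, target_bits)
        if ok_l and ok_r:
            return True, ("add", lhs2, rhs2)
    return False, expr                   # anywhere else: be conservative

# (int8)((int32)300 + (int32)5) becomes an 8-bit addition of 44 and 5:
ok, e = simplify(("cast", 8, ("add", ("const", 300, 32), ("const", 5, 32))), 8)
assert ok and e == ("add", ("const", 44, 8), ("const", 5, 8))
```

The fallback branch mirrors the conservative positions in which
\textsf{simplify\_expr} hands control back to \textsf{simplify\_inside}.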

The \textsf{simplify\_inv} invariant encodes either the conservation
of the semantics during the transformation, corresponding to the failure
of the coercion (the \textsf{Inv\_eq} constructor), or the successful
downcast of the considered expression to the target type
(\textsf{Inv\_coerce\_ok}).

\begin{lstlisting}[language=Grafite]
inductive simplify_inv
  (ge : genv) (en : env) (m : mem)
  (e1 : expr) (e2 : expr) (target_sz : intsize) (target_sg : signedness) : bool $\rightarrow$ Prop :=
| Inv_eq : $\forall$result_flag. $\ldots$
     simplify_inv ge en m e1 e2 target_sz target_sg result_flag
| Inv_coerce_ok : $\forall$src_sz,src_sg.
     typeof e1 = Tint src_sz src_sg $\rightarrow$
     typeof e2 = Tint target_sz target_sg $\rightarrow$
     smaller_integer_val src_sz target_sz src_sg (exec_expr ge en m e1) (exec_expr ge en m e2) $\rightarrow$
     simplify_inv ge en m e1 e2 target_sz target_sg true.
\end{lstlisting}

The \textsf{conservation} invariant for \textsf{simplify\_inside} simply states the conservation
of the semantics, as in the \textsf{Inv\_eq} constructor of the previous
invariant.

\begin{lstlisting}[language=Grafite]
definition conservation := $\lambda$e,result. $\forall$ge,en,m.
          res_sim ? (exec_expr ge en m e) (exec_expr ge en m result)
       $\wedge$ res_sim ? (exec_lvalue ge en m e) (exec_lvalue ge en m result)
       $\wedge$ typeof e = typeof result.
\end{lstlisting}

This invariant is then easily lifted to statement evaluations.
The main problem encountered with this particular pass was dealing with
inconsistently typed programs, a canonical case being a particular
integer constant of a certain size typed with another size. This
prompted the need to introduce numerous type checks, making
both the implementation and the proof more complex, even though more
comprehensive checks are made in the next stage.
%\todo{Make this a particular case of the more general statement on baking more invariants in the Clight language}

\subsubsection{Clight to Cminor}

This pass is the last one operating on the \textsf{Clight} language.
Its input is a full \textsf{Clight} program, and its output is a
\textsf{Cminor} program. Note that we do not use an equivalent of
CompCert's \textsf{C\#minor} language: we translate directly to a
variant of \textsf{Cminor}. This presents the advantage of not
requiring the special loop constructs, nor the explicit block
structure. Another salient point of our approach is that a significant
number of the properties needed for the simulation proof were directly
encoded in dependently typed translation functions.  In particular,
freshness conditions and well-typedness conditions are included. The
main effects of the transformation from \textsf{Clight} to
\textsf{Cminor} are listed below.

\begin{itemize}
\item Variables are classified as being either globals, stack-allocated
  locals or potentially register-allocated locals. The values of register-allocated
  local variables are moved out of the modelled memory and stored in a
  dedicated environment.
\item In \textsf{Clight}, each local variable has a dedicated memory block, whereas
  stack-allocated locals are bundled together on a function-by-function basis.
\item Loops are converted to jumps.
\end{itemize}
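The third point can be pictured with a minimal sketch of the loop
conversion; the statement representation and the label generator here are
invented for illustration and are not the actual translation function.

```python
# Toy lowering of `while (cond) body` into the label/goto form used in
# Cminor; `fresh` supplies unused label names.
def compile_while(cond, body, fresh):
    top, out = fresh(), fresh()
    return ([("label", top),
             ("if_not_goto", cond, out)]   # leave the loop when cond is false
            + body +
            [("goto", top),                # jump back to retest the condition
             ("label", out)])

labels = iter(["L1", "L2"])
stmts = compile_while("c", [("body", "s")], lambda: next(labels))
assert stmts == [("label", "L1"), ("if_not_goto", "c", "L2"),
                 ("body", "s"), ("goto", "L1"), ("label", "L2")]
```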

The first two points require memory injections which are more flexible than those
needed in the switch removal case. In the remainder of this section, we briefly
discuss our implementation of memory injections, and then the simulation proof.

\paragraph{Memory injections.}

Our memory injections are modelled after the work of Blazy \& Leroy.
However, the corresponding paper is based on the first version of the
CompCert memory model~\cite{2008-Leroy-Blazy-memory-model}, whereas we use a much more concrete model, allowing byte-level
manipulations (as in the later version of CompCert's memory model). We proved
roughly 80\% of the required lemmas. Notably, some of the difficulties encountered were
due to overly relaxed conditions on pointer validity (fixed during development).
Some more side conditions had to be added to take care of possible overflows when converting
from \textbf{Z} block bounds to $16$ bit pointer offsets (in practice, such overflows
only occur in edge cases that are easily ruled out -- but this fact is not visible
in memory injections). Concretely, some of the lemmas on the preservation of simulation of
loads after writes were axiomatised, due to a lack of time.
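The overflow side condition amounts to a range check whenever an
unbounded integer bound is reinterpreted as a $16$ bit offset, roughly as
in the following sketch (the names are invented for illustration):

```python
# Sketch of the side condition: a block bound, given as an unbounded
# integer, can only be reinterpreted as a 16-bit pointer offset if it
# fits in [0, 2^16).
OFFSET_BITS = 16

def to_offset(bound):
    if 0 <= bound < (1 << OFFSET_BITS):
        return bound
    raise OverflowError("bound %d does not fit in a 16-bit offset" % bound)

assert to_offset(65535) == 65535
```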

\paragraph{Simulation proof.}

We proved the simulation result for expressions and a representative
selection of statements.  In particular we tackled
\lstinline[language=C]'while' statements to ensure that we correctly
translate loops, because our approach differs from CompCert's by
converting directly to \textsf{Cminor} \lstinline[language=C]'goto's
rather than maintaining a notion of loop in \textsf{Cminor}.  We also have a partial
proof for function entry, covering the setup of the memory injection,
but not function exit.  Exits, and the remaining statements, have been
axiomatised.

Careful management of the proof state was required because proof terms
are embedded in \textsf{Cminor} code to show that invariants are
respected.  These proof terms appear in the proof state when inverting
the translation functions, and they can be large and awkward.  While
generalising them away is usually sufficient, it can be difficult when
they appear under a binder.

%The correctness proof for this transformation was not completed. We proved the
%simulation result for expressions and for some subset of the critical statement cases.
%Notably lacking are the function entry and exit, where the memory injection is
%properly set up. As would be expected, a significant amount of work has to be performed
%to show the conservation of all invariants at each simulation step.

%\todo{list cases, explain while loop, explain labeling problem}

\subsubsection{Cminor to RTLabs}

The translation from \textsf{Cminor} to \textsf{RTLabs} is a fairly
routine control flow graph (CFG) construction.  As such, we chose to
axiomatise the associated extensional simulation results.  However, we did prove several
properties of the generated programs:
\begin{itemize}
\item All statements are type correct with respect to the declared
  pseudo-register type environment.
\item The CFG is closed, and has a distinguished entry node and a
  unique exit node.
\end{itemize}

These properties rely on similar properties about type safety and the
presence of \lstinline[language=C]'goto'-labels for \textsf{Cminor} programs
which are checked at the preceding stage.  As a result, this
transformation is total, and any compilation failure must occur earlier,
when the corresponding \textsf{Clight} source is available and a better
error message can be generated.

The proof obligations for these properties include many instances of
graph inclusion.  We automated these proofs using a small amount of
reflection, making the obligations much easier to handle.  One
drawback to enforcing invariants throughout is that temporarily
breaking them can be awkward.  For example, \lstinline'return'
statements were originally used as placeholders for
\lstinline[language=C]'goto' destinations that had not yet been
translated.  However, this made establishing the single exit node
property rather difficult, and a different placeholder was chosen
instead.  In other circumstances it is possible to prove a more
complex invariant then simplify it at the end of the transformation.

\section{Checking cost labelling properties}
\label{sec:costchecks}

Ideally, we would provide proofs that the cost labelling pass always
produces programs that are soundly and precisely labelled and that
each subsequent pass preserves these properties.  This would match our
use of dependent types to eliminate impossible sources of errors
during compilation, in particular retaining intermediate language type
information.

However, given the limited amount of time available we realised that
implementing a compile-time check for a sound and precise labelling of
the \textsf{RTLabs} intermediate code would reduce the proof burden
considerably.  This is similar in spirit to the use of translation
validation in certified compilation, which makes a similar trade-off
between the potential for compile-time failure and the volume of proof
required.

The check cannot be pushed into a later stage of the compiler because
much of the information is embedded into the structured traces.
However, if an alternative method was used to show that function
returns in the compiled code are sufficiently well-behaved, then we
could consider pushing the cost property checks into the timing
analysis itself.  We leave this as a possible area for future work.

\subsection{Implementation and correctness}
\label{sec:costchecksimpl}

For a cost labelling to be sound and precise we need a cost label at
the start of each function, after each branch and at least one in
every loop.  The first two parts are trivial to check by examining the
code.  In \textsf{RTLabs} the last part is specified by saying
that there is a bound on the number of successive instruction nodes in
the CFG that you can follow before you encounter a cost label, and
checking this is more difficult.

The implementation progresses through the set of nodes in the graph,
following successors until a cost label is found or a label-free cycle
is discovered (in which case the property does not hold and we return
an error).  This is made easier by the prior knowledge that every
successor of a branch instruction is a cost label, so we do not need
to search each branch.  When a label is found, we remove the chain of
program counters from the set and continue from another node in the
set until it is empty, at which point we know that there is a bound
for every node in the graph.
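The procedure can be sketched on a simplified CFG in which every node has
a single successor (recall that every successor of a branch is labelled,
so branches need not be searched); the graph representation here is
invented for illustration.

```python
# Sketch of the labelling check: every node must reach a cost label by
# following successors without revisiting a node.  `succ` maps each node
# to its single successor; `labelled` is the set of cost-labelled nodes.
def check_bounded(succ, labelled):
    todo = set(succ)                  # nodes still to be justified
    while todo:
        chain, seen = [], set()
        node = next(iter(todo))
        while node not in labelled:
            if node in seen:
                return False          # label-free cycle: labelling unsound
            seen.add(node)
            chain.append(node)
            node = succ[node]
        todo -= set(chain) | {node}   # the whole chain reaches a label
    return True

assert check_bounded({"a": "b", "b": "c", "c": "a"}, {"c"})
assert not check_bounded({"a": "b", "b": "a"}, set())
```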

Directly reasoning about the function that implements this procedure would be
rather awkward, so an inductive specification of a single step of its
behaviour was written and proved to match the implementation.  This
was then used to prove the implementation sound and complete.

While we have not attempted to prove that the cost labelling properties
are established and preserved earlier in the compiler, we expect that
the effort for the \textsf{Cminor} to \textsf{RTLabs} stage alone
would be similar to the work outlined above, because it involves the
change from requiring a cost label at particular positions to
requiring cost labels to break loops in the CFG.  As there are another
three passes to consider (including the labelling itself), we believe
that using the check above is much simpler overall.

% TODO? Found some Clight to Cminor bugs quite quickly

\section{Existence of a structured trace}
\label{sec:structuredtrace}

The \emph{structured trace} idea introduced in
Section~\ref{sec:fegoals} enriches the execution trace of a program by
nesting function calls in a mixed-step style and embedding the cost
labelling properties of the program.  See Figure~\ref{fig:strtrace} on
page~\pageref{fig:strtrace} for an illustration of a structured trace.
It was originally designed to support the proof of correctness for the
timing analysis of the object code in the back-end, then generalised
to provide a common structure to use from the end of the front-end to
the object code.

To make the definition generic we abstract over the semantics of the
language,
\begin{lstlisting}[language=Grafite]
record abstract_status : Type[1] :=
  { as_status :> Type[0]
  ; as_execute : relation as_status
  ; as_pc : DeqSet
  ; as_pc_of : as_status $\rightarrow$ as_pc
  ; as_classify : as_status $\rightarrow$ status_class
  ; as_label_of_pc : as_pc $\rightarrow$ option costlabel
  ; as_after_return : ($\Sigma$s:as_status. as_classify s =  cl_call) $\rightarrow$ as_status $\rightarrow$ Prop
  ; as_result: as_status $\rightarrow$ option int
  ; as_call_ident : ($\Sigma$s:as_status.as_classify s = cl_call) $\rightarrow$ ident
  ; as_tailcall_ident : ($\Sigma$s:as_status.as_classify s = cl_tailcall) $\rightarrow$ ident
  }.
\end{lstlisting}
which requires a type of states, an execution relation\footnote{All of
  our semantics are executable, but using a relation was simpler in
  the abstraction.}, some notion of abstract
program counter with decidable equality, the classification of states,
and functions to extract the observable intensional information (cost
labels and the identity of functions that are called).  The
\lstinline'as_after_return' property links the state before a function
call with the state after return, providing the evidence that
execution returns to the correct place.  The precise form varies
between stages; in \textsf{RTLabs} it insists that the CFG, the pointer to
the CFG node to execute next, and some call stack information are
preserved.

The structured traces are defined using three mutually inductive
types.  The core data structure is \lstinline'trace_any_label', which
captures some straight-line execution until the next cost label or the
return from the enclosing function.  Each function call is embedded as
a single step, with its own trace nested inside and the before and
after states linked by \lstinline'as_after_return'; and states
classified as a `jump' (in particular branches) must be followed by a
cost label.

The second type, \lstinline'trace_label_label', is a
\lstinline'trace_any_label' where the initial state is cost labelled.
Thus a trace in this type identifies a series of steps whose cost is
entirely accounted for by the label at the start.

Finally, \lstinline'trace_label_return' is a sequence of
\lstinline'trace_label_label' values which end in the return from the
function.  These correspond to a measurable subtrace, and in
particular include executions of an entire function call (and so are
used for the nested calls in \lstinline'trace_any_label').
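The nesting performed by these types can be pictured with a small sketch
that folds a flat classified trace into the mixed-step form, embedding
each call's steps as a subtrace; the representation is invented for
illustration and ignores cost labels.

```python
# Toy fold of a flat trace into nested form: each "call" absorbs the
# following steps up to its matching "return" as a nested subtrace.
def nest(trace, i=0):
    out = []
    while i < len(trace):
        step = trace[i]
        if step == "call":
            sub, i = nest(trace, i + 1)   # embed the callee's steps
            out.append(("call", sub))
        elif step == "return":
            return out, i + 1             # close the enclosing call
        else:
            out.append(step)
            i += 1
    return out, i

nested, _ = nest(["a", "call", "b", "return", "c"])
assert nested == ["a", ("call", ["b"]), "c"]
```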

\subsection{Construction}

The construction of the structured trace replaces syntactic cost
labelling properties, which place requirements on where labels appear
in the program, with semantic properties that constrain the execution
traces of the program.  The construction begins by defining versions
of the sound and precise labelling properties on states and global
environments (for the code that appears in each of them) rather than
whole programs, and showing that these are preserved by steps of the
\textsf{RTLabs} semantics.

Then we show that each cost labelling property required by the
definition of structured traces is locally satisfied.  These proofs are
broken up by the classification of states.  Similarly, we prove a
step-by-step stack preservation result, which states that the
\textsf{RTLabs} semantics never changes the lower parts of the stack.

The core part of the construction of a structured trace is to use the
proof of termination from the measurable trace (defined on
page~\pageref{prog:terminationproof}) to `fold up' the execution into
the nested form.  The results outlined above fill in the proof
obligations for the cost labelling properties and the stack
preservation result.

The structured trace alone is not sufficient to capture the property
that the program is soundly labelled.  While the structured trace
guarantees termination, it still permits a loop to be executed a
finite number of times without encountering a cost label.  We
eliminate this by proving that no `program counter' repeats within any
\lstinline'trace_any_label' section, by showing that this is incompatible
with the property that there is a bound on the number of successor
instructions you can follow in the CFG before you encounter a cost
label (from Section~\ref{sec:costchecksimpl}).

\subsubsection{Complete execution structured traces}

The development of the construction above started relatively early,
before the measurable subtrace preservation proofs.  To be confident
that the traces were well-formed at that time, we also developed a
complete execution form that embeds the traces above.  This includes
non-terminating program executions, where an infinite number of the terminating
structured traces are embedded.  This construction confirmed that our
definition of structured traces was consistent, although we later
found that we did not require the whole execution version for the
compiler correctness results.

To construct these we need to know whether each function call will
eventually terminate, requiring the use of the excluded middle.  This
classical reasoning is local to the construction of whole program
traces and is not necessary for our main results.

\section{Conclusion}

In combination with the work on the CerCo back-end and by
concentrating on the novel intensional parts of the proof, we have
shown that it is possible to construct certifying compilers that
correctly report execution time and stack space costs.  The layering
of intensional correctness proofs on top of normal simulation results
provides a useful separation of concerns, and could permit the reuse
of existing results.

\section{Files}

The following table gives a high-level overview of the \matita{}
source files in Deliverable 3.4:

\bigskip

\begin{tabular}{rp{.7\linewidth}}
\lstinline'compiler.ma' & Top-level compiler definitions, in particular
\lstinline'front_end', and the whole compiler definition
\lstinline'compile'. \\
\lstinline'correctness.ma' & Correctness results: \lstinline'front_end_correct'
and \lstinline'correct'. \\
\lstinline'Clight/*' & \textsf{Clight}: proofs for switch
removal, cost labelling, cast simplification and conversion to
\textsf{Cminor}. \\
\lstinline'Cminor/*' & \textsf{Cminor}: axioms of conversion to
\textsf{RTLabs}. \\
\lstinline'RTLabs/*' & \textsf{RTLabs}: definitions and proofs for
compile-time cost labelling checks, construction of structured traces.
\\
\lstinline'common/Measurable.ma' & Definitions for measurable
subtraces. \\
\lstinline'common/FEMeasurable.ma' & Generic measurable subtrace
lifting proof. \\
\lstinline'common/*' & Other common definitions relevant to many parts
of the compiler and proof. \\
\lstinline'utilities/*' & General purpose definitions used throughout,
including extensions to the standard \matita{} library.
\end{tabular}

\subsection{A brief overview of the backend compilation chain}
\label{subsect.brief.overview.backend.compilation.chain}

The Matita compiler's backend consists of five distinct intermediate languages: RTL, RTLntl, ERTL, LTL and LIN.
A sixth language, RTLabs, serves as the entry point of the backend and the exit point of the frontend.
RTL, RTLntl, ERTL and LTL are `control flow graph based' languages, whereas LIN is a linearised language, the final language before translation to assembly.

We now briefly discuss the properties of the intermediate languages, and the various transformations that take place during the translation process:

\paragraph{RTLabs ((Abstract) Register Transfer Language)}
As mentioned, this is the final language of the compiler's frontend and the entry point for the backend.
This language uses pseudoregisters, not hardware registers.\footnote{There are an unbounded number of pseudoregisters.  Pseudoregisters are converted to hardware registers or stack positions during register allocation.}
Functions still use stackframes, where arguments are passed on the stack and results are stored in addresses.
Instruction selection is carried out during the pass to RTL.

\paragraph{RTL (Register Transfer Language)}
This language uses pseudoregisters, not hardware registers.
Tailcall elimination is carried out during the translation from RTL to RTLntl.

\paragraph{RTLntl (Register Transfer Language --- No Tailcalls)}
This language is a pseudoregister, graph based language where all tailcalls are eliminated.
RTLntl is not present in the O'Caml compiler; there, the RTL language is reused for this purpose.

\paragraph{ERTL (Explicit Register Transfer Language)}
This is a language very similar to RTLntl.
However, the calling convention is made explicit, in that functions no longer receive and return inputs and outputs via a high-level mechanism, but rather use stack slots or hardware registers.
The ERTL to LTL pass performs the following transformations: liveness analysis, register colouring and register/stack slot allocation.
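As an illustration of the first of these transformations, a backward
liveness computation over a straight-line sequence of instructions might
look as follows; the (defs, uses) representation is invented, and the
real analysis works over a CFG to a fixpoint.

```python
# Sketch of backward liveness on straight-line code: each instruction is
# a (defs, uses) pair; live-before = (live-after - defs) + uses.
def liveness(instrs):
    live, before = set(), []
    for defs, uses in reversed(instrs):
        live = (live - set(defs)) | set(uses)
        before.append(set(live))
    before.reverse()
    return before

# r1 := ...; r2 := r1 + 1; return r2
prog = [(["r1"], []), (["r2"], ["r1"]), ([], ["r2"])]
assert liveness(prog) == [set(), {"r1"}, {"r2"}]
```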

\paragraph{LTL (Linearisable Transfer Language)}
Another graph based language, but one that uses hardware registers instead of pseudoregisters.
Tunnelling (branch compression) should be implemented here.

\paragraph{LIN (Linearised)}
This is a linearised form of the LTL language; function graphs have been linearised into lists of statements.
All registers have been translated into hardware registers or stack addresses.
This is the final stage of compilation before translating directly into assembly language.

%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
% SECTION.                                                                    %
%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
\section{The backend intermediate languages in Matita}
\label{sect.backend.intermediate.languages.matita}

We now discuss the encoding of the compiler backend languages in the Calculus of Constructions proper.
We pay particular heed to changes that we made from the O'Caml prototype.
In particular, many aspects of the backend languages have been unified into a single `joint' language.
We have also made heavy use of dependent types to reduce spurious `partiality' and to encode invariants.

%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
% SECTION.                                                                    %
%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
\subsection{Abstracting related languages}
\label{subsect.abstracting.related.languages}

The O'Caml compiler is written in the following manner.
Each intermediate language has its own dedicated syntax, notions of internal function, and so on.
Here, we make a distinction between `internal functions'---functions that are explicitly written by the programmer---and `external functions', which belong to an external library and require explicit linking.
In particular, IO can be seen as a special case of the `external function' mechanism.
Internal functions are represented as a record consisting of a sequential structure of statements, entry and exit points to this structure, and other bookkeeping devices.
This sequential structure can either be a control flow graph or a linearised list of statements, depending on the language.
Translations between intermediate languages map syntaxes to syntaxes, and internal function representations to internal function representations explicitly.

This is a perfectly valid way to write a compiler, where everything is made explicit, but writing a \emph{verified} compiler poses new challenges.
In particular, we must look ahead to see how our choice of encodings will affect the size and complexity of the forthcoming proofs of correctness.
We now discuss some abstractions, introduced in the Matita code, which we hope will make our proofs shorter, amongst other benefits.

Due to the bureaucracy inherent in explicating each intermediate language's syntax in the O'Caml compiler, it can often be hard to see exactly what changes between each successive intermediate language.
By abstracting the syntax of the RTL, ERTL, LTL and LIN intermediate languages, we make these changes much clearer.

Our abstraction takes the following form:
\begin{lstlisting}[language=Grafite]
inductive joint_instruction (p: params__) (globals: list ident): Type[0] :=
  | COMMENT: String $\rightarrow$ joint_instruction p globals
  ...
  | INT: generic_reg p $\rightarrow$ Byte $\rightarrow$ joint_instruction p globals
  ...
  | OP1: Op1 $\rightarrow$ acc_a_reg p $\rightarrow$ acc_a_reg p $\rightarrow$ joint_instruction p globals
  ...
  | extension: extend_statements p $\rightarrow$ joint_instruction p globals.
\end{lstlisting}
We first note that for the majority of intermediate languages, many instructions are shared.
However, these instructions expect different register types (either a pseudoregister or a hardware register) as arguments.
We must therefore parameterise the joint syntax with a record of parameters that will be specialised to each intermediate language.
In the type above, this parameterisation is realised with the \texttt{params\_\_} record.
As a result of this parameterisation, we have also added a degree of `type safety' to the intermediate languages' syntaxes.
In particular, we note that the \texttt{OP1} constructor expects quite a specific type, in that the two register arguments must both be what passes for the accumulator A in that language.
In some languages, for example LIN, this is the hardware accumulator, whilst in others this is any pseudoregister.
Contrast this with the \texttt{INT} constructor, which expects a \texttt{generic\_reg}, corresponding to an `arbitrary' register type.

Further, we note that some intermediate languages have language specific instructions (i.e. the instructions that change between languages).
We therefore add a new constructor to the syntax, \texttt{extension}, which expects a value of type \texttt{extend\_statements p}.
As \texttt{p} varies between intermediate languages, we can provide language specific extensions to the syntax of the joint language.
For example, ERTL's extended syntax consists of the following extra statements:
\begin{lstlisting}[language=Grafite]
inductive ertl_statement_extension: Type[0] :=
  | ertl_st_ext_new_frame: ertl_statement_extension
  | ertl_st_ext_del_frame: ertl_statement_extension
  | ertl_st_ext_frame_size: register $\rightarrow$ ertl_statement_extension.
\end{lstlisting}
These are further packaged into an ERTL specific instance of \texttt{params\_\_} as follows:
\begin{lstlisting}[language=Grafite]
definition ertl_params__: params__ :=
  mk_params__ register register ... ertl_statement_extension.
\end{lstlisting}
1148
1149\paragraph{Shared code, reduced proofs}
1150Many features of individual backend intermediate languages are shared with other intermediate languages.
1151For instance, RTLabs, RTL, ERTL and LTL are all graph based languages, where functions are represented as a control flow graph of statements that form their bodies.
1152Functions for adding statements to a graph, searching the graph, and so on, are remarkably similar across all languages, but are duplicated in the O'Caml code.
1153
1154As a result, we chose to abstract the representation of internal functions for the RTL, ERTL, LTL and LIN intermediate languages into a joint' representation.
1155This representation is parameterised by a record that dictates the layout of the function body for each intermediate language.
1156For instance, in RTL, the layout is graph like, whereas in LIN, the layout is a linearised list of statements.
1157Further, a generalised way of accessing the successor statement to the one currently under consideration is needed, and so forth.

Our joint internal function record looks as follows:
\begin{lstlisting}[language=Grafite]
record joint_internal_function (globals: list ident) (p:params globals) : Type[0] :=
{
  ...
  joint_if_params   : paramsT p;
  joint_if_locals   : localsT p;
  ...
  joint_if_code     : codeT ... p;
  ...
}.
\end{lstlisting}
In particular, everything that can vary between differing intermediate languages has been parameterised.
Here, we see that the location from which to fetch parameters, the list of local variables, and the internal code representation have all been parameterised.
Other particulars are also parameterised, though omitted here.

We expect this abstraction process to reduce the number of proofs that need to be written, as proofs about internal functions can now deal uniformly with the different languages, characterised by parameters.

\paragraph{Dependency on instruction selection}
We note that the backend languages are all essentially `post instruction selection' languages.
The `joint' syntax makes this especially clear.
For instance, in the definition:
\begin{lstlisting}[language=Grafite]
inductive joint_instruction (p:params__) (globals: list ident): Type[0] :=
  ...
  | INT: generic_reg p $\rightarrow$ Byte $\rightarrow$ joint_instruction p globals
  | MOVE: pair_reg p $\rightarrow$ joint_instruction p globals
  ...
  | PUSH: acc_a_reg p $\rightarrow$ joint_instruction p globals
  ...
  | extension: extend_statements p $\rightarrow$ joint_instruction p globals.
\end{lstlisting}
The capitalised constructors---\texttt{INT}, \texttt{MOVE}, and so on---are all machine specific instructions.
Retargeting the compiler to another microprocessor, improving instruction selection, or simply enlarging the subset of the machine language that the compiler can use, would entail replacing these constructors with constructors that correspond to the instructions of the new target.
We feel that this makes it much more explicit which instructions are target dependent, and which are not (i.e. those language specific instructions that fall inside the \texttt{extension} constructor).
In the long term, we would like to embed the target language directly in the syntax, in order to reuse the target language's semantics.

\paragraph{Independent development and testing}
We have essentially modularised the intermediate languages in the compiler backend.
As with any form of modularisation, we reap benefits in the ability to test and develop each intermediate language independently, with the benefit of fixing bugs just once.

\paragraph{Future reuse for other compiler projects}
Another advantage of our modularisation scheme is the ability to quickly use and reuse intermediate languages for other compiler projects.
For instance, in creating a cost-preserving compiler for a functional language, we may choose to target a linearised version of RTL directly.
Adding such an intermediate language would involve the addition of just a few lines of code.
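To illustrate, such a linearised variant of RTL could plausibly be obtained by specialising the parameter records alone.  The following sketch is hypothetical: the name \texttt{lin\_rtl\_params} is ours, and the omitted lookup function is only suggested, not taken from the formalisation.
\begin{lstlisting}[language=Grafite]
(* hypothetical: reuse RTL's registers and instructions, but lay the
   code out as a list, fetching the next statement positionally *)
definition lin_rtl_params: $\forall$globals. params globals :=
  $\lambda$globals.
    mk_params globals unit rtl_ertl_params1  (* succ is unit; RTL registers *)
      (list (joint_statement $\ldots$ globals)) (* code as a list *)
      $\ldots$.                                 (* positional lookup *)
\end{lstlisting}
The point is that no new syntax or semantics need be written: only the layout parameters change.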

\paragraph{Easy addition of new compiler passes}
Under our modularisation and abstraction scheme, new compiler passes can easily be injected into the backend.
We have a concrete example of this in the RTLntl language, an intermediate language that was not present in the original O'Caml code.
To specify a new intermediate language we need only state, through the statement extension mechanism, how the new intermediate language differs from the `joint' language, and configure a new internal function record for it by specialising the parameters.
As generic code for the `joint' language exists, for example to add statements to control flow graphs, this code can be reused for the new intermediate language.

\paragraph{Possible commutations of translation passes}
The backend translation passes of the CerCo compiler differ quite a bit from those of the CompCert compiler.
In the CompCert compiler, linearisation occurs much earlier in the compilation chain, and passes such as register colouring and allocation are carried out on a linearised form of the program.
Contrast this with our own approach, where the code is represented as a graph for much longer.
Similarly, in CompCert the calling conventions are enforced after register allocation, whereas we do register allocation before enforcing the calling convention.

However, by abstracting the representation of intermediate functions, we are now much freer to reorder translation passes as we see fit.
The linearisation process, for instance, no longer cares about the specific representation of code in the source and target languages.
It just relies on a common interface.
We are therefore, in theory, free to pick where we wish to linearise our representation.
This adds an unusual flexibility to the compilation process, and allows us to freely experiment with different orderings of translation passes.
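The shape of such a pass can be sketched by its signature alone (a hypothetical signature, not the actual Matita code): linearisation consumes a function over any source parameters, using only the abstract \texttt{lookup} and \texttt{succ} interface, and produces a function over list-based parameters.
\begin{lstlisting}[language=Grafite]
(* hypothetical signature: only the abstract interface of the source
   parameters is used, so the pass can be scheduled at any point in
   the backend where a graph representation is available *)
definition linearise:
  $\forall$globals. $\forall$src_pars: params globals. $\forall$dst_pars: params globals.
    joint_internal_function globals src_pars $\rightarrow$
    joint_internal_function globals dst_pars.
\end{lstlisting}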

%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
% SECTION.                                                                    %
%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
\subsection{Use of dependent types}
\label{subsect.use.of.dependent.types}

We see several potential ways in which a compiler can fail to compile a program:
\begin{enumerate}
\item
The program is malformed, and there is no hope of making sense of the program.
\item
The compiler is buggy, or an invariant in the compiler is invalidated.
\item
An incomplete heuristic in the compiler fails.
\item
The compiled code exhausts some bounded resource, for instance the processor's code memory.
\end{enumerate}
Standard compilers can fail for all of the above reasons.
Certified compilers are only required to rule out the second class of failures, but they can still fail for all the remaining reasons.
In particular, a compiler that systematically refuses to compile any well-formed program is a sound compiler.
In contrast, we would like our certified compiler to fail only in the fourth case.
We plan to achieve this with the following strategy.

First, the compiler is abstracted over all incomplete heuristics, seen as total functions.
To obtain executable code, the compiler is eventually composed with implementations of the abstracted strategies, with the composition taking care of any potential failure of the heuristics in finding a solution.
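The shape of this composition can be sketched generically (a hypothetical sketch; \texttt{compose\_with\_oracle} is our name, not part of the formalisation): an untrusted heuristic is paired with a checker, and failure is confined to the boundary between the two.
\begin{lstlisting}[language=Grafite]
(* hypothetical: a heuristic is a total function; a separate checker
   validates its output, and a bad answer surfaces as None *)
definition compose_with_oracle:
  $\forall$A,B: Type[0].
    (A $\rightarrow$ B) $\rightarrow$ (A $\rightarrow$ B $\rightarrow$ bool) $\rightarrow$ A $\rightarrow$ option B :=
  $\lambda$A,B,heuristic,check,a.
    let b := heuristic a in
    if check a b then Some ? b else None ?.
\end{lstlisting}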

Second, we reject all malformed programs using dependent types: only well-formed programs should typecheck, and the compiler can be applied only to well-typed programs.

Finally, exhaustion of bounded resources can be checked only at the very end of compilation.
Therefore, all intermediate compilation steps are now total functions that can neither diverge nor fail: these properties are directly guaranteed by the type system of Matita.

At present, this plan is not yet fully realised.
However, we are improving the implemented code according to our plan, by progressively strengthening the code through the use of dependent types.
We detail the different ways in which dependent types have been used so far.

First, we encode informal invariants, or uses of \texttt{assert false} in the O'Caml code, with dependent types, converting partial functions into total functions.
There are numerous examples of this throughout the backend.
For example, in the \texttt{RTLabs} to \texttt{RTL} transformation pass, many functions only `make sense' when the lists of registers passed to them as arguments conform to some specific length.
For instance, the \texttt{translate\_negint} function, which translates a negative integer constant:
\begin{lstlisting}[language=Grafite]
definition translate_negint :=
  $\lambda$globals: list ident.
  $\lambda$destrs: list register.
  $\lambda$srcrs: list register.
  $\lambda$start_lbl: label.
  $\lambda$dest_lbl: label.
  $\lambda$def: rtl_internal_function globals.
  $\lambda$prf: |destrs| = |srcrs|. (* assert here *)
    ...
\end{lstlisting}
The last argument to the function, \texttt{prf}, is a proof that the lengths of the lists of source and destination registers are the same.
This was an assertion in the O'Caml code.
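At any call site, the caller must now discharge this invariant rather than rely on a run-time check.  For example (a hypothetical call site; \texttt{prf} stands for whatever proof of \texttt{|destrs| = |srcrs|} is available in the caller's context):
\begin{lstlisting}[language=Grafite]
(* hypothetical call site: the length invariant is supplied as an
   explicit proof argument, so no assertion can fail at run time *)
translate_negint globals destrs srcrs start_lbl dest_lbl def prf
\end{lstlisting}
If no such proof can be constructed, the call simply does not typecheck, moving the failure from run time to compile time of the compiler itself.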

Secondly, we make use of dependent types to make the Matita code correct by construction, and eventually to make the proofs of correctness for the compiler easier to write.
For instance, many intermediate languages in the backend of the compiler, from RTLabs to LTL, are graph based languages.
Here, function definitions consist of a graph (i.e. a map from labels to statements) and a pair of labels denoting the entry and exit points of this graph.
Practically, we would always like to ensure that the entry and exit labels are present in the statement graph.
We ensure that this is so with a dependent sum type in the \texttt{joint\_internal\_function} record, which all graph based languages specialise to obtain their own internal function representation:
\begin{lstlisting}[language=Grafite]
record joint_internal_function (globals: list ident) (p: params globals): Type[0] :=
{
  ...
  joint_if_code     : codeT $\ldots$ p;
  joint_if_entry    : $\Sigma$l: label. lookup $\ldots$ joint_if_code l $\neq$ None $\ldots$;
  ...
}.
\end{lstlisting}
Here, \texttt{codeT} is a parameterised type representing the `structure' of the function's body (a graph in graph based languages, and a list in the linearised LIN language).
Specifically, \texttt{joint\_if\_entry} is a dependent pair consisting of a label and a proof that the label in question is a vertex in the function's graph.  A similar device exists for the exit label.

We make use of dependent types also for another reason: experimentation.
Notably, CompCert makes little use of dependent types to encode invariants.
In contrast, we wish to make as much use of dependent types as possible, both to experiment with different ways of encoding compilers in a proof assistant, but also as a way of `stress testing' Matita's support for dependent types.

Moreover, at the moment we make practically no use of inductive predicates to specify compiler invariants and to describe the semantics of intermediate languages.
On the contrary, all predicates are computable functions.
Therefore, the proof style that we will adopt will necessarily be significantly different from, say, that of CompCert.
At the moment, in Matita, `Russell'-style reasoning (in the sense of~\cite{sozeau:subset:2006}) seems to be the best solution for working with computable functions.
This style is heavily based on the idea that all computable functions should be specified using dependent types to describe their pre- and post-conditions.
As a consequence, it is natural to add dependent types almost everywhere in the Matita compiler's codebase.
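For instance, a total variant of the hypothesised \texttt{lookup} function could be specified in this style, with a $\Sigma$-type recording the post-condition in the result (a hypothetical sketch, not code from the compiler; the name \texttt{lookup\_present} is ours):
\begin{lstlisting}[language=Grafite]
(* `Russell'-style: the return type itself carries the specification,
   so every caller obtains both the statement and a proof about it *)
definition lookup_present:
  $\forall$globals. $\forall$p: params globals. $\forall$code: codeT $\ldots$ p.
  $\forall$l: label. lookup $\ldots$ code l $\neq$ None $\ldots$ $\rightarrow$
  $\Sigma$s: joint_statement $\ldots$ globals. lookup $\ldots$ code l = Some $\ldots$ s.
\end{lstlisting}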

%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
% SECTION.                                                                    %
%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
\subsection{What we do not implement}
\label{subsect.what.we.do.not.implement}

There are several classes of functionality that we have chosen not to implement in the backend languages:
\begin{itemize}
\item
\textbf{Datatypes and functions over these datatypes that are not supported by the compiler.}
In particular, the compiler does not support the floating point datatype, nor accompanying functions over that datatype.
At the moment, frontend languages within the compiler possess constructors corresponding to floating point code.
These are removed during instruction selection (in the RTLabs to RTL transformation) using a daemon.\footnote{A Girardism.  An axiom of type \texttt{False}, from which we can prove anything.}
However, at some point, we would like the front end of the compiler to recognise programs that use floating point code and reject them as invalid.
\item
\textbf{Axiomatised components that will be implemented using external oracles.}
Several large, complex pieces of compiler infrastructure, most noticeably register colouring and fixed point calculation during liveness analysis, have been axiomatised.
This was already agreed upon before the start of the project, and is clearly marked in the project proposal, following comments by those involved with the CompCert project about the difficulty of formalising register colouring in that project.
Instead, these components are axiomatised, along with the properties that they need to satisfy in order for the rest of the compilation chain to be correct.
These axiomatised components are found in the ERTL to LTL pass.

It should be noted that these axiomatised components fall into the following pattern: whilst their implementation is complex, and their proof of correctness is difficult, we are able to quickly and easily verify that any answer that they provide is correct.
Therefore, we plan to provide implementations in OCaml only, and to provide certified verifiers in Matita.
At the moment, the implementation of the certified verifiers is left as future work.
\item
\textbf{A few non-computational proof obligations.}
A few difficult-to-close but non-computational proof obligations (i.e. ones that do not prevent us from executing the compiler inside Matita) have been closed using daemons in the backend.
These proof obligations originate with our use of dependent types for expressing invariants in the compiler.
However, it should be mentioned that many open proof obligations are simply impossible to close until we start to obtain stronger invariants from the proof of correctness for the compiler proper.
In particular, in the RTLabs to RTL pass, several proof obligations relating to lists of registers stored in a `local environment' appear to fall into this pattern.
\item
\textbf{Branch compression (tunnelling).}
This was a feature of the O'Caml compiler.
It is not currently implemented in the Matita compiler.
This feature is only an optimisation, and will not affect the correctness of the compiler.
\item
\textbf{`Real' tailcalls.}
For the time being, tailcalls in the backend are translated to `vanilla' function calls during the ERTL to LTL pass.
This follows the O'Caml compiler, which did not implement tailcalls, and performed this simplification step instead.
`Real' tailcalls are being implemented in the O'Caml compiler, and when this implementation is complete, we aim to port this code to the Matita compiler.
\end{itemize}
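The pattern described above for axiomatised components can be illustrated with register colouring: producing a colouring is hard, but checking a proposed one is easy.  The sketch below is entirely hypothetical (the names \texttt{check\_colouring}, \texttt{interference\_graph}, \texttt{forall\_edges} and \texttt{eq\_colour} are ours, not part of the formalisation):
\begin{lstlisting}[language=Grafite]
(* hypothetical certified verifier: an oracle's colouring is accepted
   only if no two interfering pseudoregisters share a colour *)
definition check_colouring:
  interference_graph $\rightarrow$ (register $\rightarrow$ colour) $\rightarrow$ bool :=
  $\lambda$g. $\lambda$col.
    forall_edges g ($\lambda$r1. $\lambda$r2. notb (eq_colour (col r1) (col r2))).
\end{lstlisting}
A correctness proof for such a checker is far smaller than one for the colouring algorithm itself, which is what makes the oracle-plus-verifier split attractive.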

%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
% SECTION.                                                                    %
%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
\section{Associated changes to O'Caml compiler}
\label{sect.associated.changes.to.ocaml.compiler}

At the moment, only bugfixes, and none of the architectural changes we have made in the Matita backend, have made their way back into the O'Caml compiler.
We do not see the heavy process of modularisation and abstraction as making its way back into the O'Caml codebase, as this would amount to a significant rewrite of the backend code, which is supposed to yield the same code after instantiating parameters anyway.

%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
% SECTION.                                                                    %
%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
\section{Future work}
\label{sect.future.work}

As mentioned in Section~\ref{subsect.what.we.do.not.implement}, there are several unimplemented features in the compiler, and several aspects of the Matita code that can be improved in order to make currently partial functions total.
We summarise this future work here:
\begin{itemize}
\item
We plan to make use of dependent types to identify `floating point' free programs and make all functions total over such programs.
This will remove a swathe of uses of daemons.
This should be routine.
\item
We plan to move expansion of integer modulus, and other related functions, into the instruction selection (RTLabs to RTL) phase.
This will also help to remove a swathe of uses of daemons, as well as potentially introducing new opportunities for optimisations that we currently miss by expanding these instructions at the C-light level.
\item
We plan to close all existing proof obligations that are currently closed using daemons, arising from our use of dependent types in the backend.
However, many may not be closable until we have completed Deliverable D4.4, the certification of the whole compiler, as we may not yet have strong enough invariants.
\item
We plan to port the O'Caml compiler's implementation of tailcalls when this is completed, and eventually port the branch compression code currently in the O'Caml compiler to the Matita implementation.
This should not cause any major problems.
\item
We plan to validate the backend translations, removing any obvious bugs, by executing the translations inside Matita on small C programs.
This is not critical, as the certification process will find all bugs anyway.
\item
We plan to provide certified validators for all results provided by external oracles written in OCaml.
At the moment, we have axiomatised oracles for computing least fixpoints during liveness analysis, for colouring registers, and for branch displacement in the assembler code.
\end{itemize}

\section{The back-end intermediate languages' semantics in Matita}
\label{sect.backend.intermediate.languages.semantics.matita}

%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
% SECTION.                                                                    %
%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
\subsection{Abstracting related languages}
\label{subsect.abstracting.related.languages}

As mentioned in the report for Deliverable D4.2, a systematic process of abstraction, over the OCaml code, has taken place in the Matita encoding.
In particular, we have merged many of the syntaxes of the intermediate languages (i.e. RTL, ERTL, LTL and LIN) into a single `joint' syntax, which is parameterised by various types.
Intermediate languages equivalent to those present in the OCaml code can be recovered by specialising this joint structure.

As mentioned in the report for Deliverable D4.2, there are a number of advantages that this process of abstraction brings, from code reuse to allowing us to get a clearer view of the intermediate languages and their structure.
However, it is the semantics of the intermediate languages that allow us to concretely demonstrate this improvement in clarity, by noting that the semantics of the LTL and LIN languages are identical.
In particular, the semantics of both LTL and LIN are implemented in exactly the same way.
The only difference between the two languages is how the next instruction to be interpreted is fetched.
In LTL, this involves a lookup in a graph, whereas in LIN, it involves fetching from a list of instructions.

As a result, we see that the semantics of LIN and LTL are both instances of a single, more general language that is parametric in how the next instruction is fetched.
Furthermore, any prospective proof that the semantics of LTL and LIN are identical is now almost trivial, saving a good deal of work in Deliverable D4.4.
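The point of variation can be pinned down precisely: it is the function that fetches the next statement.  Hypothetically (these definitions are illustrative, with names and helper functions of our own invention), the two instantiations differ only as follows:
\begin{lstlisting}[language=Grafite]
(* LTL: fetch the next statement from a control flow graph *)
definition ltl_fetch := $\lambda$code. $\lambda$l. graph_lookup $\ldots$ code l.
(* LIN: fetch the next statement positionally from a list *)
definition lin_fetch := $\lambda$code. $\lambda$n. nth_opt $\ldots$ code n.
\end{lstlisting}
Everything else in the two semantics is literally shared code.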

%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
% SECTION.                                                                    %
%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
\subsection{Type parameters, and their purpose}
\label{subsect.type.parameters.their.purpose}

We mentioned in the Deliverable D4.2 report that all joint languages are parameterised by a number of types, which are later specialised to each distinct intermediate language.
As this parameterisation process is also dependent on design decisions in the language semantics, we have so far held off summarising the role of each parameter.

We begin the abstraction process with the \texttt{params\_\_} record.
This holds the types of the representations of the different register varieties in the intermediate languages:
\begin{lstlisting}[language=Grafite]
record params__: Type[1] :=
{
  acc_a_reg: Type[0];
  acc_b_reg: Type[0];
  dpl_reg: Type[0];
  dph_reg: Type[0];
  pair_reg: Type[0];
  generic_reg: Type[0];
  call_args: Type[0];
  call_dest: Type[0];
  extend_statements: Type[0]
}.
\end{lstlisting}
We summarise what these types mean, and how they are used in both the semantics and the translation process:
\begin{center}
\begin{tabular*}{\textwidth}{p{4cm}p{11cm}}
Type & Explanation \\
\hline
\texttt{acc\_a\_reg} & The type of the accumulator A register.  In some languages this is implemented as the hardware accumulator, whereas in others this is a pseudoregister.\\
\texttt{acc\_b\_reg} & Similar to the accumulator A field, but for the processor's auxiliary accumulator, B. \\
\texttt{dpl\_reg} & The type of the representation of DPL, the low eight bits of the MCS-51's single 16-bit register.  Can be either a pseudoregister or the hardware DPL register. \\
\texttt{dph\_reg} & Similar to the DPL register, but for the eight high bits of the 16-bit register. \\
\texttt{pair\_reg} & Various different `move' instructions have been merged into a single move instruction in the joint language.  A value can either be moved to or from the accumulator in some languages, or moved to and from an arbitrary pseudoregister in others.  This type encodes how we should move data around the registers and accumulators. \\
\texttt{generic\_reg} & The representation of generic registers (i.e. those that are not devoted to a specific task). \\
\texttt{call\_args} & The actual arguments passed to a function.  For some languages this is simply the number of arguments passed to the function. \\
\texttt{call\_dest} & The destination of the function call. \\
\texttt{extend\_statements} & Instructions that are specific to a particular intermediate language, and which cannot be abstracted into the joint language.
\end{tabular*}
\end{center}

As mentioned in the report for Deliverable D4.2, the record \texttt{params\_\_} is enough to be able to specify the instructions of the joint languages:
\begin{lstlisting}[language=Grafite]
inductive joint_instruction (p: params__) (globals: list ident): Type[0] :=
  | COMMENT: String $\rightarrow$ joint_instruction p globals
  | COST_LABEL: costlabel $\rightarrow$ joint_instruction p globals
  ...
  | OP1: Op1 $\rightarrow$ acc_a_reg p $\rightarrow$ acc_a_reg p $\rightarrow$ joint_instruction p globals
  | COND: acc_a_reg p $\rightarrow$ label $\rightarrow$ joint_instruction p globals
  ...
\end{lstlisting}
Here, we see that the instruction \texttt{OP1} (a unary operation on the accumulator A) can be given quite a specific type, through the use of the \texttt{params\_\_} data structure.

Joint statements can be split into two subclasses: those that simply pass the flow of control on to their successor statement, and those that jump to a potentially remote location in the program.
Naturally, as some intermediate languages are graph based, and others linearised, the act of passing control on to the `successor' instruction can be either the act of following a graph edge in a control flow graph, or that of incrementing an index into a list.
We make a distinction between instructions that pass control on to their immediate successors, and those that jump elsewhere in the program, through the use of \texttt{succ}, denoting the immediate successor of the current instruction, in the \texttt{params\_} record described below.
\begin{lstlisting}[language=Grafite]
record params_: Type[1] :=
{
  pars__ :> params__;
  succ: Type[0]
}.
\end{lstlisting}
The type \texttt{succ} corresponds to labels, in the case of control flow graph based languages, or is instantiated to the unit type for the linearised language, LIN.
Using \texttt{params\_} we can define statements of the joint language:
\begin{lstlisting}[language=Grafite]
inductive joint_statement (p:params_) (globals: list ident): Type[0] :=
  | sequential: joint_instruction p globals $\rightarrow$ succ p $\rightarrow$ joint_statement p globals
  | GOTO: label $\rightarrow$ joint_statement p globals
  | RETURN: joint_statement p globals.
\end{lstlisting}
Note that in the joint language, instructions are `linear', in that they have an immediate successor.
Statements, on the other hand, consist of either a linear instruction, or a \texttt{GOTO} or \texttt{RETURN} statement, both of which can jump to an arbitrary place in the program.
The conditional jump instruction \texttt{COND} is `linear', since it has an immediate successor, but it also takes an arbitrary location (a label) to jump to.

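The two instantiations of \texttt{succ} just mentioned can be written down directly (a sketch; the definition names are hypothetical, though \texttt{mk\_params\_} does appear in the formalisation):
\begin{lstlisting}[language=Grafite]
(* graph based languages: the successor is named by a label *)
definition graph_params_ := $\lambda$p__: params__. mk_params_ p__ label.
(* LIN: the successor is implicit, namely the next list element *)
definition lin_params_   := $\lambda$p__: params__. mk_params_ p__ unit.
\end{lstlisting}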
For the semantics, we need further parameterised types.
In particular, we parameterise the result and parameter type of an internal function call in \texttt{params0}:
\begin{lstlisting}[language=Grafite]
record params0: Type[1] :=
{
  pars__' :> params__;
  resultT: Type[0];
  paramsT: Type[0]
}.
\end{lstlisting}
Here, \texttt{paramsT} and \texttt{resultT} typically describe the (pseudo)registers that store the parameters and result of a function.

We further extend \texttt{params0} with a type for local variables in internal function calls:
\begin{lstlisting}[language=Grafite]
record params1 : Type[1] :=
{
  pars0 :> params0;
  localsT: Type[0]
}.
\end{lstlisting}
Again, we expand our parameters with types corresponding to the code representation (either a control flow graph or a list of statements).
Further, we hypothesise a generic method for looking up the next instruction in the graph, called \texttt{lookup}.
Note that \texttt{lookup} may fail, and returns an \texttt{option} type:
\begin{lstlisting}[language=Grafite]
record params (globals: list ident): Type[1] :=
{
  succ_ : Type[0];
  pars1 :> params1;
  codeT : Type[0];
  lookup: codeT $\rightarrow$ label $\rightarrow$ option (joint_statement (mk_params_ pars1 succ_) globals)
}.
\end{lstlisting}
We now have what we need to define internal functions for the joint language.
The first two `universe' fields are only used in the compilation process, for generating fresh names, and do not affect the semantics.
The rest of the fields affect both compilation and semantics.
In particular, we have a description of the result, parameters and local variables of a function.
Note also that we have lifted the hypothesised \texttt{lookup} function from \texttt{params} into a dependent sigma type, which combines a label (the entry or exit point of the control flow graph or list) with a proof that the label is in the graph structure:
\begin{lstlisting}[language=Grafite]
record joint_internal_function (globals: list ident) (p:params globals) : Type[0] :=
{
  joint_if_luniverse: universe LabelTag;
  joint_if_runiverse: universe RegisterTag;
  joint_if_result   : resultT p;
  joint_if_params   : paramsT p;
  joint_if_locals   : localsT p;
  joint_if_stacksize: nat;
  joint_if_code     : codeT ... p;
  joint_if_entry    : $\Sigma$l: label. lookup ... joint_if_code l $\neq$ None ?;
  joint_if_exit     : $\Sigma$l: label. lookup ... joint_if_code l $\neq$ None ?
}.
\end{lstlisting}
Naturally, a question arises as to why we have chosen to split the parameterisation into so many intermediate records, each slightly extending earlier ones.
The reason is that some intermediate languages share a host of parameters, and differ only in some others.
For instance, in instantiating the ERTL language, certain parameters are shared with RTL, whilst others are ERTL-specific:
\begin{lstlisting}[language=Grafite]
...
definition ertl_params__: params__ :=
 mk_params__ register register register register (move_registers $\times$ move_registers)
  register nat unit ertl_statement_extension.
...
definition ertl_params1: params1 := rtl_ertl_params1 ertl_params0.
definition ertl_params: $\forall$globals. params globals := rtl_ertl_params ertl_params0.
...
definition ertl_statement := joint_statement ertl_params_.

definition ertl_internal_function :=
  $\lambda$globals.joint_internal_function ... (ertl_params globals).
\end{lstlisting}
Here, \texttt{rtl\_ertl\_params1} gives the common parameters of the ERTL and RTL languages:
\begin{lstlisting}[language=Grafite]
definition rtl_ertl_params1 := $\lambda$pars0. mk_params1 pars0 (list register).
\end{lstlisting}

The record \texttt{more\_sem\_params} bundles together functions that store and retrieve values in various forms of register:
\begin{lstlisting}[language=Grafite]
record more_sem_params (p:params_): Type[1] :=
{
  framesT: Type[0];
  empty_framesT: framesT;

  regsT: Type[0];
  empty_regsT: regsT;

  call_args_for_main: call_args p;
  call_dest_for_main: call_dest p;

  greg_store_: generic_reg p $\rightarrow$ beval $\rightarrow$ regsT $\rightarrow$ res regsT;
  greg_retrieve_: regsT $\rightarrow$ generic_reg p $\rightarrow$ res beval;
  acca_store_: acc_a_reg p $\rightarrow$ beval $\rightarrow$ regsT $\rightarrow$ res regsT;
  acca_retrieve_: regsT $\rightarrow$ acc_a_reg p $\rightarrow$ res beval;
  ...
  dpl_store_: dpl_reg p $\rightarrow$ beval $\rightarrow$ regsT $\rightarrow$ res regsT;
  dpl_retrieve_: regsT $\rightarrow$ dpl_reg p $\rightarrow$ res beval;
  ...
  pair_reg_move_: regsT $\rightarrow$ pair_reg p $\rightarrow$ res regsT;
}.
\end{lstlisting}
Here, the fields \texttt{empty\_framesT}, \texttt{empty\_regsT}, \texttt{call\_args\_for\_main} and \texttt{call\_dest\_for\_main} are used for state initialisation.

The fields \texttt{greg\_store\_} and \texttt{greg\_retrieve\_} store and retrieve values from a generic register, respectively.
Similarly, \texttt{pair\_reg\_move\_} implements the generic move instruction of the joint language.
Here \texttt{framesT} is the type of stack frames, with \texttt{empty\_framesT} an empty stack frame.

The two hypothesised values \texttt{call\_args\_for\_main} and \texttt{call\_dest\_for\_main} deal with the problems surrounding the \texttt{main} function of the program, and how it is handled.
In particular, we need to know when the \texttt{main} function has finished executing.
But this is complicated, in C, by the fact that the \texttt{main} function is explicitly allowed to be recursive (though this is disallowed in C++).
Therefore, to understand whether the exiting \texttt{main} function is really exiting, or just recursively calling itself, we need to remember the address to which \texttt{main} will return control once the initial call to \texttt{main} has finished executing.
This is done with \texttt{call\_dest\_for\_main}, whereas \texttt{call\_args\_for\_main} holds the \texttt{main} function's arguments.

We extend \texttt{more\_sem\_params} with yet more parameters via \texttt{more\_sem\_params1}:
\begin{lstlisting}[language=Grafite]
record more_sem_params1 (globals: list ident) (p: params globals) : Type[1] :=
{
  more_sparams1 :> more_sem_params p;

  succ_pc: succ p $\rightarrow$ address $\rightarrow$ res address;
  pointer_of_label: genv ... p $\rightarrow$ pointer $\rightarrow$
    label $\rightarrow$ res ($\Sigma$p:pointer. ptype p = Code);
  ...
  fetch_statement:
    genv ... p $\rightarrow$ state (mk_sem_params ... more_sparams1) $\rightarrow$
    res (joint_statement (mk_sem_params ... more_sparams1) globals);
  ...
  save_frame:
    address $\rightarrow$ nat $\rightarrow$ paramsT ... p $\rightarrow$ call_args p $\rightarrow$ call_dest p $\rightarrow$
    state (mk_sem_params ... more_sparams1) $\rightarrow$
    res (state (mk_sem_params ... more_sparams1));
  pop_frame:
    genv globals p $\rightarrow$ state (mk_sem_params ... more_sparams1) $\rightarrow$
    res ((state (mk_sem_params ... more_sparams1)));
  ...
  set_result:
    list val $\rightarrow$ state (mk_sem_params ... more_sparams1) $\rightarrow$
    res (state (mk_sem_params ... more_sparams1));
  exec_extended:
    genv globals p $\rightarrow$ extend_statements (mk_sem_params ... more_sparams1) $\rightarrow$
    succ p $\rightarrow$ state (mk_sem_params ... more_sparams1) $\rightarrow$
    IO io_out io_in (trace $\times$ (state (mk_sem_params ... more_sparams1)))
}.
\end{lstlisting}
The field \texttt{succ\_pc} takes a `successor' label and an address, and returns the address of the instruction immediately succeeding the one at hand.

Here, \texttt{fetch\_statement} fetches the next statement to be executed.
The fields \texttt{save\_frame} and \texttt{pop\_frame} manipulate stack frames.
In particular, \texttt{save\_frame} creates a new stack frame on the top of the stack, saving the destination and parameters of a function, and returns an updated state.
The field \texttt{pop\_frame} destructively pops a stack frame from the stack, returning an updated state.
Further, \texttt{set\_result} saves the result of the function computation, and \texttt{exec\_extended} executes the extended statements peculiar to each individual intermediate language.
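A toy OCaml model may help fix intuitions about these three operations (the types and field names here are hypothetical; in the formalisation they live behind the \texttt{framesT} abstraction):

```ocaml
(* Hypothetical model: a frame saves the call's destination and arguments;
   save_frame pushes a frame, pop_frame destructively pops one, and
   set_result records the function's computed value in the state. *)
type frame = { dest : string; args : int list }
type state = { frames : frame list; result : int option }

let initial = { frames = []; result = None }

(* push a new frame recording the call's destination and arguments *)
let save_frame dest args st = { st with frames = { dest; args } :: st.frames }

(* pop the topmost frame, failing on an empty stack *)
let pop_frame st =
  match st.frames with
  | [] -> Error "PopEmptyStack"
  | f :: rest -> Ok (f, { st with frames = rest })

(* record the result of the function computation *)
let set_result v st = { st with result = Some v }
```

As in the record above, the failure case of \texttt{pop\_frame} mirrors the \texttt{res} return types: popping an empty stack is an error, not undefined behaviour.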

We bundle \texttt{params} and \texttt{more\_sem\_params2} together into a single record.
This will be used in the function \texttt{eval\_statement}, which executes a single statement of the joint language:
\begin{lstlisting}[language=Grafite]
record sem_params2 (globals: list ident): Type[1] :=
{
  p2 :> params globals;
  more_sparams2 :> more_sem_params2 globals p2
}.
\end{lstlisting}
\noindent
The \texttt{state} record holds the current state of the interpreter:
\begin{lstlisting}[language=Grafite]
record state (p: sem_params): Type[0] :=
{
  st_frms: framesT ? p;
  pc: address;
  sp: pointer;
  isp: pointer;
  carry: beval;
  regs: regsT ? p;
  m: bemem
}.
\end{lstlisting}
Here \texttt{st\_frms} represents the stack frames, \texttt{pc} the program counter, \texttt{sp} the stack pointer, \texttt{isp} the internal stack pointer, \texttt{carry} the carry flag, \texttt{regs} the registers (hardware and pseudoregisters) and \texttt{m} external RAM.
Note that we have two stack pointers because we have two stacks: the physical stack of the MCS-51 microprocessor, and an emulated stack in external RAM.
The MCS-51's own stack is minuscule, so it is usual to emulate a much larger, more useful stack in external RAM.
We require two stack pointers because the MCS-51's \texttt{PUSH} and \texttt{POP} instructions manipulate the physical stack, and not the emulated one.
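The two-stack arrangement can be sketched in OCaml (a deliberately simplified model; the sizes, growth directions and names are illustrative assumptions, not the formalised semantics):

```ocaml
(* Hypothetical sketch of the two stacks: the hardware PUSH/POP act on the
   tiny on-chip stack via isp, while compiled code maintains a much larger
   emulated stack in external RAM via sp. *)
type mem = { internal : int array;       (* on-chip RAM, 256 bytes *)
             external_ram : int array }  (* external RAM, up to 64KB *)
type state = { mutable isp : int; mutable sp : int; m : mem }

(* hardware PUSH: pre-increment isp, write into internal RAM *)
let push st v = st.isp <- st.isp + 1; st.m.internal.(st.isp) <- v

(* emulated push: here assumed to grow downwards in external RAM *)
let epush st v = st.sp <- st.sp - 1; st.m.external_ram.(st.sp) <- v
```

The point of the sketch is only that the two operations touch disjoint memories through distinct pointers, which is why the \texttt{state} record needs both \texttt{sp} and \texttt{isp}.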

We use the function \texttt{eval\_statement} to evaluate a single joint statement:
\begin{lstlisting}[language=Grafite]
definition eval_statement:
  $\forall$globals: list ident.$\forall$p:sem_params2 globals.
    genv globals p $\rightarrow$ state p $\rightarrow$ IO io_out io_in (trace $\times$ (state p)) :=
...
\end{lstlisting}
We examine the type of this function.
Note that it returns a monadic action, \texttt{IO}, denoting that it may have an IO \emph{side effect}, where the program reads from or writes to some external device or memory address.
Further, the function returns a new state, updated by the single step of execution of the program.
Finally, a \emph{trace} is also returned, which records externally observable `events', such as the calling of external functions and the emission of cost labels.
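The role of the returned trace can be illustrated with a toy OCaml driver (the event type and step function are hypothetical): each step yields the events it emits, and the driver concatenates them in execution order.

```ocaml
(* Hypothetical sketch: observable events are cost-label emissions and
   external function calls, as in the joint semantics. *)
type event = CostLabel of string | ExtCall of string

(* one step of execution: the events it emits, plus the next state *)
type 'st step = 'st -> event list * 'st

(* run n steps, concatenating per-step traces in execution order *)
let rec run (step : 'st step) n st =
  if n = 0 then ([], st)
  else
    let tr, st' = step st in
    let tr', st'' = run step (n - 1) st' in
    (tr @ tr', st'')
```

Preserving this ordering of events is exactly what the monadic sequencing discussed below guarantees.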

%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
% SECTION.                                                                    %
%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
\subsection{Monads}
\label{subsect.monads}

Monads are a categorical notion that has recently gained a significant amount of traction in functional programming circles.
In particular, it was noted by Moggi that monads could be used to sequence \emph{effectful} computations in a pure manner.
Here, `effectful computations' cover a lot of ground, from writing to files, to generating fresh names, to updating an ambient notion of state.

A monad can be characterised by the following:
\begin{itemize}
\item
A data type, $M$.
For instance, the \texttt{option} type in OCaml or Matita.
\item
A way to `inject' or `lift' pure values into this data type, via a function usually called \texttt{return}, which must have type $\alpha \rightarrow M \alpha$, where $M$ is the name of the monad.
In our example, the `lifting' function for the \texttt{option} monad can be implemented as:
\begin{lstlisting}[language=Grafite]
let return x = Some x
\end{lstlisting}
\item
A way to `sequence' monadic functions together, to form another monadic function, usually called \texttt{bind}.
Bind has type $M \alpha \rightarrow (\alpha \rightarrow M \beta) \rightarrow M \beta$.
We can see that bind `unpacks' a monadic value, applies a function after unpacking, and `repacks' the new value in the monad.
In our example, the sequencing function for the \texttt{option} monad can be implemented as:
\begin{lstlisting}[language=Grafite]
let bind o f =
  match o with
  | None -> None
  | Some s -> f s
\end{lstlisting}
\item
A series of algebraic laws relating \texttt{return} and \texttt{bind}, ensuring that the sequencing operation `does the right thing' by retaining the order of effects.
These \emph{monad laws} should also be useful in reasoning about monadic computations in the proof of correctness of the compiler.
\end{itemize}
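For the \texttt{option} monad above, the three standard laws (left identity, right identity and associativity) can be spot-checked directly on examples; a small OCaml sketch, where \texttt{f} and \texttt{g} are two arbitrary monadic functions chosen for illustration:

```ocaml
(* The option monad, exactly as in the examples above. *)
let return x = Some x
let bind o f = match o with None -> None | Some s -> f s

(* Two arbitrary monadic functions to exercise the laws:
     left identity:  bind (return a) f  =  f a
     right identity: bind m return      =  m
     associativity:  bind (bind m f) g  =  bind m (fun x -> bind (f x) g) *)
let f x = if x > 0 then Some (x * 2) else None
let g x = return (x + 1)
```

Of course, checking instances is not a proof; in the formalisation the laws are established once and for all and then reused when reasoning about monadic code.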
We make use of monads in the semantics of both the front-end and back-end intermediate languages, and this monadic infrastructure is shared between them.

In particular, an `IO' monad, signalling the emission of a cost label or the calling of an external function, is heavily used in the semantics of the intermediate languages.
Here, the monad's sequencing operation ensures that cost label emissions and function calls are maintained in the correct order.
We have already seen how the \texttt{eval\_statement} function of the joint language is monadic, with type:
\begin{lstlisting}[language=Grafite]
definition eval_statement:
  $\forall$globals: list ident.$\forall$p:sem_params2 globals.
    genv globals p $\rightarrow$ state p $\rightarrow$ IO io_out io_in (trace $\times$ (state p)) :=
...
\end{lstlisting}
If we examine the body of \texttt{eval\_statement}, we may also see how the monad sequences effects.
For instance, in the case for the \texttt{LOAD} statement, we have the following:
\begin{lstlisting}[language=Grafite]
definition eval_statement:
  $\forall$globals: list ident. $\forall$p:sem_params2 globals.
    genv globals p $\rightarrow$ state p $\rightarrow$ IO io_out io_in (trace $\times$ (state p)) :=
  $\lambda$globals, p, ge, st.
  ...
  match s with
  | LOAD dst addrl addrh $\Rightarrow$
    ! vaddrh $\leftarrow$ dph_retrieve ... st addrh;
    ! vaddrl $\leftarrow$ dpl_retrieve ... st addrl;
    ! vaddr $\leftarrow$ pointer_of_address $\langle$vaddrl,vaddrh$\rangle$;
    ! v $\leftarrow$ opt_to_res ... (msg FailedLoad) (beloadv (m ... st) vaddr);
    ! st $\leftarrow$ acca_store p ... dst v st;
    ! st $\leftarrow$ next ... l st;
      ret ? $\langle$E0, st$\rangle$
\end{lstlisting}
Here, we employ a certain degree of syntactic sugaring.
The syntax
\begin{lstlisting}[language=Grafite]
  ...
! vaddrh $\leftarrow$ dph_retrieve ... st addrh;
! vaddrl $\leftarrow$ dpl_retrieve ... st addrl;
  ...
\end{lstlisting}
is syntactic sugar for the \texttt{IO} monad's binding operation.
We can expand this sugaring into the following, much more verbose, code:
\begin{lstlisting}[language=Grafite]
  ...
  bind (dph_retrieve ... st addrh) ($\lambda$vaddrh. bind (dpl_retrieve ... st addrl)
    ($\lambda$vaddrl. ...))
\end{lstlisting}
Note also that the function \texttt{ret} implements the `lifting', or return, function of the \texttt{IO} monad.

We believe the sugaring for the monadic bind operation makes the program much more readable, and therefore easier to reason about.
In particular, note that the functions \texttt{dph\_retrieve}, \texttt{pointer\_of\_address}, \texttt{acca\_store} and \texttt{next} are all monadic.
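Much the same sugar is available in OCaml (4.08 and later) through binding operators; a hypothetical sketch for the \texttt{option} monad, where \texttt{lookup} and \texttt{add\_two} are invented example functions:

```ocaml
(* The option monad again, with OCaml's (let*) playing the same role
   as the `! x <- e; ...' notation in the Matita development. *)
let bind o f = match o with None -> None | Some v -> f v
let ( let* ) = bind

let lookup tbl k = List.assoc_opt k tbl

(* Reads like the joint-language code: each step may fail, and the
   first failure aborts the whole computation. *)
let add_two tbl k1 k2 =
  let* a = lookup tbl k1 in
  let* b = lookup tbl k2 in
  Some (a + b)
```

Desugared, \texttt{add\_two} is exactly the nested-\texttt{bind} form shown above for \texttt{eval\_statement}.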

Note, however, that inside this monadic code there is also another monad hiding.
The \texttt{res} monad signals failure, along with an error message.
The monad's sequencing operation ensures the order of error messages does not get rearranged.
The function \texttt{opt\_to\_res} lifts an option type into this monad, with an error message to be used in case of failure.
The \texttt{res} monad is then coerced into the \texttt{IO} monad, ensuring the whole code snippet typechecks.
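A minimal OCaml analogue of \texttt{res} and \texttt{opt\_to\_res} (the names mirror the Matita ones, but the definitions here are only an illustrative sketch):

```ocaml
(* An error monad: a computation either succeeds with a value
   or fails with an error message. *)
type 'a res = OK of 'a | Error of string

let ret x = OK x

(* Sequencing propagates the first error and stops there, so error
   messages cannot be reordered. *)
let bind r f = match r with Error e -> Error e | OK v -> f v

(* Lift an option into res, supplying the message to use for None,
   as in  opt_to_res ... (msg FailedLoad) (beloadv ...)  above. *)
let opt_to_res msg = function
  | None -> Error msg
  | Some v -> OK v
```

In the formalisation the coercion from \texttt{res} into \texttt{IO} then lets failing computations live inside the IO monad uniformly.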

\subsection{Memory models}
\label{subsect.memory.models}

Currently, the semantics of the front and back-end intermediate languages are built around two distinct memory models.
The front-end languages reuse the CompCert 1.6 memory model, whereas the back-end languages employ a version tailored to their needs.
This split between the memory models reflects the fact that the front-end and back-end languages have different requirements from their memory models.

In particular, the CompCert 1.6 memory model places quite heavy restrictions on where in memory one can read from.
To read a value in this memory model, you must supply an address, together with the size of the `chunk' to read following that address.
The read is only successful if you attempt to read at a genuine `value boundary', and read the appropriate amount of memory for that value.
As a result, with that memory model you are unable to read, for instance, the third byte of a 32-bit integer value directly from memory.
This has some consequences for the compiler, namely an inability to write a \texttt{memcpy} routine.

However, the CerCo memory model operates differently, as we need to move data `piecemeal' between stacks in the back-end of the compiler.
As a result, the back-end memory model allows one to read data at any memory location, not just on value boundaries.
This has the advantage that we can successfully give a semantics to a \texttt{memcpy} routine in the back-end of the CerCo compiler (remembering that \texttt{memcpy} is nothing more than `read a byte, copy a byte' repeated in a loop), an advantage over CompCert.
However, the front-end of CerCo cannot, because its memory model and values are similar to those of CompCert 1.6.
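The difference can be made concrete with a toy byte-addressed memory in OCaml (hypothetical and far simpler than either real model): byte loads succeed at any in-bounds address, so \texttt{memcpy} is just a loop of byte reads and writes, whereas a chunked model would reject the unaligned accesses such a loop performs.

```ocaml
(* Toy byte-addressed memory: any in-bounds byte can be read or written,
   with no notion of value boundaries. *)
let load_byte m addr =
  if addr >= 0 && addr < Array.length m then Some m.(addr) else None

let store_byte m addr v =
  if addr >= 0 && addr < Array.length m then (m.(addr) <- v; true) else false

(* `read a byte, copy a byte' repeated in a loop; returns false if any
   access fell out of bounds *)
let memcpy m ~src ~dst ~len =
  let ok = ref true in
  for i = 0 to len - 1 do
    match load_byte m (src + i) with
    | Some v -> if not (store_byte m (dst + i) v) then ok := false
    | None -> ok := false
  done;
  !ok
```

Nothing in this sketch constrains \texttt{src}, \texttt{dst} or \texttt{len} to value boundaries, which is precisely the freedom the back-end model grants and the chunked front-end model does not.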

More recent versions of CompCert's memory model have evolved in a similar direction, with a byte-by-byte representation of memory blocks.
However, there remains an important difference in the handling of pointer values in the rest of the formalisation.
In particular, in CompCert 1.10 only complete pointer values can be loaded in all of the languages in the compiler, whereas in CerCo we need to represent individual bytes of a pointer in the back-end to support our 8-bit target architecture.

Right now, the two memory models are interfaced during the translation from RTLabs to RTL.
It is an open question whether we will unify the two memory models, using only the back-end's bespoke memory model throughout the compiler, as the CompCert memory model seems to work fine for the front-end, where such byte-by-byte copying is not needed.
However, should we decide to port the front-end to the new memory model, it has been written in such an abstract way that doing so would be relatively straightforward.
