1 | \documentclass[11pt,epsf,a4wide]{article} |
---|
2 | \usepackage[mathletters]{ucs} |
---|
3 | \usepackage[utf8x]{inputenc} |
---|
4 | \usepackage{listings} |
---|
5 | \usepackage{../../style/cerco} |
---|
6 | \newcommand{\ocaml}{OCaml} |
---|
7 | \newcommand{\clight}{Clight} |
---|
8 | \newcommand{\matita}{Matita} |
---|
9 | \newcommand{\sdcc}{\texttt{sdcc}} |
---|
10 | |
---|
11 | \newcommand{\textSigma}{\ensuremath{\Sigma}} |
---|
12 | |
---|
13 | % LaTeX Companion, p 74 |
---|
14 | \newcommand{\todo}[1]{\marginpar{\raggedright - #1}} |
---|
15 | |
---|
16 | \lstdefinelanguage{coq} |
---|
17 | {keywords={Definition,Lemma,Theorem,Remark,Qed,Save,Inductive,Record}, |
---|
18 | morekeywords={[2]if,then,else}, |
---|
19 | } |
---|
20 | |
---|
21 | \lstdefinelanguage{matita} |
---|
22 | {keywords={definition,lemma,theorem,remark,inductive,record,qed,let,rec,match,with,Type,and,on}, |
---|
23 | morekeywords={[2]whd,normalize,elim,cases,destruct}, |
---|
24 | mathescape=true, |
---|
25 | morecomment=[n]{(*}{*)}, |
---|
26 | } |
---|
27 | |
---|
28 | \lstset{language=matita,basicstyle=\small\tt,columns=flexible,breaklines=false, |
---|
29 | keywordstyle=\color{red}\bfseries, |
---|
30 | keywordstyle=[2]\color{blue}, |
---|
31 | commentstyle=\color{green}, |
---|
32 | stringstyle=\color{blue}, |
---|
33 | showspaces=false,showstringspaces=false} |
---|
34 | |
---|
35 | \lstset{extendedchars=false} |
---|
36 | \lstset{inputencoding=utf8x} |
---|
37 | \DeclareUnicodeCharacter{8797}{:=} |
---|
38 | \DeclareUnicodeCharacter{10746}{++} |
---|
39 | \DeclareUnicodeCharacter{9001}{\ensuremath{\langle}} |
---|
40 | \DeclareUnicodeCharacter{9002}{\ensuremath{\rangle}} |
---|
41 | |
---|
42 | |
---|
43 | \title{ |
---|
44 | INFORMATION AND COMMUNICATION TECHNOLOGIES\\ |
---|
45 | (ICT)\\ |
---|
46 | PROGRAMME\\ |
---|
47 | \vspace*{1cm}Project FP7-ICT-2009-C-243881 \cerco{}} |
---|
48 | |
---|
49 | \date{ } |
---|
50 | \author{} |
---|
51 | |
---|
52 | \begin{document} |
---|
53 | \thispagestyle{empty} |
---|
54 | |
---|
55 | \vspace*{-1cm} |
---|
56 | \begin{center} |
---|
57 | \includegraphics[width=0.6\textwidth]{../../style/cerco_logo.png} |
---|
58 | \end{center} |
---|
59 | |
---|
60 | \begin{minipage}{\textwidth} |
---|
61 | \maketitle |
---|
62 | \end{minipage} |
---|
63 | |
---|
64 | |
---|
65 | \vspace*{0.5cm} |
---|
66 | \begin{center} |
---|
67 | \begin{LARGE} |
---|
68 | \bf |
---|
69 | Report n. D3.2\\ |
---|
70 | CIC encoding: Front-end\\ |
---|
71 | \end{LARGE} |
---|
72 | \end{center} |
---|
73 | |
---|
74 | \vspace*{2cm} |
---|
75 | \begin{center} |
---|
76 | \begin{large} |
---|
77 | Version 1.0 |
---|
78 | \end{large} |
---|
79 | \end{center} |
---|
80 | |
---|
81 | \vspace*{0.5cm} |
---|
82 | \begin{center} |
---|
83 | \begin{large} |
---|
84 | Main Authors:\\ |
---|
85 | Brian~Campbell |
---|
86 | \end{large} |
---|
87 | \end{center} |
---|
88 | |
---|
89 | \vspace*{\fill} |
---|
90 | \noindent |
---|
91 | Project Acronym: \cerco{}\\ |
---|
92 | Project full title: Certified Complexity\\ |
---|
93 | Proposal/Contract no.: FP7-ICT-2009-C-243881 \cerco{}\\ |
---|
94 | |
---|
95 | \clearpage \pagestyle{myheadings} \markright{\cerco{}, FP7-ICT-2009-C-243881} |
---|
96 | |
---|
97 | \newpage |
---|
98 | |
---|
99 | \vspace*{7cm} |
---|
100 | \paragraph{Abstract} |
---|
101 | We describe the translation of the front end of the \cerco{} compiler from the |
---|
102 | \ocaml{} prototype to the Calculus of Inductive Constructions (CIC) in the |
---|
103 | \matita{} proof assistant. This transforms programs in the C-like |
---|
104 | \textsf{Clight} language to the \textsf{RTLabs} language, which is reasonably |
---|
105 | target-independent and in the form of a control flow graph. |
---|
106 | |
---|
107 | We also report on progress enriching these transformations with dependent |
---|
108 | types so as to establish key invariants in each intermediate language, which |
---|
109 | will assist in future work proving correctness properties of the compiler. |
---|
110 | |
---|
111 | \newpage |
---|
112 | |
---|
113 | \tableofcontents |
---|
114 | |
---|
115 | % TODO: clear up any -ize vs -ise |
---|
116 | % TODO: clear up any "front end" vs "front-end" |
---|
117 | % TODO: clear up any mentions of languages that aren't textsf'd. |
---|
118 | % TODO: fix unicode in listings |
---|
119 | % TODO: make it clear that our correctness is intended to go beyond compcert's |
---|
120 | |
---|
121 | \section{Introduction} |
---|
122 | |
---|
123 | The \cerco{} compiler has been prototyped in \ocaml{}~\cite{d2.1,d2.2}, but |
---|
124 | the certified compiler will be a program written in the Calculus of Inductive |
---|
125 | Constructions (CIC), as realised by the \matita{} proof assistant. This |
---|
126 | deliverable reports on the translation of the front-end of the compiler into |
---|
127 | CIC and the subsequent efforts to start exploiting dependent types to maintain |
---|
128 | invariants and rule out potential sources of failure in the compiler. |
---|
129 | |
---|
130 | The input language for the formalized compiler is the \textsf{Clight} |
---|
131 | language. This is a C-like language with side-effect free expressions that |
---|
132 | was adapted from the CompCert project~\cite{compcertfm06}\footnote{We will |
---|
133 | also use their CIL-based C parser to generate \textsf{Clight} abstract syntax |
---|
134 | trees, but will not formalize this code.} and provided with an executable |
---|
135 | semantics. See~\cite{d3.1} for more details on the syntax and semantics. |
---|
136 | |
---|
137 | \begin{figure} |
---|
138 | \begin{tabbing} |
---|
139 | \quad \= $\downarrow$ \quad \= \kill |
---|
140 | \textsf{C} (unformalized)\\ |
---|
141 | \> $\downarrow$ \> CIL parser (unformalized \ocaml)\\ |
---|
142 | \textsf{Clight}\\ |
---|
143 | \> $\downarrow$ \> cast removal\\ |
---|
144 | \> $\downarrow$ \> add runtime functions\\ |
---|
145 | \> $\downarrow$ \> labelling\\ |
---|
146 | \> $\downarrow$ \> stack variable allocation and control structure |
---|
147 | simplification\\ |
---|
148 | \textsf{Cminor}\\ |
---|
149 | \> $\downarrow$ \> generate global variable initialisation code\\ |
---|
150 | \> $\downarrow$ \> transform to RTL graph\\ |
---|
151 | \textsf{RTLabs}\\ |
---|
152 | \> $\downarrow$ \> start of target specific backend\\ |
---|
153 | \ \ \dots |
---|
154 | \end{tabbing} |
---|
155 | \caption{Front-end languages and transformations} |
---|
156 | \label{fig:summary} |
---|
157 | \end{figure} |
---|
158 | |
---|
159 | The front-end of the compiler is summarised in Figure~\ref{fig:summary}. The |
---|
160 | two intermediate languages involved are |
---|
161 | \begin{description} |
---|
162 | \item[\textsf{Cminor}] --- a C-like language where local variables are no |
---|
163 | longer held explicitly in memory and control structures are simpler |
---|
164 | |
---|
165 | \item[\textsf{RTLabs}] --- a language in the form of a control flow graph but |
---|
166 | retaining the front end operations from \textsf{Cminor} |
---|
167 | \end{description} |
---|
168 | More details on the formalisation of the syntax and semantics of these |
---|
169 | languages can be found in the accompanying deliverable 3.4. |
---|
170 | Development of the formalized front-end was conducted in concert with the |
---|
171 | development of these intermediate languages to facilitate testing. |
---|
172 | |
---|
173 | \subsection{Revisions to the prototype compiler} |
---|
174 | |
---|
175 | We have been tracking revisions to the prototype compiler during the |
---|
176 | development of the formalized version. These were usually minor, with the |
---|
177 | exception of the following major revision to the compiler. |
---|
178 | |
---|
179 | The original plan for the front-end featured a \textsf{Clight} to |
---|
180 | \textsf{Clight8} phase near the start which replaced all of the integer |
---|
181 | values and operations by 8 bit counterparts, while pointers were split into |
---|
182 | bytes at a later stage. Experimentation found that it would be difficult to |
---|
183 | produce good code with this approach. Instead, we now have: |
---|
184 | \begin{itemize} |
---|
185 | \item full size integers, pointers and operations until code selection (the |
---|
186 | first part of the back-end after \textsf{RTLabs}), and |
---|
187 | \item a cast removal stage which simplifies \textsf{Clight} expressions such |
---|
188 | as |
---|
189 | \begin{lstlisting}[language=C] |
---|
190 | (char)((int)x + (int)y) |
---|
191 | \end{lstlisting} |
---|
192 | which are produced by integer promotion built into the CIL parser into |
---|
193 | equivalent operations on simpler types, \lstinline'x+y' in this case. |
---|
194 | \end{itemize} |
---|
195 | |
---|
196 | This document describes the formalized front end after these changes. |
---|
197 | |
---|
198 | \section{Clight phases} |
---|
199 | |
---|
200 | In addition to the conversion to \textsf{Cminor}, there are several |
---|
201 | transformations that act directly on the \textsf{Clight} language. |
---|
202 | |
---|
203 | \subsection{Cast simplification} |
---|
204 | |
---|
205 | We noted above that the arithmetic promotion required by C (and implemented in |
---|
206 | the CIL-based parser) adds numerous casts, causing arithmetic operations to be |
---|
207 | performed on 32 bit integers. If left alone, the resulting code will be |
---|
208 | larger and slower. This phase removes many of the casts so that the |
---|
209 | operations can be performed more efficiently. |
---|
210 | |
---|
211 | The prototype version worked by recognising fixed patterns in the |
---|
212 | \textsf{Clight} abstract syntax tree such as |
---|
213 | \[ (t)((t_1)e_1 op (t_2)e_2) \] |
---|
214 | subject to restrictions on the types, and replaces them with a simpler |
---|
215 | version. These deep pattern matches are slightly awkward in \matita{} and |
---|
216 | they do not capture compositions of operations, such as |
---|
217 | \begin{lstlisting}[language=C] |
---|
218 | (char)((int)a + (int)b + (int)c) |
---|
219 | \end{lstlisting} |
---|
220 | where \lstinline'a', \lstinline'b' and \lstinline'c' are of type |
---|
221 | \lstinline'char'. |
---|
222 | |
---|
223 | The formalized version uses a different method, recursively examining each |
---|
224 | expression constructor to see if the expression can be coerced to some |
---|
225 | `desired' type. For example, when processing the above expression it reaches |
---|
226 | each \lstinline'int' cast with a desired type of \lstinline'char', notes that |
---|
227 | the subexpression is of type \lstinline'char' and eliminates the cast. |
---|
228 | Moreover, when the recursive processing is complete the \lstinline'char' cast |
---|
229 | is also eliminated because its subexpression is now of the correct type. |
---|
230 | |
---|
231 | This has been formalized in \matita. We have also performed a few proofs that |
---|
232 | the arithmetic behind these changes is correct to gain confidence in the |
---|
233 | technique. During task 3.4 we will extend these proofs to cover more |
---|
234 | operations and show that the semantics of the expressions are equivalent, not |
---|
235 | just the underlying arithmetic. |
---|
236 | |
---|
237 | \subsection{Labelling} |
---|
238 | |
---|
239 | This phase adds cost labels to the \textsf{Clight} program. It is a fairly |
---|
240 | simple recursive definition, and was straightforward to port to \matita. The |
---|
241 | generation of cost labels was handled by our generic identifiers code, |
---|
242 | described in the accompanying deliverable 3.3 on intermediate languages. |
---|
243 | |
---|
244 | \subsection{Runtime functions} |
---|
245 | % TODO: this hasn't been implemented yet |
---|
246 | |
---|
247 | \subsection{Conversion to Cminor} |
---|
248 | |
---|
249 | The conversion to \textsf{Cminor} performs two essential tasks. First, it |
---|
250 | determines which local variables need to be stored in memory and generates |
---|
251 | explicit memory accesses for them. Second, it must translate the control |
---|
252 | structures (\lstinline'for', \lstinline'while', \dots) into \textsf{Cminor}'s |
---|
253 | more basic structures. |
---|
254 | |
---|
255 | These are both performed by code similar to that in the prototype, although |
---|
256 | the use of generic fold operations on statements and expressions has been |
---|
257 | replaced by simpler recursive definitions. |
---|
258 | |
---|
259 | There are two additional pieces of work that the formalized translation must |
---|
260 | do. The \textsf{Cminor} definition features some mild constraints of the |
---|
261 | types of expressions, which we can enforce in the translation using some type |
---|
262 | checking. The error monad is used to dispose of ill-typed \textsf{Clight} |
---|
263 | programs. |
---|
264 | |
---|
265 | The other difficulty is that we need to generate fresh temporary variables to |
---|
266 | store function results in before they are written to memory. This is |
---|
267 | necessary because \textsf{Clight} allows arbitrary \emph{lvalue} expressions |
---|
268 | as the destination for the returned value, but \textsf{Cminor} only allows |
---|
269 | local variables. However, the variable names are supplied with the input |
---|
270 | program, but without any method for generating fresh names. |
---|
271 | |
---|
272 | Our identifiers are based on binary numbers, and generation of fresh names is |
---|
273 | handled by keeping track of the highest allocated number. Thus we create a |
---|
274 | new name generator from the input program by finding the maximum number used. |
---|
275 | |
---|
276 | \section{Cminor phases} |
---|
277 | |
---|
278 | Only two phases deal with \textsf{Cminor} programs: |
---|
279 | |
---|
280 | \subsection{Initialisation code} |
---|
281 | |
---|
282 | This replaces the initialisation data with explicit code in the main function. |
---|
283 | The only remarkable point in the formalization is that we have two slightly |
---|
284 | different instantiations of the \textsf{Cminor} syntax: one with |
---|
285 | initialisation data that this pass takes as input, and one with only size |
---|
286 | information that is the output. In addition to reflecting the purpose of this |
---|
287 | pass in the types, it also ensures that it cannot be accidentally left out. |
---|
288 | |
---|
289 | \subsection{Conversion to RTLabs} |
---|
290 | |
---|
291 | This pass breaks down the structure of the \textsf{Cminor} program into a |
---|
292 | control flow graph, but maintains the same set of operations. The algorithm |
---|
293 | is stateful in the sense that it builds up the \textsf{RTLabs} function body |
---|
294 | incrementally, but all of the relevant state is already present in the |
---|
295 | function record (including the fresh register and graph label name generators) |
---|
296 | and the prototype passes this around. Thus the formalized code is very |
---|
297 | similar in nature. |
---|
298 | |
---|
299 | One possible variation would be to explicitly define a state monad to carry |
---|
300 | the function under construction around, but it is not yet clear if this will |
---|
301 | make the correctness easier to prove. |
---|
302 | |
---|
303 | \section{Adding and using invariants} |
---|
304 | |
---|
305 | The compiler phases described above all use the error monad to deal with |
---|
306 | inconsistencies in the program being transformed. In particular, lookups in |
---|
307 | environments may fail, control flow graphs may have missing statements and |
---|
308 | various structural problems may be present. We would like to show that these |
---|
309 | failures are absent where possible by establishing that programs are well |
---|
310 | formed early in the compilation process. |
---|
311 | |
---|
312 | This work overlaps with deliverable 3.3 (where more details of the additions |
---|
313 | to the syntax and semantics of the intermediate languages can be found) and |
---|
314 | task 3.4 on the correctness of the compiler. Thus this work is experimental |
---|
315 | in nature, and will evolve during task 3.4. |
---|
316 | |
---|
317 | Invariants on \textsf{Cminor} programs are defined using a higher order |
---|
318 | predicate which applies a given predicate to a statement and recursively to |
---|
319 | each of its substatements (and expressions and subexpressions, respectively). |
---|
320 | |
---|
321 | Thus during the \textsf{Clight} to \textsf{Cminor} stage the values returned |
---|
322 | are not just \textsf{Cminor} expressions and statements, but dependent pairs |
---|
323 | that also return invariants to establish that only local variables appear in |
---|
324 | the generated terms, and all labels appearing in \texttt{goto} statements are |
---|
325 | defined in the result. The latter is proved by showing two properties: first |
---|
326 | by checking whether the labels are in the set defined by the original |
---|
327 | \textsf{Clight} function body, and second that the translation preserves the |
---|
328 | set of label statements. |
---|
329 | |
---|
330 | These two properties are used at the end of translation to show the invariant |
---|
331 | attached to \textsf{Cminor} \emph{functions}: that all \texttt{goto} labels |
---|
332 | are defined in the body. Similarly, the invariant for variables is slightly |
---|
333 | different to the translation's: we go from the fact that every variable was |
---|
334 | classified as local to the appearance of every variable in the locals or |
---|
335 | parameters. This is done using a lemma showing that the classification only |
---|
336 | allows parameters and locals to be classed as `local'. |
---|
337 | |
---|
338 | The next translation to \textsf{RTLabs}\dots |
---|
339 | |
---|
340 | \section{Conclusion} |
---|
341 | |
---|
342 | \bibliographystyle{plain} |
---|
343 | \bibliography{report} |
---|
344 | |
---|
345 | \end{document} |
---|