Changeset 1261 for Deliverables
 Timestamp:
 Sep 23, 2011, 4:17:44 PM (10 years ago)
 File:

 1 edited
Legend:
 Unmodified
 Added
 Removed

Deliverables/D3.2/Report/report.tex
r1235 r1261 107 107 We also report on progress enriching these transformations with dependent 108 108 types so as to establish key invariants in each intermediate language, which 109 will assist in future work proving correctness properties of the compiler. 109 removes potential sources of failure within the compiler and 110 will assist in future work proving correctness properties. 110 111 111 112 \newpage … … 118 119 % TODO: fix unicode in listings 119 120 % TODO: make it clear that our correctness is intended to go beyond compcert's 121 % TODO: capitalise deliverable/task? 120 122 121 123 \section{Introduction} … … 136 138 137 139 \begin{figure} 140 \begin{center} 141 \begin{minipage}{.8\linewidth} 138 142 \begin{tabbing} 139 143 \quad \= $\downarrow$ \quad \= \kill … … 151 155 \textsf{RTLabs}\\ 152 156 \> $\downarrow$ \> start of target specific backend\\ 153 \ \ \dots157 \>\quad \vdots 154 158 \end{tabbing} 159 \end{minipage} 160 \end{center} 155 161 \caption{Frontend languages and transformations} 156 162 \label{fig:summary} … … 160 166 two intermediate languages involved are 161 167 \begin{description} 162 \item[\textsf{Cminor}]  a Clike language where local variables are no 163 longer held explicitly inmemory and control structures are simpler164 165 \item[\textsf{RTLabs}]  a language in the form of a control flow graph but166 retaining thefront end operations from \textsf{Cminor}168 \item[\textsf{Cminor}]  a Clike language where local variables are not 169 explicitly allocated memory and control structures are simpler 170 171 \item[\textsf{RTLabs}]  a language in the form of a control flow graph 172 which retains the values and front end operations from \textsf{Cminor} 167 173 \end{description} 168 174 More details on the formalisation of the syntax and semantics of these … … 174 180 175 181 We have been tracking revisions to the prototype compiler during the 176 development of the formalized version. These were usually minor, with the177 exception of the following major revision tothe compiler.182 development of the formalized version. Most of these changes were minor, but 183 one exception is a major change to the structure of the compiler. 178 184 179 185 The original plan for the frontend featured a \textsf{Clight} to 180 186 \textsf{Clight8} phase near the start which replaced all of the integer 181 187 values and operations by 8 bit counterparts, while pointers were split into 182 bytes at a later stage. Experi mentation foundthat it would be difficult to188 bytes at a later stage. Experience has shown that it would be difficult to 183 189 produce good code with this approach. Instead, we now have: 184 190 \begin{itemize} … … 187 193 \item a cast removal stage which simplifies \textsf{Clight} expressions such 188 194 as 189 \begin{lstlisting}[language=C ]190 (char)((int)x + (int)y)195 \begin{lstlisting}[language=C,belowskip=0pt] 196 (char)((int)x + (int)y) 191 197 \end{lstlisting} 192 which are produced by integer promotion built into the CIL parser into 193 equivalent operations on simpler types, \lstinline'x+y' in this case. 198 into equivalent operations on simpler types, \lstinline'x+y' in this case. 199 The cast removal is important because C requires \emph{arithmetic promotion} 200 of integer types to (at least) \lstinline'int' before an operation is 201 performed. The \textsf{Clight} semantics do not perform the promotions, 202 instead they are added as casts by the CILbased parser. However, our targets 203 benefit immensely from performing operations on the smallest possible integer 204 type, so it is important that we remove promotions where possible. 194 205 \end{itemize} 195 206 196 207 This document describes the formalized front end after these changes. 197 208 198 \section{ Clightphases}209 \section{\textsf{Clight} phases} 199 210 200 211 In addition to the conversion to \textsf{Cminor}, there are several 201 transformations thatact directly on the \textsf{Clight} language.212 transformations which act directly on the \textsf{Clight} language. 202 213 203 214 \subsection{Cast simplification} … … 211 222 The prototype version worked by recognising fixed patterns in the 212 223 \textsf{Clight} abstract syntax tree such as 213 \[ (t)((t_1)e_1 op (t_2)e_2)\]214 subject to restrictions on the types , and replaces themwith a simpler215 version . These deep pattern matches are slightly awkward in \matita{} and216 they donot capture compositions of operations, such as224 \[ (t)((t_1)e_1\ op\ (t_2)e_2), \] 225 subject to restrictions on the types. These are replaced with a simpler 226 version without the casts. Such `deep' pattern matching is slightly awkward in 227 \matita{} and this approach does not capture compositions of operations, such as 217 228 \begin{lstlisting}[language=C] 218 (char)((int)a + (int)b+ (int)c)229 (char)(((int)a + (int)b) + (int)c) 219 230 \end{lstlisting} 220 231 where \lstinline'a', \lstinline'b' and \lstinline'c' are of type 232 \lstinline'char', because the intermediate expression is not cast to and from 221 233 \lstinline'char'. 222 234 … … 227 239 the subexpression is of type \lstinline'char' and eliminates the cast. 228 240 Moreover, when the recursive processing is complete the \lstinline'char' cast 229 is also eliminated because its subexpression is nowof the correct type.230 231 This has been formalized in \matita. We have also performed a few proofs that241 is also eliminated because its subexpression is already of the correct type. 242 243 This has been implemented in \matita. We have also performed a few proofs that 232 244 the arithmetic behind these changes is correct to gain confidence in the 233 245 technique. During task 3.4 we will extend these proofs to cover more … … 267 279 necessary because \textsf{Clight} allows arbitrary \emph{lvalue} expressions 268 280 as the destination for the returned value, but \textsf{Cminor} only allows 269 local variables. However, the variable names are supplied with the input 270 program, but without any method for generating fresh names. 281 local variables. All other variable names in the \textsf{Cminor} program came 282 from the \textsf{Clight} program, but we need to construct a method for 283 generating fresh names for the temporaries. 271 284 272 285 Our identifiers are based on binary numbers, and generation of fresh names is 273 handled by keeping track of the highest allocated number. Thus we create a 274 new name generator from the input program by finding the maximum number used. 275 276 \section{Cminor phases} 277 278 Only two phases deal with \textsf{Cminor} programs: 286 handled by keeping track of the highest allocated number. Normally this is 287 initialised at zero, but if initialised by the largest existing identifier in 288 the \textsf{Clight} program then the generated names will be fresh. 289 To do this, we extract the maximum identifier by recursively finding the maximum 290 variable name used in every expression, statement and function of the program. 291 292 \section{\textsf{Cminor} phases} 293 294 \textsf{Cminor} programs are processed by two passes: one deals with the 295 initialisation of global variables, and the other produces \textsf{RTLabs} 296 code. 279 297 280 298 \subsection{Initialisation code} … … 285 303 initialisation data that this pass takes as input, and one with only size 286 304 information that is the output. In addition to reflecting the purpose of this 287 pass in the types, it also ensures that it cannot be accidentally left out.288 289 \subsection{Conversion to RTLabs}305 pass in its type, it also ensures that the pass cannot be accidentally omitted. 306 307 \subsection{Conversion to \textsf{RTLabs}} 290 308 291 309 This pass breaks down the structure of the \textsf{Cminor} program into a … … 299 317 One possible variation would be to explicitly define a state monad to carry 300 318 the function under construction around, but it is not yet clear if this will 301 make the correctness easier to prove.319 make the correctness results easier to prove. 302 320 303 321 \section{Adding and using invariants} … … 315 333 in nature, and will evolve during task 3.4. 316 334 317 Invariants on \textsf{Cminor} programs are defined using a higher order 318 predicate which applies a given predicate to a statement and recursively to 319 each of its substatements (and expressions and subexpressions, respectively). 320 321 Thus during the \textsf{Clight} to \textsf{Cminor} stage the values returned 322 are not just \textsf{Cminor} expressions and statements, but dependent pairs 323 that also return invariants to establish that only local variables appear in 324 the generated terms, and all labels appearing in \texttt{goto} statements are 325 defined in the result. The latter is proved by showing two properties: first 326 by checking whether the labels are in the set defined by the original 327 \textsf{Clight} function body, and second that the translation preserves the 328 set of label statements. 329 330 These two properties are used at the end of translation to show the invariant 331 attached to \textsf{Cminor} \emph{functions}: that all \texttt{goto} labels 332 are defined in the body. Similarly, the invariant for variables is slightly 333 different to the translation's: we go from the fact that every variable was 334 classified as local to the appearance of every variable in the locals or 335 parameters. This is done using a lemma showing that the classification only 336 allows parameters and locals to be classed as `local'. 337 338 The next translation to \textsf{RTLabs}\dots 335 The use of the invariants follows a common pattern. Each language embeds 336 invariants in the function record that constrain the function body by other 337 information in the record (such as the list of local variables and types, or 338 the set of labels declared). However, during the transformations they 339 typically need to be refined to constraints on individual statements and 340 expressions with respect to data structures used in the transformation. 341 A similar change in invariants is required between the transformation and the 342 new function. 343 344 For example, consider the use of local variables in the \textsf{Cminor} to 345 \textsf{RTLabs} stage. We start with 346 \begin{lstlisting}[language=matita] 347 record internal_function : Type[0] ≝ 348 { f_return : option typ 349 ; f_params : list (ident × typ) 350 ; f_vars : list (ident × typ) 351 ; f_stacksize : nat 352 ; f_body : stmt 353 ; f_inv : stmt_P (λs.stmt_vars (λi.Exists ? (λx.\fst x = i) (f_params @ f_vars)) s ∧ 354 stmt_labels (λl.Exists ? (λl'.l' = l) (labels_of f_body)) s) f_body 355 }. 356 \end{lstlisting} 357 where the first half of \lstinline'f_inv' requires every variable in the 358 function body to appear in the parameter or variable list. In the translation 359 to \textsf{RTLabs}, variable lookups are performed in a map to \textsf{RTLabs} 360 pseudoregisters: 361 \begin{lstlisting}[language=matita] 362 definition env ≝ identifier_map SymbolTag register. 363 364 let rec add_expr (le:label_env) (env:env) (ty:typ) (e:expr ty) 365 (Env:expr_vars ty e (present ?? env)) 366 (dst:register) (f:partial_fn le) on e 367 : Σf':partial_fn le. fn_graph_included le f f' ≝ 368 match e return λty,e.expr_vars ty e (present ?? env) → 369 Σf':partial_fn le. fn_graph_included le f f' with 370 [ Id _ i ⇒ λEnv. 371 let r ≝ lookup_reg env i Env in 372 ... 373 \end{lstlisting} 374 Note that \lstinline'lookup_reg' returns a register without any possibility of 375 error. The reason this works is that the pattern match on \lstinline'e' 376 refines the type of the invariant \lstinline'Env' to a proof that the variable 377 \lstinline'i' is present. We then pass this proof to the lookup function to 378 rule out failure. 379 380 When this map \lstinline'env' is constructed at the start of the phase, we 381 prove that the proof \lstinline'f_inv' from the function implies the invariant 382 on variables needed by \lstinline'add_expr' and its equivalent on statements: 383 \begin{lstlisting}[language=matita] 384 lemma populates_env : ∀l,e,u,l',e',u'. 385 populate_env e u l = 〈l',e',u'〉 → 386 ∀i. Exists ? (λx.\fst x = i) l → 387 present ?? e' i. 388 \end{lstlisting} 389 A similar mechanism is used to show that \texttt{goto} labels are always 390 declared in the function. 391 392 Also note the return type of \lstinline'add_expr' is a dependent pair. We 393 build the resulting \textsf{RTLabs} function incrementally, using a type 394 \lstinline'partial_fn' that does not contain the final invariant for 395 functions. We always require the \lstinline'fn_graph_included' property for 396 partially built functions to show that the graph only gets larger, a key part 397 of the proof that the resulting control flow graph is closed. Dependent pairs 398 are used in a similar manner in the \textsf{Clight} to \textsf{Cminor} phase 399 too. 400 401 This work does not currently cover all of the possible sources of failure; in 402 particular some structural constraints on functions are not yet covered and 403 some properties of \textsf{RTLabs} programs that may be useful for later 404 stages or the correctness proofs are not produced. Moreover, we may 405 experiment with variations to try to make the proof obligations and syntax 406 simpler to deal with. However, it does show that retrofitting these 407 properties using dependent types in \matita{} is feasible. 408 409 \section{Testing} 410 411 To provide some early testing and bug fixing of this code we constructed it in 412 concert with the executable semantics described in deliverable 3.3, and 413 \matita{} term pretty printers in the prototype compiler. Using these, we 414 were able to test the phases individually and together by running programs 415 within the proof assistant itself, and comparing the results with the expected 416 output. 339 417 340 418 \section{Conclusion}
Note: See TracChangeset
for help on using the changeset viewer.