Changeset 1968 for Deliverables/D4.2-4.3

Timestamp: May 18, 2012, 12:08:54 PM
Files: 1 edited
Deliverables/D4.2-4.3/reports/D4-3.tex
--- r1456
+++ r1968

  \textbf{
  Report n. D4.3\\
- Formal semantics of intermediate languages
+ Executable Formal Semantics of back-end intermediate languages
  }
  \end{LARGE}
…
  \begin{center}
  \begin{large}
- Version 1.0
+ Version 1.1
  \end{large}
  \end{center}
…
  \vspace*{7cm}
  \paragraph{Abstract}
- We describe the encoding in the Calculus of Constructions of the semantics of the CerCo compiler's back end intermediate languages.
- The CerCo back end consists of five distinct languages: RTL, RTLntl, ERTL, LTL and LIN.
+ We describe the encoding in the Calculus of Constructions of the semantics of the CerCo compiler's back-end intermediate languages.
+ The CerCo back-end consists of five distinct languages: RTL, RTLntl, ERTL, LTL and LIN.
  We describe a process of heavy abstraction of the intermediate languages and their semantics.
  We hope that this process will ease the burden of Deliverable D4.4, the proof of correctness for the compiler.
…
  Deliverable D4.3 enjoys a close relationship with three other deliverables, namely deliverables D2.2, D4.2 and D4.4.

- Deliverable D2.2, the O'Caml implementation of a cost preserving compiler for a large subset of the C programming language, is the basis upon which we have implemented the current deliverable.
- In particular, the architecture of the compiler, its intermediate languages and their semantics, and the overall implementation of the Matita encodings has been taken from the O'Caml compiler.
- Any variations from the O'Caml design are due to bugs identified in the prototype compiler during the Matita implementation, our identification of code that can be abstracted and made generic, or our use of Matita's much stronger type system to enforce invariants through the use of dependent types.
+ Deliverable D2.2, the OCaml implementation of a cost preserving compiler for a large subset of the C programming language, is the basis upon which we have implemented the current deliverable.
+ In particular, the architecture of the compiler, its intermediate languages and their semantics, and the overall implementation of the Matita encodings has been taken from the OCaml compiler.
+ Any variations from the OCaml design are due to bugs identified in the prototype compiler during the Matita implementation, our identification of code that can be abstracted and made generic, or our use of Matita's much stronger type system to enforce invariants through the use of dependent types.

  Deliverable D4.2 can be seen as a `sister' deliverable to the deliverable reported on herein.
- In particular, where this deliverable reports on the encoding in the Calculus of Constructions of the back end semantics, D4.2 is the encoding in the Calculus of Constructions of the mutual translations of those languages.
+ In particular, where this deliverable reports on the encoding in the Calculus of Constructions of the back-end semantics, D4.2 is the encoding in the Calculus of Constructions of the mutual translations of those languages.
  As a result, a substantial amount of Matita code is shared between the two deliverables.

- Deliverable D4.4, the back end correctness proofs, is the immediate successor of this deliverable.
+ Deliverable D4.4, the back-end correctness proofs, is the immediate successor of this deliverable.

  %-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
  % SECTION.                                                                    %
  %-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
- \section{The back end intermediate languages' semantics in Matita}
+ \section{The back-end intermediate languages' semantics in Matita}
  \label{sect.backend.intermediate.languages.semantics.matita}
…
  \label{subsect.abstracting.related.languages}

- As mentioned in the report for Deliverable D4.2, a systematic process of abstraction, over the O'Caml code, has taken place in the Matita encoding.
+ As mentioned in the report for Deliverable D4.2, a systematic process of abstraction, over the OCaml code, has taken place in the Matita encoding.
  In particular, we have merged many of the syntaxes of the intermediate languages (i.e. RTL, ERTL, LTL and LIN) into a single `joint' syntax, which is parameterised by various types.
- Equivalent intermediate languages to those present in the O'Caml code can be recovered by specialising this joint structure.
+ Equivalent intermediate languages to those present in the OCaml code can be recovered by specialising this joint structure.

  As mentioned in the report for Deliverable D4.2, there are a number of advantages that this process of abstraction brings, from code reuse to allowing us to get a clearer view of the intermediate languages and their structure.
…
  }.
  \end{lstlisting}
- Here, \texttt{resultT} and \texttt{resultT} typically are the (pseudo)registers that store the parameters and result of a function.
+ Here, \texttt{paramsT} and \texttt{resultT} typically are the (pseudo)registers that store the parameters and result of a function.

  We further extend \texttt{params0} with a type for local variables in internal function calls:
…
  \begin{lstlisting}
  definition eval_statement:
-  ∀globals: list ident.∀p:sem_params2 globals.
+  $\forall$globals: list ident.$\forall$p:sem_params2 globals.
   genv globals p $\rightarrow$ state p $\rightarrow$ IO io_out io_in (trace $\times$ (state p)) :=
  ...
…
  \item
  A data type, $M$.
- For instance, the \texttt{option} type in O'Caml or Matita.
+ For instance, the \texttt{option} type in OCaml or Matita.
  \item
  A way to `inject' or `lift' pure values into this data type (usually called \texttt{return}).
…
  These \emph{monad laws} should also be useful in reasoning about monadic computations in the proof of correctness of the compiler.
  \end{itemize}
- In the semantics of both front and back end intermediate languages, we make use of monads.
- This monadic infrastructure is shared between the front end and backend languages.
+ In the semantics of both front and back-end intermediate languages, we make use of monads.
+ This monadic infrastructure is shared between the front-end and back-end languages.

  In particular, an `IO' monad, signalling the emission of a cost label, or the calling of an external function, is heavily used in the semantics of the intermediate languages.
…
  \begin{lstlisting}
  definition eval_statement:
-  ∀globals: list ident.∀p:sem_params2 globals.
+  $\forall$globals: list ident.$\forall$p:sem_params2 globals.
   genv globals p $\rightarrow$ state p $\rightarrow$ IO io_out io_in (trace $\times$ (state p)) :=
  ...
…
  \begin{lstlisting}
  definition eval_statement:
-  ∀globals: list ident. ∀p:sem_params2 globals.
+  $\forall$globals: list ident. $\forall$p:sem_params2 globals.
   genv globals p $\rightarrow$ state p $\rightarrow$ IO io_out io_in (trace $\times$ (state p)) :=
   $\lambda$globals, p, ge, st.
…
  \label{subsect.memory.models}

- Currently, the semantics of the front and backend intermediate languages are built around two distinct memory models.
- The frontend languages reuse the CompCert memory model, whereas the backend languages employ a bespoke model tailored to their needs.
- This split between the memory models is reflective of the fact that the front and backend language have different requirements from their memory models, to a certain extent.
-
- In particular, the CompCert memory model places quite heavy restrictions on where in memory one can read from.
- To read a value in this memory model, you must supply an address, complete with a number of `chunks' to read following that address.
- The read is only successful if you attempt to read at a genuine `value boundary', and read the appropriate number of memory chunks for that value.
- As a result, with the CompCert memory model you are unable to read the third byte of a 32-bit integer value directly from memory, for instance.
- This has some consequences for the CompCert compiler, namely an inability to write a \texttt{memcpy} routine.
-
- However, the CerCo memory model operates differently, as we need to move data `piecemeal' between stacks in the backend of the compiler.
- As a result, the bespoke memory model allows one to read data at any memory location, not just on value boundaries.
- This has the advantage that we can successfully write a \texttt{memcpy} routine with the CerCo compiler (remembering that \texttt{memcpy} is nothing more than `read a byte, copy a byte' repeated in a loop), an advantage over CompCert.
+ Currently, the semantics of the front and back-end intermediate languages are built around two distinct memory models.
+ The front-end languages reuse the CompCert 1.6 memory model, whereas the back-end languages employ a version tailored to their needs.
+ This split between the memory models reflects the fact that the front-end and back-end languages have different requirements from their memory models.
+
+ In particular, the CompCert 1.6 memory model places quite heavy restrictions on where in memory one can read from.
+ To read a value in this memory model, you must supply an address, complete with the size of `chunk' to read following that address.
+ The read is only successful if you attempt to read at a genuine `value boundary', and read the appropriate amount of memory for that value.
+ As a result, with that memory model you are unable to read the third byte of a 32-bit integer value directly from memory, for instance.
+ This has some consequences for the compiler, namely an inability to write a \texttt{memcpy} routine.
+
+ However, the CerCo memory model operates differently, as we need to move data `piecemeal' between stacks in the back-end of the compiler.
+ As a result, the back-end memory model allows one to read data at any memory location, not just on value boundaries.
+ This has the advantage that we can successfully give a semantics to a \texttt{memcpy} routine in the back-end of the CerCo compiler (remembering that \texttt{memcpy} is nothing more than `read a byte, copy a byte' repeated in a loop), an advantage over CompCert. However, the front-end of CerCo cannot, because its memory model and values are similar to those of CompCert 1.6.
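To make the byte-level reading described above concrete, here is a much-simplified OCaml sketch of such a memory model. All names (`memory`, `read_byte`, `memcpy`, and so on) are illustrative assumptions for this sketch, not CerCo's actual Matita definitions: memory is modelled as a map from addresses to bytes, so reads may start at any address rather than only at `value boundaries'.

```ocaml
(* A much-simplified sketch of a byte-addressable memory model, in the
   spirit of the CerCo back-end model; illustrative names only. *)
module Mem = Map.Make (Int)

type memory = int Mem.t  (* address -> byte (0..255) *)

(* Single-byte loads and stores; [None] signals a read from
   unallocated memory. *)
let read_byte (m : memory) (addr : int) : int option = Mem.find_opt addr m
let write_byte (m : memory) (addr : int) (b : int) : memory = Mem.add addr b m

(* memcpy is `read a byte, copy a byte' repeated in a loop; it is
   expressible precisely because single-byte reads are always allowed. *)
let rec memcpy (m : memory) (src : int) (dst : int) (n : int) : memory option =
  if n = 0 then Some m
  else
    match read_byte m src with
    | None -> None
    | Some b -> memcpy (write_byte m dst b) (src + 1) (dst + 1) (n - 1)
```

In this sketch, copying only bytes 1 and 2 of a 32-bit value stored at address 0 succeeds, whereas a chunk-based model in the style of CompCert 1.6 would reject such an unaligned, partial read.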

+ More recent versions of CompCert's memory model have evolved in a similar direction, with a byte-by-byte representation of memory blocks. However, there remains an important difference in the handling of pointer values in the rest of the formalisation. In particular, in CompCert 1.10 only complete pointer values can be loaded in all of the languages in the compiler, whereas in CerCo we need to represent individual bytes of a pointer in the back-end to support our 8-bit target architecture.

  Right now, the two memory models are interfaced during the translation from RTLabs to RTL.
- It is an open question whether we will unify the two memory models, using only the back end, bespoke memory model throughout the compiler, as the CompCert memory model seems to work fine for the frontend, where such byte-by-byte copying is not needed.
- However, should we decide to port the front end to the new memory model, it has been written in such an abstract way that doing so would be relatively straightforward.
+ It is an open question whether we will unify the two memory models, using only the back-end, bespoke memory model throughout the compiler, as the CompCert memory model seems to work fine for the front-end, where such byte-by-byte copying is not needed.
+ However, should we decide to port the front-end to the new memory model, it has been written in such an abstract way that doing so would be relatively straightforward.

  %-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
…
  We leave this for further work, due to there being no pressing need to implement this feature at the present time.
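The pointer-fragment issue mentioned above can also be illustrated with a small OCaml sketch. The datatypes here (`byte_value`, `BVptr`, and friends) are hypothetical, invented for this sketch rather than taken from the Matita formalisation: on an 8-bit target a 16-bit pointer only fits in two 8-bit registers, so each byte of the pointer must be a storable value in its own right.

```ocaml
(* A hypothetical sketch of byte-sized pointer fragments for an 8-bit
   target; illustrative datatypes, not CerCo's actual back-end values. *)
type part = Low | High

type byte_value =
  | BVbyte of int          (* an ordinary 8-bit value      *)
  | BVptr of int * part    (* one byte of a 16-bit pointer *)

(* Splitting a pointer yields two fragments that can be moved around
   independently, e.g. pushed byte-by-byte onto a stack. *)
let split_pointer (p : int) : byte_value * byte_value =
  (BVptr (p, Low), BVptr (p, High))

(* A whole pointer is recovered only from two matching fragments; a
   model that loads only complete pointer values (in the style of
   CompCert 1.10) has no analogue of the fragments themselves. *)
let reassemble (lo : byte_value) (hi : byte_value) : int option =
  match (lo, hi) with
  | BVptr (p, Low), BVptr (q, High) when p = q -> Some p
  | _ -> None
```

The point of the sketch is that `BVptr` values are first-class: they can sit in 8-bit registers or memory cells individually, which is exactly what the CerCo back-end needs and what a complete-pointers-only model cannot express.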

- There is also, as mentioned, an open problem as to whether the front end languages should use the same memory model as the backend languages, as opposed to reusing the CompCert memory model.
- Should this decision be taken, this will likely be straightforward but potentially time consuming.
+ There is also, as mentioned, an open problem as to whether the front-end languages should use the same memory model as the back-end languages, as opposed to reusing the CompCert memory model.
+ Should this decision be taken, this will likely be straightforward but potentially time consuming\footnote{After the original version of this deliverable was written we ported the front-end languages' semantics to the back-end memory model. This turned out not to be time consuming, and moreover used definitions linking front-end and back-end values that are required for the correctness proofs anyway. However, the front-end still cannot give a semantics to \texttt{memcpy}, for the same reason as CompCert 1.10; the language currently has no representation for a single byte of a pointer.}.
…
  \label{subsect.listing.files}

- Semantics specific files (files relating to language translations ommitted).
- Syntax specific files are presented in Table~\ref{table.syntax}.
- Here, the O'Caml column denotes the O'Caml source file(s) in the prototype compiler's implementation that corresponds to the Matita script in question.
- The ratios are the linecounts of the Matita file divided by the line counts of the corresponding O'Caml file.
+ Syntax specific files are presented in Table~\ref{table.syntax} (files relating to language translations omitted).
+ Here, the OCaml column denotes the OCaml source file(s) in the prototype compiler's implementation that corresponds to the Matita script in question.
+ The ratios are the linecounts of the Matita file divided by the line counts of the corresponding OCaml file.
  These are computed with \texttt{wc -l}, a standard Unix tool.

- Individual file's ratios are an over approximation, due to the fact that it's hard to relate an individual O'Caml file to the abstracted Matita code that has been spread across multiple files.
- The ratio between total Matita code lines and total O'Caml code lines is more reflective of the compressed and abstracted state of the Matita translation.
+ Individual file's ratios are an over approximation, due to the fact that it's hard to relate an individual OCaml file to the abstracted Matita code that has been spread across multiple files.
+ The ratio between total Matita code lines and total OCaml code lines is more reflective of the compressed and abstracted state of the Matita translation.

  Semantics specific files are presented in Table~\ref{table.semantics}.
…
  \begin{threeparttable}
  \begin{tabular}{llrlrl}
- Description & Matita & Lines & O'Caml & Lines & Ratio \\
+ Description & Matita & Lines & OCaml & Lines & Ratio \\
  \hline
- Abstracted syntax for back end languages & \texttt{joint/Joint.ma} & 173 & N/A & N/A & N/A \\
+ Abstracted syntax for back-end languages & \texttt{joint/Joint.ma} & 173 & N/A & N/A & N/A \\
  The syntax of RTLabs & \texttt{RTLabs/syntax.ma} & 73 & \texttt{RTLabs/RTLabs.mli} & 113 & 0.65 \\
  The syntax of RTL & \texttt{RTL/RTL.ma} & 49 & \texttt{RTL/RTL.mli} & 120 & 1.85\tnote{a} \\
…
  \begin{tabular}{ll}
  Total lines of Matita code for the above files:& 347 \\
- Total lines of O'Caml code for the above files:& 616 \\
+ Total lines of OCaml code for the above files:& 616 \\
  Ratio of total lines:& 0.56
  \end{tabular}
…
  \begin{threeparttable}
  \begin{tabular}{llrlrl}
- Description & Matita & Lines & O'Caml & Lines & Ratio \\
+ Description & Matita & Lines & OCaml & Lines & Ratio \\
  \hline
  Semantics of the abstracted languages & \texttt{joint/semantics.ma} & 434 & N/A & N/A & N/A \\
…
  \begin{tabular}{ll}
  Total lines of Matita code for the above files:& 1145 \\
- Total lines of O'Caml code for the above files:& 1978 \\
+ Total lines of OCaml code for the above files:& 1978 \\
  Ratio of total lines:& 0.58
  \end{tabular}
…
  \label{subsect.listing.important.functions.axioms}

- We list some important functions and axioms in the back end semantics:
+ We list some important functions and axioms in the back-end semantics:

  \paragraph{From RTLabs/semantics.ma}