Changeset 1968 for Deliverables


Timestamp:
May 18, 2012, 12:08:54 PM (8 years ago)
Author:
campbell
Message:

Update D4.3's title, memory model details, and some typographical
inconsistencies and glitches.

File:
1 edited

  • Deliverables/D4.2-4.3/reports/D4-3.tex

    r1456 r1968  
    6464\textbf{
    6565Report n. D4.3\\
    66 Formal semantics of intermediate languages
     66Executable Formal Semantics of back-end intermediate languages
    6767}
    6868\end{LARGE}
     
    7272\begin{center}
    7373\begin{large}
    74 Version 1.0
     74Version 1.1
    7575\end{large}
    7676\end{center}
     
    9999\vspace*{7cm}
    100100\paragraph{Abstract}
    101 We describe the encoding in the Calculus of Constructions of the semantics of the CerCo compiler's backend intermediate languages.
    102 The CerCo backend consists of five distinct languages: RTL, RTLntl, ERTL, LTL and LIN.
     101We describe the encoding in the Calculus of Constructions of the semantics of the CerCo compiler's back-end intermediate languages.
     102The CerCo back-end consists of five distinct languages: RTL, RTLntl, ERTL, LTL and LIN.
    103103We describe a process of heavy abstraction of the intermediate languages and their semantics.
    104104We hope that this process will ease the burden of Deliverable D4.4, the proof of correctness for the compiler.
     
    130130Deliverable D4.3 enjoys a close relationship with three other deliverables, namely deliverables D2.2, D4.2 and D4.4.
    131131
    132 Deliverable D2.2, the O'Caml implementation of a cost preserving compiler for a large subset of the C programming language, is the basis upon which we have implemented the current deliverable.
    133 In particular, the architecture of the compiler, its intermediate languages and their semantics, and the overall implementation of the Matita encodings has been taken from the O'Caml compiler.
    134 Any variations from the O'Caml design are due to bugs identified in the prototype compiler during the Matita implementation, our identification of code that can be abstracted and made generic, or our use of Matita's much stronger type system to enforce invariants through the use of dependent types.
     132Deliverable D2.2, the OCaml implementation of a cost preserving compiler for a large subset of the C programming language, is the basis upon which we have implemented the current deliverable.
     133In particular, the architecture of the compiler, its intermediate languages and their semantics, and the overall implementation of the Matita encodings has been taken from the OCaml compiler.
     134Any variations from the OCaml design are due to bugs identified in the prototype compiler during the Matita implementation, our identification of code that can be abstracted and made generic, or our use of Matita's much stronger type system to enforce invariants through the use of dependent types.
    135135
    136136Deliverable D4.2 can be seen as a `sister' deliverable to the deliverable reported on herein.
    137 In particular, where this deliverable reports on the encoding in the Calculus of Constructions of the backend semantics, D4.2 is the encoding in the Calculus of Constructions of the mutual translations of those languages.
     137In particular, where this deliverable reports on the encoding in the Calculus of Constructions of the back-end semantics, D4.2 is the encoding in the Calculus of Constructions of the mutual translations of those languages.
    138138As a result, a substantial amount of Matita code is shared between the two deliverables.
    139139
    140 Deliverable D4.4, the backend correctness proofs, is the immediate successor of this deliverable.
    141 
    142 %-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
    143 % SECTION.                                                                    %
    144 %-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
    145 \section{The backend intermediate languages' semantics in Matita}
     140Deliverable D4.4, the back-end correctness proofs, is the immediate successor of this deliverable.
     141
     142%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
     143% SECTION.                                                                    %
     144%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
     145\section{The back-end intermediate languages' semantics in Matita}
    146146\label{sect.backend.intermediate.languages.semantics.matita}
    147147
     
    152152\label{subsect.abstracting.related.languages}
    153153
    154 As mentioned in the report for Deliverable D4.2, a systematic process of abstraction, over the O'Caml code, has taken place in the Matita encoding.
     154As mentioned in the report for Deliverable D4.2, a systematic process of abstraction, over the OCaml code, has taken place in the Matita encoding.
    155155In particular, we have merged many of the syntaxes of the intermediate languages (i.e. RTL, ERTL, LTL and LIN) into a single `joint' syntax, which is parameterised by various types.
    156 Equivalent intermediate languages to those present in the O'Caml code can be recovered by specialising this joint structure.
     156Equivalent intermediate languages to those present in the OCaml code can be recovered by specialising this joint structure.
    157157
    158158As mentioned in the report for Deliverable D4.2, there are a number of advantages that this process of abstraction brings, from code reuse to allowing us to get a clearer view of the intermediate languages and their structure.
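To give a flavour of this style of parameterisation, the following is a small, purely illustrative OCaml sketch (it is not the Matita `joint' syntax itself; the type and constructor names are invented for exposition): a single instruction type is parameterised by the kinds of registers and call arguments a language uses, and individual languages are recovered by instantiating those parameters.
\begin{lstlisting}
(* Illustrative only: a 'joint' instruction type, parameterised by
   the register and call-argument types of a particular language. *)
type ('reg, 'args) joint_instruction =
  | Skip
  | Cost of string                  (* emit a cost label         *)
  | Move of 'reg * 'reg             (* register-to-register move *)
  | Call of string * 'args          (* call a named function     *)

(* An RTL-like language uses pseudo-registers and passes call
   arguments explicitly as a list of pseudo-registers ...        *)
type pseudo_reg = int
type rtl_like = (pseudo_reg, pseudo_reg list) joint_instruction

(* ... whereas an LTL/LIN-like language uses hardware registers
   and passes arguments implicitly via the calling convention.   *)
type hw_reg = R0 | R1 | R2 | R3
type ltl_like = (hw_reg, unit) joint_instruction
\end{lstlisting}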
     
    252252}.
    253253\end{lstlisting}
    254 Here, \texttt{resultT} and \texttt{resultT} typically are the (pseudo)registers that store the parameters and result of a function.
     254Here, \texttt{paramsT} and \texttt{resultT} typically are the (pseudo)registers that store the parameters and result of a function.
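As a purely illustrative analogue (the real \texttt{params0} is a Matita record whose fields are themselves types; the field names below echo the text, but the OCaml encoding is invented), one can think of such a record as packaging up the choices a particular language makes:
\begin{lstlisting}
(* Illustrative OCaml analogue of a 'params'-style record: the
   type parameters stand in for the record's type-valued fields. *)
type ('paramsT, 'resultT) params_like = {
  get_params : 'paramsT;   (* where a function's parameters live *)
  get_result : 'resultT;   (* where its result is stored         *)
}

(* Specialising to an RTL-like setting, where both parameters and
   the result are held in pseudo-registers.                       *)
type pseudo_reg = int
let rtl_params : (pseudo_reg list, pseudo_reg) params_like =
  { get_params = []; get_result = 0 }
\end{lstlisting}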
    255255
    256256We further extend \texttt{params0} with a type for local variables in internal function calls:
     
    421421\begin{lstlisting}
    422422definition eval_statement:
    423   ∀globals: list ident.∀p:sem_params2 globals.
     423  $\forall$globals: list ident.$\forall$p:sem_params2 globals.
    424424    genv globals p $\rightarrow$ state p $\rightarrow$ IO io_out io_in (trace $\times$ (state p)) :=
    425425...
     
    445445\item
    446446A data type, $M$.
    447 For instance, the \texttt{option} type in O'Caml or Matita.
     447For instance, the \texttt{option} type in OCaml or Matita.
    448448\item
    449449A way to `inject' or `lift' pure values into this data type (usually called \texttt{return}).
     
    468468These \emph{monad laws} should also be useful in reasoning about monadic computations in the proof of correctness of the compiler.
    469469\end{itemize}
    470 In the semantics of both front and backend intermediate languages, we make use of monads.
    471 This monadic infrastructure is shared between the frontend and backend languages.
     470In the semantics of both front and back-end intermediate languages, we make use of monads.
     471This monadic infrastructure is shared between the front-end and back-end languages.
    472472
    473473In particular, an `IO' monad, signalling the emission of a cost label, or the calling of an external function, is heavily used in the semantics of the intermediate languages.
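For readers unfamiliar with the pattern, the following OCaml sketch (illustrative only, with invented constructor names; it is not the Matita development's IO monad) shows the option monad's \texttt{return} and \texttt{bind}, together with a toy resumption-style `IO' type in the spirit described above:
\begin{lstlisting}
(* The option monad: a data type M, 'return' and 'bind'.         *)
let return (x : 'a) : 'a option = Some x

let bind (m : 'a option) (f : 'a -> 'b option) : 'b option =
  match m with
  | None   -> None
  | Some x -> f x

(* A toy resumption-style 'IO' type: a computation either yields
   a value, or suspends after emitting an output (e.g. a cost
   label or an external call) and waits to be resumed.            *)
type ('out, 'a) io =
  | Value    of 'a
  | Interact of 'out * (unit -> ('out, 'a) io)

let io_return (x : 'a) : ('out, 'a) io = Value x

let rec io_bind (m : ('out, 'a) io) (f : 'a -> ('out, 'b) io)
    : ('out, 'b) io =
  match m with
  | Value x         -> f x
  | Interact (o, k) -> Interact (o, fun () -> io_bind (k ()) f)
\end{lstlisting}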
     
    476476\begin{lstlisting}
    477477definition eval_statement:
    478   ∀globals: list ident.∀p:sem_params2 globals.
     478  $\forall$globals: list ident.$\forall$p:sem_params2 globals.
    479479    genv globals p $\rightarrow$ state p $\rightarrow$ IO io_out io_in (trace $\times$ (state p)) :=
    480480...
     
    484484\begin{lstlisting}
    485485definition eval_statement:
    486   ∀globals: list ident. ∀p:sem_params2 globals.
     486  $\forall$globals: list ident. $\forall$p:sem_params2 globals.
    487487    genv globals p $\rightarrow$ state p $\rightarrow$ IO io_out io_in (trace $\times$ (state p)) :=
    488488  $\lambda$globals, p, ge, st.
     
    527527\label{subsect.memory.models}
    528528
    529 Currently, the semantics of the front and backend intermediate languages are built around two distinct memory models.
    530 The frontend languages reuse the CompCert memory model, whereas the backend languages employ a bespoke model tailored to their needs.
    531 This split between the memory models is reflective of the fact that the front and backend language have different requirements from their memory models, to a certain extent.
    532 
    533 In particular, the CompCert memory model places quite heavy restrictions on where in memory one can read from.
    534 To read a value in this memory model, you must supply an address, complete with a number of `chunks' to read following that address.
    535 The read is only successful if you attempt to read at a genuine `value boundary', and read the appropriate number of memory chunks for that value.
    536 As a result, with the CompCert memory model you are unable to read the third byte of a 32-bit integer value directly from memory, for instance.
    537 This has some consequences for the CompCert compiler, namely an inability to write a \texttt{memcpy} routine.
    538 
    539 However, the CerCo memory model operates differently, as we need to move data `piecemeal' between stacks in the backend of the compiler.
    540 As a result, the bespoke memory model allows one to read data at any memory location, not just on value boundaries.
    541 This has the advantage that we can successfully write a \texttt{memcpy} routine with the CerCo compiler (remembering that \texttt{memcpy} is nothing more than `read a byte, copy a byte' repeated in a loop), an advantage over CompCert.
     529Currently, the semantics of the front and back-end intermediate languages are built around two distinct memory models.
     530The front-end languages reuse the CompCert 1.6 memory model, whereas the back-end languages employ a version tailored to their needs.
     531This split between the memory models reflects the fact that the front-end and back-end languages have different requirements from their memory models.
     532
     533In particular, the CompCert 1.6 memory model places quite heavy restrictions on where in memory one can read from.
     534To read a value in this memory model, you must supply an address, together with the size of the `chunk' to read at that address.
     535The read is only successful if you attempt to read at a genuine `value boundary', and read the appropriate amount of memory for that value.
     536As a result, with that memory model you are unable to read the third byte of a 32-bit integer value directly from memory, for instance.
     537This has some consequences for the compiler, namely an inability to write a \texttt{memcpy} routine.
     538
     539However, the CerCo memory model operates differently, as we need to move data `piecemeal' between stacks in the back-end of the compiler.
     540As a result, the back-end memory model allows one to read data at any memory location, not just on value boundaries.
     541This has the advantage that we can successfully give a semantics to a \texttt{memcpy} routine in the back-end of the CerCo compiler (remembering that \texttt{memcpy} is nothing more than `read a byte, copy a byte' repeated in a loop), an advantage over CompCert.  However, the front-end of CerCo cannot do the same, because its memory model and values are similar to those of CompCert 1.6.
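To make the contrast concrete, here is a deliberately simplified OCaml caricature (not the actual formalised memory models): in the back-end style any byte may be read, so \texttt{memcpy} is a straightforward loop, whereas a chunk-based read in the CompCert 1.6 style only succeeds on a whole, suitably placed value.
\begin{lstlisting}
(* Back-end style: any byte of a memory region may be read.      *)
let read_byte (mem : Bytes.t) (addr : int) : char option =
  if 0 <= addr && addr < Bytes.length mem
  then Some (Bytes.get mem addr)
  else None

(* memcpy is then just 'read a byte, copy a byte' in a loop.     *)
let memcpy (dst : Bytes.t) (src : Bytes.t) (n : int) : unit =
  for i = 0 to n - 1 do
    Bytes.set dst i (Bytes.get src i)
  done

(* CompCert-1.6-style caricature: a read names a whole chunk and
   must respect a value boundary (here, 4-byte alignment), so the
   third byte of a 32-bit integer cannot be read on its own.      *)
let read_int32 (mem : Bytes.t) (addr : int) : int32 option =
  if addr mod 4 = 0 && addr + 4 <= Bytes.length mem
  then Some (Bytes.get_int32_le mem addr)
  else None
\end{lstlisting}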
     542
     543More recent versions of CompCert's memory model have evolved in a similar direction, with a byte-by-byte representation of memory blocks.  However, there remains an important difference in the handling of pointer values in the rest of the formalisation.  In particular, in CompCert 1.10 only complete pointer values can be loaded in all of the languages in the compiler, whereas in CerCo we need to represent individual bytes of a pointer in the back-end to support our 8-bit target architecture.
    542544
    543545Right now, the two memory models are interfaced during the translation from RTLabs to RTL.
    544 It is an open question whether we will unify the two memory models, using only the backend, bespoke memory model throughout the compiler, as the CompCert memory model seems to work fine for the frontend, where such byte-by-byte copying is not needed.
    545 However, should we decide to port the frontend to the new memory model, it has been written in such an abstract way that doing so would be relatively straightforward.
     546It is an open question whether we will unify the two memory models, using only the back-end, bespoke memory model throughout the compiler, as the CompCert memory model seems to work fine for the front-end, where such byte-by-byte copying is not needed.
     547However, should we decide to port the front-end to the new memory model, it has been written in such an abstract way that doing so would be relatively straightforward.
    546548
    547549%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
     
    555557We leave this for further work, as there is no pressing need to implement this feature at the present time.
    556558
    557 There is also, as mentioned, an open problem as to whether the frontend languages should use the same memory model as the backend languages, as opposed to reusing the CompCert memory model.
    558 Should this decision be taken, this will likely be straightforward but potentially time consuming.
     559There is also, as mentioned, an open problem as to whether the front-end languages should use the same memory model as the back-end languages, as opposed to reusing the CompCert memory model.
     560Should this decision be taken, this will likely be straightforward but potentially time consuming\footnote{After the original version of this deliverable was written we ported the front-end languages' semantics to the back-end memory model.  This turned out not to be time consuming, and moreover used definitions linking front-end and back-end values that are required for the correctness proofs anyway.  However, the front-end still cannot give a semantics to \texttt{memcpy}, for the same reason as CompCert 1.10; the language currently has no representation for a single byte of a pointer.}.
    559561
    560562%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
     
    570572\label{subsect.listing.files}
    571573
    572 Semantics specific files (files relating to language translations ommitted).
    573 Syntax specific files are presented in Table~\ref{table.syntax}.
    574 Here, the O'Caml column denotes the O'Caml source file(s) in the prototype compiler's implementation that corresponds to the Matita script in question.
    575 The ratios are the linecounts of the Matita file divided by the line counts of the corresponding O'Caml file.
     574Syntax specific files are presented in Table~\ref{table.syntax} (files relating to language translations omitted).
     575Here, the OCaml column denotes the OCaml source file(s) in the prototype compiler's implementation that correspond to the Matita script in question.
     576The ratios are the line counts of the Matita file divided by the line counts of the corresponding OCaml file.
    576577These are computed with \texttt{wc -l}, a standard Unix tool.
    577578
    578 Individual file's ratios are an over approximation, due to the fact that it's hard to relate an individual O'Caml file to the abstracted Matita code that has been spread across multiple files.
    579 The ratio between total Matita code lines and total O'Caml code lines is more reflective of the compressed and abstracted state of the Matita translation.
     579Individual files' ratios are an over-approximation, because it is hard to relate an individual OCaml file to the abstracted Matita code, which has been spread across multiple files.
     580The ratio between total Matita code lines and total OCaml code lines better reflects the compressed and abstracted state of the Matita translation.
    580581
    581582Semantics specific files are presented in Table~\ref{table.semantics}.
     
    584585\begin{threeparttable}
    585586\begin{tabular}{llrlrl}
    586 Description & Matita & Lines & O'Caml & Lines & Ratio \\
    587 \hline
    588 Abstracted syntax for backend languages & \texttt{joint/Joint.ma} & 173 & N/A & N/A & N/A \\
     587Description & Matita & Lines & OCaml & Lines & Ratio \\
     588\hline
     589Abstracted syntax for back-end languages & \texttt{joint/Joint.ma} & 173 & N/A & N/A & N/A \\
    589590The syntax of RTLabs & \texttt{RTLabs/syntax.ma} & 73 & \texttt{RTLabs/RTLabs.mli} & 113 & 0.65 \\
    590591The syntax of RTL & \texttt{RTL/RTL.ma} & 49 & \texttt{RTL/RTL.mli} & 120 & 1.85\tnote{a} \\
     
    599600  \begin{tabular}{ll}
    600601  Total lines of Matita code for the above files:& 347 \\
    601   Total lines of O'Caml code for the above files:& 616 \\
     602  Total lines of OCaml code for the above files:& 616 \\
    602603  Ratio of total lines:& 0.56
    603604  \end{tabular}
     
    613614\begin{threeparttable}
    614615\begin{tabular}{llrlrl}
    615 Description & Matita & Lines & O'Caml & Lines & Ratio \\
     616Description & Matita & Lines & OCaml & Lines & Ratio \\
    616617\hline
    617618Semantics of the abstracted languages & \texttt{joint/semantics.ma} & 434 & N/A & N/A & N/A  \\
     
    629630 \begin{tabular}{ll}
    630631  Total lines of Matita code for the above files:& 1145 \\
    631   Total lines of O'Caml code for the above files:& 1978 \\
     632  Total lines of OCaml code for the above files:& 1978 \\
    632633  Ratio of total lines:& 0.58
    633634 \end{tabular}
     
    645646\label{subsect.listing.important.functions.axioms}
    646647
    647 We list some important functions and axioms in the backend semantics:
     648We list some important functions and axioms in the back-end semantics:
    648649
    649650\paragraph{From RTLabs/semantics.ma}