# Changeset 1968 for Deliverables/D4.2-4.3/reports/D4-3.tex

Timestamp:
May 18, 2012, 12:08:54 PM
Message:

Update D4.3's title, memory model details, and some typographical
inconsistencies and glitches.

File:
1 edited

## Deliverables/D4.2-4.3/reports/D4-3.tex

r1456

\textbf{Report n. D4.3\\
Executable Formal Semantics of back-end intermediate languages}
\end{LARGE}
\begin{center}
\begin{large}
Version 1.1
\end{large}
\end{center}
\vspace*{7cm}
\paragraph{Abstract}
We describe the encoding in the Calculus of Constructions of the semantics of the CerCo compiler's back-end intermediate languages.
The CerCo back-end consists of five distinct languages: RTL, RTLntl, ERTL, LTL and LIN.
We describe a process of heavy abstraction of the intermediate languages and their semantics.
We hope that this process will ease the burden of Deliverable D4.4, the proof of correctness for the compiler.

Deliverable D4.3 enjoys a close relationship with three other deliverables, namely deliverables D2.2, D4.2 and D4.4.
Deliverable D2.2, the OCaml implementation of a cost preserving compiler for a large subset of the C programming language, is the basis upon which we have implemented the current deliverable.
In particular, the architecture of the compiler, its intermediate languages and their semantics, and the overall implementation of the Matita encodings have been taken from the OCaml compiler.
Any variations from the OCaml design are due to bugs identified in the prototype compiler during the Matita implementation, our identification of code that can be abstracted and made generic, or our use of Matita's much stronger type system to enforce invariants through the use of dependent types.
Deliverable D4.2 can be seen as a `sister' deliverable to the one reported on herein.
In particular, where this deliverable reports on the encoding in the Calculus of Constructions of the back-end semantics, D4.2 is the encoding in the Calculus of Constructions of the mutual translations of those languages.
As a result, a substantial amount of Matita code is shared between the two deliverables.
Deliverable D4.4, the back-end correctness proofs, is the immediate successor of this deliverable.
%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
% SECTION.                                                                    %
%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
\section{The back-end intermediate languages' semantics in Matita}
\label{sect.backend.intermediate.languages.semantics.matita}

\label{subsect.abstracting.related.languages}
As mentioned in the report for Deliverable D4.2, a systematic process of abstraction, over the OCaml code, has taken place in the Matita encoding.
In particular, we have merged many of the syntaxes of the intermediate languages (i.e. RTL, ERTL, LTL and LIN) into a single `joint' syntax, which is parameterised by various types.
Equivalent intermediate languages to those present in the OCaml code can be recovered by specialising this joint structure.
As mentioned in the report for Deliverable D4.2, there are a number of advantages that this process of abstraction brings, from code reuse to allowing us to get a clearer view of the intermediate languages and their structure.

}.
\end{lstlisting}
Here, \texttt{paramsT} and \texttt{resultT} typically are the (pseudo)registers that store the parameters and result of a function.
We further extend \texttt{params0} with a type for local variables in internal function calls:
\begin{lstlisting}
definition eval_statement:
 $\forall$globals: list ident.$\forall$p:sem_params2 globals.
  genv globals p $\rightarrow$ state p $\rightarrow$ IO io_out io_in (trace $\times$ (state p)) := ...
\end{lstlisting}
\item A data type, $M$.
For instance, the \texttt{option} type in OCaml or Matita.
\item A way to `inject' or `lift' pure values into this data type (usually called \texttt{return}).
These \emph{monad laws} should also be useful in reasoning about monadic computations in the proof of correctness of the compiler.
\end{itemize}
In the semantics of both front-end and back-end intermediate languages, we make use of monads.
This monadic infrastructure is shared between the front-end and back-end languages.
In particular, an `IO' monad, signalling the emission of a cost label, or the calling of an external function, is heavily used in the semantics of the intermediate languages.
\begin{lstlisting}
definition eval_statement:
 $\forall$globals: list ident. $\forall$p:sem_params2 globals.
  genv globals p $\rightarrow$ state p $\rightarrow$ IO io_out io_in (trace $\times$ (state p)) :=
 $\lambda$globals, p, ge, st. ...
\end{lstlisting}

\label{subsect.memory.models}
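As a purely illustrative aside, the monadic interface itemised above (a data type $M$, a \texttt{return} injection, and a bind operation obeying the monad laws) can be sketched in OCaml. The sketch below uses the built-in \texttt{option} type for $M$, plus a resumption-style monad in the spirit of the `IO' monad just described; all names here are our own invention, not identifiers from the CerCo sources.

```ocaml
(* The 'option' monad: 'return' injects a pure value, 'bind' sequences
   possibly-failing computations.  These two functions satisfy the
   monad laws mentioned in the text. *)
let return (x : 'a) : 'a option = Some x

let bind (m : 'a option) (f : 'a -> 'b option) : 'b option =
  match m with
  | None -> None
  | Some x -> f x

(* A resumption-style IO monad, in the spirit of the `IO' monad of the
   back-end semantics: a computation either yields a value, or suspends
   after emitting an output (think: a cost label, or an external
   function call) and awaits an input before resuming.
   Illustrative only -- not the Matita definitions. *)
type ('o, 'i, 'a) io =
  | Value of 'a
  | Interact of 'o * ('i -> ('o, 'i, 'a) io)

let io_return (x : 'a) : ('o, 'i, 'a) io = Value x

let rec io_bind (m : ('o, 'i, 'a) io) (f : 'a -> ('o, 'i, 'b) io)
    : ('o, 'i, 'b) io =
  match m with
  | Value x -> f x
  | Interact (out, k) -> Interact (out, fun inp -> io_bind (k inp) f)
```

Sequencing with \texttt{io\_bind} simply pushes the continuation under any pending interactions, which is what lets a big-step evaluator such as \texttt{eval\_statement} emit observable events while still composing monadically.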
Currently, the semantics of the front-end and back-end intermediate languages are built around two distinct memory models.
The front-end languages reuse the CompCert 1.6 memory model, whereas the back-end languages employ a version tailored to their needs.
This split between the memory models reflects the fact that the front-end and back-end languages have different requirements from their memory models.
In particular, the CompCert 1.6 memory model places quite heavy restrictions on where in memory one can read from.
To read a value in this memory model, you must supply an address, complete with the size of `chunk' to read following that address.
The read is only successful if you attempt to read at a genuine `value boundary', and read the appropriate amount of memory for that value.
As a result, with that memory model you are unable to read the third byte of a 32-bit integer value directly from memory, for instance.
This has some consequences for the compiler, namely an inability to write a \texttt{memcpy} routine.
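To make the contrast concrete, here is a small OCaml sketch of the two styles of read; this is a toy model of our own devising, not the CompCert 1.6 or CerCo Matita definitions. A boundary-checked load fails when asked for the third byte of a 32-bit value, whereas a byte-granularity load, as in the back-end model, always succeeds on in-range addresses and suffices to express \texttt{memcpy}.

```ocaml
(* A toy memory: each cell is one byte; 'bounds' records at which
   addresses a stored value begins and how many bytes it occupies.
   Purely illustrative -- not the CompCert 1.6 or CerCo definitions. *)
type mem = { bytes : int array; bounds : (int * int) list }

(* Boundary-checked load, in the spirit of the CompCert 1.6 model:
   succeeds only if (addr, size) is exactly a recorded value boundary. *)
let load_chunk (m : mem) (addr : int) (size : int) : int array option =
  if List.mem (addr, size) m.bounds
  then Some (Array.sub m.bytes addr size)
  else None

(* Byte-granularity load, in the spirit of the back-end model:
   any in-range address can be read, boundary or not. *)
let load_byte (m : mem) (addr : int) : int option =
  if addr >= 0 && addr < Array.length m.bytes
  then Some m.bytes.(addr)
  else None

(* `read a byte, copy a byte' repeated in a loop: expressible only
   with byte-granularity reads. *)
let memcpy (m : mem) (src : int) (dst : int) (len : int) : unit =
  for i = 0 to len - 1 do
    (match load_byte m (src + i) with
     | Some b -> m.bytes.(dst + i) <- b
     | None -> failwith "memcpy: address out of range")
  done
```

With a 32-bit value stored at address 0, \texttt{load\_chunk m 2 1} is rejected while \texttt{load\_byte m 2} succeeds, which is exactly the distinction drawn in the text.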
However, the CerCo memory model operates differently, as we need to move data `piecemeal' between stacks in the back-end of the compiler.
As a result, the back-end memory model allows one to read data at any memory location, not just on value boundaries.
This has the advantage that we can successfully give a semantics to a \texttt{memcpy} routine in the back-end of the CerCo compiler (remembering that \texttt{memcpy} is nothing more than `read a byte, copy a byte' repeated in a loop), an advantage over CompCert.
However, the front-end of CerCo cannot, because its memory model and values are similar to those of CompCert 1.6.
More recent versions of CompCert's memory model have evolved in a similar direction, with a byte-by-byte representation of memory blocks.
However, there remains an important difference in the handling of pointer values in the rest of the formalisation.
In particular, in CompCert 1.10 only complete pointer values can be loaded in all of the languages in the compiler, whereas in CerCo we need to represent individual bytes of a pointer in the back-end to support our 8-bit target architecture.
Right now, the two memory models are interfaced during the translation from RTLabs to RTL.
It is an open question whether we will unify the two memory models, using only the bespoke back-end memory model throughout the compiler, as the CompCert memory model seems to work fine for the front-end, where such byte-by-byte copying is not needed.
However, should we decide to port the front-end to the new memory model, it has been written in such an abstract way that doing so would be relatively straightforward.

%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
We leave this for further work, due to there being no pressing need to implement this feature at the present time.
There is also, as mentioned, an open problem as to whether the front-end languages should use the same memory model as the back-end languages, as opposed to reusing the CompCert memory model.
Should this decision be taken, this will likely be straightforward but potentially time consuming\footnote{After the original version of this deliverable was written we ported the front-end languages' semantics to the back-end memory model.  This turned out not to be time consuming, and moreover used definitions linking front-end and back-end values that are required for the correctness proofs anyway.  However, the front-end still cannot give a semantics to \texttt{memcpy}, for the same reason as CompCert 1.10; the language currently has no representation for a single byte of a pointer.}.
%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%

\label{subsect.listing.files}
Syntax specific files are presented in Table~\ref{table.syntax} (files relating to language translations omitted).
Here, the OCaml column denotes the OCaml source file(s) in the prototype compiler's implementation that correspond to the Matita script in question.
The ratios are the line counts of the Matita file divided by the line counts of the corresponding OCaml file.
These are computed with \texttt{wc -l}, a standard Unix tool.
Individual files' ratios are an over-approximation, due to the fact that it is hard to relate an individual OCaml file to the abstracted Matita code that has been spread across multiple files.
The ratio between total Matita code lines and total OCaml code lines is more reflective of the compressed and abstracted state of the Matita translation.
Semantics specific files are presented in Table~\ref{table.semantics}.
\begin{threeparttable}
\begin{tabular}{llrlrl}
Description & Matita & Lines & OCaml & Lines & Ratio \\
\hline
Abstracted syntax for back-end languages & \texttt{joint/Joint.ma} & 173 & N/A & N/A & N/A \\
The syntax of RTLabs & \texttt{RTLabs/syntax.ma} & 73 & \texttt{RTLabs/RTLabs.mli} & 113 & 0.65 \\
The syntax of RTL & \texttt{RTL/RTL.ma} & 49 & \texttt{RTL/RTL.mli} & 120 & 1.85\tnote{a} \\
\end{tabular}
\begin{tabular}{ll}
Total lines of Matita code for the above files: & 347 \\
Total lines of OCaml code for the above files: & 616 \\
Ratio of total lines: & 0.56
\end{tabular}
\end{threeparttable}

\begin{threeparttable}
\begin{tabular}{llrlrl}
Description & Matita & Lines & OCaml & Lines & Ratio \\
\hline
Semantics of the abstracted languages & \texttt{joint/semantics.ma} & 434 & N/A & N/A & N/A \\
\end{tabular}
\begin{tabular}{ll}
Total lines of Matita code for the above files: & 1145 \\
Total lines of OCaml code for the above files: & 1978 \\
Ratio of total lines: & 0.58
\end{tabular}
\end{threeparttable}

\label{subsect.listing.important.functions.axioms}
We list some important functions and axioms in the back-end semantics:
\paragraph{From RTLabs/semantics.ma}