[917] | 1 | \documentclass{llncs} |
---|
| 2 | |
---|
[960] | 3 | \usepackage{amsfonts} |
---|
[952] | 4 | \usepackage{amsmath} |
---|
[960] | 5 | \usepackage{amssymb} |
---|
[917] | 6 | \usepackage[english]{babel} |
---|
[960] | 7 | \usepackage{color} |
---|
| 8 | \usepackage{fancybox} |
---|
| 9 | \usepackage{graphicx} |
---|
[917] | 10 | \usepackage[colorlinks]{hyperref} |
---|
[957] | 11 | \usepackage[utf8x]{inputenc} |
---|
| 12 | \usepackage{listings} |
---|
[960] | 13 | \usepackage{mdwlist} |
---|
| 14 | \usepackage{microtype} |
---|
| 15 | \usepackage{stmaryrd} |
---|
| 16 | \usepackage{url} |
---|
[917] | 17 | |
---|
[960] | 18 | \renewcommand{\verb}{\lstinline} |
---|
| 19 | \def\lstlanguagefiles{lst-grafite.tex} |
---|
| 20 | \lstset{language=Grafite} |
---|
| 21 | |
---|
[957] | 22 | \newlength{\mylength} |
---|
| 23 | \newenvironment{frametxt}% |
---|
| 24 | {\setlength{\fboxsep}{5pt} |
---|
| 25 | \setlength{\mylength}{\linewidth}% |
---|
| 26 | \addtolength{\mylength}{-2\fboxsep}% |
---|
| 27 | \addtolength{\mylength}{-2\fboxrule}% |
---|
| 28 | \Sbox |
---|
| 29 | \minipage{\mylength}% |
---|
| 30 | \setlength{\abovedisplayskip}{0pt}% |
---|
| 31 | \setlength{\belowdisplayskip}{0pt}% |
---|
| 32 | }% |
---|
| 33 | {\endminipage\endSbox |
---|
| 34 | \[\fbox{\TheSbox}\]} |
---|
| 35 | |
---|
[952] | 36 | \title{On the correctness of an assembler for the Intel MCS-51 microprocessor} |
---|
[927] | 37 | \author{Jaap Boender \and Dominic P. Mulligan \and Claudio Sacerdoti Coen} |
---|
| 38 | \institute{Dipartimento di Scienze dell'Informazione, Universit\'a di Bologna} |
---|
[918] | 39 | |
---|
[927] | 40 | \bibliographystyle{splncs03} |
---|
| 41 | |
---|
[917] | 42 | \begin{document} |
---|
| 43 | |
---|
| 44 | \maketitle |
---|
| 45 | |
---|
| 46 | \begin{abstract} |
---|
[952] | 47 | We consider the formalisation of an assembler for the Intel MCS-51 8-bit microprocessor in the Matita proof assistant. |
---|
| 48 | This formalisation forms a major component of the EU-funded CerCo project, concering the construction and formalisation of a concrete complexity preserving compiler for a large subset of the C programming language. |
---|
| 49 | ... |
---|
[917] | 50 | \end{abstract} |
---|
| 51 | |
---|
[918] | 52 | % ---------------------------------------------------------------------------- % |
---|
| 53 | % SECTION % |
---|
| 54 | % ---------------------------------------------------------------------------- % |
---|
| 55 | \section{Introduction} |
---|
| 56 | \label{sect.introduction} |
---|
| 57 | |
---|
[953] | 58 | We consider the formalisation of an assembler for the Intel MCS-51 8-bit microprocessor in the Matita proof assistant~\cite{asperti:user:2007}. |
---|
| 59 | This formalisation forms a major component of the EU-funded CerCo project~\cite{cerco:2011}, concering the construction and formalisation of a concrete complexity preserving compiler for a large subset of the C programming language. |
---|
[947] | 60 | |
---|
[953] | 61 | The MCS-51 dates from the early 1980s and commonly called the 8051/8052. |
---|
| 62 | Despite the microprocessor's age, derivatives are still widely manufactured by a number of semiconductor foundries. |
---|
| 63 | As a result the processor is widely used, especially in embedded systems development, where well-tested, cheap, predictable microprocessors find their niche. |
---|
[927] | 64 | |
---|
[952] | 65 | The MCS-51 has a relative paucity of features compared to its more modern brethren. |
---|
| 66 | In particular, the MCS-51 does not possess a cache or any instruction pipelining that would make predicting the concrete cost of executing a single instruction an involved process. |
---|
| 67 | Instead, each semiconductor foundry that produces an MCS-51 derivative is able to provide accurate timing information in clock cycles for each instruction in their derivative's instruction set. |
---|
| 68 | It is important to stress that this timing information, unlike in more sophisticated processors, is not an estimate, it is a definition. |
---|
[953] | 69 | For the MCS-51, if a manufacturer states that a particular opcode takes three clock cycles to execute, then that opcode \emph{always} takes three clock cycles to execute. |
---|
[927] | 70 | |
---|
[952] | 71 | This predicability of timing information is especially attractive to the CerCo consortium. |
---|
[953] | 72 | We are in the process of constructing a certified, concrete complexity compiler for a realistic processor, and not for building and formalising the worst case execution time (WCET) tools that would be necessary to achieve the same result with, for example, a modern ARM or PowerPC microprocessor. |
---|
| 73 | |
---|
[952] | 74 | However, the MCS-51's paucity of features is a double edged sword. |
---|
[953] | 75 | In particular, the MCS-51 features relatively miniscule memory spaces (including read-only code memory, stack and internal/external random access memory) by modern standards. |
---|
| 76 | As a result our compiler, to have any sort of hope of successfully compiling realistic C programs, ought to produce `tight' machine code. |
---|
[952] | 77 | This is not simple. |
---|
[927] | 78 | |
---|
[953] | 79 | We here focus on a single issue in the MCS-51's instruction set: unconditional jumps. |
---|
| 80 | The MCS-51 features three conditional jump instructions: \texttt{LJMP} and \texttt{SJMP}---`long jump' and `short jump' respectively---and an 11-bit oddity of the MCS-51, \texttt{AJMP}, that the prototype CerCo compiler~\cite{cerco-report-code:2011} ignores for simplicity's sake.\footnote{Ignoring \texttt{AJMP} and its analogue \texttt{ACALL} is not idiosyncratic. The Small Device C Compiler (SDCC)~\cite{sdcc:2011}, the leading open source C compiler for the MCS-51, also seemingly does not produce \texttt{AJMP} and \texttt{ACALL} instructions. Their utility in a modern context remains unclear.} |
---|
[952] | 81 | Each of these three instructions expects arguments in different sizes and behaves in different ways. |
---|
[953] | 82 | \texttt{SJMP} may only perform a `local jump' whereas \texttt{LJMP} may jump to any address in the MCS-51's memory space. |
---|
| 83 | Consequently, the size of each opcode is different, and to squeeze as much code as possible into the MCS-51's limited code memory, the smallest possible opcode should be selected. |
---|
[927] | 84 | |
---|
[953] | 85 | The prototype CerCo C compiler does not attempt to select the smallest jump opcode in this manner, as this was thought to unneccessarily complicate the compilation chain. |
---|
| 86 | Instead, the compiler targets an assembly language, complete with pseudoinstructions including bespoke \texttt{Jmp} and \texttt{Call} instructions. |
---|
| 87 | Labels, conditional jumps to labels, a program preamble containing global data and a \texttt{MOV} instruction for moving this global data into the MCS-51's one 16-bit register also feature. |
---|
[952] | 88 | This latter feature will ease any later consideration of separate compilation in the CerCo compiler. |
---|
[953] | 89 | An assembler is used to expand pseudoinstructions into MCS-51 machine code. |
---|
[927] | 90 | |
---|
[953] | 91 | However, this assembly process is not trivial, for numerous reasons. |
---|
| 92 | For example, our conditional jumps to labels behave differently from their machine code counterparts. |
---|
| 93 | At the machine code level, conditional jumps may only jump to a relative offset, expressed in a byte, of the current program counter, limiting their range. |
---|
| 94 | However, at the assembly level, conditional jumps may jump to a label that appears anywhere in the program, significantly liberalising the use of conditional jumps and further simplifying the design of the CerCo compiler. |
---|
[952] | 95 | |
---|
[953] | 96 | Further, trying to na\"ively relate assembly programs with their machine code counterparts simply does not work. |
---|
[952] | 97 | Machine code programs that fetch from code memory and programs that combine the program counter with constant shifts do not make sense at the assembly level. |
---|
| 98 | More generally, memory addresses can only be compared with other memory addresses. |
---|
| 99 | However, checking that memory addresses are only compared against each other at the assembly level is in fact undecidable. |
---|
| 100 | In short, the full preservation of the semantics of the two languages is impossible. |
---|
| 101 | |
---|
[953] | 102 | Yet more complications are added by the peculiarities of the CerCo project itself. |
---|
[952] | 103 | As mentioned, the CerCo consortium is in the business of constructing a verified compiler for the C programming language. |
---|
[953] | 104 | However, unlike CompCert~\cite{compcert:2011,leroy:formal:2009,leroy:formally:2009}---which currently represents the state of the art for `industrial grade' verified compilers---CerCo considers not just the \emph{intensional correctness} of the compiler, but also its \emph{extensional correctness}. |
---|
[952] | 105 | That is, CompCert focusses solely on the preservation of the \emph{meaning} of a program during the compilation process, guaranteeing that the program's meaning does not change as it is gradually transformed into assembly code. |
---|
| 106 | However in any realistic compiler (even the CompCert compiler!) there is no guarantee that the program's time properties are preserved during the compilation process; a compiler's `optimisations' could, in theory, even conspire to degrade the concrete complexity of certain classes of programs. |
---|
| 107 | CerCo aims to expand the current state of the art by producing a compiler where this temporal degradation is guaranteed not to happen. |
---|
| 108 | |
---|
| 109 | In order to achieve this CerCo imposes a cost model on programs, or more specifically, on simple blocks of instructions. |
---|
[954] | 110 | This cost model is induced by the compilation process itself, and its non-compositional nature allows us to assign different costs to identical blocks of instructions depending on how they are compiled. |
---|
| 111 | In short, we aim to obtain a very precise costing for a program by embracing the compilation process, not ignoring it. |
---|
| 112 | This, however, complicates the proof of correctness for the compiler proper: for every translation pass from intermediate language to intermediate language, we must prove that not only has the meaning of a program been preserved, but also its complexity characteristics. |
---|
[952] | 113 | This also applies for the translation from assembly language to machine code. |
---|
[954] | 114 | |
---|
[952] | 115 | How do we assign a cost to a pseudoinstruction? |
---|
| 116 | As mentioned, conditional jumps at the assembly level can jump to a label appearing anywhere in the program. |
---|
[954] | 117 | However, at the machine code level, conditional jumps are limited to jumping `locally', using a measly byte offset. |
---|
| 118 | To translate a jump to a label, a single conditional jump pseudoinstruction may be translated into a block of three real instructions, as follows (here, \texttt{JZ} is `jump if accumulator is zero'): |
---|
[952] | 119 | \begin{displaymath} |
---|
| 120 | \begin{array}{r@{\quad}l@{\;\;}l@{\qquad}c@{\qquad}l@{\;\;}l} |
---|
| 121 | & \mathtt{JZ} & label & & \mathtt{JZ} & \text{size of \texttt{SJMP} instruction} \\ |
---|
| 122 | & \ldots & & \text{translates to} & \mathtt{SJMP} & \text{size of \texttt{LJMP} instruction} \\ |
---|
| 123 | label: & \mathtt{MOV} & \mathtt{A}\;\;\mathtt{B} & \Longrightarrow & \mathtt{LJMP} & \text{address of \textit{label}} \\ |
---|
| 124 | & & & & \ldots & \\ |
---|
| 125 | & & & & \mathtt{MOV} & \mathtt{A}\;\;\mathtt{B} |
---|
| 126 | \end{array} |
---|
| 127 | \end{displaymath} |
---|
| 128 | In the translation, if \texttt{JZ} fails, we fall through to the \texttt{SJMP} which jumps over the \texttt{LJMP}. |
---|
[954] | 129 | Naturally, if \textit{label} is close enough, a conditional jump pseudoinstruction is mapped directly to a conditional jump machine instruction; the above translation only applies if \textit{label} is not sufficiently local. |
---|
[952] | 130 | This leaves the problem, addressed below, of calculating whether a label is indeed `close enough' for the simpler translation to be used. |
---|
| 131 | |
---|
| 132 | Crucially, the above translation demonstrates the difficulty in predicting how many clock cycles a pseudoinstruction will take to execute. |
---|
| 133 | A conditional jump may be mapped to a single machine instruction or a block of three. |
---|
| 134 | Perhaps more insidious, the number of cycles needed to execute the instructions in the two branches of a translated conditional jump may be different. |
---|
[954] | 135 | Depending on the particular MCS-51 derivative at hand, an \texttt{SJMP} could in theory take a different number of clock cycles to execute than an \texttt{LJMP}. |
---|
[952] | 136 | These issues must also be dealt with in order to prove that the translation pass preserves the concrete complexity of the code. |
---|
| 137 | |
---|
| 138 | The question remains: how do we decide whether to expand a jump into an \texttt{SJMP} or an \texttt{LJMP}? |
---|
[954] | 139 | To understand why this problem is not trivial, consider the following snippet of assembly code: |
---|
[952] | 140 | \begin{displaymath} |
---|
| 141 | \text{dpm: finish me} |
---|
| 142 | \end{displaymath} |
---|
[954] | 143 | |
---|
[952] | 144 | As our example shows, given an occurence $l$ of an \texttt{LJMP} instruction, it may be possible to shrink $l$ to an occurence of an \texttt{SJMP} providing we can shrink any \texttt{LJMP}s that exist between $l$ and its target location. |
---|
| 145 | However, shrinking these \texttt{LJMP}s may in turn depend on shrinking $l$ to an \texttt{SJMP}, as it is perfectly possible to jump backwards. |
---|
[954] | 146 | In short, unless we can somehow break this loop of circularity, and similar knotty configurations of jumps, we are stuck with a suboptimal solution to the expanding jumps problem. |
---|
[952] | 147 | |
---|
[954] | 148 | How we went about resolving this problem affected the shape of our proof of correctness for the whole assembler in a rather profound way. |
---|
| 149 | We first attempted to synthesize a solution bottom up: starting with no solution, we gradually refine a solution using the same functions that implement the jump expansion. |
---|
[952] | 150 | Using this technique, solutions can fail to exist, and the proof quickly descends into a diabolical quagmire. |
---|
| 151 | |
---|
[954] | 152 | Abandoning this attempt, we instead split the `policy'---the decision over how any particular jump should be expanded---from the implementation that actually expands assembly programs into machine code. |
---|
[952] | 153 | Assuming the existence of a correct policy, we proved the implementation of the assembler correct. |
---|
[954] | 154 | Further, we proved that the assembler fails to assemble an assembly program if and only if a correct policy does not exist. |
---|
[952] | 155 | Policies do not exist in only a limited number of circumstances: namely, if a pseudoinstruction attempts to jump to a label that does not exist, or the program is too large to fit in code memory. |
---|
[954] | 156 | The first case would constitute a serious compiler error, and hopefully certifying the rest of the compiler would rule this possibility out. |
---|
| 157 | The second case is unavoidable---certified compiler or not, trying to load a huge program into a small code memory will break \emph{something}. |
---|
[952] | 158 | |
---|
[954] | 159 | The rest of this paper is a detailed description of this proof. |
---|
| 160 | |
---|
[927] | 161 | % ---------------------------------------------------------------------------- % |
---|
| 162 | % SECTION % |
---|
| 163 | % ---------------------------------------------------------------------------- % |
---|
| 164 | \subsection{Overview of the paper} |
---|
| 165 | \label{subsect.overview.of.the.paper} |
---|
[952] | 166 | In Section~\ref{sect.matita} we provide a brief overview of the Matita proof assistant for the unfamiliar reader. |
---|
| 167 | In Section~\ref{sect.the.proof} we discuss the design and implementation of the proof proper. |
---|
| 168 | In Section~\ref{sect.conclusions} we conclude. |
---|
[927] | 169 | |
---|
| 170 | % ---------------------------------------------------------------------------- % |
---|
| 171 | % SECTION % |
---|
| 172 | % ---------------------------------------------------------------------------- % |
---|
[952] | 173 | \section{Matita} |
---|
| 174 | \label{sect.matita} |
---|
[927] | 175 | |
---|
[960] | 176 | Matita is a proof assistant based on the (Co)inductive Calculus of Constructions~\cite{asperti:user:2007}. |
---|
| 177 | For those familiar with Coq, Matita's syntax and mode of operation should be entirely familiar. |
---|
| 178 | We take time here to explain one of Matita's syntactic idiosyncracies, however. |
---|
| 179 | The use of `$\mathtt{\ldots}$' or `$\mathtt{?}$' in an argument position denotes a type to be inferred automatically by unification. |
---|
| 180 | The use of `$\mathtt{?}$' in the body of a definition, lemma or axiom denotes an incomplete term that is to be closed, by hand, using tactics. |
---|
[952] | 181 | |
---|
[927] | 182 | % ---------------------------------------------------------------------------- % |
---|
| 183 | % SECTION % |
---|
| 184 | % ---------------------------------------------------------------------------- % |
---|
[952] | 185 | \section{The proof} |
---|
| 186 | \label{sect.the.proof} |
---|
[927] | 187 | |
---|
[960] | 188 | \subsection{The assembler and semantics of machine code} |
---|
| 189 | \label{subsect.the.assembler.and.semantics.of.machine.code} |
---|
[952] | 190 | |
---|
[960] | 191 | The formalisation in Matita of the semantics of MCS-51 machine code is described in~\cite{mulligan:executable:2011}. |
---|
| 192 | We merely describe enough here to understand the rest of the proof. |
---|
| 193 | |
---|
| 194 | At heart, the MCS-51 emulator centres around a \texttt{Status} record, describing the current state of the microprocessor. |
---|
| 195 | This record contains fields corresponding to the microprocessor's program counter, special function registers, and so on. |
---|
| 196 | At the machine code level, code memory is implemented as a trie of bytes, addressed by the program counter. |
---|
| 197 | Machine code programs are loaded into \texttt{Status} using the \texttt{load\_code\_memory} function. |
---|
| 198 | |
---|
| 199 | We may execut a single step of a machine code program using the \texttt{execute\_1} function, which returns an updated \texttt{Status}: |
---|
[957] | 200 | \begin{lstlisting} |
---|
[960] | 201 | definition execute_1: Status $\rightarrow$ Status := $\ldots$ |
---|
[957] | 202 | \end{lstlisting} |
---|
[960] | 203 | The function \texttt{execute} allows one to execute an arbitrary, but fixed (due to Matita's normalisation requirement!) number of steps of a program. |
---|
[957] | 204 | |
---|
[960] | 205 | Naturally, assembly programs have analogues. |
---|
| 206 | The counterpart of the \texttt{Status} record is \texttt{PseudoStatus}. |
---|
| 207 | Instead of code memory being implemented as tries of bytes, code memory is here implemented as lists of pseudoinstructions, and program counters are merely indices into this list. |
---|
| 208 | Our analogue of \texttt{execute\_1} is \texttt{execute\_1\_pseudo\_instruction}: |
---|
[957] | 209 | \begin{lstlisting} |
---|
[960] | 210 | definition execute_1_pseudo_instruction: (Word $\rightarrow$ nat $\times$ nat) $\rightarrow$ |
---|
| 211 | PseudoStatus $\rightarrow$ PseudoStatus := $\ldots$ |
---|
[957] | 212 | \end{lstlisting} |
---|
[960] | 213 | Notice, here, that the emulation function for pseudoprograms takes an additional argument. |
---|
| 214 | This is a function that maps program counters (for the pseudoprogram) to pairs of natural numbers representing the number of clock ticks that the pseudoinstruction needs to execute, post expansion. |
---|
| 215 | We require \emph{pairs} of natural numbers because, in the case of expanding conditional jumps to labels, the expansion of the `true branch' and `false branch' may differ in the number of clock ticks needed for execution. |
---|
| 216 | This timing information is used inside \texttt{execute\_1\_pseudo\_instruction} to update the clock of the \texttt{PseudoStatus}. |
---|
| 217 | During the proof of correctness of the assembler we relate the clocks of \texttt{Status}es and \texttt{PseudoStatus}es. |
---|
[957] | 218 | |
---|
[960] | 219 | The assembler, mapping programs consisting of lists of pseudoinstructions to lists of bytes, operates in a mostly straightforward manner. |
---|
| 220 | To a degree of approximation, the assembler on an assembly program, consisting of $n$ pseudoinstructions $\mathtt{P_i}$ for $1 \leq i \leq n$, works as in the following diagram: |
---|
| 221 | \begin{displaymath} |
---|
| 222 | [\mathtt{P_1}, \ldots \mathtt{P_n}] \xrightarrow{\left(\mathtt{P_i} \xrightarrow{\mbox{\fontsize{7}{9}\selectfont$\mathtt{expand}$}} \mathtt{[I_1^i, \ldots I^q_i]} \xrightarrow{\mbox{\fontsize{7}{9}\selectfont$\mathtt{assembly1}^*$}} \mathtt{[0110]}\right)^{*}} \mathtt{[010101]} |
---|
| 223 | \end{displaymath} |
---|
| 224 | |
---|
| 225 | Here $\mathtt{I^i_j}$ for $1 \leq j \leq q$ are the $q$ machine code instructions obtained by expanding, with \texttt{expand\_pseudo\_instruction}, a single pseudoinstruction. |
---|
| 226 | Each machine code instruction $\mathtt{I^i_j}$ is then assembled, using the \texttt{assembly1} function, into a list of bytes. |
---|
| 227 | This process is iterated for each pseudoinstruction, before the lists are flattened into a single bit list representation of the original assembly program. |
---|
| 228 | |
---|
| 229 | % ---------------------------------------------------------------------------- % |
---|
| 230 | % SECTION % |
---|
| 231 | % ---------------------------------------------------------------------------- % |
---|
| 232 | \subsection{Policies} |
---|
| 233 | \label{subsect.policies} |
---|
| 234 | |
---|
[957] | 235 | \begin{lstlisting} |
---|
| 236 | inductive jump_length: Type[0] ≝ |
---|
| 237 | | short_jump: jump_length |
---|
| 238 | | medium_jump: jump_length |
---|
| 239 | | long_jump: jump_length. |
---|
[960] | 240 | \end{lstlisting} |
---|
[957] | 241 | |
---|
[960] | 242 | \begin{lstlisting} |
---|
[957] | 243 | definition jump_expansion_policy ≝ BitVectorTrie jump_length 16. |
---|
| 244 | \end{lstlisting} |
---|
| 245 | |
---|
| 246 | \begin{lstlisting} |
---|
| 247 | definition policy_ok := λpolicy.λp. sigma_safe policy p <> None .... |
---|
| 248 | \end{lstlisting} |
---|
| 249 | |
---|
| 250 | \begin{lstlisting} |
---|
| 251 | definition sigma: |
---|
| 252 | ∀p:pseudo_assembly_program. |
---|
| 253 | ∀policy. policy_ok policy p. Word → Word |
---|
| 254 | \end{lstlisting} |
---|
| 255 | |
---|
| 256 | \begin{lstlisting} |
---|
| 257 | axiom low_internal_ram_of_pseudo_low_internal_ram: |
---|
| 258 | ∀M:internal_pseudo_address_map.∀ram:BitVectorTrie Byte 7.BitVectorTrie Byte 7. |
---|
| 259 | \end{lstlisting} |
---|
| 260 | |
---|
| 261 | CSC: no option using policy |
---|
| 262 | \begin{lstlisting} |
---|
| 263 | definition status_of_pseudo_status: |
---|
| 264 | internal_pseudo_address_map → PseudoStatus → option Status |
---|
| 265 | \end{lstlisting} |
---|
| 266 | |
---|
| 267 | \begin{lstlisting} |
---|
| 268 | definition next_internal_pseudo_address_map0: |
---|
| 269 | internal_pseudo_address_map → PseudoStatus → option internal_pseudo_address_map |
---|
| 270 | \end{lstlisting} |
---|
| 271 | |
---|
| 272 | CSC: no 2nd, 3rd options using policy |
---|
| 273 | \begin{lstlisting} |
---|
| 274 | ∀M,M',ps,s,s''. |
---|
| 275 | next_internal_pseudo_address_map M ps = Some ... M' → |
---|
| 276 | status_of_pseudo_status M ps = Some ... s → |
---|
| 277 | status_of_pseudo_status M' (execute_1_pseudo_instruction (ticks_of (code_memory ... ps)) ps) = Some ... s'' → |
---|
| 278 | ∃n. execute n s = s''. |
---|
| 279 | \end{lstlisting} |
---|
| 280 | |
---|
| 281 | CSC: no options using policy |
---|
| 282 | \begin{lstlisting} |
---|
| 283 | lemma fetch_assembly_pseudo2: |
---|
| 284 | ∀program,assembled,costs,labels. |
---|
| 285 | Some ... LANGLElabels,costsRANGLE = build_maps program → |
---|
| 286 | Some ... LANGLEassembled,costsRANGLE = assembly program → |
---|
| 287 | ∀ppc. |
---|
| 288 | let code_memory ≝ load_code_memory assembled in |
---|
| 289 | let preamble ≝ \fst program in |
---|
| 290 | let data_labels ≝ construct_datalabels preamble in |
---|
| 291 | let lookup_labels ≝ λx. sigma program (address_of_word_labels_code_mem (\snd program) x) in |
---|
| 292 | let lookup_datalabels ≝ λx. lookup ? ? x data_labels (zero ?) in |
---|
| 293 | let expansion ≝ jump_expansion ppc program in |
---|
| 294 | let LANGLEpi,newppcRANGLE ≝ fetch_pseudo_instruction (\snd program) ppc in |
---|
| 295 | ∃instructions. |
---|
| 296 | Some ? instructions = expand_pseudo_instruction lookup_labels lookup_datalabels (sigma program ppc) expansion pi ∧ |
---|
| 297 | fetch_many code_memory (sigma program newppc) (sigma program ppc) instructions. |
---|
| 298 | \end{lstlisting} |
---|
| 299 | |
---|
| 300 | \begin{lstlisting} |
---|
| 301 | definition expand_pseudo_instruction: |
---|
| 302 | ∀lookup_labels.∀lookup_datalabels.∀pc.∀policy_decision. (sigma program ppc) expansion. pseudo_instruciton -> list instruction. |
---|
| 303 | \end{lstlisting} |
---|
| 304 | |
---|
| 305 | \begin{lstlisting} |
---|
| 306 | axiom assembly_ok: |
---|
| 307 | ∀program,assembled,costs,labels. |
---|
| 308 | Some ... LANGLElabels,costsRANGLE = build_maps program → |
---|
| 309 | Some ... LANGLEassembled,costsRANGLE = assembly program → |
---|
| 310 | let code_memory ≝ load_code_memory assembled in |
---|
| 311 | let preamble ≝ \fst program in |
---|
| 312 | let datalabels ≝ construct_datalabels preamble in |
---|
| 313 | let lookup_labels ≝ λx. sigma program (address_of_word_labels_code_mem (\snd program) x) in |
---|
| 314 | let lookup_datalabels ≝ λx. lookup ?? x datalabels (zero ?) in |
---|
| 315 | ∀ppc,len,assembledi. |
---|
| 316 | let LANGLEpi,newppcRANGLE ≝ fetch_pseudo_instruction (\snd program) ppc in |
---|
| 317 | Some ... LANGLElen,assemblediRANGLE = assembly_1_pseudoinstruction program ppc (sigma program ppc) lookup_labels lookup_datalabels pi → |
---|
| 318 | encoding_check code_memory (sigma program ppc) (bitvector_of_nat ... (nat_of_bitvector ... (sigma program ppc) + len)) assembledi ∧ |
---|
| 319 | sigma program newppc = bitvector_of_nat ... (nat_of_bitvector ... (sigma program ppc) + len). |
---|
| 320 | \end{lstlisting} |
---|
| 321 | |
---|
| 322 | \begin{lstlisting} |
---|
| 323 | lemma fetch_assembly_pseudo: |
---|
| 324 | ∀program,ppc,lookup_labels,lookup_datalabels. |
---|
| 325 | ∀pi,code_memory,len,assembled,instructions,pc. |
---|
| 326 | let expansion ≝ jump_expansion ppc program in |
---|
| 327 | Some ? instructions = expand_pseudo_instruction lookup_labels lookup_datalabels (bitvector_of_nat ? pc) expansion pi → |
---|
| 328 | Some ... LANGLElen,assembledRANGLE = assembly_1_pseudoinstruction program ppc (bitvector_of_nat ? pc) lookup_labels lookup_datalabels pi → |
---|
| 329 | encoding_check code_memory (bitvector_of_nat ... pc) (bitvector_of_nat ... (pc + len)) assembled → |
---|
| 330 | fetch_many code_memory (bitvector_of_nat ... (pc + len)) (bitvector_of_nat ... pc) instructions. |
---|
| 331 | \end{lstlisting} |
---|
| 332 | |
---|
| 333 | \begin{lstlisting} |
---|
| 334 | theorem fetch_assembly: |
---|
| 335 | ∀pc,i,code_memory,assembled. |
---|
| 336 | assembled = assembly1 i → |
---|
| 337 | let len ≝ length ... assembled in |
---|
| 338 | encoding_check code_memory (bitvector_of_nat ... pc) (bitvector_of_nat ... (pc + len)) assembled → |
---|
| 339 | let fetched ≝ fetch code_memory (bitvector_of_nat ... pc) in |
---|
| 340 | let LANGLEinstr_pc, ticksRANGLE ≝ fetched in |
---|
| 341 | let LANGLEinstr,pc'RANGLE ≝ instr_pc in |
---|
| 342 | (eq_instruction instr i ∧ eqb ticks (ticks_of_instruction instr) ∧ eq_bv ... pc' (bitvector_of_nat ... (pc + len))) = true. |
---|
| 343 | \end{lstlisting} |
---|
| 344 | |
---|
[927] | 345 | % ---------------------------------------------------------------------------- % |
---|
| 346 | % SECTION % |
---|
| 347 | % ---------------------------------------------------------------------------- % |
---|
| 348 | \section{Conclusions} |
---|
| 349 | \label{sect.conclusions} |
---|
| 350 | |
---|
[960] | 351 | \begin{itemize} |
---|
| 352 | \item use of dependent types to throw away wrong programs that would made |
---|
| 353 | the statement for completeness complex. E.g. misuse of addressing modes, |
---|
| 354 | jumps to non existent labels, etc. |
---|
| 355 | \item try to go for small reflection; avoid inductive predicates that require |
---|
| 356 | tactics (inversion, etc.) that do not work well with dependent types; small |
---|
| 357 | reflection usually does |
---|
| 358 | \item use coercions to simulate Russell; mix together the proof styles |
---|
| 359 | a la Russell (for situations with heavy dependent types) and the standard |
---|
| 360 | one |
---|
| 361 | \end{itemize} |
---|
| 362 | |
---|
[955] | 363 | \subsection{Use of dependent types and Russell} |
---|
| 364 | \label{subsect.use.of.dependent.types.and.Russell} |
---|
[952] | 365 | |
---|
[955] | 366 | Our formalisation makes sparing use of dependent types. |
---|
| 367 | In certain datastructures, such as tries and vectors, they are used to guarantee invariants. |
---|
| 368 | However, we have currently shyed away from making extensive use of dependent types and inductive predicates in the proof of correctness for the assembler itself. |
---|
| 369 | This is because complex dependent types and inductive predicates tend not to co\"operate particularly well with tactics such as inversion. |
---|
[952] | 370 | |
---|
[956] | 371 | However, there are certain cases where the use of dependent types is unavoidable. |
---|
| 372 | For instance, when proving that the \texttt{build\_maps} function is correct, a function that collates the cost and data labels of an assembly program into map datastructures. |
---|
| 373 | In cases such as these we make use of Matita's implementation of Russell~\cite{}. |
---|
[955] | 374 | |
---|
[952] | 375 | \subsection{Related work} |
---|
| 376 | \label{subsect.related.work} |
---|
| 377 | |
---|
[927] | 378 | \bibliography{cpp-2011.bib} |
---|
| 379 | |
---|
[960] | 380 | \end{document}\renewcommand{\verb}{\lstinline} |
---|
| 381 | \def\lstlanguagefiles{lst-grafite.tex} |
---|
| 382 | \lstset{language=Grafite} |
---|