# Changeset 2372

Ignore:
Timestamp:
Sep 28, 2012, 5:28:43 PM (8 years ago)
Message:

3.5 rewritten up to XXXX

File:
1 edited

Unmodified
Added
Removed
• ## Papers/cpp-asm-2012/cpp-2012-asm.tex

 r2370 \subsection{Correctness for well-behaved' assembly programs} \label{subsect.total.correctness.for.well.behaved.assembly.programs} The traditional approach to verifying the correctness of an assembler is to restrict the semantics of assembly code so that it treats memory addresses as opaque (symbolic) structures that cannot be modified. Memory is then represented as a map from opaque addresses to the disjoint union of data and opaque addresses---addresses are kept opaque to prevent their possible semantics breaking' manipulation by assembly programs: Most assemblers can map a single pseudoinstruction into zero or more instructions, whose size (in bytes) is not independent of the expansion. Therefore, the assembly process always produces a map (that for us is just the policy) that associates to each assembly address \texttt{a} a code memory address~\texttt{policy(a)} where the instructions that correspond to the pseudoinstruction at \texttt{a} are located. And the map is not just a linear function, but depends on the local choices and global optimizations performed. During execution of an assembly code, addresses can be stored in memory locations or in the registers. Moreover, arithmetical operations can be applied to addresses, for example to compare them or to shift a function pointer to implement C switch statements. In order to show that the object code simulates the assembly code, we need to compute the processor status that corresponds to the assembly status. In particular, those \texttt{a} in memory that are used as data should be preserved as \texttt{a}, but those used as addresses should be changed into \texttt{policy(a)}. Moreover, every arithmetic operation should commute with \texttt{policy} in order for the semantics to be preserved. According to the previous observation, we can ask if it is possible at all for an assembler to preserve the semantics of an assembly program. The traditional approach to the verification of assemblers answers the question in the positive by restricting the semantics of assembly programs. In particular, the type of memory cells and registers is set to the disjoint union of data and symbolic addresses, an the semantics is always forced to consider all possible combinations of arguments (data vs data, data vs addresses, etc.), rejecting operations whose semantics cannot be preserved. \begin{displaymath} \mathtt{Mem} : \mathtt{Addr} \rightarrow \mathtt{Bytes} + \mathtt{Addr} \qquad \llbracket - \rrbracket : \mathtt{Instr} \rightarrow \mathtt{Mem} \rightarrow \mathtt{option\ Mem} \end{displaymath} Similarly, registers may contain either data or opaque addresses. The semantics of a pseudoinstruction, $\llbracket - \rrbracket$, is given as a possibly failing function from pseudoinstructions and memory spaces (pseudostatuses to be more precise) to new memory spaces. The semantic function proceeds by case analysis over the operands of a given instruction, failing if either operand is an opaque address and the operation is meaningless on it. \begin{gather*} \llbracket \mathtt{MUL\ @A1\ @A2} \rrbracket^\mathtt{M} = \begin{cases} \mathtt{Byte\ b1},\ \mathtt{Byte\ b2} & \rightarrow \mathtt{Some}(\mathtt{M}\ \text{with}\ \mathtt{b1} + \mathtt{b2}) \\ \mathtt{Byte\ b1},\ \mathtt{Byte\ b2} & \rightarrow \mathtt{Some}(\mathtt{M}\ \text{with accumulator :=}\ \mathtt{b1} + \mathtt{b2}) \\ -,\ \mathtt{Addr\ a} & \rightarrow \mathtt{None} \\ \mathtt{Addr\ a},\ - & \rightarrow \mathtt{None} \end{cases} \end{gather*} In this paper we take a different approach, tracing memory locations (and accumulators) that contain memory addresses. We prove that only those assembly programs that use addresses in `safe' ways have their semantics preserved by the assembly process---a sort of dynamic type system sitting atop memory. In principle this approach allows us to introduce some permitted \emph{benign} manipulations of addresses that the traditional approach cannot handle, therefore expanding the set of input programs that can be assembled correctly. This approach seems similar to one taken by Tuch \emph{et al}~\cite{tuch:types:2007} for reasoning about low-level C code. Our analogue of the semantic function above is merely a wrapper around the function that implements the semantics of machine code, paired with a function that keeps track of addresses. The semantics of pseudo- and machine code are then essentially shared. The only thing that changes at the assembly level is the presence of the new tracking function. However, with this approach we must detect (at run time) programs that manipulate addresses in well-behaved ways, according to some approximation of well-behavedness. This approach has two main limitations. The first one is that it does not assign any semantics to interesting programs that could intentionally mangle addresses for malign (e.g. viruses) or benign (e.g. operating systems) purposes. The second is that it does not allow to share very well the semantics of assembly pseudoinstructions and object code instructions: only the \texttt{Byte-Byte} branch above can share the semantics with the object code \texttt{MUL}. In this paper we have already taken a different approach from Section~\ref{subsect.assembly.code.semantics}, where we have assigned a semantics to every assembly program by not distinguishing at all between data and symbolic addresses. Memory cells and registers always hold bytes, and symbolic labels are mapped to absolute addresses before execution. The consequence is that we do not expect that all assembly programs will have their semantics respected by object code. We call \emph{well behaved} those that do. Other consequences is that we can now reason over the semantics of programs that are not well behaved, and that we can handle well behavedness as an open predicate, recognizing more and more good behaviours as required. Of course, compilers that target our assembler will need to produce well behaved programs, which is usually the case by construction. The definition of well behavedness we employ employs a map to keep tracks of the memory locations and registers that hold addresses during execution of an assembly program. Therefore, the map acts as a sort of dynamic type system sitting atop memory. This approach seems similar to one taken by Tuch \emph{et al}~\cite{tuch:types:2007} for reasoning about low-level C code. The semantics of an assembly program is then augmented with a function that at each execution step updates the maps, signalling an error when the program performs an ill behaved operation. The actual computation is not performed by this mechanism, being already part of the assembly semantics. \begin{displaymath} \mathtt{AddrMap} : \mathtt{Addr} \rightarrow \{Data,Addr\} \qquad \llbracket - \rrbracket : \mathtt{Instr} \rightarrow \mathtt{AddrMap} \rightarrow \mathtt{option\ AddrMap} \end{displaymath} \begin{gather*} \llbracket \mathtt{MUL\ @A1\ @A2} \rrbracket^\mathtt{M} = \begin{cases} \mathtt{Data},\ \mathtt{Data} & \rightarrow Some(M \text{with accumulator :=} \mathtt{Data})\\ -,\ \mathtt{Addr\ a} & \rightarrow \mathtt{None} \\ \mathtt{Addr\ a},\ - & \rightarrow \mathtt{None} \end{cases} \end{gather*} In order to prove preservation of the semantics, we need to associate an objecto code status to each assembly pseudostatus. This operation is driven by the current \texttt{AddrMap}: if at address \texttt{a} the assembly level memory holds \texttt{d}, then if \texttt{AddrMap(a) = Data} the object code memory will hold \texttt{d} (data is preserved), otherwise it will hold \texttt{policy(d)}. If all the operations accepted by the address update map are well behaved, this is sufficient to show preservation of the semantics for those computation steps that are well behaved, i.e. such that the map update does not fail. XXXXXXXXXXXXXXXXX We use an \texttt{internal\_pseudo\_address\_map} to trace addresses of code memory addresses in internal RAM: \begin{lstlisting}
Note: See TracChangeset for help on using the changeset viewer.