Changeset 2372 for Papers


Ignore:
Timestamp:
Sep 28, 2012, 5:28:43 PM (7 years ago)
Author:
sacerdot
Message:

3.5 rewritten up to XXXX

File:
1 edited

Legend:

Unmodified
Added
Removed
  • Papers/cpp-asm-2012/cpp-2012-asm.tex

    r2370 r2372  
    520520\subsection{Correctness for `well-behaved' assembly programs}
    521521\label{subsect.total.correctness.for.well.behaved.assembly.programs}
    522 
    523 The traditional approach to verifying the correctness of an assembler is to restrict the semantics of assembly code so that it treats memory addresses as opaque (symbolic) structures that cannot be modified.
    524 Memory is then represented as a map from opaque addresses to the disjoint union of data and opaque addresses---addresses are kept opaque to prevent their possible `semantics breaking' manipulation by assembly programs:
     522Most assemblers can map a single pseudoinstruction into zero or more
     523instructions, whose size (in bytes) is not independent of the expansion.
     524Therefore, the assembly process always produces a map (that for us is just the
     525policy) that associates to each assembly address \texttt{a} a code memory
     526address~\texttt{policy(a)} where the instructions that correspond to the
     527pseudoinstruction at \texttt{a} are located. And the map is not just a
     528linear function, but depends on the local choices and global optimizations
     529performed.
     530
     531During execution of an assembly code, addresses can be stored in memory
     532locations or in the registers. Moreover, arithmetical operations can be
     533applied to addresses, for example to compare them or to shift a function pointer
     534to implement C switch statements. In order to show that the object code
     535simulates
     536the assembly code, we need to compute the processor status that corresponds
     537to the assembly status. In particular, those \texttt{a} in memory that
     538are used as data should be preserved as \texttt{a}, but those used as
     539addresses should be changed into \texttt{policy(a)}. Moreover, every
     540arithmetic operation should commute with \texttt{policy} in order for the
     541semantics to be preserved.
     542
     543According to the previous observation, we can ask if it is possible at all
     544for an assembler to preserve the semantics of an assembly program.
     545The traditional approach to the verification of assemblers answers the question
     546in the positive by restricting the semantics of assembly programs. In
     547particular, the type of memory cells and registers is set to the disjoint union
     548of data and symbolic addresses, an the semantics is always forced to consider
     549all possible combinations of arguments (data vs data, data vs addresses, etc.),
     550rejecting operations whose semantics cannot be preserved.
     551
    525552\begin{displaymath}
    526553\mathtt{Mem} : \mathtt{Addr} \rightarrow \mathtt{Bytes} + \mathtt{Addr} \qquad \llbracket - \rrbracket : \mathtt{Instr} \rightarrow \mathtt{Mem} \rightarrow \mathtt{option\ Mem}
    527554\end{displaymath}
    528 Similarly, registers may contain either data or opaque addresses.
    529 
    530 The semantics of a pseudoinstruction, $\llbracket - \rrbracket$, is given as a possibly failing function from pseudoinstructions and memory spaces (pseudostatuses to be more precise) to new memory spaces. The semantic function proceeds by case analysis over the operands of a given instruction, failing if either operand is an opaque address and the operation is meaningless on it.
    531555\begin{gather*}
    532556\llbracket \mathtt{MUL\ @A1\ @A2} \rrbracket^\mathtt{M} = \begin{cases}
    533                                                               \mathtt{Byte\ b1},\ \mathtt{Byte\ b2} & \rightarrow \mathtt{Some}(\mathtt{M}\ \text{with}\ \mathtt{b1} + \mathtt{b2}) \\
     557                                                              \mathtt{Byte\ b1},\ \mathtt{Byte\ b2} & \rightarrow \mathtt{Some}(\mathtt{M}\ \text{with accumulator :=}\ \mathtt{b1} + \mathtt{b2}) \\
    534558                                                              -,\ \mathtt{Addr\ a} & \rightarrow \mathtt{None} \\
    535559                                                              \mathtt{Addr\ a},\ - & \rightarrow \mathtt{None}
    536560                                                            \end{cases}
    537561\end{gather*}
    538 In this paper we take a different approach, tracing memory locations (and accumulators) that contain memory addresses.
    539 We prove that only those assembly programs that use addresses in `safe' ways have their semantics preserved by the assembly process---a sort of dynamic type system sitting atop memory.
    540 In principle this approach allows us to introduce some permitted \emph{benign} manipulations of addresses that the traditional approach cannot handle, therefore expanding the set of input programs that can be assembled correctly.
    541 This approach seems similar to one taken by Tuch \emph{et al}~\cite{tuch:types:2007} for reasoning about low-level C code.
    542 
    543 Our analogue of the semantic function above is merely a wrapper around the function that implements the semantics of machine code, paired with a function that keeps track of addresses.
    544 The semantics of pseudo- and machine code are then essentially shared.
    545 The only thing that changes at the assembly level is the presence of the new tracking function.
    546 
    547 However, with this approach we must detect (at run time) programs that manipulate addresses in well-behaved ways, according to some approximation of well-behavedness.
     562This approach has two main limitations. The first one is that it does not
     563assign any semantics to interesting programs that could intentionally mangle
     564addresses for malign (e.g. viruses) or benign (e.g. operating systems)
     565purposes. The second is that it does not allow to share very well the
     566semantics of assembly pseudoinstructions and object code instructions: only
     567the \texttt{Byte-Byte} branch above can share the semantics with the
     568object code \texttt{MUL}.
     569
     570In this paper we have already taken a different approach from
     571Section~\ref{subsect.assembly.code.semantics}, where we have assigned a
     572semantics to every assembly program by not distinguishing at all between
     573data and symbolic addresses. Memory cells and registers always hold bytes,
     574and symbolic labels are mapped to absolute addresses before execution.
     575The consequence is that we do not expect that all assembly programs will have
     576their semantics respected by object code. We call \emph{well behaved} those
     577that do.
     578Other consequences is that we can now reason over the semantics
     579of programs that are not well behaved, and that we can handle well behavedness
     580as an open predicate, recognizing more and more good behaviours as required.
     581Of course, compilers that target our assembler will need to produce well
     582behaved programs, which is usually the case by construction.
     583
     584The definition of well behavedness we employ employs a map to keep tracks
     585of the memory locations and registers that hold addresses during execution
     586of an assembly program. Therefore, the map acts as a sort of dynamic type
     587system sitting atop memory. This approach seems similar to one taken by Tuch
     588\emph{et al}~\cite{tuch:types:2007} for reasoning about low-level C code.
     589
     590The semantics of an assembly program is then augmented with a function
     591that at each execution step updates the maps, signalling an error when
     592the program performs an ill behaved operation. The actual computation is
     593not performed by this mechanism, being already part of the assembly
     594semantics.
     595\begin{displaymath}
     596\mathtt{AddrMap} : \mathtt{Addr} \rightarrow \{Data,Addr\} \qquad \llbracket - \rrbracket : \mathtt{Instr} \rightarrow \mathtt{AddrMap} \rightarrow \mathtt{option\ AddrMap}
     597\end{displaymath}
     598\begin{gather*}
     599\llbracket \mathtt{MUL\ @A1\ @A2} \rrbracket^\mathtt{M} = \begin{cases}
     600                                                              \mathtt{Data},\ \mathtt{Data} & \rightarrow Some(M \text{with accumulator :=} \mathtt{Data})\\
     601                                                              -,\ \mathtt{Addr\ a} & \rightarrow \mathtt{None} \\
     602                                                              \mathtt{Addr\ a},\ - & \rightarrow \mathtt{None}
     603                                                            \end{cases}
     604\end{gather*}
     605
     606In order to prove preservation of the semantics, we need to associate an
     607objecto code status to each assembly pseudostatus. This operation is driven
     608by the current \texttt{AddrMap}: if at address \texttt{a} the assembly
     609level memory holds \texttt{d}, then if \texttt{AddrMap(a) = Data} the
     610object code memory will hold \texttt{d} (data is preserved), otherwise
     611it will hold \texttt{policy(d)}. If all the operations accepted by the
     612address update map are well behaved, this is sufficient to show preservation
     613of the semantics for those computation steps that are well behaved, i.e.
     614such that the map update does not fail.
     615
     616XXXXXXXXXXXXXXXXX
     617
    548618We use an \texttt{internal\_pseudo\_address\_map} to trace addresses of code memory addresses in internal RAM:
    549619\begin{lstlisting}
Note: See TracChangeset for help on using the changeset viewer.