Index: /Papers/cppasm2012/cpp2012asm.tex
===================================================================
 /Papers/cppasm2012/cpp2012asm.tex (revision 2362)
+++ /Papers/cppasm2012/cpp2012asm.tex (revision 2363)
@@ 308,5 +308,5 @@
As a consequence, we allow the programmer to mangle, change and generally adjust addresses as they want, under the proviso that the translation process may not be able to preserve the semantics of programs that do this.
The only limitation introduced by this approach is that the size of
assembly programs is bounded by $2^16$.
+assembly programs is bounded by $2^{16}$.
This will be further discussed in Subsect.~\ref{subsect.total.correctness.for.well.behaved.assembly.programs}.
@@ 317,21 +317,28 @@
\subsection{The assembler}
\label{subsect.the.assembler}
+The assembler takes in input an assembly program to expand and a
+branch displacement policy for it.
+It returns both the assembled program (a list of bytes to be
+loaded in code memory for execution) and the costing of the source program.
Conceptually the assembler works in two passes.
The first pass expands every pseudoinstruction into a list of machine code instructions using the function \texttt{expand\_pseudo\_instruction}.
+The first pass expands every pseudoinstruction into a list of machine code instructions using the function \texttt{expand\_pseudo\_instruction}. The policy
+determines which expansion among the alternatives will be choosed for
+pseudojumps and pseudocalls. Once the expansion is performed, the cost
+of the pseudoinstruction is defined as the cost of the expansion.
The second pass encodes as a list of bytes the expanded instruction list by mapping the function \texttt{assembly1} across the list, and then flattening.
The program obtained as a list of bytes is ready to be loaded in code memory
for execution.
\begin{displaymath}
\hspace{0.5cm}
\mbox{\fontsize{7}{9}\selectfont$[\mathtt{P_1}, \ldots \mathtt{P_n}]$} \underset{\mbox{\fontsize{7}{9}\selectfont$\mathtt{assembly}$}}{\xrightarrow{\left(P_i \underset{\mbox{\fontsize{7}{9}\selectfont$\mathtt{assembly\_1\_pseudo\_instruction}$}}{\xrightarrow{\mathtt{P_i} \xrightarrow{\mbox{\fontsize{7}{9}\selectfont$\mathtt{expand\_pseudo\_instruction}$}} \mathtt{[I^1_i, \ldots I^q_i]} \xrightarrow{\mbox{\fontsize{7}{9}\selectfont$\mathtt{~~~~~~~~assembly1^{*}~~~~~~~~}$}} \mathtt{[0110]}}} \mathtt{[0110]}\right)^{*}}} \mbox{\fontsize{7}{9}\selectfont$\mathtt{[\ldots0110\ldots]}$}
\end{displaymath}
The most complex of the two passes is the first, which expands pseudoinstructions and must perform the task of branch displacement~\cite{hyde:branch:2006}.
The function \texttt{assembly\_1\_pseudo\_instruction} used in the body of the paper is essentially the composition of the two passes.

The branch displacement problem refers to the task of expanding pseudojumps into their concrete counterparts, preferably as efficiently as possible.
For instance, the MCS51 features three unconditional jump instructions: \texttt{LJMP} and \texttt{SJMP}`long jump' and `short jump' respectivelyand an 11bit oddity of the MCS51, \texttt{AJMP}.
Each of these three instructions expects arguments in different sizes and behaves in markedly different ways: \texttt{SJMP} may only perform a `local jump'; \texttt{LJMP} may jump to any address in the MCS51's memory space and \texttt{AJMP} may jump to any address in the current memory page.
Consequently, the size of each opcode is different, and to squeeze as much code as possible into the MCS51's limited code memory, the smallest possible opcode that will suffice should be selected.
+In order to understand the type for the policy, we briefly hint at the
+branch displacement problem for the MCS51. A detailed description is found
+in~\cite{boender:correctness:2012}.
+The MCS51 features three unconditional jump instructions: \texttt{LJMP} and \texttt{SJMP}`long jump' and `short jump' respectivelyand an 11bit oddity of the MCS51, \texttt{AJMP}.
+Each of these three instructions expects arguments in different sizes and behaves in markedly different ways: \texttt{SJMP} may only perform a `local jump' to an address closer then $2^{7}$ bytes; \texttt{LJMP} may jump to any address in the MCS51's memory space and \texttt{AJMP} may jump to any address in the current memory page. Memory pages partition the code memory into $2^8$ disjoint areas.
+The size of each opcode is different, with long jumps being larger than the
+other two. Because of the presence of \texttt{AJMP}, an optimal global solution
+may be locally unoptimal, employing a long jump where a shorter one could be
+used to force later jumps to stay inside single memory pages.
Similarly, a conditional pseudojump must be translated potentially into a configuration of machine code instructions, depending on the distance to the jump's target.
@@ 347,42 +354,52 @@
\end{array}
\end{displaymath}}}
Here, if \texttt{JZ} fails, we fall through to the \texttt{SJMP} which jumps over the \texttt{LJMP}.
Naturally, if \texttt{label} is `close enough', a conditional jump pseudoinstruction is mapped directly to a conditional jump machine instruction; the above translation only applies if \texttt{label} is not sufficiently local.
In order to implement branch displacement it is impossible to really make the \texttt{expand\_pseudo\_instruction} function completely independent of the encoding function.
This is due to branch displacement requiring the distance in bytes of the target of the jump.
Moreover the standard solutions for solving the branch displacement problem find their solutions iteratively, by either starting from a solution where all jumps are long, and shrinking them when possible, or starting from a state where all jumps are short and increasing their length as needed.
Proving the correctness of such algorithms is already quite involved and the correctness of the assembler as a whole does not depend on the `quality' of the solution found to a branch displacement problem.
For this reason, we try to isolate the computation of a branch displacement problem from the proof of correctness for the assembler by parameterising our \texttt{expand\_pseudo\_instruction} by a `policy'.

\begin{lstlisting}
definition expand_pseudo_instruction:
 $\forall$lookup_labels: Identifier $\rightarrow$ Word.
 $\forall$policy.
 $\forall$ppc: Word.
 $\forall$lookup_datalabels: Identifier $\rightarrow$ Word.
 $\forall$pi: pseudo_instruction.
 list instruction := ...
\end{lstlisting}
Here, the functions \texttt{lookup\_labels} and \texttt{lookup\_datalabels} are the functions that map labels and datalabels to program counters respectively, both of them used in the semantics of assembly.
The input \texttt{pi} is the pseudoinstruction to be expanded and is found at address \texttt{ppc} in the assembly program.
The function takes \texttt{policy} as an input.
In reality, this is a pair of functions, but for the purposes of this paper we simplify.
The \texttt{policy} maps pseudoprogram counters to program counters: the encoding of the expansion of the pseudoinstruction found at address \texttt{a} in the assembly code should be placed into code memory at address \texttt{policy(a)}.
Of course this is possible only if the policy is correct, which means that the encoding of consecutive assembly instructions must be consecutive in code memory.
\begin{displaymath}
\texttt{policy}(\texttt{ppc} + 1) = \texttt{pc} + \texttt{current\_instruction\_size}
\end{displaymath}
Here, \texttt{current\_instruction\_size} is the size in bytes of the encoding of the expanded pseudoinstruction found at \texttt{ppc}.
Note that the entanglement we hinted at is only partially solved in this way: the assembler code can ignore the implementation details of the algorithm that finds a policy;
however, the algorithm that finds a policy must know the exact behaviour of the assembly program because it needs to predict the way the assembly will expand and encode pseudoinstructions, once fed with a policy.
A companion submission to this one~\cite{boender:correctness:2012} certifies an algorithm that finds branch displacement policies for the assembler described in this paper.

The \texttt{expand\_pseudo\_instruction} function uses the \texttt{policy} map to determine the size of jump required when expanding pseudojumps, computing the jump size by examining the size of the differences between program counters.
For instance, if at address \texttt{ppc} in the assembly program we found \texttt{Jmp l} such that \texttt{lookup\_labels l = a}, if the offset \texttt{d = policy(a)  policy(ppc + 1)} is such that \texttt{d} $< 128$ then \texttt{Jmp l} is normally translated to the best local solution, the short jump \texttt{SJMP d}.
A global best solution to the branch displacement problem, however, is not always made of locally best solutions.
Therefore, in some circumstances, it is necessary to force the assembler to expand jumps into larger ones.
This is achieved by another booleanvalued function such that if the function applied to \texttt{ppc} returns true then a \texttt{Jmp l} at address \texttt{ppc} is always translated to a long jump.
An essentially identical mechanism exists for call instructions.
+%In order to implement branch displacement it is impossible to really make the \texttt{expand\_pseudo\_instruction} function completely independent of the encoding function.
+%This is due to branch displacement requiring the distance in bytes of the target of the jump.
+%Moreover the standard solutions for solving the branch displacement problem find their solutions iteratively, by either starting from a solution where all jumps are long, and shrinking them when possible, or starting from a state where all jumps are short and increasing their length as needed.
+%Proving the correctness of such algorithms is already quite involved and the correctness of the assembler as a whole does not depend on the `quality' of the solution found to a branch displacement problem.
+%For this reason, we try to isolate the computation of a branch displacement problem from the proof of correctness for the assembler by parameterising our \texttt{expand\_pseudo\_instruction} by a `policy'.
+
+The \texttt{expand\_pseudo\_instruction} function is driven by a policy
+in the choice of expansion of pseudoinstructions. The simplest idea
+is then to define policies as functions that maps jumps to their size.
+This simple idea, however, is impractical because short jumps require the
+offset of the target. For instance, suppose that at address \texttt{ppc} in
+the assembly program we found \texttt{Jmp l} such that $l$ is associated to the
+pseudoaddress \texttt{a} and the policy wants the \texttt{Jmp} to become a
+\texttt{SJMP $\delta$}. To compute $\delta$, we need to know what the addresses
+\texttt{ppc+1} and \texttt{a} will become in the assembled program to compute
+their difference.
+The address \texttt{a} will be associated to is a function of the expansion
+of all the pseudoinstructions between \texttt{ppc} and \texttt{a}, which is
+still to be performed when expanding the instruction at \texttt{ppc}.
+
+To solve the issue, we define the policy \texttt{policy} as a maps
+from a valid pseudoaddress to the corresponding address in the assembled
+program.
+Therefore, $\delta$ in the example above can be computed simply as
+\texttt{sigma(a)  sigma(ppc + 1)}. Moreover, the \texttt{expand\_pseudo\_instruction} emits a \texttt{SJMP} only after verifying for each \texttt{Jmp} that
+$\delta < 128$. When this is not the case, the function emits an
+\texttt{AJMP} if possible, or an \texttt{LJMP} otherwise, therefore always
+picking the locally best solution.
+In order to accomodate those optimal solutions that require local suboptimal
+choices, the policy may also return a boolean used to force
+the translation of a \texttt{Jmp} into a \texttt{LJMP} even if
+$\delta < 128$. An essentially identical mechanism exists for call
+instructions and conditional jumps.
+
+In order for the translation of a jump to be correct, the address associated to
+\texttt{a} by the policy and by the assembler must coincide. The latter is
+the sum of the size of all the expansions of the pseudoinstructions that
+preceed the one at address \texttt{a}: the assembler just concatenates all
+expansions one after another. To grant this property, we impose a
+correctness criterion over policies. A policy is correct when for all
+valid pseudoaddresses \texttt{ppc}
+$${\texttt{policy(ppc+1) = policy(ppc) + instruction\_size(ppc)}}$$
+Here \texttt{instruction\_size(ppc)} is the size in bytes of the expansion
+of the pseudoinstruction found at \texttt{pcc}, i.e. the length of
+$\texttt{assembly\_1\_pseudo\_instruction(ppc)}$.
+
%  %