Changeset 3477

Ignore:
Timestamp:
Sep 22, 2014, 3:50:30 PM (5 years ago)
Message:

changes to intro, in progress...

Location:
Papers/sttt
Files:
2 edited

Legend:

Unmodified
 r3476 \documentclass[twocolumn,draft]{svjour3} %\documentclass[twocolumn,draft]{svjour3} \documentclass[a4paper]{article} \usepackage{algpseudocode} %\usepackage{algorithmicx} \usepackage[british]{babel} \usepackage{caption} \usepackage{hyperref} \usepackage[colorlinks]{hyperref} \usepackage[utf8]{inputenc} \usepackage{listings} \usepackage{microtype} \usepackage{subcaption} \thanks{Research supported by the CerCo project, within the Future and Emerging Technologies (FET) programme of the Seventh Framework Programme for Research of the European Commission, under FET-Open grant number 243881}} \author{Jaap Boender \and Dominic P. Mulligan \and Claudio Sacerdoti Coen} \institute{Department of Computer Science, University of Middlesex \and Computer Laboratory, University of Cambridge \and Dipartimento di Informatica, University of Bologna} %\institute{Department of Computer Science, University of Middlesex \and Computer Laboratory, University of Cambridge \and Dipartimento di Informatica, University of Bologna} \maketitle \begin{abstract} Optimising assemblers face the branch displacement' or jump encoding' problem, i.e. how best to choose between concrete machine code jump instructions --- typically of differing instruction and offset sizes --- when expanding pseudo-instructions. Ideally, an optimising assembler would choose the set of jump expansions that minimises the size of the resulting machine code program, a task that is provably \textsc{np}-hard. As part of CerCo (Certified Complexity') --- an \textsc{eu}-funded project to develop a verified concrete complexity preserving compiler for a large subset of the C programming language --- we have implemented and proved correct an optimising assembler within the Matita interactive theorem prover. Our assembler targets the instruction set of a typical micro-controller, the Intel \textsc{mcs}-51 series. As is common in embedded systems development, this micro-controller has a paucity of available code memory and therefore we face an additional pressure in reducing the size of any assembled machine code program. Out of necessity, then, our assembler implements an algorithm for solving the branch displacement problem, and we must prove that this algorithm is correct. However, the efficient expansion of pseudoinstructions, namely jumps, into machine instructions is complex. We therefore isolate the decision making over how jumps should be expanded from the expansion process itself as much as possible using policies', making the proof of correctness for the assembler more straightforward. Optimising assemblers face the branch displacement' or jump encoding' problem, that is: how best to choose between concrete machine code jump instructions---typically of differing instruction and offset sizes---when expanding pseudo-instructions. Ideally, on space constrained hardware, an optimising assembler would choose the set of pseudo-instruction expansions that minimises the size of the resulting machine code program, a task that is provably \textsc{np}-hard. As part of CerCo (Certified Complexity')---an \textsc{eu}-funded project to develop a verified concrete complexity preserving compiler for a large subset of the C programming language---we have implemented and proved correct an optimising assembler within the Matita interactive theorem prover. Our assembler targets the instruction set of a typical micro-controller, the Intel \textsc{mcs-51} series. As is common with embedded systems development this micro-controller has a paucity of available code memory and therefore we face an additional pressure in reducing the size of any assembled machine code program. Out of necessity, then, our assembler implements an algorithm for solving the branch displacement problem, and we have proved that this algorithm is correct. However, this efficient expansion of jump pseudoinstructions into machine code equivalents is complex, and therefore could unneccessarily complicate the proof of correctness for the rest of the assembler. We therefore isolate the decision making over how jumps should be expanded from the expansion process itself as much as possible using policies', making the rest of the proof of correctness more straightforward. Our proof strategy contains a tracking facility for good addresses' and only programs that use good addresses have their semantics preserved under assembly, as we observe that it is impossible for an assembler to preserve the semantics of every assembly program. Our strategy offers increased flexibility over the traditional approach to proving the correctness of assemblers, wherein addresses in assembly are kept opaque and immutable. In particular, we may experiment with allowing the benign manipulation of addresses. In particular, our approach permits experimentation with the benign manipulation of addresses. We discuss wider issues associated with a proof of correctness for an assembler, detail our algorithm solving the branch displacement' problem, and discuss our proof of correctness, employing policies', for the assembler. \keywords{Formal verification, interactive theorem proving, assembler, branch displacement optimisation, CerCo (Certified Complexity'), MCS-51 microcontroller, Matita proof assistant} %\keywords{Formal verification, interactive theorem proving, assembler, branch displacement optimisation, CerCo (Certified Complexity'), MCS-51 microcontroller, Matita proof assistant} \end{abstract} \section{Introduction} The problem of branch displacement optimisation, also known as jump encoding, is a well-known problem in assembler design~\cite{Hyde2006}. Its origin lies in the fact that in many architecture sets, the encoding (and therefore size) of some instructions depends on the distance to their operand (the instruction 'span'). The branch displacement optimisation problem consists of encoding these span-dependent instructions in such a way that the resulting program is as small as possible. This problem is the subject of the present paper. After introducing the problem in more detail, we will discuss the solutions used by other compilers, present the algorithm we use in the CerCo assembler, and discuss its verification, that is the proofs of termination and correctness using the Matita proof assistant~\cite{Asperti2007}. Formulating the final statement of correctness and finding the loop invariants have been non-trivial tasks and are, indeed, the main contribution of this paper. It has required considerable care and fine-tuning to formulate not only the minimal statement required for the ulterior proof of correctness of the assembler, but also the minimal set of invariants needed for the proof of correctness of the algorithm. The research presented in this paper has been executed within the CerCo project which aims at formally verifying a C compiler with cost annotations. The target architecture for this project is the MCS-51, whose instruction set contains span-dependent instructions. Furthermore, its maximum addressable memory size is very small (64 Kb), which makes it important to generate programs that are as small as possible. With this optimisation, however, comes increased complexity and hence increased possibility for error. We must make sure that the branch instructions are encoded correctly, otherwise the assembled program will behave unpredictably. All Matita files related to this development can be found on the CerCo website, \url{http://cerco.cs.unibo.it}. The specific part that contains the branch displacement algorithm is in the {\tt ASM} subdirectory, in the files {\tt PolicyFront.ma}, {\tt PolicyStep.ma} and {\tt Policy.ma}. We consider the formalisation of an assembler for the Intel MCS-51 8-bit microprocessor in the Matita proof assistant~\cite{asperti:user:2007}. This formalisation forms a major component of the EU-funded CerCo (Certified Complexity') project~\cite{cerco:2011}, concerning the construction and formalisation of a concrete complexity preserving compiler for a large subset of the C programming language. The \textsc{mcs-51}, commonly called the 8051, is an 8-bit Harvard architecture \textsc{cisc} instruction set micro-controller dating from the early 1980s, and was originally manufactured by Intel. An extended variant, the \textsc{mcs-52} or 8052, was subsequently released adding extra \textsc{ram} and \textsc{rom} above and beyond that offered by the original \textsc{mcs-51}, and an extra timer. Derivatives are still widely manufactured by a number of semiconductor foundries, with the processor being used especially in embedded systems. The MCS-51 has a relative paucity of features compared to its more modern brethren, with a lack of any caching or pipelining features meaning that timing of execution is predictable, making the MCS-51 very attractive for CerCo's ends. However, the MCS-51's paucity of features---though an advantage in many respects---also quickly becomes a hindrance, as the MCS-51 features a relatively minuscule series of memory spaces by modern standards. As a result our C compiler, to be able to successfully compile realistic programs for embedded devices, ought to produce tight' machine code. To do this, we must solve the branch displacement' problem---deciding how best to expand pseudojumps to labels in assembly language to machine code jumps. This problem, also known as jump encoding', is a well-known problem amongst assembler implementors~\cite{Hyde2006} and arises when pseudojumps can be expanded in different ways to real machine instructions, but the different expansions are not equivalent (e.g. jumps that differ in the distance that they span', or instructions that differ in the number of clock cycles needed to completely execute them) and are not always correct (e.g. correctness is only up to global constraints over the compiled code). For instance, some jump instructions (short jumps) are very small and can be executed quickly, but they can only reach destinations within a certain distance from the current instruction. When the destinations are too far away, larger and slower long jumps must be used. The use of a long jump may augment the distance between another pseudojump and its target, forcing another long jump use, in a cascade. The job of the optimising compiler (assembler) is to individually expand every pseudo-instruction in such a way that all global constraints are satisfied and that the compiled program is minimal in size and faster in concrete time complexity. This problem is known to be computationally hard for most CISC architectures (\textsc{np}-hard, see~\cite{hyde:branch:2006}). To simplify the CerCo C compiler we have chosen to implement an optimising assembler whose input language the compiler will target. Labels, conditional jumps to labels, a program preamble containing global data and a \texttt{MOV} instruction for moving this global data into the MCS-51's one 16-bit register all feature in our assembly language. We further simplify by ignoring linking, assuming that all our assembly programs are pre-linked. The requirements of the CerCo project add yet more complications to our proof of correctness, as we must also address a cost model. CerCo imposes a cost model on C programs or, more specifically, on simple blocks of instructions. This cost model is induced by the compilation process itself, and its non-compositional nature allows us to assign different costs to identical C statements depending on how they are compiled. In short, we aim to obtain a very precise costing for a program by embracing the compilation process, not ignoring it. At the assembler level, this is reflected by our need to induce a cost model on the assembly code as a function of the assembly program and the strategy used to solve the branch displacement problem. In particular, our optimising assembler should also return a map that assigns a cost (in clock cycles) to every instruction in the source program. We expect the induced cost to be preserved by the assembler: we will prove that the compiled code tightly simulates the source code by taking exactly the predicted amount of time. Note that the temporal `tightness' of the simulation is a fundamental prerequisite of the correctness of the simulation, as some functions of the MCS-51---timers and \textsc{i/o}---depend on the microprocessor's clock. If the pseudo- and concrete clock differ the result of an \textsc{i/o} operation may not be preserved. % This problem is the subject of the present paper. After introducing the problem % in more detail, we will discuss the solutions used by other compilers, present % the algorithm we use in the CerCo assembler, and discuss its verification, % that is the proofs of termination and correctness using the Matita proof % assistant~\cite{Asperti2007}. % Formulating the final statement of correctness and finding the loop invariants % have been non-trivial tasks and are, indeed, the main contribution of this % paper. It has required considerable care and fine-tuning to formulate not % only the minimal statement required for the ulterior proof of correctness of % the assembler, but also the minimal set of invariants needed for the proof % of correctness of the algorithm. % The research presented in this paper has been executed within the CerCo project % which aims at formally verifying a C compiler with cost annotations. The % target architecture for this project is the MCS-51, whose instruction set % contains span-dependent instructions. Furthermore, its maximum addressable % memory size is very small (64 Kb), which makes it important to generate % programs that are as small as possible. With this optimisation, however, comes increased complexity and hence increased possibility for error. We must make sure that the branch instructions are encoded correctly, otherwise the assembled program will behave unpredictably. % All Matita files related to this development can be found on the CerCo % website, \url{http://cerco.cs.unibo.it}. The specific part that contains the % branch displacement algorithm is in the {\tt ASM} subdirectory, in the files % {\tt PolicyFront.ma}, {\tt PolicyStep.ma} and {\tt Policy.ma}. \section{The branch displacement optimisation problem} \bibliographystyle{alpha} \bibliography{biblio} \bibliographystyle{spbasic} %\bibliographystyle{spbasic} \end{document}