Changeset 561
 Timestamp:
 Feb 17, 2011, 4:20:52 PM (10 years ago)
 File:

 1 edited
Legend:
 Unmodified
 Added
 Removed

Deliverables/D4.1/ITPPaper/itp2011.tex
r560 r561 425 425 The internal state of our Matita emulator is represented as a record: 426 426 \begin{lstlisting} 427 record Status: Type[0] ≝ 428 { 427 record Status: Type[0] ≝ { 429 428 code_memory: BitVectorTrie Byte 16; 430 429 low_internal_ram: BitVectorTrie Byte 7; … … 434 433 special_function_registers_8051: Vector Byte 19; 435 434 special_function_registers_8052: Vector Byte 5; 436 ... 437 }. 435 ... }. 438 436 \end{lstlisting} 439 437 This record neatly encapsulates the current memory contents, the program counter, the state of the current SFRs, and so on. 440 One peculiarity is the packing of the 24 combined SFRs into fixed length vectors. 441 This was due to a bug in Matita when we were constructing the emulator, since fixed, where the time needed to typecheck a record grew exponentially with the number of fields. 442 443 Here, it appears that the MCS51's memory spaces are completely disjoint. 444 This is not so; many of them overlap with each other, and there's a manymany relationship between addressing modes and memory spaces. 445 For instance, \texttt{DIRECT} addressing can be used to address low internal RAM and the SFRs, but not high internal RAM. 446 447 For simplicity, we merely treat memory spaces as if they are completely disjoint in the \texttt{Status} record. 448 Overlapping, and checking which addressing modes can be used to address particular memory spaces, is handled through numerous \texttt{get\_arg\_XX} and \texttt{set\_arg\_XX} (for 1, 8 and 16 bits) functions. 438 439 Here we choosed to represent the MCS51 memory model using four disjoint 440 segments plus the SFRs. From the programmer point of view, however, what 441 matters are addressing modes that are in a manytomany relation with the 442 segments. For instance, the \texttt{DIRECT} addressing mode can be used to 443 address either low internal RAM (if the first bit is 0) or the SFRs (if the first bit is 1). That's why DIRECT uses 8bits address but pointers to the low 444 internal RAM only have 7 bits. Hence the complexity of the memory model 445 is incapsulated in the \texttt{get\_arg\_XX} and \texttt{set\_arg\_XX} 446 functions that get and set data of size \texttt{XX} from the 447 memory by considering all possible addressing modes 448 449 %Overlapping, and checking which addressing modes can be used to address particular memory spaces, is handled through numerous \texttt{get\_arg\_XX} and \texttt{set\_arg\_XX} (for 1, 8 and 16 bits) functions. 449 450 450 451 Both the Matita and O'Caml emulators follows the classic `fetchdecodeexecute' model of processor operation. … … 453 454 These costs are taken from a Siemens Semiconductor Group data sheet for the MCS51~\cite{siemens:2011}, and will likely vary across manufacturers and particular derivatives of the processor. 454 455 \begin{lstlisting} 455 definition fetch: 456 BitVectorTrie Byte 16 $\rightarrow$ Word $\rightarrow$ instruction $\times$ Word $\times$ nat := ... 457 \end{lstlisting} 458 A single instruction is assembled into its corresponding bit encoding with \texttt{assembly1}: 459 \begin{lstlisting} 460 definition assembly1: instruction $\rightarrow$ list Byte := ... 456 definition fetch: BitVectorTrie Byte 16 $\rightarrow$ Word $\rightarrow$ instruction $\times$ Word $\times$ nat 457 \end{lstlisting} 458 Instruction are assembled to bit encodings by \texttt{assembly1}: 459 \begin{lstlisting} 460 definition assembly1: instruction $\rightarrow$ list Byte 461 461 \end{lstlisting} 462 462 An assembly program, consisting of a preamble containing global data, and a list of (pseudo)instructions, is assembled using \texttt{assembly}. … … 465 465 \begin{lstlisting} 466 466 definition assembly: 467 assembly_program $\rightarrow$ option (list Byte $\times$ (BitVectorTrie String 16)) := ... 468 \end{lstlisting} 469 A single execution step of the processor is evaluated using \texttt{execute\_1}, mapping a \texttt{Status} to a \texttt{Status}: 470 \begin{lstlisting} 471 definition execute_1: Status $\rightarrow$ Status := ... 472 \end{lstlisting} 473 Multiple steps of processor execution are implemented in \texttt{execute}, which wraps \texttt{execute\_1}: 467 assembly_program $\rightarrow$ option (list Byte $\times$ (BitVectorTrie String 16)) 468 \end{lstlisting} 469 A single fetchdecodeexecute cycle is performed by \texttt{execute\_1}: 470 \begin{lstlisting} 471 definition execute_1: Status $\rightarrow$ Status 472 \end{lstlisting} 473 The \texttt{execute} functions performs a fixed number of cycles by iterating 474 \texttt{execute\_1}: 474 475 \begin{lstlisting} 475 476 let rec execute (n: nat) (s: Status) on n: Status := ... 476 477 \end{lstlisting} 477 This differs slightly from the design of the O'Caml emulator, which executed a program indefinitely, and also accepted a callback function as an argument, which could `witness' the execution as it happened, and providing a printout of the processor state, and other debugging information. 478 Due to Matita's requirement that all functions be strongly normalizing, \texttt{execute} cannot execute a program indefinitely, and must execute a fixed number of steps. 478 This differs slightly from the design of the O'Caml emulator, which executed a program indefinitely, and also accepted a callback function as an argument, which could `witness' the execution as it happened, and provide a printout of the processor state, and other debugging information. 479 Due to Matita's requirement that all functions be strongly normalizing, \texttt{execute} cannot execute a program indefinitely. An alternative is to 480 produce an infinite stream of statuses representing the execution trace. 481 Infinite streams are encodable in Matita as coinductive types. 479 482 480 483 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% … … 485 488 486 489 A peculiarity of the MCS51 is the nonorthogonality of its instruction set. 487 For instance, the \texttt{MOV} instruction, can be invoked using one of sixteen combinations of addressing modes .490 For instance, the \texttt{MOV} instruction, can be invoked using one of sixteen combinations of addressing modes out of 361. 488 491 489 492 % Show example of pattern matching with polymorphic variants … … 496 499 ... 497 500 \end{lstlisting} 498 Which were then used in our inductive datatype for assembly instructions, as follows:501 Which were then combined in our inductive datatype for assembly instructions using the union operator `$$': 499 502 \begin{lstlisting} 500 503 type 'addr preinstruction = … … 538 541 ... 539 542 \end{lstlisting} 540 A function that checks whether an \texttt{addressing\_mode} is `morally' an \texttt{addressing\_mode\_tag} is provided, as follows:543 The \texttt{is\_a} function checks if an \texttt{addressing\_mode} matches an \texttt{addressing\_mode\_tag}: 541 544 \begin{lstlisting} 542 545 let rec is_a (d: addressing_mode_tag) (A: addressing_mode) on d := … … 546 549 ... 547 550 \end{lstlisting} 548 We also extend this check to vectors of \texttt{addressing\_mode\_tag}'s in the obvious manner: 549 \begin{lstlisting} 550 let rec is_in (n: nat) (l: Vector addressing_mode_tag n) (A: addressing_mode) on l := 551 match l return $\lambda$m.$\lambda$_: Vector addressing_mode_tag m. bool with 552 [ VEmpty $\Rightarrow$ false 553  VCons m he (tl: Vector addressing_mode_tag m) $\Rightarrow$ 554 is_a he A $\vee$ is_in ? tl A ]. 555 \end{lstlisting} 556 Here $\mathtt{\vee}$ is inclusive disjunction on the \texttt{bool} datatype. 557 \begin{lstlisting} 558 record subaddressing_mode (n: nat) (l: Vector addressing_mode_tag (S n)): Type[0] := 559 { 560 subaddressing_modeel :> addressing_mode; 561 subaddressing_modein: bool_to_Prop (is_in ? l subaddressing_modeel) 562 }. 563 \end{lstlisting} 564 We can now provide an inductive type of preinstructions with precise typings: 551 The \texttt{is\_in} function checks if an \texttt{addressing\_mode} matches a set of tags represented as a vector. It simply extends the \texttt{is\_a} function in the obvious manner. 552 553 Finally, a \texttt{subaddressing\_mode} is an adhoc non empty $\Sigma$type of addressing 554 modes constrained to be in a given set of tags: 555 \begin{lstlisting} 556 record subaddressing_mode n (l: Vector addressing_mode_tag (S n)): Type[0] := 557 { subaddressing_modeel :> addressing_mode; 558 subaddressing_modein: bool_to_Prop (is_in ? l subaddressing_modeel) }. 559 \end{lstlisting} 560 An implicit coercion is provided to promote vectors of tags (denoted with 561 $\llbracket  \rrbracket$) 562 to the 563 corresponding \texttt{subaddressing\_mode} so that we can use a syntax 564 close to the O'Caml one to specify preinstructions: 565 565 \begin{lstlisting} 566 566 inductive preinstruction (A: Type[0]): Type[0] ≝ … … 569 569 ... 570 570 \end{lstlisting} 571 Here $\llbracket  \rrbracket$ is syntax denoting a vector.572 571 We see that the constructor \texttt{ADD} expects two parameters, the first being the accumulator A (\texttt{acc\_a}), and the second being one of a register, direct, indirect or data addressing mode. 573 572 … … 575 574 % Have lemmas proving that if an element is a member of a sub, then it is a member of a superlist, and so on 576 575 The final, missing component is a pair of type coercions from \texttt{addressing\_mode} to \texttt{subaddressing\_mode} and from \texttt{subaddressing\_mode} to \texttt{Type$\lbrack0\rbrack$}, respectively. 577 The latter coercion is largely straightforward, however the former is not:578 \begin{lstlisting} 579 coercion mk_subaddressing_mode: 580 $\forall$n. $\forall$l: Vector addressing_mode_tag (S n).581 $\forall$a: addressing_mode. 582 $\forall$p: bool_to_Prop (is_in ? l a). subaddressing_mode n l := 583 mk_subaddressing_mode on a: addressing_mode to subaddressing_mode ? ?. 584 \end{lstlisting} 585 Using this coercion opens a proof obligation wherein we must prove that the \texttt{addressing\_mode\_tag} in correspondence with the \texttt{addressing\_mode} is a member of the \texttt{Vector} of permissible \texttt{addressing\_mode\_tag}s. 586 This impels us to state and prove a number of auxilliary lemmas. 587 For instance, we prove that if an \texttt{addressing\_mode\_tag} is a member of a \texttt{Vector}, and we possess another vector with additional elements, then the same \texttt{addressing\_mode\_tag} is a member of this vector.588 Using these lemmas, and Matita's automation, all proof obligations are solved easily. 589 (Type checking the main \texttt{execute\_1} function, for instance, opens up over 200 proof obligations.) 590 591 The machinery just described allows us to state in the type of a function what addressing modes that function expects.576 The first one is simply a forgetful coercion, while the second one opens 577 a proof obligation wherein we must prove that the provided value is in the 578 admissible set. This kind of coercions were firstly introduced in PVS to 579 implement subset types~\cite{pvs?} and then in Coq as an additional machinery~\cite{russell}. In Matita all coercions can open proof obligations. 580 581 Proof obligations impels us to state and prove a few auxilliary lemmas related 582 to transitivity of subtyping. For instance, an addressing mode that belongs 583 to an allowed set also belongs to any one of its superset. At the moment, 584 Matita's automation exploits these lemmas to completely solve all the proof 585 obligations opened in our formalization, comprising the 200 proof obligations 586 related to the main \texttt{execute\_1} function. 587 588 The machinery just described allows us to restrict the set of addressing 589 modes expected by a function and use this information during pattern matching 590 to skip impossible cases. 592 591 For instance, consider \texttt{set\_arg\_16}, which expects only a \texttt{DPTR}: 593 592 \begin{lstlisting} 594 definition set_arg_16: Status $\rightarrow$ Word $\rightarrow$ $\llbracket$ dptr $\rrbracket$ $\rightarrow$ Status ≝ 595 $\lambda$s, v, a. 593 definition set_arg_16: Status $\rightarrow$ Word $\rightarrow$ $\llbracket$ dptr $\rrbracket$ $\rightarrow$ Status ≝ $~\lambda$s, v, a. 596 594 match a return $\lambda$x. bool_to_Prop (is_in ? $\llbracket$ dptr $\rrbracket$ x) $\rightarrow$ ? with 597 595 [ DPTR $\Rightarrow$ $\lambda$_: True. … … 600 598 let status := set_8051_sfr status SFR_DPL bl in 601 599 status 602  _ $\Rightarrow$ $\lambda$_: False. 603 match K in False with 604 [ 605 ] 606 ] (subaddressing_modein $\ldots$ a). 607 \end{lstlisting} 608 All other cases are discharged by the catchall at the bottom of the match expression. 609 Attempting to match against another addressing mode not indicated in the type (for example, \texttt{REGISTER}) will produce a typeerror. 600  _ $\Rightarrow$ $\lambda$_: False. $\bot$ ] $~$(subaddressing_modein $\ldots$ a). 601 \end{lstlisting} 602 We feed to the pattern matching the proof \texttt{(subaddressing\_modein} $\ldots$ \texttt{a)} that the argument $a$ is in the set $\llbracket$ \texttt{dptr} $\rrbracket$. In all cases but \texttt{DPTR}, the proof is a proof of \texttt{False} and we can ask the system to open a proof obligation $\bot$ that will be 603 discarded automatically using exfalso. 604 Attempting to match against a non allowed addressing mode 605 (replacing \texttt{False} with \texttt{True} in the branch) will produce a 606 typeerror. 607 608 All the other dependently and non dependently typed solutions we tried before 609 the current one resulted to be suboptimal in practice. In particular, since 610 we need a large number of different combinations of address modes to describe 611 the whole instruction set, it is unfeasible to declare a data type for each 612 one of these combinations. Moreover, the current solution is the one that 613 matches best the corresponding O'Caml code, at the point that the translation 614 from O'Caml to Matita is almost syntactical. In particular, we would like to 615 investigate the possibility of changing the code extraction procedure of 616 Matita to recognize this programming pattern and output the original 617 code based on polymorphic variants. 610 618 611 619 % Talk about extraction to O'Caml code, which hopefully will allow us to extract back to using polymorphic variants, or when extracting vectors we could extract using phantom types
Note: See TracChangeset
for help on using the changeset viewer.