Ignore:
Timestamp:
Sep 16, 2012, 5:40:25 PM (7 years ago)
Author:
sacerdot
Message:

Only relevant pieces of reviews left in place.

File:
1 edited

Legend:

Unmodified
Added
Removed
  • Papers/cpp-asm-2012/reviews.txt

    r2333 r2334  
    1 Dear Claudio,
    2 
    3 Attached please find reviewer comments regarding your CPP submission,
    4    On the correctness of an optimising assembler for the MCS-51 microcontroller.
    5 
    6 We trust that you will find these comments useful and constructive.
    7 Thank you for submitting to CPP.  We will send more information about
    8 submitting your final draft paper (in LNCS format) for the conference
    9 proceedings via EasyChair.
    10 
    11 We look forward to seeing you at the meeting in Kyoto. Note that APLAS
    12 2012 is also collocated with CPP 2012.
    13 
    14 Sincerely, Chris Hawblitzel and Dale Miller, CPP 2012 PC Chairs
    15 
    16 Some useful links: http://cpp12.kuis.kyoto-u.ac.jp/
    17                    http://aplas12.kuis.kyoto-u.ac.jp/
    18 
    19 
    201----------------------- REVIEW 1 ---------------------
    21 PAPER: 16
    22 TITLE: On the correctness of an optimising assembler for the MCS-51 microcontroller
    23 AUTHORS: Dominic Mulligan and Claudio Sacerdoti Coen
    24 
    25 REVIEWER'S CONFIDENCE: 4 (expert)
    26 
    27 A partial Matita verification of the correctness of an optimizing assembler targeting a resource-constrained embedded processor.  The key challenge is compiling each jump to one of a variety of differently sized machine code instructions (or possibly a sequence of several) with different addressing modes.  Many of the proof obligations are discharged in the Russell style, using dependent sigma types in the original definitions of functions.  The assembler is parametrized over a "policy" explaining how each jump should be compiled, plus a proof that the policy is sound.  The final theorem establishes correct assembly for any sound policy, given that the input program satisfies a behavioral policy that rules out address arithmetic that would break key invariants.  The set-up also gives guarantees about worst-case execution time of programs.
    28 
    29 I think this is a useful piece of work that is just what we ought to be looking for in CPP.  Though the authors probably wouldn't think of it this way, this work is a good case study demonstrating that the relatively young proof assistant Matita has moved beyond "toy" status, and the paper is interesting in that respect, independently of the details of what has been verified.  Thus, I recommend acceptance.
    30 
    31 The main weakness of the paper is the detailed presentation of lemma statements.  I believe it would be better to focus on the high-level insights, from which competent readers could then reconstruct the lemma statements and proofs.  Following this advice would open up substantial space, which I believe could profitably be used to combine this paper with the simultaneous CPP submission by the same group.  It would be nice to have a single conference-length paper that explains all the key insights.
    32 
    33 One contribution of this paper is the novel way of reasoning about assembly programs that may do address arithmetic.  From the abstract, it is not clear that the current development supports no such safe arithmetic, but rather the approach is conjectured to support easy extension for that kind of code.  I think this is a reasonable position, but the speculative nature of the motivation ought to be made clearer earlier.
    34 
    35 It is disappointing that the proofs still depend on some asserted axioms.  From the standpoint of evaluating Matita's maturity, one might wonder whether it will be at all practical to finish the proofs and remove all dependence on axioms.  The best way to address these objections is to finish the proofs and report on the results!  (This point is what holds me back from choosing "strong accept" as my rating.)
     21) The main weakness of the paper is the detailed presentation of lemma statements.  I believe it would be better to focus on the high-level insights, from which competent readers could then reconstruct the lemma statements and proofs.  Following this advice would open up substantial space, which I believe could profitably be used to combine this paper with the simultaneous CPP submission by the same group.  It would be nice to have a single conference-length paper that explains all the key insights.
     3
     42) One contribution of this paper is the novel way of reasoning about assembly programs that may do address arithmetic.  From the abstract, it is not clear that the current development supports no such safe arithmetic, but rather the approach is conjectured to support easy extension for that kind of code.  I think this is a reasonable position, but the speculative nature of the motivation ought to be made clearer earlier.
    365
    376Page 3, "We only remark", perhaps add "that" after?  (and there is a double "of of" shortly after)
     
    5221
    5322----------------------- REVIEW 2 ---------------------
    54 PAPER: 16
    55 TITLE: On the correctness of an optimising assembler for the MCS-51 microcontroller
    56 AUTHORS: Dominic Mulligan and Claudio Sacerdoti Coen
    57 
    58 REVIEWER'S CONFIDENCE: 4 (expert)
    59 
    60 Summary of program committee discussion:
    61 
    62 The program committee noted that the research described here was split between two papers, "On the correctness of an optimising assembler for the MCS-51 microcontroller" and "On the correctness of a branch displacement algorithm".  After some discussion, we decided to accept only one of these two papers.  To strengthen the one accepted paper, we recommend (but do not require) that the key results from both submissions be presented in the one accepted paper.  This would, as the other reviews point out, lead to "a higher concentration of interesting results" with more "focus on the high-level insights".
    63 
    64 
    65 My review:
    66 
    67 This paper describes the formalization and verification of an optimizing assembler for MCS-51 assembly language code.  Such verification is nontrivial because the program and the assembler both want to control memory:  an assembly language program may need to observe and modify memory at a low level, while an optimizing assembler makes decisions that affect memory layout.  Thus, it's important to define semantics for assembly language that are more abstract or more constrained than the semantics for machine language, so that the assembler can influence memory layout while still preserving the assembly language program's semantics during the translation to machine language.
    68 
    69 As an example, suppose the assembly language program executes a call instruction, which pushes a code address on the stack.  The exact code address pushed on the stack depends on how the assembler lays out instructions in memory; this layout may depend on instruction encoding, padding, etc.  One possible approach is to allow assembly language registers and assembly language memory to hold symbolic addresses as well as concrete bytes.  For example, an assembly language call instruction could push a symbolic address onto the stack rather than pushing concrete bytes.  This approach makes it straightforward to define which operations are allowed (comparing two symbols for equality may be ok, but comparing a symbol with a concrete constant is probably illegal).  However, this mixture of symbolic addresses and concrete bytes complicates the assembly language semantics when compared to the machine language semantics.  Therefore, the paper rejects this approach in favor of what appea
    70 rs to be a variant of this approach.  Rather than putting symbolic addresses directly into registers and memory, the paper's approach defines a map that keeps track of which registers and memory locations hold symbols.  When looking up a value in a register or memory location, the map dictates whether to interpret the register/memory contents as a symbol or a concrete value.
    71 
    72 The current implementation appears to be pretty restrictive in what operations are allowed on symbols.  As far as I could see, you can push and pop them from the stack (particularly in call/return instructions), but that's about it.  It appears that you can't compare symbols for equality (e.g. to test two function pointers for equality in C), for example.  In fact, I'm not sure you can even move function pointers around between variables (at least, I couldn't find the place that would update the map during moves).  Do you allow jump tables of code addresses to implement C switch statements?  These would be good examples to motivate the definitions in section 3.5, which is currently pretty hard to understand.  (If you want a more challenging example, you might consider garbage collection tables that map code addresses to GC data; some collectors do binary search on code addresses to find entries in the tables.)
    73 
    74 The definition of assembly-level state (with symbolic addresses or a map that locates the symbolic addresses) and machine-level state (bytes, with no symbols) should be central to the paper.  Unfortunately, these concepts are buried in section 3.5's explanations of internal_pseudo_address_map, low_internal_ram_of_pseudo_low_internal_ram, status_of_pseudo_status, etc., which are too dense, appear too late in the paper, and rely on knowledge of the MCS-51 that the paper only hints at (e.g. low and high internal memory).  Furthermore, the definitions of Status and PseudoStatus never appear at all.  I'd suggest starting with the definitions of registers, memory, internal_pseudo_address_map, and Status/PseudoStatus, and then working through the details of the assembler later in the paper.  This would ensure that the main theorem is understandable.  Compared to the main theorem, the other lemmas are less critical to the paper.  Section 3.4, in particular, could be greatly shortened
     231) The program committee noted that the research described here was split between two papers, "On the correctness of an optimising assembler for the MCS-51 microcontroller" and "On the correctness of a branch displacement algorithm".  After some discussion, we decided to accept only one of these two papers.  To strengthen the one accepted paper, we recommend (but do not require) that the key results from both submissions be presented in the one accepted paper.  This would, as the other reviews point out, lead to "a higher concentration of interesting results" with more "focus on the high-level insights".
     24
     252) The current implementation appears to be pretty restrictive in what operations are allowed on symbols.  As far as I could see, you can push and pop them from the stack (particularly in call/return instructions), but that's about it.  It appears that you can't compare symbols for equality (e.g. to test two function pointers for equality in C), for example.  In fact, I'm not sure you can even move function pointers around between variables (at least, I couldn't find the place that would update the map during moves).  Do you allow jump tables of code addresses to implement C switch statements?  These would be good examples to motivate the definitions in section 3.5, which is currently pretty hard to understand.  (If you want a more challenging example, you might consider garbage collection tables that map code addresses to GC data; some collectors do binary search on code addresses to find entries in the tables.)
     26
     273) The definition of assembly-level state (with symbolic addresses or a map that locates the symbolic addresses) and machine-level state (bytes, with no symbols) should be central to the paper.  Unfortunately, these concepts are buried in section 3.5's explanations of internal_pseudo_address_map, low_internal_ram_of_pseudo_low_internal_ram, status_of_pseudo_status, etc., which are too dense, appear too late in the paper, and rely on knowledge of the MCS-51 that the paper only hints at (e.g. low and high internal memory).  Furthermore, the definitions of Status and PseudoStatus never appear at all.  I'd suggest starting with the definitions of registers, memory, internal_pseudo_address_map, and Status/PseudoStatus, and then working through the details of the assembler later in the paper.  This would ensure that the main theorem is understandable.  Compared to the main theorem, the other lemmas are less critical to the paper.  Section 3.4, in particular, could be greatly shortened
    7528; much of it is devoted to showing how machine instructions and bytes are concatenated together, which I found less interesting and more tedious than the paper's other sections.
    7629
    77 Other questions:
    78 - How do you model I/O operations?
    79 - How do you plan to run the assembler?  Can you extract an executable assembler from the Matita implementation?
    80 - Right now, the proofs and specifications are mixed together in the Matita files.  Is there a way to isolate and quantify the definitions that you trust for correctness (the "trusted computing base")?
    81 
    82 I'm a little bothered that the proof still has unfinished pieces, although most of the important pieces appear finished.
    83 
    84 
    85 Minor comments:
    86 
    87 "We first attempted to synthesise[sic] a solution bottom up...."  This paragraph is vague.  Either omit the paragraph, or be more specific about the "solutions" under consideration.
    88 
    89 Section 3.1, definition of execute_1: say what "cm" stands for.  (In fact, throughout the paper, try to say what the variables are as they are introduced.)
    90 
    91 Section 3.4: "is obtained by composition of expand_pseudo_instruction and assembly_1_pseudoinstruction":  "composition" usually means the output of one function is the input to the other function.
    92 
    93 Section 3.4, lemma fetch_assembly_pseudo: the variable "instructions" is not in scope (did you mean "instrs"?).
    94 
    95 Section 3.4: "Here, l is the number of machine code instructions".  "pc_plus_len" is computed by adding pc to l, so wouldn't "number of bytes" make more sense?
    96 
     304) How do you model I/O operations?
     315) How do you plan to run the assembler?  Can you extract an executable assembler from the Matita implementation?
     326) Right now, the proofs and specifications are mixed together in the Matita files.  Is there a way to isolate and quantify the definitions that you trust for correctness (the "trusted computing base")?
     33
     347) "We first attempted to synthesise[sic] a solution bottom up...."  This paragraph is vague.  Either omit the paragraph, or be more specific about the "solutions" under consideration.
     35
     368) Section 3.1, definition of execute_1: say what "cm" stands for.  (In fact, throughout the paper, try to say what the variables are as they are introduced.)
     37
     389) Section 3.4: "is obtained by composition of expand_pseudo_instruction and assembly_1_pseudoinstruction":  "composition" usually means the output of one function is the input to the other function.
     39
     4010) Section 3.4, lemma fetch_assembly_pseudo: the variable "instructions" is not in scope (did you mean "instrs"?).
     41
     4211) Section 3.4: "Here, l is the number of machine code instructions".  "pc_plus_len" is computed by adding pc to l, so wouldn't "number of bytes" make more sense?
    9743
    9844----------------------- REVIEW 3 ---------------------
    99 PAPER: 16
    100 TITLE: On the correctness of an optimising assembler for the MCS-51 microcontroller
    101 AUTHORS: Dominic Mulligan and Claudio Sacerdoti Coen
    102 
    103 REVIEWER'S CONFIDENCE: 3 (high)
    104 
    105 This paper certifies an assembler for Intel MCS-51 microprocessor.
    106 According to the authors, there are two major contributions: first
    107 the work abstract away details of the branch displacement algorithm
    108 when proving the correctness; second establish the correctness of
    109 the assembler only for input assembly programs that use addresses
    110 in safe ways. Their approach to check safety is more flexible than
    111 previous approaches that treat addresses as opaque values.
    112 
    113 The project looks interesting, but the presentation is not very
    114 readable. The major problem is that all the formal details are given
    115 in the form of Matita code. Comparing to the mathematical notations,
    116 the code in the boxes looks very verbose and is difficult to read.
    117 I know the work has been mechanized in Matita, but this doesn't mean
    118 the presentation has to show the code. The authors should use normal
    119 mathematical notations instead.
    120 
    121 For the second contribution (for checking safety of address usage),
    122 what is the problem here? Why addresses have to be treated as
    123 opaque values before? Here by addresses do you mean code labels
    124 only or both code labels and memory pointers for data? The new
    125 approach is said to be more flexible to support "benign" manipulations
    126 of addresses. Do you have examples to demonstrate this flexibility
    127 and show why it is a big deal?
    128 
    129 The title is about correctness of an optimizing assembler. However,
    130 the verification process shown in this paper abstracts away the branch
    131 displacement algorithm, the only optimization in the assembler. So it
    132 seems the optimization part is irrelevant.
    133 
    134 In the Intro the authors seem to also claim that the assembler is
    135 cost-preserving. Where is this issue discussed in the paper?
    136 
    13745page 2, line 4: giveth --> given, taketh --> taken ?
    13846page 9, Huth et al [15] --> Tuch et al [15] ?
     
    14351
    14452----------------------- REVIEW 4 ---------------------
    145 PAPER: 16
    146 TITLE: On the correctness of an optimising assembler for the MCS-51 microcontroller
    147 AUTHORS: Dominic Mulligan and Claudio Sacerdoti Coen
    148 
    149 REVIEWER'S CONFIDENCE: 3 (high)
    150 
    151 This paper describes the modelling and verification of an assembler, a
    152 program for turning sequences of mnemonic assembly instructions into
    153 sequences of bytes, i.e., machine code. The program is modelled in the
    154 mechanical proof assistant Matita, and addresses MCS51 microcontrollers.
    155 
    156 The two main problems seem to be:
    157  (i)  translating address labels into jump offets, and,
    158  (ii) ensuring that programs manipulate addresses `correctly'.
    159 
    160 The approach is serious and treats an established and industrially
    161 significant hardware platform. But the paper suffers from three main
    162 deficiencies:
    163 
    16453  (i)   The presentation is needlessly complicated and difficult to
    16554        understand; while individual sentences are well-written, they
     
    17968        not clear whether (or how) this covers all possible program
    18069        behaviours.
    181 
    182 The authors not that their proof is not yet finished, but it does seem that
    183 they have resolved all of the technical obstacles.
    184 
    185 The work is quite interesting and practically-oriented, and I would
    186 encourage the authors to revise their draft to make it more precise,
    187 self-contained, and all encompassing (perhaps as per my detailed comments
    188 below).
    189 
    190 * Detailed comments
    19170
    192711. The title talks about an _optimising_ assembler, but the only
Note: See TracChangeset for help on using the changeset viewer.