wiki:Proposal1.3.1

Version 1 (modified by sacerdot, 7 years ago) (diff)

--

Overall strategy and general description

The objectives detailed in Section 1.2? will be pursued by the CerCo? Consortium through the actuation of a workplan described in this section, consisting of a total of 6 Work-Packages (WPs) spanning a temporal frame of 36 months. Several Consortium partners participate in each WP, according to their specific expertise, know-how and business interests.

WPs are independent, yet tightly related. Their execution, to be successful, calls for a significant amount of interaction, information exchange and coordination. Each WP is led by one of the Consortium partners, whose role is to coordinate the work inside the WP and interfacing and communicating with the other WPs. Each WP is further broken down into Tasks, each of them is responsible for a specific portion of the work.

Work packages

In this section, we provide a brief overview of WP activities, outlining the required inputs and expected outputs, as well as the foreseen interactions among WPs. The description of the work to be carried out within each WP, split over the different Tasks, and including information on the specific involvement and contribution of each Consortium partner along the 36-months lifetime of the project will be presented in a later section (Tables 1.3.5?). A Pert showing the dependencies between the tasks is provided in Fig.�8.

Figure 8: Pert Diagram

img34.png

WPround(0+1:0): Project Management:: WPround(1+1:0): Compiler Prototype (untrusted):: * Task Tround(1.1+1:1)Architectural design

  • Task Tround(1.2+1:1)Intermediate languages and data structures
  • Task Tround(1.3+1:1)Implementation
  • Task Tround(1.4+1:1)Integration, validation and testingWPround(2+1:0): Verified Compiler - front end:: * Task Tround(2.1+1:1)Formal semantics of C
  • Task Tround(2.2+1:1)Functional encoding in the Calculus of Inductive Construction
  • Task Tround(2.3+1:1)Formal semantics of intermediate languages
  • Task Tround(2.4+1:1)Correctness proofsWPround(3+1:0): Verified Compiler - back end:: * Task Tround(3.1+1:1)Formal semantics of machine code
  • Task Tround(3.2+1:1)Functional encoding in the Calculus of Inductive Construction
  • Task Tround(3.3+1:1)Formal semantics of intermediate languages
  • Task Tround(3.4+1:1)Correctness proofsWPround(4+1:0): Interfaces and interactive components:: * Task Tround(4.1+1:1)Management of complexity assertions
  • Task Tround(4.2+1:1)Automation of complexity proofs
  • Task Tround(4.3+1:1)Case studiesWPround(5+1:0): Dissemination and exploitation:: * Task Tround(5.1+1:1)User validation and exploitability
  • Task Tround(5.2+1:1)Contribution to portfolio and concertation activities at FET-Open level

The work is organized in four main technical Workpackages WPround(1+1:0)-WPround(4+1:0), plus two WPs devoted to Project Management (WPround(0+1:0)) and Dissemination and exploitation activities (WPround(5+1:0)).


Work Package WPround(0+1:0) collects the main management activities. The aim of WPround(0+1:0)is to efficiently support technical and administrative activities and to facilitate good communication and co-operation among the partners, in order to ensure the successful fulfilment of all project objectives and the achievement of deliverables. As better explained in Section�2.1? Implementation, the Project Coordinator and the Project Manager will be responsible for the project management and coordination activities which include: supervision of project progresses and events, coordination between the work packages, consistency check of the contributions from project partners, check content and timing of project deliverables, monitoring of the project financial situation. The Project Coordinator will be supported by WP leaders, which will coordinate the working activities of WPs and will organise WP deliverables.


Work Package WPround(1+1:0) aims at building a functional prototype of the cost annotating compiler (milestone M1). The compiler will be untrusted, meaning that no proof will be given that the machine code and the cost annotations returned by the compiler are correct; it will be written in a high-level, comfortable programming language particularly tailored to compiler construction, such as OCaml. This untrusted prototype compiler is the first milestone of the project, since it will embody the main architectural design of the final compiler, the format of cost annotations, and the way of computing them. At the same time, it will allow to start experimenting with the management of cost annotations, the declaration of complexity assertions, the generation of complexity obligations and their interactive solution (tasks covered by WPround(4+1:0)).

All partners will contribute expertise to Task Tround(1.1+1:1)where major design choices will be taken and where the format of cost annotations will be thoroughly discussed. A selection of optimizations compatible with complexity preservation will also be identified in that task.

UPD will be in charge of the actual implementation of the untrusted compiler (Task Tround(1.3+1:1)). UPD and UEDIN will preliminary agree in Task Tround(1.2+1:1)on the intermediate languages and data structures for the front-end, in order to make them easily encodable in CIC (Task Tround(2.3+1:1)). Similarly UPD and UNIBO will reach an agreement on languages and data structures for the back-end.

Task Tround(1.4+1:1)will be performed by UPD, driven by the feedback collected from UEDIN and UNIBO in Tasks Tround(2.2+1:1), Tround(2.3+1:1), Tround(3.2+1:1)and Tround(3.3+1:1).


Work packages WPround(2+1:0)and WPround(3+1:0) are entirely devoted to the correctness proof of the compiler. Due to the dimension of the work, this has been split in two major WPs, reflecting the natural decomposition of a compiler into a front-end (WPround(2+1:0)) and a back-end (WPround(3+1:0)). The first step of the correctness proof consists in rewriting the compiler into the internal language of a proof assistant - the Matita Interactive Theorem Prover, in our case - i.e. in a simplified language more suited than OCaml to formal reasoning (Tasks Tround(2.2+1:1)and Tround(3.2+1:1)). These two tasks will be respectively performed by UEDIN and UNIBO with the collaboration of UPD who wrote the untrusted compiler and is responsible, in Task Tround(1.4+1:1), of keeping it in sync with the certified one. The complexity of the tasks relies in the fact that the language of Matita (the Calculus of Inductive Constructions) although quite expressive from a logical point of view, is very rudimentary from a programming perspective.

The formalization of the source and target language semantics in Tasks Tround(2.1+1:1)(UEDIN) and Tround(3.1+1:1)(UNIBO) will be performed during the fist year, in parallel with Task Tround(1.3+1:1)(UPD), anticipating the certification of the compiler.

At the end of the second year we plan to have a full prototype of the system (M2), already written in a language suited to be formally checked for correctness, but still lacking (complete) proofs (part of the formalization will be already completed during the second year, as part of Tasks Tround(2.3+1:1), Tround(2.4+1:1), Tround(3.3+1:1)and Tround(3.4+1:1)). Having such a prototype will allow to start, during the third year, an intensive dissemination activity and the user validation phase.

Starting from month 18, Tasks Tround(2.4+1:1)and Tround(3.4+1:1)are devoted to the correctness proof and will be respectively performed by UEDIN and UNIBO with frequent interactions with UPD to clarify the program logic and related invariants. The outcome of the two tasks will be the main contribution to the final, trusted, prototype of CerCo? (M3).

Work Package WPround(4+1:0) will develop a proof of concept prototype of a system to draw complexity assertions on the execution time of a C program. The system will exploit the output of the CerCo? compiler and will be based on existing tools (like Caduceus) for the management of cost annotations and the synthesis of proof obligations (Task Tround(4.1+1:1)). The task will be leaded by UPD with contributions from UEDIN which has expertise as users of interactive theorem provers. In an additional task Tround(4.2+1:1)UPD will focus on the automation of proofs of complexity obligations with contributions from UNIBO which has expertise in implementation of proof automation.

The synthesis of the annotating invariants will benefit from previous and ongoing research at UPD on the production of complexity bounds. A first approach that has been experimented on first-order functional programs amounts to combine a traditional termination proof with a numerical interpretation of the functions (called quasi-interpretations) that allows to bound the size of the computed data. A second approach which is currently under development at UPD addresses the complexity of higher-order functional programs with side-effects using the ideas of (elementary/light) linear logic. In this case, complexity bounds are extracted from the typing of the program. Of course, in both cases abstraction entails a loss of precision of the analysis and one challenge is to find classes of programs for which automatic analyses do provide practically useful bounds (cf. case study in task Tround(4.3+1:1)).

This final task Tround(4.3+1:1)will be leaded by UPD and will concern the application of the techniques developed in task Tround(4.2+1:1)(automation of proofs) to the C code generated by available compilers for synchronous languages such as Lustre or Esterel. This is a relevant class of C programs for which it is indeed important to get concrete complexity bounds. As a matter of fact, synchronous languages are built around a notion of instant (or phase) and in concrete applications one has to check that the duration of an instant is less than the time between two input events. A preliminary analysis reveals that the structure of the generated C code is rather simple. Then the planned case study aims at automating the process of producing and verifying the invariants which are needed to bound the duration of an instant for the generated C code.


Work Package WPround(5+1:0) is to manage the knowledge generated by the project and IPRs and to bring the technological advances to the scientific community and potential users and is further detailed in Section�3.2?. In addition to the standard scientific papers and reports, we will also create software (the certified CerCo? compiler itself and the proof-of-concept exploitation system developed in WPround(4+1:0)) and test it on a significant case study that we expect to be immediately interesting also to private parties. All our software will be developed under an open license. The Consortium Agreement will contain the policy, rights and obligations on this matter. All partners will contribute to this work package that will be leaded by UNIBO.

Task Tround(5.1+1:1), performed during the last year, will asses the technologies developed in the project in order to ensure fulfilment of actual requirement of the targeted exploitation communities and it will be based on the Untrusted CerCo? Compiler (M2).

Self assessment and validation

As explained in Section�1.1?, the first outcome of the project is the CerCo? cost-preserving compiler, first implemented in OCaml (Tasks T2.1, T2.2, T2.3 of WP2) and then in CIC (Tasks T2.4 of WP2, T3.2 of WP3 and T4.2 of WP4). The compiler will be fully certified for cost-preservation in WP3 and WP4, which are the main instrument for validation of WP2. In particular, in WP3 and WP4, we expect to detect bugs in the cost-preserving compiler or algorithms whose proof of correctness is too complex to be formalized; feedback will be immediately provided to WP2 in order to fix the bug or change the algorithms before propagating the changes back to WP3 and WP4. In particular, UNIBO and UEDIN will be responsible for the proofs and for bug detection, while UPD will be for bug fixing.

The deliverables provided in WP3 and WP4 are mostly self-validating, since they are formal proofs that are checked for correctness by the Matita interactive theorem prover. An additional validation of the methodology adopted in the proofs will be achieved by submitting them to international journals and by presenting them to international peer-reviewed conferences.

The second outcome of the project is the cost-annoting feature of the CerCo? compiler. The concrete exploitability of the cost-annotations is investigated in the Tasks T5.1, T5.2, T5.3, T6.1. In particular, Tasks T5.1 and T5.2 are aimed at providing a first layer of techniques and tools to extend cost-annotations to exact computational complexity proofs for realistic programs and constitute the main self assessment exercise for cost-annotations. Difficulties faced in the exploitation of the cost annotations must be reflected on the data structures and algorithms used to compute them in the compiler (WP2) and are likely to require modifications to the optimizations implemented in the compiler (WP2) and proved correct (in WP3 and WP4). Thus, once again, validation feedback needs to be threaded to every Work Package and intra-WP collaboration triggers bi-lateral collaborations between partners.

Tasks T5.3 and T6.1 are aimed at validating (in the particular scenario of synchronous languages for T5.3) also the techniques exploited in T5.1 and T5.2, in particular by identifying methodological weaknesses and potential solutions to improve exploitability in the long term. Task 6.1 will also target potential communities of users both in academia and in industry in order to disseminate the project results and collect timely feedback used for self-assessment.

Risk analysis and contingency plan

A project like CerCo? can encounter a number of adverse situations. We define a risk as a product between an adverse event and its consequences on the project’s achievements of its objectives. A correct procedure to minimise the overall risk will be taken into account in order to minimise the possible occurrence of adverse events in the construction of the project.


Technical Risks

Risk Probability Remedial actions
Intra-procedural optimizations and optimizations altering in non trivial way the control flow of the program. Low/Medium? Optimization are important but not crucial for the functional behaviour of the compiler: we may be ready to pay the improved semantic and complexity reliability with a small loss in performance. Hence, particularly problematic optimizations can be skipped.
During certification a bug in the untrusted compiler code is spotted. High The untrusted compiler will also be extensively tested by traditional techniques, minimizing the risk. Moreover, the certification starts six months before delivery of the untrusted CerCo? compiler to spot most errors before user validation. Essential parts of the compiler will be certified first. Man-power can be shifted from certification of non crucial parts to bug-fixing and re-certification of the minimal functional core.
A show-stopper issue is faced during certification, undermining the goal of a fully certified compiler. E.g. the formal semantics of the source language turns out to be unfaithful and proofs must be redone from scratch. Low With the delivery of the second milestone, at month 24, we already provide a fully functional CerCo? compiler, that can be tested with conventional techniques. Man-power previously assigned to certification is shifted to testing.
Matita turns out to be inadequate for the task. Low/Medium? We will switch to the Coq proof assistant. Matita is based on the same logic of the Coq proof assistant and it is not difficult to manually port scripts from one system to the other one. Moreover, the two systems can directly share definitions and proof objects encoded in an XML dialect for the Calculus of Inductive Constructions.
Lack of portability of the compiler to different architectures Low Preservation of cost models could be very sensitive to the target architecture and thus the compiler could be unsuited to support different back-ends. Since we plan to deliver just a single back-end during the project, this will not affect the schedule. Generative programming techniques (as used in gcc) can be considered to improve portability.
Sensitivity of the cost annotations to the target architecture High Cost annotations are sensitive to the target architecture and they affect the user provided cost invariants and the proof obligations. Thus even deploying an application on similar devices could require major work to update the invariants and redo the proofs. The problem is made less severe by increasing the level of automation, e.g. integrating invariant generators and automatic theorem provers. These topics will be addressed during the project time-frame only in the use-cases and thus they do not affect the schedule. Further research will be required in the long term to improve robustness of costs invariants.
Lack of methodology to help the user in identifying the cost invariants Medium Guessing cost invariants tight enough for the user scenario can be very difficult. Invariant generation techniques can be exploited and further research will be needed after the end of the project. In the project time frame we will focus our case-studies on the compilation of high level synchronous programs. For this restricted domain, we expect to be able to automatically generate cost invariants without any major user intervention. If this expectation turns out to be wrong, we plan to devise a new set of case-studies or to shift man-power to improve the management of cost invariants.


Consortium Risks

Risk Probability Remedial actions
Not to be able to intervene with correction just in time Low To ask to WP leader to prepare periodically reports based on specific forms defined at the beginning of the project
Researchers might leave Low/Medium? All work to be regularly documented and stored
Divergence among partners on project running Low Consortium agreement rules every conflict situation. The research of consensus is the first objective. However, after a reasonable amount of time has been allowed to illustration and defence of conflicting positions, in order to avoid deadlock in project operational progress, the approval of a two-third majority of Partners will be considered conclusive.
Bad consortium communication Medium Improve team building among members; improve communication facilities; increase face-to-face or telephone communications when possible

Management Risks

Risk Probability Remedial actions
Overestimate work load Medium Study, implement and certify more compiler optimizations. Alternatively, put more effort on WPround(4+1:0)to improve the proof of concept prototype of exploitation of the cost annotations or perform more case studies on more expressive synchronous languages, such as Lucid-Synchrone.
Underestimate work load in implementation Low Manpower can be reassigned from certification to implementation, certifying only core parts of the compiler.
Underestimate work load in certification Medium Certify only core parts of the compiler.
Unrealistic Time Schedule for milestone M1 Low Tasks Tround(2.3+1:1), Tround(3.3+1:1)and partially Tround(3.2+1:1)can be started before completion of deliverable Dround(1.2+1:1)and perform in parallel with the late task.
Unrealistic Time Schedule for milestone M2 Low This is likely to happen only if Tround(4.1+1:1)is late, since other tasks contributing to M2 are to be completed six months in advance. In that case, task Tround(4.2+1:1)and Tround(4.3+1:1) will be suppressed reassigning man-power to the late task and to task Tround(5.1+1:1)that has to be shortened to achieve in time milestone M3.
Inaccurate budget allocation Medium Identify necessary re-allocations among partners.