
Recycling Proof Patterns in Coq: Case Studies

Development of Interactive Theorem Provers has led to the creation of big libraries and varied infrastructures for formal proofs. However, despite (or perhaps due to) their sophistication, the re-use of libraries by non-experts or across domains is a challenge. In this paper, we provide detailed case studies and evaluate the machine-learning tool ML4PG built to interactively data-mine the electronic libraries of proofs, and to provide user guidance on the basis of proof patterns found in the existing libraries.



1 Introduction

Interactive theorem provers (ITPs) (e.g. Agda [Agda], Coq [Coq], Isabelle/HOL [NPW02], Matita [Matita] to name a few) are a family of higher-order languages allowing the formalisation of a wide variety of domains, ranging from mathematical theories to software verification. The most recent achievements concerned formalisation and computer verification of results coming from Group Theory [FTT], Topology [HCMS12], Real Numbers [realsCoq], Discrete Mathematics [flyspeck] and Security [ACDDHPPPT14]. The successful and efficient ITP programming often requires a combination of mathematical and programming intuition; see e.g. [Ben06]. The use of a rich higher-order language implies that there can be a rich variety of approaches to the formalisation and proof development for a given task. Thus, a programmer relies on the previous experience and ability to “creatively” adapt already used proof techniques and patterns in newly constructed proofs. This explains why a “steep learning curve” is often mentioned as one of the big obstacles to wider adoption of ITPs. In this paper, we are probing the abilities of our recent machine-learning tool ML4PG [KHG13] to find interesting proof patterns automatically, and thus enable a more efficient use of ITPs by specialists coming from a wider range of domains.

Development of ITPs has led to the creation of big libraries and varied infrastructures for formal mathematical proofs. These frameworks usually involve thousands of definitions and theorems (for instance, there are approximately 4200 definitions and 15000 theorems in the formalisation of the Feit-Thompson theorem [FTT]). Parts of those libraries can often be re-applied in new domains; however, it is a challenge for expert and non-expert users alike to trace them and find re-usable concepts and proof ideas.

A different, but related, challenge is faced during the creation of these libraries. These frameworks are developed by teams (e.g. 15 people were involved in the Feit-Thompson theorem project), and the situation is similar in industry, where teams use ITPs to verify the correctness of hardware and software systems. In those teams, each user has his own definitions, notation and proof style, which makes collaborative proof development difficult. In both scenarios, it would be extremely helpful to have a tool that could detect patterns across different users, notation and libraries.

To address these challenges, we propose ML4PG [HK12, KHG13] – a machine-learning extension to the Proof General [ProofGeneral] interface for Coq [BC04] and its SSReflect dialect [SSReflect]. Our main goal is to prove the concept: it is possible to embed a lightweight statistical machine-learning tool into an ITP proof interface, and use it interactively to find non-trivial patterns in existing proofs and aid new proof developments.

The ML4PG package for Proof General features the following main functions:

  • The user works within the interactive environment of Coq/SSReflect, and has an option to call ML4PG from the Proof General interface whenever he needs to find some proof patterns.

  • Based on the user’s choice, ML4PG compiles the chosen libraries, and extracts significant proof features from the existing lemmas and proofs;

  • ML4PG connects to machine-learning tools, and runs a number of experiments on clustering the data for each user’s query. Based on the results, it chooses the most reliable patterns; thus relieving the Coq programmer of the laborious statistical post-processing step.

  • If the user chooses to see only patterns related to his current proof goal, ML4PG further filters the results and shows the families of related proofs to the user.

Section 2 gives an overview of ML4PG features; details of its implementation are given in [KHG13]. ML4PG’s features have been substantially extended since [KHG13]; we briefly survey the changes in the Conclusions and Future Work section. In this paper, we do not focus on the ML4PG implementation per se, although we use it for all the examples and experiments shown here. Our main goal is to show how useful automated proof-pattern detection can be in different domains.

To illustrate this, we devise five experiments (“user scenarios”) to test ML4PG. Each example is designed to demonstrate a different aspect of proof-pattern recognition. To demonstrate ML4PG’s ability to adapt to different domains, we deliberately illustrate each user scenario by using libraries coming from different subject areas, ranging from Linear Algebra to software verification.

User Scenario 1 illustrates how to use ML4PG for detecting proof patterns prior to the start of a new proof development. To achieve this, Section 3 analyses fundamental libraries that are common in most developments using the SSReflect library [SSReflect]. The SSReflect library was developed as the infrastructure for formalisation of the Four Colour Theorem [FCT] and has played a key role in the formal proof of the Feit-Thompson theorem [FTT]. Up to version 1.4, the SSReflect library was distributed together with the theories about the proof of the Feit-Thompson theorem; from version 1.5, the SSReflect library can be downloaded independently from the MathComp library containing the proof of the Feit-Thompson theorem.

In this first scenario, we use pattern recognition with the aim of spotting common proof patterns across fundamental libraries (1404 theorems). The benefit of using ML4PG in this context is that it can speed up the beginning of a proof development, making it easier to recycle patterns already available in the libraries.

User Scenario 2 tests ML4PG in the case when a proof hint is needed in the middle of an on-going proof development. Section 4 focuses on discovery of proof patterns in mathematical proofs across formalisations of apparently disjoint mathematical theories: Linear Algebra, Combinatorics and Persistent Homology. In this scenario, we use statistically discovered proof patterns to advance the proof of a given “problematic” lemma (in our example, a lemma related to nilpotent matrices [BR91]). In this case, a few initial steps of the “problematic” proof are clustered against several mathematical libraries. This section contains a detailed description of how ML4PG was used to discover some non-trivial common proof patterns among 758 lemmas across 5 libraries, and how the detected proof clusters (groups of similar proofs) were used to advance the proof of the problematic lemma. Notably, ML4PG discovered that the fundamental lemma of Persistent Homology [HCMS12], a result from a completely different context, follows a proof strategy that suits the proof of our lemma.

User Scenario 3 shows how ML4PG’s hints can be used for importing a set of existing general proof methods into a new proof development. For this purpose, Section 5 tests ML4PG’s functionality in a different area – verification of Computer Algebra algorithms as suggested by the CoqEAL methodology [DMS12]. It is a different – but equally common – scenario of proof development: CoqEAL gives a general methodology to follow, but the exact rôle of over 720 proofs and definitions in the CoqEAL library may be unclear to us. In this case, we equally need guidance in lemma formulation, not only in proofs. In this section, we consider various proofs concerning a fast algorithm to compute the inverse of triangular matrices over a field. Again, ML4PG is able to discover significant clusters, and points us exactly to the results which could be used as hints to formulate the necessary lemmas and complete the proofs. This case study was presented in a preliminary form in [CICM13]; here, we put it into a wider context.

User Scenario 4 considers the problem of proof-pattern discovery in a different light. In User Scenarios 1-3, there was always an interesting underlying proof pattern hidden in the big proof libraries, “waiting” to be discovered. What if, despite the user’s hope that one library may contain similar proof strategies to another, the actual proofs are in fact too different to be recycled? Section 6 studies the results that ML4PG obtains working with two different Coq libraries formalising results from game theory [Ves06, nash]. One might hope that they contain similar proof patterns, since they formalise the same subject domain; but in fact, ML4PG shows that the actual proof strategies used in [Ves06] and [nash] are completely different. It is the opposite situation to User Scenario 2, where common proof patterns were found despite the fact that the libraries came from three different domains. This “negative” output given by ML4PG may in reality save the user the time of inspecting these libraries manually.

User Scenario 5 considers the situation when a team of several people develops a set of different modules within one bigger (industrial-scale) verification effort, see Section 7. For this purpose, we translate the proofs of correctness of the Java Virtual Machine (JVM) given in [M03] into Coq. Industrial scenarios of interactive theorem proving may differ significantly from the academic scenarios above. Namely, industrial verification tasks often feature a bigger number of routine cases and similar lemmas, and such tasks are distributed across a team of developers. Here, inefficiency often arises when programmers use different notation to accomplish very similar tasks, and thus a lot of work gets duplicated, see also [BHJM12]. We tested ML4PG in exactly such a scenario: we assumed that a programming team has collectively developed proofs of a) soundness of specification, and b) correctness of implementation of Java byte code for a dozen programs computing multiplication, powers, exponentiation, and other functions. We assumed that there is a relative novice in the team, trying to “learn” from the previous team efforts in order to repeat their proof steps for a new Java function (factorial in our setting). He calls ML4PG, which discovers common patterns among these proofs and relevant lemmas (around 150 training examples in total). The suggested clusters indeed helped to advance the proofs of properties a) and b) for the Java byte code of the factorial function.

This is the first thorough and systematic evaluation of ML4PG; note that [KHG13] focused mainly on the user interface and contained very simple examples. The case studies presented here convince us that when ML4PG statistically discovers proof clusters, it does actually find meaningful, non-trivial and interesting patterns in proofs across different libraries, theories and users. This kind of proof analysis can speed up proof development by suggesting reusable proof strategies. ML4PG works in the background of Proof General and, if called, provides clustering results almost instantly; thus, it can be used interactively, as a handy tool on request. Finally, it may be used for educational purposes, as automated proof-pattern recognition may help to smooth the learning curve, see User Scenarios 3 and 5.

ML4PG and all examples presented in this paper are available in [HK12].

Related Work. ML4PG’s originality is two-fold, as it can be compared to alternative methods of using machine-learning in automated theorem proving, as well as to Coq/SSReflect tools allowing interactive pattern-search.

Related work on using machine learning in ITPs concerns hints in lemma generation for Isabelle/HOL [JDB11], proof strategy discovery in Isabelle/HOL [BB05], speeding up proof automation in HOL Light [ku12] and statistical tactic analysis in Isabelle [Duncan02]. Compared to these tools, we use unsupervised, rather than supervised, learning; and we do not use sparse machine-learning methods. (See also [KHG13, HK13] for a detailed comparison of different machine-learning tools applied in various theorem provers.) We do not have a quantitative target when it comes to improving the interactive proof-building experience: speed-up in automated proof search or the number of automatically proven theorems is no longer the main criterion of success. Instead, the user experience is the main parameter we target. We generally follow the “qualitative” intuition that ML4PG, being an interactive hint generator, must provide interesting and non-trivial hints on the user’s demand, and should be flexible and fast enough to do so in real time, at any stage of the proof, and relative to any chosen proof library.

Compared to some of the above approaches, ML4PG does not only analyse lemma statements, but also involves the user’s tactics and user-defined proof steps in the statistical proof-pattern recognition process. This feature also makes ML4PG sensitive (or else adaptable) to proof styles innate to a certain user, research community, or subject area (cf. Sections 3–7). In illustration of this point, User Scenario 2 and Section 4 consider cases when different lemma statements have similar proofs; User Scenario 4 and Section 6 discuss cases when similar lemma statements require completely different proof strategies.

Compared to symbolic methods of proof-pattern search in Coq, e.g. Search, SearchPattern, SearchAbout, SearchRewrite [Coq, SSReflect] and Whelp [AspertiGCTZ04], ML4PG’s originality is in introducing statistical pattern recognition into the rich family of existing searching mechanisms in Coq/SSReflect. Unlike symbolic pattern search, ML4PG can discover “unexpected” proof patterns that go beyond the patterns the user would try as a searching template when using symbolic pattern-search facilities. Whereas the existing Coq searching tools try to match the user-provided template with other lemma/theorem statements, ML4PG takes into consideration the proof statistics in conjunction with the lemma shapes. These two features – pattern search without a pre-defined template and attention to the various proof parameters – allow ML4PG to achieve results often orthogonal to symbolic pattern search. Section 4 illustrates this new view of proof-pattern search.

2 ML4PG

In this section, we present the main functionality that ML4PG offers to the user. ML4PG works with Coq and its SSReflect dialect, and it does not assume any machine-learning knowledge from the user. The guidance it provides may come in different forms. The user may prefer the statistical hint to be related to the current proof-step (cf. User Scenarios 2, 3 and 5), or give information about proof-patterns arising in a library irrespective of the current proof-step (cf. User Scenarios 1 and 4). The user may choose to data-mine only the current library, or a number of proof-libraries coming from different domains or different users. Finally, the user may wish to experiment with proof clusters of different sizes or with different machine-learning algorithms, see Tables 3, 5, 6, 10. These choices are accommodated within ML4PG, see [KHG13] for a detailed description of the user interface.

ML4PG functionality is achieved in three stages:

  • (F.1) it works in the background of Proof General, extracting low-level features from proofs in Coq/SSReflect;

  • (F.2) it automatically sends the gathered statistics to a chosen machine-learning interface and triggers the execution of a clustering algorithm of the user’s choice;

  • (F.3) it post-processes the results given by the machine-learning tool, and displays families of related proofs to the user.

Stage F.1 is devoted to collecting statistics from proofs. The discovery of statistically significant features in data is a research area of its own in machine learning, known as feature extraction, see [Bishop]. Statistical machine-learning algorithms classify given examples seen as points in an n-dimensional space, where n is the maximum number of features each example may be characterised by. Irrespective of the particular feature extraction algorithm used, most pattern-recognition tools [Bishop] require that the number of selected features is limited and fixed – the exception to this is a special class of methods called “sparse” methods [Blum92].

ML4PG has its own feature extraction method that collects statistics from the interaction between the user and the prover. The feature extraction is done at the time of the interactive proof construction in the current library or during the Coq compilation for an external library. The feature extraction method captures information from proofs based on the correlation of a few chosen parameters within five proof steps. For each proof step, the parameters are:

  • the names and the number of tactics used in one command line,

  • types of the tactic arguments;

  • relation of the tactic arguments to the (inductive) hypotheses or library lemmas,

  • three top symbols in the term-tree of the current subgoal, and

  • the number of subgoals each tactic command-line generates.

When the correlation of these few parameters is taken within a few proof-steps, the arising statistics reveal patterns that can tell a lot about the “meta” proof strategy expressed by the tactics and subgoals. The details and discussion of this feature-extraction method can be found in [KHG13], and the new experimental extensions of it are available in [HK12]. We will not focus on the technical details of the ML4PG feature extraction here, but rather concentrate in the coming sections on proving the point that these simple statistical parameters (40 for one proof patch of five possibly composite proof steps) can indeed capture some essential proof-strategies, interesting and helpful enough from the user’s perspective.
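To make the shape of such feature vectors concrete, the following Python sketch shows one entirely hypothetical way to turn a five-step proof patch into a fixed-length vector of 40 numbers (8 parameters per step). The encodings and helper names are invented for illustration; they are not ML4PG's actual feature extraction, only a model of it under the parameters listed above.

```python
# Toy numeric encodings of tactic names and goal symbols (assumed, not ML4PG's).
TACTIC_IDS = {"move": 1, "elim": 2, "rewrite": 3, "case": 4, "by": 5}
SYMBOL_IDS = {"=": 1, "++": 2, "+": 3, "forall": 4}

def encode_step(tactics, arg_type, top_symbols, n_subgoals):
    """Encode one proof step as 8 numbers: number of tactics, ids of up to two
    tactics, an argument-type code, three top goal symbols, and the number of
    subgoals the command line generates."""
    tac_ids = [TACTIC_IDS.get(t, 0) for t in tactics[:2]]
    tac_ids += [0] * (2 - len(tac_ids))              # pad to two tactic slots
    syms = [SYMBOL_IDS.get(s, 0) for s in top_symbols[:3]]
    syms += [0] * (3 - len(syms))                    # pad to three symbol slots
    return [len(tactics)] + tac_ids + [arg_type] + syms + [n_subgoals]

def encode_patch(steps):
    """A proof patch is five consecutive steps; shorter proofs are padded with
    zero vectors so every patch yields a vector of the same length (5*8 = 40)."""
    vec = []
    for i in range(5):
        vec += encode_step(*steps[i]) if i < len(steps) else [0] * 8
    return vec

# A two-step patch, padded to five steps.
patch = [(["move"], 1, ["forall", "="], 1),
         (["elim", "rewrite"], 2, ["="], 2)]
v = encode_patch(patch)   # a 40-dimensional feature vector
```

The point of the fixed length is that every proof patch, whatever its origin, lands in the same 40-dimensional space, which is what the clustering algorithms of the next stage require.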

Once all features are extracted, ML4PG is ready to communicate with machine-learning interfaces (Stage F.2). ML4PG is built to be modular – that is, the feature extraction is first completed within the Proof General environment, where the data is gathered in the format of hash tables, and then these tables are converted to the format of the chosen machine-learning tool. In [KHG13], we connected ML4PG to several machine-learning algorithms available in Matlab [Matlab] and Weka [Weka]; the results we obtained with the two systems were similar, and Weka has the advantage of being open-source software; hence, we use only Weka throughout this paper, but see [KHG13] for a discussion of Matlab facilities.

ML4PG offers a choice of pattern-recognition algorithms. ML4PG is connected only to clustering algorithms [Bishop] – a family of unsupervised learning methods. Unsupervised learning is chosen when no user guidance or class tags are given to the algorithm in advance: in our case, we do not expect the user to “tag” the library proofs in any way. Clustering techniques divide data into n groups of similar objects (called clusters), where the value of n is provided by the user. There are several clustering algorithms available in Weka (K-means, FarthestFirst and Expectation Maximisation, in short E.M.), and the user can select the algorithm using the ML4PG menu included in the Proof General interface. We illustrate the effect of changing clustering algorithms in Tables 3, 5, 6 and 10.
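To illustrate what clustering does with the feature vectors, here is a minimal K-means implementation in pure Python. ML4PG delegates this step to Weka, so this is only a sketch of the underlying algorithm on a toy data set, not ML4PG's or Weka's code.

```python
import random

def kmeans(points, k, iters=50, seed=0):
    """Plain K-means: pick k starting centroids, then alternately assign each
    point to its nearest centroid and move each centroid to the mean of its
    cluster, until the assignment stabilises."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            clusters[i].append(p)
        new = [[sum(xs) / len(xs) for xs in zip(*cl)] if cl else centroids[i]
               for i, cl in enumerate(clusters)]
        if new == centroids:    # assignment stable: converged
            break
        centroids = new
    return clusters

# Two obvious groups of 2-dimensional "feature vectors".
pts = [[0.0, 0.1], [0.1, 0.0], [0.0, 0.0],
       [5.0, 5.1], [5.1, 5.0], [5.0, 5.0]]
groups = kmeans(pts, 2)
```

On real ML4PG data the points are the 40-dimensional proof-patch vectors rather than 2-dimensional pairs, but the mechanics are the same.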


As will be illustrated in the later sections, various numbers of clusters can be useful: this may depend on the size of the Coq library, and on existing similarities between the proofs. ML4PG has its own algorithm that determines the optimal number of clusters interactively, based on the library size. As a result, the user does not provide the value of n directly, but just decides on granularity in the ML4PG menu, by selecting a value between 1 and 5, where 1 stands for low granularity (producing fewer large clusters) and 5 stands for high granularity (producing many smaller clusters). Given a granularity value g, ML4PG computes the number of clusters n as a function of g and the size of the library.

It is worth mentioning that it is the nature of statistical methods to produce results with some probability, without a guarantee that a certain cluster will be found for a certain library. However, ML4PG ensures the quality of the output in several different ways (Stage F.3). First of all, the results are not taken from one random run of a clustering algorithm – instead, ML4PG’s output shows a digest of clustering results coming from 200 runs of the clustering algorithm. The 200 runs were experimentally found to be optimal for noticing important statistics in the ML4PG setting. Only clusters that appear frequently enough are displayed to the user, and there is a way to manipulate the frequency threshold within ML4PG. Another measure is a proximity value assigned by clustering algorithms to every term in a cluster – the value ranges from 0 to 1, and indicates the certainty that the given example belongs to the cluster. This proximity value is also taken into account by ML4PG before the results are shown. If a lemma is contained in several clusters, the proximity and frequency values are used to determine the one “most reliable” cluster to display.
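The aggregation over repeated runs can be sketched as follows. The function names are hypothetical and the proximity-based tie-breaking is omitted for brevity; the sketch only shows the idea of keeping clusters that recur across many randomly seeded runs.

```python
from collections import Counter
import random

def one_run(points, k, seed, iters=30):
    """One randomly seeded K-means run, returning clusters as frozensets of
    point indices (so identical groupings from different runs compare equal)."""
    rng = random.Random(seed)
    cents = [points[i] for i in rng.sample(range(len(points)), k)]
    groups = [[] for _ in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for i, p in enumerate(points):
            c = min(range(k),
                    key=lambda j: sum((a - b) ** 2 for a, b in zip(p, cents[j])))
            groups[c].append(i)
        cents = [[sum(vals) / len(vals) for vals in zip(*(points[i] for i in g))]
                 if g else cents[j] for j, g in enumerate(groups)]
    return [frozenset(g) for g in groups if g]

def stable_clusters(points, k, runs=200, threshold=0.6):
    """Cluster the data `runs` times with different seeds and keep only the
    groupings that appear in at least `threshold` of the runs."""
    seen = Counter()
    for seed in range(runs):
        for cluster in one_run(points, k, seed):
            seen[cluster] += 1
    return [c for c, n in seen.items() if n / runs >= threshold]

pts = [[0.0, 0.0], [0.1, 0.1], [0.0, 0.1],
       [9.0, 9.0], [9.1, 9.0], [9.0, 9.1]]
stable = stable_clusters(pts, 2, runs=50)
```

Here both well-separated groups recur in essentially every run, so both pass the frequency threshold; a cluster that appeared in only a few runs would be discarded as noise, which is the effect the frequency threshold has in ML4PG.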

We refer to the ML4PG user manual [HK12] for a more detailed description of how to use the tool.

3 User scenario 1. Detecting patterns in early stages of the development

Users of ITPs usually start their developments by loading some libraries. Those libraries contain definitions, lemmas and theorems that will be used as background theory during the proof process. Some of those libraries are specific to concrete theories, but others are common to almost every development. The common libraries contain strategies and definitions that can be extrapolated to other contexts; however, detecting the lemmas that follow a concrete proof strategy can be a challenge. In this first scenario, we study the patterns that appear in the SSReflect library [SSReflect].

The second purpose of this section is to set terminology and the general style of statistical proof-pattern analysis we will use throughout other sections.

The SSReflect library extends the Coq proof language and consists of 7 files containing basic theories about: natural numbers, lists, booleans, functions, finite types, choice types and types with a decidable equality. The library contains a total of 1404 theorems; therefore, a manual inspection of these theorems to detect patterns is infeasible. In our first scenario, we test how ML4PG can be used to detect patterns in the SSReflect library.

We analyse the clusters produced from the SSReflect library using the K-means algorithm and the granularity value that produced the best results in our experiments. ML4PG discovers 280 clusters with those parameters. In 45% of those clusters (126 clusters), all the lemmas belong to the same library. We call a cluster homogeneous if it contains lemmas and theorems from one library, and heterogeneous if it contains objects from different libraries.

The mean size of these homogeneous clusters is 4 elements, and the similarities between the lemmas of a cluster can be easily spotted in most cases. These 126 clusters can be classified as follows.

  • One group of clusters consists of lemmas about related functions.

    Example 3.1.

    Examples of this kind of cluster include ones about the max and min functions (ssrnat library), for instance the cluster containing the two following lemmas:

    Lemma maxn_mulr : right_distributive muln maxn.
    Lemma minn_mulr : right_distributive muln minn.

    the boolean operations and and or (ssrbool library), e.g. the one that consists of the lemmas:

    Lemma andbb : idempotent andb.
    Lemma orbb : idempotent orb.

    and take and drop (take takes the first elements of a list and drop removes the first elements of the list – seq library), e.g. the one that consists of the lemmas:

    Lemma map_take s : map (take n0 s) = take n0 (map s).
    Lemma map_drop s : map (drop n0 s) = drop n0 (map s).
  • A second group of clusters contains lemmas that follow the same proof structure and that share some common auxiliary results.

    Example 3.2.

    Examples of this kind of cluster appear in the seq library, for instance the cluster that contains the following lemmas:

    Lemma has_map a s : has a (map s) = has (preim f a) s.
    Proof. by elim: s => //= x s ->. Qed.
    Lemma all_map a s : all a (map s) = all (preim f a) s.
    Proof. by elim: s => //= x s ->. Qed.
    Lemma count_map a s : count a (map s) = count (preim f a) s.
    Proof. by elim: s => //= x s ->. Qed.

    and also for the nat library:

    Lemma addnCA : left_commutative addn.
    Proof. move=> m n p; elim: m => //= m; rewrite addnS => <-. Qed.
    Lemma mulnDl : left_distributive muln addn.
    Proof. by move=> m1 m2 n; elim: m1 => //= m1 IHm; rewrite -addnA -IHm. Qed.
  • A third group of clusters consists of theorems that are used in the proofs of other theorems of the same cluster.

    Example 3.3.

    ML4PG discovers that the following two lemmas are in the same cluster:

    Lemma altP : alt_spec b.
    Lemma boolP : alt_spec b1 b1 b1. Proof. exact: (altP idP). Qed.
  • A fourth group of clusters is formed by view lemmas, an important kind of lemma used in SSReflect to apply boolean reflection [SSReflect].

    Example 3.4.

    ML4PG finds a cluster with the following two view lemmas coming from the fintype library:

    Lemma unit_enumP : Finite.axiom [::tt]. Proof. by case. Qed.
    Lemma bool_enumP : Finite.axiom [:: true; false].
    Proof. by case. Qed.
  • A fifth group of clusters contains equivalence lemmas that are proven just by simplification.

    Example 3.5.

    An example of this kind of cluster is the one that contains the following lemmas:

    Lemma multE : mult = muln.     Proof. by []. Qed.
    Lemma mulnE : muln = muln_rec. Proof. by []. Qed.
    Lemma addnE : addn = addn_rec. Proof. by []. Qed.
    Lemma plusE : plus = addn. Proof. by []. Qed.
  • Finally, one group of clusters consists of lemmas that are solved using analogous lemmas.

    Example 3.6.

    An example is the cluster containing the following two lemmas.

    Lemma addnAC : right_commutative addn.
    Proof. by move=> m n p; rewrite -!addnA (addnC n). Qed.
    Lemma subnAC : right_commutative subn.
    Proof. by move=> m n p; rewrite -!subnDA addnC. Qed.

    Namely, lemma subnDA (forall (a b c : nat), a - (b + c) = (a - b) - c) can be obtained automatically from lemma addnA (forall (a b c : nat), a + (b + c) = a + b + c) using techniques like lemma analogy [lpar13].

In the case of heterogeneous clusters (clusters that include lemmas from different libraries), ML4PG discovers 154 clusters (55% of the total). The size of these clusters is bigger than in the homogeneous case; namely, the mean size is 8 lemmas per cluster. These clusters can be classified as follows.

  • One group of clusters contains lemmas that state properties applicable to several operators from different libraries.

    Example 3.7.

    ML4PG discovers a cluster containing lemmas about the associativity of the addition of natural numbers (addn function) and the associativity of the concatenation of lists (++ operator).

    Lemma catA s1 s2 s3 : s1 ++ s2 ++ s3 = (s1 ++ s2) ++ s3.
    Proof. by elim: s1 => //= x s1 ->. Qed.
    Lemma addnA : associative addn.
    Proof. by move=> m n p; rewrite (addnC n) addnCA addnC. Qed.
  • A second group of clusters consists of lemmas related to operations over the base cases of types.

    Example 3.8.

    As an example of this kind of cluster, ML4PG discovers a strong correlation among the following four lemmas:

    Lemma andTb : left_id true andb. Proof. by []. Qed.
    Lemma orFb : left_id false orb. Proof. by []. Qed.
    Lemma mul0n : left_zero 0 muln. Proof. by []. Qed.
    Lemma sub0n : left_zero 0 subn.    Proof. by []. Qed.
  • A third group of clusters comes from lemmas whose proofs rely on fundamental lemmas.

    Example 3.9.

    ML4PG discovers a cluster with the following two lemmas about rot (that rotates a list l left n times) and expn (the exponentiation function).

    Lemma rot0 s : rot 0 s = s.
    Proof. by rewrite /rot drop0 take0 cats0. Qed.
    Lemma expn_eq0 m e : (m ^ e == 0) = (m == 0) && (e > 0).
    Proof. by rewrite !eqn0Ngt expn_gt0 negb_or -lt0n. Qed.

    At first sight, it seems that the only similarity between these two lemmas is that they use only rewriting rules in their proofs; however, if we carefully inspect the lemmas that are used for rewriting, we notice that most of them are fundamental lemmas about nil (the base constructor for the list type) and 0 (the base constructor for the nat type).

  • Finally, one group of clusters combines lemmas from the libraries about lists and natural numbers – note that the definitions of lists and natural numbers are quite similar: both have one base case and a recursive one, so several lemmas are solved by applying induction and using the inductive hypothesis.

    Example 3.10.

    An example of this kind of cluster is the one that consists of the following lemmas:

    Lemma catrev_catr s t u : catrev s (t ++ u) = catrev s t ++ u.
    Proof. by elim: s t => //= x s IHs t; rewrite -IHs. Qed.
    Lemma mulnDl : left_distributive muln addn.
    Proof. by move=> m1 m2 n; elim: m1 => //= m1 IHm;
      rewrite -addnA -IHm. Qed.
    Lemma mem_cat x s1 s2:
        (x \in s1 ++ s2) = (x \in s1) || (x \in s2).
    Proof. by elim: s1 => //= y s1 IHs; rewrite !inE /= -orbA -IHs. Qed.

    In all these lemmas, we can see that induction is applied, and after the use of some rewriting rules the inductive hypothesis is applied to finish the proof.

The similarity within most clusters can easily be discovered just by inspecting the statements of the lemmas and their proofs. However, clustering is a statistical tool, and in some cases there is no clear correlation among the lemmas of a cluster. In most of those cases, the clusters contain more than 10 elements; we can discover patterns among subsets of those clusters, but it is difficult to find a common pattern followed by all the lemmas.

The above results show that ML4PG can be useful for detecting patterns in the early stages of a development. Namely, it can be used to find relations among functions and their lemmas, common strategies followed in a library, and fundamental lemmas applied in several proofs. Besides, if a user knows one library (e.g. the library defining natural numbers), ML4PG can show similarities between lemmas about natural numbers and lists, facilitating the use of the new library based on the user’s previous knowledge.

4 User scenario 2. Detecting patterns across libraries during a proof

The second case study concerns the discovery of proof patterns in mathematical proofs across formalisations of apparently disjoint mathematical theories: Linear Algebra, Combinatorics and Persistent Homology. In this scenario, we use statistically discovered proof patterns to advance the proof of a given "problematic" lemma: a few initial steps in its proof are clustered against several mathematical libraries.

In this section, we deliberately take lemmas belonging to very different Coq libraries. Lemma 4.1 states a result about nilpotent matrices [BR91] (a square matrix M is nilpotent if there exists an n such that M^n = 0). Lemma 4.2 is a basic fact about summations. Finally, Lemma 4.3 is a generalisation of the fundamental lemma of Persistent Homology [HCMS12].

Lemma 4.1.

Let M be a square matrix and n be a natural number such that M^n = 0; then, (1 - M) *m \sum_(0<=i<n) M^i = 1.

Lemma 4.2.

If f is a function such that f(0) = 0, then \sum_(0<=i<n) (f(i+1) - f(i)) = f(n).

Lemma 4.3.

Let , then

When proving Lemma 4.1, a user can get stuck after a few standard proof steps, such as the ones presented in Table 1. At this point it is difficult, even for an expert user, to get the intuition that he can reuse the proofs of Lemmas 4.2 and 4.3. There are several reasons for this. First of all, the formal proofs of these lemmas live in different libraries: the proof of Lemma 4.1 in the library about matrices, the proof of Lemma 4.2 in the library of basic results about summations, and the proof of Lemma 4.3 in the library about Persistent Homology. Thus, it is difficult to establish a conceptual connection among them. Moreover, although the three lemmas involve summations, the types of the terms of those summations are different; therefore, search based on types or keywords would not help. Even a search for all the lemmas involving summations does not provide a clear suggestion, since the number of such lemmas is considerable – too many to handle manually.

Goals and Subgoals Proof-Steps (Tactics)
move => M m nilpotent.
rewrite big_distrr mulmxBr mul1mx.
case : n.
by rewrite !thinmx0.
Table 1: First steps of the proof of Lemma 4.1, where the user stops and invokes ML4PG for a hint.

However, if Lemmas 4.2 and 4.3 are suggested when proving Lemma 4.1, the expert would be able to spot the following common proof pattern.

Proof Strategy 4.4.

Apply case analysis on n.

  1. Prove the base case (a simple task).

  2. Prove the successor case:

    1. expand the summation,

    2. cancel the terms pairwise,

    3. the only terms remaining after the cancellation are the first and the last one.
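Steps 2.1–2.3 amount to the standard telescoping identity, written out here for a generic function f:

```latex
\sum_{0 \le i < n} \bigl(f(i+1) - f(i)\bigr)
  = \bigl(f(1) - f(0)\bigr) + \bigl(f(2) - f(1)\bigr) + \cdots + \bigl(f(n) - f(n-1)\bigr)
  = f(n) - f(0)
```

Every intermediate term occurs once with each sign, so after pairwise cancellation only the first and last terms survive.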

We also include the following lemma, suggested by ML4PG. At first sight, the proof of this lemma is an unlikely candidate to follow Proof Strategy 4.4, since the statement of the lemma does not involve summations. However, inspecting its proof (see Table 2), we can see that it uses the summation \sum_(0<=i<m.+1) (pot_matrix M i) as the witness for the existential and then follows Proof Strategy 4.4.

Goals and Subgoals Proof-Steps (Tactics)
move => M m nilpotent.
  \sum_(0<=i<m.+1) (pot_matrix M i).
rewrite big_distrl mulmxrB mulmx1.
case : n.
by rewrite !thinmx0.
Table 2: First steps of the proof of Lemma 4.5. Note the correlation between this proof and the proof of Lemma 4.1 in Table 1.
Lemma 4.5.

Let M be a nilpotent matrix; then, there exists a matrix N such that N *m (1 - M) = 1.

Discovering that, across 758 lemmas and 5 libraries, the four lemmas given above follow Strategy 4.4 is our next benchmarking task for ML4PG. Table 3 shows the results that ML4PG obtains if the user varies the clustering algorithms and the size of clusters when calling ML4PG within the proof environment; we also highlight the exact solution to our benchmark task, given by calling the K-means algorithm with granularity 4.

Algorithm: (g = 1) (g = 2) (g = 3) (g = 4) (g = 5)
K-means 76 51 16 4 2
E.M. 26 51 41 11 3
FarthestFirst 81 48 31 25 21
Table 3: Clustering experiments discovering Proof Strategy 4.4. When the granularity g is chosen by the user, ML4PG dynamically calculates the number of clusters. Choosing Lemma 4.1 for pattern-search, the table shows the size of the single cluster that ML4PG displays to the user for every choice of g. The table shows the sizes of the clusters containing Lemma 4.1, Lemma 4.2, Lemma 4.3 and Lemma 4.5. Note that, with all variations of the learning algorithms and parameters, ML4PG consistently groups the correct lemmas into clusters, albeit with a varied degree of precision. Note also the effect of the granularity parameter: the smallest granularity value produces big clusters that are difficult to inspect; on the contrary, the biggest value produces small clusters that might not contain enough elements to spot a pattern. The most accurate result for this example is obtained using the K-means algorithm and 4 as granularity parameter.

Table 3 gives an insight into how the granularity parameter can be used for pattern-recognition within ML4PG. One can use a "bottom-up" approach to detect patterns, starting with the granularity value of 1 and increasing that value in successive calls. Using the default values and the K-means algorithm, ML4PG obtains 15 suggestions related to lemmas about summations, including Lemmas 4.2 and 4.3. Increasing the granularity level to 4, ML4PG discovers that Lemma 4.1 is similar to Lemmas 4.2, 4.3 and 4.5. Finally, increasing the granularity level to the maximum value of 5, ML4PG just discovers that Lemmas 4.1 and 4.5 are similar.

Once the proof analogy is found (all four proof patches taken by ML4PG follow the initial steps of Proof Strategy 4.4), the user can use the remaining proof steps of Lemma 4.5 as a template for reconstructing the remaining proof of Lemma 4.1. The heterogeneous cluster obtained in this scenario belongs to the category of clusters consisting of theorems that share the same proof structure as well as an auxiliary lemma (e.g. the lemma used to expand the summation).

5 User scenario 3. Proof patterns for importing proof methods

There is a trend in ITPs to develop general purpose methodologies to aid in the formalisation of a family of related proofs. However, although the application of a methodology is straightforward for its developers, it is usually difficult for an external user to decipher the key results to import such a methodology into a new development. Therefore, tools which can capture methods and suggest appropriate lemmas based on proof patterns would be valuable. In our third scenario, we show how ML4PG can be useful in this context with an example coming from the formal proof of correctness of a Computer Algebra algorithm.

Most algorithms in modern Computer Algebra systems are designed to be efficient, and this usually means that their verification is not an easy task. In order to overcome this problem, a methodology based on the idea of refinements was presented in [DMS12], and was implemented as a new library CoqEAL, built on top of the SSReflect libraries. The approach to formalise efficient algorithms followed in [DMS12] can be split into three steps:

  • define the algorithm relying on rich dependent types, as this will make the proof of its correctness easier;

  • refine this definition to an efficient algorithm described in terms of high-level data structures; and,

  • implement it on data structures which are closer to machine representations.

The CoqEAL methodology is clear, and its authors have shown that it can be extrapolated to different problems [CMS12]. Nevertheless, this library contains 720 lemmas, and searching for proof strategies inside it is not a simple task if undertaken manually.

In order to illustrate this, let us consider the formalisation, using the CoqEAL methodology, of a fast algorithm to compute the inverse of triangular matrices over a field with 1s in the diagonal. SSReflect already implements the matrix inverse, relying on rich dependent types, in the invmx function; thus, we only need to focus on the second and third steps of the CoqEAL methodology.

Using high-level data structures, we can define the algorithm below, specially designed to efficiently compute the inverse of triangular matrices with 1s in the diagonal. Let M be a square triangular matrix of size n with 1s in the diagonal; then we recursively define a function called fast_invmx as follows.

  • If n = 0, then fast_invmx(M) = 1%M (where 1%M is the notation for the identity matrix in SSReflect).

  • Otherwise, decompose M into a block matrix with four components: the top-left element, which is 1; the top-right row vector, which is null; the bottom-left column vector c; and the bottom-right matrix M'. Then define fast_invmx(M) as the block matrix whose top-left element is 1, whose top-right row vector is null, whose bottom-left column vector is -(fast_invmx(M') *m c), and whose bottom-right matrix is fast_invmx(M') – where *m is the notation for matrix multiplication in SSReflect.
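The recursive block definition above can be sketched concretely as follows (a Python illustration on lists of lists – our own sketch, not the CoqEAL/SSReflect code; the name fast_invmx is reused for the hypothetical transcription):

```python
# Sketch: inverse of a lower-triangular matrix with 1s on the diagonal,
# following the block recursion M = [[1, 0], [c, M']],
# fast_invmx(M) = [[1, 0], [-(fast_invmx(M') * c), fast_invmx(M')]].

def fast_invmx(M):
    """Inverse of a lower unit-triangular matrix M (list of rows)."""
    n = len(M)
    if n == 0:
        return []
    Mp = [row[1:] for row in M[1:]]      # bottom-right block M'
    c = [row[0] for row in M[1:]]        # bottom-left column c
    Np = fast_invmx(Mp)                  # recursive inverse of M'
    # bottom-left column of the inverse: -(fast_invmx(M') * c)
    col = [-sum(Np[i][j] * c[j] for j in range(len(c)))
           for i in range(len(c))]
    return [[1] + [0] * (n - 1)] + [[col[i]] + Np[i] for i in range(n - 1)]
```

For instance, fast_invmx([[1, 0], [2, 1]]) yields [[1, 0], [-2, 1]], and multiplying the two matrices gives the identity.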

Subsequently, we should prove the equivalence between the functions invmx and fast_invmx. Proving this result is not trivial, due to the different nature of the two algorithms: the former is a general algorithm that computes the inverse of matrices using adjugate matrices and determinants; the latter is an ad hoc efficient algorithm for the special case of triangular matrices, which takes advantage of the shape of those matrices to obtain their inverse.

In the CoqEAL library, there are just three lemmas devoted to proving the equivalence between a matrix algorithm and its efficient version; namely, those lemmas are related to the multiplication, the rank and the determinant of matrices. However, the strategies followed to prove those equivalence lemmas are ad hoc to the concrete algorithms, and the only common step that we could reuse is the application of induction on the size of the matrix. In this situation, after applying induction as shown in Table 4, it makes sense to ask ML4PG for a hint before trying to tackle the proof by brute force.

Goals and Subgoals Proof Steps (Tactics)
forall M:M_n, lower1 M -> fast_invmx M = invmx M
elim : n.
forall M:M_0, lower1 M -> fast_invmx M = invmx M
by move => M0 lower1;
  rewrite !thinmx0.
forall n : nat,
(forall M : M_n, lower1 M -> fast_invmx M = invmx M) ->
forall M : M_n+1, lower1 M -> fast_invmx M = invmx M
move => n IHm M lower1
fast_invmx M = invmx M
M *m fast_invmx M = 1
by rewrite fast_invmxE.
Lemma fast_invmxE : forall M:M_n, lower1 M -> M *m fast_invmx M = 1. Auxiliary lemma
Table 4: Proof of equivalence between fast_invmx and invmx. Above the dotted line: the first four steps, after which the user invokes ML4PG. Below the line: the proof steps taken by analogy with lemmas suggested by ML4PG. Note the introduction of lemma fast_invmxE at the bottom of the table; this lemma is introduced by analogy with Lemma 5.1. The predicate lower1 M indicates that the matrix M is a triangular matrix with 1s in the diagonal.

For our experiments in this scenario, we consider both the matrix library of SSReflect and the CoqEAL library for clustering; together they involve 1118 lemmas.

We start with high granularity values (see Table 5). However, none of the clustering algorithms finds a useful proof cluster for our proof – K-means and FarthestFirst return an empty cluster, and E.M. returns a cluster with 48 elements (so it is time-consuming to explore all those lemmas searching for a pattern). Therefore, we decrease the granularity parameter to 3 in order to obtain more general clusters with lower correlation. Using these settings, ML4PG suggests 6 lemmas. Three of them are the ones about efficient multiplication, rank and determinant; but, as we have previously said, they do not provide any hint to finish our proof. In addition, the following unicity lemma is provided as a suggestion.

Lemma 5.1.

Let M and N be two square matrices such that M *m N = 1 (where 1 is the identity matrix); then, N is the inverse of M. This result is proven in SSReflect in lemma invmx_uniq; see Table 4.

As we have seen in Section 3, one kind of cluster discovered by ML4PG contains theorems that use other theorems of the same cluster in their proofs. This is one of those cases: to prove the equivalence between invmx and fast_invmx, it is enough to prove that, given a triangular matrix M, M *m fast_invmx(M) = 1%M. This result is easy to prove:


Apply induction on the size of the matrix.

  • The base case is trivial.

  • In the inductive case, decompose M as above into the block matrix with top-left element 1, null top-right row vector, bottom-left column vector c and bottom-right matrix M'. Then M *m fast_invmx(M) is equal to the block matrix whose top-left element is 1, whose top-right row vector is null, whose bottom-left column vector is c - M' *m (fast_invmx(M') *m c), and whose bottom-right matrix is M' *m fast_invmx(M').

    Applying the inductive hypothesis (M' *m fast_invmx(M') = 1), the bottom-left block cancels and the result is proven.

Algorithm: (g = 1) (g = 2) (g = 3) (g = 4) (g = 5)
K-means 56 10 6 0 0
E.M. 0 240 48 48 48
FarthestFirst 0 0 0 0 0
Table 5: A series of clustering experiments discovering Lemma 5.1 in response to the clustering challenge of Section 5. In bold is the most useful cluster, which contains the equivalence lemmas and the unicity lemma. Note again that the K-means algorithm produces the best results. In this case, the values 4 and 5 for granularity do not provide any relevant cluster.

Therefore, ML4PG does not provide a proof pattern here, and the correlation of our current proof with the suggested lemmas is not high; but it has helped us to formulate an auxiliary lemma needed to finish our original proof.

Once we have proven the equivalence between the two matrix-inverse algorithms, we can focus on the third step of the CoqEAL methodology. It is worth mentioning that neither invmx nor fast_invmx can be used to actually compute the inverse of matrices: these functions cannot be executed, since the definition of matrices is locked in SSReflect to avoid triggering heavy computations during deduction steps. Using the third step of the CoqEAL methodology, we can overcome this pitfall. In particular, we implement the function cfast_invmx using lists of lists as the low-level data type for representing matrices.

To prove the equivalence between cfast_invmx and fast_invmx (in a lemma called cfast_invmxP), we can invoke ML4PG. Using the default values of ML4PG (the K-means algorithm and 3 as granularity value), ML4PG suggests three lemmas which prove the equivalence between executable and non-executable algorithms computing the rank, the determinant and the fast multiplication of matrices (see Table 6). Inspecting the proofs of these three lemmas, the user can discover the following strategy, which can be applied in general to prove the equivalence between executable and non-executable algorithms on matrices.

Proof Strategy 5.2.

Apply the morphism lemma to change the representation from abstract matrices to executable ones. Subsequently, apply the translation lemmas for the operations involved in the algorithm – translation lemmas are results which state the equivalence between the executable and the abstract counterparts of several operations related to matrices.

Algorithm: (g = 1) (g = 2) (g = 3) (g = 4) (g = 5)
K-means 4
Table 6: A series of clustering experiments discovering Proof Strategy 5.2. The table shows the sizes of clusters containing: a) Lemma cfast_invmxP, b) the lemma about rank (rank_elim_seqmxE), c) the lemma about fast multiplication (fast_mult_seqmxP), and d) the lemma about determinant (det_seqmxP). Again, ML4PG consistently groups the correct lemmas into clusters for various choices of algorithms and granularity; in addition, the default values (the K-means algorithm and 3 as granularity value) provide the most accurate cluster, containing only translation lemmas (in bold); cf. Tables 3 and 5.
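The shape of such translation lemmas can be illustrated outside Coq (a Python property-check sketch; the function names are our own, not the CoqEAL seqmx API). An abstract matrix is modelled as a function on indices, its executable refinement as a list of lists, and a "translation lemma" states that translating commutes with the operation:

```python
# Abstract representation: a function from (i, j) to a value.
# Executable representation: a list of lists ("seqmx"-style).

def to_seqmx(f, n):
    """Translate an abstract n x n matrix into lists of lists."""
    return [[f(i, j) for j in range(n)] for i in range(n)]

def mulmx(f, g, n):
    """Abstract matrix product, itself an abstract matrix."""
    return lambda i, j: sum(f(i, k) * g(k, j) for k in range(n))

def mul_seqmx(A, B):
    """Executable matrix product on lists of lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# "Translation lemma" for multiplication, checked on one instance:
# translating then multiplying equals multiplying then translating.
n = 3
f = lambda i, j: i + 2 * j
g = lambda i, j: i * j + 1
assert to_seqmx(mulmx(f, g, n), n) == mul_seqmx(to_seqmx(f, n), to_seqmx(g, n))
```

In the Coq development such equations are proven once and for all; here the assertion merely checks one instance of the property.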

In this case, the cluster that is found is heterogeneous (its lemmas come from different libraries); however, it can be regarded as homogeneous if we view the lemmas as forming a single library of results proving the equivalence of non-executable and executable matrix algorithms. In particular, the cluster belongs to one of the categories of homogeneous clusters presented in Section 3: lemmas with the same proof structure that use some common results. As a conclusion, ML4PG can help to import proof methods into new developments in two different ways. First, ML4PG can detect typical proof patterns of the methodology (here, the proof pattern of translation lemmas). Second, in cases where there is no common proof strategy, ML4PG can suggest lemmas that help in the formulation of auxiliary results, making the proof development easier.

6 User scenario 4. ML4PG for detecting irrelevant libraries

An (abstract) sequential game can be represented as a tree with pay-off functions in the leaves, dictating the win or loss of each player if the game finishes there. Each internal node is owned by a player, and a play of a game is a path from the root to a leaf. A strategy is a game in which each internal node has chosen a child. A Nash equilibrium is a strategy in which no agent can change one or more of his choices to obtain a better overall result for himself. A strategy is a subgame perfect equilibrium if it represents a Nash equilibrium of every subgame of the original game.
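For concreteness, backward induction on a binary game tree (the procedure discussed below for [Ves06]) can be sketched as follows – a Python illustration, not the Coq formalisation; the tuple encoding is our own assumption, and ties are broken towards the left child:

```python
# A leaf carries a tuple of payoffs, one per player; an internal node
# is owned by the player who chooses a branch at that node.

def backward_induction(tree):
    """Return (strategy, payoffs): each node picks the child that
    maximises its owner's payoff, computed bottom-up."""
    if tree[0] == 'leaf':
        _, payoffs = tree
        return ('leaf', payoffs), payoffs
    _, owner, left, right = tree
    ls, lp = backward_induction(left)
    rs, rp = backward_induction(right)
    if lp[owner] >= rp[owner]:                 # owner prefers the left branch
        return ('node', owner, 'left', ls, rs), lp
    return ('node', owner, 'right', ls, rs), rp

# Player 0 moves at the root; the left branch leads to player 1's choice.
game = ('node', 0,
        ('node', 1, ('leaf', (3, 1)), ('leaf', (0, 2))),
        ('leaf', (2, 0)))
strategy, payoffs = backward_induction(game)
```

In this example player 1 would pick the (0, 2) leaf, so player 0 prefers the right branch and the resulting payoffs are (2, 0); the computed strategy is, by construction, a Backward-Induction equilibrium of the game.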

In this scenario, we use ML4PG to analyse two Coq libraries that formalise the result that every sequential game has a Nash equilibrium: for binary games (games where each internal node has two choices) [Ves06], and in the general case [nash]. Note that, unlike the other benchmarks presented throughout the paper, the files studied here are developed in plain Coq instead of SSReflect; ML4PG adapts to this change automatically.

It would be natural to assume that the proofs involved in the verification of the two results will be very similar, and thus one could potentially hope for proof-pattern re-use. However, close inspection of these libraries reveals that the actual proof strategies used in the two libraries are different. Without ML4PG, such a "negative" discovery would require the user's time and experience in comparing Coq proofs. We instead give it as a challenge to ML4PG, which takes only a few seconds to analyse the libraries. ML4PG loads the Coq files developed in [Ves06, nash] and a library about topological sorting [ler07] used in [nash]. These Coq files include 145 theorems, and we choose the K-means algorithm and the default granularity parameter to obtain clusters using ML4PG. ML4PG finds 32 clusters using those parameters, with a mean size of three elements per cluster. The question is: how can the user interpret these results when he sees those 32 sets of approximately three lemmas/theorems on the Proof General screen?

It can easily be seen from the ML4PG annotation of results that 21 of the 32 clusters are homogeneous clusters; thus, the similarity between proofs within one library is higher than across libraries. Starting with those homogeneous clusters, we notice that:

  • 8 clusters contain lemmas about related functions – related in the sense that their proofs use similar lemmas.

    Example 6.1.

    As an example of this kind of cluster, ML4PG discovers a cluster with two lemmas from [Ves06]: the first one (BI_Exists) states that for every game, there exists a strategy that makes the game have a Backward-Induction equilibrium (each player plays optimally at every node); the second lemma (NashEq_Exists) states the analogous result for Nash equilibrium. See Table 7 for the proofs of these two theorems.

  • 6 clusters consist of lemmas about one concrete function.

    Example 6.2.

    In [nash], there is a function called StratPref that, given an agent and two strategies, decides which one the agent prefers. ML4PG finds a cluster with two lemmas: the first one (StratPref_dec) states the decidability of the function; the second one states that the function produces an irreflexive relation.

  • 4 clusters contain theorems that use other theorems of the cluster in their proofs.

BI_Exists:

Theorem BI_Exists :
    forall g, exists s, BI s /\ g = s2g s.
Proof. deskolem_apply BI_fctExists. Qed.

Theorem BI_fctExists : exists F, forall g,
    BI (F g) /\ g = s2g (F g).
Proof.
exists compBI. intro g. split.
exact (compBI_is_BI g).
exact (s2g_inv_compBI g).
Qed.

NashEq_Exists:

Theorem NashEq_Exists :
    forall g, exists s, NashEq s /\ g = s2g s.
Proof. deskolem_apply NashEq_fctExists. Qed.

Theorem NashEq_fctExists : exists F, forall g,
    NashEq (F g) /\ g = s2g (F g).
Proof.
exists compBI. intro g. split.
apply BI_is_NashEq. exact (compBI_is_BI g).
exact (s2g_inv_compBI g).
Qed.
Table 7: Proofs of theorems BI_Exists and NashEq_Exists, coming from one library [Ves06]; and grouped together by ML4PG.
Binary case:

Lemma SGP_is_NashEq :
  forall s : Strategy, SGP s -> NashEq s.
Proof.
induction s.
unfold NashEq. intros _.
induction s'.
intros. unfold stratPO. unfold agentConv in H.
rewrite (H a). trivial.
unfold agentConv. intros. contradiction.
unfold SGP. intros [_ [_ done]]. trivial.
Qed.

General case:

Lemma SPE_is_Eq :
  forall s : Strat, SPE s -> Eq s.
Proof.
intros. destruct s; simpl in H; tauto.
Qed.
Table 8: Proof of the theorem stating that Subgame Perfect Equilibrium implies Nash Equilibrium. Left. Binary case. Right. General case. The lemma statements are very similar; however, the structure of the proofs is completely different; hence, ML4PG does not group these proofs together.

This quick analysis shows that some obvious grouping of proofs within each library was made by ML4PG. But, unless we are interested in a particular proof technique appearing in one of them, we direct our attention to patterns found across the libraries, hoping to find common proof methods across the developments.

In the case of heterogeneous clusters, all the clusters consist of lemmas about auxiliary functions (for instance, about different properties of lists) that are common to all the libraries under study. However, there is no correlation among the important theorems of these libraries; see Table 8. Even though [nash] is a generalisation of the work presented in [Ves06], the proofs for Nash equilibrium are completely different, for two main reasons. First, the data structures used in each development are very different, and therefore the lemmas about them do not have a strong correlation. Second, the proof approaches are completely different: one is based on a procedure called backward induction [Ves06], while the other is based on the fact that the preference relation of the players is acyclic [nash].

The results do not vary much when we change the clustering algorithm and the granularity values – reducing the granularity value produces bigger homogeneous clusters, but has little effect on the heterogeneous ones. As can be seen from this example, an ML4PG-based proof-pattern check can be an easy and fast way of learning that there are no recyclable patterns across the libraries.

Note that in this case study the total number of lemmas (145 theorems) is smaller than in the other case studies; the feature extraction mechanism of ML4PG automatically adapts to this, handling statistics of small data sets as well as of bigger ones.

7 User scenario 5. A team-based development

In the last scenario, we turn to team-based applications of Coq and ML4PG. For this purpose, we translated the ACL2 proofs of correctness of the Java Virtual Machine (JVM) [M03] into Coq/SSReflect. The JVM [JVM] is a stack-based abstract machine which executes Java bytecode. We have modelled an interpreter for JVM programs in Coq/SSReflect; from now on, we refer to our machine as "SJVM" (for SSReflect JVM).

An industrial scenario of interactive theorem proving may involve distributing the workload across a team, with a larger proportion of routine or repetitive cases. Here, inefficiency often arises when programmers use different notation to accomplish very similar tasks, so that a lot of work gets duplicated; see also [BHJM12]. We tested ML4PG in exactly such a scenario: we assume that a programming team is collectively developing proofs of soundness of specifications, and correctness of implementations, of Java bytecode for a dozen programs computing multiplication, powers, exponentiation and other functions on natural numbers. A new team member then tries to learn the important proof patterns while proving similar results for a new function – factorial.

Given a specific Java method, we can translate it to Java bytecode using a tool such as Sun Microsystems' javac. Such bytecode can be executed in SJVM provided a schedule (a list of thread identifiers indicating the order in which the threads are to be stepped), and the result is the state of the JVM at the end of the schedule. Moreover, we can prove theorems about the behaviour of the SJVM model when interpreting that bytecode.

Example 7.1.

The bytecode associated with the factorial program can be seen in Figure 1.

static int factorial(int n) {
  int a = 1;
  while (n != 0) {
    a = a * n;
    n = n - 1;
  }
  return a;
}

Fixpoint helper_fact (n a : nat) : nat :=
  match n with
  | 0 => a
  | S p => helper_fact p (n * a)
  end.

Definition fn_fact (n : nat) : nat := helper_fact n 1.

Figure 1: Factorial function. Left: Java program for computing the factorial of natural numbers. Centre: Java bytecode associated with the Java program. Right: tail-recursive version of the factorial function in Coq/SSReflect.
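The tail-recursive definition of Figure 1 (right) can be transcribed directly (a Python sketch for illustration):

```python
# Transcription of the Coq functions helper_fact / fn_fact:
# the accumulator a carries the product computed so far.

def helper_fact(n, a):
    # matches the Coq clause  S p => helper_fact p (n * a)
    return a if n == 0 else helper_fact(n - 1, n * a)

def fn_fact(n):
    # the wrapper initialises the accumulator at 1
    return helper_fact(n, 1)
```

For example, fn_fact(5) computes 5 * 4 * 3 * 2 * 1 = 120 through the accumulator, mirroring the iterative Java loop.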

The state of the SJVM consists of 4 fields: a program counter (a natural number), a set of registers called locals (implemented as a list of natural numbers), an operand stack (a list of natural numbers), and the bytecode program of the method being evaluated.
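To make the state/schedule description concrete, here is a toy Python sketch of such a machine (our own illustration: the instruction names and the bytecode listing are hypothetical, not the actual SJVM model or the javac output):

```python
# A state is (pc, locals, stack, program), as in the SJVM description.

def step(state):
    pc, loc, stk, prog = state
    ins = prog[pc]
    op = ins[0]
    if op == 'PUSH':                      # push a constant
        return (pc + 1, loc, [ins[1]] + stk, prog)
    if op == 'LOAD':                      # push the value of a local
        return (pc + 1, loc, [loc[ins[1]]] + stk, prog)
    if op == 'STORE':                     # pop the top into a local
        loc2 = loc[:ins[1]] + [stk[0]] + loc[ins[1] + 1:]
        return (pc + 1, loc2, stk[1:], prog)
    if op == 'MUL':                       # pop two values, push product
        return (pc + 1, loc, [stk[0] * stk[1]] + stk[2:], prog)
    if op == 'SUB':                       # pop a, b; push b - a
        return (pc + 1, loc, [stk[1] - stk[0]] + stk[2:], prog)
    if op == 'IFEQ':                      # pop; jump if the value is zero
        return ((ins[1] if stk[0] == 0 else pc + 1), loc, stk[1:], prog)
    if op == 'GOTO':
        return (ins[1], loc, stk, prog)
    raise ValueError(op)

def run(schedule, state):
    # a schedule drives the machine: one step per element,
    # stopping once the pc runs past the end of the program
    for _ in schedule:
        if state[0] >= len(state[3]):
            break
        state = step(state)
    return state

# Illustrative bytecode for the factorial loop of Figure 1 (locals: [n, a]).
PI_FACT = [('PUSH', 1), ('STORE', 1),                          # a := 1
           ('LOAD', 0), ('IFEQ', 13),                          # while n != 0
           ('LOAD', 1), ('LOAD', 0), ('MUL',), ('STORE', 1),   # a := a * n
           ('LOAD', 0), ('PUSH', 1), ('SUB',), ('STORE', 0),   # n := n - 1
           ('GOTO', 2),
           ('LOAD', 1)]                                        # push result
```

Running run(range(100), (0, [5, 0], [], PI_FACT)) leaves 120 = 5! on top of the stack, in the spirit of Theorem 7.2 below.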

Java bytecode, like the one presented in Figure 1, can be executed within SJVM. However, more interesting than merely executing Java bytecode, we can prove the correctness of the implementation of Java bytecode programs using Coq/SSReflect. For instance, in the case of the factorial program, the new team member is asked to prove the following theorem, which states the correctness of the factorial bytecode.

Theorem 7.2.

Given a natural number n, running in SJVM the bytecode associated with the factorial program with n as input produces a state which contains n! on top of the stack.

The proof of theorems like the one above always follows the same methodology, adapted from the ACL2 proofs about Java Virtual Machines [M03], which consists of the following three steps.

  1. Write in Coq/SSReflect the specification of the function and the algorithm, and prove that the algorithm satisfies the specification.

  2. Write the JVM program within Coq/SSReflect, define the function that schedules the program (this function will make SJVM run the program to completion as a function of the input to the program), and prove that the resulting code implements this algorithm.

  3. Prove total correctness of the Java bytecode.

Using this methodology, we have proven the correctness of several programs related to arithmetic (multiplication of natural numbers, exponentiation of natural numbers, and so on); see [HK12]. The proof of each theorem was done independently of the others, to model a distributed proof development.

Therefore, we simulate the following scenario. Suppose a new developer tackles the proof of Theorem 7.2 for the first time; he knows the general methodology and has access to the library of programs previously proven by other users. In this situation, the different notation employed by different users obscures some common features, and ML4PG is a good alternative to a manual search for proof patterns.

Factorial:

Lemma fn_fact_is_theta n : fn_fact n = n`!.
Proof.
rewrite /fn_fact.
by rewrite helper_fact_is_theta mul1n.
Qed.

Lemma helper_fact_is_theta n a :
  helper_fact n a = a * n`!.
Proof.
move : a; elim : n => [a | n IH a /=].
  by rewrite fact0 muln1.
by rewrite IH factS mulnA [a * _]mulnC.
Qed.

Exponentiation:

Lemma fn_expt_is_theta n m : fn_expt n m = n ^ m.
Proof.
by rewrite /fn_expt helper_expt_is_theta mul1n.
Qed.

Lemma helper_expt_is_theta n m a :
  helper_expt n m a = a * (n ^ m).
Proof.
move : n a; elim : m => [n a | m IH n a /=].
  by rewrite expn0 muln1.
by rewrite IH expnS mulnA [a * _]mulnC.
Qed.

Table 9: Proofs of equivalence of the tail-recursive and recursive versions of the exponentiation and factorial functions, following Proof Strategy 7.3. The top part shows a few initial proof steps for fn_fact_is_theta, leading to a deadlock. The bottom part shows the lemma (fn_expt_is_theta) suggested by ML4PG (see Table 10) and an auxiliary lemma used to prove it. In italics is the proof reconstruction by analogy.

Let us focus on the first step of the methodology – that is, the proof of the equivalence between the specification of the factorial function (already defined in SSReflect as the function factorial, with notation n`! for factorial n) and the algorithm; see Table 9. The Java factorial function is iterative, and the algorithm is written in Coq as a tail-recursive function; see the right side of Figure 1. In the available SJVM libraries, all the tail-recursive functions are defined using an auxiliary function, called the helper, and a wrapper for that function. Discovering this fact is the first challenge for ML4PG. Suppose the new team member has stopped after one proof step, having tried to prove the lemma fn_fact_is_theta in a naive way, without a helper function; see Table 9. He cannot proceed, and calls ML4PG for a hint. The suggestions provided by ML4PG in this case are the proofs of step 1 for three iterative programs: the multiplication, the exponentiation and the power of natural numbers; see e.g. Lemma fn_expt_is_theta in Table 9. It is easy to notice that all of them use an auxiliary lemma (like helper_expt_is_theta), and thus follow the same proof strategy:

Proof Strategy 7.3.

Prove an auxiliary lemma about the helper, considering the most general case. For example, if the helper function is defined with formal parameters n, m and a, and the wrapper calls the helper initialising a at 0, then the helper lemma must be about (helper n m a), not just about the special case (helper n m 0). Subsequently, instantiate the lemma to the concrete case.
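The generalisation step can be illustrated concretely (a Python sketch of the factorial helper from Figure 1; the equation in the comment is the general fact that supports the induction, of which the wrapper's case is an instance):

```python
import math

# the factorial helper of Figure 1, transcribed to Python
def helper_fact(n, a):
    return a if n == 0 else helper_fact(n - 1, n * a)

# The general helper equation, provable by induction on n:
#   helper_fact(n, a) == a * n!
# checked here on a few instances:
for n in range(7):
    for a in range(1, 5):
        assert helper_fact(n, a) == a * math.factorial(n)

# The wrapper only ever needs the special instance a = 1,
# but the induction goes through only for the general equation.
assert helper_fact(6, 1) == math.factorial(6)
```

An induction directly on the special case a = 1 gets stuck, because the recursive call changes the accumulator; the general equation is what makes the inductive hypothesis applicable.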

The technical details are as follows. ML4PG correctly suggested lemmas similar to lemma fn_fact_is_theta. Table 10 shows the results for different choices of algorithms and parameters, and we highlight the most precise and helpful ML4PG result. If the user is unsure of the optimal machine-learning parameters, he can use a "top-down" approach. The highest granularity level does not produce any result; but, decreasing the granularity level to 4, ML4PG spots some interesting similarities using the K-means algorithm. If this is not enough to discover Proof Strategy 7.3, one can decrease the granularity level to 3, for which ML4PG discovers four lemmas following the same general scheme.

On the basis of these suggestions, the new team member can try to reconstruct the missing auxiliary lemma and the missing proof steps in the main lemma by analogy. Table 9 shows such analogical reconstruction in italics. This takes him through the first step of the general proof scheme.

Algorithm: (g = 1) (g = 2) (g = 3) (g = 4) (g = 5)
K-means 30 4 4 2 0
E.M. 21 7 7 0 0
FarthestFirst 28 25 0 0 0
Table 10: A series of clustering experiments discovering Proof Strategy 7.3. The table shows the sizes of clusters containing: the lemma about the JVM multiplication program, the lemma about the JVM power program, the lemma about the JVM exponentiation program, and the lemma about the JVM factorial program. The size of the data set is 147 lemmas; in bold is the cluster that finds exactly the four benchmark examples. Again, the lemmas grouped into clusters are consistently found for various algorithms and granularity values, and the K-means algorithm provides the most accurate cluster, using 3 as granularity value.

In the second stage, he needs to prove that the Java bytecode implements the factorial algorithm. Again, after a few proof-steps, the user gets stuck and cannot continue the proof, see Table 11. If ML4PG is invoked at this point, it suggests three lemmas (using K-means algorithm and 3 as granularity value) that are used to prove that three Java bytecode programs implement respectively multiplication, exponentiation and power algorithms. All these Java bytecode programs are iterative and involve a loop, and it is easy to notice that the proofs follow the same proof strategy:

Lemma program_is_fn_fact n :
run (sched_fact n) (make_state 0 [::n] [::] pi_fact) =
  (make_state 14 [::0;fn_fact n] (push (fn_fact n) [::]) pi_fact).
rewrite run_app.
rewrite loop_is_helper_fact.
 Lemma loop_is_helper_fact n a :
 run (loop_sched_fact n) (make_state 2 [::n;a] [::] pi_fact) =
 (make_state 14 [::0;(helper_fact n a)] (push (helper_fact n a) [::]) pi_fact).
 move: a; elim: n => [// | n IH a].
 by rewrite -IH subn1 -pred_Sn [_ * a]mulnC.
Lemma program_is_fn_expt n m :
run (sched_expt n m) (make_state 0 [::n;m] [::] pi_expt) =
  (make_state 14 [::0;fn_expt n m] (push (fn_expt n m) [::]) pi_expt).
rewrite run_app loop_is_helper_expt.
 Lemma loop_is_helper_expt n m a :
 run (loop_sched_expt n) (make_state 2 [::n;m;a] [::] pi_expt) =
 (make_state 14 [::0;(helper_expt n m a)] (push (helper_expt n m a) [::]) pi_expt).
 move: n a; elim: m => [// | m IH n a].
 by rewrite -IH subn1 -pred_Sn.
Table 11: Proofs that the Java bytecode programs implement the factorial and exponentiation algorithms. When the user tries to prove program_is_fn_fact, he stops after one proof step (left-hand side of the figure) and calls ML4PG. ML4PG suggests a few theorems, e.g. program_is_fn_expt (right-hand side of the figure). This works e.g. for the K-means algorithm and granularity values from 1 to 4; with granularity value 4, the cluster contains only these two lemmas. In italics, the user reconstructs the proof by analogy with program_is_fn_expt, following Proof Strategy 7.4.
Proof Strategy 7.4.

Prove that the loop implements the helper using an auxiliary lemma. Such a lemma about the loop must consider the general case, as in Proof Strategy 7.3. Subsequently, instantiate the result to the concrete case.
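The generalise-then-instantiate pattern underlying this strategy can be illustrated on a small, self-contained SSReflect example, independent of the JVM development (helper, helper_general and helper_fact are hypothetical names chosen for the sketch): a lemma about a tail-recursive factorial helper is first proved with its accumulator universally quantified, so that the induction goes through, and is then instantiated to the concrete case.

```coq
From mathcomp Require Import all_ssreflect.

(* A tail-recursive factorial with an accumulator, mirroring the shape
   of the helper function computed by a bytecode loop. *)
Fixpoint helper (n a : nat) : nat :=
  if n is m.+1 then helper m (n * a) else a.

(* General case: keep the accumulator 'a' quantified for the induction. *)
Lemma helper_general n a : helper n a = n`! * a.
Proof.
elim: n a => [|m IH] a /=; first by rewrite fact0 mul1n.
by rewrite IH factS mulnCA mulnA.
Qed.

(* Instantiation to the concrete case, as prescribed by the strategy. *)
Lemma helper_fact n : helper n 1 = n`!.
Proof. by rewrite helper_general muln1. Qed.
```

The same shape appears in loop_is_helper_fact above: the accumulator a is kept general in the lemma statement, and the main lemma uses only the instance with the initial accumulator.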

Using this strategy, and by analogy with the proofs of the other lemmas of the cluster, the user can finish the proof of lemma program_is_fn_fact; Table 11 shows the reconstruction of that proof in italics.

Finally, it remains to prove the total correctness of the Java bytecode (Theorem 7.2). ML4PG finds that the proofs of the total correctness of the different programs are all similar and follow the same proof pattern, which consists of applying the lemmas obtained in the two previous steps; see Table 12. Again, Table 12 illustrates the scenario of calling ML4PG on demand and using its suggestions to reconstruct the proof by analogy. Following these guidelines, Theorem 7.2 can be formalised in Coq/SSReflect by analogy with a similar lemma for, e.g., exponentiation, obtaining the proof of the correctness of the factorial Java bytecode, as shown in Table 12; see also [HK12] for the full proof.

Theorem total_correctness_fact n sf :
sf = run (sched_fact n) (make_state 0 [::n] [::] pi_fact) ->
next_inst sf = (HALT,0%Z) /\ top (stack sf) = (n`!).
move => H; split;
by rewrite H program_is_fn_fact fn_fact_is_theta.
Theorem total_correctness_expt n m sf :
 sf = run (sched_expt n m) (make_state 0 [::n;m] [::] pi_expt) ->
 next_inst sf = (HALT,0%Z) /\ top (stack sf) = (n^m).
by move => H; split; rewrite H program_is_fn_expt fn_expt_is_theta.
Table 12: Proofs of total correctness for the exponentiation and factorial programs, cf. Theorem 7.2. The left-hand side shows the initial step of the proof of Theorem 7.2 (total_correctness_fact). ML4PG suggests a few theorems, e.g. total_correctness_expt (right-hand side of the figure). ML4PG provides these suggestions for different parameters (e.g. the K-means algorithm and granularity values from 1 to 4); with the K-means algorithm and granularity value 4, the cluster contains only these two lemmas. In italics, the user reconstructs the proof by analogy with the theorem total_correctness_expt.
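The composition step behind these total-correctness proofs can be distilled into a minimal SSReflect sketch (all names below are hypothetical and independent of the JVM development): once stage 2 shows that the program computes fn and stage 1 shows that fn meets the specification theta, total correctness follows by chaining the two equations.

```coq
From mathcomp Require Import all_ssreflect.

Section Chain.
Variables run_prog fn theta : nat -> nat.

(* Stage 2: the bytecode program implements the Coq function fn. *)
Hypothesis program_is_fn : forall n, run_prog n = fn n.
(* Stage 1: the Coq function fn satisfies the specification theta. *)
Hypothesis fn_is_theta : forall n, fn n = theta n.

(* Total correctness is just the composition of the two lemmas. *)
Theorem total_correctness n : run_prog n = theta n.
Proof. by rewrite program_is_fn fn_is_theta. Qed.

End Chain.
```

This is exactly the shape of the one-line proofs in Table 12, where the two rewrites are program_is_fn_fact (resp. program_is_fn_expt) and fn_fact_is_theta (resp. fn_expt_is_theta).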

The clusters found in the JVM scenario are heterogeneous, since their lemmas come from different libraries. The clusters obtained in the different steps fall into the category of clusters consisting of lemmas with the same proof structure that use analogous lemmas. This is an interesting kind of cluster, since the analogous lemmas could be generated automatically using the techniques presented in [lpar13].

8 Conclusions and Future Work

We have presented five scenarios, of very different nature and domain, to test the capabilities of statistical proof-pattern recognition. We have observed that ML4PG’s feature extraction provides sufficiently robust results, tested using several of the most common clustering algorithms (cf. Tables 3, 5, 6 and 10). Judging by the experiments, K-means is the most reliable algorithm, showing very stable results. The best granularity value depends on the size of the library: higher granularity values return the most accurate clusters in big libraries (cf. User Scenarios 1, 2 and 3), whereas a lower granularity value produces better results in small libraries (cf. User Scenarios 4 and 5). In general, ML4PG requires minimal user effort, mainly adjustments of the granularity parameter to obtain results of the required precision. ML4PG is very fast and returns its output almost instantly, allowing the user to search for and evaluate suggestions interactively.

The most valuable feature of ML4PG is that it works equally well with any library we tried, irrespective of the subject domain or the size of the library. This property can be used to find patterns across subjects, libraries and users, as our case studies illustrate. Moreover, ML4PG discovers two kinds of clusters: homogeneous (all the lemmas of the cluster belong to the same library) and heterogeneous (the lemmas of the cluster belong to different libraries). Most of the time, the relation among the elements of a homogeneous cluster is clear (same proof structure, same lemmas, or analogous lemmas). By contrast, the relation among the elements of a heterogeneous cluster is more subtle (e.g. a general proof strategy or the use of some kind of auxiliary lemma).

Work is under way to incorporate the following extensions into ML4PG (see [HK12] for the most recent ML4PG versions):

  • a more sophisticated proof-patch identification for bigger proofs: this paper is based on the ML4PG version that uses only patches covering the first few steps of a proof, but see [HK12] for an experimental version that uses proof patches covering entire proofs;

  • a robust data-mining of type declarations and (co-)inductive definitions, alongside the currently used proof analysis;

  • a recurrent approach to feature extraction, similar to [lpar13].

A longer-term project is to generate auxiliary lemmas and definitions automatically, on the basis of statistically discovered patterns. We have already done this for ACL2 [lpar13]; however, extrapolating the techniques of [lpar13] from a first-order untyped language to a higher-order dependently-typed language is a difficult task.


Acknowledgements. We would like to thank Marco Gaboardi and Vladimir Komendantsky for proof-reading the paper; their suggestions helped us to improve the presentation.