Source code and obtained results for "Automatically Proving Mathematical Theorems with Evolutionary Algorithms and Proof Assistants"
Mathematical theorems are human knowledge able to be accumulated in the form of symbolic representation, and proving theorems has been considered intelligent behavior. Based on the BHK interpretation and the Curry-Howard isomorphism, proof assistants, software capable of interacting with human for constructing formal proofs, have been developed in the past several decades. Since proofs can be considered and expressed as programs, proof assistants simplify and verify a proof by computationally evaluating the program corresponding to the proof. Thanks to the transformation from logic to computation, it is now possible to generate or search for formal proofs directly in the realm of computation. Evolutionary algorithms, known to be flexible and versatile, have been successfully applied to handle a variety of scientific and engineering problems in numerous disciplines for also several decades. Examining the feasibility of establishing the link between evolutionary algorithms, as the program generator, and proof assistants, as the proof verifier, in order to automatically find formal proofs to a given logic sentence is the primary goal of this study. In the article, we describe in detail our first, ad-hoc attempt to fully automatically prove theorems as well as the preliminary results. Ten simple theorems from various branches of mathematics were proven, and most of these theorems cannot be proven by using the tactic auto alone in Coq, the adopted proof assistant. The implication and potential influence of this study are discussed, and the developed source code with the obtained experimental results are released as open source.READ FULL TEXT VIEW PDF
Source code and obtained results for "Automatically Proving Mathematical Theorems with Evolutionary Algorithms and Proof Assistants"
Public materials of Natural Computing Laboratory
Human knowledge has been accumulated in various forms. Among them are theorems in mathematics that are presented in a precise fashion and can be applied to innumerous domains. The importance of mathematics and its influence on science, engineering, and all kinds of technologies are undoubtedly beyond discussion. Modern mathematics are formal systems composed of four components: an alphabet, a grammar, axioms, and inference rules. Given the alphabet and grammar, in an information technology way of thinking, the development of a mathematical field or branch can be viewed as the growth of a database containing knowledge, in the form of logical sentences considered true or proven to be true. Initially, the database is empty. The first step is to put in the axioms, which are logical sentences considered true, followed by a loop: proving a new logical sentence by applying the inference rules to the database and inserting the proven logical sentence back to the database. The proven logical sentences are called theorems, propositions, lemmas, or corollaries.
Before the proposal of the link between logic and computation, the principle of Propositions as Types, logic and computation were previously considered two separate fields . Based on the BHK interpretation [2, 3] and the Curry-Howard isomorphism [4, 5, 6], (functional) programming languages , including Haskell [8, 9], and proof assistants, such as Coq [10, 11], HOL [12, 13], Isabelle [12, 14], and LEGO [15, 16] have been developed. Due to the transformation from logic to computation, proofs to mathematical theorems can then be expressed as programs and verified computationally.
Mathematical theorem proving has been considered intelligent behavior . Even for today, while a Go computer program, called AlphaGo, able to beat the human Go champion of Europe, Fan Hui , by five games to zero [19, 20, 21] and Lee Sedol 
, who was ranked world #2, by four games to one exists, the ability of computers to fully automatically prove mathematical theorems still seems quite limited. It is probably because most of the effort made on proof assistants aims at automated proof checking, which ensures the correctness of the proof, and at providing a proof development environment interacting with human. Moreover, there seems to be a lack of effective search mechanisms, which can “think outside of the box,” especially randomized or stochastic ones, used for this purpose.
Evolutionary algorithms , as stochastic methods, have been known for their flexility and versatility. They have been successfully applied to handle a variety of scientific and engineering problems in numerous disciplines. Hence, in this study, we would like to make our first attempt to link evolutionary algorithms, as the program generator, and proof assistants, as the proof verifier, in a simplistic way. The primary goal of this study is to assess the feasibility of establishing frameworks capable of fully automatically proving mathematical theorems by integrating the facility of generating programs and the facility of verifying proofs as evaluating programs. Specifically, we will design a straightforward evolutionary algorithm without complicated mechanisms and simply adopt Coq  for fitness evaluation. The preliminary results indicate the feasibility and demonstrate that this direction of research is promising.
The remainder of the paper is organized as follows. Section II introduces the background regarding the primary goal of the present work. Section III describes in detail the ad-hoc attempt we adopt to investigate the feasibility of automatically proving mathematical theorems. The preliminary results obtained in our experiments are presented in section IV, followed by section V giving a broad discussion on this work and its potential influence. Finally, section VI concludes the paper.
The goal of this study is to make an attempt to automatically prove mathematical theorems, and the present work is done by linking two well developed fundations, proof assistants and evolutionary algorithms. In this section, the background regarding the primiary goal of this study is presented.
First of all, automated theorem proving is a term quite closely related to this study. Although automated theorem proving and the research line of this study may share the same ultimate goal, the ways to progress towards the ultimate goal are actually entirely different, especially that evolutionary algorithms, as stochastic search methods, are introduced in the present work to conduct global search to find formal proofs.
The term automated theorem proving appeared in the 1950s as the most developed field within automated reasoning and was applied in 1956 to Logic Theory Machine [25, 26]
, a deduction system for the propositional logic which adopts a heuristic approach to emulate human reasoning. It was the first program designed to prove mathematical theorems. Another field of automated reasoning with less automation but more pragmatic actions is calledinteractive theorem proving, in which proof assistants  have been developed. As software tools, proof assistants incorporate automated reasoning techniques to make logical decisions on mathematical theorems.
Proof assistants, such as Coq , HOL , Isabelle , and LEGO  have been designed to interact with users for developing formal proofs and to verify the logical validity of proofs. Among them is Coq, an interactive theorem prover developed by Inria  that allows users to express mathematical assertions by specifying various strategies called tactics. It works based on the theory of the Calculus of Inductive Construction, a derivative of the calculus of constructions. Coq also provides a formal language to write mathematical definitions, executable algorithms and theorems together with an environment for semi-interactive development of machine-checked proofs. To help users to deal with formal proofs, Coq checks the validity of tactics step by step automatically. The features, capabilities, and representation of formalization of Coq are the reasons for Coq to be adopted in this study.
The other foundation is evolutionary algorithms , which have been successfully applied to resolve issues of many different natures in a host of domains. Evolutionary algorithms are known for their flexibility and versatility because their ability to handle black-box optimization makes it possible, even easy, to interface with nearly all kinds of search and optimization problems. Therefore, evolutionary algorithms are clearly suitable techniques currently available to be incorporated in the present work.
First of all, the source code developed in this study, the supporting materials required for conducting the experiments, and our experimental results are open source in the repository of nclab  on GitHub . It should be viable and relatively easy for practitioners and interested readers to replicate the experimental results described later in this article.
Our present ad-hoc attempt, as suggested by the title of this paper, consists of two major components: an evolutionary algorithm for finding proofs and the proof assistant for verifying proofs. For part of the proof assistant, we employ Coq  in this study. The reason to employ Coq is quite straightforward that according to the entry of proof assistants  in Wikipedia, Coq supports most features, including higher-order logic and dependent types, which are most likely necessary if research along this line is continually pursued in the future.
Proof development in Coq is done through a sequence of tactics to interact with the current goals shown in the Coq interface. These tactics convert a goal of a proof into subgoals by implementing backward reasoning from conclusions to premises. For example, in order to prove , and must be both proven. Figure 1
shows a situation in which a Coq user strives to solve and eliminate subgoals stated below the line to prove the lemma or, programmatically speaking, to define the type of the lemma. Such a user-guided proof process allows certain degree of freedom and helps to find proofs in a procedural manner.
Because a deep structure of three-layered correspondence between logic and computation  exists: propositions as types, proofs as programs, and simplification of proofs as evaluation of programs, Coq as a proof assistant has been developed based on this correspondence in order to provide the capability of checking the validity of proofs computationally. From the side of Coq’s current intended usage, the logic side, Coq users give their proofs in Coq’s way such that the proofs can be verified. Now, it is time to use Coq from the viewpoint of the other side, the programming side.
In some sense, we can consider that what Coq does is to provide an execution environment in which programs are written in the programming language defined by Coq. The tactics for composing proofs in Coq can be considered as instructions, such as those of x86 CPUs  or ARM CPUs . Thus, a proof written as a sequence of tactics in Coq can be viewed as a program composed of a sequence of (CPU) instructions. If the “Coq CPU” can successfully execute a given sequence of instructions, the proof expressed as this sequence of tactics is then verified to be logically valid. It is exactly what “simplification of proofs as evaluation of programs ” indicates.
While techniques developed for Linear Genetic Programming may be a good choice to perform the role of generating CPU instruction sequences, i.e., searching for proofs, we employ a more rudimentary evolutionary approach in this study, because the goal is to make a first, simplistic attempt for linking two domains, proof assistants and evolutionary algorithms, to examine whether some proof-of-principle results can be obtained. Search methods which are highly customized for generating programs of specific “machines/CPUs” might not be totally adequate for the “Coq CPU,” which is yet fully understood from the viewpoint of evolutionary algorithms.
As a consequence, in this study, we adopt a simple, GP-like evolutionary algorithm of which the pseudo code is outlined in Algorithm 1 to work with Coq using the standard library and a “tactic base” consisting of 153 tactics, which is available for download from our GitHub repository. Since the goal of this work is to make an attempt to link evolutionary algorithms and proof assistants, instead of aiming to find a proof of some specific logical sentence, we use only the maximum generation as the stopping criterion. For the experiments conducted in this study, we use the same tactic base and set , , and . As explained, the tactic base can be considered as the “Coq CPU” instruction set. As an example, if there are 8 tactics in the tactic base, we number them from 0 to 7 as
Then, the chromosome representation adopted in this work is a variable-length integer sequence because the number of tactics needed to complete a proof may vary in length. For example, if there are chromosomes and :
these two chromosomes respectively correspond to the following Coq proofs:
The tactic intros is not included in the adopted tactic base and is inserted in the front of each individual.
The procedure for initializing the population is given in Algorithm 2. Currently, the tactic base contains 153 tactics, and thus, is 152. We use and in all the conducted experiments.
Parental selection, crossover, and mutation are given in Algorithms 3, 4 and 5, respectively. For a given population, we select two individuals among the top 50% as parents to go through crossover and, probably, mutation to generate one offspring. The individuals ranked top 50% have an equal probability to be selected as parents. The employed crossover operator is essentially the well known one-point crossover. Because the lengths of the two chromosomes are probably different, the cut point is uniformly chosen within the length of the shorter chromosome. The crossover creates only one child, and the child consists of the first parent before the cut point and the second parent after the cut point. The mutation operator may be applied with the mutation rate . One of the tactics of the individual to be mutated is replaced by one from the tactic base. Both choices are uniformly random.
Finally, the evaluation procedure invokes Coq to evaluate individuals in the population and is shown in Algorithm 6. As aforementioned, this study is a proof-of-principle one to check the viability of linking evolutionary algorithms and proof assistants. We would like to consider Coq as a black box and make use of the results obtained by simply invoking Coq. Hence, we use “the number of tactics successfully verified by Coq” as the fitness value. Such fitness might actually be inappropriate because apparently, longer chromosomes are preferred to shorter ones, and it is somewhat counterintuitive to common mathematical sense. However, from the viewpoint of evolutionary algorithms, longer chromosomes in the population might play the role of a pool of partial proofs, and adopting such fitness may just be beneficial in this regard.
In order to use the number of tactics as fitness, some limitation is required to avoid meaningless conditions. For example, the tactic simpl may be repeated forever, and tactics regarding the commutative laws can be repeated forever by pairs. For this kind of cases, we give a list of unrepeatable tactics without affecting Coq’s operations. In addition to the number of passed tactics, if Coq gives an error message saying “No such unproven subgoal,” we consider the proof is complete with extra redundant tactics and assign a fitness value of 1000 minus “the number of tactics successfully verified by Coq,” where 1000 is an arbitrary large number used to separate complete proofs and the others. For complete proofs, shorter proofs are still preferred because of higher scores.
In the source code available from our GitHub repository, we also pick some individuals with the top scores for further examination by invoking Coq on them in a tactic-by-tactic manner. If a complete proof is found, it is archived for our research purpose and nothing regarding the evolutionary process is changed. This step, or in fact using as the stopping criterion, may not be necessary if in the future, finding a proof to some given logical sentence is the only goal.
With the adopted evolutionary algorithm and our ad-hoc attempt presented in section III, we have successfully proven ten theorems of different branches in mathematics automatically. As aforementioned, the proofs found by the introduced approach for the following theorems are also available as supporting materials in our GitHub repository.
Theorem n_le_k : forall n k, n = 0 n k.
Theorem plus_n_0 : forall n, n + 0 = 0.
Theorem n_1_n : forall n, n ^ 1 = n.
Lemma solving_by_eapply : forall(P Q : nat Prop),
(forall n k, Q k P n) Q 1 P 2.
Theorem andb_prop : forall n k,
andb n k = true n = true k = true.
Theorem andb_true_elim2 : forall n k : bool,
andb n k = true k = true.
Theorem andb_true_intro : forall n k,
n = true k = true andb n k = true.
Theorem ev_minus2 : forall n,
ev n ev (pred (pred n)).
Theorem SSev_even : forall n, ev (S (S n)) ev n.
Theorem silly_prob :
(forall n, evenb n = true oddb (S n) = true)
oddb 4 = true.
Note that the tactic auto is not included in the tactic base, and more importantly, most of the theorems listed above cannot be proven by using only auto in Coq. Interested readers may further investigate into Coq and check out the supporting materials for more information.
The obtained, preliminary results are quite within our expectation. We were able to prove a number of simple theorems as listed above, while we failed to do the same on relatively advanced theorems. Simple theorems are more likely to be proven in several steps with different sequences of tactics. For a concrete example, we can consider one here:
|Theorem||andb_true_intro : forall n k,||(1)|
|n = true k = true andb n k = true.|
According to our experimental results, more than thirty different valid sequences of Coq tactics have been found in one hundred generations with 153 tactics in the tactic base. As aforementioned, the size of population, i.e., the number of individuals, is 1000, and in one successful run, the first complete proof was found at the 8th generation. Since the 8th generation, the evolutionary process continued to generate more than a few distinct sequences of tactics, although these seemingly different proofs might not be entirely mathematically different. Such a situation indicates that the diversity of proofs exists and will be further discussed in the next section.
Despite the fact that proven theorems in these preliminary results seem quite simple, even straightforward in human eyes, the potential for research and development on linking the two domains, evolutionary algorithms and proof assistants, can clearly be identified. These ten proven theorems are from various branches of mathematics, and the proofs are found by using the identical evolutionary algorithm with the same tactic base of Coq. Thus, the current attempt can be considered neutral, not biased towards certain branch of mathematics.
It can also be justified that finding the proofs to the above theorems is hardly by chance. The capability of automatically proving mathematical theorems does exist in a system linking together evolutionary algorithms and proof assistants. Taking the theorem of Equation (1
) and 153 Coq tactics for example, one has barely no chance to come up with a valid proof using a pure random search. Because the first proof, whose length was 5, was discovered at the 8th generation with a population of 1000 individuals, the probability for a pure random search to find a proof with the same computational resource can be estimated approximately
If there are actually 100 valid proofs of different sequences of which the length is 5 or shorter than 5, the probability is still about the magnitude of or . On the other hand, it took around only 8000 evaluations to “guess it right,” which costs little with modern computational facilities.
In summary, ten simple mathematical theorems from various branches of mathematics, most of them cannot be proven by the current, limited automatic mechanism of proof assistants alone, were proven by the proposed ad-hoc approach with a reasonable cost of computational resource. The feasibility of automatically generating complete proofs to mathematical theorems is clearly indicated.
In order to assess the feasibility of automatically proving mathematical theorems, an ad-hoc attempt, composed of a straightforward evolutionary algorithm and the direct use of Coq, was described in section III. The preliminary results presented in section IV provide the evidence that the line of research on linking evolutionary algorithms and proof assistants is worth pursuing. Thus, the primary goal of this study has been achieved.
As mentioned in section IV, the proposed ad-hoc approach failed to prove relatively advanced theorems which require proofs of longer tactic sequences. Although it is reasonable and acceptable in the present work, efforts on both sides need to be done to develop an automatic theorem proving framework for practical use.
Firstly, on the side of proof assistants, in the ad-hoc attempt, we basically use the number of passed tactics and its calculation as fitness. In addition to forbidding some tactics to repeat, we may consider different fitness assignment for compound usage of certain tactics or variable fitness weights on tactics for some sequence patterns. Moreover, instead of simply invoking Coq and directly using the returned results, looking into the running status of Coq may give essential information on the proving progress. If the internal (running) status of Coq can be obtained and utilized for evaluating individuals, the evolutionary process will be much better guided and the searching/proving ability of the system will be greatly enhanced.
On the side of the search methodology, since the feasibility to use evolutionary algorithms to search for formal proofs has been preliminarily identified, techniques from Linear Genetic Programming can then be considered and put into action. In addition to Linear Genetic Programming, other variants of Genetic Programming, such as Cartesian Genetic Programming [39, 40, 41], or even techniques commonly used in Genetic Programming, such as automatically defined function (ADF) and automatically defined macro (ADM), can be integrated into the evolutionary proof-search engine to extract “modularized” segments of proofs in a sense which may yet be understandable for the time being.
Further into the realm of randomized search methods, given the recent success in Go playing with computer programs , Monte Carlo tree search  may also be considered as the search mechanism. Although a great amount of efforts will be required for its adoption and adaptation in the theorem proving framework, Monte Carlo tree search is a promising methd alternative to the techniques of Genetic Programming.
From the obtained the preliminary results, we can easily find that even an automatic theorem proving framework as simple as our ad-hoc attempt is capable of finding alternative proofs, composed of different sequences of tactics, to a given theorem. Figures 2 to 6 show some of these alternative proofs found in our experiments.
|Theorem n_le_k : forall n k, n = 0 n k.|
|Theorem ev_minus2 : forall n, ev n ev (pred (pred n)).|
|Theorem SSev_even : forall n, ev (S (S n)) ev n.|
These tactic sequences verified by Coq might not be totally different from the viewpoint of concepts in mathematics, while they still differ from each other with the distinct adoption of logical strategies expressed by the tactic sequences. Taking the proofs in Figure 6 as an example, one can easily discriminate these proofs by examining whether or not the technique of Mathematical Induction is used. As a metaphor, we can think that these proofs reach the same intersection from different roads. Such an ability to provide diverse proofs is valuable and important, because the diversity either for an entire proof or in proof segments may not only help to automatically construct proofs to advanced theorems but also possibly inspire mathematicians to develop complicated proofs.
|Theorem n_1_n : forall n, n ^ 1 = n.|
It must be noted that the nature and properties of this work are distinctly different from those of conventional optimization tasks handled by using mathematical or evolutionary optimization techniques, although proofs are indeed found via application of optimization methods. Unlike optimization tasks are often repeatedly solved, there is in fact no need to prove a known theorem once again. Hence, the future development of this work should focus on the capability of handling the structural flexibility of proofs in order to prove non-trivial theorems, instead of on the efficiency, either finding a proof faster or finding a shorter proof. Efficiency should not be the primary goal, at least not at early stages of this type of work. If in the future, finding shorter proofs is required, advanced techniques, such as superoptimization [43, 44], may be utilized.
The influence of this line of research may be profound, if frameworks ready for practical use can be developed. Mathematical theorems are human knowledge able to be accumulated in a database with its symbolic representation. Apparently, all the theorems known to human as of today were proven by mathematicians, professional or amateur, or genius. Ordinary people without proper training and specific talents probably have little chance to make contributions in this regard. However, if automatic theorem proving frameworks exist, with the support of certain online cloud services, proving mathematical theorems can be actually crowdsourced. Imaging that if there is a database containing all the mathematical theorems ever proven by human, and any one with a computer can donate computational power for proving some unknown conjectures, human knowledge can then be accumulated in an unprecedented speed.
If we further stretch this probably already farfetched possibility, considering that a database containing all the proven logical sentences, i.e., theorems, exists, all the proofs to these theorems can be collected along the way of establishing this database. These proofs are also extremely useful and valuable information. As aforementioned, for a given theorem, alternative proofs may shed light on deep mathematical connections and possibly invoke mathematicians’ intuition. On the other hand, if there are repeating or similar tactic sequence patterns in the proofs across different, perhaps unrelated theorems, novel proof techniques may be discovered and identified. Data mining methods and techniques proposed in data science can be applied to the collected proofs in order to further accelerate the acquiring of knowledge.
In this paper, we introduced the possibility and potential to establish the link between evolutionary algorithms and proof assistants, both of which have been respectively developed in the past several decades while have not been linked to each other. The primary goal of this study was to assess the viability to automatically prove mathematical theorems. In order to achieve the goal in a simplistic way, we adopted Coq, one of the available, well developed proof assistants for validating proofs and designed a straightforward evolutionary algorithm capable of generating integer sequences, interpreted as Coq tactic sequences, i.e., proofs. The preliminary results indicated that the direction of linking evolutionary algorithms and proof assistants is worth pursuing, and the potential influence of the possible outcomes will be remarkably significant in many aspects of advancing and accumulating human knowledge.
As discussed in section V, the immediate future work includes adopting advanced techniques in existence, such as Linear Genetic Programming and Cartesian Genetic Programming, and investigating into Coq to utilize the Coq internal status at run time for better fitness assignment and therefore better evolutionary guidance. Moreover, Monte Carlo tree search can also be considered to enhance the search ability.
A reasonable first milestone for this line of research is to fully automatically prove non-trivial, well known theorems, such as Fermat’s little theorem. At the stage, accumulating human knowledge on mathematics in a database of symbolic representation is viable by (re-)finding proofs to known theorems. The next milestone, which seems quite farfetched for the time being, is to prove existing mathematical conjectures. If such a milestone is reached, advancing human knowledge on mathematics can then be done in an automatic, computational way. Finally, the ultimate milestone is to prove unknown, meaningful theorems without human intervention, while in addition to the capability of automatically proving theorems, it undoubtedly requires other intelligent behavior and abilities, including creativity, speculation, and desire to solve problems.
The work was supported in part by the Ministry of Science and Technology of Taiwan under Grant MOST 104-2221-E-009-126. The authors are grateful to the National Center for High-performance Computing for computer time and facilities.
D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. van den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, S. Dieleman, D. Grewe, J. Nham, N. Kalchbrenner, I. Sutskever, T. Lillicrap, M. Leach, K. Kavukcuoglu, T. Graepel, and D. Hassabis, “Mastering the game of Go with deep neural networks and tree search,”Nature, vol. 529, no. 7587, pp. 484–489, 2016.
Evolutionary Algorithms in Theory and Practice: Evolution Strategies, Evolutionary Programming, Genetic Algorithms. Oxford, UK: Oxford University Press, 1996, ISBN: 0-19-509971-0.