
Rare Speed-up in Automatic Theorem Proving Reveals Tradeoff Between Computational Time and Information Value

We show that strategies implemented in automatic theorem proving involve an interesting tradeoff between execution speed, proving speed-up/computational time and usefulness of information. We advance formal definitions for these concepts by way of a notion of normality related to an expected (optimal) theoretical speed-up when adding useful information (other theorems as axioms), as compared with actual strategies that can be effectively and efficiently implemented. We propose the existence of an ineluctable tradeoff between this normality and computational time complexity. The argument quantifies the usefulness of information in terms of (positive) speed-up. The results disclose a kind of no-free-lunch scenario and a tradeoff of a fundamental nature. The main theorem in this paper and the numerical experiment (undertaken using two different automatic theorem provers, AProS and Prover9, on random theorems of propositional logic) provide strong theoretical and empirical arguments for the fact that finding new useful information for solving a specific problem (theorem) is, in general, as hard as the problem (theorem) itself.



1 Introduction

Speed-up in automatic theorem proving should not be regarded just as a quest for faster provers. To prove a theorem is an essentially difficult task and a very sensible question is whether there can be a shortcut to it by adding additional information. So the issue is what information can be added so that a proof is significantly shorter. Viewed from a very abstract point of view, of course, there is an easy (and trivial) answer: add the theorem itself and the proof becomes minimal! But adding all provable theorems to an axiomatic system renders it pointless.

On the other hand, a system under memory constraints that are not minimal (i.e. that can store a set of useful theorems apart from the axioms of a theory) but at the same time bounded by a realistic measure, would be theoretically optimal if it stores only the most useful theorems for a certain task (that is, those that can make a proof shorter). A naive strategy would be to prove theorems incrementally and save them (until the memory is full). When a new theorem has to be proved, the system can resort to previous theorems to see if a shorter proof can be obtained, rather than trying to prove the theorem from scratch. But how useful can this approach be? As we will show, it is not only naive and so expectedly not very useful, but it is also very rarely fruitful and in equal measure a waste of resources.

For this, we use a very basic but objective measure of speed-up, namely shortening the length of proofs. We sample random propositional calculus theorems and, with the help of two automatic theorem provers (AProS and Prover9), we produce alternative proofs for them, including as additional axioms other (random) valid formulas. Afterwards, we compare the lengths of the proofs to establish whether a shorter one arose from the use of the new axioms. In order to select our samples, we restrict our analysis to a fragment of propositional logic, given by an upper bound on the number of atomic propositions and of logical connectives.

To find a proof for a theorem from a finite set of formulas using an automatic theorem prover, one can always start from the set of formulas and apply all transformation rules until the theorem is generated, then pick out the sequence on the path from the formulas to the theorem as the proof. In practice, however, important optimization strategies have to be implemented to avoid exponential execution time even for the simplest of proofs. We will explore how implementing these algorithmic strategies leads to a compromise between various seminal complexity currencies, shedding light on a possibly more general challenge in problem solving, related to the usefulness of information and computational time.

Definition 1.

Let be a finite set of formulas and let stand for the fact that is provable from . will denote the length of the minimum proof for , as given by the number of logical deduction steps used in the proof (for example, the number of lines in a Fitch diagram).

Let and be two representations of the class of equivalent theories (that is, and are finite sets of axioms with an identical set of logical consequences, or theories) such that . We define the speed-up delta for between and , or simply speed-up, as the function:

Let be a finite subset of the set of all valid formulas in propositional logic and let and . Now will be the set of all logical consequences (theorems) of in . Finally, the set is the subset of such that for a given subset of , with , we have the following property:

Note that this property is equivalent to

We ask after the relation and distribution between the size of the set with respect to the size of a finite subset of as a function of the set . In other words, given a random axiomatic system, if we strengthen it by adding a number of theorems as axioms, what does the distribution of non-trivial speed-ups look like? A set of valid propositional calculus sentences and axiomatic systems is constructed given three variables: represents the maximum depth of composition of logical operators, states the maximum number of different literals that can be present in a sentence, and determines the number of axioms in an axiomatic system.

We will report not only that instances of non-trivial positive speed-up are relatively rare, but that their number is considerably smaller than the number of instances of negative speed-up. In other words, strengthening a random axiomatic system by adding theorems as axioms tends to increase the length of proofs found by automatic theorem provers. We believe that the behavior observed verifies the stated condition and that its oblique distribution is strongly related to the computational difficulty of finding a proof of minimum size and deciding the usefulness of information as a whole.

2 Methodology

Prover9 is an automated theorem prover for first-order and equational logic developed by William McCune [17] at the Argonne National Laboratory. Prover9 is the successor of the Otter theorem prover. Prover9 is free and open source. AProS (Automated Proof Search) is a theorem prover that aims to find normal natural deduction proofs of theorems in propositional and predicate logic [18]. It does so efficiently for minimal, intuitionistic and classical versions of first-order logic.

We undertake an empirical exploration of the speed-up in propositional logic, using AProS and Prover9 in order to approximate the proof complexity over randomly generated sets of axiomatic systems and propositions. We later propose a necessary but possibly insufficient property (in Definition 7) for judging the adequacy of the approximations so obtained.

To perform the exploration described, we have first to narrow the search space: We denote the set of all propositions bounded by and by , the set of all theories in by (), while and denote the sets of all theories and arguments bounded by and respectively. Finally, we define the sets:


The sets are generated recursively as follows:


where , and is the set that consists of the first variables of .

Note that, given an order to the set of Boolean operators, we can define an enumeration of all the members of based on its generation order, an order which is consistent among all the subsets for a given . The order defined is equivalent to giving each set an array structure in which the first position corresponds to the first element of and to the last, where is defined as

We call this array the propositions array. Given this order, we can represent each theory comprising by a binary string of size on which the th bit is iff is in the theory. Following the previous idea, we can efficiently represent the members of by an array of integers, where each integer denotes the number of 0s present between each 1. Hence we can represent the theories by integer arrays of the form . We call this the -representation of the theory and denote it by .
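The encoding just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `bits_to_gaps` and `gaps_to_bits` are hypothetical, and we assume that each integer counts the 0s that immediately precede a 1 (including those before the first 1), with trailing 0s recovered from the known size of the propositions array.

```python
from typing import List


def bits_to_gaps(bits: str) -> List[int]:
    """Encode a membership bit string as the number of 0s preceding each 1.

    Assumption: trailing 0s are not stored; the array size is known separately.
    """
    gaps: List[int] = []
    zeros = 0
    for b in bits:
        if b == "1":
            gaps.append(zeros)  # close the current run of 0s
            zeros = 0
        elif b == "0":
            zeros += 1
        else:
            raise ValueError("expected only '0' and '1'")
    return gaps


def gaps_to_bits(gaps: List[int], size: int) -> str:
    """Decode the integer array back into a bit string of the given size."""
    body = "".join("0" * g + "1" for g in gaps)
    if len(body) > size:
        raise ValueError("representation does not fit in the array")
    return body + "0" * (size - len(body))  # pad with the dropped trailing 0s


# A theory containing the 2nd, 5th and 7th propositions of a 7-element array.
print(bits_to_gaps("0100101"))
print(gaps_to_bits([0, 0, 0], 5))
```

Under this convention the all-zero array selects the first propositions of the array, which matches the role of [0,…,0] as the first element in the text.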

Definition 3.

If and , then is known as the syntactic complexity of or depth of ; it is denoted by .

Now let’s consider the following function:

The function gives us the degree of separation between the first element in the array of propositions ([0,…,0]) and . Moreover, there is an exponential number of theories that share a value for the function. We call such a set, defined by , a separation class and the separation order of all the theories in the class. Recall that the position of each proposition depends on its order of generation. Hence the syntactic complexity of each theory is set by its order of separation.

Definition 4.

We define the syntactic complexity of a theory , denoted as , as

where is the syntactic complexity of .

Note that .

The size of the systems we are exploring grows by , making an exhaustive exploration intractable. The methodology used therefore consists in sampling the space of valid propositional sentences. The propositions are then used as theorem candidates against subsets of the formulae used as theories.

In order to compute each sample set we build two sets: a sample set of theories denoted by , composed of number of theories in , and the sample set of prospective theorems denoted by , composed of numbers of propositions in . Each set is randomly generated by, first, choosing a random list of numbers of the respective lengths between and . For the list , each of these numbers represents the prospective theorems sampled (for each theory). For the list , the numbers represent a separation class from which we choose a theory by assigning random numbers to each of the parameters of its representation, with the condition that their sum is the value of the chosen class. The chosen lists are then rid of inconsistent theories for and inconsistent propositions with respect to the first element for the list .
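As a rough illustration of the last sampling step, the following Python sketch draws the parameters of a representation so that they sum to the value of a chosen separation class. The name `sample_theory` and its arguments are hypothetical, and the cut-point method used here is just one simple way to satisfy the sum condition; it is not claimed to be uniform over all admissible arrays, nor to be the distribution used in the experiments.

```python
import random
from typing import List, Optional


def sample_theory(separation: int, num_axioms: int,
                  rng: Optional[random.Random] = None) -> List[int]:
    """Draw `num_axioms` non-negative integers whose sum is `separation`.

    Assumption: a theory is identified with such an integer array, as in the
    representation described in the text.
    """
    if num_axioms < 1 or separation < 0:
        raise ValueError("need at least one axiom and a non-negative class")
    rng = rng or random.Random()
    # Pick num_axioms - 1 cut points in [0, separation]; the differences
    # between consecutive cut points are the sampled parameters.
    cuts = sorted(rng.randint(0, separation) for _ in range(num_axioms - 1))
    bounds = [0] + cuts + [separation]
    return [bounds[i + 1] - bounds[i] for i in range(num_axioms)]


theory = sample_theory(separation=10, num_axioms=4, rng=random.Random(0))
print(theory, sum(theory))
```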

Afterwards, we use the lists obtained to compute a sample set of . First, for each we generate an additional number of theories of the form , where is a prefix of the list ; we call a base theory and a derived theory. Then, we pair all the theories generated with each of the propositions of , called objectives, to form a list of cases of the form . Afterwards, we remove the unprovable cases by exhaustively exploring the corresponding truth tables.
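The truth-table filtering can be illustrated with a small Python sketch. It is a minimal example under stated assumptions: formulas are encoded as nested tuples with the hypothetical tags `"not"`, `"and"`, `"or"` and `"imp"`, atoms are plain strings, and provability is taken in the classical sense, where it coincides with truth-table entailment.

```python
from itertools import product


def atoms(f):
    """Collect the atomic propositions occurring in a formula."""
    if isinstance(f, str):
        return {f}
    return set().union(*(atoms(arg) for arg in f[1:]))


def evaluate(f, env):
    """Evaluate a formula under a truth assignment `env` (atom -> bool)."""
    if isinstance(f, str):
        return env[f]
    op = f[0]
    if op == "not":
        return not evaluate(f[1], env)
    if op == "and":
        return evaluate(f[1], env) and evaluate(f[2], env)
    if op == "or":
        return evaluate(f[1], env) or evaluate(f[2], env)
    if op == "imp":
        return (not evaluate(f[1], env)) or evaluate(f[2], env)
    raise ValueError(f"unknown connective: {op!r}")


def is_consistent(formulas):
    """True if some row of the joint truth table satisfies every formula."""
    names = sorted(set().union(*(atoms(f) for f in formulas)))
    for values in product([False, True], repeat=len(names)):
        env = dict(zip(names, values))
        if all(evaluate(f, env) for f in formulas):
            return True
    return False


def entails(theory, objective):
    """Classical entailment: the theory plus the negated objective is unsatisfiable."""
    return not is_consistent(list(theory) + [("not", objective)])


print(entails(["p", ("imp", "p", "q")], "q"))
print(is_consistent(["p", ("not", "p")]))
```

A case would be kept only if `entails` returns true for its theory and objective; the cost is exponential in the number of atoms, which is affordable only because that number is bounded.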

It is important to note that we are generating a significant number of trivial cases when . Hence we expect at least close to instances of positive speed-up, depending on the number of unprovable cases.

Finally, we run an automatic theorem prover (ATP) and register the length of each of the proofs obtained, storing the shortest ones. However, we have no reason to believe that the use of an ATP system would give us a good approximation to the sparsity of proving speed-up. Therefore we define a speed-up function relative to each ATP:

Definition 5.

Let be an ATP system, a provable argument for , and two descriptions of such that . We define the speed-up delta relative to of between and , or simply relative speed-up, as the function:

where is the shortest proof found by for the argument .

Definition 6.

Let be a formal system, be an ATP system for , a provable argument for , and a description for such that . We call the function a bound for as a function of if

Now, since we do not have enough information about the existent sparsity of proving speed-up, we will define a necessary but possibly insufficient condition needed for an acceptable approximation:

Definition 7.

We say that an argument is trivial if . An ATP system is possibly normal for if, for each , , and if is a non-trivial argument, then .

(In Lemma 8 we will show that normality is possible, although our example is far from ideal.)

The mathematical structure used to store and analyze the results obtained is called the speed-up matrix. In this matrix each entry is assigned the value


If is not a provable case, then the value of the entry is left undefined.

A natural simplification of this matrix is the incidence speed-up matrix, on which we simply assign different discrete values to one of the following four cases: , , and is undefined.
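A minimal Python sketch of this simplification follows. The names `incidence`, `incidence_matrix` and `summarize` are hypothetical, and we assume that each entry of the speed-up matrix is a number whose sign distinguishes positive speed-up from negative speed-up, with `None` standing for an undefined (unprovable) case.

```python
from typing import Dict, List, Optional


def incidence(delta: Optional[float]) -> Optional[int]:
    """Map a speed-up value to +1, -1, 0, or None when it is undefined."""
    if delta is None:
        return None
    return (delta > 0) - (delta < 0)  # sign of delta as an integer


def incidence_matrix(matrix: List[List[Optional[float]]]) -> List[List[Optional[int]]]:
    """Apply the sign map entry by entry."""
    return [[incidence(d) for d in row] for row in matrix]


def summarize(inc: List[List[Optional[int]]]) -> Dict[str, int]:
    """Count provable, positive and negative entries of an incidence matrix."""
    cells = [c for row in inc for c in row if c is not None]
    return {
        "provable": len(cells),
        "positive": sum(1 for c in cells if c > 0),
        "negative": sum(1 for c in cells if c < 0),
    }


inc = incidence_matrix([[2, 0], [-1, None]])
print(inc)
print(summarize(inc))
```

Counts of this kind are what the "Provable Cases", "Positive" and "Negative" figures reported for each experiment summarize.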

Note that by design we expect a prevalence in both matrices of diagonal structures composed of cases of positive speed-up with a period of . These structures correspond to the cases of trivial speed-up included.

3 Results

To begin with, we performed more than 15 experiments using Prover9. The following table summarizes a select number of results:

Exp. Num.   Cases   Positive   Percentage   Negative   Ratio
11          5400    606        11.2%        94         6.44
10          6381    704        11.01%       137        5.138
7           4848    389        8.02%        231        1.683
5           5454    426        7.81%        24         17.75
3           11297   856        7.57%        70         12.228
Figure 1: The results exhibit a varying percentage of negative and positive speed-up instances. It is important to note the presence of a significant number of negative speed-up instances and the irregular distribution found among the samples.

As the table shows, Prover9 does not exhibit normal behavior as defined in Definition 7. Furthermore, as exemplified in Fig. 2, the speed-up matrix does not present the expected periodic diagonal speed-up instances.

Figure 2: A grayscale representation of the speed-up matrix obtained for experiment number 11 using Prover9. The columns correspond to each theory generated and the rows to the theorems. For visualization purposes, the matrix is divided into four parts and only the instances of positive speed-up are colored, the darker tones corresponding to higher speed-up values.

Next, we present the incidence speed-up matrices obtained from AProS under four different conditions: the basic experiment, with the four logical connectives and classical deduction (3.1); the basic experiment with intuitionistic logic (3.2); classical deduction with disjunction excluded as a connective (3.3); and intuitionistic logic restricted to the negative fragment as in 3.3 (3.4). The same set of cases was used when possible, i.e. 3.1 with 3.2 and 3.3 with 3.4. Also included is the matrix obtained for Prover9 on the disjunction-free cases of experiment 3.3 (3.5); it is important to note that in this case we obtained no negative speed-up values. The matrix values are represented using a color scale, where the color white corresponds to no speed-up, blue to positive speed-up and red to negative speed-up. Grey corresponds to unprovable cases or cases in which the time limit was reached.

The four AProS experiments yield similar behavior: although AProS does show periodic diagonal structures, it also exhibits a significant presence of negative speed-up instances, which means that the ATP is not normal.

It is important to note that a degree of clustering of negative speed-up instances is expected, since each delta is computed from the minimum proof length found for each previously derived theory and current objective. It is arguable whether or not we are overcounting negative speed-up instances.

Each speed-up matrix is divided into four parts for visualization purposes. Each of the entries’ values is represented using a four-color scale, where the color white corresponds to no speed-up, blue to positive speed-up and red to negative speed-up. The columns correspond to each theory generated and the rows to each of the theorems. Grey corresponds to unprovable cases or cases where a time limit was reached.

3.1 AProS speed-up incidence matrix with classical deduction and all four logical connectives

Provable Cases: 5763.
Positive : 632, percentage: 10.97%.
Negative : 564, percentage: 9.78%.

Figure 3: A color scale representation of the incidence speed-up matrix obtained for experiment 3.1 (classical deduction and all four logical connectives) using AProS. The periodic diagonal structures that correspond to the trivial speed-up instances are evident in this figure, but it also shows a significant presence of negative speed-up instances, which means that AProS is not a normal ATP system.

3.2 AProS speed-up incidence matrix with intuitionistic deduction and all four logical connectives

Provable Cases: 5680.
Positive : 646, percentage: 11.37%.
Negative : 537, percentage: 9.45%.

Figure 4: A color scale representation of the incidence speed-up matrix compiled for experiment 3.2 (intuitionistic deduction and all four logical connectives), obtained from the same set of arguments used to compile Figure 3. The behavior is very similar to that observed in experiment 3.1, including the significant presence of negative speed-up. We detected a small increase in the number of positive speed-up instances and a negligible decrease in negative speed-up cases.

3.3 AProS speed-up incidence matrix with classical deduction and without disjunction

Provable Cases: 6680.
Positive : 899, percentage: 13.46%.
Negative : 484, percentage: 7.246%.

Figure 5: A visual representation of the incidence speed-up matrix compiled for the experiment 3.3 (classical deduction without disjunction) divided into four parts for visualization purposes. It is important to note that the set of arguments employed for the previous experiments is incompatible with the parameters established for this case. Hence a new random set had to be generated. From the image we can see that the speed-up distribution does not differ notably from previous experiments, aside from the significantly lower incidence of undemonstrable arguments.

3.4 AProS speed-up incidence matrix with intuitionistic deduction and without disjunction

Provable Cases: 6660.
Positive : 862, percentage: 12.94%.
Negative : 587, percentage: 8.81%.

Figure 6: A visual representation of the incidence speed-up matrix generated for experiment 3.4 (intuitionistic deduction without disjunction), obtained from the same set of arguments used to compile Figure 5. We can see that the behavior is very similar to that observed in experiment 3.3, including the significant presence of negative speed-up. We detected a negligible decrease in the number of positive speed-up instances and a small increase in negative speed-up cases.

3.5 Prover9 incidence speed-up matrix without disjunction

Provable Cases: 6680.
Positive : 312, percentage: 4.67%.
Negative : 0, percentage: 0%.

Figure 7: A color scale representation of the incidence speed-up matrix obtained using Prover9 on the cases of experiment 3.3 (classical deduction without disjunction). The image is divided into four parts for visualization purposes. As with Figure 2, the absence of the predicted diagonal structures is conspicuous, but of greater importance is the total absence of instances of negative speed-up.

3.6 Observations and Conclusions

The main objective of this project was to undertake an empirical exploration of the prevalence and distribution of instances of positive speed-up found within the propositional calculus. In particular, two deduction systems for propositional logic were explored, natural deduction and binary resolution, approximated respectively by the automated proving systems AProS and Prover9. A necessary (but not sufficient) condition was proposed in order to decide the adequacy of these approximations (Def. 7).

Given the speed-up matrices obtained, it is evident that neither AProS nor Prover9 conforms to the normality condition of Definition 7:

Prover9 cannot detect trivial cases with regularity; instead of the expected periodic diagonal patterns induced by the presence of instances of trivial speed-up, we find a number of vertical clusters of speed-up instances without a discernible regular distribution. This behavior is incompatible with the second condition of normality. We also found a non-negligible number of negative speed-up instances when a disjunction is included in the list of logical connectives. The presence of the disjunction seems to have little effect on the distribution of instances of positive speed-up.

AProS shows an important number of instances of negative speed-up (slow-down). While AProS does not have problems detecting cases of trivial speed-up, the number of instances of negative speed-up is greater than in Prover9. Furthermore, the presence and distribution of these instances is not significantly affected by the presence or absence of the disjunction, nor by the alternation between intuitionistic and classical deduction.

We consider the observed behaviors as evidence of the computational complexity that the proposed condition of normality entails: discerning the usefulness of new information is intrinsically computationally complex. We formalize this in the following statements:

Lemma 8.

There is a normal prover.


Proof. Given the argument , a brute force algorithm that, starting from the list , simply searches for while building the (infinite) tree of all possible logical derivations in a breadth-first search fashion, will always find the proof of minimal length in the form of the selected branch. ∎

The expected computational time of this algorithm is of the order , where is a polynomial that depends on the number and structure of the derivation rules used.
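To make the brute-force search of Lemma 8 concrete, here is a toy Python sketch of breadth-first proof search. It rests on strong simplifying assumptions that are ours: the only inference rule is modus ponens, formulas are atoms (strings) or implications encoded as `("imp", premise, conclusion)`, and proof length is counted as the number of rule applications. The function names are hypothetical; this is not the procedure implemented by AProS or Prover9.

```python
from collections import deque


def mp_consequences(known):
    """Formulas obtainable from `known` by a single application of modus ponens."""
    out = set()
    for f in known:
        if isinstance(f, tuple) and f[0] == "imp" and f[1] in known and f[2] not in known:
            out.add(f[2])
    return out


def shortest_proof_length(axioms, goal):
    """Breadth-first search over sets of derived formulas.

    Returns the minimal number of modus ponens steps needed to reach `goal`,
    0 for a trivial argument (goal already an axiom), or None if unreachable.
    """
    start = frozenset(axioms)
    if goal in start:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        known, steps = queue.popleft()
        for new in mp_consequences(known):
            if new == goal:
                return steps + 1  # first hit in BFS order is a shortest proof
            nxt = known | {new}
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, steps + 1))
    return None


base = ["p", ("imp", "p", "q"), ("imp", "q", "r"), ("imp", "r", "s")]
print(shortest_proof_length(base, "s"))          # base theory
print(shortest_proof_length(base + ["r"], "s"))  # theory strengthened with the theorem r
```

Because the search is exhaustive, adding a theorem as an axiom can never lengthen the proof it returns; the price is a number of explored states that grows exponentially in general.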

Theorem 9.

Given a non-trivial argument and a non-empty set such that , deciding if is NP-hard.


Proof. Given the results found in [11] we can say that, if P ≠ NP, there is no polynomial time algorithm that can find (if it is polynomial with respect to ). However, if we can decide in polynomial time then we can find in polynomial time:

The algorithm we propose iterates the answer to on each possible derivation from the list of chosen formulas ( starts with the list of axioms). The number of derived formulas is polynomial with respect to . Each of the positive instances is added to a list . The formula in that minimizes the proof length can be found in by pairwise comparison using the result of and for each . Note that if both values are TRUE, then both formulas must be part of the smallest proof that contains any of the propositions, so we can choose to add just one formula and the other one will eventually be added on to the following iterations if it is part of the smallest demonstration. We add the selected expression to the list .

Following the stated procedure, we will eventually reach a trivial argument, finishing the demonstration of the tautology in the form of list (the list contains a succession of formulas derived in order from ). Note that we consult a polynomial time algorithm a polynomial number of times over the size of a list that grows up to a polynomial size with respect to the input. Hence the algorithm finds the smallest proof in polynomial time. ∎

Given the demonstrated difficulty of the problem, we make the following conjecture:

Conjecture 10.

There is no normal proving algorithm significantly faster than the brute force algorithm described in Lemma 8.

In other words, we are proposing the existence of an ineluctable tradeoff between normality and execution time of an automated proving algorithm. Conjecture 10 also implies that, for AProS and Prover9, the function is of exponential order for each .

Finally, in the context of an argument , we can say that a set contains useful information if it induces (positive) speed-up, and that the information is useless otherwise. With the experiment and Theorem 9 we have presented empirical and theoretical arguments as to why discerning the usefulness of new information for solving a specific problem is as hard as the problem itself.

As for the differences found in the speed-up matrices for Prover9 and AProS, we believe that these emerge mostly due to an initial syntactic analysis performed by AProS that allows it to detect trivial cases. The exception is the removal of disjunction, which results in no slow-down instances for Prover9, although this change does not seem to affect the positive speed-up distribution in a significant way. Conjecture 10 suggests that, for AProS and Prover9, the function is of exponential order for almost all ’s, and that both are within a linear constant between them, else we should be able to find a shortcut to normality.

Whether the result presented in Figure 7 is a counterexample to this statement and to Conjecture 10 is an open question. We could argue that simplifying the formulas by removing a number of logical connectives is doing the prover’s job. And, if we do restrict our space to a simpler (yet complete) set of formulas, it should be possible to define a stronger normality condition:

Definition 11.

A system is normal for if there exists a polynomial time algorithm that calls ATP as an oracle for deciding , with , and as in Theorem 9.


  • [1] J. Joosten, F. Soler-Toscano and H. Zenil, Program-size Versus Time Complexity, Speed-up and Slowdown Phenomena in Small Turing Machines, Int. Journ. of Unconventional Computing, special issue on Physics and Computation, vol. 7, no. 5, pp. 353-87, 2011.
  • [2] H. Zenil, From Computer Runtimes to the Length of Proofs: With an Algorithmic Probabilistic Application to Waiting Times in Automatic Theorem Proving. In M.J. Dinneen, B. Khousainov, and A. Nies (Eds.), Computation, Physics and Beyond, WTCS 2012, LNCS 7160, pp. 223-240, Springer, 2012.
  • [3] J. Joosten, F. Soler-Toscano, H. Zenil, Speedup and Slowdown Phenomena in Turing Machines, Wolfram Demonstrations Project, Published: November 8, 2012.
  • [4] A. M. Turing, On Computable Numbers, with an Application to the Entscheidungsproblem, Proceedings of the London Mathematical Society, 2 42: 230-65, 1937.
  • [5] G. J. Chaitin, Algorithmic Information Theory, Cambridge Tracts in Theoretical Computer Science Volume 1, Cambridge University Press, 1987.
  • [6] G. J. Chaitin, On the length of programs for computing finite binary sequences, Journal of the ACM, 13(4):547-569, 1966.
  • [7] M. Hutter, Algorithmic complexity. Scholarpedia, 3(1):2573, 2009
  • [8] K. Gödel, On formally undecidable propositions of Principia Mathematica and related systems I in Solomon Feferman, Collected works, Vol. I. Oxford University Press: 144-195, 1986.
  • [9] A. N. Kolmogorov, Three approaches to the quantitative definition of information. Problems of Information and Transmission, 1(1):1–7, 1965.
  • [10] R. J. Solomonoff, A formal theory of inductive inference: Parts 1 and 2. Information and Control, 7:1-22 and 224-254, 1964.
  • [11] M. Alekhnovich, S. Buss, S. Moran, and T. Pitassi, Minimum Propositional Proof Length is NP–Hard to Linearly Approximate, The Journal of Symbolic Logic, Volume 66: 171–191, 2001.
  • [12] K. Kunen, Set Theory: An Introduction to Independence Proofs, Elsevier: 133–138, 1980.
  • [13] M. Huth, and M. Ryan, Logic in Computer Science: Modelling and reasoning about Systems, Cambridge University Press, UK, 2004.
  • [14] J.E. Hopcroft, R. Motwani, and J.D. Ullman, Introduction to Automata Theory, Languages, and Computation, Addison Wesley, Boston/San Francisco/New York: 368, 2007.
  • [15] B. Christoph, S. Geoff, Working with Automated Reasoning Tools, 2008.
  • [16] R. Cilibrasi, P.M.B. Vitanyi, Clustering by compression, IEEE Trans. Inform. Theory, 51:12(2005), 1523–1545.
  • [17] W. McCune, “Prover9 and Mace4”, 2005-2010.
  • [18] W. Sieg, “AProS”, 2006–2012.
  • [19] W. Sieg, The AProS Project: Strategic Thinking & Computational Logic, Logic Journal of the IGPL, 15(4): pp. 359-368, 2007.
  • [20] W. Sieg and J. Byrnes, Normal natural deduction proofs (in classical logic), Studia Logica 60, pp. 67-106, 1998.
  • [21] M. R. Garey and D. S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, W.H. Freeman, 1979.