1 Introduction
Speedup in automatic theorem proving should not be regarded just as a quest for faster provers. Proving a theorem is an inherently difficult task, and a very sensible question is whether supplying additional information can provide a shortcut. The issue, then, is what information can be added so that a proof becomes significantly shorter. Viewed very abstractly, there is of course an easy (and trivial) answer: add the theorem itself and the proof becomes minimal! But adding all provable theorems to an axiomatic system renders it pointless.
On the other hand, a system under memory constraints that are not minimal (i.e. that can store a set of useful theorems apart from the axioms of a theory) but at the same time bounded by a realistic measure, would be theoretically optimal if it stored only the most useful theorems for a certain task (that is, those that can make a proof shorter). A naive strategy would be to prove theorems incrementally and save them (until the memory is full). When a new theorem has to be proved, the system can resort to previous theorems to see whether a shorter proof can be obtained, rather than trying to prove the theorem from scratch. But how useful can this approach be? As we will show, it is not only naive, and so expectedly not very useful, but it is also rarely fruitful and, in equal measure, a waste of resources.
For this, we use a very basic but objective measure of speedup, namely shortening the length of proofs. We sample random propositional calculus theorems and, with the help of two automatic theorem provers (AProS and Prover9), we produce alternative proofs for them, including other (random) valid formulas as additional axioms. Afterwards, we compare the lengths of the proofs to establish whether a shorter one arose from the use of the new axioms. In order to select our samples, we restrict our analysis to a fragment of propositional logic, given by an upper bound on the number of atomic propositions and of logical connectives.
To find a proof for a theorem from a finite set of formulas using an automatic theorem prover, one can always start from that set and apply all transformation rules until the theorem is generated, then pick out the sequence on the path from the set to the theorem as the proof. In practice, however, important optimization strategies have to be implemented to avoid exponential execution time even for the simplest of proofs. We will explore how implementing these algorithmic strategies leads to a compromise between various seminal complexity currencies, shedding light on a possibly more general challenge in problem solving, related to the usefulness of information and computational time.
Definition 1.
Let be a finite set of formulas and let stand for the fact that is provable from . will denote the length of the minimum proof for , as given by the number of logical deduction steps used in the proof (for example, the number of lines in a Fitch diagram).
Let and be two representations of the class of equivalent theories (that is, and are finite sets of axioms with an identical set of logical consequences, or theories) such that . We define the speedup delta for between and , or simply speedup, as the function:
Let be a finite subset of the set of all valid formulas in propositional logic and let and . Now will be the set of all logical consequences (theorems) of in . Finally, the set is the subset of such that for a given subset of , with , we have the following property:
Note that this property is equivalent to
We ask after the relation and distribution between the size of the set with respect to the size of a finite subset of as a function of the set . In other words, given a random axiomatic system, if we strengthen it by adding a number of theorems as axioms, what does the distribution of nontrivial speedups look like? A set of valid propositional calculus sentences and axiomatic systems is constructed given three variables: represents the maximum depth of composition of logical operators, states the maximum number of different literals that can be present in a sentence, and determines the number of axioms in an axiomatic system.
We will report not only that instances of nontrivial positive speedup are relatively rare, but that their number is considerably smaller than the number of instances of negative speedup. In other words, strengthening a random axiomatic system by adding theorems as axioms tends to increase the length of proofs found by automatic theorem provers. We believe that the behavior observed verifies the stated condition and that its skewed distribution is strongly related to the computational difficulty of finding a proof of minimum size and of deciding the usefulness of information as a whole.
2 Methodology
Prover9 is an automated theorem prover for first-order and equational logic developed by William McCune [17] at the Argonne National Laboratory. Prover9 is the successor of the Otter theorem prover. Prover9 is free and open source. AProS (Automated Proof Search) is a theorem prover that aims to find normal natural deduction proofs of theorems in propositional and predicate logic [18]. It does so efficiently for minimal, intuitionistic and classical versions of first-order logic.
We undertake an empirical exploration of the speedup in
propositional logic, using AProS
and Prover9 in order to approximate the proof complexity
over randomly generated sets of axiomatic systems and
propositions. We later propose a necessary but possibly insufficient
property (in definition 7) for judging the adequacy
of the approximations so obtained.
To perform the exploration described, we have first to narrow the search space: We denote the set of all propositions bounded by and by , the set of all theories in by (), while and denote the sets of all theories and arguments bounded by and respectively. Finally, we define the sets:
and
The sets are generated recursively as follows:
(2) 
where ,
and is the set that
consists of the first variables of .
Note that, given an order to the set of Boolean operators, we can define an enumeration of all the members of based on its generation order, an order which is consistent among all the subsets for a given . The order defined is equivalent to giving each set an array structure on which the first position correspond to the first element of and to the last, where is defined as
We call this array the propositions array. Given this
order, we can represent each theory over the propositions array
by a binary string with one bit per array position, on which the
i-th bit is 1 iff the i-th proposition is in the theory. Following
the previous idea, we can efficiently represent these theories by an
array of integers, where each integer denotes the number of 0s
present between each 1. Hence we can represent the theories by integer arrays of the form . We call this the representation of the theory
and denote it by .
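The encoding just described can be sketched in a few lines. This is only an illustration under stated assumptions: we assume each integer counts the 0s that precede a 1 (counted from the previous 1, or from the start of the string), which is consistent with [0,…,0] being the first theory, and that the separation order used later is the sum of the integers. The function names `bits_to_gaps`, `gaps_to_bits` and `separation_order` are ours, not the paper's.

```python
from typing import List


def bits_to_gaps(bits: str) -> List[int]:
    """Encode a 0/1 string as the run of 0s found before each 1."""
    gaps, run = [], 0
    for b in bits:
        if b == "0":
            run += 1
        elif b == "1":
            gaps.append(run)  # number of 0s since the previous 1
            run = 0
        else:
            raise ValueError("bits must contain only '0' and '1'")
    return gaps


def gaps_to_bits(gaps: List[int], size: int) -> str:
    """Rebuild the 0/1 string of the given size from its gap array."""
    body = "".join("0" * g + "1" for g in gaps)
    if len(body) > size:
        raise ValueError("gap array does not fit in the propositions array")
    return body + "0" * (size - len(body))  # trailing 0s are implicit


def separation_order(gaps: List[int]) -> int:
    """Assumed separation order: the sum of the gaps."""
    return sum(gaps)


theory_bits = "0100101000"            # propositions 2, 5 and 7 are axioms
theory_gaps = bits_to_gaps(theory_bits)
```

A theory with three axioms is stored as three integers regardless of the size of the propositions array, which is what makes the representation compact.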
Definition 3.
If and , then is known as the syntactic complexity of or depth of ; it is denoted by .
Now let’s consider the following function:
The function gives us the degree of separation between the first element in the array of propositions ([0,…,0]) and . Moreover, there is an exponential number of theories that share a value for the function. We call such a set, defined by , a separation class and the separation order of all the theories in the class. Recall that the position of each proposition depends on its order of generation. Hence the syntactic complexity of each theory is set by its order of separation.
Definition 4.
We define the syntactic complexity of a theory , denoted as , as
where is the syntactic complexity of .
Note that .
The size of the systems we are exploring grows by , making an exhaustive exploration
intractable. The methodology used therefore consists in sampling
the space of valid propositional sentences. The propositions are
then used as theorem candidates against subsets of the formulae
used as theories.
In order to compute each sample set we build two sets: a sample
set of theories denoted by , composed of number of theories
in , and the sample set of prospective theorems
denoted by , composed of numbers of propositions in
. Each set is randomly generated by, first, choosing a
random list of numbers of the respective lengths between and
. For the list , each of these numbers represents the
prospective theorems sampled (for each theory). For the list ,
the numbers represent a separation class from which we choose a
theory by assigning random numbers to each of the parameters of
its representation, with the condition that their sum is the
value of the chosen class. The chosen lists are then rid of
inconsistent theories for and inconsistent propositions with
respect to the first element for the list
.
Afterwards, we use the lists obtained to compute a sample set of
. First, for each we generate an
additional number of theories of the form , where
is a prefix of the list ; we call a base theory and a
derived theory. Then, we pair all the theories generated
with each of the propositions of , called
objectives, to form a list of cases of the form
. Afterwards, we remove the unprovable
cases by exhaustively exploring the corresponding
truth tables.
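The truth-table filtering step can be sketched as follows. This is a minimal illustration, not the code used in the experiments: the formula encoding (atoms as strings, compound formulas as tuples tagged "not", "and", "or", "imp") and the function names are assumptions made for the example.

```python
from itertools import product


def atoms(f):
    """Set of atomic propositions occurring in formula f."""
    if isinstance(f, str):
        return {f}
    return set().union(*(atoms(sub) for sub in f[1:]))


def evaluate(f, env):
    """Truth value of f under the assignment env (dict atom -> bool)."""
    if isinstance(f, str):
        return env[f]
    op = f[0]
    if op == "not":
        return not evaluate(f[1], env)
    a, b = evaluate(f[1], env), evaluate(f[2], env)
    if op == "and":
        return a and b
    if op == "or":
        return a or b
    if op == "imp":
        return (not a) or b
    raise ValueError(f"unknown connective: {op}")


def assignments(formulas):
    """All truth assignments over the atoms of the given formulas."""
    names = sorted(set().union(*(atoms(f) for f in formulas)))
    for values in product([False, True], repeat=len(names)):
        yield dict(zip(names, values))


def is_consistent(theory):
    """A theory is consistent iff some row satisfies all its axioms."""
    return any(all(evaluate(ax, env) for ax in theory)
               for env in assignments(theory))


def entails(theory, goal):
    """The case (theory, goal) is provable iff goal holds in every row
    of the truth table that satisfies all the axioms."""
    return all(evaluate(goal, env)
               for env in assignments(list(theory) + [goal])
               if all(evaluate(ax, env) for ax in theory))
```

The check is exponential in the number of atoms, which is affordable here only because the number of variables per sentence is bounded.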
It is important to note that we are generating a significant number of
trivial cases when . Hence we expect at least close to
instances of positive speedup, depending
on the number of unprovable cases.
Finally, we run an automatic theorem prover (ATP) and register the length of each of the proofs obtained, storing the shortest ones. However, we have no reason to believe that the use of an ATP system would give us a good approximation to the sparsity of proving speedup. Therefore we define a speedup function relative to each ATP:
Definition 5.
Let be an ATP system, a provable argument for , and two descriptions of such that . We define the speedup delta relative to of between and , or simply relative speedup, as the function:
where is the shortest proof found by for the argument .
Definition 6.
Let be a formal system, be an ATP system for , a provable argument for , and a description for such that . We call the function a bound for as a function of if
Now, since we do not have enough information about the existent sparsity of proving speedup, we will define a necessary but possibly insufficient condition needed for an acceptable approximation:
Definition 7.
We say that an argument is trivial if
. An ATP system is possibly normal for
if, for each , , and if
is a nontrivial
argument, then .
(In lemma 8 we will show that normality is possible, although our example is far from ideal.)
The mathematical structure used to store and analyze the results obtained is called the speedup matrix. In this matrix each entry is assigned the value
where
If is not a provable case, then the
value of the entry is left undefined.
A natural simplification of this matrix is the incidence
speedup matrix, in which we simply assign a different discrete
value to each of the following four cases: the speedup is
positive, the speedup is zero, the speedup is negative, and the
speedup is undefined.
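A minimal sketch of how an entry of the incidence speedup matrix could be computed from two proof lengths follows. The sign convention (base length minus derived length, so that a shorter proof from the derived theory counts as positive speedup) is an assumption made for the example, as are the names `incidence` and `summarize` and the sample lengths.

```python
from collections import Counter
from typing import Iterable, Optional


def incidence(base_len: Optional[int], derived_len: Optional[int]) -> Optional[int]:
    """Return +1, 0 or -1 for positive, zero or negative speedup.

    None stands for an undefined entry (unprovable case or timeout).
    """
    if base_len is None or derived_len is None:
        return None
    delta = base_len - derived_len  # assumed sign convention
    return (delta > 0) - (delta < 0)


def summarize(entries: Iterable[Optional[int]]) -> Counter:
    """Count how many entries fall into each of the four cases."""
    labels = {1: "positive", 0: "zero", -1: "negative", None: "undefined"}
    return Counter(labels[e] for e in entries)


# Hypothetical (base, derived) proof lengths for four cases.
cases = [(7, 5), (5, 5), (5, 8), (None, 4)]
row = [incidence(b, d) for b, d in cases]
counts = summarize(row)
```

Only the sign of the difference is kept, so the sketch does not depend on how the speedup value itself is normalized.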
Note that by design we expect a prevalence in both matrices
of diagonal structures composed of cases of positive speedup with
a period of . These structures correspond to the
cases of trivial speed up included.
3 Results
To begin with, we performed more than 15 experiments using Prover9.
The following table summarizes a selection of the results:
Exp. Num.  Cases  Positive  Percentage  Negative  Ratio
11  5400  606  11.2%  94  6.44
10  6381  704  11.01%  137  5.138
7  4848  389  8.02%  231  1.683
5  5454  426  7.81%  24  17.75
3  11297  856  7.57%  70  12.228
As the table shows, Prover9 does not exhibit normal behavior as defined in Definition 7. Furthermore, as exemplified in Fig. 2, the speedup matrix does not present the expected periodic diagonal speedup instances.
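The Percentage and Ratio columns of the table can be rechecked from the raw counts. The sketch below assumes that the third and fifth columns are the numbers of positive and negative speedup instances, a reading that is not stated in the table itself but is consistent with the arithmetic. The names `rows`, `percentage` and `ratio` are ours.

```python
# (experiment, provable cases, positive count, negative count) per table row,
# under the assumed reading of the columns.
rows = [
    (11, 5400, 606, 94),
    (10, 6381, 704, 137),
    (7, 4848, 389, 231),
    (5, 5454, 426, 24),
    (3, 11297, 856, 70),
]


def percentage(cases: int, positive: int) -> float:
    """Share of provable cases that show positive speedup, in percent."""
    return 100.0 * positive / cases


def ratio(positive: int, negative: int) -> float:
    """Positive speedup instances per negative speedup instance."""
    return positive / negative


for exp, cases, pos, neg in rows:
    print(f"{exp:>3}  {percentage(cases, pos):6.2f}%  {ratio(pos, neg):7.3f}")
```

Recomputed this way, the values agree with the table up to rounding in the last printed digit.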
Next, we present the incidence speedup matrices obtained
from AProS under four different conditions: the basic
experiment (the four logical connectives and classical deduction)
(3.1); the basic experiment with intuitionistic logic (3.2);
exclusion of disjunction as a connective (3.3); and intuitionistic
logic restricted to the negative fragment as in 3.3
(3.4). The same set of cases was used when possible, i.e.
3.1 with 3.2 and 3.3 with 3.4. Also included is
the matrix obtained for Prover9 without disjunction (3.5).
It is important to note that for this Prover9 matrix we obtained
no negative speedup values. The matrix values are represented
using a color scale, where white corresponds to no speedup, blue
to positive speedup and red to negative speedup. Grey corresponds to
unprovable cases or cases in which the time limit
was reached.
The four experiments yield similar behavior: although AProS
does show periodic diagonal structures, it also exhibits a
significant presence of negative speedup instances, which means
that the ATP is not normal.
It is important to note that a degree of clustering of negative
speedup instances is expected, since each delta is computed from
the minimum proof length found for each previously derived theory
and current objective. It is arguable whether or not we are
overcounting negative speedup instances.
Each speedup matrix is divided into four parts for visualization purposes. Each of the entries’ values is represented using a four-color scale, where the color white corresponds to no speedup, blue to positive speedup and red to negative speedup. The columns correspond to each theory generated and the rows to each of the theorems. Grey corresponds to unprovable cases or cases where a time limit was reached.
3.1 AProS incidence speedup matrix with classical deduction and all four logical connectives
Provable cases: 5763.
Positive: 632, percentage: 10.97%.
Negative: 564, percentage: 9.78%.
3.2 AProS incidence speedup matrix with intuitionistic deduction and all four logical connectives
Provable cases: 5680.
Positive: 646, percentage: 11.37%.
Negative: 537, percentage: 9.45%.
3.3 AProS incidence speedup matrix with classical deduction and without disjunction
Provable cases: 6680.
Positive: 899, percentage: 13.46%.
Negative: 484, percentage: 7.246%.
3.4 AProS incidence speedup matrix with intuitionistic deduction and without disjunction
Provable cases: 6660.
Positive: 862, percentage: 12.94%.
Negative: 587, percentage: 8.81%.
3.5 Prover9 incidence speedup matrix without disjunction
Provable cases: 6680.
Positive: 312, percentage: 4.67%.
Negative: 0, percentage: 0%.
3.6 Observations and Conclusions
The main objective of this project was to undertake an empirical exploration of the prevalence
and distribution of instances of positive speedup found within the propositional calculus.
In particular, two deduction systems for propositional logic were explored, natural deduction and binary
resolution, approximated respectively by the automated proving systems AProS and Prover9.
A necessary (but not sufficient) condition was proposed in order to decide the adequacy of
these approximations (Def. 7).
Given the speedup matrices obtained, it is evident
that neither AProS nor Prover9 conforms to the
normality condition defined:
Prover9 cannot detect trivial cases with regularity;
instead of the expected periodic diagonal patterns induced by
the presence of instances of trivial speedup, we find a number of
vertical clusters of speedup instances without a discernible
regular distribution. This behavior is incompatible with the
second condition of normality. We also found a nonnegligible
number of negative speedup instances when a disjunction is
included in the list of logical connectives. The presence of the
disjunction seems to have little effect on the
distribution of instances of positive speedup.
AProS shows an important number of instances of negative speedup
(slowdown). While AProS does not have problems detecting cases of trivial
speedup, the number of instances of negative speedup is greater than in Prover9.
Furthermore, the presence and distribution of these instances
is not significantly affected by the presence or absence of the disjunction,
nor by the alternation between intuitionistic and classical deduction.
We consider the observed behaviors as evidence of the computational complexity that the proposed condition of normality entails: discerning the usefulness of new information is intrinsically computationally complex. We formalize this in the following statements:
Lemma 8.
There is a normal prover.
Proof.
Given an argument, a brute force algorithm that, starting from the list of premises, simply searches for the conclusion while building the (infinite) tree of all possible logical derivations in a breadth-first fashion will always find the proof of minimal length in the form of the selected branch. ∎
The expected computational time of this algorithm is of the order , where is a polynomial that depends on the number and structure of the derivation rules used.
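The brute force search of Lemma 8 can be illustrated on a toy system. The sketch below is not the natural deduction calculus of AProS nor the resolution calculus of Prover9: it assumes a deliberately small rule set (modus ponens and conjunction elimination only), with formulas encoded as strings for atoms and tuples tagged "imp" or "and" for compound formulas. All names are ours. Because the search is breadth-first over sets of derived formulas, the first derivation that reaches the goal uses the fewest rule applications within this toy rule set.

```python
from collections import deque


def one_step(known):
    """Formulas obtainable from `known` by a single rule application."""
    new = set()
    for f in known:
        if isinstance(f, tuple) and f[0] == "and":
            new.update(f[1:])                 # conjunction elimination
        if isinstance(f, tuple) and f[0] == "imp" and f[1] in known:
            new.add(f[2])                     # modus ponens
    return new - known


def shortest_proof(axioms, goal):
    """Breadth-first search for a derivation of `goal` with the fewest steps.

    Returns the list of derived formulas, [] if the goal is already an
    axiom (a trivial argument), or None if the goal is not derivable.
    """
    start = frozenset(axioms)
    if goal in start:
        return []
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        known, steps = queue.popleft()
        for f in sorted(one_step(known), key=repr):
            path = steps + [f]
            if f == goal:
                return path
            nxt = known | {f}
            if nxt not in seen:               # each set of formulas is expanded once
                seen.add(nxt)
                queue.append((nxt, path))
    return None


base = ["p", ("imp", "p", "q"), ("imp", "q", "r")]
proof_base = shortest_proof(base, "r")
proof_derived = shortest_proof(base + ["q"], "r")    # theorem q added as axiom
proof_trivial = shortest_proof(base + ["r"], "r")    # goal itself added
```

In this example, adding the theorem q as an axiom shortens the minimal derivation of r by one step, and adding r itself yields the trivial case. The number of visited sets can grow exponentially with the number of derivable formulas, which is the cost the remark above refers to.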
Theorem 9.
Given a nontrivial argument and a nonempty set such that , deciding if is NP-Hard.
Proof.
Given the results found in [11] we can say that, if
P ≠ NP, there is no polynomial time algorithm that can find
(if it is polynomial with respect to
). However, if we can decide in polynomial time then we can find
in polynomial time:
The algorithm we propose iterates the answer to
on each possible
derivation from the list of chosen formulas ( starts with
the list of axioms). The number of derived formulas is polynomial
with respect to . Each of the positive instances is added to
a list . The formula in that minimizes the proof length can
be found in by pairwise comparison using the
result of
and for each
. Note that if both values are TRUE, then both
formulas must be part of the smallest proof that contains any of
the propositions, so we can choose to add just one formula and the
other one will eventually be added on to the following iterations
if it is part of the smallest demonstration. We add the selected
expression to the list .
Following the stated procedure, we will eventually reach a trivial argument, finishing the demonstration of the tautology in the form of list (the list contains a succession of formulas derived in order from ). Note that we consult a polynomial time algorithm a polynomial number of times over the size of a list that grows up to a polynomial size with respect to the input. Hence the algorithm finds the smallest proof in polynomial time. ∎
Given the demonstrated difficulty of the problem, we make the following conjecture:
Conjecture 10.
There is no normal proving algorithm significantly faster than the brute force algorithm described in Lemma 8.
In other words, we are proposing the existence of an ineluctable
tradeoff between normality and execution time of an
automated proving algorithm. Conjecture 10
also implies that, for AProS and Prover9, the function
is of exponential order for each .
Finally, in the context of an argument, we can
say that a set of added formulas contains useful information if it
induces (positive) speedup, and that the information is
useless otherwise. With the experiment and Theorem
9 we have presented empirical and theoretical
arguments as to why discerning the usefulness of new
information for solving a specific problem is as hard as the
problem itself.
As for the differences found in the speedup matrices for Prover9 and AProS, we believe that these emerge mostly due
to an initial syntactic analysis performed by AProS that
allows it to detect trivial cases. The exception is the removal
of disjunction, which results in no slowdown instances for Prover9,
although this change does not seem to affect the positive speedup
distribution in a significant way. Conjecture
10 suggests that, for AProS and Prover9, the function is of exponential order for
almost all ’s, and that both are within a linear constant
between them, else we should be able to find a shortcut to
normality.
Whether the result presented in figure 7 is a counterexample to this statement and to Conjecture 10 is an open question. We could argue that simplifying the formulas by removing a number of logical connectives is doing the prover’s job. And, if we do restrict our space to a simpler (yet complete) set of formulas, a stronger normality condition can be defined as:
Definition 11.
A system is normal for if there exists a polynomial time algorithm that calls the ATP as an oracle for deciding , with , and as in Theorem 9.
References

[1] J. Joosten, F. Soler-Toscano and H. Zenil, Program-size Versus Time Complexity, Speed-up and Slowdown Phenomena in Small Turing Machines, Int. Journ. of Unconventional Computing, special issue on Physics and Computation, vol. 7, no. 5, pp. 353–387, 2011.
[2] H. Zenil, From Computer Runtimes to the Length of Proofs: With an Algorithmic Probabilistic Application to Waiting Times in Automatic Theorem Proving. In M.J. Dinneen, B. Khoussainov, and A. Nies (Eds.), Computation, Physics and Beyond, WTCS 2012, LNCS 7160, pp. 223–240, Springer, 2012.
[3] J. Joosten, F. Soler-Toscano, H. Zenil, Speedup and Slowdown Phenomena in Turing Machines, Wolfram Demonstrations Project, Published: November 8, 2012.
[4] A. M. Turing, On Computable Numbers, with an Application to the Entscheidungsproblem, Proceedings of the London Mathematical Society, 2 42: 230–265, 1937.
[5] G. J. Chaitin, Algorithmic Information Theory, Cambridge Tracts in Theoretical Computer Science Volume 1, Cambridge University Press, 1987.
[6] G. J. Chaitin, On the length of programs for computing finite binary sequences, Journal of the ACM, 13(4):547–569, 1966.
[7] M. Hutter, Algorithmic complexity, Scholarpedia, http://www.scholarpedia.org/article/Algorithmic_complexity, 3(1):2573, 2009.
[8] K. Gödel, On formally undecidable propositions of Principia Mathematica and related systems I, in Solomon Feferman, Collected works, Vol. I, Oxford University Press: 144–195, 1986.
[9] A. N. Kolmogorov, Three approaches to the quantitative definition of information, Problems of Information and Transmission, 1(1):1–7, 1965.
[10] R. J. Solomonoff, A formal theory of inductive inference: Parts 1 and 2, Information and Control, 7:1–22 and 224–254, 1964.
[11] M. Alekhnovich, S. Buss, S. Moran, and T. Pitassi, Minimum Propositional Proof Length is NP-Hard to Linearly Approximate, The Journal of Symbolic Logic, Volume 66: 171–191, 2001.
[12] K. Kunen, Set Theory: An Introduction to Independence Proofs, Elsevier: 133–138, 1980.
[13] M. Huth and M. Ryan, Logic in Computer Science: Modelling and reasoning about Systems, Cambridge University Press, UK, 2004.
[14] J.E. Hopcroft, R. Motwani, and J.D. Ullman, Introduction to Automata Theory, Languages, and Computation, Addison Wesley, Boston/San Francisco/New York: 368, 2007.
[15] B. Christoph, S. Geoff, Working with Automated Reasoning Tools, http://www.cs.miami.edu/~geoff/Courses/TPTPSYS/, 2008.
[16] R. Cilibrasi, P.M.B. Vitanyi, Clustering by compression, IEEE Trans. Inform. Theory, 51:12 (2005), 1523–1545.
[17] W. McCune, “Prover9 and Mace4”, http://www.cs.unm.edu/~mccune/Prover9, 2005–2010.
[18] W. Sieg, “AProS”, http://www.phil.cmu.edu/projects/apros/, 2006–2012.
[19] W. Sieg, The AProS Project: Strategic Thinking & Computational Logic, Logic Journal of the IGPL, 15(4): pp. 359–368, 2007.
[20] W. Sieg and J. Byrnes, Normal natural deduction proofs (in classical logic), Studia Logica 60, pp. 67–106, 1998.
[21] M. R. Garey and D. S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, W.H. Freeman, 1979.