Automated Conjecturing VII: The Graph Brain Project & Big Mathematics

by N. Bushaw, et al.

The Graph Brain Project is an experiment in how the use of automated mathematical discovery software, databases, large collaboration, and systematic investigation provide a model for how mathematical research might proceed in the future. Our Project began with the development of a program that can be used to generate invariant-relation and property-relation conjectures in many areas of mathematics. This program can produce conjectures which are not implied by existing (published) theorems. Here we propose a new approach to push forward existing mathematical research goals---using automated mathematical discovery software. We suggest how to initiate and harness large-scale collaborative mathematics. We envision mathematical research labs similar to what exist in other sciences, new avenues for funding, new opportunities for training students, and a more efficient and effective use of published mathematical research. And our experiment in graph theory can be imitated in many other areas of mathematical research. Big Mathematics is the idea of large, systematic, collaborative research on problems of existing mathematical interest. What is possible when we put our skills, tools, and results together systematically?




1. Introduction

Our Project began with the development of a program that can be used to generate invariant-relation and property-relation conjectures in many areas of mathematics. This program can produce conjectures which are not implied by existing (published) theorems. Here we propose a new approach to push forward existing mathematical research goals—using automated mathematical discovery software. We suggest how to initiate and harness large-scale collaborative mathematics. We envision mathematical research labs similar to what exist in other sciences, new avenues for funding, new opportunities for training students, and a more efficient and effective use of published mathematical research.

The Graph Brain Project is an experiment in how the use of automated mathematical discovery software, databases, large collaboration, and systematic investigation provide a model for how mathematical research might proceed in the future. Our experiment is modular and can be usefully expanded. We investigated one small open problem in graph theory. In the course of this investigation we coded many graph theoretic concepts and graphs, and computed values of many invariants for these graphs. Other researchers working on other open problems, adding their own contributions and expertise, and following their own graph theoretic interests, can leverage and supplement our code—a multiplier effect. And our experiment in graph theory can be imitated in many other areas of mathematical research. Big Mathematics is the idea of large, systematic, collaborative research on problems of existing mathematical interest. What is possible when we put our skills, tools, and results together systematically?

Automated mathematical discovery programs are at the point where their utility to researchers cannot be ignored. Conjectures are the life-blood of mathematics. The papers [31, 28] include examples of automated conjectures for matrix theory, number theory, graph theory and chemical graph theory; these take the form of bounds for matrix, integer and graph invariants. In other research we have generated conjectures for combinatorial games, intersecting set systems, and linear programs, among others. The idea is general: all you need to get going are a few coded invariants and example objects. That said, even as we are able to coax our machines to do more and more things that historically required human ingenuity, human mathematicians will always have an essential role: computer contributions are necessarily judged by how much they help us achieve our human mathematical goals; our human questions are the yardsticks with which we measure our computer assistants.

Figure 1. These are all of the expressions of complexity no more than 3 that can be formed from invariants $\beta_1$, $\beta_2$, and $\beta_3$ and operators $+$ and $\sqrt{\ }$. These are real-valued functions that can be applied to objects of the type corresponding to the invariants.

A central idea is that computers can easily and exhaustively generate and evaluate all expressions formed from standard mathematical ingredients for relatively small numbers of example mathematical objects. These expressions, and their corresponding values for example sets of objects, can then be utilized in a variety of ways. In well-defined instances it can be argued that no human can find a simpler expression satisfying certain conditions using the same mathematical ingredients. Consider the search for upper bounds for some invariant $\alpha$. For these purposes we take an invariant to be a real-valued function of the objects. To be concrete, assume the objects are (finite) graphs and that $\beta_1$, $\beta_2$, …, $\beta_k$ are graph invariants. An upper bound of $\alpha$ will be some mathematical function $f$ of the $\beta_i$'s. This function may involve arithmetic operations, algebraic operations, or any other mathematical operations. Some real-number operators would include addition and square root ($+$ and $\sqrt{\ }$). In this case (and if $k = 3$ for the sake of example) the complexity-1 expressions would be the invariants themselves: $\beta_1$, $\beta_2$, $\beta_3$. The complexity-2 expressions would be all the expressions that can be formed from a mix of two operators or invariants. Since $\sqrt{\ }$ is the only unary operator, the only possibilities are: $\sqrt{\beta_1}$, $\sqrt{\beta_2}$, $\sqrt{\beta_3}$. The complexity-3 expressions are: $\sqrt{\sqrt{\beta_1}}$, $\sqrt{\sqrt{\beta_2}}$, $\sqrt{\sqrt{\beta_3}}$, $\beta_1 + \beta_1$, $\beta_1 + \beta_2$, $\beta_1 + \beta_3$, $\beta_2 + \beta_2$, $\beta_2 + \beta_3$, and $\beta_3 + \beta_3$ (a modern computer algebra system can identify and remove expressions equivalent due to additive commutativity, etc.). A program can recursively generate all possible $f$'s up to any specified complexity. Generating expressions will face combinatorial explosion, but there is no difficulty in generating all (relatively small) human-comprehensible expressions. (Our program can generate more than 100 million expressions per second, depending on the complexity of the expressions, on a standard laptop.) Our conjecturing program can either evaluate these expressions on the fly for a particular object (graph) or, better, access a database of pre-computed invariant values. These generated, evaluated expressions, together with a list of existing bounds for the invariant $\alpha$, are the main ingredients in generating conjectures that improve on all published bounds for $\alpha$.
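The enumeration by complexity described above can be sketched in a few lines of Python. This is only an illustration, not the authors' program: the function name `expressions_by_complexity`, the string representation, and the invariant names `b1`, `b2`, `b3` are assumptions made for the example, and the only simplification applied is to list each commutative sum once.

```python
from itertools import combinations_with_replacement


def expressions_by_complexity(invariants, max_complexity):
    """Return {c: [expression strings of complexity exactly c]}.

    Complexity counts each invariant and each operator symbol once.
    Operators assumed for this sketch: unary sqrt and binary (commutative) +.
    """
    levels = {1: list(invariants)}
    for c in range(2, max_complexity + 1):
        found = []
        # Unary operator: sqrt applied to an expression of complexity c - 1.
        for e in levels[c - 1]:
            found.append(f"sqrt({e})")
        # Binary operator: the operands share the remaining c - 1 symbols.
        for left_c in range(1, c - 1):
            right_c = c - 1 - left_c
            if left_c > right_c:
                break  # the mirrored split was already handled
            if left_c == right_c:
                # Unordered pairs, so that a + b and b + a appear only once.
                pairs = combinations_with_replacement(levels[left_c], 2)
            else:
                pairs = ((a, b) for a in levels[left_c] for b in levels[right_c])
            for a, b in pairs:
                found.append(f"({a} + {b})")
        levels[c] = found
    return levels


levels = expressions_by_complexity(["b1", "b2", "b3"], 3)
for c, exprs in levels.items():
    print(c, len(exprs), exprs)
```

With three invariants this yields 3, 3, and 9 expressions of complexity 1, 2, and 3. Beyond that point a real system would also need to remove further algebraic duplicates, which this sketch does not attempt.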

Objects              Invariants                                     Properties
Graphs               independence_number, radius                    is_tree, is_hamiltonian
Symmetric Matrices   det, max_eigenvalue                            is_unitary, is_positive_definite
Natural Numbers      distinct_prime_factors, largest_prime_factor   is_prime, is_even

Figure 2. A user of Conjecturing will need to input some examples of objects, invariants and properties.

In the case of bounds for an invariant of a mathematical object, the program functions better the more relevant invariants it has available for its conjectures. That is, if there are unknown bounds that are functions of invariants known to mathematicians (and recorded in the mathematics literature), these will be produced by the program provided the invariants and properties are included in the program. In particular, the program produces the simplest (in terms of complexity) bounds that are true with respect to the objects that it knows, using the invariants that it knows and any other given constraints.

The foundational idea of Big Mathematics is to form research groups of various sizes to work on specific mathematical problems using automated discovery tools and databases, and exploiting the mathematical literature systematically. Some members of a research group might code invariants, objects and properties. Other members can be in charge of generating conjectures (which can be done automatically), and of testing conjectures and finding counterexamples (which can be done systematically for small objects if object-generators are coded). Other members can work on proving conjectures. A group might have a library specialist (responsible for identifying all existing theorems that are relevant for an investigation, and keeping track of new concepts to be coded) and a code-management and database specialist (to maintain stable code, manage versioning and code updates, and compute and store values for all coded invariants for all coded objects). In order to maximize what is possible, research groups will need to code huge bodies of published mathematical research. This research, in any mathematical sub-field, consists of large numbers of published examples of mathematical objects, invariants and properties of those objects, together with other related concepts. A nice feature is that, once coded, any other researcher or group can use and build on this work. Ideally we could build code-bases of graph theoretic knowledge that are easy and profitable to use and extend, and that enjoy network effects.

In the following sections we mention the historical context of our research—which goes back to the earliest days of computer science and artificial intelligence research. We discuss an example that demonstrates what is currently possible. And finally we discuss how this example—and our Graph Brain Project—can be ramped up to help mathematicians more quickly—and systematically—attain our shared research goals.

2. Background

This Graph Brain Project is motivated by our research in automated mathematical conjecturing programs, a small part of the larger area of automated mathematical discovery research. Alan Turing, in a 1948 report on “Intelligent Machinery”, suggested mathematics as a domain to begin with in building a “thinking machine” [46]. Ever since, researchers have worked to automate parts of mathematics, with varying success, and to develop computer tools that provide intelligent assistance to mathematics researchers.

Figure 3. Alan Turing; William McCune; Paul Erdős & Siemion Fajtlowicz. Erdős was as well-known for his conjectures as for his theorems


Automated theorem proving was the first and has been the most studied area. The first programs to prove theorems were developed in the 1950s [44]; and the McCune/Otter 1996 computer proof of the Robbins Conjecture [36] was a milestone in this area. Zeilberger has done impressive research on the automatic proof of conjectured combinatorial identities [48]. The first program to make conjectures leading to published mathematical research was Fajtlowicz’s Graffiti program [18]. Research on integer relation detection between sets of numbers has led to surprising conjectures and breakthroughs, including a new formula for the digits of the number $\pi$ [2]. Of course, all this is only a small part of what mathematical research consists in, or of what might be attempted.

Figure 4. Charles Babbage, Carl Friedrich Gauss, and Neil Sloane

We demonstrate that building and maintaining databases of non-trivial computational results (for instance, values of NP-complete graph invariants for all published graphs) will be generally useful in scientific research; this can be coordinated and standardized, and will be a component of Big Mathematics. The utility of significant computations has a long history in our subject, going back at least to Ptolemy’s trigonometry tables, and more recent log tables [8]. It should be noted that Babbage promoted his Difference Engine as having the advantage of producing accurate mathematical tables free of human calculating error, and this may have been the largest funded mathematics-related project ever [16]. Accurate computations are not only valuable for engineering purposes, but even for purely mathematical investigations: Gauss conjectured the Prime Number Theorem on the basis of the table of primes he had computed. The On-Line Encyclopedia of Integer Sequences (OEIS, initiated by Neil Sloane 50 years ago [45]), a 21st-century analog of Gauss’ tables that makes essential use of modern computer resources, is a familiar tool for many researchers searching for patterns.

Figure 5. Babbage’s Difference Engine, and associated logarithm tables.

Larson and Van Cleemput have developed a general-purpose conjecturing program, built around Fajtlowicz’s Dalmatian heuristic, that has demonstrated its utility for a number of areas of mathematical research [31]. Generated expressions function as conjectured bounds for an investigated invariant. These are tested for truth with respect to the stored objects. Conjectures are not stored or produced unless they imply a better approximate value for at least one coded object than any coded theoretical bound or previously stored conjectured bound.

Figure 6. Fajtlowicz’s Dalmatian heuristic. Graphs are on the horizontal axis. Conjectured bounds provide maximum and minimum values which can be used to estimate the independence number $\alpha$: the true values of $\alpha$ are spots between the curves of these theoretical ranges.

In some instances we have been able to prove the conjectures of our program—two new theorems are reported here. One attractive theorem resulted from our 2015 summer project investigating the combinatorial game Chomp [3]; several more resulted from our 2016 summer project investigating the domination number of benzenoids [28].

Figure 7. David Gale, Chomp board positions. One conjecture led to the theorem that, for any position where the previous player to play has a winning strategy, the number of remaining cookies is at least one less than twice the number of non-empty columns.

We are graph theorists. The best approach to demonstrate the utility of the kind of research programs that we are advocating is to attempt this research for graphs. A graph (or network) is a mathematical object consisting of vertices and edges between them. Graphs are used to model many situations: these include molecular structure [13, 25, 21], the World Wide Web [41], social networks [39], and GPS satellite networks [6]. And results in graph theory can be used as tools for proving results in other areas of mathematics: one very nice example is the proof of the Birkhoff-von Neumann theorem (that every doubly stochastic matrix can be written as a convex combination of permutation matrices) using the König-Egervary theorem (that the covering number of a bipartite graph equals its matching number).


Figure 8. Sir Harold Kroto, co-discoverer of fullerene molecules, holding a model of a buckyball; a graph of buckminsterfullerene $C_{60}$.

We will demonstrate the potential of our approach by investigating conjectured bounds for the independence number of a graph, a fundamental graph theory concept, intractable, and computationally equivalent to hundreds of other concepts in discrete mathematics. We have generated new conjectured bounds for the independence number of a graph which are not implied by any existing (published) bounds.

3. Independence Number and the conjecturing Program

The independence number (or stability number) of a graph is the largest number of points in the graph where no pair of the points has a line between them. It is a widely studied hard-to-compute graph invariant which arises in a variety of situations. Calculating the independence number of a graph can be used to optimize the configuration of a GPS network. Stable benzenoids [42] and small stable fullerenes tend to minimize their independence numbers [19]. The independence number of a graph is a central concept of two of the most studied and important problems in graph theory: the P vs. NP question [22], and Hadwiger’s Conjecture [14, 35, 9]. Many families of combinatorial objects including error-correcting codes, set packings in Hamming spaces, and balanced incomplete block designs can be viewed as maximum independent sets [40].

Figure 9. The red vertices are a maximum independent set in the Petersen graph ($\alpha = 4$). A GPS satellite: independence number calculations were used to help position the GPS III satellites.

One well-known application is the calculation of the probability of unambiguous message transmission in a channel [33]. A message consists of a string of letters. Some of these letters can be confounded or confused; for instance “b” and “d” can be confounded. A graph can be defined with the letters of the alphabet as vertices and an edge between two letters if they can be confounded. For a message with $n$ letters a graph can be defined with all $n$-length strings, words or messages as vertices and an edge between two distinct strings if, in every position, the corresponding letters are either equal or can be confounded. An independent set in this graph corresponds to a set of strings no pair of which can be confounded. The independence number of this graph would then represent the size of a largest dictionary of $n$-length strings which can be sent without any risk of error. (Appropriately normalized, this number is the Shannon capacity.) This number can also be used to calculate the probability that some number of randomly chosen strings or words can be sent without error.
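A small computation makes the dictionary interpretation concrete. The sketch below assumes a hypothetical five-letter alphabet in which each letter can be confounded with its two cyclic neighbours (the pentagon), and treats two distinct words as confusable when every position holds equal or confusable letters. The alphabet, the helper names, and the exponential-time branching routine are illustrative choices, not part of the original investigation.

```python
from itertools import product


def independence_number(vertices, adjacent):
    """Exact independence number by branching on a vertex (exponential time)."""
    def solve(candidates):
        if not candidates:
            return 0
        v, rest = candidates[0], candidates[1:]
        non_neighbours = [u for u in rest if not adjacent(u, v)]
        with_v = 1 + solve(non_neighbours)
        if len(non_neighbours) == len(rest):
            return with_v  # v has no neighbours left, so taking it is optimal
        return max(with_v, solve(rest))
    return solve(list(vertices))


def confusable_letters(a, b):
    # Assumed channel: letters 0..4, each confusable with its cyclic neighbours.
    return (a - b) % 5 in (1, 4)


def confusable_words(u, v):
    """Distinct words clash if every position holds equal or confusable letters."""
    return u != v and all(a == b or confusable_letters(a, b) for a, b in zip(u, v))


letters = range(5)
alpha_1 = independence_number([(a,) for a in letters], confusable_words)
alpha_2 = independence_number(list(product(letters, repeat=2)), confusable_words)
print(alpha_1, alpha_2)  # 2 5
```

For this channel only two single letters can be sent safely, but five two-letter words can, which is more than the four obtained by pairing up safe single letters.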

All existing algorithms for finding a maximum independent set in a general graph require an exponential number of steps (in the worst case); the corresponding decision problem is NP-complete [22]. The current boundary between possible and impossible independence number calculations is in general graphs with around 2000 vertices: there is a graph arising from error-correcting codes over an alphabet of size four, for instance, of order 2048, whose independence number has been intensively investigated by capable researchers, and is still not exactly known. Even small theoretical advances can lead to large practical payoffs.

How can our conjecturing program and database of concepts, examples, theorems, and computed invariant values help? New bounds for the independence number of a graph are of both theoretical and practical interest. We can use our developed tools and resources to conjecture new bounds for the independence number of a graph, that necessarily improve on existing bounds. We can use the program to produce sequences of statements, true for all known examples in the graph theory literature, and hence unfalsifiable by any published examples. These will either admit a traditional proof or will admit counterexamples. Both theorems and counterexamples necessarily constitute new mathematical knowledge. Counterexamples, after being coded and added to our program, yield new conjectures: because the produced conjectures must be true for all objects the program cannot re-produce a falsified conjecture. Newly proved bounds can be used in practical independence number calculations: in ideal cases, matching upper and lower independence number bounds can be used to exactly predict values of the independence number of a graph.

Figure 10. The conjecturing process: (1) the program makes a conjecture, (2) if it is disproved the counterexample may be added to the program, (3) if it is true the theorem (theoretical bound) may be added to the program. In each case the process may be iterated and guaranteed to yield new conjectures.

In our Graph Brain Project summer 2017 workshop, we began with no coded theoretical knowledge—as a demonstration for the students. The program made the following not-existing-in-the-literature conjectures, which we quickly proved. (And these conjectures never reappeared as we began to add theorems—theoretical knowledge—to our program suggesting that these theorems are implied by other existing theorems.)

Figure 11. The -ciliates , , and , with radii 2, 3, and 4, respectively.

The eccentricity of a vertex is the maximum distance from that vertex to any other vertex in the graph. The radius of a graph is the minimum eccentricity of any vertex. The order of a graph is the number of vertices of the graph. The main tool of the following proof is a theorem due to Fajtlowicz [17] that implies that every connected graph with radius has an induced subgraph of radius , called an -ciliate , consisting of a cycle with vertices with each vertex amalgamated to a path with vertices (it follows that ).

Theorem 3.1.

For any connected graph , .


Let be a connected graph with radius , and -ciliate (with ). Note that an -ciliate is bipartite. It is easy to check that , , and .

Let , and . Then

The degree of a vertex in a graph is the number of vertices to which it is adjacent. The maximum degree of a graph is the largest degree of any vertex. The triangle number of a graph is the number of triangles induced by triples of vertices of a graph. The following conjecture, weak in general, gives equality for star graphs. Our database contains only connected graphs. In this case the statement holds for any graph (connected or not) and proving the general case is easier than proving the more specialized (connected) case—an observation any mathematician will recognize.

Theorem 3.2.

For any graph , .


The statement can be verified for small graphs. Assume it is true for graphs with fewer than edges. Let be a graph with edges and be a vertex of maximum degree. If every edge is incident to then is a star, and equality holds. It is also easy to see that the conjecture is true in any case where . So we can assume there is an edge not incident to in some triangle. Let be the graph formed by removing edge (but not its incident vertices). So, by assumption, . We see that , and that . Then

We now have available 520 graphs, 159 invariants, and 92 properties. Many of these graphs, invariants and properties were already coded into the Sage mathematical computing environment ([12], used for this research) by interested researchers. All of the graphs are either published graphs, or graphs which were counterexamples to conjectures of our conjecturing program. Many of the graphs and invariants were coded during our 2017 summer research project.

An important feature of our conjecturing program is the ability to use theoretical knowledge. If $\beta$ is an invariant, proved to be an upper bound for an invariant $\alpha$, it can be added to a theory list; the program will then not include any expression (invariant function) $f$ in its list of potentially output conjectures unless there is a stored object $G$ such that $f(G)$ is both less than the value of every previously stored conjectured bound for $\alpha(G)$ and less than every stored theory bound. The stored conjecture, if true, is necessarily new knowledge, in the sense that it cannot be implied by the stored theoretical knowledge.
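The truth and significance tests just described can be illustrated with a simplified filter. This is a sketch under stated assumptions rather than the actual conjecturing program: the function `dalmatian_upper_bounds`, the dictionary representation of objects, and the toy candidate list are hypothetical, and the step in which older conjectures are discarded once newer ones make them redundant is omitted. The invariant values for the three example graphs are taken from Figures 12 and 13.

```python
def dalmatian_upper_bounds(candidates, target, objects, theory=()):
    """Select conjectured upper bounds for `target` from `candidates`.

    candidates: iterable of (name, function) pairs, simplest first.
    target and every function map a stored object to a real number.
    theory: functions already proved to be upper bounds for `target`.
    """
    accepted = []  # (name, function) pairs kept so far
    for name, f in candidates:
        # Truth test: the bound must hold for every stored object.
        if any(f(obj) < target(obj) for obj in objects):
            continue
        # Significance test: strictly better than every known bound
        # (theory and earlier conjectures) on at least one stored object.
        known = list(theory) + [g for _, g in accepted]
        if any(all(f(obj) < g(obj) for g in known) for obj in objects):
            accepted.append((name, f))
    return accepted


# Stored objects with pre-computed invariant values (from Figures 12 and 13).
objects = [
    {"name": "C5", "alpha": 2, "order": 5, "radius": 2,
     "annihilation": 2, "order_minus_matching": 3, "hansen_zheng": 3},
    {"name": "K2,3", "alpha": 3, "order": 5, "radius": 2,
     "annihilation": 3, "order_minus_matching": 3, "hansen_zheng": 3},
    {"name": "Petersen", "alpha": 4, "order": 10, "radius": 2,
     "annihilation": 5, "order_minus_matching": 5, "hansen_zheng": 8},
]

candidates = [(key, lambda g, key=key: g[key])
              for key in ("order", "radius", "annihilation", "hansen_zheng")]
theory = [lambda g: g["order_minus_matching"]]

kept = dalmatian_upper_bounds(candidates, lambda g: g["alpha"], objects, theory)
print([name for name, _ in kept])  # ['annihilation']
```

Here `radius` fails the truth test on $K_{2,3}$, while `order` and `hansen_zheng` are true but never beat the stored theory bound, so only `annihilation` survives.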

We have been collecting independence number bounds for graphs for some time: many are cataloged in [49]. The ten bounds recorded here seem to be the most useful in practice. They should be interpreted for connected graphs (although most hold for general graphs). These can all be computed efficiently; thus the minimum of these upper bounds and the maximum of the lower bounds are themselves efficiently computable bounds.

Six Upper Bounds for the Independence Number of a Graph

(1) independence number <= annihilation number [43].

If the degrees, $d_1 \le d_2 \le \dots \le d_n$, of the vertices of a graph are arranged in non-decreasing order, the annihilation number is then defined to be the largest index $k$ such that the sum of the degrees of the first $k$ vertices is no more than the sum of the degrees of the remaining $n - k$ vertices.

(2) independence number <= fractional independence number [38].

The independence number can be computed by finding the optimum value of an integer linear program. (For each vertex $v$ let $x_v \in \{0, 1\}$. The objective is to maximize $\sum_v x_v$, where $x_u + x_v \le 1$ for every edge $uv$.) The fractional independence number is defined to be the optimal value of the relaxation of this linear program.

(3) independence number <= Lovász number [33].

The Lovász number ($\vartheta$) of a graph, introduced by Lovász in 1979, has a large number of equivalent definitions [29], one of which is the minimum of the largest eigenvalue over all the real symmetric matrices of the order of the graph with 1s on the diagonal and $(i,j)$-entry 1 whenever vertex $i$ is not adjacent to vertex $j$. It is an amazing fact that the Lovász theta invariant can be computed efficiently [24].

(4) independence number <= Cvetković bound [11].

The Cvetković bound is the minimum of the number of non-negative and non-positive eigenvalues of the adjacency matrix of the graph.

(5) independence number <= order - matching number.

The matching number is the largest number of edges none of which shares an endpoint with another. This easy-to-prove, and sometimes useful, bound seems to belong to the folklore of our subject.

(6) independence number <= Hansen-Zheng bound [26].

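The Hansen-Zheng bound is $\left\lfloor \frac{1}{2} + \sqrt{\frac{1}{4} + \text{order}^2 - \text{order} - 2 \cdot \text{size}} \right\rfloor$. Here the size is the number of edges of the graph.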
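Two of the bounds above depend only on the degree sequence, so they are easy to compute directly. The following sketch assumes the annihilation number as defined in (1) and takes the Hansen-Zheng bound to be $\lfloor 1/2 + \sqrt{1/4 + n^2 - n - 2m} \rfloor$ for order $n$ and size $m$, a form that reproduces the Hansen-Zheng values listed in Figure 12. The function names are illustrative.

```python
from math import isqrt


def annihilation_number(degrees):
    """Largest k with (sum of the k smallest degrees) <= (sum of the rest)."""
    d = sorted(degrees)
    total = sum(d)
    prefix = 0
    k = 0
    for i, deg in enumerate(d, start=1):
        prefix += deg
        if prefix <= total - prefix:
            k = i
        else:
            break  # prefix sums only grow, so no larger index can qualify
    return k


def hansen_zheng_bound(degrees):
    """floor(1/2 + sqrt(1/4 + n^2 - n - 2m)), evaluated in exact integers."""
    n = len(degrees)
    m = sum(degrees) // 2          # handshake lemma: degree sum = 2 * size
    d = n * n - n - 2 * m          # non-negative for simple graphs
    # 1/2 + sqrt(1/4 + d) equals (1 + sqrt(4d + 1)) / 2.
    return (1 + isqrt(4 * d + 1)) // 2


degree_sequences = {
    "K5": [4] * 5,
    "C5": [2] * 5,
    "K2,3": [3, 3, 2, 2, 2],
    "Petersen": [3] * 10,
}

for name, degs in degree_sequences.items():
    print(name, hansen_zheng_bound(degs))

for name in ("C5", "K2,3", "Petersen"):
    print(name, annihilation_number(degree_sequences[name]))
```

Working with `isqrt` avoids floating-point rounding at the floor. For $C_5$, $K_{2,3}$ and the Petersen graph the annihilation numbers come out as 2, 3 and 5.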

Graph        Upper Bound                      Value
$K_5$        annihilation number              1
             fractional independence number   2
             Lovász number                    2.5
             Cvetković bound                  1
             order - matching                 3
             Hansen-Zheng bound               1
$C_5$        annihilation number              2
             fractional independence number   2.5
             Lovász number                    2.236
             Cvetković bound                  2
             order - matching                 3
             Hansen-Zheng bound               3
$K_{2,3}$    annihilation number              3
             fractional independence number   3
             Lovász number                    3
             Cvetković bound                  4
             order - matching                 3
             Hansen-Zheng bound               3
Petersen     annihilation number              5
             fractional independence number   5
             Lovász number                    4
             Cvetković bound                  4
             order - matching                 5
             Hansen-Zheng bound               8

Figure 12. Example upper bounds for the independence number of selected graphs. $K_5$ and $C_5$ are the complete graph and cycle on five vertices; $K_{2,3}$ is the complete bipartite graph with partite sets of sizes two and three. The true values are: $\alpha(K_5) = 1$, $\alpha(C_5) = 2$, $\alpha(K_{2,3}) = 3$, $\alpha(\text{Petersen}) = 4$.

Four Lower Bounds for the Independence Number of a Graph

(1) independence number >= radius [15].

The radius was defined above. The proof that radius-critical subgraphs are $r$-ciliates immediately implies this result as a corollary.

(2) independence number >= residue [20].

If the degrees of the vertices of a graph are arranged in non-increasing order, the Havel-Hakimi theorem says that the sequence formed by removing the first of these, $d$, and reducing each of the next $d$ terms by one is the degree sequence of a graph. It follows that after iterating this procedure some number of times (and rearranging the new terms in non-increasing order) you get a sequence of 0s. The number of 0s is the residue of the graph.

(3) independence number >= critical independence number [30].

The critical independence number is defined to be the cardinality of a certain independent set, and the theorem is trivial. It turns out that this number equals the independence number for a large class of graphs (the König-Egervary graphs) which includes the bipartite graphs.

(4) independence number >= max_even_minus_even_horizontal [23].

Let $v$ be any vertex. It is easy to show that the number of vertices at even distance from $v$ minus the number of edges induced by these vertices is a lower bound for the independence number. The max even minus even horizontal invariant, the maximum of these values over all of the vertices of the graph, is then also a lower bound for the independence number. Fajtlowicz defined this invariant and observed that it is very good in practice (at least for small graphs).
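Three of the four lower bounds above (radius, residue, and max_even_minus_even_horizontal) can be computed with elementary code. The sketch below is an illustration, not the project's Sage implementation: graphs are assumed to be connected and stored as dictionaries mapping each vertex to its set of neighbours, and the helper names are chosen for this example.

```python
from collections import deque


def residue(degrees):
    """Number of zeros left after repeated Havel-Hakimi reduction."""
    seq = sorted(degrees, reverse=True)
    while seq and seq[0] > 0:
        d = seq.pop(0)              # remove the largest degree d ...
        for i in range(d):          # ... and reduce the next d terms by one
            seq[i] -= 1
        seq.sort(reverse=True)
    return len(seq)


def distances_from(adj, source):
    """Breadth-first search distances from `source`."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def radius(adj):
    """Minimum eccentricity over all vertices of a connected graph."""
    return min(max(distances_from(adj, v).values()) for v in adj)


def max_even_minus_even_horizontal(adj):
    best = 0
    for v in adj:
        dist = distances_from(adj, v)
        even = {u for u, d in dist.items() if d % 2 == 0}
        # Each induced edge is seen from both endpoints, hence the halving.
        edges = sum(1 for u in even for w in adj[u] if w in even) // 2
        best = max(best, len(even) - edges)
    return best


def graph(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


graphs = {
    "K5": graph([(i, j) for i in range(5) for j in range(i + 1, 5)]),
    "C5": graph([(i, (i + 1) % 5) for i in range(5)]),
    "K2,3": graph([(a, b) for a in ("a0", "a1") for b in ("b0", "b1", "b2")]),
    "Petersen": graph(
        [(i, (i + 1) % 5) for i in range(5)]              # outer 5-cycle
        + [(i, i + 5) for i in range(5)]                  # spokes
        + [(5 + i, 5 + (i + 2) % 5) for i in range(5)]    # inner pentagram
    ),
}

for name, adj in graphs.items():
    degrees = [len(adj[v]) for v in adj]
    print(name, radius(adj), residue(degrees), max_even_minus_even_horizontal(adj))
```

The maximum of such efficiently computable lower bounds is itself a lower bound, so in practice one would report the largest of the values printed for each graph.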

Graph        Lower Bound                      Value
$K_5$        radius                           1
             residue                          1
             critical independence number     0
             max_even_minus_even_horizontal   1
$C_5$        radius                           2
             residue                          2
             critical independence number     0
             max_even_minus_even_horizontal   2
$K_{2,3}$    radius                           2
             residue                          2
             critical independence number     3
             max_even_minus_even_horizontal   3
Petersen     radius                           2
             residue                          3
             critical independence number     0
             max_even_minus_even_horizontal   1

Figure 13. Example lower bounds for the independence number of selected graphs. The true values are: $\alpha(K_5) = 1$, $\alpha(C_5) = 2$, $\alpha(K_{2,3}) = 3$, $\alpha(\text{Petersen}) = 4$.

If the conjecturing program were given all published invariants in a mathematical field, all real-number operators used by mathematicians, and all published bounds, the program would necessarily produce new conjectures (not implied by existing theory) that are as simple as any human can produce (with respect to the objects that it knows). That is, if a human were to produce a simpler conjecture that is true for all objects the computer knows then, necessarily and by the design of the program, the conjecture must either be false for one of these objects or it must be implied, with respect to these objects, by the conjectures that the program does produce: that is, the produced conjectures must give bounds that are at least as good as the human conjecture. The program necessarily will consider every simpler conjecture: it iteratively generates and evaluates every single syntactically possible statement in order from the least complex to more complex. At the moment it considers the human’s conjecture, if it does not produce the conjecture itself, it is because it is either false, or not significant in the described sense.

The following two conjectures were generated in our summer workshop; they have been verified for all of the more than 14 million connected small-order graphs (). In addition, we’ve used random graph generators to test these conjectures on a large sample of random graphs of assorted models (including Erdős-Renyi graphs, random regular graphs, random bounded tolerance graphs, random interval graphs, and random bipartite graphs). For each model, we tested many instances with a wide variety of parameters (also randomly generated within the given parameter space) and orders up to at least 100.
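Exhaustive verification over small graphs can be illustrated on a very small scale. The sketch below is not the procedure used in the workshop: it enumerates labelled graphs (so isomorphic copies are tested repeatedly), uses brute force for the independence number, and stops at order 5. The statement checked is the known bound independence number >= radius, and all function names are assumptions made for this example.

```python
from collections import deque
from itertools import combinations


def distances_from(adj, source):
    """Breadth-first search distances from `source`."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def connected_graphs(n):
    """Yield every connected labelled graph on vertices 0..n-1."""
    pairs = list(combinations(range(n), 2))
    for mask in range(1 << len(pairs)):
        adj = {v: set() for v in range(n)}
        for i, (u, v) in enumerate(pairs):
            if mask >> i & 1:
                adj[u].add(v)
                adj[v].add(u)
        if len(distances_from(adj, 0)) == n:   # every vertex reached
            yield adj


def radius(adj):
    return min(max(distances_from(adj, v).values()) for v in adj)


def independence_number(adj):
    """Brute force: largest vertex subset containing no edge."""
    vertices = list(adj)
    for size in range(len(vertices), 0, -1):
        for subset in combinations(vertices, size):
            if all(v not in adj[u] for u, v in combinations(subset, 2)):
                return size
    return 0


def counterexamples(statement, max_order):
    """All connected labelled graphs of order <= max_order violating `statement`."""
    return [adj for n in range(1, max_order + 1)
            for adj in connected_graphs(n) if not statement(adj)]


found = counterexamples(lambda g: independence_number(g) >= radius(g), 5)
print(len(found))  # 0
```

A conjecture that survives such a sweep is not thereby proved, but a non-empty result immediately supplies a counterexample that can be added to the stored objects.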

(1) independence_number >= min(girth, floor(lovasz_theta))

This is how an output conjecture of the program appears to a user. In particular it is an unquantified open sentence that must be interpreted. Since we used only connected graphs in this investigation, we interpret this over all connected graphs: that is, for every connected graph x, independence_number(x) >= min(girth(x), floor(lovasz_theta(x))).

Here girth and Lovász theta are graph invariants, while min and floor are real-number operators. The (lower) bound on the right-hand side of this inequality has complexity-4. The girth of a graph is the number of edges of a smallest cycle in the graph; it can be computed efficiently.

The Lovász theta number is, in fact, the best upper bound in practice for estimating the independence number of a graph; and, since the independence number is integral, the floor of this number must be an upper bound. It is interesting to note that here we have a conjectured lower bound for the independence number expressed in terms of the best upper bound. The conjecture can then be restated: for any connected graph, either the independence number is at least as big as its girth or the independence number equals the floor of its Lovász theta number.

(2) independence_number <= (average_distance)^(degree_sum)

We interpret this conjecture for connected graphs. The average distance is the average of the distances between distinct pairs of vertices in the graph. This invariant is actually a lower bound for the independence number of a graph [10]. The degree sum is the sum of the degrees of the vertices of the graph. Here the caret “^” is the exponentiation operator.

Footnote 2: Jianxiang Chen suggests a proof sketch at:

independence_number(x) >= minimum(girth(x), floor(lovasz_theta(x)))
independence_number(x) >= minimum(diameter(x), lovasz_theta(x))
independence_number(x) >= maximum(residue(x), 1/2*lovasz_theta(x))
independence_number(x) >= 2*floor(arccosh(lovasz_theta(x)))
independence_number(x) >= floor(arccosh(lovasz_theta(x)))^2
independence_number(x) >= ceil(lovasz_theta(x)) - radius(x)
independence_number(x) >= ceil(lovasz_theta(x)) - girth(x)
independence_number(x) >= floor(2*tan(matching_number(x)) - 2)
independence_number(x) >= floor(log(tan(order(x))^2)/log(10))
Figure 14. Open conjectures for the lower bound of the independence number of a connected graph (that would fit this box using invariants already defined here). The full list of open upper and lower bound conjectures for the independence number may be found at:

4. Big Mathematics

Many disciplines make important use of labs and even larger-scale collaboration: Big Science. Collaborative physics made an enormous splash recently with the discovery of gravitational waves by the LIGO consortium of more than 900 collaborating scientists [1] (recognized with a 2017 Nobel Prize in Physics). The discovery confirmed a prediction of Einstein’s theory of relativity that had been pursued for more than 40 years.

Mathematicians can also make advantageous use of labs and large-scale organization: examples of large-scale, organized, collaborative mathematics include the British WWII code-breaking groups at Bletchley Park, and (presumably) similar ongoing research at the National Security Agency (NSA). The mathematics group at Bell (and later AT&T) Labs could be harnessed to address problems as needed. Other impressive examples, albeit less tightly organized, might include the classification of finite simple groups and the Polymath Project. With continued research on automated mathematical discovery programs, and the development of code-bases, and mathematical databases, it is now possible to envision large-scale, organized, collaborative mathematics existing in the future.

Our enormous mathematical knowledge bases, stored as research papers, are not being effectively or systematically exploited. Tens of thousands of mathematical research papers are published each year, maybe hundreds of thousands. Only a small amount of this knowledge can be leveraged by any single researcher or group of researchers: the literature is simply too vast. Much of this knowledge can be usefully computerized so that intelligent computer assistants like Conjecturing can easily use it. Many of these papers contain new concepts. Any of these could be useful; the real test is whether they show up in conjectures that advance our mathematical goals. We should leverage this knowledge, by coding it, to advance our shared mathematical goals more quickly.

We maintain a database of values of invariants for most of our coded-stored graphs. Some of these values were calculated either with significant computer resources or using theoretical knowledge. Some of this overlaps other graph theory databases including House of Graphs [5] and the Encyclopedia of Finite Graphs [27]. It would be useful—and more efficient—if researchers never had to repeat any of these computations. A universal graph theory database would be of real utility to researchers. We imagine one day there may be something like a National Institute of Mathematics maintaining a variety of mathematical databases, and housing and organizing projects like this.

Figure 15. László Lovász, Doron Zeilberger, Hao Wang

What we have done is only a small-scale experiment, a demonstration of what is possible. It would be interesting to see the results of a large-scale experiment. Continued, sustained research might converge on useful and efficient independence number bounds: coding existing bounds for the independence number of a graph, generating conjectures that represent potential improvements for graphs where lower and upper bounds are not equal, proving them, adding each proved conjecture as a theorem, and iterating. Even if it does not, we might still be able to use these bounds to predict the exact value of the independence number of a graph with high probability, and this may be enough for practical purposes. Zeilberger, for instance, has imagined a mathematical future with results of exactly this type [50].
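One step of this loop, finding the graphs where the coded lower and upper bounds disagree, can be sketched as follows. The two bounds used here are simple textbook bounds chosen as stand-ins for this illustration, not the bounds coded in the project: the independence number is at least ceil(n / (max degree + 1)) and at most n minus the minimum degree. All function names are hypothetical.

```python
from math import ceil


def lower_bound(adj):
    """ceil(n / (max_degree + 1)): a greedy choice removes at most max_degree + 1 vertices."""
    n = len(adj)
    max_degree = max(len(nbrs) for nbrs in adj.values())
    return ceil(n / (max_degree + 1))


def upper_bound(adj):
    """n - min_degree: a vertex in an independent set has all its neighbours outside it."""
    n = len(adj)
    min_degree = min(len(nbrs) for nbrs in adj.values())
    return n - min_degree


def open_targets(named_graphs):
    """Return the names of graphs whose bounds do not yet pin down the independence number."""
    return [name for name, adj in named_graphs.items()
            if lower_bound(adj) < upper_bound(adj)]


# Small examples as adjacency dicts {vertex: set of neighbours}.
named_graphs = {
    "K3": {0: {1, 2}, 1: {0, 2}, 2: {0, 1}},
    "P3": {0: {1}, 1: {0, 2}, 2: {1}},
    "C4": {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}},
    "C5": {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)},
}

targets = open_targets(named_graphs)
```

Graphs in the returned list are the ones for which a new conjectured bound could represent an improvement; graphs where the bounds already agree need no further work.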

We have also experimented with property-relation conjectures: these are necessary or sufficient conditions for an object to have a specified property. These conjectures can be generated in a way analogous to the invariant-relation conjectures described so far. Examples of conjectured necessary or sufficient conditions for a graph to be hamiltonian are reported in [32]. Much more work needs to be done here: in particular, we have coded relatively few graph properties.
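To make the notions of necessary and sufficient conditions concrete, the following plain-Python sketch tests a candidate condition against the hamiltonian property on all labelled graphs of orders 3 to 5. The candidate is Dirac's classical condition (minimum degree at least half the order), used here only as a familiar example; it is not a conjecture of the program, and the function names are assumptions for this illustration.

```python
from itertools import combinations, permutations


def all_graphs(n):
    """Yield every labelled graph on vertices 0..n-1 as an adjacency dict."""
    pairs = list(combinations(range(n), 2))
    for mask in range(1 << len(pairs)):
        adj = {v: set() for v in range(n)}
        for i, (u, v) in enumerate(pairs):
            if (mask >> i) & 1:
                adj[u].add(v)
                adj[v].add(u)
        yield adj


def is_hamiltonian(adj):
    """Brute force: try every cyclic ordering that starts at vertex 0."""
    n = len(adj)
    if n < 3:
        return False
    for rest in permutations(range(1, n)):
        cycle = (0,) + rest + (0,)
        if all(b in adj[a] for a, b in zip(cycle, cycle[1:])):
            return True
    return False


def satisfies_dirac(adj):
    """Dirac's condition: order at least 3 and minimum degree at least n / 2."""
    n = len(adj)
    return n >= 3 and min(len(nbrs) for nbrs in adj.values()) >= n / 2


def is_sufficient(condition, prop, graphs):
    """True if every graph satisfying the condition has the property."""
    return all(prop(g) for g in graphs if condition(g))


def is_necessary(condition, prop, graphs):
    """True if every graph with the property satisfies the condition."""
    return all(condition(g) for g in graphs if prop(g))


graphs = [g for n in range(3, 6) for g in all_graphs(n)]
dirac_sufficient = is_sufficient(satisfies_dirac, is_hamiltonian, graphs)
dirac_necessary = is_necessary(satisfies_dirac, is_hamiltonian, graphs)
```

On this sample the condition is sufficient but not necessary: the 5-cycle is hamiltonian although its minimum degree is below half its order.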

It is an important fact that successful automated discovery programs are designed to address existing mathematical problems—and their utility is measured with respect to our own (human) mathematical goals. Consider for example the conjecturing program of Hao Wang. Wang was an automated mathematical discovery pioneer while he was at IBM in the late 1950s and the developer of the first conjecturing program [47]. He wanted his program to produce “interesting” mathematical statements—but he didn’t factor in any mathematical goal. He reported: “The number of theorems printed out after running the machine for a few hours is so formidable that the writer has not even attempted to analyze the mass of data obtained.” If some of these statements were mathematical advances Wang didn’t know it. Our human goals are central to the success and (human) evaluation of our mathematical progress.

The kind of research advocated here naturally allows for the talents of researchers and students with a wide variety of abilities. Our Graph Brain Project summer 2017 workshop included students at the high school, undergraduate and graduate levels, together with faculty. The two high school students both made meaningful contributions—and learned quite a bit of graph theory along the way. They both started by coding graphs from the literature—tedious but necessary in order to achieve research literature comprehension. Both ended up doing more interesting and sophisticated coding. One of these students, with no previous coding experience, coded two different algorithms for finding the largest set of vertices in a graph that induces a bipartite subgraph. Every day in the lab we talked about open problems at the board, discussed proof ideas, ideas for constructing counterexamples, and then chose what to work on for the day that would advance our short-term and long-term goals.

This workshop was a natural way for researchers with a wide range of talents to work in the same place, pushing forward research together organically, to learn, and to enjoy mathematical camaraderie. Furthermore, the natural science model of laboratories suggests ways for our students to quickly make contributions in naturally collaborative environments (which is definitely not the norm in our often isolated mathematical worlds). This might also suggest new ways to attract, and interest, a wider, more diverse field of mathematical talent.

Any area of mathematics where objects, invariants and properties can be coded is amenable to investigations which exactly parallel what we have described for graphs and the independence number.

5. How to Contribute

Two ways to contribute to this kind of research are to either contribute to the research we have begun in graph theory, or to begin the work of coding objects, invariants, and properties for some other area of mathematics.

Our Conjecturing program is open-source and written to work with Sage, an open-source mathematical computational environment that is meant to substitute for better-known, expensive, proprietary mathematical software and that uses Python as its interface language. This program, examples, and set-up instructions are available at:
Researchers in every area of mathematics can easily replicate our graph theory experiments in their own areas of research. The matrix, number and graph theory scripts we used in [31] and other examples are also available here; these can be imitated in initial investigations.

For graph theory we have begun to code the objects, invariants, and properties from the graph theory literature. What we’ve done so far includes many well-known and standard terms, familiar to all researchers. These are available at:
These are also coded for Sage. Researchers can download these, see what’s been done, and start coding, or at least add invariants to code as Github “issues”: this is a dynamic bulletin board of what needs to be coded and what progress has been made, which any researcher anywhere can add to and comment on. Read papers, watch talks, note new concepts and graphs, and add them. It is also possible to “fork” what we’ve done: take it, do your own thing, and build on it. We’ve also pre-computed values for almost all of these invariants, for almost all of these graphs. This precomputed database can be very useful for fast conjecturing: the program will, by default, compute any values it needs, so having pre-computed values can really speed things up.
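The idea behind the precomputed database can be sketched as a lookup table with a compute-on-miss fallback. This is a minimal illustration in plain Python, not the project's actual database code; the class name InvariantStore, the string graph keys, and the counter are all assumptions made for this sketch.

```python
class InvariantStore:
    """Hypothetical cache of invariant values keyed by (graph key, invariant name)."""

    def __init__(self, precomputed=None):
        # Values loaded ahead of time, e.g. from a shared database file.
        self.values = dict(precomputed or {})
        self.computed = 0  # how many values had to be computed on demand

    def get(self, graph_key, name, compute, graph):
        """Return the stored value, computing and storing it only if it is missing."""
        key = (graph_key, name)
        if key not in self.values:
            self.values[key] = compute(graph)
            self.computed += 1
        return self.values[key]


def order(adj):
    """Number of vertices of a graph given as an adjacency dict."""
    return len(adj)


path3 = {0: {1}, 1: {0, 2}, 2: {1}}
store = InvariantStore(precomputed={("P3", "order"): 3})

hit = store.get("P3", "order", order, path3)     # served from the precomputed table
miss = store.get("P3-copy", "order", order, path3)  # computed once, then stored
again = store.get("P3-copy", "order", order, path3)  # now served from the table
```

For an invariant as cheap as the order the saving is negligible; the benefit comes from invariants such as the independence number, which are expensive to compute.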

Another way to help with our Graph Brain Project is to prove or find counterexamples to the open conjectures of our program: every theorem and every counterexample count as new knowledge—and will lead to improved conjectures. There are also many graphs for which values for certain invariants and certain properties are as-yet unknown. They need to be computed. Any new computed values can easily be added to local copies of our database and, better, if posted as a Github issue, will be included in the posted, shared copy of the database. Having these values will also improve the conjectures made by the conjecturing program.

In other areas of mathematics, just start! Code a few objects and invariants, and see what conjectures you get. Iterate and add.

The computational tools we used in our graph theory investigations included geng (included in the nauty package) for comprehensive non-isomorphic generation of all connected graphs up to any given order [37], benzene for the generation of benzenoid graphs [4], and buckygen for the generation of fullerene graphs [7]. These are very useful for searching for small counterexamples to graph conjectures. It would be useful in any other area of investigation to code similar generators for the systematic construction of example objects.
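As a small-scale illustration of such a counterexample search, the sketch below tests conjecture (2), independence_number <= (average_distance)^(degree_sum), on every connected labelled graph of order 2 to 5. It uses naive enumeration of edge subsets in plain Python as a stand-in for geng, which generates non-isomorphic graphs far more efficiently; all function names are assumptions made for this sketch.

```python
from collections import deque
from itertools import combinations


def bfs_distances(adj, root):
    """Return {vertex: distance from root} for the vertices reachable from root."""
    dist = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def connected_graphs(n):
    """Yield every connected labelled graph on vertices 0..n-1 (naive enumeration)."""
    pairs = list(combinations(range(n), 2))
    for mask in range(1 << len(pairs)):
        adj = {v: set() for v in range(n)}
        for i, (u, v) in enumerate(pairs):
            if (mask >> i) & 1:
                adj[u].add(v)
                adj[v].add(u)
        if len(bfs_distances(adj, 0)) == n:
            yield adj


def independence_number(adj):
    """Size of a largest set of pairwise non-adjacent vertices (brute force)."""
    vertices = list(adj)
    for k in range(len(vertices), 0, -1):
        for subset in combinations(vertices, k):
            if all(v not in adj[u] for u, v in combinations(subset, 2)):
                return k
    return 0


def average_distance(adj):
    """Average distance over distinct pairs of vertices of a connected graph."""
    n = len(adj)
    total = sum(sum(bfs_distances(adj, v).values()) for v in adj)  # ordered pairs
    return total / (n * (n - 1))


def degree_sum(adj):
    return sum(len(nbrs) for nbrs in adj.values())


def counterexamples(max_order, tolerance=1e-9):
    """Return the connected graphs of order 2..max_order that violate conjecture (2)."""
    found = []
    for n in range(2, max_order + 1):
        for adj in connected_graphs(n):
            bound = average_distance(adj) ** degree_sum(adj)
            if independence_number(adj) > bound + tolerance:
                found.append(adj)
    return found


violations = counterexamples(5)
```

The enumeration grows as 2^(n(n-1)/2), so this approach is only practical for very small orders; that limitation is the reason dedicated generators matter.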

6. Acknowledgements

The authors are grateful for useful comments from S. Cox, R. Meagher, M. Ong Ante, J. Padden, R. Segal, N. Sloane that have greatly improved our presentation. Jianxiang Chen has been active on the Github Graph Brain Project site, finding counterexamples to conjectures


  • [1] B. P. Abbott, R. Abbott, T. D. Abbott, M. R. Abernathy, F. Acernese, K. Ackley, C. Adams, T. Adams, P. Addesso, R. X. Adhikari, et al. Observation of gravitational waves from a binary black hole merger. Physical review letters, 116(6):061102, 2016.
  • [2] J. M. Borwein and D. H. Bailey. Mathematics by experiment: Plausible reasoning in the 21st century. AK Peters, 2004.
  • [3] A. Bradford, J. Day, L. Hutchinson, C. E. Larson, M. Mills, D. Muncy, B. Kaperick, and N. Van Cleemput. Automated conjecturing II: Chomp and intelligent game play. submitted, 2017.
  • [4] G. Brinkmann, G. Caporossi, and P. Hansen. A constructive enumeration of fusenes and benzenoids. Journal of Algorithms, 45(2):155–166, 2002.
  • [5] G. Brinkmann, K. Coolsaet, J. Goedgebeur, and H. Mélot. House of Graphs: A database of interesting graphs. Discrete Applied Mathematics, 161(1):311–314, 2013.
  • [6] G. Brinkmann, S. Crevals, and J. Frye. An independent set approach for the communication network of the GPS III system. Discrete Applied Mathematics, 2011.
  • [7] G. Brinkmann, J. Goedgebeur, and B. D. McKay. The generation of fullerenes. Journal of chemical information and modeling, 52(11):2910–2918, 2012.
  • [8] M. Campbell-Kelly, M. Croarken, R. Flood, and E. Robson (editors). The history of mathematical tables: from Sumer to spreadsheets. Oxford University Press, 2003.
  • [9] M. Chudnovsky. Hadwiger’s conjecture and seagull packing. Notices Amer. Math. Soc., 57(6):733–736, 2010.
  • [10] F. R. K. Chung. The average distance and the independence number. J. Graph Theory, 12(2):229–235, 1988.
  • [11] D. M. Cvetković, M. Doob, and H. Sachs. Spectra of graphs. Johann Ambrosius Barth, Heidelberg, third edition, 1995. Theory and applications.
  • [12] The Sage Developers. SageMath, the Sage Mathematics Software System (Version 8.0), 2017.
  • [13] J. R. Dias. Handbook of polycyclic hydrocarbons, volume 1. Elsevier Science Ltd, 1987.
  • [14] P. Duchet and H. Meyniel. On Hadwiger’s number and the stability number. In Graph theory (Cambridge, 1981), volume 62 of North-Holland Math. Stud., pages 71–73. North-Holland, Amsterdam, 1982.
  • [15] P. Erdős, M. Saks, and V. T. Sós. Maximum induced trees in graphs. J. Combin. Theory Ser. B, 41(1):61–79, 1986.
  • [16] J. Essinger. Ada’s Algorithm: How Lord Byron’s Daughter Ada Lovelace Launched the Digital Age. Melville House, 2014.
  • [17] S. Fajtlowicz. A characterization of radius-critical graphs. J. Graph Theory, 12(4):529–532, 1988.
  • [18] S. Fajtlowicz. On conjectures of Graffiti. In Proceedings of the First Japan Conference on Graph Theory and Applications (Hakone, 1986), volume 72, pages 113–118, 1988.
  • [19] S. Fajtlowicz and C. E. Larson. Graph-theoretic independence as a predictor of fullerene stability. Chemical physics letters, 377(5-6):485–490, 2003.
  • [20] O. Favaron, M. Mahéo, and J.-F. Saclé. On the residue of a graph. J. Graph Theory, 15(1):39–64, 1991.
  • [21] P. W. Fowler and D. E. Manolopoulos. An Atlas of Fullerenes. Clarendon Press Oxford, 1995.
  • [22] M. R. Garey and D. S. Johnson. Computers and intractability. W. H. Freeman and Co., San Francisco, Calif., 1979. A guide to the theory of NP-completeness, A Series of Books in the Mathematical Sciences.
  • [23] M. Grigsby. A horizontal edges bound for the independence number of a graph. Master’s thesis, Virginia Commonwealth University, 2011.
  • [24] M. Grötschel, L. Lovász, and A. Schrijver. The ellipsoid method and its consequences in combinatorial optimization. Combinatorica, 1(2):169–197, 1981.
  • [25] I. Gutman and S. J. Cyvin. Introduction to the theory of benzenoid hydrocarbons. Springer-Verlag Berlin, 1989.
  • [26] P. Hansen and M.-L. Zheng. Sharp bounds on the order, size, and stability number of graphs. Networks, 23(2):99–102, 1993.
  • [27] T. Hoppe and A. Petrone. Integer sequence discovery from small graphs. Discrete Applied Mathematics, 201:172–181, 2016.
  • [28] L. Hutchinson, V. Kamat, C. E. Larson, S. Mehta, D. Muncy, and N. Van Cleemput. Automated conjecturing VI: Domination number of benzenoids. to appear in MATCH, 2017.
  • [29] D. E. Knuth. The sandwich theorem. Electron. J. Combin., 1:Article 1, approx. 48 pp. (electronic), 1994.
  • [30] C. E. Larson. The critical independence number and an independence decomposition. European J. Combin., 32(2):294–300, 2011.
  • [31] C. E. Larson and N. Van Cleemput. Automated conjecturing I: Fajtlowicz’s Dalmatian heuristic revisited. Artificial Intelligence, 231:17–38, 2016.
  • [32] C. E. Larson and N. Van Cleemput. Automated conjecturing III: Property-relations conjectures. Annals of Mathematics and Artificial Intelligence, 81(3):315–327, 2017.
  • [33] L. Lovász. On the Shannon capacity of a graph. Information Theory, IEEE Transactions on, 25(1):1–7, 1979.
  • [34] L. Lovász and M. D. Plummer. Matching theory, volume 121 of North-Holland Mathematics Studies. North-Holland Publishing Co., Amsterdam, 1986. Annals of Discrete Mathematics, 29.
  • [35] F. Maffray and H. Meyniel. On a relationship between Hadwiger and stability numbers. Discrete Math., 64(1):39–42, 1987.
  • [36] W. McCune. Solution of the Robbins problem. Journal of Automated Reasoning, 19(3):263–276, 1997.
  • [37] B. D. McKay. Nauty user’s guide (version 2.4). Computer Science Dept., Australian National University, 2007.
  • [38] G. L. Nemhauser and L. E. Trotter, Jr. Vertex packings: structural properties and algorithms. Math. Programming, 8:232–248, 1975.
  • [39] M. E. J. Newman, D. J. Watts, and S. H. Strogatz. Random graph models of social networks. Proceedings of the National Academy of Sciences of the United States of America, 99(Suppl 1):2566, 2002.
  • [40] P. R. J. Östergård. Constructing combinatorial objects via cliques. In Surveys in combinatorics 2005, volume 327 of London Math. Soc. Lecture Note Ser., pages 57–82. Cambridge Univ. Press, Cambridge, 2005.
  • [41] L. Page, S. Brin, R. Motwani, and T. Winograd. The pagerank citation ranking: Bringing order to the web. 1999.
  • [42] R. Pepper. An upper bound on the independence number of benzenoid systems. Discrete Appl. Math., 156(5):607–619, 2008.
  • [43] R. Pepper. On the annihilation number of a graph. Recent Advances in Applied Mathematics and Computational And Information Sciences, 1:217–220, 2009.
  • [44] H. A. Simon and A. Newell. Heuristic problem solving: The next advance in operations research. Operations Research, 6(1):1–10, 1958.
  • [45] N. J. A. Sloane. The on-line encyclopedia of integer sequences. In Towards Mechanized Mathematical Assistants, pages 130–130. Springer, 2007.
  • [46] A. Turing. Intelligent machinery. The Essential Turing, pages 395–432, 2004.
  • [47] H. Wang. Toward mechanical mathematics. IBM J. Res. Develop., 4:2–22, 1960.
  • [48] H. S. Wilf and D. Zeilberger. Towards computerized proofs of identities. Bulletin of the American Mathematical Society, 23(1):77–83, 1990.
  • [49] W. Willis. Bounds for the independence number of a graph. Master’s thesis, Virginia Commonwealth University, 2011.
  • [50] D. Zeilberger. Theorems for a price: Tomorrow’s semi-rigorous mathematical culture. The Mathematical Intelligencer, 16(4):11–18, 1994.