 # Symbolic Regression using Mixed-Integer Nonlinear Optimization

The Symbolic Regression (SR) problem, where the goal is to find a regression function that does not have a pre-specified form but is any function that can be composed of a list of operators, is a hard problem in machine learning, both theoretically and computationally. Genetic programming based methods, that heuristically search over a very large space of functions, are the most commonly used methods to tackle SR problems. An alternative mathematical programming approach, proposed in the last decade, is to express the optimal symbolic expression as the solution of a system of nonlinear equations over continuous and discrete variables that minimizes a certain objective, and to solve this system via a global solver for mixed-integer nonlinear programming problems. Algorithms based on the latter approach are often very slow. We propose a hybrid algorithm that combines mixed-integer nonlinear optimization with explicit enumeration and incorporates constraints from dimensional analysis. We show that our algorithm is competitive, for some synthetic data sets, with a state-of-the-art SR software and a recent physics-inspired method called AI Feynman.

## Authors

##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## 1 Introduction

Regression methods have numerous applications in science and economics. In traditional regression methods, such as linear or logistic regression, the relationship between a dependent variable – say

– and a number of independent variables – say

– is assumed to have a fixed, parametric, functional form, and the parameters are calculated from data to minimize a specific “loss” function. For example, in linear regression, the functional form is assumed to be a linear function of the independent variables:

. The unknown parameters are the coefficients that are calculated so as to minimize, say, the least-square error: , where and are the values of and , respectively, in the th data point.

In symbolic regression (SR), both the functional form of of a regression function, and the associated parameters or coefficients are calculated from data so as to minimize some loss function. The functional form is assumed to be anything that can be composed from a given list of basic functions or operators applied to the independent variables and arbitrary constants. For example, if the operators are , and , then the space of all possible functions is the set of all polynomials of arbitrary degree.The usefulness of SR was demonstrated in (Connor & Taylor, 1977; Langley, 1981; Willis et al., 1997; Davidson et al., 2003; Schmidt & Lipson, 2009, 2010).

Given its generality, symbolic regression is clearly a difficult problem and there has been a lot of research into designing computationally effective algorithms. Starting with (Koza, 1992), SR problems were solved with genetic programming (GP) (Augusto & Barbosa, 2000; Banzhaf et al., 1998), and SR is often treated as being synonymous with GP (Stephens, 2019). In (Korns, 2011)

, the accuracy of SR solutions achieved by genetic algorithms was claimed to be poor. "Bloat" is another common issue in GP based methods (obtaining functions with high description length).

Some recent research on improving symbolic regression techniques is aimed at speeding them up (Luo et al., 2017), finding accurate constants in the derived symbolic mathematical expression (Izzo et al., 2017), and searching for implicit functional relationships (Schmidt & Lipson, 2010). These approaches are primarily based on genetic algorithms. Another approach explicitly populates a large hypothesis space of functions, on which sparse selection is applied (Brunton et al., 2016). In (Udrescu & Tegmark, 2019)

, a non-GP SR algorithm is developed , potentially well-suited for physics problems, that combines neural-network fitting with several techniques inspired by physics, for example, dimensional analysis, and also complete enumeration of symbolic expressions of a certain limited size.

A major challenge in symbolic regression is the difficulty of finding scientifically meaningful models out of the large number of possible models that may fit given data. In (Schmidt & Lipson, 2009, 2010), partial derivatives are matched, in addition to fitting data, to discover meaningful functions. In AI Feynman (Udrescu & Tegmark, 2019), dimensional analysis is used to both speed up the solution process, and to generate meaningful physical models.

Some recent papers proposed mathematical optimization based approaches for obtaining regression models: a model is obtained as a solution of a constrained optimization problem that selects the kind of model and optimizes its parameters. Various Mixed-Integer Programming (MIP) formulations were proposed in

(Bertsimas & Shioda, 2007) for finding classification and regression models. Mixed-Integer Nonlinear Programming (MINLP) formulations for SR were introduced in (Cozad, 2014), (Avron et al., 2015) and (Horesh et al., 2016). Some computational results were presented in (Cozad, 2014; Austel et al., 2017; Cozad & Sahinidis, 2018). In these three papers, the BARON (Tawarmalani & Sahinidis, 2005) solver was used to obtain globally optimal (see the next section) solutions, though other similar solvers can be used. An advantage of the MINLP framework is that, in principle, globally optimal symbolic expressions can be obtained along with certificates of optimality. However, existing MINLP models for SR, and the solvers for such problems, have limited scalability. In addition, the MINLP framework allows one to incorporate domain knowledge that can be encoded via nonlinear constraints: adding constraints to enforce dimensional consistency is proposed in (Gunawan et al., 2017).

### 1.1 Our contribution

In this paper, we develop a symbolic regression algorithm based on solving MINLPs inspired by and extending the free-form SR discovery framework (Austel et al., 2017) that was tested on a few toy problems, such as relations found by Kepler and Galileo. Our work is different from the work in the previous paragraph in multiple ways. Firstly, we sacrifice global optimality to get a faster algorithm (thus our algorithm can be viewed as a heuristic, though we can, in principle obtain globally optimal solutions given enough time). Secondly, we choose a different MINLP formulation of an expression tree than in previous papers, and also employ a different search strategy. Finally, we add constraints to enforce dimensional consistency.

We compare our method with AI Feynman (Udrescu & Tegmark, 2019) on the synthetic, physics-inspired data set in the associated paper, and show that our MINLP-based approach is still competitive for a number of problems, and obtains solutions to some problems that are not solved by Eureqa (but solved by AI Feynman). We can also obtain solutions to some problems from less data.

## 2 Related work

A symbolic regression scheme consists of a space of valid mathematical expressions composable from a basic list of operators (we assume these to be unary or binary), and a mechanism for exploring the space. Each valid mathematical expression can be represented by an expression tree: a rooted binary tree where each non-leaf node has an associated binary or unary operator (, , , , , etc.), and each leaf nodes has an associated constant or independent variable. An example of an expression tree for the expression

 14mx2(ω2+ω20)

is presented in Figure 1. The second tree in the figure is explained later. Figure 1: Expression tree for 14mx2(ω2+ω20), expressed with full arithmetic notation, and as a more compact tree of L-monomials.

A typical genetic programming (GP) solver searches the space of expressions using a genetic algorithm The Eureqa package (Schmidt & Lipson, 2014), based on the article (Schmidt & Lipson, 2009), is a state-of-the-art GP-based solver. Another popular solver is gplearn (Stephens, 2019). AI Feynman (Udrescu & Tegmark, 2019) first trains a neural network (NN) on the input data, so that the function encoded by the NN is an accurate fit for the input data. It then uses tests if the true function has various properties such as separability, symmetry etc, by evaluating the NN at points not originally in the data set. In addition, it uses dimensional analysis to speed up the search for the best symbolic expression. In (Udrescu & Tegmark, 2019), the authors applied their algorithm to one-hundred physics equations, and claim to rediscover all of them. They observe that Eureqa obtained solutions to only 71 of these instances.

### 2.1 Globally optimal symbolic regression

In a series of works (Cozad, 2014; Cozad & Sahinidis, 2018; Avron et al., 2015; Horesh et al., 2016; Austel et al., 2017), symbolic regression (SR) methods based on combined discrete and continuous optimization were developed, and the symbolic regression problem was formulated in various ways as a Mixed-Integer Nonlinear Programming (MINLP) problem. The MINLP problem is solved to global optimality using an off-the-shelf MINLP solver such as BARON (Tawarmalani & Sahinidis, 2005), COUENNE or SCIP (Maher et al., 2017). These solvers solve problems of the form

 min f(x,y) (1) s.t. g(x,y)≤b (2) x∈Rm,y∈Zm (3)

where

is a vector of

continuous variables, is a vector of discrete variables, is a real-valued function that can be composed using a finite list of operators such as etc. (this list will vary with each solver), and is a vector-valued function created using the same operators. These solvers use various convex-relaxation schemes to obtain lower bounds on the objective-function value, and utilize these bounds in branch-and-bound schemes to obtain globally optimal solutions.

This approach produces a globally-optimal mathematical expression (along with a certificate of optimality) while avoiding an exhaustive search of the solution space. Another advantage is that it directly produces correct, within a tolerance, real-valued constants; most other methods use specialized algorithms to refine constants (Izzo et al., 2017) and cannot guarantee global optimality.

In these MINLPs, the set of valid binary expression trees is specified by a set of constraints over discrete and continuous variables. The discrete variables of the formulation are used to define the structure of the expression tree including the assignment of operators to non-leaf nodes and whether a leaf node is assigned a constant or a specific independent variable. The continuous variables are used for the undetermined constants, and also to evaluate the resulting symbolic expression for specific numerical values associated with individual data points. The objective functions vary from accuracy (measured as sum of squared deviations) to model complexity in (Austel et al., 2017).

In (Austel et al., 2017), for example, the MINLP has the following form:

 min C(fθ) complexity s.t. θ∈T grammar vi=fθ(x(i)),  ∀i∈I prediction D(fθ(x),y)≤ε error

where represents an expression tree defined by the structural and continuous decision variables collectively designated as ; is the universe of valid expression trees; measures the description complexity; measures error of the predicted values ; is the vector of observations for inputs .

## 3 System Description

We first describe the basics of our system (without dimensional analysis), and then later explain how we incorporate dimensional analysis.

We take as input a list of operators (for this discussion, assume the input operators are , and ), an upper bound on the tree depth, an upper bound on the number of constants, and a domain for the constants . In Figure 1, there is a single constant (distinct from 1) with value in the expression tree and in the L-monomial tree. In the prior work on globally optimal symbolic regression, the MINLP formulations typically have discrete variables that (1) determine the placement of the operators in nodes of the expression tree and (2) the mapping of independent variables to leaf nodes, and (3) determine whether to map an independent variable or a constant to a leaf node.

In our formulation, we simply do not have discrete variables for (1). We explicitly enumerate all possible assignments of the operators in expression trees up to depth . More precisely, if a“partial” expression tree is one where the leaf nodes are undetermined, but the operator assignments to non-leaf nodes are determined, then we enumerate all possible partial expression trees up to a certain depth. Our assignment of variables/constants to leaf nodes is also different from prior work. Let be all the independent variables. Instead of assuming that each leaf node is either a constant or one of as in the prior work, we assume each leaf node is a one-term multivariable Laurent polynomial of the form

 hxa11xa22⋯xann, (4)

where are (undetermined) integers (for computational efficiency, we limit these integers to lie in the range for some input constant ), and represents an (undetermined) constant in the final expression. We refer to an expression of the type (4) an L-monomial. In other words, rather than assigning a single variable or constant to a leaf node, we potentially assign both variables and constants, and also multiple variables (with positive or negative powers to a leaf node). We call the resulting trees generalized expression trees, and gentrees for convenience.

Each gentree (with depth say ) corresponds to a symbolic expression: each non-leaf node with height corresponds to the expression formed by applying the operator at the node to the expressions (i.e., L-monomials) in the children nodes, and non-leaf nodes at greater heights are handled in the same manner in order of height. The only gentree with depth 0 corresponds to the L-monomial , whereas the gentrees of depth 1 correspond to the expressions , ,  ,  and , respectively, where and are L-monomials.

### 3.1 Pruning the list of gentrees

We try to reduce the number of relevant partial expression trees/gentrees by removing a number of "redundant" trees using the fact that the set of nonzero (the constant in 4 is nonzero) L-monomials is closed under multiplication and division. Notice that both and are L-monomials and thus can be represented by a single expression . In other words, if there is a symbolic expression of the form that fits our data, then there is one of the form . Accordingly our first few pruning rules are: remove a tree from the list of all gentrees with depth up to if has the subexpression

 [R1] L1×L2 or L1/L2. [R2a] L1×(L3±L4). [R2b] (L1±L2)∗(L3±L4). [R3] (L1±L2)/L3.

The first rule was explained above. The second follows by associativity: , for some L-monomials and . If a with a subexpression appears in , then replacing this subexpression by results in another gentree of the same depth, which must therefore be in . The same idea can be applied to the subexpression . We can apply associativity twice to justify rule R2b, as , for some (the same argument holds when either is replied by a in R2b). Once again, the second expression has the same depth as the first, and therefore there must be a tree in containing it. R3 can be explained similarly.

### 3.2 MINLP formulation for a gentree

Let the data points be for , where is an index set, and let the features/independent variables be . Let stand for the observations/dependent variable, with standing for the th observation (of the dependent variable). Let be a given gentree with leaf nodes, and let be the vector of all integer variables corresponding to the powers of the independent variables in the different leaf nodes. Then . Let be the variables corresponding to the constants in leaf nodes . Finally, assume we have a 0-1 variable that determines whether or not the th leaf node has a constant that is different from one. Thus the (vector) variables in our model are and . Let stand for the symbolic expression defined by fixing the values of these variables. Then the MINLP we solve can be framed as

 min ∑i∈I(y(i)−fh,p,z,T(x(i))2 (5) s.t. −δ≤pi≤δ   for i=1,…,mn −Ωzi+(1−zi)≤hi≤Ωzi+(1−zi) for i=1,…,m (7) m∑i=1zi≤k (8) z∈{0,1}m,  p∈Zmn (9)

The constraints (9) and (3.2) force to take on 0-1 values, and to take on values in the range . The constraint (7) restricts to lie in the range if has value 1, and forces to take on value 1, when has value 0. The constraints (8) allow at most of the variables to have value 1, and therefore at most of the values to be different from 1. The function is composed of the operators in the non-leaf nodes of from the L-monomials in the leaf nodes. The objective function is the sum of squares of differences between the symbolic expression values for each input data point and the corresponding dependent variable value (call it the least-square error).

We use BARON to solve the MINLPs we generate, and thus and are limited by the operators that BARON can handle (). We will illustrate for a few examples, rather than specify it formally. Suppose we are trying to derive the formula , where and are independent variables, and is an unknown constant, and we have multiple data points (for ) with values for the independent variables, and for associated . If is a depth 0 gentree, then is just a single L-monomial, and where are undetermined integers in the range , and is an undetermined real number in the range . The objective function is then

 ∑i∈I(Fi−hma1imb2irci)2 ,

where and are the values of in the th data point. If the gentree can be written as , then the objective function becomes

 ∑i∈I(Fi−(h1ma11imb12irc1i+h2ma21imb22irc2i))2;

here and are undetermined, and we solve for these values.

As depicted in Figure 1, an expression tree of a certain depth can sometimes be represented by a gentree of smaller depth. Therefore, by enumerating gentrees of a certain depth, one obtains a much richer class of functions than can be obtained by expression trees of the same depth.

### 3.3 Enumeration and parallel processing

By enumerating the generalized expression trees and solving them separately, the problem is divided into multiple, easier-to-solve sub-problems (operator placement in non-leaf nodes does not have to be determined any more) that can be solved much more quickly. This "divide and conquer" formulation can obtain solutions to problems that were intractable for the formulation in (Austel et al., 2017).

When available, we exploit parallelism, and run multiple threads, with one MINLP corresponding to a single gentree in a single thread. If we obtain a solution that fits the data within a prescribed tolerance from a gentree with depth , then we stop all processes/threads containing gentrees with depth . We terminate a gentree if the lower bound on the least-square error exceeds the least-square error found from other gentrees of the same or lower depth. Thus we execute a branch-and-bound type search, and implicitly search for the least depth gentree that fits the data.

If we have more gentrees than available cores, we process them in a round robin fashion; if is the number of cores, we start solving for gentrees in parallel, and after a fixed amount of time (10 seconds), we pause the first gentrees, and start solving more, till we either find a solution or run out of gentrees in which we case we start from the first gentree. The gentrees are sorted by a measure of complexity (roughly equal to the number of nodes). Of course, the gentree enumeration (and thus our overall algorithm) grows exponentially with gentree depth : the number of gentrees for are, respectively, 7, 60, 4485 and over 100,000. Our gentree enumeration approach is unlikely to be tractable for and is already hard for if we include more operators than listed earlier. But each L-monomial can represent large expression trees.

### 3.4 Dimensional analysis

Consider Newton’s law of universal gravitation which says that the gravitational force between two bodies is proportional to the product of their masses and inversely proportional to the square of the distance between their centers (of gravity):

 F∝m1m2r2

and the constant of proportionality is G, the gravitational constant. Therefore, we have

 F=Gm1m2r2.

The units of are chosen so that units of the right-hand-side expression equal the units of force (mass distance time-squared). Suppose one is given a data set for this example, where each data item has the masses of two bodies and the distance and gravitational force between them. Dimensional analysis would rule out or , for example, as possible solutions.

If constants can have units and every L-monomial can have a such a constant, then dimensional analysis conveys essentially no information. For example, we can choose constants with appropriate dimensions so that both the expressions and satisfy all dimensional requirements. In the AI Feynman experiments, the authors do not allow constants to have units, and also ensure that for all required universal constants that have units (such as the Gravitational constant), both numerical values and units are given as input. We allow constants to have units or not based on an input flag.

We next explain the constraints we add to our formulation to enforce dimensional consistency. Assume that all constants have no units, and that , and , and . For the gravitation example (without the gravitational constant as an input), each L-monomial has the form

 L=hma1mb1rc, (10)

where , and are bounded integer variables with an input range of , and is a variable representing a constant. We add to the system of constraints (3.2) - (9) linear constraints that equate the units of the symbolic expression to those of the dependent variable. For the case our gentree consists of a single node (as in the first tree in Figure 1), our symbolic expression has the form (10), and we add the linear constraints

 a⎡⎢⎣100⎤⎥⎦+b⎡⎢⎣100⎤⎥⎦+c⎡⎢⎣010⎤⎥⎦=⎡⎢⎣   1   1−2⎤⎥⎦.

Here, each component of a column vector corresponds to a unit (mass, distance, time, respectively). The column on the right-hand side represents the units of force, i.e., mass times distance divided by squared time. The columns on the left-hand side represent the dimensionalities of mass and distance, respectively. The third of the above linear equations does not have a solution.

Next, assume the symbolic expression is where and , and the unkowns are and . The units of and have to match, and must also equal the units of . The linear constraints we add are

 a1⎡⎢⎣100⎤⎥⎦+b1⎡⎢⎣100⎤⎥⎦+c1⎡⎢⎣010⎤⎥⎦ =⎡⎢⎣   1   1−2⎤⎥⎦ (11) a2⎡⎢⎣100⎤⎥⎦+b2⎡⎢⎣100⎤⎥⎦+c2⎡⎢⎣010⎤⎥⎦ =⎡⎢⎣   1   1−2⎤⎥⎦. (12)

Finally, for the symbolic expression , to compute the units of this expression we need to add up the units of and of . Thus we sum the left hand sides of (11) and (12) and equate this sum to the righ hand side of (11). We can similarly deal with dimension matching in the remaining gentrees of depth 1 via linear constraints. For greater depth gentrees, we apply the ideas above to depth 1 (non-leaf nodes), and then to depth 2 nodes and so on.

## 4 Results

### 4.1 Feynman Database for Symbolic Regression

In our first experiment, we choose a subset of the 100 problems from the Feynman Database for Symbolic Regression (FSReD), created in (Udrescu & Tegmark, 2019). We first eliminate the 19 problems which involve operators our system cannot handle because these operators cannot be handled by the underlying MINLP solver we use (namely ). We give detailed results on 32 out of the remaining 81 instances with varied properties in Table 1: some can easily be solved by AI Feynman and Eureqa, and some are hard for these solvers, and some need many data points or limited noise.

We set the depth limit to 3, and the number of constants to 1. We terminate when we find an expression with objective value (squared error) less than (error tolerance). The operators we use are described in Section 3. We set to 100, so all constants are in the range . We set for computational efficiency. This means that each power in an L-monomial lies in the range . Under this setting, the pruning rules – R1, R2,R3 – remove potentially non-redundant gentrees, and our algorithm does not explore all possible symbolic expressions that are representatable as a gentree of depth up to 3. Though L-monomials are closed under multiplication, L-monomials with bounded powers are not. For example, if there is a solution of the form , we may not find it.

We constrain the search in three more ways. We bound the sum of the variable powers in an L-monomial by an input number . Thus, for each term of the type (4), we add the constraint , which is representable by a linear constraint using auxiliary variables and and for . Secondly, we add another pruning rule. We remove any gentree which contains a subexpression of the form where is an L-monomial (we allow , for example). Finally, we drop the operator from our set. However, as we allow positive and negative constants, we can simulate a operator. But because we only allow at most one constant, this means that we effectively allow a single operator in our generated solutions. All constants are allowed to have units.

We give our results in Table 1. The first column contains the equation label from FSReD, the second shows the actual function. The third column gives the running time (in seconds) of our code on a machine with 30 cores, and with . A ’f’ in this column implies our code did not produce the correct answer within the time limit of 600 seconds. The fourth column gives the outcome of our experiment with noise. The “data used” column contains the number of data points we use, which is (the first) 10. Our method does not scale well with the number of data points: equation (5) becomes more complex, and the resulting MINLPs harder. We can solve for 20-30 points for most instances, but the running time increasing significantly for some problems. In columns 6-8 we give the time taken by AI Feynman, the min number of points it needs to get the correct answer, and the maximum noise level under which it obtains the correct answer, all taken from the associated paper, as is the Eureqa solution status in the last column.

We fail on 5 instances, but on the remaining instances, our results are mixed in comparison to AI Feynman. We are often slower, but sometimes faster (II.24.17, II.38.3, III.10.19), even if we scale our times by 30 (to account for the parallelism used by our code). We note that AI Feynman searches over more operators, and thus a potentially larger search space. For three instances – I.48.20, II.11.27, III.10.19 (with the number of data points in bold) – we obtain the correct answer with one-tenth the number of data points needed by AI Feynman.

To solve I.6.20a, indicated by a ’*’, we first spend the time limit of 600 seconds using the default set of operators, and then restart with replaced by , and then take another 100 seconds. This is why we report a running time of 700. While using , we also decrease the error tolerance to . (We note that AI Feynman also cycles through different sets of operators). Out of the remaining 49 problems, we are able to solve 36 instances, and fail on 13. Changing some of our parameters will allow additional problems to be solved at the cost of making the code slower. For example, increasing the number of constants to 2 allows our code to solve I.15.3t. Finally, BARON can handle the operator, but we have not experimented with it.

In a second experiment, we set the relative noise level to (as described in the AI Feynman paper), and report a “yes” if one of the gentrees returns the expected formula as a solution and has low error. When the “yes” is in bold, AI Feynman needs a lower noise level. However, as we generate arbitrary constants, for some problems we find solutions that have lower squared error than the true formula (on the 10 data points we chose), and one should apply AIC or BIC to choose from alternate formulae.

A major difference between our code and AI Feynman is that our code finds arbitrary constants, whereas AI Feynman finds combinations of fixed symbols (such as small integers and and other known physical constants). For example, in I.12.2, we find as . For problem I.13.12, AI Feynman uses the known value of Newton’s Gravitational constant and cannot discover it, whereas our code can.

## 5 Conclusions

A few MINLP based SR methods have been proposed earlier, but these are not very scalable. These methods obtain an “optimal” solution and a certificate of optimality. We proposed an MINLP based approach that enumerates expression trees with fixed operators, and uses MINLP to solve for the form of the leaves. Our approach is competitive with state-of-the-art SR packages on some difficult problems. It incorporates dimensional analysis (a first for an MINLP based implementation) but can also find real-valued constants (as can gplearn). A number of drawbacks – such as limitations on number of data points, and high variability with respect to parameters – of our method suggest future directions of research. Even though our overall method is a heuristic, the MINLP solves for individual gentrees return certificates of optimality. One can speedup the method by developing heuristics to find good solutions for a gentree and terminating before proving optimality.

Our code combines a number of useful features: a) it infers results from limited data, b) discovers complex constants, and c) deals with some noise as in real data, and d) is reasonably fast. AI Feynman cannot handle arbitrary constants or much noise, and Eureqa cannot handle explicit equations (such as dimensional equations, though dimensional equations can be handled implicitly by an application of the the Buckingham-Pi and the implied data preprocessing).

Acknowledgements Tyler Josephson was primarily supported by the U.S. Department of Energy (DOE), Office of Basic Energy Sciences, Division of Chemical Sciences, Geosciences and Biosciences under Award DE-FG02-17ER16362. Tyler Josephson and Lior Horesh gratefully acknowledge the support of the Institute for Mathematics and its Applications (IMA), where a part of this work was initiated.

## References

• Augusto & Barbosa (2000) Augusto, D. A. and Barbosa, H. J. Symbolic regression via genetic programming. In Proceedings of the Sixth Brazilian Symposium on Neural Networks, pp. 173–178. IEEE, 2000.
• Austel et al. (2017) Austel, V., Dash, S., Gunluk, O., Horesh, L., Liberti, L., Nannicini, G., and Schieber, B. Globally optimal symbolic regression. NIPS Symposium on Interpretable Machine Learning, 2017.
• Avron et al. (2015) Avron, H., Horesh, L., Liberti, L., and Nahamoo, D. Globally convergent system and method for automated model discovery, June 30 2015. U.S. Patent Application 20170004231, application No. 14/755,942 filed June 30, 2015.
• Banzhaf et al. (1998) Banzhaf, W., Nordin, P., Keller, R. E., and Francone, F. D. Genetic programming: an introduction, volume 1. Morgan Kaufmann, San Francisco, 1998.
• Bertsimas & Shioda (2007) Bertsimas, D. and Shioda, R. Classification and regression via integer optimization. Operations Research, 55:252–271, 2007.
• Brunton et al. (2016) Brunton, S. L., Proctor, J. L., and Kutz, J. N. Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proceedings of the National Academy of Sciences, 113(15):3932–3937, 2016.
• Connor & Taylor (1977) Connor, J. W. and Taylor, J. B. Scaling laws for plasma confinement. Nuclear Fusion, 17(5):1047, 1977.
• Cozad (2014) Cozad, A. Data- and theory-driven techniques for surrogate-based optimization. PhD thesis, Carnegie Mellon, Pittsburgh, PA, 2014.
• Cozad & Sahinidis (2018) Cozad, A. and Sahinidis, N. V. A global minlp approach to symbolic regression. Math. Program., Ser. B, 170:97–119, 2018.
• Davidson et al. (2003) Davidson, J. W., Savic, D. A., and Walters, G. A. Symbolic and numerical regression: experiments and applications. Information Sciences, 150(1):95–117, 2003.
• Gunawan et al. (2017) Gunawan, O., Horesh, L., Nannicini, G., and Zhou, W. Symbolic regression embedding dimensionality analysis, 2017. U. S. Patent Application 20190188256A1, application no. 15/843,118, filed Dec 15, 2017.
• Horesh et al. (2016) Horesh, L., Liberti, L., and Avron, H.

Globally optimal mixed integer non-linear programming (MINLP) formulation for symbolic regression.

Technical Report 219095, IBM, 2016.
• Izzo et al. (2017) Izzo, D., Biscani, F., and Mereta, A. Differentiable genetic programming. In et al., J. M. (ed.), EuroGP 2017, LNCS 10196, pp. 35––51, 2017.
• Korns (2011) Korns, M. F. Accuracy in symbolic regression. In Riolo, R., Vladislavleva, E., and Moore, J. H. (eds.), Genetic Programming Theory and Practice IX, pp. 129—151. Springer, Berlin, 2011.
• Koza (1992) Koza, J. R. Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press, Cambridge, 1992.
• Langley (1981) Langley, P. Data-driven discovery of physical laws. Cognitive Science, 5(1):31–54, 1981.
• Langmuir (1918) Langmuir, I. The adsorption of gases on plane surfaces of glass, mica and platinum. Journal of the American Chemical Society, 40(9):1361–1403, 1918. doi: 10.1021/ja02242a004.
• Luo et al. (2017) Luo, C., Chen, C., and Jiang, Z. A divide and conquer method for symbolic regression. arXiv:1705.08061v2, 2017.
• Maher et al. (2017) Maher, S. J., Fischer, T., Gally, T., Gamrath, G., Gleixner, A., Gottwald, R. L., Hendel, G., Koch, T., Lübbecke, M. E., Miltenberger, M., Müller, B., Pfetsch, M. E., Puchert, C., Rehfeldt, D., Schenker, S., Schwarz, R., Serrano, F., Shinano, Y., Weninger, D., Witt, J. T., and Witzig, J. The SCIP optimization suite 4.0. Technical Report 17-12, ZIB, Takustr.7, 14195 Berlin, 2017.
• Schmidt & Lipson (2009) Schmidt, M. and Lipson, H. Distilling free-form natural laws from experimental data. Science, 324(5923):81–85, 2009.
• Schmidt & Lipson (2010) Schmidt, M. and Lipson, H. Symbolic regression of implicit equations. In Genetic Programming Theory and Practice VII, pp. 73–85. Springer, Berlin, 2010.
• Schmidt & Lipson (2014) Schmidt, M. and Lipson, H. Eureqa (Version 0.98 beta) [Software]. 2014. Available from www.nutonian.com.
• Stephens (2019) Stephens, T. Genetic programming in python, with a scikit-learn inspired api: gplearn. version 0.41, 2019.
• Tawarmalani & Sahinidis (2005) Tawarmalani, M. and Sahinidis, N. V. A polyhedral branch-and-cut approach to global optimization. Mathematical Programming, 103(2):225–249, 2005.
• Udrescu & Tegmark (2019) Udrescu, S.-M. and Tegmark, M. AI Feynman: A physics-inspired method for symbolic regression. Sci. Adv. 6: eaay2631, 2020.
• Veeramachaneni et al. (2015) Veeramachaneni, K., Arnaldo, I., Derby, O., and O’Reilly, U.-M. Flexgp: Cloud-based ensemble learning with genetic programming for large regression problems. Journal Of Grid Computing, 13(3):391–407, 2015.
• Willis et al. (1997) Willis, M. J., Hiden, H. G., Marenbach, P., McKay, B., and Montague, G. A. Genetic programming: An introduction and survey of applications. In Genetic Algorithms in Engineering Systems: Innovations and Applications, 1997. GALESIA 97. Second International Conference On (Conf. Publ. No. 446), pp. 314–319. IET, 1997.