Log In Sign Up

AI-Assisted Discovery of Quantitative and Formal Models in Social Science

by   Julia Balla, et al.

In social science, formal and quantitative models, such as ones describing economic growth and collective action, are used to formulate mechanistic explanations, provide predictions, and uncover questions about observed phenomena. Here, we demonstrate the use of a machine learning system to aid the discovery of symbolic models that capture nonlinear and dynamical relationships in social science datasets. By extending neuro-symbolic methods to find compact functions and differential equations in noisy and longitudinal data, we show that our system can be used to discover interpretable models from real-world data in economics and sociology. Augmenting existing workflows with symbolic regression can help uncover novel relationships and explore counterfactual models during the scientific process. We propose that this AI-assisted framework can bridge parametric and non-parametric models commonly employed in social science research by systematically exploring the space of nonlinear models and enabling fine-grained control over expressivity and interpretability.


page 1

page 2

page 3

page 4


Discovery of Nonlinear Dynamical Systems using a Runge-Kutta Inspired Dictionary-based Sparse Regression Approach

Discovering dynamical models to describe underlying dynamical behavior i...

Neurosymbolic Programming for Science

Neurosymbolic Programming (NP) techniques have the potential to accelera...

Automated Learning of Interpretable Models with Quantified Uncertainty

Interpretability and uncertainty quantification in machine learning can ...

A Computational Inflection for Scientific Discovery

We stand at the foot of a significant inflection in the trajectory of sc...

Integration of Data and Theory for Accelerated Derivable Symbolic Discovery

Scientists have long aimed to discover meaningful equations which accura...

Macroeconomic Predictions using Payments Data and Machine Learning

Predicting the economy's short-term dynamics – a vital input to economic...

I Introduction

Figure 1:

Using neuro-symbolic regression to systematically guide model discovery in social science. Analogous to the inductive-deductive reasoning process, a dataset of interest (1) – which may be time-series, cross-sectional, or longitudinal – is supplied to OccamNet. The user can provide inductive biases (2), such as the choice of key variables, known constants, or specific functional forms to constrain the search space. OccamNet finds interpretable and compact solutions that model the input data by sampling functions from an internal probability distribution represented using

P-nodes [16]. In this example, OccamNet recovers the governing equation of the Solow-Swan model of economic growth [52]

from a synthetic dataset. Each formal model is characterized by its error distribution in the training set (3), allowing the user to identify outliers and interrogate its internal validity. The symbolic model is then used to generate predictions (4) to perform deductive tests across unseen data, either by partitioning a test set or informing experimental designs (5).

Quantitative and formal models are widely employed in economics, sociology, and political science [47, 24] for studying diverse subjects ranging from economic growth [52] to participation in social movements [29]. They are also used to describe empirical regularities, such as patterns in voting behavior and city scaling laws, from which mechanisms can be derived and subsequently tested [11, 23]. While qualitative methods remain a dominant paradigm in much of social science due to their ability to include normative arguments, capture greater complexity and nuance, and extrapolate from sparse data, mathematical models provide a complementary way to reason about causal mechanisms, generate questions, and produce empirically testable predictions [21].

These models are typically constructed using a combination of existing theoretical paradigms, intuition about the relevant variables, and insights on how they interact. Although intuition can serve as a good inductive prior, especially since researchers – being human agents themselves – can have valuable insights on social phenomena, it suffers from two key drawbacks. First, because available data can be high-dimensional, noisy, or strongly nonlinear, it may be difficult to recognize certain empirical regularities. Second, social phenomena are often explained by a large space of plausible and intuitive theories [33]. This means that personal and experiential factors [30]

, together with hidden assumptions made during the modeling process, can bias the models. The inclusion or omission of certain variables may even change the sign of the estimated parameters. These biases can threaten the external validity of these models, especially if they are tested in isolation without a systematic examination of the entire solution space


Machine learning provides a collection of powerful tools for detecting patterns in data. However, these models are often uninterpretable and lack the ability to generalize beyond the observed data. Symbolic regression – the process of discovering symbolic expressions from empirical data – has thus evolved as a central method for uncovering hidden relationships in complex data. In recent years, these methods have seen substantive use in the discovery of fundamental laws in the natural sciences including mathematical conjectures [33], physical laws [56, 17, 59], and ecological models [10, 44].

There have also been several applications of symbolic regression to social science data [48, 55, 62, 20], indicating the promise of AI-assisted model discovery in these fields. However, most existing methods lack flexibility in incorporating user-specified inductive biases: a crucial element for seamless human-machine collaboration, particularly in cases where inference from sparse data can benefit greatly from human input. They are also unable to take advantage of the additional statistical power offered by longitudinal (panel) datasets, where information shared between panels can improve the generalizability of the learned model.

We address these problems in our work and propose a general framework for using neuro-symbolic regression to assist the discovery of quantitative models in social science. Building off OccamNet [16], a neural method for efficient symbolic regression, our system searches for parsimonious functional relationships or differential equations and evaluates them according to their explanatory power111The code used for our model and experiments is available at We extend existing interpretable machine learning methods on two fronts: (i) improving noise tolerance in small datasets (ii) and enabling the system to learn symbolic relationships across longitudinal data (e.g., in a panel dataset across countries) using a weight-sharing approach. This allows one neural-symbolic model to be trained on multiple panels with different parameters simultaneously. Using the proposed workflow shown in Fig. 1, these contributions allow neuro-symbolic methods to be more readily applied to ab-initio model discovery in social science, where small and noisy datasets are often available across contexts with different model parameters.

We highlight several key applications of this system in a social science context. In section III.1, we show that complexity regularization can be used to find higher-order corrections in network degree distributions, which we extend in mapping the complexity-accuracy Pareto front of economic production functions in section III.1.2. We argue that such maps can help researchers trade-off various quantitative models and debias the search process. Next, we apply this method to dynamical models in section III.1.2, where we use real-world, noisy data to discover the Lotka-Volterra equations and show that weight-sharing learning can be applied across epidemic waves to provide better performance in finding infection models. Finally, we highlight the utility of neuro-symbolic regression in evaluating existing models in section III.2.3, using weight-sharing learning to map the complexity-accuracy front of economic growth models across 18 OECD countries and evaluate the generalizability of Solow and Swan’s exogenous growth model [52, 54]. We hope that the capabilities demonstrated in this article will inspire greater collaborative use of these machine learning methods in social science.

Ii Method

We employ a neural network architecture built on OccamNet

[16] to find sparse, interpretable formal models that fit the input data. The architecture is designed to accept a range of inductive priors from the user to regularize the regression or reduce the size of the search space.

In the most general formulation, OccamNet takes as input a panel dataset of the form on variables, where each dataset with samples is partitioned by the user into an input set


and a target set


for symbolic regression, i.e. . When the data is not longitudinal, we simply have with one dataset .

Alternatively, the user can also allow OccamNet to automatically find a partition by having it search through all possible combinations of inputs and targets and fitting an implicit function to the dataset(s). This mode of operation is useful if the user is seeking new empirical regularities in data and wishes to obtain a ranked list of candidate relations based on their error and compactness.

The goal of the regression problem is to find vectors of functions and hidden constant parameters

that approximate or exactly specify each target variable with input vector based on a predefined set of basis functions . These basis functions can have any number of arguments, be specified over different domains, or include constants to be optimized over (e.g., power laws of the form ). They may include pre-determined variables, such as , , the measured rate of capital growth, or the basic reproduction number of a virus. Furthermore, OccamNet can operate with piece-wise functions or non-differentiable, potentially recursive programs like , enabling a wide range of possibilities for constructing formal models.

OccamNet works by representing a probability distribution over a space of user-specified functions. During training, OccamNet samples functions from its probability distribution, evaluates their effectiveness at fitting the given data, and updates its probability distribution so as to weight its probability toward better-fitting solutions. An overview of the implementation of OccamNet and its advantages over other neuro-symbolic regression methods for social science data is provided in Appendix A.

ii.1 Regularization

Our model is built on two key inductive priors: the use of symbolic bases and complexity regularization. Symbolic regression enables us to find interpretable expressions (as opposed to black-box or fully non-parametric results), which can inform an understanding of the mechanisms underlying the formal model. For instance, the form of differential equations, the existence of scaling laws, or interaction terms between certain variables can provide insight into the behavior of the social system. Furthermore, although symbolic expressions may only be suited to certain kinds of social science theories, they describe precisely specified relationships within the dataset and can be used to generate testable predictions across populations or timeframes.

Complexity regularization enables model discovery to follow the maxim that “the theory should be just as complicated as all our evidence suggests” [33]. Such a requirement does not necessarily depend on the assumption of parsimony: although it may be true that simple theories have higher probabilities [63], it also ensures that there is sufficient evidence relative to the complexity of the theory to produce determinate research designs. In the computational context, for many systems, the function relating the variables of interest only consists of a few terms, leading to a sparse regression problem which makes model discovery computationally tractable and avoids the need to perform a brute-force search over functions [9].

To bias OccamNet toward simpler solutions, we include two regularization terms in the optimization objective, which the user can tune to control the simplicity of OccamNet’s output: Activation Regularization and Constant Regularization. These terms penalize the number of activation bases and the number of tune-able constants used in the resulting expression respectively.

We may further constrain the model’s search space by accounting for the units of the variables present in the data. For instance, a dataset on economic growth might include variables with units such as dollars per capita or units of labor per year. Upon user specification of the units of each input and target variable, OccamNet checks whether the units of a sampled function match those of the target variable. It then adds a Unit Regularization term to penalize functions where the units do not match. While OccamNet may also add a unit-normalizing constant in front of the mismatched terms, this would come at the cost of additional complexity. The implementation of all of the regularization terms is described in more detail in Appendix A.1.

ii.2 Differential equation discovery

For time-dependent datasets, we wish to discover dynamical systems of the form , where is a vector denoting the state of the system at time . The input to OccamNet is a time-series dataset


where we have samples that are timestep apart starting from an initial state at . Our regression target is then given by . Fitting dynamical equations follows the same process as described in II with the exception that is first computed numerically from the input data using the central difference formula222The central difference formula is given by and accounts for second-order error in the derivative approximation.. Since each is often contaminated with noise, it may be necessary to filter both and to counter differentiation error, as described in Appendix A.2.

While we primarily evaluate OccamNet on systems of non-linear differential equations in which the target variables are decoupled, we note that our model can also be applied to coupled systems by fitting the equations individually and adding the output of each fit as an input variable. For instance, for a system of two ODEs and as regression targets, one can use OccamNet to fit and append the predicted time-series to the input data for fitting .

Figure 2: (a) Discovering the network densification process in paper citations. The top diagrams show the evolution of the arXiv high energy physics citation network over time. OccamNet is able to discover the correct functional form for the densification law based on a table of network properties [39]. (b) A similar experiment is performed to discover the scaling law in the rank-size distribution of node degrees in the arXiv citation and Wikipedia hyperlink networks. At the same level of complexity regularization, OccamNet picks the simplest solution for each dataset, demonstrating the principle of Occam’s razor. Our model finds a power law for the Wikipedia network and more complex function for the citation network. (c) We apply OccamNet to real-world, multi-dimensional economic data from Cobb and Douglas’s 1928 paper. The complexity-accuracy trade-off is evaluated by sweeping across the available modes of regularization.

ii.3 Data ensemble learning by weight-sharing

Datasets used in social science often contain both cross-sectional and time series information. Take for example a dataset containing economic indicators across countries, where the rows of each panel are time-dependent vectors of variables . Furthermore, each country has a vector of hidden parameters that is inaccessible to the researcher. Even if the variables are related by the same functions for each , solutions to the dynamical system may be qualitatively very different because each country may have different parameters and initial conditions.

Learning on ensembles of data by sharing the weights of OccamNet on multiple panels allows us to aggregate the statistical power of such datasets and reconstruct the dynamical equations even if is inaccessible and individual country trajectories do not traverse the full phase space of the system. This is done by fixing a shared set of functions for all panels and fitting the optimal constants

for each panel individually. The system repeats this step for a specified number of epochs and selects the optimal functions

with the lowest mean squared error with respect to the target data. To support ensembles with datasets of varying sizes, we normalize the total error by dividing each individual dataset error by the number of samples in the given panel.

Since the loss computed at each epoch is a distribution across the panels, different summary statistics may be used to select the optimal symbolic model. For instance, the loss may be computed as the mean or median. However, because the model is generally nonlinear, errors of the form , where

is a vector of Gaussian distributed entries with zero mean, can lead to highly skewed distributions that require alternate measures of central tendency. It may also be the case that certain outliers require additional variables or theoretical models to explain. In this case, the researcher can choose to identify the optimal formal model from the error histogram. The data ensemble learning modality is demonstrated in Fig.


Iii Case Studies

We validate our AI-assisted approach on synthetic and real-world datasets related to social science covering both static distributions and dynamical systems. All dataset pre-processing details and model parameters are available in Appendix B.

iii.1 Static distributions

We first evaluate our neuro-symbolic regression method for the discovery of social science models on static distributions without time-dependence. We extract symbolic relations as illustrated in Fig. 2, which summarizes our results on fitting datasets of single variables such as network densification and degree distributions, as well as a multi-variable distribution for the Cobb-Douglas production function [14].

iii.1.1 Network properties

Beginning with Barabási and Albert’s seminal paper on scale-free networks [3], graph formation properties and degree distributions have become an increasingly active area of research. Network structures are studied in economics and sociology in the context of clustering, information spread, and skewed distributions, such as those exemplifying the “rich get richer” effect in preferential attachment processes. Here, we focus on two sets of persistent properties observed in human networks: densification laws and degree distributions.

In Fig. 2 (a), we use neuro-symbolic regression to find the network densification process in citation networks. Densification occurs when the number of edges in a graph grows super-linearly in the number of nodes over time. We use OccamNet to reconstruct the densification law of the citation graph of high energy physics preprints on arXiv from January 1993 to April 2003, where the number of edges grows according to as shown by Leskovec et al. [38]. Our fit closely matches their baseline with a densification law exponent of approximately

, indicating that the number of citations in each paper on average grows over time. Note that we expect OccamNet to produce a fit with slightly higher error compared to the Ordinary Least Squares solution since our model has the additional task of finding the correct functional form before fitting the constants using gradient descent. For more accurate constant parameters, one may opt to take the functional form outputted by OccamNet and then fit the parameters using non-linear least squares.

Next, we use OccamNet to discover the functional form of various network degree distributions. Power law distributions of the form have been found in contexts ranging from hyperlink graphs [1] to biological networks [2]. These distributions have been proposed to be universal across a wide range of systems, spawning extensive literature about their formation mechanisms and their links to network dynamics. However, alternative degree distributions like log-normal, Weibull, and exponential have also been observed in real-world networks, stimulating new work on finding alternative mechanisms that explain these functional forms [8]. Employing neuro-symbolic regression, we can go beyond the standard set of distributions to find any function drawn from the specified bases.

Fig. 2 (b) shows the degree distribution for both Wikipedia article links [58] and the high energy physics citation network. OccamNet is trained on both datasets using the same complexity regularization. In the Wikipedia dataset, the system discovers the power law distribution independently, while in the citation dataset, it finds that the degree distribution is better fit by a function of the form

. The two examples underscore the principle of Occam’s razor. For a given level of complexity regularization, the system picks the simpler functional form for the Wikipedia dataset because this dataset more closely resembles a power law. On the other hand, OccamNet discovers a more complex, but far more accurate expression for the HEP-TH dataset because this dataset deviates significantly from simpler models. The citation network fit performs better than both the power law and log-normal distributions.

It is worth noting that using mean-square error minimization does not generally provide the optimal estimator for any particular degree distribution. For instance, least-squares fitting can produce inaccurate estimates for power law distributions, and goodness-of-fit tests such as likelihood ratios and the Kolmogorov-Smirnov statistic must be used to evaluate the model [13]. However, in the case where the functional form is not known a priori, we opt to use mean-square error minimization as a Bayesian posterior with a non-informative prior. Other informative priors can be used when the set of functions are narrowed down, for instance, by performing maximum-likelihood estimation for each of the hypotheses proposed by the symbolic regression system.

iii.1.2 Production functions

One of the best-known symbolic relations in economics is the Cobb-Douglas production function [14]. Its introduction in 1928 represented the first time an aggregate production function was developed and statistically estimated [5]. Predicting that a stable relationship between inputs and outputs could be derived from empirical data, the authors found a robust linear relationship from the log output-to-capital ratio and the log labor-to-capital ratio of the American manufacturing industry. This relationship was later shown to hold across countries and timeframes [19]. The function has since been widely adopted as a measure of productivity, as a utility function, and as a foundation of subsequent models such as the Solow-Swan model of long-run economic growth [52, 54].

In its most common form, the law states that the total production (output) of a single good with two input factors, capital and labor , is given by


where is typically a constant representing total factor productivity (ratio of output not explained by the quantity of inputs), and and are the output elasticities of capital and labor, respectively. In cases where we expect to see constant returns to scale (i.e., an increase in the input quantity leads to an equivalent increase in the output), and are additionally assumed to satisfy the constraint .

Cobb and Douglas used a US manufacturing dataset covering annual production from 1899-1922 to fit the linear regression

by ordinary least squares. The resulting constant parameters were given by and , resulting in the equation which is plotted in Fig. 2 (c). We explore the expressions for predicted by OccamNet for the same dataset at varying levels of complexity regularization. We define the complexity value of an expression as the sum of the number of activation bases and constants. While there are other possible measures of complexity, such as the number of bits needed to store the symbol string of the expression [56], our definition matches the regularization options provided for OccamNet training as described in II.1. We also note that any complexity metric will not be an exact measure of parsimony, which is typically defined according to the social and paradigmatic context through which the expression is interpreted.

The Pareto front of the complexity-accuracy trade-off from sweeping across Activation Regularization parameter values is shown in Fig. 2 (c). The full specification of model parameters is included in Appendix B.3. Without any regularization, OccamNet discovers an expression with lower loss and higher complexity than the Cobb-Douglas baseline. Moreover, with a small amount of regularization, our model finds an equation with both lower loss and lower complexity than the baseline. The Cobb-Douglas function, therefore, lies outside the Pareto front – under our particular definition of complexity – for the given US manufacturing dataset. We recover a Cobb-Douglas-like expression of the form only after restricting the set of possible bases for the regression to the ones found in (4). With only 24 available samples available in the data, our model benefits from a stronger inductive bias on the target function to reduce the risk of over-fitting.

In contrast to the complexity metric, the true definition of parsimony, of course, depends on how the expression is interpreted through an economic lens. This is why the use of symbolic regression must be viewed as a collaborative human-machine process. While the Cobb-Douglas function has a strong theoretical foundation, OccamNet offers a method for both discovering new models and for validating the existing expression. The ability to customize the library of bases and the depth of each symbolic expression allows for the straightforward injection of existing knowledge about the expected model. Sparse and noisy datasets such as the one above would often be better suited to this modality.

Figure 3: Regression of coupled dynamical models using noisy real-world data. (a) Time-series plot of a simulated Lotka-Volterra predator-prey system. OccamNet was able to correctly reconstruct the functional form and constants with high accuracy. (b)

Using cubic spline interpolation, our system was able to learn the two differential equations from noisy, real-world data of lynx and hare populations with just 21 data points each. The inferred non-linear model can then be used to extend predictions of future populations.

(c) The symbolic regression system is used to infer the SIR model of pandemic spread in synthetic data and (d) an ensemble of real-world measles infection data in the UK.

iii.2 Dynamical laws

We next apply our neuro-symbolic regression method to differential equation discovery. OccamNet is primarily evaluated on systems of non-linear differential equations, where each ODE is fit independently. We use both synthetic and real-world datasets to fit models of population dynamics, disease spread, and economic growth.

iii.2.1 Predator-prey relationships

Ecologists have long relied on mathematical models of population dynamics for studying ecological processes such as predator-prey interactions. While these models – often taking the form of differential equations – were historically derived from first principles and rigorous theory-building, symbolic regression offers a promising tool for reverse-engineering ecological models from data [44].

A simple yet paramount model of predator-prey interactions between two species is given by the Lotka-Volterra equations [40],


where and are the populations of the prey and predator, respectively. The constant parameter is the rate of exponential population growth for the prey, is the rate of exponential death for the predator, is the rate at which the predators destroy the prey, and is the rate of predator population growth due to the consumption of prey. Inspired by the chemical law of mass action [4], the Lotka-Volterra equations state that the rate of predation is directly proportional to the product of the populations of the prey and predator.

The Lotka-Volterra predator-prey model has significant implications beyond ecology. Richard Goodwin was one of the first to adopt the model to describe the cyclical relationship between employment and wages in economics, otherwise known as Goodwin’s class struggle model [25]. Other applications of generalized Lotka-Volterra systems in economics include the modeling of wealth distributions in society and values of firms in the stock market [43].

In Fig. 3 (a) we simulate a system of Lotka-Volterra equations as defined in (5) with synthetic parameters , , , and . We then use OccamNet to fit and and simulate our generated differential equations using the same initial conditions. Our model is able to rediscover the exact functional form and constant parameters.

We then apply OccamNet to recover the Lotka-Volterra model from the Hudson Bay Company’s data on the lynx and hare population in Canada from 1845-1935 [41]. As shown in Fig. 3 (b), we use the subset of records from 1900-1920 as it contains two population cycles with the least apparent noise. We additionally apply cubic-spline interpolation to reduce noise before fitting the numerically computed derivatives as discussed in Appendix A.2.

When fitting the interpolated data using OccamNet, we recover a system of ODEs that closely resembles the Lotka-Volterra equations. In particular, our model is able to learn the exponential population growth and decay terms and , as well as the critical interaction terms of the form and . Additionally, the best OccamNet fit includes small constant correction terms of and . These constants have much smaller magnitudes relative to the other terms and may be due to over-fitting. A researcher may choose to either ignore these kinds of correction terms, or instead increase the level of constant regularization in OccamNet to obtain a fit with higher loss but potentially better generalizability.

iii.2.2 Epidemic spread

Next, we use OccamNet to discover compartmental models of epidemics. In particular, we use synthetic and real datasets for the well-known Susceptible-Infected-Recovered (SIR) [34] model for fixed population sizes, which is given by the ODE system


where , , and represent the proportions of the susceptible, infected, and recovered populations at time respectively, and where .

The SIR model and numerous variants are often used to describe the spread of infectious diseases such as measles [6] or COVID-19 [15]. Such models are valuable for predicting the magnitude or duration of an epidemic, estimating epidemiological parameters such as the basic reproduction number [57] (given by in the SIR model), and for forecasting the impact of public health interventions. Beyond modeling disease spread, SIR variants have also been used to study the diffusion of information on social networks [60, 37], and thus carry substantial relevance to social science.

In Fig. 3 (c), we simulate a synthetic time-series using the SIR model. OccamNet discovers the correct functional form of the SIR model along with approximately correct constant parameters up to rounding. The deviations of the learned constant parameters likely stem from higher-order errors in the numerical derivative estimate which are not addressed by the central difference formula. One could opt to use higher-order derivative approximations to account for such errors if necessary.

Next, we demonstrate the data ensemble learning functionality of OccamNet on a panel dataset for measles infection data in the UK. Horrocks and Bauch used the SINDy (Sparse Identification of Nonlinear Dynamics) algorithm [9] to fit a variant of the SIR model with time-varying contact rates to this measles dataset as well as to chickenpox and rubella data [32]. We note that SINDy requires the specification of a pre-set library of possible terms for which the algorithm learns sparse vectors of coefficients. OccamNet instead uses a specified library of simple bases to compose more complex terms with no additional inductive bias necessary. Our method requires less prior knowledge about the expression structure and is thus better suited to deriving new functional forms.

Horrocks and Bauch fit the entire measles time-series to a singular equation for with time-varying [32]. We instead demonstrate OccamNet’s ability to discover the SIR model for each cycle of the epidemic with different and parameters. Using a subset of the data from 1948-1958, we generate an ensemble of five measles incidence cycles. We then apply the denoising techniques as in [9] and fit each dataset both individually and as an ensemble in which we learn the same functional form for all five datasets with varying constant parameters.

Fig. 3 (d) highlights the advantage of ensemble learning over individual fits. When each 2-year cycle is fit independently, OccamNet struggles to learn expressions with the SIR-like form of . Due to the sparsity and noisiness of each individual dataset, it only extracts the interaction term from one of the five periods. The ensemble fit, however, discovers a function that included the key form of , as shown in the last row of the table in Fig. 3 (d). The parameter is a placeholder for a constant that varies for each cycle. Ensemble learning can therefore help avoid over-fitting to individual datasets and improve generalization. While our model also learns higher order terms such as and for the ensemble, these terms are of much smaller magnitude compared to the leading terms and are thus of less importance to the correct fit. This is another example in which OccamNet’s custom regularization capabilities could be applied to eliminate higher-order terms.

iii.2.3 Long-run economic growth

The final dynamical system we consider is the neoclassical Solow-Swan model of economic growth [52]. The model postulates that long-run economic growth can be explained by capital accumulation, labor growth, and technological progress. The model typically assumes Cobb-Douglas-type aggregate production with constant returns to scale, given by


where is the output per unit of effective labor, is the capital intensity (capital stock per unit of effective labor), and is the elasticity of output with respect to capital. The central differential equation in the Solow model describes the dynamics of ,


where is the savings rate, is the population growth rate, is the technological growth rate, and is the capital stock depreciation rate. This equation states that the rate of change of capital stock is equal to the difference between the rate of investment and the rate of depreciation. The key result of the Solow growth model is that a greater amount of savings and investments do not affect the rate of economic growth in the long run.

While the Solow model was originally derived to describe U.S. economic growth, it has also been applied to other countries. If the model were perfectly universal, we would expect to see every country’s capital grow according to equation (8), with varying hidden parameters. We thus generate a synthetic example of country panel data for economic growth by simulating the Solow equations for and , with the goal of rediscovering equation (8). As an additional level of complexity, we model the savings rate and population growth rate as time-dependent variables that grow according to the equations and . Each panel dataset is generated by randomly sampling initial conditions and parameters and

from uniform distributions of values outlined in appendix

B.6.1. The resulting cross-country time series is displayed in Fig. 1. As demonstrated in the figure, OccamNet is able to recover the exact equation for (denoted as Output 1) with varying parameter for each of the 20 synthetic panels.

We then attempt to fit the Solow model of capital growth to real-world, noisy country data. In particular, we select macro-economics data from 18 of the original 20 countries in the Organisation for Economic Co-operation and Development (OECD) for which data was available. The input to our symbolic regression includes data on capital per capita, income per capita, savings rate, and population growth compiled by Khoo et al. [35], where it was utilized for regression under the Green Solow model – a variant of Solow-Swan for modeling sustainable growth [7]. The data is originally sourced from the Penn World Tables [22], and are from the Maddison Project database [42], and is from the World Development Indicators [61]. There is no available data for the remaining parameters and , so they are instead treated as learnable constants. We also apply Savitzky-Golay filtering to smooth the data before running the regression, as described in appendix B.6.2.

Figure 4: Ensemble learning of longitudinal (panel) macroeconomic data. (a) Country-level macroeconomic data on capital and income per capita, savings rates, and population growth for 18 OECD member countries. (b) Ensemble learning of the Solow economic growth model. The error distribution of the differential equation, applied to each country, is shown for three expressions generated with increasing levels of complexity regularization. The identification of outliers may inform alternative explanations, hidden parameters, or higher-order corrections to the economic model.

In Fig. 4 (b), we compare the Solow-Swan model baseline to three expressions produced by OccamNet under increasing levels of complexity regularization. The Solow baseline is generated by finding the best-fit parameter in a least-squares of fit of equation (8). The expression with no regularization is given by with constants and that vary across countries. While this expression has a lower weighted mean-squared error (WMSE) than the baseline, its functional form does not carry immediate economic intuition like the Solow model.

We then add Constant Regularization as described in II.1, which results in the equation closely matching the functional form of (8). This suggests that the Solow model has strong external validity as it can be discovered without any strong human priors. Finally, we apply Activation Regularization in addition to Constant Regularization, resulting in the output . The learned expression contains only one term from the Solow model and has the highest WMSE. The example in Fig. 4 (b) concretely demonstrates the tradeoff between accuracy and simplicity in the discovery of symbolic models. A researcher would thus benefit from running OccamNet with several specifications of regularization to select a result with the desired level of precision and complexity.

Iv Discussion

This article aims to test and demonstrate the usefulness of neuro-symbolic methods in social science. We argue that its contribution is twofold. First, human-engineered quantitative models remain a critical part of social science research as they provide a way to formalize intuition or qualitative arguments about a social system and bridge micro-macro theories. Recognizing that, neuro-symbolic methods help researchers benchmark these formal models against counterfactuals on the complexity-accuracy Pareto front: a practice not often done when new quantitative models are proposed. This is partly due to the difficulty of generating alternatives aside from baseline linear models or theories from the existing literature. The selection strategy of symbolic methods allows the intentional tuning of complexity (or meaningfulness, with human-aided post-selection) and prediction accuracy encountered by traditional parametric regression and deep learning methods.

Second, neuro-symbolic methods can also help social scientists discover new, interpretable models or generate novel hypotheses from data. Expressions on the Pareto-front can be interpreted and tested without prejudice. Aside from complete models, terms that repeatedly appear along the front may also uncover persistent interactions [51]. Our proposed approach is powerful as it allows the user to tune the amount of bias entered into the model. While models developed ab-initio from a theoretical basis can be powerful in elucidating the implications of a present paradigm and incorporating human knowledge about human systems, the underconstrained nature of social science theories means that the researcher has to be conscious of their personal, experiential, or theoretical biases [31, 30]. Using neuro-symbolic regression, the user can opt to inject a minimal amount of bias (e.g., provide a wide range of functional bases) or intentionally specify terms that existing theories may suggest.

Broadly speaking, two classes of quantitative models are typically employed in social science [12]. Parametric regression has proven invaluable over the past few decades. Since data and computational resources have become widely available, researchers have made thorough use of methods like generalized linear models for causal inference [46]. Although parametric regression offers less expressive power than formal models, it provides benefits in interpretability and data efficiency, especially when there is strong reason to believe that the underlying relationship takes a simple (e.g., linear) form. Its key drawback, however, is that this interpretability and efficiency are derived from strong assumptions about how variables interact. More complex behaviors, such as temporal dynamics or nonlinear relations between multiple variables, are ubiquitous in social science but can be hidden by using simpler parametric methods. Likewise, nonlinear optimization methods for fitting specified models [45] also require an a priori structure.

As an alternative approach, social scientists can use non-parametric methods that make weak assumptions about the distribution of the data, such as kernel density estimation, local regression, or deep neural networks. These models are useful for prediction policy problems

[36] – where prediction accuracy, not causal identifiability, is prioritized – and for capturing relationships in systems that often operate in highly nonlinear regimes. However, these methods come at the cost of interpretability. We suggest that neuro-symbolic regression can offer a bridge between these two classes of statistical methods. In between specifying a complete model to let the system search for optimal parameters and leaving it open-ended to seek a parsimonious expression, the user can provide any amount of human insight. For instance, they may specify the presence of a power-law relationship, an interaction term, or a derivative, and use the system to compose the rest of the model. From sections III.1 to III.2, we demonstrate each of these use cases and show that the framework can successfully help social scientists discover empirical regularities in the data with interpretable expressions.

Beyond interpretability, human-supplied biases are also useful in reducing the model search space, particularly when the problem faces a dearth of informative data. Since this is often the case in social science, we stress that the use of neuro-symbolic methods must be treated as a collaborative human-machine process in which a dictionary of motifs helps make inference problems tractable, as has been suggested by past work [10, 18]. Humans also play a critical role in the validation and interpretation of the resultant models. Neuro-symbolic methods merely suggest functional fits; using these equations to paint a meaningful picture in the context of social science theory remains a human endeavor. The discovery of a formal model is often just the beginning, informing additional data collection, experiments, and qualitative inquiry [21]. A productive approach to quantitative social science needs to consider both the scientific and sociological role of modeling beyond its predictive capability alone.

Neuro-symbolic regression holds the potential to be a powerful tool for helping social scientists with hypothesis creation and model discovery. An exciting next step would be to collaborate with social scientists in applying OccamNet to their research, particularly to datasets that are conjectured to capture a symbolic law. In problems with more available data and multiple independent panels, a machine learning approach allows data to be divided into training and validation sets – provided that unexplained errors between the two sets are uncorrelated – to improve the out-of-distribution performance, or external validity, of the learned model. We hope to see our proposed human-machine framework applied to unearth entirely new quantitative or formal models. Furthermore, OccamNet’s flexibility in incorporating user-defined bases such as piecewise functions and logical operators allows for moving beyond learning arithmetic expressions. Future work could include applying OccamNet to learn more complex symbolic rules such as strategies in economic games [20] or network-structured models.

We would like to thank Megan Yamoah for useful discussions during the drafting of this article. This work was sponsored in part by the United States Air Force Research Laboratory and the United States Air Force Artificial Intelligence Accelerator and was accomplished under Cooperative Agreement Number FA8750-19-2-1000. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the United States Air Force or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation herein. This work was also sponsored in part by the National Science Foundation under Cooperative Agreement PHY-2019786 (The NSF AI Institute for Artificial Intelligence and Fundamental Interactions, and in part by the Air Force Office of Scientific Research under the award number FA9550-21-1-0317.


  • [1] R. Albert, H. Jeong, and A. Barabási (1999-09) Diameter of the World-Wide Web. Nature 401 (6749), pp. 130–131. External Links: ISSN 0028-0836, 1476-4687, Link, Document Cited by: §III.1.1.
  • [2] R. Albert (2005-11) Scale-free networks in cell biology. Journal of Cell Science 118 (21), pp. 4947–4957. External Links: ISSN 1477-9137, 0021-9533, Link, Document Cited by: §III.1.1.
  • [3] A. Barabási and R. Albert (1999-10) Emergence of Scaling in Random Networks. Science 286 (5439), pp. 509–512. Note: Publisher: American Association for the Advancement of Science Section: Report External Links: ISSN 0036-8075, 1095-9203, Link, Document Cited by: §III.1.1.
  • [4] A. A. Berryman (1992) The Orgins and Evolution of Predator-Prey Theory. Ecology 73 (5), pp. 1530–1535. Note: Publisher: Ecological Society of America External Links: ISSN 0012-9658, Link, Document Cited by: §III.2.1.
  • [5] J. Biddle (2012-05) Retrospectives: The Introduction of the Cobb-Douglas Regression. Journal of Economic Perspectives 26 (2), pp. 223–236. External Links: ISSN 0895-3309, Link, Document Cited by: §III.1.2.
  • [6] O. N. Bjørnstad, B. F. Finkenstädt, and B. T. Grenfell (2002) Dynamics of Measles Epidemics: Estimating Scaling of Transmission Rates Using a Time Series SIR Model. Ecological Monographs 72 (2), pp. 169–184. Note: Publisher: Ecological Society of America External Links: ISSN 0012-9615, Link, Document Cited by: §III.2.2.
  • [7] W. A. Brock and M. S. Taylor (2004-06) The Green Solow Model. Working Paper Technical Report 10557, National Bureau of Economic Research. Note: Series: Working Paper Series External Links: Link, Document Cited by: §III.2.3.
  • [8] A. D. Broido and A. Clauset (2019-12) Scale-free networks are rare. Nature Communications 10 (1), pp. 1017. External Links: ISSN 2041-1723, Link, Document Cited by: §III.1.1.
  • [9] S. L. Brunton, J. L. Proctor, and J. N. Kutz (2016-04) Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proceedings of the National Academy of Sciences 113 (15), pp. 3932–3937. External Links: ISSN 0027-8424, 1091-6490, Link, Document Cited by: §A.2, §II.1, §III.2.2, §III.2.2.
  • [10] P. Cardoso, V. V. Branco, P. A. V. Borges, J. C. Carvalho, F. Rigal, R. Gabriel, S. Mammola, J. Cascalho, and L. Correia (2020-12) Automated Discovery of Relationships, Models, and Principles in Ecology. Frontiers in Ecology and Evolution 8, pp. 530135. External Links: ISSN 2296-701X, Link, Document Cited by: §I, §IV.
  • [11] A. Chatterjee, M. Mitrović, and S. Fortunato (2013-01) Universality in voting behavior: an empirical analysis. Scientific Reports 3 (1), pp. 1049. Note: Number: 1 Publisher: Nature Publishing Group External Links: ISSN 2045-2322, Link, Document Cited by: §I.
  • [12] Y. Chen, M. T. Angulo, and Y. Liu (2019-12) Revealing Complex Ecological Dynamics via Symbolic Regression. BioEssays 41 (12), pp. 1900069. External Links: ISSN 0265-9247, 1521-1878, Link, Document Cited by: §IV.
  • [13] A. Clauset, C. R. Shalizi, and M. E. J. Newman (2009-11) Power-Law Distributions in Empirical Data. SIAM Review 51 (4), pp. 661–703. External Links: ISSN 0036-1445, 1095-7200, Link, Document Cited by: §III.1.1.
  • [14] C. W. Cobb and P. H. Douglas (1928) A Theory of Production. The American Economic Review 18 (1), pp. 139–165. Note: Publisher: American Economic Association External Links: ISSN 0002-8282, Link Cited by: §B.3, §III.1.2, §III.1.
  • [15] I. Cooper, A. Mondal, and C. G. Antonopoulos (2020-10) A SIR model assumption for the spread of COVID-19 in different communities. Chaos, Solitons & Fractals 139, pp. 110057. External Links: ISSN 09600779, Link, Document Cited by: §III.2.2.
  • [16] A. Costa, R. Dangovski, O. Dugan, S. Kim, P. Goyal, M. Soljačić, and J. Jacobson (2021-07) Fast Neural Models for Symbolic Regression at Scale. arXiv:2007.10784 [cs, stat]. Note: arXiv: 2007.10784 External Links: Link Cited by: Appendix B, Figure 1, §I, §II.
  • [17] M. Cranmer, A. Sanchez Gonzalez, P. Battaglia, R. Xu, K. Cranmer, D. Spergel, and S. Ho (2020) Discovering Symbolic Models from Deep Learning with Inductive Biases. In Advances in Neural Information Processing Systems, Vol. 33, pp. 17429–17442. Cited by: §I.
  • [18] P. Dominic, D. Leahy, and M. Willis (2010-12)

    GPTIPS:an open source genetic programming toolbox for multigene symbolic regression

    Lecture Notes in Engineering and Computer Science 2180, pp. . Cited by: §IV.
  • [19] P. H. Douglas (1976-10) The Cobb-Douglas Production Function Once Again: Its History, Its Testing, and Some New Empirical Values. Journal of Political Economy 84 (5), pp. 903–915. External Links: ISSN 0022-3808, 1537-534X, Link, Document Cited by: §III.1.2.
  • [20] J. Duffy and J. Engle-Warnick (2002) Using Symbolic Regression to Infer Strategies from Experimental Data. In Evolutionary Computation in Economics and Finance, S. Chen (Ed.), Studies in Fuzziness and Soft Computing, pp. 61–82. External Links: ISBN 978-3-7908-1784-3, Link, Document Cited by: §I, §IV.
  • [21] J. M. Epstein (2008-10) Why Model?. Journal of Artificial Societies and Social Simulation 11 (4). External Links: Link Cited by: §I, §IV.
  • [22] R. C. Feenstra, R. Inklaar, and M. P. Timmer (2015-10) The next generation of the penn world table. American Economic Review 105 (10), pp. 3150–82. External Links: Document, Link Cited by: §III.2.3.
  • [23] X. Gabaix (1999-08) Zipf’s Law for Cities: An Explanation. The Quarterly Journal of Economics 114 (3), pp. 739–767. External Links: ISSN 0033-5533, 1531-4650, Link, Document Cited by: §I.
  • [24] S. Gailmard and J. W. Patty (2012-06) Formal Models of Bureaucracy. Annual Review of Political Science 15 (1), pp. 353–377. External Links: ISSN 1094-2939, 1545-1577, Link, Document Cited by: §I.
  • [25] G. Gandolfo (2007-12) The Lotka-Volterra Equations in Economics: An Italian Precursor. Economia Politica XXIV, pp. 343–348. Cited by: §III.2.1.
  • [26] M. Gavish and D. L. Donoho (2014-08)

    The Optimal Hard Threshold for Singular Values is

    IEEE Transactions on Information Theory 60 (8), pp. 5040–5053. External Links: ISSN 0018-9448, 1557-9654, Link, Document Cited by: §A.2.
  • [27] J. Gehrke, P. Ginsparg, and J. Kleinberg (2003) Overview of the 2003 kdd cup. ACM SIGKDD Explorations Newsletter 5 (2), pp. 149–151. External Links: Document Cited by: §B.1.
  • [28] D. F. Gleich and R. A. Rossi (2014) A dynamical system for pagerank with time-dependent teleportation. Internet Mathematics, pp. 1–30. Cited by: §B.2.2.
  • [29] M. Granovetter (1978) Threshold models of collective behavior. American Journal of Sociology 83 (6), pp. 1420–1443. External Links: ISSN 00029602, 15375390, Link Cited by: §I.
  • [30] M. Hammersley and R. Gomm (1997-03) Bias in Social Research. Sociological Research Online 2 (1), pp. 7–19. External Links: ISSN 1360-7804, 1360-7804, Link, Document Cited by: §I, §IV.
  • [31] K. Healy (2017-06) Fuck Nuance. Sociological Theory 35 (2), pp. 118–127. External Links: ISSN 0735-2751, 1467-9558, Link, Document Cited by: §I, §IV.
  • [32] J. Horrocks and C. T. Bauch (2020-12) Algorithmic discovery of dynamic models from infectious disease data. Scientific Reports 10 (1), pp. 7061. External Links: ISSN 2045-2322, Link, Document Cited by: §B.5.2, §III.2.2, §III.2.2.
  • [33] R. O. Keohane, G. King, and S. Verba (2021) Designing social inquiry: scientific inference in qualitative research. New edition edition, Princeton University Press, Princeton. External Links: ISBN 978-0-691-22462-6 978-0-691-22463-3 Cited by: §I, §I, §II.1.
  • [34] W. O. Kermack and A. G. McKendrick (1927) A contribution to the mathematical theory of epidemics. Proceedings of the royal society of london. Series A, Containing papers of a mathematical and physical character 115 (772), pp. 700–721. Cited by: §III.2.2.
  • [35] Z. Khoo, K. H. Lee, Z. Huang, and S. Bressan (2021)

    Neural Ordinary Differential Equations for the Regression of Macroeconomics Data Under the Green Solow Model

    In Database and Expert Systems Applications, C. Strauss, G. Kotsis, A. M. Tjoa, and I. Khalil (Eds.), Lecture Notes in Computer Science, Cham, pp. 78–90. External Links: ISBN 978-3-030-86472-9, Document Cited by: §B.6.2, §III.2.3.
  • [36] J. Kleinberg, J. Ludwig, S. Mullainathan, and Z. Obermeyer (2015-05) Prediction Policy Problems. American Economic Review 105 (5), pp. 491–495. External Links: ISSN 0002-8282, Link, Document Cited by: §IV.
  • [37] P. Kumar and A. Sinha (2021-12) Information diffusion modeling and analysis for socially interacting networks. Social Network Analysis and Mining 11 (1), pp. 11. External Links: ISSN 1869-5450, 1869-5469, Link, Document Cited by: §III.2.2.
  • [38] J. Leskovec, J. Kleinberg, and C. Faloutsos (2005) Graphs over time: densification laws, shrinking diameters and possible explanations. In Proceeding of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining - KDD ’05, Chicago, Illinois, USA, pp. 177. External Links: ISBN 978-1-59593-135-1, Link, Document Cited by: §B.1, §III.1.1.
  • [39] J. Leskovec and A. Krevl (2014-06) SNAP Datasets: Stanford large network dataset collection. Note: Cited by: Figure 2.
  • [40] A. J. Lotka (1910-03) Contribution to the Theory of Periodic Reactions. The Journal of Physical Chemistry 14 (3), pp. 271–274. External Links: ISSN 0092-7325, 1541-5740, Link, Document Cited by: §III.2.1.
  • [41] D. A. MacLULICH (1937) Fluctuations in the Numbers of the Varying Hare (Lepus Americanus). University of Toronto Press. External Links: ISBN 978-1-4875-8178-7, Link Cited by: §B.4.2, §III.2.1.
  • [42] A. Maddison (2017-11) Maddison Database 2010. External Links: Link Cited by: §III.2.3.
  • [43] O. Malcai, O. Biham, P. Richmond, and S. Solomon (2002-09) Theoretical analysis and simulations of the generalized Lotka-Volterra model. Physical Review E 66 (3), pp. 031102. Note: Publisher: American Physical Society External Links: Link, Document Cited by: §III.2.1.
  • [44] B. T. Martin, S. B. Munch, and A. M. Hein (2018-05) Reverse-engineering ecological theory from data. Proceedings of the Royal Society B: Biological Sciences 285 (1878), pp. 20180422. External Links: ISSN 0962-8452, Link, Document Cited by: §I, §III.2.1.
  • [45] J. A. Nelder and R. Mead (1965-01) A Simplex Method for Function Minimization. The Computer Journal 7 (4), pp. 308–313. External Links: ISSN 0010-4620, 1460-2067, Link, Document Cited by: §IV.
  • [46] J. A. Nelder and R. W. M. Wedderburn (1972) Generalized Linear Models. Journal of the Royal Statistical Society. Series A (General) 135 (3), pp. 370. External Links: ISSN 00359238, Link, Document Cited by: §IV.
  • [47] P. E. Oliver (1993-08) Formal Models of Collective Action. Annual Review of Sociology 19 (1), pp. 271–300. External Links: ISSN 0360-0572, 1545-2115, Link, Document Cited by: §I.
  • [48] X. Pan, Md. K. Uddin, B. Ai, X. Pan, and U. Saima (2019-05) Influential factors of carbon emissions intensity in OECD countries: Evidence from symbolic regression. Journal of Cleaner Production 220, pp. 1194–1201. External Links: ISSN 09596526, Link, Document Cited by: §I.
  • [49] R. A. Rossi and N. K. Ahmed (2015) The network data repository with interactive graph analytics and visualization. In AAAI, External Links: Link Cited by: §B.2.2.
  • [50] L. I. Rudin, S. Osher, and E. Fatemi (1992-11) Nonlinear total variation based noise removal algorithms. Physica D: Nonlinear Phenomena 60 (1-4), pp. 259–268. External Links: ISSN 01672789, Link, Document Cited by: §A.2.
  • [51] M. Schmidt and H. Lipson (2009-04) Distilling Free-Form Natural Laws from Experimental Data. Science 324 (5923), pp. 81–85. External Links: ISSN 0036-8075, 1095-9203, Link, Document Cited by: §IV.
  • [52] R. M. Solow (1956-02) A Contribution to the Theory of Economic Growth. The Quarterly Journal of Economics 70 (1), pp. 65. External Links: ISSN 00335533, Link, Document Cited by: Figure 1, §I, §I, §III.1.2, §III.2.3.
  • [53] N. Chr. Stenseth, W. Falck, O. N. Bjørnstad, and C. J. Krebs (1997-05) Population regulation in snowshoe hare and Canadian lynx: Asymmetric food web configurations between hare and lynx. Proceedings of the National Academy of Sciences 94 (10), pp. 5147–5152. External Links: ISSN 0027-8424, 1091-6490, Link, Document Cited by: §B.4.2.
  • [54] T. W. Swan (1956-11) Economic Growth and Capital Accumulation. Economic Record 32 (2), pp. 334–361. External Links: ISSN 0013-0249, 1475-4932, Link, Document Cited by: §I, §III.1.2.
  • [55] P. Truscott and M. F. Korns (2014) Explaining Unemployment Rates with Symbolic Regression. In Genetic Programming Theory and Practice XI, R. Riolo, J. H. Moore, and M. Kotanchek (Eds.), pp. 119–135. Note: Series Title: Genetic and Evolutionary Computation External Links: ISBN 978-1-4939-0374-0 978-1-4939-0375-7, Link, Document Cited by: §I.
  • [56] S. Udrescu and M. Tegmark (2020-04) AI Feynman: A physics-inspired method for symbolic regression. Science Advances 6 (16), pp. eaay2631. Note: Publisher: American Association for the Advancement of Science Section: Research Article External Links: ISSN 2375-2548, Link, Document Cited by: §I, §III.1.2.
  • [57] P. van den Driessche (2017-08) Reproduction numbers of infectious disease models. Infectious Disease Modelling 2 (3), pp. 288–303. External Links: ISSN 24680427, Link, Document Cited by: §III.2.2.
  • [58] Various (2009) Wikipedia database dump. Note: Version from 2009-03-06. External Links: Link Cited by: §B.2.2, §III.1.1.
  • [59] Y. Wang, N. Wagner, and J. M. Rondinelli (2019-09) Symbolic regression in materials science. MRS Communications 9 (3), pp. 793–805. Note: Publisher: Cambridge University Press External Links: ISSN 2159-6859, 2159-6867 Cited by: §I.
  • [60] J. Woo and H. Chen (2016-12) Epidemic model for information diffusion in web forums: experiments in marketing exchange and political dialog. SpringerPlus 5 (1), pp. 66. External Links: ISSN 2193-1801, Link, Document Cited by: §III.2.2.
  • [61] World Development Indicators | DataBank. External Links: Link Cited by: §III.2.3.
  • [62] G. Yang, X. Li, J. Wang, L. Lian, and T. Ma (2015-07) Modeling oil production based on symbolic regression. Energy Policy 82, pp. 48–61. External Links: ISSN 03014215, Link, Document Cited by: §I.
  • [63] A. Zellner (1984) Basic issues in econometrics. University of Chicago Press, Chicago. External Links: ISBN 978-0-226-97983-0 Cited by: §II.1.

Appendix A OccamNet

OccamNet works by representing a probability distribution over a space of user-specified basis functions. During training, OccamNet samples functions from its probability distribution, evaluates their effectiveness at fitting the given data, and updates its probability distribution so as to weight its probability toward better-fitting solutions. Specifically, OccamNet consists of alternating Activation and T-Softmax

Layers which together represent a probability distribution over a set of functions. The Activation Layers consist of user-specified basis functions, the building blocks out of which OccamNet constructs symbolic expressions. The T-Softmax Layers represent the probabilities of connecting an Activation Layer’s outputs to the next layer, which is either another Activation Layer or the output layer. To generate trial functions, OccamNet samples from the T-Softmax Layer probabilities, obtaining a set of connections that represent a trial function: the input variables are traced along the generated graph and transformed by the basis functions along the path. Because the user specifies the bases included in each Activation Layer as well as other properties of the network such as the number of layers, OccamNet allows for a highly flexible user-specified search space. An example set of specified basis functions may be given by


where , are either variables or subexpressions and where is a placeholder for a constant parameter that is tuned via gradient descent. The user may also define custom bases that are relevant to the data.

Additionally, the user may specify the maximum depth of the symbolic expression by stacking layers of . The depth of an expression is given by the maximum number of nested basis functions. For instance, indicates a maximum expression depth of 2, which allows for expressions such as but not (having a depth of 3). Moreover, each basis layer can contain a unique set of bases if desired.

OccamNet’s architecture leads to a number of advantages over other symbolic regression methods:

  • OccamNet allows the user to take full advantage of existing knowledge by encoding inductive biases into the network. For example, if the user expects the expression to be a sum of distinct terms, they can set the final activation layer to include only addition bases. In many Symbolic Regression architectures, such inductive biases cannot be directly incorporated into the architecture.

  • OccamNet’s training procedure allows for a non-differentable fitness term, thereby allowing the user to include discrete factors in the fitness, such as whether the units match or the number of constants used. Many neural approaches to Symbolic Regression cannot handle such factors.

  • OccamNet is highly efficient and scales on a GPU, something that most symbolic regression methods cannot offer. This allows the user to find better fits in the same time as many other methods.

  • OccamNet is a neural network that can be made fully differentiable, thereby enabling it to be combined with other neural architectures. Although we do not test this in our paper, OccamNet could, for example, be used to synthesize programs or to perform regression with image or text data. Thus, OccamNet can handle a broader range of potential Symbolic Regression tasks than many other approaches.

a.1 Regularization terms

OccamNet uses two terms for complexity regularization: Activation Regularization and Constant Regularization. Let

denote the number of activation functions

and let denote the number of constants. The complexity regularization terms are thus given by


where and are scalar weights for the respective regularization terms. OccamNet subtracts these terms from the unregularized fitness of a given expression . The weight parameters and allow the user to effectively customize the complexity of the resulting expression.

OccamNet also uses Unit Regularization when indicated. For a dataset with variables, the user may specify an array of exponents , where is the number of unique units across the dataset, such that each entry contains the exponent of a unit for variable . For example, if the th variable has units of dollars/year, the th row in could be represented by the vector where the first entry corresponds to dollars and the second entry corresponds to years.

OccamNet then feeds these units through the sampled function. Each basis function receives a set of variables with units, may have requirements on those units for them to be consistent, and returns a new set of units. For example, receives one variable which it requires to have units and returns the units . Similarly, takes two variables with units that it requires to be equal and returns the same units. Using these rules, OccamNet propagates units through the function until it obtains units for the output. If at any point the input units for a basis function do not meet that basis’s requirements, that basis returns Any basis functions that receive also return . Finally, if the output units of do not match the desired output units (including if outputs ), OccamNet marks as not preserving units.

The multiplication by a constant basis function, , can produce any output units. As such, it returns If any basis function receives it will either return if it has no constraints on the input units, or it will treat the as being the units required to meet the basis function’s consistency conditions. For example, if the function receives , it will treat the input as and if the function receives and it will return .

After sampling functions from OccamNet, we determine which functions do not preserve units. Because we wish to avoid these functions entirely, we bypass evaluating their normal fitness (thereby saving compute time) and instead assign a fitness of where

is a user-specified hyperparameter.

a.2 Differential equation fitting

The target variables for differential equation fitting are computed numerically from time-series data. In particular, we use the central difference formula which accounts for second-order error in the derivative approximation. We use this formula to compute the centered derivative of each variable (column) in the input data to get .

When the time-series data is sufficiently noisy, it is helpful to counter differentiation error by applying pre-processing techniques such as total variance regularization

[50] or the hard thresholding of singular values [26] as discussed in reference [9]. Alternatively, may be filtered or regularized via frequency-based methods or polynomial interpolation prior to numerical differentiation.

Appendix B Case studies

Unless otherwise specified, all of the experiments are run with the following hyperparameters.

Epochs Learning rate Constant learning rate Decay rate Activation regularization weight Constant regularization weight
1000 5 0.05 1 0 0
Table 1: Default OccamNet hyperparameters

For each experiment, we perform a grid-search of values for the remaining hyperparameters, which are described in the original OccamNet paper [16]. We search over the following parameter values and select the results with the lowest error:

b.1 HEP-TH citation densification

We reconstruct the densification law of the citation graph of high energy physics preprints on arXiv (HEP-TH) from January 1993 to April 2003, with 29555 papers and 352807 edges [27]. For each month starting from January 1993, we aggregate the number of edges formed up to the end of . We then transform the data as and to replicate the methodology of Leskovec et al. [38] in fitting a line on the log-log scale.

Output Standard deviation Top number Equalization
5 5 5
Table 2: OccamNet hyperparameters for HEP-TH densification

Basis library:

b.2 Degree distributions

b.2.1 HEP-TH degree distribution

Using the same dataset as above, we fit the degree distribution of the citation graph at the last available time period. To mitigate the effects of the long tail of nodes with high degrees, we only include nodes with degree less than 200. We additionally remove the data points for degree 0 (papers with no citations) and degree 1. Finally, we fit the distribution in log-log space.

Output Standard deviation Top number Equalization Activation regularization weight Constant regularization weight
5 10 0 0.05 0.05
Table 3: OccamNet hyperparameters for HEP-TH degree distribution

Basis library:

b.2.2 Wikipedia hyperlink degree distribution

We fit the degree distribution of the graph of Wikipedia hyperlinks [58] sourced from the Network Repository [49, 28]. The graph consists of nodes representing individual articles in the English Wikipedia with directed edges indicating wikilinks from one article to another. We pre-process the distribution similarly to the HEP-TH citation graph, discarding nodes with degree 0 and degree less greater than 200. We fit the distribution in log-log space using the same parameters as for the HEP-TH degree distribution above.

Output Standard deviation Top number Equalization Activation regularization weight Constant regularization weight
5 10 0 0.05 0.05
Table 4: OccamNet hyperparameters for Wikipedia hyperlink degree distribution

Basis library:

b.3 Cobb-Douglas production

The dataset used by Charles Cobb and Paul Douglas to empirically test the Cobb-Douglas production function contains annual capital (K), labor (L), and output (Y) data for the US manufacturing sector from 1899-1922 [14]. We sweep across several values of the activation regularization weight to explore the complexity-accuracy trade-off of this dataset.

Output Standard deviation Top number Equalization Activation regularization weight
50 10 5 0
50 10 5 0.1
50 10 5 0.2
50 10 5 0.3
Table 5: OccamNet hyperparameters for US manufacturing data with default bases

Basis library:

We then restrict the bases to the bare-minimum needed to construct the Cobb-Douglas function.

Output Standard deviation Top number Equalization Activation regularization weight
5 5 5 0
Table 6: OccamNet hyperparameters for US manufacturing data with restricted bases

Restricted basis library:

b.4 Predator-prey dynamics

b.4.1 Synthetic Lotka-Volterra model

We simulate a system of Lotka-Volterra equations as defined in (5) with synthetic parameters and populations and . The simulation spans 200 time steps with initial conditions such that it contains 2 periods of the predator-prey cycle. The target ODE system is defined to be


We then separately fit and since the system is decoupled.

Output Standard deviation Top number Equalization
5 1 5
5 1 5
Table 7: OccamNet hyperparameters for synthetic Lotka-Volterra model

Basis library:

b.4.2 Lynx-hare population

One of the first datasets supporting the Lotka-Volterra model was the Hudson Bay Company’s data on the lynx and hare population in Canada from 1845-1935 [41]. The dataset is compiled from annual fur records on lynx and hare across 10 regions of boreal Canada, with the entries prior to 1903 being derived from questionnaires [53]. Specifically, we use the subset of records from 1900-1920. Given the sparsity of the data, we utilize cubic spline interpolation to generate 100 samples at equidistant time-steps spanning the 20-year range. This allows us to compute a smoother time-series when numerically differentiating the data.

Output Standard deviation Top number Equalization
5 1 0
5 10 1
Table 8: OccamNet hyperparameters for Lynx-Hare population data

Basis library:

b.5 Compartmental models in epidemiology

b.5.1 Synthetic SIR model

We simulate a synthetic time-series using the SIR model with initial conditions of for 60 time steps. The target model is defined to be


We then separately fit , , and .

Output Standard deviation Top number Equalization
5 5 5
5 5 0
5 10 5
Table 9: OccamNet hyperparameters for synthetic SIR model

Basis library:

b.5.2 Measles spread

We apply OccamNet’s ensemble learning capability to a panel dataset for measles infection data in the UK from 1948-1967. The measles cycle is mostly biennial, which allows us to split the data into 2-year cycles for ensemble fitting. Since there is no data on susceptible individuals, we follow the methodology outlined by Horrocks and Bauch [32] to generate synthetic data for . We also replicate their data pre-processing by applying a Savitzky-Golay filter with a window length of 19 and polynomial order of 3 to smooth out . As an additional pre-processing step, we scale the derivative by a multiplicative factor of in order to improve OccamNet loss convergence on small regression outputs.

Dataset Output Standard deviation Top number Equalization
1952-1953 5 5 1
1954-1955 5 5 5
1956-1957 5 1 1
1958-1959 5 1 1
1960-1961 5 1 5
Ensemble 5 5 1
Table 10: OccamNet hyperparameters for measles data

Basis library:

b.6 Economic growth

b.6.1 Synthetic Solow-Swan model

Twenty synthetic “country” datasets are generated by simulating the following equations for 30 time steps:


For each of the 20 datasets, we randomly sample the constant parameters , , and as well as the initial conditions for the ODE system from the uniform distributions , , , , , and . The initial conditions for output are computed using the Cobb-Douglas-based assumption . We use OccamNet to fit which is computed using the centered derivative.

Output Standard deviation Top number Equalization
0.5 5 1
Table 11: OccamNet hyperparameters for synthetic Solow-Swan model

Basis library:

Unit checker: for units [USD (millions), capita]

b.6.2 OECD country capital growth

We use macroeconomic data compiled by Khoo et al. [35] to attempt to recover the Solow model of economic growth. Eighteen of the original 20 OECD countries are present in this dataset: Austria, Belgium, Canada, Denmark, France, Germany, Greece, Ireland, Italy, Netherlands, Norway, Portugal, Spain, Sweden, Switzerland, Turkey, United Kingdom, and United States. The time series , , , for each country have varying starting points as well as varying lengths ranging from 4 years to 39 years.

Output Standard deviation Top number Equalization Activation regularization weight Constant regularization weight
0.1 1 0 0 0
0.1 1 0 0 10
0.1 1 0 1 10
Table 12: OccamNet hyperparameters for OECD macroeconomic data

Basis library:

Unit checker: for units [USD (millions), capita]