Log In Sign Up

Augmenting astrophysical scaling relations with machine learning : application to reducing the SZ flux-mass scatter

by   Digvijay Wadekar, et al.

Complex systems (stars, supernovae, galaxies, and clusters) often exhibit low scatter relations between observable properties (e.g., luminosity, velocity dispersion, oscillation period, temperature). These scaling relations can illuminate the underlying physics and can provide observational tools for estimating masses and distances. Machine learning can provide a systematic way to search for new scaling relations (or for simple extensions to existing relations) in abstract high-dimensional parameter spaces. We use a machine learning tool called symbolic regression (SR), which models the patterns in a given dataset in the form of analytic equations. We focus on the Sunyaev-Zeldovich flux-cluster mass relation (Y_SZ-M), the scatter in which affects inference of cosmological parameters from cluster abundance data. Using SR on the data from the IllustrisTNG hydrodynamical simulation, we find a new proxy for cluster mass which combines Y_SZ and concentration of ionized gas (c_gas): M ∝ Y_conc^3/5≡ Y_SZ^3/5 (1-A c_gas). Y_conc reduces the scatter in the predicted M by ∼ 20-30 large clusters (M≳ 10^14 h^-1 M_⊙) at both high and low redshifts, as compared to using just Y_SZ. We show that the dependence on c_gas is linked to cores of clusters exhibiting larger scatter than their outskirts. Finally, we test Y_conc on clusters from simulations of the CAMELS project and show that Y_conc is robust against variations in cosmology, astrophysics, subgrid physics, and cosmic variance. Our results and methodology can be useful for accurate multiwavelength cluster mass estimation from current and upcoming CMB and X-ray surveys like ACT, SO, SPT, eROSITA and CMB-S4.


page 1

page 2

page 3

page 4


Finding universal relations in subhalo properties with artificial intelligence

We use a generic formalism designed to search for relations in high-dime...

A Preferential Attachment Model for the Stellar Initial Mass Function

Accurate specification of a likelihood function is becoming increasingly...

Mass Estimation of Galaxy Clusters with Deep Learning I: Sunyaev-Zel'dovich Effect

We present a new application of deep learning to infer the masses of gal...

The cumulative mass profile of the Milky Way as determined by globular cluster kinematics from Gaia DR2

We present new mass estimates and cumulative mass profiles (CMPs) with B...

Clusterplot: High-dimensional Cluster Visualization

We present Clusterplot, a multi-class high-dimensional data visualizatio...

Pooled variable scaling for cluster analysis

We propose a new approach for scaling prior to cluster analysis based on...

1 Introduction

Astrophysical scaling relations are simple low-scatter relationships (generally power laws) between properties of astrophysical systems which hold over a wide range of parameter values. Some notable examples of such relations are the Phillips relation for supernovae (Phillips, 1993), the Color-Magnitude Relation, the Leavitt period luminosity relation for Cepheids (Leavitt & Pickering, 1912; Riess et al., 2016, 2019), the Tully Fisher relation (Tully & Fisher, 1977) and its baryonic analogue (McGaugh et al., 2000) for spiral galaxies, the Faber Jackson relation (Faber & Jackson, 1976), the Kormendy relation or the more general fundamental plane relation (Djorgovski & Davis, 1987; Dressler et al., 1987; Jorgensen et al., 1996; Sheth & Bernardi, 2012) for ellipticals, the Kennicutt–Schmidt law (Schmidt, 1959; Kennicutt, 1989), the black hole-bulge mass relation (Kormendy & Ho, 2013), among others. These relations have many applications, such as determining distances to galaxies or masses of clusters for performing cosmological measurements; inferring properties of massive black holes, galaxies and galaxy clusters; providing insights into galaxy formation and evolution; and so on. Note that many of these relations have been discovered phenomenologically—often by trial and error—from observational data/simulations, rather than being derived from first principles 111It is interesting to mention that, in some areas of physics, discovery of empirical relations has sometimes led to deep theoretical insights—take Kepler’s laws giving inspiration to Newtonian mechanics, or the Planck equation (also an empirical function fit) aiding the development of Quantum Mechanics..

Most of the scaling relations found in astrophysics till now are power-law relations which involve only two variables. A reason for this could just be that it is easy to visually identify two-parameter relations in a dataset. There could exist many low-scatter relations with three or more variables in existing data which have been overlooked as it can be tedious to identify such relations with manual data analysis. For instance, some of the popular two-parameter relationships were later shown to extend to three dimensions only by a more detailed subsequent analysis, e.g. the fundamental plane relationship. One of the traditional approaches to identify a high-dimensional hypersurface in a dataset is by looking at various 2D projection plots, see e.g. Hopkins et al. (2007). This approach, however, becomes increasingly difficult and time consuming with larger datasets.

Machine learning (ML) tools can provide a faster and a more systematic approach to search for low-scatter relationships in abstract high-dimensional parameter spaces. ML tools are increasingly useful as datasets available in astrophysics continue to grow in size due to advent of high-precision multi-wavelength surveys. A particularly useful ML tool to search for new scaling relations, or to find extensions to existing ones, is symbolic regression (SR). SR identifies equations with parsimonious combinations of input parameters that have the smallest scatter with the given quantity of interest.

SR, also known as automated equation discovery, has been studied for decades in the context of scientific discovery, including early work creating the “BACON” algorithm (Langley, 1977) and its later implementations including COPER (Kokar, 1986) and FAHRENHEIT/EF (Langley & Zytkow, 1989; Zembowicz & Żytkow, 1992). More recent work by Bongard & Lipson (2007); Schmidt & Lipson (2009) popularized SR for science, and introduced the software package Eureqa, which is a powerful (but proprietary) library still in use today. This preceded significant interest from the ML community in advancing fundamental search techniques, including Sahoo et al. (2018); Kusner et al. (2017); Brunton et al. (2016a); Lusch et al. (2018); Lange et al. (2020); Both et al. (2019); Rackauckas et al. (2020); Guimerà et al. (2020); Virgolin et al. (2021); Brunton et al. (2016b); Champion et al. (2019); Cranmer et al. (2020); Cranmer (2020); Cranmer et al. (2019). In parallel, these algorithms have been applied to a range of scientific problems, such as Delgado et al. (2021); Wadekar et al. (2020); Graham et al. (2013, 2012); Shao et al. (2021); Liu & Tegmark (2020); Wilstrup & Kasak (2021); Lemos et al. (2022); Butter et al. (2021); Gilpin (2021); Cranmer et al. (2020, 2021b, 2021a); Craven et al. (2021); Werner et al. (2021).

Figure 1: Various aspects of the trade-offs between machine learning (ML) techniques. Symbolic regression can robustly be applied to datasets with only 10000 data points, each with

10 parameters. On the other hand, it can provide analytic equations that are readily interpretable and generalizable. We first use a decision-tree based approach called random forest regressor to narrow down the set of parameters that impact the scatter in the

- relation. We then implement symbolic regression to find an analytic form for a cluster mass proxy using the pre-selected parameters.

In order to put SR in context, we illustrate tradeoffs in available ML tools along various dimensions in Fig. 1

. Deep learning tools like neural networks can handle very high dimensional inputs and large datasets, but are the least interpretable. SR lies on the opposite side of this spectrum: as of today, SR can be applied to datasets with only

10,000 data points, each with 10 parameters. One must therefore simplify the problem or at times subsample the data in order to use SR on it. We follow the approach of Wadekar et al. (2020), where we first reduce the dimensionality of our dataset using a decision-tree based approach called a random forest regressor and then apply SR on it. Using the minimum set of relevant variables as input to SR is important to speed up its search for optimal equations.

We will focus on applying SR to find accurate expressions that relate properties of galaxy clusters to their masses. Galaxy clusters are the most massive bound structures in the Universe and their abundance as a function of mass is a very sensitive probe of cosmology (Planck Collaboration et al., 2013; Ade et al., 2016a, b; Hasselfield et al., 2013; Hilton et al., 2021; Bocquet et al., 2015, 2019; Palmese et al., 2020). In the 2020s, many ongoing and upcoming surveys (e.g., Rubin observatory, DES, DESI, ACT, eROSITA, SO, CMB-S4) will provide a wealth of multi-wavelength data on clusters. If we can obtain robust mass estimates for these clusters from this data, we will be able to put very strong constraints on the nature of dark energy and neutrino masses (Allen et al., 2011; Sehgal et al., 2011; Planck Collaboration et al., 2016; Bocquet et al., 2019). Cluster masses are typically inferred from properties easily measurable in observational surveys. For example, CMB surveys use the integrated electron pressure () via the mass-observable power-law relationship222In practice, the power-law exponent is calibrated with observational data, however, the actual fitted values are fairly close to 3/5, which is the prediction from a self-similar model, see section 3 for further details.: (the observable properties thus used are referred to as ‘mass proxies’). The scatter in these relationships affects the accuracy to which the masses—and thereby the cosmological parameters—can be inferred. Therefore, an important property of a mass proxy is that the scatter in its relation with mass should be small.

A combination of observable properties (sometimes measured in different surveys) could sometimes provide a lower scatter mass proxy. For example, X-ray studies show that the product of gas mass, and gas temperature, , provides a lower scatter proxy than X-ray Luminosity, gas mass or temperature: (see Kravtsov et al. (2006)

). Recently, it has become possible to measure numerous properties of clusters: cluster electron pressure with SZ surveys, gas density and temperature profiles with X-ray surveys, density profiles with weak lensing surveys, spectra and color of galaxies in optical surveys, and diffuse synchrotron flux in radio surveys. In order to construct an optimal mass proxy from these, one encounters the curse of dimensionality:

which particular properties in this large set to combine together? what functional form should be used to fit the combination?

ML methods can be useful for such problems. It is worth mentioning that there have been many recent ML motivated approaches to estimate cluster masses: (Cohn & Battaglia, 2020; Ntampaka et al., 2015, 2019; Ho et al., 2019; Kodi Ramanah et al., 2020b; Cranmer et al., 2020; Kodi Ramanah et al., 2021; Gupta & Reichardt, 2020a, b; Su et al., 2020; Yan et al., 2020; Villanueva-Domingo et al., 2021). Our goal in this paper is to model by approximating the following function


with ML tools like random forests and symbolic regressors. is the set of various observable properties from multi-wavelength cluster surveys (e.g., gas mass, gas profile, richness, galaxy colors). As clusters are non-linear objects, there are no obvious first principles predictions for which properties in should contribute. Furthermore, the high dimensionality of makes this a complex and challenging problem for traditional methods.

The paper is organized as follows. In Section 2, we briefly describe the cluster data that we use from various hydrodynamical simulations. In Section 3, we present an overview of mass proxies. We then discuss an overview of our ML techniques in Section 4 and show the results for cluster mass prediction in Section 5. We describe our reasoning behind using cluster concentration in Section 6, and we conclude in Section 7.

2 Cluster data and properties

In this section, we provide a brief description of the cluster data that we employ in our analysis. We use the TNG300-1 simulation (hereafter TNG300) produced by the IllustrisTNG collaboration (Nelson et al., 2019; Pillepich et al., 2018a; Springel et al., 2018; Nelson et al., 2018; Naiman et al., 2018; Marinacci et al., 2018; Pillepich et al., 2018b; Weinberger et al., 2017)333IllustrisTNG:, which is run with the moving mesh AREPO code (Springel, 2010; Weinberger et al., 2020). We use the cluster samples from two different snapshots at redshifts in our study.

We also use clusters from the CAMELS suite of simulations (Villaescusa-Navarro et al., 2021a; Villaescusa-Navarro et al., 2021b)444CAMELS:, which consists of more than 2,000 hydrodynamic simulations (each simulation box has length 25 ) run with different baryonic feedback and cosmological parameters, and with varying initial random seeds. CAMELS contain two distinct simulation suites, depending on the code used to solve the hydrodynamic equations and the subgrid model implemented: CAMELS-SIMBA, based on the GIZMO code (Hopkins, 2015, 2017) employing the same sub-grid model as the flagship SIMBA simulation (Davé et al., 2019); CAMELS-TNG, based on the AREPO code employing the same sub-grid model as the flagship IllustrisTNG simulations. Let us provide one example to highlight the substantial differences in these models: AGN feedback is implemented considering Bondi accretion and spherical symmetry in IllustrisTNG (Weinberger et al., 2018); while SIMBA implements gravitational torque accretion of cold gas and collimated outflows and jets from AGN (Anglés-Alcázar et al., 2017). We use clusters in the snapshots of the latin hypercube set for our analysis. (See Villaescusa-Navarro et al. (2021a) for further details on the CAMELS simulations.)

For all the simulations, we choose the centers of clusters to be the locations of the minimum gravitational potential within the FOF (friends of friends) volume. We will use the 3D data of clusters in this paper (in reality, however, projected properties, instead of 3D, are measured in surveys; we will test our results for that case in a future study). We use the boundary to define the cluster radii555 is the radius enclosing an overdensity with respect to the critical density of the Universe. throughout this paper. is the mass of all the particles (dark matter, gas, stars and black holes) within of the center of the halo. Let us now discuss the cluster properties we use in our study.

(i) Integrated electron pressure: CMB photons are scattered by high energy electrons in the plasma inside clusters due to inverse Compton scattering. This phenomenon is knows as the thermal Sunyaev-Zeldovich (tSZ) effect and it induces a shift in the energy of the scattered CMB photons (Sunyaev & Zeldovich, 1970). Such a shift is typically parameterized by the integrated Compton- parameter () and can be directly measured in SZ surveys. We measure a 3D analogue of it in simulations, as given by


where is the Thomson cross section, is the electron mass, is the electron pressure and is the speed of light. Note that we use the group_particles code666 to obtain (and most other properties mentioned in this section) from the simulation data.

(ii) Ionized gas mass: We calculate the cluster ionized gas mass () as


where is the free electron number density profile, is the primordial neutral hydrogen fraction, and is the proton mass. Note that we derive from the electron density profile of a cluster in order to mimic the measurements from X-ray surveys (where is derived by de-projecting of X-ray surface brightness profiles (Voevodkin & Vikhlinin, 2004; Croston et al., 2006)).

(iii) Cluster concentration: We use different versions of the cluster concentration in this paper. For the main results, we use concentration corresponding to the gas profile: . We also perform additional cross-checks using the concentration obtained by fitting an NFW profile to the halos. In particular, we use ( is the virial radius and is the Klypin scale radius (Klypin et al., 2011) corresponding to the largest subhalo in the halo) measurements by Gabrielpillai et al. (2021), which were obtained by running the Rockstar code (Behroozi et al., 2013) on the TNG300 halos.

(iv) Stellar mass: We calculate by summing over of the masses of all the star particles within . Note that this quantity represents thus the total stellar mass in the cluster, not the stellar mass of the central galaxy.

(v) Cluster triaxiality:

We generally expect clusters to be triaxial since they are formed by accretion along filaments that impose tidal gravitational forces upon the forming clusters. We first calculate the moment of inertia tensor using


where is the coordinate of the center-of-mass of the cluster and is the particle mass (we only use the particles within  of the cluster center in our calculations). We calculate in two different ways: first, using all particle types (gas+stars+DM+black holes); second, using only the gas particles. We then calculate the triaxiality of the cluster as where

are eigenvalues of

ordered as . We also check our results with a different definition of triaxiality: .

(vi) Cluster richness: The richness of a cluster is the number of galaxies associated with it. We select the galaxies using the threshold and by requiring the centers of the galaxies to be within  of the cluster center. At , this threshold yields a number density of galaxies in the simulation sample of .

3 Mass proxies

Figure 2: - scaling relation in clusters from the TNG300 simulation at ( [] is the integrated Compton- parameter [cluster mass] within ). The self-similar power-law scaling relation normalized to the most massive halos is shown by the dotted green line. The goal of this paper is to improve this scaling relation in order to reduce its scatter and infer cluster masses more accurately.

Simple models of clusters based on the virial theorem (which assumes that the only source of energy input into the intra-cluster medium is gravitational) predict nearly self-similar relations between halo mass and various dynamic properties (Kaiser, 1986; Kravtsov & Borgani, 2012). For example, the scaling relation between cluster masses and temperature is given by


where for a flat Universe. Note that the temperature also depends on the value of (the overdensity with respect to the critical density of the Universe used for defining the cluster boundary); we have absorbed this dependence under the proportionality sign. The scaling relation for the gas mass of a cluster is simply . Using Eqs. 2 and 5, one can write a scaling relation for the integrated Compton parameter given by


Scaling relations like these help in determining various possible proxies of cluster mass (e.g. ).

In addition to being motivated by idealized scaling relations, a mass proxy should have additional properties: Robustness: it should be largely insensitive to limitations in our understanding of clusters, baryonic feedback effects, or their merger history, Accuracy: it should have small and well-characterized scatter in the relation with mass, and Low cost: it should be observationally inexpensive in order to be applied for mass prediction of thousands of clusters.

satisfies all the aforementioned requirements. Electron pressure is relatively insensitive to the dynamical history of a cluster (Motl et al., 2005; Nagai, 2006). for clusters is also remarkably insensitive to baryonic physics like AGN feedback or radiative cooling (Battaglia et al. (2012); Stanek et al. (2010); Arnaud et al. (2010); Fabjan et al. (2011))). The relation can be calibrated using two types of gravitational lensing measurements: CMB lensing measurements (which offer the advantage of a very well determined distance to the source plane) (Hu et al., 2007; Baxter et al., 2015; Geach & Peacock, 2017; Madhavacheril et al., 2020), and optical weak lensing surveys (which provide higher S/N measurements for individual clusters) (Hoekstra et al., 2013; von der Linden et al., 2014; Battaglia et al., 2016; Medezinski et al., 2018; Schrabback et al., 2018; Miyatake et al., 2019). Analogues of have therefore been used for cluster mass estimation in CMB surveys like Planck (Planck Collaboration et al., 2013; Ade et al., 2016a, b), ACT (Hasselfield et al., 2013; Hilton et al., 2021), and SPT (Bocquet et al., 2015, 2019). An analogue of called is also used in X-ray surveys for mass estimation (Kravtsov et al., 2006; Arnaud et al., 2010). For a comprehensive review of the relation, see Battaglia et al. (2012).)

We show the relation for TNG300 clusters in Figure 2 (see Eq. 2 for the definition of ). For comparison, we also show the performance of other mass proxies like and cluster richness in Fig. 11, and leave their further discussion to Appendix A.1. For a large region of parameter space in Fig. 2, the clusters closely follow the self-similar scaling relation777A perceptive reader would notice that there is a deviation/break from the power law relation in Figure 2 for low mass clusters. This is because gas in the cluster gets ejected at low masses since the gravitational potential wells are comparatively shallower (Lovell et al., 2018; Hill et al., 2018; Le Brun et al., 2015; Greco et al., 2015; Pandey et al., 2021). We only focus on high mass clusters in this paper as only those are typically used in cosmological analyses. We leave modeling the deviations from self-similarity to a separate upcoming paper (Wadekar et al., 2021). with low scatter. Reducing the scatter further is imperative as the uncertainty in the mass-observable relation is currently the largest systematic uncertainty in cosmological analyses of galaxy clusters.

As we can see from Figure 2, is a very good first approximation; we therefore train our ML models to approximate the following function based on the residuals:


In this way, we incorporate the domain knowledge (in our case the already well-established leading-order cluster physics) and use ML only to learn extensions to it.

Figure 3: Results from prediction of cluster masses with a random forest regressor (RF). Scatter in the predicted mass using the traditional Y-M relation (RF) is in the top (middle) panel. The bottom panel shows the effect of different sets of input parameters on the mass prediction. is the concentration of gas from Eq. 10, is the NFW concentration, () is the stellar (gas) mass within . We do not compare the scatter for the high-mass end as there are very few halos available to calculate the scatter robustly. Overall, the RF improves mass prediction by % as compared to the traditional scaling relation method.
Figure 4: Same as Figure 3, but when the mass prediction is made using expressions from symbolic regression. The middle two panels show our two best results from Eq. 9 & Eq. 11, and the bottom panel compares the scatter in the predicted mass. We label the mass proxy in the second from top panel as , and it reduces the scatter by for . Introducing the term (1) effectively down-weights the cluster cores in comparison to their outskirts (the cluster cores are relatively much noisier). Overall, the improvement seems to get better with increasing masses, however more cluster data in the high mass end is needed to be certain.

4 Machine learning techniques

We now continue our discussion of machine learning (ML) techniques from Section 1. In figure 1, we had compared the ML techniques along two particular dimensions. Deep neural networks (DNNs) are on one extreme: they can work with very high dimensional datasets or datasets with large sizes. There also have been many interesting applications of DNNs to cosmology (see e.g., He et al. (2019); Wadekar et al. (2021); Zhang et al. (2019); Giusarma et al. (2019); Kreisch et al. (2021); Yip et al. (2019); Zamudio-Fernandez et al. (2019); Modi et al. (2018); Kodi Ramanah et al. (2020a); Tröster et al. (2019); Thiele et al. (2020); Cranmer et al. (2020, 2020, 2021a); Thiele et al. (2021); Li et al. (2020); Berger & Stein (2019); Horowitz et al. (2021); Villaescusa-Navarro et al. (2021c, d); Lu et al. (2021); Li et al. (2021); Ni et al. (2021)). However, DNNs are notoriously difficult to interpret due to the high-dimensional parameter space of the model (typically parameters). Furthermore, DNNs typically require very large datasets to train, whereas in our case, we only have 200 clusters with in the TNG300 sample. We therefore used the two techniques detailed below, both of which usually have better performance on small datasets.

4.1 Random forest

We use the random forest regressor (RF) from the publicly available package Scikit-Learn888Random forest: (Pedregosa et al., 2011; Breiman, 2001)

. RFs are relatively less expensive than DNNs; they are also much faster to train as typically very few hyperparameters need to be tuned. In order to check whether the results from the RF are robust to overfitting, we divide the data into two categories: we use a sub-sample containing

% of the clusters to train the RF, and the rest are used in testing the RF. We show the results from the test set later in section 5.1.

Figure 5: Same as Figure 4 but for halos in the CAMELS simulation suite instead of TNG300. As CAMELS includes variations in the baryonic feedback prescriptions in the hydrodynamic simulations, cosmological parameters and simulation initial seeds, the improvement upon using is robust against these changes. Note also for CAMELS-SIMBA that not only reduces the scatter, but also reduces the deviation from a power law for low .
Figure 6: Same as Figure 4 but when the cores of the clusters are excised from the calculation of the integrated electron pressure. We see a roughly similar scatter reduction as in Figure 4. Directly excising the cores in upcoming CMB surveys is difficult because of their low resolution, hence using is beneficial.
Figure 7: Dependence of scatter with radius in the electron pressure profile () of clusters. We use the clusters from TNG300 in the specified mass range and show the mean and the 1 region of the pressure profile scaled by the cluster mass. The profiles have a large scatter in the innermost regions (cores), while the outer regions (until ) are relatively well equilibrated.

4.2 Symbolic regression

Symbolic regression (SR) is a technique that approximates the relation between an input and an output through analytic mathematical formulae. The advantage of using SR over other machine learning regression models is that it provides analytic expressions which can be readily generalized, and which facilitate the understanding of the underlying physics. One of the downsides of SR, however, is that the dimensionality of the input space needs to be relatively small. To overcome this, we first use the RF to obtain an indication of which parameters in the set of in Eq. 7 give the most accurate . We then compress the set to include only the five most important parameters. Finally, we use SR on the compressed set to obtain an explicit functional form to approximate from Eq. 7

. We use the symbolic regressor based on genetic programming implemented in the publicly available

PySR package999PySR: (Cranmer, 2020; Cranmer et al., 2020). We leave further details on the RF and SR to Appendix B.

It is worth mentioning that SR has been used in various other astrophysical applications: modeling assembly bias (Wadekar et al., 2020; Delgado et al., 2021); inferring universal subhalo properties (Shao et al., 2021); modeling the concentration of dark matter from the mass distribution of nearby cosmic structures (Cranmer et al., 2020)

; finding analytic forms of the one-point probability distribution function for neutrino-density fluctuations

(Bernal et al., 2021); modeling the SFR density as a function of cosmological and astrophysical feedback parameters (Villaescusa-Navarro et al., 2021a).

5 Results for - scatter

Figure 8: Top: Dependence of the - relation on gas concentration (), NFW concentration () and stellar to gas mass ratio (. The dashed lines show the mean of the scatter. Higher (or ) is related to increase in the density of the ionized gas and can be a result of more radiative cooling (which in turn increases ). Higher implies more gas being converted into stars, and can therefore be associated with a decrease in . Bottom: Using instead of , for which the mean trends are comparatively weaker.

In this section, we compare the performance of the ML methods against using the standard - relation by calculating the scatter in the predicted mass:


for each mass bin , which contains clusters (we used uniformly-spaced bins in log-space). We will focus on clusters with in the rest of the paper as low-mass clusters are not generally used for probing cosmology (they are too affected by astrophysical effects to be used for cosmological studies and also have have SZ and X-ray luminosities).

5.1 Results from the random forest

We train the RF regressor using various cluster properties from Section 2 and show results in Fig. 3; we show the relative decrease in scatter in the mass prediction as compared to using the - scaling relation (the improvement is % for the best-case scenario).

We also used cluster richness and triaxiality as input to the RF but did not notice any improvement in our results (note that this is in agreement with the finding of Battaglia et al. (2012)); we therefore do not show lines corresponding to them in Fig. 3. We also tried using other galaxy properties (e.g., color of the brightest cluster galaxy), but we did not find any improvement in the scatter prediction.

5.2 Symbolic regression

Using the RF, we identified that the parameters: , and have the largest effect on the mass prediction. We now train the symbolic regressor to model the function in Eq. 7 using these properties and obtain the results shown in Fig. 4. Our main result of the paper is the following mass proxy which improves the cluster mass prediction as compared to using the standard - relation:


where is related to the concentration of the halo gas profile and is given by


where is given by Eq. 3. is a dimensionless parameter and we obtain the best-fit value for the TNG300 sample (we generally expect ). (See Section 6 for a physical explanation behind the better performance of .) We also obtained the following mass proxy:


where is another dimensionless constant (the best-fit value is used in the figure but we generally expect to be ). However, can only be estimated to within a factor of 2 accuracy with the current galaxy surveys (see e.g., Palmese et al. (2020)). This is much larger than the 10% accuracy required in our case. We will therefore use from Equation 9 as our main result for the rest of the paper. As an additional check, we let the power law index on  to deviate slightly from 3/5, and we obtain a mildly better performance as shown in Figure 9.

Note that we also obtained more complex equations than Eqs. 911 as outputs from SR. However, given the large scatter already present in clusters from TNG300, the risk of overfitting goes up with increasing equation complexity. Hence, we show only the simplest expressions which have a relatively good performance.

Due to the lack of clusters in the high-mass end of the TNG300 simulation, we are unable to compare the scatter between the different models. Cosmological simulations with a larger number of high-mass clusters (e.g., MillleniumTNG), or zoom-in simulations focused on high-mass clusters (e.g. Thiele et al. (2020)) would be valuable to test our results. Generally, we expect results from machine learning algorithms to improve with a larger training dataset.

5.3 Tests with CAMELS simulations

Until this point, we showed results corresponding to the TNG300 simulation which uses a particular configuration of baryonic feedback parameters and a fixed cosmological model. However, the true nature of feedback in the Universe can be different and we therefore want to test if the mass proxy is robust to changes in feedback prescriptions. We therefore use the CAMELS suite of simulations which have varying cosmological and astrophysical feedback parameters, as well as varying initial conditions. We show our results for clusters in Figure 5.

It is quite interesting that consistently outperforms even when the feedback prescriptions in the simulations are very different. Note that we did not retrain the symbolic regressor using the CAMELS dataset, we merely used Eq. 9 and adjusted the constant to optimize our results. We found that using a larger constant for CAMELS-SIMBA works better than using which was obtained for TNG300 (for CAMELS-TNG, however, the same constant: gives optimal results). This difference could be related to the scatter in the cores of SIMBA clusters being larger; we will return to this point in section 6.1. It is worth mentioning that the CAMELS simulations have a small box size (25 ) and there are very few high-mass clusters in the entire sample. It will be useful to check our results on the next iteration of the CAMELS simulations which will contain many more high-mass clusters.

6 Discussion

6.1 Dependence on concentration

Having shown our results, let us now discuss some physical reasons behind the improvement in cluster mass prediction by taking into account concentration. For , the term (1) contributes towards effectively down-weighting the cluster cores in comparison to their outskirts. Downweighting/excising the central regions is desirable because observed cluster profiles show a greater degree of similarity outside the core (Vikhlinin et al., 2006; Arnaud et al., 2010; Kravtsov & Borgani, 2012). To verify this, we show in Figure 6 that the scatter in predicted mass is reduced when cluster cores are explicity excised from the calculation of  (Fig. 6 is for the TNG300 clusters, while the comparison with CAMELS clusters is shown in Fig. 10).

Another way of verifying our results is to show the scatter in the pressure profile as a function of radius in the TNG300 clusters in Figure 7 (see also figure 7 of Arnaud et al. (2010) for comparison of pressure profile measurements from XMM-Newton). Note that the cores are the regions of clusters which are the most sensitive to non-gravitational processes like radiative cooling and AGN feedback. Furthermore, simulations so far have not been able to convincingly reproduce the observed thermal structure of cool cores (see Kravtsov & Borgani (2012)), and the observed scatter in cluster cores could be larger than that predicted in simulations (Arnaud et al., 2010). Given that at least partly corrects for the cluster core effects, we expect it to perform better in case the scatter in cluster cores is larger. We also expect our method to work better in case is used instead of as the contribution from cluster cores is relatively larger for .

We explicitly show the dependence - relation on and in the top panel of Figure 8 (we also show the dependence on NFW concentration). The bottom panel shows that corrects for a major part of the systematic bias from , which is, at least partly responsible for the improvement in the cluster mass prediction due to it.

6.2 Combining SZ and X-ray observations

Data from X-ray surveys and SZ surveys can provide complementary information. The advantage of X-ray surveys over SZ surveys is their higher resolution. On the other hand, the disadvantage is that they probe the cluster thermal energy indirectly (assumptions about the gas density and temperature profiles are needed to estimate the integrated pressure in X-ray surveys, whereas it is directly measured in SZ surveys). Using enables one to exploit this complementary behavior.

There are other advantages of combining SZ and X-ray surveys. Cross-calibration across different wavelength measurements generally helps in minimizing the possible systematics in individual measurements such as projection effects (see e.g., Menanteau et al. (2010)). Sometimes,

reported by SZ surveys use an X-ray-derived estimate of the aperture size (as the cluster radii could be poorly measured by SZ surveys alone). X-ray and SZ surveys have different redshift dependence: the selection function of SZ surveys flattens towards higher redshifts, while X-ray surveys favour low-redshift systems. Combination of SZ and X-ray data can also help in removing outliers (e.g., recently merged clusters which deviate from the power-law relationship) and further tighten the

- relation (Yang et al., 2010); we leave an analysis of removing outliers with TNG300 and CAMELS clusters to a future study.

6.3 Comparison with previous literature

Let us briefly mention some other proposals in the literature for augmenting the - relation. Including information about the SZ radial profile (instead of just using the cumulative SZ signal) can also help tighten the - relation (Afshordi, 2008) (see also Verde et al. (2002)). However, the upcoming surveys SO and CMB-S4 would not obtain high-resolution observations of cluster profiles101010Looking further into the future, CMB-HD could provide such high-resolution observations. We have checked that using instead of in Eq. 9 also results in a similar improvement in the mass prediction (in case full cluster pressure profile information is available, other ML tools like deep sets can be used to obtain even more accurate mass predictions).. Shaw et al. (2008) have noted that the NFW concentration can have an impact on the scatter in the - relation. Yang et al. (2010) propose augmenting - with a different form of cluster concentration: . However, measuring this quantity requires high-resolution weak lensing data and this approach is therefore too expensive to be applied to a large number of clusters. Our analysis provides a way of augmenting the - relation with properties that can be easily measured in observational surveys.

7 Conclusions

Astrophysical scaling relations have a number of applications in inferring properties of galaxies, supernovae, black holes, stars and clusters. With the upcoming high-precision astronomical surveys, it is imperative to find ways to augment the existing scaling relations in order to make them more accurate. Machine learning can provide a principled approach to search for extensions to scaling relations in abstract high-dimensional parameter spaces.

We focused on searching for augmentations to the widely-used scaling relation in order to make mass prediction of galaxy clusters more accurate. We first used a random forest regressor to search for a subset of parameters which give the most improvement in the cluster mass prediction (Fig. 3). We consequently used symbolic regression and found a new mass proxy which combines and gas concentration (): . reduces the scatter in the mass prediction by % for large clusters () at both high and low redshifts (Fig. 4). The new proxy exploits the complementary behavior of X-ray (high resolution but indirect probe of cluster thermal energy) and SZ (low resolution but direct probe of thermal energy) surveys.

We verified that is robust against changes in both feedback parameters and subgrid physics by testing it with the CAMELS suite of simulations (Fig. 5). The dependence of on is likely due to the cores of clusters being noisier (Fig. 7), and we verify this explicitly by excising the cores of clusters (Fig. 6). Our results and methodology can be useful for accurate multiwavelength cluster mass estimation from current and upcoming CMB and X-ray surveys like ACT, SO, eROSITA and CMB-S4.

Future work: We use three dimensional cluster information (e.g., ) in this paper; but, in reality, projected properties of clusters (e.g., ) are measured in surveys; we will try to test our results for that case in a future study. We focused on the scatter in the - relation for high mass halos here, but we model the break in the - power law at low halo masses using symbolic regression in a separate upcoming paper (Wadekar et al., 2021). We could not robustly test for very high mass clusters () due to lack of statistics, but we hope to do so using zoom-in simulations focused on high mass clusters, along the lines of those used in Thiele et al. (2020).

As cluster observations improve, we will be able to use ML techniques directly on observed quantities and find the lowest scatter relations between lensing masses, microwave and X-ray observables. Our methodology could also be useful for reducing scatter in other widely-used astrophysical scaling relations for exoplanets, stars, supernovae, galaxies and clusters.

We thank Daisuke Nagai, Nadia Zakamska, Matias Zaldarriaga, Niayesh Afshordi, Suzanne Staggs, and Abhishek Maniyar for fruitful discussions. DW gratefully acknowledges support from the Friends of the Institute for Advanced Study Membership. FVN acknowledges funding from the WFIRST program through NNG26PJ30C and NNN12AA01C. DAA was supported in part by NSF grants AST-2009687 and AST-2108944. The work of SH is supported by Center for Computational Astrophysics of the Flatiron Institute in New York City. The Flatiron Institute is supported by the Simons Foundation. JCH acknowledges support from NSF grant AST-2108536. We also thank Boryana Hadzhiyska, Will Coulton and Rachel Somerville for help with the ROCKSTAR catalogs corresponding to TNG halos. We have used the libraries at ( to carry out the analysis of clusters in the simulations. The scripts and the data used in this project will be made publicly available upon the acceptance of this paper.

Appendix A Supplementary plots

Figure 9: Same as Fig. 4 but when the power law slope is allowed to vary from the standard self-similar model. Using a form similar to gives a slightly more reduction in scatter as compared to Fig. 4.
Figure 10: Same as Figure 6 but for clusters in the CAMELS simulation suite instead of TNG300. We see a similar reduction in scatter once the cores of these clusters are excised.
Figure 11: Similar to Fig. 2 but for scaling relations for other proxies of halo mass: from X-ray and richness from galaxy photometric surveys. The cluster data is from the TNG300 simulation at . The power-law scaling relation normalized to the most massive halos is shown by the dotted green line.

a.1 Additional mass proxies

In Fig. 2, we showed the performance of  as a mass proxy. In Fig. 11, we compare the performance of two other mass proxies (see Section 2 for their definitions). We see that the cluster richness has a much larger scatter than , which makes the cluster mass estimation relatively less accurate. One additional issue in using cluster richness from galaxy photometric observations is that, because of the presence of background galaxies, it is not possible to state with absolute confidence that any given galaxy belongs to a given cluster. , on the other hand, also has a low scatter similar to . However, the deviation from a power law relation is much larger than , and the break from power law occurs at a higher halo mass.

Appendix B Details on machine learning techniques

b.1 Symbolic regression

Symbolic regression (SR) is a tool that searches the space of mathematical expressions to find the best equation that fits the data. The difference between it and ordinary “least squares” regression is that knowledge of the underlying functional form of the fitting function is not required a priori. Let us briefly describe the procedure to fit a function (e.g., Eq. 

7) with the PySR package. First we specify the relevant input parameters (e.g., ) and the operations (e.g., sum (), multiplication(), exponential or sin). Using genetic programming, the SR then generates multiple iterations of formulae (e.g., ), and outputs a final list of equations that have the lowest mean squared error when compared to the data. The equations in the final list are ranked on the basis of their complexity (more complex operations like exponentials are penalized over standard ones like ).

In the final list of formulae obtained from PySR, we choose the simplest ones to compare in Fig. 4. Note also that dimensionality of the input dataset needs to be relatively small in order for the use of SR to be feasible, as the cost scales exponentially with the number of input parameters and operators.

b.2 Random forest

A random forest regressor (RF) is a collection of decision trees; each tree is in itself a regression model and is trained on a different random subset of the training data. The output from a RF is the mean of the predictions from the individual trees. Note that a single decision tree is prone to overfitting and using the ensemble mean of the different trees helps to reduce overfitting. RFs have been used for various applications in astrophysics: Agarwal et al. (2018); Lucie-Smith et al. (2018); Moster et al. (2020); Nadler et al. (2018); Cohn & Battaglia (2020); Mucesh et al. (2020). As they allow one to easily infer the relative importance of each input feature, they are much better suited with regards to interpretability as compared to deep neural networks.


  • Ade et al. (2016a) Ade, P. A. R., et al. 2016a, Astron. Astrophys., 594, A24, doi: 10.1051/0004-6361/201525833
  • Ade et al. (2016b) —. 2016b, Astron. Astrophys., 594, A27, doi: 10.1051/0004-6361/201525823
  • Afshordi (2008) Afshordi, N. 2008, ApJ, 686, 201, doi: 10.1086/591307
  • Agarwal et al. (2018) Agarwal, S., Davé, R., & Bassett, B. A. 2018, MNRAS, 478, 3410, doi: 10.1093/mnras/sty1169
  • Allen et al. (2011) Allen, S. W., Evrard, A. E., & Mantz, A. B. 2011, ARA&A, 49, 409, doi: 10.1146/annurev-astro-081710-102514
  • Anglés-Alcázar et al. (2017) Anglés-Alcázar, D., Davé, R., Faucher-Giguère, C.-A., Özel, F., & Hopkins, P. F. 2017, MNRAS, 464, 2840, doi: 10.1093/mnras/stw2565
  • Arnaud et al. (2010) Arnaud, M., Pratt, G. W., Piffaretti, R., et al. 2010, A&A, 517, A92, doi: 10.1051/0004-6361/200913416
  • Battaglia et al. (2012) Battaglia, N., Bond, J. R., Pfrommer, C., & Sievers, J. L. 2012, ApJ, 758, 74, doi: 10.1088/0004-637X/758/2/74
  • Battaglia et al. (2016) Battaglia, N., Leauthaud, A., Miyatake, H., et al. 2016, J. Cosmology Astropart. Phys, 2016, 013, doi: 10.1088/1475-7516/2016/08/013
  • Baxter et al. (2015) Baxter, E. J., Keisler, R., Dodelson, S., et al. 2015, ApJ, 806, 247, doi: 10.1088/0004-637X/806/2/247
  • Behroozi et al. (2013) Behroozi, P. S., Wechsler, R. H., & Wu, H.-Y. 2013, ApJ, 762, 109, doi: 10.1088/0004-637X/762/2/109
  • Berger & Stein (2019) Berger, P., & Stein, G. 2019, MNRAS, 482, 2861, doi: 10.1093/mnras/sty2949
  • Bernal et al. (2021) Bernal, J. L., Caputo, A., Villaescusa-Navarro, F., & Kamionkowski, M. 2021, Phys. Rev. Lett., 127, 131102, doi: 10.1103/PhysRevLett.127.131102
  • Bocquet et al. (2015) Bocquet, S., et al. 2015, Astrophys. J., 799, 214, doi: 10.1088/0004-637X/799/2/214
  • Bocquet et al. (2019) —. 2019, Astrophys. J., 878, 55, doi: 10.3847/1538-4357/ab1f10
  • Bocquet et al. (2019) Bocquet, S., Dietrich, J. P., Schrabback, T., et al. 2019, ApJ, 878, 55, doi: 10.3847/1538-4357/ab1f10
  • Bongard & Lipson (2007) Bongard, J., & Lipson, H. 2007, Proceedings of the National Academy of Science, 104, 9943, doi: 10.1073/pnas.0609476104
  • Both et al. (2019) Both, G.-J., Choudhury, S., Sens, P., & Kusters, R. 2019, DeepMoD: Deep learning for Model Discovery in noisy data.
  • Breiman (2001) Breiman, L. 2001, Mach. Learn., 45, 5, doi: 10.1023/A:1010933404324
  • Brunton et al. (2016a) Brunton, S. L., Proctor, J. L., & Kutz, J. N. 2016a, Proceedings of the national academy of sciences, 113, 3932
  • Brunton et al. (2016b) —. 2016b, Proceedings of the National Academy of Sciences, 113, 3932, doi: 10.1073/pnas.1517384113
  • Butter et al. (2021) Butter, A., Plehn, T., Soybelman, N., & Brehmer, J. 2021.
  • Champion et al. (2019) Champion, K., Lusch, B., Kutz, J. N., & Brunton, S. L. 2019, arXiv e-prints, arXiv:1904.02107.
  • Cohn & Battaglia (2020) Cohn, J. D., & Battaglia, N. 2020, MNRAS, 491, 1575, doi: 10.1093/mnras/stz3087
  • Cranmer (2020) Cranmer, M. 2020, PySR: Fast & Parallelized Symbolic Regression in Python/Julia, Zenodo, doi: 10.5281/zenodo.4052869
  • Cranmer et al. (2021a) Cranmer, M., Kreisch, C., Pisani, A., et al. 2021a, ICLR 2021 SimDL Workshop, 10
  • Cranmer et al. (2020) Cranmer, M., Sanchez-Gonzalez, A., Battaglia, P., et al. 2020, NeurIPS, arXiv:2006.11287.
  • Cranmer et al. (2020) Cranmer, M., Sanchez-Gonzalez, A., Battaglia, P., et al. 2020.
  • Cranmer et al. (2021b) Cranmer, M., Cui, C., Fielding, D. B., et al. 2021b, Workshop on Sparse Neural Networks, 7
  • Cranmer et al. (2019) Cranmer, M. D., Xu, R., Battaglia, P., & Ho, S. 2019, NeurIPS Workshop on Physics and Machine Learning, arXiv:1909.05862.
  • Craven et al. (2021) Craven, J., Jejjala, V., & Kar, A. 2021, 2021, 40, doi: 10.1007/JHEP06(2021)040
  • Croston et al. (2006) Croston, J. H., Arnaud, M., Pointecouteau, E., & Pratt, G. W. 2006, A&A, 459, 1007, doi: 10.1051/0004-6361:20065795
  • Davé et al. (2019) Davé, R., Anglés-Alcázar, D., Narayanan, D., et al. 2019, MNRAS, 486, 2827, doi: 10.1093/mnras/stz937
  • Delgado et al. (2021) Delgado, A. M., Wadekar, D., Hadzhiyska, B., et al. 2021, arXiv e-prints, arXiv:2111.02422.
  • Djorgovski & Davis (1987) Djorgovski, S., & Davis, M. 1987, ApJ, 313, 59, doi: 10.1086/164948
  • Dressler et al. (1987) Dressler, A., Lynden-Bell, D., Burstein, D., et al. 1987, ApJ, 313, 42, doi: 10.1086/164947
  • Faber & Jackson (1976) Faber, S. M., & Jackson, R. E. 1976, ApJ, 204, 668, doi: 10.1086/154215
  • Fabjan et al. (2011) Fabjan, D., Borgani, S., Rasia, E., et al. 2011, MNRAS, 416, 801, doi: 10.1111/j.1365-2966.2011.18497.x
  • Gabrielpillai et al. (2021) Gabrielpillai, A., Somerville, R. S., Genel, S., et al. 2021, arXiv e-prints, arXiv:2111.03077.
  • Geach & Peacock (2017) Geach, J. E., & Peacock, J. A. 2017, Nature Astronomy, 1, 795, doi: 10.1038/s41550-017-0259-1
  • Gilpin (2021) Gilpin, W. 2021.
  • Giusarma et al. (2019) Giusarma, E., Reyes Hurtado, M., Villaescusa-Navarro, F., et al. 2019, arXiv e-prints, arXiv:1910.04255.
  • Graham et al. (2012) Graham, M. J., Djorgovski, S. G., Mahabal, A., et al. 2012, arXiv e-prints, arXiv:1208.2480.
  • Graham et al. (2013) Graham, M. J., Djorgovski, S. G., Mahabal, A. A., Donalek, C., & Drake, A. J. 2013, MNRAS, 431, 2371, doi: 10.1093/mnras/stt329
  • Greco et al. (2015) Greco, J. P., Hill, J. C., Spergel, D. N., & Battaglia, N. 2015, ApJ, 808, 151, doi: 10.1088/0004-637X/808/2/151
  • Guimerà et al. (2020) Guimerà, R., Reichardt, I., Aguilar-Mogas, A., et al. 2020, Science Advances, 6, eaav6971, doi: 10.1126/sciadv.aav6971
  • Gupta & Reichardt (2020a) Gupta, N., & Reichardt, C. L. 2020a, ApJ, 900, 110, doi: 10.3847/1538-4357/aba694
  • Gupta & Reichardt (2020b) —. 2020b, arXiv e-prints, arXiv:2005.13985.
  • Hasselfield et al. (2013) Hasselfield, M., et al. 2013, J. Cosmology Astropart. Phys, 2013, 008, doi: 10.1088/1475-7516/2013/07/008
  • He et al. (2019) He, S., Li, Y., Feng, Y., et al. 2019, Proceedings of the National Academy of Science, 116, 13825, doi: 10.1073/pnas.1821458116
  • Hill et al. (2018) Hill, J. C., Baxter, E. J., Lidz, A., Greco, J. P., & Jain, B. 2018, Phys. Rev. D, 97, 083501, doi: 10.1103/PhysRevD.97.083501
  • Hilton et al. (2021) Hilton, M., et al. 2021, ApJS, 253, 3, doi: 10.3847/1538-4365/abd023
  • Ho et al. (2019) Ho, M., Rau, M. M., Ntampaka, M., et al. 2019, ApJ, 887, 25, doi: 10.3847/1538-4357/ab4f82
  • Hoekstra et al. (2013) Hoekstra, H., Bartelmann, M., Dahle, H., et al. 2013, Space Sci. Rev., 177, 75, doi: 10.1007/s11214-013-9978-5
  • Hopkins (2015) Hopkins, P. F. 2015, MNRAS, 450, 53, doi: 10.1093/mnras/stv195
  • Hopkins (2017) —. 2017, arXiv e-prints, arXiv:1712.01294.
  • Hopkins et al. (2007) Hopkins, P. F., Hernquist, L., Cox, T. J., Robertson, B., & Krause, E. 2007, ApJ, 669, 67, doi: 10.1086/521601
  • Horowitz et al. (2021) Horowitz, B., Dornfest, M., Lukić, Z., & Harrington, P. 2021, arXiv e-prints, arXiv:2106.12675.
  • Hu et al. (2007) Hu, W., DeDeo, S., & Vale, C. 2007, New Journal of Physics, 9, 441, doi: 10.1088/1367-2630/9/12/441
  • Jorgensen et al. (1996) Jorgensen, I., Franx, M., & Kjaergaard, P. 1996, MNRAS, 280, 167, doi: 10.1093/mnras/280.1.167
  • Kaiser (1986) Kaiser, N. 1986, MNRAS, 222, 323, doi: 10.1093/mnras/222.2.323
  • Kennicutt (1989) Kennicutt, Robert C., J. 1989, ApJ, 344, 685, doi: 10.1086/167834
  • Klypin et al. (2011) Klypin, A. A., Trujillo-Gomez, S., & Primack, J. 2011, ApJ, 740, 102, doi: 10.1088/0004-637X/740/2/102
  • Kodi Ramanah et al. (2020a) Kodi Ramanah, D., Charnock, T., Villaescusa-Navarro, F., & Wandelt, B. D. 2020a, MNRAS, 495, 4227, doi: 10.1093/mnras/staa1428
  • Kodi Ramanah et al. (2020b) Kodi Ramanah, D., Wojtak, R., Ansari, Z., Gall, C., & Hjorth, J. 2020b, MNRAS, 499, 1985, doi: 10.1093/mnras/staa2886
  • Kodi Ramanah et al. (2021) Kodi Ramanah, D., Wojtak, R., & Arendse, N. 2021, MNRAS, 501, 4080, doi: 10.1093/mnras/staa3922
  • Kokar (1986) Kokar, M. 1986, Machine Learning, 1, 403, doi: 10.1023/A:1022818816206
  • Kormendy & Ho (2013) Kormendy, J., & Ho, L. C. 2013, ARA&A, 51, 511, doi: 10.1146/annurev-astro-082708-101811
  • Kravtsov & Borgani (2012) Kravtsov, A. V., & Borgani, S. 2012, ARA&A, 50, 353, doi: 10.1146/annurev-astro-081811-125502
  • Kravtsov et al. (2006) Kravtsov, A. V., Vikhlinin, A., & Nagai, D. 2006, ApJ, 650, 128, doi: 10.1086/506319
  • Kreisch et al. (2021) Kreisch, C. D., Pisani, A., Villaescusa-Navarro, F., et al. 2021, arXiv e-prints, arXiv:2107.02304.
  • Kusner et al. (2017) Kusner, M. J., Paige, B., & Hernández-Lobato, J. M. 2017, Grammar Variational Autoencoder.
  • Lange et al. (2020) Lange, H., Brunton, S. L., & Kutz, N. 2020, arXiv e-prints, arXiv:2004.00574.
  • Langley (1977) Langley, P. 1977, in IJCAI
  • Langley & Zytkow (1989)

    Langley, P., & Zytkow, J. M. 1989, Artificial Intelligence, 40, 283, doi:
  • Le Brun et al. (2015) Le Brun, A. M. C., McCarthy, I. G., & Melin, J.-B. 2015, MNRAS, 451, 3868, doi: 10.1093/mnras/stv1172
  • Leavitt & Pickering (1912) Leavitt, H. S., & Pickering, E. C. 1912, Harvard College Observatory Circular, 173, 1
  • Lemos et al. (2022) Lemos, P., Jeffrey, N., Cranmer, M., Battaglia, P., & Ho, S. 2022
  • Li et al. (2020) Li, Y., Ni, Y., Croft, R. A. C., et al. 2020, arXiv e-prints, arXiv:2010.06608.
  • Li et al. (2021) —. 2021, Proceedings of the National Academy of Science, 118, 2022038118, doi: 10.1073/pnas.2022038118
  • Liu & Tegmark (2020) Liu, Z., & Tegmark, M. 2020, arXiv e-prints, arXiv:2011.04698.
  • Lovell et al. (2018) Lovell, M. R., Pillepich, A., Genel, S., et al. 2018, MNRAS, 481, 1950, doi: 10.1093/mnras/sty2339
  • Lu et al. (2021) Lu, T., Haiman, Z., & Zorrilla Matilla, J. M. 2021, arXiv e-prints, arXiv:2109.11060.
  • Lucie-Smith et al. (2018) Lucie-Smith, L., Peiris, H. V., Pontzen, A., & Lochner, M. 2018, MNRAS, 479, 3405, doi: 10.1093/mnras/sty1719
  • Lusch et al. (2018) Lusch, B., Kutz, J. N., & Brunton, S. L. 2018, Nature Communications, 9, 4950, doi: 10.1038/s41467-018-07210-0
  • Madhavacheril et al. (2020) Madhavacheril, M. S., Sifón, C., Battaglia, N., et al. 2020, ApJ, 903, L13, doi: 10.3847/2041-8213/abbccb
  • Marinacci et al. (2018) Marinacci, F., Vogelsberger, M., Pakmor, R., et al. 2018, MNRAS, 480, 5113, doi: 10.1093/mnras/sty2206
  • McGaugh et al. (2000) McGaugh, S. S., Schombert, J. M., Bothun, G. D., & de Blok, W. J. G. 2000, ApJ, 533, L99, doi: 10.1086/312628
  • Medezinski et al. (2018) Medezinski, E., Battaglia, N., Umetsu, K., et al. 2018, PASJ, 70, S28, doi: 10.1093/pasj/psx128
  • Menanteau et al. (2010) Menanteau, F., et al. 2010, ApJ, 723, 1523, doi: 10.1088/0004-637X/723/2/1523
  • Miyatake et al. (2019) Miyatake, H., Battaglia, N., Hilton, M., et al. 2019, ApJ, 875, 63, doi: 10.3847/1538-4357/ab0af0
  • Modi et al. (2018) Modi, C., Feng, Y., & Seljak, U. 2018, J. Cosmology Astropart. Phys, 2018, 028, doi: 10.1088/1475-7516/2018/10/028
  • Moster et al. (2020) Moster, B. P., Naab, T., Lindström, M., & O’Leary, J. A. 2020, arXiv e-prints, arXiv:2005.12276.
  • Motl et al. (2005) Motl, P. M., Hallman, E. J., Burns, J. O., & Norman, M. L. 2005, ApJ, 623, L63, doi: 10.1086/430144
  • Mucesh et al. (2020) Mucesh, S., et al. 2020.
  • Nadler et al. (2018) Nadler, E. O., Mao, Y.-Y., Wechsler, R. H., Garrison-Kimmel, S., & Wetzel, A. 2018, ApJ, 859, 129, doi: 10.3847/1538-4357/aac266
  • Nagai (2006) Nagai, D. 2006, ApJ, 650, 538, doi: 10.1086/506467
  • Naiman et al. (2018) Naiman, J. P., Pillepich, A., Springel, V., et al. 2018, MNRAS, 477, 1206, doi: 10.1093/mnras/sty618
  • Nelson et al. (2018) Nelson, D., Pillepich, A., Springel, V., et al. 2018, MNRAS, 475, 624, doi: 10.1093/mnras/stx3040
  • Nelson et al. (2019) Nelson, D., Springel, V., Pillepich, A., et al. 2019, Computational Astrophysics and Cosmology, 6, 2, doi: 10.1186/s40668-019-0028-x
  • Ni et al. (2021) Ni, Y., Li, Y., Lachance, P., et al. 2021, arXiv e-prints, arXiv:2105.01016.
  • Ntampaka et al. (2015) Ntampaka, M., Trac, H., Sutherland, D. J., et al. 2015, ApJ, 803, 50, doi: 10.1088/0004-637X/803/2/50
  • Ntampaka et al. (2019) Ntampaka, M., et al. 2019.
  • Palmese et al. (2020) Palmese, A., et al. 2020, MNRAS, 493, 4591, doi: 10.1093/mnras/staa526
  • Pandey et al. (2021) Pandey, S., et al. 2021, arXiv e-prints, arXiv:2108.01601.
  • Pedregosa et al. (2011) Pedregosa, F., Varoquaux, G., Gramfort, A., et al. 2011, Journal of Machine Learning Research, 12, 2825.
  • Phillips (1993) Phillips, M. M. 1993, ApJ, 413, L105, doi: 10.1086/186970
  • Pillepich et al. (2018a) Pillepich, A., Nelson, D., Hernquist, L., et al. 2018a, MNRAS, 475, 648, doi: 10.1093/mnras/stx3112
  • Pillepich et al. (2018b) Pillepich, A., Springel, V., Nelson, D., et al. 2018b, MNRAS, 473, 4077, doi: 10.1093/mnras/stx2656
  • Planck Collaboration et al. (2013) Planck Collaboration, Ade, P. A. R., et al. 2013, A&A, 557, A52, doi: 10.1051/0004-6361/201220941
  • Planck Collaboration et al. (2016) Planck Collaboration, Ade, P. A. R., Aghanim, N., et al. 2016, A&A, 594, A24, doi: 10.1051/0004-6361/201525833
  • Rackauckas et al. (2020) Rackauckas, C., Ma, Y., Martensen, J., et al. 2020, arXiv preprint arXiv:2001.04385
  • Riess et al. (2019) Riess, A. G., Casertano, S., Yuan, W., Macri, L. M., & Scolnic, D. 2019, ApJ, 876, 85, doi: 10.3847/1538-4357/ab1422
  • Riess et al. (2016) Riess, A. G., et al. 2016, ApJ, 826, 56, doi: 10.3847/0004-637X/826/1/56
  • Sahoo et al. (2018) Sahoo, S., Lampert, C., & Martius, G. 2018, in Proceedings of Machine Learning Research, Vol. 80, Learning Equations for Extrapolation and Control, ed. J. Dy & A. Krause (Stockholmsmässan, Stockholm Sweden: PMLR), 4442–4450.
  • Schmidt (1959) Schmidt, M. 1959, ApJ, 129, 243, doi: 10.1086/146614
  • Schmidt & Lipson (2009) Schmidt, M., & Lipson, H. 2009, Science, 324, 81, doi: 10.1126/science.1165893
  • Schrabback et al. (2018) Schrabback, T., Applegate, D., Dietrich, J. P., et al. 2018, MNRAS, 474, 2635, doi: 10.1093/mnras/stx2666
  • Sehgal et al. (2011) Sehgal, N., Trac, H., Acquaviva, V., et al. 2011, ApJ, 732, 44, doi: 10.1088/0004-637X/732/1/44
  • Shao et al. (2021) Shao, H., Villaescusa-Navarro, F., Genel, S., et al. 2021, arXiv e-prints, arXiv:2109.04484.
  • Shaw et al. (2008) Shaw, L. D., Holder, G. P., & Bode, P. 2008, ApJ, 686, 206, doi: 10.1086/589849
  • Sheth & Bernardi (2012) Sheth, R. K., & Bernardi, M. 2012, MNRAS, 422, 1825, doi: 10.1111/j.1365-2966.2011.19757.x
  • Springel (2010) Springel, V. 2010, MNRAS, 401, 791, doi: 10.1111/j.1365-2966.2009.15715.x
  • Springel et al. (2018) Springel, V., Pakmor, R., Pillepich, A., et al. 2018, MNRAS, 475, 676, doi: 10.1093/mnras/stx3304
  • Stanek et al. (2010) Stanek, R., Rasia, E., Evrard, A. E., Pearce, F., & Gazzola, L. 2010, ApJ, 715, 1508, doi: 10.1088/0004-637X/715/2/1508
  • Su et al. (2020) Su, Y., Zhang, Y., Liang, G., et al. 2020, MNRAS, 498, 5620, doi: 10.1093/mnras/staa2690
  • Sunyaev & Zeldovich (1970) Sunyaev, R. A., & Zeldovich, Y. B. 1970, Ap&SS, 7, 3, doi: 10.1007/BF00653471
  • Thiele et al. (2021) Thiele, L., Cranmer, M., Coulton, W., Ho, S., & Spergel, D. N. 2021, NeurIPS Workshop on Physics and Machine Learning, 8
  • Thiele et al. (2020) Thiele, L., Villaescusa-Navarro, F., Spergel, D. N., Nelson, D., & Pillepich, A. 2020, arXiv e-prints, arXiv:2007.07267.
  • Tröster et al. (2019) Tröster, T., Ferguson, C., Harnois-Déraps, J., & McCarthy, I. G. 2019, MNRAS, 487, L24, doi: 10.1093/mnrasl/slz075
  • Tully & Fisher (1977) Tully, R. B., & Fisher, J. R. 1977, A&A, 500, 105
  • Verde et al. (2002) Verde, L., Haiman, Z., & Spergel, D. N. 2002, ApJ, 581, 5, doi: 10.1086/344134
  • Vikhlinin et al. (2006) Vikhlinin, A., Kravtsov, A., Forman, W., et al. 2006, ApJ, 640, 691, doi: 10.1086/500288
  • Villaescusa-Navarro et al. (2021a) Villaescusa-Navarro, F., Anglés-Alcázar, D., Genel, S., Spergel, D. N., et al. 2021a, ApJ, 915, 71, doi: 10.3847/1538-4357/abf7ba
  • Villaescusa-Navarro et al. (2021b) Villaescusa-Navarro, F., et al. 2021b, arXiv e-prints, arXiv:2109.10915.
  • Villaescusa-Navarro et al. (2021c) Villaescusa-Navarro, F., Anglés-Alcázar, D., Genel, S., et al. 2021c, arXiv e-prints, arXiv:2109.09747.
  • Villaescusa-Navarro et al. (2021d) Villaescusa-Navarro, F., Genel, S., Angles-Alcazar, D., et al. 2021d, arXiv e-prints, arXiv:2109.10360.
  • Villanueva-Domingo et al. (2021) Villanueva-Domingo, P., Villaescusa-Navarro, F., Anglés-Alcázar, D., et al. 2021, arXiv e-prints, arXiv:2111.08683.
  • Virgolin et al. (2021)

    Virgolin, M., Alderliesten, T., Witteveen, C., & Bosman, P. A. N. 2021, Evolutionary Computation, 29, 211, doi: 

  • Voevodkin & Vikhlinin (2004) Voevodkin, A., & Vikhlinin, A. 2004, ApJ, 601, 610, doi: 10.1086/380818
  • von der Linden et al. (2014) von der Linden, A., Mantz, A., Allen, S. W., et al. 2014, MNRAS, 443, 1973, doi: 10.1093/mnras/stu1423
  • Wadekar et al. (2021) Wadekar, D., Thiele, L., Villaescusa-Navarro, F., et al. 2021, in preparation
  • Wadekar et al. (2020) Wadekar, D., Villaescusa-Navarro, F., Ho, S., & Perreault-Levasseur, L. 2020, arXiv e-prints, arXiv:2012.00111.
  • Wadekar et al. (2021) Wadekar, D., Villaescusa-Navarro, F., Ho, S., & Perreault-Levasseur, L. 2021, Astrophys. J., 916, 42, doi: 10.3847/1538-4357/ac033a
  • Weinberger et al. (2020) Weinberger, R., Springel, V., & Pakmor, R. 2020, ApJS, 248, 32, doi: 10.3847/1538-4365/ab908c
  • Weinberger et al. (2017) Weinberger, R., Springel, V., Hernquist, L., et al. 2017, MNRAS, 465, 3291, doi: 10.1093/mnras/stw2944
  • Weinberger et al. (2018) Weinberger, R., Springel, V., Pakmor, R., et al. 2018, MNRAS, 479, 4056, doi: 10.1093/mnras/sty1733
  • Werner et al. (2021) Werner, M., Junginger, A., Hennig, P., & Martius, G. 2021.
  • Wilstrup & Kasak (2021) Wilstrup, C., & Kasak, J. 2021, arXiv e-prints, arXiv:2103.15147.
  • Yan et al. (2020) Yan, Z., Mead, A. J., Van Waerbeke, L., Hinshaw, G., & McCarthy, I. G. 2020, MNRAS, 499, 3445, doi: 10.1093/mnras/staa3030
  • Yang et al. (2010) Yang, H. Y. K., Bhattacharya, S., & Ricker, P. M. 2010, ApJ, 725, 1124, doi: 10.1088/0004-637X/725/1/1124
  • Yip et al. (2019) Yip, J. H. T., Zhang, X., Wang, Y., et al. 2019, arXiv e-prints, arXiv:1910.07813.
  • Zamudio-Fernandez et al. (2019) Zamudio-Fernandez, J., Okan, A., Villaescusa-Navarro, F., et al. 2019, arXiv e-prints, arXiv:1904.12846.
  • Zembowicz & Żytkow (1992) Zembowicz, R., & Żytkow, J. M. 1992, in Proceedings of the Tenth National Conference on Artificial Intelligence, AAAI’92 (AAAI Press), 70–75
  • Zhang et al. (2019) Zhang, X., Wang, Y., Zhang, W., et al. 2019, arXiv e-prints, arXiv:1902.05965.