1 Introduction
Over a decade ago, light emitted from an exoplanet was first measured, paving the way for the study of exoplanetary atmospheres (Charbonneau et al., 2005; Deming et al., 2005). In the years since, a diverse collection of worlds has been discovered, from rocky, Earth-like planets to massive gas giants that reach temperatures as hot as some stars (Hasegawa & Pudritz, 2013; Batalha, 2014). Edge-on planetary systems enable the measurement of transit (when the exoplanet passes between the host star and the observer) and eclipse (when the exoplanet passes behind the host star as viewed by the observer) depths (Kreidberg, 2017). Transit depths measure the effective radius of the planet as a function of wavelength; variations in measured radius arise from molecules in the atmosphere at the day–night terminator absorbing certain wavelengths of light, with more absorption corresponding to larger measured radii. Eclipse depths measure the ratio of the planet's and host star's emission as a function of wavelength. These depths provide insight into the composition and temperature structure of the planet's atmosphere.
Using measured transit or eclipse depths spanning a range of wavelengths, an atmospheric model for the planet can be determined, with some uncertainty, via atmospheric retrieval, an inverse modeling technique (Madhusudhan, 2018). Early retrieval studies performed a parametric grid search over millions of pre-calculated forward models (Madhusudhan & Seager, 2009). This method was later improved upon by Bayesian techniques employing Markov chain Monte Carlo (MCMC) and other sampling methods (e.g., Skilling, 2004; ter Braak, 2006; ter Braak & Vrugt, 2008) to explore a model parameter space by computing spectra for thousands to millions of atmospheric models (e.g., Madhusudhan & Seager, 2010; Line et al., 2014; Waldmann et al., 2015; Oreshenko et al., 2017). Model parameters describe the temperature–pressure profile; the vertical abundance profiles for each molecule in the atmospheric model; cloud parameters; and, for the transit case, the radius of the planet. These Bayesian techniques yield a posterior distribution which constrains the range of values that fit the data for each model parameter. For low-resolution data, some parameters may be constrained only to an upper/lower limit (or not at all) due to degeneracies among low-resolution spectra (e.g., a slightly cooler atmosphere with greater abundances of molecules will look the same as a slightly warmer atmosphere with lesser abundances). While high-resolution data allow parameters to be more accurately determined, there is still some inherent uncertainty due to astrophysical and instrumental noise. Accurate quantification of this uncertainty informs the statistical significance of the results.

Data-driven machine learning (ML) approaches, which are able to learn complex relationships within large data sets, provide possible alternatives to methods that can be computationally expensive, such as atmospheric retrieval. Examples of ML can be seen across astrophysics, from applying Bayesian linear regression to remove common-mode systematics in Kepler data (Roberts et al., 2013), to automating the identification of exoplanets using deep learning (Shallue & Vanderburg, 2018; Ansdell et al., 2018; Osborn et al., 2019). Furthermore, in an approach similar to our own, but in a different application domain, Perreault Levasseur et al. (2017) used Bayesian neural networks to map distant gravitationally-lensed galaxies.

Recently, the study of exoplanetary atmospheres has been aided by ML techniques. Waldmann (2016) makes use of deep belief networks to classify exoplanet emission spectra, importantly showing that ML approaches can identify molecular signatures in emission spectra. The first supervised ML retrieval algorithms, HELA (Márquez-Neila et al., 2018) and ExoGAN (Zingales & Waldmann, 2018), have been developed and show promising results. HELA uses a random forest to map observed spectra to a planetary model (see Section 3.1 for more details), while ExoGAN combines a generative adversarial network (GAN; Goodfellow et al., 2014) with a technique called semantic image inpainting (Yeh et al., 2017) to retrieve atmospheric parameters. These methods reduce retrieval times from hundreds of central-processing-unit hours to just seconds or minutes, highlighting the large reductions in computation time offered by ML.

Here, we introduce a new ML retrieval method, plannet (our code is available at https://github.com/exoml/plannet), which is based on an ensemble of Bayesian neural networks (BNNs), and apply it to the benchmark data set of Márquez-Neila et al. (2018). BNNs are a good choice of model for atmospheric retrievals as they offer the advantage of both providing probability distributions over their outputs and scaling to high-dimensional data. We directly compare our model with HELA over the same data set and demonstrate how incorporating domain-specific knowledge into machine learning models can improve results and offer insights into the covariance of the atmospheric parameters.

In this paper, we first introduce the data set in Section 2 along with our notation. We then introduce both ML models in Section 3, starting with the random forest and following with a detailed explanation of our model. In Section 4 we present and discuss our results. Finally, in Section 5 we draw conclusions about the implications of our results and suggest further avenues for research in this area.
2 Data Set
2.1 Description
We use the spectral data set of Márquez-Neila et al. (2018), which consists of 100,000 synthetic Hubble Space Telescope Wide Field Camera 3 (WFC3) transmission spectra of hot Jupiters. These spectra were created using the formalism detailed in Heng & Kitzmann (2017), which makes use of line-by-line calculations for opacities (S. Grimm, priv. comm.). Each spectrum is based on five atmospheric parameters: an isothermal temperature; abundances of H2O, NH3, and HCN gas; and a gray cloud opacity. Each spectrum has 13 channels with bandpasses matching those used in Kreidberg et al. (2015). Each channel holds the transit depth within the corresponding bandpass. We refer the reader to their papers for more details, particularly the 'Methods' section of Márquez-Neila et al. (2018), where the boundary conditions are described.
For each transit depth, we assume the same 50 parts-per-million uncertainty as Márquez-Neila et al. (2018). We similarly split the data set between training and testing sets. We reserve a subset of spectra from the training set as the validation set, which is used to optimize model hyperparameters and architectures. This ensures that inferences are made on the test data only once. We use the same real-data test case: the WASP-12b WFC3 transit depths as analyzed by Kreidberg et al. (2015). Two sample input spectra can be seen in the Appendix.

2.2 Notation
In this paper, we use the following notation to describe our data set, 𝒟. A single spectrum with K channels is denoted by the vector x, and y defines the vector of five atmospheric parameters. Furthermore, we generalize our model by referring to the dimension of y as D. The training and testing data sets are denoted by 𝒟_train and 𝒟_test respectively, where the test data is given by 𝒟_test = {x_n, y_n}_{n=1}^{N} for N total input–output pairs.

3 Machine Learning Models
In machine learning, the task of inferring a function from labeled data comes under the area of supervised learning. In our case, the task is a multivariate regression problem, where the objective is to model the relationship between the input space, x, and the output space, y. In addition to predicting the values of the outputs, it is vital that the ML model also provides an uncertainty estimate over these values. Astronomical observations inherently introduce uncertainty into measurements, and accurately accounting for and reporting these uncertainties is a critical part of retrieval results.
In this section, we introduce the previously-used random forest along with our plannet model. We explain how each model aims to solve this multivariate regression task and how each deals with uncertainty. We highlight that the plannet model is specifically designed to deal with both the uncertainty and the correlations between the outputs, whereas the random forest does not differ from those used in other multivariate regression tasks.
3.1 Random Forest
Here, we briefly summarize the random forest regression model used in Márquez-Neila et al. (2018); details of the model are available at https://github.com/exoclime/HELA. The core of their model comes from the ensemble.RandomForestRegressor method in sklearn (Pedregosa et al., 2011). A random forest (RF) consists of multiple decision trees (or regression trees, in the case of continuous data), whereby each tree makes a prediction given an input (see Criminisi et al., 2012). Márquez-Neila et al. (2018) showed that only a limited number of regression trees were required, which determined the size of their model. They set the number of nodes in each tree via a variance threshold, a metric related to the proportion of the remaining training data that is split at the current node.

To produce the posterior plots, each prediction from a tree corresponds to a sample from an empirical distribution. The samples therefore correspond to a density estimate of the atmospheric parameters.
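This per-tree sampling scheme can be sketched with sklearn on toy data (the data, sizes, and hyperparameters here are illustrative, not the actual HELA configuration): each tree's prediction is treated as one draw from an empirical posterior over the atmospheric parameters.

```python
# Toy sketch of random-forest retrieval: per-tree predictions as posterior samples.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 13))          # 500 toy "spectra", 13 WFC3 channels
y = X[:, :5] @ rng.normal(size=(5, 5))  # 5 toy "atmospheric parameters"

forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

x_obs = X[:1]                                           # one observed spectrum
samples = np.stack([t.predict(x_obs)[0] for t in forest.estimators_])
print(samples.shape)         # (100, 5): one posterior sample per tree
print(samples.mean(axis=0))  # empirical posterior mean of the 5 parameters
```

A histogram of each column of `samples` then gives the marginalized posterior plot for that parameter.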
3.2 Bayesian Neural Networks
Our model is built from Bayesian neural networks, which inherit their structure from neural networks. Although we provide details of both techniques in the following section, we highlight their strong relationship with multivariate linear regression, where the objective is to learn a matrix of weights W that maps an input x to an output y. Fully connected deep neural networks extend upon this by combining layers of linear regression with nonlinear functions, resulting in a more powerful function-approximating capability, despite still operating on the same supervised learning task as a linear regression model.
3.2.1 A Summary
Bayesian neural networks (BNNs) offer the powerful function-approximating capability of deep neural networks with the additional advantage of being able to provide distributions over their outputs (MacKay, 1992; Neal, 1995). These characteristics are therefore well-suited to the task of atmospheric retrieval. To enable BNNs to scale to large architectures, we employ the Monte Carlo dropout approximation to BNNs (Gal & Ghahramani, 2016). This is a stochastic variational inference approach (Hoffman et al., 2013) that allows BNN inference to be performed for both large architectures and large data sets. The alternative approach would be to implement a form of MCMC, such as Hamiltonian Monte Carlo (HMC; Neal, 1995), to perform inference. Although HMC has been shown to be successful at small scale, it currently cannot be scaled in the same way as stochastic variational inference approaches.
Deep neural networks consist of a hierarchy of layers, where each layer applies a nonlinear weighted transformation of its input. We define each layer l ∈ {1, …, L} to have its own matrix of weights W_l and biases b_l. If φ is a nonlinear function, then we can define a fully connected dense neural network with L layers and input x as

f^ω(x) = W_L φ( ⋯ φ(W_2 φ(W_1 x + b_1) + b_2) ⋯ ) + b_L,

where ω = {W_l, b_l}_{l=1}^{L} refers to all the network weights. A BNN takes this formulation and adds a prior p(ω) over the weights, often taking the form of a multivariate normal distribution. Bayesian inference in BNNs requires computing an intractable integral to infer the posterior over the weights, p(ω | 𝒟). The Monte Carlo (MC) dropout approximation provides a (variational) approximation to this distribution and comes under the wider area of variational inference (Jordan et al., 1998). Practical implementation of MC dropout requires drawing dropout masks (Srivastava et al., 2014) from Bernoulli-distributed random variables to set a certain proportion of weights to zero. Applying this during the training of the network acts as a regularizer to prevent overfitting. Dropping these weights while making predictions at test time results in the test-time approximation for predictions over the outputs. For a given input x, we can sample the network B times to obtain an empirical distribution over the outputs.

Determining the proportion of weights to be dropped in each layer often requires tuning over a validation set. However, we use concrete dropout layers to automatically optimize these values in the training process (Gal et al., 2017).
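As an illustrative sketch (a toy two-layer network in numpy, not our actual architecture or implementation), MC dropout at test time amounts to keeping the Bernoulli masks active and repeating the stochastic forward pass:

```python
# Minimal numpy sketch of Monte Carlo dropout: the Bernoulli dropout used in
# training stays active at test time, and repeated stochastic forward passes
# form an empirical predictive distribution for one input spectrum.
import numpy as np

rng = np.random.default_rng(1)
W1, b1 = rng.normal(size=(13, 64)), np.zeros(64)  # toy weights, 13 channels in
W2, b2 = rng.normal(size=(64, 5)), np.zeros(5)    # 5 atmospheric parameters out

def forward(x, p_drop=0.2):
    """One stochastic forward pass with dropout on the hidden layer."""
    h = np.maximum(x @ W1 + b1, 0.0)        # ReLU hidden layer
    mask = rng.random(h.shape) >= p_drop    # Bernoulli dropout mask
    h = h * mask / (1.0 - p_drop)           # inverted-dropout scaling
    return h @ W2 + b2

x = rng.normal(size=13)                     # one toy spectrum
samples = np.stack([forward(x) for _ in range(1000)])  # B = 1000 passes
mu, sigma = samples.mean(axis=0), samples.std(axis=0)
print(mu.shape, sigma.shape)  # (5,) (5,)
```

In our model the dropout probabilities `p_drop` are not fixed by hand but learned by the concrete dropout layers during training.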
3.2.2 The Model
Our model, plannet, shown in Figure 1, is a deep neural network with four dense concrete dropout layers (Gal et al., 2017). The model is implemented in Keras (Chollet et al., 2015) with a TensorFlow backend (Abadi et al., 2016). Each layer consists of 1024 units, and we use a batch size of 512. For training the model, we use the Adam optimization algorithm (Kingma & Ba, 2014). For deciding on the architecture, we implemented a grid search over the number of layers and the number of units per layer.
Our task is to accurately predict the atmospheric parameters and provide posterior distributions over their values. (In the machine learning literature, the output distribution would normally be called the predictive distribution, as we infer the posterior over the weights of the network and then work with this posterior to infer a predictive distribution; however, to remain consistent with the exoplanet literature, we avoid that terminology here.) These parameters are expected to covary, and we directly use this domain knowledge to design our model, such that we can represent the atmospheric parameters as jointly distributed by a multivariate normal distribution. Therefore, we design the output of the BNN to consist of a lower-triangular matrix L of dimension D × D and a mean vector μ of dimension D. We can then represent the precision matrix of a multivariate normal via its Cholesky decomposition, Λ = L Lᵀ.

Figure 1 demonstrates the atmospheric retrieval process after the model is trained. We perform B forward passes through the network for a given observed spectrum x, resulting in the samples {μ_b, L_b}_{b=1}^{B}. In the next step, we take the mean over these network samples to give the expected μ̂ and L̂ for a given spectrum:

μ̂ = (1/B) Σ_{b=1}^{B} μ_b,    (1)

L̂ = (1/B) Σ_{b=1}^{B} L_b.    (2)

The final step is to sample from the multivariate normal distribution,

y ~ 𝒩(μ̂, (L̂ L̂ᵀ)⁻¹),    (3)

to retrieve samples of the inferred atmospheric parameters, where this distribution is parameterized by the expected BNN output.
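The retrieval step just described can be sketched in numpy (toy values standing in for the averaged network outputs): fill the lower triangle from the network's raw output, make the diagonal positive so that the precision matrix is positive-definite, invert for the covariance, and draw samples.

```python
# Toy sketch of turning network outputs into a multivariate normal and sampling
# atmospheric parameters from N(mu, (L L^T)^{-1}).
import numpy as np

D = 5                                    # number of atmospheric parameters
rng = np.random.default_rng(2)
raw = rng.normal(size=D * (D + 1) // 2)  # toy stand-in for the BNN's L output
mu = rng.normal(size=D)                  # toy stand-in for the BNN's mean output

L = np.zeros((D, D))
L[np.tril_indices(D)] = raw                  # fill lower-triangular entries
L[np.diag_indices(D)] = np.exp(np.diag(L))   # positive diagonal

precision = L @ L.T                      # Cholesky factor of the precision
cov = np.linalg.inv(precision)
samples = rng.multivariate_normal(mu, cov, size=10_000)
print(samples.shape)                     # (10000, 5)
```

The resulting `samples` play the same role as the per-tree predictions of the random forest: histograms of the columns give the marginalized posteriors.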
3.2.3 Training
In order to train this model, we must design a loss that ensures the network learns the correlations between the atmospheric parameters. To estimate the covariance, our loss is the negative log-likelihood of the multivariate normal (up to additive constants), as defined by μ and L. The loss,

ℒ(ω) = ½ (y − μ)ᵀ L Lᵀ (y − μ) − Σ_{i=1}^{D} log L_{ii},    (4)

is implicitly dependent on the network weights through the lower-triangular matrix L and the inferred mean μ (see Figure 1). As also noted in Dorta et al. (2018), we must be careful to ensure that the diagonal elements, L_{ii}, of L are positive such that Λ = L Lᵀ is positive-definite; we therefore take the exponential of the diagonal terms. In comparison to previous loss functions that have been used for BNNs, such as the squared loss and the heteroscedastic squared loss (see Gal, 2016, Chapter 4), our new loss in Equation (4) is able to model correlations between atmospheric parameters. These inferred correlations lead to better uncertainty estimates for the retrieved atmospheric parameters than the previous losses.

In addition to using the Adam optimizer, we employ early stopping, with a patience of 30 epochs, according to the validation loss. Furthermore, we use model checkpointing to save the model that performs best on the validation set.
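A minimal numpy sketch of this loss (a hypothetical helper, not our Keras implementation) shows how the Cholesky structure avoids ever forming the covariance or its determinant explicitly:

```python
# Negative log-likelihood of N(mu, (L L^T)^{-1}) up to additive constants,
# evaluated via the Cholesky factor L of the precision matrix.
import numpy as np

def nll_loss(y, mu, L):
    """0.5 * (y - mu)^T L L^T (y - mu) - sum_i log L_ii."""
    r = L.T @ (y - mu)           # r^T r == (y - mu)^T L L^T (y - mu)
    return 0.5 * r @ r - np.log(np.diag(L)).sum()

# Toy check against the dense form of the same expression.
rng = np.random.default_rng(3)
D = 5
A = rng.normal(size=(D, D))
L = np.tril(A)
L[np.diag_indices(D)] = np.exp(np.diag(A))  # positive diagonal, as in training
y, mu = rng.normal(size=D), rng.normal(size=D)
precision = L @ L.T
dense = 0.5 * (y - mu) @ precision @ (y - mu) - 0.5 * np.log(np.linalg.det(precision))
assert np.isclose(nll_loss(y, mu, L), dense)
```

Because |L Lᵀ| = (Π_i L_{ii})², the log-determinant term reduces to the simple sum over the log-diagonal.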
3.3 Ensemble
It has been shown that an ensemble of neural networks can offer more accurate estimations of the predictive uncertainty than a single network (Lakshminarayanan et al., 2017; Gal & Smith, 2018). The additional benefit is that an ensemble is more robust to changes in weight initialization and the path taken during stochastic optimization.
In this paper we use an ensemble of five plannet models and provide comparison to a single model. Five models were chosen due to the empirical performance in Table 1, as larger ensembles result in increasingly marginal improvements.
The challenge in using an ensemble is how the outputs from the individual models are combined. In our case, each output is the mean and covariance of a multivariate normal distribution. Therefore, in combining these distributions, we can treat the overall output from the ensemble as a Gaussian mixture model, whereby each component weight corresponds to 1/M, where M is the number of models in the ensemble.

To calculate the expectation of this mixture model, μ*, we take the average of the individual component means, such that

μ* = (1/M) Σ_{m=1}^{M} μ̂_m.

The covariance of the mixture model can be calculated by employing the law of total variance:

Σ* = (1/M) Σ_{m=1}^{M} Σ_m + (1/M) Σ_{m=1}^{M} (μ̂_m − μ*)(μ̂_m − μ*)ᵀ,

where the inferred covariance matrix of a single model is given by Σ_m = (L̂_m L̂_mᵀ)⁻¹. This combines the variance in the component means with the expectation of the variance of the individual models, thus taking into account both how unsure each model is and how far each model's mean lies from the ensemble mean. Therefore, the atmospheric parameters retrieved via the ensemble are distributed according to 𝒩(μ*, Σ*).
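These two moments can be computed in a few lines of numpy (toy means and covariances standing in for the outputs of the M = 5 models):

```python
# Combine M equally-weighted Gaussian components via the law of total variance:
# mixture covariance = mean of covariances + covariance of the means.
import numpy as np

rng = np.random.default_rng(4)
M, D = 5, 5
mus = rng.normal(size=(M, D))                              # per-model means
covs = np.stack([np.eye(D) * (1 + i) for i in range(M)])   # per-model covariances

mu_star = mus.mean(axis=0)                                 # mixture mean
centered = mus - mu_star
cov_star = covs.mean(axis=0) + (centered[:, :, None] * centered[:, None, :]).mean(axis=0)
# Disagreement between models widens the ensemble posterior.
print(cov_star.shape)  # (5, 5)
```

Note that each diagonal entry of `cov_star` is at least as large as the corresponding average per-model variance, which is why the ensemble cannot be more confident than its members on average.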
4 Results and Discussion
Table 1 displays a comparison of R² values across the models, where R² corresponds to the coefficient of determination,

R² = 1 − [ Σ_{n=1}^{N} Σ_{d=1}^{D} (y_{n,d} − ŷ_{n,d})² ] / [ Σ_{n=1}^{N} Σ_{d=1}^{D} (y_{n,d} − ȳ_d)² ],    (5)

as defined in the sklearn.metrics Python package, where the summation is over both the size of the data set N and the output dimension D. Here ȳ_d is the data mean for each atmospheric parameter, and the prediction for each data point is given by ŷ_{n,d}. This can be viewed as a ratio between the residuals of the model prediction and the total sum of squares. Values close to 1 are desirable, as they are related to the correlation coefficient between the predicted and true atmospheric parameters.
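On toy data (hypothetical values, for illustration only), the pooled definition above coincides with sklearn's variance-weighted multi-output R²:

```python
# R^2 computed by hand (pooled over outputs) and via sklearn.metrics.
import numpy as np
from sklearn.metrics import r2_score

rng = np.random.default_rng(5)
y_true = rng.normal(size=(100, 5))                 # toy "true" parameters
y_pred = y_true + 0.1 * rng.normal(size=(100, 5))  # toy predictions

ss_res = ((y_true - y_pred) ** 2).sum()                 # residual sum of squares
ss_tot = ((y_true - y_true.mean(axis=0)) ** 2).sum()    # total sum of squares
r2_manual = 1.0 - ss_res / ss_tot

r2_sklearn = r2_score(y_true, y_pred, multioutput="variance_weighted")
assert np.isclose(r2_manual, r2_sklearn)
```

The equivalence holds because weighting per-output R² scores by each output's variance makes the per-output denominators cancel into a single pooled ratio.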
Therefore, the results in Table 1 show that both of our models, the ensemble and the individual plannet model, outperform the random forest. Furthermore, we note the slight performance boost that is gained from the ensemble. In order to show that the results are reproducible, we list both our implementation of the random forest and their reported results, which closely agree.
In addition to reporting the R² values, Table 2 contains the average covariance matrix over the test data. This table shows the average inferred correlations, where the diagonal corresponds to the variance in each atmospheric parameter and the off-diagonals indicate correlations between the parameters. As this is the average correlation matrix over all test planets, not too many conclusions can be drawn from it. However, we note the average negative correlation that appears between the temperature and the cloud opacity, as well as between the temperature and the H2O abundance. This is consistent with intuition, due to the known degeneracies in the data. More specifically, as the observed spectral features are caused by the temperature–pressure profile and the molecular abundances, increasing either while keeping all other parameters constant leads to stronger spectral features. Consequently, a simultaneous increase in temperature and decrease in molecular abundances (or vice versa) could lead to the same observed spectrum. Finally, an increase in cloud opacity decreases the intensity of the observed spectral features and could therefore look similar to a decrease in temperature, hence the degeneracy and the expected negative correlation between temperature and cloud opacity.
[Table 1: R² values for the single plannet model, the ensemble of five plannet models, our implementation of the random forest, and the random forest results as reported in Márquez-Neila et al. (2018).]

[Table 3: retrieved atmospheric parameters for WASP-12b from Kreidberg et al. (2015), the nested sampling retrieval of Márquez-Neila et al. (2018), our random forest, and the ensemble of five plannet models; some parameters are not applicable to every model. Errors are reported at one standard deviation; we report the median and equivalent asymmetric posterior percentiles for the random forest and for Kreidberg et al. (2015).]

By designing our model to learn these correlations, we are able to interpret the results in a way that is not always available when using deep learning models. Specifically, we identify cases where both our model and the random forest approach do not recover the true values, but where our model includes the true values in its wider posterior distributions. Figure 9 shows a case where the random forest infers narrow (highly confident) posterior distributions that fall far from the true values, whereas our plannet ensemble model is (appropriately) less confident, leading to posterior distributions that cover the true values for the atmospheric parameters (shown by the red stars).
Given the performance over the synthetic test data set, we further test our models on the WFC3 transmission spectrum of WASP-12b. Figure 5 shows the posterior plots for the random forest, the single plannet model, and the plannet ensemble. In the case of WASP-12b, both plannet-based models find marginalized posteriors similar to the random forest for the cloud opacity and the abundances of HCN and NH3. For temperature, both plannet-based models have a distribution that is consistent with the retrieval performed in Kreidberg et al. (2015), while the random forest favors cooler temperatures. All models favor low abundances for HCN and NH3, indicating a non-detection of these molecules. The H2O abundance predicted by the individual plannet model and the ensemble is more tightly constrained than the results of Márquez-Neila et al. (2018) or Kreidberg et al. (2015); see Table 3 for numerical comparisons. (Márquez-Neila et al. (2018) utilize a constant-opacity cloud parameterization, while Kreidberg et al. (2015) use a cloud and haze model that assumes an opaque gray cloud deck, which introduces degeneracies between the cloud and haze parameters. Consequently, a direct comparison between the two models cannot be made in Table 3.) Fisher & Heng (2018) found that, in general, WFC3 transmission spectra are adequately explained by an isothermal atmosphere (in the regions probed by transit observations), gray clouds, and H2O only. Based on our ensemble's confidence in the H2O abundance (and lack of confidence in the HCN and NH3 abundances), it is likely that our model has learned this as well.
4.1 Limitations
We highlight that employing variational approximate inference in BNNs is known to have problems, particularly underestimating uncertainty (Blei et al., 2017). Unlike the RF, our ensemble BNN model favors large uncertainties when the data cannot constrain a parameter, as shown in Figures 9 and 13. Though an ensemble of models helps to improve the uncertainty estimation, we emphasize that accurate uncertainty estimation requires using MCMC, nested sampling, or another Bayesian sampling algorithm proven to obtain accurate posterior distributions and therefore uncertainty estimations (e.g., ter Braak & Vrugt, 2008).
Nevertheless, BNNs are presently an important tool for retrievals. They provide a reasonable estimation of parameters orders of magnitude faster than traditional methods that require hundreds of hours of CPU time, helping to constrain parameter spaces. As an example, a single plannet prediction over a test planet takes on the order of milliseconds, and an ensemble of five takes on the order of seconds if run sequentially. (Hardware: Ubuntu 18.04, 32 GB memory, Intel Core i7-8700K CPU, TITAN Xp GPU.)
As long as the data set used to train the model contains all relevant molecules, BNNs can inform which molecules should be considered in a traditional retrieval analysis, based on the retrieved abundances and their uncertainties. A plannet model must be trained only once for a given class of planets, e.g., WFC3 transmission spectra of hot Jupiters. Once the model has been trained, all inferences with that model are fast and repeatable for the class of planets represented in the training set. Therefore, although training the model can be computationally expensive, this only needs to be done once; in our example, each plannet model takes minutes to train over the WFC3 transmission spectra. Thus, despite the limitations of BNNs, their results are valuable and help save compute time spent on retrieval analyses.
Our approach is a generalizable technique that is not limited to any specific type of planet. In addition, important parameters such as the radius of the planet should be included in future models; in this paper we make use of the data set of Márquez-Neila et al. (2018), which does not include it in the parameter space. Therefore, the challenge in using BNNs comes from ensuring that the data set spans both the parameter space and the planet types of interest.
5 Conclusions
In this paper, we have demonstrated how domain knowledge can be used to design a machine learning model that both outperforms the previous approach and provides inferred correlations between its outputs. Furthermore, we have introduced a novel likelihood function for BNNs which captures correlations between output dimensions. This extends the diagonal Gaussian likelihood often used in the literature, which does not capture these correlations. We highlight that this is straightforward to do with BNNs and stochastic approximate inference, compared to traditional ML techniques (e.g., Gaussian processes), where it would involve many more approximations.
Using the data set of Márquez-Neila et al. (2018), we independently reproduced the results of their RF. For the first time, we have shown that ML retrieval results are reproducible and consistent across implementations.
In addition to comparing our approach to the random forest using test planet models, we also analyzed the inferred posteriors for WASP-12b, where we take the results of Kreidberg et al. (2015) to be the ground truth. Our ensemble of five plannet models gives results consistent with the RF of Márquez-Neila et al. (2018), and it achieved distributions for the H2O abundance and temperature that agree more closely with Kreidberg et al. (2015) and the nested sampling retrieval of Márquez-Neila et al. (2018) than the RF does. The low retrieved abundances and large uncertainties for HCN and NH3 indicate a non-detection of these molecules.
Furthermore, we have found that an ensemble of BNNs provides posterior distributions that better represent those of traditional Bayesian atmospheric retrieval methods, compared to both a single BNN model and the RF model. A single plannet model can underestimate the size of the posterior distributions due to overconfidence in its predictions, while the RF can be overconfident in a wrong answer.
We have presented the first study that employs BNNs for atmospheric retrievals, setting the foundation for further research in this area. As the data available for atmospheric retrievals expand, it will become increasingly important to combine domain knowledge with machine learning models. It is equally important that the outputs of these models remain interpretable so that inferences can be physically understood. Our method easily scales to higher dimensionality; in future work, we will expand our model to higher-resolution spectra and a larger number of atmospheric parameters.
References
 Abadi et al. (2016) Abadi, M., Barham, P., Chen, J., et al. 2016, in OSDI, Vol. 16, 265–283
 Ansdell et al. (2018) Ansdell, M., Ioannou, Y., Osborn, H. P., et al. 2018, ApJ, 869, L7
 Batalha (2014) Batalha, N. M. 2014, Proceedings of the National Academy of Science, 111, 12647
 Blei et al. (2017) Blei, D. M., Kucukelbir, A., & McAuliffe, J. D. 2017, Journal of the American Statistical Association, 112, 859
 Charbonneau et al. (2005) Charbonneau, D., Allen, L. E., Megeath, S. T., et al. 2005, ApJ, 626, 523
 Chollet et al. (2015) Chollet, F., et al. 2015, Keras
 Criminisi et al. (2012) Criminisi, A., Shotton, J., Konukoglu, E., et al. 2012, Foundations and Trends® in Computer Graphics and Vision, 7, 81
 Deming et al. (2005) Deming, D., Seager, S., Richardson, L. J., & Harrington, J. 2005, Nature, 434, 740
 Dorta et al. (2018) Dorta, G., Vicente, S., Agapito, L., Campbell, N. D., & Simpson, I. 2018, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 5477–5485
 Fisher & Heng (2018) Fisher, C., & Heng, K. 2018, MNRAS, 481, 4698
 Gal (2016) Gal, Y. 2016, PhD thesis, University of Cambridge
 Gal & Ghahramani (2016) Gal, Y., & Ghahramani, Z. 2016, in International Conference on Machine Learning, 1050–1059
 Gal et al. (2017) Gal, Y., Hron, J., & Kendall, A. 2017, in Advances in Neural Information Processing Systems, 3581–3590
 Gal & Smith (2018) Gal, Y., & Smith, L. 2018, arXiv preprint arXiv:1806.00667
 Goodfellow et al. (2014) Goodfellow, I., PougetAbadie, J., Mirza, M., et al. 2014, in Advances in Neural Information Processing Systems, 2672–2680
 Hasegawa & Pudritz (2013) Hasegawa, Y., & Pudritz, R. E. 2013, ApJ, 778, 78
 Heng & Kitzmann (2017) Heng, K., & Kitzmann, D. 2017, MNRAS, 470, 2972
 Hoffman et al. (2013) Hoffman, M. D., Blei, D. M., Wang, C., & Paisley, J. 2013, The Journal of Machine Learning Research, 14, 1303
 Jordan et al. (1998) Jordan, M. I., Ghahramani, Z., Jaakkola, T. S., & Saul, L. K. 1998, in Learning in graphical models (Springer), 105–161
 Kingma & Ba (2014) Kingma, D. P., & Ba, J. 2014, arXiv preprint arXiv:1412.6980
 Kreidberg (2017) Kreidberg, L. 2017, Exoplanet Atmosphere Measurements from Transmission Spectroscopy and Other Planet-Star Combined Light Observations (Springer International Publishing), 100
 Kreidberg et al. (2015) Kreidberg, L., Line, M. R., Bean, J. L., et al. 2015, ApJ, 814, 66
 Lakshminarayanan et al. (2017) Lakshminarayanan, B., Pritzel, A., & Blundell, C. 2017, in Advances in Neural Information Processing Systems, 6402–6413
 Line et al. (2014) Line, M. R., Knutson, H., Wolf, A. S., & Yung, Y. L. 2014, ApJ, 783, 70
 MacDonald & Madhusudhan (2017) MacDonald, R. J., & Madhusudhan, N. 2017, ApJ, 850, L15
 MacKay (1992) MacKay, D. J. 1992, Neural computation, 4, 448
 Madhusudhan (2018) Madhusudhan, N. 2018, arXiv e-prints, arXiv:1808.04824
 Madhusudhan & Seager (2009) Madhusudhan, N., & Seager, S. 2009, ApJ, 707, 24
 Madhusudhan & Seager (2010) —. 2010, ApJ, 725, 261
 Márquez-Neila et al. (2018) Márquez-Neila, P., Fisher, C., Sznitman, R., & Heng, K. 2018, Nature Astronomy, arXiv:1806.03944
 Neal (1995) Neal, R. M. 1995, PhD thesis, University of Toronto
 Oreshenko et al. (2017) Oreshenko, M., Lavie, B., Grimm, S. L., et al. 2017, ApJ, 847, L3
 Osborn et al. (2019) Osborn, H. P., Ansdell, M., Ioannou, Y., et al. 2019, arXiv preprint arXiv:1902.08544
 Pedregosa et al. (2011) Pedregosa, F., Varoquaux, G., Gramfort, A., et al. 2011, Journal of Machine Learning Research, 12, 2825
 Perreault Levasseur et al. (2017) Perreault Levasseur, L., Hezaveh, Y. D., & Wechsler, R. H. 2017, ApJ, 850, L7
 Roberts et al. (2013) Roberts, S., McQuillan, A., Reece, S., & Aigrain, S. 2013, MNRAS, 435, 3639
 Shallue & Vanderburg (2018) Shallue, C. J., & Vanderburg, A. 2018, AJ, 155, 94
 Skilling (2004) Skilling, J. 2004, in American Institute of Physics Conference Series, Vol. 735, American Institute of Physics Conference Series, ed. R. Fischer, R. Preuss, & U. V. Toussaint, 395–405
 Srivastava et al. (2014) Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. 2014, The Journal of Machine Learning Research, 15, 1929
 ter Braak (2006) ter Braak, C. 2006, Statistics and Computing, 16, 239. http://dx.doi.org/10.1007/s11222-006-8769-1
 ter Braak & Vrugt (2008) ter Braak, C. J. F., & Vrugt, J. A. 2008, Statistics and Computing, 18, 435. http://dx.doi.org/10.1007/s11222-008-9104-9
 Waldmann (2016) Waldmann, I. P. 2016, ApJ, 820, 107
 Waldmann et al. (2015) Waldmann, I. P., Rocchetto, M., Tinetti, G., et al. 2015, ApJ, 813, 13
 Yeh et al. (2017) Yeh, R. A., Chen, C., Yian Lim, T., et al. 2017, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 5485–5493
 Zingales & Waldmann (2018) Zingales, T., & Waldmann, I. P. 2018, arXiv e-prints, arXiv:1806.02906