Learning in the Absence of Training Data -- a Galactic Application

11/22/2018 ∙ by Cedric Spire, et al. ∙ Loughborough University

There are multiple real-world problems in which training data is unavailable, and yet the ambition is to learn the values of the system parameters at which test data on an observable is realised, subsequent to the learning of the functional relationship between these variables. We present a novel Bayesian method to deal with such a problem, in which we learn a system function of a stationary dynamical system, for which only test data on a vector-valued observable is available, and training data is unavailable. This exercise borrows heavily from the state space probability density function (pdf), which we also learn. As there is no training data available for either sought function, we cannot learn its correlation structure, and instead perform inference (using Metropolis-within-Gibbs) on the discretised form of the sought system function and of the pdf, where this pdf is constructed such that the unknown system parameters are embedded within its support. The likelihood of the unknowns, given the available data, is defined in terms of such a pdf. We make an application to the learning of the density of all gravitational matter in a real galaxy.




1 Introduction

The study of rich correlation structures of high-dimensional random objects is often invoked when learning the unknown functional relationship between an observed random variable and some other parameters that might inform on the properties of a system. A problem in which a vector of system parameters is related to an observed response variable is easily visualised by an equation that expresses the observable as an unknown function of the parameter vector. Given training data, we aim to learn this unknown mapping within the paradigm of supervised learning. By "training data" we mean here the pairs composed of chosen design points and the output that is generated at each such design point. Methods to perform supervised learning are extensively covered in the literature Hastie et al. (2009); Neal (1998); Rasmussen and Williams (2006); Russel and Norvig (2009). Having learnt the functional relation, one could use this model to predict the value of the system parameter vector Chakrabarty et al. (2015) at which the test datum on the observable is realised, either in the conventional framework or as the Bayesian equivalent. Such prediction is possible only subsequent to the learning of the functional relation between the system parameters and the observable, using training data.

However, there exist physical systems for which only measurements on the observable are known, i.e. training data is not available. The disciplines affected by the absence of training data are diverse. In engineering Sun et al. (2011), anomaly detection is entirely sample-specific. There is no training data that allows for the learning of a functional relationship between anomaly occurrence (parametrised by type and severity of anomaly) and the conditions that the sample is subjected to. Yet, we need to predict those anomalies. In finance, such anomalies in stock price trends are again outside the domain of supervised learning, given that the relationship between market conditions and prices has not been reliably captured by any "models" yet. In neuroscience Ahmad et al. (2017), a series of neurons spike at different amplitudes, and for different time widths, to cause a response (to a stimulus). We can measure the strength of the response and the parameters of the firing neurons, but do not know the relation between these variables. Again, in petrophysics, the system property that is the proportion of the different components of a rock (e.g. water, hydrocarbons) affects Nuclear Magnetic Resonance (NMR) measurements from the rock Coates et al. (1999); Wang et al. (2016). However, this compositional signature cannot be reliably estimated given such data, using available estimation techniques. Quantification of petrological composition using the destructive testing of a rock is too exclusive and expensive to allow for a sample that is large enough to form a meaningful training data set. Also, the resulting training data will in general be unrepresentative of any new rock, since the relationship between the (compositional) system property and the (NMR) observable is highly rock-specific, being driven by geological influences on the well that the given rock is obtained from. Therefore any training data will need to be substantially diverse, and as stated before, this is unachievable in general. Equally, this dependence on the latent geological influence annuls the possibility of using numerical simulations to generate NMR data, given design compositional information. Thus, generation of training data is disallowed in general.

In this work, we advance the learning of the sought functional relation between an observable and a system parameter vector, in such a challenging (absent training) data situation. This could in principle be undertaken as an exercise in unsupervised learning, though opting for the more robust supervised learning route is still possible, as long as the missing training data is generated, i.e. as long as we are able to generate the system parameter vector at which the measured (test) datum on the observable is available. Our new method for accomplishing this is to invoke a system property that helps link the system parameter vector with the observable, and this is possible in physical systems for which we have at least partial observed information. To clarify, what we advance in the face of the absent training data is the pursuit of the probability density function of the observable, on which data is available, and we employ this to learn the system parameter vector. We undertake such an exercise in a Bayesian framework, in which we seek the posterior of the pdf of the observables, and of the system parameters, given the available data.

The sought parameter vector could inform on the behaviour, or structure, of the system (e.g. it could be the vectorised version of the density function of all gravitating matter in a distant galaxy). The state space pdf establishes the link between this unknown vector and the measurements available on the observable (that may comprise complete or incomplete information on the state space variable). We consider dynamical systems, such that the system at hand is governed by a kinetic equation Gressman and Strain (2010); we treat the unknown system parameter vector as the stationary parameter in the model of this dynamical system. In the novel Bayesian learning method that we introduce, this parameter is embedded within the support of the state space pdf. We describe the general model in Section 2, which is subsequently applied to an astronomical application that is discussed in Section 3. Inference is discussed in Section 4, where inference is made on the state space pdf and the sought system parameters, given the data that comprises measurements of the observable, using Metropolis-within-Gibbs. Results are presented in Section 5, and the paper is rounded off with a concluding section (Section 6).

2 General Methodology

We model the system as a dynamical one, and define the state space variable as a finite-dimensional vector. Let the observable be a vector that comprises only some of the components of the state space vector, i.e. only these components can be observed. In light of this situation that is marked by incomplete information, we need to review our earlier declaration of interest in the probability density function of the full state space vector. Indeed, we aim to learn the pdf of the state space variable, and yet have measured information on only the observable, i.e. on only some of the components of the state space vector. Our data is then a set of measurements of the observable. If the density of the state space vector is to be learnt given data on the observable, such incompleteness in measured information will have to be compensated for by invoking some independent information. Such independent information is on the symmetry of the state space.

It follows that the unobserved components of the state space vector will have to be integrated out of the state space pdf, in order to compare against data that comprises measurements of the observables. This state space pdf, out of which the unobserved variables are integrated, is equivalently projected onto the space of observables, and therefore we refer to it as the projected state space pdf. The likelihood of the model parameters, given the data, is simply the product of the projected state space pdf over all the data points. But until now, the unknown model parameters have not yet appeared in our expression of the likelihood. The next step is then to find a way of embedding the sought system parameters in the support of the projected state space pdf.
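The projection and the product-form likelihood can be illustrated on a toy problem. The sketch below is only an illustration under stated assumptions: the two-component state (x, v) with only x observed, the stand-in density `joint_pdf` (independent standard normals), and the names `projected_pdf` and `log_likelihood` are all hypothetical and are not the state space pdf or the notation of this paper.

```python
import math

def joint_pdf(x, v):
    """Hypothetical stand-in for a state space pdf: independent standard normals."""
    return math.exp(-0.5 * (x * x + v * v)) / (2.0 * math.pi)

def projected_pdf(x, v_max=8.0, n_steps=2000):
    """Integrate the unobserved component v out of joint_pdf (trapezoidal rule)."""
    h = 2.0 * v_max / n_steps
    total = 0.5 * (joint_pdf(x, -v_max) + joint_pdf(x, v_max))
    for k in range(1, n_steps):
        total += joint_pdf(x, -v_max + k * h)
    return total * h

def log_likelihood(data):
    """Sum of log projected pdf values, i.e. the log of the product over data points."""
    return sum(math.log(projected_pdf(x)) for x in data)
```

Working with the sum of logarithms rather than the raw product is a common numerical choice that avoids underflow when there are many data points.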

This can be achieved by assuming that our dynamical system is stationary, so that its state space pdf does not depend on time-dependent variables. In other words, the rate of change of the state space pdf is zero. This allows us to express the pdf as dependent on the state space vector, but only via such functions of (some or all of) its components that are not changing with time; in fact, the converse of this statement is also true. This is a standard result, often referred to as Jeans Theorem Binney and Tremaine (1987); Merritt (2013). The model parameters that we seek can be recast as related to such identified time-independent functions of all or some of the state space coordinates of motion. Thus, by expressing the state space pdf as a function of appropriate constants of motion, we can embed system parameters into the support of the sought pdf.

As stated above, this pdf will then need to be projected into the space of observables, and we will convolve such a projected pdf with the error density, at every choice of the model parameters. Then, assuming the data to be i.i.d., the product of such a convolution over the whole data set will finally define our likelihood. Using this likelihood, along with appropriate priors, we then define the posterior probability density of the model parameters and the state space pdf, given the data. Subsequently we generate posterior samples using a Metropolis-within-Gibbs scheme.

We recall that in the absence of training data on a pair of r.v.s, we cannot learn the correlation structure of the functional relationship between these variables. In such situations, instead of the full function, we can only learn the vectorised version of the sought function. In other words, the relevant interval of the domain of the function is discretised into bins, and the value of the function is held constant over any such bin; we can learn the functional value over any such bin.
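A vectorised function of this kind is simply a piecewise-constant lookup over bins. The following is a minimal sketch, assuming bins defined by a sorted list of edges; the names `make_step_function`, `edges` and `values` are illustrative and do not appear in the paper.

```python
from bisect import bisect_right

def make_step_function(edges, values):
    """Return g(x) that is constant on each bin [edges[i], edges[i+1])."""
    if len(edges) != len(values) + 1:
        raise ValueError("need one more edge than values")

    def g(x):
        if not edges[0] <= x <= edges[-1]:
            raise ValueError("x outside discretised domain")
        # The right-most edge is assigned to the last bin.
        i = min(bisect_right(edges, x) - 1, len(values) - 1)
        return values[i]

    return g

# Three bins over [0, 3]; the learnable unknowns are the three bin values.
g = make_step_function([0.0, 1.0, 2.0, 3.0], [5.0, 3.0, 1.0])
```

In this representation the unknowns to be learnt are the entries of `values`, one per bin, rather than a full function with a correlation structure.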

3 Astrophysics Application

Our astrophysics application is motivated by the wish to learn the contribution of dark matter to the density function of all gravitating mass in a distant galaxy. While information on light-emitting matter is available, it is more challenging to model the effects of dark matter since, by definition, one cannot observe such matter (as it does not emit or reflect light of any colour). However, physical phenomena such as the distortion of the path of light by gravitational matter acting as gravitational lenses; the temperature distribution of hot gas that is emanating from a galaxy; and the motions of stars or other galactic particles that are permitted in spite of the attractive gravitational pull of the surrounding galactic matter, allow us to confirm that non-observable, dark matter is contributing to the overall gravitational mass density of the galaxy. In fact, astrophysical theories suggest that dark matter in older galaxies (that are of interest to us here) is the major contributor to the galactic mass, over the minor fraction of luminous galactic matter Kalinova (2014). We can compute the proportion of this contribution by subtracting the density of the luminous matter from the overall density. It is then necessary to learn the gravitational mass density of the whole system in order to learn the density of dark matter.

We begin by considering the galaxy at hand to be a stationary dynamical system, i.e. the distribution of the state space variable does not depend on time. The state space variable of a galactic particle comprises its 3-dimensional location vector and its 3-dimensional velocity vector. Our data consists of measurements of the one observable velocity coordinate, and the two observable spatial coordinates, of galactic particles (e.g. stars). That is, for each galactic particle, we have measurements of these three coordinates, and our data is thus the set of such measurements over all the observed particles.

The system function that we are interested in learning here is the density function of the gravitational mass of all matter in the considered galaxy, where we assume that this gravitational mass density is a function of the spatial coordinates only. This system function does indeed inform on the structure of the galactic system, for it tells us about the distribution of matter in the galaxy; it also dictates the behaviour of particles inside the galaxy, since the gravitational mass density is deterministically known as a function of the gravitational potential via the Poisson equation, which relates the Laplacian of the potential to the mass density through the known Universal Gravitational constant, and which is one of the fundamental equations of Physics Goldstein et al. (2002). The potential of a system dictates system dynamics, along with the state space distribution.

Here, we assume that the state space density of this dynamical system does not vary with time, i.e. its rate of change with time is zero. This follows from the consideration that within a typical galaxy, collisions between galactic particles are extremely rare Binney and Tremaine (1987). We thus make the assumption of a collisionless system evolving in time according to the Collisionless Boltzmann Equation (CBE) Binney and Tremaine (1987); Choudhuri (2010). As motivated above, this allows us to express the state space pdf as dependent on those functions of the state space vector that remain invariant with time, along any trajectory in the state space; such time-invariant constants of motion include energy, momentum, etc. It is a standard result that the constant of motion that the state space pdf has to depend on is the energy of a galactic particle Binney (1982); Contopoulos (1963). Here, energy is given partly by kinetic energy, which is proportional to the square of the Euclidean norm of the velocity vector, and partly by potential energy, which by our assumption is independent of velocities. Secondly, given that the state space is 6-dimensional, the number of constants of motion must be at most 5, in order to let the galactic particle enjoy at least 1 degree of freedom, i.e. not be fixed in state space Contopoulos (1963).

We ease our analysis by assuming that the state space pdf is a function of energy only. This can be rendered equivalent to designating the symmetry of isotropy to the state space pdf, where isotropy implies invariance to rotations in this space, i.e. the state space pdf is assumed to be such a function of the location and velocity vectors that all orthogonal transformations of these vectors preserve the pdf. The simple way to achieve the equivalence between an isotropic state space pdf and the lone dependence of the pdf on energy, is to ensure that the gravitational mass density (and therefore the gravitational potential) is the same at all points at a given Euclidean distance from the galactic centre, i.e. the distribution of gravitational mass abides by spherical symmetry, such that the density (and therefore the potential) depends on the location vector of a particle only via its Euclidean norm. Then energy is given as the sum of the kinetic energy, which depends on the norm of the velocity vector, and the potential energy, which depends on the norm of the location vector. Spherical mass distribution is not a bad assumption in the central parts of "elliptical" galaxies that are of interest for us, as these have a global triaxial geometry.

To summarise, the state space pdf is written as a function of energy, and we embed the gravitational mass density into the support of this state space pdf, by recalling that energy is partly the gravitational potential energy, which is deterministically related to the gravitational mass density through the Poisson equation.
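The chain from a discretised density to the energy of a particle can be sketched numerically. The code below is an illustration under explicit assumptions that are not taken from the paper: spherical symmetry, the common textbook sign convention in which the Laplacian of the potential equals 4πG times the density, a potential that vanishes at infinity, zero density beyond the outermost bin edge, and arbitrary units with G = 1. The function names are hypothetical.

```python
import math

G = 1.0  # gravitational constant in arbitrary units (assumption for illustration)

def potential(r, edges, rho):
    """Potential at radius r > 0 for density rho[i] on the shell [edges[i], edges[i+1])."""
    inner_mass = 0.0   # mass enclosed within radius r
    outer_term = 0.0   # integral of 4*pi*rho(s)*s ds over s > r
    for i, dens in enumerate(rho):
        a, b = edges[i], edges[i + 1]
        lo, hi = min(a, r), min(b, r)          # part of the shell inside r
        inner_mass += dens * 4.0 * math.pi / 3.0 * (hi**3 - lo**3)
        lo, hi = max(a, r), max(b, r)          # part of the shell outside r
        outer_term += dens * 2.0 * math.pi * (hi**2 - lo**2)
    return -G * (inner_mass / r + outer_term)

def energy(r, speed, edges, rho):
    """Energy per unit mass: kinetic part plus potential part."""
    return 0.5 * speed**2 + potential(r, edges, rho)
```

Under these assumptions, any change to an entry of `rho` changes the potential, and hence the energy at which the state space pdf is evaluated, which is the sense in which the density parameters sit inside the support of the pdf. Bound particles have non-positive energy in this convention.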

As there is no training data available to learn the correlation structure of the sought functions, namely the gravitational mass density function and the state space pdf, we can only learn values of these functions at specified points in their domains, i.e. learn their vectorised forms. Here, each component of the vectorised gravitational mass density is the value of the density function over one radial bin. The discretised form of the state space pdf is similarly defined, after discretising the relevant energy values into energy bins, where these energy values are non-positive, to indicate that the considered galactic particles are bound to the galaxy by gravitational attraction. Then, in terms of these vectorised versions, the likelihood of the unknown parameters, given data on the observable, is the product, over all the observed particles, of the projected state space pdf evaluated at each datum.

We also require that each component of the vectorised gravitational mass density be non-negative, and that these components be non-increasing with increasing distance of the radial bin from the galactic centre. The latter constraint is motivated by how the mass in a gravitating system (such as a galaxy) is distributed; given that gravity is an attractive force, the stronger pull on matter closer to the centre of the galaxy implies that gravitational mass density should not increase as we move away from the centre of the system. These constraints are imposed via the inference that we employ.

4 Inference

Inference on the unknown parameters, which are the components of the vectorised gravitational mass density and of the vectorised state space pdf, is undertaken using Metropolis-within-Gibbs. In the first block update during any iteration, the parameters of one of these two vectors are updated, and subsequently the parameters of the other vector are updated in the second block, at the updated values from the first block, given the data that comprises measurements of the observed state space variables that are the components of the observable vector.
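The two-block structure of such a sampler can be sketched on a toy target. This is a generic illustration and not the sampler used in the paper: the target `log_post`, the symmetric random-walk proposals, the step size and the block sizes are all assumptions made for the example.

```python
import math
import random

def log_post(a, b):
    """Hypothetical toy log posterior over two parameter blocks (not the paper's)."""
    return -0.5 * (sum((x - 1.0) ** 2 for x in a) + sum((y + 2.0) ** 2 for y in b))

def update_block(block, other, first, step, rng):
    """One Metropolis update of `block`, holding `other` at its current value."""
    proposal = [x + rng.gauss(0.0, step) for x in block]
    if first:
        log_ratio = log_post(proposal, other) - log_post(block, other)
    else:
        log_ratio = log_post(other, proposal) - log_post(other, block)
    # Accept with probability min(1, exp(log_ratio)); the proposal is symmetric.
    if rng.random() < math.exp(min(0.0, log_ratio)):
        return proposal
    return block

def metropolis_within_gibbs(n_iter, seed=0, step=0.5):
    rng = random.Random(seed)
    a, b = [0.0, 0.0], [0.0]
    chain = []
    for _ in range(n_iter):
        a = update_block(a, b, True, step, rng)    # first block
        b = update_block(b, a, False, step, rng)   # second block, at the updated a
        chain.append((list(a), list(b)))
    return chain
```

The key feature is that the second block is always proposed and accepted or rejected conditional on the value that the first block holds after its own update in the same iteration.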

Imposition of the monotonicity constraint on the gravitational mass density parameters, such that these are non-increasing with increasing distance of the radial bin from the galactic centre, renders the inference interesting. We propose each such parameter, other than that of the outermost bin, from a Truncated Normal proposal density that is left truncated at the value of the parameter in the next bin outwards, and propose the parameter of the outermost bin from a Truncated Normal that is left truncated at 0. The mean of the proposal density is the current value of the parameter and the variance is experimentally chosen, as distinct for each parameter. Such a proposal density helps to maintain the non-increasing nature of these parameters with increasing radius. At the same time, non-negativity of these parameters is also maintained. We choose arbitrary seeds for these parameters, and using these as the means, a Gaussian prior is imposed on each parameter. The variance of the prior densities is kept quite large, and demonstration of lack of sensitivity to the prior choices, as well as the seeds, is undertaken.
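A proposal of this kind can be sketched as follows. The sketch assumes that bins are indexed from the innermost to the outermost, that the truncation point for each bin is the value just proposed for the next bin outwards, and that simple rejection sampling is an acceptable way to draw from the Truncated Normal; none of these implementation details is stated in the paper, and the function names are hypothetical.

```python
import random

def truncated_normal(mean, sd, lower, rng):
    """Draw from Normal(mean, sd^2) left-truncated at `lower`, by rejection."""
    while True:
        x = rng.gauss(mean, sd)
        if x >= lower:
            return x

def propose_density(rho, sds, rng):
    """Propose a non-negative, non-increasing vector from the current `rho`."""
    new = list(rho)
    lower = 0.0                      # outermost bin is left-truncated at 0
    for i in reversed(range(len(rho))):
        new[i] = truncated_normal(rho[i], sds[i], lower, rng)
        lower = new[i]               # the next bin inwards must not fall below this
    return new

rng = random.Random(1)
proposal = propose_density([5.0, 3.0, 1.0], [0.2, 0.2, 0.2], rng)
```

Two caveats apply to this sketch. Rejection sampling becomes slow when the truncation point lies far above the mean, so an inverse-CDF sampler may be preferable in practice. Also, a Truncated Normal proposal is not symmetric, so a Metropolis-Hastings acceptance ratio built on it needs the ratio of proposal densities in addition to the ratio of posterior densities.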

Figure 1: Results from an MCMC scheme showing the HPDs for all the parameters to learn, for both PNe (top row) and GC (bottom row) data. Modes are shown as red dots. Top row: HPDs on the parameters of the two sought vectors (left and right panels) for the PNe data. Bottom row: HPDs on the parameters of the two sought vectors (left and right panels) for the GC data.

As for the components of the vectorised state space pdf, there is no correlation information to be enjoyed in this case, unlike in the case of the components of the vectorised gravitational mass density function. We propose each such component from a Truncated Normal (to maintain non-negativity), where the mean of this proposal density is the current value of the parameter and the variance is chosen by hand. Loose Gaussian priors are imposed, while the same seed value is used for all these components.

An important consideration in our work is the choice of the number of radial bins and the number of energy bins. We could have treated these as unknowns and attempted learning these from the data; however, that would imply that the number of unknowns varies from one iteration to another, and we desired to avoid such a complication, especially since the data strongly suggests values of these two numbers. We choose the number of radial bins by binning the range of radial location values in the data, such that each resulting radial bin includes at least one observed value in it, and at the same time, the number of radial bins is maximised. Again, we use the available data to compute the empirical values of energy, where an arbitrarily scaled histogram of the observed radial locations is used to mimic the vectorised gravitational mass density function, which is then employed to compute the empirical estimate of the vectorised gravitational potential function, which contributes to the energy values. We admit the maximal number of energy bins over the range of the empirically computed values of energy, such that each such energy bin contains at least one datum.
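The bin-count rule described above can be sketched as a search for the largest number of bins that leaves no bin empty. The sketch assumes equal-width bins spanning the range of the data, which the paper does not specify; `bins_nonempty` and `max_bins` are hypothetical names.

```python
def bins_nonempty(values, n_bins):
    """True if every one of n_bins equal-width bins contains at least one value."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    occupied = set()
    for v in values:
        # The maximum value is assigned to the last bin.
        occupied.add(min(int((v - lo) / width), n_bins - 1))
    return len(occupied) == n_bins

def max_bins(values):
    """Largest number of equal-width bins such that every bin holds a datum."""
    if max(values) == min(values):
        return 1
    # Non-emptiness is not monotone in the bin count, so search from the top.
    for n_bins in range(len(values), 0, -1):
        if bins_nonempty(values, n_bins):
            return n_bins
    return 1
```

For example, three clustered values and one distant value, such as `[0.0, 0.1, 0.2, 1.0]`, admit only two non-empty equal-width bins, whereas evenly spread values admit as many bins as there are data points.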

5 Results

We have input data on the locations and velocities of 2 kinds of galactic particles (called "Globular Clusters" and "Planetary Nebulae", respectively abbreviated as GC and PNe), available for the real galaxy NGC4494. The GC data comprises measurements of the observable for the GCs in NGC4494 Foster et al. (2011). Our second data set (PNe data) comprises measurements of the PNe Napolitano et al. (2009). Results of the learnt HPDs for all parameters, given both PNe (top row) and GC (bottom row) data, are shown in Figure 1. Significant inconsistencies between the learnt gravitational mass density parameters can suggest interesting dynamics, such as splitting of the galactic state space into multiple, non-communicating sub-spaces Chakrabarty (2017), but for this galaxy, it is noted that such parameters learnt from the 2 datasets concur within the learnt HPDs.

6 Conclusions

An astronomical implication of our work is that the gravitational mass density learnt from either dataset is very high in the innermost radial bin (1.6 kpc), implying a very large gravitational mass enclosed within this innermost radial bin. This result alone does not contradict the suggestion that NGC4494 harbours a central supermassive blackhole (SMBH) Sadoun and Colin (2012). Very interestingly, our results indicate that for both GCs and PNe, most particles lie in the intermediate range of energy values; this is also borne out by the shape of the histogram of the empirically computed energy using either dataset, where this empirical value computation is discussed in the last paragraph of Section 4. However, owing to its intense radially inward gravitational attraction, a central SMBH is expected to render the potential energy (and therefore the total energy) of the particles closer to the galactic centre much more negative than that of particles further away, while also rendering the number (density) of particles sharply monotonically decreasing with radius away from the centre. This is expected to render the energy distribution monotonically decreasing as we move towards more positive values, in contradiction to our noted non-monotonic trend. So while our results are not in contradiction to the report of a very large value of mass enclosed within the inner parts of NGC4494, interpretation of that mass as an SMBH does not follow from our learning of the state space pdf.

The learning of the gravitational mass density function and of the state space pdf, as well as that of the relation between the observable state space coordinates and the system function/vector, can be done after generating the training dataset relevant to the functional learning problem at hand. Applications in Petrophysics and Finance are also planned.


  • Ahmad et al. (2017) Ahmad, S., Lavin, A., Purdy, S., and Agha, Z. (2017). Unsupervised real-time anomaly detection for streaming data. Neurocomputing, volume 262.
  • Binney (1982) Binney, J. (1982). Dynamics of elliptical galaxies and other spheroidal components. Annual Review of Astronomy and Astrophysics, 20:399–429.
  • Binney and Tremaine (1987) Binney, J. and Tremaine, S. (1987). Galactic Dynamics. Princeton University Press, Princeton.
  • Chakrabarty (2017) Chakrabarty, D. (2017). A new bayesian test to test for the intractability-countering hypothesis. Jl. of American Statistical Association, 112:561–577.
  • Chakrabarty et al. (2015) Chakrabarty, D., Biswas, M., and Bhattacharya, S. (2015). Bayesian nonparametric estimation of milky way parameters using matrix-variate data, in a new gaussian process based method. Electronic Journal of Statistics, 9(1):1378–1403.
  • Choudhuri (2010) Choudhuri, A. (2010). Astrophysics for Physicists. Cambridge University Press.
  • Coates et al. (1999) Coates, G. R., Xiao, L., and Prammer, M. G. (1999). NMR logging: principles and applications. Halliburton Energy Services Publication H02308, Houston.
  • Contopoulos (1963) Contopoulos, G. (1963). A classification of the integrals of motion. Astrophysical Jl, 138:1297–1305.
  • Foster et al. (2011) Foster, C. et al. (2011). Global properties of ordinary early-type galaxies: photometry and spectroscopy of stars and globular clusters in ngc 4494. Monthly Notices of Royal Astronomical Society, 415:3393–3416.
  • Goldstein et al. (2002) Goldstein, H., Poole, C. P., and Safko, J. (2002). Classical Mechanics. Addison-Wesley Longman, Incorporated.
  • Gressman and Strain (2010) Gressman, P. T. and Strain, R. M. (2010). Global classical solutions of the boltzmann equation with long-range interactions. Proceedings of the National Academy of Sciences.
  • Hastie et al. (2009) Hastie, T., Tibshirani, R., and Friedman, J. (2009). The Elements of Statistical Learning. Springer-Verlag New York.
  • Kalinova (2014) Kalinova, V. (2014). Mass Distributions of Galaxies from SAURON and CALIFA Stellar Kinematic Maps. Doctoral thesis, Max-Planck-Institut fur Astronomie.
  • Merritt (2013) Merritt, D. (2013). Dynamics and Evolution of Galactic Nuclei. Princeton University Press, Princeton.
  • Napolitano et al. (2009) Napolitano, N. et al. (2009). The Planetary Nebula Spectrograph elliptical galaxy survey: the dark matter in NGC 4494. Monthly Notices of the Royal Astronomical Society, 393:329–353.
  • Neal (1998) Neal, R. M. (1998). Regression and classification using gaussian process priors (with discussion). In et. al, J. M. B., editor, Bayesian Statistics 6, pages 475–501. Oxford University Press.
  • Rasmussen and Williams (2006) Rasmussen, C. E. and Williams, C. K. (2006). Gaussian Processes for Machine Learning. The MIT Press, MIT.
  • Russel and Norvig (2009) Russel, S. and Norvig, P. (2009). Artificial Intelligence: A Modern Approach. Pearson, 3 edition.
  • Sadoun and Colin (2012) Sadoun, R. and Colin, J. (2012). M-σ relation between supermassive black holes and the velocity dispersion of globular cluster systems. Monthly Notices of the Royal Astronomical Society: Letters, 426(1):L51–L55.
  • Sun et al. (2011) Sun, X., Yao, H., Ji, R., Liu, X., and Xu, P. (2011). Unsupervised fast anomaly detection in crowds. In Proceedings of the 19th ACM International Conference on Multimedia, MM ’11, pages 1469–1472, New York, NY, USA. ACM.
  • Wang et al. (2016) Wang, P., Jain, V., and Venkataramanan, L. (2016). Sparse bayesian t1-t2 inversion from borehole nmr measurements. In Proceedings of SPWLA 57 Annual Logging Symposium, 25-29 June 2016, Reykjavik.