In astrophysics, distributions constructed from energy measurements at different wavelengths, namely Spectral Energy Distributions (SEDs), are important tools for studying the physical properties and evolution of astronomical objects. SEDs can be used, for example, to determine the luminosity of astronomical objects, the rate at which galaxies form new stars, or the rate at which supermassive black holes accrete mass to generate energy in quasars Efstathiou1 ; Efstathiou2 ; rowanrobinson . However, the measurement process is prone to statistical (random) as well as systematic errors, such as background and foreground noise interference, e.g., atmospheric absorption and distortion, or obscuring dust. Due to these factors, as well as technical limitations such as camera sensor sensitivity, cooling, and resolution, SEDs are collected in scarce, often incomplete datasets. SEDs are compared to physical models in order to find the best-fit model(s), which provides insight into the underlying physical processes and properties of the target. This highlights the importance of expanding the range and improving the accuracy of the available data points. In the literature, computational methods have been widely used to enhance SEDs and handle the experimental error Walcher_2010 .
In recent years, deep learning has proven to be an important tool for the enhancement of real data and, in general, for solving inverse problems, where the goal is to reconstruct or correct a signal given an incomplete and/or noisy version. For astronomical data specifically, deep learning techniques have been used mainly in imaging, e.g., for deblending images of galaxies Boucaud_2019 or image enhancement lanusse2019hybrid . For SEDs, deep learning has been used in forward problems such as feature extraction refId0 , but not in inverse problems. In this paper we adapt well-known deep learning techniques to solve various inverse problems for SEDs.
The method we apply is data-driven and utilizes Deep Generative Models as learned structural priors. More specifically, models such as Variational AutoEncoders (VAEs) vae1 and Generative Adversarial Networks (GANs) gans , trained on large datasets (most frequently of images), are able to extract information about the underlying data distribution and generate realistic samples. Once trained, these models can be used as structural priors for solving inverse problems bora17 . Thus, this method requires us to train a high-quality generative network that can model realistic SEDs, with properties such as high-frequency components and irregularity. In this paper, we use the Generative Latent Optimization (GLO) framework glo to train a deep generative network suitable for our needs. The framework allows us to train a high-quality generative network with more flexibility than a VAE, while offering training efficiency, unlike GANs, which are notoriously hard to train.
Training a generative network with any state-of-the-art method requires a large, high-quality dataset. For the case of SEDs, however, these prerequisites are unrealistic, since the measurement procedure contains innate error and incompleteness and is particularly expensive. To overcome the issue of erroneous and/or incomplete samples we propose an end-to-end approach: (i) a preprocessing step in which we utilize classical computational methods for enhancement, e.g., iterative PCA Walcher_2010 , followed by (ii) the deep learning method described above. Our approach is useful for a variety of inverse problems and can mitigate the long-term cost of solving such problems for SEDs. Furthermore, it is expected to improve overall performance on these problems even under significant corruption and/or incompleteness, by leveraging the powerful generalization property as well as the robustness of a deep generative network.
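As an illustration, the iterative-PCA idea behind the preprocessing step can be sketched as follows. This is a generic sketch on synthetic low-rank data; the function name, ranks, and toy dimensions are our assumptions, not the exact SDSS pipeline:

```python
import numpy as np

def iterative_pca_impute(X, observed, n_components=3, n_iters=25):
    """Fill missing entries (observed == False) by alternating between a
    low-rank PCA fit and re-imputation of the missing values."""
    X = X.copy()
    # Initialize missing entries with the column means of the observed values.
    col_means = np.nanmean(np.where(observed, X, np.nan), axis=0)
    X[~observed] = col_means[np.nonzero(~observed)[1]]
    for _ in range(n_iters):
        mu = X.mean(axis=0)
        U, S, Vt = np.linalg.svd(X - mu, full_matrices=False)
        X_lowrank = mu + (U[:, :n_components] * S[:n_components]) @ Vt[:n_components]
        X[~observed] = X_lowrank[~observed]   # only missing entries are updated
    return X

# Toy demonstration: synthetic rank-3 "spectra" with ~10% of entries missing.
rng = np.random.default_rng(0)
X_true = rng.standard_normal((80, 3)) @ rng.standard_normal((3, 30))
observed = rng.random(X_true.shape) > 0.1
X_filled = iterative_pca_impute(X_true, observed)
```

On such low-rank data the imputed entries end up much closer to the ground truth than a simple column-mean fill, which is the property the preprocessing step relies on.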
Suppose we collect measurements of the form:

y = A x^* + \eta,  (1)

where A \in \mathbb{R}^{m \times n} is a measurement matrix and \eta \in \mathbb{R}^m a noise vector. Our goal is to reconstruct the signal x^* given y and A, i.e., to solve the linear inverse problem. This formulation usually refers to compressed sensing (where few measurements are assumed), but it can also be used to model several real-world problems concerning SEDs. We tackle the problem for the case of SEDs using a deep generative network as a structural prior bora17 , a method that has been successfully applied to natural images.
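For concreteness, the measurement model can be instantiated as follows; the dimensions and the subsampling pattern are illustrative (a row-subsampled identity matrix corresponds to missing wavelength bins):

```python
import numpy as np

rng = np.random.default_rng(0)

n, m = 1000, 700                    # signal length and number of measurements (illustrative)
x_star = rng.standard_normal(n)     # stand-in for the true SED

# For missing-value problems, A is a row-subsampled identity matrix that
# keeps only the observed wavelength bins.
kept = rng.choice(n, size=m, replace=False)
A = np.eye(n)[kept]

eta = 0.01 * rng.standard_normal(m)  # additive measurement noise
y = A @ x_star + eta                 # the measurements we actually observe
```

Other choices of A model other problems, e.g., A = I corresponds to pure denoising.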
2.1 Building the Generative Network
To build our generative network we use the Generative Latent Optimization (GLO) framework glo , which allows us to train a relatively large (sufficiently over-parametrized) generator in order to achieve good generalization generalization_DNN . The framework is based on the manipulation of the generator's latent space, as well as of its parameters, using a simple reconstruction loss. We use the GLO framework as an alternative to GANs, which are trained via an adversarial optimization scheme. Unlike GLO, where training consists of a simple loss minimization back-propagated to the latent space, GANs should ideally converge to an (approximate) equilibrium, which is not guaranteed and/or requires excessive resources gan_equilibria . Thus, when training GANs in practice it is common to examine the generated samples and stop the training when they are satisfactory. For images this technique can easily be applied, but for SEDs it is not feasible. Instead, we use the ability to solve inverse problems as a proxy to evaluate our trained generator.
Let us examine the training procedure more closely. We train the generator G : \mathcal{Z} \to \mathcal{X}, where \mathcal{Z} denotes the latent space and \mathcal{X} the underlying class of SEDs, which is described by the training set \{x_1, \dots, x_N\}. Prior to training, we randomly initialize the latent codes z_1, \dots, z_N from a multi-dimensional Normal distribution and pair them with each of the samples. During training, the generator's parameters \theta and the latent codes are jointly optimized, as described by (2):

\min_{\theta} \frac{1}{N} \sum_{i=1}^{N} \min_{z_i \in \mathcal{Z}} \ell(G(z_i; \theta), x_i),  (2)

The optimization is driven by a simple reconstruction loss \ell, which in our case is the Mean Squared Error (MSE). More specifically, the gradient of the loss function with respect to the parameters of the generator and the latent codes is back-propagated all the way through the network and into the latent space. This training procedure makes the latent space more structurally meaningful and suitable for reconstruction. To promote this feature, we project the latent codes onto the unit sphere during training glo .
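The joint optimization of parameters and latent codes can be sketched with a deliberately simplified linear "generator" G(z) = Wz on synthetic data; the dimensions, learning rate, and toy data are our assumptions, not the paper's configuration:

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, latent_dim, sed_dim = 50, 8, 40
X = rng.standard_normal((n_samples, sed_dim))        # stand-in training "SEDs"

def project(Z):
    # GLO: keep every latent code on the unit sphere.
    return Z / np.linalg.norm(Z, axis=1, keepdims=True)

W = 0.1 * rng.standard_normal((sed_dim, latent_dim))  # toy linear generator G(z) = W z
Z = project(rng.standard_normal((n_samples, latent_dim)))

mse0 = np.mean((Z @ W.T - X) ** 2)                    # reconstruction loss before training
lr = 0.05
for _ in range(500):
    R = Z @ W.T - X                                   # reconstruction residuals
    W -= lr * 2.0 * R.T @ Z / n_samples               # gradient step on the parameters
    Z = project(Z - lr * 2.0 * R @ W / n_samples)     # gradient step on the codes + projection
mse = np.mean((Z @ W.T - X) ** 2)                     # loss after joint optimization
```

In the real setting, W is replaced by a deep network and both gradients come from back-propagation, but the alternating structure (update parameters, update codes, re-project codes) is the same.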
Given the generative network G, the estimated solution of an inverse problem (1) can be \hat{x} = G(\hat{z}), where:

\hat{z} = \arg\min_{z \in \mathcal{Z}} \| A\, G(z) - y \|^2,  (3)

In other words, we (approximately) optimize the latent code z such that the corresponding signal G(z) matches the measurements y. We optimize by back-propagating the gradient of the reconstruction loss through G bora17 . Note that we have to project z onto the unit sphere, similarly to training. In a different approach bora17 , instead of explicitly projecting onto the unit sphere, we can apply a regularization to implicitly restrict z as follows:

\hat{z} = \arg\min_{z \in \mathcal{Z}} \| A\, G(z) - y \|^2 + \lambda R(z),  (4)

where R(z) is the regularization term and \lambda a balance hyperparameter.
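A minimal sketch of the regularized reconstruction (4), again with a toy linear generator standing in for the trained network; all sizes, the learning rate, and the choice R(z) = ||z||^2 are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
latent_dim, sed_dim, m = 8, 40, 25

# Stand-ins: a fixed ("pre-trained") linear generator and a target SED in its range.
W = rng.standard_normal((sed_dim, latent_dim))
z_true = rng.standard_normal(latent_dim)
z_true /= np.linalg.norm(z_true)
x_true = W @ z_true

# Measurements: m randomly observed entries plus a little noise.
kept = rng.choice(sed_dim, size=m, replace=False)
A = np.eye(sed_dim)[kept]
y = A @ x_true + 0.001 * rng.standard_normal(m)

# Minimize ||A G(z) - y||^2 + lam * ||z||^2 over z by gradient descent.
z = rng.standard_normal(latent_dim)
z /= np.linalg.norm(z)
lr, lam = 0.005, 0.01
for _ in range(3000):
    r = A @ (W @ z) - y
    z = z - lr * (2.0 * (W.T @ (A.T @ r)) + 2.0 * lam * z)

x_hat = W @ z                                   # the reconstructed signal
recon_mse = np.mean((x_hat - x_true) ** 2)
```

In practice the gradient with respect to z is obtained by back-propagation through the trained network rather than by the explicit matrix products used here.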
We apply our approach to Sloan Digital Sky Survey (SDSS) spectra (https://www.sdss.org/dr12/spectro/). Specifically, we use the preprocessed SDSS Corrected Spectra dataset offered by the astroML library astroml . The dataset contains SEDs of galaxies moved to the restframe, preprocessed with iterative PCA, and resampled to a fixed grid of wavelengths. Although the preprocessing is imperfect, leading to outlier values, our deep learning approach still performs well due to its robustness. Notice that the original SDSS dataset consists of innately incomplete and/or corrupted SEDs, due to the nature of the measurement process. Thus, the original SEDs cannot be used directly for evaluation purposes, because we would lack the ground truth. Instead, we consider part of the corrected SEDs produced by the preprocessing step as test data (of the preprocessed dataset), which we use for comparisons in Section 3.3.
We train a feed-forward neural network with hidden layers and LeakyReLU activations (except for the output layer). We use part of the preprocessed dataset as our training data and train the network for a fixed number of epochs with mini-batches of spectra. We use Adam optimization adam with separate learning rates for the network's parameters and for the latent codes, as well as 1d batch normalization to accelerate the training procedure batchnorm . We choose a simple Mean Squared Error (MSE) as our loss function and also apply weight decay to avoid overfitting.

Each spectrum consists of measurements over a grid of wavelengths. We choose a latent-space dimensionality that is sufficient for the representation while allowing efficient training and reconstruction. For the reconstruction, we limit the optimization procedure to a fixed number of epochs and choose a configuration similar to training. The project is developed using PyTorch pytorch .
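As a sketch, the generator's forward pass (LeakyReLU on hidden layers, linear output layer) looks like the following; the layer widths, slope, and initialization are hypothetical, since the exact architecture is not listed here:

```python
import numpy as np

def leaky_relu(x, slope=0.01):
    # LeakyReLU: identity for positive inputs, small slope for negative ones.
    return np.where(x > 0, x, slope * x)

# Hypothetical layer widths: latent dim -> hidden -> hidden -> SED length.
dims = [10, 64, 128, 1000]

rng = np.random.default_rng(0)
weights = [rng.standard_normal((d_in, d_out)) * np.sqrt(2.0 / d_in)
           for d_in, d_out in zip(dims[:-1], dims[1:])]
biases = [np.zeros(d_out) for d_out in dims[1:]]

def generator(z):
    # LeakyReLU on hidden layers; the output layer stays linear.
    h = z
    for i, (W, b) in enumerate(zip(weights, biases)):
        h = h @ W + b
        if i < len(weights) - 1:
            h = leaky_relu(h)
    return h

seds = generator(rng.standard_normal((4, dims[0])))   # a batch of 4 generated "SEDs"
```

In the actual project this module would be written in PyTorch, with batch normalization between layers and trained under MSE with weight decay as described above.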
We evaluate our approach, both qualitatively and quantitatively, on different inverse problems, by artificially injecting realistic corruption and/or incompleteness into our test data. For the qualitative evaluation (Figure 1), we examine the performance of our algorithm on inverse problems with missing information. More specifically, the missing information corresponds either to a continuous window of missing values (inpainting) or to randomly chosen values throughout the entire signal (super-resolution). For each problem we randomly select four SED signals from our test set, apply the appropriate masking, and produce a reconstruction. In both cases, a fixed fraction of the total number of measurements that compose each SED is masked as missing. We can see that for both problems, the reconstruction closely follows the trajectory of the original signal and in most cases predicts the high-frequency changes and large spikes.
In Figure 2, we quantitatively examine the performance of our algorithm on the problems of (a) inpainting and (b) denoising (that is, removing added noise drawn from a normal distribution). For each configuration, we show the reconstruction MSE of randomly selected SEDs (excluding measurements that fall outside a multiple of the interquartile range). In Figure 2(a), we examine the performance for different levels of missing information and compare SEDs drawn from the test and training data. In both cases the MSE of the vast majority of signals remains particularly low up to substantial levels of missing information. For reasonable percentages of missing information, the performance on test data is on par with that on training data, and for a sufficient proportion of the examined signals this trend persists even for larger percentages (see the median values). Given that our generative network was optimized to represent the training data, this shows considerable generalization capability, which is crucial for the effectiveness of our approach. In Figure 2(b), we examine the performance for different levels of added noise and compare our two reconstruction methods, the explicit projection (eq. 3) and the regularization (eq. 4). For all levels of added noise the MSE is particularly low, indicating notable denoising performance. Furthermore, when regularization is utilized we observe better error concentration, which can be attributed to the flexibility it offers to the reconstruction process.
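The two masking schemes used in the evaluation can be sketched as follows; the signal length and missing fraction are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000                       # illustrative SED length
frac_missing = 0.2             # illustrative fraction of masked measurements
k = int(frac_missing * n)

# Inpainting: a contiguous window of missing values.
start = int(rng.integers(0, n - k))
inpaint_mask = np.ones(n, dtype=bool)
inpaint_mask[start:start + k] = False

# Super-resolution-style: the same fraction missing at random positions.
sr_mask = np.ones(n, dtype=bool)
sr_mask[rng.choice(n, size=k, replace=False)] = False

def reconstruction_mse(x_hat, x_true):
    # Error is measured against the full ground-truth signal.
    return float(np.mean((x_hat - x_true) ** 2))
```

Either mask defines the measurement matrix A of problem (1) by keeping only the rows of the identity where the mask is True.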
4 Conclusion and Future Work
We presented an end-to-end deep learning solution for various inverse problems concerning Spectral Energy Distributions (SEDs). Our approach relies on a deep generative network, tailored to the particular properties of SEDs, as a structural prior, leveraging its generalization capability. Our preliminary results show promising performance on realistic inverse problems. We are working to extend this project to diverse and more demanding SED families, e.g., for different parts of the spectrum. Another future direction involves transfer learning techniques, as well as ensemble learning, in order to extend our approach to data that are even more incomplete. Finally, we could augment our method using bi-directional training in order to simultaneously extract information regarding the astrophysical objects we study. This idea draws from recent research on invertible neural networks for inverse problems invertibleNNs .
This project will have broad impact in the effort to interpret the SEDs which will be made available with a number of current and future ground-based and space missions such as LSST, Euclid, JWST and SPICA. Although the examples used in this work concentrate on the optical part of the spectrum, the same method can also be used on SEDs which cover the whole spectrum of galaxies from the ultraviolet to the radio. Such studies of the complete SEDs of galaxies are now recognized as essential for a complete understanding of the processes that control galaxy formation and evolution (e.g. rowanrobinson ; shirley ).
-  Andreas Efstathiou and Michael Rowan-Robinson. Dusty discs in active galactic nuclei. Monthly Notices of the Royal Astronomical Society, 273(3):649–661, 1995.
-  Andreas Efstathiou, Michael Rowan-Robinson, and Ralf Siebenmorgen. Massive star formation in galaxies: radiative transfer models of the uv to millimetre emission of starburst galaxies. Monthly Notices of the Royal Astronomical Society, 313(4):734–744, 2000.
-  Michael Rowan-Robinson and et al. Spectral energy distributions and luminosities of galaxies and active galactic nuclei in the spitzer wide-area infrared extragalactic (swire) legacy survey. The Astronomical Journal, 129:1183–1197, March 2005.
-  Jakob Walcher, Brent Groves, Tamás Budavári, and Daniel Dale. Fitting the integrated spectral energy distributions of galaxies. Astrophysics and Space Science, 331(1):1–51, Aug 2010.
-  Alexandre Boucaud, Marc Huertas-Company, Caroline Heneka, Emille E O Ishida, Nima Sedaghat, Rafael S de Souza, Ben Moews, Hervé Dole, Marco Castellano, Emiliano Merlin, and et al. Photometry of high-redshift blended galaxies using deep learning. Monthly Notices of the Royal Astronomical Society, 491(2):2481–2495, Dec 2019.
-  Francois Lanusse, Peter Melchior, and Fred Moolekamp. Hybrid physical-deep learning model for astronomical inverse problems, 2019.
-  J. Frontera-Pons, F. Sureau, J. Bobin, and E. Le Floc'h. Unsupervised feature-learning for galaxy SEDs with denoising autoencoders. A&A, 603:A60, 2017.
-  Diederik P. Kingma and Max Welling. An introduction to variational autoencoders. CoRR, abs/1906.02691, 2019.
-  Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial networks, 2014.
-  Ashish Bora, Ajil Jalal, Eric Price, and Alexandros G. Dimakis. Compressed sensing using generative models. In Proceedings of the 34th International Conference on Machine Learning - Volume 70, ICML'17, pages 537–546. JMLR.org, 2017.
-  Piotr Bojanowski, Armand Joulin, David Lopez-Paz, and Arthur Szlam. Optimizing the latent space of generative networks. 2017.
-  Behnam Neyshabur, Srinadh Bhojanapalli, David Mcallester, and Nati Srebro. Exploring generalization in deep learning. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems 30, pages 5947–5956. Curran Associates, Inc., 2017.
-  Frans A. Oliehoek, Rahul Savani, Jose Gallego-Posada, Elise van der Pol, and Roderich Groß. Beyond local nash equilibria for adversarial networks. CoRR, abs/1806.07268, 2018.
-  J.T. Vanderplas, A.J. Connolly, Ž. Ivezić, and A. Gray. Introduction to astroml: Machine learning for astrophysics. In Conference on Intelligent Data Understanding (CIDU), pages 47–54, Oct 2012.
-  Diederik Kingma and Jimmy Ba. Adam: A method for stochastic optimization. International Conference on Learning Representations, 12 2014.
-  Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the 32nd International Conference on International Conference on Machine Learning - Volume 37, ICML’15, page 448–456. JMLR.org, 2015.
-  Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. Automatic differentiation in pytorch. 2017.
-  Lynton Ardizzone, Jakob Kruse, Sebastian J. Wirkert, Daniel Rahner, Eric W. Pellegrini, Ralf S. Klessen, Lena Maier-Hein, Carsten Rother, and Ullrich Köthe. Analyzing inverse problems with invertible neural networks. CoRR, abs/1808.04730, 2018.
-  Raphael Shirley, Yannick Roehlly, Peter D. Hurley, Veronique Buat, María del Carmen Campos Varillas, Steven Duivenvoorden, Kenneth J. Duncan, Andreas Efstathiou, Duncan Farrah, Eduardo González Solares, Katarzyna Malek, Lucia Marchetti, Ian McCheyne, Andreas Papadopoulos, Estelle Pons, Roberto Scipioni, Mattia Vaccari, and Seb Oliver. HELP: a catalogue of 170 million objects, selected at 0.36–4.5 μm, from 1270 deg² of prime extragalactic fields. Monthly Notices of the Royal Astronomical Society, 490(1):634–656, November 2019.