The study of galaxy mergers plays an important role in explaining the existence of different types of galaxies, documenting their origins, and furthering our understanding of the evolution of the entire Universe and its appearance today. It has been shown that machine learning can greatly advance the study of merging galaxies [1,16,12,4]. However, without the ability to connect the knowledge obtained from disparate large-scale simulations and astronomical surveys, we are at a significant disadvantage in harnessing all available data.
Images produced by astronomical simulations are made to mimic real observations from a particular telescope, but even slight differences (which are unavoidable) can cause a classification model trained on simulated images to perform substantially worse on related real data. One such example can be found in , where the authors used a data set of merging galaxies from the EAGLE simulation  made to mimic SDSS observations, together with real SDSS images . The authors show that a classifier trained on one data set to distinguish merging from non-merging galaxies achieves much lower accuracy when applied to the other: , with the classifier trained on real SDSS images and then applied to EAGLE simulation images performing particularly poorly. Similarly, in  the authors work with distant merging galaxies from the Illustris-1 cosmological simulation . There it was shown that, even when the domains differ only by the inclusion of noise to mimic Hubble Space Telescope observations, the classification accuracy in the domain the model was not trained on hovers around , no better than random guessing. These two examples illustrate the need for domain adaptation techniques in astrophysical contexts.
Domain adaptation techniques are used to measure and reduce the shift between the source and target domain distributions [5,8,19]. This is very useful in a situation common in astronomy, where models are trained on labeled simulations and then applied to unlabeled real data. In this paper we use two domain adaptation techniques as a transfer loss: Maximum Mean Discrepancy (MMD) [15,7] and adversarial training with domain adversarial neural networks (DANNs) . We demonstrate both techniques on a data set similar to the one from : we use simulated distant merging galaxies from Illustris-1 at redshift , both without (source) and with (target) the addition of random sky shot noise to mimic Hubble Space Telescope observations. To compare results across architectures, we test two networks: DeepMerge, a simple network for galaxy classification presented in , and the more complex and well-known ResNet18 . In both cases, we show that these domain adaptation techniques lead to a significant improvement in classifier performance on the target domain.
In our experiments, the neural network is trained using the total loss
$$\mathcal{L} = \mathcal{L}_C + \lambda \mathcal{L}_T,$$
where $\mathcal{L}_C$ is the classifier loss (cross-entropy) and $\mathcal{L}_T$ is the transfer loss. The effects of MMD and adversarial training enter through the latter, which is weighted by the constant $\lambda$.
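As a minimal sketch of how the two terms combine into $\mathcal{L}_C + \lambda \mathcal{L}_T$, the Python below computes a mean cross-entropy classifier loss from predicted class probabilities and adds a weighted transfer loss. It assumes the transfer loss has already been computed (by MMD or by the domain classifier); the class encoding and the function names `cross_entropy` and `total_loss` are illustrative assumptions, not the authors' code.

```python
import math


def cross_entropy(probs, labels):
    """Mean cross-entropy over a batch.

    probs[i] is the predicted class-probability vector for image i and
    labels[i] is the index of its true class (the encoding, e.g.
    0 = non-merger and 1 = merger, is an illustrative assumption).
    """
    return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)


def total_loss(classifier_loss, transfer_loss, lam):
    """Total loss L = L_C + lam * L_T, with lam the constant transfer-loss weight."""
    return classifier_loss + lam * transfer_loss


# Toy inputs only; these are not values from the experiments.
l_c = cross_entropy([[0.9, 0.1], [0.2, 0.8]], [0, 1])
print(total_loss(l_c, transfer_loss=0.5, lam=0.1))
```

Setting `lam=0` recovers training without domain adaptation, which is the baseline configuration.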
In order to detail MMD and adversarial training below, we first introduce the following conventions: we denote the source and target domains as $\mathcal{D}_S$ and $\mathcal{D}_T$ and their respective distributions as $p$ and $q$. Since the source domain data are labeled, we have pairs of images and labels $\{(x_i^s, y_i^s)\}_{i=1}^{N}$, while in the unlabeled target domain we only have images $\{x_j^t\}_{j=1}^{N}$. Finally, data from both domains are associated with domain labels: $d = 0$ for the source and $d = 1$ for the target domain.
Maximum Mean Discrepancy (MMD)
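as a transfer loss works by minimizing the distance between the means of $p$ and $q$. While it is possible to estimate $p$ and $q$ explicitly, in practice no computationally expensive density estimation is necessary . Instead, kernel methods can be used to embed and compare their means, and the optimization is carried out in a reproducing kernel Hilbert space (RKHS) $\mathcal{H}$:
$$\mathrm{MMD}(p, q) = \sup_{\|f\|_{\mathcal{H}} \le 1} \left( \mathbb{E}_{x^s \sim p}\left[f(x^s)\right] - \mathbb{E}_{x^t \sim q}\left[f(x^t)\right] \right) = \sup_{\|f\|_{\mathcal{H}} \le 1} \left\langle f, \mu_p - \mu_q \right\rangle_{\mathcal{H}},$$
where $\mathrm{MMD}(p, q)$ denotes the kernel distance used as a proxy for the mean discrepancy, $x^s$ and $x^t$ are random variables drawn from $p$ and $q$ respectively, and $\mu_p = \mathbb{E}_{x^s \sim p}[\phi(x^s)]$ and $\mu_q = \mathbb{E}_{x^t \sim q}[\phi(x^t)]$ are the kernel mean embeddings of the two distributions, with feature map $\phi(x) = k(x, \cdot)$. The mean embedding plays a role similar to a cumulative distribution function in summarizing a distribution, and the simplification follows from the reproducing property of the RKHS, $f(x) = \langle f, \phi(x) \rangle_{\mathcal{H}}$ [8,11]. While in principle $k$ can be a general kernel, we follow  where $k$ is a linear combination of multiple Gaussian radial basis function (RBF) kernels with different bandwidths $\sigma_u$,
$$k(x, x') = \sum_{u} \beta_u \exp\left( -\frac{\|x - x'\|^2}{2\sigma_u^2} \right), \qquad \beta_u \ge 0,$$
so that the kernel is sensitive to differences between the mean embeddings over a range of length scales.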
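By this definition, if $p \neq q$, then there must exist some $f$ for which the difference between the two means is positive, and the MMD is attained by the $f$ that maximizes it. By the Cauchy-Schwarz inequality, the inner product $\langle f, \mu_p - \mu_q \rangle_{\mathcal{H}}$ is maximized when $f$ points in the same direction as $\mu_p - \mu_q$. Therefore, $f$ must equal $(\mu_p - \mu_q)/\|\mu_p - \mu_q\|_{\mathcal{H}}$ to maximize the mean discrepancy, which gives $\mathrm{MMD}(p, q) = \|\mu_p - \mu_q\|_{\mathcal{H}}$. Squaring, applying the reproducing property $\langle \phi(x), \phi(x') \rangle_{\mathcal{H}} = k(x, x')$, and replacing expectations with sample averages leaves us with the final transfer loss:
$$\mathcal{L}_T = \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} k(x_i^s, x_j^s) + \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} k(x_i^t, x_j^t) - \frac{2}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} k(x_i^s, x_j^t),$$
where $N$ is the number of samples drawn from each domain. Here the distance is expressed as the sum of the self-similarities of the source ($k(x_i^s, x_j^s)$) and target ($k(x_i^t, x_j^t)$) domains minus twice their cross-similarity ($k(x_i^s, x_j^t)$).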
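A minimal pure-Python sketch of the empirical MMD transfer loss, assuming a multi-kernel Gaussian RBF $k$. The bandwidths `sigmas` and weights `betas` are hypothetical defaults (no values are given in the text), and the inputs are treated as small feature vectors; in a deep network the kernel would typically be evaluated on the feature extractor's outputs for a mini-batch rather than on raw pixels. A practical version would be vectorized in a tensor library; the double loops here simply mirror the source self-similarity, target self-similarity, and cross-similarity terms one by one.

```python
import math


def multi_rbf_kernel(x, y, sigmas, betas):
    """k(x, y) = sum_u beta_u * exp(-||x - y||^2 / (2 * sigma_u^2)).

    x and y are equal-length feature vectors (lists of floats).
    sigmas and betas (nonnegative weights) are hypothetical choices.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return sum(b * math.exp(-sq_dist / (2.0 * s ** 2)) for s, b in zip(sigmas, betas))


def mmd_loss(source, target, sigmas=(0.5, 1.0, 2.0), betas=(1 / 3, 1 / 3, 1 / 3)):
    """Biased empirical MMD^2 between N source and N target feature vectors."""
    n = len(source)
    if n != len(target):
        raise ValueError("this sketch assumes N samples from each domain")

    def k(a, b):
        return multi_rbf_kernel(a, b, sigmas, betas)

    ss = sum(k(a, b) for a in source for b in source)  # source self-similarity
    tt = sum(k(a, b) for a in target for b in target)  # target self-similarity
    st = sum(k(a, b) for a in source for b in target)  # cross-similarity
    return (ss + tt - 2.0 * st) / n ** 2


src = [[0.0, 0.0], [0.1, -0.1], [-0.1, 0.1]]
shifted = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1]]
print(mmd_loss(src, [list(v) for v in src]))  # identical samples
print(mmd_loss(src, shifted))                 # shifted samples
```

With identical source and target samples the three terms cancel and the loss is zero; shifting the target samples away from the source makes it positive, which is the signal the optimizer reduces by pulling the two feature distributions together.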
Domain adversarial training
employs a DANN, which contains a classifier that tries to distinguish between the source and target domains . A DANN consists of three parts: a feature extractor ($G_f$), a label predictor ($G_y$), and a domain classifier ($G_d$). As in standard deep learning models for image classification, the feature extractor uses convolutional layers to extract features from images, while the label predictor has dense layers that output the class label. In contrast, the domain classifier, which is also built from dense layers, is unique to DANNs and is used to predict the domain labels.
The domain classifier is added after the feature extractor as a parallel branch to the label predictor, and it is connected to the feature extractor through a gradient reversal layer. This layer acts as the identity in the forward pass but multiplies the gradient by a negative constant during backpropagation, so that while the domain classifier is trained to minimize the domain classification loss, the feature extractor is trained to maximize it. The feature extractor is thus trained with an adversarial objective to confuse the domain classifier. When the domain classifier can no longer discriminate between the domains, domain-invariant features have been found, and the label predictor can then be successfully applied across both domains.
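The gradient reversal layer can be illustrated without an autodiff library. The sketch below hand-codes one gradient step for a hypothetical scalar model, a one-weight "feature extractor" $z = w x$ feeding a logistic domain classifier, with the reversal applied only to the gradient flowing back into $w$. The names `GradientReversal`, `domain_step`, and `scale` are illustrative assumptions; in practice the layer is implemented as a custom autograd operation inside a deep learning framework.

```python
import math


class GradientReversal:
    """Identity in the forward pass; multiplies gradients by -scale going backward."""

    def __init__(self, scale=1.0):
        self.scale = scale  # hypothetical reversal strength

    def forward(self, z):
        return z

    def backward(self, grad_output):
        return -self.scale * grad_output


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def domain_step(w, v, x, d, grl, lr=0.1):
    """One SGD step for a toy scalar DANN domain branch.

    Feature extractor: z = w * x. Domain classifier: d_hat = sigmoid(v * z).
    Loss: binary cross-entropy with domain label d (0 = source, 1 = target).
    Returns the updated (w, v).
    """
    z = grl.forward(w * x)
    d_hat = sigmoid(v * z)
    grad_logit = d_hat - d                 # dLoss/dlogit for sigmoid + BCE
    grad_v = grad_logit * z                # domain classifier: ordinary gradient
    grad_z = grl.backward(grad_logit * v)  # sign flipped before reaching features
    grad_w = grad_z * x
    # Both take a descent step on their own gradient: v learns to separate the
    # domains, while the reversed gradient pushes w to confuse the classifier.
    return w - lr * grad_w, v - lr * grad_v


# A target-domain example (d = 1). scale=-1.0 turns the layer into a plain
# identity (no reversal), for comparison.
print(domain_step(0.5, 0.5, x=1.0, d=1, grl=GradientReversal(1.0)))
print(domain_step(0.5, 0.5, x=1.0, d=1, grl=GradientReversal(-1.0)))
```

With reversal, $w$ moves in the direction that lowers the classifier's confidence that this target image comes from the target domain, while $v$ is updated exactly as it would be without the layer.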
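The domain classifier loss is calculated as the binary cross-entropy over the domain labels:
$$\mathcal{L}_D = -\frac{1}{N} \sum_{i=1}^{N} \log\left(1 - \hat{d}_i^{\,s}\right) - \frac{1}{N} \sum_{j=1}^{N} \log \hat{d}_j^{\,t},$$
where $\hat{d}_i^{\,s} = G_d(G_f(x_i^s))$ and $\hat{d}_j^{\,t} = G_d(G_f(x_j^t))$ are the domain classifier's output probabilities that a source image and a target image, respectively, belong to the target domain ($d = 1$). Finally, we designate this domain classifier loss as our transfer loss: $\mathcal{L}_T = \mathcal{L}_D$.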
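A minimal sketch of the domain classifier loss as a binary cross-entropy with domain label 0 for source and 1 for target, assuming the classifier outputs, for each image, the probability that it comes from the target domain. The function name and the small `eps` clipping (added to avoid `log(0)`) are assumptions of this sketch; averaging the source and target terms separately also allows the two batch sizes to differ.

```python
import math


def domain_loss(d_hat_source, d_hat_target, eps=1e-12):
    """Binary cross-entropy over domain labels (0 = source, 1 = target).

    d_hat_source[i] / d_hat_target[j]: predicted probability that source
    image i / target image j belongs to the target domain.
    """
    loss_s = -sum(math.log(max(1.0 - p, eps)) for p in d_hat_source) / len(d_hat_source)
    loss_t = -sum(math.log(max(p, eps)) for p in d_hat_target) / len(d_hat_target)
    return loss_s + loss_t


# A fully confused classifier (0.5 everywhere) gives 2 ln 2.
print(domain_loss([0.5, 0.5], [0.5, 0.5]))
# A confident, correct classifier gives a loss close to zero.
print(domain_loss([0.01, 0.02], [0.99, 0.98]))
```

Adversarial training pushes the feature extractor toward the confused regime, where the loss equals $2\ln 2 \approx 1.386$, the value obtained when the domain classifier outputs 0.5 for every image.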
We use a data set similar to the one in , where the authors extract galaxies at redshift from Illustris-1. Our data set differs only in the addition of one more filter, giving three-channel images: ACS F814W, NC F356W, and WFC3 F160W. The source domain includes images from Illustris-1 convolved with a model point-spread function (PSF), while the target domain additionally includes random sky shot noise. More details about the data set can be found in . The source and target domains contain merger and non-merger images ( pixels). We divide these data sets into training, validation, and test samples: .
We present the performance of both domain adaptation techniques in two neural network architectures: the DeepMerge architecture introduced in  and the more complex ResNet18 . Both networks are trained to distinguish between two classes of objects: merging and non-merging galaxies. We first train our two classifier networks on the pristine labeled source data only, without any domain adaptation. We then train them with the addition of MMD and, separately, with domain adversarial training. Both deep domain adaptation techniques use the pristine labeled source data together with the unlabeled noisy target data during training. Finally, we evaluate all three training configurations on both the source and target domain data.
In all experiments, we use the Adam optimizer  with a "one-cycle" learning rate schedule, which has been shown to lead to much faster convergence of training accuracy . We also use early stopping to prevent overfitting. Our choice of hyperparameters was informed by the results of a hyperparameter search using DeepHyper [3,2], run with only one of the domain adaptation techniques for each network. Finally, in all experiments we use the same fixed random seed (1) to shuffle images and initialize network weights, to ensure reproducibility.
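The one-cycle schedule and early stopping can be sketched in plain Python. The schedule below is one common cosine variant of the one-cycle policy (it mirrors the learning-rate part of the default behavior of PyTorch's `torch.optim.lr_scheduler.OneCycleLR`, which additionally cycles momentum); the values of `max_lr`, `pct_start`, the division factors, and `patience` are assumptions, since the settings used in the experiments are not stated here.

```python
import math


def one_cycle_lr(step, total_steps, max_lr, pct_start=0.3,
                 div_factor=25.0, final_div_factor=1e4):
    """Learning rate at `step` under a cosine one-cycle policy.

    Warm up from max_lr / div_factor to max_lr over the first pct_start of
    training, then anneal down to (max_lr / div_factor) / final_div_factor.
    """
    initial_lr = max_lr / div_factor
    min_lr = initial_lr / final_div_factor
    warmup_steps = max(1, int(pct_start * total_steps))
    if step < warmup_steps:
        frac = step / warmup_steps
        return initial_lr + (max_lr - initial_lr) * (1 - math.cos(math.pi * frac)) / 2
    frac = (step - warmup_steps) / max(1, total_steps - 1 - warmup_steps)
    return min_lr + (max_lr - min_lr) * (1 + math.cos(math.pi * frac)) / 2


class EarlyStopping:
    """Signal a stop when validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=5):
        self.patience = patience
        self.best = math.inf
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


lrs = [one_cycle_lr(s, total_steps=100, max_lr=1e-3) for s in range(100)]
stopper = EarlyStopping(patience=2)
for epoch, val_loss in enumerate([0.70, 0.55, 0.58, 0.57, 0.60]):  # toy losses
    if stopper.step(val_loss):
        print(f"stopping after epoch {epoch}")
        break
```

The large peak learning rate in the middle of the cycle is what Smith and Topin credit for the faster convergence, while early stopping caps how long the network can overfit the source training set.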
The resulting source and target classification accuracies for merging and non-merging galaxies in the three experiments described above are given in Table 1. We take the case without domain adaptation as our baseline: as expected, its test accuracy on source images is high, while the classifier performs much worse on target domain images.
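Table 1: Source and target domain test accuracies of DeepMerge and ResNet18 trained without domain adaptation, with MMD, and with adversarial training.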
We expected domain adaptation to cause a slight decrease in source domain performance, as the network compensates by learning features shared across domains; instead, we observe an increase in source domain accuracy. We believe this is due to the regularizing effect of the additional transfer loss in MMD and adversarial training, which helps prevent overfitting to the source training set and therefore allows the model to train longer. As expected, target domain classification accuracy improves with both MMD and adversarial training. Additional performance metrics on the source (dashed bars) and target (solid bars) domain test sets are presented in the top row of Figure 1, where training without domain adaptation is shown in navy, MMD in violet, and adversarial training in orange. The bottom row of Figure 1 compares the receiver operating characteristic (ROC) curves for the source and target test sets, with and without domain adaptation, using the same color and hatching scheme.
Comparing the two networks, we posit that the smaller target domain improvement obtained with ResNet18 results from its much greater architectural complexity (two orders of magnitude more trainable parameters than DeepMerge), which makes it more susceptible to overfitting on the source domain. Tuning early-stopping patience and weight decay to address this issue resulted in only marginal improvements. Since we found the methods to be extremely sensitive to the chosen hyperparameters, we expect there is still room for further improvement through a better choice of hyperparameters (the hyperparameter search was not run for the task without domain adaptation) and perhaps even network pruning.
While the results are quite sensitive to the choice of hyperparameters, we find that they are robust to the choice of random seed. We ran 10 different random seeds for all experiments with DeepMerge and saw no significant deviation in performance, except on the target test set without domain adaptation (which is expected, since the classifier fails on that domain). We report the following mean for each experiment: no domain adaptation source () and target (); MMD source () and target (); adversarial training source () and target ().
Astronomy is entering the big data era, with a plethora of simulations and many ongoing and future large surveys. Without the ability to connect the knowledge obtained from these different domains, we are at a significant disadvantage in harnessing all available data. In this paper, we show the promise of domain adaptation techniques such as MMD and adversarial training for substantially improving the performance of a source-trained model on a new, often unlabeled, target domain data set in astronomy. While the scope of this paper is to demonstrate the efficacy of MMD and adversarial training for two simulated domains that differ only by the inclusion of observational noise, our future work will address the application of these techniques to simulated and real observational data.
We acknowledge that, while domain adaptation techniques can be very powerful, their ultimate performance depends on the similarity between the source and target domains. To ensure the best possible performance across domains in astronomy, particularly when training with a simulated source domain and a real target domain, the simulated data must be made to mimic the target domain and should contain only in-distribution objects for classification. Differences due to the limitations of the simulator, or differences in the noise and other observational effects added to the simulated images, can then be addressed by domain adaptation. For this reason, we firmly believe that studying and refining domain adaptation techniques will prove crucial to successfully deploying deep learning models in astronomy.
This research will impact not only the astronomy community but also the wider scientific community, since domain transfer problems are very common in many areas of research. In fields such as experimental high-energy physics, astronomy and cosmology, and biotechnology, research often involves studying physical processes using simulations, either before real data become available or in conjunction with them. This paper demonstrates the capability of domain adaptation techniques as an important tool in this process.
The authors of this paper have committed themselves to performing this work in an equitable, inclusive, and just environment, and we hold ourselves accountable, believing that the best science is contingent on a good research environment. We acknowledge the Deep Skies Lab as an open community of multi-domain experts and collaborators. This community was important for the development of this project.
This manuscript has been supported by Fermi Research Alliance, LLC under Contract No. DE-AC02-07CH11359 with the U.S. Department of Energy, Office of Science, Office of High Energy Physics. This research has been partially supported by the High Velocity Artificial Intelligence grant as part of the Department of Energy High Energy Physics Computational HEP sessions program.
This research used resources of the Argonne Leadership Computing Facility at Argonne National Laboratory, which is a user facility supported by the Office of Science of the U.S. Department of Energy under contract DE-AC02-06CH11357.
 Sandro Ackermann, Kevin Schawinski, Ce Zhang, Anna K. Weigel, and M. Dennis Turp. Using transfer learning to detect galaxy mergers. Monthly Notices of the Royal Astronomical Society, 479(1):415–425, September 2018.
 Prasanna Balaprakash, Romain Egele, Misha Salim, Stefan Wild, Venkatram Vishwanath, Fangfang Xia, Tom Brettin, and Rick Stevens. Scalable reinforcement-learning-based neural architecture search for cancer deep learning research. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC ’19, New York, NY, USA, 2019. Association for Computing Machinery.
 Prasanna Balaprakash, Misha Salim, Thomas D. Uram, Venkatram Vishwanath, and Stefan M. Wild. DeepHyper: Asynchronous hyperparameter search for deep neural networks. In 2018 IEEE 25th International Conference on High Performance Computing (HiPC), pages 42–51, 2018.
 Aleksandra Ćiprijanović, Gregory F. Snyder, Brian Nord, and Joshua E. G. Peek. DeepMerge: Classifying high-redshift merging galaxies with deep neural networks. Astronomy and Computing, 32:100390, July 2020.
 Gabriela Csurka. A Comprehensive Survey on Domain Adaptation for Visual Applications, pages 1–35. Springer International Publishing, Cham, 2017.
 Yaroslav Ganin, Evgeniya Ustinova, Hana Ajakan, Pascal Germain, Hugo Larochelle, François Laviolette, Mario Marchand, and Victor Lempitsky. Domain-adversarial training of neural networks. Journal of Machine Learning Research, 17(59):1–35, 2016.
 Arthur Gretton, Karsten Borgwardt, Malte J. Rasch, Bernhard Schölkopf, and Alexander J. Smola. A Kernel Two-Sample Test. Journal of Machine Learning Research, 13(25):723–773, 2012.
 Arthur Gretton, Dino Sejdinovic, Heiko Strathmann, Sivaraman Balakrishnan, Massimiliano Pontil, Kenji Fukumizu, and Bharath K. Sriperumbudur. Optimal kernel choice for large-scale two-sample tests. In F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 25, pages 1205–1213. Curran Associates, Inc., 2012.
 Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep Residual Learning for Image Recognition. arXiv e-prints, page arXiv:1512.03385, December 2015.
 Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. CoRR, abs/1412.6980, 2015.
 Sinno Jialin Pan, I. W. Tsang, J. T. Kwok, and Qiang Yang. Domain adaptation via transfer component analysis. Trans. Neur. Netw., 22(2):199–210, February 2011.
 William J. Pearson, Lingyu Wang, James W. Trayford, Carlo E. Petrillo, and Floris F. S. van der Tak. Identifying galaxy mergers in observations and simulations with deep learning. Astronomy & Astrophysics, 626:A49, June 2019.
 Joop Schaye, Robert A. Crain, Richard G. Bower, Michelle Furlong, Matthieu Schaller, Tom Theuns, Claudio Dalla Vecchia, Carlos S. Frenk, I. G. McCarthy, John C. Helly, Adrian Jenkins, Y. M. Rosas-Guevara, Simon D. M. White, Maarten Baes, C. M. Booth, Peter Camps, Julio F. Navarro, Yan Qu, Alireza Rahmati, Till Sawala, Peter A. Thomas, and James Trayford. The EAGLE project: simulating the evolution and assembly of galaxies and their environments. Monthly Notices of the Royal Astronomical Society, 446(1):521–554, January 2015.
 Leslie N. Smith and Nicholay Topin. Super-convergence: very fast training of neural networks using large learning rates. In Tien Pham, editor, Artificial Intelligence and Machine Learning for Multi-Domain Operations Applications, volume 11006, pages 369–386. International Society for Optics and Photonics, SPIE, 2019.
 Alexander J. Smola, Arthur Gretton, Le Song, and Bernhard Schölkopf. A Hilbert space embedding for distributions. In Algorithmic Learning Theory, Lecture Notes in Computer Science 4754, pages 13–31, Berlin, Germany, October 2007. Max-Planck-Gesellschaft, Springer.
 Gregory F. Snyder, Vicente Rodriguez-Gomez, Jennifer M. Lotz, Paul Torrey, Amanda C. N. Quirk, Lars Hernquist, Mark Vogelsberger, and Peter E. Freeman. Automated distant galaxy merger classifications from Space Telescope images using the Illustris simulation. Monthly Notices of the Royal Astronomical Society, 486(3):3702–3720, July 2019.
 Mark Vogelsberger, Shy Genel, Volker Springel, Paul Torrey, Debora Sijacki, Dandan Xu, Greg Snyder, Dylan Nelson, and Lars Hernquist. Introducing the Illustris Project: simulating the coevolution of dark and visible matter in the Universe. Monthly Notices of the Royal Astronomical Society, 444(2):1518–1547, October 2014.
 Mei Wang and Weihong Deng. Deep visual domain adaptation: A survey. Neurocomputing, 312:135–153, 2018.
 Garrett Wilson and Diane J. Cook. A survey of unsupervised deep domain adaptation. ACM Trans. Intell. Syst. Technol., 11(5), July 2020.
 Donald G. York, J. Adelman, John E. Anderson, Jr., Scott F. Anderson, James Annis, Neta A. Bahcall, J. A. Bakken, Robert Barkhouser, Steven Bastian, Eileen Berman, William N. Boroski, et al. (SDSS Collaboration). The Sloan Digital Sky Survey: Technical Summary. The Astronomical Journal, 120(3):1579–1587, September 2000.
 Yinghua Zhang, Yu Zhang, Ying Wei, Kun Bai, Yangqiu Song, and Qiang Yang. Fisher Deep Domain Adaptation. arXiv e-prints, page arXiv:2003.05636, March 2020.