Cycle-StarNet: Bridging the gap between theory and data by leveraging large datasets

07/06/2020
by Teaghan O'Briain, et al.

Spectroscopy provides an immense amount of information on stellar objects, and the field continues to grow with recent developments in multi-object data acquisition and rapid data analysis techniques. Current automated methods for analyzing spectra are either (a) data-driven models, which require large amounts of data with prior knowledge of stellar parameters and elemental abundances, or (b) based on theoretical synthetic models, which are susceptible to the gap between theory and practice. In this study, we present a hybrid generative domain adaptation method that turns simulated stellar spectra into realistic spectra by learning from large spectroscopic surveys. We use a neural network to emulate computationally expensive stellar spectra simulations, and then train a separate unsupervised domain-adaptation network that learns to relate the generated synthetic spectra to observational spectra. Consequently, the network essentially produces data-driven models without the need for a labeled training set. As a proof of concept, two case studies are presented. The first is the auto-calibration of synthetic models without using any standard stars: synthetic models are morphed into spectra that resemble observations, thereby reducing the gap between theory and observations. The second is the identification of the elemental sources of missing spectral lines in the synthetic modelling. These sources are predicted by interpreting the differences between the domain-adapted and original spectral models. To test our ability to identify missing lines, we use a mock dataset and show that, even with noisy observations, absorption lines can be recovered when they are absent in one of the domains. While we focus on spectral analyses in this study, the method can be applied to other fields that use large data sets and are currently limited by modelling accuracy.


1 Introduction

Using theoretical models to decipher stellar spectra in terms of stellar properties is difficult. It requires detailed modeling of the photospheric surface layers, understanding a myriad of atomic and plasma processes, and calculating the radiative transfer through complex stellar atmospheres. Nevertheless, stellar spectra are one of the most important data sources that we have to understand stars.

Classically, ab initio methods that compare theoretical stellar spectra directly to observations have been used for decades. The comparison is typically performed manually (e.g., Sneden et al., 2008; Aoki et al., 2013; Venn et al., 2020), but with the launch of massively multiplexed, higher-resolution stellar spectroscopic surveys in the past few years (Gilmore et al., 2012; Dalton et al., 2014; Buder et al., 2018; Holtzman et al., 2018), the data analysis approaches have become increasingly automated (Yanny et al., 2009; Ting et al., 2017a, 2019; Fabbro et al., 2018; Zhang et al., 2019; Bialek et al., 2020; Guiglion et al., 2020).

Unfortunately, ab initio methods typically suffer from the differences between theory and practice, referred to as the “synthetic gap.” The synthetic gap can be induced by both theoretical systematics and instrumental factors. In terms of theoretical systematics, many assumptions are made during spectral modeling; e.g., stellar atmospheres are often assumed to be one-dimensional, in hydrostatic equilibrium, and in local thermodynamic equilibrium. These assumptions often fail, causing systematic offsets between theoretical spectra and actual measurements. Instrumental factors can further introduce signatures that are not reproduced by theoretical modelling; e.g., telluric lines imposed by the Earth’s atmosphere, and the image formation on the detector along the light path through the telescope. Modern spectroscopic pipelines (e.g., Ballester et al., 2000; Martioli et al., 2012) must make accurate assumptions about these instrumental signatures in order to reproduce realistic and consistent spectra. These assumptions can contribute to the synthetic gap and limit the capabilities of ab initio methods.

Previous work has attempted to overcome the synthetic gap between theory and observations. For instance, efforts have been made towards incorporating non-LTE and 3D hydrodynamic effects in the model atmospheres (e.g., Amarsi, 2015; Amarsi et al., 2016; Kovalev et al., 2019). Other methods isolate spectral regions, using only those where the astrophysics and instrumental effects are better understood (e.g., Jahandar et al., 2017; Ting et al., 2019). Furthermore, others have attempted to reduce the gap by augmenting the synthetic data, sampling and adding noise to make the spectra look more realistic (e.g., Bialek et al., 2020). Despite these efforts, considerable room for improvement remains.

In contrast, methods that use the empirically observed data directly as spectral templates have been proposed (Ness et al., 2015; Ting et al., 2017a; Fabbro et al., 2018; Leung & Bovy, 2019; Xiang et al., 2019). These “data-driven” methods skip the direct use of synthetic spectra, but depend on a priori knowledge of the stellar parameters and elemental abundances of a large training set. These methods then learn a model that directly translates spectra into physical characteristics, or vice versa. With the increasing number of amassed spectra, training such data-driven methods has become more feasible.

However, the stellar labels determined from data-driven models are limited in accuracy. For instance, the spectra that these models are applied to are often processed by the same pipeline that produced the spectra used to train the data-driven models. Naturally, this raises doubts about whether the model is learning actual physics or simply inheriting the biases of the original pipeline. In addition, systematic errors in the original stellar models used to determine the stellar parameters and elemental abundances will be hidden. Lastly, building data-driven models requires high quality data for training, often in the form of high signal-to-noise spectra. In most cases, collecting a sufficient number of high quality empirical templates that span the full stellar parameter and elemental abundance ranges can be difficult, if not impractical.

In this study, we propose a novel solution, cycle-starnet, that overcomes the synthetic gap without suffering from the shortcomings of data-driven methods. At its core, cycle-starnet is trained to transform data from the synthetic domain to the observed spectral domain. To accomplish this, the network leverages recent advances in machine learning methodologies; specifically, Domain Adaptation (Liu et al., 2017; Zhu et al., 2017). Furthermore, the method can work directly with noisy training spectra, and implicitly denoise them. In essence, the domain adaptation is accomplished by forcing the two domains to share an abstract representation, which we use to exploit the connections found between the domains.

cycle-starnet shows how auto-calibrated data-driven models can be built from unlabelled observed spectra. In other words, our approach paves the way to alleviating the critical limitation of data-driven models (the need to know the stellar labels a priori) while, at the same time, bridging the synthetic gap that plagues existing ab initio spectral analysis methods.

This paper is organized as follows: In Section 2, we detail the critical insights of cycle-starnet. In Section 3, the technical details of cycle-starnet are described. In Section 4, cycle-starnet is used in two case studies; in particular, correcting the systematics in theoretical models through domain adaptation, and identifying missing spectral features in the synthetic models. In Section 5, the advantages of cycle-starnet compared to other spectral analysis techniques are discussed, as well as its limitations. We conclude the study in Section 6.

2 Motivation and Overview of cycle-starnet

The main goal of cycle-starnet is to learn the connection between two sets of unlabelled spectra, and how to “morph” from one domain to another. In other words – adopting terminology from the area of Domain Adaptation in Machine Learning – cycle-starnet can transfer spectral models from the synthetic domain to the observed domain, and by doing so, it corrects for the systematic errors in the synthetic modelling.

Domain adaptation has a long history in the field of Machine Learning. With the advent of the Generative Adversarial Network (GAN), GAN-based domain adaptation has seen numerous successes. For example, Zhu et al. (2017) built a domain transfer model capable of translating photographs from daytime settings to nighttime settings (and vice versa), while keeping the content of the photos the same. In our work, we apply a similar method to translate between two spectral domains. As an analogy, one can think of the robust spectral features as the content of the photos (which both domains share), whereas the day/night “context” corresponds to the systematics that we want to correct for.

An important aspect of these domain adaptation methods (including ours) is that the data is unpaired across the two domains. In other words, we are provided with data from each domain (e.g., the synthetic and observed spectra), but we do not have samples from one domain that correspond to samples in the opposite domain. This is unlike other proposed data-driven models (e.g., Ness et al., 2015; Ting et al., 2017b; Xiang et al., 2019), which assume that the corresponding stellar labels of the observed spectra are known a priori, and then learn the label-spectra translation through supervised learning. Instead, for unsupervised domain adaptation, this mapping must be built with unpaired data. Not requiring paired samples is ideal for future uses in stellar spectroscopy because – when obtaining newly observed spectra – no prior assumptions or knowledge are needed regarding the stellar parameters and elemental abundances of the stars.

Figure 1: A schematic diagram depicting the proposed method to provide a link between two domains of spectra: the synthetic domain, $X_{syn}$, and the observed domain, $X_{obs}$. This is accomplished by creating a shared space in which both domains have a representation, called the shared latent space, $Z_{sh}$. The common space is created via autoencoders, where $E$ represents the encoder (the dimensionality-reduction component) and $G$ represents the decoder (the spectral-generation component).

We will elaborate on the technical details of cycle-starnet in Section 3; here, we focus on the insights. The critical insight is that, while the samples in the two domains are unlabelled, they should share the same latent variable space if they represent the same underlying objects. In the case of stellar spectra, the shared latent variables are the underlying stellar labels (stellar parameters and elemental abundances) that define the spectra. (Note that we do not investigate the impact of some astrophysical line-broadening effects that may vary between elements, e.g., Stark broadening, hyperfine corrections, and radiative damping parameters.) If the two domains span the same range of stellar labels – by enforcing the two latent spaces to “agree” with each other – the two domains end up communicating, and hence, share the same knowledge.

In order to achieve this goal, there are two mission-critical components of cycle-starnet: (a) a method to extract the underlying hidden abstractions (or latent space) of each domain, essentially performing a dimensionality reduction of spectra, and (b) an algorithm to ensure that the abstractions for the two domains are the same. The former is done by learning to extract information with two separate auto-encoders for the individual domains. As for the latter, the individual abstractions are made to be the same by forcing the latent spaces of the two domains to be “shared”. This implies that the latent embeddings from one domain are also valid in the other domain. In practice, this means that the auto-encoder networks are able to reproduce samples within the same domain, as well as map samples from one domain to the other. This concept is visualized in Figure 2.

On top of this framework, inspired by Gonzalez-Garcia et al. (2018), we apply a twist that strengthens our method: we leave room for a non-shared latent space. In more detail, the synthetic domain is only able to show variations that are a subset of what happens in real data. For example, the observed data might have instrumental variations that are not part of the synthetic spectra. Therefore, without leaving room for a non-shared latent space (which applies only to the observed domain), it would be impossible for the framework to model phenomena such as instrumental variation. By providing the framework with the freedom of a non-shared latent space, this issue is mitigated.

3 Methodology

In this section, we lay out the details of cycle-starnet, which consists of two key components. The first component is an unsupervised domain adaptation algorithm that transforms spectra from one spectral domain to another. The second component is a spectral emulator that provides a connection between transferred spectra and physical parameters. We detail the two components in Section 3.1 and Section 3.2, respectively.

Figure 2: A graphical representation of the cycle-starnet framework. cycle-starnet is an unsupervised domain adaptation technique that allows spectra from one domain to morph into another domain. In particular, cycle-starnet creates a shared latent-space, $Z_{sh}$ (Section 3.1.1), through which spectra can be transferred to the opposite domain. The shared latent-space is achieved through a combination of auto-encoder reconstruction losses (Section 3.1.4), generative adversarial learning (Section 3.1.5), and cycle-consistency (Section 3.1.6). We denote $E$ as the encoders, $G$ as the decoders, and $C$ as the critic networks. On top of that, to retrace the latent representations back to the physical stellar labels, we impose an additional surrogate physics network (the payne), which emulates spectra, $x_{syn}$, from stellar labels, $\ell$ (Section 3.2). The individual components discussed in Section 3 are annotated in the plot.

3.1 Domain adaptation

The primary motivation for cycle-starnet is the capacity to bridge the gap between observed and synthetic data sets using unsupervised methods; i.e., learning hidden aspects of the two domains automatically, without any human supervision. For ease of explanation, we will refer to the domain of synthetic data as the “synthetic domain”, $X_{syn}$, and that of the observed data as the “observed domain”, $X_{obs}$; we denote individual spectra from these two domains as $x_{syn}$ and $x_{obs}$, respectively. Note, however, that cycle-starnet is highly flexible and can be used to transform between any two domains; it is not restricted to transferring between synthetic and real observations. For example, one could also transfer between spectra obtained from different spectrographs, or between two different synthetic models, which we leave as future applications.

3.1.1 The Latent Space

Shared latent space

In order to learn the mapping from one set of spectra to the other, we propose a method based on the UNsupervised Image-to-image Translation Networks (UNIT, Liu et al., 2017). Roughly speaking, both datasets are encoded independently down to a shared representation. We denote this shared latent-space as $Z_{sh}$, and illustrate this concept in Figure 1.

Once the shared space is created, a synthetic spectrum can be mapped to the latent-space, and the latent representation can then be used to create the corresponding spectrum in the observed domain. Furthermore, as a by-product of this shared latent-space, we create four domain mappings, which can be written as (1) $X_{syn} \to X_{syn}$, (2) $X_{obs} \to X_{obs}$, (3) $X_{syn} \to X_{obs}$, and (4) $X_{obs} \to X_{syn}$. Of these, the translation $X_{syn} \to X_{obs}$, the mapping of imperfect synthetic models to the observed domain, is the main focus of this paper.

Split latent space

While the transfer of spectra between domains requires a shared latent space, not all of the information in one domain is present in the other. This is especially true when relating synthetic and observed spectra. For example, two stars in the observed domain may have the same set of stellar labels, yet vary in other characteristics. In other words, the two domains could have “shared” characteristics, but also unique properties of their own. The latent representation of the synthetic spectra can therefore be considered a subset of that of the observed spectra, since the latter have other defining features from instrumental profiles (e.g., the line spread function) and observational effects (e.g., telluric features) that might not be fully captured in the synthetic models. To account for this, in addition to the shared latent variables (common to both domains), we introduce a split latent-space, $Z_{sp}$, which represents information that is unique to the observed domain.

3.1.2 Architecture

To implement the proposed task, a framework is constructed out of encoders and decoders, as shown in Figure 2. One unique aspect of this architectural design is that we impose a hierarchical structure on the encoder-decoder pairs, which is used to facilitate training. Namely, two low-level encoder-decoder pairs ($E^{low}_{syn}$-$G^{low}_{syn}$ and $E^{low}_{obs}$-$G^{low}_{obs}$) are implemented that are dedicated to capturing domain-specific changes within spectra. We then utilize a high-level encoder-decoder pair ($E_{sh}$-$G_{sh}$) that further abstracts the latent space; this pair is shared between both domains. Lastly, to implement the split latent space, we also make use of a second high-level encoder-decoder pair ($E_{sp}$-$G_{sp}$) for the observed domain. Note that the dimensionality of the data is reduced as it goes through the encoders – thus the data is abstracted – while the opposite happens as it goes through the decoders.

This architectural design allows low-level, data-related information to be learned by the domain-specific encoder-decoders. At the same time, the shared encoder-decoder learns to abstract high-level physical concepts that are common to both domains. While this design choice was motivated by the architecture used for UNIT (Liu et al., 2017), our framework is unique in its use of the split encoder-decoder, which was found to improve convergence. Another key difference between cycle-starnet and UNIT is that we implement deterministic auto-encoders instead of variational auto-encoders. The exact architectural designs of these networks are outlined in Appendix A.
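To make the hierarchical layout concrete, the sketch below shows one possible PyTorch arrangement of the encoder-decoder pairs. The layer sizes, the helper names (mlp, encode_obs, decode_obs), and the additive combination of the shared and split branches in the decoder are illustrative assumptions, not the configuration of Appendix A; the 7000-pixel and 600-dimensional sizes follow the numbers quoted in Section 4.1.4.

```python
import torch
import torch.nn as nn

def mlp(sizes):
    """Stack of fully connected layers with LeakyReLU between them."""
    layers = []
    for i in range(len(sizes) - 1):
        layers.append(nn.Linear(sizes[i], sizes[i + 1]))
        if i < len(sizes) - 2:
            layers.append(nn.LeakyReLU(0.1))
    return nn.Sequential(*layers)

n_pix, n_low, n_shared, n_split = 7000, 1024, 600, 16  # illustrative sizes

# Low-level, domain-specific pairs: capture domain-specific structure.
enc_low_syn, dec_low_syn = mlp([n_pix, n_low]), mlp([n_low, n_pix])
enc_low_obs, dec_low_obs = mlp([n_pix, n_low]), mlp([n_low, n_pix])

# High-level pair shared by both domains: maps to the shared latent space Z_sh.
enc_shared, dec_shared = mlp([n_low, n_shared]), mlp([n_shared, n_low])

# Second high-level pair, observed domain only: the split latent space Z_sp.
enc_split, dec_split = mlp([n_low, n_split]), mlp([n_split, n_low])

def encode_obs(x_obs):
    """Observed spectrum -> (shared, split) latent representations."""
    h = enc_low_obs(x_obs)
    return enc_shared(h), enc_split(h)

def decode_obs(z_sh, z_sp):
    """(shared, split) latents -> observed spectrum; how the two branches
    are combined (here, by addition) is an assumption of this sketch."""
    return dec_low_obs(dec_shared(z_sh) + dec_split(z_sp))
```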

3.1.3 Training the Network

In order for the network to be able to translate from one domain to the other, it should be able to perform – simultaneously – the following tasks:

  1. The encoder-decoders should be able to abstract and de-abstract spectra; meaning they should be able to map spectra to latent representations, then back to spectra within each domain.

  2. They should also be able to transfer spectra from one domain to the other. Once transferred, the spectra produced should look as if they are from the resulting domain.

  3. They should retain physical meaning within the transferred spectra. Therefore, when a spectrum is transferred from one domain to the other, it is once again transferred back to the original domain (thus forming a cycle). This cycled spectrum should be identical to the original spectrum.

These three objectives are formulated as three different loss functions, which we combine into a single loss function for optimization. We denote (1) the loss related to the within-domain reconstruction as $\mathcal{L}_{rec}$, (2) the domain-transfer loss as $\mathcal{L}_{adv}$, and (3) the cycle-reconstruction loss as $\mathcal{L}_{cyc}$. Since these losses are replicated for both domains, the overall loss used during training can be written as

$$\mathcal{L} = \lambda_{rec}\left(\mathcal{L}_{rec}^{syn} + \mathcal{L}_{rec}^{obs}\right) + \lambda_{adv}\left(\mathcal{L}_{adv}^{syn} + \mathcal{L}_{adv}^{obs}\right) + \lambda_{cyc}\left(\mathcal{L}_{cyc}^{syn} + \mathcal{L}_{cyc}^{obs}\right), \quad (1)$$

where the $\lambda$s are hyper-parameters that control the influence of each term.

This loss formulation is similar to that in Liu et al. (2017), except that we adopt a mean squared distance instead of a mean absolute distance. Each term is described below; for specific training details, see Appendix B.
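As a minimal sketch, and assuming equal (untuned) weights, the combined objective of Eq. 1 could be assembled as follows; the function and variable names are placeholders.

```python
# Placeholder hyper-parameters; the tuned values belong in Appendix B.
lambda_rec, lambda_adv, lambda_cyc = 1.0, 1.0, 1.0

def total_loss(rec_syn, rec_obs, adv_syn, adv_obs, cyc_syn, cyc_obs):
    """Eq. 1: weighted sum of the three objectives over both domains."""
    return (lambda_rec * (rec_syn + rec_obs)
            + lambda_adv * (adv_syn + adv_obs)
            + lambda_cyc * (cyc_syn + cyc_obs))
```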

3.1.4 Within-Domain Reconstruction — $\mathcal{L}_{rec}$

The first objective is to ensure that the encoder-decoders are able to reconstruct data within each domain.

First, we introduce the shorthand notations

$$\tilde{x}_{syn} = G_{syn}\left(E_{syn}(x_{syn})\right), \qquad \tilde{x}_{obs} = G_{obs}\left(E_{obs}(x_{obs})\right). \quad (2)$$

Here, for simplicity, we have denoted the composition of both encoder-decoder pairs, $E^{low}$-$G^{low}$ and $E_{sh}$-$G_{sh}$, as $E$-$G$. With this notation, the within-domain reconstruction loss function can be written as

$$\mathcal{L}_{rec} = \Delta\left(\tilde{x}_{syn},\, x_{syn}\right) + \Delta\left(\tilde{x}_{obs},\, x_{obs}\right), \quad (3)$$

where $\Delta$ is the distance function. For the synthetic domain, a standard Mean Squared Error (MSE) loss is minimised, while for the observed domain, an MSE with samples weighted by the spectrum uncertainties (and bad pixels masked) is minimised.
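A minimal sketch of the two distance functions described above, assuming per-pixel uncertainties sigma and a binary good-pixel mask for the observed domain (names are placeholders):

```python
import torch

def recon_loss_syn(x_rec, x):
    """Synthetic domain: plain mean squared error."""
    return torch.mean((x_rec - x) ** 2)

def recon_loss_obs(x_rec, x, sigma, mask):
    """Observed domain: MSE weighted by the per-pixel uncertainties, with
    bad pixels excluded via a binary mask (1 = good pixel)."""
    chi2 = ((x_rec - x) / sigma) ** 2
    return torch.sum(chi2 * mask) / torch.clamp(mask.sum(), min=1.0)
```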

3.1.5 Cross-Domain Translation — $\mathcal{L}_{adv}$

As for the cross-domain translation, recall that there is no direct pairing between spectra in the two domains. To overcome this lack of pairing, as in UNIT (Liu et al., 2017), we employ generative adversarial learning (GAN, Goodfellow et al., 2014). In other words, the translated spectra are required to “look” as though they belong in the translated domain. (In this study, we use the words “translated” and “transferred” interchangeably; both designate the result of our domain adaptation, which transforms spectra from the synthetic domain to the observed domain.) Thus, we train critics (also referred to as discriminators) to distinguish between the actual spectra from each domain and the translated spectra.

A core idea behind GANs is that – unlike typical deep network training, where a fixed criterion exists – the critic is itself a deep network trained at the same time as the original generative network (here, our encoder-decoders). Furthermore, since the generative network is trained to fool the critic – whereas the critic is trained to discriminate – the training process is an adversarial game: the critic minimizes a loss function, while the generative network maximizes it.

In the case of cycle-starnet, as shown in Figure 2, we use one critic for each domain: $C_{syn}$ and $C_{obs}$. Both critics take reconstructed and cross-domain mapped spectra as inputs. (The critics are asked to discern the reconstructed spectra from the cross-domain spectra, instead of the original spectra from the cross-domain spectra, because this facilitates the denoising of the cross-domain transferred spectra; denoising is discussed in Section 5.) The critics then predict a confidence value of whether each sample is “real” (i.e., resembling the within-domain reconstructed spectra) or “fake” (i.e., transferred from the other domain). Additionally, to help constrain the latent representations to be the same for both synthetic and observed spectra, the critic networks act on the latent space as well. More explicitly, the critics receive paired samples of spectra and their latent representations, and predict a confidence value for each pair.

Figure 3: A schematic diagram of generating systematic-corrected data-driven models. After cycle-starnet is trained, the network can create systematic-corrected spectra by mapping stellar labels, $\ell$, through the flow highlighted in this figure. Since the mapping is continuous, we can also associate spectral features with stellar labels by taking derivatives of the network with respect to the input stellar labels.

Mathematically, the training objective is defined with a binary cross-entropy function, which assigns the values 0 and 1 to the two groups (real versus fake). In our notation, we assign the value 1 to objects that truly belong to the group, and the value 0 otherwise. In short, the critic wants to assign the value 1 to all objects that are reconstructed in a domain-specific manner, and 0 to objects that are passed through the cross-domain translation. If we denote the binary cross-entropy function as $H$, the loss for the critics can be summarized as

$$\mathcal{L}_{adv} = H\left(C_{syn}(\tilde{x}_{syn}),\, 1\right) + H\left(C_{syn}(x_{obs \to syn}),\, 0\right) + H\left(C_{obs}(\tilde{x}_{obs}),\, 1\right) + H\left(C_{obs}(x_{syn \to obs}),\, 0\right). \quad (4)$$

Here, we have again used a short-hand notation for the transfer functions:

$$x_{syn \to obs} = G_{obs}\left(E_{syn}(x_{syn}),\, z_{sp}\right), \qquad x_{obs \to syn} = G_{syn}\left(E_{obs}(x_{obs})\right). \quad (5)$$

Note that $G_{obs}$ also requires the split latent variables, $z_{sp}$. Therefore, for each spectrum transferred from the synthetic to the observed domain, we choose a random observed spectrum, $x_{obs}$, encode it with the observed-domain encoders, and use its split latent values for this process. Since we do not aim for a “target” observed spectrum – but rather to construct a realistic “observed” spectrum – for the adversarial training, any $x_{obs}$ can be used without harming the generality of $x_{syn \to obs}$.

While the task of the critics is to minimize this particular loss function, the task of the cross-domain auto-encoders is the complete opposite: to fool the critics. Therefore, the adversarial objective for the generative processes is to maximize $\mathcal{L}_{adv}$ instead of minimizing it. In order to accomplish this, we simply switch the target class for the domain-transferred spectra; the reconstructed “true” spectra are not used for this particular optimization step. Training of the critic networks is alternated with training of the rest of the framework, forming a min-max setup as in typical GANs. For more details on critic training, we refer readers to Goodfellow et al. (2014).
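The alternating objectives can be sketched as below, assuming critics that end in a sigmoid (so their outputs are confidences in [0, 1]); the generator loss implements the target-class switch described above. All names are placeholders.

```python
import torch
import torch.nn.functional as F

def critic_loss(critic, x_recon, x_transferred):
    """Critic objective (cf. Eq. 4): within-domain reconstructions are
    labelled 'real' (1), cross-domain transferred spectra 'fake' (0)."""
    real = critic(x_recon.detach())
    fake = critic(x_transferred.detach())
    return (F.binary_cross_entropy(real, torch.ones_like(real))
            + F.binary_cross_entropy(fake, torch.zeros_like(fake)))

def generator_adv_loss(critic, x_transferred):
    """Adversarial objective for the auto-encoders: fool the critic by
    switching the target class of the transferred spectra to 'real'."""
    fake = critic(x_transferred)
    return F.binary_cross_entropy(fake, torch.ones_like(fake))
```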

3.1.6 Cycle-Reconstruction — $\mathcal{L}_{cyc}$

Accomplishing the within-domain reconstruction and cross-domain adversarial objectives would result in a model that can – not surprisingly – reconstruct and cross-domain transfer spectra. In addition, the constraint of applying the critics to the latent representations further provides the basis for a shared latent-space. However, there is no guarantee that for a given spectrum, $x_{syn}$, the cross-domain generated spectrum, $x_{syn \to obs}$, is the correct corresponding spectrum in the $X_{obs}$ domain. Therefore, as described in Liu et al. (2017), we enforce that the physical meaning is preserved throughout the transfer by introducing a cycle-consistency constraint. In other words, we require that a spectrum from $X_{syn}$ can be mapped to $X_{obs}$ and then back to $X_{syn}$ accurately. The same applies to spectra in the $X_{obs}$ domain.

Mathematically, we introduce the shorthand notations

$$\hat{x}_{syn} = G_{syn}\left(E_{obs}(x_{syn \to obs})\right), \qquad \hat{x}_{obs} = G_{obs}\left(E_{syn}(x_{obs \to syn}),\, z_{sp}\right). \quad (6)$$

Importantly, when transferring an observed spectrum to the synthetic domain, the information in the split latent-space is lost. Therefore, to accurately cycle-reconstruct this spectrum, the originally encoded split latent variables are used when mapping this spectrum back to the observed domain.

We can write the cycle-reconstruction loss as

$$\mathcal{L}_{cyc} = \Delta\left(\hat{x}_{syn},\, x_{syn}\right) + \Delta\left(\hat{x}_{obs},\, x_{obs}\right), \quad (7)$$

which is similar to $\mathcal{L}_{rec}$, but with full cycles.
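In code, the cycle constraint is simply “transfer, transfer back, compare”. The generic sketch below takes the domain mappings as functions; for an observed spectrum, back_to_original must reuse the originally encoded split-latent variables, as noted above. The names and the identity-mapping demo are placeholders.

```python
import torch

def cycle_loss(x, to_other_domain, back_to_original, distance):
    """Eq. 7 for one domain: map a spectrum to the other domain and back,
    then compare the cycled spectrum with the original."""
    x_cycled = back_to_original(to_other_domain(x))
    return distance(x_cycled, x)

# Demo with identity mappings; in practice these are the encoder-decoder
# compositions of Eq. 6.
x = torch.rand(7000)
mse = lambda a, b: torch.mean((a - b) ** 2)
loss = cycle_loss(x, lambda s: s, lambda s: s, mse)
```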

3.2 Spectral Emulator

While we have described how to perform domain adaptation, there is no guarantee that the shared latent representation is physically interpretable. The network has full liberty to decide what the best latent representation looks like, and as a result, an individual latent variable can be related to several physical parameters. Even if the network is properly trained and the shared latent representation is directly related to stellar labels, disentangling the two can remain a challenge. In cycle-starnet, we propose to overcome this issue by including a synthetic emulator (see Figure 2). The synthetic emulator maps stellar labels, $\ell$, to the synthetic spectra, $x_{syn}$. Subsequently, the synthetic spectra can be morphed into the observed domain via the domain adaptation network. The key here is to create a differentiable pipeline which can trace the latent representations back to the stellar labels, via the continuous mapping $\ell \to x_{syn}$.

We adopt the payne as our synthetic emulator (for details, see Ting et al., 2019), which utilizes a neural network as a physics surrogate to emulate the Kurucz ATLAS12/SYNTHE models (Kurucz, 1970; Kurucz & Avrett, 1981a; Kurucz, 1993a, 2005a). More explicitly, a multi-layer perceptron (MLP) network is trained on a set of ab initio Kurucz synthetic spectra, and the neural network learns how the spectral fluxes vary with respect to the stellar labels. (In the original version in Ting et al. (2019), the authors adopted separate MLPs to emulate the flux variation of individual wavelength pixels independently. In this study, we adopt an improved version that uses a single large MLP to emulate the synthetic spectra as a whole; we found that a single network facilitates the extraction of information from adjacent pixels, thus improving the emulation precision. This version of the payne can be found in the latest the payne github: https://github.com/tingyuansen/The_Payne.) Details are available in the original paper.
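Schematically, such an emulator is a single MLP from a label vector (25 labels in the APOGEE experiment of Section 4.1) to the full spectrum. The sketch below is in the spirit of the payne, but the layer sizes and activations are illustrative assumptions, not those of the released model.

```python
import torch
import torch.nn as nn

n_labels, n_pix = 25, 7000  # 25 stellar labels; ~7000 APOGEE pixels

# One large MLP emulating the whole spectrum at once.
emulator = nn.Sequential(
    nn.Linear(n_labels, 300), nn.LeakyReLU(0.01),
    nn.Linear(300, 300), nn.LeakyReLU(0.01),
    nn.Linear(300, n_pix),  # one output per wavelength pixel
)

labels = torch.randn(1, n_labels)  # placeholder (scaled) stellar labels
spectrum = emulator(labels)        # emulated synthetic spectrum x_syn
```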

Figure 4: The reconstruction of the synthetic domain. cycle-starnet transfers spectra through autoencoders, and the figure quantifies the “interpolation errors” from these autoencoders. Shown are the relative residuals for the test set of 7,000 synthetic spectra whose stellar labels are drawn randomly from the APOGEE-Payne catalog. The top panel shows the relative residuals between the reconstructed spectra, $\tilde{x}_{syn}$, and the original spectra, $x_{syn}$. Similarly, the bottom panel shows a comparison between the cycle-reconstructed spectra, $\hat{x}_{syn}$, and the original spectra. For each panel, the mean bias and standard deviation are stated. cycle-starnet incurs a negligible interpolation error (0.4%) and bias.

3.2.1 Extracting Spectral Feature-Label Correlations

With the synthetic spectral emulator included, cycle-starnet provides a closed connection between stellar labels and the systematics-corrected translated spectra. This can be done by evaluating the continuous flow $\ell \to x_{syn} \to x_{syn \to obs}$, as shown in Figure 3. Moreover, since each mapping – including the synthetic emulator and the auto-encoders – is continuous, by interpreting the differential relations that the network has found, we can identify spectral features and associate them with individual stellar labels.

Spectral features can be identified by calculating how individual input elemental abundances impact the output pixels in the observed domain. For example, if $x_{syn \to obs}(\ell)$ is the systematic-corrected model, then the partial derivative of the individual pixels with respect to a particular element, $\ell_X$, shows the spectral response to that element, which can be written as

$$\frac{\partial\, x_{syn \to obs}(\ell)}{\partial\, \ell_X}. \quad (8)$$

The derivative spectrum for the synthetic-domain emulator, $\partial x_{syn}(\ell)/\partial \ell_X$, can also be calculated, which provides the information held within our theoretical models. Therefore, by taking the difference between the two “response functions”, the additional information that is not contained in the synthetic models can be revealed. This method is tested in Section 4.2.
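With automatic differentiation, the response function of Eq. 8 is a single Jacobian evaluation. In the sketch below, the model argument stands for either the synthetic emulator or the full label-to-observed-domain mapping; the linear demo model is a stand-in, and all names are assumptions of this example.

```python
import torch
from torch.autograd.functional import jacobian

def response_function(model, labels, element_index):
    """Eq. 8: per-pixel derivative of the model spectrum with respect to a
    single stellar label, via automatic differentiation."""
    J = jacobian(model, labels)  # shape: (n_pix, n_labels)
    return J[:, element_index]

# Demo with a linear stand-in model; the missing-feature signal is the
# difference between the responses of the transferred mapping and of the
# synthetic emulator for the same element.
W = torch.randn(7000, 25)
demo_model = lambda labels: W @ labels
resp = response_function(demo_model, torch.zeros(25), element_index=3)
```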

4 Experiments & Results

Figure 5: Similar to Figure 4, but here we show the reconstruction capabilities for the observed domain. However, since the observations are noisy, we do not expect the “denoised” reconstructions to be exactly the same as the input spectra. Therefore, we normalize the residuals by the reported uncertainties of APOGEE. The reconstructions have a normalized standard deviation close to unity, demonstrating that the reconstruction in the observed domain is accurate and consistent with the observational uncertainties.
Figure 6: Mitigating the synthetic gap with cycle-starnet. The top panel shows a portion of the best-fitting Kurucz model for an M-giant APOGEE spectrum with solar-like abundances. While the Kurucz model is broadly consistent with the APOGEE observation, a minor synthetic gap persists. The bottom panel shows the cycle-starnet generated spectrum via domain transfer. Evidently, the transferred model exhibits better consistency with the observation, especially for cool stars like M-giants, whose spectral models were not well calibrated. On top of that, cycle-starnet also learns the imperfect continuum normalization in the data (as illustrated by the fact that some normalized flux values exceed unity) and produces transferred synthetic models that are normalized in a self-consistent manner.

In this section, we present two case studies to show that cycle-starnet can generate systematic-corrected models from unlabelled observed spectra. In Section 4.1, we show how Kurucz synthetic models translate into APOGEE observed spectra. We quantify the agreement between spectra in terms of the residuals and via a t-SNE analysis. Admittedly, the accurate agreement of spectra does not necessarily guarantee that cycle-starnet has learned actual physics. Therefore, we further investigate the derivatives of cycle-starnet in Section 4.2 to determine how the network derivatives have aligned with stellar labels. This second case study shows that the network has learned the actual physics behind spectra. Specifically, we show that cycle-starnet can associate missing spectral features in the synthetic spectra to their correct corresponding elemental abundances.

4.1 Mitigating the synthetic gap with cycle-starnet

In this section, we show how to mitigate the synthetic gap between the Kurucz models and APOGEE observations with cycle-starnet.

Figure 7: cycle-starnet mitigates the synthetic gap between the Kurucz models and the APOGEE observations. The top panel shows the difference between the 7,000 APOGEE test spectra and their corresponding best-fit Kurucz models. Even with consistent continuum normalization, the residuals are more significant than the measured uncertainties with a non-negligible bias in the residuals; evidence of the synthetic gap. In contrast, the bottom panel shows a similar comparison between the APOGEE spectra and the Kurucz models that are transferred with cycle-starnet. The transferred spectra demonstrate better consistency with the APOGEE observations and the residuals are largely consistent with the APOGEE reported uncertainties with a negligible bias.

4.1.1 Experimental setup

For this case study, atlas12/synthe models (Kurucz & Avrett, 1981b; Kurucz, 1993b, 2005b, 2013) are adopted for the synthetic domain and the payne is used as the synthetic emulator (Section 3.2). Similar to Ting et al. (2019), instead of using the default Kurucz line list, we utilized a calibrated line list by Cargile et al. (in prep.), which was tuned to better match the Solar and Arcturus FTS spectra. As for the observed domain, APOGEE DR14 spectra are adopted, which have been wavelength calibrated to vacuum to be consistent with the Kurucz models. These spectra are constructed by co-adding multiple velocity corrected visits of the same object. All spectra are continuum normalized using the same routine as in Ting et al. (2019).

For simplicity, the APOGEE-Payne catalog (the revised APOGEE catalog of stellar labels determined using the payne) is adopted as our reference. To reduce the effects of outlier spectra, only APOGEE spectra that have decent fits in the APOGEE-Payne catalog are included; i.e., those with a small reduced $\chi^2$ and a total broadening (consisting of both macroturbulence and rotation) of less than 10 km/s. Finally, only APOGEE spectra above a median signal-to-noise threshold are included, to eliminate noisy and/or saturated spectra.

Furthermore, the APOGEE-Payne catalog is randomized to showcase how cycle-starnet can perform domain adaptation with unpaired spectra. In particular, for the observed domain, we adopt APOGEE spectra from half of the objects that meet the above criteria. For the other half of the objects, the APOGEE-Payne stellar labels are used to generate the synthetic Kurucz spectra. While the APOGEE-Payne labels are used to generate the Kurucz models and remove outlier spectra, we emphasize that cycle-starnet never “sees” the stellar labels of either domain; the training is entirely unsupervised.

The 25 stellar labels inherited from APOGEE-Payne include: $T_{\rm eff}$, $\log g$, microturbulence $v_{micro}$, additional broadening $v_{broad}$, 20 elemental abundances [X/H] (namely C, N, O, Na, Mg, Al, Si, P, S, K, Ca, Ti, V, Cr, Mn, Fe, Co, Ni, Cu, and Ge), and the isotopic ratio C12/C13. The above procedure yields a set of 97,000 spectra in each domain. From these, we adopt 80,000 spectra as the training set, 10,000 as the validation set, and withhold 7,000 as the test set. Unless stated otherwise, all of the results shown below are based on the test set.

4.1.2 Within-domain reconstruction

As discussed in Sections 3.1.4 and 3.1.6, cycle-starnet provides two approaches for reconstructing spectra within the same domain: direct reconstruction and cycle-reconstruction. We first demonstrate that these within-domain reconstructions work, which is a necessary condition for the cross-domain translations.

In Figure 4, we show the accuracy of these mappings in the synthetic domain. The auto-encoded spectra, $\tilde{x}_{syn}$, and the cycle-reconstructed spectra, $\hat{x}_{syn}$, are compared to the original spectra, $x_{syn}$. To show the relative residual, we normalize the difference by the original spectra, and display the 16th-84th percentile range as the 1σ range. As demonstrated, cycle-starnet can reconstruct the synthetic domain with negligible bias and a scatter of only 0.4%. This applies to both the direct reconstruction and the cycle-reconstruction. Recall that the cycle-reconstruction is performed by passing information first to the opposite domain (here, the observed domain) and then back to the original domain (the synthetic domain). The fact that this cycle-reconstruction is accurate demonstrates that the latent space has learned not only within-domain information, but also information from the opposite domain.

Figure 5 shows similar results for the observed domain. However, since the observations are noisy, we do not expect the “denoised” reconstructions to be exactly the same as the input spectra. Therefore, we normalize the residuals by the uncertainties of the observed spectra as reported by APOGEE. In short, the deviations for both reconstructions are minimal when compared to the original spectra, with a negligible bias and a normalized standard deviation close to unity. This demonstrates that cycle-starnet is also able to reconstruct spectra in the noisy observed domain. Finally, the fact that the normalized standard deviation is close to 1 demonstrates that the reconstructions are implicitly denoised.

Nevertheless, outliers do exist, and some pixels show more substantial deviations: a small fraction of the pixels have normalized deviations greater than 3σ. The exact reason for these substantial variations is unclear, but we suspect some mischaracterization of the APOGEE uncertainties may be a cause. Here, the uncertainties provided by APOGEE are assumed to be calibrated and uncorrelated between pixels, which might not be strictly true, especially for the resampled and co-added spectra.

4.1.3 Domain adaptation with cycle-starnet

In Figures 6 and 7, we show examples of how cycle-starnet is capable of adapting spectra from one domain to the other – a key objective of this study. Figure 6 demonstrates this procedure for a typical M-giant with solar abundances in APOGEE. The upper panel compares the APOGEE spectrum to the corresponding best-fit Kurucz model (derived from a best-fit with the payne). It is clear that, even though the two spectra are normalized with the same procedure, the synthetic gap persists.

In contrast, the lower panel of Figure 6 compares the APOGEE spectrum to the domain transferred spectrum produced by cycle-starnet. For this, we adopt a transferred spectrum that best fits the observation (see Section 4.1.5 for details on the fitting). The transferred model illustrates how cycle-starnet can correct for the improperly modelled spectral features. Furthermore, cycle-starnet has learned to understand the imperfect continuum normalization in the data and produces transferred models that are consistent with such normalization.

In Figure 7, we analyze the residuals for all 7,000 test spectra. The top panel shows the residuals between the APOGEE spectra and best-fit Kurucz models, whereas the bottom panel shows these same residuals with the transferred synthetic spectra produced by cycle-starnet. After the domain adaptation, the spectra exhibit a much better agreement with the observations, reducing the synthetic gap. Evidently, when transferring the spectra with cycle-starnet, the sample bias is 15 times smaller, and the sample standard deviation is 1.6 times smaller, reaching almost the same precision as the within-domain reconstructions, shown in Figure 5. As previously mentioned, the Kurucz models used in this study have been generated with the improved line list by Cargile et al., (in prep.), as described in Ting et al. (2019). Consequently, the synthetic gap would be even larger if we were to use the original Kurucz models rather than those with the improved line list.

4.1.4 Visualization of domains via t-SNE

An intuitive visualization of the synthetic gap – and how we reduce it – is provided by t-Distributed Stochastic Neighbor Embeddings (t-SNE, Maaten & Hinton, 2008). t-SNE is a dimensionality reduction technique that is widely used to visualize high-dimensional spaces. Within the context of this paper, t-SNE projects spectra from a 7000-dimensional “spectral pixel” space (or a 600-dimensional latent representation) into a compressed, 2-dimensional representation. This allows one to illustrate the distribution of a high-dimensional dataset in a 2D figure, where each sample is represented by a single point, and the proximity of two points reflects the similarity of the corresponding samples. Since the t-SNE projected space is a lower-dimensional representation with arbitrary units, the axes of the plots carry no explicit dimensions.
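For reference, a projection of this kind can be produced with scikit-learn in a few lines; the random arrays below are stand-ins for the transferred and denoised observed spectra, and the sizes are reduced for a quick demo.

```python
import numpy as np
from sklearn.manifold import TSNE

n_spec, n_pix = 200, 500  # small stand-in sizes
x_syn_to_obs = np.random.rand(n_spec, n_pix)  # stand-in: transferred spectra
x_obs_rec = np.random.rand(n_spec, n_pix)     # stand-in: denoised observations

# Project both sets jointly into 2D; overlapping point clouds would indicate
# a reduced synthetic gap.
embedding = TSNE(n_components=2, perplexity=30).fit_transform(
    np.vstack([x_syn_to_obs, x_obs_rec]))
# Rows [:n_spec] are the transferred spectra, rows [n_spec:] the observations.
```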

Figure 8: The t-SNE projections of 7,000 pairs of test spectra. The left panel illustrates the synthetic gap between the Kurucz models and the APOGEE spectra; without domain adaptation, spectra from the two domains span different regions of the t-SNE projection. The middle panel shows the cycle-starnet results, which exemplify the effectiveness of the generative aspect of cycle-starnet. After domain adaptation, cycle-starnet successfully morphs the synthetic domain into the observed domain. In both panels, to have a more robust comparison (see text for details), we consider the auto-encoded “denoised” version of the observed spectra, $\tilde{x}_{obs}$, as our reference instead of the original version, $x_{obs}$. The right panel shows the comparison of the latent space, demonstrating that cycle-starnet has created a common representation for the two domains.
Figure 9: Stellar parameters recovered by Cycle-StarNet for APOGEE spectra. Typical full spectral fitting techniques such as The Payne require extensive evaluation of the quality of the synthetic models in order to isolate wavelength regions where the models do not agree with the observations (left panel). Using all pixels (middle panel) can exacerbate systematic model biases, as illustrated by the red boxes. The stellar parameters inferred by Cycle-StarNet are plotted in the right panel: Cycle-StarNet auto-calibrates the Kurucz models through domain adaptation and attains more precise stellar labels, a step toward auto-calibrating spectral models with large datasets and fitting the full spectrum without the need for a spectroscopic mask.

In Figure 8, using three separate t-SNE analyses, we show how our method mitigates the synthetic gap. The left panel shows the comparison between the Kurucz synthetic models, $x_{syn}$, and the APOGEE spectra. Consistent with Figure 7, the two domains are distinct in the t-SNE projection, which is further evidence of the synthetic gap. By contrast, the middle and right panels illustrate the results of our domain adaptation. The middle panel compares the domain transferred spectra, $x_{syn \to obs}$, to the auto-encoded version of the original observed spectra, $\tilde{x}_{obs}$. Evidently, the domain adapted synthetic spectra show close agreement with the observed spectra, demonstrating that cycle-starnet can generate accurate synthetic spectra that are almost indistinguishable from those in the observed domain.

In these first two analyses, since cycle-starnet implicitly denoises spectra, we adopted the auto-encoded versions of the observed spectra, $\tilde{x}_{obs}$, as the reference for our comparisons instead of the original noisy spectra, $x_{obs}$. The denoised versions were chosen to demonstrate that the synthetic gap is inherently due to imperfect modeling and reduction – not observational noise. The auto-encoded $\tilde{x}_{obs}$ therefore offers a more direct comparison with $x_{syn}$ and $x_{syn \to obs}$, because these are also noiseless.

Finally, the right panel shows the two latent representations of $x_{syn}$ and $x_{obs}$ produced by their respective encoders. The agreement between the latent representations demonstrates that cycle-starnet has indeed created a shared latent space that extracts common information from both domains.

Figure 10: Identification of missing spectral features with cycle-starnet. We present a mock study where we have perfect knowledge of both the synthetic and “observed” domains. We generate Kurucz spectral models, but mask 30% of the spectral features in the “synthetic” domain. We then use the original Kurucz models (with noise added) as the observed domain. The top panel shows the Kurucz model for a Solar abundance K-giant with missing spectral features. The second panel compares the systematic-corrected transferred spectrum to the actual “observed” spectrum. The third panel shows the differences between the synthetic and transferred spectra, demonstrating the missing features; the missing features of Mg, Si, Fe, and C are annotated in blue, orange, green, and red, respectively. The final panel shows the differences in the cycle-starnet derivatives between the synthetic domain and the transferred models. The difference between the two demonstrates the additional information that cycle-starnet has learned from the observed domain, yet was not contained in the synthetic models. Even with noisy and unpaired observed spectra that mimic the APOGEE observations, cycle-starnet not only correctly recovers the missing features (the second panel), but it identifies the actual elemental sources of the missing spectral features (the final panel).
Figure 11: Similar to Figure 10, but for the carbon and nitrogen spectral features. While cycle-starnet still successfully identified the missing features – especially the C- and/or N-related molecular features – it sometimes inaccurately attributes a feature to purely carbon, purely nitrogen, or both. C and N are highly degenerate and can both contribute to the same features, directly or indirectly; consequently, cycle-starnet might struggle to distinguish the exact sources, and domain knowledge is needed to disentangle them.

4.1.5 Deriving stellar parameters for APOGEE

In this section, we study the recovery of stellar parameters from APOGEE spectra with the better-calibrated Kurucz models from cycle-starnet. In particular, we fit 100,000 random APOGEE spectra taken from the observed domain and compare the results to the original stellar parameters inferred with the payne, which uses the same Kurucz models. A more complete inference framework with detailed comparison to other pipelines – as well as the study of elemental abundances – is deferred to future work.

Figure 9 illustrates the improved recovery of stellar labels by cycle-starnet. To provide a visual guide for the generally expected trend, 7 Gyr MIST isochrones are also plotted. The left panel shows the results from the original Kurucz models adopted in the payne. As discussed in Ting et al. (2019), to mitigate the synthetic gap, they constructed a spectroscopic mask specific to the dataset. The middle panel shows the results produced by the payne without this spectroscopic mask. Due to the imperfections of the Kurucz models, matching the spectra to the Kurucz models without the spectroscopic mask exacerbates the systematic biases, for example for the cool M-giants (also see Figure 6).

Lastly, the right panel shows the fit produced with cycle-starnet; i.e., with the better-calibrated Kurucz models produced through domain adaptation. Not surprisingly – and consistent with Figure 7 – the better-calibrated models provide more precise stellar parameters, which show a considerably reduced amount of scatter. Also notably, cycle-starnet can be used to fit the entire spectrum, using all of the information collected, without the need for a spectroscopic mask.
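The fitting referenced above amounts to a chi-squared minimization through the differentiable label-to-observed-spectrum pipeline. The sketch below shows one way this could look, assuming an Adam optimizer and a zero starting point in scaled-label space; the actual inference procedure used for Figure 9 may differ, and all names are placeholders.

```python
import torch

def fit_labels(transfer_model, x_obs, sigma, n_labels=25, n_steps=500, lr=1e-2):
    """Fit stellar labels to an observed spectrum by gradient descent through
    the differentiable label -> synthetic -> observed mapping."""
    labels = torch.zeros(n_labels, requires_grad=True)  # scaled-label start
    optimizer = torch.optim.Adam([labels], lr=lr)
    for _ in range(n_steps):
        optimizer.zero_grad()
        model_spec = transfer_model(labels)
        chi2 = torch.sum(((model_spec - x_obs) / sigma) ** 2)
        chi2.backward()
        optimizer.step()
    return labels.detach()
```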

4.2 Learning stellar physics with cycle-starnet

While we demonstrated in Section 4.1 that cycle-starnet successfully morphs the Kurucz models to the APOGEE observations – closing the synthetic gap – this does not guarantee that cycle-starnet has extracted useful physics. In this section, we show a possible physical interpretation of cycle-starnet using the network's derivatives; in particular, the derivatives of the individual flux intensities with respect to the elemental abundances.

4.2.1 Experimental setup

As discussed in Section 3.2.1, the derivatives (i.e., flux “responses”) with respect to elemental abundances help determine which spectral features are associated with a particular element. (This is not strictly true, because some elemental abundances, especially prolific electron donors, can also substantially change the stellar atmospheric structure, which indirectly affects all spectral features. As a result, spectral features that vary with a particular element are not necessarily due to direct atomic/molecular transitions of that element.) However, for the APOGEE observed spectra, it is impossible to know the ground truth; we simply do not know what may or may not be missing in our line list. Therefore, as a proof of concept, we consider a mock “observed” data set drawn from the Kurucz models instead, which allows us to know the ground-truth derivatives. Applications to real spectra are deferred to future studies.

The training of this version of cycle-starnet is similar to that in Section 4.1, except that this time we create a controlled observed domain. In more detail, instead of adopting the APOGEE spectra, an “observed” data set is synthesized with Kurucz models using the APOGEE-Payne labels corresponding to the observed training set of Section 4.1. Noise is added to these mock observed spectra to mimic a more realistic observed training set. In the synthetic domain, we mask approximately 30% of the absorption features by setting them to the continuum level. To summarize, this controlled experiment uses two sets of unpaired Kurucz models: the synthetic domain is composed of noiseless Kurucz models with 30% of the spectral features missing, while the observed spectra are the original Kurucz models without missing features, but with added noise (mimicking the real APOGEE spectra). This will demonstrate that cycle-starnet can learn actual physics; in particular, by learning from the data alone, cycle-starnet can correctly identify missing spectral features in the “synthetic” models and associate them with the correct corresponding elements.
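The masking step could be implemented along the following lines, where feature_slices is a hypothetical list of pixel ranges (one per absorption feature) derived from the line list; all names are placeholders, not code from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def mask_features(spectra, feature_slices, frac=0.3, continuum=1.0):
    """Reset a random ~30% of absorption features to the continuum level,
    mimicking lines that are absent from the synthetic line list."""
    spectra = spectra.copy()
    n_mask = int(frac * len(feature_slices))
    chosen = rng.choice(len(feature_slices), size=n_mask, replace=False)
    for idx in chosen:
        lo, hi = feature_slices[idx]
        spectra[:, lo:hi] = continuum  # erase the feature
    return spectra
```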

The flux derivatives of the synthetic emulator, $\partial x_{syn}/\partial \ell$, and the derivatives of the domain-transferred spectra, $\partial x_{syn \to obs}/\partial \ell$, are calculated. The former informs us of the original input line list in the “synthetic” domain (which is missing about 30% of the spectral features). The latter reveals the “true”, complete Kurucz line list learned from the “observed” data. Naturally, the differences between the two are used to identify the missing spectral features.

4.2.2 Recovering missing spectral features

The results of this domain adaptation problem are shown in Figure 10. A K-giant with solar abundances is used as our set of reference labels. The top panel shows the spectrum in the synthetic domain, $x_{syn}$, and the second panel shows the transferred synthetic spectrum, $x_{syn \to obs}$, as well as an actual observed spectrum with the same stellar labels. As seen in the second panel, when mapped to the observed domain via cycle-starnet, the missing features in the synthetic domain are correctly filled in. Similar to Figure 6, this provides evidence that cycle-starnet is able to bridge the synthetic gap.

The last two panels in Figure 10 demonstrate that, not only can cycle-starnet accurately fill in the missing features, it can also associate them with their corresponding elements. In more detail, the third panel shows the difference between the transferred spectrum and the synthetic spectrum, which highlights the missing lines and their corresponding elemental abundances. To make this clearer, we color-code the masked features by their associated elements, focusing on the four elements with a prominent presence in the APOGEE H-band: Mg, Si, Fe, and C. The final panel shows the differences between the true and recovered derivatives, as probed by cycle-starnet. In most cases, the differences in the derivatives are strongest when calculated with respect to the correct input element. This implies that the missing features are recovered with accurate associations, even though cycle-starnet was trained with noisy and unpaired observed spectra that mimic the APOGEE observations. Similar results are found for most other missing lines associated with other elemental abundances. Nonetheless, as occasionally seen in Figure 10, the gradients can be non-zero for unassociated elements, signaling that the unsupervised learning can still be improved in future studies.

While this experiment produces encouraging results for line identification, it also highlights some limitations. Specifically, cycle-starnet can struggle to identify the exact source of a feature when more than one element contributes to it – either directly or indirectly. This is especially true for the CNO molecular features (via the molecular balance of CNO, e.g., Ting et al., 2018). For instance, Figure 11 shows a few of the carbon and nitrogen features that were masked in the synthetic domain. As illustrated, cycle-starnet can assign the C- and N-related features to either C or N or both. In these cases, cycle-starnet will only be able to narrow down the potential sources, rather than identify them precisely, through a gradient analysis. However, such molecular features are usually very prominent in the spectrum, with many neighboring transitions within the wavelength region. As a result, incorporating prior domain knowledge of stellar spectroscopy could readily resolve this limitation.

5 Discussion

In this study, we showed how unsupervised domain adaptation has enormous potential to auto-calibrate model inaccuracies by exploiting large data sets. To illustrate this, we presented a case study using stellar spectroscopic surveys. In particular, we demonstrated that domain adaptation is a powerful idea that can harness the strengths of data-driven and ab initio modeling while mitigating their limitations. In the following, we first discuss the advantages of cycle-starnet compared with standard spectral calibration and fitting methods, and then discuss its limitations.

5.1 Advantages of applying cycle-starnet

A key advantage of cycle-starnet compared to existing data-driven models (e.g., Ness et al., 2015; Fabbro et al., 2018; Leung & Bovy, 2019) is that the training does not require any labels for the observed data. This is important because supervised training often inherits biases in the training labels, which are typically derived from other pipelines that adopt their own sets of models susceptible to systematics. In contrast, cycle-starnet learns a common abstraction of the synthetic and observed domains – purely from the data – and directly adapts data from the synthetic domain to the observed domain and vice versa. Therefore, unlike other approaches, cycle-starnet is not subject to biases in the training labels. Thus, cycle-starnet provides an entirely new way to construct data-driven models from large unlabelled datasets. Furthermore, one can interpret the shared representation as a multi-dimensional match, alleviating the need for spatial cross-matching between surveys. This would ultimately mitigate the selection biases seen in purely data-driven analyses.

Moreover, cycle-starnet can yield robust and denoised data-driven models, even when trained with noisy spectra. We attribute this “effective denoising” of observed spectra to the following subtleties of our method. First, the reconstruction and cycle-reconstruction loss functions are weighted by the uncertainties in the spectra; this weighting forces the network to focus on the cleaner portions of the dataset and to down-weight the noisier spectral fluxes. In addition, when provided with enough data, auto-encoders learn to reconstruct the information common across a dataset (i.e., the spectral features) and to ignore information that is highly specific to a given sample (i.e., the noise). Lastly, to facilitate convergence and training, we use the reconstructed spectra as our “true” samples for the critic networks (see Eq. 4). Therefore, the transfer of spectra from the synthetic to the observed domain is implicitly primed to produce noiseless spectra. Being able to “learn” from noisy spectra is particularly useful, as we can train with the bulk of the survey data instead of restricting ourselves to a small subset of high-S/N spectra.
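To make the first point concrete, the snippet below shows one minimal way such an uncertainty-weighted reconstruction loss can be written; the exact weighting used in cycle-starnet may differ in detail.

```python
import torch

def weighted_reconstruction_loss(x_rec, x_obs, sigma):
    """Hedged sketch of an uncertainty-weighted reconstruction loss:
    per-pixel squared residuals are divided by the reported flux variance,
    so noisy pixels contribute less to the gradient. (Illustrative form;
    the released code may weight differently.)"""
    return torch.mean((x_rec - x_obs) ** 2 / sigma ** 2)
```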

Since the network incorporates synthetic data in the training and is applied to real observations, cycle-starnet can also be regarded as an auto-calibration of the synthetic models. However, unlike standard calibration methods that are often based on a few standard stars (e.g., the Sun and Arcturus; Shetrone et al., 2015), we effectively calibrate based on all of the stars in the dataset. The critical insight here is that spectra are fundamentally low-dimensional objects that lie on a manifold once transferred to an abstract latent space – a space that corresponds to the astrophysical properties of stars and has a finite and small number of degrees of freedom. Benefiting from a large amount of unlabelled data, we apply machine learning to discover the manifold that both synthetic and observed spectra lie on. By enforcing the commonality of this manifold, auto-calibration is attained. This is a drastically different philosophy for calibrating spectral models – a calibration that relies on the redundancy in large datasets rather than on standard “ground truths.”

As demonstrated in this study, this insight can lead to a better calibration of spectral models than standard calibration techniques. This is perhaps not surprising, because the standard calibration focuses on only a few stars, which span a limited range of stellar labels. Consequently, calibrating with relatively hot stars like the Sun and Arcturus requires extrapolation when considering cooler stars like the M-giants (see Figure 6). In contrast, cycle-starnet calibrates the models using all of the available stars, which span the entire stellar parameter space of interest. Furthermore, since the network learns from the existing data, it can capture variations in the instrumental and observational effects. This allows the method to correct for effects that are challenging to model with pre-existing methods (e.g., the line spread function variation), alleviating a key roadblock when comparing models to observations.

5.2 Limitations and future implementations

While cycle-starnet has many attractive properties, its training can be delicate and may still require some fine-tuning. The difficulty mainly arises from the adversarial training setup: adversarial training pits the auto-encoder networks against the critic networks, making convergence less straightforward than in other machine learning tasks, such as supervised regression. We note that other proposed methods, including The Payne (Ting et al., 2019), The Cannon (Ness et al., 2015), AstroNN (Leung & Bovy, 2019), and StarNet (Bialek et al., 2020), are examples of supervised regression and are therefore technically easier to train.

To train cycle-starnet, we extensively explored numerous architectural choices and training details before settling on the final model. For instance, removing the adversarial losses leads to smoother convergence, but without them the latent representations are no longer shared, causing the cross-domain translations to fail. Nevertheless, adversarial training is an active field of research in machine learning, with many new ideas proposed (Miyato et al., 2018; Deshpande et al., 2018; Donahue & Simonyan, 2019) since the original UNIT paper was published. Consequently, we expect that some of these advancements would benefit and stabilize the training of future versions of cycle-starnet.
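For concreteness, the toy sketch below illustrates the alternating schedule that makes adversarial training delicate: a generator/auto-encoder step that must simultaneously reconstruct its input and fool the critic, followed by a critic step on the same batch. All networks and data here are small stand-ins, not the cycle-starnet architecture.

```python
import torch
import torch.nn as nn

# Minimal self-contained sketch of alternating adversarial training.
# All modules and data are toy stand-ins, not the cycle-starnet networks.
torch.manual_seed(0)
gen = nn.Sequential(nn.Linear(16, 16), nn.LeakyReLU(), nn.Linear(16, 16))
critic = nn.Sequential(nn.Linear(16, 32), nn.LeakyReLU(), nn.Linear(32, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(gen.parameters(), lr=1e-4)
opt_c = torch.optim.Adam(critic.parameters(), lr=1e-4)
bce = nn.BCELoss()

for step in range(1000):
    real = torch.randn(8, 16)                 # stand-in for a batch of spectra
    noisy = real + 0.1 * torch.randn(8, 16)   # stand-in for the paired input

    # (1) generator/auto-encoder step: reconstruct and fool the critic
    opt_g.zero_grad()
    fake = gen(noisy)
    loss_g = ((fake - real) ** 2).mean() + bce(critic(fake), torch.ones(8, 1))
    loss_g.backward()
    opt_g.step()

    # (2) critic step: separate "real" targets from generated samples
    opt_c.zero_grad()
    loss_c = bce(critic(real), torch.ones(8, 1)) \
        + bce(critic(gen(noisy).detach()), torch.zeros(8, 1))
    loss_c.backward()
    opt_c.step()
```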

Besides the difficulty of training in an adversarial setting, we made a few simplifying assumptions regarding the training sets of spectra:

(a) Firstly, we assumed that the uncertainties provided by APOGEE are uncorrelated and accurately determined. How a mis-characterization of the noise may skew the training has yet to be studied.

(b) The labels from both domains in this study span the same ranges, as they are randomly drawn from the APOGEE-Payne distribution. Early experiments suggest that having the two domains span a similar label space is an essential requirement of cycle-starnet. However, for a new application, we might not know the range of the stellar labels a priori; resolving this problem requires a more thorough investigation.

Furthermore, since cycle-starnet is trained on the “bulk” of the observed data, the network might not extrapolate well for out-of-distribution samples and exotic stellar objects. To mitigate the potential impact of outliers during training, we filtered the APOGEE spectra, only considering those that produced a decent fit in the original APOGEE-Payne analysis (based on a cut in reduced χ²). We do not expect a small number of odd samples to dominate the training; however, when adopting this method for other applications, it may be useful to remove outliers from the training process (e.g., through “simpler” unsupervised methods such as t-SNE, UMAP, or normalizing flows). A minimal sketch of such a quality cut is given after item (c) below.

(c) Recall that the domain adaptation process of cycle-starnet constructs a shared abstraction of both domains. The abstraction is then traced back to the stellar labels via a physics surrogate emulator. As a result, cycle-starnet auto-calibrates spectra primarily based on the original models with which the emulator is trained. As demonstrated in Figure 9, some systematics can persist even after the auto-calibration. For example, the $T_{\rm eff}$ and $\log g$ of the red clump stars are still systematically lower than the isochrones, similar to the results from the original Kurucz models. Additionally, the stellar parameters for the coolest dwarfs remain problematic. These results illustrate that the auto-calibration of cycle-starnet only works to a certain extent and may not correct for significant “zero-point” biases in the models. Including external physical priors (e.g., an isochrone prior) could help constrain these zero-point biases.
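As referenced in item (b), the following is a minimal sketch of the outlier quality cut. The reduced chi-squared form is standard, but the threshold here is a placeholder; the paper's exact cut value is not reproduced.

```python
import numpy as np

def reduced_chi2(flux, model, sigma, n_labels):
    """Reduced chi-squared of a best-fit model spectrum, with all inputs
    given as NumPy arrays of per-pixel values."""
    dof = flux.size - n_labels
    return np.sum((flux - model) ** 2 / sigma ** 2) / dof

def select_training_set(fluxes, models, sigmas, n_labels, threshold):
    """Keep only spectra whose best fit is "decent"; `threshold` is an
    illustrative placeholder, not the value used in the paper."""
    keep = [reduced_chi2(f, m, s, n_labels) < threshold
            for f, m, s in zip(fluxes, models, sigmas)]
    return np.where(keep)[0]
```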

As a final remark, cycle-starnet is a project that started two years ago. During its development, multiple unsupervised domain adaptation methods have emerged with the potential to alleviate some of the caveats mentioned above. In particular, the UNIT method has been extended to adaptation across multiple domains (Huang et al., 2018) and to smaller sample requirements (Liu et al., 2019). Normalizing flows for domain adaptation have also been shown to enforce cycle-consistency, remove the need for adversarial learning, and provide more tractable gradients and likelihoods by construction (Grover et al., 2020).

5.3 Open source

For future reproducibility, and to assist with applications to other projects, we have made the code for cycle-starnet publicly available on GitHub (https://github.com/teaghan/Cycle_SN), along with in-depth explanations of the code itself and of the training details. As noted in Section 5.2, training cycle-starnet can be sensitive to the choices of network architecture and training hyper-parameters. We therefore emphasize that the repository is intended only as a starting point for other applications.

6 Conclusion

Maximally extracting information from stellar spectra requires perfect knowledge of spectral synthesis, a goal that has remained elusive despite decades of study. In this paper, we presented a new methodology, cycle-starnet, to tackle this problem. cycle-starnet adopts ideas from domain adaptation in machine learning to mitigate model systematics. Our results are summarized below:

  1. cycle-starnet auto-calibrates for deficiencies in spectral modeling, and develops a common abstraction of both synthetic and observed data through an unsupervised network. This abstraction is related to physical stellar labels via a physics surrogate emulator network.

  2. cycle-starnet can build data-driven models via a set of unlabelled training spectra, without knowing the stellar labels of the training spectra a priori.

  3. Through the use of a split latent space, cycle-starnet can distinguish the actual astrophysical information from the spectral variations due to instrumental and observational factors. This reduces the sensitivity of the modeling accuracy to instrumental factors, such as the fiber-dependent line spread functions and the modelling of telluric features.

  4. By fitting the APOGEE spectra, we demonstrated that the auto-calibrated models produced by cycle-starnet yield more precise stellar parameters than the original model, even without adopting any spectroscopic mask or spectral windows.

  5. cycle-starnet can aid our understanding of stellar astrophysics by uncovering unknown spectral features. Testing on a mock dataset, we illustrated that cycle-starnet can recover features missing from the Kurucz models and associate them with the correct corresponding elements.

The philosophy of auto-calibrating models with domain adaptation, exemplified by cycle-starnet, is generic and can be applied to many other fields. Our results provide an entirely new path to extract information from large unlabelled datasets; harnessing advancements in Machine Learning to redefine what big data astronomy can mean for stellar spectroscopy and beyond.

TO and SB acknowledge the support provided for a portion of this research by the Natural Sciences and Engineering Research Council of Canada (NSERC) Undergraduate Student Research Awards (USRA). YST is supported by the NASA Hubble Fellowship grant HST-HF2-51425.001 awarded by the Space Telescope Science Institute. KV and SB acknowledge funding from the National Science and Engineering Research Council Discovery Grants program and the CREATE training program on New Technologies for Canadian Observatories.

Appendix A cycle-starnet Architecture

We summarize the architecture of each of the sub-networks in Table 1. Each row within the table shows a subsequent network layer, as well as the activation or normalization function (LeakyReLU, Sigmoid, InstanceNorm) applied after the layer operation. The CONV layers are standard 1D-convolutional layers; the DCONV layers are de-convolutional (or transposed convolutional) layers; and FC denotes fully-connected layers. For each layer, N, K, and S represent, respectively, the number of filters (or nodes), the size of the filters (or kernel size), and the stride-length of the convolutional operations.

Furthermore, instead of using the standard 7214 pixels from the ASPCAP reduction, we discard 47 pixels in the red chip to easily accommodate symmetrical down- and up-sampling in the convolutional networks. When a spectrum is downsampled through the encoder networks, the length of the latent representation is set by the total downsampling factor, and its channel dimension equals the number of filters in the last layer of the encoder (e.g., 25 filters for the shared latent-space; see Table 1).

The only difference between the architectures used in Sections 4.1 and 4.2 is that the split latent-space has four filters in the former and one filter in the latter. Note that, since Section 4.2 uses a mock dataset for the observed domain, there are no observation-specific variations and, in principle, no need for a split latent-space. However, to keep the two architectures relatively consistent, we simply used fewer filters.

Encoder front end, shared by both domains:
  1 CONV-(N32,K7,S4), LeakyReLU
  2 CONV-(N64,K7,S4), LeakyReLU

Encoder head for the shared latent-space:
  1 CONV-(N128,K7,S4), LeakyReLU
  2 CONV-(N256,K7,S2), LeakyReLU
  3 CONV-(N512,K7,S2), LeakyReLU
  4 CONV-(N25,K1,S1), InstanceNorm

Encoder head for the split latent-space:
  1 CONV-(N32,K7,S4), LeakyReLU
  2 CONV-(N32,K7,S2), LeakyReLU
  3 CONV-(N32,K7,S2), LeakyReLU
  4 CONV-(N4(1),K1,S1), InstanceNorm

Decoder head for the split latent-space:
  1 DCONV-(N32,K7,S2), LeakyReLU
  2 DCONV-(N32,K7,S2), LeakyReLU
  3 DCONV-(N32,K7,S4), LeakyReLU

Decoder head for the shared latent-space:
  1 DCONV-(N512,K7,S2), LeakyReLU
  2 DCONV-(N256,K7,S2), LeakyReLU
  3 DCONV-(N128,K7,S4), LeakyReLU

Decoder back end, shared by both domains:
  1 DCONV-(N64,K7,S4), LeakyReLU
  2 DCONV-(N32,K7,S4), LeakyReLU
  3 CONV-(N1,K1,S1)

Critic networks, one per domain:
  1a CONV-(N16,K7,S4), LeakyReLU
  2a CONV-(N32,K7,S4), LeakyReLU
  3a CONV-(N64,K7,S4), LeakyReLU
  4a CONV-(N128,K7,S4), LeakyReLU
  5a CONV-(N256,K7,S4), LeakyReLU
  1b CONV-(N32,K1,S1), LeakyReLU
  2b CONV-(N64,K1,S1), LeakyReLU
  3b CONV-(N128,K1,S1), LeakyReLU
  4b CONV-(N256,K1,S1), LeakyReLU
  5b CONV-(N512,K1,S1), LeakyReLU
  8 FC-(N1), Sigmoid
Table 1: A summary of the sub-network architectures. (The sub-network headings were lost in extraction and are reconstructed here from the layer structure and the surrounding text.)
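To illustrate how the layer parameters in Table 1 translate into a network, the following minimal PyTorch sketch implements the encoder path (front end plus shared-latent head). The padding values and the example pixel count are illustrative assumptions; the released code may differ in such details.

```python
import torch
import torch.nn as nn

# Minimal sketch of the encoder path from Table 1. Padding choices and
# the toy input length are assumptions, not the released implementation.
class SharedEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.front = nn.Sequential(  # front end, shared by both domains
            nn.Conv1d(1, 32, kernel_size=7, stride=4, padding=3), nn.LeakyReLU(),
            nn.Conv1d(32, 64, kernel_size=7, stride=4, padding=3), nn.LeakyReLU(),
        )
        self.head = nn.Sequential(   # head producing the shared latent-space
            nn.Conv1d(64, 128, kernel_size=7, stride=4, padding=3), nn.LeakyReLU(),
            nn.Conv1d(128, 256, kernel_size=7, stride=2, padding=3), nn.LeakyReLU(),
            nn.Conv1d(256, 512, kernel_size=7, stride=2, padding=3), nn.LeakyReLU(),
            nn.Conv1d(512, 25, kernel_size=1, stride=1),
            nn.InstanceNorm1d(25),
        )

    def forward(self, x):            # x: (batch, 1, n_pixels)
        return self.head(self.front(x))

spectrum = torch.randn(8, 1, 7168)   # toy batch; the pixel count is illustrative
z_shared = SharedEncoder()(spectrum)
print(z_shared.shape)                # (8, 25, 28): total downsampling factor of 256
```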

Appendix B cycle-starnet Training Details

cycle-starnet is trained by optimizing the loss outlined in Equation 1. An essential aspect of having the network converge correctly is determining the right combination of the weights in this equation, which control the influence of each loss term. Through various empirical tests, we determined a combination of weight values that leads to satisfactory performance.
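Schematically, the weighted objective amounts to a simple linear combination of the loss terms; the term names and default weights below are placeholders rather than the paper's tuned values.

```python
# Hedged sketch of the weighted total loss of Equation 1; the lambda
# values and term names are placeholders, not the tuned settings.
def total_loss(l_rec, l_cyc, l_adv, lam_rec=1.0, lam_cyc=1.0, lam_adv=1.0):
    return lam_rec * l_rec + lam_cyc * l_cyc + lam_adv * l_adv
```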

As discussed in Section 3.1.5, training proceeds iteratively: one iteration of training the auto-encoders – including the adversarial loss for the transfer mappings – is followed by an iteration of training the critic networks. Each iteration used a batch of 8 spectra from each domain. Counting a single batch iteration as one optimization step for both processes, the network was trained for 350,000 batch iterations with the Adam optimizer and a learning rate of 0.0001, which was decreased by a factor of 0.7 at 50k, 100k, 150k, and 200k batch iterations.
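Under the stated settings, the optimizer and learning-rate schedule can be reproduced with standard PyTorch utilities, as in this sketch; the parameter list and loop body are placeholders.

```python
import torch

# Hedged sketch of the stated training schedule: Adam with lr=1e-4,
# decayed by x0.7 at 50k/100k/150k/200k of 350k batch iterations.
params = [torch.nn.Parameter(torch.zeros(1))]      # placeholder parameters
optimizer = torch.optim.Adam(params, lr=1e-4)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[50_000, 100_000, 150_000, 200_000], gamma=0.7)

for iteration in range(350_000):
    optimizer.zero_grad()
    # ... compute the total loss for a batch of 8 spectra per domain,
    #     then call loss.backward() ...
    optimizer.step()
    scheduler.step()
```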

References

  • Amarsi (2015) Amarsi, A. M. 2015, MNRAS, 452, 1612
  • Amarsi et al. (2016) Amarsi, A. M., Asplund, M., Collet, R., & Leenaarts, J. 2016, MNRAS, 455, 3735
  • Aoki et al. (2013) Aoki, W., Beers, T. C., Lee, Y. S., et al. 2013, AJ, 145, 13
  • Ballester et al. (2000) Ballester, P., Modigliani, A., Boitquin, O., et al. 2000, The Messenger, 101, 31
  • Bialek et al. (2020) Bialek, S., Fabbro, S., Venn, K. A., et al. 2020, in prep
  • Buder et al. (2018) Buder, S., Asplund, M., Duong, L., et al. 2018, MNRAS, 478, 4513
  • Dalton et al. (2014) Dalton, G., Trager, S., Abrams, D. C., et al. 2014, in Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series, Vol. 9147, Proc. SPIE, 91470L
  • Deshpande et al. (2018) Deshpande, I., Zhang, Z., & Schwing, A. G. 2018, in Conference on Computer Vision and Pattern Recognition

  • Donahue & Simonyan (2019) Donahue, J., & Simonyan, K. 2019, in Advances in Neural Information Processing Systems, 10542
  • Fabbro et al. (2018) Fabbro, S., Venn, K. A., O’Briain, T., et al. 2018, MNRAS, 475, 2978
  • Gilmore et al. (2012) Gilmore, G., Randich, S., Asplund, M., et al. 2012, The Messenger, 147, 25
  • Gonzalez-Garcia et al. (2018) Gonzalez-Garcia, A., van de Weijer, J., & Bengio, Y. 2018, arXiv e-prints, arXiv:1805.09730. https://arxiv.org/abs/1805.09730
  • Goodfellow et al. (2014) Goodfellow, I. J., Pouget-Abadie, J., Mirza, M., et al. 2014, in Advances in Neural Information Processing Systems, 2672
  • Grover et al. (2020) Grover, A., Chute, C., Shu, R., Cao, Z., & Ermon, S. 2020, in AAAI, 4028
  • Guiglion et al. (2020) Guiglion, G., Matijevic, G., Queiroz, A. B. A., et al. 2020, arXiv e-prints, arXiv:2004.12666. https://arxiv.org/abs/2004.12666
  • Holtzman et al. (2018) Holtzman, J. A., Hasselquist, S., Shetrone, M., et al. 2018, AJ, 156, 125
  • Huang et al. (2018) Huang, X., Liu, M.-Y., Belongie, S., & Kautz, J. 2018, in Proceedings of the European Conference on Computer Vision (ECCV), 172
  • Jahandar et al. (2017) Jahandar, F., Venn, K. A., Shetrone, M. D., et al. 2017, MNRAS, 470, 4782
  • Kovalev et al. (2019) Kovalev, M., Bergemann, M., Ting, Y.-S., & Rix, H.-W. 2019, A&A, 628, A54
  • Kurucz (1970) Kurucz, R. L. 1970, SAO Special Report, 309, 291 pp
  • Kurucz (1993a) —. 1993a, SYNTHE spectrum synthesis programs and line data
  • Kurucz (1993b) —. 1993b, SYNTHE spectrum synthesis programs and line data, ed. Kurucz, R. L.
  • Kurucz (2005a) —. 2005a, Memorie della Societa Astronomica Italiana Supplementi, 8, 14
  • Kurucz (2005b) —. 2005b, Memorie della Societa Astronomica Italiana Supplementi, 8, 14
  • Kurucz (2013) —. 2013, ATLAS12: Opacity sampling model atmosphere program, Astrophysics Source Code Library. http://ascl.net/1303.024
  • Kurucz & Avrett (1981a) Kurucz, R. L., & Avrett, E. H. 1981a, SAO Special Report, 391, 139 pp
  • Kurucz & Avrett (1981b) —. 1981b, SAO Special Report, 391, 139 pp
  • Leung & Bovy (2019) Leung, H. W., & Bovy, J. 2019, MNRAS, 483, 3255
  • Liu et al. (2017) Liu, M.-Y., Breuel, T., & Kautz, J. 2017, in Advances in Neural Information Processing Systems
  • Liu et al. (2019) Liu, M.-Y., Huang, X., Mallya, A., et al. 2019, in Proceedings of the IEEE International Conference on Computer Vision, 10551
  • Maaten & Hinton (2008) Maaten, L. v. d., & Hinton, G. 2008, Journal of Machine Learning Research, 9, 2579
  • Martioli et al. (2012) Martioli, E., Teeple, D., Manset, N., et al. 2012, in Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series, Vol. 8451, Proc. SPIE, 84512B
  • Miyato et al. (2018) Miyato, T., Kataoka, T., Koyama, M., & Yoshida, Y. 2018, in International Conference on Learning Representations
  • Ness et al. (2015) Ness, M., Hogg, D. W., Rix, H.-W., Ho, A. Y., & Zasowski, G. 2015, ApJ, 808, 16
  • Shetrone et al. (2015) Shetrone, M., Bizyaev, D., Lawler, J. E., et al. 2015, ApJS, 221, 24
  • Sneden et al. (2008) Sneden, C., Cowan, J. J., & Gallino, R. 2008, ARA&A, 46, 241
  • Ting et al. (2018) Ting, Y.-S., Conroy, C., Rix, H.-W., & Asplund, M. 2018, ApJ, 860, 159
  • Ting et al. (2019) Ting, Y.-S., Conroy, C., Rix, H.-W., & Cargile, P. 2019, ApJ, 879, 69
  • Ting et al. (2017a) Ting, Y.-S., Rix, H.-W., Conroy, C., Ho, A. Y. Q., & Lin, J. 2017a, ApJ, 849, L9
  • Ting et al. (2017b) —. 2017b, ApJ, 849, L9
  • Venn et al. (2020) Venn, K. A., Kielty, C. L., Sestito, F., et al. 2020, MNRAS, 492, 3241
  • Xiang et al. (2019) Xiang, M., Ting, Y.-S., Rix, H.-W., et al. 2019, ApJS, 245, 34
  • Yanny et al. (2009) Yanny, B., Rockosi, C., Newberg, H. J., et al. 2009, AJ, 137, 4377
  • Zhang et al. (2019) Zhang, X., Zhao, G., Yang, C. Q., Wang, Q. X., & Zuo, W. B. 2019, PASP, 131, 094202
  • Zhu et al. (2017) Zhu, J.-Y., Park, T., Isola, P., & Efros, A. A. 2017, in IEEE International Conference on Computer Vision, 2223