Deep Learning at Scale for Gravitational Wave Parameter Estimation of Binary Black Hole Mergers

03/05/2019, by Hongyu Shen et al.

We present the first application of deep learning at scale to do gravitational wave parameter estimation of binary black hole mergers that describe a 4-D signal manifold, i.e., black holes whose spins are aligned or anti-aligned, and which evolve on quasi-circular orbits. We densely sample this 4-D signal manifold using over three hundred thousand simulated waveforms. In order to cover a broad range of astrophysically motivated scenarios, we synthetically enhance this waveform dataset to ensure that our deep learning algorithms can process waveforms located at any point in the data stream of gravitational wave detectors (time invariance) for a broad range of signal-to-noise ratios (scale invariance), which in turn means that our neural network models are trained with over 10^7 waveform signals. We then apply these neural network models to estimate the astrophysical parameters of black hole mergers, and their corresponding black hole remnants, including the final spin and the gravitational wave quasi-normal frequencies. These neural network models represent the first time deep learning is used to provide point-parameter estimation calculations endowed with statistical errors. For each binary black hole merger that ground-based gravitational wave detectors have observed, our deep learning algorithms can reconstruct its parameters within 2 milliseconds using a single Tesla V100 GPU. We show that this new approach produces parameter estimation results that are consistent with Bayesian analyses that have been used to reconstruct the parameters of the catalog of binary black hole mergers observed by the advanced LIGO and Virgo detectors.


I Introduction

Gravitational wave (GW) sources Abbott et al. (2016a, b, 2017a, 2017b, 2017c); The LIGO Scientific Collaboration et al. (2018) are now routinely detected by the advanced LIGO Abbott et al. (2016c); The LIGO Scientific Collaboration et al. (2015) and Virgo Acernese et al. (2015) detectors. The last two observing runs of these GW detectors indicate that, on average, one GW source has been detected for every fifteen days of analyzed data. This rate is expected to be surpassed in the upcoming third observing run, since the advanced LIGO and Virgo detectors have been undergoing commissioning since August 2017. With this enhanced sensitivity, and the correspondingly larger detection volume, the detection rate for binary black hole (BBH) mergers and binary neutron star (BNS) inspirals is expected to increase, and the first observations of neutron star-black hole (NSBH) mergers may follow The LIGO Scientific Collaboration et al. (2018).

Given the expected scale of GW discovery in upcoming observing runs, it is worthwhile to explore the use of efficient signal-processing algorithms for low-latency GW detection and parameter estimation. This work is motivated by the need to probe the deeper parameter space that is accessible to GW detectors, in real-time, and with minimal computational resources, so as to maximize the number of studies that can be conducted with GW data. This combination of constraints is a common theme for large-scale astronomical facilities, which will be producing large datasets in low latency within the next decade, e.g., the Large Synoptic Survey Telescope LSST Science Collaboration et al. (2009). Scenarios in which LSST, among other electromagnetic observatories, and advanced LIGO and Virgo work in unison, analyzing disparate datasets in real-time to realize the science goals of Multi-Messenger Astrophysics, make this work timely and relevant Allen et al. (2018, 2019).

Among a number of recent developments in signal-processing, deep learning exhibits great promise to increase the speed and depth of real-time GW searches. The first deep learning algorithms to do classification and regression of GWs emitted by non-spinning BBHs on quasi-circular orbits were presented in George and Huerta (2018) in the context of simulated LIGO noise. The extension of that study to realistic detection scenarios using real advanced LIGO noise was introduced in George and Huerta (2018). Even though these algorithms were trained to do real-time classification and regression of GWs in realistic detection scenarios for a 2-D signal manifold (non-spinning BBHs on quasi-circular orbits), the studies presented in George and Huerta (2018, 2018); Rebei et al. (2018) have demonstrated that deep learning algorithms generalize to new types of sources, enabling the identification of moderately eccentric BBH mergers, spin precessing BBH mergers, and moderately eccentric BBH signals that include higher-order modes, respectively. These studies also indicate that while the detection of these new types of GW sources is possible, it is necessary to use higher-dimensional signal manifolds to train these algorithms to improve parameter estimation results, and to go beyond point-parameter estimation analysis. This work has sparked the interest of the GW community, leading to a variety of studies including the classification of simulated BBH waveforms in Gaussian noise, GW source modeling and GW denoising of BBH mergers George et al. (2017); George and Huerta (2017); Shen et al. (2017); Huerta et al. (2018); Wei and Huerta (2019); Rebei et al. (2018); Chua et al. (2018); Gabbard et al. (2018); Fan et al. (2018); González and Guzmán (2018); Gabbard et al. (2018); Fujimoto et al. (2018); Li et al. (2017); Nakano et al. (2018).

While detection and parameter estimation are the key goals for the development of deep learning for GW astrophysics, in this article we focus on the application of deep learning for parameter estimation. At present, GW parameter estimation is done using Bayesian inference Graff et al. (2012); Veitch et al. (2015), which is a robust, yet computationally intensive approach. Given the scalability and robustness of deep learning algorithms, it is natural to explore their applicability for GW parameter estimation, the theme of this article.

Previous Work. The first exploration of deep learning for a 2-D signal manifold was presented in George and Huerta (2018, 2018). Across the range of matched-filtering signal-to-noise ratios (SNRs) considered, these neural network models measure the masses of quasi-circular BBH mergers with small mean percentage absolute errors, and achieve comparable errors for moderately eccentric BBH mergers. These results provided a glimpse of the robustness and scalability of deep neural network models, and the motivation to take these prototypical applications into a production-ready toolkit for GW parameter estimation.

Highlights of This Work. We have designed new architectures and training schemes to demonstrate that deep learning provides the means to reconstruct the parameters of BBH mergers in more realistic astrophysical settings, i.e., black holes whose spins are aligned or anti-aligned, and which evolve on quasi-circular orbits. This 4-D signal manifold represents the first application of deep learning at scale to GW data analysis, since we have utilized nearly 10 million waveforms to train neural network models that, once fully trained, can reconstruct in real-time the parameters of the BBH catalog presented by the LIGO and Virgo Scientific Collaborations in The LIGO Scientific Collaboration et al. (2018). The neural network models we introduce in this article have two different architectures. The first is tailored to the measurement of the masses of the binary components, whereas the second is used to quantify the final spin and the quasi-normal modes (QNMs) of the BH remnant. Once both neural networks are fully trained, we use them in parallel for inference studies, finding that we can reconstruct the parameters of BBH mergers within 2 milliseconds using a single Tesla V100 GPU. We also show that by measuring these parameters we can infer the individual spins of the binary components that best reproduce the observed BBH signals in real LIGO noise, providing an alternative approach to better constrain the highly degenerate spin parameter space of BBH signals.

This article is structured as follows. Section II introduces the model architectures used in these analyses and describes the construction and curation of the datasets used to train, validate and test our neural network models. It also describes a revised curriculum learning scheme for neural network training. We quantify the accuracy of these neural network models in realistic detection scenarios using real advanced LIGO noise in Section III. We put our deep learning algorithms to work in Section IV to estimate the astrophysical parameters of the BBH mergers reported in The LIGO Scientific Collaboration et al. (2018). We summarize our findings and future directions of work in Section V.

II Methods

In this section, we introduce the neural network models used for parameter estimation, and describe a novel curriculum learning scheme to accurately measure the masses of the binary components, and the final spin and QNMs of the BH remnant. We have used TensorFlow Abadi et al. (2016, 2015) to design, train, validate and test the neural network models presented in this section.

The rationale to use two neural network models stems from the fact that the masses, spins and QNMs span rather different scales. Therefore, to improve the accuracy with which deep learning can measure these parameters we have designed one neural network that is tailored to measure the masses of the binary components, and one to measure the final spin and QNMs of the remnant. The astute reader may have noticed that the final spin of the BH remnant and its QNMs have a similar range of values when the QNMs are cast in dimensionless units, and this is the approach we have followed. In practice, we train the second neural network model using the fact that the QNMs are determined by the final spin through the relation Berti et al. (2006)

$\omega_{\ell m n} = \omega^{R}_{\ell m n} + i\,\omega^{I}_{\ell m n}\,, \qquad (1)$

where $\omega^{R}_{\ell m n}$ and $\omega^{I}_{\ell m n}$ correspond to the frequency and damping time of the ringdown oscillations for the fundamental bar mode and the first overtone. We have computed the QNMs following Berti et al. (2006). One can readily translate $\omega^{R}_{\ell m n}$ into the ringdown frequency (in units of Hertz) and $\omega^{I}_{\ell m n}$ into the corresponding (inverse) damping time (in units of seconds) by computing $f_{\ell m n} = \omega^{R}_{\ell m n}/(2\pi M_f)$ and $\tau_{\ell m n} = M_f/\omega^{I}_{\ell m n}$, with the final mass expressed in seconds. $M_f$ represents the final mass of the remnant, and can be determined using Eq. (1) in Healy and Lousto (2017).
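For illustration, the sketch below converts dimensionless QNM components into a physical ringdown frequency and damping time for a given final mass; the function name, constants, and example values are ours and are not taken from the paper's code.

```python
import numpy as np

# Physical constants (SI units)
G = 6.67430e-11        # gravitational constant, m^3 kg^-1 s^-2
C = 2.99792458e8       # speed of light, m s^-1
MSUN = 1.98892e30      # solar mass, kg

def qnm_to_physical(omega_R, omega_I, final_mass_msun):
    """Convert dimensionless QNM components (in units of 1/M_f) into a
    ringdown frequency in Hz and a damping time in seconds."""
    mass_in_seconds = G * final_mass_msun * MSUN / C**3   # M_f in seconds
    f_ringdown = omega_R / (2.0 * np.pi * mass_in_seconds)
    tau_damping = mass_in_seconds / omega_I
    return f_ringdown, tau_damping

# Example with values of the order reported later in Table 4
f, tau = qnm_to_physical(omega_R=0.53, omega_I=0.08, final_mass_msun=63.0)
print(f"f ~ {f:.0f} Hz, tau ~ {tau * 1e3:.1f} ms")
```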

As we describe below, we have found that to accurately reconstruct the masses of the binary components, it is necessary to use a more complex and deeper neural network architecture. It is worth mentioning that once these models are fully trained, a single GPU is sufficient to perform regression analyses in milliseconds using both neural network models.

II.1 Neural network model to measure the properties of the black hole remnant

The neural network model consists of two main parts: a shared root component for all physical parameters, and three leaf components, one for each individual parameter (the final spin and the two quasi-normal mode quantities), as illustrated in the left panel of Figure 1 and in Table 1. The model architecture resembles a rooted tree. The root is composed of seven convolutional layers, and its output is shared by the leaves. Each leaf component has the same network architecture with three fully connected layers. This approach is inspired by the hierarchical self-decomposition of convolutional neural networks described in Sairam et al. (2018); Hu et al. (2018). The key idea behind this approach is that the neural network is composed of a general feature extractor for the first seven layers, followed by sub-networks that take as input the output of the general feature extractor.

The rationale for splitting after the universal structure is to use sub-structures that focus on different sub-groups of the data. As a simile: even though the human body has multiple limbs ("leaves"), human motion is controlled by the overall motion of the body ("the root"). In practice this means that the tree structure of our models leverages the hierarchical structure of the data: it first extracts universal features through the root, and then passes this information to the different sub-networks ("leaves"), which learn specialized features for the different physical parameters. Notice that the root also helps prevent overfitting in the "leaves", since each leaf is optimized through the root.
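The following sketch shows this rooted-tree layout in TensorFlow; the filter counts, kernel sizes, and leaf widths are placeholders rather than the configurations of Table 1, and the leaf names simply label the remnant parameters discussed in this section.

```python
import tensorflow as tf

def build_tree_network(input_length=8192, n_root_layers=7,
                       leaf_names=("a_f", "omega_R", "omega_I")):
    """Rooted-tree regression network: a shared convolutional root followed by
    one fully connected leaf per physical parameter. Filter counts, kernel
    sizes, and leaf widths are placeholders, not the settings of Table 1."""
    inputs = tf.keras.Input(shape=(input_length, 1), name="whitened_strain")

    # Shared root: a generic feature extractor over the whitened time series
    x = inputs
    for _ in range(n_root_layers):
        x = tf.keras.layers.Conv1D(32, kernel_size=8, activation="relu",
                                   padding="valid")(x)
        x = tf.keras.layers.MaxPool1D(pool_size=2, strides=2)(x)
    x = tf.keras.layers.Flatten()(x)

    # One leaf (three fully connected layers) per physical parameter
    outputs = []
    for name in leaf_names:
        h = tf.keras.layers.Dense(128, activation="relu")(x)
        h = tf.keras.layers.Dense(64, activation="relu")(h)
        outputs.append(tf.keras.layers.Dense(1, name=name)(h))

    return tf.keras.Model(inputs=inputs, outputs=outputs)

model = build_tree_network()
model.compile(optimizer="adam", loss="mse")
```

Because every leaf is driven by the same root features, gradient updates from all three parameters shape the shared feature extractor, which is the regularizing effect described above.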

Figure 1: Architecture of the neural network used to estimate the final spin and quasi-normal modes of the black hole remnant.

Another change to the conventional architecture is that we remove the nonlinear activation in the second-to-last layer of the leaf component, i.e., it is a linear layer with an identity activation function (see Table 1). This allows more neurons to be activated and passed to the final layer. As discussed in Dong et al. (2017), removing the nonlinear activation in some intermediate layers smooths the gradients and maintains the correlation of the gradients in the neural network weights, which, in turn, allows more information to be passed through the network as the depth increases.
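As a concrete, hedged example of this modification, the leaf head below uses a linear (identity-activation) layer immediately before the output layer; the layer widths are illustrative only.

```python
import tensorflow as tf

def leaf_head(features, name):
    """Leaf head with the nonlinearity removed in the second-to-last layer:
    a linear (identity-activation) Dense layer feeds the final Tanh output,
    mirroring the last two rows of Table 1. Layer widths are illustrative."""
    h = tf.keras.layers.Dense(128, activation="relu")(features)
    h = tf.keras.layers.Dense(64, activation=None)(h)  # identity activation
    return tf.keras.layers.Dense(1, activation="tanh", name=name)(h)
```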

Layer Component | Layer Configurations | Activation Functions
Root Layer: Convolutional | ... | ReLU
Leaf Layer: Fully Connected | ... | ReLU, Identity, Tanh
Table 1: Architecture of the neural network model used to measure the final spin and QNMs of the black hole remnant. For the root convolutional layers, the setup indicates: (kernel size, # of output channels, stride, dilation rate, max pooling kernel, max pooling stride). All convolutional layers have a ReLU activation function and the padding is set to “VALID” mode. There is no max pooling layer if the last two entries in the configuration are 0's. The leaf fully connected layer setup is: (# of output neurons, dropout rate). For the last layer, we use a Tanh activation function; the activation function in the second-to-last layer is removed.

II.2 Neural network model to measure the masses of the binary components

The tree-like network model used for this study is described in the right panel of Figure 2 and Table 2. With respect to the architecture described in the previous section, we reduce the number of convolutional layers in the root from seven to three. We have done this because we are now using more layers in the leaves, which in turn makes the gradient back-propagation harder. Reducing the number of root layers improves gradient updates to the front layers.

Each leaf component uses a squeeze-and-excitation (SE) structure Hu et al. (2018). The SE block is a sub-structure between two layers: it applies a global pooling (squeeze step) and assigns a weight to each channel of the convolutional layers (excitation step). Compared to conventional convolutional structures with universal weights, the SE components adjust the importance of each channel with an adaptively learned weight, which, as described in Hu et al. (2018), yields roughly a 25% relative improvement in image classification performance. For images, the channels typically correspond to RGB; since we are using 1-D time-series signals, the original input signal has a single channel. The SE block adaptively recalibrates channel-wise feature responses. Furthermore, the weights are learned subject to the constraint introduced by the global pooling, which ensures that they encode both spatial and channel-wise information. These weights also help the channels represent group-specific features in deeper layers, which is consistent with our objective of using “leaves” for different parameters.
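A minimal sketch of a 1-D SE block follows; the reduction ratio of 16 is a common default from Hu et al. (2018), not a value quoted in this paper.

```python
import tensorflow as tf

def se_block_1d(x, reduction=16):
    """Squeeze-and-excitation block for 1-D feature maps. The reduction ratio
    of 16 is a common default from Hu et al. (2018), not a value quoted here."""
    channels = x.shape[-1]
    # Squeeze: global pooling collapses the time axis to one value per channel
    s = tf.keras.layers.GlobalAveragePooling1D()(x)
    # Excitation: a small bottleneck produces one weight per channel
    s = tf.keras.layers.Dense(max(channels // reduction, 1), activation="relu")(s)
    s = tf.keras.layers.Dense(channels, activation="sigmoid")(s)
    s = tf.keras.layers.Reshape((1, channels))(s)
    # Recalibrate: rescale every channel by its learned importance weight
    return tf.keras.layers.Multiply()([x, s])
```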

Figure 2: Architecture of the neural network used to estimate the masses of the binary black hole components.

Following the SE components, the neural networks have two highway blocks Srivastava et al. (2015). These structures are a variant of the residual structure proposed in He et al. (2016). Instead of learning a feature mapping directly, a residual block learns the residual with respect to an identity shortcut connection, which mitigates vanishing gradients as the model goes deeper. The highway block additionally introduces learned gating weights on the components of the residual block, which is similar to the importance weights applied to channels in the SE components. Finally, we apply three fully connected layers with dropout after the highway blocks to prevent overfitting Srivastava et al. (2014). The same nonlinearity reduction is also applied in the second-to-last layer.
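A hedged sketch of a 1-D highway block of the kind described above, with an illustrative kernel size; it implements the standard gating rule y = T * H(x) + (1 - T) * x from Srivastava et al. (2015):

```python
import tensorflow as tf

def highway_block_1d(x, kernel_size=3):
    """Highway block for 1-D feature maps: y = T * H(x) + (1 - T) * x,
    where T is a sigmoid transform gate. Kernel size is illustrative."""
    channels = x.shape[-1]
    h = tf.keras.layers.Conv1D(channels, kernel_size, padding="same",
                               activation="relu")(x)       # transform H(x)
    t = tf.keras.layers.Conv1D(channels, kernel_size, padding="same",
                               activation="sigmoid")(x)    # transform gate T
    gated_h = tf.keras.layers.Multiply()([t, h])
    carry = tf.keras.layers.Lambda(lambda g: 1.0 - g)(t)   # carry gate 1 - T
    gated_x = tf.keras.layers.Multiply()([carry, x])
    return tf.keras.layers.Add()([gated_h, gated_x])
```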

Layer Component | Layer Configurations | Activation Functions
Root Layer: Convolutional | ... | ReLU
Leaf Layer: SE | ... | ReLU
Leaf Layer: Highway | ... | ReLU
Leaf Layer: Fully Connected | ... | ReLU, Identity, ReLU
Table 2: Architecture of the neural network model used to measure the masses of the binary components. For the root convolutional layers, the setup indicates: (kernel size, # of output channels, stride, dilation rate, max pooling kernel, max pooling stride). All convolutional layers have a ReLU activation function and the padding is set to “VALID” mode. For the leaf SE layer, the setup is: (# of output channels, # of residual blocks). The general structure of the SE layer follows the configuration described in Hu et al. (2018). The leaf highway layer setup is: (kernel size, # of channels, stride, # of highway blocks). The configuration of the highway blocks is described in Srivastava et al. (2015). The leaf fully connected layer setup is: (# of output neurons, dropout rate). For the last layer we use a ReLU activation; the activation function in the second-to-last layer is removed.

II.3 Dataset Preparation

To demonstrate the use of deep learning for parameter estimation, we consider the catalog of BBH mergers presented in The LIGO Scientific Collaboration et al. (2018). Based on the Bayesian analyses presented in that study, we choose ranges for the component masses and for the (aligned or anti-aligned) spins of the binary components that cover this catalog. By uniformly sampling this parameter space we produce a dataset with 300,180 waveforms. These waveforms are generated with the surrogate waveform family of Blackman et al. (2015), considering the last second of the evolution, which includes the late inspiral, merger and ringdown. The waveforms are produced using a sample rate of 8192 Hz.

For training purposes, we label the waveforms using the masses and spins of the binary components, and then use this information to also enable the neural net to estimate the final spin of the BH remnant using the formulae provided in Hofmann et al. (2016), and the QNMs of the ringdown following Berti et al. (2006). In essence, we are training our neural network models to identify the key features that determine the properties of the BBHs before and after merger using a unified framework.

In order to encapsulate the true properties of advanced LIGO noise, we whiten all the training templates using real LIGO noise from the Hanford and Livingston detectors gathered during the first and second observing runs LIGO Lab, the LIGO Scientific Collaboration and the Virgo Scientific Collaboration (2019).
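The following sketch illustrates the standard whitening procedure described above, with a PSD estimated from a segment of detector noise; the function name and PSD settings are ours, and the paper's exact choices may differ.

```python
import numpy as np
from scipy.signal import welch

FS = 8192  # Hz, sample rate of the templates

def whiten(template, noise_segment, fs=FS):
    """Whiten a template with a PSD estimated from a real noise segment."""
    # Estimate the one-sided PSD of the detector noise
    freqs, psd = welch(noise_segment, fs=fs, nperseg=fs // 2)
    # Interpolate the PSD onto the template's FFT frequency grid
    template_freqs = np.fft.rfftfreq(len(template), d=1.0 / fs)
    psd_interp = np.maximum(np.interp(template_freqs, freqs, psd), 1e-40)
    # Divide out the noise amplitude spectral density and transform back
    white_f = np.fft.rfft(template) / np.sqrt(psd_interp * fs / 2.0)
    return np.fft.irfft(white_f, n=len(template))
```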

We use 70% of these waveform samples for training, 15% for validation, and 15% for testing. The training samples are randomly and uniformly chosen. Throughout training, we use the ADAM optimizer with its default hyper-parameter setup Kingma and Ba (2014) to minimize the mean squared error of the predicted parameters. We choose a batch size of 64, a learning rate of 0.0008, and a maximum of 120,000 training iterations. We use a dropout rate of 0.1 for training; no dropout is applied for testing and validation. To simulate the environment in which true GWs are embedded, we use real advanced LIGO noise to compute the power spectral density, which is then used to whiten the templates. In addition, we apply random left or right shifts of 0% to 6% of the waveform length. This endows the neural networks with time invariance, and improves their ability to estimate the parameters of a signal irrespective of its position in the data stream. This technique also helps prevent overfitting: since the locations are randomly shifted and independent noise realizations are injected, the training data differ at each epoch.
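A minimal sketch of the time-shift and noise-injection augmentation described above (the 6% maximum shift follows the text; the helper names and the treatment of wrapped samples are our assumptions):

```python
import numpy as np

def augment(template, noise_bank, max_shift_frac=0.06, rng=np.random.default_rng()):
    """Randomly shift a whitened template by up to 6% of its length and add an
    independently drawn real-noise realization (helper names are ours)."""
    n = len(template)
    max_shift = int(max_shift_frac * n)
    shift = rng.integers(-max_shift, max_shift + 1)
    shifted = np.roll(template, shift)
    # Zero the samples that wrapped around, so the shift is not circular
    if shift > 0:
        shifted[:shift] = 0.0
    elif shift < 0:
        shifted[shift:] = 0.0
    noise = noise_bank[rng.integers(len(noise_bank))]
    return shifted + noise
```

In this setup, the optimizer settings quoted above (ADAM with a learning rate of 0.0008 and a batch size of 64) would be applied to batches of such augmented examples.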

II.4 Curriculum learning with decreasing signal-to-noise ratio

Iterations pSNRs
1-9000 2.0-3.0
9001-18000 1.5-3.0
18001-27000 1.0-3.0
27001-36000 0.5-3.0
36001-48000 0.3-3.0
48001-60000 0.2-3.0
60001- 0.1-3.0
Table 3: Decreasing peak SNR (pSNR) schedule. The pSNR is uniformly chosen within the indicated range. Notice that the early stopping criterion is also applied if the number of iterations is greater than 60,000 and the relative error threshold is met. The relation between matched-filtering SNR and pSNR is that a pSNR of 1.0 corresponds approximately to a matched-filtering SNR of 13.0.

In realistic detection scenarios, GWs have moderate SNRs, and are contaminated by non-Gaussian and non-stationary noise. In order to ensure that the neural networks identify GWs over a broad range of astrophysically motivated SNRs, we start training them with large SNRs, and gradually reduce the SNRs to a lower level. This idea is taken from the curriculum learning literature Bengio et al. (2009), and allows the network to transfer the more accurate information it distills from high-SNR signals to signals with lower SNRs. This approach has been demonstrated for classification, regression and denoising of GW signals George and Huerta (2018, 2018); Rebei et al. (2018); Shen et al. (2017); George et al. (2017); George and Huerta (2017); Wei and Huerta (2019). Specifically, each waveform is normalized to have maximum amplitude 1, and then we use curriculum learning with the decreasing SNR scheme detailed in Table 3. The noisy data are then normalized to have unit variance. We normalize the data to ensure that the trained model can characterize true BBH signals in realistic detection scenarios, covering a broad range of SNRs.

The different steps followed in our curriculum learning scheme are presented in Table 3. In addition, we use an early stopping criterion with relative error thresholds of 0.026 and 0.0016, respectively. One additional change for the mass model is that we rescale the masses by a factor of 1/20 to make the optimization converge faster; at evaluation time, we simply scale the predictions back to their original range.
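The sketch below encodes the schedule of Table 3 and the normalization steps described above; the definition of pSNR relative to unit-variance noise and the helper names are our assumptions.

```python
import numpy as np

# Decreasing peak-SNR schedule of Table 3: (first iteration, low pSNR, high pSNR)
PSNR_SCHEDULE = [
    (1, 2.0, 3.0), (9001, 1.5, 3.0), (18001, 1.0, 3.0), (27001, 0.5, 3.0),
    (36001, 0.3, 3.0), (48001, 0.2, 3.0), (60001, 0.1, 3.0),
]

def sample_psnr(iteration, rng=np.random.default_rng()):
    """Draw a peak SNR uniformly from the range active at this iteration."""
    low, high = PSNR_SCHEDULE[0][1], PSNR_SCHEDULE[0][2]
    for start, lo, hi in PSNR_SCHEDULE:
        if iteration >= start:
            low, high = lo, hi
    return rng.uniform(low, high)

def make_training_example(template, noise, iteration, rng=np.random.default_rng()):
    """Scale a peak-normalized template to the scheduled pSNR, add real noise
    (assumed here to be normalized to unit variance), and normalize the noisy
    data to unit variance."""
    template = template / np.max(np.abs(template))          # peak amplitude 1
    noisy = sample_psnr(iteration, rng) * template + noise / np.std(noise)
    return noisy / np.std(noisy)                            # variance one
```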

III Experimental Results

Using the signal manifold described in the previous section, we present results of the accuracy with which our neural network models can measure the masses of the binary components, and the properties of the corresponding remnant.

Figure 3: Relative error with which our deep learning algorithm can measure the masses of the binary black hole components as a function of optimal matched-filtering signal-to-noise ratio (SNR). For waveforms above a given SNR, the primary and secondary masses can be constrained with small relative errors.

Figure 3 presents the accuracy with which the masses of the binary components can be recovered over a broad range of SNRs. We notice that for signals with moderate to high SNRs, the primary and secondary masses can be constrained with small relative errors Abramowitz and Stegun (1965). These results represent a major improvement over the analysis we reported in the context of a 2-D signal manifold in George and Huerta (2018, 2018). Furthermore, Figure 4 shows that our neural network models can also measure the final spin and the QNMs of the black hole remnant with small relative errors over the same range of SNRs. To the best of our knowledge, this is the first time deep learning has been used to infer the properties of BH remnants directly from GW signals.

Figure 4: Relative error, as a function of optimal matched-filtering signal-to-noise ratio (SNR), with which our neural network model can measure the final spin and quasi-normal modes (QNMs) of the black hole remnant. For signals above a given SNR, these parameters can be recovered with small relative errors.

IV Deep learning parameter estimation of detected Binary Black Hole Mergers

Event Name | m_1 (M_⊙) | m_2 (M_⊙) | a_f | ω^R | ω^I
GW150914 | 37.46 [4.13, 0.06] | 30.80 [0.43, -1.65] | 0.689 [0.037, 0.17] | 0.5362 [0.0127, -0.20] | 0.0798 [0.0011, 0.16]
GW151012 | 23.89 [0.35, 1.65] | 17.34 [0.56, 1.44] | 0.653 [0.009, 0.25] | 0.5214 [0.0030, 0.15] | 0.0810 [0.0003, -0.15]
GW151226 | 17.60 [2.01, 0.87] | 14.14 [2.85, 0.73] | 0.646 [0.006, 1.53] | 0.5188 [0.0021, 1.51] | 0.0812 [0.0001, -1.60]
GW170104 | 36.45 [1.54, -0.76] | 21.83 [3.54, -0.56] | 0.661 [0.080, -0.84] | 0.5185 [0.0306, -0.48] | 0.0816 [0.0029, 0.57]
GW170608 | 13.96 [1.13, 1.10] | 11.96 [1.07, 1.56] | 0.697 [0.025, -1.28] | 0.5278 [0.0154, -0.95] | 0.0809 [0.0011, -0.67]
GW170729 | 48.61 [1.58, -1.61] | 37.69 [1.82, -0.28] | 0.694 [0.019, -0.47] | 0.5102 [0.0107, -0.50] | 0.0812 [0.0019, -0.16]
GW170809 | 31.01 [3.29, 0.60] | 22.42 [4.56, 1.85] | 0.698 [0.034, -1.23] | 0.5428 [0.0163, -1.15] | 0.0779 [0.0016, -1.05]
GW170814 | 35.07 [1.75, 0.84] | 21.50 [0.52, 0.99] | 0.718 [0.010, -1.89] | 0.5377 [0.0108, -1.38] | 0.0794 [0.0003, 1.76]
GW170818 | 40.05 [1.29, -1.57] | 24.08 [0.93, -1.33] | 0.656 [0.015, 0.73] | 0.5129 [0.0043, 1.21] | 0.0816 [0.0005, -1.02]
GW170823 | 39.56 [1.75, -1.44] | 30.14 [0.53, -1.68] | 0.740 [0.002, -1.76] | 0.5510 [0.0007, -1.74] | 0.0782 [0.0001, 1.75]
Table 4: Deep learning parameter estimation results for the catalog of BBH signals reported in The LIGO Scientific Collaboration et al. (2018). The values for each parameter, from left to right, represent the median, median absolute deviation and Pearson median skewness, obtained by whitening the GW strain containing a putative signal with up to 120 different PSDs.

In this section we use our neural network models to measure the masses of the binary components, and the final spin and QNMs of the BH remnant, for all the BBH mergers detected to date by the advanced LIGO and Virgo observatories The LIGO Scientific Collaboration et al. (2018). The results of this analysis are presented in Table 4 as triplets that provide, from left to right, the median, median absolute deviation and Pearson median skewness. These values are computed by whitening the data containing a putative signal with 120 different PSDs, half of them constructed using LIGO Hanford noise and the rest using LIGO Livingston noise. Through this approach we effectively measure the impact of PSD variations on the measurements of the astrophysical parameters of BBH mergers.
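For reference, the summary statistics of Table 4 can be computed from the ensemble of per-PSD predictions as in the sketch below; the Pearson median skewness is taken to be 3(mean - median)/(standard deviation), which is the conventional definition, since the paper does not spell out its exact estimator.

```python
import numpy as np

def summarize_predictions(preds):
    """Median, median absolute deviation, and Pearson median skewness
    (3 * (mean - median) / standard deviation) of per-PSD predictions."""
    preds = np.asarray(preds, dtype=float)
    med = np.median(preds)
    mad = np.median(np.abs(preds - med))
    skew = 3.0 * (np.mean(preds) - med) / np.std(preds)
    return med, mad, skew

# Example with illustrative values: predictions of one parameter obtained by
# whitening the same data segment with 120 different PSDs
rng = np.random.default_rng(0)
fake_predictions = rng.normal(loc=37.0, scale=2.0, size=120)
print(summarize_predictions(fake_predictions))
```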

The deep learning parameter estimation results presented in Table 4 are consistent with those obtained with established Bayesian parameter estimation pipelines The LIGO Scientific Collaboration et al. (2018). The robustness and speed with which these results are obtained for each BBH signal (less than 2 milliseconds) warrant developing the algorithms we present in this article in full generality. This work is under active development and will be presented shortly.

Having demonstrated the application of deep learning at scale for the characterization of BBH mergers, a natural next step is to design deep neural networks for the real-time detection and characterization of GW sources that are expected to have electromagnetic and astro-particle counterparts, i.e., BNS and NSBH systems. For that study, we expect no computational challenges beyond the ones we have already addressed in this analysis. The central development for such an effort, however, will consist of designing an algorithm that readily identifies BNS or NSBH signals in a hierarchical manner: in principle it is not necessary to train neural networks on minute-long waveforms; rather, we need to determine how much information is needed to accurately reconstruct the astrophysical parameters of such events in real-time. These studies should be pursued in the future.

V Conclusion

We have presented the first application of deep learning at scale to characterize the astrophysical properties of BHs whose spins are aligned or anti-aligned, and which evolve on quasi-circular orbits. Using nearly 10 million waveforms to densely sample this parameter space, and encoding time- and scale-invariance, we have demonstrated that deep learning enables real-time GW parameter estimation. Our results are consistent with established, compute-intensive, Bayesian methods that are routinely used for GW parameter estimation.

The approach we have presented herein provides the means to constrain the parameters of BBHs before and after the merger event. We have shown that deep learning can directly infer the final spin and QNMs of BH remnants, thereby paving the way to directly use QNMs to assess whether BH remnants are accurately described by general relativity. In future work, we will study how accurately these neural network models can tell apart ringdown waveforms described by astrophysically motivated alternative theories of gravity in realistic detection scenarios. The extension of this work to enable real-time detection and parameter estimation of GW sources that are central for Multi-Messenger Astrophysics discovery campaigns should also be investigated.

VI Acknowledgements

This research is part of the Blue Waters sustained-petascale computing project, which is supported by the National Science Foundation (awards OCI-0725070 and ACI-1238993) and the State of Illinois. Blue Waters is a joint effort of the University of Illinois at Urbana-Champaign and its National Center for Supercomputing Applications. We acknowledge support from the NCSA, and thank the NCSA Gravity Group for useful feedback. Tesla P100 and V100 GPUs used for this project were donated by NVIDIA to the NCSA Gravity Group, and are hosted by Vlad Kindratenko at the Innovative Systems Lab at NCSA. This work also made use of the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by National Science Foundation grant number ACI-1548562. Specifically, it used the Bridges system, which is supported by NSF award number ACI-1445606, at the Pittsburgh Supercomputing Center (PSC); the TG-PHY160053 grant is gratefully acknowledged. This research used resources of the Argonne Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC02-06CH11357. This paper was reviewed and approved by the LIGO P&P committee.

References