Denoising Gravitational Waves using Deep Learning with Recurrent Denoising Autoencoders

by Hongyu Shen et al., 11/27/2017

Gravitational wave astronomy is a rapidly growing field of modern astrophysics, with observations being made frequently by the LIGO detectors. Gravitational wave signals are often extremely weak, and the data from detectors such as LIGO are contaminated with non-Gaussian and non-stationary noise, often containing transient disturbances that can obscure real signals. Traditional denoising methods, such as principal component analysis and dictionary learning, are not optimal for dealing with this non-Gaussian noise, especially for low signal-to-noise ratio gravitational wave signals. Furthermore, these methods are computationally expensive on large datasets. To overcome these issues, we apply state-of-the-art signal processing techniques, based on recent groundbreaking advancements in deep learning, to denoise gravitational wave signals embedded either in Gaussian noise or in real LIGO noise. We introduce SMTDAE, a Staired Multi-Timestep Denoising Autoencoder, based on sequence-to-sequence bi-directional Long Short-Term Memory recurrent neural networks. We demonstrate the advantages of using our unsupervised deep learning approach and show that, after training only on simulated Gaussian noise, SMTDAE achieves superior recovery performance for gravitational wave signals embedded in real non-Gaussian LIGO noise.




I Introduction

The application of machine learning and deep learning techniques has recently driven disruptive advances across many domains in engineering, science, and technology LeCun et al. (2015). These methods are gaining interest in the gravitational wave (GW) community. Convolutional neural networks were recently applied to the real-time detection and characterization of GW signals George and Huerta (2016, 2017a). Machine learning algorithms have also been explored to address long-standing challenges in GW data analysis, such as classifying the imprints of instrumental and environmental noise and distinguishing them from GW signals Powell et al. (2015, 2016); Zevin et al. (2016); George et al. (2017a, b), and for waveform modeling Huerta et al. (2017). Torres et al. (2015, 2014, 2016a) introduced a variety of methods to recover GW signals embedded in additive Gaussian noise.

Principal component analysis (PCA) is widely used for dimension reduction and denoising of large datasets Jolliffe (2002); Anderson (2003). This technique was originally designed for Gaussian data, and its extension to non-Gaussian noise is a topic of ongoing research Jolliffe (2002). Dictionary learning Mairal et al. (2009a); Hawe et al. (2013); Mairal et al. (2009b) is an unsupervised technique that learns an overcomplete dictionary of atoms from the data, such that signals can be described by sparse linear combinations of these atoms Aharon et al. (2006); Baraniuk et al. (2010). Exploiting this sparsity is useful for denoising, as discussed in Baraniuk et al. (2010); Gribonval and Nielsen (2006); Mairal et al. (2009a); Hawe et al. (2013); Mairal et al. (2009b). Given the dictionary atoms, the coefficients are estimated by minimizing the sum of an error term and a sparsity term, using a fast iterative shrinkage-thresholding algorithm (FISTA) Beck and Teboulle (2009).
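The coefficient-estimation step can be illustrated with a short sketch. The following is a minimal pure-Python version of FISTA for the objective 0.5·||x − D a||² + λ||a||₁ with a fixed, already-learned dictionary D. The function names, the Frobenius-norm step size and the fixed iteration count are illustrative assumptions, not the configuration used in the works cited.

```python
import math


def soft_threshold(v, t):
    """Elementwise soft-thresholding: the proximal operator of t * ||.||_1."""
    return [math.copysign(max(abs(x) - t, 0.0), x) for x in v]


def fista_sparse_code(x, atoms, lam, n_iter=200):
    """Estimate sparse coefficients a minimizing 0.5*||x - D a||^2 + lam*||a||_1.

    x     : signal of length n
    atoms : list of K dictionary atoms (the columns of D), each of length n
    lam   : weight of the sparsity term
    """
    n, K = len(x), len(atoms)
    # Step size 1/L with L = ||D||_F^2, a cheap upper bound on the Lipschitz
    # constant ||D||_2^2 of the gradient of the error term.
    L = sum(d * d for atom in atoms for d in atom)
    a = [0.0] * K   # current estimate
    y = a[:]        # extrapolated point
    t = 1.0         # momentum parameter
    for _ in range(n_iter):
        # Gradient of the error term at y: D^T (D y - x)
        r = [sum(atoms[k][i] * y[k] for k in range(K)) - x[i] for i in range(n)]
        g = [sum(atoms[k][i] * r[i] for i in range(n)) for k in range(K)]
        # Proximal gradient step, then Nesterov-style extrapolation
        a_next = soft_threshold([y[k] - g[k] / L for k in range(K)], lam / L)
        t_next = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = [a_next[k] + ((t - 1.0) / t_next) * (a_next[k] - a[k]) for k in range(K)]
        a, t = a_next, t_next
    return a


def reconstruct(atoms, a):
    """Denoised signal: the weighted sum of atoms, D a."""
    n = len(atoms[0])
    return [sum(a[k] * atoms[k][i] for k in range(len(atoms))) for i in range(n)]
```

Because the coefficients come out of an iterative solve rather than a single projection, this loop has to be rerun for every new signal, which is the computational cost that motivates this work.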

Dictionary learning was recently applied to denoise GW signals embedded in Gaussian noise, with signal strength quantified by the peak signal-to-noise ratio (SNR)¹ Torres et al. (2016b). This involves learning a set of dictionary atoms from true GW signals and then reconstructing signals in a similar fashion to PCA, i.e., by combining different atoms with their corresponding weights. The drawback is that the coefficients are not simply retrieved by projection but must be learned through minimization. Therefore, denoising each signal requires running a new minimization, which is a bottleneck that inevitably delays the analysis. Furthermore, it is still challenging to estimate both the dictionary and the sparse coefficients of the underlying clean signal when the data are contaminated with non-Gaussian noise Chainais (2012); Giryes and Elad (2014).

¹ Peak SNR is defined as the peak amplitude of the GW signal divided by the standard deviation of the noise after whitening. In this paper we also report the optimal matched-filtering SNR (MF SNR) Owen and Sathyaprakash (1999) alongside the peak SNR.
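The peak SNR definition (peak amplitude of the whitened GW signal over the standard deviation of the whitened noise) can be inverted to inject a waveform at a chosen peak SNR. The sketch below assumes the waveform is already whitened and uses plain white Gaussian noise; the function names are hypothetical, and it does not compute the matched-filtering SNR.

```python
import random
import statistics


def peak_snr(whitened_signal, whitened_noise):
    """Peak SNR: peak absolute amplitude of the signal divided by the noise std."""
    return max(abs(s) for s in whitened_signal) / statistics.pstdev(whitened_noise)


def inject_at_peak_snr(whitened_signal, target_snr, rng):
    """Add white Gaussian noise scaled to give (in expectation) the target peak SNR.

    Inverting peak SNR = max|h| / sigma gives sigma = max|h| / target_snr.
    Returns (noisy_data, noise).
    """
    sigma = max(abs(s) for s in whitened_signal) / target_snr
    noise = [rng.gauss(0.0, sigma) for _ in whitened_signal]
    noisy = [s + n for s, n in zip(whitened_signal, noise)]
    return noisy, noise
```

For example, a peak SNR of 0.50 means the noise standard deviation is twice the peak signal amplitude, so the signal sits well below the noise floor.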

To address the aforementioned challenges, we introduce an unsupervised learning technique based on a new model, which we call the Staired Multi-Timestep Denoising Autoencoder (SMTDAE), inspired by the recurrent neural networks (RNNs) for noise reduction introduced in Maas et al. (2012). The structure of the SMTDAE model is shown in FIG 1(b). RNNs are state-of-the-art generic models for machine learning problems involving continuous, time-correlated data, such as speech recognition and generation Graves et al. (2013); Arik et al. (2017); Zhang et al. (2017), natural language processing and translation Sutskever et al. (2014), and handwriting recognition Graves and Schmidhuber (2009). A Denoising Autoencoder (DAE) is an unsupervised learning model that takes noisy signals and returns clean signals Bengio et al. (2013); Vincent et al. (2008, 2010); Maas et al. (2012). By combining the advantages of these two models, we demonstrate excellent recovery of weak GW signals injected into real LIGO noise, as measured by two metrics: Mean Square Error (MSE) and Overlap² Dal Canton et al. (2014); Usman et al. (2016). Our results show that SMTDAE outperforms denoising methods based on PCA and dictionary learning on both metrics.

² Overlap is calculated via matched filtering between a denoised waveform and a reference waveform, using the PyCBC library Usman et al. (2016).

Figure 1: (a) Structure of MTDAE. This model takes multiple time steps of a noisy waveform as input, passes them through hidden layers built from LSTM cells, and outputs a clean version; each output time step corresponds to the middle time step of its input window. (b) Our proposed SMTDAE structure, which uses a sequence-to-sequence model. It differs from MTDAE in that the final state of the encoder is passed to the initial state of the decoder in the hidden layers. We also include a Signal Amplifier before the output layer of the network to enhance signal reconstruction. (c) Reference: the nomenclature used in (a) and (b).

II Methods

The noise present in GW detectors is highly non-Gaussian, with a time-varying (non-stationary) power spectral density. Our goal is to extract clean GW signals from the noisy data stream of a single LIGO detector. Since this is a time-dependent process, SMTDAE must recover the signal when the input contains one, and return zeros when the input is pure noise.

Denoising GWs is similar to removing noise with RNNs in automatic speech recognition (ASR), as illustrated in FIG 1(a). A state-of-the-art tool in ASR is the Multiple Timestep Denoising Autoencoder (MTDAE), introduced in Maas et al. (2012). The idea of this model is to use multiple time steps within a neighborhood to predict the value at a specific point. Whereas a conventional RNN takes a single time step as input to predict the corresponding output, MTDAE takes one time step together with its neighbors to predict one output. Maas et al. (2012) showed that this model produces better denoised outputs.
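A minimal sketch of how multi-timestep inputs can be arranged: each output time step is paired with a window of neighboring input samples centred on it, with the middle sample corresponding to the output, as in FIG 1(a). The default width of 9 matches the number of time steps used for SMTDAE; the zero padding at the edges and the function name are assumptions made for illustration.

```python
def multi_timestep_windows(x, width=9):
    """One input window per time step, centred on that step.

    Window t holds x[t - width//2 .. t + width//2]; positions beyond either end
    are zero-padded (an illustrative choice).
    """
    if width % 2 != 1:
        raise ValueError("width must be odd so that each window has a middle step")
    half = width // 2
    padded = [0.0] * half + list(x) + [0.0] * half
    return [padded[t:t + width] for t in range(len(x))]
```

Each window would then be fed to the network to predict the clean value at its middle position, so a series of length N yields N (window, target) pairs.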

Given the strong similarities between ASR and denoising GWs, we have constructed a Staired Multi-Timestep Denoising Autoencoder (SMTDAE). As shown in FIG 1(b), our new model encodes the physics of the problem we want to address through the following novel features:

  • Since GW detection is a time-dependent analysis, our encoder and decoder have time correlations, as shown in FIG 1(b). The final state of the encoder, which records the information it has processed, is passed to the first state of the decoder. We use a sequence-to-sequence model Sutskever et al. (2014) with two layers for the encoder and decoder, where each layer uses a bidirectional LSTM cell Hochreiter and Schmidhuber (1997). This type of structure is widely used in Natural Language Processing (NLP)³.
    ³ A practical implementation of NLP for LIGO was recently described in Mukund et al. (2017).

  • We have included another scalar variable, which we call the Signal Amplifier, indicated by a green circle in FIG 1(b). This is extremely helpful in denoising GW signals when the amplitude of the signal is lower than that of the background noise. In addition, we use 9 input time steps to denoise each output time step, and each hidden layer in the encoder and decoder has 64 neurons.

The key experiments which we conducted and the results of our analysis are presented in the following sections.

III Experiments

For this analysis, we use simulated gravitational waveforms that describe binary black hole (BBH) mergers, generated with the waveform model introduced in Bohé et al. (2017), which is available in LIGO's Algorithm Library LSC. We consider BBH systems with mass ratios in steps of 0.1, and with total mass , in steps of for training. Intermediate values of total mass were used for testing. The waveforms are generated with a sampling rate of 8192 Hz and whitened with the design sensitivity of LIGO Shoemaker (2010). We consider the late inspiral, merger, and ringdown evolution of BBHs, since it is representative of the BBH GW signals reported by ground-based GW detectors Abbott et al. (2016a, b, 2017a, 2017b). We normalize our inputs (signal+noise) to zero mean and unit variance. In addition, we add random time shifts, between 0% and 15% of the total length, to the training data to make the model more resilient to variations in the location of the signal. Only simulated additive white Gaussian noise was added during training, while for testing we whitened and added real non-Gaussian noise: 4096 s of data around the LVT151012 event, taken from the LIGO Open Science Center (LOSC).
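The two preprocessing steps can be sketched as follows: standardizing each input segment to zero mean and unit variance, and delaying a training waveform by a random 0% to 15% of its length. This assumes the shift moves the signal later and zero-fills the vacated samples; the text does not specify these details, and the function names are hypothetical.

```python
import random
import statistics


def standardize(x):
    """Rescale a segment (signal + noise) to zero mean and unit variance."""
    mu = statistics.fmean(x)
    sd = statistics.pstdev(x)
    if sd == 0.0:
        raise ValueError("cannot standardize a constant segment")
    return [(v - mu) / sd for v in x]


def random_time_shift(waveform, rng, max_fraction=0.15):
    """Delay a waveform by a random 0% to max_fraction of its length.

    Assumption: the shift is applied toward later times and the vacated
    samples at the start are filled with zeros; the length is preserved.
    """
    n = len(waveform)
    shift = rng.randint(0, int(max_fraction * n))
    return [0.0] * shift + list(waveform[:n - shift])
```

In a training pipeline the shift would be applied to the clean waveform before noise is added, and the standardization to the resulting noisy segment.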

Decreasing the SNR over the course of training can be seen as a continuous form of transfer learning Weiss et al. (2016), called Curriculum Learning (CL) Bengio et al. (2009), which was introduced to GW data analysis in George and Huerta (2016) for dealing with highly noisy GW signals. Signals with high peak SNR > 1.00 (MF SNR > 13) can be easily denoised, as shown in FIG 2. When training starts directly at very low SNR, it is difficult for a model to learn the original signal structure and remove the noise from raw data. To denoise signals with extremely low SNR, our training starts with a high peak SNR of 2.00 (MF SNR = 26), which is gradually decreased every round during training until it reaches a final peak SNR of 0.50 (MF SNR = 6.44).
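The curriculum above fixes only the endpoints (peak SNR 2.00 at the start, 0.50 at the end) and states that the SNR is lowered every round. The sketch below assumes a linear decrease over a chosen number of rounds; both the linear shape and the round count are illustrative assumptions, and the function name is hypothetical.

```python
def curriculum_snr_schedule(n_rounds, start=2.00, end=0.50):
    """Peak-SNR target for each training round, decreasing linearly from start to end."""
    if n_rounds < 2:
        return [end]
    step = (start - end) / (n_rounds - 1)
    return [start - i * step for i in range(n_rounds)]
```

In each round, the training waveforms would be injected at that round's target, for example by setting the noise standard deviation to the peak whitened amplitude divided by the target peak SNR.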

IV Results

All our training sessions were performed on NVIDIA Tesla P100 GPUs using TensorFlow Abadi et al. (2016). In FIG 2 we show the results of denoising signals from the test set injected into real LIGO noise with our model, and compare them with PCA and dictionary learning methods (using code based on Mairal et al. (2014)). MSE and Overlap are reported in each panel. MSE measures the distance between the denoised and reference waveforms treated as vectors, whereas Overlap indicates the level of agreement between the phases of the two signals. Since MSE and Overlap provide complementary information about the denoised waveforms, we include both in our analysis.
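To illustrate why the two metrics are complementary, here is a simplified pure-Python version of both. MSE is the mean of the squared sample-wise differences; for Overlap we use the normalized inner product ⟨a, b⟩ / √(⟨a, a⟩⟨b, b⟩), maximized over integer time shifts. Because the data are whitened, a plain dot product stands in for the noise-weighted inner product. This is a stand-in written for illustration and does not reproduce the PyCBC matched-filtering computation used in the paper (for example, it does not maximize over phase).

```python
import math


def mse(a, b):
    """Mean square error between two equal-length waveforms."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def overlap(a, b, max_shift=0):
    """Normalized inner product <a,b>/sqrt(<a,a><b,b>), maximized over integer shifts.

    Assumes whitened inputs (flat noise spectrum), so a plain dot product stands
    in for the noise-weighted inner product. Returns a value in [-1, 1].
    """
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    if norm == 0.0:
        return 0.0
    best = -1.0
    for s in range(-max_shift, max_shift + 1):
        # Correlate a[i] with b[i + s] over the index range where both exist.
        dot = sum(a[i] * b[i + s] for i in range(max(0, -s), min(len(a), len(b) - s)))
        best = max(best, dot / norm)
    return best
```

Scaling a waveform by 2 leaves its Overlap with the original at 1 while the MSE becomes non-zero, so a denoiser can score well on one metric and poorly on the other.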

Figure 2: Denoising results on test set signals injected into real non-Gaussian LIGO noise. Panels (a), (d) and (g) show results of SMTDAE, trained only on simulated Gaussian noise, for signals injected into real LIGO noise with peak SNR 0.50 and 1.00 (equivalent to MF SNR of 6.44 and 12.90, respectively) and for pure LIGO noise (SNR 0.00). Panels (b), (e) and (h) show the corresponding results for the dictionary learning model described in Torres et al. (2016b), and panels (c), (f) and (i) show results for a PCA model with 10 principal components, each of the same length as a signal. Peak SNR, optimal matched-filtering SNR (MF SNR), mean square error (MSE) and Overlap are indicated in each panel.

In FIG 2, we show results with PCA, dictionary learning, and SMTDAE on test set signals embedded in real LIGO noise. Note that our model was trained only with white Gaussian noise. After training at different SNRs, our model outperforms PCA and dictionary learning in terms of both MSE and Overlap in the presence of real LIGO noise. In addition, our model is able to return a flat output of zeros when the inputs are either pure Gaussian noise or non-Gaussian, non-stationary LIGO noise. In terms of computational performance, PCA takes on average two minutes to denoise 1 s of input data. In stark contrast, applying our SMTDAE model on a GPU takes on average less than 100 milliseconds to process 1 s of input data.

V Conclusion

We have introduced SMTDAE, a new non-linear algorithm to denoise GW signals that combines a DAE with an RNN architecture using unsupervised learning. When the input data are pure noise, the output of SMTDAE is close to zero. We have shown that the new approach is more accurate than PCA and dictionary learning methods at recovering GW signals in real LIGO noise, especially at low SNR, and is significantly more computationally efficient than the latter. More importantly, although our model was trained only with additive white Gaussian noise, SMTDAE achieves excellent performance even when the input signals are embedded in real LIGO noise, which is non-Gaussian and non-stationary. This suggests that SMTDAE should be able to automatically handle, without retraining, the changes in noise distributions that will occur as the GW detectors undergo modifications to attain design sensitivity.

We have also applied SMTDAE to denoise new classes of GW signals from eccentric binary black hole mergers, simulated with the Einstein Toolkit Löffler et al. (2012), injected into real LIGO noise, and found that we could recover them well even though we only used non-spinning, quasi-circular BBH waveforms for training. This indicates that our denoising method can generalize to new types of signals beyond the training data. We will provide detailed results on denoising different classes of eccentric and spin-precessing binaries as well as supernovae in a subsequent extended article. The encoder in SMTDAE may be used as a feature extractor for unsupervised clustering algorithms George et al. (2017b). Coherent GW searches may be carried out by comparing the output of SMTDAE across multiple detectors or by providing multi-detector inputs to the model. Denoising may also be combined with the Deep Filtering technique George and Huerta (2016, 2017b) for improving the performance of signal detection and parameter estimation of GW signals at low SNR, in the future. We will explore the application of this algorithm to help detect GW signals in real discovery campaigns with the ground-based detectors such as LIGO and Virgo.