Source-Agnostic Gravitational-Wave Detection with Recurrent Autoencoders

by Eric A. Moreno, et al.

We present an application of anomaly detection techniques based on deep recurrent autoencoders to the problem of detecting gravitational wave signals in laser interferometers. Trained on noise data, this class of algorithms could detect signals using an unsupervised strategy, i.e., without targeting a specific kind of source. We develop a custom architecture to analyze the data from two interferometers. We compare the obtained performance to that obtained with other autoencoder architectures and with a convolutional classifier. The unsupervised nature of the proposed strategy comes with a cost in terms of accuracy, when compared to more traditional supervised techniques. On the other hand, there is a qualitative gain in generalizing the experimental sensitivity beyond the ensemble of pre-computed signal templates. The recurrent autoencoder outperforms other autoencoders based on different architectures. The class of recurrent autoencoders presented in this paper could complement the search strategy employed for gravitational wave detection and extend the reach of the ongoing detection campaigns.


1 Introduction

The detection of gravitational waves (GW) from stellar binaries such as black hole and neutron star mergers has ushered in a new era of analyzing the Universe. Operating together, the Laser Interferometer Gravitational-wave Observatory (LIGO) [1] and the Virgo Interferometer [7] can peer into deep space, giving astronomers the ability to uncover and localize stellar processes through their gravitational signature. The first observation of a binary black hole merger (GW150914) [2] has been followed by a plethora of GW events, notably the observation of intermediate-size black holes [3] and neutron star mergers [4], an event that marked the beginning of the multi-messenger astronomy era [5].

Instrumental on the software side of these observations are the algorithms that identify the faint GW signals in an environment characterized by overwhelming classical and quantum noise. The most sensitive detection algorithm, Matched Filtering (MF) [8], consists of matching incoming data with templates of simulated GW shapes covering the parameter space of binary masses [43, 10, 36, 15, 44], which are then used to identify the signal. At the same time, the templates offer an estimate of the astrophysical parameters associated with the detected GW event, such as the nature and mass of the two merging objects. This method was key to the success of the LIGO and VIRGO observation campaigns. On the other hand, by relying on pre-computed templates, it could miss events for which a template is not available. This could be an event originating from computationally prohibitive conditions [28, 26, 27, 25], or even some sort of unforeseen GW source.

Detecting GWs is certainly one of the hardest challenges faced in fundamental science in recent years. Given how weak a GW signal is when compared to typical noise levels, it is natural to look at Machine Learning (ML), and especially at Deep Learning (DL), to improve the signal detection capability. For instance, Refs. [22, 21] discuss how Convolutional 1D networks [19, 29] can be trained to extract a variety of GW signals from highly noisy data. This is an example of a supervised classifier, i.e., a classifier trained to separate different populations in data (e.g., signal vs. noise) by matching a given set of ground-truth labels. While classifiers could certainly help enhance the current state-of-the-art detection capability, they rely on pre-defined signals, much like the MF technique. In other words, they are designed to possibly improve the detection accuracy, but they are not necessarily going to extend the detector sensitivity to exotic signals outside the portfolio of pre-simulated templates. In fact, these networks are typically trained on labelled data from simulation, so even in this case the capability of simulating a given signal is an underlying requirement. On the other hand, GW detection comes with the need to go beyond signal signatures that can be emulated. Besides the search for exotic sources, model-independent strategies could be useful to deal with practical issues such as glitch detection [16].

In this paper, we investigate the possibility of rephrasing the problem of GW detection as an anomaly detection task. By anomaly detection we mean the use of a one-class classifier, in this case a Deep Autoencoder (AE) [42], to identify outlier populations in an unlabelled dataset. An autoencoder is a compression-decompression algorithm that is trained to map a given set of inputs into itself, by first compressing the input into a point in a learned latent space (encoding) and then reconstructing it from the encoded information (decoding).

Once trained on standard events, the autoencoder might fail to reconstruct samples of a different kind (the anomalies). Any input-to-output distance measurement can then be used to identify these anomalies. Under the assumption that these anomalies are rare, one can directly train these AEs on data, looking for the set of AE parameter values that minimize the difference between the input and the output, using some distance D as a loss function. By taking as a reference the distribution of D on data, one can label anomalies as the outlier events of this distribution. This very same approach was recently discussed in Ref. [31], where the reach of convolutional autoencoders for GW detection is investigated. In this work, we consider two recurrent AE architectures: Long Short-Term Memory networks (LSTMs) [24] and Gated Recurrent Units (GRUs). For comparison, we consider alternative AE architectures: dense (i.e., fully connected) neural networks (DNNs) and Convolutional Neural Networks (CNNs).

Data are compressed to a latent space (encoded) and reconstructed (decoded) from it. The distance between output and input, in a properly chosen metric, can be used to identify anomalies selecting outliers from the distance distribution. The main advantage of this unsupervised strategy is that one algorithm is potentially sensitive to multiple signal typologies. On the other hand, this gain in flexibility is typically followed by a loss in accuracy. For a specific signal, an algorithm trained with an unsupervised procedure on unlabeled data is typically less accurate than a supervised classifier trained on labeled data.

The proposed strategy comes with another remarkable advantage: under the assumption that anomalies are rare instances in the processed data, autoencoders can be trained directly on data, without relying on signal or noise simulation. Instead, supervised algorithms require labels which, in the case of rare signals like those considered in this study, are obtained using synthetic data (e.g., from Monte Carlo simulation). Assuming this training could happen in real time, the AE could adapt to changing experimental conditions and limit the occurrence of false claims.

This paper is organized as follows: related works describing ML approaches to GW detection are briefly discussed in Section 2. The dataset utilized for this study is described in Section 3. The autoencoder architectures are described in Section 4, together with the corresponding classifiers used to benchmark performance. Results are presented in Section 5. Conclusions are given in Section 6.

2 Related work

Besides MFs, Bayesian inference libraries [9, 41] have been created to estimate the properties of a gravitational-wave source. Unsupervised anomaly detection of transients has also been proposed using a temporal version of k-nearest neighbors (kNN).


Most DL approaches for GW classification involve supervised learning techniques, which typically provide competitive accuracy by exploiting network non-linearity and the information provided by ground-truth labels.

Principal Component Analysis [18] (PCA), which performs a linear orthogonal transformation of a set of (possibly correlated) variables into a set of linearly uncorrelated variables, has also been introduced for transient detection [38, 39]. This can give a quick characterization of the intrinsic properties of a data sample. The DL methods discussed in this paper are generalizations of this approach to include nonlinear compression.

Boosting Neural Networks (BNNs) [37], which use a combination of unsupervised and supervised learning techniques, have also been used for GW classification [34]. This method performs an unsupervised hierarchical clustering on the incoming data to identify possible groups and a supervised Bayesian classifier to do the final classification. BNNs can classify a number of different injections including Gaussian, Ringdowns, Supernovae, white noise bursts, mergers, etc. Importantly, this architecture requires sufficient statistics to cluster in an unsupervised manner, which makes it less than ideal for the rare processes that we focus on in this paper.

Finally, CNNs have achieved high accuracy while having the added benefit of parameter estimation to infer useful parameters of the GWs such as masses and spins of the binary merger components [22, 21]. In a similar approach to the one discussed in this paper, Ref. [31] discusses the use of CNN autoencoders in order to classify GWs. For comparison, the same architecture as specified in [31] is implemented in Section 5 alongside the recurrent autoencoders.

3 Data samples

Figure 1: Resulting strains from simulated gravitational waves on the Livingston (L1) and Hanford (H1) detectors from the coalescence of a 70 $M_\odot$ and a 56 $M_\odot$ black hole, obtained using the GGWD package [20]. The re-scaled amplitude of the GW signal is illustrated in orange and the strain on the detector including this signal is illustrated in blue.

This study is performed on a sample of synthetic data, generated for this study and available on Zenodo [32, 33]. Data are generated using the GGWD library [20].

Noise events occur when no signal is overlaid on the detector noise. Detector noise is generated at a specified power spectral density (PSD), which is computed as a function of the length of noise, time step of the noise, and noise weighting to color the noise. This is done using PyCBC [35]. This approach to simulated data generation ignores glitches, blips, and other transient sources of detector noise. (Footnote: For this reason, the considered AEs could be re-purposed as anomaly detection algorithms to identify detector glitches. The main difference between these kinds of anomalies and those of astronomical significance would be the coincidence of anomalies across different detectors, a clear indication of an anomalous signal.)

In the absence of exotic signal sources, we use traditional GW signals to assess the detection performance. We consider two kinds of GW sources: Binary black hole (BBH) and Binary neutron star (BNS) mergers. Signal events are generated by simulating GW production from compact binary coalescences using PyCBC [35], which itself uses algorithms from LIGO’s LAL Suite [30]. Signal events containing GWs were created by overlaying simulated GWs on top of detector noise. This provides an analogous situation to a real GW, in which the strain from the incoming wave is recorded in combination with the normal detector noise.

The dataset consists of 300,000 noise samples. Each sample corresponds to 8 seconds of data, sampled at 2048 Hz. We consider a LIGO-like experimental setup, with two detectors (L1 and H1) taking data simultaneously at different locations. For each detector, a sample is represented as a one-dimensional array with 16,384 entries.

Figure 1 shows one of the simulated signal events: a coalescence of two black holes, with masses set to 70 and 56 solar masses ($M_\odot$), detected by the L1 and H1 detectors. The generation parameters are listed in the figure: the spin of the two black holes, right ascension, declination, coalescence phase, inclination, polarization, and signal-to-noise ratio [20]. In the absence of any source of noise, the signal would appear as shown by the orange line. Once the noise is added, the detectable signal corresponds to the blue line.

This dataset is split into three parts: 192,000 training samples (64%), 48,000 validation samples (16%), and 60,000 test samples (20%). The training and validation datasets are used in the optimal-parameter learning process, while the test dataset is used to assess the algorithm performance after training, together with the signal samples.

The BBH sample is generated as follows:

  • SEOBNRv4 [12] Approximant.

  • Masses independently and uniformly varied within [10, 80] $M_\odot$.

  • Spins independently and uniformly varied within [0, 0.998].

  • Injection signal-to-noise ratio (SNR) uniformly varied within [5, 20].

  • Coalescence phase uniformly varied within [0, 2$\pi$].

  • Inclination uniformly varied within [0, $\pi$].

The BNS sample is generated as follows:

  • IMRPhenomDNRTidal_v2 [6] Approximant.

  • Masses independently and uniformly varied within [5, 10] $M_\odot$.

  • Spins independently and uniformly varied within [0, 0.998].

  • Injection signal-to-noise ratio (SNR) uniformly varied within [5, 20].

  • Coalescence phase uniformly varied within [0, 2$\pi$].

  • Inclination uniformly varied within [0, $\pi$].
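The two parameter lists above can be turned into a small sampling routine. The sketch below is only an illustration of the uniform draws, not the generation code used for the dataset: the dictionary keys and the helper name `sample_injection` are hypothetical, and the upper bounds of the coalescence phase (2π) and inclination (π) are assumed.

```python
import numpy as np

# Ranges copied from the BBH and BNS lists above. The key names are
# hypothetical, and the 2*pi and pi upper bounds are assumptions.
RANGES = {
    "BBH": {"approximant": "SEOBNRv4", "mass": (10.0, 80.0)},
    "BNS": {"approximant": "IMRPhenomDNRTidal_v2", "mass": (5.0, 10.0)},
}

def sample_injection(source, rng):
    """Draw one set of generation parameters for a BBH or BNS injection."""
    cfg = RANGES[source]
    return {
        "approximant": cfg["approximant"],
        "mass1": rng.uniform(*cfg["mass"]),      # solar masses
        "mass2": rng.uniform(*cfg["mass"]),      # drawn independently
        "spin1": rng.uniform(0.0, 0.998),
        "spin2": rng.uniform(0.0, 0.998),
        "snr": rng.uniform(5.0, 20.0),           # injection SNR
        "coa_phase": rng.uniform(0.0, 2.0 * np.pi),
        "inclination": rng.uniform(0.0, np.pi),
    }

rng = np.random.default_rng(seed=42)
bbh = sample_injection("BBH", rng)
bns = sample_injection("BNS", rng)
```

A dictionary of this kind could then be handed to whichever waveform generator is in use; the mapping from these keys to generator arguments is left open here.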

Data are whitened with a Fast Fourier Transform integration length of 4 seconds and a time-domain Finite Impulse Response (FIR) whitening filter duration of 4 seconds to remove the underlying correlation in the data [17]. Then, band-pass filtering is applied to remove high-frequency (above 250 Hz) and low-frequency (below 50 Hz) components from the data. Doing so, background from outside the current interferometer sensitivity range is discarded. To facilitate the data processing and learning by the network, the data are scaled to a [0, 1] range. Each 8-second event is then cropped to 2.5 seconds around the GW event, with the GW happening at a random time after the beginning of the time window. The delay is uniformly sampled in a [0.5, 2] second interval. This choice allows us to take into account the appropriate time for the signal ring-up/ring-down. It also ensures that the models are not biased toward a certain time within the event window, which would occur if the simulated GWs always appeared at the same time.
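The band-pass, scaling, and cropping steps described above can be sketched with NumPy alone. This is a simplified illustration under stated assumptions: whitening is omitted, the band-pass is a brick-wall mask in the Fourier domain (the filter design actually used is not specified in the text), and the function names are hypothetical.

```python
import numpy as np

FS = 2048  # sampling rate in Hz, as quoted in the text

def bandpass_fft(strain, low_hz=50.0, high_hz=250.0, fs=FS):
    """Crude band-pass: zero every Fourier component outside [low_hz, high_hz]."""
    spectrum = np.fft.rfft(strain)
    freqs = np.fft.rfftfreq(len(strain), d=1.0 / fs)
    spectrum[(freqs < low_hz) | (freqs > high_hz)] = 0.0
    return np.fft.irfft(spectrum, n=len(strain))

def minmax_scale(strain):
    """Rescale a (non-constant) array to the [0, 1] range."""
    lo, hi = strain.min(), strain.max()
    return (strain - lo) / (hi - lo)

def crop_around_event(strain, event_time, rng, window=2.5, fs=FS):
    """Cut a `window`-second slice whose start precedes the event by 0.5 to 2 s."""
    n_window = int(window * fs)
    delay = rng.uniform(0.5, 2.0)
    start = int(round((event_time - delay) * fs))
    start = min(max(start, 0), len(strain) - n_window)  # stay inside the array
    return strain[start:start + n_window], start

# Toy 8-second input: a 10 Hz tone (outside the band) plus a 100 Hz tone (inside)
rng = np.random.default_rng(0)
t = np.arange(8 * FS) / FS
strain = np.sin(2 * np.pi * 10 * t) + np.sin(2 * np.pi * 100 * t)

filtered = bandpass_fft(strain)
scaled = minmax_scale(filtered)
cropped, start = crop_around_event(scaled, event_time=5.5, rng=rng)
```

In this toy case the 10 Hz component is removed entirely, and the cropped segment contains 5,120 samples (2.5 seconds at 2048 Hz).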

4 Network architectures

Autoencoders are algorithms that project an input sample to an encoded representation in a latent space, typically of lower dimension than the input space. The encoded representation is then decoded into a reconstruction of the input. The network parameters determine how the input is projected to the latent space and then back to the input space. Their values are fixed by minimizing some input-to-output distance, used as a loss function in the network training. In this study, we consider the mean-squared error (MSE) between each element of the input array and the corresponding output. We consider several network architectures, all structured according to a common scheme with the decoder mirroring the encoder architecture as closely as possible. Four specific architectures are introduced, with DNN, CNN, LSTM, or GRU layers.

As a first example, we consider a DNN AE. The input to this DNN AE consists of 51 one-dimensional arrays of shape (100, 1) corresponding to a 2.5-second interval sampled at 2048 Hz. In this case, the four network blocks are all expressed as fully connected layers. The encoder consists of 3 layers, with 100, 50, and 10 nodes, which produces a latent representation of 10 nodes. The decoder is structured with the mirrored architecture, i.e., 3 layers with 10, 50, and 100 nodes. Hidden-layer nodes are activated using ReLU [23] functions, while linear activation is used for the output layer.
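The input shape quoted above follows from simple arithmetic: 2.5 seconds at 2048 Hz gives 5,120 samples, which hold 51 full windows of 100 time steps. The sketch below assumes non-overlapping windows with the trailing 20 samples dropped; the text does not state how the remainder is handled, and `to_windows` is a hypothetical helper.

```python
import numpy as np

def to_windows(segment, window_len):
    """Split a 1-D segment into non-overlapping windows of shape (window_len, 1)."""
    n_windows = len(segment) // window_len        # incomplete last window is dropped
    trimmed = segment[: n_windows * window_len]
    return trimmed.reshape(n_windows, window_len, 1)

segment = np.zeros(int(2.5 * 2048))               # 5120 samples
ae_windows = to_windows(segment, 100)             # 51 windows of 100 time steps
```

The same helper with a window length of 1024 splits the segment exactly into five arrays of shape (1024, 1).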

The LSTM and GRU networks function similarly, utilizing LSTM and GRU cells instead of simple DNN nodes. The latent space bottleneck in this representation is created by instructing the LSTM (or GRU) cells to return only their final state in the encoding phase and then repeating that final state as a vector that can be fed to the decoder. The input to these recurrent architectures consists of 51 one-dimensional arrays of shape (100, 1) corresponding to a 2.5-second interval sampled at 2048 Hz. The encoder consists of 2 layers, with 32 and 8 units, which produces a latent representation by outputting only the final LSTM state on the final layer. The decoder consists of two layers with 8 and 32 units, whose output is then multiplied by a final temporal slice of a dense layer, yielding the same dimensions as the input representation. For illustration, the architecture of the LSTM model is shown in Fig. 2.


As a comparison, we implement a CNN AE with an input of five one-dimensional images of shape (1024, 1) corresponding to a 2.5-second interval sampled at 2048 Hz. The encoder consists of two one-dimensional convolutional layers with filters of size [256, 128] and kernel of size 3, coupled with a maxpool layer of size 2. The decoder mirrors this architecture with an upsampling layer of size 2, and two one-dimensional convolutional layers with filter size [256, 1] and kernel of size 3. Deeper CNN AEs with additional layers were attempted but yielded worse results.

Figure 2: Graphical illustration of LSTM autoencoder. The general encoder structure still remains, and the latent space is created by setting return_sequences=False in the LSTM cell, passing through only the final time step of the LSTM. This latent space is then repeated using a RepeatVector, forcing the encoder to create some important latent representation as its final output which can then be sent to the decoder.
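A minimal Keras sketch of the recurrent autoencoder described above may help make the bottleneck concrete. It follows the layer sizes quoted in the text (32 and 8 units, mirrored in the decoder) and uses `return_sequences=False` with `RepeatVector` as in the caption. The default LSTM activations, the `adam` optimizer, and the reading of the "temporal slice of a dense layer" as `TimeDistributed(Dense(1))` are assumptions, not details confirmed by the text.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_lstm_autoencoder(timesteps=100, n_features=1):
    """LSTM autoencoder sketch: 32 -> 8 (latent) -> 8 -> 32 -> dense per time step."""
    inputs = keras.Input(shape=(timesteps, n_features))
    # Encoder: the second layer returns only its final state (the latent vector)
    x = layers.LSTM(32, return_sequences=True)(inputs)
    latent = layers.LSTM(8, return_sequences=False)(x)
    # Repeat the latent vector once per time step so the decoder sees a sequence
    x = layers.RepeatVector(timesteps)(latent)
    # Decoder: mirrored recurrent layers, then one dense unit applied at each step
    x = layers.LSTM(8, return_sequences=True)(x)
    x = layers.LSTM(32, return_sequences=True)(x)
    outputs = layers.TimeDistributed(layers.Dense(n_features))(x)
    model = keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="mse")   # optimizer choice is an assumption
    return model

model = build_lstm_autoencoder()
```

Replacing `layers.LSTM` with `layers.GRU` throughout would give the corresponding GRU variant. Training would then call `model.fit(noise, noise, ...)` on noise-only windows, so that the target of each window is the window itself.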

The DNN, LSTM, and GRU autoencoders are all trained on unlabeled detector noise data, with no exposure to the shape of the signal. Doing so, the latent space representation returned by the encoder is exclusively a function of detector noise. As a result, the MSE is relatively consistent during periods with exclusively noise, but it might suddenly increase when a signal event passes through the autoencoder. An example of such a spike in the MSE loss as a function of time is shown in Fig. 3. The spike is typically due to the fact that the encoding/decoding sequence learned on noise might not be optimal for a previously unobserved kind of input data. As a result, the distance between the input and the output could be larger for anomalous data, to the point of generating a spike. Operationally, one could then monitor the MSE value returned by the algorithm, and the detection of a signal could be correlated with the observation of a spike above threshold.
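The monitoring procedure described above can be sketched as follows. The example uses toy data in place of a trained autoencoder: reconstructions are simulated as the input plus a small residual, and one window is deliberately reconstructed badly. The threshold is taken as a quantile of the MSE on a noise-only reference run, which corresponds to fixing a false positive rate. All function names are hypothetical.

```python
import numpy as np

def window_mse(inputs, reconstructions):
    """MSE of each window, for arrays of shape (n_windows, window_len, 1)."""
    return np.mean((inputs - reconstructions) ** 2, axis=(1, 2))

def spikes_above(mse_series, threshold):
    """Indices of the windows whose MSE exceeds the threshold."""
    return np.flatnonzero(mse_series > threshold)

rng = np.random.default_rng(1)

# Noise-only reference run: toy reconstructions with a small residual
ref_in = rng.normal(size=(10_000, 100, 1))
ref_out = ref_in + rng.normal(scale=0.1, size=ref_in.shape)
threshold = np.quantile(window_mse(ref_in, ref_out), 1.0 - 1e-3)  # target FPR of 1e-3

# Monitored stream of 51 windows, with window 30 poorly reconstructed
x = rng.normal(size=(51, 100, 1))
x_hat = x + rng.normal(scale=0.1, size=x.shape)
x_hat[30] += 0.5
mse = window_mse(x, x_hat)
spikes = spikes_above(mse, threshold)
```

With a real model, `x_hat` would be the autoencoder output for `x`, and `spikes` would list the windows to be flagged as candidate signals.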

Figure 3: Spike/dip in MSE loss for an LSTM autoencoder over two 8-second events sampled at 2048 Hz (split into 100-timestep inputs), signaling the detection of a gravitational wave. Detection thresholds are set by fixing an FPR, which is done continuously to create a ROC curve in Section 5.

The performance of the four AE models is assessed by comparing their accuracy on benchmark signal samples to that obtained from binary CNN classifiers, trained on the same data and the corresponding labels. Different classifiers are trained for different signal types. The classifier architecture is loosely equivalent to that of the CNN encoder from Refs. [22, 21]. In particular, it consists of four convolution layers, with 64, 128, 256, and 512 filters, respectively, and two fully connected layers with 128 and 64 nodes, respectively. A ReLU activation function is used throughout. Kernel sizes of 16x16, 16x16, 16x16, and 32x32 with a stride of 1 are used for the convolutional layers, and 4x4 with a stride of 4 for all the (max) pooling layers, with dilations of 1, 2, 2, and 2 in the corresponding convolutional layers. A sigmoid function is used for the single-node output layer. An LSTM implementation of the supervised classifier was also attempted but yielded results far worse than the CNN classifier, so it is not included in this study.

Two classifiers of this kind are trained, using a dataset of 300,000 samples, consisting of noise events and one of the two classes of signal (BBH and BNS) considered in this study. The training is performed minimizing a binary cross entropy error loss function on the training sample of Sec. 3, using the validation set to optimize the training and the test set to evaluate the model performance.

The classifier is tested on noise samples, as well as on BBH and BNS events. When tested on the same kind of signal it is trained on, the classifier accuracy is used to estimate the best accuracy that the AE could reach and, consequently, the loss in accuracy due to the use of an unsupervised approach. When testing the classifier on the signal it was not trained on, we can instead compare the generalization property of the autoencoder to that of a supervised algorithm. The two tests provide an assessment of the balance between accuracy and generalization power and demonstrate the complementarity between our approach and a standard template-based method. In practice, one could implement as many supervised algorithms as known GW sources, while using an unsupervised algorithm to be sensitive to unexpected signal sources (and non-coincident signals across multiple interferometers, in cases of glitch detection and data quality monitoring).

5 Results

Figure 4 shows the receiver operating characteristic (ROC) curves for three autoencoder architectures (LSTM, GRU, CNN). The curves are obtained considering a single detector, i.e., no coincidence is enforced at this stage. In the left (right) figure, the ROC curves are evaluated on noise and a signal sample of BBH (BNS) merger data. For comparison, the CNN classifiers trained on both datasets are shown.

Figure 4: Single-detector ROC curves for the LSTM, GRU, CNN Autoencoders compared to the corresponding CNN supervised architecture. The figure shows the ROC curves for supervised CNNs trained on the BBH and BNS merger data, when the inference is performed on noise and a signal sample of BBH (left) or BNS (right) events.

As the ROC curves show, the LSTM architecture provides the best accuracy among the AEs. The LSTM AE accuracy is worse than that of the classifier trained on the right signal hypothesis, but better than that of the classifier trained on the wrong signal hypothesis. The performance comparison is quantified in Table 1, where the true positive rates (TPRs) corresponding to fixed values of the false positive rate (FPR) are shown. The ROC curves and the values shown in Table 1 quantify the trade-off between accuracy and generalization that motivates this study. This makes autoencoders especially useful to potentially discover unexpected GW sources, as well as GW sources that cannot be modeled by traditional simulation techniques.

FPR     LSTM AE   GRU AE   CNN AE   CNN (BBH-trained)   CNN (BNS-trained)
BBH signal vs. noise
0.1     56.4%     37.8%    43.8%    80.1%               26.9%
0.01    38.7%     21.0%    26.2%    68.3%               11.7%
BNS signal vs. noise
0.1     21.7%     15.4%    16.6%    17.3%               64.0%
0.01    4.2%      2.3%     2.3%     1.75%               44.2%
Table 1: True-positive rates for BBH and BNS merger detection at 10% and 1% false-positive rates, for autoencoders trained on noise and for binary CNN classifiers trained on BBH and BNS simulations. The autoencoder architecture with the best unsupervised results is marked in bold.
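Entries of this kind can be computed from two sets of anomaly scores: one on noise and one on signal events. The sketch below shows one way to do it, by taking the threshold as a quantile of the noise scores and counting the signal events above it. The scores here are toy gamma-distributed numbers, not outputs of the models in this paper, and `tpr_at_fpr` is a hypothetical helper.

```python
import numpy as np

def tpr_at_fpr(noise_scores, signal_scores, target_fprs):
    """True positive rate at each requested false positive rate."""
    noise_scores = np.asarray(noise_scores)
    signal_scores = np.asarray(signal_scores)
    rates = {}
    for fpr in target_fprs:
        # Threshold exceeded by a fraction `fpr` of the noise events
        threshold = np.quantile(noise_scores, 1.0 - fpr)
        rates[fpr] = float(np.mean(signal_scores > threshold))
    return rates

rng = np.random.default_rng(7)
noise_scores = rng.gamma(shape=2.0, scale=1.0, size=60_000)        # toy noise losses
signal_scores = rng.gamma(shape=2.0, scale=1.0, size=5_000) + 3.0  # toy signal losses

rates = tpr_at_fpr(noise_scores, signal_scores, target_fprs=(0.1, 0.01))
```

Scanning `target_fprs` over a fine grid yields the points of a ROC curve.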

This advantage is especially marked with BBHs, which have larger masses and thus larger SNRs. Thus, autoencoder architectures will likely have an advantage with higher mass-range mergers in the regime where supervised learning models cannot generalize but which still have large SNRs. This is as opposed to BNSs, which have lower mass values and consequently lower SNR signatures. In this case, the generalization performance stagnates for both of the models, meaning that both models are extracting the same amount of signal out of the events. Still, the autoencoder models perform better than the supervised algorithm trained on the wrong signal hypothesis.

Figure 5: True positive rates for the LSTM Autoencoder at a fixed False Positive Rate (FPR=0.1) with BBH (left) and BNS (right) events at variable Signal-to-noise ratios.

The TPR values quoted in Table 1 are obtained by averaging across the SNR, which is uniformly distributed in a [5, 20] range. On the other hand, the TPRs of the AE models depend strongly on the SNR value, as shown in Fig. 5 for both BBH and BNS merger events. As shown in the figure, the LSTM AE guarantees better performance across the considered range of SNR values. While the improvement (e.g., with respect to the CNN AE) is roughly constant for BBH events, in the case of BNS events the LSTM AE is markedly better than the other architectures for large SNR values. Overall, this study reinforces the idea that the LSTM AE is the most robust choice among those we considered.

In a realistic exploitation of this algorithm, one would define a threshold above which the data would be called a potential signal. Doing so, one would like to keep the FPR at a manageable rate, while retaining a reasonable TPR value. For instance, an FPR of $10^{-4}$ would correspond to about one false alarm a day, low enough for a post-detection assessment of the nature of the anomaly. Similarly, an FPR of $10^{-6}$ would correspond to about one false alarm every three months, low enough for the algorithm to be used in real-time data processing, e.g., to serve as a trigger for multi-messenger astronomy.
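The link between an FPR and a false alarm rate can be checked with a few lines of arithmetic. The sketch below assumes one decision per non-overlapping 8-second sample and uses FPR values of 1e-4 and 1e-6 for illustration; both are assumptions chosen to match the one-per-day and one-per-three-months figures quoted above.

```python
SECONDS_PER_DAY = 86_400

def false_alarms_per_day(fpr, sample_seconds=8.0):
    """Expected false alarms per day, with one decision per sample."""
    return fpr * SECONDS_PER_DAY / sample_seconds

def days_between_false_alarms(fpr, sample_seconds=8.0):
    """Expected waiting time, in days, between two false alarms."""
    return 1.0 / false_alarms_per_day(fpr, sample_seconds)

per_day = false_alarms_per_day(1e-4)             # about 1.08 false alarms per day
days_apart = days_between_false_alarms(1e-6)     # about 92.6 days
```

Under these assumptions, 10,800 samples are processed per day, so an FPR of 1e-4 gives roughly one false alarm a day and an FPR of 1e-6 roughly one every three months.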

Figure 6: ROC curves for the LSTM, GRU, CNN Autoencoders, obtained by exploiting signal coincidence on two detectors. The loss is computed independently on each detector and the same threshold is applied. The test is performed on noise and a signal sample of BBH (left) or BNS (right) events.

The key ingredient to reach low FPR values is the exploitation of signal coincidence across multiple detectors. Since the noise across detectors is uncorrelated, a single-detector FPR of $10^{-2}$ (as in Table 1) would give a global FPR of $10^{-4}$ ($10^{-6}$) when two (three) instruments are put in coincidence. Clearly, the presence of uncorrelated noise overlaid on the signal dilutes the correlation of the anomaly across different devices. On the other hand, a certain level of correlation is retained. For instance, we observe a 30% correlation on the LSTM anomaly score for BBH merger events. For comparison, a 60% correlation is observed for the CNN classifier. Coincidence can be enforced by requiring that two signals above a certain threshold are detected at the same time. Alternatively, one could apply a threshold on the sum of the two losses, with the idea that an MSE loss function is loosely related to the negative log likelihood, so that the sum of the losses would correspond to the negative log of the likelihood product. The former approach has the advantage of requiring the two detectors to communicate only after the anomaly event in a detector is identified. This means that the data throughput to be transmitted can be kept low. On the other hand, the latter approach provides better performance and it is considered here. In this case, one would have to find solutions to mitigate the data throughput and facilitate the communication of the detectors in real time. For instance, one could run the encoder at each experiment site and transmit the compressed data, with the decoding and coincidence check happening off-site. One should keep in mind that the LSTM model can run on a Field Programmable Gate Array within nsec, as demonstrated in Ref. [40].
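The two coincidence strategies discussed above can be sketched as follows. The example checks, on toy independent losses, that requiring both detectors to exceed a threshold multiplies the single-detector FPRs, and it shows the alternative of thresholding the summed loss. The exponential toy losses and the function names are assumptions for illustration only.

```python
import numpy as np

def coincidence_fpr(single_fpr, n_detectors):
    """Global FPR when n independent detectors must all exceed their threshold."""
    return single_fpr ** n_detectors

def and_coincidence(loss_a, loss_b, threshold):
    """Strategy 1: both detectors exceed the same threshold at the same time."""
    return (loss_a > threshold) & (loss_b > threshold)

def summed_loss_coincidence(loss_a, loss_b, threshold):
    """Strategy 2: a single threshold applied to the sum of the two losses."""
    return (loss_a + loss_b) > threshold

rng = np.random.default_rng(3)
n = 1_000_000
loss_l1 = rng.exponential(size=n)   # toy noise-only losses, detector L1
loss_h1 = rng.exponential(size=n)   # independent toy losses, detector H1

# Single-detector threshold at FPR = 0.1; the AND coincidence gives about 0.01
thr = np.quantile(loss_l1, 0.9)
and_rate = and_coincidence(loss_l1, loss_h1, thr).mean()

# Summed-loss threshold tuned directly to a global FPR of 0.01
sum_thr = np.quantile(loss_l1 + loss_h1, 0.99)
sum_rate = summed_loss_coincidence(loss_l1, loss_h1, sum_thr).mean()
```

With real data, the TPR of each strategy would then be compared at equal global FPR, which is the kind of comparison that favors the summed-loss approach in the text.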

To show this, we consider the case of two detectors and we build a ROC curve requiring a signal above a threshold on the sum of the autoencoder losses. The result is shown in Fig. 6, both for BBH and BNS mergers, and quantified in Table 2. Keeping as a target an FPR of $10^{-4}$, one can retain a BBH TPR comparable to that of the single-detector threshold (see Table 1), while reducing the FPR by two orders of magnitude. The gain is less striking in the case of BNS: the two-detector combination comes with a reduction in terms of TPR, but the value achieved is still better than the square of the single-detector TPRs (e.g., 0.5% for a $10^{-4}$ FPR, to be compared with 0.18% from squaring the value quoted in Table 1). Overall, the improvement obtained using two detectors depends on the SNR value, but there is a clear advantage in exploiting the coincidence of the signal across detectors.

FPR      LSTM AE   GRU AE   CNN AE   CNN (BBH-trained)   CNN (BNS-trained)
BBH signal vs. noise
0.01     54.1%     36.1%    38.3%    78.7%               14.6%
0.0001   38.9%     18.1%    17.4%    58.2%               2.9%
BNS signal vs. noise
0.01     7.4%      3.1%     2.8%     3.3%                53.7%
0.0001   0.5%      0.2%     0.2%     0.0%                31.6%
Table 2: True-positive rates for BBH and BNS merger detection at 1% and 0.01% false-positive rates, obtained by exploiting signal coincidence in two detectors. The autoencoder architecture with the best unsupervised results is marked in bold.

6 Conclusions

We presented an unsupervised strategy to detect GW signals from unspecified sources exploiting an AE trained on noise. The AE is trained to compress input data to a low-dimension latent space and reconstruct a representation of the input from the point in the latent space. The algorithm is optimized using as a loss function a differentiable metric, quantifying the distance between input and output data. Given a trained AE, one could identify anomalous data isolating the outlier data populating the tail of the loss distribution.

We applied this strategy to a sample of synthetic data from two GW interferometers. We explore different choices for the network architecture and compare the single-detector detection capability to that of a CNN binary classifier, trained on specific signal hypotheses. We show how a recurrent AE provides the best anomaly detection performance on benchmark BBH and BNS merger events. We show the trade-off between accuracy and generalization, when using this unsupervised strategy rather than a supervised approach (here quantified through a CNN binary classifier trained on the same data and the corresponding labels).

We show how the coincidence of two detectors, both selecting anomalies at an expected FPR, would retain a TPR of 53.1% (28.7%) for BBH (BNS) signals while giving one false alarm a day, which can easily be discarded after a post-detection analysis, e.g., with more traditional GW detection strategies.

With the same FPR, one could bring the false alarm rate to about once every three months, exploiting the coincidence of three detectors. In this case, the algorithms proposed here could operate in real time as part of a trigger system for multi-messenger astronomy, in the spirit of what is discussed in Ref. [13] for real-time data analysis at the Large Hadron Collider. Considering the relatively low computational cost of such an algorithm [40] and the high impact of a potential signal detection by this algorithm, its implementation for LIGO and VIRGO would certainly be beneficial, despite the fact that the expected detection probability cannot be guaranteed to be high for any signal source.


We are grateful for the insight and expertise of Rana Adhikari and Hang Yu from the LIGO collaboration and Elena Cuoco from the VIRGO collaboration, who guided us in a field of research that is not our own.

Part of this work was conducted at "iBanks", the AI GPU cluster at Caltech. We acknowledge NVIDIA, SuperMicro and the Kavli Foundation for their support of "iBanks".

This work was carried out as part of the 2020 CERN OpenLab Summer Student program, which was held remotely due to the COVID pandemic.

M. P. is supported by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation program (Grant Agreement No. 772369).

E. M. is supported by the Institute for Research and Innovation in Software for High Energy Physics (IRIS-HEP) through a fellowship in Innovative Algorithms.

This work is partially supported by the U.S. DOE, Office of Science, Office of High Energy Physics under Award Nos. DE-SC0011925, DE-SC0019227 and DE-AC02-07CH11359.



  • [1] J. Aasi et al. (2015) Advanced LIGO. Class. Quant. Grav. 32, pp. 074001. External Links: 1411.4547, Document Cited by: §1.
  • [2] B. P. Abbott et al. (2016-02) Observation of gravitational waves from a binary black hole merger. Phys. Rev. Lett. 116, pp. 061102. External Links: Document, Link Cited by: §1.
  • [3] B.P. Abbott et al. (2017) GW170104: Observation of a 50-Solar-Mass Binary Black Hole Coalescence at Redshift 0.2. Phys. Rev. Lett. 118 (22), pp. 221101. Note: [Erratum: Phys.Rev.Lett. 121, 129901 (2018)] External Links: 1706.01812, Document Cited by: §1.
  • [4] B.P. Abbott et al. (2017) GW170817: Observation of Gravitational Waves from a Binary Neutron Star Inspiral. Phys. Rev. Lett. 119 (16), pp. 161101. External Links: 1710.05832, Document Cited by: §1.
  • [5] B.P. Abbott et al. (2017) Multi-messenger Observations of a Binary Neutron Star Merger. Astrophys. J. Lett. 848 (2), pp. L12. External Links: 1710.05833, Document Cited by: §1.
  • [6] B.P. Abbott et al. (2019) Properties of the binary neutron star merger GW170817. Phys. Rev. X 9 (1), pp. 011001. External Links: 1805.11579, Document Cited by: 1st item.
  • [7] F. Acernese et al. (2015) Advanced Virgo: a second-generation interferometric gravitational wave detector. Class. Quant. Grav. 32 (2), pp. 024001. External Links: 1408.3978, Document Cited by: §1.
  • [8] B. Allen et al. (2012) FINDCHIRP: An Algorithm for detection of gravitational waves from inspiraling compact binaries. Phys. Rev. D 85, pp. 122006. External Links: gr-qc/0509116, Document Cited by: §1.
  • [9] G. Ashton et al. (2019-04) Bilby: a user-friendly bayesian inference library for gravitational-wave astronomy. The Astrophysical Journal Supplement Series 241 (2), pp. 27. External Links: Document, Link Cited by: §2.
  • [10] R. Balasubramanian, B.S. Sathyaprakash, and S.V. Dhurandhar (1996-03) Gravitational waves from coalescing binaries: detection strategies and monte carlo estimation of parameters. Phys. Rev. D 53, pp. 3033–3055. External Links: Document, Link Cited by: §1.
  • [11] Z. Benkő, T. Bábel, and Z. Somogyvári (2020) How to find a unicorn: a novel model-free, unsupervised anomaly detection method for time series. Note: arXiv cs.LG/2004.11468 Cited by: §2.
  • [12] A. Bohé et al. (2017) Improved effective-one-body model of spinning, nonprecessing binary black holes for the era of gravitational-wave astrophysics with advanced detectors. Phys. Rev. D 95 (4), pp. 044028. External Links: 1611.03703, Document Cited by: 1st item.
  • [13] O. Cerri et al. (2019) Variational Autoencoders for New Physics Mining at the Large Hadron Collider. JHEP 05, pp. 036. External Links: 1811.10276, Document Cited by: §6.
  • [14] J. Chung et al. (2014) Empirical evaluation of gated recurrent neural networks on sequence modeling. CoRR abs/1412.3555. External Links: Link, 1412.3555 Cited by: §1.
  • [15] T. Cokelaer (2007-11) Gravitational waves from inspiralling compact binaries: hexagonal template placement and its efficiency in detecting physical signals. Phys. Rev. D 76, pp. 102004. External Links: Document, Link Cited by: §1.
  • [16] R.E. Colgan et al. (2020) Efficient Gravitational-wave Glitch Identification from Environmental Data Through Machine Learning. Phys. Rev. D 101 (10), pp. 102003. External Links: 1911.11831, Document Cited by: §1.
  • [17] E. Cuoco et al. (2001) On line power spectra identification and whitening for the noise in interferometric gravitational wave detectors. Class. Quant. Grav. 18, pp. 1727–1752. External Links: gr-qc/0011041, Document Cited by: §3.
  • [18] K. Pearson (1901) LIII. On lines and planes of closest fit to systems of points in space. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science 2 (11), pp. 559–572. External Links: Document Cited by: §2.
  • [19] K. Fukushima (1980) Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biological Cybernetics 36, pp. 193–202. Cited by: §1.
  • [20] T. Gebhard and N. Kilbertus (2019) Generate Gravitational-Wave Data (GGWD). Cited by: Figure 1, §3, §3.
  • [21] D. George and E.A. Huerta (2018) Deep Learning for Real-time Gravitational Wave Detection and Parameter Estimation: Results with Advanced LIGO Data. Phys. Lett. B 778, pp. 64–70. External Links: 1711.03121, Document Cited by: §1, §2, §4.
  • [22] D. George and E.A. Huerta (2018) Deep Neural Networks to Enable Real-time Multimessenger Astrophysics. Phys. Rev. D 97 (4), pp. 044039. External Links: 1701.00008, Document Cited by: §1, §2, §4.
  • [23] R. Hahnloser et al. (2000) Digital selection and analogue amplification coexist in a cortex-inspired silicon circuit. Nature 405, pp. 947–951. Cited by: §4.
  • [24] S. Hochreiter and J. Schmidhuber (1997-12) Long short-term memory. Neural computation 9, pp. 1735–80. External Links: Document Cited by: §1.
  • [25] E. A. Huerta et al. (2014) Accurate and efficient waveforms for compact binaries on eccentric orbits. Phys. Rev. D 90 (8), pp. 084016. External Links: 1408.3406, Document Cited by: §1.
  • [26] E. A. Huerta et al. (2017-01) Complete waveform model for compact binaries on eccentric orbits. Phys. Rev. D 95, pp. 024038. External Links: Document, Link Cited by: §1.
  • [27] E.A. Huerta and D.A. Brown (2013) Effect of eccentricity on binary neutron star searches in Advanced LIGO. Phys. Rev. D 87 (12), pp. 127501. External Links: 1301.1895, Document Cited by: §1.
  • [28] S. Klimenko et al. (2016-02) Method for detection and reconstruction of gravitational wave transients with networks of advanced detectors. Phys. Rev. D 93, pp. 042004. External Links: Document, Link Cited by: §1.
  • [29] Y. LeCun et al. (1999) Object recognition with gradient-based learning. In Shape, Contour and Grouping in Computer Vision, pp. 319–345. External Links: ISBN 978-3-540-46805-9, Document, Link Cited by: §1.
  • [30] LIGO Scientific Collaboration (2018) LIGO Algorithm Library - LALSuite. Note: free software (GPL) External Links: Document Cited by: §3.
  • [31] F. Morawski et al. (2021) Anomaly detection in gravitational waves data using convolutional autoencoders. Machine Learning: Science and Technology. External Links: Link Cited by: §1, §2.
  • [32] E. Moreno (2021-07) Source-Agnostic Gravitational-Wave Detection with Recurrent Autoencoders: H1 detector dataset. Zenodo. External Links: Document Cited by: §3.
  • [33] E. Moreno (2021-07) Source-Agnostic Gravitational-Wave Detection with Recurrent Autoencoders: L1 detector dataset. Zenodo. External Links: Document Cited by: §3.
  • [34] N. Mukund et al. (2017-05) Transient classification in ligo data using difference boosting neural network. Phys. Rev. D 95, pp. 104059. External Links: Document, Link Cited by: §2.
  • [35] A. Nitz et al. (2020-08) Gwastro/pycbc: pycbc release v1.16.9. Zenodo. External Links: Document, Link Cited by: §3, §3.
  • [36] B.J. Owen (1996-06) Search templates for gravitational waves from inspiraling binaries: choice of template spacing. Phys. Rev. D 53, pp. 6749–6761. External Links: Document, Link Cited by: §1.
  • [37] N.S. Philip and K.B. Joseph (2000-12) Boosting the differences: a fast bayesian classifier neural network. Intell. Data Anal. 4 (6), pp. 463–473. External Links: ISSN 1088-467X Cited by: §2.
  • [38] J. Powell et al. (2015-10) Classification methods for noise transients in advanced gravitational-wave detectors. Classical and Quantum Gravity 32 (21), pp. 215012. External Links: Document, Link Cited by: §2.
  • [39] J. Powell, A. Torres-Forné, R. Lynch, D. Trifirò, E. Cuoco, M. Cavaglià, I. S. Heng, and J. A. Font (2017-01) Classification methods for noise transients in advanced gravitational-wave detectors II: performance tests on advanced LIGO data. Classical and Quantum Gravity 34 (3), pp. 034002. External Links: Document, Link Cited by: §2.
  • [40] Z. Que et al. (2021) Accelerating recurrent neural networks for gravitational wave experiments. External Links: 2106.14089 Cited by: §5, §6.
  • [41] I.M. Romero-Shaw et al. (2020-09) Bayesian inference for compact binary coalescences with bilby: validation and application to the first LIGO–Virgo gravitational-wave transient catalogue. Monthly Notices of the Royal Astronomical Society 499 (3), pp. 3295–3319. External Links: ISSN 0035-8711, Document, Link Cited by: §2.
  • [42] D.E. Rumelhart, G.E. Hinton, and R.J. Williams (1986) Learning internal representations by error propagation. In Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1: Foundations, pp. 318–362. External Links: ISBN 026268053X Cited by: §1.
  • [43] B.S. Sathyaprakash and S.V. Dhurandhar (1991-12) Choice of filters for the detection of gravitational waves from coalescing binaries. Phys. Rev. D 44, pp. 3819–3834. External Links: Document, Link Cited by: §1.
  • [44] R. Smith et al. (2016) Fast and accurate inference on gravitational waves from precessing compact binaries. Phys. Rev. D 94 (4), pp. 044031. External Links: 1604.08253, Document Cited by: §1.