From One to Many: A Deep Learning Coincident Gravitational-Wave Search

by Marlin B. Schäfer, et al.

Gravitational waves from the coalescence of compact-binary sources are now routinely observed by Earth-bound detectors. The most sensitive search algorithms convolve many different pre-calculated gravitational waveforms with the detector data and look for coincident matches between different detectors. Machine learning is being explored as an alternative approach to building a search algorithm that has the prospect of reducing computational costs and targeting more complex signals. In this work we construct a two-detector search for gravitational waves from binary black hole mergers using neural networks trained on non-spinning binary black hole data from a single detector. The network is applied to the data from both observatories independently and we check for events coincident in time between the two. This enables the efficient analysis of large quantities of background data by time-shifting the independent detector data. We find that while for a single detector the network retains 91.5% of the sensitivity matched filtering can achieve, this number drops to 83.9% for two observatories. To enable the network to check for signal consistency in the detectors, we then construct a set of simple networks that operate directly on data from both detectors. We find that none of these simple two-detector networks is capable of improving the sensitivity over applying networks individually to the data from the detectors and searching for time coincidences.




I Introduction

Gravitational waves (GWs) are now routinely observed by the two Advanced LIGO detectors Aasi and others (2015) and the Advanced Virgo detector Acernese and others (2015). At the end of the last observing period, the KAGRA detector Akutsu and others (2019) joined the network and is expected to aid observations in the future. During three observing runs GWs from compact binary sources have been identified, almost all of which are consistent with the merger of binary black hole (BBH) systems Abbott and others (2019, 2020); Nitz et al. (2021); Abbott and others (2021).

Many searches for GWs from compact-binary coalescence use matched filtering to separate potential signals from the background detector noise Abbott and others (2020); Messick and others (2017); Dal Canton et al. (2020); Adams et al. (2016). Matched filtering is a technique that convolves a set of pre-calculated template waveforms, each representing a possible source with different component masses, spins, etc., with the detector’s data and is known to be optimal for Gaussian noise Allen et al. (2012). A signal-to-noise ratio (SNR) time series is calculated for each template waveform; candidates are identified by a peak in the SNR time series that also passes data quality checks Nuttall and others (2015); Abbott and others (2016, 2020). In a second step the candidate detections from one detector are cross-validated with the candidate detections from other detectors to further increase the significance of the reported events and rule out false positives Abbott and others (2020); Usman and others (2016); Sachdev and others (2019). For sources where the gravitational-wave signal is unknown or poorly modelled, other search algorithms detect coincident excess power in different detectors and do not require a model Klimenko and others (2016).

Deep learning has started to be explored as an alternative approach to building an algorithm to detect GWs George and Huerta (2018b, a); Gabbard et al. (2018); Dreissigacker and Prix (2020); Schäfer et al. (2020); Krastev et al. (2021); Wei et al. (2021); Wei and Huerta (2021); Cuoco and others (2021); Huerta and Zhao (2021). It may be able to target signals which are currently challenging for matched filter search algorithms due to computational limitations Wei et al. (2021, 2020). The computational cost of these modeled searches scales with the number of templates required by the parameter space. Certain effects like higher-order modes Harry et al. (2018), precession Harry et al. (2016), eccentricity Nitz et al. (2019); Lenon et al. (2021), or the inclusion of sub-solar mass systems Nitz and Wang (2021, 2021) potentially require millions of templates and are thus computationally prohibitive to analyze. Deep learning may also be more sensitive when the noise is non-Gaussian Zevin and others (2017); Essick et al. (2020); Wei and Huerta (2021).

In our previous work Schäfer et al. (2021) we explored the sensitivity of a simple neural network to non-spinning BBH sources in Gaussian noise for a single detector. We tested how different training strategies influence the training procedure and the final efficiency of the network. Our results showed that under the given conditions the network can closely reproduce the sensitivity of matched filtering and that the most efficient convergence is reached when a range of low-SNR signals is provided throughout training.

Here we extend our previous work to two detectors. To do so, we use the same single detector network explored in Schäfer et al. (2021) and apply it individually to the data from both observatories. This procedure produces a list of candidate events for each detector. We then search for coincident events between the two, where two events are assumed to be coincident if they are within the maximum time-of-flight difference between both detectors. We assume this difference to be since the networks are trained to be insensitive to variations on such scale.

The network uses the USR modification we introduced in Schäfer et al. (2021). It outputs a single detector ranking statistic. Here we use it to construct a network ranking statistic. This network ranking statistic turns out to be the sum of the individual ranking statistics minus a correction factor.

The main advantage of this approach is the trivial computation of the search background which enables robust detection claims at comparable statistical significance ( per years) to existing production methodology. By applying time shifts larger than the time-of-flight difference between the detectors to the data from only one observatory, we can create large amounts of data which by construction cannot contain any astrophysical coincident candidates. By applying the time shifts to the single detector events rather than the input data directly, we can skip re-evaluating the entire test set and efficiently look for coincident events. This is a well established method that has already been successfully applied Abbott and others (2020); Usman and others (2016); Sachdev and others (2019). By this approach we can probe the search down to a false-alarm rate (FAR) of 1 false alarm per  months. The FAR estimates how often a candidate is produced by the search under the null hypothesis of no astrophysical candidates. Our FAR estimate is limited by the assigned hardware resources rather than the available data.

We compare this search to an equivalent matched filter search Nitz et al. (2021). We find that the deep learning search still retains of the sensitivity of a two-detector matched filter search when the latter is restricted to using the timing difference between the detectors as the only means for determining coincident events. However, the matched filter search also extracts some information on the parameters of the signal. When we also require matching templates and the phase and amplitude of the triggered templates to be consistent between detectors Nitz et al. (2017), the machine learning search only retains of the sensitivity.

We then construct a single network that operates on the data from both detectors. The idea is that the network may then be able to learn, summarize, and cross-correlate signal characteristics between detectors. To do so, we remove the last layer of the original networks applied to the individual detectors and concatenate their output. Thereby the input data are compressed to a dimensional latent space. Dense layers are used to correlate the concatenated outputs and condense it into a single ranking statistic.

Using a single network complicates the background estimation, as time shifts between the detectors can in principle not be applied after evaluating the individual data streams. However, the two-detector network architecture is constructed such that the data from different detectors is analyzed by individual sub-networks, concatenated and processed by a third sub-network. This enables us to process the bulk of the data only once and apply time shifts to the individual detector sub-network outputs. To obtain the ranking statistic we are then only required to run the time-shifted data through the final, small sub-network.

We find that networks constructed this way are not able to improve the sensitivity over a time coincidence analysis of the single detector machine learning events. We test three different approaches to training these networks but none show any improvement.

II Coincident Search from Independent Single-Detector Networks

The algorithm explored in this section takes a network trained on data from a single detector and uses it to find coincidences in multiple detectors. It is one of the simplest extensions and has two advantages. Firstly, networks trained on data from a single detector can be re-used, which reduces the computational requirements. Secondly, the search background can be estimated using well established and efficient algorithms, allowing for much higher confidence in candidate detections.

II.1 Architecture

We use the same network as in Schäfer et al. (2021), which is an adaptation of the network presented in Gabbard et al. (2018). It consists of 6 stacked convolutional layers followed by 3 dense layers. An overview of the architecture is given in Table 1.

Layer type               Kernel size   Output shape
Input + BatchNorm1d      -
Conv1D + ELU             64
Conv1D                   32
MaxPool1D + ELU          4
Conv1D + ELU             32
Conv1D                   16
MaxPool1D + ELU          3
Conv1D + ELU             16
Conv1D                   16
MaxPool1D + ELU          2
Dense + Dropout + ELU    -
Dense + Dropout + ELU    -
Dense + Softmax          -
Table 1: A detailed overview of the architecture for the single detector neural network. Rows are grouped by their influence on the shape of the data. The layers are to be read from left to right and top to bottom to construct the network.

The last layer contains a Softmax activation function, which we remove during testing. In Schäfer et al. (2021) we showed that this modification, which we called USR, allows the network to be tested at lower FARs than otherwise possible.

The Softmax activation for the first output neuron is given by

$$p_0 = \frac{e^{x_0}}{e^{x_0} + e^{x_1}} = \frac{1}{1 + e^{-\Delta x}}, \qquad (1)$$

where $x = (x_0, x_1)$ is the network output before the activation function and $\Delta x = x_0 - x_1$. When $\Delta x$ is strongly positive, the denominator in (1) and thus the fraction numerically evaluates to $1$. This leads to problems when setting the threshold value used to determine true positive detections Schäfer et al. (2021).

However, equation (1) is bijective and can be inverted

$$\Delta x = \ln\frac{p_0}{1 - p_0}. \qquad (2)$$

This quantity is monotonic and we can thus do statistics on $\Delta x$ directly, avoiding numerical instabilities while still using the Softmax activation during training.
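The saturation problem can be illustrated with a minimal sketch. It assumes the two raw outputs of the final layer are available as plain floats; the names `x0`, `x1` and all function names are chosen here for illustration and are not taken from the original implementation.

```python
import math

def softmax_first(x0, x1):
    """Softmax probability of the first (signal) output neuron."""
    m = max(x0, x1)  # subtract the maximum so that exp never overflows
    e0, e1 = math.exp(x0 - m), math.exp(x1 - m)
    return e0 / (e0 + e1)

def unbounded_statistic(x0, x1):
    """Ranking statistic without the Softmax: difference of the raw outputs."""
    return x0 - x1

def invert_softmax(p0):
    """Recover the output difference from p0 (only valid for 0 < p0 < 1)."""
    return math.log(p0 / (1.0 - p0))

# Two loud events with clearly different raw outputs (hypothetical values).
loud, louder = (40.0, 0.0), (60.0, 0.0)
p_loud, p_louder = softmax_first(*loud), softmax_first(*louder)

# After the Softmax both events are indistinguishable in float64 ...
print(p_loud == p_louder)
# ... while the raw difference still ranks them correctly.
print(unbounded_statistic(*loud) < unbounded_statistic(*louder))
```

Once the Softmax output has rounded to exactly 1, `invert_softmax` can no longer recover the difference, which is why the statistic has to be read off before the activation rather than reconstructed afterwards.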

II.2 Data Sets and Training

The input to the network is a time series of duration sampled at . This allows for signals up to a frequency of to be resolved which is sufficient for the considered parameter space.

The network is trained on signals from non-spinning BBHs with component masses uniformly distributed from   to   . We enforce and for each pair of masses uniformly draw coalescence phases . The signals are generated with the waveform model SEOBNRv4_opt Devine et al. (2016) (optimized version of SEOBNRv4 Bohé and others (2017)) and scaled to varying optimal SNRs in the range during training. The time of merger is varied from   to   from the start of the input window to decrease the dependency of the network on the exact signal position. Each signal is whitened by the analytic model for the detector power spectral density (PSD) aLIGOZeroDetHighPower Collaboration (2018). For further details on the training set please refer to Schäfer et al. (2021).

Notably, we do not vary the sky position, inclination or polarization during training. For a single detector, variations in these parameters can be fully expressed by changes in the distance, which is fixed by choosing a specific SNR, and the phase . For a two-detector setup this degeneracy is broken as a time-of-flight difference is introduced and the amplitudes and phases are correlated in the two detectors. However, our search algorithm is largely parameter agnostic. This means that its output does not depend on the amplitude or phase. Thus, we do not have information on whether or not the search responds to consistent signals. Finally, the time-of-flight difference is on the order of the variation of the merger time within the training set and can, therefore, not be resolved. In section III the network has access to data from both observatories and the data is adjusted accordingly.

All noise is Gaussian and simulated from the aLIGOZeroDetHighPower PSD Collaboration (2018). We explicitly generate colored noise and whiten it afterwards. This in principle allows us to extend our training to real noise.

The training set contains noise samples, of which are combined with unique signals. The validation set (named the efficiency set in our previous work Schäfer et al. (2021)) contains noise samples and unique signal samples, which we subsequently scale to SNRs . This set is used to calculate the efficiency of the network at a fixed false-alarm probability (FAP) of . The FAP is the fraction of discrete noise samples misclassified as signals. The efficiency is the fraction of discrete signal samples correctly classified as signals at a given FAP.
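The two definitions translate directly into a small computation. The following is a sketch under the assumption that the network has already produced one ranking statistic per noise sample and per signal sample; the function names and the example numbers are hypothetical.

```python
def threshold_at_fap(noise_stats, fap):
    """Threshold such that at most a fraction `fap` of noise samples exceed it."""
    ordered = sorted(noise_stats, reverse=True)
    # Number of noise samples that may be misclassified as signals.
    n_allowed = min(int(fap * len(ordered)), len(ordered) - 1)
    # Only the n_allowed loudest noise samples lie strictly above this value.
    return ordered[n_allowed]

def efficiency(signal_stats, threshold):
    """Fraction of signal samples with a statistic above the threshold."""
    return sum(s > threshold for s in signal_stats) / len(signal_stats)

# Hypothetical ranking statistics for ten noise and four signal samples.
noise = [float(i) for i in range(10)]
signals = [6.0, 7.5, 8.0, 10.0]

thr = threshold_at_fap(noise, fap=0.2)
print(thr, efficiency(signals, thr))
```

Repeating the efficiency calculation for signal sets scaled to different loudness gives the efficiency as a function of signal strength at the chosen fixed false-alarm probability.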

The test set contains a month of continuous simulated noise for each of the two detectors in Hanford and Livingston. We inject signals with parameters drawn from the distributions shown in Table 2 into both data streams. Injections are separated by a random time between   to   . To enable the networks to process this data, the continuous stream is sliced into million overlapping, correlated samples. Each sample is whitened individually by the analytic PSD.

We construct a second test set for background estimation. This set contains the same time domain noise as the first test set but no injections are performed. It is pre-processed in the same way as the first test set so that the network can be applied to it.

The network is trained for epochs and we use the network with the highest average efficiency over all SNRs for the analysis carried out here. We use the Adam optimizer with a learning rate of , , and Kingma and Ba (2014). As the loss function we use a variant of the binary cross-entropy which was designed to stay finite


where is for a signal-class sample and for a noise-class sample, is the prediction of the network, is the mini-batch size, and .

We implemented the network using the high-level API Keras Chollet and others (2015) of TensorFlow version 2.3.0 Abadi et al. (2015).

II.3 Single Detector Events

To apply the network to data of duration longer than the input of the network, we use a sliding window with step size . The contents of each window are whitened individually by the PSD model. At each step the network outputs a set of two numbers, the difference of which we use as our ranking statistic.

We apply the same network to the data from both detectors individually. We thus obtain two output time series of ranking statistics. To determine notable events in the individual detectors we apply a threshold to both time series and cluster the resulting points above the threshold into events. A point exceeding the threshold is counted towards a cluster if it is within of the cluster boundaries. We choose a threshold on the USR output of , which corresponds to a Softmax output of .
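The thresholding and clustering step can be sketched as follows. The threshold and the clustering distance are left as parameters, and the choice to report each cluster by its loudest point is an assumption made here for illustration; `cluster_events` is a hypothetical helper, not the original code.

```python
def cluster_events(times, stats, threshold, cluster_gap):
    """Threshold a ranking-statistic time series and cluster it into events.

    `times` must be sorted in ascending order. Returns one (time, statistic)
    tuple per cluster, taken at the loudest point of the cluster (assumption).
    """
    events = []
    cluster = []  # (time, statistic) points of the cluster being built
    for t, s in zip(times, stats):
        if s <= threshold:
            continue  # point does not exceed the threshold
        # Close the current cluster if this point is too far from its boundary.
        if cluster and t - cluster[-1][0] > cluster_gap:
            events.append(max(cluster, key=lambda p: p[1]))
            cluster = []
        cluster.append((t, s))
    if cluster:
        events.append(max(cluster, key=lambda p: p[1]))
    return events

# Hypothetical output time series of one detector.
times = [0.0, 0.1, 0.2, 1.0, 1.1, 5.0]
stats = [1.0, 5.0, 3.0, 0.0, 4.0, 6.0]
print(cluster_events(times, stats, threshold=0.5, cluster_gap=0.35))
```

Running the same function on the output time series of each detector yields the two single detector event lists used in the coincidence step.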

The search algorithm produces a list of events, where an event is a tuple . Each event is a time at which the network predicts a signal to be present with a ranking statistic . The ranking statistic can be used to assign a significance to the event.

II.4 Coincident Events

A signal will be present in the data of all detectors if it is of astrophysical origin. Its SNR in each detector depends on the location and orientation of the source. The number of false alarms can, thus, be reduced by requiring that the event is picked up by multiple detectors at similar times.

To quantify the significance of an event detected by more than one observatory, a combined ranking statistic is required. For simplicity we restrict our current analysis to two detectors. However, this approach is extendable to any number of detectors.

If the network used the final Softmax activation during evaluation, a combined ranking statistic would follow straightforwardly from the interpretation of the output as a probability.


The 1-to-1 relation between and given in equation (1) can be inserted into (4) to get


The combined ranking statistic is the sum of the single detector ranking statistics minus a correction term.
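A sketch of one way to arrive at such a statistic is given below. It assumes that the combined probability is the product of the two single detector Softmax outputs and converts that product back with the inverse of the Softmax; under this assumption the result is Δx_HL = Δx_H + Δx_L − ln(1 + e^{Δx_H} + e^{Δx_L}), where the symbol names and the function name are chosen here for illustration.

```python
import math

def combined_statistic(dx_h, dx_l):
    """Combined ranking statistic of two single detector statistics.

    Assumes p_HL = p_H * p_L and evaluates
        -ln(e^{-dx_h} + e^{-dx_l} + e^{-(dx_h + dx_l)}),
    which equals dx_h + dx_l - ln(1 + e^{dx_h} + e^{dx_l}).
    The log-sum-exp form avoids overflow for loud events.
    """
    terms = (-dx_h, -dx_l, -(dx_h + dx_l))
    m = max(terms)
    return -(m + math.log(sum(math.exp(t - m) for t in terms)))

# Cross-check against the direct route through the probabilities.
def sigmoid(dx):
    return 1.0 / (1.0 + math.exp(-dx))

p = sigmoid(1.0) * sigmoid(2.0)
print(combined_statistic(1.0, 2.0), math.log(p / (1.0 - p)))
```

For two loud events the value is roughly the statistic of the quieter detector, so a coincidence is only ranked highly when both detectors respond.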

We consider an event in one detector to be coincident with another event in the other detector if the event times are within of each other. This time difference is chosen to be the maximum time resolution the networks can achieve due to the time variation in the training set.

We construct a list of coincident events from the single detector lists by the above condition. Each coincident event is assigned the combined ranking statistic (5) and the time in the Hanford detector.
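The construction of the coincident event list can be sketched as below. Events are assumed to be `(time, statistic)` tuples, the allowed time difference `max_dt` is a parameter, and the function that combines the two statistics is passed in by the caller; all names and example values are hypothetical.

```python
def find_coincidences(events_h, events_l, max_dt, combine):
    """Pair Hanford and Livingston events that lie within `max_dt` of each other.

    Returns (Hanford time, combined statistic) tuples.
    """
    events_h, events_l = sorted(events_h), sorted(events_l)
    coincident = []
    start = 0
    for t_h, s_h in events_h:
        # Skip Livingston events too early to match this or any later event.
        while start < len(events_l) and events_l[start][0] < t_h - max_dt:
            start += 1
        j = start
        while j < len(events_l) and events_l[j][0] <= t_h + max_dt:
            coincident.append((t_h, combine(s_h, events_l[j][1])))
            j += 1
    return coincident

# Hypothetical single detector events; a plain sum stands in for the
# combined ranking statistic.
hanford = [(10.0, 5.0), (20.0, 7.0)]
livingston = [(10.05, 4.0), (30.0, 9.0)]
print(find_coincidences(hanford, livingston, max_dt=0.1, combine=lambda a, b: a + b))
```

Because both lists are sorted, each Livingston event is skipped at most once, so the cost grows roughly linearly with the number of events rather than with the product of the two list lengths.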

II.5 Background Estimation

To estimate the FAR at different ranking statistic values we evaluate the same noise used to search for signals but omit injecting the GWs. This ensures that all events found in this data set are noise artifacts and are not influenced by nearby injections.

We apply the network to the data and determine events as described in subsection II.3. We obtain two lists of events and search for coincidences as detailed in subsection II.4.

The lowest FAR that can be probed is limited by the duration of the analyzed data. Our test set covers one month. The duration can be increased by shifting the data in one of the detectors by a time larger than the maximum time-of-flight duration between the detectors. Rather than shifting the data itself one may instead alter the event times returned by the search. This allows us to skip reanalyzing the full data for each time shift and only requires us to look for coincidences between the events from one detector and the time-shifted events from the second detector. Increasing the amount of background by applying time shifts is a well established method that has already been successfully applied in production searches Abbott and others (2020); Usman and others (2016); Sachdev and others (2019).

We choose a time shift of and apply any possible integer multiple of this step size. We then search for coincidences in these events as detailed in subsection II.4. This procedure increases our background to years.

The FAR at a given network ranking statistic is obtained by counting the coincident background events found in this way that have a larger ranking statistic and dividing this number by the effective duration of the analyzed background.
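The time-shift procedure can be sketched as follows. The sketch assumes that shifted event times wrap around cyclically at the end of the data and that the effective background duration is the number of shifts times the data duration; both are assumptions made here, as are the function names and example values.

```python
def time_shifted_background(events_h, events_l, duration, shift, max_dt, combine):
    """Collect coincidences between Hanford events and shifted Livingston events.

    Every non-zero integer multiple of `shift` that fits into `duration` is
    applied. Coincidences across the wrap-around boundary are ignored.
    Returns the background statistics and the effective background duration.
    """
    n_shifts = int(duration // shift)
    stats = []
    for k in range(1, n_shifts):
        offset = k * shift
        shifted = [((t + offset) % duration, s) for t, s in events_l]
        for t_h, s_h in events_h:
            for t_l, s_l in shifted:
                if abs(t_h - t_l) <= max_dt:
                    stats.append(combine(s_h, s_l))
    return stats, (n_shifts - 1) * duration

def false_alarm_rate(stat, background_stats, background_duration):
    """Number of louder background events per unit of background time."""
    return sum(b > stat for b in background_stats) / background_duration

# Hypothetical events in ten seconds of data, shifted in steps of one second.
bg, bg_duration = time_shifted_background(
    [(5.0, 3.0)], [(3.0, 4.0)], duration=10.0, shift=1.0, max_dt=0.1,
    combine=lambda a, b: a + b,
)
print(bg, bg_duration, false_alarm_rate(6.0, bg, bg_duration))
```

Only the short event lists are shifted and compared, so the network never has to be re-evaluated on the shifted data.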

II.6 Sensitivity

The sensitive volume of a search can be estimated by

$$V(\mathcal{F}) \approx V(d_\mathrm{max}) \frac{N_{I,\mathcal{F}}}{N_I}, \qquad (6)$$

when it is derived on data containing injections which are distributed uniformly in volume Usman and others (2016). Here $\mathcal{F}$ is the FAR at which the volume is being calculated, $d_\mathrm{max}$ is the maximum distance of any injection, $V(d_\mathrm{max})$ is the volume of a sphere with radius $d_\mathrm{max}$, $N_{I,\mathcal{F}}$ is the number of signals detected with a FAR $\leq \mathcal{F}$ and $N_I$ is the total number of injected signals. We report the radius of a sphere with volume $V(\mathcal{F})$ instead of the sensitive volume.

We analyze a month of simulated data from the two detectors Hanford and Livingston, assuming the PSD aLIGOZeroDetHighPower Collaboration (2018). The data contains injections drawn from the distribution shown in Table 2. We apply the network to the data from both detectors individually as described in subsection II.3. The resulting single detector events are correlated and a list of coincident events is produced as detailed in subsection II.4. We then pick out any events that are within of an injection. These events are called foreground events from here on.

Parameter Uniform distribution
Component masses
Spins 0
Coalescence phase
Right ascension
Table 2: Distributions of the parameters used for the injections in the test set.

To determine the search background, we evaluate the same month of noise used to find the foreground events. However, this data does not contain any injections. The networks return a list of single detector events, which are correlated and shifted in time to increase the effective duration of the analyzed data as detailed in subsection II.5. The resulting coincident events are called background events from here on.

We can then assign a FAR to any foreground event. To do so we count the number of background events with a ranking statistic larger than the ranking statistic of the considered foreground event. This number is divided by the effective duration of the analyzed background to obtain a FAR. The sensitive volume is then obtained from equation (6) and converted to a distance. The sensitive distance as a function of the FAR is obtained by evaluating the sensitive volume at the FARs of all foreground events.
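The conversion from found injections to a sensitive distance can be sketched as below. It assumes the sensitive volume is the volume of the sphere out to the farthest injection scaled by the fraction of injections recovered at or below the chosen false-alarm rate; the function name, argument names and example numbers are hypothetical.

```python
import math

def sensitive_distance(found_fars, n_injected, d_max, far):
    """Radius of a sphere whose volume equals the estimated sensitive volume.

    found_fars : false-alarm rates assigned to the recovered injections
    n_injected : total number of injections
    d_max      : maximum distance of any injection
    far        : false-alarm rate at which the sensitivity is evaluated
    """
    n_found = sum(f <= far for f in found_fars)
    sphere = 4.0 / 3.0 * math.pi * d_max ** 3
    volume = sphere * n_found / n_injected
    return (3.0 * volume / (4.0 * math.pi)) ** (1.0 / 3.0)

# Hypothetical example: 16 injections out to a distance of 1000, four of
# which were recovered with the listed false-alarm rates.
fars = [0.1, 0.5, 2.0, 10.0]
for far in sorted(fars):
    print(far, sensitive_distance(fars, n_injected=16, d_max=1000.0, far=far))
```

Evaluating the function at the false-alarm rate of every foreground event, as in the loop above, traces out the sensitivity curve.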

II.7 Matched Filtering

The template bank contains unique waveforms and is constructed such that no more than of the SNR of any signal is lost due to the discreteness of the bank. It covers the same mass range of   to   as the training set of the networks and spins are set to . The individual templates are generated using the waveform model IMRPhenomD Husa et al. (2016); Khan et al. (2016) and placed stochastically.

To run the matched filter search we use the program pycbc_inspiral Nitz et al. (2021). It is set up to use an SNR threshold of in both detectors to create two sets of single detector triggers. These two sets are then checked for coincidence by two different approaches.

One approach handles the matched filter triggers analogously to the network single detector triggers, i.e. they are clustered and turned into single detector events as described in subsection II.3. In this case the ranking statistic is the SNR returned by the best matching template. We then look for coincidences as described in subsection II.4 by requiring two events in different detectors to be separated by no more than . The combined ranking statistic in this case is the network SNR, given by

$$\rho = \sqrt{\rho_H^2 + \rho_L^2}, \qquad (7)$$

where $\rho_H$ and $\rho_L$ are the single detector SNRs in Hanford and Livingston, respectively. This disregards the information about the possible parameters obtained from the best matching template and only looks for time coincidence, i.e. no signal consistency is required.

The other approach leverages the signal information and checks for phase and amplitude correlation as well as requiring that the templates matching the data are consistent between detectors. In particular we utilize the combined ranking statistic given in equation (2) of Nitz et al. (2017) and find coincidences as described therein.

II.8 Evaluation and Comparison to Matched Filtering

In Figure 1 we show the injections that were found and missed by the network coincident search at a FAR of false alarm per month. The x-axis shows the optimal SNR of the injections in the Hanford detector and the y-axis shows the optimal SNR in the Livingston detector. The color indicates the network ranking statistic as calculated by equation (5). Missed injections are marked with a red cross. A network SNR of as calculated by equation (7) is highlighted by the black line.

Figure 1 shows that the combined ranking statistic (5) is correlated with the network SNR. As the network SNR increases so does the combined ranking statistic. The loudest missed injection has a network SNR of . However, the signal is seen predominantly in the Hanford detector with a single detector SNR of , whereas Livingston has an optimal SNR due to the location of the source. Therefore, it is not surprising that the signal does not show up in both detectors and is missed by the coincidence search. When considering only the detector in which the signal is observable with lower SNR, the loudest missed signal has an optimal SNR of in that detector.

Figure 1: Found and missed injections from the test set as returned by the procedure discussed in section II. The top panel overlays the missed injections by the found injections and the bottom panel reverses the order. The x- and y-axis show the optimal SNRs of the injections in the Hanford and Livingston detector, respectively. The color of found injections represents the combined ranking statistic as defined by equation (5). Missed injections are marked by a red cross. The black line indicates an optimal network SNR of . The plot is generated at a FAR of false alarm per month.

In Figure 2 we show the sensitive distance of different algorithms as a function of the FAR. The orange lines show the sensitivity curves of the machine learning based algorithms whereas the purple lines show the sensitivities of a comparable matched filter search. The dashed lines show the sensitivity of the searches when only a single detector is considered. We compare those to a two-detector search where we require coincident detections in both detectors. The filled orange line and the dash-dotted purple line show the comparison between the machine learning and matched filter algorithms, respectively, when both impose the same coincidence condition. The filled purple line shows a more realistic application of matched filtering where the consistency of the time of arrival, the phase, the amplitude, as well as the parameters of the best matching template are required.

We find a significant improvement of up to at a given FAR when the machine learning algorithm has access to data from both detectors compared to using only data from a single detector. Furthermore, by applying time shifts between detectors as described in subsection II.5, we can probe FARs down to false alarms per month without needing to evaluate additional data. In principle this limit may be decreased even further, as the time shifts are only limited by the time-of-flight difference between the detectors. The large increase in the available background potentially greatly increases the statistical significance of any event.

The sensitivities of the machine learning search algorithms are compared to an equivalent matched filter search. For the single detector searches given by the dashed lines in Figure 2 we find that, at a fixed FAR, the machine learning algorithm retains at least of the sensitivity of the matched filter analogue. This corresponds to a maximum absolute separation of . This difference in sensitivity is essentially unchanged when data from two detectors is considered and both the machine learning and the matched filter search calculate coincidences based only on the timing in the different detectors. The corresponding curves in Figure 2 are the filled orange and the dash-dotted purple line, respectively. In this case, the machine learning algorithm retains at least of the sensitivity of the time coincidence matched filter search which corresponds to an absolute separation of .

However, matched filtering also carries information about the intrinsic parameters of the source, the relative phase, and the relative amplitudes in the two detectors. This information can be used to further constrain coincidences and improve the ranking statistic Nitz et al. (2017) by testing for signal consistency. We compare the time coincidence machine learning search (filled, orange line in Figure 2) to this matched filter coincidence search utilizing signal consistency checks (filled, purple line in Figure 2). The machine learning search now retains only at least of the sensitivity in FAR regions where both are defined. This corresponds to an absolute separation of .

We truncate the sensitivity curve of any search that has access to data from both detectors in Figure 2 at a FAR of false alarms per month. This is done due to a large number of true positives at high FARs originating from random noise coincidences. This means that the search returns a coincident event that is caused by a particular noise realization which happens to coincide with an injection with an optimal SNR below the trigger threshold. Many of these injections should thus not be recoverable but are detected at high FAR due to these noise fluctuations. At a FAR of per month we expect fewer than of these false associations. Another reason to compare the sensitivity of the machine learning and the matched filtering based searches only at low FARs is the thresholds used to find triggers. The matched filter search uses a threshold of SNR whereas the machine learning search uses a threshold on the USR ranking statistic of . Because there is no direct relation between these two statistics, we cannot guarantee that both thresholds correspond to similar signal strengths. It may be possible that one search excludes weak signals which are found by the other based on this difference in the threshold.

The sensitivity difference between machine learning and matched filtering stays constant between using data from a single detector and using data from two detectors when matched filtering may only check for time consistency between detection candidates from the two observatories. The performance difference increases when matched filtering also checks for signal consistency. It is, therefore, reasonable to believe that a multi-detector machine learning search may be more sensitive when it too can check for signal consistency. This would require either the single detector network to output parameter estimates of the detected signal alongside a ranking statistic or a single network that uses the data from both detectors as input. In the following section III we explore the second option.

Figure 2: Shown are the sensitive distances of different search algorithms as a function of the FAR. In orange we show the sensitivity curves of the machine learning based searches presented in Schäfer et al. (2021) and this work. In purple we show sensitivity curves of an equivalent matched filter search. The dashed lines are derived on data only from a single detector. A label “coinc. ” refers to events being tested for coincidence based solely on the time difference of the events in the two detectors. The label “coinc. signal” means that the matched filter search also checked for signal consistency based on the time-, phase-, amplitude-difference, and intrinsic parameters in the two detectors. Sensitivities derived on data from more than one detector are truncated at a FAR of per month due to an increasing number of true detections caused by random coincident events in the noise.

III Two Detector Network

The deep learning algorithm presented in section II is significantly less sensitive than the full matched filter analysis that takes signal consistency into account. On the other hand, when the deep learning algorithm is compared to the matched filter search where signal consistency is ignored, the difference in sensitivity is comparable to the difference in sensitivity for a single detector. This gives reason to believe that the difference in sensitivity compared to the full matched filter search could be reduced if the network can operate on the data from both detectors and consider coincidences itself.

III.1 Architecture

We construct a network that uses data from both detectors while still retaining the ability to efficiently estimate a large background. The network from section II is still applied to the data from the two detectors individually. However, the final layer is removed and the output neurons from both networks are concatenated. We then add further fully connected layers to look for coincidences between the detectors. An overview of the network is shown in Figure 3.

Figure 3: A high-level overview of the two-detector architecture. The network consists of three sub-networks A, B, and C. The sub-networks A and B follow the architecture given in Table 1 with the final row removed. The fully connected Dense layers contain , , and neurons in that order. All but the final Dense layer are equipped with an exponential linear unit (ELU) activation.

The last layer from the single detector network is removed to create a large latent space. A matched filter search compresses the input data into the ranking statistic, the time of the merger, and the parameters of the best matching template. The intention is that neurons may be sufficient for a comparable compression and that the additional layers that operate on the concatenated outputs could perform a signal consistency analysis.
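The role of sub-network C can be illustrated with a minimal sketch in plain Python. The latent size per detector, the layer widths, and the random weights below are placeholders chosen for illustration only and are not the values used in this work; `elu`, `dense`, `random_layer`, and `subnetwork_c` are hypothetical names. The only properties taken from the text are the concatenation of the two latent vectors and the use of an ELU activation on all but the final Dense layer.

```python
import math
import random


def elu(x, alpha=1.0):
    """Exponential linear unit."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def dense(inputs, weights, biases, activation=None):
    """Fully connected layer: one weight row and one bias per output neuron."""
    outputs = []
    for row, bias in zip(weights, biases):
        value = sum(w * x for w, x in zip(row, inputs)) + bias
        outputs.append(activation(value) if activation else value)
    return outputs


def random_layer(n_in, n_out, rng):
    """Placeholder weights; a trained network would supply these."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def subnetwork_c(latent_a, latent_b, layers):
    """Concatenate both latent vectors and apply the Dense stack.

    All layers but the last use ELU; the last layer is linear.
    """
    x = list(latent_a) + list(latent_b)
    for i, (weights, biases) in enumerate(layers):
        is_last = i == len(layers) - 1
        x = dense(x, weights, biases, None if is_last else elu)
    return x


rng = random.Random(0)
LATENT = 4          # assumed latent size per detector (illustrative only)
WIDTHS = [8, 4, 2]  # assumed Dense widths (illustrative only)
sizes = [2 * LATENT] + WIDTHS
layers = [random_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]

# Stand-ins for the outputs of sub-networks A and B.
latent_a = [rng.gauss(0, 1) for _ in range(LATENT)]
latent_b = [rng.gauss(0, 1) for _ in range(LATENT)]
output = subnetwork_c(latent_a, latent_b, layers)
```

Because sub-network C only ever sees the two latent vectors, any signal consistency check it learns has to be expressible in terms of what A and B choose to encode.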

The sub-networks A and B in Figure 3 are intended to act as encoders that reduce the dimensional input into a latent space of dimension . It may be interesting in the future to train these sub-networks initially as autoencoders Kramer (1991), from which only the encoder is used for detection purposes afterwards. Autoencoders are neural networks which in their simplest form consist of an encoder network and a decoder network. The encoder network compresses the input to some lower dimensional latent representation, whereas the decoder uses that lower dimensional representation to reconstruct the input. Other studies have already found that autoencoders have potential applications in GW data analysis Shen et al. (2019); Gabbard et al. (2019).

III.2 Data Sets and Training

The network is trained on data similar to that presented in subsection II.2. However, the data is extended to two detectors and sources are uniformly distributed in the sky. The latter change is required due to the amplitude and phase correlations in the two detectors.

We utilize the pre-trained single detector network used in section II in two different ways. In both cases the single detector parts of the two detector network (A and B in Figure 3) are initialized with the weights of the pre-trained model from section II. However, for one of the two networks, these weights are then not optimized during training, leaving only the weights of the final fully connected layers (C in Figure 3) to be adjusted. This approach is known as transfer learning Weiss et al. (2016) and has been successfully applied to different problems Tan et al. (2018); George et al. (2018); Mesuga and Bayanay (2021). The second network optimizes the weights of the entire network. We also train a third network of the same architecture, where all parameters are initialized randomly and optimized during training.
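The three training variants differ only in how sub-networks A and B are initialized and whether they are optimized. The following sketch summarizes this as a small configuration table; the keys "transfer", "initialized", and "scratch" are the labels used in Figure 4, while the class `TrainingSetup` and the function `optimized_subnetworks` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingSetup:
    """Which parts of the two detector network start pre-trained or get optimized."""
    pretrained_encoders: bool   # A and B start from the single detector weights
    optimize_encoders: bool     # A and B are updated during training
    optimize_head: bool = True  # C is updated in every variant


SETUPS = {
    "transfer": TrainingSetup(pretrained_encoders=True, optimize_encoders=False),
    "initialized": TrainingSetup(pretrained_encoders=True, optimize_encoders=True),
    "scratch": TrainingSetup(pretrained_encoders=False, optimize_encoders=True),
}


def optimized_subnetworks(name):
    """Return the sub-networks whose weights change during training."""
    setup = SETUPS[name]
    parts = []
    if setup.optimize_encoders:
        parts += ["A", "B"]
    if setup.optimize_head:
        parts.append("C")
    return parts
```

In a deep learning framework, the "transfer" variant would typically be realized by marking the layers of A and B as non-trainable before compiling the model.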

The same optimizer settings and loss function described in subsection II.2 are used to train all three networks for epochs. They are trained with a Softmax activation on the final layer, which is removed during evaluation. Each network is only trained once and the epoch with the highest efficiency on the validation set is chosen for further analysis.
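Why the Softmax is removed for evaluation can be illustrated numerically. The sketch below assumes a final layer with two outputs and takes the difference of the two pre-activation outputs as the unbounded statistic; both are assumptions made for illustration, and the function names are hypothetical.

```python
import math


def softmax(values):
    """Numerically stable Softmax."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def unbounded_statistic(values):
    """Difference of the two pre-activation outputs (assumed form)."""
    return values[0] - values[1]


# Two hypothetical pre-activation outputs of different loudness.
loud = [40.0, 0.0]
louder = [50.0, 0.0]

p_loud = softmax(loud)[0]
p_louder = softmax(louder)[0]
```

In double precision both `p_loud` and `p_louder` round to exactly 1.0, so the two candidates could not be ranked against each other after the Softmax, whereas the pre-activation difference still separates them. This matters when a search has to be evaluated at low FARs, where only the loudest events are compared.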

III.3 Coincident Events

Because the networks output a single value when given the data from two detectors, we interpret that output as a coincidence ranking statistic at the corresponding time. We then perform the same clustering and thresholding described in subsection II.3 to obtain a list of coincident events.

III.4 Background Estimation

Determining the background of the two detector network is more challenging than for the single detector network from subsection II.5, as there is no direct way of performing time shifts in a computationally efficient way. Naively, one would therefore be limited by the duration of the analyzed data or would have to re-evaluate the entire month of test data multiple times. However, the network is designed in such a way that the data from both detectors are still analyzed individually and combined only at later stages. We evaluate the single detector data individually with the sub-networks A and B from Figure 3 and store those outputs. We then permute the order of the outputs from sub-network B such that it corresponds to a time shift with respect to the output from sub-network A. Finally, sub-network C is applied to the concatenated data from sub-networks A and B for many different time shifts. Since sub-network C is very simple and time shifts can be generated trivially, this process generates months of background within on an NVIDIA RTX 2070 Super.
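The procedure can be sketched in a few lines of plain Python. The functions `encoder` and `head` are trivial stand-ins for the sub-networks A/B and C, respectively, and the use of a cyclic shift is an assumption of this sketch; all names and the toy data are hypothetical. The point is only that the expensive encoders run once per window, while the cheap head runs once per window and time shift.

```python
calls = {"encoder": 0}


def encoder(window):
    """Stand-in for sub-network A or B: the expensive step (placeholder)."""
    calls["encoder"] += 1
    return [sum(window), max(window)]


def head(latent_a, latent_b):
    """Stand-in for sub-network C: the cheap step (placeholder)."""
    return sum(latent_a) + sum(latent_b)


def roll(items, shift):
    """Cyclically shift a list so that item i moves to index i + shift."""
    shift %= len(items)
    return items[-shift:] + items[:-shift] if shift else list(items)


def background(windows_a, windows_b, shifts):
    """Evaluate the head for every time shift, encoding each window once."""
    stored_a = [encoder(w) for w in windows_a]
    stored_b = [encoder(w) for w in windows_b]
    results = {}
    for shift in shifts:
        shifted_b = roll(stored_b, shift)
        results[shift] = [head(a, b) for a, b in zip(stored_a, shifted_b)]
    return results


# Toy data: six analysis windows per detector.
windows_a = [[0.1 * i, 0.2 * i] for i in range(6)]
windows_b = [[0.3 * i, 0.1 * i] for i in range(6)]
shifts = [1, 2, 3, 4, 5]
stats = background(windows_a, windows_b, shifts)
```

With six windows and five shifts, the encoders are called 12 times in total while the head is evaluated 30 times, so the cost of additional background scales with the cost of sub-network C alone.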

III.5 Evaluation and Comparison to Matched Filtering

Figure 4 shows the sensitive distance of the various networks as a function of the FAR and compares them to the results presented in subsection II.8. All curves are truncated at a FAR of per month due to the large number of false associations described in subsection II.8. The three networks utilizing the data from both detectors described in this section are labeled as "Machine learning network coinc.". The matched filter results are shown in purple, where the dash-dotted line considers only time coincidence and the solid line also takes the consistency of intrinsic source parameters, phase, and amplitude into account. The orange line corresponds to the network from section II.

Figure 4: The sensitivity of different search algorithms as a function of the FAR. All algorithms shown operate on the data from two detectors. The curves labeled "Machine learning coinc." are neural network search algorithms that consider data from both detectors; an overview of their architecture can be found in Figure 3. The network labeled "initialized" initializes the sub-networks A and B shown in Figure 3 from the single detector network used in subsection II.8 but optimizes them during the subsequent training. The network labeled "transfer" initializes both sub-networks in the same way as the "initialized" network but freezes their weights. The network labeled "scratch" initializes all parameters of the network randomly. All other searches operate on the data from the individual detectors first and then search for coincident events. A label "coinc. " refers to events being tested for coincidence based solely on the time difference of the events in the two detectors. The label "coinc. signal" means that the matched filter search also checked for signal consistency based on intrinsic parameters and the time, phase, and amplitude differences in the two detectors. The curve labeled "Machine learning coinc. " refers to the two-detector machine learning search analyzed in subsection II.8. All sensitivities are truncated at a FAR of per month due to a growing number of true positive detections caused by the coincidence of noise events.

The networks described in this section were designed to be able to take signal consistency into account by reducing the input data to a large latent space. We therefore expected their sensitivities at low FARs to exceed those obtained from time coincidence between single detector events produced by the single detector network.

However, we find that at low FARs all of the two detector networks are roughly as sensitive as the network tested in subsection II.8. Therefore, they are still less sensitive than the matched filter equivalent and do not seem to take signal consistency into account. For high FARs, on the other hand, they are more sensitive. We suspect that the large time variation of the peak amplitude of may be responsible for this behavior. The networks are thereby trained to be insensitive to variations in timing of less than , which may produce phase and amplitude variations in a broad range.

IV Conclusions

In this paper we have extended the single detector deep learning GW search algorithm from Gabbard et al. (2018); Schäfer et al. (2021) to two detectors and compared it to an equivalent matched filter algorithm. We found that the simplest extension, applying the one detector network to the data from two detectors individually and searching for coincident events, retains of the sensitivity of matched filtering when only the time consistency between detectors is required. This fraction drops to when signal consistency between detectors is also considered.

To operate on data from two observatories, we constructed a two detector ranking statistic for the machine learning search based on the single detector USR ranking statistic proposed in Schäfer et al. (2021). This ranking statistic proved to be correlated with the network SNR.

We also highlighted the advantages of using a single detector network to construct a two detector search. Firstly, the single detector network does not need to be re-trained to be applied to the second detector, if both have similar noise characteristics. Secondly, this approach enables an efficient background estimation by applying relative time shifts to the recovered single detector events. This allows the two detector search to be tested down to almost arbitrarily low FARs at low computational cost. This method has already proven to be effective and reliable in state-of-the-art classical search algorithms Abbott and others (2020); Usman and others (2016); Sachdev and others (2019).

Because using a single detector network restricts one to checking for coincidences based solely on the timing difference, we tested a simple network that operates on data from both detectors directly. In principle, this allows the network to construct internal signal representations which can be correlated between observatories. The network was constructed by removing the final layer of the single detector network, concatenating the outputs, and adding a few fully connected layers to check for coincident events. The final fully connected layers, thus, receive latent variables for each detector that can be checked for coincidence.

This design of the two detector network allowed us to do efficient background estimation. By applying relative time shifts to the outputs of the individual detector sub-networks, only the final few fully connected layers need to be evaluated for all shifts. The bulk of the computation, namely evaluating the input data of the detectors, only needs to be done once.

The network architecture was trained in three different ways: with randomly initialized parameters for the entire network, with the parameters of the sub-networks initialized from those of the single detector network, and with the parameters of the individual detector sub-networks fixed to the single detector parameters so that only the final fully connected layers are optimized.

We found that all of these networks have very similar performance at low FARs. None of them performed substantially better than the initial network that looked for time coincident events between the single detector network outputs. It therefore seems that the network architecture explored here is unable to learn any additional information about the signal. This may be caused by the allowed time-variance of for signals in the training set, which may limit the time resolution of the network and thus overshadow correlations in any other parameters. More sophisticated network architectures with higher time resolution may improve on our findings. First promising steps have already been taken by Wei et al. (2021). Using an autoencoder to find a more meaningful latent representation of the input data may also be of use.

While the sensitivity was not improved by using a single network to process the data of two detectors, we still want to highlight that the method of determining the background may be of use for future networks.

Here we limited our research to GWs from non-spinning binary black holes with signal duration and Gaussian noise. It would be desirable to lift each of these simplifications. In particular, considering real noise may increase the gap in sensitivity between the single detector and multi detector search algorithms by vetoing glitches. While we considered only two detectors, an extension to a larger network should be trivial and may follow studies such as Davies et al. (2020).

V Acknowledgements

We thank Ondřej Zelenka, Frank Ohme, and Bernd Brügmann for valuable discussions and their scientific input. We acknowledge the Max Planck Gesellschaft and the Atlas cluster computing team at Albert-Einstein Institut (AEI) Hannover for support.


  • J. Aasi et al. (2015) Advanced LIGO. Class. Quantum Grav. 32, pp. 074001. External Links: Document, 1411.4547 Cited by: §I.
  • M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mané, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viégas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, and X. Zheng (2015) TensorFlow: large-scale machine learning on heterogeneous systems. Note: Software available from External Links: Link Cited by: §II.2.
  • B. P. Abbott et al. (2016) GW150914: The Advanced LIGO Detectors in the Era of First Discoveries. Phys. Rev. Lett. 116 (13), pp. 131103. External Links: 1602.03838, Document Cited by: §I.
  • B. P. Abbott et al. (2019) GWTC-1: A Gravitational-Wave Transient Catalog of Compact Binary Mergers Observed by LIGO and Virgo during the First and Second Observing Runs. Phys. Rev. X 9 (3), pp. 031040. External Links: 1811.12907, Document Cited by: §I.
  • B. P. Abbott et al. (2020) A guide to LIGO–Virgo detector noise and extraction of transient gravitational-wave signals. Class. Quant. Grav. 37 (5), pp. 055002. External Links: 1908.11170, Document Cited by: §I.
  • R. Abbott et al. (2020) GWTC-2: Compact Binary Coalescences Observed by LIGO and Virgo During the First Half of the Third Observing Run. External Links: 2010.14527 Cited by: §I, §I, §I, §II.5, §IV.
  • R. Abbott et al. (2021) Observation of Gravitational Waves from Two Neutron Star–Black Hole Coalescences. Astrophys. J. Lett. 915 (1), pp. L5. External Links: 2106.15163, Document Cited by: §I.
  • F. Acernese et al. (2015) Advanced Virgo: a second-generation interferometric gravitational wave detector. Class. Quantum Grav. 32 (2), pp. 024001. External Links: Document, 1408.3978 Cited by: §I.
  • T. Adams, D. Buskulic, V. Germain, G. M. Guidi, F. Marion, M. Montani, B. Mours, F. Piergiovanni, and G. Wang (2016) Low-latency analysis pipeline for compact binary coalescences in the advanced gravitational wave detector era. Class. Quant. Grav. 33 (17), pp. 175012. External Links: 1512.02864, Document Cited by: §I.
  • T. Akutsu et al. (2019) KAGRA: 2.5 Generation Interferometric Gravitational Wave Detector. Nature Astron. 3 (1), pp. 35–40. External Links: 1811.08079, Document Cited by: §I.
  • B. Allen, W. G. Anderson, P. R. Brady, D. A. Brown, and J. D. E. Creighton (2012) FINDCHIRP: An Algorithm for detection of gravitational waves from inspiraling compact binaries. Phys. Rev. D 85, pp. 122006. External Links: gr-qc/0509116, Document Cited by: §I.
  • A. Bohé et al. (2017) Improved effective-one-body model of spinning, nonprecessing binary black holes for the era of gravitational-wave astrophysics with advanced detectors. Phys. Rev. D 95 (4), pp. 044028. External Links: 1611.03703, Document Cited by: §II.2.
  • F. Chollet et al. (2015) Keras. Note: Cited by: §II.2.
  • L. S. Collaboration (2018) LIGO Algorithm Library - LALSuite. Note: free software (GPL) External Links: Document Cited by: §II.2, §II.2, §II.6.
  • E. Cuoco et al. (2021) Enhancing Gravitational-Wave Science with Machine Learning. Mach. Learn. Sci. Tech. 2 (1), pp. 011002. External Links: 2005.03745, Document Cited by: §I.
  • T. Dal Canton, A. H. Nitz, B. Gadre, G. S. Davies, V. Villa-Ortega, T. Dent, I. Harry, and L. Xiao (2020) Realtime search for compact binary mergers in Advanced LIGO and Virgo’s third observing run using PyCBC Live. External Links: 2008.07494 Cited by: §I.
  • G. S. Davies, T. Dent, M. Tápai, I. Harry, C. McIsaac, and A. H. Nitz (2020) Extending the PyCBC search for gravitational waves from compact binary mergers to a global network. Phys. Rev. D 102 (2), pp. 022004. External Links: 2002.08291, Document Cited by: §IV.
  • C. Devine, Z. B. Etienne, and S. T. McWilliams (2016) Optimizing spinning time-domain gravitational waveforms for Advanced LIGO data analysis. Class. Quant. Grav. 33 (12), pp. 125025. External Links: 1601.03393, Document Cited by: §II.2.
  • C. Dreissigacker and R. Prix (2020) Deep-Learning Continuous Gravitational Waves: Multiple detectors and realistic noise. Phys. Rev. D 102 (2), pp. 022005. External Links: 2005.04140, Document Cited by: §I.
  • R. Essick, P. Godwin, C. Hanna, L. Blackburn, and E. Katsavounidis (2020) iDQ: Statistical Inference of Non-Gaussian Noise with Auxiliary Degrees of Freedom in Gravitational-Wave Detectors. External Links: 2005.12761 Cited by: §I.
  • H. Gabbard, C. Messenger, I. S. Heng, F. Tonolini, and R. Murray-Smith (2019) Bayesian parameter estimation using conditional variational autoencoders for gravitational-wave astronomy. External Links: 1909.06296 Cited by: §III.1.
  • H. Gabbard, M. Williams, F. Hayes, and C. Messenger (2018) Matching matched filtering with deep networks for gravitational-wave astronomy. Phys. Rev. Lett. 120 (14), pp. 141103. External Links: 1712.06041, Document Cited by: §I, §II.1, §IV.
  • D. George and E. A. Huerta (2018a) Deep Learning for Real-time Gravitational Wave Detection and Parameter Estimation: Results with Advanced LIGO Data. Phys. Lett. B 778, pp. 64–70. External Links: 1711.03121, Document Cited by: §I.
  • D. George and E. A. Huerta (2018b) Deep Neural Networks to Enable Real-time Multimessenger Astrophysics. Phys. Rev. D 97 (4), pp. 044039. External Links: 1701.00008, Document Cited by: §I.
  • D. George, H. Shen, and E. A. Huerta (2018) Classification and unsupervised clustering of LIGO data with Deep Transfer Learning. Phys. Rev. D 97 (10), pp. 101501. External Links: Document Cited by: §III.2.
  • I. Harry, J. Calderón Bustillo, and A. Nitz (2018) Searching for the full symphony of black hole binary mergers. Phys. Rev. D 97 (2), pp. 023004. External Links: 1709.09181, Document Cited by: §I.
  • I. Harry, S. Privitera, A. Bohé, and A. Buonanno (2016) Searching for Gravitational Waves from Compact Binaries with Precessing Spins. Phys. Rev. D 94 (2), pp. 024012. External Links: 1603.02444, Document Cited by: §I.
  • E. A. Huerta and Z. Zhao (2021) Advances in Machine and Deep Learning for Modeling and Real-time Detection of Multi-Messenger Sources. External Links: 2105.06479 Cited by: §I.
  • S. Husa, S. Khan, M. Hannam, M. Pürrer, F. Ohme, X. Jiménez Forteza, and A. Bohé (2016) Frequency-domain gravitational waves from nonprecessing black-hole binaries. I. New numerical waveforms and anatomy of the signal. Phys. Rev. D 93 (4), pp. 044006. External Links: 1508.07250, Document Cited by: §II.7.
  • S. Khan, S. Husa, M. Hannam, F. Ohme, M. Pürrer, X. Jiménez Forteza, and A. Bohé (2016) Frequency-domain gravitational waves from nonprecessing black-hole binaries. II. A phenomenological model for the advanced detector era. Phys. Rev. D 93 (4), pp. 044007. External Links: 1508.07253, Document Cited by: §II.7.
  • D. P. Kingma and J. Ba (2014) Adam: A Method for Stochastic Optimization. arXiv e-prints, pp. arXiv:1412.6980. External Links: 1412.6980 Cited by: §II.2.
  • S. Klimenko et al. (2016) Method for detection and reconstruction of gravitational wave transients with networks of advanced detectors. Phys. Rev. D 93 (4), pp. 042004. External Links: 1511.05999, Document Cited by: §I.
  • M. A. Kramer (1991) Nonlinear principal component analysis using autoassociative neural networks. AIChE Journal 37 (2), pp. 233–243. External Links: Document, Link Cited by: §III.1.
  • P. G. Krastev, K. Gill, V. A. Villar, and E. Berger (2021) Detection and Parameter Estimation of Gravitational Waves from Binary Neutron-Star Mergers in Real LIGO Data using Deep Learning. Phys. Lett. B 815, pp. 136161. External Links: 2012.13101, Document Cited by: §I.
  • A. K. Lenon, D. A. Brown, and A. H. Nitz (2021) Eccentric Binary Neutron Star Search Prospects for Cosmic Explorer. External Links: 2103.14088 Cited by: §I.
  • C. Messick et al. (2017) Analysis Framework for the Prompt Discovery of Compact Binary Mergers in Gravitational-wave Data. Phys. Rev. D 95 (4), pp. 042001. External Links: 1604.04324, Document Cited by: §I.
  • R. Mesuga and B. J. Bayanay (2021) On the Efficiency of Various Deep Transfer Learning Models in Glitch Waveform Detection in Gravitational-Wave Data. External Links: 2107.01863, Document Cited by: §III.2.
  • A. Nitz, I. Harry, D. Brown, C. M. Biwer, J. Willis, T. D. Canton, C. Capano, L. Pekowsky, T. Dent, A. R. Williamson, G. S. Davies, S. De, M. Cabero, B. Machenschalk, P. Kumar, S. Reyes, D. Macleod, dfinstad, F. Pannarale, T. Massinger, S. Kumar, M. Tápai, L. Singer, S. Khan, S. Fairhurst, A. Nielsen, S. Singh, shasvath, and B. U. V. Gadre (2021) Gwastro/pycbc: 1.18.0 release of pycbc External Links: Document, Link Cited by: §I, §II.7.
  • A. H. Nitz, C. D. Capano, S. Kumar, Y. Wang, S. Kastha, M. Schäfer, R. Dhurkunde, and M. Cabero (2021) 3-OGC: Catalog of gravitational waves from compact-binary mergers. External Links: 2105.09151 Cited by: §I.
  • A. H. Nitz, T. Dent, T. Dal Canton, S. Fairhurst, and D. A. Brown (2017) Detecting binary compact-object mergers with gravitational waves: Understanding and Improving the sensitivity of the PyCBC search. Astrophys. J. 849 (2), pp. 118. External Links: 1705.01513, Document Cited by: §I, §II.7, §II.8.
  • A. H. Nitz, A. Lenon, and D. A. Brown (2019) Search for Eccentric Binary Neutron Star Mergers in the first and second observing runs of Advanced LIGO. Astrophys. J. 890, pp. 1. External Links: 1912.05464, Document Cited by: §I.
  • A. H. Nitz and Y. Wang (2021) Search for gravitational waves from the coalescence of sub-solar mass and eccentric compact binaries. External Links: 2102.00868, Document Cited by: §I.
  • A. H. Nitz and Y. Wang (2021) Search for gravitational waves from the coalescence of sub-solar mass binaries in the first half of Advanced LIGO and Virgo’s third observing run. External Links: 2106.08979 Cited by: §I.
  • L. Nuttall et al. (2015) Improving the Data Quality of Advanced LIGO Based on Early Engineering Run Results. Class. Quant. Grav. 32 (24), pp. 245005. External Links: 1508.07316, Document Cited by: §I.
  • S. Sachdev et al. (2019) The GstLAL Search Analysis Methods for Compact Binary Mergers in Advanced LIGO’s Second and Advanced Virgo’s First Observing Runs. External Links: 1901.08580 Cited by: §I, §I, §II.5, §IV.
  • M. B. Schäfer, F. Ohme, and A. H. Nitz (2020) Detection of gravitational-wave signals from binary neutron star mergers using machine learning. Phys. Rev. D 102 (6), pp. 063015. External Links: 2006.01509, Document Cited by: §I.
  • M. B. Schäfer, O. Zelenka, A. H. Nitz, F. Ohme, and B. Brügmann (2021) Training Strategies for Deep Learning Gravitational-Wave Searches. External Links: 2106.03741 Cited by: §I, §I, §I, Figure 2, §II.1, §II.1, §II.1, §II.2, §IV, §IV, footnote 1.
  • H. Shen, D. George, E. A. Huerta, and Z. Zhao (2019) Denoising Gravitational Waves with Enhanced Deep Recurrent Denoising Auto-Encoders. External Links: 1903.03105, Document Cited by: §III.1.
  • C. Tan, F. Sun, T. Kong, W. Zhang, C. Yang, and C. Liu (2018) A survey on deep transfer learning. In Artificial Neural Networks and Machine Learning – ICANN 2018, V. Kůrková, Y. Manolopoulos, B. Hammer, L. Iliadis, and I. Maglogiannis (Eds.), Cham, pp. 270–279. External Links: ISBN 978-3-030-01424-7, Document Cited by: §III.2.
  • S. A. Usman et al. (2016) The PyCBC search for gravitational waves from compact binary coalescence. Class. Quant. Grav. 33 (21), pp. 215004. External Links: 1508.02357, Document Cited by: §I, §I, §II.5, §II.6, §IV.
  • W. Wei, E. A. Huerta, M. Yun, N. Loutrel, R. Haas, and V. Kindratenko (2020) Deep Learning with Quantized Neural Networks for Gravitational Wave Forecasting of Eccentric Compact Binary Coalescence. External Links: 2012.03963 Cited by: §I.
  • W. Wei and E. A. Huerta (2021) Deep learning for gravitational wave forecasting of neutron star mergers. Phys. Lett. B 816, pp. 136185. External Links: 2010.09751, Document Cited by: §I.
  • W. Wei, A. Khan, E. A. Huerta, X. Huang, and M. Tian (2021) Deep Learning Ensemble for Real-time Gravitational Wave Detection of Spinning Binary Black Hole Mergers. Phys. Lett. B 812, pp. 136029. External Links: 2010.15845, Document Cited by: §I, §IV.
  • K. Weiss, T. M. Khoshgoftaar, and D. Wang (2016) A survey of transfer learning. Journal of Big Data 3 (1), pp. 9. External Links: ISSN 2196-1115, Document, Link Cited by: §III.2.
  • M. Zevin et al. (2017) Gravity Spy: Integrating Advanced LIGO Detector Characterization, Machine Learning, and Citizen Science. Class. Quant. Grav. 34 (6), pp. 064003. External Links: 1611.04596, Document Cited by: §I.