I Background and motivation
The pioneering work of George and Huerta (2018) showed convolutional neural networks (CNNs) to be a powerful approach to the gravitational wave (GW) detection problem: a GW signature buried in noisy interferometer data Abbott et al. (2016c, 2017) can be detected with a CNN. A CNN is a regularized multilayer artificial neural network that utilizes the hierarchical features in data LeCun et al. (1990, 1989). The localized nature of convolutions gives CNNs great performance on raw data, especially image data. The approach is to train the CNN with noisy data both with and without GW signatures. In the standard pipeline, data streams from interferometers are searched by matched-filtering against a collection of approximate GW templates; when high correlation with a template is detected, it is shared with partners for verification against data from other interferometers and for electromagnetic follow-up Abbott et al. (2017). Appropriately trained CNNs have been shown to be an excellent first-pass detection method preceding the more specific, computationally expensive matched-filter and labor-intensive verification steps.
The original method of George and Huerta (2018) followed the standard CNN architecture and therefore requires a large training set and a computationally cumbersome choice of hyper-parameters. Adding feature selection prior to classification with the CNN can improve performance.
In this letter, we propose a new method improving on the CNN approach by including topological features of the data, in particular the persistent homology of sliding window embeddings, a technique from topological data analysis (TDA) Carlsson (2009). The proposed method makes training more efficient, consequently reducing the required size of the training set significantly. The aforementioned localized effect of convolution layers makes this a low-risk endeavor: adding topological features should not decrease performance, because as the CNN is trained it can learn to ignore these features by assigning them small weights.
Another important potential enhancement is increased generality. Interesting GW signals form a large and diverse class; for instance, multiple parameters of a black-hole merger change its signature, e.g. the mass ratio. TDA is a lossy process, but it preserves key properties such as period and decay rate when classifying wave-packets.
II Data synthesis
Signals were generated by a surrogate model described in Blackman et al. (2015). The model generates non-spinning binary black-hole merger gravitational waveforms with mass ratio between and . Its accuracy is close to that of the high-fidelity model, which requires solving Einstein’s equations with the Spectral Einstein Code (SpEC). The reduced model is constructed by selecting the most relevant mass ratios using a greedy algorithm. The surrogate model is highly accurate after including about waveforms, in that its error becomes comparable to the truncation error of the SpEC.
We use reference signals with mass ratios between and , sampled at . Each window is seconds long. We construct training sets by adding noise and embedding the signal in the noise so that it occurs at a random time.
Let $s$ be a signal from the reference set and $n$ be Gaussian noise with fixed standard deviation; the GW signal is embedded in the noise. Non-Gaussian noise can be treated in a similar manner. We scale the noise against the signal with a unitless scaling coefficient $\alpha$. The synthetic data is then
$$d = s + \alpha n.$$
The coefficient scales the noise amplitude down to roughly the same order of magnitude as the signal. The signal is inserted at a random position in a piece of noise of duration seconds ( elements at Hz) scaled with the same factor as above. This yields a signal of length seconds. A GW signal is present with probability 1/2, so the training data is balanced. By cycling through a sample of signals and values of the scaling coefficient while randomly choosing signal presence and occurrence time, we construct an arbitrarily large training set. The scaling coefficient corresponds to the optimal match-filtered SNR in Table 1.
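The synthesis procedure above can be sketched as follows (a minimal illustration; the function make_sample, the stand-in chirp, and all numeric values are our own assumptions, not the paper's code):

```python
import numpy as np

def make_sample(signal, alpha, noise_len, rng):
    """Embed `signal` at a random position in scaled Gaussian noise.

    `alpha` is the unitless noise-scaling coefficient; a GW signal is
    present with probability 1/2, giving a balanced training set.
    """
    noise = alpha * rng.standard_normal(noise_len)
    has_gw = rng.random() < 0.5
    if has_gw:
        start = rng.integers(0, noise_len - len(signal) + 1)
        noise[start:start + len(signal)] += signal
    return noise, has_gw

rng = np.random.default_rng(42)                           # fixed seed for reproducibility
chirp = np.sin(2 * np.pi * np.linspace(0, 8, 1024) ** 2)  # stand-in chirp waveform
x, label = make_sample(chirp, alpha=2.0, noise_len=4096, rng=rng)
```

Cycling over reference signals, values of the scaling coefficient, and random positions then yields an arbitrarily large labeled training set.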
Figure 1: GW signals (red) superimposed on noise (blue), with random padding included. SNR: , , left to right, respectively.
Figure 1 shows noisy signals (blue) that contain the GW (red) with SNR = , respectively. As the figure shows, the GW is extremely weak compared to the noise, making detection challenging.
Note that the noise is not Gaussian in general: it comes from various sources, terrestrial and astrophysical, giving color to the noise Abbott et al. (2016a); Allen et al. (2002). Also note that real-event SNRs can be higher than the range used for our synthetic data Abbott et al. (2016d, 2017b). We focus on lower SNRs because we are interested in the limitations of the detection methods, though an SNR below is generally considered too low to be verifiable. Most signals detected at the observation sites have an SNR within the range we chose to test Abbott et al. (2016b); Scientific et al. (2017); Abbott et al. (2017a, 2017).
III Sliding window embedding
We use the persistent homology of sliding window embeddings described in Perea and Harer (2015) (see Section IV for persistent homology). The main difference between noise with and without a GW signal can be characterized by the periodic structure of the embedded signals.
For a time series $f$ with $N$ samples, the sliding window embedding of size $M$ at time index $t$ is
$$SW_M f(t) = \big(f(t), f(t+1), \dots, f(t+M)\big) \in \mathbb{R}^{M+1},$$
yielding $N - M$ points in $\mathbb{R}^{M+1}$. The sliding window embedding of a periodic signal is a circle or an oval, that of a decaying periodic signal is a spiral, and that of white noise is a ball. These are easily distinguished using homology groups, providing a method to classify different periodic-like behaviors Perea and Harer (2015).
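A minimal sketch of the embedding (the delay of one sample and the window length here are illustrative choices, not the paper's values):

```python
import numpy as np

def sliding_window_embedding(f, window):
    """Point cloud of delay vectors (f[t], ..., f[t+window-1]).

    A series of N samples yields N - window + 1 points; a periodic
    series traces out a closed loop in the embedding space.
    """
    n = len(f) - window + 1
    return np.stack([f[t:t + window] for t in range(n)])

t = np.linspace(0, 4 * np.pi, 200)
cloud = sliding_window_embedding(np.sin(t), window=20)  # 181 points in R^20
```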
The window size is not optimized but chosen heuristically. Figure 2 shows the sliding window embedding of white noise with SNR .
Dimensional reduction: The sliding window creates a point cloud in a high-dimensional space. However, the features of interest are much lower dimensional, requiring only a few dimensions for their persistent homology. This, along with the expense of TDA in high-dimensional spaces, motivates a dimensional reduction before proceeding. We project down to three dimensions for ease of visualization and to avoid the information loss of using only two.
Denote the point cloud by the matrix $X$ whose rows are the embedded points. We first center the columns:
$$\tilde{X}_{ij} = X_{ij} - \mu_j,$$
where $X_{ij}$ denotes the usual matrix element notation and $\mu_j$ the mean value of the $j$-th column vector of $X$. Then the singular value decomposition (SVD) is conducted and a projection operator is constructed to project the point cloud onto the first three scaled left singular vectors:
$$Y = \tilde{X} V_3 = U_3 \Sigma_3,$$
where $\tilde{X} = U \Sigma V^\dagger$ is the SVD of $\tilde{X}$, $Y$ the three-dimensional reduction of $\tilde{X}$, $V_3$ the first three right singular vectors, and the superscript $\dagger$ the Hermitian conjugate.
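The centering-and-projection step can be sketched with NumPy's SVD (a generic PCA-style reduction consistent with the description above, not the authors' exact routine):

```python
import numpy as np

def reduce_to_3d(cloud):
    """Mean-center each column, then project onto the first three
    scaled left singular vectors (the top-3 principal components)."""
    centered = cloud - cloud.mean(axis=0)
    u, s, vh = np.linalg.svd(centered, full_matrices=False)
    return u[:, :3] * s[:3]        # equivalently: centered @ vh[:3].T

rng = np.random.default_rng(0)
cloud = rng.standard_normal((100, 20))   # toy 20-dimensional point cloud
reduced = reduce_to_3d(cloud)            # 100 points in R^3
```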
Figure 3 is the dimensionally reduced sliding window embedding of a GW signal (left) and white noise (right). Notice that the topology of the white noise is approximately a ball, while the GW chirp signature has a different topology.
IV TDA: Persistent Homology
Let $X$ be the topological space of interest, the embedding space of GW signals in our case. Let $\Delta^n$ be the $n$-simplex, the convex hull of $n+1$ vertices. Let $R$ be a ring and $C_n$ be the free $R$-module generated by all possible continuous images of $n$-simplices $\sigma : \Delta^n \to X$. Let $\partial_n : C_n \to C_{n-1}$ be the boundary map
$$\partial_n \sigma = \sum_{i=0}^{n} (-1)^i \, \sigma|_{[v_0, \dots, \hat{v}_i, \dots, v_n]},$$
where the $v_i$ are the vertices and $\hat{v}_i$ denotes omission of the $i$-th vertex. The factor $(-1)^i$ is included to preserve the orientation. The $n$-th homology group of $X$ with coefficients in $R$, $H_n(X; R)$, is the quotient group of the kernel and image groups:
$$H_n(X; R) = \ker \partial_n / \operatorname{im} \partial_{n+1}.$$
The number of generators of $H_n$ is called the Betti number $b_n$, roughly the number of $n$-dimensional geometric holes of $X$; $b_0$ is the number of connected components. For example, for the circle $S^1$, $b_0 = 1$ and $b_1 = 1$, while for the sphere $S^2$, $b_0 = 1$, $b_1 = 0$ and $b_2 = 1$.
Given the point cloud, we construct a simplicial complex by gluing a finite number of simplices together; homology computed over a growing family of such complexes is known as persistent homology. We use the Vietoris-Rips complex Carlsson (2009), which is built by taking all points as zero-simplices. For a fixed value of the filtration parameter $\epsilon$, we add an edge between two points if their distance is less than $\epsilon$, a triangle between three points if each pair of them has distance less than $\epsilon$, and so on for higher-dimensional simplices. We repeat this process for various values of $\epsilon$.
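The edge-addition rule can be illustrated directly (a toy sketch of the filtration; in practice the persistent homology itself is computed with GUDHI, as noted in the Software paragraph):

```python
import numpy as np
from itertools import combinations

def rips_edges(points, eps):
    """Edges of the Vietoris-Rips complex at filtration value eps:
    one edge per pair of points at distance less than eps."""
    return [(i, j) for i, j in combinations(range(len(points)), 2)
            if np.linalg.norm(points[i] - points[j]) < eps]

square = np.array([[0., 0.], [1., 0.], [1., 1.], [0., 1.]])  # unit square
len(rips_edges(square, 0.5))   # 0 edges: no pair is that close
len(rips_edges(square, 1.1))   # 4 edges: the four sides appear
len(rips_edges(square, 1.5))   # 6 edges: the diagonals (length ~1.41) join too
```

Triangles and higher simplices are added analogously whenever every pairwise distance within a tuple of points falls below eps.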
The barcode is the graph of $b_n$ against the filtration parameter $\epsilon$. It displays not only $b_n$ at each $\epsilon$, but also how long each generator remains non-trivial; the length of this interval is referred to as the persistence. The starting point of each interval is called its “birth” and the end point its “death”. The equivalent vertical representation, with birth on the $x$-axis and death on the $y$-axis, is called the persistence diagram.
Let $k_0$ be the number of persistences in the $H_0$ barcode and $k_1$ the number in the $H_1$ barcode. Once both barcodes are obtained, we sort the persistences by descending magnitude. Let $p^{(0)}_i$ be the $i$-th ordered persistence in $H_0$, $1 \le i \le k_0$, and similarly $p^{(1)}_i$ the $i$-th ordered persistence in $H_1$, $1 \le i \le k_1$. Let $v_0$ and $v_1$ be
$$v_0 = \big(p^{(0)}_1, \dots, p^{(0)}_{k_0}\big), \qquad v_1 = \big(p^{(1)}_1, \dots, p^{(1)}_{k_1}\big).$$
We call these persistence vectors, and they are how we choose to encode topological features as a vector that can be used as input to a CNN.
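A sketch of this encoding (the barcode values and the fixed length are illustrative; Section V describes how the fixed length is actually imposed):

```python
import numpy as np

def persistence_vector(barcode, length):
    """Sort persistences (death - birth) in descending order, then
    pad with zeros or truncate to a fixed length."""
    pers = np.sort([death - birth for birth, death in barcode])[::-1]
    out = np.zeros(length)
    out[:min(length, len(pers))] = pers[:length]
    return out

h1_barcode = [(0.1, 0.9), (0.2, 0.3), (0.0, 0.05)]  # toy (birth, death) pairs
v1 = persistence_vector(h1_barcode, length=5)       # ~[0.8, 0.1, 0.05, 0, 0]
```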
We use persistent homology in dimensions 0 and 1 because the existing results regarding TDA of sliding window embeddings focus on $H_1$ Perea and Harer (2015). $H_0$ was included because its performance impact was negligible and the nature of a CNN allows such features to be used or disregarded as training sees fit. The Vietoris-Rips complex is computationally feasible as we use only $H_0$ and $H_1$, but it may become extremely resource-intensive for higher-order homology groups.
V Preprocessing for CNN
The persistence vectors are adjusted to a fixed length by either truncation or zero-padding as needed, and then concatenated into one vector $v$. The raw signal is then concatenated with $v$. The resulting vector is of fixed size and ready for input into the CNN. In the actual procedure, we first normalize $v$ and the raw signal separately so that they have the same maximum. A similar approach has been applied to VLBI signal analysis Lee et al. (2019).
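A sketch of this preprocessing (the lengths and the normalization-to-equal-maximum convention follow the description above; function and variable names are ours):

```python
import numpy as np

def cnn_input(raw, v0, v1, vec_len):
    """Truncate/zero-pad each persistence vector to vec_len,
    normalize the TDA features and the raw signal to the same maximum,
    and concatenate into one fixed-size CNN input."""
    def fix_len(v):
        out = np.zeros(vec_len)
        out[:min(vec_len, len(v))] = v[:vec_len]
        return out

    v = np.concatenate([fix_len(v0), fix_len(v1)])
    raw = raw / np.abs(raw).max()
    if v.max() > 0:
        v = v / v.max()
    return np.concatenate([raw, v])

x = cnn_input(np.sin(np.linspace(0, 6, 100)),
              v0=np.array([0.8, 0.1]), v1=np.array([0.5]), vec_len=4)
# x has length 100 + 2 * 4 = 108
```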
VI Hyper-parameters & Procedure
The hyper-parameters used here are meant to replicate the work in George and Huerta (2018). They are suboptimal, but we use them for comparison purposes.
|2||Convolution||64, strides = 1, kernel size = 16|
|3||Max Pooling||strides = 4, pool size = 4|
|5||Convolution||128, strides = 1, kernel size = 16|
|6||Max Pooling||strides = 4, pool size = 4|
|8||Convolution||256, strides = 1, kernel size = 16|
|9||Max Pooling||strides = 4, pool size = 4|
|11||Convolution||512, strides = 1, kernel size = 32|
|12||Max Pooling||strides = 4, pool size = 4|
Table 2: The hyper-parameters used to replicate the work in George and Huerta (2018).
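Under the assumption of ReLU activations between the listed layers and a sigmoid dense head (neither is specified in the table above, which omits the intermediate layer numbers), the architecture could be sketched in Keras as:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Input length 8192 is a placeholder; the paper's value is not stated here.
model = keras.Sequential([
    layers.Conv1D(64, kernel_size=16, strides=1, activation="relu",
                  input_shape=(8192, 1)),
    layers.MaxPooling1D(pool_size=4, strides=4),
    layers.Conv1D(128, kernel_size=16, strides=1, activation="relu"),
    layers.MaxPooling1D(pool_size=4, strides=4),
    layers.Conv1D(256, kernel_size=16, strides=1, activation="relu"),
    layers.MaxPooling1D(pool_size=4, strides=4),
    layers.Conv1D(512, kernel_size=32, strides=1, activation="relu"),
    layers.MaxPooling1D(pool_size=4, strides=4),
    layers.Flatten(),
    layers.Dense(1, activation="sigmoid"),   # binary: GW present or not
])
model.compile(optimizer="adam", loss="mse")  # Adam and MSE loss, as in the text
```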
We used mean-squared error for the loss function and the Adam optimizer Kingma and Ba (2014) for optimization. Five epochs were used for training. The size of the synthetic data sets is . This was divided into elements for training and for testing. The initialization function was in Keras. Random seeds were fixed everywhere necessary to force deterministic initialization and optimization for reproducibility. The procedure is as follows:
1. Generate sliding window embedding (SWE) of the raw signal
2. Perform the dimensional reduction on the SWE, yielding a 3-dimensional point cloud
3. Compute the $H_0$ and $H_1$ persistent homology of the SWE
4. Construct with a fixed
5. Normalize and raw data
6. Concatenate and the raw signal
7. Input into CNN for binary classification
Performance metrics: Sensitivity is the ratio of true positives to all positives and specificity is the ratio of true negatives to all negatives. They are also called the true positive rate (TPR) and true negative rate (TNR), respectively. A perfect classifier has both equal to 1. The case of TPR = TNR = 0.5 is equivalent to using a coin toss as a binary classifier. The case where TPR = 1 and TNR = 0 corresponds to a classifier that always guesses positive regardless of input, and vice versa for one that always guesses negative.
The receiver operating characteristic (ROC) curve is another evaluation metric. The closer the area under the curve (AUC) is to 1, the better the classifier. A perfect binary classifier gives a step function reaching a true positive rate of 1 at a false positive rate of 0. An AUC of 0.5 indicates no classification ability, equivalent to a fair coin flip.
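These metrics can be computed directly (a small self-contained example with made-up scores; the paper's own curves use sklearn and a custom routine, per the Software paragraph):

```python
import numpy as np

def tpr_tnr(y_true, y_score, threshold=0.5):
    """Sensitivity (TPR) and specificity (TNR) at a fixed threshold."""
    pred = y_score >= threshold
    tpr = np.mean(pred[y_true == 1])   # fraction of positives caught
    tnr = np.mean(~pred[y_true == 0])  # fraction of negatives rejected
    return tpr, tnr

y_true = np.array([1, 1, 1, 0, 0, 0])
y_score = np.array([0.9, 0.8, 0.3, 0.4, 0.2, 0.1])
sens, spec = tpr_tnr(y_true, y_score)   # 2/3 and 1.0 here
```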
Software: We used GUDHI C. Maria, J. Boissonnat, M. Glisse, and M. Yvinec (2014); 13 for TDA, and Keras as an interface to TensorFlow for the CNN. For the ROC curves, sklearn.metrics.roc_curve() was used. For the sensitivity and specificity vs SNR curves, our own routine was used with a fixed threshold value of . python-gwtools was used to calculate the optimal match-filtered SNR 14.
Figures 4, 5, 6 and 7 show two performance metrics for three different training and test sets. The blue solid line represents the CNN with raw data, the red the CNN with TDA features only (Eq. (6)) and the green the CNN with raw data and TDA features concatenated (Eq. (7)), labeled raw, tda and both, respectively. Each training set contains elements, 50% of which have a GW signal. The test set has elements synthesized in the same manner as the training set. The first set uses different signals of different mass ratios, as well as different SNRs. The other two sets use only one signal with uniform samplings and SNRs respectively, for .
The left panels show the sensitivity and specificity for each method versus SNR.
As detection is more difficult when there are a wider range of signals and SNR values (Fig. 4 and Fig. 5), these tests are descending in difficulty, Fig. 7 being the easiest case and Fig. 4 being the hardest. It is clear that using the CNN with the raw signal alone provides no detection ability except for the easiest case (Fig. 7), while the TDA features alone provide some detection ability in all cases, and the combined features are better in every case. This shows the power of the topological features to greatly improve performance, and the synergy of combining them with the raw signal (which increases maximum accuracy).
Figures 8, 9, 10 have a constant SNR and use signals with mass ratios between and . The training set sizes are and the test set sizes , respectively. They show the effect of noise level and training set size on the efficacy of the method: the TDA features are responsible for increased accuracy at lower SNR with less training. Notably, when training the CNN with TDA features alone, performance changes little as the training set grows and is maximized after a relatively small training set. This supports the assertion that the TDA features reduce the training requirements of the scheme.
As these examples indicate, the proposed method yields a significant improvement over the original CNN method. It could be very useful for pre-screening the interferometer data streams to locate potentially interesting windows before more costly analysis. It is not limited to the detection of black-hole mergers, as many interesting astrophysical sources of gravitational waves produce a chirp-type signature Abbott et al. (2017c).
Acknowledgements. The authors thank Scott Field for providing data and for helpful discussions. This work was supported by the National Research Foundation.
- GW170814: a three-detector observation of gravitational waves from a binary black hole coalescence. Phys. Rev. Lett. 119, 141101.
- Characterization of transient noise in Advanced LIGO relevant to gravitational wave signal GW150914. Classical and Quantum Gravity 33 (13), 134001.
- GW151226: observation of gravitational waves from a 22-solar-mass binary black hole coalescence. Phys. Rev. Lett. 116 (24), 241103.
- GW170608: observation of a 19 solar-mass binary black hole coalescence. Astrophys. J. Lett. 851 (2), L35.
- GW170817: observation of gravitational waves from a binary neutron star inspiral. Phys. Rev. Lett. 119 (16), 161101.
- Observation of gravitational waves from a binary black hole merger. Phys. Rev. Lett. 116 (6), 061102.
- Robust statistics for deterministic and stochastic gravitational waves in non-Gaussian noise: frequentist analyses. Phys. Rev. D 65 (12), 122002.
- Fast and accurate prediction of numerical relativity waveforms from binary black hole coalescences using surrogate models. Phys. Rev. Lett. 115 (12), 121102.
- Topology and data. Bulletin of the American Mathematical Society 46 (2), 255–308.
- Deep learning for real-time gravitational wave detection and parameter estimation: results with Advanced LIGO data. Physics Letters B 778, 64–70.
- (2018) GUDHI :: Anaconda Cloud. https://anaconda.org/conda-forge/gudhi Accessed: 2018-10-20.
- (2019) gwtools 1.0.2: a collection of gravitational wave tools. https://pypi.org/project/gwtools/ Accessed: 2019-07-15.
- Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980.
- Backpropagation applied to handwritten zip code recognition. Neural Computation 1 (4), 541–551.
- Handwritten digit recognition with a back-propagation network. In Advances in Neural Information Processing Systems, 396–404.
- Model discrepancy of Earth polar motion using topological data analysis and convolutional neural network analysis. Submitted.
- The GUDHI library: simplicial complexes and persistent homology. In International Congress on Mathematical Software, 167–174.
- Sliding windows and persistence: an application of topological methods to signal analysis. Foundations of Computational Mathematics 15 (3), 799–838.
- GW170104: observation of a 50-solar-mass binary black hole coalescence at redshift 0.2. Phys. Rev. Lett. 118 (22), 221101.