EarthquakeGen: Earthquake Simulation Using Generative Adversarial Networks

11/10/2019 ∙ by Tiantong Wang, et al. ∙ Los Alamos National Laboratory 0

Detecting earthquake events from seismic time series has proved itself a challenging task. Manual detection can be expensive and tedious due to the intensive labor and large scale data set. In recent years, automatic detection methods based on machine learning have been developed to improve accuracy and efficiency. However, the accuracy of those methods relies on a sufficient amount of high-quality training data, which itself can be expensive to obtain due to the requirement of domain knowledge and subject matter expertise. This paper is to resolve this dilemma by answering two questions: (1) provided with a limited number of reliable labels, can we use them to generate more synthetic labels; (2) Can we use those synthetic labels to improve the detectability? Among all the existing generative models, the generative adversarial network (GAN) shows its supreme capability in generating high-quality synthetic samples in multiple domains. We designed our model based on GAN. In particular, we studied several different network structures. By comparing the generated results, our GAN-based generative model yields the highest quality. We further combine the dataset with synthetic samples generated by our generative model and show that the detectability of our earthquake classification model is significantly improved than the one trained without augmenting the training set.



There are no comments yet.


page 1

page 2

page 3

page 4

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

Detection of earthquake events from the seismic time series record plays a fundamental role in seismology. However, such a task has been proved to be challenging. Seismic waveform has its own characteristics comparing to time series from other physics domains. It requires intensive training on the domain knowledge to manually recognize them. Conventional seismic detection methods detect events primarily through the use of similarity and correlation in the data. The most popular methods for the automatic detection of seismic events include short-time-average/long-time-average, waveform cross-correlation, and autocorrelation. However, those conventional detection methods either generate many false positives and fail in situations when the signal yields a low signal-to-noise ratio or suffer from expensive computational costs.

In recent years, with the volume of seismic data increasing significantly, automatic and efficient earthquake detection methods are needed. Deep neural network (DNN) methods have been successful in object detection to identify patterns. Of these, convolutional neural networks (CNN) have achieved promising results in computer vision, image analysis, and many other domains due to the significantly improved computational power (

krizhevsky2012imagenet; VGG; he2016deep; szegedy2017inception; huang2017densely; de2019transforming). Specifically, LeNet (lecun1995learning)

is known as the first kind of CNNs. In 2012, AlexNet won the ImageNet competition 


. The authors introduced fully connected layers and max-pooling layers to help AlexNet outperform all the other methods. After that, a sequence of different structures such as VGGNet 

(VGG), ResNet (he2016deep), GoogleNet (szegedy2017inception), and DenseNet (huang2017densely) were introduced.

Meanwhile, researchers in seismology have also started developing CNN-based earthquake detection methods. The first earthquake detection method using CNN network architecture is “ConvNetQuake” built in Convolutional-2018-Perol. The ConvNetQuake is a network with 10 layers including 8 convolutional layers, 1 flattening layer, and 1 fully connected layer. They successfully applied ConvNetQuake to study the induced seismicity in Oklahoma. DeepDetect-2019-Wu developed a “DeepDetect” detection method, which is a cascaded region-based convolutional neural network to capture earthquake events in different sizes while incorporating contextual information to enrich features for each proposal. DeepDetect is built on DenseNet structure. The total number of layers of DeepDetect is 64. DeepDetect-2019-Wu also reports promising results by applying DeepDetect to laboratory earthquakes. Built upon DeepDetect-2019-Wu, zhang2019adaptive

implemented a deep learning based earthquake/non-earthquake classification model with an adaptive threshold frequency filtering module. In the classification process, the input seismic sample is first passed through a high-pass filtering module with an adaptive frequency threshold, and then fed to an ordinary deep learning classifier. By applying the adaptive filtering module, the classifier is able to directly learn the patterns of the high-frequency seismic signals without being interfered by low-frequency environmental noises. By applying an adaptive threshold on the frequency choice, the model can adapt a customized frequency threshold for each different sample. The model reaches a relatively high performance. Due to the supreme performance reported in

zhang2019adaptive, we will build our method based on the existing work.

All of the aforementioned neural networks are supervised, meaning that they all require training procedure to learn pattern of earthquake from labels. On the other hand, the training procedure usually requires a sufficient amount of labeled dataset. However, in earthquake detection domain, annotated data are usually very expensive to obtain due to the requirement of intensive subject matter expertise. In order to resolve the dilemma, we develop a generative model based on GAN to synthesize labels and use them to augment training set.

Generative adversarial network is a type of generative models. It is designed based on an adversarial min-max game between two networks, generator and discriminator (GAN). The main idea is to sample from a simple distribution like Gaussian and learn to map this noise to data distribution using universal function approximators such as neural networks. This is achieved by adversarial training of these two networks. Researchers have successfully applied it to image synthesis (GAN; creswell2018generative), audio waveform generation (engel2019gansynth; yang2017midinet; chen2017deep), and speech synthethis (pascual2017segan; saito2017statistical; kaneko2017generative).

There have been multiple variants of GAN. Conditional GAN (CondGAN) turns the traditional GAN into a conditional model, which allows the user to customize the category of the generated samples by an additional label information as input. Another well known GAN-based conditional model is InfoGAN (chen2016infogan). It maps generated samples of different categories to different “latent code” in the input prior space. With the success of convolutional neural network (CNN) in image classification task, researchers also incorporate CNN into GAN and come up with deep convolutional GAN (or DCGAN) (DCGAN). For high resolution image synthesis, the design of DCGAN shows a stable capability of generating high-resolution images with convincing local details.

It may be worthwhile to mention that data augmentation has drawn significant amount of interest in image synthetic and classification research (baird1995document; claudiu2010deep; perez2017effectiveness; xu2016improved). In claudiu2010deep, with data augmentation techniques, additional training samples are generated from each layer of a network. With the help of new samples, the classifier reaches an error rate

, which yields a better performance compared with energy-based models(

(ranzato2007efficient) and neural network models without data augmentation((simard2003best). In xu2016improved, by ruling out noisy data carefully, the performance of a deep RNN based classification model is improved by a large margin of with a data augmentation. Particularly, GAN has shown its advantage in data augmentation tasks. In perez2017effectiveness, with the help of GAN-generated synthetic samples, the accuracy is improved by on various classification tasks. To our best knowledge, data augmentation with GAN generated synthetic data yet has not been applied to 1D time series such as seismic event detection task.

In this paper, we developed a generative model based on conditional GAN that can produce synthetic seismic time series. In the model design, to better learn the characteristic of seismic samples, we develop a three-pipeline structure on the generator. To further improve the performance of synthetic samples, we incorporate Fourier transform as part of our discriminator, and utilize features from both high and low frequency components of the seismic sample. We then validate the quality of synthetic seismic events visually and quantitatively. With the promise of our high-quality synthetic seismic samples, we further explore the feasibility of augmenting limited data sets with our synthetic samples on a earthquake detection task.

In the following sections, we first briefly provide the related work in Section 2. We describe the fundamentals of GANs and its variants in Section 3. In Section 4, we provide details on the field data and preprocessing techniques. We then develop and discuss our model in Section 5. Section 6 describes experimental results. Finally, concluding remarks and future work are presented.

2 Related Work

2.1 GAN on two dimensional data

Most existing research work on GAN is targeting on image synthesis. In the first paper that introduced GAN (GAN), authors used image datasets including MNIST, TFD and CIFAR to illustrate the performance of GAN model. In the work of DCGAN (DCGAN; DCGANimages), authors successfully combined convolutional layers with GAN, and enabled GAN to generate high-resolution synthetic images with better quality. The network structure of DCGAN has provided inspiration for numerous other works. As an example, in brock2018large, a carefully tuned conditional GAN model is able to generate synthetic images with resolution as high as and yields an inception score of . Those synthetic images are almost indistinguishable by human.

There is theoretical GAN research mainly focusing on the improvement of the loss function. Those works are essentially to find a better divergence between the real and generative samples’ distributions. In 

nowozin2016f, authors found that a group of various loss functions were aimed to decrease a same family of divergence, i.e., f-divergence. In the work Wasserstein GAN (arjovsky2017wasserstein) and its improved versions (gulrajani2017improved; miyato2018spectral; berthelot2017began), authors replaced Jensen-Shannon divergence with Wasserstein divergence. With this replacement, the differentiability of the loss function was maintained when the generative and real distributions have zero measure of intersection. All those GAN-based models have been successfully applied to image synthesis.

2.2 GAN on one dimensional data

Besides the applications on two-dimensional data like images, researchers are also applying GAN to one-dimensional dataset, which can be categorized into audio waveform synthesis and human speech synthesis.

2.2.1 Audio Waveform Synthesis

oord2016wavenet developed a GAN model and successfully generated high quality sample of audio waveform by solving the scale issue. In yang2017midinet, the authors generated melody one by one in the symbolic domain with a GAN based network structure. Based on oord2016wavenet, engel2019gansynth further developed an improved model that use log-magnitude for target spectrum/phases, and reaches the State-of-The-Art performance in audio synthesis task.


developed a model that outperforms the baseline model on both automated and human evaluation metrics. In

chen2017deep, the author solves the cross-modal audio-visual musical performance generation problem using conditional GAN.

2.2.2 Human Speech Synthesis

hosseini2018multi applied CycleGAN (zhu2017unpaired) on unsupervised non-parallel speech domain adaptation task. To alleviate the over-smoothing problem, hosseini2018multi combined GAN with conventional generation loss, and showed that their model can generate natural sounding speech with the model trained on data from other domain. For the same problem in statistical parametric speech synthesis models, a less straight-forward approach is presented in saito2017statistical, where the authors utilized GAN as a realistic spectral texture generative model, so that the true distribution of time-frequency representation can be learned.

3 Background

3.1 Generative Adversarial Networks

Generative adversarial network (GAN) is a family of deep-learning-based generative models that can be used to learn a distribution and produce realistic synthetic samples. This avoids the direct sampling from real distribution. Hence, GAN can be helpful in cases when the real sample distribution is implicit (density function is unknown) and expensive to be sampled. A typical GAN consists of two feed-forward neural networks: a generator and a discriminator. The generator is to learn a function that maps a prior vector to a realistic synthetic sample, while the discriminator reads in both real and synthetic samples and provides critics to them. The loss function of GAN can be written as


where is the generator and is the discriminator. The random vector of follows

, which usually is a multi-dimensional Gaussian distribution and

is sampled from the real distribution of . produces a synthetic sample . The discriminator reads in a sample (either and ) and provide a scalar critic, . The generator and discriminator are trained in an adversarial manner. The generator is trained to produce a synthetic sample similar to real samples . On the other hand, the discriminator is trained to distinguish and by yielding a small related to and higher to the .

Training GAN is an alternative min-max game between discriminator and generator. It is adversarial in that the discriminator learns to better distinguish the synthetic samples from the real ones while the generator learns to produce more realistic samples by improving the approximation to the real sample distribution. The competition and cooperation between discriminator and generator will promote the closeness of the the generative distribution to the real sample distribution. A generic structure of GAN can be illustrated in Fig.(a)a.

GAN model produces realistic samples. However, the loss function developed in Eq. (1) can be limited when applied to categorized data sets. The generator will learn the overall distribution of the whole dataset, while the label of a synthetic sample will be randomly specified. Hence, there is a need to incorporate label information to the GAN.

3.2 Conditional GAN

With an input of a label information to both generator and discriminator, a traditional GAN can be turned into a conditional GAN (CondGAN). The structure of conditional GAN can be illustrated in Fig. (b)b. It allows the generator to produce samples that belong to given categories. The loss function of conditional GAN can be written as


where is the label of real sample , and is the targeted label.

In conditional GAN (Fig. (b)b), the generator, , reads in the prior-label pair of , where is the targeted label of the synthetic sample . The discriminator, , reads in the sample-label pairs ( or ), and yields a scalar critic, , for each pair. Besides evaluating the sample itself, will also justify whether the sample matches its label. A synthetic sample-label pair can only achieve a high value of from when is realistic enough and belongs to the targeted label of . Consequently, is forced to generate high-quality synthetic sample that will match the targeted label .

Figure 1: An illustration of the structure of GAN ((a)a) and conditional GAN ((b)b)

4 Data Description and Pre-processing

4.1 Raw Seismic Waveform Time Series

Seismic waveform propagates in the subsurface when an earthquake occurs. At a certain point, it can be recorded by a seismic station. A broadband seismic station records raw seismic waveform with three components of BHE, BHN and BHZ by the velocity (m/s). BHE component represents the movement in the east-west direction; BHN component represents movement in the north-south direction; BHZ component represent the movement in the up-down direction.

Seismic waveform data are usually noisy, consisting both high-frequency seismic event and low-frequency noise component zollo2009earthquake; allen2003potential. To better analyze the earthquake waveform, a high-pass filter is usually applied to the raw waveform data to obtain the filtered seismic signal.

Seismic body waves include both P and S waves. These two types of wave have variant travel speed - S wave travels at about half of the P wave speed. Thus a nearby seismic station will first record the arrival of P wave, and then S wave. In most of earthquake records, P wave is relatively easier to visualize in BHZ component, while S wave is easier to notice in BHE and BHN components. The amplitude of P wave is usually smaller than S wave. In most cases, the arrival of both P and S waves can be clearly identified by an abrupt amplitude “burst” in the waveform record. We provide an real waveform with earthquake event in Fig. 2.

Figure 2: Illustration of an earthquake event including the raw waveform in (a) and the filtered waveform in (b). The three rows in each figure show components of BHE, BHN and BHZ, respectively. In (b), we indicate the arrival time of P and S waves.

4.2 Dataset Generation

We use two field datasets to validate the performance of our model. Each dataset is processed from raw waveforms data acquired at two different transportable seismic stations: V34A and V35A. Station V34A and V35A are located in the state of Oklahoma, approximately 60 80 km away from the Oklahoma City, as shown in Fig. 3. Station V34A operated from Nov 1st 2009, 21:59:18 to Sep 3rd 2011, 13:55:28. Station V35A operated from Mar 14th 2010, 18:47:42 to Sep 4th 2011, 23:59:58. Both stations are broadband seismometers operated at sampling rate of 40 Hz.

Figure 3: Stations V34A and V35A are both located in the state of Oklahoma, USA. The dataset acquired from these stations are used to test the performance of our model.

To compile an earthquake catalog, we use a modified version of the earthquake detection method described in Tidal-2017-Delorey. In this method, patterns of arrivals are first identified by an autonomous seismic tremor detection method across a seismic array that emanate from local earthquakes, and then each identified arrival is examined by an expert so that false alarms are precluded. So, we detected earthquakes from station V34A, and earthquakes from station V35A during the time of operation in our study area.

The duration of an earthquake is an important parameter to decide. We apply a consistent window size to all earthquakes in this work. We find that a window size of seconds can be a good option in that it is large enough to cover any individual earthquakes while small enough to facilitate an efficient training (zhang2019adaptive). Provided with the sampling rate, we define a raw seismic time series sample as a -channel vector of length . We provide both positive and negative seismic sample based on whether or not there is an earthquake event included in the time series.

Earthquakes in our datasets are extremely sparse over time. We find that the duration between any two neighboring earthquakes is no less than time stamps in the raw seismic time series, so that any two consecutive earthquakes will not be included in the same positive sample of length time stamps.

With all the aforementioned details on our raw seismic waveform, we build our dataset following the rules listed below

  • Each positive sample shall cover a single earthquake;

  • Negative samples shall not cover any earthquake;

  • Positive and negative samples shall not overlap with each other;

  • The number of positive and negative samples shall be balanced.

We use the station V34A as an example to demonstrate the procedure of building data set out of the above four rules. For each seismic event located at a time stamp , we firstly sample three offsets , and

from a discrete uniform distribution of

. We then create three positive samples by segmenting three intervals length of centered at , and on the raw waveform data. That will provide us a total of positive samples. For negative samples, we randomly select a total of time segments with a length of from the remaining raw seismic waveform. Eventually, the positive and negative samples together will result in a total data size of for station V34A. Similar procedures can be applied to station V35A, and that will provide us with a dataset of size which consists of positive samples and negative samples.

In Fig. 4, we show six samples of our dataset, including three positive and three negative ones. To better visualize the waveform of the earthquakes, we show the raw waveforms (Figs. (a)a, (e)e, (i)i, (c)c, (g)g, (k)k) as well as their filtered waveforms (Figs. (b)b, (f)f, (j)j, (d)d, (h)h, (l)l). Waveform of positive and negative samples are colored red and green respectively. In Fig. (b)b, we use arrows to indicate the arrival time stamp of P and S waves of an earthquake.

Figure 4: Illustration of six waveform samples and their corresponding filtered waveforms, including three positive and three negative samples. Each sample shows a 40-second period of seismic record with sampling rate of Hz. Column 1 shows the raw waveforms of the positive samples, and Col. 2 shows their filtered waveforms. Column 3 shows the raw waveforms of negative samples and Col. 4 are their filtered waveforms.

4.3 Normalization

In raw seismic time series, values recorded by the seismic stations spread over a range of m/s. To effectively learn the features of the seismic waveforms, the dataset needs to appropriately normalized. In particular, for a -channel, -length raw seismic time series of , we normalize it as


Through comparison to other normalization schemes (zhang2019adaptive), the one in Eq. (3) yields the best results.

5 Model Design

5.1 Model

Figure 5: An illustration of our network structure

Our model is built on the structure of conditional GAN CondGAN. The main structure of our model is illustrated in Fig. 5, which consists of two networks: the generator and the discriminator.

5.1.1 Generator Structure

The structure of our generator reflects the characteristics of seismic time series. Particularly, through previous sections we learned that real seismic samples contain channels of 1D time series length of . Correspondingly, we design our generator to comprise of three pipelines to synthesize each channel of the data individually. All three pipelines share an identical network structure as shown in Fig. 5. Each pipeline is a four-layer convolutional network. There is no interaction among the three pipelines except that they share the same input. As shown in Fig. 5, the input vector is a Gaussian noise vector of length , while is a binary scalar with representing positive event. With both and becoming available, we will pass through a transposed 1D convolution layer to obtain an augmented 1D feature vector of length . In parallel, the scalar input of will be augmented to be length of , and then further concatenated with augmented vector. Similar to the conventional DCGAN, we use an additional 3 layers of 1D convolution layers to synthesize one channel data in the seismic sample of . Similarly, we can obtain the other two channels of the synthetic seismic sample of through two other pipelines.

5.1.2 Discriminator Structure

The discriminator will be used to evaluate the quality of input samples (real or synthetic). The discriminator first learns features representative to seismic signals including both earthquake and non-earthquake events and further provides critics based on features learned. According to the domain knowledge as illustrated in Sec. 4.1

, raw seismic time series consists of earthquake events which reside in high-frequency band and non-earthquake events which exist in the low-frequency band. Therefore, we design our discriminator to include two sequential modules, namely, “feature extraction” and “sample critic”. The module of feature extraction learns high-quality feature vector, which will be passed onto the module of sample critic that evaluates the quality of the samples.

Based on conditional GAN, our discriminator receives two inputs: the sample and the label information. In particular, the sample and label come in as data pair, either or , where the former stands for real data pair and the latter stands for synthesized pair.

For the module of feature extraction, to characterize the seismic time series, we decompose the original time series, , into those of low frequency component, , and those of high frequency component, . This procedure can be posed as the following two steps




where is the Fourier transform and is the inverse Fourier transform. is the seismic waveform vector in time domain, and

is the transformed vector in frequency domain.

is a hyper-parameter that determines the frequency threshold between earthquake events versus non-earthquake events. It is worthwhile to mention that it is important to select an appropriate value of . In this work, we learn the value of through our previous work in zhang2019adaptive. With and obtained through Eqs. (4) and (5), we further pass them through two identical pipelines consisting of two convolution layers. Each of pipeline converts the inputs signal of dimension into a feature vector of dimension . The first dimension is the channel number and the second dimension is the length of the vector.

As shown in Fig. 5, another input to the discriminator is a binary scalar, . Similar to the generator, we firstly augment to be a vector of dimension with a linear layer. To match the dimension of feature vector from sample, we further enlarge the vector to be dimension of with a convolution layer. With three feature vectors learned from , , and , we combine them to obtain a feature vector of dimension .

For the module of sample critic, the discriminator justifies the quality of the samples based on the feature vector obtained from the module of feature extraction. Specifically, we design a network of three convolutional layers with stride

and followed by a mean operator as illustrated in Fig. 5. The output of the discriminator is a scalar value , indicating the quality of the input data pair.

5.1.3 Loss Function

An improved loss function of Wasserstein GAN has been developed and shown to be effective in providing a more stable convergence during training (gulrajani2017improved). We therefore apply similar loss function to our problem. In particular, the loss function of generator and discriminator can be written as




where represents the generator, and represents the discriminator. represents the distribution of real samples. represents a Gaussian noise vector. is a hyper-parameter, that is set to be in our experiments according to gulrajani2017improved.

6 Experiment

In this section, we design four tests to validate the performance of our generative model. In Test 1, we first provide a performance comparison of our model versus baseline models via visualization of the synthetic samples. In Test 2, we further evaluate the quality of our synthetic samples via a classification task. We further study the robustness of our model under limited training sets in Test 3. Finally, we apply our generative model on a data augmentation task in Test 4.

6.1 Test 1 on Synthetic Earthquake Evaluation via Visualization

In this test, we visually verify the synthetic results of our model and the baseline models. All the generative models in this section are trained on the full dataset from V35A, which contains real samples with positive versus negative ratio as 1 : 1.

6.1.1 Our Model

We first show several synthetic samples generated by our model in Fig. 6, including both positive (on the first row in red) and the negative (on the second row in green). We observe the positive synthetic samples share similar characteristics to those of the real positive samples in Fig. 4. In particular, it may be easier to notice in BHZ channel that the P-wave arrival is ahead of the S-wave arrival. Relatively, S-wave arrival is easier to observe from the BHE and BHN channels. The arrivals of both P and S wave have an abrupt “burst” on the amplitude, while the S wave amplitudes in channel BHE and BHN are more visible than those of P wave.

Figure 6: Illustration of synthetic samples generated by our model. We provide five positive samples in the first row and five negative samples in the second row.

6.1.2 Baseline Designs

We use seven baseline models for comparison. we design the baseline models by varying several critical components of our model. This is to validate the effectiveness of our generative model.

• Baseline 1 - Direct Deployment of DCGAN

Our first baseline model is a network with a structure similar to the one in DCGANimages, which can be seen as a single pipeline variant of our model. Based on this structure, we provide some synthetic samples in Fig. 7. As shown in the figure, for either positive or negative synthetic samples, the waveforms of all three channels become almost identical.

Figure 7: Synthetic samples ( positive ones on the left, and negative ones on the right) generated by baseline 1.

• Baseline 2 - Independent Generators

It can be important to use a shared input for three pipelines in our generator. To demonstrate the importance, we design a baseline model by feeding each pipeline with independent input pair of and label feature vector augmented from . We show the corresponding synthetic results in Fig. 8. Even though the synthetic data in three channels are not more identical, the temporal correlation between channels are lost as well. For an instance, the arrivals of S wave in BHE and BHN channels are not strictly correlated in time. This is due to the fact that the three channels are generated by three independent generators and no information is shared among them.

Figure 8: Synthetic samples generated by baseline 2.

• Baseline 3, 4, 5 and 6 - Kernel Size

Kernel size can be an important factor in the network structures (largeKernelMatters; kernelSizeTip). We design four baseline models (baseline 3, 4, 5, and 6) to illustrate its effectiveness in both generator and discriminator. Specifically, the generator kernel size is varied from to in baseline 3 and from to in baseline 4, respectively. We show the corresponding results in Figs. 9 and 10. Similarly, the discriminator kernel size is varied from to in baseline 5 and from to in baseline 6, respectively. Their results are provided in Figs. 11 and 12. Observed from Figs. 9 and 10, neither of baseline 3 and 4 are capable of learning effective features to generate earthquake events. Figs. 11 and 12 does not yield satisfying result either. Particularly, in Fig. 11, the abrupt arrivals of P- or S-wave are not generated. In Fig. 12, the high frequency component of positive samples are not realistic comparing those real samples in Fig. 4.

Figure 9: Synthetic samples generated by baseline 3.
Figure 10: Synthetic samples generated by baseline 4.
Figure 11: Synthetic samples generated by baseline 5.
Figure 12: Synthetic samples generated by baseline 6.

• Baseline 7 - Fourier transform Removed

Signal decomposition can be helpful in producing representative feature vectors in our discriminator. To validate this, we design a baseline 7. In this baseline model, instead of decomposing temporal signal as in Eq. (5), we simply duplicate the input temporal signal, and feed them to the pipelines, respectively. We provide the synthetic results using baseline 7 in Fig. 13. Visually, baseline 7 yields better results than aforementioned six baseline models. However, comparing to the real earthquake events in Fig. 4, there are still some generated samples which are quite visually distinct (for instance, positive samples shown in Fig. (b)b and Fig. (c)c).

Figure 13: Synthetic samples generated by baseline 7.

6.2 Test 2 on Synthetic Earthquake Evaluation via Classification

Other than evaluating the quality of our synthetic samples via visualization, in this test we provide quantitative evaluation by applying the synthetic samples to a classification task. The classification accuracy on real data will be utilized to quantify the quality of synthetic samples that are used as training sets.

We use the adaptive filtering classification model due to its superior performance in earthquake detection (zhang2019adaptive). We use dataset from both V34A and V35A for this test. Specifically, take V35A dataset as example, we divide the dataset into a real training set size of and a testing set size of . The ratios of the positive versus negative samples in both training and test sets are . Classification accuracy is used as the metric in this test and it is defined as


where “TP” and “TN” refer to the numbers of true positive and true negative, respectively. “Total” refers to the total number of samples in the test set, which is in V34A and in V35A. With the classification model and the dataset selected, we proceed the test on each of the generative models as in the following four steps

  1. Train the generative model with real training set;

  2. Based on the trained generative model, produce additional synthetic samples, which become the synthetic training set;

  3. Train the adaptive filtering classifier with the synthetic training set;

  4. Test the trained classifier on the test set and report the accuracy.

Intuitively, the performance of the classifier using real training set will be better than the one trained on synthetic set. Hence, we provide the performance of the classifier (denoted as ) trained on real training set for the comparison purpose. Similarly, we denote the classifiers trained on synthetic training set as . The higher the classification accuracy of , the better the quality of the synthetic samples that are used to train the classifier. We provide the classification results in Tab. 1. As expected, yields the best performance among all classifiers. The classifier , that is based on our generative model, produces the second best classification accuracy. This is consistent with our visual evaluation results. It is interesting to notice that there can be some inconsistency in-between visual evaluation and classification accuracy. As an example, The results of baseline 2 as shown in Fig. 8 can be easily identified as unrealistic samples by human experts. However, when these synthetic samples are used to train a classifier, we obtain accuracy as high as and based on our two dataset. This inconsistency is due to the fact that adaptive filtering classifier favors local features while human are capable of capturing both local and global features. Through this test, we verify that our generative model can effectively learn the key features from real seismic time series so that its synthetic samples may be as helpful as the real data for the classification task.

V34A 98.93 97.11 49.58 95.02 68.09 90.51 95.95 94.60 94.27
V35A 98.81 96.96 40.81 95.35 52.06 53.99 91.60 95.04 93.45
Table 1: Classification results using classifier trained on real training set ( in Col. 2) and those trained on synthetic training set ( in Cols. 3 to 10). Specifically, is based on our model, and are based on baseline 1 to 7, respectively.

6.3 Test 3 on Robustness of Our Generative Model

Our generative model is trained on labeled dataset. However, in practical the high quality labels can be expensive to obtain as we mentioned earlier. Hence, it is important to study the robustness of our generative model when the size of the training set is limited. To verify the robustness, we design our test to train on data set with sizes varying among , , , , and . We keep the ratio between positive and negative samples to be one in all those limited training sets. Take the training set size of as an example, we randomly select positive and negative samples from V35A training set, and combine them as the limited training set size of . We construct four other training sets sizes of , , , and in a similar approach. With those five training sets being available, we train and obtain five different generative models, namely, , , , , and . Using each generative model, we further synthesize a training set size of , which consisting of positive and negative synthetic samples. Based on those five synthetic training sets of , we independently train five adaptive filtering classifiers and test each of them on the same V35A test set as those used in Test 2. We record the accuracies of the predictions from those five classifiers (Cols. 2 to 6) in Tab. 2. As a benchmark, classifier trained on the real training set is also reported (denoted as “real” in Col. 1 of Tab. 2) and we use all real samples as the training set. Not surprisingly, with a real training set, the classifier yields the best accuracy (Col. 1). While using synthetic samples only (Cols. 2 7), the classifiers still produce reasonable predictions with accuracy higher than 70%. This indicates the robustness of our generative model with respect to limited training set sizes. We implement a similar robustness test on V34A dataset and report the results in Tab. 3. Similar conclusion can be drawn. Through this test, we learn that our generative model can be effective when training set is limited. This is consistent with the image synthesis task (gurumurthy2017deligan; marchesi2017megapixel), where GAN has been proven to be effective on limited dataset. Despite of the reasonable classification results are obtained in Tabs. 2 and 3, we will explore the possibility to further improve accuracy. One natural idea is through data augmentation.

98.81 77.63 72.69 85.73 91.00 94.20
Table 2: Robustness results on V35A dataset. We provide benchmark result (Col. 1) as well as those results using our generative models based on five different limited training sets (Cols. 2 to 6). Our model yields reasonable robustness with limited training size.
98.65 65.86 82.65 94.65 93.07 92.33
Table 3: Robustness results on V34A dataset. We provide benchmark result (Col 1) as well as those results using our generative models based on five different limited training sets (Cols. 2 to 6). Our model yields reasonable robustness with limited training size.

6.4 Test 4 on Data Augmentation using Our Model

Data augmentation is a commonly used technique in machine learning to increase the diversity of the data available for training. Such a technique can be valuable for automatic earthquake detection using machine learning approaches due to the expensive costs in obtaining high quality labels. However, traditional data augmentation techniques such as cropping, padding, or flipping may not work because of the unique patterns of the earthquake events in multi-channel 1-D seismic time series. Through our previous tests, we demonstrate our generative model is capable of synthesizing realistic positive and negative seismic samples. In this test, we further utilize those synthetic samples to augment the training set and evaluate the performance of classification. In particular, we design this test based on the same five generative models (

, , , , and ) and their related real training sets of limited sizes (, , , , and ) from Test 3. We use those generative models to produce different numbers of the synthetic samples that will be combined with the existing real training set. For the ease of demonstration, we use a variable to stand for the augmentation ratio between real samples and synthetic samples to be added. We choose six different augmentation scenarios including , , , , and . Take and as an example, we firstly generate 10 synthetic samples consisting of 5 positive and 5 negative samples using . We further combine those 10 synthetic samples with the existing limited real training set size of 10, and that will give us an augmented training set size of 20. We then train an adaptive filtering classifier using the augmented training dataset. We report our classification accuracy using the V35A test set in Tab. 4. As a baseline, we also include a scenario of non-augmentation test, where the classifier is trained only on real data set and we denote this as . Same as in zhang2019adaptive, we use the learning rate for training classifiers. To make a fair comparison, we train each classifier for a total of iterations. We can observe from Tab. 4 that baseline () in general yields the worst classification accuracy compared to the augmentation scenarios. For 25 out of 30 cases (bold in Tab. 4), the augmentation of dataset shows improvement on the performance of the classifier. In the best case, the accuracy is increased by nearly 14% (Col. 3 in Tab. 4). We proceed similar tests on the V34A test set and report the results in Tab. 5. Similar performance improvement can be observed, where 25 out 30 cases result in improvement and the largest increase in classification accuracy is close to 17% (Col. 3).

Through this test, we conclude that the synthetic samples generated by our generative model can improve the the performance of the classifier by data augmentation.

69.59 76.51 88.90 89.31 93.22
66.41 70.48 84.42 89.55 93.53
67.68 72.44 93.18 92.07 96.15
73.08 87.02 93.18 92.87 95.72
77.98 89.00 92.66 92.39 95.89
79.08 89.96 92.07 92.67 95.75
79.90 89.36 92.41 92.86 95.51
Table 4: Detection results using classifiers trained on augmented training set from V35A dataset.
71.79 73.87 85.42 92.21 95.25
65.01 68.80 88.09 89.32 95.37
61.86 82.75 95.80 94.43 96.15
71.56 90.81 96.32 95.01 96.07
75.58 89.97 96.47 94.84 96.14
76.55 91.44 96.52 94.89 95.98
77.08 91.52 96.07 94.82 95.88
Table 5: Detection results using classifiers trained on augmented training set from V34A dataset.

7 Discussion and Future Work

7.1 Machine Learning and Earthquake Physics

Time series classification has been widely applied to many domains such as communication, medicine, finance, etc. Even though similar seem-like classification techniques have been developed for those applications, there are some fundamental differences comparing to earthquake detection, that is the underlying physics.

Depending on the causing factors, there are various types of earthquakes and induced earthquakes are one of those. Induced earthquakes, such as the ones being studied in this paper, are a few of many examples of humans causing earthquakes. It happens during the course of injecting a mixture of water and chemicals at high pressure into the ground, with the goal of increasing the permeability of the rocks. Earthquakes happens when there is a slip on fault. For fluid-induced anthropogenic earthquakes, the injection of fluids either increases the ambient fluid pressure, or loosens, the fault interface, or both. That would make it easier for the fault to slip, hence creating earthquakes.

Seismic time series (or seismogram) is a 1-D representation of the earthquake physics. Many critical governing earthquake physics can be directly or indirectly inferred from seismogram such as the field epicenter, magnitude, moment tensor or even subsurface velocity structures. Different causing factors will create differences in the governing physics, which result in difference representation or pattern that is being captured in seismogram. From this perspective, a reasonable question to ask would be “Does the generative model learns the pattern or capture the critical physics from data?” In all our tests, we demonstrate that our generative model visually and quantitatively yields similar pattern to the real earthquakes. Despite of all the encouraging results, that only answers the first part of the question. In order to answer the second question, more intensive analysis needs to be carried out on the synthetic seismogram. In particular, it would be fundamentally critical to understand whether not similar physics can be inferred from the real seismogram versus from our synthetic seismogram. We will study this topic as our future work.

8 Conclusions

We develop a generative model that can produce high quality seismic waveform samples with either earthquake events or non-earthquake events. Based on conditional GAN, we apply pipeline design and Fourier transform process in the network structure, and find a proper kernel size for each layer. To verify the efficacy of our generative model, we apply it to seismic field data collected at Oklahoma. Through the numerical tests, we show that our model can generate high-quality synthetic earthquake. We further demonstrate that the earthquake detection performance can be improved by using augmented training sets with both synthetic and real samples. Therefore, our generative model has great potential in obtaining valuable labels and improving event detection performance.

9 Acknowledgments

This work was supported by the Center for Space and Earth Science (CSES) at Los Alamos National Laboratory (LANL). The experiment was performed using supercomputers of LANL’s Institutional Computing Program.