1 Introduction
Understanding seismological data is important both for disaster prediction and for understanding the physical properties of the Earth's crust. In recent years, the drastic rise in the number of seismic monitoring stations has transformed seismological research from an observation-based into a data-driven science [1]. In general, earthquake seismology problems fall into three categories: probabilistic risk assessment [2, 3, 4, 5, 6], earthquake recognition for data mining and early earthquake detection [7, 8, 9, 10, 11], and earthquake prediction for warning systems [12, 13].
Seismology studies are conducted with four objectives: (1) preparing for or adapting to a disaster, (2) acting to reduce the long-term damage of earthquakes, (3) devising disaster response strategies, and (4) planning post-disaster recovery; these are known as the strategies of preparedness, mitigation, response, and recovery, respectively. Detecting every earthquake in seismic data is challenging for two reasons: (1) seismic records contain a huge amount of noisy data, and (2) many earthquake events go undetected [11]. As a case in point, nonvolcanic tremors were first observed in southwestern Japan only two decades ago [14], because the weak signals these tremors generate are hard to detect in certain regions. Efficient detection of seismic signals would therefore allow us to better understand the processes of seismic zones and hence predict earthquakes more effectively. In recent years, machine learning algorithms have proven effective in classification and prediction tasks. Unlike earlier AI approaches, in which features have to be defined manually [15, 16, 17], machine learning enables the computer to select relevant features on its own; however, it requires huge amounts of labeled data to train successfully. Labeling demands many man-hours and is a major disadvantage. A better alternative is unsupervised learning, such as clustering algorithms, which can train on unlabeled data: the computer separates the data into distinct classes based on similarities among the members of each class. Unsupervised learning has been applied to data from volcano monitoring systems [18, 19, 20, 21], induced seismicity [22, 23], global seismicity [9], and local vs. distant earthquakes [24].
2 Proposed method
Mel spectrograms are widely used in speech recognition. The raw frequency data is converted to the mel scale using the following formula:

m = 2595 · log10(1 + f / 700)    (1)

where f is the frequency in Hz and m is the corresponding value on the mel scale. The rationale behind this is that humans have been empirically shown to perceive frequencies not linearly but logarithmically [25].
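As a concrete illustration, the standard mel mapping and its inverse can be written in a few lines. This is a minimal sketch; the constants 2595 and 700 are the usual values from the speech literature, and the function names are ours:

```python
import numpy as np

def hz_to_mel(f_hz):
    """Convert frequency in Hz to the mel scale (Eq. 1)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)

def mel_to_hz(m):
    """Inverse mapping: mel value back to Hz."""
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)
```

With these constants, 1000 Hz maps to approximately 1000 mel, the anchor point the scale was designed around.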
In this paper, we instead convert the raw seismological data using the formula

m = c1 · log10(1 + f / c2)    (2)

where c1 and c2 are constants derived empirically. This choice is motivated by the fact that some animals can anticipate earthquakes from seismic signals [26]; adjusting the spectrogram from the range of human hearing toward the hearing of those animals should therefore help. From a machine learning point of view, this operation reduces the variance of the data that is unimportant for the given task and increases the variance of the important data, making it easier for the model to learn the features relevant to the prediction or classification task.
The converted frequencies are then turned into a spectrogram using a short-time Fourier transform and passed to the computer vision network.
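The conversion pipeline can be sketched as follows. This is a simplified illustration, not the paper's implementation: the function name is ours, the defaults c1 = 2595 and c2 = 700 merely reproduce the ordinary mel scale (the paper's empirical constants are not published here), and the bin counts are arbitrary:

```python
import numpy as np
from scipy.signal import stft

def warped_spectrogram(trace, fs, c1=2595.0, c2=700.0, n_bins=64, nperseg=256):
    """STFT of a seismic trace followed by remapping onto the warped
    frequency scale of Eq. (2): m = c1 * log10(1 + f / c2)."""
    f, t, Z = stft(trace, fs=fs, nperseg=nperseg)
    power = np.abs(Z) ** 2
    warped = c1 * np.log10(1.0 + f / c2)          # warp the frequency axis
    grid = np.linspace(warped[0], warped[-1], n_bins)  # uniform grid in warped units
    # Interpolate each time frame's power onto the warped grid
    spec = np.stack([np.interp(grid, warped, power[:, i])
                     for i in range(power.shape[1])], axis=1)
    return grid, t, np.log1p(spec)                # log compression for the CNN input
```

The log compression at the end is a common choice for spectrogram inputs to vision networks; any monotone compression would serve the same purpose.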
In this experiment, we use a Convolutional Neural Network (CNN) [27] to extract features from the spectrogram. A CNN is a form of deep neural network that uses multiple filters with trainable weights to extract features, usually from a 2D representation of the data. The CNN we used is a ResNet [28]. A ResNet consists of convolutional layers with skip connections between some of them, along with activation layers (we used ReLU) and pooling layers. Each convolutional layer applies a trainable filter to the data passed to it. We used a combination of max and average pooling layers in our model. The output of the ResNet was fed into a clustering model.
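For intuition, the building blocks just described (convolution, a skip connection with ReLU, and max pooling) can be sketched in plain NumPy. This is a toy single-channel illustration of the mechanics, not our trained network:

```python
import numpy as np

def conv2d(x, w):
    """'Valid' 2-D convolution (cross-correlation) of a single channel."""
    kh, kw = w.shape
    out = np.zeros((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * w)
    return out

def residual_block(x, w):
    """Toy residual unit: ReLU(conv(x) + x).
    'Same' padding keeps shapes equal so the skip connection can be added."""
    pad = w.shape[0] // 2
    return np.maximum(conv2d(np.pad(x, pad), w) + x, 0.0)

def max_pool(x, k=2):
    """Non-overlapping k x k max pooling."""
    H, W = x.shape[0] // k * k, x.shape[1] // k * k
    return x[:H, :W].reshape(H // k, k, W // k, k).max(axis=(1, 3))
```

The skip connection is the defining feature of the ResNet: the block learns a residual correction to its input rather than a full transformation, which makes deep stacks of such blocks trainable.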


We used the Gaussian mixture model [29] for clustering, in which the goal is to find a set of K normal distributions with means μ_k and covariances Σ_k (k = 1, …, K) that best describe the overall data. As the final output of the algorithm, a categorical cluster-assignment variable is also inferred. Gaussian mixture clustering is a probabilistic and more flexible version of the K-means algorithm: the clusters can be unbalanced in terms of internal variance, each covariance can be anisotropic, and the decision boundary is soft. The negative log-likelihood of the data under the set of normal distributions is used as the clustering loss. The number of clusters is inferred by our procedure; for the Gaussian mixture algorithm, we initialize the number of clusters to K = 10 and train the model with the expectation-maximization strategy [29]. We made the clustering step optional once the clustering loss stagnated, after 6000 epochs. We employed batch processing, which randomly selects subsets of the whole dataset for faster training; this also prevents the model from getting stuck in local minima. We trained the model for 10000 epochs, over which the clustering loss decreased by a factor of 5.
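The Gaussian mixture fit can be sketched as a plain expectation-maximization loop. This is a simplified NumPy version under stated assumptions: our actual training uses batch processing on CNN features, and the function name, initialization scheme, and regularization constant here are illustrative:

```python
import numpy as np

def gmm_em(X, K=10, n_iter=100, seed=0):
    """Fit a K-component Gaussian mixture by EM.
    Returns hard cluster labels and the negative log-likelihood
    (the clustering loss described above)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    mu = X[rng.choice(n, K, replace=False)]              # means: random samples
    cov = np.stack([np.cov(X.T) + 1e-6 * np.eye(d)] * K)  # covariances
    pi = np.full(K, 1.0 / K)                             # mixture weights
    for _ in range(n_iter):
        # E-step: log-responsibility of each component for each sample
        logp = np.stack([
            -0.5 * (np.sum((X - mu[k]) @ np.linalg.inv(cov[k]) * (X - mu[k]), axis=1)
                    + np.linalg.slogdet(cov[k])[1] + d * np.log(2 * np.pi))
            for k in range(K)], axis=1) + np.log(pi)
        log_norm = np.logaddexp.reduce(logp, axis=1, keepdims=True)
        r = np.exp(logp - log_norm)
        # M-step: re-estimate weights, means, and covariances
        Nk = r.sum(axis=0)
        pi = Nk / n
        mu = (r.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mu[k]
            cov[k] = (r[:, k, None] * diff).T @ diff / Nk[k] + 1e-6 * np.eye(d)
    loss = -float(log_norm.sum())   # negative log-likelihood (clustering loss)
    return r.argmax(axis=1), loss   # inferred categorical assignment, loss
```

Each EM iteration is guaranteed not to increase the negative log-likelihood, which is why this quantity is a well-behaved clustering loss to monitor for stagnation.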
3 Data
For this experiment, we used seismic data (waveforms and related metadata) publicly available on the IRIS Data Management Center website. IRIS Data Services are funded by the Seismological Facilities for the Advancement of Geoscience and EarthScope (SAGE) Project, which is supported by the NSF under Cooperative Agreement EAR-1261681. In addition, we used some simulation data from the USGS.


We also introduced some artificial data into the dataset. The artificial data was generated using data augmentation techniques (translation, and reflection about the horizontal and vertical axes).
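The augmentation step can be sketched as follows. The shift size is illustrative (the paper does not specify it), and the function name is ours:

```python
import numpy as np

def augment(spec, shift=4):
    """Generate augmented copies of a spectrogram (freq x time array):
    small translations along the time axis plus reflections about the
    horizontal and vertical axes."""
    out = [np.roll(spec, -shift, axis=1),  # translate earlier in time
           np.roll(spec, shift, axis=1)]   # translate later in time
    out.append(spec[::-1, :])              # reflection about the horizontal axis
    out.append(spec[:, ::-1])              # reflection about the vertical axis
    return out
```

Each input spectrogram thus yields four extra training examples of identical shape, cheaply enlarging the unlabeled training set.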
4 Results
Our method consistently outperformed other commonly used models with a similar number of parameters. In addition, our model trained well on smaller datasets. In a comparative study of the same model trained on our variant of the Mel spectrogram versus an ordinary spectrogram, the variant outperformed the ordinary spectrogram by a substantial margin [Table 1].
CNN | Clustering loss with ordinary spectrogram | Clustering loss with Mel spectrogram variant
---|---|---
ResNet 18 | 6.27 | 5.04
ResNet 50 | 5.62 | 4.50
ResNet 101 | 4.79 | 3.95
ResNet 152 | 3.56 | 2.90

5 Discussion
Our method thus provides a valuable tool for remote seismic monitoring centers that lack access to large computational resources. The clustering algorithms can detect very subtle patterns that might be missed by human seismologists and group data items together, which makes the method particularly effective for data of the kind shown in [30, 31]. Clustering also allows the detection of seismological phenomena that might otherwise be considered ordinary and ignored by a seismologist. This is important for spotting precursors to a natural disaster such as an earthquake or volcanic eruption, and it enables a better understanding of the inner mechanics of the Earth's crust by detecting hard-to-detect phenomena such as nonvolcanic tremors, low-frequency earthquakes, and distant vs. local earthquakes.

Animals like elephants have been widely reported to run inland from the coast just before an earthquake followed by a tsunami (https://www.nationalgeographic.com/animals/article/news-animals-tsunami-sense-coming, https://www.dailymail.co.uk/news/article-3614477/How-Ning-Nong-elephant-saved-tsunami-incredible-bond-little-girl-baby-jumbo-inspired-Michael-Morpurgo-s-sequel-War-Horse.html). Since the hearing of animals has not been empirically characterized well enough to derive proper constants for our Mel spectrogram variant, we tried out different constants stochastically and report the results for the best combination found so far. We are continuing to test more combinations of constants, and better results could be forthcoming. If we get close to the hearing efficiency of an elephant, every seismological monitoring station could detect an approaching earthquake and issue an early warning of up to 20 minutes before the earthquake hits, saving lives in the region by giving people sufficient time to reach safe places.
6 Conclusion
We have proposed an idea that, to our knowledge, has not been explored before. It is far from perfect: the manual testing of constants for the Mel spectrogram variant is tedious and time-consuming, and we are currently working on automating the process. There is much work to be done in the field of ML for seismology, which, due to a lack of attention from industry, remains largely neglected. It is a very important area of study with the potential to save thousands of lives each year.
References
- [1] Havskov, J., and Ottemoller, L., 2010, Routine data processing in earthquake seismology: With sample data, exercises and software: Springer, Netherlands, Dordrecht, 347 p.
- [2] S. P. Nishenko, R. Buland. A generic recurrence interval distribution for earthquake forecasting. Bulletin of the Seismological Society of America (1987) 77 (4): 1382–1399
- [3] Kagan, Y. Y., and Jackson, D. D., 2000, Probabilistic forecasting of earthquakes: Geophysical Journal International, v. 143, p. 438-453.
- [4] Moustra, M., Avraamides, M., and Christodoulou, C., 2011, Artificial neural networks for earthquake prediction using time series magnitude data or seismic electric signals: Expert Systems with Applications, v. 38, p. 15032-15039.
- [5] Wang, Q., Guo, Y., Yu, L., and Li, P. 2017, Earthquake prediction based on spatio-temporal data mining: an LSTM network approach: IEEE Transactions on Emerging Topics in Computing.
- [6] Lipski, M., Argueta, C. L., Saunders, M. D., 2017, Earthquake prediction using deep learning: Proceedings of Modeling Complex Systems, University of Guelph, 4 p.
- NCEDC, 2014, Northern California Earthquake Data Center. UC Berkeley Seismological Laboratory. Dataset. doi:10.7932/NCEDC.
- [7] Allen, R. V., 1978, Automatic earthquake recognition and timing from single traces: Bulletin of the Seismological Society of America, v. 68, p. 1521-1532.
- [8] Satriano, C., Wu, Y. M., Zollo, A., and Kanamori, H., 2011, Earthquake early warning: Concepts, methods, and physical grounds: Soil Dynamics and Earthquake Engineering, v. 31, p. 106-118.
- [9] Yoon, C. E., O’Reilly, O., Bergen, K. J., and Beroza, G. C., 2015, Earthquake detection through computationally efficient similarity search: Science Advances, 13 p.
- [10] Joswig, M., 1990, Pattern recognition for earthquake detection: Bulletin of the Seismological Society of America, v. 80, p. 170 - 186.
- [11] Perol, T., Gharbi, M., and Denolle, M., 2018, Convolutional neural network for earthquake detection and location: Science Advances, 8 p.
- [12] Scholz, C. H., Sykes, L. R., and Aggarwal, Y. P., 1973, Earthquake prediction: A Physical Basis: Science, v. 181, p. 803-810.
- [13] Allegre, C. J., Le Mouel, J. L., and Provost, A., 1982, Scaling rules in rock fracture and possible implications for earthquake prediction: Nature, v. 297, p. 47 - 49.
- [14] Obara, K., Hirose, H., Yamamizu, F. & Kasahara, K. Episodic slow slip events accompanied by non-volcanic tremors in southwest Japan subduction zone. Geophys. Res. Lett. 31, L23602 (2004).
- [15] Mousavi, S. M., Zhu, W., Ellsworth, W. & Beroza, G. Unsupervised clustering of seismic signals using deep convolutional autoencoders. IEEE Geosci. Remote Sens. Lett. 16, 1693–1697 (2019).
- [16] Köhler, A., Ohrnberger, M. & Scherbaum, F. Unsupervised pattern recognition in continuous seismic wavefield records using self-organizing maps. Geophys. J. Int. 182, 1619–1630 (2010).
- [17] Rouet-Leduc, B. et al. Machine learning predicts laboratory earthquakes. Geophys. Res. Lett. 44, 9276–9282 (2017).
- [18] Esposito, A. et al. Unsupervised neural analysis of very-long-period events at stromboli volcano using the self-organizing maps. Bull. Seismol. Soc. Am. 98, 2449–2459 (2008).
- [19] Unglert, K. & Jellinek, A. Feasibility study of spectral pattern recognition reveals distinct classes of volcanic tremor. J. Volcanol. Geotherm. Res. 336, 219–244 (2017).
- [20] Hammer, C., Ohrnberger, M. & Faeh, D. Classifying seismic waveforms from scratch: a case study in the Alpine environment. Geophys. J. Int. 192, 425–439 (2012).
- [21] Soubestre, J. et al. Network-based detection and classification of seismovolcanic tremors: example from the Klyuchevskoy volcanic group in Kamchatka. J. Geophys. Res.: Solid Earth 123, 564–582 (2018).
- [22] Beyreuther, M., Hammer, C., Wassermann, J., Ohrnberger, M. & Megies, T. Constructing a hidden Markov model based earthquake detector: application to induced seismicity. Geophys. J. Int. 189, 602–610 (2012).
- [23] Holtzman, B. K., Paté, A., Paisley, J., Waldhauser, F. & Repetto, D. Machine learning reveals cyclic changes in seismic source spectra in Geysers geothermal field. Sci. Adv. 4, eaao2929 (2018).
- [24] Mousavi, S. M., Zhu, W., Ellsworth, W. & Beroza, G. Unsupervised clustering of seismic signals using deep convolutional autoencoders. IEEE Geosci. Remote Sens. Lett. 16, 1693–1697 (2019).
- [25] Pedersen, Paul. “The Mel Scale.” Journal of Music Theory, vol. 9, no. 2, [Duke University Press, Yale University Department of Music], 1965, pp. 295–308, https://doi.org/10.2307/843164.
- [26] Bhargava, Neeti & Katiyar, V. & Sharma, Mukat & Pradhan, Pragnya. (2009). Earthquake Prediction through Animal Behavior: A Review. Indian Journal of Biomechanics: Special Issue. 7–8.
- [27] Y. LeCun et al., "Backpropagation Applied to Handwritten Zip Code Recognition," in Neural Computation, vol. 1, no. 4, pp. 541–551, Dec. 1989, doi: 10.1162/neco.1989.1.4.541.
- [28] Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun. "Deep Residual Learning for Image Recognition", arXiv, 2015.
- [29] Reynolds, D. Gaussian mixture models. in Encyclopedia of Biometrics (eds Li, S. Z., Jain, A.) 827–832 (Springer, Boston, MA, 2009)
- [30] Poli, P. Creep and slip: seismic precursors to the Nuugaatsiaq landslide (Greenland). Geophys. Res. Lett. 44, 8832–8836 (2017).
- [31] Bell, A. F. Predictability of landslide timing from quasi-periodic precursory earthquakes. Geophys. Res. Lett. 45, 1860–1869 (2018).