1 Introduction
Damage due to earthquakes poses a threat to humans worldwide. Seismic hazard analysis is used to estimate the possible ground motion at a given location in a given period based on historical earthquake data and decay of ground motion intensity with distance. However, current approaches to seismic hazard analysis are largely empirical and may not capture the full range of ground shaking in future large earthquakes due to a lack of sufficient historical geological data. This leads to large uncertainties in hazard estimates. Ideally, this lack of data can be mitigated by employing physical models that supplement existing data with additional scenario events that can quantify the expected variability of ground shaking to provide a robust estimate of hazard and risk.
Earthquake faults are incredibly complex systems that span a vast range of length and time scales, making it challenging to construct physical models that resolve all of the relevant physics. Information on the state of stress of the faults can be obtained from past earthquake focal plane mechanisms, geologic observations, or direct wellbores or drill holes (Zoback, 2010)
. However, these stress analysis techniques typically neither constrain the full stress tensor nor cover a broad range of locations and depths. Therefore, direct
in situ measurements of the stresses and displacements during rupture are rarely available. Even if the stress state and microscopic physics governing earthquake slip are known with certainty, multiscale modeling of earthquakes poses a vast computational challenge due to the range of length and timescales involved. Due to these limitations, physical modeling is not routinely used directly in estimating earthquake hazard. However, physicsbased approaches like dynamic earthquake rupture simulations are frequently suggested as a way that such physical constraints could be incorporated into seismic hazard estimation (Harris, 1998; Harris and Day, 1999; Graves et al., 2011; Olsen et al., 2009; de la Puente et al., 2009).Dynamically simulating earthquake rupture is challenging due to uncertainty regarding the underlying physics of earthquake slip. The stress conditions and frictional properties of faults are not well constrained (Duan and Oglesby, 2006; Peyrat et al., 2001; Ripperger et al., 2008; Kame et al., 2003) although they, together with fault geometry, control the rupture process and determine the dynamics of slip as well as the resulting ground motions. Because earthquake rupture is a highly nonlinear process, determining parameter values is often done by making simplifying assumptions or taking a trial and error approach, which usually incurs expensive computational and numerical costs (Douilly et al., 2015; Ripperger et al., 2008; Peyrat et al., 2001). Therefore, this limits the applicability of the simulations, as due to the high computational expense they cannot be easily integrated into other calculations such as inversions or seismic hazard analysis.
While rupture simulations compute the slip and ground motions with a high level of detail, seismic hazard analysis usually only requires more general characteristics of an earthquake such as moment magnitude or peak ground velocity
(Petersen et al., 2014). Thus, rupture simulations may only need to approximate such characteristics to be useful in the hazard analysis. Machine learning is a promising approach for reducing the computational cost involved in estimating such attributes of complex data sets. Many recent studies show that a machine learning approach can be beneficial in a variety of applications including seismic event detection, hazard analysis, and fault detection from unprocessed seismic data. For example, RouetLeduc et al. (2017) used a random forestbased algorithm to identify signals from the laboratorygenerated acoustic measurements with predictive power. Their model was able to accurately forecast future failure events using only a window of data much shorter than the time between events. Perol et al. (2018)developed a convolutional neural network algorithm
ConvNetQuake to detect earthquakes. ConvNetQuake showed the ability to identify 20 times more earthquakes accurately than were contained in the catalog listed by the Oklahoma Geological Survey. Last et al. (2016) explored machine learning methods to predict the largest possible magnitude of an earthquake likely to occur next year based on the previously recorded seismic events in Israel and its neighboring countries. The model was able to predict the maximum earthquake magnitude with 69.8 AUC (Area Under the Curve) accuracy. ArayaPolo et al. (2017) developed a neural network based model that can identify faults from raw seismic data bypassing the expensive multistep seismic processing, which is a significant promise in hydrocarbon exploration. These models illustrate the variety of ways that machine learning can take advantage of geophysical data to help us understand the complex structure and dynamics of the earth’s crust. Another recent application of machine learning algorithm is to predict ground motion. Paolucci et al. (2018) used a neural network to predict broadband earthquake ground motions from 3D physicsbased numerical simulations. The authors showed that their algorithm could reproduce the highfrequency shaking based on the lowfrequency simulation results, illustrating how physicsbased models might be integrated into hazard modeling.In this paper, we describe a workflow for using machine learning to predict if an earthquake can break a fault with geometric heterogeneity, a complex problem that is dependent on geometry, stress, and fault friction coupled to elastic wave propagation. We start by presenting the data generated from dynamic rupture simulations and normalizing the data (zero mean and unit standard deviation) for training models. Then we examine two models built from the random forest and neural network and trained by a large number of earthquake rupture simulations. Finally, we discuss the potential to use machine learning models in predicting real surface fault rupture and estimating seismic hazard.
2 Rupture simulations and data preprocessing
We produce a set of 2,000 rupture simulations based on the geometry illustrated in Fig. 1 using fdfault (Daub, 2017) a finite difference code for numerical simulation of elastodynamic fracture and friction problems. A dynamic rupture model solves the elastodynamic wave equation coupled to a friction law describing the failure process. The simulation dynamically determines fault slip based on the initial stress conditions, the elastodynamic wave equations, and frictional failure on that fault.
The fault in the simulation is planar, with a Gaussian geometric heterogeneity at the center of the fault. Rupture is nucleated 10 km to the left of the barrier and propagates from the hypocenter towards the barrier. The rupture on the fault is governed by the linear slipweakening law (3). The fault starts to break when the shear stress () exceeds the peak strength , where and are the static friction coefficient and normal stress, respectively. Over a critical slip distance , the friction coefficient reduces linearly to constant dynamic friction .
Fault geometry, stress state, and friction are all critical considerations for whether or not a rupture propagating from left to right can break the barrier. For rightlateral slip on the fault, the fault geometry is such that the closest side of the barrier is a restraining bend, which inhibits rupture, while the far side is a releasing bend that promotes rupture. Due to the fault geometry, all three stress components influence the shear and normal tractions on the restraining and releasing bends. For example, normal stress increases around the restraining bend depend on the bending angle while it decreases on the releasing bend, making the barrier a challenging effect to systematically analyze for rupture propagation. In this project, the halfwidth of the barrier varies between 1 and 2.1 km while the height is set in the range of 010 of its halfwidth. These values are large enough that the barrier has a nonnegligible influence on the actual shear and normal tractions. Thus the combination of the stress state with the complexity of the fault geometry makes it difficult to predict the ability of the rupture to break the barrier due to these nonlinear effects.
Rupture simulations frequently use the shear over normal stress or the ratio (Andrews, 1976; Das and Aki, 1977) to characterize the ability of a rupture to propagate on a fault. The ratio is calculated as , where is the initial shear stress, and and are the peak and residual shear stresses, respectively (Fig. 3). For this particular problem, however, we find that neither metric can reliably predict if the rupture can break the barrier for a given set of parameters. We find that the ratio does offer some utility as a discriminant for rupture, but only in the sense that large values of S are more likely to arrest. Smaller values of the ratio traditionally indicate a rupture that is closer to failure, but we find that for this particular problem small ratios still arrest in many situations. For example, the Fig. 2
shows the smoothed probability density function of the
ratio for rupture arrest and propagation estimated from the training set of rupture data. From the figure, it is obvious that rupture propagation and arrest have large overlapping regions that make the Sratio a poor discriminant for this particular problem. This is because of the role of the fault geometry, combined with the out of plane normal stress component.For simple planar faults with uniform stress and strength conditions, the Sratio and shear over normal stress are indicative of the stress situation over the entire fault, and thus predictive of rupture over the whole fault. For a rough fault, on the other hand, the actual tractions on the barrier can be significantly different from those for a flat fault, and thus ruptures that would ordinarily propagate may arrest. Thus, while this might appear at the outset to be a rather simple rupture problem, it captures the essential physics of rupture propagation on more complex fault geometries and provides a useful test case. Therefore, it allows us to determine if a machine learning algorithm can predict the ability of a rupture to propagate under a given set of conditions. The 2000 simulations are divided into training (1600) and test (400) sets. 1600 training datasets are used to train and validate the model while the rest of the 400 test simulations are used to test the performance of the models. We control the following eight parameters to vary the rupture behavior in each simulation:

The geometric barrier halfwidth follows a uniform distribution between 1 and 2.1 km.

The barrier height follows a uniform distribution between 0 and 10 of the barrier halfwidth.

The fault normal stress (inplane, or IP) follows a uniform distribution between 10 and 160 MPa (negative indicates compression hence smaller means higher stress). Note that while this sets the normal stress on the flat part of the fault, while the normal stress on the barrier varies spatially due to the geometry.

The other normal stress component, which we term the outofplane (OP) normal stress, follows a uniform distribution between 25 of the fault normal stress.

The dynamic friction coefficient () follows a uniform distribution between 0.2 and 0.6.

The friction drop (static minus dynamic friction) follows a uniform distribution between 0.2 and .

The shear stress is set to be proportional to the normal stress. The proportionality coefficient is the dynamic friction coefficient plus 2 of the friction drop (), plus a random number times 13 of the friction drop. In other words, the initial stress is chosen to be in a relatively narrow range of values where it is not trivial to predict whether or not the rupture will be able to propagate. Stress is always higher than the dynamic strength, but never very close to the static strength. This small range reflects the idea that earthquakes occur on faults once the shear stress on the fault reaches the minimum stress needed to rupture the entire fault, so such values should be representative of realistic values of the initial shear stress on natural faults.

The slip weakening distance (
) follows a normal distribution centered at 0.4 m with a standard deviation of 0.05 m.
Height (Km)  Halfwidth (Km)  OP Stress (MPa)  Shear stress (MPa)  IP Stress (MPa)  Fric. drop  Dyn fric.  (m)  Output 

0.104  1.146  102.509  58.619  117.766  0.484  0.217  0.296  0 
0.088  1.304  136.062  51.391  126.715  0.346  0.448  0.406  1 
0.099  1.260  117.559  40.972  115.529  0.293  0.502  0.389  1 
0.116  1.191  128.169  94.021  157.830  0.572  0.203  0.409  0 
0.018  1.108  106.350  29.149  101.379  0.253  0.325  0.398  1 
Five sample rupture examples before preprocessing are listed in Table 1 and shows the different parameter values with the corresponding output, 0 or 1. An output value of 1 means that the rupture propagated through the restraining geometric barrier while 0 indicates that the rupture arrested before reaching the center of the barrier. All the parameters of the dataset are normalized to have zero mean and unit standard deviation. Note that this puts the problem in the form of a standard classification problem, for which many algorithms have been developed (Carbonell et al., 1983; Michie et al., 1996).
3 Classification Stategy
We use two different classification algorithms to create predictive models to determine if a rupture can break the fault in our model. One is the random forest decision tree algorithm
(Breiman, 2001) and the other is the artificial neural network (Rosenblatt, 1958; Rumelhart et al., 1988). These two algorithms have been selected due to their flexibility in handling a range of classification problems with a large feature space as well as the fact that their estimated parameter values can provide insight into the underlying dynamics. Another important aspect is that, while many learning algorithms such as nearest neighbor algorithms may work well for a particular problem, they may not necessarily tell us much about the underlying physics.Since our training dataset is imbalanced (65 of ruptures arrest while 35 of ruptures propagate), costsensitive learning (Chen et al., 2004) is beneficial in many situations to make our models more suitable to learn from imbalanced data. Therefore, imposing a strong penalty on misclassification of minority class can improve performance. We assign a high weight (i.e., higher misclassification cost) to the minority class (rupture propagation) using the ‘balanced‘ class strategy, where the class weights are given by the number of total samples / (number of classes number of samples in each class). The above formula is used to calculate the weight for rupture arrest and propagation classes, which are 0.768 and 1.431 respectively. These parameters are used to weight the training examples in computing the cost function, which is then minimized when training the model.
4 Evalution Metrics
Since the data set is imbalanced, we use recall, precision, F1 score, and accuracy as the evaluation criteria. Accuracy is not always a good indicator of model performance in these cases, as poor performance on the rare class can be masked by good performance on the much larger numbers of the more common class. Recall measures the proportion of the true predicted positive examples among the total positive cases. The recall is a metric to evaluate the model performance on all positive samples. Precision, on the other hand, measures the proportion of the true positive examples out of the total number of predicted positives. Precision helps us to analyze the model quality regarding the positive predictions. The F1 score is an evaluation metric that combines precision and recall by taking the harmonic mean of them. The F1 score is an auxiliary evaluation metric that combines precision and recall in a way where the model must exhibit high precision and recall to obtain a good F1 score. F1 score is calculated as: 2
(precision recall) / (precision + recall).We also examine the confusion matrix to evaluate model performance. The matrix gives detailed information on true positive (TP), true negative (TN), false positive (FP), and false negative (FN) predictions. TP and TN are the examples where the rupture propagates or arrests, respectively, and the models predict them correctly. The FPs are situations where a rupture was predicted to propagate but actually arrested, and the FNs are examples where an arrest was predicted, but the rupture propagated. All of the metrics provide us with a better understanding of model performance.
5 Random forest classifier
Ensemble machine learning algorithms have been widely used because of their excellent accuracy, robustness, and ease of use. The method combines multiple independent learning algorithms (weak learners) to achieve better predictive performance than could be obtained from any of the single learners alone (Opitz and Maclin, 1999; Polikar, 2006; Rokach, 2010). The ensemble makes a final prediction based on a majority voting from the individual components of the group. There are two types of ensembles commonly used: bagging and boosting. Bagging (Breiman, 1996) involves having each learner in the ensemble vote with equal weight while in boosting (Dietterich, 2000)
, the prediction is made by taking a weighted vote of the predictions. Weights are proportional to each learner’s accuracy on its training set. Although the ensemble methods can often perform better than a single learner, it requires increased storage, extensive computation, and has a complex structure to interpret due to the involvement of multiple classifiers in decision making.
A fast algorithm such as decision trees is commonly used in ensemble methods to reduce the computational cost. Fig. 4 shows a schematic of the ensembles of multiple single decision trees. Random forest (RF) is a bagging type of ensemble classifier (Breiman, 2001) that uses many single trees to make predictions. RF can be used for both classification and regression. In this project, we use the ID3 algorithm (Quinlan, 1986) that uses information gain as the splitting criteria for a single decision tree.
For the random forest classification, we use the scikitlearn (Pedregosa et al., 2011) library for implementing the algorithm in Python. Three parameters are optimized to improve the model performance: (1) maximum depth: the maximum depth of the tree (2) minimum samples split: the minimum number of samples required to split an internal node and (3) number of estimators: the number of trees in the forest. To find the best parameters, we perform a grid search over the specified parameter values using the crossvalidation technique to assess model performance. The best parameter value of maximum depth is 10, minimum samples split is 40, and the number of estimators is 20. Crossvalidation involves dividing the training data into two parts: the training data is used to estimate the model parameters and the validation data to determine how well the model generalizes to unseen data. By optimizing the model performance on the crossvalidation data, we select the values of these model parameters.
5.1 Important features
The random forest algorithm allows the evaluation of the important features of a classification task. The most important features tend to occur higher in the decision trees, so we can see in this manner in which features are most predictive of rupture or arrest. Fig. 5 shows the weighting of important features by percentage. Stress components, dynamic friction, and friction drop are the most important features to determine if an earthquake can break through a fault. These features account for 86 of the total predictive power. This result is consistent with many studies that have shown that earthquake rupture initiation, its propagation, and termination are highly sensitive to stress and friction properties (Duan and Oglesby, 2007; Peyrat et al., 2001; Ripperger et al., 2008; Kame et al., 2003).
OP normal stress alone has the greatest (21.34) classification contribution followed by dynamic friction (18.80). IP normal and shear stress components and friction drop have almost equal importance (15). On the other hand, geometric features height and halfwidth have a less significant effect on the predictive power, indicating that the fault geometry is less important. This is likely because the stress and friction are important both with and without a geometrical heterogeneity on the fault, and regardless of how strong the heterogeneous fault geometry is, as long as it is loaded sufficiently close to failure it will still be able to propagate. Additionally, the influence of the OP normal stress is due entirely to the varying geometry, and the algorithm may find that the fault geometry varies less widely across the simulations, making the OP normal stress the variable that gives the most reliable predictions. The OP normal stress also varies more widely than the other stress parameters, so this may exhibit a stronger influence when constructing predictive models. This does not mean that fault geometry is unimportant, as it is well known that fault roughness influences the residual stress heterogeneities on the fault, which has a significant effect on earthquake recurrence probability (Zielke et al., 2017). Instead, this result suggests that if the geometry does not vary too much, the stress and frictional conditions are more useful to make predictions about the rupture process. Simulations using more complex fault geometries such as fractal faults may show a stronger sensitivity to features based on the fault geometry when compared to the simple barrier considered here.
5.2 Classification result
We use 400 test data to estimate the generalization error of the random forest model. Table 2 shows the confusion matrix that contains information about actual and predicted classifications performed by the model. The RF classifier accurately predicts 218 test cases as TN. Similarly, 107 example ruptures are correctly identified as TP. On the other hand, 21 and 54 test cases were inaccurately classified as FP or FN, respectively.
Negative  Positive  

Negative (Rupture arrested)  TN = 221  FN = 51 
Positive (Rupture propagated)  FP = 24  TP = 104 
Class  Precision  Recall  F1 score  Number 

Negative (Rupture arrested)  0.91  0.80  0.85  272 
Positive (Rupture propagated)  0.66  0.84  0.74  128 
Average/Total  0.83  0.81  0.82  400 
Table 3 shows the classification result using three evaluation metrics. The high score of recall (0.80) for the positive class and the high precision (0.91) for the negative class indicates that the number of true positives (128) misclassified as negative is small (21). A slightly lower F1 score for the positive class indicates that the model performance of the rupture propagation class is not as good as the performance of the rupture arrest class. Adding more rupture propagation data may improve the positive class performance by helping the algorithm distinguish more subtle patterns in the dataset.
6 Artificial Neural Network
Artificial neural networks (ANN) are inspired by how neurons are connected in the brain
(Rosenblatt, 1958). A neural network consists of several units interconnected and organized in layers. The individual units are also known as neurons. The neurons perform a weighted sum of its input, so its output can fit complex functions by combining a large number of weights. A layer can be connected to an arbitrary number of further hidden layers of arbitrary size before being combined in the output layer. Hidden layers introduce complexity to the model, and given sufficient data to train the model, these additional layers can improve performance. However, increasing the number of hidden layers does not always help. Additional hidden layers can lead to overfitting (Hinton et al., 2012; Lawrence and Giles, 2000; Lawrence et al., 1997) where the network will memorize the training data, but generalize poorly to new data. Therefore, selecting the number of layers and units in each layer is one of the challenges of constructing ANNs that perform well on complex data sets.Figure. 6 illustrates the schematic diagram of the neural network topology we use in this work. The network has one hidden layer with 12 units. The eight input parameters are mapped to these 12 units, producing a
weight matrix for the model. As each input enters a unit, the output of the previous unit is multiplied by its weight. The unit then adds all these new inputs, which determines the output value of the intermediate unit. We then apply a nonlinear activation function ReLu
(Hahnloser et al., 2000) to the output weight, which passes all the values greater than zero and set any negative output to be zero. Finally, the hidden layers combine the 12 outputs with the output layer and use the resulting weight to make predictions. We use keras (Chollet et al., 2015), a Python deep learning library, to build the model.
In the ANN, 480 training examples (30
of the training set) are used for validating the model. To prevent overfitting (high variance, therefore, poor performance on unseen examples), we use several strategies: (1) crossvalidation (a technique to assess model performance) (2) L2 regularization that penalizes large weights and effectively reduces the variance of the model, hence prevents overfitting and (3) early stopping technique that stops the training when the difference between training and validation error start increasing rather than decreasing. In our case, if the validation accuracy does not improve in 20 consecutive training steps, the early stopping technique halts the training.
6.1 Network parameters
The weights learned by the neural network allow us to gain some insight into the combinations of input parameters that are most predictive of the ability of the rupture to propagate. To visualize the weights, we have constructed the weights versus neural units matrix plot. This is illustrated in Fig. 7. The left panel shows the model weights mapping the eight inputs (horizontal) to the twelve hidden units (vertical). The right panel shows the weights that combine the hidden units into the one output unit on the right. The color scale indicates the range of weights. The weights in the model range from negative values to positive values which indicate inhibitory or excitatory influences.
If the output layer weight is positive for a particular unit, then the parameter combination is a good predictor of rupture propagation, while a negative weight indicates the parameter combination is a good predictor of rupture arrest. Similarly, weights mapping the inputs to the hidden units that are positive indicators signify that large values of that input unit favor rupture. If the output unit has a negative weight, then large weights for a hidden unit indicate that large values of that parameter are predictive of the arrest. The weights provide insight into which parameter combinations are most predictive of rupture.
All the weights related to height, halfwidth and slipweakening distance have relatively small weights when compared to the stress and friction parameters (Fig. 7). This means these three parameters have reduced the influence on the final prediction while the remaining five parameters have a much stronger contribution to the final predictions. We note the consistency with the random forest algorithm that gave the same importance rank for the parameters; that is, height, halfwidth, and are the least significant.
Parameters illustrated in Fig. 7 also provide insights into the parameter combinations and their influence on determining the rupture propagation. Interestingly, we find that all of the units with large output might show similar patterns for the combined parameter values. For example, unit4 has a large negative output weight. The unit has a large negative weight of shear stress, friction drop, and slipweakening distance while height, halfwidth, OP, IP normal stress, dynamic friction coefficient, and friction drop have positive weight. This indicates that if a fault has low shear stress, and low friction drop, but high compressive OP, IP normal stress, high dynamic friction coefficient, height, and halfwidth then it is likely that rupture would not propagate but arrest. On the other hand, unit8 has a substantial positive output weight. The OP, IP normal and dynamic friction coefficient has a large negative weight while friction drop, slipweakening distance, shear stress have high positive weight. Note that these are essentially the opposite values from those in unit4, indicating a single underlying pattern in the data. Therefore, based on our input data the model has determined exactly how to best combine the various input parameters in a more sophisticated way than simply looking at shear over normal stress or the parameter. Additionally, the model provides a method for integrating the effect of the out of plane normal stress, which is known to be important for complex faults but is not accounted for quantitatively when examining the Sfactor.
Our model is a simplification, and there is just one barrier with one shape and size, rather than many barriers with a much broader range of shapes and sizes. Although the fault geometry (halfwidth and height) has a relatively small influence in determining rupture, variations in barrier size still play an important role for local stress state on the fault surface. It is likely that when considering the full fractal fault, the fault geometry likely plays a more pronounced role as there are many more places where the geometry could cause the rupture to arrest. Chester and Chester (2000) used an analytical model of a wavy friction fault and found that the fault roughness has the most significant impact in determining the orientation and magnitude of principal stress. In a nonplanar fault geometry, narrower barriers have a more substantial stress perturbation in the sense that angle near the bend is sharper for the barriers of the same height, so the variation in traction at the releasing and restraining bend (Fig. 1) is more prominent. Whereas if the barrier is broad, the angle around the bend changes gradually, and the stress perturbation at the restraining and releasing bend is less noticeable. Although the fault geometry is not as predictive as the stress and friction, the geometry still does play a role in understanding the rupture process, and the methods developed here give a straightforward way to account for them quantitatively.
We confirm that the parameter combinations found by the ANN approach are robust by repeating the fitting procedure. We build an additional 15 neural network models with the same training data set, but different initial weights to see if the models find the same features that are predictive of rupture. The average testing accuracy of the models is 83 which is very similar to the results shown in Table 5. Although the final weights vary slightly from model to model, they exhibit high correlation when we sort them in ascending order based on their output weight. Fig. 8 shows the determination of coefficient (R2 score) among the parameters (in ascending order) learned by the models. Model11 has the smallest correlation with the other model while models10, and 12 have high correlations. Even though model11 has the lowest correlation coefficient of 0.71 with model1, it is high enough that it still contains similar patterns as the other models. The highly correlated weights indicate that the models are picking up on consistent features regardless of the random way that the model is initialized.
6.2 Classification result
Negative  Positive  

Negative (Rupture arrested)  TN = 223  FN = 49 
Positive (Rupture propagated)  FP = 26  TP = 102 
Class  Precision  Recall  F1 score  support 

Negative (Rupture arrested)  0.90  0.82  0.86  272 
Positive (Rupture propagated)  0.68  0.80  0.73  128 
Average/Total  0.83  0.81  0.82  400 
We use the same 400 test data to validate the ANN model. Table 4 shows the confusion matrix (CM) that contains information about actual and predicted classifications. Interestingly the CM of the RF model is essentially the same as that of the ANN model, with two additional cases correctly predicted relative to the RF. The classification report (table 5) produced by the ANN model also shows nearly the same precision, recall, and F1 scores as the RF model. The overall testing accuracy of the model is . This result suggests that several common classification algorithms are capable of accurately predicting the rupture results to the same level given the same input data. This also suggests that the misclassified test data are due to the number of training examples provided to the algorithms.
7 Misclassification analysis
Although the models generally provide robust predictions on the data, it performs less well on a number of false positives where rupture was predicted to propagate but arrested instead. To understand the model performance on the false positives and false negatives, we reduced all the training data into 2D using Principal component analysis (PCA). PCA is mathematically defined as an orthogonal linear transformation that transforms multidimensional data (eight dimensions in our case) to a lower dimension. Fig.
9(a) and (b) show the scatter plot of the reduced training data set. The horizontal and vertical axes represent PCA component1 and 2, respectively. Most of the rupture propagation (true positives) and arrest (true negatives) data are distinguishable in the PCA plots for the RF (Fig.9 (a) and ANN (Fig.9 (b). Fig. 9(c) shows the normalized mean value of each feature of TP, FP, TN, and FN. This gives some insight that the misclassified examples tend to have parameter values that are on the outside edge of the averages of those that are correctly predicted. False positives tend to be located in the transition between true positives and true negatives. Some of the FP also overlap with true positives. Interestingly, the false positives have a similar mean value of all features when compared to the true positives illustrated in Fig. 9(c). Likewise, the false negatives have similar mean feature values when compared to the true negatives. Most of the false negatives are located in the rupture arrest region. These examples likely were misclassified because we have insufficient data, and adding more data (especially more ruptures that propagated) could improve model performance. Another possible way to solve the problem is to use the Bayesian neural network (BNN) (Gal, 2016; Mullachery et al., 2018). Future efforts will focus on ways to improve model performance on the misclassified examples.8 Discussion
In this study, we develop machine learning models using the random forest and artificial neural network algorithms to predict earthquake rupture on a geometrically complex fault. The models provide a robust way to learn the parameter combinations responsible for rupture propagation. Because of the complicated fault geometry, nonlinear rupture process, and unknown material properties, it is difficult to predict if an earthquake can break through a fault (Ohnaka et al., 1986; Kanamori et al., 1993; Marone, 1998; Abercrombie and Rice, 2005; Warren and Shearer, 2006). Moreover, discontinuous particle velocities across fault zones and tractions acting on the fault are governed by nonlinear friction laws, and obtaining these parameters in situ is challenging (McGarr and Gay, 1978; Saffer and Marone, 2003; Saffer et al., 2001; Kohlstedt et al., 1995). Therefore, modeling earthquake rupture with a heterogeneous fault surface and unknown material properties remains a challenging computational problem (Douilly et al., 2015; Ripperger et al., 2008; Peyrat et al., 2001). Our machine learning approach, on the other hand, provides a method to understand the rupture physics on complex fault and efficiently predict the result with high accuracy if a sufficient number of examples are provided.
Classification results produced by the ANN and RF models show that they can capture most of the underlying patterns of rupture propagation. Interestingly, both models have around 84 recall on the positive class (rupture propagation), meaning that both of the models successfully learned most of the underlying complex data patterns responsible for rupture propagation. Another exciting aspect of the ANN model is that it consistently finds the same hidden data patterns despite the various weight initializations. These underlying complex data patterns reflect the physics of the rupture propagation. For example, if a fault with geometrical heterogeneity has higher outofplane normal and shear stress and low static and dynamic friction, then it is likely that an earthquake rupture can break through the fault, and our model provides a way to evaluate this quantitatively.
These models are highly efficient in determining whether a rupture is going to break through a fault. Both of the models take a fraction of a second to predict if a rupture can propagate given eight input parameters. This is a significant improvement over running a full rupture simulation, which takes about two hours of wall clock time on eight processors. Although the model performance in predicting earthquake rupture outperforms the traditional approach such as the Sparameter (Andrews, 1976; Das and Aki, 1977) or nondimensional prestress (Bruhat et al., 2016; Kaneko and Lapusta, 2010), this approach is not meant to replace all rupture simulations, but rather to help determine parameter values. This approach might be combined with information such as a historical fault database and paleoseismic rupture areas to choose a parameter range consistent with past earthquakes in a region. The model predictions might also be incorporated into other complex calculations such as inversions or probabilistic seismic hazard analysis (PSHA). Since the models are computationally efficient, we can find a range of parameters likely to give a specific magnitude, which can be used to run some specific scenario events for estimating ground motions.
The method could also be adapted into a probabilistic model for earthquake size given a few physically relevant parameters that can account for epistemic uncertainty by robustly considering different parameter value choices. We plan to expand our results to more extensive data sets as well as more realistic geometries drawn from fault databases, and develop methods to use machine learning to generate rupture models that are based on physics and can be used in PSHA.
9 Conclusion
We use the random forest and artificial neural network algorithms to predict if a rupture can propagate through a fault with a geometric heterogeneity. We first generate 2000 dynamic earthquake rupture simulations varying stress, friction parameters and height, and halfwidth of the restraining bend of the fault. In both of the models, 400 rupture examples are used to test the model performance. Both of the models can consistently predict rupture with more than 81 accuracy. Data patterns identified by the models reflect the physics of the rupture propagation, and these patterns are robustly identified independently of how the models are initialized.
Computationally, the models are highly efficient. Once the training simulations are computed, and the machine learning algorithms are trained, the models can make a prediction within a fraction of a second. This has the potential to allow for the results of dynamic rupture simulations to be incorporated into other complex calculations such as inversions or probabilistic seismic hazard analysis, something that would not be ordinarily possible. The method can also be applied to other complex rupture problems such as branching faults, fault stepovers, and other complex heterogeneities where the physics of earthquake rupture propagation is not fully understood. Machine learning provides a new way of approaching this complex, nonlinear problem, and helps scientists understand how the underlying geophysical parameters are related to the resulting slip and ground motions, thus helping us constrain future seismic hazard and risk.
10 Acknowledgement
This research was supported by the Southern California Earthquake Center (Contribution No. 7759). SCEC is funded by NSF Cooperative Agreement EAR1033462 USGS Cooperative Agreement G12AC20038. Also, thank CERI and University of Memphis HPC for providing support and computational resources. The source code for all graphs, plots, and models can be found in the Github repository: https://github.com/msahamed/machine_learning_earthquake_rupture.
References
 Abercrombie and Rice (2005) Abercrombie, R. E., and J. R. Rice (2005), Can observations of earthquake scaling constrain slip weakening?, Geophysical Journal International, 162(2), 406–424.
 Ahamed et al. (2017) Ahamed, S., E. Daub, and E. Choi (2017), Coupling longterm tectonic loading and shortterm earthquake slip., doi:10.6084/m9.figshare.4604344.v4.
 Andrews (1976) Andrews, D. (1976), Rupture velocity of plane strain shear cracks, Journal of Geophysical Research, 81(32), 5679–5687.
 ArayaPolo et al. (2017) ArayaPolo, M., T. Dahlke, C. Frogner, C. Zhang, T. Poggio, and D. Hohl (2017), Automated fault detection without seismic processing, The Leading Edge.
 Breiman (1996) Breiman, L. (1996), Bagging predictors, Machine learning, 24(2), 123–140.
 Breiman (2001) Breiman, L. (2001), Random forests, Machine learning, 45(1), 5–32.
 Bruhat et al. (2016) Bruhat, L., Z. Fang, and E. M. Dunham (2016), Rupture complexity and the supershear transition on rough faults, Journal of Geophysical Research: Solid Earth, 121(1), 210–224.
 Carbonell et al. (1983) Carbonell, J. G., R. S. Michalski, and T. M. Mitchell (1983), An overview of machine learning, in Machine learning, pp. 3–23, Springer.
 Chen et al. (2004) Chen, C., A. Liaw, and L. Breiman (2004), Using random forest to learn imbalanced data, University of California, Berkeley, 110, 1–12.
 Chester and Chester (2000) Chester, F. M., and J. S. Chester (2000), Stress and deformation along wavy frictional faults, Journal of Geophysical Research: Solid Earth, 105(B10), 23,421–23,430.
 Chollet et al. (2015) Chollet, F., et al. (2015), Keras, https://github.com/fchollet/keras.
 Das and Aki (1977) Das, S., and K. Aki (1977), A numerical study of twodimensional spontaneous rupture propagation, Geophysical journal international, 50(3), 643–668.
 Daub (2017) Daub, E. G. (2017), Finite difference code for earthquake faulting, https://github.com/egdaub/fdfault/commit/b74a11b71a790e4457818827a94b4b8d3aee7662.
 de la Puente et al. (2009) de la Puente, J., J.P. Ampuero, and M. Käser (2009), Dynamic rupture modeling on unstructured meshes using a discontinuous galerkin method, Journal of Geophysical Research: Solid Earth, 114(B10).
 Dietterich (2000) Dietterich, T. G. (2000), An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, boosting, and randomization, Machine learning, 40(2), 139–157.
 Douilly et al. (2015) Douilly, R., H. Aochi, E. Calais, and A. Freed (2015), 3D dynamic rupture simulations across interacting faults: The Mw 7.0, 2010, Haiti earthquake, Journal of Geophysical Research: Solid Earth.
 Duan and Oglesby (2006) Duan, B., and D. D. Oglesby (2006), Heterogeneous fault stresses from previous earthquakes and the effect on dynamics of parallel strikeslip faults, Journal of Geophysical Research: Solid Earth, 111(5), doi:10.1029/2005JB004138.
 Duan and Oglesby (2007) Duan, B., and D. D. Oglesby (2007), Nonuniform prestress from prior earthquakes and the effect on dynamics of branched fault systems, Journal of Geophysical Research: Solid Earth (1978–2012), 112(B5).
 Gal (2016) Gal, Y. (2016), Uncertainty in deep learning, University of Cambridge.
 Graves et al. (2011) Graves, R., T. H. Jordan, S. Callaghan, E. Deelman, E. Field, G. Juve, C. Kesselman, P. Maechling, G. Mehta, K. Milner, et al. (2011), Cybershake: A physicsbased seismic hazard model for southern California, Pure and Applied Geophysics, 168(34), 367–381.
 Hahnloser et al. (2000) Hahnloser, R. H., R. Sarpeshkar, M. A. Mahowald, R. J. Douglas, and H. S. Seung (2000), Digital selection and analogue amplification coexist in a cortexinspired silicon circuit, Nature, 405(6789), 947.
 Harris (1998) Harris, R. A. (1998), Introduction to special section: Stress triggers, stress shadows, and implications for seismic hazard, Journal of Geophysical Research: Solid Earth, 103(B10), 24,347–24,358.
 Harris and Day (1999) Harris, R. A., and S. M. Day (1999), Dynamic 3D simulations of earthquakes on en echelon faults, Geophysical Research Letters, 26(14), 2089–2092.
 Hinton et al. (2012) Hinton, G. E., N. Srivastava, A. Krizhevsky, I. Sutskever, and R. R. Salakhutdinov (2012), Improving neural networks by preventing coadaptation of feature detectors, arXiv preprint arXiv:1207.0580.
 Kame et al. (2003) Kame, N., J. R. Rice, and R. Dmowska (2003), Effects of prestress state and rupture velocity on dynamic fault branching, Journal of Geophysical Research: Solid Earth (1978–2012), 108(B5).
 Kanamori et al. (1993) Kanamori, H., J. Mori, E. Hauksson, T. H. Heaton, L. K. Hutton, and L. M. Jones (1993), Determination of earthquake energy release and ML using TERRAscope, Bulletin of the Seismological Society of America, 83(2), 330–346.
 Kaneko and Lapusta (2010) Kaneko, Y., and N. Lapusta (2010), Supershear transition due to a free surface in 3D simulations of spontaneous dynamic rupture on vertical strikeslip faults, Tectonophysics, 493(34), 272–284.
 Kohlstedt et al. (1995) Kohlstedt, D., B. Evans, and S. Mackwell (1995), Strength of the lithosphere: Constraints imposed by laboratory experiments, Journal of Geophysical Research: Solid Earth, 100(B9), 17,587–17,602.
 Last et al. (2016) Last, M., N. Rabinowitz, and G. Leonard (2016), Predicting the maximum earthquake magnitude from seismic data in Israel and its neighboring countries, PloS one, 11(1), e0146,101.

Lawrence and Giles (2000)
Lawrence, S., and C. L. Giles (2000), Overfitting and neural networks: conjugate gradient and backpropagation, in
Neural Networks, 2000. IJCNN 2000, Proceedings of the IEEEINNSENNS International Joint Conference on, vol. 1, pp. 114–119, IEEE.  Lawrence et al. (1997) Lawrence, S., C. L. Giles, and A. C. Tsoi (1997), Lessons in neural network training: Overfitting may be harder than expected, in AAAI/IAAI, pp. 540–545.
 Marone (1998) Marone, C. (1998), Laboratoryderived friction laws and their application to seismic faulting, Annual Review of Earth and Planetary Sciences, 26(1), 643–696.
 McGarr and Gay (1978) McGarr, A., and N. Gay (1978), State of stress in the Earth’s crust, Annual Review of Earth and Planetary Sciences, 6(1), 405–436.

Michie et al. (1996)
Michie, D., D. J. Spiegelhalter, and C. C. Taylor (1996), Machine learning, neural and statistical classification,
Journal of the American Statistical Association, 91(433), 436–438.  Mullachery et al. (2018) Mullachery, V., A. Khera, and A. Husain (2018), Bayesian Neural Networks, ArXiv eprints.
 Ohnaka et al. (1986) Ohnaka, M., Y. Kuwahara, K. Yamamoto, and T. Hirasawa (1986), Dynamic breakdown processes and the generating mechanism for highfrequency elastic radiation during stickslip instabilities, in Earthquake Source Mechanics, vol. 37, pp. 13–24, AGU Washington, DC.
 Olsen et al. (2009) Olsen, K., S. Day, L. Dalguer, J. Mayhew, Y. Cui, J. Zhu, V. CruzAtienza, D. Roten, P. Maechling, T. Jordan, et al. (2009), Shakeout: Ground motion estimates using an ensemble of large earthquakes on the southern San Andreas fault with spontaneous rupture propagation, Geophysical Research Letters, 36(4).
 Opitz and Maclin (1999) Opitz, D. W., and R. Maclin (1999), Popular ensemble methods: An empirical study, J. Artif. Intell. Res.(JAIR), 11, 169–198.
 Paolucci et al. (2018) Paolucci, R., F. Gatti, M. Infantino, C. Smerzini, A. Güney Özcebe, and M. Stupazzini (2018), Broadband ground motions from 3D physicsbased numerical simulations using artificial neural networks, Bulletin of the Seismological Society of America, 108(3A), 1272–1286.
 Pedregosa et al. (2011) Pedregosa, F., G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay (2011), Scikitlearn: Machine Learning in Python, Journal of Machine Learning Research, 12, 2825–2830.
 Perol et al. (2018) Perol, T., M. Gharbi, and M. Denolle (2018), Convolutional neural network for earthquake detection and location, Science Advances, 4(2), e1700,578.
 Petersen et al. (2014) Petersen, M. D., M. P. Moschetti, P. M. Powers, C. S. Mueller, K. M. Haller, A. D. Frankel, Y. Zeng, S. Rezaeian, S. C. Harmsen, O. S. Boyd, N. Field, R. Chen, K. S. Rukstales, N. Luco, R. L. Wheeler, R. A. Williams, and A. H. Olsen (2014), Documentation for the 2014 update of the United States national seismic hazard maps, Tech. rep., Reston, VA, doi:10.3133/ofr20141091.
 Peyrat et al. (2001) Peyrat, S., K. Olsen, and R. Madariaga (2001), Dynamic modeling of the 1992 Landers earthquake, Journal of Geophysical Research: Solid Earth (1978–2012), 106(B11), 26,467–26,482.
 Polikar (2006) Polikar, R. (2006), Ensemble based systems in decision making, IEEE Circuits and systems magazine, 6(3), 21–45.
 Quinlan (1986) Quinlan, J. R. (1986), Induction of decision trees, Machine learning, 1(1), 81–106.
 Ripperger et al. (2008) Ripperger, J., P. Mai, and J.P. Ampuero (2008), Variability of nearfield ground motion from dynamic earthquake rupture simulations, Bulletin of the seismological society of America, 98(3), 1207–1228.
 Rokach (2010) Rokach, L. (2010), Ensemblebased classifiers, Artificial Intelligence Review, 33(1), 1–39.

Rosenblatt (1958)
Rosenblatt, F. (1958), The perceptron: A probabilistic model for information storage and organization in the brain.,
Psychological review, 65(6), 386.  RouetLeduc et al. (2017) RouetLeduc, B., C. Hulbert, N. Lubbers, K. Barros, C. J. Humphreys, and P. A. Johnson (2017), Machine learning predicts laboratory earthquakes, Geophysical Research Letters, doi:10.1002/2017GL074677, 2017GL074677.
 Rumelhart et al. (1988) Rumelhart, D. E., G. E. Hinton, R. J. Williams, et al. (1988), Learning representations by backpropagating errors, Cognitive modeling, 5(3), 1.
 Saffer and Marone (2003) Saffer, D. M., and C. Marone (2003), Comparison of smectiteand illiterich gouge frictional properties: Application to the updip limit of the seismogenic zone along subduction megathrusts, Earth and Planetary Science Letters, 215(1), 219–235.
 Saffer et al. (2001) Saffer, D. M., K. M. Frye, C. Marone, and K. Mair (2001), Laboratory results indicating complex and potentially unstable frictional behavior of smectite clay, Geophysical Research Letters, 28(12), 2297–2300.
 Warren and Shearer (2006) Warren, L. M., and P. M. Shearer (2006), Systematic determination of earthquake rupture directivity and fault planes from analysis of longperiod Pwave spectra, Geophysical Journal International, 164(1), 46–62.
 Zielke et al. (2017) Zielke, O., M. Galis, and P. M. Mai (2017), Fault roughness and strength heterogeneity control earthquake size and stress drop, Geophysical Research Letters, 44(2), 777–783.
 Zoback (2010) Zoback, M. D. (2010), Reservoir geomechanics, Cambridge University Press.
Comments
There are no comments yet.