1 Introduction
In the Seismic Probabilistic Risk Assessment (SPRA) studies performed on industrial facilities, a key point is the evaluation of fragility curves which express the failure probability of a structure, or critical components, as a function of a seismic intensity measure such as peak ground acceleration (PGA) or spectral acceleration. It should be noted that apart from the use in a SPRA framework, fragility curves can be used for making decisions regarding the choice of construction details, to improve the structural performance of installations under seismic excitations (see e.g. [ZHANG20091648, SAHA201620, PATIL201692, doi:10.1002/eqe.2586]). They can also be used to evaluate impact of ground motion characteristics (nearfault type like, broadband, etc.) on the structural vulnerability (see e.g. [doi:10.1002/eqe.2586, WangF2017]). Finally, it is worth noting that the use of fragility curves is not limited to seismic excitation, they can also be applied to other loading sources such as wind for example [QUILLIGAN2012270].
In theory, for complex structures, fragility curves have to be evaluated empirically based on a large number of mechanical analyses requiring, in most cases, nonlinear timehistory calculations including both the uncertainties inherent to the system capacity and to the seismic demand, respectively called epistemic and aleatory uncertainties [KIUREGHIAN2009105]. Nevertheless, the prohibitive computational cost induced by most of nonlinear mechanical models requires the development of numerically efficient methods to evaluate such curves from a minimal number of computations, in particular in industrial contexts.
Following the idea proposed in the early 1980’s in the framework of nuclear safety assessment [KENNEDY1980315], the lognormal parametric model has been widely used in many applications to estimate fragility curves from a reduced number of numerical calculations (see e.g. [ZHANG20091648, SAHA201620, PATIL201692, WangF2017]). Different methods can be used to determine the parameters of the lognormal model (see e.g. [ZENTNER20101614, Baker2015]), however, the question of the validity of this assumption arises. Typically, in [Mai2017] authors show that for a given structure the accuracy of the lognormal curves depends on the ground motion intensity measure, the failure criterion and the employed method for fitting the model. As shown in [doi:10.1002/eqe.2567], the class of structures considered may also have an influence on the adequacy of the lognormal model.
The question of the representativity is inevitable with the use of parametric models since, for the complex cases of interest, it is very difficult to verify their validity. To bypass this problem, the need of a numerically efficient nonparametricbased methodology (which would be accurate with a minimum number of mechanical analyses) is necessary. A way to achieve this goal consists in building a metamodel (i.e. a surrogate model of the mechanical analysis) which expresses the statistical relation between seismic inputs and structural outputs. Various metamodeling strategies have been proposed recently in the literature based on, for example, response surfaces ([PARK20141, SEO2013642]), kriging [doi:10.1002/eqe.2586]
and Artificial Neural Networks (ANNs)
[WANG2018213].The goal of this paper is twofold. First, it is to propose a simple and efficient methodology for estimating nonparametric fragility curves that allows to reduce the cost of mechanical numerical computations by optimizing their selection. Second, it is to adress the question of the best seismic intensity measure indicator that can to be used as abscissa of the fragilty curves and not be limited to the PGA. To this end, the strategy proposed is based on the use of Support Vector Machines (SVMs) coupled with an active learning algorithm. Thus, input excitation is reduced to some relevant parameters and, given these parameters, SVMs are used for a binary classification of the structural responses relative to a limit threshold of exceedance. It is worth noting that their output is not only binary, it is a ”score” that can have a probabilistic interpretation as we will see.
In contrast to classical learning (passive learning), the active learner selects the most useful numerical experiments to be added to the learning data set. The ”learners” choose the best instances from a given large set of unlabeled examples. So, the main question in active learning is how to choose new numerical experiments to be labeled. Various methods proposed in active learning by ANNs are presented in [Hasenjager2002]. Most are based on the learning of several ”learners” ([Seung:1992:QC:130385.130417, Gazut:2008:TOD:2325839.2328058]
). With SVMs, active learning can be done very easily by using only one learner because the distance to the separator hyperplane is a ”natural” criterion for selecting new points to ”label”
[Ton]. A similar technique using logistic output neural networks can be used by analyzing the logit of the output. But in this case, given the nonlinearity of the ANNs, the different learnings of the learner may present a strong variability on the decision boundary.
Finally, as the SVM output is a score, this score can be used as abscissa of the fragility curves. Indeed, a perfect classifier, if it exists, would lead to a fragility curve in the form of a unit step function, i.e. corresponding to a fragility curve ”without uncertainty”. Nevertheless although certainly not perfect, in the linear binary classification this score can particularly be relevant for engineering purpose since it is a simple linear combination of the input parameters.
To illustrate the proposed methodology, inputs parameters are defined from a set of real accelerograms which is enriched by synthetic accelerograms using the procedure defined in [doi:10.1002/eqe.997]
, which is based on a parameterized stochastic model of modulated and filtered whitenoise process. A brief summary of this model is presented in section 2 of this paper. Moreover a simple inelastic oscillator is considered in order to validate the methodology at a large scale within a Monte Carlobased approach that does not require any assumption to estimate probabilities of interest. This physical model is presented in section 3. Section 4 is devoted to the presentation of the different classification methods, and the active learning methodology. Section 5 shows how the proposed methodology makes it possible to estimate fragility curves, using either the score functions or classical intensity measure indicators such as the PGA. Finally, a conclusion is presented in section 6.
2 Model of earthquake ground motion
2.1 Formulation of the model
Following Rezaeian [Rez], a seismic ground motion with is modeled as:
(1) 
where is a deterministic, nonnegative modulating function with a set of parameters
, and the process inside the squared brackets is a filtered whitenoise process of unit variance:
is a whitenoice process, denotes the impulse response function (IRF) of the linear filter with a set of parameters , andis the standard deviation of the process defined by the integral in equation
1.In order to achieve spectral nonstationarity of the ground motion, the parameters of the filter depend on the time of application of the pulse; thus the standard deviation depend on . Still following Rezaeian, we choose for the impulse response function:
(2) 
where is the set of parameters, is the natural frequency (dependent on the time of application of the pulse) and is the (constant) damping ratio. A linear form is chosen for the frequency: . The modulating function is defined piecewisely:
(3) 
The modulation parameters are thus: . The initial delay is used in parameter identification for real ground motions, but it is not used in simulations (we choose ). To summarize, the generated signals are associated with real parameters: . Finally, a highpass filter is used as postprocessing to guarantee zero residuals in the acceleration, velocity and displacement. The corrected signal is the solution of the differential equation:
(4) 
where Hz is the corner frequency. Due to high damping of the oscillator (damping ratio of ), it is clear that , and all vanish shortly after the input process has vanished, thus assuring zero residuals for the simulated ground motion. In the rest of this paper we will use for the corrected signal .
2.2 Parameter identification
The first step to generate artificial signals is to identify the 9 model parameters for every real signal from our given database of acceleration records (selected from the European Strong Motion Database [ESMD] for and , where is the magnitude and the distance from the epicenter). Following Rezaeian [Rez, doi:10.1002/eqe.997], the modulation parameters and the filter parameters are identified independently as follows.
2.2.1 Modulation parameters
For a target recorded accelerogram , we determine the modulation parameters by matching the cumulative energy of the accelerogram with the expected cumulative energy of the stochastic process , which does not depend on the filter parameters (if the highpass postprocessing is neglected):
(5) 
Thanks to the definition of , this expected energy only depends on the modulation parameters . To match the two cumulative energy terms, we minimize the integrated squared difference between them:
(6) 
This minimization is done with the Matlab function fminunc. Note that the PGA of the generated signal is not necessarily equal to that of the recorded one on average.
2.2.2 Filter parameters
For the filter parameters , we use the zerolevel upcrossings, and the positive minima and negative maxima of the simulated signal and target signal . These quantities do not depend on scaling, thus we use only the unmodulated process
(7) 
For a given damping ratio , we can identify the frequencies by matching the cumulative count of zerolevel upcrossings of the target signal with the same expected cumulative count for the simulated signal, given by:
(8) 
where is the mean zerolevel upcrossing rate of the process and is an adjustment factor due to discretization (usually between and ). Since is a Gaussian process with zero mean and unit variance, the mean rate , after simplification, is given by:
(9) 
where is the standard deviation of the time derivative of the process:
(10) 
assuming we neglect integrals over a fraction of a time step in the discretization.
To identify the damping ratio , we use the cumulative count of positive minima and negative maxima. Indeed, in a narrowband process ( close to ), almost all maxima are positive and almost all minima are negative, but the rate increases with increasing bandwidth (larger ). An explicit formulation exists for this rate but it involves computing the second derivative of [adler], thus it is easier to use a simulation approach, by counting and averaging the negative maxima and positive minima in a sample of simulated realizations of the process, then choosing the value that minimizes the difference between real and simulated cumulative counts.
2.3 Simulation of ground motions
So far, we have defined a model of earthquake ground motions, and explained how to identify each of the parameters from a single real signal . The model then allows one to generate any number of artificial signals, thanks to the white noise . However, these signals would all have very similar features; in order to estimate a fragility curve, we need to be able to generate artificial signals over a whole range of magnitudes, with realistic associated probabilities. Thus, we have to add a second level of randomness in the generation process, coming from the parameters themselves. With Rezaian’s method we identified all the model parameters for each of the acceleration records, giving us data points in the parameter space (
in this case). Then, to define the parameters’ distribution, we use a Gaussian kernel density estimation (KDE) with a multivariate bandwidth estimation, following Kristan
[Kri].Let be a multivariate independent and identically distributed sample drawn from some distribution with an unknown density , . The kernel density estimator is:
(11) 
where is a Gaussian kernel centered at with covariance matrix . A classical measure of closeness of the estimator
to the unknown underlying probability density function (pdf) is the
asymptotic mean integrated squared error (AMISE), defined as:(12) 
where is the trace operator and is the Hessian of . If we write with and we suppose that is known, then the AMISE is minimized for:
(13) 
where still depends on the underlying (and unknown) distribution :
(14) 
Following Kristan [Kri], can be approximated by the empirical covariance matrix of the observed samples and can be approximated by
(15) 
where
(16) 
The estimator is now fully defined. The simulation of an artificial ground motion thus requires three steps:

choose an integer
with a uniform distribution;

sample a vector
from a Gaussian distribution with pdf
centered at with covariance matrix , and let ; 
sample a white noise and compute the signal using (1), with parameters .
For this article we generated artificial seismic ground motions using this method.
3 Physical model
3.1 Equations of motion
For the illustrative application of the methodology developed in this paper, a nonlinear single degree of freedom system is considered. Indeed, despite its extreme simplicity, such model may reflect the essential features of the nonlinear responses of some real structures. Moreover, in a probabilistic context requiring Monte Carlo simulations, it makes it possible to have reference results with reasonable numerical cost. Its equation of motion reads:
(17) 
where and are respectively the relative velocity and acceleration of the unit mass of the system submitted to the ith artificial seismic ground motion with null initial conditions in velocity and displacement. In equation 17, is the damping ratio, is the circular frequency and is the nonlinear resisting force. In this study, Hz, , the yield limit is m, and the postyield stiffness, defining kinematic hardening, is equal to of the elastic stiffness. Moreover, we call the relative displacement of the associated linear system, that is assumed to be known in the sequel, whose equation of motion is:
(18) 
and we set:
(19) 
(20) 
3.2 Response spectrum
Using the linear equation 18, we can compare the response spectra of the recorded accelerograms with that of the simulated signals. Figure (a)a shows this comparison for the average spectrum, as well as the , and quantiles. It can be seen that the simulated signals have statistically the same response spectra than the real signals, although at high frequency ( Hz), the strongest simulated signals have higher responses than the real ones. This may be due to the fact that the model conserves energy (equation 5), while the selection of acceleration records from ESMD is based on magnitude. To illustrate this, figures (b)b and (c)c
show the empirical cumulative distribution functions of the PGA and total energy. While there is a good match for the energy, the PGA of the strongest simulated signals is slightly higher than for the real ones.
3.3 Choice of the seismic intensity measures
Let be our database of simulated ground motions, and the associated modulating and filter parameters. For every signal , we also consider:

the peak ground acceleration ;

the maximum velocity (or Peak Ground Velocity) ;

the maximum displacement (or Peak Ground Displacement) ;

the total energy ;

the maximum linear displacement of the structure (equation 20). Conventionnaly, this is spectral acceleration () which is considered as intensity measure indicator. Nevertheless, since here the variable of interest is a nonlinear displacement, it is more suitable to use spectral displacement.
Thus, for each simulated signal we have a vector of 13 real parameters. We want to know whether the maximum total displacement of the structure is greater than a certain threshold, for example twice the elasticity limit .
4 Binary classification
4.1 Preprocessing of the training data
The signals whose maximum linear displacement is less than the elasticity limit are not interesting, since we know they do not reach the threshold:
This discarded of the simulated signals. We also discarded a few signals () whose maximum linear displacement was too high (), since the mechanical model we use is not realistic beyond that level. Therefore, we ended with a subset of our database such that:
We have kept simulated signals out of a total of . On those signals, a BoxCox transform was applied on each component of . This nonlinear step is critical for the accuracy of the classification, especially when we later use linear SVM classifiers. The BoxCox transform is defined by:
(21) 
The parameter
is optimized, for each component, assuming a normal distribution and maximizing the loglikelihood. Figure
2 shows that is heavily modified by this transformation, with an optimal parameter of .Finally, each of the 13 components was standardized, thus forming the training database with .
4.2 Simple classifiers
At the most basic level, a binary classifier is a labeling function
(22) 
that, given a vector corresponding to a seismic signal , gives us an estimated label . In our setting, the true label of instance is if the displacement is greater than the damage threshold , and otherwise:
(23) 
Note that the true label is not in general a function of the vector , since it depends on the full signal when only gives us macroscopic measures of the signal; therefore, a perfect classifier may not exist.
One of the simplest choice for a classifier is to look at only one component of the vector . For example, it is obvious that the PGA is highly correlated with the maximum total displacement ; therefore, we can define the PGA classifier as:
(24) 
where is a given threshold. Moving the threshold up results in less false positives ( when the real label is ) but more false negatives ( when the real label is ); and moving the threshold down results in the opposite. Therefore, there exists a choice of such that the number of false positives and false negatives are equal, as can be seen in figure 3.
Note that this choice does not guarantee that the total number of misclassifications is minimal. Similarly, we can also define a classifier based on the maximum linear displacement , since the linear displacement is also highly correlated with the total displacement. These two simple classifiers give us a baseline to measure the performance of more advanced classifiers.
4.3 SVMs and active learning
4.3.1 Support vector machines
In machine learning, support vector machines (SVMs) are supervised learning models used for classification and regression analysis. In the linear binary classification setting, given a training data set
that are vectors in , and their labels in , the SVM is a hyperplane of that separates the data by a maximal margin. More generally, SVMs allow one to project the original training data set onto a higher dimensional feature space via a Mercer kernel operator . The classifier then associates to each new signal a score given by:(25) 
A new seismic signal represented by the vector has an estimated label of if , otherwise. In a general SVM setting, most of the labeled instances have an associated coefficient equal to ; the few vectors such that are called ”support vectors”, thus the name ”support vector machine”. This historical distinction among labeled instances is less relevant in the case of active learning (see next section), since most of the are nonzero. In the linear case, is just the scalar product in , and the score is:
(26) 
where and depend on the coefficients
. Another commonly used kernel is the radial basis function kernel (or RBF kernel)
, which induces boundaries by placing weighted Gaussians upon key training instances.4.3.2 Active learning
Computing the total displacement of the structure (and thus the label ) is very costly for a complex structure, limiting the size of the training data. Fortunately, it is possible to make accurate classifiers using only a limited number of labeled training instances, using active learning.
In the case of poolbased active learning, we have, in addition to the labeled set , access to a set of unlabeled samples (therefore we have ). We assume that there exists a way to provide us with a label for any sample from this set (in our case, running a full simulation of the physical model using signal ), but the labeling cost is high. After labeling a sample, we simply add it to our training set. In order to improve a classifier it seems intuitive to query labels for samples that cannot be easily classified. Various querying methods are possible [Kre, Ton], but the method we present here only requires to compute the score for all samples in the unlabeled set, then to identify a sample that reaches the minimum of the absolute value , since a score close to means a high uncertainty for this sample. Thus, we start with samples and , labeled and . Recursively, if we know the labels of signals :

we compute the SVM classifier associated with the labeled set ;

for each unlabeled instance , , we compute its score

we query the instance with maximum uncertainty for this classifier:
(27) and compute the corresponding maximum total displacement by running a full simulation of the physical model;

the instance is added to the labeled set.
4.3.3 Choice of the starting points
The active learner needs two starting points, one on each side of the threshold. After the preprocessing step, about of all remaining instances have a displacement greater than the threshold (although this precise value is usually unknown). It can be tempting to choose, for example, the signal with the smallest PGA as and the signal with the biggest PGA as . However, running simulations with these signals is costly and give us a relatively useless information. We prefer to choose the starting points randomly, which also allows us to see how this randomness affects the final performance of the classifier.
The linear displacement and the of a signal are both obviously strongly correlated with the displacement . As a consequence, it is preferable that the starting points respect the order for these two variables:
(28) 
Indeed, if and are such that, for example, but , then the active learner starts by assuming that the PGA and displacement have a negative correlation, and it can take many iterations before it ”flips”; in some rare instances the classifier performs extremely poorly for several hundreds of iterations. Thus, the starting points and are chosen such that equation (28) is automatically true, using quantiles of the PGA and linear displacement. is chosen randomly among the instances whose PGA is smaller than the median PGA and whose linear displacement is smaller than the median linear displacement:
(29) 
where denotes the median of set . It is almost certain that any instance in this set satisfies and thus . Similarly,
is chosen using the 9th decile of both PGA and linear displacement:
(30) 
where denotes the 9th decile of a set . The probability that in this case was found to be . If we get unlucky and then we discard this signal and choose another one.
4.4 ROC curve and precision/recall breakeven point
The SVM classifier gives an estimated label to each signal depending on its score . As for the simple classifiers (section 4.2), we can set a nonzero limit , and define the classifier as:
(31) 
If , then the number of false positives ( and ) is smaller, but the number of false negatives ( and ) is bigger, relative to the case, and the opposite is true if we choose . Taking all possible values for defines the receiver operating characteristic curve, or ROC curve. The area under the ROC curve is a common measure for the quality of a binary classifier. The classifier is perfect if there exists a value of such that all estimated labels are equal to the true labels; in this case the area under the curve is equal to . Figure 4 shows one example of active learning, with ROC curves corresponding to different numbers of labeled signals. As expected, the classifier improves on average when the labeled set gets bigger; and the active learner becomes better than the simple PGA classifier as soon as .
Another metric can be used to measure performance: the precision / recall breakeven point [Kre]. Precision is the percentage of samples a classifier labels as positive that are really positive. Recall is the percentage of positive samples that are labeled as positive by the classifier. By altering the decision threshold on the SVM we can trade precision for recall, until both are equal, therefore defining the precision/recall breakeven point. In this case the number of false positives and false negatives are equal (see figure 3). This value is very easy to obtain from a practical point of view. Let us denote by the number of instances where the displacement is greater than the threshold (on a total of signals in the database):
(32) 
We sort all instances according to their score, i.e. we find a permutation such that:
(33) 
Then the precision/recall breakeven point (PRBP) is equal to the proportion of positive instances among the instances with the highest score:
(34) 
This criteria does not depend on the number of true negatives (unlike the false positive rate, used in the ROC curve). In particular, it is not affected by our choice of preprocessing of the training data, where we discarded all the weak signals (). (both metrics are affected by our choice to discard the very strong signals (), but the effect is negligible in both cases).
4.5 Results
We now compare different classifiers with the precision/recall breakeven point. More precisely, we compare different orderings of all signals, since only the order matters to the PRBP; for instance, the PGA does not give directly a label, but we can compute the PRBP of the PGA classifier with equation 34 using the permutation that sorts the PGA of all signals. We can thus compare:

the simple PGA and maximum linear displacement classifiers and , defined in section 4.2;

neural networks, trained with all instances and all labels (ie, with the signals and labels), with either all 13 parameters, or just 4 of them: (the linear displacement, peak ground acceleration, peak ground velocity and filter frequency, see section 4.7 for justification of this choice);

SVMs given by our active learning methods.
The neural networks we used are fullconnected Multi Layered Perceptrons (MLPs) with 2 layers of 26 and 40 neurons for
, and two layers of 50 and 64 neurons for . In the active learning category, the performance depends on the number of iterations (between 10 and 1000). So, the results shown in figure 5 are functions of the number of labeled training instances, in logarithmic scale. The simple classifiers and the neural networks are represented as horizontal lines, since they do not depend on . Moreover, this figure shows the PRBP of (i) a linear SVM using all 13 parameters (in blue), (ii) a linear SVM using only 4 parameters (in red) and (iii) a radial basis function (RBF) SVM, using the same 4 parameters (in yellow). As active learners depend on the choice of the first two samples, results of figure (a)a are obtained choosing 20 pairs of starting points , then averaging the performance, knowing that the same starting points were used for all three types of SVMs. For completeness, figure (b)b shows the worst and best performances of the 3 classifiers on the 20 test cases.Figure (a)a shows that active learning gives a much better classifier than the standard practice of using a single parameter (usually the PGA). The linear SVM with only 4 variables has initially the best performance on average, up to 150/200 iterations. The full linear SVM with 13 variables is better when the number of iterations is at least 200. The RBF kernel in appears to have the best performance with 1000 labeled instances, outperforming the neural network in ; however, it has a higher variability, as can be seen in figure (b)b. Radial basis function SVMs with 13 parameters seem to always perform very poorly, and are not represented here. Figure (b)b shows the lowest and highest score of all 20 test cases, independently for each number of iterations (one active learner can perform poorly at some point, and much better later, or the other way around). So, in conclusion, (i) active learners need a minimum of 3040 iterations, otherwise they can end up worse than using the simpler PGA classifier, (ii) between 50 and 200 iterations, the linear SVM in is the best choice, and has a relatively small variability and (iii) the RBF kernel seems quite unpredictable for less than 1000 iterations, and its performance depends wildly on the starting points, probably because of overfitting.
4.6 Results for different settings
Our methodology is very general and can be applied to a variety of structures. As an example, we compared the same classifiers on two structures with two different main frequencies, Hz and Hz, instead of the original Hz. The elasticity limit was also changed so that approximately one third of all signals result in inelastic displacement: m for the Hz structure, m for Hz and m for Hz. The failure threshold was always chosen as , which resulted in about of all signals attaining it for the and Hz cases, compared to in the Hz setting. As shown in figure 6, the performances of active learners are very similar to the 5 Hz case, and the same conclusions apply. The performances of classifiers based on a single parameter, on the other hand, can vary a lot depending on the frequence of the structure: the PGA classifier provides a good classifier at high frequency (PRBP at 10 Hz) but performs poorly at low frequency ( at 2.5 Hz, it does not appear in figure 6); while the linear displacement does the opposite (PRBP at Hz, but PRBP at 10 Hz). These results show that the active learning methodology is not just more precise, but also more flexible than the simple classifiers, and that with just 50 to 200 simulations it approaches the performance of a neural network using simulations.
4.7 Remark about the dimension reduction
In the linear case the score is equal to the distance to a hyperplane: (see equation 26). Therefore, we can see which of the components of are the most important for the classification simply by looking at the values of the components of . Figure 7 shows that the values of are roughly the same for all 20 test cases. After 1000 iterations, the coefficients for the PGA and maximum linear displacement end up between and , the value for the maximum velocity is around , and the value for the signal main frequency is around . The other 9 components of (when working with ) are all between and , but are smaller (in absolute value) than these 4 components. Note that comparing the values of the components of is only possible because the components of are standardized in the first place (cf. the preprocessing in section 4.1). As can be seen in the previous sections, reducing the dimension from to allows for a faster convergence, although the converged classifier is usually less precise. Continuing the active learning after 1000 iterations changes only marginally the results; even with , both the PRBP and the values of stay roughly the same between 1000 and 5000 iterations.
5 Fragility curves
SVM classifiers give to each signal a score whose sign expresses the estimated label, but it does not give directly a probability for this signal to be in one class or the other. In order to define a fragility curve, that is, the probability of exceeding the damage threshold as a function of a parameter representing the ground motion, we first need to assign a probability to each signal.
5.1 Scorebased fragility curve
This probability depends only on the score . For a perfect classifier, the probability would be if and if ; for our SVM classifiers we use a logistic function:
(35) 
where and are the slope and intercept parameters of the logistic function ( should be close to if the classifier has no bias, giving a probability of to signals with
). These parameters are computed using a logistic regression on the labeled set
. To compare this estimation with the empirical failure probability of signals with a given score, we divide our database in groupsdepending on their score, with the kmeans algorithm; then we define the estimated and reference probabilities of each group:
(36) 
We can now compute the discrete distance between these two probabilities:
(37) 
with . Figure 8 shows this distance for different classifiers using 20, 50, 100, 200, 500 and 1000 labeled instances. The three classifiers (linear SVM in , linear SVM in , and RBF kernels in ) are compared on 20 test cases, using 20 pairs of starting points (the same for all three). The solid lines show the average errors, and the dashed lines show the minimum and maximum errors among all test cases.
The average error goes down from after 20 iterations to less than after 1000 iterations for the linear SVM in , and from to less than for the linear SVM in . For the SVM using RBF kernels (in yellow in figure 8), the average error does not decrease as the number of iterations increases, and ends up around after 1000 iterations. Figure 10 shows typical examples of fragility curves obtained with each method after 100 and 1000 iterations. Recall that the logistic functions (in red) are not fitted using all the real data (in blue), but only the labeled set, ie 100 or 1000 signals. The linear SVM in has the least errors in terms of probabilities, although its PRBP is smaller than the linear SVM in when using 1000 labeled instances.
The radial basis function kernel shows a very ”strange” behaviour. The probability of failure is not even an increasing function of the score (figures (c)c and (f)f); in particular, signals with a very negative score still have a chance of exceeding the threshold. This strange shape of explains why the error of RBF kernels is so high (figure 8), since we tried to fit a logistic curve on a nonmonotonous function. The reason for this major difference between linear and RBF kernels can be understood if we look at the maximum total displacement as a function of the score , using both kernels (see figure 11). Let us keep in mind that the RBF classifier at 1000 iterations is the most precise of all our active learners; it has the fewest false positives and false negatives of all (see table 1). The sign of the RBF score is thus an excellent predictor for binary classification.
linear kernel  

1020  4711  
27175  812 
RBF kernel  

1009  4722  
27287  700 
Figure 11 shows that for the linear classifier, the score is a good predictor of the maximum total displacement , with a monotonous relation between the two; therefore the probability that a is wellapproximated by a logistic function of the score. The RBF score, on the other hand, is a poor predictor of the probability of failure, since the relation between the score and the maximum total displacement is not monotonous. We can now understand the very high errors for RBF kernels. Looking at figure 11, we can see that the weakest signals (, just above the elasticity limit) have a RBF score between and . Since these weak signals are very common in our database, the reference probability goes rapidly from for to almost for (see figures (c)c and (f)f), not because the number of positive signals changes significantly between and , but because the number of negative signals is more than 20 times bigger. The linear kernels do not have this problem, and therefore have much lower errors.
ROC curves (figure 12) give us another way to look at this dilemma between linear and RBF kernels. If we look at the unbiased (i.e. ) classifiers, the RBF is clearly superior: it has fewer false positives and slightly fewer false negatives than the linear classifier. However, when we choose a negative limit (see equation 31), for example , then some of the weakest signals end up over the limit () and thus have an estimated label of . Since these weak signals are so common, the false positive rate becomes extremely high.
5.2 PGAbased (resp. Lbased) fragility curve
In the previous section we always used the score as the parameter on the xaxis to build the fragility curves. However, our method assigns a probability to each signal, depending only on a few parameters. If we consider this probability as a function of 4 parameters ( if ), then we can use any of those parameters, the PGA for example, to define a posteriori a fragility curve depending on just this parameter, averaging over the other ones:
(38) 
Figures 13 and 14 show two examples of such curves, using the PGA or the maximum linear displacement . We used a linear SVM classifier in with 100 and 1000 iterations and computed the probabilities as before, but then divided the database in groups (with kmeans) depending on their PGA (resp. on ), instead of the score, before computing the reference probabilities and estimated probabilities for each group . In this case, we can show all 20 test cases in a single figure, since they share a common xaxis (which is not true when we used the score).
We now have a fully nonparametric fragility curve. The distance between the reference and estimated curve is very small in both cases, even using just 100 labeled instances, although the spread is smaller when we add more data points.
5.3 Trading precision for steepness
The PGAbased and Lbased fragility curves (figures 13 and 14) are very close to the reference curves; the distance between reference and estimated curves is very small, even smaller than in the case of scorebased fragility curves. In this case, why even bother with scorebased fragility curves ? What is their benefit, compared to easilyunderstandable, commonly accepted PGAbased curves ?
The difference is in the steepness of the curve. Formally, when we construct a fragility curve, we choose a projection to use as the xaxis. This projection can be one of the 4 variables (for example the PGA), or the score , which can be a linear or nonlinear (in the case of RBF kernel) combinaison of the 4 variables. We then use the kmeans algorithm to make groups of signals who are ”close” according to this projection, i.e. signals with the same PGA or the same score; then we compute the estimated probability for each group. Let us assume for a while that our estimation is very precise, so that . In this case, which fragility curve gives us the most information ? To see this, we define:
(39) 
for some nonnegativevalued function . Intuitively, a perfect classifier would give each signal a probability of or , while a classifier which assigns a probability of to many signals is not very useful. Therefore, we want to be positive on , equal to for and . If we choose:
(40) 
then can be seen as the entropy of the probability , which would be equal to for a perfect classifier and has higher values for a ”useless” classifier. Another choice would be:
(41) 
In this case, also has a clear physical meaning: it is the proportion of ”uncertain” signals, i.e. signals such that . Table 2 shows the value of , using the entropy version, for different choices of projection (score, PGA, or linear displacement). We can see on this table that the PGA and Lbased fragility curves are extremely precise, with very low values of (this can also be seen in figures 13 and 14), but their entropy is much higher than the scorebased fragility curves.
projection  score  PGA  L  

n=100  ()  
entropy  
n=1000  ()  
entropy 
One surprising fact of table 2 is that the entropy is smaller at compared to in all three cases. This shows that after only mechanical calculations, all our classifiers tend to slightly overestimate the steepness, and give fragility curves that are actually steeper than the reality (and also steeper than the more realistic curves). This was also seen in figures (b)b and (e)e: at iterations the estimated curve is steeper than the reference curve, which gives an estimated entropy smaller than the reference entropy. Using the other choice of gives the same conclusions: the proportion of signals with is if we use the score, but it is around for both the PGA and maximum linear displacement. Therefore, the choice of the projection used for a fragility curve is a trade between precision and steepness. Keep in mind that the values of the entropy for different choices of projection can be obtained after the active learning, and the computationnal cost is very small (mostly the cost of kmeans). As a consequence, this choice can be made a posteriori, from the probabilities assigned to each signal.
5.4 Additionnal remarks
5.4.1 About the specificity of active learning
Using the score to compute the probabilities (equation 35) on the whole dataset is absolutely mandatory, even if one is only interested in the PGA fragility curves. In particular, looking only at the labeled set to find directly a probability of failure depending on the PGA gives extremely wrong results. Figure 15 shows not only the reference and estimated fragility curves previously defined, but also the points of the labeled set and an empirical probability built from it.
This empirical fragility curve was computed by using the kmeans algorithm on the PGA values of the labeled set , then taking the empirical failure probability in each group. The result looks like… a constant around (with high variability). Looking only at , the PGA does not even look correlated with failure. The reason for this surprising (and completely wrong) result is the active learning algorithm. is not at all a random subset of ; it is a carefully chosen set of the data points with maximal uncertainty, which means that most of them end up very close to the final hyperplane (and have a final score very close to ), but also that they span as much of this hyperplane as possible. When these points are projected on any axis not perpendicular to the hyperplane (that is, any axis but the score), for example the PGA (which is one of the components of ), the empirical probability is roughly equal to the constant , which is not representative at all of the underlying probability. Using equation 35 (then eventually fitting the results to a lognormal curve for the PGA) is the right way to go.
5.4.2 About a combination of the linear and RBF kernels
The RBF kernel was very promising in terms of classification, as we saw on the first results (figure (a)a); however, we saw in the following sections that using it to make a fragility curve can lead to catastrophic results (figures (c)c and (f)f). Could we combine the two kernels in a way that let us keep the benefits of both ? One simple way to do that is to use the following procedure:

use the active learning with RBF kernel to select signals to be labeled;

from these 1000 data points, train two classifiers, one with a linear kernel, the other with the RBF kernel;

assign two scores and to each nonlabeled signal, using the two classifiers;

fit the labeled points to each set of scores, giving you two probabilities and for every signal;

the ”final” probability is chosen as:
(42)
Since the active learning used RBF kernels and we use the RBF score for any ”uncertain” signal, the PRBP has about the same value than in the ”pure RBF” version, around 0.85. However, the is at , slightly higher than for the ”pure linear” version (1.8%), but much better than the catastrophic ”pure RBF” version (20% !). Although this procedure may seem to have the best of both worlds, in a practical application the PRBP may not be very interesting if the goal is to make a fragility curve; in this case only the precision and steepness of the curve are important, and a linear kernel performs better than a RBF kernel.
6 Conclusion
This paper proposed an efficient methodology for estimating nonparametric seismic fragility curves by active learning with a Support Vector Machines classifier. We have introduced and studied this methodology when aleatory uncertainties have a predominant contribution in the variability of structural response, that is to say when the contribution of uncertainties regarding seismic excitation is ”much larger” than the contribution of uncertainties regarding structural capacity. In this work, structure was considered as deterministic. In this framework, a perfect classifier, if it exists, would lead to a fragility curve in the form of a unit step function, i.e. corresponding to a fragility curve ”without uncertainty”. That means the output of this classifier, which is a score, would be the best seismic intensity measure indicator to evaluate the damaging potential of the seismic signals, knowing that such a classifier would necessary be both structure and failure criteriondependent, with possibly a dependence on the ground motion characteristics (nearfault type like, broadband, etc).
The proposed methodology makes it possible to build such a (nonperfect) classifier. It consists in (i) reducing input excitation to some relevant parameters and, given these parameters, (ii) using a SVM for a binary classification of the structural responses relative to a limit threshold of exceedance. Selection of the mechanical numerical calculations by active learning dramatically reduces the computational cost of construction of the classifier. The output of the classifier, the score, is the desired intensity measure indicator which is then interpreted in a probabilistic way to estimate fragility curves as score functions or as functions of classical seismic intensity measures.
This work shows that a simple but crucial preprocessing of the data (i.e. BoxCox transformation of the input parameters) makes it possible to use a simple linear SVM to obtain a very precise classifier after just one hundred iterations, that is to say with one hundred mechanical calculations. Moreover, for the class of structures considered, with only four classical seismic parameters (, , , ), the scorebased fragility curve is very close to the reference curve (obtained with a massive Monte Carlobased approach) and steeper than the PGAbased one, as expected. Lbased fragility curves appear to perform about as well as PGAbased ones in our setting. Advanced SVMs using RBF kernel result in less classification errors when using one thousand mechanical calculations, but does not appear well suited to making fragility curves.
A naive way to take into account epistemic uncertainties would consist in building a classifier for each set of structural parameters. Nevertheless, such a method would not be numerically efficient. Another way could be to assume that epistemic uncertainties have small influence on the classifier evaluated, for example, for the median capacity of the structure. Thus, only calculations of linear displacements would be necessary to estimate the corresponding fragility curve. However, to avoid such assumptions, some research efforts have to be devoted to propose an efficient overall methodology that takes into account the two types of uncertainties.
Comments
There are no comments yet.