The idea of “letting the tail speak for itself” by ignoring the bulk of the data is effective in forcing fitting procedures focus on the extremes, where primary interest lies, rather than being overwhelmed by data from the bulk of the distribution, which is by definition much more plentiful. However, the price to be paid in employing this strategy is that it forces a somewhat arbitrary decision about what counts as extreme, and more importantly, it disallows the possibility that events that are large but deemed insufficiently extreme can enter into the analysis at all. This leads to the question of how to model a major portion of the spatiotemporal data without thresholding at a high value where our main aim is in spatial prediction of high marginal quantiles.
Gaussian processes (GPs) are by far the most common model in spatial statistics due to their good theoretical properties, tractability in high-dimensions and the ease of computation even for hierarchical models (Gelfand and Schliep, 2016). However, GPs are criticized for modeling spatial extremes as the extremal dependence between any two spatial locations is zero (Davison et al., 2013). While a single transformation is usually considered to approximate normality (Schmidt et al., 2017) along with modeling the transformed data using a GP, the extremal dependence between two spatial locations is invariant of the data transformation and hence still zero. Gelfand et al. (2005) propose a flexible nonparametric and nonstationary model based on Dirichlet process mixture (DPM) of GPs which relaxes the Gaussian assumption; however the spatial extremal dependence remains zero. Hence, even if the model is appropriate for modeling the bulk of the spatial data, it is questionable for modeling the extremes.
Literature on spatial modeling of extremes cover a number of approaches like Bayesian hierarchical models (Sang and Gelfand, 2009, 2010; Turkman et al., 2010), copula-based approaches (Ribatet and Sedki, 2013; Fuentes et al., 2013; Genest and Nešlehová, 2012) and max-stable processes (Reich and Shaby, 2012; Mathieu, 2013; Davison and Huser, 2015). Davison et al. (2012)
compare these approaches and conclude that the copula and max-stable process approaches fit the joint distribution better and hence preferred for modeling spatial extremes. In spite of good theoretical properties of max-stable processes in explaining spatial extremes, drawing inference is challenging because the full joint density of a multivariate generalized extreme value (GEV) distribution can be calculated only when the dimension is small, leading to less efficient techniques like composite likelihoods(Padoan et al., 2010); recently Thibaud et al. (2016) proposed a hierarchical Bayesian model approach for approximating the full joint distribution though the approach is computationally intensive. Limiting Poisson (Wadsworth and Tawn, 2013) and the generalized Pareto (Ferreira et al., 2014) approaches are theoretically justified models for threshold exceedances and are computationally tractable. A nonparametric copula-based model is proposed by Fuentes et al. (2013) where the spatial dependence between the extreme observations is modeled by DPM of GPs but the marginals are modeled as GEV distributions. Krupskii et al. (2018) propose factor copula models for replicated spatial data which can model tail dependence and tail asymmetry though the model is parametric. Morris et al. (2017) propose a Bayesian spatiotemporal model for threshold exceedances based on spatial skew- processes (STPs).
Mixture models have been used extensively in univariate extreme value analysis. Models that splice a generalized Pareto tail to a different “bulk distribtion” were proposed by Behrens et al. (2004) and extended by Carreau and Bengio (2009), MacDonald et al. (2011), Hu and Scarrott (2018), and others. do Nascimento et al. (2012)
spliced a generalized Pareto tail to a finite mixture of gamma distributions, whileFúquene Patiño (2015) extends this model to a Dirichlet process mixture of gammas. In the time series context, Shaby et al. (2016) use a dependent mixture of a normal bulk and generalized Pareto tail to model heat waves. Though the choices of high thresholds are arbitrary, for example, 0.98-th data quantile in Shaby et al. (2016), and the ideas are not readily applicable in spatial extremes and computational burden is an issue.
Except the geostatistics approach of Gelfand et al. (2005), other spatial extremes approaches use high thresholds or block maximums which lead to the situation where moderately extreme observations are discarded or censored from the analysis. In the univariate literature using mixture model, even if the full data is used, an arbitrary threshold is chosen and the observations below and above the threshold are modeled differently.
We use a hierarchical mixture of skew-
processes to model spatial dependence, with particular emphasis on modeling the joint tail. This approach differs from the current prevailing practice in extreme event analysis in that we model the entire spatial process, whereas the current state of the art either throws away or censors observations that are not considered extreme. Our mixture approach seeks the advantages of letting the tail speak for itself without sacrificing the ability to consider non-extreme events. It simultaneously probabilistically clusters events and estimates cluster dependence parameters, so that events that land in the extreme cluster with high probability influence the fit in the tail, while events that land in other clusters with high probability do not. Taken together, the mixture components can both flexibly model the bulk of the distribution and fit the tail dependence in an un-contaminated way. Furthermore, each mixture component is a skew-process, which itself considerably more flexible than standard tools like Gaussian processes that allows spatial extremal dependence.
Our data example illustrates a situation where elevated but non-extreme events play an important role, even if the primary interest lies in the most extreme events. We analyze a fire threat index called Fosberg Fire Weather Index (FFWI) in a fire-prone region in Southern California. Here, risk management requires knowledge of the probable spatial extent of the most extreme fire weather conditions, so estimating the joint tail characteristics is key. However, elevated but not necessarily extreme fire weather still poses substantial risk, so it cannot be ignored. Finally, we compare the performance of several models in predicting high quantiles of FFWI using a leave-one-out cross-validation and use our model to make spatial maps of FFWI extremes.
In this section, first we develop a nonparametric Bayesian spatial model using a Dirichlet process mixture (DPM) of spatial skew- processes (STPs) assuming temporal independence. In Section 2.4, we propose an extension to accommodate the temporal extremal dependence.
2.1 Spatial skew- process
Let be the (potentially transformed) observation at the monitoring site located at and time (we discuss the transformation in Subsection 2.2). Our Bayesian nonparametric (BNP) model for is based on the parametric STPs described in this subsection. STPs (Padoan, 2011) are richer models than GPs. Unlike GPs, STPs permit heavy tails and asymmetry in the marginal distribution and asymptotic spatial dependence for extremes. Borrowing ideas from additive processes (Azzalini and Capitanio, 2003, 2014), Morris et al. (2017) create a STP as a location-scale mixture of a GP.
Let be a STP defined over the spatial domain of interest . At a spatial location , we can write
where is the spatially-varying mean process, and . Assuming
follow a standard normal distribution, marginally over the randomand , follows a skew- distribution with location , scale , skewness (see Supplementary material (SM) Appendix A). To account for spatial dependence, the error process is assumed to follow a standard GP with correlation . We assume the isotropic Matérn correlation function
where is the Euclidean distance between and , , and are the range, smoothness and the ratio of spatial to total variation respectively. In (2.2), is the modified Bessel function of degree and if and 0 otherwise.
For a set of locations , let
denote the vector of observations and. After marginalization over and , [matching the notations of Azzalini and Capitanio (2014)], the joint distribution of is
where is the -dimensional matrix obtained by evaluating at ; denote the set of parameters by . We denote the skew- density function as ; more details about the joint density of the STPs are provided in SM Appendix A.
2.2 Dirichlet process mixture (DPM) model
When a spatial process is observed at only one time point, a parametric model like a STP is required. However, with temporal replications, the spatial process can be estimated semiparametrically or nonparametrically. We temporarily assume thatare iid -dimensional realizations from a DPM of STPs (henceforth, STP-DPM) with the density described by
where is the number of mixture components, are the mixture probabilities with , denotes the set of parameters of the -th component and denotes the density function of a -dimensional realization from the skew- process as described in Subsection 2.1. The density function denotes the density function of a DPM of multivariate skew- densities (henceforth, MST-DPM). For a fully nonparametric model, .
An equivalent representation of (2.4) is the clustering model described below. Let denote the cluster label of the replication at time with , and . Therefore, the STP-DPM model clusters similar replications (e.g. days) and models the clusters using STPs. The motivation for this clustering is to separate the bulk from the extremes, and by allowing the clusters to have different ’s, we hope to prevent data from the bulk to influence parameter estimates for the tails. In our Bayesian model, the cluster labels are treated as unknown parameters and thus we account for uncertainty in cluster allocation.
Given the cluster label , the conditional distribution of is
where is the spatial correlation defined by in (2.2) with parameters and . The set of parameters for time is with for .
The mixture probabilities, ’s, are sequentially constructed following the stick-breaking representation proposed by Sethuraman (1994) so that they sum to one and hence the ’s “break the stick” of unit length. The first mixture probability is modeled as , where . Subsequently, the -th mixture probability is constructed as where is the probability not considered by the first components and . In case of finite , we set so that which ensures that . The case of corresponds to the Dirichlet process prior (Ferguson, 1973, 1974).
Further, we put hyperpriors on the cluster-specific parameters’s which are the atoms of the stick-breaking process. For , the hyperpriors relate to the base measure of the corresponding Dirichlet process. We assume and the components of ’s are independently distributed. Choices of the parameters of the hyperpriors are discussed in Section 3.
The STP-DPM model has support . Extreme value analysis often deals with bounded distributions. We therefore include a transformation step to allow a more flexible model for the bounds of the distribution. Let be the FFWI at the monitoring site located at and time . We assume that the support of to be same as the support of a generalized extreme value (GEV) distribution with location, scale and shape parameters and respectively; thus, the support of is if , if and if . We relate the observed and transformed data using monotonically-increasing GEV-log transformation . If , the transformation is . The transformed variables are then modeled flexibly using a nonparametric approach; thus, the marginal distributions of are not GEV. Advantages of power transformations, e.g., Box-Cox transformation, have been discussed by Wadsworth et al. (2010) mainly in improving the convergence rate of Fisher-Tippett-Gnedenko theorem. We emphasize that the GEV parameters and are treated as unknown in our fully Bayesian analysis. More details regarding the transformation are provided in SM Appendix B.
2.3 Model properties
From the infinite mixture model representation () in (2.4), it is evident that for any spatial locations (), the class of -dimensional joint densities is a superset of the class of priors of Gelfand et al. (2005) which is a special case by setting and for each . Thus, MST-DPM prior spans the entire set of joint densities for any set of spatial locations (Gelfand et al., 2005; Reich and Fuentes, 2015). Under suitable regularity conditions, the posterior consistency of the DPM with multivariate Gaussian kernels is proven by Wu and Ghosal (2010). The posterior consistency of the DPM with multivariate skew- kernels holds from the fact that the skew-
distribution can be obtained from the normal distribution by marginalizing the random location and scale and hence, the Kullback-Leibler divergence between the true density and the estimate based on our model is smaller than its value for the normal distribution follows from Lemma B.11 ofGhosal and van der Vaart (2017).
For the model (2.2), given the cluster parameters , the conditional mean and covariances of (assuming for each with ) are
The mean and covariance are both dependent on and and cannot be reduced to a function of and hence the model has both nonstationary mean and covariance structure.
By setting , and for each with , the model (2.2) is a spatial Dirichlet process where has discrete support ’s with and given the cluster parameters and the mixture probabilities . The mean and covariance functions span the mean and covariance function of any square-integrable stochastic process. The proof is provided in SM Appendix C.
The extremal dependence between two random variablesand is often quantified using the -measure (Sibuya, 1960) given by
where and are marginal distribution functions of and respectively. A value of near 1 indicates strong asymptotic dependence while defines asymptotic independence. For a spatiotemporal process, the extremal dependence between two spatial locations and is defined by or for , the Euclidean distance between the locations, when the spatial extremal dependence is isotropic (assuming both the forms to be time-invariant). Similarly, for a spatial location , the temporal extremal dependence between two time points and , is defined by and further, assuming the -measure being stationary in time and spatially-invariant, we denote it by where denotes the temporal lag.
Conditioning on the cluster parameters and the mixture probabilities , the spatial extremal dependence for the proposed STP-DPM model is given by the following theorem. The proof is provided in the SM Appendix D.
The extremal dependence measure for the STP-DPM model in Subsection 2.2 is given by
with , , is the survival function for a Student’s distribution with degrees of freedom, and .
The -measure is dependent on and only through ; thus, even if the model specification is very flexible with nonstationary mean and covariance structures (Lemma 1), the extremal dependence is isotropic. This characteristic may be appealing in many applications because data in the tail are sparse and thus simple models are needed to provide stability. To allow for nonstationary extremal dependence, we can relax the isotropic covariance structure and assume a nonstationary covariance structure for the . Simpler alternatives like anisotropic Matérn covariance structure which has only two additional parameters (discussed later in Section 5) can be considered as well.
The -measure depends only on the cluster with the smallest degrees of freedom which is the component with the heaviest tail. Thus, the extreme observations are likely to be clustered into one component with the heaviest tail and as we allow different parameters for each cluster, the data appearing from other clusters with lighter tails do not influence the parameters of the cluster with the thickest tail. Thus, STP-DPM model allows a probabilistic partitioning of the tail from the bulk and prevents the bulk from influencing on the inference about the extremes.
Similar to the upper-tail extremal dependence in (Appendix E: Sub-asymptotic conditional exceedance probabilities), the lower-tail case is . For the STP-DPM model, has a similar form as in (2.7) except replaced by . Thus, in both the tails the -measure depends only on the cluster with the smallest degrees of freedom and therefore it may be that extremely small values influence the estimates of the parameters in the upper tail. To bypass this issue, we censor the observations in the left tail (below 0.1-th quantile) to remove the effect of the lower-tail data on the upper-tail extremal dependence parameters.
Some corollaries of Theorem 2.1 for the extremal dependence in case of sub-models, e.g., GP (), TP (), STP (), DPM of GPs (Gelfand et al. (2005), GP-DPM) and DPM of TPs (TP-DPM) are as follows. Since the -measure is isotropic for all processes, we denote simply by .
GP, GP-DPM: .
DPM of TPs: with .
In Figure 1 (left), we plot versus for several parameter choices (for convenience, we drop the subscript from , and ). The extremal dependence decreases with increasing (for , the extremal dependence remains zero for any , the case of a skew-normal process). For any finite , is positive, even if is zero. The lines corresponding to and -0.25 show that a positive skewness increases . While we discuss the asymptotic conditional exceedance probabilities as -measure, the sub-asymptotic exceedance probabilities (without considering the limit) for a skew- process with different parameter choices are discussed in SM Appendix E.
2.4 Extension to the spatiotemporal Dirichlet process
As we have discussed in Section Appendix J: Fosberg Fire Weather Index, the FFWI demonstrate temporal extremal dependence. The literature on dependent DPM approaches is dominated by the autoregressive DPM approach (Beal et al., 2002; Fox et al., 2011; Storlie et al., 2014)
which consider a Markov model of the cluster indexes’s in (2.2). While this approach adds temporal autocorrelation, this does not capture extremal temporal dependence as given by the following theorem. The proof is provided in SM Appendix F.
Consider the STP-DPM model in (2.2). The temporal extremal dependence at any spatial location is zero if the temporal dependence in the spatiotemporal process is constructed only through the temporal dependence of the cluster indexes , i.e., given and , and are independent. The only exception is the case of which leads to exact dependence.
Following Morris et al. (2017), we consider an AR(1) structure for the and . To ensure the spatial process is DPM of STPs, the inverse gamma distribution of with parameters and and the half-normal distribution of needs to be preserved and it is done as follows. Suppose and denote the CDFs of half-normal and inverse gamma distributions respectively. Thus, for each , and and so and . We specify an AR(1) structure as follows.
This specification ensures the process is stationary across time.
It is challenging to derive an analytical expression for the temporal extremal dependence. Based on a simulated data, we demonstrate the presence of the dependence. We consider a model with mixture components and mixing probabilities . The vector of skew- parameters for the two components to be and respectively ( denotes the location parameter). We generate lag- observations for from our model setting , for . The lag- -measure is estimated using the F-madogram. The right panel of Figure 1 suggests that the extremal dependence increases as increases to 1 and the extremal dependence decreases as increases.
We use Markov chain Monte Carlo (MCMC) methods for model fitting and prediction. We consider conjugate priors for the parameters whenever possible which helps in updating the parameters using Gibbs sampling. For the GEV parameters, we assume, and . For the purpose of computation, we fix the number of components in the stick-breaking model at by setting . For the parameters of the base measure , we assume
, , ,
, truncated above at 20, .
Here denotes the correlation matrix obtained from in (2.2) with Matérn parameters and
. For hyperparameters of the base measure, we consider, , truncated above at 20 and . The DP concentration parameter . We tune the hyperparameters differently to allow better mixing within MCMC and the choices are discussed in Section 5. More details regarding the prior choices are provided in SM Appendix G.
The MCMC steps are a combination of Gibbs sampling and Metropolis-Hastings algorithm. The steps of model fitting and prediction are provided in SM Appendix H.
4 Simulation Studies
In this section, we perform a simulation study to assess the performance of our proposed model in spatial prediction of marginal quantiles and in estimation of extremal dependence. We compare the performances of GPs, TPs, STPs, GP-DPM and TP-DPM with the proposed STP-DPM model. STPs are considered to be of the form (2.1) while TPs and GPs are sub-models of (2.1) by setting for TPs and and both for GPs. Considering max-stable processes (MSPs) as alternatives, the MSP of Reich and Shaby (2012) is compared with STPs in Figure 3 of Morris et al. (2017) and it appears that when the data are generated from a MSP, MSPs perform only slightly better than STPs, while in case of data generated from STPs, MSPs perform quite poorly. Considering these results and the computational burden associated with MSPs, we do not include MSPs in the simulation study.
4.1 Simulation design
We generate 100 datasets from each of the 6 designs: (1) GP, (2) TP, (3) STP, (4) GP-DPM, (5) TP-DPM and (6) STP-DPM. In each case, data are generated at sites and time points. The sites are generated uniformly on the unit square and . As our main aim is spatial prediction of time-invariant quantiles, we generate independent replications of the spatial process. For Design (6), we consider a three-component mixture of STPs as in (2.3) with the parameters in Table 1. For simulation from Design (5), we consider same parameters as in Table 1 except that we set the ’s to zero. Additionally we set
’s to infinity in case of Design (4), i.e., the components have fixed variance equal to’s. For Design (3), we consider the parameters of the third component in Table 1 except that we set the Matérn parameters , and . Design (2) has same parameters as in Design (3) except and additionally for Design (1). For each model, we transform the simulated data to using inverse GEV-log transformation with , and .
We use the priors given in Section 3
. We run each MCMC chain for 20,000 iterations, discard first 10,000 iterations as burn-in and out of the post-burn-in samples, we perform thinning by keeping one in each five samples. Here we are not interested in estimating specific parameters which is complicated due to label switching throughout the MCMC. Rather, we are interested in quantiles of the posterior predictive distribution and the extremal dependence. We monitor the convergence for a set of quantiles and the MCMC converges well for these quantities.
We fit the models to 50 sites and predict the true quantiles for 10 additional test sites. For a test site , suppose the true marginal distribution is given by and denote as the posterior predictive distribution function at . For , models are judged based on the difference ; a model with close to zero is preferred for that (correspondingly return level). A positive (negative) value of indicates overestimation (underestimation) of the true quantile. For each of the 100 datasets generated from six designs, we fit all the models and plot versus for , averaged across the test sites and the datasets in Figure 2. While our main interest lies in the inference about the tails, the proposed method can model the full support very flexibly and hence we consider low-through-high values of . While indicates the bias in prediction, the root mean squared error (RMSE) of prediction for the 0.95-th quantile is provided in Table 2.
For evaluating the performance in estimating the extremal dependence, we compare the true -measure versus the estimates based on the six models, GP through STP-DPM. For models GP and GP-DPM, estimated extremal dependence is always zero. The plots of the estimated based on models TP, STP, TP-DPM and STP-DPM along with the true are provided in Figure 6 of SM Appendix I.
First, we compare the models based on estimating the CDF (Figure 2 and Table 2). When the data are generated from Design (1), all models perform well (with highest is approximately 0.0075) while GP and GP-DPM perform better than other models. For Design (2), GP and GP-DPM perform worse than other models. In case of Design (3), STP and STP-DPM perform well while all other models lead to poor prediction performances. In all three cases, the STP-DPM model has only slightly higher MSE than the models with smallest RMSE values. When the data are generated from the mixture models, the DPM models perform better than the parametric models both in terms of prediction bias and prediction RMSE. In case of Design (4), GP-DPM model performs better at the tails while TP-DPM and STP-DPM performs slightly better for the bulk. For Design (5), TP-DPM and STP-DPM perform better than GP-DPM and for Design (6), STP-DPM perform the best followed by GP-DPM. Similar ordering is observed for the prediction RMSE of the 0.98-th and 0.99-th quantiles as well. Overall, STP-DPM performs equally well or better than other models, both in terms of prediction bias and prediction RMSE as well as estimation of irrespective of the data generating model.
|1||0.45 (0.02)||0.52 (0.02)||0.54 (0.03)||0.43 (0.02)||0.50 (0.02)||0.56 (0.03)|
|2||0.83 (0.05)||0.58 (0.03)||0.67 (0.04)||0.90 (0.05)||0.60 (0.03)||0.74 (0.04)|
|3||1.75 (0.18)||1.76 (0.09)||1.56 (0.11)||1.61 (0.10)||1.76 (0.09)||1.47 (0.09)|
|4||1.96 (0.03)||2.94 (0.05)||4.29 (0.12)||0.44 (0.02)||0.49 (0.02)||0.49 (0.02)|
|5||1.75 (0.04)||3.64 (0.11)||11.66 (0.47)||0.66 (0.03)||0.63 (0.02)||0.63 (0.03)|
|6||2.61 (0.11)||4.93 (0.24)||20.52 (0.86)||1.40 (0.08)||1.53 (0.08)||1.20 (0.08)|
Comparison of models GP, TP, STP, GP-DPM, TP-DPM and STP-DPM based on the prediction RMSE of the 0.95-th quantile when the data are generated from the Simulation Designs 1-6. The standard errors are in the parentheses.
5 Data application
Southern California is susceptible to catastrophic wildfires which are often caused by Santa Ana winds. Santa Ana winds occur mainly in the counties Ventura, Los Angeles, Orange, Santa Bernardino, Riverside and San Diego. During the late fall and winter, the Santa Ana winds originate from the Great Basin and heat up as they cross the mountains, move towards the coast due to offshore surface pressure gradients and often lead to wildfires (Raphael, 2003). This phenomenon is most common in December (Hughes and Hall, 2010). For example, the Thomas fire in December, 2017 is associated with the Santa Ana winds and is considered to be the largest wildfire in the modern history of California (https://inciweb.nwcg.gov/incident/5670). An understanding of the future risk of wildfires is required for improved disaster management. As the whole Santa Ana region is a small geographic domain, the fire risk is likely to be high throughout the region on a particular day of extreme weather. Therefore, models for analyzing fire risk in this region should be capable of exhibiting extremal spatial dependence.
Fosberg Fire Weather Index (FFWI) is a well-established measure that quantifies the potential influence of important weather parameters on fire risk (Fosberg, 1978). It is a nonlinear function of air temperature, wind speed and relative humidity (the functional form of the filter is provided in the SM Appendix J). The National Oceanic and Atmospheric Administration (NOAA) considers FFWI larger than 50 to be significant on a national scale. The Storm Prediction Center (SPC) fire weather verification scheme (http://www.spc.noaa.gov) uses FFWI for fire danger rating ranging between high to extremes. Disaster management policies often consider moderate-to-extreme quantiles of weather parameters and hence require modeling of the bulk as well as the tail (Dey and Yan, 2016) of the FFWI observations using a proper spatiotemporal model.
5.1 Fosberg Fire Weather Index data
Our data consists of hourly FFWI observations from 61 Remotely Allocated Monitoring Stations (RAWS) across the Santa Ana region from December 31, 2003 through December 31, 2013. More details about RAWS are available at https://www.wrh.noaa.gov/sto/obsmap.php. Based on the definition of fire-zones of the Santa Ana region by Rolinski et al. (2016), the whole region is divided into three zones with Zone 1 comprises parts of Ventura and Los Angeles Counties, Zone 2 covers the parts of Orange, San Bernardino and Riverside Counties while Zone 3 corresponds to most of the regions within the San Diego County. While the index is not truncated at 100 in general, e.g., Roads et al. (1991), Sapsis et al. (2016), some authors consider truncation at 100, considering it as a threshold for the extreme fire situation (Kambezidis and Kalliampakos, 2016). In this paper, we use the raw index without truncation. As a first step for data preprocessing, we consider daily maximum of the hourly observations. This step follows the protocols of National Fire Danger Rating System (NFDRS) where daily maximums are considered to be the representative of the combined typical active burning period conditions. Second, we consider only the observations in December as the probability of large katabatic forcing due to strong temperature gradient, the source of the Santa Ana winds, is highest in December (Hughes and Hall, 2010). Third, as mentioned by Sapsis et al. (2016), even if colder conditions support fire growth in some areas of the United States, it is not true in California and so following Sapsis et al. (2016), we discard the data points where the recorded air temperature is less than 50F for model fitting and prediction.
The plot of the observed FFWI values at 61 RAWS corresponding to the day of Shekell Fire (December 3, 2006) in Ventura county is provided in the left panel of Figure 3. Large values of FFWI (values of 50 or more) across the stations indicate the potential influence of the weather variables on fire risk. In the right panel of Figure 3, the time series plot of FFWI (only December) at a representative station, Big Pines, is provided. The time series appears to be fairly stationary across the years. Similar stationary pattern is observed for most of the stations.
To motivate the need for a model with spatial and temporal extremal dependence, we compute empirical estimates of the extremal dependence in space and time. The -measure can be estimated empirically using F-madogram (Cooley et al., 2006) where we estimate the F-madogram based on replications of and and the corresponding empirical distribution functions and the relation . For calculating the spatial extremal dependence, first we decorrelate the time series at each station by fitting a AR(1) model and treating the residuals as independent replications of the spatial process, we calculate the -measure for each pair of stations. Further, we obtain smooth versus curve by fitting a nonparametric regression model with the pairwise -measures as responses and the distances between the stations as predictors. For each station , we estimate the temporal extremal dependence and further, averaging over , we estimate .
The spatial extremal dependence and spatially averaged temporal extremal dependence obtained as described above are provided in Figure 7 considering the whole Santa Ana region as well as corresponding to different fire-zones defined by Rolinski et al. (2016). From Figure 7, it appears that varies between 0.3 and 0.7 while varies between 0.1 and 0.4 and varies more than across the zones. Due to the nonzero spatial and temporal extremal dependence, inferences based on models having no asymptotic dependence like GPs are questionable.
SM Appendix K contains additional exploratory analysis, including plots that show different spatial trends for sample quantiles at different quantile levels and a test based on mixture models that suggests the data are non-Gaussian. These analyses motivate the need for a non-Gaussian model with separate parameters in the bulk and tail.
5.2 Data-specific adjustments
As observed in Figure 7, the spatial extremal dependence is likely to be different for different fire zones. Considering the STP-DPM model, the extremal dependence is stationary and the same for all zones (Remark 3). While depends on the parameters ’s, ’s and ’s in (2.7), allowing different ’s and ’s for different fire zones would be difficult to implement in the model though allowing separate ’s for each fire-zone is simple and the computation allows Gibbs sampling. Thus, allowing three separate ’s for three fire-zones along with STP-DPM model in (2.2), we call the model STP-DPM-Z.
The isotropic extremal dependence of the STP-DPM might be questionable considering the specific direction of wind through the mountain passes, variation of altitude and distance from the Pacific ocean. Due to the increased computational burden as well as the stability issue mentioned in Remark 3, we consider anisotropic Matérn covariance function similar to Haskard et al. (2007) for ’s. Instead of using Euclidean distance between two points in (2.2), following Eriksson and Siska (2000), we use the distance defined by in (2.2) where and and are two additional parameters treated as unknown with and . Thus, in (2.2), the covariance matrices are described by and for each . We call this model STP-DPM-A. Furthermore, by accounting for both the zonal and anisotropic components, we consider the model STP-DPM-A-Z.
5.3 Model comparisons
The model STP-DPM is assumed to have time-dependence as described in Subsection 2.4. The AR1 structure is considered for both and in case of STP-DPM and thus the model TP-DPM has time dependence only through . In contrast, GP-DPM has no time-dependence as are non-random in that case. Considering parametric alternatives, AR1 structure of STP and TP are constructed similar to STP-DPM and TP-DPM models respectively. For GP, we consider the spatial error process to have AR1 correlation structure in time. In addition to the time-dependent STP-DPM model with its three versions, also the time-independent STP-DPM models are considered. GP is taken as reference and all the alternatives are assessed in terms of relative performance in high level quantile estimation using a leave-one-site-out cross-validation. The spatially-invariant parameters are estimated only once based on the full data for each model. For a model , the prediction RMSE for a quantile is calculated as where and denote the CDF of the the posterior predictive distribution and the empirical CDF at site respectively. The RMSE skill score for model is defined as
where a model with higher value of is preferred.
The results are reported in Table 3. All entries are greater than one and generally increase with the quantile level (), indicating that all models outperform GP especially in the tails. Among the parametric models, STP has higher values compared to TP. When we ignore the temporal dependence, for the STP-DPM models are always smaller than a parametric STP model with temporal dependence indicating the importance to consider temporal extremal dependence structure. The STP-DPM with temporal extremal dependence yields smaller value than TP-DPM except for the 0.98-th quantile but adding both the zone-specific skewness terms and anisotropic covariance structure increases for all the quantiles with the highest improvement is 13.25% in case of the 0.97-th quantile.
A -year return level is calculated as -th quantile of the posterior predictive distribution considering December only. We compare the spatial prediction capabilities of the four time-dependent STP-DPM models along with two parametric alternatives GP and STP in Figure 5 based on the 1-year return levels. For all the models, FFWI values are higher in Zone 1 compared to the other zones. For GP, the estimated return levels are smaller compared to all other models throughout the region. For all the models, FFWI values are high near Santa Clarita which is expected as the heat waves move towards these regions through Soledad pass. For the STP-DPM-A-Z model, the values are highest between Santa Monica and Santa Clarita and also the values are higher than the estimates based on other models. While other models provide smoother estimates, the estimates based on STP-DPM-A-Z model is less spatially smooth which is more realistic considering the geography of the Santa Ana region.
6 Discussions and Conclusions
In this paper, we propose a very flexible semiparametric Bayesian model for spatiotemporal data that can model the bulk as well as the tail. The model automatically clusters the temporal replications (possibly dependent across time) and sets the extreme observations into one component with the thickest tail. Allowing separate parameters for the mixture components, we hope to prevent the data from the bulk to influence the parameters of the component of extremes. Considering infinite components, the model spans all possible densities over the spatial domain of interest and the model has nonstationary mean and covariance structure similar to Gelfand et al. (2005). Allowing isotropic covariance functions for the error process within each component, the extremal dependence measure is also isotropic. As the data is sparse for the component of extremes, a isotropic covariance function is justified from the theoretical and computational perspective. Using a simulation study, we demonstrate the performance of our method in predicting quantiles at the test sites. In case of daily measurements, ignoring temporal dependence can affect the model performance and hence we consider temporal extremal dependence.
The proposed model can be further generalized. As a shortcoming of our model, the spatial extremal dependence is nonzero throughout the entire spatial domain of interest. Thus, in case of a large spatial domain like the entire United States, a random partitioning the spatial domain is required as considered by Morris et al. (2017). Instead of considering specific random mean and scale structure of the Gaussian process to obtain a skew- process, Huser et al. (2017) consider a more general class of scale mixtures that allows both asymptotic dependence and independence. The temporal extremal dependence structure we consider does not allow Gibbs sampling for a number of model parameters and hence any existence of stationary gamma and half-normal processes that allow Gibbs sampling can lead to computational advantages.
The authors thank Tim Brown at the Desert Research Institute for providing the data, David Sapsis at CalFIRE for suggesting the analysis of FFWI and Raphaël Huser at KAUST for some valuable suggestions regarding the methodology. This work was partially supported by NSF grants DMS-1752280 and DMS-0454942, NOAA grant Z17-20337, DOE grant DE-AC02-05CH11231, DOI grant 14-1-04-9 and NIH grants R01ES027892 and 5P01 CA142538-09.
- Azzalini and Capitanio (2003) A. Azzalini and A. Capitanio. Distributions generated by perturbation of symmetry with emphasis on a multivariate skew t-distribution. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 65(2):367–389, 2003.
- Azzalini and Capitanio (2014) A. Azzalini and A. Capitanio. The skew-normal and related families. Institute of Mathematical Statistics Monographs, 2014.
Beal et al. (2002)
M. J. Beal, Z. Ghahramani, and C. E. Rasmussen.
The infinite hidden Markov model.Advances in Neural Information Processing Systems, 2002.
- Behrens et al. (2004) Cibele N. Behrens, Hedibert F. Lopes, and Dani Gamerman. Bayesian analysis of extreme events with threshold estimation. Stat. Model., 4(3):227–244, 2004. ISSN 1471-082X. doi: 10.1191/1471082X04st075oa. URL https://doi.org/10.1191/1471082X04st075oa.
- Bortot (2010) Paola Bortot. Tail dependence in bivariate skew-normal and skew-t distributions. Available online: www2. stat. unibo. it/bortot/ricerca/paper-sn-2. pdf, 2010.
- Carreau and Bengio (2009) Julie Carreau and Yoshua Bengio. A hybrid Pareto model for asymmetric fat-tailed data: the univariate case. Extremes, 12(1):53–76, 2009. ISSN 1386-1999. doi: 10.1007/s10687-008-0068-0. URL https://doi.org/10.1007/s10687-008-0068-0.
- Cooley et al. (2006) D. Cooley, P. Naveau, and P. Poncet. Variograms for spatial max-stable random fields. In Dependence in Probability and Statistics, pages 373–390. Springer, 2006.
- Davison and Huser (2015) A. C. Davison and R. Huser. Statistics of extremes. Annual Review of Statistics and its Application, 2:203–235, 2015.
- Davison et al. (2012) A. C. Davison, S. A. Padoan, and M. Ribatet. Statistical modeling of spatial extremes. Statistical Science, 27(2):161–186, 2012.
- Davison et al. (2013) A. C. Davison, R. Huser, and E. Thibaud. Geostatistics of dependent and asymptotically independent extremes. Mathematical Geosciences, 45(5):511–529, 2013.
- Dey and Yan (2016) D. K. Dey and J. Yan. Extreme value modeling and risk analysis: methods and applications. CRC Press, 2016.
- do Nascimento et al. (2012) Fernando Ferraz do Nascimento, Dani Gamerman, and Hedibert Freitas Lopes. A semiparametric Bayesian approach to extreme value estimation. Stat. Comput., 22(2):661–675, 2012. ISSN 0960-3174. doi: 10.1007/s11222-011-9270-z. URL https://doi.org/10.1007/s11222-011-9270-z.
- Eriksson and Siska (2000) M. Eriksson and P. P. Siska. Understanding anisotropy computations. Mathematical Geology, 32(6):683–700, 2000.
- Ferguson (1973) T. S. Ferguson. A Bayesian analysis of some nonparametric problems. The Annals of Statistics, pages 209–230, 1973.
- Ferguson (1974) T. S. Ferguson. Prior distributions on spaces of probability measures. The Annals of Statistics, pages 615–629, 1974.
- Ferreira et al. (2014) A. Ferreira, L. De Haan, et al. The generalized Pareto process; with a view towards application and simulation. Bernoulli, 20(4):1717–1737, 2014.
- Fosberg (1978) M. A. Fosberg. Weather in wildland fire management: The fire-weather index. Paper presented at the Conference on Sierra Nevada Meteorology, Am. Meteorol. Soc., South Lake Tahoe, California, 1978.
- Fox et al. (2011) E. B. Fox, E. B. Sudderth, M. I. Jordan, and A. S. Willsky. A sticky HDP-HMM with application to speaker diarization. The Annals of Applied Statistics, pages 1020–1056, 2011.
- Fraley and Raftery (2006) C. Fraley and A. E. Raftery. Mclust version 3: an R package for normal mixture modeling and model-based clustering. Technical report, Washington, University of Seattle, Department of Statistics, 2006.
- Fuentes et al. (2013) M. Fuentes, J. Henry, and B. J. Reich. Nonparametric spatial models for extremes: Application to extreme temperature data. Extremes, 16(1):75–101, 2013.
- Fúquene Patiño (2015) Jairo Alberto Fúquene Patiño. A semi-parametric Bayesian extreme value model using a Dirichlet process mixture of gamma densities. J. of Applied Stat., 42(2):267–280, 2015. doi: 10.1080/02664763.2014.947357.
- Gelfand and Schliep (2016) A. E. Gelfand and E. M. Schliep. Spatial statistics and Gaussian processes: A beautiful marriage. Spatial Statistics, 18:86–104, 2016.
- Gelfand et al. (2005) A. E. Gelfand, A. Kottas, and S. N. MacEachern. Bayesian nonparametric spatial modeling with Dirichlet process mixing. Journal of the American Statistical Association, 100(471):1021–1035, 2005.
- Genest and Nešlehová (2012) C. Genest and J. Nešlehová. Copulas and copula models. Encyclopedia of Environmetrics, 2012.
Ghosal and van der Vaart (2017)
S. Ghosal and A. van der Vaart.
Fundamentals of nonparametric Bayesian inference, volume 44. Cambridge University Press, 2017.
- Haskard et al. (2007) K. A. Haskard, B. R. Cullis, and A. P. Verbyla. Anisotropic Matérn correlation and spatial prediction using REML. Journal of Agricultural, Biological, and Environmental Statistics, 12(2):147, 2007.
Hu and Scarrott (2018)
Yang Hu and Carl Scarrott.
evmix: An R package for extreme value mixture modeling, threshold estimation and boundary corrected kernel density estimation.J. Stat. Software, 84(5):1–27, 2018. doi: 10.18637/jss.v084.i05. URL https://doi.org/10.18637/jss.v084.i05.
- Hughes and Hall (2010) M. Hughes and A. Hall. Local and synoptic mechanisms causing Southern California’s Santa Ana winds. Climate Dynamics, 34(6):847–857, 2010.
- Huser et al. (2017) R. Huser, T. Opitz, and E. Thibaud. Bridging asymptotic independence and dependence in spatial extremes using gaussian scale mixtures. Spatial Statistics, 21:166–186, 2017.
- Kambezidis and Kalliampakos (2016) H. D. Kambezidis and G. K. Kalliampakos. Fire-Risk Assessment in Northern Greece using a modified Fosberg Fire-Weather Index that includes forest coverage. International Journal of Atmospheric Sciences, 2016.
- Krupskii et al. (2018) P. Krupskii, R. Huser, and M. G. Genton. Factor copula models for replicated spatial data. Journal of the American Statistical Association, 113(521):467–479, 2018.
- MacDonald et al. (2011) A. MacDonald, C. J. Scarrott, D. Lee, B. Darlow, M. Reale, and G. Russell. A flexible extreme value mixture model. Comput. Statist. Data Anal., 55(6):2137–2157, 2011. ISSN 0167-9473. doi: 10.1016/j.csda.2011.01.005. URL https://doi.org/10.1016/j.csda.2011.01.005.
- Mathieu (2013) R. Mathieu. Spatial extremes: Max-stable processes at work. Journal de la Société Française de Statistique, 154(2):156–177, 2013.
- Morris et al. (2017) S. A. Morris, B. J. Reich, E. Thibaud, and D. Cooley. A space-time skew-t model for threshold exceedances. Biometrics, 2017.
S. A. Padoan.
Multivariate extreme models based on underlying skew-t and
Journal of Multivariate Analysis, 102(5):977–991, 2011.
- Padoan et al. (2010) S. A. Padoan, M. Ribatet, and S. A. Sisson. Likelihood-based inference for max-stable processes. Journal of the American Statistical Association, 105(489):263–277, 2010.
- Raphael (2003) M. N. Raphael. The Santa Ana winds of California. Earth Interactions, 7(8):1–13, 2003.
- Reich and Fuentes (2015) B. J. Reich and M. Fuentes. Spatial Bayesian nonparametric methods. In Nonparametric Bayesian Inference in Biostatistics, pages 347–357. Springer, 2015.
- Reich and Shaby (2012) B. J. Reich and B. A. Shaby. A hierarchical max-stable spatial model for extreme precipitation. The Annals of Applied Statistics, 6(4):1430, 2012.
- Ribatet and Sedki (2013) M. Ribatet and M. Sedki. Extreme value copulas and max-stable processes. Journal de la Société Française de Statistique, 154(1):138–150, 2013.
- Roads et al. (1991) J. O. Roads, K. Ueyoshi, S. C. Chen, J. Alpert, and F. Fujioka. Medium-range fire weather forecasts. International Journal of Wildland Fire, 1(3):159–176, 1991.
- Rolinski et al. (2016) T. Rolinski, S. B. Capps, R. G. Fovell, Y. Cao, B. J. D’Agostino, and S. Vanderburg. The Santa Ana wildfire threat index: methodology and operational implementation. Weather and Forecasting, 31(6):1881–1897, 2016.
- Sang and Gelfand (2009) H. Sang and A. E. Gelfand. Hierarchical modeling for extreme values observed over space and time. Environmental and Ecological Statistics, 16(3):407–426, 2009.
- Sang and Gelfand (2010) H. Sang and A. E. Gelfand. Continuous spatial process models for spatial extreme values. Journal of Agricultural, Biological, and Environmental Statistics, 15(1):49–65, 2010.
- Sapsis et al. (2016) D. Sapsis, T. Brown, C. Low, M. Moritz, D. Saah, and B. Shaby. Mapping Environmental Influences on Utility Fire Threat. A Report to the California Public Utilities Commission Pursuant to R.08–11-005 and R.15-05-006, 2016.
- Schmidt et al. (2017) A. M. Schmidt, K. Gonçalves, and P. L. Velozo. Spatio-temporal models for skewed processes. Environmetrics, 28(6), 2017.
- Sethuraman (1994) J. Sethuraman. A constructive definition of Dirichlet priors. Statistica Sinica, pages 639–650, 1994.
- Shaby et al. (2016) Benjamin A. Shaby, Brian J. Reich, Daniel Cooley, and Cari G. Kaufman. A Markov-switching model for heat waves. The Annals of Applied Statistics, 10(1):74–93, 2016. ISSN 1932-6157. doi: 10.1214/15-AOAS873. URL https://doi.org/10.1214/15-AOAS873.
- Sibuya (1960) M. Sibuya. Bivariate extreme statistics, i. Annals of the Institute of Statistical Mathematics, 11(3):195–210, 1960.
- Storlie et al. (2014) C. Storlie, J. Sexton, S. Pakin, M. Lang, B. Reich, and W. Rust. Modeling and predicting power consumption of high performance computing jobs. arXiv preprint arXiv:1412.5247, 2014.
- Thibaud et al. (2016) E. Thibaud, J. Aalto, D. S. Cooley, A. C Davison, J. Heikkinen, et al. Bayesian inference for the Brown–Resnick process, with an application to extreme low temperatures. The Annals of Applied Statistics, 10(4):2303–2324, 2016.
- Turkman et al. (2010) K. F. Turkman, M. A. A. Turkman, and J. M. Pereira. Asymptotic models and inference for extremes of spatio-temporal data. Extremes, 13(4):375–397, 2010.
- Wadsworth and Tawn (2013) J. L. Wadsworth and J. A. Tawn. Efficient inference for spatial extreme value processes associated to log-Gaussian random functions. Biometrika, 101(1):1–15, 2013.
- Wadsworth et al. (2010) J. L. Wadsworth, J. A. Tawn, and P. Jonathan. Accounting for choice of measurement scale in extreme value modeling. The Annals of Applied Statistics, pages 1558–1578, 2010.
- Wu and Ghosal (2010) Y. Wu and S. Ghosal. The L1-consistency of Dirichlet mixtures in multivariate Bayesian density estimation. Journal of Multivariate Analysis, 101(10):2411–2419, 2010.
Appendix A: Marginal and joint distributions of a Skew-t process
The densities of univariate and multivariate skew- distributions are provided in the following.
Univariate skew- distribution We call to follow a univariate skew- distribution with parameters if with and . The density function of is given by
are density and cumulative distribution functions (CDF) of univariate Student’sdistribution (location = 0 and scale = 1) with degrees of freedom.
Multivariate skew- distribution
We call to follow a -variate skew- distribution with parameters if with and . The density function of is given by
where with , is the density function of -variate Student’s distribution with location , shape matrix and degrees of freedom and is the CDF of univariate Student’s distribution (location = 0 and scale = 1) with degrees of freedom. The matrix is given by .
Appendix B: GEV-log transformation
In extreme value analysis, an observation obtained from some block maximum is assumed to be distributed as a generalized extreme value (GEV) distribution with CDF given by where
The location, scale and shape parameters are and respectively. These three parameters jointly determine the support of ; if , if and if . We denote .
The flexible Bayesian nonparametrics (BNP) tools are easier to implement if the support of is assumed to be the whole real line. Thus, we consider the transformation so that we can easily implement the BNP tools along with the observations have a more generalized support. By GEV transformation, we consider so that . The support of is . Further we consider a log transformation to obtain so that has support over the whole real line. Finally we apply the BNP tools over . Additional to flexible support, the transformation can add skewness for simple models like Gaussian processes. Though the transformation does not lead to spatial extremal dependence for when is asymptotically independent. Hence, the skew- process is required in spite of a transformed Gaussian process being skewed and the marginal distributions having thicker tail. As an illustration, we consider three transformations with a standard normal distribution in Figure 7.
Here we consider and transform to obtain . We fix the location and scale parameters by and . When , the density of and coincide. When , the density of is bounded below and have thicker right tail compared to . When , the density of is bounded above and have thicker left tail compared to .
Appendix C: Proof of Remark 2
We want to show that the proposed STP-DPM model covers the class of square-integrable processes with continuous mean and covariance functions defined on the domain . By setting , and for each (thus, has discrete support ’s), the proposed STP-DPM model has and given the cluster parameters and the mixture probabilities . Specifically, consider a generic process with continuous mean function and continuous covariance function