1 Introduction
The idea of “letting the tail speak for itself” by ignoring the bulk of the data is effective in forcing fitting procedures focus on the extremes, where primary interest lies, rather than being overwhelmed by data from the bulk of the distribution, which is by definition much more plentiful. However, the price to be paid in employing this strategy is that it forces a somewhat arbitrary decision about what counts as extreme, and more importantly, it disallows the possibility that events that are large but deemed insufficiently extreme can enter into the analysis at all. This leads to the question of how to model a major portion of the spatiotemporal data without thresholding at a high value where our main aim is in spatial prediction of high marginal quantiles.
Gaussian processes (GPs) are by far the most common model in spatial statistics due to their good theoretical properties, tractability in highdimensions and the ease of computation even for hierarchical models (Gelfand and Schliep, 2016). However, GPs are criticized for modeling spatial extremes as the extremal dependence between any two spatial locations is zero (Davison et al., 2013). While a single transformation is usually considered to approximate normality (Schmidt et al., 2017) along with modeling the transformed data using a GP, the extremal dependence between two spatial locations is invariant of the data transformation and hence still zero. Gelfand et al. (2005) propose a flexible nonparametric and nonstationary model based on Dirichlet process mixture (DPM) of GPs which relaxes the Gaussian assumption; however the spatial extremal dependence remains zero. Hence, even if the model is appropriate for modeling the bulk of the spatial data, it is questionable for modeling the extremes.
Literature on spatial modeling of extremes cover a number of approaches like Bayesian hierarchical models (Sang and Gelfand, 2009, 2010; Turkman et al., 2010), copulabased approaches (Ribatet and Sedki, 2013; Fuentes et al., 2013; Genest and Nešlehová, 2012) and maxstable processes (Reich and Shaby, 2012; Mathieu, 2013; Davison and Huser, 2015). Davison et al. (2012)
compare these approaches and conclude that the copula and maxstable process approaches fit the joint distribution better and hence preferred for modeling spatial extremes. In spite of good theoretical properties of maxstable processes in explaining spatial extremes, drawing inference is challenging because the full joint density of a multivariate generalized extreme value (GEV) distribution can be calculated only when the dimension is small, leading to less efficient techniques like composite likelihoods
(Padoan et al., 2010); recently Thibaud et al. (2016) proposed a hierarchical Bayesian model approach for approximating the full joint distribution though the approach is computationally intensive. Limiting Poisson (Wadsworth and Tawn, 2013) and the generalized Pareto (Ferreira et al., 2014) approaches are theoretically justified models for threshold exceedances and are computationally tractable. A nonparametric copulabased model is proposed by Fuentes et al. (2013) where the spatial dependence between the extreme observations is modeled by DPM of GPs but the marginals are modeled as GEV distributions. Krupskii et al. (2018) propose factor copula models for replicated spatial data which can model tail dependence and tail asymmetry though the model is parametric. Morris et al. (2017) propose a Bayesian spatiotemporal model for threshold exceedances based on spatial skew processes (STPs).Mixture models have been used extensively in univariate extreme value analysis. Models that splice a generalized Pareto tail to a different “bulk distribtion” were proposed by Behrens et al. (2004) and extended by Carreau and Bengio (2009), MacDonald et al. (2011), Hu and Scarrott (2018), and others. do Nascimento et al. (2012)
spliced a generalized Pareto tail to a finite mixture of gamma distributions, while
Fúquene Patiño (2015) extends this model to a Dirichlet process mixture of gammas. In the time series context, Shaby et al. (2016) use a dependent mixture of a normal bulk and generalized Pareto tail to model heat waves. Though the choices of high thresholds are arbitrary, for example, 0.98th data quantile in Shaby et al. (2016), and the ideas are not readily applicable in spatial extremes and computational burden is an issue.Except the geostatistics approach of Gelfand et al. (2005), other spatial extremes approaches use high thresholds or block maximums which lead to the situation where moderately extreme observations are discarded or censored from the analysis. In the univariate literature using mixture model, even if the full data is used, an arbitrary threshold is chosen and the observations below and above the threshold are modeled differently.
We use a hierarchical mixture of skew
processes to model spatial dependence, with particular emphasis on modeling the joint tail. This approach differs from the current prevailing practice in extreme event analysis in that we model the entire spatial process, whereas the current state of the art either throws away or censors observations that are not considered extreme. Our mixture approach seeks the advantages of letting the tail speak for itself without sacrificing the ability to consider nonextreme events. It simultaneously probabilistically clusters events and estimates cluster dependence parameters, so that events that land in the extreme cluster with high probability influence the fit in the tail, while events that land in other clusters with high probability do not. Taken together, the mixture components can both flexibly model the bulk of the distribution and fit the tail dependence in an uncontaminated way. Furthermore, each mixture component is a skew
process, which itself considerably more flexible than standard tools like Gaussian processes that allows spatial extremal dependence.Our data example illustrates a situation where elevated but nonextreme events play an important role, even if the primary interest lies in the most extreme events. We analyze a fire threat index called Fosberg Fire Weather Index (FFWI) in a fireprone region in Southern California. Here, risk management requires knowledge of the probable spatial extent of the most extreme fire weather conditions, so estimating the joint tail characteristics is key. However, elevated but not necessarily extreme fire weather still poses substantial risk, so it cannot be ignored. Finally, we compare the performance of several models in predicting high quantiles of FFWI using a leaveoneout crossvalidation and use our model to make spatial maps of FFWI extremes.
2 Methodology
In this section, first we develop a nonparametric Bayesian spatial model using a Dirichlet process mixture (DPM) of spatial skew processes (STPs) assuming temporal independence. In Section 2.4, we propose an extension to accommodate the temporal extremal dependence.
2.1 Spatial skew process
Let be the (potentially transformed) observation at the monitoring site located at and time (we discuss the transformation in Subsection 2.2). Our Bayesian nonparametric (BNP) model for is based on the parametric STPs described in this subsection. STPs (Padoan, 2011) are richer models than GPs. Unlike GPs, STPs permit heavy tails and asymmetry in the marginal distribution and asymptotic spatial dependence for extremes. Borrowing ideas from additive processes (Azzalini and Capitanio, 2003, 2014), Morris et al. (2017) create a STP as a locationscale mixture of a GP.
Let be a STP defined over the spatial domain of interest . At a spatial location , we can write
(2.1) 
where is the spatiallyvarying mean process, and . Assuming
follow a standard normal distribution, marginally over the random
and , follows a skew distribution with location , scale , skewness (see Supplementary material (SM) Appendix A). To account for spatial dependence, the error process is assumed to follow a standard GP with correlation . We assume the isotropic Matérn correlation function(2.2) 
where is the Euclidean distance between and , , and are the range, smoothness and the ratio of spatial to total variation respectively. In (2.2), is the modified Bessel function of degree and if and 0 otherwise.
For a set of locations , let
denote the vector of observations and
. After marginalization over and , [matching the notations of Azzalini and Capitanio (2014)], the joint distribution of is(2.3) 
where is the dimensional matrix obtained by evaluating at ; denote the set of parameters by . We denote the skew density function as ; more details about the joint density of the STPs are provided in SM Appendix A.
2.2 Dirichlet process mixture (DPM) model
When a spatial process is observed at only one time point, a parametric model like a STP is required. However, with temporal replications, the spatial process can be estimated semiparametrically or nonparametrically. We temporarily assume that
are iid dimensional realizations from a DPM of STPs (henceforth, STPDPM) with the density described by(2.4) 
where is the number of mixture components, are the mixture probabilities with , denotes the set of parameters of the th component and denotes the density function of a dimensional realization from the skew process as described in Subsection 2.1. The density function denotes the density function of a DPM of multivariate skew densities (henceforth, MSTDPM). For a fully nonparametric model, .
An equivalent representation of (2.4) is the clustering model described below. Let denote the cluster label of the replication at time with , and . Therefore, the STPDPM model clusters similar replications (e.g. days) and models the clusters using STPs. The motivation for this clustering is to separate the bulk from the extremes, and by allowing the clusters to have different ’s, we hope to prevent data from the bulk to influence parameter estimates for the tails. In our Bayesian model, the cluster labels are treated as unknown parameters and thus we account for uncertainty in cluster allocation.
Given the cluster label , the conditional distribution of is
(2.5) 
where is the spatial correlation defined by in (2.2) with parameters and . The set of parameters for time is with for .
The mixture probabilities, ’s, are sequentially constructed following the stickbreaking representation proposed by Sethuraman (1994) so that they sum to one and hence the ’s “break the stick” of unit length. The first mixture probability is modeled as , where . Subsequently, the th mixture probability is constructed as where is the probability not considered by the first components and . In case of finite , we set so that which ensures that . The case of corresponds to the Dirichlet process prior (Ferguson, 1973, 1974).
Further, we put hyperpriors on the clusterspecific parameters
’s which are the atoms of the stickbreaking process. For , the hyperpriors relate to the base measure of the corresponding Dirichlet process. We assume and the components of ’s are independently distributed. Choices of the parameters of the hyperpriors are discussed in Section 3.The STPDPM model has support . Extreme value analysis often deals with bounded distributions. We therefore include a transformation step to allow a more flexible model for the bounds of the distribution. Let be the FFWI at the monitoring site located at and time . We assume that the support of to be same as the support of a generalized extreme value (GEV) distribution with location, scale and shape parameters and respectively; thus, the support of is if , if and if . We relate the observed and transformed data using monotonicallyincreasing GEVlog transformation . If , the transformation is . The transformed variables are then modeled flexibly using a nonparametric approach; thus, the marginal distributions of are not GEV. Advantages of power transformations, e.g., BoxCox transformation, have been discussed by Wadsworth et al. (2010) mainly in improving the convergence rate of FisherTippettGnedenko theorem. We emphasize that the GEV parameters and are treated as unknown in our fully Bayesian analysis. More details regarding the transformation are provided in SM Appendix B.
2.3 Model properties
From the infinite mixture model representation () in (2.4), it is evident that for any spatial locations (), the class of dimensional joint densities is a superset of the class of priors of Gelfand et al. (2005) which is a special case by setting and for each . Thus, MSTDPM prior spans the entire set of joint densities for any set of spatial locations (Gelfand et al., 2005; Reich and Fuentes, 2015). Under suitable regularity conditions, the posterior consistency of the DPM with multivariate Gaussian kernels is proven by Wu and Ghosal (2010). The posterior consistency of the DPM with multivariate skew kernels holds from the fact that the skew
distribution can be obtained from the normal distribution by marginalizing the random location and scale and hence, the KullbackLeibler divergence between the true density and the estimate based on our model is smaller than its value for the normal distribution follows from Lemma B.11 of
Ghosal and van der Vaart (2017).Lemma 1.
For the model (2.2), given the cluster parameters , the conditional mean and covariances of (assuming for each with ) are
where .
Remark 1.
The mean and covariance are both dependent on and and cannot be reduced to a function of and hence the model has both nonstationary mean and covariance structure.
Remark 2.
By setting , and for each with , the model (2.2) is a spatial Dirichlet process where has discrete support ’s with and given the cluster parameters and the mixture probabilities . The mean and covariance functions span the mean and covariance function of any squareintegrable stochastic process. The proof is provided in SM Appendix C.
The extremal dependence between two random variables
and is often quantified using the measure (Sibuya, 1960) given by(2.6) 
where and are marginal distribution functions of and respectively. A value of near 1 indicates strong asymptotic dependence while defines asymptotic independence. For a spatiotemporal process, the extremal dependence between two spatial locations and is defined by or for , the Euclidean distance between the locations, when the spatial extremal dependence is isotropic (assuming both the forms to be timeinvariant). Similarly, for a spatial location , the temporal extremal dependence between two time points and , is defined by and further, assuming the measure being stationary in time and spatiallyinvariant, we denote it by where denotes the temporal lag.
Conditioning on the cluster parameters and the mixture probabilities , the spatial extremal dependence for the proposed STPDPM model is given by the following theorem. The proof is provided in the SM Appendix D.
Theorem 2.1.
The extremal dependence measure for the STPDPM model in Subsection 2.2 is given by
(2.7) 
with , , is the survival function for a Student’s distribution with degrees of freedom, and .
Remark 3.
The measure is dependent on and only through ; thus, even if the model specification is very flexible with nonstationary mean and covariance structures (Lemma 1), the extremal dependence is isotropic. This characteristic may be appealing in many applications because data in the tail are sparse and thus simple models are needed to provide stability. To allow for nonstationary extremal dependence, we can relax the isotropic covariance structure and assume a nonstationary covariance structure for the . Simpler alternatives like anisotropic Matérn covariance structure which has only two additional parameters (discussed later in Section 5) can be considered as well.
Remark 4.
The measure depends only on the cluster with the smallest degrees of freedom which is the component with the heaviest tail. Thus, the extreme observations are likely to be clustered into one component with the heaviest tail and as we allow different parameters for each cluster, the data appearing from other clusters with lighter tails do not influence the parameters of the cluster with the thickest tail. Thus, STPDPM model allows a probabilistic partitioning of the tail from the bulk and prevents the bulk from influencing on the inference about the extremes.
Remark 5.
Similar to the uppertail extremal dependence in (Appendix E: Subasymptotic conditional exceedance probabilities), the lowertail case is . For the STPDPM model, has a similar form as in (2.7) except replaced by . Thus, in both the tails the measure depends only on the cluster with the smallest degrees of freedom and therefore it may be that extremely small values influence the estimates of the parameters in the upper tail. To bypass this issue, we censor the observations in the left tail (below 0.1th quantile) to remove the effect of the lowertail data on the uppertail extremal dependence parameters.
Some corollaries of Theorem 2.1 for the extremal dependence in case of submodels, e.g., GP (), TP (), STP (), DPM of GPs (Gelfand et al. (2005), GPDPM) and DPM of TPs (TPDPM) are as follows. Since the measure is isotropic for all processes, we denote simply by .

GP, GPDPM: .

TP: .

STP: .

DPM of TPs: with .
In Figure 1 (left), we plot versus for several parameter choices (for convenience, we drop the subscript from , and ). The extremal dependence decreases with increasing (for , the extremal dependence remains zero for any , the case of a skewnormal process). For any finite , is positive, even if is zero. The lines corresponding to and 0.25 show that a positive skewness increases . While we discuss the asymptotic conditional exceedance probabilities as measure, the subasymptotic exceedance probabilities (without considering the limit) for a skew process with different parameter choices are discussed in SM Appendix E.
2.4 Extension to the spatiotemporal Dirichlet process
As we have discussed in Section Appendix J: Fosberg Fire Weather Index, the FFWI demonstrate temporal extremal dependence. The literature on dependent DPM approaches is dominated by the autoregressive DPM approach (Beal et al., 2002; Fox et al., 2011; Storlie et al., 2014)
which consider a Markov model of the cluster indexes
’s in (2.2). While this approach adds temporal autocorrelation, this does not capture extremal temporal dependence as given by the following theorem. The proof is provided in SM Appendix F.Theorem 2.2.
Consider the STPDPM model in (2.2). The temporal extremal dependence at any spatial location is zero if the temporal dependence in the spatiotemporal process is constructed only through the temporal dependence of the cluster indexes , i.e., given and , and are independent. The only exception is the case of which leads to exact dependence.
Following Morris et al. (2017), we consider an AR(1) structure for the and . To ensure the spatial process is DPM of STPs, the inverse gamma distribution of with parameters and and the halfnormal distribution of needs to be preserved and it is done as follows. Suppose and denote the CDFs of halfnormal and inverse gamma distributions respectively. Thus, for each , and and so and . We specify an AR(1) structure as follows.
(2.8) 
This specification ensures the process is stationary across time.
It is challenging to derive an analytical expression for the temporal extremal dependence. Based on a simulated data, we demonstrate the presence of the dependence. We consider a model with mixture components and mixing probabilities . The vector of skew parameters for the two components to be and respectively ( denotes the location parameter). We generate lag observations for from our model setting , for . The lag measure is estimated using the Fmadogram. The right panel of Figure 1 suggests that the extremal dependence increases as increases to 1 and the extremal dependence decreases as increases.
3 Computation
We use Markov chain Monte Carlo (MCMC) methods for model fitting and prediction. We consider conjugate priors for the parameters whenever possible which helps in updating the parameters using Gibbs sampling. For the GEV parameters, we assume
, and . For the purpose of computation, we fix the number of components in the stickbreaking model at by setting . For the parameters of the base measure , we assume
, , ,

, ,

, truncated above at 20, .
Here denotes the correlation matrix obtained from in (2.2) with Matérn parameters and
. For hyperparameters of the base measure, we consider
, , truncated above at 20 and . The DP concentration parameter . We tune the hyperparameters differently to allow better mixing within MCMC and the choices are discussed in Section 5. More details regarding the prior choices are provided in SM Appendix G.The MCMC steps are a combination of Gibbs sampling and MetropolisHastings algorithm. The steps of model fitting and prediction are provided in SM Appendix H.
4 Simulation Studies
In this section, we perform a simulation study to assess the performance of our proposed model in spatial prediction of marginal quantiles and in estimation of extremal dependence. We compare the performances of GPs, TPs, STPs, GPDPM and TPDPM with the proposed STPDPM model. STPs are considered to be of the form (2.1) while TPs and GPs are submodels of (2.1) by setting for TPs and and both for GPs. Considering maxstable processes (MSPs) as alternatives, the MSP of Reich and Shaby (2012) is compared with STPs in Figure 3 of Morris et al. (2017) and it appears that when the data are generated from a MSP, MSPs perform only slightly better than STPs, while in case of data generated from STPs, MSPs perform quite poorly. Considering these results and the computational burden associated with MSPs, we do not include MSPs in the simulation study.
4.1 Simulation design
We generate 100 datasets from each of the 6 designs: (1) GP, (2) TP, (3) STP, (4) GPDPM, (5) TPDPM and (6) STPDPM. In each case, data are generated at sites and time points. The sites are generated uniformly on the unit square and . As our main aim is spatial prediction of timeinvariant quantiles, we generate independent replications of the spatial process. For Design (6), we consider a threecomponent mixture of STPs as in (2.3) with the parameters in Table 1. For simulation from Design (5), we consider same parameters as in Table 1 except that we set the ’s to zero. Additionally we set
’s to infinity in case of Design (4), i.e., the components have fixed variance equal to
’s. For Design (3), we consider the parameters of the third component in Table 1 except that we set the Matérn parameters , and . Design (2) has same parameters as in Design (3) except and additionally for Design (1). For each model, we transform the simulated data to using inverse GEVlog transformation with , and .1  0.25  1  2  0.9  0.5  1  
2  0.25  0.5  4  0.5  0.1  0.1  
3  0.5  1  6  1  0.1  2  0.5 
We use the priors given in Section 3
. We run each MCMC chain for 20,000 iterations, discard first 10,000 iterations as burnin and out of the postburnin samples, we perform thinning by keeping one in each five samples. Here we are not interested in estimating specific parameters which is complicated due to label switching throughout the MCMC. Rather, we are interested in quantiles of the posterior predictive distribution and the extremal dependence. We monitor the convergence for a set of quantiles and the MCMC converges well for these quantities.
We fit the models to 50 sites and predict the true quantiles for 10 additional test sites. For a test site , suppose the true marginal distribution is given by and denote as the posterior predictive distribution function at . For , models are judged based on the difference ; a model with close to zero is preferred for that (correspondingly return level). A positive (negative) value of indicates overestimation (underestimation) of the true quantile. For each of the 100 datasets generated from six designs, we fit all the models and plot versus for , averaged across the test sites and the datasets in Figure 2. While our main interest lies in the inference about the tails, the proposed method can model the full support very flexibly and hence we consider lowthroughhigh values of . While indicates the bias in prediction, the root mean squared error (RMSE) of prediction for the 0.95th quantile is provided in Table 2.
For evaluating the performance in estimating the extremal dependence, we compare the true measure versus the estimates based on the six models, GP through STPDPM. For models GP and GPDPM, estimated extremal dependence is always zero. The plots of the estimated based on models TP, STP, TPDPM and STPDPM along with the true are provided in Figure 6 of SM Appendix I.
4.2 Results
First, we compare the models based on estimating the CDF (Figure 2 and Table 2). When the data are generated from Design (1), all models perform well (with highest is approximately 0.0075) while GP and GPDPM perform better than other models. For Design (2), GP and GPDPM perform worse than other models. In case of Design (3), STP and STPDPM perform well while all other models lead to poor prediction performances. In all three cases, the STPDPM model has only slightly higher MSE than the models with smallest RMSE values. When the data are generated from the mixture models, the DPM models perform better than the parametric models both in terms of prediction bias and prediction RMSE. In case of Design (4), GPDPM model performs better at the tails while TPDPM and STPDPM performs slightly better for the bulk. For Design (5), TPDPM and STPDPM perform better than GPDPM and for Design (6), STPDPM perform the best followed by GPDPM. Similar ordering is observed for the prediction RMSE of the 0.98th and 0.99th quantiles as well. Overall, STPDPM performs equally well or better than other models, both in terms of prediction bias and prediction RMSE as well as estimation of irrespective of the data generating model.
Design  GP  TP  STP  GPDPM  TPDPM  STPDPM 

1  0.45 (0.02)  0.52 (0.02)  0.54 (0.03)  0.43 (0.02)  0.50 (0.02)  0.56 (0.03) 
2  0.83 (0.05)  0.58 (0.03)  0.67 (0.04)  0.90 (0.05)  0.60 (0.03)  0.74 (0.04) 
3  1.75 (0.18)  1.76 (0.09)  1.56 (0.11)  1.61 (0.10)  1.76 (0.09)  1.47 (0.09) 
4  1.96 (0.03)  2.94 (0.05)  4.29 (0.12)  0.44 (0.02)  0.49 (0.02)  0.49 (0.02) 
5  1.75 (0.04)  3.64 (0.11)  11.66 (0.47)  0.66 (0.03)  0.63 (0.02)  0.63 (0.03) 
6  2.61 (0.11)  4.93 (0.24)  20.52 (0.86)  1.40 (0.08)  1.53 (0.08)  1.20 (0.08) 
Comparison of models GP, TP, STP, GPDPM, TPDPM and STPDPM based on the prediction RMSE of the 0.95th quantile when the data are generated from the Simulation Designs 16. The standard errors are in the parentheses.
5 Data application
Southern California is susceptible to catastrophic wildfires which are often caused by Santa Ana winds. Santa Ana winds occur mainly in the counties Ventura, Los Angeles, Orange, Santa Bernardino, Riverside and San Diego. During the late fall and winter, the Santa Ana winds originate from the Great Basin and heat up as they cross the mountains, move towards the coast due to offshore surface pressure gradients and often lead to wildfires (Raphael, 2003). This phenomenon is most common in December (Hughes and Hall, 2010). For example, the Thomas fire in December, 2017 is associated with the Santa Ana winds and is considered to be the largest wildfire in the modern history of California (https://inciweb.nwcg.gov/incident/5670). An understanding of the future risk of wildfires is required for improved disaster management. As the whole Santa Ana region is a small geographic domain, the fire risk is likely to be high throughout the region on a particular day of extreme weather. Therefore, models for analyzing fire risk in this region should be capable of exhibiting extremal spatial dependence.
Fosberg Fire Weather Index (FFWI) is a wellestablished measure that quantifies the potential influence of important weather parameters on fire risk (Fosberg, 1978). It is a nonlinear function of air temperature, wind speed and relative humidity (the functional form of the filter is provided in the SM Appendix J). The National Oceanic and Atmospheric Administration (NOAA) considers FFWI larger than 50 to be significant on a national scale. The Storm Prediction Center (SPC) fire weather verification scheme (http://www.spc.noaa.gov) uses FFWI for fire danger rating ranging between high to extremes. Disaster management policies often consider moderatetoextreme quantiles of weather parameters and hence require modeling of the bulk as well as the tail (Dey and Yan, 2016) of the FFWI observations using a proper spatiotemporal model.
5.1 Fosberg Fire Weather Index data
Our data consists of hourly FFWI observations from 61 Remotely Allocated Monitoring Stations (RAWS) across the Santa Ana region from December 31, 2003 through December 31, 2013. More details about RAWS are available at https://www.wrh.noaa.gov/sto/obsmap.php. Based on the definition of firezones of the Santa Ana region by Rolinski et al. (2016), the whole region is divided into three zones with Zone 1 comprises parts of Ventura and Los Angeles Counties, Zone 2 covers the parts of Orange, San Bernardino and Riverside Counties while Zone 3 corresponds to most of the regions within the San Diego County. While the index is not truncated at 100 in general, e.g., Roads et al. (1991), Sapsis et al. (2016), some authors consider truncation at 100, considering it as a threshold for the extreme fire situation (Kambezidis and Kalliampakos, 2016). In this paper, we use the raw index without truncation. As a first step for data preprocessing, we consider daily maximum of the hourly observations. This step follows the protocols of National Fire Danger Rating System (NFDRS) where daily maximums are considered to be the representative of the combined typical active burning period conditions. Second, we consider only the observations in December as the probability of large katabatic forcing due to strong temperature gradient, the source of the Santa Ana winds, is highest in December (Hughes and Hall, 2010). Third, as mentioned by Sapsis et al. (2016), even if colder conditions support fire growth in some areas of the United States, it is not true in California and so following Sapsis et al. (2016), we discard the data points where the recorded air temperature is less than 50F for model fitting and prediction.
The plot of the observed FFWI values at 61 RAWS corresponding to the day of Shekell Fire (December 3, 2006) in Ventura county is provided in the left panel of Figure 3. Large values of FFWI (values of 50 or more) across the stations indicate the potential influence of the weather variables on fire risk. In the right panel of Figure 3, the time series plot of FFWI (only December) at a representative station, Big Pines, is provided. The time series appears to be fairly stationary across the years. Similar stationary pattern is observed for most of the stations.
To motivate the need for a model with spatial and temporal extremal dependence, we compute empirical estimates of the extremal dependence in space and time. The measure can be estimated empirically using Fmadogram (Cooley et al., 2006) where we estimate the Fmadogram based on replications of and and the corresponding empirical distribution functions and the relation . For calculating the spatial extremal dependence, first we decorrelate the time series at each station by fitting a AR(1) model and treating the residuals as independent replications of the spatial process, we calculate the measure for each pair of stations. Further, we obtain smooth versus curve by fitting a nonparametric regression model with the pairwise measures as responses and the distances between the stations as predictors. For each station , we estimate the temporal extremal dependence and further, averaging over , we estimate .
The spatial extremal dependence and spatially averaged temporal extremal dependence obtained as described above are provided in Figure 7 considering the whole Santa Ana region as well as corresponding to different firezones defined by Rolinski et al. (2016). From Figure 7, it appears that varies between 0.3 and 0.7 while varies between 0.1 and 0.4 and varies more than across the zones. Due to the nonzero spatial and temporal extremal dependence, inferences based on models having no asymptotic dependence like GPs are questionable.
SM Appendix K contains additional exploratory analysis, including plots that show different spatial trends for sample quantiles at different quantile levels and a test based on mixture models that suggests the data are nonGaussian. These analyses motivate the need for a nonGaussian model with separate parameters in the bulk and tail.
5.2 Dataspecific adjustments
As observed in Figure 7, the spatial extremal dependence is likely to be different for different fire zones. Considering the STPDPM model, the extremal dependence is stationary and the same for all zones (Remark 3). While depends on the parameters ’s, ’s and ’s in (2.7), allowing different ’s and ’s for different fire zones would be difficult to implement in the model though allowing separate ’s for each firezone is simple and the computation allows Gibbs sampling. Thus, allowing three separate ’s for three firezones along with STPDPM model in (2.2), we call the model STPDPMZ.
The isotropic extremal dependence of the STPDPM might be questionable considering the specific direction of wind through the mountain passes, variation of altitude and distance from the Pacific ocean. Due to the increased computational burden as well as the stability issue mentioned in Remark 3, we consider anisotropic Matérn covariance function similar to Haskard et al. (2007) for ’s. Instead of using Euclidean distance between two points in (2.2), following Eriksson and Siska (2000), we use the distance defined by in (2.2) where and and are two additional parameters treated as unknown with and . Thus, in (2.2), the covariance matrices are described by and for each . We call this model STPDPMA. Furthermore, by accounting for both the zonal and anisotropic components, we consider the model STPDPMAZ.
5.3 Model comparisons
The model STPDPM is assumed to have timedependence as described in Subsection 2.4. The AR1 structure is considered for both and in case of STPDPM and thus the model TPDPM has time dependence only through . In contrast, GPDPM has no timedependence as are nonrandom in that case. Considering parametric alternatives, AR1 structure of STP and TP are constructed similar to STPDPM and TPDPM models respectively. For GP, we consider the spatial error process to have AR1 correlation structure in time. In addition to the timedependent STPDPM model with its three versions, also the timeindependent STPDPM models are considered. GP is taken as reference and all the alternatives are assessed in terms of relative performance in high level quantile estimation using a leaveonesiteout crossvalidation. The spatiallyinvariant parameters are estimated only once based on the full data for each model. For a model , the prediction RMSE for a quantile is calculated as where and denote the CDF of the the posterior predictive distribution and the empirical CDF at site respectively. The RMSE skill score for model is defined as
(5.1) 
where a model with higher value of is preferred.
5.4 Results
The results are reported in Table 3. All entries are greater than one and generally increase with the quantile level (), indicating that all models outperform GP especially in the tails. Among the parametric models, STP has higher values compared to TP. When we ignore the temporal dependence, for the STPDPM models are always smaller than a parametric STP model with temporal dependence indicating the importance to consider temporal extremal dependence structure. The STPDPM with temporal extremal dependence yields smaller value than TPDPM except for the 0.98th quantile but adding both the zonespecific skewness terms and anisotropic covariance structure increases for all the quantiles with the highest improvement is 13.25% in case of the 0.97th quantile.
Model  

TP  3.57  3.99  4.75  5.71  6.61  6.55  6.54 
STP  5.16  5.83  5.92  7.63  7.94  8.85  8.43 
GPDPM  7.81  8.43  9.12  10.74  10.10  10.53  9.21 
TPDPM  6.93  7.57  9.03  10.44  10.71  11.31  10.45 
Timeindep. cases  
STPDPM  1.69  2.62  4.00  4.80  5.41  6.59  6.90 
STPDPMZ  2.95  3.12  3.89  5.01  5.64  7.29  8.23 
STPDPMA  3.15  3.76  4.93  6.01  6.46  7.73  7.62 
STPDPMAZ  3.30  4.29  5.55  6.18  6.84  7.33  7.81 
AR1 cases  
STPDPM  6.84  7.54  8.57  9.38  10.29  10.88  11.26 
STPDPMZ  4.29  4.67  5.56  6.84  7.83  8.32  9.41 
STPDPMA  7.36  7.52  8.28  9.65  10.55  11.15  10.67 
STPDPMAZ  9.43  10.22  11.48  12.03  12.51  13.25  11.80 
A year return level is calculated as th quantile of the posterior predictive distribution considering December only. We compare the spatial prediction capabilities of the four timedependent STPDPM models along with two parametric alternatives GP and STP in Figure 5 based on the 1year return levels. For all the models, FFWI values are higher in Zone 1 compared to the other zones. For GP, the estimated return levels are smaller compared to all other models throughout the region. For all the models, FFWI values are high near Santa Clarita which is expected as the heat waves move towards these regions through Soledad pass. For the STPDPMAZ model, the values are highest between Santa Monica and Santa Clarita and also the values are higher than the estimates based on other models. While other models provide smoother estimates, the estimates based on STPDPMAZ model is less spatially smooth which is more realistic considering the geography of the Santa Ana region.
6 Discussions and Conclusions
In this paper, we propose a very flexible semiparametric Bayesian model for spatiotemporal data that can model the bulk as well as the tail. The model automatically clusters the temporal replications (possibly dependent across time) and sets the extreme observations into one component with the thickest tail. Allowing separate parameters for the mixture components, we hope to prevent the data from the bulk to influence the parameters of the component of extremes. Considering infinite components, the model spans all possible densities over the spatial domain of interest and the model has nonstationary mean and covariance structure similar to Gelfand et al. (2005). Allowing isotropic covariance functions for the error process within each component, the extremal dependence measure is also isotropic. As the data is sparse for the component of extremes, a isotropic covariance function is justified from the theoretical and computational perspective. Using a simulation study, we demonstrate the performance of our method in predicting quantiles at the test sites. In case of daily measurements, ignoring temporal dependence can affect the model performance and hence we consider temporal extremal dependence.
The proposed model can be further generalized. As a shortcoming of our model, the spatial extremal dependence is nonzero throughout the entire spatial domain of interest. Thus, in case of a large spatial domain like the entire United States, a random partitioning the spatial domain is required as considered by Morris et al. (2017). Instead of considering specific random mean and scale structure of the Gaussian process to obtain a skew process, Huser et al. (2017) consider a more general class of scale mixtures that allows both asymptotic dependence and independence. The temporal extremal dependence structure we consider does not allow Gibbs sampling for a number of model parameters and hence any existence of stationary gamma and halfnormal processes that allow Gibbs sampling can lead to computational advantages.
Acknowledgement
The authors thank Tim Brown at the Desert Research Institute for providing the data, David Sapsis at CalFIRE for suggesting the analysis of FFWI and Raphaël Huser at KAUST for some valuable suggestions regarding the methodology. This work was partially supported by NSF grants DMS1752280 and DMS0454942, NOAA grant Z1720337, DOE grant DEAC0205CH11231, DOI grant 141049 and NIH grants R01ES027892 and 5P01 CA14253809.
References
 Azzalini and Capitanio (2003) A. Azzalini and A. Capitanio. Distributions generated by perturbation of symmetry with emphasis on a multivariate skew tdistribution. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 65(2):367–389, 2003.
 Azzalini and Capitanio (2014) A. Azzalini and A. Capitanio. The skewnormal and related families. Institute of Mathematical Statistics Monographs, 2014.

Beal et al. (2002)
M. J. Beal, Z. Ghahramani, and C. E. Rasmussen.
The infinite hidden Markov model.
Advances in Neural Information Processing Systems, 2002.  Behrens et al. (2004) Cibele N. Behrens, Hedibert F. Lopes, and Dani Gamerman. Bayesian analysis of extreme events with threshold estimation. Stat. Model., 4(3):227–244, 2004. ISSN 1471082X. doi: 10.1191/1471082X04st075oa. URL https://doi.org/10.1191/1471082X04st075oa.
 Bortot (2010) Paola Bortot. Tail dependence in bivariate skewnormal and skewt distributions. Available online: www2. stat. unibo. it/bortot/ricerca/papersn2. pdf, 2010.
 Carreau and Bengio (2009) Julie Carreau and Yoshua Bengio. A hybrid Pareto model for asymmetric fattailed data: the univariate case. Extremes, 12(1):53–76, 2009. ISSN 13861999. doi: 10.1007/s1068700800680. URL https://doi.org/10.1007/s1068700800680.
 Cooley et al. (2006) D. Cooley, P. Naveau, and P. Poncet. Variograms for spatial maxstable random fields. In Dependence in Probability and Statistics, pages 373–390. Springer, 2006.
 Davison and Huser (2015) A. C. Davison and R. Huser. Statistics of extremes. Annual Review of Statistics and its Application, 2:203–235, 2015.
 Davison et al. (2012) A. C. Davison, S. A. Padoan, and M. Ribatet. Statistical modeling of spatial extremes. Statistical Science, 27(2):161–186, 2012.
 Davison et al. (2013) A. C. Davison, R. Huser, and E. Thibaud. Geostatistics of dependent and asymptotically independent extremes. Mathematical Geosciences, 45(5):511–529, 2013.
 Dey and Yan (2016) D. K. Dey and J. Yan. Extreme value modeling and risk analysis: methods and applications. CRC Press, 2016.
 do Nascimento et al. (2012) Fernando Ferraz do Nascimento, Dani Gamerman, and Hedibert Freitas Lopes. A semiparametric Bayesian approach to extreme value estimation. Stat. Comput., 22(2):661–675, 2012. ISSN 09603174. doi: 10.1007/s112220119270z. URL https://doi.org/10.1007/s112220119270z.
 Eriksson and Siska (2000) M. Eriksson and P. P. Siska. Understanding anisotropy computations. Mathematical Geology, 32(6):683–700, 2000.
 Ferguson (1973) T. S. Ferguson. A Bayesian analysis of some nonparametric problems. The Annals of Statistics, pages 209–230, 1973.
 Ferguson (1974) T. S. Ferguson. Prior distributions on spaces of probability measures. The Annals of Statistics, pages 615–629, 1974.
 Ferreira et al. (2014) A. Ferreira, L. De Haan, et al. The generalized Pareto process; with a view towards application and simulation. Bernoulli, 20(4):1717–1737, 2014.
 Fosberg (1978) M. A. Fosberg. Weather in wildland fire management: The fireweather index. Paper presented at the Conference on Sierra Nevada Meteorology, Am. Meteorol. Soc., South Lake Tahoe, California, 1978.
 Fox et al. (2011) E. B. Fox, E. B. Sudderth, M. I. Jordan, and A. S. Willsky. A sticky HDPHMM with application to speaker diarization. The Annals of Applied Statistics, pages 1020–1056, 2011.
 Fraley and Raftery (2006) C. Fraley and A. E. Raftery. Mclust version 3: an R package for normal mixture modeling and modelbased clustering. Technical report, Washington, University of Seattle, Department of Statistics, 2006.
 Fuentes et al. (2013) M. Fuentes, J. Henry, and B. J. Reich. Nonparametric spatial models for extremes: Application to extreme temperature data. Extremes, 16(1):75–101, 2013.
 Fúquene Patiño (2015) Jairo Alberto Fúquene Patiño. A semiparametric Bayesian extreme value model using a Dirichlet process mixture of gamma densities. J. of Applied Stat., 42(2):267–280, 2015. doi: 10.1080/02664763.2014.947357.
 Gelfand and Schliep (2016) A. E. Gelfand and E. M. Schliep. Spatial statistics and Gaussian processes: A beautiful marriage. Spatial Statistics, 18:86–104, 2016.
 Gelfand et al. (2005) A. E. Gelfand, A. Kottas, and S. N. MacEachern. Bayesian nonparametric spatial modeling with Dirichlet process mixing. Journal of the American Statistical Association, 100(471):1021–1035, 2005.
 Genest and Nešlehová (2012) C. Genest and J. Nešlehová. Copulas and copula models. Encyclopedia of Environmetrics, 2012.

Ghosal and van der Vaart (2017)
S. Ghosal and A. van der Vaart.
Fundamentals of nonparametric Bayesian inference
, volume 44. Cambridge University Press, 2017.  Haskard et al. (2007) K. A. Haskard, B. R. Cullis, and A. P. Verbyla. Anisotropic Matérn correlation and spatial prediction using REML. Journal of Agricultural, Biological, and Environmental Statistics, 12(2):147, 2007.

Hu and Scarrott (2018)
Yang Hu and Carl Scarrott.
evmix: An R package for extreme value mixture modeling, threshold estimation and boundary corrected kernel density estimation.
J. Stat. Software, 84(5):1–27, 2018. doi: 10.18637/jss.v084.i05. URL https://doi.org/10.18637/jss.v084.i05.  Hughes and Hall (2010) M. Hughes and A. Hall. Local and synoptic mechanisms causing Southern California’s Santa Ana winds. Climate Dynamics, 34(6):847–857, 2010.
 Huser et al. (2017) R. Huser, T. Opitz, and E. Thibaud. Bridging asymptotic independence and dependence in spatial extremes using gaussian scale mixtures. Spatial Statistics, 21:166–186, 2017.
 Kambezidis and Kalliampakos (2016) H. D. Kambezidis and G. K. Kalliampakos. FireRisk Assessment in Northern Greece using a modified Fosberg FireWeather Index that includes forest coverage. International Journal of Atmospheric Sciences, 2016.
 Krupskii et al. (2018) P. Krupskii, R. Huser, and M. G. Genton. Factor copula models for replicated spatial data. Journal of the American Statistical Association, 113(521):467–479, 2018.
 MacDonald et al. (2011) A. MacDonald, C. J. Scarrott, D. Lee, B. Darlow, M. Reale, and G. Russell. A flexible extreme value mixture model. Comput. Statist. Data Anal., 55(6):2137–2157, 2011. ISSN 01679473. doi: 10.1016/j.csda.2011.01.005. URL https://doi.org/10.1016/j.csda.2011.01.005.
 Mathieu (2013) R. Mathieu. Spatial extremes: Maxstable processes at work. Journal de la Société Française de Statistique, 154(2):156–177, 2013.
 Morris et al. (2017) S. A. Morris, B. J. Reich, E. Thibaud, and D. Cooley. A spacetime skewt model for threshold exceedances. Biometrics, 2017.

Padoan (2011)
S. A. Padoan.
Multivariate extreme models based on underlying skewt and
skewnormal distributions.
Journal of Multivariate Analysis
, 102(5):977–991, 2011.  Padoan et al. (2010) S. A. Padoan, M. Ribatet, and S. A. Sisson. Likelihoodbased inference for maxstable processes. Journal of the American Statistical Association, 105(489):263–277, 2010.
 Raphael (2003) M. N. Raphael. The Santa Ana winds of California. Earth Interactions, 7(8):1–13, 2003.
 Reich and Fuentes (2015) B. J. Reich and M. Fuentes. Spatial Bayesian nonparametric methods. In Nonparametric Bayesian Inference in Biostatistics, pages 347–357. Springer, 2015.
 Reich and Shaby (2012) B. J. Reich and B. A. Shaby. A hierarchical maxstable spatial model for extreme precipitation. The Annals of Applied Statistics, 6(4):1430, 2012.
 Ribatet and Sedki (2013) M. Ribatet and M. Sedki. Extreme value copulas and maxstable processes. Journal de la Société Française de Statistique, 154(1):138–150, 2013.
 Roads et al. (1991) J. O. Roads, K. Ueyoshi, S. C. Chen, J. Alpert, and F. Fujioka. Mediumrange fire weather forecasts. International Journal of Wildland Fire, 1(3):159–176, 1991.
 Rolinski et al. (2016) T. Rolinski, S. B. Capps, R. G. Fovell, Y. Cao, B. J. D’Agostino, and S. Vanderburg. The Santa Ana wildfire threat index: methodology and operational implementation. Weather and Forecasting, 31(6):1881–1897, 2016.
 Sang and Gelfand (2009) H. Sang and A. E. Gelfand. Hierarchical modeling for extreme values observed over space and time. Environmental and Ecological Statistics, 16(3):407–426, 2009.
 Sang and Gelfand (2010) H. Sang and A. E. Gelfand. Continuous spatial process models for spatial extreme values. Journal of Agricultural, Biological, and Environmental Statistics, 15(1):49–65, 2010.
 Sapsis et al. (2016) D. Sapsis, T. Brown, C. Low, M. Moritz, D. Saah, and B. Shaby. Mapping Environmental Influences on Utility Fire Threat. A Report to the California Public Utilities Commission Pursuant to R.08–11005 and R.1505006, 2016.
 Schmidt et al. (2017) A. M. Schmidt, K. Gonçalves, and P. L. Velozo. Spatiotemporal models for skewed processes. Environmetrics, 28(6), 2017.
 Sethuraman (1994) J. Sethuraman. A constructive definition of Dirichlet priors. Statistica Sinica, pages 639–650, 1994.
 Shaby et al. (2016) Benjamin A. Shaby, Brian J. Reich, Daniel Cooley, and Cari G. Kaufman. A Markovswitching model for heat waves. The Annals of Applied Statistics, 10(1):74–93, 2016. ISSN 19326157. doi: 10.1214/15AOAS873. URL https://doi.org/10.1214/15AOAS873.
 Sibuya (1960) M. Sibuya. Bivariate extreme statistics, i. Annals of the Institute of Statistical Mathematics, 11(3):195–210, 1960.
 Storlie et al. (2014) C. Storlie, J. Sexton, S. Pakin, M. Lang, B. Reich, and W. Rust. Modeling and predicting power consumption of high performance computing jobs. arXiv preprint arXiv:1412.5247, 2014.
 Thibaud et al. (2016) E. Thibaud, J. Aalto, D. S. Cooley, A. C Davison, J. Heikkinen, et al. Bayesian inference for the Brown–Resnick process, with an application to extreme low temperatures. The Annals of Applied Statistics, 10(4):2303–2324, 2016.
 Turkman et al. (2010) K. F. Turkman, M. A. A. Turkman, and J. M. Pereira. Asymptotic models and inference for extremes of spatiotemporal data. Extremes, 13(4):375–397, 2010.
 Wadsworth and Tawn (2013) J. L. Wadsworth and J. A. Tawn. Efficient inference for spatial extreme value processes associated to logGaussian random functions. Biometrika, 101(1):1–15, 2013.
 Wadsworth et al. (2010) J. L. Wadsworth, J. A. Tawn, and P. Jonathan. Accounting for choice of measurement scale in extreme value modeling. The Annals of Applied Statistics, pages 1558–1578, 2010.
 Wu and Ghosal (2010) Y. Wu and S. Ghosal. The L1consistency of Dirichlet mixtures in multivariate Bayesian density estimation. Journal of Multivariate Analysis, 101(10):2411–2419, 2010.
Appendix A: Marginal and joint distributions of a Skewt process
The densities of univariate and multivariate skew distributions are provided in the following.
Univariate skew distribution
We call to follow a univariate skew distribution with parameters if with and . The density function of is given by
where and
are density and cumulative distribution functions (CDF) of univariate Student’s
distribution (location = 0 and scale = 1) with degrees of freedom.Multivariate skew distribution
We call to follow a variate skew distribution with parameters if with and . The density function of is given by
where with , is the density function of variate Student’s distribution with location , shape matrix and degrees of freedom and is the CDF of univariate Student’s distribution (location = 0 and scale = 1) with degrees of freedom. The matrix is given by .
Appendix B: GEVlog transformation
In extreme value analysis, an observation obtained from some block maximum is assumed to be distributed as a generalized extreme value (GEV) distribution with CDF given by where
The location, scale and shape parameters are and respectively. These three parameters jointly determine the support of ; if , if and if . We denote .
The flexible Bayesian nonparametrics (BNP) tools are easier to implement if the support of is assumed to be the whole real line. Thus, we consider the transformation so that we can easily implement the BNP tools along with the observations have a more generalized support. By GEV transformation, we consider so that . The support of is . Further we consider a log transformation to obtain so that has support over the whole real line. Finally we apply the BNP tools over . Additional to flexible support, the transformation can add skewness for simple models like Gaussian processes. Though the transformation does not lead to spatial extremal dependence for when is asymptotically independent. Hence, the skew process is required in spite of a transformed Gaussian process being skewed and the marginal distributions having thicker tail. As an illustration, we consider three transformations with a standard normal distribution in Figure 7.
Here we consider and transform to obtain . We fix the location and scale parameters by and . When , the density of and coincide. When , the density of is bounded below and have thicker right tail compared to . When , the density of is bounded above and have thicker left tail compared to .
Appendix C: Proof of Remark 2
We want to show that the proposed STPDPM model covers the class of squareintegrable processes with continuous mean and covariance functions defined on the domain . By setting , and for each (thus, has discrete support ’s), the proposed STPDPM model has and given the cluster parameters and the mixture probabilities . Specifically, consider a generic process with continuous mean function and continuous covariance function
Comments
There are no comments yet.