Hierarchical Transformed Scale Mixtures for Flexible Modeling of Spatial Extremes on Datasets with Many Locations

by   Likun Zhang, et al.

Flexible spatial models that allow transitions between tail dependence classes have recently appeared in the literature. However, inference for these models is computationally prohibitive, even in moderate dimensions, due to the necessity of repeatedly evaluating the multivariate Gaussian distribution function. In this work, we attempt to achieve truly high-dimensional inference for extremes of spatial processes, while retaining the desirable flexibility in the tail dependence structure, by modifying an established model based on a Gaussian scale mixture. We show in detail that the desired extremal dependence properties from the original model are preserved in the modified model, and we demonstrate that the corresponding Bayesian hierarchical model does not involve the expensive computation of multivariate Gaussian distribution function. We fit our model to exceedances of a high threshold, and perform coverage analyses and cross-model checks to validate its ability to capture different types of tail characteristics. We use a standard adaptive Metropolis algorithm for model fitting, and further accelerate the computation via parallelization and Rcpp. Lastly, we apply the model to a dataset of a fire threat index on the Great Plains region of the US, which is vulnerable to massively destructive wildfires. We find that the joint tail of the fire threat index exhibits a decaying dependence structure that cannot be captured by limiting extreme value models.



There are no comments yet.


page 36


Modeling spatial extremes using normal mean-variance mixtures

Classical models for multivariate or spatial extremes are mainly based u...

A Vecchia Approximation for High-Dimensional Gaussian Cumulative Distribution Functions Arising from Spatial Data

We introduce an approach to quickly and accurately approximate the cumul...

A Multi-Resolution Spatial Model for Large Datasets Based on the Skew-t Distribution

Large, non-Gaussian spatial datasets pose a considerable modeling challe...

Outer power transformations of hierarchical Archimedean copulas: Construction, sampling and estimation

A large number of commonly used parametric Archimedean copula (AC) famil...

Projecting Flood-Inducing Precipitation with a Bayesian Analogue Model

The hazard of pluvial flooding is largely influenced by the spatial and ...

Dependence Modeling in Ultra High Dimensions with Vine Copulas and the Graphical Lasso

To model high dimensional data, Gaussian methods are widely used since t...

A Model-Free Selection Criterion For The Mixing Coefficient Of Spatial Max-Mixture Models

One of the main concerns in extreme value theory is to quantify the depe...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

Modeling the dependence structure in the extremes of spatial processes is of great consequence for risk analysis of extreme events. In this paper, we make a slight alteration to the flexible model proposed by Huser and Wadsworth (2018) based on a scaled transformed Gaussian process to effectively increase scalability and enable high-dimensional inference, while preserving submodels that transition smoothly between dependence classes.

Generally, the probability that two spatially-indexed random variables exceed a high level simultaneously varies by their separation distance, and the particular way in which this joint probability decays must be well-represented in models if one hopes to accurately to assess risks posed by spatial extremal phenomena. As an example, figure

1 shows the spring maximum Fosberg Fire Weather Index (FFWI) values for the years 1974–2015 at four weather stations in the Great Plains region of the US (see section 6 for more details on the dataset). For two stations that are relatively close (left panel, ), there is strong positive dependence in the spring maximum FFWI values. By contrast, the process exhibits little evident dependence among the maxima at locations spaced far apart (right panel,

). Good estimation of how the dependence changes with both distance in space and distance into the joint tail will enable us to accurately calculate exceedance probabilities of areal quantities, predict at un-observed locations, and, secondarily, get a more realistic picture of marginal quantities.

Figure 1: Spring maximum FFWI values for 1974–2015. The distances between stations are around and respectively.

In classical spatial modeling, Gaussian processes have been widely used due to their mathematical simplicity and tractability for larger datasets. However, the Gaussian density function is very light-tailed, and thus tends to underestimate probabilities associated with extreme events; furthermore, Gaussian models stipulate that the dependence among rare events at distinct locations will always diminish as the events become more extreme, eventually becoming independent in the limit. Instead of using a Gaussian process, which has the asymptotic independence property (see section 2.1 for definition), it is often more realistic to use a model that possesses stronger tail dependence when analyzing the joint occurrence of extremes in space.

Max-stable processes form an important class of models that exhibit dependence in the asymptotic tail. They are the natural extension of classical univariate extreme value theory to infinite dimensional settings. Specifically, consider a stochastic process with continuous sample paths, where . Then its limiting process is a max-stable process:


where are independent replicates of , and are sequences of continuous functions, and the limiting process has non-degenerate marginals. Max-stable processes comprise the only possible limits of linearly rescaled pointwise maxima of spatial processes. Although the inference for max-stable models is computationally difficult (Padoan et al., 2010; Huser and Davison, 2013; Castruccio et al., 2016), they nonetheless provide an asymptotically-justified modeling framework for datasets consisting of block-maxima. Counterparts of max-stable processes suitable for threshold exceedances are called generalized Pareto processes (Thibaud and Opitz, 2015). These processes are also asymptotically dependent, and have the advantage that they bypass many of the computational difficulties of max-stable processes.

Despite the theoretical appeal of limiting max-stable and generalized Pareto processes, incorrectly assuming asymptotic dependence can be problematic because if the true data generating process exhibits weakening dependence in the un-observed region of the tail, inference about the far joint tail will over-estimate risk, sometimes substantially. This problem is difficult to diagnose using exploratory analysis because asymptotically independent processes can exhibit spatial dependence in the far, but sub-asymptotic, tail, possibly leading one to incorrectly conclude that an asymptotically dependent model is necessary. Since tail inferences tend to be highly uncertain, it might not be so clear whether the process is truly asymptotically dependent or independent based on analysis of data that is necessarily sub-asymptotic.

Moreover, rare events sometimes become more spatially localized as the extremeness intensifies. Both max-stable processes and generalized Pareto processes will fail to capture this feature because they have stable dependence structure, no matter how extreme the events are. Up to a location-scale transformation, max-stable processes are invariant to taking component-wise maxima, while generalized Pareto processes are invariant to taking conditional exceedances over increasingly higher thresholds. This lack of flexibility in the joint tail may lead to disastrous over- or underestimation of the probabilities of the simultaneous occurrence of rare events, which in return causes incorrect extrapolation.

On account of the limitations of the existing limiting models, it is desirable to find a family of spatial models that can transition smoothly between asymptotic dependence and asymptotic independence. Section 2.2 gives an overview of such hybrid spatial extreme models in the literature. In particular, we will be examining the marginally transformed Gaussian scale mixture model proposed by Huser and Wadsworth (2018)

. This model is of great interest due to its appealing theoretical properties and ability to capture the dependence structure flexibly. Unfortunately, inference for the model is not feasible in higher dimensions, as calculation of the censored likelihood, the preferred method for fitting joint tail models, entails integration over high-dimensional multivariate Gaussian distribution functions. To increase scalability, we propose an adaptation of the model by adding an independent measurement error term to each component. By adding this nugget effect, the new model circumvents the lengthy computation of the multivariate normal distribution function. Also it can avoid the integral of the process below the censoring threshold elegantly by considering the uncensored process as latent and drawing from it using Gibbs sampling, allowing for truly high-dimensional inference. Furthermore, we show that the model retains all the significant asymptotic properties of the original smooth model despite the presence of the measurement errors, which lays a solid theoretical foundation for correctly capturing the sub-asymptotic dependence behavior.

The article is organized as follows. Section 2 provides a brief literature review on the measures of extremal dependence and hybrid spatial extreme models, and further explains the intractability of the existing censored likelihood approaches to inference on these hybrid models. Section 3 describes our new model that alleviates the computational problems, and studies its extremal dependence properties. Section 4 includes a marginal transformation in the hierarchical model and details the inference using Gibbs sampling. Section 5 presents a simulation study that validates the methodology. We apply our model to a dataset of the fire threat index FFWI on the Great Plains in section 6. Section 7 concludes with some discussion. Appendix A provides detailed proofs of all the theoretical results.

2 Motivation

For a stochastic process , we write and so forth for simplicity, where denotes the th spatial location. When the process is observed at

, the joint distribution for models of the asymptotic tail can written as the following form, assuming unit Fréchet margins:


where is called the spectral measure which characterizes the dependence structure of . It is easy to verify that corresponds to being completely dependent, while indicates complete independence. For max-stable processes, is connected to the Picklands dependence function (de Haan and Resnick, 1977; Pickands, 1981).

However, there is usually no simple parametric representation for

, and thus it is difficult to make inference from data. Likelihood inference for the few parametric families in use tend to be intractable for even moderately-sized datasets because they suffer from the curse of dimensionality.

2.1 Spatial Dependence for Extremes

To concisely summarize the extremal dependence implied by these limiting models, it is useful to restrict the scope to the bivariate case. One example of a bivariate dependence measure is called upper tail dependence:


where , , and . Joe (1993) defined the upper tail dependence parameter as the limit . Asymptotic dependence is attained if and only if , while gives asymptotic independence.

For max-stable processes, is a constant function of as a consequence of the homogeneity of , resulting in . Generalized Pareto processes also have for all above a certain level (see Rootzén et al., 2018). This is why the dependence structures of max-stable processes or generalized Pareto processes are stable with respect to level of extremeness.

When , i.e. the case of asymptotic independence, alternative measures are needed to provide information about the dependence in the tail because provides no discriminating detail. In light of this, Ledford and Tawn (1996) introduced the coefficient of tail dependence . More specifically, when the joint survival function satisfies the asymptotic assumption


for large , where is a slowly varying function (see section A.2 for definition), then is called the coefficient of tail dependence for process . The pair of variables are asymptotically dependent when and . The remaining cases are all asymptotic independent, and the value of characterizes the strength of extremal dependence in the joint upper tail. In the case of a Gaussian process, , where is the correlation at lag . The variables are called positively associated when and negatively associated when . Near independence corresponds to .

Gaussian processes are asymptotically independent for all correlations . They might be considered candidates for modeling the joint tail of asymptotically independent phenomena, but they are generally considered are too restrictive because they only allow very weak dependence due to their light tail. Another option for modeling asymptotic spatial phenomena is the class of inverted max-stable processes. Given that the lower tail of a multivariate max-stable distribution is asymptotically independent, Wadsworth and Tawn (2012) proposed the class of inverted max-stable processes that exhibits more dependence than Gaussian processes at finite thresholds. But they pose the same computational challenge as max-stable models. Fuentes et al. (2013) presented a flexible Dirichlet-based copula model to explain extremal dependence. Opitz (2016)

captures spatial dependence in asymptotically independent processes by construction of Laplace random fields, defined as mixtures of Gaussian processes with a random variance that is exponentially distributed.

The aforementioned models improve upon modeling of spatial extremes using max-stable processes by reducing the computational cost via modeling threshold exceedances or introducing more flexibility in the joint upper tail, but they are purely either asymptotically dependent models or asymptotically independent models. It is more desirable to fit spatial models that encompass both dependence classes and formally unify them.

2.2 Traversing Asymptotic Independence and Dependence in Spatial Extremes

Wadsworth and Tawn (2012) were the first to introduce hybrid models that combine max-stable and inverted max-stable processes so that asymptotic dependence prevails at short distances, and asymptotic independence at long distances. However, inference for this model is difficult because there are a fairly large number of parameters involved, and the transition between the dependence classes takes place at the boundary of the parameter space.

Recently, several Gaussian scale mixture models were proposed to allow more flexible transitions between dependence classes. Through multiplying an asymptotically independent Gaussian process by a random effect that governs the extremal dependence, these models can be described by a small number of parameters and have non-trivial asymptotically independent and asymptotically dependent submodels. More precisely, suppose is a standard Gaussian process with covariance function . The Gaussian scale mixture model can be constructed as


where is a link function, and is a random scaling factor that can be interpreted as a constant random process over spatial domain with perfect dependence. Impacting simultaneously the whole domain , heavier tailed induces asymptotic dependence, whereas lighter tailed induces asymptotic independence.

Morris et al. (2017)

uses a new space-time model based on skew-

process, where is a identity function, is an inverse gamma random variable, and

is a Matérn covariance function. On top of the mixture, they added covariate effects and a skew term. Since inverse gamma distribution is heavy tailed, the skew-

process is asymptotically dependent for . Asymptotic independence is achieved only when .

Huser et al. (2017) also used an identity link function, but few assumptions on the random scale were made and more general results on the joint tail decay rates of the mixture processes were provided. A wide class of Weibull-like tail decay in

was proven to yield asymptotic independence, while Pareto-like tail that is regularly varying at infinity gives asymptotic dependence. They also proposed a parametric model that bridges the two asymptotic regimes and provides a smoother transition, in which

is a two-parameter Weibull-type distribution


where , and the support is . Since converges to as approaches 0, (6) forms a continuous parametric family on . When , (6) constitutes a class of Weibull-type distributions and thus assures asymptotic independence. When , the variable is Pareto distributed and thus gives asymptotic dependence. This shows that the model provides greater flexibility and can smoothly transition from asymptotic dependence to independence via adjusting the value of .

However, the previous two Gaussian scale mixture models both make the transition between the dependence classes at the limit or the boundary of the parameter space. They are also inflexible in their representation of asymptotic dependence structures. It may be more desirable to find a model for which the transition takes place in the interior of the parameter space so one we can quantify the uncertainty about the dependence class in a simpler manner. To overcome this, Huser and Wadsworth (2018) proposed a marginally transformed Gaussian scale mixture model, where transforms a standard Gaussian variable to standard Pareto, and itself is Pareto distributed:


Here the type of asymptotic dependence is determined by the value of . When , is light tailed and this induces asymptotic independence; when , the converse is true and this induces asymptotic independence. The analytic forms of the tail dependence coefficients and is also derived (see Lemma 2.1).

Lemma 2.1.
111Corollary 1 in Huser and Wadsworth (2018).

Suppose the pair are two variables observed from process (7). Then the upper tail dependence parameter satisfies

The coefficient of tail dependence for the process (7) is

where is the coefficient of tail dependence for .

The model in (7) provides a smooth transition through asymptotically independent and asymptotically dependent submodels. It has many appealing asymptotic properties. However, calculating the censored likelihood defined in section 2.3, which is the preferred vehicle for inference on joint tail models, requires computing an integral where the integrand contains the Gaussian distribution function in dimensions, where is the number of components below a designated high threshold. This is computationally prohibitive for even moderatly-sized datasets. We now introduce a slight alternation to this model to make it tractable while preserving all the desired asymptotic results.

2.3 The Censored Likelihood

In inference for multivariate and spatial extremes, the preferred approach to fitting the dependence structure is using a censored likelihood, which prevents observations from the bulk of the distribution from affecting the estimation of the extremal dependence structure. It provides a reasonable compromise between bias and variance compared to alternative approaches, although different censoring schemes have been adopted (Thibaud and Opitz, 2015; Huser et al., 2016).

For notational simplicity, we will write for the marginally transformed Gaussian process when no confusion can arise. For (7), one has the representation


when the process is observed at spatial locations . By conditioning on , we derive the distribution function


where , and denotes the -variate Gaussian distribution with zero mean and covariance matrix .

Let be the set of locations with censored observations—that is, the set of locations where the components are below a high threshold; let be the set of locations with uncensored observations. For any index set , denote as the subset of subsetted via the index set , denote as the matrix restricted to the rows in and the columns in , and let denote the Schur complement of in . The likelihood is obtained via taking partial derivatives of (9) with respect to :


Although only one-dimensional integral appears in (10), the integrand is a -dimensional Gaussian distribution function. When approximating the integral using standard quadrature or Monte Carlo methods, one needs to compute the distribution function for each sample point taken on . This is only feasible when the number of locations is moderate. Additionally, if there are multiple time replicates, this calculation will have to be repeated several times.

To avoid the calculation of the multivariate Gaussian distribution function, one could put off integrating the process below the threshold, instead think of as latent and draw from it using random walk Monte Carlo methods. Consequently there is no need to compute the giant likelihood (10

). However, to update the Markov chain each time, it is now necessary to draw

from a high-dimensional truncated distribution, which might again be computationally intensive.

Therefore, we propose to make a slight adjustment to the model in (5). Our new model is markedly more amenable to higher-dimensional inference, yet it keeps hold of the joint tail decay rates observed by Huser and Wadsworth (2018). Equivalently our new model has non-trivial asymptotically dependent and asymptotically independent submodels with the transition taking place in the interior of the parameter space.

3 Model

3.1 Construction

We alter the model in (5) by adding an independent measurement error term to each component


where , and the distribution of and the link function is the same as in (7).

We add a simple nugget effect to the smooth process . When drawing the latent processes below the threshold, we can condition on the smooth process and simply update the noise term . Because these error terms are independent of each other, there is only univariate integral involved in the full conditional likelihood. Also, when we update the smooth process given the noisy process , no truncation or censoring is present and it would be much easier to sample from the corresponding likelihood. See section 4 for more details on the Markov Chain Monte Carlo (MCMC) updating scheme and see how this small alternation can largely facilitate inference.

3.2 Dependence Properties

Recall is a stationary process with standard Pareto margins, and it shows asymptotic independence; i.e. for large ,


where is a slowly varying function, and unless the correlation for .

Figure 2 illustrates the coefficient of tail dependence as a function of for . For each combination of and , we generate 100,000 replicates from model (5) (i.e. ) and model (11) with respectively. For each replicate, we sample with a Gaussian copula with correlation . We then numerically approximate the joint survival probability in (4) to obtain an estimation on . The left panel of figure 2 clearly shows that the smooth transition from asymptotic independence to asymptotic dependence takes place at , confirming the results from Lemma 2.1; the right panel shows that adding a measurement error has little effect on the tail dependence because exhibits similar behavior. This result prompted us to examine whether the flexible asymptotic properties in Huser and Wadsworth (2018) are preserved in the altered model. It would be advantageous if they are and we can achieve computational gain at the same time. The results are described in section 3.2.1 - 3.2.3, and the proofs are deferred to Appendix A.

Figure 2: Coefficient of tail dependence approximated for the smooth Gaussian scale mixture processes () and the noisy processes () as a function of for . The levels of dependence are similar for two models.

3.2.1 Marginal distribution

The marginal distribution of the smooth process (5) can be derived as follows (see equation (9) in Huser and Wadsworth (2018)):

The case can be established as the limit of the survival function above:

To obtain the marginal distribution for the noisy process , taking a convolution of and will suffice. However, the margins of are no longer available in closed form; see Proposition A.1. But using the fact that Gaussian distribution is considerably lighter-tailed than , Theorem 3.1 demonstrates that we can approximate the convolved distribution function in the far tail nonetheless.

Theorem 3.1.

See Appendix A.1. ∎

This shows that the nugget effect on the marginal tail behavior is negligible, regardless of the error variance .

3.2.2 Joint distribution

We now study the joint survival function of a pair of variable from the noisy process (11). Combining the results from (13), we can calculate the coefficients and that outline the tail dependence of as a function of .

Theorem 3.2.

Denote , and , , where . Then we have , and

where is a slowly varying function, and .


See Appendix A.2

Together theorem 3.1 and 3.2 tell us that and To find the coefficient of tail dependence as defined in (4), it is apparent that will have the same value as in Lemma 2.1, and the same goes for dependence parameter .

Proposition 3.3.

The coefficient of tail dependence for the process (11) is


where is the coefficient of tail dependence for . Moreover, if , the pair is asymptotically dependent with


If , the pair is asymptotically independent with .


See Appendix A.3. ∎


The -dimensional version of , , equals

which can be demonstrated analogously using the same techniques as for the bivariate version. Furthermore, let and denote -dimensional equivalents of the coefficient of tail dependence, and similar summaries still hold for -dimension simply by replacing and with and .

This shows that the modified model still provides non-trivial asymptotically dependent and asymptotically independent submodels with the transition taking place smoothly in the interior of the parameter space. The case marks the critical shift between asymptotic dependence and asymptotic independence. According to Proposition 3.3, we have asymptotic independence (), while the coefficient of tail dependence reaches the boundary value of 1. Equivalently we have as from (4). When , we have and ; that is, when the value of is small, is lighter tailed, and the dependence structure is governed by . On the other hand, if approaches 1, the term will be dominated by which can be seen as a random process with perfect dependence. As the parameter

moves gradually from 0 to 1, one can get a smooth interpolation from asymptotically independence submodel

to perfect dependence.

3.2.3 Further dependence properties under asymptotic dependence

First we define a limiting measure, namely the so-called exponent function

where . For max-stable processes or generalized Pareto processes, this function is equivalent to the function in (2) that characterizes the joint dependence structure. Buishand (1984) first introduced as an extremal coefficient. It has range , with lower and upper ends indicating respectively, dependence and perfect independence. We can use this coefficient to examine the mode of convergence under asymptotic dependence when modeling extreme events at finite levels. Following the proof by Huser and Wadsworth (2018), we can obtain the following results that are again similar to results from the smooth model.

Theorem 3.4.

For as defined in (11) and ,


That is, the -dimensional extremal coefficient is


See Appendix A.4. ∎


(16) resembles the spectral representation of the measure in (2).

The rate at which converges to its limit determines the flexibility of a process in the joint tail and affects the ability to capture extremal dependence. The following theorem assures the noisy model to have the same flexibility as the smooth model. It also allows dependence to weaken above the level used for fitting, while still permitting asymptotic dependence.

Theorem 3.5.

For ,

For , since , the rate is given by the coefficient of tail dependence .


See Appendix A.4. ∎

4 Bayesian Inference

4.1 Hierarchical Model

We define a Bayesian hierarchical model based on the process (11) defined in section 3.1 and use a MCMC algorithm to fit the data. We model our data as partially censored observations (see section 2.3) from the model. In (11), the mixing parameter controls both joint and marginal behavior of the response , which we would prefer to separate. Therefore we first assume our observations above a high threshold are generalized Pareto distributed conditionally, and we include a marginal transformation right in the hierarchical model. Specifically, we define a marginal transformation as follows:


where is the marginal distribution function for process (11) which can found in Proposition A.1 in analytic form, and


where , , and is a high threshold. Conditioning on the smooth process , which was not truncated, the censored likelihood for an observation can be derived as


where . Note that there are only univariate calculations required in (20), compared to (10) for which we have to estimate the -dimensional Gaussian distribution functions. In addition, since and are independent conditioning on the smooth process (

), the joint likelihood of the whole vector

is simply .

Now we extend (11) to accommodate temporal replication; i.e., we observe , and the temporal dependence is ignored. Then the hierarchical model can be described as


where is a Gaussian process with covariance function , and the proportion of censored observations can be treated as a known parameter or an unknown parameter that enters the hierarchical model. We complete the model by assigning priors

4.2 Gibbs Sampler

To estimate the posterior distribution of the model parameters, we apply random walk Metropolis (RWM) algorithm using Log-Adaptive Proposals (LAP) as our adaptive tuning strategy (Shaby and Wells, 2010)

. Whenever conjugate priors are not available, we use a random walk Metropolis-Hastings update step.

The following theorem calculates the density function of conditioning on the random scaling factor .

Theorem 4.1.

For , where , the density conditioned on is


where .


See Appendix A.5. ∎


The full conditional distribution used in the Gibbs sampling to update and can be obtained as follows


Since time independence is assumed, we can update and in a parallel fashion across . The other parameters are updated similarly using adaptively-tuned random walk Metropolis-Hastings updates, with the likelihood multiplied by a corresponding prior.

5 Simulation Studies

In this section, we present simulation results and conduct coverage analysis to investigate whether the MCMC procedure is able to draw accurate inference on model parameters, and to check whether our model captures asymptotic dependence characteristics correctly even when the data-generating model is different from (11).

5.1 Parameter Estimation

To verify the accuracy of inference made by MCMC sampling, we generate data from model (11) in section 3.1 using sites and independent temporal replications. The sites are uniformly drawn from the unit square . We generate data from three different scenarios:

  • , asymptotic independence;

  • , transition point;

  • , asymptotic dependence.

For each , the latent Gaussian processes are generated using a Matérn covariance with smoothness parameter , and the characteristic length scale parameter . Meanwhile, is used as variance for each measurement error term. Then the processes are marginally transformed to generalized Pareto distribution with , where and , the proportion of censored observations, are treated as known parameters.

The attenuation constants used in the LAP algorithm are . The prior for is halfCauchy, and the priors for the other parameters are specified in (21), where so that the prior for is fairly noninformative. We ran each MCMC chain for 400,000 iterations and thinned the results by a factor of 10. The parallelism of updating and is implemented in R via the foreach routine with doParallel package as a backend (Analytics, 2015).

5.2 Coverage Analysis

We now study the coverage properties of the posterior inference based on the MCMC sampler for the posterior credible intervals with 100 simulated datasets under each of the three different scenarios in the previous section.

Figure 3

shows the empirical coverage rates of highest posterior density credible intervals of several sizes, along with standard binomial confidence intervals. We can see that the sampler did well in generating posterior inference with close to nominal frequentist coverage. The coverage for larger

is may be slightly different than nominal for large , but overall the results are quite good.

Figure 3: Empirical coverage rates of credible intervals for and for two designs: (left) and (right). The error bars are 95% binomial confidence intervals for the coverage probability.

5.3 Simulation with Mis-specified Models

We now fit our model to data generated from other models to validate its ability to capture the tail dependence characteristics under mis-specification. We simulate datasets from models referenced in Section 2.2, and use the sampler described in Section 4.2 to fit model (11). A basic check on whether the fitting proceeds as desired is to to examine whether the estimate falls within or in accordance to whether the data-generating process is asymptotically independent or dependent. The data were generated using four different simulation designs:

  • Skew- process from Morris et al. (2017) with (asymptotically dependent);

  • Gaussian scale mixture from Huser et al. (2017) with (asymptotically dependent);

  • Gaussian scale mixture from Huser et al. (2017) with (asymptotically independent).

For each simulation design, we simulate a single dataset using

locations uniformly distributed on

and independent time replicates. The Matérn covariance function with and is again specified for the latent Gaussian processes. For the skew- process, to give a distribution with degrees of freedom, and to simulate moderate skewness. For the last three designs, the were generated as described in (6), with .

To obtain good starting values for the latent smooth process for MCMC, we first marginally transform the simulated data to noisy scale mixture variables independently at each location using the following procedure. Following the semi-parametric procedure of Coles and Tawn (1991), we estimate each marginal distribution as a blend of the generalized Pareto distribution function above a high marginal threshold, and the empirical distribution function below that threshold. Fixing initial values for , we then transform the margins to noisy scale mixtures via . The next step is to run a Metropolis algorithm using the full conditional distribution (see (23)) 100 times and save the last random walk states as initial values for . This procedure is also used for data analysis in section 6. Finally, with initial values in hand, the datasets from each design were fit using a fully Bayesian approach that simultaneously updates marginal and spatial dependence parameters.

Figure 4 displays the nonparametric and model-based estimates of the upper tail dependence defined in (3). To generate nonparametric estimates of at distance , we look at all pairs of points whose locations are apart (within some small tolerance), and compute the ratio of empirical probabilities , pretending that each pair of points is independent from each other pair of points. This is similar to an empirical variogram estimator. The nonparametric confidence envelopes are obtained by computing pointwise binomial confidence intervals. For parametric estimates, we take samples from the converged MCMC chain and use parameters from each iteration to calculate based on our model for each MCMC iteration. Then, combining across MCMC iterations, we compute the pointwise average curves and their credible bands. The results in Figure 4 demonstrate that our model provides a sensible approximation to the extremal dependence structure of the mis-specified models. Borrowing strength across locations, the parametric estimators of are much more reliable than the nonparametric ones in that they are able to discriminate between the two asymptotic classes via estimating a relatively small number of parameters. Especially for the Huser et al. (2017) (HOT) models, the dependence strength diminishes gradually as becomes greater, approaching the Gaussian copula. With the extra tail flexibility, our model is able to accurately capture these features.

(a) Skewed- process
(b) HOT model with
(c) HOT model with
(d) HOT model with
Figure 4: Estimated coefficients , , for two points at distance , using data simulated from the mis-specified models with Matérn correlation function with and . The two scenarios in the top row are asymptotically dependent, while the scenarios in the bottom row are asymptotically independent. Each simulated data set has 100 uniform locations in , and 40 time replicates. Solid green lines show true function. The black dashed lines show the averaged curve from the posterior samples of fitted model, while the blue shaded areas are 95% credible envelopes. The red dashed lines show the nonparametric estimates from the simulated data sets, while the grey shaded areas are pointwise 95% binomial confidence intervals.

6 Data Analysis

We consider daily observations of Frosberg Fire Weather Index (FFWI) from 1974 to 2015 at 93 monitoring stations over parts of the Great Plains, mainly from Central Great Plains to South Texas Plains (Dunn et al., 2012). Figure 9 shows the observation locations as black triangles. FFWI aims to quantify potential wildfire threat. It is a single number summary calculated from temperature, wind and relative humidity; larger index values signify higher flame lengths and more rapid drying (Fosberg, 1978). Due to human activity and changes in the grassland ecosystem, the Great Plains region is becoming an important high-risk region of large wildfires. According to a in-depth study conducted by Donovan et al. (2017), the number of wildfire breakouts per year in Great Plains has tripled over the past three decades. The total area burned by fires annually has grown by more than 400%, from thousands of hectares per year from 1985 to 1994, to millions of hectares per year from 2005 to 2014. As a consequence of the prevailing hot and dry air, wildfires in this region are particularly more concentrated in the spring, feasting on grasses made dry by long-term drought. Modeling the tail behavior of FFWI and studying the extremes of the process could have major implications for wildfire planning.

To ensure the independence over time and avoid seasonal effects, we take the maxima of the FFWI values over ten-day intervals during the spring season. Figure 5 shows the 50-year return levels estimated using the block maxima with 10-year sliding windows for 12 randomly-selected stations. There is no clear evidence for a systematic increase or decrease in the return levels. Although treating meteorological data as constant over time is often problematic, particularly for temperature data, the behavior in Figure 5 suggests that an assumption of constant marginal parameters in time is appropriate.

Figure 5: Point estimates and 95% confidence intervals for 50-year return levels of FFWI at 12 randomly selected stations. For each station and for each year, we estimate the return level using annual maxima in a 10-year sliding window. There is no evident systematic trend in the return level, so the assumption of marginal parameters that are constant in time is deemed appropriate.

To account for the physical features of the terrain in the Great Plains, we describe the scale parameter in by the trend surface:


where and are the longitude and latitude of the stations at which the data are observed. We constrain the joint prior of such that the support of is the positive real line. We model the shape parameter as constant over the spatial domain, as suggested by exploratory analysis (see Appendix B.1 ).

Similar to the procedure in section 5.3

, to obtain starting values for MCMC, we fit generalized Pareto distributions to model events above the 98% quantile,

, and empirical distributions to those below , of the time series at each station separately, and then use the fitted models to transform the data to have noisy scale mixture distributions. We then ran the MCMC chain for 50,000 iterations thinned by 10 steps and discarded a burn-in period of 25,000 iterations.

6.1 Model Evaluation

To evaluate the model fit, we randomly hold out 5 stations for validation purposes, and exclude them from the MCMC analysis; see the red points in Figure 9. We then compare the empirical distributions of minima, mean, and maxima for the 5 held-out stations with those simulated with parameters from MCMC samples. Results are shown in the QQ-plots in Figure 6. Although the observed maxima appear below the diagonal for lower quantiles, the fit displays a good match against the observed minima, mean and maxima in the upper quantiles.

Figure 6: Comparisons of the observed and predicted minima (left), mean (middle) and maxima (right) for 5 locations held out for model validations. Overall 95% confidence envelopes are also shown. Note that the values are transformed marginally to uniform using the empirical distribution functions at each location.

As a comparison, we change to be a process, and re-fit the model using MCMC. We apply to proper scoring rules (Gneiting and Raftery, 2007), log scores and continuously-ranked probability scores (CRPS), to compare the quality of probabilistic forecasts. While running MCMC, we interpolate the latent process at the held-out locations for each iteration using the full conditional likelihood. Plugging the predictive draws at the held-out observations into the equation (20), we obtain the log score (simply the log-likelihood) as


where are the validation stations. The left panel of Figure 7 compares the log scores between two models, showing that the transformed Gaussian scale mixture process clearly outperforms the process. Additionally, we the CPRS (Matheson and Winkler, 1976) for both models,


where is the marginal distribution estimated using parameters using one MCMC iteration, and is the observed value. The right panel of Figure 7 shows the averaged CRPS for the two models, where our model clearly has better results.

Figure 7: Comparisons of the log-likelihood scores and CRPS between our model and a similar model with latent process. In both panels, higher values indicate better model fit.

Similar to Figure 4, we show the empirical and model-based values of and for the block maxima of the FFWI in Figure 8. It confirms that our model captures the extremal spatial dependence in the data quite well.

Figure 8: Empirical estimates (dashed red lines and gray envelopes) of (left) and (right) for the FFWI, , for two points at distance . The blue shaded areas are 95% credible envelopes obtained from the posterior samples of fitted model, and the black dashed lines show the averaged curve. The vertical line is the threshold used when fitting the dependence model.

6.2 Results

To get an idea of what a realization of the fitted process looks like, the left panel of Figure 9 shows one realization of the latent scale mixture process using parameters from one MCMC iteration. Extreme values only occur in two small blobs. The right panel shows the same realization, now marginally transformed to the scale of the FFWI values. Since we modeled our data as partially censored observations, the map here only displays the areas where threshold exceedances are observed.

A quantity of great interest is areal exceedance probabilities, which represent the amount of territory simultaneously at extreme risk for wildfire. To obtain a Monte Carlo estimate of these joint probabilities, we use parameters from each MCMC sample to simulate 100 processes (on a grid, although any resolution is possible), and calculate the total area that has FFWI over a designated threshold. Figure 10 shows the results. These curves represent total area at risk for various FFWI thresholds. The curves for the higher thresholds decay faster than those for the lower thresholds, which confirms that extreme events simultaneously occurring across large areas becomes less common when the threshold increases. This also shows that the joint tail of the fire threat index exhibits a weakening dependence structure, with more extreme events being more localized. It is not possible to capture this behavior using limiting extreme value models like max-stable or generalized Pareto processes.

Figure 9: A realization of the process generated from parameters from one MCMC iteration. Left panel shows the latent scale mixture process , with points showing the stations of the 93 gauges. For model checking, the 88 stations marked by triangles were used to fit the models, and those marked with circles were used to validate the models. Right panel shows the same realization, marginally transformed to the scale of the original FFWI data, and then censored. A projected coordinate reference system, NAD83(HARN), is used here so that the axes are in units of meters.
Figure 10: The distribution of the total area that is over a certain threshold. For each MCMC iteration, we simulate 100 processes on a grid, and count the number of threshold exceedances. The density of the total area is then estimated using Gaussian kernels.

7 Discussion

In this paper, we have proposed a new model which is based on the transformed Gaussian scale mixture model proposed by Huser and Wadsworth (2018). We added a measurement to the mixture so that we can avoid calculating the onerous -dimensional Gaussian distribution function when dealing with the censored likelihood. Also, the new model circumvents the need to draw from a high-dimensional truncated distribution by treating the smooth process as latent and updating repeatedly using MCMC. In addition to its computational advantages, the presence of a measurement error term probably makes the model more realistic for data collected by real-world instruments.

Even with the presence of the measurement error, the model is still able to capture qualitatively different types of sub-asymptotic dependence behavior of spatial processes. A smooth transition between both extremal dependence paradigms takes place in the interior of the parameter space, which enables inference of the dependence class in a simple manner. We proved that all the appealing asymptotic properties found in the smooth process by Huser and Wadsworth (2018) are preserved in the modified model, regardless of the scale of the measurement error variance.

The model allows inference on spatial extreme-value datasets with relatively large numbers of locations. The computational limitations are similar to those of conventional spatial Gaussian process models. We have defined the model conditionally as a Bayesian hierarchical model, for which standard MCMC techniques can be used to fit the data. Computation is facilitated greatly by paralleling over time and migrating some basic linear algebra to C/C++ via Rcpp. Even so, the lack of closed form marginal transformations creates a significant (though embarrassingly parallel) computational challenge that scales with the total number of exceedances, rather than the usual case of scaling with the number of spatial locations.

Despite easing computational limitations associated with the Huser and Wadsworth (2018) model, our modified model inherets the same theoretical limitations. Firstly, for , asymptotic independence is never attained even when the lag . This persistent dependence structure is not desirable when analyzing spatial processes in a large region where only short-range dependence is likely. Secondly, when , the scaling factor of will dominate the smooth process, and model (11) will behave like generalized Pareto process because will become a stable function of . However, for weakly dependent generalized Pareto process, model (11) will fail to capture the dependence class because as , whereas the upper tail dependence for a generalized Pareto process will be a constant function of . However in practice, this will rarely happen because most environmental datasets exhibit weakening dependence.


We gratefully thank Jennifer L. Wadsworth for sharing her ideas for proofs of the dependence properties for similar models during the course of this research. These ideas were indispensable for us in deriving the analytical results in this paper.

Appendix A

Appendix A Technical Proofs

In this section, we assume subjects to standard Pareto distribution so that for notational conveniences. The existing results are labeled with lemmas, and our new findings are labels with propositions or theorems.

a.1 Approximate the marginal distributions

Proposition A.1.

For a fixed location , the marginal distribution for process (11) may be established as follows:


where , and is the density of .

Proof. First consider the marginal distribution without the errors

Plug it into the following equation: