1 Introduction
Waterrelated extremes such as floods and droughts can heavily impact human life, affecting our society, economic stability, and environmental sustainability. In recent years, we have witnessed an acceleration of the water cycle in some areas of the globe, including an increase in the frequency and intensity of heavy precipitation, sadly illustrated with a number of unprecedented hurricane events that recently hit the Caribbean islands and the Southeastern United States (U.S.) in August–September, 2017. These climate changes have motivated the development of stochastic models for risk assessment and uncertainty quantification of extreme weather events. In the univariate context, the generalized extremevalue distribution with timevarying parameters has been used to study the effect of climate change on global precipitation annual maxima (Westra et al., 2013). More recently, Fischer and Knutti (2016) have adopted an empirical approach to study heavy rainfall intensification, validating the theory predicted by early generations of general circulation models. Similarly, Hoerling et al. (2016) analyzed trends in U.S. heavy precipitation data. Beyond the univariate context, several studies have focused on the spatial or spatiotemporal modeling of precipitation extremes within and across different catchments (Cooley et al., 2007; Thibaud et al., 2013; Huser and Davison, 2014; Opitz et al., 2018), leading to extreme river discharges (Asadi et al., 2015) or on the risk of heavy snowfall in mountainous areas (Blanchet and Davison, 2011). In these small regions, however, the spatial dependence structure is often assumed to be stationary. By contrast, in this paper we focus on modeling the complex, nonstationary, spatial dependence structure of U.S. winter precipitation extremes on a continental scale.
Assessing the behavior of extreme events over space entails many challenges. First, data are measured at a finite number of locations, but extrapolation is typically required at any other location of the study region. In our case, the study region is the contiguous U.S. and the monitoring stations at which data are collected are displayed in Figure 1.
Second, by contrast with classical geostatistics where the estimation target is usually the (conditional) mean, statistics of extremes focuses on high (or low) quantiles, and extrapolation is not only required over space, but also into the joint upper (or lower) tail of the distribution. Third, the scarcity of extremes results in uncertainty inflation, especially when very high quantiles have to be estimated, and thus it is important to develop efficient inference methods that can potentially be applied in high dimensions. Finally, a recurrent problem with large heterogeneous regions, such as the contiguous U.S., is the modeling of spatial nonstationarity. This may concern both marginal distributions and the dependence structure. As pointed out by
Huser and Genton (2016), misspecification of the joint distribution of extremes may result in poor estimation of spatial risk measures. There is no universal consensus on how to handle nonstationarity; however, it is often both useful and realistic to assume some weak form of local stationarity. In this work, we fit a very flexible nonstationary, but locally stationary, spatial model to U.S. winter precipitation extremes, which contrasts with the quite rigid fully parametric model fitted by
Huser and Genton (2016), and the nonparametric approach adopted by CastroCamilo and de Carvalho (2016), whereby a family of bivariate extremevalue dependence structures at different sites is linked through a smooth function of covariates, while neglecting spatial dependence.The literature on spatial extremes is mostly divided into two mainstream approaches: the first approach defines extreme events as block (e.g., annual) maxima, which are typically modeled using maxstable processes (Westra and Sisson, 2011; Davison and Gholamrezaee, 2012; Huser and Genton, 2016); the latter can be regarded as the functional generalization of multivariate extremevalue distributions, and they arise as limits of properly renormalized maxima of random fields. The second approach defines extreme events as high threshold exceedances, which are usually modeled using generalized Pareto processes, i.e., the threshold counterpart of maxstable processes (Ferreira et al., 2014; Thibaud and Opitz, 2015; de Fondeville and Davison, 2016). Both maxstable and generalized Pareto processes are “ultimate” models, in the sense that they have an asymptotic characterization for block maxima and threshold exceedances, respectively. However, their tail dependence structure is fairly rigid: maxstable copulas are invariant to the operation of taking componentwise maxima, while generalized Pareto copulas are invariant to thresholding at higher levels. This lack of tail flexibility has recently motivated the development of “penultimate” spatial models for threshold exceedances (Wadsworth and Tawn, 2012; Opitz, 2016; Huser et al., 2017; Huser and Wadsworth, 2018), which, unlike ultimate models, can capture weakening extremal dependence strength on the way to their limiting generalized Pareto process. See also Huser et al. (2018) and Bopp et al. (2018) for the case of maxima. To illustrate this, Figure 2 shows, for a range of quantiles
, the conditional probability
(1.1) 
estimated for six selected pairs of stations at distance , where , , denotes the 5 daycumulative winter precipitation random field defined over the contiguous U.S. (denoted ), and denotes the marginal distribution of . While the function in (1.1) appears to display different dynamics for each pair of stations, a generalized Pareto process always assumes that is constant in (Rootzén et al., 2018) for each lag , hence potentially leading to an overestimation of the dependence strength at higher quantiles for the pairs that have a decreasing function . In Section 2, we develop a nonstationary model capturing nontrivial tail dependence, i.e., allowing for a positive limit , while at the same time having a decreasing function with known limiting extremevalue dependence structure. Our nonstationary model builds upon the stationary exponential factor copula model proposed by Krupskii et al. (2018), which has attractive modeling features, and is computationally more convenient than the models proposed by Huser et al. (2017) and Huser and Wadsworth (2018). It fundamentally differs from the Laplace model of Opitz (2016), which strongly focuses on capturing tail independence () and the decay rate towards the limit, rather than tail dependence (), which is the type of extremal behavior suggested by our precipitation data at small scales (or at least, we have for quite high thresholds ); recall Figure 2.
Inference for extremevalue models is known to be particularly cumbersome. When maxstable processes are directly or indirectly involved, the likelihood function becomes excessively prohibitive to compute in moderate to large dimensions (Castruccio et al., 2016). This has lead to the use of less efficient composite likelihood approaches (Padoan et al., 2010; Westra and Sisson, 2011; Huser and Davison, 2013; Thibaud et al., 2013; Huser and Davison, 2014). Alternatively, threshold approaches lead to simpler, significantly less demanding likelihoods, although their routine application to highdimensional datasets is hampered by the need to censor observations involving nonextreme data (i.e., below a high threshold), which entails expensive multidimensional integrals; see, e.g., Wadsworth and Tawn (2014), Thibaud and Opitz (2015), Huser et al. (2017) and Huser and Wadsworth (2018). Other censoring schemes have been investigated by Engelke et al. (2015) and Opitz (2016), but they may cause some bias (Huser et al., 2016). More recently, Morris et al. (2017)impute the censored, nonextreme, observations using a simulationbased algorithm within a Bayesian framework. In this work, although our full dataset comprises 1218 monitoring stations over the contiguous U.S., the dimensionality is considerably reduced by adopting a local estimation approach, making it possible to perform traditional censored likelihood inference in a robust and efficient way.
Our main methodological contributions can be summarized as follows: we extend the stationary exponential factor copula model by allowing the parameter to change smoothly with locations, and provide further theoretical results for it. We develop a local likelihood approach adapted to the context of extremes to fit the stationary exponential factor copula model locally, censoring nonextreme observations. Finally, we derive simplified expression for the terms involved in the local censored likelihood function in order to compute maximum likelihood estimates in a reasonable time, while avoiding numerical instabilities.
In Section 2, the stationary exponential factor copula model is presented, and its tail properties are studied. Section 3 discusses censored local likelihood inference based on high threshold exceedances. In Section 4, a simulation study is conducted based on a nonstationary factor copula model, in order to assess the performance of our approach in various nonstationary contexts. In Section 5, we apply our proposed model to study the dependence structure of heavy precipitation over the whole contiguous U.S., and we use it to estimate return periods of spatial extreme events. Section 6 concludes with some discussion.
2 Modeling spatial extremes using factor copulas
2.1 Copula models
A copula is a multivariate probability distribution with
margins. Copulas are used to describe the dependence between random variables, and may be used to link univariate marginal distributions to construct a joint distribution. Specifically, let
be a random vector with continuous marginals
, . The copula of is defined through the random vector (with standard uniform margins) as . Sklar (1959) showed that each multivariate distribution with continuous margins has a unique copula , which may be expressed as(2.1) 
this implies that can be written in terms of univariate marginal distributions chosen independently from the dependence structure between the variables. This result can be associated with a twostep approach for inference, where margins are treated separately from the dependence structure. Based on (2.1), several copula families have been proposed and applied in practice for the modeling of environmental data, the most common one being the Gaussian copula obtained by choosing the joint distribution
to be the standard multivariate Gaussian distribution
with correlation matrix . Other more flexible elliptical copulas may be derived similarly, such as the Student copula, which is taildependent as opposed to the Gaussian copula. An alternative general family of copulas generating interesting tail dependence structures are factor copula models (Krupskii and Joe, 2015; Krupskii et al., 2018), in which a random and unobserved factor affects all measurements simultaneously. In Section 2.2, we describe the construction of the stationary exponential factor copula model proposed by Krupskii et al. (2018), which has appealing modeling and inference properties, and we then embed it in a more general nonstationary model in Section 4.1.2.2 The stationary exponential factor copula model
Let , , be a standard Gaussian process with stationary correlation function (see Gneiting et al. (2006) for a review of correlation functions), and let
be an exponentially distributed random variable with rate parameter
, independent of , and which does not depend on the spatial location . The exponential factor copula model may be expressed in continuous space through the random process(2.2) 
The process in (2.2) is a Gaussian location mixture, i.e., a standard Gaussian process with a random (exponentially distributed), constant mean. In this sense, it has similarities with the Student process, which can be viewed as a Gaussian scale
mixture (with standard deviation following a specific inverse gamma distribution). However, both models are not nested in each other, and their dependence structures have significant dissimilarities.
Although Model (2.2) may seem quite artificial, it is only used to generate a flexible upper tail dependence structure, which we then fit to precipitation extremes. In other words, we disregard the margins and only consider the copula associated to (2.2), which we fit to high threshold exceedances using a censored likelihood approach, reducing the contribution of small precipitation values; more details are given in Sections 3 and 5.
From (2.2), each configuration of sites yields a variate copula as follows: let and , . The random vector
has a multivariate normal distribution with correlation matrix
that depends on the correlation function and the sites’ configuration, i.e., . The components of the random vector are , , where is independent of ; by integrating out , the joint distribution of may be expressed as(2.3) 
whilst its density is
(2.4) 
where is the multivariate standard normal density with correlation matrix . In Appendix A we provide simpler and computationally efficient expressions to compute (2.3), (2.4), and partial derivatives of (2.3), avoiding the integral in . Applying (2.1), the resulting copula and its density may be written as
(2.5) 
where , , and and denote the marginal distribution and density of the process, respectively. In particular, we can show that
(2.6) 
where is the standard normal distribution function; see (A.3) in Appendix A.
To illustrate the tail flexibility of the stationary exponential factor copula model, Figure 3 displays the function defined in (1.1) as a function of the quantile level and the Euclidean distance , for different rate and range parameters, assuming an exponential correlation function , . While the range controls the correlation decay with distance, the rate determines the overall tail dependence strength and strongly impacts the value of and its limit at large distances, i.e., as .
In particular, because the random factor in (2.2) is common to all sites , spatial dependence in the process does not vanish as . Therefore, this stationary model may be useful for (replicated) spatial data collected on a local or regional scale, but it is generally not realistic on larger scales, such as the whole contiguous U.S. In Section 3, we develop a local estimation approach assuming that the stationary model (2.2) is only valid locally, and in Section 4.1 we discuss a flexible extension to generate nonstationary tail dependencies structures. As shown by Krupskii et al. (2018), the distribution of the random factor characterizes the tail properties of the copula (2.5); in particular, the stationary exponential factor copula model is taildependent (i.e., ) for fixed , and its uppertail dependence structure is governed the Hüsler and Reiss (1989) copula, which has been widely used for multivariate and spatial extremes (Davison et al., 2012). Moreover, as , Model (2.2) boils down to the Gaussian copula, which is tailindependent (i.e., ). Conversely, as , the
process is perfectly dependent over space. Thus, the exponential factor copula model interpolates between tail independence as
and perfect dependence as , while capturing a wide range of nontrivial tail dependence structures for .We now provide more detailed information on the subasymptotic behavior of Model (2.2), refining the description of its tail structure. The rate at which converges to , as , characterizes the flexibility of the process to capture subasymptotic extremal dependence. In practice, this is important, as the model will always be fitted at a finite threshold. Proposition 1 shows that the parameter regulates this rate of convergence; the smaller , the faster the convergence, and vice versa. We deferred the proof to Appendix B.
Proposition 1.
Consider Model (2.2) with rate . We have the expansion
where , and, for , , with such that in the limit as ,
The measure is linked to the bivariate extremal coefficient function (Davison et al., 2012) of the associated limiting maxstable random field, defined by
(2.7) 
The extremal coefficient takes values between 1 and 2, with 1 corresponding to complete dependence among locations and 2 to complete independence. For model (2.2), we have , where ; see Krupskii et al. (2018) and the Supplementary Material. The extremal coefficient encompasses the effects of all parameters involved in the model, i.e., , and we use it in our simulation experiments (Section 4) and the data application (Section 5) to study the performance of our model and to assess its ability to flexibly capture different levels of extremal dependence.
Other types of tail dependence structures can be obtained using alternative distributions for the random factor (see Krupskii et al., 2018), but we here restrict ourselves to the exponential factor copula model, which yields flexible tail structures and fast inference.
3 Local likelihood inference with partial censoring
The assumption of stationarity underlying (2.2) is unrealistic over large heterogeneous regions, such as the whole contiguous U.S., but it may be the starting point for more sophisticated models; in particular, the true precipitation data generating process might be approximately stationary in small regions, providing support for nonstationary, but locally stationary, models. We here assume that Model (2.2) provides a good approximation to the local tail dependence structure while stemming from a more complex global data generating process.
Local likelihood estimation for univariate threshold exceedances was proposed by Davison and Ramesh (2000), while Anderes and Stein (2011) investigated how such an approach may be applied in the spatial context based on a single realization from a Gaussian process. Risser and Calder (2015) used local likelihood to estimate the spatiallyvarying parameters of a nonstationary Gaussian process. Their model is simplified by modeling the locallyvarying anisotropies using an approach similar to the discrete mixture kernel convolution of Higdon (1998). We here detail how to perform local likelihood estimation based on high threshold exceedances by adapting the hardthresholding methodology of Anderes and Stein (2011) to the joint upper tail of the stationary copula model (2.2), under the assumption that it is valid in small regional neighborhoods. Unlike Anderes and Stein (2011), we assume that multiple replicates of the process are observed, with possibly arbitrary marginal distributions.
Since our focus is on the data’s tail dependence structure, we advocate a twostep semiparametric estimation procedure, whereby margins are first estimated at each site separately using the empirical distribution function, and the copula model is then estimated locally in a second step by maximum likelihood, censoring low (i.e., nonextreme) values to prevent them from influencing the fit. Such a twostep approach is standard in the copula literature, and has been studied in depth; see, e.g., Genest et al. (1995) and Joe (2014), Chapter 5.
More complex parametric approaches are also possible to estimate margins in the first step. Although marginal estimation is not our primary target, we study the effect of the rank transformation on the estimation of the copula, by comparing it to an approach based on the generalized Pareto distribution, and the optimal scenario where exact marginal distributions are used. We have found that the use of the rankbased empirical distribution function provides a fast, robust, and reliable method, which does not affect much the subsequent estimation of the copula (see Section 2.4 of the Supplementary Material).
The first step of our proposed estimation procedure consists in transforming the observed data nonparametrically to the uniform scale. Let denote independent and identically distributed observations at the th monitoring station, with essentially arbitrary margins. Pseudouniform scores may be obtained using ranks as follows:
In the second step, the scores , , are treated as a perfect random sample from the distribution. To estimate the spatial copula structure, we then discretize the space (in our case, the whole contiguous U.S.) into a fine grid . For each grid point , we assume that the local stationary copula model (2.2) with parameters is valid in a small neighborhood around . In what follows, these regional neighborhoods will be determined by a small number, , of nearest stations from the site , so we will write . Obviously, the choice of neighborhood is important: the stationary model (2.2) might be a poor approximation for large neighborhoods (i.e., for large ), whereas model fitting might be cumbersome for small neighborhoods characterized by a small number
of stations. This biasvariance tradeoff is tricky to deal with, and Section
5 describes one possible approach to mitigate this issue.To estimate the local tail dependence structure, we suggest using a censored likelihood approach, which is standard in statistics for spatial extremes, though it has never been applied to Model (2.2); see, e.g., Ledford and Tawn (1996), Thibaud et al. (2013), Huser and Davison (2014), Wadsworth and Tawn (2014), Thibaud and Opitz (2015) and Huser et al. (2017). Essentially, vectors with nonextreme observations (i.e., lower than a high threshold) are partially or fully censored to prevent these low values from affecting the estimation of the extremal dependence structure. More precisely, for each grid point with associated neighborhood , let , , denote the pseudouniform scores for each of the nearby stations in , and let , , be high marginal thresholds; in our application in Section 5, we take for all . Using the notation introduced in (2.3)–(2.6), the censored local loglikelihood may be expressed as
(3.1) 
where and , , is the index set of all noncensored observations (i.e., all vector components are extreme), is the index set of all fully censored observations (i.e., none of the vector components are extreme) with , is the index set of all partially censored observations (i.e., some, but not all, vector components are extreme), is the index set of threshold exceedances for the th vector of observations, and denotes differentiation with respect to the variables indexed by the set . Numerical maximization of (3) yields the maximum likelihood estimates for location . In Appendix A, we provide simple expressions for , and , which involve the variate Gaussian density and the multivariate Gaussian distribution in dimension and , respectively. When is large, the computation of the multivariate Gaussian distribution can significantly slow down the estimation procedure. However, thanks to our local approach, is typically quite small, and this allows to estimate the model at a reasonable computational cost. Furthermore, because the likelihood maximizations can be done independently at each grid point , we can easily take advantage of distributed computing resources to perform each fit in parallel.
Following Anderes and Stein (2011), it is possible to generalize (3) to obtain smoother parameter estimates over space. Specifically, we can instead maximize the weighted loglikelihood function defined as
(3.2) 
where is a nonnegative weight function, and , denotes the nested sequence of subsets comprising the first nearest neighbors of , with the convention that and . Choosing hardthresholding weights with for all boils down to (3). However, smoother parameter estimates may be obtained by selecting softthresholding weights that smoothly decay to zero near the neighborhood boundaries, e.g., using the biweight function , for some bandwidth . Although this estimation approach seems quite appealing, it significantly increases the computational burden, since the weighted loglikelihood function (3.2) requires computing (3) times. In the Supplementary Material, we compare hard and softthesholding weights in a simulation study, and do not notice any major difference in the results. This is due to the fact that, unlike Anderes and Stein (2011), multiple replicates are available to fit the model. For these reasons, and because the censored likelihood procedure is already quite intensive, we do not pursue this approach further in this paper. Subsequent simulation and real data experiments are all based on (3).
Remark that, as mentioned above, we assume that Model (2.2) provides a good approximation to the local tail dependence structure while stemming from a more complex global data generating process. To make sure that the local stationary model (2.2) can truly come from a welldefined global stochastic process, we describe in Section 4.1 one possible way to embed the model (2.2) into a nonstationary process, which we use in our simulations.
4 Simulation study
4.1 A nonstationary exponential factor copula model
We here briefly describe how we can extend the stationary exponential factor copula model in (2.2) to a nonstationary process that we use in our simulation study. Further details and asymptotic results are provided in Section 1 of the Supplementary Material. Our proposed model extension, equivalent to (2.2) on infinitesimal regions, is
(4.1) 
where is a zero mean Gaussian process with nonstationary correlation function and is a standard exponentially distributed common factor, independent of . The rate parameter , , is assumed to be a smooth surface, which dictates different degrees of tail dependence in distinct regions. Although (4.1) may not realistically capture longdistance dependencies owing to the latent factor being constant over space, its spatiallyvarying parameters flexibly describe the local dependence structure; Model (4.1) is therefore useful to “think” about the results globally, while generating various forms of extremal dependence locally (or regionally).
In the literature, different nonstationary correlation functions have been proposed; see, e.g., Fuentes (2001), Nychka et al. (2002), Stein (2005), Paciorek and Schervish (2006), and Reich et al. (2011). Here, we focus on a nonstationary, locally isotropic, Matérn correlation function with constant smoothness parameter , constructed through the kernel convolution approach advocated by Paciorek and Schervish (2006); it is defined as
(4.2) 
where , , is a smoothly varying range parameter, is the Gamma function and is the modified Bessel function of second kind of order . The stationary Matérn correlation function (obtained by setting for all ) has become popular because of its appealing properties (Stein, 1999). In particular, a Gaussian process with Matérn correlation function is times meansquare differentiable if and only if . For , it boils down to the exponential correlation function, which yields continuous but nondifferentiable sample paths. As , sample paths are infinitely differentiable. The nonstationary correlation function defined in (4.2) is locally Matérn, and therefore it inherits these attractive properties. An extension of (4.2) allowing for varying degrees of smoothness over space has been proposed in the unpublished manuscript of Stein (2005) (see also Anderes and Stein, 2011), but in practice, estimating is cumbersome and conservative approaches are usually adopted. In our application in Section 5, is treated as constant over space.
4.2 Datagenerating scenarios and simulation results
In this section, we study the flexibility of our copula model (4.1) to locally describe complex extremal dynamics across space, and we assess the performance of our local estimation approach based on (3) at capturing such nonstationary dependence structures. We also analyze the sensitivity of the parameter estimates to the neighborhood size.
Data are simulated on a grid in from the copula model stemming from (4.1) based on the nonstationary Matérn correlation function (4.2); independent replicates are generated. We choose three different levels of smoothness by fixing , and we consider three scenarios for the range and the rate parameters, representing various levels of nonstationarity in the tail behavior. The true parameter values and the bivariate extremal coefficient function (recall (2.7)) with respect to three different reference points are shown in Figure 4 for the weakly, mildly and strongly nonstationary scenarios.
aaLograte 
Mildly nonstat. Strongly nonstat. Estimates (mild case) 

aaaaRange 

Extremal coefficient 

In all simulations, the smoothness parameter is held fixed, while the rate and range parameters are estimated on a grid using the local estimation approach with censoring thresholds for all , as described in Section 3. To compute performance metrics and to measure the uncertainty of estimated parameters, we replicate all simulation experiments times.
We first detail the results for the mildly nonstationary scenario (second column of Figure 4) with and choosing nearest neighbors, while results for all simulation scenarios are summarized in Table 1. The rightmost column of Figure 4 displays mean surfaces, computed over the independent experiments, of the estimated lograte and range parameters, as well as the fitted bivariate extremal coefficient functions for the three different reference locations. By comparing the true surfaces to the mean estimates, the bias appears to be quite small overall, except perhaps at the boundaries of the study region, . This is a wellknown drawback of local estimation approaches: because the neighborhoods are asymmetric near the boundaries, the bias is generally more severe, and this can also be noticed in our case, e.g., in regions where
. This issue is similar to the boundary problem in kernel density estimation occurring with positively or compactly supported densities; some strategies have been advocated to deal with it, such as taking asymmetric kernels near the boundaries, or performing local linear regression (see, e.g.,
CastroCamilo et al., 2018), but we do not pursue this here. Apart from this minor boundary problem, our local estimation approach succeeds in capturing the complex nonstationary dependence dynamics over space.Configuration  

Weakly nonstationary  0.09  0.59  0.03  0.01  0.03  0.00 
Mildly nonstationary  0.09  0.69  0.06  0.02  0.04  0.00 
Strongly nonstationary  0.17  0.89  0.15  0.05  0.22  0.02 
To assess the variability of the parameter estimates, Figure 5 displays a functional boxplot for the range parameter , projected onto the axis representing the only direction of variation in the true values, and a surface boxplot for the bivariate extremal coefficient for fixed . Functional and surface boxplots are the natural extensions of the classical boxplot to the case of functional data, and we refer to Sun and Genton (2012) and Genton et al. (2014) for their precise interpretation.
As the true range parameter , , varies only with respect to , it is easier to visualize its estimated uncertainty. As expected, appears to be well estimated overall with higher uncertainty for larger values. The estimated median curve follows the true curve very closely, even near the boundaries and around , where the true curve is the steepest, while the functional interquartile range is fairly narrow for all values of . Moreover, the estimates for single runs, superimposed on the functional boxplot, suggest that the hardthresholding approach produces reasonably smooth estimates, even with only about 25 threshold exceedances at each location. In our application in Section 5, we have roughly 100 exceedances per location. As for the extremal coefficient function, the surface boxplot suggests that it can also be rather well estimated with relatively low uncertainty. Table 1 reports the root mean integrated squared error (RMISE) for all the data generating configurations described in Section 4.2. The results are coherent with our intuition: the estimation is more difficult for higher levels of nonstationarity and rougher random fields (i.e., with smaller ). Overall, simulations confirm that our local censored estimation approach provides reasonable estimates of the true range and rate parameters while capturing their dynamics over space.
To assess the performance of our local likelihood approach as a function of the neighborhood size, we fix the rate parameter to the strongly nonstationary scenario, while considering the weakly, mildly and strongly nonstationary cases for the range parameter (recall Figure 4). We then fit the copula model using the local likelihood approach described in Section 3 with neighborhoods defined in terms of nearest neighbors. Table 2 reports the RMISE for all cases. While the RMISE is quite small and fairly constant overall for the range , the rate seems more difficult to estimate, and it improves with lower degrees of nonstationarity for and bigger neighborhoods (i.e., with larger ), even in the strongly nonstationary scenario. This suggests that the size of neighborhoods will likely be dictated by available computational resources. Unless the nonstationarity is extremely severe, it is advisable to consider large neighborhoods, as this would improve the estimation efficiency at a fairly moderate cost in bias.
Configuration for the  

range parameter  
Weakly nonstationary  1.16  0.02  0.44  0.01  0.25  0.00  0.18  0.00  0.15  0.00 
Mildly nonstationary  1.33  0.03  0.61  0.01  0.44  0.01  0.34  0.01  0.3  0.01 
Strongly nonstationary  1.38  0.03  0.69  0.01  0.55  0.02  0.47  0.02  0.45  0.02 
5 Case study: U.S. winter precipitation extremes
5.1 Data description
Daily precipitation data, freely available online, were gathered from the U.S. Historical Climatological Network (USHCN). They are measured in hundredths of an inch and were collected from to at the stations represented in Figure 1. To ensure data quality, we discard all observations marked by any reliability or accuracy flag. We focus on winter data (December 21 to March 20) to remove seasonal effects, and we consider five daycumulative precipitation, in order to capture the intensity and duration of distinct storms and to reduce the effect of temporal dependence. A preprocessing stage was conducted to check for possible temporal nonstationarity, and no evidence of nonstationarity was found (see Section 2.1 of the Supplementary Material). This procedure yields up to observations per station with of missing data overall, which corresponds to about 2 million observations in total at all sites. The resulting dataset ranges from (no rain over five days) to hundredths of an inch.
The empirical and quantiles, plotted in Figure 6 using a logarithmic scale, reveal interesting spatial patterns that are due to marginal distributions varying smoothly over space. In order to disentangle the local marginal and dependence effects, we use the (twostep) local censored likelihood approach described in Section 3 and fit the exponential factor copula model; recall Sections 2 and 3.
5.2 Estimation grid and neighborhood selection
To describe the local dependence structure of precipitation extremes over the U.S., we generate a regular grid (using the WGS84/UTM zone 14N metric coordinate system) with grid points at an internodal distance of km. When plotted with respect to longitude and latitude, this results in a “distorted” grid, owing to the metrictodegree system change.
An important step to fit our model for extremal dependence at each grid point using the local estimation approach described in Section 3 is to select a suitable number of nearest neighbors , which might vary over space. Although crossvalidation techniques would usually be advisable, pragmatic approaches are often adopted in practice: in the time series context, Davison and Ramesh (2000) suggest selecting the local likelihood bandwidth by the naked eye, while in the spatial context, Anderes and Stein (2011)
advocate a heuristic method based on a measure of spatial variation in the estimated parameters. In principle, the choice of
should be such that the spatial dependence structure of threshold exceedances is approximately stationary within each selected neighborhood around . Small neighborhoods (with small ) yield good stationary approximations, but poor statistical efficiency, and vice versa, and our simulation study in Section 4 suggests that should be as large as our computational resources permit, provided the dependence structure is not overly nonstationary. In the hydrological literature, a variety of tests to assess homogeneity of marginal distributions have been proposed; see for instance Lu and Stedinger (1992), Fill and Stedinger (1995), Hosking and Wallis (1993), and Hosking and Wallis (2005). Testing for stationarity of the extremal dependence structure is, however, much more complicated. Here, for simplicity, we assume that the local dependence structure of extremes is stationary whenever this is the case for margins, and we follow the recommendations of Viglione et al. (2007) by testing the homogeneity of the margins through a compromise between the Hosking and Wallis (1993) test and a modified Anderson–Darling test (Scholz and Stephens, 1987). Our adhoc neighborhood selection procedure can, therefore, be summarized as follows: considering an increasing nested sequence of neighborhoods, we test for marginal homogeneity until the test rejects the null hypothesis or a predefined maximum neighborhood size has been reached. In our case, we fix the maximum neighborhood to have radius
km and, for computational reasons, . Although this procedure has no theoretical guarantee to be optimal, we have found that it yields reasonable estimates with our dataset.5.3 Local likelihood inference for extreme precipitation
Following Section 3, we fit the stationary exponential factor copula model (2.2) within small local neighborhoods, by maximizing the censored local loglikelihood (3) at all 2200 grid points , choosing the empirical quantile as a threshold to define extreme events. The lefthand panel of Figure 6 illustrates the selected threshold at each monitoring station on the scale of the observations. As the smoothness parameter in (4.2) is difficult to estimate, we adopt a profile likelihood approach by considering a grid of values for . Among all grid points, the likelihood is maximized in , , , and cases for , respectively. To be able to easily compare rate and scale parameter estimates across space, we then choose to fix at all locations, which boils down to using an exponential correlation function for the underlying Gaussian process. It is important to notice that we are not trying to estimate the smoothness parameter per se. Rather, the aim of the profile likelihood approach and the selection via frequencies is to select (or validate) in a nonarbitrary datadriven way, a reasonable value for . In general, estimation of the smoothness parameter is difficult; with a single realization in a fixed domain, not all the Matérn correlation parameters can be estimated consistently (see Zhang, 2004). Moreover, when is estimated jointly with and , we have found that the estimated parameters become very difficult to identify.
Thanks to the embarrassingly parallel nature of our local likelihood estimation procedure, we can make an efficient use of distributed computing resources by fitting the local model at each grid point independently; overall, a single model fit (fixing , but estimating and ) at all locations took about 25 thousand corehours on a cluster with 39 nodes of 20 cores each.
Figure 7 shows the estimated lograte and logrange parameters, and their estimated standard deviations computed over 300 block bootstrap samples with monthly blocks. Diverse spatial tail dependence structures arise in different regions, although the correlation between the estimated range and rate parameters makes it difficult to interpret.
To better understand the dynamics of the extremal dependence strength over space, we compute the local extremal coefficient as in (2.7). Since our model is fitted locally, interpretation of the extremal coefficient is possible within the selected neighborhoods. As can be seen in Figure 2 in the Supplementary Material, the minimum neighborhood radius is 45 km; therefore we compute the extremal coefficient at distance km; see Figure 8. As expected, tail dependence weakens at larger distances, and the patterns are quite smoothly varying over space. Extremal dependence is stronger in the Pacific West, central region (except for Colorado), midSouth, midWest and North East. By contrast, some states in the South East (e.g., North/South Carolina and Florida) and the Rocky Mountains show less extremal dependence, suggesting that extreme precipitation events are more localized in these regions.
To further validate our fitted model, Figure 2 compares empirical and fitted conditional tail probabilities, as defined in (1.1). The scarcity of observations in the right tail makes the empirical estimates of highly variable, which in turn yields wide empirical confidence envelopes (in dark blue). Nevertheless, the pointwise modelbased estimates (in green) and the 95% confidence envelope (in light blue) suggest that our model succeeds in capturing the extremal dependence between closeby sites, with relatively low uncertainty. Overall, our model has great tail flexibility, and our local estimation approach can uncover complex nonstationary patterns of extremal dependence over space, which has important implications in terms of regional risk assessment of extreme precipitation events.
It is wellknown that small changes in the joint tail dependence structure might result in an important difference in the return period of spatial extremes. To assess this, we compute the return periods (in years) associated with the joint probability of observing extreme precipitation in a set of three locations within five different states. The probabilities were computed by simulating replicates from our copula model over the same 60km internodal distance grid detailed before in Section 5.2, and extremes were defined as those values that exceeded the 99.5% quantile at each grid point (i.e., ). Table 3
displays the selected thresholds on the scale of the data for Louisiana, Mississippi, Kentucky, Florida and Tennessee, notorious to have experienced several extreme precipitation events in the recent past. Pointwise estimates along with bootstrapbased 95% confidence intervals show that the selected locations in the state of Louisiana are at higher risk of simultaneous extreme events than those in the other four states, and should experience such spatial extreme events once every 11 months, on average. These results, of course, strongly depend on the relative positions of the selected locations. Note that the uncertainty, which accounts for both marginal and dependence estimation uncertainty, is moderately low.
State  Thresholds (0.01 inch)  Return period (year) 

Louisiana  {590, 730, 650}  0.9 (0.88, 0.93) 
Mississippi  {520, 470, 640}  12.1 (11.07 13.17) 
Kentucky  {750, 870, 480}  22.7 (20.11 25.25) 
Florida  {550, 1090, 550}  2.3 (2.26 2.44) 
Tennessee  {910, 690, 780}  8.6 (7.98 9.14) 
6 Concluding remarks
We propose a flexible nonstationary factor copula model to model subasymptotic spatial extremes over large heterogeneous regions. Fitted locally, our model can capture weakening dependence strength as events become more extreme. This contrasts with the current spatial extremes literature, which often relies on asymptotic models with a rigid tail structure. Our model can be efficiently fitted to high threshold exceedances using a local censored likelihood estimation procedure at a reasonable computational burden. Our extensive simulation experiments demonstrate the flexibility of our model and the efficiency of our local estimation approach while providing some guidance on the selection of regional neighborhoods.
By fitting our model to precipitation data over the whole contiguous U.S., a diverse and complex tail dependence structure emerges, revealing rich and intuitive spatial patterns. Moreover, our model can be used in practice to compute return periods associated with simultaneous extreme events.
Although our proposed nonstationary model has great additional flexibility over its stationary counterpart, it is still unable to capture weak dependence at large distances, which is, however, a common feature of most models for spatial extremes. This implies that the fitted model needs to be interpreted locally, or perhaps regionally. Further research should, therefore, be devoted to developing models for spatial and spatiotemporal extremes that can more realistically capture longrange independence. Morris et al. (2017) discuss one way to achieve this based on random partitions of the study region.
Acknowledgements
We thank Luigi Lombardo (KAUST) for cartographic support and Eduardo González (KAUST) for computational support. We extend our thanks to Dan Cooley (Colorado State University) for helpful comments and suggestions. Support from the KAUST Supercomputing Laboratory and access to Shaheen is also gratefully acknowledged.
Appendix A Simplified expressions for the model likelihood
In this appendix, we obtain simplified expressions for the local censored likelihood function (3) based on Model (2.2), in order to compute the maximum likelihood estimates in a reasonable time, while avoiding numerical instabilities.
a.1 Joint density of the process in (2.2)
a.2 Joint distribution of the process in (2.2)
Here we express the finitedimensional distribution as a function of conditional multivariate normal distributions. This alternative formulation allows us to use efficient routines implemented in R, substantially improving accuracy and execution time. Using integration by parts in expression (2.3), we obtain
where and and are the conditional mean and covariance matrix of , respectively. Precisely,
where denotes the th column of the matrix with the th row removed, etc. Now,
(A.1) 
where follows the standard multivariate normal distribution in dimension , and is independent of the univariate normal random variable with mean and unit variance, and where . To compute (A.1), notice that the dimensional vector is jointly multivariate normal, and that has mean and covariance matrix . Hence, we have
where
Finally, we obtain
(A.2) 
In particular, by setting , the marginal distribution may be written as
(A.3) 
a.3 Partial derivatives of the joint distribution
We want to compute , the partial derivative of with respect to the variables indexed by the set . Without loss of generality, we here assume that , for some integer . Writing , we can express the covariance matrix in block notation as follows:
Then, writing , with and , we have
Comments
There are no comments yet.