New Exploratory Tools for Extremal Dependence: Chi Networks and Annual Extremal Networks

Understanding dependence structure among extreme values plays an important role in risk assessment in environmental studies. In this work we propose the χ network and the annual extremal network for exploring the extremal dependence structure of environmental processes. A χ network is constructed by connecting pairs whose estimated upper tail dependence coefficient, χ̂, exceeds a prescribed threshold. We develop an initial χ network estimator and we use a spatial block bootstrap to assess both the bias and variance of our estimator. We then develop a method to correct the bias of the initial estimator by incorporating the spatial structure in χ. In addition to the χ network, which assesses spatial extremal dependence over an extended period of time, we further introduce an annual extremal network to explore the year-to-year temporal variation of extremal connections. We illustrate the χ and the annual extremal networks by analyzing the hurricane season maximum precipitation at the US Gulf Coast and surrounding area. Analysis suggests there exists long distance extremal dependence for precipitation extremes in the study region and the strength of the extremal dependence may depend on some regional scale meteorological conditions, for example, sea surface temperature.

READ FULL TEXT VIEW PDF
POST COMMENT

Comments

There are no comments yet.

Authors

page 18

03/19/2021

Recognizing a Spatial Extreme dependence structure: A Deep Learning approach

Understanding the behaviour of environmental extreme events is crucial f...
02/11/2022

POT-flavored estimator of Pickands dependence function

This work proposes an estimator with both Peak-Over-Threshold and Block-...
03/30/2022

Seasonality, density dependence and spatial population synchrony

Spatial population synchrony has been the focus of theoretical and empir...
10/02/2017

Local likelihood estimation of complex tail dependence structures in high dimensions, applied to U.S. precipitation extremes

In order to model the complex non-stationary dependence structure of pre...
05/07/2019

Tail dependence and smoothness

The risk of catastrophes is related to the possibility of occurring extr...
12/11/2019

Estimating high-resolution Red sea surface temperature hotspots, using a low-rank semiparametric spatial model

In this work, we estimate extreme sea surface temperature (SST) hotspots...
06/09/2021

Spatial modelling of COVID-19 incident cases using Richards' curve: an application to the Italian regions

We introduce an extended generalised logistic growth model for discrete ...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

The 2017 hurricane season was the costliest tropical cyclone season on record (Halverson, 2018). The season had 17 named storms, 10 hurricanes, and 6 major hurricanes. Great damage was done to US territories by three major hurricanes, Harvey, Irma, and Maria. These three storms were distinct both in location and timing: Harvey impacted Texas and Louisiana in mid-to-late August, Irma impacted Florida in early September, and Maria devastated Puerto Rico in late September. A standard treatment of the data from these three storms would likely treat them as independent events, due to their temporal and spatial distance. However, it is largely acknowledged that these storms are related, arising from conditions in 2017 conducive to tropical cyclone formation and intensification (Emanuel, 2011; Trenberth et al., 2018). While it is important to assess local impacts of each storm, understanding the connection of hurricane-induced extreme precipitation is also important for insurance companies or government agencies (e.g., Federal Emergency Management Agency (FEMA)) who need to prepare for the possibility of responding to multiple extreme events in a year. Thus we seek to use an approach that studies such phenomena in context of each other, specifically using a spatial network type approach.

The concept of networks has been used in climate science for about 15 years. The connection was first established by Tsonis and Roebber (Tsonis and Roebber, 2004; Tsonis et al., 2006), who introduced the concept of climate networks to study the dynamics of Earth’s climate system. A climate network consists of nodes (e.g., grid cells from a climate model output/gridded reanalysis data product) and edges. Two nodes are connected by an edge if the strength of a (pair-wise) dependence measure between the corresponding pairs of time-series is “strong” enough. For example, Tsonis and Roebber (2004) connected nodes with correlation greater than 0.5. This approach has been applied to many climate applications (e.g. Tsonis et al., 2006, 2008; Donges et al., 2009; Steinhaeuser et al., 2011) to gain insights into the dynamics of the climate system over many spatial scales, e.g., teleconnection paths.

Other important network types in climate science are event synchronization networks (Quiroga et al., 2002; Malik et al., 2010, 2012; Boers et al., 2014), where connections are defined based on whether extreme events at one point commonly coincide (within the span of a selected time interval) with extreme events at another point; phase synchronization networks (Yamasaki et al., 2009), where the signals at each point are viewed as oscillations, and connections are defined based on the apparent coupling between those oscillations; and causal networks (Runge, 2014; Ebert-Uphoff and Deng, 2014; Zerenner et al., 2014; Kretschmer et al., 2016), where connections are defined to represent potential cause-effect relationships. Causal networks can be established by many different methods. For example, they can be based on probabilistic graphical models (Pearl, 2000; Spirtes et al., 2000) (for their use in climate science, see for example (Ebert-Uphoff and Deng, 2014)

); on combining the concept of Granger causality with vector autoregression (VAR)

(Lütkepohl, 2007) or LASSO (Tibshirani, 1996) methods, as known as Graphical Lasso or GLasso, (Yuan and Lin, 2007; Friedman et al., 2008) (for their use in climate science see for example (Bahadori and Liu, 2011)); or on Gaussian models (Zerenner et al., 2014). Note that the majority of these network types are closely related to the concept of correlation. Namely, correlation networks use correlation itself (Tsonis and Roebber, 2004); many probabilistic graphical models are derived using partial correlation (Ebert-Uphoff and Deng, 2014);VAR and GLasso approaches approximate regression coefficients, which can be expressed in terms of partial correlation; and Gaussian models approximate the precision matrix (i.e., inverse covariance matrix), whose elements can be interpreted as partial correlation coefficients. Only two network types mentioned above are not based on correlation, namely phase synchronization networks, which with their assumption of oscillators do not appear to be a good match to model extremes, and event synchronization networks, which are in fact most closely related to the concepts discussed here, but are not based on well understood statistical measures.

Correlation measures the linear relationship between a bivariate random vector around the center of its distribution and hence is poorly suited for describing dependence in the tail, i.e., the dependence between extreme events. In contrast, several dependence measures designed specifically for extremes have been proposed in the literature and those are likely to be more appropriate than correlation to construct networks that focus on analyzing relationships between extremes. Three examples are the upper tail dependence, aka tail dependence coefficient () (Coles et al., 1999), the extremal coefficient () (Smith, 1990; Schlather and Tawn, 2003), and the F-madogram () (Cooley et al., 2006; Naveau et al., 2009). See Chapter 2 of Yan and Dey (2015) for a review. The upper tail dependence is defined as

(1)

where and

are two random variables and

their corresponding cumulative distribution function (CDF). Thus the

measure is the (limiting) conditional probability of one random variable being extreme, given that the other random variable is extreme. Note that the

measure is symmetric in terms of which variable is chosen to be conditioned upon, due to the standardization of the marginal CDFs.

In Section 2, we introduce a new network type, the network, where we use the aforementioned upper tail dependence () to explore the spatial dependence structure in extremes. We briefly proposed this idea at a workshop (Ebert-Uphoff et al. (2018)), but it is fully developed for the first time here. Specifically, we construct a network by connecting the pairs of nodes whose exceed some specified value (see Sec. 2.1 for details). However, we find that a straight forward implementation of this procedure does not perform well as it introduces many spurious network connections (see Sec. 2.2). This problem, which can be described as a network bias, is not unique to networks. In fact, we found that the same bias can occur in traditional correlation networks as well, although, to the best of our knowledge, this issue has never been discussed for correlation networks. In Sec. 2.3 we describe a method to alleviate this issue by incorporating the spatial structure on observed from the empirical estimates to reconstruct the network. Although we did not test this hypothesis yet, we conjecture that this bias correction method could be used to correct the bias in correlation networks as well.

The network assesses extremal dependence integrated over time, which is useful for assessing risk. However, as such it cannot be used to explore the “dynamics” (i.e., year-to-year) structure of extreme events in order to gain insights about the potential mechanisms that contribute to extreme behavior, such as the 2017 hurricane season. In Section 2.4 we develop another network type, the annual extremal network, that aims to explore the potential drivers of extreme behavior.

In Section 3, we create a network to hurricane-season maximum daily precipitation data collected in the US Gulf Coast to explore the spatial pattern of the extremal network connections. We also apply the annual extremal network to this precipitation data and examine whether sea surface temperature affects characteristics of the annual networks.

It should be noted that the purpose of the new network types proposed here is not to replace existing network types, such as correlation networks or event synchronization networks, but to supplement them. Namely, for any given application one might construct both a correlation network, which is more sensitive to identifying dependencies near the mean of the distribution, and a network, which is more sensitive to identifying dependencies between extremes. The two different networks are thus expected to provide two complementary viewpoints of the spatial dependencies. However, comparing these types of networks is not trivial, so is subject of future research, and here we simply focus on presenting the new type of network.

2 and Annual Extremal Networks

In this section we first describe how to construct a network, but also in tandem describe an illustrative simulation study. Our reason for interweaving these two discussions is that the simulation study exposes a bias issue with the network that arises due to the act of thresholding at a level of and creating edges for pairs of nodes whose estimated exceeds this threshold. We will begin by introducing the needed background in order to describe the procedure for estimating for each pair of nodes/locations in Section 2.1. We then present a simulation study to illustrate the issue of network bias (Sec. 2.2), which we then correct by conducting a spatial regularization on (Sec. 2.3). We conclude the section by describing the annual extremal networks (Sec. 2.4).

2.1 An Empirical Estimator of

We first discuss how we estimate from observations of a random sample of a bivariate random vector. To develop an estimator for one would need to first estimate the marginal distributions. Second, since is the limiting (conditional) probability, one would need to choose a threshold to estimate . There is a bias-variance trade-off to be made in the choice of . Using a that is not sufficiently high results in increased bias, while using a that is too high results in increased variance due to paucity of data. In practice, often is estimated for many values of and plotted (see, for example, the chiplot function of the evd package (Stephenson, 2002) for R (R Core Team, 2018)).

Our particular application, with an additional assumption of max-stability, allows us to avoid the selection of . Our motivation here is to better understand relationships between hurricane events in a given year, which leads us to investigate annual extremes. To make this precise, let , where be the original observations with sample size , be the block size, and be the sample size for the block maximum. For example, could denote the daily rainfall amounts at two locations, and the time index (e.g., starting from January 1st for some year). If we use (or for leap years), then would be the annual maximum precipitation amounts at these two locations in year , where their dependence is of interest for risk assessment.

Below, we use estimators of extremal dependence that have been proposed for the max-stable setting to obtain our estimator for . Max-stability is an important notion that underlies much of extreme value analysis and here we give a brief introduction, The reader is referred to (Coles et al., 2001; Beirlant et al., 2004; de Haan and Ferreira, 2006; Segers, 2012) for more details. A multivariate distribution is max–stable if the distribution is close under (component-wise) maximization. That is, each marginal distribution, by Extremal Types Theorem (Fisher and Tippett, 1928; Gnedenko, 1943), is belongs to the family of generalized extreme value (GEV) (Jenkinson, 1955; Coles et al., 2001). Furthermore, the dependence structure, which can be described by its copula (Joe, 1997; Nelsen, 2007) is satisfied the property that for all . The copula is called max-stable copula which form the basis for much of multivariate and spatial extreme value analysis (Davison et al., 2012; Ribatet and Sedki, 2013). The implication of max-stability is that any pairwise for a max-stable random vector is a constant with respect to and hence there is no need to choose a threshold. Moreover, there exists deterministic relationships between and the other aforementioned summary statistics for extremal dependence, namely, and under the max-stable condition. We will make use of these relationships to develop our estimator for in the next paragraph.

We first estimate the marginal distributions of and , denoted as , non-parametrically by their empirical distribution functions (EDFs) . We then estimate the F-madogram, the first order absolute difference in CDF

(2)

using the following empirical estimator

(3)

(Cooley et al., 2006) has shown that there is a one to one correspondence between and

(4)

Furthermore, by the max-stability assumption, we have . Therefore we use the plug-in estimate to obtain the resulting estimator of :

(5)

To create a network, we first apply this estimator to all the pairwise components of a dimensional random vector . We then connect those pairs with greater than a chosen threshold to construct the corresponding network.

2.2 Simulation Study

The purpose of this simulation study is to examine the performance of the empirical estimator introduced in the previous section for estimating the network, where the network consists of pairs with greater than some pre-specified threshold. To do that we will need to generate data from a random vector for which we know the true network for a given threshold. The estimated network can then be compared with the true network for evaluation. This simulated network thus serves as a benchmark for the proposed new network estimation procedure.

To make this simulation relevant in the context of environmental applications, we first simulate 100 spatial locations uniformly in a domain and these are the nodes of the network. We then generate realizations from a stationary and isotropic Brown-Resnick process (Brown and Resnick, 1977; Kabluchko et al., 2009) at these locations. A Brown–Resnick process is a flexible class of stationary max-stable processes where every finite collection of those random variables has a multivariate max-stable distribution. The dependence structure of a Brown–Resnick process is induced by a Gaussian process via the spectral representation of max-stable process (de Haan, 1984). The interested reader is referred to (Kabluchko et al., 2009; Huser and Davison, 2013; Thibaud et al., 2016) for more details. The implied value of a Brown–Resnick process for each pair only depends on their spatial distance, denoted by , and the (Gaussian) process parameters , the range, and , the smoothness: , where

is the CDF of the standard normal distribution. Setting

, , and , two nodes will be connected if their distance is less than 0.107, a relatively small distance compare with the spatial domain in our simulation study.

We simulate 100 realizations (the Monte Carlo sample size), each with sample size and dimension (i.e., a Brown-Resnick process observed at the aforementioned 100 spatial locations). For each realization in the simulation study, we estimate the network with using the empirical estimator described in Sec. 2.1. Comparing the estimated networks and the true network, it is apparent that the number of edges is overestimated (see Fig. 1 (a), (b) and (c)). We further evaluate the estimation performance in terms of the true positive rate (TPR) and positive predictive value (PPV), defined as

(6)
(7)

While TPR is reasonably high ( 80 %), the estimation procedure identifies many more false positives than true positives as the PPV is rather low (, see Fig. 1 (d)).

Figure 1: (a): The true network. (b): The best network estimate (i.e., the number of node connections is closest to the true network). (c): Histogram of estimated number of node connections, the blue and green vertical lines indicate the true and the “best” estimated value of the connections, respectively. (d) The TPR and PPV of the network connections.

Fig. 2 provides insight into how this biased estimate of the number of connections arises. Plotted is the true as a function of spatial distance along with one simulation’s estimates. Although the estimates themselves appear unbiased as they “center” around the true values, it is the act of thresholding which introduces the bias. Clearly there are more false positives than false negatives due to the shape of the true function and the increasing variance of the estimates as decreases. Note that this problem is not unique to the networks proposed here. In fact, we confirmed in another simulation that the same effect can also occur for the well-established correlation networks, although, to the best of our knowledge, this issue has never been mentioned in the literature on climate networks.

Figure 2: as a function of spatial distance. The red curve is the true function. The black points are individual estimates from one realization. The horizontal blue dashed-line is the cutoff () of the network and the vertical blue dashed-line is the spatial distance cutoff of the true network. The texts TP, FP, FN, TN denote true positive, false positive, false negative, and true negative, respectively.

2.3 Network Bias Correction

Here we develop a method to correct the bias of the network estimation as demonstrated in the previous subsection. We exploit the spatial structure revealed from (see Fig. 2) to spatially regularize the network estimate. In what follows we will describe a new estimator for , denoted by , which is a weighted average of empirical estimator and an estimator that explicitly incorporates spatial information . The weight for a given pair , denoted as , will depend on the estimation precision of and . The resulting estimator, the weighted average of and , has an empirical Bayes interpretation (see Loader and Switzer (1992) in the context of spatial covariance function estimation), can then be used to construct the network. Specifically, the bias-corrected estimator has the following form:

(8)

In the following we first describe how to estimate and then discuss how to come up with the weights .

2.3.1 The estimator for

We smooth the , the empirical estimates for all the pairs, by spatial distance to obtain . Specifically, we divide into distance bins and we compute the mean value of within each bin, denoted by . We then fit a cubic smoothing spline regression (Wahba, 1990; Gu, 2013) to , where denotes the average distance within the bin, using the sreg function in the fields R package (Nychka et al., 2015) to get the as a function of spatial distance only. Note that this approach effectively introduced a spatially stationary and isotropic structure on , which is less tenable especially when is “large”. Therefore, we assume for any , is normally distributed with variance increases with spatial distance in order to weaken this somewhat strong and rigid assumption in order to learn potential nonstationary pattern on .

2.3.2 The weights

The weight for a given pair is determined by the estimation variability of (), and the amount of departure from , quantified by a variance function: . We perform a “spatial” block bootstrap (see e.g., Wang et al. (2016), Appendix C, or Huang et al. (2016), Appendix D) by re-sampling the vectors to obtain the bootstrap variances as the estimates for the corresponding . We then use as the weight .

The estimator (8) can be formulated as the posterior mean from the hierarchical model

(9)
(10)
(11)

can be interpreted as an (empirical) Bayes (Robbins, 1955; Efron and Morris, 1972; Casella, 1985) estimator, where the empirical aspect arises from the fact that we estimate from .

2.3.3 An illustration

Following the bias correction procedure described in previous subsections, we first bin the into bins to obtain . We then fit a cubic spline regression to to get the estimate of (see Fig. 3, (a)), we then assume , which is an increasing (logistic) function with

. As described in previous subsection, the bootstrap standard deviations are used as the estimates of

and we compare those estimates with the “oracle” standard errors (i.e., standard deviations (SDs) among those 100 Monte Carlo simulations from the true model). As would be expected the bootstrap SDs are (slightly) higher than their corresponding oracle SDs but they show similar behavior, at least qualitatively. We can then plug in the

, , and to obtain the resulting estimate for each pair to construct the bias-corrected network. This estimator performs well as have been “shrunk” toward , which is much closer to the truth than (Fig. 3, (c)).

Figure 3: Illustration of bias correction. (a): The estimated (blue curve), true (red curve), and the empirical estimate (black points). (b): The scatter plot of . All the points are above the 1:1 line means that the bootstrap standard errors are slightly higher than the “true” standard errors. (c): As in Fig. 2 but we add the bias-corrected estimates (blue points).

The comparison of TPR/PPV for the empirical and the bias-corrected network estimators shows that our method not only identifies most of the true connections, i.e., TPR 1, but also avoids making many false discoveries, i.e., PPV for the bias-corrected estimator is substantially higher than that of the empirical estimator (see Tab. 1).

TPR percentile 5th 25th 50th 75th 95th
0.76 0.80 0.83 0.86 0.91
0.88 0.97 1 1 1
PPV percentile 5th 25th 50th 75th 95th
0.26 0.31 0.34 0.37 0.41
0.58 0.76 0.88 0.99 1
Table 1: Summary of TPR/PPV for the empirical and the bias-corrected network (i.e., > 0.3) estimators.

This bias correction method, with some modifications, is expected to apply also to correlation networks to compensate for their network bias, but that topic is beyond the scope of this work.

2.4 Annual Extremal Network

The estimation procedure of the network and its bias-corrected version assumes that the dependence structure of the block maximum, , does not change with “time” (i.e., , is a random sample from an underlying time invariant random vector ), so that one can use these “replicates” to infer the spatial extremal dependence. Moreover, we assume that is max-stable to avoid the need to choose a high threshold to estimate . Here we introduce another type of network, the annual extremal network, to examine the year-to-year variation of the “observed” extremal networks and we seek to explain this variability. The main idea of the annual extremal network is to connect a pair of nodes in a given “year” if extreme events occur at both locations. For example, an extreme event can be defined as the value exceeding a high percentile, say the 95th percentile. In this case, two locations are connected if both locations exceed their 20-year return level in a given “year”. Specifically, we treat the EDFs of , denoted by , where , as the “data” and for each , we connect pairs if both and are greater than .

The definition of the annual extremal network defined here has similaraties to the definition of event synchronization networks (Malik et al., 2010, 2012; Boers et al., 2014), but the latter operate on a completely different time scale. Event synchronization networks in climate science typically consider daily data, and define extreme events at each location to be those exceeding a certain threshold, e.g. the 95th percentile of the overall distribution. Two locations are connected if an extreme event at one location is frequently followed by an extreme event at another location within a matter of days, which can be used to track the propagation of individual phenomena from one location to another, e.g., the propagation of storm systems. In contrast, the annual extremal network, due to its annual scale, investigates whether the overall conditions in a given year lead to several events in the same year, although those events may not have any direct connection to each other. Thus, due to the different time scale (days vs. years) these two network types focus on very different phenomena.

Summary statistics (e.g., the number of the connections, the average distance of connections at each location) of this annual network are time series (on an annual scale) and can be related to some physically meaningful covariates to explore its year-to-year variation. In Section 3.3 we use the number of “long distance” connected pairs to relate this statistic to sea surface temperature. Note that the annual extremal network described here is constructed directly from the “data”, namely the EDFs for all the locations, whereas the network is constructed from the statistic that was estimated from the EDFs. Therefore, by ignoring the estimation uncertainty of EDFs themselves, we do not need to develop a bias correction procedure as we did for the network.

3 Gulf Coast Extreme Precipitation Application

3.1 Data Set

We use a subset of the Global Historical Climatology Network (GHCN) data set (Menne et al., 2012), where the spatial domain includes Texas, Louisiana, Mississippi, Alabama, Florida, and Georgia and the time period is from Jan. 1949 to Oct. 2017. There are 339 GHCN stations being included in this study where the selecting criterion is that those stations should have at least 90 percentage of non-missing daily precipitation values. We further restrict our attention to the annual maximum of the daily precipitation values from June to October to investigate the extremal dependence structure for the hurricane season.

3.2 Network for Gulf Coast Extreme Precipitation

We first compute the network based on the empirical estimator as described in Sec. 2.1 and we assess the network estimation uncertainty using a spatial block bootstrap. The results obtained from the block bootstrap (Fig. 4, Left panel) clearly demonstrate the issue of overestimation. We then conduct a bias correction using the method described in Sec. 2.3. The right panel in Fig. 4 displays the , , and as a function of the spatial distance.

Figure 4: Left: Histogram of estimated number of node connections of the block bootstrap sample, the black vertical lines indicate the estimated value of the connections of the original sample. Right: (gray points), (blue curve), and (black curve) of the Gulf Coast hurricane season maximum precipitation.

Fig. 5 shows that many long distance links in the network based on the empirical estimator disappear and may simply be due to estimation variability for using the empirical estimator. However, the bias-corrected network does identify some long distance edges, for example between the Houston and Tampa regions. It is important to note that although takes on values between 0 and 1, its values should not be interpreted in a similar manner to correlation values. A of 0.3 between two distant locations would imply strong tail dependence; that is, when one location experiences a very extreme event, the other would have a 30 percent chance of also experiencing an extreme event. Individual links are hard to interpret, and we do not believe there is some phenomenon that specifically links Houston to Tampa. Nevertheless, these long-distance network links at a minimum suggest that calculating annual risk between two distant regions like Houston and Tampa should not be done with an independence assumption as this would underestimate joint risk.

Figure 5: network of the hurricane-season maximum daily precipitation across US Gulf Coast. Top: The initial empirical estimate of the network. Bottom: The bias-corrected network estimate. (The connections in this figure are best viewed on-screen.)

3.3 Annual Extremal Precipitation Network for Gulf Coast

In this subsection we construct annual extremal networks in the study region to examine the year-to-year structure. Specifically, for each hurricane season we identify the connections where their EDFs of the hurricane season maximum exceed 0.95 (i.e., both places experience at least a 20-year event in that year, see Fig. 6 for examples). To further sutdy the long range extremal dependence, we compute the number of pairs where their spatial distance are greater than 1,000km. We then study how the inter-annual variability of the numbers of these “long distance” extremal pairs might be explained by some meteorological covariate, for example, annual mean sea surface temperature (SST) averaged in a relevant spatial region. Here we use the Hadley Centre Global monthly average Sea Surface Temperature version 1.1 (HadISST 1.1, Rayner et al. (2003)) and we compute the spatial average of the annual mean SST over a “rectangular” region with longitude from west to west and latitude from north to north where the annual window is defined as from July of the previous year to June (i.e., the 2017 mean SST is computed by averaging the monthly SST from July 2016 to June 2017).

Figure 6: Annual extremal precipitation networks for the year 1988 (Left) and 2017 (Right). The EDFs are plotted (circles) in the spatial domain using a blue-white-red color scheme (see the horizontal color bars). These two annual extremal networks illustrate the year to year variation. Note that the year 1988 is the year with lowest Gulf Coast SST and 2017 is the year with the highest SST. SST are added as images using a rainbow color scheme (see the vertical col bars).

We explore the potential cause of the annual variation in the number of extremal connections as shown in Fig. 6. In Fig. 7 (left panel), we plot two time series together, the red one for the proportion of the long distance (i.e., > 1,000km) extremal connections on a log scale (log ratio hereafter), where this log ratio is computed as the logarithm of the number of the connections divided by the total number of the long distance pairs in the study region in a given year. The blue line is the SST time series. There exists “co-movement” between these two time series motivated us to perform regression analyses where we use the log ratio as the response and SST as the predictor (see Fig. 7

, Right panel). Specifically, we perform two regression analyses to study this relationship. First, a linear regression (

lm):

Second, a generalized linear model (McCullagh and Nelder, 1989)

assuming the number of the long distance extremal connections follows a Poisson distribution with its log intensity, denoted by

, depends linearly on SST (glm):

Both the lm (although the homoscedasticity assumption is likely not appropriate here) and glm model fits suggest that the SST can explain some proportion of the year-to-year variation observed in the number of long distance extremal pairs, and the estimated slopes ( and ) are positive with p-values and . The results here suggest that the annual mean Gulf Coast SST may partly contribute to the abnormally long distance extremal dependence observed in the 2017 hurricane season.

Figure 7: Left: The time series of the proportion of the number of long distance extremal connections (red) and the time series of Gulf Coast annual mean SST (blue). Right: Scatterplot of the log of the long distance extremal connections versus Gulf Coast SST. The dashed line represents the linear regression (lm) fit and the solid line represents the Poisson regression (glm) fit.

3.4 Scientific Implications

More intense hurricanes (i.e., named storms) under climate change lead to great US economic losses (e.g., Emanuel, 2011; Klotzbach et al., 2018). Studies already found that number of extreme precipitation events associated with hurricane has increased in recent decades (e.g., Kunkel et al., 2010). But it is still largely unknown how the spatial dependence of extreme precipitation may respond. Sparse summer convective precipitation events have typical radius around 10-100km, which lead to short-distance spatial dependence. However, major Atlantic hurricanes can grow to 800km measured with the outer radius of vanishing wind (Chavas et al., 2016). Such large low-pressure system deliver huge amount of water when it moves from the warm ocean to land. In this case, faraway places like some regions in Texas and Florida could all encounter extreme precipitation during a hurricane season. Here our biase-corrected network (see Fig. 5, Bottom panel) captures a range of precipitation spatial dependence, not only the common short-distance connectivity among nearby stations but also the long-distance connectivity across states associated with hurricane events. With this overall understanding for the spatial extremal dependence, we further examine the temporal variation of the annual extremal precipitation networks and we investigate the potential relationship with the US Gulf Coast SST. The analyses in Sec. 3.3 reveal there exists a robust relationship between the number of long distance extremal precipitation connection in southern states of US and pre-hurricane season (previous year July to current year June) SST in Gulf of Mexico. It agrees with the understanding that increased ocean heat content leads to more tropical cyclone (Trenberth et al., 2018), which in turn induces more long-distance preciptation extremes. In summary, by combining the extremal dependence estimates with the network analysis, we are able to quantitatively measure how the hurricane-induced extreme precipitation changes under climate change. The network perspective enables us to reveal the structural change among a suite of stations, which provide additional information to standard analysis based on individual locations.

4 Discussion

This work presents two new tools, the and annual extremal networks, for exploring dependence structure of extreme values for space-time data sets. In contrast to the widely used correlation network in the climate literature, the network uses statistics to summarize extremal dependence, and is hence suitable for studying extremal dependence. We demonstrate through a simulation study that a network constructed by using an empirical estimator could overestimate the degree of the network. We found that this issue can also arise if one uses the sample correlation to construct a correlation network. We propose a bias-correction method to incorporate the spatial dependence structure to alleviate the network estimation bias for the network. We expect that this bias correction procedure could also be used to overcome the bias in traditional correlation networks. We then apply the network to the Gulf Coast hurricane season rainfall extremes to study the spatial dependence structure of the annual maximum precipitation observed over the Gulf Coast.

We also apply the annual extremal network to further explore the potential non-stationary long-range extremal dependence. Our results suggest that the sea surface temperature may contribute to the abnormally long distance extremal dependence observed in the 2017 hurricane season.

Due to the exploratory nature of the network, there are some limitations in terms of providing the whole picture of the extremal dependence structure of an environmental process of interest. First, the statistic is a measure of pairwise tail dependence and hence not able to describe the higher order extremal dependence. Second, is an appropriate extremal dependence metric in the case of asymptotic dependence, it is less clear about the usefulness of this measure if the environmental process of interest exhibit weakening spatial dependence as events become more extreme (Huser et al., 2017; Wadsworth et al., 2017; Huser and Wadsworth, 2018; Bopp et al., 2018).

In this work, we make several simplifying assumptions for developing our bias-corrected method in order to obtain a close form expression for our estimator. One could try to relax these assumptions at the price that the computations needed for a sampling based approach, which can be substantial. Given that the number of pairs will be large for most environmental applications, we prefer a simple method such as the one we proposed here, that allows us to quickly explore the extremal dependence structure, which could provide some guidance on how to proceed the following confirmatory analysis.

References

  • Bahadori and Liu (2011) M. T. Bahadori and Y. Liu. Granger causality analysis with hidden variables in climate science applications. In Climate Informatics workshop (CI 2011), 2011.
  • Beirlant et al. (2004) J. Beirlant, Y. Goegebeur, J. Segers, and J. Teugels. Statistics of extremes: theory and applications. John Wiley & Sons, 2004.
  • Boers et al. (2014) N. Boers, A. Rheinwalt, B. Bookhagen, H. M. Barbosa, N. Marwan, J. Marengo, and J. Kurths. The south american rainfall dipole: a complex network analysis of extreme events. Geophysical Research Letters, 41(20):7397–7405, 2014.
  • Bopp et al. (2018) G. P. Bopp, B. A. Shaby, and R. Huser. A hierarchical max-infinitely divisible process for extreme areal precipitation over watersheds. arXiv preprint arXiv:1805.06084, 2018.
  • Brown and Resnick (1977) B. M. Brown and S. I. Resnick. Extreme values of independent stochastic processes. Journal of Applied Probability, 14(4):732–739, 1977.
  • Casella (1985) G. Casella. An introduction to empirical bayes data analysis. The American Statistician, 39(2):83–87, 1985.
  • Chavas et al. (2016) D. R. Chavas, N. Lin, W. Dong, and Y. Lin. Observed tropical cyclone size revisited. Journal of Climate, 29(8):2923–2939, 2016.
  • Coles et al. (1999) S. Coles, J. Heffernan, and J. Tawn. Dependence measures for extreme value analyses. Extremes, 2(4):339–365, 1999.
  • Coles et al. (2001) S. Coles, J. Bawa, L. Trenner, and P. Dorazio. An introduction to statistical modeling of extreme values, volume 208. Springer, 2001.
  • Cooley et al. (2006) D. Cooley, P. Naveau, and P. Poncet. Variograms for spatial max-stable random fields. In Dependence in probability and statistics, pages 373–390. Springer, 2006.
  • Davison et al. (2012) A. C. Davison, S. Padoan, M. Ribatet, et al. Statistical modeling of spatial extremes. Statistical Science, 27(2):161–186, 2012.
  • de Haan (1984) L. de Haan. A spectral representation for max-stable processes. The annals of probability, pages 1194–1204, 1984.
  • de Haan and Ferreira (2006) L. de Haan and A. Ferreira. Extreme value theory: an introduction. Springer, 2006.
  • Donges et al. (2009) J. F. Donges, Y. Zou, N. Marwan, and J. Kurths. Complex networks in climate dynamics. The European Physical Journal Special Topics, 174(1):157–179, 2009.
  • Ebert-Uphoff and Deng (2014) I. Ebert-Uphoff and Y. Deng. Causal discovery from spatio-temporal data with applications to climate science. In Machine Learning and Applications (ICMLA), 2014 13th International Conference on, pages 606–613. IEEE, 2014.
  • Ebert-Uphoff et al. (2018) I. Ebert-Uphoff, W. Huang, A. Mitra, D. Cooley, S. Chatterjee, C. Chen, and Z. Wang. Studying extremal dependence in climate using complex networks. In Proceedings of the 8th International Workshop on Climate Informatics (CI 2018), Boulder, CO, 2018.
  • Efron and Morris (1972) B. Efron and C. Morris. Limiting the risk of bayes and empirical bayes estimators?part ii: The empirical bayes case. Journal of the American Statistical Association, 67(337):130–139, 1972.
  • Emanuel (2011) K. Emanuel. Global warming effects on us hurricane damage. Weather, Climate, and Society, 3(4):261–268, 2011.
  • Fisher and Tippett (1928) R. A. Fisher and L. H. C. Tippett. Limiting forms of the frequency distribution of the largest or smallest member of a sample. In Mathematical Proceedings of the Cambridge Philosophical Society, volume 24, pages 180–190. Cambridge Univ Press, 1928.
  • Friedman et al. (2008) J. Friedman, T. Hastie, and R. Tibshirani. Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9(3):432–441, 2008.
  • Gnedenko (1943) B. Gnedenko. Sur la distribution limite du terme maximum d’une serie aleatoire. Annals of mathematics, pages 423–453, 1943.
  • Gu (2013) C. Gu. Smoothing spline ANOVA models. Springer series in statistics, 297. Springer, New York, 2nd ed. edition, 2013. ISBN 1299337511.
  • Halverson (2018) J. B. Halverson. The costliest hurricane season in us history. Weatherwise, 71(2):20–27, 2018.
  • Huang et al. (2016) W. K. Huang, M. L. Stein, D. J. McInerney, S. Sun, and E. J. Moyer. Estimating changes in temperature extremes from millennial-scale climate simulations using generalized extreme value (gev) distributions. Advances in Statistical Climatology, Meteorology and Oceanography, 2(1):79–103, 2016. doi: 10.5194/ascmo-2-79-2016. URL https://www.adv-stat-clim-meteorol-oceanogr.net/2/79/2016/.
  • Huser and Davison (2013) R. Huser and A. C. Davison. Composite likelihood estimation for the brown–resnick process. Biometrika, 100(2):511–518, 2013.
  • Huser and Wadsworth (2018) R. Huser and J. L. Wadsworth. Modeling spatial processes with unknown extremal dependence class. Journal of the American Statistical Association, pages 1–11, 2018.
  • Huser et al. (2017) R. Huser, T. Opitz, and E. Thibaud. Bridging asymptotic independence and dependence in spatial extremes using gaussian scale mixtures. Spatial Statistics, 21:166–186, 2017.
  • Jenkinson (1955) A. F. Jenkinson. The frequency distribution of the annual maximum (or minimum) values of meteorological elements. Quarterly Journal of the Royal Meteorological Society, 81(348):158–171, 1955.
  • Joe (1997) H. Joe. Multivariate models and multivariate dependence concepts. Chapman and Hall/CRC, 1997.
  • Kabluchko et al. (2009) Z. Kabluchko, M. Schlather, L. De Haan, et al. Stationary max-stable fields associated to negative definite functions. The Annals of Probability, 37(5):2042–2065, 2009.
  • Klotzbach et al. (2018) P. J. Klotzbach, S. G. Bowen, R. Pielke Jr, and M. Bell. Continental united states hurricane landfall frequency and associated damage: Observations and future risks. Bulletin of the American Meteorological Society, (2018), 2018.
  • Kretschmer et al. (2016) M. Kretschmer, D. Coumou, J. F. Donges, and J. Runge. Using causal effect networks to analyze different arctic drivers of midlatitude winter circulation. Journal of Climate, 29(11):4069–4081, 2016.
  • Kunkel et al. (2010) K. E. Kunkel, D. R. Easterling, D. A. Kristovich, B. Gleason, L. Stoecker, and R. Smith. Recent increases in us heavy precipitation associated with tropical cyclones. Geophysical Research Letters, 37(24), 2010.
  • Loader and Switzer (1992) C. Loader and P. Switzer. Spatial covariance estimation for monitoring data. Statistics in the Environmental and Earth Sciences, pages 52–70, 1992.
  • Lütkepohl (2007) H. Lütkepohl. New introduction to multiple time series analysis. Springer Science & Business Media, 2nd edition, 2007.
  • Malik et al. (2010) N. Malik, N. Marwan, and J. Kurths. Spatial structures and directionalities in monsoonal precipitation over south asia. Nonlinear Processes in Geophysics, 17(5):371–381, 2010.
  • Malik et al. (2012) N. Malik, B. Bookhagen, N. Marwan, and J. Kurths. Analysis of spatial and temporal extreme monsoonal rainfall over south asia using complex networks. Climate dynamics, 39(3-4):971–987, 2012.
  • McCullagh and Nelder (1989) P. McCullagh and J. A. Nelder. Generalized linear models, volume 37. CRC press, 1989.
  • Menne et al. (2012) M. J. Menne, I. Durre, R. S. Vose, B. E. Gleason, and T. G. Houston. An overview of the global historical climatology network-daily database. Journal of Atmospheric and Oceanic Technology, 29(7):897–910, 2012.
  • Naveau et al. (2009) P. Naveau, A. Guillou, D. Cooley, and J. Diebolt. Modelling pairwise dependence of maxima in space. Biometrika, 96(1):1–17, 2009.
  • Nelsen (2007) R. B. Nelsen. An introduction to copulas. Springer Science & Business Media, 2007.
  • Nychka et al. (2015) D. Nychka, R. F. Furrer, J. Paige, and S. Sain. fields: Tools for spatial data, 2015. URL www.image.ucar.edu/fields. R package version 8.10.
  • Pearl (2000) J. Pearl. Causality - Models, Reasoning and Inference. Cambridge University Press, reprinted with corrections edition, 2000.
  • Quiroga et al. (2002) R. Q. Quiroga, T. Kreuz, and P. Grassberger. Event synchronization: a simple and fast method to measure synchronicity and time delay patterns. Physical review E, 66(4):041904, 2002.
  • R Core Team (2018) R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, 2018. URL http://www.R-project.org/.
  • Rayner et al. (2003) N. Rayner, D. E. Parker, E. Horton, C. Folland, L. Alexander, D. Rowell, E. Kent, and A. Kaplan. Global analyses of sea surface temperature, sea ice, and night marine air temperature since the late nineteenth century. Journal of Geophysical Research: Atmospheres, 108(D14), 2003.
  • Ribatet and Sedki (2013) M. Ribatet and M. Sedki. Extreme value copulas and max-stable processes. Journal de la Société Française de Statistique, 154(1):138–150, 2013.
  • Robbins (1955) H. Robbins. An empirical bayes approach to statistics. Technical report, COLUMBIA UNIVERSITY New York City United States, 1955.
  • Runge (2014) J. Runge. Detecting and quantifying causality from time series of complex systems. PhD thesis, Humboldt-Universität zu Berlin, Mathematisch-Naturwissenschaftliche Fakultät, 2014.
  • Schlather and Tawn (2003) M. Schlather and J. A. Tawn. A dependence measure for multivariate and spatial extreme values: Properties and inference. Biometrika, 90(1):139–156, 2003.
  • Segers (2012) J. Segers. Max-stable models for multivariate extremes. REVSTAT-Statistical Journal, 10(1):61–82, 2012.
  • Smith (1990) R. L. Smith. Max-stable processes and spatial extremes. 1990.
  • Spirtes et al. (2000) P. Spirtes, C. Glymour, and R. Scheines. Causation, Prediction, and Search. MIT Press, 2nd edition, 2000.
  • Steinhaeuser et al. (2011) K. Steinhaeuser, N. V. Chawla, and A. R. Ganguly. Complex networks as a unified framework for descriptive analysis and predictive modeling in climate science.

    Statistical Analysis and Data Mining: The ASA Data Science Journal

    , 4(5):497–511, 2011.
  • Stephenson (2002) A. G. Stephenson. evd: Extreme value distributions. R News, 2(2):31–32, 2002.
  • Thibaud et al. (2016) E. Thibaud, J. Aalto, D. S. Cooley, A. C. Davison, J. Heikkinen, et al. Bayesian inference for the brown–resnick process, with an application to extreme low temperatures. The Annals of Applied Statistics, 10(4):2303–2324, 2016.
  • Tibshirani (1996) R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B (Methodological), pages 267–288, 1996.
  • Trenberth et al. (2018) K. E. Trenberth, L. Cheng, P. Jacobs, Y. Zhang, and J. Fasullo. Hurricane harvey links to ocean heat content and climate change adaptation. Earth’s Future, 2018.
  • Tsonis and Roebber (2004) A. A. Tsonis and P. J. Roebber. The architecture of the climate network. Physica A: Statistical Mechanics and its Applications, 333:497–504, 2004.
  • Tsonis et al. (2006) A. A. Tsonis, K. L. Swanson, and P. J. Roebber. What do networks have to do with climate? Bulletin of the American Meteorological Society, 87(5):585–595, 2006.
  • Tsonis et al. (2008) A. A. Tsonis, K. L. Swanson, and G. Wang. On the role of atmospheric teleconnections in climate. Journal of Climate, 21(12):2990–3001, 2008.
  • Wadsworth et al. (2017) J. Wadsworth, J. A. Tawn, A. Davison, and D. M. Elton. Modelling across extremal dependence classes. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 79(1):149–175, 2017.
  • Wahba (1990) G. Wahba. Spline models for observational data. CBMS-NSF Regional Conference series in applied mathematics; 59. Society for Industrial and Applied Mathematics (SIAM, 3600 Market Street, Floor 6, Philadelphia, PA 19104), Philadelphia, PA., 1990. ISBN 0898712440.
  • Wang et al. (2016) J. Wang, Y. Han, M. L. Stein, V. R. Kotamarthi, and W. K. Huang. Evaluation of dynamically downscaled extreme temperature using a spatially-aggregated generalized extreme value (gev) model. Climate dynamics, 47(9-10):2833–2849, 2016.
  • Yamasaki et al. (2009) K. Yamasaki, A. Gozolchiani, and S. Havlin. Climate networks based on phase synchronization analysis track el-niño. Progress of Theoretical Physics Supplement, 179:178–188, 2009.
  • Yan and Dey (2015) J. Yan and D. K. Dey. Extreme Value Modeling and Risk Analysis. Chapman and Hall/CRC, New York, 1st edition, 2015.
  • Yuan and Lin (2007) M. Yuan and Y. Lin. Model selection and estimation in the gaussian graphical model. Biometrika, 94(1):19–35, 2007.
  • Zerenner et al. (2014) T. Zerenner, P. Friederichs, K. Lehnertz, and A. Hense. A gaussian graphical model approach to climate networks. Chaos: An Interdisciplinary Journal of Nonlinear Science, 24(2):023103, 2014.