Climate extreme event attribution using multivariate peaks-over-thresholds modeling and counterfactual theory

08/08/2019 ∙ by Anna Kiriliouk, et al. ∙ 0

Numerical climate models are complex and combine a large number of physical processes. They are key tools to quantify the relative contribution of potential anthropogenic causes (e.g., the current increase in greenhouse gases) on high impact atmospheric variables like heavy rainfall. These so-called climate extreme event attribution problems are particularly challenging in a multivariate context, that is, when the atmospheric variables are measured on a possibly high-dimensional grid. In this paper, we leverage two statistical theories to assess causality in the context of multivariate extreme event attribution. As we consider an event to be extreme when at least one of the components of the vector of interest is large, extreme-value theory justifies, in an asymptotical sense, a multivariate generalized Pareto distribution to model joint extremes. Under this class of distributions, we derive and study probabilities of necessary and sufficient causation as defined by the counterfactual theory of Pearl. To increase causal evidence, we propose a dimension reduction strategy based on the optimal linear projection that maximizes such causation probabilities. Our approach is tested on simulated examples and applied to weekly winter maxima precipitation outputs of the French CNRM from the recent CMIP6 experiment.



There are no comments yet.


page 1

page 2

page 3

page 4

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

Quantifying human influence on climate change and identifying potential causes of climate extremes is often referred to as extreme event attribution (EEA), which falls within the research field of detection and attribution (D&A) (see, e.g. the report of NAP16; Chen18)

. In such studies, the main inferential objective is estimation of extreme quantiles (return levels), well-used in finance, hydrology and other fields of risk analysis

(e.g. Embrechts97). In EEA, such small probabilities and their associated return levels are computed under two scenarios that differ according to the causal link of interest, often increases in greenhouse gases (GHG) concentrations (see, e.g. Angelil17; stott16-review; fischer2015). Typically, such an approach compares probabilities under a factual scenario of conditions that occurred around the time of the event against probabilities under a counterfactual scenario in which anthropogenic emissions never occurred. Under this set-up, one can compare the probability of an extreme event in the factual world, denoted , to the probability of an extreme event in a counterfactual world, i.e., a world that might have been if no anthropogenic forcing would have existed. The definition of the so-called extreme event is by itself a non-trivial task and depends on the application at hand. A common choice is to take some climatic index exceeding a high threshold. In their seminal paper, Stott04 studied mean June-August temperatures in Europe in order to quantify by how much human activities may have increased the risk of European heatwaves. In this example, a one dimensional sample mean, say , summarized a complex random field that varied in time and space. The set where Kelvin was chosen to resemble the 2003 mean European summer anomaly temperatures. The probabilities and were then inferred from numerical counterfactual and factual runs respectively, using non-parametric inference techniques (for bootstrap counting methods in EEA, see Paciorek18) and univariate extreme-value theory (EVT); for an EVT introduction, see, e.g. Coles01. For trend detection problems in D&A, there exists a variety of EVT models, see e.g. the pioneering work of Kharin05 and Kharin07.

In other environmental research areas, complex multivariate EVT models are commonly used (see, e.g. Davison15; Cooley17; Fondeville19; Fondeville18; Engelke19; Reich13; Engelke19; Shaby12). Bayesian hierarchical modeling (see, e.g. Hammerling17; Katzfuss17) also offers a flexible way to insert different layers of complexity present in climate D&A problems (internal natural variability, numerical model uncertainty, observational errors, sampling uncertainty in space and time, etc.). Despite these advances, the EEA domain remains a fairly untouched territory in terms of multivariate EVT. Even recent applied papers like Kew19; Luu18; Otto18 and King17 are based on univariate EVT only.

Our first objective is to investigate how multivariate EVT could be used for event attribution. As extreme events in D&A are mostly expressed in terms of threshold exceedances, like in Stott04, this naturally leads to the question of how to integrate the multivariate generalized Pareto distribution (GPD) introduced by tajvidi1996 into the EEA framework. This distribution has been tailored to represent extremal behaviors when at least one of the components of the vector of interest is large. The probabilistic properties of the multivariate GPD have been well studied by, among others, beirlant2004; rootzen2006; falk2008; ferreira2014; rootzen2018 and rootzen2018nr2, while statistical modeling is more recent (huser2015; kiriliouk2018).

In most univariate EEA studies (see stott16-review, and references therein), two types of probability ratios are considered: the Risk Ratio and the so-called Fraction of Attributable Risk (FAR), defined by

where corresponds to probability of exceeding the threshold in the counterfactual world, while represents the same quantity in the factual world. Using the counterfactual theory of pearl2000, the FAR corresponds to the probability of necessary causation, i.e., anthropogenic forcings are necessary for the extreme event to occur, but might not be sufficient. Within the Gaussian set-up, hannart2016 and hannart18 highlighted the link between causality theory and event attribution studies. The second objective of our work is to explain how Pearl’s counterfactual theory can be applied within a multivariate GPD framework, and to identify conditions that increase the level of causality, a fundamental feature in any EEA analysis.

The rest of the paper is structured as follows. Section 2 summarizes the relevant background of both multivariate GPDs and climate event attribution. Section 3 discusses the behaviour of univariate probabilities of causation as a function of the threshold . In Section 4, we make the link between multivariate GPDs and causality by maximizing necessary causation for any linear projection and we discuss the inference strategy. Finally, in Section 5, the proposed methods are applied to weekly winter maxima of precipitation outputs from the French CNRM model that are part of the recent CMIP6 experiment. Technical details are deferred to the appendices.

2 Background

2.1 The multivariate generalized Pareto distribution

The basic building block to construct standardized multivariate GPD random vectors (rootzen2006; rootzen2018) is the stochastic representation



corresponds to a univariate exponential random variable with unit mean, and

represents any -dimensional random vector independent of . One can easily check that each positive conditional margin has a unit exponential survival function,

Model (2.1) can be generalized by setting, for and ,


where operations like have to be understood componentwise. We then denote . Equation (2.2) implies

where denotes the survival function of the univariate GPD with scale parameter and shape parameter . Hence, the conditional margins follow univariate GPDs111although the random variables may not follow GPDs themselves. .

Theoretically, the random vector defined by (2.1) and (2.2) can be viewed as the limiting solution of any linearly rescaled multivariate vector given that at least one component is large (rootzen2006; rootzen2018). This asymptotic result can be interpreted as a multivariate version of the Pickands–Balkema–de Haan theorem (pickands1975; balkema1974).

Let denote a random vector in , representing a -variate observation, and let denote a high threshold. When , univariate peaks-over-thresholds approaches (davison1990) consist of fitting to a univariate GPD. Similarly, the multivariate GPD approximates the tail behavior of , where means that at least one component of exceeds the corresponding component of . By construction, multivariate GPD vectors exhibit asymptotic dependence, i.e., is well-approximated by a member of the class of multivariate GPDs if puts no mass on lower-dimensional subspaces.

The random “generator” in (2.1) drives the extremal dependence of , often summarized by the tail dependence coefficient . If , denote the unconditional marginal distribution functions of , then measures the probability of being large given that is large as the threshold increases,

A large value of corresponds to strong extremal dependence between and , whereas corresponds to tail independence. For more details on in the context of multivariate GPDs, see the supplementary material in kiriliouk2018. As an example, Figure 1 displays 500 bivariate random draws from a multivariate GPD model where is zero-mean bivariate Gaussian with unit covariance matrix , corresponding to .

Figure 1: Scatterplots and density contours from 500 bivariate GPD random draws using (2.1) and (2.2) with parameters , for the left panel and , for the right panel. The generator is zero-mean bivariate Gaussian with unit covariance matrix .

To make the link with the probabilities and used for EEA, we need a tool to project the information contained in a possibility complex multivariate GPD structure into a single valued summary. The following proposition provides this key tool.

Proposition 2.1 (Linear-projection, rootzen2018).

If with , then for any non-negative weights such that , the linear projection of , conditioned on being positive, follows a univariate GPD, i.e.,

2.2 Climate event attribution and counterfactual theory

The question of attribution in EEA is rooted in causality assessment, so that connections between the probabilities and and some type of causality are called for. To explain the link between some event (e.g. the 2003 European heatwave) and its potential cause (e.g. the increase of GHG emissions), pearl2000 makes the distinction between three forms of causality. These types of causality can be expressed in a probabilistic manner as

  1. probability of necessary causation (PN): is required for but other factors might be required as well;

  2. probability of sufficient causation (PS): always triggers but might occur without ;

  3. probability of necessary and sufficient causation (PNS): both of the above hold.

In most applications, these probabilities are difficult to estimate. In the special case of climate EEA where one assumes that both factual and counterfactual worlds are available from numerical experiments, hannart2016 exploited the fact that is monotonous with respect to the external forcings (increase in GHGs likely leads to a warmer climate) and all forcings are exogenous (fossil energy, volcanic forcings are not part of the climate system). This simplifies the expressions of PN, PS and PNS that become


where is the probability of in the counterfactual world and is the probability of in the factual world. If , then PN coincides with the FAR used by Stott04. In the remainder of this paper, we will use the notation PN (instead of FAR) to highlight its causal interpretation. By construction, one has PNS PN,PS and hence it is worth noticing that a low PNS does not imply the absence of a causal relationship.

One important objective of this paper is to combine multivariate EVT and causality theory. This leads to the question of how to reduce the dimension of a multivariate GPD vector, while ensuring that the projected data contains the most information in terms of causality for extremes. In a multivariate Gaussian set-up, hannart18 proposed to maximize PNS by taking the linear combination that will contrast the factual and counterfactual worlds the most. Their solution was similar to linear discriminant analysis. Before dealing with the multivariate GPD case, one can learn a lot from studying , , PN, PS and PNS in the univariate case.

3 Causation probabilities for univariate extremes

To understand how PN, PS and PNS behave for univariate extremes, we take and when and are either Gaussian or GPD random variables. The left panel of Figure 2 shows the case where the counterfactual world corresponds to a standardized Gaussian variable, , and the factual world is one Kelvin warmer with a higher variability, . This artificial design mimics the typical behaviour of mean temperature anomalies. Two features can be highlighted from this example: PN goes to one as increases, and the maximum of PNS is around . In other words, the probability of necessary causation becomes certain for extremes (large ), and the probability of necessary and sufficient causation can be reasonably high in the Gaussian case.

Figure 2: Probabilities of necessary causation (PN, solid line), sufficient causation (PS, dotted line) and sufficient and necessary causation (PNS, dashed line) as functions of . The left panel corresponds to a Gaussian set-up: for the counterfactual world and for the factual one. The middle and right panels correspond to a GPD set-up: for the counterfactual world, for the factual one (middle) and for the counterfactual world, for the factual one (right).

To contrast these remarks with other types of tail behaviors, the middle panel of Figure 2 displays a GPD case with equal shape parameter in the counterfactual and factual worlds, but different scale parameters, and . One can see that, as increases, PN converges to a constant around , and PNS remains small for any value of . Hence, causal evidencing is much more difficult than in the Gaussian set-up, where a rare event in the factual world ( small) would be nearly impossible in the counterfactual world ( almost zero). In contrast, even a very rare event in the factual world will not be impossible in a GPD counterfactual world. Concerning PNS, it is small in the second panel and this phenomenon is even more pronounced when the shape parameter changes between the two worlds; see the right panel where , , and . As PNS is always almost near zero, there is no reason to maximize it. Instead, maximizing causality will correspond to maximizing PN in the remainder of this work.

In practice, and do not follow GPDs. Using a classical peaks-over-thresholds approach, we can condition on some high threshold and approximate the probabilities for by


We can now formalize the tail behaviour observed in Figure 2. Whenever the limit of PN() for large is finite222Degenerate cases can occur when . For example, if , the PN is not defined for , which is visible for the dashed line in the left panel of Figure 3., it has to be equal to


where represents the indicator function, equal to one if is true and zero otherwise.

Figure 3: Probability of necessary causation as a function of , see (2.3) and (3.1). Left panel: the counterfactual scale is , increasing to in the factual world, while the shape parameter is identical, , equal to , , and for the dashed, solid and dotted lines respectively. Right panel: the solid line corresponds to increasing shape and decreasing scale, and . The dashed line corresponds to the opposite scenario : and ).

The left panel of Figure 3 shows how three different GPD shape parameters, , and (dashed, solid and dotted lines respectively) influence the increase of the PN with respect to the threshold . The right panel of Figure 3 points out possible atypical behaviors of the PN, highlighting that PN is not always increasing as increases. Here, the solid line corresponds to a counterfactual world with compared to a factual world with , while the the dotted line represents the converse change: from to ).

4 Necessary causation in a multivariate set-up

4.1 The multivariate GPD and necessary causation

Let be any -dimensional random vector such that for some high threshold , where . Then, according to Proposition 2.1, the extremal information contained in any linear projection can be approximated, up to a normalizing constant, by a univariate GPD survival function. More precisely, for any , we can write


for and . Constraining the weights to sum to a constant is necessary to ensure identification of . The condition implies that conditional marginal distributions have equal shape parameters for . Therefore, homogeneous spatial regions (in terms of the shape parameter) have to be identified in practice. This is closely related to the regional frequency analysis problem treated in hydrology carreau2017. Finally, we note that the dependence structure of is present in the term only.

Any linear projection in the factual and counterfactuals worlds, denoted and respectively, can now be used to compute a probability of necessary causation that depends on the weight and the dependence structure of and ;


where satisfies approximation (4.1) for . To understand how dependence affects the strength of necessary causation, we study the value of in the bivariate case with , and . In the example displayed in Figure 4, two different dependence structures are investigated, summarized by the tail dependence coefficient . The dotted line corresponds to increasing dependence, from to and an increasing marginal scale, from to . The dashed line again represents increasing dependence, but decreasing marginal scale, from to . Finally, the solid line shows increasing marginal scale of the same order as the dotted line, and decreasing dependence, from to .

Figure 4: PN defined by (4.2) between two bivariate GPDs, and . The dotted line corresponds to , , and . The dashed line differs from the dotted line by and . The solid line differs from the dotted line by and .

Figure 4 shows that the dependence structure can have an impact on the PN for any finite value of . In other words, EEA based on a hypothesis of independence (eg, in space) will lead to incomplete statements concerning the strength of PN whenever the multivariate extremes are dependent. Figure 4 also suggests that, as increases, the impact of an increasing dependence in the factual world becomes negligible. However, it is important to keep in mind that in applications, the marginal scale might be constant between the two worlds. In that case, we will see the impact of increasing dependence in Figure 6.

4.2 Maximizing necessary causation

In a Gaussian set-up, hannart18 explored how to maximize causation probabilities of any Gaussian linear projection. Similarly here, the choice of could be an essential part in the maximization of necessary causation for multivariate GPD random variables. To address this point in the bivariate case, we need the following result.

Proposition 4.1.

Let and consider two positive bivariate scale parameters:
and . Denote

and if , define the weights

If and if one of the two weights belongs to , then this weight, denoted , maximizes


In all other cases, only zero or unit weights maximize this ratio.

When , the expression of is simpler because it does not depend on . Expression (4.3) is an approximation of the PN defined in (4.2); it is equal to the PN when are multivariate GPDs and , i.e., when the dependence structure remains constant between the two worlds. Proposition 4.1 allows us to study the gain in terms of PN with respect to the weight . When unit or zero weights are chosen as the optimal solution in Proposition 4.1, only one coordinate is considered and no linear projection is necessary. This happens when the contrast in one of the margins between the factual and counterfactual world is already sufficient to optimize PN. However, Proposition 4.1 shows that, to maximize necessary causality, one needs to consider only those components that (individually) give the largest PN. As an example, take and . The dependence structures of and are chosen such that . Hence, the difference between the two worlds is only due to the scale change in one of the components. Figure 5 shows the PN gain as a function of , i.e., the ratio between and where based on Proposition 4.1. Each curve corresponds to a different shape parameter (with constraint ), equal to (dashed line), (solid line), and (dotted line), respectively. We see that the optimal weight can lead to a large increase in necessary causality, particularly when the shape parameter is positive (dotted line).

Figure 5: Necessary causation gain when and , and , are Gaussian random vectors with . The ratio of to is shown as a function of , where based on Proposition 4.1. The dashed, solid and dotted lines correspond to a shape parameter (with constraint ) of , and respectively.

Explicit optimal weights like in Proposition 4.1 can only be obtained in very specific cases. For example, for the bivariate Gaussian GPD with , the probability does not depend on (see Proposition A.1 in the Appendix). For most other cases, numerical optimization schemes have to be used, especially beyond the bivariate set-up. In order to move closer to practical applications, we need to couple this optimization procedure with inference in a multivariate context.

4.3 Inference

Let and denote two independent samples of size , representing climate model output in the counterfactual and the factual world respectively, and let , denote two high thresholds. For , let denote the number of observations among that have at least one component exceeding . Extracting these observations and subtracting , we obtain the multivariate GPD samples . For , an estimator of and hence of the PN follows from approximation (4.1). The first term, , can be estimated nonparametrically by

To estimate the second term, , we first compute estimators and

by applying the method of probability weighted moments to

for (see Appendix B). Next, we set 333An alternative method to enforce equal shape parameters is described in carreau2017. Finally, we estimate by


Alternatively, we could directly estimate and by applying the method of probability weighted moments to , which reduces uncertainty and enforces the constraint of equal shape parameters. Appendix C shows a small simulation experiment, confirming the good performance of .

In the previous sections, we studied the increase in PN for changing dependence structures and marginal parameters. Another important question is what happens when marginal parameters do not change ( and ), while dependence increases (). Under a hypothesis of independence in space, one would estimate the univariate PN componentwise and take the average, thus possibly underestimating the PN. To see by how much, and how the result varies with the dimension, we conduct the following experiment. Consider points on a regular unit distance grid. For distances from to , pairwise tail dependence coefficients ranging from to for the counterfactual world and from to for the factual world were obtained using a Whittle-Matérn correlation function444The covariance matrices and are generated using a Whittle-Matérn correlation function with fixed shape and varying scales , . The correlation matrices are then multiplied by 10 to obtain and . We evaluate the PN in the quantile of using equal weights, calculated based on a pre-simulation run of sample size and held fixed. Figure 6 shows boxplots of the multivariate estimates minus the average univariate PN estimates, based on 1000 samples of size . The black line corresponds to the true values, calculated using the formulas in Appendix A. We see that as the dimension increases, taking dependence into account increases necessary causation.

Figure 6: Boxplots of the multivariate estimates minus the average univariate PN estimates, where is defined in (4.4), and . 1000 samples of size were simulated from a multivariate Gaussian GPD model with and , and (pairwise). The black line corresponds to the true values.

5 Analysis of heavy precipitation from the CNRM model

Evidencing causality is more difficult for heavy rainfall than for extreme temperatures, because precipitation variability is greater in space and time and because extreme rainfall has heavier tails than temperatures (extreme rainfall often has , see, e.g. katz02

). We want to determine if a multivariate GPD approach could enhance the causality message of a univariate analysis. We work with simulated rainfall time series from the French global climate model of Météo-France (CNRM) that belongs to the latest Coupled Model Intercomparison Project (CMIP6). We consider the winter months between the 1st of January 1985 until the 31st of August 2014 over the region defined by

to in longitude and to in latitude (corresponding to central Europe). Our factual and counterfactual worlds correspond to two historical runs, the second one of which has only natural forcings. We take the weekly maxima of winter precipitation. As the number of years covers only three decades, the rainfall series can be considered stationary in time within each world. Concerning their spatial structure, we apply the partitioning around medoids (PAM) algorithm (kaufman1990) to the counterfactual rainfall run. The difference with the original PAM version is that our “distance” between two locations and is tailored to threshold exceedances via

where denotes the standard empirical estimator of the pairwise tail dependence coefficient (see, e.g. kiriliouk2016book, and references therein). Our approach is close to the one of bernard2013, who focused on maxima instead of threshold exceedances. Figure 7 displays the spatial structure for clusters555Other values of were tested and provided similar patterns.. Although no spatial coordinates were given to the algorithm, the clusters appear to be spatially homogeneous and climatologically coherent. As the multivariate GPD is tailored for asymptotic dependence, identifying dependent regions help improve its fit. In addition, such a spatial clustering makes the assumption of a constant shape parameter within a region more reasonable.

Figure 7: Clustering of weekly maximum winter precipitation in central Europe between January 1985 and August 2014, using the PAM algorithm with distance based on pairwise tail dependence coefficients.

We hence model each cluster independently, calculating the estimated multivariate PN based on defined in (4.4). Figures 8 and 9 show necessity causation probabilities for the five-year and fifty-year return level respectively. The return levels were calculated based on quantiles of

for each cluster, with equal weights. Both figures show the PN per cluster, calculated using equal weights (top) and optimal weights (bottom). The diameters of the black circles around the PN estimates are proportional to the length of bootstrap-based 95% confidence intervals.

Figure 8: Necessary causation probabilities for a five-year return level of weighted maximal weekly winter precipitation in the counterfactual world for each cluster, calculated using equal weights (top) and optimal weights (bottom).
Figure 9: Necessary causation probabilities for a fifty-year return level of weighted maximal weekly winter precipitation in the counterfactual world for each cluster, calculated using equal weights (top) and optimal weights (bottom).

Higher PN does not necessarily comes with higher uncertainty; see for instance the cluster around northern Italy, whose confidence interval is narrow for the five-year return level and even more so for the fifty-year return level. Comparing the two panels of Figure 8 shows that the differences between the factual and the counterfactual world are higher when using optimal weights. This feature is even more striking for the fifty-year return levels, see Figure 9. Except for locations near the English channel, most points have a probability of necessary causation that is greater than . In particular, a few points like northern Italy shows a probability near one.

This example based on CRNM precipitation data is not sufficient to conclude general climatological results about heavy rainfall. The patterns found here may be due to this specific climate model, internal climate variability or other sources of variability. An exhaustive analysis of all the CMIP6 models, in terms of computer resources and climatological expertise, is beyond the scope of this methodological work. Still, this example illustrates that methods combining multivariate extreme-value theory and counterfactual theory could help climatologists working on causality and multivariate extremes (see, e.g. kim16; Zscheischler18). Another interesting research direction will be to extend this coupling between EVT and counterfactual theory to other types of extremes modelling in geosciences; see, e.g. Hammerling17 or Reich13 for a Bayesian hierarchical point of view, Shooter19 for asymptotic independence models, and Ragone18 for rare event algorithms.

Appendix A Gaussian multivariate GPD model

Let and let be a bivariate Gaussian random vector with mean and covariance matrix . Note that with . The tail dependence coefficient defined in Section 2.1 can be shown to equal

see the supplementary material in kiriliouk2018. In particular, as . Since is identified from , different covariance matrices may lead to the same tail dependence coefficient.

The Gaussian GPD model is convenient because the probability , which appears in the definition of the PN, can be calculated analytically in certain cases.

Proposition A.1.

Let where is a bivariate Gaussian random vector with mean zero and covariance matrix . Let .

  1. If , then for , we have

  2. If , then

For , the probability does not depend on . This is no longer true when . For a generalization of Proposition A.1 and its proof, see the supplementary material.

Appendix B Probability weighted moments inference

In this paper we opt for the probability weighted moments method of hosking1987 to estimate the parameters of a univariate GPD, because of its simplicity and its popularity in hydrological sciences. The method performs well for666An extension has been proposed for (diebolt2007). , and the asymptotic covariance matrix has a simple expression (hosking1987; ribereau2011). Let denote iid copies of GPD. The first and second probability weighted moments of are

These probability weighted moments can be estimated by

where are the order statistics of , which gives

Appendix C Simulation

Let and let

follow a Gaussian distribution with mean zero and covariance matrix

. We assess the quality of the estimator in (4.4) for a range of thresholds . The true probabilities can be calculated by combining (4.1) with Proposition A.1 (or its -dimensional generalization in the supplementary material). The experiments are based on 1000 samples of size from with . Figure 10 shows the results for and , where roughly corresponds to the quantile of . We take ; using the optimal weights gives similar results (not shown). We consider increasing dependence (, ), increasing marginal scale (, ) and constant marginal shape (). Boxplots of the corresponding estimators of the PN are centered around their true values for better visibility. We see that the estimated median is always equal to the true one. As

increases, more and more (downward) outliers appears, which is natural as the true PN is very close to

for large values of . For small values of , the results show less variability for than for , while for large , the opposite happens.

Figure 10: Boxplots of estimators of PN based on (4.4), centered around their true values, based on 1000 samples of size from with , , , and , for equal weights in (left panel) and (right panel).