1 Introduction
Quantifying human influence on climate change and identifying potential causes of climate extremes is often referred to as extreme event attribution (EEA), which falls within the research field of detection and attribution (D&A) (see, e.g. the report of NAP16; Chen18)
. In such studies, the main inferential objective is estimation of extreme quantiles (return levels), wellused in finance, hydrology and other fields of risk analysis
(e.g. Embrechts97). In EEA, such small probabilities and their associated return levels are computed under two scenarios that differ according to the causal link of interest, often increases in greenhouse gases (GHG) concentrations (see, e.g. Angelil17; stott16review; fischer2015). Typically, such an approach compares probabilities under a factual scenario of conditions that occurred around the time of the event against probabilities under a counterfactual scenario in which anthropogenic emissions never occurred. Under this setup, one can compare the probability of an extreme event in the factual world, denoted , to the probability of an extreme event in a counterfactual world, i.e., a world that might have been if no anthropogenic forcing would have existed. The definition of the socalled extreme event is by itself a nontrivial task and depends on the application at hand. A common choice is to take some climatic index exceeding a high threshold. In their seminal paper, Stott04 studied mean JuneAugust temperatures in Europe in order to quantify by how much human activities may have increased the risk of European heatwaves. In this example, a one dimensional sample mean, say , summarized a complex random field that varied in time and space. The set where Kelvin was chosen to resemble the 2003 mean European summer anomaly temperatures. The probabilities and were then inferred from numerical counterfactual and factual runs respectively, using nonparametric inference techniques (for bootstrap counting methods in EEA, see Paciorek18) and univariate extremevalue theory (EVT); for an EVT introduction, see, e.g. Coles01. For trend detection problems in D&A, there exists a variety of EVT models, see e.g. the pioneering work of Kharin05 and Kharin07.In other environmental research areas, complex multivariate EVT models are commonly used (see, e.g. Davison15; Cooley17; Fondeville19; Fondeville18; Engelke19; Reich13; Engelke19; Shaby12). Bayesian hierarchical modeling (see, e.g. Hammerling17; Katzfuss17) also offers a flexible way to insert different layers of complexity present in climate D&A problems (internal natural variability, numerical model uncertainty, observational errors, sampling uncertainty in space and time, etc.). Despite these advances, the EEA domain remains a fairly untouched territory in terms of multivariate EVT. Even recent applied papers like Kew19; Luu18; Otto18 and King17 are based on univariate EVT only.
Our first objective is to investigate how multivariate EVT could be used for event attribution. As extreme events in D&A are mostly expressed in terms of threshold exceedances, like in Stott04, this naturally leads to the question of how to integrate the multivariate generalized Pareto distribution (GPD) introduced by tajvidi1996 into the EEA framework. This distribution has been tailored to represent extremal behaviors when at least one of the components of the vector of interest is large. The probabilistic properties of the multivariate GPD have been well studied by, among others, beirlant2004; rootzen2006; falk2008; ferreira2014; rootzen2018 and rootzen2018nr2, while statistical modeling is more recent (huser2015; kiriliouk2018).
In most univariate EEA studies (see stott16review, and references therein), two types of probability ratios are considered: the Risk Ratio and the socalled Fraction of Attributable Risk (FAR), defined by
where corresponds to probability of exceeding the threshold in the counterfactual world, while represents the same quantity in the factual world. Using the counterfactual theory of pearl2000, the FAR corresponds to the probability of necessary causation, i.e., anthropogenic forcings are necessary for the extreme event to occur, but might not be sufficient. Within the Gaussian setup, hannart2016 and hannart18 highlighted the link between causality theory and event attribution studies. The second objective of our work is to explain how Pearl’s counterfactual theory can be applied within a multivariate GPD framework, and to identify conditions that increase the level of causality, a fundamental feature in any EEA analysis.
The rest of the paper is structured as follows. Section 2 summarizes the relevant background of both multivariate GPDs and climate event attribution. Section 3 discusses the behaviour of univariate probabilities of causation as a function of the threshold . In Section 4, we make the link between multivariate GPDs and causality by maximizing necessary causation for any linear projection and we discuss the inference strategy. Finally, in Section 5, the proposed methods are applied to weekly winter maxima of precipitation outputs from the French CNRM model that are part of the recent CMIP6 experiment. Technical details are deferred to the appendices.
2 Background
2.1 The multivariate generalized Pareto distribution
The basic building block to construct standardized multivariate GPD random vectors (rootzen2006; rootzen2018) is the stochastic representation
(2.1) 
where
corresponds to a univariate exponential random variable with unit mean, and
represents any dimensional random vector independent of . One can easily check that each positive conditional margin has a unit exponential survival function,Model (2.1) can be generalized by setting, for and ,
(2.2) 
where operations like have to be understood componentwise. We then denote . Equation (2.2) implies
where denotes the survival function of the univariate GPD with scale parameter and shape parameter . Hence, the conditional margins follow univariate GPDs^{1}^{1}1although the random variables may not follow GPDs themselves. .
Theoretically, the random vector defined by (2.1) and (2.2) can be viewed as the limiting solution of any linearly rescaled multivariate vector given that at least one component is large (rootzen2006; rootzen2018). This asymptotic result can be interpreted as a multivariate version of the Pickands–Balkema–de Haan theorem (pickands1975; balkema1974).
Let denote a random vector in , representing a variate observation, and let denote a high threshold. When , univariate peaksoverthresholds approaches (davison1990) consist of fitting to a univariate GPD. Similarly, the multivariate GPD approximates the tail behavior of , where means that at least one component of exceeds the corresponding component of . By construction, multivariate GPD vectors exhibit asymptotic dependence, i.e., is wellapproximated by a member of the class of multivariate GPDs if puts no mass on lowerdimensional subspaces.
The random “generator” in (2.1) drives the extremal dependence of , often summarized by the tail dependence coefficient . If , denote the unconditional marginal distribution functions of , then measures the probability of being large given that is large as the threshold increases,
A large value of corresponds to strong extremal dependence between and , whereas corresponds to tail independence. For more details on in the context of multivariate GPDs, see the supplementary material in kiriliouk2018. As an example, Figure 1 displays 500 bivariate random draws from a multivariate GPD model where is zeromean bivariate Gaussian with unit covariance matrix , corresponding to .
To make the link with the probabilities and used for EEA, we need a tool to project the information contained in a possibility complex multivariate GPD structure into a single valued summary. The following proposition provides this key tool.
Proposition 2.1 (Linearprojection, rootzen2018).
If with , then for any nonnegative weights such that , the linear projection of , conditioned on being positive, follows a univariate GPD, i.e.,
2.2 Climate event attribution and counterfactual theory
The question of attribution in EEA is rooted in causality assessment, so that connections between the probabilities and and some type of causality are called for. To explain the link between some event (e.g. the 2003 European heatwave) and its potential cause (e.g. the increase of GHG emissions), pearl2000 makes the distinction between three forms of causality. These types of causality can be expressed in a probabilistic manner as

probability of necessary causation (PN): is required for but other factors might be required as well;

probability of sufficient causation (PS): always triggers but might occur without ;

probability of necessary and sufficient causation (PNS): both of the above hold.
In most applications, these probabilities are difficult to estimate. In the special case of climate EEA where one assumes that both factual and counterfactual worlds are available from numerical experiments, hannart2016 exploited the fact that is monotonous with respect to the external forcings (increase in GHGs likely leads to a warmer climate) and all forcings are exogenous (fossil energy, volcanic forcings are not part of the climate system). This simplifies the expressions of PN, PS and PNS that become
(2.3) 
where is the probability of in the counterfactual world and is the probability of in the factual world. If , then PN coincides with the FAR used by Stott04. In the remainder of this paper, we will use the notation PN (instead of FAR) to highlight its causal interpretation. By construction, one has PNS PN,PS and hence it is worth noticing that a low PNS does not imply the absence of a causal relationship.
One important objective of this paper is to combine multivariate EVT and causality theory. This leads to the question of how to reduce the dimension of a multivariate GPD vector, while ensuring that the projected data contains the most information in terms of causality for extremes. In a multivariate Gaussian setup, hannart18 proposed to maximize PNS by taking the linear combination that will contrast the factual and counterfactual worlds the most. Their solution was similar to linear discriminant analysis. Before dealing with the multivariate GPD case, one can learn a lot from studying , , PN, PS and PNS in the univariate case.
3 Causation probabilities for univariate extremes
To understand how PN, PS and PNS behave for univariate extremes, we take and when and are either Gaussian or GPD random variables. The left panel of Figure 2 shows the case where the counterfactual world corresponds to a standardized Gaussian variable, , and the factual world is one Kelvin warmer with a higher variability, . This artificial design mimics the typical behaviour of mean temperature anomalies. Two features can be highlighted from this example: PN goes to one as increases, and the maximum of PNS is around . In other words, the probability of necessary causation becomes certain for extremes (large ), and the probability of necessary and sufficient causation can be reasonably high in the Gaussian case.
To contrast these remarks with other types of tail behaviors, the middle panel of Figure 2 displays a GPD case with equal shape parameter in the counterfactual and factual worlds, but different scale parameters, and . One can see that, as increases, PN converges to a constant around , and PNS remains small for any value of . Hence, causal evidencing is much more difficult than in the Gaussian setup, where a rare event in the factual world ( small) would be nearly impossible in the counterfactual world ( almost zero). In contrast, even a very rare event in the factual world will not be impossible in a GPD counterfactual world. Concerning PNS, it is small in the second panel and this phenomenon is even more pronounced when the shape parameter changes between the two worlds; see the right panel where , , and . As PNS is always almost near zero, there is no reason to maximize it. Instead, maximizing causality will correspond to maximizing PN in the remainder of this work.
In practice, and do not follow GPDs. Using a classical peaksoverthresholds approach, we can condition on some high threshold and approximate the probabilities for by
(3.1) 
We can now formalize the tail behaviour observed in Figure 2. Whenever the limit of PN() for large is finite^{2}^{2}2Degenerate cases can occur when . For example, if , the PN is not defined for , which is visible for the dashed line in the left panel of Figure 3., it has to be equal to
(3.2) 
where represents the indicator function, equal to one if is true and zero otherwise.
The left panel of Figure 3 shows how three different GPD shape parameters, , and (dashed, solid and dotted lines respectively) influence the increase of the PN with respect to the threshold . The right panel of Figure 3 points out possible atypical behaviors of the PN, highlighting that PN is not always increasing as increases. Here, the solid line corresponds to a counterfactual world with compared to a factual world with , while the the dotted line represents the converse change: from to ).
4 Necessary causation in a multivariate setup
4.1 The multivariate GPD and necessary causation
Let be any dimensional random vector such that for some high threshold , where . Then, according to Proposition 2.1, the extremal information contained in any linear projection can be approximated, up to a normalizing constant, by a univariate GPD survival function. More precisely, for any , we can write
(4.1) 
for and . Constraining the weights to sum to a constant is necessary to ensure identification of . The condition implies that conditional marginal distributions have equal shape parameters for . Therefore, homogeneous spatial regions (in terms of the shape parameter) have to be identified in practice. This is closely related to the regional frequency analysis problem treated in hydrology carreau2017. Finally, we note that the dependence structure of is present in the term only.
Any linear projection in the factual and counterfactuals worlds, denoted and respectively, can now be used to compute a probability of necessary causation that depends on the weight and the dependence structure of and ;
(4.2) 
where satisfies approximation (4.1) for . To understand how dependence affects the strength of necessary causation, we study the value of in the bivariate case with , and . In the example displayed in Figure 4, two different dependence structures are investigated, summarized by the tail dependence coefficient . The dotted line corresponds to increasing dependence, from to and an increasing marginal scale, from to . The dashed line again represents increasing dependence, but decreasing marginal scale, from to . Finally, the solid line shows increasing marginal scale of the same order as the dotted line, and decreasing dependence, from to .
Figure 4 shows that the dependence structure can have an impact on the PN for any finite value of . In other words, EEA based on a hypothesis of independence (eg, in space) will lead to incomplete statements concerning the strength of PN whenever the multivariate extremes are dependent. Figure 4 also suggests that, as increases, the impact of an increasing dependence in the factual world becomes negligible. However, it is important to keep in mind that in applications, the marginal scale might be constant between the two worlds. In that case, we will see the impact of increasing dependence in Figure 6.
4.2 Maximizing necessary causation
In a Gaussian setup, hannart18 explored how to maximize causation probabilities of any Gaussian linear projection. Similarly here, the choice of could be an essential part in the maximization of necessary causation for multivariate GPD random variables. To address this point in the bivariate case, we need the following result.
Proposition 4.1.
Let and consider two positive bivariate scale parameters:
and .
Denote
and if , define the weights
If and if one of the two weights belongs to , then this weight, denoted , maximizes
(4.3) 
In all other cases, only zero or unit weights maximize this ratio.
When , the expression of is simpler because it does not depend on . Expression (4.3) is an approximation of the PN defined in (4.2); it is equal to the PN when are multivariate GPDs and , i.e., when the dependence structure remains constant between the two worlds. Proposition 4.1 allows us to study the gain in terms of PN with respect to the weight . When unit or zero weights are chosen as the optimal solution in Proposition 4.1, only one coordinate is considered and no linear projection is necessary. This happens when the contrast in one of the margins between the factual and counterfactual world is already sufficient to optimize PN. However, Proposition 4.1 shows that, to maximize necessary causality, one needs to consider only those components that (individually) give the largest PN. As an example, take and . The dependence structures of and are chosen such that . Hence, the difference between the two worlds is only due to the scale change in one of the components. Figure 5 shows the PN gain as a function of , i.e., the ratio between and where based on Proposition 4.1. Each curve corresponds to a different shape parameter (with constraint ), equal to (dashed line), (solid line), and (dotted line), respectively. We see that the optimal weight can lead to a large increase in necessary causality, particularly when the shape parameter is positive (dotted line).
Explicit optimal weights like in Proposition 4.1 can only be obtained in very specific cases. For example, for the bivariate Gaussian GPD with , the probability does not depend on (see Proposition A.1 in the Appendix). For most other cases, numerical optimization schemes have to be used, especially beyond the bivariate setup. In order to move closer to practical applications, we need to couple this optimization procedure with inference in a multivariate context.
4.3 Inference
Let and denote two independent samples of size , representing climate model output in the counterfactual and the factual world respectively, and let , denote two high thresholds. For , let denote the number of observations among that have at least one component exceeding . Extracting these observations and subtracting , we obtain the multivariate GPD samples . For , an estimator of and hence of the PN follows from approximation (4.1). The first term, , can be estimated nonparametrically by
To estimate the second term, , we first compute estimators and
by applying the method of probability weighted moments to
for (see Appendix B). Next, we set ^{3}^{3}3An alternative method to enforce equal shape parameters is described in carreau2017. Finally, we estimate by(4.4) 
Alternatively, we could directly estimate and by applying the method of probability weighted moments to , which reduces uncertainty and enforces the constraint of equal shape parameters. Appendix C shows a small simulation experiment, confirming the good performance of .
In the previous sections, we studied the increase in PN for changing dependence structures and marginal parameters. Another important question is what happens when marginal parameters do not change ( and ), while dependence increases (). Under a hypothesis of independence in space, one would estimate the univariate PN componentwise and take the average, thus possibly underestimating the PN. To see by how much, and how the result varies with the dimension, we conduct the following experiment. Consider points on a regular unit distance grid. For distances from to , pairwise tail dependence coefficients ranging from to for the counterfactual world and from to for the factual world were obtained using a WhittleMatérn correlation function^{4}^{4}4The covariance matrices and are generated using a WhittleMatérn correlation function with fixed shape and varying scales , . The correlation matrices are then multiplied by 10 to obtain and . We evaluate the PN in the quantile of using equal weights, calculated based on a presimulation run of sample size and held fixed. Figure 6 shows boxplots of the multivariate estimates minus the average univariate PN estimates, based on 1000 samples of size . The black line corresponds to the true values, calculated using the formulas in Appendix A. We see that as the dimension increases, taking dependence into account increases necessary causation.
5 Analysis of heavy precipitation from the CNRM model
Evidencing causality is more difficult for heavy rainfall than for extreme temperatures, because precipitation variability is greater in space and time and because extreme rainfall has heavier tails than temperatures (extreme rainfall often has , see, e.g. katz02
). We want to determine if a multivariate GPD approach could enhance the causality message of a univariate analysis. We work with simulated rainfall time series from the French global climate model of MétéoFrance (CNRM) that belongs to the latest Coupled Model Intercomparison Project (CMIP6). We consider the winter months between the 1st of January 1985 until the 31st of August 2014 over the region defined by
to in longitude and to in latitude (corresponding to central Europe). Our factual and counterfactual worlds correspond to two historical runs, the second one of which has only natural forcings. We take the weekly maxima of winter precipitation. As the number of years covers only three decades, the rainfall series can be considered stationary in time within each world. Concerning their spatial structure, we apply the partitioning around medoids (PAM) algorithm (kaufman1990) to the counterfactual rainfall run. The difference with the original PAM version is that our “distance” between two locations and is tailored to threshold exceedances viawhere denotes the standard empirical estimator of the pairwise tail dependence coefficient (see, e.g. kiriliouk2016book, and references therein). Our approach is close to the one of bernard2013, who focused on maxima instead of threshold exceedances. Figure 7 displays the spatial structure for clusters^{5}^{5}5Other values of were tested and provided similar patterns.. Although no spatial coordinates were given to the algorithm, the clusters appear to be spatially homogeneous and climatologically coherent. As the multivariate GPD is tailored for asymptotic dependence, identifying dependent regions help improve its fit. In addition, such a spatial clustering makes the assumption of a constant shape parameter within a region more reasonable.
We hence model each cluster independently, calculating the estimated multivariate PN based on defined in (4.4). Figures 8 and 9 show necessity causation probabilities for the fiveyear and fiftyyear return level respectively. The return levels were calculated based on quantiles of
for each cluster, with equal weights. Both figures show the PN per cluster, calculated using equal weights (top) and optimal weights (bottom). The diameters of the black circles around the PN estimates are proportional to the length of bootstrapbased 95% confidence intervals.




Higher PN does not necessarily comes with higher uncertainty; see for instance the cluster around northern Italy, whose confidence interval is narrow for the fiveyear return level and even more so for the fiftyyear return level. Comparing the two panels of Figure 8 shows that the differences between the factual and the counterfactual world are higher when using optimal weights. This feature is even more striking for the fiftyyear return levels, see Figure 9. Except for locations near the English channel, most points have a probability of necessary causation that is greater than . In particular, a few points like northern Italy shows a probability near one.
This example based on CRNM precipitation data is not sufficient to conclude general climatological results about heavy rainfall. The patterns found here may be due to this specific climate model, internal climate variability or other sources of variability. An exhaustive analysis of all the CMIP6 models, in terms of computer resources and climatological expertise, is beyond the scope of this methodological work. Still, this example illustrates that methods combining multivariate extremevalue theory and counterfactual theory could help climatologists working on causality and multivariate extremes (see, e.g. kim16; Zscheischler18). Another interesting research direction will be to extend this coupling between EVT and counterfactual theory to other types of extremes modelling in geosciences; see, e.g. Hammerling17 or Reich13 for a Bayesian hierarchical point of view, Shooter19 for asymptotic independence models, and Ragone18 for rare event algorithms.
Appendix A Gaussian multivariate GPD model
Let and let be a bivariate Gaussian random vector with mean and covariance matrix . Note that with . The tail dependence coefficient defined in Section 2.1 can be shown to equal
see the supplementary material in kiriliouk2018. In particular, as . Since is identified from , different covariance matrices may lead to the same tail dependence coefficient.
The Gaussian GPD model is convenient because the probability , which appears in the definition of the PN, can be calculated analytically in certain cases.
Proposition A.1.
Let where is a bivariate Gaussian random vector with mean zero and covariance matrix . Let .

If , then for , we have

If , then
For , the probability does not depend on . This is no longer true when . For a generalization of Proposition A.1 and its proof, see the supplementary material.
Appendix B Probability weighted moments inference
In this paper we opt for the probability weighted moments method of hosking1987 to estimate the parameters of a univariate GPD, because of its simplicity and its popularity in hydrological sciences. The method performs well for^{6}^{6}6An extension has been proposed for (diebolt2007). , and the asymptotic covariance matrix has a simple expression (hosking1987; ribereau2011). Let denote iid copies of GPD. The first and second probability weighted moments of are
These probability weighted moments can be estimated by
where are the order statistics of , which gives
Appendix C Simulation
Let and let
follow a Gaussian distribution with mean zero and covariance matrix
. We assess the quality of the estimator in (4.4) for a range of thresholds . The true probabilities can be calculated by combining (4.1) with Proposition A.1 (or its dimensional generalization in the supplementary material). The experiments are based on 1000 samples of size from with . Figure 10 shows the results for and , where roughly corresponds to the quantile of . We take ; using the optimal weights gives similar results (not shown). We consider increasing dependence (, ), increasing marginal scale (, ) and constant marginal shape (). Boxplots of the corresponding estimators of the PN are centered around their true values for better visibility. We see that the estimated median is always equal to the true one. Asincreases, more and more (downward) outliers appears, which is natural as the true PN is very close to
for large values of . For small values of , the results show less variability for than for , while for large , the opposite happens.
Comments
There are no comments yet.