## 1 Introduction

Investigating causal links between climate forcings and the observed climate evolution over the instrumental era represents a significant part of the research effort on climate. Studies addressing these aspects in the context of climate change have been providing over the past decades, an ever increasing level of causal evidence that is important for decision-makers in international discussions on mitigation policy. In particular, these studies have produced far-reaching causal claims; for instance the latest IPCC report (Stocker et al., 2013), AR5 hereafter, stated that “It is extremely likely that human influence has been the dominant cause of the observed warming since the mid-20 century.” An important part of this causal claim, as well as many related others, regards the associated level of uncertainty. More precisely, the expression “extremely likely” in the latter quote has been formally defined by the IPCC (Mastrandrea et al., 2010) to correspond to a probability of . The above quote hence implicitly means that the probability that the observed warming since the mid-20th century was not predominantly caused by human influence but by natural factors, is roughly . Based on the current state of knowledge, it means that it is not yet possible to fully rule out that natural factors were the main causes of the observed global warming. This probability of , as well as all the probabilities associated to the numerous causal claims that can be found in the past and present climate literature and are summarized in AR5, are critical quantities that are prone to affect the way in which climate change is apprehended by citizens and decision makers, and thereby to affect decisions on the matter. It is thus of interest to examine the method followed to derive them and, potentially, to improve it.

Aforementioned studies buttressing the above claim usually rely on a conventional attribution framework in which “causal attribution of anthropogenic climate change” is understood to mean “demonstration that a detected change is consistent with the estimated responses to anthropogenic and natural forcings combined, but not consistent with alternative, physically plausible explanations that exclude important elements of anthropogenic forcings” (Hegerl et al., 2010). While this definition has proved to be very useful and relevant, it offers a description of causality which is arguably overly qualitative for the purpose of deriving a probability. In particular, it comes short of a mathematical definition of the word “cause” and incidentally, of the “probability to have caused

” that we in fact wish to quantify. Hence, beyond these general guidance principles, the actual derivation of these probabilities is left to some extent to the interpretation of the practitioner. In practice, causal attribution has usually been performed by using a class of linear regression models

(Hegerl and Zwiers, 2011):(1) |

where the observed climate change is regarded as a linear combination of externally forced response patterns with referred to as fingerprints, and where

represent of internal climate variability and observational error (all variables are vectors of dimension

). The regression coefficient accounts for possible error in climate models in simulating the amplitude of the pattern of response to forcing . After inference and uncertainty analysis, the value of each coefficientand the magnitude of the confidence intervals determine whether or not the observed response is attributable to the associated forcing. The desired probability of causation, i.e. the probability that the forcing of interest

has caused the observed change is denoted hereafter . It has often been equated to the probability that the corresponding linear regression coefficient is positive^{1}

^{1}1The notation corresponds to the confidence level associated to the confidence interval

under a frequentist approach, and to the posterior probability that

is positive under a Bayesian one. :(2) |

A shortcoming of the conventional framework summarized in Equations (1) and (2) above, is that a linear regression coefficient does not have any causal meaning from a formal standpoint. As acknowledged by Pearl (2009), turning an intrinsically deterministic notion such as causality into a probabilistic one, is a difficult general problem which has also long been a matter of debate (Simpson, 1951; Suppes, 1970; Mellor, 1995). Nevertheless, the current approach can be theoretically improved in the context of climate change where the values of the probabilities of causation have such important implications.

Our proposal to tackle this objective is anchored into a coherent theoretical corpus of definitions, concepts and methods of general applicability which has emerged over the past three decades to address the issue of evidencing causal relationships empirically (Pearl, 2009). This general framework is increasingly used in diverse fields (e.g. in epidemiology, economics, social science) in which investigating causal links based on observations is a central matter. Recently, it has been introduced in climate science for the specific purpose of attributing weather and climate-related extreme events (Hannart et al., 2015a), which we refer to as ‘extreme events’ hereafter. The latter article gave a brief overview of causal theory and articulated it with the conventional framework used for the attribution of extreme events, which is also an important topic in climate attribution. In particular, Hannart et al. (2015a) showed that the key quantity referred to as the fraction of attributable risk (FAR) (Allen, 2003; Stone and Allen, 2005) which buttresses most extreme event attribution (EA) studies, can be directly interpreted within causal theory.

However, Hannart et al. (2015a) did not address how to extend and adapt this theory in the context of the attribution of climate changes occurring on longer timescales. Yet, a significant advantage of the definitions of causal theory is precisely that they are relevant no matter the temporal and spatial scale. For instance, from the perspective of a paleoclimatologist studying Earth’s climate over the past few hundred millions of years, global warming over the past hundred and fifty years can be considered as a climate event. As a matter of fact, the word “event” is used in paleoclimatology to refer to “rapid” changes in the climate system, but ones that may yet last centuries to millennia. Where to draw the line is thus arbitrary: one person’s long term trend is another person’s short term event. It should therefore be possible to tackle causal attribution within a unified methodological framework based on shared concepts and definitions of causality. Doing so would allow to bridge the methodological gap that exists between EA and trend attribution at a fundamental level, thereby covering the full scope of climate attribution studies. Such a unification would present in our view several advantages: enhancing methodological research synergies between D&A topics, improving the shared interpretability of results, and streamlining the communication of causal claims — in particular when it comes to the quantification of uncertainty, i.e. of the probability that a given forcing has caused a given observed phenomenon.

Here, we adapt some formal definitions of causality and probability of causation to the context of climate change attribution. Then, we detail technical implementation under standard assumptions used in D&A. The method is finally illustrated on the warming observed over the 20 century.

## 2 Causal counterfactual theory

While an overview of causal theory cannot be repeated here, it is necessary for clarity and self-containedness to highlight its key ideas and most relevant concepts for the present discussion.

Let us first recall the so-called “counterfactual” definition of causality by quoting the 18th century Scottish philosopher David Hume: “We may define a cause to be an object followed by another, where, if the first object had not been, the second never had existed.” In other words, an event ( stands for effect) is caused by an event ( stands for cause) if and only if would not occur were it not for . Note that the word event is used here in its general, mathematical sense of subset of a sample space . According to this definition, evidencing causality requires a counterfactual approach by which one inquires whether or not the event would have occurred in a hypothetical world, termed counterfactual, in which the event would not have occurred. The fundamental approach of causality which is implied by this definition is still entirely relevant in the standard causal theory. It may also arguably be connected to the guidance principles of the conventional climate change attribution framework and to the optimal fingerprinting models, in a qualitative manner. The main virtue of the standard causality theory of Pearl consists in our view in formalizing precisely the above qualitative definition, thus allowing for sound quantitative developments. A prominent feature of this theory consists in first recognizing that causation corresponds to rather different situations and that three distinct facets of causality should be distinguished: (i) necessary causation, where the occurrence of requires that of but may also require other factors; (ii) sufficient causation, where the occurrence of drives that of but may not be required for to occur; (iii) necessary and sufficient causation, where (i) and (ii) both hold. The fundamental distinction between these three facets can be visualized by using the simple illustration shown in Figure 1.

While the counterfactual definition as well as the three facets of causality described above may be understood at first in a fully deterministic sense, perhaps the main strength of Pearl’s formalization is to propose an extension of these definitions under a probabilistic setting. The probabilities of causation are thereby defined as follow:

(3a) | |||

(3b) | |||

(3c) |

where and are the complementaries of and , and where the notation means that an intervention is applied to the system under causal investigation. For instance PS, the probability of sufficient causation, reads from the above: the probability that occurs when is interventionally forced to occur, conditional on the fact that neither nor were occurring in the first place. Conversely PN, the probability of necessary causation, is defined as the probability that would not occur when is interventionally forced to not occur, conditional on the fact that both and were occurring in the first place. While we omit here the formal definition of the intervention for brevity, the latter can be understood merely as experimentation: applying these definitions thus requires the ability to experiment. Real experimentation, whether in situ or in vivo, is often accessible in many fields but it is not in climate research for obvious reasons. In this case, one can thus only rely on numerical in silico experimentation: the implications of this constraint are discussed further.

While the probabilities of causation are not easily computable in general, their expression fortunately becomes quite simple under assumptions that are reasonable in the case of external forcings (i.e. exogeneity and monotonicity):

(4a) | |||

(4b) | |||

(4c) |

where is the so-called factual probability of the event in the real world where did occur and is its counterfactual probability in the hypothetic world as it is would have been had not occurred. One may easily verify that Equation (4) holds in the three examples of Figure 1 by assuming that the switches are probabilistic and exogenous. In any case, under such circumstances, the causal attribution problem can thus be narrowed down to computing an estimate of the probabilities and . The latter only requires two experiments: a factual experiment and a counterfactual one . Equation (3) then yields and PNS from which a causal statement can be formulated.

Each three probability PS, PN and PNS have different implications depending on the context. For instance, two perspectives can be considered: (i) the ex post perspective of the plaintiff or the judge who asks “does bear the responsibility of the event that did occur?”; and (ii) the ex ante perspective of the planner or the policymaker who instead asks “what should be done w.r.t. to prevent future occurrence of ?”. It is PN that is typically more relevant to context (i) involving legal responsibility, whereas PS has more relevance for context (ii) involving policy elaboration. Both these perspectives could be relevant in the context of climate change, and it thus makes sense to trade them off. Note that PS and PN can be articulated with the conventional definition recalled in introduction. Indeed, the “demonstration that the change is consistent with (…)” implicitly corresponds to the idea of sufficient causation, whereas “(…) is not consistent with (…)” corresponds to that of necessary causation. The conventional definition therefore implicitly requires a high PS and a high PN to attribute a change to a given cause.

PNS may be precisely viewed as a probability which combines necessity and sufficiency. It does so in a conservative way since we have by construction that . In particular, this means that a low PNS does not imply the absence of a causal relationship because either a high PN or a high PS may still prevail even when PNS is low. On the other hand, it presents the advantage that any statement derived from PNS asserting the existence of a causal link, holds both in terms of necessity and sufficiency. This property is thus prone to simplify causal communication, in particular towards the general public, since the distinction no longer needs to be explained. Therefore, establishing a high PNS may be considered as a suitable goal to evidence the existence of a causal relationship in a simple and straightforward way. In particular, the limiting case corresponds to the fully deterministic, systematic and single-caused situation in Figure 1c — i.e. undeniably the most stringent way in which one may understand causality.

## 3 Probabilities of causation of climate change

We now return to the question of interest: for a given forcing and an observed evolution of the climate system , can be attributed to ? More precisely, what is the probability that has caused ? We propose to tackle this problem by applying the causal counterfactual theory to the context of climate change, and more specifically, by using the three probabilities of causation PN, PS and PNS recalled above. This Section shows that it can be done to a large extent similarly to the approach of Hannart et al. (2015a) for EA. In particular, as in EA, the crucial question to be answered as a starting point consists of narrowing down the definitions of the cause event and of the effect event associated to the question at stake — where the word “event” is used here in its general mathematical sense of “subset”.

### 3.1 Counterfactual setting

For the cause event , a straightforward answer is possible: we can follow the exact same approach as in EA by defining as “presence of forcing ” (i.e. the factual world that occurred) and as “absence of forcing ” (i.e. the counterfactual world that would have occurred in the absence of ). Indeed, forcing can be switched on and off in numerical simulations of the climate evolution over the industrial period, as in the examples of Fig. 1 and as in standard EA studies. Incidentally, the sample space consists in the set of all possible climate trajectories in the presence and absence of , including the observed one . In other words, all forcings other than are held constant at their observed values as they are not concerned by the causal question.

In practice and by definition, the factual runs of course always correspond to the historical experiment (HIST hereafter), using the Climate Model Intercomparison Project’s (CMIP hereafter) terminology as described by Taylor et al. (2012). The counterfactual runs are obtained from the same setting as historical but switching off the forcing of interest. For instance, if the forcing consists of the anthropogenic forcing then the counterfactual runs corresponds to the historicalNat (NAT hereafter) experiment i.e. . Likewise, if the forcing consists of the CO forcing, then the counterfactual runs corresponds to the “all except CO” experiment. However, no such runs are available in CMIP5

(`https://cmip.llnl.gov/cmip5/docs/historical_Misc_forcing.pdf`

and Section 6 for discussion). Lastly, it is worth underlining that the historicalAnt experiment, which combines all anthropogenic forcings, thus corresponds to the counterfactual setting associated to the natural forcings. Therefore, runs from the historicalAnt experiment are relevant for the attribution of the natural forcings only, they are not relevant for the attribution of the anthropogenic forcings under the present counterfactual causal theory.

These definitions of and have an important implication w.r.t. the design of numerical experiments in climate change attribution. In contrast with the design usually prevailing in D&A (forcing only), the latter experiments are required to be counterfactual (i.e. all forcings except ). We elaborate further on this remark in Section 6.

### 3.2 Balancing necessity and sufficiency

To define the effect event , we propose to follow the same approach as in EA, where is usually defined based on an ad hoc climatic index exceeding a threshold :

(5) |

Thus, defining implies choosing an appropriate variable and threshold that reflect the focus of the question while keeping in mind the implications of the balance between the probabilities of necessary and sufficient causation. We now illustrate this issue and lay out some proposals to address it.

Consider the question “Have anthropogenic CO emissions caused global warming?”. Following the above, the event “global warming” may be loosely defined as a positive trend on global Earth surface temperature, i.e. where is the global surface temperature linear trend coefficient and the threshold is zero. In that case, nearly always occurs in the factual world () but it is also frequent in the counterfactual one ( medium) thus the emphasis is mostly on PS, i.e. on sufficient causation, while PN and PNS will have moderate values (Fig. 2b,e). But if global warming is more restrictively defined as a warming trend comparable to or greater than the observed trend, i.e. where is the observed trend, then the event becomes nearly impossible in the counterfactual world () but remains frequent in the factual one ( medium) thus the emphasis is on PN, i.e. on necessary causation, while the values of PS and PNS will this time be low. Therefore, the above two extreme definitions both yield a low PNS. But under a more balanced definition of global warming as a trend exceeding an intermediate value , then the event nearly always occurs in the factual world () and yet remains very rare in the counterfactual one (). Hence PNS is then high: both necessary and sufficient causation prevail. We propose to take advantage of this optimal value to define the event “global warming” as the global trend index exceeding the optimal threshold such that the amount of causal evidence, in a PNS sense, is maximized:

(6) |

where the condition insures that the event has actually occurred. When used on real data (see Section 6), this approach yields a high value of for the above question (Figure 2b,e).

Let us now consider the question “Have anthropogenic CO emissions caused the Argentinian heatwave of December 2013?” (Hannart et al., 2015b). Here, the event can be defined as where is surface temperature anomaly averaged over an ad-hoc space-time window. Like in the previous case, the causal evidence agains shifts from necessary and not sufficient (Fig. 2a,d) when is equal to the observed value of the index C (unusual event in both worlds but much more so in the counterfactual one) to sufficient and not necessary when is small (usual event in both worlds but much more so in the factual one). Like in the previous case, a possible approach here would be to balance both quantities by maximizing PNS in as in Equation (6). However, this would lead here to a substantially lower threshold which no longer reflects the rare and extreme nature of the event “heatwave” under scrutiny. Furthermore, this would yield a well-balanced, but pretty low level of causal evidence (). Thus maximizing PNS is not relevant here. Instead, maximizing PN, even if that is at the expense of PS, is arguably more relevant here since we are dealing with extreme events that are rare in both worlds, thereby inherently limiting the evidence of sufficient causation. This maximization corresponds to which often yields the highest observed threshold . Therefore, PN (i.e. the FAR) is an appropriate metric for the attribution of extreme events, and a high threshold matching with the observed value should be used in order to maximize it. In contrast with extreme events, long term changes are prone to be associated with much powerful causal evidence that simultaneously involves necessary and sufficient causation, and may yield high values for PN, PS and PNS. PNS is thus an appropriate summary metric to consider for the attribution of climate changes, in agreement with D&A guidance principles (Hegerl et al., 2010). An optimal intermediate threshold can be chosen by maximizing PNS.

### 3.3 Building an optimal index

In the above example where “global warming” is the focus of the question, the variable of interest to define the event can be considered as implicitly stated in the question, insofar as the term “global warming” implicitly refers to an increasing trend on global temperature. However, in the context of climate change attribution, we often investigate the cause of “an observed change ” with no precise a priori regarding the characteristics of the change that are relevant w.r.t. causal evidencing. Furthermore, may be a large dimensional space-time vector. Thus the definition of the index in this case is more ambiguous.

We argue that in such a case, the physical characteristics of which are implicitly considered relevant to the causal question are precisely those which best enhance the existence of a causal relationship in a PNS sense. This indeed corresponds to the idea of “fingerprinting” used thus far in climate change attribution studies (as well as in criminal investigations, hence the name): we seek a fingerprint, i.e. a distinctive characteristic of which would never appear in the absence of forcing (i.e. ) but systematically does in its presence (i.e. ). If this characteristic shows up in observations, then the causal evidence is conclusive. A fingerprint may thus be thought of as a characteristic which maximizes the gap between and and thereby maximizes PNS, by definition.

As an illustration, Marvel and Bonfils (2013) focus on the attribution of changes in precipitation, and subsequently address the question “Have anthropogenic forcing caused the observed evolution of precipitation at a global level?”. Arguably, this study illustrates our point in the sense that it addresses the question by defining a fingerprint index which aims precisely at reflecting the features of the change in precipitation that are thought to materialize frequently (if not systematically) in the factual world and yet are expected to be rare (if not impossible) in the counterfactual one, based on physical considerations. In practice, the index defined by the authors consists of a non-dimensional scalar summarizing the main spatial and physical features of precipitation evolution w.r.t. dynamics and thermodynamics. The factual and counterfactual PDFs of are then derived from the HIST and NAT runs respectively (Fig. 2c). From these PDFs, one can easily obtain an optimal threshold for the precipitation index by applying Equation (6) (Fig. 2f). This yields , i.e. anthropogenic forcings have about as likely as not caused the observed evolution of precipitation.

A qualitative approach driven by physical considerations, such as the one of Marvel and Bonfils (2013), is perfectly possible to define a fingerprint index that aims at maximizing PNS. However, a quantitative approach can also help in order to define optimally, and to identify the features of that best discriminate between the factual and counterfactual worlds. Indeed, the qualitative, physical elicitation of may be difficult when the joint evolution of the variables at stake is complex or not well-understood a priori. Furthermore, one may also wish to combine lines of evidence by treating several different variables at the same time in (i.e. precipitation and temperature, Yan et al. (2016)). Let us introduce the notation where

is the space-time vectorial random variable of size

which observed realization is , and is a mapping from to . Extending Equation (6) to the simultaneous determination of the optimal mapping and optimal threshold , we propose the following maximization:(7) |

In words, we thus propose to choose the value of the threshold, but also to choose the index among the set all possible indexes , so as to maximize PNS. The event defined above in Equation (7) may thus be referred to as the optimal fingerprint w.r.t. forcing . The maximization performed in Equation (7) also suggests that our approach shares some similarity with the method of Yan et al. (2016), insofar as the variables of interest are in both cases selected mathematically by maximizing a criterion which is relevant for attribution (i.e. potential detectability in Yan et al. (2016), PNS in the present article), rather than by following qualitative, physics- or impact-oriented, considerations.

## 4 Implementation under the standard framework

We now turn to the practical aspects of implementing the approach described in Section 3 above, based on the observations and on climate model experiments. We detail these practical aspects in the context of the standard framework briefly recalled in Section 1, i.e. multivariate linear regression under a Gaussian setting. Note that the assumptions underlying the latter conventional framework could be challenged (e.g. pattern scaling description of model error and gaussianity). However, the purpose of this section is not to challenge these assumptions. It is merely to illustrate in detail how these assumptions can be used within the general causal framework proposed. Furthermore, the details of the mathematical derivation shown in this subsection can not be covered exhaustively here in order to meet the length constraint. However, some important steps of the derivation are described in Appendix A, and the complete details and justification thereof can be found in the references given in the text.

### 4.1 Generalities

The maximization of Equation (7) requires the possibility to evaluate the probabilities of occurrence and , in the factual and counterfactual world, of the event , for any and . For this purpose, it is convenient to derive beforehand the factual and counterfactual PDFs of the random variable , denoted and

respectively. Assuming their two first moments are finite, we introduce:

(8) |

The means and represent the expected response in the factual and counterfactual worlds; it is intuitive that their difference will be key to the analysis. The covariances and represent all the uncertainties at stake, they must be carefully determined based on additional assumptions. To avoid repetition in presenting these assumptions, we will detail them for the factual world only, but they will be applied identically in both worlds.

As recalled above, in situ experimentation on the climate system is not accessible, thus we are left with in silico experimentation as the only option. While the increasing realism of climate system models renders such an in silico approach plausible, it is clear that modeling errors associated to their numerical and physical imperfections should be taken into account into . In addition, sampling uncertainty and observational uncertainty, which are commonplace sources of uncertainty in dealing with experimental results in an in situ context as well, should also be taken into account. Finally, internal climate variability also needs to be factored. The latter four sources of uncertainty can be represented by decomposing into a sum of four terms:

(9) |

where the component represents climate internal variability; represents model uncertainty; represents observational uncertainty; and represents sampling uncertainty. Assumptions regarding the latter four sources of uncertainty are also key in the conventional Gaussian linear regression framework recalled in Section 1. We therefore propose to take advantage of some assumptions, data and procedures that have been previously introduced under the conventional framework, and adapt them to specify , , , and . The statistical model specification below can thus be viewed as an extension of developments under the conventional framework, in particular those exposed in Hannart (2016). The various parameters and data involved, as well as their conditioning, are summarized in the hierarchical model represented in Figure 3, which we now describe.

### 4.2 Model description

As in the conventional linear regression formulation recalled in Equation (1), we assume that the random variable is Gaussian with mean and covariance :

(10) |

In the conventional framework, climate models are assumed to correctly represent the response patterns but to err on their amplitude. Recognizing that the scaling factors thereby aim at representing the error associated to models, we prefer to treat as a random variable instead of a fixed parameter to be estimated. The latter factors are also assumed to be Gaussian:

(11) |

where we assume that the expected value of is , and is a scalar parameter which will be determined further in this Section. Combining Equations (10) and (11), it comes (Appendix A):

(12) |

where . Equation (12) thus shows that it is possible to translate the pattern scaling term from the mean of to the covariance of . We believe such a mean-covariance translation is relevant here, since the pattern scaling assumption is meant to represent a source of uncertainty. Furthermore, the covariance associated to the latter source of uncertainty can be represented by the component , which results from the random scaling of the individual responses . Furthermore, the expected value of , denoted , is equal to the sum of the latter individual responses. Under the additivity assumption prevailing in the conventional framework, thus corresponds to the model response under the scenario where the forcings are present. Hence, can be obtained by estimating directly the combined response as opposed to estimating the individual responses one by one and summing them up. Such a direct estimation of is indeed advantageous from a sampling error standpoint, as will be made clear immediately below.

The PDF of in Equation (12) involves three quantities , and that needs to be estimated from a finite ensemble of model runs denoted , which of course introduces sampling uncertainty. Assuming independence among runs, it is straightforward to show that (Appendix A):

(13) |

where is the ensemble average for the individual response ; is the ensemble average for the combined response; is the number of runs available for the individual response to forcing ; is the number of combined forcings runs. Combining Equations (12) and (13), and after some algebra, it comes (Appendix A):

(14) |

with , and where the sampling uncertainty on the responses and thus corresponds to the term . On the other hand, the internal variability component also has to be estimated from the preindustrial control runs, which introduces additional sampling uncertainty. The sampling uncertainty on can be treated by following the approach of Hannart (2016)

, which introduces an Inverse Wishart conjugate prior for

. This leads an Inverse Wishart posterior for which has the following expression (Appendix A):(15) |

where the estimated covariance consists of a so-called shrinkage estimator:

(16) |

where is the empirical covariance of the control ensemble; is a shrinkage target matrix taken here to be equal to i.e. and for ; the shrinkage intensity is obtained from the marginal likelihood maximization described in Hannart et al. (2014); and .

Combining Equations (14) and (15), and after some algebra and an approximation, it comes (Appendix A):

(17) |

where we adopted the simplified parametric form for the covariance of observational error, and where is the multivariate distribution with mean

, variance

and degrees of freedom. Equation (17) implies that taking into account the sampling uncertainty on does not affect the total variance of . Instead, it only affects the shape of the PDF of, which has thicker tails than the Gaussian distribution. With these parameterizations, our model for

is thus a parametric Student model with two parameters .After computing the estimators , , and using the ensemble of model experiments as described above, the parameters are estimated by fitting the above model to the observation based on likelihood maximization. The log-likelihood of the model has the following expression:

(18) |

The estimators and are then obtained numerically using a standard maximization algorithm (e.g. gradient descent). With being obtained from factual runs (i.e. HIST runs) and containing all the forcings including , this procedure thus yields the PDF of in the factual world:

(19) |

Next, to obtain , one simply needs to change the mean to obtained as the ensemble average for the counterfactual experiment “all forcings except ”. Some changes also need to be made regarding the covariance. Indeed, since forcing is absent in the counterfactual world, the model error covariance component , corresponding to the random scaling of the response to forcing , must be taken out of the covariance. Furthermore, if the number of counterfactual runs differ from the number of factual runs , the sampling uncertainty associated to estimating also has to be modified. The PDF of in the counterfactual world can thus be written:

(20) |

As noted above, when is anthropogenic forcing, the counterfactual experiment NAT is usually available in CMIP runs, allowing for a straightforward derivation of . But for other forcings, by the design of CMIP experiments, counterfactual runs are usually not available. A possible workaround then consists in applying the additivity assumption to approximate with . For instance, if CO is the forcing of interest, the counterfactual response to all forcings except CO emissions can be approximated by subtracting the CO individual response from the all forcings response. However in that case, the sampling uncertainty term corresponding to the estimation of must be added to the covariance .

### 4.3 Derivation of the probabilities of causation

With the two PDFs of in hand, an approximated solution to the maximization of Equation (7) can be conveniently obtained by linearizing , yielding a closed mathematical expression for the optimal index :

(21) |

Equation (21) is a well known result of Linear Discriminant Analysis (LDA) (McLachlan, 2004). Details of approximations made and of the mathematical derivation of Equation (21) are given in Appendix B. The optimal index can thus be interpreted as the projection of onto the vector which will be denoted hereinafter, i.e. .

To obtain PNS, we then need to derive the factual and counterfactual CDFs of , denoted and respectively. Since no closed form expression of these CDFs is available, we simulate realizations thereof. Drawing two samples of random realizations of from the Student distributions and is straightforward, by treating the Student

as a compound Gaussian–Chi-squared distribution. Samples of

are then immediately obtained by projecting onto and the desired CDFs can be estimated using the standard kernel smoothing estimator (Bowman and Azzalini, 1997), yielding and for all . Finally, follows as:(22) |

and:

(23) |

where .

### 4.4 Reducing computational cost

When the dimension of is large, the above described procedure can become prohibitively costly if applied straightforwardly, due to the necessity to derive the inverse and determinant of at several steps of the procedure. However, the computational cost of these operations can be drastically reduced by applying the Sherman-Morrison-Woodbury formula (Woodbury, 1950), which states that the inverse of a low rank correction of some matrix can be computed by doing a low rank correction to the inverse of the original matrix. Omitting the notation for more clarity, we have:

(24) |

where can be inverted using the same formula:

(25) |

where . Likewise, we apply the Sylvester formula (Sylvester, 1851) twice to compute the determinant of :

(26) |

Independently of , the matrices is of size , is of size , and is diagonal. Obtaining their inverse and determinant is therefore computationally cheap. Hence, the inverse and determinant of can be obtained at a low computational cost by applying first Equation (25) to determine and second Equations (24) and (26).

## 5 Illustration on temperature change

Our methodological proposal is applied to the observed evolution of Earth’s surface temperature during the century, with the focus being restrictively on the attribution to anthropogenic forcings. More precisely, consists of a spatial-temporal vector of size which contains the observed surface temperatures averaged over 54 time-space windows. These windows are defined at a coarse resolution: Earth’s surface is divided into 6 regions of similar size (3 in each hemisphere) while the period 1910-2000 is divided into 9 decades. The decade 1900-1910 is used as a reference period, and all values are converted to anomalies w.r.t. the first decade. The HadCRUT4 observational dataset (Morice et al., 2012) was used to obtain . With respect to climate simulations, the runs of the IPSL-CM5A-LR model (Dufresne et al., 2012) for the NAT, ANT, HIST and PIcontrol experiments were used (see Appendix C for details) and converted to the same format as after adequate space-time averaging.

Following the procedure described in Section 4, we successively derived the estimated factual response using the HIST runs; the estimated counterfactual response using the NAT runs; the estimated individual responses and using the NAT runs and ANT runs respectively ( and ); the estimated covariance from the PIcontrol runs. Then, we derived and by likelihood maximization, to obtain and .

An assessment of the relative importance of the four components of uncertainty was obtained by deriving the trace of each component (i.e. the sum of diagonal terms) normalized to the trace of the complete covariance. Climate variability is found to be the dominant contribution, followed by model uncertainty, observational uncertainty and sampling uncertainty (not shown). The split between model and observational uncertainty is to some extent arbitrary as we have no objective way to separate them based only on , i.e. the model could be equivalently formulated as and . An objective separation would require an ensemble representing observational uncertainty, allowing for a preliminary estimation of .

The optimal vector , designed to capture the space-time patterns that best discriminate the HIST evolution and the NAT one, was then obtained from Equation (21). To identify which features of are captured by this optimal mapping, the coefficients were averaged spatially and temporally, and were plotted in Figure 4. Firstly, it can be noted that the coefficients’ global average is large and positive: a major discriminant feature is merely global mean temperature, as expected. Secondly, the coefficients also exhibit substantial variation around their average in both space and time. Spatial variations of unsurprisingly suggest that, beyond global mean temperature, other discriminant features include the warming contrast prevailing between the two hemispheres and/or between low and high latitudes (the low resolution prevent from a clear separation), as well as between ocean and land (Fig. 4a). Temporal variations of suggest that discriminant features includes the linear trend increase as expected, but also higher order temporal variations (Fig. 4b).

The PDFs of the optimal index were derived, and are plotted in Figure 5, together with PNS as a function of the threshold . The maximum of PNS determines the desired probability of causation:

(27) |

In IPCC terminology, this would mean that anthropogenic forcings have virtually certainly caused the observed evolution of temperature, according to our approach. More precisely, the probability that the observed evolution of temperature is not caused by anthropogenic forcings is one in then thousands (1:10,000) instead of one in twenty (1:20). Therefore, the level of causal evidence found here is substantially higher than the level assessed in the IPCC report. This discrepancy will be discussed in Section 6.

Before digging into this discussion, it is interesting to assess the relative importance of the “trivial” causal evidence coming from the global increase in temperature, and of the less obvious causal evidence coming from space-time features captured by . For this purpose, we merely split into the sum of a global average contribution and of the remaining variations . The PDFs of the resulting indexes are plotted in Figure 5a,b. Their bivariate PDF can also be visualized with the scatterplot of Figure 5c. The following two probabilities of causation are obtained:

(28) |

where refer to the globally averaged evolution and refer to other features of evolution. Therefore, while the globally averaged warming provides alone a substantial level of evidence (i.e. ), these results suggest that the overwhelmingly high overall evidence (i.e. ) is primarily associated to non-obvious space-time features of the observed temperature change.

## 6 Discussion

### 6.1 Comparison with previous statements

The probabilities of causation obtained by using our proposal may depart from the levels of uncertainty asserted by the latest IPCC report, and/or by previous work. For instance, when corresponds to the evolution of precipitation observed over the entire globe during the satellite era (1979-2012), we have shown in Section 3 that, using the dynamic-thermodynamic index built by Marvel and Bonfils (2013), the associated probability of causation is found to be 0.43. This probability is thus significantly lower than the one implied by the claim made in this study that “the changes in precipitation observed in the satellite era are likely to be anthropogenic in nature” wherein “likely” implicitly means .

In contrast with the situation prevailing for precipitation, when corresponds to the observed evolution of Earth’s surface temperature during the century, and in spite of using a very coarse spatial resolution, we found a probability of causation which considerably exceeds the probability implied by the latest IPCC report. Such a gap deserves to be discussed.

Firstly, the probability of causation defined in our approach is of course sensitive to the assumptions that are made on the various sources of uncertainty, all of which are here built into . Naturally, increasing the level of uncertainty, for instance through an inflation factor applied to , reduces the probability of causation (Figure 6). In the present illustration, uncertainty needs to inflated by a factor 2.4 to obtain in agreement with the IPCC statement. Therefore, a speculative explanation for the gap is that experts may be adopting a conservative approach by implicitly inflating uncertainty, although not explicitly, perhaps in an attempt to account for additional sources of uncertainty that are not well known. In the present case, such an inflation should amount to 2.4 to explain the gap. This number is arguably too high to provide a satisfactory standalone explanation, yet overall, such a conservativeness may partly contribute to the discrepancy when it comes to temperature. In any case, this highlights the need for a more explicit and consistent use of conservativeness — if any.

Besides the effect of inflating the individual variances, it is important to note that the probability of causation may also be greatly reduced when the correlation coefficients of the covariance , whether spatial or temporal, are inflated. This less straightforward effect can be explained by the fact that higher correlations imply greater spatial and temporal coherence of the noise, which is thus more prone to confounding with a highly coherent signal, and thereby reduces the probability of causation. Conservativeness may thus be associated to an inflation of correlations, in addition to an inflation of variances.

Another possible explanation for the discrepancy is more technical. Many previous attribution studies buttressing the IPCC statement regarding temperature, are based on an inference method for the linear regression model of Equation (1) which is not optimal w.r.t. maximizing causal evidence — despite of it being often referred to as “optimal fingerprinting”. More precisely, the inference on the scaling factors and the associated uncertainty quantification, are obtained by projecting the observation as well as the patterns

onto the leading eigenvectors of the covariance

associated to climate internal variability. Such a projection choice sharply contrasts with the projection used in our approach, which is indeed performed onto the vector . Denoting the eigenvectors of andthe corresponding eigenvalues, the expression of

can be written:(29) |

Equation (29) shows that projecting onto does not emphasize the leading eigenvectors of , in contrast to the aforementioned standard projection. Instead, it emphasizes the eigenvectors that simultaneously present a low eigenvalue and a strong alignment with the contrast between the two worlds . As a matter of fact, the ratio can be interpreted as the signal-to-noise ratio associated to the eigenvector , where the eigenvalue quantifies the magnitude of the noise and that of the causal signal. Projecting onto thus maximizes the signal-to-noise ratio. In contrast, since is a large contribution to (the dominant one in our illustration), a projection onto the leading eigenvectors of naturally tends to amplify the noise, and to partly hide the relevant causal signal .

In order to assess whether or not these theoretical remarks hold in practice, we revisited our illustration and quantified the impact on of using such a projection onto the leading eigenvectors of . For this purpose, after computing the projection matrix on the ten leading eigenvectors of , our procedure was applied identically, but this time using the projected vector . Results are shown in Figure 7, again after splitting the contribution of global mean change and patterns of change. Unsurprisingly, the probability of causation associated to the global mean change remains unmodified at 0.978. However, the probability of causation associated to the space-time features of warming drops from 0.9994 to 0.92. Indeed, the features that most discriminate the two worlds, and are summarized in , do not align well with the leading eigenvectors of . They are thus incompletely reflected by the projected vector (Figure 8). Furthermore, the uncertainty of the resulting index is high by construction, as the magnitude of climate variability is maximized when projecting onto its leading modes. This further contributes to reducing to 0.992.

### 6.2 Counterfactual experiments

Our methodological proposal has an immediate implication w.r.t. the design of standardized CMIP experiments dedicated to D&A: a natural option would be to change the present design “forcing only” into a counterfactual design “all forcings except ”. Indeed, is driven by the difference between the factual response (i.e. historical experiment) and the counterfactual response (i.e. all forcings except experiment). Under the assumption that forcings do not interact with one another and that the combined response matches with the sum of the individual responses, the difference coincides with the individual response (i.e. forcing only experiment). While this hypothesis is well established for temperature at large scale (Gillett et al., 2004), it appears to break down for other variables (e.g. precipitation, (Shiogama et al., 2013)) or over particular regions (e.g the Southern extratropics, (Morgenstern et al., 2014)) where forcings appear to significantly interplay. Such a lack of additivity would inevitably damage the results of the causal analysis. It is thus important in our view to better understand the domain of validity of the forcing-additivity assumption and to evaluate the drawbacks of the present “one forcing only” design versus its advantages. Such an analysis does require “forcing only” experiments, but also “all forcings except ” experiments in order to allow for comparison. This effort would hence justify including in the DAMIP set of experiments an “all forcings except ” experiment — which is presently absent even in the lowest priority tier thereof — at least for the most important forcings such as anthropogenic CO.

### 6.3 Benchmarking high probabilities

Section 5 showed that the proposed approach may sometimes yield probabilities of causation that are very close to one. How can we communicate such low levels of uncertainty? This question arises insofar as the term “virtual certainty” applies as soon as PNS exceeds under the current IPCC language. Thus, this terminology would be unfit to express in words a PNS increase from to , say — even though such an increase corresponds to a large reduction of uncertainty by a factor one hundred. One option to address this issue is to use instead the uncertainty terminology of theoretical physics, in which a probability is translated into an exceedance level under the Gaussian distribution, measured in numbers of from the mean (where

denotes standard deviation), i.e.

with the CDF of the standard Gaussian distribution. Under such terminology, “virtual certainty” thus corresponds to a level of uncertainty of 2.3, while found in Section 5 reaches 3.7. It is interesting to note that the level of uncertainty officially requested in theoretical physics to corroborate a discovery as such — e.g. the existence of the Higgs Boson — is 5. By applying such standards, one may actually consider that is still too low a probability to corroborate that human influence has indeed been the cause of the observed warming. Whether or not such standards are relevant in the particular context of climate change — which relates to defining the proper level of aversion to false discovery suitable in that context — are discutable matters. In any case, increasing beyond the “virtual certainty” threshold of by building more evidence into the analysis, is possible and may still be considered as a relevant research goal.### 6.4 Alternative assumptions

The mathematical developments of Section 4 are but an illustration of how our proposed causal approach, as framed in Section 3, can be implemented when one uses the conventional assumptions of pattern scaling and gaussianity associated to the standard linear regression setting. In that sense, Section 4 thus shows that the proposed causal framing is perfectly compatible with the conventional linear regression setting: it should be viewed as an extension of, rather than an alternative to, the latter setting. Nevertheless, it is important to underline that the application of the causal framework of Section 3 is by no means restricted to the conventional linear regression setting. One may for instance challenge some aspects of the latter, e.g. the pattern scaling description of model error, and formulate an alternative parameterization of the covariance . This does not affect the relevance of the maximization of Equation (7), which can be implemented similarly.

### 6.5 Attribution as a classification problem

Lastly, it should be noted that the maximization of Equation (7) can be viewed as a binary classification problem. Indeed, as illustrated in Figure 5, solving Equation (7) is equivalent to building a function of observations which allows to optimally discriminate between two “classes”: the factual class and the counterfactual class. Under this perspective, PNS is related to the

of correct classification decisions made by the classifier and is thus a measure of its skill.

Viewing the fingerprinting index

as a classifier offers a fruitful methodological angle in our opinion. Indeed, classification is a classic and widespread problem in statistics and machine learning for which numerous solutions are readily available. For instance, under the restrictive assumptions of Section 4 — among which the Gaussian assumption — one obtains a linear classifier under a closed form expression, which is well known in Linear Discriminant Analysis

(McLachlan, 2004). But more recent developments have focused on the non Gaussian situations where a nonlinear classifier is more suitable, and can be formulated for instance using as diverse approaches as random forests, support vector machine or neural nets

(Alpaydin, 2010). Testing such approaches for the present attribution problem certainly offers promise. However, the difficulty to physically interpret such complex classifiers represent a challenge in such approaches.## 7 Summary and conclusion

We have introduced an approach for deriving the probability that a forcing has caused a given observed change. The proposed approach is anchored into causal counterfactual theory (Pearl, 2009) which has been introduced recently in the context of EA. We argued that these concepts are also relevant, and can be straightforwardly extended to the context of climate change attribution. For this purpose, and in agreement with the principle of fingerprinting applied in the conventional D&A framework, a trajectory of change is converted into an event occurrence defined by maximizing the causal evidence associated to the forcing under scrutiny. Other key assumptions used in the conventional D&A framework, in particular those related to numerical models error, can also be adapted conveniently to this approach. Our proposal thus allows to bridge the conventional framework with the standard causal theory, in an attempt to improve the quantification of causal probabilities. Our illustration suggested that our approach is prone to yield a higher estimate of the probability that anthropogenic forcings have caused the observed temperature change, thus supporting more assertive causal claims.

We gratefully acknowledge helpful comments by Aurélien Ribes and three anonymous reviewers. This work was supported by the French Agence Nationale de la Recherche grant DADA (AH, PN), and the grants LEFE-INSU-Multirisk, AMERISKA, A2C2, and Extremoscope (PN). The work of PN was completed during his visit at the IMAGE-NCAR group in Boulder, CO, USA.

[A]

## Derivation of the PDF of

To obtain Equation (12) from Equation (10) and (11), we integrate out :

(30) |

Given the quadratic dependence to of the two terms under the integral in the right hand side of Equation (30), it is clear that the PDF of the left hand side is also Gaussian. Thus, instead of computing the above integral, it is more convenient to derive the mean and variance of this PDF by applying the rule of total expectation and total variance:

(31) |

Next, in order to account for the sampling uncertainty on the estimation of

, we apply Bayes theorem to derive the PDF of

conditional on the ensemble . Denote the simulated responses in which are assumed to be i.i.d. according to a Gaussian with mean and covariance . We have:(32) |

where is the empirical mean of the ensemble, and we use the improper prior . The exact same approach yields .

To integrate out , we proceed by following the same reasoning as above for integrating out . Since the resulting PDF is clearly Gaussian, it suffices to derive its mean and variance:

(33) |

Likewise, to integrate out , we derive the total mean and total variance:

(34) |

with . Note that is no longer Gaussian after integrating out , because appears in the covariance of . However, for simplicity, we approximate it to be Gaussian.

The sampling uncertainty on the covariance matrix is addressed by using an approach described in Hannart et al. (2014) which main ideas are succinctly recalled here. The reader is referred to the publication for details and explicit calculations. In summary, we apply Bayes theorem in order to derive , as for and . However, we use this time an informative conjugate prior on , as opposed to an improper prior.

(35) |

where denotes the a priori mean of and is a scalar parameter that drives the a priori variance. Furthermore, the mean and variance parameters of this informative prior are estimated from by maximizing the marginal likelihood associated to this Bayesian model.

(36) |

where is the variate Gamma function and is the empirical covariance. The estimators satisfy to:

(37) |

where is a set of definite positive matrices chosen to introduce a regularization constraint on the covariance. Here we choose the set of definite positive diagonal matrices, and we derive an approximated solution to Equation (37) with and . Because the prior PDF is fitted on the data, this approach can be referred to as “empirical bayesian”. The “fitted” prior is then updated using the ensemble , and the obtained posterior has a closed form expression due to conjugacy:

(38) |

where and . We can then use the above posterior to integrate out in the PDF of , in order to obtain :

(39) |

The integral above does not have a closed form expression because the variance of is not proportional to . To address this issue, we approximate by . This assumption is conservative in the sense that it extends the sampling uncertainty on to even though is a constant. It yields a closed form expression of the above integral thanks to conjugacy:

(40) |

[B]

## Optimal index derivation

Let us solve the optimization problem of Equation (7) under the above assumptions. For simplicity, we restrict our search to so called “half-space” events which are defined by where is a linear index with a vector of dimension , and is a threshold. Let us consider PNS as a function of and .

(41) |

For simplicity, we will use an expression of in the treatment of the optimization problem which approximates by a Gaussian PDF, even though it is a Student PDF from the calculations of Section 4. Note that this approximation is made restrictively here for deriving an optimal index. Once this index is obtained, it is the then the true Student PDF of that will be used to derive the desired value of PNS. Therefore, the implication of this approximation is to yield an index which is suboptimal and thereby underestimates the maximized value .

(42) |

where is the standard Gaussian CDF. The first order condition in , , thus yields:

(43) |

Next, we introduce a third approximation to solve Equation (43), yielding:

(44) |

Then, the first order condition in , , yields:

(45) |

which proves Equation (21). Figure 5c illustrates this solution and also shows that the optimization problem of Equation (7) may be viewed as a classification problem. Our proposal to solve Equation (