Causal Modelling of Heavy-Tailed Variables and Confounders with Application to River Flow

10/13/2021
by   Olivier C. Pasche, et al.
EPFL
University of Geneva
UNIL
0

Confounding variables are a recurrent challenge for causal discovery and inference. In many situations, complex causal mechanisms only manifest themselves in extreme events, or take simpler forms in the extremes. Stimulated by data on extreme river flows and precipitation, we introduce a new causal discovery methodology for heavy-tailed variables that allows the use of a known potential confounder as a covariate and allows its effect to be almost entirely removed when the variables have comparable tails, and also decreases it sufficiently to enable correct causal inference when the confounder has a heavier tail. We introduce a new parametric estimator for the existing causal tail coefficient and a permutation test. Simulations show that the methods work well and the ideas are applied to the motivating dataset.

READ FULL TEXT VIEW PDF
POST COMMENT

Comments

There are no comments yet.

Authors

page 3

page 11

page 14

page 20

page 22

page 23

08/14/2019

Causal discovery in heavy-tailed models

Causal questions are omnipresent in many scientific problems. While much...
08/23/2021

Causal Analysis at Extreme Quantiles with Application to London Traffic Flow Data

Treatment effects on asymmetric and heavy tailed distributions are bette...
07/08/2019

Causal mechanism of extreme river discharges in the upper Danube basin network

Extreme hydrological events in the Danube river basin may severely impac...
07/18/2020

Autoregressive flow-based causal discovery and inference

We posit that autoregressive flow models are well-suited to performing a...
09/11/2021

Uncovering Main Causalities for Long-tailed Information Extraction

Information Extraction (IE) aims to extract structural information from ...
09/28/2020

Long-Tailed Classification by Keeping the Good and Removing the Bad Momentum Causal Effect

As the class size grows, maintaining a balanced dataset across many clas...
03/18/2020

Causal Simulation Experiments: Lessons from Bias Amplification

Recent theoretical work in causal inference has explored an important cl...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

The field of causal inference has developed massively in recent decades. Most of this literature concerns central quantities such as expectations, but certain causal mechanisms manifest themselves only in rare events and or may simplify in distribution tails. Standard methods of causal inference are ill-suited for such situations, and recent work has begun to link causality and extreme value theory. Examples are Gissibl.Kluppelberg:2018, who define recursive max-linear models on directed acyclic graphs, Kluppelberg21, who propose a scaling technique to determine the causal order of the variables in such graphs, kiriliouk

, who use multivariate generalized Pareto distributions to study probabilities of necessary and sufficient causation as defined in the counterfactual theory of Pearl, and

mhalla2018

, who construct a causal inference method for tail quantities relying on Kolmogorov complexity of extreme conditional quantiles. See surveys by

naveau2020 on extreme event attribution and by Engelke2020 on the detection and modeling of sparse patterns in extremes.

Our work stems from that of Gnecco2019, who propose an estimator of the causal tail coefficient and an algorithm that, under mild conditions, consistently retrieves a causal order on an underlying graph even in the presence of hidden confounders. Such an order helps to exclude some causal structures, but does not provide evidence for the existence of a specific structure, as in general a given order is causal for several possible graphs; in particular, all orders are causal for the empty graph corresponding to absence of causality. Although it is asymptotically invariant to hidden confounders, this estimator can suffer from confounding in finite samples when inference on the direct relationship between two variables is needed, when these effects are too strong or when the confounders have heavier tails than the two variables.

This paper addresses a central challenge in causal inference: the presence of confounders. It is often assumed that all the relevant variables are observed and can be included in the model, but typically this cannot be ensured. The available variables are often subject to external influences, observed or unobserved, that affect the variables of interest and can make it harder or even impossible to infer a correct causal relationship. Our goals are to mitigate the effect of a known confounder on an extremal causal analysis by treating it as a covariate, and to present a permutation test for direct causality between the two observed variables. Such a model enables causal discovery and inference for a greater variety of situations.

Our work was stimulated by average daily discharge data from 68 gauging stations along the Rhine and Aare catchments in Switzerland, see Figure 1. The data were collected by by the Swiss Federal Office for the Environment (hydrodaten.admin.ch), but were provided by the authors of Engelke2020, with some useful preliminary insights. We focus on the causal relationship between extreme discharges, for which precipitation is an obvious confounder, and use daily precipitation data from meteorological stations, provided by the Swiss Federal Office of Meteorology and Climatology, MeteoSwiss (gate.meteoswiss.ch/idaweb). Unlike in our simulation experiments, we know neither the true tail properties of the discharges and precipitation nor the effect of the confounder. We use precipitation as a covariate and show that it allows correct inference on the direct causal relationships between discharges for the majority of the station pairs, with at least estimated confidence, which was impossible without our proposed approach.

Figure 1: Topographic map of Switzerland showing the gauging stations (red dots) along the Rhine, the Aare and their tributaries. Water flows towards station . Adapted from Engelke2020.

The paper is organised as follows. Section 2 discusses the causal tail coefficient, its interpretation and its properties. Section 3 introduces a new parametric estimator for it based on generalized Pareto modelling of threshold excesses, which allows a known confounder to be used as a covariate. A simulation study in Section 4 underlines the strengths and limitations of the two estimators. Section 5 presents a permutation test intended to detect direct causality between two heavy-tailed variables, which is also assessed via simulation. Section 6 applies the methodology to the river discharges, and Section LABEL:s:conclu gives a brief discussion.

2 Causal tail coefficient and its estimation

We first give some basic notions needed to describe the setting in which causal relationships between random variables can be recovered.

Definition 2.1.

A linear structural causal model (LSCM) over a set of random variables satisfies

where is a set of nodes representing the corresponding random variables, is the set of parents of , is called the causal weight of node on node , and are jointly independent noise variables. We suppose that the associated graph , in which the directed edge belongs to if and only if , is a directed acyclic graph (DAG).

In a DAG , we say that is an ancestor of in , if there exists a directed path from to . The set of the ancestors of in is denoted by , and we define . In a LSCM over random variables , with associated DAG , we say that causes , if . We call a confounder (or common cause) of and if there exist directed paths from to and from to in that do not include and , respectively. We say that there is no causal link between and if . For any we let denote the sum of the products of the causal weights along the distinct directed paths from vertex to vertex ; we set and if .

Let and be random variables from a LSCM with respective distributions and . The causal (upper) tail coefficient of a random variable on another random variable is defined as (Gnecco2019)

(1)

if the limit exists. This coefficient lies between zero and one and captures the causal influence of on in their upper tails: if has a linear causal effect on , will be close to unity. The coefficient is asymmetric, as extremes of need not lead to extremes of , and in that case, will be appreciably smaller than . As only depends on the rescaled margins of the variables, it is invariant to monotone increasing marginal transformations.

If both tails are of interest, the causal tail coefficient can be generalized to capture the causal effects in both directions, by considering the symmetric causal tail coefficient of on , i.e.,

if the limit exists, where . As ,

The interpretation and properties of are similar to those of . The symmetric version captures the causal influence of on in both of their tails.

For simplicity we focus on in this paper, though all of our results and methods can be generalized to both tails by considering instead, if the assumptions for the upper tails are also satisfied in the lower tails of the variables considered.

Before stating the theorem that describes how the underlying causal relationships in a set of random variables can be recovered, we define the concept of regular variation.

Definition 2.2.

A positive measurable function is said to be regularly varying with index , written , if for all , . If , then is said to be slowly varying.

Definition 2.3.

The random variable is said to be regularly varying with index , if, for some , as .

Independent regularly varying random variables are said to have comparable upper tails if there exist , and such that, for each , as .

The following theorem describes how the causal relationships underlying a set of random variables can be recovered from their causal tail coefficients.

Theorem 2.4 (Gnecco2019, Gnecco2019).

Let be random variables from a LSCM, with associated directed acyclic graph and suppose that

  1. the coefficients of the linear structural causal relationship are strictly positive for all and , and

  2. the real-valued noise variables are independent and regularly varying with comparable upper tails.

Then the values of and allow one to distinguish between the different possible causal relationships between and summarized in Table 1.

causes
causes common cause only
no causal link
Table 1: Equivalence of the possible values of and with the underlying causal relationship between and .

Under the theorem’s assumptions, the blank entries in Table 1 cannot occur. Theorem 2.4 is generalizable to the variant of the coefficient and possibly negative values if the assumptions are also satisfied in the lower tails of the variables.

Gnecco2019 show that under the setup and assumptions of Theorem 2.4, the causal tail coefficient (1) for any distinct , and with , is

(2)

If are independent replicates of , with the random variables and from the LSCM, then the non-parametric estimator of is defined to be

(3)

for some , where denotes the indicator function, denotes the order statistic and

is the empirical cumulative distribution function of

, i.e.,

This estimator is the empirical counterpart to (1), as is a quantile of the corresponding empirical distribution. The value of controls the number of data pairs in the upper tail of that contribute to the estimator. Under the assumptions of Theorem 2.4 and a “very mild assumption that is satisfied by most univariate regularly varying distributions of interest”, estimator (3) is consistent as , for a choice of such that and (Gnecco2019).

A strength of the causal tail coefficient approach is its asymptotic robustness to hidden confounders. A frequent assumption in causality is that all the relevant variables are observed, but this is usually moot. Theorem 2.4 holds even when some variables in the underlying LSCM are unobserved. The capacity to deal with confounders both when studying the causal relationship between two variables and when retrieving a causal order is not generally shared by other approaches in causal inference, as argued by Gnecco2019, but this valuable property has the limitation that the unobserved variables must satisfy the regular variation assumption of Theorem 2.4, which cannot be checked in practice and may be unrealistic. In our motivating setting, for example, the tail of the precipitation variable, the confounder, may not behave like the tails of the river discharges. Moreover, it may be difficult to distinguish between causal cases using empirical estimates. In particular, an increase in the strength of the causal effect of a common confounder of and will increase , making it harder to tell whether a high value of indicates that or that . This is illustrated in the simulations in Section 4.

3 Parametric Tail Causality and Confounder Dependence

The causal tail coefficient (1) is useful when trying to understand a causal relationship between two variables, as it allows inference on the underlying extremal causal structure and has attractive asymptotic properties. However, the non-parametric estimator (3) is inappropriate when the underlying assumptions are not met or when a hidden confounder has a strong influence. This problem generally occurs when the confounder is different in nature from the variables of interest, as its tail is then rarely comparable to theirs. In particular the tail of the confounder may be heavier, giving it an even larger unwanted effect on the coefficients. Thus a useful modification would be to allow conditioning on the value of a known confounder in order to remove or at least reduce its effect on the estimated causal tail coefficient . In this section, we propose an approach to this through a peaks-over-threshold method. Another useful modification would be a reliable statistical test for direct causality. We propose and discuss such a test, based on a permutation resampling method, in Section 5.

3.1 Generalized Pareto Causal Tail Coefficient

We first model the tails of the variables using the generalized Pareto distribution (GPD) to model the excesses (IExtr, Chapter 4). This has scale and shape parameters that are typically estimated using maximum likelihood and that can depend on other variables (davison1990). For , and under mild conditions on , for a threshold large enough, the threshold excesses follow approximately a generalized Pareto distribution. That is,

(4)

with a scale parameter and a shape parameter that determines :

  • corresponds to light-tailed distributions, and then lies in the maximum domain of attraction of the Gumbel distribution;

  • corresponds to heavy-tailed distributions, and then lies in the maximum domain of attraction of the Fréchet distribution; and

  • corresponds to distributions with bounded upper tails, and then lies in the maximum domain of attraction of the (reverse) Weibull distribution.

Any random variable satisfying the assumptions of Theorem 2.4 satisfies (4), as a regularly varying random variable with index lies in the Fréchet maximum domain of attraction. If the threshold is chosen to be the quantile of for some , then we can write

and using the empirical distribution to estimate and maximum likelihood estimation using the excesses of to obtain and yields an estimator of the distribution function of , i.e.,

Using hybrid estimators for and for an integer yields the parametric GPD causal tail coefficient estimator for ,

(5)

where . Unlike with the non-parametric estimator (3), the number of data pairs used in (5) may not equal , as it depends on the fit of .

The GPD model can be reparametrized to allow dependence on time or on another variable of interest. More precisely, the model’s parameters can be rewritten in the form , where denotes either , or both, is an inverse link function,

is a vector of parameters and

contains the values of explanatory variables on which the model might depend (IExtr, Chapter 6).

We wish to reparametrise the model to reduce or remove the effect on of a potential confounder of and . If appears linearly in the LSCM then under the setup in Section 2 it is straightforward to show that affects the scale parameters of the GPD model that applies to and above high thresholds, and not their shapes.Thus we write

where is the replicate of corresponding to the observations of . This yields, for , the parametric -conditional linear generalized Pareto distribution (LGPD) causal tail coefficient estimator for ,

(6)

Estimation of , and is performed by maximum likelihood. In applications it is preferable to center and rescale the values of

to have unit variance and be centered around zero.

3.2 The positive linear scale issue

Linear modelling of the GPD scale parameter may not yield positive scale estimates for each and . The use of a nonlinear link function to ensure that the scale estimates were positive would not agree with the assumption of extremal linearity of the causal relationships, as the effect of on the scale is also necessarily linear. We now describe two different solutions to this problem, which we compare by simulation in Section 4.

The first solution, post-fit correction, replaces in (6) by for an arbitrarily small positive . The second solution, the constrained approach, applies the constraints

(7)

to the estimates when maximizing the likelihood. When the data have a known distribution, box constraints can replace the linear ones. For example, if , and have distributions, then and . Thus, if , then

(8)

where the lower and upper bounds are needed for positive and negative , respectively.

4 Simulation Study

Here we perform a simulation study using the Student

, Pareto and log-normal distributions. The first two lie in the Fréchet maximum domain of attraction and are regularly varying with index

. As the Pareto distribution exactly satisfies Definition 2.3, one might expect better behaviour with this than with the Student

distribution. The log-normal distribution lies in the maximum domain of attraction of the Gumbel distribution and is not regularly varying, but finite samples from it can appear to be heavy-tailed.

We focus on the behaviour of the causal tail coefficient estimators (3) and (6) between two variables and in their causal configurations, as shown in Figure 2. As we study the estimators of causal effects of both on and of on , we generated simulations only for the four causal cases, A, B, C and D. The LSCM causal weights , and were chosen to equal unity, by default, for each existing edge in all four cases. Hence, in D, is caused by and with equal strength, even though has a second effect on through .

Unless stated otherwise, each estimate is based on a random sample of triples , of which were chosen based on Gnecco2019, who found that the optimal fractional exponent of for choosing seems to lie between and . The factor doubles the number of data pairs used in the estimator, thus decreasing its variability, but does not introduce much bias for such large values of . One thousand independent replicate estimates were calculated for each of the four causal configurations and three distributions.

We present only the highlights of the study; the code and all the results are available from github.com/opasche/ExtremalCausalModelling.

Figure 2: The six possible causal configurations between and with a possible confounder , separated into the four cases studied in the simulations, and the two omitted by symmetry.

4.1 Variables with comparable tails

Detailed results for variables with comparable tails are presented in Section LABEL:sm:comparableTails of the Supplementary Material. In this case it is essentially always possible to infer the existence and direction of any causality between and , based on the non-parametric or -conditional LGPD estimators, (3) or (6), of and alone. When the causal effects of on and , i.e., and , are increased relative to the noise variance and any causal effect of on , both and increase in configuration B, and increases in configurations C and D. This increase is more marked with the non-parametric estimators of and , which are biased upwards in these configurations. When the confounder has a high causal impact, inference based on the non-parametric estimator (3) for direct causal link between and is sometimes impossible, as and their difference is close to zero in configurations B and D.

Use of the -conditional LGPD estimator (6) greatly reduces the effect of on the coefficient estimates in configurations B and D. For Pareto and log-normal data, the results are indistinguishable from those without the confounder, both in terms of location and variability, as if the effect of had been entirely removed. For the Student distribution, the estimates are also shifted to around the same values as in the corresponding confounder-free configurations, though their upper tails are marginally heavier. These few greater values remain appreciably lower than without as a covariate. For configurations A and C, unlike for B and D, the estimator is almost unaffected by the addition of as a covariate when it is not a confounder. This is also a useful property, as it could allow tests of whether a specific covariate is a confounder of two variables, based on changes to the estimated coefficient when it is added to the model.

4.2 Confounder with a different tail

One generalisation allows the tail of the distribution of to be heavier or lighter than those of and . A lighter tail does not negatively affect whether the non-parametric and -conditional LGPD estimators can infer a direct causal relationship between and , since the tails of and are then dominant. Figure 3 shows the sampling distributions of and for all four causal structures when the tail of is heavier than those of and . The true coefficient values are unknown, as assumption (b) of Theorem 2.4 is not satisfied, though the coefficient for comparable tails, (2), is shown for comparison.

Figure 3: Histograms of (turquoise) and (blue) for -distributed and , and -distributed (top four panels) and for -distributed and , and -distributed (bottom four panels). Half-lines indicate and for comparable tails. The panels for distributed and , and distributed are very similar to the lower four panels.

When has a heavier tail than and , the non-parametric estimators and in configuration B and in configuration D are shifted well towards unity. With an even heavier-tailed, Student , distribution for (not shown here), the Student results resemble those for the Pareto and log-normal distributions. In all these cases it becomes impossible to infer a direct causal relationship between and , due to the effect of the heavier confounder tail on the non-parametric estimators.

Figure 4 shows the sample distributions of and with post-fit correction, for all four causal configurations, when the tail of is heavier than those of and .

Figure 4: Histograms of (turquoise) and (blue) with post-fit correction for distributed and , and distributed (top four panels), for distributed and , and distributed (middle four panels), and distributed and , and distributed (lower four panels). Half-lines indicate and for comparable tails.

Figure 3 shows that in configurations B and D the non-parametric estimator is badly affected by the heavier tail of : all estimates lie very close to unity, rendering inference on the direct causal relationship between and impossible. In the -conditional LGPD context, the use of as a covariate solves this problem: the estimates shift towards the coefficient values in the corresponding confounder-free cases, and consistently yield positive values of the difference of estimates for configuration D and differences centred at zero for configuration B; see also Section LABEL:sm:comparableTails of the Supplementary Material. The estimates in configurations A and C, without the confounder causal effect, are barely changed by the addition of as a covariate.

Simulation results for and with the constrained fit are very similar to those for post-fit correction for the Pareto and log-normal distributions, but not for the Student distribution. Figure 5 shows the sample distribution of and with the constrained fit, for a heavier confounder tail.

Figure 5: Histograms of (turquoise) and (blue) with constrained fit for distributed and , and distributed . Half-lines indicate and for comparable tails.

For the Student distribution, the effect of the confounder on the estimator is appreciably less reduced for the constrained fit than for post-fit correction, compared to the non-parametric results. As the Student distribution is heavy in both tails, the lower inequality constraint in (7) forces () to have an appreciably smaller slope, explaining this reduced effect. In configurations with a confounder, the absolute values of the constained may be up ten times smaller than those for post-fit correction. With both approaches rarely differs greatly from zero for configurations without a confounder.

In the case of the Student distribution, both types of constraint yield very similar estimates; see github.com/opasche/ExtremalCausalModelling.

To summarize, the simulations show that both the non-parametric estimator (3) and the -conditional LGPD estimator (6) perform well when the theoretical assumptions are met and the influence of a hidden confounder is not too strong. When this influence grows, it becomes increasingly difficult to confidently infer the causal relationship between the variables using the non-parametric estimator, but the -conditional LGPD estimator allows us to detect this relationship by reducing the effect of the confounding.

5 Testing for Direct Causality

5.1 Permutation Test

In situations such as the causal analysis presented in Section 6, the distributions of the and estimators must be estimated to be used for inference. One way to obtain such distributions would be bootstrap resampling, but the extremal nature of the causal tail coefficient would require an unrealistically large sample size for its bootstrap distributions to be trustworthy, as these distributions tend to be too discrete in the extremes.

We therefore propose a permutation test (DavisonBootstrap, Chapter 4) for direct causality between two observed variables, measuring the asymmetry in their direct causal relationship. Suppose we have a sample

from a LSCM and wish to test the null hypothesis of no direct causal relationship between

and , , versus the alternative that causes , . Our proposed procedure is as follows:

  1. Rescale values (, ), where known confounders can be used in the distribution estimator , as for .

  2. For , obtain and by randomly permuting the indices for each pair ().

  3. Compute on and on .

  4. Obtain the Monte Carlo

    -value, by comparing the value of the test statistic on the original rescaled data with the permutation distribution,

If there are no asymmetric confounding effects on the two variables, then under , whereas under ; see Theorem 2.4. In the first case, the direct causal relationship is symmetric, i.e., is as likely to take extreme values when is extreme as is when is extreme. If so, then permutations such as those performed in step 2. are equally likely, have a common distribution centered around zero, and

will be uniformly distributed. Under the alternative, the direct causal relationship is “asymmetric”, as

is more likely to be extreme when is extreme than conversely; then is more likely to lie in the upper tail of . Thus the distribution of

will become increasingly skewed towards zero as the causal strength of

on increases.

If all asymmetric confounding effects are captured in , and have comparable tails and causal effects behave linearly in the extremes, then the proposed procedure should provide a reliable -value for testing direct causality of on .

5.2 Simulations

We used simulation from different data distributions and for different causal configurations involving and a potential confounder to assess our proposed test. We used values of for the causal strength of on , with confounding effects both present and absent. Symmetric () and asymmetric ( and , or and ) confounding effects were considered, and the noise variable were Pareto, Student and log-normal. We generated replicate samples of independent triples for each causal configuration and noise distribution. Three versions of the permutation test were performed for each sample, corresponding to the causal tail coefficient estimators discussed in Sections 2 and 3: non-parametric (3) and -conditional LGPD (6) with either post-fit correction or constrained fit. Each used permutations and the estimator hyper-parameters were set to and .

Figure 6 shows uniform QQ-plots of for the Pareto and Student distributions, in the case of heavier confounder tail, with symmetric effects. In the absence of confounding the test behaves as expected in both cases, and adding dependence on the independent variable in the modelling through the parametric estimators has no visible effect on the distribution of compared to the non-parametric approach. For the Pareto distribution, the test has power of almost for a direct causal strength of , and it behaves perfectly for higher causal strengths. For the Student distribution, the test reaches a power of for a direct causal strength of , of for causal strength of and of near 1.0 for a causal strength of .

Figure 6: Uniform QQ-plots of Monte Carlo -values , with Kolmogorov–Smirnov confidence bands for different causal strengths (colors), the three estimators (columns) and optional symmetric confounding effects, (rows). Top six panels: distributed and , and distributed . Bottom six panels: distributed and , and distributed ,

When the confounding effects are added, the test based on the non-parametric estimator fails for the Pareto distribution, as most of the then lie outside the confidence bands, indicating that the distribution of is highly non-uniform. This is corrected when the value of the confounder is taken into account using the parametric approaches, and power , for a direct causal strength of only one twentieth of the confounder’s marginal effects. In the Student case, seems to be close to uniformity in the absence of direct causality (the difference in tail shape is much greater in the Pareto case), but post-fit correction increases the power from below to above for a direct causal strength of one fifth of the confounder’s marginal effects. Similar conclusions to those of Section 4.2 about the constrained fit for distributions with both tails heavy apply, as the constrained fit estimator is not significantly better than the non-parametric estimator compared to post-fit correction.

Figure 7 shows the uniform QQ-plots with asymmetric confounding effects for the Pareto distribution with comparable tails.

Figure 7: QQ-plot of the estimates against the standard uniform distribution, with Kolmogorov–Smirnov confidence bands, for distributed and and , for different causal strengths (colors), the three estimators (columns) and optional asymmetric confounding effects, , (rows).

Unlike in the corresponding symmetric case, the test here fails when using the non-parametric estimator owing to the asymmetry induced by the confounder, but both parametric approaches remove this unwanted effect by enough that nearly has a uniform distribution, with almost perfect power, for a causal strength of one sixteenth and one twentieth of the marginal confounding effects.

6 Application to Swiss rivers

We now illustrate how our method can discover direct causal relationships between the discharge extremes of pairs of river stations. This illustrates our method on a real example for which we know the ‘ground truth’ of extremal causality, but unlike in the simulations of Section 4, we cannot control and do not know the true tail behaviour of the stations discharges and their potential confounders.

6.1 Data sources and additional collection

We use the average daily discharges of the Swiss gauging stations shown in Figure 1, and add daily precipitation data from meteorological stations. Some additional information, such as the stations’ elevation, their catchment surface area and mean elevation, their glaciation percent and their coordinates, was collected from the Federal Office for the Environment’s website. To reduce any seasonal effects due to unobserved confounders, we only consider data during June, July and August, as the more extreme observations happen during this period when mountain rivers are less likely to be frozen.

Figure 8 shows relationships between the estimates, station altitudes and average discharges. Although altitude does not greatly affect the estimates, the shape parameter estimates broadly decrease with increased average river discharge volume.

6.2 Choice of stations and comonotonicity

For the causal analysis, we consider pairs of stations with known direct causal relationships, and pairs with no direct causal relationship. Causal pairs are ordered by the flow of water, with one downstream of the other. The river volumes for the pairs should be as similar as possible, as our exploratory analysis indicated different tail behaviours for rivers with very different average discharges. There should also be enough confluences between the two stations, otherwise one would observe comonotonicity, i.e., almost perfect dependence, between their discharges. If there is comonotonicity between and , then , for all , and it is impossible to know which variable causes which based on the data alone, even if one is certain of direct causality. Confluences between the two stations reduce comonotonicity and make it possible to detect the direction of causality.

As we shall use precipitation as the confounding covariate, the stations must share likely meteorological effects and must lie in regions where precipitation data is available. Based on these criteria, we chose causal station pairs for the analysis: , , , , , , .

Figure 8:

Relation between shape parameter estimates, scale parameter estimates (log scale), station elevation and average discharge (log scale), with standard errors (

) shown as error bars.

The non-causal station pairs were selected to have similar average volume and similar shape parameter estimates. Both pairs with stations separated by long distances and pairs relatively close to each other were considered. The pairs selected are , , , , , , , , , , , , .

The choice of covariate for the causal pairs was the mean daily precipitation among the meteorological stations in the area and the catchment of the two stations. The choice of covariate was less meaningful for the non-causal pairs with large separating distances, as they have different meteorological conditions, so the average daily precipitation over the whole country was used. For the pair , which has the closest stations and local precipitation data available, the daily average in the local catchments was also considered. In the latter case, the pair will be highlighted with an asterisk to avoid confusion.

6.3 Causal Analysis Results

For each station pair, the permutation test for direct causality was performed using the non-parametric (3) and -conditional LGPD (6) estimators with post-fit correction or constraints, with permutations and estimator hyper-parameters and . Table 6.3 shows the values of , the covariate shape estimate and its estimated extremal linear effects for the two stations, the latter estimated without constraints. The number of common observations for the pairs varies from to , and lies between and . With precipitation covariates added, the number of common observations ranges from to , and lies between and .

Table 2: Permutation -values for station pairs using the non-parametric approach (NP), the -conditional post-fit corrected (PFC) and constrained fit (CF) LGPD approaches, and an -conditional exponential inverse-link GPD approach (Exp). The shape estimate for the precipitation covariate and the unconstrained scale slope estimates are also shown (with standard errors of at most for the former and in parentheses for the latter).