# Statistical inference for heavy tailed series with extremal independence

We consider stationary time series {X_j, j ∈ Z} whose finite dimensional distributions are regularly varying with extremal independence. We assume that for each h ≥ 1, conditionally on X_0 exceeding a threshold tending to infinity, the conditional distribution of X_h, suitably normalized, converges weakly to a non degenerate distribution. In this paper we consider the estimation of the normalization and of the limiting distribution.

## 1 Introduction

Let $\{X_j, j\in\mathbb{Z}\}$ be a strictly stationary univariate time series. We say that the time series is regularly varying if all its finite dimensional distributions are regularly varying, i.e. for each $h\ge0$, there exists a nonzero boundedly finite measure $\nu_h$ on $\mathbb{R}^{h+1}\setminus\{0\}$ such that

$$\frac{P(x^{-1}(X_0,\dots,X_h)\in\cdot)}{P(|X_0|>x)} \xrightarrow{v} \nu_h, \tag{1.1}$$

on $\mathbb{R}^{h+1}\setminus\{0\}$, as $x\to\infty$, where $\xrightarrow{v}$ means vague convergence. Following [Kal17], we say that a measure $\nu$ defined on a complete separable metric space $E$ (endowed with its Borel $\sigma$-field) is boundedly finite if $\nu(A)<\infty$ for all bounded Borel sets $A$, and a sequence of boundedly finite measures $\{\nu_n\}$ is said to converge vaguely to a measure $\nu$ if $\lim_{n\to\infty}\nu_n(f)=\nu(f)$ for all continuous functions $f$ with bounded support. See also [HL06], who use the terminology of $M_0$-convergence. Here the metric space considered is $\mathbb{R}^{h+1}\setminus\{0\}$ endowed with the metric

$$d_h(x,y) = |x-y| \vee \bigl|\,|x|^{-1}-|y|^{-1}\bigr|,$$

where $|\cdot|$ is an arbitrary norm on $\mathbb{R}^{h+1}$. This metric induces the usual topology and makes $\mathbb{R}^{h+1}\setminus\{0\}$ a complete separable space in which the bounded sets are the sets separated from zero. Moreover, $\mathbb{R}^{h+1}\setminus\{0\}$ is still locally compact, so this definition essentially yields the same notion as the classical vague convergence without the need for compactification at infinity.

This assumption implies that there exists $\alpha>0$ such that the measure $\nu_h$ is homogeneous of degree $-\alpha$ and the marginal distribution of $X_0$ is regularly varying and satisfies the balanced tail condition: there exists $p\in[0,1]$ such that

$$p = \lim_{x\to\infty}\frac{P(X_0>x)}{P(|X_0|>x)} = 1-\lim_{x\to\infty}\frac{P(X_0<-x)}{P(|X_0|>x)}.$$

Without loss of generality, we assume that $p>0$.

If $h\ge1$, there exist two fundamentally different cases: either the exponent measure $\nu_h$ is concentrated on the axes or it is not. The former case is referred to as extremal independence and the latter as extremal dependence. In other words, extremal independence means that no two components can be extremely large at the same time, and extremal dependence means that some pairs of components can be simultaneously extremely large.

In a time series context, we may want to assess the influence of an extreme event at time zero on future observations. If the finite dimensional distributions of the time series model under consideration are extremally independent, or more generally if the vector $(X_0,X_m,\dots,X_h)$ is extremally independent for some $1\le m\le h$, then, for any Borel set $A$ which is bounded away from zero in $\mathbb{R}^{h-m+1}$ and any $y_0>0$,

$$\lim_{x\to\infty}\frac{P(X_0>xy_0,\ (X_m,\dots,X_h)\in xA)}{P(|X_0|>x)} = 0. \tag{1.2}$$

Thus in case of extremal independence the exponent measure provides no information on (most) extreme events occurring after an extreme event at time 0.

In order to obtain a non degenerate limit in (1.2) and a finer analysis of the sequence of extreme values, it is necessary to change the normalization in (1.1), and possibly the space on which we will assume that vague convergence holds. One idea is to find a sequence of normalizations $b_j$, $j\ge1$, such that for each $h\ge1$, the conditional distribution of $(X_0/x, X_1/b_1(x),\dots,X_h/b_h(x))$ given $|X_0|>x$ has a non degenerate limit. Pursuing the direction opened by [HR07], [DR11], [LRR14] and [KS15], we will consider vague convergence on the set $(\mathbb{R}\setminus\{0\})\times\mathbb{R}^h$ endowed with the metric $d_h^0$ defined by

$$d_h^0(x,y) = |x-y| \vee |x_0^{-1}-y_0^{-1}|.$$

The bounded sets for this metric are those sets $A$ such that $x=(x_0,\dots,x_h)\in A$ implies $|x_0|>\epsilon$ for some $\epsilon>0$. Note that under the present definition of vague convergence, we avoid the pitfalls described in [DJe17].

###### Assumption 1.1.

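There exist scaling functions $b_j$, $j\ge1$, and nonzero measures $\mu_{0,h}$, $h\ge1$, boundedly finite on $(\mathbb{R}\setminus\{0\})\times\mathbb{R}^h$, such that

$$\frac{1}{P(|X_0|>x)}\,P\left(\left(\frac{X_0}{x},\frac{X_1}{b_1(x)},\dots,\frac{X_h}{b_h(x)}\right)\in\cdot\right) \xrightarrow{v} \mu_{0,h}, \tag{1.3}$$

on $(\mathbb{R}\setminus\{0\})\times\mathbb{R}^h$ and for every $y_0>0$, the measures $\mu_{0,h}((y_0,\infty)\times\cdot)$ and $\mu_{0,h}((-\infty,-y_0)\times\cdot)$ on $\mathbb{R}^h$ are not concentrated on a hyperplane.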

This assumption does not exclude regularly varying time series with extremal dependence, for which $b_h(x)=x$ for all $h\ge1$. But our interest will be in extremally independent time series, for which $b_h(x)=o(x)$ for all $h\ge1$. This assumption is fulfilled by many time series, such as stochastic volatility models with heavy tailed noise or heavy tailed volatility, exponential moving averages and certain Markov chains with regularly varying initial distribution and appropriate conditions on the transition kernel. See [KS15], [MR13] and [JD16].

An important consequence of Assumption 1.1 is that the functions $b_h$, $h\ge1$, are regularly varying (see [HR07, Proposition 1] and [KS15]). To put emphasis on the regular variation of the functions $b_h$, we recall the following definition of [KS15].

###### Definition 1.2 (Conditional scaling exponent).

Under Assumption 1.1, for $h\ge1$, we call the index $\kappa_h$ of regular variation of the function $b_h$ the (lag $h$) conditional scaling exponent.

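The exponents $\kappa_h$, $h\ge1$, reflect the influence of an extreme event at time zero on future lags. Even though we expect this influence to decrease with the lag in the case of extremal independence, these exponents are not necessarily monotone decreasing. The measures $\mu_{0,h}$ also have some important homogeneity properties: for all Borel sets $A_0,\dots,A_h$ and all $t>0$,

$$\mu_{0,h}\Bigl(tA_0\times\prod_{i=1}^{h}t^{\kappa_i}A_i\Bigr) = t^{-\alpha}\,\mu_{0,h}\Bigl(\prod_{i=0}^{h}A_i\Bigr). \tag{1.4}$$

Equivalently, for all bounded measurable functions $f$,

$$\int_{(\mathbb{R}\setminus\{0\})\times\mathbb{R}^h} f(t^{-1}x_0,t^{-\kappa_1}x_1,\dots,t^{-\kappa_h}x_h)\,\mu_{0,h}(\mathrm{d}x) = t^{-\alpha}\int_{(\mathbb{R}\setminus\{0\})\times\mathbb{R}^h} f(x)\,\mu_{0,h}(\mathrm{d}x). \tag{1.5}$$

Cf. [HR07, Proposition 1] and [KS15, Lemma 2.1]. Define the probability measure $\sigma_h$ on $\{-1,1\}\times\mathbb{R}^h$ by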

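$$\sigma_h(\{\epsilon\}\times A) = \int_{\epsilon u_0>1}\int_A \mu_{0,h}(\mathrm{d}u_0,u_0^{\kappa_1}\mathrm{d}u_1,\dots,u_0^{\kappa_h}\mathrm{d}u_h),$$

for $\epsilon\in\{-1,1\}$ and $A$ a Borel subset of $\mathbb{R}^h$. Let $\boldsymbol{W}_h=(W_0,W_1,\dots,W_h)$ be a $\{-1,1\}\times\mathbb{R}^h$ valued random vector with distribution $\sigma_h$. Then, for every Borel subsets , we have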

 (1.6)

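See [KS15, Section 2.4]. Let $Y_0$ be a Pareto random variable with tail index $\alpha$, independent of $\boldsymbol{W}_h$. Then, as $x\to\infty$,

$$\mathcal{L}\left(\left(\frac{X_0}{x},\frac{X_1}{b_1(x)},\dots,\frac{X_h}{b_h(x)}\right)\,\Big|\,|X_0|>x\right) \xrightarrow{d} Y_0\boldsymbol{W}_h. \tag{1.7}$$

In particular, we define for $h\ge1$ the distribution function $\Psi_h$ on $\mathbb{R}$:

$$\Psi_h(y) = P(Y_0W_h\le y) = \lim_{x\to\infty}P(X_h\le b_h(x)y \mid |X_0|>x), \tag{1.8}$$

for all $y\ne0$, since the distribution of $Y_0W_h$ is continuous at all points except possibly 0.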

The goal of this paper is to complement the investigation of this assumption started in [KS15] by providing valid statistical procedures to estimate the conditional scaling functions $b_h$, the conditional limiting distributions $\Psi_h$ and the scaling exponents $\kappa_h$.

## 2 Statistical inference

Let $F_0$ be the distribution function of $|X_0|$ and write $\bar F_0=1-F_0$. All our results will be proved under the following $\beta$-mixing assumptions.

###### Assumption 2.1.
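(A1) The sequence $\{X_j, j\in\mathbb{Z}\}$ is $\beta$-mixing with rate $\{\beta_n, n\ge1\}$.

(A2) There exist a non decreasing sequence $\{u_n\}$ and non decreasing sequences of integers $\{r_n\}$ and $\{l_n\}$ such that

$$\lim_{n\to\infty}l_n = \lim_{n\to\infty}r_n = \lim_{n\to\infty}u_n = \lim_{n\to\infty}\frac{r_n}{l_n} = \infty, \tag{2.1}$$

$$\lim_{n\to\infty}\frac{n}{r_n}\beta_{l_n} = 0, \tag{2.2}$$

$$\lim_{n\to\infty}u_n = \lim_{n\to\infty}n\bar F_0(u_n) = \infty, \qquad \lim_{n\to\infty}r_n\bar F_0(u_n) = 0. \tag{2.3}$$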

### 2.1 Non parametric estimation of the limiting conditional distribution

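In order to define an estimator of $\Psi_h$, we must first consider the infeasible statistic

$$\tilde I_{h,n}(s,y) = \frac{1}{n\bar F_0(u_n)}\sum_{j=1}^{n-h}\mathbf{1}\{|X_j|>u_ns,\ X_{j+h}\le b_h(u_n)y\}. \tag{2.4}$$

Then, Assumption 1.1 and the homogeneity property (1.5) imply that for all $s>0$ and $y\in\mathbb{R}$,

$$\lim_{n\to\infty}E[\tilde I_{h,n}(s,y)] = \lim_{n\to\infty}\frac{n-h}{n}\,\frac{P(|X_0|>u_ns,\ X_h\le b_h(u_n)y)}{\bar F_0(u_n)} = \mu_{0,h}\bigl((s,\infty)\times\mathbb{R}^{h-1}\times(-\infty,y]\bigr) = s^{-\alpha}\Psi_h(s^{-\kappa_h}y).$$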
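The last equality in the display above can be checked directly from the homogeneity property (1.5). The following computation is only a sketch: it assumes, as (1.3) and (1.8) suggest, that $\Psi_h(y)$ equals the $\mu_{0,h}$-measure of the set $\{x : |x_0|>1,\ x_h\le y\}$, and it applies (1.5) with $t=s$ to the indicator of $\{x : |x_0|>1,\ x_h\le s^{-\kappa_h}y\}$.

```latex
\begin{aligned}
\mu_{0,h}\bigl(\{x : |x_0|>s,\ x_h\le y\}\bigr)
  &= \int \mathbf{1}\{|s^{-1}x_0|>1,\ s^{-\kappa_h}x_h\le s^{-\kappa_h}y\}\,\mu_{0,h}(\mathrm{d}x) \\
  &= s^{-\alpha}\,\mu_{0,h}\bigl(\{x : |x_0|>1,\ x_h\le s^{-\kappa_h}y\}\bigr)
     \qquad \text{by (1.5) with } t=s \\
  &= s^{-\alpha}\,\Psi_h(s^{-\kappa_h}y).
\end{aligned}
```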

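We consider weak convergence of the processes $\tilde{\mathbb{I}}_{h,n}$ and $\mathbb{I}_{h,n}$ defined by

$$\tilde{\mathbb{I}}_{h,n}(s,y) = \sqrt{n\bar F_0(u_n)}\,\bigl\{\tilde I_{h,n}(s,y)-E[\tilde I_{h,n}(s,y)]\bigr\},$$

$$\mathbb{I}_{h,n}(s,y) = \sqrt{n\bar F_0(u_n)}\,\bigl\{\tilde I_{h,n}(s,y)-s^{-\alpha}\Psi_h(s^{-\kappa_h}y)\bigr\}.$$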
###### Assumption 2.2.

For all $s,t>0$,

$$\lim_{\ell\to\infty}\limsup_{n\to\infty}\frac{1}{\bar F_0(u_n)}\sum_{\ell<|j|\le r_n}P(|X_0|>u_ns,\ |X_j|>u_nt) = 0. \tag{2.5}$$
###### Assumption 2.3.

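There exists $s_0>0$ such that

$$\lim_{n\to\infty}\sqrt{n\bar F_0(u_n)}\,\sup_{s\ge s_0,\,y\in\mathbb{R}}\left|\frac{P(|X_0|>u_ns,\ X_h\le b_h(u_n)y)}{\bar F_0(u_n)}-s^{-\alpha}\Psi_h(s^{-\kappa_h}y)\right| = 0. \tag{2.6}$$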
###### Remark 2.4.

An assumption similar to (2.5) is unavoidable. Its purpose is to prove the convergence of the intrablock variance in the blocking method and tightness. The present one is taken from [KSW15]. Similar ones have been considered in [Roo09], [DR10] and [DSW15]. Some of these conditions have been checked directly for extremally dependent time series like GARCH(1,1) or ARMA models (see e.g. [Dre02]), or for Markov chains that satisfy a drift condition (cf. [KSW15]). This assumption will be checked in Section 3 for some specific models. Assumption 2.3 is unavoidable if one wants to remove bias. This will not be discussed in the paper. The condition holds for some sequences $\{u_n\}$.

Let $\mathbb{I}_h$ be the Gaussian process on with covariance , , . We note that

$$W(u) = \mathbb{I}_h(1,\Psi_h^{-1}(u)), \qquad u\in(0,1),$$

is a standard Brownian motion on $(0,1)$. The following theorem establishes weak convergence of the tail empirical process and forms the basis for statistical inference on $\Psi_h$. Its proof is given in Section 6.2.

###### Theorem 2.5.

Let $\{X_j, j\in\mathbb{Z}\}$ be a strictly stationary regularly varying sequence such that Assumption 1.1 holds with extremal independence at all lags. Assume moreover that Assumptions 2.1 and 2.2 hold and that the function $\Psi_h$ is continuous. Then the process $\tilde{\mathbb{I}}_{h,n}$ converges weakly to $\mathbb{I}_h$. If moreover Assumption 2.3 holds, then $\mathbb{I}_{h,n}$ converges weakly to $\mathbb{I}_h$.

We now need proxies to replace $u_n$ and $b_h(u_n)$, which are unknown, in order to obtain a feasible statistical procedure. As usual, $u_n$ will be replaced by an order statistic. To estimate the scaling functions we will exploit their representation in terms of a conditional mean. Therefore, we need additional conditions.

###### Assumption 2.6.

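There exist $\delta>0$ and $s_0>0$ such that

$$\lim_{n\to\infty}r_n\{n\bar F_0(u_n)\}^{-\delta/2} = 0, \tag{2.7}$$

$$\sup_{n\ge1}\frac{1}{\bar F_0(u_n)}E\left[\left|\frac{X_h}{b_h(u_n)}\right|^{2+\delta}\mathbf{1}\{X_0>s_0u_n\}\right] < \infty, \tag{2.8}$$

$$\lim_{\ell\to\infty}\limsup_{n\to\infty}\frac{1}{\bar F_0(u_n)}\sum_{\ell<|j|\le r_n}E\left[\frac{|X_h|}{b_h(u_n)}\,\frac{|X_{j+h}|}{b_h(u_n)}\,\mathbf{1}\{X_0>s_0u_n\}\mathbf{1}\{X_j>s_0u_n\}\right] = 0, \tag{2.9}$$

$$\lim_{n\to\infty}\sup_{s\ge s_0}\sqrt{n\bar F_0(u_n)}\left|\frac{E\bigl[\frac{|X_h|}{b_h(u_n)}\mathbf{1}\{|X_0|>u_ns\}\bigr]}{\bar F_0(u_n)}-E[|W_h|]\,s^{\kappa_h-\alpha}\right| = 0. \tag{2.10}$$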

Condition (2.8) requires and implies that the sequence is uniformly integrable conditionally on and therefore,

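$$\lim_{x\to\infty}E\bigl[b_h^{-i}(x)|X_h|^i \mid X_0>x\bigr] = \int_{-\infty}^{\infty}|y|^i\,\Psi_h(\mathrm{d}y) = E[Y_0^i]\,E[|W_h|^i] < \infty, \qquad i=1,2. \tag{2.11}$$

Since the function $b_h$ and the limiting distribution $\Psi_h$ are defined up to a scaling constant, we can and will assume without loss of generality that

$$E[|W_h|] = \int_{-\infty}^{\infty}|y|\,\Psi_h(\mathrm{d}y) = \int_1^{\infty}\int_{-\infty}^{\infty}|x_h|\,\mu_{0,h}(\mathrm{d}x).$$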

Condition (2.9) is again unavoidable and must be checked for specific models. Condition (2.10) is a bias condition which will not be further discussed.

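Set $k=n\bar F_0(u_n)$ and let $|X|_{(n:1)}\le\cdots\le|X|_{(n:n)}$ be the order statistics of $|X_1|,\dots,|X_n|$. Define an estimator of $b_h(u_n)$ by

$$\hat b_{h,n} = \frac{1}{k}\sum_{j=1}^{n-h}|X_{j+h}|\,\mathbf{1}\{|X_j|>|X|_{(n:n-k)}\}. \tag{2.12}$$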
###### Corollary 2.7.

Let the assumptions of Theorem 2.5 and Assumption 2.6 hold with extremal independence at all lags. Then

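$$\left(\mathbb{I}_{h,n},\ \sqrt{k}\left(\frac{|X|_{(n:n-k)}}{u_n}-1\right),\ \sqrt{k}\left(\frac{\hat b_{h,n}}{b_h(u_n)}-1\right)\right) \xrightarrow{w} \left(\mathbb{I}_h,\ \alpha^{-1}W(1),\ \int_0^1|\Psi_h^{-1}(u)|\,\mathrm{d}B(u)+\alpha^{-1}\kappa_hW(1)\right),$$

where $W$ is a standard Brownian motion and $B$ is a standard Brownian bridge on $[0,1]$.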

###### Remark 2.8.

The moment conditions in Assumption 2.6 may seem to be too restrictive. In fact, we can consider a family of estimators , where in (2.12) is replaced with with some . However, we do not pursue it in this paper.

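Define now the following estimator of $\Psi_h$:

$$\hat\Psi_{h,n}(y) = \frac{1}{k}\sum_{j=1}^{n-h}\mathbf{1}\{X_j>X_{(n:n-k)}\}\mathbf{1}\{X_{j+h}\le\hat b_{h,n}y\} = \tilde I_{h,n}\left(\frac{X_{(n:n-k)}}{u_n},\ \frac{\hat b_{h,n}}{b_h(u_n)}\,y\right). \tag{2.13}$$

The theory for this estimator is easily obtained by applying Corollary 2.7 and the $\delta$-method.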
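The two feasible estimators (2.12) and (2.13) can be computed together in a few lines. The Python sketch below is only an illustration: the function name `estimate_scale_and_cdf` and its arguments are hypothetical, ties among the order statistics are ignored, and exceedances are defined through $|X_j|$ as in (2.12), which coincides with (2.13) as printed when the observations are nonnegative.

```python
import numpy as np

def estimate_scale_and_cdf(x, h, k):
    """Sketch of the estimators (2.12) and (2.13).

    x : 1-d sample X_1, ..., X_n
    h : lag (h >= 1)
    k : number of upper order statistics (1 <= k < n)
    Returns b_hat and a function psi_hat(y).
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    if not (1 <= h < n and 1 <= k < n):
        raise ValueError("need 1 <= h < n and 1 <= k < n")

    abs_x = np.abs(x)
    threshold = np.sort(abs_x)[n - k - 1]   # |X|_(n:n-k)
    exceed = abs_x[: n - h] > threshold     # 1{|X_j| > threshold}, j = 1, ..., n-h
    lagged = x[h:]                          # X_{j+h}, j = 1, ..., n-h

    # (2.12): average of |X_{j+h}| over exceedance times, normalized by k.
    b_hat = np.sum(np.abs(lagged[exceed])) / k

    def psi_hat(y):
        # (2.13): empirical conditional c.d.f. of X_{j+h} / b_hat.
        y = np.atleast_1d(np.asarray(y, dtype=float))
        hits = lagged[exceed][None, :] <= b_hat * y[:, None]
        return hits.sum(axis=1) / k

    return b_hat, psi_hat
```

Because the sum in (2.13) runs over $j\le n-h$ while the normalization is $k$, `psi_hat` can stay slightly below 1 when some of the $k$ largest observations fall among the last $h$ time points.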

###### Corollary 2.9.

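Under the assumptions of Corollary 2.7 and if the function $\Psi_h$ is differentiable, $\sqrt{k}\,(\hat\Psi_{h,n}-\Psi_h)$ converges weakly to $\Lambda_h$, where the process $\Lambda_h$ is defined by

$$\Lambda_h(y) = B(\Psi_h(y)) + y\Psi_h'(y)\int_0^1|\Psi_h^{-1}(u)|\,\mathrm{d}B(u), \tag{2.14}$$

where $B$ is the standard Brownian bridge.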

###### Remark 2.10.

The additional term in the limiting distribution is due to the method of estimation of the conditional scaling function. Note that the limiting distribution depends only on $\Psi_h$ and therefore can be used for a Kolmogorov-Smirnov type goodness of fit test of the conditional distribution.

### 2.2 Estimation of the conditional scaling exponent

We now consider the estimation of the scaling exponent $\kappa_h$. We will use the following result.

###### Lemma 2.11.

Let Assumption 1.1 hold and assume moreover that

$$\lim_{\epsilon\to0}\limsup_{x\to\infty}\frac{P(|X_0X_h|>xb_h(x),\ |X_0|\le\epsilon x)}{P(|X_0|>x)} = 0. \tag{2.15}$$

Then $E\bigl[|W_h|^{\alpha/(1+\kappa_h)}\bigr]<\infty$ and

$$\lim_{x\to\infty}\frac{P(|X_0X_h|>xb_h(x)y)}{P(|X_0|>x)} = E\bigl[|W_h|^{\frac{\alpha}{1+\kappa_h}}\bigr]\,y^{-\frac{\alpha}{1+\kappa_h}}. \tag{2.16}$$

This is [KS15, Proposition 2], where the finiteness of $E\bigl[|W_h|^{\alpha/(1+\kappa_h)}\bigr]$ is assumed, but it is easily seen that this is actually a consequence of (2.15). For now this is all we need to state our results, but in Section 6.1 we will need to prove a generalized version of Lemma 2.11; see Lemma 6.4. It must be noted that Condition (2.15) does not hold for an i.i.d. sequence. See also Section 3.1.

If (2.15) holds, then the product $X_0X_h$ has tail index $\alpha/(1+\kappa_h)$. Hence, we can suggest the following estimation procedure for the scaling exponent $\kappa_h$.

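• Let $\gamma=1/\alpha$, where $\alpha$ is the tail index of the sequence $\{X_j\}$. Estimate $\gamma$ using the Hill estimator based on an intermediate sequence $k$, i.e.

$$\hat\gamma = \frac{1}{k}\sum_{j=1}^{k}\log\bigl(|X|_{(n:n-j+1)}/|X|_{(n:n-k)}\bigr).$$
• Let $\gamma_h=(1+\kappa_h)/\alpha$ be estimated by $\hat\gamma_h$, the Hill estimator of the tail index of $|X_0X_h|$, based on the sequence $V_j=|X_jX_{j+h}|$, $j=1,\dots,n$ (assuming without loss of generality that we have $n+h$ observations), and on the same intermediate sequence:

$$\hat\gamma_h = \frac{1}{k}\sum_{j=1}^{k}\log\bigl(V_{(n:n-j+1)}/V_{(n:n-k)}\bigr).$$
• Estimate $\kappa_h$ by

$$\hat\kappa_h = \hat\gamma_h/\hat\gamma-1. \tag{2.17}$$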
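The three steps above translate directly into code. The sketch below is illustrative only: the function names `hill` and `kappa_hat` are hypothetical, the sample is assumed to contain $n+h$ observations so that $n$ products are available, and no attempt is made to choose $k$ in a data-driven way.

```python
import numpy as np

def hill(values, k):
    """Hill estimator of gamma = 1 / (tail index) from the k largest values."""
    v = np.sort(np.asarray(values, dtype=float))
    n = v.size
    if not 1 <= k < n:
        raise ValueError("need 1 <= k < n")
    if v[n - k - 1] <= 0:
        raise ValueError("the (n-k)-th order statistic must be positive")
    # Mean log-excess of the k largest values over the (n-k)-th order statistic.
    return float(np.mean(np.log(v[n - k:] / v[n - k - 1])))

def kappa_hat(x, h, k):
    """Estimator (2.17) of the lag-h conditional scaling exponent."""
    x = np.asarray(x, dtype=float)
    n = x.size - h                       # n + h observations are assumed
    if h < 1 or n <= k:
        raise ValueError("need h >= 1 and len(x) - h > k")
    gamma = hill(np.abs(x[:n]), k)       # Hill estimator for |X_j|
    products = np.abs(x[:n] * x[h:])     # V_j = |X_j X_{j+h}|, j = 1, ..., n
    gamma_h = hill(products, k)          # Hill estimator for the products
    return gamma_h / gamma - 1.0
```

Both Hill estimators use the same $k$, as in the procedure above; plotting `kappa_hat(x, h, k)` against a range of values of `k` is the usual way to judge how stable the estimate is.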

Asymptotic normality of the Hill estimator for $\beta$-mixing sequences is well known. See e.g. [Dre00, Dre02]. The asymptotic normality of $\hat\kappa_h$ will follow from the delta method. To state the result, we need additional anti-clustering and second-order conditions.
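A heuristic first-order expansion shows how the delta method enters. The computation below is only a sketch, not a substitute for the proof: it assumes that both Hill estimators are $\sqrt{k}$-consistent and uses nothing beyond (2.17) and the identity $\gamma_h/\gamma=1+\kappa_h$.

```latex
\begin{aligned}
\hat\kappa_h-\kappa_h
  &= \frac{\hat\gamma_h}{\hat\gamma}-\frac{\gamma_h}{\gamma}
   = \frac{\gamma_h}{\gamma}\left(\frac{\hat\gamma_h/\gamma_h}{\hat\gamma/\gamma}-1\right) \\
  &= (1+\kappa_h)\left\{\left(\frac{\hat\gamma_h}{\gamma_h}-1\right)
     -\left(\frac{\hat\gamma}{\gamma}-1\right)\right\}+o_P(k^{-1/2}).
\end{aligned}
```

The fluctuations of $\hat\kappa_h$ are therefore driven by the difference of the relative errors of the two Hill estimators, scaled by $1+\kappa_h$.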

###### Assumption 2.12.

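For all $s,t>0$,

$$\lim_{\ell\to\infty}\limsup_{n\to\infty}\frac{1}{\bar F_0(u_n)}\sum_{\ell<|j|\le r_n}P\bigl(|X_0X_h|>u_nb_h(u_n)s,\ |X_jX_{j+h}|>u_nb_h(u_n)t\bigr) = 0. \tag{2.18}$$

Furthermore,

$$\lim_{\ell\to\infty}\limsup_{n\to\infty}\frac{1}{\bar F_0(u_n)}\sum_{j=\ell}^{r_n}E\bigl[\log_+(|X_0|/u_n)\log_+(|X_j|/u_n)\bigr] = 0, \tag{2.19}$$

$$\lim_{\ell\to\infty}\limsup_{n\to\infty}\frac{1}{\bar F_0(u_n)}\sum_{j=\ell}^{r_n}E\bigl[\log_+\bigl(|X_0X_h|/(u_nb_h(u_n))\bigr)\log_+\bigl(|X_jX_{j+h}|/(u_nb_h(u_n))\bigr)\bigr] = 0. \tag{2.20}$$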
###### Theorem 2.13.

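Let $\{X_j, j\in\mathbb{Z}\}$ be a strictly stationary regularly varying sequence such that Assumption 1.1 holds with extremal independence at all lags. Assume moreover that Assumptions 2.1, 2.2, 2.3 and 2.12 and the bound (2.15) hold and that $k$ is chosen in such a way that

$$\lim_{n\to\infty}\sqrt{k}\,\sup_{s\ge s_0}\left|\frac{P(|X_0X_h|>u_nb_h(u_n)s)}{\bar F_0(u_n)}-s^{-\alpha/(1+\kappa_h)}\right| = 0 \tag{2.21}$$

for some $s_0>0$. Then

$$\sqrt{k}\,(\hat\kappa_h-\kappa_h) \xrightarrow{d} N\left(0,\ (1+\kappa_h)^2\,E\left[\left|\,|W_h|^{\frac{\alpha}{1+\kappa_h}}-1\right|\right]\right).$$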

## 3 Examples

### 3.1 Stochastic volatility process

Consider the sequence $X_j=\exp(Y_j)\varepsilon_j$, $j\in\mathbb{Z}$, where $\{Y_j\}$ is a Gaussian process independent of the i.i.d. sequence $\{\varepsilon_j\}$, which is regularly varying with index $\alpha$. For simplicity we assume that the random variables $\varepsilon_j$ are nonnegative. We list the properties of $\{X_j\}$ (see [DM01], [KS11], [KS15]).

(i) The sequence $\{X_j\}$ is regularly varying with extremal independence. It satisfies Assumption 1.1 with $b_h\equiv1$, hence $\kappa_h=0$, for all $h\ge1$.

(ii) By Breiman’s lemma, $P(X_0>x)\sim E[\exp(\alpha Y_0)]\,P(\varepsilon_0>x)$ as $x\to\infty$.

(iii) By [Bra05, Theorem 5.2a),c)], if the spectral density of the Gaussian sequence is bounded away from zero and if with then ;

(iv) Conditioning on the sequence $\{Y_j\}$, the equivalence between the tails of $X_0$ and $\varepsilon_0$ and Potter’s bounds yield for ,

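$$\begin{aligned}
\frac{1}{\bar F_0(u_n)}\sum_{\ell<|j|\le r_n}P(X_0>u_ns,\ X_j>u_ns)
 &= \frac{\bar F_\varepsilon^2(u_n)}{\bar F_0(u_n)}\sum_{\ell<|j|\le r_n}E\left[\frac{P(\varepsilon_0>u_ns\exp(-Y_0)\mid Y)}{P(\varepsilon_0>u_n)}\,\frac{P(\varepsilon_0>u_ns\exp(-Y_j)\mid Y)}{P(\varepsilon_0>u_n)}\right] \\
 &= O(\bar F_\varepsilon(u_n))\sum_{\ell<|j|\le r_n}E\bigl[\exp((\alpha+\delta)(Y_0+Y_j))\vee1\bigr] = O(r_n\bar F_0(u_n)) = o(1),
\end{aligned}$$

as $n\to\infty$ if (2.3) holds.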

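(v) Fix . We again condition on the sequence $\{Y_j\}$ and apply Potter’s bounds:

$$\begin{aligned}
\frac{1}{\bar F_0(u_n)}\sum_{\ell<|j|\le r_n}E\bigl[|X_hX_{j+h}|\,\mathbf{1}\{X_0>u_ns\}\mathbf{1}\{X_j>u_ns\}\bigr]
 &= \frac{\bar F_\varepsilon^2(u_n)}{\bar F_0(u_n)}\,(E[|\varepsilon_0|])^2\sum_{\ell<|j|\le r_n}E\left[\exp(Y_h+Y_{j+h})\,\frac{P(\varepsilon_0>u_ns\exp(-Y_0)\mid Y)}{P(\varepsilon_0>u_n)}\,\frac{P(\varepsilon_0>u_ns\exp(-Y_j)\mid Y)}{P(\varepsilon_0>u_n)}\right] \\
 &= O(\bar F_\varepsilon(u_n))\sum_{\ell<|j|\le r_n}E\bigl[\exp(Y_h+Y_{j+h})\{\exp((\alpha+\delta)(Y_0+Y_j))\vee1\}\bigr] \\
 &= O(r_n\bar F_\varepsilon(u_n)) = o(1),
\end{aligned}$$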

whenever (2.3) holds and .

In summary, the results in Section 2.1 are applicable to the stochastic volatility model.

On the other hand, condition (2.15) does not hold and hence the method of estimating the conditional scaling exponent is not applicable here (note however that the exponent itself is zero).

### 3.2 Markov chains

As in [KSW15], assume that $\{X_j\}$ is a function of a stationary Markov chain $\{Y_j, j\in\mathbb{Z}\}$, defined on a probability space $(\Omega,\mathcal{F},P)$, with values in a measurable space $(E,\mathcal{E})$. That is, there exists a measurable real valued function $g:E\to\mathbb{R}$ such that $X_j=g(Y_j)$. Assume moreover that:

###### Assumption 3.1.
(i) The Markov chain $\{Y_j, j\in\mathbb{Z}\}$ is strictly stationary under $P$.

(ii) The sequence $\{X_j=g(Y_j), j\in\mathbb{Z}\}$ is regularly varying with tail index $\alpha$.

(iii) The sequence $\{X_j\}$ satisfies Assumption 1.1.

(iv) There exist a measurable function $V:E\to[1,\infty)$, $\gamma\in(0,1)$, and $b>0$ such that for all $y\in E$,

$$E[V(Y_1)\mid Y_0=y] \le \gamma V(y)+b. \tag{3.1}$$
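(v) There exists an integer $m\ge1$ and, for all $x>0$, there exist a probability measure $\nu$ on $(E,\mathcal{E})$ and $\epsilon>0$ such that, for all $y\in\{V\le x\}$ and all measurable sets $B$,

$$P(Y_m\in B\mid Y_0=y) \ge \epsilon\,\nu(B).$$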
(vi) There exist $q_0>0$ and a constant $c>0$ such that

$$|g|^{q_0} \le cV.$$
(vii) For every $s>0$,

$$\limsup_{n\to\infty}\frac{1}{b^{q_0}(u_n)\,\bar F(u_n)}\,E\bigl[V(Y_0)\mathbf{1}\{g(Y_0)>u_ns\}\bigr] < \infty, \tag{3.2}$$

where .

In [KSW15] we showed that the above assumptions (without (iii) and with in (3.2)) imply that $\{X_j\}$ is $\beta$-mixing with geometric rates and the conditions (2.2), (2.5) and (2.8)-(2.9) are satisfied. Following the calculations in [KSW15] we can argue that (2.8)-(2.9) hold with . Therefore, we conclude the following result.

###### Corollary 3.2.

Assume that Assumption 3.1 holds. Assume moreover that the conditions (2.1), (2.3), (2.6) are satisfied. Then the conclusion of Theorem 2.5 holds. If (2.10) is also satisfied, then the conclusion of Corollary 2.7 holds. If moreover $\Psi_h$ is differentiable, then the conclusion of Corollary 2.9 holds.

###### Example 3.3 (Exponential AR(1)).

Consider , , where and . Then the stationary solution has a regularly varying right tail and is tail equivalent to , cf. [MR13][KS15]. If , then . Hence, the drift function is . Condition (3.2) holds with .

## 4 Simulations

We simulated from the Exponential AR(1) model , , where , and are i.i.d. with exponential distribution and the parameter . Hence, , , .
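A simulation of this kind can be sketched as follows. Everything specific in the code is an assumption made for illustration: the model is taken in the form $X_j=\exp(Y_j)$ with $Y_j=\rho Y_{j-1}+\varepsilon_j$ and i.i.d. exponential innovations, and the function name `simulate_exp_ar1`, the burn-in length and the values of `rho` and `rate` in the usage line are hypothetical, not the settings used for the figures.

```python
import numpy as np

def simulate_exp_ar1(n, rho, rate, burn_in=1000, seed=0):
    """Simulate X_j = exp(Y_j), Y_j = rho * Y_{j-1} + eps_j (assumed form).

    eps_j are i.i.d. exponential with the given rate (mean 1 / rate).
    A burn-in period is discarded so that the output is close to stationary.
    """
    if not 0 < rho < 1:
        raise ValueError("need 0 < rho < 1 for a stationary solution")
    rng = np.random.default_rng(seed)
    total = n + burn_in
    eps = rng.exponential(scale=1.0 / rate, size=total)
    y = np.empty(total)
    y[0] = eps[0]
    for j in range(1, total):
        y[j] = rho * y[j - 1] + eps[j]   # AR(1) recursion on the log scale
    return np.exp(y[burn_in:])

# Hypothetical parameter values, chosen only to make the example run.
x = simulate_exp_ar1(n=2000, rho=0.5, rate=2.0, seed=1)
```

With exponential innovations of rate `rate`, $\exp(\varepsilon_0)$ is exactly Pareto with tail index `rate`, which is why this parametrisation is a natural candidate for a heavy tailed example.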

In Figure 1 we plot estimates of the tail index $\alpha$ of $X_0$ using the Hill estimator along with the confidence intervals:

$$\hat\alpha_k \pm 1.96\,\frac{1}{\sqrt{k}}\,\hat\alpha_k, \qquad k=10,\dots,500,$$

where $\hat\alpha_k$ is the reciprocal of the Hill estimator based on $k$ order statistics. On the same graph we plot the estimates of the tail index for products, along with the confidence intervals (left panel). On the right panels we display estimates of the scaling exponent $\kappa_1$ along with the confidence interval:

$$\hat\kappa_1(k) \pm 1.96\,\frac{1}{\sqrt{k}}\,(1+\hat\kappa_1(k))\times\sqrt{E\bigl[\,|W_1^{\alpha/(1+\kappa)}-1|\,\bigr]},$$

where $\hat\kappa_1(k)$ indicates that the estimator of the scaling exponent is based on $k$ order statistics. The factor