DeepAI

# Latent Model Extreme Value Index Estimation

We propose a novel strategy for multivariate extreme value index estimation. In applications such as finance, volatility and risk present in the components of a multivariate time series are often driven by the same underlying factors, such as the subprime crisis in the US. To estimate the latent risk, we apply a two-stage procedure. First, a set of independent latent series is estimated using a method of latent variable analysis. Then, univariate risk measures are estimated individually for the latent series to assess their contribution to the overall risk. As our main theoretical contribution, we derive conditions under which the effect of the first step to the asymptotic behavior of the risk estimators is negligible. Simulations demonstrate the theory under both i.i.d. and dependent data, and an application into financial data illustrates the usefulness of the method in extracting joint sources of risk in practice.

• 16 publications
• 5 publications
• 10 publications
• 9 publications
03/09/2021

### Portfolio risk allocation through Shapley value

We argue that using the Shapley value of cooperative game theory as the ...
11/27/2018

### Extracting conditionally heteroscedastic components using ICA

In the independent component model, the multivariate data is assumed to ...
04/20/2021

### GARCH-UGH: A bias-reduced approach for dynamic extreme Value-at-Risk estimation in financial time series

The Value-at-Risk (VaR) is a widely used instrument in financial risk ma...
12/06/2021

### Stable sums to infer high return levels of multivariate rainfall time series

We introduce the stable sums method for inference of extreme return leve...
04/22/2021

### Dynamic investment portfolio optimization using a Multivariate Merton Model with Correlated Jump Risk

In this paper, we are concerned with the optimization of a dynamic inves...
02/12/2018

### Latent variable time-varying network inference

In many applications of finance, biology and sociology, complex systems ...
08/13/2020

### Risk Measures Estimation Under Wasserstein Barycenter

Randomness in financial markets requires modern and robust multivariate ...

## 1 Introduction

Let be a sample of

-variate random vectors with possibly dependent distributions. For each observation, we assume the instantaneous latent variable model,

 xi=f(zi),i=1,…,n, (1)

where the latent -variate random vectors are assumed to have independent components in the sense that the th component of is independent of the th component of , for all and . Furthermore, we assume is a deterministic function that is smooth and bijective. Note that while no explicit noise term is present in (1), the general formulation still captures noisy latent models as well, as one or several of the latent components can represent noise which is then combined with the other components (signals) by the function in a desired manner (additively, multiplicatively etc.).

The model (1) can be considered as a very general form of independent component analysis and has applications in numerous fields such as in telecommunications, psychometrics, economics and finance (Comon and Jutten, 2010; Hyvärinen and Oja, 2000). The model provides a powerful alternative to standard multivariate modelling schemes as, after having estimated the latent vectors, the independence of their components implies that all subsequent modeling can be done univariately. This structural simplification leads to both smaller number of parameters to estimate and simplified interpretations for the components as no interactions between the series need to be acknowledged.

In this paper we focus on estimating the tail behaviour of the latent variables in the model (1), evaluated through the extreme value indices of the corresponding distributions (De Haan and Ferreira, 2007). This is a natural goal to pursue in many financial and signal processing applications as the heaviness of the tails of a distribution is an indicator of an unstable and risky signal. For example, the independent component model has been applied in cashflow analysis and prediction of financial time series data (Kiviluoto and Oja, 1998; Lu et al., 2009; Yang and Yi, 2005), and in this context assessing the tail behaviour of the obtained independent components could help identify common sources of financial risk. Similarly, the evaluation of the extreme behaviour of latent components could help identify the sources of abnormalities in applications such as biomedical imaging (Roberts, 2000) or maritime vessel track analysis (Smith et al., 2012).

This objective can be reached in two steps. First, we estimate a mapping such that equals the latent components up to order and scales (in latent component analysis, the order and scales of the latent components are usually neither of interest nor identifiable, see, e.g., Tong et al. (1991)). Especially under linear , numerous techniques for obtaining consistent estimators under various types of data exist, see Section 4 for examples. Second, after having obtained the sample estimates of the latent vectors, we use one of the several univariate extreme value index estimators presented in the literature (De Haan and Ferreira, 2007) to assess the extreme behavior of the individual, now independent, components.

Note that, in contrast to the above, the standard approach in multivariate extreme value theory is to assess the extreme behaviour component-wise for the observed multivariate signal itself (De Haan and Ferreira, 2007). Approaches which in some way acknowledge the multivariate structure of the data have been proposed only recently, and they include considering convex combinations of the component-wise estimators (Dematteo and Clémençon, 2016; Kim and Lee, 2017), extreme risk region estimation (Cai et al., 2011), and estimating the extreme value index of the generating variate of an underlying elliptical model (Dominicy et al., 2017; Heikkilä et al., 2019). However, these methods either involve complicated estimation or require strict distributional assumptions, making them less than ideal in practice. In comparison, our proposed two-step procedure is straightforward to apply and takes the multivariate form of the data into account in a natural way. Moreover, the associated latent variable model is flexible, allowing different tail behaviors for the underlying independent components. The only structural assumption we make is that the observed variables are generated by a set of independent factors.

### 1.1 Scope and structure of the paper

Of the two steps of our proposed method, we are primarily interested in the latter. That is, we focus on assessing the extreme behavior of the individual components in the independent component model (1). Throughout the article, we assume that there exists an estimator with the asymptotic linearization

 ^zi:=^f−1(xi)=zi+^Hzi+^r, (2)

where the -matrix and the -vector for some rate . Here

denotes the element-wise “convergence rate” in probability. For a precise definition, see Section

3. The form (2) is very general and encompasses many popular estimators in the independent component analysis and blind source separation literature, see Section 4 for examples. Assuming for now that an estimator exists in the sense of (2), our main objective is to estimate the extreme value indices of the components of the latent variables using as a proxy for , and to show that this approximation incurs no loss in asymptotic efficiency under a suitable set of assumptions.

A further complicating factor is that latent variable models such as (1) are well-known for not having fixed signs or scales for the latent components. That is, the vector on the right-hand side of (2) actually corresponds in many models to the true latent vectors only up to the signs and scales of its components. In the standard usage of latent variable modelling this is most often acceptable, as our interest lies commonly not in the signs, but in the shapes of the distributions of the latent variables. Similarly, in the present context, the scale of the components is irrelevant as most commonly applied extreme value index estimators are scale-invariant. However, as risk is estimated from the tails of the components, knowing in which of the tails we are in is for our purposes of paramount importance, and we need a way of identifying the correct tail. A simple, but restrictive, solution would be to require that all the latent components have symmetric distributions. Instead, we choose to assess the extreme behaviour of, not the latent components, but their absolute values. This rids us of the sign indeterminacy by “stacking” the two tails on top of each other. Since the absolute value inherits its tail behaviour from the heavier of the two tails, this approach has the interpretation of us always looking at the heavier of the two tails. Moreover, as heavier tails correspond to larger risk, the use of absolute values can be seen as a conservative approach to tail behaviour estimation.

The rest of the paper is organized as follows: Preliminaries on extreme value theory along with the popular extreme value index estimators, the Hill estimator and the moment estimator, are reviewed in Section 2. These extreme value index estimators are known to be consistent and asymptotically normal under mild technical conditions. In Section 3, we derive sufficient conditions ensuring that the asymptotic properties of the extreme value estimators are preserved when estimated using the proxy sample . In Section 4, we consider two example cases of the general framework and discuss the particular assumptions needed to achieve the limiting results for the corresponding proxy samples. In Section 5, we present a large simulation study and a real data application is considered in Section 6. All the proofs are postponed to the supplementary appendix, along with a supplementary simulation study and additional details concerning the real data example.

## 2 Preliminaries on extreme value theory

In the following we provide a brief introduction to the topics in univariate extreme value theory that are most relevant to our objectives. See De Haan and Ferreira (2007) and the references therein for more information.

Consider an i.i.d. random sample from a univariate distribution and the sample maximum . If there exists sequences of constants and such that has a limiting distribution , we say that is the extreme value distribution of . One of the fundamental results in extreme value theory is the Fisher-Tippett-Gnedenko theorem which identifies the class of distributions .

###### Theorem 1 (Fisher-Tippett-Gnedenko).

The class of extreme value distributions is with and , where

 Gγ(x)=exp(−(1+γx)−1/γ),1+γx>0,

with and where for the right-hand side is interpreted as .

According to Theorem 1, the family of possible extreme value distributions has a remarkably simple form, parametrized by a single real number . If is the extreme value distribution of , the distribution is said to be in the domain of attraction of , and we write . The parameter is said to be the extreme value index of . The parameter measures the thickness of the (right) tail of and knowing its value leads to a complete characterization of the asymptotic tail behavior of , allowing extrapolating probabilities beyond the observed dataset. Thus is a key ingredient in risk assessment.

It is widely accepted that distributions are divided into heavy and light tailed ones based on the sign of . More precisely, for , the distributions are called heavy tailed and belonging to the domain of attraction of the Frechet distribution. Similarly, if and , then we say that is light tailed and belongs to the domain of attraction of the Weibull distribution. Finally, if , then belongs to the domain of attraction of the Gumbel

distribution. This corresponds to the border case between light and heavy tails, and includes, e.g., the case of a normal distribution.

One of the most commonly applied classical estimators of the extreme value index, suitable for , is the Hill estimator introduced in Hill (1975),

 ^γH(y)=1knkn−1∑m=0log(y)(n−m,n)(y)(n−kn,n),

where are the order statistics of the sample y, and is a sequence of thresholds for the portion of observations that are considered to form the tail. Common choices for the threshold include, e.g., and .

Another well-known estimator, which in turn is valid for any value of , is the moment estimator introduced in Dekkers et al. (1989). As in the Hill estimator, set

 M(j)n(y)=1knkn−1∑m=0(log(y)(n−m,n)(y)(n−kn,n))j.

Here is given, and the Hill estimator corresponds to the choice . The moment estimator is based on the choices and is given by

 ^γM(y)=M(1)n(y)+1−12⎛⎝1−[M(1)n(y)]2M(2)n(y)⎞⎠−1.

In the next section, both the Hill estimator and the moment estimator are used to estimate the extreme value indices of the absolute values of the latent components in (1).

## 3 Extreme value index estimation for latent variables

Recall from Section 1 that we consider an estimated sample of the latent vectors satisfying

 ^zi=zi+^Hzi+^r, (3)

where the -matrix , and the -vector for some rate . Here, and throughout the paper, the notation

is used to denote that the family of random variables

is uniformly tight. Similarly, we use other Landau notation, such as to indicate convergence towards zero. With , we denote convergence in probability, and with , we indicate weak convergence, i.e., convergence in distribution.

Recall further, that the idea underlying the model (3) is that the vector is an estimate of obtained by solving some latent variable model. However, for the following results to hold, simply having the form (3) is sufficient, regardless of how it originated.

A common assumption in extreme value literature as well as in the latent variable literature is to assume that each component of the true non-observable signals is strictly stationary, and has a univariate marginal , i.e., each observation has marginal distribution , for all . One typical example is the case where observations are i.i.d., with components drawn from different distributions. Another typical example is the case where the components form different stationary series with marginals . However, while our main examples arise from stationary series falling into the above setting, our main results do not even require stationarity of the components (although it might be difficult to interpret the estimated extreme value index if the one dimensional marginals are not equal).

It is also customary in the field of extreme value theory to assume that marginals do not have point-mass at zero. In our case, this ensures that our logarithm-based estimators are well-defined. That is, in the general non-stationary case, we assume that, for each component , we have

 limδ→0infi≥1P(|zki|≥δ)=1. (4)

In the sequel, (4) is always assumed, even if it is not explicitly stated. Note that (4) is a natural assumption and not very restrictive. First of all, (4) implies that for all and . Moreover, in the case of equal marginals, (4) is equivalent to . In the general case, (4) excludes also the situations where the observations come from a sequence of distributions that approach a distribution having point mass at zero.

In the sequel, the notation refers to the sample of the absolute values of the th latent series and denotes the th largest element of .

Throughout the article, we make the following assumption.

###### Assumption 1.

For all , there exists deterministic sequences for which the th component of satisfies

 |zk|(n,n)−bnkank=Op(1).

We stress that Assumption 1 is very relaxed, and in the extreme value theory literature it is usually taken as granted, without explicitly stating it. Indeed, if the observations are independent with a distribution function , then Assumption 1 follows immediately whenever , i.e., is in the domain of attraction of some extreme value distribution . Thus, in the case of independent observations, discussing the extreme value index without Assumption 1 is not sensible. More generally, Assumption 1 follows immediately whenever converges towards some distribution. For example, Assumption 1 is trivially valid even in the totally degenerate case , for all .

The main contribution of this article is the derivation of sufficient conditions under which the asymptotic properties of the Hill and moment estimators are preserved under Model (3). Intuitively, one would expect that these asymptotic properties remain the same, provided that vanishes rapidly enough to compensate the growth of the sample maximum of the heaviest component. Theorem 2 and Theorem 3

below contain the precise statements of this heuristic argument. In the sequel, we use the notation

.

###### Theorem 2.

Let Assumption 1 hold and assume that,

 maxℓ{gnℓ}cn=o(1). (5)

Let be fixed and let and be arbitrary constants.

• If , then .

• If , and , then .

Note that in the above result it is not required that and are the correct extreme value indices — any constants suffice. Indeed, the part i) of Theorem 2 simply states that whenever the Hill estimator based on the ”true” latent signals converges towards some constant, then the Hill estimator based on the estimated latent signals converges towards the same constant. The reason behind our formulation is that usually, as is the case for independent observations, the Hill estimator converges towards , see De Haan and Ferreira (2007), pp. 101. In other words, the Hill estimator vanishes for distributions that are not heavy tailed. For such distributions, one can then apply the moment estimator. Part ii) of Theorem 2 says that whenever both, the Hill estimator and the moment estimator based on the true latent signals , converge towards any constants, then the Hill and the moment estimator based on the estimated latent signals converge towards the same constants. As in most cases the Hill estimator converges towards , and does not explode, part ii) of Theorem 2 implies that asymptotic properties of the moment estimator are inherited to the estimated model as well. The extra condition in part ii) concerns the case when the Hill estimator converges towards zero, , and ensures that this convergence is not too rapid in comparison to the growth of the heaviest tail. In many cases of interest, the convergence rate of the Hill estimator is . This leads to the same condition as in Theorem 3, and can be achieved by a suitable choice of . Finally, we stress that an examination of the proof of Theorem 2 reveals that the item ii) is valid as long as the Hill estimator does not tend to infinity. Thus one can safely apply the moment estimator for light tailed distributions under Model (3).

In order to gain better understanding on the behavior of the estimators, we next consider their limiting distributions.

###### Theorem 3.

Let Assumption 1 hold and assume that,

 √knmaxℓ{gnℓ}cn=o(1). (6)

Let be fixed and let , and be arbitrary constants.

• If then

 √kn(^γH(|^zk|)−CH) ⇝N(μH,σ2H).
• If , , and then

 √kn(^γM(|^zk|)−CM) ⇝N(μM,σ2M).

In the above result, the constants , and can be computed explicitly in most cases, their exact values depending on the so-called second order conditions. For details, we refer to De Haan and Ferreira (2007). We also remark that in our proof, we could easily replace the convergence rate with some other rate, or the limiting normal distribution with some other distribution. The underlying reason for the above formulation is that we are not aware of any asymptotic results for extreme value index estimators where the rate is other than or where the limiting distribution is not normal.

We end this section by discussing the strictness of the key conditions and . These conditions state that the convergence rate of the estimated latent sample to the true latent sample must be sufficiently fast compared both to , the square root of the tail threshold, and to , the heaviness of the heaviest of the latent components. Moreover, the rate can be seen as a type of a tuning parameter. Choosing a faster growing will make the Hill estimator converge more rapidly, but it will, at the same time, limit the range of distributions whose extreme value indices we can estimate in the first place, and vice versa.

To shed further light on these conditions, we consider an example. Assume that there exists at least one latent component belonging to the domain of attraction of the Fréchet distribution, i.e., for some (cf. Lemma 1 in the supplementary Appendix A). Now, letting for some and under the standard rate , we end up with the restriction . Hence, putting the tail threshold sufficiently small, we see that extreme value index estimation is feasible as long as the heaviest Fréchet component among the latent variables has its extreme value index smaller than

, that is, all latent components have finite variance. Heavier components, i.e., ones without second moments, can be captured through estimators which yield faster convergence rates

than the usual for the model estimation. Conversely, if the convergence rate is slower than the usual (see, e.g., Lietzén et al. (2020)), then might not be sufficient to compensate too heavy tails, and, e.g., assumptions on the existence of higher moments are required.

## 4 Example models

In this section, we illustrate the applicability of our main results by considering two popular example models: stationary independent component model and stationary second order source separation model. For simplicity, we only consider the Hill estimator, although the following analysis could be easily extended for the moment estimator as well (see Remark 1). Throughout, we assume that the heaviest component has index , often implying that see the examples below. As the rate is the best possible that one can usually expect, we also assume . This ensures the square integrability of all of our random variables, which is also a minimum requirement for (3) to hold for the standard estimation procedures in our example models.

Let now be fixed. We illustrate our results in cases where both Theorem 2 and Theorem 3 are applicable. Thus, in order to obtain limiting normality for the Hill estimator based on the true values , we impose a second order condition for the marginal distribution of . The distribution is called second order regularly varying (with index ) if there exists a positive or negative function with the property such that, for all ,

 limt→∞U(tx)U(t)−xγA(t)=xγxρ−1ρ, (7)

holds for some real number . Here the function is given by

 U=(11−F)←,

where denotes the left-continuous (pseudo-)inverse function. Then, in the case of independent observations, the limiting normality,

 √kn(^γH(|zk|)−γ)⇝N(λ1−ρ,σ2), (8)

holds, provided that . This leads to an upper bound on the rate at which can grow. Similarly, conditions of Theorems 2 and 3 give upper bounds for the rate at which can grow. Thus, we can obtain limiting normality (and consistency) by choosing a not-too-rapidly growing sequence , at the cost of a slower rate of convergence. For details on the limiting normality of the Hill estimator in the case of i.i.d. observations, see De Haan and Ferreira (2007), and in the case of stationary dependent observations, see De Haan et al. (2016) and the references therein.

### 4.1 Independent component model

In independent component analysis (ICA) the observed -vectors are assumed to be a random sample from the independent component (IC) model,

 x=Ωz+μ, (9)

where the latent -vector z has independent components, is invertible and is a location parameter (Hyvärinen and Oja, 2000) . The objective in ICA is find an unmixing matrix , such that has independent components. Standard theory then shows that if at most one of the ICs is Gaussian, any such solution coincides with z up to scaling, order and signs of the components. The scales can be fixed by second order standardisation of z. This guarantees that all the solutions are of the form where is a permutation matrix and is a sign-change matrix (diagonal matrix with diagonal entries equal to ). In our approach in assessing extreme behaviour, the sign ambiguity is of no concern, as we consider the absolute values of the source components. Moreover, the order of the components is irrelevant, if one is interested in modelling the tail index of the component with the highest risk.

Numerous estimators of the unmixing matrix have been proposed and under suitable assumptions and standardisations, most estimators converge to at some rate ,

 cn(^ΓΩ−% Ip)=Op(1). (10)

Typically, in the context of i.i.d. observations, we have . See Miettinen et al. (2015) for several examples including FastICA (Hyvärinen, 1999), fourth order blind identification (FOBI) (Cardoso, 1989), and joint approximate diagonalization of eigenmatrices (JADE) (Cardoso and Souloumiac, 1993). Also the ICA-estimators based on the simultaneous diagonalization of two symmetrized scatter matrices (Oja et al., 2006; Nordhausen et al., 2008) can be shown to have the rate , assuming that the applied symmetrized scatter matrices have the same convergence rate.

Assuming that is of the form (10), the estimated latent vectors can be written as

 ^zi=^Γ(xi−¯x)=^ΓΩ(z% i−¯z)=zi+(^ΓΩ−Ip)zi−^ΓΩ¯z.

Writing now and , we observe that we have arrived to the form (3). By the assumption that with , we observe that (5) is automatically valid, and (6) is valid for suitably chosen sequence . We stress that guarantees the existence of second moments, while usually standard ICA methods operate on higher-order information making even stronger moment assumptions. For example, the -consistency for FOBI requires the existence of finite eighth moments of the latent variables (Ilmonen et al., 2010). By using squared FastICA with the hyperbolic tangent (Miettinen et al., 2017) as the ICA-estimator, one can reduce the order of the required moments to four. Finally, we stress that under independent observations drawn from a second order regular varying heavy tailed distribution, the Hill estimator is consistent and asymptotically normal (see De Haan and Ferreira (2007)), and while different technical assumptions are required for the classical ICA-estimators, none of them interfere with our assumption that guarantees the consistency and limiting normality of the Hill estimator. Thus, as a conclusion, we can safely apply Theorem 2 and Theorem 3.

###### Remark 1.

In the above discussions we have considered only the Hill estimator. However, applying Theorem 2 or Theorem 3 for the moment estimator in the ICA-context is straightforward. Indeed, under second order regularly varying tails and independence, the Hill estimator always converges to , and the moment estimator is both consistent and asymptotically normal. Thus it suffices to check the extra condition . However, even if we have , and thus it suffices to choose the sequence such that .

### 4.2 Second order source separation model

Our second example moves to the realm of signal processing and blind source separation (BSS). Like the IC model, also the second order BSS model is linear and based on the general location-scatter model. In the model, the observed run of a stationary -variate time series is assumed to have the instantaneous latent representation,

 xi=Ωzi+μ,i∈{1,…,n}, (11)

where the latent -variate time series is stationary and has standardized uncorrelated components, and is of full rank. The location is (by stationarity) trivial to estimate by using a standard average estimator, which provides a consistent estimator if the system is ergodic. Thus, for the sake of simplicity, it will be omitted in the following. Note also that the non-identifiability of signs and order holds in the BSS model as well. However, for our purposes this does not matter due to the reasons explained in Subsection 4.1.

One standard approach to estimate is algorithm for multiple unknown signals extraction (AMUSE) (Tong et al., 1990) where the autocovariance matrices for are diagonalised simultaneously. An extension of AMUSE that is less sensitive to the choice of is the second order blind identification (SOBI) (Belouchrani et al., 1997) algorithm where the autocovariance matrices over a chosen set of lags are jointly diagonalized. As in the IC-model, one would expect that the algorithm provides a consistent estimator with some rate :

 cn(^ΓΩ−Ip)=Op(1). (12)

It turns out that (12) holds true whenever

 cn(^Στ(xi)−Στ(xi))=Op(1),τ∈T, (13)

where denotes the estimator of the autocovariance matrix . The fact that (13) implies (12) is proved in the case of complex valued AMUSE in Lietzén et al. (2020) with general rate , and in the case of real valued SOBI (and its variants) in Miettinen et al. (2016) for the rate . It is also straightforward to check that the arguments of Miettinen et al. (2016) apply with arbitrary rate function . For examples with general rate instead of the standard , we refer to Lietzén et al. (2020).

Equation (13) is the first key assumption on the rate of convergence for the autocovariance estimators, which on the other hand gives us our speed . If is non-standard, (5) gives us also the restriction , limiting the possible values of . This can be seen as an interchange between moment assumptions and the speed at which the autocovariance estimators converge, as higher moments are required if the estimators converge slowly.

In order to make Theorem 3 applicable, we also require that the Hill estimator satisfies limiting normality (8). Compared to independent observations, the problem is much more subtle in the case of dependent sequences and one needs to pose extra assumptions in addition to the second order regularly varying condition (7). The extra assumptions are, roughly speaking, conditions that ensure the dependence to be weak enough so that the series ”behaves” similarly as a series of independent observations. The precise definition of weak dependence or asymptotic independence varies in the literature. Usually asymptotic independence is encoded to mixing-conditions (for different notions of mixing-conditions and their relations, see the survey in Bradley (2005)). It is known (see, e.g., Drees (2000, 2003)) that (8) holds provided that forms a -mixing stationary sequence such that some minor additional regularity conditions are met (see, e.g., conditions (a)-(c) of De Haan et al. (2016)). In particular, all these conditions are satisfied for the following sequences:

• -dependent process and AR(1)-process (Drees, 2003; Rootzén, 1995, 2009),

• AR(p)-processes and MA()-processes (with suitable assumptions on the coefficients) (Drees, 2002; Resnick and Stărică, 1997),

• MA(q)-processes (Drees, 2002; Hsing, 1991; Rootzén, 1995, 2009),

• ARCH(1)-processes (Drees, 2002, 2003),

• GARCH-processes (Drees, 2000; Stărică, 1999).

We emphasize that the above examples form a very large and applicable class of processes. For details and more information on the above examples, see also De Haan et al. (2016).

We now turn back to extreme value index estimation under the BSS model. In order to apply Theorem 2 or Theorem 3, it suffices to make sure that, for a given heavy tailed component , the above mentioned conditions guaranteeing the limiting normality (8) for the Hill estimator are satisfied. At the same time, one needs that (13) holds, with some rate satisfying . After that, it remains to choose not increasing too rapidly so that (6) holds as well. We next explore the connection between the given assumptions. We first observe that conditions required to ensure (8) are solely on the dependence structure and distribution of the given component of interest. At the same time, condition (13) considers the rate of convergence of autocovariance estimators for all components simultaneously. Note that -mixing and assumptions (a)-(c) of De Haan et al. (2016) do not imply convergence of the autocovariance estimators (as the conditions do not even require existence of second moments). Conversely, convergence of the autocovariance estimators is related to the so-called -mixing (see Bradley (2005) for precise definition) which does not imply -mixing. Thus, even the convergence rate of the autocovariance estimator of the component does not provide any information regarding the validity of conditions implying limiting normality (8) for the Hill estimator. This means that the assumptions do not contradict, and also that in practice, one has to verify (13) and the limiting normality of the Hill estimator separately.

###### Remark 2.

If all components are -mixing, then the slowest decay of the -mixing coefficients gives us an upper bound for . Moreover, if one poses a stronger mode of mixing, -mixing, for the sequence , then (Bradley, 2005, p.112) the sequence is also both - and -mixing. For example, this is the case for -dependent processes and GARCH-processes.

## 5 Simulations

In this section, we illustrate the tail index estimation under the second order source separation model of Section 4.2 through a simulation study. Appendix B in the supplementary material presents a similar study for the independent component model in Section 4.1, with largely the same conclusions.

In this simulation study, we consider the -process z, where the components are -length realizations, i.e., time series, of the independent stochastic processes in The first component of is an ARCH(1)-process with the parameter vector . At time , the second and third components are defined as and , where denotes a fractional Brownian motion (fBm) with Hurst parameter 3/4 and denotes a fBm with Hurst parameter 4/5, such that are are mutually independent. For a comprehensive study on fBm, see, e.g., Nualart (2006). Out of the three components, the ARCH(1) process has the largest theoretical extreme value index 1/5. We considered the sample sizes and the threshold sequence was chosen to be . For each sample size, the simulation was iterated 2000 times.

As a preliminary step, the simulated observations were centered. Here, the centered observations are denoted as . In every iteration , we applied, for all

, where the elements of the -matrix

were simulated independently, and separately in every iteration, from the univariate uniform distribution

. We then applied the AMUSE unmixing procedure with lag to the mixed time series, using the implementation contained in the R-package JADE (Miettinen et al., 2017). The existence of the limiting distribution of the AMUSE unmixing estimator requires finite fourth moments. Note that the ARCH(1) parameters are chosen such that the fourth moments exist for all components. We denote the absolute values of the AMUSE unmixed time series and the absolute values of the original centered time series as and , respectively.

Now, we have , which corresponds to the ARCH(1) process. The process in the third component has the slowest rate of converence, giving , see Lietzén et al. (2020). Hereby, under our choice of , we have that the assumptions required by Theorems 2 and 3 hold and, hence, for large sample sizes, the extreme value index estimates calculated from and are expected to be close to each other.

We estimated the extreme value indices for every component from both and , using both the Hill estimator and the moment estimator. Note that both estimators produce three extreme value index estimates, one for each component. To capture the ARCH(1) component, we collected, in every simulation iteration, the largest of the three estimates, denoted in the following by and (this induces a slight bias to the results which is, however, rendered negligible with increasing ). The histograms of and for sample sizes are shown in Figure 1, where the extreme value indices estimated from correspond to light blue colour, and the extreme value index estimates calculated from the original correspond to light red colour. Dark blue colour is used for the parts of the histograms that overlap and the dashed yellow vertical line represents the theoretical extreme value index value . Values smaller than are omitted from the figure; a total of 21 moment estimator estimates were smaller than .

In Figure 1, already for the small sample size , the two histograms overlap significantly. Moreover, starting from , the histograms are basically identical, showing that, as predicted by the theory, the effect of the BSS-step on the estimation of the extreme value indices is almost negligible. When comparing the Hill estimator and the moment estimator, Figure 1 indicates that the variance of the moment estimator is larger, when compared to the Hill estimator. In addition, the bias of the Hill estimator is visible in the histograms, see De Haan and Ferreira (2007), and seems to decrease as the sample size increases. The histograms corresponding to the sample sizes and have been omitted here, as they introduce no new information to the simulation study.

Figure 2 illustrates the absolute differences, scaled with , between the estimates calculated from and . The red and blue curves represent the first and third empirical quartiles of the absolute differences, respectively, and the yellow curve is the corresponding sample median curve. The differences can be seen to converge to zero for both estimators, but the moment estimator requires larger sample sizes for this. That is, the quartile for the Hill estimator is close to zero already with and, conversely, the moment estimator quartile requires samples of size for achieving the same magnitude.

## 6 Real data example

Heavy-tailed distributions are encountered frequently in the context of financial instruments (Rachev, 2003). Here, we consider extreme value index estimation for a four-dimensional financial time series downloaded from Yahoo Finance. The data consist of the daily log-returns of the S&P500 index and the stock prices of CISCO Systems, Intel Corporation and Sprint Corporation in the period of January 3rd, 1991 – September 12th, 2019. The observations were further standardized to have unit variance, which can be done without loss of generality as both our extreme value index estimators are scale invariant. A subset of the data from a shorter period of time was used already in Fan et al. (2008) in the context of multivariate volatility modeling, which inspired us to choose the same data set.

The full four-variate series is visualized in Figure C7 in the supplementary Appendix C. Volatility spikes that span most of the series occur around the years 2002 and 2008, caused by the stock market downturn of 2002 and the financial crisis of 2007–2008, respectively. Especially the latter time period stands out also in Figure 3, where we have estimated the extreme value indices of the individual series. The estimation in Figure 3 was conducted by moving a window of length 60 days through each univariate series and estimating the extreme value index of each window with the Hill estimator with the tail length . The -axis values in the plot correspond to the middle days (30th days) of the windows. Contrary to the approach in Section 3, we estimated the extreme value indices not from the absolute values of the series, but separately for both the left and the right tail of each of the series. That is, for each of the four time series in Figure C7, we obtain two sequences of extreme value index estimates, always plotted with the same colours in Figure 3. This approach was taken to assess the behaviour of both negative and positive returns separately, in order to perform a more subtle analysis. Based on Figures 3 and C7, it seems reasonable to assume that among the four series there is an underlying latent factor (“financial crisis series”) which contributes risk to all four series around the times of the previous two crises.

To explore this, we estimate latent factors using generalized SOBI (Miettinen et al., 2019), an extension of the SOBI method which uses both serial correlation and volatility information in estimating the latent series. Denoting the original four-variate series at time by , the estimates of the centered latent series are given by where is the unmixing matrix estimate given by generalized SOBI. The estimates are shown in Figure C8 in the supplementary Appendix C and indeed hint that the risk on certain periods is driven by individual latent factors. E.g., the majority of the volatility associated with the 2007-2008 financial crisis has concentrated in the fourth latent series.

To get a clearer view, Figure 4 shows the extreme value index estimates of the four latent series, obtained using the same rolling window approach as used in Figure 3. The most prominent feature in Figure 4 is the spike around year 2002 in one of the extreme value indices of the first series, indicating a period of large risk. Several other spikes are also visible, most notably in one of the indices of the fourth latent series during late 2002. Thus, we infer that the 2002 crisis was driven by two separate sources of risk.

Finally, we study the connection between the factors and the observed series. The inverse transformation from the latent series to the observed ones is where

 ^Γ−1=⎛⎜ ⎜ ⎜⎝0.540.160.200.800.240.870.280.340.280.160.890.32−0.510.040.100.85⎞⎟ ⎟ ⎟⎠,

contains the loadings of the latent sources for each of the observed series. The loadings reveal, for example, that both the first and fourth latent series contribute (absolutely) most to the first and the fourth original time series. More specifically, the fourth latent process is the most important (loadings 0.80 and 0.85), and the first latent process the second most important (loadings 0.54 and -0.51) in explaining the behavior of the log-returns of S&P500 and Sprint. We conclude that, out of the four observed series, the financial crises affected S&P500 and Sprint the most, and had a significantly smaller impact on CISCO and Intel.

## 7 Conclusion

We studied the effect of a preliminary latent variable extraction on the estimation of the extreme value indices of the latent independent components. This approach to multivariate extreme value analysis is highly practical in the sense that it reduces the problem into several univariate extreme value problems, allowing the use of the standard extreme value machinery. Moreover, our asymptotic analysis revealed that, under reasonably mild conditions, the consistency and limiting normality of the Hill estimator and the moment estimator are preserved in this construction.

A natural question to pursue in the future is whether the conditions in Theorems 2 and 3 can be weakened (we only showed that they are sufficient). Some preliminary simulation (not shown here) indicates that this might indeed be the case. Moreover, the current work can likely be used to simplify the task of deriving similar results for other suitable estimators besides the Hill estimator and the moment estimator. This is because the perturbation bounds for tail observations given in Appendix A.1 are not tied to any particular extreme value index estimator (indeed, they concern the latent variable estimation part of the model). As such, one only needs to derive the analogues of Appendix A.2 (perturbation bounds for the actual extreme value index estimation step) for the new methods.

## Appendix A Proofs

Section A of the appendix is devoted to the proofs of the technical results. We have gathered auxiliary technical lemmas into Subsection A.1 and Subsection A.2 contains to the proofs of our main theorems.

### a.1 Auxiliary lemmas

The main objective in this subsection is to establish the rate at which the quantity

 ||^zk|(n−m,n)/|zk|(n−m,n)−1|

vanishes. We begin with the next result that allows us to consider the component with the heaviest tail as the conservative bound for the error. This translates into on our main theorems.

###### Lemma 1.

Let be distributions such that,

 γ0>γ1>γ2=0>γ3.

For , put , where are the normalising sequences such that , where follows . Then

 gnkgn0=o(1),k=1,2,3.
###### Proof.

Note first that since , the distribution is heavy tailed and belongs to the domain of attraction of the Fréchet distribution. Thus, by (Embrechts et al., 2013, Section 3.4), where is a slowly varying function. Similarly for some slowly varying function and we have the claim for the value , that is,

 gn1gn0=nγ1−γ0L1(n)L0(n)=o(1).

Similarly, the distribution is light tailed and belongs to the domain of attraction of the Weibull distribution. As such, by (Embrechts et al., 2013, Section 3.4), we have for some slowly-varying function and constant . Since , we have and the claim for follows from

 gn3gn0=O(1)nγ0L0(n)=o(1).

It remains to prove the case that corresponds to the border case . Now belongs to the domain of attraction of the Gumbel distribution and, by (Embrechts et al., 2013, Section 3.4), we have , where is as in (Embrechts et al., 2013, Definition 3.3.18), and

is the quantile function. Let

be the right endpoint of the distribution . We consider two cases, and , separately. In the former, as and by (Embrechts et al., 2013, Remark 2, Section 3.3) as . Thus, for a large enough , we have and

 gn2gn0=yF+o(1)nγ0L0(n)=o(1).

For , we have and, by (Embrechts et al., 2013, Remark 1, Section 3.3), . Thus, for a large enough , we have and

 gn2gn0=F←2(1−1/n)nγ0L0(n).

We continue by proof by contradiction, and assume that does not converge to zero. Then there exists such that we can find an arbitrarily large such that

 F←2(1−1/n)≥ϵ0nγ0L0(n).

It follows that

 1−F2(ϵ0nγ0L0(n))≥1n

and since is slowly varying, this further implies that

 1−F2(cnγ0)≥1n

for some constant and a large enough . Since , this implies that is heavy tailed giving us the contradiction. This completes the proof for the case as well. ∎

The next result shows that the denominator in is negligible.

###### Lemma 2.

Let be an arbitrary sequence of non-negative random variables such that

 limδ→0infk≥1P(zk≥δ)=1. (14)

Then, for any and any intermediate sequence , there exists and such that

 P(z(n−kn,n)<δ)<ϵ,n≥N.
###### Proof.

Let

 Sn(δ)=n∑k=11zk≥δ.

Then

 P(z(n−kn,n)<δ)=P(Sn(δ)

Indeed, means that less than of the values are above or equal to , which implies that :th maximum of is strictly less than . Vice versa, if , then at most of values can be above or equal to . Thus it suffices to prove that for any , we can find and such that for we have

 P(Sn(δ)

Equivalently, we need to show

 P(Sn(δ)≥kn)>1−ϵ. (15)

Denote . By (14), for any we can find small enough such that

 E¯¯¯¯Sn(δ)=1nn∑k=1P(zk≥δ)>1−˜ϵ (16)

uniformly in . Together with this gives us

 [E¯¯¯¯Sn(δ)]2E¯S2n(δ)≥(1−˜ϵ)2

which holds for every . Next we recall the Paley-Zygmund inequality which states that, for any random variable with finite variance and any number , we have

 P(Z≥θEZ)≥(1−θ)2(EZ)2EZ2. (17)

Since is an intermediate sequence, we have, applying (16), that

 knnE¯¯¯¯Sn(δ)≤knn(1−˜ϵ)≤1

provided that is large enough. Hence we may apply (17) with and to compute

 P(Sn(δ)≥kn)=P⎛⎝¯¯¯¯Sn(δ)≥knnE¯¯¯¯Sn(δ)E¯¯¯¯Sn(δ)⎞⎠≥⎛⎝1−knnE¯¯¯¯Sn(δ)⎞⎠2[E¯¯¯¯Sn(δ)]2E¯¯¯¯S2n(δ)≥(1−knn(1−˜