1 Introduction
Extreme value analysis of heavy-tailed distributions plays an important role in various applications. In seismology and climatology, for example, statistics of extremes is used to study earthquakes (Beirlant et al., 2018) or heavy precipitation (Carreau et al., 2017). Another important field of research is the analysis of high financial losses, which becomes particularly interesting if the losses depend on covariates (Chavez-Demoulin et al., 2016; Hambuckers et al., 2018). In this situation an automated threshold selection procedure could bring additional benefits by enabling the selection of the threshold depending on a covariate. We discuss this possibility in more detail in Section 5.
To mathematically investigate the behaviour of heavy tails, we consider random variables from the domain of attraction (DoA) of a Fréchet distribution. Let X_1, X_2, ... be independent identically distributed (i.i.d.) random variables with distribution function F, where F is in the DoA of an extreme value distribution (evd) G_γ with extreme value index γ > 0. This means there exist sequences a_n > 0 and b_n real, s.t. F^n(a_n x + b_n) → G_γ(x) at all continuity points x. In this situation the following first order condition holds,

(1) lim_{t→∞} (1 − F(tx)) / (1 − F(t)) = x^{−1/γ} for all x > 0,

i.e. the survival function 1 − F is regularly varying with index −1/γ. Distributions fulfilling this condition are called Pareto-type distributions, because they only differ from the Pareto distribution by a slowly varying function ℓ, i.e. 1 − F(x) = x^{−1/γ} ℓ(x).
We can interpret the quotient in (1) as a conditional probability, and it follows directly that

(2) P(X/t > x | X > t) → x^{−1/γ}, as t → ∞, for all x ≥ 1.
Thus, for a sufficiently large threshold t the data above this threshold can be modelled by a Pareto distribution, or equivalently the log-exceedances by an exponential distribution. In this article we concentrate on the exponential approximation and utilize it for inference on the extreme value index. It is common to consider the random threshold X_{n−k,n} and choose the sample fraction k instead of t, where X_{1,n} ≤ ... ≤ X_{n,n} denote the order statistics of a sample of size n. In this case, a natural estimator for γ under the exponential approximation of the log-spacings is their mean, the Hill estimator (Hill, 1975),

(3) γ̂_H(k) = (1/k) Σ_{i=1}^{k} ( log X_{n−i+1,n} − log X_{n−k,n} ).
The Hill estimator is still among the most popular and well-known estimators for the extreme value index, although its sample path as a function in k can be highly unstable, so that estimation crucially depends on the choice of the sample fraction k.
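As a minimal illustration (not the authors' code; the Fréchet simulation and all names are our own assumptions), the Hill estimator and its sample path in k can be computed as follows:

```python
import numpy as np

def hill_estimator(sample, k):
    """Hill estimator: mean of the log-spacings
    log X_{n-i+1,n} - log X_{n-k,n}, i = 1, ..., k."""
    x = np.sort(sample)
    n = len(x)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < n")
    return float(np.mean(np.log(x[n - k:]) - np.log(x[n - k - 1])))

rng = np.random.default_rng(0)
gamma = 0.5
# Frechet sample with F(x) = exp(-x^(-1/gamma)), i.e. extreme value index gamma
sample = (-np.log(rng.uniform(size=2000))) ** (-gamma)

# Sample path of the Hill estimator; its instability in k is what makes
# the choice of the sample fraction so delicate.
path = [hill_estimator(sample, k) for k in range(10, 1000)]
```

Plotting `path` against k yields the classical Hill plot discussed below.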
This dependence highlights the difficulties in estimating γ: even from univariate i.i.d. observations, estimation is hard, since only few observations contain information about the extreme value distribution. Selecting a threshold above which the data can be used for statistical inference about the tail is one of the most fundamental problems in the field of extreme value analysis.
Due to the importance of this task, the appropriate choice of the threshold has been discussed extensively in extreme value research over the last decades, and suggested solutions cover a variety of methodologies. We give a short summary of different types of approaches and stress the specific difficulties that arise, concentrating mainly on methods we compare in our simulation study in Section 4. More comprehensive reviews of threshold selection can be found in Scarrott and MacDonald (2012) and Dey and Yan (2016).
One basic concept in threshold selection is data visualisation, which is discussed in more depth in Kratz and Resnick (1996) and Drees et al. (2000). Popular graphical diagnostics used in this context are the Zipf plot, Hill plot, QQ-plot or the mean-excess plot, to name a few. A major drawback of these methods is their subjectivity due to the necessarily personal interpretation of the plot. Further, it is a burden to choose each threshold manually, especially in high-dimensional settings or when analysing many samples.
Easier ways to select the sample fraction are rules-of-thumb such as using the upper 10% of the data (DuMouchel, 1983) or the fixed fraction suggested by Ferreira et al. (2003). However, these suggestions are neither theoretically justified nor data-driven.
Reiss and Thomas (2007) present a procedure that tries to find a region of stability among the estimates of the extreme value index. Their method depends on a tuning parameter, whose choice is further analysed in Neves and Fraga Alves (2004). To our knowledge no theoretical analysis exists for this approach.
Besides these and similar heuristic approaches, there is a class of theoretically motivated procedures that target the optimal sample fraction for specific estimation tasks, such as quantile estimation (Ferreira et al., 2003), estimation of high probabilities (Hall and Weissman, 1997) or the Hill estimator, see below. We also mention two other methodologies. First, there are suggestions that compare the empirical distribution to the fitted generalized Pareto distribution (GPD) via goodness-of-fit tests (Bader et al., 2018) or by minimizing the distance between them (Pickands, 1975; Gonzalo and Olmo, 2004; Clauset et al., 2009), where the latter approach is theoretically analysed by Drees et al. (2018). Further, Goegebeur et al. (2008) propose a family of kernel statistics to test for exponentiality in order to select a threshold.

Of particular interest to us are methods that aim to estimate the sample fraction k_opt which minimizes the asymptotic mean square error (AMSE) of the Hill estimator. To construct an estimator for k_opt, Drees and Kaufmann (1998) utilize the Lepskii method and an upper bound on the maximum random fluctuation of the Hill estimator around γ. To apply their approach it is necessary to choose several tuning parameters and to obtain consistent initial estimates for γ and a second order parameter ρ. They recommend specific choices of the parameters based on a numerical study, and we employ their proposals in our simulations. However, the choice of these parameters is not data-driven. In Guillou and Hall (2001), a test statistic is constructed based on an accumulation of log-spacings, which takes values around 1 as long as the bias of the Hill estimator is not significantly large. Their statistic depends on a tuning parameter as well, and a critical value to test against has to be chosen. Again we adopt the parameter choice suggested in their simulation study. Danielsson et al. (2001) introduce a double bootstrap approach to estimate the optimal sample fraction. They need to choose the number of bootstrap samples and a further parameter. For the latter, a data-driven but computationally expensive selection method is provided, where the whole bootstrap procedure is repeated for various possible values. Another estimator for k_opt is given by Beirlant et al. (2002), which employs least squares estimates from an exponential regression approach. The method depends on an estimate for ρ and a sample fraction; to avoid this choice they suggest taking the median of the estimates over a range of values. A different approach is taken by Goegebeur et al. (2008), who use the properties of a test statistic regarding bias estimation to construct an estimator for the AMSE and minimize it with respect to k. If one fixes ρ = −1, as they suggest in their simulation section, there is no further tuning parameter to be chosen. However, for their method no result about consistency of the selected sample fraction in the sense of k̂/k_opt → 1 is known, in contrast to the approaches in Drees and Kaufmann (1998), Guillou and Hall (2001), Danielsson et al. (2001) and Beirlant et al. (2002).

In this paper we contribute to the problem of threshold selection by introducing two new methods. The first one, presented in Section 2, is inspired by the idea of testing the exponential approximation. We estimate the integrated square error (ISE) of the exponential density under the assumption that the log-spacings are indeed exponentially distributed. The error functional we obtain, denoted as inverse Hill statistic (IHS), is very easy to compute and does not depend on any tuning parameters.
Since this criterion is variable for small k, it can additionally be smoothed to improve the performance. The sample fraction minimizing IHS is asymptotically smaller than k_opt, as the criterion is stricter against deviations from the exponential approximation. This estimator performs remarkably well for adaptive quantile estimation on finite samples, as illustrated in our simulation study.
In our second approach we suggest a smooth estimator for the AMSE of the Hill estimator, called SAMSEE (smooth AMSE estimator). This estimator is constructed from a preliminary estimate of γ using the generalized Jackknife approach in Gomes et al. (2000) and a bias estimator for the Hill estimator introduced in Section 3. By minimizing SAMSEE we estimate the optimal sample fraction k_opt. For estimation, the choice of a large sample fraction is necessary, for which we present a data-driven selection procedure in Section 3. SAMSEE utilizes the idea of fixing ρ = −1, which is justified by good performance in simulations and leads to a simpler and more robust estimator. However, the estimator can also be adjusted to any ρ < 0 by including a consistent estimator of ρ, as described in Section 3.1.
After introducing our two novel threshold selection methods in Sections 2 and 3, we compare these methods to various other approaches in a numerical analysis in Section 4. In Section 5 the importance of automated threshold selection procedures is illustrated in an application, where we nonparametrically estimate an extreme value index that varies over time.
The proof of Theorem 3, which describes the asymptotic behaviour of our bias estimator, and auxiliary theoretical results can be found in Appendix A.
2 IHS – The inverse Hill statistic
In this section we introduce the first threshold selection procedure by analysing the integrated square error (ISE) between the exponential density f_γ(x) = γ^{−1} e^{−x/γ} and its parametric estimator employing the Hill estimator,

ISE(k) = ∫ ( f_γ(x) − f_{γ̂_H(k)}(x) )² dx = ∫ f_γ(x)² dx − 2 ∫ f_γ(x) f_{γ̂_H(k)}(x) dx + ∫ f_{γ̂_H(k)}(x)² dx.

The first term of the ISE is constant and thus plays no role for selecting k. The last term of the ISE is known, but the second term is not. Therefore, we cannot minimize the ISE directly. Instead, we want to estimate and minimize its expectation under the exponential approximation. This is based on the idea of considering the hypothesis H_0 that the log-spacings are indeed exponentially distributed. Under H_0 the Hill estimator is gamma distributed, see Lemma 1, and the mean of the ISE (MISE) can be calculated explicitly. We observe that the MISE is a decreasing function in k under the exponential approximation,

(4) 

where the function involved, defined through the upper incomplete gamma function, converges to 1 very fast, which yields a simple approximation of (4).
This provides us with an unbiased estimator for the first term in (4) under H_0. However, due to the high variability for small k, we instead want to find a rescaled estimator that minimizes the MSE under the exponential approximation. To do so, we approximate its MSE in the following way,

(5) 

The approximation depends on similar functions as before, which quickly become constant. Minimizing the MSE in (5) determines the rescaling, and we thus suggest the resulting inverse Hill statistic (IHS) to estimate the error functional and to select the threshold via minimizing IHS.
By minimizing IHS we select a sample fraction where IHS starts increasing and thus contradicts H_0 by behaving contrarily to the MISE under the exponential approximation. This criterion can be compared to hypothesis testing with a large significance level, which implies seeking high confidence when deciding not to reject H_0. Further properties of the selected sample fraction are analysed theoretically in Section 2.1 and for finite samples in a numerical study in Section 4.
Note that the performance of IHS depends on the bias of the Hill estimator being positive and increasing, see Section 2.1. However, the bias can be negative for some non-standard distributions. In case of a negative bias, we instead suggest using a sign-adjusted variant of the statistic. The two cases can easily be distinguished by analysing the Hill estimator for large k. Both variants are justified by asymptotic results in Section 2.1.
Figure 1 illustrates that IHS is highly varying for small k, which makes automatic threshold choices more variable. To control this problematic behaviour we smooth the IHS. More specifically, we want to estimate its expectation by considering the corresponding nonparametric regression problem with the IHS values as noisy responses.
Due to the structure of the Hill estimator, the random variables are highly dependent, which needs to be taken into account in estimation. In our simulations, we apply a Bayesian nonparametric procedure introduced by Serra et al. (2018), which simultaneously estimates mean and covariance and is available in the R package eBsc. The approach provides a smooth estimator for the expectation of IHS – denoted as sIHS – exhibiting less variation for small k. This way we can improve the performance by selecting a more suitable threshold, as illustrated in Figure 1. Of course, one can also use other smoothing procedures suitable for dependent data (Opsomer et al., 2001; Krivobokova and Kauermann, 2007; Lee et al., 2010).
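The effect of smoothing a noisy selection criterion can be illustrated with a plain moving-average smoother (a crude stand-in for the eBsc procedure; the synthetic criterion and all names are our own assumptions):

```python
import numpy as np

def smooth_criterion(values, window=31):
    """Centered moving average of a noisy selection criterion.

    A crude stand-in for the Bayesian nonparametric smoother (eBsc);
    any smoother suitable for dependent data could be substituted.
    """
    pad = window // 2
    padded = np.pad(np.asarray(values, dtype=float), pad, mode="reflect")
    kernel = np.ones(window) / window
    return np.convolve(padded, kernel, mode="valid")

def select_k(criterion_values, k_grid, window=31):
    """Select the sample fraction minimizing the smoothed criterion."""
    smoothed = smooth_criterion(criterion_values, window)
    return int(k_grid[int(np.argmin(smoothed))])

rng = np.random.default_rng(2)
k_grid = np.arange(1, 501)
# Synthetic criterion: smooth trend with minimum at k = 200 plus noise,
# mimicking the high variability of IHS for small k.
criterion = (k_grid - 200.0) ** 2 / 1e4 + rng.normal(scale=0.5, size=500)
k_selected = select_k(criterion, k_grid)
```

Without smoothing, the argmin of the raw criterion can be pulled far away by a single noisy dip; the averaged criterion is much more stable.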
We finally want to remark on the relation between IHS and ISE, which is given by
(6) 
This equation points out that minimizing IHS does not minimize the ISE, as IHS takes an additional bias term into account. If the bias of the Hill estimator is positive, IHS selects smaller sample fractions (larger thresholds) than the ISE. This is not surprising, because we estimate the expectation of the ISE under the hypothesis that the exponential approximation holds. This is a much more conservative error functional, meaning it is more strict against deviations from the exponential distribution.
In conclusion, with IHS we do not aim to estimate k_opt but to find a sample fraction where we can be very certain that the exponential approximation still holds. The impact of this consideration is illustrated in simulations and an application in Sections 4 and 5.
2.1 Theoretical analysis of IHS
In order to understand the IHS asymptotically we consider the second order condition,

(7) lim_{t→∞} ( U(tx)/U(t) − x^γ ) / A(t) = x^γ (x^ρ − 1)/ρ,

for x > 0 and with second order parameter ρ ≤ 0 (reading (x^ρ − 1)/ρ as log x for ρ = 0). Here, A denotes a function converging to zero as t goes to infinity, and |A| is regularly varying with index ρ. Further, U is defined by U(t) = F^←(1 − 1/t), where F^← denotes the left inverse of the distribution function F. In this setting the following asymptotic normality statements for the Hill estimator hold.
Theorem 1 (Theorem 3.2.5 in de Haan and Ferreira (2006)).
Let X_1, X_2, ... be i.i.d. random variables with distribution function F satisfying (1) for some γ > 0. If (7) holds and k = k(n) is an intermediate sequence, i.e. k → ∞ and k/n → 0 as n → ∞, then

√k ( γ̂_H(k) − γ ) →_d N( λ/(1−ρ), γ² ),

with λ := lim_{n→∞} √k A(n/k), assumed to exist.
Theorem 2.
Under the conditions of Theorem 1, it holds that
Proof.
This follows by applying the delta method to Theorem 1. ∎
Following the reasoning in de Haan and Ferreira (2006), page 78, the minimizing point of the AMSE can be found explicitly if one considers A(t) = c t^ρ with c ≠ 0 and ρ < 0. In this special case the minimizing sample fraction can be expressed as

(8) k_opt(n) = ( γ² (1−ρ)² / (−2ρ c²) )^{1/(1−2ρ)} · n^{−2ρ/(1−2ρ)}.
Under the same assumption we can calculate the minimizing points of the asymptotic expectations of IHS and its sign-adjusted variant. Let the asymptotic expectation refer to the expectation of the limiting distribution in Thm. 2. It is easy to check that the same formula holds for the variant if the bias is replaced by its absolute value. Further note that by Lemma 2 it is sufficient to consider intermediate sequences when determining the minimizing sequence. Comparing the two minimizing sample fractions for fixed n we obtain that

(9) 

as n → ∞, for a constant depending on γ, ρ and c.
This supports what equation (6) already suggested: minimizing IHS gives asymptotically a smaller k than k_opt. Thus, IHS asymptotically performs suboptimally for the Hill estimator but still leads to a consistent sequence of estimates.
For finite samples the ratio crucially depends on ρ, and the IHS-selected fraction can even be larger than k_opt, as illustrated in Figure 2. The graphic presents the quotient of the two sample fractions as a function in the second order parameter ρ for different sample sizes. The parameters γ and c are fixed to 1, as they have a weaker impact on the proportion. The quotient also tends to 1 as ρ → 0, since both sample fractions exhibit the same limiting behaviour in this case.
Although the IHS-selected sample fraction is of smaller order than k_opt asymptotically, the simulation study in Section 4 shows that IHS works remarkably well when used for quantile estimation. We consider the following Weissman-type estimator for the (1 − p)-quantile,

(10) x̂_{1−p}(k) = X_{n−k,n} ( k/(np) )^{γ̂_H(k)}.
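A minimal sketch of this quantile estimator (we assume the Weissman (1978) form X_{n−k,n} · (k/(np))^γ̂, consistent with the reference to Theorem 4.3.8 in de Haan and Ferreira (2006); not the authors' code):

```python
import numpy as np

def hill(x_sorted, k):
    """Hill estimator from an ascending-sorted sample."""
    n = len(x_sorted)
    return float(np.mean(np.log(x_sorted[n - k:]) - np.log(x_sorted[n - k - 1])))

def weissman_quantile(sample, k, p):
    """Estimate the (1 - p)-quantile by extrapolating from the (k+1)-th
    largest observation with the Hill estimate in the exponent."""
    x = np.sort(sample)
    n = len(x)
    gamma_hat = hill(x, k)
    return x[n - k - 1] * (k / (n * p)) ** gamma_hat

rng = np.random.default_rng(3)
gamma = 0.5
x = rng.uniform(size=5000) ** (-gamma)        # exact Pareto, EVI = gamma
q_hat = weissman_quantile(x, k=500, p=0.0005)
# True (1 - p)-quantile of this Pareto model is p ** (-gamma), about 44.7
```

Because γ̂ enters the exponent, small overestimation of γ inflates the quantile estimate considerably — the sensitivity discussed in Section 4.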
The sample fraction k_opt also minimizes the asymptotic relative MSE of this quantile estimator, see e.g. Theorem 4.3.8 in de Haan and Ferreira (2006). For finite samples, however, the quantile estimator seems to benefit from the IHS choice. There are several reasons for this, two of which are illustrated by Figure 3. On the left we see the empirical expectation of IHS, the empirical version of the MSE of the Hill estimator and the relative MSE of the quantile estimator,

(11) 

as used in Theorem 4.3.8 in de Haan and Ferreira (2006).
We observe that the minimizer of the IHS expectation (blue dot) is indeed smaller than k_opt (black), but so is the minimizer of MSEQ (pink) as well.
On the right we see a plot of the empirical IHS and the MSE of the Hill estimator for Log-gamma distributed samples of size 5000. This graphic highlights the similarities between the MSE and IHS for the boundary case ρ = 0.
These observations indicate why IHS outperforms other methods that try to minimize the MSE of the Hill estimator when adaptively estimating quantiles by (10) on most of our exemplary distributions and sample sizes, see Section 4.
3 SAMSEE – The smooth AMSE estimator
In this section we illustrate a way to smoothly estimate the AMSE of the Hill estimator. Via minimizing this AMSE estimator, called SAMSEE, we obtain an estimator for k_opt. By this means, we extend previous methods which also estimate k_opt by estimating the AMSE itself. From Thm. 1 it is easy to see that the AMSE, which is the asymptotic variance plus the asymptotic squared bias, equals

(12) AMSE(k) = γ²/k + ( A(n/k)/(1−ρ) )².

Thus, to estimate the AMSE as a function in k we employ two estimators, one for γ and one for the bias term. First we explain how we estimate γ and then we define the bias estimator. This bias estimator has a quite smooth sample path in k, and it depends on the choice of a large sample fraction, for which we afterwards provide a data-driven selection procedure.
Note that, for the moment, we assume that the second order parameter ρ is equal to −1 to motivate the construction of the AMSE estimator. The idea of misspecifying ρ to simplify estimation – via avoiding the additional uncertainty from estimating or selecting an influential tuning parameter – was already used, for example, by Gomes et al. (2000), Drees and Kaufmann (1998) and Goegebeur et al. (2008). It is also motivated by the simulations in Section 3.1.

For γ we consider the generalized Jackknife estimator introduced by Gomes et al. (2000). This estimator is defined by

(13) γ̂_GJ(k) = 2 γ̂_dV(k) − γ̂_H(k), with γ̂_dV(k) = M^(2)(k) / ( 2 M^(1)(k) ), M^(j)(k) = (1/k) Σ_{i=1}^{k} ( log X_{n−i+1,n} − log X_{n−k,n} )^j,

where the M^(j)(k) are empirical moments of the log-spacings as in equation (3).
Note that γ̂_dV is the de Vries estimator introduced under this name in de Haan and Peng (1998) and γ̂_H is the Hill estimator as above.
The generalized Jackknife estimator has a reduced bias compared to the Hill estimator and is even asymptotically unbiased if ρ = −1, see (2.11) in Gomes et al. (2000). This property is useful here, since the bias estimator defined in the following performs optimally for ρ = −1 as well. Furthermore, the same large sample fraction can be used for both estimators.
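This combination can be sketched as follows (our own reconstruction; we assume the form γ_GJ = 2γ_dV − γ_H of the Hill/de Vries Jackknife, whose leading bias terms A/(1−ρ) and A/(1−ρ)² cancel at ρ = −1):

```python
import numpy as np

def jackknife_evi(sample, k):
    """Generalized Jackknife combination of the Hill and de Vries estimators.

    gamma_GJ = 2 * gamma_dV - gamma_H; the leading bias terms of the two
    estimators cancel exactly when the second order parameter rho is -1.
    """
    x = np.sort(sample)
    n = len(x)
    y = np.log(x[n - k:]) - np.log(x[n - k - 1])  # log-spacings
    m1 = float(np.mean(y))          # Hill estimator
    m2 = float(np.mean(y ** 2))
    gamma_dv = m2 / (2.0 * m1)      # de Vries estimator (de Haan and Peng, 1998)
    return 2.0 * gamma_dv - m1

rng = np.random.default_rng(4)
gamma = 0.5
x = rng.uniform(size=5000) ** (-gamma)   # exact Pareto sample, EVI = gamma
```

The price for the bias reduction is a larger asymptotic variance than that of the Hill estimator, which is why the large sample fraction used here can be chosen generously.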
To construct this bias estimator, we study the following averages of Hill estimators,
where . Plotting these averages illustrates how they smoothly frame the sample path of the Hill estimator. Especially the upper mean seems to contain a lot of structural information about the underlying asymptotic bias of the Hill estimator when choosing the upper bound appropriately, see Figure 4.
This similarity between the upper mean and the bias of the Hill estimator inspires the definition
(14) 
The estimator is indeed a sensible estimator for a bias function, since

(15) 

follows for intermediate sequences from Theorem 3.
Danielsson et al. (2001) use the difference of two estimators to assess the bias of the Hill estimator and apply a double bootstrap procedure to stabilize this highly varying estimate. We use the difference of two estimators for γ as well, but now consider averaging to smooth the bias estimate. The idea of averaging the Hill estimator in order to smooth the Hill plot and decrease the variance is also studied in Resnick and Stărică (1997).
It remains to choose an appropriate large sample fraction in order to complete SAMSEE and to estimate the optimal sample fraction k_opt. This fraction needs to be large enough to allow for minimization over all relevant k and small enough to be an intermediate sequence itself (see Theorem 3 for this condition). To find such a fraction we use the following relation between the estimators,

(16) 

This provides us with a relatively stable function in k that has the same asymptotic expectation as the highly non-smooth Hill estimator. We want to find an intermediate sequence for which (16) holds and thus define
(17) 
to measure the deviation from approximation (16) uniformly over all k. Based on this, we suggest choosing
(18) 
In this way we select a sample fraction where the asymptotic approximation (16) is most stable, since we minimize the local variation of the criterion. Simulations suggest that the procedure is not sensitive to slightly increasing the region of stability, depending on the sample size.
Now we finally combine the previously described estimators to approximate the AMSE in (12) under the assumption that ρ = −1. With the sample fraction chosen in (18) and the property of the bias estimator in (15), we obtain an estimator for the AMSE of the Hill estimator and for k_opt by
(19)  
Figure 5 illustrates how such a smooth estimate of the AMSE can look. On the left, SAMSEE is displayed for a Fréchet sample. On the right, the Hill plot of the same sample is presented for all k.
This smooth estimate of the AMSE can be useful beyond the context of threshold selection. For extreme value mixture models or Bayesian threshold selection approaches, SAMSEE could be used to construct a transition function between bulk and tail distribution or an empirical prior for the threshold, respectively, see Scarrott and MacDonald (2012) for a review on mixture models.
3.1 SAMSEE if ρ ≠ −1
We next want to analyse SAMSEE in the broader context of an unknown second order parameter ρ. The first thing to note is that the generalized Jackknife estimator is no longer asymptotically unbiased in this situation. Secondly, the behaviour of our bias estimator changes, as described in the following theorem.
Theorem 3.
Proof.
The proof can be found at the end of Appendix A. ∎
From Theorem 3 it follows that the bias estimator acquires an additional factor depending on ρ. For ρ = −1 this function is equal to 1. If ρ ≠ −1, it bends our bias estimator, which will therefore increase slightly too fast or too slow. We can still apply SAMSEE in this situation and select the large sample fraction from (18). However, approximation (16) does not hold anymore and instead the following holds,

(20) 

The absolute value of the error described by (20) is high if ρ strongly differs from −1 and the bias term is large. In one regime the approximation indeed deviates and we minimize the error by minimizing the bias, which is why applying (18) then leads to a small sample fraction. In the other regime, the approximation stays valid for an increasing bias and the selected fraction will typically be larger.
An alternative to fixing ρ = −1 is to incorporate a consistent estimator of the second order parameter. This can be done via

and

(21)  

In this way we can construct an estimator for k_opt in the general setting of Pareto-type distributions.
In Table 1, we present the results of a simulation study indicating for which distributions it is beneficial to use an estimate of ρ instead of fixing ρ = −1. We estimate ρ using the estimator suggested in Theorem 1 in Drees and Kaufmann (1998). The results indicate that, in general, it is sensible to fix ρ = −1 in SAMSEE, since only for the Cauchy distribution does using an estimated ρ perform slightly better regarding bias and RMSE. This confirms the observations already made by others (Gomes et al., 2000; Drees and Kaufmann, 1998; Goegebeur et al., 2008) that it is often recommendable to fix ρ instead of allowing for further variability by including an additional estimator.

Table 1: True values of γ and |ρ| together with mean estimates of γ (RMSE in brackets).

Distribution    γ      |ρ|    estimates (RMSE)
Student-t(6)    0.17   1/3    0.21 (0.09)   0.26 (0.12)   0.28 (0.14)
Fréchet(2)      0.50   1      0.51 (0.07)   0.51 (0.07)   0.51 (0.08)
Cauchy          1.00   2      1.01 (0.13)   0.97 (0.17)   0.99 (0.16)
Burr(2,1)       2.00   1      2.05 (0.34)   2.05 (0.34)   2.03 (0.40)
4 Simulation study
In the following we numerically analyse the performance of eight threshold selection methods on heavytailed distributions with very different tail behaviour. The simulation study is based on the following distributions:

the Fréchet distribution with parameter α > 0 and distribution function F(x) = exp(−x^{−α}) for x > 0, which implies γ = 1/α and ρ = −1,

the standard Cauchy distribution, leading to a tail behaviour with γ = 1 and ρ = −2,

the Log-gamma distribution (a boundary case with ρ = 0) with density function

the Burr distribution with a parametrisation such that γ and ρ are determined by its two parameters, and distribution function

a logarithmically perturbed Pareto distribution, i.e. a Pareto-type distribution whose slowly varying function contains a logarithmic factor. This distribution is denoted as negBias due to its negative bias in the Hill estimator.
On these distributions we evaluate the methods by their root mean square error (RMSE) when adaptively estimating γ with the Hill estimator, relative to the RMSE obtained using k_opt. Such efficiency quotients are also used by, e.g., Guillou and Hall (2001), Gomes et al. (2000) and Drees and Kaufmann (1998). The smaller the quotient, the better the threshold selection procedure performs compared to the asymptotically optimal sample fraction k_opt. Furthermore, we study the efficiency in quantile estimation with the estimator defined in (10).
Since we do not know the true minimizer of the AMSE, we utilize an empirical version suggested by Gomes et al. (2000). Following their approach we approximate k_opt by the mean of 20 independent replicates of the minimizer of the empirical MSE based on simulated samples.
We compare these efficiency values for eight different threshold selection methods. Most of the considered approaches are constructed for adaptive estimation of γ with the Hill estimator. This includes one procedure that looks for a stable region among the Hill estimates, while the others aim to estimate k_opt. The only exception is the IHS approach discussed in Section 2, which is motivated by minimizing the deviation from the exponential approximation. We still evaluate the performance of this procedure in the same simulations, although it is not primarily tailored to these specific applications. In total, the following methods are considered:
- sIHS: IHS smoothed by using the eBsc package, see Section 2,
- SAM: the SAMSEE approach from Section 3,
- GH: method by Guillou and Hall (2001), with the tuning parameter and critical value suggested in their simulation study,
- DK: procedure by Drees and Kaufmann (1998) with the tuning parameters fixed as recommended there,
- GO: approach by Goegebeur et al. (2008) defined in their equation (3.3) with fixed ρ = −1,
- DB: double bootstrap approach by Danielsson et al. (2001) with a sample-size dependent choice of the bootstrap parameters,
- B: method by Beirlant et al. (2002), taking the median over a range of sample fractions,
- RT: the procedure by Reiss and Thomas (2007) searching for a region of stability.
SAM  GH  DK  GO  DB  B  RT  sIHS  

Studentt(6)  1.07  1.68  1.18  1.38  1.06  1.04  1.04  1.14 
Fréchet(2)  1.13  1.15  1.08  1.12  1.60  1.49  2.00  1.41 
Cauchy  1.37  1.19  1.32  1.16  2.14  1.85  2.11  1.47 
Loggamma  0.98  1.06  1.27  1.11  1.12  1.04  1.32  0.78 
Burr(2,1)  1.11  1.22  1.47  1.13  1.68  1.42  1.82  1.14 
negBias  1.06  1.13  1.56  1.13  1.07  1.22  1.89  2.27 
SAM  GH  DK  GO  DB  B  RT  sIHS  

Studentt(6)  1.20  1.58  1.31  1.39  1.35  1.03  1.26  1.03 
Fréchet(2)  1.08  1.26  1.07  1.21  1.66  1.29  2.40  2.43 
Cauchy  1.34  1.41  1.08  1.17  2.00  1.68  2.78  3.03 
Loggamma  1.08  1.10  1.32  1.17  1.19  1.05  1.40  0.79 
Burr(2,1)  1.07  1.29  1.62  1.14  1.63  1.29  2.21  1.79 
negBias  0.98  1.12  1.54  1.10  1.30  1.04  2.08  3.98 
SAM  GH  DK  GO  DB  B  RT  sIHS  

Studentt(6)  1.09  2.30  1.23  1.60  1.01  1.04  1.16  1.10 
Fréchet(2)  0.96  1.07  1.01  1.06  1.07  1.16  1.42  0.83 
Cauchy  0.89  1.04  1.03  0.95  0.86  1.40  1.59  0.65 
Loggamma  0.84  0.95  2.10  1.02  0.88  1.06  1.55  0.50 
Burr(2,1)  0.79  2.15  8.60  0.98  0.71  1.43  3.19  0.41 
negBias  1.66  1.37  2.32  2.13  0.80  1.98  3.75  8.05 
SAM  GH  DK  GO  DB  B  RT  sIHS  

Studentt(6)  1.07  1.39  1.16  1.29  1.44  1.02  1.24  0.94 
Fréchet(2)  1.04  1.14  1.07  1.15  1.14  1.14  1.53  1.14 
Cauchy  1.01  1.13  1.03  1.04  1.16  1.22  1.55  1.10 
Loggamma  0.94  1.00  1.39  1.12  1.11  0.97  1.55  0.67 
Burr(2,1)  1.00  1.38  1.24  1.11  1.09  1.11  1.72  0.71 
negBias  1.00  1.11  0.87  1.21  0.87  1.21  1.23  2.16 
When looking at the results for adaptively estimating γ in Table 2, we observe a very diverse picture of which methods perform best. Overall we get the impression that SAMSEE together with the approach by Goegebeur et al. (2008) performs most stably over the variety of distributions. This is interesting, because those are the methods which depend least on tuning parameters. The performance of the approaches GH, DK and B is comparable, but we obtain from Table 2 that, on average over all distributions, the SAMSEE procedure is superior.
For estimating a high quantile SAMSEE also performs convincingly, see Table 3, but additionally sIHS and the approach by Danielsson et al. (2001) show very good efficiency values. They are closely followed by B and GO. Looking at the average performance over all distributions, SAMSEE performs best again. However, if we exclude the negBias distribution, sIHS performs best on average.
In conclusion, we can see that SAMSEE performs very efficiently and comparably to k_opt over all exemplary distributions. It works especially well for estimating a high quantile. Only in the case of estimating γ for the Cauchy distribution does it perform worse than DK and GO, but still better than most other approaches. Recalling the results of the simulation on the influence of ρ in Table 1 in Section 3.1, it is not very surprising that SAMSEE performs slightly weaker in this situation: there, the Cauchy distribution is the only example we considered that benefits from estimating ρ instead of fixing it to −1.
From Table 3 we furthermore observe that sIHS is a strong choice when estimating high quantiles from small samples. However, its performance when estimating γ is quite variable, and it seems that sIHS does not perform particularly well for distributions with a small second order parameter. This behaviour is already discussed in Section 2.1 and highlighted in Figure 2: sIHS selects smaller sample fractions than optimal for the Hill estimator, especially in an intermediate regime of ρ.
The reason why some approaches perform worse on quantiles than they do on γ is that the estimator defined in (10) depends on γ̂ in the exponent and is thus very sensitive to overestimation of γ. Hence, when estimating a high quantile, an estimate of γ that is too large will lead to an even stronger overestimation of the quantile. This is why a few outliers among the γ estimates can already cause much higher efficiency values.

5 Application to varying extreme value index
In this section we analyse our new procedures in a financial application, where we study operational losses of a bank. We are, of course, particularly interested in the distributional properties of very high losses. It has been discussed before that it is reasonable to assume the distribution of such extreme losses to be heavy-tailed (Chavez-Demoulin et al., 2016; Moscadelli, 2004) and to change with the financial market over time (Hambuckers et al., 2018; Cope et al., 2012). In this context, we want to estimate how the extreme value index changes depending on the univariate covariate time. For this task, we utilize the approaches presented in Sections 2 and 3 for locally optimal selection of a threshold.
The observations of interest are operational losses from the Italian bank UniCredit from 2005 to 2014. In Hambuckers et al. (2018) the data is analysed in a regularized generalized Pareto regression approach including several firm-specific, macroeconomic and financial indicators as covariates. This approach describes the dependence of the GPD parameters on various covariates via parametric functions.
We consider an easier and more direct approach to study the temporal dependence of the extreme value index without taking into account possible interference by other covariates. Our aim is to estimate the time-dependent extreme value index nonparametrically with a simple ad hoc estimator that extends the estimator from de Haan and Zhou (2017) by employing our threshold selection procedures sIHS and SAMSEE. We present the estimator in Section 5.1 and the results we obtain when applying this estimator to the dataset of operational losses in Section 5.2.
5.1 Estimating a varying extreme value index
In de Haan and Zhou (2017), the authors already discussed estimating a trend in the extreme value index nonparametrically. They consider independent random variables with a time-varying extreme value index γ(s). To address this problem, they introduce the following estimator for γ(s), which locally applies the Hill estimator and is based on a global sample fraction k,

(22) 

where the estimation at time s uses the observations in a neighbourhood of s of radius h. This estimator depends on the choice of the bandwidth h and the global sample fraction k, which is then rescaled for the individual regions. A small bandwidth leads to very high variability in the estimates and a large value of h smooths out all interesting features. Thus, the choice of h should balance these two effects.
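The locally applied Hill estimator can be sketched as follows (our own minimal version with a hard time window; the function and parameter names are assumptions):

```python
import numpy as np

def local_hill(data, times, s, h, k_frac):
    """Hill estimator using only observations with time index in [s-h, s+h].

    `k_frac` is the locally rescaled sample fraction: the Hill estimator
    uses k = k_frac * (number of observations falling in the window).
    """
    window = np.asarray(data)[np.abs(np.asarray(times) - s) <= h]
    m = len(window)
    k = min(max(1, int(k_frac * m)), m - 1)
    x = np.sort(window)
    return float(np.mean(np.log(x[m - k:]) - np.log(x[m - k - 1])))

rng = np.random.default_rng(6)
n = 5000
times = np.linspace(0.0, 1.0, n)
gamma = 0.5                                # constant EVI for illustration
data = rng.uniform(size=n) ** (-gamma)     # Pareto observations
est = local_hill(data, times, s=0.5, h=0.2, k_frac=0.1)
```

Sweeping s over a grid and replacing the fixed `k_frac` by a locally selected sample fraction gives the modification proposed below.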
We suggest a modification of their estimator, where we instead locally estimate an optimal sample fraction for each neighbourhood, i.e.

(23) 
To compare these two approaches, we repeat the simulation presented in Figure 2 (i) in de Haan and Zhou (2017). Figure 6 illustrates the benefits of locally optimizing the threshold via SAMSEE from Section 3, as it strongly tightens the empirical confidence interval around the average, which is obtained from lower and upper empirical quantiles among the estimates.

5.2 Functional extreme value index of operational losses
The operational losses in the dataset of UniCredit are grouped by the type of event that caused the specific loss. We consider the event type CPBP, which provides sufficient observations for our local estimation approach. The CPBP losses are caused by clients, products and business practices related to derivatives or other financial instruments.
First we want to test if the extreme value index is constant over time. Using the test T4 from Einmahl et al. (2016), we can reject the null hypothesis with a p-value that is virtually zero, and are thus confident that the extreme value index of the losses is indeed varying over time.
We apply the new methodology from (23) to these losses, estimating the local sample fractions with sIHS from Section 2 and the SAMSEE approach from Section 3. Figure 7 shows the estimates we obtain for the event type CPBP. It is clearly visible that both procedures yield similar estimates for most time points and that the simple ad hoc estimators recover an increase of the severity of high losses during the financial and Euro crisis from 2008 to 2011. A similar overall trend in the extreme value index can also be identified in the estimates of Hambuckers et al. (2018) for CPBP.
For a more extensive discussion of the data and results of the more complex model including further covariates we refer to Hambuckers et al. (2018).
Appendix A Theoretical results and proof of Theorem 3
Lemma 1 (Distribution of the Hill estimator).
Let X_1, ..., X_n be i.i.d. random variables with distribution function F satisfying (1) for γ > 0. Then the following distributional representation for the Hill estimator holds,
where and for it holds that
for some .
Proof.
From the first order condition (1) it follows that 1 − F is regularly varying with index −1/γ, and there exists a slowly varying function ℓ such that 1 − F(x) = x^{−1/γ} ℓ(x). Let Y_1, ..., Y_n be i.i.d. standard Pareto random variables, through which the sample can be represented. We define E_i = log Y_i, for which it follows that
Note that E_i is standard exponentially distributed. By Lemma 3.2.3 in de Haan and Ferreira (2006) it follows for i.i.d. standard exponential random variables that
Hence, we obtain for the Hill estimator that
where the first average is gamma distributed as the mean of i.i.d. exponentials and the second average is the remainder term.
If k is fixed, the remainder converges to zero almost surely by Lemma 3.2.1 in de Haan and Ferreira (2006), since ℓ is slowly varying.
If k → ∞, the corresponding ratio converges in probability by Cor. 2.2.2 in de Haan and Ferreira (2006). Thus, the claim follows by the weak law of large numbers.
∎
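The gamma distribution of the leading term in Lemma 1 can be checked numerically in the exact Pareto case, where the log-spacings are exactly i.i.d. Exp(γ) and hence the Hill estimator is Gamma(k, γ/k) distributed (our own simulation sketch, not part of the proof):

```python
import numpy as np

rng = np.random.default_rng(7)
gamma, n, k, reps = 0.5, 1000, 50, 2000

est = np.empty(reps)
for r in range(reps):
    x = np.sort(rng.uniform(size=n) ** (-gamma))   # exact Pareto sample
    est[r] = np.mean(np.log(x[n - k:]) - np.log(x[n - k - 1]))

# Under the exact Pareto model the Hill estimator is a mean of k i.i.d.
# Exp(gamma) variables, i.e. Gamma(k, scale = gamma / k), so its mean is
# gamma = 0.5 and its variance is gamma^2 / k = 0.005.
```

For Pareto-type distributions with a non-trivial slowly varying function, the remainder term of Lemma 1 perturbs this exact distribution.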
Lemma 2.
Let X_1, ..., X_n be i.i.d. random variables with distribution function F satisfying (1) for γ > 0. Then the following holds, depending on the sample fraction k.

If k is finite, then, as n → ∞,