Tails and probabilities for extreme outliers

by   Pavlina Jordanova, et al.
RE/MAX Partners

The task of estimating the tails of probability distributions from small samples still seems open and almost unsolvable. This paper tries to take a step towards filling this gap. In 2017 Jordanova et al. introduced six new characteristics of the heaviness of the tails of theoretical distributions. They rely on the probability of observing mild or extreme outliers. Their main advantage is that they always exist. This work presents some new properties of these characteristics. Using them, six distribution sensitive estimators of the extremal index are defined. A brief simulation study compares their quality with that of the Hill, t-Hill, Pickands and Dekkers-Einmahl-de Haan estimators.




1 Introduction and Preliminaries

One of the main tasks in extreme value theory is the estimation of the extremal index. Given a huge sample, it is solved by the Hill [1], t-Hill [2, 3], Pickands [4] and Dekkers-Einmahl-de Haan [5] estimators. However, their rates of convergence are fast only when the tail of the observed distribution is very close to a Pareto one. This makes the task of estimating the extremal index from small samples difficult. The difficulty is easy to experience by trying to estimate the tail index of the Hill-horror distribution, which is discussed e.g. in Embrechts et al. (2013) [6] or Resnick (2007) [7]. Therefore a preliminary classification of the tails of distributions, which can later be used for constructing distribution sensitive estimators of the extremal index, seems reasonable and very useful. According to Klugman et al. (2012) [8], "The tail of a distribution … is that part that reveals probabilities about large values". The question then arises: what does "large values" mean? In order to clarify this concept we follow the approach of Tukey [9] and McGill et al. (1978) [10]. They define mild and extreme outliers and box-plots. The main statistics that they use are the quartiles of the empirical distribution and the interquartile range. In 2017 Jordanova et al. [11] used their results to classify probability distributions with respect to the heaviness of their tails. The classification is based on the probability of observing an extreme outlier in a sample of independent observations. Analogously to the situation with mean values and variances, one distribution may belong to more than one distributional type with respect to this classification. However, it shows us the most appropriate classes of distributions for fitting the corresponding distributional tail. In Section 2 new properties of these characteristics are obtained. Their main advantages are that they always exist and that they are invariant with respect to increasing affine transformations. In Section 3 a new estimator of the extremal index is obtained and its properties are compared with those of the Hill [1], t-Hill [2, 3], Pickands [4] and Dekkers-Einmahl-de Haan [5] estimators. A good summary of their properties can be found e.g. in Resnick (2007) [7] or Embrechts et al. (2013) [6], and the references therein. The paper finishes with some concluding remarks.

Throughout the paper $X_1, X_2, \ldots, X_n$ are independent identically distributed (i.i.d.) observations on a random variable (r.v.) $X$. Denote their cumulative distribution function (c.d.f.) by $F$, the theoretical $p$-quantiles by $F^{\leftarrow}(p) = \inf\{x \in \mathbb{R}: F(x) \geq p\}$, $p \in (0, 1)$, and the corresponding increasing order statistics by $X_{(1)} \leq X_{(2)} \leq \ldots \leq X_{(n)}$. There are many different definitions of the empirical $p$-quantiles $\hat{Q}_p$. They can be found e.g. in Hyndman et al. (1996) [12], Langford (2006) [13] or Parzen (1979) [14]. We use the following one:

$$\hat{Q}_p = X_{([(n+1)p])} + \{(n+1)p\}\left(X_{([(n+1)p]+1)} - X_{([(n+1)p])}\right).$$

Here $[a]$ means the integer part of $a$ and $\{a\} = a - [a]$ is its fractional part. This definition entails $\hat{Q}_{k/(n+1)} = X_{(k)}$, $k = 1, 2, \ldots, n$, and the fact that the empirical quantile function is linearly interpolated between these points. This estimator is implemented in the function quantile in R (2018) [15] as type = 6. Arnold et al. (1992) [16], Section 5.5, show that the sample quantile is an asymptotically unbiased estimator of the corresponding theoretical quantile. According to [6], under the conditions stated there the sample quantiles converge almost surely to the theoretical ones. This means that sample quantiles are strongly consistent estimators of the theoretical quantiles. Arnold et al. (1992) [16], Th. 8.5.1, and Smirnov (1949) [17] find conditions for their asymptotic normality. Pancheva (1984) [18] was the first to describe the limiting probability laws for non-linearly normalized extreme order statistics. Pancheva and Gacovska (2014) [19] investigate the asymptotic behavior of central order statistics under monotone normalizations. Their limit theorems suggest further development of the results in this paper for different numbers of central order statistics. Recently Barakat et al. (2017) [20] modelled maxima under linear-power normalizations.

We will use these results for estimating the first and the third quartiles of the observed distribution. They can be useful for estimation procedures based on relatively small samples because the quartiles are particular cases of central order statistics, and their rate of convergence seems to be faster than that of the extreme values.
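The empirical quantile described above (linear interpolation at position (n + 1)p, the rule R calls type = 6) can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `quantile_type6` and the clamping to the smallest and largest observation outside the interpolation range are assumptions made here.

```python
import math

def quantile_type6(sample, p):
    """Empirical p-quantile with linear interpolation at position (n + 1) * p."""
    xs = sorted(sample)
    n = len(xs)
    if n == 0:
        raise ValueError("sample must be non-empty")
    h = (n + 1) * p            # real-valued position among the order statistics
    k = math.floor(h)          # integer part [h]
    frac = h - k               # fractional part {h}
    if k < 1:                  # position below the first order statistic
        return xs[0]
    if k >= n:                 # position at or above the last order statistic
        return xs[-1]
    # xs is 0-based, so the k-th order statistic is xs[k - 1]
    return xs[k - 1] + frac * (xs[k] - xs[k - 1])

data = [1, 2, 3, 4]
q1 = quantile_type6(data, 0.25)
q3 = quantile_type6(data, 0.75)
print(q1, q3, q3 - q1)
```

For a sample of size n = 7 the positions (n + 1)p for p = 1/4, 1/2, 3/4 are integers, so the quartiles and the median coincide with order statistics and no interpolation takes place.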

2 Properties of the Extreme-Outlier Characteristics

Here we consider the following three characteristics of extremely heavy left-, right- or two-sided tails of theoretical distributions introduced in Jordanova et al. (2017) [11]:

Here the interquartile range is that of the theoretical distribution. It is clear that the empirical quartiles are weakly consistent L-estimators of the corresponding theoretical quartiles. For the general theory of L-estimators see e.g. Arnold et al. (1992) [16]. The following properties show that the characteristics are invariant with respect to a shift by a constant and with respect to multiplication by a positive number. This makes them very suitable for distinguishing the heaviness of the tails of distributions.
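For orientation, the three characteristics can be written out explicitly. The display below is a reconstruction under stated assumptions, not a quotation: the symbols $p_L$, $p_R$, $p_{L,R}$, $Q_1$, $Q_3$ and $IQR$ are placeholder notation chosen here, and the factor 3 is Tukey's fence for extreme outliers. This choice reproduces, for instance, the entry 0.0146 for n = 2 in Table 1.

```latex
\begin{align*}
p_L(X)     &= P\bigl(X < Q_1(X) - 3\,IQR(X)\bigr),\\
p_R(X)     &= P\bigl(X > Q_3(X) + 3\,IQR(X)\bigr),\\
p_{L,R}(X) &= p_L(X) + p_R(X),
\qquad IQR(X) = Q_3(X) - Q_1(X).
\end{align*}
```

The two-sided quantity is a sum because the events of a left and of a right extreme outlier are disjoint.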

Theorem 1. The three characteristics possess the following properties:


If , then .


If , then , , .


If the constant , then , ,


If , then

Sketch of the proof: b) is a corollary of the facts that

c) follows by the equalities , and .

d) Consider . Then , , .
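The invariance under increasing affine transformations can also be checked empirically. The sketch below is an illustration under assumptions: it uses the sample fractions of observations outside the fences Q1 - 3 IQR and Q3 + 3 IQR (with quartiles interpolated at position (n + 1)p) as stand-ins for the theoretical probabilities, and the helper names are hypothetical. It also shows that a reflection swaps the left and the right fraction.

```python
import math

def quartile(sorted_xs, p):
    """Empirical quantile with linear interpolation at position (n + 1) * p."""
    n = len(sorted_xs)
    h = (n + 1) * p
    k = math.floor(h)
    if k < 1:
        return sorted_xs[0]
    if k >= n:
        return sorted_xs[-1]
    return sorted_xs[k - 1] + (h - k) * (sorted_xs[k] - sorted_xs[k - 1])

def extreme_outlier_fractions(sample):
    """Fractions of observations below Q1 - 3*IQR and above Q3 + 3*IQR."""
    xs = sorted(sample)
    q1, q3 = quartile(xs, 0.25), quartile(xs, 0.75)
    iqr = q3 - q1
    left = sum(x < q1 - 3 * iqr for x in xs) / len(xs)
    right = sum(x > q3 + 3 * iqr for x in xs) / len(xs)
    return left, right

sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 60]
shifted_scaled = [2 * x + 3 for x in sample]   # increasing affine map
reflected = [-x for x in sample]               # decreasing map swaps the tails

print(extreme_outlier_fractions(sample))
print(extreme_outlier_fractions(shifted_scaled))
print(extreme_outlier_fractions(reflected))
```

The quartiles and the interquartile range transform together with the data, so the fences move with the observations and the counts on either side do not change.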

In the following examples we skip the cases in which all three characteristics vanish simultaneously. The Uniform distribution, for example, falls in this class. The definitions of the distributions that we consider below can be found in many standard textbooks on probability theory.

Figure 1: The dependence of the probabilities of extreme outliers on the parameter of the distribution (left and right panels).

Example 1. Exponential distribution.

Let , and with . Then , , , , , ,
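For the Exponential case the computation can be carried out in closed form. The sketch below assumes the parametrization F(x) = 1 - exp(-rate x) and the 3 IQR fence for extreme outliers; the function name is hypothetical. Under these assumptions the lower fence is negative, so no left extreme outliers are possible, and the right-tail probability does not depend on the rate.

```python
import math

def exponential_extreme_right_tail(rate):
    """P(X > Q3 + 3*IQR) for X ~ Exp(rate), using F(x) = 1 - exp(-rate * x)."""
    q1 = math.log(4 / 3) / rate      # F(q1) = 1/4
    q3 = math.log(4) / rate          # F(q3) = 3/4
    iqr = q3 - q1                    # equals log(3) / rate
    fence = q3 + 3 * iqr             # equals log(108) / rate
    return math.exp(-rate * fence)   # survival function at the fence

for rate in (0.5, 1.0, 7.0):
    print(rate, exponential_extreme_right_tail(rate))
```

Every rate gives the same value, 1/108, which illustrates the scale invariance of the characteristic.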

Example 2. Gamma distribution.

Assume , , with . Since is a scale parameter and the characteristics and are invariant with respect to a change of scale, without loss of generality . The probability of extreme left outliers is . In order to obtain the quantile function of and to depict the dependence of on we have used the R software [15]. The results are plotted in Figure 1, left. The conclusion that only in case we have corresponds to the one made by Klugman et al. (2012) [8] based on the hazard rate function.

Example 3. Normal distribution.

Consider , and . Without loss of generality and . Due to its symmetry and . Then . Therefore
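The Normal case can be evaluated numerically with the Python standard library. This is a sketch assuming the standard Normal law (justified by the invariance under shift and scale) and the 3 IQR fence; the variable names are chosen here for illustration.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)
q3 = std_normal.inv_cdf(0.75)        # third quartile, about 0.6745
iqr = 2 * q3                         # symmetry gives Q1 = -Q3
fence = q3 + 3 * iqr                 # equals 7 * q3
p_right = std_normal.cdf(-fence)     # P(X > fence) = P(X < -fence) by symmetry
print(fence, p_right)
```

The probability is of order one in a million, so extreme outliers in the sense used here are practically never observed in Normal samples of moderate size.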

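 n    probability
 1    0.0453
 2    0.0146
 3    0.0064
 4    0.0033
 5    0.0019
 6    0.0012
 7    0.0008
 8    0.0006
 9    0.0004
10    0.0003

Table 1: The dependence of the probabilities of left and right extreme outliers of the t-distribution on n. By symmetry the two probabilities coincide.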

Example 4. t-distribution. Assume and . Using the R software [15] one can obtain , and . The values of are presented in Table 1.
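Two entries of Table 1 can be cross-checked without special software, because the Student t-distribution with 1 and 2 degrees of freedom has a closed-form c.d.f. The sketch assumes the 3 IQR fence and uses the known third quartiles 1 (for n = 1, the Cauchy law) and sqrt(2/3) (for n = 2); the helper names are hypothetical.

```python
import math

def cauchy_cdf(x):
    """C.d.f. of the t-distribution with 1 degree of freedom."""
    return 0.5 + math.atan(x) / math.pi

def t2_cdf(x):
    """C.d.f. of the t-distribution with 2 degrees of freedom."""
    return 0.5 + x / (2.0 * math.sqrt(2.0 + x * x))

def right_extreme_prob(cdf, q3):
    """P(X > Q3 + 3*IQR) for a symmetric law with Q1 = -Q3."""
    fence = q3 + 3 * (2 * q3)
    return 1.0 - cdf(fence)

p_t1 = right_extreme_prob(cauchy_cdf, 1.0)
p_t2 = right_extreme_prob(t2_cdf, math.sqrt(2.0 / 3.0))
print(round(p_t1, 4), round(p_t2, 4))
```

Under these assumptions the value for n = 2 rounds to the tabulated 0.0146, and the value for n = 1 agrees with the table to within one unit in the fourth decimal place.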

Example 5. Pareto distribution. Let , and

In this case , , , . Then

The plot of the last characteristic with respect to is presented on Figure 1, left.
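The dependence on the shape parameter can be tabulated directly. The sketch below assumes the Pareto parametrization P(X > x) = x^(-alpha) for x >= 1 (the scale is set to 1, which is harmless given the scale invariance) and the 3 IQR fence; both the parametrization and the function name are assumptions made for this illustration.

```python
def pareto_extreme_right_tail(alpha):
    """P(X > Q3 + 3*IQR) when P(X > x) = x**(-alpha) for x >= 1 (assumed form)."""
    q1 = (4.0 / 3.0) ** (1.0 / alpha)   # survival function equals 3/4 at q1
    q3 = 4.0 ** (1.0 / alpha)           # survival function equals 1/4 at q3
    fence = q3 + 3.0 * (q3 - q1)
    return fence ** (-alpha)

for alpha in (0.5, 1.0, 2.0, 5.0, 50.0):
    print(alpha, pareto_extreme_right_tail(alpha))
```

For alpha = 1 the quartiles are 4/3 and 4, the fence is 12 and the probability is 1/12. The probability decreases as alpha grows, which matches the intuition that a smaller alpha means a heavier tail.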

Example 6. Fréchet distribution. Assume , , and

It is well known that , , , . Then

The dependence of on is depicted in Figure 1, left. It corresponds to the well known result that the Fréchet and Pareto tails are very similar.

Example 7. Weibull negative distribution. Consider , , , and

The corresponding quantile function is , , , and . Then

Figure 1, right represents the dependence of on .

Example 8. Gumbel distribution. Let , , and

In this case , , , and . Then

This means that, in the context of these characteristics, the Exponential distribution has a heavier right tail than the Gumbel one. Moreover, it has an approximately three times higher chance of producing right extreme outliers.
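The comparison between the Gumbel and the Exponential law can be reproduced numerically. This sketch assumes the standard Gumbel c.d.f. F(x) = exp(-exp(-x)) (location 0 and scale 1, which is harmless given the invariance under increasing affine maps) and the 3 IQR fence; for the Exponential law the same rule gives 1/108.

```python
import math

def gumbel_quantile(p):
    """Quantile function of the standard Gumbel law F(x) = exp(-exp(-x))."""
    return -math.log(-math.log(p))

q1, q3 = gumbel_quantile(0.25), gumbel_quantile(0.75)
fence = q3 + 3 * (q3 - q1)
p_gumbel = -math.expm1(-math.exp(-fence))   # 1 - F(fence), computed stably
p_exponential = 1 / 108                     # same fence rule for the Exponential law
print(p_gumbel, p_exponential / p_gumbel)
```

Under these assumptions the ratio of the two probabilities lies between three and four, in line with the statement that the Exponential law is noticeably more prone to right extreme outliers.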

Example 9. Hill-horror distribution. Assume and

Then , ,

Therefore . The dependence of the values of on is presented in Figure 1, left. We observe that, within the considered types, this distribution has the "heaviest tail".

3 The Extremal Index Estimators

Suppose $X$ is a r.v. with c.d.f. $F$ with regularly varying tail. More precisely, there exists $\alpha > 0$ such that for all $x > 0$,

$$\lim_{t \to \infty} \frac{1 - F(tx)}{1 - F(t)} = x^{-\alpha}.$$


Jordanova et al. (2017) [11] use the characteristics and obtain five new distribution sensitive statistics for this parameter. Here we introduce one more estimator. It is based on the assumption that the observed r.v. has the Hill-horror distribution. Its rate of convergence is compared with that of the corresponding Hill [1], t-Hill [2, 3], Pickands [4] and Dekkers-Einmahl-de Haan [5] estimators. Throughout the section and are correspondingly the first and the third empirical quartiles of the sample and is the empirical interquartile range. The first group of two estimators that we consider is the most appropriate if the observed r.v. is Pareto distributed.

1. Assume , Then , therefore

Having a sample of observations, if the corresponding statistic is

where is the number of extreme outliers divided by the sample size .

The quantiles are as useful as the cumulative distribution function. The quantile matching procedure is well known; its description can be found e.g. in Klugman et al. (2012) [8]. Analogously to the generalized method of moments, one can construct a generalized quantile matching procedure. The second estimator is based on the fact that the ratio of the quartiles of the Pareto distribution is

It is invariant with respect to a scale change, and given it has the form

Our empirical study shows that outperforms the other estimators discussed here when the observed r.v. is Pareto distributed.
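The quartile-ratio idea can be illustrated as follows. The sketch assumes the Pareto parametrization P(X > x) = (scale / x)^alpha for x >= scale, for which the ratio of the third to the first quartile equals 3^(1/alpha) and is free of the scale; solving for alpha and plugging in empirical quartiles gives a quantile matching estimate. The function names, the seed and the sample size are hypothetical choices for this illustration, not the settings of the simulation study.

```python
import math
import random

def quantile_type6(xs_sorted, p):
    """Empirical quantile with linear interpolation at position (n + 1) * p."""
    n = len(xs_sorted)
    h = (n + 1) * p
    k = math.floor(h)
    if k < 1:
        return xs_sorted[0]
    if k >= n:
        return xs_sorted[-1]
    return xs_sorted[k - 1] + (h - k) * (xs_sorted[k] - xs_sorted[k - 1])

def pareto_alpha_from_quartiles(sample):
    """Solve Q3 / Q1 = 3**(1/alpha) for alpha using empirical quartiles."""
    xs = sorted(sample)
    q1, q3 = quantile_type6(xs, 0.25), quantile_type6(xs, 0.75)
    return math.log(3.0) / math.log(q3 / q1)

rng = random.Random(2018)
true_alpha, scale, n = 2.0, 5.0, 20000
# Inverse-transform sampling: X = scale * U**(-1/alpha) with U uniform on (0, 1].
sample = [scale * (1.0 - rng.random()) ** (-1.0 / true_alpha) for _ in range(n)]
alpha_hat = pareto_alpha_from_quartiles(sample)
print(alpha_hat)
```

Because only the ratio of the quartiles enters, multiplying the whole sample by a positive constant leaves the estimate unchanged.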

2. The estimators from the second group are the most appropriate when the observed r.v. is Fréchet distributed. The equality leads us to the estimator

In order to obtain the second estimator we consider the fraction

Using the generalized quantile matching procedure (see Klugman et al. (2012) [8]) we express and replace the theoretical quartiles with the corresponding empirical ones. Finally we obtain

Given small samples and a Fréchet or Hill-horror observed r.v., seems to be very appropriate. It exceeds the quality of the other estimators discussed here. See Figure 3 and Figure 4.

3. Suppose now that the observed r.v. has a distribution whose tail is close to that of the Hill-horror distribution. In Embrechts et al. [6] this distribution is defined via its quantile function ,

To the best of the authors' knowledge the following estimator is new. It is obtained using the relation between characteristic of the Hill-horror distribution and . Given and we can express and obtain

In the next section we show that, within the considered set of distributions, together with are the only appropriate estimators for , given a small sample of observations on a Hill-horror distributed r.v. See Figure 4.

It is easy to see that

Jordanova et al. (2017) [11] replace the theoretical quartiles with the corresponding empirical ones. In this way, using the generalized quantile matching procedure, the authors define the following estimator

The empirical results show that, given a sample of observations on a Hill-horror distributed r.v., the rate of convergence of this estimator increases when decreases. However, according to our observations, is too distribution sensitive and not robust.

4 Simulation Study and Comparison With Alternative Estimators

Figure 2: Comparison between the rates of convergence of , , , , , (left) and Hill, t-Hill and Dekkers-Einmahl-de Haan (right) estimators for Pareto(1, 0.5) observed r.v.

In this section we explore the behaviour of the considered estimators. Using the functions implemented in R (2018) [15], we have simulated samples of independent observations on r.v.s with one of the following three probability laws: Pareto, Fréchet or Hill-horror. Then, for any fixed and for any fixed sample, we have calculated the , , , , and estimators, as well as the Hill [1], t-Hill [2, 3], Pickands [4] and Dekkers-Einmahl-de Haan [5] estimators. Finally we have averaged the corresponding values over the considered . Because the Hill, t-Hill and Dekkers-Einmahl-de Haan estimators depend not only on the sample size but also on the number of order statistics included in their calculation, in each of these three cases their results are plotted in a separate figure. We have excluded the Pickands estimator from our plots because the considered sample size turned out to be insufficient to observe its good properties.
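The dependence of the Hill estimator on the number k of upper order statistics can be seen in a small experiment. The sketch below uses the standard form of the Hill estimator (the mean of the log-excesses of the k largest observations over the (k+1)-th largest) and returns its reciprocal as an estimate of alpha. The Pareto parametrization P(X > x) = x^(-alpha), the seed, the sample size and the values of k are assumptions for illustration and are not the settings used for the figures.

```python
import math
import random

def hill_tail_index(sample, k):
    """Hill estimate of alpha from the k largest observations (positive data)."""
    xs = sorted(sample)
    n = len(xs)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < n")
    threshold = xs[n - k - 1]                       # the (k+1)-th largest value
    mean_log_excess = sum(math.log(x / threshold) for x in xs[n - k:]) / k
    return 1.0 / mean_log_excess

rng = random.Random(7)
true_alpha, n = 2.0, 5000
# Inverse-transform sampling from P(X > x) = x**(-alpha), x >= 1.
sample = [(1.0 - rng.random()) ** (-1.0 / true_alpha) for _ in range(n)]
for k in (50, 200, 500):
    print(k, hill_tail_index(sample, k))
```

For exactly Pareto data every k gives a reasonable estimate and larger k reduces the variance; for other heavy-tailed laws the choice of k trades bias against variance, which is why such results are usually shown as a function of k.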

Figure 2, a) depicts the dependence of the values of , , , , and their empirical 95% confidence intervals on the sample size. The plot of is skipped because it fluctuates too much. The plots of the Hill, t-Hill and Dekkers-Einmahl-de Haan estimators, together with their empirical 95% confidence intervals, are given in Figure 2, b). Let us note that in the second figure the sample size is fixed. It is , and only the number of order statistics changes. Therefore this figure should be compared only with the points on Figure 2, a). We observe that , and behave very similarly to the well known estimators. Of course in this case, when the observed r.v. has an exact Pareto distribution, the Hill estimator outperforms the others.

If the observed r.v. is Fréchet() distributed, then the results from our simulation study, depicted in Figure 3, a), show that only the estimator seems to be unbiased. The biggest advantage of this estimator is that in this case and for the considered sample sizes the Hill, t-Hill, Pickands and Dekkers-Einmahl-de Haan estimators are not appropriate because of their slower rate of convergence.

Figure 3: Comparison between the rates of convergence of , , , , , , (left) and Hill, t-Hill and Dekkers-Einmahl-de Haan (right) estimators for Fréchet() observed r.v.

The case when the observed r.v. is of Hill-horror type is the most difficult for estimation. Here we have simulated such samples for . The Hill, t-Hill, Pickands and Dekkers-Einmahl-de Haan statistics are not appropriate because the sample size is too small. See e.g. Embrechts et al. (2013) [6] and Figure 4, b). The plots in Figure 4, a) show that only the and estimators have a relatively fast rate of convergence and seem to be appropriate in this case.

Figure 4: Comparison between the rates of convergence of , , , , , , (left) and Hill, t-Hill and Dekkers-Einmahl-de Haan (right) estimators for Hill-horror() observed r.v.

5 Conclusions

The introduced , and characteristics and their estimators are appropriate for use in preliminary statistical analysis. They can help practitioners to find the classes of probability laws closest to the distribution of the observed r.v. Within that family the tail index needs further estimation. That is, once the most appropriate parametric family is fixed the proposed estimators work well, but they are not appropriate in general non-parametric situations. For example, if the observed r.v. has a Pareto distribution then it is well known that the Hill estimator is the best one. Here we propose the , , estimators as its alternatives. When follows the Fréchet type, has the best properties. If is close to the Hill-horror distribution, and have a fast rate of convergence and therefore can be very useful for working with relatively small sample sizes. However, the main disadvantage of all these estimators is that they are too distribution sensitive. This means that their good properties disappear if the distributional type is not correctly determined. Here the characteristics of the heaviness of the tails of the distributions , , , , and can be useful.


The authors are grateful to the bilateral projects Bulgaria - Austria, 2016-2019, Feasible statistical modelling for extremes in ecology and finance, Contract number 01/8, 23/08/2017, the project RD-08-125/06.02.2018 from the Scientific Research Fund in Shumen University, and the project 80-10-222/04.05.2018 from the Scientific Funds of Sofia University.


  • (1) B. M. Hill, et al., A simple general approach to inference about the tail of a distribution, The Annals of Statistics 3 (5) (1975) 1163–1174.
  • (2) M. Stehlík, R. Potockỳ, H. Waldl, Z. Fabián, On the favorable estimation for fitting heavy tailed data, Computational Statistics 25 (3) (2010) 485–503.
  • (3) P. K. Jordanova, E. I. Pancheva, Weak asymptotic results for t-Hill estimator, Comptes rendus de l’académie bulgare des sciences 65 (12) (2012) 1649–1656.
  • (4) J. Pickands III, Statistical inference using extreme order statistics, The Annals of Statistics (1975) 119–131.
  • (5) A. L. Dekkers, J. H. Einmahl, L. De Haan, A moment estimator for the index of an extreme-value distribution, The Annals of Statistics (1989) 1833–1855.
  • (6) P. Embrechts, C. Klüppelberg, T. Mikosch, Modelling extremal events: for insurance and finance, Vol. 33, Springer Science & Business Media, 2013.
  • (7) S. I. Resnick, Heavy-tail phenomena: probabilistic and statistical modeling, Springer Science & Business Media, 2007.
  • (8) S. A. Klugman, H. H. Panjer, G. E. Willmot, Loss models: from data to decisions, Vol. 715, John Wiley & Sons, 2012.
  • (9) J. W. Tukey, Exploratory data analysis, Vol. 2, Reading, Mass., 1977.
  • (10) R. McGill, J. W. Tukey, W. A. Larsen, Variations of box plots, The American Statistician 32 (1) (1978) 12–16.
  • (11) P. K. Jordanova, M. P. Petkova, Measuring heavy-tailedness of distributions, in: AIP Conference Proceedings, Vol. 1910, AIP Publishing, 2017, p. 060002.
  • (12) R. J. Hyndman, Y. Fan, Sample quantiles in statistical packages, The American Statistician 50 (4) (1996) 361–365.
  • (13) E. Langford, Quartiles in elementary statistics, Journal of Statistics Education 14 (3).
  • (14) E. Parzen, Nonparametric statistical data modeling, Journal of the American statistical association 74 (365) (1979) 105–121.
  • (15) The R project for statistical computing (2018). URL https://www.r-project.org/
  • (16) B. C. Arnold, N. Balakrishnan, H. N. Nagaraja, A first course in order statistics, Vol. 54, SIAM, 1992.
  • (17) N. V. Smirnov, Limit distributions for the terms of a variational series, Trudy Matematicheskogo Instituta imeni VA Steklova 25 (1949) 3–60.
  • (18) E. Pancheva, Limit theorems for extreme order statistics under non-linear transformations, Stability Problems for Stochastic Models. Lecture Notes in Statistics 1155 248–309.

  • (19) E. I. Pancheva, A. Gacovska, Asymptotic behavior of central order statistics under monotone normalization, Theory of Probability & Its Applications 58 (1) (2014) 107–120.
  • (20) H. Barakat, A. Omar, O. Khaled, A new flexible extreme value model for modeling the extreme value data, with an application to environmental data, Statistics & Probability Letters 130 (2017) 25–31.
  • (21) W. J. Dixon, Analysis of extreme values, The Annals of Mathematical Statistics 21 (4) (1950) 488–506.
  • (22) W. Dixon, Processing data for outliers, Biometrics 9 (1) (1953) 74–89.
  • (23) F. E. Grubbs, Procedures for detecting outlying observations in samples, Technometrics 11 (1) (1969) 1–21.
  • (24) J. Irwin, On a criterion for the rejection of outlying observations, Biometrika 17 (3/4) (1925) 238–250.
  • (25) A. McKay, The distribution of the difference between the extreme observation and the sample mean in samples of n from a normal universe, Biometrika 27 (3/4) (1935) 466–471.
  • (26) K. Nair, The distribution of the extreme deviate from the sample mean and its studentized form, Biometrika 35 (1/2) (1948) 118–144.