Logarithm of ratios of two order statistics and regularly varying tails

Here we suppose that the observed random variable has a cumulative distribution function F with regularly varying tail, i.e. 1 − F ∈ RV_{−α}, α > 0. Using the results about exponential order statistics, we investigate logarithms of ratios of two order statistics of a sample of independent observations on a Pareto distributed random variable with parameter α. Short explicit formulae for their mean and variance are obtained. Then we transform this function so as to obtain an unbiased, asymptotically efficient, and asymptotically normal estimator for α. Finally, we simulate Pareto samples and show that in the considered cases the proposed estimator outperforms the well-known Hill, t-Hill, Pickands, and Dekkers-Einmahl-de Haan estimators.


1 History of the Problem

The usefulness of regularly varying (RV) functions in economics seems to have been discussed for the first time during the modeling of the wealth in our society by the Pareto distribution, named after Vilfredo Pareto (1897). J. Karamata (1933) provides their definition and integral representation. Later on, the Convergence to types theorem, proved by R. A. Fisher and L. H. C. Tippett (1928), and B. V. Gnedenko (1948), plays a key role for their future applications. It is well known that this class of distributions describes very well the domain of attraction of stable distributions (see Mandelbrot (1960) [15]) and the max-domain of attraction of the Fréchet distribution (see M. Fréchet (1927)). Laurens de Haan (1970) and co-authors [3, 4, 5] develop the main machinery for working with cumulative distribution functions (c.d.fs.) with such tail behaviour. Let us remind that the c.d.f. F has regularly varying right tail with parameter α > 0, if

 lim_{t→∞} [1 − F(xt)] / [1 − F(t)] = x^{−α}, ∀ x > 0.

After their works the topic spread over the world very fast and many estimators of the index of regular variation have been proposed, see e.g. Hill (1975) [10], Pickands (1975) [19], Dekkers-Einmahl-de Haan (1989) [6], and t-Hill (Stehlik and co-authors (2010) [24, 9, 25], and Pancheva and Jordanova (2012) [11, 14]), among others.
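For later comparison, the classical Hill estimator of 1/α can be sketched in a few lines of Python. This is only an illustration under our own naming and setup (it is not code from the paper); it averages log-excesses over the k largest observations.

```python
import math
import random

def hill(sample, k):
    """Hill (1975) estimator of 1/alpha from the k upper order statistics:
    (1/k) * sum_{i=1}^{k} log( X_{(n-i+1)} / X_{(n-k)} )."""
    x = sorted(sample)
    n = len(x)
    return sum(math.log(x[n - i] / x[n - k - 1]) for i in range(1, k + 1)) / k

# Exact Pareto(alpha, 1) data via inverse transform (illustrative setup):
random.seed(0)
alpha = 2.0
sample = [(1.0 - random.random()) ** (-1.0 / alpha) for _ in range(5000)]
print(hill(sample, k=500))  # close to 1/alpha = 0.5
```

For exact Pareto data the choice of k is uncritical; for distributions that are only regularly varying in the tail, k trades bias against variance.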

Here we show the usefulness of functions of two central order statistics in estimating the parameter of regular variation. Under very general settings we show that the logarithm of the ratio of two specific central order statistics is a weakly consistent and asymptotically normal estimator of the logarithm of the ratio of the corresponding theoretical quantiles. Then we use these functions and obtain our estimator for α. Its main advantage is that it is very flexible and provides useful accuracy for mid-range and small samples. The Pareto case, considered in Section 3, motivates our investigation. First we define a biased form of the estimator. Then, using results about order statistics, which could be seen e.g. in Nevzorov (2001) [18],

we obtain explicit formulae for its mean and variance. This allows us to define an unbiased correction which is asymptotically efficient. Then we prove asymptotic normality and obtain large sample confidence intervals. Our simulation study depicts the advantages of the considered estimators over the Hill, t-Hill, and Dekkers-Einmahl-de Haan estimators. The paper finishes with some concluding remarks.

Throughout the paper we assume that X1, X2, ..., Xn are independent observations on a random variable (r.v.) X, and denote by X(1,n) ≤ X(2,n) ≤ ... ≤ X(n,n) the corresponding increasing order statistics.

 H_{n,m} = 1 + 1/2^m + 1/3^m + ... + 1/(n−1)^m + 1/n^m, n = 1, 2, ...,

denotes the n-th generalized harmonic number of power m, and H_n := H_{n,1} is the well-known n-th harmonic number.
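The generalized harmonic numbers enter all normalizations below; a minimal Python helper (the function name is ours) is:

```python
def harmonic(n: int, m: int = 1) -> float:
    """Return H_{n,m} = 1 + 1/2^m + ... + 1/n^m (H_{n,1} is the usual H_n)."""
    return sum(1.0 / j**m for j in range(1, n + 1))

print(harmonic(4))      # H_4 = 1 + 1/2 + 1/3 + 1/4 = 25/12 ≈ 2.0833
print(harmonic(3, 2))   # H_{3,2} = 1 + 1/4 + 1/9 = 49/36 ≈ 1.3611
```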

The main objects of interest at this point are the statistics

 Q_{k,s} := log[ X(ks,(s+1)k−1) / X(k,(s+1)k−1) ] / (H_{ks−1} − H_{k−1}),  Q*_{k,s} := log[ X(ks,(s+1)k−1) / X(k,(s+1)k−1) ] / log(s),  s = 2, 3, ...

The estimator Q*_{k,s} is obtained in Jordanova et al. [13] via a quantile matching procedure. For the latter procedure see e.g. Sgouropoulos et al. (2015) [21].
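As an illustration, a Python sketch of Q_{k,s} and Q*_{k,s} (our own naming; the paper's 1-based order statistic X(r, n), the r-th smallest observation, sits at index r − 1 of the sorted sample):

```python
import math
import random

def harmonic(n, m=1):
    """H_{n,m} = 1 + 1/2^m + ... + 1/n^m."""
    return sum(1.0 / j**m for j in range(1, n + 1))

def q_estimators(sample, k, s):
    """Compute (Q_{k,s}, Q*_{k,s}) from a sample of size (s+1)*k - 1."""
    assert len(sample) == (s + 1) * k - 1, "sample size must be (s+1)k - 1"
    x = sorted(sample)
    log_ratio = math.log(x[k * s - 1] / x[k - 1])
    q = log_ratio / (harmonic(k * s - 1) - harmonic(k - 1))
    q_star = log_ratio / math.log(s)
    return q, q_star

# For Pareto data (Section 3) both statistics estimate 1/alpha:
random.seed(1)
alpha, k, s = 2.0, 200, 3
sample = [(1.0 - random.random()) ** (-1.0 / alpha) for _ in range((s + 1) * k - 1)]
q, q_star = q_estimators(sample, k, s)
print(q, q_star)  # both close to 1/alpha = 0.5
```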

Throughout the paper "d→" denotes convergence in distribution, "P→" denotes convergence in probability, and "d=" denotes equality in distribution.

2 General Results

In 1933 - 1949 Smirnoff [22] shows that in the case of central order statistics, more precisely for k = k(n) → ∞ and n → ∞ such that k/n → p ∈ (0, 1) and f[F←(p)] > 0, the suitably normalized order statistic X(k,n) has asymptotically standard normal distribution. Moreover, it seems that he has similar results about bivariate order statistics. They could be seen e.g. in Arnold et al. (1992) [1], p. 226, Mosteller (1946) [16], p. 338, Nair [17], p. 330, or Wilks [27], among others. The multivariate delta method is a very powerful technique for obtaining confidence intervals in such cases. In the next theorem we apply it and obtain the limiting distribution of the logarithmic differences of central order statistics.

Smirnoff’s theorem. Assume 0 < p1 < p2 < 1, k1 = k1(n) and k2 = k2(n) are such that k1/n → p1 and k2/n → p2 as n → ∞, and f[F←(pi)] ∈ (0, ∞), i = 1, 2. Then

 √n ( X(k1,n) − F←(p1), X(k2,n) − F←(p2) ) d→ (θ1, θ2), (θ1, θ2) ∈ N(0, V),

where the covariance matrix

 V = ( p1(1−p1)/f²[F←(p1)]                p1(1−p2)/{f[F←(p1)] f[F←(p2)]}
       p1(1−p2)/{f[F←(p1)] f[F←(p2)]}    p2(1−p2)/f²[F←(p2)] ).

We apply this theorem together with the multivariate delta method and obtain the asymptotic normality of the estimators discussed in this paper.

Theorem 1. Consider a sample of n = (s+1)k − 1, s = 2, 3, ..., independent observations on a r.v. X with c.d.f. F and p.d.f. f. If there exist finite and positive f[F←(1/(s+1))] and f[F←(s/(s+1))], then for k → ∞,

 T_{k,s} := √((s+1)k−1) [ log( X(ks,(s+1)k−1) / X(k,(s+1)k−1) ) − log( F←(s/(s+1)) / F←(1/(s+1)) ) ] d→ N(0; V). (1)

The variance in (1) is V = 1/(s+1)² ( s/a²_{F,s} − 2/(a_{F,s} b_{F,s}) + s/b²_{F,s} ), where a_{F,s} = F←(1/(s+1)) f[F←(1/(s+1))], and b_{F,s} = F←(s/(s+1)) f[F←(s/(s+1))].

Proof: We will apply Smirnoff’s theorem for k1 = k, k2 = ks, n = (s+1)k − 1, and the multivariate delta method.

By the assumptions, the conditions f[F←(1/(s+1))] ∈ (0, ∞) and f[F←(s/(s+1))] ∈ (0, ∞) are satisfied. For k → ∞ we have n = (s+1)k − 1 → ∞, k1/n = k/[(s+1)k−1] → 1/(s+1) =: p1, and k2/n = ks/[(s+1)k−1] → s/(s+1) =: p2; therefore Smirnoff’s theorem on the joint asymptotic normality of the central order statistics says that

 √((s+1)k−1) ( X(k,(s+1)k−1) − F←(1/(s+1)), X(ks,(s+1)k−1) − F←(s/(s+1)) ) d→ N[(0, 0); D], k → ∞,

where the asymptotic covariance matrix of this bivariate distribution is

 D = 1/(s+1)² ( s/f²[F←(1/(s+1))]                        1/{f[F←(1/(s+1))] f[F←(s/(s+1))]}
                1/{f[F←(1/(s+1))] f[F←(s/(s+1))]}        s/f²[F←(s/(s+1))] ),

and the asymptotic correlation between these two order statistics is 1/s.

Consider the function g(x, y) := log(y/x) = log(y) − log(x). For x > 0 and y > 0 it is continuously differentiable.

The Jacobian of the transformation is

 J := [∂g(x,y)/∂x, ∂g(x,y)/∂y] = (−1/x, 1/y).

The asymptotic mean is

 lim_{k→∞} E log( X(ks,(s+1)k−1) / X(k,(s+1)k−1) ) = g[F←(1/(s+1)), F←(s/(s+1))] = log( F←(s/(s+1)) / F←(1/(s+1)) ).

Now we apply the multivariate delta method, which could be seen e.g. in Sobel (1982) [23], and obtain that the asymptotic variance of T_{k,s} is

 V := J × D × J′
   = 1/(s+1)² [ (−1/x, 1/y) ( s/f²(x)  1/(f(x)f(y)) ; 1/(f(x)f(y))  s/f²(y) ) (−1/x, 1/y)′ ] |_{x=F←(1/(s+1)), y=F←(s/(s+1))}
   = 1/(s+1)² [ s/(x²f²(x)) − 2/(xy f(x)f(y)) + s/(y²f²(y)) ] |_{x=F←(1/(s+1)), y=F←(s/(s+1))}
   = 1/(s+1)² ( s/a²_{F,s} − 2/(a_{F,s} b_{F,s}) + s/b²_{F,s} ).

Q.E.D.

Slutsky’s theorem about continuous functions, together with the definition of convergence in probability, an application of the quantile transform, and Smirnoff’s theorem about a.s. convergence of empirical quantiles to the corresponding theoretical ones, lead us to the following result. Without loss of generality we consider only a.s. positive r.vs.; however, the result could easily be transformed for a.s. negative r.vs. or for r.vs. taking values of both signs.

Theorem 2. Assume P(X > 0) = 1. If F←(1/(s+1)) ∈ (0, ∞) and F←(s/(s+1)) ∈ (0, ∞), then for all s = 2, 3, ...,

 log( X(ks,(s+1)k−1) / X(k,(s+1)k−1) ) P→ log[ F←(s/(s+1)) / F←(1/(s+1)) ], k → ∞. (2)

3 Pareto Case

In this section we assume that X1, X2, ..., Xn are independent observations on a r.v. X with Pareto c.d.f.

 F_X(x) = { 0, x ≤ δ; 1 − (δ/x)^α, x > δ },  α > 0, δ > 0. (3)

Briefly we will denote this by X ∈ Pareto(α, δ). Different generalizations of this distribution could be seen in Arnold (2015) [2]. The number α is called the "index of regular variation of the tail of the c.d.f.". It determines the tail behaviour of the c.d.f. See e.g. de Haan and Ferreira [5], Resnick [20], or Jordanova [12].
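Since F in (3) has the closed-form inverse F←(p) = δ(1 − p)^{−1/α}, Pareto samples can be generated by the inverse transform; a small Python sketch (function name ours):

```python
import random

def rpareto(n, alpha, delta=1.0, rng=random):
    """Draw n i.i.d. observations from the Pareto c.d.f. (3) by inverse
    transform: F_inv(p) = delta * (1 - p) ** (-1 / alpha)."""
    return [delta * (1.0 - rng.random()) ** (-1.0 / alpha) for _ in range(n)]

random.seed(0)
xs = rpareto(100000, alpha=1.5, delta=2.0)
# Tail check: P(X > 2 * delta) = (1/2) ** alpha for Pareto(alpha, delta).
frac = sum(x > 2 * 2.0 for x in xs) / len(xs)
print(min(xs), frac)  # min exceeds delta = 2.0; frac ≈ 0.5 ** 1.5 ≈ 0.354
```

Using 1 − rng.random() keeps the base strictly positive, so the negative power never divides by zero.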

Denote by η ∈ Exp(λ), λ > 0, the fact that the r.v. η has c.d.f.

 F_η(x) = { 0, x ≤ 0; 1 − e^{−λx}, x > 0 }. (4)

The results in the following theorem allow us later on, in Corollaries 1 and 2, to obtain unbiased, consistent, and asymptotically efficient estimators of the parameter α.

Theorem 3. Assume X(1,n) ≤ X(2,n) ≤ ... ≤ X(n,n) are the order statistics of n independent observations on a r.v. X ∈ Pareto(α, δ), α > 0, δ > 0, and 1 ≤ i < j ≤ n are integers.

i)

Denote by ρ a Beta distributed r.v. with parameters n − j + 1 and j − i. Then

 log( X(j,n) / X(i,n) ) d= −(1/α) log(ρ) d= E(j−i, n−i) d= (1/α) E*(j−i, n−i),

where E(j−i, n−i) is the (j−i)-th order statistic in a sample of n − i independent observations on i.i.d. exponential r.vs. with parameter α, and E*(j−i, n−i) is the (j−i)-th order statistic of a sample of n − i independent observations on an exponentially distributed r.v. with parameter 1. Its probability density function is

 f_{log(X(j,n)/X(i,n))}(x) = α (n−i)! / [(j−i−1)! (n−j)!] (1 − e^{−αx})^{j−i−1} e^{−αx(n−j+1)}, x > 0.
ii)

 E[log( X(j,n) / X(i,n) )] = (1/α)(H_{n−i} − H_{n−j})

and

 D[log( X(j,n) / X(i,n) )] = (1/α²)(H_{n−i,2} − H_{n−j,2}).

Proof: Let us fix integers 1 ≤ i < j ≤ n. Because the exponential function is strictly increasing, it is well known that the probability quantile transform entails

 ( X(1,n)/δ, X(2,n)/δ, ..., X(n,n)/δ ) d= ( e^{E(1,n)}, e^{E(2,n)}, ..., e^{E(n,n)} ),

where E(1,n) ≤ E(2,n) ≤ ... ≤ E(n,n) are the order statistics of n independent identically distributed (i.i.d.) r.vs. with Exp(α) distribution. Then, because of the multiplicative property of the exponential distribution,

 ( X(1,n)/δ, X(2,n)/δ, ..., X(n,n)/δ ) d= ( e^{(1/α)E*(1,n)}, e^{(1/α)E*(2,n)}, ..., e^{(1/α)E*(n,n)} ),

where E*(1,n) ≤ E*(2,n) ≤ ... ≤ E*(n,n) are the order statistics of n i.i.d. r.vs. with Exp(1) distribution. See e.g. de Haan and Ferreira [5]. Denote the logarithm with base e by log. Because log is an increasing function,

 log( X(j,n) / X(i,n) ) d= (1/α)(E*(j,n) − E*(i,n)) d= (1/α) E*(j−i, n−i).

The last equality could be seen e.g. in de Haan and Ferreira [5] or Arnold et al. (1992) [1].

i) Follows by the last equality, the well-known relation between the exponential and Beta distributions, and the formula for the probability density function (p.d.f.) of the order statistics of a sample of i.i.d. r.vs. See e.g. p. 7 in Nevzorov [18].

ii) The mean and the variance of exponential order statistics are very well investigated. See e.g. Nevzorov [18], p. 23. Using his results and the main properties of the expectation and the variance we obtain:

 E[log( X(j,n) / X(i,n) )] = E[(1/α) E*(j−i, n−i)] = (1/α) E[E*(j−i, n−i)] = (1/α)(H_{n−i} − H_{n−j}),
 D[log( X(j,n) / X(i,n) )] = (1/α²) D[E*(j−i, n−i)] = (1/α²)(H_{n−i,2} − H_{n−j,2}).

Q.E.D.
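The moment formulae of Theorem 3 ii) are easy to check by Monte Carlo; a Python sketch with arbitrarily chosen illustrative parameters (δ cancels in the ratio, so we take δ = 1):

```python
import math
import random

def harmonic(n, m=1):
    return sum(1.0 / j**m for j in range(1, n + 1))

random.seed(42)
alpha, n, i, j = 2.0, 20, 5, 15
reps = 20000

vals = []
for _ in range(reps):
    x = sorted((1.0 - random.random()) ** (-1.0 / alpha) for _ in range(n))
    vals.append(math.log(x[j - 1] / x[i - 1]))  # log( X(j,n) / X(i,n) )

mean = sum(vals) / reps
var = sum((v - mean) ** 2 for v in vals) / (reps - 1)
print(mean, (harmonic(n - i) - harmonic(n - j)) / alpha)          # ≈ equal
print(var, (harmonic(n - i, 2) - harmonic(n - j, 2)) / alpha**2)  # ≈ equal
```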

The next corollary is useful when working with finite samples. We obtain that for any k = 2, 3, ... and for fixed s = 2, 3, ... the estimators Q_{k,s} are unbiased for 1/α. The accuracy of these estimators in that case is explicitly calculated. However, these estimators are applicable also for large enough samples, because for k → ∞ they are weakly consistent and asymptotically efficient.

Corollary 1. Assume X(1,(s+1)k−1) ≤ ... ≤ X((s+1)k−1,(s+1)k−1) are the order statistics of (s+1)k − 1 independent observations on a r.v. X ∈ Pareto(α, δ), α > 0, δ > 0. Then, for all k = 2, 3, ... and s = 2, 3, ...,

i)

Denote by ρ a Beta distributed r.v. with parameters k and (s−1)k. Then

 Q_{k,s} d= −log(ρ) / [α(H_{ks−1} − H_{k−1})] d= E((s−1)k, ks−1) / (H_{ks−1} − H_{k−1}) d= E*((s−1)k, ks−1) / [α(H_{ks−1} − H_{k−1})],

where E((s−1)k, ks−1) is the (s−1)k-th order statistic in a sample of ks − 1 independent observations on i.i.d. exponential r.vs. with parameter α, and E*((s−1)k, ks−1) is the (s−1)k-th order statistic of a sample of ks − 1 independent observations on an exponentially distributed r.v. with parameter 1. The probability density function of Q_{k,s} is

 f_{Q_{k,s}}(x) = α(H_{ks−1} − H_{k−1}) (ks−1)! / {[(s−1)k−1]! (k−1)!} (1 − e^{−α(H_{ks−1}−H_{k−1})x})^{(s−1)k−1} e^{−kα(H_{ks−1}−H_{k−1})x}, x > 0.
ii)

 EQ_{k,s} = 1/α

and

 DQ_{k,s} = (H_{ks−1,2} − H_{k−1,2}) / [α²(H_{ks−1} − H_{k−1})²].

iii)

For all ε > 0,

 P[ |Q_{k,s} − 1/α| > ε ] ≤ (H_{ks−1,2} − H_{k−1,2}) / [α²ε²(H_{ks−1} − H_{k−1})²].
iv)

The estimator Q_{k,s} is asymptotically efficient. For k → ∞,

 DQ_{k,s} ∼ (H_{ks−1,2} − H_{k−1,2}) / {α² [log((ks−1)/k)]²},  lim_{k→∞} DQ_{k,s} = 0.
v)

The estimator Q_{k,s} is weakly consistent. More precisely, for all ε > 0,

 lim_{k→∞} P[ |Q_{k,s} − 1/α| > ε ] = 0.

Proof: i) and ii) follow by Theorem 3, the definition of Q_{k,s}, and the relations

 f_{Q_{k,s}}(x) = (H_{ks−1} − H_{k−1}) f_{log(X(ks,(s+1)k−1)/X(k,(s+1)k−1))}[ x(H_{ks−1} − H_{k−1}) ],
 EQ_{k,s} = E log( X(ks,(s+1)k−1) / X(k,(s+1)k−1) ) / (H_{ks−1} − H_{k−1}),
 DQ_{k,s} = D log( X(ks,(s+1)k−1) / X(k,(s+1)k−1) ) / (H_{ks−1} − H_{k−1})².

iii) is a corollary of ii) and Chebyshev’s inequality.

iv) It is well known that lim_{n→∞} (H_n − log(n)) = γ, where γ is the Euler-Mascheroni constant, H_n = ψ(n+1) + γ, and ψ is the Digamma function. By ii), for any fixed s = 2, 3, ..., we have

 lim_{k→∞} DQ_{k,s} = (1/α²) lim_{k→∞} (H_{ks−1,2} − H_{k−1,2}) / (H_{ks−1} − H_{k−1})²
   = (1/α²) lim_{k→∞} (H_{ks−1,2} − H_{k−1,2}) / {H_{ks−1} − log(ks−1) − [H_{k−1} − log(k)] + log((ks−1)/k)}²  (5)
   = (1/α²) lim_{k→∞} (H_{ks−1,2} − H_{k−1,2}) / [log((ks−1)/k)]² = (1/α²) [lim_{k→∞} H_{ks−1,2} − lim_{k→∞} H_{k−1,2}] / [log(s)]² = 0.  (6)

In the last equality we have used the well-known solution of the Basel problem, and more precisely the limit lim_{n→∞} H_{n,2} = π²/6.

v) is a consequence of ii), iii) and iv). Q.E.D.
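A quick Monte Carlo illustration of Corollary 1 ii) (our own setup, not from the paper): the sample mean of Q_{k,s} should match 1/α and its sample variance should match the explicit formula, already for a small sample size.

```python
import math
import random

def harmonic(n, m=1):
    return sum(1.0 / j**m for j in range(1, n + 1))

random.seed(7)
alpha, k, s = 1.5, 10, 3
n = (s + 1) * k - 1                       # sample size 39
norm = harmonic(k * s - 1) - harmonic(k - 1)
reps = 40000

qs = []
for _ in range(reps):
    x = sorted((1.0 - random.random()) ** (-1.0 / alpha) for _ in range(n))
    qs.append(math.log(x[k * s - 1] / x[k - 1]) / norm)

mean = sum(qs) / reps
var = sum((q - mean) ** 2 for q in qs) / (reps - 1)
print(mean, 1 / alpha)  # unbiasedness: E Q_{k,s} = 1/alpha
print(var, (harmonic(k * s - 1, 2) - harmonic(k - 1, 2)) / (alpha * norm) ** 2)
```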

In the previous proof we have seen that for any fixed s = 2, 3, ..., lim_{k→∞} (H_{ks−1} − H_{k−1}) = log(s). Therefore, although the estimators Q*_{k,s} are biased, they are asymptotically unbiased, asymptotically normal, weakly consistent and asymptotically efficient estimators for 1/α. The next conclusions follow by the relation Q*_{k,s} = Q_{k,s}(H_{ks−1} − H_{k−1})/log(s), and the main properties of the mean and the variance.

Corollary 2. Assume X(1,(s+1)k−1) ≤ ... ≤ X((s+1)k−1,(s+1)k−1) are the order statistics of (s+1)k − 1 independent observations on a r.v. X ∈ Pareto(α, δ), α > 0, δ > 0.

i)

Denote by ρ a Beta distributed r.v. with parameters k and (s−1)k. Then, for all k = 2, 3, ... and s = 2, 3, ...,

 Q*_{k,s} d= −log(ρ) / [α log(s)] d= E((s−1)k, ks−1) / log(s) d= E*((s−1)k, ks−1) / [α log(s)],

where E((s−1)k, ks−1) is the (s−1)k-th order statistic in a sample of ks − 1 independent observations on i.i.d. exponential r.vs. with parameter α, and E*((s−1)k, ks−1) is the (s−1)k-th order statistic of a sample of ks − 1 independent observations on an exponentially distributed r.v. with parameter 1. The probability density function of Q*_{k,s} is

 f_{Q*_{k,s}}(x) = α log(s) (ks−1)! / {[(s−1)k−1]! (k−1)!} (1 − s^{−αx})^{(s−1)k−1} s^{−αkx}, x > 0.
ii)

For all k = 2, 3, ... and s = 2, 3, ...,

 EQ*_{k,s} = (H_{ks−1} − H_{k−1}) / [α log(s)]

and

 DQ*_{k,s} = (H_{ks−1,2} − H_{k−1,2}) / [α² log²(s)].

iii)

For all k = 2, 3, ..., s = 2, 3, ..., and ε > 0,

 P[ |Q*_{k,s} − 1/α| > ε ] ≤ (H_{ks−1,2} − H_{k−1,2}) / {α²ε²[log(s)]²}.
iv)

The estimator Q*_{k,s} is asymptotically unbiased and asymptotically efficient. More precisely,

 lim_{k→∞} EQ*_{k,s} = 1/α,  lim_{k→∞} DQ*_{k,s} = 0.
v)

The estimator Q*_{k,s} is weakly consistent. For all ε > 0,

 lim_{k→∞} P[ |Q*_{k,s} − 1/α| > ε ] = 0.

Applications of the previous results require knowledge about confidence intervals. Therefore, in the next theorem, we obtain asymptotic normality of these estimators, which allows us later on to construct large sample confidence intervals.

Theorem 4. If X ∈ Pareto(α, δ), α > 0, δ > 0, then for all s = 2, 3, ... and k → ∞,

 √((s+1)k−1) [ log( X(ks,(s+1)k−1) / X(k,(s+1)k−1) ) − log(s)/α ] d→ η1, η1 ∈ N(0, (s²−1)/(α²s)), (7)
 √((s+1)k−1) (H_{ks−1} − H_{k−1}) [ αQ_{k,s} − log(s)/(H_{ks−1} − H_{k−1}) ] d→ η2, η2 ∈ N(0, (s²−1)/s), (8)
 √((s+1)k−1) [ αQ*_{k,s} − 1 ] d→ η3, η3 ∈ N(0, (s²−1)/{s[log(s)]²}). (9)

Proof: In this case F←(p) = δ(1−p)^{−1/α}, p ∈ [0, 1), and f(x) = αδ^α x^{−α−1}, x > δ. Therefore, F←(1/(s+1)) = δ[(s+1)/s]^{1/α}, F←(s/(s+1)) = δ(s+1)^{1/α},

 f[F←(1/(s+1))] = α s^{1/α+1} / [δ(s+1)^{1/α+1}] ∈ (0, ∞),  f[F←(s/(s+1))] = α / [δ(s+1)^{1/α+1}] ∈ (0, ∞).

For k → ∞ we have n = (s+1)k − 1 → ∞, k/n → 1/(s+1), and ks/n → s/(s+1); therefore we can apply Smirnoff’s theorem about the joint asymptotic normality of the central order statistics and Theorem 1. In order to determine a_{F,s} and b_{F,s} let us note that {log[F←(p)]}′ = 1/[α(1−p)]. Therefore

 a_{F,s} = 1 / ( {log[F←(p)]}′ |_{p=1/(s+1)} ) = αs/(s+1),  b_{F,s} = 1 / ( {log[F←(p)]}′ |_{p=s/(s+1)} ) = α/(s+1).

The equalities

 V = 1/(s+1)² ( s/a²_{F,s} − 2/(a_{F,s} b_{F,s}) + s/b²_{F,s} ) = (1/α²)(1/s − 2/s + s) = (s²−1)/(α²s),  log( F←(s/(s+1)) / F←(1/(s+1)) ) = (1/α) log(s)

lead us to (7). When we multiply both sides of (7) by α, and use that α log( X(ks,(s+1)k−1) / X(k,(s+1)k−1) ) = αQ_{k,s}(H_{ks−1} − H_{k−1}), we obtain (8). If we multiply both sides of (7) by α/log(s), and use the definition of Q*_{k,s}, we obtain (9). Q.E.D.

Now we are ready to compute the corresponding confidence intervals. Let us choose α0 ∈ (0, 1) and denote by z_{1−α0/2} the (1 − α0/2)-quantile of the standard normal distribution. Using (9), and the definition of Q*_{k,s}, we obtain

 P[ −z_{1−α0/2} ≤ log(s) √( s[(s+1)k−1]/(s²−1) ) (αQ*_{k,s} − 1) ≤ z_{1−α0/2} ] ≈ 1 − α0.
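Solving the inequality above for α gives an approximate large-sample confidence interval; a Python sketch (function name and setup ours), using (9):

```python
import math
import random

def pareto_ci(sample, k, s, z=1.959964):
    """Approximate two-sided CI for alpha based on (9); z = z_{1 - alpha0/2}
    (default: 95% level). Indices follow the paper's 1-based order statistics."""
    n = (s + 1) * k - 1
    assert len(sample) == n, "sample size must be (s+1)k - 1"
    x = sorted(sample)
    q_star = math.log(x[k * s - 1] / x[k - 1]) / math.log(s)
    half = z * math.sqrt((s * s - 1) / s) / (math.log(s) * math.sqrt(n))
    return (1.0 - half) / q_star, (1.0 + half) / q_star

random.seed(3)
alpha, k, s = 2.0, 500, 3
sample = [(1.0 - random.random()) ** (-1.0 / alpha) for _ in range((s + 1) * k - 1)]
lo, hi = pareto_ci(sample, k, s)
print(lo, hi)  # should bracket alpha = 2.0 about 95% of the time
```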