Extreme value (EV) methodology starts from the assumption that the distribution of the available sample $X_1,\dots,X_n$ belongs to the domain of attraction of a generalized extreme value distribution, i.e. there exist sequences $a_n>0$ and $b_n$ such that, as $n\to\infty$,
$$P\left(\frac{\max(X_1,\dots,X_n)-b_n}{a_n}\le x\right)\to G_{\gamma}(x), \qquad (1)$$
where $G_{\gamma}(x)=\exp\{-(1+\gamma x)^{-1/\gamma}\}$, for some $\gamma\in\mathbb{R}$ with $1+\gamma x>0$. The parameter $\gamma$ is termed the extreme value index (EVI). It is well known (see e.g. Beirlant et al., 2004, and de Haan and Ferreira, 2006) that (1) is equivalent to the existence of a positive function $\sigma(t)$ such that
$$P\left(\frac{X-t}{\sigma(t)}>y \,\Big|\, X>t\right)\to \bar H_{\gamma}(y)=(1+\gamma y)^{-1/\gamma}, \quad t\to x^{*}, \qquad (2)$$
where $1+\gamma y>0$ and $x^{*}=\sup\{x:F(x)<1\}$ denotes the endpoint of the distribution of $X$. The conditional distribution of $X-t$ given $X>t$ is called the peaks over threshold (POT) distribution, while $\bar H_{\gamma}$ is the survival function of the generalized Pareto distribution (GPD).
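To make the POT idea in (2) concrete, the following sketch simulates Pareto-type data and fits a GPD by maximum likelihood to the excesses over a high threshold. The simulated sample, the 95% threshold choice and SciPy's parametrization (shape `c` playing the role of $\gamma$) are illustrative assumptions, not prescriptions from the text.

```python
# Sketch: peaks-over-threshold fit of the GPD to excesses X - t given X > t.
import numpy as np
from scipy.stats import genpareto

rng = np.random.default_rng(1)
x = rng.pareto(2.0, size=5000) + 1.0   # Pareto-type sample with gamma = 0.5

t = np.quantile(x, 0.95)               # illustrative threshold choice
excesses = x[x > t] - t                # POT values

# genpareto uses shape c = gamma and scale = sigma(t); loc fixed at 0
gamma_hat, _, sigma_hat = genpareto.fit(excesses, floc=0.0)
print(gamma_hat, sigma_hat)
```

With only the top 5% of the sample used, the estimate of $\gamma$ is close to the true value 0.5 but carries the estimation variance discussed below.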
In case $\gamma>0$, the limit in (1) holds if and only if the distribution of $X$ is of Pareto-type, i.e.
$$\bar F(x)=x^{-1/\gamma}\ell(x), \qquad (3)$$
for some slowly varying function $\ell$, i.e. satisfying $\ell(tx)/\ell(t)\to 1$ as $t\to\infty$, for every $x>0$. Pareto-type distributions satisfy a simpler POT limit result: as $t\to\infty$,
$$P\left(\frac{X}{t}>y \,\Big|\, X>t\right)\to y^{-1/\gamma}, \quad y>1. \qquad (4)$$
Estimation of $\gamma$ and tail quantities such as return periods is then based on fitting a GPD to the observed excesses $X-t$ given $X>t$, respectively a simple Pareto distribution with survival function $y^{-1/\gamma}$ to $X/t$ given $X>t$ in case $\gamma>0$. The main difficulty in such an EV application is the choice of the threshold $t$. Most often, the threshold is chosen as one of the top data points $X_{n-k,n}$ for some $k$, where $X_{1,n}\le\dots\le X_{n,n}$ denotes the ordered sample. The limit results in (2) and (4) require $t$ to be chosen as large as possible (or, equivalently, $k$ as small as possible) for the bias in the estimation of $\gamma$ and other tail parameters to be limited. However, in order to limit the estimation variance, $t$ should be as small as possible, i.e. the number $k$ of data points used in the estimation should be as large as possible. Several adaptive procedures for choosing $t$ or $k$ have been proposed, but mainly in the Pareto-type case with $\gamma>0$, under further second-order specifications of (3) or (4); see for instance Chapter 3 in Beirlant et al. (2004), or Matthys and Beirlant (2000).
In case of a real-valued EVI, the selection of an appropriate threshold is even more difficult and only a few methods are available. Dupuis (1999) suggested a robust model validation mechanism to guide the threshold selection, assigning weights between 0 and 1 to each data point, where a high weight means that the point should be retained since a GPD model fits it well. However, thresholding is required at the level of the weights, and hence the method cannot be used in an unsupervised manner.
In case $\gamma>0$, under the second-order condition
$$\bar F(x)=Ax^{-1/\gamma}\left\{1+\gamma^{-1}\delta(x)\,(1+o(1))\right\}, \qquad (5)$$
where $\delta(x)=Dx^{\tau}\ell^{*}(x)$, with $\tau<0$ and $\ell^{*}$ slowly varying at infinity (see section 2.3 in de Haan and Ferreira, 2006), the left hand side of (4) equals
$$\left\{y\left(1+\delta(t)-\delta(t)\,y^{\tau}\right)\right\}^{-1/\gamma}\left(1+o(\delta(t))\right),$$
uniformly in $y\ge 1$. This then leads to the extended Pareto distribution (EPD) to approximate the distribution of $X/t$ given $X>t$ as $t\to\infty$:
$$\bar G(y;\gamma,\delta,\tau)=\left\{y\left(1+\delta-\delta y^{\tau}\right)\right\}^{-1/\gamma}, \quad y>1, \qquad (6)$$
with $\delta=\delta(t)$ satisfying $\delta(t)\to 0$ as $t\to\infty$.
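As a quick numerical illustration, the sketch below evaluates the EPD survival function in one common parametrization, $\{y(1+\delta-\delta y^{\tau})\}^{-1/\gamma}$, alongside the simple Pareto survival function; the exact formula is garbled in this excerpt, so the parametrization and the parameter values here are assumptions. Note that $\delta=0$ recovers the Pareto case.

```python
# Sketch: simple Pareto survival y^(-1/gamma) vs. an EPD-type survival
# {y*(1 + delta - delta*y^tau)}^(-1/gamma); parameter values illustrative.
import numpy as np

def pareto_sf(y, gamma):
    return y ** (-1.0 / gamma)

def epd_sf(y, gamma, delta, tau):
    return (y * (1.0 + delta - delta * y ** tau)) ** (-1.0 / gamma)

y = np.linspace(1.0, 50.0, 5)
print(pareto_sf(y, 0.5))
print(epd_sf(y, 0.5, 0.2, -1.0))   # second-order correction, delta = 0.2
```

Both survival functions equal 1 at $y=1$, and the $\delta$-term tilts the intermediate range while leaving the first-order Pareto tail behaviour intact.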
In cases where the second-order model (5) holds, such a mixture model improves the approximation of the POT distribution for values of $t$ which are smaller than the $t$-values that are appropriate when modelling the POTs using the simple Pareto limit (4). The extension can hence work when modelling both large and moderate extremes. As a byproduct, it may at times even work for the full sample.
In Beirlant et al. (2009), using an external estimator of $\tau$, the parameters $(\gamma,\delta)$ are estimated by fitting the EPD (slightly adapted) by maximum likelihood to the excesses over the random threshold $t=X_{n-k,n}$. The result of this procedure is two-fold:
First, the estimates $\hat\gamma_k$ of $\gamma$ are more stable as a function of $k$ compared to the original ML estimator derived by Hill (1975),
$$H_{k,n}=\frac{1}{k}\sum_{j=1}^{k}\log\frac{X_{n-j+1,n}}{X_{n-k,n}},$$
which is obtained by fitting the Pareto distribution to the excesses $X_{n-j+1,n}/X_{n-k,n}$ following (4). Indeed, the bias in the simple POT model (4) is estimated when fitting $\delta$, and it is shown that, under the assumption that the EP model for the excesses is correct and that $\tau$ is estimated consistently, the asymptotic bias of $\hat\gamma_k$ is 0 as long as $\sqrt{k}\,\delta(X_{n-k,n})\to\lambda\in\mathbb{R}$ as $k,n\to\infty$, while the asymptotic bias of $H_{k,n}$ is only 0 when $\lambda=0$.
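The Hill estimator referenced above has a standard closed form as the average of log-spacings of the top order statistics, which can be sketched as follows (the simulated exact-Pareto data are illustrative):

```python
# Sketch of the Hill (1975) estimator: the mean of
# log(X_{n-j+1,n} / X_{n-k,n}) over the k largest observations.
import numpy as np

def hill(x, k):
    """Hill estimate of gamma based on the k largest order statistics."""
    xs = np.sort(x)
    return np.mean(np.log(xs[-k:]) - np.log(xs[-k - 1]))

rng = np.random.default_rng(0)
x = rng.pareto(2.0, size=10000) + 1.0   # exact Pareto tail, gamma = 0.5
print(hill(x, 200))                      # close to 0.5 for this exact model
```

Plotting `hill(x, k)` against a range of `k` gives the classical Hill plot, whose instability for Pareto-type data with slowly varying nuisance parts motivates the bias-reduced estimators discussed here.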
On the other hand, the asymptotic variance of $\hat\gamma_k$ equals $\left(\frac{1-\gamma\tau}{\gamma\tau}\right)^{2}\sigma_{H}^{2}$, where $\sigma_{H}^{2}=\gamma^{2}/k$ is the asymptotic variance of the Hill estimator $H_{k,n}$.
As an example, Figure 1 shows both the Hill estimates $H_{k,n}$ and the bias-reduced estimates $\hat\gamma_k$, obtained from maximum likelihood fitting of (6), as a function of $k$ for a dataset of Belgian ultimate car insurance claims from 1995 to 2010, discussed in more detail in Albrecher et al. (2017). Note that the bias-reduced estimates help to interpret the original Hill "horror" plot. Here, from the bias-reduced estimator, a level around 0.5 becomes apparent, together with a lower value between 0.3 and 0.4 for smaller values of $k$. In fact, in insurance claim data, mixtures in the ultimate tail do appear quite often. Moreover, the EPD fit appears to extend quite well down to the lower threshold value, i.e. with $k$ up to 600 (but not when using almost all data). In this sense, classical first-order extreme value modelling can in some cases be extended using mixture modelling in order to capture the characteristics of the bulk of the data.
Other bias reduction techniques in the Pareto-type case have been proposed, among others, in Feuerverger and Hall (1999), Gomes et al. (2000), Beirlant et al. (1999, 2002) and Gomes and Martins (2002). In Caeiro and Gomes (2011), methods are proposed to limit the variance of bias-reduced estimators to the level of the variance of the Hill estimator. The price to pay is then to assume a third-order slow variation model specifying (5) even further. These methods focus on the distribution of the log-spacings of high order statistics. Other construction methods for asymptotically unbiased estimators of $\gamma$ were introduced in Peng (1998) and Drees (1996).
In this paper we concentrate on bias reduction when using the GPD approximation to the distribution of the POTs, on which the literature is quite limited. This allows us to extend bias reduction to the general case $\gamma\in\mathbb{R}$. We apply the flexible semiparametric GP modelling introduced in Tencaliec et al. (2018) to the POT distributions. We also extend the second-order refined POT approach using (6) to all max-domains of attraction. Here the corresponding basic second-order regular variation theory can be found in Theorem 2.3.8 in de Haan and Ferreira (2006), stating that
$$P\left(\frac{X-t}{\sigma(t)}>y \,\Big|\, X>t\right)=\bar H_{\gamma}(y)\left\{1+B(t)\,h_{\gamma,\rho}(y)+o(B(t))\right\}, \qquad (7)$$
with $B(t)\to 0$ as $t\to x^{*}$, $\rho\le 0$ a second-order parameter, and $h_{\gamma,\rho}$ a known function which for the cases $\gamma=0$ and $\rho=0$ is understood to be equal to the limit as $\gamma\to 0$ and $\rho\to 0$. We further allow more flexible second-order models than the ones arising from second-order regular variation theory such as in (7), using non-parametric modelling of the second-order component. These new methods are also applied to the specific case of Pareto-type distributions.
In the next section we propose our transformed and extended GPD models, and detail the estimation methods. Some basic asymptotic results are provided in section 3. In the final section we discuss simulation results for the proposed methods and some practical case studies; we then also discuss the evaluation of the overall goodness-of-fit behaviour of the fitted models.
2 Transformed and extended GPD models
Recently, Naveau et al. (2016), generalizing Papastathopoulos and Tawn (2013), proposed to use full models for rainfall intensity data that are able to capture low, moderate and heavy rainfall intensities without a threshold selection procedure. These authors, however considering only applications with a positive EVI, propose to model all data jointly using transformation models with survival function
$$1-W(x)=1-G\left(H_{\sigma,\gamma}(x)\right),$$
with $H_{\sigma,\gamma}$ the GPD distribution function, $\bar H_{\sigma,\gamma}(x)=(1+\gamma x/\sigma)^{-1/\gamma}$, and $G$ a distribution function on $[0,1]$, with $G$ satisfying constraints to preserve the classical tail GPD fit and a power behaviour for small rainfall intensities:
$$\lim_{u\to 1}\frac{1-G(u)}{1-u}=a, \quad \text{for some } a>0,$$
$$\lim_{u\to 0}\frac{G(u)}{u^{\kappa}}=c, \quad \text{for some } \kappa>0 \text{ and } c>0.$$
In Naveau et al. (2016) the authors propose parametric examples for $G$, such as $G(u)=u^{\kappa}$, with $\kappa>0$. In Tencaliec et al. (2018) a non-parametric approach is taken using Bernstein polynomials of degree $m$ to approximate $G$, i.e. using
$$G_m(u)=\sum_{j=0}^{m}G\left(\frac{j}{m}\right)\binom{m}{j}u^{j}(1-u)^{m-j},$$
whose derivative is a mixture of beta densities.
In Naveau et al. (2016) and Tencaliec et al. (2018) the primary goal is the search for a model fitting the whole outcome set, rather than the fit of the proposed model to POT values. However, the bias and MSE properties of the estimators of $\gamma$ and $\sigma$ are still to be analyzed.
To encompass the above-mentioned methods from Beirlant et al. (2009), Naveau et al. (2016) and Tencaliec et al. (2018), we propose to approximate the POT distribution with a transformation model with right tail function of the type
where for all as . Note here that for and
We also consider a submodel of , approximating the POT distribution with an extended GPD model
and for every ,
is twice continuously differentiable.
Here the parameter represents a second order (nuisance) parameter.
For negative -values one needs to obtain a valid distribution.
At the function then corresponds to in (8), while as leads to the GPD survival function at large thresholds.
Note that model () is a direct generalization of the EPD model (6) replacing the Pareto distribution by the GPD and considering a general function rather than .
Now several possibilities for bias reduction appear:
Estimation under the transformed model (). Modelling the distribution of with model () and estimating and for every , we propose to use the algorithm from Tencaliec et al. (2018) for every or . This approach is further denoted with ().
Here we apply Bernstein approximation and estimation of $G$, which is the distribution function of the transformed excesses. The Bernstein approximation of order $m$ of a continuous distribution function $G$ on $[0,1]$ is given by
$$B_m(u)=\sum_{j=0}^{m}G\left(\frac{j}{m}\right)\binom{m}{j}u^{j}(1-u)^{m-j}.$$
As in Babu et al. (2002), one then replaces the unknown distribution function $G$ itself with the empirical distribution function $\hat G_n$ of the available data in order to obtain a smooth estimator of $G$:
$$\hat B_{m,n}(u)=\sum_{j=0}^{m}\hat G_n\left(\frac{j}{m}\right)\binom{m}{j}u^{j}(1-u)^{m-j}.$$
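The Babu et al. (2002) smoothed empirical distribution function can be sketched as below; the function name, the choice $m=30$ and the uniform test data are illustrative assumptions.

```python
# Sketch: Bernstein-smoothed empirical distribution function on [0,1],
# replacing the unknown G(j/m) by the empirical cdf evaluated at j/m.
import numpy as np
from scipy.stats import binom

def bernstein_cdf(u, data, m):
    """Evaluate sum_j Ghat(j/m) * C(m,j) * u^j * (1-u)^(m-j)."""
    data = np.sort(data)
    grid = np.arange(m + 1) / m
    ghat = np.searchsorted(data, grid, side="right") / len(data)  # ecdf at j/m
    j = np.arange(m + 1)
    u = np.atleast_1d(u)
    weights = binom.pmf(j[None, :], m, u[:, None])  # binomial weights
    return weights @ ghat

rng = np.random.default_rng(2)
v = rng.uniform(size=1000)             # data supported on [0,1]
print(bernstein_cdf(np.array([0.25, 0.5, 0.75]), v, m=30))
```

For uniform data the smoothed estimator tracks the identity closely, since the Bernstein operator reproduces linear functions exactly; the smoothing mainly reduces the step-function roughness of the empirical cdf.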
In the present application, observations from $G$ are only available after imputing a value for the GPD parameters $(\sigma,\gamma)$. This then leads to the iterative algorithm from Tencaliec et al. (2018), which is applied for every threshold , or every number of top data points. We here detail the algorithm for excesses , using the reparametrization with :
Set starting values (). Here one can use () from using .
Iterate for until the difference in log-likelihood taken in () and () is smaller than a prescribed value
Given (), construct random variables
Construct Bernstein approximation based on ()
with the empirical distribution function of
Obtain new estimates () with ML:
with denoting the derivative of .
The final estimates of $\gamma$ and $\sigma$ are denoted here by and . As noted in Tencaliec et al. (2018), the theoretical study of these estimates is difficult. In the simulation study, the finite sample characteristics of these estimators are given using a fixed value of $m$, chosen in order to stabilize the plots of the estimates of $\gamma$ as much as possible. Note that this estimation procedure is computationally demanding.
Estimates of small tail probabilities can be obtained through
Finally, bias-reduced estimators of extreme quantiles corresponding to small tail probabilities $p$ are obtained by setting the above formulas equal to $p$ and solving for the quantile.
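For the plain GPD tail fit over the threshold $t=X_{n-k,n}$, this inversion has a closed form; the sketch below (standard Weissman-type formula under the first-order GPD approximation, with illustrative parameter values, not the paper's bias-reduced expressions) shows the round trip between tail probability and quantile.

```python
# Sketch: tail probability (k/n)*(1 + gamma*(x-t)/sigma)^(-1/gamma)
# and its closed-form inverse, the extreme quantile Q(1-p).
import numpy as np

def gpd_tail_prob(x, t, gamma, sigma, k, n):
    return (k / n) * (1.0 + gamma * (x - t) / sigma) ** (-1.0 / gamma)

def gpd_quantile(p, t, gamma, sigma, k, n):
    """Solve gpd_tail_prob(x) = p for x (gamma != 0)."""
    return t + (sigma / gamma) * ((k / (n * p)) ** gamma - 1.0)

# Round trip: inverting the tail probability recovers p exactly
q = gpd_quantile(1e-4, t=10.0, gamma=0.5, sigma=2.0, k=200, n=10000)
print(gpd_tail_prob(q, t=10.0, gamma=0.5, sigma=2.0, k=200, n=10000))  # ~1e-4
```

The bias-reduced variants in the text replace the first-order GPD survival function by the transformed or extended models, so the corresponding inversion is generally numerical rather than closed-form.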
Estimation under the extended model (). Modelling the distribution of the exceedances with model () leads to maximum likelihood estimators based on the excesses :
with for a given choice of .
Estimates of small tail probabilities are then obtained through
As in Naveau et al. (2016), respectively Tencaliec et al. (2018), two approaches can be taken towards the bias function : a parametric approach, respectively a non-parametric approach.
In the parametric approach, denoted (), the second-order result (7) leads to the parametric choice in case and .
Model () allows for bias reduced estimation of under the assumption that the corresponding second-order model (7) is correct for the POTs . Note that () generalizes the approach taken in Beirlant et al. (2009) to all max-domains of attraction. When model () is used as a model for all observations, i.e. taking , this model directly encompasses the models from Frigessi et al. (2002) and Naveau et al. (2016).
in which the classical estimator of (with ), or an appropriate value , is used to substitute , next to an appropriate value of . One can also choose a value of () minimizing the variance in the plot of the resulting estimates of as a function of .
Alternatively, a non-parametric approach (denoted ) can be performed using the Bernstein polynomial algorithm from Tencaliec et al. (2018). In fact, in practice a particular distribution follows laws of nature, the environment or business, and does not have to satisfy the second-order regular variation assumptions as in (7). Moreover, in the case of a real-valued EVI, the function can take different mathematical forms depending on () and on being close to 0 or not.
Here a Bernstein type approximation is obtained for from obtained through algorithm (), and reparametrizing by with an appropriate value of the number of top data used. The function is then substituted by .
The methods described above can of course be rewritten for the specific case of Pareto-type distributions, where the distribution of the POTs is approximated by transformed Pareto distributions. The models are then rephrased as
The above algorithms, now based on the exceedances (), are then adapted as follows:
In algorithm () step (ii.c) is replaced by
The resulting estimates are denoted with and .
In approach () the likelihood solutions are given by
Note that the () approach using the parametric version for a particular fixed equals the EPD method from Beirlant et al. (2009), while () is new.
Estimators of tail probabilities are then given by
3 Basic asymptotics under model ()
We discuss here in detail the asymptotic properties of the maximum likelihood estimators solving (10) and (11). To this end, as in Beirlant et al. (2009) we develop the likelihood equations up to linear terms in since with decreasing value of . Below we set when using extended GPD modelling, while when using extended Pareto modelling under .
Extended Pareto POT modelling. The likelihood problem (11) was already considered in Beirlant et al. (2009) in case of parametric modelling for . We here propose a more general treatment. The limit statements in the derivation can be obtained using the methods from Beirlant et al. (2009). The likelihood equations following from (11) are given by
Extended Generalized Pareto POT modelling. The likelihood equations following from (10) up to linear terms in are now given by
Under the extended model we now state the asymptotic distribution of the estimators and . To this end let denote the quantile function of , and let denote the corresponding tail quantile function. Model assumption can be rephrased in terms of :
where and regularly varying with index . Moreover in the mathematical derivations one needs the extra condition that for every , and sufficiently large
Similarly, is rewritten as
The analogue of in this specific case is given by
with regularly varying with index .
Finally, in the expression of the asymptotic variances we use
The proof of the next theorem is outlined in the Appendix. It allows the construction of confidence intervals for the estimators of $\gamma$ obtained under the extended models.
Theorem 1 Let $k=k_n$ be a sequence such that $k\to\infty$ and such that $k/n\to 0$. Moreover assume that in (10) and (11), the second-order nuisance parameter is substituted by a consistent estimator as $n\to\infty$. Then