Analyzing insurance data with an exponentiated composite Inverse-Gamma Pareto model

08/14/2021
by Bowen Liu, et al.
University of Nevada, Las Vegas

Exponentiated models have been widely used in modeling various types of data, such as survival data and insurance claims data. However, exponentiated composite distribution models have not yet been explored. In this paper, we introduce an improvement of the one-parameter Inverse-Gamma Pareto composite model, obtained by exponentiating the random variable associated with the one-parameter Inverse-Gamma Pareto composite distribution function. The goodness-of-fit of the exponentiated Inverse-Gamma Pareto model was assessed using the well-known Danish fire insurance data and the Norwegian fire insurance data. The two-parameter exponentiated Inverse-Gamma Pareto model outperforms the one-parameter Inverse-Gamma Pareto model in terms of goodness-of-fit measures for both datasets.


1 Introduction

Modeling claim size data is one of the major topics in actuarial science. Actuaries often make financial risk management decisions based on models, so selecting a proper model for claim sizes is a key task in the actuarial industry. Under normal circumstances, a claim size data set consists of a large number of small claims and a few large claims. Common distributions in the literature, such as the exponential and the normal, cannot incorporate all the features of such a data set. Hence, the concept of a composite distribution was introduced for modeling claim size data. Following this concept, many composite models were developed, including lognormal-Pareto [ananda2005], exponential-Pareto [Teodorescu2006], and Weibull-Pareto [preda2006]. The Pareto distribution is considered well suited for modeling large claims; for modeling small claims, there are many variations in the literature.

Aminzadeh and Deng recently introduced the Inverse-Gamma Pareto model [ig_pareto], suggesting it as a possible model for data sets with very heavy tails, such as insurance data sets. It is a one-parameter Inverse-Gamma Pareto composite distribution with appealing properties such as continuity and differentiability. However, fitting the one-parameter Inverse-Gamma Pareto model to the Danish fire insurance data does not provide satisfactory performance, as we will show in the Numerical Examples section. Specifically, the mode of the fitted Inverse-Gamma Pareto distribution is not large enough to describe the high-frequency small claims within the Danish fire insurance data. Therefore, we modify this one-parameter Inverse-Gamma Pareto model by introducing an additional parameter.

Exponentiated distributions were first introduced by Mudholkar and Srivastava [mudholkar1990]. The main idea of exponentiated distributions is to exponentiate the cumulative distribution function (CDF) of an existing distribution. The extra parameter adds flexibility beyond the traditional models. Many modifications of existing distributions were later introduced following the idea of Mudholkar and Srivastava. For instance, Gupta and Kundu introduced the exponentiated exponential [gupta]; Nadarajah pioneered the exponentiated beta, exponentiated Pareto, and exponentiated Gumbel [exp_beta, exp_pareto, exp_gumbel]; Nadarajah and Gupta initiated the exponentiated Gamma [exp_gamma]; and Afify established the exponentiated Weibull-Pareto [exp_weibull]. However, none of these models were established using the CDF of a composite distribution. Moreover, all the exponentiated distributions mentioned above were created by exponentiating the CDF, while the exponentiated Inverse-Gamma Pareto model we propose was constructed by exponentiating the random variable associated with the CDF of a composite distribution.

The rest of the paper is organized as follows. Section 2 provides the derivation of the exponentiated Inverse-Gamma Pareto model, a description of its behavior, and an algorithm to obtain the maximum likelihood estimators of the model. We briefly summarize the results from simulation studies in Section 3 to assess the accuracy and consistency of the MLEs. In Section 4, two numerical examples are presented: one is the Danish fire insurance data and the other is the Norwegian fire insurance data. Conclusions are provided in Section 5.

2 Methodology

2.1 Introduction of the general composite model in loss data modeling

Let X be a positive real-valued random variable. The general form of a composite model in loss data modeling was formally introduced [Bak15] as follows:

f(x) = c f_1(x) for 0 < x ≤ θ,  and  f(x) = c f_2(x) for θ < x < ∞,

along with the continuity and differentiability conditions at the threshold θ:

f_1(θ) = f_2(θ)  and  f_1'(θ) = f_2'(θ),

where f_1(x) = f_1(x | Θ_1) is the probability density function of the random variable X when X takes values between 0 and θ; f_2(x) = f_2(x | Θ_2) is the probability density function of X when X takes values greater than θ; Θ_1 and Θ_2 are the model parameters of the two density pieces; and c is a positive constant that controls the weights of f_1 and f_2.

The composite Inverse-Gamma Pareto model was established by Aminzadeh and Deng [ig_pareto] by utilizing the framework introduced above. Suppose a random variable X is known to follow a composite Inverse Gamma-Pareto distribution, so that the pdf of X is as follows:

f(x) = c f_1(x) for 0 < x ≤ θ,  and  f(x) = c f_2(x) for θ ≤ x < ∞,   (1)

where f_1 is an Inverse-Gamma density, f_2 is a Pareto density with scale θ, and the continuity and differentiability conditions at θ fix the remaining parameters and the constant c. Thus, their proposed Inverse-Gamma Pareto model contains only one parameter, θ. In the following subsection, we discuss the development of the exponentiated composite Inverse-Gamma Pareto distribution specifically.
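Since the closed-form constants of (1) are not reproduced in this excerpt, the gluing mechanism itself can be illustrated with a generic two-piece composite sketch in Python. The head density below is an exponential stand-in (an assumption for illustration only, not the paper's Inverse-Gamma head): the continuity condition f_1(θ) = f_2(θ) pins the Pareto tail index, and the constant c renormalizes the mixture.

```python
import math

def make_composite(lmbda, theta):
    """Glue a head density to a Pareto tail at threshold theta.

    Illustrative stand-in: the head here is exponential with rate lmbda,
    NOT the Inverse-Gamma head of the paper's model. Continuity
    f1(theta) = f2(theta) pins the Pareto index alpha, and c renormalizes
    the two pieces so the composite integrates to one.
    """
    def f1(x):                                   # head density on (0, theta]
        return lmbda * math.exp(-lmbda * x)

    alpha = theta * f1(theta)                    # continuity: f2(theta) = alpha/theta = f1(theta)

    def f2(x):                                   # Pareto tail density on (theta, inf)
        return alpha * theta ** alpha / x ** (alpha + 1)

    head_mass = 1.0 - math.exp(-lmbda * theta)   # integral of f1 over (0, theta]
    c = 1.0 / (head_mass + 1.0)                  # the Pareto tail integrates to 1 over (theta, inf)

    def pdf(x):
        return c * (f1(x) if x <= theta else f2(x))

    return pdf

pdf = make_composite(lmbda=1.0, theta=2.0)
# the density is continuous across the threshold:
# pdf(2.0) and pdf(2.0 + 1e-9) agree to numerical precision
```

The same construction applies with any head density; only the expressions for alpha, head_mass, and c change.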

2.2 Development of the exponentiated composite Inverse Gamma-Pareto distribution

Now suppose a power transformation is applied to the random variable X, say Y = X^(1/η), where the map x → x^(1/η) is monotone increasing for any η > 0. Also, Y shares the support (0, ∞) with X, and for any η > 0 the inverse map y → y^η has a continuous derivative on (0, ∞). Then the probability density function of Y is given by:

g(y) = η y^(η−1) f(y^η),  y > 0,   (2)

where f is the composite density given in (1). It can be easily shown that the above density function for the exponentiated composite Inverse-Gamma Pareto model is continuous and differentiable on the support (0, ∞).
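The change-of-variables step behind (2) can be checked numerically. The sketch below uses a plain exponential base density as a stand-in for the composite density of (1) (an assumption purely for illustration): if Y = X^(1/η), then P(Y ≤ t), obtained by integrating g, must equal P(X ≤ t^η).

```python
import math

ETA = 2.0    # power parameter; Y = X ** (1 / ETA)
LMBDA = 1.0  # rate of the stand-in base density (exponential, not the paper's composite)

def f(x):
    """Base pdf of X (illustrative exponential stand-in)."""
    return LMBDA * math.exp(-LMBDA * x)

def g(y):
    """Transformed pdf of Y = X**(1/eta): g(y) = eta * y**(eta-1) * f(y**eta)."""
    return ETA * y ** (ETA - 1.0) * f(y ** ETA)

# change-of-variables check: P(Y <= t) must equal P(X <= t**eta)
t = 1.3
n = 100_000
h = t / n
riemann = sum(g((i + 0.5) * h) for i in range(n)) * h  # midpoint rule on [0, t]
exact = 1.0 - math.exp(-LMBDA * t ** ETA)              # exponential CDF at t**eta
```

The two probabilities agree to numerical precision, confirming that g in (2) is the correct density of the transformed variable.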

The motivation for developing the exponentiated IG-Pareto model as an improvement of the IG-Pareto model for loss data modeling is shown in Figures 1 and 2. Two different values of θ are chosen: 5 (Figure 1) and 10 (Figure 2). For each θ value, three values of η are chosen, where η = 1 corresponds to the original one-parameter Inverse-Gamma Pareto composite model.

The figures indicate that the exponentiated composite Inverse-Gamma Pareto model adds flexibility to the one-parameter Inverse-Gamma Pareto model through the power parameter η. For a fixed value of θ, the mode of the composite exponentiated Inverse-Gamma Pareto density increases as η increases.

Figure 1: Composite exponentiated Inverse-Gamma Pareto density (θ = 5)
Figure 2: Composite exponentiated Inverse-Gamma Pareto density (θ = 10)

2.3 Parameter Estimation

Let Y_1, Y_2, …, Y_n be a random sample from the exponentiated composite pdf given in (2). Without loss of generality, assume that y_(1) ≤ y_(2) ≤ ⋯ ≤ y_(n) is an ordered random sample generated from the pdf. The likelihood function can be written as follows:

(3)

where m is the number of ordered observations below the transformed threshold, that is, y_(m) ≤ θ^(1/η) ≤ y_(m+1).

The above likelihood assumes that there is an m such that y_(m) ≤ θ^(1/η) ≤ y_(m+1). The MLEs of θ and η can be obtained by solving the equations ∂ ln L/∂θ = 0 and ∂ ln L/∂η = 0.

Closed-form expressions for the MLEs of θ and η cannot be obtained. In addition, m needs to be determined before finding the solution of the above equations. However, given the values of η and m, a closed-form solution for θ̂ can be written as follows:

(4)

Thus, we designed a simple search algorithm to find the MLEs of θ and η by utilizing equation (4). The description of the search algorithm is as follows:

  1. Obtain the sorted observations of the sample as y_(1) ≤ y_(2) ≤ ⋯ ≤ y_(n).

  2. Determine the range of η; the parameter search will be done within this pre-defined range. Note that we recover the original one-parameter Inverse-Gamma Pareto model when η = 1. Hence, the search needs to be done in an interval around η = 1.

  3. For a given η in the range, start with m = 1 and calculate the MLE of θ given m based on (4). If y_(1) ≤ θ̂^(1/η) ≤ y_(2), then m = 1. Otherwise jump to step 4.

  4. Let m = m + 1. If y_(m) ≤ θ̂^(1/η) ≤ y_(m+1), then m is identified. We continue the above steps until m is identified. Once m is identified, keep θ̂ as the MLE of θ for the given η.

  5. Search for the optimal η that maximizes the likelihood over the range, and find the corresponding θ̂ using equation (4). These are the MLEs of θ and η.
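The steps above can be sketched as a profile grid search. Because equation (4) and the exact log-likelihood are not reproduced in this excerpt, the sketch below takes them as caller-supplied functions and exercises the control flow with clearly labeled toy stand-ins; it is a scaffold under those assumptions, not the paper's implementation.

```python
import math

def fit_composite(y, eta_grid, theta_closed_form, loglik):
    """Steps 1-5 of the search: for each eta in the grid, find the split
    index m consistent with y_(m) <= theta_hat**(1/eta) <= y_(m+1), then
    keep the (theta_hat, eta) pair with the largest log-likelihood."""
    y = sorted(y)                                     # step 1: order the sample
    best = (-math.inf, None, None)
    for eta in eta_grid:                              # step 2: pre-defined range around eta = 1
        for m in range(1, len(y)):                    # steps 3-4: increase m until consistent
            theta_hat = theta_closed_form(y, m, eta)  # closed form of eq. (4), supplied by caller
            if y[m - 1] <= theta_hat ** (1.0 / eta) <= y[m]:
                ll = loglik(y, theta_hat, eta)        # step 5: profile likelihood over eta
                if ll > best[0]:
                    best = (ll, theta_hat, eta)
                break
    return best                                       # (max log-likelihood, theta MLE, eta MLE)

# toy stand-ins (NOT the paper's eq. (4) or likelihood), only to exercise the search:
def toy_theta(y, m, eta):
    return ((y[m - 1] + y[m]) / 2.0) ** eta

def toy_loglik(y, theta, eta):
    return -eta * sum(abs(math.log(v)) for v in y) - abs(theta - 1.0)

ll, theta_mle, eta_mle = fit_composite(
    [0.5, 0.8, 1.1, 2.0, 5.0], [0.8, 1.0, 1.2], toy_theta, toy_loglik)
```

Plugging in the actual closed-form estimator of (4) and the composite log-likelihood of (3) turns this scaffold into the algorithm described above.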

3 Simulation

We conducted simulation studies to check the accuracy of the estimates of θ and η. For each selected sample size n and each pair of true values of θ and η, samples from the composite density (2) were generated.

Tables 1 to 6 present the results of all simulations under the different scenarios. In each table, mean θ̂ and mean η̂ stand for the sample means of the estimates θ̂ and η̂, and sd θ̂ and sd η̂ denote their sample standard deviations.

n     mean θ̂   mean η̂   sd θ̂    sd η̂
20    0.876     1.304     0.204    1.437
50    0.828     1.094     0.117    0.474
100   0.816     1.040     0.084    0.315
500   0.804     1.006     0.037    0.135
Table 1: Simulation results for θ = 0.8 and η = 1

n     mean θ̂   mean η̂   sd θ̂    sd η̂
20    1.093     1.262     0.248    1.025
50    1.036     1.092     0.145    0.478
100   1.017     1.039     0.102    0.307
500   1.005     1.005     0.049    0.137
Table 2: Simulation results for θ = 1.0 and η = 1

n     mean θ̂   mean η̂   sd θ̂    sd η̂
20    1.322     1.263     0.312    1.094
50    1.240     1.091     0.174    0.463
100   1.220     1.048     0.120    0.314
500   1.206     1.005     0.0582   0.140
Table 3: Simulation results for θ = 1.2 and η = 1

We observed that as the sample size increases, the mean of the estimates θ̂ gets closer to the underlying true θ under all simulation scenarios. Similarly, the mean of η̂ gets closer to the underlying true η. In addition, the standard deviations of both θ̂ and η̂ decrease as the sample size increases for all settings of the simulation parameters. Thus, the MLEs of θ and η become more accurate as the sample size increases.

n     mean θ̂   mean η̂   sd θ̂    sd η̂
20    0.877     7.464     0.203    9.776
50    0.829     5.555     0.117    1.992
100   0.813     5.276     0.082    1.263
500   0.805     5.049     0.037    0.512
Table 4: Simulation results for θ = 0.8 and η = 5

n     mean θ̂   mean η̂   sd θ̂    sd η̂
20    1.098     7.256     0.258    6.480
50    1.036     5.626     0.146    2.070
100   1.017     5.269     0.101    1.232
500   1.003     5.048     0.049    0.511
Table 5: Simulation results for θ = 1.0 and η = 5

n     mean θ̂   mean η̂   sd θ̂    sd η̂
20    1.317     7.245     0.305    7.213
50    1.244     5.566     0.173    2.025
100   1.224     5.283     0.121    1.261
500   1.206     5.059     0.0579   0.511
Table 6: Simulation results for θ = 1.2 and η = 5
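The shrinking standard deviations in the tables follow the usual root-n pattern for maximum likelihood estimators. A minimal Monte Carlo sketch of that pattern, using the exponential-rate MLE as a stand-in model (an assumption for illustration; the paper's simulations draw from the composite density (2)):

```python
import random
import statistics

random.seed(7)

def mle_sd(n, reps=400, rate=1.0):
    """Sample standard deviation of the exponential-rate MLE (n / sample sum)
    across `reps` replications of samples of size n."""
    estimates = []
    for _ in range(reps):
        sample = [random.expovariate(rate) for _ in range(n)]
        estimates.append(n / sum(sample))  # MLE of the rate is 1 / sample mean
    return statistics.stdev(estimates)

# the spread of the estimator shrinks as the sample size grows
sds = {n: mle_sd(n) for n in (20, 100, 500)}
```

The resulting standard deviations drop monotonically with n, mirroring the behavior of sd θ̂ and sd η̂ in Tables 1 to 6.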

4 Numerical Examples

In this section, we present the performance of the exponentiated Inverse-Gamma Pareto model on two different insurance data sets.

4.1 Goodness-of-fit of the exponentiated Inverse-Gamma Pareto Model

To compare the performance of the different models when fitting the insurance datasets, the NLL, AIC, and BIC were used. These measures are described as follows:

  • NLL: the Negative Log-Likelihood, defined as the additive inverse of the log-likelihood function:

    NLL = −ln L(Θ̂ | y).

    The NLL reaches its minimum as the log-likelihood function reaches its maximum; thus, minimizing the NLL is equivalent to maximizing the log-likelihood function. For models with the same number of free parameters, the NLL can be utilized to compare model performance, where a lower NLL indicates that a model fits the data better.

  • AIC: Akaike's Information Criterion [burnham], defined as follows:

    AIC = −2 ln L(Θ̂ | y) + 2k,

    where k is the number of free parameters. The AIC can be used to compare models with different numbers of parameters, since the first term of the AIC decreases as the number of parameters increases, while the second term increases as the number of parameters increases. A smaller AIC value indicates that a model fits the data better.

  • BIC: the Bayesian Information Criterion [burnham], provided as follows:

    BIC = −2 ln L(Θ̂ | y) + k ln n,

    where k is the number of free parameters and n is the sample size of the data set. Similar to the AIC, the BIC penalizes models with more parameters through its second term. However, it penalizes free parameters more heavily than the AIC as n gets larger (ln n > 2 once n ≥ 8).
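All three criteria are direct functions of a fitted model's maximized log-likelihood. A small helper sketch (the function names are ours, not from the paper):

```python
import math

def nll(loglik):
    """Negative log-likelihood: the additive inverse of the maximized log-likelihood."""
    return -loglik

def aic(loglik, k):
    """Akaike's Information Criterion; k is the number of free parameters."""
    return -2.0 * loglik + 2.0 * k

def bic(loglik, k, n):
    """Bayesian Information Criterion; n is the sample size.
    Each parameter costs ln(n) instead of 2, so BIC is the harsher
    penalty whenever n >= 8 (ln 8 > 2)."""
    return -2.0 * loglik + k * math.log(n)

# a second parameter "pays for itself" under AIC only if it
# gains more than one unit of log-likelihood:
print(aic(-5000.0, 1), aic(-4998.0, 2))  # prints 10002.0 10000.0
```

With n = 2492 (the Danish data), the BIC charge per parameter is ln(2492) ≈ 7.8, nearly four times the AIC charge, which is why the two criteria can rank models differently.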

R software was used to compute the MLEs of the parameters in the different models, as well as the NLL, AIC, and BIC of each model.

4.1.1 Case 1: Danish fire insurance data

The Danish fire insurance data has been widely used by many researchers to check the performance of different composite models. The data set contains 2492 claims, in millions of Danish Kroner (DKK), from the years 1980 to 1990. We obtained the data from the SMPracticals package in R [smpractical] and completed the analysis with it.

Table 7 provides the performance of several models, including the exponentiated IG-Pareto model. The exponentiated IG-Pareto model outperforms the original one-parameter IG-Pareto model in terms of the goodness-of-fit measures. This is consistent with Figure 3, which compares the IG-Pareto fit, the exponentiated IG-Pareto fit, and the Gaussian kernel density estimate of the Danish fire insurance data. The exponentiated IG-Pareto model provides a satisfactory fit to the Danish fire insurance data, while the original one-parameter IG-Pareto model does not fit the same data set well. Among the three two-parameter models we chose, the Inverse-Gamma model performed slightly better than the exponentiated IG-Pareto model. However, the exponentiated IG-Pareto model gave a better performance than the two-parameter Weibull model.

4.1.2 Case 2: Norwegian fire insurance data

Similar to the Danish fire insurance loss data set, the Norwegian fire insurance data has been used by several researchers to investigate the performance of various loss models. The data set consists of 9181 claims, in thousands of Norwegian Kroner (NKK), from the years 1972 to 1992 for a Norwegian insurance company. We obtained the data set through the R package ReIns [reins]. Note that claims of size less than 500,000 NKK are recorded as 500,000 NKK. However, none of the claim values from the year 1972 are truncated, and therefore we selected the data from the year 1972 to assess the performance of the proposed model. Dealing with truncated data is beyond the scope of this article.

The claim data from the year 1972 consist of 97 values; the claim values, in millions of Norwegian Kroner (NKK), are as follows:

0.520, 0.529, 0.530, 0.530, 0.544, 0.545, 0.546, 0.549, 0.553, 0.555, 0.562, 0.565, 0.565, 0.568, 0.579, 0.586, 0.600, 0.600, 0.604, 0.605, 0.621, 0.627, 0.633, 0.636, 0.667, 0.670, 0.671, 0.676, 0.681, 0.682, 0.699, 0.706, 0.725, 0.729, 0.736, 0.741, 0.744, 0.750, 0.758, 0.764, 0.767, 0.778, 0.797, 0.810, 0.849, 0.856, 0.878, 0.900, 0.916, 0.919, 0.922, 0.930, 0.942, 0.943, 0.982, 0.991, 1.051, 1.059, 1.074, 1.130, 1.148, 1.150, 1.181, 1.189, 1.218, 1.271, 1.302, 1.428, 1.438, 1.442, 1.445, 1.450, 1.498, 1.503, 1.578, 1.895, 1.912, 1.920, 2.090, 2.370, 2.470, 2.522, 2.590, 2.722, 2.737, 2.924, 3.293, 3.544, 3.961, 5.412, 5.856, 6.032, 6.493, 8.648, 8.876, 13.911, 28.055
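A quick summary of these 97 claims shows the shape that motivates a composite model: most of the probability mass sits in small claims, while a few very large claims pull the mean well above the median.

```python
import statistics

# the 97 claim values (millions of NKK) from the 1972 Norwegian fire data listed above
claims = [
    0.520, 0.529, 0.530, 0.530, 0.544, 0.545, 0.546, 0.549, 0.553, 0.555,
    0.562, 0.565, 0.565, 0.568, 0.579, 0.586, 0.600, 0.600, 0.604, 0.605,
    0.621, 0.627, 0.633, 0.636, 0.667, 0.670, 0.671, 0.676, 0.681, 0.682,
    0.699, 0.706, 0.725, 0.729, 0.736, 0.741, 0.744, 0.750, 0.758, 0.764,
    0.767, 0.778, 0.797, 0.810, 0.849, 0.856, 0.878, 0.900, 0.916, 0.919,
    0.922, 0.930, 0.942, 0.943, 0.982, 0.991, 1.051, 1.059, 1.074, 1.130,
    1.148, 1.150, 1.181, 1.189, 1.218, 1.271, 1.302, 1.428, 1.438, 1.442,
    1.445, 1.450, 1.498, 1.503, 1.578, 1.895, 1.912, 1.920, 2.090, 2.370,
    2.470, 2.522, 2.590, 2.722, 2.737, 2.924, 3.293, 3.544, 3.961, 5.412,
    5.856, 6.032, 6.493, 8.648, 8.876, 13.911, 28.055,
]

n = len(claims)                  # 97 claims from 1972
med = statistics.median(claims)  # locates the bulk of small claims
avg = statistics.mean(claims)    # pulled upward by the heavy right tail
```

The median sits below 1 MNKK while the largest claim exceeds 28 MNKK, exactly the "many small claims, few large claims" pattern described in the Introduction.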

Table 8 provides the performance of several models, including the exponentiated IG-Pareto model. Similar to what we observed for the Danish fire insurance data, the exponentiated IG-Pareto model performed better than the original one-parameter IG-Pareto model in terms of the goodness-of-fit measures. This is consistent with Figure 4, where the exponentiated IG-Pareto model fits the Norwegian fire insurance data satisfactorily while the original one-parameter IG-Pareto model does not fit the same data set well. Among the three two-parameter models we chose, the exponentiated IG-Pareto model performed the best in terms of all the goodness-of-fit criteria considered.

Model MLE of parameters NLL AIC BIC
Weibull
IG
IG-Pareto
Exp IG-Pareto
Table 7: Goodness-of-fit of different models to the Danish fire data based on MLEs.
Figure 3: Density Plot of Danish Fire Insurance Data with corresponding exponentiated IG-Pareto and IG-Pareto model fit
Model MLE of parameters NLL AIC BIC
Weibull
IG
IG-Pareto
Exp IG-Pareto
Table 8: Goodness-of-fit of different models to the Norwegian fire insurance data (year 1972) based on MLEs.
Figure 4: Density Plot of Norwegian Fire Insurance Data (year 1972) with corresponding exponentiated IG-Pareto and IG-Pareto model fit

5 Conclusion

In this paper, we proposed a new exponentiated Inverse-Gamma Pareto model to improve the performance of the original one-parameter Inverse-Gamma Pareto model. In Section 2, we provided an algorithm to find the MLEs of θ and η. The algorithm identifies the MLEs reliably: the estimates of both θ and η become more accurate as the sample size gets larger in all simulation scenarios. Two numerical examples were provided, and the new exponentiated Inverse-Gamma Pareto model outperforms the original Inverse-Gamma Pareto model in both. The development of this model is promising, since the same exponentiation approach can also be applied to other composite models.

References