A regularization approach for stable estimation of loss development factors

In this article, we show that a new penalty function, which we call the log-adjusted absolute deviation (LAAD), emerges if we theoretically extend the Bayesian LASSO using conjugate hyperprior distributional assumptions. We further show that the estimator with the LAAD penalty has a closed form in the case of a single covariate, and that it extends to the general case when combined with a coordinate descent algorithm, with assured convergence under mild conditions. This has the advantage of avoiding unnecessary model bias as well as allowing variable selection, which we link to the choice of the tail factor in loss development for claims reserving. We calibrate our proposed model using a multi-line insurance dataset from a property and casualty company, in which aggregate reported losses are observed across accident years and development periods.


1 Introduction

The chain ladder method, an industry benchmark with theoretical treatments such as mack1993chainladder and mack1999chainladderse, has been widely used to determine the development pattern of reported or paid claims. Despite the prevalence of the method, however, some issues arise in the estimation of development factors for mature years. In general, we expect cumulative reported losses to increase gradually while the magnitude of development decreases, yet the loss development patterns in some run-off triangles do not follow this usual pattern. As mentioned in renshaw1989chain, this is because the triangular or trapezoidal shape of aggregated claims data leaves only a few data points in the north-east corner, and estimates of the parameters that depend on those data points become unstable. Therefore, we need a way to estimate the development factors for mature years with more stability.

Further, loss development may continue beyond ten years, which is usually the maximum number of years tracked in a run-off triangle, so one must consider a 'tail factor' of loss development to predict ultimate claims. However, naive use of the chain ladder method provides no way to estimate the tail factor, so in practice it is chosen subjectively or taken from an industry benchmark. In this regard, there have been several research works on the selection of a tail factor from a loss development triangle, including but not limited to boor2006tailfactor, verrall2012tailfactor, and merz2013tailfactor.

In order to deal with the aforementioned issues, stable estimation of development factors for mature years and tail factor selection, one can apply a regularization method, or penalized regression, to loss development models. Today there is a rich literature on penalization in the regression framework. The first penalization method introduced was ridge regression, developed by hoerl1970ridge. By adding an $\ell_2$ penalty term to the least squares objective, they showed that one can attain a smaller mean squared error when there is severe multicollinearity in a given dataset. However, ridge regression has only the shrinkage property, not the property of variable selection. To obtain the latter, tibshirani1996lasso suggested LASSO, which places an $\ell_1$ penalty term on the least squares objective, and showed that this method enables variable selection and hence dimension reduction as well. Despite the simplicity of the proposed method, a great deal of work has been done to extend the LASSO framework. For example, park2008bayesian extended LASSO by providing a Bayesian interpretation of it. Although LASSO has the variable selection property, the estimates derived by LASSO regression are inherently biased, and there have been meaningful approaches that attain both the variable selection property and less biased estimates. For example, fan2001scad proposed the smoothly clipped absolute deviation (SCAD) penalty, a continuously differentiable penalty function designed to achieve three properties: (i) nearly unbiased estimates when the absolute value of the true unknown parameter is large, (ii) variable selection, and (iii) continuity of the resulting estimates. Although the SCAD penalty has these attractive properties, it leads to a non-convex optimization problem and thus loses the desirable guarantees of convex optimization. Thus, zhang2010mcp proposed the minimax concave penalty (MCP), which minimizes the maximum concavity subject to an unbiasedness feature.

The use of penalized regression in the actuarial literature is not quite new, but there have been only a few relevant works in the field. For instance, williams2015elastic applied the elastic net penalty, a combination of the $\ell_1$ and $\ell_2$ penalties, to a dataset with over 350 initial covariates to enhance insurance claims prediction. In addition, nawar2016 used LASSO to detect possible interactions between the covariates used in claims modeling.

In this paper, we extend the Bayesian LASSO framework of park2008bayesian using conjugate hyperprior distributional assumptions. This extension leads to a new penalty function, which we call log-adjusted absolute deviation (LAAD), that retains the variable selection property while reducing the bias of the estimator. This motivated us to apply the LAAD penalty to the cross-classified model used in loss development for reserving, in order to choose the tail factor. To calibrate the model, we use reported loss triangles for multiple lines of business from a property and casualty (P&C) insurer. We compare the estimated loss development factors, including the tail factor, of our proposed model with those of the usual cross-classified model and of the cross-classified model with a LASSO penalty. All these models are compared in the section on estimation and prediction, where we also discuss validation measures. It turns out that our proposed model provides reasonable estimates of the loss development factors, which agree with our prior knowledge of and expectations about the loss development pattern, and shows better performance in the prediction of reserves.

This paper is organized as follows. In Section 2, we develop the construction of the new penalty via a Bayesian interpretation of LASSO and study its properties by comparing it with other penalty functions. In Section 3, we demonstrate the merits of our proposed method via a simulation study. In Section 4, we explore a possible use of our method in insurance reserving using a multi-line reported loss triangle dataset and provide estimation and prediction results for the various models. We conclude in Section 5.

2 Proposed method: LAAD

2.1 Derivation of loss function with LAAD penalty

According to park2008bayesian, we may interpret LASSO in a Bayesian framework as follows:

$$ y \mid \beta \sim \mathcal{N}(X\beta, \sigma^2 I_n), \qquad \beta_j \mid \lambda \stackrel{iid}{\sim} \mathrm{Laplace}(\lambda), $$

where $\beta$ is a vector of size $p$ with each component having density $\pi(\beta_j \mid \lambda) = \frac{\lambda}{2} e^{-\lambda |\beta_j|}$, for $j = 1, \ldots, p$. According to their specification, we may express the likelihood and the log-likelihood for $\beta$, respectively, as

$$ L(\beta \mid y, \lambda) \propto \exp\!\left( -\frac{1}{2\sigma^2} \lVert y - X\beta \rVert_2^2 \right) \prod_{j=1}^{p} \frac{\lambda}{2} e^{-\lambda |\beta_j|}, \qquad \ell(\beta \mid y, \lambda) = -\frac{1}{2\sigma^2} \lVert y - X\beta \rVert_2^2 - \lambda \sum_{j=1}^{p} |\beta_j| + C. \tag{1} $$

In their work, park2008bayesian suggested two ways to choose the optimal $\lambda$ in Equation (1): the use of a point estimate obtained by cross-validation, or the use of a 'hyperprior' distribution for $\lambda$. However, they did not provide a detailed derivation of the likelihood when a hyperprior is used.

Now, consider the following distributional assumptions:

$$ y \mid \beta \sim \mathcal{N}(X\beta, \sigma^2 I_n), \qquad \beta_j \mid \lambda_j \sim \mathrm{Laplace}(\lambda_j), \qquad \lambda_j \stackrel{iid}{\sim} \mathrm{Gamma}(r, 1). $$

In other words, the hyperprior of each $\lambda_j$ follows a gamma distribution with density $\pi(\lambda_j) = \lambda_j^{r-1} e^{-\lambda_j} / \Gamma(r)$. This implies that we have:

$$ \pi(\beta_j) = \int_0^\infty \frac{\lambda_j}{2} e^{-\lambda_j |\beta_j|} \, \pi(\lambda_j) \, d\lambda_j = \frac{r}{2} \big(1 + |\beta_j|\big)^{-(r+1)}, \qquad \ell(\beta \mid y) = -\frac{1}{2\sigma^2} \lVert y - X\beta \rVert_2^2 - (r+1) \sum_{j=1}^{p} \log\big(1 + |\beta_j|\big) + C. \tag{2} $$

As a result, the log-likelihood in Equation (2) allows us to have the following formulation of our penalized least squares problem. This gives rise to what we call the log-adjusted absolute deviation (LAAD) penalty function:

$$ P^{LAAD}_\lambda(\beta) = \lambda \sum_{j=1}^{p} \log\big(1 + |\beta_j|\big), $$

so that

$$ \hat{\beta} = \operatorname*{argmin}_\beta \; \frac{1}{2} \lVert y - X\beta \rVert_2^2 + \lambda \sum_{j=1}^{p} \log\big(1 + |\beta_j|\big). $$

A similar penalty function is described as the hierarchical adaptive lasso in lee2010hal, but the authors did not provide details of the derivation.
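As a quick numerical check of the marginalization behind Equation (2), the following sketch (ours, not part of the original paper; `marginal_numeric` and `marginal_closed` are names we introduce) integrates the Laplace density against the Gamma(r, 1) hyperprior and compares the result with the stated closed form:

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gammaln

def marginal_numeric(beta, r):
    # integrate the Laplace(lam) density of beta against the Gamma(r, 1) hyperprior
    integrand = lambda lam: (lam / 2) * np.exp(-lam * abs(beta)) \
        * lam ** (r - 1) * np.exp(-lam - gammaln(r))
    value, _ = quad(integrand, 0, np.inf)
    return value

def marginal_closed(beta, r):
    # closed form implied by Equation (2): (r/2) * (1 + |beta|)^{-(r+1)}
    return (r / 2) * (1 + abs(beta)) ** (-(r + 1))

for b in [0.0, 0.5, 2.0, 10.0]:
    print(b, marginal_numeric(b, r=3.0), marginal_closed(b, r=3.0))
```

The two columns agree to quadrature precision, so the negative log marginal is, up to an additive constant, $(r+1)\log(1+|\beta|)$, i.e., a LAAD-type penalty.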

2.2 Estimation with LAAD penalty

To better understand the characteristics of a model with the LAAD penalty, let us consider a simple example with a single standardized covariate, so that $p = 1$ and $\lVert x \rVert_2^2 = 1$. In this case, the optimization in Equation (2) reduces to a univariate problem, so that it is sufficient to solve the following:

$$ \hat{\beta} = \operatorname*{argmin}_\beta \; \frac{1}{2} (z - \beta)^2 + \lambda \log\big(1 + |\beta|\big), \tag{3} $$

where $z = x'y$.

Theorem 1.

Let us set $R_\lambda(\beta; z) = \frac{1}{2}(z - \beta)^2 + \lambda \log(1 + |\beta|)$. Then the corresponding minimizer will be given as $\hat{\beta} = \operatorname*{argmin}_\beta R_\lambda(\beta; z)$, where

$$ \hat{\beta} = \begin{cases} 0, & |z| \le z_\lambda, \\[4pt] \mathrm{sgn}(z) \cdot \dfrac{(|z| - 1) + \sqrt{(|z|+1)^2 - 4\lambda}}{2}, & |z| > z_\lambda, \end{cases} $$

and $z_\lambda$ is the unique solution of

$$ R_\lambda\big(\beta^+(z); z\big) = R_\lambda(0; z), \qquad \text{with } \beta^+(z) = \frac{1}{2}\Big[ (z - 1) + \sqrt{(z+1)^2 - 4\lambda} \Big]. $$

Proof.

See Appendix A. ∎

Note that $\hat{\beta} \to z$ as $|z| \to \infty$, which means the estimator converges to the unpenalized estimate when the signal is large. Therefore, by using the LAAD penalty, we obtain an optimizer that has the variable selection property, together with a bias that vanishes as the true parameter value departs from zero. Figure 1 provides graphs describing the behavior of the optimizer under different penalties. The first graph shows the behavior of the optimizer under the $\ell_2$ penalty, which is also called ridge regression. In that case, as previously mentioned, there is no variable selection; the penalty only shrinks the magnitude of the estimates. The second graph shows the behavior of the optimizer under the $\ell_1$ penalty, the basic LASSO. In this case, although it has the variable selection property (if $|z|$ is small enough, then $\hat{\beta}$ becomes 0), the discrepancy between $z$ and $\hat{\beta}$ remains constant even when $z$ is very large. Finally, the third graph shows the behavior of the optimizer under the proposed LAAD penalty. Not only does the optimizer have the variable selection property, but $\hat{\beta}$ also converges to $z$ as $|z|$ increases.

Figure 1: Estimate behavior for different penalty functions
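The thresholding rule of Theorem 1 is easy to verify numerically. The sketch below (our illustration; function names are ours) evaluates the objective at 0 and at the positive root of the first-order condition, takes the smaller of the two, and checks the answer against a brute-force grid search:

```python
import numpy as np

def laad_objective(beta, z, lam):
    # R_lambda(beta; z) = 0.5*(z - beta)^2 + lam*log(1 + |beta|)
    return 0.5 * (beta - z) ** 2 + lam * np.log1p(np.abs(beta))

def laad_univariate(z, lam):
    """Exact minimizer of the univariate LAAD problem in Equation (3)."""
    s, a = np.sign(z), np.abs(z)
    candidates = [0.0]
    disc = (a + 1) ** 2 - 4 * lam            # discriminant of the first-order condition
    if disc >= 0:
        root = ((a - 1) + np.sqrt(disc)) / 2  # larger root: the only candidate local minimum
        if root > 0:
            candidates.append(s * root)
    return min(candidates, key=lambda b: laad_objective(b, z, lam))

# brute-force check of the closed form on a fine grid
grid = np.linspace(-15, 15, 600001)
for z in [0.3, 1.0, 2.5, 10.0]:
    brute = grid[np.argmin(laad_objective(grid, z, lam=2.0))]
    print(z, laad_univariate(z, lam=2.0), brute)
```

For large $z$ the estimate approaches $z$ itself, illustrating the vanishing bias discussed above.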

Figure 2 illustrates the constraint regions implied by each penalty. It is well known that the constraint region defined by $\ell_2$ penalization is a $p$-dimensional ball, whereas the constraint region defined by $\ell_1$ penalization is a $p$-dimensional diamond. We can observe that in both cases of $\ell_1$ and $\ell_2$ penalization the constraint regions are convex, which implies we retain the good properties of convex optimization. However, the constraint region implied by the LAAD penalty is non-convex, which is the unavoidable price of obtaining both consistency of the estimates and the variable selection property.

Figure 2: Constraint regions for different penalties

It is also possible to compare the behavior of the LAAD penalty with the SCAD penalty and MCP. According to fan2001scad and zhang2010mcp, one can write down the derivatives of the penalty functions in the univariate case, assuming $\beta > 0$, as follows:

$$ {P^{LASSO}_\lambda}'(\beta) = \lambda, \qquad {P^{SCAD}_\lambda}'(\beta) = \lambda \left[ \mathbb{1}(\beta \le \lambda) + \frac{(a\lambda - \beta)_+}{(a-1)\lambda} \mathbb{1}(\beta > \lambda) \right] \text{ for } a > 2, $$
$$ {P^{MCP}_\lambda}'(\beta) = \left( \lambda - \frac{\beta}{\gamma} \right)_+ \text{ for } \gamma > 1, \qquad {P^{LAAD}_\lambda}'(\beta) = \frac{\lambda}{1 + \beta}. $$

From the above, it is straightforward to see that

$$ \lim_{\beta \to \infty} {P^{SCAD}_\lambda}'(\beta) = \lim_{\beta \to \infty} {P^{MCP}_\lambda}'(\beta) = \lim_{\beta \to \infty} {P^{LAAD}_\lambda}'(\beta) = 0. $$

This implies that the marginal effect of the penalty converges to 0 as the value of $\beta$ increases; hence the magnitude of distortion on the estimate becomes negligible as the true coefficient gets larger when we use the SCAD penalty, MCP, or LAAD penalty. However, $\lim_{\beta \to \infty} {P^{LASSO}_\lambda}'(\beta) = \lambda$, which means that under LASSO the magnitude of distortion on the estimate stays the same even when the true coefficient is very large.

On the other hand, if we let $\beta$ go to 0, one can see that

$$ \lim_{\beta \to 0^+} {P^{SCAD}_\lambda}'(\beta) = \lim_{\beta \to 0^+} {P^{MCP}_\lambda}'(\beta) = \lim_{\beta \to 0^+} {P^{LAAD}_\lambda}'(\beta) = \lambda = \lim_{\beta \to 0^+} {P^{LASSO}_\lambda}'(\beta), $$

which implies that for the SCAD penalty, MCP, and LAAD penalty, the magnitude of penalization matches that of LASSO when the true value of $\beta$ is very small. Therefore, we verify that SCAD, MCP, and LAAD have the same variable selection property as LASSO when the true $\beta$ is small enough.

2.3 Implementation in general case and convergence analysis

Estimating the parameters of a given penalized least squares problem is an optimization task. Since an analytic solution is available in the univariate case, one can build an algorithm for the multivariate case around it. For example, to obtain $\hat{\beta}$ in the multivariate case, we may apply the coordinate descent algorithm proposed by luo1992coordinate, which starts with an initial set of estimates and then successively optimizes along each coordinate or block of coordinates, applying the univariate solution of Theorem 1 coordinate by coordinate. The algorithm is sketched below.
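A minimal sketch of this coordinate descent (ours, reusing `laad_univariate` from the earlier sketch; it is not the authors' implementation, and the rescaling comment states the assumption made for non-standardized columns):

```python
import numpy as np

def laad_coordinate_descent(X, y, lam, n_iter=200, tol=1e-8):
    """Cyclic coordinate descent for 0.5*||y - X b||^2 + lam * sum(log(1 + |b_j|))."""
    n, p = X.shape
    beta = np.zeros(p)
    col_norm2 = (X ** 2).sum(axis=0)
    resid = y - X @ beta
    for _ in range(n_iter):
        beta_old = beta.copy()
        for j in range(p):
            resid += X[:, j] * beta[j]                 # remove coordinate j from the fit
            z_j = X[:, j] @ resid / col_norm2[j]
            # rescaled univariate problem: 0.5*(b - z_j)^2 + (lam/||x_j||^2)*log(1 + |b|)
            beta[j] = laad_univariate(z_j, lam / col_norm2[j])
            resid -= X[:, j] * beta[j]                 # restore with the updated value
        if np.max(np.abs(beta - beta_old)) < tol:      # stop once a full cycle barely moves
            break
    return beta
```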

Although our optimization problem is non-convex, we can obtain a sufficient condition under which the coordinate descent algorithm converges for our problem. To show the convergence, we need to introduce the concepts of quasi-convexity and hemivariateness. A function is hemivariate if it is not constant on any interval of its domain. A function $f$ is quasi-convex if

$$ f\big(\theta x + (1 - \theta) y\big) \le \max\{ f(x), f(y) \} \quad \text{for all } x, y \text{ and } \theta \in [0, 1]. $$

An example of a function that is both quasi-convex and hemivariate is $f(x) = \log(1 + |x|)$.

The following lemma is useful for obtaining a sufficient condition that our optimization problem converges with coordinate descent algorithm.

Lemma 1.

Suppose a function $g$ is defined as follows:

$$ g(\beta; z) = \frac{1}{2}(z - \beta)^2 + \lambda \log\big(1 + |\beta|\big), $$

for $\beta \in \mathbb{R}$ and $\lambda > 0$. If $\lambda \le 1$, then $g(\cdot\,; z)$ is both quasi-convex and hemivariate for all $z$.

Proof.

See Appendix B. ∎

Theorem 2.

If $\lVert x_j \rVert_2 = 1$ for all $j = 1, \ldots, p$ and $\lambda \le 1$, then the solution from the coordinate descent algorithm with the function

$$ f(\beta) = \frac{1}{2} \lVert y - X\beta \rVert_2^2 + \lambda \sum_{j=1}^{p} \log\big(1 + |\beta_j|\big) $$

converges to $\hat{\beta}$, where $\hat{\beta}$ is a coordinate-wise minimum point of $f$.

Proof.

According to Theorem 5.1 of tseng2001cdescent, it suffices to show that (i) $f$ is continuous on $\mathbb{R}^p$, (ii) $f$ is lower semicontinuous, and (iii) each coordinate function $\beta_j \mapsto f(\beta)$ is quasi-convex and hemivariate. (i) and (ii) are obvious, and (iii) follows from Lemma 1. ∎

3 Simulation study

In this section, we conduct a simulation study in order to demonstrate the merits of our proposed method. Suppose we have nine available covariates $x_1, \ldots, x_9$ and a response variable $y$ generated from a linear model in $x_1, \ldots, x_9$ that also includes the interaction term $x_2 x_3$.

One can check that if a regression model is calibrated using $x_1, \ldots, x_9$, then the estimated regression coefficients are all significant. However, even if all covariates are significant by themselves, omission of the effective interaction term $x_2 x_3$ can lead to biases in the estimated coefficients and subsequently a lack of fit, as illustrated in Figure 3. In Figure 3, the reduced model is a linear model fitted only with $x_1, \ldots, x_9$, while the true model is a linear model fitted with $x_1, \ldots, x_9$ and $x_2 x_3$.

Figure 3: QQ plots for the reduced model and the true model

On the other hand, including every interaction term may also yield an inferior model, since it accumulates noise in the estimation and leads to higher variances of the estimates. As elaborated in james2013isl, the mean squared error (MSE) of a predicted value under a linear model is determined by both the variance of the predicted value and the squared bias of the estimated regression coefficients, as follows:

$$ \mathrm{E}\left[ \big(y_0 - x_0'\hat{\beta}\big)^2 \right] = \mathrm{Var}\big(x_0'\hat{\beta}\big) + \left[ \mathrm{Bias}\big(x_0'\hat{\beta}\big) \right]^2 + \mathrm{Var}(\varepsilon). \tag{4} $$

From Equation (4), we note that by including fewer variables in our model via variable selection, we can lower $\mathrm{Var}(x_0'\hat{\beta})$. However, doing so could increase $[\mathrm{Bias}(x_0'\hat{\beta})]^2$, due to omitted variable bias (if a relevant variable has been selected out) or to the inherent bias of the estimates caused by penalization. This implies that if most of the original variables are significant, so that the magnitude of the bias is too high, the benefit of a reduced variance is offset by a higher squared bias. Variable selection should therefore be performed carefully, to balance bias against variance and obtain better predictions with lower mean squared error.
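To make the trade-off in Equation (4) concrete, here is a small Monte Carlo sketch (ours, with an arbitrary toy design unrelated to the simulation below) contrasting a fit with many irrelevant covariates against a correctly reduced fit:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p_noise, n_rep = 50, 20, 2000
beta_true = np.r_[1.5, np.zeros(p_noise)]   # only the first covariate matters
x0 = np.r_[1.0, np.zeros(p_noise)]          # prediction point, true mean = 1.5
preds_full, preds_red = [], []
for _ in range(n_rep):
    X = rng.normal(size=(n, p_noise + 1))
    y = X @ beta_true + rng.normal(size=n)
    preds_full.append(x0 @ np.linalg.lstsq(X, y, rcond=None)[0])
    preds_red.append(x0[:1] @ np.linalg.lstsq(X[:, :1], y, rcond=None)[0])
for name, pred in [("full", preds_full), ("reduced", preds_red)]:
    pred = np.asarray(pred)
    print(name, "variance:", pred.var(), "squared bias:", (pred.mean() - 1.5) ** 2)
```

On this design the full fit is essentially unbiased but has markedly higher variance at the prediction point, which is exactly the mechanism Equation (4) describes.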

To show the merits of our proposed penalty function, we first obtain 100 replications of simulated samples with sample size 1,000 and estimate the regression coefficients based on the following four models:

  • Full model: a linear model fitted with $x_1, \ldots, x_9$ and every possible interaction among them,

  • Reduced model: a linear model fitted only with $x_1, \ldots, x_9$,

  • LASSO model: the full model regularized with the $\ell_1$ penalty,

  • LAAD model: the full model regularized with the LAAD penalty.

To evaluate the estimation results under each model, we introduce the following metrics, which measure the discrepancy between the true and estimated coefficients under each model:

$$ \mathrm{Bias}_j = \frac{1}{100} \sum_{s=1}^{100} \left( \hat{\beta}_j^{(s)} - \beta_j \right), \qquad \mathrm{RMSE}_j = \sqrt{ \frac{1}{100} \sum_{s=1}^{100} \left( \hat{\beta}_j^{(s)} - \beta_j \right)^2 }, $$

where $\beta_j$ means the true value of the coefficient and $\hat{\beta}_j^{(s)}$ refers to the estimated value of the coefficient from the $s$-th simulated sample. According to Table 1, the full model is most favored in terms of the biases of the estimated coefficients, which is reasonable since the ordinary least squares (OLS) estimator is unbiased. However, one can see that the RMSEs of the estimated coefficients under the full model are greater than those of the LAAD model, so the LAAD model is expected to provide better estimation in general. It is also observed that the estimation results of the reduced model and the LASSO model are quite poor, which suggests that naive use of the LASSO penalty may not work well.

Bias RMSE
Full Reduced LASSO LAAD Full Reduced LASSO LAAD
x1 -0.006 0.216 -0.010 0.006 0.082 0.222 0.039 0.025
x2 -0.020 0.938 -0.055 0.002 0.165 0.944 0.098 0.035
x3 -0.011 -1.980 -0.141 -0.059 0.099 1.981 0.155 0.073
x4 0.003 0.029 0.047 0.002 0.116 0.046 0.069 0.018
x5 0.012 -0.001 -0.077 -0.009 0.123 0.038 0.109 0.018
x6 -0.004 0.002 0.057 0.003 0.088 0.023 0.077 0.012
x7 0.004 -0.031 -0.040 -0.005 0.114 0.046 0.064 0.016
x8 -0.003 0.089 -0.116 -0.051 0.180 0.109 0.142 0.058
x9 0.012 0.123 0.042 0.020 0.148 0.137 0.074 0.038
‘x1 : x6’ 0.001 0.000 -0.004 0.000 0.012 0.000 0.008 0.000
‘x2 : x3’ -0.001 -1.000 -0.024 -0.024 0.016 1.000 0.029 0.029
‘x3 : x4’ 0.001 0.000 0.002 0.000 0.008 0.000 0.006 0.000
‘x4 : x6’ 0.000 0.000 -0.001 0.000 0.005 0.000 0.004 0.000
Table 1: Summary of estimation

Besides the values of the estimated coefficients, it is also of interest to assess the ability to capture the correct degree of sparsity in a model, with the following measures:

$$ \ell_1 \text{ difference} = \sum_{j} \big| \hat{\beta}_j - \beta_j \big|, \qquad \ell_0 \text{ difference} = \sum_{j} \mathbb{1}\Big( \mathbb{1}\{\hat{\beta}_j \ne 0\} \ne \mathbb{1}\{\beta_j \ne 0\} \Big), $$

each averaged over the 100 replications. Table 2 shows how well the LAAD model captures the sparsity of the true model. One can see that the LAAD model shows the smallest mean $\ell_1$ and $\ell_0$ norm differences, while the full model fails to capture the sparsity of the true model. Therefore, this simulation supports the assertion that our proposed LAAD penalty can be used in practice with better performance.

Full Reduced LASSO LAAD
Mean $\ell_1$ norm difference 1.351 4.478 0.88 0.272
Mean $\ell_0$ norm difference 35.000 1.000 19.07 0.440
Table 2: $\ell_1$ and $\ell_0$ norm differences for each model
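These two measures can be computed per replication as follows (small helpers we add; the reported values are their means over the 100 replications):

```python
import numpy as np

def l1_difference(beta_hat, beta_true):
    # sum of absolute coefficient errors for one replication
    return float(np.abs(beta_hat - beta_true).sum())

def l0_difference(beta_hat, beta_true, eps=1e-10):
    # number of coordinates whose zero/non-zero status is misclassified
    return int(((np.abs(beta_hat) > eps) != (np.abs(beta_true) > eps)).sum())
```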

4 Empirical analysis: application in loss development methods

4.1 Data characteristics

A dataset from the ACE Limited 2011 Global Loss Triangles, shown in Tables 3 and 4, is used for our empirical analysis. It summarizes two lines of insurance business, General Liability and Other Casualty, in the form of reported claim triangles.

The given dataset can also be expressed in the following way:

$$ \mathcal{D} = \left\{ y_{ij}^{(\ell)} : i = 1, \ldots, I, \; j = 1, \ldots, I - i + 1, \; \ell = 1, 2 \right\}, \tag{5} $$

where $y_{ij}^{(\ell)}$ means the reported claim for line of insurance business $\ell$ in accident year $i$ with development lag $j$. Note that $I = 10$ in our case.

Based on the reported claims data (the upper triangle), an insurance company needs to predict the ultimate claims (the lower triangle), described as follows:

$$ \left\{ y_{ij}^{(\ell)} : i = 2, \ldots, I, \; j = I - i + 2, \ldots, I, \; \ell = 1, 2 \right\}. \tag{6} $$
DL 1 DL 2 DL 3 DL 4 DL 5 DL 6 DL 7 DL 8 DL 9 DL 10
AY 1 87,133 146,413 330,129 417,377 456,124 556,588 563,699 570,371 598,839 607,665
AY 2 78,132 296,891 470,464 485,708 510,283 568,528 591,838 662,023 644,021
AY 3 175,592 233,149 325,726 449,556 532,233 617,848 660,776 678,142
AY 4 143,874 342,952 448,157 599,545 786,951 913,238 971,329
AY 5 140,233 284,151 424,930 599,393 680,687 770,348
AY 6 137,492 323,953 535,326 824,561 1,056,066
AY 7 143,536 350,646 558,391 708,947
AY 8 142,149 317,203 451,810
AY 9 128,809 298,374
AY 10 136,082
Table 3: Reported claim triangle for General Liability
DL 1 DL 2 DL 3 DL 4 DL 5 DL 6 DL 7 DL 8 DL 9 DL 10
AY 1 201,702 262,233 279,314 313,632 296,073 312,315 308,072 309,532 310,710 297,929
AY 2 202,361 240,051 265,869 302,303 347,636 364,091 358,962 361,851 355,373
AY 3 243,469 289,974 343,664 360,833 372,574 373,362 382,361 380,258
AY 4 338,857 359,745 391,942 411,723 430,550 442,790 437,408
AY 5 253,271 336,945 372,591 393,272 408,099 415,102
AY 6 247,272 347,841 392,010 425,802 430,843
AY 7 411,645 612,109 651,992 688,353
AY 8 254,447 368,721 405,869
AY 9 373,039 494,306
AY 10 453,496
Table 4: Reported claim triangle for Other Casualty

Note that although it may be natural to consider possible dependence among the different lines of business, we refrain from incorporating dependence in this paper in order to focus on variable selection via the LAAD penalty. For this reason, we write $y_{ij}$ without the superscript that denotes the line of business. Those who are interested in dependence modeling among different lines of business may refer to shi2012multireserve and jeong2019vinereserving.

4.2 Model specifications and estimation

In our search for a loss development model, we use the cross-classified model, which was also introduced in shi2011depreserve and taylor2016. For each line of business, the unconstrained lognormal cross-classified model is formulated as follows:

$$ \log y_{ij} = \mu + \alpha_i + \gamma_j + \varepsilon_{ij}, \qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2), \tag{7} $$

where $\mu$ means the overall mean of the losses from the line of business, $\alpha_i$ is the effect for accident year $i$, and $\gamma_j$ means the cumulative development at lag $j$.

General Liability Other Casualty
Estimate Pr(>|t|) Estimate Pr(>|t|)
$\mu$ 11.382 0.000 12.173 0.000
$\gamma_2$ 0.789 0.000 0.260 0.000
$\gamma_3$ 1.236 0.000 0.359 0.000
$\gamma_4$ 1.515 0.000 0.430 0.000
$\gamma_5$ 1.673 0.000 0.464 0.000
$\gamma_6$ 1.779 0.000 0.491 0.000
$\gamma_7$ 1.825 0.000 0.489 0.000
$\gamma_8$ 1.850 0.000 0.506 0.000
$\gamma_9$ 1.874 0.000 0.508 0.000
$\gamma_{10}$ 1.936 0.000 0.432 0.000
$\alpha_2$ 0.168 0.020 0.065 0.027
$\alpha_3$ 0.221 0.004 0.188 0.000
$\alpha_4$ 0.505 0.000 0.370 0.000
$\alpha_5$ 0.396 0.000 0.282 0.000
$\alpha_6$ 0.616 0.000 0.323 0.000
$\alpha_7$ 0.570 0.000 0.835 0.000
$\alpha_8$ 0.461 0.000 0.347 0.000
$\alpha_9$ 0.410 0.002 0.667 0.000
$\alpha_{10}$ 0.439 0.010 0.852 0.000
Adj-$R^2$ 0.999 0.999
Table 5: Summary of unconstrained model estimation

Although the unconstrained lognormal regression model shows a nearly perfect fit in terms of adjusted $R^2$, two issues need to be considered. Firstly, it is natural that the incremental reported loss amount gradually decreases while the cumulative reported loss amount still increases until it is developed to the ultimate level, which is equivalent to $0 \le \gamma_{j+1} - \gamma_j \le \gamma_j - \gamma_{j-1}$ for $j = 2, \ldots, 9$. It is observed, however, that the estimated values do not show this pattern for both lines of business in Table 5. Secondly, development of losses is usually recorded for ten years in a triangle that follows the format of Schedule P of the NAIC-mandated Annual Statement. However, a claim can take much longer than ten years to be finalized in a long-tail line of P&C insurance, such as workers' compensation. Therefore, one needs to consider the 'tail development factor', which accounts for the magnitude of loss development from a finite stage of development to ultimate.

In order to handle the aforementioned issues simultaneously, we propose a penalized cross-classified model. Since both $\mu$ and $\alpha_i$ are nuisance parameters in terms of tail factor selection, we modify the formulation in (7) in the following way:

$$ \log y_{ij} = \mu + \alpha_i + \sum_{k=2}^{j} \delta_k + \varepsilon_{ij}, \qquad \delta_k = \gamma_k - \gamma_{k-1}. \tag{8} $$

In this formulation, $\exp(\delta_k)$ can be interpreted as the incremental development factor from lag $k-1$ to lag $k$, so that if $\delta_k = 0$ for every $k$ beyond a certain lag, then there is no more development of the loss after that lag, and this determines the tail factor. Therefore, this formulation allows us to choose the tail factor systematically, based on a variable selection procedure performed with penalized regression on the given data, not by subjective judgment. In that regard, we propose the following three model specifications (a code sketch of the fitting follows the list):

  • Unconstrained model - a model which minimizes the following for each line of business:

$$ \sum_{\{(i,j)\,:\, i+j \le I+1\}} \Big( \log y_{ij} - \mu - \alpha_i - \sum_{k=2}^{j} \delta_k \Big)^2, $$

  • LASSO constrained model - a model which minimizes the following for each line of business with the LASSO penalty:

$$ \sum_{\{(i,j)\,:\, i+j \le I+1\}} \Big( \log y_{ij} - \mu - \alpha_i - \sum_{k=2}^{j} \delta_k \Big)^2 + \lambda \sum_{k=3}^{I} |\delta_k|, $$

  • LAAD constrained model - a model which minimizes the following for each line of business with the LAAD penalty:

$$ \sum_{\{(i,j)\,:\, i+j \le I+1\}} \Big( \log y_{ij} - \mu - \alpha_i - \sum_{k=2}^{j} \delta_k \Big)^2 + \lambda \sum_{k=3}^{I} \log\big(1 + |\delta_k|\big). $$

Note that for both the LASSO and LAAD constrained models, $\delta_2$ is not penalized in the estimation, in order to avoid an underreserving issue due to penalization.
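A sketch of how the design matrix implied by (8) can be built and how the LAAD constrained model can be fitted with the coordinate descent of Section 2.3 (our illustration; the treatment of $\mu$, $\alpha_i$, and $\delta_2$ as unpenalized coordinates via a simple penalty mask is our assumption about the implementation details):

```python
import numpy as np

def build_design(n_ay=10):
    """Design matrix for model (8). Rows are the observed cells (i, j) with
    i + j <= n_ay + 1; columns are [intercept mu, alpha_2..alpha_10,
    delta_2..delta_10], where the delta_k column is 1 whenever j >= k."""
    rows, cells = [], []
    for i in range(1, n_ay + 1):
        for j in range(1, n_ay + 2 - i):
            ay = [1.0 if i == a else 0.0 for a in range(2, n_ay + 1)]
            dl = [1.0 if j >= k else 0.0 for k in range(2, n_ay + 1)]
            rows.append([1.0] + ay + dl)
            cells.append((i, j))
    return np.array(rows), cells

def fit_laad_reserving(X, logy, lam, penalized, n_iter=500, tol=1e-9):
    """Coordinate descent with a per-coordinate penalty mask: coordinates with
    penalized[j] == False get a plain least-squares update (lam_j = 0)."""
    n, p = X.shape
    beta = np.zeros(p)
    col_norm2 = (X ** 2).sum(axis=0)
    resid = logy - X @ beta
    for _ in range(n_iter):
        beta_old = beta.copy()
        for j in range(p):
            resid += X[:, j] * beta[j]
            z_j = X[:, j] @ resid / col_norm2[j]
            lam_j = lam / col_norm2[j] if penalized[j] else 0.0
            beta[j] = laad_univariate(z_j, lam_j)
            resid -= X[:, j] * beta[j]
        if np.max(np.abs(beta - beta_old)) < tol:
            break
    return beta

# usage, assuming `claims` holds the 55 upper-triangle values ordered as in `cells`:
# X, cells = build_design()
# penalized = np.r_[np.zeros(11, dtype=bool), np.ones(8, dtype=bool)]  # mu, alphas, delta_2 free
# beta_hat = fit_laad_reserving(X, np.log(claims), lam=0.1, penalized=penalized)
```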

When variable selection via penalization is implemented, the tuning parameter that controls the magnitude of the penalty must be set, either via cross-validation or based on prior knowledge. In our search for the tuning parameter of the LASSO constrained model, the usual cross-validation method is applied with the glmnet routine in R: the average root mean squared error (RMSE) over the n-fold cross-validation splits is examined for each candidate value, and the value that yields the smallest average cross-validation RMSE is chosen. For details, see friedman2009glmnet.

For the calibration of the tuning parameter of the LAAD constrained model, we apply the idea of prior elicitation in Bayesian statistics. Recall that the derivation of the LAAD penalty was based on a hierarchical Bayesian model; in this sense, calibration of the tuning parameter for the LAAD penalty is equivalent to elicitation of the hyperprior on the parameter of the Laplace prior. Based on our experience in reserve modeling and the theoretical background of the penalization method, we expect incremental loss development to decrease as the development lag increases while remaining non-negative, and the bias induced by penalization should not be too large. Therefore, we choose the optimal $\lambda$ as the smallest value among the tuning parameters whose resulting estimates satisfy $\hat{\delta}_2 \ge \hat{\delta}_3 \ge \cdots \ge \hat{\delta}_{10} \ge 0$.
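This screening rule can be implemented as a simple grid search (our sketch, reusing `fit_laad_reserving` from above; the grid itself is an arbitrary choice):

```python
import numpy as np

def choose_lambda(X, logy, penalized, grid=None):
    """Smallest lambda on the grid whose fitted delta_2..delta_10 (the last nine
    coordinates in build_design's ordering) are non-negative and non-increasing."""
    if grid is None:
        grid = np.logspace(-3, 1, 50)
    for lam in np.sort(grid):
        beta = fit_laad_reserving(X, logy, lam, penalized)
        delta = beta[-9:]
        if np.all(delta >= 0) and np.all(np.diff(delta) <= 1e-12):
            return lam, beta
    return None, None
```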

Note that apart from the choice of tuning parameters, one would also need to consider the different attributes of covariates (for example, binary, ordinal, discrete, or continuous) when doing variable selection via penalization. However, since the covariates used in our empirical analysis are all binary factor variables, direct use of the penalty, or of a transformation of it, is innocuous. For variable selection on covariates with diverse attributes, see devriendt2018sparse.

Once the parameters are estimated in each model, the corresponding incremental development factor for each lag can also be estimated as $\hat{\lambda}_k = \exp(\hat{\delta}_k)$, based on the formulation of the lognormal cross-classified model. Table 6 summarizes the estimated incremental development factors for the three calibrated models. One can see that the unconstrained model deviates from our expectations about the development pattern. For example, in the case of General Liability, the incremental development factor for the lag 6 to 7 transition (1.0465) is less than that for the lag 7 to 8 transition (1.0512). In the case of Other Casualty, several estimated incremental development factors for later lags are less than 1, which is not intuitive either. Further, the unconstrained model also fails to estimate a tail factor. With regard to the LASSO constrained model, it is clear that a naive implementation of variable selection does not work in this example: even after the cross-validation procedure, which picks the best value of $\lambda$ for each business line, the resulting estimates are less than reasonable for explaining the loss development pattern. For instance, in the case of Other Casualty, the estimated incremental loss development factors for the last two lag transitions are still less than 1. Finally, the LAAD constrained model ends up with reasonable estimates, so that $\hat{\lambda}_2 \ge \hat{\lambda}_3 \ge \cdots \ge 1$ is always satisfied for both business lines and tail factors are estimated with an intuitive pattern. For example, in the case of Other Casualty, we have $\hat{\lambda}_7 = \cdots = \hat{\lambda}_{10} = 1$, so that loss development is complete by lag 6 and 1 is given as the tail factor.

General Liability Other Casualty
Unconstrained LASSO LAAD Unconstrained LASSO LAAD
DL 1→2 2.2022 2.3203 2.4066 1.2975 1.3064 1.3570
DL 2→3 1.5681 1.5514 1.5408 1.1052 1.1016 1.0876
DL 3→4 1.3108 1.2956 1.2846 1.0792 1.0754 1.0606
DL 4→5 1.1723 1.1574 1.1458 1.0352 1.0312 1.0157
DL 5→6 1.1569 1.1407 1.1281 1.0298 1.0254 1.0085
DL 6→7 1.0465 1.0299 1.0164 0.9959 1.0000 1.0000
DL 7→8 1.0512 1.0317 1.0163 1.0024 1.0000 1.0000
DL 8→9 1.0106 1.0000 1.0000 0.9929 0.9998 1.0000
DL 9→10 1.0147 1.0000 1.0000 0.9589 0.9685 1.0000
Table 6: Summary of estimated incremental development factors
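Using the Other Casualty LAAD column of Table 6, the implied cumulative development and tail behavior can be read off as follows (a reading aid we add, not the paper's code):

```python
import numpy as np

# LAAD incremental development factors for Other Casualty (Table 6),
# transitions lag 1->2 through lag 9->10
incr = np.array([1.3570, 1.0876, 1.0606, 1.0157, 1.0085,
                 1.0000, 1.0000, 1.0000, 1.0000])

# cumulative factor taking each lag j (j = 1..9) to lag 10
cum_to_10 = np.cumprod(incr[::-1])[::-1]
print(np.round(cum_to_10, 4))

# development effectively stops once the remaining incremental factors equal 1
last_dev = np.nonzero(incr > 1.0)[0].max() + 2   # deepest lag reached by genuine development
print(f"no development beyond lag {last_dev}; implied tail factor = 1")
```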

4.3 Model validation

To validate the predictive models for loss development, calibrated using the training set (the upper loss triangles) defined in (5), we use the cumulative (or incremental) reported claims for calendar year 2012 as a validation set, obtained from the ACE Limited 2012 Global Loss Triangles. Note that those data points can be described as $\{ y_{i, I-i+2} : i = 2, \ldots, I \}$.

Based on the estimated incremental development factors, one can predict the cumulative (or incremental) claims for the subsequent calendar year. For example, according to the model specification in (8), it is possible to predict the cumulative claim for accident year $i$ at lag $j+1$ as of lag $j$ as follows:

$$ \hat{y}_{i, j+1} = y_{ij} \cdot \exp\big(\hat{\delta}_{j+1}\big) = y_{ij} \cdot \hat{\lambda}_{j+1}. $$
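As a mechanical illustration of this recursion (ours; it applies the Table 6 LAAD factors to the last observed diagonal of Table 3 and does not attempt to reproduce Table 7, which is based on the fitted lognormal model):

```python
import numpy as np

# latest observed cumulative reported losses C_{i,11-i} for accident years 2..10
# (the last diagonal of Table 3, General Liability)
latest = np.array([644021, 678142, 971329, 770348, 1056066,
                   708947, 451810, 298374, 136082], dtype=float)

# factor each of these cells needs next: AY 2 develops lag 9->10, ...,
# AY 10 develops lag 1->2, i.e. the GL LAAD column of Table 6 reversed
ldf = np.array([1.0000, 1.0000, 1.0163, 1.0164, 1.1281,
                1.1458, 1.2846, 1.5408, 2.4066])

next_cum = latest * ldf              # predicted cumulative losses one lag later
incremental = next_cum - latest      # predicted incremental reported claims
for ay, inc in zip(range(2, 11), incremental):
    print(f"AY {ay}: {inc:,.0f}")
```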

Table 7 provides the predicted incremental claims under each model, together with the actual values. According to the table, in the case of the Other Casualty line, both the unconstrained and LASSO models fail to predict the paid claims for mature years, which may lead to underreserving.

General Liability Other Casualty
Unconstrained LASSO LAAD Actual Unconstrained LASSO LAAD Actual
AY=2004 18,623 8,786 9,292 10,460 -13,991 -10,598 716 1,702
AY=2005 16,780 9,251 9,784 18,236 -2,008 606 766 4,655
AY=2006 63,989 44,503 30,085 42,420 1,862 768 881 1,098
AY=2007 47,066 33,846 23,895 49,790 -925 729 836 6,641
AY=2008 182,805 165,041 152,477 62,450 13,655 11,732 4,519 24,195
AY=2009 133,732 122,806 115,084 116,112 25,547 22,757 12,189 23,449
AY=2010 148,684 141,548 136,946 152,345 32,934 31,363 25,463 11,790
AY=2011 176,050 170,832 168,006 220,413 52,996 51,184 44,384 55,776
AY=2012 167,789 183,974 196,140 203,434 135,984 139,989 163,117 165,383
Total 955,517 880,589 841,709 875,659 246,054 248,531 252,870 294,690
Table 7: Summary of predicted incremental paid claims

It is also possible to evaluate the performance of prediction based on the usual validation measures, the root mean squared error (RMSE) and mean absolute error (MAE), defined as follows, where smaller values of RMSE and MAE indicate the preferred model:

$$ \mathrm{RMSE} = \sqrt{ \frac{1}{|\mathcal{V}|} \sum_{(i,j) \in \mathcal{V}} \big( y_{ij} - \hat{y}_{ij} \big)^2 }, \qquad \mathrm{MAE} = \frac{1}{|\mathcal{V}|} \sum_{(i,j) \in \mathcal{V}} \big| y_{ij} - \hat{y}_{ij} \big|, $$

where $\mathcal{V}$ denotes the set of validation cells.

Table 8 shows that the LAAD model is the most preferred in terms of prediction performance, as measured by RMSE and MAE, in both lines of business.

General Liability Other Casualty
Unconstrained LASSO LAAD Unconstrained LASSO LAAD
RMSE 45447.55 39250.54 36573.14 14075.55 12506.03 9919.94
MAE 28395.27 24200.97 23778.28 10738.65 9478.27 7685.04
Table 8: Model comparison of validation measures

Finally, to account for the uncertainty of parameter estimation in each model, we incorporate a bootstrap approach to simulate unpaid claims for the subsequent calendar year under each model; a similar bootstrap approach was taken in shi2011depreserve and gao2018bayesian. From Figure 4, one can see that the simulated unpaid claims under the LAAD model are the closest to the actual unpaid claims for that year in the General Liability line, while all three models show similar behavior in the Other Casualty line. Details of the bootstrap simulation scheme are provided in Appendix C.

Figure 4: Predictive density of incremental reported losses for each model via Bootstrap

5 Concluding remarks

In this paper, we introduce the LAAD penalty, derived from the use of a hyperprior on the parameter $\lambda$ of the Laplace prior in the Bayesian LASSO. It is also shown that the proposed penalization method has some good properties: variable selection with reversion to the true regression coefficients, an analytic solution in the univariate case, and an optimization algorithm for the multivariate case that converges under a modest condition via coordinate descent. The merits of the proposed method are also shown in a simulation study, where the use of the LAAD penalty outperforms the other methods, such as OLS or LASSO penalization, in terms of better prediction and the ability to capture the correct level of model sparsity. Finally, we also explore a possible use of the LAAD penalty in an actuarial application, specifically the calibration of a loss development model and tail factor selection. According to the results of the empirical analysis, the use of the LAAD penalty ends up with a reasonable loss development pattern while the other methods deviate from that pattern. As future research work, with the several advantages described in this paper, it is expected that one can apply the regularization method with the LAAD penalty not only to aggregate loss reserving models but also to individual loss reserving models, which would naturally incorporate many more covariates.

Appendix A Proof of Theorem 1

It is easy to see that $R_\lambda(\beta; z) = R_\lambda(-\beta; -z)$, so we can start from the case where $z$ is not a negative number. Then we have the following:

$$ f(\beta) := R_\lambda(\beta; z), \qquad h(\beta) := (1+\beta) f'(\beta) = \beta^2 + (1 - z)\beta + (\lambda - z), \quad \beta \ge 0, $$

with roots $\beta^{\pm} = \frac{1}{2}\big[(z-1) \pm \sqrt{D}\big]$, where $D = (z+1)^2 - 4\lambda$.

Note that if $z \ge 0$, then $R_\lambda(-\beta; z) - R_\lambda(\beta; z) = 2\beta z \ge 0$ for $\beta \ge 0$ and any $\lambda > 0$. Thus, $\hat{\beta} \ge 0$.

Case 1) $z > \lambda$

Since $\hat{\beta}$ should be non-negative, we just need to consider $\beta \ge 0$. Here $h(0) = \lambda - z < 0$, so $h$ has exactly one positive root $\beta^+$. If $\beta < \beta^+$, then

$$ h(\beta) < 0, \quad \text{so that } f'(\beta) < 0. $$

If $\beta > \beta^+$, then we have

$$ h(\beta) > 0, \quad \text{so that } f'(\beta) > 0. $$

Thus, for both cases we have only one local minimum point for $\beta \ge 0$, and $\beta^+$ is indeed a global minimum point, so that $\hat{\beta} = \beta^+$.

Case 2) $z \le \lambda$ and $(z+1)^2 < 4\lambda$

In this case, $D < 0$, so that $h(\beta) > 0$ for all $\beta \ge 0$. Therefore, $f$ is strictly increasing and $\hat{\beta} = 0$.

Case 3) $z \le \min(\lambda, 1)$ and $(z+1)^2 \ge 4\lambda$

In this case, $h(0) = \lambda - z \ge 0$. Moreover, $\beta^- + \beta^+ = z - 1 \le 0$, and $\beta^- \beta^+ = \lambda - z \ge 0$, so both roots of $h$ are non-positive. Therefore, $h(\beta) \ge 0$ on $[0, \infty)$ and $\hat{\beta} = 0$.

Case 4) $1 < z \le \lambda$ and $(z+1)^2 \ge 4\lambda$

Here, let $\beta^+ = \frac{1}{2}\big[(z-1) + \sqrt{D}\big]$ and $\beta^- = \frac{1}{2}\big[(z-1) - \sqrt{D}\big]$, so that $0 \le \beta^- \le \beta^+$. Now, let us show that $\beta^+$ is a local minimum of $f$, which only requires showing that $f''(\beta^+) \ge 0$. Since $h'(\beta) = f'(\beta) + (1+\beta) f''(\beta)$ and $f'(\beta^+) = 0$, it suffices to show that $h'(\beta^+) \ge 0$, as follows:

$$ h'(\beta^+) = 2\beta^+ + (1 - z) = \sqrt{D} \ge 0. $$

Therefore, $\beta^+$ is a local minimum of $f$, and $\hat{\beta}$ would be either $0$ or $\beta^+$. So in this case, we have to compare $f(0) = \frac{1}{2} z^2$ and

$$ f(\beta^+) = \frac{1}{2}\big(\beta^+ - z\big)^2 + \lambda \log\big(1 + \beta^+\big). $$

Note that for fixed $\lambda$,

$$ g(z) := f\big(\beta^+(z)\big) - f(0) $$

is strictly decreasing with respect to $z$, and

$$ g(z) = 0 $$

has a unique solution because $g(z) > 0$ if $z < z_\lambda$ and $g(z) < 0$ if $z > z_\lambda$. Hence

$$ \hat{\beta} = \begin{cases} 0, & z \le z_\lambda, \\ \beta^+, & z > z_\lambda, \end{cases} $$

where $z_\lambda$ is the unique solution of $g(z) = 0$ for the given $\lambda$. See Figure 5.

Figure 5: Behavior of the optimizer $\hat{\beta}$ along $\lambda$ and $z$

Once we get the result for $z \ge 0$, the same approach applies, by symmetry, to the case $z < 0$. ∎

Appendix B Proof of Lemma 1

Suppose $\beta_k$ is fixed at $\tilde{\beta}_k$ for all $k \ne j$. Then we can observe that

$$ f(\tilde{\beta}_1, \ldots, \beta_j, \ldots, \tilde{\beta}_p) = \frac{1}{2}\big(z_j - \beta_j\big)^2 + \lambda \log\big(1 + |\beta_j|\big) + C, $$

where $z_j = x_j' \big( y - \sum_{k \ne j} x_k \tilde{\beta}_k \big)$ and $C$ is a constant free of $\beta_j$, using $\lVert x_j \rVert_2 = 1$, so each coordinate slice takes exactly the form of $g$.

As usual, we can start from the case that $z_j \ge 0$. First, one can easily check that $g$ is a decreasing function of $\beta_j$ on $(-\infty, 0]$ whenever $z_j \ge 0$ and $\lambda > 0$. When $\lambda \le 1$, according to the arguments in the proof of Theorem 1, $g$ is strictly decreasing when $\beta_j < \hat{\beta}_j$ and strictly increasing when $\beta_j > \hat{\beta}_j$, since $z_j$ and $\lambda$ then belong to Case 1, Case 2, or Case 3. Note that if $\lambda \le 1$, then we may exclude Case 4. Therefore, $g$ is hemivariate and quasi-convex if $z_j \ge 0$, and also if $z_j < 0$ because of the symmetry of the penalty term. ∎

Appendix C Bootstrap simulation scheme

  • Simulate $\log y^*_{ij} \sim \mathcal{N}\big( \hat{\mu} + \hat{\alpha}_i + \sum_{k=2}^{j} \hat{\delta}_k, \; \hat{\sigma}^2 \big)$ for each observed cell $(i, j)$ in the upper triangle.

  • Using the simulated values of $y^*_{ij}$ in step (1), estimate the bootstrap replication of the parameters $(\mu^*, \alpha^*, \delta^*)$.

  • Based on $(\mu^*, \alpha^*, \delta^*)$, predict the unpaid loss for the next calendar year, which is given as follows:

$$ \hat{y}^*_{i, I-i+2} = y_{i, I-i+1} \cdot \exp\big( \delta^*_{I-i+2} \big), \qquad i = 2, \ldots, I. $$
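A sketch of this scheme (ours; `predict_unpaid` is a hypothetical helper standing in for the reserve calculation of step (3), and `fit_laad_reserving` comes from the sketch in Section 4.2):

```python
import numpy as np

def bootstrap_unpaid(X, logy, penalized, lam, predict_unpaid, n_boot=1000, seed=42):
    """Parametric bootstrap of next-calendar-year unpaid claims, following steps
    (1)-(3) above. `predict_unpaid` is a user-supplied function (hypothetical
    here) mapping a fitted parameter vector to the predicted unpaid loss."""
    rng = np.random.default_rng(seed)
    beta_hat = fit_laad_reserving(X, logy, lam, penalized)
    fitted = X @ beta_hat
    sigma2 = np.mean((logy - fitted) ** 2)       # crude residual variance estimate
    draws = []
    for _ in range(n_boot):
        # step (1): simulate log y* from the fitted lognormal model
        logy_star = fitted + rng.normal(scale=np.sqrt(sigma2), size=len(logy))
        # step (2): re-estimate the parameters on the simulated triangle
        beta_star = fit_laad_reserving(X, logy_star, lam, penalized)
        # step (3): predict next year's unpaid loss under the bootstrap fit
        draws.append(predict_unpaid(beta_star))
    return np.array(draws)
```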