The chain ladder method, an industry benchmark with theoretical foundations laid out in works such as mack1993chainladder and mack1999chainladderse, has been widely used to determine the development pattern of reported or paid claims. However, despite its prevalence, the estimation of development factors for mature years with the chain ladder method raises some issues. In general, we expect that the cumulative reported loss amount gradually increases while the magnitude of development decreases. However, loss development patterns in some run-off triangles may not follow this usual pattern. As mentioned in renshaw1989chain, this is because the triangular or trapezoidal shape of aggregated claims data leaves only a few data points in the north-east corner, so estimation of the parameters that depend on those points becomes unstable. Therefore, we need a way to estimate the development factors for mature years with more stability.
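As a concrete reference point, the basic chain-ladder computation can be sketched as follows. The triangle and numbers below are hypothetical, and the last factor illustrates why estimates for mature years are unstable: it rests on a single observation.

```python
# triangle[i][j] = cumulative reported loss for accident year i at development lag j
# (hypothetical 4x4 run-off triangle, not the paper's data)
triangle = [
    [100, 150, 165, 170],
    [110, 160, 180, None],
    [120, 175, None, None],
    [130, None, None, None],
]

def age_to_age_factors(tri):
    """Volume-weighted chain-ladder development factors."""
    n = len(tri)
    factors = []
    for j in range(n - 1):
        # only accident years observed at both lag j and lag j+1 contribute
        num = sum(tri[i][j + 1] for i in range(n) if tri[i][j + 1] is not None)
        den = sum(tri[i][j] for i in range(n) if tri[i][j + 1] is not None)
        factors.append(num / den)
    return factors

f = age_to_age_factors(triangle)
# The last factor (lag 3 -> 4) is computed from accident year 0 alone,
# so it inherits all of that single year's noise.
print(f)
```

Note how each successive factor is based on one fewer data point, which is exactly the north-east-corner instability discussed above.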
Further, loss development may continue beyond ten years, which is usually the maximum number of years tracked in run-off triangles, so one may need a 'tail factor' of loss development to predict ultimate claims. However, naive use of the chain ladder method does not provide a way to estimate the tail factor, so in practice it is chosen subjectively or according to an industry benchmark. In this regard, there has been some research on selecting the tail factor based on a loss development triangle, including but not limited to boor2006tailfactor, verrall2012tailfactor, and merz2013tailfactor.
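As one hedged illustration of the tail-factor problem, a common practitioner heuristic (in the spirit of the curve-fitting methods surveyed in boor2006tailfactor, and not the method proposed in this paper) is to assume the excess development f_k - 1 decays exponentially in lag k, fit a log-linear regression, and extrapolate beyond the triangle:

```python
import math

def tail_factor(factors, extra_lags=50):
    """Fit log(f_k - 1) ~ a + b*k by least squares and extrapolate a tail factor.

    `factors` are age-to-age development factors, all assumed > 1.
    """
    ks = range(1, len(factors) + 1)
    ys = [math.log(f - 1) for f in factors]
    n = len(factors)
    kbar = sum(ks) / n
    ybar = sum(ys) / n
    b = sum((k - kbar) * (y - ybar) for k, y in zip(ks, ys)) / \
        sum((k - kbar) ** 2 for k in ks)
    a = ybar - b * kbar
    # product of extrapolated factors beyond the observed lags
    tail = 1.0
    for k in range(n + 1, n + 1 + extra_lags):
        tail *= 1 + math.exp(a + b * k)
    return tail

# hypothetical factors whose excess development halves each lag
print(tail_factor([1.8, 1.4, 1.2, 1.1, 1.05]))
```

This kind of extrapolation requires a subjective choice of decay curve, which is precisely the arbitrariness the penalized approach of this paper aims to remove.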
In order to deal with the aforementioned issues, namely stable estimation of development factors for mature years and tail factor selection, one can apply a regularization method, or penalized regression, in loss development models. Today, there is a rich literature on penalization in the regression framework. The first penalization method introduced was ridge regression, developed by hoerl1970ridge. By adding an $\ell_2$ penalty term to the least squares objective, they showed that it is possible to obtain a smaller mean squared error when there is severe multicollinearity in a given dataset. However, ridge regression has only the shrinkage property and not the variable selection property. To tackle the latter, tibshirani1996lasso suggested the LASSO, which uses an $\ell_1$ penalty term on the least squares, and showed that this method enables variable selection, which leads to dimension reduction as well. Despite the simplicity of the method, a great deal of work has been done to extend the LASSO framework. For example, park2008bayesian extended LASSO by providing a Bayesian interpretation of it. Although LASSO has the variable selection property, the estimates derived by LASSO regression are inherently biased. However, there have been some meaningful approaches that achieve both the variable selection property and less biased estimates. For example, fan2001scad
proposed the smoothly clipped absolute deviation (SCAD) penalty, derived by assuming a continuously differentiable penalty function that achieves three properties: (i) nearly unbiased estimates when the absolute value of the true unknown parameter is large, (ii) variable selection, and (iii) continuity of the calculated estimates. Although the SCAD penalty has these good properties, it naturally leads to a non-convex optimization problem, so it loses the desirable properties of convex optimization. Subsequently, zhang2010mcp proposed the minimax concave penalty (MCP), which minimizes the maximum concavity subject to an unbiasedness constraint.
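The differing shrinkage behaviors of these penalties are easiest to see through the standard closed-form univariate minimizers of (1/2)(z - b)^2 + penalty(b), sketched below for intuition (the `a` and `gamma` defaults are the conventional choices from the cited papers):

```python
def soft(z, lam):
    """LASSO soft-threshold: selects variables, but keeps a constant bias lam."""
    s = abs(z) - lam
    return (1 if z > 0 else -1) * s if s > 0 else 0.0

def scad(z, lam, a=3.7):
    """SCAD univariate minimizer: thresholds near 0, unbiased for large |z|."""
    az = abs(z)
    if az <= 2 * lam:
        return soft(z, lam)
    if az <= a * lam:
        sign = 1 if z > 0 else -1
        return ((a - 1) * z - sign * a * lam) / (a - 2)
    return z  # no shrinkage at all once |z| > a * lam

def mcp(z, lam, gamma=3.0):
    """MCP univariate minimizer: rescaled soft-threshold, then identity."""
    if abs(z) <= gamma * lam:
        return soft(z, lam) / (1 - 1 / gamma)
    return z

for z in (0.5, 1.5, 10.0):
    print(z, soft(z, 1.0), scad(z, 1.0), mcp(z, 1.0))
```

All three estimators set small inputs to zero, but only SCAD and MCP return the input unchanged once it is large, which is the "nearly unbiased" property discussed above.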
The use of penalized regression in the actuarial literature is not quite new, but there have been only a few relevant works in the field. For instance, williams2015elastic applied the elastic net penalty, a combination of the $\ell_1$ and $\ell_2$ penalties, to a dataset with over 350 initial covariates to enhance insurance claims prediction. In addition, nawar2016 used the LASSO to detect possible interactions between the covariates used in claims modeling.
In this paper, we extend the Bayesian LASSO framework of park2008bayesian using conjugate hyperprior distributional assumptions. This extension leads to a new penalty function, which we call log-adjusted absolute deviation (LAAD), which retains the variable selection property while reducing the bias of the estimator. This motivated us to apply the LAAD penalty in the cross-classified model used in loss development for reserving to choose the tail factor. To calibrate the model, we use reported loss triangles from multiple lines of business of a property and casualty (P&C) insurer. We compare the estimated loss development factors, including the tail factor, of our proposed model with those of the usual cross-classified model and a cross-classified model with LASSO penalty. All these models are compared in the section on estimation and prediction, where we also discuss the validation measures. It turns out that our proposed model provides reasonable estimates of loss development factors that agree with our prior knowledge of the loss development pattern, and it also shows better performance in the prediction of reserves.
This paper is organized as follows. In Section 2, we develop the construction of the new penalty via a Bayesian interpretation of the LASSO and examine its properties by comparing it with other penalty functions. In Section 3, we demonstrate the merit of our proposed method via a simulation study. In Section 4, we explore a possible use of our method in insurance reserving using a multi-line reported loss triangles dataset and provide estimation and prediction results for the various models. We conclude in Section 5.
2 Proposed method: LAAD
2.1 Derivation of loss function with LAAD penalty
According to park2008bayesian, we may interpret the LASSO in a Bayesian framework as follows: $\boldsymbol{\beta}$ is a vector of size $p$, with each component $\beta_j$ having the Laplace density $\pi(\beta_j) = \frac{\lambda}{2} e^{-\lambda |\beta_j|}$, for $j = 1, \ldots, p$. According to their specification, we may express the likelihood and the log-likelihood for $\boldsymbol{\beta}$, respectively, as
In their work, park2008bayesian suggested two ways to choose the optimal $\lambda$ in Equation (1). One is the use of a point estimate obtained by cross-validation, and the other is the use of a 'hyperprior' distribution for $\lambda$. However, they did not provide a detailed derivation of the likelihood when a hyperprior is used.
Now, consider the following distributional assumptions
In other words, the hyperprior of $\lambda$ follows a gamma distribution. This implies that we have:
As a result, the log-likelihood in Equation (2) allows us to have the following formulation of our penalized least squares problem. This gives rise to what we call the log-adjusted absolute deviation (LAAD) penalty function:
A similar penalty function is described as the hierarchical adaptive lasso in lee2010hal, but the authors did not provide details of the derivation.
2.2 Estimation with LAAD penalty
To better understand the characteristics of a model with the LAAD penalty, let us consider a simple univariate example. In this case, optimization of the objective in Equation (2) reduces to a univariate problem, so it is sufficient to solve the following:
Let us set . Then the corresponding minimizer will be given as , where
and is the unique solution of
See Appendix A. ∎
Note that when the true parameter is large enough, the optimizer converges to it. Therefore, by using the LAAD penalty, we obtain an optimizer that has the variable selection property together with a bias that vanishes as the true parameter value departs from zero. Figure 1 provides graphs describing the behavior of the optimizer under different penalties. The first graph shows the optimizer under the $\ell_2$ penalty, which corresponds to ridge regression. In that case, as previously mentioned, there is no variable selection; the penalty only shrinks the magnitude of the estimates. The second graph shows the optimizer under the $\ell_1$ penalty, the basic LASSO. In this case, although it has the variable selection property (if the magnitude of the unpenalized estimate is small enough, the penalized estimate becomes 0), the discrepancy between the true value and the estimate remains constant even when the true value is very large. Finally, the third graph shows the optimizer under the proposed LAAD penalty. One can see that it not only has the variable selection property but also converges to the true value as the true value increases.
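To make this behavior concrete numerically, the sketch below minimizes the univariate objective on a grid. As an assumed stand-in for the exact LAAD penalty derived above, it uses the log-adjusted form lam * log(1 + |b|), which matches the "log-adjusted absolute deviation" name but is our illustrative choice, not necessarily the paper's exact formula:

```python
import math

def laad_minimizer(z, lam, grid=40001, span=20.0):
    """Grid-search minimizer of (1/2)(z - b)^2 + lam * log(1 + |b|)."""
    best_b, best_v = 0.0, 0.5 * z * z  # objective value at b = 0
    for i in range(grid):
        b = -span + 2 * span * i / (grid - 1)
        v = 0.5 * (z - b) ** 2 + lam * math.log(1 + abs(b))
        if v < best_v:
            best_b, best_v = b, v
    return best_b

# Small |z| is thresholded exactly to 0 (variable selection);
# large |z| is barely shrunk (bias tends to 0), unlike LASSO's constant bias.
print(laad_minimizer(0.5, 1.0), laad_minimizer(10.0, 1.0))
```

For z = 10 and lam = 1, the LASSO soft-threshold would return 9, while this log-type penalty returns roughly 9.91, illustrating the bias reduction described above.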
Figure 2 illustrates the constraint regions implied by each penalty. It is well known that the constraint region defined by $\ell_2$ penalization is a $p$-dimensional ball, whereas the constraint region defined by $\ell_1$ penalization is a $p$-dimensional diamond. We can observe that for both $\ell_2$ and $\ell_1$ penalization the constraint regions are convex, which implies we retain the good properties of convex optimization. However, the constraint region implied by the LAAD penalty is non-convex, which is inevitable if one wants both consistency of the estimates and the variable selection property.
It is also possible to compare the behavior of the LAAD penalty with the SCAD penalty and MCP. According to fan2001scad and zhang2010mcp, one can write down the penalty functions (and their derivatives) in the univariate case as follows:
From above, it is straightforward to see that
This implies that the marginal effect of the penalty converges to 0 as the value of the coefficient increases; hence, the magnitude of distortion on the estimate becomes negligible as the true coefficient gets larger when we use the SCAD penalty, MCP, or LAAD penalty. In contrast, the derivative of the LASSO penalty is the constant $\lambda$, which means that the magnitude of distortion on the estimate is the same even when the true coefficient is very large.
On the other hand, if we let the coefficient tend to 0, one can see that
which implies that for the SCAD penalty, MCP, and LAAD penalty, the magnitude of penalization near zero is the same as that of the LASSO. Therefore, we verify that SCAD, MCP, and the LAAD penalty have the same variable selection property as the LASSO when the true coefficient is small enough.
2.3 Implementation in general case and convergence analysis
Estimating parameters from a given penalized least squares objective is an optimization problem. Since an analytic solution is available in the univariate case, one can implement an algorithm for the multivariate case. For example, to obtain the estimates in the multivariate case, we may apply the coordinate descent algorithm proposed by luo1992coordinate, which starts with an initial set of estimates and then successively optimizes along each coordinate or block of coordinates. The algorithm is explained in detail as follows:
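As a generic illustration of the coordinate descent idea, the sketch below uses the familiar LASSO soft-threshold update for each coordinate; a LAAD version would substitute the univariate LAAD minimizer in its place. The data and tolerances here are hypothetical, not the paper's implementation.

```python
def soft(z, lam):
    """Univariate LASSO minimizer (soft-threshold)."""
    s = abs(z) - lam
    return (1 if z > 0 else -1) * s if s > 0 else 0.0

def coordinate_descent_lasso(X, y, lam, iters=200):
    """Cycle through coordinates, solving each univariate problem in closed form.

    Objective: (1/(2n)) * ||y - X b||^2 + lam * ||b||_1.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        for j in range(p):
            # partial residual excluding coordinate j
            r = [y[i] - sum(X[i][k] * beta[k] for k in range(p) if k != j)
                 for i in range(n)]
            zj = sum(X[i][j] * r[i] for i in range(n)) / n
            norm_j = sum(X[i][j] ** 2 for i in range(n)) / n
            beta[j] = soft(zj, lam) / norm_j
    return beta

X = [[1, 0], [1, 0], [0, 1], [0, 1]]
y = [2, 2, 0, 0]
print(coordinate_descent_lasso(X, y, 0.1))
```

For the LASSO the objective is convex, so this cycle converges to the global minimum; the point of the convergence analysis below is to identify when the same scheme still converges for the non-convex LAAD objective.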
Although our optimization problem is non-convex, we can obtain a sufficient condition under which the coordinate descent algorithm converges for our problem. To show the convergence, we need to introduce the concepts of quasi-convexity and hemivariateness. A function is hemivariate if it is not constant on any interval in its domain. A function is quasi-convex if
An example of a function which is quasi-convex and hemivariate is .
The following lemma is useful for obtaining a sufficient condition that our optimization problem converges with coordinate descent algorithm.
Suppose a function is defined as follows:
and for all . If , then is both quasi-convex and hemivariate for all .
See Appendix B. ∎
If for all and , then the solution from coordinate descent algorithm with function converges to where
According to Theorem 5.1 of tseng2001cdescent, it suffices to show that (i) is continuous on , (ii) is lower semicontinuous, and (iii) is quasi-convex and hemivariate. (i) and (ii) are obvious and (iii) could be shown from Lemma 1. ∎
3 Simulation study
In this section, we conduct a simulation study to demonstrate the merit of our proposed method. Suppose we have nine available covariates, which are generated as follows:
One can check that if a regression model is calibrated using these covariates, then the estimated regression coefficients are all significant. However, even if all covariates are significant by themselves, omission of an effective interaction term, here the one between $x_2$ and $x_3$, can lead to biases in the estimated coefficients and subsequently a lack of fit, as illustrated in Figure 3. In Figure 3, the reduced model is a linear model fitted only with the main effects, while the true model is a linear model that also includes the interaction term.
On the other hand, including every interaction term may also end up with an inferior model, since it accumulates noise in the estimation, which leads to higher variance in the estimates. As elaborated in james2013isl, the mean squared error (MSE) of a predicted value under a linear model is determined by both the variance of the predicted value and the squared bias of the estimated regression coefficients as follows:
From Equation (4), we note that by including fewer variables in our model via variable selection, we can lower the variance term. However, this could increase the squared bias, due either to omitted variable bias (if a relevant variable has been selected out) or to the inherent bias of the penalized estimates. Therefore, if most of the original variables are significant, so that the magnitude of the bias is too high, the benefit of reduced variance is offset by the higher bias. In this regard, variable selection should be performed carefully to achieve a balance between bias and variance and obtain better prediction with lower mean squared error.
To show the merit of our proposed penalty function, we first obtain 100 replications of simulated samples with sample size 1000 and estimate the regression coefficients under the following four models:
Full model: a linear model fitted with the nine covariates and every possible interaction among them,
Reduced model: a linear model fitted only with the nine covariates,
LASSO model: Full model regularized with the $\ell_1$ penalty,
LAAD model: Full model regularized with LAAD penalty.
To evaluate the estimation results under each model, we introduce the following metrics, which measure the discrepancy between the true coefficients and the estimated coefficients under each model:
where the former denotes the true value of a coefficient and the latter its estimated value from a simulated sample. According to Table 1, the Full model is most favored in terms of the biases of the estimated coefficients, which is reasonable since the ordinary least squares (OLS) estimator is unbiased. However, one can see that the MSEs of the estimated coefficients under the Full model are greater than those of the LAAD model, so the LAAD model is expected to provide better estimation in general. It is also observed that the estimation results of the Reduced model and the LASSO model are quite poor, which means that naive use of the LASSO penalty may not work well.
| Term | Bias (Full) | Bias (Reduced) | Bias (LASSO) | Bias (LAAD) | MSE (Full) | MSE (Reduced) | MSE (LASSO) | MSE (LAAD) |
|---|---|---|---|---|---|---|---|---|
| x1 : x6 | 0.001 | 0.000 | -0.004 | 0.000 | 0.012 | 0.000 | 0.008 | 0.000 |
| x2 : x3 | -0.001 | -1.000 | -0.024 | -0.024 | 0.016 | 1.000 | 0.029 | 0.029 |
| x3 : x4 | 0.001 | 0.000 | 0.002 | 0.000 | 0.008 | 0.000 | 0.006 | 0.000 |
| x4 : x6 | 0.000 | 0.000 | -0.001 | 0.000 | 0.005 | 0.000 | 0.004 | 0.000 |
Besides the values of the estimated coefficients, it is also of interest to assess the ability to capture the correct degree of sparsity in a model, using the following measures:
Table 2 shows how well the LAAD model captures the sparsity of the true model. One can see that the LAAD model shows the smallest mean $\ell_1$ and $\ell_0$ norm differences, while the Full model fails to capture the sparsity of the true model. Therefore, this simulation supports the assertion that our proposed LAAD penalty can be utilized in practice with better performance.
| | Full | Reduced | LASSO | LAAD |
|---|---|---|---|---|
| Mean $\ell_1$ norm difference | 1.351 | 4.478 | 0.88 | 0.272 |
| Mean $\ell_0$ norm difference | 35.000 | 1.000 | 19.07 | 0.440 |
4 Empirical analysis: application in loss development methods
4.1 Data characteristics
A dataset from ACE Limited 2011 Global Loss Triangles, shown in Tables 3 and 4, is used for our empirical analysis. This dataset summarizes two lines of insurance business, General Liability and Other Casualty, in the form of reported claim triangles.
The given dataset can also be expressed in the following way:
where the entry means the reported claim for a line of insurance business in a given accident year and development lag. Note that in our case.
Based on the reported claim data (upper triangle), an insurance company needs to predict ultimate claim (lower triangle) described as follows:
[Tables 3 and 4: reported claim triangles by development lag (DL 1 to DL 10) for General Liability and Other Casualty.]
Note that although it may be natural to consider possible dependence among different lines of business, we refrain from incorporating dependence in this paper in order to focus on variable selection via the LAAD penalty. For this reason, we suppress the superscript that denotes each line of business. Those interested in dependence modeling among different lines of business may refer to shi2012multireserve and jeong2019vinereserving.
4.2 Model specifications and estimation
In our search for a loss development model, we use the cross-classified model, which was also introduced in shi2011depreserve and taylor2016. For each line of business, the unconstrained lognormal cross-classified model is formulated as follows:
where the parameters denote, respectively, the overall mean of the losses from the line of business, the effect of the accident year, and the cumulative development at each lag.
Although the unconstrained lognormal regression model shows a nearly perfect fit in terms of adjusted $R^2$, there are two issues to consider. First, it is natural that the incremental reported loss amount gradually decreases while the cumulative reported loss amount still increases until it is developed to the ultimate level. It is observed, however, that the estimated values do not show this pattern for either line of business in Table 5. Second, development of loss is usually recorded for ten years in a triangle that follows the format of Schedule P of the NAIC-mandated Annual Statement. However, a claim can take much more than ten years to be finalized in a long-tail line of P&C insurance such as workers' compensation. Therefore, one needs to consider the 'tail development factor', which accounts for the magnitude of loss development from a finite stage of development to the ultimate level.
In order to handle the aforementioned issues simultaneously, we propose a penalized cross-classified model. Since both the overall mean and the accident year effects are nuisance parameters in terms of tail factor selection, we modify the formulation in (7) in the following way:
In this formulation, each parameter can be interpreted as the incremental development factor from one year to the next, so that if it is zero beyond a certain lag, then there is no further development of loss after that many years of development, and this determines the tail factor. Therefore, this formulation allows us to choose the tail factor systematically, based on a variable selection procedure performed with penalized regression on the given data rather than on subjective judgment. In that regard, we propose the following three model specifications:
Unconstrained model - a model which minimizes the following for each line of business:
LASSO constrained model - a model which minimizes the following for each line of business with LASSO penalty:
LAAD constrained model - a model which minimizes the following for each line of business with LAAD penalty:
Note that for both the LASSO and LAAD constrained models, is not penalized in the estimation, in order to avoid an underreserving issue due to penalization.
When variable selection via penalization is implemented, the tuning parameter that controls the magnitude of the penalty must be set, either via cross-validation or based on prior knowledge. In our search for the tuning parameter of the LASSO constrained model, the usual cross-validation method is applied with the glmnet routine in R: the average root mean squared error (RMSE) over n-fold cross-validation is examined for each candidate tuning parameter, and the value yielding the smallest average cross-validation RMSE is chosen. For details, see friedman2009glmnet.
For the calibration of the tuning parameter of the LAAD constrained model, we apply the idea of prior elicitation in Bayesian statistics. Recall that the derivation of the LAAD penalty was based on a hierarchical Bayesian model; in this sense, calibrating the tuning parameter for the LAAD penalty is equivalent to eliciting the Laplace hyperprior. Based on our experience in reserve modeling and the theoretical background of penalization methods, we expect incremental loss development to decrease as the development lag increases while remaining non-negative, and the penalization should not be so large as to induce excessive bias. Therefore, we choose the optimal tuning parameter as the smallest value among those whose resulting estimates satisfy these conditions.
Note that apart from the choice of tuning parameters, we also need to consider the different attributes of covariates (for example, binary, ordinal, discrete, or continuous) when performing variable selection via penalization. However, since the covariates used in our empirical analysis are all binary factor variables, we can claim that either direct use of the penalty or its transformation is innocuous. For variable selection with covariates of diverse attributes, see devriendt2018sparse.
Once the parameters are estimated in each model, the corresponding incremental development factors can also be estimated, based on the formulation of the lognormal cross-classified model. Table 6 summarizes the estimated incremental development factors for the three calibrated models. One can see that the unconstrained model deviates from our expectations of the development pattern. For example, in the case of General Liability, the incremental development factor at one lag is smaller than that at the following lag. In the case of Other Casualty, an estimated incremental development factor is even less than 1, which is not intuitive either. Further, the unconstrained model also fails to estimate a tail factor. With regard to the LASSO constrained model, it is clear that naive implementation of variable selection does not work in this example. Even after the cross-validation procedure that picks the best tuning parameter for each business line, the resulting estimates are less than reasonable for explaining the loss development pattern. For instance, in the case of Other Casualty, the estimated incremental loss development factors for two of the later lags are still less than 1. Finally, the LAAD constrained model ends up with reasonable estimates: the expected monotone pattern is satisfied for both business lines, and the tail factors are estimated with an intuitive pattern. For example, in the case of Other Casualty, the later incremental factors vanish, which determines the tail factor.
4.3 Model validation
To validate the predictive models for loss development, which are calibrated using the training set (the upper loss triangles) defined in (5), we use the cumulative (or incremental) claim payments for calendar year 2012, obtained from ACE Limited 2012 Global Loss Triangles, as a validation set. Note that those data points can be described as .
Based on the estimated incremental development factors, one can predict the cumulative (or incremental) claim payments for the subsequent calendar year. For example, according to the model specification in (8), it is possible to predict the cumulative payment for an accident year at the next lag, given its value at the current lag, as follows:
Table 7 provides the predicted values of incremental claims under each model, along with the actual values. According to the table, in the case of the Other Casualty line, both the unconstrained and LASSO models fail to predict the paid claims for mature years, which may lead to underreserving.
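The roll-forward described above can be sketched as follows, with hypothetical factors and diagonal values rather than the paper's fitted estimates:

```python
# factors[t] = incremental development factor applied when moving INTO lag t
factors = {2: 1.40, 3: 1.15, 4: 1.05, 5: 1.01}
# latest[acc_year] = (current development lag, cumulative reported loss)
latest = {1: (5, 500.0), 2: (4, 480.0), 3: (3, 430.0), 4: (2, 360.0)}

# next calendar-year diagonal: multiply each latest cumulative value by the
# factor for its next lag; accident year 1 is already at the last tracked lag
next_diag = {
    acc_year: cum * factors[lag + 1]
    for acc_year, (lag, cum) in latest.items()
    if lag + 1 in factors
}
print(next_diag)
```

In the penalized models, factors estimated as exactly 1 beyond some lag simply stop the roll-forward, which is how the selected tail behaves in prediction.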
It is also possible to evaluate prediction performance with the usual validation measures, root mean squared error (RMSE) and mean absolute error (MAE), defined as follows, where smaller values of RMSE and MAE indicate the preferred model:
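In code, these standard definitions read as follows (a minimal sketch with made-up values):

```python
import math

def rmse(actual, pred):
    """Root mean squared error between paired observations."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, pred)) / len(actual))

def mae(actual, pred):
    """Mean absolute error between paired observations."""
    return sum(abs(a - p) for a, p in zip(actual, pred)) / len(actual)

actual = [100.0, 200.0, 50.0]
pred = [110.0, 190.0, 45.0]
print(rmse(actual, pred), mae(actual, pred))
```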
Table 8 shows that the LAAD model is the most preferred in terms of prediction performance as measured by RMSE and MAE in both lines of business.
Finally, to account for the uncertainty of parameter estimation in each model, we use a bootstrap approach to simulate unpaid claims for the subsequent calendar year under each model. A similar bootstrap approach appears in shi2011depreserve and gao2018bayesian. From Figure 4, one can see that the simulated unpaid claims under the LAAD model are the closest to the actual unpaid claims for that year in the General Liability line, while all three models show similar behavior in the Other Casualty line. Details of the bootstrap simulation scheme are provided in Appendix C.
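The flavor of such a bootstrap can be conveyed with a toy residual-resampling sketch; the actual scheme, which refits the penalized model on each pseudo-sample, is in Appendix C, and the data below are made up:

```python
import random

random.seed(7)
data = [10.2, 9.8, 11.1, 10.5, 9.4, 10.0]   # hypothetical log-scale observations

# step 1: fit a (trivial) model and collect residuals
fit = sum(data) / len(data)
residuals = [x - fit for x in data]

# step 2: build pseudo-samples by resampling residuals, refit on each
boot_means = []
for _ in range(1000):
    pseudo = [fit + random.choice(residuals) for _ in data]
    boot_means.append(sum(pseudo) / len(pseudo))

# step 3: the spread of refitted estimates quantifies parameter uncertainty
boot_means.sort()
lo, hi = boot_means[25], boot_means[974]    # approximate 95% bootstrap interval
print(lo, hi)
```

In the reserving context, step 3 feeds each bootstrap parameter set into the roll-forward prediction, producing the predictive distributions of unpaid claims compared in Figure 4.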
5 Concluding remarks
In this paper, we introduce the LAAD penalty, derived from the use of a Laplace hyperprior for the tuning parameter in the Bayesian LASSO. It is also shown that the proposed penalization method has some good properties: variable selection with reversion to the true regression coefficients, an analytic solution in the univariate case, and an optimization algorithm for the multivariate case that converges under a modest condition via coordinate descent. The merit of the proposed method is also shown in a simulation study, where the LAAD penalty outperforms the other methods, such as OLS and LASSO penalization, in terms of better prediction and the ability to capture the correct level of model sparsity. Finally, we also explore a possible use of the LAAD penalty in an actuarial application, namely the calibration of a loss development model and tail factor selection. According to the results of the empirical analysis, the LAAD penalty produces a reasonable loss development pattern while the other methods deviate from that pattern. As future research, given the advantages described in this paper, one could apply regularization with the LAAD penalty not only to aggregate loss reserving models but also to individual loss reserving models, which would naturally incorporate many more covariates.
It is easy to see that so we can start from the case that is not a negative number. Then we have the following:
Note that if , then for and . Thus, .
Since should be non-negative, we just need to consider . If , then
If , then we have
Thus, for both cases we have only one local minimum point for and is indeed, a global minimum point so that .
In this case, so that . Therefore, strictly increasing and .
In this case, . Moreover, , and . Therefore, .
Here, let and . Now, let us show that is the local minimum of - which only requires to show that . Again, it suffices to show that as follows:
Therefore, is a local minimum of and would be either or . So in this case, we have to compute and
Note that for fixed ,
Thus, is strictly decreasing with respect to and
has unique solution because if and if . Hence
where is the unique solution of for given . See Figure 5.
Once we get a result for the non-negative case, we can use the same approach for the negative case.
Suppose is fixed as for all . Then we can observe that
As usual, we can start from the case that . First, one can easily check that is a decreasing function of where and . When , according to the arguments in the proof of Theorem 1, is strictly decreasing when and strictly increasing when if and belong to Case 1, Case 2, and Case 3. Note that if , then we may exclude Case 4. Therefore, is hemivariate and quasi-convex if and also if because of the symmetry of penalty term.
Simulate where .
Using the simulated values of in step (1), estimate bootstrap replication of the parameters .
Based on , predict the unpaid loss for the next year which is given as follows: