Cost-sensitive Multi-class AdaBoost for Understanding Driving Behavior with Telematics

07/06/2020
by   Banghee So, et al.
University of Connecticut
UQAM

Powered by telematics technology, insurers can now capture a wide range of data, such as distance traveled, how drivers brake, accelerate, or make turns, and travel frequency each day of the week, to better decode drivers' behavior. Such additional information helps insurers improve risk assessments for usage-based insurance (UBI), an increasingly popular industry innovation. In this article, we explore how to integrate telematics information to better predict claims frequency. For motor insurance during a policy year, we typically observe a large proportion of drivers with zero claims, a smaller proportion with exactly one claim, and far fewer with two or more claims. We introduce the use of a cost-sensitive multi-class adaptive boosting (AdaBoost) algorithm, which we call SAMME.C2, to handle such imbalances. To calibrate the SAMME.C2 algorithm, we use empirical data collected from a telematics program in Canada, and we find improved assessment of driving behavior with telematics relative to traditional risk variables. We demonstrate that our algorithm can outperform other models that handle class imbalances: SAMME, SAMME with SMOTE, RUSBoost, and SMOTEBoost. The telematics sample consists of observations from 2013-2016, of which 50,301 are used for training and another 21,574 for testing. Broadly speaking, the additional information derived from vehicle telematics helps refine the risk classification of drivers under UBI.


1 Introduction

Telematics refers to the use of telecommunication devices and technology to transmit and store information. It is derived from the French word, télématique, which combines the words “telecommunications” and “computing science.” We are seeing a growing list of applications of telematics technology in many diverse fields: a radio receiver installed in a drone for journalism reporting or for private investigation, a smart home system that remotely controls temperature, lighting, appliances, and even security alarms, an electronic system for better communication between healthcare professionals and their patients in health telematics (Orphanoudakis et al. (1998)), and most predominantly, a device installed in a car or a mobile app to remotely monitor driving habits for insurance rating.

While the use of telematics-based insurance has not fully matured in the insurance industry, it is gaining traction with drivers, especially younger ones, as it gives them the opportunity to save on premiums in exchange for having their driving behavior monitored. According to Karapiperis et al. (2015), Progressive Insurance Company offered the first such UBI in the early 2000s, when the company, in collaboration with General Motors, began to offer premium discounts linked to the monitoring of driving activities and behavior. A tracking device, upon the driver’s agreement, is installed in the car to collect the information through GPS technology. What followed was an acceleration in technology advancement, leading to variations of UBI with similar names, including, for example, Pay as you Drive (PAYD), Pay how you Drive (PHYD), Pay as you Drive as you Save (PAYDAYS), Pay per mile, and Pay as you Go (PAYG). The use of telematics for UBI enables insurers to collect metrics of driving habits to help improve driver risk profiles, thereby allowing for a more flexible pricing mechanism to reflect observed behavior.

There is a positive social effect to vehicle telematics: it encourages better driving behavior. UBI programs have many potential benefits to consumers, insurers, and society in general. They allow insurers to set a price that links more closely to the individual’s driving habits. This helps insurers increase the predictability of their profit margin and gives more careful drivers the opportunity for more affordable premiums. Consumers can control premium costs by adopting safer driving habits or reducing their frequency of driving. Moreover, society benefits because safer driving and fewer drivers on the road reduce accidents, congestion, and car emissions. To realize the optimal benefits of UBI for both insurers and their policyholders, it becomes crucial to identify the telematics variables that most significantly affect the occurrence of car accidents.

It is comforting to know that there is a growing literature on telematics in actuarial science and insurance; many of these research works find additional value from telematics for better claims predictions, risk classification, and premium assessments. The work of Ayuso et al. (2016) uses, interestingly, survival models to conclude that gender discrimination is unnecessary in the presence of sufficient telematics information on driving behavior. Boucher et al. (2017) shows the benefits of using generalized additive models (GAM) to gain additional insights as to how premiums can be more dynamically assessed for PAYD policies based on telematics information. Using zero-inflated Poisson regression models, Guillen et al. (2019) investigate how telematics information helps explain part of the occurrence of zero accidents not typically accounted for by traditional risk factors. For tariff determination, Ayuso et al. (2019) use classical frequency models to incorporate significant information drawn from telematics metrics, using data from a portfolio of PAYD policies issued by a Spanish insurance company. Additional research revealing the benefits of telematics metrics for an improved understanding of driving behavior includes, but is not limited to, Constantinescu et al. (2018), Verbelen et al. (2018), Pérez-Marín et al. (2019), Pesantez-Narvaez et al. (2019), and Guillen et al. (2020).

We distinguish our work from the previous literature by addressing the sparsity of recorded claims for datasets with telematics information, which leads to highly imbalanced observed accident frequencies. For most portfolios of motor insurance, it is often a rare event to observe exactly one claim from a policy, let alone two or more claims. As we shall see, this is even more noticeable in the case of telematics data because there is a perceived self-selection among drivers with UBI. Because of the attractiveness of potential premium savings, policyholders who opt into UBI believe themselves to be, and tend to be, more careful drivers, making the sparsity an even more important issue. The early work of Pednault et al. (2000) partially addresses this highly imbalanced classification problem by suggesting the use of a tree-based model with the log-likelihood as an impurity function for identifying the splitting of the regions, rather than the Gini index, the impurity measure commonly used for classification trees. However, that work lacks the empirical evidence to support the method.

The proposed algorithm in this paper is motivated by our telematics dataset, drawn from a UBI program offered by a Canadian-owned insurance cooperative. We focus on accident frequency, that is, the number of accidents or claims made during our observation period, as the response variable. In our training data, 97.1% of observations have zero claims, 2.8% have exactly one claim, and merely 0.1% have two or more claims. Our observations are clearly highly imbalanced, and yet we want to investigate the predictive power of telematics metrics, beyond the traditional metrics, to understand and predict claims frequency.

We employ an innovative multi-class classification algorithm, which we refer to as SAMME.C2, that can handle such an imbalance. The proposed algorithm has its origin in the work of Zhu et al. (2009), which introduced SAMME (Stagewise Additive Modeling using Multi-class Exponential loss function), a multi-class AdaBoost classification model. Adaptive boosting (AdaBoost), as introduced by Freund and Schapire (1997), is an iterative classification algorithm that combines several weak and inaccurate learners to improve prediction accuracy. In contrast with other similar AdaBoost techniques, SAMME.C2 uses a cost-sensitive learning mechanism to rebalance and tilt the class distribution by accounting for the costs of prediction errors. This paper evaluates the competitiveness of our proposed algorithm against other models that have appeared in the machine learning literature and have been demonstrated to handle imbalanced classes.

In particular, using both simulated and real datasets, we find that the SAMME.C2 algorithm outperforms other multi-class classification models that can handle class imbalances: SAMME, SAMME with SMOTE, RUSBoost, and SMOTEBoost. Each of these methods is briefly explained in the subsequent section. Because single accuracy-type measures are not reasonable for such imbalanced data, we chose to use the G-mean metric for comparison; this performance metric is simply a geometric average of the Recall statistics over all classes. In addition, we find that telematics variables are relatively more important predictors of accident frequency than traditional variables.

The rest of this paper is organized as follows. Section 2 provides an overview of related work on adaptive boosting learning algorithms suitable for handling class imbalances. In Section 3, we introduce our new SAMME.C2 algorithm, which is largely based on an integration of the SAMME and Ada.C2 methods; these two latter methods are therefore described in detail. We also briefly explain various performance metrics used to compare the classification models. Section 4 presents the simulation results to assess the performance of SAMME.C2. Section 5 presents results based on our telematics dataset. Section 6 concludes the paper.

2 Related work

In this section, we give an overview of related work and techniques, which may be categorized into three levels: (a) Data level; (b) Algorithm level; and (c) Cost-sensitive learning level.

2.1 Handling minority classes

As pointed out by Yang and Wu (2006), one of the challenges of data mining is dealing with observations that suffer from the presence of imbalanced classes. At the data level, as with our telematics dataset, the class distribution is inherently not balanced, creating classes that are considered majority (too many) or minority (too few). It becomes difficult to predict classes belonging to the minority since there are only a few samples from which to learn the features that predict such a class. One obvious approach is to resample the dataset so that the class distribution is rebalanced by over-sampling (or under-sampling) from the underrepresented (or overrepresented) classes. An increasingly popular over-sampling approach is the technique developed by Chawla et al. (2002) called SMOTE (Synthetic Minority Over-sampling Technique). In contrast to over-sampling with replication, the SMOTE algorithm creates synthetic samples for the minority class: samples are drawn using the method of k-nearest neighbors (KNN) and linearly interpolated to produce the synthetic samples. As concluded by Chawla et al. (2002), it “works to cause the classifier to build larger decision regions that contain nearby minority class points.” While SMOTE is utilized to increase prediction accuracy over minority classes, we will examine this algorithm in combination with boosting methods that are utilized to increase accuracy over the entire data.
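As a concrete illustration, the following is a minimal sketch of SMOTE resampling in Python, assuming the third-party imbalanced-learn package; the data and parameter values here are illustrative only and are not the paper's settings.

"""Minimal SMOTE sketch, assuming the imbalanced-learn package."""
from collections import Counter

from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification

# toy imbalanced data (96% / 3.5% / 0.5%), mirroring the Section 4 setup
X, y = make_classification(n_samples=10000, n_classes=3, n_informative=5,
                           weights=[0.96, 0.035, 0.005], random_state=16)
print(Counter(y))                      # roughly {0: 9600, 1: 350, 2: 50}

# each synthetic minority point is interpolated between a minority sample
# and one of its k nearest minority-class neighbors
X_res, y_res = SMOTE(k_neighbors=5, random_state=16).fit_resample(X, y)
print(Counter(y_res))                  # all classes resampled to equal size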

2.2 Boosting

At the algorithm level, boosting techniques are believed to be among the most powerful learning algorithms discovered in recent years. Boosting is based on the principle of combining several weak learners to produce a strong learner with improved and more accurate predictions. Starting with equal observation weights, the algorithm iteratively fits a classifier at each step and adjusts the weights for subsequent steps according to the result of the classification: more weight is given to observations that have been misclassified. Note that, as pointed out by Hastie et al. (2009), while boosting was originally intended for classification problems, the idea has been extended to regression problems.

The first practical boosting algorithm was introduced by Freund and Schapire (1997) and is referred to as AdaBoost.M1; AdaBoost is now the name for a whole class of adaptive boosting algorithms. Assume that we are given a set of data denoted by $(x_i, y_i)$ for $i = 1, \ldots, n$, where $x_i$ is a set of feature variables and $y_i \in \{-1, +1\}$ is a binary variable.

AdaBoost.M1 is a well-known iterative boosting algorithm. Beginning with a distribution of equal weights on the observations, the distribution is updated after each iteration based on $\alpha_t$, which is a function of the weighted classification error

$$err_t = \sum_{i=1}^{n} D_t(i)\, \mathbb{1}\left(y_i \neq h_t(x_i)\right), \qquad (1)$$

where $D_t$ is the distribution of the weights and $h_t$ is the classifier at step $t$. This weighted error tends to increase during the iteration process while the final classifier’s training error gradually decreases. It has been shown (Friedman et al. (2000) and Hastie et al. (2009)) that AdaBoost.M1 is equivalent to an additive model minimizing an exponential loss function and therefore belongs to the traditional statistical family of Forward Stagewise Additive Models. This viewpoint makes the algorithm efficient and gives it a straightforward statistical interpretation. Several variants of adaptive boosting algorithms have appeared in the literature; see Ferreira and Figueiredo (2012) for a review.

AdaBoost algorithms have gained widespread popularity, and several works reveal their advantages. It has been shown in Schapire and Singer (1999) that the training error of the final classifier is bounded, and in Freund and Schapire (1997) that if each weak classifier is slightly better than random, the training error drops exponentially fast in $T$, the number of weak classifiers. Some have also reported that AdaBoost is robust to overfitting, showing that the test error consistently decreases and then levels off as more classifiers are added. In terms of practicality, many empirical applications in machine learning have shown that AdaBoost algorithms are superior classifiers. For instance, Friedman et al. (2000) called AdaBoost with decision trees the “best off-the-shelf classifier in the world.”

AdaBoost.M1 has been extended to multi-class classification problems where $y_i$ belongs to the set $\{1, 2, \ldots, K\}$. The extension AdaBoost.M2 in Freund and Schapire (1997), which is based on a pseudo-loss function instead of the error rate, is suitable for handling multi-class problems. AdaBoost.MH, developed by Schapire and Singer (1999), is an adaptive boosting multi-class algorithm based on the Hamming loss function. Because this loss function is applied to create a set of binary problems, the procedure may be slow and thereby inefficient. These are just a few extensions, but in this paper, we will compare our proposed method to the multi-class extension of the forward stagewise additive model, which is based on the generalization of the exponential loss to multiple classes given by

$$L\left(y_i, f(x_i)\right) = \exp\left(-\frac{1}{K}\, y_i^\top f(x_i)\right) \qquad (2)$$

for observation $i$, where $y_i$ is the class-membership vector with $k$-th element equal to $1$ if the observation belongs to class $k$ and $-1/(K-1)$ otherwise. Developed by Zhu et al. (2009), this is what has been referred to as the SAMME algorithm; detailed steps of the algorithm are summarized in Appendix A.

It is natural to combine the benefits of resampling and boosting; therefore, we also considered the following algorithms (two of these baselines are sketched in code after the list):

  • SAMME with SMOTE sampling;

  • SMOTEBoost, described in Chawla et al. (2003), is an approach for learning from minority classes based on a combination of SMOTE and AdaBoost.M2; and

  • RUSBoost, described in Seiffert et al. (2010), is an algorithm that has the same goal as SMOTEBoost but replaces SMOTE sampling with random undersampling.
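For readers who wish to experiment, the sketch below shows one plausible way to assemble two of these baselines with scikit-learn and imbalanced-learn; the parameter names assume recent library versions, and SMOTEBoost is omitted because it has no standard implementation in these libraries.

# One plausible assembly of the "SAMME with SMOTE" and RUSBoost baselines,
# assuming scikit-learn >= 1.2 and imbalanced-learn >= 0.12.
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from imblearn.ensemble import RUSBoostClassifier
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline

stump = DecisionTreeClassifier(max_depth=1)

# SAMME with SMOTE: oversample the minority classes, then boost
samme_smote = Pipeline([
    ("smote", SMOTE(random_state=16)),
    ("samme", AdaBoostClassifier(estimator=stump, n_estimators=200,
                                 algorithm="SAMME")),
])

# RUSBoost: random undersampling inside each boosting round
rusboost = RUSBoostClassifier(estimator=stump, n_estimators=200,
                              random_state=16)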

2.3 Cost-sensitive learning

Cost-sensitive learning provides an additional layer of complexity in the algorithm to further improve prediction accuracy. In particular, it takes into account misclassification costs by adding a penalty to predictions that lead to incorrect classification. While the costs added primarily account for misclassification, cost adjustments may also be made to reflect other considerations such as computational efficiency/complexity, data collection, or model evaluation; the primary objective is to minimize the total costs of the model. Cost-sensitive algorithms that minimize misclassification costs in classification problems first appeared in Pazzani et al. (1994). The cost matrix is an additional input to the learning procedure and is also used to evaluate the ability of the learned procedure to reduce misclassification costs. In the context of adaptive boosting, the cost adjustment function is used to modify the updating of the weights at each iteration; see Galar et al. (2012). The most familiar method combining AdaBoost with cost-sensitive learning is Ada.C2, which inspired the algorithm suggested in this paper. In the next section, we discuss Ada.C2 in detail as a prelude to our proposed multi-class cost-sensitive algorithm.

3 The SAMME.C2 algorithm

This algorithm combines the benefits of boosting and cost-sensitive algorithms for handling class imbalances in multi-class classification problems. Boosting algorithms are generally considered advantageous because implementation is straightforward, the methods are statistically justified and suitable for many kinds of classification problems, and there are fewer issues with overfitting. By directly penalizing misclassified samples, cost-sensitive learning algorithms provide the added benefit of prediction accuracy, especially for minority classes. The cost-sensitive component of our proposed algorithm is inspired by Ada.C2.

3.1 Ada.C2 method

AdaBoost models treat samples of different classes equally: weights of misclassified samples from different classes are increased by an identical ratio, and weights of correctly classified samples from different classes are decreased by another identical ratio. A desirable boosting strategy for an imbalanced dataset is one that is able to distinguish different classes of samples and place more weight on the samples associated with higher costs. The concept of Ada.C2 was introduced by Sun et al. (2007). The method attaches a cost item to each sample: the input data to the algorithm consist of $(x_i, y_i, c_i)$, for $i = 1, \ldots, n$, where $n$ is the total number of training samples and $c_i > 0$ is the cost. The differences from the original AdaBoost are:

  1. The updating of the distribution of the dataset at each iteration has the form
     $$D_{t+1}(i) = \frac{c_i\, D_t(i) \exp\left(-\alpha_t\, y_i h_t(x_i)\right)}{Z_t},$$
     where $Z_t$ is a normalization factor.

  2. The weight of each classifier is
     $$\alpha_t = \frac{1}{2} \ln \frac{\sum_{i:\, h_t(x_i) = y_i} c_i D_t(i)}{\sum_{i:\, h_t(x_i) \neq y_i} c_i D_t(i)}.$$

AdaBoost is accuracy-oriented, and its weighting strategy may therefore still tilt towards the majority class, since that class contributes more to the overall classification accuracy. Ada.C2's strategy of updating the data distribution, however, gives higher costs to the minority class. On one hand, when minority classes are misclassified, the weights increase more than when majority classes are misclassified. On the other hand, when minority classes are correctly classified, the weights decrease more conservatively than when majority classes are correctly classified. The algorithm thus achieves an ideal weighting strategy to address the issue of highly imbalanced classes. The steps in the algorithm are summarized in Algorithm 3.
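The following toy computation, with made-up numbers and the normalization factor omitted, illustrates how the cost item tilts the weight update relative to plain AdaBoost.

# Toy illustration (made-up numbers, normalization omitted): how the cost
# item c_i tilts Ada.C2's weight update relative to plain AdaBoost.
import numpy as np

alpha = 0.5                        # classifier weight at the current round
D_i = 0.01                         # current weight of a misclassified sample i
c_minority, c_majority = 2.0, 1.0  # illustrative per-class costs

print(D_i * np.exp(alpha))                 # plain AdaBoost boost:   ~0.0165
print(c_minority * D_i * np.exp(alpha))    # misclassified minority: ~0.0330
print(c_majority * D_i * np.exp(alpha))    # misclassified majority: ~0.0165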

3.2 The proposed algorithm

The proposed algorithm, which we call SAMME.C2, is a blend of the SAMME and Ada.C2 algorithms. Although we inherit all the algorithmic steps of Ada.C2, there are important differences. First, we inherit from the SAMME algorithm the calculation of $\alpha^{(t)}$ at each iterative step, which includes the additional term $\log(K-1)$. As pointed out by Zhu et al. (2009), this adjustment term is crucial for multi-class classification problems as it strengthens the assurance of “the accuracy of each weak classifier to be better than random guessing.” It can be shown that the presence of this term arises as a consequence of the solution to the optimization based on the extended multi-class exponential loss function. Second, the calculation of the weighted classification error, $err^{(t)}$, is not adjusted with the cost values in our algorithm. This is necessary to prove that our algorithm follows the forward stagewise additive model.

The steps in the algorithm are summarized below:

Input: Training dataset $\{(x_i, y_i, c_i)\}_{i=1}^{n}$ with $y_i \in \{1, \ldots, K\}$ and cost $c_i > 0$, number of iterations $T$
Output: Final classifier $H(x)$
1 Set initial distribution of the dataset equally distributed: $D_1(i) = 1/n$ for $i = 1, \ldots, n$;
2 for $t = 1, \ldots, T$ do
3       Train a weak classifier using the distribution $D_t$;
4       Get weak classifier $h_t(x)$;
5       Compute $err^{(t)} = \sum_{i=1}^{n} D_t(i)\, \mathbb{1}\left(y_i \neq h_t(x_i)\right)$;
6       Choose $\alpha^{(t)} = \log \frac{1 - err^{(t)}}{err^{(t)}} + \log(K-1)$;
7       Update $D_{t+1}(i) = \frac{c_i\, D_t(i) \exp\left(\alpha^{(t)}\, \mathbb{1}(y_i \neq h_t(x_i))\right)}{Z_t}$, where $Z_t$ is a normalization factor;
8 end for
9 Return the final classifier: $H(x) = \arg\max_k \sum_{t=1}^{T} \alpha^{(t)}\, \mathbb{1}\left(h_t(x) = k\right)$;
Algorithm 1 SAMME.C2: cost-sensitive multi-class AdaBoost
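To make the steps concrete, below is a compact Python sketch of Algorithm 1. This is our own illustrative implementation, not the authors' code; it assumes classes are labeled 0, ..., K-1 and uses depth-one decision trees from scikit-learn as the weak classifiers.

"""Illustrative sketch of Algorithm 1; `cost` holds one positive cost
per observation, and class labels are assumed to be 0, ..., K-1."""
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def samme_c2_fit(X, y, cost, n_classes, n_rounds=200):
    n = len(y)
    D = np.full(n, 1.0 / n)                       # step 1: equal initial weights
    learners, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=D)          # steps 3-4: weak classifier
        miss = stump.predict(X) != y
        err = D[miss].sum()                       # step 5: weighted error, no cost adjustment
        if err <= 0 or err >= 1 - 1.0 / n_classes:
            break                                 # no better than random guessing
        alpha = np.log((1 - err) / err) + np.log(n_classes - 1)  # step 6
        D = cost * D * np.exp(alpha * miss)       # step 7: cost-tilted reweighting
        D /= D.sum()                              # normalization (the Z_t factor)
        learners.append(stump)
        alphas.append(alpha)
    return learners, alphas

def samme_c2_predict(X, learners, alphas, n_classes):
    votes = np.zeros((len(X), n_classes))         # step 9: weighted majority vote
    for stump, alpha in zip(learners, alphas):
        votes[np.arange(len(X)), stump.predict(X)] += alpha
    return votes.argmax(axis=1)

Note that, per the second difference highlighted above, the weighted error in step 5 is computed without the costs; the costs enter only through the reweighting in step 7.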

Figure 1 displays a visual representation of the comparative effect of how the algorithm correctly classifies (or misclassifies) majority and minority classes. In the SAMME algorithm, there is an even redistribution of weight upon correct classification (or misclassification) for both majority and minority classes. With the addition of the cost-sensitive learning mechanism, on the other hand, this redistribution is uneven, with heavier weight on the minority class.

Figure 1: Visualizing the effect of the SAMME.C2 algorithm on classifying majority/minority class.

4 Performance evaluation and simulation

Before we present the results of our simulation studies, we describe the performance measures used for classification problems, emphasizing that the G-mean measure is the most apt for imbalanced datasets.

4.1 Performance metrics

For classification problems, the most common performance statistic is accuracy, the proportion of all observations that were correctly classified. For obvious reasons, this is an irrelevant measure for imbalanced datasets. Let us consider three other performance statistics: Recall, Precision, and F1-score. We can compute these performance statistics for each class $k$, for $k = 1, \ldots, K$, and aggregate them with an average. The Recall statistic for class $k$, denoted $R_k$, is defined to be the proportion of observations in class $k$ correctly classified; Recall is also sometimes called sensitivity. The Precision statistic, $P_k$, is the proportion of predictions in class $k$ that were correctly classified. The F1-score, $F_k$, is the harmonic average of Recall and Precision and is therefore equal to $F_k = 2 P_k R_k / (P_k + R_k)$.

Consider a hypothetical example of insurance claim frequencies with $K = 3$ classes; note that it illustrates highly imbalanced observations. Table 1 is a confusion matrix that is often used to describe the performance of a classifier. The columns refer to the actual observations and the rows refer to the predicted ones. The classes are designated with “Claim 0” as class 1, “Claim 1” as class 2, and “Claim 2+” as class 3. For each “Claim $k$”, we can compute the Recall $R_k$, Precision $P_k$, and F1-score $F_k$.

                           Actual
Predicted       Claim 0   Claim 1   Claim 2+   Total rows
Claim 0           9,000         0          0        9,000
Claim 1               0       500          0          500
Claim 2+          1,000         0         20        1,020
Total columns    10,000       500         20       10,520
Table 1: Confusion matrix of a hypothetical example

When we aggregate the results to provide a single measure of performance for a given classifier, we define statistics such as Macro Precision and Macro Recall as the arithmetic average of the respective performance statistics over the classes. We define the Macro F1-score as the harmonic average of Macro Precision and Macro Recall, that is, $\text{Macro F1} = 2\, \text{MacroP} \cdot \text{MacroR} / (\text{MacroP} + \text{MacroR})$. For the Recall statistics, we use the geometric average and define

$$\text{G-mean} = \left(\prod_{k=1}^{K} R_k\right)^{1/K}. \qquad (3)$$

Finally, when comparing classifiers, a better performing classifier is one that gives a larger value for each of these aggregate statistics.

If we take the log of both sides of the G-mean statistic, we get an average of the logs of all the Recall statistics. Under this log transformation, the result balances the importance of accurately classifying observations in all classes. For imbalanced datasets, this means that it is perfectly acceptable to sacrifice misclassifications of a majority class to correctly classify more of the minority class. Indeed, it has been argued that Recall, or sensitivity, is usually the more interesting measure for imbalanced classification. For these reasons, the G-mean is a reasonable performance measure for imbalanced datasets. The G-mean concept is credited to the work of Fowlkes and Mallows (1983). Table 2 provides the detailed values of the performance metrics resulting from our hypothetical example.

Values of performance metrics
            Recall ($R_k$)   Precision ($P_k$)   F1-score ($F_k$)
Claim 0               0.90                1.00               0.95
Claim 1               1.00                1.00               1.00
Claim 2+              1.00                0.02               0.04
G-mean 0.97      Macro Precision 0.67      Macro F1-score 0.79
Table 2: Performance metrics of the hypothetical example in Table 1
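These aggregate values can be verified directly from the confusion matrix in Table 1; for instance, a few lines of Python reproduce the per-class Recall and the G-mean of Equation (3):

# A quick check of the G-mean computation for the Table 1 example.
import numpy as np

conf = np.array([[9000,    0,  0],    # predicted Claim 0
                 [   0,  500,  0],    # predicted Claim 1
                 [1000,    0, 20]])   # predicted Claim 2+
# columns are actual classes, as in Table 1
recall = np.diag(conf) / conf.sum(axis=0)      # per-class Recall R_k
g_mean = recall.prod() ** (1 / len(recall))    # geometric average, Eq. (3)
print(recall.round(3), round(g_mean, 3))       # [0.9 1.  1. ] 0.965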

4.2 Simulation

To investigate how the SAMME.C2 algorithm performs relative to other boosting algorithms, the first crucial step is to perform this test on a simple simulated dataset. For comparative models, as already alluded to, we chose four other boosting algorithms that are potentially viable competitors for handling imbalanced multi-class classification problems: (a) SAMME, (b) SAMME with SMOTE, (c) RUSBoost, and (d) SMOTEBoost. Each of these methods has been described in the previous section.

To generate a simulated dataset, we utilize the Scikit-learn Python module described in Pedregosa et al. (2011), a user-friendly tool for applying “state-of-the-art machine learning algorithms.” We use its built-in function make_classification, parameterized and executed as follows:

"""Make Simulation"""
from sklearn.datasets  import make_classification

X, y = make_classification(n_samples=100000, n_features=50, n_informative=5-, n_redundant=0, n_repeated=0,
n_classes=3, n_clusters_per_class=2, class_sep=2, flip_y=0, weights=[0.96,0.035,0.005], random_state=16)

This generates 100,000 samples with 50 features and 3 classes, deliberately creating a highly imbalanced dataset by setting the proportion of each class to 96%, 3.5%, and 0.5%, respectively. We refer the reader to the package documentation for an explanation of the other parameters. For training, we use 75% of the data; the rest is used for testing.
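A minimal sketch of this split, assuming scikit-learn; whether the original split was stratified by class is our assumption.

# 75/25 train/test split (stratification by class is our assumption)
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.75, stratify=y, random_state=16)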

All boosting algorithms used for the analysis utilize a one-depth decision tree as the weak classifier, and 200 weak classifiers are linearly combined with weights. To implement the cost values for SAMME.C2, we employed a Genetic Algorithm (GA), a directed random search technique invented by Holland (1975) and described in Mühlenbein (1997); the same genetic algorithm for implementing the cost function will be used in the empirical application. See also Bhowan et al. (2010). In the algorithm, after setting the initial population, each individual in the population is evaluated with a fitness function. The best individuals are chosen, and new individuals are created through crossover; adding some random error to a new individual mutates it, and a new population is generated through this mutation. With the new population, every step is repeated. To select the best cost vector, the G-mean value is designated as the fitness function.
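The following is a toy sketch of such a GA search for the class-cost vector, in our own notation; the paper's exact population size, crossover, and mutation settings are not specified here, so all values are illustrative. The `fitness` callable is expected to fit SAMME.C2 under a candidate cost vector and return the resulting G-mean.

# Toy GA sketch for selecting a class-cost vector (illustrative settings).
import numpy as np

rng = np.random.default_rng(16)

def evolve_costs(fitness, n_classes=3, pop_size=20, n_gen=30, sigma=0.1):
    """fitness(costs) -> G-mean; returns the best cost vector found."""
    pop = rng.uniform(0.1, 1.0, size=(pop_size, n_classes))
    for _ in range(n_gen):
        scores = np.array([fitness(c) for c in pop])
        elite = pop[np.argsort(scores)[-pop_size // 2:]]       # selection
        parents = elite[rng.integers(len(elite), size=(pop_size, 2))]
        children = parents.mean(axis=1)                        # crossover
        children += rng.normal(0, sigma, children.shape)       # mutation
        pop = np.clip(children, 1e-3, None)                    # new population
    scores = np.array([fitness(c) for c in pop])
    return pop[scores.argmax()]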

Each boosting algorithm was then trained using the training data, and performance is evaluated on the test data. The results of the performance of the various models are summarized in Table 3. According to these results, in terms of the G-mean, SAMME.C2, with a G-mean of 0.93, outperforms all the other models, with SAMME with SMOTE not far behind at 0.90. An examination of the per-class Recall statistics reveals that SAMME.C2 outperforms all other models in correctly classifying observations belonging to the minority class. Clearly, though, this comes at the price of having the worst Recall statistic for the majority class, “Claim 0.”

Recall statistics
Class      SAMME   SAMME with SMOTE   RUSBoost   SMOTEBoost   SAMME.C2
Claim 0     1.00               0.96       0.88         0.99       0.88
Claim 1     0.84               0.93       0.88         0.91       0.95
Claim 2+    0.42               0.83       0.30         0.76       0.95
G-mean      0.71               0.90       0.62         0.88       0.93
Table 3: Comparing the performance measures using G-mean based on the simulated dataset

5 Empirical data: telematics

The goal of this paper is to analyze and understand drivers' behavior using telematics. In this section, we demonstrate the strength of our SAMME.C2 algorithm as a predictive model for accident frequency, the number of times a driver had a claim during the observation period. Using this as a predictive model, we subsequently analyze the value added by the telematics information. Insurance companies have long been using traditional variables, such as age and gender, for driver risk classification. The advancement of technology has led insurers to offer innovative products such as UBI, which uses telematics to better classify and price risks, with the understanding that such additional information provides better tools for learning driver behavior.

5.1 Description of data

Our telematics dataset has been acquired from a Canadian-owned co-operative that offers insurance and investment products; its UBI program was launched in Ontario in 2013. The raw data consist of information from drivers who participated in this program and were observed during the period 2013-2016, with nearly 100,000 records. The response variable of interest is accident frequency, the number of accidents observed per driver, and we can observe 0 (no claims), 1 (exactly one claim), or 2+ (two or more claims). For training, we have a total of 50,301 observations; 48,822 have no claims, 1,430 have exactly one claim, and only 49 have two or more claims. We see the presence of highly imbalanced classes: 97.1% with no claims, 2.8% with exactly one claim, and only 0.1% with two or more claims. We have no missing values of claims frequency in the dataset.

We have a total of 49 potential predictor variables, broken down into 10 traditional variables (e.g., driver age, gender) and 39 telematics-driven predictor variables. The descriptions of the variables in our dataset are summarized in Table 10. To gain a preliminary understanding of the data, we present in Table 4 summary statistics of three traditional continuous variables: DRIVER.AGE, VEHICLE.AGE, and CREDIT.SCORE. For example, we see that the average driver age is 51.3, with 16.0 and 103.0 as the youngest and oldest drivers, respectively, in our records. The cohort of drivers in our dataset has more mature driving experience, falling within the middle age groups. According to Figure 2, there does not seem to be a significant difference between male and female drivers with respect to accident frequency, even after controlling for the effect of DRIVER.AGE, VEHICLE.AGE, or CREDIT.SCORE.

Table 5 provides summary statistics of two telematics variables: DISTANCE.DRIVEN and EXPOSURE. For example, the average distance traveled is 7,555.3 km for those without claims, 14,155.4 for those with exactly one claim, and 12,834.9 for those with two or more claims. Broadly speaking, this appears to indicate that the more distance traveled, the more likely there will be at least one claim. Figure 3, which compares the shape of the distribution of distance driven and exposure according to accident frequency, also seems to confirm this.

               Mean   Std Dev     Min      Q1   Median      Q3     Max
DRIVER.AGE     51.3      16.8    16.0    38.0     51.0    65.0   103.0
VEHICLE.AGE     5.7      4.49    -2.0     2.0      5.0     9.0    20.0
CREDIT.SCORE  754.8      88.0   390.0   716.0    780.0   811.0   892.0
Table 4: Summary statistics of three (continuous) traditional variables
Variable          Acc Freq   Count      Mean   Std Dev     Min      Q1   Median       Q3      Max
DISTANCE.DRIVEN   0          48822    7555.3    7149.4     0.1  2374.8   5395.7  10592.7  76271.8
                  1           1430   14155.4    8257.3   253.9  8319.7  12657.4  18161.2  58759.2
                  2+            49   12834.9    7925.9  2247.8  7408.1  11408.3  16621.3  46527.4
EXPOSURE          0          48822      0.49      0.31    0.00    0.24     0.50     0.73     1.08
                  1           1430      0.78      0.25    0.02    0.64     0.89     1.00     1.06
                  2+            49      0.74      0.26    0.23    0.50     0.80     1.00     1.06
Table 5: Summary statistics of two telematics variables
Figure 2: Analysis of claims frequency by gender
Figure 3: Analysis of claims frequency by Exposure and Distance Driven

We can broadly classify the telematics variables into those that are considered driving maneuvers (braking, accelerations, and harsh events that include left or right turn events) and those that do not fall into this category. Most variables falling in the latter category relate to how much time drivers are on the road (percentage of driving spent during each day of the week, exposure, and distance driven). According to Table 10, there are essentially 4 types of driving maneuvers: braking, accelerations, left turn events, and right turn events.

BRAKE.xxKM refers to the number of sudden brakes applied at different measures of deceleration, 10/13/15/17/20/23 km per h/s, counted per 1,000 km driven. The more rapid the deceleration, the faster the car was being driven when a sudden brake was applied. Figure 4 displays a boxplot of the frequency of these sudden brakes according to accident frequency; the y-axis is measured on a logarithmic scale. First, note that braking at higher decelerations is observed less frequently. Second, at every deceleration level, more frequent sudden braking is associated with an increased likelihood of an accident.

Figure 4: Analysis of claims frequency by frequency of sudden brakes (BRAKE.xxKM)

5.2 Performance evaluation

Using a test dataset, we evaluate the performance of SAMME.C2 against the other classification models described earlier. For this test dataset, we have a total of 21,574 observations: 20,901 have no claims, 650 have exactly one claim, and only 23 have two or more claims. All settings for the various algorithms are identical to those described in Section 4 on simulation. The cost vector for SAMME.C2 has been preprocessed using the genetic algorithm, with the assigned cost unique to each class in each observation.

The results summarized in Table 6 are very encouraging for SAMME.C2: in terms of the G-mean performance metric, at 0.71, it far outperforms all the other models. Although SAMME.C2 made some sacrifices in prediction accuracy for the majority class (here, the case of zero claims), it produced superior outcomes in predicting the other, minority classes. Several observations can be further deduced from these results. As in the simulation studies, the worst classifiers are either those that do not resample (SAMME) or those that resample using only undersampling (RUSBoost). In the simulation studies, SAMME.C2 only slightly outperformed the two boosting algorithms that use the SMOTE resampling method. However, when real data are used, our results demonstrate that the cost-sensitive learning mechanism can outperform mechanisms that use resampling methods.

Recall statistics
Accident Frequency   SAMME   SAMME with SMOTE   RUSBoost   SMOTEBoost   SAMME.C2
Accident 0            1.00               0.84       0.68         0.99       0.70
Accident 1            0.06               0.38       0.46         0.18       0.65
Accident 2+           0.00               0.35       0.04         0.13       0.78
G-mean                0.02               0.48       0.24         0.28       0.71
Table 6: Comparing the performance measures using G-mean based on the telematics dataset

5.3 Interpretation

Several researchers and industry practitioners continue to be interested in the usefulness of vehicle telematics information to build more customized pricing models. In this subsection, we show how the results of using the SAMME.C2 algorithm help enhance our understanding of driving behavior.

5.3.1 Traditional vs telematics predictors

To evaluate the added value of telematics information, we compare the results of using SAMME.C2 with the 10 traditional variables only and with all 49 traditional and telematics variables. The results are summarized in Table 7.

From the confusion tables, it can easily be calculated that for SAMME.C2 with all 49 variables, the resulting G-mean is 0.71; when only traditional variables are used, the resulting G-mean is 0.58. This suggests that prediction models from SAMME.C2 that include telematics information outperform those with just traditional predictor variables. Prior studies have similarly concluded that claims frequency models that contain telematics information are far better than those with classical predictor variables alone. See Boucher et al. (2017), Ayuso et al. (2019), and Guillen et al. (2020). Note that in these tables, the diagonal values give the numbers of correctly classified instances for each accident frequency according to the SAMME.C2 algorithm.

With traditional variables only
                        Actual
Predicted    Acc 0   Acc 1   Acc 2+   Tot row
Acc 0        11674     175        1     11850
Acc 1         5319     274        3      5596
Acc 2+        3908     201       19      4128
Tot col      20901     650       23     21574

With traditional and telematics variables
                        Actual
Predicted    Acc 0   Acc 1   Acc 2+   Tot row
Acc 0        14553     108        0     14661
Acc 1         5669     420        5      6049
Acc 2+         679     122       18       819
Tot col      20901     650       23     21574
Table 7: Confusion tables based on the model fit of SAMME.C2
Figure 5: Feature importance based on the SAMME.C2 predictions - using training data

Figure 5 depicts the relative importance of the feature variables, with the telematics variables listed first, followed by the traditional variables. Several conclusions can be drawn from this decomposition of the contributions of the feature variables. First, the figure highlights that many more telematics variables than traditional variables are good predictors of accident frequency. For example, the top five telematics features (EXPOSURE, PCT.TRIP.TUE, PCT.TRIP.THU, DISTANCE.DRIVEN, and PM.RUSH.HOUR) are relatively more important than the top traditional feature (YRS.CLAIMS.FREE). This suggests that traditional features usually considered significant predictors may be replaced by information drawn from telematics. Second, in terms of feature importance, EXPOSURE, the top feature variable, far outweighs all the others. This variable is defined as the frequency of being on the road driving within a 24-hour cycle.

Third, as described in Section 5.1, we may classify the telematics variables into those considered driving maneuvers and those that relate to how much time drivers are on the road. The feature importance suggests that driving maneuvers are less important predictor variables than the time and distance traveled on the road. For example, the top five telematics variables (EXPOSURE, PCT.TRIP.TUE, PCT.TRIP.THU, DISTANCE.DRIVEN, and PM.RUSH.HOUR) are not at all related to driving maneuvers. In some sense, this can be explained by a possible change in driver behavior arising from the knowledge of constant monitoring when a telematics device is installed: drivers can adjust these maneuvers more readily than the frequency and distance they must spend on the road. Finally, we find that gender has little effect in predicting accident frequency. This is in line with previous studies, such as Ayuso et al. (2016), suggesting that gender discrimination may not be necessary with telematics information; the difference in driving behavior according to gender may well be reflected in the telematics information. However, we also note that this might simply be what our data suggest: irrespective of the models used, there is no significant difference by gender, an observation we made in our preliminary data investigation. See Figure 2.

5.3.2 Further understanding effects of telematics

To understand the predictive power of SAMME.C2, we drew three sampled observations from our dataset. We selected these observations according to a common set of traditional variables: a single, male, 48-year-old driver with a high credit score range and more than 18 claims-free years. Across the three observations, the values of the telematics variables vary. For example, the observations have respective EXPOSURE values of 0.65, 0.50, and 0.14, and respective DISTANCE.DRIVEN values of 23802, 7717, and 3340. Note that the first and last observations did not have any accidents, but the second one had two or more accidents. Using the model with all traditional and telematics variables, all three observations were correctly classified; if only traditional variables were used, all three would have been incorrectly classified. The details are provided in Table 8.

                                                  Obs 1       Obs 2      Obs 3
DRIVER.AGE                                           48          48         48
MARITAL                                          Single      Single     Single
CREDIT.SCORE                                    793–860     793–860    793–860
VEHICLE.AGE                                         0–2         0–2        0–2
GENDER                                             Male        Male       Male
YRS.CLMS.FREE                                       >18         >18        >18
EXPOSURE                                           0.65        0.50       0.14
DISTANCE.DRIVEN                                   23802        7717       3340
Traditional plus Telematics:
  Predicted Probability (Predicted ACC_FREQ)   0.34 (0)   0.35 (2+)   0.35 (0)
Traditional Only:
  Predicted Probability (Predicted ACC_FREQ)   0.34 (1)    0.34 (1)   0.34 (1)
Actual ACC_FREQ                                       0          2+          0
Table 8: Comparing predicted probabilities and classification based on the model fit of SAMME.C2 for three observations

To further demonstrate the predictive power of SAMME.C2 with telematics, we consider three hypothetical drivers with common values of the traditional variables: DRIVER.AGE=40, VEHICLE.AGE=10, GENDER=Female, MARITAL=Married, VEHICLE.USE=Commute, and ZONE=Rural. Table 9 provides the predicted classification, varying the values of each of the 13 top ranked variables. The values of these variables have been carefully picked so that we can show a prediction of each class of ACC_FREQ, while preserving the relationships of the telematics variables to accident frequency discussed previously. For example, consider hypothetical driver H1, whom we predict to have no accidents. Apart from the listed common traditional values, driver H1 has EXPOSURE of 0.30, PCT.TRIP.TUE of 0.10, PCT.TRIP.THU of 0.10, DISTANCE.DRIVEN of 5000, and the rest as listed in the table.

Variable                H1      H2      H3
EXPOSURE              0.30    0.80    0.80
PCT.TRIP.TUE          0.10    0.13    0.18
PCT.TRIP.THU          0.10    0.13    0.18
DISTANCE.DRIVEN       5000   15000   15000
PM.RUSH.HOUR          0.10    0.15    0.20
YRS.CLMS.FREE           30      15      10
PCT.TRIP.SAT          0.10    0.20    0.08
PCTWKEND.DRIV         0.50    0.20    0.80
BRAKE.15KM              10      20     150
CRED.SCORE             850     650     650
PCTWKDAY.DRIV         0.50    0.80    0.20
ACCEL.10KM              30      70     150
RTURN.EVENT12           10      20     150
Predicted ACC_FREQ       0       1      2+
Table 9: Predicted classification based on the model fit of SAMME.C2 for hypothetical drivers, varying telematics information but keeping the same values of traditional variables: DRIVER.AGE=40, VEHICLE.AGE=10, GENDER=Female, MARITAL=Married, VEHICLE.USE=Commute, and ZONE=Rural

To further understand the effects of the 13 top ranked variables, we display Figure 6, which provides the resulting distributions of these variables, with mean points emphasized and distinguished according to accident frequency. Each dot represents one of 120 sampled drivers drawn from our training dataset. Because of the apparent differences in scale across the 13 top ranked variables, we standardized each variable using min-max normalization. This transformation leads to a common measure that ranges between 0 and 100 for all variables: the smallest value is assigned 0, the largest 100, and all others fall anywhere in between. Notice also that of these 13 top ranked feature variables, only YRS.CLAIMS.FREE and CREDIT.SCORE are traditional variables; all the rest are telematics variables.
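For reference, the min-max transformation used here amounts to the following one-liner (the column name in the usage comment is illustrative):

# Min-max normalization to the 0-100 scale described above.
import pandas as pd

def min_max_0_100(col: pd.Series) -> pd.Series:
    """Map the smallest value to 0, the largest to 100, linearly in between."""
    return 100 * (col - col.min()) / (col.max() - col.min())

# e.g. df["EXPOSURE"] = min_max_0_100(df["EXPOSURE"])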

In order to understand this figure, consider the extremely rare case of having two or more accidents. According to the figure, when compared to accident frequencies of 0 or 1, there is no distinction with respect to the following feature variables: PCT.TRIP.TUE, PCT.TRIP.THU, PCT.TRIP.SAT, and PCTWKEND.DRIV. This is not at all surprising because of the possible high correlation of these variables with the top feature variable, EXPOSURE. Recall that EXPOSURE is the average percentage of time spent driving within a 24-hour cycle; the higher the EXPOSURE, the higher the percentage of time spent driving each day of the week or on the weekend. On the other hand, on an average basis, this same extremely rare case of having two or more accidents tends to exhibit: (a) higher EXPOSURE, (b) more DISTANCE.DRIVEN, (c) more frequent driving during PM.RUSH.HOUR, (d) fewer YRS.CLAIMS.FREE, (e) more frequent BRAKE.15KM, (f) lower CREDIT.SCORE, (g) higher ACCEL.10KM, and (h) more frequent RTURN.EVENT12.

This figure also shows the directional effect of the relationship of accident frequency to each of the 13 top ranked variables, where there is any difference at all. It is also interesting to note that, for the EXPOSURE variable, those with zero accidents do have lower EXPOSURE, but the pattern between one accident and two or more accidents is not as intuitive: those with two or more accidents tend to have generally lower EXPOSURE than those with one. We attribute this to the possibility that we have only very few observations with two or more accidents.

Figure 6: Distribution and mean values of the 13 top ranked variables for a sample of 120 drivers

Finally, considering now all the observations in the training data, we visualize the distribution of the observations according to their accident frequency for the 3 top ranked predictors that are positively related to accident frequency. This is depicted in Figure 7. In each of the sub-figures, we distinguish the accident frequency according to three colors: ACC_FREQ of 0, 1, and 2+ are colored blue, orange, and green, respectively. We can visualize the positive relationship of each of these 3 variables by looking at the right tail of the distribution: for each variable, the greater the value, the more claim observations there are at that end of the distribution. All 3 variables are considered driving maneuvers. This demonstrates that while these 3 variables are not necessarily top feature variables ranked by relative importance, the figure exhibits broad positive relationships: the more often a driver engages in these driving maneuvers, the more likely the occurrence of an accident.

Figure 7: Histograms of 3 top ranked predictors that are positively related to accident frequency

6 Concluding remarks

In just the past 4 to 5 years, we have seen research work related to vehicle telematics emerge in insurance and actuarial science. This article expands on this literature by focusing on challenges not previously addressed. In particular, we describe our work on understanding how the addition of telematics information helps improve accuracy when predicting the frequency of accidents. We treated the prediction of accident frequency as a multi-class classification problem with highly imbalanced classes. As with our empirical dataset with telematics, we find that in a motor insurance claims portfolio, we typically observe a large number of drivers with zero accidents, a small number with exactly one claim, and a far smaller number with two or more claims. We reviewed some existing multi-class classification models that address the presence of minority classes and find that combinations of resampling procedures and boosting have performed quite well as machine learning tools. We recommend a combination of boosting and cost-sensitive learning algorithms, which we call SAMME.C2, to address the imbalances within a multi-class classification problem. The injection of the cost-sensitive factors has the effect of placing a heavy penalty on misclassified minority-class observations and, simultaneously, a heavy reward on their correct classification. The results of fitting our model to our telematics dataset are very promising: we demonstrated the superior performance of SAMME.C2 over other known boosting algorithms combined with resampling.

When the SAMME.C2 algorithm is applied to our telematics dataset, we are able to draw conclusive evidence of how telematics information affects accident frequencies. First, we find that telematics variables are dominantly better predictors of accident frequencies than the typical classical variables (e.g., DRIVER.AGE, GENDER) used for risk classification. This suggests that telematics variables provide a better understanding of driver behavior. Second, we are able to group the telematics variables according to the percentage of time drivers are on the road (e.g., EXPOSURE, PCT.TRIP.TUE, PCT.TRIP.THU) and according to driving maneuvers (e.g., BRAKE.15KM, ACCEL.10KM). We find that, broadly speaking, the cluster of variables describing the percentage of time drivers are on the road contains relatively more important feature variables than the cluster related to driving maneuvers. Two observations about driving maneuvers relate to this. They are relatively less important because the presence of a telematics device in the vehicle indirectly encourages good driving behavior, and drivers have more control over these maneuvers, whereas the time spent driving on the road may be more of a necessity (e.g., driving to school or work, driving family members to events, driving for a family vacation). Finally, we find evidence of positive relationships between some of these driving maneuvers and accident frequencies: the more frequently a driver is identified engaging in these maneuvers, the more likely the occurrence of accidents. In the future, we wish to investigate the effects of telematics variables on the claim severity, or damages, associated with an accident.


Appendix A. Detailed steps of the SAMME and Ada.C2 algorithms

Input: Training dataset $\{(x_i, y_i)\}_{i=1}^{n}$ with $y_i \in \{1, \ldots, K\}$, number of iterations $T$
Output: Final classifier $H(x)$
1 Set initial distribution of the dataset equally distributed: $D_1(i) = 1/n$ for $i = 1, \ldots, n$;
2 for $t = 1, \ldots, T$ do
3       Train a weak classifier using the distribution $D_t$;
4       Get weak classifier $h_t(x)$;
5       Compute $err^{(t)} = \sum_{i=1}^{n} D_t(i)\, \mathbb{1}\left(y_i \neq h_t(x_i)\right)$;
6       Choose $\alpha^{(t)} = \log \frac{1 - err^{(t)}}{err^{(t)}} + \log(K-1)$;
7       Update $D_{t+1}(i) = \frac{D_t(i) \exp\left(\alpha^{(t)}\, \mathbb{1}(y_i \neq h_t(x_i))\right)}{Z_t}$, where $Z_t$ is a normalization factor;
8 end for
9 Return the final classifier: $H(x) = \arg\max_k \sum_{t=1}^{T} \alpha^{(t)}\, \mathbb{1}\left(h_t(x) = k\right)$;
Algorithm 2 SAMME: multi-class AdaBoost
Input: Training dataset $\{(x_i, y_i, c_i)\}_{i=1}^{n}$ with $y_i \in \{-1, +1\}$ and cost $c_i > 0$, number of iterations $T$
Output: Final classifier $H(x)$
1 Set initial distribution of the dataset equally distributed: $D_1(i) = 1/n$ for $i = 1, \ldots, n$;
2 for $t = 1, \ldots, T$ do
3       Train a weak classifier using the distribution $D_t$;
4       Get weak classifier $h_t(x)$;
5       Choose $\alpha_t = \frac{1}{2} \ln \frac{\sum_{i:\, h_t(x_i) = y_i} c_i D_t(i)}{\sum_{i:\, h_t(x_i) \neq y_i} c_i D_t(i)}$;
6       Update $D_{t+1}(i) = \frac{c_i D_t(i) \exp\left(-\alpha_t y_i h_t(x_i)\right)}{Z_t}$, where $Z_t$ is a normalization factor;
7 end for
8 Return the final classifier: $H(x) = \mathrm{sign}\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$;
Algorithm 3 Ada.C2: cost-sensitive binary AdaBoost

Appendix B. Traditional and telematics variables in the dataset

Type Variable Description
Traditional DRIVER.AGE Age of driver
GENDER Gender of the driver (M/F)
VEHICLE.AGE Vehicle age
MARITAL Marital status
VEH.USE Use of vehicle: Pleasure, Commute, Farmer, Business
CREDIT.SCORE Credit score of driver
ZONE Zone where driver lives: rural, urban
ANN.KMS.DRV.SYSTEM Kilometers driven as declared by the driver
YRS.CLAIMS.FREE Number of years claims free
TERRITORY Territory where vehicle is rated
Telematics EXPOSURE Exposure time in percentage of 24 hours
DISTANCE.DRIVEN Total distance driven
PCT.TRIP.xxx Percent of driving day xxx of week: MON/TUE/…/SUN
PCT.TRIP.xxx Percent vehicle driven in xxx hrs: 2HRS/3HRS/4HRS
PCT.xxx.DRIV Percent vehicle driven in xxx of week: WKDAY/WKEND
xx.RUSH.HOUR Percent of driving in xx rush hours: AM/PM
AVGDAY.USE.WKLY Average number of days used per week
ACCEL.xxKM Number of sudden accelerations at 10/13/15/…/23 km/h/s per 1000 km
BRAKE.xxKM Number of sudden brakes at 10/13/15/…/23 km/h/s per 1000 km
LTURN.EVENTxx Number of left turns per 1000 km with intensity 08/09/10/11/12
RTURN.EVENTxx Number of right turns per 1000 km with intensity 08/09/10/11/12
Response ACC_FREQ Frequency of accidents during observation: 0, 1, or 2+
Table 10: Variable names and descriptions.

References

  • Ayuso et al. (2019) Ayuso, M., Guillen, M., and Nielsen, J. P. (2019). Improving automobile insurance ratemaking using telematics: incorporating mileage and driver behaviour data. Transportation, 46:735–752.
  • Ayuso et al. (2016) Ayuso, M., Guillen, M., and Pérez-Marín, A. M. (2016). Telematics and gender discrimination: some usage-based evidence on whether men’s risk of accidents differs from women’s. Risks, 4:1–10.
  • Bhowan et al. (2010) Bhowan, U., Zhang, M., and Johnston, M. (2010). Genetic programming for classification with unbalanced data. In Proceedings 13th European Conference on Genetic Programming, EuroGP 2010, pages 1–13. Springer-Verlag Berlin.
  • Boucher et al. (2017) Boucher, J.-P., Côté, S., and Guillen, M. (2017). Exposure as duration and distance in telematics motor insurance using generalized additive models. Risks, 5:1–23.
  • Chawla et al. (2002) Chawla, N. V., Bowyer, K. W., Hall, L. O., and Kegelmeyer, W. P. (2002). SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16:321–357.
  • Chawla et al. (2003) Chawla, N. V., Lazarevic, A., Hall, L. O., and Bowyer, K. W. (2003). SMOTEBoost: Improving prediction of the minority class in boosting. In PKDD 2003: Proceedings of the Seventh European Conference on Principles and Practice of Knowledge Discovery, pages 107–119. Springer-Verlag: Berlin-Heidelberg.
  • Constantinescu et al. (2018) Constantinescu, C. C., Stancu, I., and Panait, I. (2018). Impact study of telematics auto insurance. Review of Financial Studies, 3(4):17–35.
  • Ferreira and Figueiredo (2012) Ferreira, A. J. and Figueiredo, M. A. (2012). Boosting algorithms: A review of methods, theory, and applications. In Zhang, C. and Ma, Y., editors, Ensemble Machine Learning: Methods and Applications, chapter 2, pages 35–85. Springer Science.
  • Fowlkes and Mallows (1983) Fowlkes, E. B. and Mallows, C. (1983). A method for comparing two hierarchical clusterings. Journal of the American Statistical Association, 78(383):553–569.
  • Freund and Schapire (1997) Freund, Y. and Schapire, R. E. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1):119–139.
  • Friedman et al. (2000) Friedman, J., Hastie, T., and Tibshirani, R. (2000). Additive logistic regression: A statistical view of boosting. The Annals of Statistics, 28(2):337–407.
  • Galar et al. (2012) Galar, M., Fernández, A., Barrenechea, E., Bustince, H., and Herrera, F. (2012). A review on ensembles for the class imbalance problem: bagging-, boosting-, and hybrid-based approaches. IEEE Transactions on Systems, Man, and Cybernetics – Part C: Applications and Reviews, 42(4):463–484.
  • Guillen et al. (2019) Guillen, M., Nielsen, J. P., Ayuso, M., and Pérez-Marín, A. M. (2019). The use of telematics devices to improve automobile insurance rates. Risk Analysis, 39(3):662–672.
  • Guillen et al. (2020) Guillen, M., Nielsen, J. P., Pérez-Marín, A. M., and Elpidorou, V. (2020). Can automobile insurance telematics predict the risk of near-miss events? North American Actuarial Journal, 24(1):141–152.
  • Hastie et al. (2009) Hastie, T., Tibshirani, R., and Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer: New York.
  • Holland (1975) Holland, J. H. (1975). Adaptation in Natural and Artificial Systems. University of Michigan Press: Ann Arbor.
  • Karapiperis et al. (2015) Karapiperis, D., Birnbaum, B., Bradenburg, A., Catagna, S., Greenberg, A., Harbage, R., and Obersteadt, A. (2015). Usage-based insurance and vehicle telematics: Insurance market and regulatory implications. Technical report, National Association of Insurance Commissioners and The Center for Insurance Policy and Research.
  • Mühlenbein (1997) Mühlenbein, H. (1997). Genetic algorithms. In Aarts, E. H. and Lenstra, J. K., editors, Local Search in Combinatorial Optimization, pages 137–172. Princeton University Press.
  • Orphanoudakis et al. (1998) Orphanoudakis, S. C., Chronaki, C. E., Tsiknakis, M., and Kostomanolakis, S. G. (1998). Telematics in healthcare. In Wong, S. T., editor, Medical Image Databases, chapter 10, pages 251–281. Springer New York.
  • Pazzani et al. (1994) Pazzani, M., Merz, C., Murphy, P., Ali, K., Hume, T., and Brunk, C. (1994). Reducing misclassification costs. In ICML 1994: Proceedings of the Eleventh International Conference on Machine Learning, pages 217–225. Morgan Kaufman Publishers Inc.: San Francisco, CA.
  • Pednault et al. (2000) Pednault, E. P., Rosen, B. K., and Apte, C. (2000). Handling imbalanced data sets in insurance risk modeling. Technical report, Association for the Advancement of Artificial Intelligence (AAAI).
  • Pedregosa et al. (2011) Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., and Duchesnay, E. (2011). Scikit-learn: machine learning in Python. Journal of Machine Learning Research, 12:2825–2830.
  • Pérez-Marín et al. (2019) Pérez-Marín, A. M., Guillen, M., Alcañiz, M., and Bermúdez, L. (2019). Quantile regression with telematics information to assess the risk of driving above the posted speed limit. Risks, 7:1–11.
  • Pesantez-Narvaez et al. (2019) Pesantez-Narvaez, J., Guillen, M., and Alcañiz, M. (2019). Predicting motor insurance claims using telematics data – XGBoost versus logistic regression. Risks, 7:1–16.
  • Schapire and Singer (1999) Schapire, R. E. and Singer, Y. (1999). Improved boosting algorithms using confidence-rated predictions. Machine Learning, 37:297–336.
  • Seiffert et al. (2010) Seiffert, C., Khoshgoftaar, T. M., Van Hulse, J., and Napolitano, A. (2010). RUSBoost: A hybrid approach to alleviating class imbalance. IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans, 40(1):185–197.
  • Sun et al. (2007) Sun, Y., Kamel, M. S., Wong, A. K., and Wang, Y. (2007). Cost-sensitive boosting for classification of imbalanced data. Pattern Recognition, 40(12):3358–3378.
  • Verbelen et al. (2018) Verbelen, R., Antonio, K., and Claeskens, G. (2018). Unravelling the predictive power of telematics data in car insurance pricing. Journal of the Royal Statistical Society: Series C (Applied Statistics), 67(5):1275–1304.
  • Yang and Wu (2006) Yang, Q. and Wu, X. (2006). 10 challenging problems in data mining research. International Journal of Information Technology & Decision Making, 5(4):597–604.
  • Zhu et al. (2009) Zhu, J., Zou, H., Rosset, S., and Hastie, T. (2009). Multi-class AdaBoost. Statistics and Its Interface, 2:349–360.