Synthetic Dataset Generation of Driver Telematics

01/30/2021
by   Banghee So, et al.

This article describes the techniques used to produce a synthetic dataset of driver telematics emulated from a similar real insurance dataset. The synthetic dataset has 100,000 policies that include observations on drivers' claims experience together with associated classical risk variables and telematics-related variables. This work aims to produce a resource that can be used to advance models for assessing risks in usage-based insurance. It follows a three-stage process using machine learning algorithms. The first stage simulates values for the number of claims as multiple binary classifications using feedforward neural networks. The second stage simulates values for the aggregated amount of claims as a regression using feedforward neural networks, with the number of claims included in the set of feature variables. In the final stage, a synthetic portfolio of the space of feature variables is generated by applying an extended SMOTE algorithm. The resulting dataset is evaluated by comparing the synthetic and real datasets when Poisson and gamma regression models are fitted to the respective data. Other visualizations and data summaries produce remarkably similar statistics between the two datasets. We hope that researchers interested in obtaining telematics datasets to calibrate models or learning algorithms will find our work valuable.
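To make the feature-generation stage more concrete, here is a minimal sketch of the basic SMOTE interpolation idea: a synthetic row is placed at a random point on the segment between a real row and one of its nearest neighbours. This is plain SMOTE-style interpolation on numeric features only, not the authors' extended algorithm, whose details the abstract does not give. The function name `smote_like_sample`, its parameters, and the toy two-column data are all hypothetical and chosen for illustration.

```python
import math
import random


def smote_like_sample(rows, n_samples, k=5, seed=0):
    """Return n_samples synthetic rows interpolated between real rows.

    Assumption: every feature is numeric. Plain SMOTE-style interpolation,
    not the extended variant mentioned in the abstract.
    """
    if len(rows) < 2:
        raise ValueError("need at least two rows to interpolate")
    rng = random.Random(seed)
    k = min(k, len(rows) - 1)
    synthetic = []
    for _ in range(n_samples):
        base_idx = rng.randrange(len(rows))
        base = rows[base_idx]
        # Rank all other rows by Euclidean distance to the base row.
        others = [r for i, r in enumerate(rows) if i != base_idx]
        others.sort(key=lambda r: math.dist(base, r))
        neighbour = rng.choice(others[:k])
        # Random position on the segment from base to the chosen neighbour.
        lam = rng.random()
        synthetic.append([b + lam * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy "real" portfolio with two hypothetical numeric features.
_rng = random.Random(1)
real = [[_rng.gauss(0.0, 1.0), _rng.gauss(5.0, 2.0)] for _ in range(50)]
synth = smote_like_sample(real, 200)
```

Because each synthetic row is a convex combination of two real rows, every synthetic feature value stays within the observed range of that feature. That is one reason interpolation-based generators tend to reproduce marginal summaries of the source data closely.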


04/19/2017

Simultaneous Policy Learning and Latent State Inference for Imitating Driver Behavior

In this work, we propose a method for learning driver models that accoun...

03/12/2022

The Health Gym: Synthetic Health-Related Datasets for the Development of Reinforcement Learning Algorithms

In recent years, the machine learning research community has benefited t...

01/26/2021

Applications of Clustering with Mixed Type Data in Life Insurance

Death benefits are generally the largest cash flow item that affects fin...

12/26/2018

Prediction of Industrial Process Parameters using Artificial Intelligence Algorithms

In the present paper, a method of defining the industrial process parame...

08/16/2021

Information Disorders, Moral Values and the Dispute of Narratives

In this paper we propose a framework characterizing information disorder...

05/20/2017

Learning Feature Nonlinearities with Non-Convex Regularized Binned Regression

For various applications, the relations between the dependent and indepe...

05/10/2022

Nightly Automobile Claims Prediction from Telematics-Derived Features: A Multilevel Approach

In recent years it has become possible to collect GPS data from drivers ...