An engine to simulate insurance fraud network data

08/21/2023
by Bavo D. C. Campo, et al.

Traditionally, the detection of fraudulent insurance claims relies on business rules and expert judgement, which makes it a time-consuming and expensive process (Óskarsdóttir et al., 2022). Consequently, researchers have been examining ways to develop efficient and accurate analytic strategies to flag suspicious claims. Feeding learning methods with features engineered from the social network of parties involved in a claim is a particularly promising strategy (see, for example, Van Vlasselaer et al. (2016); Tumminello et al. (2023)). When developing a fraud detection model, however, we are confronted with several challenges. First, fraud is rare, which creates a high class imbalance and complicates the development of well-performing classification models. Second, only a small number of claims are investigated and receive a label, which leaves a large corpus of unlabeled data. Third, publicly available data are lacking, which hinders not only the development of new methods but also the validation of existing techniques. We therefore design a simulation machine that is engineered to create synthetic data with a network structure and covariates similar to those of the real-life insurance fraud data set analyzed in Óskarsdóttir et al. (2022). Further, the user has control over several data-generating mechanisms: the total number of policyholders and parties, the desired level of imbalance, and the features (and their effect sizes) in the fraud-generating model. As such, the simulation engine enables researchers and practitioners to examine several methodological challenges and to test their insurance fraud detection models, or the strategies used to develop them, in a range of different settings. Moreover, large synthetic data sets can be generated to evaluate the predictive performance of (advanced) machine learning techniques.
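To make the idea of user-controlled imbalance and effect sizes concrete, the following is a minimal Python sketch. It is not the authors' simulation engine, and it leaves out the network structure entirely. It assumes a plain logistic fraud-generating model with standard-normal covariates, and it tunes the intercept by bisection so that the expected fraud rate matches a requested level. The function name `simulate_fraud_labels` and all of its parameters are hypothetical, introduced only for illustration.

```python
import math
import random


def simulate_fraud_labels(n_claims, target_fraud_rate, effect_sizes, seed=0):
    """Draw synthetic claims and fraud labels from a logistic model.

    Hypothetical illustration only: covariates are i.i.d. standard normal,
    and no network structure between claims and parties is simulated.
    """
    rng = random.Random(seed)

    # One row of covariates per claim, one column per effect size.
    features = [
        [rng.gauss(0.0, 1.0) for _ in effect_sizes] for _ in range(n_claims)
    ]
    # Linear predictor without the intercept.
    scores = [
        sum(beta * x for beta, x in zip(effect_sizes, row)) for row in features
    ]

    def sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))

    def mean_probability(intercept):
        return sum(sigmoid(intercept + s) for s in scores) / n_claims

    # The mean fraud probability increases with the intercept, so a simple
    # bisection finds the intercept that yields the requested imbalance.
    low, high = -30.0, 30.0
    for _ in range(60):
        mid = (low + high) / 2.0
        if mean_probability(mid) < target_fraud_rate:
            low = mid
        else:
            high = mid
    intercept = (low + high) / 2.0

    # Sample a binary fraud label for every claim.
    labels = [
        1 if rng.random() < sigmoid(intercept + s) else 0 for s in scores
    ]
    return features, labels, intercept


features, labels, intercept = simulate_fraud_labels(
    n_claims=5000, target_fraud_rate=0.02, effect_sizes=[1.0, -0.5]
)
print(f"observed fraud rate: {sum(labels) / len(labels):.4f}")
```

Changing `target_fraud_rate` controls the class imbalance, while `effect_sizes` controls how strongly each covariate drives the fraud label. The observed rate fluctuates around the target because the labels are sampled.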


