Predicting Seriousness of Injury in a Traffic Accident: A New Imbalanced Dataset and Benchmark

05/20/2022
by Paschalis Lagias, et al.

The paper introduces a new dataset for assessing how well machine learning algorithms predict the seriousness of injury in a traffic accident. The dataset is built by aggregating publicly available datasets from the UK Department for Transport. These source datasets are drastically imbalanced, and in some cases the missing attributes approach 50% of the overall data dimensionality. The paper presents the full data analysis pipeline, from the publicly available road traffic accident data to predictors of possible injuries and their degree of severity. It addresses the severe incompleteness of the public data by imputing missing values with a MissForest model. The paper also introduces two baseline approaches for building injury predictors: a supervised artificial neural network and a reinforcement learning model. The dataset can stimulate research on several aspects of machine learning with imbalanced datasets, and the two approaches can serve as baseline references when researchers test more advanced learning algorithms in this area.
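
As a rough illustration of the imputation step, the sketch below approximates MissForest with scikit-learn's IterativeImputer wrapped around a random-forest regressor. This is an assumption, not the authors' pipeline: the function name missforest_like_impute and the parameter values are hypothetical, and the sketch handles numeric columns only, whereas MissForest also imputes categorical attributes with random-forest classifiers.

```python
import numpy as np
# IterativeImputer is experimental in scikit-learn and must be enabled explicitly.
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.ensemble import RandomForestRegressor


def missforest_like_impute(X, max_iter=10, n_estimators=50, random_state=0):
    """Hypothetical MissForest-style imputation for a numeric array X with NaNs.

    Each incomplete column is regressed on the other columns with a random
    forest, and its missing entries are refilled; this round-robin pass is
    repeated up to max_iter times. Observed values are left unchanged.
    """
    imputer = IterativeImputer(
        estimator=RandomForestRegressor(
            n_estimators=n_estimators, random_state=random_state
        ),
        max_iter=max_iter,
        random_state=random_state,
    )
    return imputer.fit_transform(X)


# Toy numeric table with scattered missing values (illustrative only).
X = np.array([
    [1.0, 2.0, np.nan],
    [2.0, np.nan, 6.0],
    [3.0, 6.0, 9.0],
    [np.nan, 8.0, 12.0],
    [5.0, 10.0, 15.0],
    [6.0, 12.0, 18.0],
])
X_filled = missforest_like_impute(X, max_iter=3)
```

Tree ensembles are a natural fit here because they capture non-linear relations and mixed feature scales without preprocessing, which is the main reason MissForest uses random forests rather than linear models.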
