Safer Together: Machine Learning Models Trained on Shared Accident Datasets Predict Construction Injuries Better than Company-Specific Models

01/09/2023
by Antoine J.-P. Tixier, et al.

In this study, we capitalized on a collective dataset repository of 57k accidents from 9 companies belonging to 3 domains and tested whether models trained on multiple datasets (generic models) predicted safety outcomes better than company-specific models. We experimented with full generic models (trained on all data), per-domain generic models (construction, electric T&D, oil & gas), and ensembles of generic and specific models. Results are very positive, with generic models outperforming the company-specific models in most cases while also generating finer-grained, hence more useful, forecasts. Successful generic models remove the need to train company-specific models, saving a lot of time and resources, and give small companies, whose accident datasets are too limited to train their own models, access to safety outcome predictions. It may still, however, be advantageous to train specific models to get an extra boost in performance through ensembling with the generic models. Overall, by learning lessons from a pool of datasets whose accumulated experience far exceeds that of any single company, and by making these lessons easily accessible in the form of simple forecasts, generic models tackle the holy grail of cross-organizational safety learning and dissemination in the construction industry.
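To make the distinction between specific, generic, and ensembled models concrete, here is a minimal Python sketch. It is not the authors' method: the abstract does not name the learning algorithm or the ensembling scheme, so a smoothed frequency counter stands in for the model and plain probability averaging stands in for the ensemble. The outcome labels, attribute names, function names, and the handful of records are all made up for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical severity labels; the abstract does not list the real outcome classes.
CLASSES = ("first_aid", "medical_case", "lost_time")


def train_frequency_model(records):
    """Return a function estimating P(outcome | attribute) from (attribute, outcome) pairs.

    Stand-in for a real learner: counts outcomes per attribute with add-one smoothing.
    """
    counts = defaultdict(Counter)
    for attribute, outcome in records:
        counts[attribute][outcome] += 1

    def predict_proba(attribute):
        seen = counts.get(attribute, Counter())
        total = sum(seen.values()) + len(CLASSES)
        return {c: (seen[c] + 1) / total for c in CLASSES}

    return predict_proba


def ensemble_proba(models, attribute):
    """Average the class probabilities of several models (assumed ensembling scheme)."""
    probs = [model(attribute) for model in models]
    return {c: sum(p[c] for p in probs) / len(probs) for c in CLASSES}


# Toy, made-up records: company A has never reported a crane accident.
company_a = [("ladder", "first_aid"), ("ladder", "lost_time")]
company_b = [("ladder", "medical_case"), ("crane", "lost_time"), ("crane", "lost_time")]

specific_a = train_frequency_model(company_a)            # company-specific model
generic = train_frequency_model(company_a + company_b)   # trained on pooled data
combined = ensemble_proba([generic, specific_a], "crane")
```

In this toy setup, the specific model for company A can only return a uniform guess for "crane", while the generic model and the ensemble borrow company B's experience. That mirrors, in miniature, the abstract's point about small companies benefiting from pooled data.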


