Creating Unbiased Public Benchmark Datasets with Data Leakage Prevention for Predictive Process Monitoring

07/05/2021
by   Hans Weytjens, et al.

Advances in AI, and especially machine learning, are increasingly drawing research interest and efforts towards predictive process monitoring, the subfield of process mining (PM) that concerns predicting next events, process outcomes and remaining execution times. Unfortunately, researchers use a variety of datasets and ways to split them into training and test sets. The documentation of these preprocessing steps is not always complete. Consequently, research results are hard or even impossible to reproduce and to compare between papers. At times, the use of non-public domain knowledge further hampers the fair competition of ideas. Often the training and test sets are not completely separated, a data leakage problem particular to predictive process monitoring. Moreover, test sets usually suffer from bias in terms of both the mix of case durations and the number of running cases. These obstacles pose a challenge to the field's progress. The contribution of this paper is to identify and demonstrate the importance of these obstacles and to propose preprocessing steps to arrive at unbiased benchmark datasets in a principled way, thus creating representative test sets without data leakage with the aim of levelling the playing field, promoting open science and contributing to more rapid progress in predictive process monitoring.
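To make the leakage problem concrete, here is a minimal sketch of one conservative way to keep training and test sets fully separated in time. It is not the procedure proposed in the paper, which the abstract does not spell out. The event layout `(case_id, activity, timestamp)`, the function name `temporal_split` and the rule of discarding every case that straddles the split time are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime


def temporal_split(events, split_time):
    """Split an event log by case around split_time.

    events: iterable of (case_id, activity, timestamp) tuples
            (a hypothetical layout, not taken from the paper).
    Returns (train_events, test_events, dropped_case_ids).
    """
    # Group events by the case they belong to.
    by_case = defaultdict(list)
    for case_id, activity, ts in events:
        by_case[case_id].append((case_id, activity, ts))

    train, test, dropped = [], [], []
    for case_id, case_events in by_case.items():
        case_events.sort(key=lambda e: e[2])
        start, end = case_events[0][2], case_events[-1][2]
        if end < split_time:
            # Case finished before the split: safe for training.
            train.extend(case_events)
        elif start >= split_time:
            # Case started at or after the split: safe for testing.
            test.extend(case_events)
        else:
            # Case was running at the split time. Assumption: drop it
            # entirely so no case contributes to both sets.
            dropped.append(case_id)
    return train, test, dropped


log = [
    ("A", "register", datetime(2021, 1, 1)),
    ("A", "approve", datetime(2021, 1, 5)),
    ("B", "register", datetime(2021, 1, 8)),
    ("B", "approve", datetime(2021, 1, 20)),
    ("C", "register", datetime(2021, 1, 15)),
    ("C", "approve", datetime(2021, 1, 25)),
]
train, test, dropped = temporal_split(log, datetime(2021, 1, 10))
```

In this toy log, case A ends before the split and goes to training, case C starts after it and goes to testing, and case B is discarded because it was running at the split time. Dropping such cases is the simplest option but wastes data, and it does nothing about the test-set bias in case durations and running-case counts that the abstract also raises.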

research
08/04/2020

Explainable Predictive Process Monitoring

Predictive Business Process Monitoring is becoming an essential aid for ...
research
09/08/2021

How do I update my model? On the resilience of Predictive Process Monitoring models to change

Existing well investigated Predictive Process Monitoring techniques typi...
research
04/06/2018

Predictive Process Monitoring Methods: Which One Suits Me Best?

Predictive process monitoring has recently gained traction in academia a...
research
04/11/2018

Incremental Predictive Process Monitoring: How to Deal with the Variability of Real Environments

A characteristic of existing predictive process monitoring techniques is...
research
06/30/2023

Inter-case Predictive Process Monitoring: A candidate for Quantum Machine Learning?

Regardless of the domain, forecasting the future behaviour of a running ...
research
07/10/2018

A Cautionary Tail: A Framework and Case Study for Testing Predictive Model Validity

Data scientists frequently train predictive models on administrative dat...