Trace Encoding in Process Mining: a survey and benchmarking

01/05/2023
by Sylvio Barbon Jr., et al.

Encoding methods are employed across several process mining tasks, such as predictive process monitoring, anomalous case detection, and trace clustering. These methods are usually applied as preprocessing steps and are responsible for transforming complex information into a numerical feature space. Most papers choose existing encoding methods arbitrarily or follow a strategy based on domain-specific expert knowledge. Moreover, existing methods are typically used with their default hyperparameters, without evaluating other options. This practice can lead to several drawbacks, such as suboptimal performance and unfair comparisons with the state of the art. Therefore, this work aims to provide a comprehensive survey on event log encoding by comparing 27 methods, of different natures, in terms of expressivity, scalability, correlation, and domain agnosticism. To the best of our knowledge, this is the most comprehensive study so far focusing on trace encoding in process mining. It contributes to raising awareness of the role of trace encoding in process mining pipelines and sheds light on issues, concerns, and future research directions regarding the use of encoding methods to bridge the gap between machine learning models and process mining.
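
To make the idea of "transforming complex information into a numerical feature space" concrete, the following is a minimal sketch of one simple encoding: counting how often each activity occurs in a trace. It is an illustration only and is not claimed to be one of the 27 methods compared in the survey; the function name `count_encode` and the example activity labels are assumptions introduced here.

```python
from collections import Counter


def count_encode(traces):
    """Encode each trace as a vector of activity frequencies.

    `traces` is a list of traces, each a list of activity labels.
    Returns the vocabulary (column order) and one vector per trace.
    """
    # A fixed, sorted vocabulary so every trace maps to the same feature columns.
    vocabulary = sorted({activity for trace in traces for activity in trace})
    vectors = []
    for trace in traces:
        counts = Counter(trace)
        # Counter returns 0 for activities that do not occur in this trace.
        vectors.append([counts[activity] for activity in vocabulary])
    return vocabulary, vectors


# Hypothetical toy event log with two traces.
traces = [
    ["register", "check", "approve"],
    ["register", "check", "check", "reject"],
]
vocabulary, vectors = count_encode(traces)
```

A frequency encoding like this discards the order of activities, which is one reason the choice of encoding and its hyperparameters can affect downstream tasks such as clustering or prediction.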


