Selecting Optimal Trace Clustering Pipelines with AutoML

09/01/2021 ∙ by Sylvio Barbon Jr., et al. ∙ Khalifa University ∙ State University of Londrina ∙ Università degli Studi di Milano

Trace clustering has been extensively used to preprocess event logs. By grouping similar behavior, these techniques guide the identification of sub-logs, producing more understandable models and conformance analytics. Nevertheless, little attention has been paid to the relationship between event log properties and clustering quality. In this work, we propose an Automatic Machine Learning (AutoML) framework to recommend the most suitable pipeline for trace clustering given an event log, which encompasses the encoding method, clustering algorithm, and its hyperparameters. Our experiments were conducted using a thousand event logs, four encoding techniques, and three clustering methods. Results indicate that our framework sheds light on the trace clustering problem and can assist users in choosing the best pipeline for their scenario.




1 Introduction

The execution of a business process leaves trails of the accomplished activities, performances achieved, and resources consumed. This information is stored in event logs, which record the history of the process. Executions generating the same sequence of activities are observed as the same trace by Process Mining (PM) algorithms, which can group multiple executions in a single representation. Often, however, the variability of traces is remarkable, and traces by themselves do not offer a helpful representation of the process. This variability causes problems for existing PM techniques. For instance, business processes with high trace variability generate spaghetti-like models, i.e., complex models with an enormous number of relations, often unreadable for the final user [2].

Trace clustering techniques have been adopted to solve this issue by identifying sub-logs grouped by trace similarity. By detecting groups with homogeneous behavior, process discovery techniques can be executed on these sub-logs, producing higher-quality models that are more accessible to stakeholders [15]. Trace clustering has also been studied in the context of explainability for PM [20] and, more recently, adapted to incorporate expert knowledge [19]. However, selecting the appropriate clustering technique is not simple. Many transformation methods have been presented, treating traces as vectors generated from bags of activities [12], edit distance [5], dependency spaces [13], discriminant rules [16, 25], or log footprints [20]. The set of clustering algorithms applied is also ample, e.g., k-means [16], hierarchical clustering, spectral clustering [13], and constrained clustering [19], among others. Given this large set of options to set up a clustering pipeline, a non-expert user is likely to feel overwhelmed.

Considering the challenge of identifying the correct encoding method, clustering algorithm, and hyperparameters for a specific log, we propose an AutoML framework based on Meta-learning (MtL). Our framework recommends the trace clustering pipeline that best fits a specific event log. MtL is a learning process applied to meta-data representing other learning processes and has been used successfully to emulate experts' recommendations, maximize performance, and improve quality metrics [17]. In this work, the meta-data consist of a large set of event log features that are provided as input to the MtL workflow, which outputs trace clustering pipelines described by an encoding technique, a clustering algorithm, and hyperparameters. In our scenario, MtL serves as an AutoML approach, as it removes the need for expert interaction. The relationship between event log features and the quality of PM techniques has already been pointed out in the literature [4, 3]. In this work, we introduce a general framework for studying this relationship for the trace clustering task using MtL. Moreover, we instantiate this framework to provide an example of its functionality. In particular, in our experiments we apply the method to a set of 1091 event logs described by 93 log features, four encoding techniques (one-hot, position profiles, bi-gram, and tri-gram), and three clustering algorithms (k-means, dbscan, and agglomerative). Results show that our approach achieves F1 scores of 0.77 and 0.61 for recommending encoding and clustering techniques, respectively. We also provide a comparison with two baselines, highlighting the improvement supported by the MtL strategy. The same framework could also be applied to other techniques to further investigate the domain.

The remainder of this paper is organized as follows. Section 2 gives a historical overview of trace clustering solutions, focusing on the employed transformation and clustering methods. Section 3 defines the problem and its configuration steps, while Section 4 presents our proposed framework to solve the trace clustering recommendation issue. Section 5 presents the material used for experiments, the techniques, and quality metrics adopted. Section 6 shows the results and raises a discussion around them. Section 7 concludes the paper.

2 Related Work

Trace clustering research is deeply connected to the variant analysis problem, that is, detecting groups of similar behavior within a single business process [20]. As stated by Koninck et al. [19], clustering traces means partitioning an event log into groups of comparable traces such that each trace is assigned to a unique group, named a cluster. Since its initial adoption, trace clustering has been proposed as an instrument to reduce variability. Discovering process models from clusters, for example, generally improves quality [15]. An early work in the area, presented by Greco et al. [16], uses a set of n-grams to encode a trace activity sequence, thus transforming traces into feature vectors that are input to clustering techniques. Song et al. [25] went further by defining multiple encoding procedures, named profiles, to represent traces as vectors. Furthermore, the authors call attention to the modularity between the profiling and clustering steps. Bose and van der Aalst [5] represent traces as strings and apply edit distance to measure trace similarity. Delias et al. [13] proposed a measure to calculate trace distance based on dependency. Following a similar line, Appice and Malerba [1] developed a co-training strategy to cluster traces based on multiple perspectives. The clusters are created using similarity measures based on multiple dimensions, namely activity, resource, sequence, and time difference. However, approaches based on instance-level similarity may be applicable only to particular domains, depending on how the similarity is extracted. Thaler et al. [26] highlight that bags of activities may lose key information regarding the execution order. Delias et al. [13] show that no single optimal similarity metric is applicable for all domains and applications. Zandkarimi et al. [30] stated that trace clustering is a context-specific task. A better clarification of the problem is achieved by Koninck et al. [20], who characterize the complexity of clustering by assessing the best event log splitting operations.

Considering the plethora of available profiling techniques and clustering algorithms, we envision two main building blocks regulating the success of clustering techniques. The first is the encoding method, which converts the trace sequences into feature vectors or computes similarity metrics. The second is the clustering technique: given the number of algorithms available, one may struggle to choose a method. The approaches currently available in the literature are strictly attached to a specific combination of encoding and clustering algorithms; hence, they do not offer a means to study the relationship between the different steps that can compose a pipeline.

3 Problem Statement

As seen in Section 2, past research has gathered heterogeneous approaches to the trace clustering problem. An expert may be able to assess business process characteristics and relate them to clustering approaches. However, given the plethora of configuration steps and parametrization, designing the appropriate trace clustering pipeline is a complex issue even for experts.

We identified in the literature three configuration steps that highly affect the clustering results: (i) trace encoding, (ii) clustering algorithm, and (iii) hyperparameters regulating the clustering algorithm. The choice at each step is critical, since slight changes deeply affect the clustering results, hindering the accessibility of solutions for non-expert users. Regarding trace encoding, Barbon et al. [4] stated that a well-performing encoding method improves a wide range of subsequent analyses without the need to tune them. In PM applications, encoding aims at transforming traces into mathematical representations, most frequently vectors, which map process instances into a feature space. The authors also showed that there is no best encoding method for every scenario in the anomaly detection task; that is, considering several quality criteria, different event logs are encoded better by different encoding techniques. A similar conclusion is reached by Thaler et al. when analyzing clustering algorithms applied to PM. The authors stated that some techniques are suitable for particular scenarios, reinforcing the argument that process characteristics may guide the choice of the appropriate clustering technique. Besides, unlike supervised approaches, unsupervised learning performance is severely affected by small changes in hyperparameters, depending heavily on user-domain knowledge [18]. This implies that the solutions proposed today are far from optimal, as they are attached to a unique set of encoding and clustering algorithms.

4 AutoML as a Solution for Trace Clustering

Trace clustering solutions must be able to adapt to domain characteristics. We therefore propose a framework grounded in AutoML that is capable of delivering suitable recommendations according to different business process behaviors. The main goal of our approach is recommending a tuple ⟨encoding, clustering, hyperparameters⟩ that maximizes quality metrics for the trace clustering problem. Fig. 1 shows an overview of the building blocks controlling the framework. First, an event log repository is created to represent different business scenarios. The Meta-feature Extraction step mines features for each event log in the repository, creating meta-features according to MtL terminology. The descriptive quality of the meta-features is an important constraint affecting the performance of the complete pipeline. Moreover, the Meta-target Definition step defines a set of encoding and clustering techniques (the latter coupled with their hyperparameters) that are assessed by quality metrics and ranked according to a ranking function. Then, the Meta-database combines the meta-features and meta-targets defined in the previous steps, creating a data set populated by meta-instances. Using the meta-database, the Meta-learning step induces a Meta-model that is then used to recommend a tuple ⟨encoding, clustering, hyperparameters⟩ for a given event log considering its meta-features. It is worth mentioning that multi-output machine learning modeling for the meta-model can improve performance by exploiting the interrelations between the steps of the pipeline. In Fig. 1, green arrows indicate the steps that are used for the creation and training of the framework, while blue arrows represent a production environment where one queries the meta-model for recommendations.

Given the adaptable setup of our framework, one can implement it using a different set of meta-features and meta-targets. The automatic aspect of this approach provides the user with recommendations based on event log behavior, considering the possible options among the configurable steps. Moreover, other aspects are adaptable, such as the adopted quality metrics and the ranking function. Nonetheless, we note that the robustness of the approach depends on the AutoML structure, which must be maintained when the framework is instantiated in real scenarios.

Figure 1: Overview of AutoML proposal for Trace Clustering.

5 Experimental Setup

In this section, we describe the experiments implemented to study a possible instance of our AutoML framework. This instance is obtained by choosing specific techniques for generating the meta-features (event log features) and the meta-targets (trace encoding and trace clustering with hyperparameters). The implementation is available for replication purposes.

5.1 Event logs and featurization

MtL benefits from using a large set of instances in the meta-database. Hence, we aim at a heterogeneous set of business process logs, representing different scenarios and behaviors. For that, we rely on the set of logs proposed by Barbon et al. [3]. These event logs were grouped to represent a plethora of business behaviors, mapping the relationship between process characteristics and quality metrics. This set contains both real and synthetic event logs. Regarding real-life data, there are six logs from past Business Process Intelligence Challenges (BPIC), plus the environmental permit, helpdesk, and sepsis logs. For synthetic data, the authors adopted 192 logs from the Process Discovery Contest (PDC) 2020, an annual event organized to evaluate the efficiency of process discovery algorithms. The PDC logs are complex given the nature of the employed behaviors, such as dependent tasks, loops, invisible and duplicate tasks, and noise. The next group of synthetic data contains 750 logs proposed in the context of online PM [8]. These logs are built to depict process drifts, i.e., behavior changes during the business process execution. For that, a model was created and perturbed by 16 change patterns, representing different changes from the original model. Moreover, the logs contain four drift types, five noise percentages, and three trace lengths. The final group of synthetic event logs was proposed for the evaluation of trace encoding techniques [4]. This set contains 140 logs generated from five process models, six anomaly types, and four frequency percentages.

The performance of the meta-model depends directly on the quality of the meta-features. Thus, the group of meta-features extracted from event logs must correctly capture the process behavior and describe it from complementary perspectives. As proposing log descriptors is out of the scope of this work, we adopted the featurization introduced in [3]. The authors presented a group of features that capture several layers of business processes, i.e., activity, trace, and log. Activity-level features are subdivided into three groups: all activities, start activities, and end activities. For each group, 12 features are extracted: the number of activities, minimum, maximum, mean, median, standard deviation, variance, the 25th and 75th percentiles, interquartile range, skewness, and kurtosis coefficients. To capture behavior at the trace level, the authors propose features for trace lengths and trace variants. The former group contains 29 attributes: minimum, maximum, mean, median, mode, standard deviation, variance, the 25th and 75th percentiles, interquartile range, geometric mean and geometric standard deviation, harmonic mean, coefficient of variation, entropy, and a histogram of 10 bins along with its skewness and kurtosis coefficients. Trace variants are captured by 11 descriptors: mean number of traces per variant, standard deviation, skewness coefficient, kurtosis coefficient, the ratio of the most common variant to the number of traces, and ratios of the top 1%, 5%, 10%, 20%, 50%, and 75% variants to the total number of traces. Log-level behavior is captured by the number of traces, the number of unique traces, their ratio, and the number of events. Finally, to describe log complexity, entropy-based measures have recently been adopted in the PM literature [2]. The entropy metrics proposed in [2] aim at discriminating between logs that are better mined by declarative or imperative algorithms. Hence, such metrics capture log structuredness and variability. The 14 entropy features we adopt are: trace, prefix, k-block difference and ratio (applied with k values of 1, 3, and 5), global block, k-nearest neighbor (applied with k values of 3, 5, and 7), Lempel-Ziv, and Kozachenko-Leonenko. Considering all groups, 93 meta-features were used to extract log characteristics.
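To make the featurization more concrete, the following minimal sketch computes a small subset of the log-level, trace-length, and trace-variant descriptors listed above. The function name `log_meta_features`, the dictionary keys, and the use of the population standard deviation are assumptions made for illustration; they are not taken from the implementation in [3].

```python
from collections import Counter
from statistics import mean, median, pstdev


def log_meta_features(traces):
    """Compute a few illustrative log descriptors (hypothetical helper).

    `traces` is a list of traces, each a sequence of activity labels.
    Only a small subset of the 93 meta-features is sketched here.
    """
    lengths = [len(t) for t in traces]
    variants = Counter(tuple(t) for t in traces)  # variant -> frequency
    n_traces = len(traces)
    return {
        # log-level descriptors
        "n_traces": n_traces,
        "n_unique_traces": len(variants),
        "ratio_unique_traces": len(variants) / n_traces,
        "n_events": sum(lengths),
        # trace-length descriptors (population std is an assumption)
        "trace_len_min": min(lengths),
        "trace_len_max": max(lengths),
        "trace_len_mean": mean(lengths),
        "trace_len_median": median(lengths),
        "trace_len_std": pstdev(lengths),
        # trace-variant descriptors
        "mean_traces_per_variant": mean(list(variants.values())),
        "ratio_most_common_variant": variants.most_common(1)[0][1] / n_traces,
    }


# Toy log with four traces and three variants.
toy_log = [["a", "b", "c"], ["a", "b", "c"], ["a", "c"], ["a", "b", "b", "c"]]
features = log_meta_features(toy_log)
```

Each event log in the repository would yield one such feature vector, which becomes the meta-feature part of a meta-instance.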

5.2 Trace encoding techniques

Many PM techniques rely on encoding to transform event log-specific representations into other formats [28, 21, 23, 10]. The transformation usually applies at the trace level, that is, converting the sequence of activities of a trace into a feature vector. In [4], the authors compared 10 different encoding techniques through the lens of quality metrics measuring data dispersion, representativeness, and compactness. These encoding methods were inspired by three different families: PM native, word embeddings, and graph embeddings. A classification task for anomaly detection was also employed to measure encoding quality. As pointed out by the authors, there is no encoding that excels in all tasks and perspectives simultaneously. For instance, graph embeddings outperform the others in the classification task and representation quality. However, these encoding methods are costly and usually sparse, meaning that there are better encoding techniques considering space and time complexity. The trace clustering literature has already experimented with several types of encoding methods. In [16] and [25], the authors adopt the one-hot encoding technique to transform traces before the clustering step. In [5], the authors employ edit distance to compute the trace distance preceding the clustering. Koninck et al. [20] used log footprints, i.e., control-flow relations depicting activity sequences. In [9], the authors apply activity profiles, bi-gram, and tri-gram as methods for trace encoding. Nonetheless, Leoni et al. [11] pointed out that no trace similarity measure is general enough to be applicable in all scenarios.

In this work, we adopt four encoding techniques that were frequently applied in the context of trace clustering. The first one is one-hot encoding. This technique encodes activities as categorical dimensions, creating a feature vector of binary values for each trace, based on the occurrence of activities in a trace. Next, we adopt n-grams, a common technique used in text mining applications. This encoding maps groups of activities of size n into a feature vector, accounting for their occurrence or not. More specifically, we apply bi-gram and tri-gram. Finally, we applied position profiles [7], an approach that relates activity frequency and position. A log profile is created by computing the activity appearances in each trace position and its respective frequency. It follows that a trace is encoded considering the frequency of its activities in their positions according to the log profile.
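The three encoding families described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: all function names are hypothetical, the n-gram vocabulary is taken from the log itself, and the position profile is normalized by the total number of traces and zero-padded to a fixed length, which are choices the text does not specify.

```python
from collections import Counter


def one_hot(trace, activities):
    """Binary vector: 1 if the activity occurs in the trace, else 0."""
    present = set(trace)
    return [1 if a in present else 0 for a in activities]


def n_gram_vocabulary(traces, n):
    """Sorted list of all n-grams observed in the log."""
    vocab = set()
    for t in traces:
        vocab.update(tuple(t[i:i + n]) for i in range(len(t) - n + 1))
    return sorted(vocab)


def n_gram(trace, grams):
    """Binary vector over a fixed, non-empty n-gram vocabulary."""
    n = len(grams[0])
    seen = {tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)}
    return [1 if g in seen else 0 for g in grams]


def position_profile(traces):
    """Relative frequency of each activity at each trace position."""
    counts = Counter((i, a) for t in traces for i, a in enumerate(t))
    return {key: c / len(traces) for key, c in counts.items()}


def encode_by_position(trace, profile, max_len):
    """Replace each activity by its log-level frequency at that position."""
    vec = [profile.get((i, a), 0.0) for i, a in enumerate(trace)]
    return vec + [0.0] * (max_len - len(vec))  # pad shorter traces with zeros


log = [["a", "b", "c"], ["a", "c"], ["a", "b", "c"]]
bigrams = n_gram_vocabulary(log, 2)          # use n=3 for tri-grams
profile = position_profile(log)

oh = one_hot(["a", "c"], ["a", "b", "c"])
bg = n_gram(["a", "c"], bigrams)
pp = encode_by_position(["a", "c"], profile, max_len=3)
```

In the toy log, activity `c` appears in the second position of only one of the three traces, so the position-profile encoding of `["a", "c"]` assigns it a lower weight than `a` in the first position.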

5.3 Trace clustering algorithms

We selected three clustering techniques commonly applied in data mining and trace clustering literature. These techniques are grounded in different heuristics, that is, each algorithm approaches the clustering problem from a unique perspective. With this, we aim at evaluating if a particular clustering structure outperforms the others.

First, we adopt the Density-based Spatial Clustering of Applications with Noise (dbscan) algorithm [14]. The dbscan method guides its clustering by the density of the feature space; hence, instances in high-density regions form a cluster, while instances in low-density regions are regarded as outliers. The main hyperparameter affecting the clustering results is eps, which regulates the maximum distance between two points for them to be considered part of the same neighborhood. We explore different configurations of the eps hyperparameter to evaluate its impact and to recommend the best configuration in the meta-model step. For that, we apply the following eps values: 0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 5, 10, 50.

Moreover, we adopt k-means [22], a clustering technique that randomly selects centroids, which are the initial cluster points, and works by iteratively optimizing the centroid positions. The optimization stops either when the centroid positions are stable or when the maximum number of iterations is reached. The k-means technique requires the expected number of clusters (k) in a given data set as a hyperparameter. We set k to the following values: 2, 3, 4, 5, 6, 7, 8, 9, 10.

Finally, the last technique is agglomerative clustering [29], a type of hierarchical clustering with a bottom-up approach. The algorithm starts by considering each point as a cluster. It then merges clusters as the hierarchy moves up, creating a tree-like structure depicting the cluster levels and merges. Cluster pairs are merged according to a linkage distance: at each step, the two clusters whose merge minimally impacts the linkage distance are merged. As with k-means, agglomerative clustering requires the number of clusters to be found, so we adopted the same range of values for the k parameter.
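The hyperparameter values listed above can be written down as a simple configuration grid. The sketch below only enumerates the candidate configurations; the names `EPS_VALUES`, `K_VALUES`, and `clustering_grid` are illustrative, and the source does not state which clustering library is used.

```python
from itertools import chain

# Values taken from the text; the variable names are illustrative.
EPS_VALUES = [0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 5, 10, 50]
K_VALUES = list(range(2, 11))  # 2, 3, ..., 10

# One (algorithm, hyperparameters) entry per candidate configuration.
clustering_grid = list(chain(
    (("dbscan", {"eps": eps}) for eps in EPS_VALUES),
    (("k-means", {"k": k}) for k in K_VALUES),
    (("agglomerative", {"k": k}) for k in K_VALUES),
))
```

If one were to run this grid with scikit-learn (an assumption, not something the text confirms), the entries would map naturally onto `DBSCAN(eps=...)`, `KMeans(n_clusters=...)`, and `AgglomerativeClustering(n_clusters=...)`.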

5.4 Ranking metrics

To complete the creation of a meta-database, meta-targets must be defined for each meta-instance. A ranking strategy is therefore required to compare both encoding and clustering techniques. The technique at the top of the ranking is the one recommended for a meta-instance, i.e., it is defined as the meta-target. As pointed out in the literature [4, 11], there is no unique solution for a problem that outperforms the others from all perspectives. Considering this hypothesis, we propose three complementary metrics to evaluate trace clustering solutions, thereby capturing different degrees of performance. Moreover, a user applying a trace clustering solution may expect to evaluate the results from several perspectives. Here, we support such a user by assessing clustering quality against a set of criteria.

Silhouette coefficient, the first metric we propose to measure performance, comes from the traditional clustering literature and is commonly applied in data mining domains [24]. The Silhouette score captures the tightness and separation of clusters, distinguishing instances that fit their cluster from those that lie between different clusters. The scores of a group of clusters can be combined to assess the relative quality of the clustering technique. Equation 1 shows how the Silhouette coefficient ($s$) is obtained for a single sample from the mean intra-cluster distance ($a$) and the mean nearest-cluster distance ($b$):

$$s = \frac{b - a}{\max(a, b)} \qquad (1)$$

The final coefficient for one clustering space is the average of Equation 1 over all samples in the feature space. The Silhouette coefficient domain is $[-1, 1]$, where $-1$ is the worst value, $0$ indicates overlapping clusters, and $1$ is the best value.
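As a worked illustration of the Silhouette computation, the sketch below uses the standard per-sample definition, (b - a) / max(a, b), with a the mean intra-cluster distance and b the mean nearest-cluster distance. The function name, the Euclidean distance, and the score of 0 for singleton clusters are assumptions of this sketch; the text does not state which distance or library the experiments use.

```python
from math import dist


def silhouette_coefficient(points, labels):
    """Mean silhouette over all samples, using Euclidean distance."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("at least two clusters are required")

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # assumed convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist(p, q) for q in other) / len(other)
            for key, other in clusters.items() if key != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = silhouette_coefficient(points, [0, 0, 1, 1])  # compact, separated
bad = silhouette_coefficient(points, [0, 1, 0, 1])   # clusters interleaved
```

The well-separated labeling yields a score close to 1, while the interleaved labeling yields a negative score, matching the interpretation of the coefficient's domain.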


To complement this evaluation with a PM-inspired metric, we propose to measure the quality of clusters with respect to trace variants. By computing the trace variant frequency in each cluster, we can evaluate whether the solution provides a clear separation of variants in the feature space. For that, we count the unique traces in each cluster and obtain the Variant score as a weighted mean. Equation 2 depicts the Variant score calculation, which takes into account the group of all clusters, the number of unique traces found in each cluster, and the total number of traces in the event log. In the ranking example of Table 1, lower Variant scores receive better ranks.


As resource consumption is an important aspect in organizations, we also consider the computational time of clustering as a metric to assess its quality. The lower the time for a particular solution, the better it is ranked in comparison to others. Given this set of metrics, i.e., Silhouette coefficient, Variant score, and computational time, a meta-target ⟨encoding, clustering, hyperparameters⟩ has to balance all metrics successfully to be considered a good set. Our approach therefore rewards techniques that excel in the three metrics, since ignoring one or more may lead to lack of tightness, improper variant identification, or high resource consumption. Hence, we propose a ranking strategy that combines all dimensions. Table 1 presents an example of this ranking strategy. Each pair of encoding technique and clustering algorithm is applied to a given event log, and the three quality metrics are measured. Next, a rank is built for each metric, i.e., the pairs of encoding and clustering are compared in each dimension. Finally, the final rank is computed as the average of the metric ranks. For example, the three pairs in Table 1 obtain final ranks of 2, 1.67, and 2.33, respectively. The solution chosen as the meta-target is the one that minimizes the final rank, in this example the second pair.

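Silhouette | Variant score | Time | Rank (Silhouette) | Rank (Variant score) | Rank (Time) | Final rank
0.9 | 0.5 | 50 | 1 | 2 | 3 | 2
0.3 | 0 | 10 | 3 | 1 | 1 | 1.67
0.8 | 0.7 | 15 | 2 | 3 | 2 | 2.33
Table 1: Example of ranking encoding and clustering pairs for a given event log. Each row corresponds to one pair. The final rank function is the average rank of each quality dimension.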
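The ranking example of Table 1 can be reproduced with a few lines of code. This is a minimal sketch: the helper `rank` and the variable names are illustrative, ties are not handled, and the direction of each metric (higher Silhouette is better, lower Variant score and time are better) is read off the ranks shown in the table.

```python
def rank(values, higher_is_better=False):
    """Rank positions (1 = best); ties are not handled in this sketch."""
    order = sorted(range(len(values)), key=lambda i: values[i],
                   reverse=higher_is_better)
    ranks = [0] * len(values)
    for position, i in enumerate(order, start=1):
        ranks[i] = position
    return ranks


# Metric values of the three encoding/clustering pairs in Table 1.
silhouette = [0.9, 0.3, 0.8]
variant_score = [0.5, 0.0, 0.7]
comp_time = [50, 10, 15]

r_s = rank(silhouette, higher_is_better=True)
r_vs = rank(variant_score)
r_t = rank(comp_time)

# Final rank: average of the three per-metric ranks.
final = [round((a + b + c) / 3, 2) for a, b, c in zip(r_s, r_vs, r_t)]

# The meta-target is the pair with the smallest final rank (0-based index).
best = min(range(len(final)), key=final.__getitem__)
```

Running the sketch selects the second pair (index 1), whose low Variant score and computational time outweigh its poor Silhouette rank.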

5.5 Meta-model

Before introducing the meta-learner used to create the meta-model, we provide some details about the type of problem faced in our AutoML framework. As presented in Section 5.4, we need to suggest both the best trace clustering algorithm and the best trace encoding technique. This implies that, given a meta-instance, the system recommends the tuple that achieves the maximum performance for the combined metrics. Most research in supervised learning proposes algorithms for single-label problems, where instances are associated with a single label from a set of disjoint labels $L$. However, in the proposed setup, we are facing a multi-output problem, where a set of labels is associated with a single instance [27]. Following the taxonomy proposed by Tsoumakas et al. [27], we adopt a problem transformation approach, which converts the data into a format that can be used in conjunction with traditional techniques. More specifically, we employed the Binary Relevance (BR) transformation approach. BR works by transforming the original data set into $|L|$ data sets $D_{\lambda_j}$, $j = 1, \dots, |L|$, where $D_{\lambda_j}$ contains all instances of the original data, labeled according to the existence or not of $\lambda_j$. Thus, BR learns $|L|$ binary classifiers, one for each label $\lambda_j \in L$. Given a new instance, BR provides the union of the labels predicted by the classifiers.
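The Binary Relevance transformation described above can be sketched as follows. To keep the example self-contained, a tiny 1-nearest-neighbor classifier stands in for the base learner; this is an assumption of the sketch, since the meta-learner used in this work is a Random Forest. All function and class names, as well as the toy meta-features, are hypothetical.

```python
def binary_relevance_datasets(X, Y, labels):
    """Build one binary data set per label: 1 if the label is present."""
    return {lab: (X, [1 if lab in y else 0 for y in Y]) for lab in labels}


class NearestNeighbor:
    """Tiny 1-NN base learner, used only to keep the sketch runnable."""

    def fit(self, X, y):
        self.X, self.y = X, y
        return self

    def predict_one(self, x):
        d = [sum((a - b) ** 2 for a, b in zip(x, row)) for row in self.X]
        return self.y[d.index(min(d))]


def fit_br(X, Y, labels):
    """Train one binary classifier per label."""
    datasets = binary_relevance_datasets(X, Y, labels)
    return {lab: NearestNeighbor().fit(*data) for lab, data in datasets.items()}


def predict_br(models, x):
    """Union of the labels whose binary classifier predicts 1."""
    return {lab for lab, m in models.items() if m.predict_one(x) == 1}


# Toy meta-database: two meta-features per log, label sets as meta-targets.
X = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0]]
Y = [{"one-hot", "agglomerative"}, {"one-hot", "agglomerative"},
     {"position profile", "dbscan"}, {"position profile", "dbscan"}]
labels = ["one-hot", "position profile", "agglomerative", "dbscan"]

models = fit_br(X, Y, labels)
recommendation = predict_br(models, [0.05, 0.0])
```

Because each label is learned independently, nothing in this basic form prevents the union from containing, for example, two encodings; a practical system would need a rule for resolving such cases.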

Regarding the meta-learner, we applied the Random Forest (RF) algorithm due to its robustness, as it is less prone to overfitting. RF builds a collection of decision trees using bagging, i.e., each tree is trained on a bootstrap sample of the instances, and considers a random subset of features at each split. Our meta-model thus combines RF with the BR approach. Moreover, we applied a simple hyperparameter tuning technique to improve performance in the recommendation task. For that, we divided the meta-database into three sets: train, validation, and test, containing 80%, 10%, and 10% of the total number of meta-instances, respectively. The grid search strategy was used for tuning. This method exhaustively evaluates all combinations of the chosen hyperparameters and uses cross-validation splitting to capture an average performance. The results reported in Section 6 were obtained by applying the tuned meta-model to the test set. The hyperparameters tuned were: (i) the number of trees composing the forest, (ii) the criterion measuring split quality, (iii) the minimum number of samples required for a node split, (iv) the minimum number of samples required at a leaf node, and (v) the number of features considered for a split.

6 Results and Discussion

In this section, we present and discuss the main experimental results regarding the proposed strategy to recommend a trace clustering pipeline based on AutoML. We started by exploring the meta-database composition by observing the encoding techniques and clustering algorithms chosen by their performance and balancing. Next, an overall analysis, including the comparison of the proposed strategy with the baselines (random and majority), is introduced, while a detailed assessment of meta-features is presented in the last part.

6.1 Meta-Learning exploratory analysis

The results for all algorithms considered when building the meta-database, including the metrics used for ranking the meta-targets, are presented in Fig. 2. The heat-map plots show the ranking of the Silhouette, Variant score, and Time metrics for encoding (Fig. 2(a)) and clustering (Fig. 2(b)), used to sort and identify promising algorithms as meta-targets. Each ranking varies from 1 to 81, in which 1 is the best-ranked algorithm for a given metric.

Figure 2: Ranking of encoding (a) and clustering (b) to identify the meta-target. Color variation represents the variation of ranking position.

Observing the encoding techniques (Fig. 2(a)), it is possible to identify a large discrepancy between them when evaluated by Silhouette, revealing the superiority of the one-hot and position profile encodings. Variant score and Time do not vary as prominently as Silhouette, leading to closer ranking positions. These results support the hypothesis of the "no free lunch theorem", since the balanced rankings show that no technique is best for all quality criteria at once. However, when observing the clustering algorithms (Fig. 2(b)), it is possible to note a balance regarding Silhouette, whereas Variant score and Time reveal discrepancies. The first one, Variant score, exposes the importance of hyperparameter definition, since agglomerative and k-means ranged throughout the rankings when their hyperparameter k was changed. Moreover, the Time metric delivered an important perspective, in which each clustering algorithm is recognizable regardless of its hyperparameters. In particular, agglomerative and dbscan were superior to k-means. As a result, k-means was never selected as a clustering meta-target.

The meta-database was built using the combination of the top-ranked algorithms for each meta-instance (event log). This combination leads to an imbalanced multi-output data set, which had to be handled to support the induction of the meta-model. This imbalanced scenario can be seen in Fig. 3, where the combination of one-hot encoding with agglomerative clustering using k = 10 represented 469 meta-instances. The second most frequent combination was position profile with agglomerative clustering using k = 10, reaching 171 meta-instances. The third was one-hot with dbscan using eps equal to 0.001, in 125 meta-instances. Fig. 3 represents the one-hot combinations in blue, position profile in pink, bi-gram in brown, and tri-gram in gray. The domination of one-hot, followed by position profile and bi-gram, is evident. Tri-gram, combined with dbscan, was the best option for only four meta-instances.

Figure 3: Combinations of encoding techniques and clustering algorithms. Each link represents a meta-instance for which the linked algorithms are the best fit; links are colored by encoding.

When evaluating from a clustering perspective (Fig. 4), we observe a balance between dbscan with a wide range of eps values and agglomerative using k equal to 10. Other values of k for agglomerative were the best fit for few meta-instances. Conversely, dbscan demonstrates the necessity of hyperparameter adjustment, since different values of eps matched particular meta-instances.

Figure 4: Combinations of encoding techniques and clustering algorithms. Each link represents a meta-instance for which the linked algorithms performed best; links are colored by clustering algorithm.

The imbalance issue was addressed by removing the minority class combinations, that is, pairs of encoding techniques and clustering algorithms that appear as a meta-target for fewer than five meta-instances. The final meta-database was composed of 1036 samples, with fifteen different combinations of one-hot, position profile, and bi-gram with agglomerative (k in {8, 9, 10}) and dbscan (eps in {0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1}).
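The filtering step described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `drop_rare_targets`, the tuple layout of a meta-target, and the toy counts are hypothetical, and only the threshold of five meta-instances comes from the text.

```python
from collections import Counter


def drop_rare_targets(meta_targets, min_count=5):
    """Return the indices of meta-instances whose target combination
    occurs at least `min_count` times in the meta-database."""
    counts = Counter(meta_targets)
    return [i for i, target in enumerate(meta_targets) if counts[target] >= min_count]


# Toy meta-targets (hypothetical): one (encoding, clustering, hyperparameter)
# tuple per event log.
meta_targets = (
    [("one-hot", "agglomerative", 10)] * 6
    + [("position profile", "agglomerative", 10)] * 5
    + [("tri-gram", "dbscan", 0.5)] * 4  # fewer than five, so it is removed
)

kept = drop_rare_targets(meta_targets)
kept_targets = {meta_targets[i] for i in kept}
print(len(kept), "meta-instances kept,", len(kept_targets), "combinations remain")
```

Returning indices rather than the filtered labels makes it straightforward to apply the same selection to the matching rows of the meta-feature matrix.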

6.2 Meta-Model performance

Using RF as our meta-model built over the meta-database, we analyzed the performance for both encoding and clustering algorithm recommendations. It is worth mentioning that our problem was modeled as a multi-output problem, addressing encoding and clustering at once, taking advantage of possible inter-correlations between both steps.
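A minimal sketch of such a multi-output meta-model is given here, assuming scikit-learn's `RandomForestClassifier`, which accepts a two-column target so that both recommendations are learned jointly. The data are synthetic, and the label lists, split sizes, and number of trees are illustrative assumptions rather than the configuration used in our experiments.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in for the meta-database: 200 event logs, 5 meta-features.
X = rng.normal(size=(200, 5))

# Hypothetical integer-coded targets, one column per pipeline step.
ENCODINGS = ["one-hot", "position profile"]
CLUSTERINGS = ["agglomerative", "dbscan"]
encoding = (X[:, 0] > 0).astype(int)    # index into ENCODINGS
clustering = (X[:, 1] > 0).astype(int)  # index into CLUSTERINGS
Y = np.column_stack([encoding, clustering])

# A 2-D target makes the forest a multi-output classifier: every tree
# predicts the encoding and the clustering label together.
meta_model = RandomForestClassifier(n_estimators=100, random_state=0)
meta_model.fit(X[:150], Y[:150])

pred = meta_model.predict(X[150:])  # shape (50, 2)
first = pred[0]
print("recommended pipeline:", ENCODINGS[int(first[0])], "+", CLUSTERINGS[int(first[1])])
```

Training one model on both columns, instead of two independent classifiers, is what allows the splits to exploit correlations between the encoding and clustering targets.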

Our proposal obtained an F1 of () when recommending the encoding technique and an F1 of () for clustering algorithm recommendation. To put this performance in context, we compared the results with the majority classes (one-hot as encoding technique and as clustering algorithm) and with a random selection, as shown in Fig. 5. The majority baseline for encoding obtained an F1 of (). The random baseline for encoding achieved () of F1. Considering clustering, the majority baseline obtained () of F1 and random selection reached (). Regarding the mean predictive performance in terms of F1 for the whole trace clustering pipeline, our proposed AutoML approach obtained (). The results were superior to the majority and random baselines, which achieved () and (), respectively. Note that the majority results are boosted by the imbalanced scenario; for balanced meta-databases, the majority baseline tends to underperform.

Figure 5: Performance of the AutoML framework to recommend the encoding technique and clustering algorithm in terms of accuracy and F1.
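To make the two baselines concrete, the sketch below computes F1 for a majority-class predictor and a uniformly random predictor on a toy, imbalanced label set. The averaging scheme is an assumption: this section does not state which F1 average was used, so a support-weighted F1 is shown purely for illustration, and the toy labels do not reproduce the reported results.

```python
import random
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted F1: per-class F1 averaged with weights proportional
    to how often each class occurs in y_true."""
    support = Counter(y_true)
    total = 0.0
    for label, n in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = n - tp
        total += n * (2 * tp / (2 * tp + fp + fn))
    return total / len(y_true)


# Toy, imbalanced ground truth (hypothetical best encoding per event log).
y_true = ["one-hot"] * 7 + ["position profile"] * 2 + ["bi-gram"]

# Majority baseline: always recommend the most frequent class.
majority_label = Counter(y_true).most_common(1)[0][0]
majority_pred = [majority_label] * len(y_true)

# Random baseline: recommend a class uniformly at random.
rng = random.Random(0)
classes = sorted(set(y_true))
random_pred = [rng.choice(classes) for _ in y_true]

majority_f1 = weighted_f1(y_true, majority_pred)
random_f1 = weighted_f1(y_true, random_pred)
print(f"majority F1 = {majority_f1:.3f}, random F1 = {random_f1:.3f}")
```

On this toy set the majority baseline benefits directly from the 70% share of one-hot, which illustrates why its score is expected to drop on a balanced meta-database.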

6.3 Meta-features relevance

We interpreted the outputs of our meta-model when predicting encoding (Fig. 6) and clustering (Fig. 7) by taking the average absolute value of the Shapley Additive Explanation (SHAP) values. The highest relative importance for predicting encoding algorithms was obtained by the number of events (), the maximum number of activities (), and the entropy of trace length (). Similarly, the top three meta-features in terms of importance were , and when predicting clustering. It is important to mention that the number of events represented more than half of the importance among all meta-features for both encoding and clustering. This meta-feature indirectly indicates log complexity: the higher the number of events, the more heterogeneous behavior might appear, even more so considering that many of the logs come from complex models and include anomalies. Thus, the number of events becomes an important discriminator for encoding and clustering performance. These results highlight relevant directions for future research in feature extraction in PM.

Figure 6: The relative importance for each feature, obtained by taking the average absolute value of the SHAP values when recommending encoding algorithms.
Figure 7: The relative importance for each feature, obtained by taking the average absolute value of the SHAP values when recommending clustering algorithms.
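The relative importance plotted in Figs. 6 and 7 can be sketched as follows, assuming a matrix of SHAP values has already been computed for the meta-model (one row per event log, one column per meta-feature). The function name, the meta-feature names, and the numbers are hypothetical and serve only to show the aggregation.

```python
import numpy as np


def relative_importance(shap_values, feature_names):
    """Average the absolute SHAP values per feature and normalise them so the
    shares sum to 1; return (name, share) pairs sorted by decreasing share."""
    mean_abs = np.abs(shap_values).mean(axis=0)
    share = mean_abs / mean_abs.sum()
    order = np.argsort(share)[::-1]
    return [(feature_names[i], float(share[i])) for i in order]


# Hypothetical SHAP matrix: rows are event logs, columns are meta-features.
shap_values = np.array([
    [0.6, -0.2, 0.1],
    [-0.8, 0.1, -0.1],
    [0.4, -0.3, 0.1],
])
names = ["n_events", "max_activities", "trace_len_entropy"]

ranking = relative_importance(shap_values, names)
for name, share in ranking:
    print(f"{name}: {share:.2f}")
```

Taking the absolute value before averaging matters because positive and negative contributions of the same feature would otherwise cancel out.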

7 Conclusion

In this paper, we proposed an AutoML framework to recommend the best pipeline for trace clustering based on a specific event log. To that end, we extracted meta-features to describe event logs and matched them with the best clustering pipeline by assessing three complementary metrics (Silhouette, Variant score, and Time). The framework recommends a tuple (encoding, clustering, hyperparameters), making trace clustering solutions accessible to non-expert users. Results have shown that the framework outperforms baseline approaches. We have also provided a discussion of meta-feature influence on the decision process using SHAP values. In future research, we aim to extend the experimental evaluation to gather further insights into the relationship between trace clustering quality and event log behavior.


  • [1] A. Appice and D. Malerba (2016-11) A co-training strategy for multiple view clustering in process mining. IEEE Transactions on Services Computing 9 (6), pp. 832–845.
  • [2] C. O. Back, S. Debois, and T. Slaats (2019-06) Entropy as a measure of log variability. Journal on Data Semantics 8 (2), pp. 129–156.
  • [3] S. Barbon Jr., P. Ceravolo, E. Damiani, and G. M. Tavares (2021) Using meta-learning to recommend process discovery methods. arXiv:2103.12874.
  • [4] S. Barbon Junior, P. Ceravolo, E. Damiani, and G. Marques Tavares (2021) Evaluating trace encoding methods in process mining. In From Data to Models and Back, J. Bowles, G. Broccia, and M. Nanni (Eds.), Cham, pp. 174–189. ISBN 978-3-030-70650-0.
  • [5] R. P. J. C. Bose and W. M.P. van der Aalst (2009-04) Context aware trace clustering: towards improving process mining results. In Proceedings of the 2009 SIAM International Conference on Data Mining.
  • [6] L. Breiman (2001) Random forests. Machine learning 45 (1), pp. 5–32.
  • [7] P. Ceravolo, E. Damiani, M. Torabi, and S. Barbon (2017) Toward a new generation of log pre-processing methods for process mining. In Business Process Management Forum, J. Carmona, G. Engels, and A. Kumar (Eds.), Cham, pp. 55–70. ISBN 978-3-319-65015-9.
  • [8] P. Ceravolo, G. M. Tavares, S. Barbon Jr., and E. Damiani (2020) Evaluation goals for online process mining: a concept drift perspective. IEEE Transactions on Services Computing, pp. 1–1.
  • [9] P. De Koninck and J. De Weerdt (2019) Scalable mixed-paradigm trace clustering using super-instances. In 2019 International Conference on Process Mining (ICPM), pp. 17–24.
  • [10] P. De Koninck, S. vanden Broucke, and J. De Weerdt (2018) Act2vec, trace2vec, log2vec, and model2vec: representation learning for business processes. In Business Process Management, Cham, pp. 305–321. ISBN 978-3-319-98648-7.
  • [11] M. de Leoni, W. M.P. van der Aalst, and M. Dees (2016) A general process mining framework for correlating, predicting and clustering dynamic behavior based on event logs. Information Systems 56, pp. 235–257. ISSN 0306-4379.
  • [12] A. K. A. de Medeiros, A. Guzzo, G. Greco, W. M. P. van der Aalst, A. J. M. M. Weijters, B. F. van Dongen, and D. Saccà (2008) Process mining based on clustering: a quest for precision. In Business Process Management Workshops, A. ter Hofstede, B. Benatallah, and H. Paik (Eds.), Berlin, Heidelberg, pp. 17–29. ISBN 978-3-540-78238-4.
  • [13] P. Delias, M. Doumpos, E. Grigoroudis, P. Manolitzas, and N. Matsatsinis (2015) Supporting healthcare management decisions via robust clustering of event logs. Knowledge-Based Systems 84, pp. 203–213. ISSN 0950-7051.
  • [14] M. Ester, H. Kriegel, J. Sander, and X. Xu (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, KDD’96, pp. 226–231.
  • [15] M. Fani Sani, M. Boltenhagen, and W. van der Aalst (2020) Prototype selection using clustering and conformance metrics for process discovery. In Business Process Management Workshops, A. Del Río Ortega, H. Leopold, and F. M. Santoro (Eds.), Cham, pp. 281–294. ISBN 978-3-030-66498-5.
  • [16] G. Greco, A. Guzzo, L. Pontieri, and D. Sacca (2006) Discovering expressive process models by clustering log traces. IEEE Transactions on Knowledge and Data Engineering 18 (8), pp. 1010–1027.
  • [17] X. He, K. Zhao, and X. Chu (2021) AutoML: a survey of the state-of-the-art. Knowledge-Based Systems 212, pp. 106622. ISSN 0950-7051.
  • [18] J. Hou, H. Gao, and X. Li (2016) DSets-dbscan: a parameter-free clustering algorithm. IEEE Transactions on Image Processing 25 (7), pp. 3182–3193.
  • [19] P. D. Koninck, K. Nelissen, S. vanden Broucke, B. Baesens, M. Snoeck, and J. D. Weerdt (2021-03) Expert-driven trace clustering with instance-level constraints. Knowledge and Information Systems 63 (5), pp. 1197–1220.
  • [20] P. D. Koninck, J. D. Weerdt, and S. K. L. M. vanden Broucke (2016-12) Explaining clusterings of process instances. Data Mining and Knowledge Discovery 31 (3), pp. 774–808.
  • [21] A. Leontjeva, R. Conforti, C. Di Francescomarino, M. Dumas, and F. M. Maggi (2015) Complex symbolic sequence encodings for predictive monitoring of business processes. In Business Process Management, Cham, pp. 297–313. ISBN 978-3-319-23063-4.
  • [22] J. MacQueen (1967) Some methods for classification and analysis of multivariate observations. In Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, Vol. 1, pp. 281–297.
  • [23] M. Polato, A. Sperduti, A. Burattin, and M. d. Leoni (2018-09-01) Time and activity sequence prediction of business process instances. Computing 100 (9), pp. 1005–1031. ISSN 1436-5057.
  • [24] P. J. Rousseeuw (1987) Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics 20, pp. 53–65. ISSN 0377-0427.
  • [25] M. Song, C. W. Günther, and W. M. P. van der Aalst (2009) Trace clustering in process mining. In Business Process Management Workshops, D. Ardagna, M. Mecella, and J. Yang (Eds.), Berlin, Heidelberg, pp. 109–120. ISBN 978-3-642-00328-8.
  • [26] T. Thaler, S. F. Ternis, P. Fettke, and P. Loos (2015) A comparative analysis of process instance cluster techniques. Wirtschaftsinformatik 2015, pp. 423–437.
  • [27] G. Tsoumakas, I. Katakis, and I. Vlahavas (2010) Mining multi-label data. In Data Mining and Knowledge Discovery Handbook, O. Maimon and L. Rokach (Eds.), pp. 667–685. ISBN 978-0-387-09823-4.
  • [28] W. van der Aalst, T. Weijters, and L. Maruster (2004) Workflow mining: discovering process models from event logs. IEEE Transactions on Knowledge and Data Engineering 16 (9), pp. 1128–1142.
  • [29] J. H. Ward (1963-03) Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association 58 (301), pp. 236–244.
  • [30] F. Zandkarimi, J. Rehse, P. Soudmand, and H. Hoehle (2020) A generic framework for trace clustering in process mining. In 2020 2nd International Conference on Process Mining (ICPM), pp. 177–184.