1 Introduction
Process mining is a fast-growing discipline that combines knowledge and techniques from data mining, process modeling, and process model analysis [22]. Process mining techniques concern the analysis of events that are logged during process execution, where event records contain information on what was done, by whom, for whom, where, when, etc. Events are grouped into cases (process instances), e.g. per patient for a hospital log, or per insurance claim for an insurance company. Process discovery plays an important role in process mining, focusing on extracting interpretable models of processes from event logs. One of the attributes of the events is usually used as the event label, and its values become transition/activity labels in the process models generated by process discovery algorithms.
The scope of process mining has broadened in recent years from business process management to other application domains, one of them being the analysis of events of human behavior, with data originating from sensors in smart home environments [19, 21, 20]. Table 1 shows an example of such an event log. Events in the event log are generated by e.g. motion sensors placed in the home, power sensors placed on appliances, open/close sensors placed on closets and cabinets, etc. Particularly challenging in applying process mining in this application domain is the extraction of meaningful event labels that allow for the discovery of insightful process models. Simply using the sensor that generates an event (the sensor column in Table 1) as the event label has been shown to produce non-informative process models that overgeneralize the event log and allow for too much behavior [21]. Abstracting sensor-level events into events at the level of human activity (e.g. eating, sleeping, etc.), using techniques closely related to those used in the activity recognition field, helps to discover behaviorally more constrained and more insightful process models [20], but the applicability of this approach relies on the availability of a reliable diary of human behavior at the activity level, which is often simply impossible to obtain.
| Id | Timestamp           | Address        | Sensor                      | Sensor value |
|----|---------------------|----------------|-----------------------------|--------------|
| 1  | 03/11/2015 04:59:54 | Mountain Rd. 7 | Motion sensor - Bedroom     | 1            |
| 2  | 03/11/2015 06:04:36 | Mountain Rd. 7 | Motion sensor - Bedroom     | 1            |
| 3  | 03/11/2015 08:45:12 | Mountain Rd. 7 | Motion sensor - Living room | 1            |
| 4  | 03/11/2015 09:10:10 | Mountain Rd. 7 | Motion sensor - Kitchen     | 1            |
| 5  | 03/11/2015 09:12:01 | Mountain Rd. 7 | Power sensor - Water cooker | 1200         |
| 6  | 03/11/2015 09:15:45 | Mountain Rd. 7 | Power sensor - Water cooker | 0            |
| …  | 03/11/2015 …        | Mountain Rd. 7 | …                           | …            |
| 7  | 03/12/2015 01:01:23 | Mountain Rd. 7 | Motion sensor - Bedroom     | 1            |
| 8  | 03/12/2015 03:13:14 | Mountain Rd. 7 | Motion sensor - Bedroom     | 1            |
| 9  | 03/12/2015 07:24:57 | Mountain Rd. 7 | Motion sensor - Bedroom     | 1            |
| 10 | 03/12/2015 08:34:02 | Mountain Rd. 7 | Motion sensor - Bedroom     | 1            |
| 11 | 03/12/2015 09:12:00 | Mountain Rd. 7 | Motion sensor - Living room | 1            |
| …  | 03/12/2015 …        | Mountain Rd. 7 | …                           | …            |
| 12 | 03/14/2015 03:41:46 | Mountain Rd. 7 | Motion sensor - Bedroom     | 1            |
| 13 | 03/14/2015 05:00:17 | Mountain Rd. 7 | Motion sensor - Bedroom     | 1            |
| 14 | 03/14/2015 08:52:32 | Mountain Rd. 7 | Motion sensor - Bedroom     | 1            |
| 15 | 03/14/2015 09:30:54 | Mountain Rd. 7 | Motion sensor - Living room | 1            |
| 16 | 03/14/2015 09:35:25 | Mountain Rd. 7 | Power sensor - TV           | 160          |
| 17 | 03/14/2015 10:27:37 | Mountain Rd. 7 | Power sensor - TV           | 0            |
| …  | 03/14/2015 …        | Mountain Rd. 7 | …                           | …            |
| …  | …                   | …              | …                           | …            |
In our earlier work [21] we showed that better process models can be discovered by taking the name of the sensor that generated the event as a starting point for the event label and then refining these labels using information on the time within the day at which the event occurred. The refinements used in [21] were based on domain knowledge, and not identified automatically from the data. In this paper, we aim at automatic generation of semantically interpretable label refinements that can be explained to the user, by basing label refinements on data attributes of events. We explore methods to bring parts of the timestamp information to the event label in an intelligent and fully automated way, with the end goal of discovering behaviorally more precise and therefore more insightful process models.
We start by introducing the basic concepts and notations used in this paper in Section 2. In Section 3, we introduce a framework for the generation of event label refinements based on the time attribute. In Section 4, we apply this framework to a real-life smart home data set and show the effect of the refined event labels on process discovery. We continue by describing related work in Section 5 and conclude in Section 6.
2 Preliminaries
In this section we introduce the notions related to event logs and relabeling functions for traces and then define the notions of refinements and abstractions. We also introduce the Petri net process model notation.
We use the usual sequence definition, and denote a sequence by listing its elements, e.g. we write $\sigma = \langle a_1, a_2, \ldots, a_n \rangle$ for a (finite) sequence of elements from some alphabet $\Sigma$, where $a_i \in \Sigma$ for any $1 \le i \le n$; $|\sigma|$ denotes the length of sequence $\sigma$; $\sigma_1 \cdot \sigma_2$ denotes the concatenation of sequences $\sigma_1$ and $\sigma_2$. A language $\mathcal{L}$ over an alphabet $\Sigma$ is a set of sequences over $\Sigma$. $\overline{\mathcal{L}}$ is the prefix closure of a language $\mathcal{L}$ (with $\epsilon \in \overline{\mathcal{L}}$).
An event is the most elementary element of an event log. Let $\mathcal{I}$ be a set of event identifiers, and $\mathcal{A} = \mathcal{A}_1 \times \ldots \times \mathcal{A}_n$ an attribute domain consisting of $n$ attributes (e.g. timestamp, resource, activity name, cost, etc.). An event is a tuple $e = (i, a_1, \ldots, a_n)$, with $i \in \mathcal{I}$ and $(a_1, \ldots, a_n) \in \mathcal{A}$. The event label of an event is its attribute tuple $(a_1, \ldots, a_n)$; $id(e)$ and $label(e)$ respectively denote the identifier and label of event $e$. The timestamp attribute of an event is denoted by $time(e)$. $\mathcal{E} = \mathcal{I} \times \mathcal{A}$ is a universe of events over $\mathcal{A}$. The rows of Table 1 are events from an event universe over the event attributes timestamp, sensor, address, and sensor value.
Events are often considered in the context of other events. We call $E \subseteq \mathcal{E}$ an event set if $E$ does not contain two events with the same event identifier. The events in Table 1 together form an event set. A trace is a finite sequence formed by the events from an event set that respects the time ordering of events, i.e. for all events $e_i, e_j$ in the trace with $i < j$, we have $time(e_i) \le time(e_j)$. We define the universe of traces over event universe $\mathcal{E}$, denoted $\mathcal{T}_{\mathcal{E}}$, as the set of all possible traces over $\mathcal{E}$. We omit $\mathcal{E}$ in $\mathcal{T}_{\mathcal{E}}$ and use the shorter notation $\mathcal{T}$ when the event universe is clear from the context.
Often it is useful to partition an event set into smaller sets in which events belong together according to some criterion. We might, for example, be interested in discovering the typical behavior within a household over the course of a day. In order to do so, we can e.g. group together events with the same address and the same day part of the timestamp, as indicated by the horizontal lines in Table 1. For each of these event sets, we can construct a trace; the timestamps define the ordering of events within the trace. For events of a trace having the same timestamp, an arbitrary ordering can be chosen within the trace.
An event partitioning function is a function $p : \mathcal{E} \to \mathcal{V}$ that defines the partitioning of an arbitrary set of events $E$ from a given event universe $\mathcal{E}$ into event sets $E_1, \ldots, E_k$, where each $E_j$ is the maximal subset of $E$ such that for any $e_1, e_2 \in E_j$, $p(e_1) = p(e_2)$; the value of $p$ shared by all the elements of $E_j$ defines the value of the trace attribute $p$. Note that multidimensional trace attributes are also possible, e.g. a combination of the name of the person performing the event activity and the date of the event, so that every trace contains the activities of one person during one day. The event sets obtained by applying an event partitioning can be transformed into traces (respecting the time ordering of events).
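As an illustrative sketch, an event partitioning function over the address and the date part of the timestamp can be applied as follows. The event tuples below are hypothetical examples in the style of Table 1, not data from the paper:

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical event tuples: (id, timestamp, address, sensor)
events = [
    (1, "03/11/2015 04:59:54", "Mountain Rd. 7", "Motion sensor - Bedroom"),
    (3, "03/11/2015 08:45:12", "Mountain Rd. 7", "Motion sensor - Living room"),
    (7, "03/12/2015 01:01:23", "Mountain Rd. 7", "Motion sensor - Bedroom"),
]

FMT = "%m/%d/%Y %H:%M:%S"

def partition_key(event):
    """Event partitioning function: the address plus the date part of the timestamp."""
    _, ts, address, _ = event
    return (address, ts.split(" ")[0])

def to_traces(events):
    """Group events by the partitioning key; order each group by timestamp."""
    groups = defaultdict(list)
    for e in events:
        groups[partition_key(e)].append(e)
    # events with equal timestamps may be ordered arbitrarily within a trace
    return {k: sorted(g, key=lambda e: datetime.strptime(e[1], FMT))
            for k, g in groups.items()}

traces = to_traces(events)
```

Each resulting group corresponds to one trace, with the shared key value acting as the trace attribute.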
An event log $L$ is a finite set of traces. $\Lambda(L)$ denotes the alphabet of event labels that occur in log $L$. The traces of a log are often transformed before further analysis: very detailed but not necessarily informative event descriptions are transformed into informative and repeatable labels. For the labels of the log in Table 1, the sensor values could be abstracted to on and off, or the labels can be redefined to a subset of the event attributes, e.g. leaving the sensor values out completely. Next to that, if the event partitioning function maps each event from Table 1 to its address and the day part of its timestamp, these attributes become the trace attribute and can safely be removed from the individual events.
After this relabeling step, some traces of the log can become identically labeled (the event id’s would still be different). The information about the number of occurrences of a sequence of labels in an event log is highly relevant for process mining, since it allows differentiating between the mainstream behavior of a process (frequently occurring behavioral patterns) and exceptional behavior.
Let $\mathcal{E}$ be an event universe. A function $l : \mathcal{E} \to \Lambda$ is an event relabeling function. A relabeling function can be used to obtain more useful event labels than the full set of event attribute values. We lift $l$ to event logs. Let $l_1 : \mathcal{E} \to \Lambda_1$ and $l_2 : \mathcal{E} \to \Lambda_2$ be event relabeling functions with $\Lambda_1$ and $\Lambda_2$ being pairwise different. Relabeling function $l_2$ is a refinement of relabeling function $l_1$, denoted by $l_2 \preceq l_1$, iff for all $e_1, e_2 \in \mathcal{E}$: $l_2(e_1) = l_2(e_2)$ implies $l_1(e_1) = l_1(e_2)$; $l_1$ is then called an abstraction of $l_2$.
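The refinement relation can be illustrated with a small sketch: a fine labeling refines a coarse one when events with equal fine labels always carry equal coarse labels. The labelings below are hypothetical examples, not taken from the paper:

```python
def is_refinement(labels_fine, labels_coarse):
    """Check that the fine labeling refines the coarse one: whenever two
    events receive the same fine label, they receive the same coarse label."""
    mapping = {}
    for event_id, fine in labels_fine.items():
        coarse = labels_coarse[event_id]
        if mapping.setdefault(fine, coarse) != coarse:
            return False
    return True

# Hypothetical labelings: the fine labels split "Bedroom door" by day part.
coarse = {1: "Bedroom door", 2: "Bedroom door", 3: "Fridge"}
fine   = {1: "Bedroom door (morning)", 2: "Bedroom door (evening)", 3: "Fridge"}
assert is_refinement(fine, coarse)      # fine refines coarse
assert not is_refinement(coarse, fine)  # the converse does not hold
```

The check mirrors the intuition that a refinement only splits label classes and never merges them.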
The goal of process discovery is to discover a process model that represents the behavior seen in an event log. A frequently used process modeling notation in the process mining field is the Petri net [16]. Petri nets are directed bipartite graphs consisting of transitions and places, connected by arcs. Transitions represent activities, while places represent the enabling conditions of transitions. Labels are assigned to transitions to indicate the type of activity that they model. A special label $\tau$ is used to represent invisible transitions, which are only used for routing purposes and are not recorded in the execution log.
A labeled Petri net is a tuple $N = (P, T, F, \Lambda, \ell)$ where $P$ is a finite set of places, $T$ is a finite set of transitions such that $P \cap T = \emptyset$, $F \subseteq (P \times T) \cup (T \times P)$ is a set of directed arcs, called the flow relation, $\Lambda$ is an alphabet of labels representing activities, with $\tau \notin \Lambda$ being a label representing invisible events, and $\ell : T \to \Lambda \cup \{\tau\}$ is a labeling function that assigns a label to each transition. For a node $n \in P \cup T$ we use $\bullet n$ and $n \bullet$ to denote the set of input and output nodes of $n$, defined as $\bullet n = \{m \mid (m,n) \in F\}$ and $n \bullet = \{m \mid (n,m) \in F\}$. An example of a Petri net can be seen in Figure 1, where circles represent places and squares represent transitions. Gray transitions with smaller width represent $\tau$ transitions.
A state of a Petri net is defined by its marking $M$, a multiset of places. A marking is graphically denoted by putting $M(p)$ tokens on each place $p \in P$. A pair $(N, M)$ is called a marked Petri net. State changes occur through transition firings. A transition $t$ is enabled (can fire) in a given marking if each input place $p \in \bullet t$ contains at least one token. Once a transition fires, one token is removed from each input place of $t$ and one token is added to each output place of $t$, leading to a new marking. An accepting Petri net is a 3-tuple $(N, M_0, \mathcal{M}_F)$ with $N$ a labeled Petri net, $M_0$ an initial marking, and $\mathcal{M}_F$ a set of final markings. Many process modeling notations, including accepting Petri nets, have formal execution semantics, and a model defines a language of accepting traces. For the Petri net in Figure 1, the language of accepting traces consists of the labeled firing sequences that lead from the initial marking to a final marking.
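The firing rule above can be sketched in a few lines. The net below is a minimal hypothetical example with a single transition, not a net from the paper:

```python
from collections import Counter

# Minimal Petri net sketch: places/transitions as strings,
# the flow relation as a set of (source, target) arcs, a marking as a multiset.
flow = {("p1", "t1"), ("t1", "p2")}

def preset(node):
    return {src for src, tgt in flow if tgt == node}

def postset(node):
    return {tgt for src, tgt in flow if src == node}

def enabled(marking, t):
    """A transition is enabled if every input place holds at least one token."""
    return all(marking[p] >= 1 for p in preset(t))

def fire(marking, t):
    """Firing consumes one token per input place and produces one per output place."""
    assert enabled(marking, t)
    new = Counter(marking)
    for p in preset(t):
        new[p] -= 1
    for p in postset(t):
        new[p] += 1
    return new

m0 = Counter({"p1": 1})  # initial marking: one token on p1
m1 = fire(m0, "t1")      # new marking: one token on p2
```

Repeatedly selecting and firing enabled transitions until a final marking is reached yields one accepting trace of the net.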
3 A Framework for Time-based Label Refinements
To generate potential label refinements for every label based on time, we take a clustering-based approach, identifying dense areas in the time space for each label. The time part of a timestamp consists of values between 00:00:00 and 23:59:59, i.e. the timestamp attribute from Table 1 with the date part removed. This time can be transformed into a real-number hour-float representation in the interval $[0, 24)$. We chose to apply soft clustering (also referred to as fuzzy clustering), which has the benefit of assigning to each data point a likelihood of belonging to each cluster. A well-known approach to soft clustering is based on the combination of the Expectation-Maximization (EM) algorithm with mixture models, which are probability distributions consisting of multiple components of the same family of probability distributions. Each component in the mixture represents one cluster, and the probability of a data point belonging to that cluster is the probability that this cluster generated that data point. The EM algorithm is used to obtain a maximum likelihood estimate of the mixture model parameters, i.e. the parameters of the probability distributions in the mixture.
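The hour-float conversion can be sketched as follows; the timestamp format matches Table 1:

```python
from datetime import datetime

def hour_float(ts, fmt="%m/%d/%Y %H:%M:%S"):
    """Map the time-of-day part of a timestamp to a real number in [0, 24)."""
    t = datetime.strptime(ts, fmt)
    return t.hour + t.minute / 60 + t.second / 3600

h = hour_float("03/11/2015 09:12:01")  # roughly 9.2
```

The date part is discarded, so events from different days with the same time of day map to the same point.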
A well-known example of a mixture model is the Gaussian Mixture Model (GMM), where the components in the mixture are normal distributions. The data space of time is, however, non-Euclidean: it has a circular nature, e.g. a time shortly before midnight is closer to a time shortly after midnight than to one several hours earlier. This circular nature of the data space introduces problems for GMMs, as shown in Figure 2. The GMM fitted to the timestamps of the sensor events consists of two components. The histogram representation of the same data shows that some events happened just after midnight, which on the clock is actually close to the events just before midnight. The GMM, however, is unaware of the circularity of the clock, resulting in a mixture model that seems inappropriate when visually compared with the histogram. The field of circular statistics (also referred to as directional statistics) concerns the analysis of such circular data spaces (cf. [14]). Here, we introduce a framework for generating refinements of event labels based on time attributes using techniques from the field of circular statistics. This framework consists of three stages:
- Data-model pre-fitting stage: A known problem with many clustering techniques is that they return clusters even when the data should not be clustered. In this stage we assess how many clusters the events of a sensor type contain.
- Data-model fitting stage: In this stage we cluster the events of a sensor type by timestamp, using a mixture consisting of components that take into account the circularity of the data.
- Data-model post-fitting stage: In this stage the quality of the label refinements is assessed from both a cluster quality perspective and a process model (event ordering statistics) perspective.
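The circularity that motivates these stages can be made concrete with a distance function on the 24-hour clock, a minimal sketch:

```python
def circular_hour_distance(a, b):
    """Distance between two hour-float values on the 24-hour circle."""
    d = abs(a - b) % 24.0
    return min(d, 24.0 - d)

# Times just before and just after midnight are close on the clock,
# although far apart as plain numbers on the real line:
assert circular_hour_distance(23.9, 0.1) < 0.21
assert abs(23.9 - 0.1) > 23.0
```

A clustering method that uses plain Euclidean distance on hour-floats would treat these two times as nearly maximally far apart.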
3.1 Data-model pre-fitting stage
We now describe a test for uniformity, a test for unimodality, and a method to select the number of clusters in the data.
3.1.1 Uniformity Check: Rao's Spacing Test
Rao's spacing test [15] tests the uniformity of the timestamps of the events from a sensor around the circular clock. The test is based on the idea that uniform circular data is distributed evenly around the circle, with successive observations separated by approximately $\frac{360°}{n}$ degrees. The null hypothesis is that the data is distributed uniformly around the circle.
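As a minimal illustration (angles in degrees; the sample values are hypothetical), the spacing statistic sums the deviations of the observed gaps from the expected gap under uniformity:

```python
def rao_spacing_statistic(angles_deg):
    """Rao's spacing statistic U for a sample of angles in degrees."""
    f = sorted(a % 360.0 for a in angles_deg)
    n = len(f)
    lam = 360.0 / n  # expected arc length between observations under uniformity
    gaps = [f[i + 1] - f[i] for i in range(n - 1)]
    gaps.append(360.0 - f[-1] + f[0])  # wrap-around gap closing the circle
    return 0.5 * sum(abs(t - lam) for t in gaps)

# Perfectly even angles give U = 0; tightly clustered angles give a large U.
even = [0.0, 90.0, 180.0, 270.0]
clustered = [0.0, 1.0, 2.0, 3.0]
```

Large values of the statistic indicate deviation from uniformity; significance is judged against tabulated critical values.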
Given $n$ successive observations $f_1 \le \ldots \le f_n$, either clockwise or counterclockwise, the test statistic $U$ for Rao's spacing test is defined as $U = \frac{1}{2} \sum_{i=1}^{n} |T_i - \lambda|$, where $\lambda = \frac{360°}{n}$, $T_i = f_{i+1} - f_i$ for $1 \le i \le n-1$, and $T_n = (360° - f_n) + f_1$.

3.1.2 Unimodality Check: Hartigan's Dip Test
Hartigan's dip test [7] tests the null hypothesis that the data follows a unimodal distribution on a circle. When the null hypothesis can be rejected, we know that the distribution of the data is at least bimodal. Hartigan's dip test measures the maximum difference between the empirical distribution function and the unimodal distribution function that minimizes that maximum difference.
3.1.3 Selecting the Number of Components: Bayesian Information Criterion
The Bayesian Information Criterion (BIC) [17] introduces a penalty for the number of model parameters into the evaluation of a mixture model. Adding a component to a mixture model increases the number of parameters of the mixture by the number of parameters of the distribution of the added component. The likelihood of the data given the model can only increase by adding extra components; adding the BIC penalty results in a trade-off between the number of components and the likelihood of the data given the mixture model. BIC is formally defined as $BIC = -2 \ln(\hat{L}) + k \ln(n)$, where $\hat{L}$ is the maximized value of the data likelihood, $n$ is the sample size, and $k$ is the number of parameters to be estimated. A lower BIC value indicates a better model. We start with one component, and iteratively increase the number of components as long as the decrease in BIC is larger than 10, which is the threshold for decisive evidence [10].
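The selection loop can be sketched as follows. The maximized log-likelihood values below are hypothetical, and the parameter count is simplified (two parameters per von Mises component, ignoring mixing weights):

```python
import math

def bic(log_likelihood, k_params, n):
    """BIC = -2 ln(L) + k ln(n); lower is better."""
    return -2.0 * log_likelihood + k_params * math.log(n)

def select_components(loglik_per_k, params_per_component, n, threshold=10.0):
    """Add components while BIC drops by more than the decisive-evidence threshold."""
    best_k = 1
    best = bic(loglik_per_k[1], params_per_component, n)
    for k in range(2, max(loglik_per_k) + 1):
        candidate = bic(loglik_per_k[k], k * params_per_component, n)
        if best - candidate > threshold:
            best_k, best = k, candidate
        else:
            break
    return best_k

# Hypothetical maximized log-likelihoods for k = 1..4 mixture components:
loglik = {1: -400.0, 2: -330.0, 3: -325.0, 4: -324.0}
k = select_components(loglik, params_per_component=2, n=200)  # selects 2
```

Here the jump from one to two components lowers BIC decisively, while further components do not, so two clusters are kept.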
3.2 Data-model fitting stage
We cluster events generated by one sensor using a mixture model consisting of components of the von Mises distribution, a circular version of the normal distribution. This technique is based on the approach of Banerjee et al. [1], who introduce a clustering method based on a mixture of von Mises-Fisher distribution components; the von Mises-Fisher distribution generalizes the von Mises distribution on the circle to higher-dimensional spheres. A probability density function for a von Mises distribution with mean direction $\mu$ and concentration parameter $\kappa$ is defined as $f(x \mid \mu, \kappa) = \frac{e^{\kappa \cos(x - \mu)}}{2 \pi I_0(\kappa)}$, where mean $\mu$ and data point $x$ are expressed in radians on the circle, such that $0 \le x < 2\pi$. $I_0(\kappa)$ represents the modified Bessel function of order 0, defined as $I_0(\kappa) = \frac{1}{2\pi} \int_0^{2\pi} e^{\kappa \cos(\theta)} \, d\theta$. As $\kappa$ approaches 0, the distribution becomes uniform around the circle. As $\kappa$ increases, the distribution concentrates around the mean and the von Mises distribution starts to approximate a normal distribution. We fit a mixture model of von Mises components using the package movMF [9] provided in R.

3.3 Data-model post-fitting stage
After fitting a mixture of von Mises distributions to the sensor events, we perform a goodness-of-fit test to check whether the data could have been generated from this distribution. We use the Watson $U^2$ statistic [25], a goodness-of-fit assessment based on hypothesis testing. The Watson $U^2$ statistic measures the discrepancy between the cumulative distribution function $F$ and the empirical distribution function $F_n$ of a sample drawn from some population, and is defined as $U^2 = n \int \left( F_n(x) - F(x) - \int \left( F_n(x) - F(x) \right) dF(x) \right)^2 dF(x)$.

Furthermore, we assess the quality of refining an event label into a new label for each cluster from a process perspective, using the label refinement evaluation method described in [21]. This method tests whether the log statistics that are used by many process discovery algorithms become significantly more deterministic by applying the label refinement.
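For a concrete sample, the statistic is usually evaluated through its standard computational form over the sorted CDF values. The sample and the (uniform) CDF below are hypothetical illustrations, not data from the case study:

```python
def watson_u2(sample, cdf):
    """Watson's U^2 via the computational form:
    U^2 = sum_i (u_(i) - (2i-1)/(2n))^2 - n*(mean(u) - 1/2)^2 + 1/(12n),
    where u_(i) are the sorted CDF values of the sample."""
    u = sorted(cdf(x) for x in sample)
    n = len(u)
    mean_u = sum(u) / n
    s = sum((u[i] - (2 * i + 1) / (2 * n)) ** 2 for i in range(n))
    return s - n * (mean_u - 0.5) ** 2 + 1.0 / (12 * n)

# Checking evenly spread hour-floats against a uniform CDF on [0, 24):
uniform_cdf = lambda x: x / 24.0
sample = [1.0, 5.0, 9.0, 13.0, 17.0, 21.0]
stat = watson_u2(sample, uniform_cdf)  # small value: good fit
```

A small value of the statistic (below the tabulated critical value) means the hypothesized distribution cannot be rejected.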
4 Case Study
We show the results of our time-based label refinement approach on the real-life smart home data set described by van Kasteren et al. [23]. The van Kasteren data set consists of 1285 events divided over fourteen different sensors. We segment the log in days from midnight to midnight to define cases. Figure 4a shows the process model discovered from this event log with the Inductive Miner infrequent [11] with 20% filtering, which discovers a process model describing the most frequent 80% of the behavior in the log. Note that this process model overgeneralizes, allowing too much behavior. At the beginning, a (possibly repeated) choice is made between five transitions. At the end of the process, the model allows any sequence over the alphabet of five activities, where each activity occurs at least once.
We illustrate our proof of concept by applying the framework to the bedroom door sensor. Rao's spacing test produces a test statistic that exceeds the critical value, indicating that we can reject the null hypothesis of a uniformly distributed set of bedroom door timestamps. Hartigan's dip test likewise results in a p-value below the significance level, indicating that we can reject the null hypothesis that there is only one cluster in the bedroom door data. Figure 3 shows the BIC values for different numbers of components in the model. The figure indicates that there are two clusters in the data, as this number of components corresponds to the lowest BIC value. Table 3 shows the mean $\mu$ and concentration $\kappa$ parameters of the two clusters found by optimizing the von Mises mixture model with the EM algorithm, where a value of 0 radians corresponds to midnight. After applying the von Mises mixture model to the bedroom door events and assigning each event to its maximum likelihood cluster, we obtain a time range of [3.08-10.44] for cluster 1 and a time range of [17.06-0.88] for cluster 2. The Watson test statistics for clusters 1 and 2 lie below the critical value, indicating that the data is likely to be generated by the two von Mises distributions found. The label refinement evaluation method [21] finds statistically significant differences between the events from the two bedroom door clusters with regard to their control-flow relations with 10 other activities in the log, indicating that the two clusters are different from a control-flow perspective. Figure 4b shows the process model discovered with the Inductive Miner infrequent with 20% filtering after applying this label refinement to the van Kasteren event log. The process model still overgeneralizes in general, but the label refinement does help restrict the behavior: it shows that the evening bedroom door events are succeeded by one or more events of type groceries cupboard, freezer, cups cupboard, fridge, plates cupboard, or pans cupboard, while the morning bedroom door events are followed by one or more frontdoor events. It seems that this person generally goes to the bedroom in between coming home from work and starting to cook.
The loop of frontdoor events could be caused by the person leaving the house in the morning for work, resulting in no logged events until the person comes home again by opening the frontdoor. Note that in Figure 4a bedroom door and frontdoor events can occur an arbitrary number of times in any order. Figure 4a furthermore does not allow the bedroom door to occur before the whole block of kitchen-located events at the beginning of the net.

Label refinements can be applied iteratively. Figure 5 shows the effect of a second label refinement step, where, using the same methodology, Plates cupboard is refined into two labels, representing time ranges [7.98-14.02] and [16.05-0.92] respectively. This refinement shows the additional insight that the evening version of Plates cupboard occurs directly before or after the microwave.
5 Related Work
Refining event labels in the event log is closely related to the task of mining process models with duplicate activities, in which the resulting process model can contain multiple transitions/nodes with the same label. From the point of view of the behavior allowed by a process model, it makes no difference whether a process model is discovered on an event log with refined labels, or whether a process model is discovered with duplicate activities such that each transition/node of the duplicate activity precisely covers one version of the refined label. The first process discovery algorithm capable of discovering duplicate tasks was proposed by Herbst and Karagiannis in 2004 [8], after which many others have been proposed, including the Genetic Miner [4], the Evolutionary Tree Miner [2], the algorithms of Li et al. [12] and Gu et al. [6], the EnhancedWFMiner [5], and a simulated annealing based algorithm [18]. An alternative approach has been proposed by Vázquez-Barreiros et al. [24], who describe a local search based approach to repair a process model to include duplicate activities, starting from an event log and a process model without duplicate activities. Existing work on mining models with duplicate activities bases the duplicate activities on how well the event log fits the process model, and does not try to find any semantic difference between the multiple versions of the activities in the form of data attribute differences.
The work that is closest to our work is the work by Lu et al. [13], who describe an approach to preprocess an event log by refining event labels with the goal of discovering a process model with duplicate activities. The method proposed by Lu et al., however, does not base the relabelings on data attributes of those events but instead bases them solely on the control flow context, leaving uncertainty whether two events relabeled differently are actually semantically different.
Another area of related work is data-aware process mining, where the aim is to discover rules with regard to data attributes of events that determine decision points in the process. De Leoni and van der Aalst [3] proposed a method that discovers data guards for decision points in the process based on alignments and decision tree learning. This approach relies on the discovery of a behaviorally well-fitting process model from the original event log. When only overgeneralizing process models (i.e. models allowing for too much behavior) can be discovered from an event log, the correct decision points might not be present in the discovered process model at all, in which case this approach cannot discover the data dependencies that are in the event log. Our label refinements use data attributes prior to process discovery to enable the discovery of behaviorally more constrained process models, by bringing parts of the event attribute space into the event label.
6 Conclusion & Future Work
We have proposed a framework based on techniques from the field of circular statistics to refine event labels automatically based on their timestamp attribute. We have shown through a proof of concept on a real-life event log that this framework can be used to discover label refinements that allow for the discovery of more insightful and behaviorally more specific process models. An interesting area of future work is to explore the use of other types of event data attributes for refining labels, e.g. the power values of sensors. A next research step would be to explore label refinements based on multiple data attributes combined; this would bring the challenge of clustering on partially circular and partially Euclidean data spaces.
References

[1] Banerjee, A., Dhillon, I. S., Ghosh, J., and Sra, S. Clustering on the unit hypersphere using von Mises-Fisher distributions. Journal of Machine Learning Research 6 (2005), 1345-1382.
[2] Buijs, J. C. A. M., van Dongen, B. F., and van der Aalst, W. M. P. On the role of fitness, precision, generalization and simplicity in process discovery. In OTM Confederated International Conferences "On the Move to Meaningful Internet Systems" (2012), Springer, pp. 305-322.
[3] de Leoni, M., and van der Aalst, W. M. P. Data-aware process mining: discovering decisions in processes using alignments. In Proceedings of the 28th Annual ACM Symposium on Applied Computing (2013), ACM, pp. 1454-1461.
[4] de Medeiros, A. K. A., Weijters, A. J. M. M., and van der Aalst, W. M. P. Genetic process mining: an experimental evaluation. Data Mining and Knowledge Discovery 14, 2 (2007), 245-304.
[5] Folino, F., Greco, G., Guzzo, A., and Pontieri, L. Discovering expressive process models from noised log data. In Proceedings of the 2009 International Database Engineering & Applications Symposium (2009), ACM, pp. 162-172.
[6] Gu, C.-Q., Chang, H.-Y., and Yi, Y. Workflow mining: extending the α-algorithm to mine duplicate tasks. In 2008 International Conference on Machine Learning and Cybernetics (2008), vol. 1, IEEE, pp. 361-368.
[7] Hartigan, J. A., and Hartigan, P. M. The dip test of unimodality. The Annals of Statistics (1985), 70-84.
[8] Herbst, J., and Karagiannis, D. Workflow mining with InWoLvE. Computers in Industry 53, 3 (2004), 245-264.
[9] Hornik, K., and Grün, B. movMF: an R package for fitting mixtures of von Mises-Fisher distributions. Journal of Statistical Software 58, 10 (2014), 1-31.
[10] Kass, R. E., and Raftery, A. E. Bayes factors. Journal of the American Statistical Association 90, 430 (1995), 773-795.
[11] Leemans, S. J. J., Fahland, D., and van der Aalst, W. M. P. Discovering block-structured process models from event logs containing infrequent behaviour. In International Conference on Business Process Management (2013), Springer, pp. 66-78.
[12] Li, J., Liu, D., and Yang, B. Process mining: extending the α-algorithm to mine duplicate tasks in process logs. In Advances in Web and Network Technologies, and Information Management. Springer, 2007, pp. 396-407.
[13] Lu, X., Fahland, D., van den Biggelaar, F. J. H. M., and van der Aalst, W. M. P. Handling duplicated tasks in process discovery by refining event labels. In International Conference on Business Process Management (2016), Springer, to appear.
[14] Mardia, K. V., and Jupp, P. E. Directional Statistics, vol. 494. John Wiley & Sons, 2009.
[15] Rao, J. S. Some tests based on arc-lengths for the circle. Sankhyā: The Indian Journal of Statistics, Series B (1976), 329-338.
[16] Reisig, W., and Rozenberg, G. Lectures on Petri Nets I: Basic Models. Advances in Petri Nets, vol. 1491. Springer Science & Business Media, 1998.
[17] Schwarz, G. Estimating the dimension of a model. The Annals of Statistics 6, 2 (1978), 461-464.
[18] Song, W., Liu, S., and Liu, Q. Business process mining based on simulated annealing. In The 9th International Conference for Young Computer Scientists (ICYCS 2008) (2008), IEEE, pp. 725-730.
[19] Sztyler, T., Völker, J., Carmona, J., Meier, O., and Stuckenschmidt, H. Discovery of personal processes from labeled sensor data: an application of process mining to personalized health care. In Proceedings of the International Workshop on Algorithms & Theories for the Analysis of Event Data (ATAED) (2015), pp. 22-23.
[20] Tax, N., Sidorova, N., Haakma, R., and van der Aalst, W. M. P. Event abstraction for process mining using supervised learning techniques. In Proceedings of the SAI Conference on Intelligent Systems (IntelliSys) (2016), IEEE, pp. 161-170.
[21] Tax, N., Sidorova, N., Haakma, R., and van der Aalst, W. M. P. Log-based evaluation of label splits for process models. Procedia Computer Science 96 (2016), 63-72.
[22] van der Aalst, W. M. P. Process Mining: Data Science in Action. Springer Science & Business Media, 2016.
[23] van Kasteren, T., Noulas, A., Englebienne, G., and Kröse, B. Accurate activity recognition in a home setting. In Proceedings of the 10th International Conference on Ubiquitous Computing (2008), ACM, pp. 1-9.
[24] Vázquez-Barreiros, B., Mucientes, M., and Lama, M. Mining duplicate tasks from discovered processes. In Proceedings of the International Workshop on Algorithms & Theories for the Analysis of Event Data (ATAED) (2015), pp. 78-82.
[25] Watson, G. S. Goodness-of-fit tests on a circle. II. Biometrika 49, 1/2 (1962), 57-63.