Significance of Episodes Based on Minimal Windows

02/07/2019
by Nikolaj Tatti, et al.

Discovering episodes, frequent sets of events in a sequence, has been an active field in pattern mining. Traditionally, a level-wise approach is used to discover all frequent episodes. While this technique is computationally feasible, it may result in a vast number of patterns, especially when low thresholds are used. In this paper we propose a new quality measure for episodes. We say that an episode is significant if the average length of its minimal windows deviates greatly from the expected length according to the independence model. We can apply this measure as a post-pruning step to test whether the discovered frequent episodes are truly interesting and consequently to reduce the size of the output. As our main contribution, we introduce a technique that allows us to compute the distribution of lengths of minimal windows under the independence model. This computation is surprisingly complex; to solve it, we compute the distribution iteratively, starting from simple episodes and progressively moving towards more complex ones. In our experiments we discover candidate episodes that have a sufficient number of minimal windows and test each candidate for significance. The experimental results demonstrate that our approach finds significant episodes while ignoring uninteresting ones.
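
To make the notion of a minimal window concrete, the following is a minimal sketch rather than the paper's algorithm. It assumes a serial episode (a fixed order of events), whereas the paper treats episodes more generally, and it assumes a window's length is the number of positions it spans. The function names `minimal_windows` and `mean_window_length` are hypothetical. The sketch only computes the observed average length; the expected length under the independence model, which is the paper's main contribution, is not reproduced here.

```python
from typing import List, Optional, Sequence, Tuple


def minimal_windows(seq: Sequence[str], episode: Sequence[str]) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) index pairs of all minimal windows.

    A window is minimal if it contains the serial episode as a subsequence
    and no proper sub-window does. Assumption: serial episodes only.
    """
    windows: List[Tuple[int, int]] = []
    n, k = len(seq), len(episode)
    if k == 0:
        return windows
    for start in range(n):
        if seq[start] != episode[0]:
            continue
        # Forward scan: earliest end such that seq[start..end] covers the episode.
        pos, end = 0, None
        for j in range(start, n):
            if seq[j] == episode[pos]:
                pos += 1
                if pos == k:
                    end = j
                    break
        if end is None:
            break  # no occurrence from this start, so none from later starts
        # Backward scan: latest begin such that seq[begin..end] still covers it.
        pos, begin = k - 1, end
        for i in range(end, start - 1, -1):
            if seq[i] == episode[pos]:
                pos -= 1
                if pos < 0:
                    begin = i
                    break
        # Several starts can shrink to the same window; such repeats are adjacent.
        if not windows or windows[-1] != (begin, end):
            windows.append((begin, end))
    return windows


def mean_window_length(windows: List[Tuple[int, int]]) -> Optional[float]:
    """Average number of positions spanned, or None if there are no windows."""
    if not windows:
        return None
    return sum(end - start + 1 for start, end in windows) / len(windows)


events = ["a", "x", "b", "a", "b", "c"]
found = minimal_windows(events, ["a", "b"])
print(found, mean_window_length(found))
```

On this toy sequence the episode a → b has two minimal windows, one spanning three positions and one spanning two. A significance test in the spirit of the abstract would compare the resulting average against the value expected if events occurred independently.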


