Ranking Episodes using a Partition Model

by Nikolaj Tatti, et al.

One of the biggest setbacks in traditional frequent pattern mining is that overwhelmingly many of the discovered patterns are redundant. A prototypical example of such redundancy is a free-rider pattern, where the pattern contains a true pattern and some additional noise events. A technique for filtering free-rider patterns that has proved to be efficient in ranking itemsets is to use a partition model, where a pattern is divided into two subpatterns and the observed support is compared to the expected support under the assumption that these two subpatterns occur independently. In this paper we develop a partition model for episodes, patterns discovered from sequential data. An episode is essentially a set of events, with possible restrictions on the order of events. Unlike with itemset mining, computing the expected support of an episode requires surprisingly sophisticated methods. In order to construct the model, we partition the episode into two subepisodes. We then model how likely the events in each subepisode are to occur close to each other. If this probability is high (which is often the case if the subepisode has a high support), then we can expect that when one event from a subepisode occurs, the remaining events also occur close by. This approach increases the expected support of the episode, and if this increase explains the observed support, then we can deem the episode uninteresting. We demonstrate in our experiments that using the partition model can effectively and efficiently reduce the redundancy in episodes.
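
As a rough illustration of the itemset version of the partition model mentioned above, the following Python sketch compares the observed support of an itemset with the expected support under every two-way split, assuming the two parts occur independently. It is not the episode model developed in the paper, which the abstract says requires more sophisticated methods. The function names, the toy transactions, and the choice to score against the split with the largest expected support are all assumptions made for this example.

```python
from itertools import combinations


def support(transactions, items):
    """Count the transactions that contain every item in `items`."""
    items = frozenset(items)
    return sum(1 for t in transactions if items <= t)


def partition_score(transactions, pattern):
    """Return (observed, best_expected, ratio) for an itemset.

    Assumption for this sketch: the pattern is scored against the
    two-way split whose independence estimate is largest, i.e. the
    split that explains the observed support best.
    """
    pattern = sorted(pattern)
    if len(pattern) < 2:
        raise ValueError("pattern needs at least two items to be split")
    n = len(transactions)
    observed = support(transactions, pattern)
    first, rest = pattern[0], pattern[1:]
    best_expected = 0.0
    # Keep `first` in the left part so each split is enumerated once.
    for k in range(len(rest)):
        for extra in combinations(rest, k):
            left = {first, *extra}
            right = set(pattern) - left
            # Expected support if left and right occur independently.
            expected = support(transactions, left) * support(transactions, right) / n
            best_expected = max(best_expected, expected)
    ratio = observed / best_expected if best_expected > 0 else float("inf")
    return observed, best_expected, ratio


# Toy data (hypothetical): "a" and "b" co-occur, "c" is independent noise.
toy = [frozenset(s) for s in ("ab", "ab", "abc", "c", "abc", "c", "", "")]
print(partition_score(toy, "ab"))   # ratio above 1: support is not explained
print(partition_score(toy, "abc"))  # ratio of 1: "c" behaves like a free-rider
```

A ratio close to 1 means that some split already accounts for the observed support, so the pattern would rank low. A ratio well above 1 suggests the items occur together more often than independence predicts.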



Free-rider Episode Screening via Dual Partition Model

One of the drawbacks of frequent episode mining is that overwhelmingly m...

Discovering Episodes with Compact Minimal Windows

Discovering the most interesting patterns is the key problem in the fiel...

Mining Closed Episodes with Simultaneous Events

Sequential pattern discovery is a well-studied field in data mining. Epi...

The Long and the Short of It: Summarising Event Sequences with Serial Episodes

An ideal outcome of pattern mining is a small set of informative pattern...

Semantics of negative sequential patterns

In the field of pattern mining, a negative sequential pattern is specifi...

Using Background Knowledge to Rank Itemsets

Assessing the quality of discovered results is an important open problem...

Finding Robust Itemsets Under Subsampling

Mining frequent patterns is plagued by the problem of pattern explosion ...
