The Long and the Short of It: Summarising Event Sequences with Serial Episodes

02/07/2019
by Nikolaj Tatti, et al.

An ideal outcome of pattern mining is a small set of informative patterns, containing no redundancy or noise, that identifies the key structure of the data at hand. Standard frequent pattern miners do not achieve this goal: because of the pattern explosion, they typically return very large numbers of highly redundant patterns. We pursue the ideal for sequential data by employing a pattern set mining approach, that is, an approach where, instead of ranking patterns individually, we consider results as a whole. Pattern set mining has been successfully applied to transactional data, but has been surprisingly understudied for sequential data. In this paper, we employ the MDL principle to identify the set of sequential patterns that summarises the data best. In particular, we formalise how to encode sequential data using sets of serial episodes, and use the encoded length as a quality score. As search strategies, we propose two approaches: the first algorithm selects a good pattern set from a large candidate set, while the second is a parameter-free any-time algorithm that mines pattern sets directly from the data. Experimentation on synthetic and real data demonstrates that we efficiently discover small sets of informative patterns.
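The core idea in the abstract is to score a set of serial episodes by the total encoded length, model plus data, and to prefer the set that minimises it. The Python sketch below illustrates that idea, together with the "filter a candidate set" search strategy, under simplifying assumptions that are ours and not the paper's: an episode only matches a contiguous run of events (the paper's episodes may have gaps), the sequence is covered greedily from left to right, code lengths are Shannon-optimal lengths derived from usage counts, and a pattern's model cost is simply its events spelled out with uniform codes. The function names `cover`, `data_length`, `total_length` and `select` are hypothetical.

```python
import math
from collections import Counter


def cover(seq, patterns):
    """Greedily cover `seq` left to right with the given patterns.

    Simplification: a pattern only matches a contiguous run of events
    (no gaps). Positions that no pattern matches are covered by singletons.
    Returns a Counter mapping each used pattern (a tuple) to its usage.
    """
    ordered = sorted(patterns, key=len, reverse=True)  # prefer longer patterns
    usage = Counter()
    i = 0
    while i < len(seq):
        for p in ordered:
            if tuple(seq[i:i + len(p)]) == p:
                usage[p] += 1
                i += len(p)
                break
        else:  # no pattern matched at position i
            usage[(seq[i],)] += 1
            i += 1
    return usage


def data_length(usage):
    """L(D | M): Shannon-optimal code lengths, -log2(u / total) bits per use."""
    total = sum(usage.values())
    return sum(-u * math.log2(u / total) for u in usage.values() if u > 0)


def total_length(seq, patterns, alphabet):
    """L(M) + L(D | M) under this toy encoding (hypothetical, not the paper's)."""
    # Toy model cost: spell out each pattern's events with uniform codes.
    model_cost = sum(len(p) * math.log2(len(alphabet)) for p in patterns)
    return model_cost + data_length(cover(seq, patterns))


def select(seq, candidates):
    """Keep a candidate only if adding it lowers the total encoded length."""
    alphabet = set(seq)
    chosen = []
    best = total_length(seq, chosen, alphabet)
    for c in sorted(candidates, key=len, reverse=True):  # heuristic order
        score = total_length(seq, chosen + [c], alphabet)
        if score < best:
            chosen, best = chosen + [c], score
    return chosen, best


seq = list("abcxabcyabcz")
chosen, bits = select(seq, [("a", "b", "c"), ("x", "y")])
print(chosen, round(bits, 2))  # prints [('a', 'b', 'c')] 18.51
```

On the toy sequence `abcxabcyabcz`, adding `abc` shrinks the total encoding from about 28.75 to about 18.51 bits, so the filter keeps it. The candidate `xy` never occurs, so it only adds model cost and is rejected. This mirrors the MDL trade-off described above; it is an illustration, not the authors' encoding or algorithm.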

research
06/04/2009

Mining Compressed Repetitive Gapped Sequential Patterns Efficiently

Mining frequent sequential patterns from sequence databases has been a c...
research
12/22/2015

Keeping it Short and Simple: Summarising Complex Event Sequences with Multivariate Patterns

We study how to obtain concise descriptions of discrete multivariate seq...
research
01/27/2017

Efficiently Summarising Event Sequences with Rich Interleaving Patterns

Discovering the key structure of a database is one of the main goals of ...
research
12/12/2017

Mining Non-Redundant Sets of Generalizing Patterns from Sequence Databases

Sequential pattern mining techniques extract patterns corresponding to f...
research
02/04/2019

Ranking Episodes using a Partition Model

One of the biggest setbacks in traditional frequent pattern mining is th...
research
02/18/2019

Finding Robust Itemsets Under Subsampling

Mining frequent patterns is plagued by the problem of pattern explosion ...
research
10/29/2020

Supervised sequential pattern mining of event sequences in sport to identify important patterns of play: an application to rugby union

Given a set of sequences comprised of time-ordered events, sequential pa...
