Finding Robust Itemsets Under Subsampling

02/18/2019
by Nikolaj Tatti, et al.

Mining frequent patterns is plagued by the problem of pattern explosion, which makes pattern reduction a key challenge in pattern mining. In this paper we propose a novel theoretical framework for pattern reduction. We do this by measuring the robustness of a property of an itemset, such as closedness or non-derivability. The robustness of a property is the probability that the property holds on random subsets of the original data. We study four properties: closed, free, non-derivable, and totally shattered itemsets, and demonstrate how to compute the robustness analytically without actually sampling the data. Our concept of robustness has many advantages: unlike statistical approaches for reducing patterns, we do not assume a null hypothesis or any noise model, and the patterns reported are simply a subset of all patterns with the property, as opposed to approximate patterns for which the property does not really hold. If the underlying property is monotonic, then the measure is also monotonic, allowing us to mine robust itemsets efficiently. We further derive a parameter-free technique for ranking itemsets that can be used in top-k approaches. Our experiments demonstrate that the robustness measure successfully reduces the number of patterns and that the ranking yields interesting itemsets.
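To make the definition of robustness concrete, the following minimal sketch computes the robustness of closedness on a toy dataset by brute force. It is not the analytic computation the abstract refers to: it enumerates every subset of the transactions, which is exponential and only feasible for tiny examples. It assumes that each transaction is kept independently with probability `alpha`, and that an itemset with no supporting transaction in the subsample counts as not closed whenever other items exist. The names `is_closed`, `robustness_closed`, and `alpha` are hypothetical and chosen for illustration.

```python
from itertools import combinations


def is_closed(itemset, transactions, items):
    """Return True if no proper superset of `itemset` has the same support.

    Equivalently: every item outside `itemset` is missing from at least one
    transaction that contains `itemset`. With zero supporting transactions
    this is False whenever items outside `itemset` exist (assumed convention).
    """
    covering = [t for t in transactions if itemset <= t]
    return all(any(y not in t for t in covering) for y in items - itemset)


def robustness_closed(itemset, transactions, alpha):
    """Probability that `itemset` is closed in a random subsample.

    Assumption: each transaction is kept independently with probability
    `alpha`. Brute-force enumeration over all 2^n subsamples.
    """
    items = frozenset().union(*transactions)  # item universe from the full data
    n = len(transactions)
    total = 0.0
    for k in range(n + 1):
        # Probability of one specific subsample containing exactly k transactions.
        weight = alpha ** k * (1 - alpha) ** (n - k)
        for kept in combinations(range(n), k):
            sample = [transactions[i] for i in kept]
            if is_closed(itemset, sample, items):
                total += weight
    return total


# Toy data: three transactions over the items a, b, c.
data = [frozenset("ab"), frozenset("ac"), frozenset("abc")]
r_a = robustness_closed(frozenset("a"), data, alpha=0.5)
r_ab = robustness_closed(frozenset("ab"), data, alpha=0.5)
```

In this toy dataset, the itemset {a} stays closed only if both {a, b} and {a, c} survive the subsampling, whereas {a, b} needs only the transaction {a, b} to survive, so {a, b} is the more robust of the two under these assumptions.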

research
06/02/2015

Fast Generation of Best Interval Patterns for Nonmonotonic Constraints

In pattern mining, the main challenge is the exponential explosion of th...
research
03/28/2017

Mining Best Closed Itemsets for Projection-antimonotonic Constraints in Polynomial Time

The exponential explosion of the set of patterns is one of the main chal...
research
06/10/2023

TALENT: Targeted Mining of Non-overlapping Sequential Patterns

With the widespread application of efficient pattern mining algorithms, ...
research
02/07/2019

The Long and the Short of It: Summarising Event Sequences with Serial Episodes

An ideal outcome of pattern mining is a small set of informative pattern...
research
04/15/2019

Discovering Episodes with Compact Minimal Windows

Discovering the most interesting patterns is the key problem in the fiel...
research
02/04/2019

Ranking Episodes using a Partition Model

One of the biggest setbacks in traditional frequent pattern mining is th...
research
10/26/2020

Introduction -- Parallel Universes and Local Patterns

Learning in parallel universes and the mining for local patterns are bot...
