Flexible constrained sampling with guarantees for pattern mining

10/28/2016
by Vladimir Dzyuba, et al.

Pattern sampling has been proposed as a potential solution to the infamous pattern explosion. Instead of enumerating all patterns that satisfy the constraints, individual patterns are sampled proportional to a given quality measure. Several sampling algorithms have been proposed, but each of them has its limitations when it comes to 1) flexibility in terms of quality measures and constraints that can be used, and/or 2) guarantees with respect to sampling accuracy. We therefore present Flexics, the first flexible pattern sampler that supports a broad class of quality measures and constraints, while providing strong guarantees regarding sampling accuracy. To achieve this, we leverage the perspective on pattern mining as a constraint satisfaction problem and build upon the latest advances in sampling solutions in SAT as well as existing pattern mining algorithms. Furthermore, the proposed algorithm is applicable to a variety of pattern languages, which allows us to introduce and tackle the novel task of sampling sets of patterns. We introduce and empirically evaluate two variants of Flexics: 1) a generic variant that addresses the well-known itemset sampling task and the novel pattern set sampling task as well as a wide range of expressive constraints within these tasks, and 2) a specialized variant that exploits existing frequent itemset techniques to achieve substantial speed-ups. Experiments show that Flexics is both accurate and efficient, making it a useful tool for pattern-based data exploration.
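To make the phrase "sampled proportional to a given quality measure" concrete, here is a minimal sketch in Python. It is not the Flexics algorithm: it assumes frequency (support) as the quality measure and a minimum-support constraint, and it naively enumerates every frequent itemset before sampling, which is exactly the step a real pattern sampler is designed to avoid. The function names and the toy transactions are hypothetical and chosen only for illustration.

```python
import random
from itertools import combinations


def support(itemset, transactions):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def enumerate_frequent(transactions, min_support):
    """Brute-force enumeration of itemsets with support >= min_support.

    Exponential in the number of items; usable only on toy data.
    """
    items = sorted(set().union(*transactions))
    patterns = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            s = support(itemset, transactions)
            if s >= min_support:
                patterns[itemset] = s
    return patterns


def sample_patterns(patterns, k, rng):
    """Draw k patterns (with replacement), each with probability
    proportional to its quality, here its support."""
    itemsets = list(patterns)
    weights = [patterns[p] for p in itemsets]
    return rng.choices(itemsets, weights=weights, k=k)


# Hypothetical toy dataset: four transactions over the items a, b, c.
transactions = [frozenset("abc"), frozenset("ab"), frozenset("ac"), frozenset("b")]
patterns = enumerate_frequent(transactions, min_support=2)
samples = sample_patterns(patterns, k=5, rng=random.Random(0))
```

On this toy data, five itemsets satisfy the constraint, and {a} and {b} (support 3) are each drawn with probability 3/12, while {c}, {a, b}, and {a, c} (support 2) are each drawn with probability 2/12. The enumeration step is what becomes infeasible on realistic data, which is the motivation for samplers that draw patterns without listing them all.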

research
02/07/2017

Learning what matters - Sampling interesting patterns

In the field of exploratory data mining, local structure in data can be ...
research
04/01/2016

A SAT model to mine flexible sequences in transactional datasets

Traditional pattern mining algorithms generally suffer from a lack of fl...
research
06/02/2015

Fast Generation of Best Interval Patterns for Nonmonotonic Constraints

In pattern mining, the main challenge is the exponential explosion of th...
research
04/08/2022

Exploiting complex pattern features for interactive pattern mining

Recent years have seen a shift from a pattern mining process that has us...
research
03/28/2017

Mining Best Closed Itemsets for Projection-antimonotonic Constraints in Polynomial Time

The exponential explosion of the set of patterns is one of the main chal...
research
06/16/2020

MCRapper: Monte-Carlo Rademacher Averages for Poset Families and Approximate Pattern Mining

We present MCRapper, an algorithm for efficient computation of Monte-Car...
research
09/14/2020

Should Decorators Preserve the Component Interface?

Decorator design pattern is a well known pattern that allows dynamical a...
