Fast and Memory-Efficient Significant Pattern Mining via Permutation Testing

02/15/2015
by Felipe Llinares-Lopez, et al.

We present a novel algorithm, Westfall-Young light, for detecting patterns, such as itemsets and subgraphs, that are statistically significantly enriched in one of two classes. Our method corrects rigorously for multiple hypothesis testing and for correlations between patterns through the Westfall-Young permutation procedure, which empirically estimates the null distribution of pattern frequencies in each class via permutations. In our experiments, Westfall-Young light dramatically outperforms the current state-of-the-art approach in terms of both runtime and memory efficiency on popular real-world benchmark datasets for pattern mining. The key to this efficiency is that, unlike all existing methods, our algorithm neither needs to solve the underlying frequent itemset mining problem anew for each permutation nor needs to store the occurrence list of all frequent patterns. Westfall-Young light opens the door to significant pattern mining on large datasets that previously led to prohibitive runtime or memory costs.
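To make the permutation procedure concrete, the following is a minimal sketch of a naive Westfall-Young correction, not the Westfall-Young light algorithm itself. It assumes that the candidate patterns have already been enumerated and that their full occurrence vectors fit in memory, which is exactly the cost the abstract says the proposed method avoids. The choice of a two-sided Fisher exact test, the function names (`fisher_p`, `westfall_young_threshold`, `significant_patterns`), and the parameters are illustrative assumptions rather than details taken from the paper.

```python
import random
from math import comb


def fisher_p(a, x, n1, n):
    """Two-sided Fisher exact p-value for one pattern.

    a: occurrences in class 1, x: total occurrences,
    n1: number of class-1 samples, n: total number of samples.
    """
    denom = comb(n, n1)
    lo, hi = max(0, x - (n - n1)), min(x, n1)
    # Hypergeometric probability of every table with the same margins.
    probs = [comb(x, k) * comb(n - x, n1 - k) / denom for k in range(lo, hi + 1)]
    p_obs = probs[a - lo]
    # Sum the tables that are at most as likely as the observed one.
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-9)))


def westfall_young_threshold(occ, labels, alpha=0.05, n_perm=1000, seed=0):
    """Estimate a corrected significance threshold by permuting class labels.

    occ: list of patterns, each a 0/1 list with one entry per sample.
    labels: 0/1 class label per sample.
    """
    rng = random.Random(seed)
    n, n1 = len(labels), sum(labels)
    supports = [sum(col) for col in occ]
    perm = list(labels)
    min_ps = []
    for _ in range(n_perm):
        rng.shuffle(perm)  # break any label/pattern association
        min_p = 1.0
        for col, x in zip(occ, supports):
            a = sum(c & y for c, y in zip(col, perm))
            min_p = min(min_p, fisher_p(a, x, n1, n))
        min_ps.append(min_p)
    min_ps.sort()
    # At most k permutations have a minimum p-value strictly below this value,
    # so testing with "p < threshold" keeps the estimated FWER at or below alpha.
    k = min(int(alpha * n_perm), n_perm - 1)
    return min_ps[k]


def significant_patterns(occ, labels, alpha=0.05, n_perm=1000, seed=0):
    """Return indices of patterns whose p-value is below the corrected threshold."""
    n, n1 = len(labels), sum(labels)
    delta = westfall_young_threshold(occ, labels, alpha, n_perm, seed)
    hits = []
    for j, col in enumerate(occ):
        a = sum(c & y for c, y in zip(col, labels))
        if fisher_p(a, sum(col), n1, n) < delta:
            hits.append(j)
    return hits
```

Calling `significant_patterns(occ, labels)` with one 0/1 occurrence list per pattern returns the indices of the patterns that survive the correction. The inner loop revisits every pattern for every permutation, which illustrates why a naive implementation becomes expensive as the number of patterns grows.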

