MCRapper: Monte-Carlo Rademacher Averages for Poset Families and Approximate Pattern Mining

06/16/2020
by Leonardo Pellegrina, et al.

We present MCRapper, an algorithm for the efficient computation of Monte-Carlo Empirical Rademacher Averages (MCERA) for families of functions exhibiting poset (e.g., lattice) structure, such as those that arise in many pattern mining tasks. The MCERA allows us to compute upper bounds on the maximum deviation of sample means from their expectations. It can therefore be used to find both statistically significant functions (i.e., patterns), when the available data is seen as a sample from an unknown distribution, and approximations of collections of high-expectation functions (e.g., frequent patterns), when the available data is a small sample from a large dataset. This is a strong improvement over previously proposed solutions, which could achieve only one of the two. MCRapper uses upper bounds on the discrepancy of the functions to efficiently explore and prune the search space, a technique borrowed from pattern mining itself. To show the practical use of MCRapper, we employ it to develop TFP-R, an algorithm for the task of True Frequent Pattern (TFP) mining. TFP-R gives guarantees on the probability of including any false positives (precision) and exhibits higher statistical power (recall) than existing methods offering the same guarantees. We evaluate MCRapper and TFP-R and show that they outperform the state of the art for their respective tasks.
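To make the quantity in the abstract concrete, here is a minimal sketch of a Monte-Carlo empirical Rademacher average. It assumes the usual definition: draw n vectors of independent random signs, take for each one the supremum over the family of the sign-weighted sample mean, and average the n suprema. The sketch enumerates a tiny itemset family by brute force and does not implement MCRapper's discrepancy-based pruning; the names `itemset_indicators` and `mcera` are hypothetical and do not come from the paper.

```python
import random
from itertools import combinations


def itemset_indicators(transactions, max_size):
    """Map each itemset of up to max_size items to its 0/1 vector over transactions.

    Entry i is 1 when the itemset is contained in transaction i, else 0.
    """
    items = sorted(set().union(*transactions))
    family = {}
    for k in range(1, max_size + 1):
        for itemset in combinations(items, k):
            wanted = set(itemset)
            family[itemset] = [1 if wanted <= t else 0 for t in transactions]
    return family


def mcera(family, m, n_trials, rng):
    """Brute-force Monte-Carlo estimate (illustrative only, no pruning).

    For each trial, draw m random signs and take the largest sign-weighted
    sample mean over all functions in the family; return the trial average.
    """
    total = 0.0
    for _ in range(n_trials):
        sigma = [rng.choice((-1, 1)) for _ in range(m)]
        total += max(
            sum(s * v for s, v in zip(sigma, values)) / m
            for values in family.values()
        )
    return total / n_trials


if __name__ == "__main__":
    # Toy dataset: each transaction is a set of item identifiers.
    data = [{1, 2}, {1}, {2}, {1, 2, 3}]
    fam = itemset_indicators(data, max_size=2)
    print(mcera(fam, m=len(data), n_trials=100, rng=random.Random(0)))
```

Exhaustive enumeration like this is only feasible for very small families; the abstract's point is that MCRapper avoids it by exploiting the poset structure to prune the search.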


research
12/20/2019

SCR-Apriori for Mining `Sets of Contrasting Rules'

In this paper, we propose an efficient algorithm for mining novel `Set o...
research
02/06/2021

Discrepancy Bounds for a Class of Negatively Dependent Random Points Including Latin Hypercube Samples

We introduce a class of γ-negatively dependent random samples. We prove ...
research
01/07/2013

Finding the True Frequent Itemsets

Frequent Itemsets (FIs) mining is a fundamental primitive in data mining...
research
02/15/2015

Fast and Memory-Efficient Significant Pattern Mining via Permutation Testing

We present a novel algorithm, Westfall-Young light, for detecting patter...
research
10/28/2016

Flexible constrained sampling with guarantees for pattern mining

Pattern sampling has been proposed as a potential solution to the infamo...
research
08/06/2018

Know Abnormal, Find Evil: Frequent Pattern Mining for Ransomware Threat Hunting and Intelligence

Emergence of crypto-ransomware has significantly changed the cyber threa...
research
01/20/2022

FreSCo: Mining Frequent Patterns in Simplicial Complexes

Simplicial complexes are a generalization of graphs that model higher-or...
