Comparing Dataset Characteristics that Favor the Apriori, Eclat or FP-Growth Frequent Itemset Mining Algorithms

01/30/2017
by   Jeff Heaton, et al.

Frequent itemset mining is a popular data mining technique, and Apriori, Eclat, and FP-Growth are among its most common algorithms. Considerable research has compared the relative performance of these three algorithms by evaluating how each scales as the dataset size increases. While scalability with data size is important, previous papers have not examined how performance changes across similarly sized datasets that have different itemset characteristics. This paper explores the effects that two dataset characteristics, frequent itemset density and maximum transaction size, have on the performance of these three algorithms. To perform this empirical analysis, a dataset generator is created that varies these two characteristics while keeping the number of rows in every generated dataset the same. This provides some insight into the dataset characteristics that are conducive to each algorithm. The results demonstrate that Eclat and FP-Growth both handle increases in maximum transaction size and frequent itemset density considerably better than the Apriori algorithm.
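To make the two characteristics concrete, the following is a minimal sketch of a synthetic transaction generator. It is not the generator used in the paper, whose design is not described in this abstract; the function names, the parameters (`frequent_density`, `max_transaction_size`, `num_items`), and the way a frequent itemset is planted into a fraction of the rows are all assumptions made for illustration.

```python
import random


def generate_transactions(num_rows, max_transaction_size, frequent_density,
                          frequent_itemset=("A", "B", "C"), num_items=100, seed=0):
    """Hypothetical generator: a fixed number of rows, varying two knobs.

    frequent_density is the probability that a row contains the planted
    frequent itemset (truncated if the row is smaller than the itemset).
    max_transaction_size is the upper bound on items per row.
    """
    if max_transaction_size > num_items:
        raise ValueError("max_transaction_size must not exceed num_items")
    rng = random.Random(seed)
    # Filler items never collide with the planted itemset's names.
    filler = [f"item{i}" for i in range(num_items)]
    rows = []
    for _ in range(num_rows):
        size = rng.randint(1, max_transaction_size)
        row = set()
        if rng.random() < frequent_density:
            row.update(frequent_itemset[:size])
        # Pad the row with random infrequent items up to the chosen size.
        remaining = size - len(row)
        if remaining > 0:
            row.update(rng.sample(filler, remaining))
        rows.append(sorted(row))
    return rows


def support(rows, itemset):
    """Fraction of rows that contain every item in itemset."""
    target = set(itemset)
    return sum(1 for row in rows if target.issubset(row)) / len(rows)


if __name__ == "__main__":
    # Same row count, different characteristics.
    sparse = generate_transactions(1000, max_transaction_size=5, frequent_density=0.1)
    dense = generate_transactions(1000, max_transaction_size=20, frequent_density=0.8)
    print(len(sparse), len(dense))
    print(support(sparse, ("A",)), support(dense, ("A",)))
```

Holding `num_rows` constant while sweeping the other two parameters is what lets a benchmark attribute runtime differences to the dataset characteristics rather than to dataset size.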

research
12/19/2019

Fast Mining of Spatial Frequent Wordset from Social Database

In this paper, we propose an algorithm that extracts spatial frequent pa...
research
01/23/2019

Boosting Frequent Itemset Mining via Early Stopping Intersections

Mining frequent itemsets from a transaction database has emerged as a fu...
research
03/29/2018

Frequent Item-set Mining without Ubiquitous Items

Frequent Item-set Mining (FIM), sometimes called Market Basket Analysis ...
research
01/15/2020

An Efficient and Wear-Leveling-Aware Frequent-Pattern Mining on Non-Volatile Memory

Frequent-pattern mining is a common approach to reveal the valuable hidd...
research
12/13/2019

RDD-Eclat: Approaches to Parallelize Eclat Algorithm on Spark RDD Framework

Initially, a number of frequent itemset mining (FIM) algorithms have bee...
research
01/07/2013

Finding the True Frequent Itemsets

Frequent Itemsets (FIs) mining is a fundamental primitive in data mining...
research
03/18/2018

A Guided FP-growth algorithm for fast mining of frequent itemsets from big data

In this paper we present the GFP-growth (Guided FP-growth) algorithm, a ...
