Performance study of distributed Apriori-like frequent itemsets mining

02/21/2019
by Lamine M. Aouad, et al.

In this article, we focus on distributed Apriori-based frequent itemset mining. We present a new distributed approach that takes into account inherent characteristics of this algorithm. We study the distribution aspect of the algorithm and compare the proposed approach with a classical Apriori-like distributed algorithm, using both analytical and experimental studies. We find that, under a wide range of conditions and datasets, the performance of a distributed Apriori-like algorithm is not tied to global pruning strategies. The reason is that local Apriori candidate generation usually shows relatively high success rates (the fraction of candidate sets that turn out to be frequent) at low levels; these rates fall to very low values at some stage and often drop to zero. Consequently, the intermediate communication steps and the remote computation and collection of support counts in classical distributed schemes are computationally inefficient locally, and thus constrain global performance. Our performance evaluation is done on a large cluster of workstations using the Condor system and its workflow manager DAGMan. The results show that the presented approach greatly enhances performance and achieves good scalability compared to a typical distributed Apriori-based algorithm.
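As a rough illustration of the central observation above, the following Python sketch runs a level-wise Apriori on a single data partition and records, per level, the fraction of generated candidates that turn out to be frequent (the "success rate"). It also includes one level of a Count-Distribution-style exchange, assumed here as a stand-in for the "classical" distributed scheme, in which every site counts the same candidates and the local counts are then summed. All function names are hypothetical, the support threshold is an absolute count, and the data are toy transactions; this is not the authors' proposed approach.

```python
from collections import Counter
from itertools import combinations


def generate_candidates(frequent_prev, k):
    """Apriori join + prune: build k-itemsets whose (k-1)-subsets are all frequent."""
    prev = set(frequent_prev)
    items = sorted(prev)
    candidates = set()
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            a, b = items[i], items[j]
            # Join step: merge two sorted (k-1)-itemsets sharing their first k-2 items
            if a[:k - 2] == b[:k - 2]:
                cand = tuple(sorted(set(a) | set(b)))
                # Prune step: drop the candidate if any (k-1)-subset is infrequent
                if all(sub in prev for sub in combinations(cand, k - 1)):
                    candidates.add(cand)
    return candidates


def count_support(transactions, candidates):
    """Count how many transactions contain each candidate itemset."""
    counts = Counter()
    for t in transactions:
        ts = set(t)
        for c in candidates:
            if ts.issuperset(c):
                counts[c] += 1
    return counts


def local_apriori(transactions, min_support):
    """Level-wise Apriori on one partition (hypothetical helper).

    Returns the frequent itemsets with their counts, and the per-level
    success rate = |frequent k-itemsets| / |candidate k-itemsets|.
    """
    candidates = {(item,) for t in transactions for item in t}
    frequent, success_rates = {}, []
    k = 1
    while candidates:
        counts = count_support(transactions, candidates)
        level = {c: counts[c] for c in candidates if counts[c] >= min_support}
        success_rates.append(len(level) / len(candidates))
        frequent.update(level)
        k += 1
        # An empty level yields no candidates, which ends the loop
        candidates = generate_candidates(level.keys(), k)
    return frequent, success_rates


def count_distribution_round(partitions, candidates, min_support):
    """One level of a Count-Distribution-style scheme (assumed classical baseline).

    Every site counts the same candidates on its own partition; summing the
    local counts stands in for the communication/collection step.
    """
    total = Counter()
    for part in partitions:
        total.update(count_support(part, candidates))
    return {c: n for c, n in total.items() if n >= min_support}


if __name__ == "__main__":
    transactions = [("a", "b", "c"), ("a", "b"), ("a", "c"),
                    ("b", "c"), ("a", "b", "c", "d")]
    frequent, rates = local_apriori(transactions, min_support=3)
    print(rates)  # → [0.75, 1.0, 0.0]
    level2 = {c for c in frequent if len(c) == 2}
    print(count_distribution_round([transactions[:2], transactions[2:]], level2, 3))
```

On this toy input the success rate goes 0.75, 1.0, 0.0: the last level generates the candidate `("a", "b", "c")`, which no partition supports often enough. In a global counting scheme, that level would still cost a full round of remote support counting and collection.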

12/28/2018

Parallel Algorithm for Frequent Itemset Mining on Intel Many-core Systems

Frequent itemset mining leads to the discovery of associations and corre...
04/21/2009

Ramp: Fast Frequent Itemset Mining with Efficient Bit-Vector Projection Technique

Mining frequent itemset using bit-vector representation approach is very...
06/03/2002

Mining All Non-Derivable Frequent Itemsets

Recent studies on frequent itemset mining algorithms resulted in signifi...
12/13/2019

RDD-Eclat: Approaches to Parallelize Eclat Algorithm on Spark RDD Framework

Initially, a number of frequent itemset mining (FIM) algorithms have bee...
06/18/2018

Mining frequent items in unstructured P2P networks

Large scale decentralized systems, such as P2P, sensor or IoT device net...
09/22/2017

Estimate Exchange over Network is Good for Distributed Hard Thresholding Pursuit

We investigate an existing distributed algorithm for learning sparse sig...
02/07/2019

Significance of Episodes Based on Minimal Windows

Discovering episodes, frequent sets of events from a sequence has been a...
