Parallel Algorithm for Frequent Itemset Mining on Intel Many-core Systems

12/28/2018
by Mikhail Zymbler, et al.

Frequent itemset mining leads to the discovery of associations and correlations among items in large transactional databases. Apriori is a classical frequent itemset mining algorithm that makes iterative passes over the database, generating candidate itemsets from the frequent itemsets found in the previous iteration and pruning itemsets that are clearly infrequent. The Dynamic Itemset Counting (DIC) algorithm is a variation of Apriori that tries to reduce the number of passes over a transactional database while keeping the number of itemsets counted in each pass relatively low. In this paper, we address the problem of accelerating DIC on the Intel Xeon Phi many-core system for the case when the transactional database fits in main memory. Intel Xeon Phi provides a large number of small compute cores with vector processing units. The paper presents a parallel implementation of DIC based on OpenMP technology and thread-level parallelism. We exploit a bit-based internal layout for transactions and itemsets. This technique reduces the memory needed to store the transactional database, reduces support counting to bitwise logical operations, and allows that step to be vectorized. Experimental evaluation on the Intel Xeon CPU and the Intel Xeon Phi coprocessor with large synthetic and real databases showed good performance and scalability of the proposed algorithm.
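
The bit-based layout mentioned in the abstract can be illustrated with a small sketch. It is not the paper's implementation (which targets OpenMP on Intel hardware); it is a minimal Python illustration that assumes items are numbered from 0 and that each transaction and itemset is packed into one integer bitmask. The function names `encode` and `support`, the toy database, and the threshold `min_support` are all hypothetical.

```python
from typing import Iterable, List


def encode(items: Iterable[int]) -> int:
    """Pack a collection of 0-based item ids into an integer bitmask."""
    mask = 0
    for item in items:
        mask |= 1 << item  # set the bit that corresponds to this item
    return mask


def support(itemset_mask: int, transaction_masks: List[int]) -> int:
    """Count the transactions that contain every item of the itemset."""
    # A transaction t contains itemset s exactly when (t AND s) == s.
    return sum(1 for t in transaction_masks if (t & itemset_mask) == itemset_mask)


# Hypothetical toy database: four transactions over items 0..3.
transactions = [encode(t) for t in ([0, 1, 2], [0, 2], [1, 3], [0, 1, 2, 3])]

min_support = 2            # hypothetical threshold
candidate = encode([0, 2])  # the itemset {0, 2}
is_frequent = support(candidate, transactions) >= min_support
```

Each containment check is a single AND plus a comparison on fixed-width words. That is presumably why the abstract describes the step as amenable to vectorization, although this sketch does not demonstrate either vectorization or thread-level parallelism.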


research
04/21/2009

Fast Algorithms for Mining Interesting Frequent Itemsets without Minimum Support

Real world datasets are sparse, dirty and contain hundreds of items. In ...
research
02/21/2019

Performance study of distributed Apriori-like frequent itemsets mining

In this article, we focus on distributed Apriori-based frequent itemsets...
research
06/18/2018

Mining frequent items in unstructured P2P networks

Large scale decentralized systems, such as P2P, sensor or IoT device net...
research
08/11/2021

Parallel algorithms for mining of frequent itemsets

In the recent decade companies started collecting of large amount of dat...
research
01/01/2019

Parallel Algorithm for Time Series Discords Discovery on the Intel Xeon Phi Knights Landing Many-core Processor

Discord is a refinement of the concept of anomalous subsequence of a tim...
research
04/21/2009

HybridMiner: Mining Maximal Frequent Itemsets Using Hybrid Database Representation Approach

In this paper we present a novel hybrid (arraybased layout and vertical ...
research
12/23/2019

Simulating collective neutrinos oscillations on the Intel Many Integrated Core (MIC) architecture

We evaluate the second-generation Intel Xeon Phi coprocessor based on th...
