Distributed Spatial Data Clustering as a New Approach for Big Data Analysis

10/26/2017
by Malika Bendechache, et al.

In this paper, we propose a new approach for Big Data mining and analysis. The approach works well on distributed datasets and addresses the clustering task of the analysis. It consists of two main phases. The first phase executes a clustering algorithm on local data, assuming that the datasets were already distributed among the system's processing nodes. The second phase aggregates the local clusters to generate global clusters. The approach not only generates local clusters on each processing node in parallel, but also forms global clusters without prior knowledge of the number of clusters, which many partitioning clustering algorithms require. In this study, the approach was applied to spatial datasets. The proposed aggregation phase is very efficient and does not involve the exchange of large amounts of data between the processing nodes. The experimental results show that the approach achieves super-linear speedup, scales up very well, and can take advantage of recent programming models, such as MapReduce, because its results are not affected by the type of communication used.
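The two phases described above can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract names neither the local clustering method nor the form of the data exchanged, so this sketch assumes a simple distance-threshold (single-linkage) clustering on each node and assumes that each local cluster is summarized by its bounding box before aggregation. The names `local_clusters`, `summarize`, `aggregate`, and the threshold `eps` are all hypothetical.

```python
from itertools import combinations
from math import dist


def connected_groups(n, linked):
    """Group indices 0..n-1 into connected components using union-find."""
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in combinations(range(n), 2):
        if linked(i, j):
            parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def local_clusters(points, eps):
    """Phase 1 (assumed method): link points closer than eps on one node."""
    groups = connected_groups(
        len(points), lambda i, j: dist(points[i], points[j]) <= eps
    )
    return [[points[i] for i in group] for group in groups]


def summarize(cluster):
    """Assumed compact summary: bounding box (xmin, ymin, xmax, ymax)."""
    xs = [p[0] for p in cluster]
    ys = [p[1] for p in cluster]
    return (min(xs), min(ys), max(xs), max(ys))


def box_gap(a, b):
    """Smallest distance between two boxes; 0.0 when they overlap."""
    dx = max(a[0] - b[2], b[0] - a[2], 0.0)
    dy = max(a[1] - b[3], b[1] - a[3], 0.0)
    return (dx * dx + dy * dy) ** 0.5


def aggregate(summaries, eps):
    """Phase 2: merge local clusters whose summaries lie within eps."""
    return connected_groups(
        len(summaries), lambda i, j: box_gap(summaries[i], summaries[j]) <= eps
    )


# Two hypothetical nodes, each holding its own partition of the data.
node_a = [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0)]
node_b = [(2.0, 0.0), (3.0, 0.0), (20.0, 20.0)]
eps = 1.5

# Only the small summaries leave each node, not the raw points.
summaries = [
    summarize(c) for node in (node_a, node_b) for c in local_clusters(node, eps)
]
global_clusters = aggregate(summaries, eps)
print(len(global_clusters), "global clusters from", len(summaries), "local ones")
```

In this sketch, the number of global clusters falls out of the merge step rather than being supplied in advance, and each node sends one four-number summary per local cluster, which mirrors the low communication cost the abstract claims for the aggregation phase.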

