Using Multi-Core HW/SW Co-design Architecture for Accelerating K-means Clustering Algorithm

07/09/2018
by Hadi Mardani Kamali, et al.

The ability to classify and cluster a given set of data is an essential part of building knowledge from data. However, as the size and dimensionality of the input data increase, the run-time of clustering algorithms is expected to grow superlinearly, which makes clustering a major challenge when dealing with big data. K-means clustering is an essential tool for many big data applications, including data mining, predictive analysis, forecasting studies, and machine learning. However, because of the large size (volume) of big data and the high dimensionality of its data points, even a simple K-means clustering can become extremely demanding in time and resources, especially when a fast and modular dataset analysis flow is required. In this paper, we demonstrate that a two-level filtering algorithm based on a binary kd-tree structure can decrease the convergence time of the K-means algorithm for large datasets. The two-level filtering algorithm lets the software naturally divide the classification into smaller datasets, based on the number of available cores and the amount of logic available in the target FPGA. Empirical results for this two-level structure on a multi-core FPGA-based architecture show a 330X speed-up compared to a conventional software-only solution.
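
To make the idea of dividing a dataset by core count concrete, the following is a minimal sketch of a kd-tree style median split that produces one sub-dataset per core. It is not the authors' implementation: the function name `kd_partition`, the cycling choice of split axis, and the assumption that the number of parts is a power of two are all illustrative choices that the abstract does not specify.

```python
from typing import List, Sequence

Point = Sequence[float]


def kd_partition(points: List[Point], num_parts: int, depth: int = 0) -> List[List[Point]]:
    """Split points into num_parts groups using kd-tree style median cuts.

    Hypothetical helper for illustration only. num_parts is assumed to be
    a power of two (for example, the number of available cores).
    """
    # Stop when no further split is requested or possible.
    if num_parts <= 1 or len(points) <= 1:
        return [list(points)]

    # Cycle through the dimensions, as in a standard binary kd-tree.
    axis = depth % len(points[0])

    # Sort along the chosen axis and cut at the median position.
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2

    left = kd_partition(ordered[:mid], num_parts // 2, depth + 1)
    right = kd_partition(ordered[mid:], num_parts // 2, depth + 1)
    return left + right


# Example: a 4x4 grid of 2-D points split into four spatially compact parts.
grid = [(x, y) for x in range(4) for y in range(4)]
parts = kd_partition(grid, 4)
```

Each resulting part covers a contiguous region of the input space, so it could in principle be handed to a separate core or hardware unit for the per-partition clustering work.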


