An Accurate and Efficient Large-scale Regression Method through Best Friend Clustering

04/22/2021
by   Kun Li, et al.
0

As the data size in Machine Learning fields grows exponentially, it is inevitable to accelerate the computation by utilizing the ever-growing large number of available cores provided by high-performance computing hardware. However, existing parallel methods for clustering or regression often suffer from problems of low accuracy, slow convergence, and complex hyperparameter-tuning. Furthermore, the parallel efficiency is usually difficult to improve while striking a balance between preserving model properties and partitioning computing workloads on distributed systems. In this paper, we propose a novel and simple data structure capturing the most important information among data samples. It has several advantageous properties supporting a hierarchical clustering strategy that is irrelevant to the hardware parallelism, well-defined metrics for determining optimal clustering, balanced partition for maintaining the compactness property, and efficient parallelization for accelerating computation phases. Then we combine the clustering with regression techniques as a parallel library and utilize a hybrid structure of data and model parallelism to make predictions. Experiments illustrate that our library obtains remarkable performance on convergence, accuracy, and scalability.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
12/31/2021

SplitBrain: Hybrid Data and Model Parallel Deep Learning

The recent success of deep learning applications has coincided with thos...
research
10/28/2018

Accurate, Efficient and Scalable Graph Embedding

The Graph Convolutional Network (GCN) model and its variants are powerfu...
research
05/10/2018

Unifying Data, Model and Hybrid Parallelism in Deep Learning via Tensor Tiling

Deep learning systems have become vital tools across many fields, but th...
research
09/27/2022

OSDP: Optimal Sharded Data Parallel for Distributed Deep Learning

Large-scale deep learning models contribute to significant performance i...
research
07/22/2022

Scalable training of graph convolutional neural networks for fast and accurate predictions of HOMO-LUMO gap in molecules

Graph Convolutional Neural Network (GCNN) is a popular class of deep lea...
research
08/17/2023

Approximating Clustering for Memory Management and request processing

Clustering is a crucial tool for analyzing data in virtually every scien...
research
02/07/2023

Two Parallel PageRank Algorithms via Improving Forward Push

Initially used to rank web pages, PageRank has now been applied in many ...

Please sign up or login with your details

Forgot password? Click here to reset