Reliable Distributed Clustering with Redundant Data Assignment

02/20/2020
by Venkata Gandikota, et al.

In this paper, we present distributed generalized clustering algorithms that can handle large-scale data spread across multiple machines despite straggling or unreliable machines. We propose a novel data assignment scheme that lets us obtain global information about the entire dataset even when some machines fail to return the results of their assigned local computations. The assignment scheme leads to distributed algorithms with good approximation guarantees for a variety of clustering and dimensionality reduction problems.
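
To make the idea of redundant data assignment concrete, here is a minimal sketch. It is not the assignment scheme proposed in the paper, which is not described in this abstract. It uses plain cyclic replication as a stand-in, and the names `cyclic_assignment`, `covered_points`, and all parameter values are hypothetical. It only illustrates the basic property that replicating each point on several machines keeps every point visible when a few machines fail to respond.

```python
from typing import Dict, List, Set


def cyclic_assignment(num_points: int, num_machines: int,
                      replication: int) -> Dict[int, List[int]]:
    """Assign each point index to `replication` consecutive machines.

    Illustrative stand-in only (simple cyclic replication), not the
    scheme from the paper.
    """
    if not 1 <= replication <= num_machines:
        raise ValueError("replication must be between 1 and num_machines")
    assignment: Dict[int, List[int]] = {m: [] for m in range(num_machines)}
    for point in range(num_points):
        for offset in range(replication):
            # Consecutive machines modulo num_machines are all distinct
            # because replication <= num_machines.
            assignment[(point + offset) % num_machines].append(point)
    return assignment


def covered_points(assignment: Dict[int, List[int]],
                   responders: Set[int]) -> Set[int]:
    """Return the set of points held by at least one responding machine."""
    covered: Set[int] = set()
    for machine in responders:
        covered.update(assignment[machine])
    return covered


# Hypothetical example: 10 points, 5 machines, each point stored 3 times.
assignment = cyclic_assignment(num_points=10, num_machines=5, replication=3)
stragglers = {1, 4}                       # machines that never respond
responders = set(assignment) - stragglers
seen = covered_points(assignment, responders)
all_covered = seen == set(range(10))
```

With replication factor r, every point lives on r distinct machines, so any r - 1 stragglers still leave at least one copy of each point among the responders. The cost is an r-fold increase in storage and local computation, which is the trade-off any redundant assignment has to balance.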
