Robust Coreset Construction for Distributed Machine Learning

04/11/2019
by Hanlin Lu, et al.

Motivated by the need to solve machine learning problems over distributed datasets, we explore the use of coresets to reduce communication overhead. A coreset is a summary of the original dataset in the form of a small weighted set in the same sample space. Compared to other data summaries, a coreset has the advantage that it can serve as a proxy for the original dataset, potentially across different applications. However, existing coreset construction algorithms are each tailor-made for a specific machine learning problem. Thus, to solve different machine learning problems, one has to collect coresets of different types, which defeats the purpose of saving communication overhead. We resolve this dilemma by developing coreset construction algorithms based on k-means/median clustering that give a provably good approximation for a broad range of machine learning problems with sufficiently continuous cost functions. Through evaluations on diverse datasets and machine learning problems, we verify the robust performance of the proposed algorithms.
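The general idea of a clustering-based coreset, replacing a dataset by a few weighted representatives, can be sketched as follows. This is a minimal illustration under our own assumptions, not the authors' construction. It runs plain Lloyd's k-means, keeps the k centers as the coreset points, and weights each center by the number of points assigned to it. The function names (`lloyd_kmeans`, `kmeans_coreset`, `weighted_cost`) are hypothetical.

For any per-point cost f that is L-Lipschitz, the triangle inequality gives |Σ_p f(p) − Σ_j w_j f(c_j)| ≤ L · Σ_p ‖p − c(p)‖, where c(p) is the center assigned to p. The sketch returns that last sum (the "spread") so the bound can be checked directly.

```python
import math
import random

# Hypothetical sketch of a clustering-based coreset; NOT the paper's exact algorithm.


def assign(points, centers):
    """Index of the nearest center (Euclidean) for every point."""
    return [min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points]


def lloyd_kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means; returns (centers, labels)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k randomly chosen input points
    for _ in range(iters):
        labels = assign(points, centers)
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                # Move the center to the coordinate-wise mean of its cluster.
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[j])  # empty cluster: keep the old center
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(points, centers)


def kmeans_coreset(points, k, seed=0):
    """Weighted coreset: cluster centers weighted by cluster sizes.

    Also returns the sum of point-to-assigned-center distances, which bounds
    the approximation error for any 1-Lipschitz per-point cost.
    """
    centers, labels = lloyd_kmeans(points, k, seed=seed)
    weights = [0] * k
    for j in labels:
        weights[j] += 1
    coreset = [(centers[j], weights[j]) for j in range(k) if weights[j] > 0]
    spread = sum(math.dist(p, centers[j]) for p, j in zip(points, labels))
    return coreset, spread


def weighted_cost(coreset, cost):
    """Evaluate sum_j w_j * cost(c_j) over the weighted coreset."""
    return sum(w * cost(c) for c, w in coreset)


if __name__ == "__main__":
    rng = random.Random(1)
    blobs = [(0.0, 0.0), (5.0, 5.0), (0.0, 8.0)]
    data = [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
            for cx, cy in blobs for _ in range(100)]
    coreset, spread = kmeans_coreset(data, k=10)
    q = (2.0, 3.0)

    def cost(p):
        return math.dist(p, q)  # 1-Lipschitz example cost

    full = sum(cost(p) for p in data)
    approx = weighted_cost(coreset, cost)
    print(len(data), "points ->", len(coreset), "weighted points")
    print(f"full={full:.2f} coreset={approx:.2f} error bound={spread:.2f}")
```

With the 1-Lipschitz cost f(p) = ‖p − q‖, the gap between the full cost and the coreset cost cannot exceed the returned spread. A larger k typically tightens the clustering and shrinks the spread, at the price of a larger coreset to communicate.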

research
03/19/2017

Practical Coreset Constructions for Machine Learning

We investigate coresets - succinct, small summaries of large data sets -...
research
06/30/2021

Robust Coreset for Continuous-and-Bounded Learning (with Outliers)

In this big data era, we often confront large-scale data in many machine...
research
03/30/2016

Towards Geo-Distributed Machine Learning

Latency to end-users and regulatory requirements push large companies to...
research
07/11/2018

Morse Code Datasets for Machine Learning

We present an algorithm to generate synthetic datasets of tunable diffic...
research
12/02/2021

Constrained Machine Learning: The Bagel Framework

Machine learning models are widely used for real-world applications, suc...
research
11/14/2013

Fundamental Limits of Online and Distributed Algorithms for Statistical Learning and Estimation

Many machine learning approaches are characterized by information constr...
research
09/30/2017

Decontamination of Mutual Contamination Models

Many machine learning problems can be characterized by mutual contaminat...