Learning over inherently distributed data

07/30/2019
by   Donghui Yan, et al.

Recent decades have seen a surge of interest in distributed computing. Existing work focuses primarily on distributed computing platforms, data query tools, or algorithms that divide big data and conquer at individual machines. Increasingly often, however, the data of interest are inherently distributed, i.e., stored at multiple distributed sites due to diverse collection channels, business operations, etc. We propose to enable learning and inference in such a setting via a general framework based on distortion-minimizing local transformations. This framework requires only a small number of local signatures to be shared among distributed sites, eliminating the need to transmit big data. Computation can be done very efficiently via parallel local computation. The error incurred due to distributed computing vanishes as the size of the local signatures increases. As the shared data need not be in their original form, data privacy may also be preserved. Experiments on linear (logistic) regression and Random Forests show the promise of this approach. This framework is expected to apply to a general class of tools in learning and inference with the continuity property.
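To make the idea of sharing local signatures concrete, here is a minimal sketch in Python. It is not the authors' implementation: it assumes k-means as one possible distortion-minimizing local transformation, and the function names `local_signatures` and `fit_from_signatures`, the synthetic data, and all parameter values are hypothetical choices made for illustration. Each site compresses its data into a few codewords with counts, and only those are pooled to fit a weighted linear regression.

```python
import numpy as np


def local_signatures(X, y, k, rng, n_iter=25):
    """Compress one site's data into at most k signatures (assumed: Lloyd's k-means on X).

    Returns the codewords, the mean response per codeword, and the cluster sizes.
    """
    centers = X[rng.choice(len(X), size=k, replace=False)]  # fancy indexing copies
    for _ in range(n_iter):
        # Assign every point to its nearest codeword (squared Euclidean distance).
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Move each non-empty codeword to the mean of its members.
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    counts = np.bincount(labels, minlength=k)
    keep = counts > 0  # drop codewords that ended up with no members
    y_means = np.bincount(labels, weights=y, minlength=k)[keep] / counts[keep]
    return centers[keep], y_means, counts[keep]


def fit_from_signatures(centers, y_means, counts):
    """Weighted least squares on pooled signatures; weights are cluster sizes."""
    A = np.column_stack([np.ones(len(centers)), centers])
    w = np.sqrt(counts)
    coef, *_ = np.linalg.lstsq(A * w[:, None], y_means * w, rcond=None)
    return coef


# Hypothetical setup: three sites, each holding 2000 points it never transmits.
rng = np.random.default_rng(0)
true_coef = np.array([1.0, 2.0, -3.0])  # intercept and two slopes
parts = []
for shift in (-1.0, 0.0, 1.0):
    X = rng.normal(loc=shift, size=(2000, 2))
    y = true_coef[0] + X @ true_coef[1:] + rng.normal(scale=0.1, size=2000)
    parts.append(local_signatures(X, y, k=50, rng=rng))

# Only the signatures (at most 150 rows in total) reach the coordinating site.
centers = np.vstack([p[0] for p in parts])
y_means = np.concatenate([p[1] for p in parts])
counts = np.concatenate([p[2] for p in parts])
coef = fit_from_signatures(centers, y_means, counts)
print(coef)
```

In this sketch, raising `k` gives each site more codewords and lowers the quantization distortion, which is one way to read the abstract's statement that the error shrinks as the local signatures grow. The shared rows are cluster averages rather than raw records, which loosely mirrors the abstract's remark that shared data need not be in their original form.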


