Stochastic Distributed Optimization for Machine Learning from Decentralized Features

12/16/2018
by   Yaochen Hu, et al.
0

Distributed machine learning has been widely studied in the literature to scale up machine learning model training in the presence of an ever-increasing amount of data. We study distributed machine learning from another perspective, where the information about the training same samples are inherently decentralized and located on different parities. We propose an asynchronous stochastic gradient descent (SGD) algorithm for such a feature distributed machine learning (FDML) problem, to jointly learn from decentralized features, with theoretical convergence guarantees under bounded asynchrony. Our algorithm does not require sharing the original feature data or even local model parameters between parties, thus preserving a high level of data confidentiality. We implement our algorithm for FDML in a parameter server architecture. We compare our system with fully centralized training (which violates data locality requirements) and training only based on local features, through extensive experiments performed on a large amount of data from a real-world application, involving 5 million samples and 8700 features in total. Experimental results have demonstrated the effectiveness and efficiency of the proposed FDML system.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
10/18/2017

Asynchronous Decentralized Parallel Stochastic Gradient Descent

Recent work shows that decentralized parallel stochastic gradient decent...
research
11/20/2019

Local AdaAlter: Communication-Efficient Stochastic Gradient Descent with Adaptive Learning Rates

Recent years have witnessed the growth of large-scale distributed machin...
research
04/14/2018

When Edge Meets Learning: Adaptive Control for Resource-Constrained Distributed Machine Learning

Emerging technologies and applications including Internet of Things (IoT...
research
12/17/2021

Personalized On-Device E-health Analytics with Decentralized Block Coordinate Descent

Actuated by the growing attention to personal healthcare and the pandemi...
research
03/29/2018

A Stochastic Large-scale Machine Learning Algorithm for Distributed Features and Observations

As the size of modern data sets exceeds the disk and memory capacities o...
research
10/14/2022

Hybrid Decentralized Optimization: First- and Zeroth-Order Optimizers Can Be Jointly Leveraged For Faster Convergence

Distributed optimization has become one of the standard ways of speeding...
research
04/29/2019

AutoCross: Automatic Feature Crossing for Tabular Data in Real-World Applications

Feature crossing captures interactions among categorical features and is...

Please sign up or login with your details

Forgot password? Click here to reset