The Scalability for Parallel Machine Learning Training Algorithm: Dataset Matters

10/25/2019
by Cheng Daning, et al.

To gain better performance, many researchers devote more computing resources to an application. In the AI field, however, truly large-scale machine learning training applications remain rare: the scalability and performance reproducibility of parallel machine learning training algorithms are limited, and while a few studies note that these metrics are limited, very few explain the underlying reasons. In this paper, we propose that sample differences within a dataset play the dominant role in the scalability of parallel machine learning algorithms. Sample differences can be measured by dataset characteristics, including the variance of samples in the dataset, sparsity, sample diversity, and similarity within the sampling sequence. To test this proposal, we study four kinds of parallel machine learning training algorithms: (1) asynchronous parallel SGD (the Hogwild! algorithm), (2) parallel model-averaging SGD (the mini-batch SGD algorithm), (3) decentralized optimization, and (4) dual coordinate optimization (the DADM algorithm). These algorithms cover different types of machine learning optimization methods. We analyze their convergence proofs and design corresponding experiments. Our results show that dataset characteristics decide the scalability of a machine learning algorithm, and moreover that there is an upper bound on the parallel scalability of machine learning algorithms.
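To make the dataset characteristics named above concrete, here is a minimal illustrative sketch (not code from the paper) that estimates two of them, sample variance and sparsity, on synthetic data. The function names `sample_variance` and `sparsity` and the synthetic datasets are our own assumptions for illustration.

```python
# Illustrative sketch only: estimating two dataset characteristics the
# abstract mentions (sample variance and sparsity) on synthetic data.
import numpy as np

def sample_variance(X):
    """Average squared distance of each sample from the dataset mean."""
    mean = X.mean(axis=0)
    return float(np.mean(np.sum((X - mean) ** 2, axis=1)))

def sparsity(X):
    """Fraction of entries in the feature matrix that are exactly zero."""
    return float(np.mean(X == 0))

rng = np.random.default_rng(0)
# Two hypothetical datasets: dense with high variance, and sparse with
# roughly 5% non-zero entries.
X_dense = rng.normal(scale=2.0, size=(1000, 50))
X_sparse = rng.normal(size=(1000, 50)) * (rng.random((1000, 50)) < 0.05)

for name, X in [("dense/high-variance", X_dense), ("sparse", X_sparse)]:
    print(f"{name}: variance={sample_variance(X):.2f}, sparsity={sparsity(X):.2f}")
```

How each characteristic affects the scalability bound differs per algorithm (e.g., Hogwild! versus mini-batch SGD), which is what the paper's convergence analysis and experiments examine.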

