ADASS: Adaptive Sample Selection for Training Acceleration

06/11/2019
by Shen-Yi Zhao, et al.

Stochastic gradient descent (SGD) and its variants, including some accelerated variants, have become popular for training in machine learning. However, in all existing SGD variants, the sample size in each iteration (epoch) of training is the same as the size of the full training set. In this paper, we propose a new method, called adaptive sample selection (ADASS), for training acceleration. During different epochs of training, ADASS only needs to visit different training subsets, which are adaptively selected from the full training set according to the Lipschitz constants of the loss functions on samples. This means that in ADASS the sample size in each epoch of training can be smaller than the size of the full training set, since some samples are discarded. ADASS can be seamlessly integrated with existing optimization methods, such as SGD and momentum SGD, for training acceleration. Theoretical results show that the learning accuracy of ADASS is comparable to that of counterparts trained on the full training set. Furthermore, empirical results on both shallow models and deep models show that ADASS can accelerate the training process of existing methods without sacrificing accuracy.
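The abstract does not spell out the paper's exact selection rule, so the sketch below is only a minimal illustration of the idea: after each epoch, every sample is re-scored and only the highest-scoring samples are kept for the next epoch of plain SGD. The scoring rule used here (per-sample loss change between epochs, as a crude proxy for the local Lipschitz constant of each sample's loss), the model interface (model.grad, model.loss, model.params), and the keep_ratio parameter are all assumptions for illustration, not the paper's actual algorithm.

```python
import numpy as np

def adass_train(model, X, y, n_epochs, lr=0.1, keep_ratio=0.5):
    # Hedged sketch of ADASS-style training: adaptively select a training
    # subset each epoch instead of always visiting the full training set.
    # The scoring rule and model interface below are assumptions.
    n = len(X)
    active = np.arange(n)            # first epoch uses the full training set
    prev_loss = None

    for epoch in range(n_epochs):
        # One pass of plain SGD over the currently selected subset.
        for i in np.random.permutation(active):
            model.params -= lr * model.grad(X[i], y[i])

        # Re-score every sample on the full training set.
        cur_loss = np.array([model.loss(X[j], y[j]) for j in range(n)])
        if prev_loss is None:
            score = cur_loss                      # no history yet: score by loss
        else:
            score = np.abs(cur_loss - prev_loss)  # loss change as a Lipschitz proxy
        prev_loss = cur_loss

        # Keep only the highest-scoring samples for the next epoch and
        # discard the rest (the adaptive sample selection step).
        k = max(1, int(keep_ratio * n))
        active = np.argsort(-score)[:k]
    return model
```

The inner SGD loop could equally be replaced by momentum SGD or another optimizer, consistent with the abstract's claim that ADASS integrates with existing methods; the subset-selection step is independent of the optimizer used.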


Related research

07/28/2020  Stochastic Normalized Gradient Descent with Momentum for Large Batch Training
Stochastic gradient descent (SGD) and its variants have been the dominat...

02/26/2021  On the Generalization of Stochastic Gradient Descent with Momentum
While momentum-based methods, in conjunction with stochastic gradient de...

05/29/2019  Accelerated Sparsified SGD with Error Feedback
We study a stochastic gradient method for synchronous distributed optimi...

06/01/2011  Committee-Based Sample Selection for Probabilistic Classifiers
In many real-world learning tasks, it is expensive to acquire a sufficie...

05/22/2017  Large Scale Empirical Risk Minimization via Truncated Adaptive Newton Method
We consider large scale empirical risk minimization (ERM) problems, wher...

11/09/2022  Variants of SGD for Lipschitz Continuous Loss Functions in Low-Precision Environments
Motivated by neural network training in low-bit floating and fixed-point...

08/08/2022  Pairwise Learning via Stagewise Training in Proximal Setting
The pairwise objective paradigms are an important and essential aspect o...
