ADASS: Adaptive Sample Selection for Training Acceleration

by Shen-Yi Zhao, et al.
Nanjing University

Stochastic gradient descent (SGD) and its variants, including some accelerated variants, have become popular for training in machine learning. However, in SGD and all of its existing variants, the number of samples visited in each epoch of training equals the size of the full training set. In this paper, we propose a new method, called adaptive sample selection (ADASS), for training acceleration. In different epochs of training, ADASS only needs to visit different training subsets, which are adaptively selected from the full training set according to the Lipschitz constants of the loss functions on the samples. This means that in ADASS the sample size in each epoch can be smaller than the size of the full training set, because some samples are discarded. ADASS can be seamlessly integrated with existing optimization methods, such as SGD and momentum SGD, for training acceleration. Theoretical results show that the learning accuracy of ADASS is comparable to that of its counterparts trained on the full training set. Furthermore, empirical results on both shallow and deep models show that ADASS can accelerate the training process of existing methods without sacrificing accuracy.
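The idea of visiting only an adaptively chosen subset in each epoch can be illustrated with a minimal sketch. The abstract does not state the actual selection rule, so everything below is an assumption made for illustration: the model is a toy logistic regression, the per-sample score is the gradient norm of the loss (used here as a rough stand-in for a per-sample Lipschitz-type quantity), and the hypothetical helper `select_subset` keeps a fixed fraction of the highest-scoring samples. It should not be read as the ADASS algorithm itself.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def residual(w, x, y):
    # Derivative of the logistic loss w.r.t. the logit, for a label y in {0, 1}.
    z = sum(wi * xi for wi, xi in zip(w, x))
    return sigmoid(z) - y


def score(w, x, y):
    # Gradient norm of the per-sample loss: |sigmoid(w.x) - y| * ||x||.
    # ASSUMPTION: used as an illustrative proxy, not the paper's criterion.
    norm_x = math.sqrt(sum(xi * xi for xi in x))
    return abs(residual(w, x, y)) * norm_x


def select_subset(w, data, keep_fraction):
    # Hypothetical rule: keep the keep_fraction of samples with the largest score.
    k = max(1, int(round(keep_fraction * len(data))))
    order = sorted(range(len(data)), key=lambda i: score(w, *data[i]), reverse=True)
    return order[:k]


def train(data, epochs=5, lr=0.1, keep_fraction=0.5, seed=0):
    # Plain SGD, except that each epoch visits only the selected subset.
    rng = random.Random(seed)
    w = [0.0] * len(data[0][0])
    visited = []  # number of samples visited in each epoch
    for _ in range(epochs):
        idx = select_subset(w, data, keep_fraction)  # re-selected every epoch
        rng.shuffle(idx)
        for i in idx:
            x, y = data[i]
            r = residual(w, x, y)
            w = [wi - lr * r * xi for wi, xi in zip(w, x)]
        visited.append(len(idx))
    return w, visited


if __name__ == "__main__":
    toy = [([1.0, 0.0], 1), ([2.0, 0.0], 1), ([3.0, 0.0], 0), ([0.5, 0.0], 0)]
    weights, per_epoch = train(toy, epochs=3, keep_fraction=0.5)
    print(weights, per_epoch)
```

Setting `keep_fraction=1.0` in this sketch visits every sample in every epoch, which recovers ordinary SGD; smaller values reduce the per-epoch cost at the price of recomputing the scores before each epoch.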




Stochastic Normalized Gradient Descent with Momentum for Large Batch Training

Stochastic gradient descent (SGD) and its variants have been the dominat...

On the Generalization of Stochastic Gradient Descent with Momentum

While momentum-based methods, in conjunction with stochastic gradient de...

Accelerated Sparsified SGD with Error Feedback

We study a stochastic gradient method for synchronous distributed optimi...

Committee-Based Sample Selection for Probabilistic Classifiers

In many real-world learning tasks, it is expensive to acquire a sufficie...

Large Scale Empirical Risk Minimization via Truncated Adaptive Newton Method

We consider large scale empirical risk minimization (ERM) problems, wher...

DCNNs on a Diet: Sampling Strategies for Reducing the Training Set Size

Large-scale supervised classification algorithms, especially those based...