Towards Sustainable Learning: Coresets for Data-efficient Deep Learning

06/02/2023
by Yu Yang, et al.

To improve the efficiency and sustainability of learning deep models, we propose CREST, the first scalable framework with rigorous theoretical guarantees for identifying the most valuable examples for training non-convex models, particularly deep networks. To guarantee convergence to a stationary point of a non-convex function, CREST models the non-convex loss as a series of quadratic functions and extracts a coreset for each quadratic sub-region. In addition, to ensure faster convergence of stochastic gradient methods such as (mini-batch) SGD, CREST iteratively extracts multiple mini-batch coresets from larger random subsets of the training data, yielding nearly unbiased gradients with small variances. Finally, to further improve scalability and efficiency, CREST identifies examples that have already been learned and excludes them from the coreset selection pipeline. Our extensive experiments on several deep networks trained on vision and NLP datasets, including CIFAR-10, CIFAR-100, TinyImageNet, and SNLI, confirm that CREST speeds up training deep networks on very large datasets by 1.7x to 2.5x with minimal loss in performance. By analyzing the learning difficulty of the subsets selected by CREST, we show that deep models benefit most from learning subsets of increasing difficulty.
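The abstract compresses CREST's training loop into a few steps: draw a larger random subset, extract a mini-batch coreset whose gradient nearly matches it, take an SGD step, and drop already-learned examples from future selection. The sketch below is an illustration of that loop only, not the authors' algorithm: it runs logistic regression on synthetic data, the names (select_coreset, per_example_grads), the greedy gradient-matching rule, and the "learned" confidence threshold are all assumptions standing in for CREST's quadratic-region modeling and actual selection criteria.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic binary classification data (stand-in for a real dataset).
n, d = 2000, 20
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = (X @ w_true + 0.5 * rng.normal(size=n) > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def per_example_grads(w, idx):
    """Per-example logistic-loss gradients for the rows in idx."""
    p = sigmoid(X[idx] @ w)
    return (p - y[idx])[:, None] * X[idx]          # shape (len(idx), d)

def select_coreset(w, idx, k):
    """Greedy gradient matching (a stand-in for CREST's selection rule):
    pick k examples whose mean gradient best tracks the mean gradient
    of the larger random subset idx, keeping the coreset gradient
    nearly unbiased with small variance."""
    G = per_example_grads(w, idx)
    target = G.mean(axis=0)
    chosen, running = [], np.zeros(d)
    remaining = list(range(len(idx)))
    for t in range(k):
        # Choose the example that brings the running mean closest to target.
        errs = [np.linalg.norm((running + G[j]) / (t + 1) - target)
                for j in remaining]
        j = remaining.pop(int(np.argmin(errs)))
        running += G[j]
        chosen.append(idx[j])
    return np.array(chosen)

w = np.zeros(d)
active = np.arange(n)                  # examples not yet "learned"
lr, subset_size, coreset_size = 0.5, 256, 32

for step in range(200):
    # (1) Draw a larger random subset, (2) extract a mini-batch coreset.
    idx = rng.choice(active, size=min(subset_size, len(active)), replace=False)
    core = select_coreset(w, idx, coreset_size)
    # (3) SGD step on the coreset gradient.
    w -= lr * per_example_grads(w, core).mean(axis=0)
    # (4) Exclude confidently learned examples from future selection
    #     (0.05 is an illustrative threshold, not from the paper).
    p = sigmoid(X[active] @ w)
    learned = np.abs(p - y[active]) < 0.05
    if learned.sum() < len(active) - coreset_size:
        active = active[~learned]

acc = ((sigmoid(X @ w) > 0.5) == y).mean()
print(f"train accuracy: {acc:.3f}, examples still active: {len(active)}")
```

The gradient-matching step is the piece CREST makes rigorous within each quadratic sub-region of the loss; the greedy heuristic above merely conveys the shape of the computation, trading the full subset's gradient for a much cheaper mini-batch coreset.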

