Less is More: An Exploration of Data Redundancy with Active Dataset Subsampling

05/29/2019
by Kashyap Chitta, et al.

Deep Neural Networks (DNNs) often rely on very large datasets for training. Given the large size of such datasets, it is conceivable that they contain certain samples that either do not contribute to or negatively impact the DNN's performance. If there is a large number of such samples, subsampling the training dataset in a way that removes them could provide an effective solution to both improve performance and reduce training time. In this paper, we propose an approach called Active Dataset Subsampling (ADS) to identify favorable subsets within a dataset for training using ensemble-based uncertainty estimation. When applied to three image classification benchmarks (CIFAR-10, CIFAR-100 and ImageNet), we find that there are low-uncertainty subsets, which can be as large as 50% of the dataset. These subsets are identified and removed with ADS. We demonstrate that datasets obtained using ADS with a lightweight ResNet-18 ensemble remain effective when used to train deeper models like ResNet-101. Our results provide strong empirical evidence that using all the available data for training can hurt performance on large-scale vision tasks.
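The idea of removing low-uncertainty samples can be illustrated with a minimal sketch. The abstract does not specify the uncertainty measure, so this example assumes the entropy of the ensemble-averaged class probabilities as the score; the function names `predictive_entropy` and `subsample_indices` and the `drop_fraction` parameter are hypothetical and not taken from the paper.

```python
import math
from typing import List, Sequence


def predictive_entropy(member_probs: Sequence[Sequence[float]]) -> float:
    """Entropy of the ensemble-averaged class distribution for one sample.

    member_probs[m][c] is the probability of class c from ensemble member m.
    Assumption: entropy stands in for the paper's uncertainty estimate.
    """
    n_members = len(member_probs)
    n_classes = len(member_probs[0])
    mean = [sum(p[c] for p in member_probs) / n_members for c in range(n_classes)]
    return -sum(q * math.log(q) for q in mean if q > 0.0)


def subsample_indices(
    ensemble_probs: Sequence[Sequence[Sequence[float]]],
    drop_fraction: float,
) -> List[int]:
    """Return sorted indices of samples kept after dropping the
    lowest-uncertainty fraction of the dataset.

    ensemble_probs[i][m][c]: probability of class c from member m on sample i.
    """
    if not 0.0 <= drop_fraction < 1.0:
        raise ValueError("drop_fraction must be in [0, 1)")
    scores = [predictive_entropy(p) for p in ensemble_probs]
    n_drop = int(len(scores) * drop_fraction)
    # Ascending order: the least uncertain samples come first and are dropped.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    return sorted(order[n_drop:])


if __name__ == "__main__":
    # Toy data: 4 samples, 2 ensemble members, 2 classes.
    probs = [
        [[0.99, 0.01], [0.98, 0.02]],  # confident, members agree
        [[0.50, 0.50], [0.60, 0.40]],  # uncertain
        [[0.90, 0.10], [0.20, 0.80]],  # members disagree
        [[0.95, 0.05], [0.97, 0.03]],  # confident, members agree
    ]
    print(subsample_indices(probs, drop_fraction=0.5))  # prints [1, 2]
```

In this toy case the two confidently classified samples are dropped and the two uncertain ones are kept. In practice the probabilities would come from the softmax outputs of each trained ensemble member, and the retained indices would define the training set for the final model.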


research
01/29/2019

Impact of Training Dataset Size on Neural Answer Selection Models

It is held as a truism that deep neural networks require large datasets ...
research
11/08/2018

Large-Scale Visual Active Learning with Deep Probabilistic Ensembles

Annotating the right data for training deep neural networks is an import...
research
01/06/2021

LightLayers: Parameter Efficient Dense and Convolutional Layers for Image Classification

Deep Neural Networks (DNNs) have become the de-facto standard in compute...
research
11/30/2022

Dynamic Sparse Training via Balancing the Exploration-Exploitation Trade-off

Over-parameterization of deep neural networks (DNNs) has shown high pred...
research
02/14/2022

PFGE: Parsimonious Fast Geometric Ensembling of DNNs

Ensemble methods have been widely used to improve the performance of mac...
research
09/02/2022

Impact of Scaled Image on Robustness of Deep Neural Networks

Deep neural networks (DNNs) have been widely used in computer vision tas...
research
08/10/2023

Composable Core-sets for Diversity Approximation on Multi-Dataset Streams

Core-sets refer to subsets of data that maximize some function that is c...
