Semantic Redundancies in Image-Classification Datasets: The 10 Don't Need

01/29/2019
by   Vighnesh Birodkar, et al.
0

Large datasets have been crucial to the success of deep learning models in the recent years, which keep performing better as they are trained with more labelled data. While there have been sustained efforts to make these models more data-efficient, the potential benefit of understanding the data itself, is largely untapped. Specifically, focusing on object recognition tasks, we wonder if for common benchmark datasets we can do better than random subsets of the data and find a subset that can generalize on par with the full dataset when trained on. To our knowledge, this is the first result that can find notable redundancies in CIFAR-10 and ImageNet datasets (at least 10 we observe semantic correlations between required and redundant images. We hope that our findings can motivate further research into identifying additional redundancies and exploiting them for more efficient training or data-collection.

READ FULL TEXT

page 2

page 4

page 5

page 7

page 10

page 11

research
04/26/2021

Balancing Constraints and Submodularity in Data Subset Selection

Deep learning has yielded extraordinary results in vision and natural la...
research
06/24/2016

Captioning Images with Diverse Objects

Recent captioning models are limited in their ability to scale and descr...
research
09/28/2021

A Strong Baseline for the VIPriors Data-Efficient Image Classification Challenge

Learning from limited amounts of data is the hallmark of intelligence, r...
research
12/08/2020

The Lottery Ticket Hypothesis for Object Recognition

Recognition tasks, such as object recognition and keypoint estimation, h...
research
05/22/2020

From ImageNet to Image Classification: Contextualizing Progress on Benchmarks

Building rich machine learning datasets in a scalable manner often neces...
research
02/17/2022

Visual Ground Truth Construction as Faceted Classification

Recent work in Machine Learning and Computer Vision has provided evidenc...
research
03/15/2021

How to distribute data across tasks for meta-learning?

Meta-learning models transfer the knowledge acquired from previous tasks...

Please sign up or login with your details

Forgot password? Click here to reset