Learning From Noisy Large-Scale Datasets With Minimal Supervision

01/06/2017
by Andreas Veit, et al.

We present an approach to effectively use millions of images with noisy annotations in conjunction with a small subset of cleanly annotated images to learn powerful image representations. One common approach to combining clean and noisy data is to first pre-train a network on the large noisy dataset and then fine-tune it on the clean dataset. We show that this approach does not fully leverage the information contained in the clean set. We therefore demonstrate how to use the clean annotations to reduce the noise in the large dataset before fine-tuning the network on both the clean set and the full set with reduced noise. The approach comprises a multi-task network that jointly learns to clean noisy annotations and to accurately classify images. We evaluate our approach on the recently released Open Images dataset, which contains 9 million images, multiple annotations per image, and over 6000 unique classes. For the small clean set of annotations, we use a quarter of the validation set, about 40k images. Our results demonstrate that the proposed approach clearly outperforms direct fine-tuning across all major categories of classes in the Open Images dataset. Further, our approach is particularly effective for a large number of classes with a wide range of noise in annotations (20-80% false positive annotations).
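To make the multi-task idea concrete, the following is a minimal Python sketch of how a joint objective of this kind could be assembled for a single image. It is an illustration under stated assumptions, not the authors' implementation: the function names (`clean_labels`, `multitask_loss`), the additive correction with clipping, the choice of an L1 loss for the cleaning task, and the binary cross-entropy loss for the classifier are all hypothetical choices made for this example. In a real system the `correction` and `class_probs` values would be produced by learned networks.

```python
import math

def clean_labels(noisy, correction):
    """Hypothetical cleaning step: add a predicted correction to the noisy
    multi-label vector and clip each entry to [0, 1]."""
    return [min(1.0, max(0.0, n + c)) for n, c in zip(noisy, correction)]

def l1_loss(pred, target):
    """Mean absolute difference between two label vectors."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(target)

def bce_loss(probs, target, eps=1e-7):
    """Mean binary cross-entropy over classes (multi-label setting)."""
    total = 0.0
    for p, t in zip(probs, target):
        p = min(1.0 - eps, max(eps, p))  # avoid log(0)
        total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total / len(target)

def multitask_loss(noisy, correction, class_probs, verified=None):
    """Assumed joint objective for one image.

    With verified labels (clean subset): supervise the cleaning output and
    the classifier with the verified labels.
    Without verified labels (noisy set): train the classifier against the
    cleaned labels, which are treated here as fixed targets.
    """
    cleaned = clean_labels(noisy, correction)
    if verified is not None:
        return l1_loss(cleaned, verified) + bce_loss(class_probs, verified)
    return bce_loss(class_probs, cleaned)
```

In this sketch, an image from the clean subset contributes both a cleaning term and a classification term, while an image with only noisy annotations contributes a classification term computed against its cleaned labels. That is one simple way to let a small clean set influence training on the full noisy set.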

research
06/10/2019

Learning to Segment Skin Lesions from Noisy Annotations

Deep convolutional neural networks have driven substantial advancements ...
research
03/31/2018

Iterative Learning with Open-set Noisy Labels

Large-scale datasets possessing clean label annotations are crucial for ...
research
12/06/2016

Tag Prediction at Flickr: a View from the Darkroom

Automated photo tagging has established itself as one of the most compel...
research
05/27/2021

Using Early-Learning Regularization to Classify Real-World Noisy Data

The memorization problem is well-known in the field of computer vision. ...
research
10/04/2021

An Empirical Investigation of Learning from Biased Toxicity Labels

Collecting annotations from human raters often results in a trade-off be...
research
04/07/2021

MultiScene: A Large-scale Dataset and Benchmark for Multi-scene Recognition in Single Aerial Images

Aerial scene recognition is a fundamental research problem in interpreti...
research
03/29/2022

Clean Implicit 3D Structure from Noisy 2D STEM Images

Scanning Transmission Electron Microscopes (STEMs) acquire 2D images of ...
