On Deep Representation Learning from Noisy Web Images

12/15/2015
by   Phong D. Vo, et al.

The ever-growing volume of Web images may be the next important data source for scaling up deep neural networks, which recently achieved great success in the ImageNet classification challenge and related tasks. This prospect, however, has not been validated on convolutional networks (convnets), one of the best-performing deep models, because of their supervised training regime. While unsupervised alternatives do not generalize learned models to new domains as well as convnets do, we use convnets for semi-supervised representation learning. Our approach is to use massive amounts of unlabeled and noisy Web images to train convnets as general feature detectors, despite challenges in the data such as a high level of mislabeling, outliers, and data biases. Extensive experiments are conducted at several data scales, with different network architectures and data reranking techniques. The learned representations are evaluated on nine public datasets covering various topics. The best results obtained by our convnets, trained on 3.14 million Web images, outperform AlexNet trained on the 1.2 million clean images of ILSVRC 2012 and are closing the gap with VGG-16. These promising results suggest a budget solution for using deep learning in practice and motivate more research in semi-supervised representation learning.

research
05/10/2020

Supervision and Source Domain Impact on Representation Learning: A Histopathology Case Study

As many algorithms depend on a suitable representation of data, learning...
research
01/01/2023

Trojaning semi-supervised learning model via poisoning wild images on the web

Wild images on the web are vulnerable to backdoor (also called trojan) p...
research
05/29/2015

CURL: Co-trained Unsupervised Representation Learning for Image Classification

In this paper we propose a strategy for semi-supervised image classifica...
research
05/22/2019

Data-Efficient Image Recognition with Contrastive Predictive Coding

Large scale deep learning excels when labeled images are abundant, yet d...
research
10/01/2021

SMATE: Semi-Supervised Spatio-Temporal Representation Learning on Multivariate Time Series

Learning from Multivariate Time Series (MTS) has attracted widespread at...
research
10/21/2019

Icentia11K: An Unsupervised Representation Learning Dataset for Arrhythmia Subtype Discovery

We release the largest public ECG dataset of continuous raw signals for ...
research
08/17/2022

How does the degree of novelty impacts semi-supervised representation learning for novel class retrieval?

Supervised representation learning with deep networks tends to overfit t...
