Refining Image Categorization by Exploiting Web Images and General Corpus

03/16/2017
by Yazhou Yao, et al.

Studies show that refining real-world categories into semantic subcategories contributes to better image modeling and classification. Previous image sub-categorization work, which relies on labeled images and WordNet's hierarchy, is not only labor-intensive but also restricted to classifying images into noun subcategories. To tackle these problems, we exploit general corpus information to automatically select web images and subsequently classify them into semantically rich (sub-)categories. We address two major challenges: 1) noise in the labels of subcategories derived from the general corpus; 2) noise in the labels of images retrieved from the web. Specifically, we first obtain the semantic refinement subcategories from the text perspective and remove noisy ones with a relevance-based approach. To suppress the noisy images induced by search errors, we then formulate image selection and classifier learning as a multi-class multi-instance learning problem and solve it with the cutting-plane algorithm. The experiments show significant performance gains from using the data generated by our approach on both image categorization and sub-categorization tasks. The proposed approach also consistently outperforms existing weakly supervised and web-supervised approaches.
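To make the multi-instance idea concrete, here is a minimal sketch of the image selection step only. It is not the authors' cutting-plane formulation: it assumes a set of already-trained linear classifiers (one weight vector per subcategory) and treats the images retrieved for one query as a "bag". The names `select_instances`, `predict_bag`, and `keep_ratio` are hypothetical and chosen for illustration.

```python
from typing import List, Sequence


def score(w: Sequence[float], x: Sequence[float]) -> float:
    """Linear score of feature vector x under weight vector w."""
    return sum(wi * xi for wi, xi in zip(w, x))


def select_instances(bag: List[Sequence[float]],
                     weights: List[Sequence[float]],
                     label: int,
                     keep_ratio: float = 0.5) -> List[int]:
    """Return indices of the instances in `bag` that best support `label`.

    Each instance is ranked by its margin: the score for `label` minus the
    best score among all other classes. Assumes at least two classes.
    """
    margins = []
    for i, x in enumerate(bag):
        scores = [score(w, x) for w in weights]
        rival = max(s for c, s in enumerate(scores) if c != label)
        margins.append((scores[label] - rival, i))
    margins.sort(reverse=True)
    k = max(1, int(len(bag) * keep_ratio))  # always keep at least one image
    return sorted(i for _, i in margins[:k])


def predict_bag(bag: List[Sequence[float]],
                weights: List[Sequence[float]]) -> int:
    """Label a bag with the class of its single highest-scoring instance."""
    best = max((score(w, x), c) for x in bag for c, w in enumerate(weights))
    return best[1]


# Toy data: two classes, 2-D features, four retrieved "images".
weights = [[1.0, 0.0], [0.0, 1.0]]
bag = [[2.0, 0.1], [0.2, 1.5], [1.0, 0.0], [0.1, 0.3]]
kept = select_instances(bag, weights, label=0)
bag_label = predict_bag(bag, weights)
```

In a full multi-instance learner, selection like this would alternate with re-training the classifiers on the kept instances; the abstract states that the authors instead solve the joint problem with a cutting-plane algorithm.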

research
08/22/2017

Towards Automatic Construction of Diverse, High-quality Image Dataset

The availability of labeled image datasets has been shown critical for h...
research
11/22/2016

Exploiting Web Images for Dataset Construction: A Domain Robust Approach

Labelled image datasets have played a critical role in high-level image ...
research
06/29/2016

How Many Folders Do You Really Need?

Email classification is still a mostly manual task. Consequently, most W...
research
02/21/2023

HierCat: Hierarchical Query Categorization from Weakly Supervised Data at Facebook Marketplace

Query categorization at customer-to-customer e-commerce platforms like F...
research
02/23/2021

Minimally-Supervised Structure-Rich Text Categorization via Learning on Text-Rich Networks

Text categorization is an essential task in Web content analysis. Consid...
research
01/19/2019

MOROCO: The Moldavian and Romanian Dialectal Corpus

In this work, we introduce the MOldavian and ROmanian Dialectal COrpus (...
research
05/05/2020

II-20: Intelligent and pragmatic analytic categorization of image collections

We introduce II-20 (Image Insight 2020), a multimedia analytics approach...
