Fine-grained Image Classification via Combining Vision and Language

04/10/2017
by Xiangteng He, et al.

Fine-grained image classification aims to recognize hundreds of sub-categories belonging to the same basic-level category, and is challenging because of large intra-class variance and small inter-class variance. Most existing fine-grained image classification methods learn part detection models to obtain semantic parts for better classification accuracy. Despite promising results, these methods have two main limitations: (1) not all the parts obtained through part detection models are beneficial or indispensable for classification, and (2) fine-grained image classification requires more detailed visual descriptions than part locations or attribute annotations can provide. To address these two limitations, this paper proposes a two-stream model combining vision and language (CVL) for learning latent semantic representations. The vision stream learns deep representations from the original visual information via a deep convolutional neural network. The language stream uses natural language descriptions, which can point out the discriminative parts or characteristics of each image, and provides a flexible and compact way of encoding the salient visual aspects for distinguishing sub-categories. Since the two streams are complementary, combining them further improves classification accuracy. In a comparison with 12 state-of-the-art methods on the widely used CUB-200-2011 dataset for fine-grained image classification, the experimental results demonstrate that our CVL approach achieves the best performance.
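The abstract does not say how the two streams are combined, so the following is only a minimal sketch under an assumed scheme: each stream produces per-class scores, the scores are turned into probabilities with a softmax, and the two distributions are averaged with a weight `alpha`. The function names, the weight `alpha`, and the fusion rule itself are hypothetical illustrations, not the method described in the paper.

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities (numerically stable)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_streams(vision_scores, language_scores, alpha=0.5):
    """Weighted average of the two streams' class probabilities.

    ASSUMPTION: late fusion by weighted averaging is used here purely for
    illustration; the abstract does not specify the combination rule.
    alpha = 1.0 uses only the vision stream, alpha = 0.0 only the language stream.
    """
    if len(vision_scores) != len(language_scores):
        raise ValueError("both streams must score the same set of classes")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    p_vision = softmax(vision_scores)
    p_language = softmax(language_scores)
    return [alpha * v + (1.0 - alpha) * l for v, l in zip(p_vision, p_language)]


def predict(vision_scores, language_scores, alpha=0.5):
    """Return the index of the class with the highest fused probability."""
    fused = fuse_streams(vision_scores, language_scores, alpha)
    return max(range(len(fused)), key=fused.__getitem__)
```

For example, `predict([2.0, 1.0, 0.1], [0.5, 3.0, 0.2])` fuses two made-up score vectors over three sub-categories. In a sketch like this, `alpha` would typically be chosen on a validation set.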


