Fine-grained Image Classification and Retrieval by Combining Visual and Locally Pooled Textual Features

01/14/2020
by Andrés Mafla, et al.

Text contained in an image carries high-level semantics that can be exploited to achieve richer image understanding. In particular, the mere presence of text provides strong guiding content that can be used to tackle a variety of computer vision tasks such as image retrieval, fine-grained classification, and visual question answering. In this paper, we address fine-grained classification and image retrieval by leveraging textual information along with visual cues to capture the intrinsic relation between the two modalities. The novelty of the proposed model lies in using a PHOC descriptor to construct a bag of textual words, together with a Fisher Vector encoding that captures the morphology of text. This approach provides a stronger multimodal representation and, as our experiments demonstrate, achieves state-of-the-art results on two tasks: fine-grained classification and image retrieval.
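
To make the PHOC descriptor mentioned in the abstract more concrete, the following is a minimal sketch of a pyramidal histogram of characters for a single word. It is not the authors' implementation: the function name `phoc`, the 36-symbol alphabet, the pyramid levels (2, 3, 4, 5), and the omission of bigram bins are all assumptions made for illustration. A character is assigned to a region when at least half of the span it occupies in the word falls inside that region.

```python
import string

# Assumed alphabet (lowercase letters and digits) and assumed pyramid levels.
ALPHABET = string.ascii_lowercase + string.digits
LEVELS = (2, 3, 4, 5)


def phoc(word, alphabet=ALPHABET, levels=LEVELS):
    """Return a binary PHOC-style vector for `word` (unigrams only, illustrative)."""
    word = word.lower()
    n = len(word)
    vec = []
    for level in levels:
        for region in range(level):
            bits = [0] * len(alphabet)
            # Work in integer units of 1 / (n * level) to avoid float rounding.
            r_start, r_end = region * n, (region + 1) * n
            for i, ch in enumerate(word):
                idx = alphabet.find(ch)
                if idx < 0:
                    continue  # symbols outside the assumed alphabet are skipped
                c_start, c_end = i * level, (i + 1) * level
                overlap = min(c_end, r_end) - max(c_start, r_start)
                # The character spans `level` units; keep it if >= 50% overlaps.
                if 2 * overlap >= level:
                    bits[idx] = 1
            vec.extend(bits)
    return vec


if __name__ == "__main__":
    v = phoc("bakery")
    print(len(v), sum(v))
```

Because each bit records which character occurs in which part of the word, words with similar spelling share many bits, while anagrams such as "ab" and "ba" produce different vectors. With the assumptions above the vector has 36 × (2 + 3 + 4 + 5) = 504 dimensions.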

research
09/21/2020

Multi-Modal Reasoning Graph for Scene-Text Based Fine-Grained Image Classification and Retrieval

Scene text instances found in natural images carry explicit semantic inf...
research
02/09/2021

Telling the What while Pointing the Where: Fine-grained Mouse Trace and Language Supervision for Improved Image Retrieval

Existing image retrieval systems use text queries to provide a natural a...
research
08/27/2018

Single Shot Scene Text Retrieval

Textual information found in scene images provides high level semantic i...
research
05/23/2018

Neural Network Interpretation via Fine Grained Textual Summarization

Current visualization based network interpretation methods suffer from la...
research
04/18/2016

Selective Convolutional Descriptor Aggregation for Fine-Grained Image Retrieval

Deep convolutional neural network models pre-trained for the ImageNet cl...
research
10/29/2014

A comparison of dense region detectors for image search and fine-grained classification

We consider a pipeline for image classification or search based on codin...
research
09/18/2022

ERNIE-mmLayout: Multi-grained MultiModal Transformer for Document Understanding

Recent efforts of multimodal Transformers have improved Visually Rich Do...
