StacMR: Scene-Text Aware Cross-Modal Retrieval

12/08/2020
by   Andrés Mafla, et al.
0

Recent models for cross-modal retrieval have benefited from an increasingly rich understanding of visual scenes, afforded by scene graphs and object interactions to mention a few. This has resulted in an improved matching between the visual representation of an image and the textual representation of its caption. Yet, current visual representations overlook a key aspect: the text appearing in images, which may contain crucial information for retrieval. In this paper, we first propose a new dataset that allows exploration of cross-modal retrieval where images contain scene-text instances. Then, armed with this dataset, we describe several approaches which leverage scene text, including a better scene-text aware cross-modal retrieval method which uses specialized representations for text from the captions and text from the visual scene, and reconcile them in a common embedding space. Extensive experiments confirm that cross-modal retrieval approaches benefit from scene text and highlight interesting research questions worth exploring further. Dataset and code are available at http://europe.naverlabs.com/stacmr

READ FULL TEXT

page 1

page 4

page 15

page 16

page 17

page 18

research
11/17/2017

Look, Imagine and Match: Improving Textual-Visual Cross-Modal Retrieval with Generative Models

Textual-visual cross-modal retrieval has been a hot research topic in bo...
research
10/11/2019

Cross-modal Scene Graph Matching for Relationship-aware Image-Text Retrieval

Image-text retrieval of natural scenes has been a popular research topic...
research
03/31/2022

ViSTA: Vision and Scene Text Aggregation for Cross-Modal Retrieval

Visual appearance is considered to be the most important cue to understa...
research
04/04/2021

Scene Text Retrieval via Joint Text Detection and Similarity Learning

Scene text retrieval aims to localize and search all text instances from...
research
08/23/2018

Webly Supervised Joint Embedding for Cross-Modal Image-Text Retrieval

Cross-modal retrieval between visual data and natural language descripti...
research
01/12/2023

Scene-centric vs. Object-centric Image-Text Cross-modal Retrieval: A Reproducibility Study

Most approaches to cross-modal retrieval (CMR) focus either on object-ce...
research
01/10/2023

Pix2Map: Cross-modal Retrieval for Inferring Street Maps from Images

Self-driving vehicles rely on urban street maps for autonomous navigatio...

Please sign up or login with your details

Forgot password? Click here to reset