Visual Semantic Reasoning for Image-Text Matching

09/06/2019
by Kunpeng Li, et al.

Image-text matching is an active research topic that bridges vision and language. It remains challenging because current image representations usually lack the global semantic concepts expressed in the corresponding text captions. To address this issue, we propose a simple and interpretable reasoning model that generates a visual representation capturing the key objects and semantic concepts of a scene. Specifically, we first build connections between image regions and perform reasoning with Graph Convolutional Networks to generate features with semantic relationships. Then, we use a gate and memory mechanism to perform global semantic reasoning on these relationship-enhanced features, select the discriminative information, and gradually generate the representation for the whole scene. Experiments validate that our method achieves a new state of the art for image-text matching on the MS-COCO and Flickr30K datasets. It outperforms the current best method by a relative 6.8% for image retrieval and a relative 4.8% for caption retrieval on MS-COCO (Recall@1 using the 1K test set). On Flickr30K, our model improves image retrieval by a relative 12.6% (Recall@1). Our code is available at https://github.com/KunpengLi1994/VSRN.
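
The abstract reports results as Recall@1 and as relative improvements over a baseline. As a rough illustration of how such numbers are typically computed, here is a minimal sketch in plain Python. It is not taken from the authors' repository: the function names `recall_at_k` and `relative_improvement`, the similarity-matrix layout, and the toy scores are all assumptions made for this example.

```python
from typing import List, Set


def recall_at_k(similarity: List[List[float]],
                ground_truth: List[Set[int]],
                k: int) -> float:
    """Percentage of queries with at least one correct candidate in the top k.

    similarity[i][j] is an assumed score between query i and candidate j
    (higher means a better match). ground_truth[i] holds the indices of
    the candidates that are correct for query i.
    """
    hits = 0
    for scores, correct in zip(similarity, ground_truth):
        # Rank candidate indices from highest to lowest score.
        ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
        if correct.intersection(ranked[:k]):
            hits += 1
    return 100.0 * hits / len(similarity)


def relative_improvement(new: float, baseline: float) -> float:
    """Relative gain of `new` over `baseline`, in percent."""
    return 100.0 * (new - baseline) / baseline


# Toy data (hypothetical): 3 text queries scored against 3 candidate images.
toy_similarity = [
    [0.9, 0.1, 0.3],  # query 0: image 0 ranks first
    [0.2, 0.4, 0.8],  # query 1: image 2 ranks first, image 1 second
    [0.5, 0.6, 0.1],  # query 2: image 1 ranks first
]
toy_ground_truth = [{0}, {1}, {1}]

r_at_1 = recall_at_k(toy_similarity, toy_ground_truth, k=1)
r_at_2 = recall_at_k(toy_similarity, toy_ground_truth, k=2)
print(f"Recall@1 = {r_at_1:.1f}, Recall@2 = {r_at_2:.1f}")
```

Under this definition, a "relative" gain divides the absolute difference in recall by the baseline recall, so a hypothetical move from 50.0 to 55.0 Recall@1 would be a 10% relative improvement even though the absolute gain is 5 points.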

research
03/21/2018

Stacked Cross Attention for Image-Text Matching

In this paper, we study the problem of image-text matching. Inferring th...
research
10/05/2022

Improving Visual-Semantic Embedding with Adaptive Pooling and Optimization Objective

Visual-Semantic Embedding (VSE) aims to learn an embedding space where r...
research
05/03/2023

A Neural Divide-and-Conquer Reasoning Framework for Image Retrieval from Linguistically Complex Text

Pretrained Vision-Language Models (VLMs) have achieved remarkable perfor...
research
06/16/2020

Exploiting Visual Semantic Reasoning for Video-Text Retrieval

Video retrieval is a challenging research topic bridging the vision and ...
research
07/01/2022

TopicFM: Robust and Interpretable Feature Matching with Topic-assisted

Finding correspondences across images is an important task in many visua...
research
07/17/2020

Consensus-Aware Visual-Semantic Embedding for Image-Text Matching

Image-text matching plays a central role in bridging vision and language...
research
04/07/2022

ECCV Caption: Correcting False Negatives by Collecting Machine-and-Human-verified Image-Caption Associations for MS-COCO

Image-Text matching (ITM) is a common task for evaluating the quality of...
