Similarity Reasoning and Filtration for Image-Text Matching

01/05/2021
by Haiwen Diao, et al.

Image-text matching plays a critical role in bridging vision and language, and great progress has been made by exploiting the global alignment between an image and a sentence, or the local alignments between regions and words. However, how to make the most of these alignments to infer more accurate matching scores is still underexplored. In this paper, we propose a novel Similarity Graph Reasoning and Attention Filtration (SGRAF) network for image-text matching. Specifically, vector-based similarity representations are first learned to characterize the local and global alignments more comprehensively, and then the Similarity Graph Reasoning (SGR) module, which relies on a graph convolutional neural network, is introduced to infer relation-aware similarities from both the local and global alignments. The Similarity Attention Filtration (SAF) module is further developed to integrate these alignments effectively by selectively attending to the significant and representative alignments while casting aside interference from non-meaningful alignments. We demonstrate the superiority of the proposed method by achieving state-of-the-art performance on the Flickr30K and MSCOCO datasets, and the good interpretability of the SGR and SAF modules through extensive qualitative experiments and analyses.
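The abstract names three pieces (vector-based similarity representations, a graph-reasoning module, and an attention-filtration module) without giving formulas. The NumPy sketch below is one plausible reading, not the authors' implementation. Every concrete choice is an assumption made for illustration: the squared-difference-then-project form of `sim_vector`, the word-to-region attention temperature `lam=9.0`, the softmax edge weights and ReLU update in `sgr`, the sigmoid gates and per-feature standardization in `saf`, the final sigmoid scoring, and the averaging of the two scores. All function and weight names are hypothetical, and the weights are random rather than trained.

```python
import numpy as np

def l2norm(x, eps=1e-8):
    """Scale vectors to unit L2 norm along the last axis."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sim_vector(x, y, W):
    """Vector-valued similarity (assumed form): project the squared element-wise
    difference with W (shape m x d), then L2-normalize."""
    return l2norm(((x - y) ** 2) @ W.T)

def similarity_vectors(regions, words, W_loc, W_glo, lam=9.0):
    """Stack one global and n_words local similarity vectors -> (1 + n_words, m)."""
    attn = softmax(lam * (l2norm(words) @ l2norm(regions).T), axis=1)  # word-to-region attention
    context = attn @ regions                      # attended visual context for each word
    local = sim_vector(words, context, W_loc)     # local (word-region) alignments
    glob = sim_vector(regions.mean(0), words.mean(0), W_glo)  # global image-sentence alignment
    return np.vstack([glob[None, :], local])

def sgr(S, W_in, W_out, W_g, steps=3):
    """Graph reasoning (hypothetical): each similarity vector is a node, edges come
    from a softmax over projected dot products, and nodes are updated by
    aggregating their neighbours. Returns the reasoned global node (row 0)."""
    for _ in range(steps):
        E = softmax((S @ W_in.T) @ (S @ W_out.T).T, axis=1)  # (N, N) edge weights
        S = np.maximum(0.0, (E @ S) @ W_g.T)                  # neighbour aggregation + ReLU
    return S[0]

def saf(S, w_att, b_att=0.0):
    """Attention filtration (hypothetical): gate each node with a sigmoid and take
    the gate-weighted average, so low-weight alignments contribute little."""
    Z = (S - S.mean(0)) / (S.std(0) + 1e-5)   # per-feature standardization (batch-norm stand-in)
    alpha = sigmoid(Z @ w_att + b_att)       # (N,) gate per alignment
    fused = (alpha[:, None] * S).sum(0) / alpha.sum()
    return fused, alpha

# Toy usage with random features and untrained weights (all sizes are arbitrary).
rng = np.random.default_rng(0)
d, m, n_regions, n_words = 16, 8, 5, 4
regions = rng.normal(size=(n_regions, d))
words = rng.normal(size=(n_words, d))
S = similarity_vectors(regions, words, rng.normal(size=(m, d)), rng.normal(size=(m, d)))

W_in, W_out, W_g = (0.3 * rng.normal(size=(m, m)) for _ in range(3))
score_sgr = float(sigmoid(rng.normal(size=m) @ sgr(S, W_in, W_out, W_g)))

fused, alpha = saf(S, rng.normal(size=m))
score_saf = float(sigmoid(rng.normal(size=m) @ fused))

score = 0.5 * (score_sgr + score_saf)  # averaging the two scores is an assumption
print(S.shape, round(score, 4))
```

The design point the sketch tries to convey is the contrast between the two modules: `sgr` lets every alignment exchange information with every other before a single node is scored, whereas `saf` keeps the alignments independent and only decides how much each one should count in the final average.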

Related research

06/11/2021
Step-Wise Hierarchical Alignment Network for Image-Text Matching
Image-text matching plays a central role in bridging the semantic gap be...

08/30/2019
Fashion Retrieval via Graph Reasoning Networks on a Similarity Pyramid
Matching clothing images from customers and online shopping stores has r...

06/26/2023
Hierarchical Matching and Reasoning for Multi-Query Image Retrieval
As a promising field, Multi-Query Image Retrieval (MQIR) aims at searchi...

11/17/2016
Instance-aware Image and Sentence Matching with Selective Multimodal LSTM
Effective image and sentence matching depends on how to well measure the...

06/04/2021
A Deep Local and Global Scene-Graph Matching for Image-Text Retrieval
Conventional approaches to image-text retrieval mainly focus on indexing...

05/20/2023
Bi-VLGM : Bi-Level Class-Severity-Aware Vision-Language Graph Matching for Text Guided Medical Image Segmentation
Medical reports with substantial information can be naturally complement...

02/20/2016
Text Matching as Image Recognition
Matching two texts is a fundamental problem in many natural language pro...