Graph Structured Network for Image-Text Matching

04/01/2020
by Chunxiao Liu, et al.

Image-text matching has received growing interest because it bridges vision and language. The key challenge lies in learning the correspondence between image and text. Existing works learn coarse correspondence based on object co-occurrence statistics but fail to learn fine-grained phrase correspondence. In this paper, we present a novel Graph Structured Matching Network (GSMN) to learn fine-grained correspondence. GSMN explicitly models objects, relations and attributes as a structured phrase, which not only allows it to learn the correspondence of objects, relations and attributes separately, but also helps it learn the fine-grained correspondence of structured phrases. This is achieved by node-level matching and structure-level matching. Node-level matching associates each node with its relevant nodes from the other modality, where a node can be an object, a relation or an attribute. The associated nodes then jointly infer fine-grained correspondence by fusing neighborhood associations in structure-level matching. Comprehensive experiments show that GSMN outperforms state-of-the-art methods on benchmarks, with relative Recall@1 improvements of nearly 7 [...]. Code will be released at: https://github.com/CrossmodalGroup/GSMN.
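
The two matching stages named in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not say how similarities are computed or how neighborhoods are fused, so the cosine similarity, the softmax attention with a `temperature` parameter, the mean fusion, and all function names below are assumptions chosen for illustration. Nodes are plain feature vectors, and `adjacency` is a hypothetical dictionary mapping a node index to the indices of its neighbors.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def softmax(xs, temperature=1.0):
    """Numerically stable softmax; a lower temperature gives sharper weights."""
    m = max(xs)
    exps = [math.exp((x - m) / temperature) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def node_level_matching(nodes, other_nodes, temperature=0.1):
    """Associate each node with the other modality's nodes (assumed scheme).

    Each node attends over all nodes of the other modality, and its score is
    the cosine similarity between the node and the attention-weighted mix.
    """
    dim = len(other_nodes[0])
    scores = []
    for u in nodes:
        weights = softmax([cosine(u, v) for v in other_nodes], temperature)
        mixed = [sum(w * v[d] for w, v in zip(weights, other_nodes))
                 for d in range(dim)]
        scores.append(cosine(u, mixed))
    return scores


def structure_level_matching(scores, adjacency):
    """Fuse each node's score with its neighbors' scores (assumed: plain mean)."""
    fused = []
    for i, score in enumerate(scores):
        group = [score] + [scores[j] for j in adjacency.get(i, [])]
        fused.append(sum(group) / len(group))
    return fused


def graph_similarity(nodes, adjacency, other_nodes):
    """Overall similarity: average of the fused node scores."""
    node_scores = node_level_matching(nodes, other_nodes)
    fused = structure_level_matching(node_scores, adjacency)
    return sum(fused) / len(fused)


# Toy data: two text nodes (say, an object and its attribute) linked by an edge,
# compared against two candidate sets of image-region features.
text_nodes = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
text_edges = {0: [1], 1: [0]}
matching_regions = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0]]
unrelated_regions = [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]

print(graph_similarity(text_nodes, text_edges, matching_regions))
print(graph_similarity(text_nodes, text_edges, unrelated_regions))
```

In this sketch, a node whose neighbors also found good matches keeps a high fused score, while a node that matched in isolation is pulled down, which is one simple way to read "fusing neighborhood associations". A learned model would replace the fixed vectors and the mean with trained components.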
