Position Focused Attention Network for Image-Text Matching

07/23/2019
by Yaxiong Wang, et al.

Image-text matching has recently attracted considerable attention in the computer vision field. The key to this cross-domain problem is accurately measuring the similarity between visual and textual content, which demands a fine-grained understanding of both modalities. In this paper, we propose a novel position focused attention network (PFAN) to investigate the relation between the visual and the textual views. We integrate the object position clue to enhance visual-text joint-embedding learning. We first split each image into blocks, from which we infer the relative position of each region in the image. Then, an attention mechanism is proposed to model the relations between the image regions and the blocks and to generate a valuable position feature, which is further used to enhance the region representation and to model a more reliable relationship between the image and the sentence. Experiments on the popular datasets Flickr30K and MS-COCO show the effectiveness of the proposed method. Besides the public datasets, we also conduct experiments on our collected practical large-scale news dataset (Tencent-News) to validate the practical application value of the proposed method. As far as we know, this is the first attempt to test performance in a practical application. Our method achieves state-of-the-art performance on all three datasets.
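
The block-and-attention idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the function names, the k x k grid, the use of overlap area to pick candidate blocks, and the dot-product-plus-log-overlap scoring are all assumptions made for illustration, since the abstract does not specify these details. In a trained model the block embeddings would be learned; here they are plain lists.

```python
import math


def block_overlaps(box, width, height, k):
    """Overlap area of `box` with each cell of a k x k grid (row-major).

    `box` is (x1, y1, x2, y2) in pixels. Hypothetical helper for illustration.
    """
    x1, y1, x2, y2 = box
    bw, bh = width / k, height / k
    overlaps = []
    for row in range(k):
        for col in range(k):
            cx1, cy1 = col * bw, row * bh
            cx2, cy2 = cx1 + bw, cy1 + bh
            w = max(0.0, min(x2, cx2) - max(x1, cx1))
            h = max(0.0, min(y2, cy2) - max(y1, cy1))
            overlaps.append(w * h)
    return overlaps


def softmax(scores):
    """Numerically stable softmax over a non-empty list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def position_feature(region_vec, block_embeddings, overlaps):
    """Attend over the embeddings of blocks that the region overlaps.

    Assumed scoring: dot(region, block) + log(overlap fraction).
    Returns (position feature, {block index: attention weight}).
    """
    dim = len(region_vec)
    idx = [i for i, area in enumerate(overlaps) if area > 0]
    if not idx:  # region lies outside the image: no position signal
        return [0.0] * dim, {}
    total = sum(overlaps)
    scores = [
        sum(r * b for r, b in zip(region_vec, block_embeddings[i]))
        + math.log(overlaps[i] / total)
        for i in idx
    ]
    weights = softmax(scores)
    feature = [0.0] * dim
    for w, i in zip(weights, idx):
        for d in range(dim):
            feature[d] += w * block_embeddings[i][d]
    return feature, dict(zip(idx, weights))


# Usage: a 100 x 100 image, a 2 x 2 grid, and a region centred in the image.
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
areas = block_overlaps((25, 25, 75, 75), 100, 100, 2)
feature, attention = position_feature([0.0, 0.0], embeddings, areas)
```

The resulting position feature could then be concatenated with, or added to, the region's visual feature before matching against the sentence; which combination works best is left open here.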

research
01/11/2020

MHSAN: Multi-Head Self-Attention Network for Visual Semantic Embedding

Visual-semantic embedding enables various tasks such as image-text retri...
research
03/11/2021

Duplex Contextual Relation Network for Polyp Segmentation

Polyp segmentation is of great importance in the early diagnosis and tre...
research
11/09/2022

Automated MRI Field of View Prescription from Region of Interest Prediction by Intra-stack Attention Neural Network

Manual prescription of the field of view (FOV) by MRI technologists is v...
research
06/10/2020

Interpretable Multimodal Learning for Intelligent Regulation in Online Payment Systems

With the explosive growth of transaction activities in online payment sy...
research
06/17/2019

ParNet: Position-aware Aggregated Relation Network for Image-Text matching

Exploring fine-grained relationship between entities (e.g. objects in ima...
research
04/25/2018

Cross-media Multi-level Alignment with Relation Attention Network

With the rapid growth of multimedia data, such as image and text, it is ...
research
09/01/2020

SPAN: Spatial Pyramid Attention Network for Image Manipulation Localization

We present a novel framework, Spatial Pyramid Attention Network (SPAN) f...
