Cross-media Multi-level Alignment with Relation Attention Network

04/25/2018
by Jinwei Qi, et al.

With the rapid growth of multimedia data such as images and text, effectively correlating and retrieving data of different media types is a highly challenging problem. When correlating an image with a textual description, people naturally focus not only on the alignment between discriminative image regions and key words, but also on the relations within the visual and textual context. Relation understanding is essential for cross-media correlation learning, yet it is ignored by prior cross-media retrieval works. To address this issue, we propose the Cross-media Relation Attention Network (CRAN) with multi-level alignment. First, we propose a visual-language relation attention model to explore both fine-grained patches and their relations in different media types. We aim not only to exploit cross-media fine-grained local information, but also to capture the intrinsic relation information, which can provide complementary hints for correlation learning. Second, we propose cross-media multi-level alignment to explore global, local and relation alignments across different media types, which can mutually boost one another to learn more precise cross-media correlation. We conduct experiments on 2 cross-media datasets and compare with 10 state-of-the-art methods to verify the effectiveness of the proposed approach.
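To make the idea of multi-level alignment more concrete, the following is a minimal sketch of how global, local and relation similarities between an image and a text could be combined into one score. It is not the CRAN implementation: the abstract does not say how the three levels are scored or fused, so the cosine similarity, the best-match averaging, the equal default weights, and all function and key names (`cosine`, `set_similarity`, `multi_level_similarity`, `"global"`, `"local"`, `"relation"`) are assumptions made purely for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def set_similarity(xs, ys):
    """Average, over the vectors in xs, of the best cosine match in ys.

    Assumed stand-in for aligning fine-grained items (e.g. image regions
    with words, or visual relations with textual relations).
    """
    if not xs or not ys:
        return 0.0
    return sum(max(cosine(x, y) for y in ys) for x in xs) / len(xs)


def multi_level_similarity(image, text, weights=(1.0, 1.0, 1.0)):
    """Weighted mean of global, local and relation similarities.

    `image` and `text` are dicts with hypothetical keys:
      "global":   one feature vector for the whole instance
      "local":    list of vectors for fine-grained patches
      "relation": list of vectors describing relations between patches
    The weighting scheme is an assumption, not taken from the paper.
    """
    w_global, w_local, w_relation = weights
    s_global = cosine(image["global"], text["global"])
    s_local = set_similarity(image["local"], text["local"])
    s_relation = set_similarity(image["relation"], text["relation"])
    total = w_global + w_local + w_relation
    return (w_global * s_global + w_local * s_local + w_relation * s_relation) / total


if __name__ == "__main__":
    # Toy features only; real features would come from learned encoders.
    image = {"global": [1.0, 0.0], "local": [[1.0, 0.0], [0.0, 1.0]], "relation": [[1.0, 1.0]]}
    text = {"global": [1.0, 0.2], "local": [[0.9, 0.1]], "relation": [[0.5, 1.0]]}
    print(multi_level_similarity(image, text))
```

In a retrieval setting, a score of this kind would be computed between a query of one media type and every candidate of the other, and the candidates ranked by it. The point of the sketch is only that each level contributes a separate signal, which matches the abstract's claim that the three alignments are complementary.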

research
07/10/2019

A New Benchmark and Approach for Fine-grained Cross-media Retrieval

Cross-media retrieval is to return the results of various media types co...
research
06/11/2021

Step-Wise Hierarchical Alignment Network for Image-Text Matching

Image-text matching plays a central role in bridging the semantic gap be...
research
08/12/2020

Fine-grained Visual Textual Alignment for Cross-Modal Retrieval using Transformer Encoders

Despite the evolution of deep-learning-based visual-textual processing s...
research
10/17/2022

Cross-layer Attention Network for Fine-grained Visual Categorization

Learning discriminative representations for subtle localized details pla...
research
03/18/2016

Unsupervised Cross-Media Hashing with Structure Preservation

Recent years have seen the exponential growth of heterogeneous multimedi...
research
04/07/2017

An Overview of Cross-media Retrieval: Concepts, Methodologies, Benchmarks and Challenges

Multimedia retrieval plays an indispensable role in big data utilization...
research
07/23/2019

Position Focused Attention Network for Image-Text Matching

Image-text matching tasks have recently attracted a lot of attention in ...
