AMC: Attention guided Multi-modal Correlation Learning for Image Search

04/03/2017
by Kan Chen, et al.

Given a user's query, traditional image search systems rank images according to their relevance to a single modality (e.g., image content or surrounding text). Nowadays, an increasing number of images on the Internet come with associated metadata in rich modalities (e.g., titles, keywords, tags, etc.), which can be exploited for better similarity measures with queries. In this paper, we leverage visual and textual modalities for image search by learning their correlation with the input query. Depending on the query's intent, an attention mechanism can be introduced to adaptively balance the importance of different modalities. We propose a novel Attention guided Multi-modal Correlation (AMC) learning method, which consists of a jointly learned hierarchy of intra- and inter-attention networks. Conditioned on the query's intent, intra-attention networks (i.e., a visual intra-attention network and a language intra-attention network) attend to informative parts within each modality; a multi-modal inter-attention network promotes the importance of the most query-relevant modalities. In experiments, we evaluate AMC models on the search logs of two real-world image search engines and show a significant boost in the ranking of user-clicked images in search results. Additionally, we extend AMC models to the caption ranking task on the COCO dataset and achieve competitive results compared with recent state-of-the-art methods.
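To make the intra/inter-attention hierarchy concrete, below is a minimal sketch of an AMC-style ranker. It is not the authors' implementation: the class names (IntraAttention, AMCRanker), the additive attention form, the layer dimensions, and the cosine-similarity scoring are all illustrative assumptions; the sketch only shows the general pattern of query-conditioned attention within each modality followed by query-conditioned fusion across modalities.

```python
# Minimal AMC-style sketch (illustrative, not the paper's exact architecture).
# Assumes pre-extracted image region features and keyword embeddings;
# all dimensions and module names are placeholder choices.
import torch
import torch.nn as nn
import torch.nn.functional as F

class IntraAttention(nn.Module):
    """Attend over items within one modality, conditioned on the query."""
    def __init__(self, d_query, d_item, d_common):
        super().__init__()
        self.q_proj = nn.Linear(d_query, d_common)
        self.i_proj = nn.Linear(d_item, d_common)
        self.score = nn.Linear(d_common, 1)

    def forward(self, query, items):
        # query: (B, d_query); items: (B, N, d_item)
        q = self.q_proj(query).unsqueeze(1)                 # (B, 1, d_common)
        k = self.i_proj(items)                              # (B, N, d_common)
        attn = self.score(torch.tanh(q + k)).squeeze(-1)    # (B, N)
        weights = F.softmax(attn, dim=-1)                   # per-item importance
        return (weights.unsqueeze(-1) * k).sum(dim=1)       # (B, d_common)

class AMCRanker(nn.Module):
    """Inter-attention fuses the modality embeddings, conditioned on the query."""
    def __init__(self, d_query, d_visual, d_text, d_common=256):
        super().__init__()
        self.query_proj = nn.Linear(d_query, d_common)
        self.visual_att = IntraAttention(d_query, d_visual, d_common)
        self.text_att = IntraAttention(d_query, d_text, d_common)
        self.inter_att = nn.Linear(d_query, 2)              # one weight per modality

    def forward(self, query, regions, keywords):
        q = self.query_proj(query)                          # (B, d_common)
        v = self.visual_att(query, regions)                 # visual embedding
        t = self.text_att(query, keywords)                  # textual embedding
        gates = F.softmax(self.inter_att(query), dim=-1)    # (B, 2) modality weights
        fused = gates[:, 0:1] * v + gates[:, 1:2] * t       # query-dependent fusion
        return F.cosine_similarity(q, fused, dim=-1)        # relevance score per image

# Usage: score candidate images for a query and rank by score;
# in practice the model would be trained with a ranking loss on click logs.
model = AMCRanker(d_query=300, d_visual=2048, d_text=300)
scores = model(torch.randn(4, 300), torch.randn(4, 36, 2048), torch.randn(4, 10, 300))
```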

Related research

05/06/2023 · Mixer: Image to Multi-Modal Retrieval Learning for Industrial Application
Cross-modal retrieval, where the query is an image and the doc is an ite...

04/20/2019 · Saliency-Guided Attention Network for Image-Sentence Matching
This paper studies the task of matching image and sentence, where learni...

09/30/2021 · Multi-Modal Sarcasm Detection Based on Contrastive Attention Mechanism
In the past decade, sarcasm detection has been intensively conducted in ...

10/13/2019 · Granular Multimodal Attention Networks for Visual Dialog
Vision and language tasks have benefited from attention. There have been...

09/25/2019 · Focus Your Attention: A Bidirectional Focal Attention Network for Image-Text Matching
Learning semantic correspondence between image and text is significant a...

06/17/2020 · Learning Colour Representations of Search Queries
Image search engines rely on appropriately designed ranking features tha...

06/11/2020 · Attention improves concentration when learning node embeddings
We consider the problem of predicting edges in a graph from node attribu...
