Image Search with Text Feedback by Additive Attention Compositional Learning

03/08/2022
by Yuxin Tian, et al.

Effective image retrieval with text feedback stands to impact a range of real-world applications, such as e-commerce. Given a source image and text feedback that describes the desired modifications to that image, the goal is to compose a multi-modal (image-text) query and retrieve the target images that resemble the source yet satisfy the given modifications. We propose a novel solution to this problem, Additive Attention Compositional Learning (AACL), which uses a multi-modal transformer-based architecture and effectively models the image-text contexts. Specifically, we propose a novel image-text composition module based on additive attention that can be seamlessly plugged into deep neural networks. We also introduce a new, challenging benchmark derived from the Shopping100k dataset. AACL is evaluated on three large-scale datasets (FashionIQ, Fashion200k, and Shopping100k), each with strong baselines. Extensive experiments show that AACL achieves new state-of-the-art results on all three datasets.
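To make the task setup concrete, the following is a minimal sketch of composing an image-text query with additive attention and ranking a gallery by cosine similarity. It is not the AACL module: the abstract does not give the architecture in detail, so the sketch assumes a generic Bahdanau-style additive attention, a residual sum as the composition step, and random vectors in place of learned features. All names (`additive_attention`, `compose`, `rank`, the parameter shapes) are hypothetical.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array.
    e = np.exp(x - x.max())
    return e / e.sum()

def additive_attention(query, keys, W_q, W_k, v):
    # Additive scoring (assumed form): score_i = v . tanh(W_q^T q + W_k^T k_i)
    scores = np.tanh(keys @ W_k + query @ W_q) @ v   # shape (n,)
    weights = softmax(scores)                        # attention over image regions
    return weights @ keys, weights                   # weighted sum of regions, shape (d,)

def compose(image_regions, text_vec, params):
    # Assumption: the text feedback attends over image regions and the result
    # is added back to the text vector to form the multi-modal query.
    attended, weights = additive_attention(text_vec, image_regions, *params)
    return attended + text_vec, weights

def rank(query_vec, gallery):
    # Rank gallery items by cosine similarity to the composed query (best first).
    q = query_vec / np.linalg.norm(query_vec)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    return np.argsort(-(g @ q))

rng = np.random.default_rng(0)
d, h, n = 8, 16, 5                                   # feature dim, hidden dim, regions
params = (rng.normal(size=(d, h)),                   # W_q (random stand-in for learned weights)
          rng.normal(size=(d, h)),                   # W_k
          rng.normal(size=h))                        # v
image_regions = rng.normal(size=(n, d))              # stand-in source-image features
text_vec = rng.normal(size=d)                        # stand-in text-feedback embedding

composed, weights = compose(image_regions, text_vec, params)
gallery = rng.normal(size=(10, d))                   # stand-in candidate target images
order = rank(composed, gallery)
print(order)
```

In a trained system the projection weights and the image and text encoders would be learned jointly so that the composed query lands near the target image's embedding; here they are random, so the printed ranking only illustrates the data flow.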

research
06/19/2020

Compositional Learning of Image-Text Query for Image Retrieval

In this paper, we investigate the problem of retrieving images from a da...
research
09/03/2020

TRACE: Transform Aggregate and Compose Visiolinguistic Representations for Image Search with Text Feedback

The ability to efficiently search for images over an indexed database is...
research
06/30/2020

Modality-Agnostic Attention Fusion for visual search with text feedback

Image retrieval with natural language feedback offers the promise of cat...
research
10/11/2022

ViLPAct: A Benchmark for Compositional Generalization on Multimodal Human Activities

We introduce ViLPAct, a novel vision-language benchmark for human activi...
research
08/31/2023

Learning with Multi-modal Gradient Attention for Explainable Composed Image Retrieval

We consider the problem of composed image retrieval that takes an input ...
research
04/07/2021

RTIC: Residual Learning for Text and Image Composition using Graph Convolutional Network

In this paper, we study the compositional learning of images and texts f...
research
04/20/2020

Transformer Reasoning Network for Image-Text Matching and Retrieval

Image-text matching is an interesting and fascinating task in modern AI ...
