RTIC: Residual Learning for Text and Image Composition using Graph Convolutional Network

04/07/2021
by Minchul Shin, et al.

In this paper, we study the compositional learning of images and text for image retrieval. The query consists of an image and a text that describes the desired modifications to that image; the goal is to retrieve the target image that satisfies the modifications while still resembling the query image, by composing information from both the text and image modalities. To accomplish this task, we propose a simple new architecture using skip connections that can effectively encode the errors between the source and target images in the latent space. Furthermore, we introduce a novel method that combines the graph convolutional network (GCN) with existing composition methods. We find that the combination consistently improves performance in a plug-and-play manner. We perform extensive experiments on several widely used datasets and achieve state-of-the-art scores on the task with our model. Because a small difference in training conditions can significantly affect final performance, we suggest a strict evaluation standard to ensure fair comparison. We release our implementation, including that of all the compared methods, for reproducibility.
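
As a rough illustration of the skip-connection idea described in the abstract, the sketch below composes a query embedding as the source image embedding plus a correction computed from the image and text features. This is not the authors' released implementation: the function names (`compose_residual`, `retrieve`), the two-layer fusion, the feature sizes, and the random weights are all assumptions made for illustration, and the GCN component is not covered.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale vectors to unit length along the last axis."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def compose_residual(img_feat, txt_feat, w1, w2):
    """Hypothetical residual composition (assumed form, not the paper's exact block).

    The fused image+text features predict a correction in the latent space;
    the skip connection adds that correction back to the source embedding.
    """
    fused = np.concatenate([img_feat, txt_feat], axis=-1)
    hidden = np.maximum(fused @ w1, 0.0)  # ReLU
    residual = hidden @ w2                # predicted source-to-target offset
    return img_feat + residual            # skip connection


def retrieve(query, gallery, k=3):
    """Return indices of the k gallery rows most cosine-similar to the query."""
    scores = l2_normalize(gallery) @ l2_normalize(query)
    return np.argsort(-scores)[:k]


# Toy setup: all sizes and weights are arbitrary assumptions.
rng = np.random.default_rng(0)
dim_img, dim_txt, dim_hidden = 8, 4, 16
w1 = rng.normal(scale=0.1, size=(dim_img + dim_txt, dim_hidden))
w2 = np.zeros((dim_hidden, dim_img))  # zero init: the block starts as identity

img_feat = rng.normal(size=dim_img)
txt_feat = rng.normal(size=dim_txt)
gallery = rng.normal(size=(5, dim_img))
gallery[2] = img_feat  # put the source image itself into the gallery

composed = compose_residual(img_feat, txt_feat, w1, w2)
top = retrieve(composed, gallery, k=1)
```

With the output projection initialised to zero, the composed query equals the source embedding, so the untrained sketch simply retrieves the source image; training would then only need to learn the offset toward the target rather than the full target embedding.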

research
06/19/2020

Compositional Learning of Image-Text Query for Image Retrieval

In this paper, we investigate the problem of retrieving images from a da...
research
09/17/2020

Image Retrieval for Structure-from-Motion via Graph Convolutional Network

Conventional image retrieval techniques for Structure-from-Motion (SfM) ...
research
03/08/2022

Image Search with Text Feedback by Additive Attention Compositional Learning

Effective image retrieval with text feedback stands to impact a range of...
research
12/18/2018

Composing Text and Image for Image Retrieval - An Empirical Odyssey

In this paper, we study the task of image retrieval, where the input que...
research
07/24/2021

Cycled Compositional Learning between Images and Text

We present an approach named the Cycled Composition Network that can mea...
research
05/17/2023

Self-Training Boosted Multi-Faceted Matching Network for Composed Image Retrieval

The composed image retrieval (CIR) task aims to retrieve the desired tar...
