Image-text Retrieval: A Survey on Recent Research and Development

03/28/2022
by   Min Cao, et al.

In the past few years, cross-modal image-text retrieval (ITR) has attracted increasing interest in the research community because of its research value and broad real-world applications. ITR targets scenarios in which the queries come from one modality and the retrieval gallery comes from the other. This paper presents a comprehensive and up-to-date survey of ITR approaches from four perspectives. By dissecting an ITR system into two processes, feature extraction and feature alignment, we summarize recent advances in ITR approaches from these two perspectives. On top of this, efficiency-focused studies of ITR systems are introduced as the third perspective. To keep pace with current developments, we also provide a pioneering overview of cross-modal pre-training approaches to ITR as the fourth perspective. Finally, we outline the common benchmark datasets and evaluation metrics for ITR and compare the accuracy of representative ITR approaches. Some critical yet less-studied issues are discussed at the end of the paper.
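To make the retrieval setting concrete, here is a minimal sketch of text-to-image retrieval over precomputed embeddings. It is an illustration under stated assumptions, not a method from the survey: the embeddings are hand-made toy vectors, the alignment step is reduced to plain cosine similarity, and Recall@K is assumed as the evaluation metric because it is commonly used for retrieval tasks. The names `cosine`, `rank_gallery`, and `recall_at_k` are hypothetical helpers introduced only for this example.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_gallery(query: Sequence[float],
                 gallery: Sequence[Sequence[float]]) -> List[int]:
    """Return gallery indices ordered from most to least similar to the query."""
    scores = [cosine(query, item) for item in gallery]
    return sorted(range(len(gallery)), key=lambda i: scores[i], reverse=True)


def recall_at_k(queries: Sequence[Sequence[float]],
                gallery: Sequence[Sequence[float]],
                ground_truth: Sequence[int],
                k: int) -> float:
    """Fraction of queries whose true match appears among the top-k results."""
    hits = sum(
        1
        for query, target in zip(queries, ground_truth)
        if target in rank_gallery(query, gallery)[:k]
    )
    return hits / len(queries)


# Toy embeddings (assumed values, not produced by any real model).
# Queries come from the text modality, the gallery from the image modality.
text_queries = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.2], [0.6, 0.7, 0.0]]
image_gallery = [[0.9, 0.0, 0.1], [0.1, 0.9, 0.0], [0.0, 0.1, 1.0]]
ground_truth = [0, 1, 2]  # index of the matching image for each text query

for k in (1, 2, 3):
    print(f"Recall@{k} = {recall_at_k(text_queries, image_gallery, ground_truth, k):.3f}")
```

In a real system the two lists of vectors would come from the feature extraction stage (an image encoder and a text encoder), and the similarity function would be whatever the feature alignment stage learns. Swapping the roles of the two lists gives image-to-text retrieval.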


