Cross-Modal Retrieval: A Systematic Review of Methods and Future Directions

08/28/2023
by Lei Zhu, et al.

With the rapid growth of multi-modal data, traditional uni-modal retrieval methods struggle to serve users who need access to data across modalities. Cross-modal retrieval has emerged to address this: it enables interaction across modalities, supports semantic matching, and exploits the complementarity and consistency between data of different modalities. Although earlier surveys have reviewed the field, they fall short in timeliness, taxonomy, and comprehensiveness. This paper reviews the evolution of cross-modal retrieval, from shallow statistical analysis techniques to vision-language pre-training models. Starting from a taxonomy grounded in machine learning paradigms, mechanisms, and models, it examines the principles and architectures underpinning existing cross-modal retrieval methods. It then gives an overview of widely used benchmarks, metrics, and performance results. Finally, the paper discusses the prospects and challenges facing contemporary cross-modal retrieval and potential directions for further progress in the field. To facilitate research on cross-modal retrieval, we develop an open-source code repository at https://github.com/BMC-SDNU/Cross-Modal-Retrieval.
