Is Cross-modal Information Retrieval Possible without Training?

04/20/2023
by Hyunjin Choi, et al.

Encoded representations from a pretrained deep learning model (e.g., BERT text embeddings, penultimate CNN layer activations of an image) convey a rich set of features beneficial for information retrieval. Embeddings for a particular modality of data occupy a high-dimensional space of their own, but that space can be semantically aligned to another by a simple mapping, without training a deep neural net. In this paper, we take a simple mapping computed from least squares and singular value decomposition (SVD), as a solution to the Procrustes problem, to serve as a means of cross-modal information retrieval. That is, given information in one modality such as text, the mapping helps us locate a semantically equivalent data item in another modality such as image. Using off-the-shelf pretrained deep learning models, we have experimented with the aforementioned simple cross-modal mappings in tasks of text-to-image and image-to-text retrieval. Despite their simplicity, our mappings perform reasonably well, reaching the highest accuracy of 77%, comparable to those requiring costly neural net training and fine-tuning. We have improved the simple mappings by contrastive learning on the pretrained models. Contrastive learning can be thought of as properly biasing the pretrained encoders to enhance the cross-modal mapping quality. We have further improved the performance with a multilayer perceptron with gating (gMLP), a simple neural architecture.
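
The two training-free mappings named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' code: the function names, the synthetic "text" and "image" embeddings, the shared dimension `d`, and the cosine-similarity ranking are all assumptions made for illustration. The orthogonal Procrustes variant additionally assumes both embedding spaces have the same dimensionality.

```python
import numpy as np

def least_squares_map(X, Y):
    """Unconstrained W minimizing ||X W - Y||_F (hypothetical helper)."""
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W

def procrustes_map(X, Y):
    """Orthogonal W minimizing ||X W - Y||_F.

    With the SVD X^T Y = U S V^T, the minimizer is W = U V^T.
    Assumes X and Y have the same number of columns.
    """
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

def retrieve(queries, W, gallery, k=1):
    """Map queries into the gallery space and rank by cosine similarity."""
    mapped = queries @ W
    mapped = mapped / np.linalg.norm(mapped, axis=1, keepdims=True)
    gal = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    scores = mapped @ gal.T
    # Indices of the k most similar gallery items for each query.
    return np.argsort(-scores, axis=1)[:, :k]

# Synthetic stand-ins for paired embeddings (assumption: real embeddings
# would come from pretrained text and image encoders).
rng = np.random.default_rng(0)
d = 16
text = rng.normal(size=(200, d))
rotation, _ = np.linalg.qr(rng.normal(size=(d, d)))
image = text @ rotation + 0.01 * rng.normal(size=(200, d))

# Fit on the first 150 pairs, evaluate text-to-image retrieval on the rest.
W_ls = least_squares_map(text[:150], image[:150])
W_pr = procrustes_map(text[:150], image[:150])

top1 = retrieve(text[150:], W_pr, image[150:], k=1)[:, 0]
accuracy = float(np.mean(top1 == np.arange(50)))
print("top-1 accuracy on synthetic pairs:", accuracy)
```

Swapping the roles of `text` and `image` when fitting gives the image-to-text direction. On real embeddings of differing dimensionality, only the least-squares variant applies as written.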

research
02/13/2023

CLIP-RR: Improved CLIP Network for Relation-Focused Cross-Modal Information Retrieval

Relation-focused cross-modal information retrieval focuses on retrieving...
research
04/20/2022

Uncertainty-based Cross-Modal Retrieval with Probabilistic Representations

Probabilistic embeddings have proven useful for capturing polysemous wor...
research
02/03/2018

Modeling Text with Graph Convolutional Network for Cross-Modal Information Retrieval

Cross-modal information retrieval aims to find heterogeneous data of var...
research
03/22/2021

Retrieve Fast, Rerank Smart: Cooperative and Joint Approaches for Improved Cross-Modal Retrieval

Current state-of-the-art approaches to cross-modal retrieval process tex...
research
05/20/2021

More Than Just Attention: Learning Cross-Modal Attentions with Contrastive Constraints

Attention mechanisms have been widely applied to cross-modal tasks such ...
research
05/19/2018

Do Neural Network Cross-Modal Mappings Really Bridge Modalities?

Feed-forward networks are widely used in cross-modal applications to bri...
