End-to-end Knowledge Retrieval with Multi-modal Queries

06/01/2023
by Man Luo, et al.

We investigate knowledge retrieval with multi-modal queries, i.e., queries whose information is split across image and text inputs. This is a challenging task that differs from previous work on cross-modal retrieval. We curate a new dataset, ReMuQ, for benchmarking progress on this task. ReMuQ requires a system to retrieve knowledge from a large corpus by integrating content from both the text and the image parts of a query. We introduce a retriever model, "ReViz", that directly processes input text and images to retrieve relevant knowledge in an end-to-end fashion, without depending on intermediate modules such as object detectors or caption generators. We also introduce a new pretraining task that is effective for learning knowledge retrieval with multi-modal queries and that improves performance on downstream tasks. We demonstrate superior retrieval performance on two datasets (ReMuQ and OK-VQA) in zero-shot settings, as well as further improvements when fine-tuning on these datasets.
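The abstract does not describe ReViz's architecture, so the following is only a minimal sketch of the general idea of dense retrieval with a multi-modal query, not the authors' method. It assumes that text, image, and passage embeddings are already available as plain vectors, fuses the text and image embeddings by summing their normalized versions (a hypothetical stand-in for a learned joint encoder), and ranks corpus passages by inner product. All names, vectors, and passage IDs below are hypothetical toy values.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def fuse_query(text_emb: Sequence[float], image_emb: Sequence[float]) -> Vector:
    """Hypothetical fusion: sum the normalized text and image embeddings.

    This is an assumption for illustration only; an end-to-end retriever
    would instead encode the image and text jointly.
    """
    t = l2_normalize(text_emb)
    i = l2_normalize(image_emb)
    return l2_normalize([a + b for a, b in zip(t, i)])


def retrieve(query: Sequence[float], corpus: Dict[str, Vector], k: int) -> List[Tuple[str, float]]:
    """Return the top-k (passage_id, score) pairs ranked by cosine similarity."""
    q = l2_normalize(query)
    scored = [(pid, dot(q, l2_normalize(vec))) for pid, vec in corpus.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical 3-d toy embeddings: dim 0 ~ "diet", dim 1 ~ "panda", dim 2 ~ "habitat".
corpus = {
    "panda_diet": [1.0, 1.0, 0.0],
    "panda_habitat": [0.0, 1.0, 1.0],
    "diet_general": [1.0, 0.0, 1.0],
}
text_emb = [1.0, 0.0, 0.0]   # e.g. the question "What does this animal eat?"
image_emb = [0.0, 1.0, 0.0]  # e.g. a photo of a panda

fused_results = retrieve(fuse_query(text_emb, image_emb), corpus, k=2)
text_only_results = retrieve(text_emb, corpus, k=2)
print(fused_results)
print(text_only_results)
```

In this toy setup, the text alone scores "panda_diet" and "diet_general" equally, because the question never names the animal; only the fused query, which also carries the image information, ranks "panda_diet" clearly first. This mirrors, in a very simplified way, why ReMuQ requires integrating both parts of the query.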
