Simple Baselines for Interactive Video Retrieval with Questions and Answers

08/21/2023
by Kaiqu Liang, et al.

To date, the majority of video retrieval systems have been optimized for a "single-shot" scenario in which the user submits a query in isolation, ignoring previous interactions with the system. Recently, there has been renewed interest in interactive systems to enhance retrieval, but existing approaches are complex and deliver limited gains in performance. In this work, we revisit this topic and propose several simple yet effective baselines for interactive video retrieval via question-answering. We employ a VideoQA model to simulate user interactions and show that this enables the productive study of the interactive retrieval task without access to ground truth dialogue data. Experiments on MSR-VTT, MSVD, and AVSD show that our framework using question-based interaction significantly improves the performance of text-based video retrieval systems.
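The interaction loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the bag-of-words scorer, the caption corpus, the canned question list, and the names `rank`, `simulated_answer`, and `interactive_retrieve` are all hypothetical stand-ins, assumed here only to show how answers from a simulated user might refine a text query round by round. A real system would replace the scorer with a text-video retrieval model and the lookup table with a VideoQA model.

```python
from collections import Counter
from math import sqrt


def embed(text):
    """Toy bag-of-words embedding (assumption: stands in for a learned encoder)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank(query, captions):
    """Return video ids ordered from most to least similar to the query."""
    q = embed(query)
    return sorted(captions, key=lambda vid: cosine(q, embed(captions[vid])), reverse=True)


def simulated_answer(question, target_id, facts):
    """Hypothetical stand-in for a VideoQA model answering about the target video."""
    return facts[target_id].get(question, "")


def interactive_retrieve(query, target_id, captions, facts, questions):
    """Append one simulated answer to the query per round; record the target's rank."""
    ranks = [rank(query, captions).index(target_id) + 1]
    for question in questions:
        answer = simulated_answer(question, target_id, facts)
        query = f"{query} {answer}".strip()
        ranks.append(rank(query, captions).index(target_id) + 1)
    return ranks


# Illustrative data only: captions stand in for video content.
captions = {
    "v1": "a man cooking pasta in a kitchen",
    "v2": "a man playing guitar on a stage",
    "v3": "a man riding a horse in a field",
}
facts = {
    "v2": {
        "what is the man doing": "playing guitar",
        "where is he": "on a stage",
    }
}
questions = ["what is the man doing", "where is he"]

# Rank of the target video before any question and after each round.
ranks = interactive_retrieve("a man", "v2", captions, facts, questions)
print(ranks)
```

In this toy setup the vague initial query "a man" cannot separate the three videos, and each simulated answer adds words that pull the target toward the top of the ranking. Because the answers come from a model (here, a lookup table) rather than from recorded conversations, the loop can be evaluated without ground-truth dialogue data, which is the point the abstract makes about simulating users.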
