Towards Debiasing Frame Length Bias in Text-Video Retrieval via Causal Intervention

09/17/2023
by Burak Satar, et al.

Many studies in text-video retrieval focus on improving pretraining or developing new backbones. However, existing methods may suffer from learning and inference biases, as recent research on other text-video tasks suggests. For instance, spatial appearance features in action recognition and temporal object co-occurrences in video scene graph generation can induce spurious correlations. In this work, we present a unique and systematic study of a temporal bias caused by the frame length discrepancy between the training and test sets of trimmed video clips; to the best of our knowledge, this is the first such study for text-video retrieval. We first hypothesise how this bias would affect the model and verify the hypothesis with a baseline study. We then propose a causal debiasing approach and perform extensive experiments and ablation studies on the Epic-Kitchens-100, YouCook2, and MSR-VTT datasets. Our model outperforms the baseline and the state of the art (SOTA) on nDCG, a semantic-relevance-focused evaluation metric, which indicates that the bias is mitigated, and it also outperforms them on the other conventional metrics.
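To make the idea of a causal intervention on frame length concrete, the sketch below shows a generic backdoor adjustment, P(Y | do(X)) = Σ_z P(Y | X, z) P(z), with the frame-length bin z treated as the confounder. It is a hypothetical illustration, not the authors' formulation: the bin edges, the representative length per bin, the uniform frame resampling, the mean pooling, and the helper names (`length_prior`, `resample_frames`, `deconfounded_score`) are all assumptions made for the example. The prior P(z) is estimated from training-set frame counts, and the text-video score is averaged over strata instead of being tied to the clip's own length.

```python
import numpy as np

def length_prior(train_lengths, bin_edges):
    """Estimate P(z): fraction of training clips whose frame count falls in each bin."""
    counts, _ = np.histogram(train_lengths, bins=bin_edges)
    return counts / counts.sum()

def resample_frames(frames, num_frames):
    """Uniformly pick `num_frames` frame indices from a (T, D) feature array (repeats if upsampling)."""
    idx = np.linspace(0, len(frames) - 1, num_frames).round().astype(int)
    return frames[idx]

def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def deconfounded_score(text_emb, frames, stratum_lengths, prior):
    """Hypothetical backdoor-style score: sum_z P(z) * s(text, video | z).

    s(. | z) is the cosine similarity between the text embedding and the
    mean-pooled video features after resampling the clip to the
    representative length of stratum z.
    """
    per_stratum = np.array([
        cosine(text_emb, resample_frames(frames, n).mean(axis=0))
        for n in stratum_lengths
    ])
    return float(np.dot(prior, per_stratum)), per_stratum

# Toy usage with random features (illustrative data only).
rng = np.random.default_rng(0)
train_lengths = rng.integers(8, 200, size=1000)  # frame counts of training clips
bin_edges = [0, 32, 64, 128, 256]                # 4 frame-length strata
stratum_lengths = [16, 48, 96, 192]              # representative length per stratum
prior = length_prior(train_lengths, bin_edges)

frames = rng.normal(size=(120, 512))             # one test clip: 120 frames x 512-d features
text = rng.normal(size=512)                      # one caption embedding
score, per_stratum = deconfounded_score(text, frames, stratum_lengths, prior)
print(prior, per_stratum, score)
```

The design choice here is that the prior comes from the training distribution while each test clip is scored under every stratum, so a train/test mismatch in clip length no longer decides which stratum dominates the score.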
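nDCG (normalised discounted cumulative gain) rewards a ranking for placing semantically relevant items near the top, even when they are not the single ground-truth caption. This is why it is used as the semantic-relevance-focused metric in the evaluation. The sketch below uses the common linear-gain form, DCG = Σ_i rel_i / log2(i + 1), and divides by the DCG of the ideal ordering. It assumes that graded relevance values in [0, 1] are available for each query-item pair; whether the paper uses exactly this gain and discount is an assumption, and the function names are illustrative.

```python
import numpy as np

def dcg(relevances):
    """Discounted cumulative gain with linear gain: sum_i rel_i / log2(i + 1), positions i starting at 1."""
    rel = np.asarray(relevances, dtype=float)
    discounts = np.log2(np.arange(2, rel.size + 2))
    return float(np.sum(rel / discounts))

def ndcg(scores, relevances):
    """nDCG of the ranking induced by `scores` (higher = ranked first) against graded `relevances`."""
    order = np.argsort(-np.asarray(scores, dtype=float))
    ideal = dcg(np.sort(relevances)[::-1])
    return dcg(np.asarray(relevances, dtype=float)[order]) / ideal if ideal > 0 else 0.0

# One text query, three candidate videos with graded relevance to it.
relevances = [1.0, 0.5, 0.0]
print(ndcg([0.9, 0.5, 0.1], relevances))  # ideal ordering: 1.0
print(ndcg([0.1, 0.5, 0.9], relevances))  # reversed ordering: about 0.62
```

Unlike Recall@K, which counts only the exact match, the partially relevant second item still contributes to the score here, so a model that ranks near-matches well is credited for it.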
