VTC: Improving Video-Text Retrieval with User Comments

10/19/2022
by   Laura Hanu, et al.
2

Multi-modal retrieval is an important problem for many applications, such as recommendation and search. Current benchmarks and even datasets are often manually constructed and consist of mostly clean samples where all modalities are well-correlated with the content. Thus, current video-text retrieval literature largely focuses on video titles or audio transcripts, while ignoring user comments, since users often tend to discuss topics only vaguely related to the video. Despite the ubiquity of user comments online, there is currently no multi-modal representation learning datasets that includes comments. In this paper, we a) introduce a new dataset of videos, titles and comments; b) present an attention-based mechanism that allows the model to learn from sometimes irrelevant data such as comments; c) show that by using comments, our method is able to learn better, more contextualised, representations for image, video and audio representations. Project page: https://unitaryai.github.io/vtc-paper.

READ FULL TEXT

page 3

page 29

page 30

page 31

page 33

page 36

research
02/07/2020

Multimodal Matching Transformer for Live Commenting

Automatic live commenting aims to provide real-time comments on videos f...
research
07/23/2023

Multi-Modal Machine Learning for Assessing Gaming Skills in Online Streaming: A Case Study with CS:GO

Online streaming is an emerging market that address much attention. Asse...
research
02/09/2021

Fashion Focus: Multi-modal Retrieval System for Video Commodity Localization in E-commerce

Nowadays, live-stream and short video shopping in E-commerce have grown ...
research
04/26/2021

GPT2MVS: Generative Pre-trained Transformer-2 for Multi-modal Video Summarization

Traditional video summarization methods generate fixed video representat...
research
08/04/2021

The Potential of Using Vision Videos for CrowdRE: Video Comments as a Source of Feedback

Vision videos are established for soliciting feedback and stimulating di...
research
07/27/2022

AutoTransition: Learning to Recommend Video Transition Effects

Video transition effects are widely used in video editing to connect sho...
research
05/28/2017

Deep Learning for User Comment Moderation

Experimenting with a new dataset of 1.6M user comments from a Greek news...

Please sign up or login with your details

Forgot password? Click here to reset