SSVMR: Saliency-based Self-training for Video-Music Retrieval

02/18/2023
by Xuxin Cheng, et al.

With the rise of short videos, the demand for selecting appropriate background music (BGM) for a video has increased significantly, and the video-music retrieval (VMR) task has gradually drawn attention from the research community. Like other cross-modal learning tasks, existing VMR approaches usually measure the similarity between the video and the music in a feature space. However, they (1) neglect the inevitable label noise and (2) do little to strengthen the model's ability to capture critical video clips. In this paper, we propose a novel saliency-based self-training framework, termed SSVMR. Specifically, we first make fuller use of the information contained in the training dataset by applying a semi-supervised method, in which a self-training approach is adopted, to suppress the adverse impact of label noise. In addition, we propose to capture the saliency of the video by mixing two videos at the span level while preserving the locality of the two original videos. Inspired by back translation in NLP, we also conduct back retrieval to obtain more training data. Experimental results on the MVD dataset show that SSVMR achieves state-of-the-art performance by a large margin, obtaining a relative improvement of 34.8
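As a rough illustration of what "mixing two videos at span level" could look like, the sketch below replaces one contiguous span of frame features in one video with the frames at the same positions in another, so each frame keeps its original temporal position. This is not the authors' implementation: the abstract does not give the details, so the function name `span_mix`, the `span_ratio` parameter, the `(frames, feature_dim)` feature layout, and the random choice of span position are all assumptions made for the example.

```python
import numpy as np


def span_mix(video_a, video_b, span_ratio=0.5, rng=None):
    """Mix two videos by swapping in one contiguous span of frames.

    Hypothetical helper: both inputs are arrays of per-frame features with
    shape (num_frames, feature_dim). Frames inside the chosen span come from
    video_b, all other frames come from video_a. Positions are not shuffled,
    so the temporal locality of both sources is preserved.
    """
    if video_a.shape != video_b.shape:
        raise ValueError("both videos must have the same shape")
    rng = np.random.default_rng() if rng is None else rng

    num_frames = video_a.shape[0]
    # Span length is an assumed fraction of the clip, clamped to a valid range.
    span_len = max(0, min(num_frames, int(round(num_frames * span_ratio))))
    # Pick a random start so that the span fits inside the clip.
    start = int(rng.integers(0, num_frames - span_len + 1))

    # Boolean mask over frames: True where the frame is taken from video_b.
    from_b = np.zeros(num_frames, dtype=bool)
    from_b[start:start + span_len] = True

    mixed = np.where(from_b[:, None], video_b, video_a)
    return mixed, from_b


if __name__ == "__main__":
    a = np.zeros((8, 4))
    b = np.ones((8, 4))
    mixed, from_b = span_mix(a, b, span_ratio=0.5, rng=np.random.default_rng(0))
    print(from_b.astype(int))
```

The returned mask records which frames came from which video; a training objective could use it as a per-frame source label, though whether SSVMR does so is not stated in the abstract.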

research
04/21/2021

Deep Music Retrieval for Fine-Grained Videos by Exploiting Cross-Modal-Encoded Voice-Overs

Recently, the witness of the rapidly growing popularity of short videos ...
research
03/22/2023

VMCML: Video and Music Matching via Cross-Modality Lifting

We propose a content-based system for matching video and background musi...
research
12/07/2021

STC-mix: Space, Time, Channel mixing for Self-supervised Video Representation

Contrastive representation learning of videos highly relies on the avail...
research
09/18/2023

Unified Pretraining Target Based Video-music Retrieval With Music Rhythm And Video Optical Flow Information

Background music (BGM) can enhance the video's emotion. However, selecti...
research
11/16/2022

Video-Music Retrieval:A Dual-Path Cross-Modal Network

We propose a method to recommend background music for videos. Current wo...
research
11/19/2020

Watch and Learn: Mapping Language and Noisy Real-world Videos with Self-supervision

In this paper, we teach machines to understand visuals and natural langu...
research
03/03/2023

AutoMatch: A Large-scale Audio Beat Matching Benchmark for Boosting Deep Learning Assistant Video Editing

The explosion of short videos has dramatically reshaped the manners peop...