Unified Pretraining Target Based Video-music Retrieval With Music Rhythm And Video Optical Flow Information

09/18/2023
by Tianjun Mao, et al.

Background music (BGM) can enhance a video's emotional impact. However, selecting an appropriate BGM often requires domain knowledge, which has led to the development of video-music retrieval techniques. Most existing approaches use pretrained video and music feature extractors, trained with different target sets, to obtain averaged video-level and music-level embeddings. This has two drawbacks. First, using different target sets for video and music pretraining may make the resulting embeddings difficult to match. Second, the underlying temporal correlation between video and music is ignored. In this paper, the proposed approach uses a unified target set for both video and music pretraining and produces clip-level embeddings to preserve temporal information. The downstream cross-modal matching is based on these clip-level features, with music rhythm and optical flow information embedded. Experiments show that the proposed method outperforms state-of-the-art methods by a significant margin.
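To illustrate why averaged embeddings discard temporal correlation, here is a minimal sketch. It is not the paper's model: the abstract does not specify the matching function, so the cosine-based scores, the function names, and the random toy embeddings are all assumptions made for illustration. The sketch compares a score computed from time-averaged embeddings with one computed clip by clip, then reverses the music in time.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def pooled_similarity(video_clips, music_clips):
    """Score from embeddings averaged over time (temporal order is lost)."""
    return cosine(video_clips.mean(axis=0), music_clips.mean(axis=0))

def clip_level_similarity(video_clips, music_clips):
    """Mean of per-clip scores, pairing clip t of the video with clip t of the music."""
    sims = [cosine(v, m) for v, m in zip(video_clips, music_clips)]
    return float(np.mean(sims))

# Hypothetical toy data: 4 clips, 8-dimensional embeddings.
rng = np.random.default_rng(0)
video = rng.normal(size=(4, 8))
music = video + 0.1 * rng.normal(size=(4, 8))  # music aligned with the video in time
music_reversed = music[::-1]                   # same clips, reversed temporal order

print("pooled, aligned:     ", pooled_similarity(video, music))
print("pooled, reversed:    ", pooled_similarity(video, music_reversed))
print("clip-level, aligned: ", clip_level_similarity(video, music))
print("clip-level, reversed:", clip_level_similarity(video, music_reversed))
```

In this toy setup the pooled score is the same for the aligned and the reversed music, because averaging over time is order-invariant, whereas the clip-level score drops once the temporal alignment is broken.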

research
08/24/2023

Emotion-Aligned Contrastive Learning Between Images and Music

Traditional music search engines rely on retrieval methods that match na...
research
11/16/2022

Video-Music Retrieval: A Dual-Path Cross-Modal Network

We propose a method to recommend background music for videos. Current wo...
research
02/18/2023

SSVMR: Saliency-based Self-training for Video-Music Retrieval

With the rise of short videos, the demand for selecting appropriate back...
research
08/07/2022

Debiased Cross-modal Matching for Content-based Micro-video Background Music Recommendation

Micro-video background music recommendation is a complicated task where ...
research
06/14/2022

It's Time for Artistic Correspondence in Music and Video

We present an approach for recommending a music track for a given video,...
research
11/01/2021

A Novel 1D State Space for Efficient Music Rhythmic Analysis

Inferring music time structures has a broad range of applications in mus...
research
07/15/2021

Cross-modal Variational Auto-encoder for Content-based Micro-video Background Music Recommendation

In this paper, we propose a cross-modal variational auto-encoder (CMVAE)...
