Deep Music Retrieval for Fine-Grained Videos by Exploiting Cross-Modal-Encoded Voice-Overs

04/21/2021
by   Tingtian Li, et al.
0

Recently, the witness of the rapidly growing popularity of short videos on different Internet platforms has intensified the need for a background music (BGM) retrieval system. However, existing video-music retrieval methods only based on the visual modality cannot show promising performance regarding videos with fine-grained virtual contents. In this paper, we also investigate the widely added voice-overs in short videos and propose a novel framework to retrieve BGM for fine-grained short videos. In our framework, we use the self-attention (SA) and the cross-modal attention (CMA) modules to explore the intra- and the inter-relationships of different modalities respectively. For balancing the modalities, we dynamically assign different weights to the modal features via a fusion gate. For paring the query and the BGM embeddings, we introduce a triplet pseudo-label loss to constrain the semantics of the modal embeddings. As there are no existing virtual-content video-BGM retrieval datasets, we build and release two virtual-content video datasets HoK400 and CFM400. Experimental results show that our method achieves superior performance and outperforms other state-of-the-art methods with large margins.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
08/10/2019

Audio-Visual Embedding for Cross-Modal MusicVideo Retrieval through Supervised Deep CCA

Deep learning has successfully shown excellent performance in learning j...
research
02/18/2023

SSVMR: Saliency-based Self-training for Video-Music Retrieval

With the rise of short videos, the demand for selecting appropriate back...
research
09/11/2018

FIVR: Fine-grained Incident Video Retrieval

This paper introduces the problem of Fine-grained Incident Video Retriev...
research
05/25/2022

You Need to Read Again: Multi-granularity Perception Network for Moment Retrieval in Videos

Moment retrieval in videos is a challenging task that aims to retrieve t...
research
11/24/2019

A Proposal-based Approach for Activity Image-to-Video Retrieval

Activity image-to-video retrieval task aims to retrieve videos containin...
research
08/12/2023

Contrastive Learning for Cross-modal Artist Retrieval

Music retrieval and recommendation applications often rely on content fe...
research
04/04/2018

Fine-grained Video Attractiveness Prediction Using Multimodal Deep Learning on a Large Real-world Dataset

Nowadays, billions of videos are online ready to be viewed and shared. A...

Please sign up or login with your details

Forgot password? Click here to reset