Contrastive Unsupervised Learning for Audio Fingerprinting

10/26/2020
by Zhesong Yu, et al.

The rise of video-sharing platforms has led more and more people to shoot videos and upload them to the Internet. These videos mostly contain a carefully edited background audio track that may involve serious speed change, pitch shifting, and various other audio effects, so existing audio identification systems may fail to recognize the audio. To solve this problem, we introduce contrastive learning to the task of audio fingerprinting (AFP). Contrastive learning is an unsupervised approach to learning representations that group similar samples together and discriminate dissimilar ones. In our work, we treat an audio track and its differently distorted versions as similar, and different audio tracks as dissimilar. Based on the momentum contrast (MoCo) framework, we devise a contrastive learning method for AFP that generates fingerprints that are both discriminative and robust. A set of experiments shows that our AFP method is effective for audio identification and robust to serious audio distortions, including the challenging speed change and pitch shifting.
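To make the contrastive objective concrete, the following is a minimal sketch of an InfoNCE-style loss and a momentum update in the spirit of MoCo. It is not the authors' implementation. The function names, the temperature of 0.07, and the momentum of 0.999 are illustrative assumptions (the abstract does not state the settings used), and fingerprints are represented as plain lists of floats. The query stands for the fingerprint of a distorted clip, the positive key for the fingerprint of the same track, and the negative keys for fingerprints of other tracks.

```python
import math


def l2_normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def info_nce_loss(query, positive_key, negative_keys, temperature=0.07):
    """InfoNCE loss for one query against one positive and several negatives.

    The temperature value is an assumed default, not taken from the paper.
    """
    q = l2_normalize(query)
    # Index 0 holds the positive logit; the rest are negatives.
    logits = [dot(q, l2_normalize(positive_key)) / temperature]
    logits += [dot(q, l2_normalize(k)) / temperature for k in negative_keys]
    # Numerically stable log-sum-exp over all logits.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    # Cross-entropy with the positive as the target class.
    return log_sum_exp - logits[0]


def momentum_update(key_params, query_params, momentum=0.999):
    """Move key-encoder parameters slowly toward the query encoder's.

    Parameters are flat lists of floats here purely for illustration.
    """
    return [momentum * k + (1.0 - momentum) * q
            for k, q in zip(key_params, query_params)]
```

Under these assumptions, the loss is close to zero when the query matches its positive key and is far from every negative, and it grows when a negative key is more similar to the query than the positive is. That behaviour is what pushes fingerprints of distorted versions of one track together while separating different tracks.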

research
10/22/2020

Neural Audio Fingerprint for High-specific Audio Retrieval based on Contrastive Learning

Most of existing audio fingerprinting systems have limitations to be use...
research
04/26/2021

Multimodal Self-Supervised Learning of General Audio Representations

We present a multimodal framework to learn general audio representations...
research
05/28/2019

Ensemble-based cover song detection

Audio-based cover song detection has received much attention in the MIR ...
research
09/21/2023

Audio Contrastive based Fine-tuning

Audio classification plays a crucial role in speech and sound processing...
research
04/06/2021

Strumming to the Beat: Audio-Conditioned Contrastive Video Textures

We introduce a non-parametric approach for infinite video texture synthe...
research
04/01/2021

Enriched Music Representations with Multiple Cross-modal Contrastive Learning

Modeling various aspects that make a music piece unique is a challenging...
research
07/15/2016

DCAR: A Discriminative and Compact Audio Representation to Improve Event Detection

This paper presents a novel two-phase method for audio representation, D...