Accurate and Scalable Version Identification Using Musically-Motivated Embeddings

10/28/2019
by Furkan Yesiler, et al.

The version identification (VI) task deals with the automatic detection of recordings that correspond to the same underlying musical piece. Despite many efforts, VI remains an open problem with much room for improvement, especially with regard to combining accuracy and scalability. In this paper, we present MOVE, a musically-motivated method for accurate and scalable version identification. MOVE achieves state-of-the-art performance on two publicly available benchmark sets by learning scalable embeddings in a Euclidean distance space, using a triplet loss and a hard triplet mining strategy. It improves over previous work by employing an alternative input representation and by introducing a novel technique for temporal content summarization, a standardized latent space, and a data augmentation strategy specifically designed for VI. In addition to the main results, we perform an ablation study to highlight the importance of our design choices, and we study the relation between embedding dimensionality and model performance.
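To make the training objective concrete, the following is a minimal sketch of a triplet loss with hard mining over Euclidean distances. It is not MOVE's implementation: the abstract does not specify the mining variant, so this assumes the common "batch-hard" scheme (for each anchor, take the farthest same-piece embedding and the closest different-piece embedding). The function names, the toy embeddings, and the margin values are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def batch_hard_triplet_loss(embeddings, labels, margin=1.0):
    """Mean hinge loss over anchors (assumed batch-hard mining).

    For each anchor, the hardest positive is the farthest embedding with
    the same label, and the hardest negative is the closest embedding
    with a different label. Anchors lacking either are skipped.
    """
    losses = []
    for i, (anchor, label) in enumerate(zip(embeddings, labels)):
        positives = [
            euclidean(anchor, emb)
            for j, (emb, lab) in enumerate(zip(embeddings, labels))
            if lab == label and j != i
        ]
        negatives = [
            euclidean(anchor, emb)
            for emb, lab in zip(embeddings, labels)
            if lab != label
        ]
        if not positives or not negatives:
            continue
        losses.append(max(0.0, max(positives) - min(negatives) + margin))
    return sum(losses) / len(losses) if losses else 0.0


# Toy batch: two "pieces" (labels 0 and 1), two versions each.
toy_embeddings = [[0.0, 0.0], [0.0, 1.0], [3.0, 0.0], [3.0, 1.0]]
toy_labels = [0, 0, 1, 1]

loss_small_margin = batch_hard_triplet_loss(toy_embeddings, toy_labels, margin=1.0)
loss_large_margin = batch_hard_triplet_loss(toy_embeddings, toy_labels, margin=3.0)
```

In the toy batch, versions of the same piece are already closer to each other than to any version of the other piece, so the loss is zero with a small margin and becomes positive only once the margin exceeds the gap between the hardest positive and hardest negative distances.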


