Learning a Joint Embedding Space of Monophonic and Mixed Music Signals for Singing Voice

06/26/2019
by Kyungyun Lee, et al.

Previous approaches to singer identification have used either monophonic vocal tracks or mixed tracks containing multiple instruments, leaving a semantic gap between these two domains of audio. In this paper, we present a system that learns a joint embedding space of monophonic and mixed tracks for singing voice. We use a metric learning method, which ensures that tracks from both domains by the same singer are mapped closer to each other than tracks by different singers. We train the system on a large synthetic dataset generated by music mashup to reflect real-world music recordings. Our approach opens up new possibilities for cross-domain tasks, e.g., given a monophonic track of a singer as a query, retrieving mixed tracks sung by the same singer from the database. Moreover, it requires no additional vocal enhancement steps such as source separation. We show the effectiveness of our system for singer identification and query-by-singer in both same-domain and cross-domain tasks.
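As a rough illustration of the cross-domain metric learning objective described in the abstract, the sketch below pairs a monophonic anchor with a mixed-track positive from the same singer and a mixed-track negative from a different singer under a standard triplet margin loss. The encoder architecture, input shape, embedding size, and margin are illustrative assumptions, not the paper's actual configuration.

```python
# Minimal sketch of cross-domain triplet training (assumed values: a small
# convolutional encoder over log-mel patches, a 256-dim embedding, margin 0.3;
# none of these are taken from the paper).
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Maps a (batch, 1, mels, frames) spectrogram patch to a unit-norm embedding."""
    def __init__(self, emb_dim=256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, emb_dim)

    def forward(self, x):
        z = self.fc(self.conv(x).flatten(1))
        # Normalize so both monophonic and mixed tracks live on the same unit hypersphere.
        return nn.functional.normalize(z, dim=1)

encoder = Encoder()
triplet = nn.TripletMarginLoss(margin=0.3)
opt = torch.optim.Adam(encoder.parameters(), lr=1e-4)

# Dummy batch: anchor = monophonic vocal of singer A,
# positive = mixed track of singer A, negative = mixed track of singer B.
mono_a  = torch.randn(8, 1, 128, 129)
mixed_a = torch.randn(8, 1, 128, 129)
mixed_b = torch.randn(8, 1, 128, 129)

loss = triplet(encoder(mono_a), encoder(mixed_a), encoder(mixed_b))
opt.zero_grad()
loss.backward()
opt.step()
```

Because a single shared encoder embeds both domains, a monophonic query and a mixed-track database entry can be compared directly by embedding distance, which is what enables the cross-domain retrieval task described above.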

