Neural Vocoder Feature Estimation for Dry Singing Voice Separation

11/29/2022
by Jaekwon Im, et al.

Singing voice separation (SVS) is the task of separating singing voice audio from its mixture with instrumental audio. Previous SVS studies have mainly employed spectrogram masking, which requires predicting high-dimensional binary masks. In addition, they focused on extracting a vocal stem that retains the wet sound, including the reverberation effect, which may hinder the reusability of the isolated singing voice. This paper addresses both issues by predicting mel-spectrograms of dry singing voices from the mixed audio as neural vocoder features and synthesizing the singing voice waveforms with the neural vocoder. We experimented with two separation methods: one predicts binary masks in the mel-spectrogram domain, and the other directly predicts the mel-spectrogram. Furthermore, we add a singing voice detector to identify the singing voice segments over time more explicitly. We measured model performance in terms of audio quality, dereverberation, separation, and overall quality. The results show that our proposed model outperforms state-of-the-art singing voice separation models in both objective and subjective evaluation, except for audio quality.
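To make the mask-based variant concrete, the following is a minimal NumPy sketch of applying a mask to a mixture mel-spectrogram and then suppressing frames that a detector marks as non-vocal. It is an illustration under stated assumptions, not the authors' implementation: the function names, the linear-magnitude mel representation, the 0.5 threshold, and the idea of gating frames by multiplying with the detector output are all hypothetical, and random arrays stand in for the network outputs.

```python
import numpy as np


def apply_mel_mask(mix_mel, mask):
    """Element-wise masking of a (n_mels, n_frames) linear-magnitude mel-spectrogram.

    `mask` stands in for a network output (assumption); values are clipped to [0, 1].
    """
    if mix_mel.shape != mask.shape:
        raise ValueError("mask and mel-spectrogram must have the same shape")
    return mix_mel * np.clip(mask, 0.0, 1.0)


def gate_by_voice_activity(mel, frame_probs, threshold=0.5):
    """Zero out frames whose singing-voice probability is below `threshold`.

    How the detector output is combined with the separator is an assumption here.
    """
    if frame_probs.shape != (mel.shape[1],):
        raise ValueError("frame_probs must have one value per frame")
    active = (frame_probs >= threshold).astype(mel.dtype)
    return mel * active[np.newaxis, :]  # broadcast the frame gate over mel bins


# Toy data in place of real features and model predictions.
rng = np.random.default_rng(0)
mix_mel = rng.random((80, 6))   # 80 mel bins, 6 frames
mask = rng.random((80, 6))      # hypothetical predicted mask
frame_probs = np.array([0.9, 0.8, 0.1, 0.2, 0.7, 0.95])  # hypothetical detector output

vocal_mel = gate_by_voice_activity(apply_mel_mask(mix_mel, mask), frame_probs)
print(vocal_mel.shape)
```

In the direct-prediction variant described above, the masking step would be replaced by a model that outputs the vocal mel-spectrogram itself; in either case the resulting features would then be passed to a neural vocoder to synthesize the waveform.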

research
04/20/2021

A cappella: Audio-visual Singing Voice Separation

Music source separation can be interpreted as the estimation of the cons...
research
07/01/2021

Audiovisual Singing Voice Separation

Separating a song into vocal and accompaniment components is an active r...
research
03/08/2022

VoViT: Low Latency Graph-based Audio-Visual Voice Separation Transformer

This paper presents an audio-visual approach for voice separation which ...
research
11/03/2020

Complex ratio masking for singing voice separation

Music source separation is important for applications such as karaoke an...
research
06/06/2019

Singing voice separation: a study on training data

In the recent years, singing voice separation systems showed increased p...
research
07/06/2020

Revisiting Representation Learning for Singing Voice Separation with Sinkhorn Distances

In this work we present a method for unsupervised learning of audio repr...
research
02/24/2015

A Review of Audio Features and Statistical Models Exploited for Voice Pattern Design

Audio fingerprinting, also named as audio hashing, has been well-known a...
