Disentangled speaker and nuisance attribute embedding for robust speaker verification

08/07/2020
by Woo Hyun Kang, et al.

In recent years, various deep learning-based embedding methods have been proposed and have shown impressive performance in speaker verification. However, as with most classical embedding techniques, deep learning-based methods are known to suffer severe performance degradation when the speech samples come from different conditions (e.g., recording devices, emotional states). In this paper, we propose a novel fully supervised training method for extracting a speaker embedding vector that is disentangled from the variability caused by nuisance attributes. The proposed framework was compared with conventional deep learning-based embedding methods on the RSR2015 and VoxCeleb1 datasets. Experimental results show that the proposed approach can extract speaker embeddings that are robust to channel and emotional variability.

research
12/07/2021

Robust Speech Representation Learning via Flow-based Embedding Regularization

Over the recent years, various deep learning-based methods were proposed...
research
11/27/2019

Powerful Speaker Embedding Training Framework by Adversarially Disentangled Identity Representation

The main challenge of speaker verification in the wild is the interferen...
research
01/28/2020

Masked cross self-attention encoding for deep speaker embedding

In general, speaker verification tasks require the extraction of speaker...
research
10/28/2017

Speaker Diarization with LSTM

For many years, i-vector based speaker embedding techniques were the dom...
research
03/20/2020

Improving Embedding Extraction for Speaker Verification with Ladder Network

Speaker verification is an established yet challenging task in speech pr...
research
02/10/2020

An empirical analysis of information encoded in disentangled neural speaker representations

The primary characteristic of robust speaker representations is that the...
research
01/14/2019

Exploring Transfer Learning for Low Resource Emotional TTS

During the last few years, spoken language technologies have known a big...
