Disentangled Speaker Representation Learning via Mutual Information Minimization

08/17/2022
by   Sung Hwan Mun, et al.
0

Domain mismatch problem caused by speaker-unrelated feature has been a major topic in speaker recognition. In this paper, we propose an explicit disentanglement framework to unravel speaker-relevant features from speaker-unrelated features via mutual information (MI) minimization. To achieve our goal of minimizing MI between speaker-related and speaker-unrelated features, we adopt a contrastive log-ratio upper bound (CLUB), which exploits the upper bound of MI. Our framework is constructed in a 3-stage structure. First, in the front-end encoder, input speech is encoded into shared initial embedding. Next, in the decoupling block, shared initial embedding is split into separate speaker-related and speaker-unrelated embeddings. Finally, disentanglement is conducted by MI minimization in the last stage. Experiments on Far-Field Speaker Verification Challenge 2022 (FFSVC2022) demonstrate that our proposed framework is effective for disentanglement. Also, to utilize domain-unknown datasets containing numerous speakers, we pre-trained the front-end encoder with VoxCeleb datasets. We then fine-tuned the speaker embedding model in the disentanglement framework with FFSVC 2022 dataset. The experimental results show that fine-tuning with a disentanglement framework on a existing pre-trained model is valid and can further improve performance.

READ FULL TEXT

page 1

page 6

research
12/12/2020

DEAAN: Disentangled Embedding and Adversarial Adaptation Network for Robust Speaker Representation Learning

Despite speaker verification has achieved significant performance improv...
research
05/16/2022

PRISM: Pre-trained Indeterminate Speaker Representation Model for Speaker Diarization and Speaker Verification

Speaker embedding has been a fundamental feature for speaker-related tas...
research
11/27/2019

Powerful Speaker Embedding Training Framework by Adversarially Disentangled Identity Representation

The main challenge of speaker verification in the wild is the interferen...
research
08/18/2022

Speech Representation Disentanglement with Adversarial Mutual Information Learning for One-shot Voice Conversion

One-shot voice conversion (VC) with only a single target speaker's speec...
research
10/22/2020

Unsupervised Representation Learning for Speaker Recognition via Contrastive Equilibrium Learning

In this paper, we propose a simple but powerful unsupervised learning me...
research
06/22/2020

CLUB: A Contrastive Log-ratio Upper Bound of Mutual Information

Mutual information (MI) minimization has gained considerable interests i...
research
12/01/2018

Learning Speaker Representations with Mutual Information

Learning good representations is of crucial importance in deep learning....

Please sign up or login with your details

Forgot password? Click here to reset