Intra-class variation reduction of speaker representation in disentanglement framework

08/04/2020
by Yoohwan Kwon, et al.

In this paper, we propose an effective training strategy to extract robust speaker representations from a speech signal. One of the key challenges in speaker recognition tasks is to learn latent representations or embeddings containing solely speaker characteristic information in order to be robust in terms of intra-speaker variations. By modifying the network architecture to generate both speaker-related and speaker-unrelated representations, we exploit a learning criterion which minimizes the mutual information between these disentangled embeddings. We also introduce an identity change loss criterion which utilizes a reconstruction error to different utterances spoken by the same speaker. Since the proposed criteria reduce the variation of speaker characteristics caused by changes in background environment or spoken content, the resulting embeddings of each speaker become more consistent. The effectiveness of the proposed method is demonstrated through two tasks: disentanglement performance, and improvement of speaker recognition accuracy compared to the baseline model on a benchmark dataset, VoxCeleb1. Ablation studies also show the impact of each criterion on overall performance.
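The two criteria named in the abstract can be illustrated with a rough sketch. The abstract gives no architectural details, so everything below is an assumption made for illustration: the linear encoder and decoder, the dimensions, and the function names are hypothetical, and the cross-covariance penalty is only a crude stand-in for a mutual information term, not the estimator used in the paper. The identity change loss is written under one plausible reading of the abstract: reconstruct an utterance using the speaker embedding taken from a different utterance of the same speaker.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes; the abstract does not specify any dimensions.
FEAT_DIM, SPK_DIM, RES_DIM = 40, 8, 8

# Hypothetical linear maps standing in for the neural encoder and decoder.
W_spk = rng.normal(scale=0.1, size=(FEAT_DIM, SPK_DIM))
W_res = rng.normal(scale=0.1, size=(FEAT_DIM, RES_DIM))
W_dec = rng.normal(scale=0.1, size=(SPK_DIM + RES_DIM, FEAT_DIM))


def encode(x):
    """Split features into speaker-related and speaker-unrelated embeddings."""
    return x @ W_spk, x @ W_res


def decode(z_spk, z_res):
    """Reconstruct features from the concatenated embeddings."""
    return np.concatenate([z_spk, z_res], axis=-1) @ W_dec


def identity_change_loss(x_a, x_b):
    """Mean squared reconstruction error for x_a when its speaker embedding
    is swapped for the one from x_b (a different utterance, same speaker).
    Both inputs have shape (batch, FEAT_DIM)."""
    z_spk_b, _ = encode(x_b)
    _, z_res_a = encode(x_a)
    x_a_hat = decode(z_spk_b, z_res_a)
    return float(np.mean((x_a_hat - x_a) ** 2))


def cross_covariance_penalty(z_spk, z_res):
    """Squared Frobenius norm of the batch cross-covariance between the two
    embeddings. This is a simple dependence proxy, NOT a mutual information
    estimator; it is swapped in here only to keep the sketch short."""
    z_spk_c = z_spk - z_spk.mean(axis=0, keepdims=True)
    z_res_c = z_res - z_res.mean(axis=0, keepdims=True)
    cov = z_spk_c.T @ z_res_c / (z_spk.shape[0] - 1)
    return float(np.sum(cov ** 2))
```

In a real system both terms would presumably be added, with weights, to a speaker classification loss and minimized by gradient descent. The sketch only shows how each quantity could be computed for one batch.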


