Disentangled representation learning for multilingual speaker recognition

11/01/2022
by Kihyun Nam, et al.

The goal of this paper is to train speaker embeddings that are robust to bilingual speaking scenarios. The majority of the world's population speaks at least two languages; however, most speaker recognition systems fail to recognise the same speaker when they speak in different languages. Popular speaker recognition evaluation sets do not consider the bilingual scenario, which makes it difficult to analyse the effect of bilingual speakers on speaker recognition performance. This paper proposes a new large-scale evaluation set, derived from VoxCeleb, that considers bilingual scenarios. We also introduce a representation learning strategy that disentangles language information from the speaker representation to account for the bilingual scenario. This language-disentangled representation learning strategy can be adapted to existing models with small changes to the training pipeline. Experimental results demonstrate that the baseline models suffer significant performance degradation when evaluated on the proposed bilingual test set. In contrast, the model trained with the proposed disentanglement strategy shows significant improvement under the bilingual evaluation scenario while retaining competitive performance on existing monolingual test sets.
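
To make the bilingual evaluation scenario concrete, the sketch below builds verification trials from a toy utterance list and picks out the pairs where the same speaker uses two different languages. This is only an illustration under stated assumptions: the `Utterance` and `Trial` types, the helper names, and the toy data are hypothetical, and the exhaustive pairing is not the protocol used to construct the paper's VoxCeleb-derived test set.

```python
from itertools import combinations
from typing import List, NamedTuple


class Utterance(NamedTuple):
    # Hypothetical record: one utterance with speaker and language labels.
    utt_id: str
    speaker: str
    language: str


class Trial(NamedTuple):
    # Hypothetical verification trial between two utterances.
    enrol: str
    test: str
    same_speaker: bool
    same_language: bool


def build_trials(utterances: List[Utterance]) -> List[Trial]:
    """Pair every two utterances and record speaker/language agreement."""
    return [
        Trial(a.utt_id, b.utt_id, a.speaker == b.speaker, a.language == b.language)
        for a, b in combinations(utterances, 2)
    ]


def bilingual_positives(trials: List[Trial]) -> List[Trial]:
    """Keep target trials where one speaker is heard in two languages."""
    return [t for t in trials if t.same_speaker and not t.same_language]


# Toy data (made up for illustration only).
utterances = [
    Utterance("u1", "spk_a", "en"),
    Utterance("u2", "spk_a", "fr"),
    Utterance("u3", "spk_b", "en"),
    Utterance("u4", "spk_b", "en"),
]

trials = build_trials(utterances)
hard = bilingual_positives(trials)
```

In this toy set, only the pair (`u1`, `u2`) is a same-speaker, cross-language trial; a monolingual test set would contain no trials of this kind, which is the gap the abstract describes.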

research
03/26/2020

In defence of metric learning for speaker recognition

The objective of this paper is 'open-set' speaker recognition of unseen ...
research
08/04/2020

Intra-class variation reduction of speaker representation in disentanglement framework

In this paper, we propose an effective training strategy to extract rob...
research
10/29/2020

The ins and outs of speaker recognition: lessons from VoxSRC 2020

The VoxCeleb Speaker Recognition Challenge (VoxSRC) at Interspeech 2020 ...
research
02/25/2020

Speech2Phone: A Multilingual and Text Independent Speaker Identification Model

Voice recognition is an area with a wide application potential. Speaker ...
research
01/07/2020

Learning Speaker Embedding with Momentum Contrast

Speaker verification can be formulated as a representation learning task...
research
03/04/2022

On the relevance of language in speaker recognition

This paper presents a new database collected from a bilingual speakers s...
research
05/04/2021

Exploring Disentanglement with Multilingual and Monolingual VQ-VAE

This work examines the content and usefulness of disentangled phone and ...
