Improving Speaker Identification for Shared Devices by Adapting Embeddings to Speaker Subsets

09/06/2021
by Zhenning Tan, et al.

Speaker identification typically involves three stages. First, a front-end speaker embedding model is trained to embed utterances and speaker profiles. Second, a scoring function is computed between a runtime utterance and each speaker profile. Finally, the speaker is identified by nearest-neighbor search under the scoring metric. To better distinguish speakers who share a device within the same household, we propose a household-adapted nonlinear mapping to a low-dimensional space that complements the global scoring metric. The combined scoring function is optimized on labeled or pseudo-labeled speaker utterances. With input dropout, the proposed scoring model reduces EER by 45-71% […] household. On real-world internal data, the EER reduction is 49.2%. Through visualization, we also show that clusters formed by household-adapted speaker embeddings are more compact and uniformly distributed than clusters formed by global embeddings before adaptation.
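As a rough illustration of the three-stage pipeline and of a combined global-plus-household score, the NumPy sketch below may help. Everything in it is an assumption made for illustration, not the authors' model: cosine similarity as the global metric, a one-hidden-layer tanh network (the hypothetical `HouseholdMapping`) standing in for the household-adapted nonlinear mapping, with random placeholder weights instead of weights optimized on labeled or pseudo-labeled utterances, a fixed mixing weight `alpha`, and a simple threshold sweep to estimate EER.

```python
import numpy as np


def cosine_scores(utterance: np.ndarray, profiles: np.ndarray) -> np.ndarray:
    """Global scoring metric (assumed cosine): one utterance embedding of
    shape [d] against k enrolled speaker profiles of shape [k, d]."""
    u = utterance / np.linalg.norm(utterance)
    p = profiles / np.linalg.norm(profiles, axis=-1, keepdims=True)
    return p @ u


class HouseholdMapping:
    """Hypothetical household-adapted nonlinear map from R^d to R^m (m < d).

    A one-hidden-layer tanh network. In the paper's setting such a map would be
    fit per household on labeled or pseudo-labeled utterances; here the weights
    are random placeholders, so the local scores are illustrative only."""

    def __init__(self, d: int, hidden: int, m: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(scale=1.0 / np.sqrt(d), size=(d, hidden))
        self.w2 = rng.normal(scale=1.0 / np.sqrt(hidden), size=(hidden, m))

    def __call__(self, x: np.ndarray, dropout: float = 0.0, rng=None) -> np.ndarray:
        if dropout > 0.0:
            # Inverted input dropout (training only): zero random input
            # dimensions and rescale the rest so the expected input is unchanged.
            rng = rng if rng is not None else np.random.default_rng()
            keep = rng.random(x.shape) >= dropout
            x = x * keep / (1.0 - dropout)
        return np.tanh(x @ self.w1) @ self.w2


def combined_scores(utterance, profiles, mapping, alpha=0.5):
    """Global score plus a weighted score in the household space (alpha assumed)."""
    local = cosine_scores(mapping(utterance), mapping(profiles))
    return cosine_scores(utterance, profiles) + alpha * local


def identify(scores: np.ndarray) -> int:
    """Nearest neighbor under the scoring metric: the best-scoring profile wins."""
    return int(np.argmax(scores))


def equal_error_rate(target: np.ndarray, nontarget: np.ndarray) -> float:
    """Approximate EER by sweeping the decision threshold over observed scores."""
    best_gap, eer = np.inf, 1.0
    for t in np.sort(np.concatenate([target, nontarget])):
        far = np.mean(nontarget >= t)  # impostor trials wrongly accepted
        frr = np.mean(target < t)      # genuine trials wrongly rejected
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2.0
    return float(eer)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    profiles = rng.normal(size=(3, 16))                   # 3 speakers, d = 16
    utterance = profiles[1] + 0.1 * rng.normal(size=16)   # noisy speech from speaker 1
    mapping = HouseholdMapping(d=16, hidden=8, m=4)
    print(identify(cosine_scores(utterance, profiles)))
    print(identify(combined_scores(utterance, profiles, mapping)))
```

Adding the household term to the global score, rather than replacing it, follows the abstract's description of the mapping as a complement to the global metric. In a real system the mapping weights (and plausibly the mixing weight) would be learned, with input dropout active only during that training step and disabled when scoring.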
