Graph-based Multi-View Fusion and Local Adaptation: Mitigating Within-Household Confusability for Speaker Identification

07/08/2022
by Long Chen, et al.

Speaker identification (SID) in the household scenario (e.g., for smart speakers) is an important but challenging problem due to the limited number of labeled (enrollment) utterances, confusable voices, and demographic imbalances. Conventional speaker recognition systems generalize from a large random sample of speakers, which causes recognition to underperform for households drawn from specific cohorts or otherwise exhibiting high confusability. In this work, we propose a graph-based semi-supervised learning approach that improves household-level SID accuracy and robustness through locally adapted graph normalization and multi-signal fusion with multi-view graphs. Unlike other work on household SID, fairness, and signal fusion, this work focuses on speaker label inference (scoring) and provides a simple way to realize household-specific adaptation and multi-signal fusion without tuning the embeddings or training a fusion network. Experiments on the VoxCeleb dataset demonstrate that our approach consistently improves performance across households with different customer cohorts and degrees of confusability.
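
To make the general idea concrete, the following is a minimal sketch of graph-based label inference over fused multi-view graphs. It is not the authors' implementation: the abstract gives no formulas, so the exponential cosine affinity, the weighted averaging of per-view graphs, the symmetric degree normalization (a standard stand-in, not the paper's locally adapted normalization), and every function name and parameter (`cosine_affinity`, `fuse_views`, `label_propagation`, `temperature`, `alpha`) are assumptions chosen for illustration.

```python
import numpy as np


def cosine_affinity(emb, temperature=0.1):
    """Dense affinity matrix from embeddings (hypothetical choice of kernel)."""
    x = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    w = np.exp((x @ x.T - 1.0) / temperature)  # in (0, 1], 1 for identical directions
    np.fill_diagonal(w, 0.0)                   # no self-loops
    return w


def fuse_views(affinities, weights=None):
    """Fuse per-view graphs by a weighted average (assumed fusion rule)."""
    if weights is None:
        weights = np.ones(len(affinities)) / len(affinities)
    return sum(wt * a for wt, a in zip(weights, affinities))


def label_propagation(w, labels, n_classes, alpha=0.9, n_iter=50):
    """Standard label spreading: F <- alpha * S @ F + (1 - alpha) * Y.

    labels holds a class index for enrolled utterances and -1 for unlabeled ones.
    S is the symmetrically normalized affinity D^{-1/2} W D^{-1/2}.
    """
    labels = np.asarray(labels)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(w.sum(axis=1), 1e-12))
    s = w * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    y = np.zeros((len(labels), n_classes))
    known = labels >= 0
    y[known, labels[known]] = 1.0              # one-hot rows for enrolled utterances
    f = y.copy()
    for _ in range(n_iter):
        f = alpha * (s @ f) + (1 - alpha) * y
    return f.argmax(axis=1)


# Toy household: two speakers, three utterances each, two embedding "views".
view_a = np.array([[1.0, 0.0], [0.9, 0.1], [0.95, 0.05],
                   [0.0, 1.0], [0.1, 0.9], [0.05, 0.95]])
view_b = np.array([[0.9, 0.1], [1.0, 0.0], [0.85, 0.15],
                   [0.1, 0.9], [0.0, 1.0], [0.15, 0.85]])
enrolled = np.array([0, -1, -1, 1, -1, -1])    # one enrollment utterance per speaker

fused = fuse_views([cosine_affinity(view_a), cosine_affinity(view_b)])
predicted = label_propagation(fused, enrolled, n_classes=2)
print(predicted)
```

In this sketch, fusion happens at the graph level, so no embedding is retrained and no fusion network is needed; only the affinity matrices are combined before labels are propagated from the enrolled utterances to the unlabeled ones.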


research
06/15/2021

Graph-based Label Propagation for Semi-Supervised Speaker Identification

Speaker identification in the household scenario (e.g., for smart speake...
research
02/06/2021

Speaker attribution with voice profiles by graph-based semi-supervised learning

Speaker attribution is required in many real-world applications, such as...
research
03/01/2022

Extended Graph Temporal Classification for Multi-Speaker End-to-End ASR

Graph-based temporal classification (GTC), a generalized form of the con...
research
01/18/2015

Pairwise Constraint Propagation on Multi-View Data

This paper presents a graph-based learning approach to pairwise constrai...
research
09/06/2021

Improving Speaker Identification for Shared Devices by Adapting Embeddings to Speaker Subsets

Speaker identification typically involves three stages. First, a front-e...
research
02/23/2022

Improving fairness in speaker verification via Group-adapted Fusion Network

Modern speaker verification models use deep neural networks to encode ut...
research
10/24/2021

Learning Speaker Representation with Semi-supervised Learning approach for Speaker Profiling

Speaker profiling, which aims to estimate speaker characteristics such a...
