Centroid-based deep metric learning for speaker recognition

02/06/2019
by Jixuan Wang, et al.

Speaker embedding models that utilize neural networks to map utterances to a space where distances reflect similarity between speakers have driven recent progress in the speaker recognition task. However, there is still a significant performance gap between recognizing speakers in the training set and unseen speakers. The latter case corresponds to the few-shot learning task, where a trained model is evaluated on unseen classes. Here, we optimize a speaker embedding model with prototypical network loss (PNL), a state-of-the-art approach for the few-shot image classification task. The resulting embedding model outperforms the state-of-the-art triplet loss based models in both speaker verification and identification tasks, for both seen and unseen speakers.
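Prototypical network loss, in its generic form (Snell et al.'s prototypical networks), builds one centroid ("prototype") per class by averaging that class's support embeddings. It then scores a query embedding by the negative squared Euclidean distance to each prototype, applies a softmax over classes, and takes the negative log-probability of the true class as the loss. The sketch below shows that computation for a single query over plain Python lists. It assumes embeddings have already been produced by some speaker embedding network and uses squared Euclidean distance. The abstract does not state the paper's exact distance, episode sizes, or network, so treat this as an illustration of the technique rather than the authors' implementation. The names `centroid`, `prototypical_loss`, and `identify` are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def centroid(vectors: Sequence[Vector]) -> Vector:
    """Mean of one speaker's support embeddings (the class prototype)."""
    n = len(vectors)
    return [sum(components) / n for components in zip(*vectors)]


def squared_euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance (assumed metric; the paper may differ)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def prototypical_loss(support: Dict[str, List[Vector]],
                      query: Vector, label: str) -> float:
    """Negative log-probability of the true speaker for one query embedding.

    Logits are negative squared distances to each speaker's prototype;
    a softmax over speakers turns them into class probabilities.
    """
    prototypes = {spk: centroid(vecs) for spk, vecs in support.items()}
    logits = {spk: -squared_euclidean(query, p) for spk, p in prototypes.items()}
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(logits.values())
    log_z = m + math.log(sum(math.exp(v - m) for v in logits.values()))
    return log_z - logits[label]


def identify(support: Dict[str, List[Vector]], query: Vector) -> str:
    """Speaker identification: pick the speaker with the nearest prototype."""
    return min(support,
               key=lambda spk: squared_euclidean(query, centroid(support[spk])))


if __name__ == "__main__":
    # Toy 2-D "embeddings" for two speakers (illustrative values only).
    support = {"alice": [[0.0, 0.0], [0.0, 2.0]],
               "bob": [[4.0, 0.0], [4.0, 2.0]]}
    query = [0.5, 1.0]
    print(identify(support, query))                    # alice
    print(prototypical_loss(support, query, "alice"))  # small loss
    print(prototypical_loss(support, query, "bob"))    # large loss
```

Because prototypes are recomputed from whatever support utterances are supplied, the same scoring applies unchanged to speakers never seen during training. This matches the few-shot setting the abstract describes.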

Related research
03/26/2020

In defence of metric learning for speaker recognition

The objective of this paper is 'open-set' speaker recognition of unseen ...
10/01/2019

Latent space representation for multi-target speaker detection and identification with a sparse dataset using Triplet neural networks

We present an approach to tackle the speaker recognition problem using T...
05/17/2022

Dynamic Recognition of Speakers for Consent Management by Contrastive Embedding Replay

Voice assistants record sound and can overhear conversations. Thus, a co...
02/02/2020

DropClass and DropAdapt: Dropping classes for deep speaker representation learning

Many recent works on deep speaker embeddings train their feature extract...
04/17/2019

Few Shot Speaker Recognition using Deep Neural Networks

The recent advances in deep learning are mostly driven by availability o...
11/01/2018

Designing an Effective Metric Learning Pipeline for Speaker Diarization

State-of-the-art speaker diarization systems utilize knowledge from exte...
04/06/2020

Meta-Learning for Short Utterance Speaker Recognition with Imbalance Length Pairs

In realistic settings, a speaker recognition system needs to identify a ...
