DropClass and DropAdapt: Dropping classes for deep speaker representation learning

02/02/2020
by Chau Luu, et al.

Many recent works on deep speaker embeddings train their feature extraction networks on large classification tasks, distinguishing between all speakers in a training set. Empirically, this has been shown to produce speaker-discriminative embeddings, even for unseen speakers. However, it is not clear that this is the optimal means of training embeddings that generalize well. This work proposes two approaches to learning embeddings, based on the notion of dropping classes during training. We demonstrate that both approaches can yield performance gains in speaker verification tasks. The first proposed method, DropClass, works by periodically dropping a random subset of classes from the training data and the output layer throughout training, resulting in a feature extractor trained on many different classification tasks. Combined with an additive angular margin loss, this method can yield a 7.9% relative improvement in equal error rate (EER) over a strong baseline on VoxCeleb. The second proposed method, DropAdapt, is a means of adapting a trained model to a set of enrolment speakers in an unsupervised manner. This is performed by fine-tuning a model on only those classes which produce high-probability predictions when the enrolment speakers are used as input, again dropping the corresponding rows from the output layer. This method yields a 13.2% relative improvement in EER on VoxCeleb. The code for this paper has been made publicly available.
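As a rough illustration of the two ideas described above, the sketch below shows, in PyTorch-style Python, how a random subset of speaker classes might be kept for a block of DropClass training iterations, and how DropAdapt-style adaptation classes might be selected from enrolment utterances. This is a minimal sketch under stated assumptions, not the authors' released code: the names (sample_kept_classes, masked_classification_loss, select_adaptation_classes, head_weight) are illustrative, and a plain softmax cross-entropy stands in for the additive angular margin loss used in the paper.

```python
# Hypothetical sketch of the class-dropping idea (not the paper's released code).
# Assumes an x-vector style embedding extractor and a linear classification head
# over all training speakers, with weight matrix `head_weight` of shape
# (num_classes, embed_dim). All names are illustrative.

import torch
import torch.nn.functional as F


def sample_kept_classes(num_classes, drop_fraction, generator=None):
    """DropClass step: choose the subset of speaker classes kept for the
    next block of training iterations; the rest are dropped from the data
    sampler and from the output layer."""
    num_keep = max(1, int(round(num_classes * (1.0 - drop_fraction))))
    perm = torch.randperm(num_classes, generator=generator)
    return perm[:num_keep]


def masked_classification_loss(embeddings, labels, head_weight, kept_classes):
    """Softmax cross-entropy over only the kept classes.

    Rows of `head_weight` belonging to dropped classes are excluded, and the
    original speaker labels are remapped into the reduced label space.
    Batches are assumed to contain only utterances from kept speakers.
    """
    remap = {int(c): i for i, c in enumerate(kept_classes.tolist())}
    new_labels = torch.tensor([remap[int(l)] for l in labels],
                              device=embeddings.device)
    logits = embeddings @ head_weight[kept_classes].t()
    return F.cross_entropy(logits, new_labels)


def select_adaptation_classes(enrol_embeddings, head_weight, top_k):
    """DropAdapt-style selection: keep the training classes that receive the
    highest average softmax probability when enrolment utterances are fed
    through the trained model, then fine-tune on those classes only."""
    probs = F.softmax(enrol_embeddings @ head_weight.t(), dim=1)
    mean_probs = probs.mean(dim=0)
    return torch.topk(mean_probs, top_k).indices
```

In a DropClass-style training loop one would call sample_kept_classes every few hundred iterations, rebuild the batch sampler to draw only from the kept speakers, and train with masked_classification_loss until the next resample; DropAdapt would instead fix the class subset once, using select_adaptation_classes on the enrolment utterances, and fine-tune on it.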


Related research

Designing Neural Speaker Embeddings with Meta Learning (07/31/2020)
Neural speaker embeddings trained using classification objectives have d...

Centroid-based deep metric learning for speaker recognition (02/06/2019)
Speaker embedding models that utilize neural networks to map utterances ...

Learning Speaker Embedding with Momentum Contrast (01/07/2020)
Speaker verification can be formulated as a representation learning task...

Improved Large-margin Softmax Loss for Speaker Diarisation (11/10/2019)
Speaker diarisation systems nowadays use embeddings generated from speec...

Deep clustering: Discriminative embeddings for segmentation and separation (08/18/2015)
We address the problem of acoustic source separation in a deep learning ...

LipReading with 3D-2D-CNN BLSTM-HMM and word-CTC models (06/25/2019)
In recent years, deep learning based machine lipreading has gained promi...

Dispelling Classes Gradually to Improve Quality of Feature Reduction Approaches (06/07/2012)
Feature reduction is an important concept which is used for reducing dim...
