Improved Large-margin Softmax Loss for Speaker Diarisation

11/10/2019
by   Yassir Fathullah, et al.

Speaker diarisation systems nowadays use embeddings generated from speech segments in a bottleneck layer, which need to be discriminative for unseen speakers. It is well known that large-margin training can improve generalisation to unseen data, and its use in such open-set problems has become widespread. This paper therefore introduces a general approach to the large-margin softmax loss, without any approximations, to improve the quality of speaker embeddings for diarisation. Furthermore, a novel and simple way to stabilise training when large-margin softmax is used is proposed. Finally, to combat the effect of overlapping speech, different training margins are used to reduce the negative impact overlapping speech has on creating discriminative embeddings. Experiments on the AMI meeting corpus show that the use of large-margin softmax significantly improves the speaker error rate (SER). By using all hyperparameters of the loss in a unified way, further improvements were achieved, reaching a relative SER reduction of 24.6% over the baseline. However, the best result was achieved by training overlapping and single-speaker speech samples with different margins, giving an overall SER reduction of 29.5% relative to the baseline.
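The abstract does not give the paper's exact formulation, but the general idea can be illustrated with a minimal additive-margin softmax sketch (one common large-margin variant). The function below is illustrative, not the authors' implementation: it penalises the target-class cosine similarity by a margin before the softmax, and accepts a per-sample margin so that, as the abstract suggests, overlapping-speech segments could be trained with a different (e.g. smaller) margin than single-speaker segments.

```python
import numpy as np

def large_margin_softmax_loss(cos_sim, labels, margin=0.2, scale=30.0):
    """Additive-margin softmax cross-entropy (illustrative sketch).

    cos_sim : (N, C) cosine similarities between embeddings and class weights
    labels  : (N,) integer speaker labels
    margin  : scalar, or per-sample array of shape (N,), subtracted from the
              target-class cosine before the softmax (e.g. a smaller margin
              for overlapping-speech samples)
    scale   : temperature applied to all logits
    """
    logits = np.asarray(cos_sim, dtype=float).copy()
    n = np.arange(len(labels))
    logits[n, labels] -= margin          # enlarge the decision margin
    logits *= scale
    # numerically stable log-softmax
    logits -= logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[n, labels].mean()  # mean cross-entropy over the batch
```

With a positive margin the loss on correctly classified samples is strictly larger than with `margin=0`, which is what forces the embeddings of different speakers further apart during training.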

