Variable frame rate-based data augmentation to handle speaking-style variability for automatic speaker verification

08/08/2020
by Amber Afshan, et al.

The effects of speaking-style variability on automatic speaker verification were investigated using the UCLA Speaker Variability database, which comprises multiple speaking styles per speaker. An x-vector/PLDA (probabilistic linear discriminant analysis) system was trained on the SRE and Switchboard databases with standard augmentation techniques and evaluated on utterances from the UCLA database. The equal error rate (EER) was low when enrollment and test utterances were of the same style (e.g., 0.98% conversational speech, respectively), but it increased substantially when styles were mismatched between enrollment and test utterances. For instance, when enrolled with conversation utterances, the EER increased to 3.03% and 22.12% respectively. To reduce the effect of style mismatch, we propose an entropy-based variable frame rate technique to artificially generate style-normalized representations for PLDA adaptation. The proposed system significantly improved performance. In the aforementioned conditions, the EERs improved to 2.69% and 18.75%, comparable to multi-style PLDA adaptation, without the need for training data in different speaking styles per speaker.
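Since every result above is reported as an EER, a minimal sketch of how that metric is commonly computed from verification scores may help. It sweeps a decision threshold over all observed scores and reports the point where the false-acceptance and false-rejection rates are closest. The function name `equal_error_rate` and the toy scores are hypothetical, chosen for illustration; this is not the authors' scoring tool, and the scores are not data from the paper.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Return an approximate EER in percent (illustrative sketch).

    target_scores: scores for same-speaker trials (higher = more similar).
    nontarget_scores: scores for different-speaker trials.
    A trial is accepted when its score is >= the threshold.
    """
    if not target_scores or not nontarget_scores:
        raise ValueError("both score lists must be non-empty")

    # Candidate thresholds: every distinct observed score.
    thresholds = sorted(set(target_scores) | set(nontarget_scores))

    best_gap = None
    eer = None
    for t in thresholds:
        # False rejection rate: target trials scoring below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance rate: non-target trials at or above the threshold.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            # Keep the threshold where the two error rates are closest.
            best_gap = gap
            eer = (far + frr) / 2.0
    return 100.0 * eer


# Toy example with made-up scores: one target and one non-target overlap.
toy_eer = equal_error_rate([2.0, 3.0, 4.0, 5.0], [0.0, 1.0, 2.5, 1.5])
print(f"EER = {toy_eer:.2f}%")
```

With only a handful of trials the two error rates may never cross exactly, which is why the sketch averages them at the closest threshold; evaluation toolkits typically interpolate instead.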


