Learning from human perception to improve automatic speaker verification in style-mismatched conditions

06/28/2022
by Amber Afshan, et al.

Our prior experiments show that humans and machines seem to employ different approaches to speaker discrimination, especially in the presence of speaking style variability. The experiments examined read versus conversational speech. Listeners focused on speaker-specific idiosyncrasies while "telling speakers together", and on relative distances in a shared acoustic space when "telling speakers apart". However, automatic speaker verification (ASV) systems use the same loss function irrespective of target or non-target trials. To improve ASV performance in the presence of style variability, insights learnt from human perception are used to design a new training loss function that we refer to as "CllrCE loss". CllrCE loss uses both speaker-specific idiosyncrasies and relative acoustic distances between speakers to train the ASV system. When using the UCLA speaker variability database, in the x-vector and conditioning setups, CllrCE loss results in significant relative improvements in EER of 1-66% with respect to the x-vector baseline. Using the SITW evaluation tasks, which involve different conversational speech scenarios, the proposed loss combined with self-attention conditioning results in significant relative improvements in EER of 2-5% and minDCF of 6-12%, consistently, compared with conditioning alone.
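
The abstract does not spell out the exact formulation, but the description suggests a loss that adds a trial-based Cllr term (capturing relative acoustic distances between speakers) to the usual speaker-classification cross-entropy (capturing speaker-specific idiosyncrasies). The sketch below shows one way such a combined loss could look in PyTorch; the class name CllrCELoss, the cosine-similarity scoring of within-batch trials, and the weight and scale parameters are assumptions made for illustration, not details taken from the paper.

```python
import math

import torch
import torch.nn.functional as F


def cllr(scores, labels):
    """Log-likelihood-ratio cost (Cllr) over a batch of trial scores.

    scores: trial scores treated as LLRs (higher = more likely same speaker)
    labels: 1 for target (same-speaker) trials, 0 for non-target trials
    The batch is assumed to contain at least one trial of each kind.
    """
    tar = scores[labels == 1]
    non = scores[labels == 0]
    # softplus(-x) = log(1 + exp(-x)); dividing by log(2) expresses the cost in bits
    c_tar = F.softplus(-tar).mean() / math.log(2.0)
    c_non = F.softplus(non).mean() / math.log(2.0)
    return 0.5 * (c_tar + c_non)


class CllrCELoss(torch.nn.Module):
    """Cross-entropy over speaker classes (speaker-specific idiosyncrasies)
    plus Cllr over within-batch trials (relative distances between speakers)."""

    def __init__(self, weight=1.0, scale=10.0):
        super().__init__()
        self.weight = weight  # trade-off between the two terms (assumed)
        self.scale = scale    # maps cosine similarity to an LLR-like range (assumed)

    def forward(self, embeddings, logits, speaker_ids):
        # Term 1: standard speaker-classification cross-entropy
        ce = F.cross_entropy(logits, speaker_ids)

        # Term 2: Cllr over all within-batch trial pairs, scored by cosine similarity
        emb = F.normalize(embeddings, dim=1)
        scores = self.scale * (emb @ emb.t())
        same = speaker_ids.unsqueeze(0) == speaker_ids.unsqueeze(1)
        off_diag = ~torch.eye(len(speaker_ids), dtype=torch.bool, device=emb.device)
        c = cllr(scores[off_diag], same[off_diag].long())
        return ce + self.weight * c
```

In this sketch the Cllr term is computed over whatever target and non-target pairs the training batch happens to provide, which is why all within-batch pairs (excluding self-pairs) are scored; how the actual system forms trials during training is not specified in the abstract.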
