Universal speaker recognition encoders for different speech segments duration

10/28/2022
by Sergey Novoselov, et al.

Creating universal speaker encoders that are robust to different acoustic and speech duration conditions remains a major challenge. According to our observations, systems trained on short speech segments are optimal for short-phrase speaker verification, while systems trained on long segments are superior for long-segment verification. A system trained simultaneously on pooled short and long speech segments does not give optimal verification results and usually degrades for both short and long segments. This paper addresses the problem of creating universal speaker encoders for speech segments of different durations. We describe a simple recipe for training a universal speaker encoder that applies to any selected neural network architecture. According to our evaluation of wav2vec-TDNN based systems on the NIST SRE and VoxCeleb1 benchmarks, the proposed universal encoder improves speaker verification for different enrollment and test speech segment durations. The key feature of the proposed encoder is that it has the same inference time as the selected neural network architecture.
