A Deep Neural Network for Short-Segment Speaker Recognition

07/22/2019
by Amirhossein Hajavi, et al.

Today's interactive devices, such as smartphone assistants and smart speakers, often deal with short-duration speech segments. Speaker recognition systems integrated into such devices are therefore better served by models that can perform the recognition task on short-duration utterances. This paper proposes UtterIdNet, a new deep neural network for speaker recognition with short speech segments. Its novel architecture suits short-segment speaker recognition by efficiently increasing its use of the information available in short speech segments. UtterIdNet has been trained and tested on the VoxCeleb datasets, the latest benchmarks in speaker recognition. Evaluations across segment durations show consistent and stable performance for short segments, with significant improvement over previous models for segments of 2 seconds, 1 second, and especially sub-second durations (250 ms and 500 ms).
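To give a sense of how little audio the evaluated durations contain, the following is a minimal sketch of cropping fixed-duration segments from a waveform. It is not taken from the paper: the helper `crop_segment`, the placeholder `utterance`, and the 16 kHz sample rate are assumptions made for illustration only.

```python
SAMPLE_RATE = 16_000  # assumption: 16 kHz mono audio


def crop_segment(waveform, duration_s, sample_rate=SAMPLE_RATE, offset_s=0.0):
    """Return a fixed-duration crop of a 1-D sequence of samples.

    Hypothetical helper for illustration; not part of UtterIdNet.
    """
    num_samples = int(round(duration_s * sample_rate))
    start = int(round(offset_s * sample_rate))
    if start < 0 or start + num_samples > len(waveform):
        raise ValueError("requested segment falls outside the waveform")
    return waveform[start:start + num_samples]


# Placeholder utterance: 3 seconds of silence standing in for real speech.
utterance = [0.0] * (3 * SAMPLE_RATE)

# The four segment durations named in the abstract, in seconds.
durations_s = (0.25, 0.5, 1.0, 2.0)
segments = {d: crop_segment(utterance, d) for d in durations_s}

for d, seg in segments.items():
    print(f"{d:>4} s -> {len(seg)} samples")
```

Under the assumed sample rate, a 250 ms segment holds only 4,000 samples, one eighth of what a 2-second segment provides.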

research
10/28/2022

Universal speaker recognition encoders for different speech segments duration

Creating universal speaker encoders which are robust for different acous...
research
11/08/2022

BER: Balanced Error Rate For Speaker Diarization

DER is the primary metric to evaluate diarization performance while faci...
research
05/07/2020

Crop Aggregating for short utterances speaker verification using raw waveforms

Most studies on speaker verification systems focus on long-duration utte...
research
12/03/2018

Novel Quality Metric for Duration Variability Compensation in Speaker Verification using i-Vectors

Automatic speaker verification (ASV) is the process to recognize persons...
research
03/30/2022

Generation of Speaker Representations Using Heterogeneous Training Batch Assembly

In traditional speaker diarization systems, a well-trained speaker model...
research
11/15/2017

Human and Machine Speaker Recognition Based on Short Trivial Events

Trivial events are ubiquitous in human to human conversations, e.g., cou...
research
06/16/2021

Detection of Consonant Errors in Disordered Speech Based on Consonant-vowel Segment Embedding

Speech sound disorder (SSD) refers to a type of developmental disorder i...
