A Comparative Re-Assessment of Feature Extractors for Deep Speaker Embeddings

07/30/2020
by Xuechen Liu, et al.

Modern automatic speaker verification relies largely on deep neural networks (DNNs) trained on mel-frequency cepstral coefficient (MFCC) features. Alternative feature extraction methods based on phase, prosody, and long-term temporal operations exist, but they have not been extensively studied with DNN-based methods. We aim to fill this gap by providing an extensive re-assessment of 14 feature extractors on the VoxCeleb and SITW datasets. Our findings reveal that features equipped with techniques such as spectral centroids, the group delay function, and integrated noise suppression provide promising alternatives to MFCCs for deep speaker embedding extraction. Experimental results demonstrate relative decreases in equal error rate (EER) of up to 16.3% (VoxCeleb) and 25.1% (SITW) compared with the baseline.
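The gains above are stated as relative decreases in equal error rate (EER), the operating point where the false-acceptance rate (FAR) equals the false-rejection rate (FRR). The following is a minimal pure-Python sketch of how those two quantities can be computed from verification trial scores. It assumes that a higher score means "same speaker" and approximates the EER at the observed threshold where FAR and FRR are closest, without interpolation. The function names `compute_eer` and `relative_reduction` are hypothetical and are not taken from the paper's toolkit; the scores in the usage lines are toy values, not results from the paper.

```python
from typing import Sequence


def compute_eer(target_scores: Sequence[float], nontarget_scores: Sequence[float]) -> float:
    """Approximate the equal error rate (EER) from verification trial scores.

    Assumption: a higher score means "same speaker". The EER is taken as the
    mean of FAR and FRR at the observed threshold where the two rates are
    closest (a simple approximation; no interpolation between thresholds).
    """
    if not target_scores or not nontarget_scores:
        raise ValueError("need at least one target and one non-target score")
    n_tar = len(target_scores)
    n_non = len(nontarget_scores)
    best_gap = float("inf")
    best_eer = 1.0
    # Try every observed score as a threshold; a trial is accepted if score >= threshold.
    for thr in sorted(set(target_scores) | set(nontarget_scores)):
        far = sum(s >= thr for s in nontarget_scores) / n_non  # impostor trials accepted
        frr = sum(s < thr for s in target_scores) / n_tar      # genuine trials rejected
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap = gap
            best_eer = (far + frr) / 2
    return best_eer


def relative_reduction(baseline: float, proposed: float) -> float:
    """Relative decrease of `proposed` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - proposed) / baseline


if __name__ == "__main__":
    # Toy scores for illustration only.
    print(compute_eer([0.9, 0.8, 0.7, 0.4], [0.1, 0.3, 0.5, 0.6]))  # 0.25
    print(relative_reduction(2.0, 1.5))  # 25.0
```

Under this definition, a 16.3% relative decrease means the new EER equals 83.7% of the baseline EER, regardless of the baseline's absolute value. Toolkits often interpolate along the ROC curve instead of picking the nearest observed threshold, which gives slightly different EER values on small trial lists.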

research
05/10/2017

Deep Speaker Feature Learning for Text-independent Speaker Verification

Recently deep neural networks (DNNs) have been used to learn speaker fea...
research
09/24/2021

Optimized Power Normalized Cepstral Coefficients towards Robust Deep Speaker Verification

After their introduction to robust speech recognition, power normalized ...
research
09/08/2018

Dual-label Deep LSTM Dereverberation For Speaker Verification

In this paper, we present a reverberation removal approach for speaker v...
research
02/20/2021

Learnable MFCCs for Speaker Verification

We propose a learnable mel-frequency cepstral coefficient (MFCC) fronten...
research
02/10/2022

Learnable Nonlinear Compression for Robust Speaker Verification

In this study, we focus on nonlinear compression methods in spectral fea...
research
06/28/2023

Long-term Conversation Analysis: Exploring Utility and Privacy

The analysis of conversations recorded in everyday life requires privacy...
research
08/09/2017

Speaker Diarization using Deep Recurrent Convolutional Neural Networks for Speaker Embeddings

In this paper we propose a new method of speaker diarization that employ...