A Unified Deep Learning Framework for Short-Duration Speaker Verification in Adverse Environments

10/06/2020
by Youngmoon Jung, et al.

Speaker verification (SV) has recently attracted considerable research interest due to the growing popularity of virtual assistants. At the same time, SV systems face an increasingly important requirement: they should be robust to short speech segments, especially in noisy and reverberant environments. In this paper, we consider one more requirement that matters for practical applications: the system should be robust to an audio stream containing long non-speech segments when voice activity detection (VAD) is not applied. To meet these two requirements, we introduce feature pyramid module (FPM)-based multi-scale aggregation (MSA) and self-adaptive soft VAD (SAS-VAD). The FPM-based MSA handles short speech segments in noisy and reverberant environments, and the SAS-VAD increases robustness to long non-speech segments. To further improve robustness to acoustic distortions (i.e., noise and reverberation), we apply a masking-based speech enhancement (SE) method. We combine the SV, VAD, and SE models in a unified deep learning framework and jointly train the entire network in an end-to-end manner. To the best of our knowledge, this is the first work to combine these three models in a deep learning framework. We conduct experiments on the Korean indoor (KID) and VoxCeleb datasets, both corrupted by noise and reverberation. The results show that the proposed method is effective for SV in these challenging conditions and outperforms the baseline i-vector and deep speaker embedding systems.
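
The abstract does not specify how the SE mask or the soft VAD enter the SV network, so the following is only a minimal sketch of the two general ideas, not the authors' implementation. It assumes that masking-based SE scales each time-frequency bin by a value in (0, 1), and that a soft VAD supplies per-frame speech probabilities used as weights when pooling frame-level features into one utterance-level vector. The function names `apply_mask` and `soft_vad_pool`, the sigmoid mask, and the weighted-mean pooling are all illustrative assumptions; the self-adaptive part of SAS-VAD and the FPM-based MSA are not modeled here.

```python
import math


def sigmoid(x):
    """Squash a real-valued logit into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def apply_mask(features, mask_logits):
    """Masking-based enhancement (assumed form): scale every
    time-frequency bin by a mask value in (0, 1).

    features, mask_logits: lists of frames, each a list of floats.
    """
    return [
        [f * sigmoid(m) for f, m in zip(frame, logits)]
        for frame, logits in zip(features, mask_logits)
    ]


def soft_vad_pool(frames, speech_probs, eps=1e-8):
    """Soft-VAD pooling (assumed form): weighted mean over time, where
    each frame's weight is its speech probability. Frames judged to be
    non-speech contribute little to the utterance-level vector.
    """
    total = sum(speech_probs) + eps  # eps avoids division by zero
    dim = len(frames[0])
    return [
        sum(p * frame[d] for p, frame in zip(speech_probs, frames)) / total
        for d in range(dim)
    ]


# Hypothetical toy input: two speech frames followed by one loud non-speech frame.
frames = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
speech_probs = [1.0, 1.0, 0.0]
pooled = soft_vad_pool(frames, speech_probs)
```

With a speech probability of zero on the third frame, the pooled vector is determined by the first two frames only, which is the behavior a soft VAD is meant to provide when a stream contains long non-speech segments. Because the weights are continuous rather than hard 0/1 decisions, such pooling stays differentiable, which is one reason a soft VAD fits naturally into end-to-end joint training.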


