Self-Adaptive Soft Voice Activity Detection using Deep Neural Networks for Robust Speaker Verification

09/26/2019
by Youngmoon Jung, et al.

Voice activity detection (VAD), which classifies frames as speech or non-speech, is an important module in many speech applications including speaker verification. In this paper, we propose a novel method, called self-adaptive soft VAD, to incorporate a deep neural network (DNN)-based VAD into a deep speaker embedding system. The proposed method is a combination of the following two approaches. The first approach is soft VAD, which performs a soft selection of frame-level features extracted from a speaker feature extractor. The frame-level features are weighted by their corresponding speech posteriors estimated from the DNN-based VAD, and then aggregated to generate a speaker embedding. The second approach is self-adaptive VAD, which fine-tunes the pre-trained VAD on the speaker verification data to reduce the domain mismatch. Here, we introduce two unsupervised domain adaptation (DA) schemes, namely speech posterior-based DA (SP-DA) and joint learning-based DA (JL-DA). Experiments on a Korean speech database demonstrate that the verification performance is improved significantly in real-world environments by using self-adaptive soft VAD.
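
The soft selection described in the abstract can be illustrated with a minimal sketch. It assumes the aggregation is a posterior-weighted mean over frames, which the abstract does not specify, so the paper's actual pooling may differ. The function name `soft_vad_pool`, the `eps` parameter, and the array shapes are hypothetical and chosen only for illustration.

```python
import numpy as np


def soft_vad_pool(features, speech_posteriors, eps=1e-8):
    """Aggregate frame-level features into one embedding-like vector.

    Assumption (not from the paper): aggregation is a weighted mean,
    with each frame weighted by its speech posterior from the VAD.

    features:          array of shape (num_frames, feat_dim)
    speech_posteriors: array of shape (num_frames,), values in [0, 1]
    """
    features = np.asarray(features, dtype=float)
    weights = np.asarray(speech_posteriors, dtype=float)
    if features.ndim != 2 or weights.shape != (features.shape[0],):
        raise ValueError("expected features (T, D) and posteriors (T,)")

    # Scale each frame by its speech posterior, then sum over time.
    weighted_sum = (weights[:, None] * features).sum(axis=0)

    # Normalize by the total posterior mass; eps guards against an
    # utterance in which every frame is judged non-speech.
    return weighted_sum / (weights.sum() + eps)


# Three frames: two likely speech, one likely non-speech.
frames = np.array([[1.0, 1.0], [3.0, 3.0], [100.0, 100.0]])
posteriors = np.array([1.0, 1.0, 0.0])
pooled = soft_vad_pool(frames, posteriors)
```

In this toy input the third frame has posterior 0, so it contributes nothing to the pooled vector. A hard VAD would drop that frame outright, whereas the soft weighting lets frames with intermediate posteriors contribute in proportion to their estimated speech probability.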

research
08/12/2019

Personal VAD: Speaker-Conditioned Voice Activity Detection

In this paper, we propose "personal VAD", a system to detect the voice a...
research
10/28/2022

Laugh Betrays You? Learning Robust Speaker Representation From Speech Containing Non-Verbal Fragments

The success of automatic speaker verification shows that discriminative ...
research
08/13/2020

MLNET: An Adaptive Multiple Receptive-field Attention Neural Network for Voice Activity Detection

Voice activity detection (VAD) makes a distinction between speech and no...
research
06/28/2022

Speaker Verification in Multi-Speaker Environments Using Temporal Feature Fusion

Verifying the identity of a speaker is crucial in modern human-machine i...
research
05/08/2020

Multi-Task Network for Noise-Robust Keyword Spotting and Speaker Verification using CTC-based Soft VAD and Global Query Attention

Keyword spotting (KWS) and speaker verification (SV) have been studied i...
research
09/24/2021

Optimized Power Normalized Cepstral Coefficients towards Robust Deep Speaker Verification

After their introduction to robust speech recognition, power normalized ...
research
05/03/2023

Improved Vocal Effort Transfer Vector Estimation for Vocal Effort-Robust Speaker Verification

Despite the maturity of modern speaker verification technology, its perf...
