rVAD: An Unsupervised Segment-Based Robust Voice Activity Detection Method

06/09/2019
by Zheng-Hua Tan, et al.

This paper presents an unsupervised segment-based method for robust voice activity detection (rVAD). The method consists of two passes of denoising followed by a voice activity detection (VAD) stage. In the first pass, high-energy segments in a speech signal are detected using an a posteriori signal-to-noise ratio (SNR) weighted energy difference; if no pitch is detected within a segment, it is treated as a high-energy noise segment and set to zero. In the second pass, the speech signal is denoised by a speech enhancement method, for which several methods are explored. Next, neighbouring frames with pitch are grouped together to form pitch segments, and based on speech statistics, the pitch segments are further extended from both ends in order to include both voiced and unvoiced sounds and likely non-speech parts as well. Finally, the a posteriori SNR weighted energy difference is applied to the extended pitch segments of the denoised speech signal to detect voice activity. We evaluate the VAD performance of the proposed method using two databases, RATS and Aurora-2, which contain a large variety of noise conditions. The rVAD method is further evaluated, in terms of speaker verification performance, on the RedDots 2016 challenge database and its noise-corrupted versions. Experimental results show that rVAD compares favourably with a number of existing methods. In addition, we present a modified version of rVAD in which computationally intensive pitch extraction is replaced by computationally efficient spectral flatness calculation. The modified version significantly reduces computational complexity at the cost of moderately inferior VAD performance, an advantageous trade-off when processing large amounts of data or running on low-resource devices. The source code of rVAD is made publicly available.
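
Two of the steps named in the abstract can be illustrated with a minimal sketch. This is not the released rVAD code. The function names, the fixed `extend` margin (the paper derives the extension from speech statistics, which are not given here) and the `eps` floor are all assumptions made for illustration. The first function groups neighbouring pitched frames into segments and extends them at both ends. The second computes the standard spectral flatness measure, the ratio of the geometric mean to the arithmetic mean of a power spectrum.

```python
import math


def group_pitch_segments(pitch_flags, extend=0):
    """Group consecutive pitched frames into (start, end) segments.

    `end` is exclusive. Each segment is then widened by `extend` frames
    on both sides (a hypothetical fixed margin), and segments that touch
    or overlap after widening are merged.
    """
    n = len(pitch_flags)
    segments = []
    start = None
    for i, flag in enumerate(pitch_flags):
        if flag and start is None:
            start = i                      # a pitched run begins
        elif not flag and start is not None:
            segments.append((start, i))    # the run ended at frame i - 1
            start = None
    if start is not None:
        segments.append((start, n))        # run reaches the last frame

    merged = []
    for s, e in segments:
        s, e = max(0, s - extend), min(n, e + extend)
        if merged and s <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged


def spectral_flatness(power_spectrum, eps=1e-12):
    """Geometric mean divided by arithmetic mean of a power spectrum.

    Values near 1 indicate a flat, noise-like spectrum; values near 0
    indicate a peaky, tonal spectrum. `eps` avoids log(0).
    """
    values = [p + eps for p in power_spectrum]
    log_mean = sum(math.log(v) for v in values) / len(values)
    return math.exp(log_mean) / (sum(values) / len(values))


if __name__ == "__main__":
    flags = [0, 1, 1, 0, 0, 0, 1, 0]       # toy per-frame pitch decisions
    print(group_pitch_segments(flags))
    print(group_pitch_segments(flags, extend=1))
    print(spectral_flatness([1.0, 1.0, 1.0, 1.0]))
    print(spectral_flatness([1.0, 0.0, 0.0, 0.0]))
```

How the modified rVAD thresholds the flatness value is not stated in the abstract, so the sketch stops at computing the measure.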


