MLNET: An Adaptive Multiple Receptive-field Attention Neural Network for Voice Activity Detection

08/13/2020
by   Zhenpeng Zheng, et al.
0

Voice activity detection (VAD) makes a distinction between speech and non-speech and its performance is of crucial importance for speech based services. Recently, deep neural network (DNN)-based VADs have achieved better performance than conventional signal processing methods. The existed DNNbased models always handcrafted a fixed window to make use of the contextual speech information to improve the performance of VAD. However, the fixed window of contextual speech information can't handle various unpredicatable noise environments and highlight the critical speech information to VAD task. In order to solve this problem, this paper proposed an adaptive multiple receptive-field attention neural network, called MLNET, to finish VAD task. The MLNET leveraged multi-branches to extract multiple contextual speech information and investigated an effective attention block to weight the most crucial parts of the context for final classification. Experiments in real-world scenarios demonstrated that the proposed MLNET-based model outperformed other baselines.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
09/26/2019

Self-Adaptive Soft Voice Activity Detection using Deep Neural Networks for Robust Speaker Verification

Voice activity detection (VAD), which classifies frames as speech or non...
research
05/20/2020

Statistical and Neural Network Based Speech Activity Detection in Non-Stationary Acoustic Environments

Speech activity detection (SAD), which often rests on the fact that the ...
research
02/03/2020

End-to-End Automatic Speech Recognition Integrated With CTC-Based Voice Activity Detection

This paper integrates a voice activity detection (VAD) function with end...
research
03/04/2013

Denoising Deep Neural Networks Based Voice Activity Detection

Recently, the deep-belief-networks (DBN) based voice activity detection ...
research
10/23/2020

Speech Activity Detection Based on Multilingual Speech Recognition System

To better model the contextual information and increase the generalizati...
research
05/17/2020

Voice Activity Detection Scheme by Combining DNN Model with GMM Model

Due to the superior modeling ability of deep neural network (DNN), it is...
research
06/25/2023

PrimaDNN': A Characteristics-aware DNN Customization for Singing Technique Detection

Professional vocalists modulate their voice timbre or pitch to make thei...

Please sign up or login with your details

Forgot password? Click here to reset