SG-VAD: Stochastic Gates Based Speech Activity Detection

10/28/2022
by Jonathan Svirsky et al.

We propose a novel voice activity detection (VAD) model for low-resource environments. Our key idea is to model VAD as a denoising task and to construct a network designed to identify nuisance features for a speech classification task. We train the model to identify irrelevant features while simultaneously predicting the type of speech event. Our model contains only 7.8K parameters, outperforms previously proposed methods on the AVA-Speech evaluation set, and achieves comparable results on the HAVIC dataset. We present its architecture, experimental results, and an ablation study of the model's components. The code and models are available at https://www.github.com/jsvir/vad.
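The abstract does not spell out how the gates work. The following is a minimal sketch under one assumption: that each gate follows the Gaussian-relaxed stochastic gates used for feature selection by Yamada et al. (2020). In that scheme, z_d = clip(mu_d + eps_d, 0, 1) with eps_d ~ N(0, sigma^2), and the penalty sum_d Phi(mu_d / sigma) counts the expected number of open gates. All function names, the per-frame feature list, sigma = 0.5, lam = 0.1, and the binary speech/non-speech label are hypothetical simplifications. This is not SG-VAD's actual architecture or training code.

```python
import math
import random


def normal_cdf(x):
    """Standard normal CDF, Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def stochastic_gates(mu, sigma=0.5, training=True, rng=random):
    """Hypothetical helper: one gate in [0, 1] per feature.

    Training:  z_d = clip(mu_d + eps_d, 0, 1) with eps_d ~ N(0, sigma^2).
    Inference: the noise is dropped, so z_d = clip(mu_d, 0, 1).
    """
    gates = []
    for m in mu:
        eps = rng.gauss(0.0, sigma) if training else 0.0
        gates.append(min(1.0, max(0.0, m + eps)))
    return gates


def gate_regularizer(mu, sigma=0.5):
    """Expected number of open gates: sum_d P(z_d > 0) = sum_d Phi(mu_d / sigma)."""
    return sum(normal_cdf(m / sigma) for m in mu)


def apply_gates(features, gates):
    """Element-wise product; a closed gate (z_d = 0) removes a nuisance feature."""
    return [x * z for x, z in zip(features, gates)]


def joint_loss(p_speech, label, mu, lam=0.1, sigma=0.5):
    """Hypothetical joint objective: binary cross-entropy + lam * gate regularizer.

    Assumes a binary speech (1) / non-speech (0) label for simplicity.
    """
    p = min(max(p_speech, 1e-7), 1.0 - 1e-7)  # clamp to avoid log(0)
    bce = -(label * math.log(p) + (1 - label) * math.log(1.0 - p))
    return bce + lam * gate_regularizer(mu, sigma)


if __name__ == "__main__":
    features = [0.9, 0.1, 0.8, 0.05]  # hypothetical feature frame
    mu = [1.2, -0.8, 0.5, -1.5]       # hypothetical learned gate means
    z = stochastic_gates(mu, training=False)
    print(z)                          # [1.0, 0.0, 0.5, 0.0]
    print(apply_gates(features, z))   # [0.9, 0.0, 0.4, 0.0]
```

At inference, any feature whose gate mean satisfies mu_d <= 0 is switched off completely. During training, the injected noise keeps the gates differentiable with respect to mu, and the regularizer pushes the model to keep open only the features that help the speech classifier. This is one way to read "identify irrelevant features while predicting the type of speech event."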
