MarbleNet: Deep 1D Time-Channel Separable Convolutional Neural Network for Voice Activity Detection

10/26/2020
by Fei Jia, et al.

We present MarbleNet, an end-to-end neural network for Voice Activity Detection (VAD). MarbleNet is a deep residual network composed of blocks of 1D time-channel separable convolution, batch normalization, ReLU, and dropout layers. Compared to a state-of-the-art VAD model, MarbleNet achieves similar performance with roughly one-tenth of the parameters. We further conduct extensive ablation studies on different training methods and parameter choices to study the robustness of MarbleNet in real-world VAD tasks.
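
To illustrate why time-channel separable convolutions tend to be parameter-efficient, the sketch below counts the weights in a standard 1D convolution and in a separable one (a depthwise convolution over time followed by a pointwise 1x1 convolution across channels). The channel counts and kernel width are hypothetical values chosen for illustration and are not taken from the paper. The resulting per-layer ratio is not the source of the model-level comparison stated in the abstract.

```python
def standard_conv1d_params(c_in, c_out, kernel_size):
    """Weight count of a regular 1D convolution (bias omitted)."""
    return c_in * c_out * kernel_size


def separable_conv1d_params(c_in, c_out, kernel_size):
    """Weight count of a time-channel separable 1D convolution (bias omitted).

    Depthwise part: one length-`kernel_size` filter per input channel.
    Pointwise part: a 1x1 convolution that mixes channels.
    """
    depthwise = c_in * kernel_size
    pointwise = c_in * c_out
    return depthwise + pointwise


# Hypothetical layer sizes, chosen only for illustration.
c_in, c_out, kernel_size = 64, 64, 13

standard = standard_conv1d_params(c_in, c_out, kernel_size)
separable = separable_conv1d_params(c_in, c_out, kernel_size)

print(f"standard:  {standard} weights")
print(f"separable: {separable} weights")
print(f"ratio:     {standard / separable:.1f}x")
```

With these assumed sizes the standard layer has 53248 weights and the separable layer has 4928. The saving grows with the kernel width, because the kernel width multiplies only the small depthwise term.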


