
CNN self-attention voice activity detector

by Amit Sofer, et al.

In this work we present a novel single-channel Voice Activity Detector (VAD) approach. We utilize a Convolutional Neural Network (CNN), which exploits the spatial information of the noisy input spectrum to extract a frame-wise embedding sequence, followed by a Self Attention (SA) Encoder whose goal is to find contextual information in the embedding sequence. Unlike previous works, which were applied to each frame (with context frames) separately, our method is capable of processing the entire signal at once, thus enabling a long receptive field. We show that the fusion of the CNN and SA architectures outperforms methods based solely on a CNN or on SA. An extensive experimental study shows that our model outperforms previous models on real-life benchmarks and provides State Of The Art (SOTA) results with a relatively small and lightweight model.
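As a rough illustration of the pipeline described above, and not the authors' implementation, the following pure-Python sketch embeds each spectrum frame with a small convolution over frequency bins, then lets a single-head self-attention layer mix information across all frames of the signal before a per-frame sigmoid output. Every name here (`conv_embed`, `self_attention`, `vad_forward`, `random_params`), the layer sizes, and the random untrained weights are assumptions made for illustration. The abstract does not specify the architecture details, so things like multi-head attention, positional encoding, normalization, and training are simply omitted.

```python
import math
import random


def conv_embed(frame, kernels):
    """Embed one spectrum frame: 1-D conv over frequency bins, ReLU, global max-pool."""
    embedding = []
    for kernel in kernels:
        width = len(kernel)
        responses = [
            max(0.0, sum(kernel[j] * frame[i + j] for j in range(width)))
            for i in range(len(frame) - width + 1)
        ]
        embedding.append(max(responses))  # one feature per kernel
    return embedding


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a row-major matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def self_attention(sequence, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over the whole sequence."""
    queries = [matvec(w_q, x) for x in sequence]
    keys = [matvec(w_k, x) for x in sequence]
    values = [matvec(w_v, x) for x in sequence]
    scale = math.sqrt(len(queries[0]))
    output = []
    for q in queries:
        # Every frame attends to every other frame, so the receptive
        # field spans the entire signal.
        weights = softmax(
            [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        )
        output.append(
            [sum(w * v[d] for w, v in zip(weights, values))
             for d in range(len(values[0]))]
        )
    return output


def vad_forward(spectrogram, params):
    """Map a list of spectrum frames to one speech probability per frame."""
    embeddings = [conv_embed(frame, params["kernels"]) for frame in spectrogram]
    context = self_attention(
        embeddings, params["w_q"], params["w_k"], params["w_v"]
    )
    logits = [
        sum(w * c for w, c in zip(params["w_out"], vec)) + params["b_out"]
        for vec in context
    ]
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]


def random_params(n_kernels=4, kernel_width=3, seed=0):
    """Hypothetical untrained weights, only to make the sketch runnable."""
    rng = random.Random(seed)

    def rand_matrix(rows, cols):
        return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

    return {
        "kernels": rand_matrix(n_kernels, kernel_width),
        "w_q": rand_matrix(n_kernels, n_kernels),
        "w_k": rand_matrix(n_kernels, n_kernels),
        "w_v": rand_matrix(n_kernels, n_kernels),
        "w_out": rand_matrix(1, n_kernels)[0],
        "b_out": 0.0,
    }


if __name__ == "__main__":
    rng = random.Random(1)
    # Toy "spectrogram": 6 frames with 8 frequency bins each.
    spectrogram = [[rng.random() for _ in range(8)] for _ in range(6)]
    print([round(p, 3) for p in vad_forward(spectrogram, random_params())])
```

Because the weights are random, the printed probabilities carry no meaning. The point is only the data flow: the attention step sees all frames together, in contrast to frame-by-frame classifiers that look at a fixed window of context frames.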


Voice and accompaniment separation in music using self-attention convolutional neural network

Music source separation has been a popular topic in signal processing fo...

Double Path Networks for Sequence to Sequence Learning

Encoder-decoder based Sequence to Sequence learning (S2S) has made remar...

Capturing Multi-Resolution Context by Dilated Self-Attention

Self-attention has become an important and widely used neural network co...

DiSAN: Directional Self-Attention Network for RNN/CNN-Free Language Understanding

Recurrent neural nets (RNN) and convolutional neural nets (CNN) are wide...

Automatic Lyrics Transcription using Dilated Convolutional Neural Networks with Self-Attention

Speech recognition is a well developed research field so that the curren...

Multi-Field De-interlacing using Deformable Convolution Residual Blocks and Self-Attention

Although deep learning has made significant impact on image/video restor...