Spatial Processing Front-End For Distant ASR Exploiting Self-Attention Channel Combinator

03/25/2022
by Dushyant Sharma, et al.

We present a novel multi-channel front-end for tackling the distant ASR problem. It is based on channel shortening with the Weighted Prediction Error (WPE) method, followed by a fixed MVDR beamformer used in combination with a recently proposed self-attention-based channel combination (SACC) scheme. We show that the proposed system, used as part of a ContextNet-based end-to-end (E2E) ASR system, outperforms leading ASR systems, as demonstrated by a 21.6% relative WER reduction on a multi-channel LibriSpeech playback dataset. We also show that dereverberation prior to beamforming is beneficial, and we compare the WPE method with a modified neural channel shortening approach. An analysis of the non-intrusive estimate of the signal C50 confirms that the 8-channel WPE method provides significant dereverberation of the signals (13.6 dB improvement). Finally, we show that the weights of the SACC system allow the extraction of accurate spatial information, which can be beneficial for other speech processing applications such as diarization.
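The channel-combination idea in the abstract can be illustrated with a minimal sketch. This is not the authors' SACC implementation: in SACC the per-channel scores come from a learned self-attention module, whereas here they are simply passed in as a hypothetical `scores` argument. The function names `softmax` and `combine_channels` and the toy data layout are assumptions made for illustration. The sketch only shows how a softmax over channels yields weights that sum to one and how those weights form a combined feature frame.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of per-channel scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def combine_channels(frames, scores):
    """Combine one time frame of multi-channel features into a single frame.

    frames: list of C channels, each a list of F feature values
            (for example, log-magnitude spectrum bins) for the same time frame.
    scores: list of C per-channel scores. Hypothetical input: in SACC these
            would be produced by a learned self-attention module.
    Returns (combined_frame, weights).
    """
    weights = softmax(scores)  # one weight per channel, summing to 1
    n_bins = len(frames[0])
    combined = [
        sum(w * channel[f] for w, channel in zip(weights, frames))
        for f in range(n_bins)
    ]
    return combined, weights


# Toy usage: two channels with equal scores are averaged.
combined, weights = combine_channels([[1.0, 2.0], [3.0, 4.0]], [0.0, 0.0])
print(weights)
print(combined)
```

Because the weights are normalized across channels, inspecting them per frame shows which microphone dominates the combination. That is the kind of information the abstract suggests could serve spatial analysis tasks such as diarization.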

research
07/02/2021

Dual Causal/Non-Causal Self-Attention for Streaming End-to-End Speech Recognition

Attention-based end-to-end automatic speech recognition (ASR) systems ha...
research
09/10/2021

Self-Attention Channel Combinator Frontend for End-to-End Multichannel Far-field Speech Recognition

When a sufficiently large far-field training data is presented, jointly ...
research
04/07/2021

Capturing Multi-Resolution Context by Dilated Self-Attention

Self-attention has become an important and widely used neural network co...
research
02/18/2019

Self-Attention Aligner: A Latency-Control End-to-End Model for ASR Using Self-Attention Network and Chunk-Hopping

Self-attention network, an attention-based feedforward neural network, h...
research
03/31/2022

Exploiting Single-Channel Speech for Multi-Channel End-to-End Speech Recognition: A Comparative Study

Recently, the end-to-end training approach for multi-channel ASR has sho...
research
09/23/2021

ChannelAugment: Improving generalization of multi-channel ASR by training with input channel randomization

End-to-end (E2E) multi-channel ASR systems show state-of-the-art perform...
research
02/18/2021

Gaussian Kernelized Self-Attention for Long Sequence Data and Its Application to CTC-based Speech Recognition

Self-attention (SA) based models have recently achieved significant perf...
