ChannelAugment: Improving generalization of multi-channel ASR by training with input channel randomization

09/23/2021
by   Marco Gaudesi, et al.

End-to-end (E2E) multi-channel ASR systems show state-of-the-art performance in far-field ASR tasks by jointly training a multi-channel front-end along with the ASR model. The main limitation of such systems is that they are usually trained with data from a fixed array geometry, which can degrade accuracy when a different array is used in testing. This makes it challenging to deploy these systems in practice, as it is costly to retrain and deploy different models for various array configurations. To address this, we present a simple and effective data augmentation technique, which randomly drops channels in the multi-channel audio input during training in order to improve robustness to various array configurations at test time. We call this technique ChannelAugment, in contrast to SpecAugment (SA), which drops time and/or frequency components of a single-channel input audio. We apply ChannelAugment to the Spatial Filtering (SF) and Minimum Variance Distortionless Response (MVDR) neural beamforming approaches. For SF, we observe 10.6% [...] different numbers of microphones. For MVDR, we achieve a 74% reduction in training time without causing degradation of recognition accuracy.
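
The channel-dropping idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `channel_augment`, the `min_channels` parameter, and the policy of drawing the number of surviving channels uniformly are all assumptions made here for illustration. The abstract does not say whether dropped channels are removed or zeroed out; this sketch removes them.

```python
import random


def channel_augment(channels, rng, min_channels=1):
    """Keep a random subset of channels from a multi-channel utterance.

    `channels` is a list with one sample sequence per microphone.
    Returns the surviving channels and their original indices.
    The sampling policy is a hypothetical choice, not taken from the paper.
    """
    num_channels = len(channels)
    if not 1 <= min_channels <= num_channels:
        raise ValueError("min_channels must be between 1 and the channel count")
    # Draw how many channels survive (both bounds inclusive).
    num_keep = rng.randint(min_channels, num_channels)
    # Draw which channels survive, keeping the original microphone order.
    kept = sorted(rng.sample(range(num_channels), num_keep))
    return [channels[i] for i in kept], kept


# Toy 4-microphone utterance with 8 samples per channel.
rng = random.Random(0)
utterance = [[float(mic * 10 + t) for t in range(8)] for mic in range(4)]
augmented, kept = channel_augment(utterance, rng, min_channels=2)
```

In a training loop, a function like this would be applied to each utterance before the front-end, so the model sees a different channel subset at every step. Zeroing out the dropped channels instead of removing them would keep the input shape fixed, which may matter for front-ends that expect a constant channel count.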

