Unsupervised Neural Mask Estimator For Generalized Eigen-Value Beamforming Based ASR

11/28/2019
by Rohit Kumar, et al.

State-of-the-art methods for acoustic beamforming in multi-channel ASR are based on a neural mask estimator that predicts the presence of speech and noise. These models are trained on a paired corpus of clean and noisy recordings (teacher model). In this paper, we attempt to remove the requirement of supervised clean recordings for training the mask estimator. Instead, models based on signal enhancement and beamforming using multi-channel linear prediction provide the required mask estimates. In this way, model training can also be carried out on real recordings of noisy speech, rather than on simulated recordings alone, as is done for a typical teacher model. Several experiments on noisy and reverberant environments in the CHiME-3 corpus and the REVERB challenge corpus highlight the effectiveness of the proposed approach. The ASR results for the proposed approach are significantly better than those of a teacher model trained on an out-of-domain dataset and on par with those of oracle mask estimators trained on the in-domain dataset.
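
To make the role of the mask estimates concrete, the following is a minimal sketch of how speech and noise masks are commonly turned into a generalized eigenvalue (GEV) beamformer. It is not the authors' implementation: the abstract gives no formulas, so the function names (`masked_psd`, `gev_weights`, `apply_beamformer`), the array layout (frequency, time, channel), and the diagonal loading constant are all assumptions made for illustration. The masks here can come from any source; in the paper they would be predicted by the neural mask estimator.

```python
import numpy as np

def masked_psd(stft, mask, eps=1e-10):
    """Mask-weighted spatial covariance per frequency bin.

    stft: complex array of shape (F, T, C); mask: real array of shape (F, T).
    Returns an array of shape (F, C, C). (Layout is an assumption.)
    """
    weighted = stft * mask[..., None]
    psd = np.einsum("ftc,ftd->fcd", weighted, stft.conj())
    return psd / (mask.sum(axis=1)[:, None, None] + eps)

def gev_weights(psd_speech, psd_noise, diag_load=1e-6):
    """Principal generalized eigenvector of (psd_speech, psd_noise) per bin.

    Solved by whitening with a Cholesky factor of the noise covariance.
    The diagonal loading value is a hypothetical choice for stability.
    """
    n_ch = psd_noise.shape[-1]
    trace = np.trace(psd_noise, axis1=1, axis2=2).real
    load = diag_load * (trace / n_ch + 1e-12)
    psd_noise = psd_noise + load[:, None, None] * np.eye(n_ch)
    chol = np.linalg.cholesky(psd_noise)            # psd_noise = L L^H
    chol_inv = np.linalg.inv(chol)
    chol_inv_h = chol_inv.conj().transpose(0, 2, 1)
    whitened = chol_inv @ psd_speech @ chol_inv_h   # L^-1 psd_speech L^-H
    whitened = 0.5 * (whitened + whitened.conj().transpose(0, 2, 1))
    _, vecs = np.linalg.eigh(whitened)              # ascending eigenvalues
    principal = vecs[..., -1]                       # shape (F, C)
    # Map back from the whitened domain: w = L^-H v
    return np.einsum("fcd,fd->fc", chol_inv_h, principal)

def apply_beamformer(weights, stft):
    """Compute y[f, t] = w[f]^H x[f, t]."""
    return np.einsum("fc,ftc->ft", weights.conj(), stft)
```

A typical call would compute `masked_psd(stft, speech_mask)` and `masked_psd(stft, noise_mask)`, pass both to `gev_weights`, and then filter the multi-channel STFT with `apply_beamformer`. GEV weights are only defined up to a complex scale per frequency bin, so practical systems usually add a normalization step afterwards, which this sketch omits.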


