Speaker Adapted Beamforming for Multi-Channel Automatic Speech Recognition

06/19/2018
by Tobias Menne, et al.

This paper presents, in the context of multi-channel ASR, a method to adapt a mask-based, statistically optimal beamforming approach to a speaker of interest. The beamforming vector of the statistically optimal beamformer is computed from speech and noise masks, which are estimated by a neural network. The proposed adaptation approach integrates the beamformer, including the mask estimation network, with the acoustic model of the ASR system. This allows the training error from the acoustic modeling cost function to be propagated all the way through the beamforming operation and the mask estimation network. Using the results of a first-pass recognition and keeping all other parameters fixed, the mask estimation network can therefore be fine-tuned by retraining. Utterances of a speaker of interest can thus be used in a two-pass approach to optimize the beamforming for the speech characteristics of that specific speaker. This approach is shown to improve the ASR performance of a state-of-the-art multi-channel ASR system on the CHiME-4 data. Furthermore, the effect of the adaptation on the estimated speech masks is discussed.
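
To make the mask-based beamforming step concrete, the following is a minimal NumPy sketch of how speech and noise masks can be turned into a beamforming vector. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say which statistically optimal beamformer is used, so an MVDR-style formulation with a reference channel stands in here, and the masks are taken as given arrays rather than produced by a neural network. The function names (`masked_psd`, `mvdr_weights`, `apply_beamformer`) and the diagonal-loading constant are hypothetical.

```python
import numpy as np


def masked_psd(Y, mask, eps=1e-10):
    """Mask-weighted spatial covariance matrix for each frequency bin.

    Y:    complex STFT of the array signal, shape (F, T, C)
    mask: real-valued time-frequency mask in [0, 1], shape (F, T)
    Returns an array of shape (F, C, C).
    """
    # Phi[f] = sum_t mask[f, t] * y[f, t] y[f, t]^H / sum_t mask[f, t]
    num = np.einsum("ft,ftc,ftd->fcd", mask, Y, Y.conj())
    den = np.maximum(mask.sum(axis=1), eps)
    return num / den[:, None, None]


def mvdr_weights(phi_s, phi_n, ref=0, diag_load=1e-6):
    """MVDR-style beamforming vectors (assumed beamformer, see lead-in).

    w[f] = (Phi_n^{-1} Phi_s) u / trace(Phi_n^{-1} Phi_s),
    where u selects the reference channel `ref`.
    Returns an array of shape (F, C).
    """
    C = phi_n.shape[-1]
    # Diagonal loading keeps the noise covariance well conditioned.
    load = diag_load * np.trace(phi_n, axis1=1, axis2=2).real / C
    phi_n = phi_n + load[:, None, None] * np.eye(C)
    G = np.linalg.solve(phi_n, phi_s)        # Phi_n^{-1} Phi_s, shape (F, C, C)
    tr = np.trace(G, axis1=1, axis2=2)       # shape (F,)
    return G[:, :, ref] / tr[:, None]


def apply_beamformer(w, Y):
    """Beamformer output x[f, t] = w[f]^H y[f, t], shape (F, T)."""
    return np.einsum("fc,ftc->ft", w.conj(), Y)
```

Because every step from the masks to the beamformer output is a differentiable array operation, an automatic differentiation framework could in principle pass the gradient of an acoustic-model cost back to whatever network produces `mask`, which is the property the adaptation scheme described above relies on.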

