Unsupervised Speech Enhancement Based on Multichannel NMF-Informed Beamforming for Noise-Robust Automatic Speech Recognition

03/22/2019
by Kazuki Shimada, et al.

This paper describes multichannel speech enhancement for improving automatic speech recognition (ASR) in noisy environments. Recently, minimum variance distortionless response (MVDR) beamforming has been widely used because it works well if the steering vector of speech and the spatial covariance matrix (SCM) of noise are given. To estimate such spatial information, conventional studies take a supervised approach that classifies each time-frequency (TF) bin as noise or speech by training a deep neural network (DNN). The ASR performance of such methods, however, degrades in unknown noisy environments. To solve this problem, we take an unsupervised approach that decomposes each TF bin into the sum of speech and noise by using multichannel nonnegative matrix factorization (MNMF). This enables us to accurately estimate the SCMs of speech and noise not from observed noisy mixtures but from separated speech and noise components. In this paper, we propose online MVDR beamforming that effectively initializes and incrementally updates the parameters of MNMF. Another main contribution is a comprehensive investigation of the ASR performance obtained by various types of spatial filters, i.e., time-invariant and time-variant versions of MVDR beamformers and those of rank-1 and full-rank multichannel Wiener filters, in combination with MNMF. The experimental results showed that the proposed method outperformed the state-of-the-art DNN-based beamforming method in unknown environments that did not match the training data.
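As a rough illustration of the MVDR beamformer referred to above, the following is a minimal sketch of time-invariant MVDR weights for a single frequency bin, w = R_n^{-1} h / (h^H R_n^{-1} h), where h is the speech steering vector and R_n is the noise SCM. It assumes both SCMs are already available: in the paper they are estimated from MNMF-separated speech and noise components, whereas here they are synthetic. Taking the steering vector as the principal eigenvector of the speech SCM (normalized to the first microphone) is a common choice assumed here, and all function names are hypothetical, not the authors' implementation.

```python
import numpy as np

def steering_vector_from_scm(speech_scm):
    """Hypothetical helper: principal eigenvector of a Hermitian speech SCM (M x M),
    scaled so the reference microphone (index 0) has unit gain."""
    _, eigvecs = np.linalg.eigh(speech_scm)  # eigenvalues come back in ascending order
    h = eigvecs[:, -1]                       # eigenvector of the largest eigenvalue
    return h / h[0]

def mvdr_weights(steering, noise_scm):
    """MVDR weights for one frequency bin: w = R_n^{-1} h / (h^H R_n^{-1} h)."""
    rn_inv_h = np.linalg.solve(noise_scm, steering)
    return rn_inv_h / (steering.conj() @ rn_inv_h)

def apply_beamformer(weights, x):
    """Time-invariant filtering y(t) = w^H x(t); x has shape (M, T)."""
    return weights.conj() @ x

# Synthetic example (assumed data, not from the paper): 4 mics, 100 frames.
rng = np.random.default_rng(0)
M, T = 4, 100
h_true = rng.standard_normal(M) + 1j * rng.standard_normal(M)
speech_scm = np.outer(h_true, h_true.conj())           # rank-1 speech SCM
A = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
noise_scm = A @ A.conj().T + 1e-3 * np.eye(M)          # Hermitian positive definite

h = steering_vector_from_scm(speech_scm)
w = mvdr_weights(h, noise_scm)
x = rng.standard_normal((M, T)) + 1j * rng.standard_normal((M, T))
y = apply_beamformer(w, x)

# Distortionless constraint: w^H h should be close to 1.
print(w.conj() @ h)

# Compare output noise power with a simple distortionless matched filter h / (h^H h).
w_matched = h / np.real(h.conj() @ h)
mvdr_noise = np.real(w.conj() @ noise_scm @ w)
matched_noise = np.real(w_matched.conj() @ noise_scm @ w_matched)
print(mvdr_noise <= matched_noise)  # expected True: MVDR minimizes noise power under the constraint
```

Both filters pass speech arriving with steering vector h unchanged, but MVDR exploits the noise SCM to suppress more noise; a time-variant version would simply recompute R_n (and possibly h) per frame.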
