Direction-Aware Adaptive Online Neural Speech Enhancement with an Augmented Reality Headset in Real Noisy Conversational Environments

07/15/2022
by Kouhei Sekiguchi, et al.

This paper describes the practical, response- and performance-aware development of online speech enhancement for an augmented reality (AR) headset that helps a user follow conversations in real noisy echoic environments (e.g., a cocktail party). One could use a state-of-the-art blind source separation method called fast multichannel nonnegative matrix factorization (FastMNMF), which works well in various environments thanks to its unsupervised nature; its heavy computational cost, however, prevents its application to real-time processing. In contrast, a supervised beamforming method that uses a deep neural network (DNN) to estimate the spatial information of speech and noise readily fits real-time processing, but suffers from drastic performance degradation under mismatched conditions. Given these complementary characteristics, we propose a dual-process robust online speech enhancement method based on DNN-based beamforming with FastMNMF-guided adaptation. FastMNMF (back end) is performed in a mini-batch style, and the resulting pairs of noisy and enhanced speech are used, together with the original parallel training data, to update the direction-aware DNN (front end) via backpropagation at a computationally allowable interval. This method is combined with a blind dereverberation method called weighted prediction error (WPE) to transcribe, in a streaming manner, the noisy reverberant speech of a speaker, who can be detected from video or selected by the user's hand gesture or eye gaze, and to spatially display the transcriptions with an AR technique. Our experiment showed that run-time adaptation using only twelve minutes of observation improved the word error rate by more than 10 points.
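The dual-process idea can be illustrated with a deliberately tiny sketch (hypothetical toy code, not the authors' implementation): a lightweight "front end" enhancer runs in real time, a slower "back end" separator standing in for FastMNMF labels mini-batches of observed noisy audio, and those pairs are pooled with the original parallel training data to adapt the front end by gradient descent at a computationally allowable interval. Here the front end is reduced to a single learnable gain and the back end to a fixed halving, purely to show the data flow.

```python
def front_end(w, x):
    """Online front end: per-sample gain w applied to noisy input x
    (stand-in for the direction-aware DNN beamformer)."""
    return [w * xi for xi in x]

def back_end(x):
    """Stand-in for the mini-batch back end (e.g. FastMNMF): here we
    pretend it recovers the target speech by halving the noisy signal."""
    return [0.5 * xi for xi in x]

def adapt(w, pairs, lr=0.01, epochs=200):
    """Update the front-end parameter w by gradient descent on the
    mean-squared error between front-end output and target signals."""
    for _ in range(epochs):
        grad, n = 0.0, 0
        for x, t in pairs:
            for xi, ti in zip(x, t):
                grad += 2.0 * (w * xi - ti) * xi  # d/dw of (w*x - t)^2
                n += 1
        w -= lr * grad / n
    return w

# Original parallel training data (noisy, clean) plus run-time pairs
# labelled by the back end, as in the FastMNMF-guided adaptation.
train_pairs = [([1.0, 2.0], [0.5, 1.0])]
noisy_runtime = [2.0, 4.0]
runtime_pairs = [(noisy_runtime, back_end(noisy_runtime))]

w = 1.0  # mismatched initial front end
w = adapt(w, train_pairs + runtime_pairs)
print(round(w, 2))  # converges toward the 0.5 gain implied by the targets
```

In the actual system, `adapt` corresponds to backpropagation through the DNN, and the interval at which it runs trades adaptation speed against compute budget on the headset.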


Related research

- DNN-Free Low-Latency Adaptive Speech Enhancement Based on Frame-Online Beamforming Powered by Block-Online FastMNMF (07/22/2022)
- Direction-Aware Joint Adaptation of Neural Speech Enhancement and Recognition in Real Multiparty Conversational Environments (07/15/2022)
- Unsupervised Speech Enhancement Based on Multichannel NMF-Informed Beamforming for Noise-Robust Automatic Speech Recognition (03/22/2019)
- EasyCom: An Augmented Reality Dataset to Support Algorithms for Easy Communication in Noisy Environments (07/09/2021)
- Lightweight Speech Enhancement in Unseen Noisy and Reverberant Conditions using KISS-GEV Beamforming (10/06/2021)
- Block-Online Multi-Channel Speech Enhancement Using DNN-Supported Relative Transfer Function Estimates (05/09/2019)
