Guided Speech Enhancement Network

03/13/2023
by   Yang Yang, et al.
0

High quality speech capture has been widely studied for both voice communication and human computer interface reasons. To improve the capture performance, we can often find multi-microphone speech enhancement techniques deployed on various devices. Multi-microphone speech enhancement problem is often decomposed into two decoupled steps: a beamformer that provides spatial filtering and a single-channel speech enhancement model that cleans up the beamformer output. In this work, we propose a speech enhancement solution that takes both the raw microphone and beamformer outputs as the input for an ML model. We devise a simple yet effective training scheme that allows the model to learn from the cues of the beamformer by contrasting the two inputs and greatly boost its capability in spatial rejection, while conducting the general tasks of denoising and dereverberation. The proposed solution takes advantage of classical spatial filtering algorithms instead of competing with them. By design, the beamformer module then could be selected separately and does not require a large amount of data to be optimized for a given form factor, and the network model can be considered as a standalone module which is highly transferable independently from the microphone array. We name the ML module in our solution as GSENet, short for Guided Speech Enhancement Network. We demonstrate its effectiveness on real world data collected on multi-microphone devices in terms of the suppression of noise and interfering speech.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
10/27/2021

Closing the Gap Between Time-Domain Multi-Channel Speech Enhancement on Real and Simulation Conditions

The deep learning based time-domain models, e.g. Conv-TasNet, have shown...
research
09/19/2023

Efficient Multi-Channel Speech Enhancement with Spherical Harmonics Injection for Directional Encoding

Multi-channel speech enhancement extracts speech using multiple micropho...
research
10/31/2018

On Single-Channel Speech Enhancement and On Non-Linear Modulation-Domain Kalman Filtering

This report focuses on algorithms that perform single-channel speech enh...
research
09/19/2023

Hierarchical Modeling of Spatial Cues via Spherical Harmonics for Multi-Channel Speech Enhancement

Multi-channel speech enhancement utilizes spatial information from multi...
research
03/14/2022

TaylorBeamformer: Learning All-Neural Beamformer for Multi-Channel Speech Enhancement from Taylor's Approximation Theory

While existing end-to-end beamformers achieve impressive performance in ...
research
10/17/2022

spatial-dccrn: dccrn equipped with frame-level angle feature and hybrid filtering for multi-channel speech enhancement

Recently, multi-channel speech enhancement has drawn much interest due t...
research
11/16/2022

Array Configuration-Agnostic Personalized Speech Enhancement using Long-Short-Term Spatial Coherence

Personalized speech enhancement has been a field of active research for ...

Please sign up or login with your details

Forgot password? Click here to reset