Learning-based personal speech enhancement for teleconferencing by exploiting spatial-spectral features

12/10/2021
by   Yicheng Hsu, et al.
0

Teleconferencing is becoming essential during the COVID-19 pandemic. However, in real-world applications, speech quality can deteriorate due to, for example, background interference, noise, or reverberation. To solve this problem, target speech extraction from the mixture signals can be performed with the aid of the user's vocal features. Various features are accounted for in this study's proposed system, including speaker embeddings derived from user enrollment and a novel long-short-term spatial coherence (LSTSC) feature to the target speaker activity. As a learning-based approach, a target speech sifting network was employed to extract the target speech signal. The network trained with LSTSC in the proposed approach is robust to microphone array geometries and the number of microphones. Furthermore, the proposed enhancement system was compared with a baseline system with speaker embeddings and interchannel phase difference. The results demonstrated the superior performance of the proposed system over the baseline in enhancement performance and robustness.

READ FULL TEXT
research
07/17/2022

Multi-channel target speech enhancement based on ERB-scaled spatial coherence features

Recently, speech enhancement technologies that are based on deep learnin...
research
11/16/2022

Array Configuration-Agnostic Personalized Speech Enhancement using Long-Short-Term Spatial Coherence

Personalized speech enhancement has been a field of active research for ...
research
09/10/2023

Gray Jedi MVDR Post-filtering

Spatial filters can exploit deep-learning-based speech enhancement model...
research
04/05/2021

Self-Supervised Learning for Personalized Speech Enhancement

Speech enhancement systems can show improved performance by adapting the...
research
02/15/2023

Multi-Channel Target Speaker Extraction with Refinement: The WavLab Submission to the Second Clarity Enhancement Challenge

This paper describes our submission to the Second Clarity Enhancement Ch...
research
04/16/2019

Joined Audio-Visual Speech Enhancement and Recognition in the Cocktail Party: The Tug Of War Between Enhancement and Recognition Losses

In this paper we propose an end-to-end LSTM-based model that performs si...
research
02/01/2020

Analysis of Deep Feature Loss based Enhancement for Speaker Verification

Data augmentation is conventionally used to inject robustness in Speaker...

Please sign up or login with your details

Forgot password? Click here to reset