Cross-domain Single-channel Speech Enhancement Model with Bi-projection Fusion Module for Noise-robust ASR

08/26/2021
by   Fu-An Chao, et al.
0

In recent decades, many studies have suggested that phase information is crucial for speech enhancement (SE), and time-domain single-channel speech enhancement techniques have shown promise in noise suppression and robust automatic speech recognition (ASR). This paper presents a continuation of the above lines of research and explores two effective SE methods that consider phase information in time domain and frequency domain of speech signals, respectively. Going one step further, we put forward a novel cross-domain speech enhancement model and a bi-projection fusion (BPF) mechanism for noise-robust ASR. To evaluate the effectiveness of our proposed method, we conduct an extensive set of experiments on the publicly-available Aishell-1 Mandarin benchmark speech corpus. The evaluation results confirm the superiority of our proposed method in relation to a few current top-of-the-line time-domain and frequency-domain SE methods in both enhancement and ASR evaluation metrics for the test set of scenarios contaminated with seen and unseen noise, respectively.

READ FULL TEXT
research
03/09/2020

Improving noise robust automatic speech recognition with single-channel time-domain enhancement network

With the advent of deep learning, research on noise-robust automatic spe...
research
07/04/2021

TENET: A Time-reversal Enhancement Network for Noise-robust ASR

Due to the unprecedented breakthroughs brought about by deep learning, s...
research
10/20/2020

Investigating Cross-Domain Losses for Speech Enhancement

Recent years have seen a surge in the number of available frameworks for...
research
08/27/2021

Task-aware Warping Factors in Mask-based Speech Enhancement

This paper proposes the use of two task-aware warping factors in mask-ba...
research
07/29/2022

Domain Specific Wav2vec 2.0 Fine-tuning For The SE R 2022 Challenge

This paper presents our efforts to build a robust ASR model for the shar...
research
09/02/2015

Enhancement and Recognition of Reverberant and Noisy Speech by Extending Its Coherence

Most speech enhancement algorithms make use of the short-time Fourier tr...
research
07/19/2022

ESPnet-SE++: Speech Enhancement for Robust Speech Recognition, Translation, and Understanding

This paper presents recent progress on integrating speech separation and...

Please sign up or login with your details

Forgot password? Click here to reset