Speech and Noise Dual-Stream Spectrogram Refine Network with Speech Distortion Loss for Robust Speech Recognition

05/29/2023
by Haoyu Lu, et al.

In recent years, joint training of a speech enhancement front-end and an automatic speech recognition (ASR) back-end has been widely used to improve the robustness of ASR systems. Traditional joint training methods use only the enhanced speech as input to the back-end. However, it is difficult for speech enhancement systems to separate speech directly from the noisy input, because noise varies widely in type and intensity. Furthermore, speech distortion and residual noise are often observed in enhanced speech, and speech and noise are distorted differently. Most existing methods address this issue by fusing enhanced and noisy features. In this paper, we propose a dual-stream spectrogram refine network that simultaneously refines the speech and the noise and decouples the noise from the noisy input. Our proposed method can achieve better performance with a relative 8.6

research
11/09/2020

Gated Recurrent Fusion with Joint Training Framework for Robust End-to-End Speech Recognition

The joint training framework for speech enhancement and recognition meth...
research
03/25/2022

Speech-enhanced and Noise-aware Networks for Robust Speech Recognition

Compensation for channel mismatch and noise interference is essential fo...
research
12/11/2021

Perceptual Loss with Recognition Model for Single-Channel Enhancement and Robust ASR

Single-channel speech enhancement approaches do not always improve autom...
research
10/22/2018

Investigation of Independent Monaural Front-End Processing for Robust ASR without Retraining and Joint-Training

In recent years, monaural speech separation has been formulated as a sup...
research
10/22/2018

Investigation of Monaural Front-End Processing for Robust ASR without Retraining or Joint-Training

In recent years, monaural speech separation has been formulated as a sup...
research
07/26/2019

Correlation Distance Skip Connection Denoising Autoencoder (CDSK-DAE) for Speech Feature Enhancement

Performance of learning based Automatic Speech Recognition (ASR) is susc...
research
02/24/2021

Thoughts on the potential to compensate a hearing loss in noise

The effect of hearing impairment on speech perception was described by P...
