TENET: A Time-reversal Enhancement Network for Noise-robust ASR

by Fu-An Chao, et al.

Driven by breakthroughs in deep learning, speech enhancement (SE) techniques have developed rapidly and now play an important role as a front end to acoustic modeling, mitigating the effects of noise on speech. To improve the perceptual quality of speech, current state-of-the-art SE methods adopt adversarial training in which the discriminator is tied to an objective metric. However, there is no guarantee that optimizing the perceptual quality of speech will lead to improved automatic speech recognition (ASR) performance. In this study, we present TENET, a novel Time-reversal Enhancement NETwork, which leverages a transformation of the noisy input signal itself, namely its time-reversed version, in conjunction with a siamese network and a complex dual-path transformer to improve SE performance for noise-robust ASR. Extensive experiments on the Voicebank-DEMAND dataset show that TENET achieves state-of-the-art results compared with a few top-of-the-line methods on both SE and ASR evaluation metrics. To demonstrate the model's generalization ability, we further evaluate TENET on a test set contaminated with unseen noise, and the results again confirm the advantage of the method.
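The time-reversal idea in the abstract can be illustrated with a small sketch. This is not the TENET implementation: `toy_enhancer` is a hypothetical stand-in (a moving-average filter) for the shared-weight siamese branch, and both the mean-squared consistency measure and the averaging of the two branch outputs are assumptions made purely for illustration. The abstract does not say how the two views are combined.

```python
import numpy as np


def time_reverse(signal):
    """Return the time-reversed copy of a 1-D waveform."""
    return signal[::-1].copy()


def toy_enhancer(signal, kernel_size=5):
    """Hypothetical stand-in for an SE model: a centered moving average."""
    kernel = np.ones(kernel_size) / kernel_size
    return np.convolve(signal, kernel, mode="same")


def siamese_time_reversal(noisy, enhancer):
    """Run one shared enhancer on the input and on its time-reversed view.

    Returns the averaged output (an assumed fusion rule) and the mean
    squared difference between the two re-aligned branch outputs.
    """
    forward_out = enhancer(noisy)
    reversed_out = enhancer(time_reverse(noisy))
    # Flip the second branch back so both outputs share one time axis.
    realigned = time_reverse(reversed_out)
    consistency = float(np.mean((forward_out - realigned) ** 2))
    fused = 0.5 * (forward_out + realigned)
    return fused, consistency


# Synthetic example: a 5 Hz sine with additive Gaussian noise.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 800, endpoint=False)
clean = np.sin(2 * np.pi * 5 * t)
noisy = clean + 0.3 * rng.standard_normal(t.shape)

fused, consistency = siamese_time_reversal(noisy, toy_enhancer)
mse_noisy = float(np.mean((noisy - clean) ** 2))
mse_fused = float(np.mean((fused - clean) ** 2))
```

Because the toy filter is symmetric in time, its two branches agree almost exactly and `consistency` is close to zero. A learned enhancer is generally not symmetric in this way, which is why the time-reversed view can carry additional information.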




Cross-domain Single-channel Speech Enhancement Model with Bi-projection Fusion Module for Noise-robust ASR

In recent decades, many studies have suggested that phase information is...

Speech enhancement guided by contextual articulatory information

Previous studies have confirmed the effectiveness of leveraging articula...

How Bad Are Artifacts?: Analyzing the Impact of Speech Enhancement Errors on ASR

It is challenging to improve automatic speech recognition (ASR) performa...

CMGAN: Conformer-based Metric GAN for Speech Enhancement

Recently, convolution-augmented transformer (Conformer) has achieved pro...

Task-aware Warping Factors in Mask-based Speech Enhancement

This paper proposes the use of two task-aware warping factors in mask-ba...

Perceptual Contrast Stretching on Target Feature for Speech Enhancement

Speech enhancement (SE) performance has improved considerably since the ...

SERIL: Noise Adaptive Speech Enhancement using Regularization-based Incremental Learning

Numerous noise adaptation techniques have been proposed to address the m...