Convolutional Recurrent Neural Network with Attention for 3D Speech Enhancement

06/08/2023
by Han Yin, et al.

3D speech enhancement can effectively improve the auditory experience and plays a crucial role in augmented reality technology. However, traditional convolution-based speech enhancement methods have limitations in extracting dynamic voice information. In this paper, we incorporate a dual-path recurrent neural network block into the U-Net to iteratively extract dynamic audio information in both the time and frequency domains. We also propose an attention mechanism to fuse the original signal, reference signal, and generated masks. Moreover, we introduce a loss function that simultaneously optimizes the network in the time-frequency and time domains. Experimental results show that our system outperforms the state-of-the-art systems on the dataset of the ICASSP L3DAS23 challenge.
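
The abstract does not give the form of the joint loss, so the following is only a rough sketch of the general idea of penalizing error in the time domain and the time-frequency domain at once. Everything specific here is an assumption for illustration: the L1 distance for both terms, the weight `alpha`, the frame and hop sizes, and the function names `stft_magnitude` and `joint_loss`. It uses a naive DFT from the Python standard library rather than any particular audio toolkit.

```python
import cmath
import math


def stft_magnitude(signal, frame_len=8, hop=4):
    """Magnitude spectrogram from a naive DFT over Hann-windowed frames."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        # Keep only the non-negative frequency bins (0 .. frame_len // 2).
        bins = []
        for k in range(frame_len // 2 + 1):
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            bins.append(abs(acc))
        frames.append(bins)
    return frames


def mean_abs_error(a, b):
    """Mean absolute (L1) difference between two equal-length sequences."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def joint_loss(estimate, target, alpha=0.5, frame_len=8, hop=4):
    """Weighted sum of a time-domain L1 term and a T-F magnitude L1 term.

    Assumes both signals have the same length, at least frame_len samples.
    alpha is a hypothetical weight, not a value taken from the paper.
    """
    time_term = mean_abs_error(estimate, target)
    est_mag = stft_magnitude(estimate, frame_len, hop)
    tgt_mag = stft_magnitude(target, frame_len, hop)
    flat_est = [v for frame in est_mag for v in frame]
    flat_tgt = [v for frame in tgt_mag for v in frame]
    tf_term = mean_abs_error(flat_est, flat_tgt)
    return alpha * time_term + (1 - alpha) * tf_term


# Toy usage: a clean sinusoid and a slightly perturbed copy of it.
clean = [math.sin(2 * math.pi * n / 8) for n in range(32)]
noisy = [x + 0.1 * ((-1) ** n) for n, x in enumerate(clean)]
loss_value = joint_loss(noisy, clean)
```

In a trainable system the same structure would be written with a differentiable STFT so that gradients flow through both terms; the pure-Python version above only illustrates how the two domains contribute to a single scalar.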

research
02/02/2020

Single Channel Speech Enhancement Using Temporal Convolutional Recurrent Neural Networks

In recent decades, neural network based methods have significantly impro...
research
11/16/2018

Using recurrences in time and frequency within U-net architecture for speech enhancement

When designing fully-convolutional neural network, there is a trade-off ...
research
11/08/2020

Frequency Gating: Improved Convolutional Neural Networks for Speech Enhancement in the Time-Frequency Domain

One of the strengths of traditional convolutional neural networks (CNNs)...
research
05/15/2023

ForkNet: Simultaneous Time and Time-Frequency Domain Modeling for Speech Enhancement

Previous research in speech enhancement has mostly focused on modeling t...
research
04/15/2019

RHR-Net: A Residual Hourglass Recurrent Neural Network for Speech Enhancement

Most current speech enhancement models use spectrogram features that req...
research
08/18/2019

A Dual-Staged Context Aggregation Method Towards Efficient End-To-End Speech Enhancement

In speech enhancement, an end-to-end deep neural network converts a nois...
research
01/30/2022

HGCN: harmonic gated compensation network for speech enhancement

Mask processing in the time-frequency (T-F) domain through the neural ne...
