BASEN: Time-Domain Brain-Assisted Speech Enhancement Network with Convolutional Cross Attention in Multi-talker Conditions

05/17/2023
by   Jie Zhang, et al.

Time-domain single-channel speech enhancement (SE) remains challenging in multi-talker conditions, where the target speaker must be extracted without any prior information. Auditory attention decoding has shown that the brain activity of the listener contains auditory information about the attended speaker. In this paper, we therefore propose a novel time-domain brain-assisted SE network (BASEN) that incorporates electroencephalography (EEG) signals recorded from the listener to extract the target speaker from monaural speech mixtures. The proposed BASEN is based on the fully-convolutional time-domain audio separation network. To fully leverage the complementary information contained in the EEG signals, we further propose a convolutional multi-layer cross attention module to fuse the dual-branch features. Experimental results on a public dataset show that the proposed model outperforms the state-of-the-art method on several evaluation metrics. The reproducible code is available at https://github.com/jzhangU/Basen.git.
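
To make the idea of fusing dual-branch features more concrete, the following is a minimal sketch of generic scaled dot-product cross attention, in which features from one branch (here standing in for audio) query features from the other (standing in for EEG). It is not the convolutional multi-layer module proposed in the paper: it has no convolutions, no learned projections, and only a single layer. The function names `softmax` and `cross_attention` and the toy feature values are assumptions made purely for illustration.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Generic scaled dot-product cross attention (illustrative only).

    queries: T_q frames of dimension d from one branch (e.g. audio).
    keys, values: T_k frames from the other branch (e.g. EEG);
    keys must have dimension d, values may have any dimension.
    Returns T_q fused frames, each a weighted average of the value frames.
    """
    scale = 1.0 / math.sqrt(len(queries[0]))
    value_dim = len(values[0])
    fused = []
    for q in queries:
        # Similarity of this query frame to every key frame.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        # Attention-weighted average of the value frames.
        fused.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(value_dim)]
        )
    return fused


# Hypothetical toy features: 3 audio frames and 2 EEG frames, dimension 2.
audio_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
eeg_feats = [[0.5, 0.5], [1.0, -1.0]]
fused = cross_attention(audio_feats, eeg_feats, eeg_feats)
```

Each fused frame has the same length as the query sequence, so the output stays aligned with the audio branch while carrying information drawn from the EEG branch. A dual-branch design would typically also apply the attention in the opposite direction, but whether and how BASEN does so is not stated in the abstract.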

research
05/10/2020

Cognitive-driven convolutional beamforming using EEG-based auditory attention decoding

The performance of speech enhancement algorithms in a multi-speaker scen...
research
09/14/2023

M3-AUDIODEC: Multi-channel multi-speaker multi-spatial audio codec

We introduce M3-AUDIODEC, an innovative neural spatial audio codec desig...
research
02/20/2023

Personalized speech enhancement combining band-split RNN and speaker attentive module

Target speaker information can be utilized in speech enhancement (SE) mo...
research
11/10/2022

Speech Enhancement with Fullband-Subband Cross-Attention Network

FullSubNet has shown its promising performance on speech enhancement by ...
research
11/08/2022

Cross-Attention is all you need: Real-Time Streaming Transformers for Personalised Speech Enhancement

Personalised speech enhancement (PSE), which extracts only the speech of...
research
11/22/2019

Time-Domain Multi-modal Bone/air Conducted Speech Enhancement

Integrating modalities, such as video signals with speech, has been show...
research
09/07/2023

DGSD: Dynamical Graph Self-Distillation for EEG-Based Auditory Spatial Attention Detection

Auditory Attention Detection (AAD) aims to detect target speaker from br...
