Attention-based scaling adaptation for target speech extraction

10/19/2020
by Jiangyu Han, et al.

Target speech extraction has attracted widespread attention in recent years; however, research on improving the target speaker clues is still limited. In this work, we investigate the dynamic interaction between different mixtures and the target speaker in order to exploit discriminative target speaker clues. We propose a special attention mechanism in a scaling adaptation layer to better adapt the network towards extracting the target speech. Furthermore, by introducing a mixture embedding matrix pooling method, our proposed attention-based scaling adaptation (ASA) can exploit the target speaker clues more efficiently. Experimental results on the spatialized reverberant WSJ0 2-mix dataset demonstrate that the proposed method significantly improves target speech extraction performance. Moreover, we find that, under the same network configuration, ASA in the single-channel condition achieves performance gains competitive with those obtained from two-channel mixtures with inter-microphone phase difference (IPD) features.
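
The abstract does not give the exact formulation of ASA, so the following is only a rough illustrative sketch of the general idea, not the authors' implementation. It assumes a frame-level mixture embedding matrix of shape (T, D) and a fixed target speaker embedding of shape (D,). The function name `attention_scaling_adaptation`, the dot-product scoring, and the way the pooled mixture vector is combined with the speaker embedding are all hypothetical choices made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def attention_scaling_adaptation(mixture_emb, speaker_emb):
    """Hypothetical sketch of an attention-weighted scaling adaptation.

    mixture_emb: (T, D) frame-level embeddings of the mixture (assumed shape).
    speaker_emb: (D,) embedding of the target speaker (assumed shape).
    Returns the adapted (T, D) features and the (T,) attention weights.
    """
    d = mixture_emb.shape[1]

    # Assumption: scaled dot-product scores between each mixture frame
    # and the target speaker embedding.
    scores = mixture_emb @ speaker_emb / np.sqrt(d)   # (T,)
    weights = softmax(scores)                         # (T,), sums to 1

    # Assumption: attention pooling collapses the mixture embedding
    # matrix into a single mixture-dependent vector.
    pooled = weights @ mixture_emb                    # (D,)

    # Assumption: the speaker clue is modulated by the pooled vector,
    # so the clue depends on the current mixture.
    clue = speaker_emb * pooled                       # (D,)

    # Scaling adaptation: element-wise scaling of every frame by the clue.
    adapted = mixture_emb * clue                      # (T, D)
    return adapted, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mix = rng.standard_normal((50, 8))   # 50 frames, 8-dim embeddings
    spk = rng.standard_normal(8)
    adapted, weights = attention_scaling_adaptation(mix, spk)
    print(adapted.shape, weights.shape)
```

The point of the sketch is only that the scaling vector applied to the network features is no longer the fixed speaker embedding alone but a quantity that changes with each mixture; the full text should be consulted for the actual layer design.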

research
01/23/2020

Improving speaker discrimination of target speech extraction with time-domain SpeakerBeam

Target speech extraction, which extracts a single target source in a mix...
research
03/15/2023

Beamformer-Guided Target Speaker Extraction

We propose a Beamformer-guided Target Speaker Extraction (BG-TSE) method...
research
02/02/2021

Multimodal Attention Fusion for Target Speaker Extraction

Target speaker extraction, which aims at extracting a target speaker's v...
research
04/17/2020

SpEx: Multi-Scale Time Domain Speaker Extraction Network

Speaker extraction aims to mimic humans' selective auditory attention by...
research
01/14/2021

Speaker activity driven neural speech extraction

Target speech extraction, which extracts the speech of a target speaker ...
research
08/01/2020

Efficient Independent Vector Extraction of Dominant Target Speech

The complete decomposition performed by blind source separation is compu...
research
11/03/2022

Dynamic Kernels and Channel Attention with Multi-Layer Embedding Aggregation for Speaker Verification

State-of-the-art speaker verification frameworks have typically focused ...
