MC-SpEx: Towards Effective Speaker Extraction with Multi-Scale Interfusion and Conditional Speaker Modulation

06/28/2023
by   Jun Chen, et al.
0

The previous SpEx+ has yielded outstanding performance in speaker extraction and attracted much attention. However, it still encounters inadequate utilization of multi-scale information and speaker embedding. To this end, this paper proposes a new effective speaker extraction system with multi-scale interfusion and conditional speaker modulation (ConSM), which is called MC-SpEx. First of all, we design the weight-share multi-scale fusers (ScaleFusers) for efficiently leveraging multi-scale information as well as ensuring consistency of the model's feature space. Then, to consider different scale information while generating masks, the multi-scale interactive mask generator (ScaleInterMG) is presented. Moreover, we introduce ConSM module to fully exploit speaker embedding in the speech extractor. Experimental results on the Libri2Mix dataset demonstrate the effectiveness of our improvements and the state-of-the-art performance of our proposed MC-SpEx.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
04/17/2020

SpEx: Multi-Scale Time Domain Speaker Extraction Network

Speaker extraction aims to mimic humans' selective auditory attention by...
research
03/30/2022

Multi-scale Speaker Diarization with Dynamic Scale Weighting

Speaker diarization systems are challenged by a trade-off between the te...
research
10/07/2021

Multi-scale speaker embedding-based graph attention networks for speaker diarisation

The objective of this work is effective speaker diarisation using multi-...
research
04/07/2020

Improving Multi-Scale Aggregation Using Feature Pyramid Module for Robust Speaker Verification of Variable-Duration Utterances

Currently, the most widely used approach for speaker verification is the...
research
06/28/2023

Focus on the Sound around You: Monaural Target Speaker Extraction via Distance and Speaker Information

Previously, Target Speaker Extraction (TSE) has yielded outstanding perf...
research
03/17/2022

TMS: A Temporal Multi-scale Backbone Design for Speaker Embedding

Speaker embedding is an important front-end module to explore discrimina...
research
07/06/2020

ResNeXt and Res2Net Structure for Speaker Verification

ResNet-based architecture has been widely adopted as the speaker embeddi...

Please sign up or login with your details

Forgot password? Click here to reset