Speaker-Aware Mixture of Mixtures Training for Weakly Supervised Speaker Extraction

04/15/2022
by Zifeng Zhao, et al.

Most existing work trains speaker extraction models with full supervision, while the scarcity of ideally clean corpora and the channel mismatch problem are rarely considered. To this end, we propose speaker-aware mixture of mixtures training (SAMoM), which utilizes the consistency of speaker identity among the target source, the enrollment utterance and the target estimate to weakly supervise the training of a deep speaker extractor. In SAMoM, the input is constructed by mixing different speaker-aware mixtures (SAMs), each containing multiple speakers whose identities are known and whose enrollment utterances are available. Informed by the enrollment utterances, the target speech signals are extracted from the input one by one, such that the estimated targets, once remixed in accordance with the identity consistency, approximate the original SAMs. Moreover, using SAMoM in a semi-supervised setting with a certain amount of clean sources enables application in noisy scenarios. Extensive experiments on Libri2Mix show that the proposed method achieves promising results without access to any clean sources (11.06 dB SI-SDRi). With domain adaptation, our approach even outperforms the supervised framework in a cross-domain evaluation on AISHELL-1.
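The remix-based objective described above can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names (`si_sdr`, `samom_loss`), the `membership` list, the use of negative SI-SDR as the training loss, and the toy "extractors" are all assumptions made for illustration. A real system would use a trained neural extractor conditioned on enrollment utterances.

```python
import numpy as np

def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant SDR in dB between two 1-D signals."""
    reference = reference - reference.mean()
    estimate = estimate - estimate.mean()
    # Project the estimate onto the reference to remove scale dependence.
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = scale * reference
    noise = estimate - target
    return 10 * np.log10((np.dot(target, target) + eps) / (np.dot(noise, noise) + eps))

def samom_loss(extractor, sams, enrollments, membership):
    """Hypothetical SAMoM-style loss (assumed form, not from the paper).

    sams:        array (num_sams, T), the speaker-aware mixtures
    enrollments: one enrollment signal per speaker
    membership:  membership[k] is the index of the SAM containing speaker k
    extractor:   callable (mixture, enrollment) -> estimated target signal
    """
    mom = sams.sum(axis=0)  # mixture of mixtures fed to the extractor
    remixed = np.zeros_like(sams)
    for k, enrollment in enumerate(enrollments):
        # Extract speakers one by one, then add each estimate back
        # into the SAM that its speaker identity belongs to.
        remixed[membership[k]] += extractor(mom, enrollment)
    # Only the SAMs (never the clean sources) serve as references.
    return -np.mean([si_sdr(remixed[i], sams[i]) for i in range(len(sams))])

# Toy data: four random "speakers" grouped into two SAMs.
rng = np.random.default_rng(0)
sources = rng.standard_normal((4, 16000))
membership = [0, 0, 1, 1]
sams = np.stack([sources[0] + sources[1], sources[2] + sources[3]])

# Toy stand-ins for a trained extractor (assumptions for the demo only):
# the "oracle" returns the enrollment, which here is the clean source itself;
# the "naive" one returns an equal share of the input mixture.
oracle_loss = samom_loss(lambda mix, enr: enr, sams, list(sources), membership)
naive_loss = samom_loss(lambda mix, enr: mix / 4, sams, list(sources), membership)
print(oracle_loss, naive_loss)
```

In this sketch the loss never touches the individual clean sources, only the SAMs, which is the sense in which the supervision is weak. The oracle extractor yields a much lower loss than the naive one, since its remixed estimates reproduce the SAMs.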

