Blind Extraction of Target Speech Source Guided by Supervised Speaker Identification via X-vectors

11/05/2021
by Jiri Malek, et al.

This manuscript proposes a novel robust procedure for extracting a speaker of interest (SOI) from a mixture of audio sources. The estimation of the SOI is blind and is performed via independent vector extraction. A recently proposed constant separating vector (CSV) model is employed, which improves the estimation of moving sources. The blind algorithm is guided towards the SOI by frame-wise speaker identification, which is trained in a supervised manner and is independent of any specific scenario. When processing challenging data, an incorrect speaker may be extracted due to limitations of this guidance. To identify such cases, a criterion that non-intrusively assesses the quality of the estimated SOI is proposed. It utilizes the same model as the speaker identification; therefore, no additional training is required. Using this criterion, a “deflation” approach to extraction is presented: if an incorrect source is estimated, it is subtracted from the mixture and the extraction of the SOI is performed again from the reduced mixture. The proposed procedure is experimentally tested on both artificial and real-world datasets containing challenging phenomena: source movements, reverberation, transient noise, and microphone failures. The presented method performs comparably to state-of-the-art blind algorithms on static mixtures and is more accurate on mixtures containing source movements. Compared to fully supervised methods, the proposed procedure achieves lower accuracy but requires no scenario-specific data for training.
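
The control flow of the deflation approach described above can be sketched as follows. This is a minimal illustration and not the manuscript's implementation. The names `deflation_extract`, `subtract_source`, `extract`, `is_soi`, and `max_rounds` are hypothetical. The blind extractor and the quality criterion are passed in as callables, and the subtraction step is assumed here to be a simple per-channel least-squares projection, which the abstract does not specify.

```python
import numpy as np


def subtract_source(mixture, estimate):
    """Remove the least-squares projection of `estimate` from every channel.

    mixture:  array of shape (channels, samples)
    estimate: array of shape (samples,)
    Assumption: a per-channel projection stands in for the subtraction step.
    """
    energy = np.dot(estimate, estimate)
    if energy == 0.0:
        # Nothing to subtract from the mixture.
        return mixture.copy()
    gains = mixture @ estimate / energy  # one gain per channel
    return mixture - np.outer(gains, estimate)


def deflation_extract(mixture, extract, is_soi, max_rounds=3):
    """Extract, check the estimate, and deflate the mixture if it is wrong.

    extract: hypothetical callable, (channels, samples) -> (samples,)
    is_soi:  hypothetical callable, (samples,) -> bool (quality criterion)
    Returns (estimate, number_of_deflations), or (None, max_rounds) when
    no accepted estimate is found within `max_rounds` attempts.
    """
    residual = np.asarray(mixture, dtype=float)
    for round_idx in range(max_rounds):
        estimate = extract(residual)
        if is_soi(estimate):
            return estimate, round_idx
        # Incorrect source: remove it and retry on the reduced mixture.
        residual = subtract_source(residual, estimate)
    return None, max_rounds
```

In practice, `extract` would wrap the guided blind extraction algorithm and `is_soi` would wrap the speaker-identification-based criterion; `max_rounds` bounds the number of retries, since each deflation removes one source from the mixture.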

research
10/25/2019

Adaptive blind audio source extraction supervised by dominant speaker identification using x-vectors

We propose a novel algorithm for adaptive blind audio source extraction....
research
04/15/2022

Speaker-Aware Mixture of Mixtures Training for Weakly Supervised Speaker Extraction

Dominant researches adopt supervised training for speaker extraction, wh...
research
06/18/2022

Semi-supervised Time Domain Target Speaker Extraction with Attention

In this work, we propose Exformer, a time-domain architecture for target...
research
04/04/2023

Independent Vector Extraction Constrained on Manifold of Half-Length Filters

Independent Vector Analysis (IVA) is a popular extension of Independent ...
research
08/01/2020

Efficient Independent Vector Extraction of Dominant Target Speech

The complete decomposition performed by blind source separation is compu...
research
08/04/2021

Blind and neural network-guided convolutional beamformer for joint denoising, dereverberation, and source separation

This paper proposes an approach for optimizing a Convolutional BeamForme...
research
08/02/2018

Histogram Transform-based Speaker Identification

A novel text-independent speaker identification (SI) method is proposed....
