Target Speaker Verification with Selective Auditory Attention for Single and Multi-talker Speech

03/30/2021
by   Chenglin Xu, et al.

Speaker verification has been studied mostly under the single-talker condition, and its performance degrades in the presence of interfering speakers. Inspired by work on target speaker extraction, e.g., SpEx, we propose a unified speaker verification framework for both single- and multi-talker speech that pays selective auditory attention to the target speaker. This target speaker verification (tSV) framework jointly optimizes a speaker attention module and a speaker representation module via multi-task learning. We study four different target speaker embedding schemes under the tSV framework. The experimental results show that all four target speaker embedding schemes significantly outperform other competitive solutions for multi-talker speech. Notably, the best tSV speaker embedding scheme achieves a 76.0% relative improvement over the baseline system on the WSJ0-2mix-extr and Libri2Mix corpora in terms of equal error rate for 2-talker speech, while the performance of tSV for single-talker speech is on par with that of a traditional speaker verification system trained and evaluated under the same single-talker condition.
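The headline figure is a relative reduction in equal error rate (EER), the operating point where the false acceptance rate equals the false rejection rate. The sketch below is only an illustration of how such a number is typically computed, not the authors' evaluation code: the function names, the threshold sweep, the toy scores, and the example EER values are all assumptions made for this example.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping a threshold over all observed scores."""
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        # False rejection: a target trial scored below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance: a non-target trial scored at or above the threshold.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the threshold where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer


def relative_improvement(baseline_eer, system_eer):
    """Relative EER reduction over a baseline, in percent."""
    return 100.0 * (baseline_eer - system_eer) / baseline_eer


if __name__ == "__main__":
    # Toy similarity scores, invented for illustration only.
    targets = [0.9, 0.8, 0.7, 0.4]
    nontargets = [0.6, 0.3, 0.2, 0.1]
    print(equal_error_rate(targets, nontargets))

    # Hypothetical EER values (not taken from the paper).
    print(relative_improvement(10.0, 2.4))
```

A relative improvement is measured against the baseline's own error rate, so the same percentage can correspond to very different absolute EER reductions on different corpora.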

research
02/07/2019

Target Speaker Extraction for Overlapped Multi-Talker Speaker Verification

The performance of speaker verification degrades significantly when the ...
research
06/01/2023

Speaker verification using attentive multi-scale convolutional recurrent network

In this paper, we propose a speaker verification method by an Attentive ...
research
05/03/2023

Improved Vocal Effort Transfer Vector Estimation for Vocal Effort-Robust Speaker Verification

Despite the maturity of modern speaker verification technology, its perf...
research
03/30/2022

Multi-target Filter and Detector for Unknown-number Speaker Diarization

A strong representation of a target speaker can aid in extracting import...
research
02/24/2022

Closing the Gap between Single-User and Multi-User VoiceFilter-Lite

VoiceFilter-Lite is a speaker-conditioned voice separation model that pl...
research
07/13/2020

DNN Speaker Tracking with Embeddings

In multi-speaker applications it is common to have pre-computed models from...
research
05/23/2018

Modeling Interpersonal Influence of Verbal Behavior in Couples Therapy Dyadic Interactions

Dyadic interactions among humans are marked by speakers continuously inf...
