Supervised Speaker Embedding De-Mixing in Two-Speaker Environment

01/14/2020
by Yanpei Shi, et al.

In this work, a speaker embedding de-mixing approach is proposed. Instead of separating a two-speaker signal in signal space, as speech source separation does, the proposed approach separates the different speaker properties of a two-speaker signal in embedding space. The approach contains two steps. In step one, clean speaker embeddings are learned and collected by a residual TDNN-based network. In step two, the two-speaker signal and the embedding of one of the speakers are input to a speaker embedding de-mixing network. The de-mixing network is trained with a reconstruction loss to generate the embedding of the other speaker. Speaker identification accuracy on the de-mixed speaker embeddings is used to evaluate the quality of the obtained embeddings. Experiments are conducted on two kinds of data: artificially augmented two-speaker data (TIMIT) and real-world recordings of two-speaker data (MC-WSJ). Six different speaker embedding de-mixing architectures are investigated. Compared with the speaker identification accuracy on the clean speaker embeddings (98.5%), the obtained results show that one of the speaker embedding de-mixing architectures achieves close performance, reaching 96.9% when the signal-to-noise ratio between the target speaker and the interfering speaker is 5 dB. More surprisingly, we found that choosing a simple subtraction as the embedding de-mixing function could obtain the second-best performance, reaching 95.2%.
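The subtraction-based de-mixing idea mentioned in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation: the abstract does not say how the two-speaker signal is mapped into embedding space, so the sketch simply assumes the mixture embedding is roughly the sum of the two clean embeddings. The function names (`subtract_demix`, `reconstruction_loss`, `identify`), the three-dimensional vectors, and the choice of mean squared error and cosine similarity are all hypothetical choices made for illustration.

```python
from math import sqrt

# Toy sketch only. ASSUMPTION: the mixture embedding is approximately the
# sum of the two clean speaker embeddings; the abstract does not state this.

def subtract_demix(mixture_embedding, known_embedding):
    """Estimate the other speaker's embedding by element-wise subtraction."""
    if len(mixture_embedding) != len(known_embedding):
        raise ValueError("embeddings must have the same dimension")
    return [m - k for m, k in zip(mixture_embedding, known_embedding)]

def reconstruction_loss(estimate, target):
    """Mean squared error between the estimate and the clean embedding
    (one possible reconstruction loss; the exact form is an assumption)."""
    return sum((e - t) ** 2 for e, t in zip(estimate, target)) / len(target)

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

def identify(estimate, enrolled):
    """Return the enrolled speaker whose clean embedding is most similar."""
    return max(enrolled, key=lambda sid: cosine_similarity(estimate, enrolled[sid]))

# Hypothetical clean embeddings (step one would learn these with a network).
enrolled = {
    "spk_a": [1.0, 0.0, 0.2],
    "spk_b": [0.1, 0.9, -0.3],
}

# Hypothetical mixture embedding: sum of both speakers plus a small error.
mixture = [1.15, 0.88, -0.09]

# De-mix: remove the known speaker (spk_a) to estimate the other speaker.
estimate = subtract_demix(mixture, enrolled["spk_a"])
print(identify(estimate, enrolled))
print(reconstruction_loss(estimate, enrolled["spk_b"]))
```

In this toy setting, identification on the de-mixed estimate plays the role of the evaluation described in the abstract: the estimate is judged by whether it is still recognised as the correct speaker, not only by its reconstruction error.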

research
10/22/2020

Compositional embedding models for speaker identification and diarization with simultaneous speech from 2+ speakers

We propose a new method for speaker diarization that can handle overlapp...
research
02/06/2023

Residual Information in Deep Speaker Embedding Architectures

Speaker embeddings represent a means to extract representative vectorial...
research
10/30/2022

Symmetric Saliency-based Adversarial Attack To Speaker Identification

Adversarial attack approaches to speaker identification either need high...
research
07/13/2022

Online Target Speaker Voice Activity Detection for Speaker Diarization

This paper proposes an online target speaker voice activity detection sy...
research
02/24/2021

Triplet loss based embeddings for forensic speaker identification in Spanish

With the advent of digital technology, it is more common that committed ...
research
09/06/2021

Improving Speaker Identification for Shared Devices by Adapting Embeddings to Speaker Subsets

Speaker identification typically involves three stages. First, a front-e...
research
05/22/2020

Speaker diarization with session-level speaker embedding refinement using graph neural networks

Deep speaker embedding models have been commonly used as a building bloc...
