A Teacher-Student approach for extracting informative speaker embeddings from speech mixtures

06/01/2023
by Tobias Cord-Landwehr, et al.

We introduce a monaural neural speaker embedding extractor that computes an embedding for each speaker present in a speech mixture. To allow for supervised training, a teacher-student approach is employed: the teacher computes the target embeddings from each speaker's utterance before the utterances are added to form the mixture, and the student embedding extractor is then tasked with reproducing those embeddings from the speech mixture at its input. The system verifies the presence or absence of a given speaker in a mixture much more reliably than a conventional speaker embedding extractor, and even exhibits performance comparable to a multi-channel approach that exploits spatial information for embedding extraction. Further, it is shown that a speaker embedding computed from a mixture can be used to check for the presence of that speaker in another mixture.
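
The teacher-student setup described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the `toy_teacher` function is a hypothetical stand-in for a pretrained single-speaker embedding extractor, the cosine-distance loss is one plausible choice of training objective, and the student outputs are assumed to be in the same speaker order as the teacher targets.

```python
import numpy as np


def cosine_distance(a, b):
    """Return 1 - cosine similarity between two vectors."""
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def teacher_student_loss(teacher_embeddings, student_embeddings):
    """Mean cosine distance between teacher targets and student outputs.

    Both inputs have shape (num_speakers, dim). Assumption: row i of each
    array refers to the same speaker (no permutation search is done here).
    """
    t = np.asarray(teacher_embeddings, dtype=float)
    s = np.asarray(student_embeddings, dtype=float)
    return float(np.mean([cosine_distance(ti, si) for ti, si in zip(t, s)]))


def toy_teacher(utterance, dim=4):
    """Hypothetical stand-in for a pretrained single-speaker extractor.

    It averages fixed-size frames of the waveform; a real teacher would be
    a trained neural network.
    """
    usable = len(utterance) // dim * dim
    return utterance[:usable].reshape(-1, dim).mean(axis=0)


rng = np.random.default_rng(0)
utt_a = rng.standard_normal(16000)  # speaker A, before mixing
utt_b = rng.standard_normal(16000)  # speaker B, before mixing

# The teacher sees each clean utterance and produces the training targets.
targets = np.stack([toy_teacher(utt_a), toy_teacher(utt_b)])

# The student only sees the sum of the utterances.
mixture = utt_a + utt_b

# Placeholder for the student's output on `mixture`: here simply a noisy
# copy of the targets, since no trained student network is available.
student_out = targets + 0.01 * rng.standard_normal(targets.shape)

loss = teacher_student_loss(targets, student_out)
print(f"training loss for this mixture: {loss:.4f}")
```

In an actual training loop, the loss would be backpropagated through the student network only, while the teacher stays fixed and merely supplies the per-speaker targets.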

research
03/02/2023

Distilling Multi-Level X-vector Knowledge for Small-footprint Speaker Verification

Deep speaker models yield low error rates in speaker verification. Nonet...
research
04/15/2022

Speaker-Aware Mixture of Mixtures Training for Weakly Supervised Speaker Extraction

Dominant researches adopt supervised training for speaker extraction, wh...
research
07/24/2018

Deep Extractor Network for Target Speaker Recovery From Single Channel Speech Mixtures

Speaker-aware source separation methods are promising workarounds for ma...
research
11/08/2022

High-resolution embedding extractor for speaker diarisation

Speaker embedding extractors significantly influence the performance of ...
research
10/23/2022

Quantitative Evidence on Overlooked Aspects of Enrollment Speaker Embeddings for Target Speaker Separation

Single channel target speaker separation (TSS) aims at extracting a spea...
research
09/21/2020

Open-set Short Utterance Forensic Speaker Verification using Teacher-Student Network with Explicit Inductive Bias

In forensic applications, it is very common that only small naturalistic...
research
05/26/2022

DT-SV: A Transformer-based Time-domain Approach for Speaker Verification

Speaker verification (SV) aims to determine whether the speaker's identi...