Leveraging speaker attribute information using multi task learning for speaker verification and diarization

10/27/2020
by Chau Luu, et al.

Deep speaker embeddings have become the leading method for encoding speaker identity in speaker recognition tasks. Ideally, the embedding space should capture the variation between all possible speakers, encoding the multiple aspects that make up speaker identity. In this work, using speaker age as an auxiliary variable in US Supreme Court recordings and speaker nationality in VoxCeleb, we show that leveraging additional speaker attribute information in a multi-task learning setting improves deep speaker embedding performance on verification and diarization tasks, achieving a relative improvement of 17.8% compared to omitting the auxiliary task. Experimental code has been made publicly available.
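To make the multi-task setting concrete, here is a minimal sketch of how a speaker classification loss and an auxiliary attribute loss (for example, age group or nationality) might be combined into one training objective. It is not the authors' implementation: the function names, the weighted-sum form, and the `aux_weight` parameter are assumptions made for illustration, and the abstract does not specify the architecture or how the losses are weighted.

```python
import math


def softmax_cross_entropy(logits, target):
    """Cross-entropy for one example, computed from raw logits in a numerically stable way."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


def multi_task_loss(speaker_logits, speaker_id, attr_logits, attr_label, aux_weight=0.5):
    """Hypothetical combined objective: speaker loss plus a weighted auxiliary attribute loss.

    aux_weight is an assumed hyperparameter; aux_weight=0 corresponds to
    training without the auxiliary task.
    """
    speaker_loss = softmax_cross_entropy(speaker_logits, speaker_id)
    attr_loss = softmax_cross_entropy(attr_logits, attr_label)
    return speaker_loss + aux_weight * attr_loss


if __name__ == "__main__":
    # Toy logits from two heads that would share one embedding network.
    speaker_logits = [2.0, 0.5, -1.0]    # 3 hypothetical speakers
    attr_logits = [0.1, 1.2, -0.3, 0.0]  # 4 hypothetical attribute classes
    print(multi_task_loss(speaker_logits, 0, attr_logits, 1))
```

In a real system both sets of logits would come from separate classification heads on top of a shared embedding, so the gradient of the auxiliary term also shapes the embedding used for verification and diarization.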

research
01/22/2023

Leveraging Speaker Embeddings with Adversarial Multi-task Learning for Age Group Classification

Recently, researchers have utilized neural network-based speaker embeddi...
research
08/06/2020

Improving on-device speaker verification using federated learning with privacy

Information on speaker characteristics can be useful as side information...
research
12/08/2020

Adversarial Disentanglement of Speaker Representation for Attribute-Driven Privacy Preservation

With the increasing interest over speech technologies, numerous Automati...
research
04/13/2018

Speaker Embedding Extraction with Phonetic Information

Speaker embeddings achieve promising results on many speaker verificatio...
research
07/10/2017

Improving speaker turn embedding by crossmodal transfer learning from face embedding

Learning speaker turn embeddings has shown considerable improvement in s...
research
03/28/2019

Multi-Task Learning with High-Order Statistics for X-vector based Text-Independent Speaker Verification

The x-vector based deep neural network (DNN) embedding systems have demo...
research
01/14/2020

Gaussian speaker embedding learning for text-independent speaker verification

The x-vector maps segments of arbitrary duration to vectors of fixed dim...
