Generation of Speaker Representations Using Heterogeneous Training Batch Assembly

03/30/2022
by Yu-Huai Peng, et al.

In traditional speaker diarization systems, a well-trained speaker model is a key component for extracting representations from consecutive and partially overlapping segments of a long speech session. To be more consistent with back-end segmentation and clustering, we propose a new CNN-based speaker modeling scheme that takes into account the heterogeneity of the speakers in each training segment and batch. We randomly and synthetically augment the training data into a set of segments, each of which contains more than one speaker and some overlapping parts. Each segment is assigned a soft label based on its speaker occupation ratio, and the model is trained with the standard cross-entropy loss. In this way, the speaker model is expected to generate a geometrically meaningful embedding for each multi-speaker segment. Experimental results show that our system outperforms the x-vector baseline system on two speaker diarization tasks. In the CALLHOME task trained on the NIST SRE and Switchboard datasets, our system achieves a relative reduction of 12.93 … 13.24 …, respectively.
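The abstract does not specify how segments are mixed or exactly how the occupation ratio is computed. The Python sketch below is one possible reading, under stated assumptions: two single-speaker utterances are overlapped end-to-start, each speaker's soft-label weight is its share of all speaker-active frames (overlapped frames count for both speakers), and the loss is cross entropy against that soft target. Working on frame counts instead of audio is a simplification, and the helper names (`synthesize_segment`, `occupation_soft_label`, `soft_cross_entropy`, `random_mixture`) are hypothetical, not taken from the paper.

```python
import math
import random
from typing import Dict, List, Sequence, Set, Tuple


def synthesize_segment(len_a: int, len_b: int, overlap: int) -> List[Set[int]]:
    """Hypothetical mixing rule: the last `overlap` frames of speaker 0
    coincide with the first `overlap` frames of speaker 1.
    Returns, for every frame, the set of locally indexed active speakers."""
    if not 0 <= overlap <= min(len_a, len_b):
        raise ValueError("overlap must lie in [0, min(len_a, len_b)]")
    start_b = len_a - overlap        # frame where speaker 1 starts talking
    total = len_a + len_b - overlap  # length of the mixed segment
    frames = []
    for t in range(total):
        active = set()
        if t < len_a:
            active.add(0)
        if t >= start_b:
            active.add(1)
        frames.append(active)
    return frames


def occupation_soft_label(
    frames: Sequence[Set[int]], speaker_ids: Sequence[int], num_classes: int
) -> List[float]:
    """Assumed ratio definition: each speaker's share of all speaker-active
    frames, placed at that speaker's training-class index."""
    counts = [0] * len(speaker_ids)
    for active in frames:
        for local in active:
            counts[local] += 1
    total = sum(counts)
    label = [0.0] * num_classes
    for local, class_id in enumerate(speaker_ids):
        label[class_id] += counts[local] / total
    return label


def soft_cross_entropy(logits: Sequence[float], target: Sequence[float]) -> float:
    """Cross entropy -sum_c y_c * log softmax(z)_c, using a stable log-sum-exp."""
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(z - m) for z in logits))
    return -sum(y * (z - log_norm) for z, y in zip(logits, target) if y > 0.0)


def random_mixture(
    utterance_lengths: Dict[int, int], rng: random.Random
) -> Tuple[List[Set[int]], Tuple[int, int]]:
    """Draw two distinct training speakers and a random overlap length
    (capped at half the shorter utterance, an arbitrary illustrative choice)."""
    a, b = rng.sample(sorted(utterance_lengths), 2)
    len_a, len_b = utterance_lengths[a], utterance_lengths[b]
    overlap = rng.randint(0, min(len_a, len_b) // 2)
    return synthesize_segment(len_a, len_b, overlap), (a, b)


if __name__ == "__main__":
    # 300 + 200 frames with 100 overlapping: speaker 3 has 300 active frames,
    # speaker 7 has 200, so the soft label puts 0.6 and 0.4 on their classes.
    frames = synthesize_segment(300, 200, overlap=100)
    label = occupation_soft_label(frames, speaker_ids=(3, 7), num_classes=10)
    print(label[3], label[7])
    # With uniform logits the loss equals log(10) for any target summing to 1.
    print(soft_cross_entropy([0.0] * 10, label))
```

Because overlapped frames are credited to both speakers, the target stays a proper distribution while still reflecting who dominates the segment; other normalizations (for example dividing by segment length) are equally plausible readings of "occupation ratio".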
