Speaker diarization using latent space clustering in generative adversarial network

10/24/2019
by   Monisankha Pal, et al.

In this work, we propose deep latent space clustering for speaker diarization using generative adversarial network (GAN) backprojection with the help of an encoder network. The proposed diarization system is trained jointly with a GAN loss, a latent variable recovery loss, and a clustering-specific loss. It takes x-vector speaker embeddings as input, while the latent variables are sampled from a combination of continuous random variables and discrete one-hot encoded variables derived from the original speaker labels. We benchmark our proposed system on the AMI meeting corpus and on two child-clinician interaction corpora (ADOS and BOSCC) from the autism diagnosis domain. ADOS and BOSCC contain diagnostic and treatment outcome sessions, respectively, obtained in clinical settings for verbal children and adolescents with autism. Experimental results show that our proposed system significantly outperforms the state-of-the-art x-vector based diarization system on these databases. Further, we perform embedding fusion with x-vectors to achieve relative DER improvements, including 31% and 36%, over the x-vector baseline using oracle speech segmentation.
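The latent sampling described in the abstract (a continuous random part concatenated with a one-hot encoding of the speaker label) and the notion of relative DER improvement can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the Gaussian noise distribution, the `sigma` value, and the dimensions are all assumptions chosen for illustration only.

```python
import random


def sample_latent(speaker_ids, num_speakers, noise_dim, sigma=0.1, seed=None):
    """Build one latent vector per speaker label: [continuous noise | one-hot label].

    Assumption: the continuous part is zero-mean Gaussian with std `sigma`;
    the abstract only says "continuous random variables".
    """
    rng = random.Random(seed)
    latents = []
    for spk in speaker_ids:
        if not 0 <= spk < num_speakers:
            raise ValueError(f"speaker id {spk} outside [0, {num_speakers})")
        # Continuous part of the latent code.
        noise = [rng.gauss(0.0, sigma) for _ in range(noise_dim)]
        # Discrete part: one-hot encoding of the speaker label.
        one_hot = [1.0 if k == spk else 0.0 for k in range(num_speakers)]
        latents.append(noise + one_hot)
    return latents


def relative_der_improvement(baseline_der, system_der):
    """Relative DER improvement in percent: 100 * (baseline - system) / baseline."""
    if baseline_der <= 0:
        raise ValueError("baseline DER must be positive")
    return 100.0 * (baseline_der - system_der) / baseline_der
```

In a setup like the one the abstract describes, vectors of this form would be fed to the generator, and an encoder would be trained to recover them from generated embeddings; at test time, the encoder output for an x-vector would then be clustered. Those stages are omitted here.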

research
07/19/2020

Meta-learning with Latent Space Clustering in Generative Adversarial Network for Speaker Diarization

The performance of most speaker diarization systems with x-vector embedd...
research
10/09/2018

Generalized Latent Variable Recovery for Generative Adversarial Networks

The Generator of a Generative Adversarial Network (GAN) is trained to tr...
research
12/08/2020

GMM-Based Generative Adversarial Encoder Learning

While GAN is a powerful model for generating images, its inability to in...
research
11/13/2019

Clustering by Directly Disentangling Latent Space

To overcome the high dimensionality of data, learning latent feature rep...
research
10/24/2019

A study of semi-supervised speaker diarization system using gan mixture model

We propose a new speaker diarization system based on a recently introduc...
research
07/04/2023

Disentanglement in a GAN for Unconditional Speech Synthesis

Can we develop a model that can synthesize realistic speech directly fro...
research
11/17/2017

A Double Joint Bayesian Approach for J-Vector Based Text-dependent Speaker Verification

J-vector has been proved to be very effective in text-dependent speaker ...
