Channel adversarial training for speaker verification and diarization

10/25/2019
by Chau Luu, et al.

Previous work has encouraged domain invariance in deep speaker embeddings by adversarially classifying the dataset or labelled environment to which the generated features belong. We propose a training strategy that aims to produce features that are invariant at the granularity of the recording or channel, a finer-grained objective than dataset or environment invariance. An adversary is trained, in a Siamese fashion, to predict whether pairs of same-speaker embeddings belong to the same recording. This discourages the learned features from using channel information that may be speaker-discriminative during training. Verification experiments on VoxCeleb, and diarization and verification experiments on CALLHOME, show promising improvements over a strong baseline, and the method also outperforms a dataset-adversarial model. The VoxCeleb model performs particularly well, achieving a 4% relative improvement in equal error rate (EER) over a Kaldi baseline while using a similar architecture and less training data.
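
The abstract does not give model details, so the following is only a minimal plain-Python sketch of the pair labelling and adversarial objective it describes, not the authors' implementation. Several pieces are assumptions: the Siamese adversary is taken to be a logistic classifier on the element-wise absolute difference of two embeddings, and the embedding network's objective is taken to be the speaker loss minus a weight `lam` times the adversary's binary cross-entropy (the form used in gradient-reversal domain-adversarial training). All function and parameter names are hypothetical.

```python
import math
from itertools import combinations


def make_same_speaker_pairs(speakers, recordings):
    """Return (i, j, label) for every pair of utterances by the same speaker.

    label is 1 if both utterances come from the same recording, else 0.
    Different-speaker pairs are skipped, so the adversary can only succeed
    by exploiting channel/recording cues rather than speaker identity.
    """
    pairs = []
    for i, j in combinations(range(len(speakers)), 2):
        if speakers[i] == speakers[j]:
            pairs.append((i, j, 1 if recordings[i] == recordings[j] else 0))
    return pairs


def siamese_features(a, b):
    # Assumed symmetric pair representation: element-wise absolute difference,
    # so the prediction does not depend on the order of the two embeddings.
    return [abs(x - y) for x, y in zip(a, b)]


def sigmoid(z):
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def adversary_loss(embeddings, pairs, weights, bias, eps=1e-12):
    """Mean binary cross-entropy of a hypothetical logistic same-recording adversary."""
    if not pairs:
        return 0.0  # no same-speaker pairs in this batch
    total = 0.0
    for i, j, label in pairs:
        feat = siamese_features(embeddings[i], embeddings[j])
        p = sigmoid(sum(w * f for w, f in zip(weights, feat)) + bias)
        total -= label * math.log(p + eps) + (1 - label) * math.log(1.0 - p + eps)
    return total / len(pairs)


def embedding_objective(speaker_loss, adv_loss, lam):
    # The adversary is trained to minimise adv_loss; the embedding network
    # minimises this quantity, so it is rewarded when the adversary fails.
    return speaker_loss - lam * adv_loss


# Usage with toy data: three utterances by speaker A (two share recording r1).
speakers = ["A", "A", "A", "B"]
recordings = ["r1", "r1", "r2", "r3"]
embeddings = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [5.0, 5.0]]
pairs = make_same_speaker_pairs(speakers, recordings)  # [(0, 1, 1), (0, 2, 0), (1, 2, 0)]
adv = adversary_loss(embeddings, pairs, weights=[0.0, 0.0], bias=0.0)  # ~0.693 (ln 2): chance level
total = embedding_objective(speaker_loss=1.0, adv_loss=adv, lam=0.1)
```

Restricting the adversary to same-speaker pairs is what ties it to recording or channel information rather than speaker identity. In an actual training loop the adversary's parameters and the embedding network would be updated in alternation or through a gradient-reversal layer; this sketch only computes the losses and does not implement either.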

10/24/2019

Delving into VoxCeleb: environment invariant speaker recognition

Research in speaker recognition has recently seen significant progress d...

11/03/2019

Robust speaker recognition using unsupervised adversarial invariance

In this paper, we address the problem of speaker recognition in challeng...

07/23/2020

Augmentation adversarial training for unsupervised speaker recognition

The goal of this work is to train robust speaker recognition models with...

08/09/2020

Cosine-Distance Virtual Adversarial Training for Semi-Supervised Speaker-Discriminative Acoustic Embeddings

In this paper, we propose a semi-supervised learning (SSL) technique for...

08/08/2020

NPU Speaker Verification System for INTERSPEECH 2020 Far-Field Speaker Verification Challenge

This paper describes the NPU system submitted to Interspeech 2020 Far-Fi...

11/11/2021

MultiSV: Dataset for Far-Field Multi-Channel Speaker Verification

Motivated by unconsolidated data situation and the lack of a standard be...