Augmentation adversarial training for unsupervised speaker recognition

07/23/2020
by   Jaesung Huh, et al.
0

The goal of this work is to train robust speaker recognition models without speaker labels. Recent works on unsupervised speaker representations are based on contrastive learning in which they encourage within-utterance embeddings to be similar and across-utterance embeddings to be dissimilar. However, since the within-utterance segments share the same acoustic characteristics, it is difficult to separate the speaker information from the channel information. To this end, we propose augmentation adversarial training strategy that trains the network to be discriminative for the speaker information, while invariant to the augmentation applied. Since the augmentation simulates the acoustic characteristics, training the network to be invariant to augmentation also encourages the network to be invariant to the channel information in general. Extensive experiments on the VoxCeleb and VOiCES datasets show significant improvements over previous works using self-supervision, and the performance of our self-supervised models far exceed that of humans.

READ FULL TEXT
research
10/25/2020

An iterative framework for self-supervised deep speaker representation learning

In this paper, we propose an iterative framework for self-supervised spe...
research
10/24/2019

Delving into VoxCeleb: environment invariant speaker recognition

Research in speaker recognition has recently seen significant progress d...
research
12/13/2020

Self-supervised Text-independent Speaker Verification using Prototypical Momentum Contrastive Learning

In this study, we investigate self-supervised representation learning fo...
research
02/03/2020

Within-sample variability-invariant loss for robust speaker recognition under noisy environments

Despite the significant improvements in speaker recognition enabled by d...
research
10/25/2019

Channel adversarial training for speaker verification and diarization

Previous work has encouraged domain-invariance in deep speaker embedding...
research
02/22/2018

Neural Predictive Coding using Convolutional Neural Networks towards Unsupervised Learning of Speaker Characteristics

Learning speaker-specific features is vital in many applications like sp...
research
09/21/2023

TalkNCE: Improving Active Speaker Detection with Talk-Aware Contrastive Learning

The goal of this work is Active Speaker Detection (ASD), a task to deter...

Please sign up or login with your details

Forgot password? Click here to reset