
Analyzing Speaker Information in Self-Supervised Models to Improve Zero-Resource Speech Processing

by Benjamin van Niekerk, et al.
Stellenbosch University

Contrastive predictive coding (CPC) aims to learn representations of speech by distinguishing future observations from a set of negative examples. Previous work has shown that linear classifiers trained on CPC features can accurately predict speaker and phone labels. However, it is unclear how the features actually capture speaker and phonetic information, and whether details that are irrelevant to a given downstream task can be normalized out. In this paper, we first show that the per-utterance mean of CPC features captures speaker information to a large extent. Concretely, we find that comparing means performs well on a speaker verification task. Next, probing experiments show that standardizing the features effectively removes speaker information. Based on this observation, we propose a speaker normalization step to improve acoustic unit discovery using K-means clustering of CPC features. Finally, we show that a language model trained on the resulting units achieves some of the best results in the ZeroSpeech2021 Challenge.
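The three steps in the abstract (comparing per-utterance means for speaker verification, standardizing features to remove speaker information, and K-means clustering for unit discovery) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each utterance is assumed to be a `(frames, dims)` NumPy array of CPC features, means are compared with cosine similarity, standardization is assumed to be per utterance (zero mean, unit variance per dimension), and the names `verify_speakers`, `discover_units`, the `threshold` value and the plain NumPy K-means are hypothetical choices made for this sketch.

```python
import numpy as np


def utterance_mean(features: np.ndarray) -> np.ndarray:
    """Average frame-level features (frames x dims) into one vector per utterance."""
    return features.mean(axis=0)


def cosine_score(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two vectors (assumed scoring function)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def verify_speakers(feats_a: np.ndarray, feats_b: np.ndarray, threshold: float = 0.5) -> bool:
    """Hypothetical verifier: accept if the utterance means are similar enough.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    return cosine_score(utterance_mean(feats_a), utterance_mean(feats_b)) >= threshold


def standardize(features: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Per-utterance standardization: subtract the mean and divide by the std of each dim."""
    mu = features.mean(axis=0, keepdims=True)
    sigma = features.std(axis=0, keepdims=True)
    return (features - mu) / (sigma + eps)


def kmeans(x: np.ndarray, k: int, n_iter: int = 50, seed: int = 0):
    """Plain Lloyd's K-means: returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Initialize centroids with k distinct random frames.
    centroids = x[rng.choice(len(x), size=k, replace=False)].astype(float)
    labels = np.zeros(len(x), dtype=int)
    for _ in range(n_iter):
        # Squared Euclidean distance from every frame to every centroid.
        dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = x[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids, labels


def discover_units(utterances, k: int, seed: int = 0):
    """Standardize each utterance, cluster the pooled frames, and
    return one discrete unit sequence per utterance."""
    normed = [standardize(u) for u in utterances]
    pooled = np.concatenate(normed, axis=0)
    centroids, labels = kmeans(pooled, k, seed=seed)
    # Split the pooled labels back into per-utterance sequences.
    splits = np.cumsum([len(u) for u in normed])[:-1]
    return centroids, np.split(labels, splits)
```

Standardizing before clustering removes each utterance's mean, which is the quantity the abstract reports as carrying most of the speaker information, so K-means is pushed toward grouping frames by phonetic content rather than by speaker.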
