Analyzing Speaker Information in Self-Supervised Models to Improve Zero-Resource Speech Processing

08/02/2021
by Benjamin van Niekerk, et al.

Contrastive predictive coding (CPC) aims to learn representations of speech by distinguishing future observations from a set of negative examples. Previous work has shown that linear classifiers trained on CPC features can accurately predict speaker and phone labels. However, it is unclear how the features actually capture speaker and phonetic information, and whether it is possible to normalize out the irrelevant details (depending on the downstream task). In this paper, we first show that the per-utterance mean of CPC features captures speaker information to a large extent. Concretely, we find that comparing means performs well on a speaker verification task. Next, probing experiments show that standardizing the features effectively removes speaker information. Based on this observation, we propose a speaker normalization step to improve acoustic unit discovery using K-means clustering of CPC features. Finally, we show that a language model trained on the resulting units achieves some of the best results in the ZeroSpeech2021 Challenge.
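The two observations in the abstract (utterance means carry speaker identity, and standardization removes it) can be illustrated with a minimal sketch. It is not the authors' implementation: the synthetic features stand in for real CPC outputs, cosine similarity is an assumed way of "comparing means", per-utterance statistics are an assumed granularity for the standardization, and the function names `utterance_mean`, `cosine_similarity`, and `standardize` are hypothetical.

```python
import numpy as np


def utterance_mean(features):
    """Average a (frames, dims) feature matrix over the time axis."""
    return features.mean(axis=0)


def cosine_similarity(a, b):
    """Cosine similarity between two 1-D vectors (assumed comparison metric)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def standardize(features, eps=1e-8):
    """Zero-mean, unit-variance scaling per dimension.

    Statistics are computed over one utterance here; that granularity
    is an assumption, not something the abstract specifies.
    """
    mean = features.mean(axis=0, keepdims=True)
    std = features.std(axis=0, keepdims=True)
    return (features - mean) / (std + eps)


# Synthetic stand-in for CPC features: each "speaker" is a fixed offset
# vector, and frame-level content is modelled as Gaussian noise around it.
rng = np.random.default_rng(0)
dims = 16
speaker_a = rng.normal(size=dims)
speaker_b = rng.normal(size=dims)
utt_a1 = speaker_a + 0.5 * rng.normal(size=(200, dims))
utt_a2 = speaker_a + 0.5 * rng.normal(size=(200, dims))
utt_b1 = speaker_b + 0.5 * rng.normal(size=(200, dims))

# Speaker verification by comparing utterance means.
same = cosine_similarity(utterance_mean(utt_a1), utterance_mean(utt_a2))
diff = cosine_similarity(utterance_mean(utt_a1), utterance_mean(utt_b1))

# Speaker normalization: after standardization the utterance mean is zero,
# so the offset that identified the speaker is no longer present.
normalized = standardize(utt_a1)
```

In this toy setup the same-speaker score `same` is higher than the different-speaker score `diff`, and `normalized` could then be passed to K-means clustering in place of the raw features.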


research
02/24/2023

Phone and speaker spatial organization in self-supervised speech representations

Self-supervised representations of speech are currently being widely use...
research
05/21/2023

Self-supervised Predictive Coding Models Encode Speaker and Phonetic Information in Orthogonal Subspaces

Self-supervised speech representations are known to encode both speaker ...
research
04/01/2019

Contrastive Predictive Coding Based Feature for Automatic Speaker Verification

This thesis describes our ongoing work on Contrastive Predictive Coding ...
research
05/04/2021

Voice Conversion Based Speaker Normalization for Acoustic Unit Discovery

Discovering speaker independent acoustic units purely from spoken input ...
research
11/27/2019

Powerful Speaker Embedding Training Framework by Adversarially Disentangled Identity Representation

The main challenge of speaker verification in the wild is the interferen...
research
03/30/2022

Probing phoneme, language and speaker information in unsupervised speech representations

Unsupervised models of representations based on Contrastive Predictive C...
research
03/05/2020

Statistical Context-Dependent Units Boundary Correction for Corpus-based Unit-Selection Text-to-Speech

In this study, we present an innovative technique for speaker adaptation...