Voice Conversion Based Speaker Normalization for Acoustic Unit Discovery

05/04/2021
by Thomas Glarner, et al.

Discovering speaker-independent acoustic units purely from spoken input is known to be a hard problem. In this work we propose an unsupervised speaker normalization technique that is applied prior to unit discovery. It is based on separating speaker-related from content-induced variations in a speech signal with an adversarial contrastive predictive coding approach. This technique requires neither transcribed speech nor speaker labels and, furthermore, can be trained in a multilingual fashion, thus achieving speaker normalization even if only little unlabeled data is available from the target language. The speaker normalization is done by mapping all utterances to a medoid style which is representative of the whole database. We demonstrate the effectiveness of the approach by conducting acoustic unit discovery with a hidden Markov model variational autoencoder, noting, however, that the proposed speaker normalization can serve as a front end to any unit discovery system. Experiments on English, Yoruba and Mboshi show improvements compared to using non-normalized input.
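The normalization target is described as a medoid style representative of the whole database. As a minimal sketch, assuming each utterance has already been summarized by a fixed-length style (speaker) embedding and that Euclidean distance is an adequate dissimilarity (both are assumptions; the abstract specifies neither the embedding nor the metric), the medoid is the utterance whose embedding has the smallest summed distance to all others. The helper name `medoid_index` is hypothetical and is not the authors' code.

```python
import math
from typing import Sequence


def medoid_index(embeddings: Sequence[Sequence[float]]) -> int:
    """Return the index of the medoid of a set of style embeddings.

    The medoid is the member whose summed Euclidean distance to every
    embedding in the set (its self-distance is zero) is smallest. Unlike a
    mean, it is always an actual member, i.e. the style of a real utterance.
    Brute force: O(n^2) distance evaluations.
    """
    if not embeddings:
        raise ValueError("need at least one embedding")
    best_idx, best_cost = 0, math.inf
    for i, e_i in enumerate(embeddings):
        # Total dissimilarity of candidate i to the whole database.
        cost = sum(math.dist(e_i, e_j) for e_j in embeddings)
        if cost < best_cost:  # strict '<' keeps the first index on ties
            best_idx, best_cost = i, cost
    return best_idx


# Hypothetical per-utterance style vectors, one of them an outlier.
styles = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [10.0, 10.0]]
target = styles[medoid_index(styles)]  # every utterance would be converted to this style
```

Here the medoid is index 2, `[2.0, 2.0]`, whereas the arithmetic mean `(3.2, 3.2)` is pulled toward the outlier and corresponds to no real utterance. In a voice-conversion front end, a decoder would then be conditioned on `target` for every input; that wiring is likewise an assumption, not something the abstract details.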


