Unsupervised acoustic unit discovery for speech synthesis using discrete latent-variable neural networks

04/16/2019
by Ryan Eloff, et al.

For our submission to the ZeroSpeech 2019 challenge, we apply discrete latent-variable neural networks to unlabelled speech and use the discovered units for speech synthesis. Unsupervised discrete subword modelling could be useful for studies of phonetic category learning in infants or in low-resource speech technology requiring symbolic input. We use an autoencoder (AE) architecture with intermediate discretisation. We decouple acoustic unit discovery from speaker modelling by conditioning the AE's decoder on the training speaker identity. At test time, unit discovery is performed on speech from an unseen speaker, followed by unit decoding conditioned on a known target speaker to obtain reconstructed filterbanks. This output is fed to a neural vocoder to synthesise speech in the target speaker's voice. For discretisation, categorical variational autoencoders (CatVAEs), vector-quantised VAEs (VQ-VAEs) and straight-through estimation are compared at different compression levels on two languages. Our final model uses convolutional encoding, VQ-VAE discretisation, deconvolutional decoding and an FFTNet vocoder. We show that decoupled speaker conditioning intrinsically improves discrete acoustic representations, yielding competitive synthesis quality compared to the challenge baseline.
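
As a concrete illustration of the architecture described above, the following is a minimal sketch, in PyTorch, of a speaker-conditioned VQ-VAE: a convolutional encoder maps filterbank frames to continuous latents, a vector-quantisation bottleneck with straight-through gradients produces discrete acoustic units, and a deconvolutional decoder conditioned on a speaker embedding reconstructs the filterbanks. The class names, layer sizes, codebook size and commitment cost below are illustrative assumptions rather than the authors' configuration, and the FFTNet vocoder stage is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class VectorQuantiser(nn.Module):
    """VQ-VAE codebook with straight-through gradient estimation."""

    def __init__(self, num_codes=512, code_dim=64, beta=0.25):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, code_dim)
        nn.init.uniform_(self.codebook.weight, -1.0 / num_codes, 1.0 / num_codes)
        self.beta = beta  # commitment cost (assumed value)

    def forward(self, z_e):                               # z_e: (batch, time, code_dim)
        flat = z_e.reshape(-1, z_e.size(-1))
        # Squared Euclidean distance from each frame to every codebook vector.
        dists = (flat.pow(2).sum(1, keepdim=True)
                 - 2 * flat @ self.codebook.weight.t()
                 + self.codebook.weight.pow(2).sum(1))
        codes = dists.argmin(dim=1).view(z_e.shape[:-1])  # discrete acoustic units
        z_q = self.codebook(codes)
        # Codebook and commitment losses; the straight-through estimator copies
        # decoder gradients past the non-differentiable argmin.
        vq_loss = (F.mse_loss(z_q, z_e.detach())
                   + self.beta * F.mse_loss(z_e, z_q.detach()))
        z_q = z_e + (z_q - z_e).detach()
        return z_q, codes, vq_loss


class SpeakerConditionedVQVAE(nn.Module):
    def __init__(self, n_mels=80, num_speakers=100, spk_dim=32, code_dim=64):
        super().__init__()
        # Convolutional encoder: filterbank frames -> continuous latents
        # (downsamples time by a factor of 4 in this sketch).
        self.encoder = nn.Sequential(
            nn.Conv1d(n_mels, 128, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv1d(128, code_dim, kernel_size=3, stride=2, padding=1),
        )
        self.vq = VectorQuantiser(code_dim=code_dim)
        self.speaker_embedding = nn.Embedding(num_speakers, spk_dim)
        # Deconvolutional decoder, conditioned on the training (or target) speaker.
        self.decoder = nn.Sequential(
            nn.ConvTranspose1d(code_dim + spk_dim, 128,
                               kernel_size=4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose1d(128, n_mels, kernel_size=4, stride=2, padding=1),
        )

    def forward(self, mels, speaker_ids):                 # mels: (batch, n_mels, time)
        z_e = self.encoder(mels).transpose(1, 2)          # (batch, time/4, code_dim)
        z_q, codes, vq_loss = self.vq(z_e)
        spk = self.speaker_embedding(speaker_ids)         # (batch, spk_dim)
        spk = spk.unsqueeze(-1).expand(-1, -1, z_q.size(1))
        recon = self.decoder(torch.cat([z_q.transpose(1, 2), spk], dim=1))
        return recon, codes, F.mse_loss(recon, mels) + vq_loss


# Example: encode a batch of utterances and decode conditioned on chosen speakers;
# the reconstructed filterbanks would then be passed to a vocoder for synthesis.
model = SpeakerConditionedVQVAE()
mels = torch.randn(8, 80, 100)                   # time length divisible by 4 here
speakers = torch.randint(0, 100, (8,))
recon, units, loss = model(mels, speakers)
```

At test time, the same idea applies: the encoder and quantiser produce units for speech from an unseen speaker, and the decoder is simply given the embedding of a known target speaker instead.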


Related research

Vector-quantized neural networks for acoustic unit discovery in the ZeroSpeech 2020 challenge (05/19/2020)
In this paper, we explore vector quantization for acoustic unit discover...

Voice Conversion Based Speaker Normalization for Acoustic Unit Discovery (05/04/2021)
Discovering speaker independent acoustic units purely from spoken input ...

Exploration of End-to-end Synthesisers for Zero Resource Speech Challenge 2020 (09/10/2020)
A Spoken dialogue system for an unseen language is referred to as Zero r...

A Comparison of Discrete Latent Variable Models for Speech Representation Learning (10/24/2020)
Neural latent variable models enable the discovery of interesting struct...

VQVAE Unsupervised Unit Discovery and Multi-scale Code2Spec Inverter for Zerospeech Challenge 2019 (05/27/2019)
We describe our submitted system for the ZeroSpeech Challenge 2019. The ...

Bayesian Subspace HMM for the Zerospeech 2020 Challenge (05/19/2020)
In this paper we describe our submission to the Zerospeech 2020 challeng...

Unsupervised Acoustic Unit Representation Learning for Voice Conversion using WaveNet Auto-encoders (08/16/2020)
Unsupervised representation learning of speech has been of keen interest...
