Applying the Information Bottleneck Principle to Prosodic Representation Learning

08/05/2021
by Guangyan Zhang, et al.

This paper describes a novel design of a neural network-based speech generation model for learning prosodic representations. The problem of representation learning is formulated according to the information bottleneck (IB) principle. A modified VQ-VAE quantization layer is incorporated into the speech generation model to control the IB capacity and to adjust the balance between the reconstruction power and the disentanglement capability of the learned representation. The proposed model is able to learn word-level prosodic representations from speech data. With an optimized IB capacity, the learned representations are not only adequate for reconstructing the original speech but can also be used to transfer the prosody onto different textual content. Extensive objective and subjective evaluation results are presented to demonstrate the effect of IB capacity control, as well as the effectiveness and potential uses of the learned prosodic representations in controllable neural speech generation.
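To make the idea of a capacity-limited quantization layer concrete, the following is a minimal sketch of plain nearest-neighbour vector quantization, not the modified VQ-VAE layer the paper proposes (the abstract does not specify the modification). It assumes that capacity is bounded by the codebook size: with K codebook entries, each quantized word-level vector can carry at most log2(K) bits. The function names, the 2-D embeddings, and the 4-entry codebook are all hypothetical and chosen only for illustration.

```python
import math


def quantize(vectors, codebook):
    """Return, for each vector, the index of its nearest codebook entry
    under squared Euclidean distance (standard VQ lookup)."""
    indices = []
    for v in vectors:
        dists = [sum((a - b) ** 2 for a, b in zip(v, c)) for c in codebook]
        indices.append(dists.index(min(dists)))
    return indices


def capacity_bits(codebook_size):
    """Upper bound, in bits, on the information one quantized vector can carry."""
    return math.log2(codebook_size)


# Hypothetical 4-entry codebook and three word-level prosody embeddings.
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
word_embeddings = [[0.1, -0.2], [0.9, 0.8], [0.2, 0.7]]

codes = quantize(word_embeddings, codebook)
quantized = [codebook[i] for i in codes]  # what a decoder would receive

print(codes)
print(capacity_bits(len(codebook)))
```

Under this assumption, shrinking the codebook tightens the bottleneck (less room for non-prosodic information to leak through), while enlarging it favours reconstruction quality. That is the kind of trade-off the abstract describes.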

research
01/30/2021

Adversarially learning disentangled speech representations for robust multi-factor voice conversion

Factorizing speech as disentangled speech representations is vital to ac...
research
01/25/2019

Unsupervised speech representation learning using WaveNet autoencoders

We consider the task of unsupervised extraction of meaningful latent rep...
research
11/23/2020

STEPs-RL: Speech-Text Entanglement for Phonetically Sound Representation Learning

In this paper, we present a novel multi-modal deep neural network archit...
research
04/05/2019

An Unsupervised Autoregressive Model for Speech Representation Learning

This paper proposes a novel unsupervised autoregressive neural model for...
research
07/17/2019

Learnability for the Information Bottleneck

The Information Bottleneck (IB) method (tishby2000information) provides ...
research
09/17/2022

Representation Learning Strategies to Model Pathological Speech: Effect of Multiple Spectral Resolutions

This paper considers a representation learning strategy to model speech ...
research
10/28/2019

Towards Unsupervised Speech Recognition and Synthesis with Quantized Speech Representation Learning

In this paper we propose a Sequential Representation Quantization AutoEn...
