Towards Disentangled Speech Representations

08/28/2022
by Cal Peyser, et al.

The careful construction of audio representations has become a dominant feature in the design of approaches to many speech tasks. Increasingly, such approaches have emphasized "disentanglement", where a representation contains only the parts of the speech signal relevant to transcription while discarding irrelevant information. In this paper, we construct a representation learning task based on joint modeling of ASR and TTS, and seek to learn a representation of audio that disentangles the part of the speech signal that is relevant to transcription from the part that is not. We present empirical evidence that successfully finding such a representation is tied to the randomness inherent in training. We then observe that these desired, disentangled solutions to the optimization problem possess unique statistical properties. Finally, we show that enforcing these properties during training improves WER by 24.5%. These observations motivate a novel approach to learning effective audio representations.


