Unsupervised Learning of Disentangled and Interpretable Representations from Sequential Data

09/22/2017

by Wei-Ning Hsu, et al.

We present a factorized hierarchical variational autoencoder, which learns disentangled and interpretable representations from sequential data without supervision. Specifically, we exploit the multi-scale nature of information in sequential data by formulating it explicitly within a factorized hierarchical graphical model that imposes sequence-dependent priors and sequence-independent priors on different sets of latent variables. The model is evaluated on two speech corpora. Qualitatively, we demonstrate its ability to transform speakers or linguistic content by manipulating different sets of latent variables. Quantitatively, it outperforms an i-vector baseline for speaker verification and reduces the word error rate by as much as 35% in mismatched train/test scenarios for automatic speech recognition tasks.
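The distinction between the two kinds of priors can be illustrated with a toy sampler. This is only a minimal sketch, not the paper's model: the abstract does not specify the distributions, so the Gaussian forms, the names `mu2`, `z1`, and `z2`, the scale values, and the helper `swap_z2` are all assumptions made for illustration. One latent set (`z2`) is drawn around a per-sequence mean, so it stays similar across segments of the same sequence, while the other (`z1`) is drawn from the same prior regardless of sequence.

```python
import random

def sample_sequence_latents(num_segments, dim, rng, seq_scale=1.0, seg_scale=0.25):
    """Toy ancestral sampling of two latent sets for one sequence.

    All distributions and scales here are illustrative assumptions.
    """
    # Sequence-level variable, drawn once per sequence (hypothetical name).
    mu2 = [rng.gauss(0.0, seq_scale) for _ in range(dim)]
    segments = []
    for _ in range(num_segments):
        # z2: sequence-dependent prior, centered on this sequence's mu2.
        z2 = [rng.gauss(m, seg_scale) for m in mu2]
        # z1: sequence-independent prior, identical for every sequence.
        z1 = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        segments.append((z1, z2))
    return mu2, segments

def swap_z2(segments, donor_z2):
    """Keep each segment's z1 but replace its z2 with a donor value."""
    return [(z1, list(donor_z2)) for z1, _ in segments]

rng = random.Random(0)
mu2_a, segs_a = sample_sequence_latents(num_segments=5, dim=3, rng=rng)
mu2_b, segs_b = sample_sequence_latents(num_segments=5, dim=3, rng=rng)

# Manipulating one latent set: give sequence A the sequence-level value of B.
transformed = swap_z2(segs_a, mu2_b)
```

In a trained model, a decoder would map each `(z1, z2)` pair back to a segment of the signal; swapping only one latent set, as `swap_z2` does here, is a loose analogue of the speaker and content transformations the abstract describes.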

Wei-Ning Hsu
Yu Zhang
James Glass

research

02/22/2019

FAVAE: Sequence Disentanglement using Information Bottleneck Principle

We propose the factorized action variational autoencoder (FAVAE), a stat...

Masanori Yamada, et al.

research

11/15/2022

Improved disentangled speech representations using contrastive learning in factorized hierarchical variational autoencoder

By utilizing the fact that speaker identity and content vary on differen...

Yuying Xie, et al.

research

04/09/2018

Scalable Factorized Hierarchical Variational Autoencoder Training

Deep generative models have achieved great success in unsupervised learn...

Wei-Ning Hsu, et al.

research

10/24/2020

Unsupervised Learning of Disentangled Speech Content and Style Representation

We present an approach for unsupervised learning of speech representatio...

Andros Tjandra, et al.

research

03/07/2018

Extracting Domain Invariant Features by Unsupervised Learning for Robust Automatic Speech Recognition

The performance of automatic speech recognition (ASR) systems can be sig...

Wei-Ning Hsu, et al.

research

06/13/2018

Unsupervised Adaptation with Interpretable Disentangled Representations for Distant Conversational Speech Recognition

The current trend in automatic speech recognition is to leverage large a...

Wei-Ning Hsu, et al.

research

03/29/2021

SetVAE: Learning Hierarchical Composition for Generative Modeling of Set-Structured Data

Generative modeling of set-structured data, such as point clouds, requir...

Jinwoo Kim, et al.