Towards Robust Unsupervised Disentanglement of Sequential Data – A Case Study Using Music Audio

05/12/2022
by Yin-Jyun Luo, et al.

Disentangled sequential autoencoders (DSAEs) are a class of probabilistic graphical models that describe an observed sequence with dynamic latent variables and a static latent variable. The former encode information at the same frame rate as the observation, while the latter globally governs the entire sequence. This structure introduces an inductive bias and facilitates unsupervised disentanglement of the underlying local and global factors. In this paper, we show that the vanilla DSAE is sensitive to the choice of model architecture and to the capacity of the dynamic latent variables, and is prone to collapsing the static latent variable. As a countermeasure, we propose TS-DSAE, a two-stage training framework that first learns sequence-level prior distributions, which are subsequently employed to regularise the model and to facilitate auxiliary objectives that promote disentanglement. The proposed framework is fully unsupervised and robust against the global factor collapse problem across a wide range of model configurations. It also avoids typical solutions such as adversarial training, which usually involves laborious parameter tuning, and domain-specific data augmentation. We conduct quantitative and qualitative evaluations to demonstrate its robustness in terms of disentanglement on both artificial and real-world music audio datasets.
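
For readers unfamiliar with the model class, the latent structure described in the abstract can be written as a joint factorisation. This is a sketch of the commonly used DSAE generative model, not a formula quoted from this paper: the symbols x_{1:T} (observed frames), z_{1:T} (dynamic latent variables, one per frame), v (static latent variable) and θ (generative parameters) are notational assumptions, and the paper's own notation and prior parameterisation may differ.

```latex
p_\theta(x_{1:T}, z_{1:T}, v)
  = p(v) \prod_{t=1}^{T} p_\theta(z_t \mid z_{<t}) \, p_\theta(x_t \mid z_t, v)
```

Each frame x_t depends on its own z_t and on the single shared v, which is what lets v capture sequence-level (global) factors while z_t captures frame-level (local) ones. If z_{1:T} is expressive enough to explain x_{1:T} on its own, nothing in this factorisation forces the model to use v, which is one way to read the collapse of the static latent variable mentioned above.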

research
05/28/2022

High-dimensional factor copula models with estimation of latent variables

Factor models are a parsimonious way to explain the dependence of variab...
research
08/21/2020

Doubly Stochastic Variational Inference for Neural Processes with Hierarchical Latent Variables

Neural processes (NPs) constitute a family of variational approximate mo...
research
05/24/2022

RevUp: Revise and Update Information Bottleneck for Event Representation

In machine learning, latent variables play a key role to capture the und...
research
09/12/2018

Unsupervised Representation Learning of Speech for Dialect Identification

In this paper, we explore the use of a factorized hierarchical variation...
research
11/30/2017

Auxiliary Guided Autoregressive Variational Autoencoders

Generative modeling of high-dimensional data is a key problem in machine...
research
06/08/2011

A Sequence of Relaxations Constraining Hidden Variable Models

Many widely studied graphical models with latent variables lead to nontr...
research
11/20/2020

Lightweight Data Fusion with Conjugate Mappings

We present an approach to data fusion that combines the interpretability...
