VDSM: Unsupervised Video Disentanglement with State-Space Modeling and Deep Mixtures of Experts

03/12/2021
by Matthew J. Vowels, et al.

Disentangled representations support a range of downstream tasks, including causal reasoning, generative modeling, and fair machine learning. Unfortunately, disentanglement has been shown to be impossible without the incorporation of supervision or inductive bias. Given that supervision is often expensive or infeasible to acquire, we choose to incorporate structural inductive bias and present an unsupervised, deep State-Space Model for Video Disentanglement (VDSM). The model disentangles latent static (time-invariant) and dynamic factors by combining a hierarchical structure with a dynamic prior and a Mixture of Experts decoder. VDSM learns separate disentangled representations for the identity of the object or person in the video and for the action being performed. We evaluate VDSM across a range of qualitative and quantitative tasks, including identity and dynamics transfer, sequence generation, Fréchet Inception Distance, and factor classification. VDSM achieves state-of-the-art performance and exceeds adversarial methods, even when those methods use additional supervision.
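To make the described architecture concrete, the following is a minimal, illustrative PyTorch sketch of the idea in the abstract: a time-invariant identity code, a learned dynamic prior that rolls a latent state forward through time, and a Mixture of Experts decoder gated by the identity code. This is not the authors' implementation; the class names (IdentityEncoder, DynamicPrior, MoEDecoder), the MLPs over flattened frames, the latent sizes, and the number of experts are all assumptions chosen for brevity, and the real model is trained as a variational state-space model rather than run as the plain forward pass shown here.

# Illustrative sketch only: a minimal VDSM-style architecture, not the authors' code.
# All names and dimensions below are assumptions made for the example.
import torch
import torch.nn as nn
import torch.nn.functional as F


class IdentityEncoder(nn.Module):
    """Pools over time to produce a single time-invariant (identity) code per video."""
    def __init__(self, frame_dim, id_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(frame_dim, 128), nn.ReLU(), nn.Linear(128, id_dim))

    def forward(self, frames):                  # frames: (B, T, frame_dim)
        return self.net(frames).mean(dim=1)     # (B, id_dim), averaged over time


class DynamicPrior(nn.Module):
    """Learned state-space transition p(z_t | z_{t-1}), realised here with a GRU cell."""
    def __init__(self, z_dim, hidden_dim=64):
        super().__init__()
        self.cell = nn.GRUCell(z_dim, hidden_dim)
        self.to_mu = nn.Linear(hidden_dim, z_dim)
        self.to_logvar = nn.Linear(hidden_dim, z_dim)

    def forward(self, z_prev, h_prev):
        h = self.cell(z_prev, h_prev)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterised sample
        return z, h


class MoEDecoder(nn.Module):
    """Mixture of Experts decoder: the identity code gates which expert renders each frame."""
    def __init__(self, z_dim, id_dim, frame_dim, n_experts=4):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(z_dim, 128), nn.ReLU(), nn.Linear(128, frame_dim))
            for _ in range(n_experts)
        )
        self.gate = nn.Linear(id_dim, n_experts)

    def forward(self, z_t, z_id):
        weights = F.softmax(self.gate(z_id), dim=-1)                  # (B, n_experts)
        outs = torch.stack([e(z_t) for e in self.experts], dim=1)     # (B, n_experts, frame_dim)
        return (weights.unsqueeze(-1) * outs).sum(dim=1)              # weighted expert mixture


def generate(identity_enc, prior, decoder, frames, steps=16, z_dim=16, hidden_dim=64):
    """Generate a sequence for the identity inferred from `frames` by rolling the dynamic prior."""
    z_id = identity_enc(frames)
    B = z_id.size(0)
    z_t = torch.zeros(B, z_dim)
    h_t = torch.zeros(B, hidden_dim)
    out = []
    for _ in range(steps):
        z_t, h_t = prior(z_t, h_t)
        out.append(decoder(z_t, z_id))
    return torch.stack(out, dim=1)                                    # (B, steps, frame_dim)


if __name__ == "__main__":
    frame_dim, id_dim, z_dim = 256, 32, 16
    enc = IdentityEncoder(frame_dim, id_dim)
    prior = DynamicPrior(z_dim)
    dec = MoEDecoder(z_dim, id_dim, frame_dim)
    video = torch.randn(2, 8, frame_dim)          # two dummy videos of 8 flattened frames
    print(generate(enc, prior, dec, video).shape)  # torch.Size([2, 16, 256])

In this sketch, swapping z_id between two sequences while keeping their dynamic states fixed corresponds to the identity transfer evaluated in the paper, and exchanging the dynamic states instead corresponds to dynamics (action) transfer.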

