Interpretable Embeddings From Molecular Simulations Using Gaussian Mixture Variational Autoencoders

Extracting insight from the enormous quantity of data generated from molecular simulations requires the identification of a small number of collective variables whose corresponding low-dimensional free-energy landscape retains the essential features of the underlying system. Data-driven techniques provide a systematic route to constructing this landscape, without the need for extensive a priori intuition into the relevant driving forces. In particular, autoencoders are powerful tools for dimensionality reduction, as they naturally force an information bottleneck and, thereby, a low-dimensional embedding of the essential features. While variational autoencoders ensure continuity of the embedding by assuming a unimodal Gaussian prior, this is at odds with the multi-basin free-energy landscapes that typically arise from the identification of meaningful collective variables. In this work, we incorporate this physical intuition into the prior by employing a Gaussian mixture variational autoencoder (GMVAE), which encourages the separation of metastable states within the embedding. The GMVAE performs dimensionality reduction and clustering within a single unified framework, and is capable of identifying the inherent dimensionality of the input data, in terms of the number of Gaussians required to categorize the data. We illustrate our approach on two toy models, alanine dipeptide, and a challenging disordered peptide ensemble, demonstrating the enhanced clustering effect of the GMVAE prior compared to standard VAEs. The resulting embeddings appear to be promising representations for constructing Markov state models, highlighting the transferability of the dimensionality reduction from static equilibrium properties to dynamics.
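The contrast between a unimodal Gaussian prior and a Gaussian mixture prior can be made concrete with a small numerical sketch. This is not the authors' implementation. The component means, variance, weights, and the helper names `log_gaussian` and `log_mixture_prior` are hypothetical choices for illustration only. Under these assumptions, the mixture prior assigns high density near each assumed basin center and low density in the region between them, and its per-component responsibilities act as soft cluster assignments.

```python
import numpy as np

def log_gaussian(z, mean, var):
    """Log-density of an isotropic Gaussian N(mean, var * I) at points z of shape (n, d)."""
    d = z.shape[-1]
    sq_dist = np.sum((z - mean) ** 2, axis=-1)
    return -0.5 * (d * np.log(2.0 * np.pi * var) + sq_dist / var)

def log_mixture_prior(z, means, var, weights):
    """Log-density of a Gaussian mixture prior and the per-component responsibilities."""
    # log_terms[n, k] = log w_k + log N(z_n | mu_k, var * I)
    log_terms = np.stack(
        [np.log(w) + log_gaussian(z, m, var) for m, w in zip(means, weights)],
        axis=-1,
    )
    # Numerically stable log-sum-exp over the components
    mx = log_terms.max(axis=-1, keepdims=True)
    log_p = (mx + np.log(np.exp(log_terms - mx).sum(axis=-1, keepdims=True)))[..., 0]
    # Responsibilities: posterior probability of each component given z
    resp = np.exp(log_terms - log_p[..., None])
    return log_p, resp

# Hypothetical two-basin latent space (illustrative values, not from the paper)
means = np.array([[-2.0, 0.0], [2.0, 0.0]])
weights = np.array([0.5, 0.5])

# Two points at the assumed basin centers and one point midway between them
z = np.array([[-2.0, 0.0], [2.0, 0.0], [0.0, 0.0]])

log_p_mix, resp = log_mixture_prior(z, means, var=0.25, weights=weights)
log_p_uni = log_gaussian(z, np.zeros(2), 1.0)  # standard VAE prior N(0, I)
labels = resp.argmax(axis=-1)                  # hard cluster assignment per point
```

In this toy setting the unimodal prior favors the midpoint over the basin centers, whereas the mixture prior does the opposite, which is the kind of separation of metastable states the abstract attributes to the GMVAE prior.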


