Variational embedding of protein folding simulations using Gaussian mixture variational autoencoders

by Mahdi Ghorbani, et al.

Conformational sampling of biomolecules using molecular dynamics simulations often produces large amounts of high-dimensional data that are difficult to interpret using conventional analysis techniques. Dimensionality reduction methods are thus required to extract useful and relevant information. Here we devise a machine learning method, the Gaussian mixture variational autoencoder (GMVAE), that can simultaneously perform dimensionality reduction and clustering of biomolecular conformations in an unsupervised way. We show that GMVAE can learn a reduced representation of the free energy landscape of protein folding with highly separated clusters that correspond to the metastable states during folding. Because GMVAE uses a mixture of Gaussians as the prior, it can directly account for the multi-basin nature of the protein folding free energy landscape. To make the model end-to-end differentiable, we use a Gumbel-softmax distribution. We test the model on three long-timescale protein folding trajectories and show that the GMVAE embedding resembles the folding funnel, with folded states at the bottom of the funnel and unfolded states toward its outer rim. Additionally, we show that the latent space of GMVAE can be used for kinetic analysis: Markov state models built on this embedding produce folding and unfolding timescales that are in close agreement with those from other rigorous dynamical embeddings such as time-lagged independent component analysis (TICA).
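
The Gumbel-softmax step mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `gumbel_softmax_sample`, the three-component example, and the temperature value are assumptions chosen for illustration, and the discrete variable is assumed here to be the mixture-component (cluster) assignment. The idea is to add Gumbel noise to the categorical logits and apply a temperature-scaled softmax, which gives a continuous relaxation of a one-hot sample.

```python
import math
import random


def gumbel_softmax_sample(logits, temperature, rng):
    """Return a relaxed one-hot sample (a list summing to 1) from categorical logits."""
    noisy = []
    for logit in logits:
        # Gumbel(0, 1) noise via inverse transform: g = -log(-log(u)), u ~ Uniform(0, 1).
        u = max(rng.random(), 1e-12)  # guard against log(0)
        gumbel = -math.log(-math.log(u))
        # Lower temperatures push the output closer to a one-hot vector.
        noisy.append((logit + gumbel) / temperature)
    # Numerically stable softmax over the perturbed, scaled logits.
    shift = max(noisy)
    exps = [math.exp(v - shift) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical example: three mixture components with probabilities 0.7, 0.2, 0.1.
rng = random.Random(0)
logits = [math.log(p) for p in (0.7, 0.2, 0.1)]
relaxed_assignment = gumbel_softmax_sample(logits, temperature=1.0, rng=rng)
print(relaxed_assignment)
```

Because the output is a smooth function of the logits for fixed noise, gradients can pass through the sampling step. The argmax of each sample follows the original categorical distribution (the Gumbel-max trick), so the relaxation stays faithful to the discrete assignment as the temperature is lowered.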




