
Boltzmann Generators: Sampling Equilibrium States of Many-Body Systems with Deep Learning
Computing equilibrium states in condensed-matter many-body systems, such as solvated proteins, is a long-standing challenge. Lacking methods for generating statistically independent equilibrium samples directly, vast computational effort is invested in simulating these systems in small steps, e.g., using Molecular Dynamics. Combining deep learning and statistical mechanics, we here develop Boltzmann Generators, which are shown to generate statistically independent samples of equilibrium states of representative condensed-matter systems and complex polymers. Boltzmann Generators use neural networks to learn a coordinate transformation of the complex configurational equilibrium distribution to a distribution that can be easily sampled. Accurate computation of free-energy differences and discovery of new system states are demonstrated, providing a new statistical-mechanics tool that performs orders of magnitude faster than standard simulation methods.
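The core idea (sample a simple latent distribution, push it through an invertible map, and reweight to the Boltzmann distribution) can be sketched in a few lines. The 1D double-well potential and the fixed affine map standing in for a trained invertible network below are illustrative assumptions, not the paper's architecture:

```python
import numpy as np

# Sketch of the Boltzmann-generator idea on a 1D toy system: draw latent
# samples from a Gaussian, push them through an invertible map, and
# reweight to the Boltzmann distribution exp(-u(x)).
rng = np.random.default_rng(0)

def u(x):
    """Reduced potential energy: a double well with minima near +/- sqrt(2)."""
    return x**4 - 4.0 * x**2

a, b = 1.2, 0.0                      # affine stand-in for a trained flow
z = rng.standard_normal(10_000)      # latent samples, z ~ N(0, 1)
x = a * z + b                        # generated configurations

# Generator density of x (change of variables through the affine map).
log_px = -0.5 * z**2 - 0.5 * np.log(2 * np.pi) - np.log(abs(a))
log_w = -u(x) - log_px               # importance weights: w ~ exp(-u(x)) / p_X(x)
w = np.exp(log_w - log_w.max())      # stabilized weights

# Reweighted equilibrium expectation of x^2 (near 2 for this double well).
x2_eq = np.sum(w * x**2) / np.sum(w)
```

With a trained invertible network in place of the affine map, the same importance weights yield asymptotically unbiased equilibrium expectations.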
12/04/2018 ∙ by Frank Noé, et al.

Variational Selection of Features for Molecular Kinetics
The modeling of atomistic biomolecular simulations using kinetic models such as Markov state models (MSMs) has seen many notable algorithmic advances in recent years. The variational principle has opened the door to a nearly fully automated toolkit for selecting models that predict the long-time kinetics from molecular dynamics simulations. However, one as-yet-unoptimized step of the pipeline involves choosing the features, or collective variables, from which the model should be constructed. In order to build intuitive models, these collective variables are often chosen to be interpretable and familiar features, such as torsional angles or contact distances in a protein structure. However, previous approaches for evaluating the chosen features rely on constructing a full MSM, which in turn requires additional hyperparameters to be chosen, and hence leads to a computationally expensive framework. Here, we present a method to optimize the feature choice directly, without requiring the construction of the final kinetic model. We demonstrate our rigorous preprocessing algorithm on a canonical set of twelve fast-folding protein simulations, and show that our procedure leads to more efficient model selection.
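A hedged sketch of what scoring features directly can look like: compute a VAMP-2 score from time-lagged covariances of each candidate featurization and keep the best, with no clustering or MSM construction along the way. The score definition follows the VAMP literature; the two toy featurizations are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

def vamp2_score(X, lag=10, eps=1e-10):
    """VAMP-2 score of a (time, features) array at a given lag time."""
    A, B = X[:-lag], X[lag:]
    A = A - A.mean(0)
    B = B - B.mean(0)
    C00, C01, C11 = A.T @ A / len(A), A.T @ B / len(A), B.T @ B / len(B)

    def invsqrt(C):                      # pseudo-inverse square root
        s, U = np.linalg.eigh(C)
        s = np.where(s > eps, 1.0 / np.sqrt(np.maximum(s, eps)), 0.0)
        return U @ np.diag(s) @ U.T

    K = invsqrt(C00) @ C01 @ invsqrt(C11)
    return 1.0 + np.sum(np.linalg.svd(K, compute_uv=False) ** 2)

slow = np.cumsum(rng.standard_normal(5000))     # slowly decorrelating signal
noise = rng.standard_normal((5000, 2))          # fast, memoryless signal
feats_a = np.column_stack([slow, noise[:, 0]])  # featurization keeping the slow CV
feats_b = noise                                 # featurization missing it

best = max([feats_a, feats_b], key=vamp2_score)  # select features by score alone
```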
11/28/2018 ∙ by Martin K. Scherer, et al.

Time-lagged autoencoders: Deep learning of slow collective variables for molecular kinetics
Inspired by the success of deep learning techniques in the physical and chemical sciences, we apply a modification of an autoencoder-type deep neural network to the task of dimension reduction of molecular dynamics data. We show that our time-lagged autoencoder reliably finds low-dimensional embeddings for high-dimensional feature spaces which capture the slow dynamics of the underlying stochastic processes, beyond the capabilities of linear dimension-reduction techniques.
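The training objective can be illustrated with a minimal linear stand-in: encode x(t) to a one-dimensional latent and train the decoder to reconstruct x(t + tau), so the latent is forced to carry the slow, predictable part of the dynamics. A deep version replaces the weight vectors with neural networks; the synthetic data and all hyperparameters here are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

n, tau = 4000, 5
t_grid = np.linspace(0.0, 20.0, n + tau)
slow = np.sin(t_grid)                                # slow collective variable
X = np.column_stack([
    slow + 0.1 * rng.standard_normal(n + tau),       # observable carrying the slow CV
    rng.standard_normal(n + tau),                    # fast noise dimensions
    rng.standard_normal(n + tau),
])
A, B = X[:-tau], X[tau:]                             # pairs (x(t), x(t + tau))

w_enc = 0.1 * rng.standard_normal(3)                 # encoder weights
w_dec = 0.1 * rng.standard_normal(3)                 # decoder weights

def loss(w_e, w_d):
    y = A @ w_e
    return np.mean((y[:, None] * w_d - B) ** 2)

loss_before = loss(w_enc, w_dec)
lr = 0.1
for _ in range(500):                                 # plain gradient descent
    y = A @ w_enc
    R = y[:, None] * w_dec - B                       # time-lagged reconstruction error
    w_dec -= lr * R.T @ y / len(A)
    w_enc -= lr * A.T @ (R @ w_dec) / len(A)
loss_after = loss(w_enc, w_dec)                      # latent now tracks the slow CV
```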
10/30/2017 ∙ by Christoph Wehmeyer, et al.

VAMPnets: Deep learning of molecular kinetics
Here we develop a deep learning framework for molecular kinetics from molecular dynamics (MD) simulation data. There is an increasing demand for computing the relevant structures, equilibria, and long-timescale kinetics of complex biomolecular processes, such as protein-drug binding, from high-throughput MD simulations. State-of-the-art methods employ a handcrafted data processing pipeline, involving (i) transformation of simulated coordinates into a set of features characterizing the molecular structure, (ii) dimension reduction to collective variables, (iii) clustering of the dimension-reduced data, and (iv) estimation of a Markov state model (MSM) or related model of the interconversion rates between molecular structures. This approach demands substantial modeling expertise, as poor decisions at any step lead to large modeling errors. Here we employ the recently developed variational approach for Markov processes (VAMP) to build a deep learning framework for molecular kinetics using neural networks, dubbed VAMPnets. A VAMPnet encodes the entire mapping from molecular coordinates to Markov states, learning optimal feature transformations, nonlinear dimension reduction, cluster discretization, and MSM estimation within a single end-to-end learning framework. Our results, ranging from toy models to protein folding, are competitive with or outperform state-of-the-art Markov modeling methods and readily provide easily interpretable few-state kinetic models.
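The final stage of such a pipeline can be sketched directly: once an encoder chi maps configurations to soft Markov-state memberships (rows summing to one), a few-state transition matrix follows from the time-lagged membership matrices. Here a hand-made sigmoid plays the encoder's role on an invented two-state toy trajectory; everything about the toy system is an assumption:

```python
import numpy as np

rng = np.random.default_rng(3)

P_true = np.array([[0.95, 0.05],              # hidden two-state jump process
                   [0.10, 0.90]])
s = np.zeros(5000, dtype=int)
for t in range(1, len(s)):
    s[t] = rng.choice(2, p=P_true[s[t-1]])
x = s + 0.2 * rng.standard_normal(len(s))     # noisy 1D "configurations"

def chi(xs):
    """Stand-in for a trained encoder: soft membership in the two states."""
    p1 = 1.0 / (1.0 + np.exp(-10.0 * (xs - 0.5)))
    return np.column_stack([1.0 - p1, p1])

lag = 1
C0, Ct = chi(x[:-lag]), chi(x[lag:])
K = np.linalg.solve(C0.T @ C0, C0.T @ Ct)     # estimated transition matrix ~ P_true
```

Note that K is row-stochastic by construction whenever the membership rows sum to one, so the estimate can be read directly as a few-state kinetic model.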
10/16/2017 ∙ by Andreas Mardt, et al.

Variational approach for learning Markov processes from time series data
Inference, prediction, and control of complex dynamical systems from time series data are important in many areas, including financial markets, power grid management, climate and weather modeling, and molecular dynamics. The analysis of such highly nonlinear dynamical systems is facilitated by the fact that we can often find a (generally nonlinear) transformation of the system coordinates to features in which the dynamics can be excellently approximated by a linear Markovian model. Moreover, the large number of system variables often changes collectively on large time and length scales, facilitating a low-dimensional analysis in feature space. In this paper, we introduce a variational approach for Markov processes (VAMP) that allows us to find optimal feature mappings and optimal Markovian models of the dynamics from given time series data. The key insight is that the best linear model can be obtained from the top singular components of the Koopman operator. This leads to the definition of a family of score functions called VAMP-r, which can be calculated from data and employed to optimize a Markovian model. In addition, based on the relationship between the variational scores and approximation errors of Koopman operators, we propose a new VAMP-E score, which can be applied to cross-validation for hyperparameter optimization and model selection in VAMP. VAMP is valid for both reversible and non-reversible processes, and for stationary and non-stationary processes or realizations.
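The key object can be written down in miniature: the half-weighted matrix Kbar = C00^(-1/2) C01 C11^(-1/2) built from time-lagged covariances, whose top singular components give the optimal linear model and whose singular values enter the VAMP-r scores (VAMP-2 = 1 + sum of sigma_i^2 for mean-free data). The toy time series below is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(7)

n, lag = 20_000, 1
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.99 * y[t-1] + 0.1 * rng.standard_normal()   # slow AR(1) process
X = np.column_stack([y, rng.standard_normal(n)])         # plus a white-noise column

A = X[:-lag] - X[:-lag].mean(0)
B = X[lag:] - X[lag:].mean(0)
C00, C01, C11 = A.T @ A / len(A), A.T @ B / len(A), B.T @ B / len(B)

def invsqrt(C):
    s, U = np.linalg.eigh(C)
    return U @ np.diag(1.0 / np.sqrt(s)) @ U.T

Kbar = invsqrt(C00) @ C01 @ invsqrt(C11)
U, sigma, Vt = np.linalg.svd(Kbar)

# sigma[0] ~ 0.99 (the slow process), sigma[1] ~ 0 (pure noise); the top
# left singular function recovers the slow collective variable:
slow_cv = A @ (invsqrt(C00) @ U[:, 0])
```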
07/14/2017 ∙ by Hao Wu, et al.

Spectral learning of dynamic systems from nonequilibrium data
Observable operator models (OOMs) and related models are among the most important and powerful tools for modeling and analyzing stochastic systems. They exactly describe the dynamics of finite-rank systems and can be efficiently and consistently estimated through spectral learning under the assumption of identically distributed data. In this paper, we investigate the properties of spectral learning without this assumption, motivated by the requirements of analyzing large-timescale systems, and show that the equilibrium dynamics of a system can be extracted from nonequilibrium observation data by imposing an equilibrium constraint. In addition, we propose a binless extension of spectral learning for continuous data. In comparison with other continuous-valued spectral algorithms, the binless algorithm achieves consistent estimation of equilibrium dynamics with only linear complexity.
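The classical finite-rank spectral-learning recipe that this line of work builds on can be checked on a toy 2-state, 2-symbol hidden Markov model. Exact probabilities are used instead of empirical counts, so the observable-operator reconstruction is exact; all model numbers are illustrative assumptions:

```python
import numpy as np

T = np.array([[0.8, 0.3],        # T[s_next, s] = transition probability
              [0.2, 0.7]])
O = np.array([[0.9, 0.2],        # O[x, s] = observation probability
              [0.1, 0.8]])
pi = np.array([0.5, 0.5])        # initial state distribution

P1 = O @ pi                                          # Pr[x1]
P21 = O @ T @ np.diag(pi) @ O.T                      # Pr[x2, x1]
P3x1 = [O @ T @ np.diag(O[x]) @ T @ np.diag(pi) @ O.T
        for x in range(2)]                           # Pr[x3, x2 = x, x1]

U = np.linalg.svd(P21)[0]                            # top left singular vectors
b1 = U.T @ P1
b_inf = np.linalg.pinv(P21.T @ U) @ P1
B = [U.T @ P3x1[x] @ np.linalg.pinv(U.T @ P21) for x in range(2)]

# The observable operators reproduce sequence probabilities:
# Pr[x1 = a, x2 = b] = b_inf^T B[b] B[a] b1
p_01 = b_inf @ B[1] @ B[0] @ b1                      # equals P21[1, 0]
```

In practice P1, P21, and P3x1 are replaced by empirical estimates; the paper's contribution concerns exactly this estimated setting, with equilibrium constraints and a binless variant for continuous observations.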
09/04/2016 ∙ by Hao Wu, et al.

Variational Koopman models: slow collective variables and molecular kinetics from short off-equilibrium simulations
Markov state models (MSMs) and Master equation models are popular approaches to approximate molecular kinetics, equilibria, metastable states, and reaction coordinates in terms of a state-space discretization usually obtained by clustering. Recently, a powerful generalization of MSMs has been introduced: the variational approach (VA) of molecular kinetics and its special case, time-lagged independent component analysis (TICA), which allow slow collective variables and molecular kinetics to be approximated by linear combinations of smooth basis functions or order parameters. While it is known how to estimate MSMs from trajectories whose starting points are not sampled from an equilibrium ensemble, this has not yet been possible for TICA and the VA. Previous estimates from short trajectories have been strongly biased and thus not variationally optimal. Here, we employ Koopman operator theory and ideas from dynamic mode decomposition (DMD) to extend the VA and TICA to non-equilibrium data. The main insight is that the VA and TICA provide a coefficient matrix that we call a Koopman model, as it approximates the underlying dynamical (Koopman) operator in conjunction with the basis set used. This Koopman model can be used to compute a stationary vector that reweights the data to equilibrium. From such a Koopman-reweighted sample, equilibrium expectation values and variationally optimal reversible Koopman models can be constructed even from short simulations. The Koopman model can be used to propagate densities, and its eigenvalue decomposition provides estimates of relaxation timescales and slow collective variables for dimension reduction. Koopman models are generalizations of Markov state models, TICA, and the linear VA, and allow molecular kinetics to be described without a cluster discretization.
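The reweighting step can be sketched as follows: from many short trajectories all started off equilibrium, estimate the Koopman model K = C00^(-1) C01 in a basis, then read off the stationary vector from its top left eigenvector. The two-state toy chain and the indicator basis are assumptions; in general chi would be a basis of order parameters:

```python
import numpy as np

rng = np.random.default_rng(4)

P = np.array([[0.9, 0.1],        # true chain; stationary distribution (2/3, 1/3)
              [0.2, 0.8]])
trajs = []
for _ in range(2000):            # short trajectories, all started in state 0
    s = [0]
    for _ in range(10):
        s.append(rng.choice(2, p=P[s[-1]]))
    trajs.append(np.array(s))

chi = lambda s: np.eye(2)[s]                        # indicator basis functions
C00 = sum(chi(s[:-1]).T @ chi(s[:-1]) for s in trajs)
C01 = sum(chi(s[:-1]).T @ chi(s[1:]) for s in trajs)
K = np.linalg.solve(C00, C01)                       # Koopman model in this basis

evals, evecs = np.linalg.eig(K.T)
u = np.real(evecs[:, np.argmax(np.real(evals))])    # top left eigenvector
pi_est = u / u.sum()                                # estimated stationary vector

# Raw counts are biased toward state 0; the Koopman model corrects this.
freq0_raw = np.mean([np.mean(s == 0) for s in trajs])
```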
10/20/2016 ∙ by Hao Wu, et al.

Deep Generative Markov State Models
We propose a deep generative Markov state model (DeepGenMSM) learning framework for inference of metastable dynamical systems and prediction of trajectories. After unsupervised training on time series data, the model contains (i) a probabilistic encoder that maps from high-dimensional configuration space to a small-sized vector indicating membership in metastable (long-lived) states, (ii) a Markov chain that governs the transitions between metastable states and facilitates analysis of the long-time dynamics, and (iii) a generative part that samples the conditional distribution of configurations at the next time step. The model can be operated recursively to generate trajectories that predict the system's evolution from a defined starting state and to propose new configurations. The DeepGenMSM is demonstrated to provide accurate estimates of the long-time kinetics and to generate valid distributions for molecular dynamics (MD) benchmark systems. Remarkably, we show that DeepGenMSMs are able to make long time steps in molecular configuration space and generate physically realistic structures in regions that were not seen in training data.
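The recursive generation step reduces to a simple loop: with a transition matrix over metastable states and a per-state generator of configurations, new trajectories are rolled out state by state. In a DeepGenMSM all three parts are learned; the hand-made two-state model below is an assumption for illustration:

```python
import numpy as np

rng = np.random.default_rng(6)

T = np.array([[0.97, 0.03],          # metastable transition matrix
              [0.05, 0.95]])
means = np.array([-1.0, 1.0])        # per-state generator: Gaussians on a line

def rollout(n_steps, s0=0):
    s, xs = s0, []
    for _ in range(n_steps):
        s = rng.choice(2, p=T[s])                          # propagate the Markov chain
        xs.append(means[s] + 0.2 * rng.standard_normal())  # sample a configuration
    return np.array(xs)

traj = rollout(1000)                 # synthetic long trajectory, no MD required
```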
05/19/2018 ∙ by Hao Wu, et al.

Machine Learning of Coarse-Grained Molecular Dynamics Force Fields
Atomistic or ab-initio molecular dynamics simulations are widely used to predict thermodynamics and kinetics and relate them to molecular structure. A common approach to go beyond the time and length scales accessible with such computationally expensive simulations is the definition of coarse-grained molecular models. Existing coarse-graining approaches define an effective interaction potential to match defined properties of high-resolution models or experimental data. In this paper, we reformulate coarse-graining as a supervised machine learning problem. We use statistical learning theory to decompose the coarse-graining error and cross-validation to compare the performance of different models. We introduce CGnets, a deep learning approach that learns coarse-grained free energy functions and can be trained by a force-matching scheme. CGnets maintain all physically relevant invariances and allow prior physics knowledge to be incorporated to avoid sampling of unphysical structures. We demonstrate that CGnets outperform classical coarse-graining methods, as they are able to capture the multi-body terms that emerge from the dimensionality reduction.
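Force matching as supervised learning can be shown in miniature: fit a coarse-grained free energy U(x; theta) = theta_2 x^2 + theta_4 x^4 so that its force -dU/dx matches noisy reference forces. CGnets replace the polynomial with a deep network; this linear basis admits a closed-form least-squares fit. The reference double well and noise level are assumptions:

```python
import numpy as np

rng = np.random.default_rng(5)

x = rng.uniform(-2.0, 2.0, 2000)                 # sampled CG configurations
# Reference forces from U*(x) = x^4 - 4 x^2, plus fluctuation noise:
f_ref = -(4.0 * x**3 - 8.0 * x) + 0.5 * rng.standard_normal(len(x))

# Design matrix of -dU/dx for the basis (x^2, x^4).
G = np.column_stack([-2.0 * x, -4.0 * x**3])
theta, *_ = np.linalg.lstsq(G, f_ref, rcond=None)
# theta approximately recovers (theta_2, theta_4) = (-4, 1).
```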
12/04/2018 ∙ by Jiang Wang, et al.

Machine Learning for Molecular Dynamics on Long Timescales
Molecular Dynamics (MD) simulation is widely used to analyze the properties of molecules and materials. Most practical applications, such as comparison with experimental measurements, designing drug molecules, or optimizing materials, rely on statistical quantities, which may be prohibitively expensive to compute from direct long-time MD simulations. Classical Machine Learning (ML) techniques have already had a profound impact on the field, especially for learning low-dimensional models of the long-time dynamics and for devising more efficient sampling schemes for computing long-time statistics. Novel ML methods have the potential to revolutionize long-timescale MD and to obtain interpretable models. ML concepts such as statistical estimator theory, end-to-end learning, representation learning, and active learning are highly interesting for the MD researcher and will help to develop new solutions to hard MD problems. With the aim of better connecting the MD and ML research areas and spawning new research at this interface, we define the learning problems in long-timescale MD, present successful approaches, and outline some of the unsolved ML problems in this application field.
12/18/2018 ∙ by Frank Noé, et al.