Frank Noé

  • Boltzmann Generators - Sampling Equilibrium States of Many-Body Systems with Deep Learning

    Computing equilibrium states in condensed-matter many-body systems, such as solvated proteins, is a long-standing challenge. Lacking methods for generating statistically independent equilibrium samples directly, vast computational effort is invested in simulating these systems in small steps, e.g., using Molecular Dynamics. Combining deep learning and statistical mechanics, we here develop Boltzmann Generators, which are shown to generate statistically independent samples of equilibrium states of representative condensed-matter systems and complex polymers. Boltzmann Generators use neural networks to learn a coordinate transformation of the complex configurational equilibrium distribution to a distribution that can be easily sampled. Accurate computation of free energy differences and discovery of new system states are demonstrated, providing a new statistical-mechanics tool that performs orders of magnitude faster than standard simulation methods.
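    The core recipe is: sample an easy latent distribution, push the samples through an invertible transformation, and reweight the results to the Boltzmann distribution. A minimal numerical sketch (the 1D energy and the affine map below are hypothetical stand-ins for a real system and a trained invertible network, not the paper's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

# Reduced energy of a toy 1D system whose Boltzmann distribution is N(1, 0.5^2)
def u(x):
    return 0.5 * ((x - 1.0) / 0.5) ** 2

# Stand-in for a trained invertible network: an affine map z -> x = a*z + b,
# deliberately imperfect so that reweighting is actually needed.
a, b = 0.6, 0.8
z = rng.standard_normal(100_000)   # easy-to-sample latent Gaussian
x = a * z + b                      # "one-shot" samples, no simulation steps

# Change of variables gives the generator's log-density exactly
log_q = -0.5 * z ** 2 - 0.5 * np.log(2 * np.pi) - np.log(a)

# Importance weights w ~ exp(-u(x)) / q(x) correct the generator toward Boltzmann
log_w = -u(x) - log_q
w = np.exp(log_w - log_w.max())

# Reweighted averages recover equilibrium expectations: <x> = 1, Var(x) = 0.25
mean_x = np.sum(w * x) / np.sum(w)
var_x = np.sum(w * (x - mean_x) ** 2) / np.sum(w)
```

    In a real Boltzmann Generator the affine map is replaced by a deep invertible network whose Jacobian determinant is tracked layer by layer; the reweighting step is unchanged.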

    12/04/2018 ∙ by Frank Noé, et al.


  • Variational Selection of Features for Molecular Kinetics

    The modeling of atomistic biomolecular simulations using kinetic models such as Markov state models (MSMs) has had many notable algorithmic advances in recent years. The variational principle has opened the door for a nearly fully automated toolkit for selecting models that predict the long-time kinetics from molecular dynamics simulations. However, one yet-unoptimized step of the pipeline involves choosing the features, or collective variables, from which the model should be constructed. In order to build intuitive models, these collective variables are often sought to be interpretable and familiar features, such as torsional angles or contact distances in a protein structure. However, previous approaches for evaluating the chosen features rely on constructing a full MSM, which in turn requires additional hyperparameters to be chosen, and hence leads to a computationally expensive framework. Here, we present a method to optimize the feature choice directly, without requiring the construction of the final kinetic model. We demonstrate our rigorous preprocessing algorithm on a canonical set of twelve fast-folding protein simulations, and show that our procedure leads to more efficient model selection.
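    The following sketch illustrates scoring candidate features directly with a VAMP-2 score, with no MSM construction in between (the two-state toy process and both feature sets are hypothetical; this illustrates the scoring idea, not the paper's algorithm):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hidden two-state process with slow switching (flip probability 0.01 per step)
n, lag = 100_000, 10
flips = rng.random(n) < 0.01
s = (np.cumsum(flips) % 2).astype(float)

def vamp2(X, lag):
    """VAMP-2 score of a feature time series (higher = more slow kinetics
    captured): squared Frobenius norm of the whitened lagged correlation."""
    X0, Xt = X[:-lag], X[lag:]
    X0 = X0 - X0.mean(0)
    Xt = Xt - Xt.mean(0)
    m = len(X0)
    C00, Ctt, C0t = X0.T @ X0 / m, Xt.T @ Xt / m, X0.T @ Xt / m
    def inv_sqrt(C):
        w, V = np.linalg.eigh(C)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T
    Kw = inv_sqrt(C00) @ C0t @ inv_sqrt(Ctt)
    return float(np.sum(Kw ** 2))

# Candidate feature sets: one tracks the slow state, the other is pure noise
feat_good = (s + 0.1 * rng.standard_normal(n)).reshape(-1, 1)
feat_bad = rng.standard_normal((n, 1))

score_good = vamp2(feat_good, lag)   # high: the feature resolves the slow process
score_bad = vamp2(feat_bad, lag)     # near zero: no kinetic content
```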

    11/28/2018 ∙ by Martin K. Scherer, et al.


  • Time-lagged autoencoders: Deep learning of slow collective variables for molecular kinetics

    Inspired by the success of deep learning techniques in the physical and chemical sciences, we apply a modification of an autoencoder-type deep neural network to the task of dimension reduction of molecular dynamics data. We show that our time-lagged autoencoder reliably finds low-dimensional embeddings for high-dimensional feature spaces which capture the slow dynamics of the underlying stochastic processes, beyond the capabilities of linear dimension reduction techniques.
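    A linear variant of the idea fits in a short script: encode x_t to one dimension and train the decoder to reconstruct x_{t+lag}, so the embedding is forced to keep the slowly decorrelating coordinate (toy data and plain gradient descent stand in for the paper's deep network):

```python
import numpy as np

rng = np.random.default_rng(2)

# 2D toy data: dimension 0 decorrelates slowly, dimension 1 is fast noise
n, lag = 20_000, 1
a_slow, a_fast = 0.99, 0.1
X = np.zeros((n, 2))
for t in range(1, n):
    X[t, 0] = a_slow * X[t-1, 0] + np.sqrt(1 - a_slow**2) * rng.standard_normal()
    X[t, 1] = a_fast * X[t-1, 1] + np.sqrt(1 - a_fast**2) * rng.standard_normal()
X0 = X[:-lag] - X[:-lag].mean(0)
Xt = X[lag:] - X[lag:].mean(0)

# Linear time-lagged autoencoder: 2 -> 1 -> 2, trained to predict x_{t+lag}
W = 0.1 * rng.standard_normal((2, 1))   # encoder
V = 0.1 * rng.standard_normal((1, 2))   # decoder
lr = 0.05
for _ in range(2000):
    code = X0 @ W                        # 1D latent embedding
    err = Xt - code @ V                  # time-lagged reconstruction error
    W += lr * X0.T @ (err @ V.T) / n     # gradient step on the squared error
    V += lr * code.T @ err / n

w = W[:, 0] / np.linalg.norm(W)          # learned embedding direction
```

    The learned direction aligns with the slow coordinate (|w[0]| close to 1). A plain, non-time-lagged linear autoencoder could not make this distinction here, since both dimensions carry the same variance.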

    10/30/2017 ∙ by Christoph Wehmeyer, et al.


  • VAMPnets: Deep learning of molecular kinetics

    Here we develop a deep learning framework for molecular kinetics from molecular dynamics (MD) simulation data. There is an increasing demand for computing the relevant structures, equilibria and long-timescale kinetics of complex biomolecular processes, such as protein-drug binding, from high-throughput MD simulations. State-of-the-art methods employ a handcrafted data processing pipeline, involving (i) transformation of simulated coordinates into a set of features characterizing the molecular structure, (ii) dimension reduction to collective variables, (iii) clustering the dimension-reduced data, and (iv) estimation of a Markov state model (MSM) or related model of the interconversion rates between molecular structures. This approach demands a substantial amount of modeling expertise, as poor decisions at any step will lead to large modeling errors. Here we employ the recently developed variational approach for Markov processes (VAMP) to develop a deep learning framework for molecular kinetics using neural networks, dubbed VAMPnets. A VAMPnet encodes the entire mapping from molecular coordinates to Markov states and learns optimal feature transformations, nonlinear dimension reduction, cluster discretization and MSM estimation within a single end-to-end learning framework. Our results, ranging from toy models to protein folding, are competitive with or outperform state-of-the-art Markov modeling methods and readily provide easily interpretable few-state kinetic models.
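    To make the end of that pipeline concrete, here is the estimation step a VAMPnet performs on top of its learned state memberships chi(x_t). In this sketch the memberships are idealized one-hot vectors from a simulated two-state chain (a trained network would output soft memberships; all numbers are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated two-state trajectory (stand-in for molecular configurations)
T_true = np.array([[0.98, 0.02],
                   [0.03, 0.97]])
n = 100_000
u = rng.random(n)
s = np.zeros(n, dtype=int)
for t in range(1, n):
    flip = u[t] < T_true[s[t-1], 1 - s[t-1]]
    s[t] = 1 - s[t-1] if flip else s[t-1]

# Idealized VAMPnet output: membership vectors chi(x_t), rows summing to one
chi = np.eye(2)[s]

# Koopman/MSM estimate from the membership time series: K = C00^{-1} C0t
lag = 1
chi0, chit = chi[:-lag], chi[lag:]
m = len(chi0)
C00 = chi0.T @ chi0 / m
C0t = chi0.T @ chit / m
K = np.linalg.inv(C00) @ C0t    # rows sum to one; approximates T_true
```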

    10/16/2017 ∙ by Andreas Mardt, et al.


  • Variational approach for learning Markov processes from time series data

    Inference, prediction and control of complex dynamical systems from time series is important in many areas, including financial markets, power grid management, climate and weather modeling, or molecular dynamics. The analysis of such highly nonlinear dynamical systems is facilitated by the fact that we can often find a (generally nonlinear) transformation of the system coordinates to features in which the dynamics can be excellently approximated by a linear Markovian model. Moreover, the system's many variables often change collectively on large time- and length-scales, facilitating a low-dimensional analysis in feature space. In this paper, we introduce a variational approach for Markov processes (VAMP) that allows us to find optimal feature mappings and optimal Markovian models of the dynamics from given time series data. The key insight is that the best linear model can be obtained from the top singular components of the Koopman operator. This leads to the definition of a family of score functions called VAMP-r which can be calculated from data, and can be employed to optimize a Markovian model. In addition, based on the relationship between the variational scores and approximation errors of Koopman operators, we propose a new VAMP-E score, which can be applied to cross-validation for hyper-parameter optimization and model selection in VAMP. VAMP is valid for both reversible and nonreversible processes and for stationary and non-stationary processes or realizations.
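    The computational core of VAMP is small: form the time-lagged covariances, whiten, and take an SVD; the singular values feed the VAMP-2 score and the singular vectors give the slow collective variables. A sketch on a hypothetical two-feature system in which a slow and a fast latent process are linearly mixed:

```python
import numpy as np

rng = np.random.default_rng(5)

# Latent slow/fast processes, observed only through mixed features
n, lag = 50_000, 1
a_slow, a_fast = 0.99, 0.1
Z = np.zeros((n, 2))
for t in range(1, n):
    Z[t, 0] = a_slow * Z[t-1, 0] + np.sqrt(1 - a_slow**2) * rng.standard_normal()
    Z[t, 1] = a_fast * Z[t-1, 1] + np.sqrt(1 - a_fast**2) * rng.standard_normal()
mix = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2.0)
Y = Z @ mix

# Whitened time-lagged correlation matrix and its SVD
Y0 = Y[:-lag] - Y[:-lag].mean(0)
Yt = Y[lag:] - Y[lag:].mean(0)
m = len(Y0)
C00, Ctt, C0t = Y0.T @ Y0 / m, Yt.T @ Yt / m, Y0.T @ Yt / m

def inv_sqrt(C):
    w, V = np.linalg.eigh(C)
    return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

Kbar = inv_sqrt(C00) @ C0t @ inv_sqrt(Ctt)
U, sigma, Vt = np.linalg.svd(Kbar)

vamp2_score = np.sum(sigma ** 2)   # VAMP-2 score of this feature set
slow = inv_sqrt(C00) @ U[:, 0]     # leading singular function coefficients
slow /= np.linalg.norm(slow)       # recovers the unmixing direction (1,1)/sqrt(2)
```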

    07/14/2017 ∙ by Hao Wu, et al.


  • Spectral learning of dynamic systems from nonequilibrium data

    Observable operator models (OOMs) and related models are among the most important and powerful tools for modeling and analyzing stochastic systems. They exactly describe dynamics of finite-rank systems and can be efficiently and consistently estimated through spectral learning under the assumption of identically distributed data. In this paper, we investigate the properties of spectral learning without this assumption, motivated by the requirements of analyzing large-timescale systems, and show that the equilibrium dynamics of a system can be extracted from nonequilibrium observation data by imposing an equilibrium constraint. In addition, we propose a binless extension of spectral learning for continuous data. In comparison with other continuous-valued spectral algorithms, the binless algorithm can achieve consistent estimation of equilibrium dynamics with only linear complexity.
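    The full OOM machinery does not fit in a short sketch, but the paper's central point, that equilibrium quantities can be recovered from trajectories not started in equilibrium, can be illustrated with a plain Markov-chain analogue (all numbers hypothetical):

```python
import numpy as np

rng = np.random.default_rng(6)

# Reference dynamics with equilibrium distribution pi = (2/3, 1/3)
T = np.array([[0.9, 0.1],
              [0.2, 0.8]])

# Many short trajectories, all started in state 0: a nonequilibrium ensemble
n_traj, length = 2000, 10
counts = np.zeros((2, 2))
visits = np.zeros(2)
for _ in range(n_traj):
    s = 0
    visits[s] += 1
    for _ in range(length - 1):
        s_next = rng.choice(2, p=T[s])
        counts[s, s_next] += 1
        s = s_next
        visits[s] += 1

# The raw state histogram is biased toward the starting state ...
hist = visits / visits.sum()

# ... but the transition-matrix estimate is not, and its stationary
# (left eigen-) vector recovers the equilibrium populations
T_hat = counts / counts.sum(axis=1, keepdims=True)
w, V = np.linalg.eig(T_hat.T)
pi_hat = np.real(V[:, np.argmax(np.real(w))])
pi_hat /= pi_hat.sum()
```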

    09/04/2016 ∙ by Hao Wu, et al.


  • Variational Koopman models: slow collective variables and molecular kinetics from short off-equilibrium simulations

    Markov state models (MSMs) and Master equation models are popular approaches to approximate molecular kinetics, equilibria, metastable states, and reaction coordinates in terms of a state space discretization usually obtained by clustering. Recently, a powerful generalization of MSMs has been introduced, the variational approach (VA) of molecular kinetics and its special case the time-lagged independent component analysis (TICA), which allow us to approximate slow collective variables and molecular kinetics by linear combinations of smooth basis functions or order parameters. While it is known how to estimate MSMs from trajectories whose starting points are not sampled from an equilibrium ensemble, this has not yet been the case for TICA and the VA. Previous estimates from short trajectories have been strongly biased and thus not variationally optimal. Here, we employ Koopman operator theory and ideas from dynamic mode decomposition (DMD) to extend the VA and TICA to non-equilibrium data. The main insight is that the VA and TICA provide a coefficient matrix that we call Koopman model, as it approximates the underlying dynamical (Koopman) operator in conjunction with the basis set used. This Koopman model can be used to compute a stationary vector to reweight the data to equilibrium. From such a Koopman-reweighted sample, equilibrium expectation values and variationally optimal reversible Koopman models can be constructed even with short simulations. The Koopman model can be used to propagate densities, and its eigenvalue decomposition provides estimates of relaxation timescales and slow collective variables for dimension reduction. Koopman models are generalizations of Markov state models, TICA and the linear VA and allow molecular kinetics to be described without a cluster discretization.
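    A sketch of the reweighting step with the simplest possible basis, indicator functions on two discrete states (a hypothetical stand-in for the smooth basis functions used on molecular data): the covariances are estimated from short off-equilibrium trajectories, a stationary coefficient vector u solving C0t^T u = C00 u is computed, and the resulting frame weights turn biased sample averages into equilibrium expectations.

```python
import numpy as np

rng = np.random.default_rng(7)

# Reference dynamics; equilibrium populations pi = (2/3, 1/3)
T = np.array([[0.9, 0.1],
              [0.2, 0.8]])

# Short trajectories, all started in state 0 (off-equilibrium)
trajs = []
for _ in range(2000):
    s, traj = 0, [0]
    for _ in range(9):
        s = rng.choice(2, p=T[s])
        traj.append(s)
    trajs.append(traj)

# Indicator basis chi; instantaneous and time-lagged covariances, pooled
chi0 = np.concatenate([np.eye(2)[t[:-1]] for t in trajs])
chit = np.concatenate([np.eye(2)[t[1:]] for t in trajs])
m = len(chi0)
C00 = chi0.T @ chi0 / m
C0t = chi0.T @ chit / m

# Stationary coefficient vector: eigenvector of C00^{-1} C0t^T at eigenvalue 1
ev, V = np.linalg.eig(np.linalg.inv(C00) @ C0t.T)
u = np.real(V[:, np.argmax(np.real(ev))])
weights = chi0 @ u                    # one weight per frame
weights /= weights.sum()

# Per-state observable with equilibrium mean 2/3*(-1) + 1/3*2 = 0
obs = np.array([-1.0, 2.0])
naive = (chi0 @ obs).mean()           # biased toward the starting state
reweighted = weights @ (chi0 @ obs)   # close to the equilibrium value 0
```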

    10/20/2016 ∙ by Hao Wu, et al.


  • Deep Generative Markov State Models

    We propose a deep generative Markov State Model (DeepGenMSM) learning framework for inference of metastable dynamical systems and prediction of trajectories. After unsupervised training on time series data, the model contains (i) a probabilistic encoder that maps from high-dimensional configuration space to a small-sized vector indicating the membership to metastable (long-lived) states, (ii) a Markov chain that governs the transitions between metastable states and facilitates analysis of the long-time dynamics, and (iii) a generative part that samples the conditional distribution of configurations in the next time step. The model can be operated in a recursive fashion to generate trajectories that predict the system evolution from a defined starting state and propose new configurations. The DeepGenMSM is demonstrated to provide accurate estimates of the long-time kinetics and generate valid distributions for molecular dynamics (MD) benchmark systems. Remarkably, we show that DeepGenMSMs are able to make long time-steps in molecular configuration space and generate physically realistic structures in regions that were not seen in training data.
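    The recursive generation loop itself is simple once the three parts exist. Below, hand-specified stand-ins replace the learned components (a 1D configuration space, Gaussian emissions, and all numbers are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(8)

# (ii) Markov chain between metastable states, stationary vector (2/3, 1/3)
K = np.array([[0.95, 0.05],
              [0.10, 0.90]])
# (iii) generative part: per-state Gaussian over configurations
means, sigma = np.array([-1.0, 1.0]), 0.2

def encoder(x):
    """(i) probabilistic encoder: soft membership from distance to state centers."""
    logits = -((x - means) ** 2) / (2 * sigma ** 2)
    e = np.exp(logits - logits.max())
    return e / e.sum()

# Recursive operation: propagate the state chain, then sample a configuration
n = 100_000
s, xs = 0, np.empty(n)
for t in range(n):
    s = rng.choice(2, p=K[s])
    xs[t] = means[s] + sigma * rng.standard_normal()

occ_state0 = np.mean(xs < 0.0)   # long-run occupancy matches K's stationary vector
```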

    05/19/2018 ∙ by Hao Wu, et al.


  • Machine Learning of coarse-grained Molecular Dynamics Force Fields

    Atomistic or ab-initio molecular dynamics simulations are widely used to predict thermodynamics and kinetics and relate them to molecular structure. A common approach to go beyond the time- and lengthscales accessible with such computationally expensive simulations is the definition of coarse-grained molecular models. Existing coarse-graining approaches define an effective interaction potential to match defined properties of high-resolution models or experimental data. In this paper we reformulate coarse-graining as a supervised machine learning problem. We use statistical learning theory to decompose the coarse-graining error and cross-validation to compare the performance of different models. We introduce CGnets, a deep learning approach that learns coarse-grained free energy functions and can be trained by the force-matching scheme. CGnets maintain all physically relevant invariances and allow prior physics knowledge to be incorporated to avoid sampling unphysical structures. We demonstrate that CGnets outperform classical coarse-graining methods, as they are able to capture the multi-body terms that emerge from the dimensionality reduction.
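    Force matching reduces to regression: choose potential parameters so that the CG forces match the (noisy) reference forces in a least-squares sense. With a harmonic trial potential the fit is linear; this is a two-parameter toy version of what CGnets do with a deep network (system and numbers hypothetical):

```python
import numpy as np

rng = np.random.default_rng(9)

# Reference data: configurations from a harmonic free energy U(x) = 0.5*k*(x-x0)^2
# (kT = 1); instantaneous forces fluctuate around the mean force -dU/dx
k_true, x0_true = 2.0, 0.5
x = x0_true + np.sqrt(1.0 / k_true) * rng.standard_normal(10_000)
f = -k_true * (x - x0_true) + rng.standard_normal(10_000)

# CG model U_theta(x) = 0.5*theta_k*(x - theta_0)^2, so the model force is
# -theta_k*(x - theta_0) = a*x + b: force matching becomes linear least squares
a, b = np.polyfit(x, f, 1)
theta_k = -a
theta_0 = b / theta_k
```

    CGnets replace the two-parameter potential with a neural network and obtain the model forces by differentiating the network, but the objective is the same mean squared force deviation.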

    12/04/2018 ∙ by Jiang Wang, et al.


  • Machine Learning for Molecular Dynamics on Long Timescales

    Molecular Dynamics (MD) simulation is widely used to analyze the properties of molecules and materials. Most practical applications, such as comparison with experimental measurements, designing drug molecules, or optimizing materials, rely on statistical quantities, which may be prohibitively expensive to compute from direct long-time MD simulations. Classical Machine Learning (ML) techniques have already had a profound impact on the field, especially for learning low-dimensional models of the long-time dynamics and for devising more efficient sampling schemes for computing long-time statistics. Novel ML methods have the potential to revolutionize long-timescale MD and to obtain interpretable models. ML concepts such as statistical estimator theory, end-to-end learning, representation learning and active learning are highly interesting for the MD researcher and will help to develop new solutions to hard MD problems. With the aim of better connecting the MD and ML research areas and spawning new research on this interface, we define the learning problems in long-timescale MD, present successful approaches and outline some of the unsolved ML problems in this application field.

    12/18/2018 ∙ by Frank Noé, et al.
