Fast conformational clustering of extensive molecular dynamics simulation data

01/11/2023
by Simon Hunkler, et al.

We present an unsupervised data-processing workflow specifically designed to obtain a fast conformational clustering of long molecular dynamics simulation trajectories. The approach combines two dimensionality reduction algorithms (cc_analysis and encodermap) with a density-based spatial clustering algorithm (HDBSCAN). The proposed scheme benefits from the strengths of all three algorithms while avoiding most of the drawbacks of the individual methods. Here, the cc_analysis algorithm is applied to molecular simulation data for the first time. Encodermap complements cc_analysis by providing an efficient way to process large amounts of data and assign them to clusters. The main goal of the procedure is to maximize the number of assigned frames of a given trajectory while keeping a clear conformational identity for the clusters that are found. In practice, we achieve this by using an iterative clustering approach and a tunable criterion based on the root-mean-square deviation (RMSD) in the final cluster assignment. This makes it possible to find clusters of different densities as well as different degrees of structural identity. We illustrate the capability and performance of this clustering workflow on four test systems: the wild-type Trp-cage protein and its thermostable mutant (TC5b and TC10b), NTL9, and Protein B. Each of these systems poses individual challenges to the scheme, and together they give a good overview of both the advantages of the proposed method and the potential difficulties that can arise when using it.
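The idea of a tunable RMSD criterion in the final assignment step can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes the frames are already superimposed onto a common reference (so no optimal-superposition fit is performed), uses each cluster's medoid as its representative, and follows the HDBSCAN convention of labelling noise frames `-1`. The function names and the `rmsd_cutoff` parameter are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]
Frame = Sequence[Coord]  # one conformation: a list of atom coordinates


def rmsd(a: Frame, b: Frame) -> float:
    """RMSD between two frames with identical atom ordering.

    Assumption: frames are already aligned; this does NOT perform a Kabsch fit.
    """
    if len(a) != len(b):
        raise ValueError("frames must have the same number of atoms")
    sq = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
             for (xa, ya, za), (xb, yb, zb) in zip(a, b))
    return math.sqrt(sq / len(a))


def medoid(frames: List[Frame]) -> Frame:
    """Frame with the smallest summed RMSD to all frames of its cluster."""
    return min(frames, key=lambda f: sum(rmsd(f, g) for g in frames))


def assign_by_rmsd(frames: List[Frame], labels: List[int],
                   rmsd_cutoff: float) -> List[int]:
    """Attach noise frames (label -1) to the nearest cluster medoid.

    A noise frame is assigned only if its RMSD to that medoid is at most
    rmsd_cutoff (hypothetical tunable parameter); otherwise it stays -1.
    """
    # Group the already-clustered frames by label.
    members: Dict[int, List[Frame]] = {}
    for frame, label in zip(frames, labels):
        if label != -1:
            members.setdefault(label, []).append(frame)
    medoids = {label: medoid(fs) for label, fs in members.items()}

    new_labels = list(labels)
    if not medoids:
        return new_labels
    for i, (frame, label) in enumerate(zip(frames, labels)):
        if label != -1:
            continue  # keep existing cluster assignments unchanged
        best_label, best_dist = min(
            ((lab, rmsd(frame, m)) for lab, m in medoids.items()),
            key=lambda item: item[1],
        )
        if best_dist <= rmsd_cutoff:
            new_labels[i] = best_label
    return new_labels
```

Lowering `rmsd_cutoff` keeps the clusters structurally tighter but leaves more frames unassigned; raising it assigns more frames at the cost of a less clear conformational identity per cluster. This mirrors the trade-off between maximizing assigned frames and keeping clusters well defined.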
