Amplifying state dissimilarity leads to robust and interpretable clustering of scientific data

07/12/2018
by Brooke E. Husic, et al.

Existing methods that aim to automatically cluster data into physically meaningful subsets typically require assumptions regarding the number, size, or shape of the coherent subgroups. We present a new method, simultaneous Coherent Structure Coloring (sCSC), which accomplishes the task of unsupervised clustering without a priori guidance regarding the underlying structure of the data. To illustrate the versatility of the method, we apply it to frontier physics problems at vastly different temporal and spatial scales: in a theoretical model of geophysical fluid dynamics, in laboratory measurements of vortex ring formation and entrainment, and in atomistic simulation of the Protein G system. The theoretical flow involves sparse sampling of non-equilibrium dynamics, where the new technique can find and characterize the structures that govern fluid transport using two orders of magnitude less data than existing methods require. Application of the method to empirical measurements of vortex formation leads to the discovery of a well-defined region in which vortex ring entrainment occurs, with potential implications ranging from flow control to cardiovascular diagnostics. Finally, the protein folding example demonstrates a data-rich application governed by equilibrium dynamics, where the technique automatically discovers the hierarchy of distinct processes that govern protein folding and clusters protein configurations accordingly. We anticipate straightforward translation to many other fields where existing analysis tools, such as k-means and traditional hierarchical clustering, require ad hoc assumptions on the data structure or lack the interpretability of the present method. The method is also potentially generalizable to fields where the underlying processes are less accessible, such as genomics and neuroscience.
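
To make concrete the kind of a priori assumption the abstract attributes to tools such as k-means, the following is a minimal sketch of plain Lloyd's-algorithm k-means on one-dimensional data. It is not the sCSC method and is not taken from the paper; the function name `kmeans_1d`, the evenly spaced initialization, and the toy data are all assumptions made for illustration. The point is only that the caller must supply the number of clusters `k`, and that the resulting partition changes with that choice.

```python
from typing import List, Tuple


def kmeans_1d(points: List[float], k: int, iters: int = 50) -> Tuple[List[float], List[int]]:
    """Plain Lloyd's algorithm on 1-D data (illustrative; k is chosen by the caller)."""
    ordered = sorted(points)
    # Deterministic initialization (an assumption of this sketch):
    # spread the initial centers evenly across the sorted data.
    if k == 1:
        centers = [ordered[len(ordered) // 2]]
    else:
        centers = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]

    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda j: abs(p - centers[j])) for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centers.append(sum(members) / len(members) if members else centers[j])
        if new_centers == centers:
            break  # converged: centers no longer move
        centers = new_centers
    return centers, labels


# Hypothetical toy data: three well-separated groups on a line.
data = [0.0, 0.1, 0.2, 4.0, 4.1, 4.2, 10.0, 10.1, 10.2]

for k in (2, 3):
    _, labels = kmeans_1d(data, k)
    print(f"k={k}: labels={labels}")
```

With `k=2` the two lower groups are merged into one cluster, while `k=3` separates all three; nothing in the algorithm itself indicates which choice reflects the structure of the data.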


