Amplifying state dissimilarity leads to robust and interpretable clustering of scientific data

by Brooke E. Husic, et al.
Stanford University

Existing methods that aim to automatically cluster data into physically meaningful subsets typically require assumptions regarding the number, size, or shape of the coherent subgroups. We present a new method, simultaneous Coherent Structure Coloring (sCSC), which accomplishes the task of unsupervised clustering without a priori guidance regarding the underlying structure of the data. To illustrate the versatility of the method, we apply it to frontier physics problems at vastly different temporal and spatial scales: in a theoretical model of geophysical fluid dynamics, in laboratory measurements of vortex ring formation and entrainment, and in atomistic simulation of the Protein G system. The theoretical flow involves sparse sampling of non-equilibrium dynamics, where this new technique can find and characterize the structures that govern fluid transport using two orders of magnitude less data than required by existing methods. Application of the method to empirical measurements of vortex formation leads to the discovery of a well-defined region in which vortex ring entrainment occurs, with potential implications ranging from flow control to cardiovascular diagnostics. Finally, the protein folding example demonstrates a data-rich application governed by equilibrium dynamics, where the technique in this manuscript automatically discovers the hierarchy of distinct processes that govern protein folding and clusters protein configurations accordingly. We anticipate straightforward translation to many other fields where existing analysis tools, such as k-means and traditional hierarchical clustering, require ad hoc assumptions on the data structure or lack the interpretability of the present method. The method is also potentially generalizable to fields where the underlying processes are less accessible, such as genomics and neuroscience.



