Shape is (almost) all!: Persistent homology features (PHFs) are an information rich input for efficient molecular machine learning

04/15/2023
by   Ella Gale, et al.

3-D shape is important to chemistry, but how important? Machine learning works best when the inputs are simple and match the problem well. Chemistry datasets tend to be very small compared to those generally used in machine learning, so we need to get the most from each datapoint. Persistent homology measures the topological shape properties of point clouds at different scales and is used in topological data analysis. Here we investigate what persistent homology captures about molecular structure and create persistent homology features (PHFs) that encode a molecule's shape whilst losing most of the symbolic detail, such as atom labels, valence, charge and bonds. We demonstrate the usefulness of PHFs on a series of chemical datasets: QM7, lipophilicity, Delaney and Tox21. PHFs work as well as the best benchmarks. PHFs are very information dense and much smaller than other encoding methods found to date, meaning ML algorithms that use them are much more energy efficient. The success of PHFs, despite losing a large amount of chemical detail, highlights how much of chemistry can be simplified to topological shape.
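To make "topological shape properties of point clouds at different scales" concrete, here is a minimal sketch of the simplest case: 0-dimensional persistent homology, which tracks how connected components merge as a distance threshold grows. It is an illustration only, not the authors' PHF construction. The function name `h0_persistence` and the toy coordinates are hypothetical, and a real pipeline would more likely use a dedicated library (for example GUDHI or Ripser) and also include higher-dimensional features such as loops and voids.

```python
import itertools
import math


def h0_persistence(points):
    """Death scales of connected components in a Vietoris-Rips filtration.

    Every point is born as its own component at scale 0. Each returned value
    is the distance threshold at which two components merge. One component
    never dies, so n points give n - 1 finite death values.
    (Hypothetical helper for illustration; not taken from the paper.)
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, sorted by Euclidean length (the filtration order).
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in itertools.combinations(range(n), 2)
    )

    deaths = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            # This edge joins two components, so one of them dies here.
            parent[root_i] = root_j
            deaths.append(length)
    return deaths


# Toy point cloud (assumed coordinates): three nearby points and one far away.
cloud = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (5, 0, 0)]
print(h0_persistence(cloud))  # [1.0, 1.0, 4.0]
```

The output contains no atom labels or bond information, only the scales at which the shape changes: two merges at distance 1 and a late merge at distance 4 that reflects the isolated point.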

research
07/18/2023

Adaptive Topological Feature via Persistent Homology: Filtration Learning for Point Clouds

Machine learning for point clouds has been attracting much attention, wi...
research
09/12/2022

On the application of topological data analysis: a Z24 Bridge case study

Topological methods are very rarely used in structural health monitoring...
research
06/21/2022

On the effectiveness of persistent homology

Persistent homology (PH) is one of the most popular methods in Topologic...
research
07/07/2016

Persistent Homology on Grassmann Manifolds for Analysis of Hyperspectral Movies

The existence of characteristic structure, or shape, in complex data set...
research
11/25/2019

Classification of Single-lead Electrocardiograms: TDA Informed Machine Learning

Atrial Fibrillation is a heart condition characterized by erratic heart ...
research
07/12/2023

Machine learning and Topological data analysis identify unique features of human papillae in 3D scans

The tongue surface houses a range of papillae that are integral to the m...
research
03/10/2021

Topology Applied to Machine Learning: From Global to Local

Through the use of examples, we explain one way in which applied topolog...
