Designing Feature Vector Representations: A case study from Chemistry

12/07/2022
by Signe Sidwall Thygesen, et al.

We present a case study investigating feature descriptors for the analysis of chemical multivariate ensemble data. The data of each ensemble member consists of three parts: its design parameters, field data resulting from the numerical simulations, and physical properties of the molecules. We focus on feature-based methods because they have the potential to reduce data complexity and to facilitate comparison and clustering. However, there are many ways to design the feature vector representation, and none is obviously preferable. To better understand the different representations, we analyze their similarities and differences. Specifically, we consider three characteristics derived from the representations: the distribution of pairwise distances, the clustering tendency, and the rank order of the pairwise distances. The results of our investigation partially confirm expected behavior, but also reveal some surprising observations that can inform the future development of feature representations in the chemical domain.
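As an illustration of two of the characteristics named above, the following minimal sketch compares two feature representations of the same ensemble by summarizing their pairwise-distance distributions and by measuring how well the rank order of those distances agrees. It is not the authors' method: the Euclidean metric, the Spearman rank correlation, the toy vectors `rep_a` and `rep_b`, and all function names are assumptions chosen for illustration. Clustering tendency is not covered here.

```python
import math
from itertools import combinations
from statistics import mean, pstdev


def pairwise_distances(vectors):
    """Euclidean distance for every unordered pair, in a fixed pair order."""
    return [math.dist(a, b) for a, b in combinations(vectors, 2)]


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors.

    Assumes neither input is constant (otherwise the variance is zero).
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy data: four ensemble members under two representations.
rep_a = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
rep_b = [[0.0], [1.0], [2.0], [4.5]]

d_a = pairwise_distances(rep_a)
d_b = pairwise_distances(rep_b)

# Distribution of pairwise distances, summarized by mean and spread.
print("mean/std A:", mean(d_a), pstdev(d_a))
print("mean/std B:", mean(d_b), pstdev(d_b))

# Rank-order agreement: 1 means both representations order pairs identically.
print("rank agreement:", spearman(d_a, d_b))
```

Because both distance lists enumerate the member pairs in the same order, the rank correlation compares like with like even when the two representations have different dimensionality or scale.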
