Independent Vector Analysis for Data Fusion Prior to Molecular Property Prediction with Machine Learning

11/01/2018
by Zois Boukouvalas, et al.

Because it is fast and accurate compared with ab-initio quantum chemistry and force-field modeling, the prediction of molecular properties using machine learning has received great attention in the fields of materials design and drug discovery. A main ingredient required for machine learning is a training dataset consisting of molecular features (for example, fingerprint bits and chemical descriptors) that adequately characterize the corresponding molecules. However, choosing features for any application is highly non-trivial, and no "universal" method for feature selection exists. In this work, we propose a data fusion framework that uses Independent Vector Analysis (IVA) to exploit the underlying complementary information contained in different molecular featurization methods, bringing us a step closer to automated feature generation. Our approach takes an arbitrary number of individual feature vectors and automatically generates a single, compact (low-dimensional) set of molecular features that can be used to enhance the prediction performance of regression models. At the same time, our methodology retains the possibility of interpreting the generated features to discover relationships between molecular structures and properties. We demonstrate this on the QM7b dataset for the prediction of several properties: atomization energy, polarizability, frontier orbital eigenvalues, ionization potential, electron affinity, and excitation energies. In addition, we show how our method helps improve the prediction of experimental binding affinities for a set of human BACE-1 inhibitors. To encourage more widespread use of IVA, we have developed the PyIVA Python package, an open-source code available for download on GitHub.
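As a rough illustration of the data preparation such a fusion framework implies, the sketch below reduces several feature matrices to a common number of components and stacks them into a single array on which a joint decomposition could then be run. It does not implement IVA and does not use the PyIVA API. The function names, the PCA-based reduction, the (components, molecules, feature sets) layout, and the random stand-in data are all assumptions made for illustration.

```python
import numpy as np


def reduce_to_common_order(X, n_components):
    """Center one feature matrix (molecules x features) and project it
    onto its top principal components. PCA is an assumed preprocessing
    step here, not necessarily the one used by the authors."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T  # shape: molecules x n_components


def stack_for_fusion(feature_sets, n_components):
    """Stack the reduced feature sets into one array of shape
    (n_components, n_molecules, n_feature_sets). This layout is an
    assumption about what a joint decomposition would consume."""
    reduced = [reduce_to_common_order(X, n_components).T for X in feature_sets]
    return np.stack(reduced, axis=-1)


# Random stand-ins for two featurizations of the same 50 molecules
# (hypothetical data, not QM7b or BACE-1).
rng = np.random.default_rng(0)
n_molecules = 50
fingerprints = rng.integers(0, 2, size=(n_molecules, 128)).astype(float)
descriptors = rng.normal(size=(n_molecules, 30))

stacked = stack_for_fusion([fingerprints, descriptors], n_components=5)
print(stacked.shape)
```

A joint decomposition such as IVA would take an array like `stacked` as input; the fused, low-dimensional features it produces would then serve as inputs to a regression model.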


