Improving VAE based molecular representations for compound property prediction

01/13/2022
by A. Tevosyan, et al.

Collecting labeled data for many important tasks in chemoinformatics is time-consuming and requires expensive experiments. In recent years, machine learning has been used to learn rich representations of molecules from large-scale unlabeled molecular datasets and to transfer that knowledge to more challenging tasks with limited datasets. Variational autoencoders are one of the tools that have been proposed for this transfer, for both chemical property prediction and molecular generation tasks. In this work, we propose a simple method to improve the chemical property prediction performance of machine learning models by incorporating additional information on correlated molecular descriptors into the representations learned by variational autoencoders. We verify the method on three property prediction tasks. We explore the impact of the number of incorporated descriptors, the correlation between the descriptors and the target properties, the sizes of the datasets, and other factors. Finally, we show the relation between the performance of property prediction models and the distance, in the representation space, between the property prediction dataset and the larger unlabeled dataset.
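The abstract does not spell out how the descriptor information enters the VAE. One common way to inject such information, shown here only as an illustrative assumption and not as the authors' method, is to attach an auxiliary head that predicts (standardized) molecular descriptors from the latent code and to add its regression error to the usual negative ELBO. The sketch below computes that combined per-molecule objective in plain Python; the function names and the weights `beta` and `lam` are hypothetical.

```python
import math
from typing import Sequence


def kl_standard_normal(mu: Sequence[float], logvar: Sequence[float]) -> float:
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over latent dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))


def descriptor_mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between predicted and true (standardized) descriptors."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def augmented_vae_loss(
    recon_nll: float,
    mu: Sequence[float],
    logvar: Sequence[float],
    desc_pred: Sequence[float],
    desc_true: Sequence[float],
    beta: float = 1.0,
    lam: float = 1.0,
) -> float:
    """Negative ELBO plus a weighted auxiliary descriptor-regression term.

    Hypothetical objective (an assumption, not taken from the paper):
      recon_nll : reconstruction negative log-likelihood of the input molecule
      beta      : weight on the KL term (beta = 1 gives the standard ELBO)
      lam       : weight on the descriptor-prediction term
    """
    kl = kl_standard_normal(mu, logvar)
    aux = descriptor_mse(desc_pred, desc_true)
    return recon_nll + beta * kl + lam * aux


if __name__ == "__main__":
    # Latent at the prior mean (KL = 0); descriptor error (0^2 + 2^2) / 2 = 2.
    print(augmented_vae_loss(3.0, [0.0, 0.0], [0.0, 0.0], [1.0, 2.0], [1.0, 0.0]))
```

With `lam = 0` this reduces to an ordinary (beta-)VAE objective; raising `lam` pushes the latent space to encode information that is predictive of the chosen descriptors, which is the intuition behind choosing descriptors correlated with the downstream target property.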
