Efficient Chemical Space Exploration Using Active Learning Based on Marginalized Graph Kernel: an Application for Predicting the Thermodynamic Properties of Alkanes with Molecu

09/01/2022
by Yan Xiang, et al.

We introduce an explorative active learning (AL) algorithm based on Gaussian process regression with a marginalized graph kernel (GPR-MGK) to explore chemical space at minimal cost. Using high-throughput molecular dynamics simulations to generate data and a graph neural network (GNN) to make predictions, we constructed an active learning molecular simulation framework for thermodynamic property prediction. Specifically, targeting 251,728 alkane molecules with 4 to 19 carbon atoms and three liquid physical properties (density, heat capacity, and vaporization enthalpy), we use the AL algorithm to select the most informative molecules to represent the chemical space. Validation on computational and experimental test sets shows that only 313 molecules (0.124% of the total) were sufficient to train an accurate GNN model, with R^2 > 0.99 on the computational test sets and R^2 > 0.94 on the experimental test sets. We highlight two advantages of the presented AL algorithm: compatibility with high-throughput data generation and reliable uncertainty quantification.
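The selection step described above can be illustrated with a minimal sketch. It is not the authors' implementation: an RBF kernel on random 2-D feature vectors stands in for the marginalized graph kernel on molecular graphs, and the function names, the `threshold` stopping rule, and all parameter values are hypothetical. The sketch relies on one standard property of GPR with fixed hyperparameters: the posterior variance depends only on the inputs, not on the labels, so candidates can be ranked before any simulation is run.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0):
    """RBF kernel matrix; a stand-in for a marginalized graph kernel (assumption)."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / length_scale**2)


def posterior_variance(x_train, x_pool, noise=1e-6, length_scale=1.0):
    """GPR posterior variance at each pool point, given only the training inputs."""
    k_tt = rbf_kernel(x_train, x_train, length_scale) + noise * np.eye(len(x_train))
    k_pt = rbf_kernel(x_pool, x_train, length_scale)
    solved = np.linalg.solve(k_tt, k_pt.T)  # shape: (n_train, n_pool)
    # The prior variance of an RBF kernel is 1 at every point.
    var = 1.0 - np.einsum("ij,ji->i", k_pt, solved)
    return np.clip(var, 0.0, None)


def explorative_selection(x_pool, n_select, threshold=0.0, seed_index=0):
    """Greedily add the pool point with the largest posterior variance.

    Stops when n_select points are chosen or when the largest remaining
    variance falls to the (hypothetical) threshold.
    """
    selected = [seed_index]
    while len(selected) < n_select:
        var = posterior_variance(x_pool[selected], x_pool)
        var[selected] = 0.0  # never pick the same point twice
        best = int(np.argmax(var))
        if var[best] <= threshold:
            break
        selected.append(best)
    return selected


# Toy pool: 200 random feature vectors standing in for candidate molecules.
rng = np.random.default_rng(0)
pool = rng.uniform(-3.0, 3.0, size=(200, 2))

chosen = explorative_selection(pool, n_select=10, threshold=0.05)
initial_var = posterior_variance(pool[[0]], pool)
final_var = posterior_variance(pool[chosen], pool)
print("selected indices:", chosen)
print("max variance before/after:", float(initial_var.max()), float(final_var.max()))
```

In a full workflow, the selected candidates would then be labeled (in the paper, by molecular dynamics simulation) and used to train the predictive model. Because the ranking does not need labels, several candidates can be chosen in one pass and simulated in parallel, which is one plausible reading of the compatibility with high-throughput data generation noted above.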

