Pre-training via Denoising for Molecular Property Prediction

05/31/2022
by Sheheryar Zaidi, et al.

Many important problems involving molecular property prediction from 3D structures have limited data, posing a generalization challenge for neural networks. In this paper, we describe a pre-training technique that utilizes large datasets of 3D molecular structures at equilibrium to learn meaningful representations for downstream tasks. Inspired by recent advances in noise regularization, our pre-training objective is based on denoising. Relying on the well-known link between denoising autoencoders and score-matching, we also show that the objective corresponds to learning a molecular force field – arising from approximating the physical state distribution with a mixture of Gaussians – directly from equilibrium structures. Our experiments demonstrate that using this pre-training objective significantly improves performance on multiple benchmarks, achieving a new state-of-the-art on the majority of targets in the widely used QM9 dataset. Our analysis then provides practical insights into the effects of different factors – dataset sizes, model size and architecture, and the choice of upstream and downstream datasets – on pre-training.
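
The denoising objective and its score-matching link can be sketched in a few lines. This is a minimal, hypothetical illustration in plain Python, not the authors' implementation. The three-atom coordinates, the noise scale `sigma`, and the helper names `add_noise`, `denoising_loss` and `gaussian_kernel_score` are assumptions made for illustration. In a real setup the "predicted noise" would come from a neural network operating on the noisy 3D structure. The sketch perturbs an equilibrium structure with isotropic Gaussian noise and scores a noise prediction with a mean squared error. It then checks the identity behind the score-matching link: the score of the Gaussian kernel, -(x_noisy - x)/sigma^2, is the sampled noise rescaled by -1/sigma^2.

```python
import math
import random

def add_noise(coords, sigma, rng):
    """Perturb every atom position with i.i.d. Gaussian noise N(0, sigma^2 I)."""
    noise = [tuple(rng.gauss(0.0, sigma) for _ in range(3)) for _ in coords]
    noisy = [tuple(c + e for c, e in zip(atom, eps))
             for atom, eps in zip(coords, noise)]
    return noisy, noise

def denoising_loss(predicted_noise, true_noise):
    """Mean over atoms of the squared error between predicted and true noise vectors."""
    total = 0.0
    for pred, eps in zip(predicted_noise, true_noise):
        total += sum((p - e) ** 2 for p, e in zip(pred, eps))
    return total / len(true_noise)

def gaussian_kernel_score(noisy, clean, sigma):
    """Gradient of log N(noisy; clean, sigma^2 I) w.r.t. noisy: -(noisy - clean) / sigma^2."""
    return [tuple(-(n - c) / sigma ** 2 for n, c in zip(n_atom, c_atom))
            for n_atom, c_atom in zip(noisy, clean)]

# Hypothetical toy "equilibrium structure": three atoms, arbitrary units.
equilibrium = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]
sigma = 0.1
rng = random.Random(0)

noisy, noise = add_noise(equilibrium, sigma, rng)

# An oracle that recovers the noise exactly attains zero loss;
# a model that always predicts zero noise does not.
oracle_loss = denoising_loss(noise, noise)
zero_loss = denoising_loss([(0.0, 0.0, 0.0)] * len(noise), noise)

# The regression target (the noise) equals the kernel score scaled by -sigma^2.
score = gaussian_kernel_score(noisy, equilibrium, sigma)
for s_atom, e_atom in zip(score, noise):
    for s, e in zip(s_atom, e_atom):
        assert math.isclose(s, -e / sigma ** 2, rel_tol=1e-9, abs_tol=1e-9)

print(oracle_loss, zero_loss > 0.0)  # prints: 0.0 True
```

Because the noise equals -sigma^2 times the kernel score, minimizing the denoising loss over many samples is, up to a constant factor, denoising score matching against the Gaussian-mixture density centered on the equilibrium structures. Under a Boltzmann distribution p(x) ∝ exp(-E(x)/(k_B T)), the score ∇ log p(x) = -∇E(x)/(k_B T) is proportional to the force. This is the sense in which the learned quantity is an approximate molecular force field.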
