3D Denoisers are Good 2D Teachers: Molecular Pretraining via Denoising and Cross-Modal Distillation

09/08/2023
by Sungjun Cho, et al.

Pretraining molecular representations from large unlabeled data is essential for molecular property prediction due to the high cost of obtaining ground-truth labels. While various 2D graph-based molecular pretraining approaches exist, they struggle to show statistically significant gains in predictive performance. Recent work has thus proposed 3D conformer-based pretraining under a denoising task, which has led to promising results. During downstream finetuning, however, models trained on 3D conformers require accurate atom coordinates for previously unseen molecules, which are computationally expensive to acquire at scale. In light of this limitation, we propose D&D, a self-supervised molecular representation learning framework that pretrains a 2D graph encoder by distilling representations from a 3D denoiser. Through denoising followed by cross-modal knowledge distillation, our approach benefits from the knowledge obtained via denoising while remaining painless to apply to downstream tasks with no access to accurate conformers. Experiments on real-world molecular property prediction datasets show that a graph encoder trained via D&D can infer 3D information from the 2D graph alone, and achieves superior performance and label efficiency compared to other baselines.
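As a rough illustration of the two-stage pipeline the abstract describes, the sketch below first pretrains a toy 3D teacher with a coordinate-denoising objective, then freezes it and trains a 2D graph encoder to match its representations. Everything here is an assumption for illustration: the MLP encoders, feature dimensions, and the MSE representation-matching loss are placeholders, not the architecture or distillation objective actually used in the paper.

```python
# Minimal sketch of denoising pretraining followed by cross-modal
# distillation. Module names, dimensions, and losses are illustrative.
import torch
import torch.nn as nn

class Denoiser3D(nn.Module):
    """Toy 3D teacher: encodes noisy atom coordinates and predicts the noise."""
    def __init__(self, hidden=128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(), nn.Linear(hidden, hidden))
        self.noise_head = nn.Linear(hidden, 3)

    def forward(self, coords):
        h = self.encoder(coords)        # per-atom representations
        return h, self.noise_head(h)    # representations, predicted noise

class GraphEncoder2D(nn.Module):
    """Toy 2D student: encodes per-atom features from the molecular graph."""
    def __init__(self, in_dim=16, hidden=128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, hidden))

    def forward(self, atom_feats):
        return self.encoder(atom_feats)

# Stage 1: pretrain the 3D teacher by denoising a perturbed conformer.
teacher = Denoiser3D()
opt_t = torch.optim.Adam(teacher.parameters(), lr=1e-4)
coords = torch.randn(32, 3)             # stand-in for one molecule's conformer
noise = 0.1 * torch.randn_like(coords)
_, pred_noise = teacher(coords + noise)
denoise_loss = nn.functional.mse_loss(pred_noise, noise)
denoise_loss.backward()
opt_t.step()

# Stage 2: freeze the teacher and distill its representations into the
# 2D student, which never needs coordinates at inference time.
teacher.eval()
student = GraphEncoder2D()
opt_s = torch.optim.Adam(student.parameters(), lr=1e-4)
atom_feats = torch.randn(32, 16)        # stand-in for 2D graph atom features
with torch.no_grad():
    teacher_repr, _ = teacher(coords)   # clean conformer through the teacher
student_repr = student(atom_feats)
distill_loss = nn.functional.mse_loss(student_repr, teacher_repr)
distill_loss.backward()
opt_s.step()
```

The key design point this sketch tries to capture: only the teacher ever touches 3D coordinates, so at downstream finetuning time the student can be applied to molecules for which no accurate conformer is available.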

Related research

ChemBERTa: Large-Scale Self-Supervised Pretraining for Molecular Property Prediction (10/19/2020)
GNNs and chemical fingerprints are the predominant approaches to represe...

MolKD: Distilling Cross-Modal Knowledge in Chemical Reactions for Molecular Property Prediction (05/03/2023)
How to effectively represent molecules is a long-standing challenge for ...

How do Cross-View and Cross-Modal Alignment Affect Representations in Contrastive Learning? (11/23/2022)
Various state-of-the-art self-supervised visual representation learning ...

Improving Molecular Pretraining with Complementary Featurizations (09/29/2022)
Molecular pretraining, which learns molecular representations over massi...

Stepping Back to SMILES Transformers for Fast Molecular Representation Inference (12/26/2021)
In the intersection of molecular science and deep learning, tasks like v...

Molecular Contrastive Learning with Chemical Element Knowledge Graph (12/01/2021)
Molecular representation learning contributes to multiple downstream tas...

Graph Context Encoder: Graph Feature Inpainting for Graph Generation and Self-supervised Pretraining (06/18/2021)
We propose the Graph Context Encoder (GCE), a simple but efficient appro...
