Improving Molecular Contrastive Learning via Faulty Negative Mitigation and Decomposed Fragment Contrast

02/18/2022
by   Yuyang Wang, et al.

Deep learning has become prevalent in computational chemistry and is widely used for molecular property prediction. Recently, self-supervised learning (SSL), especially contrastive learning (CL), has gathered growing attention for its potential to learn molecular representations that generalize to the gigantic chemical space. Unlike supervised learning, SSL can directly leverage large unlabeled datasets, which greatly reduces the effort of acquiring molecular property labels through costly and time-consuming simulations or experiments. However, most molecular SSL methods borrow insights from the machine learning community but neglect the unique cheminformatics (e.g., molecular fingerprints) and multi-level graphical structures (e.g., functional groups) of molecules. In this work, we propose iMolCLR: improvement of Molecular Contrastive Learning of Representations with graph neural networks (GNNs) in two aspects: (1) mitigating faulty negative contrastive instances by considering cheminformatics similarities between molecule pairs, and (2) fragment-level contrasting between intra- and inter-molecule substructures decomposed from molecules. Experiments show that the proposed strategies significantly improve the performance of GNN models on various challenging molecular property prediction tasks. In comparison to the previous CL framework, iMolCLR demonstrates an average improvement of 1.3% on classification benchmarks and 4.8% on regression benchmarks. On most benchmarks, the generic GNN pre-trained by iMolCLR rivals or even surpasses supervised learning models with sophisticated architecture designs and engineered features. Further investigations demonstrate that the representations learned through iMolCLR intrinsically embed scaffolds and functional groups, which can be used to reason about molecular similarities.
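As a rough illustration of the first aspect (faulty negative mitigation), the sketch below down-weights a negative pair's contribution to an NT-Xent-style contrastive loss when the two molecules have similar fingerprints. It is a minimal sketch under stated assumptions, not the authors' implementation: the weighting form `w = 1 - lam * tanimoto`, the function names, the parameters `tau` and `lam`, and the representation of fingerprints as sets of on-bit indices are all hypothetical choices made for this example.

```python
import math


def tanimoto(fp_a, fp_b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0


def cosine(u, v):
    """Cosine similarity between two embedding vectors (plain lists of floats)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def weighted_contrastive_loss(anchor, positive, negatives,
                              anchor_fp, negative_fps, tau=0.1, lam=1.0):
    """NT-Xent-style loss for a single anchor.

    Each negative's similarity is scaled by w = 1 - lam * tanimoto(anchor, negative),
    so a chemically similar "negative" (a likely faulty negative) is penalized less.
    The weighting form is an assumption made for this sketch.
    """
    pos_logit = cosine(anchor, positive) / tau
    logits = [pos_logit]
    for z_neg, fp_neg in zip(negatives, negative_fps):
        w = 1.0 - lam * tanimoto(anchor_fp, fp_neg)
        logits.append(w * cosine(anchor, z_neg) / tau)
    # Numerically stable log-sum-exp over the positive and all weighted negatives.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denominator - pos_logit


# Toy example: one negative whose fingerprint overlaps heavily with the anchor's.
anchor, positive = [1.0, 0.0], [1.0, 0.0]
negatives = [[0.8, 0.6]]
anchor_fp, negative_fps = {1, 2, 3}, [{1, 2, 3, 4}]

loss_weighted = weighted_contrastive_loss(
    anchor, positive, negatives, anchor_fp, negative_fps, lam=1.0)
loss_plain = weighted_contrastive_loss(
    anchor, positive, negatives, anchor_fp, negative_fps, lam=0.0)
```

In this toy case the similar negative contributes less to the denominator when `lam=1.0`, so `loss_weighted` is smaller than `loss_plain`. Setting `lam=0.0` recovers the unweighted loss. In practice the fingerprints would come from a cheminformatics toolkit and the embeddings from a GNN encoder.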

research
02/19/2021

MolCLR: Molecular Contrastive Learning of Representations via Graph Neural Networks

Molecular machine learning bears promise for efficient molecule property...
research
06/11/2021

ChemRL-GEM: Geometry Enhanced Molecular Representation Learning for Property Prediction

Effective molecular representation learning is of great importance to fa...
research
03/24/2021

Knowledge-aware Contrastive Molecular Graph Learning

Leveraging domain knowledge including fingerprints and functional groups...
research
05/22/2023

Atomic and Subgraph-aware Bilateral Aggregation for Molecular Representation Learning

Molecular representation learning is a crucial task in predicting molecu...
research
03/24/2021

Quantum Mechanics and Machine Learning Synergies: Graph Attention Neural Networks to Predict Chemical Reactivity

There is a lack of scalable quantitative measures of reactivity for func...
research
02/04/2023

Harnessing Simulation for Molecular Embeddings

While deep learning has unlocked advances in computational biology once ...
research
07/22/2023

Extracting Molecular Properties from Natural Language with Multimodal Contrastive Learning

Deep learning in computational biochemistry has traditionally focused on...
