MolCLR: Molecular Contrastive Learning of Representations via Graph Neural Networks

by Yuyang Wang, et al.

Molecular machine learning holds promise for efficient molecular property prediction and drug discovery. However, because labeled data are limited and chemical space is vast, machine learning models trained with supervised learning generalize poorly. This greatly limits the application of machine learning methods to molecular design and discovery. In this work, we present MolCLR: Molecular Contrastive Learning of Representations via Graph Neural Networks (GNNs), a self-supervised learning framework that leverages large unlabeled molecule datasets. Specifically, we first build a molecular graph in which each node represents an atom and each edge represents a chemical bond. A GNN is then used to encode the molecular graph. We propose three novel molecular graph augmentations: atom masking, bond deletion, and subgraph removal. A contrastive estimator is used to maximize the agreement between different augmentations of the same molecule. Experiments show that the molecular representations learned by MolCLR can be transferred to multiple downstream molecular property prediction tasks, where our method achieves state-of-the-art performance on many challenging datasets. We also demonstrate the efficacy of the proposed molecular graph augmentations on supervised molecular classification tasks.
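The graph construction, the three augmentations, and the contrastive objective described in the abstract can be sketched in plain Python. This is a minimal sketch under our own assumptions, not the MolCLR implementation: atoms are represented by atomic numbers and bonds by index pairs, and the `MASK` token, the 25% augmentation rates, and all function names are hypothetical. The contrastive estimator is written as a SimCLR-style NT-Xent loss with an assumed temperature of 0.1. The GNN encoder is omitted, so the loss takes already-computed embeddings.

```python
import math
import random
from collections import deque

MASK = -1  # hypothetical mask-token value for a masked atom


def atom_masking(atoms, bonds, rate=0.25, rng=random):
    """Replace a random fraction of atom types with the MASK token (assumed rate)."""
    n_mask = max(1, int(rate * len(atoms)))
    masked = set(rng.sample(range(len(atoms)), n_mask))
    new_atoms = [MASK if i in masked else a for i, a in enumerate(atoms)]
    return new_atoms, list(bonds)


def bond_deletion(atoms, bonds, rate=0.25, rng=random):
    """Drop a random fraction of bonds (edges); atoms are unchanged."""
    n_keep = len(bonds) - int(rate * len(bonds))
    kept = rng.sample(list(bonds), n_keep)
    return list(atoms), kept


def subgraph_removal(atoms, bonds, rate=0.25, rng=random):
    """Grow a connected subgraph by BFS from a random atom, mask it, drop its bonds."""
    adj = {i: [] for i in range(len(atoms))}
    for u, v in bonds:
        adj[u].append(v)
        adj[v].append(u)
    target = max(1, int(rate * len(atoms)))
    start = rng.randrange(len(atoms))
    removed, queue = {start}, deque([start])
    while queue and len(removed) < target:
        for nb in adj[queue.popleft()]:
            if nb not in removed and len(removed) < target:
                removed.add(nb)
                queue.append(nb)
    new_atoms = [MASK if i in removed else a for i, a in enumerate(atoms)]
    new_bonds = [(u, v) for u, v in bonds if u not in removed and v not in removed]
    return new_atoms, new_bonds


def nt_xent(z1, z2, tau=0.1):
    """SimCLR-style NT-Xent loss over a batch of paired embeddings.

    z1[k] and z2[k] embed two augmentations of molecule k (the positive pair);
    every other embedding in the batch acts as a negative.
    """
    def normalize(v):
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v]

    z = [normalize(v) for v in z1 + z2]
    n = len(z1)
    total = 0.0
    for i in range(2 * n):
        j = (i + n) % (2 * n)  # index of the positive partner
        sims = [sum(a * b for a, b in zip(z[i], z[k])) / tau for k in range(2 * n)]
        denom = sum(math.exp(s) for k, s in enumerate(sims) if k != i)
        total += -math.log(math.exp(sims[j]) / denom)
    return total / (2 * n)
```

For example, a six-membered carbon ring can be written as `atoms = [6] * 6` and `bonds = [(i, (i + 1) % 6) for i in range(6)]`. Passing a seeded `random.Random` instance as `rng` makes the augmentations reproducible, and two augmented views of each molecule would then be encoded by the GNN before their embeddings are fed to `nt_xent`.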




Improving Molecular Contrastive Learning via Faulty Negative Mitigation and Decomposed Fragment Contrast

Deep learning has become prevalent in computational chemistry and widel...

Gated Graph Recursive Neural Networks for Molecular Property Prediction

Molecule property prediction is a fundamental problem for computer-aided...

Improving VAE based molecular representations for compound property prediction

Collecting labeled data for many important tasks in chemoinformatics is ...

CheMixNet: Mixed DNN Architectures for Predicting Chemical Properties using Multiple Molecular Representations

SMILES is a linear representation of chemical structures which encodes t...

Chemistry-informed Macromolecule Graph Representation for Similarity Computation and Supervised Learning

Macromolecules are large, complex molecules composed of covalently bonde...

GeomGCL: Geometric Graph Contrastive Learning for Molecular Property Prediction

Recently many efforts have been devoted to applying graph neural network...

Motif-Driven Contrastive Learning of Graph Representations

Graph motifs are significant subgraph patterns occurring frequently in g...

Code Repositories