BatmanNet: Bi-branch Masked Graph Transformer Autoencoder for Molecular Representation

11/25/2022
by Zhen Wang, et al.

Although substantial efforts have been made using graph neural networks (GNNs) for AI-driven drug discovery (AIDD), effective molecular representation learning remains an open challenge, especially when labeled molecules are scarce. Recent studies suggest that large GNN models pre-trained by self-supervised learning on unlabeled datasets transfer better to downstream molecular property prediction tasks. However, they often require large-scale datasets and considerable computational resources, which makes pre-training time-consuming, computationally expensive, and environmentally unfriendly. To alleviate these limitations, we propose a novel pre-training model for molecular representation learning, Bi-branch Masked Graph Transformer Autoencoder (BatmanNet). BatmanNet features two tailored and complementary graph autoencoders that reconstruct the missing nodes and edges from a masked molecular graph. To our surprise, we found that a high masking proportion (60%) of the atoms and bonds achieved the best performance. We further propose an asymmetric graph-based encoder-decoder architecture for both nodes and edges, where a transformer-based encoder takes only the visible subset of nodes or edges, and a lightweight decoder reconstructs the original molecule from the latent representation and mask tokens. With this simple yet effective asymmetric design, BatmanNet can learn efficiently even from a much smaller unlabeled molecular dataset to capture the underlying structural and semantic information, overcoming a major limitation of current deep neural networks for molecular representation learning. For instance, using only 250K unlabeled molecules as pre-training data, BatmanNet with 2.575M parameters achieves a 0.5% improvement on the average AUC compared with the current state-of-the-art method with 100M parameters pre-trained on 11M molecules.
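The masking scheme described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the function names `split_visible_masked` and `decoder_inputs`, the string stand-ins for latent vectors, and the `"[MASK]"` token are all hypothetical, chosen only to show how a 60% masking ratio splits atoms (or bonds) into a visible subset for the encoder and a masked subset that the decoder fills in from mask tokens.

```python
import random


def split_visible_masked(num_items, mask_ratio=0.6, seed=0):
    """Randomly partition item indices (atoms or bonds) into visible and masked sets.

    Hypothetical helper: the 0.6 default mirrors the 60% ratio in the abstract.
    """
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")
    rng = random.Random(seed)
    indices = list(range(num_items))
    rng.shuffle(indices)
    num_masked = round(num_items * mask_ratio)
    masked = sorted(indices[:num_masked])
    visible = sorted(indices[num_masked:])
    return visible, masked


def decoder_inputs(latents, visible, masked, mask_token):
    """Rebuild the full-length sequence handed to the decoder.

    Encoder latents go back to their original (visible) positions;
    every masked position receives the same shared mask token.
    """
    full = [None] * (len(visible) + len(masked))
    for pos, latent in zip(visible, latents):
        full[pos] = latent
    for pos in masked:
        full[pos] = mask_token
    return full


# Toy molecule with 10 atoms; the encoder would only ever see the visible ones.
visible, masked = split_visible_masked(10, mask_ratio=0.6, seed=0)
latents = [f"z{i}" for i in visible]  # stand-in for encoder outputs
full = decoder_inputs(latents, visible, masked, mask_token="[MASK]")
```

Because the encoder processes only the visible 40% of positions in this sketch, its cost scales with the smaller subset, while the full-length sequence is handled only by the lightweight decoder. This is the asymmetry the abstract refers to, shown here under the stated assumptions.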


