One Transformer Can Understand Both 2D & 3D Molecular Data

10/04/2022
by Shengjie Luo, et al.

Unlike vision and language data, which usually have a unique format, molecules can naturally be characterized using different chemical formulations. One can view a molecule as a 2D graph or define it as a collection of atoms located in 3D space. For molecular representation learning, most previous works designed neural networks for only one particular data format, making the learned models likely to fail on other data formats. We believe a general-purpose neural network model for chemistry should be able to handle molecular tasks across data modalities. To achieve this goal, we develop a novel Transformer-based molecular model called Transformer-M, which can take molecular data in 2D or 3D format as input and generate meaningful semantic representations. Using the standard Transformer as the backbone architecture, Transformer-M uses two separate channels to encode 2D and 3D structural information and incorporates them with the atom features in the network modules. When the input data is in a particular format, the corresponding channel is activated and the other is disabled. By training on 2D and 3D molecular data with properly designed supervised signals, Transformer-M automatically learns to leverage knowledge from different data modalities and correctly capture the representations. We conducted extensive experiments for Transformer-M. All empirical results show that Transformer-M can simultaneously achieve strong performance on 2D and 3D tasks, suggesting its broad applicability. The code and models will be made publicly available at https://github.com/lsj2408/Transformer-M.
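The two-channel idea can be illustrated with a toy sketch. The abstract does not say how each channel encodes structure, so everything here is an assumption made for illustration: the 2D channel is taken to be the negative bond-hop count between atoms, the 3D channel the negative Euclidean distance, and both are treated as additive pairwise biases. The function names (bias_2d, bias_3d, structural_bias) are hypothetical and are not taken from the Transformer-M code. What the sketch does show is the switching behaviour described above: a channel contributes only when its input format is present.

```python
import math
from collections import deque


def hop_counts(num_atoms, bonds):
    """All-pairs bond-hop counts on the 2D graph via BFS (-1 if unreachable)."""
    adj = [[] for _ in range(num_atoms)]
    for i, j in bonds:
        adj[i].append(j)
        adj[j].append(i)
    dist = [[-1] * num_atoms for _ in range(num_atoms)]
    for src in range(num_atoms):
        dist[src][src] = 0
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if dist[src][v] < 0:
                    dist[src][v] = dist[src][u] + 1
                    queue.append(v)
    return dist


def bias_2d(num_atoms, bonds):
    """Assumed 2D channel: negative hop count; unreachable pairs get -num_atoms."""
    hops = hop_counts(num_atoms, bonds)
    return [[-float(d) if d >= 0 else -float(num_atoms) for d in row]
            for row in hops]


def bias_3d(coords):
    """Assumed 3D channel: negative Euclidean distance between atom positions."""
    return [[-math.dist(a, b) for b in coords] for a in coords]


def structural_bias(num_atoms, bonds=None, coords=None):
    """Sum the channels whose input is present; a disabled channel adds zero."""
    total = [[0.0] * num_atoms for _ in range(num_atoms)]
    channels = []
    if bonds is not None:      # 2D input available: activate the 2D channel
        channels.append(bias_2d(num_atoms, bonds))
    if coords is not None:     # 3D input available: activate the 3D channel
        channels.append(bias_3d(coords))
    for channel in channels:
        for i in range(num_atoms):
            for j in range(num_atoms):
                total[i][j] += channel[i][j]
    return total


# Toy three-atom molecule: atom 0 bonded to atoms 1 and 2.
bonds = [(0, 1), (0, 2)]
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
only_2d = structural_bias(3, bonds=bonds)
only_3d = structural_bias(3, coords=coords)
both = structural_bias(3, bonds=bonds, coords=coords)
```

In this sketch the same function serves 2D-only, 3D-only, and combined inputs, which is the property the abstract emphasizes; in a real model the hand-written encodings would be replaced by learned ones.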

research
09/01/2023

Geometry-aware Line Graph Transformer Pre-training for Molecular Property Prediction

Molecular property prediction with deep learning has gained much attenti...
research
11/20/2022

Heterogenous Ensemble of Models for Molecular Property Prediction

Previous works have demonstrated the importance of considering different...
research
06/20/2019

SMILES-X: autonomous molecular compounds characterization for small datasets without descriptors

In materials science and related fields, small datasets (≪1000 samples) ...
research
07/25/2023

Curvature-based Transformer for Molecular Property Prediction

The prediction of molecular properties is one of the most important and ...
research
03/21/2023

Difficulty in learning chirality for Transformer fed with SMILES

Recent years have seen development of descriptor generation based on rep...
research
09/18/2021

MM-Deacon: Multimodal molecular domain embedding analysis via contrastive learning

Molecular representation learning plays an essential role in cheminforma...
research
04/29/2023

Conditional Graph Information Bottleneck for Molecular Relational Learning

Molecular relational learning, whose goal is to learn the interaction be...