Improving Molecular Pretraining with Complementary Featurizations

09/29/2022
by Yanqiao Zhu, et al.

Molecular pretraining, which learns molecular representations over massive unlabeled data, has become a prominent paradigm for solving a variety of tasks in computational chemistry and drug discovery. Recently, substantial progress has been made in molecular pretraining with different molecular featurizations, including 1D SMILES strings, 2D graphs, and 3D geometries. However, the role of molecular featurizations and their corresponding neural architectures in molecular pretraining remains largely unexamined. In this paper, through two case studies – chirality classification and aromatic ring counting – we first demonstrate that different featurization techniques convey chemical information differently. In light of this observation, we propose a simple and effective MOlecular pretraining framework with COmplementary featurizations (MOCO). MOCO comprehensively leverages multiple featurizations that complement each other and outperforms existing state-of-the-art models that rely solely on one or two featurizations on a wide range of molecular property prediction tasks.
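To make concrete the claim that featurizations convey chemical information differently, here is a toy sketch built around chirality. It is not the paper's experiment or the MOCO model. The helpers `tokenize_1d` and `strip_stereo` are hypothetical names introduced only for this illustration, and the regex tokenizer handles only simple SMILES. The example assumes a featurization that records no stereo annotation at all, then checks whether it can still tell the two enantiomers of alanine apart.

```python
import re

# Toy SMILES tokenizer (an assumption for illustration, not a full parser):
# bracket atoms first, then two-letter halogens, then any single character.
TOKEN_RE = re.compile(r"\[[^\]]+\]|Br|Cl|.")


def tokenize_1d(smiles):
    """1D view: the SMILES token sequence, chirality tags ('@', '@@') included."""
    return TOKEN_RE.findall(smiles)


def strip_stereo(smiles):
    """Hypothetical stereo-blind view: drop every '@' tag from the string,
    mimicking a featurization that stores atoms and bonds but no chirality."""
    return smiles.replace("@", "")


# The two enantiomers of alanine differ only in the tag on the chiral carbon.
ala_a = "C[C@H](N)C(=O)O"
ala_b = "C[C@@H](N)C(=O)O"

# The 1D token sequences differ, so a sequence model can in principle tell them apart.
print(tokenize_1d(ala_a) == tokenize_1d(ala_b))  # False

# Once stereo tags are dropped, the two inputs become indistinguishable.
print(strip_stereo(ala_a) == strip_stereo(ala_b))  # True
```

In practice, graph featurizers can attach chirality tags as atom features, and 3D geometries encode handedness through coordinates. The sketch only shows that the information available to a model depends on what the chosen featurization keeps.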


