MolE: a molecular foundation model for drug discovery

11/03/2022
by Oscar Méndez-Lucio, et al.

Models that accurately predict properties from chemical structure are valuable tools in drug discovery. However, for many properties, public and private training sets are typically small, and it is difficult for models to generalize well outside the training data. Recently, large language models have addressed this problem by using self-supervised pretraining on large unlabeled datasets, followed by fine-tuning on smaller, labeled datasets. In this paper, we report MolE, a molecular foundation model that adapts the DeBERTa architecture to molecular graphs and is trained with a two-step pretraining strategy. The first pretraining step is a self-supervised approach focused on learning chemical structures, and the second is a massive multi-task approach to learn biological information. We show that fine-tuning pretrained MolE achieves state-of-the-art results on 9 of the 22 ADMET tasks included in the Therapeutics Data Commons.
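As a rough illustration of what adapting a relative-position transformer such as DeBERTa to molecular graphs can involve, the sketch below computes pairwise topological distances (number of bonds on the shortest path) between atoms. Using these distances in place of token offsets is an assumption made here for illustration; the abstract does not say how MolE encodes graph structure. The function name `topological_distances` and the ethanol example are likewise hypothetical.

```python
from collections import deque


def topological_distances(num_atoms, bonds):
    """Return all-pairs shortest-path lengths (in bonds) for a molecular graph.

    Assumption for illustration only: such a matrix could stand in for the
    relative token positions that a sequence transformer normally uses.
    Unreachable atom pairs (disconnected fragments) are marked with -1.
    """
    # Build an undirected adjacency list from the bond list.
    neighbors = [[] for _ in range(num_atoms)]
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)

    unreachable = -1
    dist = [[unreachable] * num_atoms for _ in range(num_atoms)]

    # Breadth-first search from every atom; all bonds have unit length.
    for source in range(num_atoms):
        dist[source][source] = 0
        queue = deque([source])
        while queue:
            current = queue.popleft()
            for nxt in neighbors[current]:
                if dist[source][nxt] == unreachable:
                    dist[source][nxt] = dist[source][current] + 1
                    queue.append(nxt)
    return dist


# Heavy atoms of ethanol (C-C-O); hydrogens are omitted for brevity.
atoms = ["C", "C", "O"]
bonds = [(0, 1), (1, 2)]
distances = topological_distances(len(atoms), bonds)
```

Here `distances[0][2]` is the number of bonds separating the first carbon from the oxygen. A real pipeline would typically derive atoms and bonds from a cheminformatics toolkit rather than hand-written lists.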


research
09/05/2022

ChemBERTa-2: Towards Chemical Foundation Models

Large pretrained models such as GPT-3 have had tremendous impact on mode...
research
09/29/2022

Improving Molecular Pretraining with Complementary Featurizations

Molecular pretraining, which learns molecular representations over massi...
research
04/24/2023

Uni-QSAR: an Auto-ML Tool for Molecular Property Prediction

Recently deep learning based quantitative structure-activity relationshi...
research
06/08/2021

Self-supervised Graph-level Representation Learning with Local and Global Structure

This paper studies unsupervised/self-supervised whole-graph representati...
research
10/05/2022

Antibody Representation Learning for Drug Discovery

Therapeutic antibody development has become an increasingly popular appr...
research
01/09/2023

Self-Supervised Time-to-Event Modeling with Structured Medical Records

Time-to-event models (also known as survival models) are used in medicin...
research
01/29/2023

Unifying Molecular and Textual Representations via Multi-task Language Modelling

The recent advances in neural language models have also been successfull...
