MM-Deacon: Multimodal molecular domain embedding analysis via contrastive learning

09/18/2021
by Zhihui Guo, et al.

Molecular representation learning plays an essential role in cheminformatics. Recently, language model-based approaches have become popular as an alternative to traditional expert-designed features for encoding molecules. However, these approaches use only a single modality to represent molecules. Driven by the fact that a given molecule can be described through different modalities such as the Simplified Molecular-Input Line-Entry System (SMILES), International Union of Pure and Applied Chemistry (IUPAC) nomenclature, and the IUPAC International Chemical Identifier (InChI), we propose a multimodal molecular embedding generation approach called MM-Deacon (multimodal molecular domain embedding analysis via contrastive learning). MM-Deacon is trained using SMILES and IUPAC molecule representations as two different modalities. First, SMILES and IUPAC strings are encoded independently by two different transformer-based language models. A contrastive loss then pulls the encoded representations from the two modalities closer together when they belong to the same molecule and pushes them farther apart when they belong to different molecules. We evaluate the robustness of our molecule embeddings on molecule clustering, cross-modal molecule search, drug similarity assessment, and drug-drug interaction tasks.
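The contrastive objective described in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not state the exact form of the loss, so this assumes a symmetric InfoNCE-style loss over a batch of paired embeddings, with cosine similarity and a temperature of 0.07. The function names (`contrastive_loss`, `cross_modal_search`) and the random vectors standing in for encoder outputs are hypothetical and are not taken from the paper.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def contrastive_loss(smiles_emb, iupac_emb, temperature=0.07):
    """Symmetric InfoNCE-style loss (an assumption, not the paper's stated loss).

    Row i of smiles_emb and row i of iupac_emb are assumed to describe the
    same molecule; every other row in the batch acts as a negative.
    """
    s = l2_normalize(smiles_emb)
    u = l2_normalize(iupac_emb)
    logits = s @ u.T / temperature  # (batch, batch) similarity matrix
    idx = np.arange(len(s))
    # SMILES -> IUPAC direction: each row should pick out its own column.
    loss_s2u = -log_softmax(logits, axis=1)[idx, idx].mean()
    # IUPAC -> SMILES direction: each column should pick out its own row.
    loss_u2s = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_s2u + loss_u2s)


def cross_modal_search(query_emb, candidate_embs):
    """Rank candidates from one modality by cosine similarity to a query from the other."""
    q = l2_normalize(query_emb[None, :])[0]
    sims = l2_normalize(candidate_embs) @ q
    return np.argsort(-sims)  # candidate indices, most similar first


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-ins for encoder outputs: 8 molecules, 16-dimensional embeddings.
    smiles_emb = rng.normal(size=(8, 16))
    iupac_emb = smiles_emb + 0.1 * rng.normal(size=(8, 16))
    print("loss, matched pairs:   ", contrastive_loss(smiles_emb, iupac_emb))
    print("loss, mismatched pairs:", contrastive_loss(smiles_emb, np.roll(iupac_emb, 1, axis=0)))
    print("search ranking for molecule 3:", cross_modal_search(smiles_emb[3], iupac_emb))
```

Under these assumptions, matched pairs give a lower loss than mismatched pairs, and the same similarity scores used for training can be reused to rank candidates in cross-modal search.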


