Augmenting Molecular Images with Vector Representations as a Featurization Technique for Drug Classification

08/09/2020
by Daniel de Marchi, et al.

One of the key steps in building deep learning systems for drug classification and generation is the choice of featurization for the molecules. Previous featurization methods have included molecular images, binary strings, graphs, and SMILES strings. This paper proposes molecular images captioned with binary vectors that encode information not contained in, or not easily understood from, a molecular image alone. Specifically, we use Morgan fingerprints, which encode higher-level structural information, and MACCS keys, which encode answers to yes-or-no questions about a molecule's properties and structure. We tested our method on the HIV dataset published by the Pande lab, which consists of 41,127 molecules labeled by whether they inhibit HIV. Our final model achieved a state-of-the-art ROC AUC on the HIV dataset, outperforming all other methods. Moreover, the model converged significantly faster than most other methods and required dramatically less computational power than training on unaugmented images.
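To make the captioning idea concrete, here is a minimal sketch of one way a binary vector could be drawn beneath an image as a strip of black and white blocks. The abstract does not specify the caption layout, so the function name `caption_image`, the `cell` parameter, and the block-strip layout are all assumptions for illustration. In practice the bits would come from a Morgan fingerprint or MACCS keys computed with a cheminformatics toolkit; here a short hand-written vector stands in for them.

```python
from typing import List

Image = List[List[int]]  # grayscale pixels, 0 (black) to 255 (white)


def caption_image(image: Image, bits: List[int], cell: int = 2) -> Image:
    """Append a strip below `image` that draws each bit as a cell x cell block.

    Hypothetical layout: the paper's actual caption format may differ.
    """
    width = len(image[0])
    per_row = width // cell  # how many bits fit on one caption row
    if per_row == 0:
        raise ValueError("image is narrower than one caption cell")
    out = [row[:] for row in image]  # copy so the input is not mutated
    for start in range(0, len(bits), per_row):
        chunk = bits[start:start + per_row]
        pixel_row: List[int] = []
        for b in chunk:
            # Set bits are drawn black (0), unset bits white (255).
            pixel_row.extend([0 if b else 255] * cell)
        # Pad with white so every caption row matches the image width.
        pixel_row.extend([255] * (width - len(pixel_row)))
        out.extend([pixel_row[:] for _ in range(cell)])
    return out


# Hypothetical 8x8 white image and a 6-bit vector standing in for a fingerprint.
blank = [[255] * 8 for _ in range(8)]
captioned = caption_image(blank, [1, 0, 1, 1, 0, 1], cell=2)
```

With an 8-pixel-wide image and 2-pixel cells, four bits fit per caption row, so the six bits occupy two caption rows and the image grows from 8 to 12 pixel rows. A convolutional classifier trained on such an image sees the drawn structure and the encoded bits in a single input.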


