Scaffold Embeddings: Learning the Structure Spanned by Chemical Fragments, Scaffolds and Compounds

03/11/2021
by Austin Clyde, et al.

Molecules have seemed like a natural fit for deep learning's tendency to handle complex structure through representation learning, given enough data. However, the resulting representation, which is often continuous, is not a natural way to understand chemical space as a domain, and it is particular to the samples used and their differences. We explore a more natural structure for representing chemical space as a structured domain: embedding drug-like chemical space into an enumerable hypergraph based on scaffold classes linked through an inclusion operator. This paper shows how molecules form classes of scaffolds, how scaffolds relate to each other in a hypergraph, and how this scaffold structure is natural for drug discovery workflows such as predicting properties and optimizing molecular structures. We compare the assumptions and utility of various molecular embeddings, including their respective induced distance metrics, their extensibility to representing chemical space as a structured domain, and the consequences of using the structure for learning tasks.
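
The abstract does not spell out how the scaffold classes or the inclusion operator are constructed, so the following is only a toy sketch of the general idea, not the paper's method. It assumes each molecule has already been assigned a scaffold name, and it stands in for substructure inclusion with a proper-subset test on hypothetical fragment labels. The names `SCAFFOLD_FRAGMENTS`, `MOLECULE_TO_SCAFFOLD`, `scaffold_classes`, and `inclusion_edges` are all illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical toy data, for illustration only. A real pipeline would derive
# scaffolds from molecular graphs with a cheminformatics toolkit.
SCAFFOLD_FRAGMENTS = {
    "benzene": frozenset({"phenyl"}),
    "pyridine": frozenset({"pyridyl"}),
    "phenylpyridine": frozenset({"phenyl", "pyridyl"}),
}

MOLECULE_TO_SCAFFOLD = {
    "toluene": "benzene",
    "phenol": "benzene",
    "nicotinamide": "pyridine",
    "2-phenylpyridine": "phenylpyridine",
}


def scaffold_classes(mol_to_scaffold):
    """Group molecules by scaffold; each group acts as one hyperedge."""
    classes = defaultdict(set)
    for molecule, scaffold in mol_to_scaffold.items():
        classes[scaffold].add(molecule)
    return dict(classes)


def inclusion_edges(fragments):
    """Return (smaller, larger) pairs where the smaller scaffold's fragment
    set is a proper subset of the larger one's. This subset test is an
    assumed stand-in for a real substructure-inclusion check."""
    return {
        (small, large)
        for small, small_frags in fragments.items()
        for large, large_frags in fragments.items()
        if small_frags < large_frags
    }


classes = scaffold_classes(MOLECULE_TO_SCAFFOLD)
edges = inclusion_edges(SCAFFOLD_FRAGMENTS)
```

In this sketch every molecule belongs to exactly one scaffold class, and the inclusion pairs give a discrete, enumerable structure over those classes, in contrast to a continuous learned embedding.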
