Predicting drug properties with parameter-free machine learning: Pareto-Optimal Embedded Modeling (POEM)

02/11/2020
by Andrew E. Brereton, et al.

The prediction of absorption, distribution, metabolism, excretion, and toxicity (ADMET) of small molecules from their molecular structure is a central problem in medicinal chemistry with great practical importance in drug discovery. Creating predictive models conventionally requires substantial trial and error in selecting molecular representations, choosing machine learning (ML) algorithms, and tuning hyperparameters. A generally applicable method that performs well on all datasets without tuning would be of great value but is currently lacking. Here, we describe Pareto-Optimal Embedded Modeling (POEM), a similarity-based method for predicting molecular properties. POEM is a non-parametric, supervised ML algorithm developed to generate reliable predictive models without the need for optimization. POEM's predictive strength is obtained by combining multiple different representations of molecular structures in a context-specific manner, while maintaining low dimensionality. We benchmark POEM relative to industry-standard ML algorithms and published results across 17 classification tasks. POEM performs well in all cases and reduces the risk of overfitting.
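To make the idea of combining several similarity measures without tuning more concrete, here is a minimal sketch of Pareto-optimal neighbour selection. It is not the authors' implementation: the abstract does not give the algorithm, so the dominance rule on per-representation distances, the majority vote, the function names `pareto_front` and `predict`, and the toy data are all assumptions made for illustration.

```python
from collections import Counter


def pareto_front(distances):
    """Return indices of training items not dominated by any other item.

    distances[i] holds the distances from the query molecule to training
    item i, one entry per molecular representation (smaller = more similar).
    Item j dominates item i if j is no farther in every representation and
    strictly closer in at least one. (Assumed rule, for illustration only.)
    """
    front = []
    for i, di in enumerate(distances):
        dominated = any(
            all(a <= b for a, b in zip(dj, di))
            and any(a < b for a, b in zip(dj, di))
            for j, dj in enumerate(distances)
            if j != i
        )
        if not dominated:
            front.append(i)
    return front


def predict(distances, labels):
    """Majority vote over the Pareto-optimal neighbours (hypothetical rule)."""
    front = pareto_front(distances)
    votes = Counter(labels[i] for i in front)
    return votes.most_common(1)[0][0], front


# Toy data: distances under two hypothetical representations, A and B.
distances = [
    (0.10, 0.80),  # very close under A only
    (0.70, 0.15),  # very close under B only
    (0.30, 0.30),  # moderately close under both
    (0.90, 0.90),  # far under both, so dominated
    (0.35, 0.40),  # dominated by the item above with (0.30, 0.30)
]
labels = ["toxic", "toxic", "non-toxic", "non-toxic", "non-toxic"]

label, front = predict(distances, labels)
print(front, label)
```

The neighbour set here falls out of the dominance relation rather than from a chosen k or a distance threshold, which is one way a similarity-based method could avoid hyperparameters.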
