Predicting drug properties with parameter-free machine learning: Pareto-Optimal Embedded Modeling (POEM)

by Andrew E. Brereton, et al.

The prediction of absorption, distribution, metabolism, excretion, and toxicity (ADMET) of small molecules from their molecular structure is a central problem in medicinal chemistry with great practical importance in drug discovery. Creating predictive models conventionally requires substantial trial and error in selecting molecular representations and machine learning (ML) algorithms and in tuning hyperparameters. A generally applicable method that performs well on all datasets without tuning would be of great value but is currently lacking. Here, we describe Pareto-Optimal Embedded Modeling (POEM), a similarity-based method for predicting molecular properties. POEM is a non-parametric, supervised ML algorithm developed to generate reliable predictive models without the need for optimization. POEM's predictive strength is obtained by combining multiple different representations of molecular structures in a context-specific manner, while maintaining low dimensionality. We benchmark POEM relative to industry-standard ML algorithms and published results across 17 classification tasks. POEM performs well in all cases and reduces the risk of overfitting.
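
The abstract does not spell out the algorithm, so the following is only an illustrative sketch of how a similarity-based, tuning-free predictor might combine several molecular representations through Pareto optimality; it should not be read as the POEM implementation. It assumes each molecule is described by several "views" (here, toy sets of feature IDs standing in for different fingerprints), measures a Tanimoto distance per view, and treats as neighbors those training molecules whose distance vectors are not dominated by any other. All names (`tanimoto_distance`, `pareto_neighbors`, `predict`) and the toy data are hypothetical.

```python
from typing import List, Sequence, Tuple

# Assumption: a molecule is a sequence of "views", one feature set per
# representation (e.g. one set per fingerprint type). Toy stand-in only.
Views = Sequence[frozenset]


def tanimoto_distance(a: frozenset, b: frozenset) -> float:
    """1 - |a & b| / |a | b|; defined as 0.0 when both sets are empty."""
    union = len(a | b)
    return 0.0 if union == 0 else 1.0 - len(a & b) / union


def distance_vector(query: Views, other: Views) -> Tuple[float, ...]:
    """One distance per representation."""
    return tuple(tanimoto_distance(q, o) for q, o in zip(query, other))


def dominates(u: Tuple[float, ...], v: Tuple[float, ...]) -> bool:
    """u dominates v: no farther in every view, strictly closer in at least one."""
    return all(x <= y for x, y in zip(u, v)) and any(x < y for x, y in zip(u, v))


def pareto_neighbors(query: Views, train: Sequence[Views]) -> List[int]:
    """Indices of training molecules on the Pareto front of distances to query."""
    vectors = [distance_vector(query, t) for t in train]
    return [i for i, v in enumerate(vectors)
            if not any(dominates(w, v) for w in vectors)]


def predict(query: Views, train: Sequence[Views], labels: Sequence[int]) -> float:
    """Fraction of positive labels among the Pareto-optimal neighbors.

    Requires a non-empty training set; no neighbor count or weights to tune.
    """
    idx = pareto_neighbors(query, train)
    return sum(labels[i] for i in idx) / len(idx)


# Hypothetical toy data: three training molecules, two views each.
train = [
    (frozenset({1, 2, 3}), frozenset({10, 11})),  # closest in view 1
    (frozenset({1, 2}), frozenset({10, 12})),     # closest in view 2
    (frozenset({7, 8}), frozenset({20})),         # far in both views
]
labels = [1, 0, 0]
query = (frozenset({1, 2, 3}), frozenset({10, 12}))

neighbors = pareto_neighbors(query, train)
score = predict(query, train, labels)
```

In this toy case the first two training molecules are each closest to the query in one view, so both sit on the Pareto front, while the third is farther in every view and is excluded. The neighbor set is determined by the data alone, which is the sense in which such a scheme needs no tuning.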




