MoleculeNet: A Benchmark for Molecular Machine Learning

03/02/2017
by Zhenqin Wu, et al.

Molecular machine learning has matured rapidly over the last few years. Improved methods and larger datasets have enabled machine learning algorithms to make increasingly accurate predictions about molecular properties. However, algorithmic progress has been limited by the lack of a standard benchmark for comparing proposed methods: most new algorithms are benchmarked on different datasets, which makes it hard to gauge their quality. This work introduces MoleculeNet, a large-scale benchmark for molecular machine learning. MoleculeNet curates multiple public datasets, establishes metrics for evaluation, and offers high-quality open-source implementations of multiple previously proposed molecular featurization and learning algorithms (released as part of the DeepChem open-source library). MoleculeNet benchmarks demonstrate that learnable representations are powerful tools for molecular machine learning and broadly offer the best performance. However, this result comes with caveats. Learnable representations still struggle with complex tasks under data scarcity and with highly imbalanced classification. For quantum mechanical and biophysical datasets, the use of physics-aware featurizations can be more important than the choice of a particular learning algorithm.
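The caveat about highly imbalanced classification can be illustrated with a small, self-contained sketch. The abstract does not name the metrics MoleculeNet establishes, so the choice of ROC-AUC here, the helper names `accuracy` and `roc_auc`, and the toy data (2 active molecules out of 20) are all assumptions made for illustration; none of this is DeepChem code.

```python
# Illustrative sketch only: toy labels and scores, hypothetical helper names.
# Shows why plain accuracy can mislead on an imbalanced classification task,
# while a ranking metric such as ROC-AUC exposes an uninformative model.

def accuracy(labels, predictions):
    """Fraction of hard predictions that match the labels."""
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)

def roc_auc(labels, scores):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win. This pairwise form is O(n_pos * n_neg),
    which is fine for a toy example.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("ROC-AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Assumed toy dataset: 2 actives followed by 18 inactives.
labels = [1, 1] + [0] * 18

# A model that always predicts "inactive" with the same score.
constant_predictions = [0] * 20
constant_scores = [0.0] * 20

# A model that ranks one active first but places the other below one inactive.
informative_scores = [0.9, 0.4, 0.6] + [0.1] * 17

constant_accuracy = accuracy(labels, constant_predictions)
constant_auc = roc_auc(labels, constant_scores)
informative_auc = roc_auc(labels, informative_scores)

print(constant_accuracy, constant_auc, informative_auc)
```

On this toy set the constant model reaches 90% accuracy while its ROC-AUC stays at chance level (0.5), which is one reason a benchmark with imbalanced tasks would favor ranking-based metrics over raw accuracy.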


research
11/07/2019

Machine learning for molecular simulation

Machine learning (ML) is transforming all areas of science. The complex ...
research
11/14/2020

Deep Spatial Learning with Molecular Vibration

Machine learning over-fitting caused by data scarcity greatly limits the...
research
07/01/2019

An Open Source AutoML Benchmark

In recent years, an active field of research has developed around automa...
research
05/08/2022

FP-GNN: a versatile deep learning architecture for enhanced molecular property prediction

Deep learning is an important method for molecular design and exhibits c...
research
03/23/2017

Perspective: Energy Landscapes for Machine Learning

Machine learning techniques are being increasingly used as flexible non-...
research
06/24/2018

N-Gram Graph, A Novel Molecule Representation

Virtual high-throughput screening provides a strategy for prioritizing c...
research
03/19/2019

Machine Learning for removing EEG artifacts: Setting the benchmark

Electroencephalograms (EEG) are often contaminated by artifacts which ma...
