Alchemy: A Quantum Chemistry Dataset for Benchmarking AI Models

06/22/2019
by   Guangyong Chen, et al.

We introduce a new molecular dataset, named Alchemy, for developing machine learning models useful in chemistry and materials science. As of June 20th, 2019, the dataset comprises 12 quantum mechanical properties of 119,487 organic molecules with up to 14 heavy atoms, sampled from the GDB MedChem database. The Alchemy dataset expands the volume and diversity of existing molecular datasets. Our extensive benchmarks of state-of-the-art graph neural network models on Alchemy demonstrate the usefulness of new data in validating and developing machine learning models for chemistry and materials science. We have also launched a contest to attract attention from researchers in related fields. More details can be found on the contest website [https://alchemy.tencent.com]. At the time of the benchmarking experiments, we had generated 119,487 molecules for the Alchemy dataset. More molecular samples have been generated since then; hence, we provide a list of the molecules used in the reported benchmarks.

