The Open Catalyst 2020 (OC20) Dataset and Community Challenges

10/20/2020
by Lowik Chanussot, et al.

Catalyst discovery and optimization are key to solving many societal and energy challenges, including solar fuels synthesis, long-term energy storage, and renewable fertilizer production. Despite considerable effort by the catalysis community to apply machine learning models to computational catalyst discovery, it remains an open challenge to build models that generalize across both the elemental compositions of surfaces and adsorbate identities and configurations, perhaps because datasets in catalysis have been smaller than those in related fields. To address this, we developed the OC20 dataset, consisting of 1,281,121 Density Functional Theory (DFT) relaxations (264,900,500 single-point evaluations) across a wide swath of materials, surfaces, and adsorbates (nitrogen, carbon, and oxygen chemistries). We supplemented this dataset with randomly perturbed structures, short-timescale molecular dynamics, and electronic structure analyses. The dataset comprises three central tasks indicative of day-to-day catalyst modeling and comes with predefined train/validation/test splits to facilitate direct comparisons with future model development efforts. We applied three state-of-the-art graph neural network models (SchNet, DimeNet, CGCNN) to each of these tasks as baseline demonstrations for the community to build on. In almost every task, no upper limit on model size was identified, suggesting that even larger models are likely to improve on the initial results. The dataset and baseline models are provided as open resources, along with a public leaderboard to encourage community contributions to solving these important tasks.

06/17/2022

The Open Catalyst 2022 (OC22) Dataset and Challenges for Oxide Electrocatalysis

Computational catalysis and machine learning communities have made consi...

10/31/2022

The Open MatSci ML Toolkit: A Flexible Framework for Machine Learning in Materials Science

We present the Open MatSci ML Toolkit: a flexible, self-contained, and s...

06/22/2019

Alchemy: A Quantum Chemistry Dataset for Benchmarking AI Models

We introduce a new molecular dataset, named Alchemy, for developing mach...

03/07/2019

Transfer Learning Using Ensemble Neural Nets for Organic Solar Cell Screening

Organic Solar Cells are a promising technology for solving the clean ene...

11/13/2019

AMPL: A Data-Driven Modeling Pipeline for Drug Discovery

One of the key requirements for incorporating machine learning into the ...

04/06/2022

How Do Graph Networks Generalize to Large and Diverse Molecular Systems?

The predominant method of demonstrating progress of atomic graph neural ...
