Matbench Discovery – An evaluation framework for machine learning crystal stability prediction

08/28/2023
by Janosh Riebesell, et al.

Matbench Discovery simulates the deployment of machine learning (ML) energy models in a high-throughput search for stable inorganic crystals. We address the disconnect between (i) thermodynamic stability and formation energy and (ii) in-domain vs out-of-distribution performance. Alongside this paper, we publish a Python package to aid with future model submissions and a growing online leaderboard with further insights into trade-offs between various performance metrics. To answer the question of which ML methodology performs best at materials discovery, our initial release explores a variety of models including random forests, graph neural networks (GNN), one-shot predictors, iterative Bayesian optimizers and universal interatomic potentials (UIP). Ranked best-to-worst by their test set F1 score on thermodynamic stability prediction, we find CHGNet > M3GNet > MACE > ALIGNN > MEGNet > CGCNN > CGCNN+P > Wrenformer > BOWSR > Voronoi tessellation fingerprints with random forest. The top 3 models are UIPs, the winning methodology for ML-guided materials discovery, achieving F1 scores of ~0.6 for crystal stability classification and discovery acceleration factors (DAF) of up to 5x on the first 10k most stable predictions compared to dummy selection from our test set. We also highlight a sharp disconnect between commonly used global regression metrics and more task-relevant classification metrics. Accurate regressors are susceptible to unexpectedly high false-positive rates if those accurate predictions lie close to the decision boundary at 0 eV/atom above the convex hull, where most materials lie. Our results highlight the need to focus on classification metrics that actually correlate with improved stability hit rate.
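The gap between regression and classification metrics described above can be illustrated with a small sketch. It is not the API of the published Python package: the function name `stability_metrics`, the toy energies, and the two hypothetical models are assumptions made for illustration. The sketch also assumes that a material counts as stable when its energy above the convex hull is at or below 0 eV/atom, and that DAF is computed as precision divided by the prevalence of stable materials in the test set.

```python
import numpy as np


def stability_metrics(e_hull_true, e_hull_pred, threshold=0.0):
    """Regression and classification metrics for stability prediction.

    Assumption: a material is "stable" when its energy above the convex
    hull (eV/atom) is <= threshold.
    """
    e_true = np.asarray(e_hull_true, dtype=float)
    e_pred = np.asarray(e_hull_pred, dtype=float)
    true_stable = e_true <= threshold
    pred_stable = e_pred <= threshold

    tp = int(np.sum(true_stable & pred_stable))   # stable, predicted stable
    fp = int(np.sum(~true_stable & pred_stable))  # unstable, predicted stable
    fn = int(np.sum(true_stable & ~pred_stable))  # stable, predicted unstable

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)

    # Assumed DAF definition: hit rate relative to random (dummy) selection.
    prevalence = float(true_stable.mean())
    daf = precision / prevalence if prevalence else float("nan")

    mae = float(np.mean(np.abs(e_true - e_pred)))
    return {"mae": mae, "precision": precision, "recall": recall,
            "f1": f1, "daf": daf}


# Toy energies above hull (eV/atom); most values sit near the 0 eV/atom boundary.
e_true = [-0.05, 0.01, 0.02, 0.30, 0.50, -0.02]

# Hypothetical model A: small errors, but several cross the decision boundary.
pred_a = [-0.03, -0.01, -0.01, 0.28, 0.52, 0.01]
# Hypothetical model B: larger errors, but every sign is preserved.
pred_b = [-0.20, 0.15, 0.17, 0.10, 0.30, -0.17]

metrics_a = stability_metrics(e_true, pred_a)
metrics_b = stability_metrics(e_true, pred_b)
print("model A:", metrics_a)
print("model B:", metrics_b)
```

In this toy data, model A has the lower mean absolute error yet the lower F1 score and DAF, because its small errors flip materials across the stability threshold. Model B is the less accurate regressor but the better classifier.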


research
11/25/2019

Machine-learned metrics for predicting the likelihood of success in materials discovery

Materials discovery is often compared to the challenge of finding a need...
research
09/14/2022

Use case-focused metrics to evaluate machine learning for diseases involving parasite loads

Communal hill-climbing, via comparison of algorithm performances, can gr...
research
11/29/2021

Prediction of Large Magnetic Moment Materials With Graph Neural Networks and Random Forests

Magnetic materials are crucial components of many technologies that coul...
research
01/14/2022

Model Stability with Continuous Data Updates

In this paper, we study the "stability" of machine learning (ML) models ...
research
08/08/2023

Explainable machine learning to enable high-throughput electrical conductivity optimization of doped conjugated polymers

The combination of high-throughput experimentation techniques and machin...
research
10/25/2022

A Database of Ultrastable MOFs Reassembled from Stable Fragments with Machine Learning Models

High-throughput screening of large hypothetical databases of metal-organ...
