Operator-induced structural variable selection for identifying materials genes

10/19/2021
by Shengbin Ye, et al.

In the emerging field of materials informatics, a fundamental task is to identify physicochemically meaningful descriptors, or materials genes, which are engineered from primary variables and a set of elementary algebraic operators through compositions. Standard practice directly analyzes the high-dimensional candidate predictor space in a linear model; statistical analysis is then substantially hampered by the astronomically large number of correlated predictors relative to the limited sample size. We formulate this problem as variable selection with operator-induced structure (OIS), and propose a new method that achieves unconventional dimension reduction by utilizing the geometry embedded in OIS. Although the model remains linear, we iterate nonparametric variable selection for effective dimension reduction. This enables variable selection based on ab initio primary variables, leading to a method that is orders of magnitude faster than existing methods, with improved accuracy. We introduce an OIS screening property for variable selection in this setting; interestingly, finite-sample assessment indicates that the employed Bayesian Additive Regression Trees (BART)-based variable selection method enjoys this property under the simulation settings. Numerical studies show the superiority of the proposed method, which continues to exhibit robust performance when the dimension of engineered features is out of reach of existing methods. Our analysis of single-atom catalysis identifies physical descriptors that explain the binding energy of metal-support pairs with high explanatory power, leading to interpretable insights that can guide the prevention of a notorious problem called sintering and aid catalysis design.
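To make the dimensionality problem concrete, here is a minimal Python sketch of how a candidate descriptor space can be engineered by composing operators over primary variables. It is an illustration only, not the authors' implementation: the operator set (`UNARY`, `BINARY`), the helper `compose_once`, and the simulated data are all assumptions introduced for this example.

```python
import itertools
import numpy as np

# Hypothetical operator set, chosen only for illustration.
UNARY = {"sq": np.square, "exp": np.exp, "abs": np.abs}
BINARY = {"add": np.add, "mul": np.multiply}  # commutative, so unordered pairs suffice


def compose_once(features):
    """Apply every operator once to the current features.

    `features` maps a descriptor name to its column of values.
    Returns a new dict holding the old descriptors plus the new ones;
    descriptors that share a name are stored only once.
    """
    out = dict(features)
    for (op, fn), (name, col) in itertools.product(UNARY.items(), features.items()):
        out[f"{op}({name})"] = fn(col)
    for op, fn in BINARY.items():
        for (n1, c1), (n2, c2) in itertools.combinations(features.items(), 2):
            out[f"{op}({n1},{n2})"] = fn(c1, c2)
    return out


# Simulated primary variables (assumed data: 20 samples, 3 variables).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
primary = {f"x{j + 1}": X[:, j] for j in range(X.shape[1])}

order1 = compose_once(primary)  # one layer of operators
order2 = compose_once(order1)   # two layers of operators
sizes = [len(primary), len(order1), len(order2)]
print(sizes)
```

Even with three primary variables and five operators, the number of candidate descriptors grows very quickly with the composition order while the sample size stays fixed at 20. This is the regime in which fitting a linear model directly on the engineered features becomes difficult, and it motivates selecting on the primary variables instead.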

