Structure-Property Maps with Kernel Principal Covariates Regression

02/12/2020
by Benjamin A. Helfrecht, et al.

Data analyses based on linear methods, which look for correlations between the features describing samples in a data set, or between features and properties associated with the samples, constitute the simplest, most robust, and most transparent approaches to the automatic processing of large amounts of data for building supervised or unsupervised machine learning models. Principal covariates regression (PCovR) is an under-appreciated method that interpolates between principal component analysis and linear regression, and can be used to conveniently reveal structure-property relations in terms of simple-to-interpret, low-dimensional maps. Here we provide a pedagogic overview of these data analysis schemes, including the use of the kernel trick to introduce an element of non-linearity in the process while maintaining most of the convenience and simplicity of linear approaches. We then introduce a kernelized version of PCovR and a sparsified extension, followed by a feature-selection scheme based on the CUR matrix decomposition, modified to incorporate the same hybrid loss that underlies PCovR. We demonstrate the performance of these approaches in revealing and predicting structure-property relations in chemistry and materials science.
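To make the interpolation between principal component analysis and linear regression concrete, the following is a minimal NumPy sketch of linear PCovR in its sample-space form. It is an illustration under stated assumptions, not the authors' implementation: the function name `pcovr_latent`, the mixing parameter name `alpha`, and the centering and Frobenius-norm scaling are choices made here for the example. It assumes the commonly stated construction in which the latent projection is taken from the leading eigenvectors of a modified Gram matrix, alpha * X X^T + (1 - alpha) * Yhat Yhat^T, where Yhat is the least-squares approximation of the properties.

```python
import numpy as np


def pcovr_latent(X, Y, alpha, n_components):
    """Sketch of linear PCovR: return latent projections T (n_samples x n_components).

    alpha = 1 recovers PCA scores; alpha = 0 keeps only the regression term.
    The preprocessing below is an assumption made for this example.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float).reshape(len(X), -1)  # force Y to be 2-D

    # Center both blocks and scale them so the two loss terms are comparable.
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    X = X / np.linalg.norm(X)
    Y = Y / np.linalg.norm(Y)

    # Least-squares approximation of the properties from the features.
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    Y_hat = X @ W

    # Modified Gram matrix mixing feature and property correlations.
    K = alpha * (X @ X.T) + (1.0 - alpha) * (Y_hat @ Y_hat.T)

    # Leading eigenpairs of the symmetric matrix K (eigh sorts ascending).
    evals, evecs = np.linalg.eigh(K)
    order = np.argsort(evals)[::-1][:n_components]
    evals = np.clip(evals[order], 0.0, None)  # guard against tiny negatives
    return evecs[:, order] * np.sqrt(evals)
```

Calling `pcovr_latent(X, Y, alpha=0.5, n_components=2)` on a feature matrix `X` and a property array `Y` returns two-dimensional coordinates that can be plotted as a map; moving `alpha` toward 1 emphasizes the variance of the features, while moving it toward 0 emphasizes the regression of the properties.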


research
12/22/2020

Improving Sample and Feature Selection with Principal Covariates Regression

Selecting the most relevant features and samples out of a large set of c...
research
12/16/2015

Streaming Kernel Principal Component Analysis

Kernel principal component analysis (KPCA) provides a concise set of bas...
research
01/02/2017

Towards multiple kernel principal component analysis for integrative analysis of tumor samples

Personalized treatment of patients based on tissue-specific cancer subty...
research
12/07/2021

A semi-group approach to Principal Component Analysis

Principal Component Analysis (PCA) is a well known procedure to reduce i...
research
10/07/2019

Nonparametric principal subspace regression

In scientific applications, multivariate observations often come in tand...
research
06/27/2021

Interpretable Network Representation Learning with Principal Component Analysis

We consider the problem of interpretable network representation learning...
research
01/01/2017

Outlier Robust Online Learning

We consider the problem of learning from noisy data in practical setting...
