An algorithm-based multiple detection influence measure for high dimensional regression using expectile

05/26/2021
by   Amadou Barry, et al.

The identification of influential observations is an important part of data analysis that can prevent erroneous conclusions drawn from biased estimators. In high-dimensional data, however, this identification is challenging. Classical and recently developed methods often perform poorly when there are multiple influential observations in the same dataset. In particular, current methods can fail under masking, which arises when several influential observations share similar characteristics, or under swamping, which arises when the influential observations are near the boundary of the space spanned by well-behaved observations. We therefore propose an algorithm-based, multi-step, multiple detection procedure for identifying influential observations that addresses these limitations. Our three-step algorithm, designed to identify and capture undesirable variability in the data, is based on two complementary statistics that are inspired by asymmetric correlations and built on expectiles. Simulations demonstrate higher detection power than competing methods. Using the resulting asymptotic distribution, influential observations can be detected without computationally demanding procedures such as the bootstrap. Applying our method to the Autism Brain Imaging Data Exchange neuroimaging dataset resulted in a more balanced and accurate prediction of brain maturity based on cortical thickness. A free R package that implements our algorithm is available on GitHub: github.com/AmBarry/hidetify.
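Since the two statistics are described as built on expectiles, a small illustration of what a sample expectile is may help. The sketch below is not the authors' detection algorithm and does not use the hidetify package. It only computes the standard sample expectile at level tau by asymmetric least squares. The function name `expectile` and its parameters are hypothetical, chosen for illustration.

```python
def expectile(values, tau=0.5, tol=1e-10, max_iter=1000):
    """Sample expectile at level tau (illustrative helper, not from the paper).

    The tau-expectile m minimizes sum_i w_i * (y_i - m)**2, where
    w_i = tau if y_i > m and w_i = 1 - tau otherwise.
    """
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    ys = [float(v) for v in values]
    if not ys:
        raise ValueError("values must not be empty")

    m = sum(ys) / len(ys)  # start from the mean, which is the 0.5-expectile
    for _ in range(max_iter):
        # Asymmetric weights: points above the current estimate get tau,
        # points at or below it get 1 - tau.
        weights = [tau if y > m else 1.0 - tau for y in ys]
        # The minimizer for fixed weights is the weighted mean.
        new_m = sum(w * y for w, y in zip(weights, ys)) / sum(weights)
        if abs(new_m - m) < tol:
            return new_m
        m = new_m
    return m


if __name__ == "__main__":
    data = [1, 2, 3, 4, 100]
    for level in (0.1, 0.5, 0.9):
        print(level, expectile(data, level))
```

At tau = 0.5 the expectile coincides with the sample mean. Moving tau toward 0 or 1 shifts the estimate toward the lower or upper tail, which is what makes expectile-based quantities asymmetric.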

research
06/16/2021

Clustering inference in multiple groups

Inference in clustering is paramount to uncovering inherent group struct...
research
09/09/2019

Outlier Detection in High Dimensional Data

High-dimensional data poses unique challenges in outlier detection proce...
research
05/01/2020

Parallel subgroup analysis of high-dimensional data via M-regression

It becomes an interesting problem to identify subgroup structures in dat...
research
12/17/2018

Likelihood Ratio Test in Multivariate Linear Regression: from Low to High Dimension

Multivariate linear regressions are widely used statistical tools in man...
research
06/18/2012

Sparse Additive Functional and Kernel CCA

Canonical Correlation Analysis (CCA) is a classical tool for finding cor...
research
08/22/2018

Robust Spatial Extent Inference with a Semiparametric Bootstrap Joint Testing Procedure

Spatial extent inference (SEI) is widely used across neuroimaging modali...
