Estimation and Inference with Proxy Data and its Genetic Applications

01/11/2022
by Sai Li, et al.

Existing high-dimensional statistical methods are largely designed for individual-level data. In this work, we study estimation and inference for high-dimensional linear models where we observe only "proxy data": marginal statistics and a sample covariance matrix that are computed from different sets of individuals. We develop a rate-optimal method for estimation and inference for the regression coefficient vector and its linear functionals based on the proxy data. Moreover, we show the intrinsic limitations of proxy-data-based inference: the minimax optimal rate for estimation is slower than in the conventional case where individual-level data are observed, and the power for testing and multiple testing does not go to one as the signal strength goes to infinity. These findings are illustrated through simulation studies and an analysis of a dataset concerning the genetic associations of hindlimb muscle weight in a mouse population.
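
To make the proxy-data setting concrete, the following is a minimal sketch, not the authors' estimator (the abstract does not spell it out). It assumes the marginal statistics take the form X'y/n from a study sample and the covariance matrix comes from an independent reference sample, and it fits an L1-penalised quadratic objective by coordinate descent, which is one common way to work with such summary statistics. The names `proxy_lasso`, `simulate_proxy_data`, and `lam`, along with the simulated design, are hypothetical and chosen only for illustration.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator used in L1-penalised coordinate descent."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def proxy_lasso(cov, marg, lam, n_iter=200):
    """Minimise 0.5 * b'Cb - b's + lam * ||b||_1 by coordinate descent.

    cov  : p x p list of lists, sample covariance C from the reference sample
    marg : length-p list, marginal statistics s (assumed to be X'y/n)
    lam  : L1 penalty level (hypothetical tuning parameter)
    Assumes every diagonal entry of cov is strictly positive.
    """
    p = len(marg)
    b = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Marginal statistic minus the contribution of the other coordinates.
            r = marg[j] - sum(cov[j][k] * b[k] for k in range(p) if k != j)
            b[j] = soft_threshold(r, lam) / cov[j][j]
    return b


def simulate_proxy_data(beta, n, n_ref, seed=0):
    """Simulate proxy data: only (marg, cov) is returned, never individual rows.

    The study sample (size n) yields the marginal statistics; a separate,
    independent reference sample (size n_ref) yields the covariance matrix.
    """
    rng = random.Random(seed)
    p = len(beta)
    marg = [0.0] * p
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(p)]
        y = sum(xj * bj for xj, bj in zip(x, beta)) + rng.gauss(0.0, 1.0)
        for j in range(p):
            marg[j] += x[j] * y / n
    cov = [[0.0] * p for _ in range(p)]
    for _ in range(n_ref):
        x = [rng.gauss(0.0, 1.0) for _ in range(p)]
        for j in range(p):
            for k in range(p):
                cov[j][k] += x[j] * x[k] / n_ref
    return marg, cov


if __name__ == "__main__":
    beta = [1.0, 0.0, 0.0, -0.5, 0.0]
    marg, cov = simulate_proxy_data(beta, n=2000, n_ref=1000)
    print(proxy_lasso(cov, marg, lam=0.1))
```

The sketch uses a low-dimensional design so that the reference covariance is well conditioned; in a genuinely high-dimensional setting the sample covariance is singular and additional care is needed, which this example does not attempt to address.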


