A robust fusion-extraction procedure with summary statistics in the presence of biased sources

08/28/2021
by Ruoyu Wang, et al.

Information from multiple data sources is increasingly available. However, some of these sources may yield biased estimates because of biased sampling, population heterogeneity, or model misspecification, which are all commonly encountered. This calls for statistical methods that combine information in the presence of biased sources. This paper proposes a robust data fusion-extraction method. The method produces a consistent estimator of the parameter of interest even if many of the data sources are biased. The proposed estimator is easy to compute and uses only summary statistics, so it can be applied in many fields, e.g. meta-analysis, Mendelian randomization and distributed systems. Moreover, under some mild conditions, the proposed estimator is asymptotically equivalent to the oracle estimator that uses data only from the unbiased sources. Asymptotic normality of the proposed estimator is also established. In contrast to existing meta-analysis methods, the theoretical properties hold even when both the number of data sources and the dimension of the parameter diverge as the sample size increases, so the guarantees cover a wide range of settings. The robustness and oracle property are also evaluated via simulation studies. The proposed method is applied to a meta-analysis data set to evaluate surgical treatment for moderate periodontal disease, and to a Mendelian randomization data set to study risk factors for head and neck cancer.
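The abstract does not spell out the estimator, so the following is only a toy illustration of why combining summary statistics needs robustness, not the authors' procedure. It assumes a scalar parameter, one estimate and its variance per source, and that more than half of the sources are unbiased (the paper's own conditions may differ). Standard fixed-effect inverse-variance weighting (IVW) is pulled toward the biased sources; a hypothetical two-step "fuse then extract" scheme first takes the median as a robust initial value, then pools only the sources within `c` standard errors of it. The function names, the median step, the threshold `c = 3`, and the numbers are all assumptions made for illustration.

```python
import math
from statistics import median


def ivw(estimates, variances):
    """Fixed-effect inverse-variance weighted average of per-source estimates."""
    weights = [1.0 / v for v in variances]
    return sum(w * e for w, e in zip(weights, estimates)) / sum(weights)


def robust_fuse_extract(estimates, variances, c=3.0):
    """Hypothetical two-step sketch (NOT the paper's exact procedure).

    Step 1 (fusion): the median of the source estimates serves as a robust
    initial value; it stays near the truth if most sources are unbiased.
    Step 2 (extraction): keep sources whose estimates lie within c standard
    errors of that median, then pool only those sources by IVW.
    """
    center = median(estimates)
    kept = [
        k for k, (e, v) in enumerate(zip(estimates, variances))
        if abs(e - center) <= c * math.sqrt(v)
    ]
    pooled = ivw([estimates[k] for k in kept], [variances[k] for k in kept])
    return pooled, kept


# Illustrative summary statistics, not from the paper: true value 1.0,
# sources 0-2 unbiased, sources 3-4 biased upward, all with SE 0.1.
ests = [0.98, 1.02, 1.01, 2.0, 2.1]
variances_demo = [0.01] * 5

naive = ivw(ests, variances_demo)
robust, kept = robust_fuse_extract(ests, variances_demo)
print(f"naive IVW: {naive:.3f}")                         # 1.422
print(f"robust: {robust:.3f}, kept sources: {kept}")     # 1.003, [0, 1, 2]
```

With two of five sources biased, the naive pooled estimate lands near 1.42, while the screened estimate uses only the three unbiased sources, mimicking the "oracle" idea of pooling unbiased sources only. Unlike the guarantees stated in the abstract, this sketch breaks down once half or more of the sources are biased.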

research
08/11/2022

Semiparametric adaptive estimation under informative sampling

In survey sampling, survey data do not necessarily represent the target ...
research
12/20/2017

Linking Administrative Data: An Evolutionary Schema

Statistics New Zealand (Stats NZ) has committed unreservedly to an admin...
research
12/26/2019

Communication-Efficient Integrative Regression in High-Dimensions

We consider the task of meta-analysis in high-dimensional settings in wh...
research
10/05/2022

Fused mean structure learning in data integration with dependence

Motivated by image-on-scalar regression with data aggregated across mult...
research
04/03/2022

Probability and Non-Probability Samples: Improving Regression Modeling by Using Data from Different Sources

Non-probability sampling, for example in the form of online panels, has ...
research
01/29/2019

Robust Learning from Untrusted Sources

Modern machine learning methods often require more data for training tha...
