Turning the information-sharing dial: efficient inference from different data sources

07/18/2022
by Emily C. Hector, et al.

A fundamental aspect of statistics is the integration of data from different sources. Classically, Fisher and others focused on how to integrate homogeneous (or only mildly heterogeneous) sets of data. More recently, as data is becoming more accessible, the question of whether data sets from different sources should be integrated at all is becoming more relevant. The current literature treats this as a question with only two answers: integrate or don't. Here we take a different approach, motivated by information-sharing principles from the shrinkage estimation literature. In particular, we deviate from the do/don't perspective and propose a dial parameter that controls the extent to which two data sources are integrated. How far this dial parameter should be turned is shown to depend, for example, on the informativeness of the different data sources as measured by Fisher information. In the context of generalized linear models, this more nuanced data integration framework leads to relatively simple parameter estimates and valid tests/confidence intervals. Moreover, we demonstrate both theoretically and empirically that setting the dial parameter according to our recommendation leads to more efficient estimation than binary data integration schemes.
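To make the dial idea concrete, here is a minimal sketch of one way such a dial could look. It is not taken from the paper: it assumes a plain linear model and a ridge-type penalty that pulls the first source's least-squares fit toward the second source's estimate. The function name `dial_estimate`, the parameter `lam`, and the simulated data are all hypothetical, chosen only for illustration.

```python
import numpy as np

def dial_estimate(X1, y1, theta2_hat, lam):
    """Illustrative dial (an assumption, not the paper's estimator).

    Minimizes ||y1 - X1 @ theta||^2 + lam * ||theta - theta2_hat||^2.
    lam = 0 returns the source-1 least-squares fit (no sharing);
    as lam grows, the estimate moves toward theta2_hat (heavy sharing).
    """
    p = X1.shape[1]
    gram = X1.T @ X1
    return np.linalg.solve(gram + lam * np.eye(p), X1.T @ y1 + lam * theta2_hat)

# Simulated example: a small source 1 and a larger, slightly different source 2.
rng = np.random.default_rng(0)
theta1 = np.array([1.0, -0.5])
theta2 = theta1 + 0.1  # mild heterogeneity between the two sources
X1 = rng.normal(size=(30, 2))
y1 = X1 @ theta1 + rng.normal(size=30)
X2 = rng.normal(size=(300, 2))
y2 = X2 @ theta2 + rng.normal(size=300)

# Separate least-squares fits for each source.
theta1_hat = np.linalg.lstsq(X1, y1, rcond=None)[0]
theta2_hat = np.linalg.lstsq(X2, y2, rcond=None)[0]

# Turn the dial from "don't integrate" toward "lean fully on source 2".
for lam in (0.0, 5.0, 50.0, 1e6):
    print(lam, dial_estimate(X1, y1, theta2_hat, lam))
```

In this sketch the two binary answers sit at the ends of the dial: `lam = 0` ignores the second source, and a very large `lam` returns essentially its estimate. How far to turn the dial in between is the question the abstract says is answered using, among other things, the Fisher information of each source; the sketch does not attempt that choice.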


