Estimation in exponential family regression based on linked data contaminated by mismatch error

10/01/2020
by Zhenbang Wang, et al.

Identifying matching records across multiple files can be a challenging and error-prone task, and linkage errors can considerably affect subsequent statistical analysis based on the resulting linked file. Several recent papers have studied post-linkage linear regression analysis, with the response variable in one file and the covariates in a second file, from the perspective of the "Broken Sample Problem" and "Permuted Data". In this paper, we extend this line of research to exponential family responses under the assumption of a small to moderate number of mismatches. We propose a method based on observation-specific offsets, which account for potential mismatches, and ℓ_1-penalization, and we discuss its statistical properties. We also present sufficient conditions for recovering the correct correspondence between covariates and responses if the regression parameter is known. The proposed approach is compared to established baselines, namely the methods by Lahiri-Larsen and Chambers, both theoretically and empirically, using synthetic and real data. The results indicate that substantial improvements over those methods can be achieved even if only limited information about the linkage process is available.
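The idea of observation-specific offsets with ℓ_1-penalization can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a Poisson response with log link, adds one offset gamma_i per observation to the linear predictor, and penalizes the sum of |gamma_i| with a weight lam. The solver (proximal gradient descent with backtracking), the function names, the toy data and the value of lam are all illustrative assumptions.

```python
import math

def soft_threshold(z, t):
    """Proximal operator of t*|.|: shrink z toward zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def linear_predictor(X, beta, gamma):
    """eta_i = x_i' beta + gamma_i, where gamma_i is the observation-specific offset."""
    return [sum(a * b for a, b in zip(xi, beta)) + gi for xi, gi in zip(X, gamma)]

def poisson_nll(X, y, beta, gamma):
    """Poisson negative log-likelihood (log link), dropping terms free of beta and gamma."""
    try:
        eta = linear_predictor(X, beta, gamma)
        return sum(math.exp(e) - yi * e for e, yi in zip(eta, y))
    except OverflowError:
        return float("inf")  # makes the line search below shrink the step

def objective(X, y, beta, gamma, lam):
    """Penalized objective: negative log-likelihood plus lam * ||gamma||_1."""
    return poisson_nll(X, y, beta, gamma) + lam * sum(abs(g) for g in gamma)

def fit(X, y, lam, n_iter=300):
    """Proximal gradient descent: gradient step on (beta, gamma), soft-threshold gamma."""
    n, p = len(X), len(X[0])
    beta, gamma, step = [0.0] * p, [0.0] * n, 1.0
    for _ in range(n_iter):
        eta = linear_predictor(X, beta, gamma)
        # exp(eta_i) - y_i is the gradient w.r.t. eta_i, hence also w.r.t. gamma_i
        resid = [math.exp(e) - yi for e, yi in zip(eta, y)]
        grad_beta = [sum(r * xi[j] for r, xi in zip(resid, X)) for j in range(p)]
        f_old = poisson_nll(X, y, beta, gamma)
        while True:  # backtracking: halve the step until sufficient decrease holds
            new_beta = [b - step * g for b, g in zip(beta, grad_beta)]
            new_gamma = [soft_threshold(g - step * r, step * lam)
                         for g, r in zip(gamma, resid)]
            d = ([nb - b for nb, b in zip(new_beta, beta)]
                 + [ng - g for ng, g in zip(new_gamma, gamma)])
            bound = (f_old
                     + sum(g * di for g, di in zip(grad_beta + resid, d))
                     + sum(di * di for di in d) / (2 * step))
            if poisson_nll(X, y, new_beta, new_gamma) <= bound + 1e-9:
                break
            step *= 0.5
        beta, gamma = new_beta, new_gamma
    return beta, gamma

# Toy data (hypothetical): intercept plus one covariate, with two responses swapped
# to mimic a mismatch introduced by record linkage.
X = [[1.0, -2.0 + 0.25 * k] for k in range(17)]
y = [float(round(math.exp(1.0 + 0.8 * x[1]))) for x in X]
y[0], y[16] = y[16], y[0]

beta_hat, gamma_hat = fit(X, y, lam=3.0)
suspected = [i for i, g in enumerate(gamma_hat) if g != 0.0]
```

In this sketch, observations whose fitted offset is nonzero are the ones the penalized fit treats as potentially mismatched. A larger lam sets more offsets to exactly zero, and a sufficiently large lam reduces the fit to ordinary Poisson regression without offsets.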


