Linear Regression with Sparsely Permuted Data

10/16/2017
by Martin Slawski, et al.

In regression analysis of multivariate data, it is tacitly assumed that the response and predictor variables in each observed response-predictor pair correspond to the same entity or unit. In this paper, we consider the situation of "permuted data", in which this basic correspondence has been lost. Several recent papers have considered this situation without further assumptions on the underlying permutation. In applications, however, the permutation is often known to have additional structure that can be leveraged. Specifically, we consider the common scenario of "sparsely permuted data", in which only a small fraction of the data is affected by a mismatch between response and predictors. Even in this sparse setting, an adverse effect has already been observed: both the least squares estimator and other estimators that do not account for such partial mismatch are inconsistent. One approach studied in detail herein is to treat the permuted data as outliers, which motivates the use of robust regression formulations to estimate the regression parameter. The resulting estimate can subsequently be used to recover the permutation. A notable benefit of the proposed approach is its computational simplicity, which matters given the general lack of procedures for this problem that are both statistically sound and computationally appealing.
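The abstract describes a two-step idea: fit a robust regression that treats mismatched pairs as outliers, then use the fitted coefficients to recover the permutation. The sketch below illustrates that idea under assumptions of its own and is not the authors' code. A standard Huber M-estimator fitted by iteratively reweighted least squares (IRLS) stands in for the robust formulations studied in the paper, and the permutation is recovered by re-matching only the rows whose robust residuals are large. The helper names `huber_irls` and `recover_sparse_permutation`, the Huber constant `delta`, and the flag threshold `c` are hypothetical choices. The simulated data follow a single-response model y_i = x_{perm[i]}^T beta + eps_i in which only `k` of the `n` rows are shuffled.

```python
import numpy as np


def huber_irls(X, y, delta=1.345, n_iter=50, tol=1e-10):
    """Huber M-estimate of beta via IRLS; returns (beta_hat, residual scale).

    Assumption: Huber loss stands in for the paper's robust formulation.
    """
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # start from least squares
    scale = 1.0
    for _ in range(n_iter):
        r = y - X @ beta
        # Robust scale: median absolute deviation, normalized for Gaussian noise.
        scale = max(np.median(np.abs(r - np.median(r))) / 0.6745, 1e-12)
        u = np.abs(r) / scale
        # Huber weights: 1 for small residuals, delta/u for large ones.
        w = np.minimum(1.0, delta / np.maximum(u, 1e-12))
        Xw = X * w[:, None]
        beta_new = np.linalg.solve(X.T @ Xw, Xw.T @ y)  # weighted normal equations
        converged = np.linalg.norm(beta_new - beta) <= tol * (1.0 + np.linalg.norm(beta))
        beta = beta_new
        if converged:
            break
    return beta, scale


def recover_sparse_permutation(X, y, beta_hat, scale, c=3.0):
    """Hypothetical recovery step: re-match only rows flagged as outliers."""
    n = len(y)
    fitted = X @ beta_hat
    pi_hat = np.arange(n)  # start from the identity: most pairs are assumed correct
    flagged = np.flatnonzero(np.abs(y - fitted) > c * scale)
    # In 1-D, the bijection minimizing sum_i (y_i - fitted_{pi(i)})^2 over the
    # flagged set pairs the two sets by rank (rearrangement inequality).
    pi_hat[flagged[np.argsort(y[flagged])]] = flagged[np.argsort(fitted[flagged])]
    return pi_hat


# Simulated sparsely permuted data (all settings are illustrative assumptions).
rng = np.random.default_rng(0)
n, d, k, sigma = 500, 5, 25, 0.01
beta_true = np.array([1.0, -2.0, 0.5, 3.0, -1.0])
X = rng.standard_normal((n, d))
# Sparse permutation: cycle k randomly chosen rows, leave the rest in place.
S = rng.choice(n, size=k, replace=False)
perm = np.arange(n)
perm[S] = np.roll(S, 1)
y = X[perm] @ beta_true + sigma * rng.standard_normal(n)  # y_i pairs with x_{perm[i]}

beta_ls = np.linalg.lstsq(X, y, rcond=None)[0]
beta_hub, s = huber_irls(X, y)
pi_hat = recover_sparse_permutation(X, y, beta_hub, s)

err_ls = np.linalg.norm(beta_ls - beta_true)
err_hub = np.linalg.norm(beta_hub - beta_true)
frac_correct = np.mean(pi_hat == perm)
print(f"LS error {err_ls:.3f}, Huber error {err_hub:.4f}, matched {frac_correct:.1%}")
```

In this simulation the shuffled rows carry responses that are unrelated to their predictors, so least squares is shrunk roughly by the factor (1 - k/n), while the Huber fit downweights those rows. The rank-matching step relies on the one-dimensional fact that the squared-error assignment is solved by sorting; it recovers the permutation reliably only when the noise is small relative to the gaps between fitted values, which is why the sketch uses a small `sigma`.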

research
07/16/2019

A Two-Stage Approach to Multivariate Linear Regression with Sparsely Mismatched Data

A tacit assumption in linear regression is that (response, predictor)-pa...
research
10/12/2018

Spherical Regression under Mismatch Corruption with Application to Automated Knowledge Translation

Motivated by a series of applications in data integration, language tran...
research
04/05/2022

A robust scalar-on-function logistic regression for classification

Scalar-on-function logistic regression, where the response is a binary o...
research
11/13/2017

Optimal estimation in functional linear regression for sparse noise-contaminated data

In this paper, we propose a novel approach to fit a functional linear re...
research
10/03/2019

A Pseudo-Likelihood Approach to Linear Regression with Partially Shuffled Data

Recently, there has been significant interest in linear regression in th...
research
10/01/2020

Estimation in exponential family Regression based on linked data contaminated by mismatch error

Identification of matching records in multiple files can be a challengin...
research
06/01/2020

A Combined Approach To Detect Key Variables In Thick Data Analytics

In machine learning one of the strategic tasks is the selection of only ...