A Two-Stage Approach to Multivariate Linear Regression with Sparsely Mismatched Data

07/16/2019
by Martin Slawski, et al.

A tacit assumption in linear regression is that (response, predictor)-pairs correspond to identical observational units. A series of recent works has studied scenarios in which this assumption is violated under terms such as "Unlabeled Sensing" and "Regression with Unknown Permutation". In this paper, we study the setup of multiple response variables and a notion of mismatches that generalizes permutations in order to allow for missing matches as well as for one-to-many matches. A two-stage method is proposed under the assumption that most pairs are correctly matched. In the first stage, the regression parameter is estimated by handling mismatches as contaminations; in the second stage, the generalized permutation is estimated by a basic variant of matching. The approach is both computationally convenient and equipped with favorable statistical guarantees. Specifically, it is shown that the conditions for permutation recovery become considerably less stringent as the number of responses m per observation increases. In particular, for m = Ω(log n), the required signal-to-noise ratio no longer depends on the sample size n. Numerical results on synthetic and real data are presented to support the main findings of our analysis.
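
The two-stage idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the first stage uses iteratively trimmed least squares as a stand-in for "handling mismatches as contaminations", and the second stage uses plain nearest-neighbor matching of each response row to the fitted values. The function names (`fit_trimmed_ls`, `match_rows`, `simulate`), the trimming fraction, and the simulation settings are all assumptions made for illustration.

```python
import numpy as np


def fit_trimmed_ls(X, Y, keep_frac=0.8, n_iter=20):
    """Stage 1 (assumed variant): trimmed least squares.

    Rows with the largest residual norms are treated as contaminated
    (possibly mismatched) and dropped before refitting.
    """
    n = X.shape[0]
    n_keep = int(np.ceil(keep_frac * n))
    keep = np.arange(n)  # start from the full, contaminated sample
    for _ in range(n_iter):
        B_hat, *_ = np.linalg.lstsq(X[keep], Y[keep], rcond=None)
        resid = np.linalg.norm(Y - X @ B_hat, axis=1)
        new_keep = np.sort(np.argsort(resid)[:n_keep])
        if np.array_equal(new_keep, keep):
            break  # retained set is stable
        keep = new_keep
    return B_hat


def match_rows(X, Y, B_hat):
    """Stage 2: match each response row to the nearest fitted row.

    Returns, for each i, the index j minimizing ||Y[i] - X[j] @ B_hat||.
    Several responses may map to the same predictor row (one-to-many).
    """
    fitted = X @ B_hat
    d2 = ((Y[:, None, :] - fitted[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def simulate(n=200, d=3, m=10, frac_mismatch=0.1, sigma=0.01, seed=0):
    """Synthetic data in which a small fraction of rows is shuffled."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))
    B = rng.standard_normal((d, m))
    pi = np.arange(n)  # pi[i] = predictor row that generated response i
    S = rng.choice(n, size=int(frac_mismatch * n), replace=False)
    pi[S] = rng.permutation(S)
    Y = X[pi] @ B + sigma * rng.standard_normal((n, m))
    return X, Y, B, pi


if __name__ == "__main__":
    X, Y, B, pi = simulate()
    B_hat = fit_trimmed_ls(X, Y)
    pi_hat = match_rows(X, Y, B_hat)
    print("max abs error in B:", np.abs(B_hat - B).max())
    print("fraction of rows matched correctly:", (pi_hat == pi).mean())
```

In this sketch, missing matches are not handled; one simple option, again an assumption, would be to leave a response unmatched when its smallest distance exceeds a threshold.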


research
06/23/2022

Regression with Label Permutation in Generalized Linear Model

The assumption that response and predictor belong to the same statistica...
research
10/16/2017

Linear Regression with Sparsely Permuted Data

In regression analysis of multivariate data, it is tacitly assumed that ...
research
10/01/2020

Estimation in exponential family Regression based on linked data contaminated by mismatch error

Identification of matching records in multiple files can be a challengin...
research
05/19/2017

Linear regression without correspondence

This article considers algorithmic and statistical aspects of linear reg...
research
07/13/2022

Linear regression with unmatched data: a deconvolution perspective

Consider the regression problem where the response Y∈ℝ and the covariate...
research
11/02/2021

Regularization for Shuffled Data Problems via Exponential Family Priors on the Permutation Group

In the analysis of data sets consisting of (X, Y)-pairs, a tacit assumpt...
research
10/21/2021

On Optimal Interpolation In Linear Regression

Understanding when and why interpolating methods generalize well has rec...
