Linear regression with unmatched data: a deconvolution perspective

07/13/2022
by Mona Azadkia, et al.

Consider the regression problem where the response Y ∈ ℝ and the covariate X ∈ ℝ^d, for d ≥ 1, are unmatched. Under this scenario, we do not have access to pairs of observations from the distribution of (X, Y), but instead, we have separate datasets {Y_i}_{i=1}^n and {X_j}_{j=1}^m, possibly collected from different sources. We study this problem assuming that the regression function is linear and the noise distribution is known or can be estimated. We introduce an estimator of the regression vector based on deconvolution and demonstrate its consistency and asymptotic normality under an identifiability assumption. In the general case, we show that our estimator (DLSE: Deconvolution Least Squared Estimator) is consistent in terms of an extended ℓ_2 norm. Using this observation, we devise a method for semi-supervised learning, i.e., when we have access to a small sample of matched pairs (X_k, Y_k). Several applications with synthetic and real datasets are considered to illustrate the theory.
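To make the deconvolution idea concrete: if the linear model is written as Y = βᵀX + ε with ε independent of X, then the distribution of Y is the convolution of the distribution of βᵀX with the noise distribution. A candidate β can therefore be scored by how closely the noise CDF, averaged over the shifted covariates, reproduces the empirical CDF of the responses, without ever pairing an X_j with a Y_i. The following is a minimal sketch of that idea, not the authors' exact estimator. It assumes d = 1, Gaussian noise with known standard deviation, and a simple grid search. The contrast function, the names `contrast` and `dlse_grid`, and the simulated data are all illustrative assumptions.

```python
import math
import random


def normal_cdf(t, sigma):
    """CDF of a centered Gaussian with standard deviation sigma (assumed noise law)."""
    return 0.5 * (1.0 + math.erf(t / (sigma * math.sqrt(2.0))))


def contrast(beta, xs, ys_sorted, sigma):
    """Mean squared gap between the empirical CDF of Y and the model CDF.

    The model CDF at y is the average over j of F_eps(y - beta * x_j),
    i.e. the CDF of beta * X + eps with X drawn from the covariate sample.
    """
    n = len(ys_sorted)
    m = len(xs)
    total = 0.0
    for i, y in enumerate(ys_sorted):
        ecdf = (i + 1) / n  # empirical CDF of Y evaluated at its i-th order statistic
        model = sum(normal_cdf(y - beta * x, sigma) for x in xs) / m
        total += (ecdf - model) ** 2
    return total / n


def dlse_grid(xs, ys, sigma, grid):
    """Return the grid value of beta that minimizes the contrast (illustrative only)."""
    ys_sorted = sorted(ys)
    return min(grid, key=lambda b: contrast(b, xs, ys_sorted, sigma))


# Simulated unmatched data (hypothetical): the responses are generated from
# covariates that are NOT the ones kept in xs, so no pairing is available.
rng = random.Random(0)
true_beta, sigma = 2.0, 0.5
xs = [rng.expovariate(1.0) for _ in range(200)]
ys = [true_beta * rng.expovariate(1.0) + rng.gauss(0.0, sigma) for _ in range(200)]

grid = [k / 10.0 for k in range(-30, 31)]  # candidate values from -3.0 to 3.0
beta_hat = dlse_grid(xs, ys, sigma, grid)
print("estimated beta:", beta_hat)
```

The covariate here is exponential, hence asymmetric, so β and −β induce different response distributions. With a symmetric covariate the two could not be told apart from unmatched data, which is the kind of situation an identifiability assumption is meant to rule out.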


