Linear Regression with Shuffled Labels

05/03/2017
by Abubakar Abid, et al.
Is it possible to perform linear regression on datasets whose labels are shuffled with respect to the inputs? We explore this question by proposing several estimators that recover the weights of a noisy linear model from labels that are shuffled by an unknown permutation. We show that the analog of the classical least-squares estimator produces inconsistent estimates in this setting, and introduce an estimator based on the self-moments of the input features and labels. We study the regimes in which each estimator excels, and generalize the estimators to the setting where partial ordering information is available in the form of experiments replicated independently. The result is a framework that enables robust inference, as we demonstrate by experiments on both synthetic and standard datasets, where we are able to recover approximate weights using only shuffled labels. Our work demonstrates that linear regression in the absence of complete ordering information is possible and can be of practical interest, particularly in experiments that characterize populations of particles, such as flow cytometry.
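To make the idea of a moment-based estimator concrete, here is a minimal one-dimensional sketch. It is not the authors' implementation: it assumes a scalar model y = w*x + noise with zero-mean noise and inputs whose mean is nonzero, so that E[y] = w * E[x]. Sample means do not depend on the order of the labels, so this first-moment match is unaffected by the unknown permutation. The function name `first_moment_estimate` and all simulation parameters are hypothetical, chosen only for illustration.

```python
import random
import statistics


def first_moment_estimate(xs, ys_shuffled):
    """Estimate the scalar weight w in y = w*x + noise by matching first moments.

    Assumes zero-mean noise and a nonzero mean of xs (illustrative assumptions).
    The order of ys_shuffled is irrelevant because only its mean is used.
    """
    return statistics.fmean(ys_shuffled) / statistics.fmean(xs)


# Synthetic data: inputs with nonzero mean, small Gaussian label noise.
rng = random.Random(0)
true_w = 2.0
xs = [rng.uniform(1.0, 3.0) for _ in range(5000)]
ys = [true_w * x + rng.gauss(0.0, 0.1) for x in xs]

# Shuffle the labels so their pairing with the inputs is lost.
ys_shuffled = ys[:]
rng.shuffle(ys_shuffled)

w_hat = first_moment_estimate(xs, ys_shuffled)
print(f"true w = {true_w}, estimate from shuffled labels = {w_hat:.3f}")
```

With more than one feature, a single moment no longer determines the weights, which is presumably why the abstract speaks of self-moments in the plural; this sketch covers only the scalar case.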
