Theory of Machine Learning Debugging via M-estimation

06/16/2020
by Xiaomin Zhang, et al.

We investigate problems in penalized M-estimation, inspired by applications in machine learning debugging. Data are collected from two pools: one containing data with possibly contaminated labels, and the other known to contain only cleanly labeled points. We first formulate a general statistical algorithm for identifying buggy points and provide rigorous theoretical guarantees under the assumption that the data follow a linear model. We then present two case studies to illustrate the results of our general theory and the dependence of our estimator on clean versus buggy points. We further propose an algorithm for tuning parameter selection of our Lasso-based algorithm and provide corresponding theoretical guarantees. Finally, we consider a two-person "game" played between a bug generator and a debugger, where the debugger can augment the contaminated data set with cleanly labeled versions of points in the original data pool. We establish a theoretical result showing a sufficient condition under which the bug generator can always fool the debugger. Nonetheless, we provide empirical results showing that such a situation may not occur in practice, making it possible for natural augmentation strategies combined with our Lasso debugging algorithm to succeed.
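The abstract does not spell out the estimator, but a common way to cast Lasso-based debugging of a linear model is to write each observed label as y_i = x_i^T β* + γ_i* + ε_i, where a nonzero γ_i* flags a point with a contaminated label, and to fit β and γ jointly with an ℓ1 penalty on γ. The sketch below illustrates that formulation with a simple alternating-minimization solver; the function name debug_lasso, the solver, the penalty level, and the toy data are illustrative assumptions for this page, not the paper's actual algorithm or tuning rule.

```python
import numpy as np

def soft_threshold(z, t):
    """Elementwise soft-thresholding operator for the l1 penalty."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def debug_lasso(X, y, clean_mask, lam, n_iter=200):
    """Sketch of a Lasso-style debugging estimator (assumed formulation).

    Model: y_i = x_i^T beta + gamma_i + noise, where gamma_i != 0 marks a
    buggy (mislabeled) point.  Points in the clean pool are trusted, so
    their gamma_i is pinned to 0.  We alternate between
      (1) a least-squares fit of beta on the corrected labels y - gamma, and
      (2) soft-thresholding the residuals to update gamma.
    """
    n, d = X.shape
    gamma = np.zeros(n)
    for _ in range(n_iter):
        beta, *_ = np.linalg.lstsq(X, y - gamma, rcond=None)
        resid = y - X @ beta
        gamma = soft_threshold(resid, lam)
        gamma[clean_mask] = 0.0  # clean pool: no bug indicator allowed
    buggy = np.flatnonzero(np.abs(gamma) > 1e-8)
    return beta, gamma, buggy

# Toy usage: 100 points, 5 features, first 10 labels in the buggy pool corrupted.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
beta_star = rng.normal(size=5)
y = X @ beta_star + 0.1 * rng.normal(size=100)
y[:10] += 5.0                     # inject label bugs
clean_mask = np.zeros(100, dtype=bool)
clean_mask[50:] = True            # second pool known to be clean
beta_hat, gamma_hat, flagged = debug_lasso(X, y, clean_mask, lam=1.0)
print("flagged buggy indices:", flagged)
```

The choice of lam here is arbitrary; the paper's contribution includes a principled tuning parameter selection procedure with guarantees, which this sketch does not attempt to reproduce.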


