Provably Auditing Ordinary Least Squares in Low Dimensions

05/28/2022
by Ankur Moitra et al.

Measuring the stability of conclusions derived from Ordinary Least Squares linear regression is critically important, but most metrics either only measure local stability (i.e., against infinitesimal changes in the data) or are only interpretable under statistical assumptions. Recent work proposes a simple, global, finite-sample stability metric: the minimum number of samples that need to be removed so that rerunning the analysis overturns the conclusion, specifically meaning that the sign of a particular coefficient of the estimated regressor changes. However, besides the trivial exponential-time algorithm, the only approach for computing this metric is a greedy heuristic that lacks provable guarantees under reasonable, verifiable assumptions; the heuristic yields only a loose upper bound on the stability and cannot certify lower bounds on it. We show that in the low-dimensional regime, where the number of covariates is a constant but the number of samples is large, there are efficient algorithms for provably estimating (a fractional version of) this metric. Applying our algorithms to the Boston Housing dataset, we exhibit regression analyses where we can estimate the stability up to a factor of 3 better than the greedy heuristic, and analyses where we can certify stability to dropping even a majority of the samples.
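
To make the metric concrete, below is a minimal sketch in Python/NumPy of the kind of greedy removal heuristic the abstract alludes to: repeatedly drop the single sample whose removal pushes the targeted coefficient furthest toward a sign change, and report how many drops it takes. Every name and implementation detail here (e.g., greedy_flip_upper_bound, the refit-after-every-candidate strategy) is an illustrative assumption, not the paper's algorithm; in particular, the paper's provably correct estimators for the fractional version of the metric are different and are not reproduced here.

```python
import numpy as np

def ols(X, y):
    """Ordinary least squares fit via the normal equations."""
    return np.linalg.solve(X.T @ X, X.T @ y)

def greedy_flip_upper_bound(X, y, j, budget=None):
    """Greedily drop the sample whose removal pushes coefficient j
    furthest toward the opposite sign, refitting after each drop.

    Returns the number of removals after which the sign of beta_j
    flips (an upper bound on the stability metric), or None if the
    sign survives the whole removal budget.  Cost: O(n) refits per
    removal, so O(n^2) refits overall in the worst case.
    """
    n, d = X.shape
    if budget is None:
        budget = n - d - 1            # crude guard to keep refits well-posed
    keep = np.ones(n, dtype=bool)
    s = np.sign(ols(X, y)[j])         # sign of the original conclusion
    for k in range(1, budget + 1):
        best_i, best_val = None, np.inf
        for i in np.flatnonzero(keep):
            keep[i] = False
            val = s * ols(X[keep], y[keep])[j]   # negative once the sign flips
            keep[i] = True
            if val < best_val:
                best_i, best_val = i, val
        keep[best_i] = False          # commit the most damaging removal
        if best_val <= 0:
            return k
    return None

# Toy usage on synthetic data: the second coefficient is weak by design,
# so the greedy heuristic typically flips it after a handful of removals.
rng = np.random.default_rng(0)
n = 300
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 0.05, -2.0]) + rng.normal(size=n)
print(greedy_flip_upper_bound(X, y, j=1))
```

Note why such a count is only an upper bound: the greedy procedure commits to one removal at a time, so the subset it finds need not be the smallest sign-flipping subset, and the procedure says nothing about subsets it never explores. Certifying a lower bound requires reasoning over all removal sets at once, which is precisely what the paper's algorithms make tractable when the number of covariates is constant.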

Related Research

09/26/2019
The Approximation Ratio of the 2-Opt Heuristic for the Metric Traveling Salesman Problem
The 2-Opt heuristic is one of the simplest algorithms for finding good s...

10/12/2018
Finite sample performance of linear least squares estimation
Linear Least Squares is a very well known technique for parameter estima...

07/30/2023
Towards Practical Robustness Auditing for Linear Regression
We investigate practical algorithms to find or disprove the existence of...

01/30/2023
Bagging Provides Assumption-free Stability
Bagging is an important technique for stabilizing machine learning model...

05/20/2022
Sample Complexity of Learning Heuristic Functions for Greedy-Best-First and A* Search
Greedy best-first search (GBFS) and A* search (A*) are popular algorithm...

11/30/2020
An Automatic Finite-Sample Robustness Metric: Can Dropping a Little Data Change Conclusions?
We propose a method to assess the sensitivity of econometric analyses to...
