A Model-free Closeness-of-influence Test for Features in Supervised Learning

06/20/2023
by Mohammad Mehrabi, et al.

Understanding the effect of a feature vector x ∈ ℝ^d on the response value (label) y ∈ ℝ is the cornerstone of many statistical learning problems. Ideally, one would like to understand how a set of collected features combine to influence the response value, but this problem is notoriously difficult because of the high dimensionality of the data and the limited number of labeled data points, among other reasons. In this work, we take a new perspective on this problem and study the question of assessing the difference in influence that two given features have on the response value. We first propose a notion of closeness for the influence of features, and show that our definition recovers the familiar notion of the magnitude of coefficients in the parametric model. We then propose a novel method to test for closeness of influence in general model-free supervised learning problems. Our proposed test can be used with a finite number of samples and controls the type I error rate, no matter the ground-truth conditional law ℒ(Y|X). We analyze the power of our test for two general learning problems, (i) linear regression and (ii) binary classification under a mixture-of-Gaussians model, and show that with a proper choice of score function (an internal component of our test) and a sufficient number of samples, the test achieves full statistical power. We evaluate our findings through extensive numerical simulations; specifically, we adopt the datamodel framework (Ilyas et al., 2022) for the CIFAR-10 dataset to identify pairs of training samples with different influence on the trained model via optional black-box training mechanisms.
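To make the idea of testing two features against each other concrete, here is a minimal Python sketch of a swap-based sign test. It is an illustration under stated assumptions, not the procedure from the paper: it covers only the exact-equality null (the conditional law of Y is unchanged when features i and j are swapped), and it assumes that the joint distribution of the features is itself invariant to swapping coordinates i and j (for example, i.i.d. features). Under those assumptions, for any fixed score function the scores of a sample and of its swapped copy are exchangeable, so the number of "wins" among non-tied pairs follows Binomial(m, 1/2) and an exact binomial test controls the type I error with finitely many samples. All names (`swap_sign_test`, `score`, `W_HAT`, `make_linear_data`) are hypothetical, and `W_HAT` stands in for a model fitted on a separate data split.

```python
import math
import random


def binom_two_sided_pvalue(wins, trials):
    """Exact two-sided p-value for wins ~ Binomial(trials, 1/2)."""
    if trials == 0:
        return 1.0  # every pair tied: no evidence against the null
    pmf = [math.comb(trials, k) / 2 ** trials for k in range(trials + 1)]
    lower = sum(pmf[: wins + 1])  # P(W <= wins)
    upper = sum(pmf[wins:])       # P(W >= wins)
    return min(1.0, 2.0 * min(lower, upper))


def swap(x, i, j):
    """Return a copy of x with coordinates i and j exchanged."""
    x = list(x)
    x[i], x[j] = x[j], x[i]
    return x


def swap_sign_test(X, y, i, j, score):
    """Sign test comparing score(x, y) with score(swap(x), y).

    Assumes (not checked here) that the feature distribution is
    invariant to swapping coordinates i and j, and that `score`
    was chosen without looking at (X, y).
    """
    wins = trials = 0
    for x, label in zip(X, y):
        a, b = score(x, label), score(swap(x, i, j), label)
        if a != b:  # ties carry no sign information and are dropped
            trials += 1
            wins += int(a > b)
    return binom_two_sided_pvalue(wins, trials)


def make_linear_data(n, coefs, seed=0):
    """Synthetic linear regression data with i.i.d. N(0, 1) features."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in coefs] for _ in range(n)]
    y = [sum(c * v for c, v in zip(coefs, x)) + rng.gauss(0.0, 1.0) for x in X]
    return X, y


# Hypothetical weights, standing in for a model fitted on held-out data.
W_HAT = [1.9, 2.1, 0.1]


def score(x, label):
    """Negative squared residual: larger means the sample fits better."""
    pred = sum(w * v for w, v in zip(W_HAT, x))
    return -(label - pred) ** 2


if __name__ == "__main__":
    X, y = make_linear_data(400, [2.0, 2.0, 0.0])
    print("features 0 vs 1 (equal coefficients):", swap_sign_test(X, y, 0, 1, score))
    print("features 0 vs 2 (different coefficients):", swap_sign_test(X, y, 0, 2, score))
```

In this synthetic setup, features 0 and 1 share the same true coefficient, so the null holds and the p-value has no reason to be small, whereas features 0 and 2 have coefficients 2 and 0, so swapping them degrades the fit for most samples and the test should reject. The choice of score function affects only power, not validity, which mirrors the role the abstract assigns to the score function.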


