Model-specific Data Subsampling with Influence Functions

10/20/2020
by Anant Raj, et al.

Model selection requires repeatedly evaluating models on a given dataset and measuring their relative performance. In modern machine learning applications, the models under consideration are increasingly expensive to evaluate and the datasets of interest are growing in size. As a result, model selection is time-consuming and computationally inefficient. In this work, we develop a model-specific data subsampling strategy that improves over random sampling whenever training points have varying influence. Specifically, we leverage influence functions to guide our selection strategy, proving theoretically and demonstrating empirically that our approach quickly selects high-quality models.
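To make the idea of influence-guided subsampling concrete, here is a minimal sketch under stated assumptions. It is not the authors' algorithm, which the abstract does not specify. It uses a ridge regression model, the standard first-order influence approximation (the influence of a training point z on validation loss is approximately -grad L_val(theta)^T H^{-1} grad L(z, theta)), and a hypothetical sampling rule that draws points with probability proportional to the magnitude of their influence. The function names `influence_scores` and `influence_subsample` and the parameter `reg` are illustrative only.

```python
import numpy as np

def influence_scores(X, y, X_val, y_val, reg=1e-2):
    """Approximate influence of each training point on the validation loss.

    Assumes a ridge regression model with objective
    (1/n) * sum_i 0.5 * (x_i^T theta - y_i)^2 + 0.5 * reg * ||theta||^2.
    """
    n, d = X.shape
    # Hessian of the regularized training objective (constant for ridge).
    H = X.T @ X / n + reg * np.eye(d)
    # Closed-form minimizer of the objective above.
    theta = np.linalg.solve(H, X.T @ y / n)
    # Per-example gradients of the squared loss at theta, shape (n, d).
    grads = X * (X @ theta - y)[:, None]
    # Gradient of the mean validation loss at theta, shape (d,).
    val_grad = X_val.T @ (X_val @ theta - y_val) / len(y_val)
    # First-order influence: -grad_val^T H^{-1} grad_i for every i.
    return -grads @ np.linalg.solve(H, val_grad)

def influence_subsample(scores, m, rng):
    """Draw m distinct indices with probability proportional to |influence|.

    This sampling rule is an assumption made for illustration.
    """
    probs = np.abs(scores) + 1e-12  # keep every probability strictly positive
    probs /= probs.sum()
    return rng.choice(len(scores), size=m, replace=False, p=probs)
```

When all points have nearly equal influence magnitude, this rule reduces to something close to uniform random sampling, which matches the abstract's remark that the gain over random sampling comes from training points having varying influence.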


