CheckSel: Efficient and Accurate Data-valuation Through Online Checkpoint Selection

03/14/2022
by Soumi Das, et al.

Data valuation and subset selection have emerged as valuable tools for application-specific selection of important training data. However, the efficiency-accuracy tradeoffs of state-of-the-art methods hinder their widespread application to many AI workflows. In this paper, we propose a novel two-phase solution to this problem. Phase 1 selects representative checkpoints from an SGD-like training algorithm; phase 2 uses these checkpoints to estimate approximate training data values, e.g., the decrease in validation loss due to each training point. A key contribution of this paper is CheckSel, an Orthogonal Matching Pursuit-inspired online sparse approximation algorithm for checkpoint selection in the online setting, where the features are revealed one at a time. Another key contribution is the study of data valuation in the domain adaptation setting, where a data value estimator obtained using checkpoints from the training trajectory on the source-domain training dataset is used for data valuation on a target-domain training dataset. Experimental results on benchmark datasets show the proposed algorithm outperforms recent baseline methods by up to 30% in terms of test accuracy while incurring a similar computational burden, for both standalone and domain adaptation settings.
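
For readers unfamiliar with Orthogonal Matching Pursuit (OMP), the sketch below shows the standard batch version of the greedy selection that CheckSel is described as being inspired by. It is not the paper's online algorithm: here all candidate features are available up front, whereas CheckSel handles features revealed one at a time. The function name `omp_select`, the `features` matrix (one column per candidate checkpoint), and the `target` vector are hypothetical names chosen for illustration.

```python
import numpy as np

def omp_select(features, target, k):
    """Standard batch OMP (illustrative only, not the CheckSel algorithm).

    features: (d, n) array, one column per candidate (e.g. a checkpoint).
    target:   (d,) vector to approximate with at most k columns.
    Returns the selected column indices and their least-squares weights.
    """
    features = np.asarray(features, dtype=float)
    target = np.asarray(target, dtype=float)
    # Column norms, so the correlation score is scale-invariant.
    norms = np.linalg.norm(features, axis=0) + 1e-12
    residual = target.copy()
    selected = []
    weights = np.zeros(0)
    for _ in range(min(k, features.shape[1])):
        # Pick the column most correlated with the current residual.
        scores = np.abs(features.T @ residual) / norms
        scores[np.array(selected, dtype=int)] = -np.inf  # no repeats
        best = int(np.argmax(scores))
        selected.append(best)
        # Re-fit weights on all selected columns (the "orthogonal" step).
        sub = features[:, selected]
        weights, *_ = np.linalg.lstsq(sub, target, rcond=None)
        residual = target - sub @ weights
    return selected, weights
```

Calling `omp_select(features, target, k)` returns at most `k` column indices in the order they were chosen. The least-squares re-fit after each pick is what distinguishes OMP from plain matching pursuit.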

