Submodular Mutual Information for Targeted Data Subset Selection

04/30/2021
by Suraj Kothawade, et al.

With the rapid growth of data, it is becoming increasingly difficult to train or improve deep learning models with the right subset of data. We show that this problem can be effectively solved, at an additional labeling cost, by targeted data subset selection (TSS), where a subset of unlabeled data points similar to an auxiliary set is added to the training data. We do so by using a rich class of Submodular Mutual Information (SMI) functions and demonstrate their effectiveness for image classification on the CIFAR-10 and MNIST datasets. Lastly, we compare the performance of SMI functions for TSS with other state-of-the-art methods for closely related problems such as active learning. Using SMI functions, we observe 20-30 added targeted subset; 12
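To make the selection step concrete, here is a minimal sketch of greedy targeted selection. The abstract does not say which SMI function is used, so this assumes one facility-location-style objective: a coverage term over the auxiliary (target) set plus a relevance term over the selected points, weighted by `eta`. The names `cosine`, `fl_mi`, `greedy_select`, and `eta`, the cosine similarity, and the toy feature vectors are all hypothetical choices for illustration, not the authors' implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def fl_mi(selected, pool, targets, eta=1.0):
    """Assumed facility-location-style score of `selected` against `targets`.

    `selected` holds indices into `pool`. The first term rewards covering
    every target point; the second rewards each selected point for being
    close to some target point.
    """
    if not selected:
        return 0.0
    cover = sum(max(cosine(q, pool[i]) for i in selected) for q in targets)
    relevance = sum(max(cosine(pool[i], q) for q in targets) for i in selected)
    return cover + eta * relevance


def greedy_select(pool, targets, budget, eta=1.0):
    """Greedily add the pool point with the largest marginal gain."""
    selected = []
    remaining = set(range(len(pool)))
    for _ in range(min(budget, len(pool))):
        base = fl_mi(selected, pool, targets, eta)
        # sorted() makes tie-breaking deterministic (lowest index wins).
        best = max(
            sorted(remaining),
            key=lambda i: fl_mi(selected + [i], pool, targets, eta) - base,
        )
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy unlabeled pool and a one-point auxiliary (target) set.
pool = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.1, 0.9), (0.7, 0.7)]
targets = [(1.0, 0.05)]
picked = greedy_select(pool, targets, budget=2)
print(picked)  # [0, 1]
```

In a TSS-style workflow, the points at the selected indices would then be labeled and added to the training data before re-training. The sketch recomputes the objective from scratch for every candidate, which is fine for a toy pool but would need incremental gain updates for a pool of realistic size.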

research
02/27/2021

PRISM: A Unified Framework of Parameterized Submodular Information Measures for Targeted Data Subset Selection and Summarization

With increasing data, techniques for finding smaller, yet effective subs...
research
10/04/2022

CLINICAL: Targeted Active Learning for Imbalanced Medical Image Classification

Training deep learning models on medical datasets that perform well for ...
research
10/10/2021

Personalizing ASR with limited data using targeted subset selection

We study the task of personalizing ASR models to a target non-native spe...
research
01/30/2022

PLATINUM: Semi-Supervised Model Agnostic Meta-Learning using Submodular Mutual Information

Few-shot classification (FSC) requires training models using a few (typi...
research
04/26/2021

Balancing Constraints and Submodularity in Data Subset Selection

Deep learning has yielded extraordinary results in vision and natural la...
research
11/30/2021

TALISMAN: Targeted Active Learning for Object Detection with Rare Classes and Slices using Submodular Mutual Information

Deep neural networks based object detectors have shown great success in ...
research
06/23/2021

Training Data Subset Selection for Regression with Controlled Generalization Error

Data subset selection from a large number of training instances has been...
