Active Data Discovery: Mining Unknown Data using Submodular Information Measures

06/17/2022
by Suraj Kothawade, et al.

Active learning is a widely used framework for iteratively and adaptively sampling subsets of an unlabeled set, with a human in the loop, to achieve labeling efficiency. Most real-world datasets are imbalanced in their classes or slices, so parts of the dataset are rare. Consequently, much work has gone into designing active learning approaches for mining these rare data instances. Most such approaches assume access to a seed set that contains the rare instances. Under more extreme rareness, however, the rare instances (classes or slices) may not be present in the seed labeled set at all, and the active learning paradigm critically needs a way to discover them efficiently. In this work, we provide an active data discovery framework that mines unknown data slices and classes efficiently using the submodular conditional gain and submodular conditional mutual information functions. The general algorithmic framework applies to several scenarios, including image classification and object detection, and handles both rare classes and rare slices in the unlabeled set. We show significant gains in accuracy and labeling efficiency over existing state-of-the-art active learning approaches for actively discovering these rare classes and slices.
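To make the conditional-gain idea concrete, here is a minimal sketch of greedy selection under a conditional gain f(A | P) = f(A ∪ P) − f(P), where P is the set of already-known (labeled) points. It is not taken from the paper: the facility-location objective, the RBF similarity, the toy points, and the names `rbf_similarity` and `greedy_conditional_gain` are all assumptions chosen for illustration, and the paper's actual instantiation may differ.

```python
import math


def rbf_similarity(points, gamma=1.0):
    """Pairwise RBF similarities (the kernel choice is an illustrative assumption)."""
    return [
        [math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(p, q))) for q in points]
        for p in points
    ]


def greedy_conditional_gain(sim, candidates, conditioning, budget):
    """Greedily pick up to `budget` candidates that maximize f(A | P) = f(A u P) - f(P).

    Assumes facility location, f(A) = sum_i max_{j in A} sim[i][j], with
    non-negative similarities. `conditioning` is the known set P.
    """
    n = len(sim)
    # How well each point is already covered by the conditioning set P.
    covered = [max((sim[i][j] for j in conditioning), default=0.0) for i in range(n)]
    selected = []
    remaining = list(candidates)
    for _ in range(min(budget, len(remaining))):
        # Marginal gain of candidate j given P and everything picked so far.
        def gain(j):
            return sum(max(sim[i][j] - covered[i], 0.0) for i in range(n))

        best = max(remaining, key=gain)
        selected.append(best)
        covered = [max(covered[i], sim[i][best]) for i in range(n)]
        remaining.remove(best)
    return selected


# Toy data (hypothetical): indices 0-3 lie in a known cluster, 4-5 in an unseen one.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0), (5.1, 5.0)]
sim = rbf_similarity(points)
picked = greedy_conditional_gain(sim, candidates=[2, 3, 4, 5], conditioning=[0, 1], budget=1)
print(picked)  # a point from the unseen cluster (index 4 or 5)
```

Because points near the labeled set are already "covered" by P, their marginal gain is close to zero, so the greedy step favors points that look unlike anything seen so far. That is the intuition behind using a conditional gain for discovery.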


