Investigating Multi-source Active Learning for Natural Language Inference

02/14/2023
by Ard Snijders, et al.

In recent years, active learning has been successfully applied to an array of NLP tasks. However, prior work often assumes that training and test data are drawn from the same distribution. This is problematic, as in real-life settings data may stem from several sources of varying relevance and quality. We show that four popular active learning schemes fail to outperform random selection when applied to unlabelled pools composed of multiple data sources on the task of natural language inference. We reveal that uncertainty-based strategies perform poorly due to the acquisition of collective outliers, i.e., hard-to-learn instances that hamper learning and generalization. When outliers are removed, strategies are found to recover and outperform random baselines. In further analysis, we find that collective outliers vary in form between sources, and show that hard-to-learn data is not always categorically harmful. Lastly, we leverage dataset cartography to introduce difficulty-stratified testing and find that different strategies are affected differently by example learnability and difficulty.
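
To make two of the ideas above concrete, the following is a minimal sketch, not the authors' implementation. It assumes max-entropy sampling as one representative uncertainty-based strategy (the abstract does not name the four strategies studied) and uses the standard dataset-cartography statistics, where confidence and variability are the mean and standard deviation of the gold-label probability across training epochs. The function names, the toy probabilities, and the thresholds used to flag hard-to-learn examples are all hypothetical.

```python
import numpy as np


def predictive_entropy(probs):
    """Shannon entropy of each row of class probabilities (shape: n x classes)."""
    probs = np.clip(probs, 1e-12, 1.0)  # avoid log(0)
    return -(probs * np.log(probs)).sum(axis=1)


def max_entropy_select(probs, k):
    """Indices of the k pool examples with the highest predictive entropy."""
    return np.argsort(-predictive_entropy(probs), kind="stable")[:k]


def cartography_stats(gold_probs):
    """Per-example (confidence, variability) from an (epochs x n) array holding
    the probability the model assigned to the gold label at each epoch."""
    return gold_probs.mean(axis=0), gold_probs.std(axis=0)


# Toy unlabelled pool: model probabilities over the three NLI classes
# (entailment, neutral, contradiction). Values are made up for illustration.
pool_probs = np.array([
    [0.98, 0.01, 0.01],
    [0.34, 0.33, 0.33],
    [0.70, 0.20, 0.10],
    [0.50, 0.49, 0.01],
])
selected = max_entropy_select(pool_probs, k=2)

# Toy training dynamics: 3 epochs x 3 labelled examples (made-up values).
gold_probs = np.array([
    [0.90, 0.20, 0.30],
    [0.95, 0.10, 0.70],
    [0.97, 0.15, 0.50],
])
confidence, variability = cartography_stats(gold_probs)

# Hypothetical thresholds: low confidence and low variability mark an example
# as hard-to-learn, the region where collective outliers would be expected.
hard_to_learn = (confidence < 0.25) & (variability < 0.10)
```

In a sketch like this, removing the examples flagged by `hard_to_learn` from the pool before acquisition corresponds to the outlier-removal experiment described above, and binning test examples by `confidence` is one way to approximate difficulty-stratified testing.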


