Cold-start Active Learning through Self-supervised Language Modeling

10/19/2020
by Michelle Yuan, et al.

Active learning strives to reduce annotation costs by choosing the most critical examples to label. Typically, the active learning strategy is contingent on the classification model. For instance, uncertainty sampling depends on poorly calibrated model confidence scores. In the cold-start setting, active learning is impractical because of model instability and data scarcity. Fortunately, modern NLP provides an additional source of information: pre-trained language models. The pre-training loss can find examples that surprise the model and should be labeled for efficient fine-tuning. Therefore, we treat the language modeling loss as a proxy for classification uncertainty. With BERT, we develop a simple strategy based on the masked language modeling loss that minimizes labeling costs for text classification. Compared to baselines, our approach reaches higher accuracy in fewer sampling iterations and with less computation time.
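
To make the core idea concrete, here is a minimal sketch, assuming the Hugging Face `transformers` library and PyTorch: each unlabeled example is scored by BERT's masked language modeling loss, and the highest-loss ("most surprising") examples are sent to annotators. The greedy top-k selection and all names (`mlm_surprisal`, `unlabeled_pool`) are illustrative simplifications, not the paper's exact ALPS strategy.

```python
import torch
from transformers import BertForMaskedLM, BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def mlm_surprisal(text: str, mask_prob: float = 0.15) -> float:
    """Average MLM loss over randomly masked tokens: higher = more surprising."""
    enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
    input_ids = enc["input_ids"].clone()
    labels = input_ids.clone()
    # Mask ~15% of non-special tokens, mirroring BERT pre-training.
    special = torch.tensor(
        tokenizer.get_special_tokens_mask(
            input_ids[0].tolist(), already_has_special_tokens=True
        ),
        dtype=torch.bool,
    )
    mask = (torch.rand(input_ids.shape[1]) < mask_prob) & ~special
    if not mask.any():
        mask = ~special  # fallback for very short texts: mask every regular token
    input_ids[0, mask] = tokenizer.mask_token_id
    labels[0, ~mask] = -100  # loss is computed only on masked positions
    with torch.no_grad():
        out = model(
            input_ids=input_ids,
            attention_mask=enc["attention_mask"],
            labels=labels,
        )
    return out.loss.item()

# Rank an unlabeled pool by surprisal and send the top-k to annotators.
unlabeled_pool = ["first unlabeled document ...", "second unlabeled document ..."]
ranked = sorted(unlabeled_pool, key=mlm_surprisal, reverse=True)
to_label = ranked[:1]
```

Because the score comes from the pre-trained model alone, it requires no labeled data and no task classifier, which is what makes it usable in the cold-start setting.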

Related research

11/15/2022 · An Efficient Active Learning Pipeline for Legal Text Classification
12/16/2021 · ATM: An Uncertainty-aware Active Self-training Framework for Label-efficient Text Classification
03/14/2022 · Uncertainty Estimation for Language Reward Models
06/27/2023 · Large Language Models as Annotators: Enhancing Generalization of NLP Models at Minimal Cost
04/17/2020 · Active Sentence Learning by Adversarial Uncertainty Sampling in Discrete Space
09/03/2021 · ALLWAS: Active Learning on Language models in WASserstein space
12/20/2022 · Smooth Sailing: Improving Active Learning for Pre-trained Language Models with Representation Smoothness Analysis
