Reducing Confusion in Active Learning for Part-Of-Speech Tagging

11/02/2020
by Aditi Chaudhary, et al.

Active learning (AL) uses a data selection algorithm to pick the training samples most worth annotating, thereby minimizing annotation cost. It is now an essential tool for building low-resource syntactic analyzers such as part-of-speech (POS) taggers. Existing AL heuristics are generally designed on the principle of selecting uncertain yet representative training instances, since annotating such instances may correct a large number of errors. However, in an empirical study across six typologically diverse languages (German, Swedish, Galician, North Sami, Persian, and Ukrainian), we found, surprisingly, that even in an oracle scenario where the true uncertainty of predictions is known, these heuristics are far from optimal. Based on this analysis, we pose AL as the problem of selecting instances that maximally reduce the confusion between particular pairs of output tags. Extensive experiments on these six languages show that our proposed AL strategy outperforms other AL strategies by a significant margin. We also present auxiliary results demonstrating the importance of properly calibrated models, which we ensure through cross-view training, and an analysis showing that our proposed strategy selects examples that more closely follow the oracle data distribution.
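The abstract does not spell out the selection rule, so the following is only a minimal sketch of the general idea of targeting a confused pair of output tags. It is not the authors' implementation. It assumes the tagger exposes a per-token marginal distribution over tags, and it uses a simple scoring rule invented for illustration: the probability mass placed on the runner-up tag. The function names `top_two`, `most_confused_pair`, and `select_token_types` are hypothetical.

```python
from collections import defaultdict


def top_two(probs):
    """Return ((best_tag, p_best), (second_tag, p_second)) for one token.

    Assumes `probs` maps at least two tags to probabilities.
    """
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[0], ranked[1]


def most_confused_pair(token_marginals):
    """Find the (predicted_tag, runner_up_tag) pair with the most
    accumulated runner-up probability mass over all tokens.

    `token_marginals` is a non-empty list of (token, {tag: prob}) pairs.
    """
    pair_mass = defaultdict(float)
    for _token, probs in token_marginals:
        (best, _), (second, p_second) = top_two(probs)
        pair_mass[(best, second)] += p_second
    return max(pair_mass, key=pair_mass.get)


def select_token_types(token_marginals, budget):
    """Pick up to `budget` token types that contribute most to the
    single most confused tag pair (illustrative scoring rule only)."""
    pair = most_confused_pair(token_marginals)
    scores = defaultdict(float)
    for token, probs in token_marginals:
        (best, _), (second, p_second) = top_two(probs)
        if (best, second) == pair:
            scores[token] += p_second
    chosen = sorted(scores, key=scores.get, reverse=True)[:budget]
    return pair, chosen


# Toy marginals, made up for illustration (not data from the paper).
demo = [
    ("run", {"NOUN": 0.55, "VERB": 0.40, "ADJ": 0.05}),
    ("walk", {"NOUN": 0.50, "VERB": 0.45, "ADJ": 0.05}),
    ("run", {"NOUN": 0.60, "VERB": 0.35, "ADJ": 0.05}),
    ("fast", {"ADJ": 0.70, "ADV": 0.25, "NOUN": 0.05}),
    ("the", {"DET": 0.98, "PRON": 0.015, "NOUN": 0.005}),
]
pair, chosen = select_token_types(demo, budget=2)
print(pair, chosen)
```

Scoring by runner-up mass only makes sense if the marginals are reasonably well calibrated, which is consistent with the abstract's emphasis on calibration. A plain uncertainty heuristic would instead rank tokens by entropy without asking which pair of tags is being confused.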


