
Addressing Limited Data for Textual Entailment Across Domains

by Chaitanya Shivade, et al.

We seek to address the lack of labeled data (and the high cost of annotation) for textual entailment in some domains. To that end, we first create (for experimental purposes) an entailment dataset for the clinical domain, and a highly competitive supervised entailment system, ENT, that is effective (out of the box) on two domains. We then explore self-training and active learning strategies to address the lack of labeled data. With self-training, we successfully exploit unlabeled data to improve over ENT by 15% on the newswire domain and 13% on the clinical domain. Our active learning experiments demonstrate that we can match (and even beat) ENT using only 6.6% of the training data in the newswire domain.
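The two strategies named in the abstract follow well-known generic templates. The sketch below is an illustrative assumption, not the paper's ENT system: a toy one-dimensional threshold classifier stands in for the entailment model, and all names (`self_train`, `confidence_cutoff`, `select_most_uncertain`) are hypothetical. It shows the core loops: self-training pseudo-labels confident unlabeled points and retrains; active learning queries the points the current model is least sure about.

```python
# Hedged sketch of self-training and uncertainty-based active learning.
# The "classifier" is a toy 1-D decision threshold; a real system would
# use an entailment model such as the paper's ENT.

def train(labeled):
    """Fit a trivial threshold classifier: predict 1 iff x >= threshold."""
    pos = [x for x, y in labeled if y == 1]
    neg = [x for x, y in labeled if y == 0]
    return (min(pos) + max(neg)) / 2.0

def predict_with_confidence(threshold, x):
    """Label plus a crude confidence: distance from the decision boundary."""
    label = 1 if x >= threshold else 0
    return label, abs(x - threshold)

def self_train(labeled, unlabeled, confidence_cutoff=1.0, rounds=3):
    """Iteratively pseudo-label confident unlabeled points and retrain."""
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(rounds):
        threshold = train(labeled)
        confident, rest = [], []
        for x in pool:
            label, conf = predict_with_confidence(threshold, x)
            (confident if conf >= confidence_cutoff else rest).append((x, label))
        if not confident:          # nothing passed the cutoff; stop early
            break
        labeled.extend(confident)  # accept pseudo-labels as training data
        pool = [x for x, _ in rest]
    return train(labeled)

def select_most_uncertain(threshold, pool, k=1):
    """Active-learning query: pick the k points closest to the boundary."""
    return sorted(pool, key=lambda x: abs(x - threshold))[:k]
```

In this toy setting, `self_train([(0.0, 0), (10.0, 1)], [1.0, 2.0, 8.0, 9.0])` absorbs all four unlabeled points in one round, and `select_most_uncertain` would hand an annotator the examples nearest the decision boundary, which is how active learning reaches full-data accuracy with a small fraction of the labels.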



