Addressing Limited Data for Textual Entailment Across Domains

06/08/2016 ∙ by Chaitanya Shivade, et al.

We seek to address the lack of labeled data (and the high cost of annotation) for textual entailment in some domains. To that end, we first create (for experimental purposes) an entailment dataset for the clinical domain, and a highly competitive supervised entailment system, ENT, that is effective (out of the box) on two domains. We then explore self-training and active learning strategies to address the lack of labeled data. With self-training, we successfully exploit unlabeled data to improve over ENT by 15% in the newswire domain and 13% in the clinical domain. Our active learning experiments demonstrate that we can match (and even beat) ENT using only 6.6% of the training data in the newswire domain.
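The abstract does not describe the self-training procedure itself, but the standard recipe it refers to is: train on the labeled set, predict on the unlabeled pool, move the most confident predictions into the labeled set with their predicted labels, and retrain. The following is a minimal illustrative sketch of that loop (not the paper's ENT system), using a toy one-dimensional nearest-centroid classifier where confidence is the margin between distances to the two class centroids; all names and thresholds here are hypothetical.

```python
def train_centroids(labeled):
    """Fit a toy classifier: the mean feature value (centroid) per class."""
    sums, counts = {0: 0.0, 1: 0.0}, {0: 0, 1: 0}
    for x, y in labeled:
        sums[y] += x
        counts[y] += 1
    return {c: sums[c] / counts[c] for c in (0, 1)}

def predict(centroids, x):
    """Return (predicted label, confidence) for a single example.

    Confidence is the margin between the distances to the two centroids:
    a large margin means the point is clearly closer to one class.
    """
    d0 = abs(x - centroids[0])
    d1 = abs(x - centroids[1])
    return (0 if d0 < d1 else 1), abs(d0 - d1)

def self_train(labeled, unlabeled, threshold=1.0, rounds=5):
    """Generic self-training loop (illustrative, not the paper's method).

    Each round: retrain, pseudo-label the pool, and absorb only the
    predictions whose confidence margin clears `threshold`.
    """
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        centroids = train_centroids(labeled)
        confident, rest = [], []
        for x in pool:
            y, conf = predict(centroids, x)
            (confident if conf >= threshold else rest).append((x, y))
        if not confident:          # nothing cleared the bar; stop early
            break
        labeled.extend(confident)  # absorb pseudo-labeled examples
        pool = [x for x, _ in rest]
    return train_centroids(labeled)

# Two labeled seeds plus four unlabeled points: self-training pulls the
# centroids toward the full data distribution.
model = self_train([(0.0, 0), (10.0, 1)], [1.0, 2.0, 8.0, 9.0])
```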





