Active Discriminative Text Representation Learning

06/14/2016
by   Ye Zhang, et al.
0

We propose a new active learning (AL) method for text classification with convolutional neural networks (CNNs). In AL, one selects the instances to be manually labeled with the aim of maximizing model performance with minimal effort. Neural models capitalize on word embeddings as representations (features), tuning these to the task at hand. We argue that AL strategies for multi-layered neural models should focus on selecting instances that most affect the embedding space (i.e., induce discriminative word representations). This is in contrast to traditional AL approaches (e.g., entropy-based uncertainty sampling), which specify higher level objectives. We propose a simple approach for sentence classification that selects instances containing words whose embeddings are likely to be updated with the greatest magnitude, thereby rapidly learning discriminative, task-specific embeddings. We extend this approach to document classification by jointly considering: (1) the expected changes to the constituent word representations; and (2) the model's current overall uncertainty regarding the instance. The relative emphasis placed on these criteria is governed by a stochastic process that favors selecting instances likely to improve representations at the outset of learning, and then shifts toward general uncertainty sampling as AL progresses. Empirical results show that our method outperforms baseline AL approaches on both sentence and document classification tasks. We also show that, as expected, the method quickly learns discriminative word embeddings. To the best of our knowledge, this is the first work on AL addressing neural models for text classification.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
09/10/2021

Active learning for reducing labeling effort in text classification tasks

Labeling data can be an expensive task as it is usually performed manual...
research
09/06/2018

An Analysis of Hierarchical Text Classification Using Word Embeddings

Efficient distributed numerical word representation models (word embeddi...
research
12/11/2020

TF-CR: Weighting Embeddings for Text Classification

Text classification, as the task consisting in assigning categories to t...
research
08/07/2019

A Simple and Effective Approach for Fine Tuning Pre-trained Word Embeddings for Improved Text Classification

This work presents a new and simple approach for fine-tuning pretrained ...
research
11/18/2019

Improving Document Classification with Multi-Sense Embeddings

Efficient representation of text documents is an important building bloc...
research
10/13/2015

A Sensitivity Analysis of (and Practitioners' Guide to) Convolutional Neural Networks for Sentence Classification

Convolutional Neural Networks (CNNs) have recently achieved remarkably s...
research
10/04/2019

Investigating the Effectiveness of Word-Embedding Based Active Learning for Labelling Text Datasets

Manually labelling large collections of text data is a time-consuming, e...

Please sign up or login with your details

Forgot password? Click here to reset