Short-Text Classification Using Unsupervised Keyword Expansion

09/16/2019
by Duncan Cameron-Steinke, et al.

Short-text classification, like much of data science, struggles to achieve high performance when data is limited. One remedy is to expand each short sentence with new, relevant feature words, which artificially enlarges the training set and also adds features to the test data. This paper applies a novel approach to text expansion that generates new words directly for each input sentence, and so requires no additional datasets or prior training. In this unsupervised approach, new keywords are formed within the hidden states of a pre-trained language model and then used to create extended pseudo documents. The word generation process was assessed by examining how well the predicted words matched the topics of the input sentence. The method produced 3-10 relevant new words for each target topic, while generating just 1 word related to each non-target topic. Generated words were then added to short news headlines to create extended pseudo headlines. Experimental results show that models trained on the pseudo headlines can improve classification accuracy when the number of training examples is limited.
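
The pseudo-headline step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the paper forms keywords within the hidden states of a pre-trained language model, whereas this sketch substitutes a small hand-written association table so that it runs on its own. The names `TOY_ASSOCIATIONS`, `toy_keyword_generator`, `expand_headline`, and the `max_new_words` parameter are all hypothetical and chosen for illustration only.

```python
from typing import Callable, List

# ASSUMPTION: a tiny hand-written association table standing in for the
# pre-trained language model. The paper derives keywords from the model's
# hidden states; this table exists only so the sketch is runnable.
TOY_ASSOCIATIONS = {
    "stocks": ["market", "shares", "investors"],
    "election": ["vote", "campaign", "candidate"],
    "match": ["team", "score", "league"],
}


def toy_keyword_generator(sentence: str) -> List[str]:
    """Return candidate keywords for a sentence (illustrative stand-in)."""
    candidates: List[str] = []
    for token in sentence.lower().split():
        candidates.extend(TOY_ASSOCIATIONS.get(token.strip(".,!?"), []))
    return candidates


def expand_headline(
    headline: str,
    generate: Callable[[str], List[str]],
    max_new_words: int = 3,
) -> str:
    """Build a pseudo headline by appending generated words to the original.

    Words already present in the headline, and repeated candidates,
    are skipped. At most max_new_words words are appended.
    """
    seen = {token.strip(".,!?") for token in headline.lower().split()}
    new_words: List[str] = []
    for word in generate(headline):
        if len(new_words) >= max_new_words:
            break
        if word not in seen:
            seen.add(word)
            new_words.append(word)
    return " ".join([headline] + new_words)


if __name__ == "__main__":
    print(expand_headline("Stocks fall after election", toy_keyword_generator, 4))
```

Passing the generator in as a function keeps the expansion step independent of how keywords are produced, so a generator backed by a real language model could be swapped in without changing `expand_headline`.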

research
09/11/2018

Topic Memory Networks for Short Text Classification

Many classification models work poorly on short texts due to data sparsi...
research
01/30/2021

ShufText: A Simple Black Box Approach to Evaluate the Fragility of Text Classification Models

Text classification is the most basic natural language processing task. ...
research
10/30/2018

Word Mover's Embedding: From Word2Vec to Document Embedding

While the celebrated Word2Vec technique yields semantically rich represe...
research
10/05/2020

Acrostic Poem Generation

We propose a new task in the area of computational creativity: acrostic ...
research
02/23/2022

Prompt-Learning for Short Text Classification

In the short text, the extreme short length, feature sparsity and high a...
research
02/14/2012

Multidimensional counting grids: Inferring word order from disordered bags of words

Models of bags of words typically assume topic mixing so that the words ...
research
04/04/2023

MEGClass: Text Classification with Extremely Weak Supervision via Mutually-Enhancing Text Granularities

Text classification typically requires a substantial amount of human-ann...
