tax2vec: Constructing Interpretable Features from Taxonomies for Short Text Classification

02/01/2019
by   Blaž Škrlj, et al.
0

The use of background knowledge remains largely unexploited in many text classification tasks. In this work, we explore word taxonomies as means for constructing new semantic features, which may improve the performance and robustness of the learned classifiers. We propose tax2vec, a parallel algorithm for constructing taxonomy based features, and demonstrate its use on six short-text classification problems, including gender, age and personality type prediction, drug effectiveness and side effect prediction, and news topic prediction. The experimental results indicate that the interpretable features constructed using tax2vec can notably improve the performance of classifiers; the constructed features, in combination with fast, linear classifiers tested against strong baselines, such as hierarchical attention neural networks, achieved comparable or better classification results on short documents. Further, tax2vec can also serve for extraction of corpus-specific keywords. Finally, we investigated the semantic space of potential features where we observe a similarity with the well known Zipf's law.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
12/05/2018

Improving Medical Short Text Classification with Semantic Expansion Using Word-Cluster Embedding

Automatic text classification (TC) research can be used for real-world p...
research
11/30/2022

Transformers are Short Text Classifiers: A Study of Inductive Short Text Classifiers on Benchmarks and Real-world Datasets

Short text classification is a crucial and challenging aspect of Natural...
research
04/12/2022

Easy Adaptation to Mitigate Gender Bias in Multilingual Text Classification

Existing approaches to mitigate demographic biases evaluate on monolingu...
research
09/01/2021

What Have Been Learned What Should Be Learned? An Empirical Study of How to Selectively Augment Text for Classification

Text augmentation techniques are widely used in text classification prob...
research
04/14/2018

ClassiNet -- Predicting Missing Features for Short-Text Classification

The fundamental problem in short-text classification is feature sparsene...
research
02/20/2022

Hierarchical Interpretation of Neural Text Classification

Recent years have witnessed increasing interests in developing interpret...
research
02/22/2020

Incorporating Effective Global Information via Adaptive Gate Attention for Text Classification

The dominant text classification studies focus on training classifiers u...

Please sign up or login with your details

Forgot password? Click here to reset