ClassiNet -- Predicting Missing Features for Short-Text Classification

04/14/2018
by   Danushka Bollegala, et al.
0

The fundamental problem in short-text classification is feature sparseness -- the lack of feature overlap between a trained model and a test instance to be classified. We propose ClassiNet -- a network of classifiers trained for predicting missing features in a given instance, to overcome the feature sparseness problem. Using a set of unlabeled training instances, we first learn binary classifiers as feature predictors for predicting whether a particular feature occurs in a given instance. Next, each feature predictor is represented as a vertex v_i in the ClassiNet where a one-to-one correspondence exists between feature predictors and vertices. The weight of the directed edge e_ij connecting a vertex v_i to a vertex v_j represents the conditional probability that given v_i exists in an instance, v_j also exists in the same instance. We show that ClassiNets generalize word co-occurrence graphs by considering implicit co-occurrences between features. We extract numerous features from the trained ClassiNet to overcome feature sparseness. In particular, for a given instance x⃗, we find similar features from ClassiNet that did not appear in x⃗, and append those features in the representation of x⃗. Moreover, we propose a method based on graph propagation to find features that are indirectly related to a given short-text. We evaluate ClassiNets on several benchmark datasets for short-text classification. Our experimental results show that by using ClassiNet, we can statistically significantly improve the accuracy in short-text classification tasks, without having to use any external resources such as thesauri for finding related features.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
11/30/2022

Transformers are Short Text Classifiers: A Study of Inductive Short Text Classifiers on Benchmarks and Real-world Datasets

Short text classification is a crucial and challenging aspect of Natural...
research
09/01/2019

Topics to Avoid: Demoting Latent Confounds in Text Classification

Despite impressive performance on many text classification tasks, deep n...
research
04/16/2021

Variable Instance-Level Explainability for Text Classification

Despite the high accuracy of pretrained transformer networks in text cla...
research
02/01/2019

tax2vec: Constructing Interpretable Features from Taxonomies for Short Text Classification

The use of background knowledge remains largely unexploited in many text...
research
09/29/2020

Natcat: Weakly Supervised Text Classification with Naturally Annotated Datasets

We seek to improve text classification by leveraging naturally annotated...
research
05/06/2021

The Authors Matter: Understanding and Mitigating Implicit Bias in Deep Text Classification

It is evident that deep text classification models trained on human data...
research
06/06/2020

Knowledge-Based Learning through Feature Generation

Machine learning algorithms have difficulties to generalize over a small...

Please sign up or login with your details

Forgot password? Click here to reset