NatCat: Weakly Supervised Text Classification with Naturally Annotated Datasets

09/29/2020
by Zewei Chu, et al.

We seek to improve text classification by leveraging naturally annotated data. In particular, we construct a general-purpose text categorization dataset (NatCat) from three online resources: Wikipedia, Reddit, and Stack Exchange. These datasets consist of document-category pairs derived from the manual curation that their communities naturally perform. We build general-purpose text classifiers by training on NatCat and evaluate them on a suite of 11 text classification tasks (CatEval). We benchmark different modeling choices and dataset combinations, and show how each task benefits from different NatCat training resources.

research
11/07/2021

MotifClass: Weakly Supervised Text Classification with Higher-order Metadata Information

We study the problem of weakly supervised text classification, which aim...
research
05/26/2017

A WL-SPPIM Semantic Model for Document Classification

In this paper, we explore SPPIM-based text classification method, and th...
research
12/07/2020

Leveraging Automated Machine Learning for Text Classification: Evaluation of AutoML Tools and Comparison with Human Performance

Recently, Automated Machine Learning (AutoML) has registered increasing ...
research
09/11/2018

Training and Prediction Data Discrepancies: Challenges of Text Classification with Noisy, Historical Data

Industry datasets used for text classification are rarely created for th...
research
12/04/2017

Topics and Label Propagation: Best of Both Worlds for Weakly Supervised Text Classification

We propose a Label Propagation based algorithm for weakly supervised tex...
research
04/14/2018

ClassiNet -- Predicting Missing Features for Short-Text Classification

The fundamental problem in short-text classification is feature sparsene...
research
04/16/2020

Light-Weighted CNN for Text Classification

For management, documents are categorized into a specific category, and ...
