NatCat: Weakly Supervised Text Classification with Naturally Annotated Datasets

09/29/2020
by   Zewei Chu, et al.

We seek to improve text classification by leveraging naturally annotated data. In particular, we construct a general-purpose text categorization dataset (NatCat) from three online resources: Wikipedia, Reddit, and Stack Exchange. These datasets consist of document-category pairs derived from the manual curation that their communities perform naturally. We build general-purpose text classifiers by training on NatCat and evaluate them on a suite of 11 text classification tasks (CatEval). We benchmark different modeling choices and dataset combinations, and show how each task benefits from different NatCat training resources.
