Learning Only from Relevant Keywords and Unlabeled Documents

10/10/2019
by   Nontawat Charoenphakdee, et al.
0

We consider a document classification problem where document labels are absent but only relevant keywords of a target class and unlabeled documents are given. Although heuristic methods based on pseudo-labeling have been considered, theoretical understanding of this problem has still been limited. Moreover, previous methods cannot easily incorporate well-developed techniques in supervised text classification. In this paper, we propose a theoretically guaranteed learning framework that is simple to implement and has flexible choices of models, e.g., linear models or neural networks. We demonstrate how to optimize the area under the receiver operating characteristic curve (AUC) effectively and also discuss how to adjust it to optimize other well-known evaluation metrics such as the accuracy and F1-measure. Finally, we show the effectiveness of our framework using benchmark datasets.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
10/12/2022

Lbl2Vec: An Embedding-Based Approach for Unsupervised Document Retrieval on Predefined Topics

In this paper, we consider the task of retrieving documents with predefi...
research
12/29/2018

Weakly-Supervised Hierarchical Text Classification

Hierarchical text classification, which aims to classify text documents ...
research
07/06/2023

S2vNTM: Semi-supervised vMF Neural Topic Modeling

Language model based methods are powerful techniques for text classifica...
research
10/06/2021

Weakly-supervised Text Classification Based on Keyword Graph

Weakly-supervised text classification has received much attention in rec...
research
12/11/2022

FastClass: A Time-Efficient Approach to Weakly-Supervised Text Classification

Weakly-supervised text classification aims to train a classifier using o...
research
01/05/2021

A Symmetric Loss Perspective of Reliable Machine Learning

When minimizing the empirical risk in binary classification, it is a com...
research
07/10/2018

A New Variational Model for Binary Classification in the Supervised Learning Context

We examine the supervised learning problem in its continuous setting and...

Please sign up or login with your details

Forgot password? Click here to reset