Weakly-Supervised Scientific Document Classification via Retrieval-Augmented Multi-Stage Training

06/12/2023
by Ran Xu, et al.

Scientific document classification is a critical task for a wide range of applications, but the cost of obtaining massive amounts of human-labeled data can be prohibitive. To address this challenge, we propose a weakly-supervised approach for scientific document classification using label names only. In scientific domains, label names often include domain-specific concepts that may not appear in the document corpus, making it difficult to match labels and documents precisely. To tackle this issue, we propose WANDER, which leverages dense retrieval to perform matching in the embedding space and thereby capture the semantics of label names. We further design a label name expansion module to enrich the label name representations. Lastly, a self-training step is used to refine the predictions. Experiments on three datasets show that WANDER outperforms the best baseline by 11.9%. The code is at https://github.com/ritaranx/wander.
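The matching idea in the abstract can be illustrated with a minimal sketch. It is not the WANDER implementation: the 2-dimensional vectors stand in for dense-retriever embeddings, the function names `pseudo_label` and `expand_label` are hypothetical, and the expansion rule (averaging a label vector with its top-k nearest documents) is an assumption chosen only to show how a label representation might be enriched. The self-training step is omitted.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def pseudo_label(doc_vecs, label_vecs):
    """Assign each document the label whose embedding is most similar to it."""
    return [
        max(label_vecs, key=lambda name: cosine(doc, label_vecs[name]))
        for doc in doc_vecs
    ]

def expand_label(label_vec, doc_vecs, k=2):
    """Assumed expansion rule: average the label vector with its k nearest documents."""
    nearest = sorted(doc_vecs, key=lambda d: cosine(label_vec, d), reverse=True)[:k]
    vecs = [label_vec] + nearest
    return [sum(column) / len(vecs) for column in zip(*vecs)]

# Toy stand-ins for embeddings produced by a dense retriever (assumption).
docs = [[0.9, 0.1], [0.8, 0.3], [0.1, 0.9], [0.2, 0.7]]
labels = {"physics": [1.0, 0.0], "biology": [0.0, 1.0]}

preds = pseudo_label(docs, labels)
expanded = {name: expand_label(vec, docs) for name, vec in labels.items()}
preds_expanded = pseudo_label(docs, expanded)
print(preds)
```

In a real setting the pseudo-labels from the expanded representations would then seed a classifier that is refined by self-training on its own confident predictions.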


