LST: Lexicon-Guided Self-Training for Few-Shot Text Classification

02/05/2022
by   Hazel Kim, et al.

Self-training provides an effective means of using an extremely small amount of labeled data to create pseudo-labels for unlabeled data. Many state-of-the-art self-training approaches hinge on different regularization methods to prevent overfitting and improve generalization. Yet they still rely heavily, as pseudo-labels, on the predictions of a model initially trained with the limited labeled data, and are likely to place overconfident label beliefs on erroneous classes depending on that first prediction. To tackle this issue in text classification, we introduce LST, a simple self-training method that uses a lexicon to guide the pseudo-labeling mechanism in a linguistically-enriched manner. We iteratively refine the lexicon using the model's confidence on unseen data, so that pseudo-labels are assigned more reliably across training iterations. We demonstrate that this simple yet well-crafted lexical knowledge achieves 1.0-2.0% better performance than the current state-of-the-art approaches on the benchmark datasets.
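To make the pseudo-labeling mechanism concrete, the sketch below shows one way a lexicon can gate and refine pseudo-labels inside a standard self-training loop. This is not the authors' implementation: the toy corpus, the seed lexicon, the confidence threshold, and the lexicon-update rule are illustrative assumptions based only on the abstract's description (accept a pseudo-label when model confidence and lexicon evidence agree, then refine the lexicon from confidently labeled texts).

```python
# Hypothetical sketch of lexicon-guided self-training (not the paper's code).
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Tiny toy corpus: a few labeled examples plus an unlabeled pool.
labeled_texts = ["great movie, loved it", "terrible plot, awful acting"]
labels = [1, 0]  # 1 = positive, 0 = negative
unlabeled_pool = [
    "loved the acting and the story",
    "awful pacing, terrible ending",
    "a great, well written story",
]

# Seed lexicon: class -> indicative words (assumed; a real lexicon would be larger).
lexicon = {1: {"great", "loved"}, 0: {"terrible", "awful"}}

def lexicon_scores(text):
    """Count lexicon-word overlap with the text for each class."""
    tokens = set(text.lower().split())
    return {c: len(tokens & words) for c, words in lexicon.items()}

train_texts, train_labels = list(labeled_texts), list(labels)
CONF_THRESHOLD = 0.6  # assumed confidence threshold for accepting pseudo-labels

for _ in range(3):  # a few self-training iterations
    vectorizer = TfidfVectorizer()
    X = vectorizer.fit_transform(train_texts)
    clf = LogisticRegression(C=10.0, max_iter=1000).fit(X, train_labels)

    remaining = []
    for text in unlabeled_pool:
        probs = clf.predict_proba(vectorizer.transform([text]))[0]
        pred = int(clf.classes_[np.argmax(probs)])
        lex = lexicon_scores(text)
        lex_pred = max(lex, key=lex.get)
        # Accept the pseudo-label only if the classifier is confident AND
        # the lexicon evidence points to the same class.
        if probs.max() > CONF_THRESHOLD and lex[pred] > 0 and lex_pred == pred:
            train_texts.append(text)
            train_labels.append(pred)
            # Refine the lexicon with tokens from the confidently labeled text.
            lexicon[pred].update(text.lower().split())
        else:
            remaining.append(text)
    unlabeled_pool = remaining

print("Training set size after self-training:", len(train_texts))
```

The key design choice the sketch tries to illustrate is that the lexicon acts as a second, model-independent source of evidence: a pseudo-label that the classifier is confident about is still rejected if the lexicon disagrees, which is one way to reduce the confirmation bias of self-training on erroneous early predictions.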


