Label-Wise Document Pre-Training for Multi-Label Text Classification

08/15/2020
by Han Liu, et al.

A major challenge of multi-label text classification (MLTC) is to simultaneously exploit possible label differences and label correlations. In this paper, we tackle this challenge by developing a Label-Wise Pre-Training (LW-PT) method to obtain a document representation that carries label-aware information. The basic idea is that a multi-label document can be represented as a combination of multiple label-wise representations, and that correlated labels always co-occur in the same or similar documents. LW-PT implements this idea by constructing label-wise document classification tasks and training label-wise document encoders. Finally, the pre-trained label-wise encoder is fine-tuned on the downstream MLTC task. Extensive experimental results validate that the proposed method has significant advantages over the previous state-of-the-art models and is able to discover reasonable label relationships. The code is released to facilitate other researchers.
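
The abstract does not say how the label-wise classification tasks are built, so the following is only a minimal sketch under an assumed one-vs-rest construction: each label gets its own binary task over the same documents, and label co-occurrence is counted directly from the annotations. The function names `build_label_wise_tasks` and `label_cooccurrence`, and the toy corpus, are hypothetical and are not taken from the released code.

```python
from collections import defaultdict


def build_label_wise_tasks(docs):
    """Split a multi-label corpus into one binary task per label.

    docs: list of (text, set_of_labels) pairs.
    Returns {label: [(text, 0 or 1), ...]}, where 1 means the
    document carries that label (assumed one-vs-rest setup).
    """
    all_labels = sorted({label for _, labels in docs for label in labels})
    tasks = defaultdict(list)
    for text, labels in docs:
        for label in all_labels:
            tasks[label].append((text, int(label in labels)))
    return dict(tasks)


def label_cooccurrence(docs):
    """Count how often each pair of labels appears on the same document."""
    counts = defaultdict(int)
    for _, labels in docs:
        ordered = sorted(labels)
        for i, first in enumerate(ordered):
            for second in ordered[i + 1:]:
                counts[(first, second)] += 1
    return dict(counts)


# Toy corpus, invented purely for illustration.
docs = [
    ("stocks fall as rates rise", {"economy", "finance"}),
    ("central bank holds rates", {"economy", "finance"}),
    ("team wins the cup final", {"sports"}),
]

tasks = build_label_wise_tasks(docs)
cooc = label_cooccurrence(docs)
```

In this sketch, each entry of `tasks` could serve as the training set for one label-wise encoder, while `cooc` gives a crude view of which labels tend to appear together; how LW-PT actually trains and combines its encoders should be checked against the full paper.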

04/14/2022

Label Semantic Aware Pre-training for Few-shot Text Classification

In text classification tasks, useful information is encoded in the label...
05/26/2019

Extreme Multi-Label Legal Text Classification: A case study in EU Legislation

We consider the task of Extreme Multi-Label Text Classification (XMTC) i...
06/06/2021

Enhancing Label Correlation Feedback in Multi-Label Text Classification via Multi-Task Learning

In multi-label text classification (MLTC), each given document is associ...
06/15/2020

Document Classification for COVID-19 Literature

The global pandemic has made it more important than ever to quickly and ...
05/03/2014

Kaggle LSHTC4 Winning Solution

Our winning submission to the 2014 Kaggle competition for Large Scale Hi...
07/12/2022

IDEA: Increasing Text Diversity via Online Multi-Label Recognition for Vision-Language Pre-training

Vision-Language Pre-training (VLP) with large-scale image-text pairs has...
04/01/2020

Deep Learning Based Multi-Label Text Classification of UNGA Resolutions

The main goal of this research is to produce a useful software for Unite...