Exploiting Dynamic and Fine-grained Semantic Scope for Extreme Multi-label Text Classification

05/24/2022
by Yuan Wang, et al.

Extreme multi-label text classification (XMTC) refers to the problem of tagging a given text with the most relevant subset of labels from a large label set. Because the label space in XMTC is so large, most labels have only a few training instances. To address this data sparsity, most existing XMTC methods rely on fixed label clusters obtained at an early stage to balance performance on tail labels and head labels. However, such label clusters provide a static and coarse-grained semantic scope for every text, which ignores the distinct characteristics of different texts and makes it difficult to model an accurate semantic scope for texts with tail labels. In this paper, we propose TReaderXML, a novel framework for XMTC that adopts a dynamic and fine-grained semantic scope, drawn from teacher knowledge for each individual text, to optimize the text-conditional prior category semantic ranges. TReaderXML dynamically obtains teacher knowledge for each text from similar texts and hierarchical label information in the training set, which yields a distinct, fine-grained, label-oriented semantic scope. TReaderXML then uses a novel dual cooperative network that first learns features of a text and of its corresponding label-oriented semantic scope with a parallel Encoding Module and Reading Module, then combines the two in an Interaction Module to regularize the text's representation with the dynamic and fine-grained label-oriented semantic scope, and finally finds the target labels with a Prediction Module. Experimental results on three XMTC benchmark datasets show that our method achieves new state-of-the-art results and performs especially well on severely imbalanced and sparse datasets.
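
The idea of building a per-text semantic scope from similar training texts can be illustrated with a small sketch. This is not the TReaderXML implementation: the abstract does not specify how similar texts are retrieved or how hierarchical label information is used, so the Jaccard token overlap, the similarity-weighted voting, the function names `jaccard` and `teacher_labels`, the parameters `k` and `top_n`, and the toy data below are all assumptions made for illustration only.

```python
from collections import Counter


def jaccard(a, b):
    """Jaccard similarity between two token sets (assumed similarity measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def teacher_labels(text, train_texts, train_labels, k=2, top_n=3):
    """Collect candidate labels from the k training texts most similar to `text`.

    Hypothetical helper: each neighbour votes for its labels, weighted by
    its similarity to the query text.
    """
    query = set(text.lower().split())
    scored = [
        (jaccard(query, set(t.lower().split())), i)
        for i, t in enumerate(train_texts)
    ]
    # Highest similarity first; ties broken by lower index for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))

    votes = Counter()
    for sim, i in scored[:k]:
        for label in train_labels[i]:
            votes[label] += sim

    # Rank labels by accumulated weight, then alphabetically for stability.
    ranked = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    return [label for label, _ in ranked[:top_n]]


# Toy training set (invented for this example).
train_texts = [
    "neural networks for image classification",
    "graph neural networks for text classification",
    "cooking pasta with tomato sauce",
]
train_labels = [
    ["deep-learning", "vision"],
    ["deep-learning", "nlp"],
    ["food"],
]

scope = teacher_labels("neural text classification", train_texts, train_labels)
print(scope)
```

Unlike a fixed label cluster, the candidate label set here is recomputed for every input text, which is the sense in which the scope is dynamic. In the framework described above, such a scope would be read by a separate module and combined with the text encoding before prediction; the sketch covers only the retrieval step.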


