Retrieval-augmented Multi-label Text Classification

05/22/2023
by   Ilias Chalkidis, et al.

Multi-label text classification (MLC) is a challenging task in settings with large label sets, where label support follows a Zipfian distribution. In this paper, we address this problem through retrieval augmentation, aiming to improve the sample efficiency of classification models. Our approach closely follows the standard MLC architecture of a Transformer-based encoder paired with a set of classification heads. In our case, however, the input document representation is augmented through cross-attention to similar documents retrieved from the training set and represented in a task-specific manner. We evaluate this approach on four datasets from the legal and biomedical domains, all of which feature highly skewed label distributions. Our experiments show that retrieval augmentation substantially improves model performance on the long tail of infrequent labels, especially in lower-resource training scenarios and on more challenging long-document data.
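
As a rough illustration of the idea in the abstract, the following Python sketch retrieves the nearest training documents for an input, lets the input representation attend over them, and feeds the result to one sigmoid head per label. It is a minimal sketch under stated assumptions, not the authors' implementation: document vectors are taken as given (no Transformer encoder is run), retrieval uses cosine similarity, attention is single-head without learned projections, and the function names (`retrieve`, `cross_attend`, `classify`) and all toy numbers are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    denom = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    return dot(a, b) / denom if denom else 0.0


def retrieve(query, train_vecs, k):
    """Return indices of the k training vectors most similar to the query.

    Assumption: cosine similarity over precomputed document vectors.
    """
    ranked = sorted(
        range(len(train_vecs)),
        key=lambda i: cosine(query, train_vecs[i]),
        reverse=True,
    )
    return ranked[:k]


def softmax(scores):
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(query, neighbors):
    """Scaled dot-product attention from the query to the retrieved neighbors.

    Simplification: one head, no learned query/key/value projections.
    Returns the augmented vector (query + attended context) and the weights.
    """
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, n) / scale for n in neighbors])
    context = [
        sum(w * n[j] for w, n in zip(weights, neighbors))
        for j in range(len(query))
    ]
    augmented = [q + c for q, c in zip(query, context)]  # residual connection
    return augmented, weights


def classify(doc_vec, heads, threshold=0.5):
    """Apply one sigmoid head per label; heads is a list of (weights, bias)."""
    probs = [1.0 / (1.0 + math.exp(-(dot(w, doc_vec) + b))) for w, b in heads]
    return [p >= threshold for p in probs], probs


# Toy data (hypothetical): three "training" vectors and two label heads.
train_vecs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
heads = [([1.0, 0.0], -1.0), ([0.0, 1.0], -1.0)]

query = [1.0, 0.2]
neighbor_ids = retrieve(query, train_vecs, k=2)
augmented, weights = cross_attend(query, [train_vecs[i] for i in neighbor_ids])
labels, probs = classify(augmented, heads)
```

In this toy setup the two retrieved neighbors pull the document vector further along the first dimension, so the first label's head fires while the second does not. In a real system the encoder, the attention projections, and the heads would be trained jointly.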
