Open Vocabulary Extreme Classification Using Generative Models

05/12/2022
by Daniel Simig, et al.

The extreme multi-label classification (XMC) task aims to tag content with a subset of labels from an extremely large label set. The label vocabulary is typically defined in advance by domain experts and assumed to capture all necessary tags. However, in real-world scenarios this label set, although large, is often incomplete, and experts frequently need to refine it. To develop systems that simplify this process, we introduce the task of open vocabulary XMC (OXMC): given a piece of content, predict a set of labels, some of which may be outside the known tag set. Hence, in addition to lacking training data for some labels, as is the case in zero-shot classification, models need to invent some labels on the fly. We propose GROOV, a fine-tuned seq2seq model for OXMC that generates the set of labels as a flat sequence and is trained using a novel loss that is independent of the order of the predicted labels. Experiments on popular XMC datasets show that GROOV predicts meaningful labels outside the given vocabulary while performing on par with state-of-the-art solutions for known labels.
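
To make the task setup concrete, here is a minimal Python sketch of how a flat generated label sequence might be post-processed in an OXMC setting. It is an illustration under stated assumptions, not GROOV's implementation: the `;` separator, the lowercase normalization, and the helper names `parse_labels`, `split_by_vocabulary`, and `set_f1` are all hypothetical, and the abstract does not specify the output format or the evaluation metric.

```python
def parse_labels(generated: str, sep: str = ";") -> list[str]:
    """Split a flat generated string into normalized, deduplicated labels.

    The separator and the normalization are assumptions for illustration.
    """
    seen = set()
    labels = []
    for raw in generated.split(sep):
        label = " ".join(raw.lower().split())  # lowercase, collapse whitespace
        if label and label not in seen:
            seen.add(label)
            labels.append(label)
    return labels


def split_by_vocabulary(labels: list[str], known: set[str]) -> tuple[list[str], list[str]]:
    """Separate labels in the known tag set from newly invented ones."""
    in_vocab = [label for label in labels if label in known]
    novel = [label for label in labels if label not in known]
    return in_vocab, novel


def set_f1(predicted: list[str], gold: list[str]) -> float:
    """F1 between two label sets; the order of either list is ignored."""
    p, g = set(predicted), set(gold)
    true_positives = len(p & g)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(p)
    recall = true_positives / len(g)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    known_tags = {"machine learning", "nlp"}
    output = "Machine Learning; nlp; open vocabulary"
    labels = parse_labels(output)
    print(split_by_vocabulary(labels, known_tags))
```

Comparing predictions and references as sets mirrors the idea that the labels form an unordered set: any permutation of the same labels gets the same score. The order-independent training loss mentioned in the abstract is a separate mechanism that this sketch does not attempt to reproduce.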

research
10/26/2022

OTSeq2Set: An Optimal Transport Enhanced Sequence-to-Set Model for Extreme Multi-label Text Classification

Extreme multi-label text classification (XMTC) is the task of finding th...
research
01/09/2021

LightXML: Transformer with Dynamic Negative Sampling for High-Performance Extreme Multi-label Text Classification

Extreme Multi-label text Classification (XMC) is a task of finding the m...
research
12/03/2020

A Study on the Autoregressive and non-Autoregressive Multi-label Learning

Extreme classification tasks are multi-label tasks with an extremely lar...
research
10/20/2021

Propensity-scored Probabilistic Label Trees

Extreme multi-label classification (XMLC) refers to the task of tagging ...
research
05/12/2021

Semantic Diversity Learning for Zero-Shot Multi-label Classification

Training a neural network model for recognizing multiple labels associat...
research
06/01/2021

Enabling Efficiency-Precision Trade-offs for Label Trees in Extreme Classification

Extreme multi-label classification (XMC) aims to learn a model that can ...
research
09/14/2021

Expert Knowledge-Guided Length-Variant Hierarchical Label Generation for Proposal Classification

To advance the development of science and technology, research proposals...
