A no-regret generalization of hierarchical softmax to extreme multi-label classification

10/27/2018
by   Marek Wydmuch, et al.
0

Extreme multi-label classification (XMLC) is a problem of tagging an instance with a small subset of relevant labels chosen from an extremely large pool of possible labels. Large label spaces can be efficiently handled by organizing labels as a tree, like in the hierarchical softmax (HSM) approach commonly used for multi-class problems. In this paper, we investigate probabilistic label trees (PLTs) that have been recently devised for tackling XMLC problems. We show that PLTs are a no-regret multi-label generalization of HSM when precision@k is used as a model evaluation metric. Critically, we prove that pick-one-label heuristic - a reduction technique from multi-label to multi-class that is routinely used along with HSM - is not consistent in general. We also show that our implementation of PLTs, referred to as extremeText (XT), obtains significantly better results than HSM with the pick-one-label heuristic and XML-CNN, a deep network specifically designed for XMLC problems. Moreover, XT is competitive to many state-of-the-art approaches in terms of statistical performance, model size and prediction time which makes it amenable to deploy in an online system.

READ FULL TEXT
research
09/23/2020

Probabilistic Label Trees for Extreme Multi-label Classification

Extreme multi-label classification (XMLC) is a learning task of tagging ...
research
03/05/2021

Stratified Sampling for Extreme Multi-Label Data

Extreme multi-label classification (XML) is becoming increasingly releva...
research
06/01/2021

Enabling Efficiency-Precision Trade-offs for Label Trees in Extreme Classification

Extreme multi-label classification (XMC) aims to learn a model that can ...
research
07/28/2021

XFL: eXtreme Function Labeling

Reverse engineers would benefit from identifiers like function names, bu...
research
07/17/2018

Contextual Memory Trees

We design and study a Contextual Memory Tree (CMT), a learning memory co...
research
07/08/2020

Online probabilistic label trees

We introduce online probabilistic label trees (OPLTs), an algorithm that...
research
11/23/2014

Target Fishing: A Single-Label or Multi-Label Problem?

According to Cobanoglu et al and Murphy, it is now widely acknowledged t...

Please sign up or login with your details

Forgot password? Click here to reset