Bonsai - Diverse and Shallow Trees for Extreme Multi-label Classification

04/17/2019
by   Sujay Khandagale, et al.

Extreme multi-label classification refers to supervised multi-label learning involving hundreds of thousands or even millions of labels. In this paper, we develop a shallow tree-based algorithm, called Bonsai, which promotes diversity of the label space and easily scales to millions of labels. Bonsai relaxes the two main constraints of the recently proposed tree-based algorithm Parabel, which partitions the labels at each tree node into exactly two child nodes and imposes label balance between these nodes. Instead, Bonsai encourages diversity in the partitioning process by (i) allowing a much larger fan-out at each node, and (ii) further maintaining the diversity of the label set by permitting potentially imbalanced partitions. This flexibility gives the best of both worlds: the fast training of tree-based methods, and prediction accuracy that is better than Parabel and on par with one-vs-rest methods. As a result, Bonsai outperforms state-of-the-art one-vs-rest methods such as DiSMEC in terms of prediction accuracy, while being orders of magnitude faster to train. The code for Bonsai is available at https://github.com/xmc-aalto/bonsai.
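
The two relaxations described above (a fan-out larger than two, and no balance constraint between children) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each label is represented by a small dense vector and that labels are grouped with plain k-means, neither of which is stated in the abstract. The names `kmeans_partition`, `build_tree`, `leaves`, and `depth`, and the toy data, are hypothetical.

```python
import random

def kmeans_partition(vectors, k, iters=20, seed=0):
    """Group vector indices into at most k clusters with plain k-means.

    No balance constraint is applied, so cluster sizes may differ freely.
    Requires k <= len(vectors).
    """
    rng = random.Random(seed)
    centers = [list(vectors[i]) for i in rng.sample(range(len(vectors)), k)]
    assign = [0] * len(vectors)
    for _ in range(iters):
        # Assign every vector to its nearest center (squared Euclidean distance).
        for i, v in enumerate(vectors):
            assign[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centers[c])),
            )
        # Move each center to the mean of its members; empty clusters stay put.
        for c in range(k):
            members = [vectors[i] for i in range(len(vectors)) if assign[i] == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    groups = [[i for i in range(len(vectors)) if assign[i] == c] for c in range(k)]
    return [g for g in groups if g]  # drop empty clusters

def build_tree(label_ids, vectors, fanout, max_depth):
    """Recursively partition labels into a shallow tree with up to `fanout` children."""
    if max_depth == 0 or len(label_ids) <= fanout:
        return {"labels": label_ids, "children": []}
    groups = kmeans_partition([vectors[i] for i in label_ids], fanout)
    if len(groups) <= 1:  # no useful split was found, so stop here
        return {"labels": label_ids, "children": []}
    children = [
        build_tree([label_ids[j] for j in g], vectors, fanout, max_depth - 1)
        for g in groups
    ]
    return {"labels": label_ids, "children": children}

def leaves(node):
    """Return the label lists stored at the leaves of the tree."""
    if not node["children"]:
        return [node["labels"]]
    return [leaf for child in node["children"] for leaf in leaves(child)]

def depth(node):
    """Return the number of edges on the longest root-to-leaf path."""
    if not node["children"]:
        return 0
    return 1 + max(depth(child) for child in node["children"])

# Toy label vectors (hypothetical): three groups of deliberately unequal size.
rng = random.Random(1)
blobs = [(20, (0.0, 0.0)), (7, (10.0, 10.0)), (3, (-10.0, 5.0))]
vectors = [
    (cx + rng.uniform(-1, 1), cy + rng.uniform(-1, 1))
    for n, (cx, cy) in blobs
    for _ in range(n)
]

tree = build_tree(list(range(len(vectors))), vectors, fanout=3, max_depth=2)
print("child sizes at the root:", [len(c["labels"]) for c in tree["children"]])
```

With `fanout=2` and an added equal-size constraint on the two children, the same recursion would approximate the binary, balanced partitioning that the abstract attributes to Parabel; dropping that constraint and raising `fanout` is what keeps the tree shallow.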

09/08/2016

DiSMEC - Distributed Sparse Machines for Extreme Multi-label Classification

Extreme multi-label classification refers to supervised multi-label lear...
05/24/2019

LdSM: Logarithm-depth Streaming Multi-label Decision Trees

We consider multi-label classification where the goal is to annotate eac...
11/04/2018

Block-wise Partitioning for Extreme Multi-label Classification

Extreme multi-label classification aims to learn a classifier that annot...
08/01/2021

DECAF: Deep Extreme Classification with Label Features

Extreme multi-label classification (XML) involves tagging a data point w...
03/05/2021

Stratified Sampling for Extreme Multi-Label Data

Extreme multi-label classification (XML) is becoming increasingly releva...
06/04/2021

Accelerating Inference for Sparse Extreme Multi-Label Ranking Trees

Tree-based models underpin many modern semantic search engines and recom...
10/02/2018

Structured Multi-Label Biomedical Text Tagging via Attentive Neural Tree Decoding

We propose a model for tagging unstructured texts with an arbitrary numb...