Bonsai - Diverse and Shallow Trees for Extreme Multi-label Classification

by Sujay Khandagale, et al.

Extreme multi-label classification refers to supervised multi-label learning involving hundreds of thousands or even millions of labels. In this paper, we develop a shallow tree-based algorithm, called Bonsai, which promotes diversity of the label space and easily scales to millions of labels. Bonsai relaxes the two main constraints of the recently proposed tree-based algorithm Parabel, which partitions the labels at each tree node into exactly two child nodes and imposes label balance between these nodes. Instead, Bonsai encourages diversity in the partitioning process by (i) allowing a much larger fan-out at each node, and (ii) further maintaining the diversity of the label set by permitting potentially imbalanced partitions. By allowing such flexibility, it achieves the best of both worlds: fast training, as in tree-based methods, and prediction accuracy better than Parabel and on par with one-vs-rest methods. As a result, Bonsai outperforms state-of-the-art one-vs-rest methods such as DiSMEC in terms of prediction accuracy, while being orders of magnitude faster to train. The code for is available at
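The abstract describes the partitioning only at a high level. The following is a minimal sketch, under our own assumptions, of the contrast it draws: each node splits its labels into up to `fanout` children with plain k-means and no size constraint, so partitions may be imbalanced, and recursion stops once a node holds at most `max_leaf` labels. The label vectors and all names here (`build_tree`, `kmeans`, `levels_needed`, `fanout`, `max_leaf`) are hypothetical illustrations, not the authors' released code. A Parabel-style tree would roughly correspond to `fanout=2` plus a balance constraint, which is deliberately left out.

```python
import random
from dataclasses import dataclass, field

@dataclass
class Node:
    labels: list[int]                                   # label ids held at this node
    children: list["Node"] = field(default_factory=list)

def sq_dist(a: list[float], b: list[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points: list[list[float]], k: int, rng: random.Random, iters: int = 10) -> list[int]:
    """Plain Lloyd's k-means with no size constraint, so clusters may be imbalanced."""
    centers = [list(points[i]) for i in rng.sample(range(len(points)), k)]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment: every point joins its nearest center; no cap on cluster size.
        assign = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        # Update: move each non-empty cluster's center to the mean of its members.
        for c in range(k):
            members = [points[i] for i, a in enumerate(assign) if a == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign

def build_tree(label_vecs, labels, fanout, max_leaf, rng):
    """Hypothetical sketch: recursively split `labels` into up to `fanout` children."""
    node = Node(labels)
    if len(labels) <= max_leaf:
        return node                                     # small enough: make a leaf
    k = min(fanout, len(labels))
    assign = kmeans([label_vecs[lab] for lab in labels], k, rng)
    groups = [[lab for lab, a in zip(labels, assign) if a == c] for c in range(k)]
    groups = [g for g in groups if g]                   # drop clusters that ended up empty
    if len(groups) < 2:
        return node                                     # degenerate split: stop here
    node.children = [build_tree(label_vecs, g, fanout, max_leaf, rng) for g in groups]
    return node

def leaves(node):
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]

def depth(node):
    """Number of levels, counting the root as level 1."""
    return 1 + max((depth(c) for c in node.children), default=0)

def levels_needed(num_labels, fanout, max_leaf):
    """Split levels an idealized, perfectly balanced fanout-ary tree would need."""
    levels, capacity = 0, max_leaf
    while capacity < num_labels:
        capacity *= fanout
        levels += 1
    return levels

# Usage with hypothetical 2-D label vectors (stand-ins for real label representations).
rng = random.Random(0)
label_vecs = [[rng.random(), rng.random()] for _ in range(200)]
root = build_tree(label_vecs, list(range(200)), fanout=8, max_leaf=10, rng=rng)
print("levels:", depth(root), "leaves:", len(leaves(root)))
print(levels_needed(10**6, 2, 100), levels_needed(10**6, 100, 100))  # prints: 14 2
```

The `levels_needed` arithmetic gives the intuition for "shallow": with one million labels and leaves of at most 100 labels, an idealized balanced binary tree needs 14 levels of splits, while a 100-way tree needs only 2. Real trees built with unconstrained clustering will be somewhat deeper, since imbalanced clusters take extra splits to reach the leaf size.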







DiSMEC - Distributed Sparse Machines for Extreme Multi-label Classification

Extreme multi-label classification refers to supervised multi-label lear...

LdSM: Logarithm-depth Streaming Multi-label Decision Trees

We consider multi-label classification where the goal is to annotate eac...

Block-wise Partitioning for Extreme Multi-label Classification

Extreme multi-label classification aims to learn a classifier that annot...

DECAF: Deep Extreme Classification with Label Features

Extreme multi-label classification (XML) involves tagging a data point w...

Stratified Sampling for Extreme Multi-Label Data

Extreme multi-label classification (XML) is becoming increasingly releva...

Accelerating Inference for Sparse Extreme Multi-Label Ranking Trees

Tree-based models underpin many modern semantic search engines and recom...

Structured Multi-Label Biomedical Text Tagging via Attentive Neural Tree Decoding

We propose a model for tagging unstructured texts with an arbitrary numb...