Enabling Efficiency-Precision Trade-offs for Label Trees in Extreme Classification

06/01/2021
by Tavor Z. Baharav, et al.

Extreme multi-label classification (XMC) aims to learn a model that can tag data points with a subset of relevant labels from an extremely large label set. Real-world e-commerce applications such as personalized recommendations and product advertising can be formulated as XMC problems, where the objective is to predict, for a user, a small subset of items from a catalog of several million products. For such applications, a common approach is to organize the labels into a tree, enabling training and inference times that are logarithmic in the number of labels. While training a model once a label tree is available is well studied, designing the structure of the tree is a difficult task that is not yet well understood, and it can dramatically impact both model latency and statistical performance. Existing approaches to tree construction sit at one of two extremes, optimizing exclusively either for statistical performance or for latency. We propose an efficient, information-theory-inspired algorithm to construct intermediate operating points that trade off between the benefits of both. Our algorithm enables interpolation between these objectives, which was not previously possible. We corroborate our theoretical analysis with numerical results, showing that on the Wiki-500K benchmark dataset our method can reduce a proxy for expected latency by up to 28% while maintaining the same accuracy as Parabel. On several datasets derived from e-commerce customer logs, our modified label tree is able to improve this expected latency metric by up to 20% while maintaining the same accuracy. Finally, we discuss challenges in realizing these latency improvements in deployed models.
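To make the latency side of this trade-off concrete, here is a minimal Python sketch. It is not the authors' algorithm. It assumes, for illustration only, that the latency proxy is the frequency-weighted expected depth of a label's leaf in a binary tree, and it compares two extremes: a balanced tree built without regard to frequency (depth about log2 of the number of labels, the shape that balanced-clustering methods such as Parabel aim for) and a Huffman tree built purely from label frequencies. The helper names `balanced_depths`, `huffman_depths`, `expected_depth` and `entropy_bits` are hypothetical.

```python
import heapq
import itertools
import math


def balanced_depths(n_labels):
    """Leaf depth of each label in a balanced binary tree made by recursive halving.

    Hypothetical stand-in for a frequency-agnostic, balanced label tree.
    """
    depths = [0] * n_labels

    def split(lo, hi, depth):
        if hi - lo == 1:  # a single label becomes a leaf
            depths[lo] = depth
            return
        mid = (lo + hi) // 2
        split(lo, mid, depth + 1)
        split(mid, hi, depth + 1)

    split(0, n_labels, 0)
    return depths


def huffman_depths(freqs):
    """Leaf depth of each label in a binary Huffman tree over label frequencies."""
    if len(freqs) == 1:
        return [0]
    depths = [0] * len(freqs)
    tie = itertools.count()  # tie-breaker so heapq never compares the label lists
    heap = [(f, next(tie), [i]) for i, f in enumerate(freqs)]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, labels1 = heapq.heappop(heap)
        f2, _, labels2 = heapq.heappop(heap)
        merged = labels1 + labels2
        for i in merged:  # every label under the new internal node sinks one level
            depths[i] += 1
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return depths


def expected_depth(freqs, depths):
    """Assumed latency proxy: frequency-weighted average leaf depth."""
    return sum(f * d for f, d in zip(freqs, depths)) / sum(freqs)


def entropy_bits(freqs):
    """Shannon entropy (bits) of the normalized label-frequency distribution."""
    total = sum(freqs)
    return -sum((f / total) * math.log2(f / total) for f in freqs if f > 0)


if __name__ == "__main__":
    freqs = [8, 4, 2, 1, 1]  # toy, skewed label frequencies
    bal = balanced_depths(len(freqs))
    huf = huffman_depths(freqs)
    print("balanced:", bal, f"{expected_depth(freqs, bal):.3f}", "max", max(bal))
    print("huffman: ", huf, f"{expected_depth(freqs, huf):.3f}", "max", max(huf))
    print("entropy: ", f"{entropy_bits(freqs):.3f}")
```

On these toy frequencies the balanced tree has leaf depths [2, 2, 2, 3, 3] and expected depth 2.125, while the Huffman tree has depths [1, 2, 3, 4, 4] and expected depth 1.875, equal to the entropy because every probability is a power of 1/2. The frequent labels get cheaper, but the rarest labels are pushed deeper (maximum depth 4 versus 3). By Shannon's source-coding theorem no binary prefix tree can have an expected depth below the entropy. This sketch only shows the two endpoints under the assumed proxy; it says nothing about accuracy, and the list concatenation in `huffman_depths` is quadratic in the worst case, so it is meant for illustration rather than million-label catalogs.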


research
10/27/2018

A no-regret generalization of hierarchical softmax to extreme multi-label classification

Extreme multi-label classification (XMLC) is a problem of tagging an ins...
research
09/23/2020

Probabilistic Label Trees for Extreme Multi-label Classification

Extreme multi-label classification (XMLC) is a learning task of tagging ...
research
08/01/2021

DECAF: Deep Extreme Classification with Label Features

Extreme multi-label classification (XML) involves tagging a data point w...
research
06/04/2021

Accelerating Inference for Sparse Extreme Multi-Label Ranking Trees

Tree-based models underpin many modern semantic search engines and recom...
research
05/24/2019

LdSM: Logarithm-depth Streaming Multi-label Decision Trees

We consider multi-label classification where the goal is to annotate eac...
research
06/23/2021

Extreme Multi-label Learning for Semantic Matching in Product Search

We consider the problem of semantic matching in product search: given a ...
research
05/12/2022

Open Vocabulary Extreme Classification Using Generative Models

The extreme multi-label classification (XMC) task aims at tagging conten...
