DiSMEC - Distributed Sparse Machines for Extreme Multi-label Classification

09/08/2016
by   Rohit Babbar, et al.
0

Extreme multi-label classification refers to supervised multi-label learning involving hundreds of thousands or even millions of labels. Datasets in extreme classification exhibit fit to power-law distribution, i.e. a large fraction of labels have very few positive instances in the data distribution. Most state-of-the-art approaches for extreme multi-label classification attempt to capture correlation among labels by embedding the label matrix to a low-dimensional linear sub-space. However, in the presence of power-law distributed extremely large and diverse label spaces, structural assumptions such as low rank can be easily violated. In this work, we present DiSMEC, which is a large-scale distributed framework for learning one-versus-rest linear classifiers coupled with explicit capacity control to control model size. Unlike most state-of-the-art methods, DiSMEC does not make any low rank assumptions on the label matrix. Using double layer of parallelization, DiSMEC can learn classifiers for datasets consisting hundreds of thousands labels within few hours. The explicit capacity control mechanism filters out spurious parameters which keep the model compact in size, without losing prediction accuracy. We conduct extensive empirical evaluation on publicly available real-world datasets consisting upto 670,000 labels. We compare DiSMEC with recent state-of-the-art approaches, including - SLEEC which is a leading approach for learning sparse local embeddings, and FastXML which is a tree-based approach optimizing ranking based loss function. On some of the datasets, DiSMEC can significantly boost prediction accuracies - 10 compared to SLECC and 15

READ FULL TEXT
research
03/05/2018

Adversarial Extreme Multi-label Classification

The goal in extreme multi-label classification is to learn a classifier ...
research
04/17/2019

Bonsai - Diverse and Shallow Trees for Extreme Multi-label Classification

Extreme multi-label classification refers to supervised multi-label lear...
research
03/08/2018

Efficient Loss-Based Decoding On Graphs For Extreme Classification

In extreme classification problems, learning algorithms are required to ...
research
09/16/2017

Subset Labeled LDA for Large-Scale Multi-Label Classification

Labeled Latent Dirichlet Allocation (LLDA) is an extension of the standa...
research
06/18/2016

An Efficient Large-scale Semi-supervised Multi-label Classifier Capable of Handling Missing labels

Multi-label classification has received considerable interest in recent ...
research
06/04/2021

Accelerating Inference for Sparse Extreme Multi-Label Ranking Trees

Tree-based models underpin many modern semantic search engines and recom...
research
04/08/2020

Global Expanding, Local Shrinking: Discriminant Multi-label Learning with Missing Labels

In multi-label learning, the issue of missing labels brings a major chal...

Please sign up or login with your details

Forgot password? Click here to reset