An Embarrassingly Simple Baseline for eXtreme Multi-label Prediction

12/17/2019
by Yashaswi Verma, et al.

The goal of eXtreme Multi-label Learning (XML) is to design and learn a model that can automatically annotate a given data point with the most relevant subset of labels from an extremely large label set. Recently, many techniques have been proposed for XML that achieve reasonable performance on benchmark datasets. Motivated by the complexity of these methods and their resulting training requirements, in this paper we propose a simple baseline technique for this task. Specifically, we present a global feature embedding technique for XML that can easily scale to very large datasets containing millions of data points in a very high-dimensional feature space, irrespective of the number of samples and labels. Next, we show how an ensemble of such global embeddings can be used to achieve a further boost in prediction accuracy with only a linear increase in training and prediction time. During testing, we assign the labels using a weighted k-nearest neighbour classifier in the embedding space. Experiments reveal that, though conceptually simple, this technique achieves quite competitive results and has a training time of less than one minute using a single CPU core with 15.6 GB RAM, even for large-scale datasets such as Amazon-3M.
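
The prediction step described above (an ensemble of embeddings, each queried with a weighted k-nearest neighbour classifier) can be illustrated with a small sketch. The abstract does not say how the global embedding is built, so the code below substitutes a random Gaussian projection purely as a placeholder; it is not the authors' embedding. The function names, the cosine-similarity weighting, and the summation of scores across ensemble members are likewise assumptions made for illustration.

```python
import math
import random
from collections import defaultdict


def make_projection(in_dim, out_dim, seed):
    """Random Gaussian projection matrix.

    ASSUMPTION: a placeholder for the paper's global embedding,
    whose construction is not given in the abstract.
    """
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(out_dim)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(in_dim)]
            for _ in range(out_dim)]


def embed(proj, x):
    """Project a dense feature vector and L2-normalise the result."""
    z = [sum(w * v for w, v in zip(row, x)) for row in proj]
    norm = math.sqrt(sum(v * v for v in z)) or 1.0
    return [v / norm for v in z]


def knn_label_scores(proj, train_x, train_y, query, k):
    """Score labels by similarity-weighted votes of the k nearest neighbours.

    Training points are re-embedded on every call for brevity; a real
    system would embed them once and cache the result.
    """
    q = embed(proj, query)
    sims = []
    for x, labels in zip(train_x, train_y):
        e = embed(proj, x)
        sims.append((sum(a * b for a, b in zip(q, e)), labels))
    sims.sort(key=lambda t: t[0], reverse=True)

    scores = defaultdict(float)
    for sim, labels in sims[:k]:
        weight = max(sim, 0.0)  # ASSUMPTION: clipped cosine similarity as weight
        for label in labels:
            scores[label] += weight
    return scores


def ensemble_predict(projs, train_x, train_y, query, k=2, top=1):
    """Sum label scores over all ensemble members and return the top labels."""
    total = defaultdict(float)
    for proj in projs:
        scores = knn_label_scores(proj, train_x, train_y, query, k)
        for label, s in scores.items():
            total[label] += s
    # Ties are broken alphabetically so the ranking is deterministic.
    ranked = sorted(total, key=lambda label: (-total[label], label))
    return ranked[:top]


if __name__ == "__main__":
    # Toy data: three points, each with its own label set.
    train_x = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    train_y = [["cat", "pet"], ["dog"], ["bird"]]
    projs = [make_projection(3, 4, seed=s) for s in range(3)]
    print(ensemble_predict(projs, train_x, train_y, [0.9, 0.1, 0.0], k=2, top=2))
```

Each ensemble member is queried independently, so under these assumptions prediction cost grows linearly with the number of members, which matches the linear scaling the abstract mentions.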
