Extreme Classification in Log Memory

10/09/2018
by   Qixuan Huang, et al.
We present Merged-Averaged Classifiers via Hashing (MACH) for K-classification with ultra-large values of K. Compared to traditional one-vs-all classifiers, which require O(Kd) memory and inference cost, MACH needs only O(d log K) memory (d is the dimensionality) and O(K log K + d log K) operations for inference. MACH is a generic K-classification algorithm with provable theoretical guarantees; its memory requirement grows as O(log K) in the number of classes, without any assumption on the relationship between classes. MACH uses universal hashing to reduce classification with a large number of classes to a few independent classification tasks, each with a small (constant) number of classes. We provide a theoretical quantification of the discriminability-memory tradeoff. With MACH we can train on the ODP dataset, with 100,000 classes and 400,000 features, on a single Titan X GPU, reaching a classification accuracy of 19.28%, which is the best-reported accuracy on this dataset. Before this work, the best-performing baseline was a one-vs-all classifier that requires 40 billion parameters (160 GB model size) and achieves 9% accuracy. In contrast, MACH can achieve 9% accuracy with a 480x reduction in model size (a mere 0.3 GB). With MACH, we also demonstrate complete training on the fine-grained ImageNet dataset (compressed size 104 GB), with 21,000 classes, on a single GPU. To the best of our knowledge, this is the first work to demonstrate complete training of these extreme-class datasets on a single Titan X.
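The reduction described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the particular hash family, the values of K, B and R, and the plain averaging rule are all assumptions chosen for illustration. Each of R hash functions maps the K classes into B buckets, one small B-class classifier would be trained per hash function (training is omitted here and replaced by idealized "oracle" outputs), and a class is scored by averaging the probabilities of the buckets it falls into.

```python
import numpy as np

P = 2_147_483_647  # prime 2^31 - 1, assumed larger than every class id


def make_hashes(K, B, R, seed=0):
    """Return an (R, K) array: bucket of class k under hash r.

    Uses the family h(k) = ((a*k + b) mod P) mod B, one common
    universal hash construction (an assumption of this sketch).
    """
    rng = np.random.default_rng(seed)
    a = rng.integers(1, P, size=(R, 1), dtype=np.int64)
    b = rng.integers(0, P, size=(R, 1), dtype=np.int64)
    k = np.arange(K, dtype=np.int64)
    return ((a * k + b) % P) % B


def mach_scores(bucket_probs, hashes):
    """Average, for each class, the probability of its bucket in each
    of the R small classifiers. bucket_probs has shape (R, B)."""
    rows = np.arange(hashes.shape[0])[:, None]
    return bucket_probs[rows, hashes].mean(axis=0)  # shape (K,)


def ova_params(K, d):
    """Parameter count of a linear one-vs-all classifier."""
    return K * d


def mach_params(B, R, d):
    """Parameter count of R linear classifiers with B outputs each."""
    return R * B * d


# Hypothetical sizes for the demo.
K, B, R = 1000, 32, 8
hashes = make_hashes(K, B, R)

# Oracle stand-in for trained models: each small classifier puts all
# of its probability on the bucket that contains the true class.
true_class = 123
bucket_probs = np.zeros((R, B))
bucket_probs[np.arange(R), hashes[:, true_class]] = 1.0

scores = mach_scores(bucket_probs, hashes)
predicted = int(np.argmax(scores))
```

With the oracle outputs, the true class receives the maximum possible score of 1.0; another class could tie only by colliding with it under all R hash functions. The parameter helpers show where the saving comes from: ova_params(100_000, 400_000) gives the 40 billion parameters quoted above, while mach_params depends on B and R rather than on K directly.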

research
10/28/2019

Extreme Classification in Log Memory using Count-Min Sketch: A Case Study of Amazon Search with 50M Products

In the last decade, it has been shown that many hard AI tasks, especiall...
research
10/23/2021

Federated Multiple Label Hashing (FedMLH): Communication Efficient Federated Learning on Extreme Classification Tasks

Federated learning enables many local devices to train a deep learning m...
research
11/24/2018

MEMOIR: Multi-class Extreme Classification with Inexact Margin

Multi-class classification with a very large number of classes, or extre...
research
09/16/2018

Maximum-Entropy Fine-Grained Classification

Fine-Grained Visual Classification (FGVC) is an important computer visio...
research
08/29/2018

Extreme Value Theory for Open Set Classification - GPD and GEV Classifiers

Classification tasks usually assume that all possible classes are presen...
research
01/30/2023

Massively Scaling Heteroscedastic Classifiers

Heteroscedastic classifiers, which learn a multivariate Gaussian distrib...
research
07/21/2022

The trade-offs of model size in large recommendation models : A 10000 × compressed criteo-tb DLRM model (100 GB parameters to mere 10MB)

Embedding tables dominate industrial-scale recommendation model sizes, u...
