ML-Decoder: Scalable and Versatile Classification Head

11/25/2021
by Tal Ridnik, et al.

In this paper, we introduce ML-Decoder, a new attention-based classification head. ML-Decoder predicts the existence of class labels via queries, and enables better utilization of spatial data compared to global average pooling. By redesigning the decoder architecture and using a novel group-decoding scheme, ML-Decoder is highly efficient and scales well to thousands of classes. Compared to using a larger backbone, ML-Decoder consistently provides a better speed-accuracy trade-off. ML-Decoder is also versatile: it can be used as a drop-in replacement for various classification heads, and it generalizes to unseen classes when operated with word queries. Novel query augmentations further improve its generalization ability. Using ML-Decoder, we achieve state-of-the-art results on several classification tasks: on MS-COCO multi-label, we reach 91.4% mAP, and on ImageNet single-label, we reach a new top score of 80.7% with a vanilla ResNet50 backbone. Code is available at: https://github.com/Alibaba-MIIL/ML_Decoder
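To make the contrast between global average pooling and a query-based head concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a single cross-attention step with no learned projections, layer normalization, or feed-forward layers, and it assumes that group decoding means each of K query tokens is expanded to G = N/K class logits. All names (`gap_head`, `query_head`, `w_group`) and all sizes are hypothetical and chosen only for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def gap_head(features, w_cls):
    """Baseline head: average all spatial tokens, then apply a linear classifier.

    features: (B, HW, D) spatial tokens from the backbone
    w_cls:    (D, N) classifier weights
    returns:  (B, N) logits
    """
    return features.mean(axis=1) @ w_cls


def query_head(features, queries, w_group):
    """Illustrative query-based head with group decoding (assumed form).

    features: (B, HW, D) spatial tokens from the backbone
    queries:  (K, D) one query per group of classes
    w_group:  (K, D, G) per-group projection from a token to G class logits
    returns:  (B, K * G) logits
    """
    d = features.shape[-1]
    # Cross-attention: every query attends over all spatial positions,
    # so spatial information is weighted per query instead of averaged away.
    scores = np.einsum("kd,bsd->bks", queries, features) / np.sqrt(d)
    attn = softmax(scores, axis=-1)                     # (B, K, HW)
    tokens = np.einsum("bks,bsd->bkd", attn, features)  # (B, K, D)
    # Group decoding: each of the K tokens is expanded into G logits,
    # so attention cost depends on K rather than on the class count N.
    logits = np.einsum("bkd,kdg->bkg", tokens, w_group)  # (B, K, G)
    return logits.reshape(features.shape[0], -1)


rng = np.random.default_rng(0)
B, HW, D = 2, 49, 64   # batch size, spatial positions, feature width (arbitrary)
N, K = 1000, 100       # number of classes and number of group queries (arbitrary)
G = N // K             # classes handled by each group query

features = rng.standard_normal((B, HW, D))
logits = query_head(
    features,
    rng.standard_normal((K, D)),
    rng.standard_normal((K, D, G)),
)
print(logits.shape)  # (2, 1000)
```

In this sketch the attention matrix has shape (B, K, HW) rather than (B, N, HW), which is one way to read the abstract's claim that the head scales to thousands of classes. The actual architecture is described in the full paper and the linked repository.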


