Visual Transformers with Primal Object Queries for Multi-Label Image Classification

12/10/2021
by Vacit Oguz Yazici, et al.

Multi-label image classification is the task of predicting a set of class labels, which can be viewed as orderless sequential data. Transformers process sequential data as a whole; therefore, they are inherently good at set prediction. The first vision-based transformer model, which was proposed for the object detection task, introduced the concept of object queries. Object queries are learnable positional encodings that the attention modules in decoder layers use to decode object classes or bounding boxes from regions of interest in an image. However, inputting the same set of object queries to different decoder layers hinders training: it results in lower performance and delays convergence. In this paper, we propose the usage of primal object queries that are only provided at the start of the transformer decoder stack. In addition, we improve the mixup technique proposed for multi-label classification. The proposed transformer model with primal object queries improves the state-of-the-art class-wise F1 metric by 2.1 and speeds up the convergence by 79.0 datasets respectively.
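
The difference between re-supplying object queries at every decoder layer and supplying them only once can be illustrated with a minimal sketch. This is not the authors' implementation: it is plain Python with a toy single-head cross-attention, and it assumes no learned projections, self-attention, feed-forward blocks, or layer normalisation. The function names (`cross_attend`, `decode_repeated_queries`, `decode_primal_queries`) and the toy inputs are hypothetical and chosen only for illustration.

```python
import math


def cross_attend(queries, memory):
    """Toy single-head dot-product attention of each query over the memory rows.

    Assumption: no learned projections; real decoders use them.
    """
    dim = len(memory[0])
    attended = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, m)) / math.sqrt(dim) for m in memory]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        attended.append(
            [sum(w / total * m[d] for w, m in zip(weights, memory)) for d in range(dim)]
        )
    return attended


def add(a, b):
    """Element-wise sum of two equally shaped lists of vectors."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def decode_repeated_queries(queries, memory, num_layers):
    """The same object queries are added to the input of every decoder layer."""
    hidden = [[0.0] * len(q) for q in queries]  # decoder input starts at zero
    for _ in range(num_layers):
        hidden = add(hidden, cross_attend(add(hidden, queries), memory))
    return hidden


def decode_primal_queries(queries, memory, num_layers):
    """The object queries enter once, as the input to the first decoder layer."""
    hidden = [list(q) for q in queries]
    for _ in range(num_layers):
        hidden = add(hidden, cross_attend(hidden, memory))
    return hidden


# Hypothetical toy data: three image feature vectors and two object queries.
image_features = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
object_queries = [[2.0, 0.0], [0.0, 2.0]]
repeated = decode_repeated_queries(object_queries, image_features, num_layers=2)
primal = decode_primal_queries(object_queries, image_features, num_layers=2)
```

In the primal variant the queries are carried forward only through the residual connection, so later layers refine the decoded state rather than receiving the same encodings again. The sketch shows only this structural difference and says nothing about the performance or convergence effects reported in the abstract.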

research
11/27/2020

General Multi-label Image Classification with Transformers

Multi-label image classification is the task of predicting a set of labe...
research
07/22/2021

Query2Label: A Simple Transformer Way to Multi-Label Classification

This paper presents a simple and effective approach to solving the multi...
research
03/01/2023

Label Attention Network for sequential multi-label classification

Multi-label classification is a natural problem statement for sequential...
research
03/24/2023

Category Query Learning for Human-Object Interaction Classification

Unlike most previous HOI methods that focus on learning better human-obj...
research
11/25/2021

ML-Decoder: Scalable and Versatile Classification Head

In this paper, we introduce ML-Decoder, a new attention-based classifica...
research
09/12/2023

Predicting Routine Object Usage for Proactive Robot Assistance

Proactivity in robot assistance refers to the robot's ability to anticip...
research
03/28/2021

TransCenter: Transformers with Dense Queries for Multiple-Object Tracking

Transformer networks have proven extremely powerful for a wide variety o...
