MlTr: Multi-label Classification with Transformer

06/11/2021
by Xing Cheng, et al.

The task of multi-label image classification is to recognize all the object labels present in an image. Despite years of progress, small objects, similar objects, and objects with high conditional probability remain the main bottlenecks of previous convolutional neural network (CNN) based models, which are limited by the representational capacity of convolutional kernels. Recent vision transformer networks use the self-attention mechanism to extract features at pixel granularity, which expresses richer local semantic information but is insufficient for mining global spatial dependence. In this paper, we point out the three crucial problems that CNN-based methods encounter and explore the possibility of designing specific transformer modules to settle them. We put forward a Multi-label Transformer architecture (MlTr) constructed with window partitioning, in-window pixel attention, and cross-window attention, which particularly improves the performance of multi-label image classification tasks. The proposed MlTr shows state-of-the-art results on various prevalent multi-label datasets such as MS-COCO, Pascal-VOC, and NUS-WIDE with 88.5 The code will be available soon at https://github.com/starmemda/MlTr/
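
The three components named in the abstract (window partitioning, in-window pixel attention, cross-window attention) can be illustrated with a toy NumPy sketch. This is not the authors' implementation: the function names, the identity query/key/value projections, the mean-pooled window summaries used for the cross-window step, and the sigmoid classification head are all assumptions made only to keep the example small and runnable.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def window_partition(feat, win):
    """Split an (H, W, C) feature map into (num_windows, win*win, C) token groups."""
    H, W, C = feat.shape
    assert H % win == 0 and W % win == 0, "H and W must be divisible by win"
    x = feat.reshape(H // win, win, W // win, win, C)
    x = x.transpose(0, 2, 1, 3, 4)  # (H/win, W/win, win, win, C)
    return x.reshape(-1, win * win, C)

def self_attention(tokens):
    """Single-head scaled dot-product self-attention over the second-to-last axis.

    Assumption: identity Q/K/V projections (no learned weights).
    """
    C = tokens.shape[-1]
    scores = tokens @ np.swapaxes(tokens, -1, -2) / np.sqrt(C)
    return softmax(scores, axis=-1) @ tokens

def toy_block(feat, win):
    """Hypothetical block: in-window attention followed by cross-window attention."""
    windows = window_partition(feat, win)   # (nW, win*win, C)
    local = self_attention(windows)         # pixels attend within their own window
    summaries = local.mean(axis=1)          # (nW, C): one summary token per window (assumption)
    mixed = self_attention(summaries)       # windows attend to each other
    return local + mixed[:, None, :]        # broadcast window-level context back to pixels

def multilabel_scores(tokens, w_cls):
    """Hypothetical head: global average pooling, then one independent sigmoid per label."""
    pooled = tokens.mean(axis=(0, 1))       # (C,)
    logits = pooled @ w_cls                 # (num_labels,)
    return 1.0 / (1.0 + np.exp(-logits))

rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 8, 16))      # toy 8x8 feature map with 16 channels
out = toy_block(feat, win=4)
probs = multilabel_scores(out, rng.standard_normal((16, 5)))
print(out.shape, probs.shape)               # (4, 16, 16) (5,)
```

Each label gets its own sigmoid score rather than a shared softmax, since in multi-label classification several labels can be present in the same image at once.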


research
07/22/2021

Query2Label: A Simple Transformer Way to Multi-Label Classification

This paper presents a simple and effective approach to solving the multi...
research
12/26/2020

Coarse to Fine: Multi-label Image Classification with Global/Local Attention

In our daily life, the scenes around us are always with multiple labels ...
research
08/21/2023

LDCSF: Local depth convolution-based Swim framework for classifying multi-label histopathology images

Histopathological images are the gold standard for diagnosing liver canc...
research
09/14/2022

Combining Metric Learning and Attention Heads For Accurate and Efficient Multilabel Image Classification

Multi-label image classification allows predicting a set of labels from ...
research
06/22/2021

Multi-layered Semantic Representation Network for Multi-label Image Classification

Multi-label image classification (MLIC) is a fundamental and practical t...
research
03/01/2023

Label Attention Network for sequential multi-label classification

Multi-label classification is a natural problem statement for sequential...
research
07/18/2023

PatchCT: Aligning Patch Set and Label Set with Conditional Transport for Multi-Label Image Classification

Multi-label image classification is a prediction task that aims to ident...
