Improved Image Classification with Token Fusion

08/19/2022
by Keong Hun Choi, et al.

In this paper, we propose a method that fuses CNN and transformer structures to improve image classification performance. A CNN extracts information about local areas of an image well, but it is limited in extracting global information. A transformer, by contrast, is relatively good at extracting global information, but it requires a large amount of memory to extract local features. In the proposed method, the image is converted into a feature map by a CNN, and each pixel of the feature map is treated as a token. At the same time, the image is divided into patches, which are treated as tokens as in the transformer approach, and the two sets of tokens are fused. To fuse tokens with these two different characteristics, we propose three methods: (1) late token fusion with a parallel structure, (2) early token fusion, and (3) layer-by-layer token fusion. In experiments on ImageNet 1k, the proposed method shows the best classification performance.
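The two token sources described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the abstract does not specify the CNN, the embedding, or the fusion operator, so everything below is an assumption made for illustration. In particular, `feature_map_tokens` is a hypothetical stand-in for a CNN backbone (it only pools each block into mean, max, and min), `project` is a random linear map standing in for a learned embedding, and `fuse_tokens` assumes that early fusion means concatenating the two token sequences before any transformer layer.

```python
import random


def patch_tokens(image, patch):
    """Split an H x W single-channel image into non-overlapping patches.
    Each flattened patch becomes one token of length patch * patch."""
    h, w = len(image), len(image[0])
    tokens = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            tokens.append([image[top + i][left + j]
                           for i in range(patch) for j in range(patch)])
    return tokens


def feature_map_tokens(image, stride):
    """Hypothetical stand-in for a CNN: each stride x stride block is pooled
    into one feature-map pixel with 3 channels (mean, max, min).
    Each feature-map pixel is then treated as one token."""
    h, w = len(image), len(image[0])
    tokens = []
    for top in range(0, h, stride):
        for left in range(0, w, stride):
            block = [image[top + i][left + j]
                     for i in range(stride) for j in range(stride)]
            tokens.append([sum(block) / len(block), max(block), min(block)])
    return tokens


def project(tokens, dim, rng):
    """Map every token to a common dimension with a random linear layer
    (assumption: a learned embedding would be used in practice)."""
    in_dim = len(tokens[0])
    weights = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(in_dim)]
    return [[sum(tok[k] * weights[k][d] for k in range(in_dim))
             for d in range(dim)] for tok in tokens]


def fuse_tokens(image, patch=4, stride=2, dim=8, seed=0):
    """Assumed early fusion: embed both token sets to the same width and
    concatenate them into a single sequence."""
    rng = random.Random(seed)
    patch_seq = project(patch_tokens(image, patch), dim, rng)
    pixel_seq = project(feature_map_tokens(image, stride), dim, rng)
    return patch_seq + pixel_seq


# Toy 8 x 8 image with values in [0, 1].
image = [[(r * 8 + c) / 63 for c in range(8)] for r in range(8)]
fused = fuse_tokens(image)
print(len(fused), len(fused[0]))  # prints "20 8": 4 patch tokens + 16 pixel tokens
```

Under the same assumptions, a late-fusion variant would pass `patch_seq` and `pixel_seq` through separate branches and combine them only at the end, while a layer-by-layer variant would exchange or merge tokens after every layer.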


research
03/27/2021

CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification

The recently developed vision transformer (ViT) has achieved promising r...
research
05/25/2023

Multi-scale Efficient Graph-Transformer for Whole Slide Image Classification

The multi-scale information among the whole slide images (WSIs) is essen...
research
11/25/2021

Global Interaction Modelling in Vision Transformer via Super Tokens

With the popularity of Transformer architectures in computer vision, the...
research
11/19/2022

TORE: Token Reduction for Efficient Human Mesh Recovery with Transformer

In this paper, we introduce a set of effective TOken REduction (TORE) st...
research
10/04/2022

ImmFusion: Robust mmWave-RGB Fusion for 3D Human Body Reconstruction in All Weather Conditions

3D human reconstruction from RGB images achieves decent results in good ...
research
04/13/2023

TransHP: Image Classification with Hierarchical Prompting

This paper explores a hierarchical prompting mechanism for the hierarchi...
research
03/11/2023

TransMatting: Tri-token Equipped Transformer Model for Image Matting

Image matting aims to predict alpha values of elaborate uncertainty area...
