Dual-stream Network for Visual Recognition

05/31/2021
by   Mingyuan Mao, et al.

Transformers with remarkable global representation capacities achieve competitive results for visual tasks, but fail to consider high-level local pattern information in input images. In this paper, we present a generic Dual-stream Network (DS-Net) to fully explore the representation capacity of local and global pattern features for image classification. Our DS-Net can simultaneously calculate fine-grained and integrated features and efficiently fuse them. Specifically, we propose an Intra-scale Propagation module to process two different resolutions in each block and an Inter-Scale Alignment module to perform information interaction across features at dual scales. We also design a Dual-stream FPN (DS-FPN) to further enhance contextual information for downstream dense predictions. Without bells and whistles, the proposed DS-Net outperforms DeiT-Small by 2.4 in accuracy on ImageNet-1k and achieves state-of-the-art performance over other Vision Transformers and ResNets. For object detection and instance segmentation, DS-Net-Small outperforms ResNet-50 by 6.4 in terms of mAP on MSCOCO 2017 and surpasses the previous state-of-the-art scheme, which demonstrates its potential as a general backbone for vision tasks. The code will be released soon.
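
The dual-stream idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: every function name below is hypothetical, the input is a single-channel grid of floats, the local stream is approximated by a 3x3 mean filter, the global stream by scalar self-attention on a 2x downsampled map, and the fusion step is a plain upsample-and-add standing in for the Inter-Scale Alignment module, whose details the abstract does not give.

```python
import math

def local_stream(x):
    """Full-resolution stream: 3x3 mean filter (stand-in for a convolution)."""
    h, w = len(x), len(x[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            vals = [x[a][b]
                    for a in range(max(0, i - 1), min(h, i + 2))
                    for b in range(max(0, j - 1), min(w, j + 2))]
            out[i][j] = sum(vals) / len(vals)
    return out

def downsample(x, s=2):
    """Average-pool by factor s (assumes height and width are divisible by s)."""
    h, w = len(x), len(x[0])
    return [[sum(x[i + a][j + b] for a in range(s) for b in range(s)) / (s * s)
             for j in range(0, w, s)]
            for i in range(0, h, s)]

def global_stream(x):
    """Low-resolution stream: scalar self-attention over all positions."""
    h, w = len(x), len(x[0])
    tokens = [v for row in x for v in row]
    out = []
    for q in tokens:
        scores = [q * k for k in tokens]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(sc - m) for sc in scores]
        z = sum(weights)
        out.append(sum(wt * v for wt, v in zip(weights, tokens)) / z)
    return [out[r * w:(r + 1) * w] for r in range(h)]

def fuse(local, glob, s=2):
    """Assumed fusion: nearest-neighbour upsample the global map, then add."""
    return [[local[i][j] + glob[i // s][j // s] for j in range(len(local[0]))]
            for i in range(len(local))]

def dual_stream_block(x, s=2):
    """One toy block: run both streams at their own resolution, then fuse."""
    return fuse(local_stream(x), global_stream(downsample(x, s)), s)
```

Calling `dual_stream_block` on a 4x4 grid returns a 4x4 grid in which each position combines a neighbourhood average with a value attended over the whole downsampled image. A real backbone would use learned, multi-channel projections in both streams.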

research
08/31/2022

MAFormer: A Transformer Network with Multi-scale Attention Fusion for Visual Recognition

Vision Transformer and its variants have demonstrated great potential in...
research
04/13/2021

Co-Scale Conv-Attentional Image Transformers

In this paper, we present Co-scale conv-attentional image Transformers (...
research
03/03/2022

Color Space-based HoVer-Net for Nuclei Instance Segmentation and Classification

Nuclei segmentation and classification is the first and most crucial ste...
research
11/22/2022

Compiler Provenance Recovery for Multi-CPU Architectures Using a Centrifuge Mechanism

Bit-stream recognition (BSR) has many applications, such as forensic inv...
research
11/07/2020

TB-Net: A Three-Stream Boundary-Aware Network for Fine-Grained Pavement Disease Segmentation

Regular pavement inspection plays a significant role in road maintenance...
research
07/27/2021

Enriching Local and Global Contexts for Temporal Action Localization

Effectively tackling the problem of temporal action localization (TAL) n...
research
10/11/2021

Revitalizing CNN Attentions via Transformers in Self-Supervised Visual Representation Learning

Studies on self-supervised visual representation learning (SSL) improve ...
