Content-Augmented Feature Pyramid Network with Light Linear Transformers

05/20/2021
by   Yongxiang Gu, et al.

Recently, many works have introduced transformers into computer vision tasks with good results. Unlike classic convolutional networks, which extract features within a local receptive field, transformers can adaptively aggregate similar features from a global view using the self-attention mechanism. For object detection, the Feature Pyramid Network (FPN) introduces feature interaction across layers and demonstrates its great importance. However, this interaction is still local, which leaves considerable room for improvement. Since the transformer was originally designed for NLP tasks, applying it directly to images instead of text incurs unaffordable computation and memory overhead. In this paper, we use a linearized attention function to overcome these problems and build a novel architecture, named Content-Augmented Feature Pyramid Network (CA-FPN), which introduces a global content extraction module and deeply combines it with FPN through light linear transformers. Moreover, light transformers make it easier to apply the multi-head attention mechanism. Most importantly, our CA-FPN can be readily plugged into existing FPN-based models. Extensive experiments on the challenging COCO object detection dataset demonstrate that our CA-FPN significantly outperforms competitive baselines without bells and whistles. Code will be made publicly available.
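The abstract does not specify which linearized attention function is used, so the following is only a minimal NumPy sketch of the general idea under stated assumptions. It assumes a kernel feature map of the form elu(x) + 1 (one common choice for linear attention, not confirmed as the one in CA-FPN), and the names `feature_map`, `linear_attention`, and `quadratic_reference` are hypothetical. The point it illustrates is that reordering the matrix products avoids building the n-by-m attention matrix, so cost grows linearly rather than quadratically with the number of pixels.

```python
import numpy as np


def feature_map(x):
    # Assumed positive feature map: elu(x) + 1.
    # This choice is an illustration, not taken from the paper.
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))


def linear_attention(q, k, v, eps=1e-6):
    # q: (n, d) queries, k: (m, d) keys, v: (m, dv) values.
    q_f = feature_map(q)
    k_f = feature_map(k)
    kv = k_f.T @ v                 # (d, dv): keys and values summarized once
    z = q_f @ k_f.sum(axis=0)      # (n,): per-query normalizer
    return (q_f @ kv) / (z[:, None] + eps)


def quadratic_reference(q, k, v, eps=1e-6):
    # Same result computed the expensive way, with an explicit (n, m) matrix.
    weights = feature_map(q) @ feature_map(k).T
    return (weights @ v) / (weights.sum(axis=1, keepdims=True) + eps)


rng = np.random.default_rng(0)
q = rng.standard_normal((5, 4))
k = rng.standard_normal((7, 4))
v = rng.standard_normal((7, 3))
out = linear_attention(q, k, v)
print(out.shape)
```

Both functions compute the same output; only the order of multiplication differs. In the linear form the intermediate `kv` has size d-by-dv regardless of how many positions the feature map contains, which is what makes global attention over image features affordable in this kind of design.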

research
09/16/2022

ConvFormer: Closing the Gap Between CNN and Vision Transformers

Vision transformers have shown excellent performance in computer vision ...
research
07/18/2020

Feature Pyramid Transformer

Feature interactions across space and scales underpin modern visual reco...
research
05/23/2020

Attention-guided Context Feature Pyramid Network for Object Detection

For object detection, how to address the contradictory requirement betwe...
research
03/26/2023

Feature Shrinkage Pyramid for Camouflaged Object Detection with Transformers

Vision transformers have recently shown strong global context modeling c...
research
10/08/2021

Trident Pyramid Networks: The importance of processing at the feature pyramid level for better object detection

Feature pyramids have become ubiquitous in multi-scale computer vision t...
research
10/28/2022

Grafting Vision Transformers

Vision Transformers (ViTs) have recently become the state-of-the-art acr...
research
09/15/2021

PnP-DETR: Towards Efficient Visual Analysis with Transformers

Recently, DETR pioneered the solution of vision tasks with transformers,...
