QKVA grid: Attention in Image Perspective and Stacked DETR

07/09/2022
by Wenyuan Sheng, et al.

We present a new model named Stacked-DETR (SDETR), which inherits the main ideas of the canonical DETR. We improve DETR in two directions: reducing the cost of training and introducing a stacked architecture to enhance performance. For the former, we focus on the inside of the attention block and propose the QKVA grid, a new perspective for describing the attention process. With it, we can analyze in more depth how attention works on image problems and what effect multiple heads have; these two ideas motivate the design of the single-head encoder layer. For the latter, SDETR achieves a considerable improvement over DETR (+1.1 AP, +3.4 APs). In particular, on small objects SDETR outperforms the optimized Faster R-CNN baseline, which was a shortcoming of DETR. Our changes are based on the DETR codebase. Training code and pretrained models are available at https://github.com/shengwenyuan/sdetr.
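The abstract does not spell out the implementation, so the following is only a minimal sketch of the kind of single-head encoder layer it refers to, assuming a DETR-style layer with the attention head count reduced to one; the class name, layer sizes, and positional-encoding handling are illustrative assumptions, not the authors' code.

```python
# Minimal sketch of a single-head transformer encoder layer for image tokens.
# Assumes DETR conventions: Q and K carry positional encodings, V does not.
import torch
import torch.nn as nn


class SingleHeadEncoderLayer(nn.Module):
    def __init__(self, d_model: int = 256, dim_feedforward: int = 2048, dropout: float = 0.1):
        super().__init__()
        # num_heads=1: a single attention head instead of DETR's default of 8.
        self.self_attn = nn.MultiheadAttention(d_model, num_heads=1, dropout=dropout)
        self.linear1 = nn.Linear(d_model, dim_feedforward)
        self.linear2 = nn.Linear(dim_feedforward, d_model)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, src: torch.Tensor, pos: torch.Tensor) -> torch.Tensor:
        # Add positional encodings to queries and keys only.
        q = k = src + pos
        attn_out, _ = self.self_attn(q, k, value=src)
        src = self.norm1(src + self.dropout(attn_out))
        ffn_out = self.linear2(self.dropout(torch.relu(self.linear1(src))))
        return self.norm2(src + self.dropout(ffn_out))


# Example: 100 flattened image tokens, batch size 2, hidden size 256.
tokens = torch.randn(100, 2, 256)
pos_emb = torch.randn(100, 2, 256)
layer = SingleHeadEncoderLayer()
out = layer(tokens, pos_emb)  # shape: (100, 2, 256)
```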
