CNN-transformer mixed model for object detection

12/13/2022
by Wenshuo Li, et al.

Object detection, one of the three main tasks of computer vision, is used in a wide range of applications. The usual pipeline uses a deep neural network to extract the features of an image and then uses those features to identify the class and location of each object. The main route to higher detection accuracy is therefore to improve how well the network extracts features. In this paper, I propose a convolutional module with a transformer[1]. It aims to improve the recognition accuracy of the model by fusing the detailed features extracted by a CNN[2] with the global features extracted by a transformer, and it significantly reduces the computational cost of the transformer module by shrinking the feature map. The main execution steps are convolutional downsampling to reduce the feature map size, then self-attention calculation and upsampling, and finally concatenation with the initial input. In the experimental part, after splicing the block onto the end of YOLOv5n[3] and training for 300 epochs on the COCO dataset, the mAP improved by 1.7 [...] saturation phenomenon, so there is still potential for improvement. After 100 rounds of training on the Pascal VOC dataset, the accuracy of the results reached 81 [...] the backbone, but the number of parameters is less than one-twentieth of it.
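To see why shrinking the feature map before self-attention saves computation, the following is a rough back-of-the-envelope sketch, not the paper's implementation. It assumes single-head attention in which every spatial position is one token, and it counts only multiply-accumulates for the Q/K/V projections and the two attention matrix products. The function names and the 20 x 20 x 256 feature map are hypothetical values chosen for illustration.

```python
def attention_macs(height, width, channels):
    """Approximate multiply-accumulate count for single-head self-attention
    over an H x W feature map, treating each spatial position as one token."""
    tokens = height * width
    projections = 3 * tokens * channels * channels  # Q, K and V linear maps
    scores = tokens * tokens * channels             # Q @ K^T
    weighted_sum = tokens * tokens * channels       # softmax(scores) @ V
    return projections + scores + weighted_sum


def downsampled_size(height, width, stride):
    """Spatial size after a strided convolution, assuming the stride
    divides both dimensions exactly (an assumption of this sketch)."""
    return height // stride, width // stride


if __name__ == "__main__":
    h, w, c = 20, 20, 256  # hypothetical feature map, not taken from the paper
    for stride in (1, 2, 4):
        dh, dw = downsampled_size(h, w, stride)
        print(f"stride={stride} tokens={dh * dw} macs={attention_macs(dh, dw, c)}")
```

Under these assumptions, a stride-`s` downsampling divides the token count by `s**2`, so the two attention matrix products, which are quadratic in the token count, shrink by a factor of `s**4`, while the projection cost shrinks by `s**2`.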


