Training Strategies for Vision Transformers for Object Detection

04/05/2023
by Apoorv Singh, et al.

Vision-based Transformers have found wide application in the perception module of autonomous driving for predicting accurate 3D bounding boxes, owing to their strong capability in modeling long-range dependencies between visual features. However, Transformers, initially designed for language models, have mostly focused on accuracy rather than on the inference-time budget. For a safety-critical system like autonomous driving, real-time inference on the on-board compute is an absolute necessity, which keeps our object detection algorithm under a very tight run-time budget. In this paper, we evaluate a variety of strategies to reduce the inference time of vision-transformer-based object detection methods while keeping a close watch on any performance variations. Our chosen metric for these strategies is accuracy-runtime joint optimization. Moreover, for actual inference-time analysis, we profile our strategies at float32 and float16 precision with the TensorRT module, the most common format used in industry for deploying machine learning networks on edge devices. We show that our strategies improve inference time by 63% at the cost of a performance drop of a mere 3% (see the evaluation section). These strategies bring the inference time of Vision Transformer detectors below even that of traditional single-image CNN detectors such as FCOS. We recommend that practitioners use these techniques to deploy hefty Transformer-based multi-view networks on a budget-constrained robotic platform.

