Fast GraspNeXt: A Fast Self-Attention Neural Network Architecture for Multi-task Learning in Computer Vision Tasks for Robotic Grasping on the Edge

04/21/2023
by   Alexander Wong, et al.
Multi-task learning has shown considerable promise for improving the performance of deep learning-driven vision systems for robotic grasping. However, high architectural and computational complexity can make such systems poorly suited for deployment on the embedded devices typically used in robotic arms in real-world manufacturing and warehouse environments. As such, highly efficient multi-task deep neural network architectures tailored for computer vision tasks for robotic grasping on the edge are highly desirable for widespread adoption in manufacturing environments. Motivated by this, we propose Fast GraspNeXt, a fast self-attention neural network architecture tailored for embedded multi-task learning in computer vision tasks for robotic grasping. To build Fast GraspNeXt, we leverage a generative network architecture search strategy with a set of architectural constraints customized to achieve a strong balance between multi-task learning performance and embedded inference efficiency. Experimental results on the MetaGraspNet benchmark dataset show that Fast GraspNeXt achieves the best performance across multiple computer vision tasks (highest average precision (AP) and accuracy, lowest mean squared error (MSE)) when compared to other efficient multi-task network architecture designs, while having only 17.8M parameters (more than 5x smaller), 259 GFLOPs (more than 5x lower), and more than 3.15x faster inference on an NVIDIA Jetson TX2 embedded processor.
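The constrained search described above can be illustrated with a minimal sketch. The paper's actual generative architecture search is not reproduced here; the candidate sampler (`make_candidate`), the analytical cost proxies, and the scoring rule are all illustrative assumptions. Only the budget values (17.8M parameters, 259 GFLOPs) come from the abstract.

```python
import random

# Budgets taken from the abstract; everything else below is an assumption
# used only to illustrate constrained architecture search.
PARAM_BUDGET_M = 17.8   # parameter budget, in millions
FLOP_BUDGET_G = 259.0   # compute budget, in GFLOPs

def make_candidate(rng):
    """Randomly sample a toy architecture configuration (hypothetical)."""
    widths = [rng.choice([32, 64, 96, 128]) for _ in range(4)]
    depths = [rng.choice([1, 2, 3]) for _ in range(4)]
    return {"widths": widths, "depths": depths}

def estimate_cost(cfg):
    """Crude analytical proxies for parameter count (M) and compute (GFLOPs)."""
    params_m = sum(w * w * d for w, d in zip(cfg["widths"], cfg["depths"])) / 1e4
    flops_g = params_m * 12.0  # assumed fixed params-to-FLOPs ratio
    return params_m, flops_g

def search(num_samples=1000, seed=0):
    """Reject candidates that violate the budgets; keep the best by a toy score."""
    rng = random.Random(seed)
    best, best_score = None, float("-inf")
    for _ in range(num_samples):
        cfg = make_candidate(rng)
        params_m, flops_g = estimate_cost(cfg)
        if params_m > PARAM_BUDGET_M or flops_g > FLOP_BUDGET_G:
            continue  # violates the embedded-efficiency constraints
        score = params_m  # toy proxy: largest network that still fits the budget
        if score > best_score:
            best, best_score = cfg, score
    return best

best_cfg = search()
```

In the actual work, the score would be a learned or measured multi-task performance estimate rather than a size proxy, but the structure (generate, check constraints, select) is the same.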
