Target-Grounded Graph-Aware Transformer for Aerial Vision-and-Dialog Navigation

08/22/2023
by Yifei Su, et al.

This report details the method behind the winning entry of the AVDN Challenge at ICCV CLVL 2023. The competition addresses the Aerial Navigation from Dialog History (ANDH) task, which requires a drone agent to associate dialog history with aerial observations in order to reach the destination. To improve the drone agent's cross-modal grounding, we propose a Target-Grounded Graph-Aware Transformer (TG-GAT) framework. Concretely, TG-GAT first leverages a graph-aware transformer to capture spatiotemporal dependencies, which benefits navigation state tracking and robust action planning. In addition, an auxiliary visual grounding task is devised to boost the agent's awareness of referred landmarks. Moreover, a hybrid augmentation strategy based on large language models is used to mitigate data scarcity. Our TG-GAT framework won the AVDN Challenge, with 2.2 baseline on SPL and SR metrics, respectively. The code is available at https://github.com/yifeisu/TG-GAT.
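For readers unfamiliar with the two metrics named above, the following is a minimal sketch of how SR (success rate) and SPL (success weighted by path length) are commonly computed in navigation benchmarks. It assumes the standard SPL definition, in which each successful episode contributes the ratio of the shortest-path length to the longer of the shortest path and the path actually taken. The function names and input format are hypothetical, and the challenge's own success criterion and evaluation code may differ.

```python
from typing import Sequence


def success_rate(successes: Sequence[bool]) -> float:
    """Fraction of episodes in which the agent reached the destination."""
    if not successes:
        raise ValueError("at least one episode is required")
    return sum(successes) / len(successes)


def spl(
    successes: Sequence[bool],
    shortest_lengths: Sequence[float],
    path_lengths: Sequence[float],
) -> float:
    """Success weighted by path length, averaged over episodes.

    Assumes the common definition: a successful episode scores
    shortest / max(taken, shortest); a failed episode scores 0.
    """
    if not (len(successes) == len(shortest_lengths) == len(path_lengths)):
        raise ValueError("all inputs must have the same length")
    if not successes:
        raise ValueError("at least one episode is required")

    total = 0.0
    for ok, shortest, taken in zip(successes, shortest_lengths, path_lengths):
        if not ok:
            continue  # failed episodes contribute zero
        denom = max(taken, shortest)
        # A zero-length episode that succeeds is treated as perfectly efficient.
        total += shortest / denom if denom > 0 else 1.0
    return total / len(successes)


if __name__ == "__main__":
    # Hypothetical episodes: two successes (one with a detour) and one failure.
    ok = [True, False, True]
    shortest = [10.0, 5.0, 8.0]
    taken = [10.0, 7.0, 16.0]
    print(success_rate(ok), spl(ok, shortest, taken))
```

Because SPL discounts successful episodes that take detours, it can never exceed SR on the same set of episodes.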

