Active Sparse Conversations for Improved Audio-Visual Embodied Navigation

06/06/2023
by Xiulong Liu, et al.

Efficient navigation toward an audio goal requires an embodied agent that not only uses audio-visual cues effectively but can also actively (but occasionally) seek human/oracle assistance without sacrificing autonomy, e.g., when it is uncertain where to navigate to locate a noisy or sporadic audio goal. To this end, we present CAVEN, a conversational audio-visual embodied navigation agent that is capable of posing navigation questions to a human/oracle and processing the oracle's responses, both in free-form natural language. At the core of CAVEN is a multimodal hierarchical reinforcement learning (RL) setup equipped with a high-level policy that is trained to choose, at every step, one of three low-level policies: (i) navigate using audio-visual cues, (ii) frame a question to the oracle and receive a short or detailed response, or (iii) ask a generic question (when unsure of what to ask) and receive instructions. Key to generating the agent's questions are our novel TrajectoryNet, which forecasts the most likely next steps to the goal, and a QuestionNet, which uses these steps to produce a question. All the policies are learned end-to-end via the RL setup, with penalties to enforce sparsity in receiving navigation instructions from the oracle. To evaluate the performance of CAVEN, we present extensive experiments on the SoundSpaces framework for the task of semantic audio-visual navigation. Our results show that CAVEN achieves up to a 12% gain in performance over competing methods, especially in localizing new sound sources, even in the presence of auditory distractions.
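The hierarchical setup described in the abstract can be illustrated with a toy sketch: a high-level selector picks one of three options per step, and any option that consults the oracle incurs a penalty, which is what pushes the agent toward sparse querying. Everything below is an assumption made for illustration rather than the authors' implementation: the names `Option`, `shaped_reward`, `select_option`, and `episode_return` are hypothetical, the penalty value of 0.5 is arbitrary, and an epsilon-greedy rule over fixed scores stands in for the learned high-level policy.

```python
import random
from enum import Enum


class Option(Enum):
    """The three low-level choices named in the abstract (labels are ours)."""
    NAVIGATE = 0           # (i) act on audio-visual cues
    SPECIFIC_QUESTION = 1  # (ii) pose a framed question to the oracle
    GENERIC_QUESTION = 2   # (iii) ask a generic question


# Options that consult the oracle and therefore pay the sparsity penalty.
QUERY_OPTIONS = {Option.SPECIFIC_QUESTION, Option.GENERIC_QUESTION}


def shaped_reward(task_reward, option, query_penalty=0.5):
    """Subtract a fixed cost (assumed value) whenever the oracle is queried."""
    return task_reward - (query_penalty if option in QUERY_OPTIONS else 0.0)


def select_option(scores, epsilon, rng):
    """Epsilon-greedy stand-in for the learned high-level policy.

    `scores` maps each Option to an estimated value for the current state.
    """
    if rng.random() < epsilon:
        return rng.choice(list(Option))
    return max(Option, key=lambda o: scores[o])


def episode_return(options, task_rewards, query_penalty=0.5):
    """Sum of penalized rewards over one episode's (option, reward) pairs."""
    return sum(
        shaped_reward(r, o, query_penalty)
        for o, r in zip(options, task_rewards)
    )
```

With this shaping, a query is only worthwhile when the oracle's answer raises the expected task reward by more than the penalty, so a policy trained on the shaped return would be expected to ask only when navigation on audio-visual cues alone is likely to fail.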
