Chat2Map: Efficient Scene Mapping from Multi-Ego Conversations

01/04/2023
by Sagnik Majumder, et al.

Can conversational videos captured from multiple egocentric viewpoints reveal the map of a scene in a cost-efficient way? We seek to answer this question by proposing a new problem: efficiently building the map of a previously unseen 3D environment by exploiting shared information in the egocentric audio-visual observations of participants in a natural conversation. Our hypothesis is that as multiple people ("egos") move in a scene and talk among themselves, they receive rich audio-visual cues that can help uncover the unseen areas of the scene. Given the high cost of continuously processing egocentric visual streams, we further explore how to actively coordinate the sampling of visual information, so as to minimize redundancy and reduce power use. To that end, we present an audio-visual deep reinforcement learning approach that works with our shared scene mapper to selectively turn on the camera to efficiently chart out the space. We evaluate the approach using a state-of-the-art audio-visual simulator for 3D scenes as well as real-world video. Our model outperforms previous state-of-the-art mapping methods, and achieves an excellent cost-accuracy tradeoff. Project: http://vision.cs.utexas.edu/projects/chat2map.

research
12/11/2018

2.5D Visual Sound

Binaural audio provides a listener with 3D sound sensation, allowing a r...
research
02/04/2023

AV-NeRF: Learning Neural Fields for Real-World Audio-Visual Scene Synthesis

Human perception of the complex world relies on a comprehensive analysis...
research
06/14/2021

Learning Audio-Visual Dereverberation

Reverberation from audio reflecting off surfaces and objects in the envi...
research
07/10/2023

Learning Spatial Features from Audio-Visual Correspondence in Egocentric Videos

We propose a self-supervised method for learning representations based o...
research
12/20/2019

Exploring Context, Attention and Audio Features for Audio Visual Scene-Aware Dialog

We are witnessing a confluence of vision, speech and dialog system techn...
research
07/22/2022

Egocentric scene context for human-centric environment understanding from video

First-person video highlights a camera-wearer's activities in the contex...
research
05/15/2021

Move2Hear: Active Audio-Visual Source Separation

We introduce the active audio-visual source separation problem, where an...
