Neural Multisensory Scene Inference

10/06/2019
by   Jae Hyun Lim, et al.
7

For embodied agents to infer representations of the underlying 3D physical world they inhabit, they should efficiently combine multisensory cues from numerous trials, e.g., by looking at and touching objects. Despite its importance, multisensory 3D scene representation learning has received less attention compared to the unimodal setting. In this paper, we propose the Generative Multisensory Network (GMN) for learning latent representations of 3D scenes which are partially observable through multiple sensory modalities. We also introduce a novel method, called the Amortized Product-of-Experts, to improve the computational efficiency and the robustness to unseen combinations of modalities at test time. Experimental results demonstrate that the proposed model can efficiently infer robust modality-invariant 3D-scene representations from arbitrary combinations of modalities and perform accurate cross-modal generation. To perform this exploration, we also develop the Multisensory Embodied 3D-Scene Environment (MESE).

READ FULL TEXT

page 3

page 12

page 16

page 18

research
10/27/2016

Cross-Modal Scene Networks

People can recognize scenes across many different modalities beyond natu...
research
06/22/2023

Learning Unseen Modality Interaction

Multimodal learning assumes all modality combinations of interest are av...
research
07/03/2019

Cascade Attention Guided Residue Learning GAN for Cross-Modal Translation

Since we were babies, we intuitively develop the ability to correlate th...
research
06/06/2023

Towards Visual Foundational Models of Physical Scenes

We describe a first step towards learning general-purpose visual represe...
research
08/09/2023

Neural Field Movement Primitives for Joint Modelling of Scenes and Motions

This paper presents a novel Learning from Demonstration (LfD) method tha...
research
06/30/2020

A Framework for Learning Invariant Physical Relations in Multimodal Sensory Processing

Perceptual learning enables humans to recognize and represent stimuli in...
research
12/07/2020

Generating unseen complex scenes: are we there yet?

Although recent complex scene conditional generation models generate inc...

Please sign up or login with your details

Forgot password? Click here to reset