A Framework for Multisensory Foresight for Embodied Agents

09/15/2021
by Xiaohui Chen, et al.

Predicting future sensory states is crucial for learning agents such as robots, drones, and autonomous vehicles. In this paper, we couple multiple sensory modalities with exploratory actions and propose a predictive neural network architecture to address this problem. Most existing approaches rely on large, manually annotated datasets or use visual data as a single modality. In contrast, the unsupervised method presented here uses multi-modal perceptions to predict future visual frames. As a result, the proposed model better captures the spatio-temporal dynamics of the environment, leading to more accurate visual frame prediction. Another novelty of our framework is the use of sub-networks dedicated to anticipating future haptic, audio, and tactile signals. The framework was tested and validated with a dataset containing four sensory modalities (vision, haptic, audio, and tactile), collected as a humanoid robot performed nine exploratory behaviors multiple times on a large set of objects. While visual information is the dominant modality, utilizing the additional non-visual modalities improves prediction accuracy.
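
To make the described architecture concrete, below is a minimal, hypothetical PyTorch sketch of the idea, not the authors' released code: per-modality encoders feed a recurrent core, a convolutional decoder predicts the next visual frame, and small sub-network heads anticipate future haptic, audio, and tactile signals. All layer sizes, signal dimensions, and names (e.g. MultisensoryForesight, haptic_dim) are illustrative assumptions.

# Hypothetical sketch of a multisensory foresight model: per-modality
# encoders, an LSTM core over fused features, a decoder for the next
# visual frame, and heads that anticipate future non-visual signals.
import torch
import torch.nn as nn

class MultisensoryForesight(nn.Module):
    def __init__(self, haptic_dim=48, audio_dim=128, tactile_dim=32, hidden=256):
        super().__init__()
        # Visual encoder: 64x64 RGB frame -> feature vector.
        self.vis_enc = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),   # 32x32
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),  # 16x16
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(), # 8x8
            nn.Flatten(), nn.Linear(128 * 8 * 8, hidden),
        )
        # Non-visual encoders: one small MLP per modality.
        self.hap_enc = nn.Sequential(nn.Linear(haptic_dim, 64), nn.ReLU())
        self.aud_enc = nn.Sequential(nn.Linear(audio_dim, 64), nn.ReLU())
        self.tac_enc = nn.Sequential(nn.Linear(tactile_dim, 64), nn.ReLU())
        # Recurrent core over the fused multi-modal features.
        self.rnn = nn.LSTM(hidden + 3 * 64, hidden, batch_first=True)
        # Decoder: hidden state -> predicted next visual frame (64x64 RGB).
        self.vis_dec = nn.Sequential(
            nn.Linear(hidden, 128 * 8 * 8), nn.ReLU(),
            nn.Unflatten(1, (128, 8, 8)),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )
        # Sub-networks anticipating the next haptic, audio, and tactile signals.
        self.hap_head = nn.Linear(hidden, haptic_dim)
        self.aud_head = nn.Linear(hidden, audio_dim)
        self.tac_head = nn.Linear(hidden, tactile_dim)

    def forward(self, frames, haptic, audio, tactile):
        # frames: (B, T, 3, 64, 64); haptic/audio/tactile: (B, T, dim)
        B, T = frames.shape[:2]
        v = self.vis_enc(frames.reshape(B * T, 3, 64, 64)).reshape(B, T, -1)
        fused = torch.cat(
            [v, self.hap_enc(haptic), self.aud_enc(audio), self.tac_enc(tactile)],
            dim=-1,
        )
        h, _ = self.rnn(fused)  # (B, T, hidden)
        next_frames = self.vis_dec(h.reshape(B * T, -1)).reshape(B, T, 3, 64, 64)
        return next_frames, self.hap_head(h), self.aud_head(h), self.tac_head(h)

# Example usage on random data (one-step-ahead prediction per time step).
model = MultisensoryForesight()
frames = torch.rand(2, 5, 3, 64, 64)
haptic, audio, tactile = torch.rand(2, 5, 48), torch.rand(2, 5, 128), torch.rand(2, 5, 32)
pred_frames, pred_hap, pred_aud, pred_tac = model(frames, haptic, audio, tactile)
print(pred_frames.shape)  # torch.Size([2, 5, 3, 64, 64])

In a sketch like this, the non-visual encoders and prediction heads are what the abstract refers to as dedicated sub-networks; training would compare each predicted output against the next-step sensory reading for that modality.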

