A Framework for Multisensory Foresight for Embodied Agents

09/15/2021
by Xiaohui Chen, et al.
Tufts University

Predicting future sensory states is crucial for learning agents such as robots, drones, and autonomous vehicles. In this paper, we couple multiple sensory modalities with exploratory actions and propose a predictive neural network architecture to address this problem. Most existing approaches either rely on large, manually annotated datasets or use visual data as their only modality. In contrast, the unsupervised method presented here uses multi-modal perception to predict future visual frames. As a result, the proposed model is more comprehensive and can better capture the spatio-temporal dynamics of the environment, leading to more accurate visual frame prediction. A second novelty of our framework is the use of sub-networks dedicated to anticipating future haptic, audio, and tactile signals. The framework was tested and validated on a dataset containing 4 sensory modalities (vision, haptic, audio, and tactile), collected by a humanoid robot performing 9 behaviors multiple times on a large set of objects. While vision is the dominant modality, utilizing the additional non-visual modalities improves the accuracy of predictions.
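
The data flow described in the abstract (per-modality encoding, fusion with the exploratory behavior, and one prediction head per modality) can be illustrated with a minimal, untrained sketch. This is not the authors' architecture: the feature sizes, the latent size, the use of plain linear maps, and the names `MultisensoryPredictor`, `predict_next`, `DIMS`, and `LATENT` are all assumptions made for illustration. Only the four modalities and the 9 behaviors come from the abstract.

```python
import random

# Hypothetical feature sizes per modality; the abstract does not specify them.
DIMS = {"vision": 8, "haptic": 4, "audio": 4, "tactile": 4}
NUM_BEHAVIORS = 9  # the abstract mentions 9 exploratory behaviors
LATENT = 6         # assumed size of each per-modality encoding


def linear(in_dim, out_dim, rng):
    """Return a random weight matrix with out_dim rows of in_dim weights."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]


def apply(weights, x):
    """Matrix-vector product: one output value per weight row."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


class MultisensoryPredictor:
    """Illustrative stand-in: linear encoders and heads, random weights, no training."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        # One encoder per sensory modality.
        self.encoders = {m: linear(d, LATENT, rng) for m, d in DIMS.items()}
        fused_dim = LATENT * len(DIMS) + NUM_BEHAVIORS
        # One prediction head per modality: the visual head plus the
        # haptic, audio, and tactile sub-network heads.
        self.heads = {m: linear(fused_dim, d, rng) for m, d in DIMS.items()}

    def predict_next(self, obs, behavior):
        """Predict next-step features for every modality from the current ones."""
        codes = []
        for m in DIMS:
            codes.extend(apply(self.encoders[m], obs[m]))
        # Condition on the exploratory behavior with a one-hot vector.
        one_hot = [1.0 if i == behavior else 0.0 for i in range(NUM_BEHAVIORS)]
        fused = codes + one_hot
        return {m: apply(self.heads[m], fused) for m in DIMS}


model = MultisensoryPredictor()
obs = {m: [0.5] * d for m, d in DIMS.items()}
pred = model.predict_next(obs, behavior=2)
```

In this sketch every head reads the same fused vector, so the visual prediction can draw on the non-visual encodings, which is the property the abstract credits for the accuracy gain. A real model would replace the linear maps with learned spatio-temporal networks and train them on recorded sequences.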

02/11/2023

Flexible-modal Deception Detection with Audio-Visual Adapter

Detecting deception by human behaviors is vital in many fields such as c...
09/18/2022

VisTaNet: Attention Guided Deep Fusion for Surface Roughness Classification

Human texture perception is a weighted average of multi-sensory inputs: ...
12/25/2019

Look, Listen, and Act: Towards Audio-Visual Embodied Navigation

A crucial aspect of mobile intelligent agents is their ability to integr...
06/17/2021

Sensory Modality Mapping for Game Adaptation and Design

In this paper we examine methods for taking game-related information pro...
04/28/2018

A Bimodal Learning Approach to Assist Multi-sensory Effects Synchronization

In mulsemedia applications, traditional media content (text, image, audi...
11/21/2019

Visual Tactile Fusion Object Clustering

Object clustering, aiming at grouping similar objects into one cluster w...