Connecting Touch and Vision via Cross-Modal Prediction

by   Yunzhu Li, et al.

Humans perceive the world using multi-modal sensory inputs such as vision, audition, and touch. In this work, we investigate the cross-modal connection between vision and touch. The main challenge in this cross-domain modeling task lies in the significant scale discrepancy between the two: while our eyes perceive an entire visual scene at once, humans can only feel a small region of an object at any given moment. To connect vision and touch, we introduce new tasks of synthesizing plausible tactile signals from visual inputs as well as imagining how we interact with objects given tactile data as input. To accomplish our goals, we first equip robots with both visual and tactile sensors and collect a large-scale dataset of corresponding vision and tactile image sequences. To close the scale gap, we present a new conditional adversarial model that incorporates the scale and location information of the touch. Human perceptual studies demonstrate that our model can produce realistic visual images from tactile data and vice versa. Finally, we present both qualitative and quantitative experimental results regarding different system designs, as well as visualizing the learned representations of our model.


page 1

page 3

page 4

page 5

page 6

page 7

page 8


"Touching to See" and "Seeing to Feel": Robotic Cross-modal SensoryData Generation for Visual-Tactile Perception

The integration of visual-tactile stimulus is common while humans perfor...

Learning to Identify Object Instances by Touch: Tactile Recognition via Multimodal Matching

Much of the literature on robotic perception focuses on the visual modal...

Teaching Cameras to Feel: Estimating Tactile Physical Properties of Surfaces From Images

The connection between visual input and tactile sensing is critical for ...

Controllable Visual-Tactile Synthesis

Deep generative models have various content creation applications such a...

Visual-Tactile Cross-Modal Data Generation using Residue-Fusion GAN with Feature-Matching and Perceptual Losses

Existing psychophysical studies have revealed that the cross-modal visua...

Learning Self-Supervised Representations from Vision and Touch for Active Sliding Perception of Deformable Surfaces

Humans make extensive use of vision and touch as complementary senses, w...

Touch and Go: Learning from Human-Collected Vision and Touch

The ability to associate touch with sight is essential for tasks that re...

Please sign up or login with your details

Forgot password? Click here to reset