
BatVision with GCC-PHAT Features for Better Sound to Vision Predictions

by Jesper Haahr Christensen, et al.

Inspired by sophisticated echolocation abilities found in nature, we train a generative adversarial network to predict plausible depth maps and grayscale layouts from sound. To achieve this, our sound-to-vision model processes binaural echo returns from chirping sounds. We build on our previous BatVision work, which consists of a sound-to-vision model and a dataset collected with our mobile robot and low-cost hardware. We improve on the previous model with several changes that yield better depth and grayscale estimation and increased perceptual quality. Rather than feeding the model raw binaural waveforms, we compute generalized cross-correlation (GCC) features and use these as input. In addition, we redesign the generator around residual learning and apply spectral normalization in the discriminator. We present both quantitative and qualitative improvements over our previous BatVision model.
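The abstract does not spell out how the GCC features are computed, but GCC-PHAT (phase transform weighting) is the standard formulation: cross-correlate the two channels in the frequency domain while normalizing away magnitude so only phase, and hence time-delay, information remains. The sketch below is a generic NumPy implementation of that technique, not the authors' exact feature-extraction pipeline; the function name, sampling rate, and signal lengths are illustrative assumptions.

```python
import numpy as np

def gcc_phat(sig, ref, fs=44100, max_tau=None):
    """Generic GCC-PHAT sketch (assumed, not the paper's exact pipeline).

    Returns the estimated time delay of `sig` relative to `ref` (seconds)
    and the phase-transformed cross-correlation curve, which can serve as
    a feature vector for a learning model.
    """
    n = sig.shape[0] + ref.shape[0]
    # Cross-power spectrum of the two channels.
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    R = SIG * np.conj(REF)
    # PHAT weighting: discard magnitude, keep only phase information.
    R /= (np.abs(R) + 1e-15)
    cc = np.fft.irfft(R, n=n)
    # Re-center so negative lags precede positive lags.
    max_shift = n // 2
    if max_tau is not None:
        max_shift = min(int(fs * max_tau), max_shift)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    tau = (np.argmax(np.abs(cc)) - max_shift) / float(fs)
    return tau, cc
```

With one channel delayed by a known number of samples, the peak of the correlation curve recovers that delay, which is the interaural cue the GCC features expose to the model.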



