BatVision with GCC-PHAT Features for Better Sound to Vision Predictions

by Jesper Haahr Christensen, et al.

Inspired by the sophisticated echolocation abilities found in nature, we train a generative adversarial network to predict plausible depth maps and grayscale layouts from sound. To achieve this, our sound-to-vision model processes binaural echo returns from chirping sounds. We build on our previous work, BatVision, which consists of a sound-to-vision model and a dataset that we collected ourselves using a mobile robot and low-cost hardware. We improve on the previous model with several changes that yield better depth and grayscale estimation and higher perceptual quality. Rather than feeding raw binaural waveforms to the model, we compute generalized cross-correlation (GCC) features and use those as input. In addition, we rebuild the generator around residual learning and apply spectral normalization in the discriminator. We report both quantitative and qualitative improvements over our previous BatVision model.
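To make the GCC input concrete, the following is a minimal sketch of generalized cross-correlation with phase transform (GCC-PHAT) weighting between two channels, written with NumPy. It is an illustration under stated assumptions, not the authors' implementation: the function name `gcc_phat`, the `max_lag` and `eps` parameters, and the zero-padding choice are all hypothetical, and the abstract does not specify how the features are windowed or arranged before they reach the model.

```python
import numpy as np

def gcc_phat(left, right, max_lag=None, eps=1e-12):
    """Hypothetical helper: GCC-PHAT between two 1-D signals.

    Returns (lags, cc), where cc[i] is the PHAT-weighted cross-correlation
    at lag lags[i]. A positive peak lag means `left` is delayed
    relative to `right`.
    """
    # Zero-pad so the circular correlation does not wrap around.
    n = len(left) + len(right)
    L = np.fft.rfft(left, n=n)
    R = np.fft.rfft(right, n=n)

    # Cross-power spectrum of the two channels.
    cross = L * np.conj(R)
    # PHAT weighting: keep only the phase by normalizing the magnitude.
    cross /= np.abs(cross) + eps

    cc = np.fft.irfft(cross, n=n)

    if max_lag is None:
        max_lag = n // 2 - 1
    # Reorder so that lags run from -max_lag to +max_lag.
    cc = np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))
    lags = np.arange(-max_lag, max_lag + 1)
    return lags, cc

# Usage: delay a noise burst by 5 samples and recover the delay.
rng = np.random.default_rng(0)
x = rng.standard_normal(256)
delayed = np.concatenate((np.zeros(5), x[:-5]))
lags, cc = gcc_phat(delayed, x, max_lag=20)
estimated_delay = lags[np.argmax(cc)]
```

Because the PHAT weighting discards spectral magnitude, the correlation peak depends mainly on the phase difference between the two ears, that is, on the relative time delay of the echoes.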







Towards Audio to Scene Image Synthesis using Generative Adversarial Network

Humans can imagine a scene from a sound. We want machines to do so by us...

BatVision: Learning to See 3D Spatial Layout with Two Ears

Virtual camera images showing the correct layout of a space ahead can be...

A Unified Quantitative Model of Vision and Audition

We have put forwards a unified quantitative framework of vision and audi...

Deep Synthesizer Parameter Estimation

Sound synthesis is a complex field that requires domain expertise. Manua...

Causes of Ineradicable Spurious Predictions in Qualitative Simulation

It was recently proved that a sound and complete qualitative simulator d...

Pyramid Embedded Generative Adversarial Network for Automated Font Generation

In this paper, we investigate the Chinese font synthesis problem and pro...

Sound-to-Imagination: Unsupervised Crossmodal Translation Using Deep Dense Network Architecture

The motivation of our research is to develop a sound-to-image (S2I) tran...