BatVision with GCC-PHAT Features for Better Sound to Vision Predictions

06/14/2020
by Jesper Haahr Christensen, et al.

Inspired by the sophisticated echolocation abilities found in nature, we train a generative adversarial network to predict plausible depth maps and grayscale layouts from sound. Our sound-to-vision model processes binaural echo returns from emitted chirps. We build on our previous work, BatVision, which consists of a sound-to-vision model and a dataset we collected ourselves using a mobile robot and low-cost hardware. We introduce several changes to the earlier model that lead to better depth and grayscale estimation and higher perceptual quality. Instead of feeding raw binaural waveforms to the model, we compute generalized cross-correlation (GCC) features and use them as input. In addition, we base the generator on residual learning and apply spectral normalization in the discriminator. We present quantitative and qualitative comparisons that show improvements over our previous BatVision model.
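
To make the GCC input concrete, the following is a minimal sketch of how a GCC feature with phase transform (PHAT) weighting, as named in the title, could be computed for one pair of binaural waveforms. It is not the authors' implementation: the function name `gcc_phat`, the `max_lag` and `eps` parameters, the zero-padding length, and the sign convention are all assumptions chosen for illustration.

```python
import numpy as np


def gcc_phat(left, right, max_lag=None, eps=1e-12):
    """Illustrative GCC-PHAT between two 1-D signals (hypothetical helper).

    Returns the lag axis (in samples) and the PHAT-weighted
    cross-correlation restricted to [-max_lag, +max_lag].
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)

    # Zero-pad so the FFT-based correlation is linear rather than circular.
    n = len(left) + len(right)
    spec_left = np.fft.rfft(left, n=n)
    spec_right = np.fft.rfft(right, n=n)

    # Cross-power spectrum, normalized to unit magnitude (PHAT weighting),
    # so only phase (i.e. time-delay) information remains.
    cross = spec_left * np.conj(spec_right)
    cross /= np.abs(cross) + eps

    cc = np.fft.irfft(cross, n=n)

    if max_lag is None:
        max_lag = (n - 1) // 2

    # Reorder so that negative lags come first, then lag 0, then positive lags.
    cc = np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))
    lags = np.arange(-max_lag, max_lag + 1)
    return lags, cc


if __name__ == "__main__":
    # Synthetic check: the right channel is the left channel delayed by 5 samples.
    rng = np.random.default_rng(0)
    left = rng.standard_normal(1024)
    right = np.concatenate((np.zeros(5), left[:-5]))
    lags, cc = gcc_phat(left, right, max_lag=32)
    print("peak lag (samples):", lags[np.argmax(cc)])
```

With the sign convention assumed here (left spectrum times the conjugate of the right spectrum), a delay in the right channel appears as a peak at a negative lag. Restricting the output to a small `max_lag` window is one plausible way to obtain a compact, fixed-size feature vector per chirp; how the features are actually windowed and fed to the generator is described in the full paper, not in this abstract.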

research
08/13/2018

Towards Audio to Scene Image Synthesis using Generative Adversarial Network

Humans can imagine a scene from a sound. We want machines to do so by us...
research
06/23/2014

A Unified Quantitative Model of Vision and Audition

We have put forwards a unified quantitative framework of vision and audi...
research
12/15/2019

BatVision: Learning to See 3D Spatial Layout with Two Ears

Virtual camera images showing the correct layout of a space ahead can be...
research
12/15/2018

Deep Synthesizer Parameter Estimation

Sound synthesis is a complex field that requires domain expertise. Manua...
research
02/10/2022

Sound masking degrades perception of self-location during stepping: A case for sound-transparent spacesuits for Mars

Most efforts to improve spacesuits have been directed towards adding hap...
research
11/10/2022

GANStrument: Adversarial Instrument Sound Synthesis with Pitch-invariant Instance Conditioning

We propose GANStrument, a generative adversarial model for instrument so...
research
09/30/2011

Causes of Ineradicable Spurious Predictions in Qualitative Simulation

It was recently proved that a sound and complete qualitative simulator d...
