Sound-to-Imagination: Unsupervised Crossmodal Translation Using Deep Dense Network Architecture

06/02/2021
by Leonardo A. Fanzeres, et al.

The motivation of our research is to develop a sound-to-image (S2I) translation system that enables a human receiver to visually infer the occurrence of sound-related events. We expect the computer to 'imagine' the scene from the captured sound, generating original images that depict the sound-emitting source. Previous studies on similar topics opted for simplified approaches using data with low content diversity and/or strong supervision. In contrast, we propose to perform unsupervised S2I translation using thousands of distinct and unknown scenes, with data that is only lightly pre-cleaned, just enough to guarantee aural-visual semantic coherence. To that end, we employ conditional generative adversarial networks (GANs) with a deep densely connected generator. We also implemented a moving-average adversarial loss to address GAN training instability. Though the specified S2I translation problem is quite challenging, we were able to generalize the translator model enough to obtain more than 14 translated from unknown sounds. Additionally, we present a solution using informativity classifiers to perform quantitative evaluation of S2I translation.


