Robot Synesthesia: A Sound and Emotion Guided AI Painter

02/09/2023
by Vihaan Misra et al.

If a picture paints a thousand words, sound may voice a million. While recent robotic painting and image synthesis methods have made progress in generating visuals from text inputs, the translation of sound into images remains largely unexplored. Sound-based interfaces and sonic interactions have the potential to expand accessibility and control for the user, and they offer a means to convey complex emotions and the dynamic aspects of the real world. In this paper, we propose an approach for using sound and speech to guide a robotic painting process, which we call robot synesthesia. For general sound, we encode the simulated paintings and input sounds into the same latent space. For speech, we decouple the input into its transcribed text and its tone: the text controls the content of the painting, while emotions estimated from the tone guide its mood. Our approach is fully integrated with FRIDA, a robotic painting framework, adding sound and speech to FRIDA's existing input modalities, such as text and style. In two surveys, participants correctly guessed the emotion or natural sound used to generate a given painting at more than twice the rate of random chance. We also qualitatively discuss the results of our sound-guided image manipulation and music-guided paintings.
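As a rough, non-authoritative sketch of the two guidance pathways described above (not the paper's actual implementation), the general-sound case can be framed as minimizing the distance between the simulated painting's embedding and the sound's embedding in a shared latent space, while the speech case splits the signal into transcribed text (content) and an estimated emotion (mood). The encoder objects, helper names, and the loss below are illustrative assumptions.

    import torch
    import torch.nn.functional as F

    def sound_guidance_loss(painting_rgb, audio_waveform, image_encoder, audio_encoder):
        # image_encoder / audio_encoder are assumed pretrained models that map
        # images and audio into the SAME embedding space (e.g. a CLIP-style
        # image encoder paired with a Wav2CLIP-style audio encoder).
        img_emb = F.normalize(image_encoder(painting_rgb), dim=-1)    # (B, D)
        aud_emb = F.normalize(audio_encoder(audio_waveform), dim=-1)  # (B, D)
        # Cosine-distance loss: pull the simulated painting toward the sound.
        return 1.0 - (img_emb * aud_emb).sum(dim=-1).mean()

    def speech_to_objectives(waveform, transcriber, emotion_classifier):
        # Hypothetical helpers: `transcriber` returns the spoken words, and
        # `emotion_classifier` estimates an emotion label from the tone/prosody.
        text_prompt = transcriber(waveform)     # drives the painting's content
        emotion = emotion_classifier(waveform)  # drives the painting's mood
        return text_prompt, emotion

In a FRIDA-style pipeline, a loss of this form could in principle sit alongside the existing text and style objectives when optimizing simulated brush strokes, though the paper's exact objectives and weighting are not specified here.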

research
06/29/2016

Penambahan emosi menggunakan metode manipulasi prosodi untuk sistem text to speech bahasa Indonesia (Adding Emotion Using a Prosody Manipulation Method for an Indonesian Text-to-Speech System)

Adding emotions using a prosody manipulation method for Indonesian text...
research
11/30/2021

Sound-Guided Semantic Image Manipulation

The recent success of the generative model shows that leveraging the mul...
research
08/30/2022

Robust Sound-Guided Image Manipulation

Recent successes suggest that an image can be manipulated by a text prom...
research
04/06/2022

Perceive, Represent, Generate: Translating Multimodal Information to Robotic Motion Trajectories

We present Perceive-Represent-Generate (PRG), a novel three-stage framew...
research
09/19/2019

Robot Sound Interpretation: Combining Sight and Sound in Learning-Based Control

We explore the interpretation of sound for robot decision-making, inspir...
research
11/26/2022

Contextual Expressive Text-to-Speech

The goal of expressive Text-to-speech (TTS) is to synthesize natural spe...
research
08/20/2019

From Text to Sound: A Preliminary Study on Retrieving Sound Effects to Radio Stories

Sound effects play an essential role in producing high-quality radio sto...
