Learning Visual Styles from Audio-Visual Associations

05/10/2022
by Tingle Li, et al.

From the patter of rain to the crunch of snow, the sounds we hear often convey the visual textures that appear within a scene. In this paper, we present a method for learning visual styles from unlabeled audio-visual data. Our model learns to manipulate the texture of a scene to match a sound, a problem we term audio-driven image stylization. Given a dataset of paired audio-visual data, we learn to modify input images such that, after manipulation, they are more likely to co-occur with a given input sound. In quantitative and qualitative evaluations, our sound-based model outperforms label-based approaches. We also show that audio can be an intuitive representation for manipulating images, as adjusting a sound's volume or mixing two sounds together results in predictable changes to visual style. Project webpage: https://tinglok.netlify.app/files/avstyle
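The two audio manipulations mentioned above, adjusting a sound's volume and mixing two sounds, can be illustrated on raw waveforms. The following is a minimal sketch, not the authors' code: the function names (`adjust_volume`, `mix_sounds`, `tone`) and the synthetic sine tones standing in for real recordings are all assumptions made for illustration, and the stylization model itself is not shown.

```python
import math


def adjust_volume(wave, gain):
    """Scale each sample by a linear gain, clipping to the range [-1, 1]."""
    return [max(-1.0, min(1.0, s * gain)) for s in wave]


def mix_sounds(wave_a, wave_b, weight=0.5):
    """Blend two waveforms; `weight` is the share given to wave_a.

    The result is truncated to the length of the shorter input.
    """
    return [weight * a + (1.0 - weight) * b for a, b in zip(wave_a, wave_b)]


def tone(freq_hz, seconds=1.0, sample_rate=16000, amplitude=0.3):
    """Generate a sine tone as a hypothetical stand-in for a recorded sound."""
    n = int(seconds * sample_rate)
    return [
        amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate)
        for i in range(n)
    ]


# Hypothetical stand-ins for two recordings (e.g. "rain" and "snow").
sound_a = tone(440.0)
sound_b = tone(220.0)

louder_a = adjust_volume(sound_a, 2.0)           # same sound, twice the amplitude
blend = mix_sounds(sound_a, sound_b, weight=0.7)  # 70% sound_a, 30% sound_b
```

In a pipeline like the one the abstract describes, a waveform modified this way would be the conditioning sound passed to the stylization model; how the model consumes it is described in the full text, not here.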

04/09/2018

The Sound of Pixels

We introduce PixelPlayer, a system that, by leveraging large amounts of ...
11/21/2022

LISA: Localized Image Stylization with Audio via Implicit Neural Representation

We present a novel framework, Localized Image Stylization with Audio (LI...
12/15/2018

Deep Synthesizer Parameter Estimation

Sound synthesis is a complex field that requires domain expertise. Manua...
12/11/2018

2.5D Visual Sound

Binaural audio provides a listener with 3D sound sensation, allowing a r...
01/08/2019

Audio Captcha Recognition Using RastaPLP Features by SVM

Nowadays, CAPTCHAs are computer generated tests that human can pass but ...
12/31/2022

Attentional Graph Convolutional Network for Structure-aware Audio-Visual Scene Classification

Audio-Visual scene understanding is a challenging problem due to the uns...
09/22/2021

Audio-Visual Grounding Referring Expression for Robotic Manipulation

Referring expressions are commonly used when referring to a specific tar...