SoundNet: Learning Sound Representations from Unlabeled Video

10/27/2016
by Yusuf Aytar, et al.

We learn rich natural sound representations by capitalizing on large amounts of unlabeled sound data collected in the wild. We leverage the natural synchronization between vision and sound to learn an acoustic representation using two million unlabeled videos. Unlabeled video has the advantage that it can be economically acquired at massive scales, yet contains useful signals about natural sound. We propose a student-teacher training procedure that transfers discriminative visual knowledge from well-established visual recognition models into the sound modality, using unlabeled video as a bridge. Our sound representation yields significant performance improvements over state-of-the-art results on standard benchmarks for acoustic scene/object classification. Visualizations suggest that some high-level semantics automatically emerge in the sound network, even though it is trained without ground-truth labels.
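The student-teacher idea in the abstract can be illustrated with a small sketch. It assumes the teacher (a visual recognition model) and the student (the sound network) each emit class logits for the same video clip, and that the student is trained to match the teacher's output distribution by minimizing a KL divergence. The function names and the toy logits below are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
import math


def softmax(logits):
    """Convert raw scores into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(teacher_probs, student_probs, eps=1e-12):
    """KL(teacher || student): how far the student distribution is from the teacher's."""
    return sum(
        p * (math.log(p + eps) - math.log(q + eps))
        for p, q in zip(teacher_probs, student_probs)
    )


def distillation_loss(teacher_logits, student_logits):
    """Loss for one clip: the teacher sees the frames, the student hears the audio."""
    return kl_divergence(softmax(teacher_logits), softmax(student_logits))


# Hypothetical logits over three visual categories for a single unlabeled video.
teacher_logits = [2.0, 0.5, -1.0]            # assumed output of the visual teacher
matching = distillation_loss(teacher_logits, [2.0, 0.5, -1.0])
mismatched = distillation_loss(teacher_logits, [-1.0, 0.5, 2.0])
```

In this sketch the loss is zero when the student reproduces the teacher's distribution and grows as the two diverge, so no ground-truth labels are needed: the teacher's predictions on the video frames act as the training target for the audio.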

research
06/03/2017

See, Hear, and Read: Deep Aligned Representations

We capitalize on large amounts of readily-available, synchronous data to...
research
04/16/2019

Audio-Visual Model Distillation Using Acoustic Images

In this paper, we investigate how to learn rich and robust feature repre...
research
04/09/2018

The Sound of Pixels

We introduce PixelPlayer, a system that, by leveraging large amounts of ...
research
11/15/2019

Cross-modal supervised learning for better acoustic representations

Obtaining large-scale human-labeled datasets to train acoustic represent...
research
06/25/2018

Tracking Emerges by Colorizing Videos

We use large amounts of unlabeled video to learn models for visual track...
research
10/25/2019

Self-supervised Moving Vehicle Tracking with Stereo Sound

Humans are able to localize objects in the environment using both visual...
research
08/25/2016

Ambient Sound Provides Supervision for Visual Learning

The sound of crashing waves, the roar of fast-moving cars -- sound conve...
