Unsupervised Learning of Semantic Audio Representations

11/06/2017
by   Aren Jansen, et al.
0

Even in the absence of any explicit semantic annotation, vast collections of audio recordings provide valuable information for learning the categorical structure of sounds. We consider several class-agnostic semantic constraints that apply to unlabeled nonspeech audio: (i) noise and translations in time do not change the underlying sound category, (ii) a mixture of two sound events inherits the categories of the constituents, and (iii) the categories of events in close temporal proximity are likely to be the same or related. Without labels to ground them, these constraints are incompatible with classification loss functions. However, they may still be leveraged to identify geometric inequalities needed for triplet loss-based training of convolutional neural networks. The result is low-dimensional embeddings of the input spectrograms that recover 41 counterparts when applied to downstream query-by-example sound retrieval and sound event classification tasks, respectively. Moreover, in limited-supervision settings, our unsupervised embeddings double the state-of-the-art classification performance.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
06/21/2023

A Multimodal Prototypical Approach for Unsupervised Sound Classification

In the context of environmental sound classification, the adaptability o...
research
11/14/2019

Coincidence, Categorization, and Consolidation: Learning to Recognize Sounds with Minimal Supervision

Humans do not acquire perceptual abilities in the way we train machines....
research
08/30/2021

Unsupervised Learning of Deep Features for Music Segmentation

Music segmentation refers to the dual problem of identifying boundaries ...
research
02/20/2020

Multi-label Sound Event Retrieval Using a Deep Learning-based Siamese Structure with a Pairwise Presence Matrix

Realistic recordings of soundscapes often have multiple sound events co-...
research
04/26/2021

Identifying Actions for Sound Event Classification

In Psychology, actions are paramount for humans to perceive and separate...
research
05/05/2021

Self-Supervised Learning from Automatically Separated Sound Scenes

Real-world sound scenes consist of time-varying collections of sound sou...
research
11/01/2017

Reducing Model Complexity for DNN Based Large-Scale Audio Classification

Audio classification is the task of identifying the sound categories tha...

Please sign up or login with your details

Forgot password? Click here to reset