Learning Complex Basis Functions for Invariant Representations of Audio

07/13/2019
by Stefan Lattner, et al.

Learning features from data has been shown to be more successful than using hand-crafted features for many machine learning tasks. In music information retrieval (MIR), features learned from windowed spectrograms are highly variant to transformations like transposition or time-shift. Such variances are undesirable when they are irrelevant for the respective MIR task. We propose an architecture called Complex Autoencoder (CAE), which learns features invariant to orthogonal transformations. Mapping signals onto complex basis functions learned by the CAE results in a transformation-invariant "magnitude space" and a transformation-variant "phase space". The phase space is useful for inferring transformations between data pairs. By exploiting the invariance property of the magnitude space, we achieve state-of-the-art results in audio-to-score alignment and repeated section discovery for audio. A PyTorch implementation of the CAE, including the repeated section discovery method, is available online.
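
The split into a magnitude space and a phase space can be illustrated with a fixed complex basis. The sketch below is not the CAE: it assumes the discrete Fourier basis as a stand-in for learned basis functions and a circular shift as the orthogonal transformation. The helper names (`dft`, `circular_shift`) and the example signal are hypothetical and chosen only for illustration.

```python
import cmath
import math


def dft(signal):
    """Project a real signal onto the complex Fourier basis functions."""
    n = len(signal)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))
        for k in range(n)
    ]


def circular_shift(signal, s):
    """Shift by s positions with wrap-around (an orthogonal transformation)."""
    s %= len(signal)
    return signal[-s:] + signal[:-s] if s else list(signal)


# Illustrative signal and a shifted copy of it (assumed data, not from the paper).
x = [0.0, 1.0, 3.0, 2.0, 0.5, 0.0, -1.0, 0.25]
y = circular_shift(x, 3)

X, Y = dft(x), dft(y)

# Magnitude space: the same for both signals, so it is invariant to the shift.
magnitudes_match = all(
    math.isclose(abs(a), abs(b), abs_tol=1e-9) for a, b in zip(X, Y)
)

# Phase space: the phase difference at bin 1 encodes the shift,
# because Y[k] = X[k] * exp(-2*pi*i*k*s/n) for a circular shift by s.
phase_diff = cmath.phase(Y[1] / X[1])
estimated_shift = round(-phase_diff * len(x) / (2 * math.pi)) % len(x)

print(magnitudes_match, estimated_shift)
```

In this toy setting the magnitudes serve as the transformation-invariant features, while the phase difference recovers the transformation relating the two signals. The CAE described above learns its basis functions from data rather than fixing them in advance.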

research
06/21/2018

Learning Transposition-Invariant Interval Features from Symbolic Music and Audio

Many music theoretical constructs (such as scale types, modes, cadences,...
research
07/19/2018

Audio-to-Score Alignment using Transposition-invariant Features

Audio-to-score alignment is an important pre-processing step for in-dept...
research
04/01/2014

A Deep Representation for Invariance And Music Classification

Representations in the auditory cortex might be based on mechanisms simi...
research
02/05/2020

Enhancing Feature Invariance with Learned Image Transformations for Image Retrieval

Off-the-shelf convolutional neural network features achieve state-of-the...
research
06/26/2022

Exploiting Transformation Invariance and Equivariance for Self-supervised Sound Localisation

We present a simple yet effective self-supervised framework for audio-vi...
research
04/15/2019

Are Nearby Neighbors Relatives?: Diagnosing Deep Music Embedding Spaces

Deep neural networks have frequently been used to directly learn represe...
research
08/16/2018

Learning Invariances using the Marginal Likelihood

Generalising well in supervised learning tasks relies on correctly extra...