Leveraging Neural Representations for Audio Manipulation

04/10/2023
by Scott H. Hawley, et al.

We investigate applying audio manipulations using pretrained neural network-based autoencoders as an alternative to traditional signal processing methods, since the former may provide greater semantic or perceptual organization. To assess the potential of this approach, we first determine whether representations from these models encode information about manipulations. We carry out experiments and produce visualizations using representations from two different pretrained autoencoders. Our findings indicate that, while some information about audio manipulations is encoded, this information is both limited and encoded in a non-trivial way. Our attempts to visualize these representations support this: the trajectories of representations under common manipulations are typically nonlinear and content-dependent, even for linear signal manipulations. As a result, it is not yet clear how these pretrained autoencoders can be used to manipulate audio signals; however, our results indicate this may be due to a lack of disentanglement with respect to common audio manipulations.
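The abstract's key check, whether a linear signal manipulation traces a straight line in representation space, can be sketched with a toy setup. This is not the authors' code or models. The two encoders below are hypothetical stand-ins for a pretrained autoencoder's encoder: a fixed random projection with and without a saturating `tanh`. A gain sweep stands in for "a common linear manipulation". The linearity score, the fraction of trajectory variance captured by the first principal component, is an assumed metric chosen for illustration.

```python
import numpy as np


def trajectory_linearity(points: np.ndarray) -> float:
    """Fraction of variance captured by the first principal component.

    `points` has shape (num_steps, latent_dim), one latent vector per
    manipulation setting. A value of 1.0 means the trajectory is a straight line.
    """
    centered = points - points.mean(axis=0, keepdims=True)
    # Squared singular values of the centered data are proportional to PCA variances.
    s = np.linalg.svd(centered, compute_uv=False)
    var = s ** 2
    return float(var[0] / var.sum())


def gain_sweep(encode, signal: np.ndarray, gains: np.ndarray) -> np.ndarray:
    """Encode `signal` after applying each gain; returns (len(gains), latent_dim)."""
    return np.stack([encode(g * signal) for g in gains])


rng = np.random.default_rng(0)
n_samples, latent_dim = 256, 16

# Hypothetical stand-in encoders (NOT real pretrained models): the same fixed
# random projection, with and without a saturating nonlinearity.
W = rng.standard_normal((latent_dim, n_samples)) / np.sqrt(n_samples)


def linear_encoder(x: np.ndarray) -> np.ndarray:
    return W @ x


def tanh_encoder(x: np.ndarray) -> np.ndarray:
    return np.tanh(W @ x)


signal = rng.standard_normal(n_samples)  # placeholder "audio" frame
gains = np.linspace(0.05, 5.0, 50)       # linear manipulation: amplitude scaling

linear_ratio = trajectory_linearity(gain_sweep(linear_encoder, signal, gains))
tanh_ratio = trajectory_linearity(gain_sweep(tanh_encoder, signal, gains))
print(f"linear encoder: {linear_ratio:.6f}")
print(f"tanh encoder:   {tanh_ratio:.6f}")
```

With the linear stand-in, every latent is a scaled copy of the same vector, so the score is 1 up to rounding. The saturating stand-in bends the trajectory and scores below 1, even though the input manipulation is linear. A real pretrained encoder could be passed as `encode`, though actual models typically consume whole clips rather than a single frame.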
