SubSpectral Normalization for Neural Audio Data Processing

03/25/2021
by Simyung Chang, et al.

Convolutional Neural Networks (CNNs) are widely used across machine learning domains. In image processing, features are obtained by applying 2D convolution over all spatial dimensions of the input. In audio, however, frequency-domain inputs such as Mel-spectrograms have distinct characteristics along the frequency dimension. A method is therefore needed that lets a 2D convolution layer treat the frequency dimension differently. In this work, we introduce SubSpectral Normalization (SSN), which splits the input frequency dimension into several groups (sub-bands) and performs a separate normalization for each group. SSN also includes an affine transformation that can be applied to each group. Our method removes the inter-frequency deflection while the network learns frequency-aware characteristics. In experiments with audio data, we observed that SSN can efficiently improve the network's performance.
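The sub-band idea can be illustrated with a short NumPy sketch. This is not the authors' implementation: it assumes an input of shape (batch, channels, frequency, time), a frequency size divisible by the number of sub-bands, batch-norm-style statistics computed in training mode only (no running averages), and per-(channel, sub-band) affine parameters. The function name `subspectral_norm` and its parameters are hypothetical.

```python
import numpy as np


def subspectral_norm(x, num_subbands, gamma=None, beta=None, eps=1e-5):
    """Hypothetical sketch of SubSpectral Normalization (training mode only).

    x: array of shape (N, C, F, T) = (batch, channels, frequency, time).
    num_subbands: number of frequency groups S; F must be divisible by S.
    gamma, beta: assumed per-(channel, sub-band) affine parameters of
        shape (C, S); they default to the identity transform.
    """
    n, c, f, t = x.shape
    if f % num_subbands != 0:
        raise ValueError("frequency dimension must be divisible by num_subbands")
    if gamma is None:
        gamma = np.ones((c, num_subbands))
    if beta is None:
        beta = np.zeros((c, num_subbands))

    # Split the frequency axis into S contiguous sub-bands: (N, C, S, F/S, T).
    xs = x.reshape(n, c, num_subbands, f // num_subbands, t)

    # Statistics are computed separately for every (channel, sub-band) pair,
    # pooling over the batch, the frequency bins inside the sub-band, and time.
    mean = xs.mean(axis=(0, 3, 4), keepdims=True)
    var = xs.var(axis=(0, 3, 4), keepdims=True)
    x_hat = (xs - mean) / np.sqrt(var + eps)

    # Per-group affine transform: broadcast (C, S) to (1, C, S, 1, 1).
    out = x_hat * gamma[None, :, :, None, None] + beta[None, :, :, None, None]
    return out.reshape(n, c, f, t)


# Usage: a toy Mel-spectrogram batch whose lower 20 bins carry an offset.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 1, 40, 101))
x[:, :, :20] += 5.0
y = subspectral_norm(x, num_subbands=4)
# Each of the 4 sub-bands now has a mean close to 0 and a variance close to 1.
print(y.reshape(8, 1, 4, 10, 101).mean(axis=(0, 3, 4)))
```

With `num_subbands=1` the sketch reduces to ordinary per-channel batch normalization; larger values let each frequency sub-band keep its own statistics and its own learnable scale and shift.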

research
06/24/2022

Domain Generalization with Relaxed Instance Frequency-wise Normalization for Multi-device Acoustic Scene Classification

While using two-dimensional convolutional neural networks (2D-CNNs) in i...
research
06/20/2023

Frequency Channel Attention for computationally efficient sound event detection

We explore on various attention methods on frequency and channel dimensi...
research
12/27/2019

nnAudio: An on-the-fly GPU Audio to Spectrogram Conversion Toolbox Using 1D Convolution Neural Networks

Converting time domain waveforms to frequency domain spectrograms is typ...
research
06/20/2023

On Frequency-Wise Normalizations for Better Recording Device Generalization in Audio Spectrogram Transformers

Varying conditions between the data seen at training and at application ...
research
01/08/2022

A novel audio representation using space filling curves

Since convolutional neural networks (CNNs) have revolutionized the image...
research
05/04/2017

Pixel Normalization from Numeric Data as Input to Neural Networks

Text to image transformation for input to neural networks requires inter...
research
10/01/2020

Helicality: An Isomap-based Measure of Octave Equivalence in Audio Data

Octave equivalence serves as domain-knowledge in MIR systems, including ...