On Frequency-Wise Normalizations for Better Recording Device Generalization in Audio Spectrogram Transformers

06/20/2023
by   Paul Primus and, et al.
0

Varying conditions between the data seen at training and at application time remain a major challenge for machine learning. We study this problem in the context of Acoustic Scene Classification (ASC) with mismatching recording devices. Previous works successfully employed frequency-wise normalization of inputs and hidden layer activations in convolutional neural networks to reduce the recording device discrepancy. The main objective of this work was to adopt frequency-wise normalization for Audio Spectrogram Transformers (ASTs), which have recently become the dominant model architecture in ASC. To this end, we first investigate how recording device characteristics are encoded in the hidden layer activations of ASTs. We find that recording device information is initially encoded in the frequency dimension; however, after the first self-attention block, it is largely transformed into the token dimension. Based on this observation, we conjecture that suppressing recording device characteristics in the input spectrogram is the most effective. We propose a frequency-centering operation for spectrograms that improves the ASC performance on unseen recording devices on average by up to 18.2 percentage points.

READ FULL TEXT

page 1

page 2

research
05/12/2023

Device-Robust Acoustic Scene Classification via Impulse Response Augmentation

The ability to generalize to a wide range of recording devices is a cruc...
research
09/04/2019

Exploiting Parallel Audio Recordings to Enforce Device Invariance in CNN-based Acoustic Scene Classification

Distribution mismatches between the data seen at training and at applica...
research
06/24/2022

Domain Generalization with Relaxed Instance Frequency-wise Normalization for Multi-device Acoustic Scene Classification

While using two-dimensional convolutional neural networks (2D-CNNs) in i...
research
06/13/2023

Domain Information Control at Inference Time for Acoustic Scene Classification

Domain shift is considered a challenge in machine learning as it causes ...
research
03/25/2021

SubSpectral Normalization for Neural Audio Data Processing

Convolutional Neural Networks are widely used in various machine learnin...
research
11/12/2021

Domain Generalization on Efficient Acoustic Scene Classification using Residual Normalization

It is a practical research topic how to deal with multi-device audio inp...
research
07/19/2021

Over-Parameterization and Generalization in Audio Classification

Convolutional Neural Networks (CNNs) have been dominating classification...

Please sign up or login with your details

Forgot password? Click here to reset