Learning the Spectrogram Temporal Resolution for Audio Classification

10/04/2022
by   Haohe Liu, et al.

The audio spectrogram is a time-frequency representation that is widely used for audio classification. The temporal resolution of a spectrogram depends on the hop size. Previous works generally assume that the hop size should be a constant, such as ten milliseconds. However, a fixed hop size, and hence a fixed resolution, is not always optimal for different types of sound. This paper proposes a novel method, DiffRes, that enables differentiable temporal resolution learning to improve the performance of audio classification models. Given a spectrogram calculated with a fixed hop size, DiffRes merges non-essential time frames while preserving important frames. DiffRes acts as a "drop-in" module between an audio spectrogram and a classifier, and can be optimized end to end. We evaluate DiffRes on the mel-spectrogram, followed by state-of-the-art classifier backbones, and apply it to five different subtasks. Compared with using the fixed-resolution mel-spectrogram, the DiffRes-based method can achieve the same or better classification accuracy with at least 25% fewer temporal dimensions on the feature level, which also reduces the computational cost. Starting from a high-temporal-resolution spectrogram, such as one computed with a one-millisecond hop size, we show that DiffRes can improve classification accuracy at the same computational complexity.
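To make the link between hop size and temporal dimensions concrete, the following is a minimal arithmetic sketch, not part of the paper's method. The helper names `approx_num_frames` and `reduced_frames` are hypothetical, as are the 10-second clip length and the use of a 25% reduction as the example figure. The frame count is approximated as one frame per hop, ignoring edge padding.

```python
def approx_num_frames(duration_s: float, hop_ms: float) -> int:
    """Approximate number of spectrogram frames: one frame per hop.

    Ignores window length and edge padding, so real STFT
    implementations may differ by a frame or two.
    """
    return round(duration_s * 1000.0 / hop_ms)


def reduced_frames(n_frames: int, reduction: float) -> int:
    """Frame count left after removing a given fraction of temporal dimensions."""
    return round(n_frames * (1.0 - reduction))


# Hypothetical 10-second clip (an assumption for illustration only).
clip_seconds = 10.0

frames_10ms = approx_num_frames(clip_seconds, hop_ms=10.0)  # common fixed hop size
frames_1ms = approx_num_frames(clip_seconds, hop_ms=1.0)    # high temporal resolution

# Assumed 25% reduction in temporal dimensions, applied to the 10 ms case.
frames_after_merge = reduced_frames(frames_10ms, reduction=0.25)

print(frames_10ms, frames_1ms, frames_after_merge)
```

Under these assumptions, a one-millisecond hop yields ten times as many frames as a ten-millisecond hop, which is why merging frames before the classifier matters for computational cost.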


