Simple Pooling Front-ends For Efficient Audio Classification

10/03/2022
by Xubo Liu, et al.

Recently, there has been increasing interest in building efficient audio neural networks for on-device scenarios. Most existing approaches reduce the size of audio neural networks using methods such as model pruning. In this work, we show that, instead of reducing model size with complex methods, eliminating the temporal redundancy in the input audio features (e.g., the mel-spectrogram) can be an effective approach to efficient audio classification. To do so, we propose a family of simple pooling front-ends (SimPFs), which use simple non-parametric pooling operations to reduce the redundant information within the mel-spectrogram. We perform extensive experiments on four audio classification tasks to evaluate the performance of SimPFs. Experimental results show that SimPFs can reduce the number of floating point operations (FLOPs) of off-the-shelf audio neural networks by more than half, with negligible degradation or even some improvement in audio classification performance.
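
The idea of a non-parametric pooling front-end can be sketched in a few lines. The example below is only an illustration under stated assumptions, not the authors' implementation: the function name `simple_pool_frontend`, the `factor` and `mode` parameters, the choice of average and max pooling, and the (frames, mel bins) array layout are all hypothetical. It shortens a mel-spectrogram along the time axis before the spectrogram would be passed to a classifier.

```python
import numpy as np


def simple_pool_frontend(mel, factor=2, mode="avg"):
    """Pool a mel-spectrogram along time with no learnable parameters.

    mel    : array of shape (n_frames, n_mels); this layout is an assumption.
    factor : hypothetical compression factor; every `factor` consecutive
             frames are merged into one.
    mode   : "avg" or "max"; illustrative choices of pooling operation.
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    n_frames, n_mels = mel.shape
    n_out = n_frames // factor
    # Drop trailing frames that do not fill a complete pooling window.
    blocks = mel[: n_out * factor].reshape(n_out, factor, n_mels)
    if mode == "avg":
        return blocks.mean(axis=1)
    if mode == "max":
        return blocks.max(axis=1)
    raise ValueError(f"unknown pooling mode: {mode!r}")


if __name__ == "__main__":
    # Random stand-in for a real mel-spectrogram: 1000 frames, 64 mel bins.
    mel = np.random.rand(1000, 64)
    pooled = simple_pool_frontend(mel, factor=2, mode="avg")
    print(mel.shape, "->", pooled.shape)
```

With `factor=2` the classifier sees half as many frames. For networks whose cost grows roughly in proportion to the input length, as is typical for convolutional models, this would be expected to cut the FLOPs by about half, which is the kind of saving the abstract reports.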

research
01/21/2021

LEAF: A Learnable Frontend for Audio Classification

Mel-filterbanks are fixed, engineered audio features which emulate human...
research
09/21/2023

Cluster-based pruning techniques for audio data

Deep learning models have become widely adopted in various domains, but ...
research
02/26/2019

Acoustic scene classification using multi-layer temporal pooling based on convolutional neural network

The temporal dynamics and the discriminative information in the audio si...
research
07/12/2022

EfficientLEAF: A Faster LEarnable Audio Frontend of Questionable Use

In audio classification, differentiable auditory filterbanks with few pa...
research
02/18/2021

Deep Neural Networks based Invisible Steganography for Audio-into-Image Algorithm

In the last few years, steganography has attracted increasing attention ...
research
03/29/2022

Visualizations of Complex Sequences of Family-Infant Vocalizations Using Bag-of-Audio-Words Approach Based on Wav2vec 2.0 Features

In the U.S., approximately 15-17 to have at least one diagnosed mental, ...
research
07/16/2018

Backward Reduction of CNN Models with Information Flow Analysis

This paper proposes backward reduction, an algorithm that explores the c...
