LEAN: Light and Efficient Audio Classification Network

05/22/2023
by   Shwetank Choudhary, et al.

Over the past few years, audio classification on large-scale datasets such as AudioSet has been an important research area. Several deep convolution-based neural networks have shown compelling performance, notably VGGish, YAMNet, and the Pretrained Audio Neural Network (PANN). These models are available as pretrained architectures for transfer learning as well as for adaptation to specific audio tasks. In this paper, we propose LEAN, a lightweight on-device deep learning-based model for audio classification. LEAN consists of a raw waveform-based temporal feature extractor called the Wave Encoder and a log-mel-based pretrained YAMNet. We show that combining a trainable Wave Encoder and a pretrained YAMNet with cross-attention-based temporal realignment results in competitive performance on downstream audio classification tasks with a smaller memory footprint, making it suitable for resource-constrained devices such as mobile and edge devices. Our proposed system achieves an on-device mean average precision (mAP) of 0.445 with a memory footprint of a mere 4.5 MB on the FSD50K dataset, which is an improvement of 22
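To make the cross-attention idea in the abstract more concrete, the following is a minimal NumPy sketch of one way two feature sequences with different frame rates could be aligned. It is not the authors' implementation: the function `cross_attention`, the choice of the log-mel branch as the query, the random projection matrices, and all shapes (50 wave frames of 64 dimensions, 10 embedding frames of 1024 dimensions, a 32-dimensional attention space) are assumptions chosen purely for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query_seq, context_seq, w_q, w_k, w_v):
    """Scaled dot-product cross attention (hypothetical helper).

    Each query frame attends over all context frames, so the output
    has one row per query frame regardless of the context length.
    """
    q = query_seq @ w_q                        # (T_q, d)
    k = context_seq @ w_k                      # (T_c, d)
    v = context_seq @ w_v                      # (T_c, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])    # (T_q, T_c)
    weights = softmax(scores, axis=-1)         # each row sums to 1
    return weights @ v, weights

rng = np.random.default_rng(0)

# Stand-ins for the two branches (random data, assumed shapes).
wave_feats = rng.normal(size=(50, 64))       # assumed wave-encoder output
logmel_feats = rng.normal(size=(10, 1024))   # assumed pretrained-embedding output

d = 32  # assumed shared attention dimension
w_q = rng.normal(size=(1024, d)) * 0.02
w_k = rng.normal(size=(64, d)) * 0.1
w_v = rng.normal(size=(64, d)) * 0.1

# Assumption: the log-mel frames act as queries over the wave frames,
# so the wave features are resampled onto the log-mel time axis.
aligned, weights = cross_attention(logmel_feats, wave_feats, w_q, w_k, w_v)

# Clip-level representation by averaging over time (also an assumption).
clip_vector = aligned.mean(axis=0)
```

In this sketch the attention weights form a soft alignment between the 10 query frames and the 50 context frames, which is one plausible reading of "temporal realignment"; in a trained model the projection matrices would be learned rather than random.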

research
12/21/2019

PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition

Audio pattern recognition is an important research topic in the machine ...
research
07/19/2022

GAFX: A General Audio Feature eXtractor

Most machine learning models for audio tasks are dealing with a handcraf...
research
12/18/2020

Transfer Learning Based Automatic Model Creation Tool For Resource Constraint Devices

With the enhancement of Machine Learning, many tools are being designed ...
research
10/06/2022

Matching Text and Audio Embeddings: Exploring Transfer-learning Strategies for Language-based Audio Retrieval

We present an analysis of large-scale pretrained deep learning models us...
research
04/30/2023

Transformer-based Sequence Labeling for Audio Classification based on MFCCs

Audio classification is vital in areas such as speech and music recognit...
research
11/09/2020

Improved Soccer Action Spotting using both Audio and Video Streams

In this paper, we propose a study on multi-modal (audio and video) actio...
research
09/14/2023

DDSP-based Neural Waveform Synthesis of Polyphonic Guitar Performance from String-wise MIDI Input

We explore the use of neural synthesis for acoustic guitar from string-w...
