LEAF: A Learnable Frontend for Audio Classification

01/21/2021
by Neil Zeghidour, et al.

Mel-filterbanks are fixed, engineered audio features which emulate human perception and have been used throughout the history of audio understanding, up to today. However, their undeniable qualities are counterbalanced by the fundamental limitations of handmade representations. In this work we show that we can train a single learnable frontend that outperforms mel-filterbanks on a wide range of audio signals, including speech, music, audio events and animal sounds, providing a general-purpose learned frontend for audio classification. To do so, we introduce a new principled, lightweight, fully learnable architecture that can be used as a drop-in replacement for mel-filterbanks. Our system learns all operations of audio feature extraction, from filtering to pooling, compression and normalization, and can be integrated into any neural network at a negligible parameter cost. We perform multi-task training on eight diverse audio classification tasks, and show consistent improvements of our model over mel-filterbanks and previous learnable alternatives. Moreover, our system outperforms the current state-of-the-art learnable frontend on AudioSet, with orders of magnitude fewer parameters.
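The abstract names the stages the frontend learns: filtering, pooling, compression and normalization. The sketch below runs a forward pass through such a pipeline in plain Python so each stage can be inspected. It is loosely modeled on components described in the full paper (complex Gabor band-pass filters initialized on the mel scale, Gaussian low-pass pooling, PCEN-style compression), but it is not the authors' implementation. The final per-channel mean/variance normalization, every function name, all constants and the toy sizes are assumptions chosen for illustration and speed. Nothing is trained here (there are no gradients); the parameters are simply held at their initial values.

```python
import cmath
import math

# Hypothetical forward-pass sketch of a learnable-style audio frontend.
# All names and constants are illustrative assumptions, not the paper's code.


def hz_to_mel(f_hz):
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)  # HTK-style mel scale


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_init(n_filters, sample_rate, f_min=60.0):
    """Mel-spaced (center, width) pairs: center in cycles/sample, width in samples."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(sample_rate / 2.0)
    edges = [mel_to_hz(lo + (hi - lo) * i / (n_filters + 1)) for i in range(n_filters + 2)]
    params = []
    for n in range(n_filters):
        half_band_hz = (edges[n + 2] - edges[n]) / 2.0
        # A Gaussian envelope with time std sigma (samples) has frequency std
        # sample_rate / (2*pi*sigma) Hz; match it to the band half-width.
        sigma = sample_rate / (2.0 * math.pi * half_band_hz)
        params.append((edges[n + 1] / sample_rate, sigma))
    return params


def gabor_taps(eta, sigma, half_len):
    """Complex Gabor impulse response: normalized Gaussian times a complex tone."""
    return [
        cmath.exp(2j * math.pi * eta * k)
        * math.exp(-0.5 * (k / sigma) ** 2) / (math.sqrt(2.0 * math.pi) * sigma)
        for k in range(-half_len, half_len + 1)
    ]


def conv_same(x, taps):
    """Direct 'same'-length convolution with zero padding."""
    half = len(taps) // 2
    out = []
    for i in range(len(x)):
        acc = 0j
        for k, h in enumerate(taps):
            j = i + half - k
            if 0 <= j < len(x):
                acc += h * x[j]
        out.append(acc)
    return out


def gaussian_pool(energy, width, stride, half_len):
    """Gaussian low-pass over time, then keep one value every `stride` samples."""
    window = [math.exp(-0.5 * (k / width) ** 2) for k in range(-half_len, half_len + 1)]
    total = sum(window)
    window = [w / total for w in window]
    frames = []
    for center in range(0, len(energy), stride):
        acc = 0.0
        for k, w in enumerate(window):
            j = center + k - half_len
            if 0 <= j < len(energy):
                acc += w * energy[j]
        frames.append(acc)
    return frames


def pcen(frames, alpha=0.96, delta=2.0, r=0.5, s=0.04, eps=1e-6):
    """PCEN-style compression: divide by a smoothed energy, then root-compress."""
    smooth = frames[0]
    out = []
    for e in frames:
        smooth = (1.0 - s) * smooth + s * e  # first-order IIR smoother
        out.append((e / (eps + smooth) ** alpha + delta) ** r - delta ** r)
    return out


def instance_norm(frames, eps=1e-5):
    """Per-channel zero-mean, unit-variance normalization (assumed final step)."""
    mean = sum(frames) / len(frames)
    var = sum((v - mean) ** 2 for v in frames) / len(frames)
    return [(v - mean) / math.sqrt(var + eps) for v in frames]


def leaf_like_frontend(signal, sample_rate=16000, n_filters=4,
                       half_len=50, pool_width=20.0, stride=80):
    features = []
    for eta, sigma in mel_init(n_filters, sample_rate):
        sigma = min(sigma, half_len / 2.0)  # keep the envelope inside the window
        energy = [abs(v) ** 2 for v in conv_same(signal, gabor_taps(eta, sigma, half_len))]
        pooled = gaussian_pool(energy, pool_width, stride, half_len)
        features.append(instance_norm(pcen(pooled)))
    return features  # n_filters channels, each a list of frames


sr = 16000
tone = [math.sin(2 * math.pi * 440.0 * t / sr) for t in range(800)]
feats = leaf_like_frontend(tone, sr)
print(len(feats), len(feats[0]))  # 4 10
```

In a trainable version, the per-channel filter centers and widths, pooling widths and compression constants would become parameters updated by backpropagation. That amounts to only a few scalars per channel, which is consistent with the abstract's claim of a negligible parameter cost.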

