Low-Complexity Audio Embedding Extractors

03/03/2023
by Florian Schmid, et al.

Solving tasks such as speaker recognition, music classification, or semantic audio event tagging with deep learning models typically requires computationally demanding networks. General-purpose audio embeddings (GPAEs) are dense representations of audio signals that allow lightweight, shallow classifiers to tackle various audio tasks. The idea is that a single complex feature extractor extracts dense GPAEs, while shallow MLPs produce task-specific predictions. If the extracted dense representations are general enough to allow the simple downstream classifiers to generalize to a variety of tasks in the audio domain, a single costly forward pass suffices to solve multiple tasks in parallel. In this work, we try to reduce the cost of GPAE extractors to make them suitable for resource-constrained devices. As GPAE extractors, we use efficient MobileNets trained on AudioSet using Knowledge Distillation from a Transformer ensemble. We explore how to obtain high-quality GPAEs from the model, study how model complexity relates to the quality of extracted GPAEs, and conclude that low-complexity models can generate competitive GPAEs, paving the way for analyzing audio streams on edge devices with respect to multiple audio classification and recognition tasks.
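
The "one costly forward pass, many shallow heads" idea can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the extractor is a placeholder (mean pooling plus a random linear map), not the MobileNet from the paper; the embedding size, head sizes, class counts, and task names are hypothetical; the heads are untrained; and every head uses a softmax for simplicity, although tagging is usually treated as a multi-label problem.

```python
import numpy as np

rng = np.random.default_rng(0)

EMB_DIM = 64  # assumed embedding size, chosen only for illustration


def extract_embedding(mel_spec, proj):
    """Placeholder extractor: mean-pool over time, then linear map + ReLU.

    Stands in for the expensive model; NOT the MobileNet from the paper.
    """
    pooled = mel_spec.mean(axis=1)         # shape (n_mels,)
    return np.maximum(proj @ pooled, 0.0)  # shape (EMB_DIM,)


def make_head(emb_dim, hidden, n_classes):
    """Random (untrained) weights for a one-hidden-layer MLP head."""
    return {
        "w1": rng.standard_normal((hidden, emb_dim)) * 0.1,
        "b1": np.zeros(hidden),
        "w2": rng.standard_normal((n_classes, hidden)) * 0.1,
        "b2": np.zeros(n_classes),
    }


def head_forward(head, emb):
    """Shallow MLP: hidden ReLU layer followed by a softmax over classes."""
    h = np.maximum(head["w1"] @ emb + head["b1"], 0.0)
    logits = head["w2"] @ h + head["b2"]
    exp = np.exp(logits - logits.max())  # subtract max for numerical stability
    return exp / exp.sum()


# Hypothetical input: a mel spectrogram with 128 bands and 100 frames.
n_mels, n_frames = 128, 100
mel_spec = rng.standard_normal((n_mels, n_frames))
proj = rng.standard_normal((EMB_DIM, n_mels)) * 0.1

# One lightweight head per downstream task (class counts are made up).
heads = {
    "speaker_id": make_head(EMB_DIM, 32, 10),
    "music_genre": make_head(EMB_DIM, 32, 8),
    "event_tagging": make_head(EMB_DIM, 32, 5),
}

# The extractor runs once; every head reuses the same embedding.
embedding = extract_embedding(mel_spec, proj)
predictions = {task: head_forward(h, embedding) for task, h in heads.items()}
```

In this sketch the cost of adding a task is one extra small MLP, while the extractor cost is paid once per audio clip regardless of how many heads consume the embedding.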

research
06/30/2023

Audio Embeddings as Teachers for Music Classification

Music classification has been one of the most popular tasks in the field...
research
11/25/2022

Learning General Audio Representations with Large-Scale Training of Patchout Audio Transformers

The success of supervised deep learning methods is largely due to their ...
research
07/19/2022

GAFX: A General Audio Feature eXtractor

Most machine learning models for audio tasks are dealing with a handcraf...
research
09/30/2022

An empirical study of weakly supervised audio tagging embeddings for general audio representations

We study the usability of pre-trained weakly supervised audio tagging (A...
research
03/06/2022

HEAR 2021: Holistic Evaluation of Audio Representations

What audio embedding approach generalizes best to a wide range of downst...
research
02/22/2020

Multi-Representation Knowledge Distillation For Audio Classification

As an important component of multimedia analysis tasks, audio classifica...
research
11/26/2018

Combining High-Level Features of Raw Audio Waves and Mel-Spectrograms for Audio Tagging

In this paper, we describe our contribution to Task 2 of the DCASE 2018 ...
