A Multi-Head Relevance Weighting Framework For Learning Raw Waveform Audio Representations

07/30/2021
by Debottam Dutta, et al.

In this work, we propose a multi-head relevance weighting framework to learn audio representations from raw waveforms. The audio waveform, split into short-duration windows, is processed with a 1-D convolutional layer of cosine-modulated Gaussian filters acting as a learnable filterbank. The key novelty of the proposed framework is the introduction of multi-head relevance weighting on the learnt filterbank representations. Each head of the relevance network is modelled as a separate sub-network. These heads enhance the representation by generating weight masks for different parts of the time-frequency representation learnt by the parametric acoustic filterbank layer. The relevance-weighted representations are fed to a neural classifier, and the whole system is trained jointly for the audio classification objective. Experiments are performed on the DCASE2020 Task 1A challenge as well as the Urban Sound Classification (USC) task. In these experiments, the proposed approach yields relative improvements of 10% [...] on the DCASE2020 and USC datasets over the mel-spectrogram baseline. Also, the analysis of the multi-head relevance weights provides insights into the learned representations.
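The pipeline described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The kernel form (a cosine carrier under a Gaussian envelope), the log-energy pooling, the sigmoid heads, the split of filters into one contiguous band per head, and every name and parameter value below are assumptions made for illustration. The weights are random rather than trained, and the classifier is omitted.

```python
import numpy as np

def cosine_gaussian_kernels(center_hz, sigma, kernel_len, sample_rate):
    """Return (num_filters, kernel_len) cosine-modulated Gaussian kernels (assumed form)."""
    n = np.arange(kernel_len) - (kernel_len - 1) / 2.0  # sample index centred on zero
    center = np.asarray(center_hz, dtype=float)[:, None]
    sigma = np.asarray(sigma, dtype=float)[:, None]
    carrier = np.cos(2.0 * np.pi * center * n / sample_rate)
    envelope = np.exp(-0.5 * (n / sigma) ** 2)
    return carrier * envelope

def filterbank_features(window, kernels):
    """Convolve one short window with each kernel, then pool to log mean energy."""
    out = np.stack([np.convolve(window, k, mode="valid") for k in kernels])
    return np.log(np.mean(out ** 2, axis=1) + 1e-8)

def multi_head_relevance(tf_rep, heads):
    """Weight a (frames, filters) representation with one hypothetical head per band.

    Each head is a (weight, bias) pair: a linear layer followed by a sigmoid,
    producing a mask in (0, 1) for its own contiguous group of filters.
    """
    bands = np.array_split(np.arange(tf_rep.shape[1]), len(heads))
    weighted = np.empty_like(tf_rep)
    for idx, (weight, bias) in zip(bands, heads):
        band = tf_rep[:, idx]
        mask = 1.0 / (1.0 + np.exp(-(band @ weight + bias)))
        weighted[:, idx] = mask * band
    return weighted

rng = np.random.default_rng(0)
sample_rate = 16000
signal = rng.standard_normal(sample_rate)  # 1 s of noise standing in for audio

# Illustrative values only: 8 filters, 129-tap kernels, 25 ms windows, 10 ms hop.
kernels = cosine_gaussian_kernels(
    np.linspace(200.0, 7000.0, 8), np.full(8, 20.0), 129, sample_rate
)
win, hop = 400, 160
frames = [signal[s:s + win] for s in range(0, len(signal) - win + 1, hop)]
tf_rep = np.stack([filterbank_features(f, kernels) for f in frames])

# Two untrained heads, each covering 4 of the 8 filters.
heads = [(0.1 * rng.standard_normal((4, 4)), np.zeros(4)) for _ in range(2)]
weighted = multi_head_relevance(tf_rep, heads)
print(tf_rep.shape, weighted.shape)
```

In a trained system the filter centres, the head weights, and the classifier would all be updated jointly by backpropagation. Here they are fixed, so the sketch shows only the data flow from waveform windows to the relevance-weighted representation.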

research
10/29/2020

Interpretable Representation Learning for Speech and Audio Signals Based on Relevance Weighting

The learning of interpretable representations from raw data presents sig...
research
10/29/2020

Robust Raw Waveform Speech Recognition Using Relevance Weighted Representations

Speech recognition in noisy and channel distorted scenarios is often cha...
research
06/27/2022

Interpretable Acoustic Representation Learning on Breathing and Speech Signals for COVID-19 Detection

In this paper, we describe an approach for representation learning of au...
research
04/27/2023

XAI-based Comparison of Input Representations for Audio Event Classification

Deep neural networks are a promising tool for Audio Event Classification...
research
11/19/2021

Interpreting deep urban sound classification using Layer-wise Relevance Propagation

After constructing a deep neural network for urban sound classification,...
research
03/18/2023

Content Adaptive Front End For Audio Signal Processing

We propose a learnable content adaptive front end for audio signal proce...
research
12/04/2017

Raw Waveform-based Audio Classification Using Sample-level CNN Architectures

Music, speech, and acoustic scene sound are often handled separately in ...
