Interpretable Representation Learning for Speech and Audio Signals Based on Relevance Weighting

10/29/2020
by Purvi Agrawal, et al.

Learning interpretable representations from raw data presents significant challenges for time series data such as speech. In this work, we propose a relevance weighting scheme that allows the speech representations to be interpreted during the forward propagation of the model itself. The relevance weighting is achieved using a sub-network approach that performs the task of feature selection. A relevance sub-network, applied on the output of the first layer of a convolutional neural network model operating on raw speech signals, acts as an acoustic filterbank (FB) layer with relevance weighting. A similar relevance sub-network applied on the second convolutional layer performs modulation filterbank learning with relevance weighting. The full acoustic model, consisting of relevance sub-networks, convolutional layers and feed-forward layers, is trained for a speech recognition task on noisy and reverberant speech in the Aurora-4, CHiME-3 and VOiCES datasets. The proposed representation learning framework is also applied to the task of sound classification on the UrbanSound8K dataset. A detailed analysis of the relevance weights learned by the model reveals that they capture information regarding the underlying speech/audio content. In addition, speech recognition and sound classification experiments reveal that incorporating relevance weighting in the neural network architecture improves performance significantly.
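The idea of a relevance sub-network acting on filterbank outputs can be illustrated with a minimal NumPy sketch. The abstract does not specify the sub-network's architecture, so everything below is an assumption made for illustration: a small two-layer network with a `tanh` hidden layer that scores each sub-band, a softmax across bands that turns the scores into weights, and random stand-in values for both the filterbank output and the parameters. The names `relevance_weights` and `apply_relevance` are hypothetical and are not taken from the paper.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)


def relevance_weights(features, w1, b1, w2, b2):
    """Score each sub-band and normalize the scores across bands.

    features: (num_bands, num_frames) filterbank output for one patch.
    Returns a (num_bands,) vector of positive weights that sum to 1.
    The two-layer form and the softmax are assumptions for this sketch.
    """
    hidden = np.tanh(features @ w1 + b1)     # (num_bands, hidden_dim)
    scores = (hidden @ w2 + b2).squeeze(-1)  # (num_bands,)
    return softmax(scores, axis=0)


def apply_relevance(features, weights):
    """Scale every sub-band trajectory by its relevance weight."""
    return features * weights[:, None]


# Stand-in data: random values in place of a learned filterbank output.
rng = np.random.default_rng(0)
num_bands, num_frames, hidden_dim = 8, 20, 4
features = rng.standard_normal((num_bands, num_frames))

# Stand-in parameters: in a trained model these would be learned jointly
# with the rest of the acoustic model.
w1 = rng.standard_normal((num_frames, hidden_dim)) * 0.1
b1 = np.zeros(hidden_dim)
w2 = rng.standard_normal((hidden_dim, 1)) * 0.1
b2 = np.zeros(1)

weights = relevance_weights(features, w1, b1, w2, b2)
weighted = apply_relevance(features, weights)
```

Because the weights are computed in the forward pass, they can be read out per input and inspected directly, which is the sense in which such a layer is interpretable. The same pattern could in principle be applied to the output of a second convolutional layer to weight modulation filters, again under the assumptions stated above.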

research
10/29/2020

Robust Raw Waveform Speech Recognition Using Relevance Weighted Representations

Speech recognition in noisy and channel distorted scenarios is often cha...
research
07/30/2021

A Multi-Head Relevance Weighting Framework For Learning Raw Waveform Audio Representations

In this work, we propose a multi-head relevance weighting framework to l...
research
06/27/2022

Interpretable Acoustic Representation Learning on Breathing and Speech Signals for COVID-19 Detection

In this paper, we describe an approach for representation learning of au...
research
04/19/2021

Interpreting intermediate convolutional layers of CNNs trained on raw speech

This paper presents a technique to interpret and visualize intermediate ...
research
11/19/2021

Interpreting deep urban sound classification using Layer-wise Relevance Propagation

After constructing a deep neural network for urban sound classification,...
research
03/11/2023

Explainable AI for Time Series via Virtual Inspection Layers

The field of eXplainable Artificial Intelligence (XAI) has greatly advan...
research
09/12/2015

Double Relief with progressive weighting function

Feature weighting algorithms try to solve a problem of great importance ...
