Harmonicity Plays a Critical Role in DNN Based Versus in Biologically-Inspired Monaural Speech Segregation Systems

03/08/2022
by Rahil Parikh, et al.

Recent advancements in deep learning have led to drastic improvements in speech segregation models. Despite their success and growing applicability, few efforts have been made to analyze the underlying principles that these networks learn in order to perform segregation. Here we analyze the role of harmonicity in two state-of-the-art Deep Neural Network (DNN)-based models, Conv-TasNet and DPT-Net. We evaluate their performance on mixtures of natural speech versus slightly manipulated inharmonic speech, in which the harmonics are slightly jittered in frequency. We find that performance deteriorates significantly if even one source is slightly harmonically jittered; e.g., an imperceptible 3% harmonic jitter degrades the performance of Conv-TasNet from 15.4 dB to 0.70 dB. Training the model on inharmonic speech does not remedy this sensitivity and instead worsens performance on natural speech mixtures, making inharmonicity a powerful adversarial factor for DNN models. Furthermore, additional analyses reveal that DNN algorithms deviate markedly from biologically inspired algorithms, which rely primarily on timing cues rather than harmonicity to segregate speech.
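
To make "harmonically jittered" concrete, here is a minimal sketch on a synthetic complex tone rather than on natural speech. It assumes a simple jitter model that is not taken from the paper: harmonic k is moved from k·f0 to (k + δ_k)·f0, with δ_k drawn uniformly from [-j, j], so j = 0.03 corresponds to a jitter of at most 3% of f0. The function names `harmonic_frequencies` and `synthesize` are hypothetical illustrations, not the authors' code.

```python
import math
import random


def harmonic_frequencies(f0, n_harmonics, jitter=0.0, seed=None):
    """Return the frequencies of harmonics 1..n_harmonics.

    Assumed jitter model (illustrative only): harmonic k sits at
    (k + delta_k) * f0, with delta_k ~ Uniform(-jitter, jitter).
    With jitter=0 the result is the exact harmonic series k * f0.
    """
    rng = random.Random(seed)  # seeded generator for reproducible jitter
    return [(k + rng.uniform(-jitter, jitter)) * f0
            for k in range(1, n_harmonics + 1)]


def synthesize(freqs, duration=0.5, sample_rate=16000):
    """Sum equal-amplitude sinusoids at `freqs` and peak-normalise to 1.0."""
    n = int(duration * sample_rate)
    signal = [0.0] * n
    for f in freqs:
        w = 2.0 * math.pi * f / sample_rate  # angular step per sample
        for i in range(n):
            signal[i] += math.sin(w * i)
    peak = max(abs(x) for x in signal) or 1.0  # avoid dividing by zero
    return [x / peak for x in signal]


# Usage: a harmonic tone and a 3%-jittered tone with the same f0.
harmonic = synthesize(harmonic_frequencies(200.0, 20))
inharmonic = synthesize(harmonic_frequencies(200.0, 20, jitter=0.03, seed=0))
```

With these settings no harmonic moves by more than 6 Hz (3% of 200 Hz), which is why such a manipulation can be perceptually negligible while still breaking the exact integer-multiple structure that a model might rely on.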

research
10/04/2021

Towards efficient end-to-end speech recognition with biologically-inspired neural networks

Automatic speech recognition (ASR) is a capability which enables a progr...
research
03/23/2018

An improved DNN-based spectral feature mapping that removes noise and reverberation for robust automatic speech recognition

Reverberation and additive noise have detrimental effects on the perform...
research
04/12/2018

Global SNR Estimation of Speech Signals using Entropy and Uncertainty Estimates from Dropout Networks

This paper demonstrates two novel methods to estimate the global SNR of ...
research
06/15/2016

Multi-Modal Hybrid Deep Neural Network for Speech Enhancement

Deep Neural Networks (DNN) have been successful in enhancing noisy spe...
research
06/01/2018

DNN Based Speech Enhancement for Unseen Noises Using Monte Carlo Dropout

In this work, we propose the use of dropouts as a Bayesian estimator for...
research
09/14/2023

Analysis of Speech Separation Performance Degradation on Emotional Speech Mixtures

Despite recent strides made in Speech Separation, most models are traine...
research
08/28/2018

Using Monte Carlo dropout for non-stationary noise reduction from speech

In this work, we propose the use of dropout as a Bayesian estimator for ...