Deep neural network based speech separation optimizing an objective estimator of intelligibility for low latency applications

07/18/2018
by Gaurav Naithani, et al.

Mean square error (MSE) has been the preferred loss function in current deep neural network (DNN) based speech separation techniques. In this paper, we propose a new cost function that aims to optimize the extended short-time objective intelligibility (ESTOI) measure. We focus on applications where low algorithmic latency (≤ 10 ms) is important. We use long short-term memory (LSTM) networks and evaluate the proposed approach on four sets of two-speaker mixtures from the extended Danish hearing in noise test (HINT) dataset. We show that the proposed loss function can offer improved or on-par objective intelligibility (in terms of ESTOI) compared to an MSE-optimized baseline, while resulting in lower objective separation performance (in terms of the source-to-distortion ratio (SDR)). We then propose an approach where the network is first initialized with weights optimized for the MSE criterion and then trained with the proposed ESTOI loss criterion. This approach mitigates some of the losses in objective separation performance while preserving the gains in objective intelligibility.
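The central idea is a loss built from an intelligibility score rather than from MSE. The following is a minimal, simplified NumPy sketch of an ESTOI-style score, not the authors' implementation. It assumes the inputs are already one-third-octave band magnitude envelopes of shape (bands, frames). The band decomposition, resampling, and silent-frame removal of the full measure are omitted. The function names `estoi_like_score` and `estoi_like_loss` are hypothetical. The 30-frame segment length follows the usual ESTOI configuration and is an assumption here, not a setting taken from the paper. Within each sliding segment, rows (bands over time) and then columns (frames across bands) are mean- and norm-normalized. The score is the average inner product of corresponding columns, so a value of 1 means the envelope patterns match perfectly.

```python
import numpy as np


def _normalize(m, axis):
    """Subtract the mean and scale to unit Euclidean norm along `axis`."""
    m = m - m.mean(axis=axis, keepdims=True)
    norm = np.linalg.norm(m, axis=axis, keepdims=True)
    return m / (norm + 1e-8)  # small constant avoids division by zero


def estoi_like_score(clean_env, est_env, seg_len=30):
    """Simplified ESTOI-style score (hypothetical helper).

    clean_env, est_env: arrays of shape (bands, frames), assumed to be
    precomputed one-third-octave band envelopes. Band decomposition and
    silent-frame removal are NOT performed here.
    """
    bands, frames = clean_env.shape
    if frames < seg_len:
        raise ValueError("need at least seg_len frames")
    scores = []
    # Sliding segments of seg_len frames with a hop of one frame.
    for start in range(frames - seg_len + 1):
        s = clean_env[:, start:start + seg_len]
        x = est_env[:, start:start + seg_len]
        # Row normalization (each band over time), then column
        # normalization (each frame across bands).
        s_t = _normalize(_normalize(s, axis=1), axis=0)
        x_t = _normalize(_normalize(x, axis=1), axis=0)
        # Average inner product between corresponding columns.
        scores.append(np.sum(s_t * x_t) / seg_len)
    return float(np.mean(scores))


def estoi_like_loss(clean_env, est_env, seg_len=30):
    """Loss to minimize: 0 when the envelope patterns match perfectly."""
    return 1.0 - estoi_like_score(clean_env, est_env, seg_len)


# Usage with hypothetical random envelopes (15 bands, 60 frames).
rng = np.random.default_rng(0)
clean = rng.random((15, 60))
noisy = clean + 0.5 * rng.random((15, 60))
print(estoi_like_score(clean, clean))  # close to 1.0
print(estoi_like_score(clean, noisy))
```

Every step is a mean, norm, product, or sum, so the same computation could be written in an automatic-differentiation framework and `1 - score` used as a training loss on network outputs. Under the two-stage scheme described above, such a loss would replace MSE only after MSE-based initialization. The network, latency-constrained framing, and training loop are not shown.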
