Performance Based Cost Functions for End-to-End Speech Separation

06/01/2018
by Shrikant Venkataramani, et al.

Recent neural network strategies for source separation attempt to model audio signals by processing their waveforms directly. Mean squared error (MSE), which is proportional to the squared Euclidean distance between the waveforms of denoised speech and ground-truth speech, has been a natural cost function for these approaches. However, MSE is not a perceptually motivated measure, so minimizing it can still leave large perceptual discrepancies. In this paper, we propose and experiment with new loss functions for end-to-end source separation. These loss functions are motivated by BSS_Eval and perceptual metrics such as source to distortion ratio (SDR), source to interference ratio (SIR), source to artifact ratio (SAR) and short-time objective intelligibility (STOI). This makes it possible to mix and match the loss functions depending on the requirements of the task. Subjective listening tests reveal that combinations of the proposed cost functions achieve better separation performance than stand-alone MSE and SDR costs.
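To make the idea of "mixing and matching" metric-based costs concrete, here is a minimal NumPy sketch. It is not the paper's exact formulation. Assumptions: the estimate is decomposed BSS_Eval-style using gain-only (time-invariant) projections rather than BSS_Eval's distortion filters; the negated SDR, SIR and SAR in dB serve as costs to minimize; and the function names and weights (`bss_decompose`, `combined_loss`, `w_sdr`, `w_sir`, `w_sar`, `w_mse`) are hypothetical. STOI is omitted, and training a network would require writing the same arithmetic with an autodiff framework's tensor ops.

```python
import numpy as np

EPS = 1e-8  # guards against division by zero and log of zero


def mse_loss(est, ref):
    """Mean squared error between estimated and reference waveforms."""
    return float(np.mean((est - ref) ** 2))


def bss_decompose(est, target, interferers):
    """Gain-only BSS_Eval-style decomposition (an assumption, not full BSS_Eval).

    est, target: (T,) waveforms; interferers: (K, T) other sources in the mix.
    Returns (s_target, e_interf, e_artif) with est = s_target + e_interf + e_artif.
    """
    # Orthogonal projection of the estimate onto the target source alone.
    s_target = (np.dot(est, target) / (np.dot(target, target) + EPS)) * target
    # Least-squares projection of the estimate onto the span of all sources.
    sources = np.vstack([target[None, :], interferers])  # (K + 1, T)
    coeffs, *_ = np.linalg.lstsq(sources.T, est, rcond=None)
    p_all = sources.T @ coeffs
    e_interf = p_all - s_target  # energy explained by the other sources
    e_artif = est - p_all        # energy explained by no source at all
    return s_target, e_interf, e_artif


def _db(num, den):
    """Energy ratio of two signals in decibels."""
    return 10.0 * np.log10((np.sum(num ** 2) + EPS) / (np.sum(den ** 2) + EPS))


def sdr_sir_sar(est, target, interferers):
    """SDR, SIR and SAR in dB from the gain-only decomposition."""
    s_t, e_i, e_a = bss_decompose(est, target, interferers)
    sdr = _db(s_t, e_i + e_a)
    sir = _db(s_t, e_i)
    sar = _db(s_t + e_i, e_a)
    return sdr, sir, sar


def combined_loss(est, target, interferers,
                  w_sdr=1.0, w_sir=0.0, w_sar=0.0, w_mse=0.0):
    """Hypothetical weighted cost: maximizing the ratios means minimizing their negatives."""
    sdr, sir, sar = sdr_sir_sar(est, target, interferers)
    return -(w_sdr * sdr + w_sir * sir + w_sar * sar) + w_mse * mse_loss(est, target)
```

Setting, for example, `w_sdr=0, w_sir=1, w_sar=1` would favor suppressing interference while penalizing artifacts, whereas `w_mse=1` alone falls back to the plain MSE cost.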

research
10/05/2018

End-to-end Networks for Supervised Single-channel Speech Separation

The performance of single channel source separation algorithms has impro...
research
02/16/2022

On loss functions and evaluation metrics for music source separation

We investigate which loss functions provide better separations via bench...
research
06/02/2023

Towards Robust FastSpeech 2 by Modelling Residual Multimodality

State-of-the-art non-autoregressive text-to-speech (TTS) models based on...
research
05/13/2023

A note on bounded distance-based information loss metrics for statistical disclosure control of numeric microdata

In the field of statistical disclosure control, the tradeoff between dat...
research
11/18/2019

Signal Clustering with Class-independent Segmentation

Radar signals have been dramatically increasing in complexity, limiting ...
research
10/23/2019

End-to-End Multi-Task Denoising for the Joint Optimization of Perceptual Speech Metrics

Although supervised learning based on a deep neural network has recently...
research
06/28/2018

Adversarial and Perceptual Refinement for Compressed Sensing MRI Reconstruction

Deep learning approaches have shown promising performance for compressed...