Optimization of Speaker Extraction Neural Network with Magnitude and Temporal Spectrum Approximation Loss

03/24/2019
by   Chenglin Xu, et al.

The SpeakerBeam-FE (SBF) method has been proposed for speaker extraction. It attempts to overcome the problem of an unknown number of speakers in an audio recording during source separation. However, the mask approximation loss of SBF is sub-optimal because it neither computes the signal reconstruction error directly nor considers the speech context. To address these problems, this paper proposes a magnitude and temporal spectrum approximation loss to estimate a phase-sensitive mask for the target speaker, conditioned on the speaker's characteristics. Moreover, this paper explores a concatenation framework, instead of the context adaptive deep neural network used in the SBF method, to encode a speaker embedding into the mask estimation network. Experimental results under the open evaluation condition show that the proposed method improves on the SBF baseline in both signal-to-distortion ratio (SDR) and perceptual evaluation of speech quality (PESQ), with an improvement of 70.4% in SDR. A further analysis shows improvements for both different-gender and same-gender mixtures, with 69.1% in the different-gender case.
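
The abstract does not state the loss formula or the network layout, so the following NumPy code is only a rough sketch of how such a loss and the concatenation step might look, not the authors' implementation. It assumes the magnitude term is a mean squared error between the masked mixture magnitude and a phase-sensitive target, and that the temporal term applies the same error to first-order and second-order frame differences, used here as a simple stand-in for dynamic features. The function names, the weights `w_delta` and `w_accel`, and the array shapes are all hypothetical.

```python
import numpy as np


def phase_sensitive_target(target_mag, target_phase, mix_phase):
    """Target magnitude projected onto the mixture phase: |S| * cos(theta_y - theta_s)."""
    return target_mag * np.cos(mix_phase - target_phase)


def frame_delta(spec):
    """First-order difference along the time axis (axis 0).

    Assumption: a plain frame difference stands in for delta features.
    """
    return spec[1:] - spec[:-1]


def mtsa_loss_sketch(mask, mix_mag, target_mag, target_phase, mix_phase,
                     w_delta=1.0, w_accel=1.0):
    """Magnitude term plus temporal terms; w_delta and w_accel are hypothetical weights."""
    est = mask * mix_mag  # estimated target magnitude, shape (T, F)
    ref = phase_sensitive_target(target_mag, target_phase, mix_phase)

    # Magnitude term: direct reconstruction error on the spectrum.
    mag_term = np.mean((est - ref) ** 2)

    # Temporal terms: the same error on frame-to-frame changes.
    d_est, d_ref = frame_delta(est), frame_delta(ref)
    delta_term = np.mean((d_est - d_ref) ** 2)
    accel_term = np.mean((frame_delta(d_est) - frame_delta(d_ref)) ** 2)

    return mag_term + w_delta * delta_term + w_accel * accel_term


def concat_speaker_embedding(features, embedding):
    """Repeat a (D,) speaker embedding over T frames and append it to (T, F) features."""
    tiled = np.tile(embedding, (features.shape[0], 1))  # shape (T, D)
    return np.concatenate([features, tiled], axis=1)    # shape (T, F + D)
```

In this sketch, a mask that reproduces the phase-sensitive target exactly gives a loss of zero, and the concatenated input simply carries the same speaker embedding on every frame before it enters the mask estimation network.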


