SpEx+: A Complete Time Domain Speaker Extraction Network

05/10/2020
by Meng Ge, et al.

Speaker extraction aims to extract the target speech signal from a multi-talker environment given a reference speech sample of the target speaker. We recently proposed a time-domain solution, SpEx, that avoids the phase estimation required by frequency-domain approaches. Unfortunately, SpEx is not a fully time-domain solution: it performs time-domain speech encoding for speaker extraction while taking a frequency-domain speaker embedding as the reference. The analysis windows for the time-domain and frequency-domain inputs also differ in size. Such a mismatch has an adverse effect on system performance. To eliminate it, we propose a complete time-domain speaker extraction solution called SpEx+. Specifically, we tie the weights of two identical speech encoder networks, one in the encoder-extractor-decoder pipeline and the other as part of the speaker encoder. Experiments show that SpEx+ achieves 0.8 dB and 2.1 dB SDR improvements over the state-of-the-art SpEx baseline under different-gender and same-gender conditions, respectively, on the WSJ0-2mix-extr database.
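The weight tying described above can be pictured with a minimal, pure-Python sketch. It is not the authors' implementation: the class names `SpeechEncoder` and `TiedFrontEnd`, the single analysis window (`win=20`, `stride=10`), the random filter weights, and the average-pooled speaker embedding are all hypothetical simplifications. What it illustrates is that when both branches hold one shared encoder, the mixture and the reference speech are analyzed with identical filters and the same window size, which removes the kind of mismatch the abstract describes.

```python
import random
from typing import List


class SpeechEncoder:
    """Hypothetical stand-in for a time-domain speech encoder: a strided 1-D
    convolution (window `win`, hop `stride`) followed by ReLU. Weights are
    random here; a real model would learn them."""

    def __init__(self, num_filters: int, win: int, stride: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.win = win
        self.stride = stride
        self.filters = [[rng.uniform(-1.0, 1.0) for _ in range(win)]
                        for _ in range(num_filters)]

    def __call__(self, wave: List[float]) -> List[List[float]]:
        frames = []
        for start in range(0, len(wave) - self.win + 1, self.stride):
            seg = wave[start:start + self.win]
            # One output value per filter: dot product with the frame, then ReLU.
            frames.append([max(0.0, sum(w * x for w, x in zip(f, seg)))
                           for f in self.filters])
        return frames  # shape: num_frames x num_filters


class TiedFrontEnd:
    """Both branches reference ONE encoder object, so their weights (and
    analysis window) are tied by construction."""

    def __init__(self, encoder: SpeechEncoder) -> None:
        self.mixture_encoder = encoder    # encoder-extractor-decoder pipeline
        self.reference_encoder = encoder  # first stage of the speaker encoder

    def encode_mixture(self, mixture: List[float]) -> List[List[float]]:
        return self.mixture_encoder(mixture)

    def speaker_embedding(self, reference: List[float]) -> List[float]:
        # Hypothetical simplification: average-pool the encoded reference over
        # time instead of running a full speaker-encoder network.
        frames = self.reference_encoder(reference)
        if not frames:
            raise ValueError("reference is shorter than one analysis window")
        n = len(frames)
        return [sum(col) / n for col in zip(*frames)]


# Usage with synthetic signals (placeholders for real audio samples).
rng = random.Random(1)
mix = [rng.uniform(-1.0, 1.0) for _ in range(100)]
ref = [rng.uniform(-1.0, 1.0) for _ in range(60)]

front = TiedFrontEnd(SpeechEncoder(num_filters=8, win=20, stride=10))
mix_frames = front.encode_mixture(mix)   # (100 - 20) // 10 + 1 = 9 frames
emb = front.speaker_embedding(ref)       # one value per filter
```

Because `mixture_encoder` and `reference_encoder` are the same object, any weight update made while training the extraction pipeline is automatically seen by the speaker-encoder branch, and vice versa; in a deep-learning framework the same effect is usually obtained by reusing one module in both branches.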
