Improved Speech Separation with Time-and-Frequency Cross-domain Joint Embedding and Clustering

04/16/2019
by Gene-Ping Yang, et al.

Speech separation has been very successful with deep learning techniques. Substantial effort has been reported on approaches based on the spectrogram, which is well known as the standard time-and-frequency cross-domain representation for speech signals. The spectrogram is highly correlated with the phonetic structure of speech, or "how the speech sounds" when perceived by humans, but it consists primarily of frequency-domain features that carry temporal behaviour. Very impressive work achieving speech separation in the time domain was reported recently, probably because time-domain waveforms may describe the different realizations of speech more precisely than spectrograms. In this paper, we propose a framework that properly integrates these two directions, aiming to achieve both purposes. We construct a time-and-frequency feature map by concatenating the 1-dim convolution encoded feature map (for the time domain) and the spectrogram (for the frequency domain), which is then processed by an embedding network and clustering approaches very similar to those used in prior time-domain and frequency-domain works. In this way, the information in the time and frequency domains, as well as the interactions between them, can be jointly considered during embedding and clustering. Very encouraging results (state-of-the-art to our knowledge) were obtained on the WSJ0-2mix dataset in preliminary experiments.
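The abstract describes two steps: stacking a 1-D convolution encoded feature map on top of a spectrogram, then clustering embeddings to separate speakers. The NumPy sketch below illustrates those steps under stated assumptions; it is not the authors' implementation. The fixed random filterbank standing in for the learned encoder, the 32-sample window and 16-sample hop shared by both branches (so the two maps line up on the time axis), the toy embeddings, and all function names are hypothetical, and the embedding network itself is omitted.

```python
import numpy as np

# Minimal sketch, NOT the paper's code. Window/hop sizes, the random
# "encoder" filters, the toy embeddings and all names are assumptions.

def frame_signal(x, win, hop):
    """Split a 1-D waveform into overlapping frames: shape (T, win)."""
    n_frames = 1 + (len(x) - win) // hop
    return np.stack([x[t * hop : t * hop + win] for t in range(n_frames)])

def conv1d_encoder(x, basis, hop):
    """Stand-in for a learned 1-D conv encoder: strided convolution with
    filters `basis` (n_filters, win) followed by ReLU -> (n_filters, T)."""
    frames = frame_signal(x, basis.shape[1], hop)
    return np.maximum(frames @ basis.T, 0.0).T

def magnitude_spectrogram(x, win, hop):
    """Hann-windowed STFT magnitude -> (win // 2 + 1, T)."""
    frames = frame_signal(x, win, hop) * np.hanning(win)
    return np.abs(np.fft.rfft(frames, axis=1)).T

def cross_domain_feature_map(x, basis, hop):
    """Concatenate time-domain encoder features and the spectrogram along
    the feature axis; same window and hop, so both share the time axis T."""
    enc = conv1d_encoder(x, basis, hop)
    spec = magnitude_spectrogram(x, basis.shape[1], hop)
    return np.concatenate([enc, spec], axis=0)

def kmeans(emb, k, n_iter=50):
    """Plain k-means with deterministic farthest-point initialisation.
    Rows of `emb` are per-bin embedding vectors; returns one label per row."""
    centers = [emb[0]]
    for _ in range(1, k):
        d = np.min([((emb - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(emb[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dists = ((emb[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = emb[labels == j].mean(axis=0)
    return labels

rng = np.random.default_rng(0)
x = rng.standard_normal(160)              # toy "mixture" waveform
basis = rng.standard_normal((8, 32))      # 8 hypothetical filters, window 32
feats = cross_domain_feature_map(x, basis, hop=16)
print(feats.shape)                        # (8 + 17, 9) = (25, 9)

# Clustering step: a trained embedding network (omitted) would map each bin
# to a D-dim vector. Toy 4-dim embeddings for two well-separated "speakers":
emb = np.vstack([rng.normal(0.0, 0.1, (20, 4)), rng.normal(3.0, 0.1, (20, 4))])
labels = kmeans(emb, k=2)
masks = [labels == s for s in range(2)]   # one binary mask per speaker
```

Sharing one window and hop across both branches is the simplest way to make the concatenation well defined; a real system could instead resample one map to the other's frame rate.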
