SkiM: Skipping Memory LSTM for Low-Latency Real-Time Continuous Speech Separation

01/26/2022
by Chenda Li, et al.

Continuous speech separation (CSS) for meeting pre-processing has recently become an active research topic. Compared with the data in utterance-level speech separation, a meeting-style audio stream lasts longer and has an uncertain number of speakers. We adopt the time-domain speech separation method and the recently proposed Graph-PIT to build a very low-latency online speech separation model, which is important for real applications. The low-latency time-domain encoder with a small stride leads to an extremely long feature sequence. We propose a simple yet efficient model named Skipping Memory (SkiM) for long-sequence modeling. Experimental results show that SkiM achieves on-par or even better separation performance than DPRNN. Meanwhile, the computational cost of SkiM is reduced by 75%. The strong long-sequence modeling capability and low computational cost make SkiM a suitable model for online CSS applications. Our fastest real-time model achieves a 17.1 dB signal-to-distortion ratio (SDR) improvement with less than 1 millisecond of latency in the simulated meeting-style evaluation.
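To make the link between a small encoder stride and a long feature sequence concrete, here is a minimal arithmetic sketch. It assumes an unpadded 1-D convolutional encoder, an 8 kHz sample rate, one minute of audio, and three illustrative kernel/stride pairs. None of these values are taken from the paper, and the helper `num_frames` is a hypothetical name introduced only for this example.

```python
def num_frames(num_samples: int, kernel_size: int, stride: int) -> int:
    """Number of frames produced by an unpadded 1-D convolutional encoder."""
    if num_samples < kernel_size:
        return 0
    return (num_samples - kernel_size) // stride + 1


# Assumed values for illustration only (not the paper's configuration).
SAMPLE_RATE = 8000   # Hz
DURATION_S = 60      # one minute of meeting-style audio
NUM_SAMPLES = SAMPLE_RATE * DURATION_S

# (kernel_size, stride) pairs in samples, chosen to show the trend.
CONFIGS = [(16, 8), (4, 2), (2, 1)]

for kernel_size, stride in CONFIGS:
    frames = num_frames(NUM_SAMPLES, kernel_size, stride)
    # Duration of audio covered by a single encoder window.
    window_ms = 1000 * kernel_size / SAMPLE_RATE
    print(f"kernel={kernel_size:2d} stride={stride} "
          f"window={window_ms:.2f} ms frames={frames}")
```

Under these assumptions, shrinking the encoder window from 2 ms to 0.25 ms grows the sequence from roughly 60,000 to roughly 480,000 frames per minute of audio, which is the kind of length that motivates a model designed for long sequences.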

research
12/25/2019

Utterance-level Permutation Invariant Training with Latency-controlled BLSTM for Single-channel Multi-talker Speech Separation

Utterance-level permutation invariant training (uPIT) has achieved promi...
research
12/09/2019

MITAS: A Compressed Time-Domain Audio Separation Network with Parameter Sharing

Deep learning methods have brought substantial advancements in speech se...
research
10/28/2022

UX-NET: Filter-and-Process-based Improved U-Net for Real-time Time-domain Audio Separation

This study presents UX-Net, a time-domain audio separation network (TasN...
research
04/13/2019

Low-Latency Speaker-Independent Continuous Speech Separation

Speaker independent continuous speech separation (SI-CSS) is a task of c...
research
10/25/2019

A Multi-Phase Gammatone Filterbank for Speech Separation via TasNet

In this work, we investigate if the learned encoder of the end-to-end co...
research
04/12/2022

Low Latency Time Domain Multichannel Speech and Music Source Separation

The Goal is to obtain a simple multichannel source separation with very ...
research
06/29/2023

Modified Parametric Multichannel Wiener Filter for Low-latency Enhancement of Speech Mixtures with Unknown Number of Speakers

This paper introduces a novel low-latency online beamforming (BF) algori...
