Dual-Path Modeling for Long Recording Speech Separation in Meetings

02/23/2021
by Chenda Li et al.

Continuous speech separation (CSS) is the task of separating speech sources from a long, partially overlapped recording that involves a varying number of speakers. A straightforward extension of conventional utterance-level speech separation to CSS is to segment the long recording with a fixed-size window and process each window separately. Though effective, this approach fails to model long-range dependencies in speech and thus leads to suboptimal performance. The recently proposed dual-path modeling could remedy this problem, thanks to its ability to jointly model cross-window dependencies and local within-window processing. In this work, we further extend the dual-path modeling framework to the CSS task. A transformer-based dual-path system is proposed, which integrates transformer layers for global modeling. The proposed models are applied to LibriCSS, a real recorded multi-talker dataset, and consistent WER reductions are observed in the ASR evaluation of the separated speech. We also propose a dual-path transformer equipped with convolutional layers, which significantly reduces the amount of computation by 30% [...] dual-path models are investigated, which show a 10% [...] compared to the baseline.
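The windowing described above, with a local pass inside each window followed by a global pass across windows, can be sketched as follows. This is a minimal illustration that assumes a DPRNN-style scheme (overlapping chunks of `chunk` frames with a hop of half a chunk, then overlap-add to rebuild the sequence). The function names `segment`, `dual_path_block` and `overlap_add`, and the toy `causal_mean` layer, are hypothetical stand-ins; they are not the transformer or convolutional layers proposed in the paper, nor its code.

```python
from typing import Callable, List

Frame = List[float]                           # one feature vector (N values) per time frame
Chunk = List[Frame]                           # K consecutive frames (one window)
SeqFn = Callable[[List[Frame]], List[Frame]]  # stand-in for a sequence model layer


def segment(frames: List[Frame], chunk: int, hop: int) -> List[Chunk]:
    """Split a T-frame sequence into overlapping windows of `chunk` frames.

    The tail is zero-padded so that every window is full.
    """
    n_feat, total = len(frames[0]), len(frames)
    if total <= chunk:
        n_chunks = 1
    else:
        n_chunks = -(-(total - chunk) // hop) + 1  # ceiling division
    padded_len = (n_chunks - 1) * hop + chunk
    padded = frames + [[0.0] * n_feat for _ in range(padded_len - total)]
    return [padded[s * hop : s * hop + chunk] for s in range(n_chunks)]


def dual_path_block(chunks: List[Chunk], intra_fn: SeqFn, inter_fn: SeqFn) -> List[Chunk]:
    """One dual-path block: a local (intra-chunk) pass, then a global (inter-chunk) pass."""
    # Intra-chunk path: each window is modelled independently along its K frames.
    chunks = [intra_fn(c) for c in chunks]
    # Inter-chunk path: for every position k inside a window, model the sequence of
    # k-th frames across all S windows, which links distant parts of the recording.
    n_chunks, chunk_len = len(chunks), len(chunks[0])
    out: List[Chunk] = [[[] for _ in range(chunk_len)] for _ in range(n_chunks)]
    for k in range(chunk_len):
        seq = inter_fn([chunks[s][k] for s in range(n_chunks)])
        for s in range(n_chunks):
            out[s][k] = seq[s]
    return out


def overlap_add(chunks: List[Chunk], hop: int, length: int) -> List[Frame]:
    """Sum overlapping windows back into one sequence and drop the zero padding."""
    n_feat, chunk_len = len(chunks[0][0]), len(chunks[0])
    total = (len(chunks) - 1) * hop + chunk_len
    acc = [[0.0] * n_feat for _ in range(total)]
    for s, chunk in enumerate(chunks):
        for k, frame in enumerate(chunk):
            row = acc[s * hop + k]
            for i, value in enumerate(frame):
                row[i] += value
    return acc[:length]


def causal_mean(seq: List[Frame]) -> List[Frame]:
    """Toy placeholder for a transformer layer: running mean over the sequence."""
    out, running = [], [0.0] * len(seq[0])
    for t, frame in enumerate(seq, start=1):
        running = [r + v for r, v in zip(running, frame)]
        out.append([r / t for r in running])
    return out


if __name__ == "__main__":
    feats = [[float(t)] for t in range(10)]   # T = 10 frames, N = 1 feature
    chunks = segment(feats, chunk=4, hop=2)    # 50% overlap between windows
    chunks = dual_path_block(chunks, causal_mean, causal_mean)
    print(len(chunks), len(overlap_add(chunks, hop=2, length=len(feats))))  # prints: 4 10
```

In a real separator, the two `causal_mean` calls would be replaced by learned layers (for example attention layers on the inter-chunk path). The sketch only shows how the reshaping lets the inter-chunk path see every window of the recording while each window stays short.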



Speech Separation Based on Multi-Stage Elaborated Dual-Path Deep BiLSTM with Auxiliary Identity Loss (08/06/2020)
Deep neural network with dual-path bi-directional long short-term memory...

Multi-Scale Feature Fusion Transformer Network for End-to-End Single Channel Speech Separation (12/14/2022)
Recently studies on time-domain audio separation networks (TasNets) have...

La Furca: Iterative Context-Aware End-to-End Monaural Speech Separation Based on Dual-Path Deep Parallel Inter-Intra Bi-LSTM with Attention (01/23/2020)
Deep neural network with dual-path bi-directional long short-term memory...

Dual-Path Transformer Network: Direct Context-Aware Modeling for End-to-End Monaural Speech Separation (07/28/2020)
The dominant speech separation models are based on complex recurrent or ...

Toward the pre-cocktail party problem with TasTas+ (09/07/2020)
Deep neural network with dual-path bi-directional long short-term memory...

Embedding Recurrent Layers with Dual-Path Strategy in a Variant of Convolutional Network for Speaker-Independent Speech Separation (03/25/2022)
Speaker-independent speech separation has achieved remarkable performanc...

Continuous Streaming Multi-Talker ASR with Dual-path Transducers (09/17/2021)
Streaming recognition of multi-talker conversations has so far been eval...
