Dual-Path Modeling for Long Recording Speech Separation in Meetings

by   Chenda Li, et al.

The continuous speech separation (CSS) is a task to separate the speech sources from a long, partially overlapped recording, which involves a varying number of speakers. A straightforward extension of conventional utterance-level speech separation to the CSS task is to segment the long recording with a size-fixed window and process each window separately. Though effective, this extension fails to model the long dependency in speech and thus leads to sub-optimum performance. The recent proposed dual-path modeling could be a remedy to this problem, thanks to its capability in jointly modeling the cross-window dependency and the local-window processing. In this work, we further extend the dual-path modeling framework for CSS task. A transformer-based dual-path system is proposed, which integrates transform layers for global modeling. The proposed models are applied to LibriCSS, a real recorded multi-talk dataset, and consistent WER reduction can be observed in the ASR evaluation for separated speech. Also, a dual-path transformer equipped with convolutional layers is proposed. It significantly reduces the computation amount by 30 dual-path models are investigated, which shows 10 compared to the baseline.



There are no comments yet.


page 1

page 2

page 3

page 4


Speech Separation Based on Multi-Stage Elaborated Dual-Path Deep BiLSTM with Auxiliary Identity Loss

Deep neural network with dual-path bi-directional long short-term memory...

Dual-Path Transformer Network: Direct Context-Aware Modeling for End-to-End Monaural Speech Separation

The dominant speech separation models are based on complex recurrent or ...

Toward the pre-cocktail party problem with TasTas+

Deep neural network with dual-path bi-directional long short-term memory...

Continuous Streaming Multi-Talker ASR with Dual-path Transducers

Streaming recognition of multi-talker conversations has so far been eval...

Effect of Window Size for Detection of Abnormalities in Respiratory Sounds

The recording of respiratory sounds was of significant benefit in the di...

Multi-accent Speech Separation with One Shot Learning

Speech separation is a problem in the field of speech processing that ha...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.