Continuous speech separation: dataset and analysis

01/30/2020
by Zhuo Chen, et al.

This paper describes a dataset and protocols for evaluating continuous speech separation algorithms. Most prior studies on speech separation use pre-segmented signals of artificially mixed speech utterances, most of which are fully overlapped, and evaluate the algorithms with signal-to-distortion ratio or similar performance metrics. In natural conversations, however, the speech signal is continuous and contains both overlapped and overlap-free components. In addition, the signal-based metrics correlate only very weakly with automatic speech recognition (ASR) accuracy. We think that this not only makes it hard to assess the practical relevance of the tested algorithms but also hinders researchers from developing systems that can be readily applied to real scenarios. In this paper, we define continuous speech separation (CSS) as the task of generating a set of non-overlapped speech signals from a continuous audio stream that contains multiple utterances partially overlapped to varying degrees. A new real recorded dataset, called LibriCSS, is derived from LibriSpeech by concatenating the corpus utterances to simulate a conversation and capturing the audio replays with far-field microphones. A Kaldi-based ASR evaluation protocol is also established using a well-trained multi-conditional acoustic model. Using this dataset, we investigate several aspects of a recently proposed speaker-independent CSS algorithm. The dataset and evaluation scripts are available to facilitate research in this direction.
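
As a rough illustration of the kind of input the CSS task targets, the sketch below places utterances on a single timeline so that consecutive ones partially overlap, sums them into one stream, and reports the fraction of samples where two utterances are active. It is not the LibriCSS construction procedure: the function names (`place_utterances`, `mix`, `overlap_ratio`), the fixed per-boundary overlap, and the sample-count definition of the overlap ratio are all assumptions made for this example, and no replay or far-field recording is modelled.

```python
from typing import List, Sequence, Tuple


def place_utterances(lengths: Sequence[int], overlap: int) -> List[Tuple[int, int]]:
    """Return (start, end) sample offsets for each utterance.

    Each utterance starts `overlap` samples before the previous one ends.
    The overlap is clamped to the length of both neighbours so that at most
    two utterances are ever active at once (an assumption of this sketch).
    """
    spans: List[Tuple[int, int]] = []
    prev_end = 0
    prev_len = 0
    for n in lengths:
        if spans:
            start = prev_end - min(overlap, n, prev_len)
        else:
            start = 0
        spans.append((start, start + n))
        prev_end = start + n
        prev_len = n
    return spans


def mix(utterances: Sequence[Sequence[float]], spans: Sequence[Tuple[int, int]]) -> List[float]:
    """Sum the utterances into one continuous stream at their offsets."""
    total = max(end for _, end in spans)
    out = [0.0] * total
    for samples, (start, _) in zip(utterances, spans):
        for i, x in enumerate(samples):
            out[start + i] += x
    return out


def overlap_ratio(spans: Sequence[Tuple[int, int]]) -> float:
    """Fraction of timeline samples where two or more utterances are active."""
    total = max(end for _, end in spans)
    active = [0] * total
    for start, end in spans:
        for i in range(start, end):
            active[i] += 1
    return sum(1 for a in active if a >= 2) / total


if __name__ == "__main__":
    # Three toy "utterances" of constant amplitude, 10 samples each.
    utts = [[1.0] * 10, [1.0] * 10, [1.0] * 10]
    spans = place_utterances([len(u) for u in utts], overlap=2)
    stream = mix(utts, spans)
    print(spans, len(stream), overlap_ratio(spans))
```

A separation system for this task would take a stream like `stream` as input and be expected to emit output signals in which no two utterances overlap; setting `overlap=0` yields a fully overlap-free stream for comparison.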

research
01/16/2023

Multi-resolution location-based training for multi-channel continuous speech separation

The performance of automatic speech recognition (ASR) systems severely d...
research
05/09/2019

Analysis of Deep Clustering as Preprocessing for Automatic Speech Recognition of Sparsely Overlapping Speech

Significant performance degradation of automatic speech recognition (ASR...
research
10/13/2021

All-neural beamformer for continuous speech separation

Continuous speech separation (CSS) aims to separate overlapping voices f...
research
08/13/2020

Continuous Speech Separation with Conformer

Continuous speech separation plays a vital role in complicated speech re...
research
09/15/2023

Mixture Encoder Supporting Continuous Speech Separation for Meeting Recognition

Many real-life applications of automatic speech recognition (ASR) requir...
research
12/10/2021

Directed Speech Separation for Automatic Speech Recognition of Long Form Conversational Speech

Many of the recent advances in speech separation are primarily aimed at ...
research
10/08/2018

Recognizing Overlapped Speech in Meetings: A Multichannel Separation Approach Using Neural Networks

The goal of this work is to develop a meeting transcription system that ...
