SMS-WSJ: Database, performance measures, and baseline recipe for multi-channel source separation and recognition

10/30/2019
by Lukas Drude, et al.

We present a multi-channel database of overlapping speech for training, evaluation, and detailed analysis of source separation and extraction algorithms: SMS-WSJ (Spatialized Multi-Speaker Wall Street Journal). It consists of artificially mixed speech taken from the WSJ database, but unlike earlier databases, it uses all WSJ0+1 utterances and strictly separates the speaker sets present in the training, validation, and test sets. When spatializing the data, we ensure a high degree of randomness with respect to room size, array center and rotation, and speaker position. Furthermore, this paper offers a critical assessment of recently proposed measures of source separation performance. Alongside the code to generate the database, we provide a source separation baseline and a Kaldi recipe with competitive word error rates, establishing common ground for evaluation.
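The two design choices in the abstract (speaker-disjoint splits and randomized spatialization) can be illustrated with a minimal Python sketch. It is not the SMS-WSJ generation code. The names `Utterance`, `speaker_disjoint_split`, `sample_scenario`, and `mic_positions`, the split fractions, the room and height ranges, the minimum distance, and the six-microphone circular array are all hypothetical choices made only for illustration.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Utterance:
    utt_id: str
    speaker: str


@dataclass
class Scenario:
    room_size: tuple          # (x, y, z) in metres
    array_center: tuple       # (x, y, z) in metres
    array_rotation: float     # rotation about the vertical axis, radians
    speaker_positions: list   # list of (x, y, z) tuples


def speaker_disjoint_split(utterances, fractions=(0.8, 0.1, 0.1), seed=0):
    """Hypothetical split: assign whole speakers (not utterances) to subsets,
    so no speaker appears in more than one of train/validation/test."""
    speakers = sorted({u.speaker for u in utterances})
    random.Random(seed).shuffle(speakers)
    n_train = round(fractions[0] * len(speakers))
    n_val = round(fractions[1] * len(speakers))
    groups = {
        "train": set(speakers[:n_train]),
        "validation": set(speakers[n_train:n_train + n_val]),
        "test": set(speakers[n_train + n_val:]),
    }
    return {name: [u for u in utterances if u.speaker in spk]
            for name, spk in groups.items()}


def sample_scenario(rng, num_speakers=2, margin=0.5, min_dist=1.0):
    """Draw a random room, array center/rotation, and speaker positions.
    All ranges below are illustrative assumptions, not SMS-WSJ's values."""
    room = (rng.uniform(7.0, 9.0), rng.uniform(5.0, 7.0), rng.uniform(2.8, 3.2))

    def uniform_point(height_range):
        # Keep points at least `margin` metres away from the side walls.
        return (rng.uniform(margin, room[0] - margin),
                rng.uniform(margin, room[1] - margin),
                rng.uniform(*height_range))

    array_center = uniform_point((1.0, 1.5))
    rotation = rng.uniform(0.0, 2.0 * math.pi)

    # Rejection sampling: redraw until speakers are far enough apart
    # from each other and from the array.
    while True:
        speakers = [uniform_point((1.5, 1.8)) for _ in range(num_speakers)]
        apart = all(math.dist(a, b) >= min_dist
                    for i, a in enumerate(speakers) for b in speakers[i + 1:])
        away = all(math.dist(s, array_center) >= min_dist for s in speakers)
        if apart and away:
            return Scenario(room, array_center, rotation, speakers)


def mic_positions(scenario, num_mics=6, radius=0.1):
    """Hypothetical circular array: the sampled rotation turns the whole array."""
    cx, cy, cz = scenario.array_center
    return [(cx + radius * math.cos(scenario.array_rotation + 2 * math.pi * m / num_mics),
             cy + radius * math.sin(scenario.array_rotation + 2 * math.pi * m / num_mics),
             cz)
            for m in range(num_mics)]
```

Splitting by speaker rather than by utterance is what keeps validation and test speakers unseen during training. Sampling room, array, and speaker geometry independently per mixture avoids tying a model to one acoustic setup. Source and microphone positions like these would then typically be passed to a room impulse response simulator; that step is omitted here.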


