GPU-accelerated Guided Source Separation for Meeting Transcription

12/10/2022
by Desh Raj, et al.

Guided source separation (GSS) is a type of target-speaker extraction method that relies on pre-computed speaker activities and blind source separation to perform front-end enhancement of overlapped speech signals. It was first proposed during the CHiME-5 challenge and provided significant improvements over the delay-and-sum beamforming baseline. Despite its strengths, however, the method has seen limited adoption for meeting transcription benchmarks, primarily due to its high computation time. In this paper, we describe our improved implementation of GSS that leverages the power of modern GPU-based pipelines, including batched processing of frequencies and segments, to provide a 300x speed-up over CPU-based inference. The improved inference time allows us to perform detailed ablation studies over several parameters of the GSS algorithm, such as context duration, number of channels, and noise class. We provide end-to-end reproducible pipelines for speaker-attributed transcription of popular meeting benchmarks: LibriCSS, AMI, and AliMeeting. Our code and recipes are publicly available: https://github.com/desh2608/gss.
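
In a typical GSS pipeline, per-speaker time-frequency masks are estimated with a spatial mixture model whose posteriors are constrained by the pre-computed speaker activities, and the resulting masks drive a beamformer. The speed-up reported above comes largely from treating frequency bins and segments as batch dimensions, so that per-frequency statistics are computed by a few large GPU kernels instead of a Python loop over bins. The sketch below illustrates only that batching idea, using PyTorch as a stand-in GPU array library: it accumulates activity-weighted spatial covariance matrices for every speaker and frequency bin in a single einsum. The function name, tensor shapes, and choice of PyTorch are illustrative assumptions and do not mirror the API of the linked repository.

```python
import torch

def batched_spatial_covariances(stft, activity, eps=1e-8):
    """Illustrative sketch (not the released gss API): compute per-speaker
    spatial covariance matrices for all frequency bins in one batched call.

    stft:     complex tensor of shape (F, C, T)  -- frequency bins, channels, frames
    activity: real tensor of shape (K, T)        -- pre-computed per-speaker activity in [0, 1]

    Returns a tensor of shape (K, F, C, C).
    """
    # Activity-weighted outer products Y Y^H, summed over time, batched over
    # speakers K and frequencies F in a single einsum (one GPU kernel launch
    # instead of a Python loop over frequency bins).
    scm = torch.einsum("kt,fct,fdt->kfcd", activity.to(stft.dtype), stft, stft.conj())
    norm = activity.sum(dim=-1).clamp_min(eps)  # total activity per speaker, shape (K,)
    return scm / norm[:, None, None, None]

if __name__ == "__main__":
    device = "cuda" if torch.cuda.is_available() else "cpu"
    F, C, T, K = 257, 8, 500, 4  # toy sizes: bins, channels, frames, speakers
    stft = torch.randn(F, C, T, dtype=torch.cfloat, device=device)
    activity = torch.rand(K, T, device=device)  # stands in for pre-computed speaker activities
    scm = batched_spatial_covariances(stft, activity)
    print(scm.shape)  # torch.Size([4, 257, 8, 8])
```

In the same spirit, multiple enhancement segments can be stacked along an extra leading batch dimension so that a single kernel launch covers many segments at once.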

Related research

DBNET: DOA-driven beamforming network for end-to-end farfield sound source separation (10/22/2020)
Many deep learning techniques are available to perform source separation...

SMS-WSJ: Database, performance measures, and baseline recipe for multi-channel source separation and recognition (10/30/2019)
We present a multi-channel database of overlapping speech for training, ...

Asteroid: the PyTorch-based audio source separation toolkit for researchers (05/08/2020)
This paper describes Asteroid, the PyTorch-based audio source separation...

The sound of my voice: speaker representation loss for target voice separation (11/06/2019)
Research on content and style representations has been widely studied in...

Parrotron: An End-to-End Speech-to-Speech Conversion Model and its Applications to Hearing-Impaired Speech and Speech Separation (04/08/2019)
We describe Parrotron, an end-to-end-trained speech-to-speech conversion...

Target Confusion in End-to-end Speaker Extraction: Analysis and Approaches (04/04/2022)
Recently, end-to-end speaker extraction has attracted increasing attenti...

Switching Independent Vector Analysis and Its Extension to Blind and Spatially Guided Convolutional Beamforming Algorithm (11/20/2021)
This paper develops a framework that can perform denoising, dereverberat...
