Speaker activity driven neural speech extraction

01/14/2021
by Marc Delcroix, et al.

Target speech extraction, which extracts the speech of a target speaker in a mixture given auxiliary speaker clues, has recently received increased interest. Various clues have been investigated such as pre-recorded enrollment utterances, direction information, or video of the target speaker. In this paper, we explore the use of speaker activity information as an auxiliary clue for single-channel neural network-based speech extraction. We propose a speaker activity driven speech extraction neural network (ADEnet) and show that it can achieve performance levels competitive with enrollment-based approaches, without the need for pre-recordings. We further demonstrate the potential of the proposed approach for processing meeting-like recordings, where the speaker activity is obtained from a diarization system. We show that this simple yet practical approach can successfully extract speakers after diarization, which results in improved ASR performance, especially in high overlapping conditions, with a relative word error rate reduction of up to 25%.
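The abstract does not say how the diarization output is presented to the network. As a rough illustration only, the sketch below turns diarization segments into a frame-level binary activity sequence, which is one plausible form for a speaker activity clue. The function name `segments_to_activity`, the `(start, end)` segment format in seconds, and the 10 ms frame shift are all assumptions made for this example, not details taken from the paper.

```python
from typing import List, Tuple


def segments_to_activity(
    segments: List[Tuple[float, float]],
    duration: float,
    frame_shift: float = 0.01,  # assumed 10 ms frame shift (hypothetical)
) -> List[int]:
    """Convert (start, end) segments in seconds into a per-frame 0/1 vector.

    A frame is marked 1 when the target speaker is active in it according
    to the diarization segments, and 0 otherwise.
    """
    num_frames = int(round(duration / frame_shift))
    activity = [0] * num_frames
    for start, end in segments:
        # Clip segment boundaries to the valid frame range.
        first = max(0, int(round(start / frame_shift)))
        last = min(num_frames, int(round(end / frame_shift)))
        for t in range(first, last):
            activity[t] = 1
    return activity


# Hypothetical diarization output for one speaker in a 0.1 s recording.
example_activity = segments_to_activity([(0.0, 0.02), (0.05, 0.07)], duration=0.1)
```

In a meeting-like recording, one such sequence per diarized speaker would mark where that speaker talks, including the regions that overlap with other speakers.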

research
09/30/2021

USEV: Universal Speaker Extraction with Visual Cue

A speaker extraction algorithm seeks to extract the target speaker's voi...
research
06/26/2019

Auxiliary Interference Speaker Loss for Target-Speaker Speech Recognition

In this paper, we propose a novel auxiliary loss function for target-spe...
research
01/31/2023

Neural Target Speech Extraction: An Overview

Humans can listen to a target speaker even in challenging acoustic condi...
research
02/01/2022

New Insights on Target Speaker Extraction

In recent years, researchers have become increasingly interested in spea...
research
07/19/2013

Speaker Independent Continuous Speech to Text Converter for Mobile Application

An efficient speech to text converter for mobile application is presente...
research
04/04/2022

An Initialization Scheme for Meeting Separation with Spatial Mixture Models

Spatial mixture model (SMM) supported acoustic beamforming has been exte...
research
10/19/2020

Attention-based scaling adaptation for target speech extraction

The target speech extraction has attracted widespread attention in recen...
