Neural Target Speech Extraction: An Overview

01/31/2023
by   Katerina Zmolikova, et al.

Humans can listen to a target speaker even in challenging acoustic conditions with noise, reverberation, and interfering speakers. This phenomenon is known as the cocktail-party effect. For decades, researchers have worked toward matching this human listening ability. A critical issue is handling interfering speakers: the target and non-target speech signals share similar characteristics, which makes them hard to tell apart. Target speech/speaker extraction (TSE) isolates the speech signal of a target speaker from a mixture of several speakers, with or without noise and reverberation, using clues that identify the speaker in the mixture. Such a clue might be a spatial clue indicating the direction of the target speaker, a video of the speaker's lips, or a pre-recorded enrollment utterance from which their voice characteristics can be derived. TSE is an emerging field of research that has received increased attention in recent years because it offers a practical approach to the cocktail-party problem and draws on several areas of signal processing, including audio, visual, and array processing, as well as deep learning. This paper focuses on recent neural-based approaches and presents an in-depth overview of TSE. We guide readers through the different major approaches, emphasizing the similarities among frameworks and discussing potential future directions.

research
01/23/2020

Improving speaker discrimination of target speech extraction with time-domain SpeakerBeam

Target speech extraction, which extracts a single target source in a mix...
research
01/14/2021

Speaker activity driven neural speech extraction

Target speech extraction, which extracts the speech of a target speaker ...
research
04/17/2019

Understanding the Effectiveness of Ultrasonic Microphone Jammer

Recent works have explained the principle of using ultrasonic transmissi...
research
06/13/2019

Speaker-Targeted Audio-Visual Models for Speech Recognition in Cocktail-Party Environments

Speech recognition in cocktail-party environments remains a significant ...
research
05/03/2021

AvaTr: One-Shot Speaker Extraction with Transformers

To extract the voice of a target speaker when mixed with a variety of ot...
research
07/31/2019

Quantifying Cochlear Implant Users' Ability for Speaker Identification using CI Auditory Stimuli

Speaker recognition is a biometric modality that uses underlying speech ...
research
11/04/2022

Spatially Selective Deep Non-linear Filters for Speaker Extraction

In a scenario with multiple persons talking simultaneously, the spatial ...
