AvaTr: One-Shot Speaker Extraction with Transformers

05/03/2021
by Shell Xu Hu, et al.

To extract the voice of a target speaker when it is mixed with a variety of other sounds, such as white and ambient noise or the voices of interfering speakers, we extend the Transformer network to attend to the information most relevant to the target speaker, using the characteristics of his or her voice as a form of contextual information. The idea has a natural interpretation in terms of selective attention theory. Specifically, we propose two models that incorporate the voice characteristics into the Transformer, based on different insights into where the feature selection should take place. Both models yield excellent performance, on par with or better than published state-of-the-art models on the speaker extraction task, including when separating the speech of novel speakers not seen during training.
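
The abstract does not say how the voice characteristics enter the Transformer, so the following is only a minimal sketch of the general idea of attention conditioned on a speaker embedding, not either of the two proposed models. The function name `speaker_conditioned_attention`, the sigmoid gate on the queries, and all weight matrices and dimensions are assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def speaker_conditioned_attention(mixture, speaker, w_q, w_k, w_v, w_s):
    """Single-head self-attention over mixture frames, gated by a speaker embedding.

    mixture: (T, d) array of mixture features, one row per time frame.
    speaker: (d_s,) array, an embedding of the target speaker's voice.
    """
    # Hypothetical conditioning: a sigmoid gate derived from the speaker
    # embedding rescales the feature channels of the queries.
    gate = 1.0 / (1.0 + np.exp(-(speaker @ w_s)))   # shape (d,)
    q = (mixture @ w_q) * gate                      # shape (T, d)
    k = mixture @ w_k                               # shape (T, d)
    v = mixture @ w_v                               # shape (T, d)
    # Standard scaled dot-product attention.
    scores = q @ k.T / np.sqrt(q.shape[-1])         # shape (T, T)
    weights = softmax(scores, axis=-1)
    return weights @ v, weights


# Toy inputs with random weights, only to exercise the shapes.
rng = np.random.default_rng(0)
T, d, d_s = 6, 8, 4
mixture = rng.standard_normal((T, d))
speaker = rng.standard_normal(d_s)
w_q, w_k, w_v = (rng.standard_normal((d, d)) for _ in range(3))
w_s = rng.standard_normal((d_s, d))

out, weights = speaker_conditioned_attention(mixture, speaker, w_q, w_k, w_v, w_s)
```

In this sketch the speaker embedding only influences which frames each query attends to; where the conditioning is applied is exactly the kind of design choice the abstract says distinguishes the two models.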
