MIRNet: Learning multiple identities representations in overlapped speech

08/04/2020
by Hyewon Han, et al.

Many approaches can derive information about a single speaker's identity from speech by learning to recognize consistent characteristics of acoustic parameters. However, it is challenging to determine identity information when there are multiple concurrent speakers in a given signal. In this paper, we propose a novel deep speaker representation strategy that can reliably extract multiple speaker identities from overlapped speech. We design a network that extracts a high-level embedding containing information about each speaker's identity from a given mixture. Unlike conventional approaches that need reference acoustic features for training, our proposed algorithm requires only the speaker identity labels of the overlapped speech segments. We demonstrate the effectiveness and usefulness of our algorithm in a speaker verification task and in a speech separation system conditioned on the target speaker embeddings obtained through the proposed method.
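The abstract does not state the training objective or the verification scoring rule, so the following is only a rough sketch of what label-only supervision on overlapped speech could look like, not the MIRNet method itself. It assumes a generic multi-label setup: the speakers present in a mixture are encoded as a multi-hot target, a hypothetical network outputs one probability per known speaker, and binary cross-entropy compares the two. Cosine similarity is included as a common (assumed) way to compare speaker embeddings in verification. All function names here are illustrative.

```python
import math


def multi_hot(speaker_ids, num_speakers):
    """Encode the set of speakers active in a mixture as a 0/1 vector.

    Assumption: speakers are indexed 0..num_speakers-1. Only these identity
    labels are needed; no clean reference signals are involved.
    """
    target = [0.0] * num_speakers
    for sid in speaker_ids:
        target[sid] = 1.0
    return target


def binary_cross_entropy(probs, target, eps=1e-7):
    """Mean binary cross-entropy between per-speaker probabilities and a
    multi-hot target (an assumed loss, not confirmed by the source)."""
    total = 0.0
    for p, t in zip(probs, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total / len(target)


def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# A mixture containing speakers 1 and 3 out of 5 known speakers.
target = multi_hot([1, 3], num_speakers=5)

# Hypothetical network outputs: one close to the target, one far from it.
good_probs = [0.01, 0.99, 0.01, 0.99, 0.01]
bad_probs = [0.99, 0.01, 0.99, 0.01, 0.99]

loss_good = binary_cross_entropy(good_probs, target)
loss_bad = binary_cross_entropy(bad_probs, target)
```

In this sketch the loss is low when the predicted probabilities match the set of active speakers and high otherwise, which is the only supervision signal the multi-hot labels provide.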

research
05/20/2021

Speaker disentanglement in video-to-speech conversion

The task of video-to-speech aims to translate silent video of lip moveme...
research
07/19/2023

An analysis on the effects of speaker embedding choice in non auto-regressive TTS

In this paper we introduce a first attempt on understanding how a non-au...
research
11/27/2019

Powerful Speaker Embedding Training Framework by Adversarially Disentangled Identity Representation

The main challenge of speaker verification in the wild is the interferen...
research
02/18/2022

Speaker Identity Preservation in Dysarthric Speech Reconstruction by Adversarial Speaker Adaptation

Dysarthric speech reconstruction (DSR), which aims to improve the qualit...
research
02/06/2023

Residual Information in Deep Speaker Embedding Architectures

Speaker embeddings represent a means to extract representative vectorial...
research
08/30/2021

Speaker-Conditioned Hierarchical Modeling for Automated Speech Scoring

Automatic Speech Scoring (ASS) is the computer-assisted evaluation of a ...
research
05/30/2019

Speaker Anonymization Using X-vector and Neural Waveform Models

The social media revolution has produced a plethora of web services to w...
