Compositional embedding models for speaker identification and diarization with simultaneous speech from 2+ speakers

10/22/2020
by Zeqian Li, et al.

We propose a new method for speaker diarization that can handle overlapping speech from 2+ speakers. Our method is based on compositional embeddings [1]: like standard speaker embedding methods such as x-vector [2], compositional embedding models contain a function f that separates speech from different speakers. In addition, they include a composition function g that computes set-union operations in the embedding space so as to infer the set of speakers within the input audio. In an experiment on multi-person speaker identification using synthesized LibriSpeech data, the proposed method outperforms traditional embedding methods that are trained only to separate single speakers (not speaker sets). In a speaker diarization experiment on the AMI Headset Mix corpus, we achieve state-of-the-art accuracy (DER=22.93%), compared with the previous best result (23.82%).
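
The interplay between f and g can be illustrated with a minimal sketch. It assumes that embeddings are unit vectors compared by dot product, and that enrolled single-speaker embeddings (the outputs of f) are already available. The abstract does not specify the form of g, so the normalized sum below is a hypothetical stand-in rather than the authors' composition function; the helper names `candidate_sets` and `infer_speaker_set` are likewise illustrative only.

```python
import itertools
from functools import reduce

import numpy as np


def normalize(v):
    """Scale a vector to unit length."""
    return v / np.linalg.norm(v)


def g(a, b):
    """Hypothetical composition function: normalized sum of two embeddings.

    This is an assumption for illustration only; it stands in for whatever
    set-union operator the model actually uses in embedding space.
    """
    return normalize(a + b)


def candidate_sets(enrolled, max_size=2):
    """Yield (speaker_names, embedding) for every speaker subset up to max_size.

    Multi-speaker embeddings are built by folding g over the members' single
    embeddings. With this stand-in g the fold is order-sensitive for sets of
    three or more, so the default is limited to pairs.
    """
    names = sorted(enrolled)
    for k in range(1, max_size + 1):
        for combo in itertools.combinations(names, k):
            yield combo, reduce(g, (enrolled[n] for n in combo))


def infer_speaker_set(query, enrolled, max_size=2):
    """Return the speaker set whose composed embedding best matches the query."""
    best_names, _ = max(
        candidate_sets(enrolled, max_size),
        key=lambda item: float(np.dot(query, item[1])),
    )
    return set(best_names)
```

With enrolled embeddings for a few speakers, `infer_speaker_set(f_of_audio, enrolled)` returns a single speaker for non-overlapping audio and a pair when the query embedding lies closest to a composed embedding. Enumerating subsets grows combinatorially, so this brute-force search is only practical for small numbers of enrolled speakers.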
