Fusion of Embeddings Networks for Robust Combination of Text Dependent and Independent Speaker Recognition

06/18/2021
by Ruirui Li, et al.

By implicitly recognizing a user based on their speech input, speaker identification enables many downstream applications, such as personalized system behavior and expedited shopping checkouts. Depending on whether the speech content is constrained, text-dependent (TD) or text-independent (TI) speaker recognition models can be used. We wish to combine the advantages of both types of models through an ensemble system to make more reliable predictions. However, any such combined approach has to be robust to incomplete inputs, i.e., when either the TD or the TI input is missing. As a solution, we propose a fusion of embeddings network (foenet) architecture that combines joint learning with neural attention. We compare foenet with four competitive baseline methods on a dataset of voice assistant inputs and show that it achieves higher accuracy than the baseline and score fusion methods, especially in the presence of incomplete inputs.
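The abstract does not spell out foenet's layers, so what follows is only a minimal sketch, assuming each input has already been mapped to a fixed-size embedding by separate TD and TI encoders (not shown). It illustrates the general idea of attention-based embedding fusion that stays usable when one input is missing: relevance scores are computed and softmax-normalized only over the embeddings that are actually present. The scoring vector `w`, the functions `attention_fuse` and `identify`, and the cosine-similarity matching against enrolled speaker profiles are hypothetical illustrations, not the authors' implementation; in a real system `w` would be learned jointly with the encoders rather than drawn at random.

```python
import numpy as np


def attention_fuse(embeddings, w):
    """Fuse per-input embeddings (e.g. [td, ti]) with softmax attention.

    embeddings: list of 1-D arrays of equal dimension, or None for a missing input.
    w: attention scoring vector (hypothetical; would normally be learned).
    Returns the attention-weighted sum over the embeddings that are present.
    """
    present = [e for e in embeddings if e is not None]
    if not present:
        raise ValueError("at least one of the TD/TI embeddings must be present")
    stacked = np.stack(present)            # shape (k, d), k = number of present inputs
    scores = stacked @ w                   # one scalar relevance score per present input
    scores = scores - scores.max()         # shift for a numerically stable softmax
    weights = np.exp(scores) / np.exp(scores).sum()  # attention weights sum to 1
    return weights @ stacked               # fused embedding, shape (d,)


def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def identify(fused, enrolled):
    """Return the enrolled speaker whose profile is most similar to `fused`."""
    return max(enrolled, key=lambda name: cosine(fused, enrolled[name]))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 8
    w = rng.normal(size=d)                 # stand-in for a learned parameter
    enrolled = {"alice": rng.normal(size=d), "bob": rng.normal(size=d)}
    # TI input missing: only a (noisy) TD embedding of alice is available.
    td = enrolled["alice"] + 0.01 * rng.normal(size=d)
    print(identify(attention_fuse([td, None], w), enrolled))
```

Renormalizing the softmax over the present inputs means a missing TD or TI embedding simply drops out of the weighted sum, instead of being zero-filled, which would pull the fused vector toward the origin and distort similarity scores.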

