Exploring the Use of an Unsupervised Autoregressive Model as a Shared Encoder for Text-Dependent Speaker Verification

08/08/2020
by   Vijay Ravi, et al.

In this paper, we propose a novel way of addressing text-dependent automatic speaker verification (TD-ASV) by using a shared encoder with task-specific decoders. An autoregressive predictive coding (APC) encoder is pre-trained in an unsupervised manner on both out-of-domain (LibriSpeech, VoxCeleb) and in-domain (DeepMine) unlabeled datasets to learn a generic, high-level feature representation that encapsulates speaker and phonetic content. Two task-specific decoders are then trained on labeled datasets to classify speakers (SID) and phrases (PID). Speaker embeddings extracted from the SID decoder are scored using PLDA, and the SID and PID systems are fused at the score level. The proposed system improves on the supervised x-vector baseline by 51.9% on the cross-lingual DeepMine dataset. However, the i-vector/HMM method outperformed the proposed APC encoder-decoder system. Fusing the x-vector/PLDA baseline scores with the SID/PLDA scores prior to PID fusion further improved performance by 15%, indicating that the proposed approach is complementary to the x-vector system. We show that the proposed approach can leverage large, unlabeled, data-rich domains and learn speech patterns independent of downstream tasks. Such a system can provide competitive performance in domain-mismatched scenarios where test data come from data-scarce domains.
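
The abstract does not state how the SID and PID scores are combined, so the following is only a minimal sketch of one common form of score-level fusion: each system's trial scores are standardized and then mixed with a weighted sum. The function names, the z-normalization step, the `sid_weight` parameter, and the example scores are all assumptions made for illustration and are not taken from the paper.

```python
from statistics import mean, pstdev


def z_norm(scores):
    """Standardize scores to zero mean and unit variance (assumed normalization)."""
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        # All scores identical: nothing to scale, so return zeros.
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]


def fuse_scores(sid_scores, pid_scores, sid_weight=0.5):
    """Weighted sum of normalized SID and PID scores, one pair per trial.

    sid_weight is a hypothetical tuning parameter; in practice it would be
    chosen on a development set.
    """
    if len(sid_scores) != len(pid_scores):
        raise ValueError("SID and PID score lists must cover the same trials")
    sid_n = z_norm(sid_scores)
    pid_n = z_norm(pid_scores)
    return [sid_weight * s + (1.0 - sid_weight) * p
            for s, p in zip(sid_n, pid_n)]


# Made-up scores for three trials (illustrative only).
sid = [2.0, -1.0, 0.5]   # e.g. PLDA scores from the speaker (SID) branch
pid = [0.9, 0.8, 0.1]    # e.g. phrase-match scores from the PID branch
fused = fuse_scores(sid, pid, sid_weight=0.7)
```

A trial is accepted when its fused score exceeds a threshold, so a trial with the right speaker but the wrong phrase is pulled down by its PID score. Fusing a third system, such as x-vector/PLDA, would follow the same pattern with one more weighted term.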

research
08/05/2019

Cross-lingual Text-independent Speaker Verification using Unsupervised Adversarial Discriminative Domain Adaptation

Speaker verification systems often degrade significantly when there is a...
research
08/03/2022

The SJTU System for Short-duration Speaker Verification Challenge 2021

This paper presents the SJTU system for both text-dependent and text-ind...
research
10/18/2018

Unsupervised Neural Text Simplification

The paper presents a first attempt towards unsupervised neural text simp...
research
11/25/2020

Vocal Tract Length Perturbation for Text-Dependent Speaker Verification with Autoregressive Prediction Coding

In this letter, we propose a vocal tract length (VTL) perturbation metho...
research
04/05/2019

An Unsupervised Autoregressive Model for Speech Representation Learning

This paper proposes a novel unsupervised autoregressive neural model for...
research
01/07/2020

Learning Speaker Embedding with Momentum Contrast

Speaker verification can be formulated as a representation learning task...
research
06/01/2023

Encoder-decoder multimodal speaker change detection

The task of speaker change detection (SCD), which detects points where s...
