Exploring Voice Conversion based Data Augmentation in Text-Dependent Speaker Verification

11/21/2020
by   Xiaoyi Qin, et al.
0

In this paper, we focus on improving the performance of the text-dependent speaker verification system in the scenario of limited training data. The speaker verification system deep learning based text-dependent generally needs a large scale text-dependent training data set which could be labor and cost expensive, especially for customized new wake-up words. In recent studies, voice conversion systems that can generate high quality synthesized speech of seen and unseen speakers have been proposed. Inspired by those works, we adopt two different voice conversion methods as well as the very simple re-sampling approach to generate new text-dependent speech samples for data augmentation purposes. Experimental results show that the proposed method significantly improves the Equal Error Rare performance from 6.51 of limited training data.

READ FULL TEXT
POST COMMENT

Comments

There are no comments yet.

Authors

page 1

page 2

page 3

page 4

02/19/2021

Unit selection synthesis based data augmentation for fixed phrase speaker verification

Data augmentation is commonly used to help build a robust speaker verifi...
07/12/2020

Data augmentation enhanced speaker enrollment for text-dependent speaker verification

Data augmentation is commonly used for generating additional data from t...
04/05/2021

SpeakerStew: Scaling to Many Languages with a Triaged Multilingual Text-Dependent and Text-Independent Speaker Verification System

In this paper, we describe SpeakerStew - a hybrid system to perform spea...
12/05/2019

Towards Robust Neural Vocoding for Speech Generation: A Survey

Recently, neural vocoders have been widely used in speech synthesis task...
04/12/2017

Trainable Referring Expression Generation using Overspecification Preferences

Referring expression generation (REG) models that use speaker-dependent ...
08/21/2019

Dialog State Tracking with Reinforced Data Augmentation

Neural dialog state trackers are generally limited due to the lack of qu...
11/17/2021

Rapping-Singing Voice Synthesis based on Phoneme-level Prosody Control

In this paper, a text-to-rapping/singing system is introduced, which can...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.