Enrollment-less training for personalized voice activity detection

06/23/2021
by Naoki Makishima, et al.

We present a novel learning method for personalized voice activity detection (PVAD) that does not require enrollment data during training. PVAD is the task of detecting the speech segments of a specific target speaker at the frame level, using enrollment speech of that speaker. Because a PVAD model must learn each speaker's speech variations to clarify the boundary between speakers, previous studies on PVAD have used large-scale datasets that contain many utterances per speaker. However, such datasets are often limited because preparing them is costly. In addition, the datasets used to train standard VAD cannot be used because they often lack speaker labels. To solve these problems, our key idea is to use a single utterance as both a kind of enrollment speech and the input to the PVAD model during training, which enables PVAD training without enrollment speech. In our proposed method, called enrollment-less training, we augment the utterance to create variability between the input and the enrollment speech while preserving speaker identity, which avoids a mismatch between training and inference. Our experimental results demonstrate the efficacy of the method.
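
The key idea above, one utterance serving as both the model input and a stand-in for enrollment speech, can be illustrated with a minimal Python sketch. The abstract does not say which augmentations the authors use, so the random gain and additive Gaussian noise here are assumptions chosen for simplicity. The names `augment` and `make_training_pair` are hypothetical and do not come from the paper.

```python
import random


def augment(samples, rng, max_gain_db=6.0, noise_std=0.01):
    """Return a perturbed copy of `samples`.

    Assumption: random gain plus additive Gaussian noise stands in for
    whatever augmentation the paper actually applies.
    """
    # Draw one gain (in dB) for the whole utterance and convert to linear scale.
    gain = 10.0 ** (rng.uniform(-max_gain_db, max_gain_db) / 20.0)
    # Scale every sample and add independent noise to each one.
    return [gain * s + rng.gauss(0.0, noise_std) for s in samples]


def make_training_pair(utterance, rng):
    """Build a (model input, pseudo-enrollment) pair from a single utterance.

    Both views come from the same speaker's utterance, but each is
    augmented independently so the two views differ.
    """
    model_input = augment(utterance, rng)
    pseudo_enrollment = augment(utterance, rng)
    return model_input, pseudo_enrollment


# Toy waveform; a real utterance would be thousands of samples long.
rng = random.Random(0)
utterance = [0.0, 0.5, -0.5, 0.25]
model_input, pseudo_enrollment = make_training_pair(utterance, rng)
```

Because the two views are augmented independently, the model cannot simply match identical signals, which is the kind of variability between input and enrollment that the abstract describes. In practice the pair would feed a PVAD model together with frame-level target-speaker labels, which this sketch leaves out.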

research
04/05/2022

Improving Voice Trigger Detection with Metric Learning

Voice trigger detection is an important task, which enables activating a...
research
06/10/2020

Uniphore's submission to Fearless Steps Challenge Phase-2

We propose supervised systems for speech activity detection (SAD) and sp...
research
11/30/2020

Look who's not talking

The objective of this work is speaker diarisation of speech recordings '...
research
03/20/2020

Detecting Mismatch between Text Script and Voice-over Using Utterance Verification Based on Phoneme Recognition Ranking

The purpose of this study is to detect the mismatch between text script ...
research
08/07/2020

A Machine of Few Words – Interactive Speaker Recognition with Reinforcement Learning

Speaker recognition is a well known and studied task in the speech proce...
research
06/16/2022

Strategies to Improve Robustness of Target Speech Extraction to Enrollment Variations

Target speech extraction is a technique to extract the target speaker's ...
research
08/27/2020

Estimating Uniqueness of Human Voice Using I-Vector Representation

We study the individuality of human voice with respect to a widely used...
