Improving Voice Trigger Detection with Metric Learning

04/05/2022
by Prateeth Nayak, et al.

Voice trigger detection is an important task that enables activating a voice assistant when a target user speaks a keyword phrase. A detector is typically trained on speech data independent of speaker information and then used for the voice trigger detection task. However, such a speaker-independent voice trigger detector typically suffers from performance degradation on speech from underrepresented groups, such as accented speakers. In this work, we propose a novel voice trigger detector that can use a small number of utterances from a target speaker to improve detection accuracy. Our proposed model employs an encoder-decoder architecture. While the encoder performs speaker-independent voice trigger detection, similar to the conventional detector, the decoder predicts a personalized embedding for each utterance. A personalized voice trigger score is then obtained as a similarity score between the embeddings of the enrollment utterances and that of a test utterance. The personalized embedding allows adapting to the target speaker's speech when computing the voice trigger score, hence improving voice trigger detection accuracy. Experimental results show that the proposed approach achieves a 38% relative reduction in false rejection rate (FRR) compared to a baseline speaker-independent voice trigger model.
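As a rough illustration of the scoring step described above, the sketch below compares a test embedding against a set of enrollment embeddings. The abstract does not say which similarity measure is used or how multiple enrollment utterances are combined, so cosine similarity and simple averaging are assumptions here, and the function names `cosine_similarity` and `personalized_score` are hypothetical. The embeddings themselves would come from the model's decoder, which is not reproduced.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def personalized_score(
    enrollment_embeddings: Sequence[Sequence[float]],
    test_embedding: Sequence[float],
) -> float:
    """Average similarity of the test embedding to each enrollment embedding.

    Assumption: averaging per-utterance cosine scores; the paper may
    combine enrollment utterances differently.
    """
    if not enrollment_embeddings:
        raise ValueError("at least one enrollment embedding is required")
    scores = [cosine_similarity(e, test_embedding) for e in enrollment_embeddings]
    return sum(scores) / len(scores)
```

In use, a higher score would indicate that the test utterance resembles the target speaker's enrolled trigger phrases more closely; any decision threshold would have to be tuned on held-out data.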

research
06/23/2021

Enrollment-less training for personalized voice activity detection

We present a novel personalized voice activity detection (PVAD) learning...
research
04/18/2019

TTS Skins: Speaker Conversion via ASR

We present a fully convolutional wav-to-wav network for converting betwe...
research
06/18/2021

Fusion of Embeddings Networks for Robust Combination of Text Dependent and Independent Speaker Recognition

By implicitly recognizing a user based on his/her speech input, speaker ...
research
02/26/2021

The NPU System for the 2020 Personalized Voice Trigger Challenge

This paper describes the system developed by the NPU team for the 2020 p...
research
04/07/2021

The AS-NU System for the M2VoC Challenge

This paper describes the AS-NU systems for two tracks in MultiSpeaker Mu...
research
03/21/2023

Personalized Lightweight Text-to-Speech: Voice Cloning with Adaptive Structured Pruning

Personalized TTS is an exciting and highly desired application that allo...
research
04/12/2019

Building a mixed-lingual neural TTS system with only monolingual data

When deploying a Chinese neural text-to-speech (TTS) synthesis system, o...