An Integrated Framework for Two-pass Personalized Voice Trigger

06/30/2021
by   Dexin Liao, et al.

In this paper, we present the XMUSPEECH system for Task 1 of the 2020 Personalized Voice Trigger Challenge (PVTC2020). Task 1 is joint wake-up word detection and speaker verification on close-talking data. The whole system consists of a keyword spotting (KWS) sub-system and a speaker verification (SV) sub-system. For the KWS system, we applied a Temporal Depthwise Separable Convolution Residual Network (TDSC-ResNet) to improve the system's performance. For the SV system, we proposed a multi-task learning network in which the phonetic branch is trained with the character labels of the utterance and the speaker branch is trained with the speaker labels. The phonetic branch is optimized with a connectionist temporal classification (CTC) loss and serves as an auxiliary module for the speaker branch. Experiments show that our system achieves significant improvements over the baseline system.
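
The abstract describes a KWS sub-system followed by an SV sub-system but does not say how the two scores are combined. The sketch below shows one common way such a two-pass cascade could be wired together, under stated assumptions: the function names, the cosine-similarity scoring, and both threshold values are hypothetical and are not taken from the paper.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as carrying no speaker evidence
    return dot / (norm_a * norm_b)


def two_pass_trigger(
    kws_score: float,
    test_embedding: Sequence[float],
    enrolled_embedding: Sequence[float],
    kws_threshold: float = 0.5,  # assumed value, not from the paper
    sv_threshold: float = 0.7,   # assumed value, not from the paper
) -> bool:
    """Return True only if the wake-up word is detected AND the speaker matches."""
    # Pass 1: keyword spotting. Reject early so the SV pass is skipped
    # whenever the wake-up word is not detected.
    if kws_score < kws_threshold:
        return False
    # Pass 2: speaker verification against the enrolled speaker embedding.
    return cosine_similarity(test_embedding, enrolled_embedding) >= sv_threshold
```

In this arrangement the second pass only runs on utterances the first pass accepts, so the cost of speaker verification is paid only for candidate wake-ups.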

01/26/2020

Multi-task Learning for Speaker Verification and Voice Trigger Detection

Automatic speech transcription and speaker recognition are usually treat...
02/26/2021

The NPU System for the 2020 Personalized Voice Trigger Challenge

This paper describes the system developed by the NPU team for the 2020 p...
05/08/2020

Multi-Task Network for Noise-Robust Keyword Spotting and Speaker Verification using CTC-based Soft VAD and Global Query Attention

Keyword spotting (KWS) and speaker verification (SV) have been studied i...
10/19/2021

Rep Works in Speaker Verification

Multi-branch convolutional neural network architecture has raised lots o...
08/17/2020

WSRNet: Joint Spotting and Recognition of Handwritten Words

In this work, we present a unified model that can handle both Keyword Sp...
09/05/2021

Efficient Attention Branch Network with Combined Loss Function for Automatic Speaker Verification Spoof Detection

Many endeavors have sought to develop countermeasure techniques as enhan...
06/25/2019

Naver at ActivityNet Challenge 2019 -- Task B Active Speaker Detection (AVA)

This report describes our submission to the ActivityNet Challenge at CVP...