Multi-Task Network for Noise-Robust Keyword Spotting and Speaker Verification using CTC-based Soft VAD and Global Query Attention

05/08/2020
by Myunghun Jung, et al.

Keyword spotting (KWS) and speaker verification (SV) have been studied independently, although it is known that the acoustic and speaker domains are complementary. In this paper, we propose a multi-task network that performs KWS and SV simultaneously to fully utilize the interrelated domain information. The multi-task network tightly combines sub-networks, aiming at performance improvement in challenging conditions such as noisy environments, open-vocabulary KWS, and short-duration SV, by introducing two novel techniques: connectionist temporal classification (CTC)-based soft voice activity detection (VAD) and global query attention. Frame-level acoustic and speaker information is integrated with phonetically originated weights to form a word-level global representation, which is then used to aggregate feature vectors into discriminative embeddings. Our proposed approach shows 4.06% and 26.71% relative improvements in equal error rate (EER) compared to the baselines for both tasks. We also present a visualization example and results of ablation experiments.
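
The aggregation idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not give the formulas, so the sketch assumes that the soft VAD weight of a frame is one minus its CTC blank posterior, that the global query is the VAD-weighted mean of concatenated acoustic and speaker frames, and that attention scores come from a bilinear dot product. All function names, the projection matrix `proj`, and the dimensions are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)

def soft_vad_weights(ctc_posteriors, blank_index=0):
    # Assumption: frames the CTC model labels as "blank" carry little
    # phonetic content, so the weight is 1 - P(blank) per frame.
    return 1.0 - ctc_posteriors[:, blank_index]

def global_query(acoustic, speaker, vad):
    # Assumption: the word-level global representation is the
    # VAD-weighted mean of concatenated frame-level features.
    frames = np.concatenate([acoustic, speaker], axis=1)
    w = vad / (vad.sum() + 1e-8)
    return w @ frames

def attentive_pool(features, query, proj):
    # Score each frame against the global query (bilinear form, an
    # assumption), then take the attention-weighted sum of the frames.
    scores = features @ (proj @ query) / np.sqrt(features.shape[1])
    alpha = softmax(scores)
    return alpha @ features, alpha

# Toy data: T frames, V CTC labels, Da acoustic dims, Ds speaker dims.
rng = np.random.default_rng(0)
T, V, Da, Ds = 6, 5, 4, 3
ctc_posteriors = softmax(rng.normal(size=(T, V)))
acoustic = rng.normal(size=(T, Da))
speaker = rng.normal(size=(T, Ds))
proj = rng.normal(size=(Ds, Da + Ds))  # hypothetical learned projection

vad = soft_vad_weights(ctc_posteriors)
query = global_query(acoustic, speaker, vad)
embedding, alpha = attentive_pool(speaker, query, proj)
```

Here `embedding` is a single utterance-level vector pooled from the speaker frames, and `alpha` holds the per-frame attention weights, which sum to one.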

research
06/30/2021

An Integrated Framework for Two-pass Personalized Voice Trigger

In this paper, we present the XMUSPEECH system for Task 1 of 2020 Person...
research
11/23/2018

Training Multi-Task Adversarial Network For Extracting Noise-Robust Speaker Embedding

Under noisy environments, to achieve the robust performance of speaker r...
research
04/29/2019

Adversarial Speaker Verification

The use of deep networks to extract embeddings for speaker recognition h...
research
09/26/2019

Self-Adaptive Soft Voice Activity Detection using Deep Neural Networks for Robust Speaker Verification

Voice activity detection (VAD), which classifies frames as speech or non...
research
04/02/2018

Speaker-Invariant Training via Adversarial Learning

We propose a novel adversarial multi-task learning scheme, aiming at act...
research
03/31/2022

Learning Decoupling Features Through Orthogonality Regularization

Keyword spotting (KWS) and speaker verification (SV) are two important t...
research
06/22/2019

Keyword Spotting for Hearing Assistive Devices Robust to External Speakers

Keyword spotting (KWS) is experiencing an upswing due to the pervasivene...
