Learning Decoupling Features Through Orthogonality Regularization

03/31/2022
by Li Wang, et al.
Keyword spotting (KWS) and speaker verification (SV) are two important tasks in speech applications. Prior research shows that state-of-the-art KWS and SV models are trained independently on different datasets, since each is expected to learn distinct acoustic features. Humans, however, can recognize language content and speaker identity simultaneously. Motivated by this, we believe it is important to explore a method that effectively extracts common features while decoupling task-specific ones. To this end, we develop a two-branch deep network (a KWS branch and an SV branch) in which both branches share the same network structure, and we propose a novel decoupling feature learning method that improves KWS and SV performance simultaneously. The KWS branch is expected to learn speaker-invariant keyword representations, and the SV branch keyword-invariant speaker representations. Experiments are conducted on the Google Speech Commands Dataset (GSCD). The results demonstrate that the orthogonality regularization helps the network to achieve SOTA EER of 1.31
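The abstract does not give the exact form of the orthogonality regularization, so the following is only a minimal sketch of one common way to express the idea, not the authors' formulation. It assumes each branch produces one embedding per utterance and penalizes the squared cosine similarity between the paired KWS and SV embeddings, so the penalty is zero when the two are orthogonal. The names `orthogonality_penalty`, `total_loss`, and the `weight` parameter are hypothetical.

```python
import math


def orthogonality_penalty(kws_feats, sv_feats, eps=1e-12):
    """Mean squared cosine similarity between paired branch embeddings.

    kws_feats, sv_feats: lists of equal-length float vectors, one pair
    per utterance. This form of the penalty is an assumption; the
    abstract does not specify the regularizer.
    """
    if len(kws_feats) != len(sv_feats) or not kws_feats:
        raise ValueError("expected two non-empty batches of equal size")
    total = 0.0
    for k, s in zip(kws_feats, sv_feats):
        dot = sum(a * b for a, b in zip(k, s))
        norm_k = math.sqrt(sum(a * a for a in k))
        norm_s = math.sqrt(sum(b * b for b in s))
        # eps guards against division by zero for all-zero embeddings.
        cos = dot / (norm_k * norm_s + eps)
        total += cos * cos
    return total / len(kws_feats)


def total_loss(kws_loss, sv_loss, penalty, weight=0.1):
    """Combine the two task losses with the weighted penalty.

    The additive form and the default weight are illustrative assumptions.
    """
    return kws_loss + sv_loss + weight * penalty
```

In this sketch, driving the penalty toward zero discourages the two branches from encoding the same direction in feature space, which is one way to read the goal of speaker-invariant keyword features and keyword-invariant speaker features.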


research
10/03/2021

Multi-task Voice Activated Framework using Self-supervised Learning

Self-supervised learning methods such as wav2vec 2.0 have shown promisin...
research
06/30/2021

An Integrated Framework for Two-pass Personalized Voice Trigger

In this paper, we present the XMUSPEECH system for Task 1 of 2020 Person...
research
07/08/2022

A Multi-tasking Model of Speaker-Keyword Classification for Keeping Human in the Loop of Drone-assisted Inspection

Audio commands are a preferred communication medium to keep inspectors i...
research
08/17/2020

WSRNet: Joint Spotting and Recognition of Handwritten Words

In this work, we present a unified model that can handle both Keyword Sp...
research
05/08/2020

Multi-Task Network for Noise-Robust Keyword Spotting and Speaker Verification using CTC-based Soft VAD and Global Query Attention

Keyword spotting (KWS) and speaker verification (SV) have been studied i...
research
10/22/2020

Momentum Contrast Speaker Representation Learning

Unsupervised representation learning has shown remarkable achievement by...
research
08/12/2021

Text Anchor Based Metric Learning for Small-footprint Keyword Spotting

Keyword Spotting (KWS) remains challenging to achieve the trade-off betw...
