Robust Speech Representation Learning via Flow-based Embedding Regularization

12/07/2021
by   Woo Hyun Kang, et al.

In recent years, various deep learning-based methods have been proposed for extracting a fixed-dimensional embedding vector from speech signals. Although these embedding extraction methods perform well in numerous tasks, including speaker verification, language identification, and anti-spoofing, their performance degrades under mismatched conditions because the embeddings retain variability unrelated to the main task. To alleviate this problem, we propose a novel training strategy that regularizes the embedding network so that the embeddings carry minimal information about nuisance attributes. To achieve this, the proposed method incorporates the information bottleneck scheme directly into the training process, with the mutual information estimated using the main task classifier and an auxiliary normalizing flow network. The proposed method was evaluated on different speech processing tasks and improved over the standard training strategy in all experiments.
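
As a rough illustration of the ingredients named in the abstract, and not the paper's exact loss, the generic information bottleneck objective and two standard estimators it could draw on can be written as follows. The symbols $x$ (input speech), $z$ (embedding), $y$ (main-task label), $\beta$ (trade-off weight), $q_\phi$ (classifier), and $f_\theta$ (invertible flow with base density $p_0$) are assumptions introduced here for illustration; the abstract does not define any notation.

```latex
% Generic information bottleneck Lagrangian (assumed notation):
% compress the input while keeping information about the label.
\min_{p(z \mid x)} \; I(X; Z) \;-\; \beta \, I(Z; Y)

% Variational lower bound on I(Z;Y) obtained from a classifier q_\phi(y | z):
% the expectation is the negative cross-entropy of the classifier.
I(Z; Y) \;\ge\; H(Y) + \mathbb{E}_{p(y, z)}\!\left[ \log q_\phi(y \mid z) \right]

% Exact log-density of an embedding under an invertible flow f_\theta
% (change-of-variables formula with base density p_0).
\log p_\theta(z) \;=\; \log p_0\!\big(f_\theta(z)\big)
  + \log \left| \det \frac{\partial f_\theta(z)}{\partial z} \right|
```

Under this reading, minimizing the classifier's cross-entropy raises the bound on $I(Z;Y)$, while the flow supplies a tractable density that an estimate of the term to be minimized could use. How the paper actually combines the two is specified in its full text, not in this abstract.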

research
08/07/2020

Disentangled speaker and nuisance attribute embedding for robust speaker verification

Over the recent years, various deep learning-based embedding methods hav...
research
12/06/2020

Multi-task Learning Based Spoofing-Robust Automatic Speaker Verification System

Spoofing attacks posed by generating artificial speech can severely degr...
research
04/04/2022

Anti-Spoofing Using Transfer Learning with Variational Information Bottleneck

Recent advances in sophisticated synthetic speech generated from text-to...
research
07/21/2020

Optimization of data-driven filterbank for automatic speaker verification

Most of the speech processing applications use triangular filters spaced...
research
10/12/2021

UniSpeech-SAT: Universal Speech Representation Learning with Speaker Aware Pre-Training

Self-supervised learning (SSL) is a long-standing goal for speech proces...
research
02/03/2020

Within-sample variability-invariant loss for robust speaker recognition under noisy environments

Despite the significant improvements in speaker recognition enabled by d...
research
08/09/2023

CasCIFF: A Cross-Domain Information Fusion Framework Tailored for Cascade Prediction in Social Networks

Existing approaches for information cascade prediction fall into three m...
