
Robust Speech Representation Learning via Flow-based Embedding Regularization

12/07/2021
by Woo Hyun Kang, et al.
CRIM

In recent years, various deep learning-based methods have been proposed for extracting a fixed-dimensional embedding vector from speech signals. Although these embedding extraction methods have shown good performance in numerous tasks, including speaker verification, language identification, and anti-spoofing, their performance degrades under mismatched conditions because the embeddings retain variability unrelated to the main task. To alleviate this problem, we propose a novel training strategy that regularizes the embedding network to retain minimal information about nuisance attributes. To achieve this, the proposed method directly incorporates the information bottleneck scheme into the training process, where the mutual information is estimated using the main task classifier and an auxiliary normalizing flow network. The proposed method was evaluated on different speech processing tasks and showed improvement over the standard training strategy in all experiments.
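
For readers unfamiliar with the information bottleneck scheme mentioned above, its generic textbook objective can be written as follows. This is only a sketch of the standard formulation, not the exact loss used by the authors, which the abstract does not state. The symbols are assumptions introduced here for illustration: X is the input speech, Z the embedding, Y the main-task label, \theta the embedding network parameters, and \beta > 0 a trade-off weight.

```latex
% Generic information bottleneck objective (standard form, assumed notation):
% keep Z informative about the label Y while compressing away the rest of X.
\min_{\theta} \; -\, I(Z; Y) \;+\; \beta \, I(Z; X)
```

According to the abstract, the mutual information is estimated with the main task classifier and an auxiliary normalizing flow network; the precise estimators are given in the full text rather than here.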

08/07/2020

Disentangled speaker and nuisance attribute embedding for robust speaker verification

Over the recent years, various deep learning-based embedding methods hav...
04/04/2022

Anti-Spoofing Using Transfer Learning with Variational Information Bottleneck

Recent advances in sophisticated synthetic speech generated from text-to...
11/04/2022

SAMO: Speaker Attractor Multi-Center One-Class Learning for Voice Anti-Spoofing

Voice anti-spoofing systems are crucial auxiliaries for automatic speake...
07/21/2020

Optimization of data-driven filterbank for automatic speaker verification

Most of the speech processing applications use triangular filters spaced...
10/12/2021

UniSpeech-SAT: Universal Speech Representation Learning with Speaker Aware Pre-Training

Self-supervised learning (SSL) is a long-standing goal for speech proces...
10/27/2022

Time-Domain Based Embeddings for Spoofed Audio Representation

Anti-spoofing is the task of speech authentication. That is, identifying...
11/14/2016

Post Training in Deep Learning with Last Kernel

One of the main challenges of deep learning methods is the choice of an ...