Learning Speaker Representations with Mutual Information

12/01/2018
by Mirco Ravanelli, et al.

Learning good representations is of crucial importance in deep learning. Mutual Information (MI) and similar measures of statistical dependence are promising tools for learning these representations in an unsupervised way. Even though the mutual information between two random variables is hard to measure directly in high-dimensional spaces, some recent studies have shown that an implicit optimization of MI can be achieved with an encoder-discriminator architecture similar to that of Generative Adversarial Networks (GANs). In this work, we learn representations that capture speaker identities by maximizing the mutual information between the encoded representations of chunks of speech randomly sampled from the same sentence. The proposed encoder relies on the SincNet architecture and transforms the raw speech waveform into a compact feature vector. The discriminator is fed either positive samples (drawn from the joint distribution of encoded chunks) or negative samples (drawn from the product of the marginals) and is trained to separate them. We report experiments showing that this approach effectively learns useful speaker representations, leading to promising results on speaker identification and verification tasks. Our experiments consider both unsupervised and semi-supervised settings and compare the performance achieved with different objective functions.
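The sampling scheme described in the abstract can be sketched in a few lines: a positive pair is two chunks drawn from the same sentence (a sample from the joint distribution of encoded chunks), a negative pair combines chunks from different sentences (a sample from the product of the marginals), and a discriminator is trained with a GAN-style binary cross-entropy to tell them apart. Everything below is a hypothetical, simplified stand-in: a fixed linear projection replaces the paper's SincNet encoder, a dot product replaces the learned discriminator, and Gaussian "speaker components" replace real speech.

```python
import math
import random

random.seed(0)

def encoder(chunk, W):
    """Toy stand-in for the SincNet encoder: a fixed linear projection of a
    raw-waveform chunk into a compact embedding (illustration only)."""
    return [math.tanh(sum(w * x for w, x in zip(row, chunk))) for row in W]

def discriminator(z1, z2):
    """Minimal discriminator: the dot product of two embeddings. A high
    score means the pair looks like a joint sample (same sentence)."""
    return sum(a * b for a, b in zip(z1, z2))

def bce_objective(pos_score, neg_score):
    """GAN-style binary cross-entropy: push positive-pair scores toward 1
    and negative-pair scores toward 0. Maximizing this with respect to the
    encoder implicitly maximizes a lower bound on the MI between chunks."""
    sigmoid = lambda s: 1.0 / (1.0 + math.exp(-s))
    return (math.log(sigmoid(pos_score) + 1e-12)
            + math.log(1.0 - sigmoid(neg_score) + 1e-12))

dim, emb = 32, 4
W = [[random.gauss(0, 1 / math.sqrt(dim)) for _ in range(dim)]
     for _ in range(emb)]

# Two 'sentences': chunks from the same sentence share a speaker component,
# so their embeddings are statistically dependent.
speakers = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(2)]

def chunk(spk):
    return [s + 0.3 * random.gauss(0, 1) for s in speakers[spk]]

# Positive pair: two chunks sampled from the same sentence (joint distribution).
z_pos_a, z_pos_b = encoder(chunk(0), W), encoder(chunk(0), W)
# Negative pair: chunks from different sentences (product of marginals).
z_neg_a, z_neg_b = encoder(chunk(0), W), encoder(chunk(1), W)

pos = discriminator(z_pos_a, z_pos_b)
neg = discriminator(z_neg_a, z_neg_b)
print("positive-pair score:", round(pos, 3))
print("negative-pair score:", round(neg, 3))
print("objective:", round(bce_objective(pos, neg), 3))
```

In the paper the encoder and discriminator are trained jointly by gradient ascent on this objective over many sampled pairs; the sketch only evaluates the objective once to make the positive/negative construction concrete.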


