Self-supervised Predictive Coding Models Encode Speaker and Phonetic Information in Orthogonal Subspaces

05/21/2023
by Oli Liu, et al.

Self-supervised speech representations are known to encode both speaker and phonetic information, but how this information is distributed in their high-dimensional space remains largely unexplored. We hypothesize that the two types of information are encoded in orthogonal subspaces, a property that lends itself to simple disentanglement. Applying principal component analysis to the representations of two predictive coding models, we identify two subspaces that capture speaker and phonetic variance, respectively, and confirm that they are nearly orthogonal. Based on this property, we propose a new speaker normalization method that collapses the subspace encoding speaker information, without requiring transcriptions. Probing experiments show that our method effectively eliminates speaker information and outperforms a previous baseline on phone discrimination tasks. Moreover, the approach generalizes: it can also be used to remove information about unseen speakers.
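The pipeline described in the abstract can be illustrated with a minimal NumPy sketch. It assumes frame-level features with speaker labels (and, for the orthogonality check only, phone labels), estimates each subspace by PCA on label-averaged features, measures orthogonality through the cosines of the principal angles between the two subspaces, and "collapses" the speaker subspace by orthogonally projecting every frame onto its complement. The toy data, the helper names (`top_pcs`, `group_means`, `principal_angle_cosines`, `collapse_subspace`, `total_variance`) and the choice of k = 2 components are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np


def top_pcs(x: np.ndarray, k: int) -> np.ndarray:
    """Top-k principal directions of x (rows are samples), returned as a d x k matrix."""
    centered = x - x.mean(axis=0, keepdims=True)
    # Rows of vt are orthonormal principal directions, sorted by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:k].T


def group_means(feats: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Average frame-level features within each label (e.g. each speaker or each phone)."""
    return np.stack([feats[labels == u].mean(axis=0) for u in np.unique(labels)])


def principal_angle_cosines(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Cosines of the principal angles between span(a) and span(b); a, b have orthonormal columns."""
    return np.linalg.svd(a.T @ b, compute_uv=False)


def collapse_subspace(feats: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Remove the component of each frame that lies in span(basis) (orthogonal projection)."""
    return feats - (feats @ basis) @ basis.T


def total_variance(x: np.ndarray) -> float:
    """Sum of per-dimension variances across rows."""
    return float(np.var(x, axis=0).sum())


# Hypothetical toy data: speaker identity shifts dims 0-1, phone identity shifts dims 2-3.
rng = np.random.default_rng(0)
n_spk, n_ph, reps, dim = 10, 6, 5, 8
spk_offsets = np.zeros((n_spk, dim))
spk_offsets[:, :2] = 3.0 * rng.normal(size=(n_spk, 2))
ph_offsets = np.zeros((n_ph, dim))
ph_offsets[:, 2:4] = 3.0 * rng.normal(size=(n_ph, 2))
spk_grid, ph_grid = np.meshgrid(np.arange(n_spk), np.arange(n_ph), indexing="ij")
spk = np.repeat(spk_grid.ravel(), reps)  # every speaker utters every phone `reps` times
ph = np.repeat(ph_grid.ravel(), reps)
feats = spk_offsets[spk] + ph_offsets[ph] + 0.05 * rng.normal(size=(spk.size, dim))

# PCA on speaker-averaged and phone-averaged frames (k = 2 is a hypothetical choice).
spk_basis = top_pcs(group_means(feats, spk), k=2)
ph_basis = top_pcs(group_means(feats, ph), k=2)
cosines = principal_angle_cosines(spk_basis, ph_basis)  # values near 0 mean near-orthogonal

# Speaker normalization: collapse the speaker subspace; only speaker labels are used here.
normalized = collapse_subspace(feats, spk_basis)
spk_ratio = total_variance(group_means(normalized, spk)) / total_variance(group_means(feats, spk))
ph_ratio = total_variance(group_means(normalized, ph)) / total_variance(group_means(feats, ph))
print(cosines, spk_ratio, ph_ratio)
```

On this toy data the cosines come out close to 0, and collapsing the speaker subspace removes almost all between-speaker variance while leaving the between-phone variance essentially intact. With real model features, the number of components to collapse would have to be chosen empirically.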

