Analysing Discrete Self Supervised Speech Representation for Spoken Language Modeling

01/02/2023
by Amitay Sicherman, et al.

This work presents an in-depth analysis of discrete self-supervised speech representations (units) through the lens of Generative Spoken Language Modeling (GSLM). Based on the findings of this analysis, we propose practical improvements to the discrete units used for GSLM. First, we study these units along three axes: interpretation, visualization, and resynthesis. Our analysis finds a high correlation between the speech units and both phonemes and phoneme families, while their correlation with speaker or gender is weaker. Additionally, we find redundancies in the extracted units and suggest that one cause may be the units' context. Following this analysis, we propose a new, unsupervised metric to measure unit redundancies. Finally, we use this metric to develop new methods that improve the robustness of unit clustering and show significant improvement on zero-resource speech metrics such as ABX. Code and analysis tools are available at https://github.com/slp-rl/SLM-Discrete-Representations
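To make the idea of a unit-to-phoneme correlation concrete, here is a minimal sketch of one simple way to quantify it: cluster purity over frame-aligned labels. This is an illustration only, not the metric or code from the paper. The function name `unit_phoneme_purity`, the toy unit IDs, and the phoneme labels are all hypothetical, and the sketch assumes each frame already has one discrete unit and one phoneme label from a forced alignment.

```python
from collections import Counter, defaultdict


def unit_phoneme_purity(units, phonemes):
    """Return (purity, unit_to_phoneme) for two frame-aligned sequences.

    Purity is the fraction of frames whose phoneme equals the most
    frequent phoneme of their unit. This is an illustrative measure,
    not necessarily the one used in the paper.
    """
    if len(units) != len(phonemes):
        raise ValueError("units and phonemes must be frame-aligned")
    if not units:
        raise ValueError("sequences must be non-empty")

    # Count how often each phoneme co-occurs with each unit.
    counts = defaultdict(Counter)
    for unit, phoneme in zip(units, phonemes):
        counts[unit][phoneme] += 1

    # Map every unit to its most frequent phoneme.
    unit_to_phoneme = {u: c.most_common(1)[0][0] for u, c in counts.items()}

    # Frames that agree with their unit's majority phoneme.
    matched = sum(c[unit_to_phoneme[u]] for u, c in counts.items())
    return matched / len(units), unit_to_phoneme


# Hypothetical toy data: 8 frames, 3 distinct units.
units = [3, 3, 7, 7, 7, 12, 12, 3]
phonemes = ["AH", "AH", "S", "S", "T", "T", "T", "AH"]
purity, mapping = unit_phoneme_purity(units, phonemes)
print(purity, mapping)
```

A purity close to 1 means that most units map cleanly onto a single phoneme. Replacing the phoneme labels with speaker or gender labels gives the analogous score for those attributes, which is one way to compare the strength of the two relationships.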

