General-Purpose Speech Representation Learning through a Self-Supervised Multi-Granularity Framework

02/03/2021
by Yucheng Zhao, et al.

This paper presents a self-supervised learning framework, named MGF, for general-purpose speech representation learning. The design of MGF takes the hierarchical structure of speech into account. Specifically, we propose to use generative learning approaches to capture fine-grained information at small time scales and discriminative learning approaches to distill coarse-grained or semantic information at large time scales. For phoneme-scale learning, we borrow the idea of the masked language model but tailor it to the continuous speech signal by replacing the classification loss with a contrastive loss. We corroborate our design by evaluating the MGF representation on various downstream tasks, including phoneme classification, speaker classification, speech recognition, and emotion classification. Experiments verify that training at different time scales requires different training targets and loss functions, which in general complement each other and lead to better performance.
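
To make the phoneme-scale idea concrete, the following is a minimal sketch of what replacing a classification loss with a contrastive loss at masked positions could look like. It is not the authors' implementation: the abstract does not specify the loss, so the InfoNCE form, the cosine similarity, the in-batch negatives, the temperature of 0.1, and the function name `masked_info_nce` are all assumptions made for illustration.

```python
import numpy as np


def masked_info_nce(predictions, targets, temperature=0.1):
    """InfoNCE-style contrastive loss over masked frames (illustrative only).

    predictions: (n_masked, dim) model outputs at the masked positions.
    targets:     (n_masked, dim) continuous target features at the same positions.
    Row i of `targets` is the positive for row i of `predictions`; all other
    rows act as negatives (an assumption, not taken from the paper).
    """
    # L2-normalise so that the dot product is a cosine similarity.
    p = predictions / np.linalg.norm(predictions, axis=1, keepdims=True)
    t = targets / np.linalg.norm(targets, axis=1, keepdims=True)

    # Similarity of every prediction to every candidate target.
    logits = p @ t.T / temperature  # shape: (n_masked, n_masked)

    # Numerically stable log-softmax over the candidates.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # The positive pair for each masked frame sits on the diagonal.
    return float(-np.mean(np.diag(log_probs)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    targets = rng.normal(size=(8, 16))
    # Predictions that match their targets should score a lower loss
    # than predictions paired with the wrong targets.
    print(masked_info_nce(targets, targets))
    print(masked_info_nce(np.roll(targets, 1, axis=0), targets))
```

Unlike a masked language model, which classifies each masked token over a fixed vocabulary, this kind of loss only asks the model to pick the correct continuous target out of a set of candidates, which is why it suits a signal that has no discrete vocabulary.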


research
06/07/2023

Label Aware Speech Representation Learning For Language Identification

Speech representation learning approaches for non-semantic tasks such as...
research
10/27/2020

Speech SIMCLR: Combining Contrastive and Reconstruction Objective for Self-supervised Speech Representation Learning

Self-supervised visual pretraining has shown significant progress recent...
research
07/20/2023

MASR: Metadata Aware Speech Representation

In the recent years, speech representation learning is constructed prima...
research
12/01/2022

A General Purpose Supervisory Signal for Embodied Agents

Training effective embodied AI agents often involves manual reward engin...
research
11/15/2021

Scaling Law for Recommendation Models: Towards General-purpose User Representations

A recent trend shows that a general class of models, e.g., BERT, GPT-3, ...
research
04/08/2023

Unsupervised Speech Representation Pooling Using Vector Quantization

With the advent of general-purpose speech representations from large-sca...
research
06/01/2023

Exploration on HuBERT with Multiple Resolutions

Hidden-unit BERT (HuBERT) is a widely-used self-supervised learning (SSL...
