Cocktail HuBERT: Generalized Self-Supervised Pre-training for Mixture and Single-Source Speech

03/20/2023
by Maryam Fazel-Zarandi, et al.

Self-supervised learning leverages unlabeled data effectively, improving label efficiency and generalization to domains without labeled data. While recent work has studied generalization to more acoustic/linguistic domains, languages, and modalities, these investigations are limited to single-source speech with one primary speaker in the recording. This paper presents Cocktail HuBERT, a self-supervised learning framework that generalizes to mixture speech using a masked pseudo source separation objective. This objective encourages the model to identify the number of sources, separate and understand the context, and infer the content of masked regions represented as discovered units. Cocktail HuBERT outperforms state-of-the-art results with 69% lower WER on multi-speaker ASR and 31% lower DER on diarization, and is competitive on single- and multi-speaker tasks from SUPERB.
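To make the idea of a masked pseudo source separation objective more concrete, the following is a minimal sketch and not the authors' implementation. It assumes that a training mixture is formed by overlaying a second utterance on a primary one, that the model has one prediction head per source, and that head-to-source assignment is resolved with a permutation-invariant loss computed over masked frames only. The abstract does not confirm these details, and all function names here (`mix_waveforms`, `masked_cross_entropy`, `permutation_invariant_loss`) are hypothetical.

```python
import itertools
import math


def mix_waveforms(primary, secondary, offset, gain=1.0):
    """Overlay `secondary` onto `primary` starting at sample `offset`.

    Assumption: the mixture keeps the length of the primary utterance,
    so any part of `secondary` that runs past the end is dropped.
    """
    mixed = list(primary)
    for i, sample in enumerate(secondary):
        j = offset + i
        if j >= len(mixed):
            break
        mixed[j] += gain * sample
    return mixed


def masked_cross_entropy(log_probs, targets, masked_frames):
    """Mean negative log-likelihood of the target units on masked frames.

    log_probs[t][u] is the log-probability of unit u at frame t.
    """
    if not masked_frames:
        raise ValueError("at least one masked frame is required")
    total = sum(log_probs[t][targets[t]] for t in masked_frames)
    return -total / len(masked_frames)


def permutation_invariant_loss(head_log_probs, source_targets, masked_frames):
    """Smallest summed loss over all assignments of heads to sources.

    Assumption: the number of heads equals the number of target sequences.
    """
    best = math.inf
    for perm in itertools.permutations(range(len(source_targets))):
        total = sum(
            masked_cross_entropy(head_log_probs[h], source_targets[s], masked_frames)
            for h, s in enumerate(perm)
        )
        best = min(best, total)
    return best


# Toy example: two sources, two frames, two discovered units.
hi, lo = math.log(0.9), math.log(0.1)
targets = [[0, 0], [1, 1]]            # unit sequence for each source
heads = [
    [[lo, hi], [lo, hi]],             # head 0 favors unit 1 (source 1)
    [[hi, lo], [hi, lo]],             # head 1 favors unit 0 (source 0)
]
loss = permutation_invariant_loss(heads, targets, masked_frames=[0, 1])
mixture = mix_waveforms([1.0, 1.0, 1.0, 1.0], [2.0, 2.0, 2.0], offset=2)
```

In the toy example the heads predict the sources in swapped order, and the permutation search still finds the low-loss assignment, which is why a model trained this way would not need a fixed ordering of speakers.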

