Self-Supervised Learning from Contrastive Mixtures for Personalized Speech Enhancement

11/06/2020
by   Aswin Sivaraman, et al.

This work explores how self-supervised learning can be universally used to discover speaker-specific features towards enabling personalized speech enhancement models. We specifically address the few-shot learning scenario where access to clean recordings of a test-time speaker is limited to a few seconds, but noisy recordings of the speaker are abundant. We develop a simple contrastive learning procedure which treats the abundant noisy data as makeshift training targets through pairwise noise injection: the model is pretrained to maximize agreement between pairs of differently deformed identical utterances and to minimize agreement between pairs of similarly deformed nonidentical utterances. Our experiments compare the proposed pretraining approach with two baseline alternatives: speaker-agnostic fully-supervised pretraining, and speaker-specific self-supervised pretraining without contrastive loss terms. Of all three approaches, the proposed method using contrastive mixtures is found to be most robust to model compression (using 85% fewer parameters) and reduced clean speech (requiring only 3 seconds).
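The pairwise noise injection described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption for illustration only: the helper names (`mix_at_snr`, `make_contrastive_pairs`, `agreement`, `contrastive_loss`), the fixed SNR, the use of cosine similarity as the agreement measure, and the hinge on the negative term are not taken from the paper, which may define its loss differently.

```python
import numpy as np


def mix_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech`, scaled so the mixture has the requested SNR in dB."""
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    scale = np.sqrt(speech_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return speech + scale * noise


def make_contrastive_pairs(utt_a, utt_b, noise_1, noise_2, snr_db=5.0):
    """Build one positive and one negative pair (hypothetical helper).

    Positive: the same utterance deformed by two different noises.
    Negative: two different utterances deformed by the same noise.
    """
    positive = (mix_at_snr(utt_a, noise_1, snr_db), mix_at_snr(utt_a, noise_2, snr_db))
    negative = (mix_at_snr(utt_a, noise_1, snr_db), mix_at_snr(utt_b, noise_1, snr_db))
    return positive, negative


def agreement(x, y, eps=1e-8):
    """Cosine similarity, used here as an assumed stand-in for 'agreement'."""
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y) + eps))


def contrastive_loss(model, positive, negative):
    """Reward agreement on the positive pair and penalize it on the negative pair."""
    pos = agreement(model(positive[0]), model(positive[1]))
    neg = agreement(model(negative[0]), model(negative[1]))
    return (1.0 - pos) + max(0.0, neg)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Random signals stand in for the speaker's utterances and the injected noises.
    utt_a, utt_b = rng.standard_normal(16000), rng.standard_normal(16000)
    noise_1, noise_2 = rng.standard_normal(16000), rng.standard_normal(16000)
    positive, negative = make_contrastive_pairs(utt_a, utt_b, noise_1, noise_2)
    # An identity function stands in for the enhancement model being pretrained.
    print(contrastive_loss(lambda x: x, positive, negative))
```

In the setting the abstract describes, `utt_a` and `utt_b` would be the speaker's abundant noisy recordings rather than clean speech, and `model` would be the network under pretraining.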

research
04/05/2021

Self-Supervised Learning for Personalized Speech Enhancement

Speech enhancement systems can show improved performance by adapting the...
research
04/05/2021

Personalized Speech Enhancement through Self-Supervised Data Augmentation and Purification

Training personalized speech enhancement models is innately a no-shot le...
research
07/05/2023

Self-supervised learning with diffusion-based multichannel speech enhancement for speaker verification under noisy conditions

The paper introduces Diff-Filter, a multichannel speech enhancement appr...
research
02/07/2022

Self-supervised Speaker Recognition Training Using Human-Machine Dialogues

Speaker recognition, recognizing speaker identities based on voice alone...
research
06/08/2020

Speaker Diarization as a Fully Online Learning Problem in MiniVox

We proposed a novel AI framework to conduct real-time multi-speaker diar...
research
05/08/2021

Zero-Shot Personalized Speech Enhancement through Speaker-Informed Model Selection

This paper presents a novel zero-shot learning approach towards personal...
research
08/03/2023

MAP: A Model-agnostic Pretraining Framework for Click-through Rate Prediction

With the widespread application of personalized online services, click-t...
