Zero-Shot Personalized Speech Enhancement through Speaker-Informed Model Selection

05/08/2021
by   Aswin Sivaraman, et al.

This paper presents a novel zero-shot learning approach to personalized speech enhancement using a sparsely active ensemble model. Optimizing a speech denoising system for a particular test-time speaker can improve performance and reduce run-time complexity. However, test-time model adaptation is challenging when no data can be collected from the test-time speaker. To this end, we propose an ensemble model in which each specialist module denoises noisy utterances from a distinct partition of the training set speakers. A gating module inexpensively estimates test-time speaker characteristics in the form of an embedding vector and selects the most appropriate specialist module for denoising the test signal. Grouping the training set speakers into non-overlapping, semantically similar groups is non-trivial and ill-defined. To form these groups, we first train a Siamese network on pairs of noisy speech, maximizing the similarity of its output vectors when both utterances come from the same speaker and minimizing it otherwise. Next, we perform k-means clustering on the latent space formed by the averaged embedding vectors per training set speaker. In this way, we designate speaker groups and train specialist modules, each optimized for one partition of the complete training set. Our experiments show that ensemble models made up of low-capacity specialists can outperform high-capacity generalist models with greater efficiency and improved adaptation to unseen test-time speakers.
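
The speaker-grouping and gating steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that utterance-level embeddings have already been produced by a trained Siamese network, and it substitutes synthetic random vectors for them. The function names (`average_speaker_embeddings`, `kmeans`, `select_specialist`), the use of Euclidean distance, and all sizes are hypothetical choices made for illustration only.

```python
import numpy as np


def average_speaker_embeddings(utt_embeddings, speaker_ids):
    """Average utterance-level embeddings into one vector per speaker."""
    speakers = np.unique(speaker_ids)
    means = np.stack(
        [utt_embeddings[speaker_ids == s].mean(axis=0) for s in speakers]
    )
    return speakers, means


def pairwise_distances(points, centroids):
    """Euclidean distance from every point to every centroid."""
    return np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=-1)


def kmeans(points, k, n_iters=50, seed=0):
    """Plain Lloyd's k-means; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Initialize centroids from k distinct data points.
    centroids = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(n_iters):
        labels = pairwise_distances(points, centroids).argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    # Final assignment, consistent with the final centroids.
    labels = pairwise_distances(points, centroids).argmin(axis=1)
    return centroids, labels


def select_specialist(test_embedding, centroids):
    """Gating step: index of the speaker group closest to the test embedding."""
    return int(np.linalg.norm(centroids - test_embedding, axis=1).argmin())


# Synthetic stand-in for Siamese-network outputs (assumption, not real data).
rng = np.random.default_rng(0)
n_speakers, utts_per_speaker, dim, k = 12, 5, 8, 3
group_centers = rng.normal(scale=10.0, size=(k, dim))
true_group = np.arange(n_speakers) % k
speaker_ids = np.repeat(np.arange(n_speakers), utts_per_speaker)
utt_embeddings = group_centers[true_group[speaker_ids]] + rng.normal(
    scale=0.5, size=(len(speaker_ids), dim)
)

# Training time: one averaged embedding per speaker, then k speaker groups.
speakers, speaker_means = average_speaker_embeddings(utt_embeddings, speaker_ids)
centroids, groups = kmeans(speaker_means, k)

# Test time: embed the unseen speaker's utterance and pick one specialist.
test_embedding = group_centers[1] + rng.normal(scale=0.5, size=dim)
chosen = select_specialist(test_embedding, centroids)
print("speaker groups:", groups)
print("selected specialist:", chosen)
```

In this sketch only the selected specialist would run on the test signal, which is the sense in which the ensemble is sparsely active. Each specialist denoiser would be trained separately on the speakers assigned to its group.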


