Self-supervised learning with diffusion-based multichannel speech enhancement for speaker verification under noisy conditions

07/05/2023
by Sandipana Dowerah, et al.

The paper introduces Diff-Filter, a multichannel speech enhancement approach based on the diffusion probabilistic model, for improving speaker verification performance under noisy and reverberant conditions. It also presents a new two-stage training procedure that takes advantage of self-supervised learning. In the first stage, the Diff-Filter is trained by conducting time-domain speech filtering using a scoring-based diffusion model. In the second stage, the Diff-Filter is jointly optimized with a pre-trained ECAPA-TDNN speaker verification model under a self-supervised learning framework. The paper also proposes a novel loss based on equal error rate, which is used to conduct self-supervised learning on a dataset that is not labelled in terms of speakers. The proposed approach is evaluated on MultiSV, a multichannel speaker verification dataset, and shows significant improvements in performance under noisy multichannel conditions.
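The abstract does not give the form of the loss based on equal error rate, so the sketch below illustrates only the underlying metric, not the authors' loss. It assumes a set of verification trial scores with binary target/non-target labels, and the function name `equal_error_rate` is hypothetical. The equal error rate (EER) is the operating point at which the false acceptance rate and the false rejection rate coincide.

```python
import numpy as np


def equal_error_rate(scores, labels):
    """Approximate the EER of a set of verification trials.

    scores: similarity scores, higher means "more likely the same speaker".
    labels: 1/True for target trials, 0/False for non-target trials.
    Assumes both trial types are present. This is an illustrative metric
    computation, not the loss proposed in the paper.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)

    # Use every observed score as a candidate threshold;
    # a trial is accepted when its score is >= the threshold.
    thresholds = np.unique(scores)
    far = np.array([np.mean(scores[~labels] >= t) for t in thresholds])
    frr = np.array([np.mean(scores[labels] < t) for t in thresholds])

    # Pick the threshold where the two error rates are closest
    # and report their average as the EER estimate.
    idx = np.argmin(np.abs(far - frr))
    return float((far[idx] + frr[idx]) / 2.0)


if __name__ == "__main__":
    # Toy trials: three target scores followed by three non-target scores.
    toy_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.1]
    toy_labels = [1, 1, 1, 0, 0, 0]
    print(equal_error_rate(toy_scores, toy_labels))
```

Because this computation relies on hard thresholding, it is not differentiable as written; a training loss derived from EER would need some smooth surrogate, whose details are in the full paper rather than the abstract.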


