Cosine-Distance Virtual Adversarial Training for Semi-Supervised Speaker-Discriminative Acoustic Embeddings

08/09/2020
by Florian L. Kreyssig, et al.

In this paper, we propose a semi-supervised learning (SSL) technique for training deep neural networks (DNNs) to generate speaker-discriminative acoustic embeddings (speaker embeddings). Obtaining large amounts of speaker recognition training data can be difficult for desired target domains, especially under privacy constraints. The proposed technique reduces the requirement for labelled data by leveraging unlabelled data. The technique is a variant of virtual adversarial training (VAT) [1] in the form of a loss that is defined as the robustness of the speaker embedding against input perturbations, as measured by the cosine distance. Thus, we term the technique cosine-distance virtual adversarial training (CD-VAT). In contrast to many existing SSL techniques, the unlabelled data does not have to come from the same set of classes (here speakers) as the labelled data. The effectiveness of CD-VAT is shown on the 2750+ hour VoxCeleb data set, where on a speaker verification task it achieves a reduction in equal error rate (EER) of 11.1% relative to a purely supervised baseline. This is 32.5% of the improvement that would be achieved from supervised training if the speaker labels for the unlabelled data were available.
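
The loss described above can be illustrated with a minimal sketch. It follows the general VAT recipe (one power-iteration step to find a sensitive input direction, then a distance between clean and perturbed outputs), with the cosine distance between embeddings as the distance measure. The abstract does not give the implementation details, so everything here is an assumption made for illustration: the toy linear embedding `embed`, the finite-difference gradient standing in for backpropagation, and the hypothetical parameters `epsilon`, `probe` and `h`. Note that no speaker label is used anywhere, which is why the loss can be applied to unlabelled data.

```python
import math
import random


def embed(W, x):
    """Toy linear 'speaker embedding' (assumption): one dot product per row of W."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def cosine_distance(a, b):
    """1 - cosine similarity; returns 0.0 if either vector is all zeros (toy guard)."""
    dot = sum(p * q for p, q in zip(a, b))
    na = math.sqrt(sum(p * p for p in a))
    nb = math.sqrt(sum(q * q for q in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return 1.0 - dot / (na * nb)


def l2_normalise(v):
    """Scale v to unit L2 norm; an all-zero vector is returned unchanged."""
    n = math.sqrt(sum(p * p for p in v))
    return list(v) if n == 0.0 else [p / n for p in v]


def cd_vat_loss(W, x, epsilon=1.0, probe=0.1, h=1e-4, seed=0):
    """Cosine distance between the clean and adversarially perturbed embedding.

    epsilon, probe and h are hypothetical values chosen for this sketch.
    """
    rng = random.Random(seed)
    clean = embed(W, x)  # treated as a fixed target

    def dist(r):
        perturbed = [v + dv for v, dv in zip(x, r)]
        return cosine_distance(clean, embed(W, perturbed))

    # Start from a small random direction of norm `probe`.
    d = l2_normalise([rng.gauss(0.0, 1.0) for _ in x])
    base = [probe * di for di in d]

    # One power-iteration step: gradient of the distance with respect to the
    # perturbation, estimated here by central finite differences.
    grad = []
    for i in range(len(x)):
        plus, minus = list(base), list(base)
        plus[i] += h
        minus[i] -= h
        grad.append((dist(plus) - dist(minus)) / (2.0 * h))

    # Adversarial perturbation: the gradient direction scaled to norm epsilon.
    r_adv = [epsilon * g for g in l2_normalise(grad)]
    return dist(r_adv)
```

For example, `cd_vat_loss([[1.0, 0.5, -0.3], [0.2, -1.0, 0.8]], [0.4, -0.7, 1.2])` returns a value between 0 and 2. In a semi-supervised setup this term would typically be added, with some weight, to the supervised loss computed on the labelled data, and a real DNN would obtain the gradient by backpropagation rather than finite differences.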
