Short utterance compensation in speaker verification via cosine-based teacher-student learning of speaker embeddings

10/25/2018
by Jee-weon Jung, et al.

Short input utterances are among the most critical factors that degrade the performance of speaker verification systems. This study aimed to develop an integrated text-independent speaker verification system that accepts utterances with short durations of 2.05 seconds. To this end, we propose a teacher-student learning framework that maximizes the cosine similarity between two speaker embeddings extracted from long and short utterances. In the proposed architecture, convolutional layers extract phonetic-level features, each representing a 130 ms segment, and gated recurrent units then derive an utterance-level speaker embedding from these features. Experiments were conducted on the VoxCeleb 1 dataset using deep neural networks that take raw waveforms as input and output speaker embeddings. The equal error rates without short utterance compensation are 8.72 respectively. The proposed model with compensation exhibits an equal error rate of 10.08 performance degradation.
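
The cosine-based objective described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `cosine_similarity` and `cosine_ts_loss`, the use of plain Python lists in place of network outputs, and the choice of the negative mean cosine similarity as the loss are all assumptions made for illustration. The teacher embedding is taken to come from a long utterance and the student embedding from a short crop of the same utterance.

```python
import math

def cosine_similarity(a, b, eps=1e-8):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / max(norm, eps)  # eps guards against zero-norm vectors

def cosine_ts_loss(teacher_embs, student_embs):
    """Hypothetical teacher-student loss: negative mean cosine similarity.

    Minimizing this value maximizes the similarity between each teacher
    (long-utterance) embedding and its student (short-utterance) counterpart.
    """
    sims = [cosine_similarity(t, s) for t, s in zip(teacher_embs, student_embs)]
    return -sum(sims) / len(sims)

# Toy embeddings standing in for network outputs (assumed values).
teacher = [[1.0, 2.0, 3.0], [0.5, -1.0, 2.0]]
aligned = [[2.0, 4.0, 6.0], [1.0, -2.0, 4.0]]      # same direction, other scale
opposed = [[-1.0, -2.0, -3.0], [-0.5, 1.0, -2.0]]  # opposite direction

loss_aligned = cosine_ts_loss(teacher, aligned)
loss_opposed = cosine_ts_loss(teacher, opposed)
```

Because cosine similarity ignores vector magnitude, the aligned student embeddings reach the minimum loss even though their scale differs from the teacher's, while the opposed embeddings give the maximum loss.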


research
05/07/2020

Segment Aggregation for short utterances speaker verification using raw waveforms

Most studies on speaker verification systems focus on long-duration utte...
research
05/05/2017

Deep Speaker: an End-to-End Neural Speaker Embedding System

We present Deep Speaker, a neural speaker embedding system that maps utt...
research
04/01/2018

I-vector Transformation Using Conditional Generative Adversarial Networks for Short Utterance Speaker Verification

I-vector based text-independent speaker verification (SV) systems often ...
research
09/21/2020

Open-set Short Utterance Forensic Speaker Verification using Teacher-Student Network with Explicit Inductive Bias

In forensic applications, it is very common that only small naturalistic...
research
04/17/2019

RawNet: Advanced end-to-end deep neural network using raw waveforms for text-independent speaker verification

Recently, direct modeling of raw waveforms using deep neural networks ha...
research
10/22/2020

Graph Attention Networks for Speaker Verification

This work presents a novel back-end framework for speaker verification u...
research
05/26/2022

DT-SV: A Transformer-based Time-domain Approach for Speaker Verification

Speaker verification (SV) aims to determine whether the speaker's identi...
