Log In Sign Up

Robust Text-Dependent Speaker Verification via Character-Level Information Preservation for the SdSV Challenge 2020

by   Sung Hwan Mun, et al.

This paper describes our submission to Task 1 of the Short-duration Speaker Verification (SdSV) challenge 2020. Task 1 is a text-dependent speaker verification task, where both the speaker and phrase are required to be verified. The submitted systems were composed of TDNN-based and ResNet-based front-end architectures, in which the frame-level features were aggregated with various pooling methods (e.g., statistical, self-attentive, ghostVLAD pooling). Although the conventional pooling methods provide embeddings with a sufficient amount of speaker-dependent information, our experiments show that these embeddings often lack phrase-dependent information. To mitigate this problem, we propose a new pooling and score compensation methods that leverage a CTC-based automatic speech recognition (ASR) model for taking the lexical content into account. Both methods showed improvement over the conventional techniques, and the best performance was achieved by fusing all the experimented systems, which showed 0.0785 challenge's evaluation subset.


page 1

page 2

page 3

page 4


Short-duration Speaker Verification (SdSV) Challenge 2020: the Challenge Evaluation Plan

This document describes task1 of the Short-Duration Speaker Verification...

The SJTU System for Short-duration Speaker Verification Challenge 2021

This paper presents the SJTU system for both text-dependent and text-ind...

Content-Aware Speaker Embeddings for Speaker Diarisation

Recent speaker diarisation systems often convert variable length speech ...

Personalized Keyphrase Detection using Speaker and Environment Information

In this paper, we introduce a streaming keyphrase detection system that ...

Comparison of Multiple Features and Modeling Methods for Text-dependent Speaker Verification

Text-dependent speaker verification is becoming popular in the speaker r...