Distilling Multi-Level X-vector Knowledge for Small-footprint Speaker Verification

03/02/2023
by Xuechen Liu, et al.

Deep speaker models yield low error rates in speaker verification. This performance, however, comes at the cost of model size and computation time, which makes these models difficult to run under resource-constrained conditions. We focus on small-footprint deep speaker embedding extraction, leveraging knowledge distillation. While prior work on this topic has addressed speaker embedding extraction at the utterance level, we propose to combine embeddings from various levels of the x-vector model (the teacher network) to train small-footprint student networks. Results indicate the usefulness of frame-level information, with the student models being around 85% smaller than their teacher, depending on the size of the teacher embeddings. Concatenating teacher embeddings yields student networks whose performance is comparable to that of the teacher while being 75% smaller. The findings extend analogously to other x-vector variants.
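As a rough illustration of the idea described above, the following minimal sketch builds a distillation target by concatenating teacher embeddings from two levels and scores a student output against it. It is not the authors' implementation: the mean pooling of frame-level outputs, the cosine-distance loss, the toy vector sizes, and all function names (`mean_pool`, `concat_embeddings`, `cosine_distance`) are assumptions made for illustration only.

```python
import math


def mean_pool(frames):
    """Average frame-level vectors over time (assumed pooling choice)."""
    n = len(frames)
    dim = len(frames[0])
    return [sum(frame[d] for frame in frames) / n for d in range(dim)]


def concat_embeddings(*levels):
    """Concatenate teacher embeddings from several levels into one target."""
    target = []
    for emb in levels:
        target.extend(emb)
    return target


def cosine_distance(a, b):
    """Return 1 - cosine similarity; used here as a hypothetical distillation loss."""
    if len(a) != len(b):
        raise ValueError("student and teacher embeddings must have equal size")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Toy teacher outputs (made-up numbers, not results from the paper).
teacher_frames = [[1.0, 0.0], [0.0, 1.0]]  # two frames of a frame-level layer
teacher_utt = [0.5, 0.5, 1.0]              # utterance-level embedding

# Multi-level target: pooled frame-level vector followed by utterance-level vector.
target = concat_embeddings(mean_pool(teacher_frames), teacher_utt)

# A student whose output matches the target incurs (near-)zero loss.
student_out = [0.5, 0.5, 0.5, 0.5, 1.0]
loss = cosine_distance(student_out, target)
```

In a real training loop the student would be a small neural network and this loss would be minimized by gradient descent over many utterances; the sketch only shows how a concatenated multi-level target can be formed and compared.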


research
06/01/2023

A Teacher-Student approach for extracting informative speaker embeddings from speech mixtures

We introduce a monaural neural speaker embeddings extractor that compute...
research
09/21/2020

Open-set Short Utterance Forensic Speaker Verification using Teacher-Student Network with Explicit Inductive Bias

In forensic applications, it is very common that only small naturalistic...
research
06/17/2021

Multi-Level Transfer Learning from Near-Field to Far-Field Speaker Verification

In far-field speaker verification, the performance of speaker embeddings...
research
11/26/2022

SKDBERT: Compressing BERT via Stochastic Knowledge Distillation

In this paper, we propose Stochastic Knowledge Distillation (SKD) to obt...
research
04/14/2021

Learning Metrics from Mean Teacher: A Supervised Learning Method for Improving the Generalization of Speaker Verification System

Most speaker verification tasks are studied as an open-set evaluation sc...
research
04/14/2018

Developing Far-Field Speaker System Via Teacher-Student Learning

In this study, we develop the keyword spotting (KWS) and acoustic model ...
research
02/27/2019

Efficient Video Classification Using Fewer Frames

Recently, there has been a lot of interest in building compact models for...
