On-Device Constrained Self-Supervised Speech Representation Learning for Keyword Spotting via Knowledge Distillation

07/06/2023
by Gene-Ping Yang, et al.

Large self-supervised models are effective feature extractors, but applying them on-device is challenging under tight compute budgets and biased dataset collection, especially for keyword spotting. To address this, we propose a knowledge distillation-based self-supervised speech representation learning (S3RL) architecture for on-device keyword spotting. Our approach uses a teacher-student framework to transfer knowledge from a larger, more complex model to a smaller, lightweight model, with dual-view cross-correlation distillation and the teacher's codebook as learning objectives. We evaluated the model on an Alexa keyword spotting task using a 16.6k-hour in-house dataset. Our technique performed strongly under both normal and noisy conditions, demonstrating the efficacy of knowledge distillation for building self-supervised keyword spotting models within on-device resource constraints.
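The abstract does not specify how the two learning objectives are implemented, so the sketch below is only a plausible reading: a Barlow Twins-style cross-correlation loss applied along both the feature and the instance dimensions (one interpretation of "dual-view"), paired with a cross-entropy loss against the teacher's discrete codebook indices. All names (dual_view_cc_loss, codebook_distill_loss), the lambd weight, and the toy shapes are illustrative assumptions, not taken from the paper.

import torch
import torch.nn.functional as F

def cross_correlation_loss(z_a, z_b, lambd=0.005):
    # Barlow Twins-style objective on the cross-correlation matrix of
    # standardized embeddings: drive the diagonal toward 1 (align matching
    # student/teacher dimensions) and the off-diagonals toward 0.
    n = z_a.shape[0]
    z_a = (z_a - z_a.mean(dim=0)) / (z_a.std(dim=0) + 1e-6)
    z_b = (z_b - z_b.mean(dim=0)) / (z_b.std(dim=0) + 1e-6)
    c = (z_a.T @ z_b) / n  # (D, D) cross-correlation matrix
    on_diag = (torch.diagonal(c) - 1).pow(2).sum()
    off_diag = (c - torch.diag(torch.diagonal(c))).pow(2).sum()
    return on_diag + lambd * off_diag

def dual_view_cc_loss(h_student, h_teacher):
    # Hypothetical "dual-view" reading: one view correlates embedding
    # dimensions across the batch, the other correlates instances across
    # embedding dimensions (the same loss on the transposed embeddings).
    feature_view = cross_correlation_loss(h_student, h_teacher)
    instance_view = cross_correlation_loss(h_student.T, h_teacher.T)
    return feature_view + instance_view

def codebook_distill_loss(student_logits, teacher_code_ids):
    # Hypothetical codebook objective: cross-entropy against the discrete
    # code indices emitted by the teacher's quantizer (wav2vec 2.0-style).
    return F.cross_entropy(student_logits, teacher_code_ids)

# Toy usage with made-up shapes: a batch of 32 frame embeddings, a 768-dim
# shared projection space, and a 320-entry teacher codebook.
student_emb = torch.randn(32, 768, requires_grad=True)
teacher_emb = torch.randn(32, 768)          # frozen teacher, no gradient
code_logits = torch.randn(32, 320, requires_grad=True)
teacher_ids = torch.randint(0, 320, (32,))  # teacher's code assignments
loss = dual_view_cc_loss(student_emb, teacher_emb.detach()) \
       + codebook_distill_loss(code_logits, teacher_ids)

How the two objectives are weighted, and how the student is projected into the teacher's embedding space, are design choices the abstract leaves open.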


Related research

- 04/13/2023: Multi-Mode Online Knowledge Distillation for Self-Supervised Visual Representation Learning
- 03/07/2023: Self-supervised speech representation learning for keyword-spotting with light-weight transformers
- 10/27/2021: Temporal Knowledge Distillation for On-device Audio Classification
- 11/23/2021: Domain-Agnostic Clustering with Self-Distillation
- 12/06/2022: Label-free Knowledge Distillation with Contrastive Loss for Light-weight Speaker Recognition
- 10/29/2022: Application of Knowledge Distillation to Multi-task Speech Representation Learning
- 05/17/2023: DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning
