Self-supervised speech representation learning for keyword-spotting with light-weight transformers

03/07/2023
by Chenyang Gao, et al.

Self-supervised speech representation learning (S3RL) is revolutionizing the way we leverage the ever-growing availability of data. While S3RL-related studies typically use large models, we employ light-weight networks to comply with the tight memory budgets of compute-constrained devices. We demonstrate the effectiveness of S3RL on a keyword-spotting (KS) problem by using transformers with 330k parameters, and we propose a mechanism to enhance utterance-wise distinction, which proves crucial for improving performance on classification tasks. On the Google speech commands v2 dataset, the proposed method applied to Auto-Regressive Predictive Coding S3RL led to a 1.2% improvement compared to training from scratch. On an in-house KS dataset with four different keywords, it provided a 6% improvement at a fixed false reject rate. We argue that this demonstrates the applicability of S3RL approaches to light-weight models for KS and confirms that S3RL is a powerful alternative to traditional supervised learning for resource-constrained applications.
