Exploring Representation Learning for Small-Footprint Keyword Spotting

03/20/2023
by Fan Cui, et al.

In this paper, we investigate representation learning for low-resource keyword spotting (KWS). The main challenges of KWS are limited labeled data and limited available device resources. To address these challenges, we explore representation learning for KWS through self-supervised contrastive learning and self-training with a pretrained model. First, local-global contrastive siamese networks (LGCSiam) are designed to learn similar utterance-level representations for similar audio samples via the proposed local-global contrastive loss, without requiring ground-truth labels. Second, a self-supervised pretrained Wav2Vec 2.0 model is applied as a constraint module (WVC) to force the KWS model to learn frame-level acoustic representations. With the LGCSiam and WVC modules, the proposed small-footprint KWS model can be pretrained on unlabeled data. Experiments on the Speech Commands dataset show that the self-training WVC module and the self-supervised LGCSiam module significantly improve accuracy, especially when training on a small labeled dataset.
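The abstract's core idea — pulling utterance-level embeddings of matching audio views together while pushing non-matching ones apart — can be illustrated with a generic NT-Xent-style contrastive loss. This is a minimal NumPy sketch of that general mechanism, not the paper's actual LGCSiam loss (the local-global formulation, temperature, and batch construction here are all assumptions for illustration):

```python
import numpy as np

def contrastive_loss(z1, z2, temperature=0.1):
    """NT-Xent-style contrastive loss between two batches of
    utterance-level embeddings, one per augmented view.
    Matching rows are positive pairs; all other rows serve as negatives.
    Note: illustrative sketch, not the paper's LGCSiam loss."""
    # L2-normalize so dot products become cosine similarities
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    sim = z1 @ z2.T / temperature  # (N, N) similarity matrix
    # cross-entropy with the diagonal (matching pairs) as targets
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 64))                 # 8 utterance embeddings
noise = 0.01 * rng.normal(size=(8, 64))      # mild "augmentation"
aligned = contrastive_loss(z, z + noise)                     # views agree
shuffled = contrastive_loss(z, rng.permutation(z + noise))   # views mismatched
print(aligned < shuffled)
```

When the two views of each utterance agree, the loss is near zero; shuffling the pairing drives it up, which is the signal a siamese encoder can be pretrained on without any labels.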

Related research

- Multi-Scale Contrastive Siamese Networks for Self-Supervised Graph Representation Learning (05/12/2021): Graph representation learning plays a vital role in processing graph-str...
- Wav2vec-C: A Self-supervised Model for Speech Representation Learning (03/09/2021): Wav2vec-C introduces a novel representation learning technique combining...
- Improving Small Footprint Few-shot Keyword Spotting with Supervision on Auxiliary Data (08/31/2023): Few-shot keyword spotting (FS-KWS) models usually require large-scale an...
- Enabling On-Device Self-Supervised Contrastive Learning With Selective Data Contrast (06/07/2021): After a model is deployed on edge devices, it is desirable for these dev...
- data2vec-aqc: Search for the right Teaching Assistant in the Teacher-Student training setup (11/02/2022): In this paper, we propose a new Self-Supervised Learning (SSL) algorithm...
- Continual Contrastive Finetuning Improves Low-Resource Relation Extraction (12/21/2022): Relation extraction (RE), which has relied on structurally annotated cor...
- Self-supervised speech representation learning for keyword-spotting with light-weight transformers (03/07/2023): Self-supervised speech representation learning (S3RL) is revolutionizing...
