Speech Augmentation Based Unsupervised Learning for Keyword Spotting

05/28/2022
by Jian Luo, et al.

In this paper, we investigated a speech augmentation based unsupervised learning approach for the keyword spotting (KWS) task. KWS is a useful speech application, yet it depends heavily on labeled data. We designed a CNN-Attention architecture to perform the KWS task: the CNN layers focus on local acoustic features, and the attention layers model long-term dependencies. To improve the robustness of the KWS model, we also proposed an unsupervised learning method. The unsupervised loss is based on the similarity between the original and augmented speech features, as well as on audio reconstruction information. Two speech augmentation methods are explored in the unsupervised learning: speed and intensity. Experiments on the Google Speech Commands V2 dataset demonstrated that our CNN-Attention model achieves competitive results. Moreover, augmentation based unsupervised learning further improved the classification accuracy of the KWS task. In our experiments, with augmentation based unsupervised learning, our KWS model achieved better performance than other unsupervised methods, such as CPC, APC, and MPC.
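As a rough illustration of the idea described above, the following sketch applies speed and intensity augmentation to a waveform and scores how far the augmented features drift from the original ones. It is not the authors' implementation. The abstract does not specify the feature extractor, the exact similarity measure, or the reconstruction term, so everything here is an assumption made for illustration: the function names (`change_speed`, `change_intensity`, `frame_features`, `similarity_loss`), the log-magnitude spectrum features, the time-averaging used to compare utterances of different lengths, and the mean squared error. The reconstruction part of the loss is omitted.

```python
import numpy as np

def change_speed(wave, rate):
    """Speed augmentation by linear-interpolation resampling (assumed method).
    rate > 1 shortens the signal (faster speech); rate < 1 lengthens it."""
    n_out = int(round(len(wave) / rate))
    src_pos = np.linspace(0, len(wave) - 1, n_out)
    return np.interp(src_pos, np.arange(len(wave)), wave)

def change_intensity(wave, gain):
    """Intensity augmentation: scale the amplitude by a constant gain."""
    return gain * wave

def frame_features(wave, frame_len=400, hop=160):
    """Toy stand-in for an encoder: log-magnitude spectrum of each frame."""
    n_frames = 1 + (len(wave) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = wave[idx] * np.hanning(frame_len)
    return np.log(np.abs(np.fft.rfft(frames, axis=1)) + 1e-8)

def similarity_loss(feat_a, feat_b):
    """Hypothetical similarity term: MSE between time-averaged features.
    Averaging over time lets utterances of different lengths be compared."""
    return float(np.mean((feat_a.mean(axis=0) - feat_b.mean(axis=0)) ** 2))

# Example: one second of synthetic audio at an assumed 16 kHz sample rate.
rng = np.random.default_rng(0)
wave = rng.standard_normal(16000)

original = frame_features(wave)
faster = frame_features(change_speed(wave, rate=1.25))
quieter = frame_features(change_intensity(wave, gain=0.5))

speed_loss = similarity_loss(original, faster)
intensity_loss = similarity_loss(original, quieter)
```

In a training setup, a term like this would be minimized with respect to the parameters of a learned encoder, so that the representation becomes less sensitive to speed and intensity changes; with the fixed spectral features used here it only measures the mismatch.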


research
08/12/2021

Text Anchor Based Metric Learning for Small-footprint Keyword Spotting

Keyword Spotting (KWS) remains challenging to achieve the trade-off betw...
research
06/30/2022

Learning Audio-Text Agreement for Open-vocabulary Keyword Spotting

In this paper, we propose a novel end-to-end user-defined keyword spotti...
research
09/30/2022

Minimalistic Unsupervised Learning with the Sparse Manifold Transform

We describe a minimalistic and interpretable method for unsupervised lea...
research
03/29/2020

Learning by Analogy: Reliable Supervision from Transformations for Unsupervised Optical Flow Estimation

Unsupervised learning of optical flow, which leverages the supervision f...
research
10/24/2019

Unsupervised Representation Learning with Future Observation Prediction for Speech Emotion Recognition

Prior works on speech emotion recognition utilize various unsupervised l...
research
10/23/2022

Bootstrapping meaning through listening: Unsupervised learning of spoken sentence embeddings

Inducing semantic representations directly from speech signals is a high...
research
06/06/2019

From Caesar Cipher to Unsupervised Learning: A New Method for Classifier Parameter Estimation

Many important classification problems, such as object classification, s...
