Deep Spoken Keyword Spotting: An Overview

11/20/2021
by Iván López-Espejo, et al.

Spoken keyword spotting (KWS) deals with the identification of keywords in audio streams and has become a fast-growing technology thanks to the paradigm shift introduced by deep learning a few years ago. This has allowed the rapid embedding of deep KWS in a myriad of small electronic devices for purposes such as the activation of voice assistants. Forecasts suggest sustained growth in the social use of this technology. Thus, it is not surprising that deep KWS has become a hot research topic among speech scientists, who constantly seek to improve KWS performance and reduce computational complexity. This context motivates this paper, in which we review the literature on deep spoken KWS to assist practitioners and researchers interested in this technology. Specifically, this overview is comprehensive, covering a thorough analysis of deep KWS systems (including speech features, acoustic modeling and posterior handling), robustness methods, applications, datasets, evaluation metrics, the performance of deep KWS systems and audio-visual KWS. This analysis allows us to identify a number of directions for future research, including directions adopted from automatic speech recognition research and directions unique to the problem of spoken KWS.


