PhonMatchNet: Phoneme-Guided Zero-Shot Keyword Spotting for User-Defined Keywords

08/31/2023
by Yong-Hyeok Lee, et al.

This study presents a novel zero-shot user-defined keyword spotting model that exploits the audio-phoneme relationship of the keyword to improve performance. Unlike the previous approach, which estimates at the utterance level, we use both utterance-level and phoneme-level information. The proposed method comprises a two-stream speech encoder architecture, a self-attention-based pattern extractor, and a phoneme-level detection loss for high performance in various pronunciation environments. In our experiments, the proposed model outperforms the baseline model and achieves competitive performance compared with full-shot keyword spotting models. It significantly improves the EER and AUC across all datasets, including familiar words, proper nouns, and indistinguishable pronunciations, with an average relative improvement of 67% [...]. The proposed model is available at https://github.com/ncsoft/PhonMatchNet.
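The abstract reports results as EER (equal error rate) and AUC (area under the ROC curve). As a rough illustration of how these two metrics can be computed from detection scores, here is a minimal pure-Python sketch. It is not taken from the PhonMatchNet repository: the function names, the linear interpolation used to locate the EER, and the toy scores are all assumptions made for this example.

```python
def roc_points(scores, labels):
    """Return (FAR, FRR) pairs obtained by sweeping a decision threshold.

    labels: 1 = keyword present, 0 = keyword absent.
    A higher score means the detector is more confident the keyword is present.
    (Hypothetical helper written for this illustration.)
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative example")

    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    points = [(0.0, 1.0)]  # threshold above every score: nothing is accepted
    tp = fp = 0
    i = 0
    while i < len(order):
        current = scores[order[i]]
        # Consume all examples that share the same score (ties) in one step.
        while i < len(order) and scores[order[i]] == current:
            if labels[order[i]] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        far = fp / neg            # false acceptance rate
        frr = (pos - tp) / pos    # false rejection rate
        points.append((far, frr))
    return points


def equal_error_rate(scores, labels):
    """EER: the operating point where FAR equals FRR (linearly interpolated)."""
    pts = roc_points(scores, labels)
    for (far0, frr0), (far1, frr1) in zip(pts, pts[1:]):
        d0, d1 = far0 - frr0, far1 - frr1
        if d1 >= 0:  # FAR has caught up with FRR on this segment
            t = -d0 / (d1 - d0)
            return far0 + t * (far1 - far0)
    return pts[-1][0]


def auc(scores, labels):
    """Area under the ROC curve (TPR against FAR) by the trapezoidal rule."""
    pts = roc_points(scores, labels)
    area = 0.0
    for (far0, frr0), (far1, frr1) in zip(pts, pts[1:]):
        area += (far1 - far0) * ((1 - frr0) + (1 - frr1)) / 2
    return area


# Toy scores, invented for illustration only (not results from the paper).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels = [1, 1, 0, 1, 0, 0]
print(round(equal_error_rate(scores, labels), 4))  # 0.3333
print(round(auc(scores, labels), 4))               # 0.8889
```

A lower EER and a higher AUC both indicate better separation between keyword and non-keyword audio, so a relative improvement is a reduction in one case and an increase in the other.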
